Meaning-based computing: A broad church

04 Sep 2006

New search technologies promise to put an end to frustrating searches that don't quite satisfy user requirements. By Rajiv Singh

In an age of information overflow, information retrieval or search is increasingly being acknowledged as a critical issue, and the focus on search technologies is stronger now than at any time before. Today, search is one of the most challenging problems faced by individuals, business organisations and government and related organisations.

The transformation occurring in the area of 'search' is indeed radical. As computers become more pervasive in our daily lives, greater fluidity, or more human-like interaction, will be sought with these machines.

Finding information is a need not just for businesses, but also at an individual level. This is particularly true today as an increasing number of personal tasks, be it buying a product or getting directions or phone numbers, begin to go online. A basic search box, with no further interaction than the feeding of a keyword, is no longer enough to serve anybody's purpose.

A large number of search enhancement tools are now talking of context. "Not so long ago, content was king. In 2006, context will reign," says Susan Feldman, vice president, content technologies research, IDC. Context is just a term, however, and its varied usage refers to a diverse range of ideas, from domain-specific search engines to personalisation. At a general level, search engines have to address a specific problem: people are never going to ask the right questions, or, given the limitations of search technologies, the right answers are never going to be thrown up.

The quiet revolution
For an analyst such as Susan Feldman, the rise of search technology is part of a larger development that has been under way over the last decade. Discussing the issue in an article, Search: the quiet revolution, Feldman says, "Search technology has been around for more than four decades. But only in the past ten years, as the worldwide web has become an integral part of the technology landscape, has it occupied a prominent place in our work and personal lives. And only in the past four years has search finally become a hot and lucrative area of technology development. Why the delay?" Feldman's reasoning, as enumerated below, suggests that a historical conjunction may now have been reached, allowing search technologies to emerge at the forefront of technology-related issues.

According to Feldman, to be truly effective, search technologies require computing power to sort through massive amounts of text or data. This has now been achieved, she says, even with our desktop machines. With computing power no longer a problem, she points out, sophisticated language analysis and complex matching algorithms are now coming into play in aid of search technologies.

An added spur to the advancement of such technologies has been the discovery by companies that lost information puts them at risk of non-compliance. Negative issues such as these are not the only reason why information access is a critical issue for firms today. Search technologies are also turning out to be a positive aid in product development and decision-making processes.

Finally, as Feldman points out, our professional and personal lives have been transformed in the past decade at an individual level. Increasingly, we are looking for tools that will allow us to sustain both professional and personal tasks - from scheduling meetings with a client, to buying movie tickets for the family.

Even as a historical conjunction of demand, technology and computing power may now have come to pass, Feldman says that the kind of access to information that individuals and organisations require may still be some distance away. What is lacking, she says, is a deep understanding of information interactions, and how to automate them effectively.

Unstructured information
Language is the basis of all our interactions, be it with the computer or in cyberspace, and language, as Feldman points out, is "complex and ambiguous." This is where individuals and organisations alike begin to experience frustration with the sophistication of "search" technology as it exists. This is also the point at which a company such as Autonomy Corporation now enters the picture.

Late last month, content management and enterprise search specialist Autonomy Corporation Plc announced its 2006 half-year results, and reported a trebling in sales and a 351 per cent jump in operating profits to $27.9 million. A company release quoted Autonomy founder and chief executive, Dr Michael Lynch, as saying that during the past year unstructured information issues had gone "prime time".

Autonomy says that the last few years have seen an explosive growth in the use of unstructured information, a category that includes documents, emails, telephone conversations and multimedia. According to Autonomy, unstructured information has traditionally been difficult for computers to understand and use. Interestingly, it says that more than 85 per cent of all information inside an enterprise is now unstructured.

Of course, unstructured information issues are what Autonomy specialises in, a space that it defines as Meaning-Based Computing (MBC). According to Autonomy, MBC enables "…computers to understand the relationships that exist between disparate pieces of information and perform sophisticated analysis operations with real business value, automatically and in real-time." The disparate pieces of information that MBC can act on are things like email and documents, as well as PDFs, voice over IP and other types of content.

As Lynch explains, information in the IT world is divided into two distinct groups: structured material that goes in relational databases, and all the unstructured material that doesn't fit into IT infrastructures very well. It is this unstructured material that is now exploding in terms of usage.

According to Autonomy, MBC solves the problem of accessing unstructured information by applying a new breed of applications "…which not only uncovers, but also makes sense of the 85 per cent of enterprise information that remains hidden to all other technologies including keyword search engines and relational databases." Autonomy says MBC would also be an active aid in tracking illegal activity.

Accessing these disparate pieces of information in a meaningful way is perhaps what Feldman is referring to when she speaks about information interactions.

A broad church
According to Lynch, unlike structured data, which allows IT to automate, the problem with unstructured data is that it has not proved to be amenable to automation - so far. The aim of MBC, Lynch says, is to enable companies to attempt a similar process with unstructured information.

According to Lynch, a variety of technologies go into MBC, from speech recognition to text understanding, creating platforms that take the search and interactivity process yet another step ahead. In an interview with James Murray (ITWeek, 21st Aug, 2006) Lynch refers to this application of technologies as "a broad church."

It's an interesting analogy that Lynch uses, for the origins of what he refers to as MBC have a quaint connection to churches, vicars and some very interesting English history. An 18th-century English country vicar and mathematician, Thomas Bayes, set about trying to prove the possibility that God exists and arrived at a theorem that discussed the mathematical probability of things.

Lynch says that MBC traces its origins to Bayes's enterprising work and also builds on the work of Claude Shannon, 'the Father of Information Theory', whose Principles of Information enable identification of the patterns that naturally occur in text. These two sets of research, says Lynch, lie at the heart of MBC. The underlying pattern-recognition algorithm, derived from Bayes's formulations, enables computers to comprehend context, generalise from words to an idea and, according to Lynch, grasp the root concepts beneath the play of syntax.
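
To make the two ideas concrete, here is a minimal Python sketch. It is not Autonomy's algorithm, which the article does not describe in detail. It is a textbook naive Bayes classifier, which uses Bayes's theorem to pick the most probable topic for a piece of text, alongside a Shannon entropy measure of how evenly words are spread in that text. The function names, the add-one smoothing and the tiny training set are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train(labelled_docs):
    """Count words per label from a list of (label, text) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for label, text in labelled_docs:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the highest posterior probability for `text`."""
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # Prior: log P(label)
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Likelihood: log P(word | label), with add-one smoothing
            # (an assumption) so unseen words do not zero out the score.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def entropy(text):
    """Shannon entropy, in bits per word, of the word distribution in `text`."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical training data, invented for this example.
docs = [
    ("finance", "quarterly profits and sales results"),
    ("finance", "operating profits rose on strong sales"),
    ("travel", "directions to the airport and flight times"),
    ("travel", "book flight tickets and hotel"),
]
word_counts, label_counts = train(docs)
print(classify("sales and profits jumped", word_counts, label_counts))
print(entropy("sales and profits jumped"))
```

The classifier never sees the word "jumped" in training, yet it can still assign the sentence to a topic from the words it does know, which is a very small-scale version of generalising from words to an idea.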

However, for now, Lynch says, "…we are just at the beginning of this movement, but in a few years' time you will see unstructured information used and processed all over the place."

Currently, Lynch confesses, MBC is dominated by enterprise search, but the technologies being developed to aid enterprises will soon begin to have wider use. One such radical technology is implicit query. Instead of the user pausing work to put a query to a search engine, implicit query technology would read what is on the screen at any time, be it an email or a web page, and, at the press of one key, interpret it and summon up related information. Hyperlinking is another interesting technology, which would provide links to internal and external information based on whatever one is working on. Smart or Active folders would do the filing by themselves, allowing, for instance, all documents related to Autonomy to be filed under that company's name. This, says Lynch, is the direction in which search technology is broadly headed.
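
The idea of summoning up related information without typing a query can be illustrated with a short Python sketch. This is an assumption-laden stand-in, not the technology Lynch describes: it treats the on-screen text as the query and ranks stored documents by plain bag-of-words cosine similarity. The function names and the sample library are hypothetical.

```python
import math
from collections import Counter


def vectorise(text):
    """Turn text into a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def related(on_screen_text, library, top_n=2):
    """Rank stored documents by similarity to whatever is on screen."""
    query = vectorise(on_screen_text)
    scored = [(cosine(query, vectorise(body)), title)
              for title, body in library.items()]
    scored.sort(reverse=True)
    # Keep only documents that share at least one word with the screen text.
    return [title for score, title in scored[:top_n] if score > 0]


# Hypothetical document store, invented for this example.
library = {
    "Autonomy results": "autonomy half year results sales profits",
    "Travel memo": "flight tickets hotel booking",
    "Autonomy clients": "autonomy clients include ford and zurich",
}
print(related("email about autonomy sales results", library))
```

The same ranking could drive a simple version of automatic filing: a document would be placed in whichever folder holds the material it most resembles.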

Even as MBC evolves, Autonomy says that more than 16,000 blue-chip corporations and government agencies are now accessing the pattern-matching algorithms in its products to extract meaning from unstructured information. These range from the US Department of Homeland Security, which is using MBC across 21 agencies to monitor suspected terrorist groups, to the Ford Motor Company, which is using the company's applications to transform the text, audio and video files in its research libraries into meaningful reference material in order to speed up work on new projects, and a financial giant like Zurich Financial Services, which is using MBC applications to prioritise research emanating from more than 500 of its sources for its risk managers.

John Deere, Sprint, Whirlpool, Saab Ericsson, LexisNexis China and Rio Tinto are the major new clients to which Autonomy sold its software in the second quarter. The company has also signed new business with multiple government, defence and intelligence agencies around the globe, including in the US, the UK, the Netherlands, France, Italy and Singapore.

The new additions join a roster of existing clients that includes BAE Systems, Boeing, DaimlerChrysler, Shell, AOL, BBC, Reuters, Hutchison 3G, Ericsson, T-Mobile, Philips, Coca-Cola, Kraft Foods, Nestle, Lloyds TSB, GlaxoSmithKline, KPMG, Citigroup, ABN AMRO, Deutsche Bank, Nomura and the US Securities and Exchange Commission.

According to Feldman, search technologies such as text mining, text analytics, categorisation, speech analysis and translation will eventually be embedded in a majority of people-facing applications, such as cell phones, cars, gas pumps, home entertainment centres, call centres and transit systems.

Feldman feels that as the search needs of people and organisations continue to develop and grow more complex, what is really needed, more than a search engine, is perhaps a true information discovery platform incorporating all search-related technologies. Given the pace at which information technologies develop, that may occur sooner than one may think.