Information Retrieval Technologies in Modern Search Engines
At the heart of modern search technology is the not-so-modern science of Information Retrieval, or IR. Wikipedia defines IR as follows:
Information retrieval (IR) is the art and science of searching for information in documents, searching for documents themselves, searching for metadata which describes documents, or searching within databases, whether relational stand alone databases or hypertext networked databases such as the Internet or intranets, for text, sound, images or data. There is a common confusion, however, between data retrieval, document retrieval, information retrieval, and text retrieval, and each of these have their own bodies of literature, theory, praxis and technologies. IR is a broad interdisciplinary field, that draws on many other disciplines. Indeed, because it is so broad, it is normally poorly understood, being approached typically from only one perspective or another. It stands at the junction of many established fields, and draws upon cognitive psychology, information architecture, information design, human information behavior, linguistics, semiotics, information science, computer science and librarianship.
A simplified definition is given by Cornell University:
Searching a body of information for objects that match a search query.
By some definitions, we can trace the roots of IR to ancient times and count such developments as alphabetization and the table of contents among its historical landmarks. However, for our purposes, we will define IR as:
The art and science of organizing, storing and retrieving information contained in a large corpus of documents via electronic databases.
Our definition of IR is centered primarily on the brilliant work of Gerard Salton, who is considered by many to be the father of Information Retrieval.
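As a rough illustration of the organize, store and retrieve cycle in the definition above, the following Python sketch builds a tiny inverted index (a mapping from each term to the documents that contain it) and answers a query against it. The corpus, the function names build_index and search, and the whitespace tokenization are all assumptions made for this example; it is not drawn from any particular search engine and ignores ranking entirely.

```python
from collections import defaultdict


def build_index(corpus):
    """Organize and store: map each lowercased term to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Retrieve: return the ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from the first term's documents, then intersect with each remaining term.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical three-document corpus, keyed by document id.
corpus = {
    1: "information retrieval in search engines",
    2: "relational databases store data",
    3: "search engines index documents",
}

index = build_index(corpus)
print(sorted(search(index, "search engines")))  # prints [1, 3]
```

Documents 1 and 3 are returned because both contain "search" and "engines". A real system would add steps this sketch leaves out, such as smarter tokenization and ordering the results by relevance.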
The following articles briefly explore various IR technologies and design concepts used by modern large-scale internet search engines. They do not describe any particular search engine; rather, they serve as an introduction to some of the traditional IR technologies that form the conceptual foundation of modern search engines.


