Information Theory Ramblings: Information Retrieval: Section 2

2 Information Retrieval

In the sense discussed here, information retrieval can be defined as follows. According to Manning (2008):

Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers). (p. 25)

According to Galeas, Kretschmer, and Freisleben (2009), the information retrieval process is generally explained in two main stages: the indexing stage and the query evaluation stage. In the indexing stage, “the documents of a collection are processed to generate a database (index) containing the information about the terms of all documents in the collection” (p. 1). Here the documents are organized and labeled in such a way that they can be measured; this is where the information is given a representation. What they are measured against is the query that the user enters in the second stage, query evaluation. In the query evaluation stage, “the user sends a query to the system, and the system responds with a ranked list of relevant documents” (p. 1). When the query is sent to the system, it is also given a representation. The system then measures the query against the collection of documents according to its relevance criteria, and the result is a list of documents ranked by relevance. This is the general process of information retrieval.

The part of this process that determines how the documents and the query are represented, how relevance is measured, and when any calculations are made is the IR model. Standard models typically use the frequency of terms as the primary relevance criterion. Relevance can also be assessed with an additional process that takes the position of each term into account, so that whether a term's occurrences are clustered together or spread out across a document becomes measurable. Though this improves the relevance of the returned documents to the query, it slows the system's response time considerably. Because the second stage is the one in which the user interacts with the system, this is undesirable (Abderrahim, Aboutajdine, & Ennouary, 2010). This motivates models that place the bulk of the algorithms and calculations in the indexing stage, so that the user receives results that are both accurate and fast.
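
As a rough illustration of the two stages, here is a minimal Python sketch. It is not the model from any of the cited papers: the function names (tokenize, build_index, evaluate_query), the sample documents, and the scoring rule (summed raw term frequency) are all assumptions made for this example. The index also records term positions, to show that positional information can be collected during indexing rather than at query time.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Indexing stage: map each term to {doc_id: [positions of the term]}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(position)
    return index


def evaluate_query(index, query):
    """Query evaluation stage: rank documents by summed term frequency."""
    scores = defaultdict(int)
    for term in tokenize(query):
        for doc_id, positions in index.get(term, {}).items():
            scores[doc_id] += len(positions)  # term frequency in this document
    # Highest score first; ties are broken by doc_id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical three-document collection, used only for illustration.
documents = {
    "d1": "retrieval of documents from large document collections",
    "d2": "the index stores terms and the index stores positions",
    "d3": "a query is matched against the index",
}

index = build_index(documents)               # done once, before any query
ranking = evaluate_query(index, "index terms")
print(ranking)                               # ranked list of (doc_id, score)
```

Because the positions are already stored in the index, a proximity measure could in principle be computed from index["index"]["d2"] without re-reading the documents, which is the kind of work the paragraph above suggests moving into the indexing stage.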

--------------------------------------

Abderrahim, E. Q., Aboutajdine, D., & Ennouary, Y. (2010, November 2). Formal Concept Analysis for Information Retrieval. International Journal of Computer Science and Information Security, 7.

Galeas, P., Kretschmer, R., & Freisleben, B. (2009, October 10). Information Retrieval via Truncated Hilbert Expansions. International Conference on Information Retrieval, 1, 1938.

Manning, C. (2008). Introduction to Information Retrieval. Cambridge: Cambridge University Press.

------------------------------------
Thanks for reading!

Dustin Smith
Ventown Inc.

Wednesday, February 16, 2011
