
LIS 506 Assignment 4 - Information Retrieval (IR)

Introduction

For this assignment, you will index a small corpus of texts. You will also pose several queries against your index and assess how well each one performs. Bring a printed copy of your work to class on June 24, 2009.

Deliverables

You will turn in two items for this exercise:

  1. An inverted index
  2. Written responses to the problems posed in this document

Inverted Index

At the end of this exercise you will find a list of ten "documents". These are titles from conference articles. Your task is to build an inverted index for these documents (of course, using only words from the titles).

For each term, your inverted index should include the term itself, the documents in which it occurs, and the number of times it occurs in each of those documents. Here are two sample entries:

Term        Doc Numbers (document:occurrences)
retrieval   1:1, 2:1, 3:1, 4:1, 5:1, 6:1
seeking     8:1, 9:1

The table above shows that the term retrieval occurs once in each of documents 1, 2, 3, 4, 5, and 6, while the term seeking occurs once in each of documents 8 and 9. Each list of postings should be ordered numerically by document number.

Here are several heuristics that you should follow when constructing your index:

  1. Set all letters to lower case
  2. Remove stop words (consult the stop list at: http://jmlr.csail.mit.edu/papers/volume5/lewis04a/a11-smart-stop-list/english.stop)
  3. You may handle punctuation however you like, so long as your treatment is consistent.
  4. You may handle truncation however you like, as long as your treatment is consistent.

For the sake of clarity, your index should be arranged alphabetically.
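
The heuristics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a required method: the two toy titles are hypothetical (they are not the assignment's documents), STOP_WORDS is a tiny stand-in for the linked SMART stop list, punctuation is handled by splitting on anything that is not a letter or digit, and no truncation is applied. The function names are invented for this example.

```python
import re
from collections import Counter, defaultdict

# Assumption: a tiny illustrative stop list, NOT the full SMART list.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "with"}


def tokenize(title):
    """Lowercase the title, split on non-alphanumerics, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_index(docs):
    """Map each term to {document number: occurrences in that document}."""
    index = defaultdict(dict)
    for doc_id, title in docs.items():
        for term, count in Counter(tokenize(title)).items():
            index[term][doc_id] = count
    return dict(index)


def format_index(index):
    """Render entries alphabetically, postings ordered by document number."""
    lines = []
    for term in sorted(index):
        postings = ", ".join(f"{d}:{c}" for d, c in sorted(index[term].items()))
        lines.append(f"{term} {postings}")
    return lines


# Hypothetical toy titles, used only to show the output format.
toy_docs = {
    1: "Searching the Web: a web search primer",
    2: "Search engines and the Web",
}

for line in format_index(build_index(toy_docs)):
    print(line)
# engines 2:1
# primer 1:1
# search 1:1, 2:1
# searching 1:1
# web 1:2, 2:1
```

Because this sketch applies no truncation, search and searching remain separate entries; a different but consistent treatment would merge them.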

Problems

  1. Which documents would be returned for the query: information AND retrieval?
  2. Which documents would be returned for the query: information AND retrieval AND user?
  3. Which documents would be returned for the query: information AND (seeking OR (NOT retrieval)) AND (user OR interaction)?
  4. Imagine that a searcher is interested only in documents about real users' information-seeking behavior. Given full knowledge of the searcher's interest, documents 1-4 are non-relevant, while documents 5-10 are relevant. Calculate the precision and recall for each of the queries in problems 1-3.
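
As a rough illustration of how Boolean queries and the two measures can be worked out, the sketch below treats each postings list as a set of document numbers: AND is set intersection, OR is set union, and NOT is the complement with respect to the whole collection. Precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. Everything in it is hypothetical: the five-document collection, the postings, the relevance judgments, and the function names are made up for the example and are not the assignment's data. Returning 0.0 when nothing is retrieved is an assumed convention, since precision is undefined in that case.

```python
# Hypothetical five-document collection; not the assignment's titles.
ALL_DOCS = {1, 2, 3, 4, 5}

# Hypothetical postings: term -> set of documents containing it.
POSTINGS = {
    "information": {1, 2, 3, 5},
    "retrieval": {1, 2},
    "seeking": {3, 5},
}


def docs_for(term):
    """Return the set of documents containing the term (empty if unseen)."""
    return POSTINGS.get(term, set())


def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set and a relevant set."""
    hits = len(retrieved & relevant)
    # Assumed convention: report 0.0 when a denominator would be zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# information AND retrieval
q1 = docs_for("information") & docs_for("retrieval")

# information AND (seeking OR (NOT retrieval))
q2 = docs_for("information") & (
    docs_for("seeking") | (ALL_DOCS - docs_for("retrieval"))
)

# Hypothetical relevance judgments for the toy collection.
RELEVANT = {2, 3, 5}

for name, result in (("q1", q1), ("q2", q2)):
    p, r = precision_recall(result, RELEVANT)
    print(name, sorted(result), f"precision={p:.2f}", f"recall={r:.2f}")
# q1 [1, 2] precision=0.50 recall=0.33
# q2 [3, 5] precision=1.00 recall=0.67
```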

The Documents

Document ID Title
1 Automatic text processing: the transformation, analysis, and retrieval of information by computer
2 Information retrieval: data structures and algorithms
3 Relevance feedback in information retrieval
4 Information filtering and information retrieval: two sides of the same coin?
5 Real life information retrieval: a study of user queries on the Web
6 A case for interaction: a study of interactive information retrieval behavior and effectiveness
7 Real life, real users, and real needs: a study and analysis of user queries on the web
8 Dynamic queries for visual information seeking
9 What Are They Doing with the Internet? A Study of User Information Seeking Behaviors.
10 A longitudinal study of World Wide Web users' information-searching behavior
