For this assignment, you will index a small corpus of texts. You will also pose several queries against your index, noting their success. Bring a printed copy to class on June 24, 2009.
You will turn in two items for this exercise:
At the end of this exercise you will find a list of ten "documents". These are titles from conference articles. Your task is to build an inverted index for these documents (of course, using only words from the titles).
When creating your inverted index, include for each term: the term itself, the documents in which it occurs, and the number of times it occurs in each of those documents. Here are two sample entries:
Term | Doc Numbers (document:occurrences) |
---|---|
retrieval | 1:1, 2:1, 3:1, 4:1, 5:1, 6:1 |
seeking | 8:1, 9:1 |
The table above shows that the term retrieval occurs once in each of documents 1, 2, 3, 4, 5, and 6, while the term seeking occurs once in each of documents 8 and 9. The list of postings for each term should be ordered numerically by document number.
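For readers who want to check a hand-built index, the entry format above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `build_inverted_index` and `format_postings` are hypothetical, and the tokenization (lowercase, runs of letters only) is a simplification that does not apply any of the indexing heuristics the assignment asks for. It uses four of the titles from the document list.

```python
import re
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each term to {doc_id: occurrence_count} for a dict of {doc_id: title}."""
    index = defaultdict(dict)
    for doc_id, title in docs.items():
        # Assumption: a token is a run of letters, compared case-insensitively.
        tokens = re.findall(r"[a-z]+", title.lower())
        for term, count in Counter(tokens).items():
            index[term][doc_id] = count
    return index


def format_postings(postings):
    """Render postings as 'doc:count' pairs ordered numerically by document number."""
    return ", ".join(f"{doc_id}:{count}" for doc_id, count in sorted(postings.items()))


# A subset of the assignment's titles, keyed by document ID.
docs = {
    2: "Information retrieval: data structures and algorithms",
    3: "Relevance feedback in information retrieval",
    4: "Information filtering and information retrieval: two sides of the same coin?",
    8: "Dynamic queries for visual information seeking",
}

index = build_inverted_index(docs)

# Print the index alphabetically by term, one entry per line.
for term in sorted(index):
    print(f"{term} | {format_postings(index[term])} |")
```

Because "information" appears twice in the title of document 4, its posting there carries a count of 2, which is the case the occurrence counts exist to capture.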
Here are several heuristics that you should follow when constructing your index:
For the sake of clarity, arrange your index alphabetically by term.
Document ID | Title |
---|---|
1 | Automatic text processing: the transformation, analysis, and retrieval of information by computer |
2 | Information retrieval: data structures and algorithms |
3 | Relevance feedback in information retrieval |
4 | Information filtering and information retrieval: two sides of the same coin? |
5 | Real life information retrieval: a study of user queries on the Web |
6 | A case for interaction: a study of interactive information retrieval behavior and effectiveness |
7 | Real life, real users, and real needs: a study and analysis of user queries on the web |
8 | Dynamic queries for visual information seeking |
9 | What Are They Doing with the Internet? A Study of User Information Seeking Behaviors. |
10 | A longitudinal study of World Wide Web users' information-searching behavior |