Leveraging "The Wisdom of the Crowds" for Efficient Tagging and Retrieval of Documents from the Historic Newspaper Archive of the New York Public Library

Funding for this work is provided by the National Endowment for the Humanities, NEH HD-51153-10. The project has been designated a "WE THE PEOPLE" project.


Project Summary

Computers may have defeated humans at chess and arithmetic, but there are many areas, such as visual cognition and language processing, where the human mind still excels (Communications of the ACM, Vol. 52, No. 3, March 2009). If one mind is good, several may be better: it has been argued that groups of people can outperform individuals, and even experts, on certain tasks. This project aims to leverage the wisdom of the crowds (von Ahn, 2008) to collaboratively tag historical newspaper articles in the holdings of the New York Public Library (NYPL). Patrons and scholars will be encouraged to create custom tags for the articles they read and use often; these tags will be integrated into a metadata library and evaluated for their contribution to retrieval performance. The text of the newspaper articles, together with the user-generated tags, will be subjected to statistical analysis and machine learning for automatic categorization. Creating and analyzing this corpus is likely to enable advanced search over these holdings, making them more useful to the general public.
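The summary does not say which retrieval model will be used. As a minimal sketch only, assuming a plain TF-IDF vector-space model (a swapped-in technique, not necessarily the project's own), the Python below shows one way user tags could be folded into an article's index terms so that an article whose OCR text garbles a word can still be found through a patron's tag. The names `search`, `term_counts`, `tag_weight`, and the sample articles are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical sample data: article id -> (OCR text, user-generated tags).
# The OCR of "a1" has misread "fire" as "flre"; a patron tagged it "fire".
ARTICLES = {
    "a1": ("Great flre destroys warehouse on Broadway", ["fire", "Manhattan"]),
    "a2": ("Mayor opens new bridge over the East River", ["bridge", "infrastructure"]),
    "a3": ("Fire brigade drills in Brooklyn", []),
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_counts(text, tags, tag_weight):
    """Count OCR terms, then add each tag token with an extra (hypothetical) weight."""
    counts = Counter(tokenize(text))
    for tag in tags:
        for tok in tokenize(tag):
            counts[tok] += tag_weight
    return +counts  # unary plus drops terms whose count is zero or negative


def idf_table(docs):
    """Smoothed inverse document frequency for every term in the collection."""
    n = len(docs)
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def tfidf(counts, idf):
    """Weight raw counts by IDF; terms unseen in the collection get weight 0."""
    return {t: c * idf.get(t, 0.0) for t, c in counts.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, articles, tag_weight=2):
    """Rank articles by cosine similarity between the query and tag-augmented text."""
    docs = {doc_id: term_counts(text, tags, tag_weight)
            for doc_id, (text, tags) in articles.items()}
    idf = idf_table(docs)
    query_vec = tfidf(Counter(tokenize(query)), idf)
    scores = [(doc_id, cosine(query_vec, tfidf(c, idf))) for doc_id, c in docs.items()]
    # Highest score first; ties broken by article id for a stable order.
    return sorted(scores, key=lambda s: (-s[1], s[0]))


if __name__ == "__main__":
    for doc_id, score in search("fire", ARTICLES):
        print(doc_id, round(score, 3))
```

With the default weight, the query "fire" ranks `a1` first even though its OCR text never contains the word; with `tag_weight=0`, the same search reduces to text-only retrieval and `a1` scores zero. Comparing the two settings is one simple way to measure a tag set's contribution to retrieval performance, though a real evaluation would use relevance judgments over many queries.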

BODHI Project Details

System Architecture Diagram
Screenshots of the OCR corrector