ALCTS / CCS / Cataloging and Classification Research Discussion Group
Saturday, June 15, 2002
11:30 a.m. – 12:30 p.m.
Atlanta Hilton, Clayton Room

FRBR: Application of the Entity-Relationship Model to Humphry Clinker
Presented by Dr. Edward O’Neill, Research Scientist, Office of Research, OCLC

(Notes taken by Judith Hopkins with the aid of a copy of Dr. O’Neill's draft paper which he kindly provided me; these notes are posted with his permission)

The Functional Requirements for Bibliographic Records (FRBR) (ISBN 3-598-11328-X) is the result of 6 years of work by an IFLA study group. The process started with an international conference in Stockholm in 1990 that looked at how the world of bibliographic data had changed in the last half century: the growth of shared cataloging databases, the role of publishers and distributors in providing bibliographic data, the role of electronic publications, etc. This led to the recognition that these kinds of changes were stretching traditional cataloging practices. While cataloging rules and practices had changed over the years to accommodate new types of materials, this change had not been done in a principled way: the rules were adjusted to deal with each new situation rather than derived from general principles. The IFLA study group was to look at what we do when we catalog, what kinds of information we record, how necessary that information is, etc., and to build a conceptual model of how bibliographic data works that could be used as the basis for a more principled look at cataloging use. The study group identified four tasks that users of all types (including library staff) perform: find, identify, select, and obtain.

The FRBR model defines three groups of entities and describes the relationships among them:
  1. The products of intellectual or artistic endeavor (work, expression, manifestation, item)
  2. Those responsible for the intellectual or artistic content (person or corporate body)
  3. Those that serve as the subjects of intellectual or artistic endeavor (concept, object, event, and place; persons and works can also be the subjects of works)

Past catalogs have tended to be flat sequences of individual items that did not show hierarchical relationships. The potential of the FRBR model is to assist in developing these relationships so that users can deal with smaller result sets instead of hundreds or thousands of bibliographic records.

The study on which Dr. O’Neill reported focused on the Group 1 entities: Work, Expression, Manifestation, and Item. These entities represent two different aspects of user interest: the intellectual endeavor and the physical manifestation. Work and Expression are abstract. A Work is an abstract idea, e.g., the intellectual concept of Hamlet. An Expression is still somewhat abstract: the artistic or intellectual realization of a work. A Manifestation is the physical embodiment of an expression of a work. An Item is a single example of a manifestation.
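The nesting of the four Group 1 entities can be sketched as a small data structure. This is only an illustration of the hierarchy described above, not part of Dr. O’Neill's presentation: the class and attribute names are assumptions, and the publisher and copy labels are invented placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """A single example (copy) of a manifestation."""
    copy_label: str  # hypothetical attribute, for illustration only


@dataclass
class Manifestation:
    """The physical embodiment of an expression of a work."""
    label: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    """The artistic or intellectual realization of a work."""
    label: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The abstract idea, e.g. the intellectual concept of Hamlet."""
    title: str
    expressions: List[Expression] = field(default_factory=list)

    def manifestation_count(self) -> int:
        # Manifestations are reached only through their expressions.
        return sum(len(e.manifestations) for e in self.expressions)


# Invented placeholder data; none of it comes from the study.
hamlet = Work(
    title="Hamlet",
    expressions=[
        Expression(
            label="original text",
            manifestations=[
                Manifestation("Publisher A printing", [Item("copy 1"), Item("copy 2")]),
                Manifestation("Publisher B printing", [Item("copy 1")]),
            ],
        ),
        Expression(
            label="a translation",
            manifestations=[Manifestation("Publisher C printing")],
        ),
    ],
)
```

Grouping a result set by Work and then by Expression, rather than listing every record, is what would let a catalog present a few nested entries instead of a long flat list.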

In his PowerPoint presentation Dr. O’Neill provided examples of each of these types of entities.

The OCLC study focused on a single work, The expedition of Humphry Clinker by Tobias Smollett (a novel first published in 1771 in the form of overlapping letters), to study the benefits and drawbacks of creating a FRBR entity-relationship model. The goal of the study was to go beyond organizing bibliographic records to organizing the bibliographic objects represented by the bibliographic records.

Humphry Clinker was selected for several reasons:

It was assumed that if the FRBR entity-relationship model could be successfully applied to Humphry Clinker, it could be successfully applied to a broad class of similar works.

In December 2001 OCLC’s WorldCat was searched for all possible bibliographic records for Humphry Clinker. The initial search result had a very high recall but very low precision so, using the FRBR definition of a work, the results were carefully reviewed to remove records that were not part of the Humphry Clinker work. This resulted in 179 records being identified, including 14 records for microforms and 8 records for translations.

A number of the manifestations were examined and a digital camera was used to capture key pages: title page, verso, first page of text, last page of text, a particular pre-selected letter, the first page of any supplemental material, illustrations, and other pages that could help differentiate between similar manifestations. In all, 38 books were examined and almost 600 digital photographs were taken.

After a review of the contents of the bibliographic records and the examination of the books, it seemed clear that, except for the translations, the original text of Humphry Clinker had not been significantly changed. Changes to the original text involved correcting minor errors, replacing the long s (“ſ”), repositioning the date on letters, and moving chapter headings to the top of the page. If one applied a strict definition of expression, any of these changes could be considered sufficient to create a new expression. However, the long s could be considered simply a typeface and, since the other errors were created during the typesetting phase of the manufacturing process, it could also be argued that these changes would produce a new manifestation instead of a new expression.

Other changes, however, were intentional and should be considered different expressions. Most of the intentional changes occurred by supplementing the original text with additional material such as acknowledgments, bibliography, biographical note, chapter titles, chronological tables, dedication, glossary, illustrations, introduction and/or foreword, list of illustrations, maps, notes, publisher’s note, table of contents, textual notes, reproduction of original title page, and reviews.

The significance of this supplemental material varies. Some is rather minor in importance, e.g., acknowledgments and dedications, and is not likely to cause a reader to seek out this particular manifestation. Features of these types are rarely reflected in bibliographic records, yet under the strict FRBR definition of Expression the addition of, or a change to, any of this supplemental material is sufficient to create a new Expression.

Introductions, notes, and other similar supplements were the most significant and were generally attributed to an editor. At least twenty-three different editors have contributed to Humphry Clinker, of whom fourteen were used as added entries in any of the bibliographic records.

Many of the illustrators were respected artists. While sixty-seven bibliographic records were identified as illustrated in the physical description field, fewer than one-third of the records for illustrated editions identified the illustrator.

While bibliographies were usually cited in a note, the notes rarely contained sufficient information to determine if the bibliographies in different manifestations were the same.

These inconsistencies in the bibliographic records (based on use of different cataloging rules and AACR2's emphasis on relative rather than absolute significance) are a serious impediment to identifying expressions. One example of the latter results from the “Rule of Three” where, depending on how many editors are named on the title-page, an added entry can be made for an editor's contribution in one manifestation but not in another even though that editor's contribution may be identical in both.
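The effect of the “Rule of Three” can be sketched in a few lines. This is a deliberately simplified reading of the rule, not a full statement of AACR2: it assumes that up to three named editors each receive an added entry, while four or more leave only the first-named editor traced. The function name and the editor names are hypothetical.

```python
from typing import List


def editor_added_entries(editors_named: List[str]) -> List[str]:
    """Return the editors who receive an added entry under a simplified Rule of Three."""
    if len(editors_named) <= 3:
        # Three or fewer names on the title page: trace every editor.
        return list(editors_named)
    # Four or more names: trace only the first-named editor.
    return editors_named[:1]


# The same hypothetical editor contributes to two manifestations.
one_editor = editor_added_entries(["Editor B"])
four_editors = editor_added_entries(["Editor A", "Editor B", "Editor C", "Editor D"])

# Editor B is traced in the first record but not in the second.
editor_b_traced_in_both = "Editor B" in one_editor and "Editor B" in four_editors
```

Under these assumptions the added entry depends on how many names share the title page, not on what the editor contributed, which is why it is an unreliable signal for identifying expressions.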

After they had completed the broad overview of the work, the next step for the OCLC researchers was to identify an Expression and a manifestation for each of the sample bibliographic records. The original unaugmented Expression was identified as the “original”; the other expressions were named for their editor(s) and/or illustrator(s). Where there were multiple expressions with the same editors, edition numbers were also used.

Manifestations were named for their publisher and, if necessary, their date of publication.

Of the 179 records, 157 were for English print editions (the total less the 22 records for translations and microforms). All relevant details for each of these were entered in a spreadsheet. The forty-eight different expressions identified fell into four distinct groups: the original, the edited, the illustrated, and the translated. The original expression had 43 manifestations represented by 49 bibliographic records. These manifestations were the result of the expression either being published by a new publisher or being republished with the type reset.

There were eight translations into seven languages, each with a single manifestation. The remaining thirty-nine expressions were the result of either editors or illustrators or both.
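The counts reported in these notes can be cross-checked with simple arithmetic. The sketch below uses only figures stated in the notes; the variable names are assumptions introduced for illustration.

```python
# Record counts as reported in the notes.
total_records = 179
microform_records = 14
translation_records = 8

# English print editions are what remain after removing microforms and translations.
english_print_records = total_records - microform_records - translation_records

# Expression counts as reported in the notes.
original_expressions = 1
translated_expressions = 8
edited_or_illustrated_expressions = 39

total_expressions = (
    original_expressions
    + translated_expressions
    + edited_or_illustrated_expressions
)

print(english_print_records)  # 157
print(total_expressions)      # 48
```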

The FRBRization results were quite different from the initial findings derived solely from the information on the bibliographic records. In the latter, the most reliable indication that two records represented different expressions was that their added entries were different, with an added entry indicating that an edition had been edited, translated, or illustrated. Of the 157 English language records analyzed, 44 had one or more personal name added entries. However, an additional 32 edited and/or illustrated records were found by examining the statement of responsibility and two more were identified through the notes. An additional 20 records were identified as edited or illustrated by examining the books themselves. Overall, 108 of the English language records represented edited and/or illustrated editions but only 44 could be easily identified from the bibliographic records. Any simple algorithmic approach would incorrectly treat these hard-to-identify expressions as the original expression. These unidentified expressions would effectively be lost, undifferentiated from the original expression.
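Why a simple algorithmic approach loses expressions can be shown with a minimal sketch. It assumes a hypothetical record layout (a dictionary with `id`, `added_entries`, and `statement_of_responsibility` keys) and groups records by their personal name added entries alone; it is not the procedure used in the OCLC study.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def expression_key(record: dict) -> Tuple[str, ...]:
    """Derive an expression label from added entries only."""
    names = sorted(record.get("added_entries", []))
    # No added entry: the record is assumed to be the original expression.
    return tuple(names) if names else ("original",)


def group_by_expression(records: List[dict]) -> Dict[Tuple[str, ...], List[str]]:
    """Group record ids under the expression key derived from each record."""
    groups = defaultdict(list)
    for record in records:
        groups[expression_key(record)].append(record["id"])
    return dict(groups)


# Invented sample records; the names and ids are placeholders.
records = [
    {"id": "rec1", "added_entries": []},
    {"id": "rec2", "added_entries": ["Editor A"]},
    # Edited edition whose editor appears only in the statement of responsibility.
    {"id": "rec3", "added_entries": [],
     "statement_of_responsibility": "edited by Editor A"},
]

groups = group_by_expression(records)
```

Here `rec3` lands in the "original" group even though it describes an edited edition, which is the kind of loss described above for records lacking added entries.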

Based on the examination of many of the books and the comparison of a book to its bibliographic description it became clear that the bibliographic records simply do not contain sufficient information to reliably identify expressions. Distinctions based solely on the content of the bibliographic records will fail to identify a significant number of expressions and will create duplicate expressions based on different cataloging practices rather than on any real differences between the books.

In applying the FRBR entity-relationship model to bibliographic records the study identified several ambiguities that confounded the FRBRization process. The FRBR report provides an unambiguous definition for expression but then proceeds to allow for flexible interpretations. The IFLA report does not adequately consider the impact of such flexibility in a shared cataloging environment where consistency can be more important than flexibility.

While the FRBRization process identified 114 manifestations, they were expressed in 165 bibliographic records, i.e., there were 51 duplicate records. A large number of duplicate records potentially could limit the functionality of the FRBR entity-relationship model.
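The duplicate count follows directly from the two figures given. The notes do not say how the 165 records relate to the 179 found in the original search; one reading, offered here only as an assumption, is that the 14 microform records were set aside.

```python
manifestations = 114
bibliographic_records = 165

# Every record beyond the first for a manifestation counts as a duplicate.
duplicate_records = bibliographic_records - manifestations

# Assumption (not stated in the notes): 165 is the 179 records less 14 microforms.
total_records = 179
microform_records = 14
records_excluding_microforms = total_records - microform_records

print(duplicate_records)  # 51
```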

Identifying expressions was problematic and raised the question of whether they are valid entities. While some expressions, e.g., translations, are distinct and identifiable, most of the expressions observed for Humphry Clinker were not. Determining if two manifestations embody the same expression proved to be very difficult. Bibliographic records rarely contained sufficient information to reliably distinguish expressions.

Is the difficulty of identifying expressions a result of an overly strict definition? Conceptually, considering any modification to the content no matter how minor to result in a new expression makes sense. The work is a distinct intellectual creation, the expression is the set of all items with identical content, and the manifestation is a distinct physical unit. In practice, however, it is very difficult to determine if two manifestations have identical content. Even if it could easily be determined when the content was identical, new expressions would be created from changes so minor that most readers would not notice them.

Changing the definition of expression to require significant changes would reduce the problem of trivial expressions but quite likely would raise other problems. Some contributors would be named in statements of responsibility, others may have ‘signed’ contributions, and others could be completely anonymous. Add translations to the mix and the difficulty of finding a way to equate the variety of changes becomes very complex. However, unless these changes are equated in a meaningful way, moving beyond the “no matter how minor” standard will be difficult. Building an entity-relationship model that includes expressions may be neither practical nor conceptually sound.

If expressions were dropped from the FRBR model the model would be greatly simplified but with a significant loss of functionality. What are the alternatives to expressions? At least for Humphry Clinker the increased use of added entries would appear to be an effective way to identify expression-like changes. Increased use of added entries with the role of the contributor explicitly identified would be effective in differentiating among manifestations with different supplemental material. The inclusion of an added entry for all identifiable contributors would require minimal effort and, at least for Humphry Clinker, would meet the need served by expressions. For Humphry Clinker, replacing the expression in the FRBR model with additional manifestation attributes simplifies the model without any loss of functionality.
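The alternative described here, manifestations carrying added entries with explicit roles in place of a separate expression entity, might look like the following sketch. The class names, the role vocabulary, and the sample data are all assumptions made for illustration; treating a matching set of contributors as a sign of matching supplemental material is an approximation.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Contributor:
    name: str
    role: str  # e.g. "editor", "illustrator", "translator"


@dataclass(frozen=True)
class ManifestationRecord:
    publisher: str
    contributors: FrozenSet[Contributor] = frozenset()


def same_contributor_profile(a: ManifestationRecord, b: ManifestationRecord) -> bool:
    """True when two manifestations list the same contributors in the same roles."""
    return a.contributors == b.contributors


# Invented placeholder data.
edited = frozenset({Contributor("Editor A", "editor")})
first = ManifestationRecord("Publisher A", edited)
second = ManifestationRecord("Publisher B", edited)
plain = ManifestationRecord("Publisher A")

same_edition_text = same_contributor_profile(first, second)
differs_from_plain = not same_contributor_profile(first, plain)
```

The comparison does the grouping work that an expression would otherwise do, but it is only as reliable as the completeness of the added entries in the records.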

The FRBR model provides a powerful means to improve the organization of bibliographic items, particularly with large works such as Humphry Clinker. Works are a valuable concept and can be reliably identified from existing bibliographic records. Identifying Expressions, however, is far more difficult. In the example of Humphry Clinker the set of expressions created from the existing bibliographic records is very different from the set based on physical examination of the books themselves. Existing bibliographic records do not contain enough information to consistently associate the records with expressions. While conclusions based on a single work are risky, it is unlikely that the problems encountered with Humphry Clinker are unique. Clearly many of the difficulties are the result of the size of the work; smaller works are likely to present far fewer problems. The irony is that the FRBR model provides minimal benefits to the small works that can be reliably FRBRized but fails on the large and complex works where it is most needed.