What is TEI?
"TEI" is short for "Text Encoding Initiative." The TEI was founded in 1987 to develop guidelines for encoding machine-readable texts of interest in the humanities and social sciences. The work is supported by the Association for Computers and the Humanities, the Association for Literary and Linguistic Computing, and the Association for Computational Linguistics, and has received generous grant funding from the Mellon Foundation, the EEC, the National Endowment for the Humanities, and other institutions. The Guidelines, called "P3," were delivered in 1994 and have become the de facto standard for encoding literary and linguistic texts, corpora, and the like. Of the several hundred scholars who worked on various aspects of P3, many have gone on to contribute greatly to other standards, text corpora, and the theory and practice of text encoding. Work on P4, a complete revision of the original, is ongoing, with the aim of publishing a first new draft by the end of 2001.

TEI Lite is a DTD that includes only a small subset of the whole TEI tagset, selected to include the most commonly needed tags. It is packaged neatly so users do not have to work out which TEI modules to include, or configure the TEI to include them. It is currently the most commonly used subset of TEI.

TEI at UC Berkeley
TEI has been used extensively by the UC Berkeley Library since 1996, when the first oral histories were encoded and made available online. Six years and over a thousand documents later, the importance and use of TEI have increased enormously. The Library has generally avoided extending and customizing TEI, primarily to facilitate electronic interchange between collaborating institutions and to leverage and share existing toolsets, stylesheets, and search utilities. To encourage encoding uniformity, the Library has created a set of local Best Practices guidelines that is being adopted by the various departments involved in TEI encoding projects. These guidelines evolve to meet the variety of requirements introduced with each new encoding project. In the near future UC Berkeley hopes to adopt a broader set of guidelines being developed by the California Digital Library:

  • Best Practices Guidelines for Encoding Oral Histories in TEI
  • Best Practices Guidelines for using TEI Lite

TEI at the Regional Oral History Office
Since 1996 The Bancroft Library's Regional Oral History Office has been making transcriptions of its oral histories available online. Because of the importance of recording information about the interviewers and interviewees, these documents have been encoded using the full TEI. Full TEI contains elements such as <particDesc>, which is used to encode speakers, voices, and other identifiable participants in a verbal interaction; TEI Lite does not provide this element. TEI-encoded oral histories include digital images taken from the pages of the original source documents, and efforts are now underway to include digitized portions of the recorded conversations themselves as well as brief videos of the interviews.
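To make the role of <particDesc> concrete, the following is a minimal sketch rather than an excerpt from an actual Regional Oral History Office file. It assumes a small XML-flavored TEI document in which each participant is a <person> inside <particDesc> and each spoken turn is a <u> element whose who attribute points at a participant id. The participant names, ids, transcript text, and the helper functions participants and attributed_utterances are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI fragment: every name, id, and utterance is invented.
TEI_SAMPLE = """\
<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Hypothetical oral history interview</title></titleStmt>
      <publicationStmt><p>Illustration only.</p></publicationStmt>
      <sourceDesc><p>Invented transcript.</p></sourceDesc>
    </fileDesc>
    <profileDesc>
      <particDesc>
        <person id="INT1" role="interviewer"><persName>Jane Example</persName></person>
        <person id="SUB1" role="interviewee"><persName>John Sample</persName></person>
      </particDesc>
    </profileDesc>
  </teiHeader>
  <text>
    <body>
      <u who="INT1">Where did you grow up?</u>
      <u who="SUB1">In the Central Valley.</u>
    </body>
  </text>
</TEI.2>
"""


def participants(root):
    """Map each participant id to a (role, name) pair read from <particDesc>."""
    result = {}
    partic_desc = root.find(".//particDesc")
    for person in partic_desc.iter("person"):
        name = person.findtext("persName", default="").strip()
        result[person.get("id")] = (person.get("role"), name)
    return result


def attributed_utterances(root):
    """Resolve each <u who="..."> against the participant table."""
    people = participants(root)
    turns = []
    for u in root.iter("u"):
        role, name = people[u.get("who")]
        turns.append((name, role, (u.text or "").strip()))
    return turns


root = ET.fromstring(TEI_SAMPLE)
for name, role, words in attributed_utterances(root):
    print(f"{name} ({role}): {words}")
```

Because the participant details live once in the header, a search or display tool can attribute every turn in the transcript without repeating names in the body, which is the kind of information TEI Lite alone cannot record in this form.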

Other Projects using TEI
A number of other grant-funded collaborative projects make extensive use of TEI at UC Berkeley. Among these are:

These projects all make archival and manuscript holdings of numerous California archives and museums available online. These archives contain personal diaries, letters, photographs, and drawings. The JARDA project also makes available War Relocation Authority materials: camp newsletters, final reports, photographs, and other documents relating to the day-to-day administration of the camps. With the exception of a number of oral histories, these documents are encoded in TEI Lite. Source documents are photocopied and sent to a vendor for keying. They are returned encoded in TEI Lite and are then further refined and proofed by staff. The level of encoding required by such documents is relatively simple. While special care is taken to encode the metadata crucial for searching and for gathering documents into related subject groups, the greater portion of these documents is minimally encoded.

TEI and Making of America II and METS
The Making of America II testbed project continues and extends research and demonstration projects that have begun to develop best practices for the encoding of intellectual, structural, and administrative data about primary resources housed in research libraries. One of the standards that has arisen as a result of this project is the Metadata Encoding and Transmission Standard (METS), a Digital Library Federation initiative to create a standard method for representing digital objects such as images, text, and audio and video recordings using XML. One application of this technology links digitized page images with their corresponding transcriptions encoded in full TEI or TEI Lite.
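The linking of page images to transcriptions can be sketched as follows. This is an illustrative reading of how a METS structural map might pair the two, not a copy of a Library METS file: the file names, ids, and anchor ids (page1, page2) are invented, the helper page_links is hypothetical, and joining the TEI file path and anchor id with "#" is only a display convenience assumed here.

```python
import xml.etree.ElementTree as ET

NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Hypothetical METS fragment: file paths, ids, and anchors are invented.
METS_SAMPLE = """\
<mets xmlns="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <fileSec>
    <fileGrp USE="page images">
      <file ID="IMG001" MIMETYPE="image/jpeg">
        <FLocat LOCTYPE="URL" xlink:href="images/page001.jpg"/>
      </file>
      <file ID="IMG002" MIMETYPE="image/jpeg">
        <FLocat LOCTYPE="URL" xlink:href="images/page002.jpg"/>
      </file>
    </fileGrp>
    <fileGrp USE="transcription">
      <file ID="TEI001" MIMETYPE="text/xml">
        <FLocat LOCTYPE="URL" xlink:href="text/diary.xml"/>
      </file>
    </fileGrp>
  </fileSec>
  <structMap TYPE="physical">
    <div TYPE="diary">
      <div TYPE="page" ORDER="1">
        <fptr FILEID="IMG001"/>
        <fptr><area FILEID="TEI001" BETYPE="IDREF" BEGIN="page1"/></fptr>
      </div>
      <div TYPE="page" ORDER="2">
        <fptr FILEID="IMG002"/>
        <fptr><area FILEID="TEI001" BETYPE="IDREF" BEGIN="page2"/></fptr>
      </div>
    </div>
  </structMap>
</mets>
"""


def page_links(root):
    """Return (order, image path, transcription target) for each page div."""
    # Build a lookup from file ID to the location recorded in <FLocat>.
    hrefs = {}
    for file_el in root.iterfind(".//mets:file", NS):
        locat = file_el.find("mets:FLocat", NS)
        hrefs[file_el.get("ID")] = locat.get("{%s}href" % NS["xlink"])

    pages = []
    for div in root.iterfind(".//mets:div[@TYPE='page']", NS):
        image = div.find("mets:fptr[@FILEID]", NS)    # pointer to the page image
        area = div.find("mets:fptr/mets:area", NS)    # pointer into the TEI file
        target = hrefs[area.get("FILEID")] + "#" + area.get("BEGIN")
        pages.append((int(div.get("ORDER")), hrefs[image.get("FILEID")], target))
    return sorted(pages)


mets_root = ET.fromstring(METS_SAMPLE)
for order, image, transcription in page_links(mets_root):
    print(order, image, transcription)
```

In this arrangement the TEI file stays untouched; the METS document alone records which image corresponds to which part of the transcription, so the same transcription can be paired with new image derivatives without re-encoding.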

