Encoding Standard for Electronic Finding Aids

April 2001-2002: EAD 2002
August 1998: Release of Version 1.0 of the EAD DTD
April 1996-December 1996: Development and Release of the Beta DTD
January 1996: Alpha Release
August 1995-December 1995: SAA, LC, and CLR
July 1995: The Bentley Fellowship Program
1993-1995: The Berkeley Finding Aid Project (BFAP)

About EAD
Encoded Archival Description (EAD) is the standard for encoding archival finding aids, supported by the Society of American Archivists and the Library of Congress.

Finding aids are inventories, registers, indexes or guides to collections held by archives and manuscript repositories, libraries, and museums. Finding aids provide detailed descriptions of collections, their intellectual organization and, at varying levels of analysis, of individual items in the collections. Access to the finding aid is essential for understanding the true content of a collection and for determining whether it is likely to satisfy a scholar's research needs.

EAD makes it possible to provide access to archival finding aids in a platform-independent electronic format. Access to finding aids through the Internet will assist scholars in determining whether collections contain material relevant to their research and will make it possible for primary sources to be used in K-12 classrooms.

The Online Archive of California (OAC) hosts a growing collection of finding aids for the archives and special collections of the UC System and for a number of other institutions. More information about the history of EAD is available from the Library of Congress EAD Page.

EAD at Berkeley
Since 1995 the UC Library has employed a wide variety of techniques to encode our legacy finding aids in SGML, reflecting the wide variety of formats these documents were in. Our retrospective conversion started with The Bancroft Library's electronic finding aids, authored originally in WordPerfect, so we began by employing WordPerfect macros of varying sophistication.

Since the beginning of the project we have used the technique of stepwise refinement to encode legacy finding aids, a practice we have continued to this day. Stepwise refinement begins the encoding process by adding "coarse" markup, essentially fitting the legacy information into a broad hierarchical structure consisting of little more than component information. Then a variety of techniques are employed to add markup of increasingly fine granularity, e.g., first adding the <unittitle> information, then encoding <unitdate>s, etc. Most of these subsequent passes were also performed using WordPerfect macros, but as the project progressed the Perl programming language was employed.
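
As an illustration of stepwise refinement, here is a minimal sketch of two passes. It is written in Python rather than the WordPerfect macros and Perl scripts described above, and it is not the project's actual code. It assumes a legacy container list in which indentation (two spaces per level) expresses the hierarchy; the function names coarse_markup and add_unittitles are hypothetical. The first pass adds only component tags, so its output is an intermediate form rather than valid EAD; the second pass wraps each component's text in <did><unittitle>.

```python
import re
from xml.sax.saxutils import escape


def coarse_markup(lines, indent_width=2):
    """Pass 1: wrap each non-blank line in a <cNN> component.

    Assumption: nesting depth is expressed by leading spaces,
    indent_width spaces per level.
    """
    out = []
    open_levels = []  # stack of component levels that are still open
    for raw in lines:
        if not raw.strip():
            continue
        depth = (len(raw) - len(raw.lstrip(" "))) // indent_width + 1
        # Close any open component at the same or a deeper level.
        while open_levels and open_levels[-1] >= depth:
            out.append("</c%02d>" % open_levels.pop())
        out.append("<c%02d>%s" % (depth, escape(raw.strip())))
        open_levels.append(depth)
    while open_levels:
        out.append("</c%02d>" % open_levels.pop())
    return "\n".join(out)


def add_unittitles(marked_up):
    """Pass 2: wrap the text after each component start tag in <did><unittitle>."""
    return re.sub(
        r"^(<c\d\d>)(.+)$",
        r"\1<did><unittitle>\2</unittitle></did>",
        marked_up,
        flags=re.MULTILINE,
    )


legacy = [
    "Series 1: Correspondence",
    "  Box 1",
    "  Box 2",
    "Series 2: Diaries",
]
print(add_unittitles(coarse_markup(legacy)))
```

Keeping each pass small means its output can be checked before the next, finer pass is applied.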

Today we employ Perl and have created a small toolkit of simple Perl programs. The kit is composed of several small scripts useful for stepwise refinement, including scripts to recognize and encode <unitdate>, <persname>, and <corpname> within <unittitle>. The toolkit also includes a preconfigured parser (nsgmls) used to validate every finding aid before it is submitted for publication on the Online Archive of California (OAC). As we submit all our finding aids to the OAC, the toolkit is now housed there.
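
A refinement script of this kind might look like the following sketch. It is in Python, not Perl, and is not taken from the toolkit. It assumes dates appear at the end of a title as a four-digit year or year range, optionally preceded by "ca."; real finding aids use many more date shapes. The names encode_unitdates and tag_trailing_date are hypothetical.

```python
import re

# A whole <unittitle> element; group 1 is its content.
UNITTITLE_RE = re.compile(r"<unittitle>(.*?)</unittitle>", re.DOTALL)

# Assumed date shape: "1906", "1906-1915", or "ca. 1890" at the end of the title.
TRAILING_DATE_RE = re.compile(r"((?:\bca\.\s*)?\b\d{4}(?:\s*-\s*\d{4})?)\s*$")


def tag_trailing_date(match):
    """Wrap a trailing date inside one <unittitle> in <unitdate> tags."""
    title = match.group(1)
    if "<unitdate" in title:
        # Already encoded by an earlier pass; leave it alone.
        return match.group(0)
    tagged = TRAILING_DATE_RE.sub(r"<unitdate>\1</unitdate>", title, count=1)
    return "<unittitle>" + tagged + "</unittitle>"


def encode_unitdates(ead_text):
    """Apply tag_trailing_date to every <unittitle> in the text."""
    return UNITTITLE_RE.sub(tag_trailing_date, ead_text)


print(encode_unitdates("<unittitle>Letters received, 1906-1915</unittitle>"))
```

Skipping titles that already contain a <unitdate> makes the pass safe to run repeatedly on the same file.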

Before long we found that we could more efficiently encode a finding aid's "front matter" (that is, all of the information not occurring within the <dsc>) through a standard web template. This proved faster than trying to create macros or specialized programs to accommodate the wide variety of layouts in the finding aids produced by the eight contributing repositories at UC. The templates can be seen in action on the web, and the CGI script we use is available to anyone as part of the toolkit.
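
The template idea can be sketched as follows. This is an illustrative Python example, not the CGI script mentioned above. The field names (eadid, title, author, repository) and the function render_front_matter are assumptions, and only a small part of the <eadheader> is shown.

```python
from string import Template
from xml.sax.saxutils import escape

# A fixed skeleton; only the $placeholders vary between finding aids.
FRONT_MATTER = Template("""<eadheader>
<eadid>$eadid</eadid>
<filedesc>
<titlestmt>
<titleproper>$title</titleproper>
<author>$author</author>
</titlestmt>
<publicationstmt>
<publisher>$repository</publisher>
</publicationstmt>
</filedesc>
</eadheader>""")


def render_front_matter(form_fields):
    """Fill the skeleton from submitted form fields (hypothetical field names).

    Values are escaped so that characters such as "&" cannot break the markup.
    A missing field raises KeyError rather than producing an incomplete header.
    """
    safe = {name: escape(value.strip()) for name, value in form_fields.items()}
    return FRONT_MATTER.substitute(safe)


print(render_front_matter({
    "eadid": "example-0001",
    "title": "Register of the Example Papers",
    "author": "Processed by the staff",
    "repository": "Example Library & Archives",
}))
```

Because the skeleton is fixed, every repository's front matter comes out with the same structure regardless of how the original finding aid was laid out.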

Curiously, we have found that commercial SGML editors such as AdeptEdit, Author/Editor, or XMetaL are not an efficient way to convert legacy information into EAD. Although each member of the Digital Publishing Group has a copy of XMetaL installed, we find it useful solely as a reference tool, particularly while bringing new encoders up to speed in EAD. It is far faster to programmatically convert text to EAD in broad strokes than to apply the copy-and-paste method required when using these editors. XMetaL may have a role in the authoring of new finding aids, but much customization, mainly in the form of targeted dialog boxes and refinement macros, needs to be done before finding aid authors can consider it a viable replacement for their trusted word processing program.

After we completed conversion of all of our word processing files for legacy information held by Berkeley and by many of the affiliates of the Online Archive of California, a process funded by a variety of grants, we turned our attention to the legacy finding aids available only on paper. These we contracted out to a conversion vendor, Apex Data Services, which keyed the data and generated EAD. This EAD was then further refined in house when the data was returned. Our experience with employing an outside vendor for the process was fairly good, far better than our earlier experience using scanning and OCR in-house. Most finding aids required very little editing and correction, but a few of the more complex ones required a great deal of time to bring up to local standards.

We are investigating a variety of options for incorporating EAD directly into the authoring process, including a complete suite of MS Word templates and macros, dubbed "EAD Stylus", and available as part of the toolkit. Another option is to more fully integrate EAD into the Generic Digital Projects Database, developed initially for UC's role in the Making of America II project. The Generic Database was designed to accommodate the workflow and data entry for Berkeley's variety of digitization projects including images, electronic text, sound files, moving pictures, etc. As it was intended to accommodate hierarchical description and produce arbitrarily generic output, it was easily adapted towards EAD.

Relational databases have taken on a larger role in recent years. We can now easily import EAD-encoded finding aids into any arbitrary relational database (for enriching the data, adding item-level information for digitized surrogates, collection management, etc.) and export them back out to EAD or serve them on the web. A tutorial and several sample programs written in Perl are also available.
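
The round trip between EAD and a relational database can be sketched as follows. This is a Python and SQLite illustration, not one of the Perl sample programs mentioned above. The single-table schema, the function names, and the sample record are all assumptions, and the input is assumed to be well-formed XML (an SGML instance would need normalizing first).

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema: one row per component, linked to its parent.
SCHEMA = """
CREATE TABLE IF NOT EXISTS component (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES component(id),
    level     TEXT,
    unittitle TEXT
)
"""


def is_component(tag):
    """True for <c> and the numbered components <c01> to <c12>."""
    return tag == "c" or (len(tag) == 3 and tag[0] == "c" and tag[1:].isdigit())


def flat_text(element):
    """All text inside an element, with whitespace collapsed."""
    if element is None:
        return ""
    return " ".join("".join(element.itertext()).split())


def import_dsc(ead_xml, conn):
    """Store every component under <dsc> as a row, preserving the hierarchy."""
    conn.execute(SCHEMA)
    dsc = ET.fromstring(ead_xml).find(".//dsc")

    def walk(parent_el, parent_id):
        for child in parent_el:
            if not is_component(child.tag):
                continue
            cur = conn.execute(
                "INSERT INTO component (parent_id, level, unittitle) VALUES (?, ?, ?)",
                (parent_id, child.get("level"), flat_text(child.find("did/unittitle"))),
            )
            walk(child, cur.lastrowid)

    if dsc is not None:
        walk(dsc, None)
    conn.commit()


def export_dsc(conn):
    """Rebuild a nested <dsc> from the rows (inline markup in titles is not restored)."""
    dsc = ET.Element("dsc")

    def build(parent_el, parent_id, depth):
        rows = conn.execute(
            "SELECT id, level, unittitle FROM component WHERE parent_id IS ? ORDER BY id",
            (parent_id,),
        ).fetchall()
        for row_id, level, title in rows:
            comp = ET.SubElement(parent_el, "c%02d" % depth)
            if level:
                comp.set("level", level)
            did = ET.SubElement(comp, "did")
            ET.SubElement(did, "unittitle").text = title
            build(comp, row_id, depth + 1)

    build(dsc, None, 1)
    return ET.tostring(dsc, encoding="unicode")


# Made-up sample data for illustration only.
SAMPLE = """<ead><archdesc level="collection"><dsc type="combined">
<c01 level="series"><did><unittitle>Correspondence, <unitdate>1906-1915</unitdate></unittitle></did>
<c02 level="file"><did><unittitle>Letters received</unittitle></did></c02>
</c01></dsc></archdesc></ead>"""

conn = sqlite3.connect(":memory:")
import_dsc(SAMPLE, conn)
print(export_dsc(conn))
```

Once the components are rows, enrichment such as adding item-level records becomes an ordinary INSERT or UPDATE before the data is exported again.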

Now that conversion of our legacy finding aids is complete, we are involved more and more in digitizing surrogates of the archival materials themselves: selected photographs, books, diaries, and letters, represented both as images or sequences of images and as searchable electronic text encoded in TEI. We are committed to using the emerging METS standard for encapsulating single and multipart digital objects in XML "wrappers." More information on these efforts is available on our Making of America II website.

Since the earliest days of the project, UC Berkeley has realized the importance of developing and adhering to consortial standards. The EAD encoding standard allows a surprisingly divergent, and often distressing, variety of encoding methodologies. In 1996 four institutions (UC, Stanford University, Duke, and the University of Virginia) met to develop a uniform encoding standard for EAD finding aids. This standard, the American Heritage Retrospective Conversion Guidelines, was adopted and later built upon and refined by the UC EAD consortial project, which later grew into the Online Archive of California.

Recently, the Online Archive of California has developed a standard for the encoding of new finding aids, the Best Practices Guidelines for the Encoding of New Finding Aids (BPG), which builds upon the guidelines laid out in the Retrospective Conversion Guidelines. Although intended for new finding aids, the BPG provides guidelines that are beneficial to all finding aids.

Although we foresee difficulties applying the full BPG to our "legacy" EAD documents, we are upgrading them programmatically to a subset of the BPG. This involves, most importantly, stripping out the old-style <drow>/<dentry> tabular markup employed in the early days of EAD at Berkeley, and combining the separate Series Description and Container List into a single <dsc> of type "combined".

Finally, UC has no plans at the present time to begin encoding finding aids in XML. First, all of our current tools handle both XML and SGML, so there is no reason for us to switch. Second, the XML standard lacks the robust entity management mechanism present in the SGML standard. We have found this entity management to be crucial, especially when interchanging finding aids with other institutions and consortia (hard-coding a specific path or URL in every entity declaration is onerous). If new authoring or publishing tools that require XML become available and we find them valuable, or if stronger entity management is included in a future version of the XML standard, we would like to switch over.


The Berkeley Finding Aid Project (BFAP), 1993-1995
Beginning in the fall of 1993, researchers in the Library at Berkeley began developing a prototype standard for encoding archive and library finding aids in the form of a Standard Generalized Markup Language Document Type Definition (SGML DTD). A Department of Education Higher Education Act Title IIA Research and Development Grant funded this initial project. The objective of the project was to investigate the desirability and feasibility of such a standard.

In April 1995, after development of the FindAid DTD was well underway and the researchers had built a substantial database of encoded finding aids, the Commission on Preservation and Access funded a conference that gathered leading archivists and computer specialists to evaluate whether the researchers had achieved their objectives. It was the consensus of the participants in the Berkeley Finding Aid Conference that they had, and they recommended that the research and development continue.

The Bentley Fellowship Program, July 1995
Hoping to strengthen the case for profession-wide adoption of a FindAid-like, SGML-based encoding standard, Daniel Pitti sought the assistance of a team of eight experts in archival descriptive standards, augmented by an expert in SGML, to critique and refine the FindAid approach.

The team successfully applied to the Bentley Library Research Fellowship Program for the Study of Modern Archives in Ann Arbor, Michigan, where the team met in July 1995. The team agreed that at the most basic level, a finding aid document consists of two segments: a segment that provides information about the finding aid itself (its title, compiler, compilation date) and a segment that provides information about a body of archival material (a collection, a record group, or a series).

Following the example of the Text Encoding Initiative (TEI), the group designated the segment about the finding aid itself as the "header." Within the segment providing information about the described material (the actual finding aid), two types of information could be presented: 1) hierarchically organized information that describes a unit of records or papers along with its component parts or divisions and 2) adjunct information that may not directly describe records or papers but that facilitates their use by researchers (e.g., a bibliography).

The hierarchy of descriptive information, reflecting archival principles of arrangement, generally begins with a summary of the whole and proceeds to delineation of the parts as a set of contextual views. Descriptions of the parts inherit information from descriptions of the whole. The team successfully produced design principles for the encoding standard and laid the groundwork for developing a data model and a new DTD. The team also renamed the developing standard Encoded Archival Description, or EAD.

SAA, LC, and CLR, August 1995-December 1995
In August of 1995, based on the Bentley team's work, Daniel Pitti revised the data model first developed in BFAP and distributed it for comment. At the Society of American Archivists (SAA) meeting in Washington, D.C. in early September 1995, the Committee on Archival Information Exchange appointed a working group to oversee the ongoing development of the EAD DTD and the writing of guidelines.

In October 1995, Daniel Pitti released a "straw man" EAD DTD for testing. Two weeks later, the Library of Congress National Digital Library (LC/NDL) sponsored and hosted three days of meetings to refine the data model and "straw man" DTD. Participants included most of the Bentley team, representatives from several LC divisions, and two SGML experts.

It was announced at this meeting that the Library of Congress Network Development and MARC Standards Office would be the maintenance agency for EAD once it was endorsed as a standard by the archival community through the Society of American Archivists. The Council on Library Resources funded the writing of guidelines for the application of EAD in libraries and archives.

Alpha Release, January 1996
The prototype EAD DTD was declared ready for release to early implementors as an "alpha" version. This version of the EAD DTD was not advertised as perfect but was considered good enough to yield valuable results when applied to a variety of finding aids in diverse institutions.

The alpha version DTD and a revised alpha tag library were made available at two sites, one at the University of California, Berkeley, and the other at the Library of Congress. The ability to obtain copies of the DTD and related documentation electronically helped to speed testing and sharing of test results.

Development and Release of the Beta DTD, April 1996-December 1996
EAD developers convened in Berkeley, California, on April 27-29, 1996, for a three-day meeting sponsored by the Council on Library Resources and hosted by the University of California, Berkeley. The primary purpose for the April 1996 meeting was to provide an opportunity for the original Bentley team to meet with Anne Gilliland-Swetland and Tom La Porte to review their draft application guidelines and resolve problems with the DTD that had surfaced thus far during the alpha testing.

Revisions to the alpha DTD, tag library, and application guidelines began immediately after the April 1996 meeting in California, with the revised goal of making a beta test version available later that summer. The "final" beta version DTD became available in mid-September 1996. Several minor typographical modifications occurred in late November 1996, resulting in a date change to the September EAD files.

A beta version tag library appeared in October 1996, and draft beta version application guidelines followed two months later. The EAD Working Group decided that no further changes would be made to the beta DTD for at least a year. The files remained stable to permit implementation and full testing by EAD Working Group members and participating institutions.

Release of Version 1.0 of the EAD DTD, August 1998
After months of reviewing comments and after notifying the archival community, the EAD Working Group set about the task of modifying the beta version DTD and completely revising the existing beta tag library to reflect more accurately the proposed Version 1.0 structure. Since the Version 1.0 DTD represented significant changes from the previous beta version, implementors were encouraged to update their encoded documents as quickly as their resources permitted.

EAD 2002, April 2001-2002
A series of 67 suggestions for changes and additions were received from users via a web-based suggestions form made public on the EAD Web site. The suggestions were consolidated into a list that was circulated internally and discussed during a special meeting of the EAD Working Group, held in Washington, DC, April 27-29, 2001. The meeting included representatives from Australia, Canada, France, the United Kingdom, and the United States—bearing witness to the international importance of this emerging standard.

The discussions resulted in the deprecation of only eight EAD elements that had been part of the Version 1.0 (1998) EAD DTD. Much of the need to deprecate elements at all was due to a desire to keep the EAD DTD compatible with provisions of the General International Standard Archival Description (ISAD(G)). The 2002 version of the EAD DTD became available at a time when more and more users were moving from SGML to XML markup. The entire suite of DTD and entity reference files was reengineered to meet the needs of XML and related technologies currently in use.

Copyright © 2011 The Regents of the University of California. All rights reserved.
Document maintained by the Digital Publishing Group.
Last updated 05/12/2011.