Text Encoding Initiative

In principle, anyone can develop their own XML vocabulary for any type of text. In practice, it is preferable to work with standard specifications wherever possible, because shared standards make documents easier to exchange and encourage the development of processing software. The Text Encoding Initiative (TEI) is a consortium that develops and maintains a standard for the digital representation of texts used as research material in the humanities. Within that standard, XML specifications have been developed for many kinds of text (such as critical editions, prose, poetry, text corpora, lexicons and older manuscripts). TEI is now regarded as the de facto standard for annotating electronic texts in the humanities and is also widely used by libraries, museums and publishers. An overview of humanities projects that use TEI's XML definitions can be found on the TEI website.

These XML specifications have been codified in the TEI DTD (document type definition), which also comes in a simplified version, TEI Lite. The TEI DTDs can easily be extended for specific purposes. This is useful in a project that works with standard text structures (such as poetry or text corpora) but also needs to annotate certain project-specific elements.

An important part of any document annotated according to the TEI DTD is the TEI header. It can include the following metadata:

  • file description: fullest possible bibliographical description of the digital document (such as title, edition, publisher).
  • encoding description: a description of the relationship between the electronic text and its source(s). This indicates, for example, whether (and how) the text was normalized during transcription, how uncertainties in the source were resolved, and which annotations the document contains.
  • text profile: descriptive and contextual information about the text, such as the subject of the text, the language/languages used in the text, the situation in which the text was produced (for example, during an interview) or a description of the participants in transcribed conversations.
  • revision history: a record of the history of the document, listing, for example, the changes made in revised versions.

Example of a TEI header
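
The fragment below is a minimal sketch of a header with the four parts listed above, not a header taken from an actual project. The element names (fileDesc, encodingDesc, profileDesc, revisionDesc) are the ones the TEI header uses for these four parts; the title, dates, keywords and descriptions are invented for illustration, and the attribute usage assumes the conventions of the current TEI P5 guidelines, which may differ in detail from older DTD-based versions.

```xml
<teiHeader>
  <!-- file description: bibliographic description of the digital document -->
  <fileDesc>
    <titleStmt>
      <title>Sample Poems: a digital edition (hypothetical title)</title>
    </titleStmt>
    <publicationStmt>
      <p>Published by a hypothetical example project.</p>
    </publicationStmt>
    <sourceDesc>
      <p>Transcribed from a hypothetical printed edition.</p>
    </sourceDesc>
  </fileDesc>

  <!-- encoding description: how the electronic text relates to its source -->
  <encodingDesc>
    <editorialDecl>
      <normalization>
        <p>Spelling has been kept as in the source; obvious printing
           errors have been corrected.</p>
      </normalization>
    </editorialDecl>
  </encodingDesc>

  <!-- text profile: language, subject and context of the text -->
  <profileDesc>
    <langUsage>
      <language ident="en">English</language>
    </langUsage>
    <textClass>
      <keywords>
        <term>poetry</term>
      </keywords>
    </textClass>
  </profileDesc>

  <!-- revision history: changes made to the digital document -->
  <revisionDesc>
    <change when="2024-01-15">Corrected transcription errors (invented entry).</change>
  </revisionDesc>
</teiHeader>
```

Of these four parts, only the file description is mandatory in a TEI header; the other three are optional and can be added as a project requires them.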

