Practical examples of XML

XML (and its precursor SGML) can be used to annotate the structure and content of digitized texts, which makes it a valuable tool for publishing and analysing them. In many cases, the XML tags used follow the definitions of the Text Encoding Initiative (TEI).

  • Jane Austen's novel The Watsons
    Example of a novel with simple structural markup (paragraphs, the element <p>) and highlighting of important words (the element <hi>). This document also includes a so-called TEI header, which holds metadata about the novel.
  • Shakespeare's Sonnets
    Example of a document where the structure of a play has been annotated.
    Note: this is an SGML document (SGML is the precursor of XML).
  • The TEI By Example project, carried out by the Royal Academy of Dutch Language and Literature, provides a good overview of possible applications of XML for annotating poetry, including a large number of examples.
  • Examples of linguistic XML coding
    XML is becoming an increasingly popular way to add structural and content annotation to text corpora for linguistic research, allowing extensive exploration and analysis of the material. In many cases, the XML tags used follow the definitions of the Text Encoding Initiative.
  • Example of a digitized historical inventory in EAD (an application of XML).
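
The kind of markup described above for The Watsons (a TEI header with metadata, paragraphs in <p>, highlighted words in <hi>) can be illustrated with a minimal sketch in Python. The sample document, its title and author values, and the function name summarize are hypothetical and invented for illustration; they are not taken from any of the editions listed here. Real TEI documents also declare the TEI namespace and carry a much fuller header, both of which are left out here to keep the sketch short.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified TEI-style document. It is NOT the actual
# encoding of any edition mentioned in the text, and it omits the TEI
# namespace that real TEI files declare.
SAMPLE = """<TEI>
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Sample novel</title>
        <author>Sample author</author>
      </titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>The first paragraph has one <hi rend="italic">highlighted</hi> word.</p>
      <p>The second paragraph has none.</p>
    </body>
  </text>
</TEI>"""


def summarize(xml_text):
    """Return the title, the paragraph texts, and the highlighted words."""
    root = ET.fromstring(xml_text)
    # Metadata lives in the header, separate from the text itself.
    title = root.findtext("teiHeader/fileDesc/titleStmt/title")
    # itertext() joins a paragraph's own text with the text of nested elements.
    paragraphs = ["".join(p.itertext()) for p in root.iter("p")]
    # Every <hi> element marks a highlighted word or phrase.
    highlights = [hi.text for hi in root.iter("hi")]
    return title, paragraphs, highlights


title, paragraphs, highlights = summarize(SAMPLE)
print(title)
print(len(paragraphs), "paragraphs")
print(highlights)
```

Because the structure is made explicit by the tags, the same file can serve both publication (rendering the paragraphs and highlights) and analysis (counting or extracting the marked-up words).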


XML (or its precursor SGML) plays an important role in a growing number of (research) projects in the Humanities. The DTD of the Text Encoding Initiative, or an adapted version of it, is often the starting point for the XML used in these projects. Examples:
  • American Verse Project: an electronic archive of volumes of American poetry prior to 1920. The full text of each volume of poetry has been converted into digital form and encoded in Standard Generalized Markup Language (SGML) using the TEI Guidelines, with various forms of access provided through the internet.
  • Victorian Women Writers Project: highly accurate transcriptions of works by British female writers of the 19th century (presented via a web interface), encoded using XML.
  • Der junge Goethe in seiner Zeit: a new edition of Goethe's early works, combined with a large selection of works by others, allowing users to view Goethe's work in its historical and literary context. The digital texts are made available in FolioView and also encoded in SGML, following the TEI guidelines.
  • The Digital Locke Project: a pilot project that begins a scholarly text edition of the manuscripts of the British philosopher John Locke (1632-1704) in the form of an XML-encoded database that is used simultaneously for an online version and the printed version of the manuscripts. The TEI DTD formed the basis for this project.
  • Emblem Project Utrecht: digitization of Dutch 17th-century emblem books, including both religious and profane books, combining full transcriptions, page facsimiles and indexes, as well as extended search options. The books have been encoded using TEI/XML.
  • Newcastle Electronic Corpus of Tyneside English: a corpus of dialect speech from Tyneside in North-East England. The corpus is encoded in XML, based on the TEI DTD.
  • Voices of the Holocaust: an online collection of interviews with Holocaust survivors conducted in the immediate aftermath of World War II. TEI encoding is used to provide a structured data model for the transcriptions, which allows various manifestations of the interviews (text, audio), as well as other types of content (metadata, GIS, scholarly criticism), to be integrated into a dynamic, robust presentation for the user.