Digital Humanities Workbench
Annotation in text files
For various reasons, all sorts of information may be added to the actual text of digitized text files. This is called annotation of the text, and it is usually applied using certain codes. We distinguish between three types of annotation.
Annotation systems
There are various ways to add annotations to a text. It is generally done by adding codes, which is usually called markup. The easiest way is to add a code after a reserved character in the text. In such a system, epithets could be encoded as follows:
fleet-footed#ep Achilles
A thematic enhancement could be represented as:
{theme=love}
An advantage of this approach is that it is simple. The downside is that it has not been standardized and that software for text analysis is not specifically tailored to processing these arbitrary codes.
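Because such home-made codes are not standardized, you usually have to write a small script yourself to process them. The following Python sketch shows one way this could be done. It assumes the two conventions from the examples above (a code after the reserved character # and a key=value pair between curly braces); the function name extract_annotations and the regular expressions are illustrative only and not part of any existing tool.

```python
import re

# Assumed conventions, taken from the two examples in the text:
#   word#code    -> a code attached after the reserved character '#'
#   {key=value}  -> a thematic annotation between curly braces
TAGGED_WORD = re.compile(r"(\S+?)#(\w+)")
THEME = re.compile(r"\{(\w+)=([^}]*)\}")


def extract_annotations(text):
    """Return (plain_text, word_codes, themes) for an annotated string."""
    word_codes = TAGGED_WORD.findall(text)   # list of (word, code) pairs
    themes = THEME.findall(text)             # list of (key, value) pairs
    plain = TAGGED_WORD.sub(r"\1", text)     # keep the word, drop '#code'
    plain = THEME.sub("", plain)             # drop the {key=value} codes
    plain = " ".join(plain.split())          # normalize leftover whitespace
    return plain, word_codes, themes


sample = "{theme=love} fleet-footed#ep Achilles"
plain, word_codes, themes = extract_annotations(sample)
print(plain)
print(word_codes)
print(themes)
```

The script separates the annotations from the text itself, so that the plain text and the codes can be analysed independently. A different choice of reserved characters would require different patterns, which illustrates why arbitrary codes are hard to support in general-purpose software.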
COCOA is an annotation system that was frequently used a few decades ago and
that you can still find in texts that were digitized in the 20th century. The
principle of COCOA is that a code is placed between angle brackets and that each
code can consist of two parts: the first part specifies the marker type, and
the (optional) second part can add a certain value. COCOA markers are
placed at the start of a particular element. The start of Act 3 in a certain
play could thus be marked with the following COCOA code: <act 3>.
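To give an impression of how COCOA markers might be processed, here is a minimal Python sketch. It rests on a few assumptions that go beyond the description above: a marker has the form <type> or <type value> without nested brackets, and a marker stays in force until the next marker of the same type appears. The function name split_cocoa and the sample text are invented for the illustration.

```python
import re

# Assumption: a marker is "<type>" or "<type value>", with no nested brackets.
COCOA_MARKER = re.compile(r"<(\w+)(?:\s+([^>]*))?>")


def split_cocoa(text):
    """Return a list of (state, segment) pairs.

    'state' maps each marker type to the value that was in force
    when the text segment was read (None if the marker had no value).
    """
    state = {}
    segments = []
    pos = 0
    for match in COCOA_MARKER.finditer(text):
        chunk = text[pos:match.start()].strip()
        if chunk:
            segments.append((dict(state), chunk))
        value = match.group(2)
        state[match.group(1)] = value.strip() if value else None
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append((dict(state), tail))
    return segments


# Invented sample text, for illustration only.
sample = "<act 3> <scene 1> Enter two guards. <scene 2> A room in the castle."
segments = split_cocoa(sample)
for state, segment in segments:
    print(state, segment)
```

Each piece of text is returned together with the markers that apply to it, so that, for example, all text belonging to Act 3 can be selected afterwards.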
XML is now more commonly used to annotate texts. In text archives you
can still find many texts that are encoded with SGML, the precursor to XML. The
main advantage of using XML is that it is a widespread standard that is handled
well by most modern software. Based on XML, the Text Encoding Initiative (TEI)
has developed a number of encoding sets for use in the humanities, including an
encoding set for novels and plays.
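The advantage of standard software support can be illustrated with a short Python sketch that uses the xml.etree.ElementTree module from the standard library. The XML fragment is a simplified, TEI-inspired example written for this illustration: it is not a complete or validated TEI document, it omits the TEI header and namespace, and the speaker and line are invented.

```python
import xml.etree.ElementTree as ET

# A simplified, TEI-inspired fragment (not a complete or validated TEI document).
SAMPLE = """
<div type="act" n="3">
  <div type="scene" n="1">
    <sp>
      <speaker>First Guard</speaker>
      <l>Who goes there?</l>
    </sp>
  </div>
</div>
""".strip()

root = ET.fromstring(SAMPLE)

# The act number is stored as an attribute of the outer element.
act_number = root.get("n")

# iter() walks the whole tree, so nested elements are found as well.
speakers = [el.text for el in root.iter("speaker")]
lines = [el.text for el in root.iter("l")]

print(act_number)
print(speakers)
print(lines)
```

Unlike COCOA markers, XML elements have an explicit start and end, so a standard parser can determine which text belongs to which act, scene or speaker without any custom pattern matching.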
Further information
For further information about the process of annotation, see the texts about formal annotation and free annotation in this Workbench. More information is also available about XML and TEI.