Digital Humanities Workbench
Introduction to XML

XML (Extensible Markup Language) was developed to store the contents of files in a structured way. A central aspect of XML is the use of tags that mark and annotate the structure of whole files as well as of individual meaningful elements within a document. XML documents consist only of letters, numbers, and punctuation (they are so-called plain text files) and contain no program-specific binary code for formatting or structuring, unlike Word documents and Excel files, for example. As a result, XML documents are program- and platform-independent: a text file with XML tags created with program A on operating system C can also be processed by program B on operating system D. In addition, XML is an open standard that can be used for free. Using XML is a way to make digital information more future-proof: an open and relatively simple standard is a good basis for future reuse of (research) data.

Some key benefits of using XML are:
Although many different XML applications have been developed for encoding specific types of documents (see, for example, the overview of XML Applications and Initiatives on the Cover Pages website), anyone can develop an entirely new coding system that meets the needs of, for example, a specific research project. It is also possible to take an existing XML application and adjust or extend it for a specific research project.

XML has hundreds of applications and is used in many different disciplines. Here are some examples:
This Workbench focuses on how XML can be used to enhance (mainly textual) documents for research purposes in the humanities, where extra information is often added to digitized textual sources to aid the analysis of these texts. This can include information about the origin and structure of the documents as well as more content-related information, which is used to classify the content of documents in any number of ways. This process is usually called annotation. Research material enhanced with XML tags can be edited, searched, analysed and presented in various ways (on a website, for example). XML can greatly improve the accessibility of research material, provided that the XML tags are applied correctly.

In linguistic research and textual analysis, XML is used for the markup and annotation of text corpora, including standard text corpora, such as the British National Corpus and SoNaR, as well as corpora compiled for specific research projects.

In literary research, XML is used for the annotation of digitized literary texts. The TEI By Example project carried out by the Royal Academy of Dutch Language and Literature provides a good overview of possible applications of XML for annotating poetry.

In historical and cultural-historical research, XML is used for annotating and providing access to, for example, letters and other historical documents, as well as for the markup of more structured data sets (based on personal archives, for example).

In addition to annotation for research purposes, XML is also widely used to prepare primary texts, manuscripts and other documents for delivery and digital publication.
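To make the idea of annotation more concrete, the following minimal sketch shows a short letter enhanced with XML tags and two simple ways of processing it with Python's standard `xml.etree.ElementTree` module. The letter, the names in it, and all element and attribute names (`letter`, `meta`, `sender`, `date`, `when`, `origin`, `person`, `place`) are invented for this illustration; they do not follow TEI or any other existing XML application.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated letter. The content and the tag names are
# invented for this sketch and do not follow any existing standard.
SAMPLE = """\
<letter id="L001">
  <meta>
    <sender>Anna Visser</sender>
    <date when="1887-03-14">14 March 1887</date>
    <origin>Amsterdam</origin>
  </meta>
  <body>
    <p>Yesterday I met <person>Pieter</person> in <place>Haarlem</place>.</p>
    <p>He will travel on to <place>Leiden</place> next week.</p>
  </body>
</letter>
"""


def annotated_texts(xml_text, tag):
    """Return the text of every element with the given tag name."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter(tag)]


def plain_paragraphs(xml_text):
    """Return the paragraphs of the letter body with all tags removed."""
    root = ET.fromstring(xml_text)
    paragraphs = []
    for p in root.find("body").iter("p"):
        # itertext() yields the text inside and between nested tags;
        # split/join normalises any whitespace.
        paragraphs.append(" ".join("".join(p.itertext()).split()))
    return paragraphs


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    # Information about the origin of the document (metadata).
    print(root.find("meta/sender").text, root.find("meta/date").get("when"))
    # Content-related annotation: which places and people are mentioned?
    print(annotated_texts(SAMPLE, "place"))
    print(annotated_texts(SAMPLE, "person"))
    # The running text can still be recovered without the tags.
    print(plain_paragraphs(SAMPLE))
```

The same plain text file serves several purposes here: the metadata describes the origin of the document, the `place` and `person` tags make the content searchable, and the untagged text remains available for reading or presentation. Any other XML-aware program could process the file in the same way, which is what makes annotated material reusable.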