Digital Humanities Workbench



Working with XML

Coding XML

There are several ways to add XML coding to a text. Because XML files are 'plain' text, you can use a text editor such as Notepad or NoteTab to add tags to a text. However, this is not recommended: it is a lot of work, and it is difficult to avoid mistakes such as unclosed or misspelled tags.

It is better to use a so-called XML editor. These text editors have been developed specifically to work with XML: for example, they make a clear distinction between text and tags by using different colors, and allow users to check whether their code is correct. When the coding is not done 'freely' but is constrained by a DTD or XML schema, the editor can also show lists of the XML elements and/or attributes that are permitted at any given place in the text. In addition to ensuring that the XML code is correct and consistent, this also makes the coding process more efficient.
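
To give an idea of the most basic check an XML editor performs, the following Python sketch tests whether a piece of XML is well-formed (every tag properly closed and nested). The function name check_well_formed and the two sample strings are invented for this illustration. Note that Python's standard library only checks well-formedness here; it does not validate a document against a DTD or XML schema, which is what an XML editor can additionally do.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The message includes the line and column where parsing failed.
        return False, str(err)
    return True, None


# Hypothetical samples: the second one closes its tags in the wrong order.
good = "<line><name>Anna</name> walks home.</line>"
bad = "<line><name>Anna</line></name>"

print(check_well_formed(good))
print(check_well_formed(bad))
```

The second call reports a parse error with a position, which is roughly the kind of feedback an XML editor shows while you are typing.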

The free XML editor XMLPad is often used for simple projects in our faculty. For larger projects, the advanced commercial XML editor Oxygen has been used in the past. However, the faculty no longer has a licence for this software.

Searching XML

Most standard concordance programs (such as WordSmith) are able to perform basic searches on text corpora with simple XML encoding. For more information, please see the manual.
For corpora with more complex XML encoding (such as the Corpus Gesproken Nederlands or the British National Corpus), these programs are not advanced enough. This is why such corpora often have their own exploration software, such as Xaira in the case of the BNC. Xaira can also be used to explore other XML-encoded corpora, although these must first be prepared before they can be explored. However, even these exploration programs have their limitations.
If you want complete freedom in requesting information from an XML document, you can use the XML query language XQuery. However, you need technical knowledge of XML, as well as some programming experience, to work with this language. The W3Schools website has a brief XQuery tutorial.
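
The sketch below illustrates the kind of question such a query can answer: "give me all nouns in the English texts". It does not use XQuery itself. Instead it uses the limited XPath support in Python's standard xml.etree.ElementTree module (XPath is the path language on which XQuery builds). The sample corpus, its element names (text, s, w) and its attributes (lang, pos) are invented for this illustration and do not reflect the encoding of any particular corpus.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini-corpus; real corpora use their own element names.
SAMPLE = """<corpus>
  <text id="t1" lang="nl">
    <s><w pos="N">huis</w><w pos="V">staat</w></s>
  </text>
  <text id="t2" lang="en">
    <s><w pos="N">house</w><w pos="V">stands</w><w pos="N">hill</w></s>
  </text>
</corpus>"""


def nouns_in_language(xml_text, lang):
    """Return the text of every <w pos="N"> inside a <text> with the given lang."""
    root = ET.fromstring(xml_text)
    # Select <text> elements by attribute, then noun tokens at any depth below them.
    path = f".//text[@lang='{lang}']//w[@pos='N']"
    return [w.text for w in root.findall(path)]


print(nouns_in_language(SAMPLE, "en"))
print(nouns_in_language(SAMPLE, "nl"))
```

A full XQuery processor can go further than this, for example by sorting, grouping or restructuring the results, but the principle of selecting elements by their position and attributes is the same.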

Processing XML

XML documents can be processed in various ways. For example, it is possible to automatically extract tagged elements from a collection of text files and save them in a database or in a file that can be subjected to further statistical analysis. This, too, requires technical knowledge of XML as well as some programming experience. For help in this matter, you can contact the faculty's ICT Support Unit.
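
As a rough illustration of the extraction described above, the following Python sketch collects all name and place elements from a small set of documents and turns them into CSV text that a spreadsheet or statistics package can read. The file names, element names and sample sentences are invented for this example, and the documents are held in memory rather than read from disk so that the sketch can be run as it stands.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical documents; in practice these would be read from files.
DOCUMENTS = {
    "letter1.xml": "<letter><p>Sent from <place>Leiden</place> to <name>Anna</name>.</p></letter>",
    "letter2.xml": "<letter><p><name>Pieter</name> writes from <place>Utrecht</place>.</p></letter>",
}


def extract_elements(documents, tags):
    """Return (file, tag, text) rows for every element whose tag is in tags."""
    rows = []
    for filename, xml_text in documents.items():
        root = ET.fromstring(xml_text)
        # iter() walks the whole tree in document order.
        for element in root.iter():
            if element.tag in tags:
                rows.append((filename, element.tag, "".join(element.itertext())))
    return rows


def rows_to_csv(rows):
    """Format the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["file", "tag", "text"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = extract_elements(DOCUMENTS, {"name", "place"})
print(rows_to_csv(rows))
```

To save the result, the CSV text can be written to a file; the same rows could equally be inserted into a database table.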