Digital Humanities Workbench
Examples of how XML is used in linguistic research
XML is increasingly used for the structural and analytical annotation of text corpora. In many cases, the XML tags follow the definitions of the Text Encoding Initiative (TEI).
Example of structural markup (taken from the English Gigaword Corpus).
<DOC id="AFE19940514.0014" type="story">
<HEADLINE> Queen Beatrix to appoint party negotiators to explore coalition </HEADLINE>
<DATELINE> THE HAGUE, May 14 (AFP) </DATELINE>
<TEXT>
<P> Queen Beatrix was expected Saturday to formally appoint three party officials to negotiate a broad coalition government for the Netherlands, thrown into political turmoil after this month's general election. </P>
<P> The Christian Democrats (CDA), who have dominated the political scene for most of this century, lost 20 seats in the vote on May 3, retaining only 34 in the 150-seat lower house of parliament. </P>
(...)
</TEXT>
</DOC>
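As a rough illustration of how such structural markup can be read by a program, the following Python sketch parses the document above with the standard library module xml.etree.ElementTree. The names GIGAWORD_DOC and read_story are hypothetical, the elided "(...)" part is left out, and a closing </DOC> tag is assumed so that the fragment is well-formed.

```python
import xml.etree.ElementTree as ET

# Copy of the example above, without the elided "(...)" part.
# Assumption: the document is closed with </DOC>.
GIGAWORD_DOC = """<DOC id="AFE19940514.0014" type="story">
<HEADLINE> Queen Beatrix to appoint party negotiators to explore coalition </HEADLINE>
<DATELINE> THE HAGUE, May 14 (AFP) </DATELINE>
<TEXT>
<P> Queen Beatrix was expected Saturday to formally appoint three party officials to negotiate a broad coalition government for the Netherlands, thrown into political turmoil after this month's general election. </P>
<P> The Christian Democrats (CDA), who have dominated the political scene for most of this century, lost 20 seats in the vote on May 3, retaining only 34 in the 150-seat lower house of parliament. </P>
</TEXT>
</DOC>"""


def read_story(xml_text):
    """Return the id, type, headline, dateline and paragraphs of one <DOC>."""
    doc = ET.fromstring(xml_text)
    return {
        "id": doc.get("id"),
        "type": doc.get("type"),
        "headline": doc.findtext("HEADLINE", default="").strip(),
        "dateline": doc.findtext("DATELINE", default="").strip(),
        # Each <P> element holds one paragraph of running text.
        "paragraphs": [(p.text or "").strip() for p in doc.iter("P")],
    }


story = read_story(GIGAWORD_DOC)
print(story["id"], "-", story["headline"])
print(len(story["paragraphs"]), "paragraphs")
```

Because the structure is explicit, the headline, dateline and body paragraphs can be retrieved separately without any guessing about layout.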
Example of the annotation of clauses, in the context of research into speech, thought and writing presentation.
<sptag cat="NRS" who="B" next="IS" whonext="B" s="0.21" w="3"> I asked him </sptag>
<sptag cat="FIS" who="B" next="DS" whonext="B" s="0.43" w="6"> what Franco was doing down here. </sptag>
<sptag cat="DS" who="L" next="NRS" whonext="L" s="0.64" w="7"> 'He is opening the new Almeria airport,' </sptag>
<sptag cat="NRS" who="L" next="N" s="0.36" w="4"> he said with pride. <p /> </sptag>
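Annotation of this kind lends itself to simple counting. The Python sketch below totals the number of words per presentation category (the cat attribute). It rests on two assumptions that the example suggests but does not state: that the w attribute holds the number of words in the clause, and that the fragment can be wrapped in a hypothetical <fragment> root element to make it parseable on its own.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Copy of the annotated clauses shown above.
SPTAG_FRAGMENT = """
<sptag cat="NRS" who="B" next="IS" whonext="B" s="0.21" w="3"> I asked him </sptag>
<sptag cat="FIS" who="B" next="DS" whonext="B" s="0.43" w="6"> what Franco was doing down here. </sptag>
<sptag cat="DS" who="L" next="NRS" whonext="L" s="0.64" w="7"> 'He is opening the new Almeria airport,' </sptag>
<sptag cat="NRS" who="L" next="N" s="0.36" w="4"> he said with pride. <p /> </sptag>
"""

# XML needs a single root element, so the fragment is wrapped in one
# (the name "fragment" is made up for this illustration).
root = ET.fromstring("<fragment>" + SPTAG_FRAGMENT + "</fragment>")

words_per_category = Counter()
mismatches = []
for tag in root.iter("sptag"):
    declared = int(tag.get("w"))  # assumed to be a word count
    words_per_category[tag.get("cat")] += declared
    # Cross-check the assumption against the whitespace-separated tokens.
    actual = len("".join(tag.itertext()).split())
    if actual != declared:
        mismatches.append((tag.get("cat"), declared, actual))

for category, total in words_per_category.items():
    print(category, total)
print("mismatches:", mismatches)
```

For this fragment the declared values agree with the token counts, which supports reading w as a word count, at least for these four clauses.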
Example of morphosyntactic annotation of the sentence 'Moreover, the analysis of skills provides a common topic of research for both art and science historians.' Derived from the BNC Baby corpus.
<s n="820">
  <w type="AV0" lemma="moreover">Moreover</w>
  <c type="PUN">, </c>
  <w type="AT0" lemma="the">the </w>
  <w type="NN1" lemma="analysis">analysis </w>
  <w type="PRF" lemma="of">of </w>
  <w type="NN2" lemma="skill">skills </w>
  <w type="VVZ" lemma="provide">provides </w>
  <w type="AT0" lemma="a">a </w>
  <w type="AJ0" lemma="common">common </w>
  <w type="NN1" lemma="topic">topic </w>
  <w type="PRF" lemma="of">of </w>
  <w type="NN1" lemma="research">research </w>
  <w type="PRP" lemma="for">for </w>
  <w type="AV0" lemma="both">both </w>
  <w type="NN1" lemma="art">art </w>
  <w type="CJC" lemma="and">and </w>
  <w type="NN1" lemma="science">science </w>
  <w type="NN2" lemma="historian">historians</w>
  <c type="PUN">.</c>
</s>
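With word class and lemma stored as attributes, questions such as "which nouns occur in this sentence?" reduce to a filter on the type attribute. The following Python sketch shows one way to do this for the sentence above; the variable names are hypothetical, and the selection of nouns simply assumes that noun tags in this tagset start with "NN", as they do in the example.

```python
import xml.etree.ElementTree as ET

# Copy of the annotated sentence shown above.
BNC_SENTENCE = """<s n="820">
  <w type="AV0" lemma="moreover">Moreover</w>
  <c type="PUN">, </c>
  <w type="AT0" lemma="the">the </w>
  <w type="NN1" lemma="analysis">analysis </w>
  <w type="PRF" lemma="of">of </w>
  <w type="NN2" lemma="skill">skills </w>
  <w type="VVZ" lemma="provide">provides </w>
  <w type="AT0" lemma="a">a </w>
  <w type="AJ0" lemma="common">common </w>
  <w type="NN1" lemma="topic">topic </w>
  <w type="PRF" lemma="of">of </w>
  <w type="NN1" lemma="research">research </w>
  <w type="PRP" lemma="for">for </w>
  <w type="AV0" lemma="both">both </w>
  <w type="NN1" lemma="art">art </w>
  <w type="CJC" lemma="and">and </w>
  <w type="NN1" lemma="science">science </w>
  <w type="NN2" lemma="historian">historians</w>
  <c type="PUN">.</c>
</s>"""

sentence = ET.fromstring(BNC_SENTENCE)

# One (word form, tag, lemma) triple per <w> element; punctuation (<c>) is skipped.
tokens = [
    (w.text.strip(), w.get("type"), w.get("lemma"))
    for w in sentence.iter("w")
]

# Assumption: noun tags start with "NN" (NN1, NN2 in the example).
noun_lemmas = [lemma for _, tag, lemma in tokens if tag.startswith("NN")]

print(len(tokens), "words")
print(noun_lemmas)
```

Searching on the lemma attribute rather than on the word form means that "skills" and "historians" are found under "skill" and "historian".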
Example of syntactic annotation of the Dutch sentence 'Er klinkt een zacht geluid.', as produced by automatic syntactic analysis with the Alpino parser.
<?xml version="1.0" encoding="ISO-8859-1"?>
<alpino_ds version="1.0">
  <node id="0" rel="top" cat="top" begin="0" end="6">
    <node id="1" rel="--" cat="smain" begin="0" end="5">
      <node id="2" rel="mod" pos="adv" begin="0" end="1" root="er" word="Er"/>
      <node id="3" rel="hd" pos="verb" begin="1" end="2" root="klink" word="klinkt"/>
      <node id="4" rel="su" cat="np" begin="2" end="5">
        <node id="5" rel="det" pos="det" begin="2" end="3" root="een" word="een"/>
        <node id="6" rel="mod" pos="adj" begin="3" end="4" root="zacht" word="zacht"/>
        <node id="7" rel="hd" pos="noun" begin="4" end="5" root="geluid" word="geluid"/>
      </node>
    </node>
    <node id="8" rel="--" pos="punct" begin="5" end="6" root="." word="."/>
  </node>
  <sentence>Er klinkt een zacht geluid.</sentence>
</alpino_ds>
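The nested node elements form a tree, so grammatical functions can be looked up by their rel attribute. The Python sketch below retrieves the subject phrase and the head verb of the main clause from the analysis above. The helper leaf_words is hypothetical, and the XML declaration is omitted from the embedded string because the text is already a Python string and contains only ASCII characters.

```python
import xml.etree.ElementTree as ET

# Copy of the Alpino analysis shown above (XML declaration omitted).
ALPINO_XML = """<alpino_ds version="1.0">
  <node id="0" rel="top" cat="top" begin="0" end="6">
    <node id="1" rel="--" cat="smain" begin="0" end="5">
      <node id="2" rel="mod" pos="adv" begin="0" end="1" root="er" word="Er"/>
      <node id="3" rel="hd" pos="verb" begin="1" end="2" root="klink" word="klinkt"/>
      <node id="4" rel="su" cat="np" begin="2" end="5">
        <node id="5" rel="det" pos="det" begin="2" end="3" root="een" word="een"/>
        <node id="6" rel="mod" pos="adj" begin="3" end="4" root="zacht" word="zacht"/>
        <node id="7" rel="hd" pos="noun" begin="4" end="5" root="geluid" word="geluid"/>
      </node>
    </node>
    <node id="8" rel="--" pos="punct" begin="5" end="6" root="." word="."/>
  </node>
  <sentence>Er klinkt een zacht geluid.</sentence>
</alpino_ds>"""


def leaf_words(node):
    """Return the words below a node, ordered by their 'begin' position."""
    leaves = [n for n in node.iter("node") if n.get("word") is not None]
    leaves.sort(key=lambda n: int(n.get("begin")))
    return [n.get("word") for n in leaves]


tree = ET.fromstring(ALPINO_XML)
top = tree.find("node")

# The main clause is the node with cat="smain"; its head (rel="hd") is the verb.
main_clause = top.find("node[@cat='smain']")
head_verb = main_clause.find("node[@rel='hd']")
subject = main_clause.find("node[@rel='su']")

print("verb root:", head_verb.get("root"))
print("subject:", " ".join(leaf_words(subject)))
print("tokens:", leaf_words(top))
```

The same lookup on rel and cat would work for any sentence in this format, although real analyses may contain structures (for example co-indexed nodes) that this sketch does not handle.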