Corpus details

Dutch PAROLE Distributable Corpus

Official name:Dutch PAROLE Distributable Corpus
Language:Dutch
Language type:written; written to be spoken
Corpus type:general / reference
Period:1984 - 1995
Size:3 million words
Description:Written Dutch corpus consisting of various text types: books, newspapers, periodicals and miscellaneous texts (including texts written to be read out in TV news broadcasts). An equal proportion of the corpus texts (up to 250,000 running words) was morphosyntactically annotated according to the common core PAROLE tagset, extended with a set of language-specific features.
Exploration:The corpus consists of plain text files and can be explored with standard corpus and text search tools such as WordSmith and Windows Grep.
Annotation:part of speech
Origin:The PAROLE Distributable Corpus is a 3-million-word selection from the 20-million-word Dutch PAROLE Reference Corpus, one of the results of PAROLE, a large European corpus harmonisation effort.
Location:Faculty network, folder G:\LET\Data\Corpora\Nederlands\Parole
Details:On the faculty network, the corpus is available in three formats: with part-of-speech codes (folder "Pos_annotated"); with TEI codes but no grammatical annotation (folder "Tei_encoded"); and stripped of all codes (folder "Stripped").
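Because the "Stripped" version is plain text, a simple keyword-in-context search can also be scripted instead of using a dedicated tool. The following is a minimal sketch in Python, not part of the corpus distribution. The function name `kwic`, the local folder path, the search word and the UTF-8 encoding are all assumptions for illustration; the actual files may use a different encoding such as Latin-1.

```python
import re
from pathlib import Path


def kwic(root, word, width=40, encoding="utf-8"):
    """Yield (file name, left context, match, right context) for each whole-word hit.

    `root` is a folder of plain text files; `encoding` is an assumption
    and may need to be changed for the actual corpus files.
    """
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Collapse line breaks so that context windows read as running text.
        raw = path.read_text(encoding=encoding, errors="replace")
        text = " ".join(raw.split())
        for m in pattern.finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            yield path.name, left, m.group(0), right


if __name__ == "__main__":
    # Hypothetical local copy of the "Stripped" folder and an example search word.
    for name, left, hit, right in kwic("Stripped", "fiets"):
        print(f"{name}\t{left:>40}[{hit}]{right}")
```

The search matches whole words only and ignores case, so inflected or compound forms are not found unless they are searched for separately.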
  
See Also: 
  
Name:Parole documentation
Description:Documentation about the PAROLE Distributable Corpus
  
Name:Parole online
Description:As of May 10, 2012, online access to the PAROLE corpus is no longer available. A new online search application is to be developed.
  

