Corpus details

Corpus Gesproken Nederlands

Official name:Corpus Gesproken Nederlands
Common name:CGN
Language type:spoken
Corpus type:general / reference
Size:900 hours / 9,000,000 words
Description:The Spoken Dutch Corpus (Corpus Gesproken Nederlands) was constructed between 1998 and 2004. The corpus consists of approximately 900 hours of Dutch as spoken by adults in Flanders and the Netherlands. It contains various text types, such as excerpts from novels read aloud, political debates, private conversations, interviews, news bulletins and football commentaries.
Exploration:The program Corex was developed for exploring the CGN. Corex allows the user to listen to the speech files, to view the multiple annotations and to run queries on the corpus that combine searches for words and phrases, annotation layers and metadata (text type, speaker's age, sex, etc.).
Annotation:lemma; inflection; part of speech; prosody; syntax
Transcription:orthographic; phonetic
Sound files:yes
See Also: 
Name:Home Corpus Gesproken Nederlands
Description:Website hosted by the TST-centrale (which maintains the CGN and Corex) offering extensive documentation about the corpus and the project.
Name:CGN webcursus
Description:Web-based tutorial introducing CGN and Corex (in Dutch).
Name:Een verkenning van Corex
Description:Introduction to the use of Corex (in Dutch).
Name:Corex Manual
Description:Manual describing the functions of CGN's exploration software.
Name:Practicum CGN / Corex
Description:Instruction in the basic use of Corex (in Dutch).
Name:Corex metadata
Description:Access database with speaker data (download before use).
