Overview of text corpora in the Faculty of Humanities

Overview of Dutch corpora

Click on the name of a corpus to view details.
A globe icon indicates that the corpus can be accessed through the Internet.

Name (abbreviation): Description

Child Language Data Exchange System (CHILDES): Corpus material related to first language acquisition.
Corpus Gesproken Nederlands (CGN): Large corpus of spoken Dutch with various types of annotation.
Corpus Hedendaags Nederlands (CHN): CHN is a monitor corpus for contemporary Dutch.
Corpus Hermans: W.F. Hermans' novella "Het Behouden Huis", analysed and coded to facilitate research on word order in Dutch.
Corpus Oudnederlands: All known Old Dutch textual material from the period 475 to 1200.
Corpus Renkema: Language of civil servants ("ambtelijke taal").
Corpus Uit den Boogaart (Eindhoven corpus): Written and spoken Dutch produced between 1960 and 1973.
Dutch Parallel Corpus (DPC): DPC is a parallel corpus of 10 million words containing the language pairs Dutch-English and Dutch-French.
Dutch PAROLE Distributable Corpus: Written Dutch corpus consisting of various text types.
ESF Corpus: Spontaneous second language acquisition data of adult immigrant workers (Arabic > Dutch and Turkish > Dutch).
Stevin Nederlandstalig Referentiecorpus (SoNaR Corpus): SoNaR aims to build a 500-million-word balanced reference corpus for contemporary written Dutch.
TalkBank: TalkBank is a multilingual corpus containing sample databases from within several subfields of communication.
VU Chatcorpus (ChatIG Corpus): Dutch corpus consisting of controlled chat sessions by secondary school pupils of different age groups.