Digital Humanities Workbench



COREX

Brief description

The Corpus Gesproken Nederlands (CGN, Spoken Dutch Corpus) is a database of contemporary Dutch as spoken by adults in the Netherlands and Flanders. The CGN consists of several layers: the speech signal itself and, on top of it, an orthographic transcription of that signal. The material has also been lemmatized and enriched with word class information (part of speech, POS). A broad phonetic transcription is available for a selection of 1,000,000 words, and a prosodic analysis for a small part of the corpus. Finally, syntactic analysis has been used to add information about word order to part of the material. The following example shows part of the basic information for the utterance "nou je hebt ze in uh uh rond en vierkant": the orthographic transcription (ORT), the part-of-speech tags (POS) and the lemmas (LEM).


5 17267 21281 N01002 fn000248.6

ORT  nou je hebt ze in uh uh rond en vierkant.
POS  BW() VNW(pers,pron,nomin,red,2v,ev) WW(pv,tgw,met-t) VNW(pers,pron,stan,red,3,mv) VZ(init) TSW() TSW() ADJ(vrij,basis,zonder) VG(neven) ADJ(vrij,basis,zonder) LET()
LEM  nou je hebben ze in uh uh rond en vierkant .
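To make the relation between the layers concrete, the following is a minimal Python sketch that lines up the orthographic, POS and lemma layers of the example above token by token. It is not part of COREX or of the CGN distribution: the names Token, parse_tag and align are hypothetical, and the tokenization rule (split off punctuation so that it matches the separate LET() tag) is a simplifying assumption that holds for this utterance but may not cover every CGN transcription convention.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    """One word with its annotations (hypothetical helper type)."""
    word: str
    lemma: str
    pos: str          # main word class, e.g. "WW" (verb)
    features: tuple   # the comma-separated values inside the brackets


# The three layers, copied from the example above.
ORT = "nou je hebt ze in uh uh rond en vierkant."
POS = ("BW() VNW(pers,pron,nomin,red,2v,ev) WW(pv,tgw,met-t) "
       "VNW(pers,pron,stan,red,3,mv) VZ(init) TSW() TSW() "
       "ADJ(vrij,basis,zonder) VG(neven) ADJ(vrij,basis,zonder) LET()")
LEM = "nou je hebben ze in uh uh rond en vierkant ."

TAG_RE = re.compile(r"^([A-Z]+)\((.*)\)$")


def parse_tag(tag):
    """Split a tag such as 'WW(pv,tgw,met-t)' into ('WW', ('pv', 'tgw', 'met-t'))."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"unexpected tag format: {tag!r}")
    category, feats = match.groups()
    return category, tuple(f for f in feats.split(",") if f)


def align(ort, pos, lem):
    """Combine the three layers into one list of Token records."""
    # Assumption: punctuation is a token of its own, as in the LEM layer.
    words = re.findall(r"\w+|[^\w\s]", ort)
    tags = pos.split()
    lemmas = lem.split()
    if not (len(words) == len(tags) == len(lemmas)):
        raise ValueError("layers do not have the same number of tokens")
    return [Token(w, l, *parse_tag(t)) for w, l, t in zip(words, lemmas, tags)]


tokens = align(ORT, POS, LEM)
# A query that needs two layers at once: the word form and lemma of every verb.
verbs = [(t.word, t.lemma) for t in tokens if t.pos == "WW"]
print(verbs)  # [('hebt', 'hebben')]
```

A query like the last one, which selects on the POS layer and reports from the orthographic and lemma layers, is the kind of combined search that needs all layers to be aligned per token.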

Because the CGN has several annotation layers, programs like AntConc and WordSmith are less suited to searching it. Moreover, with these programs it is not possible to combine data from several annotation layers, or to make efficient use of the corpus's metadata. For this reason, COREX, an exploration program, was developed for the CGN project. With COREX, you can listen to speech files, view various annotations and carry out searches on the CGN. COREX makes it easy to navigate the subcorpora, based on predefined or user-defined criteria, such as speaker gender, age, and various other descriptive data (also known as metadata). The recorded speech is synchronized to the transcript and the annotation data.

Guides and practical introductions

COREX Manual
Detailed manual for CGN and COREX.

CGN webcursus  
This online course is intended for anyone wishing to know more about the Corpus Gesproken Nederlands and/or learn how to work with COREX, the CGN search software.

Practicum Corpus Gesproken Nederlands / Corex  
Basic instructions for using COREX to explore the CGN.

Een Verkenning van COREX  
Introduction to COREX, the exploration program for the Corpus Gesproken Nederlands.

Zoekacties en gebruikte codes binnen COREX  
Explanation of the codes used in the CGN.

Availability

COREX is available on all VU PCs for students and staff members of the Faculty of Humanities.

More information

Over het Corpus Gesproken Nederlands  
This article provides a brief overview of the Corpus Gesproken Nederlands.

CGN project website  

