Digital Humanities Workbench
COREX
Brief description
The Corpus Gesproken Nederlands (CGN) is a database of contemporary Dutch, as spoken by adults in the Netherlands and Flanders. The CGN has several annotation layers, such as the speech signal and its orthographic transcription. The material has also been lemmatized and enriched with word class information (parts of speech, POS). A broad phonetic transcription is also available for a selection of 1,000,000 words, as well as prosodic analysis for a small part of the corpus. Finally, information about word order has been added to part of the material by means of syntactic analysis.

The following example shows a part of the basic information for the expression "nou je hebt ze in uh uh rond en vierkant".

5 17267 21281 N01002 fn000248.6
nou je hebt ze in uh uh rond en vierkant. ORT
nou je hebt ze in uh uh rond en vierkant.
BW() VNW(pers,pron,nomin,red,2v,ev) WW(pv,tgw,met-t) VNW(pers,pron,stan,red,3,mv) VZ(init) TSW() TSW() ADJ(vrij,basis,zonder) VG(neven) ADJ(vrij,basis,zonder) LET() POS
nou je hebben ze in uh uh rond en vierkant . LEM
nou je hebt ze in uh uh rond en vierkant.

Because the CGN has several annotation layers, programs like AntConc and WordSmith are less suited to searching it. Moreover, these programs cannot combine data from several annotation layers or make efficient use of the corpus's metadata. For this reason, COREX, an exploration program, was developed for the CGN project. With COREX, you can listen to speech files, view the various annotations, and carry out searches on the CGN. COREX makes it easy to navigate the subcorpora, based on predefined or user-defined criteria such as speaker gender, age, and various other descriptive data (also known as metadata). The recorded speech is synchronized with the transcript and the annotation data.
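To illustrate what combining annotation layers means in practice, the following Python sketch aligns the tokens, POS tags, and lemmas from the example above and then runs a query that needs both the POS layer and the lemma layer. It is only an illustration and not part of COREX or the CGN distribution: the function names (`parse_tag`, `align_layers`, `find`) are hypothetical, the three layers are typed in by hand rather than read from CGN files, and the sentence-final period is assumed to be a separate token so that it lines up with the `LET()` tag.

```python
import re

# The three layers from the example above, typed in by hand.
# Assumption: the final period is its own token, matching the LET() tag.
ORT = "nou je hebt ze in uh uh rond en vierkant ."
POS = ("BW() VNW(pers,pron,nomin,red,2v,ev) WW(pv,tgw,met-t) "
       "VNW(pers,pron,stan,red,3,mv) VZ(init) TSW() TSW() "
       "ADJ(vrij,basis,zonder) VG(neven) ADJ(vrij,basis,zonder) LET()")
LEM = "nou je hebben ze in uh uh rond en vierkant ."


def parse_tag(tag):
    """Split a tag such as 'WW(pv,tgw,met-t)' into ('WW', ['pv', 'tgw', 'met-t'])."""
    match = re.fullmatch(r"(\w+)\((.*)\)", tag)
    if match is None:
        raise ValueError(f"unexpected tag format: {tag!r}")
    word_class, feature_text = match.groups()
    features = feature_text.split(",") if feature_text else []
    return word_class, features


def align_layers(ort, pos, lem):
    """Return one record per token, combining the three layers."""
    tokens, tags, lemmas = ort.split(), pos.split(), lem.split()
    if not (len(tokens) == len(tags) == len(lemmas)):
        raise ValueError("annotation layers differ in length")
    records = []
    for token, tag, lemma in zip(tokens, tags, lemmas):
        word_class, features = parse_tag(tag)
        records.append({"token": token, "word_class": word_class,
                        "features": features, "lemma": lemma})
    return records


def find(records, word_class, lemma):
    """Query across two layers: tokens with a given word class and lemma."""
    return [r["token"] for r in records
            if r["word_class"] == word_class and r["lemma"] == lemma]


records = align_layers(ORT, POS, LEM)
# WW is the word class of verbs in the CGN tagset.
print(find(records, "WW", "hebben"))
```

A plain concordancer sees only the surface form "hebt". Once the layers are aligned, the same token can also be found through its lemma "hebben" or its word class, which is the kind of combined search the paragraph above describes.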
Guides and practical introductions
COREX Manual: Detailed manual for CGN and COREX.
CGN webcursus (CGN web course): This online course is intended for anyone wishing to know more about the Corpus Gesproken Nederlands and/or learn how to work with COREX, the CGN search software.
Practicum Corpus Gesproken Nederlands / Corex (Practical: Corpus Gesproken Nederlands / COREX)
Een Verkenning van COREX (An Exploration of COREX)
Zoekacties en gebruikte codes binnen COREX (Searches and codes used in COREX)
Availability
COREX is available on all VU PCs for students and staff members of the Faculty of Humanities.
More information
Over het Corpus Gesproken Nederlands (About the Corpus Gesproken Nederlands): This article provides a brief overview of the Corpus Gesproken Nederlands.