Corpus details

Stevin Nederlandstalig Referentiecorpus

Official name: Stevin Nederlandstalig Referentiecorpus
Common name: SoNaR Corpus
Language type: written
Corpus type: general / reference
Period: 1954-2011
Size: 500 million words
Description: The STEVIN project SoNaR has produced a balanced reference corpus of contemporary (1954-present) written Dutch. It contains over 500 million words (i.e. word tokens) of full texts drawn from a wide variety of text types, covering both conventional media and new media.
Exploration: The SoNaR corpus is available both online (OpenSoNaR) and offline. Please contact Eric Akkerman for further details about using the offline corpus.
Annotation: All texts except those from social media (Twitter, chat, SMS) have been tokenized, tagged for part of speech, and lemmatized.
Origin: Radboud Universiteit Nijmegen, Universiteit Tilburg, Universiteit Twente, Hogeschool Gent, Katholieke Universiteit Leuven, Universiteit Utrecht.
Edition: final release (2013)
Location: The offline version of the SoNaR corpus is available on the faculty network (on request).
The online version OpenSoNaR is available at
Details: SoNaR documentation can be found at
See also:
Name: SoNaR project site
Description: Description of OpenSoNaR at the CLARIN site.
Name: D-Coi project site
Description: D-Coi was a preparatory project for SoNaR. The D-Coi material is incorporated in the SoNaR corpus.
