Digital Humanities Workbench


Lexical databases

Lexical databases contain structured information about words and are usually available online. The main difference between lexical databases and dictionaries is that dictionaries aim to explain or translate words, while lexical databases are primarily developed for research purposes.

WordNet
WordNet is a lexical database that organizes words very differently from a conventional dictionary. It covers English nouns, verbs, adjectives and adverbs and is built around so-called synsets. A synset is a set of words of the same word class that are interchangeable in a certain context and are thus largely synonymous. For example, the words {car, auto, automobile, machine, motorcar} form a synset because they can all be used to refer to the same concept. A synset usually includes a short explanatory gloss (a kind of simple definition), such as "four-wheeled; usually propelled by an internal combustion engine". A very important aspect of WordNet is that synsets are connected by semantic relations such as hyponymy (a more specific concept), hypernymy (a more general concept) and meronymy (a part-whole relation). Read more about WordNet.
Use WordNet online
About WordNet
The Global WordNet Association
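
The synset organization described above can be illustrated with a small hand-built model. This is a minimal sketch, not WordNet's own data format or API: the `Synset` class, the helper functions, and the two more general synsets above "car" are assumptions made for illustration. Only the car synset and its gloss come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A toy synset: interchangeable words of one word class plus a gloss."""
    name: str
    pos: str
    words: frozenset
    gloss: str = ""
    hypernyms: list = field(default_factory=list)  # links to more general synsets

    def hypernym_chain(self):
        """Follow the first hypernym link upwards and collect the synset names."""
        chain = []
        current = self
        while current.hypernyms:
            current = current.hypernyms[0]
            chain.append(current.name)
        return chain


# Hypothetical miniature network; the upper levels are simplified for illustration.
vehicle = Synset("vehicle", "noun", frozenset({"vehicle"}))
motor_vehicle = Synset(
    "motor_vehicle", "noun",
    frozenset({"motor vehicle", "automotive vehicle"}),
    hypernyms=[vehicle],
)
car = Synset(
    "car", "noun",
    frozenset({"car", "auto", "automobile", "machine", "motorcar"}),
    gloss="four-wheeled; usually propelled by an internal combustion engine",
    hypernyms=[motor_vehicle],
)

SYNSETS = [vehicle, motor_vehicle, car]


def synsets_for(word, pos=None):
    """Return every synset containing the word, optionally limited to one word class."""
    return [s for s in SYNSETS if word in s.words and (pos is None or s.pos == pos)]


def are_synonyms(word_a, word_b):
    """Two words count as synonyms here if they share at least one synset."""
    return any(word_b in s.words for s in synsets_for(word_a))


print(are_synonyms("auto", "motorcar"))
print(car.hypernym_chain())
```

In this model, synonymy is membership in the same synset, while hypernymy is a link between synsets rather than between individual words. For real research the data would be queried from WordNet itself, for example through the `nltk.corpus.wordnet` interface of the NLTK library.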

Referentiebestand Nederlands (RBN)
The Referentiebestand Nederlands is a corpus-based lexical database of Dutch with over 45,000 headwords and more than 90,000 examples. For each sense of a word, it provides detailed information on orthography, morphology, syntax, semantics, pragmatics and combinatorics.
The file is an intermediate product which can be used to build other lexicons, such as bilingual dictionaries with Dutch as the source language, and as a reference point for dictionaries with Dutch as the target language. It can also be used as a component of linguistic applications that automatically lemmatize words and/or tag words with information about word class or semantics. The RBN combines two kinds of information: information that is intended primarily for a human user and information that is primarily intended for automatic language processing. The combination of these two types of information makes the RBN a versatile lexicon.
The RBN is available as a Microsoft Access database on the faculty network (G:\FGW\Data\Databases\RBN).
It is also available online via the HLT Agency website, which provides more information about the database.
Link: RBN documentation

CELEX
The expert centre CELEX has developed lexical databases for Dutch, English and German. These databases contain detailed information about the orthography, phonology, morphology, syntax and frequency of words, but no information about their meaning. CELEX data can be used in many types of linguistic research and experimentation. Read more about CELEX.
The University Library holds a CD-ROM containing all CELEX data with accompanying documentation; see Burnage, G., Baayen, R., Piepenbrock, R., & Rijn, H. (1990). CELEX: A guide for users. Nijmegen: CELEX. Extracting information from these data is not easy, however. For help, contact the faculty's ICT Support Unit.
Links: WebCelex (Note: requires Firefox)

MRC Psycholinguistic Database
Words are a central component of much psycholinguistic research. Because a word combines phonological, orthographic, morphological, syntactic and semantic information, it has many features that strongly influence how it is processed by the human cognitive system. To understand how this works, many experiments compare the processing of two or more groups of words that differ in certain characteristics. The MRC Psycholinguistic Database was designed as a resource to aid the selection of suitable (English) words for this kind of research. It contains over 150,000 words, each described by up to 26 linguistic characteristics, such as: number of letters, number of phonemes, phonetic transcription, stress pattern, number of syllables, morphology, part of speech, frequency data, familiarity, concreteness, imageability, meaningfulness, age of acquisition, and status (dialect, archaic, poetic, specialized, etc.).
Link: MRC Online
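
The kind of word selection described above, two groups that differ in one characteristic while being matched on others, can be sketched as follows. This is a minimal illustration under stated assumptions: the `WordRecord` fields, the `select_groups` function, and all values are invented for the example and are not taken from the MRC database or its query interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordRecord:
    word: str
    letters: int
    syllables: int
    concreteness: int  # invented rating; higher means more concrete


# Invented values for illustration only; not MRC data.
LEXICON = [
    WordRecord("table", 5, 2, 600),
    WordRecord("apple", 5, 2, 620),
    WordRecord("truth", 5, 1, 250),
    WordRecord("honor", 5, 2, 270),
    WordRecord("river", 5, 2, 590),
    WordRecord("mercy", 5, 2, 260),
    WordRecord("idea", 4, 3, 240),
]


def select_groups(records, letters, syllables, low_max, high_min):
    """Match words on length and syllable count, then split them by concreteness."""
    matched = [
        r for r in records
        if r.letters == letters and r.syllables == syllables
    ]
    low = sorted(r.word for r in matched if r.concreteness <= low_max)
    high = sorted(r.word for r in matched if r.concreteness >= high_min)
    return low, high


abstract_words, concrete_words = select_groups(
    LEXICON, letters=5, syllables=2, low_max=300, high_min=550
)
print(abstract_words)
print(concrete_words)
```

Holding the number of letters and syllables constant means that any difference in processing between the two groups is less likely to be caused by word length and more likely to reflect concreteness, the characteristic under study.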
