Lexical databases
Lexical databases contain structured information about
words and are usually available online. The main difference between lexical
databases and dictionaries is that dictionaries aim to explain or translate
words, while lexical databases are primarily developed for research purposes.
WordNet
WordNet is a lexical database in which words are organized very differently
from a conventional dictionary. WordNet contains information about
English nouns, verbs, adjectives and adverbs and is organized on the basis of
so-called synsets. A synset is a set of words (in the same word class)
that are mutually replaceable in a certain context, and are thus largely
synonymous. For example, the words {car, auto, automobile, machine,
motorcar} form a single synset because they can all be used to
refer to the same concept. A synset usually includes explanatory commentary
(a kind of simple definition), such as "four-wheeled; usually propelled by an
internal combustion engine." A key feature of WordNet is that synsets
are connected by semantic relations such as hyponymy (a more specific term),
hypernymy (a more general term) and meronymy (a part-whole relation).
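The synset idea can be illustrated with a small sketch. This is a hand-built toy model in Python, not the WordNet software or its data format: the class `Synset`, the helper functions and the two broader synsets (with their glosses) are assumptions made up for the example. Only the car synset and its gloss come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """Toy synset: interchangeable words, a gloss, and links to broader synsets."""
    words: frozenset
    gloss: str
    hypernyms: list = field(default_factory=list)  # more general synsets


def are_synonyms(word_a, word_b, synsets):
    """Two words count as synonyms here if at least one synset contains both."""
    return any(word_a in s.words and word_b in s.words for s in synsets)


def hypernym_chain(synset):
    """Follow the first hypernym link upward and collect the synsets passed."""
    chain = []
    current = synset
    while current.hypernyms:
        current = current.hypernyms[0]
        chain.append(current)
    return chain


# Illustrative miniature network. The two broader synsets and their glosses
# are invented for this sketch and are not quoted from WordNet.
vehicle = Synset(frozenset({"vehicle"}), "something used to transport people or goods")
motor_vehicle = Synset(
    frozenset({"motor vehicle"}), "a self-propelled vehicle", [vehicle]
)
car = Synset(
    frozenset({"car", "auto", "automobile", "machine", "motorcar"}),
    "four-wheeled; usually propelled by an internal combustion engine",
    [motor_vehicle],
)
network = [car, motor_vehicle, vehicle]

if __name__ == "__main__":
    print(are_synonyms("car", "motorcar", network))
    for broader in hypernym_chain(car):
        print(sorted(broader.words), "-", broader.gloss)
```

In this model a hyponym is simply the reverse direction of a hypernym link: car is a hyponym of motor vehicle because motor vehicle appears among its hypernyms.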
Read more about WordNet.
Links:
Use WordNet online
About WordNet
The Global WordNet Association
Referentiebestand Nederlands (RBN)
The Referentiebestand Nederlands is a corpus-based
lexical database of Dutch with over 45,000 headwords and more than 90,000
examples. It includes detailed information regarding the orthography,
morphology, syntax, semantics, pragmatics and combinatorics for each sense of
a word.
The file is an intermediate product which can be used to
build other lexicons, such as bilingual dictionaries with Dutch as the source
language, and as a reference point for dictionaries with Dutch as the target
language. It can also be used as a component of linguistic applications that
automatically lemmatize words and/or tag words with information about word class
or semantics. The RBN combines two kinds of information: information that is
intended primarily for a human user and information that is primarily intended
for automatic language processing. The combination of these two types of
information makes the RBN a versatile lexicon.
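As a rough illustration of how a lexicon can serve as a component for lemmatizing and tagging, the sketch below looks up word forms in a small table. The table `LEXICON`, the function names and the tag labels are hypothetical and chosen only for this example; they do not reflect the actual structure or field names of the RBN.

```python
# Toy lexicon: inflected form -> list of (lemma, word class) readings.
# The entries are a handful of ordinary Dutch words typed in by hand.
LEXICON = {
    "huizen": [("huis", "noun")],
    "liep": [("lopen", "verb")],
    "boeken": [("boek", "noun"), ("boeken", "verb")],  # ambiguous form
}


def analyse(token):
    """Return all known readings of a token, or a single 'unknown' reading."""
    return LEXICON.get(token.lower(), [(token, "unknown")])


def tag_tokens(tokens):
    """Pair every token with its list of possible readings."""
    return [(token, analyse(token)) for token in tokens]


if __name__ == "__main__":
    for token, readings in tag_tokens(["Hij", "liep", "langs", "de", "huizen"]):
        print(token, readings)
```

A form such as "boeken" receives two readings here; choosing between them would require context, which a plain lookup like this does not take into account.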
The RBN is available as a Microsoft Access database on the faculty network (G:\FGW\Data\Databases\RBN).
It is also available online via the HLT Agency website, which contains more information
about this database.
Link: RBN documentation
CELEX
The expert centre CELEX has developed lexical databases for Dutch, English and
German. These databases contain detailed information about the orthography,
phonology, morphology, syntax and frequency of words, but no information about
their meaning. CELEX data can be used in different types of linguistic research
and linguistic experimentation. Read more about CELEX.
The University Library holds a CD-ROM containing all CELEX data with accompanying documentation; see
Burnage, G., Baayen, R., Piepenbrock, R., & Rijn, H. (1990). CELEX: A guide for users. Nijmegen: CELEX.
It is not easy, however, to extract information from these data. For help, you can get in
touch with the faculty's ICT Support Unit.
Links: WebCelex (Note: requires Firefox)
MRC Psycholinguistic Database
Words are a central component of much psycholinguistic
research. Words, being a combination of phonological, orthographic, morphological,
syntactic and semantic information, have many features that strongly influence
how they are processed by the human cognitive system. In order to understand
how this works, many experiments compare how two or more groups of words that
differ in certain characteristics are processed. The MRC
Psycholinguistic Database was designed as a resource to aid the selection of
relevant (English) words for this kind of research. It contains over 150,000
words, each with 26 different linguistic characteristics, such as:
number of letters, number of phonemes, phonetic transcription, stress pattern,
number of syllables, morphology, part of speech, frequency data, familiarity,
concreteness, imageability, meaningfulness, age of acquisition, and status
(dialect, archaic, poetic, specialized, etc.).
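Selecting word groups for this kind of experiment can be sketched as a filter over attribute values. The example below keeps word length constant and varies concreteness. It is only a sketch under stated assumptions: the `Entry` class, the function `select_words` and above all the concreteness numbers are invented for illustration and are not taken from the MRC database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    word: str
    concreteness: int  # made-up rating for this sketch; higher = more concrete


# Hypothetical mini data set; the ratings are NOT real MRC values.
ENTRIES = [
    Entry("table", 600),
    Entry("stone", 610),
    Entry("truth", 300),
    Entry("bird", 590),
    Entry("idea", 280),
    Entry("justice", 290),
]


def select_words(entries, n_letters, concreteness_range):
    """Return words of the given length whose rating lies in the closed range."""
    low, high = concreteness_range
    return sorted(
        e.word
        for e in entries
        if len(e.word) == n_letters and low <= e.concreteness <= high
    )


# Two groups matched on length (5 letters) but differing in concreteness.
concrete_group = select_words(ENTRIES, 5, (500, 700))
abstract_group = select_words(ENTRIES, 5, (100, 400))

if __name__ == "__main__":
    print("concrete:", concrete_group)
    print("abstract:", abstract_group)
```

Holding one attribute fixed while varying another is the point of the design: any difference in how the two groups are processed can then be attributed more plausibly to the varied attribute.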
Link: MRC Online