Digital Humanities Workbench
Corpus analysis
Corpus analysis is an empirical research strategy that is widely used in language research. It relies on authentic language material, that is, language that is actually attested in real use. A corpus (also called a text corpus) is a digital collection of texts, text fragments and/or transcripts of spoken language, selected so that together they represent a particular language, dialect or text type as faithfully as possible. This makes the collection as a whole a reliable source for linguistic research, whether that research is descriptive or exploratory, or designed to test linguistic hypotheses. Many corpora have been developed worldwide for use by linguistic researchers; see the faculty corpus overview for the corpora available to staff and students of our faculty. If the linguistic material you want to investigate has not yet been included in a corpus, you may have to build your own. In both cases, the usability of a corpus depends strongly on its composition and design. When working with an existing corpus, it is therefore important to find out exactly how it was composed and designed; when building your own corpus, consider carefully what your research requires.
Tasks and activities
Tools
Various tools can be used at the different stages of corpus research (see above). A brief overview of the most important tools available to staff and students of our faculty is given in the table below; each program name links to a more detailed description.
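Two operations that corpus tools commonly support are word frequency lists and keyword-in-context (KWIC) concordances, which show every occurrence of a search word with its immediate left and right context. The sketch below illustrates both on a tiny in-memory corpus. It is not one of the programs in the table and not how any particular tool works internally: the three sample sentences are invented, and the regular-expression tokenizer is a deliberately simple assumption that ignores issues such as hyphenation, multi-word units and lemmatisation.

```python
import re
from collections import Counter

# Invented sample "corpus": three short texts, for illustration only.
corpus = [
    "The corpus is a collection of texts.",
    "A corpus should represent the language under study.",
    "Researchers query the corpus to test hypotheses.",
]


def tokenize(text: str) -> list[str]:
    # Simplistic tokenizer (an assumption): lower-cased runs of word
    # characters and apostrophes; punctuation is discarded.
    return re.findall(r"[\w']+", text.lower())


def frequency_list(texts: list[str], top: int = 10) -> list[tuple[str, int]]:
    # Count every token across all texts and return the `top` most frequent.
    counts: Counter[str] = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts.most_common(top)


def kwic(texts: list[str], keyword: str, width: int = 3) -> list[str]:
    # Build one concordance line per occurrence of `keyword`, showing up to
    # `width` tokens of left and right context around the bracketed hit.
    keyword = keyword.lower()
    lines = []
    for text in texts:
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append(f"{left:>30} [{tok}] {right}")
    return lines


if __name__ == "__main__":
    for line in kwic(corpus, "corpus"):
        print(line)
    print(frequency_list(corpus, top=3))
```

Right-aligning the left context lines up the search word in a single column, which is the usual way concordance output is displayed so that recurring patterns around the keyword are easy to spot.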
More information
Corpus linguistics: online tutorial based on the book Corpus linguistics by T. McEnery & A. Wilson (Edinburgh University Press, 1996). [Available at the VU University Library]
McEnery, T., R. Xiao and Y. Tono (2006). Corpus-based language studies: an advanced resource book. London: Routledge.
Journals: International Journal of Corpus Linguistics (IJCL) and Corpora