Digital Humanities Workbench
Basic text analysis
In this workbench, we use the term basic text analysis for the analysis of the occurrence
and usage of certain words in a text or a collection of texts. One might also call this lexical analysis;
essentially it focuses on the vocabulary of texts.
This can involve the frequency of words in a text (e.g. in contrastive analysis), but it
usually comes down to searching for certain words or phrases, word patterns and
annotations in a text. Other word-related aspects, such as measuring the
distribution of certain words in a text and establishing a text-specific
vocabulary, are also part of this method of analysis.
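As a rough illustration of what measuring the distribution of a word can mean, the sketch below splits a text into roughly equal parts and counts the hits per part. The function names (`tokenize`, `dispersion`), the simple tokenisation rule and the sample sentence are assumptions made for this example; they are not taken from any of the programs discussed on this page.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Assumption: a word is a run of the letters a-z and apostrophes,
    so accented characters and digits are ignored in this simple sketch.
    """
    return re.findall(r"[a-z']+", text.lower())

def dispersion(tokens, word, parts=4):
    """Count how often `word` occurs in each of `parts` roughly equal segments."""
    counts = [0] * parts
    for i, tok in enumerate(tokens):
        if tok == word:
            # Map the token position to the index of the segment it falls in.
            counts[i * parts // len(tokens)] += 1
    return counts

# Hypothetical sample text, for illustration only.
sample = ("The sea was calm. The ship left the harbour. Rain fell on the town. "
          "Later the sea rose and the sea roared.")
tokens = tokenize(sample)
print(dispersion(tokens, "sea", parts=3))  # [1, 0, 2]
```

In this sample the word "sea" is absent from the middle third of the text and concentrated at the end, which is the kind of pattern a distribution plot in a concordance program makes visible.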
Tools
Software for lexical text analysis makes it possible to examine quickly and efficiently how certain words are used in a text or a collection of texts, answering questions such as the following: how often do they occur? In which context do they occur? With which other words are they combined? In which part of the text do they occur? A detailed overview of the various functions of these programs is available separately. Such software thus allows researchers to approach texts in ways other than (linear) reading, and it offers many more specific ways to analyse texts than programs such as Word.
Online texts are often only partially searchable: you can search only those elements that the creator of the website has made searchable (or you can search the web page on which the texts are displayed by using the default browser search function, Ctrl+F). More detailed analyses are usually not possible.
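Two of the questions listed above (in which context does a word occur, and with which other words is it combined) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not how any particular concordance program works: the function names `kwic` and `collocates`, the window size and the sample sentence are all invented for this example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens (a-z and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def kwic(tokens, keyword, window=3):
    """Keyword in context: return (left context, keyword, right context) per hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines

def collocates(tokens, keyword, window=3):
    """Count the words that occur within `window` tokens of the keyword."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            counts.update(tokens[max(0, i - window):i])
            counts.update(tokens[i + 1:i + 1 + window])
    return counts

# Hypothetical sample text, for illustration only.
sample = "The old sea captain watched the sea. The sea was calm and the captain slept."
tokens = tokenize(sample)

# Print each hit with its left context right-aligned, as concordance lines usually are.
for left, key, right in kwic(tokens, "sea", window=2):
    print(f"{left:>20} [{key}] {right}")

print(collocates(tokens, "sea", window=2).most_common(3))
```

Dedicated programs add sorting, filtering and statistical measures on top of this basic idea, which is why they remain the more practical choice for real research material.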
Thematic analysis
In thematic analysis it is usually not clear in advance exactly which words are to be searched for. You can approach it in the following ways:
Working from a frequency list is (much) less labour-intensive than annotating a text. In addition, it is a more intuitive and flexible process than annotation (which is usually applied only once and rarely changed afterwards, so that it can end up steering the analysis). A disadvantage compared to annotation is that words often have homonyms that are not relevant to your research and that you will have to filter out of the concordance. Moreover, not all concepts, thematic aspects and the like can be described with specific words. A particular theme may very well be represented by words that you would not initially associate with it.
Note: in order to use programs like this, it is usually necessary to prepare the text you want to analyse. This depends partly on the format in which the file has been stored (see: preparation and file formats). It may also be useful, or even necessary, to enhance the text with certain structural or analytical information (see: formal annotation).
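The frequency-list approach described above can be sketched as follows. The stop list, the function name `frequency_list` and the sample sentence are assumptions chosen for this example; a real project would use a fuller stop list and a prepared text file.

```python
import re
from collections import Counter

# A tiny illustrative stop list (an assumption); real stop lists are much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "on", "to", "was", "is", "it"}

def frequency_list(text, stopwords=STOPWORDS):
    """Return a Counter of word frequencies, leaving out the stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tok for tok in tokens if tok not in stopwords)

# Hypothetical sample text, for illustration only.
sample = "The bank of the river was muddy. The bank closed early, and the river rose."
freq = frequency_list(sample)

# List the most frequent content words as candidates for a theme.
for word, count in freq.most_common(5):
    print(word, count)
```

The sample also shows the homonym problem mentioned above: "bank" is counted twice, but only one occurrence refers to a riverbank. A frequency list cannot tell the two apart, so the irrelevant hit still has to be filtered out by reading the concordance lines.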
Software
Some concordance programs that were commonly used in the humanities in the past are Oxford Concordance Program and TACT. Nowadays, WordSmith (for which our faculty has a licence) and Concordance are quite popular. AntConc is a freeware program that is available on all VU PCs for students and staff members of the Faculty of Humanities. The online program Voyant Tools is currently growing in popularity because of the ways in which it allows users to visualise word usage in a text, such as word clouds, bubble lines, scatter plots and networks.