Digital Humanities Workbench


Functions of text analysis software

The current generation of text analysis programs offers a range of functions; the most common ones are listed below.
Note: in what follows, 'text' may also be read as 'collection of texts'.

  • The production of frequency lists
    A frequency list shows how often each word occurs in a text. It can be presented in different ways: sorted by frequency (ascending or descending), alphabetically, or as a retrograde word list, in which words are sorted alphabetically by their endings (that is, read from right to left).
  • The production of concordances
    A concordance lists every occurrence of all the words in a text, or of a selection of words, together with its location and immediate context. The user can usually set the size of the context, and the concordance can be sorted on the words to the left or to the right of the keyword. Example of a concordance of the word forms grow, grew and grown in the novel Alice in Wonderland, ordered by right context.
  • Searching for words and phrases
    There are several ways to search for words or phrases in a text. Many programs support so-called wildcards for finding words that begin or end with certain letters (e.g. all words that start with love or end with ness). It is usually also possible to require or exclude combinations with certain other words, and to search for several words at once (alternation: 'word A' or 'word B' or 'word C'). The results of a search are usually shown as a concordance.
  • Plotting words
    A plot gives a graphical overview of the places where a word or phrase occurs, showing how its occurrences are distributed across the text.
  • Analysing word combinations
    Here, the software analyses which other words a given word is typically combined with. This can involve simply counting frequent phrases, but also identifying collocations: two or more words form a collocation if they occur together more often than would be statistically expected.
  • Investigating text-specific vocabulary
    This involves identifying the words that are characteristic of a text: words that occur in it markedly more often than in other texts, or only in that text. This is usually done by statistically comparing the frequency list of the text with a frequency list based on a large collection of other texts (the so-called reference file).
  • Visualization
    Although plotting words (see above) has long been a standard function of text analysis software, modern programs also offer many other ways to visualize aspects of word usage in a text, such as word clouds, bubble lines, scatter plots and networks. Voyant Tools is an example of a program with extensive text analysis functionality, including many visualization features.
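The frequency lists described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular program works: the helper names `tokenize` and `frequency_list` are hypothetical, and the tokenizer is a deliberate simplification that lowercases the text and treats every run of letters as a word (so "don't" becomes "don" and "t").

```python
import re
from collections import Counter

def tokenize(text):
    # Simplistic tokenizer (an assumption): lowercase, then keep runs of letters only.
    return re.findall(r"[^\W\d_]+", text.lower())

def frequency_list(text, order="desc"):
    """Return (word, count) pairs in one of the orders described above."""
    counts = Counter(tokenize(text))
    if order == "desc":
        # Most frequent first; ties broken alphabetically.
        return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if order == "asc":
        return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))
    if order == "alpha":
        return sorted(counts.items())
    if order == "retrograde":
        # Sort on the reversed spelling, so words with the same ending cluster together.
        return sorted(counts.items(), key=lambda kv: kv[0][::-1])
    raise ValueError(f"unknown order: {order!r}")

print(frequency_list("The cat sat on the mat."))
# [('the', 2), ('cat', 1), ('mat', 1), ('on', 1), ('sat', 1)]
print(frequency_list("The cat sat on the mat.", order="retrograde"))
# [('the', 2), ('on', 1), ('cat', 1), ('mat', 1), ('sat', 1)]
```

In the retrograde list, cat, mat and sat end up next to each other because they share the ending -at.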
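A concordance of the kind described above is often called a keyword-in-context (KWIC) display. The Python sketch below is a hypothetical illustration: it uses the token position as the "location", a `width` parameter as the user-set context size, and sorts the lines on the right or left context. The sample sentence is invented for the example and is not a quotation from Alice in Wonderland.

```python
import re

def concordance(text, targets, width=3, sort_by="right"):
    """Return (position, left context, keyword, right context) for each hit."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    wanted = {t.lower() for t in targets}
    lines = []
    for i, tok in enumerate(tokens):
        if tok in wanted:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append((i, left, tok, right))
    if sort_by == "right":
        # Compare right contexts word by word, starting next to the keyword.
        lines.sort(key=lambda line: line[3])
    elif sort_by == "left":
        # Compare left contexts starting from the word nearest the keyword.
        lines.sort(key=lambda line: line[1][::-1])
    return lines

def show(lines):
    # Right-align the left context so the keywords line up in one column.
    for i, left, kw, right in lines:
        print(f"{i:>4}  {' '.join(left):>22}  [{kw}]  {' '.join(right)}")

sample = "Alice began to grow. The tree grew tall. She had grown up."
show(concordance(sample, ["grow", "grew", "grown"]))
```

Sorting on the right context puts "grew tall" before "grow. The" because "tall" precedes "the" alphabetically.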
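Wildcard search, alternation and the require/exclude options can be sketched with Python's standard `fnmatch` module, which supports shell-style patterns such as `love*` and `*ness`. The function `search`, its `window` parameter (how many words on either side count as "in combination with"), and the single-word `require`/`exclude` options are assumptions for illustration; real programs define their own query syntax and notion of proximity.

```python
import fnmatch
import re

def search(text, patterns, require=None, exclude=None, window=5):
    """Return (position, word) for tokens matching any pattern (alternation)."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    patterns = [p.lower() for p in patterns]
    hits = []
    for i, tok in enumerate(tokens):
        if not any(fnmatch.fnmatchcase(tok, p) for p in patterns):
            continue
        # Words within `window` positions on either side of the hit.
        nearby = set(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
        if require is not None and require.lower() not in nearby:
            continue
        if exclude is not None and exclude.lower() in nearby:
            continue
        hits.append((i, tok))
    return hits

text = "Lovely kindness, loveless sadness. The lover felt happiness."
print(search(text, ["love*", "*ness"]))
print(search(text, ["love*", "*ness"], exclude="the", window=1))
```

The hits could then be passed to a concordance routine, matching the common practice of showing search results as a concordance.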
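The kind of plot described under "Plotting words" can be approximated in plain text by dividing the text into equal segments and marking each segment in which the word occurs. The `dispersion` function and its `bins` parameter are hypothetical; graphical programs present the same information as a chart.

```python
import re

def dispersion(text, word, bins=50):
    """Return the token positions of `word` and a one-line text plot of where they fall."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    n = len(tokens)
    positions = [i for i, tok in enumerate(tokens) if tok == word.lower()]
    row = ["-"] * bins
    for i in positions:
        # Map token position 0..n-1 onto one of `bins` equal-width segments.
        row[i * bins // n] = "|"
    return positions, "".join(row)

positions, plot = dispersion("a b a c d e f g h a", "a", bins=10)
print(positions, plot)
```

With ten tokens and ten segments each segment holds one word, so the plot marks the first, third and last positions.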
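One common way to test whether two words occur together more often than expected is pointwise mutual information (PMI): PMI(x, y) = log2(f(xy) · N / (f(x) · f(y))), which compares the observed frequency of a pair with the frequency expected if the two words were independent. PMI is only one of several association measures (others include the t-score and log-likelihood), and the text above does not say which one a given program uses. This sketch makes two simplifying assumptions: it considers only adjacent word pairs, and it uses the number of tokens N as the sample size. The `min_freq` cutoff drops rare pairs, for which PMI is known to be unreliable.

```python
import math
import re
from collections import Counter

def collocations(text, min_freq=2):
    """Rank adjacent word pairs by PMI, highest first."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    n = len(tokens)
    word_freq = Counter(tokens)
    pair_freq = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), f12 in pair_freq.items():
        if f12 < min_freq:
            continue
        # Pair count expected if w1 and w2 occurred independently of each other.
        expected = word_freq[w1] * word_freq[w2] / n
        scores[(w1, w2)] = math.log2(f12 / expected)
    return sorted(scores.items(), key=lambda kv: -kv[1])

text = "White rabbit ran. The white rabbit hid. The cat ran."
print(collocations(text))
```

Here "white rabbit" occurs twice, while independence would predict only 2 × 2 / 10 = 0.4 occurrences, giving a PMI of log2(5).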
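The statistical comparison with a reference file can be illustrated with the log-likelihood (G²) statistic, which is widely used for keyword analysis in corpus linguistics; other programs may use chi-square or different measures. For a word that occurs a times in a text of c words and b times in a reference of d words, the expected frequencies are E1 = c(a + b)/(c + d) and E2 = d(a + b)/(c + d), and G² = 2(a ln(a/E1) + b ln(b/E2)). The function names and the toy counts are hypothetical.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """G2 for a word seen a times in a text of c words and b times in a reference of d words."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the text
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference
    g2 = 0.0
    # A term with an observed count of 0 contributes 0 (the limit of x*ln(x) as x -> 0).
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2

def keywords(text_counts, ref_counts, top=10):
    """Rank words that are relatively more frequent in the text than in the reference."""
    c = sum(text_counts.values())
    d = sum(ref_counts.values())
    scored = []
    for word, a in text_counts.items():
        b = ref_counts.get(word, 0)
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda kv: -kv[1])
    return scored[:top]

text_counts = Counter({"the": 50, "rabbit": 10, "hatter": 4})
ref_counts = Counter({"the": 500, "rabbit": 1})
print(keywords(text_counts, ref_counts))
```

"hatter" does not occur in the reference at all, yet it ranks below "rabbit": with only four occurrences, the evidence that it is characteristic of the text is weaker.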