Digital Humanities Workbench



Stylometry

Stylometry is the study of measurable features of style, such as word and sentence length, various frequencies (of words, word lengths, word forms, etc.), vocabulary richness, use of punctuation, use of certain expressions and preferences for certain spelling variants. Statistical analysis has always been an important pillar of stylometry. Techniques in the field of artificial intelligence are now also being employed for stylometry.
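The features listed above can be measured quite simply. The sketch below assumes plain English text and uses a naive tokenizer and sentence splitter. The function names (`tokenize_words`, `split_sentences`, `style_profile`) and the choice of features are hypothetical and chosen only for illustration. It computes a few of the measures mentioned: mean word length, mean sentence length, vocabulary richness (as a type-token ratio) and the use of punctuation (commas per 100 words).

```python
import re
from collections import Counter


def tokenize_words(text: str) -> list[str]:
    # Lowercased word tokens; internal apostrophes are kept (e.g. "don't")
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def split_sentences(text: str) -> list[str]:
    # Naive split after ., ! or ? followed by whitespace; abbreviations will confuse it
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def style_profile(text: str) -> dict[str, float]:
    words = tokenize_words(text)
    sentences = split_sentences(text)
    if not words or not sentences:
        raise ValueError("text contains no words")
    counts = Counter(words)
    return {
        "mean_word_length": sum(len(w) for w in words) / len(words),
        "mean_sentence_length": len(words) / len(sentences),
        # Type-token ratio: distinct words / total words (a crude richness measure
        # that drops as texts get longer, so compare texts of similar length)
        "type_token_ratio": len(counts) / len(words),
        # Punctuation feature: commas per 100 words
        "commas_per_100_words": 100 * text.count(",") / len(words),
    }


print(style_profile("The cat sat. The dog, however, ran away!"))
# {'mean_word_length': 3.625, 'mean_sentence_length': 4.0,
#  'type_token_ratio': 0.875, 'commas_per_100_words': 25.0}
```

Real stylometric studies use much larger texts and more careful tokenization. The type-token ratio in particular depends strongly on text length.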

An important application of stylometry is authorship attribution. In authorship attribution, individual style elements of one or more texts are examined in order to determine who wrote them. Authorship attribution tries to help answer questions such as:

  • Which of the 35 speeches attributed to Lysias are actually his?
  • Was the Historia Augusta written by one author or by six?
  • Did the same person write the Iliad and the Odyssey?
  • Were the plays attributed to Shakespeare all actually written by him?
  • What is the origin of the books shared by Aristotle's Ethica Nicomachea and the Ethica Eudemia, which is attributed to one of his pupils (see Wake 1957)?
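Many attribution methods compare the frequencies of the most common words in a disputed text with those in texts by candidate authors. This page does not describe a specific method. The sketch below implements one widely used technique, John Burrows's Delta, and should not be read as the method behind any of the studies mentioned here. It works as follows:

1. Take the n most frequent words across the candidates' texts as features.
2. Convert each text's relative frequencies into z-scores using the mean and standard deviation across the candidates.
3. Score each candidate by the mean absolute difference between its z-scores and those of the disputed text. A lower score means a closer style.

The function names and the toy texts are hypothetical.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text: str) -> list[str]:
    # Lowercased alphabetic word tokens
    return re.findall(r"[a-z]+", text.lower())


def relative_freqs(tokens: list[str], vocab: list[str]) -> list[float]:
    # Frequency of each feature word relative to the length of the text
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in vocab]


def burrows_delta(candidates: dict[str, str], disputed: str,
                  n_features: int = 30) -> dict[str, float]:
    """Return a Delta score per candidate; the lowest marks the closest style."""
    tokenized = {name: tokenize(t) for name, t in candidates.items()}
    # Features: the n most frequent words in all candidate texts combined
    combined = Counter(w for toks in tokenized.values() for w in toks)
    vocab = [w for w, _ in combined.most_common(n_features)]
    profiles = {name: relative_freqs(toks, vocab) for name, toks in tokenized.items()}
    target = relative_freqs(tokenize(disputed), vocab)

    # Mean and (population) standard deviation of each feature across candidates
    columns = list(zip(*profiles.values()))
    mus = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]
    keep = [i for i, sd in enumerate(sds) if sd > 0]  # skip features that never vary
    if not keep:
        raise ValueError("no feature varies across the candidate texts")

    def zscores(vec: list[float]) -> list[float]:
        return [(vec[i] - mus[i]) / sds[i] for i in keep]

    target_z = zscores(target)
    # Delta: mean absolute difference between the z-scores
    return {name: mean(abs(a - b) for a, b in zip(zscores(p), target_z))
            for name, p in profiles.items()}


scores = burrows_delta(
    {"A": "upon the hill upon the road upon the river the end",
     "B": "while the hill while the road while the river the end"},
    "upon the field upon the way the end")
print(min(scores, key=scores.get))  # A
```

With only two tiny toy "authors" the z-scores are trivial. In practice each candidate is represented by many texts, or by long samples, and the number of features is usually in the hundreds.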

Authorship attribution also plays an important role in forensic linguistics.

Title page of the first collection of The Federalist Papers (1788).

An often-cited application of stylometry is determining the authorship of the “Federalist Papers”. These were a series of articles published in 1787-88 to promote the ratification of the new United States Constitution. They were written by three authors, John Jay, Alexander Hamilton and James Madison, under the pseudonym “Publius”. The author of most articles was known, but the authorship of others was disputed between Hamilton and Madison. In the early 1960s, the researchers Mosteller and Wallace tried to resolve this uncertainty with stylometric methods. In particular, they used a statistical analysis of how often each author used common function words.
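The starting point of such a function-word analysis is counting how often certain marker words occur per 1,000 words. The sketch below covers only that counting step plus a deliberately naive comparison. It does not reproduce Mosteller and Wallace's Bayesian analysis. The marker words, the function names (`rates_per_thousand`, `closest_author`) and the toy texts are hypothetical choices for illustration.

```python
import re
from collections import Counter


def rates_per_thousand(text: str, markers: list[str]) -> dict[str, float]:
    # Occurrences of each marker word per 1,000 word tokens
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        raise ValueError("text contains no words")
    counts = Counter(tokens)
    return {m: 1000 * counts[m] / len(tokens) for m in markers}


def closest_author(disputed: str, known: dict[str, str], markers: list[str]) -> str:
    # Naive stand-in for a real statistical model: choose the author whose
    # marker-word rates differ least from the disputed text (sum of absolute differences)
    target = rates_per_thousand(disputed, markers)

    def distance(author_text: str) -> float:
        rates = rates_per_thousand(author_text, markers)
        return sum(abs(rates[m] - target[m]) for m in markers)

    return min(known, key=lambda name: distance(known[name]))


print(rates_per_thousand("Upon reflection, we rely upon the people, while others wait.",
                         ["upon", "while", "whilst"]))
# {'upon': 200.0, 'while': 100.0, 'whilst': 0.0}
```

Rates per 1,000 words make texts of different lengths comparable. Because rare marker words fluctuate strongly in short texts, a serious study models that variation statistically instead of comparing raw rates.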

An interesting example of a relatively simple program in this area is Signature. It was developed to support stylometric analysis and text comparison, with special attention to authorship attribution.

More advanced tools for stylometric analysis include stylo, a package for the statistical programming environment R. However, these tools are more complex to install and use.

More information

Grieve, J.W. (2005). Quantitative authorship attribution: A history and an evaluation of techniques. Master's thesis, Dept. of Linguistics, Simon Fraser University.
http://summit.sfu.ca/system/files/iritems1/8840/etd1721.pdf (consulted on 15-3-2016).