Digital Humanities Workbench
Transcription of text

If you want to use the computer to analyse textual sources, the digital images of those sources must be converted to computer-readable text. For relatively recent printed documents, this can often be done by optical character recognition (OCR). For historical printed documents, however, OCR often does not produce satisfactory results, although progress has certainly been made in this area over the last decade (see the section about digitisation for more information about OCR). For handwritten documents (such as historical manuscripts, letters and children's writing), OCR is usually very problematic, if possible at all.

Therefore, many older printed documents and most handwritten manuscripts have to be transcribed manually. Usually, this is done in the form of a so-called diplomatic transcription, which follows the original document as closely as possible. In a normalized (also called regularized) transcription, by contrast, the original text is cleaned up to make it more easily readable, e.g. by using modern orthography. Because a normalized transcription can be made on the basis of a diplomatic transcription, but not vice versa, diplomatic transcriptions are often preferred.

Making a transcription means that decisions have to be made about how to deal with certain aspects of the original text: page layout (including line length); typeface (capitalization, use of bold and italics, underline, strikeout, accent markers); punctuation (or lack of it); illegible text; older spelling and misspelling; archaic abbreviations; handwritten notes in printed text; and images and drawings in the text. It is important that all transcription decisions you make in this respect are well documented.
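The one-way relationship between the two kinds of transcription described above can be illustrated with a minimal sketch. The rule tables and the sample text below are hypothetical and serve only as an illustration; a real project would define and document its own normalization rules. The function joins the original line breaks, replaces the long s and modernizes a few spellings.

```python
# Hypothetical normalization rules, for illustration only.
CHAR_RULES = {"ſ": "s"}  # long s becomes modern s
WORD_RULES = {
    "haue": "have",
    "seene": "seen",
    "booke": "book",
    "&": "and",
}


def normalize(diplomatic: str) -> str:
    """Derive a normalized reading text from a diplomatic transcription."""
    # A diplomatic transcription keeps the original line breaks;
    # the normalized text joins them into running prose.
    lines = [line.strip() for line in diplomatic.splitlines() if line.strip()]
    text = " ".join(lines)

    # Character-level replacements.
    for old, new in CHAR_RULES.items():
        text = text.replace(old, new)

    # Whole-word replacements (words with attached punctuation are left as-is).
    words = [WORD_RULES.get(word, word) for word in text.split(" ")]
    return " ".join(words)


diplomatic = "I haue ſeene\nthe booke & read it"
print(normalize(diplomatic))
```

Because several original forms can map to the same modern form ("ſ" and "s" both end up as "s", and the line breaks are discarded), the diplomatic text cannot be reconstructed from the normalized output. That is why the diplomatic transcription is the safer one to make first.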
Collaboration and crowd sourcing

As with many modern applications, transcription can be done online, which enables groups of students and/or scholars to work together on the transcription of a single (larger) document or a collection of documents. In a growing number of larger transcription projects (usually conducted by academic departments, libraries or digital archives), participation is not restricted to the research group: all interested individuals are invited to take part. Examples of such crowd sourcing transcription projects are Transcribe Bentham (a double award-winning collaborative transcription initiative, which is digitising the unpublished manuscripts of the philosopher and reformer Jeremy Bentham and making the digital images available through a platform known as the Transcription Desk), Making History - Transcribe (Virginia Memory) and Smithsonian Digital Volunteers, but nowadays there are many more projects of this kind. Usually this transcription method implies a workflow in which all participants may be involved in transcribing and in reviewing the work of others, followed by a final check and approval by the project team.
Tools

You can make a transcription of a document by opening two windows: one in which the digital image is displayed and one in which you transcribe it with an editor (as txt, HTML, XML or rtf / docx). However, a number of dedicated tools are available to support the transcription process:
Transcript
FromThePage
Transkribus
eLaborate