Digital Humanities Workbench


Using sources that have already been digitized for your research has a number of advantages:

  • Digitization is a time-consuming activity, so using sources that have already been digitized can be a big time-saver.
    Note: If the digitized file is a digital image of the original source, it must still be transcribed before certain types of digital analysis can be performed on it.
  • Many digitized sources contain metadata and/or annotations, which provide additional opportunities for analysis. Metadata may include, for example, data indicating how the source was obtained, or its relationship with other resources (which may be located elsewhere). Additionally, there might be information about the dimensions of the original source and about the original appearance and context of use of the source.
  • New developments have made it possible to combine and analyse information from different sources or data sets (see the Linked Open Data page). This makes it easier than before to place source material in context.

Using existing digital sources and/or data sets also has drawbacks that you need to take into account:


  • Material from an archive or a collection has been selected by other people. It is not always clear what selection criteria were used, so you cannot always know how representative the collection is of the phenomenon you want to study.
  • When you are using existing annotated sources, bear in mind that annotations are never theory-neutral. To a certain extent, this also applies to metadata.
  • Another aspect is how close a digitized source actually is to the original, and how any differences could affect the analysis. During digitization some physical characteristics of the source can be lost (dimensions, brush strokes on a painting, the third dimension in the case of sculptures and architecture). Digitized texts often lack the original layout and the original book cover.
  • In addition, the quality of the digitized version is not always optimal, which can also affect the results of the analysis.


  • The provenance of sources is not always clear. Take a scan or photo of a source published on a personal web page, where metadata are missing or unreliable. It may be unclear whether the photo shows the full source or only a part of it. Moreover, it is always possible that the source was digitally manipulated in some way. Although manipulating (historical) documents is not a new phenomenon - removing names from documents or erasing people from photographs was common practice in the former Soviet Union - digital techniques have made manipulating sources a lot easier. In addition, digitized literary and philosophical texts do not always state which edition of the text was digitized.
  • Large collections of digitized sources (such as Google Books) often rely on automatically generated metadata. Such metadata can contain errors, which makes the collections less useful and can affect the results of the analysis.
  • Where good metadata can help contextualize a source, a lack of clarity regarding the origin of a digitized source can decontextualize it.
  • Another point of concern in this context is that more and more source material is being made available through social media sites such as YouTube and Flickr. It is often difficult to identify the origin of this material, and the descriptive information is submitted by individuals using categories they have defined themselves. Because there are no clear criteria for these descriptions (as opposed to descriptive data in an institutionalized archive), the metadata on these websites is not always of the highest quality.
  • In particular, a lot of born-digital material can be very unstructured, making its preparation for analysis a very time-consuming process.
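
The concerns about missing or unreliable metadata raised above can be checked, at least in part, before the analysis starts. The following Python sketch is a minimal illustration under stated assumptions, not a prescribed method: it assumes that each source is described by a simple dictionary of metadata, and the field names (creator, date, source, edition) and the sample records are hypothetical. It only reports which fields are absent or empty; it cannot tell whether the values that are present are correct.

```python
# Hypothetical field names; real archives use their own metadata schemas.
REQUIRED_FIELDS = ("creator", "date", "source", "edition")


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in a metadata record."""
    missing = []
    for field in required:
        value = record.get(field)
        # Treat None and whitespace-only strings as missing.
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def provenance_report(records, required=REQUIRED_FIELDS):
    """Map each incomplete record's identifier to its missing fields."""
    report = {}
    for index, record in enumerate(records):
        gaps = missing_fields(record, required)
        if gaps:
            # Fall back to the list position when a record has no "id".
            report[record.get("id", index)] = gaps
    return report


if __name__ == "__main__":
    # Invented sample records, for illustration only.
    sample = [
        {"id": "scan-001", "creator": "Unknown", "date": "1923",
         "source": "Municipal archive", "edition": "2nd"},
        {"id": "scan-002", "creator": "", "date": "1931", "source": None},
    ]
    print(provenance_report(sample))
    # prints {'scan-002': ['creator', 'source', 'edition']}
```

A report like this shows at a glance which sources in a collection need further provenance research before they are used.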