Implications
Using sources that have already been digitized for your research has a number
of advantages:
- Digitization is a time-consuming activity, so using sources that have already been
digitized can be a big time-saver.
Note: If the digitized file is a digital
image of the original source, it must still be transcribed before certain
types of digital analysis can be performed on it.
- Many digitized sources contain metadata and/or annotations, which provide
additional opportunities for analysis. Metadata may include, for example,
information about how the source was obtained or how it relates to other
resources (which may be held elsewhere). They may also record the dimensions
of the original source, its original appearance, and the context in which it
was used.
- New developments have made it possible to combine and analyse information from different sources or data sets (see the Linked Open Data page). This makes it even easier than before to place source material in context.
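The following minimal Python sketch makes the metadata and Linked Open Data points more concrete. It models two catalogue records from two different collections as plain dictionaries. The field names are borrowed loosely from Dublin Core (`identifier`, `title`, `relation`) and DCMI Metadata Terms (`extent`, `provenance`). All records, URIs and the `linked_records` helper are hypothetical. The sketch only shows that a shared identifier in one record's `relation` field lets you find the related resource in another data set.

```python
# Hypothetical sketch: two catalogue records from two invented collections.
# Field names are borrowed loosely from Dublin Core / DCMI Metadata Terms;
# all identifiers and URIs are illustrative placeholders.

archive_record = {
    "identifier": "https://archive.example.org/item/123",
    "title": "Letter to a publisher",
    "extent": "2 leaves, 21 x 29 cm",  # dimensions of the original
    "provenance": "Donated by the author's family, 1987",  # how it was obtained
    "relation": "https://museum.example.org/object/456",  # related resource held elsewhere
}

museum_record = {
    "identifier": "https://museum.example.org/object/456",
    "title": "Portrait photograph of the letter's author",
    "provenance": "Purchased at auction, 1995",
}


def linked_records(record, *collections):
    """Return records from the given collections whose identifier matches
    the URI stored in this record's 'relation' field."""
    target = record.get("relation")
    return [
        other
        for collection in collections
        for other in collection
        if other.get("identifier") == target
    ]


matches = linked_records(archive_record, [museum_record])
for m in matches:
    print(m["title"])  # prints: Portrait photograph of the letter's author
```

In actual Linked Open Data, such relationships are expressed as RDF statements that use shared URIs, and they are typically queried with SPARQL. The dictionary lookup above only mimics the basic idea of following a link from one data set to another.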
However, using existing digital sources and/or data sets also has implications that you need to take into account.
Contents
- When you work with material from an archive or a collection, it has been selected by other people. The selection criteria are not always clear, so you cannot always know how representative the collection is of the phenomenon you want to study.
- When you use existing annotated sources, bear in mind that annotations are never theory-neutral; to a certain extent, this also applies to metadata.
- Another aspect is how closely a digitized source matches the original, and how any differences might affect its analysis. During digitization, some physical characteristics of the source can be lost (its dimensions, the brush strokes of a painting, the third dimension of sculpture and architecture). Digitized texts often lack the original layout and book cover.
- In addition, the quality of the digitized version is not always optimal (for example, a low-resolution scan), which can also affect the results of the analysis.
Provenance
- The provenance of sources is not always clear. Take a scan or photo of a
source published on a personal web page, where metadata are missing or
unreliable: it may be unclear whether the photo shows only part of the source
or the complete source. Moreover, the source may have been digitally
manipulated. Manipulating (historical) documents is not a new phenomenon
(removing names from documents or erasing people from photographs was common
practice in the former Soviet Union), but digital techniques have made it much
easier. Similarly, digitized literary and philosophical texts do not always
indicate which edition was digitized.
- Large collections of digitized sources (such as Google Books) often rely on automatically generated metadata, which can be inaccurate. This makes such collections less useful and can affect the results of your analysis.
- Whereas good metadata can help contextualize a source, uncertainty about the origin of a digitized source can decontextualize it.
- Another point of concern is that more and more source material is being
made available through social media sites such as YouTube and Flickr. The
origin of this material is often difficult to trace, and descriptive
information about it is supplied by individual users, based on categories they
devise themselves. Because there are no clear criteria for these descriptions
(unlike descriptive data in an institutional archive), the metadata on these
websites are often of uneven quality.
- Moreover, much born-digital material is highly unstructured, so preparing it for analysis can be very time-consuming.
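A small, purely hypothetical Python sketch can illustrate the problem of descriptions based on self-chosen categories. The photo IDs, the tags and the `PREFERRED` mapping table are invented. The search functions do not model how YouTube or Flickr actually search their content. The sketch shows that an exact-match search on one user tag misses photos tagged with other spellings of the same subject. A hand-made mapping to a preferred term (in effect, a very small controlled vocabulary) recovers them.

```python
# Hypothetical user-supplied tags; with no shared criteria, the same subject
# is described in several different ways.
photos = {
    "photo_1": ["WWII", "Amsterdam"],
    "photo_2": ["World War 2", "amsterdam"],
    "photo_3": ["ww2", "A'dam"],
    "photo_4": ["Second World War"],
}


def search(tag):
    """Exact-match tag search (case-sensitive), as a naive catalogue might do it."""
    return sorted(pid for pid, tags in photos.items() if tag in tags)


# Hand-made (hypothetical) mapping from lower-cased variants to one preferred term.
PREFERRED = {
    "wwii": "Second World War",
    "world war 2": "Second World War",
    "ww2": "Second World War",
    "second world war": "Second World War",
}


def normalized_search(term):
    """Search after mapping each tag to its preferred term, when a mapping exists;
    unmapped tags are compared as-is."""
    return sorted(
        pid
        for pid, tags in photos.items()
        if any(PREFERRED.get(t.lower(), t) == term for t in tags)
    )


print(search("WWII"))                         # ['photo_1']
print(normalized_search("Second World War"))  # ['photo_1', 'photo_2', 'photo_3', 'photo_4']
```

The tag `A'dam` is still not mapped to `Amsterdam`. Every variant has to be anticipated by hand, which is one reason why preparing this kind of material for analysis takes so much time.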