Linked Data is a way of publishing structured data so that they can be interlinked and processed by computer programs. This technology makes it possible to connect and share data from different sources and resources, and to formulate semantic queries that yield richer information than traditional database searches or internet searches.
In a scholarly context, Linked Open Data (LOD) offer opportunities for publishing and re-using digital research output. In the field of cultural heritage, many museums and data archives provide online access to their collections and data. In the last decade many of these institutions have embarked on projects to provide their datasets as Linked Data, in order to make cross-referencing, interlinking and integration easy. LOD for cultural heritage thus enable large-scale digital humanities research, collaboration and aggregation.
Main concepts
The three main concepts when working with Linked Data are URIs, RDF and SPARQL.
Uniform Resource Identifiers (URIs) are character strings that are used to identify a web resource. For more information, see the Wikipedia entry about URIs.
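As a small illustration of what such a character string looks like, the sketch below splits a URI into its parts using Python's standard library. The URI itself is hypothetical and was made up for this example; it does not identify a real resource.

```python
from urllib.parse import urlsplit

# Hypothetical URI, made up for illustration only.
uri = "http://example.org/person/abraham-kuyper#birth"

parts = urlsplit(uri)
print(parts.scheme)    # the access scheme, here "http"
print(parts.netloc)    # the authority (host name), here "example.org"
print(parts.path)      # the path that distinguishes the resource on that host
print(parts.fragment)  # an optional fragment pointing inside the resource
```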
Resource Description Framework (RDF) is a standard model for data interchange on the Web. With this model the characteristics of web resources can be expressed in the form of so-called triples, with a subject–predicate–object structure. The subject identifies the resource that is being described, the predicate names an aspect of that resource and expresses a relationship between the subject and the object, and the object holds the value or the related resource. An example of such triples might be:
<Abraham Kuyper> <is of type> <politician>
<Abraham Kuyper> <has birthdate> <1837-10-29>
<Abraham Kuyper> <is founder of> <Vrije Universiteit Amsterdam>
<Vrije Universiteit Amsterdam> <is located in> <Amsterdam>
<Vrije Universiteit Amsterdam> <has Latin name> <Universitas Libera>
On the basis of data stored in such triples, information from various databases can be combined (linked).
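The linking step can be sketched in a few lines of Python. This is only a simplified illustration, assuming that the example triples above are stored as plain tuples; real RDF data use URIs rather than readable names, and the helper functions `match` and `founder_locations` are hypothetical names introduced here.

```python
# Each triple is a (subject, predicate, object) tuple, mirroring the example above.
TRIPLES = [
    ("Abraham Kuyper", "is of type", "politician"),
    ("Abraham Kuyper", "has birthdate", "1837-10-29"),
    ("Abraham Kuyper", "is founder of", "Vrije Universiteit Amsterdam"),
    ("Vrije Universiteit Amsterdam", "is located in", "Amsterdam"),
    ("Vrije Universiteit Amsterdam", "has Latin name", "Universitas Libera"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return all triples that agree with every position that is not None."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def founder_locations(triples):
    """Follow 'is founder of' and then 'is located in' to link a person to a place."""
    results = []
    for person, _, organisation in match(triples, predicate="is founder of"):
        for _, _, place in match(triples, subject=organisation, predicate="is located in"):
            results.append((person, place))
    return results


print(founder_locations(TRIPLES))
```

Because the object of one triple (Vrije Universiteit Amsterdam) is the subject of another, the two statements can be chained, which is the same mechanism that allows triples from separate databases to be combined.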
SPARQL is a semantic query language for databases, with which data stored in RDF format can be retrieved and manipulated. In practice, this means that these data can be searched and extracted from the databases by issuing so-called SPARQL queries. The syntax used in these queries is similar to the SQL syntax that is used to query 'traditional' databases, but is adapted to the subject-predicate-object structure of the data. Prefixes abbreviate the namespaces of the vocabularies and datasets involved, which makes it practical to combine data from different datasets in a single query. An example of such a query could be: list people born in places that were part of a VOC trading route, using both DBpedia and Dutch Ships and Sailors (see the project description below).
Various tools are available for running SPARQL queries, one of which is the Virtuoso SPARQL Query Editor. This editor can be used, for example, to query the RDF version of the Short Title Catalog of the Netherlands (STCN), which holds bibliographical information on books published between 1540 and 1800. This enables researchers to create more complex queries than are possible with the web interface to the normal STCN catalog. In this way, for example, dates of publication can be combined with genre information, to find out which genres were popular at different points in time. For more information, see the web page Zoeken in de STCN met SPARQL (Searching the STCN with SPARQL).
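To give an impression of the query syntax, the sketch below holds a small SPARQL query in a Python string and encodes it as a request URL. It does not reproduce the VOC example: it asks for founders and the places where their organisations are located, in line with the Abraham Kuyper triples. The endpoint address, the `ex:` prefix and the predicate names are assumptions made up for illustration, and no request is actually sent.

```python
import urllib.parse

# Hypothetical endpoint and vocabulary, used for illustration only.
ENDPOINT = "http://example.org/sparql"

# The PREFIX line lets "ex:isFounderOf" stand for the full URI
# <http://example.org/vocab/isFounderOf>. The two patterns in the WHERE
# clause share the variable ?organisation, which joins them together.
QUERY = """\
PREFIX ex: <http://example.org/vocab/>
SELECT ?person ?place
WHERE {
  ?person ex:isFounderOf ?organisation .
  ?organisation ex:isLocatedIn ?place .
}
"""


def build_request_url(endpoint, query):
    """Encode the query text as the 'query' parameter of a GET request URL."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)
print(url)
```

Sending the query text in a `query` parameter follows the standard SPARQL protocol; a query editor such as the ones mentioned on this page does the same thing behind its web form.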
Linked Open Data projects with a VU connection
BiographyNet
BiographyNet has created an interlinked semantic knowledge base by extracting relations between people, places, historic events and time periods based on data from biographical descriptions in the Biography Portal of the Netherlands.
Rijksmuseum as Linked Open Data
The Rijksmuseum linked dataset contains over 350,000 objects, including detailed descriptions and high-quality images released under a public domain license. Also available on this site are collection and vocabulary statistics, as well as lessons learned from the process of converting the collection to Linked Data.
CEDAR Linked Open Census data
This project takes Dutch census data as its starting point to build a semantic data-web of historical information. With such a web, it will be possible to answer questions such as: What kind of patterns can be identified and interpreted as expressions of regional identity? How can patterns of changes in skills and labour be related to technological progress and patterns of geographical migration? How can changes of local and national policies in the structure of communities and individual lives be traced?
Dutch Ships and Sailors
Dutch Ships and Sailors provides an infrastructure for maritime historical datasets, linking related data through semantic web technology. It brings together datasets related to recruitment and shipping in the East India trade (mainly 18th century) and in the shipping of the northern provinces of the Netherlands (mainly 19th century).
Amsterdam Museum Linked Open Data
The Amsterdam Museum dataset contains the museum's descriptions of more than 70,000 cultural heritage objects related to the city of Amsterdam.