Digital Humanities Workbench

Faculty of Humanities, VU University Amsterdam




Linked (Open) Data


Linked Data is a way of publishing structured data so that they can be interlinked and processed by computer programs. This technology makes it possible to connect and share data from different (re)sources, and to formulate semantic queries that yield richer information than traditional database searches or internet searches would.

The term Linked Open Data (LOD) refers to Linked Data that are freely available (open content). An overview of publicly available Linked Open Data datasets can be viewed on lod-cloud.net. Its diagram of these datasets and their connections gives you an idea of the 'linkedness' of these data and illustrates why they are called 'Linked Open Data'.

In a scholarly context, Linked Open Data offer opportunities for publishing and re-using digital research output. In the field of cultural heritage, many museums and data archives provide online access to their collections and data. In the last decade many of these institutions have embarked on projects to provide their datasets as Linked Data, in order to achieve easy cross-referencing, interlinking and integration. Thus, LOD for cultural heritage and digital humanities enable large-scale digital humanities research, collaboration and aggregation.

Video: Linked Open Data - What is it? (Europeana, on Vimeo).

N.B. The concept of Linked Data is closely connected to that of the Semantic Web, an approach to the internet as a web of meaningful data instead of a web of documents and other files.

Main concepts

The three main concepts when working with Linked Data are URIs, RDF and SPARQL.

Uniform Resource Identifiers (URIs) are character strings that are used to identify a web resource. For more information, see the Wikipedia entry about URIs.
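Because full URIs are long, RDF notations and SPARQL queries usually abbreviate them with prefixes, so that for instance dbr:Amsterdam stands for http://dbpedia.org/resource/Amsterdam. The minimal Python sketch below shows how such prefixed names map to full URIs and back. The dbo and dbr namespaces are the ones DBpedia uses; the "ex" namespace and the function names are hypothetical placeholders chosen for illustration.

```python
# Prefix table: maps short prefixes to namespace URIs.
# dbo/dbr are DBpedia's namespaces; "ex" is a hypothetical placeholder namespace.
PREFIXES = {
    "dbo": "http://dbpedia.org/ontology/",
    "dbr": "http://dbpedia.org/resource/",
    "ex": "http://example.org/",
}


def expand(prefixed_name: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Turn a prefixed name such as 'dbr:Amsterdam' into a full URI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {prefixed_name!r}")
    return prefixes[prefix] + local


def compact(uri: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Abbreviate a full URI with the longest matching namespace; leave it unchanged otherwise."""
    best = max(
        (p for p, ns in prefixes.items() if uri.startswith(ns)),
        key=lambda p: len(prefixes[p]),
        default=None,
    )
    return uri if best is None else f"{best}:{uri[len(prefixes[best]):]}"


print(expand("dbr:Amsterdam"))  # http://dbpedia.org/resource/Amsterdam
print(compact("http://dbpedia.org/ontology/birthPlace"))  # dbo:birthPlace
```

Choosing the longest matching namespace in compact avoids ambiguity when one namespace happens to be a prefix of another.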

Resource Description Framework (RDF) is a standard model for data interchange on the Web. In this model, the characteristics of web resources are expressed in the form of so-called triples, with a subject-predicate-object structure. The subject identifies the resource being described; the predicate denotes an aspect of that resource and expresses a relationship between the subject and the object. Examples of such triples are:
     <Abraham Kuyper> <is of type> <politician>
     <Abraham Kuyper> <has birthdate> <1837-10-29>
     <Abraham Kuyper> <is founder of> <Vrije Universiteit Amsterdam>
     <Vrije Universiteit Amsterdam> <is located in> <Amsterdam>
     <Vrije Universiteit Amsterdam> <has Latin name> <Universitas Libera>
On the basis of data stored in such triples, information from various databases can be combined (linked).
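To make the idea of combining concrete, the example triples above can be written as plain Python tuples, split over two hypothetical sources: one about people and one about institutions. The ex: identifiers and predicate names are invented placeholders rather than terms from a real vocabulary, and in practice an RDF library or triple store would do the matching; this is only a minimal sketch. Because both sources use the same identifier for the Vrije Universiteit, a question that neither source can answer on its own (in which city is the institution that Kuyper founded?) can be answered from their combination.

```python
# Two small hypothetical "datasets" of (subject, predicate, object) triples.
# All ex: names are illustrative placeholders, not terms from a real vocabulary.
PEOPLE = {
    ("ex:Abraham_Kuyper", "ex:type", "ex:Politician"),
    ("ex:Abraham_Kuyper", "ex:birthDate", "1837-10-29"),
    ("ex:Abraham_Kuyper", "ex:founderOf", "ex:Vrije_Universiteit_Amsterdam"),
}
INSTITUTIONS = {
    ("ex:Vrije_Universiteit_Amsterdam", "ex:locatedIn", "ex:Amsterdam"),
    ("ex:Vrije_Universiteit_Amsterdam", "ex:latinName", "Universitas Libera"),
}


def match(triples, s=None, p=None, o=None):
    """Return all triples matching a pattern; None acts as a wildcard."""
    return [
        (ts, tp, to)
        for ts, tp, to in triples
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    ]


def cities_of_founded_institutions(triples, person):
    """Follow two links: person -founderOf-> institution -locatedIn-> city."""
    cities = []
    for _, _, institution in match(triples, s=person, p="ex:founderOf"):
        for _, _, city in match(triples, s=institution, p="ex:locatedIn"):
            cities.append(city)
    return cities


print(cities_of_founded_institutions(PEOPLE, "ex:Abraham_Kuyper"))  # []
print(cities_of_founded_institutions(PEOPLE | INSTITUTIONS, "ex:Abraham_Kuyper"))  # ['ex:Amsterdam']
```

The link works only because the object of one triple is the subject of another; this shared identifier is what URIs provide across real datasets.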

SPARQL is a query language for data stored in RDF format, with which such data can be retrieved and manipulated. In practice, this means that the data can be searched and extracted from databases by issuing so-called SPARQL queries. The syntax of these queries resembles that of SQL, which is used to query 'traditional' relational databases, but is adapted to the subject-predicate-object structure of the data. Prefixes serve as shorthand for the namespaces of the vocabularies and datasets involved, and a single federated query (using the SERVICE keyword of SPARQL 1.1) can draw on datasets at different locations simultaneously. An example of such a query could be: 'List people born in places that were part of a VOC trading route recorded in Dutch Ships and Sailors', combining DBpedia and Dutch Ships and Sailors (see below).
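The VOC example depends on the Dutch Ships and Sailors vocabulary, which is not described here, so the sketch below uses a simpler query against DBpedia alone: people whose birth place is Amsterdam, with their birth dates. It uses only the Python standard library and the standard SPARQL protocol (the query is sent as a URL parameter, and results are requested in the SPARQL JSON results format). The endpoint https://dbpedia.org/sparql and the properties dbo:birthPlace and dbo:birthDate are DBpedia's own; the helper functions are illustrative, and actual results depend on the current state of DBpedia.

```python
import json
import urllib.parse
import urllib.request

# DBpedia's public SPARQL endpoint (availability and data may change over time).
ENDPOINT = "https://dbpedia.org/sparql"

# People whose DBpedia birth place is Amsterdam, with their birth dates.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?person ?birthDate WHERE {
  ?person dbo:birthPlace dbr:Amsterdam ;
          dbo:birthDate ?birthDate .
}
LIMIT 10
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build an HTTP GET request following the SPARQL protocol (query in the URL)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Ask for results in the standard SPARQL JSON results format.
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(results: dict) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into one {variable: value} dict per row.

    Variables left unbound in a row are simply omitted from that row's dict.
    """
    variables = results["head"]["vars"]
    return [
        {var: row[var]["value"] for var in variables if var in row}
        for row in results["results"]["bindings"]
    ]


def run_query(endpoint: str, query: str) -> list[dict[str, str]]:
    """Send the query and return the parsed rows (requires network access)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=30) as response:
        return parse_bindings(json.load(response))
```

Calling run_query(ENDPOINT, QUERY) sends the query over the network and returns rows with the keys person and birthDate. The same query text can typically also be pasted into the query form that the endpoint offers in a web browser.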

There are various implementations of SPARQL, one of which is the Virtuoso SPARQL Query Editor. This editor can be used, for example, to query the RDF version of the Short Title Catalog of the Netherlands (STCN), which holds bibliographical information on books published between 1540 and 1800. This enables researchers to create more complex queries than the web interface of the regular STCN catalog allows. For example, dates of publication can be combined with genre information to find out which genres were popular over time. For more information, see the web page Zoeken in de STCN met SPARQL ('Searching the STCN with SPARQL').
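Once a SPARQL query has returned a publication year and a genre for each title, finding out which genres were popular over time largely comes down to counting. The STCN's actual RDF properties for year and genre are not given on this page, so no query is shown; the sketch below simply assumes the results have already been reduced to (year, genre) pairs, and the 25-year period length is an arbitrary choice. The same grouping could also be done inside the query itself with the GROUP BY and COUNT aggregates of SPARQL 1.1.

```python
from collections import Counter, defaultdict


def genres_by_period(rows, period_length=25):
    """Count titles per genre for consecutive periods of `period_length` years.

    `rows` is assumed to be an iterable of (year, genre) pairs, for instance
    the result rows of a query selecting publication year and genre per title.
    Returns {period_start_year: Counter(genre -> number of titles)}, sorted by period.
    """
    counts = defaultdict(Counter)
    for year, genre in rows:
        period_start = (year // period_length) * period_length
        counts[period_start][genre] += 1
    return dict(sorted(counts.items()))


def most_popular(counts_by_period):
    """Return the most frequent genre in each period (ties: first genre counted wins)."""
    return {start: counter.most_common(1)[0][0] for start, counter in counts_by_period.items()}
```

Comparing most_popular across consecutive periods gives a first, rough impression of shifting genre preferences; absolute counts should be read against the total number of titles per period.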

DBpedia: a structured version of Wikipedia

DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web. Information that volunteers have entered on Wikipedia can thus be searched in a structured, machine-readable way via DBpedia. Wikipedia uses categories and subjects (visible, for example, in the boxes at the bottom of many Wikipedia articles) and provides information such as dates and places related to events and persons. DBpedia allows you to ask sophisticated queries against Wikipedia data, and the DBpedia dataset can also be used to link other datasets on the Web to Wikipedia data. As such, it plays an important role in many Linked Open Data projects, as can be seen in the LOD cloud diagram on lod-cloud.net mentioned above.
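Linking to Wikipedia data is straightforward because English DBpedia resource URIs are derived from English Wikipedia article titles: the article https://en.wikipedia.org/wiki/Abraham_Kuyper corresponds to the resource http://dbpedia.org/resource/Abraham_Kuyper. The sketch below performs this mapping and builds a SPARQL query that lists what DBpedia stores about a resource. It is a simplification under stated assumptions: it handles only English Wikipedia, assumes the title is already the canonical one (no redirects), and does not reproduce DBpedia's own escaping of special characters. The function names are hypothetical.

```python
import urllib.parse

WIKIPEDIA_PREFIX = "https://en.wikipedia.org/wiki/"
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def dbpedia_uri(wikipedia_url: str) -> str:
    """Map an English Wikipedia article URL to the matching DBpedia resource URI.

    Simplified: assumes a canonical article title (no redirect) and does not
    re-encode special characters the way DBpedia does.
    """
    if not wikipedia_url.startswith(WIKIPEDIA_PREFIX):
        raise ValueError("expected an English Wikipedia article URL")
    title = urllib.parse.unquote(wikipedia_url[len(WIKIPEDIA_PREFIX):])
    return DBPEDIA_RESOURCE + title.replace(" ", "_")


def describe_query(resource_uri: str, limit: int = 50) -> str:
    """Build a SPARQL query listing the properties and values stored for one resource."""
    return f"SELECT ?property ?value WHERE {{ <{resource_uri}> ?property ?value }} LIMIT {limit}"


print(dbpedia_uri("https://en.wikipedia.org/wiki/Abraham_Kuyper"))
# http://dbpedia.org/resource/Abraham_Kuyper
```

Another dataset can then point to the URI returned by dbpedia_uri to attach its own records to the corresponding Wikipedia-derived data.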

Linked Open Data projects with a VU connection

BiographyNet
BiographyNet has created an interlinked semantic knowledge base by extracting relations between people, places, historic events and time periods based on data from biographical descriptions in the Biography Portal of the Netherlands.

Rijksmuseum as Linked Open Data
The Rijksmuseum linked dataset contains over 350,000 objects, including detailed descriptions and high-quality images released under a public domain license. Also available on this site are collection and vocabulary statistics, as well as lessons learned from the process of converting the collection to Linked Data.

CEDAR Linked Open Census data
This project takes Dutch census data as its starting point to build a semantic data web of historical information. With such a web, it will be possible to answer questions such as: What kinds of patterns can be identified and interpreted as expressions of regional identity? How can patterns of change in skills and labour be related to technological progress and to patterns of geographical migration? How can the impact of changes in local and national policies be traced in the structure of communities and in individual lives?

Dutch Ships and Sailors
Dutch Ships and Sailors provides an infrastructure for maritime historical datasets, linking related data through Semantic Web technology. It brings together datasets related to recruitment and shipping in the East India trade (mainly 18th century) and to shipping from the northern provinces of the Netherlands (mainly 19th century).

Amsterdam Museum Linked Open Data
The Amsterdam Museum dataset contains the museum's descriptions of more than 70,000 cultural heritage objects related to the city of Amsterdam.

Tutorials

Linked Data: Guides and Tutorials (from Tom Heath's website Linked Data - Connect Distributed Data across the Web).
Tutorial: Using SPARQL to access Linked Open Data, by Matthew Lincoln
This tutorial explains why many cultural institutions are adopting graph databases, and how researchers can access these data through the query language SPARQL.