Digital Humanities Workbench



Linked (Open) Data


Linked Data is a way of publishing structured data so that they can be interlinked and processed by computer programs. This technology makes it possible to connect and share data from different sources, and to formulate semantic queries that yield richer information than traditional database searches or internet searches would.

The term Linked Open Data (LOD) refers to Linked Data that are freely available (open content). Publicly available datasets and the links between them are shown in the Linked Open Data cloud diagram on lod-cloud.net. This gives you an idea of how interlinked these data are and illustrates why they are called 'Linked Open Data'.

In a scholarly context, Linked Open Data offer opportunities for publishing and re-using digital research output. In the field of cultural heritage, many museums and data archives provide online access to their collections and data. In the last decade many of these institutions have embarked on projects to provide their datasets as Linked Data, in order to achieve easy cross-referencing, interlinking and integration. In this way, LOD in cultural heritage and the digital humanities enables large-scale research, collaboration and data aggregation.

Video: Linked Open Data - What is it? (Europeana, on Vimeo)

N.B. The concept of Linked Data is closely connected to that of the Semantic Web, an approach that views the internet as a web of meaningful data instead of a web of documents and other files.

Main concepts

Three main concepts when working with Linked Data are URIs, RDF and SPARQL.

Uniform Resource Identifiers (URIs) are character strings that are used to identify a web resource. For more information, see the Wikipedia entry about URIs.
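In Linked Data, URIs are used to name things such as people, places and institutions, not only web pages. DBpedia (see below), for instance, identifies the resource for Abraham Kuyper with a URI built from the title of his Wikipedia article. The sketch below assumes a simplified version of that naming convention (namespace plus article title with spaces replaced by underscores); the helper `dbpedia_uri` is hypothetical, and real DBpedia URIs follow more detailed encoding rules.

```python
from urllib.parse import quote, urlparse

# Namespace used by DBpedia for resource URIs.
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def dbpedia_uri(title: str) -> str:
    """Hypothetical helper: build a DBpedia-style resource URI from a
    Wikipedia article title (simplified; spaces become underscores)."""
    local_name = title.strip().replace(" ", "_")
    # Percent-encode characters that may not appear unescaped in a URI path.
    return DBPEDIA_RESOURCE + quote(local_name, safe="(),'")


uri = dbpedia_uri("Abraham Kuyper")
print(uri)                 # http://dbpedia.org/resource/Abraham_Kuyper
parts = urlparse(uri)      # a URI can be split into its standard components
print(parts.netloc)        # dbpedia.org
print(parts.path)          # /resource/Abraham_Kuyper
```

Because such a URI is globally unique, two datasets that use the same URI for Abraham Kuyper are talking about the same person, which is what makes linking possible.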

Resource Description Framework (RDF) is a standard model for data interchange on the Web. In this model, the characteristics of web resources are expressed as so-called triples, with a subject–predicate–object structure. The subject is the resource being described; the predicate denotes an aspect of that resource and expresses the relationship between the subject and the object; the object is the value of that aspect, which can be another resource or a literal value such as a date. Examples of such triples are:
     <Abraham Kuyper> <is of type> <politician>
     <Abraham Kuyper> <has birthdate> <1837-10-29>
     <Abraham Kuyper> <is founder of> <Vrije Universiteit Amsterdam>
     <Vrije Universiteit Amsterdam> <is located in> <Amsterdam>
     <Vrije Universiteit Amsterdam> <has Latin name> <Universitas Libera>
On the basis of data stored in such triples, information from various databases can be combined (linked).
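To illustrate how shared identifiers link data, the sketch below stores the example triples as plain Python tuples and assumes, hypothetically, that the last two triples come from a second, separate dataset. Plain strings stand in for URIs here, which is a simplification; a real application would use full URIs and an RDF library or triple store.

```python
# The example triples from the text as (subject, predicate, object) tuples.
kuyper_data = {
    ("Abraham Kuyper", "is of type", "politician"),
    ("Abraham Kuyper", "has birthdate", "1837-10-29"),
    ("Abraham Kuyper", "is founder of", "Vrije Universiteit Amsterdam"),
}
# Hypothetical second dataset that describes the university.
university_data = {
    ("Vrije Universiteit Amsterdam", "is located in", "Amsterdam"),
    ("Vrije Universiteit Amsterdam", "has Latin name", "Universitas Libera"),
}

# Both datasets use the same identifier for the university, so merging
# them is a plain set union and the two graphs become linked.
graph = kuyper_data | university_data


def objects(graph, subject, predicate):
    """Return all objects o for which (subject, predicate, o) is in the graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


# Follow a path that crosses both datasets:
# Kuyper -> is founder of -> university -> is located in -> city
cities = {
    city
    for uni in objects(graph, "Abraham Kuyper", "is founder of")
    for city in objects(graph, uni, "is located in")
}
print(cities)  # {'Amsterdam'}
```

Neither dataset on its own states where the institution founded by Abraham Kuyper is located; the answer only emerges once the triples are combined.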

SPARQL is a semantic query language for databases with which data stored in RDF format can be retrieved and manipulated. In practice, this means that these data can be searched and extracted from the databases by issuing so-called SPARQL queries. The syntax of these queries resembles the SQL syntax used to query 'traditional' databases, but it is adapted to the subject-predicate-object structure of the data: a query consists of triple patterns in which variables stand for unknown values. Prefixes serve as abbreviations for the namespaces of the vocabularies and datasets involved, so that a single query can combine terms from different sources; SPARQL 1.1 also supports federated queries (using the SERVICE keyword), which retrieve data from several endpoints at once. An example of such a query could be: list people born in places that were part of a VOC trading route, combining data from DBpedia and Dutch Ships and Sailors (see below).
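The core of a SPARQL query is matching triple patterns containing variables against the data. The sketch below is a simplified, hypothetical re-implementation of that idea (it is not how a real SPARQL engine works internally): it expands prefixed names such as `dbo:birthDate` to full URIs and returns every variable binding that satisfies all patterns. The namespaces and the terms `rdf:type`, `dbo:Politician`, `dbo:University` and `dbo:birthDate` are real RDF/DBpedia vocabulary; the three-triple graph is a toy example.

```python
# Prefixes abbreviate namespaces, as in a SPARQL PREFIX declaration.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dbo": "http://dbpedia.org/ontology/",
    "dbr": "http://dbpedia.org/resource/",
}


def expand(term):
    """Expand a prefixed name such as 'dbo:birthDate' to a full URI.
    Variables ('?x') and terms without a known prefix are returned unchanged."""
    if term.startswith("?"):
        return term
    prefix, sep, local = term.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return term


def match(patterns, graph, binding=None):
    """Yield every variable binding that satisfies all triple patterns
    (a simplified form of SPARQL basic graph pattern matching)."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        new = dict(binding)
        for term, value in zip(first, triple):
            term = expand(term)
            if term.startswith("?"):
                if new.setdefault(term, value) != value:
                    break  # variable already bound to a different value
            elif term != value:
                break  # constant term does not match
        else:
            # All three positions matched: continue with the remaining patterns.
            yield from match(rest, graph, new)


DBO, DBR = PREFIXES["dbo"], PREFIXES["dbr"]
RDF_TYPE = PREFIXES["rdf"] + "type"
graph = {
    (DBR + "Abraham_Kuyper", RDF_TYPE, DBO + "Politician"),
    (DBR + "Abraham_Kuyper", DBO + "birthDate", "1837-10-29"),
    (DBR + "Vrije_Universiteit_Amsterdam", RDF_TYPE, DBO + "University"),
}

# Corresponds to:
#   SELECT ?person ?date WHERE { ?person rdf:type dbo:Politician .
#                                ?person dbo:birthDate ?date . }
query = [
    ("?person", "rdf:type", "dbo:Politician"),
    ("?person", "dbo:birthDate", "?date"),
]
results = list(match(query, graph))
for row in results:
    print(row["?person"], row["?date"])
```

The shared variable `?person` is what joins the two patterns, much as a shared column joins tables in SQL.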

There are various tools for running SPARQL queries, one of which is the Virtuoso SPARQL Query Editor. This editor can be used, for example, to query the RDF version of the Short Title Catalog of the Netherlands (STCN), which holds bibliographical information on books published between 1540 and 1800. This enables researchers to formulate more complex queries than the web interface of the regular STCN catalog allows. For example, dates of publication can be combined with genre information to find out which genres were popular over time. For more information, see the web page Zoeken in de STCN met SPARQL (Searching the STCN with SPARQL). [In Dutch]
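Once a query has returned publication years and genres, finding out which genres were popular over time amounts to counting genres per period. The sketch below assumes, hypothetically, that the results have already been retrieved as (year, genre) pairs; the rows, the genre labels and the 50-year period are invented for illustration and are not actual STCN data or STCN vocabulary.

```python
from collections import Counter, defaultdict

# HYPOTHETICAL (year, genre) rows of the kind a SPARQL query on the STCN
# might return. They are invented for illustration, not real STCN data.
rows = [
    (1612, "sermon"), (1618, "pamphlet"), (1619, "pamphlet"),
    (1655, "sermon"), (1672, "pamphlet"), (1672, "pamphlet"),
    (1690, "poetry"), (1745, "poetry"), (1748, "sermon"),
]

PERIOD = 50  # length of each time slice in years (an arbitrary choice)


def genres_per_period(rows, period=PERIOD):
    """Map the first year of each period to a Counter of genres."""
    counts = defaultdict(Counter)
    for year, genre in rows:
        start = year - year % period  # e.g. 1672 -> 1650
        counts[start][genre] += 1
    return dict(sorted(counts.items()))


for start, genres in genres_per_period(rows).items():
    # most_common() lists genres from most to least frequent
    summary = ", ".join(f"{g}: {n}" for g, n in genres.most_common())
    print(f"{start}-{start + PERIOD - 1}: {summary}")
```

The same grouping can also be done inside the query itself, since SPARQL 1.1 supports aggregation with GROUP BY and COUNT; doing it in the query avoids transferring every individual record.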

DBpedia: a structured version of Wikipedia

DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web. Information that has been entered on Wikipedia by volunteers can thus be searched in a structured and machine-readable way via DBpedia. Wikipedia uses categories and subjects (shown, for example, in small boxes at the bottom of Wikipedia articles) and provides information such as dates and places related to events and persons. DBpedia allows you to ask sophisticated queries against Wikipedia data, and the DBpedia dataset can also be used to link other datasets on the Web to Wikipedia data. As such, it plays an important role in many Linked Open Data projects, as can be seen in the Linked Open Data cloud diagram mentioned above.
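DBpedia can be queried programmatically through its public SPARQL endpoint at https://dbpedia.org/sparql (availability and usage limits may vary). The sketch below follows the standard SPARQL Protocol: the query is sent as a `query` parameter of an HTTP GET request, and an Accept header asks for results in the SPARQL 1.1 JSON results format. The helper names (`build_request`, `parse_bindings`, `run_query`) are hypothetical, and the sample response is illustrative rather than captured from the live endpoint.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "https://dbpedia.org/sparql"

QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?birthDate WHERE {
  dbr:Abraham_Kuyper dbo:birthDate ?birthDate .
}
"""


def build_request(query, endpoint=ENDPOINT):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(results):
    """Turn a SPARQL JSON results document into a list of plain dicts."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in results["results"]["bindings"]
    ]


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the query over the network and return the parsed rows."""
    with urlopen(build_request(query, endpoint), timeout=timeout) as response:
        return parse_bindings(json.load(response))


# Offline demonstration with a response in the SPARQL 1.1 JSON results
# format; the values are illustrative, not fetched from DBpedia.
sample = json.loads("""
{"head": {"vars": ["birthDate"]},
 "results": {"bindings": [
   {"birthDate": {"type": "literal", "value": "1837-10-29",
     "datatype": "http://www.w3.org/2001/XMLSchema#date"}}]}}
""")
print(parse_bindings(sample))  # [{'birthDate': '1837-10-29'}]
```

Calling `run_query(QUERY)` would send the same query to the live endpoint; the exact values returned depend on the current state of the DBpedia dataset.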

Linked Open Data projects with a VU connection

BiographyNet
BiographyNet has created an interlinked semantic knowledge base by extracting relations between people, places, historic events and time periods based on data from biographical descriptions in the Biography Portal of the Netherlands.

Rijksmuseum as Linked Open Data
The Rijksmuseum linked dataset contains over 350,000 objects, including detailed descriptions and high-quality images released under a public domain license. Also available on this site are collection and vocabulary statistics, as well as lessons learned from the process of converting the collection to Linked Data.

CEDAR Linked Open Census data
This project takes Dutch census data as its starting point to build a semantic data web of historical information. With such a web, it will be possible to answer questions such as: What kinds of patterns can be identified and interpreted as expressions of regional identity? How can patterns of change in skills and labour be related to technological progress and to patterns of geographical migration? How can changes in local and national policies be traced in the structure of communities and in individual lives?

Dutch Ships and Sailors
Dutch Ships and Sailors provides an infrastructure for maritime historical datasets, linking related data through Semantic Web technology. It brings together datasets on recruitment and shipping in the East India trade (mainly 18th century) and on shipping in the northern provinces of the Netherlands (mainly 19th century).

Amsterdam Museum Linked Open Data
The Amsterdam Museum dataset describes more than 70,000 cultural heritage objects from the museum's collection that are related to the city of Amsterdam.

Tutorials

Further reading

  • What Is Linked Data?
    A short introductory video lecture by Manu Sporny, from CambridgeSemantics.
  • Data For Dummies (Erfgoed Leiden en Omstreken)   
    Linked Open Data explained in simple terms, in the context of enhancing the accessibility of digital heritage collections. [In Dutch]
  • Linked Open Data   
    Theme page of the DEN Foundation (Digitaal Erfgoed Nederland / Digital Heritage Netherlands), with a brief explanation of Linked Data and links to more information and a road map for the publication of a dataset as Linked Open Data. [In Dutch]
  • Linked Data: Evolving the Web into a Global Data Space by Tom Heath and Christian Bizer, ©2013
    This book gives an overview of the principles of Linked Data as well as the Web of Data that has emerged through the application of these principles. The book discusses patterns for publishing Linked Data, describes deployed Linked Data applications and examines their architecture.
  • Linked Data for Digital History: Lessons Learned from Three Case Studies by Victor de Boer, Albert Meroño-Peñuela and Niels Ockeloen. In: Historiografía digital: proyectos para almacenar y construir la Historia. Mirella Romero Recio and Mª Jesús Colmenero Ruiz (eds.) Anejos de la Revista de Historiografía. Universidad Carlos III de Madrid, 2016.
    In this paper, the authors present a number of case studies which use Linked Data principles for the representation and publication of digital history datasets. The three research projects that are presented are all collaborations between Dutch humanities researchers and computer scientists from VU University Amsterdam.
