Big data

Big data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them. (Source: Wikipedia)

Characteristics of big data: the four V's

In most literature, the characteristics of big data are described as follows:
  • Volume: big data implies enormous volumes of data, often (but not exclusively) generated by machines, networks and human interaction on systems like social media.
  • Variety: this refers to the many different possible sources and types of big data. The data may be both structured and unstructured, coming in the form of text (Word, PDF, HTML, XML), images, videos, audio, etc.
  • Velocity: this refers to the speed at which the data is generated and processed. From certain sources, like social media sites and mobile devices, the flow of data is massive and continuous. Related to this is the so-called volatility of the data, which has to do with the period during which the data is valid and (more importantly) how long it is stored and available.
  • Veracity: the quality of data coming from different sources can vary greatly, both with regard to its content and to its structure. It is often difficult to check if all data in a big data set are correct and accurate for the intended use.

In humanities research, examples of big data sets include communications via social media such as Facebook and Twitter, which are analysed from a communication perspective, and large cultural data sets of visual material. In this context, large newspaper archives (like LexisNexis Academic) and text collections (like Google Books) are also sometimes treated and analysed as big data.

Big data analysis requires advanced tools to extract meaning from the raw data. These tools transform, organize, and model the data to identify patterns and draw conclusions. The various techniques that are used to analyse big data sets are often referred to as data analytics. Specific instances of this that are relevant to the humanities are text analytics (or text mining) and cultural analytics.
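
To give a rough idea of what "transform, organize, and model" can mean in text mining, the following is a minimal sketch in Python that counts term frequencies across a handful of documents. The three sample sentences, the stop-word list, and the function names are all assumptions made for illustration; real text analytics on big data sets would typically rely on dedicated tools and far larger collections.

```python
import re
from collections import Counter

# Hypothetical mini-corpus standing in for a large text collection.
documents = [
    "The archive holds newspapers from the nineteenth century.",
    "Newspapers in the archive were digitised and indexed.",
    "Social media posts differ from newspapers in tone and length.",
]

# Small, illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "from", "in", "and", "were", "a", "of"}


def tokenize(text):
    """Transform: lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(docs):
    """Organize: count each non-stop-word token across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(t for t in tokenize(doc) if t not in STOP_WORDS)
    return counts


# Identify a simple pattern: the most frequent terms in the collection.
frequencies = term_frequencies(documents)
top_terms = frequencies.most_common(2)
print(top_terms)  # [('newspapers', 3), ('archive', 2)]
```

Even this simple count already points to what the sample texts are about; more advanced techniques build on the same steps of tokenizing, filtering, and aggregating.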

