Programming
The tools described in this workbench range from simple to complex, both in the tasks they can perform and in how difficult they are to use. What they have in common is that they are prefabricated (which is why they are sometimes called off-the-shelf software). For many types of humanities research this suffices, but for certain tasks they may not be fully appropriate. In those cases it may be possible to replace them with alternative tools that provide a better fit or are more customizable
(see the introduction for suggestions on where to search for alternatives). When no appropriate alternatives are found, it may be necessary to develop custom tools.
Often this is done by IT-specialists (programmers) cooperating with humanities
scholars. Increasingly, however, humanities scholars and students are taking the step
to learn to program themselves, for the following reasons:
-
There are many relatively simple tasks for which no ready-made tools are available, such as
preparing data for processing, converting the output of one tool to the input format of
another, extracting data from running text etc., which take a lot of time when done by hand,
but for which custom tools can be made without too much effort.
-
A general knowledge of programming principles can be of help for learning to work with
database tools and text-search tools.
-
Some more generic tools, which have a graphical user interface and can be used without
programming, such as Microsoft Access or SPSS, have an underlying programming
or scripting environment for advanced usage.
-
Other generic tools, such as Microsoft Excel or R, can be regarded as development
environments for creating custom tools for a certain range of tasks. Hence, most of the time
they cannot be used without at least some programming (the exception being when the tool is
used for data entry only).
-
Many software packages provide an embedded extension language for writing
macros and custom scripts. This is either a program-specific programming language or an
adapted version of a general-purpose programming language, such as Basic, JavaScript,
Perl or Python.
For example, the Microsoft Office tools include Visual Basic for Applications as an
extension language.
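As an illustration of the first reason above, extracting data from running text, here is a minimal Python sketch that pulls year-like numbers out of a sentence. The sample text, the name `extract_years` and the rule "four digits starting with 1 or 2" are assumptions made for this example, not part of any particular tool.

```python
import re

# Hypothetical sample text; any running text would do.
text = (
    "The first edition appeared in 1848, a revised edition followed in 1867, "
    "and the collected works were published in 1902."
)

# Assumption: a "year" is a four-digit number starting with 1 or 2,
# delimited by word boundaries.
YEAR_PATTERN = re.compile(r"\b[12]\d{3}\b")


def extract_years(running_text):
    """Return all year-like numbers found in the text, in order of appearance."""
    return [int(match) for match in YEAR_PATTERN.findall(running_text)]


years = extract_years(text)
print(years)  # [1848, 1867, 1902]
```

Doing this by hand for thousands of sentences would take a long time; the script itself is only a few lines.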
What is a programming language?
Programming languages differ from natural languages in that they require a greater
degree of precision and completeness. So, a programming language defines a set of codes
and strict syntax rules for writing sequences of unambiguous formal expressions
(definitions and/or instructions) to form programs.
Hence, the act of programming is often called
coding.
A programming language facilitates composing new
programs from pre-defined elements and existing program libraries.
A program library is a collection of program modules, which can be
used as building blocks for new software.
Most programming languages have an associated core library (or standard library).
Core libraries typically include definitions for commonly used algorithms, data structures,
and mechanisms for input and output.
Programming languages may differ in what other program libraries are available.
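To give an impression of what a core library offers, the sketch below counts word frequencies using only Python's standard library: `re` for pattern matching and `collections.Counter` as a ready-made data structure. The function name `word_frequencies` and the sample sentence are made up for this example, and matching only the letters a-z is a deliberate simplification.

```python
from collections import Counter
import re


def word_frequencies(text):
    """Count how often each word occurs, ignoring case and punctuation.

    Simplification: only the ASCII letters a-z are treated as word characters.
    """
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words)


sample = "The cat saw the dog. The dog saw the bird."
frequencies = word_frequencies(sample)

# Print the three most frequent words with their counts.
for word, count in frequencies.most_common(3):
    print(word, count)
```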
There are many programming languages, which are designed for different purposes and are
used in different contexts. There are general-purpose languages and there are domain-specific
languages.
The availability of relevant program libraries can play an important role in choosing
a programming language as well.
In some contexts, it is not exceptional to combine the use of several programming
languages in one application, depending on the kinds of tasks performed.
For example, a typical web database application may be coded using a combination of HTML, CSS
and JavaScript (for the client-side part in the web browser), PHP and SQL (for the server part),
and XML or JSON (for the data exchange between client and server).
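The data exchange step can be sketched in a few lines. The example below uses Python and its standard `json` module rather than PHP or JavaScript, purely to keep the examples on this page in one language; the record and its field names are hypothetical.

```python
import json

# Hypothetical record as it might be stored on the server side.
record = {"id": 17, "title": "Max Havelaar", "author": "Multatuli", "year": 1860}

# Sending side: serialize the record to a JSON string for transmission.
payload = json.dumps(record, ensure_ascii=False)

# Receiving side: parse the JSON string back into a data structure.
received = json.loads(payload)
print(received["title"], received["year"])
```

Because JSON is plain text with a fixed syntax, the sending and receiving programs do not need to be written in the same programming language.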
Software development process
Creating a program involves more than coding alone.
In the process of software development the following stages can be distinguished:
-
Information analysis:
a comprehensive specification is made of all relevant types of information and their relationships,
the task(s) to be performed, interactions with other systems, interactions with the human
user(s), input and output data, and system requirements. The results are usually written
down in a report called the requirement specifications.
-
Design:
a plan is made for how the requirement specifications can be met. This involves
creating specifications of data models, procedures, interfaces (with other systems),
user interfaces, and input and output formats;
where appropriate, decomposing the program into smaller modules; and selecting the tools and building blocks
(program libraries) that can be used.
-
Implementation (or coding):
to create the actual program, the specifications from the design stage are translated to
formal expressions
(definitions and/or instructions) in a programming language.
-
Testing:
checking whether the implemented program works as designed and without errors.
-
Evaluation:
when the software satisfies the goals of the project, it can be accepted as the final product.
However, evaluation of the results can also lead to adaptations or refinements of
the requirements and another iteration through the stages of the development cycle.
This often occurs in projects where it is hard to formulate perfect requirement
specifications in advance, such as large projects or research projects.
The amount of work involved in each of these stages may vary from project to project,
depending on the size and nature of the project and the context in which the program will be
used. For example, the requirement specifications of a small data conversion project may take
only a few lines, while the requirement specifications for a web database of research data
may take a report of tens of pages. When a program is developed as an extension to existing
software, many design decisions are dictated by what is already available.
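For a small data conversion project, the stages can fit in a single short script. The sketch below is one possible illustration in Python, under an invented requirement: convert dates written as day-month-year to the ISO 8601 form. The names `to_iso_date` and `run_tests` are hypothetical.

```python
from datetime import datetime

# Requirement (hypothetical): convert dates written as day-month-year,
# e.g. "31-12-1999", to the ISO 8601 form "1999-12-31".


def to_iso_date(dmy_date):
    """Implementation: parse a DD-MM-YYYY string and return YYYY-MM-DD."""
    parsed = datetime.strptime(dmy_date, "%d-%m-%Y")
    return parsed.strftime("%Y-%m-%d")


def run_tests():
    """Testing: check the implementation against the requirement."""
    assert to_iso_date("31-12-1999") == "1999-12-31"
    assert to_iso_date("01-02-2003") == "2003-02-01"
    # An impossible date should be rejected rather than silently converted.
    try:
        to_iso_date("31-02-1999")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for an impossible date")


run_tests()
print("all tests passed")
```

Here the requirement takes two comment lines, the design amounts to choosing the standard `datetime` module, and the tests document what "works as designed" means.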
Applications
Most applications of custom programming in the humanities deal with tasks like the following:
-
extraction of data from (mostly very large) data sets found on the internet (tweets, Facebook posts, digitized publications in Google Books, etc.);
-
extraction of structured data from unstructured data sets (like text corpora);
-
processing of this kind of data, in order to prepare them for analysis;
-
syntactic, semantic and content-based annotation of text corpora;
-
conversion of data, for example from an XML corpus or a database to a data structure that can be analysed with a statistical package like SPSS;
-
forms of data analysis that cannot adequately be performed by ready-made software.
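The conversion task mentioned above, from an XML corpus to a form that a statistical package can read, might look as follows in Python. This is a sketch under stated assumptions: the miniature corpus, its element and attribute names (`text`, `id`, `year`, `title`) and the function names are invented for the example, and CSV is chosen as the target because statistical packages can generally import it.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature corpus; element and attribute names are assumptions.
XML_CORPUS = """<corpus>
  <text id="t1" year="1850"><title>First letter</title></text>
  <text id="t2" year="1862"><title>Second letter</title></text>
</corpus>"""


def corpus_to_rows(xml_string):
    """Turn each <text> element into a dictionary of metadata fields."""
    root = ET.fromstring(xml_string)
    rows = []
    for text in root.iter("text"):
        rows.append({
            "id": text.get("id"),
            "year": text.get("year"),
            "title": text.findtext("title"),
        })
    return rows


def rows_to_csv(rows):
    """Write the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "year", "title"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = corpus_to_rows(XML_CORPUS)
csv_text = rows_to_csv(rows)
print(csv_text)
```

In a real project the CSV text would be written to a file and then imported into the statistical package.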
Programming languages
A program can be coded in many different programming languages. Each programming language has its specific advantages and disadvantages. In the field of humanities research, the following programming languages deserve a special mention:
-
Python, which nowadays is the programming language of choice in humanities research, especially for all kinds of text analysis tasks. It is used for all of the applications mentioned in the section above. N.B. One of the reasons for its popularity is that many modules for natural language processing (NLP) tasks are written in Python (see also the section about language technology in this Workbench).
-
Visual Basic for Applications (VBA), a macro or scripting language within several Microsoft applications, like Word, Excel and Access.
-
PHP, a server scripting language, and a powerful tool for making dynamic and interactive Web pages. Often used for creating interfaces with web databases.
-
JavaScript,
the embedded programming language in web browsers, and a powerful tool
for making dynamic web pages by using operations running locally inside the web browser.
It facilitates the dynamic modification of parts of a web page without the need for
refreshing the entire page for each change. It can also be used to control applets,
which are embedded in the web page.
-
C# (pronounced C-sharp), a general-purpose, object-oriented programming language, similar to Java. It was developed by Microsoft and is standardized by Ecma and ISO. The current .NET platform on which it runs is open source and available for Windows, Linux and macOS, although some application frameworks (such as Windows Forms) remain Windows-only.
It may be used, for example, to develop interactive psycholinguistic tests or e-learning modules that require functionality not supported by ready-made applications.
-
Java, a portable general-purpose, object-oriented programming language, which runs on
the major computer platforms (Windows, Unix/Linux, macOS) and also on smaller systems like
mobile devices. Java is not as easy to learn and use as Python or PHP, so it might not be the
obvious choice for humanities researchers to learn as a first programming language.
However, the language is used very often by IT-specialists, particularly for client-server
web applications. It is also often
used in XML-processing applications and Natural Language Processing (NLP) applications.
The high availability of program libraries for many application-domains might be another
reason to choose Java.
N.B. Although query languages for databases (like SQL) and XML documents (like XQuery) are usually not called programming languages, they do require the same kind of formal reasoning.
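To show the kind of formal reasoning a query language requires, the sketch below runs a small SQL query from Python using the standard `sqlite3` module. The table `letters`, its columns and the placeholder author names are invented for the example.

```python
import sqlite3

# An in-memory database with a hypothetical table of letters.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE letters (id INTEGER PRIMARY KEY, author TEXT, year INTEGER)"
)
connection.executemany(
    "INSERT INTO letters (author, year) VALUES (?, ?)",
    [("Author A", 1850), ("Author B", 1862), ("Author A", 1871)],
)

# Count the letters per author, restricted to letters from 1860 onwards.
query = """
    SELECT author, COUNT(*) AS n_letters
    FROM letters
    WHERE year >= ?
    GROUP BY author
    ORDER BY author
"""
results = connection.execute(query, (1860,)).fetchall()
print(results)  # [('Author A', 1), ('Author B', 1)]
connection.close()
```

The query states precisely which rows to select, how to group them and in what order to return them, which is the same kind of unambiguous formulation that a program requires.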
Courses and tutorials
Students of our faculty can follow the course
Python Programming for Text Analysis (course code L_AAMPLIN017), which is offered in the context of the research master Linguistics (no prior knowledge required).
The course
Coding the Humanities, which is part of the university minor
Digital Humanities, is a more general course on programming.
Online Python tutorials:
Python for Non-Programmers
List of Python tutorials that don't assume that you have previous experience in programming.
Python for Programmers
List of tutorials that are aimed at people who have previous experience with other programming languages (like C, Perl, Lisp or Visual Basic).
PHP 5 Tutorial (W3Schools).
JavaScript Tutorial (W3Schools).