Digital Humanities Workbench



Applications of Language technology

This page is a brief (and by no means exhaustive) overview of some key language technology applications made possible by the instruments discussed elsewhere in this section. Applications marked with an asterisk (*) are also used or developed in our faculty.

Intelligent spelling and grammar check

The standard spell checker in office software (e.g. Microsoft Office) is a simple form of language technology. It is a useful tool for finding spelling mistakes and grammatical errors, but it also overlooks many of them. This is mainly because it does little with the meaning and context of the words and phrases in question: a correctly spelled word in the wrong place, such as "form" instead of "from", usually goes unnoticed. More advanced, "intelligent" systems that should detect more errors are currently being developed.
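To illustrate why a context-free checker misses such errors, here is a minimal Python sketch. It is not how Microsoft Office or any other product works: it simply checks each word in isolation against a tiny, hypothetical word list (`WORDS`) and suggests list entries one edit away, in the style of Peter Norvig's well-known toy spelling corrector.

```python
# A minimal, context-free spelling checker: every word is checked in isolation
# against a word list, and suggestions are list entries within edit distance 1.
# WORDS is a tiny hypothetical stand-in for a real lexicon.

WORDS = {"the", "letter", "came", "from", "form", "my", "sister", "today"}
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word: str) -> set[str]:
    """All strings one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in LETTERS}
    inserts = {a + c + b for a, b in splits for c in LETTERS}
    return deletes | transposes | replaces | inserts


def check(sentence: str) -> dict[str, list[str]]:
    """Map each unknown word to sorted suggestions taken from the word list."""
    report = {}
    for word in sentence.lower().split():
        if word not in WORDS:
            report[word] = sorted(edits1(word) & WORDS)
    return report


print(check("The leter came form my sister today"))  # → {'leter': ['letter']}
```

The typo "leter" is caught, but "form" is a valid word, so the real-word error slips through; detecting it requires looking at the surrounding words, which is exactly what the more advanced systems aim to do.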

Analysis of text corpora *

In most large text corpora used as a basis for linguistic research, the source texts are enriched with linguistic information. This includes, at minimum, the part of speech and inflection of every word in the text (morpho-syntactic information), and often also the lemma, or dictionary form, of each word (lemmatization). In many recent corpora, (part of) the text has also been given a complete syntactic analysis. Tagger-lemmatizers and parsers play an important role here. Even if you build a text corpus for your own research, you can use these tools for linguistic analysis.
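A tagger-lemmatizer attaches this information to every token. The Python sketch below only illustrates the resulting annotation format (word form, part of speech, inflection, lemma) using a small, hypothetical lookup table (`LEXICON`); the tag names are assumptions, and real tagger-lemmatizers resolve ambiguous words with statistical models trained on annotated corpora rather than a fixed dictionary.

```python
from typing import NamedTuple


class Token(NamedTuple):
    form: str       # the word as it appears in the text
    pos: str        # part of speech (tag names are illustrative)
    features: str   # inflectional information
    lemma: str      # dictionary form


# Hypothetical lexicon: word form -> (part of speech, inflection, lemma).
LEXICON = {
    "the": ("DET", "-", "the"),
    "children": ("NOUN", "plural", "child"),
    "were": ("AUX", "past", "be"),
    "singing": ("VERB", "participle", "sing"),
}


def annotate(text: str) -> list[Token]:
    """Annotate each whitespace-separated word; unknown words get the tag 'X'."""
    tokens = []
    for form in text.split():
        pos, features, lemma = LEXICON.get(form.lower(), ("X", "unknown", form.lower()))
        tokens.append(Token(form, pos, features, lemma))
    return tokens


# Print one token per line in a tab-separated, corpus-like layout.
for token in annotate("The children were singing"):
    print(f"{token.form}\t{token.pos}\t{token.features}\t{token.lemma}")
```

Storing one token per line with its annotations in columns is a common way to distribute annotated corpora, because it keeps the original text recoverable while making every layer searchable.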

Machine translation

A lot of money has been invested in developing machine translation over the past 70 years, but the results have been mixed. Texts within a specific, well-defined domain can be translated fairly well. For an overview, see Aspecten van automatisch vertalen: resultaten - problemen ("Aspects of machine translation: results and problems", Steven Krauwer, 2001-2003). Several commercial systems are currently on the market, but unfortunately their inner workings have never been made public. Systran is one example: the European Commission uses it to produce rough translations for internal use, and it supports many different language pairs. A demo web version is available at http://www.systransoft.com/.

A typical application of machine translation is translating web pages. The Google search engine, for example, offers to translate any non-English web page in its search results. Quality is often less important here; what matters is that the core content comes across, so that you can get a basic idea of the information presented on, for example, a Chinese website.

Search technology (information retrieval) *

Smart search technology not only searches on word form, but also takes meaning, synonyms and related terms into account. Questions can be asked in plain Dutch, and typos and misspellings are corrected automatically.
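The two ideas in this paragraph, correcting misspellings and expanding a query with synonyms, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the documents, the `SYNONYMS` table and the 0.8 similarity cutoff are all hypothetical, and a real system would use a large index and a thesaurus or similar resource. Typo correction here uses the standard library's `difflib.get_close_matches`.

```python
import difflib

# Hypothetical document collection and synonym table.
DOCUMENTS = {
    1: "timetable for trains between utrecht and groningen",
    2: "bicycle rental at the railway station",
    3: "opening hours of the city library",
}
SYNONYMS = {"train": {"railway", "trains"}, "bike": {"bicycle"}}

# All words the system knows: words in the documents plus synonym keys.
VOCABULARY = {w for text in DOCUMENTS.values() for w in text.split()} | set(SYNONYMS)


def expand_query(query: str) -> set[str]:
    """Correct unknown words to the closest known word, then add synonyms."""
    terms = set()
    for word in query.lower().split():
        if word not in VOCABULARY:
            matches = difflib.get_close_matches(word, VOCABULARY, n=1, cutoff=0.8)
            word = matches[0] if matches else word
        terms.add(word)
        terms |= SYNONYMS.get(word, set())
    return terms


def search(query: str) -> list[int]:
    """Return the ids of documents sharing at least one term with the expanded query."""
    terms = expand_query(query)
    return sorted(doc_id for doc_id, text in DOCUMENTS.items() if terms & set(text.split()))


print(search("bikke"))  # → [2]
```

The misspelled "bikke" is corrected to "bike", which is expanded to "bicycle", so document 2 is found even though it never contains the word the user typed.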

Text classification

Automatic text classification assigns documents to predetermined categories or themes. Intelligent search engines, for example, use this technique when crawling the web for documents relevant to a specific query. Another application is the automatic classification of e-mails, so that they can be routed directly to the appropriate department or answered automatically. This makes it easy to respond to bulk questions that always involve the same topic, and gives service desks more time to address specific questions.
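As an illustration of e-mail routing, here is a minimal multinomial Naive Bayes classifier in Python. Naive Bayes is only one of many possible classification techniques and is chosen here for its simplicity; the training e-mails and department labels are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training e-mails, labelled with the department that handled them.
TRAINING = [
    ("billing", "my invoice shows the wrong amount"),
    ("billing", "please refund the double payment on my invoice"),
    ("technical", "the app crashes when i log in"),
    ("technical", "i cannot log in after the update"),
]


def train(examples):
    """Count words per class and documents per class."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for label, text in examples:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log posterior (add-one smoothing)."""
    word_counts, class_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)  # log prior
        for word in text.lower().split():
            if word in vocabulary:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
print(classify("why was my invoice amount so high", model))  # → billing
```

Working with log probabilities avoids numerical underflow when many word probabilities are multiplied, and add-one smoothing prevents a single unseen word from ruling out a department entirely.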

Text mining *

Text mining is a technique for finding high-quality information in large collections of text documents. "High-quality" here means that the information found must be highly relevant to the query. Our faculty uses the Weka program for text mining.
For more information about this topic, please see the Wikipedia entry for Text mining.
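A common building block in text mining is TF-IDF weighting, which scores a word highly when it is frequent in one document but rare in the rest of the collection. The plain-Python sketch below illustrates that idea on a hypothetical three-document collection; it is not how Weka implements it, and the helper names `tf_idf` and `top_terms` are illustrative.

```python
import math
from collections import Counter

# Hypothetical mini-collection; each string stands for one document.
DOCS = [
    "the harbour of amsterdam handled grain and timber",
    "grain prices rose in the harbour towns",
    "the poet wrote about the sea and the harbour",
]


def tf_idf(docs):
    """Return one {word: score} dict per document (term frequency x inverse document frequency)."""
    tokenized = [doc.split() for doc in docs]
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    n = len(docs)
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({w: (c / len(tokens)) * math.log(n / doc_freq[w]) for w, c in counts.items()})
    return scores


def top_terms(doc_scores, k=2):
    """Highest-scoring words first; ties are broken alphabetically for a stable result."""
    return sorted(doc_scores, key=lambda w: (-doc_scores[w], w))[:k]


for i, doc_scores in enumerate(tf_idf(DOCS)):
    print(i, top_terms(doc_scores))
```

Words such as "the" and "harbour", which occur in every document, get a score of zero, so the highest-scoring terms are the ones that set a document apart from the rest of the collection.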

Automatic summarization

Language technology makes it possible to automatically create summaries of a predetermined length from arbitrary texts. One such application is the Automatic Summarizer for Dutch and English scientific documents, developed by Martijn Wieling.
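A simple way to produce a summary of a predetermined length is extractive summarization: score every sentence and keep the best ones in their original order. The Python sketch below scores sentences by the average frequency of their content words. It is a generic textbook approach, not the method used by the Automatic Summarizer, and the small `STOPWORDS` list is an assumption.

```python
import re
from collections import Counter

# Assumed minimal stop-word list; real systems use much larger, language-specific lists.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "is", "are", "to", "it", "was"}


def content_words(text: str) -> list[str]:
    """Lower-cased alphabetic words that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(text: str, max_sentences: int) -> str:
    """Keep the max_sentences highest-scoring sentences, in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        tokens = content_words(sentence)
        return sum(freq[w] for w in tokens) / (len(tokens) or 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


TEXT = ("Language technology analyses text. "
        "Language technology also analyses speech. "
        "The weather was nice.")
print(summarize(TEXT, 2))
# → Language technology analyses text. Language technology also analyses speech.
```

The `max_sentences` parameter sets the length of the summary; the off-topic sentence about the weather scores lowest because its words occur only once in the text.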

Diagnosis, training and support of people with communication disorders

To improve the position of Dutch and Flemish people with communication disorders, the Dutch Language Union (Nederlandse Taalunie) has carried out research on applications of language and speech technology (TST, from the Dutch taal- en spraaktechnologie) for this group. TST can be used to diagnose impairments, to train and help rehabilitate communication skills, and to create tools that support the remaining skills. For more information, including a summary of this research project, see http://taalunieversum.org/taal/technologie/communicatieve_beperkingen/

Dialogue systems (natural language interface systems)

A dialogue system is a communication system based on language technology that mimics human interaction. The user of a dialogue system can enter questions, a few single words, or a story. The input is analyzed linguistically and compared with a database. If the system has sufficient information, it associates the user's input with relevant documents and checks how relevant those documents are. If the documents found are not relevant at first, the system asks the user a question in return, and repeats this until a relevant document is found. A dialogue system can only refer to documents; it cannot give specific information itself. It serves as an interface to a search system, allowing someone to query a product catalog or request travel information (OVIS), for example. Many dialogue systems today are voice-controlled, made possible by adding components for speech analysis and speech synthesis (see below).
For more information about this topic, see the Kennislink article Bellen met een pratende computer ("Calling a talking computer").
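The question-and-answer loop described above can be sketched in Python as follows. This is a deliberately simplified, text-only illustration and not how OVIS works: the timetable "documents", the questions and the scripted user replies are all hypothetical, and "linguistic analysis" is reduced to matching individual words against document attributes.

```python
# Hypothetical documents, each described by a few attributes (slots).
DOCUMENTS = [
    {"from": "utrecht", "to": "groningen", "time": "morning", "doc": "timetable-1"},
    {"from": "utrecht", "to": "groningen", "time": "evening", "doc": "timetable-2"},
    {"from": "utrecht", "to": "zwolle", "time": "morning", "doc": "timetable-3"},
]
SLOTS = ("from", "to", "time")
QUESTIONS = {"from": "Where do you depart from?", "to": "Where do you want to go?",
             "time": "Morning or evening?"}


def matches(doc, word):
    return any(doc[slot] == word for slot in SLOTS)


def dialogue(user_input, ask):
    """Narrow down candidate documents, asking questions until one remains."""
    candidates = list(DOCUMENTS)
    # Use every input word that matches at least one remaining document.
    for word in user_input.lower().split():
        if any(matches(d, word) for d in candidates):
            candidates = [d for d in candidates if matches(d, word)]
    while len(candidates) > 1:
        # Ask about the first slot on which the remaining documents still differ.
        slot = next((s for s in SLOTS if len({d[s] for d in candidates}) > 1), None)
        if slot is None:
            break
        reply = ask(QUESTIONS[slot]).lower().strip()
        candidates = [d for d in candidates if d[slot] == reply]
    return candidates[0]["doc"] if candidates else None


def scripted(replies):
    """Simulate a user with a fixed list of replies."""
    replies = iter(replies)

    def ask(question):
        reply = next(replies)
        print(f"System: {question}\nUser: {reply}")
        return reply

    return ask


print(dialogue("I want to go to Groningen", scripted(["evening"])))  # → timetable-2
```

In line with the description above, the system never answers the question itself: it only returns a reference to the most relevant document, and it asks the user for more information whenever its input still fits several documents.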

Speech synthesis

Speech synthesis (text-to-speech) is becoming increasingly common in applications that read digitized text aloud. It is a great tool for the visually impaired, but the general public is also using it more and more, for example to have e-books or e-mails read aloud in the car. One commercial product in this area is ReadSpeaker, which reads the text on a website aloud to the user. The technology is easy to use and requires no special knowledge or plug-ins. A practical example can be found on the website of the municipality of Haarlemmermeer (click "Lees voor", Dutch for "read aloud"). BrowseAloud is a free program you can install yourself to have the text on any website read aloud. Besides reading the text aloud, it provides a "read-along cursor" that highlights the text currently being spoken, so that the user can read and listen at the same time.

Speech analysis

Speech analysis is the computer-assisted analysis of spoken language. It is used, for example, in dictation software that transcribes speech automatically. Although the technique still has some limitations, it can already be useful in producing technical texts (medical, insurance, etc.). It may also be a solution for people with RSI or other motor conditions or limitations, and for people who need to keep their hands free while they work and therefore cannot operate a keyboard. A state-of-the-art commercial application in this area is Dragon NaturallySpeaking; its website also contains a video demonstrating how the program works.
Another application of speech analysis that is currently the subject of much research is searching spoken media. This allows users to search (untranscribed) audio and video files through a text-based interface. An example for the general public can be found on the website of the Willem Frederik Hermans Institute, which offers, among other things, audio recordings of Willem Frederik Hermans speaking in documentaries and interviews. The Human Media Interaction research group at the University of Twente has combined speech recognition and indexing to let users search for specific fragments in multimedia files. Security and intelligence services have long used similar techniques, for example to scan telephone traffic for suspicious topics.
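The combination of speech recognition and indexing can be sketched as an inverted index that maps each recognized word to the recordings and time points where it was spoken. The Python sketch below assumes the speech recognizer has already produced time-stamped words; the file names and recognizer output are hypothetical, and the recognition step itself is not shown.

```python
from collections import defaultdict

# Hypothetical speech-recognizer output: for each recording, a list of
# (start time in seconds, recognized word) pairs.
RECOGNIZED = {
    "interview_1965.mp3": [(12.4, "writing"), (13.0, "novels"), (48.2, "the"), (48.5, "war")],
    "documentary_1979.mp4": [(5.1, "the"), (5.3, "war"), (6.0, "ended")],
}


def build_index(recognized):
    """Inverted index: word -> list of (recording, start time) occurrences."""
    index = defaultdict(list)
    for recording, words in recognized.items():
        for start, word in words:
            index[word.lower()].append((recording, start))
    return index


def find_fragments(index, query):
    """Return sorted (recording, start time) hits for any word of a text query."""
    hits = set()
    for word in query.lower().split():
        hits.update(index.get(word, []))
    return sorted(hits)


index = build_index(RECOGNIZED)
print(find_fragments(index, "war"))
# → [('documentary_1979.mp4', 5.3), ('interview_1965.mp3', 48.5)]
```

Because every hit carries a time stamp, a player can jump straight to the matching fragment instead of making the user listen to the whole recording; in practice, recognition errors mean that some spoken words will be missing from the index.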
