Corpus details


Official name:TalkBank
Language:Multilingual; English; Dutch
Language type:spoken
Corpus type:special purpose
Period:1999 - 2004
Size:The various subcorpora differ in size.
Description:TalkBank is a multilingual corpus established in 2002. It contains sample databases from several subfields of communication, including first language acquisition, second language acquisition, conversation analysis, classroom discourse, and aphasic language. These databases are used to advance the development of standards and tools for creating, sharing, searching, and commenting on primary linguistic materials via networked computers.
Most of the corpora in TalkBank have audio or video media linked to the transcripts. All transcripts are formatted in the CHAT or CA/CHAT system and can be automatically converted to XML using the CHAT2XML converter.
Exploration:Online: TalkBank Transcript Browser
Downloaded corpus files: CLAN is software designed specifically for exploring TalkBank material. However, most data are stored in ASCII (plain text) format and can therefore also be explored with Wingrep. WordSmith is less useful for exploring CHILDES material, because it ignores line feeds and does not show separate lines in its output.
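Because the transcripts are plain text, a grep-style search that respects line boundaries can also be written in a few lines of Python. The sketch below is illustrative only and is not part of CLAN or any TalkBank tool: the sample transcript and the function name search_utterances are hypothetical, and it assumes the basic CHAT layout in which header lines start with "@", speaker lines with "*", and annotation lines with "%". It ignores tab-indented continuation lines.

```python
import re

# Hypothetical sample in CHAT style (not taken from a real corpus):
# "@" lines are headers, "*" lines are speaker tiers, "%" lines are annotations.
SAMPLE = """@Begin
@Participants:\tCHI Target_Child, MOT Mother
*MOT:\twhere is the ball ?
*CHI:\tball gone .
%mor:\tn|ball v|go .
*CHI:\tmore ball .
@End
"""


def search_utterances(text, pattern, speaker=None):
    """Return (line_number, speaker_code, utterance) for matching speaker tiers."""
    regex = re.compile(pattern)
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Only speaker tiers are searched; headers and annotations are skipped.
        if not line.startswith("*"):
            continue
        code, _, utterance = line[1:].partition(":")
        utterance = utterance.strip()
        if speaker is not None and code != speaker:
            continue
        if regex.search(utterance):
            hits.append((number, code, utterance))
    return hits


# Each hit keeps its own line, which is what a line-based search tool shows.
for hit in search_utterances(SAMPLE, r"\bball\b", speaker="CHI"):
    print(hit)
```

To search a downloaded file instead of the sample, read its contents into a string first and pass that string as the text argument.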
Details:A prominent part of TalkBank is CHILDES (Child Language Data Exchange System), a corpus of first language acquisition data.
See Also: 
Name:TalkBank project site
Description:Website of the TalkBank project
Name:TalkBank Database Guide
Description:This guide documents each corpus in the TalkBank database that does not deal with child language.
