Corpus details

Lancaster/IBM Spoken English Corpus

Official name:Lancaster/IBM Spoken English Corpus
Common name:SEC
Language:English (southern British English)
Language type:spoken
Corpus type:sociolect
Period:1984 - 1987
Size:52,000 words
Description:Collection of mostly prepared (and mostly monologic) southern British English speech (approximating to RP).
Exploration:The corpus consists of plain text files and can be explored with standard concordancing and text search software such as WordSmith and Windows Grep.
Annotation:part of speech
Transcription:orthographic; prosodic
Sound files:not available
Example:M03 002 [_( Weather_NNP Forecast_NNP ]_)
M03 003 [_( Speaker_NNP :_: male_NN ]_)
M03 006 now_RN the_ATI weather_NN forecast_NN until_IN dawn_NN
M03 006 tomorrow_NR ._. ^ over_IN England_NP
M03 007 and_CC Wales_NP ,_, many_AP places_NNS will_MD be_BE
M03 007 cloudy_JJ but_CC dry_JJ ;_; but_CC in_IN parts_NNS
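Processing note:Because the tagged files are plain text, the example lines above can also be read with a short script. The following Python sketch is an illustration only, not part of the corpus distribution. It assumes that the first two fields of each line are a text identifier and a line reference, and that every following item is a word joined to its part of speech code by an underscore. Items without an underscore (such as ^) are kept with no tag, since their meaning is not documented here. The names Token and parse_line are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Token(NamedTuple):
    text_id: str        # assumed: text identifier, e.g. "M03"
    line_ref: str       # assumed: line reference, e.g. "006"
    word: str
    tag: Optional[str]  # None when the item carries no "_TAG" suffix


def parse_line(line: str) -> List[Token]:
    """Split one tagged line into Token records."""
    fields = line.split()
    if len(fields) < 2:
        return []
    text_id, line_ref, items = fields[0], fields[1], fields[2:]
    tokens = []
    for item in items:
        if "_" in item:
            # Split on the last underscore so that "[_(" gives ("[", "(").
            word, tag = item.rsplit("_", 1)
        else:
            # Untagged marker such as "^"; kept as-is (meaning assumed unknown).
            word, tag = item, None
        tokens.append(Token(text_id, line_ref, word, tag))
    return tokens


sample = [
    "M03 006 now_RN the_ATI weather_NN forecast_NN until_IN dawn_NN",
    "M03 006 tomorrow_NR ._. ^ over_IN England_NP",
]
tokens = [tok for line in sample for tok in parse_line(line)]
# Collect all words tagged NN in the sample lines.
nouns = [tok.word for tok in tokens if tok.tag == "NN"]
print(nouns)
```

The same function could be applied line by line to a file from the tagged version of the corpus. The corpus manual should be consulted for the actual meaning of the codes and markers.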
Location:Faculty network, folder G:\LET\Data\Corpora\Engels\SpokenEC
Details:Three versions of the corpus are available: (i) orthographic transcription without further annotation (folder SEC_ORT); (ii) orthographic transcription with part of speech codes (folder SEC_HOR); (iii) orthographic transcription with prosodic annotation (folder SEC_PRO).
See Also: 
Name:Manual SEC
Description:Corpus manual with information about part of speech codes and prosodic transcription
