Santa Barbara Corpus of Spoken American English

Official name:Santa Barbara Corpus of Spoken American English
Common name:Santa Barbara Corpus
Language:English (American English)
Language type:spoken
Corpus type:general / reference
Size:60 discourse segments, totalling approximately 249,000 words.
Description:The Santa Barbara Corpus of Spoken American English is based on a large body of recordings of naturally occurring spoken interaction from all over the United States. The Santa Barbara Corpus represents a wide variety of people of different regional origins, ages, occupations, genders, and ethnic and social backgrounds. The predominant form of language use represented is face-to-face conversation, but the corpus also documents many other ways that people use language in their everyday lives: telephone conversations, card games, food preparation, on-the-job talk, classroom lectures, sermons, story-telling, town hall meetings, tour-guide spiels, and more.
Exploration:The plain text version of the corpus can be explored with standard text search and concordance software such as WordSmith and Windows Grep. The CHAT version of the corpus can be explored with the program CLAN.
Fragmentation:discourse fragments
Sound files:yes
Example:Plain text version:
0.00 9.21	LENORE: 	... So you don't need to go ... borrow equipment from anybody,
9.21 9.52	        	to --
9.52 14.10	        	... to do the feet?
14.10 15.78	        	... [Do the hooves]?
15.01 16.78	LYNNE:  	    [(H)=] >YWN Well,
16.78 18.32	        	   we're gonna have to find somewhere,
18.33 18.85	        	to get,
18.85 20.69	        	(Hx) ... something (Hx) YWN>.
Origin:Linguistics Department of the University of California, Santa Barbara.
Reference:The corpus consists of 4 parts, each with its own reference, see
Location:Faculty network, G:\LET\Data\Corpora\Engels\SBCSAE.
Details:The SBCSAE folder on the faculty network also contains a Perl script that converts the SBC transcript file format to Praat's TextGrid format.
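
For readers who want to process the plain text version themselves, the layout shown under Example can be read with a few lines of Python. This is only a minimal sketch and not the Perl script mentioned above: it assumes, based on the excerpt alone, that each line holds a start and end time separated by whitespace, then a tab, a speaker field (blank on continuation lines), another tab, and the transcribed text. The names Unit, parse_line and parse_transcript are hypothetical and chosen for illustration.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class Unit(NamedTuple):
    """One timed line of transcript (hypothetical structure for illustration)."""
    start: float
    end: float
    speaker: str
    text: str


# Assumed layout, inferred from the excerpt: "<start> <end>\t<speaker>\t<text>".
LINE_RE = re.compile(r"^(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?)\t([^\t]*)\t(.*)$")


def parse_line(line: str, previous_speaker: str = "") -> Optional[Unit]:
    """Parse one line; return None if it does not match the assumed layout."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    start, end, speaker_field, text = match.groups()
    # A blank speaker field means the previous speaker is still talking.
    speaker = speaker_field.strip().rstrip(":") or previous_speaker
    # Note: stripping the text discards the leading indentation of the original.
    return Unit(float(start), float(end), speaker, text.strip())


def parse_transcript(lines: Iterable[str]) -> List[Unit]:
    """Parse all matching lines, carrying the speaker over continuation lines."""
    units: List[Unit] = []
    speaker = ""
    for line in lines:
        unit = parse_line(line, speaker)
        if unit is not None:
            units.append(unit)
            speaker = unit.speaker
    return units
```

Calling parse_transcript on the lines of a transcript file yields one Unit per timed line, which can then be filtered by speaker or time range. Lines that do not fit the assumed layout are skipped rather than raising an error, so the actual files should be checked against this assumption before relying on the result.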
