Corpus details

MicroConcord Corpus

Official name:MicroConcord Corpus
Language:English
Language type:written
Corpus type:genre specific
Size:2 million words
Description:The MicroConcord Corpus consists of two subcorpora of 1 million words each. The subcorpus of Academic Texts (corpus A) is a collection of short samples of academic prose from books and papers published by Oxford University Press (OUP). It was sampled to represent academic writing broadly and covers a range of disciplines, including science, philosophy and religion. The MicroConcord Corpus of Journalistic Texts (corpus B) consists of samples of journalistic text from the British newspaper The Independent, covering home, foreign, business, arts and sports news published in 1989.
Exploration:The corpora are stored as plain text files and can be searched with standard concordancing and search tools such as WordSmith and Windows Grep.
Annotation:text structure (basic)
Example:<sect> Foreign News Page 8 </sect></st>
<dt> 891002 </dt>
<hl> The Bar Conference: Donaldson sets the tone for battle over rights of audience </hl>
<bl> By PATRICIA WYNN DAVIES, Legal Correspondent </bl>
<st>
<p> BARRISTERS could retain much of their monopoly over advocacy in the higher courts under a subtle agenda for discussion spelled out by Lord Donaldson, the Master of the Rolls, as he opened the Bar's annual conference in London at the weekend.
Origin:Oxford University Press
Location:Faculty network, G:\LET\Data\Corpora\Engels\MicroConcord
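Because the files are plain text with lightweight markup, they can also be processed with a short script instead of WordSmith or Grep. The sketch below assumes every file follows the layout of the sample record above: header elements (<sect>, <dt>, <hl>, <bl>) before a story, the story body opened by <st> and closed by </st>, and paragraphs opened by <p> with no closing tag. These tag meanings are inferred from the sample only; the full tagset is described in the corpus documentation. The names Story, parse_stories and kwic are hypothetical helpers, not part of the corpus distribution or of any concordancing tool.

```python
import re
from dataclasses import dataclass, field

# Assumed markup, inferred from the sample record: header elements with
# closing tags, plus bare story/paragraph markers (<st>, </st>, <p>).
TOKEN = re.compile(r"<(sect|dt|hl|bl)>(.*?)</\1>|<(/?st|p)>", re.S)


@dataclass
class Story:
    meta: dict[str, str]                      # e.g. {"dt": "891002", "hl": "..."}
    paragraphs: list[str] = field(default_factory=list)


def _append(story, chunk):
    """Add running text to the current paragraph, normalising whitespace."""
    chunk = " ".join(chunk.split())
    if story is not None and story.paragraphs and chunk:
        story.paragraphs[-1] = (story.paragraphs[-1] + " " + chunk).strip()


def parse_stories(text):
    """Split marked-up newspaper text into stories (hypothetical helper)."""
    stories, pending, current, pos = [], {}, None, 0
    for m in TOKEN.finditer(text):
        _append(current, text[pos:m.start()])
        pos = m.end()
        if m.group(1):                        # header element, kept for the next story
            pending[m.group(1)] = m.group(2).strip()
        elif m.group(3) == "st":              # story body starts
            current = Story(meta=pending)
            pending = {}
            stories.append(current)
        elif m.group(3) == "/st":             # story body ends
            current = None
        elif current is not None:             # <p> starts a new paragraph
            current.paragraphs.append("")
    _append(current, text[pos:])
    return stories


def kwic(stories, word, width=30):
    """Return simple keyword-in-context lines for whole-word matches."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.I)
    lines = []
    for story in stories:
        for para in story.paragraphs:
            for m in pattern.finditer(para):
                left = para[max(0, m.start() - width):m.start()]
                right = para[m.end():m.end() + width]
                lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines


SAMPLE = """<sect> Foreign News Page 8 </sect></st>
<dt> 891002 </dt>
<hl> The Bar Conference: Donaldson sets the tone for battle over rights of audience </hl>
<bl> By PATRICIA WYNN DAVIES, Legal Correspondent </bl>
<st>
<p> BARRISTERS could retain much of their monopoly over advocacy in the higher courts under a subtle agenda for discussion spelled out by Lord Donaldson, the Master of the Rolls, as he opened the Bar's annual conference in London at the weekend.
"""

if __name__ == "__main__":
    for story in parse_stories(SAMPLE):
        print(story.meta.get("dt"), story.meta.get("hl"))
        for line in kwic([story], "monopoly", width=20):
            print(line)
```

Run on the sample, this yields one story whose metadata holds the date, headline and byline, and one concordance line for "monopoly". Note that the leading <sect> element is attached to the story that follows it; whether that matches the intended structure of the full files is an assumption worth checking against the documentation. The character encoding of the files is not stated here, so check it before reading them from disk.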
  
See Also: 
  
Name:Corpus documentation
Description:Documentation of the sources and the tagset.
  

