Corpus details

Reuters Corpus (RCV1)

Official name: Reuters Corpus (RCV1)
Language type: written
Corpus type: genre specific
Period: 1996 - 1997
Size: 810,000 news stories; 90 million words
Description: A collection of 810,000 Reuters newswire stories covering one year, from 20 August 1996 to 19 August 1997. The material is marked up in XML.
Annotation: text categories
Material: news stories
Fragmentation: full texts
Edition: Volume 1 (2000)
Location: Please contact Eric Akkerman.
Details: Use is restricted for copyright reasons. Please contact Eric Akkerman for further information.
Contact: e.akkerman at
See Also: 
Name: Website Reuters corpus
Description: This website contains information about the corpus, including statistics and a short bibliography of relevant articles.
Name: The Reuters Corpus Volume 1 - from Yesterday's News to Tomorrow's Language Resources
Description: Article by T.G. Rose, M. Stevenson and M. Whitehead introducing the Reuters corpus.
Name: Lewis, D. D., Yang, Y., Rose, T. & Li, F. (2004). RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5, 361-397.
Description: This article provides an extensive description of RCV1 and its category coding.
