Corpus Linguistics: Tools and Resources
IT Services Course
Hilary Term 2015
Tips
Please feel free to get in touch with Ylva Berglund Prytz (ylva.berglund@it.ox.ac.uk) or Martin Wynne
(martin.wynne@it.ox.ac.uk) with any questions.
Explore the Corpora mailing list http://www.hit.uib.no/corpora/. You can sign up and ask a question on the
list, or search the archive for answers to questions asked in the past.
AntConc (http://www.antlab.sci.waseda.ac.jp/) is a software application for doing corpus linguistics with
texts and corpora on your own computer.
It is free and very simple to find, download and install. It provides the main functions, such as concordances,
collocations and word lists, and has built-in support for many languages and writing systems. There are
versions for Windows, Mac and Linux.
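To make the three core functions concrete, here is a minimal Python sketch of what a word list, a concordance (in KWIC, "key word in context", format) and a simple collocate count compute. It is an illustration only, not how AntConc works internally: it assumes naive tokenization into lowercase word characters, counts raw co-occurrences within a fixed window rather than applying the association statistics a real collocation tool would offer, and the function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately naive assumption)."""
    return re.findall(r"\w+", text.lower())


def word_list(tokens):
    """Return (word, frequency) pairs, most frequent first."""
    return Counter(tokens).most_common()


def kwic(tokens, node, span=3):
    """Return one key-word-in-context line per occurrence of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            # Right-align the left context so the node words line up.
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines


def collocates(tokens, node, span=3):
    """Count words within `span` tokens of `node` (raw counts, no statistics)."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts.most_common()


if __name__ == "__main__":
    tokens = tokenize("The cat sat on the mat. The dog sat on the cat.")
    print(word_list(tokens)[:3])
    for line in kwic(tokens, "sat", span=2):
        print(line)
    print(collocates(tokens, "sat", span=2))
```

A dedicated tool such as AntConc adds what this sketch leaves out: sorting concordance lines by left or right context, language-aware tokenization, and statistical measures for ranking collocates.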
RESOURCES
For modern European languages in particular, the Virtual Language Observatory at
http://www.clarin.eu/vlo/ is increasingly becoming a one-stop shop; it is constantly being expanded and
kept up to date.
Here is a selection of corpora available online:
English
Brigham Young Corpora (BNC, American English, Time) http://corpora.byu.edu/
British National Corpus http://ota.oerc.ox.ac.uk/bncweb-cgi/BNCweb.pl/ (full access for Oxford users),
http://www.natcorp.ox.ac.uk/, http://bncweb.info/
The Compleat Lexical Tutor concordances http://www.lextutor.ca/conc/
ELISA (interviews on film + transcription) http://www.uni-tuebingen.de/elisa/
MICASE Michigan Corpus of Academic Spoken English http://www.lsa.umich.edu/eli/micase/
Oxford English Corpus (more than 2 billion words and counting)
http://dws-sketch.uk.oup.com/bonito/home.html (log-in required - ask Martin Wynne)
Phrases in English (multiword expressions in the BNC) http://phrasesinenglish.org/
Chinese
The Lancaster Corpus of Mandarin Chinese (download from OTA)
http://www.ota.ox.ac.uk/headers/2474.xml
Czech
Czech National Corpus http://ucnk.ff.cuni.cz/
Finnish
Korp – access to various corpora https://korp.csc.fi/
French
ABU: la Bibliothèque Universelle (the Universal Library; online texts) http://abu.cnam.fr/
Ylva Berglund Prytz (ylva.berglund@it.ox.ac.uk) and Martin Wynne (martin.wynne@it.ox.ac.uk)
Corpus français (Université de Leipzig) http://wortschatz.uni-leipzig.de/ws_fra/
Online concordancers at The Compleat Lexical Tutor (French and English corpora)
http://www.lextutor.ca/concordancers/
German
Das digitale Wörterbuch der deutschen Sprache (Digital Dictionary of the German Language) http://www.dwds.de/
Institut für Deutsche Sprache (Institute for the German Language) http://corpora.ids-mannheim.de/
Italian
MultiSemCor English and Italian parallel corpus http://multisemcor.itc.it/
Portuguese
Corpus do Português http://www.corpusdoportugues.org/
COMPARA – parallel Portuguese-English http://www.linguateca.pt/COMPARA/
Russian
Russian National Corpus (Национальный корпус русского языка) http://ruscorpora.ru/
Swedish
Språkbanken (Swedish corpora) http://spraakbanken.gu.se/
Spanish
Corpus del Español http://www.corpusdelespanol.org/
SOL – Spanish Online: Concordancias españolas en la Web (Spanish concordances on the Web) http://spraakbanken.gu.se/lb/konk/rom2/
Multi-Lingual
Corpuseye Danish project with resources in different languages http://corp.hum.sdu.dk/
Intellitext Online interface to corpora in English, Chinese, Arabic, French, German, Italian, Japanese
http://corpus.leeds.ac.uk/it/
KWICfinder makes concordances of web pages http://www.kwicfinder.com/
SACODEYL multimedia corpora of teenagers' spoken language http://www.um.es/sacodeyl/
WebCorp concordances from online texts http://www.webcorp.org.uk/
ARCHIVES: TEXT, CORPORA, MEDIA
American Rhetoric project Text, audio and (streaming) video. http://www.americanrhetoric.com
Internet Archive Text, audio, video http://www.archive.org
Oxford Text Archive http://ota.ox.ac.uk/ (see 'Catalogue' and 'Oxford' pages)
OxLip+ for electronic text collections http://oxlip-plus.bodleian.ox.ac.uk/