Asd

The document defines key terms used in corpus linguistics. It provides definitions for over 20 terms related to annotating, analyzing, and describing the components of text corpora.

Uploaded by

wnryhabsoqmokaaafr

Annotation = codes used within a corpus that add information about, for example, grammatical
category. Also refers to the process of adding such information to a corpus.

Balance = a property of a corpus (or, more precisely, a sampling frame). A corpus is said to be
balanced if the relative sizes of each of its subsections have been chosen with the aim of adequately
representing the range of language that exists in the population of texts being sampled.

Colligation = in its general sense, co-occurrence between grammatical categories (e.g. verbs
colligate with adverbs); it can also mean a co-occurrence relationship between a word and a
grammatical category.

Collocation = a co-occurrence relationship between words or phrases. Words are said to collocate
with one another if one is more likely to occur in the presence of the other than elsewhere.
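
One way to put a number on "more likely to occur in the presence of the other than elsewhere" is
pointwise mutual information (PMI). The sketch below is an illustration only: PMI is one of several
association measures, the glossary does not name one, and the function name `pmi`, the restriction
to adjacent word pairs, and the whitespace tokenisation are all assumptions made for the example.

```python
import math
from collections import Counter

def pmi(tokens, w1, w2):
    """Pointwise mutual information (base 2) for the adjacent pair (w1, w2).

    A positive score means the pair occurs together more often than the
    individual frequencies of w1 and w2 would predict by chance.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    pair_count = bigrams[(w1, w2)]
    if pair_count == 0:
        return float("-inf")  # the pair never occurs together
    p_pair = pair_count / (n - 1)  # n - 1 adjacent pairs in n tokens
    p_w1 = unigrams[w1] / n
    p_w2 = unigrams[w2] / n
    return math.log2(p_pair / (p_w1 * p_w2))

# Toy data; a real study would use a full corpus and a proper tokeniser.
tokens = "strong tea and strong coffee but never powerful tea".split()
print(pmi(tokens, "strong", "tea"))
print(pmi(tokens, "powerful", "coffee"))
```

On very small samples PMI overrates rare pairs, so in practice it is usually combined with a
minimum frequency threshold.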

Comparability = two corpora or subcorpora are said to be comparable if their sampling frames are
similar or identical.

Concordance = a display of every instance of a specified word or other search term in a corpus,
together with a given amount of preceding and following context for each result or "hit".

Concordancer = a computer program that can produce a concordance from a specified text or corpus.
Modern concordance software can also facilitate more advanced analyses.

Corpus = from the Latin for "body" (plural corpora), a corpus is a body of language representative of a
particular variety of language or genre which is collected and stored in electronic form for analysis
using concordance software.

Corpus-based = where corpora are used to test preformed hypotheses or exemplify existing linguistic
theories. Can mean either (a) any approach to language that uses corpus data and methods, or (b) an
approach to linguistics that uses corpus methods but does not subscribe to corpus-driven principles.

Corpus-driven = an inductive process where corpora are investigated from the bottom up and
patterns found therein are used to explain linguistic regularities and exceptions of the language
variety/genre exemplified by those corpora.

Diachronic = diachronic corpora sample texts across a span of time or from different periods in time
in order to study the changes in the use of language over time. Compare: synchronic.

Frequency list = a list of all the items of a given type in a corpus (e.g. all words, all nouns, all four-
word sequences) together with a count of how often each occurs.
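
A frequency list of this kind can be sketched in a few lines. The example is an illustration under
simplifying assumptions: the function name `frequency_list` is hypothetical, and tokens are obtained
by splitting on whitespace, which a real corpus tool would replace with proper tokenisation.

```python
from collections import Counter

def frequency_list(tokens, n=1):
    """Count every n-token sequence in `tokens`, most frequent first.

    n=1 gives a word frequency list; n=4 gives four-word sequences.
    """
    # zip over n staggered copies of the token list yields each n-gram once
    ngrams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in ngrams).most_common()

tokens = "the cat sat on the mat and the dog sat too".split()
for item, count in frequency_list(tokens)[:3]:
    print(item, count)
```

Restricting the list to one item type (e.g. all nouns) would additionally require part-of-speech
annotation, which this sketch does not attempt.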

Key word in context (KWIC) = a way of displaying a node word or search term in relation to its context
within a text. This usually means the node is displayed centrally in a table with co-text displayed in
columns to its left and right. Here, "key word" means "search term" and is distinguished from
keyword.
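
The layout described here, with the node in the centre and co-text to its left and right, can be
sketched as follows. This is a minimal illustration, not how any particular concordancer works: the
names `kwic` and `format_kwic`, the case-insensitive match, and the context width measured in tokens
are assumptions made for the example.

```python
def kwic(tokens, node, width=3):
    """Return one (left, node, right) row per hit of `node` in `tokens`.

    `width` is the number of co-text tokens kept on each side.
    """
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows

def format_kwic(rows):
    """Right-align the left co-text so every node lines up in one column."""
    pad = max((len(left) for left, _, _ in rows), default=0)
    return "\n".join(f"{left:>{pad}}  {node}  {right}" for left, node, right in rows)

tokens = "The cat sat on the mat and the dog sat by the door".split()
print(format_kwic(kwic(tokens, "sat", width=2)))
```

Each row is one "hit" in the sense of the Concordance entry; sorting the rows by the first word of
the right-hand co-text is a common next step when looking for patterns.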

Keyword = a word that is more frequent in a text or corpus under study than it is in some (larger)
reference corpus. For a word to count as a keyword, the difference in its frequency between the two
corpora must be statistically significant.
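
The glossary does not say which significance test is used. The sketch below assumes the
log-likelihood (G2) statistic, which is widely used for keyness, and treats a score above 3.84 (the
chi-squared critical value for p < 0.05 at one degree of freedom) as significant. The function names
and the example counts are hypothetical.

```python
import math

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood (G2) for one word's frequency in two corpora."""
    total = size_study + size_ref
    combined = freq_study + freq_ref
    # Expected counts if the word were equally frequent in both corpora
    expected_study = size_study * combined / total
    expected_ref = size_ref * combined / total
    g2 = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2

def is_keyword(freq_study, size_study, freq_ref, size_ref, critical=3.84):
    """True if the word is relatively more frequent in the study corpus
    and the difference exceeds the chosen critical value."""
    overused = freq_study / size_study > freq_ref / size_ref
    return overused and log_likelihood(freq_study, size_study, freq_ref, size_ref) > critical

# Hypothetical counts: 50 hits in a 10,000-token study corpus
# against 100 hits in a 1,000,000-token reference corpus.
print(is_keyword(50, 10_000, 100, 1_000_000))
```

The `overused` check matters because G2 is also large for words that are unusually rare in the
study corpus, and those do not fit the definition given here.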

Lemma = a group of word forms related to the same base word and differing only by inflection.

- Walked, walking, and walks are all part of the verb lemma walk.

Lemmatisation = a form of annotation where every token is labelled to indicate its lemma.

Lexis = the words and other meaningful units (such as idioms) in a language. The lexis or vocabulary
of a language is usually viewed as being stored in a kind of mental dictionary, the lexicon.

Metadata = the texts that make up a corpus are data. Metadata is data about that data: it gives
information about things such as the author, publication date, and title for a written text.

Monitor corpus = a corpus that grows continually, with new texts being added over time so that the
dataset continues to represent the most recent state of the language as well as earlier periods.

Node = in the study of collocation, and when looking at a key word in context (KWIC), the node
word is the word whose co-occurrence patterns are being studied.

Reference corpus = a corpus which, rather than being representative of a particular language variety,
attempts to represent the general nature of a language by using a sampling frame emphasizing
representativeness.

Representativeness = a representative corpus is one sampled in such a way that it contains all the
types of text, in the correct proportions, that are needed to make the contents of the corpus an
accurate reflection of the whole of the language or variety of language that it samples.

Sample corpus = a corpus that aims for balance and representativeness within a specified sampling
frame.

Sampling frame = a definition, or set of instructions, for samples to be included in a corpus. A
sampling frame specifies how samples are to be chosen from the population of texts, what types of
texts are to be chosen, the time they come from and other such features. The number and length of
the samples may also be specified.

Statistical significance = a quantitative result is considered statistically significant if there is a low
probability (usually lower than 5%) that the figures extracted from the data are simply the result of
chance. A variety of statistical procedures can be used to test statistical significance.

Synchronic = relating to the study of language or languages as they exist at a particular moment in
time, without reference to how they might change over time. A synchronic corpus contains texts
drawn from a single period – typically the present or very recent past.

Tagging = an informal term for annotation, especially forms of annotation that assign an analysis to
every word in a corpus (such as part-of-speech or semantic tagging).

Text = as a count noun it is any artefact containing language usage – typically a written document or a
recorded and/or transcribed spoken text. As a non-count noun it is collected discourse, on any scale.

Token = any single, particular instance of an individual word in a text or corpus.

Type = a single particular wordform. Any difference of form (e.g. spelling) makes a word a different
type. All tokens comprising the same characters are considered to be examples of the same type.
The term can also be used when discussing text types.

Type-token ratio = a measure of vocabulary diversity in a corpus, equal to the total number of types
divided by the total number of tokens. The closer the ratio is to 1 (or 100%), the more varied the
vocabulary is. This statistic is not comparable between corpora of different sizes.
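
The ratio, and the reason it cannot be compared across corpora of different sizes, can be shown with
a short sketch. The function name `type_token_ratio` and the whitespace tokenisation are assumptions
made for the example; following the Type entry, forms are not lowercased, so any difference in
spelling counts as a different type.

```python
def type_token_ratio(tokens):
    """Number of distinct wordforms (types) divided by number of tokens."""
    if not tokens:
        raise ValueError("cannot compute a type-token ratio for an empty text")
    return len(set(tokens)) / len(tokens)

tokens = "the cat sat on the mat".split()
print(type_token_ratio(tokens))      # 5 types over 6 tokens
print(type_token_ratio(tokens * 2))  # same 5 types over 12 tokens
```

Repeating the same text halves the ratio although the vocabulary has not changed, which is why the
raw statistic tends to fall as a corpus grows.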
