Chap 2 Part 1

Uploaded by

Eman Mohamed Yousef Mohamed


CHAPTER 2

Presented by: Eman Mohamed Yousef

2.5 What Is a Corpus?
The Traditional Notion of a Linguistic Corpus    The Notion of the Web as Corpus
Finite Size                                      Non-Finiteness
Balance                                          Flexibility
Part-whole Relationship                          Decentering / Recentering
Permanence                                       Provisionality
With a corpus such as the BNC, we know           No such certainty exists.
precisely what kinds and types of English
are being analyzed.

The Study of Biber, Egbert, and Davies (2015)
Function: An empirical study to determine the kinds of texts that exist on the Web, based on a collection of texts taken from the Corpus of Global Web-Based English (GloWbE), a corpus that is 1.9 billion words in length and that contains samples of English from 20 different countries in which English is used.
Steps:
1. They developed a very carefully planned methodology for extracting a representative body of texts in the corpus from the Web.
2. They trained a group of evaluators to categorize the texts into specific registers.
Findings:
• They found that three registers predominated:
  1. The Narrative Register
  2. The Informational Description/Explanation Register
  3. The Opinion Register
• They also discovered a number of texts that were hybrid in nature: “documents that combine multiple communicative purposes in a single text”.

2.6 Corpus Size
First Generation Corpora                         Second Generation Corpora
Such as Brown and LOB: relatively short          Such as the BNC: regularly 100 million
(each one million words in length).              words in length or even longer.
Keyed in by hand, which required a               Optical scanners made it easier to
tremendous amount of very tedious and            convert printed texts into digital
time-consuming typing.                           formats.

The Study of Davies:
Function: His goal was to determine the extent to which the length of a corpus could provide valid information on 10 different linguistic constructions, including individual lexical items; frequently occurring grammatical structures, such as modal verbs and passives; collocations; and an assortment of other commonly studied grammatical items.
Steps:
1. He provides a useful guide for determining how lengthy a corpus needs to be to accurately describe particular linguistic structures.
2. He analyzed three different corpora of varying length: the Brown Corpus (one million words), the BNC (100 million words), and COCA (at that time 500 million+ words).
Findings:
• Individual lexical items were better studied in larger corpora than in shorter corpora.
  - Adjectives such as fun or tender are among the adjectives that are most common in COCA, but in the Brown Corpus they occurred five times or fewer.
  - In contrast, certain types of syntactic structures, such as modal verbs, have more even distributions across the three corpora, thus being one of the few areas “where Brown provides sufficient data”.
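
To see why raw corpus size matters for individual lexical items, it can help to convert a relative frequency into the number of hits one would expect in each corpus. The sketch below is illustrative only: the rate of 3 occurrences per million words is an assumed figure, not one reported by Davies, and `expected_hits` is a hypothetical helper. The corpus sizes are the ones listed in the steps above.

```python
# Approximate corpus sizes in words, as given in the notes above.
CORPUS_SIZES = {
    "Brown": 1_000_000,
    "BNC": 100_000_000,
    "COCA": 500_000_000,  # "500 million+" at the time of the study
}


def expected_hits(rate_per_million, corpus_size):
    """Expected raw count of an item that occurs rate_per_million times per million words."""
    return rate_per_million * corpus_size / 1_000_000


# Assumed rate for a moderately rare adjective (hypothetical, for illustration only).
ASSUMED_RATE = 3

for name, size in CORPUS_SIZES.items():
    print(f"{name}: about {expected_hits(ASSUMED_RATE, size):.0f} expected hits")
```

At the assumed rate, a one-million-word corpus yields only a handful of examples while the larger corpora yield hundreds or more, which is in line with the finding that lexical items are better studied in larger corpora.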
The Study of Biber:
Function: Biber (1993) provides a different mechanism for estimating the necessary size of a corpus for the study of particular linguistic constructions.
Steps:
• His approach employs statistical formulas that
  1. take the frequency with which linguistic constructions are likely to occur in a corpus
  2. then calculate how large the corpus will have to be to validly study the distribution of the constructions.
Findings:
• Reliable information could be obtained on frequently occurring linguistic items such as nouns in as few as 59.8 text samples.
• Infrequently occurring grammatical constructions such as conditional clauses required a much larger number of text samples (1,190) for valid information to be obtained.
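
The kind of calculation described in the steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not a reproduction of Biber's procedure: it assumes the required number of text samples is n = s² / se², where s is the standard deviation of the feature's per-text frequency and se is the standard error implied by a tolerable error of ±5% of the mean at a 95% confidence level (t = 1.96). The means and standard deviations in the example are assumed inputs that do not appear in these notes, and `required_samples` is a hypothetical helper; the figures should be checked against Biber (1993) before being relied on.

```python
def required_samples(mean, std_dev, tolerance=0.05, t=1.96):
    """Estimate how many text samples are needed to study a feature.

    mean      -- average frequency of the feature per text sample
    std_dev   -- standard deviation of that frequency across text samples
    tolerance -- tolerable error as a fraction of the mean (assumed: 5%)
    t         -- critical value for the confidence level (assumed: 1.96 for 95%)
    """
    tolerable_error = tolerance * mean
    standard_error = tolerable_error / t
    return std_dev ** 2 / standard_error ** 2


# Assumed inputs (hypothetical here; verify against the original study).
nouns = required_samples(mean=180.5, std_dev=35.6)
conditionals = required_samples(mean=2.5, std_dev=2.2)

print(f"nouns: {nouns:.1f} text samples")
print(f"conditional clauses: {conditionals:.0f} text samples")
```

With these assumed inputs the results land close to the figures quoted above (roughly 60 samples for nouns and roughly 1,190 for conditional clauses). The pattern is the point: a frequent feature with low relative variability needs few samples, while a rare feature whose frequency varies widely relative to its mean needs many.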
2.7 The Internal Structure of a Corpus
BNC and ICE

While the two corpora contain the same range of genres, the genres are much more specifically delineated in ICE corpora than they are in the BNC. For instance, in both corpora, 60 percent of the spoken texts are dialogues and 40 percent are monologues.

BNC: Dialogues and monologues are interspersed among the various genres (e.g. business, leisure) making up the spoken part of the corpus.
ICE: In the category of speech, there are dialogues and monologues. Dialogues can be either private (e.g. direct conversations) or public (e.g. broadcast discussions).

In both corpora, there is a clear bias towards spontaneous dialogues.

While the amount of writing in the BNC greatly exceeds the amount of speech, just the opposite is true in the ICE Corpus.

The BNC makes a distinction between the natural, applied, and social sciences but, unlike ICE, does not include equal numbers of texts in each of these categories.
• Both ICE and the BNC are multi-purpose corpora: they are intended to be used for a variety of different purposes, ranging from studies of vocabulary, to studies of the differences between various national varieties of English, to studies whose focus is grammatical analysis, to comparisons of the various genres of English. For this reason, each of these corpora contains a broad range of genres.

COCA
It represents five major registers: spoken (transcripts of dialogical speech taken from various
television and radio shows), fiction, newspapers, magazines, and academic writing.

The overall size of the corpus is one billion words.

COCA has a diachronic component because it contains text collected over a span of 19 years.
