
Collocations

The document discusses collocations, which are expressions consisting of two or more words that conventionally go together, such as 'strong tea' instead of 'powerful tea'. It highlights characteristics of collocations, including limited compositionality, substitutability, and modifiability, as well as methods for identifying them through frequency analysis, mean and variance, hypothesis testing, and mutual information. The content is based on Chapter 5 of Manning and Schuetze (1999) and emphasizes the importance of collocations in language use.

Uploaded by

Ana Ozaki
Words: Collocations

Lecturer: Ana Ozaki


Semester 01/2019
Free University of Bozen-Bolzano
Collocations
Our topic today is a linguistic phenomenon called
collocation.

‘Natural way’ of using words.

Applications: information retrieval, support for text editors, artificial text generation, etc.

Note: this material is based on Chapter 5 of (Manning and Schuetze, 1999).
Collocations
‘A collocation is an expression consisting of two or more
words that correspond to some conventional way of
saying things.’ (Firth 1957)

E.g. we say ‘strong tea’ instead of ‘powerful tea’.

E.g. we say ‘broad daylight’ instead of ‘bright daylight’.

Collocations: Characteristics
Collocations are characterised by limited compositionality.

A language expression is compositional if the meaning of the expression can be predicted from the meaning of the parts.

Example of collocation: white wine


Not a collocation: expensive wine


Collocations: Characteristics
Limited substitutability: a word in a collocation cannot be replaced by another word, even one that has the same meaning in the context.

E.g., cannot say yellow wine instead of white wine

Limited modifiability: cannot be freely modified with additional lexical material or through grammatical transformations.

Difficulty in translating the expression into another language.

E.g., ‘ravenous hunger’: (Italian) Fame da lupo (‘wolf's hunger’), (German) Bärenhunger (‘bear's hunger’), (Portuguese) Fome de leão (‘lion's hunger’)
Collocations

Other examples of collocations:

Idioms, e.g. ‘You can't judge a book by its cover’

For more examples and details on this topic, please see Chapter 5 of (Manning and Schuetze, 1999).
Finding Collocations
Frequency: if two (or more) words occur together a lot, that is
evidence that they form a collocation, by the limited
substitutability and the limited modifiability principles.

However, the selection of the most frequent bigrams is usually not very effective.

Lots of function words: ‘of the’, ‘in the’, ‘to the’, …

A part-of-speech filter can greatly improve the results.

Adjective Noun (e.g., strong tea)
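
The frequency approach with a part-of-speech filter can be sketched as follows. This is a minimal illustration, not the lecture's experiment: the toy sentence, the tag names (ADJ, NOUN, ...) and the helper `bigram_counts` are all assumptions made up for this example, and the tags are assigned by hand rather than by a real tagger.

```python
from collections import Counter

# Hypothetical toy corpus and hand-written tag lexicon (illustration only).
TEXT = ("the lid of the pot of strong tea is in the kitchen of the house "
        "and the cup of strong tea is in the garden of the house")
TAGS = {
    "the": "DET", "of": "ADP", "in": "ADP", "and": "CONJ", "is": "VERB",
    "strong": "ADJ", "lid": "NOUN", "pot": "NOUN", "tea": "NOUN",
    "kitchen": "NOUN", "house": "NOUN", "cup": "NOUN", "garden": "NOUN",
}
TAGGED = [(word, TAGS[word]) for word in TEXT.split()]


def bigram_counts(tagged, patterns=None):
    """Count adjacent word pairs; if patterns is given, keep only pairs
    whose (tag, tag) combination is in patterns."""
    counts = Counter()
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if patterns is None or (t1, t2) in patterns:
            counts[(w1, w2)] += 1
    return counts


raw = bigram_counts(TAGGED)                                   # no filter
filtered = bigram_counts(TAGGED, patterns={("ADJ", "NOUN")})  # Adjective Noun

print(raw.most_common(3))
print(filtered.most_common(3))
```

Without the filter the most frequent bigram in this toy text is the function-word pair ‘of the’; with the Adjective Noun filter only ‘strong tea’ remains.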


Finding Collocations

TODO add experiment


Collocational Window
Collocations may not be fixed phrases.

Many collocations can stand in a more flexible structure than a fixed sequence of words.

E.g., the verb ‘to knock’ forms a collocation with the noun ‘door’.

She knocked on the metal door.

A man knocked on the wooden front door.

They knocked at the door.

Other verbs such as ‘hit’, ‘beat’ or ‘rap’ do not form a collocation with ‘door’.
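
A collocational window over the three example sentences can be sketched as below. The function names, the naive tokenizer and the window size of 5 are assumptions chosen for this illustration; the idea is only that every word is paired with the words that follow it within a fixed distance, instead of with its immediate neighbour alone.

```python
SENTENCES = [
    "She knocked on the metal door.",
    "A man knocked on the wooden front door.",
    "They knocked at the door.",
]


def tokenize(sentence):
    # Naive tokenizer (an assumption): lowercase, drop the final period,
    # split on whitespace.
    return sentence.lower().rstrip(".").split()


def window_pairs(tokens, max_offset):
    """Yield (left, right, offset) for every ordered pair of tokens that are
    at most max_offset positions apart."""
    for i, left in enumerate(tokens):
        for off in range(1, max_offset + 1):
            if i + off < len(tokens):
                yield left, tokens[i + off], off


def offsets_for(left, right, max_offset):
    """Collect the offsets at which `right` follows `left` in SENTENCES."""
    return [off
            for sentence in SENTENCES
            for l, r, off in window_pairs(tokenize(sentence), max_offset)
            if l == left and r == right]


adjacent = offsets_for("knocked", "door", max_offset=1)  # plain bigrams
windowed = offsets_for("knocked", "door", max_offset=5)  # window of 5

print(adjacent, windowed)
```

With adjacent bigrams the pair (‘knocked’, ‘door’) is never observed, while the window of 5 finds it in all three sentences.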
Mean and Variance

One way of discovering the relationship between ‘knock’ and ‘door’ is by computing the mean and variance of the
offsets (i.e., the distance between the two words).

She knocked on the metal door.

A man knocked on the wooden front door.

They knocked at the door.

Mean: $\frac{1}{3}(4 + 5 + 3) = 4$
Mean and Variance
The variance measures how much the offsets deviate from the mean $\mu$.

$$\sigma^2 = \frac{\sum_{i=1}^{n} (d_i - \mu)^2}{n - 1}$$

Standard Deviation (for the three example sentences, with offsets 4, 5, 3 and mean 4)

$$\sigma = \sqrt{\frac{1}{2}\left((4 - 4)^2 + (5 - 4)^2 + (3 - 4)^2\right)} = 1$$
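
The computation for the three ‘knocked ... door’ sentences can be reproduced with Python's standard `statistics` module. This is a small sketch; the helper `offset` and its naive tokenization are assumptions made for this example.

```python
import statistics

SENTENCES = [
    "She knocked on the metal door.",
    "A man knocked on the wooden front door.",
    "They knocked at the door.",
]


def offset(sentence, left="knocked", right="door"):
    """Distance in tokens from the first `left` to the first `right`."""
    # Naive tokenization (an assumption): lowercase, drop the final period.
    tokens = sentence.lower().rstrip(".").split()
    return tokens.index(right) - tokens.index(left)


offsets = [offset(s) for s in SENTENCES]
mean = statistics.mean(offsets)
variance = statistics.variance(offsets)  # sample variance, divides by n - 1
std_dev = statistics.stdev(offsets)      # square root of the sample variance

print(offsets, mean, variance, std_dev)
```

For these sentences the offsets are 4, 5 and 3, the mean is 4, and both the sample variance and the standard deviation are 1. A low standard deviation indicates that the two words tend to occur at about the same distance from each other.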
Mean and Variance

TODO Experiment
Mean and Variance

TODO
Hypothesis Testing
High frequency and low variance may be accidental.

To know whether words occur together (or at a close distance) more than by chance, one can resort to a
technique from statistics called hypothesis testing.

The key point here is that we are not simply considering the frequency but also the amount of data, so that we
can rule out events that occur by chance.
Hypothesis Testing

Null hypothesis: Assume that co-occurrences between words are by chance.

Compute the probability p that the event would occur if the null hypothesis H is true; reject H if p is too low.

Typically, H is rejected if p is lower than 0.05.


Hypothesis Testing
The null hypothesis is that two words v, w do not form a collocation.

This means that v, w are generated independently, and

$P(vw) = P(v)P(w)$

In this model, the probability of co-occurrence is the product of the probabilities of occurrence of each word.
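
One possible way to test the independence assumption is a t test, as described in Chapter 5 of (Manning and Schuetze, 1999). The sketch below is an illustration under stated assumptions: the counts are hypothetical, each bigram position is treated as a Bernoulli trial, and the p-value uses a one-sided normal approximation, which is reasonable only for large corpora. The function names are made up for this example.

```python
import math


def t_score(count_v, count_w, count_vw, n_tokens):
    """t statistic for the bigram 'v w' against the null hypothesis
    P(vw) = P(v)P(w). Requires count_vw > 0."""
    p_null = (count_v / n_tokens) * (count_w / n_tokens)  # expected under H
    p_obs = count_vw / n_tokens                           # observed sample mean
    # Standard error of the mean of n_tokens Bernoulli trials.
    std_err = math.sqrt(p_obs * (1 - p_obs) / n_tokens)
    return (p_obs - p_null) / std_err


def p_value(t):
    """One-sided p-value under a normal approximation (large samples)."""
    return 0.5 * math.erfc(t / math.sqrt(2))


# Hypothetical counts: v occurs 1000 times, w 2000 times, 'v w' 50 times,
# in a corpus of one million tokens.
t = t_score(1000, 2000, 50, 1_000_000)
p = p_value(t)
print(t, p, p < 0.05)
```

With these made-up counts the bigram occurs 25 times more often than independence predicts, and p falls far below 0.05, so the null hypothesis would be rejected. If the bigram occurred only twice, which is exactly the expected count under independence, t would be about 0 and p about 0.5.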
Mutual Information
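
A minimal sketch of pointwise mutual information, assuming the standard definition $I(v, w) = \log_2 \frac{P(vw)}{P(v)P(w)}$ from Chapter 5 of (Manning and Schuetze, 1999). The counts and the function name `pmi` are hypothetical and serve only as an illustration.

```python
import math


def pmi(count_v, count_w, count_vw, n_tokens):
    """Pointwise mutual information (in bits) of the bigram 'v w',
    with probabilities estimated by relative frequencies.
    Requires all counts to be greater than zero."""
    p_v = count_v / n_tokens
    p_w = count_w / n_tokens
    p_vw = count_vw / n_tokens
    return math.log2(p_vw / (p_v * p_w))


# Hypothetical counts in a corpus of one million tokens.
associated = pmi(1000, 2000, 50, 1_000_000)   # bigram 25x more frequent than chance
independent = pmi(1000, 2000, 2, 1_000_000)   # bigram as frequent as chance predicts
print(associated, independent)
```

A value near 0 means the two words co-occur about as often as independence predicts; larger positive values indicate a stronger association.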
Take Home Message

Collocations are language expressions consisting of two or more words that correspond to the usual way of
expressing ideas.

It is a linguistic phenomenon shared by practically all human languages (it is easier to form sentences using
‘chunks’ of words than word by word).
Take Home Message
Collocations are characterised by:

limited compositionality,

limited modifiability,

limited substitutability.

Example: ‘Fame da lupo’.


Take Home Message
Finding collocations:

Frequency plus part-of-speech filter.

Mean and variance.

Hypothesis testing.

Mutual information.