Collocations
Chapter · December 2006
DOI: 10.1016/B0-08-044854-2/00414-4
Collocation
Ramesh Krishnamurthy, Aston University, Birmingham, UK
Abstract
J.R. Firth first gave collocation prominence in linguistic theory. Halliday, Sinclair,
Stubbs, and Hoey have all extended Firth’s ideas. Palmer and Hornby recognised the
pedagogical value of collocation, and incorporated it into their early EFL dictionaries.
More recent EFL dictionaries, based on large computerized language corpora, have
used complex software and statistical measures to gain further insights into the way
that collocational patterns are woven into language, and the results are visible in
the dictionary entries of later editions. This has fed back into language pedagogy, and
is also influencing translation and computational research.
Body Text
1. Historical use of the term collocation
The fact that certain words co-occurred frequently was noticed in Biblical
concordances (e.g. Cruden listed the occurrences of dry with ground in 1769). Style
and usage guides in the 19th-20th centuries (e.g. Fowler’s The King’s English)
addressed only the overuse of collocations, labelling them clichés, and criticising their
use, especially by journalists (e.g. Brian O’Nolan, in more humorous vein).
2. Collocation in modern Linguistics
In modern linguistics, collocation refers to the fact that certain lexical items tend to
co-occur more frequently in natural language use than syntax and semantics alone
would dictate. Collocation was first given theoretical prominence by J.R. Firth, who
separated it from cognitive and semantic ideas of word-meaning (calling it an
‘abstraction at the syntagmatic level’) and accorded it a distinct status in his account
of the linguistic levels at which meaning can arise. Firth implicitly indicated that
collocation required a quantitative basis, giving actual numbers of co-occurrences in
some texts.
Halliday saw collocation as a cohesive device and identified the need for a measure of
significant proximity between collocating items, and said that collocation could only
be discussed in terms of probability, thus validating the need for quantitative analyses
and the use of statistics. Sinclair performed the first computational investigation of
collocation, comparing written and spoken corpora, identifying 5 words as the span
of significant proximity, and experimenting with statistical measures and
lemmatization.
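The notion of a span can be made concrete with a short sketch. The Python below is purely illustrative and is not taken from the studies cited: the function name collocates_within_span, the whitespace tokenisation, and the toy sentence are all assumptions. It counts every word that falls within a given number of tokens on either side of a node word.

```python
from collections import Counter

def collocates_within_span(tokens, node, span=5):
    """Count the words found within `span` tokens either side of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - span)
        end = min(len(tokens), i + span + 1)
        for j in range(start, end):
            if j != i:  # skip the node occurrence itself
                counts[tokens[j]] += 1
    return counts

# Toy sentence, invented for the example (not corpus data).
text = ("the dentist asked the patient to sit in the chair "
        "while the dentist checked a molar")
tokens = text.split()

counts = collocates_within_span(tokens, "dentist", span=2)
print(counts.most_common())
```

Calling the function with span=5 would correspond to the five-word window mentioned above. A real investigation would also need lemmatization and a far larger corpus before the counts could be treated as evidence.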
Halliday and Sinclair thought that collocation could enable a lexical analysis of
language independent of grammar. Sinclair suggested that lexical items could be
defined by their collocational environments, and saw collocation as part of the idiom
principle (lexically determined choices), as opposed to the open choice principle
(grammatically determined choices). Leech included ‘collocative’ in his categories of
meaning, but marginalized it as an idiosyncratic property of individual words,
incapable of contributing to generalizations. Sinclair and Stubbs suggest that all
lexical items have collocations. Hoey accommodates collocation within a model of
‘lexical priming’, and suggests that most sentences are made up of interlocking
collocations, and can therefore be seen as reproductions of earlier sentences.
3. Collocation and lexicography
The pedagogical value of collocation was recognized by English teachers in the
1930s, and English collocations were described in detail by Harold Palmer in a report
on phraseology research with A.S. Hornby, using the term fairly loosely to cover
longer phrases, proverbs, etc as well as individual word-combinations. They showed a
major interest in the classification of collocations in grammatical and semantic terms,
but also used collocations to indicate the relevant senses of words in wordlists (draw
1. e.g., a picture 2. e.g., a line), and in their dictionary examples (a practice continued
in Hornby’s 1948 OALD and subsequent editions).
Early EFL dictionaries avoided using the term collocation, e.g. OALD 1974 refers to
‘special uses of an adjective with a preposition’ (liable: ~for, be ~ to sth), ‘special
grammatical way in which the headword is used’ (meantime: in the ~). LDOCE 1978
refers to ‘ways in which English words are used together, whether loosely bound or
occurring in fixed phrases’ and ‘special phrases in which a word is usually (or
always) found’, but also has a section headed ‘Collocations’, defined as ‘a group of
words which are often used together to form a natural-sounding combination’ and
states that they are shown in 3 ways: in example sentences, in explanations in Usage
Notes, or in heavy black type inside round brackets if they are very frequent or almost
a fixed phrase (‘but not an idiom’), signalled by ‘in the phr.’ or similar rubrics, and
gives the example a mountain fastness.
Later EFL dictionaries (Cobuild, Cambridge, Macmillan, etc) continued to
incorporate collocations in their dictionaries, including them in definitions and
examples, and typographically highlighting them in phrases. Sinclair’s Introduction to
the Cobuild Dictionary (1987), in the section on ‘Word and Environment’, talks of
‘the way in which the patterns of words with each other are related to the meanings
and uses of the words’ and says that ‘the sense of a word is bound up with a particular
usage… a close association of words or a grouping of words into a set phrase’ and ‘(a
word) only has a particular meaning when it is in a particular environment’,
discussing examples such as hard luck, hard facts, hard evidence, strong evidence,
tough luck, and sad facts.
In Sinclair (1987), collocates are defined as ‘words which co-occur significantly with
headwords’, and regular or significant collocation as ‘lexical items occurring within
five words… of the headword’ with a greater frequency than expected, which ‘was
established only on the basis of corpus evidence’. For the first time in lexicography, a
statistical notion of collocation had been introduced.
Collocation is used to distinguish senses: ‘Different sets of collocates found with
these different senses pinpoint the fact that they are different senses’; ‘Collocation…
frequently reinforces meaning distinctions’; and lexical sets used in disambiguation
are ‘signalled by coincidence of collocation’ (Sinclair 1987). Collocation can also be
a marker of metaphoricity: the presence of modifiers and qualifiers indicates
metaphorical uses of treadmill and blanket (e.g. …the corporate treadmill; …the
treadmill of office life; a security blanket for new democracies; a blanket of snow).
Collocation is the ‘lexical realisation of the situational context’ (ibid.). In the central
patterns of English, ‘meaning was only created by choosing two or more words
simultaneously’ (ibid.). However, the flexibility of collocation (sometimes crossing
sentence boundaries) caused problems in the wording of definitions: often, ‘no
particular group of collocates occurs in a structured relationship with the word’ and
therefore ‘there is no suitable pattern ready for use as a vehicle of explanation’ (ibid.).
The difficulty of eliciting collocates by intuition is discussed: we tend to think of
semantic sets; feet suggests ‘legs, toes, head’ or ‘shoe, sandals, sock’, or ‘walk, run’,
whereas significant corpus collocates of feet are ‘tall, high, long, and numbers’ (ibid.).
Prompted by hint, we produce ‘subtle, small, clue’; the corpus indicates ‘give, take,
no’. The difference between left-hand and right-hand collocates is exemplified by
open: the most frequent words before open are ‘the, to, an, is, an, wide, was, door,
more, eyes’ and after open are ‘to, and, the, for, up, space, a, it, in, door’ (ibid.).
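The left-hand versus right-hand distinction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name positional_collocates, the one-word span, and the toy sentence are invented here, and the output is not comparable with the corpus figures quoted above.

```python
from collections import Counter

def positional_collocates(tokens, node, span=1):
    """Return separate counts for words before and after `node`."""
    left, right = Counter(), Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        left.update(tokens[max(0, i - span):i])      # words preceding the node
        right.update(tokens[i + 1:i + 1 + span])     # words following the node
    return left, right

# Toy sentence, invented for the example (not corpus data).
tokens = ("the door was wide open to the public "
          "and her eyes were open to the light").split()

left, right = positional_collocates(tokens, "open", span=1)
print("before:", left.most_common())
print("after: ", right.most_common())
```

Keeping the two counters apart is the point of the design: a single merged count would hide the fact that a word such as wide occurs before open while to occurs after it.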
Lexicographers can also use collocations to distinguish between near-synonyms, e.g.
the difference between electric (collocates: specific devices such as guitar, chair,
light, car, motor, windows, oven, all ‘powered by electricity’), and electrical
(collocates: more generic terms such as engineering, equipment, goods, appliances,
power, activity, signals, systems, etc, ‘concerning or involving electricity’).
4. Finding collocations in a corpus
Initially, collocates for dictionary headwords were identified manually by
lexicographers wading through pages of printouts of concordance lines. This was
clearly unsatisfactory, and only impressionistic views were feasible. Right-sorted
concordances obscured left-context collocates and vice versa. The fixed-length
context of printouts prevented the observation of collocates beyond a few words.
Subsequent software developments have enabled the automatic measurement of
statistically significant co-occurrences, within a specifiable and adjustable span or
window of context, using different measures of statistical significance, principally
mutual information (or MI-score) and t-score. MI-score privileges lower-frequency,
high-attraction collocates (e.g. dentist with hygienist, optician, and molar) while t-
score favours higher-frequency collocates (e.g. dentist with chair), including
significant grammatical words (e.g. dentist with a, and your). The software can also
display the collocate’s positional distribution if required, and recursive options are
available to investigate the detailed phraseology of collocating items.
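To illustrate why the two measures rank collocates differently, the sketch below uses one common formulation: MI as the base-2 logarithm of O/E, and t-score as (O - E) divided by the square root of O, where O is the observed number of co-occurrences and E the number expected by chance. The estimate of E used here (node frequency times collocate frequency times window size, divided by corpus size) is an assumption, as are the function names. All the counts are invented for the example and are not corpus figures.

```python
import math

def expected(node_freq, collocate_freq, corpus_size, window):
    """Chance co-occurrences, assuming the two words are independent."""
    return node_freq * collocate_freq * window / corpus_size

def mi_score(observed, node_freq, collocate_freq, corpus_size, window):
    """Mutual information: log2 of observed over expected co-occurrences."""
    e = expected(node_freq, collocate_freq, corpus_size, window)
    return math.log2(observed / e)

def t_score(observed, node_freq, collocate_freq, corpus_size, window):
    """t-score: observed minus expected, scaled by sqrt(observed)."""
    e = expected(node_freq, collocate_freq, corpus_size, window)
    return (observed - e) / math.sqrt(observed)

# Hypothetical figures: a 1,000,000-word corpus, 'dentist' occurring 100 times,
# and a window of 10 tokens (5 either side of the node).
N, DENTIST, WINDOW = 1_000_000, 100, 10

# A rare collocate with strong attraction versus a frequent one.
mi_hygienist = mi_score(10, DENTIST, 20, N, WINDOW)
t_hygienist = t_score(10, DENTIST, 20, N, WINDOW)
mi_chair = mi_score(30, DENTIST, 2000, N, WINDOW)
t_chair = t_score(30, DENTIST, 2000, N, WINDOW)

print(f"hygienist: MI={mi_hygienist:.2f}  t={t_hygienist:.2f}")
print(f"chair:     MI={mi_chair:.2f}  t={t_chair:.2f}")
```

With these invented counts, the rare collocate receives the higher MI-score and the frequent collocate the higher t-score, which mirrors the contrast described above. Other formulations of both measures exist, so the absolute values should not be compared across tools.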
Software has also become more publicly available, from MicroConcord to
Wordsmith Tools and Michael Barlow’s Collocate. Kilgarriff and Tugwell’s
WordSketch (Kilgarriff et al 2004) was used in creating the Macmillan EFL
dictionary, and offers clause-functional information about collocations, e.g. wear +
objects: suit, dress, hat, etc + prepositional phrases (after of: armour, clothing, jeans,
etc; after with: pride, sleeve, collar, etc; after on: sleeve, wrist, finger, etc; after over:
shirt, head, dress, etc); similarly fish is the subject of the verbs swim, catch, fry, etc,
the object of the verbs catch, eat, feed, etc, modified by the adjectives tropical, bony,
oily, etc, and so on.
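The idea of grouping collocates by grammatical relation can be sketched as follows. This is not the WordSketch implementation: the function name group_by_relation, the relation labels, and the (relation, headword, collocate) triples are all hypothetical, and the triples stand in for the output of a parser that is assumed to exist.

```python
from collections import Counter, defaultdict

def group_by_relation(triples, headword):
    """Group the collocates of `headword` by grammatical relation."""
    sketch = defaultdict(Counter)
    for relation, head, collocate in triples:
        if head == headword:
            sketch[relation][collocate] += 1
    return sketch

# Invented triples, standing in for parsed corpus data.
triples = [
    ("object", "wear", "suit"),
    ("object", "wear", "hat"),
    ("object", "wear", "suit"),
    ("subject_of", "fish", "swim"),
    ("object_of", "fish", "catch"),
    ("modifier", "fish", "tropical"),
]

wear_sketch = group_by_relation(triples, "wear")
fish_sketch = group_by_relation(triples, "fish")

for relation, collocates in fish_sketch.items():
    print(relation, collocates.most_common())
```

In practice each relation's collocates would be ranked by a statistical measure rather than by raw counts, but the grouping step is the part that distinguishes this kind of display from a flat collocate list.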
Lexicographers are in general less concerned about the detailed classification of
collocations, although their judgments affect both the placement and specific
treatment of the combinations. Hornby’s attempts at classification (focusing on verbs)
later used transformations and meaning distinctions as well as surface patterns, and
Hunston and Francis (2000) list the linguistic and lexicological terminology that has
developed subsequently for collocational units: lexical phrases, composites, gambits,
routine formulae, phrasemes, etc, and refer to the work of Moon and Mel’čuk in
discussing degrees of fixity and variation, which does impact on lexicography.
However, one of Firth’s original terms, colligation, used to describe the habitual co-
occurrence of grammatical elements, has not achieved the same widespread usage as
collocation. One manifestation of colligation, phrasal verbs, the combination of verb
and particle (adverb or preposition) to form semantic units, has been highlighted in
EFL dictionaries, and several EFL publishers have produced separate dictionaries of
phrasal verbs.
There have been some dictionaries of collocations, but so far each has had its own
limitations: not wholly corpus-based (e.g. Benson, Benson and Ilson; Hill and Lewis),
based on a small corpus (e.g. Kjellmer), or limited coverage (the recent Oxford
Collocations Dictionary for Students of English).
5. Collocation in computational linguistics, pedagogy, and translation
Interest in collocation has increased substantially in the past decade, as evidenced by
workshops at lexicographical, linguistic, pedagogical, and translation conferences.
For computational purposes, the relevant features of collocations are that they are
‘arbitrary, domain independent, recurrent, and cohesive lexical clusters’ (Smadja
1993), and ‘of limited semantic compositionality’ (Manning and Schütze 1999).
But the greatest interest has been generated in the pedagogic profession, with
numerous conference and journal papers. Lewis’s book (2000) encapsulates the main
concerns: students do not recognise collocations in their input, and hence fail to
produce them; collocation represents fluency (which precedes accuracy, represented
by grammar); transparent versus ‘arbitrary’ (or idiomatic) combinations, with familiar
words in rarer combinations (a heavy smoker is not a fat person); transformation is
misleading (extremely disappointed but rarely extreme disappointment); students may
generalise more easily from corpus concordance examples than from canonical
versions in dictionaries (exploring versus explaining); collocation as a bridge between
the artificial separation of lexis and grammar; collocation extends knowledge of
familiar words (easier than acquiring new words in isolation); longer chunks are more
useful and easier to store than isolated words.
6. Conclusions and the future
Across many fields, collocation appears to have a promising future. The applications of
collocation in language teaching have been one of the notable recent successes. Its
more detailed exploration in large language corpora requires a significant advance in
software. The exact parameters are not fully established, and the statistical measures
can be improved. Research to identify word-senses by the clustering of collocates was
initiated in the 1960s (Sinclair et al 1970), but has still not become sufficiently robust
for automatic processing. The identification of lexical sets by collocation, signalled in
Sinclair (1966, 1970) and Halliday (1966), is yet to be achieved, as is a corpus-
generated thesaurus. The theoretical impetus of collocation has yet to reach the level
of a language-pervasive system, although Hoey’s notion of Lexical Priming heads in
that direction.
Further Reading
Benson, M., Benson, E. & Ilson, R. (1986). The BBI Combinatory Dictionary of
       English. New York: John Benjamins
Church, K.W. & Hanks, P. (1989). ‘Word Association Norms, Mutual Information,
       and Lexicography’ in Proceedings of the 27th Annual Meeting of the
       Association for Computational Linguistics, reprinted in Computational
       Linguistics 16:1, 1990.
Church, K.W., Gale, W., Hanks, P., & Hindle, D. (1990). ‘Using Statistics in Lexical
       Analysis’, in U. Zernik (ed.) Lexical Acquisition: Using on-line Resources to
       Build a Lexicon. Lawrence Erlbaum Associates.
Clear, J. (1993). ‘From Firth Principles: Computational Tools for the Study of
       Collocation’, in Baker, M., Francis, G., & Tognini-Bonelli, E. (eds.) Text and
       Technology. Amsterdam: John Benjamins.
Cowie, A.P. (1999). English Dictionaries for Foreign Learners - a History. Oxford:
       Clarendon Press.
Firth, J.R. (1957). ‘Modes of Meaning’ in Papers in Linguistics 1934-51. London:
       Oxford University Press.
Firth, J.R. (1957). ‘A Synopsis of Linguistic Theory 1930-55’ in Studies in Linguistic
       Analysis, Philosophical Society, Oxford; reprinted in F. Palmer (ed.) Selected
       Papers of J.R. Firth. Harlow: Longman.
Halliday, M.A.K (1966). ‘Lexis as a linguistic level’ in Bazell, C.E., Catford, J.C.,
       Halliday, M.A.K., Robins, R.H. (eds.) In Memory of J.R. Firth. London:
       Longman
Halliday, M.A.K. & Hasan, R. (1976). Cohesion in English. London: Longman
Hill, J. & Lewis, M. (1997). LTP Dictionary of Selected Collocations. Hove: LTP
Hoey, M (2003). ‘Textual colligation – a special kind of lexical priming’ to appear in
       K Aijmer & B Altenberg (eds) Proceedings of ICAME 2002, Göteborg.
Kenny, D. (1998). ‘Creatures of Habit? What Translators Usually Do with Words’ in
        Meta 43(4), 515-523.
Kilgarriff, A., Rychly, P., Smrz, P. & Tugwell, D. (2004). ‘The Sketch Engine’, in
       Williams, G. & Vessier, S. (eds.) Proceedings of Euralex 2004. Lorient,
       France: Université de Bretagne Sud.
Kjellmer, G. (1994). A dictionary of English collocations. Oxford: Clarendon
       Press
Leech, G. (1974). Semantics. London: Penguin.
Lewis, M. (2000). Teaching Collocation. Hove: Language Teaching Publications.
Louw, B. (1993). ‘Irony in the text or insincerity in the writer? The diagnostic
       potential of semantic prosodies’ in M Baker et al (eds) Text and Technology.
       Amsterdam: John Benjamins.
Moon, R. (1998). Fixed Expressions and Idioms in English: A Corpus-based
       Approach. Oxford: OUP
Palmer, H.E. (1933). Second Interim Report on English Collocations. Tokyo:
       Kaitakusha
Sinclair, J.M. (1966). ‘Beginning the Study of Lexis’ in Bazell, C.E., Catford, J.C.,
       Halliday, M.A.K., Robins, R.H. (eds.) In Memory of J.R. Firth. London:
       Longman
Sinclair, J.M., Jones, S. & Daley, R. (1970). English Lexical Studies, Report to OSTI
       on Project C/LP/08. Now published as Krishnamurthy (ed.) (2004). English
       collocation studies: the OSTI Report. London: Continuum.
Sinclair, J.M. (1987). Looking Up - An account of the COBUILD Project in lexical
       Computing. London: Collins ELT.
Sinclair, J.M. (1987). ‘Introduction’ In the Collins Cobuild English Language
       Dictionary. London/Glasgow: Collins.
Sinclair, J.M. (1987). ‘Collocation: a progress report’ in Steele, R. & Threadgold, T.
       (eds.) Language Topics. Amsterdam/Philadelphia: Benjamins.
Sinclair, J.M. (1991). Corpus, Concordance, Collocation. Oxford: OUP.
Stubbs, M. (1996). Text and Corpus Analysis. Oxford: Blackwell.
Smadja, F. (1993). ‘Retrieving Collocations from Text: Xtract’, Computational
       Linguistics 19(1):143-177.
Smadja, F., McKeown, K. & V. Hatzivassiloglou (1996). ‘Translating Collocations
       for Bilingual Lexicons: A Statistical Approach’. Computational
       Linguistics 22(1):1-38.
A brief biography
Ramesh Krishnamurthy was born in Madras, India, and has degrees in French and
German from Cambridge University, and Sanskrit and Indian Religions from London
University. He worked for the COBUILD project at Birmingham University from
1984 to 2003, where he compiled and edited dictionaries, grammars, and other
publications, and contributed to the development of corpora, software, and electronic
products. He has been an Honorary Research Fellow at Birmingham University and
Wolverhampton University, and has taught on undergraduate and postgraduate
courses, and supervised postgraduate research. He has contributed to several
European linguistics projects, and conducted workshops and courses on corpus
linguistics and lexicography in several countries.