In corpus linguistics, a collocation is a sequence of words or terms that co-occur more often than would be expected by chance. In phraseology, a collocation is a type of compositional phraseme, meaning that it can be understood from the words that make it up. This contrasts with an idiom, where the meaning of the whole cannot be inferred from its parts and may be completely unrelated to them.
There are about seven main types of collocations: adjective + noun, noun + noun (such as collective nouns), noun + verb, verb + noun, adverb + adjective, verb + prepositional phrase, and verb + adverb.
Collocation extraction is the computational task of finding collocations in a document or corpus, using methods from computational linguistics that resemble data mining.
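As a rough illustration of the first step of collocation extraction, the sketch below collects candidate word pairs by counting adjacent bigrams in a text and keeping those that recur at least `min_count` times. The function name `candidate_bigrams`, the naive whitespace tokenizer, and the threshold are assumptions made for illustration; practical extractors often add part-of-speech filtering and then rank the surviving candidates with a measure of association.

```python
from collections import Counter


def candidate_bigrams(text: str, min_count: int = 2) -> dict[tuple[str, str], int]:
    """Count adjacent word pairs and keep those seen at least min_count times.

    Tokenization is a naive lowercase whitespace split (an illustrative
    assumption; punctuation attached to words is not stripped).
    """
    tokens = text.lower().split()
    # zip(tokens, tokens[1:]) yields each adjacent (w1, w2) pair once per position.
    counts = Counter(zip(tokens, tokens[1:]))
    return {pair: n for pair, n in counts.items() if n >= min_count}


sample = "we make a decision and then we make a decision again"
print(candidate_bigrams(sample))
# {('we', 'make'): 2, ('make', 'a'): 2, ('a', 'decision'): 2}
```

Frequency alone favors pairs of common function words, which is why raw counts are usually only a starting point for ranking.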
Collocations can stand in a syntactic relation (such as verb–object: make and decision), in a lexical relation (such as antonymy), or in no linguistically defined relation at all. Knowledge of collocations is vital for the competent use of a language: a grammatically correct sentence will stand out as awkward if collocational preferences are violated. This makes collocation a common focus for language teaching.
In a key word in context (KWIC) display, corpus linguists specify a key word and list the words immediately surrounding each of its occurrences, to illustrate how the word is used in practice.
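A minimal KWIC display can be sketched as follows: for every occurrence of the key word, show a fixed number of words to its left and right, with the left context right-aligned so the key words form a column. The function `kwic`, its `width` parameter, and the whitespace tokenization are hypothetical choices for illustration, not the interface of any particular concordancer.

```python
def kwic(text: str, keyword: str, width: int = 3) -> list[str]:
    """Return one concordance line per occurrence of keyword.

    Each line shows up to `width` tokens of left and right context.
    Matching is case-insensitive on whitespace tokens, so a token with
    attached punctuation (e.g. "make.") does not match "make".
    """
    tokens = text.split()
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    if not hits:
        return []
    # Pad the left context to a common width so the key words line up.
    pad = max(len(left) for left, _, _ in hits)
    return [f"{left:>{pad}} [{tok}] {right}" for left, tok, right in hits]


text = "They had to make a decision quickly, but the decision was hard to make."
for line in kwic(text, "decision", width=2):
    print(line)
```

Reading down the aligned column makes recurring neighbors of the key word (here, the verb "make") easy to spot by eye.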
The processing of collocations involves a number of parameters, the most important of which is the measure of association, which evaluates whether a co-occurrence is due to chance or is statistically significant. Because language is not random, most candidate word combinations turn out to be statistically significant, so in practice the association scores are used simply to rank the results. Commonly used measures of association include mutual information, t-scores, and log-likelihood [Dunning, Ted (1993): "Accurate methods for the statistics of surprise and coincidence". Computational Linguistics 19, 1 (Mar. 1993), 61–74].
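The sketch below computes two of the measures named above for a single word pair: pointwise mutual information (the form of mutual information usually meant in collocation work) and Dunning's log-likelihood ratio, G². Both are derived from a 2×2 contingency table built from four counts. The function name `association_scores`, its argument names, and the example counts are hypothetical; `n` is assumed to be the total number of word-pair positions in the corpus.

```python
import math


def association_scores(c12: int, c1: int, c2: int, n: int) -> dict[str, float]:
    """Score a word pair (w1, w2) from corpus counts.

    c12: occurrences of the pair w1 w2 (must be positive for PMI)
    c1, c2: occurrences of w1 and of w2
    n: total number of pair positions in the corpus
    """
    if c12 <= 0:
        raise ValueError("c12 must be positive to compute PMI")

    # Observed 2x2 table: rows = w1 / not w1, columns = w2 / not w2.
    observed = [
        [c12, c1 - c12],
        [c2 - c12, n - c1 - c2 + c12],
    ]
    row_totals = [c1, n - c1]
    col_totals = [c2, n - c2]

    # Pointwise mutual information in bits: log2 of P(w1 w2) / (P(w1) P(w2)).
    pmi = math.log2((c12 * n) / (c1 * c2))

    # G^2 = 2 * sum of O * ln(O / E) over all cells; cells with O = 0 add 0.
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            e = row_totals[i] * col_totals[j] / n  # expected count under independence
            if o > 0:
                g2 += o * math.log(o / e)
    g2 *= 2

    return {"pmi": pmi, "log_likelihood": g2}


print(association_scores(c12=20, c1=100, c2=50, n=10_000))
```

When the pair occurs exactly as often as independence predicts, both scores are zero; in practice candidates are sorted by one of these scores rather than filtered by a significance cut-off.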
Rather than select a single definition, Gledhill [Gledhill, C. (2000): Collocations in Science Writing, Narr, Tübingen] proposes that collocation involves at least three different perspectives: co-occurrence, a statistical view, which sees collocation as the recurrent appearance in a text of a node and its collocates [Firth, J.R. (1957): Papers in Linguistics 1934–1951. Oxford: Oxford University Press; Sinclair, J. (1996): "The Search for Units of Meaning", in Textus, IX, 75–106; Smadja, F. A. & McKeown, K. R. (1990): "Automatically extracting and representing collocations for language generation", Proceedings of ACL'90, 252–259, Pittsburgh, Pennsylvania]; construction, which sees collocation either as a correlation between a lexeme and a lexical-grammatical pattern [Hunston, S. & Francis, G. (2000): Pattern Grammar: A Corpus-Driven Approach to the Lexical Grammar of English, Amsterdam: John Benjamins] or as a relation between a base and its collocative partners [Hausmann, F. J. (1989): "Le dictionnaire de collocations", in Hausmann, F.J., Reichmann, O., Wiegand, H.E., Zgusta, L. (eds), Wörterbücher: ein internationales Handbuch zur Lexikographie. Dictionaries. Dictionnaires. Berlin/New York: De Gruyter, 1010–1019]; and expression, a pragmatic view of collocation as a conventional unit of expression, regardless of form [Moon, R. (1998): Fixed Expressions and Idioms, a Corpus-Based Approach. Oxford: Oxford University Press; Frath, P. & Gledhill, C. (2005): "Free-Range Clusters or Frozen Chunks? Reference as a Defining Criterion for Linguistic Units", in Recherches anglaises et Nord-américaines, vol. 38, 25–43].
These different perspectives contrast with the usual way of presenting collocation in phraseological studies. Traditionally, collocation is explained in terms of all three perspectives at once, in a continuum.
There are also a number of specialized dictionaries devoted to describing the frequent collocations in a language [Herbst, T. and Klotz, M.: "Syntagmatic and Phraseological Dictionaries", in Cowie, A.P. (Ed.), The Oxford History of English Lexicography, 2009: part 2, 234–243]. These include, for Spanish, Redes: Diccionario combinatorio del español contemporáneo (2004); for French, Le Robert: Dictionnaire des combinaisons de mots (2007); and for English, the LTP Dictionary of Selected Collocations (1997) and the Macmillan Collocations Dictionary (2010).
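For a bigram $w_1 w_2$ in a corpus of $N$ tokens, with $P(w_1) = \#w_1/N$ and $P(w_2) = \#w_2/N$ the unconditional probabilities of occurrence of each word, the t-score is

$$t = \frac{\bar{x} - \mu}{\sqrt{s^2/N}} \approx \frac{\#w_1w_2/N - P(w_1)\,P(w_2)}{\sqrt{\#w_1w_2/N^2}}$$

where $\bar{x} = \#w_1w_2/N$ is the sample mean of the occurrence of $w_1w_2$, $\#w_1w_2$ is the number of occurrences of $w_1w_2$, $\mu = P(w_1)\,P(w_2)$ is the probability of $w_1w_2$ under the null hypothesis that $w_1$ and $w_2$ appear independently in the text, and $s^2 = \bar{x}(1-\bar{x}) \approx \bar{x}$ is the sample variance. With a large $N$, the t-test is equivalent to a z-test.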
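A short sketch of the t-score for a bigram $w_1 w_2$, computed as $t = (\bar{x} - \mu)/\sqrt{s^2/N}$ with sample mean $\bar{x} = \#w_1w_2/N$, null-hypothesis mean $\mu = P(w_1)\,P(w_2)$, and the small-probability approximation $s^2 \approx \bar{x}$. The function name `t_score` and the example counts are hypothetical, and $N$ is taken to be the corpus size in tokens.

```python
import math


def t_score(pair_count: int, w1_count: int, w2_count: int, n: int) -> float:
    """t-score for the bigram w1 w2 in a corpus of n tokens."""
    if pair_count <= 0:
        raise ValueError("pair_count must be positive (s^2 would be zero)")
    x_bar = pair_count / n                 # sample mean of occurrences of w1 w2
    mu = (w1_count / n) * (w2_count / n)   # mean under the independence hypothesis
    s2 = x_bar                             # s^2 = x_bar * (1 - x_bar) ~= x_bar when x_bar is small
    return (x_bar - mu) / math.sqrt(s2 / n)


# Hypothetical counts: the pair occurs 20 times in a 10,000-token corpus,
# w1 occurs 100 times and w2 occurs 50 times.
print(round(t_score(20, 100, 50, 10_000), 3))  # 4.36
```

Algebraically this simplifies to $(\#w_1w_2 - N\,P(w_1)\,P(w_2))/\sqrt{\#w_1w_2}$, i.e. observed minus expected count divided by the square root of the observed count, which is a convenient form when only raw counts are at hand.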