In linguistics, a grapheme is the smallest functional unit of a writing system [Coulmas, F. (1996), The Blackwell Encyclopedia of Writing Systems. Oxford: Blackwell, p. 174]. The word grapheme is derived from Ancient Greek gráphō ('write') and the suffix -eme (by analogy with phoneme and other emic units). The study of graphemes is called graphemics. The concept of a grapheme is abstract; it is similar to the notion of a character in computing. (A specific geometric shape that represents a particular grapheme in a given typeface is called a glyph.) In orthographic and linguistic notation, a particular glyph (character) is marked as a grapheme, that is, as used in its graphemic sense, by enclosing it within angle brackets.
In the so-called referential conception, graphemes are interpreted as the smallest units of writing that correspond with sounds (more accurately, phonemes). In this conception, the sh in the written English word shake would be a grapheme because it represents the phoneme /ʃ/. The referential conception is linked to the dependency hypothesis, which claims that writing merely depicts speech.
By contrast, the analogical conception defines graphemes analogously to phonemes, i.e. via written minimal pairs such as shake vs. snake. In this example, h and n are graphemes because they distinguish two words. The analogical conception is associated with the autonomy hypothesis, which holds that writing is a system in its own right and should be studied independently from speech. Both conceptions have weaknesses [Lockwood, D. G. (2001), Phoneme and grapheme: How parallel can they be? LACUS Forum 27, pp. 307–316].
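The minimal-pair test behind the analogical conception can be illustrated with a short Python sketch. The function names `contrasts` and `is_minimal_pair` are hypothetical, and the sketch simply compares written words letter by letter, assuming equal length and one character per unit; it is not a full graphemic analysis.

```python
def contrasts(word_a: str, word_b: str) -> list[tuple[int, str, str]]:
    """List the positions at which two equal-length written words differ."""
    if len(word_a) != len(word_b):
        raise ValueError("this sketch only compares words of equal length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(word_a, word_b))
        if a != b
    ]


def is_minimal_pair(word_a: str, word_b: str) -> bool:
    """True if the two words differ in exactly one position."""
    return len(word_a) == len(word_b) and len(contrasts(word_a, word_b)) == 1


print(contrasts("shake", "snake"))        # [(1, 'h', 'n')]
print(is_minimal_pair("shake", "snake"))  # True
```

Under this simplified test, shake and snake contrast only in their second letter, which is what leads the analogical conception to treat h and n as separate graphemes.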
Some models adhere to both conceptions simultaneously by including two individual units [Rezec, O. (2013), Ein differenzierteres Strukturmodell des deutschen Schriftsystems. Linguistische Berichte 234, pp. 227–254], which are given names such as phonological-fit grapheme for the grapheme according to the referential conception (sh in shake) and graphemic grapheme for the grapheme according to the analogical conception (h in shake) [Herrick, E. M. (1994), Of course a structural graphemics is possible! LACUS Forum 21, pp. 413–424].
In newer conceptions, in which the grapheme is interpreted semiotically as a dyadic linguistic sign [Fedorova, L. (2013), The development of graphic representation in abugida writing: The akshara's grammar. Lingua Posnaniensis 55:2, pp. 49–66], it is defined as a minimal unit of writing that is both lexically distinctive and corresponds to a linguistic unit (phoneme, syllable, or morpheme) [Meletis, D. (2019), The grapheme as a universal basic unit of writing. Writing Systems Research].
Thus, a grapheme can be regarded as an abstraction of a collection of glyphs that are all functionally equivalent.
For example, in written English (or other languages using the Latin alphabet), there are two different physical representations of the lowercase Latin letter "a": "a" and "ɑ". Since, however, substituting either of them for the other cannot change the meaning of a word, they are considered to be allographs of the same grapheme, which can be written ⟨a⟩. Similarly, the grapheme corresponding to "Arabic numeral zero" has a unique semantic identity and Unicode value but exhibits variation in the form of the slashed zero. Italic and boldface forms are also allographic, as is the variation seen in serif (as in Times New Roman) versus sans-serif (as in Helvetica) forms.
There is some disagreement as to whether capital and lowercase letters are allographs or distinct graphemes. Capitals are generally found in certain triggering contexts that do not change the meaning of a word: in a proper name, for example, at the beginning of a sentence, or in an all-caps newspaper headline. In other contexts, capitalization can determine meaning: compare, for example, Polish and polish: the former refers to a language, the latter to a substance for shining shoes.
Some linguists consider digraphs like the ⟨sh⟩ in ship to be distinct graphemes, but these are generally analyzed as sequences of graphemes. Non-stylistic ligatures, however, are distinct graphemes, as are various letters with distinctive diacritics.
Identical glyphs may not always represent the same grapheme. For example, the three letters ⟨A⟩, ⟨А⟩ and ⟨Α⟩ appear identical, but each has a different meaning: in order, they are the Latin letter A, the Cyrillic letter Azǔ/Азъ and the Greek letter Alpha. Each has its own code point in Unicode: U+0041, U+0410 and U+0391.
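The difference between the look-alike Latin, Cyrillic and Greek capitals can be made visible with Python's standard `unicodedata` module. This is a minimal sketch; the helper name `describe` and the list `LOOKALIKES` are illustrative and not part of any library.

```python
import unicodedata

# The three look-alike capitals, written as escapes so the distinction
# survives fonts that render them identically.
LOOKALIKES = ["\u0041", "\u0410", "\u0391"]


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in LOOKALIKES:
    print(describe(ch))
# U+0041 LATIN CAPITAL LETTER A
# U+0410 CYRILLIC CAPITAL LETTER A
# U+0391 GREEK CAPITAL LETTER ALPHA

# Despite their identical appearance, the characters do not compare equal.
print(len(set(LOOKALIKES)))  # 3
```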
There are additional graphemic components used in writing, such as punctuation marks, mathematical symbols, word dividers such as the space, and other typographic symbols. Ancient logographic scripts often used silent determinatives to disambiguate the meaning of a neighboring (non-silent) word.
Multigraphs representing a single phoneme are normally treated as combinations of separate letters, not as graphemes in their own right. However, in some languages a multigraph may be treated as a single unit for the purposes of collation; for example, in a Czech dictionary, the section for words that start with ⟨ch⟩ comes after that for ⟨h⟩.
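Treating a multigraph as a single collation unit can be sketched in Python as a custom sort key. This is a deliberately simplified illustration, not the full Czech collation rules: it assumes lowercase input, ignores letters with diacritics, and the names `ALPHABET`, `ORDER`, `collation_key` and the sample word list are hypothetical.

```python
# Simplified alphabet in which "ch" is one unit placed directly after "h".
# Assumption: diacritic letters are omitted from this sketch.
ALPHABET = list("abcdefgh") + ["ch"] + list("ijklmnopqrstuvwxyz")
ORDER = {unit: rank for rank, unit in enumerate(ALPHABET)}


def collation_key(word: str) -> list[int]:
    """Split a lowercase word into units (with 'ch' as one unit) and rank them."""
    key = []
    i = 0
    while i < len(word):
        if word[i:i + 2] == "ch":
            unit, step = "ch", 2
        else:
            unit, step = word[i], 1
        # Units outside the simplified alphabet sort after all known units.
        key.append(ORDER.get(unit, len(ALPHABET)))
        i += step
    return key


words = ["chata", "had", "cena", "ikona", "hrad"]
print(sorted(words))                     # ['cena', 'chata', 'had', 'hrad', 'ikona']
print(sorted(words, key=collation_key))  # ['cena', 'had', 'hrad', 'chata', 'ikona']
```

With the plain code-point sort, chata falls under c; with the unit-aware key it moves after all words beginning with h, matching the dictionary ordering described above. Production software would normally rely on a locale-aware collation library rather than a hand-written key.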