In orthography and typography, a homoglyph is one of two or more graphemes, characters, or glyphs with shapes that appear identical or very similar but may have differing meaning. The designation is also applied to sequences of characters sharing these properties.
In 2008, the Unicode Consortium published its Technical Report #36 on a range of issues deriving from the visual similarity of characters, both within single scripts and between different scripts.
Examples of homoglyphic symbols are (a) the diaeresis and umlaut (both a pair of dots, but with different meaning, although encoded with the same code point); and (b) the hyphen and minus sign (both a short horizontal stroke, but with different meaning, although often encoded with the same hyphen-minus character). Among numerical digits and letters, digit 1 and lowercase l are always encoded separately but in many typefaces are given very similar glyphs, and the same is true of digit 0 and capital O. Virtually every homoglyphic pair of characters can potentially be differentiated graphically with clearly distinguishable glyphs and separate code points, but this is not always done. Typefaces that do not emphatically distinguish the one/el and zero/oh homoglyphs are considered unsuitable for writing source code, IDs and other text where characters cannot always be differentiated without context. Fonts which distinguish glyphs by means of a slashed zero, for example, are preferred for those uses.
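As a small illustration of the point about separate encoding, the following Python sketch lists the code point and Unicode name of a few lookalike characters. The helper name `describe` is hypothetical and introduced only for this example; it relies on the standard-library `unicodedata` module.

```python
import unicodedata


def describe(chars):
    """Return (character, code point, Unicode name) for each character."""
    return [(c, f"U+{ord(c):04X}", unicodedata.name(c)) for c in chars]


# Digit one / lowercase el, digit zero / capital O,
# hyphen-minus / the dedicated minus sign (U+2212).
for char, code_point, name in describe("1l0O-\u2212"):
    print(char, code_point, name)
```

However similar the glyphs look in a given typeface, each character in the sample string has its own code point, so software can always tell them apart even when a reader cannot.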
Allographs are typeface design variants that look different but mean the same thing, for example a dollar sign with one or two strokes. The term synoglyph has a similar but slightly more abstract meaning: for example, the symbol ⟨£⟩ and the letter ⟨L⟩ (in Lsd) both mean the pound sterling, but only in that context. Allographs and synoglyphs are also known informally as display variants.
Most current type designs carefully distinguish between these homoglyphs, usually by drawing the digit zero narrower and the digit one with prominent serifs. Early computer print-outs went even further and marked the zero with a slash or dot, which led to a new conflict with the Scandinavian letter ⟨Ø⟩ and the Greek letter ⟨Φ⟩ (phi). Redesigning typefaces to differentiate these characters has reduced confusion.
Some type designs conform to the DIN 1450 legibility standard by carefully designing such characters to be easy to distinguish: a slashed zero to distinguish it from capital ⟨O⟩; lowercase ⟨l⟩ with a tail and uppercase ⟨I⟩ with serifs to distinguish them from the digit ⟨1⟩; a numeral ⟨5⟩ that is distinct from the capital ⟨S⟩; etc. (Nigel Tao, Chuck Bigelow, and Rob Pike. "Go fonts: DIN Legibility Standard". 2016.)
An example of confusion due to near-homoglyphs arose from the use of a ⟨y⟩ to represent a ⟨þ⟩ (thorn). Early English typesetters imported Dutch typesets that did not contain the latter character, so they used the letter ⟨y⟩ instead because (in Blackletter typeface) the two look sufficiently similar. In modern times this has led to such phenomena as Ye olde shoppe, implying incorrectly that the word the was formerly written ye rather than þe. The spelling of the name Menzies (pronounced Mengis and originally spelled Menȝies) arose for the same reason: the letter ⟨z⟩ was substituted for ⟨ȝ⟩ (yogh).
In certain narrow-spaced fonts (such as Tahoma), placing the letter ⟨c⟩ next to a letter such as ⟨j⟩, ⟨l⟩ or ⟨i⟩ creates a homoglyph: ⟨cj cl ci⟩ resemble ⟨g d a⟩ respectively.
When some characters are placed next to each other and seen together at a glance, they give the visual impression of another, unrelated character. More precisely, some typographic ligatures can look similar to standalone glyphs. For example, the ⟨ﬁ⟩ ligature (of ⟨f⟩ and ⟨i⟩) can look similar to ⟨A⟩ in some typefaces or fonts. This potential for confusion is sometimes used as an argument against ligatures.
Efforts by DNS registry and Web browser designers aim to minimize the risks of homoglyphic confusion. Commonly, this is achieved by prohibiting names which mix character sets from multiple languages (toys-Я-us.org, using the Cyrillic letter ⟨Я⟩, would be invalid, but wíkipedia.org and wikipedia.org still exist as different websites); Canada's .ca registry goes one step further by requiring names which differ only in diacritical marks to have the same owner and same registrar. The handling of Chinese characters varies: in .org and .info, registration of one variant renders the other unavailable to anyone, while in .biz the traditional and simplified versions of the same name are delivered as a two-domain bundle in which both point to the same domain name server.
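The two kinds of check described above can be sketched in Python. This is only a rough approximation under stated assumptions, not how any registry or browser actually implements its policy: the standard library has no script property, so the script of a letter is guessed from the first word of its Unicode name, and the function names `scripts_in_label`, `is_mixed_script` and `fold_diacritics` are hypothetical.

```python
import unicodedata


def scripts_in_label(label):
    """Approximate the set of scripts used by the letters in a label.

    Assumption: the first word of a letter's Unicode name ("LATIN",
    "CYRILLIC", "GREEK", ...) is used as a stand-in for its script.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(label):
    """True if the label's letters appear to come from more than one script."""
    return len(scripts_in_label(label)) > 1


def fold_diacritics(label):
    """Strip combining marks so that names differing only in accents compare equal."""
    decomposed = unicodedata.normalize("NFD", label)
    return "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)


print(is_mixed_script("toys-\u042f-us"))   # Latin letters plus Cyrillic Я
print(is_mixed_script("w\u00edkipedia"))   # accented, but Latin throughout
print(fold_diacritics("w\u00edkipedia") == "wikipedia")
```

The mixed-script test flags the toys-Я-us label but not wíkipedia, which is all Latin; only a diacritic-folding comparison of the kind the .ca rule implies would treat wíkipedia and wikipedia as the same name.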
Relevant documentation will be found both on the developers' Web sites, and on an IDN Forum provided by ICANN.
The Cyrillic letter es (С) not only looks like the Latin letter C, but also occupies the same key on JCUKEN-QWERTY hybrid layout keyboards. This design nuance can be seen on the C/С key of the Keyboard Monument in Yekaterinburg.