Updated 18 October, 2024
This page provides explanations of terms used in the articles about writing systems and scripts. Text in italics is cited from elsewhere (most often the Unicode glossary).
A writing system in which consonants are indicated by the base letters that have an inherent vowel, and in which other vowels are indicated by additional distinguishing marks of some kind modifying the base letter. The term “abugida” is derived from the first four letters of the Ethiopic script in the Semitic order: alf, bet, gaml, dant. (See Section 6.1, Writing Systems.)
A writing system in which only consonants are indicated. The term “abjad” is derived from the first four letters of the traditional order of the Arabic script: alef, beh, jeem, dal. (See Section 6.1, Writing Systems.)
A writing system in which both consonants and vowels are indicated. The term “alphabet” is derived from the first two letters of the Greek script: alpha, beta. (See Section 6.1, Writing Systems.)
Base (Combining_mark | ZWJ | ZWNJ)*. A base character that is a letter or digit, followed by zero or more combining characters, zero width joiners, and/or zero width non-joiners. This commonly reflects the minimal typographic unit used for operating on text.

Place | Voiceless | Voiceless aspirated | Voiced | Voiced aspirated | Nasal |
---|---|---|---|---|---|
velar | k | kʰ | g | gʰ | ŋ |
palatal | ʧ | ʧʰ | ʤ | ʤʰ | ɲ |
retroflex | ʈ | ʈʰ | ɖ | ɖʰ | ɳ |
dental | t | tʰ | d | dʰ | n |
bilabial | p | pʰ | b | bʰ | m |
A maximal sequence of characters following the pattern Base? (Combining_mark | ZWJ | ZWNJ)+. Usually a base character that is a letter or digit, followed by one or more combining characters, zero width joiners, and/or zero width non-joiners.
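The two patterns above, Base (Combining_mark | ZWJ | ZWNJ)* and Base? (Combining_mark | ZWJ | ZWNJ)+, can be sketched in Python roughly as follows. This is only an illustration, not a reference implementation: the helper names are hypothetical, and "letter or digit" is approximated here by the general categories L* and Nd, which is an assumption.

```python
import unicodedata

ZWJ, ZWNJ = "\u200D", "\u200C"

def is_base(ch):
    # Assumption: "letter or digit" approximated by general categories L* and Nd.
    cat = unicodedata.category(ch)
    return cat.startswith("L") or cat == "Nd"

def is_extender(ch):
    # Combining mark (general category M*), ZWJ, or ZWNJ.
    return ch in (ZWJ, ZWNJ) or unicodedata.category(ch).startswith("M")

def split_units(text):
    """Split text into units. A unit is a base plus any following extenders,
    a run of extenders with no base, or any other single character."""
    units = []
    for ch in text:
        prev_first = units[-1][0] if units else None
        if is_extender(ch) and prev_first is not None and (
            is_base(prev_first) or is_extender(prev_first)
        ):
            units[-1] += ch      # attach the extender to the current unit
        else:
            units.append(ch)     # start a new unit
    return units

def combining_sequences(text):
    """Units matching Base? (Combining_mark | ZWJ | ZWNJ)+, i.e. with at least one extender."""
    return [u for u in split_units(text) if any(is_extender(c) for c in u)]

# e + COMBINING ACUTE ACCENT, then x
print(split_units("e\u0301x"))
# DEVANAGARI KA + VIRAMA + ZWJ, then DEVANAGARI SSA
print(split_units("\u0915\u094D\u200D\u0937"))
```

The first pattern allows a bare base with no marks, so every letter or digit forms a unit; the second requires at least one mark or joiner, so `combining_sequences` keeps only the units that contain one.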
Vowel_dependent is one of the categories in the Indic_Syllabic_Category property set (see a list). The Unicode Standard definition says: A symbol or sign that represents a vowel and that is attached or combined with another symbol, usually one that represents a consonant. Dependent vowels are usually combining marks, but may also be letters (e.g. in Thai, or New Tai Lue, which has no combining characters).
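The contrast between dependent vowels that are combining marks and those that are letters can be seen in the general category of the characters. The sketch below uses Python's standard `unicodedata` module, which does not expose Indic_Syllabic_Category itself; the two code points are chosen for illustration on the assumption that both are classed as dependent vowels.

```python
import unicodedata

# Assumed examples of dependent vowels (chosen for illustration):
samples = {
    "Devanagari": "\u093E",  # DEVANAGARI VOWEL SIGN AA
    "Thai": "\u0E32",        # THAI CHARACTER SARA AA
}

for script, ch in samples.items():
    cat = unicodedata.category(ch)
    kind = "combining mark" if cat.startswith("M") else "letter"
    print(f"{script}: U+{ord(ch):04X} {unicodedata.name(ch)} -> {cat} ({kind})")
```

The Devanagari sign reports a mark category (Mc), whereas the Thai vowel reports a letter category (Lo).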
(1) Any symbol that primarily denotes an idea or concept in contrast to a sound or pronunciation—for example, ♻, which denotes the concept of recycling by a series of bent arrows. (2) A generic term for the unit of writing of a logosyllabic writing system. In this sense, ideograph (or ideogram) is not systematically distinguished from logograph (or logogram). (3) A term commonly used to refer specifically to Han characters, equivalent to the Chinese, Japanese, or Korean terms also sometimes used: hànzì, kanji, or hanja.
A normalization form that erases any canonical differences, and generally produces a composed result. For example, a + umlaut is converted to ä in this form. This form most closely matches legacy usage. The formal definition is D120 in Section 3.11, Normalization Forms.
A normalization form that erases any canonical differences, and produces a decomposed result. For example, ä is converted to a + umlaut in this form. This form is most often used in internal processing, such as in collation. The formal definition is D118 in Section 3.11, Normalization Forms.
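The composed and decomposed normalization forms described above (commonly abbreviated NFC and NFD) can be tried out with Python's standard `unicodedata.normalize` function. The following is a small sketch using the ä example from the definitions, where the "umlaut" is taken to be U+0308 COMBINING DIAERESIS.

```python
import unicodedata

composed = "\u00E4"      # ä as a single code point
decomposed = "a\u0308"   # a + COMBINING DIAERESIS

nfc = unicodedata.normalize("NFC", decomposed)  # composes to one code point
nfd = unicodedata.normalize("NFD", composed)    # decomposes to two code points

print([f"U+{ord(c):04X}" for c in nfc])
print([f"U+{ord(c):04X}" for c in nfd])

# The two spellings are canonically equivalent: they compare equal
# once both are brought to the same normalization form.
same = unicodedata.normalize("NFC", composed) == unicodedata.normalize("NFC", decomposed)
print(same)
```

Comparing strings only after normalizing both to the same form is the usual way to make such canonically equivalent spellings match.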
From Sanskrit. The name of a sign used in many Indic and other Brahmi-derived scripts to suppress the inherent vowel of the consonant to which it is applied, thereby generating a dead consonant. (See Section 12.1, Devanagari.) The sign varies in shape from script to script, and may be known by other names in various languages. It may also be visible or hidden in consonant clusters, depending on the language and context. Used for scripts such as Devanagari, Bengali, Tamil, Balinese, etc. See also invisible stacker and pure killer.
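As a rough illustration, the character names of such signs in a few of the scripts mentioned can be listed with Python's `unicodedata` module. The code points below are picked for illustration; the names and combining classes printed come from the Unicode data bundled with the Python build in use.

```python
import unicodedata

# Vowel-killing signs in four scripts (code points chosen for illustration).
viramas = {
    "Devanagari": "\u094D",
    "Bengali": "\u09CD",
    "Tamil": "\u0BCD",
    "Balinese": "\u1B44",
}

for script, ch in viramas.items():
    print(script, f"U+{ord(ch):04X}", unicodedata.name(ch),
          "ccc =", unicodedata.combining(ch))

# In the encoded text, a dead consonant is the consonant followed by the sign:
dead_ka = "\u0915\u094D"   # DEVANAGARI LETTER KA + DEVANAGARI SIGN VIRAMA
print([unicodedata.name(c) for c in dead_ka])
```

The Balinese entry shows the point about naming: its character name is based on the local term rather than on "virama".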