Updated 5 May, 2022
This page provides explanations of terms used in the articles about writing systems and scripts. Text in italics is cited from elsewhere (most often the Unicode glossary).
A writing system in which consonants are indicated by the base letters that have an inherent vowel, and in which other vowels are indicated by additional distinguishing marks of some kind modifying the base letter. The term “abugida” is derived from the first four letters of the Ethiopic script in the Semitic order: alf, bet, gaml, dant. (See Section 6.1, Writing Systems.)
A writing system in which only consonants are indicated. The term “abjad” is derived from the first four letters of the traditional order of the Arabic script: alef, beh, jeem, dal. (See Section 6.1, Writing Systems.)
A writing system in which both consonants and vowels are indicated. The term “alphabet” is derived from the first two letters of the Greek script: alpha, beta. (See Section 6.1, Writing Systems.)
Base (Combining_mark | ZWJ | ZWNJ)*. A base character that is a letter or digit, followed by zero or more combining characters, zero width joiners, and/or zero width non-joiners. This commonly reflects the minimal typographic unit used for operating on text.

velar | k | kʰ | g | gʰ | ŋ |
---|---|---|---|---|---|
palatal | ʧ | ʧʰ | ʤ | ʤʰ | ɲ |
retroflex | ʈ | ʈʰ | ɖ | ɖʰ | ɳ |
dental | t | tʰ | d | dʰ | n |
bilabial | p | pʰ | b | bʰ | m |
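
The BCCS pattern, Base (Combining_mark | ZWJ | ZWNJ)*, can be sketched in Python as a simple left-to-right scan. This is a minimal illustration, not a reference implementation: the function names are hypothetical, and it assumes that "letter or digit" means a Unicode general category of L* or Nd and that "combining character" means a general category of M*.

```python
import unicodedata

ZWJ = "\u200D"   # ZERO WIDTH JOINER
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER


def is_base(ch: str) -> bool:
    """Assumption: a base is any letter (L*) or decimal digit (Nd)."""
    cat = unicodedata.category(ch)
    return cat.startswith("L") or cat == "Nd"


def is_extender(ch: str) -> bool:
    """Combining marks (M*), ZWJ and ZWNJ extend the current sequence."""
    return unicodedata.category(ch).startswith("M") or ch in (ZWJ, ZWNJ)


def split_bccs(text: str) -> list[str]:
    """Split text into BCCS units; anything else becomes its own item."""
    units: list[str] = []
    for ch in text:
        # Attach the character only if the current unit starts with a base.
        if units and is_extender(ch) and is_base(units[-1][0]):
            units[-1] += ch
        else:
            units.append(ch)
    return units


# Devanagari KA + VIRAMA + SSA + VOWEL SIGN I yields two units,
# because the second consonant starts a new base.
print(split_bccs("\u0915\u094D\u0937\u093F"))
```

Under these assumptions a conjunct such as the one above is split after the virama, so a BCCS unit can be smaller than what a reader perceives as a single orthographic syllable.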
Base? (Combining_mark | ZWJ | ZWNJ)+. Usually a base character that is a letter or digit, followed by one or more combining characters, zero width joiners, and/or zero width non-joiners. See also BCCS.
A symbol or sign that represents a vowel and that is attached or combined with another symbol, usually one that represents a consonant. In Semitic and Indic writing systems, vowels are normally represented by dependent vowel-signs. Dependent vowels are usually combining characters, but may also be standalone (e.g. in Thai, or New Tai Lue, which has no combining characters).
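The difference between combining and standalone vowel signs can be checked against the Unicode general category. The sketch below is only illustrative: the sample characters and the helper name `is_combining` are choices made here, and it assumes that "combining character" corresponds to a general category beginning with M.

```python
import unicodedata

# Sample vowel signs chosen for illustration.
samples = {
    "Devanagari vowel sign I": "\u093F",
    "Devanagari vowel sign U": "\u0941",
    "Thai character sara e": "\u0E40",
}


def is_combining(ch: str) -> bool:
    """Assumption: combining characters are those with category M*."""
    return unicodedata.category(ch).startswith("M")


for label, ch in samples.items():
    cat = unicodedata.category(ch)
    print(f"{label}: U+{ord(ch):04X} category={cat} combining={is_combining(ch)}")
```

The two Devanagari signs are reported as combining marks, while the Thai vowel is encoded as an ordinary letter even though it is read together with the consonant it accompanies.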
In Indic scripts, certain vowels are depicted using independent letter symbols that stand on their own. This is often true when a word starts with a vowel or a word consists of only a vowel.
In writing systems based on a script in the Brahmi family of Indic scripts, a consonant letter symbol normally has an inherent vowel, unless otherwise indicated. The phonetic value of this vowel differs among the various languages written with these writing systems. An inherent vowel is overridden either by indicating another vowel with an explicit vowel sign or by using virama to create a dead consonant.
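As a small illustration of the two ways an inherent vowel is overridden, the following sketch builds three Devanagari sequences from the letter KA. The transliterations in the labels (ka, ki, k) are conventional approximations, and the helper name `describe` is hypothetical.

```python
import unicodedata

KA = "\u0915"            # DEVANAGARI LETTER KA
VOWEL_SIGN_I = "\u093F"  # DEVANAGARI VOWEL SIGN I
VIRAMA = "\u094D"        # DEVANAGARI SIGN VIRAMA


def describe(text: str) -> list[str]:
    """Return the Unicode character names that make up the text."""
    return [unicodedata.name(ch) for ch in text]


forms = {
    "bare consonant, inherent vowel (ka)": KA,
    "vowel sign replaces the inherent vowel (ki)": KA + VOWEL_SIGN_I,
    "virama removes the vowel, dead consonant (k)": KA + VIRAMA,
}

for label, text in forms.items():
    print(f"{label}: {text} {describe(text)}")
```

Note that nothing is stored for the inherent vowel itself: the bare consonant is a single code point, and the vowel is only implied.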
A normalization form that erases any canonical differences, and generally produces a composed result. For example, a + umlaut is converted to ä in this form. This form most closely matches legacy usage. The formal definition is D120 in Section 3.11, Normalization Forms.
A normalization form that erases any canonical differences, and produces a decomposed result. For example, ä is converted to a + umlaut in this form. This form is most often used in internal processing, such as in collation. The formal definition is D118 in Section 3.11, Normalization Forms.
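The two normalization forms can be tried out with Python's standard `unicodedata.normalize` function. This is a minimal sketch using the ä example from the definitions above; the variable names are illustrative.

```python
import unicodedata

decomposed = "a\u0308"  # a + COMBINING DIAERESIS (the "umlaut")
composed = "\u00E4"     # LATIN SMALL LETTER A WITH DIAERESIS

# The two strings look the same but compare as different code point sequences.
print(decomposed == composed)

# The composed form turns a + umlaut into a single precomposed character.
nfc = unicodedata.normalize("NFC", decomposed)

# The decomposed form splits the precomposed character into a + umlaut.
nfd = unicodedata.normalize("NFD", composed)

print([f"U+{ord(ch):04X}" for ch in nfc])
print([f"U+{ord(ch):04X}" for ch in nfd])

# Normalizing both sides to the same form makes the comparison succeed.
print(unicodedata.normalize("NFC", decomposed) == unicodedata.normalize("NFC", composed))
```

Normalizing both strings to the same form before comparing them is a common way to avoid mismatches between canonically equivalent text.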