thisLanguage/thisScript (draft)

Updated 20 September, 2021

@ This page gathers basic information about the Syriac script and its use for the Classical, Eastern and Western Syriac dialects of the Syriac language. It aims (generally) to provide an introduction to the orthography and typographic features, and (specifically) to advise how to write Syriac using Unicode.

Phonetic transcriptions on this page should be treated only as an approximate guide. Many are more phonemic than phonetic, and they may vary depending on the source of the transcription.

Snippets for invisible characters

NNBSP [U+202F NARROW NO-BREAK SPACE]

MVS [U+180E MONGOLIAN VOWEL SEPARATOR]

܏SAM [U+070F SYRIAC ABBREVIATION MARK]

͏CGJ [U+034F COMBINING GRAPHEME JOINER]

͏Ideographic Space [U+3000 IDEOGRAPHIC SPACE]

ZWNJ [U+200C ZERO WIDTH NON-JOINER]

ZWJ [U+200D ZERO WIDTH JOINER]

ZWSP [U+200B ZERO WIDTH SPACE]

WJ [U+2060 WORD JOINER]

؜ALM [U+061C ARABIC LETTER MARK]

RLE [U+202B RIGHT-TO-LEFT EMBEDDING]

LRE [U+202A LEFT-TO-RIGHT EMBEDDING]

PDF [U+202C POP DIRECTIONAL FORMATTING]

RLI [U+2067 RIGHT-TO-LEFT ISOLATE]

LRI [U+2066 LEFT-TO-RIGHT ISOLATE]

PDI [U+2069 POP DIRECTIONAL ISOLATE]

FSI [U+2068 FIRST STRONG ISOLATE]

RLM [U+200F RIGHT-TO-LEFT MARK]

LRM [U+200E LEFT-TO-RIGHT MARK]

Sample (thisScript)

Select part of this sample text to show a list of characters, with links to more details.

Sample_text_goes_here

Usage & history

Unified Canadian Aboriginal Syllabics are used for a range of Algonquian and Inuit orthographies indigenous to Canada, including those for Cree, Ojibwe, Inuktitut, and occasionally Blackfoot.

Inuktitut syllabics are used in Canada by the Inuktitut-speaking Inuit of the territory of Nunavut and the Nunavik region in Quebec. The script is used by governmental agencies and in business, education, and media.

In 1976, the Language Commission of the Inuit Cultural Institute made Inuktitut syllabics the co-official script for the Inuit languages, along with the Latin script, and standardised both orthographies.

ᖃᓂᐅᔮᖅᐸᐃᑦ qaniujaːqpaˈit Inuktitut syllabics ᖃᓕᐅᔮᖅᐸᐃᑦ qaliujaːqpait something else

The 'Inuktitut language' comprises a number of similar dialects, which have divergent vocabulary and pronunciation. The dialects in Nunavut are: Inuinnaqtun, Nattilingmiutut, Qamani’tuarmiutut, Paallirmiutut, Aivilingmiutut, North Qikiqtaaluk, Central Qikiqtaaluk, South Qikiqtaaluk (includes the capital, Iqaluit), and Sanikiluarmiutut. The orthography follows the sounds spoken, which leads to different spellings for the different dialects. Some dialects use only the Latin orthography.

The Canadian syllabic script was first created in 1840 by the British missionary James Evans for writing the Swampy Cree dialect. The individual symbols may represent different phonemes for each language.

The syllabic script was first adapted to represent Inuktitut around the middle of the 1800s, again by missionaries, and early print runs occurred in the 1870s.w

Sources Scriptsource and Wikipedia.

Orthographic development & variants

@ Hausa has also been written in ajami since the early 17th century.w

@ There is no standard system of using ajami for Hausa, and different writers may use letters with different values.w

@ There are or have been a number of variant practices for writing Hausa ajami. There are also some confusable characters. They include the following:

Basic features

@ The thisScript script is an abugida. Consonants carry an inherent vowel which can be modified by appending vowel-signs to the consonant. See the table to the right for a brief overview of features for the modern thisLanguage orthography.

@ The thisScript script is an alphabet. Both consonants and vowels are indicated by letters. See the table to the right for a brief overview of features for the thisLanguage language.

XXX text runs left to right in horizontal lines.

essential features: cursive (+ degree of change), bicameral,

word separation

XX basic consonant letters, repertoire extensions, registers & tone calculation

consonant clusters: virama, conjunct types, RA

syllable-initial clusters

XX basic vowels, matres lectionis, other consonants, nasalisation

XX tone marks, how calculate tone

syllable/word-final characters or indicators

other combining marks of note

numbers, punctuation

See also vocalics.

Character index

Letters

Basic consonants

xxxxx

Extended consonants

xxxxx

Vowels

xxxxx

Vocalics

xxx

Other

xxxx

Not used for XXXXX

xxxxx

Combining marks

Vowels

xxxxx

Vocalics

xxx

Tones

xxx

Medials

xxx

Finals

xxx

Other

xxxxx

Not used for XXXXX

xxxxx

Numbers

०␣१␣२␣३␣४␣५␣६␣७␣८␣९

Punctuation

xxxxx

CLDR additions

xxxx

Not used for XXXXX

xxx

Symbols

xxx

Not used for XXXXX

xxx

Formatting

‌␣‍␣⁧␣‫␣⁦␣‪␣⁨␣⁩␣‬␣‏␣‎

Phonology

These are sounds for the XXXXXX language.

Click on the sounds to reveal locations in this document where they are mentioned.

Phones in a lighter colour are non-native or allophones. Source Wikipedia.

Phonology

The following represents the repertoire of the Balinese language.

Click on the sounds to reveal locations in this document where they are mentioned.

Phones in a lighter colour are non-native or allophones. Source Wikipedia.

Vowel sounds

Plain vowels

i y y ɨ ɨ ʉ ʉ ɯ ɯ u ɪ ʏ ʊ e ø ø ɘ ɘ ɵ ɵ ɤ ɤ o ə ə ɛ œ œ ɜ ɜ ɞ ɞ ʌ ʌ ɔ æ ɐ ɐ a ɶ ɶ ɑ ɑ ɒ

Diphthongs

i y y ɨ ɨ ʉ ʉ ɯ ɯ u ɪ ʏ ʊ e ø ø ɘ ɘ ɵ ɵ ɤ ɤ o ə ə ɛ œ œ ɜ ɜ ɞ ɞ ʌ ʌ ɔ æ ɐ ɐ a ɶ ɶ ɑ ɑ ɒ

All but 1 of the diphthongs in Shan end in i or u/w.

Consonant sounds

Places of articulation: labial, dental, alveolar, post-alveolar, retroflex, palatal, velar, uvular, pharyngeal, epiglottal, glottal
stop: p b t d ʈ ɖ ʈʰ ɖʰ c ɟ k ɡ ɡʰ q ɢ ʡ ʔ
affricate: t͡s d͡z t͡ʃ d͡ʒ t͡ʃʰ d͡ʒʰ t͡ɕ d͡ʑ t͡ɕʰ d͡ʑʰ
fricative: f v θ ð s z ɬ ɮ ʃ ʒ ɕ ʑ ɧ ʂ ʐ ç ʝ x ɣ χ ʁ ħ ʕ ʜ ʢ h ɦ
nasal: m n ɳ ɲ ŋ ɴ
approximant: ʋ w l ɫ ɹ ɻ ɭ j ʎ ɰ ʟ
trill/flap: r ɾ ɺ ɽ ʀ

Structure

Hausa has 3 syllable types: CV, CVV, and CVC, where VV can be a long vowel or a diphthong.c The long vs. short vowel distinction is phonemically important; however, when a syllable with a long vowel acquires a final consonant, the vowel is shortened.

Consonant clusters may occur where syllables are side by side, but not within a syllable. Gemination is, however, a distinctive feature.c

Semivowels ʷ and ʲ may occur after an initial consonant.

Vowels

Newar consonants have an inherent vowel sound. Other, non-inherent vowel sounds following a consonant are written using vowel-signs and other symbols. Vowels have short and long lengths, and are regularly nasalised.

Standalone vowels are written using independent vowel letters.

Additional symbols are used to express length and nasalisation.

Inherent vowel

a following a consonant is not written, but is seen as an inherent part of the consonant letter, so ka is written by simply using the consonant letter 𑐎 [U+1140E NEWA LETTER KA].

Vowel-signs

Non-inherent vowel sounds that follow a consonant can be represented using vowel-signs, eg. kiː is written 𑐎𑐷 [U+1140E NEWA LETTER KA + U+11437 NEWA VOWEL SIGN II].

Newar uses the following vowel-signs. They may be used on their own, or in combination with other characters (see composite_vowels).

𑐶␣𑐷␣𑐸␣𑐹␣𑐾␣𑑀␣𑐵␣ ␣𑐿␣𑑁

Newar vowel-signs are all combining characters. All vowel-signs are stored after the base consonant, and the font puts them in the correct place for display. This also applies for the 5 circumgraphs, where a single code point produces glyphs on more than one side of the consonant base.

Five vowel-signs are spacing marks, meaning that they consume horizontal space when added to a base consonant.

Headstroke assimilation

A rather unusual feature of Newa orthography is that vowel-signs with a wavy horizontal line replace the flat headstroke of the base consonant.

This includes vowels written with the following vowel-signs: 𑐾 [U+1143E NEWA VOWEL SIGN E], 𑑀 [U+11440 NEWA VOWEL SIGN O], 𑐿 [U+1143F NEWA VOWEL SIGN AI], and 𑑁 [U+11441 NEWA VOWEL SIGN AU].p,6

The character 𑐎 [U+1140E NEWA LETTER KA] with each of the wavy line vowel-signs applied.

Alternative shapes for u

The sound u is written using the vowel-sign 𑐸 [U+11438 NEWA VOWEL SIGN U], but that vowel-sign can have a different shape when attached to different consonant letters. The vowel-sign used to represent the long sound also has contextual variations, though not as many as the short vowel. All of these orthographic variants are produced automatically by the font; there is no need to use different characters.

The short sound is rendered as a curved shape with the following 4 consonant letters:p,7

𑐐␣𑐟␣𑐨␣𑐱

The alternative shape is shown in fig_u_shape.

𑐎𑐸 𑐐𑐸 𑐟𑐸 𑐨𑐸 𑐱𑐸
The normal shape for 𑐸 [U+11438 NEWA VOWEL SIGN U] (left), and the alternate shape used with the consonants shown.

Both short and long sounds are also written as ligatures with the consonant letters 𑐖 [U+11416 NEWA LETTER JA] and 𑐬 [U+1142C NEWA LETTER RA], as shown in fig_u_ligatures.

𑐖 𑐖𑐸 𑐖𑐹 𑐬 𑐬𑐸 𑐬𑐹
𑐸 [U+11438 NEWA VOWEL SIGN U] and 𑐹 [U+11439 NEWA VOWEL SIGN UU] producing ligatures with 𑐖 [U+11416 NEWA LETTER JA] (left) and 𑐬 [U+1142C NEWA LETTER RA] (right).

The consonants 𑐨 [U+11428 NEWA LETTER BHA] and 𑐴 [U+11434 NEWA LETTER HA] also take on special shapes when followed by a u-vowel (see bha_ha).

Pre-base vowel-sign

𑐶

The short i sound is written using 𑐶 [U+11436 NEWA VOWEL SIGN I], which appears to the left of the base consonant letter or cluster.

The combination 𑐎𑐶 [U+1140E NEWA LETTER KA + U+11436 NEWA VOWEL SIGN I] produces a pre-base positioned glyph.

This combining mark is always typed and stored after the base consonant. The font places the glyph before the base consonant.
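The storage order described above can be sketched in a few lines of Python. This is only an illustration: the helper names `syllable` and `is_logical_order` are hypothetical, and the check is deliberately naive (it only looks for a vowel-sign at the start of the text or after whitespace).

```python
KA = "\U0001140E"            # NEWA LETTER KA
VOWEL_SIGN_I = "\U00011436"  # NEWA VOWEL SIGN I (drawn to the left of the base)


def syllable(base: str, vowel_sign: str) -> str:
    """Return the stored sequence: base consonant first, vowel-sign second."""
    return base + vowel_sign


def is_logical_order(text: str) -> bool:
    """Return False if VOWEL SIGN I starts the text or follows whitespace,
    which suggests it was typed in visual rather than logical order."""
    prev = " "
    for ch in text:
        if ch == VOWEL_SIGN_I and prev.isspace():
            return False
        prev = ch
    return True


ki = syllable(KA, VOWEL_SIGN_I)
print([f"U+{ord(c):04X}" for c in ki])
```

The font, not the stored text, is responsible for moving the glyph to the left of the consonant, so the code never reorders anything for display.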

When an orthographic syllable begins with a consonant cluster that is rendered as a conjunct, the vowel-sign is rendered before the start of the syllable. For example, here are 3 consonant clusters, each followed by i when spoken, but with the vowel-sign appearing to the left of each cluster: 𑐗𑑂𑐏𑐶 𑐳𑑂𑐟𑐶 𑐧𑑂𑐬𑐶 jkhi sti bri

Circumgraphs

Another idiosyncrasy of Newa orthography is that 5 vowel-signs change shape when attached to the base consonants that don't have a headstroke. Four of those vowel-signs are so-called 'wavy-headed', and when combined with the 7 headless consonants they are rendered as circumgraphs.p,6

The following table shows the various forms, combined with both KA (has headstroke) and GA (headless). The last 4 vowel-signs combined with the headless GA produce the circumgraphs.

Vowel-sign | With headstroke | Without headstroke
𑐵 [U+11435 NEWA VOWEL SIGN AA] | 𑐎𑐵 | 𑐐𑐵
𑐾 [U+1143E NEWA VOWEL SIGN E] | 𑐎𑐾 | 𑐐𑐾
𑑀 [U+11440 NEWA VOWEL SIGN O] | 𑐎𑑀 | 𑐐𑑀
𑐿 [U+1143F NEWA VOWEL SIGN AI] | 𑐎𑐿 | 𑐐𑐿
𑑁 [U+11441 NEWA VOWEL SIGN AU] | 𑐎𑑁 | 𑐐𑑁

No special encoding is needed to create these circumgraph forms. The shape change should be effected automatically by the font. Also, unlike some other Indic scripts, it is not possible to compose these circumgraph forms by combining other Newa characters, since the shapes don't exist in the character set. This makes life a little easier.

Vowel length and nasalisation

It is common to see Newar vowels described in a chart which shows long and nasalised forms.

Vowel length is indicated by using a dedicated character in the case of 𑐷 [U+11437 NEWA VOWEL SIGN II] and 𑐹 [U+11439 NEWA VOWEL SIGN UU], but otherwise by adding 𑑅 [U+11445 NEWA SIGN VISARGA].

Nasalisation is indicated using 𑑃 [U+11443 NEWA SIGN CANDRABINDU] for a short vowel, and 𑑄 [U+11444 NEWA SIGN ANUSVARA] for a long vowel.

𑐎𑐵 𑐎𑐵𑑅 𑐎𑐵𑑃 𑐎𑐵𑑄
From left to right: the short, long, nasalised, and long nasalised forms.

The following matrix shows these various forms for the vowel-signs. The same rules apply to the standalone vowel letters. Note that long, nasalised ĩː and ũː vowels use the short form of the vowel-sign.m,5-6

Vowel | Short | Long | Short nasal | Long nasal
a | inherent | 𑑅 | 𑑃 | 𑑄
æ | 𑐵 | 𑐵𑑅 | 𑐵𑑃 | 𑐵𑑄
i | 𑐶 | 𑐷 | 𑐶𑑃 | 𑐶𑑄
u | 𑐸 | 𑐹 | 𑐸𑑃 | 𑐸𑑄
e | 𑐾 | 𑐾𑑅 | 𑐾𑑃 | 𑐾𑑄
o | 𑑀 | 𑑀𑑅 | 𑑀𑑃 | 𑑀𑑄
əi | - | 𑐿 | - | 𑐿𑑄
əu | - | 𑑁 | - | 𑑁𑑄
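The rules in the matrix can be expressed as a small lookup. The following Python sketch is an illustration under stated assumptions: the function name `vowel_marks` and the table layout are hypothetical, the diphthong rows are left out, and the sound labels simply follow the matrix above.

```python
VISARGA = "\U00011445"      # NEWA SIGN VISARGA (marks length)
CANDRABINDU = "\U00011443"  # NEWA SIGN CANDRABINDU (short nasalised)
ANUSVARA = "\U00011444"     # NEWA SIGN ANUSVARA (long nasalised)

# sound -> (short vowel-sign, dedicated long vowel-sign or None)
VOWEL_SIGNS = {
    "a": ("", None),                    # inherent vowel, nothing written
    "æ": ("\U00011435", None),          # VOWEL SIGN AA
    "i": ("\U00011436", "\U00011437"),  # VOWEL SIGN I / II
    "u": ("\U00011438", "\U00011439"),  # VOWEL SIGN U / UU
    "e": ("\U0001143E", None),          # VOWEL SIGN E
    "o": ("\U00011440", None),          # VOWEL SIGN O
}


def vowel_marks(vowel: str, long: bool = False, nasal: bool = False) -> str:
    """Return the characters to store after the base consonant."""
    short, dedicated_long = VOWEL_SIGNS[vowel]
    if long and nasal:
        # Long nasalised forms use the short vowel-sign plus anusvara.
        return short + ANUSVARA
    if nasal:
        return short + CANDRABINDU
    if long:
        # i and u have dedicated long signs; the rest add visarga.
        return dedicated_long if dedicated_long else short + VISARGA
    return short


KA = "\U0001140E"  # NEWA LETTER KA
print(KA + vowel_marks("u", long=True, nasal=True))
```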

Composite vowels

The composite vowels in Newa are described in length_nasalisation, just above.

Consonants with no following vowel

Newa uses 𑑂 [U+11442 NEWA SIGN VIRAMA] (the Newa equivalent of the Sanskrit virama) to indicate that there is no inherent vowel after a consonant, eg. the following explicitly represents just the sound k: 𑐎𑑂

A word that ends in a consonant shows a virama. This is commonly seen in vowels that end with j, such as at the end of this word: 𑐧𑐶𑐮𑐫𑑂 bily͓

Consonant clusters also use this character, but if the cluster forms a conjunct then the virama is not rendered visibly (see clusters).

Standalone vowels

Newa represents standalone vowels using a set of independent vowel letters. The set includes a character to represent the inherent vowel sound, a.

𑐂␣𑐃␣𑐄␣𑐅␣𑐊␣𑐌␣𑐀␣𑐁␣ ␣𑐋␣𑐍

Nasalisation and length are marked in the same way as for vowel-signs.

In Sanskrit texts, elision of an initial a due to sandhi is indicated using 𑑇 [U+11447 NEWA SIGN AVAGRAHA].

Tones

Traditionally, Tai Viet didn't mark tones, other than by the class of consonant.

 

່␣້␣໊␣໋

@ The expected typing and storage position for tone marks is immediately after the base consonant of the syllable, or after a superscript vowel-sign if there is one. However, the tone mark should be typed before [U+0EB3 LAO VOWEL SIGN AM], and should be displayed above the nikhahit, eg. ກ່ຳ.
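The ordering rule for the tone mark and [U+0EB3 LAO VOWEL SIGN AM] lends itself to a simple check. The sketch below is a minimal illustration; the function name `fix_tone_before_am` is hypothetical, and the tone marks are assumed to be the four characters U+0EC8 to U+0ECB.

```python
LAO_TONE_MARKS = {"\u0EC8", "\u0EC9", "\u0ECA", "\u0ECB"}
LAO_VOWEL_SIGN_AM = "\u0EB3"


def fix_tone_before_am(text: str) -> str:
    """Swap any 'AM + tone mark' pair so that the tone mark is stored first."""
    chars = list(text)
    for i in range(len(chars) - 1):
        if chars[i] == LAO_VOWEL_SIGN_AM and chars[i + 1] in LAO_TONE_MARKS:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


wrong = "\u0E81\u0EB3\u0EC8"  # KO + AM + MAI EK (tone mark typed too late)
right = "\u0E81\u0EC8\u0EB3"  # KO + MAI EK + AM (expected order)
print(fix_tone_before_am(wrong) == right)
```

Text that is already in the expected order passes through unchanged, so the function can be applied to mixed input.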

@ The following chart shows how to tell which tones are associated with a syllable.

Register | Checked? | Mark | Description | Example
High | checked | - | ˧˥ high-rising | ᦜᧅ lk̽ (la᷄k) lak˧˥ post
High | checked | - | ˧˥ high-rising | ᦜᦱᧅ lak̽ (la᷄ːk) laːk˧˥ differ from others
High | open | - | ˥ high | ᦂᦱ ka (ká) kaː˥ crow
High | open | ᧈ | ˧˥ high-rising | ᦂᦱᧈ ka¹ (ka᷄) kaː˧˥ to go
High | open | ᧉ | ˩˧ low-rising | ᦂᦱᧉ ka² (ka᷅) kaː˩˧ rice shoots
Low | checked | - | ˧ mid | ᦟᧅ ḻk̽ (lāk) lak˧ steal
Low | checked | - | ˧ mid | ᦟᦱᧅ ḻak̽ (lāːk) laːk˧ drag, pull
Low | open | - | ˥˩ falling | ᦅᦱ ḵa (kâ) kaː˥˩ to be stuck
Low | open | ᧈ | ˧ mid | ᦅᦱᧈ ḵa¹ (ka) kaː˧ price
Low | open | ᧉ | ˩ low | ᦅᦱᧉ ḵa² (kà) kaː˩ to do business

@ Experts disagree on the number and nature of tones in the various dialects of Lao. According to some, most dialects of Lao and Isan have six tones, while those of Luang Prabang have five.wl See the section tonevalues for more detailed discussion of tones.

Encoding choices

Visually, several of the standalone vowels and some vowel-signs look as if they could be composed of smaller parts. This section gives guidance on which approach is best.

Newa is relatively resistant to incorrect coding techniques, but it is possible that someone may occasionally try to use 2 characters rather than the single character which is canonical. Doing so produces text that will not match correctly encoded text for search, spell-checking, and so on, and so should be avoided. The list below shows some examples.

Use Do not use
𑐁 [U+11401 NEWA LETTER AA] 𑐀𑐵 [U+11400 NEWA LETTER A + U+11435 NEWA VOWEL SIGN AA]
𑐌 [U+1140C NEWA LETTER O] 𑐄𑑀 [U+11404 NEWA LETTER U + U+11440 NEWA VOWEL SIGN O]
𑑀 [U+11440 NEWA VOWEL SIGN O] 𑐾𑐵 [U+1143E NEWA VOWEL SIGN E + U+11435 NEWA VOWEL SIGN AA]
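A checker for the sequences in the list above could look like the following Python sketch. It covers only the three pairs listed here, and the names `PREFERRED`, `find_discouraged`, and `to_preferred` are hypothetical.

```python
# discouraged two-character sequence -> preferred single character
PREFERRED = {
    "\U00011400\U00011435": "\U00011401",  # LETTER A + VOWEL SIGN AA -> LETTER AA
    "\U00011404\U00011440": "\U0001140C",  # LETTER U + VOWEL SIGN O -> LETTER O
    "\U0001143E\U00011435": "\U00011440",  # VOWEL SIGN E + VOWEL SIGN AA -> VOWEL SIGN O
}


def find_discouraged(text: str) -> list:
    """Return the discouraged sequences that occur in the text."""
    return [seq for seq in PREFERRED if seq in text]


def to_preferred(text: str) -> str:
    """Replace each discouraged sequence with its single-character form."""
    for seq, single in PREFERRED.items():
        text = text.replace(seq, single)
    return text


sample = "\U0001140E\U0001143E\U00011435"  # KA + VOWEL SIGN E + VOWEL SIGN AA
print(len(find_discouraged(sample)), to_preferred(sample) == "\U0001140E\U00011440")
```

Running a pass like this before indexing or spell-checking is one way to make differently typed text comparable.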

Vowel sounds mapped to characters

The following tables show how Newar vowel sounds commonly map to characters or sequences of characters in the Newa orthography. vs indicates a vowel-sign, and s a standalone vowel.

Plain vowels

Diphthongs and other combinations

Vowels (alphabet)

Basic vowels

The standard vowel sounds for Santali are written as follows.

ᱤ␣ᱩ␣ᱮ␣ᱳ␣ᱚ␣ᱟ

Additional vowels

Three additional vowel sounds are represented using [U+1C79 OL CHIKI GAAHLAA TTUDDAAG].

ᱮᱹ␣ᱚᱹ␣ᱟᱹ

ᱚᱹ [U+1C5A OL CHIKI LETTER LA + U+1C79 OL CHIKI GAAHLAA TTUDDAAG] is rarely used, and the phonetic difference between it and [U+1C5A OL CHIKI LETTER LA] is not clearly defined, but the ALA-LC transcription page says that it has a lower pitch. The phonemic difference between the two may be only marginal.rp,9

Nasalisation

Nasalisation of vowels is indicated using [U+1C78 OL CHIKI MU TTUDDAG],rp,9 eg. ᱦᱟᱸᱰᱮ

When the letter is followed by [U+1C79 OL CHIKI GAAHLAA TTUDDAAG] a separate Unicode character is used, rather than adding the two characters. That character is [U+1C7A OL CHIKI MU-GAAHLAA TTUDDAAG],rp,9 eg. ᱵᱮᱺᱫᱤ
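As an illustration of the rule above, the following Python sketch replaces a two-character sequence of the nasalisation mark and GAAHLAA TTUDDAAG with the single combined character. Treating both orders of the pair as equivalent is an assumption made for this sketch, and the function name `combine_marks` is hypothetical.

```python
MU_TTUDDAG = "\u1C78"           # OL CHIKI MU TTUDDAG (nasalisation)
GAAHLAA_TTUDDAAG = "\u1C79"     # OL CHIKI GAAHLAA TTUDDAAG
MU_GAAHLAA_TTUDDAAG = "\u1C7A"  # OL CHIKI MU-GAAHLAA TTUDDAAG (combined)


def combine_marks(text: str) -> str:
    """Replace either order of the two marks with the combined character."""
    for pair in (MU_TTUDDAG + GAAHLAA_TTUDDAAG, GAAHLAA_TTUDDAAG + MU_TTUDDAG):
        text = text.replace(pair, MU_GAAHLAA_TTUDDAAG)
    return text


LE = "\u1C6E"  # OL CHIKI LETTER LE
print(combine_marks(LE + MU_TTUDDAG + GAAHLAA_TTUDDAAG) == LE + MU_GAAHLAA_TTUDDAAG)
```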

Long vowels

To indicate a prolonged vowel sound, [U+1C7B OL CHIKI RELAA] is used,rp,9 eg. ᱢᱚᱹᱬᱮᱻ ᱢᱚᱸᱻᱦᱟ

Vowel sounds mapped to characters

The following tables show how the above vowel sounds commonly map to characters or sequences of characters in the Santali language.

Plain vowels

Vocalics

ᬋ␣ᬌ␣ᬍ␣ᬎ␣ᬺ␣ᬻ␣ᬼ␣ᬽ

@ At the beginning of a syllable the vocalic is treated as a consonant, eg. ᬓᭂᬋᬂ kər̥̣ŋ̇̽ (kěrěng) eat a lot, ᬢᬍᬃ tl̥̣r̽ (talĕr) therefore.

@ As a second component in a consonant cluster, the vocalic ra repa has a postfixed form and a subjoined form.

@ The postfixed form is seen where the independent (consonant) form of ra repa follows a syllable which ends in a consonant. The sequence of Unicode characters to be used for this is [consonant + U+1B44 BALINESE ADEG ADEG + U+1B0B BALINESE LETTER RA REPA], eg. ᬧᬓ᭄ᬋᬋᬄ pk͓r̥̣r̥̣h̽ (Pak Rěrěh) Mr Rereh.

@ The subjoined form is used to represent the dependent ra repa after a syllable-initial consonant. The sequence of characters to be used here is [consonant + U+1B3A BALINESE VOWEL SIGN RA REPA], eg. ᬓᬺᬰ᭄ᬡ kr̥ŝ͓n̂ (Krěsna) Krishna.
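The two sequences can be written out as follows. This is a small Python sketch for illustration only; the helper names `postfixed_ra_repa` and `subjoined_ra_repa` are hypothetical.

```python
ADEG_ADEG = "\u1B44"           # BALINESE ADEG ADEG
LETTER_RA_REPA = "\u1B0B"      # BALINESE LETTER RA REPA (independent form)
VOWEL_SIGN_RA_REPA = "\u1B3A"  # BALINESE VOWEL SIGN RA REPA (dependent form)


def postfixed_ra_repa(consonant: str) -> str:
    """Syllable-final consonant followed by the independent ra repa."""
    return consonant + ADEG_ADEG + LETTER_RA_REPA


def subjoined_ra_repa(consonant: str) -> str:
    """Syllable-initial consonant followed by the dependent ra repa."""
    return consonant + VOWEL_SIGN_RA_REPA


KA = "\u1B13"  # BALINESE LETTER KA
for seq in (postfixed_ra_repa(KA), subjoined_ra_repa(KA)):
    print(" ".join(f"U+{ord(c):04X}" for c in seq))
```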

Consonants

Click on the characters in the lists for detailed information. For a mapping of sounds to consonant letters see consonant_mappings.

Basic consonants

Stops

Affricates

Fricatives

Nasals

Approximants

Vowels

Stops & affricate

high class
ผ␣ถ␣ฐ␣ข␣ฃ
mid class
ป␣บ␣ต␣ด␣ฏ␣ฎ␣ก␣อ
low class
พ␣ภ␣ท␣ธ␣ฑ␣ฒ␣ค␣ฆ␣ฅ

Affricates

high class
mid class
low class
ช␣ฌ

Fricatives

high class
ฝ␣ส␣ศ␣ษ␣ห
low class
ฟ␣ซ␣ฮ

Nasals

high class
หม␣หน␣หง
low class
ม␣น␣ณ␣ง

Approximants

high class
หว␣หร␣หล␣หย␣หญ
low class
ว␣ร␣ล␣ฬ␣ย␣ญ

High class nasals & approximants with HO

A silent [U+0E2B THAI CHARACTER HO HIP] is added before the following characters to make their default tonal class high, eg. หมา, หยุด.

ม␣น␣ง␣ว␣ร␣ล␣ย␣ญ

See onset_clusters for further details about how these are presented.
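The class lists above, together with the HO HIP rule, can be turned into a lookup. The Python sketch below uses only the letters listed in this section (so the table is incomplete for any letter not shown here), ignores syllables that begin with a preposed vowel, and uses the hypothetical function name `onset_class`.

```python
HIGH = set("ผถฐขฃฝสศษห")
MID = set("ปบตดฏฎกอ")
LOW = set("พภทธฑฒคฆฅชฌฟซฮมนณงวรลฬยญ")

HO_HIP = "ห"
# Low-class nasals and approximants that become high class after a silent HO HIP.
HO_HIP_FOLLOWERS = set("มนงวรลยญ")


def onset_class(syllable: str) -> str:
    """Return 'high', 'mid' or 'low' for the consonant that starts the syllable."""
    if not syllable:
        raise ValueError("empty syllable")
    if len(syllable) > 1 and syllable[0] == HO_HIP and syllable[1] in HO_HIP_FOLLOWERS:
        return "high"
    first = syllable[0]
    for name, letters in (("high", HIGH), ("mid", MID), ("low", LOW)):
        if first in letters:
            return name
    raise ValueError(f"no class recorded for {first!r}")


print(onset_class("หมา"), onset_class("มา"))
```

A fuller implementation would also need to decide when HO HIP followed by one of these letters is a real h onset rather than a silent class marker, which this sketch does not attempt.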

The letter O ANG

[U+0E2D THAI CHARACTER O ANG] is silent when used as a base for vowels at the beginning of a syllable. When it appears alone after a base consonant it becomes the vowel ɔː. It is also used in combination with other characters to produce additional vowel sounds (see complex_vowels).

Basic consonants

@ Only 18 of the consonants in the thisScript Unicode block are used for pure thisScript language text. The remainder are used for words derived from Sanskrit or Kawi.

ᬓ␣ᬕ␣ᬗ␣ᬘ␣ᬚ␣ᬜ␣ᬢ␣ᬤ␣ᬦ␣ᬧ␣ᬩ␣ᬫ␣ᬬ␣ᬭ␣ᬮ␣ᬯ␣ᬲ␣ᬳ

@ [U+1B33 BALINESE LETTER HA] at the beginning of a word or after a preceding vowel is mostly used as a support for a vowel-sign (see independentvowels), and is not pronounced or transcribed. Word finally with a suffix vowel, however, it is transcribed. l

Additional/honorific consonants

@ Many of the additional consonants are commonly used in words originating from Arabic and Dutch, and are most common in north Bali and Lombok. When used in pure thisScript words, they are similar to capital letters and are used to create an honorific effect. There are similar characters in Javanese. In words originating from Sanskrit, Old Javanese, or Old thisScript, they represent aspirated or other consonants. l

@ Additional consonants used for Sanskrit words.

ᬔ␣ᬞ␣ᬟ␣ᬠ␣ᬛ␣᭄ᬙ

@ Additional consonants used for words from Kawi.

ᬖ␣ᬝ␣ᬡ␣ᬣ␣ᬥ␣ᬨ␣ᬪ␣ᬰ␣ᬱ

@ Two consonants, [U+1B14 BALINESE LETTER KA MAHAPRANA] and [U+1B19 BALINESE LETTER CA LACA], are considered very rare, and one other, [U+1B1B BALINESE LETTER JA JERA], seems to be known from only one word (ᬦᬶᬃᬛᬭ nir̽ʤ̇r nirjhara pond). (It is possible that an original ai may have been lost in thisScript, to be replaced by the glyph for jʰa.)

@ A number of the Sanskrit or Kawi consonants are rather poorly attested. The letter [U+1B19 BALINESE LETTER CA LACA] is only found in non-initial position following [U+1B18 BALINESE LETTER CA], ie. ᬘ᭄ᬙ c͓C, and most of the retroflex series is often omitted in books about the script.

@ There are also a few characters in the Unicode block that are used for the Sasak language. (It's not clear to me whether the fact that these relate to aspirated or retroflex forms originally affects the pronunciation.)

ᭅ␣ᭆ␣ᭇ␣ᭈ␣ᭉ␣ᭊ␣ᭋ

Repertoire extension using nukta

@ [U+093C DEVANAGARI SIGN NUKTA] is used to represent foreign sounds, eg. in ख़ारीदारी kʰˑārīdārī (xārīdārī) shopping, the dot changes ख kʰ to ख़ x. The following graphemes combine nukta with an existing consonant.

ऩ␣ऱ␣ऴ␣झ़␣क़␣ख़␣ग़␣ज़␣ड़␣ढ़␣फ़␣य़

@ The way the Unicode Standard recommends to represent these graphemes with code points is a little complicated.

  • These graphemes can all be represented by a basic consonant plus nukta, but all except the 4th also have precomposed forms in Unicode.
  • The first 3, ऩ ऱ ऴ, should be written using precomposed code points. It's possible to decompose them to ऩ ऱ ऴ, but they recompose under Unicode Normalisation Form C (NFC).
  • There is no precomposed code point for the 4th, झ़ jʰˑ (ʒ̣), so it has to be written in decomposed form.
  • The Unicode Standard recommends that content authors use decomposed sequences for the other 8 (only). They do not recompose under NFC.
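The normalisation behaviour described in the list above can be observed with Python's standard `unicodedata` module. The sketch below is illustrative; the helper names `nfc` and `code_points` are hypothetical, while `unicodedata.normalize` is the standard-library call.

```python
import unicodedata


def nfc(text: str) -> str:
    """Apply Unicode Normalisation Form C."""
    return unicodedata.normalize("NFC", text)


def code_points(text: str) -> list:
    """List the code points in a string, eg. ['U+0929']."""
    return [f"U+{ord(c):04X}" for c in text]


# One of the first 3: the decomposed NA + NUKTA recomposes under NFC.
print(code_points(nfc("\u0928\u093C")))  # ['U+0929']

# One of the other 8: precomposed QA decomposes under NFC and stays decomposed.
print(code_points(nfc("\u0958")))        # ['U+0915', 'U+093C']

# The 4th: JHA + NUKTA has no precomposed code point, so it is unchanged.
print(code_points(nfc("\u091D\u093C")))  # ['U+091D', 'U+093C']
```

Applying NFC to incoming text therefore yields the recommended form for every grapheme in the list, whichever form the author typed.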

@ In practice, it's hard to envisage content authors being aware of these rules, let alone respecting them. Keyboards or other input mechanisms, or perhaps sometimes normalising applications, can help, but it's likely that Devanagari text will always contain a mixture of forms for these graphemes.

@ The Unicode block contains the following precomposed code points.

ऩ␣ऱ␣ऴ␣क़␣ख़␣ग़␣ज़␣ड़␣ढ़␣फ़␣य़

@ The nukta should always be typed and stored immediately after the consonant it modifies, and before any combining vowels or diacritics.

Repertoire extension (rerekan)

@ The combining mark [U+1B34 BALINESE SIGN REREKAN​] is used, as is a similar sign in Javanese, to extend the character repertoire for foreign sounds.

ᬚ᬴␣ᬧ᬴␣ᬯ᬴␣ᬓ᬴␣ᬳ᬴␣ᬕ᬴␣ᬗ᬴␣ᬤ᬴

@ The first 7 of the 8 listed above are attested in Library of Congress transliterations and in earlier Sasak orthography. The 8th, ᬤ᬴ could be used for one-to-one transliteration for Javanese .

@ In rendering, the dots of these letters appear above the top character, which can cause some ambiguity in reading. The following are all visually indistinguishable: ᬓ᬴᭄ᬚ kˑ͓ʤ (xja), or ᬓ᭄ᬚ᬴ k͓ʤˑ (kza), or indeed ᬓ᬴᭄ᬚ᬴ kˑ͓ʤˑ (xza). In practice these combinations are probably rather rare.

@ In recent times, Sasak users abandoned the use of the Javanese-influenced rerekan in favour of a series of modified letters (see above), making use, in addition, of some of the unused Kawi letters for the Arabic sounds. In place of ᬓ᬴ x and ᬕ᬴ ɣ, for instance, the new fusion of KA and HA, [U+1B46 BALINESE LETTER KHOT SASAK], and the Kawi letter [U+1B16 BALINESE LETTER GA GORA] are used.

Syllable-initial consonant clusters

@ The consonants ya, ra, la and wa regularly appear immediately after the initial consonant in a syllable. thisScript has no special characters for these medial sounds; they are just written using the normal approach for dealing with consonant clusters, eg. ᬓ᭄ᬭᬫ k͓rm (krama) member.

@ Multiple medials can occur: r or l can be followed by w or y, eg. ᬩ᭄ᬭ᭄ᬬᬕ᭄ b͓r͓yg͓ (bryag) laughter.

@ In addition, the vocalics can produce consonant sounds in medial position, eg. ᬓᬺᬰ᭄ᬡ kr̥ŝ͓n̂ (Krĕsna) Krishna.

@ See clusters for more details on shaping of glyphs.

Syllable-final consonants

@ Word-final consonant sounds with no following consonant are by default represented by ordinary consonant characters, followed by a visible [U+1B44 BALINESE ADEG ADEG] character, eg. ᬓᬵᬤᭂᬧ᭄ kɑ̄dəp͓ (kādĕp) sold, ᬓᬧᬮ᭄ kpl͓ (kapal) ship.

@ However, there is also a set of combining characters that don't need to be followed by the adeg adeg.

ᬂ␣ᬃ␣ᬄ␣ᬀ␣ᬁ

@ [U+1B02 BALINESE SIGN CECEK] and [U+1B04 BALINESE SIGN BISAH] only appear at the end of a word, eg. ᬓᭂᬋᬂ kər̥̣ŋ̇̽ (kěrěng) eat a lot, ᬫᬗᬄ mŋh̽ (mangah) logic, unless the word involves repetition, eg. ᬘᬾᬂᬘᬾᬂ ceŋ̇̽ceŋ̇̽ (cengceng) musical instrument.

@ [U+1B03 BALINESE SIGN SURANG] can appear at the end of any syllable, eg. ᬓᬃᬡ kr̽n̂ (karna) ear.

@ A syllable-final diacritic may appear above a stack, eg. ᬩᬗ᭄ᬓᬸᬂ bŋ͓kuŋ̇̽ (bangkung) pig

@ The final two characters in the list above have a specialist usage related to (usually religious) Sanskrit words.

@ [U+1B01 BALINESE SIGN ULU CANDRA] appears only in holy letters, eg. ᬫᬁ mŋ̽ (Mang). When combined with independent vowel ạʷ it becomes a special symbol called omkara and is pronounced m. In this form it is used to represent god, eg. ᬒᬁᬱᬦ᭄ᬢᬶ᭞ᬱᬦ᭄ᬢᬶ᭞ᬱᬦ᭄ᬢᬶ᭞ᬒᬁ ạʷŋ̽sn͓ti,sn͓ti,sn͓ti,ạʷŋ̽ (omsanti,santi,santi,om) May peace be everywhere.

Consonant clusters

The absence of a vowel sound between two or more consonants is visually indicated in one of the following ways.

  1. Create a conjunct. There are a number of possibilities here:
    1. Stacking : Reduce a non-initial consonant in size and shape and position it below the first.
    2. Conjoining : The two consonants sit side by side, but the second consonant has a special shape.
    3. Ligation : Create a fusion of the letter shapes, where it may be difficult to identify one or more of the components.
    4. The letter ra has its own idiosyncratic way of combining with other consonants, whether it precedes or follows them.
  2. Show a visible virama below the non-final consonants in the cluster.
  3. Use the anusvara .

In Unicode, the conjunct formation is achieved by adding [U+0B4D ORIYA SIGN VIRAMA] between the consonants. The font hides the virama glyph automatically when a conjunct is formed.

See also finals and gemination.

Stacking

The overwhelming majority of conjuncts in Odia are achieved by subjoining a reduced form of the non-initial consonant below the initial.

ହନ→ହ୍ନ
ɦnɔ
ଳପ→ଳ୍ପ
ɭpɔ
କକ→କ୍କ
kkɔ
Examples of stacked conjuncts.

In most cases the non-initial consonant is just reduced in size, but in some cases the shape is changed, either by removing the circular top line, or in a more fundamental way.

କତ→କ୍ତ
ktɔ
କଢ→କ୍ଢ
kɖɔ
Stacked conjuncts where the subjoined shape is significantly different from the normal shape.

However, when TA is the initial consonant, it is sometimes the initial that is reduced and subjoined. In other combinations, however, it retains its full form.

ତକ→ତ୍କ
tkɔ
ତନ→ତ୍ନ
tnɔ
Stacked conjuncts with an initial TA. The TA may be subjoined in some combinations.

Initial RA in clusters

A trailing RA has a fairly regular appearance as a subjoined glyph below the preceding consonant, although that line may join with the preceding letter shape, and therefore cause a slight change to it.

କର→କ୍ର
krɔ
A trailing RA in a cluster is rendered as a subjoined glyph.

However, like many other Indian scripts, [U+0B30 ORIYA LETTER RA] at the beginning of a cluster is represented idiosyncratically, and appears as a small, superscript glyph over the top right of the following consonant.

ରକ→ର୍କ
rkɔ
An initial RA in a cluster is rendered as a superscript over the following consonant.

Observation: Unlike Devanagari, it appears that the RA doesn't move over a following vowel-sign, such as [U+0B3E ORIYA VOWEL SIGN AA].

Ligated forms

Certain combinations of consonants form conjuncts by producing a merged glyph in which one or both of the original letters may be unrecognisable.

ଜଞ→ଜ୍ଞ
d͡ʒɲɔ
ତତ→ତ୍ତ
ttɔ
କଷ→କ୍ଷ
kʰjɔ
Clusters that fuse into forms different from their original component shapes.

The following is a list of combinations that produce such an effect. Click on the items to see the component letters.

କ୍ଷ␣ତ୍ତ␣ଦ୍ଧ␣ଦ୍ଭ␣ଙ୍କ␣ଙ୍ଖ␣ଙ୍ଗ␣ଙ୍ଘ␣ଜ୍ଞ␣ଟ୍ଟ␣ତ୍ତ␣ଦ୍ଭ␣ଦ୍ଦ␣ଦ୍ଧ␣ଧ୍ଯ␣ଧ୍ୟ␣ନ୍ଧ␣ନ୍ଦ␣ବ୍ଦ␣ବ୍ବ␣ମ୍ପ␣ମ୍ଫ␣ମ୍ଭ

Conjoined consonants or a visible halanta

Three letters in particular tend not to stack, but sit alongside the initial consonant in the cluster.

ତଯ→ତ୍ଯ
td͡ʒɔ
ତୟ→ତ୍ୟ
tjɔ
Conjoined letters for the clusters td͡ʒ and tj, respectively (top to bottom).

As can be seen above, the conjoined forms for ʤ and j are identical.

The letter NYA also sits alongside the cluster initial, but usually the halanta is shown below the initial letter.

ତଞ→ତ୍ଞ
tɲɔ
A consonant cluster that shows a visible virama, rather than creating a conjunct.

Observation: Noto, Nirmala, and Kalinga fonts all show the halanta below the initial consonant in the first example at fig_conjoined, but Oriya MN and Oriya Sangam MN fonts don't show it.

The halanta is also left showing for borrowed words.d,404 The halanta can be made visible by following it with U+200C ZERO WIDTH NON-JOINER.
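The two encodings discussed in this section, a conjunct formed with the virama and a visible halanta forced with U+200C ZERO WIDTH NON-JOINER, can be sketched in Python as follows. The function names `conjunct` and `visible_halanta` are hypothetical; the code only builds the character sequences, and the actual shapes depend on the font.

```python
VIRAMA = "\u0B4D"  # ORIYA SIGN VIRAMA (halanta)
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER


def conjunct(*consonants: str) -> str:
    """Join consonants with the virama so that the font may form a conjunct."""
    return VIRAMA.join(consonants)


def visible_halanta(*consonants: str) -> str:
    """Join consonants with virama + ZWNJ so that the halanta stays visible."""
    return (VIRAMA + ZWNJ).join(consonants)


KA = "\u0B15"  # ORIYA LETTER KA
TA = "\u0B24"  # ORIYA LETTER TA
for seq in (conjunct(KA, TA), visible_halanta(KA, TA)):
    print(" ".join(f"U+{ord(c):04X}" for c in seq))
```

Both helpers accept three or more consonants, so the same approach extends to longer clusters.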

Triple-consonant clusters

Oriya has a number of clusters involving 3 consonants. For example, the following words contain triple-consonant clusters. As always, click on the example to see the composition. ପୂର୍ଣ୍ଣ ତୀକ୍ଷ୍ଣ ଚନ୍ଦ୍ର.

@ The absence of a vowel sound between two or more consonants is visually indicated in one of the following ways.

  1. Stacked consonants, where the non-initial consonant appears below the initial, often with a different shape from normal.
  2. Conjoined consonants, where consonants sit side-by-side but the non-initial consonant has a slightly different form than usual.
  3. A visible adeg adeg character following the initial consonant.
  4. A syllable-final consonant diacritic followed by a regular consonant.

@ In Unicode, the stacking and conjoining behaviour is achieved by adding [U+1B44 BALINESE ADEG ADEG​] between the consonants. The font hides the glyph automatically.

@ Because there are no spaces between words, consonants at the end of one word and the beginning of the next also behave like other consonant clusters. When this leads to stacks or conjoined sequences, the joined words are not typically split at line ends.

Stacking

@ To represent consonants without intervening vowels, the non-initial consonant is typically drawn below the initial consonant, and with a slightly different shape.

@ The following table shows 28 consonants in their normal and subjoined forms.

ᬓ᭄ᬓ␣ᬔ᭄ᬔ␣ᬕ᭄ᬕ␣ᬖ᭄ᬖ␣ᬗ᭄ᬗ␣ᬘ᭄ᬘ␣ᬙ᭄ᬙ␣ᬚ᭄ᬚ␣ᬜ᭄ᬜ␣ᬝ᭄ᬝ␣ᬞ᭄ᬞ␣ᬟ᭄ᬟ␣ᬠ᭄ᬠ␣ᬡ᭄ᬡ␣ᬢ᭄ᬢ␣ᬣ᭄ᬣ␣ᬤ᭄ᬤ␣ᬥ᭄ᬥ␣ᬦ᭄ᬦ␣ᬩ᭄ᬩ␣ᬪ᭄ᬪ␣ᬫ᭄ᬫ␣ᬬ᭄ᬬ␣ᬭ᭄ᬭ␣ᬮ᭄ᬮ␣ᬯ᭄ᬯ␣ᬰ᭄ᬰ␣ᬳ᭄ᬳ

@ Many of the subjoined forms are just slightly smaller versions of the original, but several have very different shapes altogether, most of which ligate with the cluster initial consonant by joining strokes.

@ There can be up to 3 consonants combined in this way, and the third consonant must be one of ya, ra, la or wa.

Conjoined consonants

@ Some consonants in non-initial positions in a cluster remain side by side, but the non-initial consonant uses a special conjoined form, eg. the s, which is normally shaped , loses some ink to its left side in ᬅᬓ᭄ᬱᬭ ạk͓ṡ̂r (akśara) alphabet, which is characteristic of all these conjoined forms.

ᬓ᭄ᬲ␣ᬓ᭄ᬱ␣ᬓ᭄ᬋ␣ᬓ᭄ᬌ␣ᬓ᭄ᬨ␣ᬓ᭄ᬧ␣ᬓ᭄ᬧ᬴␣ᬓ᭄ᭈ␣ᬓ᭄ᭊ

@ The conjoined form of [U+1B32 BALINESE LETTER SA] is unusual in that it involves strokes both below and to the right of the initial consonant, eg. ᬧᬓ᭄ᬲ pk͓s (paksa) force.

Visible adeg adeg

@ Because there is no word separator, consonants at the end of one word and beginning of the following word are normally stacked, too.

@ In some cases this leads to ambiguity about whether this is one or two words. If you really want to make clear which is which, you can use an explicit adeg-adeg, eg. ᬧᬓ᭄ᬭᬫᬦ᭄ pk͓rmn͓ (pakraman) membership vs. ᬧᬓ᭄‌ᬭᬫᬦ᭄ pk͓ₓrmn͓ (Pak Raman) Mr Raman.

@ You do this in Unicode by including [U+200C ZERO WIDTH NON-JOINER] after the adeg-adeg.
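The following Python sketch illustrates the difference between the two spellings in the example above. The function name `visible_adeg_adeg` is hypothetical, and the code only shows the stored character sequence, not how any particular font renders it.

```python
ADEG_ADEG = "\u1B44"  # BALINESE ADEG ADEG
ZWNJ = "\u200C"       # ZERO WIDTH NON-JOINER

PA, KA, RA, MA, NA = "\u1B27", "\u1B13", "\u1B2D", "\u1B2B", "\u1B26"


def visible_adeg_adeg(text: str, index: int) -> str:
    """Insert ZWNJ after the adeg adeg at `index` so that it is shown explicitly."""
    if text[index] != ADEG_ADEG:
        raise ValueError("no adeg adeg at the given index")
    return text[: index + 1] + ZWNJ + text[index + 1:]


# One word: the k and r are allowed to stack.
pakraman = PA + KA + ADEG_ADEG + RA + MA + NA + ADEG_ADEG
# Two words: ZWNJ after the first adeg adeg keeps it visible.
pak_raman = visible_adeg_adeg(pakraman, 2)

# Removing ZWNJ recovers the original sequence, eg. before comparing strings.
assert pak_raman.replace(ZWNJ, "") == pakraman
```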

@ A somewhat ambiguous situation arises where norms apparently prevent certain combinations from stacking. For example, the name of the village tamblung should not stack the mbl, but should look like ᬢᬫ᭄‌ᬩ᭄ᬮᬂ . The Unicode Standard advises using a zero-width non-joiner after ma to achieve this.

@ Note that this may also be achieved by intelligence in the font, as was actually the case when I generated this example (click on it to see). It's not clear to me what is the preferred approach: put ZWNJ in only when the font doesn't do what you want, or use it always. The latter may lead to more consistent content where different fonts are applied to the text (eg. after cut and paste). In theory, this shouldn't affect searching and sorting, although some applications may not ignore the ZWNJ as they should.

Syllable-initial consonant clusters

@ The consonants ya, ra, la and wa regularly appear immediately after the initial consonant in a syllable. thisScript has no special characters for these medial sounds; they are simply written using the normal approach for dealing with consonant clusters, eg. ᬓ᭄ᬭᬫ k͓rm (krama) member.

@ Multiple medials can occur: r or l can be followed by w or y, eg. ᬩ᭄ᬭ᭄ᬬᬕ᭄ b͓r͓yg͓ (bryag) laughter.

@ In addition, the vocalics can produce consonant sounds in medial position, eg. ᬓᬺᬰ᭄ᬡ kr̥ŝ͓n̂ (Krĕsna) Krishna.

@ See clusters for more details on shaping of glyphs.

Syllable-final diacritics

@ As mentioned earlier, thisScript represents some final consonants using diacritics. Such syllable-final diacritics are followed by ordinary consonant shapes in consonant clusters.

Consonant sounds to characters

The following maps Russian consonant sounds to common graphemes, grouped by high class (h), mid class (m), low class (l) and syllable-final (f).

Initials

t͡ɕ
m

Finals

Consonant to script mapping

The following tables show how thisLanguage consonant sounds commonly map to characters or sequences of characters.

p
 

𞤨 [U+1E928 ADLAM SMALL LETTER PE], eg. 𞤨𞤢𞤯𞤢𞤤.

𞤆 [U+1E906 ADLAM CAPITAL LETTER PE]

b
 

𞤦 [U+1E926 ADLAM SMALL LETTER BA], eg. 𞤦𞤢𞥄𞤬𞤢𞤤.

𞤄 [U+1E904 ADLAM CAPITAL LETTER BA] 

ɓ
 
ᵐb
 

𞤲𞥋𞤦 [U+1E932 ADLAM SMALL LETTER NUN + U+1E94B ADLAM NASALIZATION MARK + U+1E926 ADLAM SMALL LETTER BA], eg. 𞤲𞤢𞤲𞥋𞤦𞤢𞤪𞤢.

t
 

𞤼 [U+1E93C ADLAM SMALL LETTER TU], eg. 𞤼𞤢𞥄𞤤𞤭.

𞤚 [U+1E91A ADLAM CAPITAL LETTER TU]

d
 

𞤣 [U+1E923 ADLAM SMALL LETTER DAALI], eg. 𞤣𞤢𞤲𞤳𞤭.

𞤁 [U+1E901 ADLAM CAPITAL LETTER DAALI]

ɗ
 

𞤯 [U+1E92F ADLAM SMALL LETTER DHA], eg. 𞤯𞤢𞥄𞤼𞤭.

𞤍 [U+1E90D ADLAM CAPITAL LETTER DHA]

k
 

𞤳 [U+1E933 ADLAM SMALL LETTER KAF], eg. 𞤳𞤢𞤷𞥆𞤵.

𞤑 [U+1E911 ADLAM CAPITAL LETTER KAF]

ɡ
 
k͡p
 

Infrequent

𞥂 [U+1E942 ADLAM SMALL LETTER KPO]

𞤠 [U+1E920 ADLAM CAPITAL LETTER KPO], eg. 𞤠𞤫𞤤𞤫.

ɡ͡b
 

Infrequent

𞥀 [U+1E940 ADLAM SMALL LETTER GBE]

𞤞 [U+1E91E ADLAM CAPITAL LETTER GBE], eg. 𞤞𞤢𞤲𞤼𞤢𞤥𞤢

ᵑɡ
 
q
 

𞤹 [U+1E939 ADLAM SMALL LETTER QAAF], eg. 𞤹𞤭𞤪𞥆𞤢.

𞤗 [U+1E917 ADLAM CAPITAL LETTER QAAF]

ʔ
 

 𞥇   [U+1E947 ADLAM HAMZA], eg. 𞤗𞤵𞤪𞥇𞤢𞤲 Qurʿan.

[U+2019 RIGHT SINGLE QUOTATION MARK], eg. 𞤧𞤢𞥄𞥋𞤭

t͡ʃ
 

𞤷 [U+1E937 ADLAM SMALL LETTER CHI], eg. 𞤷𞤵𞤪𞤳𞤭.

𞤕 [U+1E915 ADLAM CAPITAL LETTER CHI]

d͡ʒ
 
ᶮd͡ʒ
 

𞤲𞥋𞤶 [U+1E932 ADLAM SMALL LETTER NUN + U+1E94B ADLAM NASALIZATION MARK + U+1E936 ADLAM SMALL LETTER JIIM], eg. 𞤲𞥋𞤶𞤢𞤲𞥋𞤣𞤭.

f
 

𞤬 [U+1E92C ADLAM SMALL LETTER FA], eg. 𞤬𞤭𞤲𞤳𞤢𞥄𞤪𞤭.

𞤊 [U+1E90A ADLAM CAPITAL LETTER FA]

v
 

Infrequent

𞤾 [U+1E93E ADLAM SMALL LETTER VA], eg. 𞤾𞤫𞤤𞤮𞥅.

𞤜 [U+1E91C ADLAM CAPITAL LETTER VA]

ʃ
 

Infrequent

𞥃 [U+1E943 ADLAM SMALL LETTER SHA], eg. 𞥃𞤢𞤴𞤳𞤵𞤶𞤮

𞤡 [U+1E921 ADLAM CAPITAL LETTER SHA]

x
 

Infrequent

𞤿 [U+1E93F ADLAM SMALL LETTER KHA], eg. 𞤤𞤢𞥄𞤿𞤢𞤪𞤢

𞤝 [U+1E91D ADLAM CAPITAL LETTER KHA]

h
 

𞤸 [U+1E938 ADLAM SMALL LETTER HA], eg. 𞤸𞤢𞤴𞤣𞤢𞤪𞤢.

𞤖 [U+1E916 ADLAM CAPITAL LETTER HA]

m
 

𞤥 [U+1E925 ADLAM SMALL LETTER MIIM], eg. 𞤥𞤢𞥄𞤲𞥋𞤣𞤫.

𞤃 [U+1E903 ADLAM CAPITAL LETTER MIIM]

n
 

𞤲 [U+1E932 ADLAM SMALL LETTER NUN], eg. 𞤲𞤢𞤼𞤢𞤤.

𞤐 [U+1E910 ADLAM CAPITAL LETTER NUN]

ɲ
 

𞤻 [U+1E93B ADLAM SMALL LETTER NYA], eg. 𞤻𞤢𞥄𞤪𞤭𞥅𞤪𞤵.

𞤙 [U+1E919 ADLAM CAPITAL LETTER NYA]

w
 

𞤱 [U+1E931 ADLAM SMALL LETTER WAW], eg. 𞤱𞤢𞤤𞤭𞤴𞤵.

𞤏 [U+1E90F ADLAM CAPITAL LETTER WAW]

r~ɾ
 

𞤪 [U+1E92A ADLAM SMALL LETTER RA], eg. 𞤪𞤢𞤱𞤢𞥄𞤲𞥋𞤣𞤵

𞤈 [U+1E908 ADLAM CAPITAL LETTER RA]

l
 

𞤤 [U+1E924 ADLAM SMALL LETTER LAAM], eg. 𞤤𞤢𞥄𞤩𞤢𞤤.

𞤂 [U+1E902 ADLAM CAPITAL LETTER LAAM]

j
 

𞤴 [U+1E934 ADLAM SMALL LETTER YA], eg. 𞤴𞤢𞥄𞤧𞤭.

𞤒 [U+1E912 ADLAM CAPITAL LETTER YA]

ʔʲ
 

𞤰 [U+1E930 ADLAM SMALL LETTER YHE], eg. 𞤥𞤮𞤰𞥆𞤭𞥅.

𞤎 [U+1E90E ADLAM CAPITAL LETTER YHE]

Combining marks (don't use)

Deprecated

[U+0B82 TAMIL SIGN ANUSVARA] is not used for Tamil. Nor should it be used as a graphical variant of the pulli.

Symbols

@ thisLanguage uses the following characters with the symbol property.

᭚␣᭛␣᭜␣᭝␣᭞␣᭟␣᭠

@ All but the last are described in phrase, and the final mark is described in linebreak.

Formatting characters

U+200C ZERO WIDTH NON-JOINER (ZWNJ) can be used to force the production of a visible virama, rather than a half-form (see visiblevirama). It can also be used to prevent the formation of vowel ligatures (see vowelligatures).

U+200D ZERO WIDTH JOINER (ZWJ) is used to produce special joining forms for YA (see consonant_syllable).

Numbers

Digits

@ thisScript has a set of native digits

߀␣߁␣߂␣߃␣߄␣߅␣߆␣߇␣߈␣߉

The CLDR standard-decimal pattern is #,##,##0.###. The standard-percent pattern is #,##,##0%.
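To make the grouping in that pattern concrete: the rightmost group has 3 digits and every group to its left has 2. The Python sketch below applies that rule to non-negative integers only. The function name `group_digits` is hypothetical, and a real application would normally rely on a CLDR-aware library rather than code like this.

```python
def group_digits(n: int, primary: int = 3, secondary: int = 2, sep: str = ",") -> str:
    """Group the digits of a non-negative integer as in the pattern #,##,##0."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    digits = str(n)
    if len(digits) <= primary:
        return digits
    # Split off the rightmost (primary) group, then cut the rest into secondary groups.
    head, tail = digits[:-primary], digits[-primary:]
    groups = []
    while len(head) > secondary:
        groups.insert(0, head[-secondary:])
        head = head[:-secondary]
    groups.insert(0, head)
    return sep.join(groups + [tail])


print(group_digits(1234567))  # 12,34,567
```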

However, unlike other right-to-left scripts such as Arabic, Hebrew and Thaana, the numbers are displayed right-to-left, with the most significant digit first. This means that numbers don't produce bidirectional text in N'Ko.

ߝߌߟߊߣߊ߲ ߕߋ߬ߟߋ߫ ߂߇-߂߈/߂߀߁߀ ߕߊ߬ߡߌ߲߬ߣߍ߲ ߠߊ߫߸

A date in N'Ko. All characters are read right to left, including numbers.
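The behaviour follows from the bidi_class property of the digits, which can be checked with Python's standard `unicodedata` module. This is only a quick inspection sketch; the three sample characters were chosen here for comparison.

```python
import unicodedata

samples = {
    "NKO DIGIT ONE": "\u07C1",
    "ARABIC-INDIC DIGIT ONE": "\u0661",
    "DIGIT ONE": "1",
}

# N'Ko digits are strong right-to-left characters (R), whereas Arabic-Indic
# digits (AN) and ASCII digits (EN) form left-to-right runs inside RTL text.
for name, ch in samples.items():
    print(f"{name}: bidi class {unicodedata.bidirectional(ch)}")
```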

Ordinal numbers

Ordinal numbers are produced using diacritics.

The first ordinal is produced using  ߭ [U+07ED NKO COMBINING SHORT RISING TONE​], eg. ߁߭ first.

Others use  ߲ [U+07F2 NKO COMBINING NASALIZATION MARK​], eg. ߂߲ second. When there are multiple digits in a number, the diacritic appears only under the last in sequence, eg. ߁߂߃߲ 123rd.
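The rules above can be sketched in Python as follows. The function names are hypothetical. The code assumes that digits are stored with the most significant digit first (as in the 123rd example) and that only the number 1 itself takes the rising tone mark; both points are assumptions that should be checked against real usage.

```python
NKO_DIGIT_ZERO = 0x07C0   # NKO DIGIT ZERO
FIRST_MARK = "\u07ED"     # NKO COMBINING SHORT RISING TONE
OTHER_MARK = "\u07F2"     # NKO COMBINING NASALIZATION MARK


def nko_digits(n: int) -> str:
    """Convert a non-negative integer to N'Ko digits, most significant digit first."""
    return "".join(chr(NKO_DIGIT_ZERO + int(d)) for d in str(n))


def nko_ordinal(n: int) -> str:
    """Append the ordinal diacritic after the last digit in the sequence."""
    if n < 1:
        raise ValueError("ordinals start at 1")
    mark = FIRST_MARK if n == 1 else OTHER_MARK
    return nko_digits(n) + mark


for n in (1, 2, 123):
    print(n, [f"U+{ord(c):04X}" for c in nko_ordinal(n)])
```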

Currency

The CLDR standard format for currency is ¤#,##0.00;¤-#,##0.00, and the symbol for the Lao currency, Kip, is [U+20AD KIP SIGN].

Text direction

@ thisScript text runs left to right in horizontal lines.

Show default bidi_class properties for characters in the XXXXX orthography described here.

Glyph shaping & positioning

This section brings together information about the following topics: writing styles; cursive text; context-based shaping; context-based positioning; baselines, line height, etc.; font styles; case & other character transforms.

You can experiment with examples using the thisScript character app.

thisScript text is not cursive (ie. joined up like Arabic), however there is a significant amount of interaction between glyphs, and some joining, around consonant clusters.

Prescript vowels are visually ordered, and since there are no combining characters and no joining behaviour, the thisScript script has no contextual variation or placement of glyphs. Nor is printed text cursive.

thisScript has no special requirements for baseline alignment between mixed scripts or in general.

The orthography has no case distinction, and no special transforms are needed to convert between characters.

Font styles

tbd

Observation: A Tamil newspaper sets some panels of text in an oblique font, with all the body text of the panel using that font. Other fonts used for the body text in other articles also tended to have a slight lean, though not as much. The verticals in headings tend to be upright.

Punctuation & inline features

Grapheme boundaries

tbd

@ Since there are no combining marks or decompositions, graphemes correspond to individual characters.

@ Unicode grapheme clusters can be applied to Lisu without problems. There are no special issues related to operations that use grapheme clusters as their basic unit of text.

Word boundaries

tbd

@ Words are not separated by spaces, and in fact some word boundaries occur between stacked consonants. This means that segmentation for line-breaking, etc. uses orthographic syllables as a unit, where orthographic means

@ Words are not separated by spaces; nonetheless, double-clicking or other selection methods are expected to identify word boundaries. There are 2 alternative approaches for managing this.

  1. An application uses a dictionary or smart algorithm to parse the text and determine word boundaries.
  2. The author uses U+200B ZERO WIDTH SPACE between words when creating the content.

@ Unlike in Thai or Khmer, it is fairly straightforward to parse individual syllables in Lao, because its alphabetic nature makes it possible to identify syllable-final consonants. Note that syllable-based segmentation must identify and keep together any syllable-initial clusters involving h or l; for example, the initial 2 letters in ຫມາ hm̱ā (mā) dog should wrap as a unit just like the ligated form, ໝາ.

What about kw etc?

While nearly all syllables can be argued to be words in their own right, there is still a preference for keeping multi-syllabic words together (eg. ປະເທດ p̯aēṯʰd̯ (pa thēt) country) during word-based segmentation. For this, an application needs to use a dictionary to parse Lao text.

@ However, widely used software automatically inserts [U+200B ZERO WIDTH SPACE] (ZWSP) in Lao text at word or syllable boundaries, and many web pages use such inserted ZWSP characters to get browsers to wrap correctly.

@ If a dictionary fails to keep two or more syllables together as needed, it should be possible to use the Unicode character U+2060 WORD JOINER between the two syllables. This is an invisible character, equivalent to a zero-width no-break space, and used to prevent line-breaks.
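The roles of the two invisible characters can be sketched in Python as follows. The function names are hypothetical, and Latin placeholders stand in for Lao syllables so that the invisible characters are easier to see in the output. The sketch assumes the author already knows the word boundaries, and it glues every syllable inside a word, which is more than is needed where a dictionary already keeps the word together.

```python
ZWSP = "\u200B"  # ZERO WIDTH SPACE: invisible line-break opportunity
WJ = "\u2060"    # WORD JOINER: invisible, prevents a line break


def mark_boundaries(words):
    """Join words with ZWSP; glue the syllables inside each word with WORD JOINER.

    `words` is a list of words, each given as a list of syllable strings.
    """
    return ZWSP.join(WJ.join(syllables) for syllables in words)


def strip_invisibles(text):
    """Remove both invisible characters, eg. before searching or comparing text."""
    return text.replace(ZWSP, "").replace(WJ, "")


words = [["pa", "thet"], ["lao"]]  # placeholder syllables, not real Lao text
marked = mark_boundaries(words)
print(ascii(marked))
```

Stripping the invisible characters before matching is one way to keep search results independent of how the content was segmented.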

@ If dictionaries are used for segmentation, they should be selected based on the language, not the script. (See the list of languages using the Lao script.)

@ Devanagari has hyphenated words – mainly conjoined nouns, eg. लाभ-हानि lābʰ-hāni profit-loss, and माता-पिता mātā-pitā parents.

Phrase & section boundaries

tbd

phrase

, [U+002C COMMA]

U+0020 SPACE

; [U+003B SEMICOLON]

: [U+003A COLON]

[U+A9C8 JAVANESE PADA LINGSA]

sentence

. [U+002E FULL STOP]

? [U+003F QUESTION MARK]

! [U+0021 EXCLAMATION MARK]

[U+0964 DEVANAGARI DANDA]
paragraph [U+A9CB JAVANESE PADA ADEG ADEG]
section Ditto.
general divider [U+A9CA JAVANESE PADA ADEG] 

@ Both [U+1B5A BALINESE PANTI] and [U+1B5B BALINESE PAMADA] are used to begin a section in text.

@ [U+1B5D BALINESE CARIK PAMUNGKAH] is used as a colon, and [U+1B5E BALINESE CARIK SIKI] and [U+1B5F BALINESE CARIK PAREREN] are used as comma and full stop respectively.

@ At the end of a section, ᭟᭜᭟ pasalinan and ᭛᭜᭛ carik agung may be used (depending on what sign began the section). These are encoded using the punctuation ring [U+1B5C BALINESE WINDU] together with [U+1B5F BALINESE CARIK PAREREN] and [U+1B5B BALINESE PAMADA].

Parentheses & brackets

tbd

  start end
standard

( [U+0028 LEFT PARENTHESIS]

) [U+0029 RIGHT PARENTHESIS]

( [U+0028 LEFT PARENTHESIS] and ) [U+0029 RIGHT PARENTHESIS] are used for parentheses.

Quotations

tbd

  start end
initial

[U+201C LEFT DOUBLE QUOTATION MARK]

[U+201D RIGHT DOUBLE QUOTATION MARK]
nested

[U+2018 LEFT SINGLE QUOTATION MARK]

[U+2019 RIGHT SINGLE QUOTATION MARK]

The default quote marks for Tamil are [U+201C LEFT DOUBLE QUOTATION MARK] at the start, and [U+201D RIGHT DOUBLE QUOTATION MARK] at the end.

When an additional quote is embedded within the first, the quote marks are [U+2018 LEFT SINGLE QUOTATION MARK] and [U+2019 RIGHT SINGLE QUOTATION MARK].

Emphasis

tbd

Abbreviation, ellipsis & repetition

tbd

[U+0EAF LAO ELLIPSIS] is used to indicate ellipsis or abbreviation.

Inline notes & annotations

tbd

Other inline ranges

tbd

Other punctuation

tbd

Line & paragraph layout

Line breaking & hyphenation

tbd

@ Lines are mostly broken at inter-word spaces. As in almost all writing systems, certain punctuation characters should not appear at the end or the start of a line.

@ Common practice is to break the sentence at any point when it reaches the end of a line, except that no line breaks are allowed within a syllable or just before a colon, comma or full stop.

@ Although Lao doesn't use spaces or dividers between words, the expectation is that line-breaks occur at word boundaries. See word for a discussion of issues related to word-based segmentation.

@ In lontar texts where a word must be broken at the end of a line (always after a full syllable), the sign [U+1B60 BALINESE PAMENENG] is inserted. This sign is not used as a word-joining hyphen; it is used only in linebreaking.

@ For an application to use this correctly, it would need to know where the word boundaries are in the text, and then put this character at the end of the line only when a multisyllabic word is broken. This would require a dictionary to be applied to the text, as it is for line-breaking in Thai or Khmer, even though breaking on syllables is the default.
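A rough Python sketch of that logic follows. It assumes the text has already been segmented into words and syllables (the dictionary step), and it measures line length in syllables rather than in glyph widths, which is a simplification. The function name `break_lines` and the Latin placeholder syllables are hypothetical.

```python
PAMENENG = "\u1B60"  # BALINESE PAMENENG


def break_lines(words, max_syllables):
    """Break after any full syllable; add pameneng when the break falls inside a word.

    `words` is a list of words, each given as a list of syllable strings.
    """
    if max_syllables < 1:
        raise ValueError("max_syllables must be at least 1")
    lines, current, count = [], "", 0
    for syllables in words:
        for i, syllable in enumerate(syllables):
            if count == max_syllables:
                # The line is full. i > 0 means we are part-way through a word.
                if i > 0:
                    current += PAMENENG
                lines.append(current)
                current, count = "", 0
            current += syllable
            count += 1
    if current:
        lines.append(current)
    return lines


words = [["pa", "kra", "man"], ["kra", "ma"]]  # placeholder syllables
print(break_lines(words, 2))  # both breaks fall inside a word
print(break_lines(words, 3))  # the break falls between words, so no pameneng
```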

@ As in most writing systems, certain characters are expected not to start or end a line. For example, periods and commas shouldn't start a line, and opening parentheses shouldn't end a line.

Show (default) line-breaking properties for characters in the XXXXXX orthography described here.

Hyphenation/word-breaking

Because Tamil words are long, hyphenation is useful during layout, but the complexity of the words makes it hard to do.

Hyphenation must take place at syllable boundaries. A hyphen is added at the end of the line when a word is hyphenated.

Prabhakar proposes rules that single characters should be avoided at line start/end, especially characters with nukta at line start, and that a word with 5 characters including 3 consecutive consonants can't be split. He says that because Tamil is highly inflectional, morphological or pattern-based approaches are needed, rather than simple dictionary lookup.

Text alignment & justification

tbd

@ According to Sudewa, full justification is not a feature of thisScript text in traditional palm-leaf manuscripts, and only left, or occasionally centred or right alignment is relevant.

Letter spacing

tbd

There appears to be a tendency to stretch text, like in Arabic, to fit a given space or make a heading, using ߺ [U+07FA NKO LAJANYALAN], eg. ߞߺߺߺߺߺߏ߲.

Counters, lists, etc.

tbd

You can experiment with counter styles using the Counter styles converter. Patterns for using these styles in CSS can be found in Ready-made Counter Styles, and we use the names of those patterns here to refer to the various styles.

The modern Greek orthography uses 2 additive and one alphabetic styles. It also uses a numeric decimal style based on ASCII digits.

The modern Thai orthography uses numeric and fixed styles. Wikipedia lists 2 more styles: an old maghrebi sequence and the hijai sequence.

Numeric

The oriya numeric style is decimal-based and uses these digits.

୦␣୧␣୨␣୩␣୪␣୫␣୬␣୭␣୮␣୯

Examples:

୧␣୨␣୩␣୪␣୧୧␣୨୨␣୩୩␣୪୪␣୧୧୧␣୨୨୨␣୩୩୩␣୪୪୪
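A numeric style simply substitutes its own digits for the ASCII ones, position by position. The following Python sketch shows this for the digits listed above; the function name `oriya_numeric` is hypothetical.

```python
ORIYA_DIGIT_ZERO = 0x0B66  # ORIYA DIGIT ZERO; the other nine digits follow in order


def oriya_numeric(n: int) -> str:
    """Render a non-negative integer using the Oriya decimal digits."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    return "".join(chr(ORIYA_DIGIT_ZERO + int(d)) for d in str(n))


print(" ".join(oriya_numeric(n) for n in (1, 2, 3, 4, 11, 22, 33, 44)))
```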

Fixed

@ The arabic-abjad fixed style for the Arabic language uses these letters. It is only able to count to 28.

ا␣ب␣ج␣د␣ه‍␣و␣ز␣ح␣ط␣ي␣ك␣ل␣م␣ن␣س␣ع␣ف␣ص␣ق␣ر␣ش␣ت␣ث␣خ␣ذ␣ض␣ظ␣غ

Note that the 5th counter includes a zero-width joiner formatting character. This makes the shape distinguishable from [U+0665 ARABIC-INDIC DIGIT FIVE].
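A fixed style runs out once its symbols are used up, after which a fallback style takes over (decimal is the default fallback in CSS counter styles). The Python sketch below illustrates this with only the first five letters of the list, for brevity; the function name `fixed_counter` is hypothetical.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER

# First five arabic-abjad counters only: alef, beh, jeem, dal, heh + ZWJ.
SYMBOLS = ["\u0627", "\u0628", "\u062C", "\u062F", "\u0647" + ZWJ]


def fixed_counter(n: int, symbols=SYMBOLS, first: int = 1) -> str:
    """Return the symbol for n, or plain decimal digits when n is out of range."""
    index = n - first
    if 0 <= index < len(symbols):
        return symbols[index]
    return str(n)  # fallback, assumed here to be plain decimal


print([fixed_counter(n) for n in range(1, 8)])
```

With the full list of 28 letters, the same function would fall back to decimal from 29 onwards.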

Examples:

୧␣୨␣୩␣୪␣୧୧␣୨୨␣୩୩␣୪୪␣୧୧୧␣୨୨୨␣୩୩୩␣୪୪୪

Alphabetic

The khmer-consonant alphabetic style for the Khmer language uses these letters.

ก␣ข␣ค␣ง␣จ␣ฉ␣ช␣ซ␣ฌ␣ญ␣ฎ␣ฏ␣ฐ␣ฑ␣ฒ␣ณ␣ด␣ต␣ถ␣ท␣ธ␣น␣บ␣ป␣ผ␣ฝ␣พ␣ฟ␣ภ␣ม␣ย␣ร␣ล␣ว␣ศ␣ษ␣ส␣ห␣ฬ␣อ␣ฮ

Examples:

୧␣୨␣୩␣୪␣୧୧␣୨୨␣୩୩␣୪୪␣୧୧୧␣୨୨୨␣୩୩୩␣୪୪୪

Additive

The ancient-tamil additive style uses these letters. It is specified for a range between 1 and 9,999.

௯௲␣௮௲␣௭௲␣௬௲␣௫௲␣௪௲␣௩௲␣௨௲␣௲␣௯௱␣௮௱␣௭௱␣௬௱␣௫௱␣௪௱␣௩௱␣௨௱␣௱␣௯௰␣௮௰␣௭௰␣௬௰␣௫௰␣௪௰␣௩௰␣௨௰␣௰␣௯␣௮␣௭␣௬␣௫␣௪␣௩␣௨␣௧
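An additive style works through its symbols from the largest weight to the smallest, emitting each symbol whose weight still fits and subtracting that weight from the remainder. The Python sketch below rebuilds the list above from code points and applies that procedure; the function names are hypothetical.

```python
TAMIL_DIGIT_ZERO = 0x0BE6  # the digits one to nine are U+0BE7 to U+0BEF
TEN, HUNDRED, THOUSAND = "\u0BF0", "\u0BF1", "\u0BF2"


def additive_symbols():
    """Return (weight, symbol) pairs in descending order: 9000, 8000, ... 2, 1."""
    pairs = []
    for unit, sign in ((1000, THOUSAND), (100, HUNDRED), (10, TEN)):
        for d in range(9, 1, -1):
            pairs.append((d * unit, chr(TAMIL_DIGIT_ZERO + d) + sign))
        pairs.append((unit, sign))  # the sign alone stands for 1 x unit
    for d in range(9, 0, -1):
        pairs.append((d, chr(TAMIL_DIGIT_ZERO + d)))
    return pairs


def ancient_tamil(n: int) -> str:
    """Render n (1 to 9,999) by adding up symbols, largest weight first."""
    if not 1 <= n <= 9999:
        raise ValueError("the style is specified for 1 to 9,999 only")
    out = []
    for weight, symbol in additive_symbols():
        if weight <= n:
            count, n = divmod(n, weight)
            out.append(symbol * count)
    return "".join(out)


for n in (1, 10, 11, 1984):
    print(n, ancient_tamil(n))
```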

Examples:

୧␣୨␣୩␣୪␣୧୧␣୨୨␣୩୩␣୪୪␣୧୧୧␣୨୨୨␣୩୩୩␣୪୪୪

Prefixes and suffixes

The default list style uses a full stop + space as a suffix.

The most common approach to writing lists in Burmese puts the counters in parentheses. An alternate style uses [U+104B MYANMAR SIGN SECTION] after the counter. Full stops are not commonly used.

Examples:

ա. բ. գ. դ. ե.
Separator for XXXXX list counters.

Styling initials

tbd

Page & book layout

This section is for any features that are specific to thisScript and that relate to the following topics: general page layout & progression; grids & tables; notes, footnotes, etc; forms & user interaction; page numbering, running headers, etc.

Historical information

Orthographic development & variants

In 2019 the design of Adlam letter glyphs was overhauled in a proposal to the Unicode Consortium, which resulted in changes to the code chart.

Adlam 2018Adlam 2019

Glyphs in the Unicode Adlam code chart showing pre-2019 (left) and post-2019 (right) shapes.

Typical changes involved standardising the shapes across cursive forms, better distinctions between lower and uppercase forms, removal of some ascenders to avoid diacritic collisions, and then addition of some small ascenders to help distinguish joined forms.

There were also some significant shape changes, particularly to make supplementary letters look more like those used for similar, standard sounds, or to make letters easier to read.

Although there are not many Adlam Unicode fonts, and they will be changed, legacy forms are likely to persist for some time alongside the new forms.

The 2017 release of the Noto Sans Adlam font (still in use in early 2020) contained a set of glyphs that sometimes matched one or other of the shapes shown in variant_shapes, and sometimes used completely different shapes from either. The Noto fonts were updated to the new shapes in September 2020.

Click to also show in variant_shapes shapes produced by the Noto Sans Adlam font at the start of 2020. Red underlines highlight some characters that don't resemble either of the other charts.

Historical approaches to justification

In the early stages of Adlam typography it was quite common to see full justification of printed text that was produced by stretching baselines, rather than by adjusting inter-word spaces. This was influenced by the use of keyboards based on Arabic code points. Handwritten documents, however, were not justified in this way.

Observation: The Winden Jangen site has scans of a number of books which apply full justification. The method of justification appears to be elongation of the baseline, with no effect on the inter-word spacing. See fig_justification. In narrow columns this can produce some exaggerated stretching, as seen in fig_justification_wide. There are many passages in the samples available that apply this exaggerated stretching. Some content also applies justification to the last line in a paragraph, which sometimes produces even wider elongations.

Justification example.
Full justification achieved by stretching baselines.

Languages using the thisScript script

According to ScriptSource, the thisScript script is used for the following languages:

References