Updated 7 December, 2024
This page brings together basic information about the Lao script and its use for the Lao language. It aims to provide a brief, descriptive summary of the modern, printed orthography and typographic features, and to advise how to write Lao using Unicode.
Richard Ishida, Lao Orthography Notes, 07-Dec-2024, https://r12a.github.io/scripts/laoo/lo
ມາດຕາ 1: ມະນຸດເກີດມາມີສິດເສລີພາບ ແລະ ສະເໝີໜ້າກັນໃນທາງກຽດຕິສັກ ແລະ ທາງສິດດ້ວຍມະນຸດມີສະຕິສຳປັດຊັນຍະ(ຮູ້ດີຮູ້ຊົ່ວ)ແລະມີມະໂນທຳຈື່ງຕ້ອງປະພຶດຕົນຕໍ່ກັນໃນທາງພີ່ນ້ອງ.
ມາດຕາ 2: ຂໍ້ 1.ຄົນຜູ້ໃດກໍ່ອ້າງຕົນໄດ້ວ່າ:ມີສິດ ແລະ ເສລີພາບທຸກຢ່າງທີ່ໄດ້ປ່າວຮ້ອງຢູ່ໃນປະກາດສະບັບນີ້ໂດຍບໍ່ເລືອກໜ້າ ບໍ່ຈຳກັດເຊື້ອຊາດ,ຜິວເນື້ອ,ເພດ,ສາສະໜາ ຄວາມຄິດເຫັນໃນດ້ານການເມືອງ ຫຼື ອື່ນໆ ກຳເນີດແຫ່ງຊາດຫຼື ສັງຄົມຖານະການມີຊັບສົມບັດມາກ ຫຼື ນ້ອຍ,ມີຕະກຸນ ຫຼື ຖານະອື່ນໆ. ຂໍ້ 2.ອີກປະການໜື່ງ ຈະບໍ່ຈຳກັດຢ່າງໃດໃນການແຕກຕ່າງກັນອັນເນື່ອງມາຈາກລະບຽບການເມືອງການປົກຄອງ ຫຼື ລະຫວ່າງຊາດຂອງປະເທດ ຫຼື ດິນແດນ ຊື່ງບຸກຄົນຜູ້ໃດຜູ້ໜື່ງສັງກັດຢູ່;ດິນແດນນັ້ນຈຳເປັນເອກະລາດຢູ່ໃນຄວາມອາລັກຂາຂອງມະຫາອຳນາດ ຫຼື ບໍ່ມີອິດສະຫຼະ ຫຼື ຖືກລົດອະທິປະໄຕລົງໂດຍຈຳກັດກໍ່ຕາມ.
Source: Unicode UDHR, articles 1 & 2
Origins of the Lao script, 16thC – today.
Phoenician
└ Aramaic
└ Brahmi
└ Tamil-Brahmi
└ Pallava
└ Khmer
└ Sukhothai
└ Fakkham
└ Tai Noi
└ Lao
+ Tai Yo
The Lao language has around 4,000,000 speakers. The Lao script is used for writing the Lao language, and is also the official script of a number of minority languages in Laos. There is a considerable Lao-speaking population in Thailand who write their language with the Thai script.
ອັກສອນລາວ ʔáksɔ̌ːn láːw
The Lao alphabet was adapted from the Khmer script, and is a sister system to the Thai script, with which it shares many similarities and roots. However, Lao has fewer characters and is formed in a more curvilinear fashion than Thai. Further distancing from the Thai script occurred via a number of reforms. In 1975, the latest spelling reform simplified and standardised the script.
More information: Wikipedia
The script was originally an abugida, but since the script reforms leading up to 1960 it has been alphabetic. When the communist Pathet Lao overthrew the Lao government in 1975, they implemented a final spelling reform which simplified and standardized the script.
Lao is an alphabet. This means that both consonants and vowels are indicated. See the table to the right for a brief overview of features for the modern Lao orthography.
Lao text runs left to right in horizontal lines. Spaces separate phrases, rather than words. There is no case distinction.
Lao has 26 consonants. Each onset consonant is associated with a high, mid, or low class related to tone.
No conjuncts are used for consonant clusters, except for one subjoined consonant, used in combination only with HA.
Syllable-initial clusters and syllable-final consonant sounds are all written with ordinary consonant letters. However, because all vowels are written, it is not difficult to algorithmically detect syllable boundaries.
Unlike its close relative, Thai, the Lao orthography is an alphabet and has no inherent vowel. It represents vowels using 9 combining marks, and 12 letters, much of the time grouped together in various combinations. This page lists 28 composite vowels, which can involve up to 4 glyphs (plus a tone mark) at a time, and can surround the base consonant(s) on up to 3 sides simultaneously.
Vowels are often written differently when they appear in a closed vs. open syllable.
Lao uses visual placement: only the 8 vowel components that appear above or below the consonant are combining marks; the others are ordinary spacing characters that are typed in the order seen.
There are 5 pre-base letters. There are no single-character circumgraphs in Lao text, but a single vowel or diphthong is frequently made up of multiple components, some of which will appear on different sides of the base consonant(s).
There are no independent vowels, and standalone vowel sounds are written using vowel signs applied to ອ.
Lao has 6 tones. Tone is indicated by a combination of the consonant class, the syllable type (checked/unchecked), plus any tone mark.
Phones in a lighter colour are non-native or allophones.
iə̯ iːə̯ iw iːw | ɯə̯ ɯːə̯ | uə̯ uːə̯ |
ɤːj | ||
ɛw ɛːw | ||
aj aːj aw | ||
 | labial | alveolar | post-alveolar | palatal | velar | glottal |
---|---|---|---|---|---|---|
stops | p b | t d | k | ʔ | ||
aspirated | pʰ | tʰ | kʰ | |||
affricates | t͡ɕ | |||||
fricatives | f | s | x | h | ||
nasals | m | n | ɲ | ŋ | ||
approximants | ʋ/w | l | j | |||
trills/flaps | r | |||||
 | labial | alveolar | post-alveolar | palatal | velar | glottal |
---|---|---|---|---|---|---|
stop | p | t | | | k | ʔ |
nasal | m | n | | | ŋ | |
approximant | w | | | j | | |
Lao has 6 phonological tones in unchecked syllables, and 4 in checked syllables.
Name | Tone letters | Transcription | Unchecked? | Checked? |
---|---|---|---|---|
Rising | ˨˦ or ˨˩˦ | ě | ✓ | |
High | ˦ | é | ✓ | ✓ |
High falling | ˥˧ | ê | ✓ | |
Mid | ˧ | ē | ✓ | ✓ |
Low | ˩ | è | ✓ | ✓ |
Low falling | ˧˩ | e᷆ | ✓ | ✓ |
The tone depends on the class of the initial consonant in a syllable, the structure of the syllable, and whether or not a tone mark is applied to override the default.
Tone values vary depending on location in Laos. There is some disagreement about whether there are 5 or 6 tones in Vientiane, and the tables below show that different sources disagree on the tones produced. According to some, most dialects of Lao and Isan have six tones, those of Luang Prabang have five.wl
The following tables present different descriptions of tone values in Lao for the Vientiane dialect. The first and third tables basically agree on the tone value, although the names of tones vary. The middle table shows some different tone values altogether. See a list of studies for Vientiane tones.
This diagram shows 5 tones with names corresponding to a mixture of the first two tables below.
Tone marks are normally used only on open syllables, and modify the default tone value. Two of the four tone marks are only used with Class 1 consonants. Tone marks tend to be placed directly over the consonant (or superscript vowel), unlike Thai which tends to place them slightly to the right.
Open or live syllables are those that end with a long vowel or sonorant (eg. ງນມຍວ). Closed or dead syllables end with a stop consonant (eg. ກດບ) or short vowel.
 | Open | Closed short vowel | Closed long vowel | Tone mai eːk | Tone mai toː | Tone mai tiː | Tone mai cat-ta-waː |
---|---|---|---|---|---|---|---|
Class 1 | low | ˊ high | ˆ low falling | ˉ mid | ˋ high falling | ˋ high falling | ˇ low rising |
Class 2 | ˇ low rising | ˊ high | ˆ low falling | ˉ mid | ˆ low falling | - | - |
Class 3 | ˊ high | ˉ mid | ˋ high falling | ˉ mid | ˋ high falling | - | - |
Refs: Daniels
 | Live | Dead short vowel | Dead long vowel | Tone mai eːk | Tone mai toː | Tone mai tiː | Tone mai cat-ta-waː |
---|---|---|---|---|---|---|---|
Class 1 | ˋ low | ˇ rising | ˇ rising | mid | ˆ falling | ˊ high | ˇ rising |
Class 2 | ˇ rising | ˇ rising | ˋ low | mid | ˋ low | - | - |
Class 3 | ˊ high | mid | ˆ falling | mid | ˆ falling | - | - |
Refs: Simmala
 | Live | Dead short vowel | Dead long vowel | Tone mai eːk | Tone mai toː | Tone mai tiː | Tone mai cat-ta-waː |
---|---|---|---|---|---|---|---|
Class 1 | low rising | high rising | low falling | high-mid | high falling | | |
Class 2 | low rising | high rising | low falling | high-mid | low falling | | |
Class 3 | high rising | high-mid | high falling | high-mid | high falling | | |
Refs: SEAlang
The Simmala chart appears a little suspect, since they say in the text that the rising tone doesn't occur in dead syllables, and yet the book has examples of dead syllables with long vowels with a low tone.
The syllable is a basic element of the Lao language, and many words are monosyllabic. All syllables begin with a written consonant. Syllables that begin with a vowel sound are written with a silent base consonant.
The phonological structure of a syllable is (C)w?V(C).
Unlike Thai, its close neighbour linguistically, Lao doesn't naturally support onset clusters of consonants other than with ʷ, and then not before rounded vowels.wl,#Syllables Onset consonants followed by labialisation include: tʷʰ tɕʷ kʷ kʷʰ ʔʷ sʷ ŋʷ lʷ.
Only a small set of consonants occur at the end of a syllable. ʔ occurs after short vowels.
Lao also has 6 phonological tones in unchecked syllables (ie. ending in a vowel, or m, n, ŋ, w or j), and 4 in checked syllables (ie. ending in p, t, k, or ʔ).
The following table summarises the main vowel to character assignments.
The diphthongs section below contains one character that incorporates a final nasal. The list doesn't include those combinations that involve simply appending a glide after the vowel (see compositeV). Standalone vowels use the character shown as a base for the normal vowel symbols. ◌ indicates the location of consonants, but doesn't necessarily indicate the presence of a combining mark.
Plain: | |
---|---|
Diphthongs: | |
Standalone: |
For more details see vowel_mappings.
ກະ ka U+0E81 LAO LETTER KO + U+0EB0 LAO VOWEL SIGN A
The short vowel a has to be written explicitly, using 0EB0 in open syllables, and 0EB1 in closed syllables. The following word shows examples of both.
ລະດັບ
When used in conjunction with other vowels, 0EB0 and 0EB1 are also used to indicate short vowels for open and closed syllables, respectively. In phonetic transcriptions, shortened open vowels often end with a glottal stop. For example, compare:
ໂຕະ
ໂຕ
See also vlength.
Unlike its close relative, Thai, the Lao orthography is an alphabet and has no inherent vowel. It represents vowels using combining marks, dedicated vowel letters, and a couple of consonants, much of the time grouped together in various combinations. This page lists 28 composite vowels, which can involve up to 4 glyphs, and can surround the base consonant(s) on up to 3 sides simultaneously.
Vowels are often written differently when they appear in a closed vs. open syllable.
Lao uses visual placement: only the 8 vowel components that appear above or below the consonant are combining marks; the others are ordinary spacing characters that are typed in the order seen.
There are 5 pre-base letters. There are no single-character circumgraphs in Lao text, but a single vowel or diphthong is frequently made up of multiple components, some of which will appear on different sides of the base consonant(s).
None of the marks are spacing combining marks; the spacing glyphs used to write vowels are all letters.
ເກິ kɤ U+0EC0 LAO VOWEL SIGN E + U+0E81 LAO LETTER KO + U+0EB4 LAO VOWEL SIGN I
The following panel lists Lao monophthongs. The dotted circle shows the location of adjacent consonants.
A few of the vowels are written differently when the syllable is open or closed. Doubled dotted circles in the list above indicate special forms for closed syllables, eg. for ເ◌ັ◌ -e- and ແ◌ັ◌ -ɛ-.
0EAD is used as a vowel carrier for standalone vowels (see standalone), but is also used as a vowel. On its own, in the middle of a closed syllable, 0EAD is pronounced as the vowel -ɔː-, eg.
ຈອກ
ກຽວ kiːə̯w U+0E81 LAO LETTER KO + U+0EBD LAO SEMIVOWEL SIGN NYO + U+0EA7 LAO LETTER WO
Lao diphthongs are complicated and numerous. The way they are written also sometimes varies according to whether they appear in an open or a closed syllable.
Many complex Lao vowels involve adding a consonant to represent the final -j or -w glide after a vowel. The following panel shows combinations that don't follow that pattern. Except for uə̯, they are spelled the same whether the syllable is open or closed. However, there are 2 ways of spelling most diphthongs in open syllables.
Note that the above set includes 3 diphthongs that are written using single code points: ໄ, ໃ, and ວ.
On its own, in the middle of a closed syllable, the consonant 0EA7 is pronounced -uːə̯-, eg.
ບວມ
The next panel shows diphthongs and a triphthong that are produced by simply adding one of the consonants 0EA7, 0E8D, or 0EBD to the end of a previously mentioned vowel to represent the glide -w or -j.
0EBD was originally an alternate form of non-initial 0E8D, but is now used for diphthongs, either alone as iːə̯ or as the semi-vowel -j, eg.
ປຽກ
ຊາຽ
The final panel in this section shows a complete Lao rhyme that has a special spelling.
0EB3 is classed as a vowel, but also contains the final consonant m, represented by a built-in nikhahit (cf. 0ECD). It is a spacing combining character, but the Unicode Standard classifies it as a letter.
ໂກ koː U+0EC2 LAO VOWEL SIGN O + U+0E81 LAO LETTER KO
Five vowel signs appear to the left of the onset consonant. See an example in fig_prebase.
Like Thai, Lao uses a visual encoding model, so these characters are not combining characters, and are typed and stored before the base. For example:
ແມວ
Note that ແ should not be typed as two successive ເ characters.
These vowel signs are placed before the start of the syllable onset. This means that in a word with more than one consonant at the start (such as for shifting the tone) the pre-base vowel is placed to the left of the syllable-initial consonant, rather than to the left of the consonant after which it is pronounced. Tone marks and post-base vowel signs are, however, attached to the latter, as the following examples show.
ໃຫຍ່ ເຫຼືອງ
fig_prebase shows another example to graphically illustrate the relationships between the characters.
ແກວ່ງ
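The storage order described above can be checked programmatically. The following is a minimal sketch rather than a definitive tool: it spells the three example words out as escapes (so the order is unambiguous), lists their code points, and confirms that each one starts with a pre-base vowel letter. The names `PREBASE_VOWELS`, `EXAMPLES` and `code_points` are invented for this illustration, and the order shown for the second word is the assumed conventional one.

```python
import unicodedata

# The five pre-base vowel letters, U+0EC0..U+0EC4 (ເ ແ ໂ ໃ ໄ).
PREBASE_VOWELS = {chr(cp) for cp in range(0x0EC0, 0x0EC5)}

# The example words, written as code points in storage order.
EXAMPLES = [
    "\u0ec3\u0eab\u0e8d\u0ec8",              # ໃຫຍ່
    "\u0ec0\u0eab\u0ebc\u0eb7\u0ead\u0e87",  # ເຫຼືອງ (assumed order)
    "\u0ec1\u0e81\u0ea7\u0ec8\u0e87",        # ແກວ່ງ
]


def code_points(word):
    """Return the code points of a word as U+XXXX strings, in storage order."""
    return [f"U+{ord(ch):04X}" for ch in word]


for word in EXAMPLES:
    print(word)
    for ch in word:
        # Fall back to a placeholder if the local Unicode data lacks a name.
        print(f"  U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}")
```

In each word the pre-base vowel is the first stored character, while the tone mark in the first and third words follows the second consonant of the onset.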
In common with other languages, i, ɯ and u vowels have dedicated characters for long and short sounds. But many composite vowels use 0EB0 (in open syllables) and 0EB1 (in closed syllables) as shorteners. The following provides one example of the general pattern.
This can be seen clearly by comparing the long and short vowels in vowel_mappings.
ເກຶອ kuːə U+0EC0 VOWEL SIGN E + U+0E81 LETTER KO + U+0EB6 VOWEL SIGN Y + U+0EAD LETTER O
In the lists below, ◌ represents a consonant. Vowels used in closed syllables are indicated by a trailing ◌, or a trailing hyphen in the IPA transcription.
Lao has many vowel sounds that are represented by more than one code point. Composite vowels can involve up to 4 glyphs, and glyphs can surround the base consonant(s) on up to 3 sides.
ເຈັ້ຍ
Some composite vowels represent plain vowel sounds:
The other composites represent diphthongs, which generally end in one of ə̯, i, or w.
In some cases, the spelling is straightforward because a semivowel simply follows a plain vowel.
For others, the spelling doesn't closely follow the sound represented.
Observation: Simmala et al. list the composite vowel ເ-ັຍະ for -ia in (only) one location, but I have yet to identify words containing this sequence, and suspect that it may be a typographic error.
Characters that don't appear in the combinations:
The following list shows where vowel signs are positioned around a base consonant to produce vowels, and how many instances of that pattern there are. Numbers after + sign indicate multiple code points.
At maximum, vowel components can occur concurrently on 3 sides of the base.
Distribution of vowel elements is as follows:
-ັ -ິ -ີ -ຶ -ື -ໍ -ົ | -ຳ | ||
ເ ແ ໂ ໃ ໄ | ະ າ ອ ວ ຍ ຽ | ຍ ຽ ະ ວ | |
-ຸ -ູ |
Lao uses a silent ອ as a base (although it is often transcribed as ʔ), to which vowel signs are applied, eg.
ໂອ
ອຸ່ນ
ຊາວເອັດ
Lao has no independent vowel letters, but when 0EAD is used as a vowel in a closed syllable it indicates the vowel ɔː.
ຈອກ
The Unicode Lao block provides the following characters for indicating tone.
The tone is expressed using the class of the initial consonant in a syllable, the structure of the syllable, and whether or not a tone mark is applied to override the default.
Tone marks should be typed and stored in memory immediately after the base consonant of the syllable, or after a superscript vowel sign if there is one. However, the tone mark should be typed before ຳ, even though it will be displayed above the nikhahit.
ນ້ຳ
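As an illustration of the ordering rule for ຳ, the sketch below moves a tone mark that was typed after U+0EB3 back in front of it. It is only a sketch under simple assumptions: the function name `tone_before_am` is hypothetical, and the pattern only looks at a tone mark that immediately follows the vowel sign.

```python
import re

TONE_MARKS = "\u0ec8\u0ec9\u0eca\u0ecb"  # mai ek, mai tho, mai ti, mai catawa
AM = "\u0eb3"                            # LAO VOWEL SIGN AM

# A tone mark that was typed after VOWEL SIGN AM.
_AM_THEN_TONE = re.compile(f"({AM})([{TONE_MARKS}])")


def tone_before_am(text):
    """Reorder 'AM + tone mark' to the recommended 'tone mark + AM'."""
    return _AM_THEN_TONE.sub(r"\2\1", text)


wrong = "\u0e99\u0eb3\u0ec9"   # NO + AM + MAI THO
right = "\u0e99\u0ec9\u0eb3"   # NO + MAI THO + AM (ນ້ຳ)
print(tone_before_am(wrong) == right)
```

Text that already follows the recommended order passes through unchanged, so the function can be applied to mixed input.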
The following chart shows how to tell which tones are associated with a syllable. 'Checked' means ending in the sound -p, -t, or -k or a short vowel.
Mark | Checked? | Short/long | Consonant | Tone |
---|---|---|---|---|
◌່ | | | high/mid | low |
◌່ | | | low | falling |
◌້ | | | high/mid | falling |
◌້ | | | low | high |
◌໊ | | | mid | high |
◌໋ | | | mid | rising |
none | no | | high | rising |
none | no | | mid/low | mid |
none | yes | short | high/mid | low |
none | yes | short | low | high |
none | yes | long | high/mid | low |
none | yes | long | low | falling |
This section maps Lao vowel sounds to common graphemes in the Lao orthography.
The ◌ indicates the location of a consonant relative to the vowel sign; if there are 2 of these, the vowel is used only in closed syllables.
mixed ີ
mixed ິ
mixed ື
mixed ຶ
mixed ູ
mixed ຸ
mixed ເ◌
final ເ◌ະ
medial ເ◌ັ◌
mixed ເ◌ີ
mixed ເ◌ິ
mixed ໂ◌
final ໂ◌ະ
medial ◌ົ◌
mixed ແ
final ແ◌ະ
medial ແ◌ັ◌
final ◌ໍ
medial ◌ອ◌
final ເ◌າະ
medial ◌ັອ◌
mixed ◌າ
final ◌ະ
medial ◌ັ◌
rhyme ເ◌ຍ
rhyme ເ◌ັຽ
medial ◌ຽ◌
rhyme ເ◌ັຍ
medial ◌ັຽ◌
rhyme ຽວ
rhyme ເ◌ັຽວ (check this)
rhyme ◌ີວ
rhyme ◌ິວ
rhyme ເ◌ັຍະ (check this)
rhyme ເ◌ືອ
rhyme ເ◌ຶອ
rhyme ◌ົວ
medial ◌ວ◌
medial ◌ວາ◌
rhyme ◌ົວະ
medial ◌ັວ◌
rhyme ◌ວາຍ
rhyme ເ◌ີຍ
rhyme ເ◌ີຽ (check this)
rhyme ໂ◌ຍ
rhyme ແ◌ວ
rhyme ◌ອຍ
rhyme ◌າຍ
rhyme ◌າຽ
rhyme ໃ◌
rhyme ໄ◌
rhyme ◌ັຍ
rhyme ◌າວ
rhyme ເ◌ົາ
rhyme ◌ຳ
The following table summarises the main consonant to character assignments.
The initial consonants are split across high, mid, and low columns.
high class | mid class | low class | |
---|---|---|---|
Onsets | |||
Finals | |||
For additional details see consonant_mappings.
Each of the basic consonants is associated with one of 3 classes (high, mid, and low), that play a part in indicating the tone of the syllable (see tones). In only a few cases, though, does this lead to more than one letter for a given consonant.
The pronunciation of a letter often differs when the consonant is the onset or coda of a syllable.
The panel below lists the basic consonant letters for Lao, with typical pronunciations for syllable onset and final positions.
In the following lists, the class of each consonant is shown just below the IPA data. A dash after the IPA transcription indicates the pronunciation in syllable-initial position; a dash before the transcription indicates the pronunciation for syllable codas.
high
mid
low
Gaps in the high class range can be filled by using a silent ຫ before the onset characters listed just below to make their default tonal class high. Note that this doesn't set a high tone, it just changes the class of the consonant, to which tone rules are then applied.
ຫວ່າງ
In modern texts, 3 of these combinations are typically represented by one of the following alternate forms.
Two combinations can be represented as ligatures, for which there are separate characters in Unicode: 0EDD and 0EDC [d,462], eg.
ໝາ
ໜອນ
A third can be represented by 0EAB 0EBC [u,378], eg.
ຫຼາຍ
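Because both the 2-letter spellings and the alternate forms occur in text, an application that compares or searches Lao strings may want to treat them alike. The sketch below maps the three 2-letter spellings to the alternate forms before comparing. It is a rough illustration only: the names `MODERN_FORMS`, `to_modern_forms` and `same_word` are hypothetical, and the code assumes that ຫ immediately followed by ມ, ນ or ລ always forms one of these digraphs.

```python
# 2-letter spelling -> alternate form described above.
MODERN_FORMS = {
    "\u0eab\u0ea1": "\u0edd",        # ຫ + ມ -> ໝ (U+0EDD)
    "\u0eab\u0e99": "\u0edc",        # ຫ + ນ -> ໜ (U+0EDC)
    "\u0eab\u0ea5": "\u0eab\u0ebc",  # ຫ + ລ -> ຫຼ (U+0EAB U+0EBC)
}


def to_modern_forms(text):
    """Replace the 2-letter spellings with their alternate forms."""
    for old, new in MODERN_FORMS.items():
        text = text.replace(old, new)
    return text


def same_word(a, b):
    """Compare two strings while ignoring the spelling difference."""
    return to_modern_forms(a) == to_modern_forms(b)


print(same_word("\u0eab\u0ea1\u0eb2", "\u0edd\u0eb2"))  # ຫມາ vs ໝາ
```

Folding towards the alternate forms is a design choice made here for illustration; folding in the other direction would serve matching equally well.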
Wiktionary lists most of the 2-letter spellings as dated when the following 3 alternatives could be used, instead.
0EAD represents a glottal stop or is silent when used as a base for vowels at the beginning of a syllable (see standalone).
ໂອ
When it appears after a base consonant in a closed syllable it becomes the vowel ɔː (see otherV).
ຈອກ
It is also used in combination with other characters to produce additional vowel sounds (see compositeV).
One more letter was officially removed from the alphabet by the Ministry of Education, but it is still used occasionally to transliterate Indic or other foreign words into Lao, eg. ຝຣັ່ງ flaŋ foreigner. It is generally used to represent the letter 'r': the sound r no longer exists in Lao.
Modern Lao really only has one audible, syllable-initial cluster, and that occurs when ວ labialises one of just over half a dozen initial consonants (see structure). In those cases, the initial consonant is simply followed by ວ. Note that the tone mark goes over the second character in the digraph in the following example.
ກວ້າງ
Lao does, however, have clusters of written consonants at the syllable onset where ຫ is used to change the class of a following consonant. This includes the use of Lao's one subjoined consonant mark in the combination ຫຼ. This is described in highclass.
ຫຍ້າ
With one exception, Lao doesn't have any special code points dedicated solely to syllable final consonants, although consonants do appear in those positions, eg. ນົກ
This is true whether or not the syllable coda is followed by another syllable.
However, only the following consonants appear in syllable-final position. Note how the sound may change, compared to when the same letter is used in syllable-initial position: where the onset sound differs from the coda the onset is shown below.
Because Lao requires vowels to be written, there is not the ambiguity about syllable boundaries that one finds in Thai (caused by ambiguity about whether a consonant is syllable-final or a syllable in its own right).
The exception is that the rhyme -am may sometimes be represented by 0EB3.
ນຳ
See onsets for the few consonant clusters that occur at the beginning of a syllable.
Otherwise, consonant letter clusters only occur where a syllable ends with a consonant and another syllable begins.
Because the orthography is alphabetic, rather than an abugida, vowel absence after syllable-final consonants does not normally need to be marked in any way. The absence of a vowel sound is simply indicated by the absence of a vowel sign. The following example has 2 instances of a syllable-final consonant followed by an onset, one word-internal, and the other between words.
ອັກສອນລາວ
In a consonant cluster any tone marks or superscript vowels appear over the second consonant.
0EBA is used as a virama when writing Pali. It is not used in modern Lao.
0ECC was previously used to indicate silenced consonants, but is now described as obsolete.wl,#Punctuation
The Lao orthography has no special features for dealing with geminated or long consonant sounds.
This section maps Lao consonant sounds to common graphemes in the Lao orthography.
Light coloured characters occur infrequently.
mid class ປ
coda ບ
high class ຜ
low class ພ
mid class ບ
mid class ຕ
coda ດ
high class ຖ
low class ທ
mid class ຈ
mid class ດ
mid class ກ syllable initial & final
coda ກ
high class ຂ syllable-initial only
low class ຄ syllable-initial only
mid class vowel carrier ອ
high class ຝ
low class ຟ
high class ສ
low class ຊ
high class ຂ syllable-initial only
low class ຄ syllable-initial only
high class ຫ
low class ຮ
low class ມ
atomic high class digraph ໝ
high class digraph ຫມ
coda ມ
low class ນ
atomic high class digraph ໜ
high class digraph ຫນ
coda ລ
coda ຣ Used for non-native sounds in loan words.
low class ຍ
high class digraph ຫຍ
low class ງ syllable initial & final
high class digraph ຫງ
coda ງ
low class ວ
high class digraph ຫວ
low class ວ
high class digraph ຫວ
coda ວ
low class ລ
high class digraph ຫຼ
high class digraph ຫລ
high class subjoined ຼ
low class ຣ Used for non-native sounds in loan words.
et cetera logograph ໆລໆ
mid class ຢ
coda ຍ
Unicode 12 added 14 consonant letters and 1 combining mark for writing Pali.
This section offers advice about characters or character sequences to avoid, and what to use instead. It takes into account the relevance of Unicode Normalisation Form D (NFD) and Unicode Normalisation Form C (NFC).
Although usage is recommended here, content authors may well be unaware of such recommendations. Therefore, applications should look out for the non-recommended approach and treat it the same as the recommended approach wherever possible.
In complex scripts, visually similar or identical glyph patterns can often be made from a sequence of code points rather than the single code point that Unicode provides. These are not made the same by normalisation, and they are not semantically equivalent. These inappropriate sequences should be avoided because they will cause the meaning of the text to change; searches, matching and other aspects of the text will fail to be understood by the application or the font.
Only one such sequence is listed in the table below. The single code point on the left should be used, and not the sequence on the right. In some cases, fonts will indicate that there is a problem by forcing the appearance of a dotted circle or otherwise failing to render the text correctly, but this may not always be the case.
Use | Do not use |
---|---|
0EC1 | 0EC0 0EC0 |
The combination of nikhahit and sara aa is normally written with the precomposed character in the Lao block. It is possible to use 2 code points to create something that may visually look identical (and is in fact used during justification), but the single character and the sequence are not converted to each other during normalisation; therefore, the text will be read as different by normalisation-based matching algorithms.
Recommended | Not recommended |
---|---|
0EB3 | 0ECD 0EB2 |
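Since NFC and NFD leave both of the sequences in the tables above untouched, an application that wants to treat them like the recommended characters has to fold them itself. The following sketch shows one way this might be done; the function name `fold_lookalikes` is invented here, and the handling of a tone mark between the nikhahit and sara aa is an assumption about how such text might be stored. It is not suitable where the 2-code-point form is wanted deliberately, for example during justification.

```python
import re
import unicodedata

E, EI = "\u0ec0", "\u0ec1"            # ເ and ແ
NIKHAHIT, AA, AM = "\u0ecd", "\u0eb2", "\u0eb3"

# Nikhahit, an optional tone mark, then sara aa.
_SPLIT_AM = re.compile(f"{NIKHAHIT}([\u0ec8-\u0ecb]?){AA}")


def fold_lookalikes(text):
    """Map the non-recommended sequences to the recommended code points."""
    text = text.replace(E + E, EI)
    # Any tone mark is kept, and ends up before VOWEL SIGN AM.
    return _SPLIT_AM.sub(lambda m: m.group(1) + AM, text)


split_am = "\u0e99" + NIKHAHIT + AA
# Normalisation does not unify the two spellings...
print(unicodedata.normalize("NFC", split_am) == "\u0e99" + AM)
# ...but the explicit fold does.
print(fold_lookalikes(split_am) == "\u0e99" + AM)
```

Applying the fold to both the indexed text and the query string is one way for search and matching to treat the two spellings identically.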
Tone marks should be typed and stored after any combining vowel mark. Fonts will typically indicate visually that the order is incorrect because the tone mark will appear below the vowel mark if they are the wrong way around.
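A tolerant application could repair the order of a tone mark and a combining vowel mark before processing text. The sketch below swaps a tone mark that directly precedes a combining vowel mark. The set of vowel marks is taken to be the above-base and below-base marks listed earlier on this page, and the function name `tone_after_vowel` is hypothetical.

```python
import re

TONE = "[\u0ec8-\u0ecb]"
# Combining vowel marks: U+0EB1, U+0EB4..U+0EB9, U+0EBB and U+0ECD.
VOWEL_MARK = "[\u0eb1\u0eb4-\u0eb9\u0ebb\u0ecd]"

_TONE_THEN_VOWEL = re.compile(f"({TONE})({VOWEL_MARK})")


def tone_after_vowel(text):
    """Reorder 'tone mark + vowel mark' to 'vowel mark + tone mark'."""
    return _TONE_THEN_VOWEL.sub(r"\2\1", text)


wrong = "\u0e9e\u0ec8\u0eb5"   # PHO TAM + MAI EK + VOWEL SIGN II
right = "\u0e9e\u0eb5\u0ec8"   # PHO TAM + VOWEL SIGN II + MAI EK (ພີ່)
print(tone_after_vowel(wrong) == right)
```

Correctly ordered text is returned unchanged.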
Lao is visually encoded so pre-base glyphs are associated with ordinary spacing characters, and these need to be typed and stored in visual order relative to the base consonant(s) in a syllable. If the syllable begins with a consonant cluster such as pr, the pre-base code points must be typed before the p, even though they are pronounced after the r.
Lao uses Western digits.
There is, however, a set of Lao digits.
Observation: Pending further clarification about how widespread the use of Lao digits is, note that Lao Wikipedia uses Lao digits for table of contents list numbering and for footnote references. See the relevant sections below.
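The Lao digits occupy U+0ED0 to U+0ED9 and correspond one-to-one with the Western digits, so converting between the two sets is a simple character mapping. The sketch below shows one way to do this; the names `TO_LAO` and `TO_WESTERN` are invented for the example.

```python
WESTERN_DIGITS = "0123456789"
# LAO DIGIT ZERO..NINE are U+0ED0..U+0ED9.
LAO_DIGITS = "".join(chr(0x0ED0 + i) for i in range(10))

TO_LAO = str.maketrans(WESTERN_DIGITS, LAO_DIGITS)
TO_WESTERN = str.maketrans(LAO_DIGITS, WESTERN_DIGITS)

year = "1935"
lao_year = year.translate(TO_LAO)
print(lao_year)                        # the same number in Lao digits
print(lao_year.translate(TO_WESTERN))  # back to Western digits
print(int(lao_year) + 1)               # int() accepts Unicode decimal digits
```

Only digit characters are touched, so surrounding text and separators pass through unchanged.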
The CLDR standard-decimal pattern is #,##0.###. The standard-percent pattern is #,##0%.cldr
Observation: Lao Wikipedia uses a French pattern, #.##0,###, eg. ມີກຳລັງຕິດຕັ້ງ 7.207,24 ເມກາວັດ There are 7,207.24 megawatts installed.
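To make the two patterns concrete, the sketch below formats a number according to #,##0.### (groups of three integer digits, at most three fraction digits, no trailing zeros) and then swaps the separators to obtain the #.##0,### form. It is a hand-rolled illustration rather than a CLDR implementation: the function names are hypothetical, and half-even rounding is an assumption.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def format_standard(value: Decimal) -> str:
    """Format a value following the pattern #,##0.###."""
    rounded = value.quantize(Decimal("0.001"), rounding=ROUND_HALF_EVEN)
    text = f"{rounded:,f}"  # e.g. '7,207.240'
    # Drop trailing fraction zeros, and the point if no fraction remains.
    return text.rstrip("0").rstrip(".")


def swap_separators(text: str) -> str:
    """Turn #,##0.### output into the #.##0,### form, or vice versa."""
    return text.translate(str.maketrans(",.", ".,"))


amount = Decimal("7207.24")
print(format_standard(amount))
print(swap_separators(format_standard(amount)))
```

Production code would normally rely on a locale-aware library that reads the CLDR data directly, rather than hard-coding the pattern as done here.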
The CLDR standard format for currency is ¤#,##0.00;¤-#,##0.00, and the symbol for the Lao currency, Kip, is ₭.cldr
Lao text runs left to right in horizontal lines.
Pre-base vowels are visually ordered, and therefore do not need to be repositioned by the font.
Vowel signs, tones, and one consonant, on the other hand, are combining characters that need to be correctly positioned relative to the base character, and multiple marks can be combined with a single base character.
When using the vowel sign AM with a tone mark the small circle needs to push the tone mark upwards, even though the tone mark occurs before the vowel sign in memory (see fig_tone_am).
Observation: Italicised text is used for figure captions, and also for quotations.
Words are not separated by spaces; nonetheless, double-clicking or other selection methods are expected to identify word boundaries. There are 2 alternative approaches for managing this.
Unlike Thai or Khmer, it is fairly straightforward to parse individual syllables in Lao, because its alphabetic nature makes it possible to identify syllable-final consonants. Note that syllable-based segmentation must identify and keep together any syllable-initial clusters involving h or l, for example, the initial 2 letters in ຫມາ should wrap as a unit just like the ligated form, ໝາ mā.
What about kw etc?
While nearly all syllables can be argued to be words in their own right, there is still a preference for keeping multi-syllabic words together during word-based segmentation, eg. ປະເທດ. For this, an application needs to use a dictionary to parse Lao text.
However, widely used software automatically inserts 200B in Lao text at word or syllable boundaries, and many web pages use such inserted ZWSP characters to get browsers to wrap correctly.g3,#issuecomment-385847864
If a dictionary fails to keep two or more syllables together as needed, it should be possible to use the Unicode character 2060 between the two syllables. This is an invisible character, equivalent to a zero-width no-break space, and used to prevent line-breaks.
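The two invisible characters mentioned above can be handled with ordinary string operations. The sketch below assumes the text has already been divided into words and syllables by some other means (for instance a dictionary); it inserts 200B between words, 2060 between syllables that must stay together, and strips both again for searching. All function names are hypothetical.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE: allows a line break
WJ = "\u2060"    # WORD JOINER: prevents a line break


def with_break_opportunities(words):
    """Join already-segmented words, allowing a break between each pair."""
    return ZWSP.join(words)


def keep_together(syllables):
    """Join the syllables of one word so that no break occurs inside it."""
    return WJ.join(syllables)


def strip_invisible(text):
    """Remove both characters, e.g. before searching or comparing."""
    return text.replace(ZWSP, "").replace(WJ, "")


word = keep_together(["ປະ", "ເທດ"])
line = with_break_opportunities([word, "ລາວ"])
print(len(line), len(strip_invisible(line)))
```

Stripping the characters before matching matters because otherwise a search for ປະເທດ would fail to find a copy that contains an inserted 200B or 2060.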
If dictionaries are used for segmentation, they should be selected based on the language, not the script. (See the list of languages using the Lao script.)
tbd
Unicode grapheme clusters divide text into segments that contain a single base consonant plus any following combining characters: the latter include the 9 combining vowel signs, and all tone marks. Not included are free-standing vowel signs and consonants that make up other parts of a composite vowel, both pre-base and post-base. Also, syllable-initial consonant clusters with -ວ U+0EA7 LETTER WO and ຫ- U+0EAB LETTER HO SUNG are treated as 2 text units, but not ຫຼ.
This implies that a pre-base vowel sign such as ເ- U+0EC0 VOWEL SIGN E would be treated as a separate item from what follows, and in fact this can be seen in fig_drop_caps_2, where that character is the only thing highlighted in an initial letter selection. (On the other hand, initial letters followed by combining characters select the whole sequence, as seen in fig_drop_caps.)
This means that Lao typography is different from some other SE Asian scripts where pre-base vowel signs are selected with the base because they are combining characters, or syllable-initial consonant clusters form a unit because the 'medial' consonants are represented by combining characters.
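The segmentation described in this section can be approximated in a few lines. The sketch below is a rough stand-in for real grapheme cluster segmentation, not an implementation of the Unicode algorithm: it attaches every nonspacing mark (general category Mn) and ຳ U+0EB3 to the preceding character, and treats everything else as the start of a new unit. The function name `approx_clusters` is invented here.

```python
import unicodedata

AM = "\u0eb3"  # LAO VOWEL SIGN AM, kept with the preceding base


def approx_clusters(text):
    """Split text into a base character plus any following marks."""
    clusters = []
    for ch in text:
        extends = unicodedata.category(ch) == "Mn" or ch == AM
        if clusters and extends:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


print(approx_clusters("\u0ec1\u0e81\u0ea7\u0ec8\u0e87"))  # ແກວ່ງ
print(approx_clusters("\u0eab\u0ebc\u0eb2\u0e8d"))        # ຫຼາຍ
print(approx_clusters("\u0e99\u0ec9\u0eb3"))              # ນ້ຳ
```

The first example yields the pre-base vowel as a unit of its own, the second keeps ຫຼ together because the subjoined consonant is a combining mark, and the third stays whole.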
Lao uses ASCII punctuation, but also uses space as punctuation.
phrase | 0020 , ; : |
---|---|
sentence | 0020 . ? ! |
Spaces are used, but represent phrase or sentence boundaries.
Numbers are also normally surrounded by spaces.
In principle, periods are not used, though this appears to be changing.wl,#Punctuation
Observation: Lao Wikipedia uses periods at the end of sentences, and commas (see an example).l An online news site also consistently uses periods to end sentences.
Western punctuation is also used. Contemporary writing may include punctuation marks borrowed from French, such as the exclamation mark (!), and question mark (?). However, questions can be determined by question words within a sentence.wl,#Punctuation
Hyphens are also commonly found in modern writing.wl,#Punctuation
Lao commonly uses ASCII parentheses to insert parenthetical information into text.
 | start | end |
---|---|---|
standard | ( | ) |
( and ) are used for parentheses in contemporary writing.wl,#Punctuation
Lao texts use quotation marks around quotations. Of course, due to keyboard design, quotations may also be surrounded by ASCII double and single quote marks.
 | start | end |
---|---|---|
initial | “ « | ” » |
nested | ‘ | ’ |
The default quote marks for Lao are “ at the start, and ” at the end.cldr
When an additional quote is embedded within the first, the quote marks are ‘ and ’.cldr
Contemporary writing may also include « and » for quotation marks, borrowed from French.wl,#Punctuation
ຯ is used to indicate ellipsis or abbreviation, as well as missing words.wl,#Punctuation
The ellipsis, …, is also commonly found in modern writing.wl,#Punctuation
Observation: Lao Wikipedia uses periods after date-related abbreviations,l eg. in ຄ.ສ. 1935 ສະບັບຄົ້ນ ḵʰ.s. 1935 sab̯äb̯ḵʰo²ṉ CE 1935 Edition. Periods are also used in the abbreviated name of the country, eg. ສ.ປ.ປ.ລາວ s.p̯.p̯.ḻāw̱ Lao PDR.
ໆ is used in ໆລໆ kʰɯaŋ-mǎːj-lɛ-ɯːn-ɯːn (ເຄຶ່ອງໝາຍ ແລະອຶ່ນໆ), with a meaning similar to etc. For example, ການສື່ສານ,ສື່ມວນຊົນ,ສື່ໂຄສະນາ...ໆລໆ Communication, media, advertising ... etc
Some sources use ຯລຯ and others ໆລໆ – check this out.
ໆ is used to indicate repetition of a preceding sound.
CLDR includes the following punctuation.
Although Lao doesn't use spaces or dividers between words, the expectation is that line-breaks occur at word boundaries.
See word for a discussion of issues related to word-based segmentation.
As in almost all writing systems, certain punctuation characters should not appear at the end or the start of a line. The Unicode line-break properties help applications decide whether a character should appear at the start or end of a line.
The following list gives examples of typical behaviours for some of the characters used in modern Lao. Context may affect the behaviour of some of these and other characters.
Line breaking should not move a danda or double danda to the beginning of a new line even if they are preceded by a space character.
Since spaces aren't used to separate words, Lao has to use alternative strategies for justification of text.
Lao uses the so-called 'alphabetic' baseline, which is the same as for Latin and many other scripts.
Lao places vowel and tone marks above base characters, one above the other, and can also add combining characters below the line. The complexity of these marks means that the vertical resolution needed for clearly readable Lao text is higher than for English, or most Latin text. In addition, Lao also tends to add more interline spacing than Latin text does.
To give an approximate idea, fig_baselines compares Latin and Lao glyphs from Noto fonts. The basic height of Lao letters is typically slightly above the Latin x-height, however extenders and combining marks reach well beyond the Latin ascenders and descenders, creating a need for larger line spacing.
fig_baselines_other shows similar comparisons for the Lao MN and DokChampa fonts.
You can experiment with counter styles using the Counter styles converter. Patterns for using these styles in CSS can be found in Ready-made Counter Styles, and we use the names of those patterns here to refer to the various styles.
The modern Lao orthography uses a numeric style.
The lao numeric style is decimal-based and uses these digits.rmcs
Examples:
Observation: Lao Wikipedia uses periods for suffixes.l
It is possible to find the first letter in a paragraph styled so that it is larger and sits alongside several lines of the continuing paragraph text.
Observation: All combining characters, including spacing ones, are included in the selections shown in fig_drop_caps.
Any punctuation such as opening quotes and opening parentheses should also be included in the initial styling. ?
Observation: In the figures shown, the alphabetic baseline of the highlighted letter(s) matches the bottom of the row that determines the size of the highlighted letter(s). Selections without diacritics above are somewhat shorter than the height of the lines alongside, whereas selections with multiple diacritics rise slightly higher than the first line of text.
Observation: In fig_drop_caps_2, the selection picks out only ຫ from the digraph ຫລ; and ເ from the syllable ເມຶ່ອ.
Observation: Lao Wikipedia uses Lao digits for footnote references.