Cambodian (draft)
Khmer

Updated 29 December, 2021

This page brings together basic information about the Khmer script and its use for the Cambodian language. It aims to provide a brief, descriptive summary of the modern, printed orthography and typographic features, and to advise how to write Khmer using Unicode.

This page is a work in progress. The information given here should be correct, but needs to be added to and refined further.

Phonological transcriptions on this page should be treated as an approximate guide only. They are taken from the sources consulted, and may be narrow or broad, phonemic or phonetic, depending on what is available.


Sample


មាត្រា ១ មនុស្សទាំងអស់ កើតមកមានសេរីភាព និងសមភាព ក្នុងផ្នែកសេចក្ដីថ្លៃថ្នូរនិងសិទ្ធិ។ មនុស្ស មានវិចារណញ្ញាណនិងសតិសម្បជញ្ញៈជាប់ពីកំណើត ហើយគប្បីប្រព្រឹត្ដចំពោះគ្នាទៅវិញទៅមក ក្នុង ស្មារតីភាតរភាពជាបងប្អូន។

មាត្រា ២ មនុស្សម្នាក់ៗ អាចប្រើប្រាស់សិទ្ធិនិងសេរីភាពទាំងអស់ ដែលមានចែងក្នុងសេចក្ដីប្រកាសនេះ ដោយគ្មានការប្រកាន់បែងចែកបែបណាមួយ មានជាអាទិ៍ ពូជសាសន៍ ពណ៌សម្បុរ ភេទ ភាសា សាសនា មតិនយោបាយ ឬមតិផ្សេងៗទៀត ដើមកំណើតជាតិ ឬសង្គម ទ្រព្យសម្បត្ដិ កំណើត ឬស្ថានភាព ដទៃៗទៀតឡើយ។ លើសពីនេះ មិនត្រូវធ្វើការប្រកាន់បែងចែកណាមួយ ដោយសំអាងទៅលើឋានៈខាងនយោបាយ ខាងដែនសមត្ថកិច្ច ឬខាងអន្ដរជាតិរបស់ប្រទេស ឬដែនដីដែលបុគ្គលណាម្នាក់រស់នៅ ទោះបីជាប្រទេស ឬដែនដីនោះឯករាជ្យក្ដី ស្ថិតក្រោមអាណាព្យាបាលក្ដី ឬគ្មានស្វ័យគ្រប់គ្រងក្ដី ឬស្ថិតក្រោមការដាក់ កម្រិតផ្សេងទៀតណាមួយ ដល់អធិបតេយ្យភាពក្ដី។

Usage & history

The Khmer script is used for writing the official language of Cambodia, and sometimes for Cambodian minority languages, such as Tampuan, Krung, Cham, Brao and Mnong. It is currently in widespread use, although it is estimated that 35% of the Khmer-speaking population aged 15 and over are illiterate in the script. It is also used to write Pali in the Buddhist liturgy of Cambodia and Thailand.

អក្សរខ្មែរ ʔaʔsɑː kʰmaːe Khmer script

The script is thought to be descended from the Brahmi Pallava script, and the Khmer literary tradition dates back to the 7th century. The modern Khmer script differs somewhat from the earlier forms seen in the inscriptions at the ruins of Angkor. The Thai and Lao scripts are descended from an older form of the Khmer script.

Sources: Scriptsource, Wikipedia, Unicode13 p653

Basic features

The script is an abugida, ie. like most Brahmi-influenced scripts, each consonant carries with it an inherent vowel. The sound following a consonant can be modified by attaching vowel signs to the consonant when writing. See the table to the right for a brief overview of features for the modern Khmer orthography.

Khmer text runs left to right in horizontal lines.

Words are not separated by spaces, although they may be separated by ZWSP (U+200B ZERO WIDTH SPACE). Spaces are used as phrase separators.
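The following is a minimal sketch of how text might be split when ZWSP is present, treating spaces as phrase breaks and ZWSP as word breaks. The function name `split_words` is made up for illustration, and text that contains no ZWSP would still need dictionary-based segmentation, which this does not attempt.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def split_words(text: str) -> list[str]:
    """Split on spaces (phrase breaks), then on ZWSP (optional word breaks)."""
    words: list[str] = []
    for phrase in text.split(" "):
        # Drop empty pieces caused by doubled or trailing separators.
        words.extend(w for w in phrase.split(ZWSP) if w)
    return words
```

A phrase without any ZWSP is returned as a single item, which reflects the fact that word boundaries are simply not marked in that case.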

Each onset consonant is associated with a high or low class related to pronunciation (there is no tone). Khmer has more vowel sounds than ways to write them, so the choice of consonant class indicates different sounds for the same written vowel. Other factors may also affect the sound, such as stress, vowel harmony, and diacritics.

Consonant clusters are very common at the beginning of a word, but they also occur medially in multisyllabic words, and occasionally at the end of a word (though the second consonant at the end of a word is usually not pronounced). Clusters are indicated by stacked consonants. Subjoined forms are created using an invisible coeng character. Stacks do not span word boundaries.

Word-final consonant sounds (typically 8 consonants and characters) use ordinary code points without an inherent vowel. Because there are no spaces or other word dividers, it is difficult to detect boundaries algorithmically. Two word-final sounds (m and h) can be produced using combining marks.

The Khmer orthography has 2 inherent vowels (for example, using the two symbols for the sound k, ក is kɑː neck, and គ is kɔː mute), and represents vowels using 17 vowel-signs (including 3 prescripts and 5 circumgraphs, 2 of which can decompose into composite vowels), and 8 consonants or diacritics. All vowel-signs are combining marks, and are stored after the base character.

Khmer has more vowel sounds than ways to write them. Therefore, a written vowel can have different pronunciations, depending on the class of the base consonant. (There is no tone in Khmer, so classes are specially designed for vowel selection.) Additional factors include whether this is an unstressed vowel, vowel harmony, and whether any of the special diacritics have been used to change the sound. For an in-depth treatment of pronunciation see Huffman in the sources section.

There is an incomplete set of independent vowels, and standalone vowel sounds are typically written using vowel-signs applied to [U+17A2 KHMER LETTER QA].

This page lists 15 composite vowels (made from 9 vowel signs, and 8 consonants/diacritics), not counting decompositions. Composite vowels can involve up to 3 glyphs, but only surround the base consonant(s) on 2 sides, eg. កោះ koh̽.

Modern Khmer has a number of distinct writing styles, including slanted (called អក្សរច្រៀង), which has an upright variant, and round (called អក្សរឈរ). The round style includes more ligated forms. The upright style is used here. For examples, see writing_styles.

Character index

Letters


Basic consonants

ផ␣ប␣ត␣ថ␣ឋ␣ដ␣ច␣ឆ␣ក␣ខ␣អ␣ព␣ភ␣ទ␣ធ␣ឍ␣ឌ␣ជ␣ឈ␣គ␣ឃ␣វ␣ស␣ហ␣ម␣ន␣ណ␣ញ␣ង␣រ␣ឡ␣ល␣យ

Independent vowels

ឥ␣ឦ␣ឪ␣ឧ␣ឩ␣ឯ␣ឰ␣ឱ␣ឳ␣ឲ

Other

Not used for Khmer

ឣ␣ឤ
ឨ␣ឝ␣ឞ␣ៜ
ឫ␣ឬ␣ឭ␣ឮ

Combining marks


Vowel-signs

េ␣ែ␣ៃ␣ៀ␣ឿ␣ើ␣ោ␣ៅ␣ួ␣ុ␣ូ␣ិ␣ី␣ឹ␣ឺ␣ៈ␣ា␣ំ␣ះ

Other

្␣៍␣់␣័␣៉␣៊
៑␣៌␣៝␣៎␣៏

Not used for Khmer

឴␣឵

Numbers

០␣១␣២␣៣␣៤␣៥␣៦␣៧␣៨␣៩

Punctuation

“␣”␣‘␣’␣꧇␣៖␣។␣៕␣៙␣៚

ASCII

(␣)␣,␣-␣.␣?␣!

Not used for Khmer

Symbols


Other

​␣‌␣‍

Phonology

These are sounds for the Khmer language.


Phones in a lighter colour are non-native or allophones. Source Wikipedia.

Vowel sounds

Plain vowels

i ɨ ɨː u e o ə əː ɛː ɔː a ɑ ɑː

Diphthongs

ɨə ŭə ei ĕə ou ŏə əɨ ɔə ae ao

Consonant sounds

labial dental alveolar retroflex palatal velar glottal
stops p ɓ~b   t ɗ~d     c k ɡ ʔ
aspirated
     
affricates              
fricatives f   s z       h
nasals m   n   ɲ ŋ
approximants ʋ~w   l   j  
trills/flaps     r  

Structure

Morphological syllables

Many native Cambodian words are monosyllabic. There are also many bisyllabic words, in which the first syllable is typically unstressed, and the vowel is rendered in colloquial speech as a schwa. Some bisyllabic words are compounds, however, and this may not apply.

Syllables start with one or more consonants or an independent vowel; however, the latter represents a vowel sound after a glottal stop, so the syllable onset is always C(C)(C)V. In stressed syllables, the rhyme is either a short vowel that is always followed by a consonant (VC), or a long vowel, which may or may not be followed by a consonant.

Many monosyllabic words begin with consonant clusters, and some end with clusters, although only one consonant is pronounced in syllable-final position.

Word-internally, a syllable with a final consonant followed by another syllable creates a stack of consonants. The top item in the stack is the syllable final consonant, and the initial consonant of the next syllable is rendered in subjoined form.

Polysyllabic words are usually of Sanskrit, Pali or French origin. These words tend to alternate stress across their syllables, but may not.

Orthographic syllables

An orthographic syllable differs from a morphological syllable in that it always begins with a consonant, or cluster of consonants. Where word-medial stacks occur, the orthographic syllable may begin with the final consonant of the previous morphological syllable and continue through the onsets of the following syllable. Alternatively, an orthographic syllable may be just a final consonant (or consonant cluster) in a morphological syllable.

កដ្ឋមណ្ឌូ      កដ្ឋមណ្ឌូ
The same word, split into phonetic syllables (left) and orthographic syllables (right).

An orthographic syllable includes all the combining characters associated with the consonants it contains.

Components of an 'orthographic syllable' should be composed in the following order:

  1. base consonant or independent vowel
  2. rɔɓaːt
  3. museʔkətoə̯n or trəisaɓ (register shifters)
  4. subscript (consonant or independent vowel)
  5. vowel sign
  6. zero-width joiner or non-joiner
  7. any other mark

This fixed ordering makes it easier to search for and collate text. For more details, see this SIL document. [sil]
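The ordering above can be sketched as a simple rank check. This is only an illustration, not a validator for real-world text: the function names are hypothetical, the code point ranges are one reading of the Unicode Khmer block (base letters U+1780..U+17B3, vowel signs U+17B6..U+17C5, other marks U+17C6..U+17D3 and U+17DD), and the input is assumed to be a single orthographic syllable.

```python
COENG = "\u17d2"
ROBAT = "\u17cc"
SHIFTERS = {"\u17c9", "\u17ca"}   # MUUSIKATOAN, TRIISAP
JOINERS = {"\u200d", "\u200c"}    # ZWJ, ZWNJ


def is_base(ch: str) -> bool:
    """Consonant or independent vowel (assumed range U+1780..U+17B3)."""
    return 0x1780 <= ord(ch) <= 0x17B3


def component_ranks(syllable: str) -> list[int]:
    """Map each component to its position (1-7) in the list above."""
    ranks, i = [], 0
    while i < len(syllable):
        ch = syllable[i]
        if ch == COENG:
            # COENG plus the following letter forms one subscript (rank 4).
            if i + 1 >= len(syllable) or not is_base(syllable[i + 1]):
                raise ValueError("COENG must be followed by a base letter")
            ranks.append(4)
            i += 2
            continue
        if is_base(ch):
            ranks.append(1)
        elif ch == ROBAT:
            ranks.append(2)
        elif ch in SHIFTERS:
            ranks.append(3)
        elif 0x17B6 <= ord(ch) <= 0x17C5:
            ranks.append(5)
        elif ch in JOINERS:
            ranks.append(6)
        elif 0x17C6 <= ord(ch) <= 0x17D3 or ord(ch) == 0x17DD:
            ranks.append(7)
        else:
            raise ValueError(f"unexpected character U+{ord(ch):04X}")
        i += 1
    return ranks


def is_well_ordered(syllable: str) -> bool:
    """True if there is exactly one base, first, and ranks never decrease."""
    ranks = component_ranks(syllable)
    return bool(ranks) and ranks[0] == 1 and ranks.count(1) == 1 and ranks == sorted(ranks)
```

For example, the sequence U+1785 U+17D2 U+179A U+17C0 (base, subscript, vowel sign) passes the check, whereas the same characters with the vowel sign typed before the subscript do not.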

As mentioned above, although all combining characters follow the base in memory, the visual order of syllable components may not follow a linear progression from left to right. In the following example the order in which the glyphs are pronounced is far left, far right, down, left, left: កន្ត្រៃ. In the word ច្រៀង the spoken order of the separate visible parts, numbered left to right, is 3, 2, 1+4, 5. Some vowel signs span two or three sides of the base consonant or cluster.

Vowels

Dashes are used to indicate whether the character represents a vowel sound in a closed or an open syllable.

For a mapping of sounds to graphemes see vowel_mappings.

Consonant registers

Cambodian inherited a writing system that has more vowel sounds than ways to write them, but has fewer consonant sounds than consonant symbols. Khmer takes advantage of this by dividing the consonant symbols into 2 classes (or registers): an a-class and an o-class. The class of a consonant then determines the vowel sound in a syllable. For example, compare the pronunciations in the table:

A-class consonant IPA O-class consonant IPA
ក [U+1780 KHMER LETTER KA] kɑː គ [U+1782 KHMER LETTER KO] kɔː
កី [U+1780 KHMER LETTER KA + U+17B8 KHMER VOWEL SIGN II] kəj គី [U+1782 KHMER LETTER KO + U+17B8 KHMER VOWEL SIGN II] kiː
កា [U+1780 KHMER LETTER KA + U+17B6 KHMER VOWEL SIGN AA] kaː គា [U+1782 KHMER LETTER KO + U+17B6 KHMER VOWEL SIGN AA] kiə
Examples of vowel pronunciation changes, depending on the class of the base consonant.
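As a rough illustration of the table, the sketch below looks up a vowel sound from the register of the base consonant. It covers only the two consonants and two vowel-signs shown in the table; the dictionary and function names are invented for this example, and the factors mentioned elsewhere (stress, vowel harmony, diacritics) are ignored.

```python
# Register (class) of the two consonants used in the table.
REGISTER = {
    "\u1780": "a",  # KHMER LETTER KA
    "\u1782": "o",  # KHMER LETTER KO
}

# Vowel sound by register; "" stands for the inherent vowel.
VOWEL_SOUND = {
    "": {"a": "ɑː", "o": "ɔː"},
    "\u17b8": {"a": "əj", "o": "iː"},  # KHMER VOWEL SIGN II
    "\u17b6": {"a": "aː", "o": "iə"},  # KHMER VOWEL SIGN AA
}


def transcribe(syllable: str) -> str:
    """Transcribe a consonant plus optional vowel-sign from the table."""
    consonant, vowel_sign = syllable[0], syllable[1:]
    # Both consonants in the table are pronounced k.
    return "k" + VOWEL_SOUND[vowel_sign][REGISTER[consonant]]
```

The same vowel-sign therefore yields two different transcriptions, depending only on which of the two k letters carries it.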

Other factors may also affect the sound, such as stress, and vowel harmony. Diacritics can also be used to change the class of a consonant, or create sequences that represent a class for which there isn't a single character (see register_change).

These registers are not related to tone, as Khmer is not tonal.

Inherent vowels

Khmer has 2 inherent vowels: ɑː and ɔː. Both are commonly transcribed as a.

The class of the consonant will initially dictate which sound is appropriate, eg. [U+1780 KHMER LETTER KA] (an ɑː class consonant) is pronounced kɑː whereas [U+1782 KHMER LETTER KO] (an ɔː class consonant) is pronounced kɔː, but see also vowel_harmony.

The invisible characters U+17B4 KHMER VOWEL INHERENT AQ and U+17B5 KHMER VOWEL INHERENT AA were originally intended to represent a phonetic difference not expressed by the spelling, so as to assist in phonetic sorting. However, the Unicode Standard considers them insufficient for that purpose and regards them as errors in the encoding, and they should not be used. [u,677]
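Because these two characters are invisible, they can be hard to spot in existing text. The sketch below shows one possible way to locate or drop them; the helper names are hypothetical, and whether stripping is appropriate for a given data set is a judgement call that the code does not make for you.

```python
# The two invisible inherent-vowel characters that should not be used.
DISCOURAGED = {
    "\u17b4": "KHMER VOWEL INHERENT AQ",
    "\u17b5": "KHMER VOWEL INHERENT AA",
}


def find_discouraged(text: str) -> list[tuple[int, str]]:
    """Return (index, character name) for each discouraged character."""
    return [(i, DISCOURAGED[ch]) for i, ch in enumerate(text) if ch in DISCOURAGED]


def strip_discouraged(text: str) -> str:
    """Return the text with the discouraged characters removed."""
    return "".join(ch for ch in text if ch not in DISCOURAGED)
```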

Vowel harmony

In two-syllable words where the second syllable begins with one of the following sonorant consonants, the vowel class of the second syllable is the same as that of the first.

ង␣ញ␣ណ␣ន␣ម␣យ␣ឡ␣ល␣រ␣វ

For example, in the following word the second syllable starts with an o-class consonant, but the class of the preceding syllable turns the vowel to an a-class sound: ប្រយ័ត្ន. There are, however, exceptions to this rule.

Vowel-signs

Non-inherent vowel sounds that follow a consonant can be represented using vowel-signs, eg. ki is written កិ [U+1780 KHMER LETTER KA + U+17B7 KHMER VOWEL SIGN I].

An orthography that uses vowel-signs is different from one that uses simple diacritics or letters for vowels in that the vowel-signs are generally attached to the orthographic syllable, rather than just applied to the letter of the immediately preceding consonant. Where the orthographic syllable begins with a stack of consonants, the vowel-sign, including pre-base vowel-signs and circumgraphs, is rendered relative to the first (full-sized) item in the stack (see stack_vowels for examples). However, they are typed and stored in computer memory in the order of pronunciation.

Khmer vowels are written using dedicated combining characters, the nikahit or reahmuk diacritics, and some consonants that are pronounced as vowels in certain contexts. These components may be used on their own, or in combination with others (see composite_vowels).

Around half the vowel-signs are spacing marks, meaning that they consume horizontal space when added to a base consonant.

Combining marks used for vowels

Khmer uses the following dedicated combining marks for vowels. As just mentioned, they may be used on their own, or in combination with others (see composite_vowels). Where there are 2 possible pronunciations listed, the left is for a-class consonants, and the right for o-class.

ិ␣ី␣ឹ␣ឺ␣ុ␣ូ␣េ␣ោ␣ែ␣ៈ␣ា␣៏␣ ␣ៀ␣ឿ␣ួ␣ើ␣ៃ␣ៅ

All these characters are typed and stored after the base consonant they follow during pronunciation, and the glyph rendering system takes care of the positioning at display time. This also applies for stacks, and for the 3 pre-base vowel-signs and the 5 circumgraphs.

[U+17C8 KHMER SIGN YUUKALEAPINTU​] is a 20th century addition to the Khmer repertoire. It is used as a vowel after consonants that are pronounced as stressed syllables at the end of a word, or preceding an internal juncture in compounds.

[U+17CF KHMER SIGN AHSDA​] is used over just two words: ក៏ ដ៏

In the âksâr mul font style, some vowel-signs form ligatures with their base consonants. See vowel_ligatures.

Pre-base vowel-signs

េ␣ែ␣ៃ

Three combining marks are displayed to the left of the onset consonant(s), eg. គេ

Relative to the consonants, these combining marks are always typed and stored in the order in which they are pronounced, not the order in which they are displayed. The positioning of the glyph when displayed is managed by the font and rendering software.
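To see the stored order for yourself, a small sketch like the following lists the code points of a string using Python's standard `unicodedata` module. The helper names are made up for this example; `starts_in_visual_order` is only a crude heuristic that flags a word typed with a pre-base vowel-sign first.

```python
import unicodedata

# The three pre-base vowel-signs: E, AE, AI.
PREBASE = {"\u17c1", "\u17c2", "\u17c3"}


def describe(text: str) -> list[str]:
    """List each character as 'U+XXXX NAME', in stored order."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def starts_in_visual_order(word: str) -> bool:
    """True if the word begins with a pre-base vowel-sign (likely mistyped)."""
    return word[:1] in PREBASE
```

Applied to the example above, `describe` reports the consonant first and the vowel-sign second, even though the vowel-sign glyph is displayed on the left.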

In fact, these vowel-signs appear before the start of the orthographic syllable. When a syllable onset is a consonant cluster, the vowel-sign appears to the left of the initial character in the consonant stack. Compare the sequence of code points with the displayed result in the following examples: ម្ដេច អង្គ្លេស

Circumgraphs

ៀ␣ឿ␣ើ␣ោ␣ៅ

Five vowels are each produced by a single combining character with visually separate parts that appear on different (mostly opposite) sides of the consonant onset.

These are also combining marks that are always stored after the base consonant. The font places the glyphs in the right place relative to the base consonant.

គ្រឿង
A circumgraph vowel-sign (in the light colour), surrounding both the k and the r (midtone colour) after which it is pronounced in the word គ្រឿង.

See also encoding_choices.

Consonants used for vowels

Vowel-signs may sometimes be combined with other characters to represent a particular vowel sound.

Nikahit and reahmuk

ំ␣ះ

The descendants of the anusvara and the visarga, [U+17C6 KHMER SIGN NIKAHIT], called និគ្គហិត niʔkəhət, and [U+17C7 KHMER SIGN REAHMUK], called រះមុខ reə̆hmuk, are regarded as vowels in Khmer, even though they represent the sounds m and h, respectively.

Used on their own, they can change the inherent vowel, affecting both the pronunciation and the meaning, eg. ដម ដំ

Nikahit can also follow the vowel-signs for aa, and u/uː. The following example shows its effect on the vowel-sign [U+17B6 KHMER VOWEL SIGN AA] [u,643]: ពាម ពាំ

They are also used in combination with other vowel signs (see composite_vowels), and the 2 following sequences are regarded as letters in the Khmer alphabet. They are not encoded separately in Unicode, but they are named sequences.

ាំ␣ុំ

Other characters

The following produce glides when used in syllable-final position.

វ␣យ

See composite_vowels for the few instances where these are used.

[U+179A KHMER LETTER RO] is usually silent in syllable-final position, but the combination –័រ [U+17D0 KHMER SIGN SAMYOK SANNYA + U+179A KHMER LETTER RO] produces the sound ɔə, eg. ជ័រ

Vowel modifier marks

័␣់

[U+17D0 KHMER SIGN SAMYOK SANNYA] is used in some Pali and Sanskrit loan words (although alternative spellings exist) and indicates that the syllable has a particular vowel.

[U+17CB KHMER SIGN BANTOC] is always placed above the final consonant, and basically shortens the preceding vowel.

Composite vowels

Vowels represented by combinations of the above characters (not including the decomposed versions of the 2 circumgraphs mentioned earlier):

ាំ␣ុំ␣ុះ␣េះ␣ោះ␣ា់␣ាំង␣ិះ␣ឹះ␣ូវ␣ើះ␣ែះ␣័រ␣ិយ␣័យ
Combinations that contain each component character, taken in turn:
ា់␣ាំង␣ាំ
ិយ␣ិះ
ឹះ
ុះ␣ុំ
ូវ
ើះ
េះ
ែះ
ោះ
ុំ␣ាំង␣ាំ
ោះ␣ិះ␣េះ␣ុះ␣ែះ␣ឹះ␣ើះ
ា់
័យ␣័រ
ិយ␣័យ
ាំង
័រ
ូវ

The following list shows where vowel-signs are positioned around a base consonant to produce vowels, and how many instances of that pattern there are. The figure after the + sign represents combinations of vowel-sign and niʔkəhət/reə̆hmuk.

  • 3 prescript, eg. កេ ke
  • 2+1 postscript, eg. កា
  • 4+1 superscript, eg. កិ ki
  • 3 subscript, eg. កុ ku
  • 4+1 pre+postscript, eg. កោ ko
  • 1+1 super+postscript, eg. កាំ kām̽
  • 0+1 super+subscript, ie. កុំ kum̽
  • 0+1 sub+postscript, ie. កុះ kuh̽
  • 0+1 pre+post+postscript, ie. កោះ koh̽

At maximum, vowel components can occur concurrently on 2 sides of the base.

Characters that don't appear in the combinations:

ី␣ឺ␣ួ␣ឿ␣ៀ␣ៃ␣ៈ␣ៅ

Standalone vowels

Most of the time, vowels that appear to be standalone are actually pronounced and written after a glottal stop. These use the regular consonant [U+17A2 KHMER LETTER QA] followed by a vowel-sign (and អ៊ [U+17A2 KHMER LETTER QA + U+17CA KHMER SIGN TRIISAP] to change the register), eg. អូ អ៊ូ ចង្អូរ
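The pattern just described can be sketched as follows: a standalone vowel is built from the glottal consonant, an optional register shifter, and the vowel-sign, in that order. The function name and the `o_register` parameter are invented for illustration.

```python
QA = "\u17a2"       # KHMER LETTER QA, the glottal stop carrier
TRIISAP = "\u17ca"  # KHMER SIGN TRIISAP, shifts the register


def standalone_vowel(vowel_sign: str, o_register: bool = False) -> str:
    """Build QA (+ TRIISAP) + vowel-sign for a standalone vowel."""
    return QA + (TRIISAP if o_register else "") + vowel_sign
```

With U+17BC KHMER VOWEL SIGN UU as the argument, the two results correspond to the first two examples above.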

Independent vowels

Khmer also has independent vowel letters that can be used to represent some of these sounds, but unlike most South Asian scripts, there are fewer independent vowels than vowel sounds, and some do not have direct correspondences with a simple vowel sign, eg. [U+17AA KHMER INDEPENDENT VOWEL QUUV] corresponds phonetically to the composite vowel អូវ [U+17A2 KHMER LETTER QA + U+17BC KHMER VOWEL SIGN UU + U+179C KHMER LETTER VO] (See also vocalics.)

ឥ␣ឦ␣ឪ␣ឧ␣ឩ␣ឯ␣ឰ␣ឱ␣ឳ␣ឲ

Whether a vowel sound is represented using an independent vowel letter or the glottal consonant plus vowel-sign varies from word to word. In Cambodian orthography the two are not interchangeable. The independent vowel letters appear in relatively few words, but some of those words are quite common, eg. ឪពុក ឲ្យ

Other characters

The Unicode Khmer block contains 3 more independent vowels that are either obsolete or strongly deprecated.

The Unicode Standard regards the following 2 characters as errors in the encoding.

ឣ␣ឤ

Silent vowels & consonants

Inherent vowels are not pronounced after syllable final consonants.

Khmer also has some diacritics that silence vowels or parts of the text.

្␣៍
៑␣៌␣៝

Vowels are not pronounced between stacked consonants. The first character in the lists above, [U+17D2 KHMER SIGN COENG], is used to create those consonant stacks. It is never visible. See clusters.

Two diacritics, [U+17CC KHMER SIGN ROBAT] and [U+17CD KHMER SIGN TOANDAKHIAT], are used to silence written characters. The former is not very common and silences final consonants, eg. បរិបូណ៌ (although it also introduces or affects sound in some cases in multisyllabic words). The latter is used over a consonant, particularly in loan words, to silence it and any attached vowels or subscripts, eg. សប្ដាហ៍ រេហ៍ពល កេរ្តិ៍

[U+17D1 KHMER SIGN VIRIAM], the Sanskrit virama, is sometimes used in Sanskrit words to indicate that a final consonant has no vowel sound, eg. អាត្មន៑

[U+17DD KHMER SIGN ATTHACAN], on the other hand, is a rarely used sign that indicates that a final consonant retains its inherent vowel sound.

Vowel sounds mapped to characters

This section maps Khmer vowel sounds to common graphemes, grouped by register (class 1 or 2). The dotted circle indicates the location of the consonant relative to the vowel-sign; if there are 2 circles, the vowel is used only in closed syllables.

Plain vowels

i
2

[U+17B7 KHMER VOWEL SIGN I]. Followed by glottal stop in stressed syllables, eg. លទ្ធិ, but not in unstressed, eg. និទាន.

2

[U+17B8 KHMER VOWEL SIGN II], eg. ពីរ.

◌ិយ [U+17B7 KHMER VOWEL SIGN I + U+1799 KHMER LETTER YO], eg. ឥន្ត្រិយ.

ɨ
2

ិ◌ [U+17B7 KHMER VOWEL SIGN I], except before [U+1799 KHMER LETTER YO], eg. ជិត.

ឹ◌ [U+17B9 KHMER VOWEL SIGN Y], eg. ទឹក.

[U+17C1 KHMER VOWEL SIGN E], before palatals, eg. ពេញ.

 
i

[U+17A5 KHMER INDEPENDENT VOWEL QI], eg. ឥន្ទុ

ɨː
2

[U+17BA KHMER VOWEL SIGN YY], eg. គឺ.

u
2

[U+17BB KHMER VOWEL SIGN U]. Followed by glottal stop in stressed syllables, eg. វិទ្យុ, but not in unstressed, eg. គុលិកា.

ុ◌ [U+17BB KHMER VOWEL SIGN U], eg. គុណ.

[U+17CB KHMER SIGN BANTOC] after an inherent, series 2 vowel and before a labial final, eg. ទប់.

 
i

[U+17A7 KHMER INDEPENDENT VOWEL QU], eg. ឧបម៉ា.

2

[U+17BC KHMER VOWEL SIGN UU], eg. គូ.

 
i
e
1

[U+17B7 KHMER VOWEL SIGN I]. Followed by glottal stop in stressed syllables, eg. មតិ, but not in unstressed, eg. កិរិយា.

2

[U+17C1 KHMER VOWEL SIGN E], eg. គេ.

o
1

[U+17BB KHMER VOWEL SIGN U]. Followed by glottal stop in stressed syllables, eg. វត្តុ, but not in unstressed, eg. កុមារ.

ុ◌ [U+17BB KHMER VOWEL SIGN U], eg. កុន.

 
i

[U+17A7 KHMER INDEPENDENT VOWEL QU], eg. ឧកញ៉ា.

2

[U+17C4 KHMER VOWEL SIGN OO], eg. គោ.

Inherent vowel in syllables that end with [U+0E23 THAI CHARACTER RO RUA], eg. พร p̱ʰṟ poːn blessing.

ə
1

ិ◌ [U+17B7 KHMER VOWEL SIGN I], eg. ចិត្ត.

ឹ◌ [U+17B9 KHMER VOWEL SIGN Y], eg. ដឹក.

[U+17C1 KHMER VOWEL SIGN E], before palatals, eg. ម៉េច.

 
i
əː
2

[U+17BE KHMER VOWEL SIGN OE], eg. ឈើ.

ɛː
2

[U+17C2 KHMER VOWEL SIGN AE], eg. គែ.

ɔː
2

Inherent vowel for 2nd series consonants, eg. .

a
1

[U+17D0 KHMER SIGN SAMYOK SANNYA], eg. ស័ក

ា◌់ [U+17B6 KHMER VOWEL SIGN AA + U+17CB KHMER SIGN BANTOC], eg. កាត់.

1

Inherent vowel for 1st series consonants, eg. .

[U+17B6 KHMER VOWEL SIGN AA], eg. តា.

ɑ
1

[U+17CB KHMER SIGN BANTOC] after an inherent, series 1 vowel, eg. កត់.

ɑː
1

Inherent vowel for 1st series consonants, eg. .

Diphthongs and other combinations

2
 
1/2

[U+17C0 KHMER VOWEL SIGN IE], eg. តៀប, ទៀប.

ɨə
1/2

[U+17BF KHMER VOWEL SIGN YA], eg. តឿ, ជឿ.

ɨj
2

័យ [U+17D0 KHMER SIGN SAMYOK SANNYA + U+1799 KHMER LETTER YO], eg. ជ័យ.

[U+17C3 KHMER VOWEL SIGN AI], eg. ព្រៃ.

ɨw
2

ូវ [U+17BC KHMER VOWEL SIGN UU + U+179C KHMER LETTER VO], eg. នូវ.

[U+17C5 KHMER VOWEL SIGN AU], eg. ទៅ.

1/2

[U+17BD KHMER VOWEL SIGN UA], eg. កួរ, គួរ.

uə̆
2

[U+17CB KHMER SIGN BANTOC] after an inherent, series 2 vowel and before a non-labial final, eg. យល់.

uə̆h
2
uh
2
um
2

ុំ [U+17BB KHMER VOWEL SIGN U + U+17C6 KHMER SIGN NIKAHIT], eg. ទុំ.

[U+17C6 KHMER SIGN NIKAHIT], eg. ទំ.

eə̆
2

[U+17D0 KHMER SIGN SAMYOK SANNYA], before velar finals, eg. ល័ខ

ា◌់ [U+17B6 KHMER VOWEL SIGN AA + U+17CB KHMER SIGN BANTOC] before velar finals, eg. ទាក់.

ei
1

[U+17C1 KHMER VOWEL SIGN E], eg. កេរ្តិ៍.

eh
1
eə̆h
2

[U+17C7 KHMER SIGN REAHMUK], eg. ទះ

ou
1

[U+17BC KHMER VOWEL SIGN UU], eg. កូរ.

 
i

[U+17A9 KHMER INDEPENDENT VOWEL QUU], eg. ឩដ្ឋ

oə̆
2

[U+17D0 KHMER SIGN SAMYOK SANNYA], before non-velar finals, eg. ទ័ព

ា◌់ [U+17B6 KHMER VOWEL SIGN AA + U+17CB KHMER SIGN BANTOC] before non-velar finals, eg. គាត់.

oə̆m
2
oh
1
om
1
əɨ
1

[U+17BA KHMER VOWEL SIGN YY], eg. ដឺ.

əj
1

[U+17B8 KHMER VOWEL SIGN II], eg. បី.

 
2

ិយ [U+17B7 KHMER VOWEL SIGN I + U+1799 KHMER LETTER YO], eg. ចេតិយ.

 
i

[U+17A5 KHMER INDEPENDENT VOWEL QI], eg. ឥឡូវ.

[U+17A6 KHMER INDEPENDENT VOWEL QII], eg. ឦសាន

əw
1

ូវ [U+17BC KHMER VOWEL SIGN UU + U+179C KHMER LETTER VO], eg. ត្រូវ.

 
i

[U+17AA KHMER INDEPENDENT VOWEL QUUV], eg. ឪពុក.

əh
1

ឹះ [U+17B9 KHMER VOWEL SIGN Y + U+17C7 KHMER SIGN REAHMUK], eg. ឆ្កឹះ.

ើះ [U+17BE KHMER VOWEL SIGN OE + U+17C7 KHMER SIGN REAHMUK], eg. ចង្កើះ (normally spelled ចង្កឹះ cŋ͓kɨh̽).

ɔə
2
ae
1

[U+17C2 KHMER VOWEL SIGN AE], eg. កែ.

 
i
1

[U+17BE KHMER VOWEL SIGN OE], eg. បើ.

ao
1

[U+17C4 KHMER VOWEL SIGN OO], eg. កោរ.

 
i
aj
1

័យ [U+17D0 KHMER SIGN SAMYOK SANNYA + U+1799 KHMER LETTER YO], eg. សម័យ.

[U+17C3 KHMER VOWEL SIGN AI], eg. ប្រៃ.

 
i

[U+17B0 KHMER INDEPENDENT VOWEL QAI], eg. ឰរាវ័ណ.

aw
1

[U+17C5 KHMER VOWEL SIGN AU], eg. តៅ.

 
i

[U+17B3 KHMER INDEPENDENT VOWEL QAU], eg. ឳទក.

ah
1

ោះ [U+17C4 KHMER VOWEL SIGN OO + U+17C7 KHMER SIGN REAHMUK], eg. កោះ.

[U+17C7 KHMER SIGN REAHMUK], eg. តះ

am
1
ɑm
1

[U+17C6 KHMER SIGN NIKAHIT], eg. ចំ.

Vocalics

Khmer represents vocalics only as independent vowel letters.

ឫ␣ឬ␣ឭ␣ឮ

Consonants

For a mapping of sounds to graphemes see consonant_mappings.

Changing consonant registers

Khmer is not tonal, but each consonant character belongs to one of two classes. The class of a consonant determines the vowel sound in a syllable. For example, compare ក kɑː គ kɔː and កី kəj គី kiː

៉␣៊

Two diacritics, [U+17C9 KHMER SIGN MUUSIKATOAN​] and [U+17CA KHMER SIGN TRIISAP​], are used to change the class of a consonant. These are particularly useful when a given sound has only one character associated with it, such as the letters , and etc.

Each of these diacritics should be typed and stored immediately after the base character (unless there is a ZWNJ, as described in consonant_shift_posn). [u,647]
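A check for this placement rule might look like the sketch below. It assumes that the only acceptable predecessors of a register shifter are a consonant (taken here as U+1780..U+17A2) or ZWNJ, which is a simplification of the rule as stated; the function name is hypothetical.

```python
SHIFTERS = {"\u17c9", "\u17ca"}  # MUUSIKATOAN, TRIISAP
ZWNJ = "\u200c"


def is_consonant(ch: str) -> bool:
    """Khmer consonant letter (assumed range U+1780..U+17A2)."""
    return 0x1780 <= ord(ch) <= 0x17A2


def misplaced_shifters(text: str) -> list[int]:
    """Return indexes of shifters not directly after a consonant or ZWNJ."""
    bad = []
    for i, ch in enumerate(text):
        if ch in SHIFTERS:
            prev = text[i - 1] if i > 0 else ""
            if not (prev == ZWNJ or (prev and is_consonant(prev))):
                bad.append(i)
    return bad
```

A shifter typed after the vowel-sign, rather than before it, is reported by its index in the string.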

Basic consonants

This shows consonants in use in modern Khmer, although some are not widely used.

The 2 registers are shown separately for the plosives, and the remainder are mixed. A superscript ɑ or ɔ indicates which register the consonant belongs to. Where pronunciation differs, forms such as p- indicate the sound at the start of a consonant cluster, and -p at the end of a syllable.

Stops

ɑː class
ផ␣ប␣ត␣ថ␣ឋ␣ដ␣ច␣ឆ␣ក␣ខ␣អ
ɔː class
ព␣ភ␣ទ␣ធ␣ឍ␣ឌ␣ជ␣ឈ␣គ␣ឃ

Fricatives

វ␣ស␣ហ

Nasals

ម␣ន␣ណ␣ញ␣ង

Other sonorants

រ␣ឡ␣ល␣យ

The following 3 consonants are obsolete, and used only for Pali/Sanskrit transliteration.

ឝ␣ឞ␣ៜ

Final consonants

Not all Khmer consonants can appear in syllable-final position. The most common syllable-final consonants include the following:

ប␣ត␣ក␣ម␣ន␣ញ␣ង␣ល

The pronunciation of the consonant in final position may differ from its normal pronunciation, but it is not followed by a vowel sound.

Because ordinary letters are used in word-final position, it is difficult to parse Khmer. For example, the sequence កក could equally represent two syllables kɑːkɑː with inherent vowels, or one syllable with a final -k sound kɑːʔ.
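The ambiguity can be made concrete with a toy enumeration. The sketch below assumes a string of bare consonants in which each syllable is either one consonant with its inherent vowel, or a consonant plus a final consonant; it ignores vowel-signs, stacks and everything else, and the function name is invented.

```python
def segmentations(consonants: str) -> list[list[str]]:
    """List every way to split bare consonants into C or CC syllables."""
    if not consonants:
        return [[]]
    results = []
    # Option 1: the first consonant is an open syllable (inherent vowel).
    for rest in segmentations(consonants[1:]):
        results.append([consonants[0]] + rest)
    # Option 2: the first two consonants form onset + final consonant.
    if len(consonants) >= 2:
        for rest in segmentations(consonants[2:]):
            results.append([consonants[:2]] + rest)
    return results
```

Two consonants give 2 readings, three give 3, and four give 5, so the number of candidate parses grows quickly with the length of the sequence.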

Two final consonant sounds m and h can also be produced using combining characters. See anusvara_visarga for details.

Consonant clusters

In Khmer, consonant clusters are very common at the beginning of a word, but clusters also occur medially in multisyllabic words, and occasionally at the end of a word.

The absence of a vowel sound between two or more consonants is visually indicated by stacked consonants (only), where the non-initial consonant appears below the initial, typically with a different shape from normal.

In Unicode, the stacking behaviour is achieved by adding [U+17D2 KHMER SIGN COENG] between the consonants. This character has no visual representation.
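In code, building a stack amounts to joining the consonants with the coeng character. The following sketch shows this, along with a helper that recovers the subjoined letters from a string; both function names are hypothetical.

```python
COENG = "\u17d2"  # KHMER SIGN COENG


def stack(*letters: str) -> str:
    """Join letters with COENG so that all but the first are subjoined."""
    return COENG.join(letters)


def subscripts(text: str) -> list[str]:
    """Return the letters that directly follow a COENG in the text."""
    return [text[i + 1] for i, ch in enumerate(text[:-1]) if ch == COENG]
```

For instance, `stack("\u1781", "\u1798")` yields the three code points U+1781 U+17D2 U+1798, which a font renders as a single stacked cluster.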

Word boundaries. Clusters do not span word boundaries. Consonant clusters formed by the end of one word and the beginning of the next do not lead to stacking in Khmer.

Stacking

Subscript consonant forms are called ជើងអក្សរ (or 'coeng', pronounced cəːŋ).

Cambodians see these subscripts as distinct letter forms, but, unlike Tibetan, they are produced in Unicode by inserting [U+17D2 KHMER SIGN COENG] before the consonant that will become a subscript. This character, which has no visual form in Cambodian, is called the coeng in Unicode, although it should rightly be called the coeng generator.

All the shapes are simplified and reduced in size compared to the non-subscript form. Many have significantly different shapes.

This list shows consonants in their normal and subjoined forms:

Class ɑː consonants
ផ្ផ␣ប្ប␣ត្ត␣ថ្ថ␣ឋ្ឋ␣ដ្ដ␣ក្ក␣ខ្ខ␣អ្អ␣ច្ច␣ឆ្ឆ␣ស្ស␣ហ្ហ␣ឡ្ឡ␣ណ្ណ
Class ɔː consonants
ព្ព␣ភ្ភ␣ទ្ទ␣ធ្ធ␣ឍ្ឍ␣ឌ្ឌ␣គ្គ␣ឃ្ឃ␣ជ្ជ␣ឈ្ឈ␣វ្វ␣ម្ម␣ន្ន␣ញ្ញ␣ង្ង␣រ្រ␣ល្ល␣យ្យ

[U+179A KHMER LETTER RO] produces a subjoined form that wraps to the left and under the preceding consonant. Several others wrap below and to the right of the consonant. [U+17A1 KHMER LETTER LA] doesn't normally appear in subscript form.

Where the two consonants involved in the cluster are in different classes or registers, the pronunciation of any following vowel is normally determined by the register of the subscript consonant. For the following exceptions, however, the vowel pronunciation is determined by the register of the first consonant:

ង␣ញ␣ន␣ម␣យ␣រ␣ល␣វ
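The rule and its exceptions can be sketched as a small lookup. The register sets below cover only the stops listed under Basic consonants, so the function returns `None` for anything else; the names are invented for illustration, and real words may have exceptions that this does not model.

```python
# Registers of the stops, as listed under Basic consonants.
A_CLASS_STOPS = set("ផបតថឋដចឆកខអ")
O_CLASS_STOPS = set("ពភទធឍឌជឈគឃ")

# Subscripts for which the first consonant decides the register.
FIRST_GOVERNS = set("ងញនមយរលវ")


def governing_consonant(first: str, subscript: str) -> str:
    """Return the consonant whose register governs the following vowel."""
    return first if subscript in FIRST_GOVERNS else subscript


def cluster_register(first: str, subscript: str):
    """Return 'a' or 'o' for the cluster, or None if not covered here."""
    letter = governing_consonant(first, subscript)
    if letter in A_CLASS_STOPS:
        return "a"
    if letter in O_CLASS_STOPS:
        return "o"
    return None
```

So an a-class stop with an o-class stop beneath it takes the o register, whereas the same a-class stop with one of the listed sonorants beneath it keeps the a register.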

Some subscripts change the sound of the preceding consonant.

Subscript consonants that appear at the end of a word are silent, eg. ពេទ្យ រដ្ឋ

In some multisyllabic words a medial cluster may contain a final consonant for the first syllable and the initial consonant of the next syllable, eg. កម្មករ

Some clusters involve two subscripts. These are, with three exceptions, composed of a final nasal, followed by a stop and r, eg. កន្ត្រៃ កញ្ជ្រេង

The three exceptions are the loan words, អង្គ្លេស សងស្ក្រិត សាស្ត្រាចារ្យ

Combining characters & stacks

Most vowels that are pronounced after a consonant stack are displayed as if they were attached to the consonant at the top of the stack (including pre-base vowel-signs and circumgraphs), eg. ក្លិន ឆ្វេង ភ្លៀង

This can also lead to ligation between the top consonant in the stack and a vowel-sign, unless the subjoined consonant is right-spacing, eg. ក្ដារ ក្រាក់ ក្បាល

The exceptions are the vowel-signs that appear below the consonant; these appear below the last subjoined form, unless the subjoined form is left-spacing, eg. ខ្ពុរ ត្រុំ

Other diacritics are treated in a similar fashion. This can lead to ambiguity if a diacritic could modify either consonant in a stack.

Subscript consonants after vowels

It is rare but possible to find subscripts used after independent vowels. One common word spelled this way is ឲ្យ

It is also possible to find subscript forms of independent vowels. Four of these are named sequences in Unicode.

Consonant sounds to characters

This section maps Khmer consonant sounds to common graphemes, grouped by register (class 1 or 2).

Initials

p
1
 
2

[U+1796 KHMER LETTER PO], eg. ពី.

b
1

[U+1794 KHMER LETTER BA], eg. ប្រុត.

 
2

ប៊ [U+1794 KHMER LETTER BA + U+17CA KHMER SIGN TRIISAP], eg. ប៊ុត.

1

[U+1795 KHMER LETTER PHA], eg. ផាយ.

 
2

[U+1797 KHMER LETTER PHO], eg. ភាយ.

t
1

[U+178F KHMER LETTER TA], eg. តា.

 
2

[U+1791 KHMER LETTER TO], eg. ទា.

d
1

[U+178A KHMER LETTER DA], eg. ដុន.

 
2

[U+178C KHMER LETTER DO], eg. ឌុន.

1

[U+1790 KHMER LETTER THA], eg. ថូ.

[U+178B KHMER LETTER TTHA], eg. ឋាន.

 
2

[U+1792 KHMER LETTER THO], eg. ធូរ.

[U+178D KHMER LETTER TTHO], eg. ឍាល.

k
1

[U+1780 KHMER LETTER KA], eg. .

 
2

[U+1782 KHMER LETTER KO], eg. .

1

[U+1781 KHMER LETTER KHA], eg. ខត់.

 
2

[U+1783 KHMER LETTER KHO], eg. ឃត់.

ʔ
1

[U+17A2 KHMER LETTER QA], eg. អី.

 
2
c
1

[U+1785 KHMER LETTER CA], eg. ចា.

 
2

[U+1787 KHMER LETTER CO], eg. ជា.

1

[U+1786 KHMER LETTER CHA], eg. ឆោង.

 
2

[U+1788 KHMER LETTER CHO], eg. ឈោង.

f
1
 
2
s
1

[U+179F KHMER LETTER SA], eg. សី.

 
2
h
1

[U+17A0 KHMER LETTER HA], eg. ហាង

 
2

ហ៊ [U+17A0 KHMER LETTER HA + U+17CA KHMER SIGN TRIISAP], eg. ហ៊ាន.

 
2

[U+1798 KHMER LETTER MO], eg. មែ.

n
1

[U+178E KHMER LETTER NNO], eg. ណាយ.

ហ្ន [U+17A0 KHMER LETTER HA + U+17D2 KHMER SIGN COENG + U+1793 KHMER LETTER NO], eg. ហ្នឹង

 
2

[U+1793 KHMER LETTER NO], eg. នាយ.

ɲ
1
 
2

[U+1789 KHMER LETTER NYO], eg. ញាំ.

ŋ
1
 
2

[U+1784 KHMER LETTER NGO], eg. ងាវ.

 
2

[U+179C KHMER LETTER VO], eg. វៃ.

r
1
 
2

[U+179A KHMER LETTER RO].

l
1

[U+17A1 KHMER LETTER LA], eg. ឡេង.

ហ្ល [U+17A0 KHMER LETTER HA + U+17D2 KHMER SIGN COENG + U+179B KHMER LETTER LO], eg. ហ្លួង

 
2

[U+179B KHMER LETTER LO], eg. លាង.

j
1
 
2

[U+1799 KHMER LETTER YO], eg. យាង.

Vocalics

 

[U+17AB KHMER INDEPENDENT VOWEL RY], eg. ឬស្សី.

rɨː
 
 

[U+17AD KHMER INDEPENDENT VOWEL LY], eg. រំឭក.

lɨː
 

Finals

p
f

[U+1794 KHMER LETTER BA], eg. រៀប.

[U+1796 KHMER LETTER PO], eg. ភាព.

[U+1797 KHMER LETTER PHO], eg. លោភ.

t
f

[U+178F KHMER LETTER TA], eg. កាត់.

[U+1791 KHMER LETTER TO], eg. បាទ.

[U+178A KHMER LETTER DA], eg. ប្រាកដ.

[U+178B KHMER LETTER TTHA], eg. បាឋ.

ដ្ឋ [U+178A KHMER LETTER DA + U+17D2 KHMER SIGN COENG + U+178B KHMER LETTER TTHA], eg. ឥដ្ឋ.

[U+178C KHMER LETTER DO], eg. គ្រុឌ.

[U+1790 KHMER LETTER THA], eg. ប្រមាថ.

  [U+1792 KHMER LETTER THO], eg. អាវុធ.

[U+178D KHMER LETTER TTHO], eg. អាសាឍ.

c
f

[U+1785 KHMER LETTER CA], eg. តូច.

[U+1787 KHMER LETTER CO], eg. រាជ.

k
f

[U+1780 KHMER LETTER KA], eg. ជីក.

[U+1782 KHMER LETTER KO], eg. រោគ.

[U+1781 KHMER LETTER KHA], eg. មុខ.

[U+1783 KHMER LETTER KHO], eg. មេឃ.

ʔ
f

All of the following occur after one of the following vowels: a, aː, ɑ, ɑː, eə̆, uə̆, iə, ɨə, uə.

[U+1780 KHMER LETTER KA], eg. នាគ.

[U+1782 KHMER LETTER KO], eg. នាគ.

[U+1781 KHMER LETTER KHA], eg. ពិសាខ.

[U+1783 KHMER LETTER KHO], eg. មាឃ.

s
f

[U+179F KHMER LETTER SA], only in very formal reading style, eg. សូមទោស.

h
f

[U+179F KHMER LETTER SA], eg. ចាស់.

[U+17C7 KHMER SIGN REAHMUK], eg. ទះ. This represents -ah after a 1st class consonant, and -eə̆h after a 2nd class consonant.

m
f

[U+1798 KHMER LETTER MO], eg. តាម.

ុំ [U+17BB KHMER VOWEL SIGN U + U+17C6 KHMER SIGN NIKAHIT], eg. ដុំ.

[U+17C6 KHMER SIGN NIKAHIT], eg. ចំ.

ាំ [U+17B6 KHMER VOWEL SIGN AA + U+17C6 KHMER SIGN NIKAHIT], eg. ចាំ

n
f

[U+1793 KHMER LETTER NO], eg. មាន.

[U+178E KHMER LETTER NNO], eg. បូរាណ.

ɲ
f

[U+1789 KHMER LETTER NYO], eg. ទាញ.

ŋ
f

[U+1784 KHMER LETTER NGO], eg. ដឹង.

[U+1789 KHMER LETTER NYO], after , eg. រីញ.

w
f

[U+179C KHMER LETTER VO], eg. អាវ.

[U+17C5 KHMER VOWEL SIGN AU], eg. ហៅ.

l
f

[U+179B KHMER LETTER LO], eg. កាល.

j
f

[U+1799 KHMER LETTER YO], eg. បាយ.

[U+17C3 KHMER VOWEL SIGN AI], eg. ព្រៃ.

Encoding choices

In some fonts, two circumgraphs look the same whether they are written as a single character, or as two.

Recommended Not recommended
[U+17BE KHMER VOWEL SIGN OE] េី [U+17C1 KHMER VOWEL SIGN E + U+17B8 KHMER VOWEL SIGN II]
[U+17C4 KHMER VOWEL SIGN OO] េា [U+17C1 KHMER VOWEL SIGN E + U+17B6 KHMER VOWEL SIGN AA]

For Khmer, single and multiple code point realisations do not normalise to be the same in NFC or NFD, so you are creating different content by using one approach or the other. This may affect various operations on the text, and it is therefore better to stick with one representation. The Unicode Standard surprisingly makes no comment on this, although it does for other scripts, where it encourages use of the precomposed, single code point.

Also, some fonts may not display the decomposed sequences correctly.
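
The point about normalisation can be checked directly. The sketch below uses Python's standard unicodedata module to compare the two representations, and adds a small clean-up helper that maps the not-recommended sequences to the recommended single code points. The names RECOMMENDED, same_after_normalization and use_recommended are illustrative assumptions, not part of any standard API.

```python
import unicodedata

# Not-recommended two-character sequences mapped to the recommended
# single code point, following the table above.
RECOMMENDED = {
    "\u17C1\u17B8": "\u17BE",  # VOWEL SIGN E + VOWEL SIGN II -> VOWEL SIGN OE
    "\u17C1\u17B6": "\u17C4",  # VOWEL SIGN E + VOWEL SIGN AA -> VOWEL SIGN OO
}

def same_after_normalization(a, b, form="NFC"):
    """True if the two strings become identical under the given form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

def use_recommended(text):
    """Replace each not-recommended sequence with its single code point."""
    for sequence, single in RECOMMENDED.items():
        text = text.replace(sequence, single)
    return text

# Neither NFC nor NFD unifies the two spellings.
print(same_after_normalization("\u17C1\u17B8", "\u17BE", "NFC"))
print(same_after_normalization("\u17C1\u17B8", "\u17BE", "NFD"))
```

Running a helper like use_recommended before searching or comparing text is one way to stick with a single representation.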

Other letters

In addition to those mentioned above, the Unicode Khmer block has one other character with a general category of letter.

[U+17D7 KHMER SIGN LEK TOO] is used to indicate repetition (see repetition).

Numbers, dates, currency, etc.

Khmer has its own set of decimal digits, although western digits are also used sometimes.

០␣១␣២␣៣␣៤␣៥␣៦␣៧␣៨␣៩

The thousands separator is . [U+002E FULL STOP], and decimal separator is , [U+002C COMMA].

Ranges and dates use the ASCII hyphen. [ws, #Spacing_and_punctuation]

Currency

The symbol [U+17DB KHMER CURRENCY SYMBOL RIEL] (សញ្ញារៀល sɲ͓ɲāṟiᵊḻ saɲ ɲaː riəl) is placed after the amount, eg. ៣.០០០ ៛ ɓej poan riəl 3,000 riel. Sometimes [U+179A KHMER LETTER RO] is used instead.
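
As a rough illustration of the conventions described in this section, the following sketch formats a number with Khmer digits, a full stop as thousands separator and a comma as decimal separator, and appends the riel symbol after a space. The function names format_khmer_number and format_riel are hypothetical, and the approach (formatting in ASCII first, then swapping characters) is just one simple option.

```python
# Translation table from ASCII digits to Khmer digits (U+17E0..U+17E9).
KHMER_DIGITS = str.maketrans("0123456789", "០១២៣៤៥៦៧៨៩")

# Swap the ASCII grouping and decimal separators in a single pass.
SWAP_SEPARATORS = str.maketrans({",": ".", ".": ","})

def format_khmer_number(value, decimals=0):
    """Format value with '.' for thousands, ',' for decimals, Khmer digits."""
    western = f"{value:,.{decimals}f}"       # eg. '3,000' or '1,234.50'
    return western.translate(SWAP_SEPARATORS).translate(KHMER_DIGITS)

def format_riel(amount):
    """Place the riel symbol after the amount, separated by a space."""
    return f"{format_khmer_number(amount)} \u17DB"

print(format_riel(3000))
```

Going the other way, Python's built-in int() already accepts Khmer digits once any separators have been removed.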

Other

The Unicode Khmer block contains a set of numeric symbols for divination lore.

៰␣៱␣៲␣៳␣៴␣៵␣៶␣៷␣៸␣៹

Use of [U+17D3 KHMER SIGN BATHAMASAT] is discouraged in favor of the complete set of lunar date symbols.

The Khmer Symbols block is entirely composed of Lunar date symbols.

᧠␣᧡␣᧢␣᧣␣᧤␣᧥␣᧦␣᧧␣᧨␣᧩␣᧪␣᧫␣᧬␣᧭␣᧮␣᧯␣᧰␣᧱␣᧲␣᧳␣᧴␣᧵␣᧶␣᧷␣᧸␣᧹␣᧺␣᧻␣᧼␣᧽␣᧾␣᧿

Text direction

Khmer text runs left to right in horizontal lines.
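
The default Bidi_Class of individual characters can be looked up with Python's standard unicodedata module. This is a small sketch; the helper name bidi_classes is an illustrative assumption.

```python
import unicodedata

def bidi_classes(text):
    """Map each code point in text to its default Bidi_Class value."""
    return {f"U+{ord(ch):04X}": unicodedata.bidirectional(ch) for ch in text}

# KA, COENG, LO, VOWEL SIGN I, NO: letters are strong left-to-right,
# while the coeng and the vowel-sign are non-spacing marks.
print(bidi_classes("\u1780\u17D2\u179B\u17B7\u1793"))
```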

Glyph shaping & positioning

This section brings together information about the following topics: writing styles; cursive text; context-based shaping; context-based positioning; baselines, line height, etc.; font styles; case & other character transforms.

You can experiment with examples using the Khmer character app.

The Khmer script has no case distinction, and no special transforms are needed to convert between characters.

The script is not cursive (ie. joined up, like Arabic).

Writing styles

There are several distinct styles of font in Modern Khmer.

Most modern typefaces are set in an upright style (called អក្សរឈរ ʔk͓sṟc̱ʰṟ ʔɑːksɑː cʰɔː or អក្សរត្រង់ ʔk͓sṟt͓ṟŋ˘ ʔɑːksɑː trɑŋ). [ws, #Styles] This is the style used for this page.

អក្សរខ្មែ
The text អក្សរខ្មែរ in an âksâr chôr font style.

The slanted style (អក្សរជ្រៀង ʔk͓sṟc̱͓ṟiᵊŋ ʔɑːksɑː criəŋ) is used for whole documents or novels. The oblique styling has no effect on the semantics of the text. [ws, #Styles]

អក្សរខ្មែ
The text អក្សរខ្មែ in an âksâr chriĕng font style.

The round style (អក្សរមូល ʔk͓sṟm̱ūḻ ʔɑːksɑː muːl) includes more ligated forms, and is used for titles and headings in Cambodian documents, books, or currency, as well as on shop signs or banners. It may also be used to emphasise important names or nouns. [ws, #Styles]

អក្សរខ្មែ
The text អក្សរខ្មែ in an âksâr mul font style.

Another style (អក្សរខម ʔk͓sṟkʰm̱ ʔɑːksɑː kʰɑːm), characterized by sharper serifs and angles and the retention of some antique characteristics, is used for yantra text in Cambodia as well as in Thailand. [ws, #Styles]

Context-based shaping & positioning

This section mentions a handful of exceptions to the general rule that there is very little in the way of interaction between characters other than where the subscript shapes are used after the coeng generator.

Subjoined consonants can be quite different shapes from those of the corresponding base consonant, and those shapes can be seen in clusters. However, some additional reshaping of glyphs is needed to cope with stacking of characters. Compare, for example, the length of the final element in the two stacks below.

ង្យ  ង្ខ្យ
The subjoined form of [U+1799 KHMER LETTER YO] is lengthened when it is the second subjoined consonant.

Joining forms for AA

Some small joining features occur in relation to [U+17B6 KHMER VOWEL SIGN AA] and similarly shaped vowels. Unicode provides the following list of common forms:

  1. ក + ា = កា
  2. ប +  ា = បា (avoids confusion with )
  3. ប +  ៅ = បៅ
  4.  ្ស +  ា = ្សា

Register-shifter position

The function of the register-shifter marks is described in the section on register changes.

When [U+17C9 KHMER SIGN MUUSIKATOAN] museʔkətoə̯n or [U+17CA KHMER SIGN TRIISAP] trəisaɓ appears with a vowel sign above the consonant, the ក្បៀសក្រោម kɓiəhkraom form may be used. This looks exactly like [U+17BB KHMER VOWEL SIGN U]. Compare the following uses of MUUSIKATOAN: យ៉ាង ម៉ឺន ញ៉ាំ

Here are some examples of TRIISAP: អ៊ូ អ៊ី

The Unicode Standard [u, 647] gives the impression that both of these diacritics are moved below the consonant any time a vowel appears over that consonant. However, in reality only certain consonants cause this behaviour.

For more details, see this SIL document. [sil]

Observation: SIL and Noto fonts, as well as others, stack the diacritics above only certain consonants. The behaviour varies a little by font, but in general the diacritic is lowered for the following 3 letters for TRIISAP: សហអ and for these letters for MUUSIKATOAN: បវមនញងរលយ

If needed, this behaviour can be modified using ZWNJ [U+200C ZERO WIDTH NON-JOINER], if the font recognises it, to prevent the low form appearing. The ZWNJ must be typed and stored between the base consonant and the register-shifter. See the figure below for an example using ប្រតឺងអ‌៊ឹះ.

ប្រតឺងអ‌៊ឹះ      ប្រតឺងអ៊ឹះ
TRIISAP rendered above the consonant by using ZWNJ immediately before it (left), when it would otherwise be moved below the consonant (right).
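
The placement of the ZWNJ can be sketched in code. The function below inserts U+200C immediately before each register-shifter that does not already have one. The name keep_shifter_above is hypothetical, and the sketch deliberately ignores which consonants and vowels actually trigger the low form, so it is best applied only to words where the raised form is known to be wanted. Whether it has any visible effect still depends on the font.

```python
ZWNJ = "\u200C"
REGISTER_SHIFTERS = {"\u17C9", "\u17CA"}  # MUUSIKATOAN, TRIISAP

def keep_shifter_above(text):
    """Insert ZWNJ between a base character and a following register-shifter."""
    out = []
    for ch in text:
        # Only insert when something precedes the shifter and that
        # something is not already a ZWNJ (keeps the function idempotent).
        if ch in REGISTER_SHIFTERS and out and out[-1] != ZWNJ:
            out.append(ZWNJ)
        out.append(ch)
    return "".join(out)

# QA + TRIISAP + VOWEL SIGN Y + REAHMUK, the final syllable of the
# example word in the figure above.
print([f"U+{ord(c):04X}" for c in keep_shifter_above("\u17A2\u17CA\u17B9\u17C7")])
```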

NYO with subscripts

Another common feature is that [U+1789 KHMER LETTER NYO] drops the swash below the baseline when followed by a subscript consonant, eg. បញ្ឆោត. Also, when it appears as a subscript under itself, it uses a special full-form subscript. Compare កញ្ញា ប្រាជ្ញា

Vowel ligatures

In the âksâr mul style, some vowel signs ligate with the consonant characters to which they are applied. See the word វិទូ in the figure below.

វិទូ វ‌ិទូ
The sound vi written as a ligature (left) and with no ligature (right) in the âksâr mul style.

Observation: This behaviour is font dependent. For example, the Khmer Mool font (used for the figure), and Khmer OS Muol produce the ligation, but the Khmer Ratanakiri font does not.

To prevent a ligature forming, use ZWNJ [U+200C ZERO WIDTH NON-JOINER] between the consonant and the vowel. To cause a ligature to form when there isn't one, if the font has the appropriate rules, use ZWJ [U+200D ZERO WIDTH JOINER] instead. [u, 647]

Font styles

tbd

Punctuation & inline features

Grapheme boundaries

tbd

Word boundaries

Khmer words are not separated by spaces; nevertheless, Khmer text should be wrapped at word boundaries.

Sometimes ZWSP [U+200B ZERO WIDTH SPACE] may be used to indicate appropriate break-points.
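
As a minimal sketch of how such break-points might be stored, the helpers below join a list of already-segmented words with U+200B and remove the marks again when the plain text is needed. The function names are hypothetical, and the segmentation itself is assumed to come from elsewhere (for example, a dictionary-based segmenter or a human editor).

```python
ZWSP = "\u200B"

def mark_word_breaks(words):
    """Join pre-segmented words with invisible break opportunities."""
    return ZWSP.join(words)

def strip_word_breaks(text):
    """Remove the break marks, eg. before searching or comparing text."""
    return text.replace(ZWSP, "")

marked = mark_word_breaks(["ជាតិ", "សាសន៍"])
print(len(marked), len(strip_word_breaks(marked)))
```

Because the ZWSP is invisible, stripping it before string comparison or search avoids mismatches between marked and unmarked copies of the same text.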

Some other languages that use the Khmer script, such as Krung and Tampuan, do separate words with narrow spaces such as 6/MSP [U+2006 SIX-PER-EM SPACE], and separate phrases with a wider space such as EMSP [U+2003 EM SPACE]. [sil, 5]

Phrase & section boundaries

៖␣꧇␣។␣?␣!␣៎␣៕␣៙␣៚
phrase: U+0020 SPACE; [U+17D6 KHMER SIGN CAMNUC PII KUUH]; [U+A9C7 JAVANESE PADA PANGKAT]
sentence: [U+17D4 KHMER SIGN KHAN]; ? [U+003F QUESTION MARK]; ! [U+0021 EXCLAMATION MARK]; [U+17CE KHMER SIGN KAKABAT]
section: [U+17D5 KHMER SIGN BARIYOOSAN]
text: [U+17D9 KHMER SIGN PHNAEK MUAN]; [U+17DA KHMER SIGN KOOMUUT]

Space. Although Khmer words are not separated by spaces, the space (ឃ្លា kliə) is used, and is regarded as punctuation, similar to the comma. Huffman lists the following uses:

  1. between clauses within a sentence
  2. between sentences in a cohesive group of sentences
  3. after preposed adverbial phrases, such as 'usually', 'today', 'in that town', etc.
  4. before and after proper names
  5. before and after numbers
  6. before and after the symbols and and the terms ។ល។ and ។ប។
  7. between coordinate words in lists

Huffman gives the following example to show the use of the space:

ថ្ងៃនេះ ខ្ញុំទៅផ្សារ ទិញក្រច អង្ករ ហើយនឹងអីវ៉ាន់ផ្សេង ៗ
tŋajnih kɲomtɨwpsaː tiɲkrouc ʔɑŋkɑː haəjnɨŋʔəjʋanpseiŋ pseiŋ
Today ( ) I'm going to the market ( ) to buy oranges ( ) rice ( ) and various things.

As mentioned in the section on word boundaries, some other languages use narrow spaces to separate words, and wide spaces, such as the EM SPACE, to separate phrases.

Phrasal punctuation marks. Khmer also uses [U+17D4 KHMER SIGN KHAN] to mark the end of sentences, although a series of sentences on a related topic tends to be separated by spaces instead.

[U+17D6 KHMER SIGN CAMNUC PII KUUH] is used in much the same way as a western colon. 

Question & exclamation marks. Khmer uses Western punctuation marks, eg. ហេត៊អ្វី? haetʰ aʋəi, and កុំ! kom.

Very rarely, the combining character [U+17CE KHMER SIGN KAKABAT​] can be used over the final consonant of a word like an exclamation mark, to convey excited emphasis, eg. ណែ៎ nɛː Hey! នុ៎ះន៎ nuhnɔː Over there!

Section boundaries. [U+17D5 KHMER SIGN BARIYOOSAN] can be used to close a chapter, or an entire text.

Text start and end. Poetic and religious texts typically start with [U+17D9 KHMER SIGN PHNAEK MUAN] and end with [U+17DA KHMER SIGN KOOMUUT].
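
The sentence-level marks described above suggest a rough sentence splitter. The following sketch treats KHAN, BARIYOOSAN and the Western question and exclamation marks as terminators. It is only an approximation under stated assumptions: it does not detect sentences that are separated by a space alone, and it will wrongly split the abbreviation ។ល។, which contains KHAN twice. The name split_sentences is hypothetical.

```python
import re

# KHAN, BARIYOOSAN, and the Western question and exclamation marks.
TERMINATORS = "\u17D4\u17D5?!"
_CLASS = re.escape(TERMINATORS)

# One or more non-terminators followed by any run of terminators.
_SENTENCE = re.compile(f"[^{_CLASS}]+[{_CLASS}]*")

def split_sentences(text):
    """Split text after sentence-final punctuation, trimming spaces."""
    return [m.strip() for m in _SENTENCE.findall(text) if m.strip()]

print(split_sentences("កុំ! ហេតុអ្វី?"))
```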

Parentheses & brackets

(␣)
standard: start ( [U+0028 LEFT PARENTHESIS]; end ) [U+0029 RIGHT PARENTHESIS]

Quotations

“␣”␣‘␣’
initial: start [U+201C LEFT DOUBLE QUOTATION MARK]; end [U+201D RIGHT DOUBLE QUOTATION MARK]
nested: start [U+2018 LEFT SINGLE QUOTATION MARK]; end [U+2019 RIGHT SINGLE QUOTATION MARK]

According to CLDR, the default quote marks for Khmer should be “...”. When an additional quote is embedded within the first, the quote marks should be ‘...’.

Emphasis

[U+17CE KHMER SIGN KAKABAT​] is very rare, but is used over the final consonant of a word like an exclamation mark, to convey excited emphasis, eg. ណែ៎ នុ៎ះន៎

Abbreviation, ellipsis & repetition

Ellipsis

The word ។ល។ (pronounced laʔ) is used as the equivalent of 'etc.'

A character exists that represents that sequence, [U+17D8 KHMER SIGN BEYYAL], but the Unicode Standard discourages its use, and recommends the use of the three separate characters instead. [u]
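
Converting existing text to the recommended spelling is a simple replacement. This is a minimal sketch; the function name expand_beyyal is an illustrative assumption.

```python
BEYYAL = "\u17D8"
PREFERRED = "\u17D4\u179B\u17D4"  # KHAN + LETTER LO + KHAN

def expand_beyyal(text):
    """Replace the single BEYYAL character with the three-character spelling."""
    return text.replace(BEYYAL, PREFERRED)

print([f"U+{ord(c):04X}" for c in expand_beyyal("\u17D8")])
```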

Other spellings for et cetera also exist. These include:

  • ។បេ។
  • –បេ–
  • –ល–

Repetition

It is common to repeat words or sometimes phrases in Khmer, particularly to provide emphasis. [U+17D7 KHMER SIGN LEK TOO] (called លេខទោ leːktoː) can be used for this, eg. ខ្លាំង ៗ klaŋklaŋ very strong, and គាត់មានផ្ទះថ្មី ៗ kaːtʰ miən pʰteə̯h tʰməitʰməi he has a brand new house.

Sometimes this sign repeats a phrase rather than a word, eg. បន្តិចម្ដង ៗ ɓɑntecmɗɑːŋ ɓɑntecmɗɑːŋ little by little

It is also occasionally used to repeat the word at the end of a sentence for the beginning of a new sentence [h], eg. ខ្ញុំទៅផ្ទះខ្ញុំ ។ នៅជិតផ្សារ kɲomtɨwpteə̆hkɲom pteə̆hkɲom nɨwcɨtpsaː

The sign is usually separated from the text by a space.
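
For applications that need the spoken form (for example, text-to-speech), the repetition sign can be expanded. The sketch below works on a list of already-segmented words and repeats only the single preceding word. That is a simplifying assumption: as noted above, the sign sometimes repeats a whole phrase, and deciding how much to repeat needs knowledge that this code does not have. The name expand_repetition is hypothetical.

```python
LEK_TOO = "\u17D7"

def expand_repetition(words):
    """Replace each LEK TOO token with a copy of the preceding word."""
    out = []
    for word in words:
        if word == LEK_TOO and out:
            out.append(out[-1])   # repeat the word just before the sign
        else:
            out.append(word)      # ordinary word, or a sign with nothing before it
    return out

print(expand_repetition(["ខ្លាំង", LEK_TOO]))
```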

Inline notes & annotations

tbd

Other inline ranges

tbd

Other punctuation

Names

- [U+002D HYPHEN-MINUS] (called សហសញ្ញា sɑːhɑː sɑːɲɲiə) is used between the parts of a person's name. Typically it separates the family name (written first) from the following names, but for Chinese Cambodians it often separates all the names, eg. ញ៉ុក-ថែម ɲok tʰaem, លី-ធាម-តេង liː tʰiəm teiŋ.

Line & paragraph layout

Line breaking & hyphenation

Although Khmer doesn't use spaces or dividers between words, the expectation is that line-breaks occur at word boundaries.

There are three basic types of Khmer word:

  1. Single, indivisible words, eg. ជាតិ c̱āti national; វិទ្យាល័យ v̱iṯ͓ȳāḻăȳ high school; កម្ម km̱͓m̱ mission
  2. Words with prefixes and suffixes, eg. អន្តរជាតិ ʔṉ͓tṟc̱āti international; មហវិទ្យាល័យ m̱hv̱iṯ͓ȳāḻăȳ high school; កម្មករ km̱͓m̱kṟ workers
  3. Compound words (combining 2, 3, or more single words), eg. ជាតិសាសន៍ c̱ātisāsṉ˟ race; កម្មផល km̱͓m̱pʰḻ karma; សកលវិទ្យាល័យ skḻv̱iṯ͓ȳāḻăȳ university

The first two types cannot be broken, but the third type can. For example, |ជាតិ|សាសន៍|, |កម្ម|ផល|, and |សកល|វិទ្យាល័យ|. (Hong)

Text is not broken at sub-word syllable boundaries. In fact, this is particularly difficult to do algorithmically in Khmer, because syllable-final consonants are indistinguishable from consonants with an inherent vowel that constitute a new syllable. Some kind of morphological analysis is needed.
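
Once word boundaries are known, wrapping itself is straightforward. The following is a minimal greedy sketch that takes a list of breakable units (words, or the parts of a compound) and never breaks inside a unit. The name wrap_words is hypothetical, and the default width measure, len, simply counts code points. That is a crude assumption for Khmer, where subscripts and combining marks add no horizontal advance, so a real implementation would pass a measure based on glyph widths.

```python
def wrap_words(words, max_width, measure=len):
    """Greedily pack unbreakable units into lines no wider than max_width."""
    lines, current, width = [], [], 0
    for word in words:
        w = measure(word)
        # Start a new line if this unit would overflow a non-empty line.
        if current and width + w > max_width:
            lines.append("".join(current))
            current, width = [], 0
        current.append(word)
        width += w
    if current:
        lines.append("".join(current))
    return lines

# The compound ជាតិសាសន៍ may break between its two parts, never inside them.
print(wrap_words(["ជាតិ", "សាសន៍", "កម្ម", "ផល"], 10))
```

A unit that is wider than max_width is placed on a line of its own rather than being split.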

Text alignment & justification

tbd

Letter spacing

tbd

Counters, lists, etc.

You can experiment with counter styles using the Counter styles converter. Patterns for using these styles in CSS can be found in Ready-made Counter Styles, and we use the names of those patterns here to refer to the various styles.

The modern Khmer orthography uses numeric and alphabetic styles.

Numeric

The khmer numeric style is decimal-based and uses these digits. [rmcs]

០␣១␣២␣៣␣៤␣៥␣៦␣៧␣៨␣៩

Examples:

១␣២␣៣␣៤␣១១␣២២␣៣៣␣៤៤␣១១១␣២២២␣៣៣៣␣៤៤៤

Alphabetic

The cambodian-consonant alphabetic style for the Khmer language uses these letters.

ក␣ខ␣គ␣ឃ␣ង␣ច␣ឆ␣ជ␣ឈ␣ញ␣ដ␣ឋ␣ឌ␣ឍ␣ណ␣ត␣ថ␣ទ␣ធ␣ន␣ប␣ផ␣ព␣ភ␣ម␣យ␣រ␣ល␣វ␣ស␣ហ␣ឡ␣អ

Examples:

ក␣ខ␣គ␣ឃ␣ដ␣ផ␣អ␣កដ␣គឋ␣ចភ␣ញគ␣ឌណ

Prefixes and suffixes

List counters typically use a full stop+space as a suffix.

Examples:

ក. ខ. គ. ឃ. ង.
Separator for Cambodian list counters.
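
The alphabetic style can be sketched as a bijective base-33 conversion over the consonant list given above, with the full stop+space suffix appended. This is an illustrative implementation rather than the definition used by any particular CSS engine; the names CONSONANTS and cambodian_consonant are assumptions made for this example.

```python
# The 33 letters of the cambodian-consonant style, in counter order.
CONSONANTS = "កខគឃងចឆជឈញដឋឌឍណតថទធនបផពភមយរលវសហឡអ"

def cambodian_consonant(n, suffix=". "):
    """Return the alphabetic counter for n (1-based), followed by suffix."""
    if n < 1:
        raise ValueError("counter values start at 1")
    base = len(CONSONANTS)
    letters = []
    while n > 0:
        # Subtracting 1 makes the system bijective: there is no zero digit.
        n, rem = divmod(n - 1, base)
        letters.append(CONSONANTS[rem])
    return "".join(reversed(letters)) + suffix

for value in (1, 2, 3, 4, 11, 22, 33, 44, 111, 222, 333, 444):
    print(value, cambodian_consonant(value, suffix=""))
```

The values in the loop are the same ones used for the numeric examples earlier in this section, so the output can be compared against the alphabetic examples listed above.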

Styling initials

tbd

Page & book layout

This section is for any features that are specific to Khmer and that relate to the following topics: general page layout & progression; grids & tables; notes, footnotes, etc; forms & user interaction; page numbering, running headers, etc.

Languages using the Khmer script

According to ScriptSource, the Khmer script is used for the following languages:

References