Khmer script summary

Updated 19-Feb-2018 • tags khmer, scriptnotes

This page provides basic information about the Khmer script. It is not authoritative, peer-reviewed information – these are just notes I have gathered or copied from various places as I learned. For character-specific details follow the links to the Khmer character notes.

For similar information related to other scripts, see the Script comparison table.

Clicking on red text examples, or highlighting part of the sample text shows a list of characters, with links to more details. Click on the vertical blue bar (bottom right) to change font settings for the sample text.

Sample (Khmer)

មាត្រា ១ មនុស្សទាំងអស់ កើតមកមានសេរីភាព និងសមភាព ក្នុងផ្នែកសេចក្ដីថ្លៃថ្នូរនិងសិទ្ធិ។ មនុស្ស មានវិចារណញ្ញាណនិងសតិសម្បជញ្ញៈជាប់ពីកំណើត ហើយគប្បីប្រព្រឹត្ដចំពោះគ្នាទៅវិញទៅមក ក្នុង ស្មារតីភាតរភាពជាបងប្អូន។

មាត្រា ២ មនុស្សម្នាក់ៗ អាចប្រើប្រាស់សិទ្ធិនិងសេរីភាពទាំងអស់ ដែលមានចែងក្នុងសេចក្ដីប្រកាសនេះ ដោយគ្មានការប្រកាន់បែងចែកបែបណាមួយ មានជាអាទិ៍ ពូជសាសន៍ ពណ៌សម្បុរ ភេទ ភាសា សាសនា មតិនយោបាយ ឬមតិផ្សេងៗទៀត ដើមកំណើតជាតិ ឬសង្គម ទ្រព្យសម្បត្ដិ កំណើត ឬស្ថានភាព ដទៃៗទៀតឡើយ។ លើសពីនេះ មិនត្រូវធ្វើការប្រកាន់បែងចែកណាមួយ ដោយសំអាងទៅលើឋានៈខាងនយោបាយ ខាងដែនសមត្ថកិច្ច ឬខាងអន្ដរជាតិរបស់ប្រទេស ឬដែនដីដែលបុគ្គលណាម្នាក់រស់នៅ ទោះបីជាប្រទេស ឬដែនដីនោះឯករាជ្យក្ដី ស្ថិតក្រោមអាណាព្យាបាលក្ដី ឬគ្មានស្វ័យគ្រប់គ្រងក្ដី ឬស្ថិតក្រោមការដាក់ កម្រិតផ្សេងទៀតណាមួយ ដល់អធិបតេយ្យភាពក្ដី។

Usage & history

From Scriptsource:

The Khmer script is an abugida, descended from the Brahmic script Pallava. It is used for writing Khmer, the official language of Cambodia. The script is also sometimes used for writing minority languages in Cambodia, such as Brao and Mnong.

The Khmer literary tradition dates back to the 7th century, and the script is currently in widespread use, although it is estimated that 35% of the Khmer-speaking population aged 15 and over are illiterate in the script.

From Wikipedia:

The Khmer alphabet or Khmer script (Khmer: អក្សរខ្មែរ; IPA: [ʔaʔsɑː kʰmaːe]) is an abugida (alphasyllabary) script used to write the Khmer language (the official language of Cambodia). It is also used to write Pali in the Buddhist liturgy of Cambodia and Thailand.

It was adapted from the Pallava script, which ultimately descended from the Brahmi script, which was used in southern India and South East Asia during the 5th and 6th centuries AD. The oldest dated inscription in Khmer was found at Angkor Borei District in Takéo Province south of Phnom Penh and dates from 611. The modern Khmer script differs somewhat from precedent forms seen on the inscriptions of the ruins of Angkor. The Thai and Lao scripts are descended from an older form of the Khmer script.

Key features

The script is an abugida, ie. like most Brahmi-influenced scripts, each consonant carries with it an inherent vowel. The sound following a consonant can be modified by attaching vowel signs to the consonant when writing. See the table to the right for a brief overview of features, taken from the Script Comparison Table.

Direction of text is horizontal, left to right. However, glyphs constituting a single syllable can appear on any side of the base character, and multiple diacritics are often needed to create the vowel in a syllable.

A key feature of Khmer is that there are a large number of vowel sounds, and only a few vowel signs; and there are a large number of consonant letters for only a small number of consonant sounds. This lead to a system where there are generally two consonant signs for a given sound, each belonging to one of two classes (or registers). So to determine the pronunciation of a vowel sign you start by seeing which class of consonant it follows. For example, using the two symbols for the sound k, is kɑː neck, and is kɔː mute.

Consonants stack in Khmer, but not always in a predictable way. Multiple consonants at the start and (sometimes) end of a syllable are usually stacked. Consonant clusters in a multisyllabic word also tend to stack. But syllable-final consonants, which can be one of a number of characters, often don't stack with the onset consonant of the next syllable.

There are two distinct styles of font in Modern Khmer: slanted (called អក្សរច្រៀង), which has an upright variant, and round (called អក្សរឈរ). The round style includes more ligated forms. The upright style is used here. Style examples: slanted upright អក្សរ ខ្មែ, round អក្សរ ខ្មែ.

Character lists by language

The Khmer script characters in Unicode 10.0 are in the following blocks:

Follow these links for information about characters used by languages associated with this script. The numbers in parentheses refer to non-ASCII characters.

For character-specific details see Khmer character notes.

Consonants

The Khmer block has 35 consonants. The list just below shows those in use in modern Khmer, although some are not widely used. The phonetic transcriptions below show the sound of the consonant in syllable initial (left) and final (right) positions.

list all
kɑː, -k
kʰâkʰɑː, -k
ḵôkɔː, -k
ḵʰôkʰɔː, -k
ngôŋɔː, -ŋ
cɑː, -c
cʰâcʰɑː
c̱ôcɔː, -c
c̱ʰôcʰɔː
ñôɲɔː, -ɲ
ɗɑː, -t
ṭʰâtʰɑː, -t
ḏôɗɔː, -t
ṱʰôtʰɔː, -t
nɑː, -n
tɑː, -t
tʰâtʰɑː, -t
ṯôtɔː, -t
ṯʰôtʰɔː, -t
ṉônɔː, -n
ɓɑː, -p
pʰâpʰɑː, -p
ꝑôpɔː, -p
ꝑʰôpʰɔː, -p
mɔː, -m
jɔː, -iː
rɔː
ḻôlɔː, -l
ʋɔː, -w
sɑː, -h
hɑː
lɑː
’âʔɑː

The following two consonants are obsolete.

list all
śɕ
ʂ

Consonant class modification

As mentioned above, the class of a consonant determines the vowel sound in a syllable. For example, kɑː vs. kɔː, and កី kəj vs. គី kiː.

Two diacritics, [U+17C9 KHMER SIGN MUUSIKATOAN​] and [U+17CA KHMER SIGN TRIISAP​], are used to change the class of a consonant. These are particularly useful when a given sound has only one character associated with it, such as the letters , and etc.

Final consonants

Not all Khmer consonants can appear in syllable-final position. The most common syllable-final consonants include the following:

list all
k
ŋ
ɲ
t
n
p
m
l

The pronunciation of the consonant in final position may differ from it's normal pronunciation.

Subscript consonants

It is common to find clusters of consonants with no intervening vowel sounds. In Khmer, this is very common at the beginning of a word, but clusters also occur medially in multisyllable words, and occasionally at the end of a word.

When two consonants occur together without an intervening vowel, the second is rendered in subscript form, called ជើងអក្សរ cəːŋʔɑʔsɑː (consonant feet) (or 'coeng', pronounced cəːŋ). Cambodians see these subscripts as distinct letter forms.

In Unicode subscript forms are produced by inserting [U+17D2 KHMER SIGN COENG​] before the consonant that will become a subscript. This character, which has no visual form in Cambodian, is called the coeng in Unicode, although it should rightly be called the coeng generator.

Where the two consonants involved in the cluster are in different classes or registers, the pronunciation of any following vowel is normally determined by the register of the subscript consonant. For the following exceptions, however, the vowel pronunciation is determined by the register of the first consonant:

list all

Some subscripts change the sound of the preceding consonant.

Subscript consonants that appear at the end of a word, are silent, eg. ពេទ្យ peit; រដ្ឋ roat.

In some multisyllabic words a medial cluster may contain a final consonant for the first syllable and the initial consonant of the next syllable, eg. កម្មករ kɑmmɔkɑːworker .

There are some clusters involving two subscripts. These are, with three exceptions, composed of a final nasal, followed by a stop and r, eg. កន្ត្រៃ kɑntraj scissors, កញ្ជ្រេង kɑɲcreːŋ fox. The three exceptions are the loan words, អង្គ្លេស ʔɑŋkleːh English, សងស្ក្រិត sɑŋskret Sanskrit, and សាស្ត្រាចារ្យ sɑstraːcaː teacher.

Subscript consonants after vowels

It is rare but possible to find subscripts used after independent vowels. One common word spelled this way is ឲ្យ ʔaoj to give.

It is also possible to find subscript forms of independent vowels. Four of these are named sequences in Unicode.

Vowels

The Khmer Unicode block has 16 vowel signs, and 15 independent vowels (not counting 2 that are deprecated).

Several vowel characters are composed of parts that are visually separate, eg. the sound aw/əː is represented as [U+17BE KHMER VOWEL SIGN OE​].

Other diacritics also produce vowel sounds after or before the consonants they are attached to.

As mentioned above, an initial indicator of pronunciation is the class of the syllable-initial consonant. Additional factors include whether this is an unstressed vowel, vowel harmony, and whether any of the special diacritics have been used to change the sound. For an in-depth treatment of pronunciation see Huffman in the sources section.

Inherent vowels

Khmer has two inherent vowels, ɑː and ɔː. The class of the consonant will initially dictate which sound is appropriate, eg. kɑː vs. kɔː.

Inherent vowels are not pronounced after syllable final consonants.

Vowel signs

As mentioned above, in most cases, vowel signs attached to a consonant are pronounced differently, depending on the register of the consonant letter, eg. កា kaː vs. គា kiə.

list all
āaː, iə
iə, e, ɨ, i
īəj, iː
ə, ɨ
ȳəɨ, ɨː
uo, u
ūou, uː
ua
oeaə, əː
ẏaɨə
ia
eei, eː
aeae, ɛː
aiaj, ɨj
oao, oː
auaw, ɨw

Independent vowels

There are two ways of representing vowel sounds that are not preceded by a consonant.

The most common way is to add a vowel-sign to the character [U+17A2 KHMER LETTER QA], eg. អី ʔəj.

There are also some independent vowel letters, but unlike most South Asian scripts, there are fewer independent vowels than vowel signs, and some do not have direct correspondances with a vowel sign, eg. corresponds phonetically to the vowel plus consonant combination ូវ.

list all
′iʔə, ʔɨ, ʔəj
′īʔəj
′uʔo, ʔu, ʔao
′ūou, ʔuː
ýuʔəw
′aeʔae, ʔɛː, ʔeː
′aiʔaj
′oʔao
′auʔaw
′ǎuʔao
ra~ru
raː~ruː
la~lu
laː~luː

Whether an independent vowel sound is represented using an independent vowel sign or the glottal consonant plus vowel sign varies from word to word. In Cambodian orthography the two are not interchangeable. The independent vowel signs appear in relatively few words, but some of those words are quite common, eg. ឪពុក ʔəwpuk father, ឲ្យ ʔaoj to give and lɨː to hear.

Vowel-like diacritics

Three diacritics can behave like vowel-signs in Khmer.

list all
m
h
àaʔ, eə̆ʔ

The descendants of the anusvara and the visarga, [U+17C6 KHMER SIGN NIKAHIT​], called និគ្គហិត niʔkəhət, and [U+17C7 KHMER SIGN REAHMUK​], called រះមុខ reə̆hmuk, are regarded as vowels in Khmer, even though their sounds still end with ŋ and h, respectively.

They are also used in combination with other vowel signs to produce common vowel sounds.

list all
ាំam, oə̆m
ុំom, um
ុះoh, uh
េះeh, ih
ោះɑh, uə̆h

Two of these combinations are regarded as additional vowels in the alphabet but not encoded separately in Unicode (though they are named sequences), ie. អាំ am, oə̆m and អុំ om, um.

[U+17C8 KHMER SIGN YUUKALEAPINTU​] is a 20th century addition to the Khmer repertoire. It is used as a vowel after consonants that are pronounced as stressed syllables at the end of a word, or preceding an internal juncture in compounds.

Vowel harmony

In two-syllable words, where the second syllable begins with one of the following consonants the vowel class of the second syllable is the same as that of the first.

list all

For example, in ប្រយ័ត្ន prɑjat to be careful, the second syllable starts with an class consonant but the class of the preceding syllable turns the vowel to an ɑː class sound. There are, however, exceptions to this rule.

Diacritics

The Khmer block contains 33 combining marks, 16 of which are vowel-signs, and 3 vowel-like signs and are described above. Two deprecated inherent vowel marks, and one astrological mark who's use is discouraged, bring the remaining total to 11.

list all

[U+17D2 KHMER SIGN COENG​], used to create stacked consonants, is described earlier, as are [U+17C9 KHMER SIGN MUUSIKATOAN​] and [U+17CA KHMER SIGN TRIISAP​].

Other diacritics are described here in general terms. For the detail, click on the character above and follow the links to the character notes document.

Silencing marks

Two diacritics, [U+17CC KHMER SIGN ROBAT​] and [U+17CD KHMER SIGN TOANDAKHIAT​], are used to silence written characters. The former is not very common and silences final consonants, eg. បរិបូណ៌ ɓɑriɓou abundant (although it also introduces or affects sound in some cases in multisyllabic words). The latter is used over a consonant, particularly in loan words, to silence it and any attached vowels or subscripts, eg. សប្ដាហ៍ sâb͓dāhâ˚ sɑpɗaː week or រេហ៍ពល rehâ˚ꝑôḻ rɔpuə̆l army.

[U+17D1 KHMER SIGN VIRIAM], the sanskrit virama, is sometimes used in Sanskrit words to indicate that a final consonant has no vowel sound, eg. អាត្មន៑ ʔaːtman soul.

[U+17DD KHMER SIGN ATTHACAN​], on the other hand, is a rarely used sign that indicates that a final consonant retains its inherent vowel sound. 

 

Vowel modifier marks

[U+17CB KHMER SIGN BANTOC] is always placed above the final consonant, and basically shortens the preceding vowel.

[U+17D0 KHMER SIGN SAMYOK SANNYA​] is used in some Pail and Sanskrit loan words (although alternative spellings exist) and indicates that the syllable has a particular vowel. 

Other marks

[U+17CF KHMER SIGN AHSDA​] is used over two consonants to indicate that they represent two specific words: ក៏ kɑː also, then, therefore, and ដ៏ ɗɑː which, very.

[U+17CE KHMER SIGN KAKABAT​] is very rare, but used over the final consonant of a word like an exclamation mark, to convey excited emphasis, eg. ណែ៎ nɛː Hey!, or នុ៎ះន៎ nuhnɔː Over there!

Punctuation & symbols

The Khmer block has 7 non-deprecated punctuation marks, and a currency symbol.

list all

These are described elsewhere on this page.

Numbers

Khmer has its own set of decimal digits, although western digits are also used sometimes.

list all
0
1
2
3
4
5
6
7
8
9

The thousands separator is . [U+002E FULL STOP], and decimal separator is , [U+002C COMMA].

Ranges and dates use the ASCII hyphen.w

Currency. The symbol [U+17DB KHMER CURRENCY SYMBOL RIEL] (សញ្ញារៀល saɲ ɲaː riəl) is placed after the amount, eg. ៣.០០០ ៛ ɓej poan riəl 3,000 riel. Sometimes [U+179A KHMER LETTER RO] is used instead.

Unused characters

The Khmer block has a number of deprecated or obsolete characters.

list all

For more information, click on list all and follow the links to the character notes page.

Glyph shaping

Context-based glyph changes

There is very little in the way of interaction between characters other than the subscript shapes used after the coeng generator.

Some small joining features occur in relation to [U+17B6 KHMER VOWEL SIGN AA​] and similarly shaped vowels. Unicode provides the following list of common forms:

  1. ក + ា = កា
  2. ប +  ា = បា (avoids confusion with )
  3. ប +  ៅ = បៅ
  4.  ្ស +  ា = ្សា

Some reshaping of glyphs is needed to cope with stacking of characters. Compare for example the length of the final element in ង្យ and ង្ខ្យ.

Also, when [U+17C9 KHMER SIGN MUUSIKATOAN​] museʔkətoə̯n or [U+17CA KHMER SIGN TRIISAP​] trəisaɓ appears with a vowel sign above the consonant, the ក្បៀសក្រោម kɓiəhkraom form is used. This looks exactly like [U+17BB KHMER VOWEL SIGN U​], eg. compare យ៉ាង and ម៉ឺន məɨn 10,000 or ញ៉ាំ ɲam to eat.

Usage tip: This behaviour can be modified using U+200C ZERO WIDTH NON-JOINER.

Another common feature is that [U+1789 KHMER LETTER NYO] drops the swash below the baseline when followed by a subscript consonant, eg. បញ្ឆោត ɓɑɲcʰaot to trick. Also, when it appears as a subscript under itself it uses a special full form subscript. Compare កញ្ញា kɑɲɲaa young lady and ប្រាជ្ញា praːcɲaa intelligence.

Font styles

There are several distinct styles of font in Modern Khmer.

Most modern typefaces are set in an upright style (called អក្សរឈរ ʔɑːksɑː cʰɔː or អក្សរត្រង់ ʔɑːksɑː trɑŋ).w This is the style used for this page.

The text អក្សរខ្មែ in an âksâr chôr font style.

The slanted style (អក្សរជ្រៀង ʔɑːksɑː criəŋ) is used for whole documents or novels. The oblique styling has no affect on the semantics of the text.w

The text អក្សរខ្មែ in an âksâr chriĕng font style.

The round style (អក្សរមូល ʔɑːksɑː muːl) includes more ligated forms, and is used for titles and headings in Cambodian documents, books, or currency, as well as on shop signs or banners. It may also be used to emphasise important names or nouns.w

The text អក្សរខ្មែ in an âksâr mul font style.

Another style (អក្សរខម ʔɑːksɑː kʰɑːm), characterized by sharper serifs and angles and retainment of some antique characteristics, is used for yantra text in Cambodia as well as in Thailand.w

Syllable structure

The syllable is fundamental in Cambodian.

Many native Cambodian words are monosyllabic. These start with one or more consonants or an independent vowel (or a vowel sign attached to ʔɑː, which is a combination of both). Short vowels in stressed syllables are always followed by a consonant. Long vowels may not be. There are many monosyllabic words that begin with consonant clusters, and some monosyllabic words that end with clusters, although only one consonant is pronounced in syllable final position.

There are also many bisyllabic words. In many cases the first syllable in a bisyllabic word is unstressed, and the vowel is usually rendered in colloquial speech as a schwa. Some bisyllabic words are compounds, however, and this may not apply.

Polysyllabic words are usually of Sanskrit, Pali or French origin. These words tend to alternate stress across their syllables, but may not.

Order of syllable components

An orthographic syllable is slightly different from a morphological syllable, since an orthographic syllable may begin with the final consonant of the previous morphological syllable. Alternatively, an orthographic syllable may be just a final consonant or consonant cluster in a morphological syllable.

Usage tip: Components of an 'orthographic syllable'* should be composed in the following order:

  1. base consonant or independent vowel
  2. rɔɓaːt
  3. museʔkətoə̯n or trəisaɓ (register shifters)
  4. subscript (consonant or independent vowel)
  5. vowel sign
  6. zero-width joiner or non-joiner
  7. any other mark

This fixed ordering makes it easier to search for and collate text.

As mentioned above, although all combining characters follow the base in memory, the visual order of syllable components may not follow a linear progression from left to right. In the following example the order in which the glyphs are pronounced is far left, far right, down, left, left: កន្ត្រៃ kɑntraj scissors. Here ច្រៀង the spoken order of the separate visible parts, numbered left to right, is 3,2, 1+4, 5, Some vowel signs span two or three sides of the base consonant or cluster.

Text layout

Text direction

Khmer text runs left to right in horizontal lines.

Text boundaries & punctuation

Word boundaries. Khmer words are not separated by spaces, so the space, ឃ្លា kliə, is regarded as punctuation, similar to the comma. Huffman lists the following uses:

  1. between clauses within a sentence
  2. between sentences in a cohesive group of sentences
  3. after preposed adverbial phrases, such as 'usually', 'today', 'in that town', etc.
  4. before and after proper names
  5. before and after numbers
  6. before and after the symbols and and the terms ។ល។ and ។ប។
  7. between coordinate words in lists

Huffman gives the following example to show the use of the space:

ថ្ងៃនេះ ខ្ញុំទៅផ្សារ ទិញក្រច អង្ករ ហើយនឹងអីវ៉ាន់ផ្សេង ៗ
tŋajnih kɲomtɨwpsaː tiɲkrouc ʔɑŋkɑː haəjnɨŋʔəjʋanpseiŋ pseiŋ
Today ( ) I'm going to the market ( ) to buy oranges ( ) rice ( ) and various things.

Phrase delimiters. Khmer also uses [U+17D4 KHMER SIGN KHAN] to mark the end of sentences, although a series of sentences on a related topic tend to be separated by space instead.

[U+17D6 KHMER SIGN CAMNUC PII KUUH] is used in much the same way as a western colon. 

Question & exclamation marks. Khmer uses Western punctuation marks, eg. ហេត៊អ្វី? haetʰ aʋəi, and កុំ! kom.

Very rarely, the combining character [U+17CE KHMER SIGN KAKABAT​] can be used over the final consonant of a word like an exclamation mark, to convey excited emphasis, eg. ណែ៎ nɛː Hey!, or នុ៎ះន៎ nuhnɔː Over there!.

Names. - [U+002D HYPHEN-MINUS] (called សហសញ្ញា sɑːhɑː sɑːɲɲiə) is used between the parts of a person's name. Typically the family name (written first) and following names, but often all names for Chinese Cambodians, eg. ញ៉ុក-ថែម ɲok tʰaem, លី-ធាម-តេង liː tʰiəm teiŋ.

Section boundaries. [U+17D5 KHMER SIGN BARIYOOSAN] can be used to close a chapter, or an entire text.

Text start and end. Poetic and religious texts typically start with [U+17D9 KHMER SIGN PHNAEK MUAN] and end with [U+17DA KHMER SIGN KOOMUUT].

Repetition

It is common to repeat words or sometimes phrases in Khmer, particularly to provide emphasis. [U+17D7 KHMER SIGN LEK TOO] (called លេខទោ leːktoː) can be used for this, eg. ខ្លាំង ៗ klaŋklaŋ very strong; គាត់មានផ្ទះថ្មី ៗ kaːtʰ miən pʰteə̯h tʰməitʰməi he has a brand new house.

Sometimes this sign repeats a phrase rather than a word, eg. បន្តិចម្ដង ៗ ɓɑntecmɗɑːŋ ɓɑntecmɗɑːŋ little by little.

It is also occasionally used to repeat the word at the end of a sentence for the beginning of a new sentenceh, eg. ខ្ញុំទៅផ្ទះខ្ញុំ ។ នៅជិតផ្សារ kɲomtɨwpteə̆hkɲom pteə̆hkɲom nɨwcɨtpsaː .

The sign is usually separated from the text by a space.

Continuation

The word ។ល។ (pronounced laʔ) is used as the equivalent of 'etc.' A character exists that represents that sequence, [U+17D8 KHMER SIGN BEYYAL], but the Unicode Standard recommends the use of the three separate characters instead.u

Other spellings for et cetera also exist. These include:

Line breaks & hyphenation

- [U+002D HYPHEN-MINUS] is used to indicate when part of a word has been wrapped onto a new line.

TBD

Other features to be investigated in this section include: text direction, text delimiters, emphasis & highlighting, text decoration, abbreviations & ellipsis, glyph controls character transforms, quotations, repetition, line breaking, hyphenation, justification & alignment, first-letter styling, notes & footnotes, page layout

Further reading

  1. [H] Franklin Huffman, Cambodian System of Writing
  2. [G] Richard Gilbert & Sovandy Hang, Cambodian for Beginners, Paiboon Publishing, ISBN 1-887521-35-6
  3. [D] Peter T. Daniels and William Bright, The World's Writing Systems, Oxford University Press, ISBN 0-19-507993-0
  4. [U] The Unicode Standard, Version 10.0
  5. [W] Wikipedia, Khmer Alphabet
Last changed 2018-02-19 11:10 GMT.  •  Make a comment.  •  Licence CC-By © r12a.