Updated Wed 15 Oct 2014 • tags khmer, myanmar, scriptnotes
In these notes I synthesize information from various sources, encountered as I explore the Khmer script as used for the Khmer language. They may be updated from time to time.
The page contains brief notes on general script features and discussions about which Unicode characters are most appropriate when there is a choice. See also the companion document, Khmer Character Notes, which describes the characters used in Khmer script one by one.
For more detailed information, especially about the history and phonology of the Khmer script, follow the links in the text and at the bottom of the page.
You can obtain fonts for Khmer free from the Web. For this page I used Khmer OS Battambang, which is also downloaded with this page as a webfont. For occasional examples I also used Khmer Mool.
The script is an abugida, ie. like most Brahmi-influenced scripts, each consonant carries with it an inherent vowel. The sound following a consonant can be modified by attaching vowel signs to the consonant when writing.
Direction of text is horizontal, left to right. However, glyphs constituting a single syllable can appear on all sides of the initial character.
A key feature of Khmer is that there are a large number of vowel sounds but only a few vowel signs, and a large number of consonant signs for only a small number of consonant sounds. This led to a system where there are generally two consonant signs for a given sound, each belonging to one of two classes (or registers). So to determine the pronunciation of a vowel sign you start by seeing which class of consonant it follows. For example, using the two symbols for the sound k, ក is kɑː (neck), and គ is kɔː (mute).
Diacritics (museʔkətoə̯n ៉ and trəisaɓ ៊) are available to change the class of a consonant. These are particularly useful when a particular sound has only one character associated with it, such as ម, យ or ស.
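The class lookup and the effect of the two shifters can be sketched in Python. This is a minimal illustration, assuming only the 33 standard consonants (U+1780 to U+17A2, minus the rare ឝ and ឞ) and at most one shifter placed directly after a lone consonant, with no subscripts or vowels. The names `FIRST_SERIES`, `SECOND_SERIES` and `register` are hypothetical.

```python
# Hypothetical helper: classify a lone Khmer consonant, optionally followed
# by a register shifter, as first series (1, inherent ɑː) or second series
# (2, inherent ɔː). Subscripts, vowels and rare letters are not handled.

FIRST_SERIES = set("កខចឆដឋណតថបផសហឡអ")      # 15 letters, inherent ɑː
SECOND_SERIES = set("គឃងជឈញឌឍទធនពភមយរលវ")  # 18 letters, inherent ɔː

MUUSIKATOAN = "\u17C9"  # ៉ moves a second-series letter into the first series
TRIISAP = "\u17CA"      # ៊ moves a first-series letter into the second series


def register(text: str) -> int:
    """Return 1 or 2 for text of the form consonant [+ shifter]."""
    base, rest = text[0], text[1:]
    if base in FIRST_SERIES:
        reg = 1
    elif base in SECOND_SERIES:
        reg = 2
    else:
        raise ValueError(f"not a standard Khmer consonant: {base!r}")
    if rest == MUUSIKATOAN:
        reg = 1
    elif rest == TRIISAP:
        reg = 2
    elif rest:
        raise ValueError("only a single register shifter is supported")
    return reg


print(register("ក"), register("គ"))                           # 1 2
print(register("ម" + MUUSIKATOAN), register("ស" + TRIISAP))  # 1 2
```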
In addition, there is the coeng generator, which has no visual form of its own in Cambodian, and sets of divination lore and lunar date symbols which are not described here (but are available from the picker).
There are two distinct styles of font in Modern Khmer: slanted អក្សរជ្រៀង (with an upright variant, អក្សរឈរ) and round អក្សរមូល. The round style includes more ligated forms. The upright style is used here.
Example of Khmer:
មនុស្សទាំងអស់ កើតមកមានសេរីភាព និងសមភាព ក្នុងផ្នែកសេចក្ដីថ្លៃថ្នូរនិងសិទ្ធិ។ មនុស្ស មានវិចារណញ្ញាណនិងសតិសម្បជញ្ញៈជាប់ពីកំណើត ហើយគប្បីប្រព្រឹត្ដចំពោះគ្នាទៅវិញទៅមក ក្នុង ស្មារតីភាតរភាពជាបងប្អូន។
The syllable is fundamental in Cambodian.
Many native Cambodian words are monosyllabic. These start with one or more consonants or an independent vowel (or a vowel sign attached to ʔɑː, which is a combination of both). Short vowels in stressed syllables are always followed by a consonant; long vowels need not be. There are many monosyllabic words that begin with consonant clusters, and some that end with clusters, although only one consonant is pronounced in syllable-final position.
There are also many bisyllabic words. In many cases the first syllable in a bisyllabic word is unstressed, and the vowel is usually rendered in colloquial speech as a schwa. Some bisyllabic words are compounds, however, and this may not apply.
Polysyllabic words are usually of Sanskrit, Pali or French origin. These words tend to alternate stress across their syllables, though not always.
Several vowel characters are composed of separate parts visually, eg. ើ aw/əː. The descendants of the anusvara and the visarga, called niʔkəhət និគ្គហិត and reə̆hmuk រះមុខ respectively, are also regarded as vowels in Khmer, even though they are pronounced as a vowel followed by m and h respectively. Two combinations of these characters with other vowel signs are regarded as vowels in the alphabet but are not encoded as single characters in Unicode (though they are defined as named sequences), ie. អាំ am/oə̆m and អុំ om/um.
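One way to confirm that អាំ and អុំ are sequences rather than single code points is to list their characters with Python's standard `unicodedata` module. The sketch below (the helper `describe` is hypothetical) also checks that NFC normalization leaves each sequence unchanged, since Unicode has no precomposed form for either.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """List the Unicode character names that make up a string."""
    return [unicodedata.name(ch) for ch in text]


AAM = "\u17A2\u17B6\u17C6"  # អាំ: letter QA + vowel sign AA + NIKAHIT
OM = "\u17A2\u17BB\u17C6"   # អុំ: letter QA + vowel sign U + NIKAHIT

for seq in (AAM, OM):
    print(seq, describe(seq))
    # Normalization does not fuse the parts: there is no precomposed form.
    assert unicodedata.normalize("NFC", seq) == seq
```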
Other diacritics also produce vowel sounds after or before the consonants they are attached to.
As mentioned above, an initial indicator of pronunciation is the class of the syllable initial consonant. Additional factors include whether this is an unstressed vowel, vowel harmony, and whether any of the special diacritics have been used to change the sound. For an in-depth treatment of pronunciation see Huffman in the sources section.
Inherent vowels. Khmer has two inherent vowels, ɑː and ɔː. The class of the consonant initially dictates which sound is appropriate, eg. ក kɑː vs. គ kɔː.
Inherent vowels are not pronounced after syllable final consonants.
Vowel signs. As mentioned above, in most cases, vowel signs attached to a consonant are pronounced differently, depending on the register of the consonant letter, eg. កា kaː vs. គា kiə.
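The dependence of the vowel on the consonant's register can be sketched as a lookup keyed on (vowel sign, register). The table is deliberately partial. It holds only the values quoted in these notes: the inherent vowels, ា, and the first-series ី from អី. The names `VOWELS` and `vowel_for` are hypothetical.

```python
FIRST_SERIES = set("កខចឆដឋណតថបផសហឡអ")
SECOND_SERIES = set("គឃងជឈញឌឍទធនពភមយរលវ")

# (vowel sign, register) -> pronunciation; "" means no sign (inherent vowel).
# Partial on purpose: only values quoted in these notes.
VOWELS = {
    ("", 1): "ɑː",
    ("", 2): "ɔː",
    ("\u17B6", 1): "aː",  # ា after a first-series consonant, eg. កា
    ("\u17B6", 2): "iə",  # ា after a second-series consonant, eg. គា
    ("\u17B8", 1): "əj",  # ី after a first-series consonant, eg. អី
}


def vowel_for(consonant: str, sign: str = "") -> str:
    """Look up the vowel heard after a consonant plus optional vowel sign."""
    if consonant in FIRST_SERIES:
        reg = 1
    elif consonant in SECOND_SERIES:
        reg = 2
    else:
        raise ValueError(f"not a standard Khmer consonant: {consonant!r}")
    return VOWELS[(sign, reg)]  # KeyError for combinations not in the table


print(vowel_for("ក"), vowel_for("គ"))                      # ɑː ɔː
print(vowel_for("ក", "\u17B6"), vowel_for("គ", "\u17B6"))  # aː iə
```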
Independent vowels. There are two ways of representing vowel sounds that are not preceded by a consonant.
The most common way is to add a vowel-sign to the character អ, eg. អី ʔəj.
There are also some independent vowel letters, but unlike most South Asian scripts, Khmer has fewer independent vowels than vowel signs, and some do not correspond directly to a vowel sign, eg. ឪ corresponds phonetically to the vowel plus consonant combination ូវ.
Whether an independent vowel sound is represented by an independent vowel letter or by the glottal consonant plus a vowel sign varies from word to word. In Cambodian orthography the two are not interchangeable. The independent vowel letters appear in relatively few words, but some of those words are quite common, eg. ឪពុក ʔəwpuk (father), ឲ្យ ʔaoj (to give) and ឮ lɨː (to hear).
Vowel harmony. In two-syllable words where the second syllable begins with one of the consonants ងញណនមយឡលរវ, the vowel class of the second syllable is the same as that of the first. For example, in ប្រយ័ត្ន prɑjat (to be careful), the second syllable starts with an ɔː class consonant, but the class of the preceding syllable turns the vowel into an ɑː class sound. There are, however, exceptions to this rule.
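The default harmony rule can be expressed as a small function. This is only a sketch. It assumes the registers of both syllables are already known, for example from the consonant class and cluster rules, and it ignores the exceptions just mentioned. `second_syllable_register` is a hypothetical name.

```python
# Second-syllable initials that take on the register of the first syllable.
HARMONY_INITIALS = set("ងញណនមយឡលរវ")


def second_syllable_register(first_reg: int, second_initial: str, second_reg: int) -> int:
    """Apply the default vowel-harmony rule for two-syllable words.

    first_reg      -- register (1 or 2) of the first syllable
    second_initial -- initial consonant of the second syllable
    second_reg     -- register the second syllable would have on its own
    """
    if second_initial in HARMONY_INITIALS:
        return first_reg
    return second_reg


# ប្រយ័ត្ន prɑjat: the first syllable is first series and យ is second series,
# but harmony gives the second syllable a first-series (ɑː class) vowel.
print(second_syllable_register(1, "យ", 2))  # 1
# ព is not in the list, so the second syllable keeps its own register.
print(second_syllable_register(1, "ព", 2))  # 2
```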
Final consonants. Not all Khmer consonants can appear in syllable-final position. The most common syllable-final consonants include កងញតនបមល. The pronunciation of a consonant in final position may differ from its normal pronunciation.
Subscript consonants. It is common to find clusters of consonants with no intervening vowel sounds. In Khmer, this is very common at the beginning of a word, but clusters also occur medially in multisyllabic words, and occasionally at the end of a word.
When two consonants occur together without an intervening vowel, the second is rendered in subscript form, called ជើងអក្សរ cəːŋʔɑʔsɑː (consonant feet), or 'coeng' in Unicode. Cambodians see these subscripts as distinct letter forms, but in Unicode they are produced by inserting U+17D2 KHMER SIGN COENG before the consonant that will become a subscript.
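Because the subscript is produced entirely by the invisible U+17D2, building a cluster in code just means inserting that character between the consonants. The following is a minimal sketch, in which the helper name `cluster` is hypothetical.

```python
COENG = "\u17D2"  # KHMER SIGN COENG: the next consonant renders as a subscript


def cluster(*consonants: str) -> str:
    """Join consonants into one cluster, placing COENG before every
    consonant after the first so that each of them becomes a subscript."""
    return COENG.join(consonants)


km = cluster("\u1780", "\u1798")                                   # ក + ្ម -> ក្ម
worker = "\u1780" + cluster("\u1798", "\u1798") + "\u1780\u179A"  # កម្មករ
print(km, worker, [f"U+{ord(c):04X}" for c in worker])
```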
Where the two consonants involved in the cluster are in different classes or registers, the pronunciation of any following vowel is normally determined by the register of the subscript consonant, eg. ស្ពាន spiən (bridge), where second-series ព gives ា its second-series value. For the following exceptions, however, the vowel pronunciation is determined by the register of the first consonant: ងញនមយរលវ, eg. ស្រី srəj (woman), where ី keeps the first-series value of ស.
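The register rule for two-consonant clusters can be sketched as follows. It assumes the word starts with a simple cluster written as consonant + U+17D2 + consonant, and it ignores register shifters and the finer points covered by Huffman. `cluster_register` and `series` are hypothetical names. The test words are ស្ពាន spiən (bridge) and ស្រី srəj (woman).

```python
FIRST_SERIES = set("កខចឆដឋណតថបផសហឡអ")
SECOND_SERIES = set("គឃងជឈញឌឍទធនពភមយរលវ")
# Subscripts that leave the register to the first consonant.
FIRST_DECIDES = set("ងញនមយរលវ")
COENG = "\u17D2"


def series(ch: str) -> int:
    if ch in FIRST_SERIES:
        return 1
    if ch in SECOND_SERIES:
        return 2
    raise ValueError(f"not a standard Khmer consonant: {ch!r}")


def cluster_register(word: str) -> int:
    """Register governing the vowel after a word-initial consonant or cluster."""
    first = word[0]
    if word[1:2] != COENG:          # no cluster: the consonant decides
        return series(first)
    sub = word[2]
    return series(first) if sub in FIRST_DECIDES else series(sub)


print(cluster_register("\u179F\u17D2\u1796\u17B6\u1793"))  # ស្ពាន (bridge) -> 2
print(cluster_register("\u179F\u17D2\u179A\u17B8"))        # ស្រី (woman) -> 1
```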
Some subscripts change the sound of the preceding consonant.
Subscript consonants that appear at the end of a word are silent, eg. ពេទ្យ peit; រដ្ឋ roat.
In some multisyllabic words a medial cluster may contain a final consonant for the first syllable and the initial consonant of the next syllable, eg. កម្មករ kɑmmɔkɑː (worker).
There are some clusters involving two subscripts. These are, with three exceptions, composed of a final nasal followed by a stop and r, eg. កន្ត្រៃ kɑntraj (scissors), កញ្ជ្រេង kɑɲcreːŋ (fox). The three exceptions are the loan words អង្គ្លេស ʔɑŋkleːh (English), សងស្ក្រិត sɑŋskret (Sanskrit), and សាស្ត្រាចារ្យ sɑstraːcaː (teacher).
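Because every subscript is encoded as U+17D2 followed by a consonant, stacked clusters like the one in កន្ត្រៃ can be found with a regular expression. Here is a sketch using Python's `re` module. It considers consonants only, not the subscript independent vowels, and the pattern and function names are hypothetical.

```python
import re

CONSONANT = "[\u1780-\u17A2]"
COENG = "\u17D2"
# COENG + consonant = one subscript; capture just the consonant.
SUBSCRIPT = re.compile(COENG + "(" + CONSONANT + ")")
# A base consonant carrying two or more subscripts.
DOUBLE_STACK = re.compile(CONSONANT + "(?:" + COENG + CONSONANT + "){2,}")


def subscripts(word: str) -> list[str]:
    return SUBSCRIPT.findall(word)


def double_stacks(word: str) -> list[str]:
    return DOUBLE_STACK.findall(word)


scissors = "\u1780\u1793\u17D2\u178F\u17D2\u179A\u17C3"  # កន្ត្រៃ kɑntraj
print(subscripts(scissors))     # the subscript consonants ត and រ
print(double_stacks(scissors))  # the stacked cluster ន្ត្រ
```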
It is rare but possible to find subscripts used after independent vowels. One common word spelled this way is ឲ្យ ʔaoj (to give).
It is also possible to find subscript forms of independent vowels. Four of these are named sequences in Unicode (they are listed in the character table at the end of these notes).
There is very little in the way of interaction between characters other than the subscript shapes used after the coeng generator.
Some small joining features occur in relation to ា and similarly shaped vowels. Unicode provides the following list of common forms:
Some reshaping of glyphs is needed to cope with stacking of characters. Compare for example the length of the final element in ង្យ and ង្ខ្យ.
Also, when museʔkətoə̯n ៉ or trəisaɓ ៊ appears with a vowel sign above the consonant, the ក្បៀសក្រោម kɓiəhkraom form is used. This looks exactly like the vowel sign sra-o (as in អុ); compare, eg., យ៉ាង with ម៉ឺន məɨn (10,000) or ញ៉ាំ ɲam (to eat). (This behaviour can be modified using U+200C ZERO WIDTH NON-JOINER.)
Another common feature is that ញ drops the swash below the baseline when followed by a subscript consonant, eg. បញ្ឆោត ɓɑɲcʰaot (to trick). Also, when it appears as a subscript under itself it uses a special full form subscript. Compare កញ្ញា kɑɲɲaa (young lady) and ប្រាជ្ញា praːcɲaa (intelligence).
Components of an 'orthographic syllable' should be composed in the following order:
This fixed ordering makes it easier to search for and collate text.
As mentioned above, although all combining characters follow the base in memory, the visual order of syllable components may not follow a linear progression from left to right. In the following example the order in which the glyphs are pronounced is far left, far right, down, left, left: កន្ត្រៃ kɑntraj (scissors). In ច្រៀង, the spoken order of the separate visible parts, numbered left to right, is 3, 2, 1+4, 5. Some vowel signs span two or three sides of the base consonant or cluster.
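Listing the code points of កន្ត្រៃ in memory order makes the mismatch visible. The subscript ro and the vowel sign ៃ are both drawn to the left of ន, yet they are stored after it, with ៃ last of all. Here is a small sketch with Python's `unicodedata`, in which the helper name `memory_order` is hypothetical.

```python
import unicodedata


def memory_order(word: str) -> list[str]:
    """Return 'U+XXXX NAME' for each code point, in storage order."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in word]


scissors = "\u1780\u1793\u17D2\u178F\u17D2\u179A\u17C3"  # កន្ត្រៃ kɑntraj
for line in memory_order(scissors):
    print(line)
```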
Space. Khmer words are not separated by spaces, so the space, ឃ្លា kliə, is regarded as punctuation, similar to the comma. Huffman lists the following uses:
Huffman gives the following example to show the use of the space:
ថ្ងៃនេះ ខ្ញុំទៅផ្សារ ទិញក្រូច អង្ករ ហើយនឹងអីវ៉ាន់ផ្សេង ៗ
tŋajnih kɲomtɨwpsaː tiɲkrouc ʔɑŋkɑː haəjnɨŋʔəjʋanpseiŋ pseiŋ
Today ( ) I'm going to the market ( ) to buy oranges ( ) rice ( ) and various things.
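Because spaces mark phrases rather than words, splitting Khmer text on U+0020 yields phrase-sized chunks, not words. Finding word boundaries would need a dictionary-based segmenter, which is beyond this sketch. The following is illustrated with Huffman's example sentence.

```python
sentence = "ថ្ងៃនេះ ខ្ញុំទៅផ្សារ ទិញក្រូច អង្ករ ហើយនឹងអីវ៉ាន់ផ្សេង ៗ"

# Splitting on the space gives phrase-level chunks, not words.
chunks = sentence.split(" ")
for chunk in chunks:
    print(chunk)

# The last chunk is U+17D7 KHMER SIGN LEK TOO (ៗ), which repeats the
# preceding word (pseiŋ pseiŋ in the transcription).
```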
Other punctuation. Khmer uses other punctuation marks described in the punctuation section below. In addition to its own punctuation characters, Khmer uses Western punctuation marks, such as the question mark (eg. ហេតុអ្វី? haetʰ aʋəi) and the exclamation mark (eg. កុំ! kom).
Hyphens are used to indicate when part of a word has been wrapped onto a new line.
Hyphens are also used between the parts of a person's name. Typically they join the family name (written first) to the following names, but for Chinese Cambodians all the names are often hyphenated, eg. ញ៉ុក-ថែម ɲok tʰaem, លី-ធាម-តេង liː tʰiəm teiŋ.
This is a list of main characters or character combinations needed for Khmer.
| Category | Characters |
|---|---|
| Consonants | ក ខ គ ឃ ង ច ឆ ជ ឈ ញ ដ ឋ ឌ ឍ ណ ត ថ ទ ធ ន ប ផ ព ភ ម យ ល ឡ រ ស ហ វ អ |
| Subscript consonants | ្ក ្ខ ្គ ្ឃ ្ង ្ច ្ឆ ្ជ ្ឈ ្ញ ្ដ ្ឋ ្ឌ ្ឍ ្ណ ្ត ្ថ ្ទ ្ធ ្ន ្ប ្ផ ្ព ្ភ ្ម ្យ ្ល ្ឡ ្រ ្ស ្ហ ្វ ្អ |
| Vowel signs | ា ិ ី ឹ ឺ ុ ូ ួ ើ ឿ ៀ េ ែ ៃ ោ ៅ ំ ះ |
| Combinations of vowel signs | ាំ ុំ ុះ េះ ោះ |
| Independent vowels | ឥ ឦ ឧ ឩ ឪ ឯ ឰ ឱ ឲ ឳ ឫ ឬ ឭ ឮ |
| Subscript independent vowels | ្ឧ ្ឯ ្ឫ ្ឬ |
| Combining marks | ៉ ៊ ់ ៌ ៍ ៎ ៏ ័ ៈ ្ |
| Punctuation | ។ ៕ ៖ |
| Numbers | ០ ១ ២ ៣ ៤ ៥ ៦ ៧ ៨ ៩ |
| Symbols | ៛ ៗ ៙ ៚ |
| Rare characters | ឝ ឞ ៜ ៝ ៑ |
| Deprecated characters | ឤ ឣ ឴ ឵ ៓ ឨ ៘ |
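The subscript rows of the table can be generated from the base rows, because each subscript is simply U+17D2 followed by the base letter. Here is a sketch in Python that uses the consonant row in the table's own order. The variable names are illustrative only.

```python
COENG = "\u17D2"

consonants = "ក ខ គ ឃ ង ច ឆ ជ ឈ ញ ដ ឋ ឌ ឍ ណ ត ថ ទ ធ ន ប ផ ព ភ ម យ ល ឡ រ ស ហ វ អ".split()
sub_consonants = [COENG + c for c in consonants]

# The four independent vowels with subscript forms (named sequences in Unicode).
sub_vowels = [COENG + v for v in "ឧឯឫឬ"]

print(" ".join(sub_consonants))
print(" ".join(sub_vowels))
```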
To see a list of ligatures and alternative shapes go to the 'shape' view of the Khmer character picker. (Hint: to see the composition of a conjunct, click on it and select 'Codepoints' or 'Analyse'.)