Notes/help

Updated 05-Aug-2017 • tags apps, pickers, khmer

This Unicode character picker allows you to produce or analyse runs of Khmer text. Character pickers are especially useful for people who don't know a script well, as characters are displayed in ways that aid identification.

If something is broken or missing, raise an issue. For version information, see the GitHub commit list.

Basic use

To produce characters in the text area, click on character shapes, or use your keyboard for Latin characters, delete, etc. Then cut & paste the result to your document, or use the controls to get further information about the characters. You must have JavaScript enabled.

Sample text. To add some sample text to the text area, click on the plus sign icon.

Fonts. To display the text properly, you will need to choose a font that is installed on your system or device, or use the web fonts downloaded with the page (Khmer OS Battambang WF and Noto Sans Khmer). The font list indicates which fonts are standard for Mac (Snow Leopard/Lion) and Windows 7. See more information about standard OS fonts in Mac and Windows.

Mobile devices. When working on an iPad or similar device, you should turn off Autofocus (just below the text area). This prevents the keyboard from popping up after you input every character. You may also need to select a character twice to add it to the text area.

About the chart

The default panel includes all the characters in the Unicode Khmer and Khmer Symbols blocks.

All text is output in Unicode normalisation form NFC by default. You can change to NFD or no normalisation by clicking on the buttons in the yellow area. Note that normalisation only takes place when you click on a character: text pasted into the box won't be normalised until you click on another character above, or click on a button in the yellow area.
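The normalisation behaviour can be sketched with Python's standard `unicodedata` module. This is a minimal sketch, not the picker's code: the helper name `add_to_text_area` is hypothetical, and the assumption that the whole text area is re-normalised on each click is inferred from the note about pasted text. A Latin example is used because the effect is easy to see.

```python
import unicodedata

def add_to_text_area(text_area: str, new_char: str, mode: str = "NFC") -> str:
    """Append a picked character, then normalise the whole text area.

    Hypothetical helper; mode is "NFC", "NFD" or "none". Re-normalising
    everything on each click is why pasted text only gets normalised
    once another character is added.
    """
    combined = text_area + new_char
    if mode in ("NFC", "NFD"):
        return unicodedata.normalize(mode, combined)
    return combined  # "none": leave the text exactly as entered

# e + COMBINING ACUTE ACCENT
print(len(add_to_text_area("e", "\u0301")))          # 1 (composed to U+00E9)
print(len(add_to_text_area("e", "\u0301", "NFD")))   # 2 (stays decomposed)
```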

Interactive help


Text area

This is where you see characters appearing as you select them from the panels lower down or where you paste text into the picker. Once you have some text here, you can perform various operations on it, or simply copy it to the clipboard for use elsewhere.

The controls just above the text area allow you to interact with the text in various ways. They mostly work on highlighted text within the text area, or, if there is no highlight, on all the text. Controls near the bottom of the picker allow you to change font, font size, line height, text direction, etc.

Selection area

Click on characters or buttons in this area to add them to the text area above.

Characters are arranged according to the order in which they are typed, to speed up picking.

Simple consonants are to the left in mostly alphabetic order. To their right are combining characters that follow the initial consonant, then subscript consonants, then vowels and other symbols. Independent vowels appear at the top, then combining vowel signs, then other combining marks. At the far right are digits and the currency symbol, and various other symbols and punctuation. Clicking on the subscript characters produces a coeng sign followed by a consonant.
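What a subscript click inserts can be shown in a few lines of Python. This is an illustrative sketch (the function `subscript` is hypothetical, not part of the picker); it relies only on the standard `unicodedata` module and the fact that a subscript consonant is encoded as U+17D2 KHMER SIGN COENG followed by the consonant.

```python
import unicodedata

COENG = "\u17D2"  # KHMER SIGN COENG

def subscript(consonant: str) -> str:
    """Return the sequence a subscript shape produces: coeng + consonant."""
    if unicodedata.name(consonant, "").startswith("KHMER LETTER "):
        return COENG + consonant
    raise ValueError(f"not a Khmer consonant: U+{ord(consonant):04X}")

# KHA + subscript MO + vowel sign AE + RO spells ខ្មែរ
word = "\u1781" + subscript("\u1798") + "\u17C2\u179A"
print([f"U+{ord(ch):04X}" for ch in word])
```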

Open the expanding link for obsolete and other less commonly used characters.

Controls above the text area

Controls above the input box allow you to run various operations on the text in the box. Most of them work on what you have selected within the box, or the whole box if nothing is selected.

Copy, select, delete, etc. The icons on the left above the input box allow you (from left to right) to copy the text to the clipboard, select the text, delete it, generate a URL that will reproduce for others what you see in the text box, add some sample text to the text area, and open this help file.
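A shareable URL of this kind typically carries the text as a percent-encoded query parameter. The sketch below assumes that approach; the base address `https://example.org/pickers/khmer/` and the parameter name `text` are hypothetical placeholders, not the picker's real URL scheme.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical address; the real picker URL and parameter name may differ.
PICKER_URL = "https://example.org/pickers/khmer/"

def share_url(text: str, base: str = PICKER_URL) -> str:
    """Encode the text-area contents as a UTF-8, percent-escaped query parameter."""
    return f"{base}?{urlencode({'text': text})}"

def text_from_url(url: str) -> str:
    """Recover the text from a shared URL, as a receiving page would."""
    return parse_qs(urlparse(url).query).get("text", [""])[0]

print(share_url("\u1781"))  # KHA is sent as its UTF-8 bytes %E1%9E%81
```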

Show codepoints. Produces a list of the Unicode code points in the input box. You can usually follow a link from a code point item to more detailed information about that character.
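A code point listing like this can be approximated with Python's `unicodedata.name`. This is a sketch, not the picker's implementation; `show_codepoints` is a hypothetical name, and unnamed characters (such as control characters) are given a placeholder label.

```python
import unicodedata

def show_codepoints(text: str) -> list[str]:
    """List each code point in the text with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]

for line in show_codepoints("\u1797\u17B6"):  # ភា
    print(line)
# U+1797 KHMER LETTER PHO
# U+17B6 KHMER VOWEL SIGN AA
```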

Convert to escapes. Opens a new window for the converter app, which shows various different ways of representing the text in the input box using escapes.
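As a rough illustration of what "escapes" means here, the sketch below produces three common forms. The converter app offers more; this hypothetical `to_escapes` helper covers only HTML hex references, JavaScript escapes and U+ notation.

```python
def to_escapes(text: str) -> dict[str, str]:
    """Represent text in a few common escape forms (a small subset)."""
    def js(cp: int) -> str:
        # \uXXXX inside the BMP; ES2015 \u{...} syntax beyond it
        return f"\\u{cp:04X}" if cp <= 0xFFFF else f"\\u{{{cp:X}}}"
    cps = [ord(ch) for ch in text]
    return {
        "html": "".join(f"&#x{cp:X};" for cp in cps),
        "javascript": "".join(js(cp) for cp in cps),
        "unicode": " ".join(f"U+{cp:04X}" for cp in cps),
    }

print(to_escapes("\u1797\u17B6"))
```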

Khmer to IPA. Produces output intended to approximate actual pronunciation. It uses the rules in Franklin Huffman's Cambodian System of Writing, but it needs some help from the user, because Khmer doesn't use spaces between words, and it is often ambiguous whether a consonant represents a syllable-final sound or a syllable in its own right. It also needs help to identify unstressed syllables. I don't have the means to do automatic word segmentation, so you will need to provide this information.

After the first syllable on the line, put an ordinary space before each consonant or independent vowel sign that begins a new syllable (not word). (This may split consonant clusters; the Khmer text will look strange, but the conversion will still work.) You should also indicate unstressed syllables by following the syllable with a hyphen, rather than a space. For many bisyllabic words, this means putting a hyphen after the first of the two syllables. For example, converting ប្រកាន់និទៀន to ប្រ-កាន់ និ-ទៀន will produce the transcription prɑkannitiən. If you don't know Khmer well enough to know when a syllable is unstressed, you can still get an approximation of the pronunciation using only spaces. For instance, the previous example separated by spaces only will yield prɑːkanniʔtiən.
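The marking convention amounts to a simple grammar: syllables separated by spaces or hyphens, where a trailing hyphen means "unstressed". The hypothetical parser below models only that input convention; it does not perform the IPA conversion, and the picker's own code may work differently.

```python
import re

def parse_syllables(marked: str) -> list[tuple[str, bool]]:
    """Split user-marked Khmer into (syllable, stressed) pairs.

    Syllables are separated by spaces or hyphens. A syllable followed by
    a hyphen is treated as unstressed; any other syllable as stressed.
    """
    pairs = []
    for m in re.finditer(r"([^\s-]+)([\s-]?)", marked):
        syllable, separator = m.groups()
        pairs.append((syllable, separator != "-"))
    return pairs

print(parse_syllables("ប្រ-កាន់ និ-ទៀន"))
```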

Remove space/hyphen. Removes the spaces and hyphens from the highlighted range (or from the whole output area, if nothing is highlighted).
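The "highlighted range, else everything" behaviour can be sketched as follows. `remove_separators` is a hypothetical helper; `start` and `end` stand in for the highlight boundaries.

```python
from typing import Optional

def remove_separators(text: str, start: int = 0, end: Optional[int] = None) -> str:
    """Strip spaces and hyphens from text[start:end] (the whole text by default)."""
    end = len(text) if end is None else end
    middle = text[start:end].replace(" ", "").replace("-", "")
    return text[:start] + middle + text[end:]

print(remove_separators("a-b c-d"))        # no highlight: all separators removed
print(remove_separators("a-b c-d", 0, 3))  # only the first three characters touched
```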

Although the transcription is based on rules by Franklin Huffman in Cambodian System of Writing, some symbols are changed to be more recognizable to those familiar with IPA. The transcription rules are quite detailed, and Khmer spelling is largely regular, but a few exceptions (particularly in words from Sanskrit or Pali) and some ambiguities (for example, in a few independent vowel signs) cause problems for the transcription. The transcription is not reversible. I created it to help me quickly produce (simple) phonetic alternatives for examples in my notes on Khmer.

Make example. This can speed up the creation of examples. You can create an example with up to four parts, delimited by /, in the following order: [1] Khmer text, [2] IPA transcription, [3] other transcription, [4] meaning. You don't need to add all four elements, but if you want to skip one in the middle of the sequence, leave it empty so that two slashes appear together (//).

For example, the following:

ភាសាខ្មែរ/pʰiːəsaː kʰmaːe//Khmer language

will produce:

<span class="ex" lang="km">ភាសាខ្មែរ</span> <span class="ipa">pʰiːəsaː kʰmaːe</span> <span class="meaning">Khmer language</span>

To get just the Khmer and the meaning you would use:

ភាសាខ្មែរ///Khmer language
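The slash-delimited format maps naturally onto a split-and-wrap function. This sketch reproduces the markup shown above for the Khmer, IPA and meaning parts; the class name `trans` for part [3] is a hypothetical guess, since this help text doesn't show what the picker uses for it.

```python
import html

def make_example(spec: str) -> str:
    """Turn 'khmer/ipa/other/meaning' into HTML spans, skipping empty parts."""
    parts = (spec.split("/") + ["", "", ""])[:4]
    # Opening tags for parts [1]-[4]; class="trans" is a hypothetical name.
    openers = ['<span class="ex" lang="km">', '<span class="ipa">',
               '<span class="trans">', '<span class="meaning">']
    return " ".join(f"{opener}{html.escape(part)}</span>"
                    for opener, part in zip(openers, parts) if part)

print(make_example("ភាសាខ្មែរ/pʰiːəsaː kʰmaːe//Khmer language"))
```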

Character markup. This may be useful to speed up the creation of markup for a specific character or set of characters. Select one or more characters in the text area, then click this button. It will return something like the following for each of the characters:

<span class="codepoint"><span lang="km">&#x1797;</span> <span class="uname">U+1797 KHMER LETTER PHO</span></span>

When you add it to your document, it will look like this.

ភ U+1797 KHMER LETTER PHO
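The markup pattern can be generated from the character's code point and Unicode name. This sketch reproduces the sample markup for U+1797; the function name `char_markup` and the choice to join multiple characters with newlines are assumptions for illustration.

```python
import unicodedata

def char_markup(text: str, lang: str = "km") -> str:
    """Build codepoint/name markup for each character, one per line."""
    return "\n".join(
        f'<span class="codepoint"><span lang="{lang}">&#x{ord(ch):04X};</span> '
        f'<span class="uname">U+{ord(ch):04X} {unicodedata.name(ch)}</span></span>'
        for ch in text
    )

print(char_markup("\u1797"))
```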

Secondary text area

This area receives the output of various tools. Note that the text is editable.

The icons to the right allow you to copy the contents of this area to the clipboard, insert the contents into the main text area, or close this subwindow, respectively. When you insert the contents of this subwindow into the main text area, the text replaces any highlighted text; otherwise, it is inserted at the current cursor position.

Some conversions produce ambiguous output. In this case, you will be offered two alternatives on a yellow background; for example, you may be presented with the alternatives 'h' or 't'. Click on the alternative you want, and the picker will discard the rest.

Character names

As you mouse over characters in the selection areas of the picker, you will see the code point and character name appear here.

Autofocus

When working on an iPad or similar device, you should set this to Off. This prevents the keyboard from popping up after you input every character.

Input aids

The vertical grey bar to the left allows you to turn on/off a number of panels that can help create the text you want.

Default. Turns off all input aids and closes all panels.

Hinting. Changes the selection area so that, when you mouse over a character, characters that are similar in appearance, and may be easily confused, are automatically highlighted. This can be particularly useful for people who are not familiar with the script, to avoid confusing similar characters, or to find the right character when two or more look similar.

Shape lookup. This adds a row of orange pictures that represent basic shapes associated with the Khmer characters. When you click on a picture, characters that incorporate that shape are highlighted. This is particularly helpful for those who don't know the script at all and want to pick characters based on their shape, or for those times when you just can't find the character you want and need a hint.

Each orange key below the table represents a significant part of the shape of two or more characters; when you click on the keys, characters and combinations of characters that incorporate that shape are highlighted above. Click on these characters to add them to the output.

Latin characters. Displays a panel of lowercase Latin characters you are likely to need for transcription.

Huffman transcription. Displays a panel that allows you to generate Khmer text from a transcription as used by Huffman in Cambodian System of Writing. Where there are multiple possible choices, these choices are presented in a small pop-up box; click on the choice you want to insert it into the output area.

A hyphen in a selection list for this or the following transcription panel indicates that the sound is produced without a Khmer character, ie. it is the inherent vowel.

In a small number of cases, you will need to click twice on the components that make up the sound (eg. when bantoc is used on the following consonant). These cases are indicated by a red plus sign between two clickable shapes (one of which may be just a hyphen). You need to click on the item to the left of the plus sign, then add a consonant, then click on the item to the right of the plus sign. In several cases the item to the left is a hyphen (representing the inherent vowel), in which case just add another consonant followed by the item to the right.

Gilbert transcription. Displays a panel that allows you to generate Khmer text from a transcription as used by Gilbert and Hang in Cambodian for Beginners.

See also the notes for the Huffman transcription mentioned just above.

Controls on the yellow background

Left-hand controls. These controls at the bottom of the page allow you to change the font, font size, line height, and height of the text area.

Add codepoint. You can add characters to the text area by typing codepoints and escapes in the Add codepoint field and pressing Return. The field accepts HTML numeric character references, JavaScript and other programming escapes, U+ Unicode notation, or just simple codepoint numbers separated by spaces. All codepoint numbers (including those in escapes) must be hexadecimal.
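Parsing such a mixed field comes down to one regular expression with an alternative per notation. The sketch below is an assumption about which forms are worth supporting (hex NCRs, `\u{…}`, `\uXXXX`, `\UXXXXXXXX`, `U+`, `0x` and bare hex); it is not the picker's own parser. In line with the hex-only rule, decimal references such as `&#6039;` are not handled and would be misread.

```python
import re

_ESCAPE = re.compile(
    r"&#x([0-9A-Fa-f]+);"        # HTML hex NCR: &#x1797;
    r"|\\u\{([0-9A-Fa-f]+)\}"    # JavaScript ES2015: \u{1797}
    r"|\\u([0-9A-Fa-f]{4})"      # JavaScript/Java: \u1797
    r"|\\U([0-9A-Fa-f]{8})"      # Python/C: \U00001797
    r"|[Uu]\+([0-9A-Fa-f]+)"     # Unicode notation: U+1797
    r"|0x([0-9A-Fa-f]+)"         # C-style hex literal: 0x1797
    r"|\b([0-9A-Fa-f]+)\b"       # bare hex number: 1797
)

def add_codepoints(field: str) -> str:
    """Convert every escape or hex number in the field to its character."""
    chars = []
    for m in _ESCAPE.finditer(field):
        hex_digits = next(g for g in m.groups() if g is not None)
        chars.append(chr(int(hex_digits, 16)))
    return "".join(chars)

print(add_codepoints(r"&#x1797; \u17B6 U+179F 17B6"))
```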

Search for. If you are searching for a particular character and know (at least part of) the name or the codepoint, type that in the search box and press Return. All characters with matching text in the name or codepoint number will be highlighted. The highlighting is only removed when you click on the X next to the search input field. You can also use regular expression syntax to improve your search results. For example, to find the letter 'ha' but not 'gha', etc., you can use \bha\b (or the shortcut :ha:).
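A name search of this kind can be sketched over the Khmer block (U+1780 to U+17FF) using Python's `re` and `unicodedata`. Treating the `:ha:` shortcut as `\bha\b` follows the text above; case-insensitive matching and the function name `search` are assumptions.

```python
import re
import unicodedata

def search(query: str, first: int = 0x1780, last: int = 0x17FF) -> list[int]:
    """Return code points whose name or hex number matches query (a regex)."""
    shortcut = re.fullmatch(r":(.+):", query)
    if shortcut:  # ":ha:" is shorthand for the whole-word pattern \bha\b
        query = r"\b" + re.escape(shortcut.group(1)) + r"\b"
    pattern = re.compile(query, re.IGNORECASE)
    hits = []
    for cp in range(first, last + 1):
        name = unicodedata.name(chr(cp), "")  # "" for unassigned code points
        if name and (pattern.search(name) or pattern.search(f"{cp:04X}")):
            hits.append(cp)
    return hits

print([f"U+{cp:04X}" for cp in search(":ha:")])  # ['U+17A0'] (KHMER LETTER HA)
```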

More controls

Click on more controls to reveal the less commonly used controls described here.

Normalise. All text is added to the main text area in Unicode normalisation form NFC by default. You can change to NFD or no normalisation by clicking on the buttons in the yellow area. Note that normalisation only takes place when you click on a character: text pasted into the box won't be normalised until you click on another character above, or click on a button in the yellow area.

Change table font. Allows you to change the font and size of the characters you click on in the main selection areas.

CC base. You would normally expect combining characters, such as accents and vowel signs, to be displayed with a dotted circle when they appear alone; however, browsers and fonts handle this inconsistently. The picker is initially set up for a particular web font, but if you change the table font you may need to adjust the display so that combining characters appear in a way that helps you click on them.

The CC base control allows you to specify a base character to be displayed before each combining character (or no base character at all). This should help with most font and browser combinations.
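The idea behind a CC base can be sketched as "prefix anything whose Unicode general category is a mark (Mn, Mc, Me) with a chosen base". U+25CC DOTTED CIRCLE as the default, and the helper name `display_label`, are assumptions for illustration; passing an empty string corresponds to "no base character".

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"

def display_label(ch: str, base: str = DOTTED_CIRCLE) -> str:
    """Prefix combining marks (general category M*) with a base for display."""
    if unicodedata.category(ch).startswith("M"):
        return base + ch
    return ch

print(display_label("\u17B6"))            # vowel sign AA shown on a dotted circle
print(display_label("\u17B7", base=""))   # no base character
```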

Copyright r12a@w3.org. Licence CC-By.