Updated 15 December, 2024
This page brings together basic information about the Chakma script and its use for the Chakma language. It aims to provide a brief, descriptive summary of the modern, printed orthography and typographic features, and to advise how to write Chakma using Unicode.
The information on this page is derived from the sources listed. Those sources are sometimes inconsistent or lacking in certain information. In addition, almost no IPA transcriptions were found for the few items in the term database. The information provided here should be reliable, but additional research is needed in some areas, many of which are noted in observations in the text.
Richard Ishida, Chakma (Chakma script) Orthography Notes, 15-Dec-2024, https://r12a.github.io/scripts/cakm/ccp
๐๐ข ๐ท ๐๐ฌ๐๐ด ๐๐๐ช๐๐ด ๐๐จ๐ข๐จ๐๐จ๐ฃ๐จ ๐ฅ๐ง๐ ๐๐จ๐๐ด๐๐ฎ๐๐ด ๐๐ ๐๐๐ด๐๐ฅ๐ ๐๐จ๐๐ฌ๐ญ ๐๐ง๐๐ด๐๐๐ด๐ ๐๐ข๐ข๐ด ๐๐ฌ๐ ๐๐ ๐๐ช๐๐ด๐๐จ ๐๐๐ฌ; ๐ฅ๐ฌ๐๐ง๐๐ณ๐ ๐ด ๐๐ฌ๐๐ด๐ ๐๐ง๐ข๐ด ๐๐ฌ๐๐ด๐๐ง๐๐ด ๐๐ข๐ฌ๐๐ด ๐๐ง๐๐ง๐ข๐ด ๐๐ณ๐ข๐ง๐๐จ ๐๐ง๐๐ด ๐๐ฎ๐ฃ๐ด ๐๐จ๐๐ณ๐ ๐ฌ ๐๐จ๐๐ฌ๐ญ ๐๐ง๐ฃ๐ ๐ ๐ช๐๐จ๐๐ด๐
๐๐ข ๐ธ ๐๐ฌ ๐ฃ๐ฌ๐๐จ ๐๐ง๐ ๐ฌ ๐๐ฌ๐๐ฌ๐ ๐ฌ ๐ฅ๐๐ฉ๐๐ง๐ ๐๐ ๐๐๐ด๐๐ฅ๐ง๐๐ข๐ด ๐๐ช๐๐จ๐จ, ๐๐ง๐ข๐ด๐๐ง, ๐๐ง๐ข๐ด๐๐ง, ๐ฅ๐จ๐๐ด๐, ๐๐๐ด, ๐ข๐๐ง๐๐จ๐๐จ๐๐ด ๐ ๐๐๐๐ณ๐ฆ๐ด๐๐ด ๐๐ง๐๐ด, ๐๐๐ฉ๐ ๐ด ๐ ๐ฅ๐๐๐จ๐๐ด ๐๐ช๐ข๐จ๐ ๐ช๐๐ณ๐ ๐ด, ๐๐ง๐๐ด๐๐ง, ๐ฅ๐ง๐๐ด๐๐ง๐๐จ๐จ ๐ ๐๐ง๐๐ณ๐ ๐ง ๐๐ง๐๐ง ๐๐จ๐๐ด๐๐ฎ๐๐ด ๐๐๐ด๐๐จ๐๐ฌ๐ข๐ด ๐๐ข ๐๐ฌ๐๐ด๐ ๐๐ฌ ๐ฅ๐ง๐ ๐๐๐ด๐๐ฅ๐ ๐๐ฌ๐๐ง๐ ๐๐ง๐๐ง ๐๐ฌ๐๐ด ๐๐ง ๐๐๐จ๐๐จ๐๐ฌ๐ข๐ด ๐ข๐๐ง๐๐จ๐๐จ๐๐ด, ๐ฅ๐จ๐๐จ๐๐ฌ ๐ ๐๐จ๐๐จ๐จ๐๐จ๐ข๐ด ๐๐จ๐๐ด๐๐ฎ๐๐ง ๐ ๐ซ๐๐ช๐ข๐ฌ ๐๐ข๐ด ๐๐ง๐๐ง ๐๐ง๐๐จ๐๐ฅ๐ฉ๐ข๐ด ๐๐ณ๐ข๐ง๐๐จ ๐๐ง๐๐ง๐ข๐ง๐๐ง๐๐ด ๐๐ฌ๐ข๐ง๐๐ด ๐๐ฌ๐ข๐ง๐๐ด ๐๐ง๐ข ๐๐ง ๐ฆ๐ง๐๐ง; ๐ฅ๐ฌ ๐๐ฌ๐๐ด ๐ ๐๐๐จ๐๐จ๐๐ฌ ๐ฅ๐๐ฉ๐๐ด ๐ฆ๐ฎ๐๐ด, ๐ฆ๐ฎ๐๐ด ๐๐ง๐๐จ๐๐ช๐๐ง๐ง, ๐๐ง๐ฅ๐ ๐จ๐๐ง๐ง๐ฅ๐ฅ๐จ๐๐ง ๐๐จ๐๐ ๐ฅ๐ข๐ด๐๐ง๐๐ฏ๐๐ง๐๐ง๐ง๐ข๐ด ๐๐ง๐๐ณ๐ ๐ง ๐๐ง๐๐ง ๐ฅ๐จ๐๐จ๐๐ฌ๐ข๐ด ๐๐จ๐๐จ๐ข๐ฌ๐
Source: Universal Declaration of Human Rights - Chakma, articles 1 & 2
Origins of the Chakma script, 7thC to today.
Phoenician
  └ Aramaic
    └ Brahmi
      └ Tamil-Brahmi
        └ Pallava
          └ Mon-Burmese
            └ Chakma
            + Burmese
            + Mon
            + Sgaw Karen
            + Shan
            + Tai Tham
            + Ahom
            + Tai Le
            + Khamti
Chakma is spoken by about 300,000 people in southeast Bangladesh and neighbouring parts of India [u]. The number of people who write their language in the Chakma script is small, however, as the majority use the Bengali script instead [ws]. The language and script have been introduced to non-governmental schools in Bangladesh and Mizoram (see Chakma script, https://www.youtube.com/watch?v=W4I4N0B7_8A).
๐๐๐ด๐๐ณ๐ฆ ๐๐ง๐๐๐๐ด
The Chakma script is an early offshoot from the Mon-Burmese script, and retains many of its forms and features. It is currently in danger of being replaced by the Bengali script, due to cultural and political developments over the past century.
More information: Unicode proposal • Endangered Alphabets
The Chakma script is an abugida, ie. each consonant contains an inherent vowel sound. See the table to the right for a brief overview of features for the modern Chakma orthography.
Chakma text runs left-to-right in horizontal lines. There is no case distinction. Words are separated by spaces.
Chakma represents native consonant sounds using 32 basic letters and a couple more for specialised orthographies.
Syllable-final consonants are typically written using U+11134 (maayyaa) to kill the inherent vowel of the consonant letter, but the anusvara (U+11101) and visarga (U+11102) diacritics may be used for -ŋ and -h, respectively.
The absence of an inherent vowel is usually indicated in modern text by the explicit diacritic U+11134 (maayyaa). However, 5 consonants (and occasionally more) may be subjoined to indicate a consonant cluster. A more old-fashioned alternative is to create ligatures rather than stacks.
U+11134 is also used to indicate geminated consonants, in which case the base consonant typically supports this diacritic plus a vowel sign.
Chakma is an abugida with an inherent vowel pronounced aː. Plain post-consonant vowel sounds are written using 7 combining marks and 3 more are used for diphthongs. Chakma has 1 pre-base vowel sign and 2 circumgraphs.
Four independent vowels are available for writing standalone vowels. Other standalone vowels can be written by attaching vowel signs to U+11103 (letter AA).
Nasalisation is indicated using U+11100 (candrabindu), which can be combined with either an anusvara or a visarga diacritic.
Chakma has a set of native digits, but sometimes Bengali digits may be used. It has a mixture of ASCII and Chakma code points for punctuation marks.
The following represents the repertoire of the Chakma language.
Phones in a lighter colour are non-native or allophones. Source Wikipedia.
Chakma is not a tonal language.
tbd
The following table summarises the main vowel to character assignments.
โ represents the inherent vowel. Diacritics are added to the vowels to indicate nasalisation (not shown here). The right-hand column lists independent vowels.
Simple | ||
---|---|---|
Diphthongs |
For additional details see vowel_mappings.
𑄇 ka U+11107 LETTER KAA
The inherent vowel for Chakma is aː (longer than the inherent vowels in Bangla and Hindi). So kaː is written by simply using the consonant letter, eg.
๐๐๐ข
The dropping of the inherent vowel for syllable codas in Chakma is marked using U+11134 (maayyaa).
๐๐๐ด
๐๐จ๐๐ด
๐๐ง๐ข๐ด๐๐ง๐๐ด
The same diacritic is also used to signal consonant clusters and gemination.
𑄇𑄨 ki U+11107 LETTER KAA + U+11128 VOWEL SIGN I
Plain post-consonant vowel sounds are written using 7 combining marks and 3 more are used for diphthongs. Chakma has 1 pre-base vowel sign and 2 circumgraphs.
Two of the vowel signs are spacing marks, meaning that they consume horizontal space when added to a base consonant.
All vowel signs are typed and stored after the base consonant, and the glyph rendering system takes care of the positioning at display time. When consonants are stacked the glyphs used to represent vowels, whether alone or in multipart vowels, are arranged around a syllable onset, which may be 2 consonants, rather than just around the immediately preceding consonant. See prebase and circumgraphs.
Chakma uses the following dedicated combining marks for basic vowels. They are all vowel signs.
The vowel sign U+11145 (vowel sign AA) is used to indicate an explicit aː sound in the Baarah Maatraa orthography.
Single-character vowel signs are used to write the following diphthongs.
The Baarah Maatraa orthography uses the vowel sign U+11146 (vowel sign EI) to write the sound eːi.
Other diphthongs appear to use multiple vowel signs over the same base consonant. These include:
Nasalisation is indicated using U+11100 (candrabindu).
This can also be used in syllables that end with an anusvara or a visarga.mh,2 For example, ๐๐๐.
Since both diacritics have the same combining class, the order in typing and storage should reflect the increasing distance from the base character.
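The point about ordering can be checked with a small Python sketch. It assumes the two diacritics in question are U+11100 (candrabindu) and U+11101 (anusvara), and uses KAA as an arbitrary base. The helper names are illustrative only. The sketch shows that normalisation will not reorder the two marks, so whatever order is typed is the order that is stored.

```python
import unicodedata

KAA = "\U00011107"          # CHAKMA LETTER KAA, used as a sample base
CANDRABINDU = "\U00011100"  # CHAKMA SIGN CANDRABINDU
ANUSVARA = "\U00011101"     # CHAKMA SIGN ANUSVARA


def same_combining_class(a: str, b: str) -> bool:
    """True when normalisation has no basis for reordering a and b."""
    return unicodedata.combining(a) == unicodedata.combining(b)


def survives_normalisation(text: str) -> bool:
    """True when neither NFC nor NFD changes the stored sequence."""
    return all(unicodedata.normalize(form, text) == text for form in ("NFC", "NFD"))


# The two possible storage orders; they remain distinct strings.
order_a = KAA + CANDRABINDU + ANUSVARA
order_b = KAA + ANUSVARA + CANDRABINDU

print(same_combining_class(CANDRABINDU, ANUSVARA))
print(survives_normalisation(order_a), survives_normalisation(order_b))
```

Because the two orders are never unified by normalisation, an application that wants to match them would need its own comparison rule.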
𑄇𑄬 ke U+11107 LETTER KAA + U+1112C VOWEL SIGN E
Chakma has one pre-base vowel sign.
This combining mark is always typed and stored after the base consonant. The rendering process places the glyph before the base consonant at the time of display.
๐๐ฌ๐๐ด
When this vowel is pronounced after a consonant cluster the vowel sign is typed and stored after the second consonant in the cluster but is displayed before the first consonant.
๐๐ฌ๐๐ด๐ณ๐ฆ๐ฌ๐
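The storage order described above can be made visible by listing code points. This is a minimal Python sketch: the `code_points` helper is hypothetical, and the KAA + RAA cluster is only an illustrative sequence, not an attested word.

```python
KAA = "\U00011107"           # CHAKMA LETTER KAA
RAA = "\U00011122"           # CHAKMA LETTER RAA
VIRAMA = "\U00011133"        # invisible character that requests stacking
VOWEL_SIGN_E = "\U0001112C"  # the pre-base vowel sign


def code_points(text: str) -> list:
    """List the code points of text in storage order."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Simple syllable: the vowel sign is stored after the consonant,
# even though its glyph is displayed to the left of it.
ke = KAA + VOWEL_SIGN_E

# Cluster: the vowel sign is stored after the second consonant,
# but a font displays it before the first one.
kre = KAA + VIRAMA + RAA + VOWEL_SIGN_E

print(code_points(ke))
print(code_points(kre))
```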
𑄇𑄮 ko U+11107 LETTER KAA + U+1112E VOWEL SIGN O
Chakma has 2 circumgraphs.
Like pre-base glyphs, these are single combining marks that are always stored after the base consonant. When rendered, the single code point produces multiple glyphs, which are placed on different sides of the base consonant.
These circumgraphs have canonically equivalent decomposed forms (see encoding).
The code point ๐ง is commonly used alone to represent the sound ษ, but the ๐ฑ and ๐ฒ code points are not usually found in text.
Dedicated vowel signs are available for long vowel sounds.
Composite vowels are only produced when the 2 circumgraphs are decomposed (see encoding).
At the beginning of a word, standalone vowels can be written using either one of four independent vowels or using combinations of vowel signs with U+11103 (letter AA).
The independent vowels are the following.
Other standalone vowels are written using vowel signs attached to U+11103 (letter AA), but there is also a modern trend to represent the sounds covered by the independent vowels using combinations, too. The following list shows just a few examples.
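As a rough illustration of the combining approach, the following Python sketch builds a standalone vowel by attaching a vowel sign to U+11103. The function name is hypothetical, and the choice of vowel signs is only an example.

```python
LETTER_AA = "\U00011103"     # CHAKMA LETTER AA, the carrier for vowel signs
VOWEL_SIGN_O = "\U0001112E"  # CHAKMA VOWEL SIGN O
VOWEL_SIGN_I = "\U00011128"  # CHAKMA VOWEL SIGN I


def standalone_vowel(vowel_sign: str) -> str:
    """Return the carrier letter followed by the given vowel sign."""
    return LETTER_AA + vowel_sign


standalone_o = standalone_vowel(VOWEL_SIGN_O)
standalone_i = standalone_vowel(VOWEL_SIGN_I)

print([f"U+{ord(ch):04X}" for ch in standalone_o])
```

Note that `standalone_i` is an example of the modern trend mentioned above: the sound could also be written with the independent vowel U+11104.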
This section maps Chakma vowel sounds to common graphemes in the Chakma orthography.
The left column shows dependent vowels, and the right column independent vowel letters.
dependent ๐จ
standalone ๐
dependent ๐ฉ
dependent ๐ช
standalone ๐
dependent ๐ซ
dependent ๐ฌ
standalone ๐
dependent ๐ฎ
dependent ๐ฌ
dependent ๐ง
diphthong ๐ฌ๐ญ
inherent vowel eg. ๐๐๐ข.
dependent ๐ Used by the Baarah Maatraa orthography.
standalone ๐
diphthong ๐ช๐๐จ
diphthong ๐ Used by the Baarah Maatraa orthography.
diphthong ๐ฐ
diphthong ๐ฏ
diphthong ๐ญ
nasalisation marker ๐
The following table summarises the main consonant to character assignments.
The right column contains aspirated sounds.
Onsets | ||
---|---|---|
Finals |
For additional details see consonant_mappings.
Whereas the table just above takes you from sounds to letters, the following simply lists the basic consonant letters (however, since the orthography is highly phonetic there is little difference in ordering).
Ganguly et al. say that native speakers don't distinguish between s and ʃ, and that there is also much interchangeability between s and t͡ʃ. The following 2 examples with IPA transcriptions in Wikipedia appear to illustrate this, and an ambivalence between kʰ and h, but more research is needed to completely map out the correspondences between written letters and sounds. For now we will stick with the correspondences conventionally ascribed in the resources seen.
๐๐ฎ๐ฃ๐๐ง๐ข๐ด
๐๐ง๐ข๐ด๐๐ง๐๐ด
Observation: It is worth noting, however, that recordings on YouTube by Bivuti Chakma pronounce ๐ and ๐ as haห. He also tends to pronounce ๐ and ๐ as saห. It isn't clear whether this is a dialect, or idiolect, or standard pronunciation.
Observation: Bivuti also appears to pronounce ๐ and ๐ as faห.
The following consonants were introduced for use with specialised orthographies.
U+11147 (letter VAA) is used for the sound v when writing Pali.
U+11144 (letter LHAA) is used for the aspirated sound lʰ in the Baarah Maatraa orthography.
Observation: There is an indication from the couple of terms below that multiple consonants can appear in syllable onsets, but this needs further investigation. The examples found both use stacked consonants, which may be significant. The combination with h may produce breathiness or aspiration(?).
๐๐๐ด๐๐ณ๐ฆ ๐๐๐ด
๐๐ณ๐ข๐จ๐๐ด๐จ๐
Observation: It's not clear whether a subjoined HA represents a way of indicating an aspirated or breathy consonant, or a syllable-initial h, or a syllable-final h. In the word for Chakma above it doesn't appear to be a syllable initial. However, there are other occurrences of a subjoined HA with come with a maayyaa above the stack, and this may indicate a different pronunciation, eg. ๐๐ง๐๐๐๐ณ๐ฆ๐ด.
General vowel suppression. The dropping of the inherent vowel for syllable codas in Chakma is marked using U+11134 (maayyaa).
๐๐๐ด
๐๐จ๐๐ด
๐๐ง๐ข๐ด๐๐ง๐๐ด
The same diacritic is also used to signal consonant clusters and gemination.
Syllable codas are generally marked using U+11134 (maayyaa) over an ordinary consonant letter, but some are indicated by stacking (or in older texts ligation) of consonant glyphs (see clusters).
๐๐ง๐๐ด
Marks for codas. Final ŋ and h can also be marked using the anusvara and visarga diacritics, U+11101 and U+11102, respectively.
๐ฆ๐จ๐ ๐ง๐
As a rule, consonant clusters only involve 2 consonants [mh,5].
Consonant clusters are visually indicated in one of the following ways.
This is the most common way of indicating a consonant cluster in modern Chakma writing [mh,3]. U+11134 (maayyaa) is a combining mark attached to and appearing above the first consonant in the cluster. It is always visible, and no shaping is applied to either consonant.
๐๐๐ด๐๐๐ด
U+11134 is also used to kill the inherent vowel when no cluster is involved (as shown at the end of the example above).
It is also used to indicate gemination when combined with a vowel sign. When it appears above a stack it indicates gemination of the initial consonant; it is not being used as a vowel killer.
๐๐๐ด๐ณ๐ฆ๐ช๐ข๐จ
๐๐ง๐๐ด๐ณ๐
Clusters can also be indicated by stacking the consonants. To tell the font to stack the letters, use the invisible character U+11133 (virama) between them.
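The two ways of encoding a cluster can be contrasted in a short Python sketch. The `cluster` helper and its `stacked` parameter are hypothetical, and KAA + RAA is an illustrative pair rather than an attested word.

```python
KAA = "\U00011107"      # CHAKMA LETTER KAA
RAA = "\U00011122"      # CHAKMA LETTER RAA
VIRAMA = "\U00011133"   # invisible; asks the font to stack the two letters
MAAYYAA = "\U00011134"  # visible vowel killer shown above the first letter


def cluster(first: str, second: str, stacked: bool = False) -> str:
    """Join two consonants with maayyaa (default) or with the stacking virama."""
    joiner = VIRAMA if stacked else MAAYYAA
    return first + joiner + second


visible_form = cluster(KAA, RAA)                # KAA + maayyaa + RAA
stacked_form = cluster(KAA, RAA, stacked=True)  # KAA + virama + RAA

print([f"U+{ord(ch):04X}" for ch in visible_form])
print([f"U+{ord(ch):04X}" for ch in stacked_form])
```

Whether the stacked form actually renders as a stack depends on the font supporting that particular conjunct.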
In 2001 an orthographic reform was proposed that would limit conjuncts to just 5 subjoined lettersmh,3, shown below in combination with ๐.
The 'subjoined' form of ๐ is actually conjoined, as in:
๐๐๐ด๐๐ณ๐ tอกสaหndjษ cฤndแบฤ
Observation: The letter HA commonly appears in subjoined form, but it isn't clear whether this indicates an aspirated onset or a final -h.
๐๐๐ข๐ณ๐ฆ
Observation: Some combinations of consonants are both stacked and have maayyaa above. More research is needed to understand this usage. See questions for more detail.
Ligated forms are now considered old-fashioned [mh,3]. In this style of writing, the second consonant in the cluster is often alongside the first, and both are shaped so that they join together.
More examples of these conjunct forms can be found in Everson & Hosken, p4.
Gemination is indicated using U+11134 (maayyaa). It is usually distinguished from the use for consonant clusters because a vowel sign is combined with the same base consonant [gc].
๐๐๐๐ด๐ญ
๐๐จ๐๐ด๐ฌ
When the maayyaa appears with a stacked consonant cluster, it is used in this role, ie. not to kill the vowel, but to lengthen the initial consonant.
๐๐๐ด๐ณ๐ฆ๐ช๐ข๐จ
๐๐ฃ๐ง๐๐ด๐ณ๐ฆ๐๐จ
The Noto and RibengUni fonts allow maayyaa to appear immediately after the initial consonant in a stack, or after the final consonant, with no difference in the rendered result, and it is possible to find examples encoded in both ways. Everson and the Unicode Standard (whose text is derived from Everson's proposal) seem to assume that both the virama and the maayyaa are present to kill a vowel, and their texts indicate that there is no justification for having both combining marks side by side in storage. However, since the maayyaa doesn't have the role of killing the vowel here, but instead indicates gemination of the initial character in the cluster, it is logical to use the order:
C + U+11134 (maayyaa) + U+11133 (virama) + C
This order is also confirmed as the appropriate one by Glass [cldt,177].
This section maps Chakma consonant sounds to common graphemes in the Chakma orthography.
Sounds listed as 'infrequent' are allophones, or sounds used for foreign words, etc. Light coloured characters occur infrequently.
consonant ๐
consonant ๐
consonant ๐
consonant ๐
consonant ๐
consonant ๐
consonant ๐
consonant ๐
consonant ๐
consonant ๐
consonant ๐
consonant ๐
consonant ๐
consonant ๐
consonant ๐
consonant ๐
consonant ๐
consonant ๐
consonant ๐
consonant ๐
consonant ๐
consonant ๐
consonant ๐ Used for Pali.
consonant ๐ฅ
consonant ๐
consonant ๐ก
consonant ๐ฅ
consonant ๐ฆ
final aspiration ๐ Final aspiration.
consonant ๐
consonant ๐
consonant ๐
consonant ๐
consonant ๐
consonant ๐
consonant ๐
final nasal ๐ Coda.
consonant ๐ค
consonant ๐ข
consonant ๐ข
consonant ๐ฃ
consonant ๐ Used by the Baarah Maatraa orthography.
consonant ๐
consonant ๐ก (Confirmation needed.)
This section offers advice about characters or character sequences to avoid, and what to use instead. It takes into account the relevance of Unicode Normalisation Form D (NFD) and Unicode Normalisation Form C (NFC).
Although usage is recommended here, content authors may well be unaware of such recommendations. Therefore, applications should look out for the non-recommended approach and treat it the same as the recommended approach wherever possible.
Two letters can be represented as an atomic character (the norm), or as a sequence of combining marks. The parts are separated in Unicode Normalisation Form D (NFD), and atomic in Unicode Normalisation Form C (NFC), so both approaches should be treated as canonically equivalent.
Atomic (recommended) | Decomposed (NOT recommended) |
---|---|
U+1112E (vowel sign O) | U+11131 U+11127 |
U+1112F (vowel sign AU) | U+11132 U+11127 |
Normally, text will use the atomic form, and this is generally recommended by the Unicode Standard.
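An application that wants to treat the two forms alike could normalise before comparing. The following is a minimal Python sketch using the standard `unicodedata` module. The function names are illustrative, and it assumes the Python build carries Unicode data that includes the Chakma block.

```python
import unicodedata

KAA = "\U00011107"                        # CHAKMA LETTER KAA, sample base
VOWEL_SIGN_O = "\U0001112E"               # atomic form (recommended)
O_DECOMPOSED = "\U00011131\U00011127"     # O MARK + VOWEL SIGN A
VOWEL_SIGN_AU = "\U0001112F"              # atomic form (recommended)
AU_DECOMPOSED = "\U00011132\U00011127"    # AU MARK + VOWEL SIGN A


def to_recommended(text: str) -> str:
    """Convert text to NFC, which yields the atomic vowel signs."""
    return unicodedata.normalize("NFC", text)


def same_text(a: str, b: str) -> bool:
    """Compare two strings while ignoring atomic/decomposed differences."""
    return to_recommended(a) == to_recommended(b)


print(same_text(KAA + VOWEL_SIGN_O, KAA + O_DECOMPOSED))
print(same_text(KAA + VOWEL_SIGN_AU, KAA + AU_DECOMPOSED))
```

Normalising to NFC on input, rather than only at comparison time, is one way to keep stored text in the recommended form.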
The following atomic characters look as if they could be composed of parts, but in fact there is no equivalence during normalisation, and so the atomic characters only should be used.
Atomic | Sequence ( DO NOT use! ) |
---|---|
๐ฐ | 1112D 11127 |
๐ฎ | 11127 11133 11124 |
๐ซ | 1112A 1112A |
๐ | 11101 11101 |
Combining marks always follow the base character.
Where present, characters in an orthographic syllable should always occur in the following order.
A number of words contain both ๐ด and ๐ณ in the same consonant cluster. It is possible to find both of the following sequences of characters in online text:
C + U+11133 (virama) + C + U+11134 (maayyaa)
C + U+11134 (maayyaa) + U+11133 (virama) + C
The Noto and RibengUni fonts support either ordering, with no difference in the rendered result.
Everson and the Unicode Standard (whose text is derived from Everson's proposal) seem to assume that both the virama and the maayyaa are present to kill a vowel, and they have text to indicate that there is no justification for having both combining marks side by side in storage. However, since the maayyaa doesn't have the role of killing the vowel here, but instead indicates gemination of the initial character in the cluster, it is logical to use the order:
C + U+11134 (maayyaa) + U+11133 (virama) + C
The second consonant is usually ๐ or ๐ฆ. The following are examples found in a single page.
It is worth noting that the maayyaa is rendered over the initial letter in the conjunct, regardless of the code point sequence in memory.
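Since both orders occur in online text, an application might rewrite one to the other before searching or comparing. The Python sketch below is one possible approach, not an established algorithm. It assumes the consonants are U+11107 to U+11126 plus U+11144 and U+11147, and that any consonant + virama + consonant + maayyaa sequence is a geminated stack, which may not hold for every word.

```python
import re

VIRAMA = "\U00011133"
MAAYYAA = "\U00011134"
# Assumed consonant set: KAA..HAA, plus LHAA and VAA.
CONSONANT = "[\U00011107-\U00011126\U00011144\U00011147]"

_STACK_THEN_MAAYYAA = re.compile(f"({CONSONANT}){VIRAMA}({CONSONANT}){MAAYYAA}")


def prefer_maayyaa_first(text: str) -> str:
    """Rewrite C + virama + C + maayyaa as C + maayyaa + virama + C."""
    return _STACK_THEN_MAAYYAA.sub(
        lambda m: m.group(1) + MAAYYAA + VIRAMA + m.group(2), text
    )


KAA = "\U00011107"
HAA = "\U00011126"
found_online = KAA + VIRAMA + HAA + MAAYYAA
print(prefer_maayyaa_first(found_online) == KAA + MAAYYAA + VIRAMA + HAA)
```

Text already in the preferred order does not match the pattern, so the function can be applied repeatedly without further change.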
Chakma has a set of native digits.
Bengali digits may also be used.
Myanmar digits are used when the Chakma script is used to write the Tanchangya language [mh,6].
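Because all three digit sets are decimal and encoded in contiguous runs, converting between them and integers is simple arithmetic. The following Python sketch assumes the zero digits are U+11136 (Chakma), U+09E6 (Bengali) and U+1040 (Myanmar). The function names are illustrative.

```python
CHAKMA_ZERO = 0x11136   # CHAKMA DIGIT ZERO
BENGALI_ZERO = 0x09E6   # BENGALI DIGIT ZERO
MYANMAR_ZERO = 0x1040   # MYANMAR DIGIT ZERO
_ZEROS = (CHAKMA_ZERO, BENGALI_ZERO, MYANMAR_ZERO)


def to_digits(number: int, zero: int = CHAKMA_ZERO) -> str:
    """Render a non-negative integer using the digit set starting at zero."""
    if number < 0:
        raise ValueError("only non-negative integers are supported")
    return "".join(chr(zero + int(d)) for d in str(number))


def from_digits(text: str) -> int:
    """Parse a string made of Chakma, Bengali or Myanmar digits."""
    value = 0
    for ch in text:
        for zero in _ZEROS:
            if zero <= ord(ch) <= zero + 9:
                value = value * 10 + (ord(ch) - zero)
                break
        else:
            raise ValueError(f"not a supported digit: U+{ord(ch):04X}")
    return value


year = to_digits(2024)
print([f"U+{ord(ch):04X}" for ch in year])
print(from_digits(year))
```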
Chakma text runs left to right in horizontal lines.
Experiment with examples using the Chakma character app.
The glyphs used for Chakma in India and Bangladesh differ slightly in roundness (similar to variation in the Tai Tham script as used in Northern Thai and Tai Khün) [mh,1].
Base characters can carry multiple combining marks. For example, in addition to a vowel sign a base consonant may carry one or more of the following diacritics: ๐ด, ๐, ๐, ๐. In some cases the glyphs for multiple combining marks need to be positioned side by side or carefully positioned relative to each other, as shown in the examples just below.
๐๐๐๐ด๐ญ
๐๐จ๐ ๐ฎ๐
Generally speaking, there is no interaction between consonant characters, but where consonant characters are stacked or ligated then it becomes necessary for the font to apply the needed shaping and placement of glyphs.
๐๐๐ด๐๐ณ๐ฆ
Most subjoined letters are just smaller versions of the original consonant letter, but significantly different shapes are used for subjoined r and y. Compare the following:
components | rendered |
---|---|
๐๐ณ๐ข |
๐๐ณ๐ข |
๐๐ณ๐ |
๐๐ณ๐ |
For example:
๐๐ณ๐ข๐จ๐๐ด๐จ๐
Words are separated by spaces.
Some words are hyphenated. For example:
๐ข๐ง๐ฅ๐ด-๐๐ง๐ฅ๐ด rษs-kษs
tbd
Chakma uses a mixture of ASCII and native punctuation.
phrase |
, ; |
---|---|
sentence |
๐ ๐ ๐ |
section |
๐ |
The shape of ๐ can vary, including some shapes that look like flowers or leaves.mh,6
Observation: Other punctuation marks may be in use, especially things such as colon and exclamation mark. Further research is needed to establish the complete set.
See type samples.
Chakma commonly uses ASCII parentheses to insert parenthetical information into text.
 | start | end |
---|---|---|
standard | ( | ) |
Lines are generally broken between words.
As in almost all writing systems, certain punctuation characters should not appear at the end or the start of a line. The Unicode line-break properties help applications decide whether a character should appear at the start or end of a line.
The following list gives examples of typical behaviours for certain characters. Context may affect the behaviour of some of these.
Line breaking should not split any combining mark from its base character, either.
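One way to respect this rule is to group text into unbreakable units before looking for break opportunities. The Python sketch below is a simplification, not the Unicode line-breaking algorithm: it keeps every combining mark with its base, and additionally assumes that a consonant following U+11133 (virama) belongs to the same stack. The function name is hypothetical.

```python
import unicodedata

VIRAMA = "\U00011133"  # stacking character; the next letter is subjoined


def unbreakable_units(text: str) -> list:
    """Split text into units that a line break must not divide."""
    units = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        follows_virama = bool(units) and units[-1].endswith(VIRAMA)
        if units and (is_mark or follows_virama):
            units[-1] += ch   # keep with the preceding base or stack
        else:
            units.append(ch)  # start a new unit
    return units


KAA = "\U00011107"
RAA = "\U00011122"
MAAYYAA = "\U00011134"
VOWEL_SIGN_I = "\U00011128"

# Illustrative sequence, not an attested word.
sample = KAA + MAAYYAA + KAA + VIRAMA + RAA + VOWEL_SIGN_I + " " + KAA
print(len(unbreakable_units(sample)))
```

Breaks would then only be considered between units, and in practice mainly at the spaces between words.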
See type samples.
Chakma has a native numeric style. Follow the type samples link above for a real world example.
The Chakma numeric style is decimal-based and uses these digits.
Examples:
Generally, Chakma lists use a full stop plus a space as a suffix.
Examples: