Burmese

Myanmar orthography notes

Updated 5 June, 2023

This page brings together basic information about the Myanmar script and its use for the Burmese language. It doesn't cover use of the Burmese orthography for writing Pali. It aims to provide a brief, descriptive summary of the modern, printed orthography and typographic features, and to advise how to write Burmese using Unicode.

Sample


အပိုဒ် ၁ လူတိုင်းသည် တူညီ လွတ်လပ်သော ဂုဏ်သိက္ခာဖြင့် လည်းကောင်း၊ တူညီလွတ်လပ်သော အခွင့်အရေးများဖြင့် လည်းကောင်း၊ မွေးဖွားလာသူများ ဖြစ်သည်။ ထိုသူတို့၌ ပိုင်းခြား ဝေဖန်တတ်သော ဉာဏ်နှင့် ကျင့်ဝတ် သိတတ်သော စိတ်တို့ရှိကြ၍ ထိုသူတို့သည် အချင်းချင်း မေတ္တာထား၍ ဆက်ဆံကျင့်သုံးသင့်၏။

အပိုဒ် ၂ လူတိုင်းသည် လူ့အခွင့် အရေး ကြေညာစာတမ်းတွင် ဖော်ပြထားသည့် အခွင့်အရေး အားလုံး၊ လွတ်လပ်ခွင့် အားလုံးတို့ကို ပိုင်ဆိုင် ခံစားခွင့်ရှိသည်။ လူမျိုးနွယ်အားဖြင့် ဖြစ်စေ၊ အသားအရောင်အားဖြင့် ဖြစ်စေ၊ ကျား၊ မ၊ သဘာဝအားဖြင့် ဖြစ်စေ၊ ဘာသာစကားအားဖြင့် ဖြစ်စေ၊ ကိုးကွယ်သည့် ဘာသာအားဖြင့် ဖြစ်စေ၊ နိုင်ငံရေးယူဆချက်၊ သို့တည်းမဟုတ် အခြားယူဆချက်အားဖြင့် ဖြစ်စေ၊ နိုင်ငံနှင့် ဆိုင်သော၊ သို့တည်းမဟုတ် လူမှုအဆင့်အတန်းနှင့် ဆိုင်သော ဇစ်မြစ် အားဖြင့်ဖြစ်စေ၊ ပစ္စည်း ဥစ္စာ ဂုဏ်အားဖြင့် ဖြစ်စေ၊ မျိုးရိုးဇာတိအားဖြင့် ဖြစ်စေ၊ အခြား အဆင့်အတန်း အားဖြင့် ဖြစ်စေ ခွဲခြားခြင်းမရှိစေရ။ ထို့ပြင် လူတစ်ဦး တစ်ယောက် နေထိုင်ရာ နိုင်ငံ၏ သို့တည်းမဟုတ် နယ်မြေဒေသ၏ နိုင်ငံရေးဆိုင်ရာ ဖြစ်စေ စီရင် ပိုင်ခွင့်ဆိုင်ရာ ဖြစ်စေ တိုင်းပြည် အချင်းချင်း ဆိုင်ရာဖြစ်စေ၊ အဆင့်အတန်း တစ်ခုခုကို အခြေပြု၍ သော်လည်းကောင်း၊ ဒေသနယ်မြေတစ်ခုသည် အချုပ်အခြာ အာဏာပိုင် လွတ်လပ်သည့် နယ်မြေ၊ သို့တည်းမဟုတ် ကုလသမဂ္ဂ ထိန်းသိမ်း စောင့်ရှောက် ထားရသည့် နယ်မြေ၊ သို့တည်းမဟုတ် ကိုယ်ပိုင် အုပ်ချုပ်ခွင့် အာဏာတို့ တစိတ်တဒေသလောက်သာ ရရှိသည့် နယ်မြေ စသဖြင့် ယင်းသို့ သော နယ်မြေများ ဖြစ်သည်၊ ဖြစ်သည် ဟူသော အကြောင်းကို အထောက်အထား ပြု၍ သော်လည်းကောင်း ခွဲခြားခြင်း လုံးဝ မရှိစေရ။

Usage & history

The Myanmar script is used to write Burmese and, with various extensions and adaptations, for other languages in the region, such as Mon, Karen, Kayah, Shan, and Palaung. It is also used to write Pali and Sanskrit.

မြန်မာအက္ခရာ mjàɴmà ʔɛʔkʰa̰jà Burmese alphabet

The Myanmar script descends from the Brahmi script via Pallava and Old Mon, and early evidence of it dates back to around the 10th century. The originally square shapes evolved into the rounded forms we see today around the 17th century, supposedly to improve writing techniques on palm leaves.

Sources: Scriptsource, Wikipedia.

Basic features

The script is an abugida, ie. consonants carry an inherent vowel sound that is overridden, where needed, using vowel signs. The following is a brief overview of features of the modern Burmese orthography.

Myanmar text runs left to right in horizontal lines.

Spaces separate phrases, rather than words. ❯ words

The 26 consonant letters used for pure Burmese words are supplemented by 9 more which are used in Pali loan words. ❯ consonants

Consonant stacking is used in multi-syllabic words (mostly derived from Pali) to indicate doubled or homorganic consonant clusters. Subjoined (subscripted) forms are produced using a dedicated, invisible virama character. Conjuncts do not span word boundaries. ❯ clusters

Syllable-initial clusters use 4 dedicated combining marks for the medial. Aspirated onset consonants are common: some have dedicated letters, others are indicated with a subjoined h. ❯ onsets

Syllable-final consonant sounds use ordinary characters with an always visible mark called asat to indicate that the inherent vowel is killed. There is also one dedicated final consonant (the anusvara). ❯ finals

The Burmese orthography is an abugida with 4 inherent vowels: one for open syllables, and that plus 3 more for closed syllables.

Non-inherent vowel sounds that follow a consonant are written using combining marks (mostly vowel signs), 2 consonants, and the asat diacritic. ❯ vowels

This page lists just 6 multipart vowels. They can involve up to 3 glyphs, and glyphs can surround the base consonant(s) on up to 2 sides. There is 1 pre-base glyph, and no circumgraphs. ❯ compositeV

The pronunciation of some vowel graphemes may vary in open and closed syllables.

Burmese doesn't have true standalone vowels, since syllables begin with a consonant, including the glottal stop. A syllable beginning with a glottal stop can be written using vowel signs applied to [U+1021 MYANMAR LETTER A], but may also be represented by one of an incomplete set of independent vowels, mostly used for Pali or Sanskrit words. ❯ standalone

Burmese has 4 tones, one of which is used exclusively in closed syllables. The tone of an open syllable can be indicated by the vowel used, or by combining a vowel and one of 2 combining marks. In a couple of instances an asat is used to indicate tone information, rather than attaching a tone mark. ❯ tones

Character index

Letters


Basic consonants

ပ␣ဖ␣ဗ␣ဘ␣တ␣ထ␣ဒ␣က␣ခ␣ဂ␣အ␣သ␣ဿ␣စ␣ဆ␣ဇ␣ဟ␣မ␣န␣ည␣ဉ␣င␣ဝ␣ရ␣လ␣ယ

Pali loan words

ဓ␣ဃ␣ဈ␣ဠ
ဋ␣ဌ␣ဍ␣ဎ␣ဏ

Independent vowels

ဣ␣ဤ␣ဥ␣ဦ␣ဧ␣ဩ␣ဪ

Combining marks


Vowel signs

ေ␣ိ␣ီ␣ု␣ူ␣ဲ␣ာ␣ါ

Tones

့␣း

Medials

ျ␣ြ␣ွ␣ှ

Bindu

ံ

Invisible stacker

္

Pure killer

်

Numbers

၀␣၁␣၂␣၃␣၄␣၅␣၆␣၇␣၈␣၉

Punctuation

၊␣။␣“␣”␣၌␣၍␣၎␣၏

ASCII

(␣)␣:␣?

Other

‌␣​␣⁠

To be investigated

!␣§␣͏␣‍␣–␣—␣‘␣’␣†␣‡␣…␣′␣″

Phonology


Some of the phones listed are non-native or allophones. Source: Wikipedia.

Vowel sounds

Plain vowels

i ĩ u ũ e o ə ə ɛ ɔ a ã

Diphthongs

u̯a u̯ɛ u̯e ẽɪ õʊ əɨ əɨ ãɪ ãʊ

Some of the above sounds can only occur in open syllables, others only in closed syllables. As a rough rule, the plain vowels occur in open syllables, and the diphthongs in closed.

The sound ə only occurs in minor syllables, and is the only sound occurring in those syllables.

Vocalic weakening

A process called vocalic weakening affects the first syllables of certain words (mostly nouns and adverbs), eg. ထမင် is pronounced tʰəmɪ̀ɴ, not tʰa̰mɪ̀ɴ; ဘုရား is pronounced pʰəjá, not pʰṵjá.

Consonant sounds

labial dental alveolar post-alveolar palatal velar glottal
stops p b t d       k ɡ ʔ
aspirated
affricates       t͡ʃ d͡ʒ
aspirated       t͡ʃʰ
fricatives   θ ð s z ʃ     h
aspirated
nasals m   n   ɲ ɲ̊ ŋ ŋ̊
approximants w ʍ   l   j  
trills/flaps     r  

Tones

Burmese is a contour tone language, with four tones: creaky, low, high and stopped. The tones involve more than simple pitch variations.

Figure: Myanmar tones.m,7

Syllables ending with a stop consonant (always pronounced ʔ) always use the stopped tone, and no others.

Syllables that end with a vowel sound or the nasal sound (always pronounced ɴ) have one of the other three tones.

The following table provides typical phonological transcriptions and descriptions for the four tones, using a as the base for the examples.wbl,#Tones

stopped a ăʔ˥˧ short duration, high intensity, high pitch, final glottal stop
creaky aˀ˥˧ medium duration, high intensity, high pitch, often slightly falling
high á aː˥˥˦ long duration, high intensity, high pitch, often with fall before pause, sometimes breathy
low à aː˧˧˦ medium duration, low intensity, low pitch, often slightly rising

Structure

The Burmese language is tonal and many native Burmese words are monosyllabic.

Syllables can start with a consonant or initial vowel. An initial consonant may be followed by a medial consonant, which adds the sound j or w. After the vowel, a syllable may end with a nasalisation of the vowel or an unreleased glottal stop, though these final sounds can be represented by various different consonant symbols.

In multisyllabic words derived from an Indian language such as Pali, where two consonants occur internally with no intervening vowel, the consonants tend to be stacked vertically, and the asat sign is not used.

Orthographic syllables

Algorithms for manipulating text at a sub-word level use orthographic syllable boundaries, rather than phonetic syllable boundaries.

The boundaries of an orthographic syllable may differ from those of a phonetic syllable, particularly in the presence of a conjunct. For more details, see graphemes.

ဣန္ဒြေ      ဣန္ဒြေ
The same word, split into phonetic syllables (left) and orthographic syllables (right).
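As a rough illustration of how software might find orthographic syllables, the following Python sketch starts a new syllable at each base character (consonant, independent vowel, GREAT SA, digit or punctuation), except when a consonant follows the invisible virama (it is stacked) or is killed by an asat (it closes the previous syllable). The helper names are hypothetical, and the sketch assumes text stored in the order Unicode recommends; it ignores Myanmar extensions for other languages and does not treat kinzi specially.

```python
VIRAMA = "\u1039"     # invisible stacker: the next consonant is subjoined
ASAT = "\u103A"       # visible killer: marks a syllable-final consonant
DOT_BELOW = "\u1037"  # may be stored between a final consonant and ASAT


def is_base(ch: str) -> bool:
    """Consonants, independent vowels, GREAT SA, digits and punctuation."""
    cp = ord(ch)
    return 0x1000 <= cp <= 0x102A or cp == 0x103F or 0x1040 <= cp <= 0x104F


def is_killed(text: str, i: int) -> bool:
    """True if the letter at index i is followed by ASAT (skipping DOT BELOW)."""
    j = i + 1
    while j < len(text) and text[j] == DOT_BELOW:
        j += 1
    return j < len(text) and text[j] == ASAT


def orthographic_syllables(text: str) -> list[str]:
    syllables: list[str] = []
    current = ""
    for i, ch in enumerate(text):
        if ch.isspace():  # spaces separate phrases, so they always end a syllable
            if current:
                syllables.append(current)
            current = ""
            continue
        prev = text[i - 1] if i > 0 else ""
        starts_new = is_base(ch) and prev != VIRAMA and not is_killed(text, i)
        if starts_new and current:
            syllables.append(current)
            current = ""
        current += ch
    if current:
        syllables.append(current)
    return syllables


print(orthographic_syllables("\u1023\u1014\u1039\u1012\u103C\u1031"))  # ဣန္ဒြေ
```

For ဣန္ဒြေ this yields two orthographic syllables, ဣ and န္ဒြေ, whereas the phonetic syllable boundary falls between န and ဒ.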

Vowels

Dashes are used to indicate whether the character represents a vowel sound in a closed or an open syllable.

Vowel summary

The pronunciation of the vowel sign often depends on whether it appears in an open or closed syllable, eg. compare the following, which are open and closed, respectively.

ဆိုး ဆိုင်

Open syllables typically contain plain vowels, while closed ones contain diphthongs.

Plain:
ိ␣ီ␣ ␣ွ␣ ␣ု␣ူ
ေ␣ ␣ို
ဲ␣ယ်␣ ␣ော␣ေါ␣ော်␣ေါ်
ာ␣ါ
Diphthongs:
ိ␣ ␣ု
ို␣ော␣ေါ
Standalone:
ဣ␣အိ␣ဤ␣အီ␣ ␣ဥ␣အု␣ဦ␣အူ
ဧ␣အေ
အဲ␣ ␣ဩ␣ဪ
အ␣အာ␣အါ
အိ␣ ␣အု

For additional details see vowel_mappings.

Inherent vowels

က ka U+1000 MYANMAR LETTER KA

In open syllables the inherent vowel is usually transcribed and pronounced in Burmese as a, but very often reduced phonetically to ə. So ka is written by simply using the consonant letter.

In closed syllables, the inherent vowel is pronounced as one of ɪ, e, a, or ɛ, depending on the final consonant that follows, eg. နှစ်

Combining marks used for vowels

ကိ ki U+1000 MYANMAR LETTER KA + U+102D MYANMAR VOWEL SIGN I

Burmese uses the following combining marks for vowels.

ိ␣ီ␣ု␣ူ␣ွ␣ေ␣ဲ␣ာ␣ါ␣်

102B is just an alternative shape for [U+102C MYANMAR VOWEL SIGN AA]. See shape_variants.

In closed syllables, 103D is pronounced ʊ (rather than the glide w) when it is not followed by another vowel mark, eg.

နွမ်း

103A is more typically used to indicate a final consonant in a syllable, but it is also used as part of several multipart vowels.

Some vowels are represented using more than one character. There are no single-character circumgraphs in Burmese text, but multiple combining marks following a consonant may appear on opposite sides of the base to represent a single vowel (or diphthong) (see compositeV).

Three vowel signs are always spacing marks, meaning that they consume horizontal space when added to a base consonant. The 2 vowel signs positioned below the base may also become spacing glyphs if they follow a consonant cluster.

All combining characters are stored after the base consonant, and the glyph rendering system takes care of the positioning at display time. Some input systems may allow the user to type a pre-base vowel before the base consonant, but it is still stored after.
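The stored order can be checked by listing a string's code points. The sketch below uses only Python's standard unicodedata module; the helper name code_points is illustrative. For ကေ it shows the pre-base vowel sign E stored after KA, and for the cluster ကြော် it shows the medial and all three parts of the vowel following the base consonant.

```python
import unicodedata


def code_points(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for each code point, in stored (logical) order."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for word in ("\u1000\u1031", "\u1000\u103C\u1031\u102C\u103A"):  # ကေ, ကြော်
    print(" + ".join(code_points(word)))
# U+1000 MYANMAR LETTER KA + U+1031 MYANMAR VOWEL SIGN E
# U+1000 MYANMAR LETTER KA + U+103C MYANMAR CONSONANT SIGN MEDIAL RA + U+1031 MYANMAR VOWEL SIGN E + U+102C MYANMAR VOWEL SIGN AA + U+103A MYANMAR SIGN ASAT
```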

Combining marks are attached to an orthographic syllable, rather than just applied to the letter of the immediately preceding consonant. This means that pre-base vowel glyphs appear before a whole consonant cluster if it is rendered as a conjunct (see orthosyllables).

Consonant letters pronounced as vowels

ကွ U+1000 MYANMAR LETTER KA + U+103D MYANMAR CONSONANT SIGN MEDIAL WA

Burmese uses the following consonant letters to express vowels.

အ␣ယ

1021 on its own represents the standalone version of the inherent vowel, ʔa. It is used as a base for other standalone vowels.

101A is used as part of a multipart vowel, eg. စံပယ်

Multipart vowels

ကော် kɔ̀ U+1000 MYANMAR LETTER KA + U+1031 MYANMAR VOWEL SIGN E + U+102C MYANMAR VOWEL SIGN AA + U+103A MYANMAR SIGN ASAT

Vowels represented by combinations of the previously mentioned characters:

ို␣ယ်␣ော␣ေါ␣ော်␣ေါ်

In the sequence ော် [U+1031 MYANMAR VOWEL SIGN E + U+102C MYANMAR VOWEL SIGN AA + U+103A MYANMAR SIGN ASAT], the asat is added to indicate the low tone. Otherwise, it is the same vowel.

The base may be a single consonant, or it may be a stacked consonant cluster. If the latter, the components of the multipart vowel are arranged around the cluster, rather than just around the immediately preceding consonant.

ကြော်
The vowel ɔ in t͡ɕɔ̀ is made up of 3 characters that surround the consonant cluster.


The following list shows where vowel signs are positioned (by default) around a base consonant to produce vowels, and how many instances of that pattern there are. Numbers after the + sign represent combinations of vowel signs.

  • 1 pre-base, eg. ကေ ke
  • 2 post-base, eg. ကာ ka
  • 3 superscript, eg. ကိ ki kḭ
  • 2 subscript, eg. ကု ku kṵ
  • +2 pre+post-base, eg. ကော kea kɔ́
  • +1 super+subscript, eg. ကို kiu
  • +1 post+post-base, eg. ကယ် kyˣ kɛ̀
  • +2 pre+post+superscript, eg. ကော် keaˣ kɔ̀

The special long forms of [U+102F MYANMAR VOWEL SIGN U] and [U+1030 MYANMAR VOWEL SIGN UU], used when there is not enough room for them below a cluster, are produced by the font.

Characters that don't appear in the combinations:

ီ␣ူ␣ဲ␣ွ

Vowel length

The 'primary' vowels have 'short' and 'long' written forms that hark back to the earlier Indic script origins, but the distinction is only used nowadays to indicate different tones. For example, compare the tones in the open syllables at the beginning of these 2 words.

မိနစ်

မီ

Nasalisation

Syllable-final nasals are usually realised as nasalisation of the vowel.wbp,#The_homorganic_nasal_and_glottal_stop

ကန်

The phonetic notation here uses ɴ for final nasals, but other sources may use ɰ̃.

Standalone vowels

In principle, Burmese syllables always begin with a consonant, so true standalone vowels don't exist. There is, however, a set of 'independent vowels' that represent syllables beginning with a glottal stop, and the same sounds and more can be created using the glottal stop consonant and combining vowel marks. The latter is the more frequent approach.

Independent vowels

ဣ␣ဤ␣ဥ␣ဦ␣ဧ␣ဩ␣ဪ

Myanmar has a set of independent vowel letters used to represent standalone vowels, but only in certain words – typically Indian loan words.

ဩဂုတ်
Use of an independent vowel letter for a standalone vowel sound.

Each letter represents a specific vowel+tone combination. Not all vowel+tone combinations are represented.

ဧရာဝတီ ဩဂုတ်

Sequences of characters can be combined to look like a few of the independent vowels, eg. သြော် [U+101E MYANMAR LETTER SA + U+103C MYANMAR CONSONANT SIGN MEDIAL RA + U+1031 MYANMAR VOWEL SIGN E + U+102C MYANMAR VOWEL SIGN AA + U+103A MYANMAR SIGN ASAT] can look very similar to or the same as 102A. The Unicode Standard strongly recommends using only the single code point for each independent vowel. See encoding.
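Because the look-alike sequence and the independent vowel are not canonically equivalent, Unicode normalization will not unify them, so a spell-checker or input tool would need an explicit mapping. This Python sketch assumes a small, hand-maintained table containing only the example above; the names LOOKALIKES and prefer_independent_vowels are hypothetical.

```python
import unicodedata

# Discouraged look-alike sequence -> recommended independent vowel letter.
# Only the example from this page is listed; a real table would need more.
LOOKALIKES = {
    "\u101E\u103C\u1031\u102C\u103A": "\u102A",  # သြော် -> ဪ
}


def prefer_independent_vowels(text: str) -> str:
    """Replace known look-alike sequences with the single recommended code point."""
    for sequence, letter in LOOKALIKES.items():
        text = text.replace(sequence, letter)
    return text


lookalike = "\u101E\u103C\u1031\u102C\u103A"
print(unicodedata.normalize("NFC", lookalike) == "\u102A")  # False: not equivalent
print(prefer_independent_vowels(lookalike) == "\u102A")     # True
```

Since the same five code points could in principle also spell SA + medial RA + a vowel, a real tool would probably flag such sequences for review rather than replace them silently.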

Glottal stop + vowel

အိ␣အီ␣အု␣အူ␣အေ␣အဲ␣အ␣အာ␣အါ

1021 on its own represents the glottal stop with whatever inherent vowel is appropriate for the context in which it is used, eg.

အင်္ဂါ

အန္တရာယ်

အက္ခရာ

The following examples show its use as a base for vowel signs.

အိတ်

အုတ်

အေး

အုန်းသီး
Typically, Burmese adds vowel signs to the glottal stop consonant to indicate standalone vowels.

Pre-base vowel sign

Burmese has only one vowel sign that appears to the left of the base consonant letter or cluster, see fig_prebase.

အမေ
The pre-base vowel sign e, pronounced after the m in ʔəmè, but rendered before the m.

This is a combining mark that is always stored after the base consonant. The rendering process places the glyph before the base consonant or cluster.

The vowel sign is actually placed before the start of an orthographic syllable. In fig_prebase_cluster the sequence of glyphs for the orthographic syllable is rendered VCC, whereas the pronunciation is CCV. In clusters with 3 consonants, it will still be rendered before the consonants.

အငွေ့
The pre-base vowel sign e, pronounced after the w in ʔa̰ŋwḛ, but rendered before the ŋw stack.

Some input methods may allow the user to type this vowel before the consonant, whereas others will expect it to be typed after, per the stored order.

Shape variants

Tall AA. There are two forms of the -a vowel sign in Burmese. After certain consonants the tall glyph is used for the vowel to avoid confusion, eg. ဝါ (101D 102B). This glyph, whether alone or as part of a complex vowel, is used after the following consonants:

ပ␣ဒ␣ခ␣ဂ␣င␣ဝ

For example, ပေါင်. Where there is no ambiguity, however, the normal shape is used, eg. ပြောင်းဖူး.

In Unicode 5.0 the choice of the appropriate form was left to the font or implementation at rendering time. Such contextual decisions are not appropriate for Sgaw Karen and other minority languages that only use the tall form, so [U+102B MYANMAR VOWEL SIGN TALL AA] was added in Unicode 5.1 as a separate character, and needs to be typed explicitly.u,648
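Since the tall form now has to be chosen explicitly, an input method or spelling helper must decide which sign to insert. The following is a minimal Python sketch, assuming that the tall form is needed only when the onset is one of the consonants listed above with no medial, and treating any onset with a medial as unambiguous (as in ပြောင်းဖူး). That second assumption is a simplification, and the names are hypothetical.

```python
AA = "\u102C"       # MYANMAR VOWEL SIGN AA
TALL_AA = "\u102B"  # MYANMAR VOWEL SIGN TALL AA
# ပ ဒ ခ ဂ င ဝ: bare consonants that take the tall form
TALL_AA_BASES = {"\u1015", "\u1012", "\u1001", "\u1002", "\u1004", "\u101D"}


def aa_sign(onset: str) -> str:
    """Choose the AA sign for an onset (a base consonant plus any medials)."""
    if len(onset) == 1 and onset in TALL_AA_BASES:
        return TALL_AA
    return AA  # assumption: an onset with a medial is not ambiguous


# ပေါင်: PA + vowel sign E + tall AA + NGA + ASAT
paung = "\u1015\u1031" + aa_sign("\u1015") + "\u1004\u103A"
print(paung == "\u1015\u1031\u102B\u1004\u103A")  # True
```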

Displaced U vowel signs. The vowel signs [U+102F MYANMAR VOWEL SIGN U] and [U+1030 MYANMAR VOWEL SIGN UU], which normally appear below a consonant, are displayed to the right if something else intrudes on that space. Examples include

There are no special characters for these forms. The shape is produced automatically by the font.

Tones

Burmese has 4 tones, one of which is used exclusively in closed syllables. The tone of an open syllable can be indicated by the vowel used, or by combining a vowel and one of 2 combining marks. In a couple of instances an asat is used to indicate tone information, rather than attaching a tone mark. See also tone_phonetics.

As previously mentioned, the stopped tone only occurs in checked syllables, and always occurs in such syllables. The tone is not marked, unless you count the asat as a marker, eg. ကတ်

Open syllables, however, show more variety. The tone of a syllable can be indicated by the vowel used, by combining a vowel and one of the following combining marks, or by a couple of special sequences.

့␣း

There are 7 main vowel sounds in open syllables. The following table lists those sounds and how the tone information is expressed with the consonant KA.

creaky low high example
i short form ကိ long form ကီ long+visarga ကီး မီး
u short form ကု long form ကူ long+visarga ကူး တူ
e dot below ကေ့ no mark ကေ visarga ကေး နှေး
o dot below ကို့ no mark ကို visarga ကိုး ဆိုး
ɛ dot below ကဲ့ killed-y ကယ် no mark ကဲ ဘယ်
ɔ dot below ကော့ asat ကော် no mark ကော ပျော်
a inherent vowel က no mark ကာ visarga ကား လာ
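The table above is regular enough to drive a simple lookup. The following Python sketch encodes it as a dictionary keyed by vowel and tone; the names are hypothetical. It assumes a base consonant that takes the normal AA shape, such as KA; after consonants that need the tall form, U+102C would have to be swapped for U+102B.

```python
# Marks stored after the base consonant for each open-syllable vowel and tone,
# transcribed from the table above.
OPEN_SYLLABLE_MARKS = {
    "i": {"creaky": "\u102D", "low": "\u102E", "high": "\u102E\u1038"},
    "u": {"creaky": "\u102F", "low": "\u1030", "high": "\u1030\u1038"},
    "e": {"creaky": "\u1031\u1037", "low": "\u1031", "high": "\u1031\u1038"},
    "o": {"creaky": "\u102D\u102F\u1037", "low": "\u102D\u102F", "high": "\u102D\u102F\u1038"},
    "ɛ": {"creaky": "\u1032\u1037", "low": "\u101A\u103A", "high": "\u1032"},
    "ɔ": {"creaky": "\u1031\u102C\u1037", "low": "\u1031\u102C\u103A", "high": "\u1031\u102C"},
    "a": {"creaky": "", "low": "\u102C", "high": "\u102C\u1038"},  # creaky: inherent vowel
}


def spell_open_syllable(consonant: str, vowel: str, tone: str) -> str:
    """Spell consonant + vowel + tone in stored order (normal AA shape only)."""
    return consonant + OPEN_SYLLABLE_MARKS[vowel][tone]


print(spell_open_syllable("\u1019", "i", "high"))  # မီး, fire
```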

Independent vowel letters are all specific to a particular tone.

i: ဣ (creaky), ဤ (low)
u: ဥ (creaky), ဦ (low)
e: ဧ (low)
ɔ: ဪ (low), ဩ (high)
o, ɛ, a: no independent vowel letter

Additional phonetic details

This section provides more detailed information about the pronunciation of rhymes in Burmese.


Rhymes

A vowel plus tone combination is called a rhyme.

The following table shows the normal combinations of vowel, final consonant and tone mark characters that are seen in Burmese, and their pronunciations. Read down the left column to find the symbol used for the vowel sound, and across the top row to find syllable final consonants. The table doesn't take vowel reduction into account.

  open က် စ် တ် ပ် င် ည် ဉ် န် မ်
- a̰, ə ɛʔ ɪʔ ɪ̀ɴ ì, è, ɛ̀ ɪ̀ɴ àɴ àɴ àɴ
+ ့           ɪ̰ɴ ḭ, ḛ, ɛ̰ ɪ̰ɴ a̰ɴ a̰ɴ a̰ɴ
+ း           ɪ́ɴ í, é, ɛ́ ɪ́ɴ áɴ áɴ áɴ
ာ/ါ à                    
+ ့                      
+ း á                    
ယ် ɛ̀                    
+ ့                      
+ း                      
ိ (ဣ)     eɪʔ eɪʔ       èɪɴ èɪɴ èɪɴ
+ ့                 ḛɪɴ ḛɪɴ ḛɪɴ
+ း                 éɪɴ éɪɴ éɪɴ
ီ (ဤ) ì                    
+ ့                      
+ း í                    
ု (ဥ)     oʊʔ oʊʔ       òʊɴ òʊɴ òʊɴ
+ ့                 o̰ʊɴ o̰ʊɴ o̰ʊɴ
+ း                 óʊɴ óʊɴ óʊɴ
ူ (ဦ) ù                    
+ ့                      
+ း ú                    
ေ (ဧ) è                    
+ ့                    
+ း (ဧး) é                    
ဲ ɛ́
+ ့ ɛ̰                    
+ း                      
ော/ေါ (ဩ) ɔ́ aʊʔ       àʊɴ          
+ ့ ɔ̰         a̰ʊɴ          
+ း           áʊɴ          
ော်/ေါ် (ဪ) ɔ̀                    
+ ့                      
+ း                      
ို ò aɪʔ       àɪɴ          
+ ့         a̰ɪɴ          
+ း ó         áɪɴ          
ွ     ʊʔ ʊʔ       ʊ̀ɴ ʊ̀ɴ
+ ့                 ʊ̰ɴ ʊ̰ɴ  
+ း                 ʊ́ɴ ʊ́ɴ  

Vowels in open syllables

There are 7 main vowel sounds in open syllables. The following lists those sounds and their different representations for the three tones in Burmese, creaky, low and high, that apply to open syllables. (Combining symbols are shown with , and alternate independent forms are shown in parentheses.)

description low high creaky example
a Primary central အာ အား inherent လာ come
i Primary front အီ အီး () အိ () မီး fire
u Primary back အူ () အူး အု () တူ chopsticks
e High front mid အေ အေး () အေ့ နှေး n̥é slow
o High back mid အို အိုး အို့ ဆိုး sʰó bad
ɛ Low front mid အယ် အဲ အဲ့ ဘယ် bɛ̀ which
ɔ Low back mid အော် () အော () အော့ ပျော် pjɔ̀ happy

The following table summarises the above in a way that allows you to see how the various tones are applied to open syllables using the native Myanmar characters. Where long vs. short forms exist, for the purposes of clarity in the table, the long form is taken here to be the standard form and the short form a variant.

low high creaky
a no mark visarga inherent vowel
i no mark visarga short form
u no mark visarga short form
e no mark visarga dot below
o no mark visarga dot below
ɛ killed-y form no mark dot below
ɔ asat no mark dot below

Vowels in closed syllables

'Closed' syllables end in a glottal stop or nasalisation. Historically, however, they ended in one of four nasals or four stops, and this is still reflected in the orthography. The vowel quality has also evolved in these syllables, typically producing diphthongs.

To indicate that the consonant is syllable-final, an asat is placed over it.

The sound values of vowel signs used in open and closed syllables differ systematically as follows.

i becomes eɪ, eg. အိန် ʔèɪɴ; အိတ် ʔeɪʔ.

u becomes oʊ, eg. အုန် ʔòʊɴ; အုတ် ʔoʊʔ.

ɔ becomes aʊ, eg. အောင် ʔàʊɴ; အောက် ʔaʊʔ.

o becomes aɪ, eg. အိုင် ʔàɪɴ; အိုက် ʔaɪʔ.

The inherent a is a lot more complicated, becoming one of ɪ, e, a, or ɛ.

The most common sounds are shown in the large table above, and in the smaller tables below. Other combinations of vowel and final consonant are found in Burmese words of Indian origin, which often keep the original Indian spelling but tend to follow Burmese pronunciation, eg. ဓာတ် daʔ, ဗိုလ်, ဥယ္ယာဉ် ʔṵjaɴ.

Vowels in closed syllables ending in nasals

The following table lists the main sounds in Burmese where the syllable ends in a nasal.

  င် ည် န် မ် Example
ã     အန် အမ် ပန်း páɴ flower
ĩ အင်       ဝင် wɪ̀ɴ enter
ɛ   အည်      
ũ     အွန်   ဇွန်း zʊ́ɴ spoon
eɪ̃     အိန် အိမ် အိမ် ʔèɪɴ house
oʊ̃     အုန် အုမ် ရန်ကုန် jàɴkòʊɴ Rangoon
aʊ̃ အောင်       ကောင်း káʊɴ good
aɪ̃ အိုင်       ဆိုင် sʰàɪɴ store

Note how အည် doesn't end in a nasalisation. There is another consonant, [U+1009 MYANMAR LETTER NYA], which has come to be used to produce nasalisation.

These syllables are by default low in tone, but creaky and high tones can be indicated using [U+1037 MYANMAR SIGN DOT BELOW] and [U+1038 MYANMAR SIGN VISARGA] in a very regular way. Note that the tone mark appears at the end of the syllable, not immediately after the vowel, eg. အုန့် and ကောင်း.
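The sketch below builds a nasal-final syllable in stored order, assuming the ordinary final-consonant-plus-asat spelling rather than 1036. Although the creaky tone mark is seen at the end of the syllable, DOT BELOW is stored before ASAT: their canonical combining classes are 7 and 9, so Unicode normalization puts them in that order. VISARGA, a spacing mark, simply comes last. The function name is hypothetical.

```python
import unicodedata

DOT_BELOW = "\u1037"  # creaky tone
ASAT = "\u103A"       # kills the final consonant
VISARGA = "\u1038"    # high tone


def nasal_closed_syllable(onset: str, vowel: str, final: str, tone: str = "low") -> str:
    """Onset + vowel signs + final nasal + tone marks, in stored order."""
    if tone == "low":
        return onset + vowel + final + ASAT  # low tone is unmarked
    if tone == "creaky":
        return onset + vowel + final + DOT_BELOW + ASAT
    if tone == "high":
        return onset + vowel + final + ASAT + VISARGA
    raise ValueError(f"unknown tone: {tone}")


kaung = nasal_closed_syllable("\u1000", "\u1031\u102C", "\u1004", "high")  # ကောင်း
oun = nasal_closed_syllable("\u1021", "\u102F", "\u1014", "creaky")        # အုန့်
# Both are already in canonical order, so NFC leaves them unchanged.
print(unicodedata.normalize("NFC", kaung) == kaung, unicodedata.normalize("NFC", oun) == oun)
# The reverse order of DOT BELOW and ASAT is reordered by normalization.
print(unicodedata.normalize("NFC", "\u1014\u103A\u1037") == "\u1014\u1037\u103A")
```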

Vowels in closed syllables ending in stops

The following table lists the main sounds in Burmese where the syllable ends in a stop.

  က် စ် တ် ပ် Example
aʔ     အတ် အပ် ဖတ် pʰaʔ read
ɪʔ   အစ်     နှစ် n̥ɪʔ year
ɛʔ အက်       ကြက် tɕɛʔ chicken
ʊʔ     အွတ်   လွတ်လပ် lʊʔlaʔ independent
eɪʔ     အိတ် အိပ် အရိပ် ʔa̰jeɪʔ shadow
oʊʔ     အုတ် အုပ် စာအုပ် sàʔoʊʔ book
aʊʔ အောက်       နောက် naʊʔ next
aɪʔ အိုက်       လိုက် laɪʔ follow

These syllables all have the unmarked 4th (stopped) tone.

Vowel sounds mapped to characters

This section maps Burmese vowel sounds to common graphemes in the Myanmar orthography, grouped by open (o) or closed (c) syllables, or as a standalone vowel (s).

Dependent vowels are shown to the left, and independent vowels are shown on the right. Independent vowels are usually only used for a few Indian loan words, and equivalent sounds and more can be written using the glottal stop consonant combined with a vowel. Those combinations are not shown here.

Plain vowels

i
open

(creaky tone)
102D
အဘိဓာန်

(low tone)
102E
စေတီ

(high tone)
102E 1038
မီး

(low tone)
1024 Used in a few words only (typically Indian loan words).

(creaky tone)
1023. Ditto.
ဣန္ဒြေ

ɪ~i
closed

Inherent vowel, followed by –ည် i, –စ် ɪʔ, or one of –င်, –ဉ် ɪɴ
ထမင်း

1021
အင်္ဂါ

u
open

(creaky tone)
102F
ဟင်းနုနွယ်

(low tone)
1030
တူ

(high tone)
1030 1038
ဗုဒ္ဓဟူးနေ့

(creaky tone)
1025. Used in a few words only (typically Indian loan words).

(low tone)
1026.  Ditto.
ဦး

ʊ
closed

103D
နွမ်း

 

e
open

(low tone)
1031
မြေပုံ

(creaky tone)
1031 1037
ရှေ့

(high tone)
1031 1038
ဆေးရုံ

(low tone)
1027 Used in a few words only (typically Indian loan words).
ဧက

 
closed

Inherent vowel, followed by –ည်.

 

o
open

(low tone)
102D 102F
အညို

(creaky tone)
ို့ [U+102D MYANMAR VOWEL SIGN I + U+102F MYANMAR VOWEL SIGN U + U+1037 MYANMAR SIGN DOT BELOW]
နို့နဲ့

(high tone)
ိုး [U+102D MYANMAR VOWEL SIGN I + U+102F MYANMAR VOWEL SIGN U + U+1038 MYANMAR SIGN VISARGA]
ဆိုး

 

ɛ
open

(high tone)
1032
ခဲတံ

(creaky tone)
1032 1037
ပြီခဲ့တဲ့လ

(low tone)
101A 103A
ဘယ်

 

 
closed

Inherent vowel, followed by –ည် or –က် ɛʔ
စက်တင်ဘာ


အက္ခရာ

ɔ
open

(high tone)
1029 Used in a few words only (typically Indian loan words).
ဩဂုတ်

(low tone)
102A in some words, particularly Indian loan words or words in the literary style.

ə
open

Unstressed vowels in minor syllables
ကလေး

a
open

(creaky tone)
Inherent vowel
ကျ

(low tone)
102C
102B
ဆရာ

(high tone)
102C 1038
102B 1038
နွား

1021 often reduced to ə.
အရိပ်

 
closed

Inherent vowel, followed by one of –တ်, –ပ်, or one of –န်, –မ်, –ံ
ကတ်

1021
ကဏ္ဍ

Diphthongs

eɪ
closed

(low tone)
102D
အိမ်

(creaky tone)
102D 25CC 1037
စိမ့်

(high tone)
102D 25CC 1038
တစ်သိန်း

oʊ
closed

(low tone)
102F
ရန်ကုန်

(creaky tone)
102F 25CC 1037
မုန့်ဆိုင်

(high tone)
102F 25CC 1038
သုံး

aɪ
closed

(low tone)
102D 102F
ဆိုင်

(creaky tone)
102D 102F 25CC 1037

(high tone)
102D 102F 25CC 1038
ထိုင်းနိုင်ဂံ

aʊ
closed

(low tone)
1031 102C
1031 102B
ကြောင်

(creaky tone)
1031 102C 25CC 1037
1031 102B 25CC 1037
စောင့်တယ်

(high tone)
1031 102C 25CC 1038
1031 102B 25CC 1038
ကျောင်း

Consonants

Consonant summary

The set of initial consonants includes letters used infrequently for Pali-derived words. Aspirated consonants are shown separately at the end of the line. Allophones created by sandhi or unusual positioning are not listed here, but are listed in the section consonant_mappings.

Initials:
ပ␣ဗ␣ဘ␣တ␣ဋ␣ဒ␣ဍ␣ဎ␣ဓ␣က␣ဂ␣ဃ␣အ␣ ␣ဖ␣ထ␣ဌ␣ခ
ကျ␣ကြ␣ဂျ␣ဂြ␣␣ချ␣ခြ
သ␣ဿ␣စ␣ဇ␣ဈ␣သျှ␣လျှ␣ဟ␣ ␣ဆ
မ␣န␣ဏ␣ည␣ဉ␣ငြ␣င␣ ␣မှ␣နှ␣ညှ␣ငှ
ဝ␣ြွ␣လ␣ဠ␣ယ␣ရ␣ ␣ဝှ␣ြွှ␣လှ
Medials:
ွ␣္လ␣ျ␣ြ␣ှ
Finals:
က်␣စ်␣တ်␣ပ်
င်␣ဉ်␣ည်␣န်␣မ်

For additional details see consonant_mappings.

Basic Burmese consonants

Native Burmese words use a subset of the consonants that make up the traditional articulatory arrangement of Indic scripts.

ပ␣ဖ␣ဗ␣ဘ␣တ␣ထ␣ဒ␣က␣ခ␣ဂ␣အ
သ␣ဿ␣စ␣ဆ␣ဇ␣ဟ
မ␣န␣ည␣ဉ␣င
ဝ␣ရ␣လ␣ယ

Other consonant sounds

Foreign consonant sounds

Some Burmese conventions exist for representing foreign sounds. f is ဖ (usually pʰ), v is ဗ (usually b) or ဗွ (usually bw), eg. တီဗွီ.

A foreign syllable-final sound can be rendered by placing a second killed consonant after the syllable, sometimes in parentheses, eg. ဘတ်(စ်)

Pali loans

The following additional consonants are mainly used for Pali loan words.

ဋ␣ဌ␣ဍ␣ဎ␣ဓ␣ဃ␣ဈ␣ဏ␣ဠ

Additional symbols are available for use in loan words, especially Indian loan words. These include the retroflex and voiced aspirated consonants.

Voicing

Unvoiced syllable-initial consonants are typically pronounced with voicing when they appear in non-initial syllables of a word or in particle suffixes, unless they follow a syllable with stopped tone or follow the [U+1021 MYANMAR LETTER A] prefix. Aspirated consonants lose their aspiration at the same time. For example, သတင်းစာ newspaper is pronounced θədɪ́ɴzà not θətɪ́ɴsà. However, because of the rule about the stopped tone (ie. a syllable ending in a plosive consonant), တစ်ဆယ် ten is pronounced təʔsʰɛ̀ not təʔzɛ̀.

Note that care needs to be taken with compound words, since they contain more than one word-initial syllable, eg. နားထောင် listen is pronounced nátʰàʊɴ not nádàʊɴ.m,175-6
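The voicing rule can be sketched over a word that has already been split into transcribed syllables. This Python sketch is a simplification under stated assumptions: the input is a single word (not a compound), syllables use the transcription on this page, only the onsets listed in VOICED are handled, and the irregular voicing described below is ignored. All names are hypothetical.

```python
# Voiced, unaspirated replacements for voiceless onsets. Longer keys come first
# so that aspirated onsets (eg. "sʰ") are matched before their plain form.
VOICED = {
    "t͡ʃʰ": "d͡ʒ", "t͡ʃ": "d͡ʒ",
    "pʰ": "b", "tʰ": "d", "kʰ": "ɡ", "sʰ": "z",
    "p": "b", "t": "d", "k": "ɡ", "s": "z", "θ": "ð",
}
A_PREFIX = {"ʔə", "ʔa̰"}  # the အ prefix, transcribed as on this page


def apply_voicing(syllables: list[str]) -> list[str]:
    """Voice the onsets of non-initial syllables within a single word."""
    result = list(syllables)
    for i in range(1, len(syllables)):
        prev = syllables[i - 1]
        if prev.endswith("ʔ"):
            continue  # no voicing after a stopped-tone syllable
        if i == 1 and prev in A_PREFIX:
            continue  # no voicing after the အ prefix
        for voiceless, voiced in VOICED.items():
            if syllables[i].startswith(voiceless):
                result[i] = voiced + syllables[i][len(voiceless):]
                break
    return result


print(apply_voicing(["θə", "tɪ́ɴ", "sà"]))  # သတင်းစာ
print(apply_voicing(["təʔ", "sʰɛ̀"]))       # တစ်ဆယ်
```

With these assumptions, the first call gives θə dɪ́ɴ zà, while the second is unchanged because təʔ ends in ʔ. A compound such as နားထောင် would have to be split into its words before applying the rule.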

There is also an irregular pattern of voicing initial consonants, particularly with place names. Mesher provides examples of words beginning with [U+1005 MYANMAR LETTER CA], [U+1015 MYANMAR LETTER PA], [U+1010 MYANMAR LETTER TA] and [U+1011 MYANMAR LETTER THA], eg. စေတီ pagoda is pronounced zèdì not sèdì; ပုဂံ Pagan/Bagan is pronounced bəgàɴ not pəgàɴ; ထားဝယ် Tavoy/Dawei is pronounced dəwɛ̀ not tʰəwɛ̀.m,251

Consonants with no following vowel

Consonants with no following vowel occur in consonant clusters and at syllable- and word-final locations. For the former see onsets and clusters, and for the latter see finals.

Onset consonants

Unicode has the following, dedicated combining characters for the second letter in a syllable-onset cluster. The virama should not be used.

ျ␣ြ␣ွ␣ှ
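A checker can look for the pattern warned against above. This Python sketch (hypothetical names) only reports candidates rather than rewriting them, because Pali and Sanskrit text legitimately uses the virama with these consonants, and doubled consonants such as ယ္ယ in ဥယ္ယာဉ် also match; its output therefore needs review.

```python
VIRAMA = "\u1039"
# Consonant after VIRAMA -> dedicated medial sign recommended for Burmese.
MEDIAL_FOR = {
    "\u101A": "\u103B",  # YA -> CONSONANT SIGN MEDIAL YA
    "\u101B": "\u103C",  # RA -> CONSONANT SIGN MEDIAL RA
    "\u101D": "\u103D",  # WA -> CONSONANT SIGN MEDIAL WA
    "\u101F": "\u103E",  # HA -> CONSONANT SIGN MEDIAL HA
}


def find_virama_medials(text: str) -> list[tuple[int, str]]:
    """Report (index of VIRAMA, suggested medial sign) for each suspect pair."""
    return [
        (i, MEDIAL_FOR[text[i + 1]])
        for i in range(len(text) - 1)
        if text[i] == VIRAMA and text[i + 1] in MEDIAL_FOR
    ]


# ကြ typed as KA + VIRAMA + RA instead of KA + MEDIAL RA
print(find_virama_medials("\u1000\u1039\u101B"))
```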

The following panel lists a number of syllable-onset clusters which are not pronounced as you might expect.

ကျ␣ကြ␣ဂျ␣ဂြ␣ချ␣ခြ␣သျှ␣လျှ␣ငြ␣ြွ␣ြွှ

Medial YA and RA. 103B and 103C are both pronounced j by default, eg. ပျော် ပြည်

However, when preceded by a velar consonant these characters produce palatalisation as tɕ, tɕʰ, dʑ, ɲ, eg. ကျောင်း ကြက် ဂျပန်

Medial HA. 103E is used to create aspirated versions of consonants, eg. မှာ. See aspiration for more details. It also creates the sound ʃ in the combinations 101B 103E and လျှ [U+101C MYANMAR LETTER LA + U+103B MYANMAR CONSONANT SIGN MEDIAL YA + U+103E MYANMAR CONSONANT SIGN MEDIAL HA], eg. ရှိတယ်.

The -h medial is typically transcribed before the letter it modifies, unlike the order of characters as typed or stored in memory. For example, မြွှ [U+1019 MYANMAR LETTER MA + U+103C MYANMAR CONSONANT SIGN MEDIAL RA + U+103D MYANMAR CONSONANT SIGN MEDIAL WA + U+103E MYANMAR CONSONANT SIGN MEDIAL HA] (pronounced m̥w) is transcribed hmrw. For more information about character order, see cporder.
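The reordering of h in transcriptions can be sketched as follows. This Python example is hypothetical: the letter keys in ROMAN cover only a few characters, and it assumes the onset is a base consonant followed by medials in the stored order YA, RA, WA, HA, which the function checks. It reproduces the hmrw example above.

```python
# Hypothetical letter-by-letter keys for a few onset characters.
ROMAN = {
    "\u1019": "m", "\u1014": "n", "\u101C": "l",  # MA, NA, LA
    "\u103B": "y", "\u103C": "r", "\u103D": "w",  # medial YA, RA, WA
    "\u103E": "h",                                # medial HA
}
MEDIAL_ORDER = "\u103B\u103C\u103D\u103E"  # stored order: YA, RA, WA, HA


def transcribe_onset(onset: str) -> str:
    """Transliterate a base consonant plus medials, writing medial HA first."""
    ranks = [MEDIAL_ORDER.index(m) for m in onset[1:]]  # ValueError if not a medial
    if ranks != sorted(ranks):
        raise ValueError("medials are not in stored order")
    letters = "".join(ROMAN[ch] for ch in onset)
    if onset.endswith("\u103E"):  # HA is stored last but transcribed first
        letters = "h" + letters[:-1]
    return letters


print(transcribe_onset("\u1019\u103C\u103D\u103E"))  # မြွှ -> hmrw
```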

Medial WA. 103D represents the w glide, eg. နွား

However, it may also represent the vowel ʊ (see vowelsigns).

Multiple medials. It is possible to find 2 or 3 medials in an onset cluster, eg. (click on words to see components) လျှာ လျှင် မြွေ

Aspirated consonants

Burmese aspirates many consonants. In some cases these are separate characters, in other cases the aspiration is indicated using [U+103E MYANMAR CONSONANT SIGN MEDIAL HA]. Aspirated sounds include the followingm,12, where the last six use MEDIAL HA.

ဖ␣ထ␣ခ␣ချ␣ခြ␣ဆ␣မှ␣နှ␣ညှ␣ငှ␣ဝှ␣လှ

For example words, see consonant_mappings.

Pali, Sanskrit, and other medials

Pali and Sanskrit texts written in the Myanmar script, as well as in older orthographies of Burmese, sometimes render the consonants YA, RA, WA and HA in subjoined form. In those cases, U+1039 MYANMAR SIGN VIRAMA and the regular form of the consonant are used.u,647

The old spelling of many words uses a fifth medial consonant, la swe, eg. ခ္လိုဝ်း kʰ͓liuwˣ² wash. This medial is produced using just a subjoined l, ie. ္လ [U+1039 MYANMAR SIGN VIRAMA + U+101C MYANMAR LETTER LA].

Final consonants

In native Burmese, 9 characters (5 nasals, င ဉ ည န မ NGA, NYA, NNYA, NA and MA, and 4 stops, က စ တ ပ KA, CA, TA, PA) appear in syllable final position.

In final position, nasals are pronounced as a nasalisation of the previous vowel, eg. ရင်, and all stops are pronounced ʔ, eg. မတ်.

Syllable-final consonants carry a visible mark called asat (အသတ် ʔa̰θaʔ) to indicate that the inherent vowel is killed, eg. စက်တင်ဘာ

[U+103A MYANMAR SIGN ASAT] is a character introduced in Unicode version 5.1 for this purpose. It is always visible, and does not produce stacking.

Some syllables ending in nasal consonants use 1036 (the anusvara) rather than a final consonant letter with asat, eg. compare သိမ်း and သုံး.

(Note that the ASAT is also used in 3 multipart vowels to produce specific vowel+tone combinations. See compositeV.)

Consonant clusters

Consonant stacking

In many multi-syllabic words (mostly derived from Pali), consonants that have no intervening inherent vowel are arranged such that the consonant cluster is stacked. Stacked consonants of this kind are always doubled consonants or homorganic.wbs,#Stacked_consonants

The second consonant appears below the first, eg. မန္တလေး ဗုဒ္ဓ In some cases the lower character is abbreviated or reoriented, eg. က္ဌ to represent က်ဌ.

This effect is achieved by using the character 1039 between the consonants forming the cluster.
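As a minimal illustration (a Python sketch, not part of the original notes), the code below places U+1039 between two consonants and lists the stored code points of မန္တလေး. The helper names `stack` and `code_points` are hypothetical.

```python
VIRAMA = "\u1039"  # MYANMAR SIGN VIRAMA: the invisible stacker


def stack(upper: str, lower: str) -> str:
    """Join two consonants so that the second is rendered subjoined."""
    return upper + VIRAMA + lower


def code_points(text: str) -> list[str]:
    """List code points as bare hex, the way these notes write them."""
    return [f"{ord(ch):04X}" for ch in text]


# မန္တလေး: MA, then NA stacked over TA, then LA + VOWEL SIGN E + VISARGA
word = "\u1019" + stack("\u1014", "\u1010") + "\u101C\u1031\u1038"
print(code_points(word))
# ['1019', '1014', '1039', '1010', '101C', '1031', '1038']
```

Note that nothing visible marks the VIRAMA in the rendered word; the font uses it only to select the subjoined form of the following consonant.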

The virama is never visible.

Consonants may also be stacked in abbreviations of native Burmese words, in which case they may not be homorganic and vowels may be pronounced between the consonants. For example, လက်ဖက် is sometimes abbreviatedwbs,#Stacked_consonants to လ္ဘက် l͓ḃkˣ

Consonant repetition

Where the same consonant appears at the end of a syllable and the beginning of a new syllable in the same word they are commonly represented in the usual cluster form, eg. ပိန္နဲသီး

In a few Burmese words, however, a doubled consonant is represented by a single consonant plus asat, eg. ယောက်ျား ကျွန်ုပ် Note how this produces a situation where an asat is used between a consonant and a medial or vowel sign.h

A repeated [U+101E MYANMAR LETTER SA] can be represented using [U+103F MYANMAR LETTER GREAT SA]. In modern Burmese, ဿ appears within words, whereas သ်သ is used across word boundaries.l,3

Kinzi

When the first consonant in a consonant cluster is a non-word-final [U+1004 MYANMAR LETTER NGA] it rises over the following letter and keeps its virama, rather than pushing the following consonant below it, eg. အင်္ဂလန် This is called 'kinzi' (ကင်းစီး kɪ́ɴzí).

To achieve this, use the sequence င်္ [U+1004 MYANMAR LETTER NGA + U+103A MYANMAR SIGN ASAT + U+1039 MYANMAR SIGN VIRAMA], then continue with the next letter.
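The following Python sketch (helper names `with_kinzi` and `has_kinzi` are hypothetical) builds the kinzi sequence for အင်္ဂလန် and tests for it. It only looks for the three-character sequence described above; it does not check whether the NGA is word-final.

```python
NGA, ASAT, VIRAMA = "\u1004", "\u103A", "\u1039"
KINZI = NGA + ASAT + VIRAMA  # rendered above the following consonant


def with_kinzi(consonant: str) -> str:
    """Prefix a consonant with the kinzi sequence."""
    return KINZI + consonant


def has_kinzi(text: str) -> bool:
    """True if the text contains NGA + ASAT + VIRAMA."""
    return KINZI in text


england = "\u1021" + with_kinzi("\u1002") + "\u101C\u1014\u103A"  # အင်္ဂလန်
print(has_kinzi(england))               # True
print(has_kinzi("\u1004\u1039\u1002"))  # False: NGA + VIRAMA without ASAT
print(has_kinzi("\u1004\u103A"))        # False: NGA + ASAT only
```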

Consonant length

Burmese consonants are not typically lengthened or geminated. However, see crepetition for situations where one syllable ends and the next begins with the same consonant.

Consonant sounds to characters

This section maps Burmese consonant sounds to common graphemes in the Myanmar orthography.

Sounds listed as 'infrequent' are allophones, or sounds used for foreign words, Sanskrit, etc.

Stops

p
 

1015

b
 

1017

1018

1015 where affected by sandhi.

1016 where affected by sandhi.

pʰ

1016

1018,  sometimes at the beginning of words or particles

t
 

1010

100B. Mostly archaic, used for Pali.

100C. Mostly archaic, used for Pali.

d
 

1012

1013

1010 where affected by sandhi.

1011 where affected by sandhi, but sometimes also word-initial.

100D. Mostly archaic, used for Pali.

100E. Mostly archaic, used for Pali.

tʰ

1011

k
 

1000

ɡ
 

1002

1000 where affected by sandhi.

1001 where affected by sandhi.

1003, rare, primarily used in words of Pali origin.

kʰ

1001

ʔ
 

1021 

–ပ [U+1015 MYANMAR LETTER PA] when final.

–တ [U+1010 MYANMAR LETTER TA] when final.

–က [U+1000 MYANMAR LETTER KA] when final.

Affricates

t͡ɕ
 

1000 103B

1000 103C

d͡ʑ

1002 103B

1002 103C 

t͡ɕʰ
 

1001 103B

1001 103C 

Fricatives

f
 

1015. For foreign sounds.

v
 

1017. For foreign sounds.

1017 103D For foreign sounds.

θ
 

101E

103F

ð
 

101E where affected by sandhi.

s
 

1005

z
 

1007

1008 rare.

1005 when affected by sandhi, but also irregularly in initial position.

1005 103B

1006 when affected by sandhi.

sʰ

1006

h
 

101F

 

103E Medial.

Nasals

m
 

1019

m̥

1019 103E

n
 

1014

100F Mostly archaic, used for Pali.

n̥

1014 103E

ɲ
 

100A (Silent in final position.)

ɲ̥
 

100A 103E

ŋ
 

1004

ŋ̥
 

1004 103E

ɴ
 

–င [U+1004 MYANMAR LETTER NGA] when final.

–ဉ [U+1009 MYANMAR LETTER NYA] when final.

–မ [U+1019 MYANMAR LETTER MA] when final.

1036

Other

w
 

101D

103D Medial.

w̥

101D 103E

ɹ
 

101B, in loan words.

l
 

101C

1020 Rare.

l̥

101C 103E

j
 

101A

101B

103B Medial.

103C Medial.

Encoding choices

Myanmar is a script where different sequences of Unicode characters may produce the same visual result.

Independent vowels

It is strongly recommended to use the single code points on the left, rather than the sequences on the right, because normalisation does not convert one into the other. Text that uses the sequences will therefore be regarded as different, which affects searching and other operations on the text.

Code point: deprecated combination
102A ဪ: သြော် [U+101E MYANMAR LETTER SA + U+103C MYANMAR CONSONANT SIGN MEDIAL RA + U+1031 MYANMAR VOWEL SIGN E + U+102C MYANMAR VOWEL SIGN AA + U+103A MYANMAR SIGN ASAT]
1029 ဩ: သြ [U+101E MYANMAR LETTER SA + U+103C MYANMAR CONSONANT SIGN MEDIAL RA]

In the following case, the precomposed character decomposes in NFD, and re-forms again in NFC. It is generally recommended to use the precomposed character, but both forms are canonically equivalent.

Precomposed: decomposed
1026 ဦ: 1025 102E [U+1025 MYANMAR LETTER U + U+102E MYANMAR VOWEL SIGN II]
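To see the difference between the two cases, here is a Python sketch using the standard `unicodedata` module. The replacement table and the function name `prefer_single_code_points` are illustrative assumptions: the sketch assumes, as the table above implies, that every occurrence of these sequences is meant as the independent vowel.

```python
import unicodedata

# Deprecated sequences mapped to the single code points recommended above.
PREFERRED = {
    "\u101E\u103C\u1031\u102C\u103A": "\u102A",  # ဪ MYANMAR LETTER AU
    "\u101E\u103C": "\u1029",                    # ဩ MYANMAR LETTER O
}


def prefer_single_code_points(text: str) -> str:
    # Replace the longest sequence first, so the AU sequence is not
    # partly consumed by the shorter O sequence that it starts with.
    for seq in sorted(PREFERRED, key=len, reverse=True):
        text = text.replace(seq, PREFERRED[seq])
    return text


# Normalization leaves the deprecated sequence alone...
print(unicodedata.normalize("NFC", "\u101E\u103C") == "\u1029")  # False
# ...but U+1026 and U+1025 U+102E are canonically equivalent.
print(unicodedata.normalize("NFD", "\u1026") == "\u1025\u102E")  # True
print(unicodedata.normalize("NFC", "\u1025\u102E") == "\u1026")  # True
```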

Combining mark order

The following indicates the expected ordering of Unicode combining marks for Burmese. The labels are those used for the Unicode Indic Syllabic Categories.

There are 3 types of combining character sequence (CCS).

The first type is a base plus Invisible_Stacker. This is the non-final part of a consonant cluster, and consists of just the base and the stacker.

The second occurs for syllable-final consonants and is a base with an asat (Pure_Killer). Although the asat may appear after a vowel, this type of CCS contains only the base and the asat, with an optional tone mark. In principle, the tone mark should appear at the end of the CCS, but normalization produces a different order in which [U+103A MYANMAR SIGN ASAT] occurs after [U+1037 MYANMAR SIGN DOT BELOW]. Applications such as fonts should still handle this alternative order, since the sequences are canonically equivalent.
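The reordering comes from the canonical combining classes of the two marks. This Python sketch shows it for သင့် using the standard `unicodedata` module; `same_text` is a hypothetical helper showing how an application might compare the two orders as equal.

```python
import unicodedata

ASAT = "\u103A"       # MYANMAR SIGN ASAT, canonical combining class 9
DOT_BELOW = "\u1037"  # MYANMAR SIGN DOT BELOW, canonical combining class 7

# သင့်: SA + NGA, with the tone mark typed after the asat
typed = "\u101E\u1004" + ASAT + DOT_BELOW
normalized = unicodedata.normalize("NFC", typed)

print(unicodedata.combining(ASAT), unicodedata.combining(DOT_BELOW))  # 9 7
# Canonical reordering puts the lower class (DOT BELOW) first.
print(normalized == "\u101E\u1004" + DOT_BELOW + ASAT)  # True


def same_text(a: str, b: str) -> bool:
    """Compare two strings as canonically equivalent text."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_text(typed, normalized))  # True
```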

The remaining CCS type uses the following preferred ordering after a base.

  1. Consonant_Medial (4)
    1. 103B Pure_Killer ?
    2. 103C
    3. 103D
    4. 103E
  2. Vowel_Dependent (8)
    1. 1031
    2. 102D | 102E | 1032
    3. 102F | 1030
    4. ( 102C | 102B ) Pure_Killer ?
  3. Bindu
  4. Tone_Mark | Visarga

Ordering characters as shown above avoids potential ambiguities and maximises the likelihood of success when rendering the text.
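The list above can be turned into a simple order check. In this Python sketch the numeric ranks are a hypothetical encoding of the list, and the function name `follows_preferred_order` is illustrative. It checks the third CCS type only, and it skips ASAT, whose position the list marks only as optional.

```python
# Rank of each mark in the preferred order listed above.
RANK = {
    "\u103B": 1, "\u103C": 2, "\u103D": 3, "\u103E": 4,  # Consonant_Medial
    "\u1031": 5,                                          # VOWEL SIGN E
    "\u102D": 6, "\u102E": 6, "\u1032": 6,                # I | II | AI
    "\u102F": 7, "\u1030": 7,                             # U | UU
    "\u102C": 8, "\u102B": 8,                             # AA | TALL AA
    "\u1036": 9,                                          # Bindu (anusvara)
    "\u1037": 10, "\u1038": 10,                           # Tone_Mark | Visarga
}
ASAT = "\u103A"


def follows_preferred_order(ccs: str) -> bool:
    """Check the marks after the base (ccs[0]) against the preferred order."""
    last = 0
    for mark in ccs[1:]:
        if mark == ASAT:
            continue  # not modelled in this sketch
        rank = RANK.get(mark)
        # Unknown marks, repeats of a slot, and out-of-order marks all fail.
        if rank is None or rank <= last:
            return False
        last = rank
    return True


print(follows_preferred_order("\u1015\u103B\u1031\u102C\u1037"))  # True: ပျော့
print(follows_preferred_order("\u1000\u1031\u103C"))              # False: E before RA
```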

Changes in Unicode 5.1

ℹ

In Unicode 5.0, [U+103A MYANMAR SIGN ASAT] did not exist, and 1039 had to be used for both visible and non-visible viramas. This was problematic: since there are no spaces between words, it is not easy to determine automatically whether a virama should appear above a consonant or cause the stacking effect. For example, should my sequence of characters appear like this, အမ်မီတာ, or like this, အမ္မီတာ? To get around this in Unicode 5.0 you needed to use a 200C (ZWNJ) after the virama if you wanted it to remain visible (ie. the first example above would have been transcribed as ʔmˣmïta and the second as ʔm͓mïta). The non-joiner prevents stacking. In practice, this meant that Burmese text contained very many ZWNJ characters, since many syllable-final consonants need ASAT, and typing in the Myanmar script was therefore much more time-consuming than it needed to be.
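For text still encoded the Unicode 5.0 way, the visible-virama idiom described above maps directly onto ASAT. This is a partial Python sketch with a hypothetical function name: it handles only VIRAMA + ZWNJ, and does not convert the 5.0 representation of medials or adjust mark order.

```python
VIRAMA = "\u1039"
ZWNJ = "\u200C"
ASAT = "\u103A"


def visible_virama_to_asat(text: str) -> str:
    """Rewrite the Unicode 5.0 idiom VIRAMA + ZWNJ as the 5.1 ASAT."""
    return text.replace(VIRAMA + ZWNJ, ASAT)


legacy = "\u1021\u1019\u1039\u200C\u1019\u102E\u1010\u102C"  # 5.0 spelling of အမ်မီတာ
print(visible_virama_to_asat(legacy) == "\u1021\u1019\u103A\u1019\u102E\u1010\u102C")  # True
```

A VIRAMA that is not followed by ZWNJ is left untouched, so stacks such as အမ္မီတာ keep their stacking virama.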

Unicode 5.1 also introduced dedicated medial consonants. This makes it easier to type Myanmar text, and also makes it easy to distinguish subjoined variants of these consonants from the usual medial forms.

One or two other characters were introduced, such as the TALL AA.

Numbers, dates, currency, etc.

The Unicode Myanmar block includes two sets of digits. The following set is used for Myanmar, but also tends to be used for other languages, including those with their own scripts, such as Tai Nüa.

၀␣၁␣၂␣၃␣၄␣၅␣၆␣၇␣၈␣၉
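These digits occupy the contiguous range U+1040 to U+1049, so converting to and from integers is straightforward. A Python sketch with hypothetical function names; it relies on Python's `int()` accepting any Unicode decimal digit (General_Category Nd).

```python
MYANMAR_DIGIT_ZERO = 0x1040  # ၀; digits one to nine follow at U+1041..U+1049


def to_myanmar_digits(n: int) -> str:
    """Write a non-negative integer with Myanmar digits."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    return "".join(chr(MYANMAR_DIGIT_ZERO + int(d)) for d in str(n))


def from_myanmar_digits(text: str) -> int:
    """Parse Myanmar digits; int() accepts any Unicode decimal digits."""
    return int(text)


print(to_myanmar_digits(2023))                      # ၂၀၂၃
print(from_myanmar_digits("\u1041\u1042\u1043") + 1)  # 124
```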

Text direction

Myanmar text is written horizontally, left to right.


Glyph shaping & positioning

This section brings together information about the following topics: writing styles; cursive text; context-based shaping; context-based positioning; baselines, line height, etc.; font styles; case & other character transforms.

Burmese doesn't have any features relevant to cursive text, baselines, or character transforms.

You can experiment with examples using the Burmese character app.

Context-based shaping & positioning

Shaping

Context-based shaping is widespread when rendering Burmese text, and here we just list a few examples.

Glyphs for subjoined consonants tend to be smaller than their full forms, eg. သဒ္ဒါ, and may be rotated, eg. က္ဌ.

The shape of [U+103C MYANMAR CONSONANT SIGN MEDIAL RA] changes according to what it surrounds, eg. compare the two different widths in the words ကြက်သွန်ဖြူ and ဝန်ကြီး, and note the shortening at the top right of the second word. The joining behaviour of [U+103B MYANMAR CONSONANT SIGN MEDIAL YA] also differs, eg. compare ချက် and ကျွန်မတို့.

The asat varies its position and shape according to context, eg. လမ်း ဒေါ်လေး

The shape of NA changes when something appears below it, eg. နို့နဲ့. Similarly, the bottom of NYA changes in the following context: ပဉ္စမ.

Context-based positioning

Like context-based shaping, context-based positioning is also often needed for Burmese text, and here we simply offer some examples.

The placement of [U+1037 MYANMAR SIGN DOT BELOW], used as a tone mark, varies slightly according to context, eg. ပြီးခဲ့တဲ့ တချို့, as does that of [U+103E MYANMAR CONSONANT SIGN MEDIAL HA]: it is smaller than usual in ကောက်ညှင်း, and its shape and position are very different in ရွှေပဲသီး.

Other examples noted earlier include the change of shape and position of [U+102F MYANMAR VOWEL SIGN U] and [U+1030 MYANMAR VOWEL SIGN UU] when other items appear below the base consonant, and the production of the kinzi.

As a further example, note how the anusvara appears as a small circle above the consonant in this word, even though in memory it is separated from that consonant by the code points for the medial RA and vowel sign U: အပြုံး.

Font styling & weight

Observation: Italicisation is used in Wikipedia for quotations and for citing the titles of articles or publications.

Slanted text used for a quotation in Wikipedia.

Graphemes

This section is still undergoing research and development.

Grapheme clusters alone are not sufficient to represent typographic units in Burmese. Stacks occur and must not be split apart by edit operations that visually change the text (such as letter-spacing, first-letter highlighting, and line breaking). For those operations one needs to segment the text into orthographic syllables, which join grapheme clusters together across 1039 (Indic Syllabic Category Invisible_Stacker).

Burmese uses different code points for indicating syllable-final characters and for stacking, which makes it much easier to manage segmentation.

However, unusually, Burmese doesn't absorb all combining characters after a base into a single grapheme cluster. Spacing marks rendered to the right of the base (vowel signs and tone marks) are grapheme clusters in their own right.

Grapheme clusters

(Base Combining_mark*) | Right_spacing_combining_mark | Tone_mark

Grapheme clusters only equate to Burmese typographic units some of the time. Furthermore, grapheme cluster boundaries for Myanmar text follow a specially called-out set of rules that include exceptions to the norm. Specifically, grapheme boundaries are introduced before spacing combining marks that are rendered to the right of the base, and before tone marks. These exceptions were introduced to support cursor movement, but are not useful for things like line-breaking.

ကား

When a grapheme cluster consists of a base plus combining marks, it may include zero or more of the following types of character:

  1. Medial consonants [4] (see onsets)
  2. Dependent vowels [6], but not spacing vowel signs that are rendered to the right (see combiningV)
  3. Tone mark [1] (see tone_marks)
  4. Final consonants [1] (see finals)
  5. Pure killer [1] (see finals)
  6. Invisible stacker [1] (see clusters)

Burmese grapheme clusters that include a syllable nucleus usually begin with a consonant, but can also begin with an independent vowel. If one classes 1021 as an independent vowel, it is one that can be followed by vowel signs and tone marks; but otherwise independent vowels are typically followed by a tone mark only, if anything. Multiple vowel signs may follow a base.

Syllable codas, if not written using the anusvara combining mark, are normally written using a consonant letter followed by 103A, which is always visible, and which doesn't create conjuncts.

The invisible stacker 1039 is used alone after a consonant to create a conjunct with a following grapheme cluster (see below).

The combining marks used in Burmese that start a new grapheme cluster include the following (there are others in the Myanmar Unicode blocks but they are used for other languages):

  1. Dependent vowels [2] that are rendered to the right and spacing (see combiningV)
  2. Tone mark [1] (see tone_marks)
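The rule `(Base Combining_mark*) | Right_spacing_combining_mark | Tone_mark` can be sketched in Python as follows. This is not a full implementation of the Unicode grapheme cluster rules: it assumes Burmese-only input, treats any character whose General_Category starts with M as a combining mark, ignores ZWJ/ZWNJ, and hard-codes TALL AA, AA and VISARGA as the marks that start a new cluster. The function name is hypothetical.

```python
import unicodedata

# Right-rendered spacing vowels (TALL AA, AA) and the VISARGA tone mark.
STARTS_NEW_CLUSTER = {"\u102B", "\u102C", "\u1038"}


def grapheme_clusters(text: str) -> list[str]:
    """Split Burmese text into grapheme clusters (simplified)."""
    clusters: list[str] = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if clusters and is_mark and ch not in STARTS_NEW_CLUSTER:
            clusters[-1] += ch  # absorb the mark into the current cluster
        else:
            clusters.append(ch)  # a base, a right-spacing vowel, or VISARGA
    return clusters


print(len(grapheme_clusters("\u1000\u102C\u1038")))  # 3: ကား splits as က | ာ | း
```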

The following examples show a variety of grapheme clusters:

ကြေးမုံ
ထမင်းဆိုင်
အားကစား
ပျော့တယ်
ပိန္နဲသီး
အင်္ဂါ

Larger typographic units

(Base Invisible_Stacker)* Grapheme_cluster

Stacks generally appear only in word-medial positions in Burmese (and do not occur across word boundaries, as they do in some other scripts). The kind of typographic unit that includes stacks cannot be realised using Unicode grapheme clusters, which create a break after a virama rather than including the following consonant.

Editorial operations that change the visual appearance of the text, such as letter-spacing, first-letter highlighting, line-breaking, and justification, should never split conjunct forms apart. For this reason, an alternative way of segmenting graphemes is needed. This may not apply, however, to some other operations such as cursor movement or backward deletion.

Where conjuncts appear, a typographic unit contains multiple grapheme clusters. The non-final grapheme clusters all end with 1039, and the final grapheme cluster begins with a consonant.
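Following the pattern `(Base Invisible_Stacker)* Grapheme_cluster`, a Python sketch can merge every cluster that ends in U+1039 with the cluster after it. It carries its own simplified grapheme splitter (same assumptions: Burmese-only input, combining marks identified by General_Category M, ZWJ/ZWNJ ignored); both function names are hypothetical.

```python
import unicodedata

VIRAMA = "\u1039"  # Invisible_Stacker
STARTS_NEW_CLUSTER = {"\u102B", "\u102C", "\u1038"}  # TALL AA, AA, VISARGA


def grapheme_clusters(text: str) -> list[str]:
    clusters: list[str] = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if clusters and is_mark and ch not in STARTS_NEW_CLUSTER:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def typographic_units(text: str) -> list[str]:
    """Join clusters ending in VIRAMA to the cluster that follows them."""
    units: list[str] = []
    pending = ""
    for cluster in grapheme_clusters(text):
        pending += cluster
        if not cluster.endswith(VIRAMA):
            units.append(pending)
            pending = ""
    if pending:
        units.append(pending)  # a stacker at the very end of the text
    return units


# ပိန္နဲသီး -> ပိ | န္နဲ | သီ | း
print(len(typographic_units("\u1015\u102D\u1014\u1039\u1014\u1032\u101E\u102E\u1038")))  # 4
```

Because the kinzi sequence ends in VIRAMA, it is joined to the following consonant by the same rule. Like the pattern it implements, this sketch still leaves AA and VISARGA as units of their own, so an application that must not strand them at the start of a line would need an additional rule.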

The kinzi is the sequence င်္ [U+1004 MYANMAR LETTER NGA + U+103A MYANMAR SIGN ASAT + U+1039 MYANMAR SIGN VIRAMA], and is unusual in that the first consonant in the cluster appears above the second, rather than the normal arrangement where the second is subjoined to the first. It is also unusual in that it does this even though the first consonant is followed by an asat sign. Nevertheless, it and the following consonant still follow the general orthographic syllable pattern.

The following words contain typographic units which begin with a consonant cluster:

ကြေးမုံ
ထမင်းဆိုင်
အားကစား
ပျော့တယ်
ပိန္နဲသီး
အင်္ဂါ

Browser behaviour

The 2 words on the left test grapheme clusters only; the 2 on the right include stacks. First, the text is displayed in a contenteditable paragraph, then in a textarea. Results are reported for Gecko (Firefox), Blink (Chrome), and WebKit (Safari) on a Mac.

ကား ကြေးမုံ ပိန္နဲသီး ဥစ္စာ

Cursor movement. Move the cursor through the text.
Gecko steps through the whole text using combining character sequences (CCS), rather than grapheme clusters. In other words, it doesn't stop before spacing combining characters rendered to the right of the base, or before tone marks, but treats them as part of the typographic unit that includes the base. It takes 2 steps to get through the stacks, one grapheme cluster at a time. Blink steps through all words using grapheme clusters; however, this means that the cursor sometimes appears to stand still, or appears in the middle of a sequence. WebKit steps through using CCSs but also treats orthographic syllables as a single unit (ie. it steps over a stack and all associated combining characters in one jump).

Selection. Place the cursor next to a character and hold down shift while pressing an arrow key.
The behaviour is the same as for cursor movement. This has the effect of sometimes appearing to highlight backwards in Blink.

Deletion. Forward deletion works in the same way as cursor movement. The backspace key deletes code point by code point, except that WebKit deletes both the virama and the ZWJ at the same time.

Line-break. See this test. The CSS sets the value of the line-break property to anywhere. Pad the start of the container with characters to slowly push the text off the first line.
Gecko moves stacks to the next line as a single unit, except when the kinzi is involved; otherwise, break points are assigned after each CCS rather than after each grapheme cluster. Blink and WebKit move stacks to the next line as a single unit, but otherwise move text one grapheme cluster at a time, which means that combining characters can sit alone at the beginning of a line.

Punctuation & inline features

Word boundaries

Myanmar script doesn't visually separate words in a phrase.

There is, however, a concept of words. Native Burmese words are typically monosyllabic, but there are also multisyllabic words, and these should not be broken during line wrapping.

Consonants are not stacked across word boundaries. As a consequence, it is easier to apply word-based line-breaking.

Phrase & section boundaries

၊␣:␣။␣?
phrase: 0020, 104A, 003A
sentence: 104B, 003F

Spaces are used to separate phrases, rather than words. Phrase length is variable. Examples can be seen in the extract from the Declaration of Human Rights at the top of the page.

Punctuation is commonly limited to [U+104A MYANMAR SIGN LITTLE SECTION] and [U+104B MYANMAR SIGN SECTION], with significance close to comma and full stop, respectively.
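A rough Python sketch of splitting text along these boundaries, using the separators from the table above: ။ and ? close a sentence, while spaces, ၊ and : close a phrase. The regular expressions and function names are illustrative assumptions, and the sketch does not attempt to find word boundaries inside a phrase.

```python
import re

SENTENCE_RE = re.compile(r"[^\u104B?]+[\u104B?]?")   # ends at ။ or ?
PHRASE_RE = re.compile(r"[^\s\u104A:]+[\u104A:]?")   # ends at space, ၊ or :


def sentences(text: str) -> list[str]:
    """Split text into sentences, keeping each final ။ or ?."""
    return [s.strip() for s in SENTENCE_RE.findall(text) if s.strip()]


def phrases(sentence: str) -> list[str]:
    """Split a sentence into phrases, keeping any trailing ၊ or :."""
    return PHRASE_RE.findall(sentence)


print(phrases("ကျား၊ မ၊ သဘာဝအားဖြင့်"))
```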

Bracketed text

(␣)

Burmese commonly uses ASCII parentheses to insert parenthetical information into text.

standard: start 0028, end 0029

Quotations & citations

“␣”

Burmese texts typically surround quotations with curly double quotation marks. Because of keyboard design, quotations may also be surrounded by ASCII double or single quote marks.

initial: start 201C, end 201D

Quoted speech may also be slanted (see fontstyle).

Emphasis

tbd

Abbreviation, ellipsis & repetition

Abbreviations

၌␣၍␣၎␣၏

104C is an abbreviation meaning 'locative marker', ie. 'at, in, on', used in Burmese literary form.

104D means 'subordinate marker', used to connect two trains of thought, ie. 'so / because'. Used in Burmese literary form.

104E is used in the sequence ၎င်း as a demonstrative noun (this or that) when it precedes a noun. It is also used as a connecting phrase (meaning as well as) between two nouns within a clause. 

104F is used in Burmese literary form in two ways. After a noun, it marks possession (a genitive). At the end of a sentence that ends immediately with a verb, it serves as a full stop.


Inline notes & annotations

tbd

Other punctuation

tbd

Other inline text decoration

tbd

Line & paragraph layout

Line breaking & hyphenation

If it is necessary to break text within a phrase, breaks can occur at syllable boundaries, but not within a word. The difficulty is that there is no visual information about which sequences of syllables constitute a word.

One way of detecting line-break opportunities is to use a dictionary to search for polysyllabic words, and then break at syllable boundaries outside those words. This approach may, however, run into problems with uncommon or new words, especially borrowed foreign terms.
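A minimal sketch of the dictionary approach in Python, using forward maximum matching over syllables. The syllable rule here is a common heuristic, not one given in these notes: a consonant, GREAT SA, independent vowel or abbreviation sign starts a new syllable unless it follows a virama, or is followed (after any DOT BELOW) by an asat or virama. The one-entry dictionary, the `max_syllables` limit and the function names are hypothetical, and the input is assumed to be a single phrase without spaces.

```python
ASAT, VIRAMA, DOT_BELOW = "\u103A", "\u1039", "\u1037"


def _starts_syllable(ch: str) -> bool:
    cp = ord(ch)
    return (0x1000 <= cp <= 0x1021 or 0x1023 <= cp <= 0x102A
            or cp == 0x103F or 0x104C <= cp <= 0x104F)


def syllables(text: str) -> list[str]:
    out: list[str] = []
    for i, ch in enumerate(text):
        j = i + 1
        while j < len(text) and text[j] == DOT_BELOW:
            j += 1  # NFC text may put DOT BELOW before the asat
        nxt = text[j] if j < len(text) else ""
        prev = text[i - 1] if i > 0 else ""
        starts = (_starts_syllable(ch)
                  and prev != VIRAMA             # lower half of a stack
                  and nxt not in (ASAT, VIRAMA))  # final or stack-top consonant
        if starts or not out:
            out.append(ch)
        else:
            out[-1] += ch
    return out


def segment(text: str, dictionary: set[str], max_syllables: int = 4) -> list[str]:
    """Greedy longest match; line breaks are allowed between the results."""
    sylls = syllables(text)
    words: list[str] = []
    i = 0
    while i < len(sylls):
        for n in range(min(max_syllables, len(sylls) - i), 0, -1):
            candidate = "".join(sylls[i:i + n])
            if n == 1 or candidate in dictionary:
                words.append(candidate)
                i += n
                break
    return words


print(segment("လွတ်လပ်သော", {"လွတ်လပ်"}))  # two units: လွတ်လပ် + သော
```

Words missing from the dictionary fall back to single syllables, which is exactly where the problems with uncommon or borrowed words arise.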

A common approach is to break lines at phrase boundaries and then use justification to adjust inter-phrase spacing.h,12

An alternative is to indicate break points by inserting 200B (ZWSP) between words when the content is developed.

Otherwise, you could tie the syllables in a polysyllabic word together using 2060 (WORD JOINER) while authoring the content. This requires less intervention than adding ZWSP, since polysyllabic words are fewer than words overall. The current problem with that approach is that applications must be able to ignore the word joiner for searching, sorting, and the like. For this reason Hosken recommends against using it, and recommends instead the use of a dictionary with ZWSP as a backup for words that the dictionary doesn't handle well. However, it's not clear which words a dictionary will fail to recognise when the text is used across different platforms and applications, so this is not an ideal solution either, not to mention that it is difficult for an author to know in advance which words will cause problems and which won't.h,12
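The two authoring options, and the search-time cleanup they require, can be sketched in Python. The function names are hypothetical, and the sketch assumes that the words or syllables have already been identified by the author.

```python
ZWSP = "\u200B"  # ZERO WIDTH SPACE: marks a break opportunity
WJ = "\u2060"    # WORD JOINER: forbids a break


def mark_breaks(words: list[str]) -> str:
    """Authoring option 1: put ZWSP between already-identified words."""
    return ZWSP.join(words)


def tie_syllables(syllables: list[str]) -> str:
    """Authoring option 2: tie a polysyllabic word together with WJ."""
    return WJ.join(syllables)


def search_key(text: str) -> str:
    """Drop both invisible controls so that searching matches either form."""
    return text.replace(ZWSP, "").replace(WJ, "")


free = ["\u101C\u103D\u1010\u103A", "\u101C\u1015\u103A"]  # လွတ် + လပ်
print(search_key(tie_syllables(free)) == "".join(free))  # True
```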

Line-edge rules

As in almost all writing systems, certain punctuation characters should not appear at the end or the start of a line. The Unicode line-break properties help applications decide whether a character should appear at the start or end of a line.


The following list gives examples of typical behaviours for some of the characters used in modern Burmese. Context may affect the behaviour of some of these and other characters.


  • “ (   should not be the last character on a line.
  • ” ) : ! ? । ॥ %   should not begin a new line.
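As a small Python sketch, the two lists can be applied to a candidate break between two characters. The sets copy the characters exactly as listed above, and the function name is hypothetical. A real implementation would use the Unicode line-breaking properties (UAX #14) and context rather than these two sets.

```python
NOT_AT_LINE_END = set("“(")
NOT_AT_LINE_START = set("”):!?।॥%")


def break_allowed(last_on_line: str, first_on_next: str) -> bool:
    """Reject a break that leaves a listed character at a forbidden line edge."""
    return last_on_line not in NOT_AT_LINE_END and first_on_next not in NOT_AT_LINE_START


print(break_allowed("\u1000", "?"))  # False: ? would begin the next line
```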

Text alignment & justification

Justification may begin by adjusting inter-phrase spacing.h,12

Text spacing

tbd

Baselines, line height, etc.

Burmese uses the so-called 'alphabetic' baseline, which is the same as for Latin and many other scripts.

Burmese requires more vertical space than Latin text. To give an approximate idea, the following figure compares Latin and Burmese glyphs from Noto fonts. The basic height of Burmese letters is typically around the Latin x-height; however, the overall letter height and combining marks extend beyond the Latin ascenders and descenders, creating a need for larger line spacing.

Font metrics for Latin text compared with Burmese glyphs in the Noto Serif Myanmar (top) and Noto Sans Myanmar (bottom) fonts.

The figure below shows similar comparisons for the Tharlon and Myanmar Text fonts.

Latin font metrics compared with Burmese glyphs in the Tharlon (top) and Myanmar Text (bottom) fonts.

Counters, lists, etc.

You can experiment with counter styles using the Counter styles converter. Patterns for using these styles in CSS can be found in Ready-made Counter Styles, and we use the names of those patterns here to refer to the various styles.

The Burmese language uses numeric and alphabetic styles.

Numeric

The myanmar numeric style is decimal-based and uses these digits.rmcs,#myanmar-styles

၀␣၁␣၂␣၃␣၄␣၅␣၆␣၇␣၈␣၉

Examples:

၁␣၂␣၃␣၄␣၅␣၁၁␣၂၂␣၃၃␣၄၄␣၁၁၁␣၂၂၂␣၃၃၃␣၄၄၄

Alphabetic

The burmese-consonant alphabetic style uses these letters.rmcs,#myanmar-styles

က␣ခ␣ဂ␣ဃ␣င␣စ␣ဆ␣ဇ␣ဈ␣ည␣ဋ␣ဌ␣ဍ␣ဎ␣ဏ␣တ␣ထ␣ဒ␣ဓ␣န␣ပ␣ဖ␣ဗ␣ဘ␣မ␣ယ␣ရ␣လ␣ဝ␣သ␣ဟ␣ဠ␣အ

Examples:

က␣ခ␣ဂ␣ဃ␣င␣ဋ␣ဖ␣အ␣ကဋ␣ဂဌ␣စဘ␣ညဂ␣ဍဏ

Prefixes and suffixes

The most common approach to writing lists in Burmese puts the counters in parentheses. An alternate style uses 104B after the counter. Full stops are not commonly used.
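The counter styles and separators described in this section can be sketched in Python. The alphabetic counter uses the CSS "alphabetic" algorithm (bijective base-33 here, with no zero symbol), and reproduces the burmese-consonant examples listed above. The function names and the style keywords `"parens"` and `"section"` are hypothetical, and whether a space follows the marker is left out.

```python
# Letters of the burmese-consonant style listed above:
# U+1000..U+1021 except U+1009 (NYA), 33 letters in all.
BURMESE_CONSONANTS = [chr(cp) for cp in range(0x1000, 0x1022) if cp != 0x1009]
MYANMAR_DIGIT_ZERO = 0x1040


def alphabetic(n: int, symbols: list[str] = BURMESE_CONSONANTS) -> str:
    """CSS 'alphabetic' counter: bijective base-len(symbols)."""
    if n < 1:
        raise ValueError("alphabetic counters start at 1")
    letters = []
    while n > 0:
        n -= 1
        letters.append(symbols[n % len(symbols)])
        n //= len(symbols)
    return "".join(reversed(letters))


def numeric(n: int) -> str:
    """The myanmar numeric style: decimal with Myanmar digits."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    return "".join(chr(MYANMAR_DIGIT_ZERO + int(d)) for d in str(n))


def marker(counter: str, style: str = "parens") -> str:
    """Wrap a counter in parentheses, or follow it with ။ (U+104B)."""
    if style == "parens":
        return f"({counter})"
    if style == "section":
        return counter + "\u104B"
    raise ValueError(f"unknown style: {style}")


print([marker(alphabetic(n)) for n in (1, 2, 3)])
print([marker(numeric(n), "section") for n in (1, 2, 3)])
```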

Examples:

Separators for Burmese list counters: parentheses (left) and section mark (right).

Styling initials

tbd

Page & book layout

This section is for any features that are specific to the Burmese orthography and that relate to the following topics: general page layout & progression; grids & tables; notes, footnotes, etc; forms & user interaction; page numbering, running headers, etc.

Notes, footnotes, etc

Numeric footnote references, using Shan numerals, in Wikipedia.

References