Updated 31 December, 2024
This page brings together basic information about the Hanifi Rohingya script and its use for the Rohingya language. It aims to provide a brief, descriptive summary of the modern, printed orthography and typographic features, and to advise how to write Rohingya using Unicode.
Richard Ishida, Rohingya (Hanifi Rohingya) Orthography Notes, 31-Dec-2024, https://r12a.github.io/scripts/rohg/rhg
𐴀𐴞𐴕𐴐𐴦𐴝𐴕 𐴁𐴠𐴒𐴧𐴟𐴕 𐴀𐴝𐴎𐴝𐴊𐴢 𐴀𐴝𐴌 𐴀𐴠𐴑𐴧𐴟 𐴉𐴥𐴟𐴖𐴝𐴙𐴕𐴝 𐴇𐴥𐴡𐴑 𐴀𐴝𐴌 𐴀𐴞𐴎𐴧𐴡𐴃𐴢 𐴓𐴡𐴌 𐴉𐴡𐴘𐴊𐴝 𐴀𐴥𐴡𐴘𐴧𐴠 ۔ 𐴀𐴥𐴞𐴃𐴝𐴘𐴝𐴃𐴧𐴟 𐴀𐴝𐴈𐴡𐴓 𐴀𐴝𐴌 𐴁𐴟𐴎 𐴀𐴥𐴡𐴘𐴧𐴠 ، 𐴀𐴥𐴠𐴃𐴡𐴓𐴧𐴝 𐴀𐴥𐴞𐴃𐴝𐴌𐴝𐴃𐴧𐴟 𐴀𐴠𐴑 𐴀𐴡𐴕 𐴀𐴝𐴌 𐴀𐴠𐴑 𐴎𐴡𐴕 𐴓𐴡𐴘 𐴁𐴤𐴝𐴘𐴧𐴡 𐴋𐴧𐴡𐴙𐴓𐴧𐴝 𐴔𐴦𐴝𐴔𐴠𐴓𐴝 𐴒𐴡𐴌𐴥𐴡𐴕 𐴏𐴝𐴀𐴝 ۔
Source: Unicode UDHR, article 1
Origins of the Hanifi Rohingya script, 1980s – today.
Phoenician
└ Aramaic
└ Nabataean
└ Arabic
└ Hanifi Rohingya
+ N'Ko
+ Thaana
+ Persian
Hanifi Rohingya is one of four scripts used for writing the Rohingya language, spoken by about 1,500,000 people, mostly in Myanmar, but also in significant Rohingya-speaking refugee communities in Bangladesh and Thailand.
The other scripts are Arabic, Latin (called Rohingyalish), and Myanmar.
Hanifi Rohingya is actively used in newspapers and books and on the Web. The inventor estimates that around 50 Rohingya community schools in Bangladesh refugee camps are teaching Hanifi Rohingya, and another 2,000 are learning in Malaysia and Saudi Arabia. There are also a number of web sites and apps dedicated to the script.
𐴌𐴟𐴇𐴥𐴝𐴚𐴒𐴙𐴝 ɾuh²aŋgia ruˈhɪŋdʒa Rohingya
For over 200 years, the Rohingya language has been written in Arabic script, using several orthographies, one of which was developed in 1975 but didn't gain much traction. The Latin orthography, called Rohingyalish or Rohingya Fonna, was developed in 1999 in order to make it easier to write Rohingya on computers.
Around 1960, scholars began to see a need for a completely new writing system that was tailored closely to the needs of the language and that provided a focus point for Rohingya culture. In the 1980s this led to the development of the Hanifi Rohingya script by Mohammad Hanif and his colleagues.
More information: Scriptsource and Endangered Alphabets.
The Hanifi Rohingya script is an alphabet. Both consonants and vowels are indicated by letters. See the table to the right for a brief overview of features for the Rohingya language.
Hanifi Rohingya is mostly a simple and largely phonetic orthography, clearly modelled on Arabic script, and yet with significant differences. All vowels are written as spacing letters. The only combining characters are for the 3 tones and a gemination marker. The script has no case distinction.
Hanifi Rohingya runs right to left in horizontal lines. Words are separated by spaces.
The script is cursive, but mostly with simple joins at the baseline.
Rohingya has 25 consonant letters.
Hanifi Rohingya indicates gemination, but consonant clusters are written simply as a sequence of consonants.
This orthography is an alphabet, and vowel sounds are written using 6 vowel letters, plus 2 semi-vowels used in diphthongs. Standalone vowel sounds, whether word-initial or -medial, are preceded by 𐴀.
All vowels can be nasalised by following them with 𐴣. Vowel length is affected by the application of the tones. Vowel absence is usually only marked at the end of a word and for certain characters only by the addition of 𐴢.
There are 3 tone marks, all combining characters.
There is a set of native digit shapes.
Punctuation is a mixture of Western and Arabic, and some texts use punctuation like the Myanmar section dividers.
Justification involves stretching the baseline between characters.
Distinctive characteristics: cursive and tonal; optional word-final vowel-absence marker; letter for nasalisation; one letter that joins only to the left.
The following represents the repertoire of the Rohingya language.
Phones in a lighter colour are non-native or allophones. Source Wikipedia.
Vowels can be nasalised.
Diphthongs often start or end in j or w, but vowels can also appear together with no intervening consonant.
| labial | alveolar | post-alveolar | retroflex | palatal | velar | glottal |
---|---|---|---|---|---|---|---|
stop | p b | t d | | ʈ ɖ | c ɟ | k ɡ | ʔ |
fricative | f | s z | ʃ | | ç | x | h |
nasal | m | n | | | ɲ | ŋ | |
approximant | w | l | | | j | | |
trill/flap | | ɾ | | ɽ | | | |
Rohingya has 3 tones. They indicate stress and vowel length.
tbd
The following table summarises the main vowel to character assignments.
The right-hand column shows word-initial forms.
Plain | |||
---|---|---|---|
Diphthongs |
For additional details see vowel_mappings.
Vowel sounds after a consonant are written using 6 vowel letters, plus 2 semi-vowels used in diphthongs.
Rohingya uses the following vowel letters.
Rohingya vowel letters are all normal, spacing characters.
Long vowels are indicated by the tone mark applied.
Rohingya also has dedicated letters to use as glides in diphthongs. They appear alongside the main vowel of the syllable.
Diphthongs beginning or ending in the glides j or w can be written using 𐴙 and 𐴗, respectively. eg. 𐴉𐴗𐴝 𐴇𐴥𐴝𐴙𐴓𐴢
Observation: It appears that 𐴘 can also be used to create diphthongs, eg. 𐴁𐴝𐴘
Vowel length is primarily affected by tone.
Observation: Also, it seems possible to repeat a vowel to lengthen a sound, eg. 𐴈𐴡𐴀𐴡 xoʔo xo:
Nasalised vowels are indicated by writing 𐴣 after the vowel, eg. 𐴔𐴦𐴝𐴣 𐴈𐴝𐴣𐴓𐴞
A standalone vowel at the beginning of a word is always preceded by 𐴀, which acts as a vowel carrier, eg. 𐴀𐴝𐴣𐴍𐴥𐴝𐴓𐴞 𐴀𐴦𐴟𐴘
Similarly, in a sequence of vowels inside a word, the non-initial vowels are preceded by the carrier, eg. 𐴅𐴝𐴕𐴟𐴀𐴝𐴌𐴞
When 𐴀 occurs without a following vowel letter at the beginning or in the middle of a word, it represents the vowel ɔ, eg. 𐴁𐴡𐴀𐴌 𐴉𐴝𐴃𐴝
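The carrier rule for standalone vowels can be sketched in Python. This is a minimal illustration, not a full speller: the function name `add_vowel_carriers` is hypothetical, and it assumes that the vowel letters are U+10D1D to U+10D21, that the consonants and glides are U+10D01 to U+10D1C, and that tone marks and the gemination mark (U+10D24 to U+10D27) may sit between a consonant and its vowel. It only handles vowels written with an explicit vowel letter, not the bare 𐴀 that represents ɔ.

```python
CARRIER = "\U00010D00"  # HANIFI ROHINGYA LETTER A, used as the vowel carrier
VOWELS = {chr(cp) for cp in range(0x10D1D, 0x10D22)}      # vowel letters
CONSONANTS = {chr(cp) for cp in range(0x10D01, 0x10D1D)}  # consonants and glides
# Combining marks that can sit between a consonant (or carrier) and its vowel.
MARKS = {chr(cp) for cp in range(0x10D24, 0x10D28)}


def add_vowel_carriers(word: str) -> str:
    """Insert the carrier before any vowel letter that has no consonant or carrier before it."""
    out = []
    prev_base = None  # last character seen that is not a combining mark
    for ch in word:
        if ch in VOWELS and prev_base not in CONSONANTS and prev_base != CARRIER:
            out.append(CARRIER)
        out.append(ch)
        if ch not in MARKS:
            prev_base = ch
    return "".join(out)
```

A word that already follows the rule is returned unchanged, so the function can be applied to text that is only partly corrected.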
Rohingya has 3 tones. They indicate stress and vowel length, and are indicated in writing using dedicated combining characters, as follows.
These usually appear above the consonant in a syllable, however in some fonts the mark drifts to the left, so that it appears between the consonant and the vowel, eg. 𐴁𐴤𐴝 ba¹ ba 𐴁𐴥𐴝 ba² ba 𐴁𐴦𐴝 ba³ ba
Observation: In order to achieve the best positioning using the Noto Sans Hanifi Rohingya and the Rohingya Noories One fonts, it is necessary to type the tone mark after the consonant (rather than after the vowel, as suggested in Pandey's script proposal).
When both 𐴧 [U+10D27 HANIFI ROHINGYA SIGN TASSI] and a tone mark appear together, it is important to type and store them in the correct order for display. The tassi is typed and stored first [u, 684], eg. 𐴔𐴡𐴙𐴅𐴧𐴤𐴙𐴠𐴊𐴠 moiɟ&¹iede
When they occur together, these diacritics may appear side by side, with the tone mark to the left [u, 684].
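The storage order described above (tassi first, then tone mark) could be enforced with a small clean-up step. The following Python sketch is only an illustration: the function name `tassi_first` is hypothetical, and it assumes the tone marks are U+10D24 to U+10D26 and the tassi is U+10D27.

```python
import re

TASSI = "\U00010D27"                         # gemination mark
TONES = "\U00010D24\U00010D25\U00010D26"     # the three tone marks

# Matches a tone mark immediately followed by a tassi (the wrong order).
_TONE_THEN_TASSI = re.compile(f"([{TONES}]){TASSI}")


def tassi_first(text: str) -> str:
    """Swap any 'tone mark + tassi' pair so that the tassi is stored first."""
    return _TONE_THEN_TASSI.sub(lambda m: TASSI + m.group(1), text)
```

Text that is already in the expected order is left untouched.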
Consonant clusters are typically written just as a sequence of consonant letters, eg. 𐴑𐴡𐴔𐴂𐴘𐴟𐴄𐴝𐴌
At the end of a word, or when just the consonant is written alone, 𐴢 is used after some consonants when there is no following vowel.
Consonants that typically take the sakin include the following:
And those that don't:
However, these rules are not hard and fast. For example, m may be written either way (see the example of 'turnip' below).
Examples: 𐴇𐴥𐴝𐴙𐴓𐴢 𐴐𐴥𐴝𐴓𐴒𐴡𐴔𐴢 𐴄𐴝𐴘𐴧𐴡𐴕 𐴋𐴥𐴠𐴙𐴇𐴝𐴑
This section maps Rohingya vowel sounds to common graphemes in the Hanifi Rohingya orthography.
Graphemes are labelled as either dependent (post-consonant) vowels or standalone vowels.
dependent 𐴞
standalone 𐴀𐴞
dependent 𐴞𐴣
dependent 𐴟
standalone 𐴀𐴟
dependent 𐴟𐴣
dependent 𐴠
standalone 𐴀𐴠
dependent 𐴠𐴣
dependent 𐴡
standalone 𐴀𐴡
dependent 𐴡𐴣
standalone 𐴀 when used without a following vowel letter.
dependent 𐴝
standalone 𐴀𐴝
dependent 𐴝𐴣
dependent 𐴗
dependent 𐴙
The following table summarises the main consonant to character assignments.
Stops | |
---|---|
Fricatives | |
Nasals | |
Other |
For additional details see consonant_mappings.
Some people have used 𐴜 to represent the sound v, although it was not formally approved as part of the script. The normal letter to use for both w and v is 𐴖.
Consonant clusters are written simply as a sequence of consonants. 𐴑𐴟𐴌𐴥𐴏𐴞 𐴀𐴞𐴏𐴃𐴞𐴌𐴞 𐴇𐴥𐴠𐴚𐴒𐴝𐴌
Geminated consonant sounds are indicated using 𐴧 [U+10D27 HANIFI ROHINGYA SIGN TASSI], which is typed immediately after the consonant and before any following vowel, and which is rendered above the consonant letter. Compare the z sounds in the following example: 𐴎𐴥𐴞𐴘𐴡𐴎𐴧𐴝𐴔𐴝𐴘
This section maps Rohingya consonant sounds to common graphemes in the Hanifi Rohingya orthography.
Each entry gives the code point, the letter type, and the letter. The joining forms are shown in the joining tables later in this document.
U+10D02 consonant 𐴂
U+10D01 consonant 𐴁
U+10D03 consonant 𐴃
U+10D0A consonant 𐴊
U+10D04 consonant 𐴄
U+10D0B consonant 𐴋
U+10D06 consonant 𐴆
U+10D05 consonant 𐴅
U+10D11 consonant 𐴑
U+10D12 consonant 𐴒
U+10D09 consonant 𐴉
U+10D16 consonant 𐴖
U+10D0F consonant 𐴏
U+10D0E consonant 𐴎
U+10D10 consonant 𐴐
U+10D08 consonant 𐴈
U+10D07 consonant 𐴇
U+10D14 consonant 𐴔
U+10D15 consonant 𐴕
U+10D1B consonant 𐴛
U+10D1A consonant 𐴚
U+10D16 consonant 𐴖
U+10D17 semivowel 𐴗
U+10D0C consonant 𐴌
U+10D0D consonant 𐴍
U+10D13 consonant 𐴓
U+10D18 consonant 𐴘
U+10D19 semivowel 𐴙
Hanifi Rohingya has a set of native digits.
Numbers are written left-to-right within the overall right-to-left flow.
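As a small illustration, an integer can be converted to native digits by offsetting from HANIFI ROHINGYA DIGIT ZERO (U+10D30). The function name `to_hanifi_digits` is hypothetical. The digits are stored most significant first; the left-to-right display of the number is left to the bidirectional algorithm at rendering time.

```python
HANIFI_ZERO = 0x10D30  # HANIFI ROHINGYA DIGIT ZERO; digits one to nine follow it


def to_hanifi_digits(n: int) -> str:
    """Return a non-negative integer written with Hanifi Rohingya digits."""
    if n < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    # str(n) yields ASCII digits, most significant first; map each one across.
    return "".join(chr(HANIFI_ZERO + int(d)) for d in str(n))
```

For example, `to_hanifi_digits(1)` returns 𐴱 [U+10D31 HANIFI ROHINGYA DIGIT ONE].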
Hanifi Rohingya text is written horizontally and right-to-left in the main but, as in most right-to-left scripts, numbers and embedded text in other scripts are written left-to-right (producing 'bidirectional' text).
The Unicode Bidirectional Algorithm automatically takes care of the ordering for all the text in fig_bidi, as long as the 'base direction' is set to RTL. In HTML this can be set using the dir attribute, or in plain text using formatting controls.
For other aspects of dealing with right-to-left writing systems see the following sections:
For more information about how directionality and base direction work, see Unicode Bidirectional Algorithm basics. For information about plain text formatting characters see How to use Unicode controls for bidi text. And for working with markup in HTML, see Creating HTML Pages in Arabic, Hebrew and Other Right-to-left Scripts.
For authoring HTML pages, one of the most important things to remember is to use <html dir="rtl" … > at the top of the page. Also, use markup to manage direction, and do not use CSS styling.
Unicode provides a set of 10 formatting characters that can be used to control the direction of text when displayed. These characters have no visual form in the rendered text, however text editing applications may have a way to show their location.
202B (RLE), 202A (LRE), and 202C (PDF) are in widespread use to set the base direction of a range of characters. RLE/LRE comes at the start, and PDF at the end of a range of characters for which the base direction is to be set.
In Unicode 6.3, the Unicode Standard added a set of characters which do the same thing but also isolate the content from surrounding characters, in order to avoid spillover effects. They are 2067 (RLI), 2066 (LRI), and 2069 (PDI). The Unicode Standard recommends that these be used instead.
There is also 2068 (FSI), used initially to set the base direction according to the first recognised strongly-directional character.
061C (ALM) is used to produce correct sequencing of numeric data. Follow the link and see expressions for details.
200F (RLM) and 200E (LRM) are invisible characters with strong directional properties that are also sometimes used to produce the correct ordering of text.
For more information about how to use these formatting characters see How to use Unicode controls for bidi text. Note, however, that when writing HTML you should generally use markup rather than these control codes. For information about that, see Creating HTML Pages in Arabic, Hebrew and Other Right-to-left Scripts.
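For plain text, where markup is not available, the isolate controls can be applied programmatically. The following Python sketch wraps a string in RLI, LRI or FSI and closes it with PDI. The function name `isolate` and its `direction` parameter are hypothetical names chosen for illustration.

```python
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate(text: str, direction: str = "auto") -> str:
    """Wrap text in a directional isolate: 'rtl', 'ltr', or 'auto' (first strong)."""
    openers = {"rtl": RLI, "ltr": LRI, "auto": FSI}
    if direction not in openers:
        raise ValueError("direction must be 'rtl', 'ltr' or 'auto'")
    return openers[direction] + text + PDI
```

A Hanifi Rohingya phrase inserted into a left-to-right log line or message template could be passed through `isolate(phrase, "rtl")` so that its ordering does not affect the surrounding text.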
Experiment with examples using the Hanifi Rohingya character app.
Hanifi Rohingya is cursive, ie. letters in a word are joined up.
Nearly all letters join on both sides. 𐴢 joins only to the right. And 𐴀 joins only to the left, which is very unusual. Fonts automatically produce the appropriate joining form for a code point, according to its visual context.
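The joining behaviour described above can be modelled with a few lines of Python. This is a sketch under stated assumptions, not a shaping engine: the function names are hypothetical, and it assumes that every character from U+10D00 to U+10D23 takes part in joining and that the combining marks U+10D24 to U+10D27 are transparent. Because the script runs right to left, the character that comes first in memory sits to the right of the one that follows it.

```python
LETTER_A = "\U00010D00"    # per the text, joins only on its left (to the next letter)
MARK_SAKIN = "\U00010D22"  # per the text, joins only on its right (to the previous letter)


def is_joining_letter(ch: str) -> bool:
    # Assumption: U+10D00..U+10D23 all take part in cursive joining.
    return 0x10D00 <= ord(ch) <= 0x10D23


def joins(first: str, second: str) -> bool:
    """True if 'first' connects to 'second', where 'first' precedes 'second' in memory."""
    if not (is_joining_letter(first) and is_joining_letter(second)):
        return False
    # 'first' must be able to join on its left, 'second' on its right.
    return first != MARK_SAKIN and second != LETTER_A


def join_flags(word: str) -> list:
    """For each adjacent pair of letters, report whether they connect."""
    # Tone marks and the gemination mark are skipped as transparent (assumption).
    letters = [ch for ch in word if not 0x10D24 <= ord(ch) <= 0x10D27]
    return [joins(a, b) for a, b in zip(letters, letters[1:])]
```

In practice the font and the rendering engine make these decisions; a model like this is only useful for tasks such as testing fonts or deciding where a baseline extension could be placed.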
The cursive treatment produces only minor changes to glyph shapes in most cases, other than extensions to the baseline. 𐴔 is an exception, with different final and medial/initial shapes (see fig_joining_forms).
There is a style of font which behaves slightly differently, and appears to be quite commonly used. fig_cursive_triangle_noories shows an example. Note how the n doesn't join to the right, and the o falls short of the l. The glyphs in this font don't have joining strokes to the right, and taper off and barely touch (if they do) to the left.
The following tables show all joining forms.
isolated | right-joined | dual-join | left-joined | joining groups |
---|---|---|---|---|
𐴂 | ـ𐴂 | ـ𐴂ـ | 𐴂ـ | |
𐴁 | ـ𐴁 | ـ𐴁ـ | 𐴁ـ | |
𐴃 | ـ𐴃 | ـ𐴃ـ | 𐴃ـ | |
𐴊 | ـ𐴊 | ـ𐴊ـ | 𐴊ـ | |
𐴄 | ـ𐴄 | ـ𐴄ـ | 𐴄ـ | |
𐴋 | ـ𐴋 | ـ𐴋ـ | 𐴋ـ | |
𐴆 | ـ𐴆 | ـ𐴆ـ | 𐴆ـ | |
𐴅 | ـ𐴅 | ـ𐴅ـ | 𐴅ـ | |
𐴑 | ـ𐴑 | ـ𐴑ـ | 𐴑ـ | |
𐴒 | ـ𐴒 | ـ𐴒ـ | 𐴒ـ | |
𐴏 | ـ𐴏 | ـ𐴏ـ | 𐴏ـ | |
𐴎 | ـ𐴎 | ـ𐴎ـ | 𐴎ـ | |
𐴐 | ـ𐴐 | ـ𐴐ـ | 𐴐ـ | |
𐴈 | ـ𐴈 | ـ𐴈ـ | 𐴈ـ | |
𐴇 | ـ𐴇 | ـ𐴇ـ | 𐴇ـ | |
𐴔 | ـ𐴔 | ـ𐴔ـ | 𐴔ـ | |
𐴕 | ـ𐴕 | ـ𐴕ـ | 𐴕ـ | |
𐴛 | ـ𐴛 | ـ𐴛ـ | 𐴛ـ | |
𐴚 | ـ𐴚 | ـ𐴚ـ | 𐴚ـ | |
𐴖 | ـ𐴖 | ـ𐴖ـ | 𐴖ـ | |
𐴗 | ـ𐴗 | ـ𐴗ـ | 𐴗ـ | |
𐴌 | ـ𐴌 | ـ𐴌ـ | 𐴌ـ | |
𐴍 | ـ𐴍 | ـ𐴍ـ | 𐴍ـ | |
𐴓 | ـ𐴓 | ـ𐴓ـ | 𐴓ـ | |
𐴘 | ـ𐴘 | ـ𐴘ـ | 𐴘ـ | |
𐴙 | ـ𐴙 | ـ𐴙ـ | 𐴙ـ | |
𐴟 | ـ𐴟 | ـ𐴟ـ | 𐴟ـ | |
𐴡 | ـ𐴡 | ـ𐴡ـ | 𐴡ـ | |
𐴝 | ـ𐴝 | ـ𐴝ـ | 𐴝ـ | |
𐴣 | ـ𐴣 | ـ𐴣ـ | 𐴣ـ |
isolated | left-joined | letters |
---|---|---|
𐴀 | 𐴀ـ |
isolated | right-joined | letters |
---|---|---|
𐴢 | ـ𐴢 |
tbd
See just above for shaping related to cursive joining.
Words are separated by spaces.
tbd
Rohingya uses a mixture of Arabic and ASCII punctuation, and may also use Myanmar signs.
phrase | ، ؛ : |
---|---|
sentence | ۔ . ؟ ! |
Observation: It seems to be standard practice to separate the punctuation from the foregoing text with a space.
Observation: Two online sites use punctuation that looks like the Burmese section marks, ၊ and ။, except that they use different characters. One uses a single or double 𐴱 [U+10D31 HANIFI ROHINGYA DIGIT ONE], the other uses | [U+007C VERTICAL LINE] or |𐴱 [U+007C VERTICAL LINE + U+10D31 HANIFI ROHINGYA DIGIT ONE]. (The page from which the example in fig_section_signs is taken also uses other strange punctuation choices, such as the Arabic thousands separator instead of the Arabic comma, that can be seen in the bottom line of the example.)
Rohingya commonly uses ASCII parentheses to insert parenthetical information into text.
| start | end |
---|---|---|
standard | ( | ) |
It is important to note that the Unicode names for parentheses, brackets, and other paired characters should be ignored. LEFT should be read as if it said START, and RIGHT as END. The direction in which the glyphs point will be automatically determined according to the base direction of the text.
The number of characters that are mirrored in this way is around 550, most of which are mathematical symbols. Some are single characters, rather than pairs. The following are some of the more common ones used for Rohingya.
Rohingya texts use quotation marks around quotations. Of course, due to keyboard design, quotations may also be surrounded by ASCII double and single quote marks. Note, however, that the order of use is different from that in LTR text, because they are not automatically mirrored.
| start | end |
---|---|---|
initial | “ | ” |
nested | ‘ | ’ |
Unlike the bracketing quotation marks, these characters are not mirrored during display. This means that LEFT means use on the left, and RIGHT means use on the right.
tbd
Observation: Lines appear to be broken at word boundaries.
When a line break occurs in the middle of an embedded left-to-right sequence, the items in that sequence need to be rearranged visually so that it isn't necessary to read lines from top to bottom.
Of course, the rearrangement is only that of the visual glyphs: nothing affects the order of the characters in memory.
Examples of printed matter show full justification. A baseline extension is frequently used to stretch words in order to achieve flush lines (see fig_justification).
Pandey recommends using ـ for this [p, 12]. However, it should be noted that the tatweel character is only useful if the text is static. If window resizing or inserted text cause the line breaks to appear between different words, the tatweels will end up in the wrong place.
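One possible way to cope with this, offered only as a sketch, is to keep the stored text free of tatweels and to strip any that were added for an earlier layout before the text is reflowed. The function name `strip_tatweel` is hypothetical.

```python
TATWEEL = "\u0640"  # ARABIC TATWEEL, the baseline extension character


def strip_tatweel(text: str) -> str:
    """Remove all tatweels so that the text can be re-justified from scratch."""
    return text.replace(TATWEEL, "")
```

The same step is also useful before searching or comparing text, since tatweels inserted for justification would otherwise prevent matches.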
tbd
Rohingya uses the so-called 'alphabetic' baseline, which is the same as for Latin and many other scripts.
You can experiment with counter styles using the Counter styles converter. Patterns for using these styles in CSS can be found in Ready-made Counter Styles, and we use the names of those patterns here to refer to the various styles.
Hanifi Rohingya uses numeric counters.
The numeric style is decimal-based and uses these digits.
Examples:
Observation: Further examples are needed to clarify the standard prefix and/or suffix for lists. The examples in fig_counters show circled numbers followed by a hyphen, and numbers followed by an equals sign.
Hanifi Rohingya books, magazines, etc., are bound on the right-hand side, and pages progress from right to left.
Columns are vertical but run right-to-left across the page.