Updated 16 April, 2022
This page brings together basic information about the Hanifi Rohingya script and its use for the Rohingya language. It aims to provide a brief, descriptive summary of the modern, printed orthography and typographic features, and to advise how to write Rohingya using Unicode.
𐴀𐴞𐴕𐴐𐴦𐴝𐴕 𐴁𐴠𐴒𐴧𐴟𐴕 𐴀𐴝𐴎𐴝𐴊𐴢 𐴀𐴝𐴌 𐴀𐴠𐴑𐴧𐴟 𐴉𐴥𐴟𐴖𐴝𐴙𐴕𐴝 𐴇𐴥𐴡𐴑 𐴀𐴝𐴌 𐴀𐴞𐴎𐴧𐴡𐴃𐴢 𐴓𐴡𐴌 𐴉𐴡𐴘𐴊𐴝 𐴀𐴥𐴡𐴘𐴧𐴠 ۔ 𐴀𐴥𐴞𐴃𐴝𐴘𐴝𐴃𐴧𐴟 𐴀𐴝𐴈𐴡𐴓 𐴀𐴝𐴌 𐴁𐴟𐴎 𐴀𐴥𐴡𐴘𐴧𐴠 ، 𐴀𐴥𐴠𐴃𐴡𐴓𐴧𐴝 𐴀𐴥𐴞𐴃𐴝𐴌𐴝𐴃𐴧𐴟 𐴀𐴠𐴑 𐴀𐴡𐴕 𐴀𐴝𐴌 𐴀𐴠𐴑 𐴎𐴡𐴕 𐴓𐴡𐴘 𐴁𐴤𐴝𐴘𐴧𐴡 𐴋𐴧𐴡𐴙𐴓𐴧𐴝 𐴔𐴦𐴝𐴔𐴠𐴓𐴝 𐴒𐴡𐴌𐴥𐴡𐴕 𐴏𐴝𐴀𐴝 ۔
Hanifi Rohingya is one of four scripts used for writing the Rohingya language, spoken by about 1,500,000 people, mostly in Myanmar, but also in significant Rohingya-speaking refugee communities in Bangladesh and Thailand.
The other scripts are Arabic, Latin (called Rohingyalish), and Myanmar.
Hanifi Rohingya is actively used in newspapers and books and on the Web. The inventor estimates that around 50 Rohingya community schools in Bangladesh refugee camps are teaching Hanifi Rohingya, and another 2,000 are learning in Malaysia and Saudi Arabia. There are also a number of web sites and apps dedicated to the script.
𐴌𐴟𐴇𐴥𐴝𐴚𐴒𐴙𐴝 ɾuh²aŋgia ruˈhɪŋdʒa Rohingya
For over 200 years, the Rohingya language has been written in Arabic script, using several orthographies, one of which was developed in 1975 but didn't gain much traction. The Latin orthography, called Rohingyalish or Rohingya Fonna, was developed in 1999 in order to make it easier to write Rohingya on computers.
Around 1960, scholars began to see a need for a completely new writing system that was tailored closely to the needs of the language and that provided a focus point for Rohingya culture. In the 1980s this led to the development of the Hanifi Rohingya script by Mohammad Hanif and his colleagues.
Sources: ScriptSource and Endangered Alphabets.
The Hanifi Rohingya script is an alphabet. Both consonants and vowels are indicated by letters. See the table to the right for a brief overview of features for the Rohingya language.
Hanifi Rohingya is mostly a simple and largely phonetic orthography, clearly modelled on Arabic script, and yet with significant differences. All vowels are written as spacing letters. The only combining characters are for the 3 tones and a gemination marker. The script has no case distinction.
Hanifi Rohingya runs right to left in horizontal lines.
Words are separated by spaces.
The script is cursive, but the joins are mostly simple and made at the baseline.
Hanifi Rohingya has 25 consonant letters.
There are 6 vowel letters, and 2 semi-vowels used in diphthongs. Standalone vowels, whether word-initial or -medial, are preceded by 𐴀 [U+10D00 HANIFI ROHINGYA LETTER A].
All vowels can be nasalised by following them with 𐴣 [U+10D23 HANIFI ROHINGYA MARK NA KHONNA]. Vowel length is affected by the application of the tones. Vowel absence is usually marked only at the end of a word, and only for certain characters, by the addition of 𐴢 [U+10D22 HANIFI ROHINGYA MARK SAKIN].
There are 3 tone marks, all combining characters.
There is a set of native digit shapes.
Punctuation is a mixture of Western and Arabic, and some texts use punctuation like the Myanmar section dividers.
Justification involves stretching the baseline between characters.
Distinctive characteristics: cursive and tonal; optional word-final vowel-absence marker; letter for nasalisation; one letter that joins only to the left.
The following represents the repertoire of the Rohingya language.
Phones in a lighter colour are non-native or allophones. Source: Wikipedia.
Vowels can be nasalised.
Diphthongs often start or end in j or w, but vowels can also appear together with no intervening consonant.
|stop||p b||t d||ʈ ɖ||c ɟ||k ɡ||ʔ|
This section maps Rohingya vowel sounds to common graphemes in the Hanifi Rohingya orthography.
Sounds listed as 'infrequent' are allophones, or sounds used for foreign words, etc.
Rohingya uses the following vowel letters.
Rohingya vowel letters are all normal, spacing characters. With the exception of 𐴀 [U+10D00 HANIFI ROHINGYA LETTER A], they are never written alone, but have to follow a consonant.
A standalone vowel at the beginning of a word is always preceded by 𐴀 [U+10D00 HANIFI ROHINGYA LETTER A], which acts as a vowel carrier, eg. 𐴀𐴝𐴣𐴍𐴥𐴝𐴓𐴞 𐴀𐴦𐴟𐴘
Similarly, in a sequence of vowels inside a word, the non-initial vowels are preceded by the carrier, eg. 𐴅𐴝𐴕𐴟𐴀𐴝𐴌𐴞
When 𐴀 [U+10D00 HANIFI ROHINGYA LETTER A] occurs without a following vowel letter at the beginning or in the middle of a word, it represents the vowel ɔ, eg. 𐴁𐴡𐴀𐴌 𐴉𐴝𐴃𐴝
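The carrier rule described above can be checked mechanically. The following is a minimal sketch, not a definitive validator: the function name `uncarried_vowels` is hypothetical, and the code point ranges are assumed from the layout of the Unicode Hanifi Rohingya block (letters A to VA at U+10D00 to U+10D1C, vowel letters at U+10D1D to U+10D21, tone marks and tassi at U+10D24 to U+10D27). It reports any vowel letter that is not preceded by a consonant or by the carrier, skipping over tone marks and the tassi, which are typed between the consonant and the vowel.

```python
from typing import List

# Assumed ranges, taken from the Unicode Hanifi Rohingya block layout.
LETTERS = range(0x10D00, 0x10D1D)        # carrier A and the consonants, A..VA
VOWELS = range(0x10D1D, 0x10D22)         # VOWEL A..VOWEL O
SKIPPED_MARKS = range(0x10D24, 0x10D28)  # tone marks and tassi


def uncarried_vowels(word: str) -> List[int]:
    """Return the indexes of vowel letters lacking a preceding consonant or carrier."""
    problems = []
    previous = None  # code point of the last character that was not a skipped mark
    for index, char in enumerate(word):
        cp = ord(char)
        if cp in SKIPPED_MARKS:
            continue  # marks sit between a consonant and its vowel
        if cp in VOWELS and (previous is None or previous not in LETTERS):
            problems.append(index)
        previous = cp
    return problems


# The word-medial carrier example from the text: JA, A, NA, U, carrier, A, RA, I.
with_carrier = "\U00010D05\U00010D1D\U00010D15\U00010D1F\U00010D00\U00010D1D\U00010D0C\U00010D1E"
# The same word with the carrier removed (hypothetical, for illustration only).
without_carrier = "\U00010D05\U00010D1D\U00010D15\U00010D1F\U00010D1D\U00010D0C\U00010D1E"

print(uncarried_vowels(with_carrier))
print(uncarried_vowels(without_carrier))
```

A check like this only flags suspicious sequences for review; it does not know about exceptions in real-world spelling.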
Long vowels are indicated using tone marks.
Diphthongs beginning or ending in the glides j or w can be written using 𐴙 [U+10D19 HANIFI ROHINGYA LETTER KINNA YA] and 𐴗 [U+10D17 HANIFI ROHINGYA LETTER KINNA WA], respectively, eg. 𐴉𐴗𐴝 𐴇𐴥𐴝𐴙𐴓𐴢
Observation: It appears that 𐴘 [U+10D18 HANIFI ROHINGYA LETTER YA] can also be used to create diphthongs, eg. 𐴁𐴝𐴘
Nasalised vowels are indicated by writing 𐴣 [U+10D23 HANIFI ROHINGYA MARK NA KHONNA] after the vowel, eg. 𐴔𐴦𐴝𐴣 𐴈𐴝𐴣𐴓𐴞
Vowel length is primarily affected by tone.
Observation: Also, it seems possible to repeat a vowel to lengthen a sound, eg. 𐴈𐴡𐴀𐴡 xoʔo xo:
Consonant clusters are typically written just as a sequence of consonant letters, eg. 𐴑𐴡𐴔𐴂𐴘𐴟𐴄𐴝𐴌
At the end of a word, or when just the consonant is written alone, 𐴢 [U+10D22 HANIFI ROHINGYA MARK SAKIN] is used after some consonants when there is no following vowel.
Consonants that typically take the sakin include the following:
And those that don't:
However, these rules are not hard and fast. For example, m may be written either way (see the example of 'turnip' below).
Examples: 𐴇𐴥𐴝𐴙𐴓𐴢 𐴐𐴥𐴝𐴓𐴒𐴡𐴔𐴢 𐴄𐴝𐴘𐴧𐴡𐴕 𐴋𐴥𐴠𐴙𐴇𐴝𐴑
Rohingya has 3 tones. They indicate stress and vowel length, and are indicated in writing using dedicated combining characters, as follows.
These usually appear above the consonant in a syllable. In some fonts, however, the mark drifts to the left, so that it appears between the consonant and the vowel, eg. 𐴁𐴤𐴝 ba¹ ba 𐴁𐴥𐴝 ba² ba 𐴁𐴦𐴝 ba³ ba
Observation: In order to achieve the best positioning using the Noto Sans Hanifi Rohingya and the Rohingya Noories One fonts, it is necessary to type the tone mark after the consonant (rather than after the vowel, as suggested in Pandey's script proposal).
When both 𐴧 [U+10D27 HANIFI ROHINGYA SIGN TASSI] and a tone mark appear together, it is important to type in and store both in the correct order for display. The tassi is typed and stored first [u,684], eg. 𐴔𐴡𐴙𐴅𐴧𐴤𐴙𐴠𐴊𐴠 moiɟ&¹iede
These diacritics may appear side-by-side, with the tone mark to the left, when they occur together. [u,684]
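The ordering rule for tassi and tone marks lends itself to a simple clean-up step for stored text. The sketch below is only an illustration under stated assumptions: the function name `tassi_first` is hypothetical, and the three tone marks are assumed to be the code points U+10D24 to U+10D26. It swaps any tone mark that is immediately followed by a tassi, so that the tassi is stored first.

```python
import re

TASSI = "\U00010D27"                         # HANIFI ROHINGYA SIGN TASSI
TONE_MARKS = "\U00010D24\U00010D25\U00010D26"  # assumed: the three tone marks

# Matches a tone mark that was typed before the tassi.
_TONE_THEN_TASSI = re.compile("([" + TONE_MARKS + "])" + TASSI)


def tassi_first(text: str) -> str:
    """Reorder 'tone mark + tassi' pairs so that the tassi comes first."""
    return _TONE_THEN_TASSI.sub(lambda match: TASSI + match.group(1), text)


# JA followed by a tone mark and then a tassi (the wrong order).
wrong_order = "\U00010D05\U00010D24\U00010D27"
fixed = tassi_first(wrong_order)
print([f"U+{ord(char):04X}" for char in fixed])
```

Text that is already in the recommended order passes through unchanged, so the function can be applied repeatedly without harm.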
This section maps Rohingya consonant sounds to common graphemes in the Hanifi Rohingya orthography.
Sounds listed as 'infrequent' are allophones, or sounds used for foreign words, etc.
Some people have used 𐴜 [U+10D1C HANIFI ROHINGYA LETTER VA] to represent the sound v, although it was not formally approved as part of the script. The normal letter to use for both w and v is 𐴖 [U+10D16 HANIFI ROHINGYA LETTER WA].
Geminated consonant sounds are indicated using 𐴧 [U+10D27 HANIFI ROHINGYA SIGN TASSI], which is typed immediately after the consonant and before any following vowel, and which is rendered above the consonant letter, eg. compare the z sounds in the following 𐴎𐴥𐴞𐴘𐴡𐴎𐴧𐴝𐴔𐴝𐴘
Consonant clusters are written simply as a sequence of consonants, eg. 𐴑𐴟𐴌𐴥𐴏𐴞 𐴀𐴞𐴏𐴃𐴞𐴌𐴞 𐴇𐴥𐴠𐴚𐴒𐴝𐴌
Hanifi Rohingya has a set of native digits.
Numbers are written left-to-right within the overall right-to-left flow.
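Because numbers run left-to-right, the most significant digit is stored first in memory, just as with ASCII digits. The following sketch converts between Python integers and Hanifi Rohingya digit strings. The helper names are hypothetical, and the code assumes that the ten digits occupy the contiguous range U+10D30 to U+10D39, with U+10D31 being DIGIT ONE as cited elsewhere on this page.

```python
HANIFI_ZERO = 0x10D30  # assumed: HANIFI ROHINGYA DIGIT ZERO, start of a contiguous run of ten digits


def to_hanifi_digits(number: int) -> str:
    """Render a non-negative integer using Hanifi Rohingya digits."""
    if number < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    # str() yields the most significant digit first, which is also the storage order.
    return "".join(chr(HANIFI_ZERO + int(digit)) for digit in str(number))


def from_hanifi_digits(text: str) -> int:
    """Parse a non-empty string of Hanifi Rohingya digits into an integer."""
    if not text:
        raise ValueError("empty string")
    value = 0
    for char in text:
        digit = ord(char) - HANIFI_ZERO
        if not 0 <= digit <= 9:
            raise ValueError(f"not a Hanifi Rohingya digit: U+{ord(char):04X}")
        value = value * 10 + digit
    return value


print(to_hanifi_digits(2022))
print(from_hanifi_digits(to_hanifi_digits(1975)))
```

No reordering is needed when producing the string: the bidirectional algorithm takes care of displaying the digits left-to-right inside right-to-left text.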
Hanifi Rohingya text is written horizontally and right-to-left in the main but, as in most right-to-left scripts, numbers and embedded text in other scripts are written left-to-right (producing 'bidirectional' text).
The Unicode Bidirectional Algorithm automatically takes care of the ordering for all the text in fig_bidi, as long as the 'base direction' is set to RTL. In HTML this can be set using the dir attribute, or in plain text using formatting controls.
For other aspects of dealing with right-to-left writing systems see the following sections:
For more information about how directionality and base direction work, see Unicode Bidirectional Algorithm basics. For information about plain text formatting characters see How to use Unicode controls for bidi text. And for working with markup in HTML, see Creating HTML Pages in Arabic, Hebrew and Other Right-to-left Scripts.
For authoring HTML pages, one of the most important things to remember is to use <html dir="rtl" … > at the top of the page. Also, use markup to manage direction, and do not use CSS styling.
Unicode provides a set of 10 formatting characters that can be used to control the direction of text when displayed. These characters have no visual form in the rendered text, however text editing applications may have a way to show their location.
RLE [U+202B RIGHT-TO-LEFT EMBEDDING], LRE [U+202A LEFT-TO-RIGHT EMBEDDING], and PDF [U+202C POP DIRECTIONAL FORMATTING] are in widespread use to set the base direction of a range of characters. RLE/LRE come at the start, and PDF at the end of a range of characters for which the base direction is to be set.
More recently, the Unicode Standard added a set of characters which do the same thing but also isolate the content from surrounding characters, in order to avoid spillover effects. They are RLI [U+2067 RIGHT-TO-LEFT ISOLATE], LRI [U+2066 LEFT-TO-RIGHT ISOLATE], and PDI [U+2069 POP DIRECTIONAL ISOLATE]. The Unicode Standard recommends that these be used instead of RLE, LRE, and PDF.
There is also FSI [U+2068 FIRST STRONG ISOLATE], used initially to set the base direction according to the first recognised strongly-directional character.
ALM [U+061C ARABIC LETTER MARK] is used to produce correct sequencing of numeric data.
RLM [U+200F RIGHT-TO-LEFT MARK] and LRM [U+200E LEFT-TO-RIGHT MARK] are invisible characters with strong directional properties that are also sometimes used to produce the correct ordering of text.
For more information about how to use these formatting characters see How to use Unicode controls for bidi text. Note, however, that when writing HTML you should generally use markup rather than these control codes. For information about that, see Creating HTML Pages in Arabic, Hebrew and Other Right-to-left Scripts.
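For plain text, where markup is not available, the isolate characters described above can be applied with a small helper. This is a minimal sketch; the function name `isolate` and its `direction` values are hypothetical choices made for illustration. It wraps a string in RLI, LRI or FSI and closes it with PDI.

```python
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

_OPENERS = {"rtl": RLI, "ltr": LRI, "auto": FSI}


def isolate(text: str, direction: str = "auto") -> str:
    """Wrap text in a directional isolate so it cannot affect surrounding text."""
    if direction not in _OPENERS:
        raise ValueError("direction must be 'rtl', 'ltr' or 'auto'")
    return _OPENERS[direction] + text + PDI


# Embed a Latin-script name inside a plain-text string that has an RTL base direction.
wrapped = isolate("Rohingyalish", "ltr")
print([f"U+{ord(char):04X}" for char in (wrapped[0], wrapped[-1])])
```

Using "auto" (FSI) is convenient when the direction of the inserted text is not known in advance, for example when it comes from user input.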
This section brings together information about the following topics: writing styles; cursive text; context-based shaping; context-based positioning; baselines, line height, etc.; font styles; case & other character transforms.
You can experiment with examples using the Hanifi Rohingya character app.
The orthography has no case distinction, and no special transforms are needed to convert between characters.
Hanifi Rohingya is cursive, ie. letters in a word are joined up.
Nearly all letters join on both sides. 𐴢 [U+10D22 HANIFI ROHINGYA MARK SAKIN] joins only to the right. And 𐴀 [U+10D00 HANIFI ROHINGYA LETTER A] joins only to the left, which is very unusual. Fonts automatically produce the appropriate joining form for a code point, according to its visual context.
The cursive treatment produces only minor changes to glyph shapes in most cases, other than extensions to the baseline. 𐴔 [U+10D14 HANIFI ROHINGYA LETTER MA] is an exception, with different final and medial/initial shapes (see fig_joining_forms).
There is a style of font which behaves slightly differently, and appears to be quite commonly used. fig_cursive_triangle_noories shows an example. Note how the n doesn't join to the right, and the o falls short of the l. The glyphs in this font don't have joining strokes to the right, and taper off and barely touch (if they do) to the left.
The following tables show all joining forms.
Word units are separated by spaces.
: [U+003A COLON]
. [U+002E FULL STOP]
Observation: It seems to be standard practice to separate the punctuation from the foregoing text with a space.
Observation: Two online sites use punctuation that looks like the Burmese section marks, ၊ [U+104A MYANMAR SIGN LITTLE SECTION] and ။ [U+104B MYANMAR SIGN SECTION], except that they use different characters. One uses a single or double 𐴱 [U+10D31 HANIFI ROHINGYA DIGIT ONE], the other uses | [U+007C VERTICAL LINE] or |𐴱 [U+007C VERTICAL LINE + U+10D31 HANIFI ROHINGYA DIGIT ONE]. (The page from which the example in fig_section_signs is taken also uses other strange punctuation choices, such as the Arabic thousands separator instead of the Arabic comma, that can be seen in the bottom line of the example.)
It is important to note that the Unicode names for parentheses, brackets, and other paired characters should be ignored. LEFT should be read as if it said START, and RIGHT as END. The direction in which the glyphs point will be automatically determined according to the base direction of the text.
The number of characters that are mirrored in this way is around 550, most of which are mathematical symbols. Some are single characters, rather than pairs. The following are some of the more common ones used for Rohingya.
|initial||” [U+201D RIGHT DOUBLE QUOTATION MARK]|
|nested||’ [U+2019 RIGHT SINGLE QUOTATION MARK]|
Unlike the bracketing quotation marks, these characters are not mirrored during display. This means that LEFT means use on the left, and RIGHT means use on the right.
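The difference between mirrored brackets and non-mirrored quotation marks can be confirmed from the Unicode character database. The sketch below uses Python's standard `unicodedata.mirrored()` function; the wrapper name `is_mirrored` is hypothetical.

```python
import unicodedata


def is_mirrored(char: str) -> bool:
    """Report whether a character has the Unicode Bidi_Mirrored property."""
    return unicodedata.mirrored(char) == 1


for char in "()[]\u201C\u201D\u2018\u2019":
    name = unicodedata.name(char)
    print(f"U+{ord(char):04X} {name}: mirrored={is_mirrored(char)}")
```

Parentheses and square brackets report as mirrored, so the same code point can be stored for the start of a bracketed range regardless of base direction. The curly quotation marks report as not mirrored, which is why the choice between the LEFT and RIGHT code points has to be made explicitly for them.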
Observation: Lines appear to be broken at word boundaries.
When a line break occurs in the middle of an embedded left-to-right sequence, the items in that sequence need to be rearranged visually so that it isn't necessary to read lines from bottom to top.
Of course, the rearrangement is only that of the visual glyphs: nothing affects the order of the characters in memory.
Examples of printed matter show full justification. A baseline extension is frequently used to stretch words in order to achieve flush lines (see fig_justification).
Pandey recommends using ـ [U+0640 ARABIC TATWEEL] for this [p,12]. However, it should be noted that the tatweel character is only useful if the text is static. If window resizing or inserted text causes the line breaks to appear between different words, the tatweels will end up in the wrong place.
You can experiment with counter styles using the Counter styles converter. Patterns for using these styles in CSS can be found in Ready-made Counter Styles, and we use the names of those patterns here to refer to the various styles.
Hanifi Rohingya uses numeric counters.
The numeric style is decimal-based and uses these digits.
Observation: Further examples are needed to clarify the standard prefix and/or suffix for lists. The examples in fig_counters show circled numbers followed by a hyphen, and numbers followed by an equals sign.
This section is for any features that are specific to this script and that relate to the following topics: general page layout & progression; grids & tables; notes, footnotes, etc; forms & user interaction; page numbering, running headers, etc.
Hanifi Rohingya books, magazines, etc., are bound on the right-hand side, and pages progress from right to left.
Columns are vertical but run right-to-left across the page.