Arabic script homographs

Updated 24 March, 2023

This page lists a number of Arabic script characters and character sequences that look the same, given the right font, but that may need to be used with caution because the alternative forms differ in usage and meaning.

Ijam, tashkil, hamza

Chapter 9 of the Unicode Standard makes an important distinction between ijam and tashkil diacritics.

إِعْجَام An ijam is a diacritic in the Arabic script that is considered to be an integral part of a basic letter form, such as the dots in ث [U+062B ARABIC LETTER THEH], pronounced θ. Unicode encodes letter+ijam combinations as atomic characters, which are never given equivalent decompositions in the standard. Ijam generally take the form of one-, two-, three- or four-dot markings above or below the basic letter skeleton, although other diacritic forms occur, especially in extensions of the Arabic script in Central and South Asia and in Africa. For example, ۈ [U+06C8 ARABIC LETTER YU] is a letter with ijam that represents the vowel y in the Uighur orthography.
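The claim that letter+ijam characters have no decompositions can be checked against the Unicode Character Database. The following is a minimal sketch using Python's standard `unicodedata` module. The helper name `has_decomposition` is an illustrative assumption, and results depend on the Unicode version bundled with the Python build.

```python
import unicodedata


def has_decomposition(ch: str) -> bool:
    """Return True if the character has any decomposition mapping
    (canonical or compatibility) in the Unicode Character Database."""
    return unicodedata.decomposition(ch) != ""


THEH = "\u062B"  # ARABIC LETTER THEH (letter with ijam)
YU = "\u06C8"    # ARABIC LETTER YU (letter with ijam)

for ch in (THEH, YU):
    # An atomic letter should survive every normalization form unchanged.
    unchanged = all(
        unicodedata.normalize(form, ch) == ch
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    )
    print(f"U+{ord(ch):04X}", has_decomposition(ch), unchanged)
```

For contrast, a character such as U+0623 ARABIC LETTER ALEF WITH HAMZA ABOVE does report a decomposition, because hamza is not treated as ijam.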

تَشْكِيل A tashkil is an Arabic script mark that indicates vocalization of text or provides another type of phonetic guide to pronunciation, such as in ثَ [U+062B ARABIC LETTER THEH + U+064E ARABIC FATHA], pronounced θa. Tashkil include several subtypes: harakat (short vowel marks), tanwin (postnasalized or long vowel marks), shaddah (consonant gemination mark), and sukun (to mark lack of a following vowel). A basic Arabic letter plus any of these types of marks is never encoded as an atomic, precomposed character, but must always be represented as a sequence of letter plus a separate combining mark. For example, هٰ [U+0647 ARABIC LETTER HEH + U+0670 ARABIC LETTER SUPERSCRIPT ALEF], pronounced ha, is a letter plus tashkil combination in Arabic (cf. the use of that diacritic as part of a precomposed character in Uighur).
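Because tashkil are always separate combining marks, they can be inspected or removed without touching the base letter. The sketch below assumes a deliberately narrow set of tashkil code points (U+064B to U+0652 plus U+0670) chosen only for illustration; the set `TASHKIL` and the helper `strip_tashkil` are hypothetical names, not part of any standard API.

```python
import unicodedata

# Assumption: a narrow, illustrative tashkil set.
# U+064B..U+0652 covers the tanwin, harakat, shadda and sukun marks;
# U+0670 is ARABIC LETTER SUPERSCRIPT ALEF.
TASHKIL = {chr(cp) for cp in range(0x064B, 0x0653)} | {"\u0670"}


def strip_tashkil(text: str) -> str:
    """Remove the marks listed in TASHKIL, leaving base letters intact."""
    return "".join(ch for ch in text if ch not in TASHKIL)


theh_fatha = "\u062B\u064E"  # THEH + FATHA

# No precomposed form exists, so NFC leaves the two code points as they are.
print(len(unicodedata.normalize("NFC", theh_fatha)))

# The mark is a nonspacing combining character with a non-zero combining class.
print(unicodedata.category("\u064E"), unicodedata.combining("\u064E"))

# Stripping the tashkil leaves the letter, including its ijam, untouched.
print(strip_tashkil(theh_fatha) == "\u062B")
```

Note that a blanket rule such as "remove every nonspacing mark" would also remove U+0654 ARABIC HAMZA ABOVE, which is not a tashkil, so an explicit list is the safer choice here.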

The distinction between a character with ijam and a letter combined with a tashkil becomes important when choosing which Unicode characters to use because, as the examples above show, the visual forms can be identical. Using the wrong character can change the meaning of the text and affect text search, font rendering, text to speech, etc.

There are, however, some very common combinations of diacritic and base that can be represented using either precomposed characters or decomposed sequences that are canonically equivalent. For those, the standard encourages the use of the precomposed form, but the fact that the forms are canonically equivalent removes concerns about changes in meaning.
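The practical difference between the two situations shows up in text search. The following sketch normalizes both strings to NFC before comparing; `nfc_contains` is a hypothetical helper name, and the two sample pairs are taken from the examples on this page.

```python
import unicodedata


def nfc_contains(haystack: str, needle: str) -> bool:
    """Substring test after normalizing both strings to NFC."""
    norm_haystack = unicodedata.normalize("NFC", haystack)
    norm_needle = unicodedata.normalize("NFC", needle)
    return norm_needle in norm_haystack


# Canonically equivalent: ALEF + HAMZA ABOVE matches ALEF WITH HAMZA ABOVE.
print(nfc_contains("\u0627\u0654", "\u0623"))

# Not equivalent: WAW + SUPERSCRIPT ALEF does not match the letter YU,
# even though the two can look identical in some fonts.
print(nfc_contains("\u0648\u0670", "\u06C8"))
```

Normalization therefore protects against the first kind of mismatch but not the second, which has to be avoided by choosing the right characters when the text is authored.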

هَمْزة The hamza is another Arabic script mark that may be precomposed with a letter in some code points, or attached to a letter as a combining mark in others. It is not regarded as a tashkil. It is typically used for the Arabic language to represent the glottal stop, or in Persian or Urdu as the ezafe, but it has other uses in extended orthographies. For example, it represents a vowel in Kashmiri, and as such can appear above a number of letters for which there are no precomposed alternatives.

The Arabic letter yeh is associated with some special, idiosyncratic rules when it comes to the hamza.

Homographs

The tables that follow give a non-exhaustive list of homographs, with information about usage where that could be found. The notes are illustrative only.

Canonically-equivalent homographs

In these cases, either a precomposed character or decomposed sequence can be used because they are canonically equivalent.

Diacritic Precomposed Decomposed
ٓ آ [U+0622 ARABIC LETTER ALEF WITH MADDA ABOVE] آ [U+0627 ARABIC LETTER ALEF + U+0653 ARABIC MADDAH ABOVE]
ٔ أ [U+0623 ARABIC LETTER ALEF WITH HAMZA ABOVE] أ [U+0627 ARABIC LETTER ALEF + U+0654 ARABIC HAMZA ABOVE]
ؤ [U+0624 ARABIC LETTER WAW WITH HAMZA ABOVE] ؤ [U+0648 ARABIC LETTER WAW + U+0654 ARABIC HAMZA ABOVE]
ئ [U+0626 ARABIC LETTER YEH WITH HAMZA ABOVE] ئ [U+064A ARABIC LETTER YEH + U+0654 ARABIC HAMZA ABOVE]
ۓ [U+06D3 ARABIC LETTER YEH BARREE WITH HAMZA ABOVE] ۓ [U+06D2 ARABIC LETTER YEH BARREE + U+0654 ARABIC HAMZA ABOVE]
ٕ إ [U+0625 ARABIC LETTER ALEF WITH HAMZA BELOW] إ [U+0627 ARABIC LETTER ALEF + U+0655 ARABIC HAMZA BELOW]
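The equivalences in this table can be verified mechanically. The sketch below round-trips each pair through NFC and NFD with Python's `unicodedata` module; the names `CANONICAL_PAIRS` and `check_pairs` are illustrative assumptions.

```python
import unicodedata

# Precomposed character -> decomposed sequence, as listed in the table above.
CANONICAL_PAIRS = {
    "\u0622": "\u0627\u0653",  # ALEF WITH MADDA ABOVE
    "\u0623": "\u0627\u0654",  # ALEF WITH HAMZA ABOVE
    "\u0624": "\u0648\u0654",  # WAW WITH HAMZA ABOVE
    "\u0626": "\u064A\u0654",  # YEH WITH HAMZA ABOVE
    "\u06D3": "\u06D2\u0654",  # YEH BARREE WITH HAMZA ABOVE
    "\u0625": "\u0627\u0655",  # ALEF WITH HAMZA BELOW
}


def check_pairs(pairs: dict) -> list:
    """Return the precomposed characters whose pair does not round-trip
    (NFC of the sequence, NFD of the precomposed character)."""
    failures = []
    for precomposed, decomposed in pairs.items():
        composes = unicodedata.normalize("NFC", decomposed) == precomposed
        decomposes = unicodedata.normalize("NFD", precomposed) == decomposed
        if not (composes and decomposes):
            failures.append(precomposed)
    return failures


# An empty list means every pair is canonically equivalent.
print(check_pairs(CANONICAL_PAIRS))
```

Passing a pair that only looks alike, such as U+08A1 with U+0628 + U+0654, reports a failure, which is a quick way to tell the two kinds of homograph apart.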

Yeh & hamza

The letter yeh with hamza above has rather complicated rules, due to the way it was encoded and the need to accommodate different dot patterns.

ٔ when you want no dots in any positional form.
Do not use: ىٔ [U+0649 ARABIC LETTER ALEF MAKSURA + U+0654 ARABIC HAMZA ABOVE]
Do use: ئ [U+0626 ARABIC LETTER YEH WITH HAMZA ABOVE] or ئ [U+064A ARABIC LETTER YEH + U+0654 ARABIC HAMZA ABOVE] (canonically equivalent). Although U+064A keeps its dots below when combined with any other mark, fonts should remove those dots when it is combined with hamza.
ٔ when you do want dots in all positional forms.
Do not use: ئ [U+0626 ARABIC LETTER YEH WITH HAMZA ABOVE]
Do use: ࢨ [U+08A8 ARABIC LETTER YEH WITH TWO DOTS BELOW AND HAMZA ABOVE], eg. in Adamawa Fulfulde.
ٔ when you want dots in initial & medial positional forms only.
Do not use: ئ [U+0626 ARABIC LETTER YEH WITH HAMZA ABOVE]
Do use: یٔ [U+06CC ARABIC LETTER FARSI YEH + U+0654 ARABIC HAMZA ABOVE]
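The reason the choice matters is that only one of these sequences normalizes to U+0626. The sketch below illustrates this with NFC; the `RECOMMENDED` lookup and its keys are hypothetical, and simply restate the table above in code form.

```python
import unicodedata

ALEF_MAKSURA_HAMZA = "\u0649\u0654"   # ALEF MAKSURA + HAMZA ABOVE (avoid)
YEH_HAMZA = "\u0626"                  # YEH WITH HAMZA ABOVE
YEH_HAMZA_SEQUENCE = "\u064A\u0654"   # YEH + HAMZA ABOVE
YEH_DOTS_HAMZA = "\u08A8"             # YEH WITH TWO DOTS BELOW AND HAMZA ABOVE
FARSI_YEH_HAMZA = "\u06CC\u0654"      # FARSI YEH + HAMZA ABOVE

# Hypothetical lookup restating the recommendations in the table.
RECOMMENDED = {
    "no dots": YEH_HAMZA,
    "dots in all forms": YEH_DOTS_HAMZA,
    "dots in initial and medial forms": FARSI_YEH_HAMZA,
}


def nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)


# YEH + HAMZA ABOVE composes to U+0626, so the two are interchangeable.
print(nfc(YEH_HAMZA_SEQUENCE) == YEH_HAMZA)

# ALEF MAKSURA + HAMZA ABOVE and FARSI YEH + HAMZA ABOVE do not compose,
# so they remain distinct from U+0626 for searching and comparison.
print(nfc(ALEF_MAKSURA_HAMZA) == YEH_HAMZA)
print(nfc(FARSI_YEH_HAMZA) == YEH_HAMZA)
```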

Homographs that are not equivalent

The following table is a non-exhaustive list of precomposed characters that are not canonically equivalent to their letter+mark lookalikes. Choosing the wrong form may therefore change the semantics of the text and cause searches to fail. Examples of use are shown for many.

Diacritic   Precomposed Decomposed
ٔ ࢡ [U+08A1 ARABIC LETTER BEH WITH HAMZA ABOVE]
eg. Fulfulde ɓ
بٔ [U+0628 ARABIC LETTER BEH + U+0654 ARABIC HAMZA ABOVE]
eg. Kashmiri
ݬ ݬ [U+076C ARABIC LETTER REH WITH HAMZA ABOVE]
eg. Ormuri ʑ
رٔ [U+0631 ARABIC LETTER REH + U+0654 ARABIC HAMZA ABOVE]
eg. Kashmiri
ځ ځ [U+0681 ARABIC LETTER HAH WITH HAMZA ABOVE]
eg. Pashto d͡z
حٔ [U+062D ARABIC LETTER HAH + U+0654 ARABIC HAMZA ABOVE]
eg. Kashmiri
ٟ ٳ ٳ [U+0673 ARABIC LETTER ALEF WITH WAVY HAMZA BELOW]
Deprecated & strongly discouraged by the Unicode Standard.
اٟ [U+0627 ARABIC LETTER ALEF + U+065F ARABIC WAVY HAMZA BELOW]
eg. Kashmiri ɨː
ٴ ٵ ٵ [U+0675 ARABIC LETTER HIGH HAMZA ALEF]
Dangerous because compatibility decompositions result in a hamza to the left. Unicode therefore recommends avoiding this character.
ٴا [U+0674 ARABIC LETTER HIGH HAMZA + U+0627 ARABIC LETTER ALEF]
eg. Kazakh æ, preferred spelling.
ٶ ٶ [U+0676 ARABIC LETTER HIGH HAMZA WAW]
ditto
ٴو [U+0674 ARABIC LETTER HIGH HAMZA + U+0648 ARABIC LETTER WAW]
eg. Kazakh ø̞, preferred spelling.
ٸ ٸ [U+0678 ARABIC LETTER HIGH HAMZA YEH]
ditto
ٴي [U+0674 ARABIC LETTER HIGH HAMZA + U+064A ARABIC LETTER YEH]
eg. Kazakh ɪ
ٷ ٷ [U+0677 ARABIC LETTER U WITH HAMZA ABOVE]
ditto
ٴۇ [U+0674 ARABIC LETTER HIGH HAMZA + U+06C7 ARABIC LETTER U]
eg. Kazakh ʏ
ٛ ؽ ؽ [U+063D ARABIC LETTER FARSI YEH WITH INVERTED V]
eg. Azerbaijani vowel ɯ
یٛ [U+06CC ARABIC LETTER FARSI YEH + U+065B ARABIC VOWEL SIGN INVERTED SMALL V ABOVE]
eg. Fulfulde consonant+o (applies also to the cells that follow)
ۉ ۉ [U+06C9 ARABIC LETTER KIRGHIZ YU]
eg. Kyrgyz y
وٛ [U+0648 ARABIC LETTER WAW + U+065B ARABIC VOWEL SIGN INVERTED SMALL V ABOVE]
ۯ ۯ [U+06EF ARABIC LETTER REH WITH INVERTED V]
Parkari ɭ
رٛ [U+0631 ARABIC LETTER REH + U+065B ARABIC VOWEL SIGN INVERTED SMALL V ABOVE]
ۮ ۮ [U+06EE ARABIC LETTER DAL WITH INVERTED V]
Parkari ɗ
دٛ [U+062F ARABIC LETTER DAL + U+065B ARABIC VOWEL SIGN INVERTED SMALL V ABOVE]
ۿ ۿ [U+06FF ARABIC LETTER HEH WITH INVERTED V]
Parkari ɦ
هٛ [U+0647 ARABIC LETTER HEH + U+065B ARABIC VOWEL SIGN INVERTED SMALL V ABOVE]
ࢲ [U+08B2 ARABIC LETTER ZAIN WITH INVERTED V ABOVE]
Berber
زٛ [U+0632 ARABIC LETTER ZAIN + U+065B ARABIC VOWEL SIGN INVERTED SMALL V ABOVE]
ࢉ [U+0889 ARABIC LETTER NOON WITH INVERTED SMALL V]
Arebica ɲ
ںٛ [U+06BA ARABIC LETTER NOON GHUNNA + U+065B ARABIC VOWEL SIGN INVERTED SMALL V ABOVE]
ࢊ [U+088A ARABIC LETTER HAH WITH INVERTED SMALL V BELOW]
Arebica t͡ɕ
حٛ [U+062D ARABIC LETTER HAH + U+065B ARABIC VOWEL SIGN INVERTED SMALL V ABOVE]
Diacritic is incorrectly positioned.
ݾ ݾ [U+077E ARABIC LETTER SEEN WITH INVERTED V]
Early Persian
سٛ [U+0633 ARABIC LETTER SEEN + U+065B ARABIC VOWEL SIGN INVERTED SMALL V ABOVE]
ٚ ۆ ۆ [U+06C6 ARABIC LETTER OE]
Kazakh v
وٚ [U+0648 ARABIC LETTER WAW + U+065A ARABIC VOWEL SIGN SMALL V ABOVE]
Kashmiri consonant+e (applies also to other cells below).
ێ ێ [U+06CE ARABIC LETTER YEH WITH SMALL V]
Sorani e
یٚ [U+06CC ARABIC LETTER FARSI YEH + U+065A ARABIC VOWEL SIGN SMALL V ABOVE]
ࣀ [U+08C0 ARABIC LETTER TTEH WITH SMALL V]
Hindko
ٹٚ [U+0679 ARABIC LETTER TTEH + U+065A ARABIC VOWEL SIGN SMALL V ABOVE]
ࣁ [U+08C1 ARABIC LETTER TCHEH WITH SMALL V]
Hindko
چٚ [U+0686 ARABIC LETTER TCHEH + U+065A ARABIC VOWEL SIGN SMALL V ABOVE]
ࣂ [U+08C2 ARABIC LETTER KEHEH WITH SMALL V]
Hindko
کٚ [U+06A9 ARABIC LETTER KEHEH + U+065A ARABIC VOWEL SIGN SMALL V ABOVE]
ࢾ [U+08BE ARABIC LETTER PEH WITH SMALL V]
Hindko
پٚ [U+067E ARABIC LETTER PEH + U+065A ARABIC VOWEL SIGN SMALL V ABOVE]
ࢿ [U+08BF ARABIC LETTER TEH WITH SMALL V]
Hindko
تٚ [U+062A ARABIC LETTER TEH + U+065A ARABIC VOWEL SIGN SMALL V ABOVE]
ݩ ݩ [U+0769 ARABIC LETTER NOON WITH SMALL V]
eg. Gorji, & Arebica ɲ
نٚ [U+0646 ARABIC LETTER NOON + U+065A ARABIC VOWEL SIGN SMALL V ABOVE]
ُ ۇ ۇ [U+06C7 ARABIC LETTER U]
eg. Uyghur u
وُ [U+0648 ARABIC LETTER WAW + U+064F ARABIC DAMMA]
eg. Arabic wu
ٰ ۈ ۈ [U+06C8 ARABIC LETTER YU]
eg. Uyghur y
وٰ [U+0648 ARABIC LETTER WAW + U+0670 ARABIC LETTER SUPERSCRIPT ALEF]
eg. Arabic ha
ؕ ٹ ٹ [U+0679 ARABIC LETTER TTEH]
eg. Urdu ʈ
ٮؕ [U+066E ARABIC LETTER DOTLESS BEH + U+0615 ARABIC SMALL HIGH TAH]
Marks a recommended pause position in some Qurans published in Iran and Pakistan. The Unicode Standard warns not to confuse this mark with the diacritic used in the precomposed characters.
ڋ ڋ [U+068B ARABIC LETTER DAL WITH DOT BELOW AND SMALL TAH]
eg. Lahnda
ڊؕ [U+068A ARABIC LETTER DAL WITH DOT BELOW + U+0615 ARABIC SMALL HIGH TAH]
ditto
ڻ ڻ [U+06BB ARABIC LETTER RNOON]
eg. Sindhi ɳ
ںؕ [U+06BA ARABIC LETTER NOON GHUNNA + U+0615 ARABIC SMALL HIGH TAH]
ditto
ࣇ [U+08C7 ARABIC LETTER LAM WITH SMALL ARABIC LETTER TAH ABOVE]
eg. Punjabi ɭ
لؕ [U+0644 ARABIC LETTER LAM + U+0615 ARABIC SMALL HIGH TAH]
ditto
ݰ ݰ [U+0770 ARABIC LETTER SEEN WITH SMALL ARABIC LETTER TAH AND TWO DOTS]
eg. Khowar ʂ
ݱ ݱ [U+0771 ARABIC LETTER REH WITH SMALL ARABIC LETTER TAH AND TWO DOTS]
eg. Khowar ʐ
ݲ ݲ [U+0772 ARABIC LETTER HAH WITH SMALL ARABIC LETTER TAH ABOVE]
eg. Torwali
ݨ ݨ [U+0768 ARABIC LETTER NOON WITH SMALL TAH]
eg. Saraiki, Pathwari ɳ
نؕ [U+0646 ARABIC LETTER NOON + U+0615 ARABIC SMALL HIGH TAH]
ۢ ࢶ [U+08B6 ARABIC LETTER BEH WITH SMALL MEEM ABOVE]
Bravanese mba
بۢ [U+0628 ARABIC LETTER BEH + U+06E2 ARABIC SMALL HIGH MEEM ISOLATED FORM]
Quranic annotation form.
ࢷ [U+08B7 ARABIC LETTER PEH WITH SMALL MEEM ABOVE]
Bravanese mpa
پۢ [U+067E ARABIC LETTER PEH + U+06E2 ARABIC SMALL HIGH MEEM ISOLATED FORM]
Quranic annotation form
ۨ ࢹ [U+08B9 ARABIC LETTER REH WITH SMALL NOON ABOVE]
Bravanese nra
رۨ [U+0631 ARABIC LETTER REH + U+06E8 ARABIC SMALL HIGH NOON]
Quranic annotation form.
ࢺ [U+08BA ARABIC LETTER YEH WITH TWO DOTS BELOW AND SMALL NOON ABOVE]
Bravanese nja
يۨ [U+064A ARABIC LETTER YEH + U+06E8 ARABIC SMALL HIGH NOON]
Quranic annotation form.
﮶ ࣃ [U+08C3 ARABIC LETTER GHAIN WITH THREE DOTS ABOVE]
Hausa ɡʷ
غ﮶ [U+063A ARABIC LETTER GHAIN + U+FBB6 ARABIC SYMBOL THREE DOTS ABOVE]
ࣄ [U+08C4 ARABIC LETTER AFRICAN QAF WITH THREE DOTS ABOVE]
eg. Hausa ƙʷ
ࢼ﮶ [U+08BC ARABIC LETTER AFRICAN QAF + U+FBB6 ARABIC SYMBOL THREE DOTS ABOVE]
ࣅ [U+08C5 ARABIC LETTER JEEM WITH THREE DOTS ABOVE]
Hausa, Wolof or other African orthographies
ج﮶ [U+062C ARABIC LETTER JEEM + U+FBB6 ARABIC SYMBOL THREE DOTS ABOVE]
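Since normalization does not unify any of the pairs in this table, one option is to flag suspicious sequences explicitly. The sketch below is one possible approach, not an established tool: `LOOKALIKES` holds a small sample of rows from the table, and `find_lookalikes` is a hypothetical helper. It also shows why the precomposed high hamza letters are risky: the compatibility decomposition of U+0675 puts the high hamza after the alef, which is the reverse of the preferred spelling.

```python
import unicodedata

# Sample of non-equivalent pairs from the table: sequence -> lookalike letter.
LOOKALIKES = {
    "\u0628\u0654": "\u08A1",  # BEH + HAMZA ABOVE / BEH WITH HAMZA ABOVE
    "\u0631\u0654": "\u076C",  # REH + HAMZA ABOVE / REH WITH HAMZA ABOVE
    "\u062D\u0654": "\u0681",  # HAH + HAMZA ABOVE / HAH WITH HAMZA ABOVE
    "\u0648\u064F": "\u06C7",  # WAW + DAMMA / U
    "\u0648\u0670": "\u06C8",  # WAW + SUPERSCRIPT ALEF / YU
    "\u0648\u065A": "\u06C6",  # WAW + SMALL V ABOVE / OE
    "\u06CC\u065A": "\u06CE",  # FARSI YEH + SMALL V ABOVE / YEH WITH SMALL V
}


def find_lookalikes(text: str) -> list:
    """Return (index, sequence, lookalike) for every listed sequence found."""
    hits = []
    for sequence, lookalike in LOOKALIKES.items():
        start = text.find(sequence)
        while start != -1:
            hits.append((start, sequence, lookalike))
            start = text.find(sequence, start + 1)
    return sorted(hits)


# NFC leaves every pair distinct, so normalization alone cannot help.
print(all(unicodedata.normalize("NFC", s) != p for s, p in LOOKALIKES.items()))

# U+0675 HIGH HAMZA ALEF decomposes (compatibility) to ALEF + HIGH HAMZA,
# whereas the preferred spelling is HIGH HAMZA + ALEF.
PREFERRED_HIGH_HAMZA_ALEF = "\u0674\u0627"
decomposed = unicodedata.normalize("NFKD", "\u0675")
print(decomposed == PREFERRED_HIGH_HAMZA_ALEF)
```

Whether a flagged sequence is actually wrong depends on the language of the text: BEH + HAMZA ABOVE is correct for Kashmiri but not for Fulfulde, so a check like this can only raise a question for a human or a language-aware tool to answer.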