Arabic

Updated 25 July, 2017 • tags arabic, scriptnotes

This page provides basic information about the Arabic script. For similar information related to other scripts, see the Script comparison table.

For more details, see the Character notes and Script links pages.

Samples

Arabic

المادة 1 يولد جميع الناس أحرارًا متساوين في الكرامة والحقوق. وقد وهبوا عقلاً وضميرًا وعليهم أن يعامل بعضهم بعضًا بروح الإخاء.

المادة 2 لكل إنسان حق التمتع بكافة الحقوق والحريات الواردة في هذا الإعلان، دون أي تمييز، كالتمييز بسبب العنصر أو اللون أو الجنس أو اللغة أو الدين أو الرأي السياسي أو أي رأي آخر، أو الأصل الوطني أو الإجتماعي أو الثروة أو الميلاد أو أي وضع آخر، دون أية تفرقة بين الرجال والنساء. وفضلاً عما تقدم فلن يكون هناك أي تمييز أساسه الوضع السياسي أو القانوني أو الدولي لبلد أو البقعة التي ينتمي إليها الفرد سواء كان هذا البلد أو تلك البقعة مستقلاً أو تحت الوصاية أو غير متمتع بالحكم الذاتي أو كانت سيادته خاضعة لأي قيد من القيود.

Urdu (nastaliq style)

دفعہ ۱۔ تمام انسان آزاد اور حقوق و عزت کے اعتبار سے برابر پیدا ہوئے ہیں۔ انہیں ضمیر اور عقل ودیعت ہوئی ہے۔ اس لئے انہیں ایک دوسرے کے ساتھ بھائی چارے کا سلوک کرنا چاہیئے۔

دفعہ ۲۔ ہر شخص ان تمام آزادیوں اور حقوق کا مستحق ہے جو اس اعلان میں بیان کئے گئے ہیں، اور اس حق پر نسل، رنگ، جنس، زبان، مذہب اور سیاسی تفریق کا یا کسی قسم کے عقیدے، قوم، معاشرے، دولت یا خاندانی حیثیت وغیرہ کا کوئی اثر نہ پڑے گا۔ اس کے علاوہ جس علاقے یا ملک سے جو شخص تعلق رکھتا ہے اس کی سیاسی کیفیت دائرہ اختیار یا بین الاقوامی حیثیت کی بنا پر اس سے کوئی امتیازی سلوک نہیں کیا جائے گا۔ چاہے وہ ملک یا علاقہ آزاد ہو یا تولیتی ہو یا غیر مختار ہو یا سیاسی اقتدار کے لحاظ سے کسی دوسری بندش کا پابند ہو۔

Key features

Arabic is an abjad. This means that in normal use the script represents only consonants and long vowel sounds. This approach works well because Semitic languages place a strong emphasis on consonant patterns; however, the Arabic script is also used for other kinds of language (such as Urdu, an Indo-European language).

For more information see ScriptSource, Wikipedia or Omniglot.

Text direction

Arabic script is written horizontally and right-to-left in the main, but, as with all RTL scripts, numbers and embedded LTR script text are written left-to-right (producing 'bidirectional' text). In the following example, the Arabic words are read RTL, starting with the one on the right, while the numbers ten and twelve are each read left-to-right. The numeric range as a whole is ordered RTL, ie. it starts with 10 and ends with 12:

في 10-12 آدار
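Display order is worked out by the Unicode Bidirectional Algorithm (UAX #9), which starts from the bidi class of each character. The minimal sketch below only lists those classes for the example above, using Python's standard `unicodedata` module; the helper name `bidi_classes` is hypothetical and the sketch does not implement the reordering itself. Arabic letters report `AL` (Arabic letter, right-to-left), the digits `EN` (European number) and the hyphen-minus `ES` (European separator).

```python
import unicodedata

def bidi_classes(text: str) -> list[tuple[str, str]]:
    """Return (character, bidi class) pairs for every non-space character."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text if not ch.isspace()]

# Print the code point and bidi class of each character in the example.
for ch, cls in bidi_classes("في 10-12 آدار"):
    print(f"U+{ord(ch):04X} {cls}")
```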

Baseline

The alphabetic baseline is a strong feature of Arabic script on the whole, since characters tend to join there. This is not always the case: for example, some adjacent pairs or ligatures have joins above the baseline, and initial letters in some fonts may start slightly above the baseline, but for most cases it remains a strong feature.

The nastaliq style of the script, on the other hand, uses arrangements of joined glyphs that cascade downwards from right to left and resemble a strongly sloping baseline. Here are some examples from the sample text above.

مستحق  •  شخص  •  کیفیت

Combining characters

The main Arabic block contains 52 combining characters, with 43 more in the Arabic Extended-A block. However, only a small number of these are typically used when writing ordinary Arabic, Persian, etc.
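One way to check such counts is to scan a block and keep the characters whose general category is `Mn` or `Mc` (nonspacing or spacing combining mark). The sketch below assumes Python's `unicodedata` module; the helper name `combining_marks` is hypothetical. Bear in mind that the result reflects the Unicode version built into your Python (`unicodedata.unidata_version`), and blocks such as Arabic Extended-A (U+08A0..U+08FF) have gained characters over successive versions, so their counts vary.

```python
import unicodedata

def combining_marks(first: int, last: int) -> list[str]:
    """Return the characters in [first, last] whose category is Mn or Mc."""
    return [chr(cp) for cp in range(first, last + 1)
            if unicodedata.category(chr(cp)) in ("Mn", "Mc")]

arabic = combining_marks(0x0600, 0x06FF)            # main Arabic block
extended_a = combining_marks(0x08A0, 0x08FF)        # Arabic Extended-A
print(len(arabic), len(extended_a), "Unicode", unicodedata.unidata_version)
```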

Diacritics are used to express short vowel sounds; however, in languages such as Arabic, Persian and Urdu they are typically omitted unless there is a particular need to help the reader with pronunciation. These diacritics are used throughout the Koran.

When the script is used for Uighur, on the other hand, vowels are written as a matter of course, although with full letters rather than diacritics.

The diacritic ◌ّ [U+0651 ARABIC SHADDA] indicates that the consonant it is attached to is doubled (geminated). It, too, is often omitted, although it sometimes appears in text where the vowel signs do not.
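Because vowel signs and shadda may or may not be present, the same word can be stored in several ways, and a naive search for the undecorated form will miss the decorated one. A common workaround is to strip these marks before comparing. The sketch below is a minimal example, assuming that removing only U+064B..U+0652 (tanwin, fatha, damma, kasra, shadda and sukun) is sufficient; the name `strip_harakat` is hypothetical, and real search code may also need to handle marks such as U+0670 ARABIC LETTER SUPERSCRIPT ALEF.

```python
# Hypothetical helper: the marks removed are U+064B ARABIC FATHATAN
# through U+0652 ARABIC SUKUN (tanwin, short vowels, shadda and sukun).
HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)}

def strip_harakat(text: str) -> str:
    """Return text with the short-vowel, shadda and sukun marks removed."""
    return "".join(ch for ch in text if ch not in HARAKAT)

vocalized = "\u0645\u064E\u0645\u0650\u0645\u0651"   # meem+fatha, meem+kasra, meem+shadda
print(strip_harakat(vocalized) == "\u0645\u0645\u0645")
```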

Multiple combining characters may be used for a single base character, such as when both a shadda and a vowel diacritic are used together.
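When a base letter carries two marks, the order in which they are stored can vary. Each mark has a canonical combining class (fatha is 30, shadda is 33), and Unicode normalization sorts adjacent marks by that class. So a meem typed as shadda followed by fatha comes out of NFC as fatha followed by shadda. The two sequences are canonically equivalent, so software that compares Arabic strings should normalize both sides first. The sketch below uses only Python's standard `unicodedata` module.

```python
import unicodedata

typed = "\u0645\u0651\u064E"                 # meem, shadda, fatha (typing order)
nfc = unicodedata.normalize("NFC", typed)   # marks reordered by combining class

# Show each code point with its name and canonical combining class.
for ch in nfc:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} ccc={unicodedata.combining(ch)}")
```

This prints the meem (class 0), then the fatha (class 30), then the shadda (class 33).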

Context-based glyph changes

Cursive script

Arabic script joins letters together. As a result, most letters have four different shapes: isolated, initial, medial and final. The examples below show the same letter, ع [U+0639 ARABIC LETTER AIN], in its initial, medial and final forms respectively:

على  •  متعددة  •  وسيجمع

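A few Arabic script letters, such as ا د ذ ر ز و, only join on the right-hand side, ie. to the preceding letter; the letter that follows them begins a new joined group.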
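The choice of shape follows from each letter's joining type. The sketch below picks a positional form for each letter of a word, assuming a deliberately small, hand-written table of right-joining and dual-joining letters (the authoritative data is in the Unicode file ArabicShaping.txt). The names `positional_forms`, `RIGHT_JOINING` and `DUAL_JOINING` are hypothetical, and the sketch assumes the word contains letters only, with no diacritics, ZWJ or ZWNJ. Remember that in memory order the previous character is the one displayed to the right.

```python
# Hypothetical, incomplete joining table. Letters not listed are treated
# as non-joining (for example ء ARABIC LETTER HAMZA).
RIGHT_JOINING = set("آأإاؤةدذرزو")
DUAL_JOINING = set("ئبتثجحخسشصضطظعغفقكلمنهيى")

def joins(ch: str) -> bool:
    """True if ch can connect to a dual-joining letter on its right."""
    return ch in RIGHT_JOINING or ch in DUAL_JOINING

def positional_forms(word: str) -> list[str]:
    """Return 'isolated', 'initial', 'medial' or 'final' for each letter."""
    forms = []
    for i, ch in enumerate(word):
        # word[i - 1] is displayed to the right of ch, word[i + 1] to its left.
        joins_right = i > 0 and word[i - 1] in DUAL_JOINING and joins(ch)
        joins_left = i < len(word) - 1 and ch in DUAL_JOINING and joins(word[i + 1])
        if joins_right and joins_left:
            forms.append("medial")
        elif joins_left:
            forms.append("initial")
        elif joins_right:
            forms.append("final")
        else:
            forms.append("isolated")
    return forms

print(positional_forms("متعددة"))
```

Note how the right-joining letter د stops the chain: the letter after it cannot connect back to it.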

Contextual shaping

Ligated glyph forms are common in Arabic. Some, such as لا (lam followed by alef), are mandatory. Most of the others depend on the font style: traditional fonts tend to have more ligated forms than modern styles.

[Figure: the same word shown without ligatures (left) and with ligatures (right).]

In more traditional fonts, you will also often see the join between certain adjacent characters placed above the baseline, rather than on it.

[Figure: a join above the baseline compared with the same join on the baseline.]

In fact, a good font will continually vary the shapes of glyphs slightly so as to create a more aesthetically pleasing, and in some cases more easily readable, flow. The following shows some simple examples of pairs of the same letter where the glyphs differ.

ـدد   تتـ   سسـ  

Context-based positioning

When vowel or shadda diacritics are used they can be placed in different positions, according to the context.

Here you can see shadda being placed at different heights, depending on the height of the base character that it appears above.

يتكلّم  •  تسجّل

When both shadda and vowel signs are present, a more complicated set of rules may be applied, depending on the font style, to determine the relevant positions. Vowel diacritics are placed above and below the shadda, rather than above and below the base character.

مَمِمّمَّمِّ

Punctuation

Languages written with the Arabic script use a mixture of Western and Arabic punctuation.

The following excerpt from the Arabic sample text above shows the use of ، [U+060C ARABIC COMMA], but uses the ASCII full stop at the end of a sentence.

تمييز،

The Urdu sample also uses the Arabic comma, but uses ۔ [U+06D4 ARABIC FULL STOP] at the end of a sentence.

چاہیئے۔
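A practical consequence is that code splitting text into sentences must recognize the Arabic marks as well as the ASCII ones. The sketch below is a minimal example using Python's `re` module. The set of sentence-final marks (ASCII `.` `!` `?`, U+061F ARABIC QUESTION MARK and U+06D4 ARABIC FULL STOP) and the name `split_sentences` are assumptions for illustration, not a complete segmentation rule such as the one in UAX #29.

```python
import re

# Assumed sentence-final marks: . ! ? plus U+061F ARABIC QUESTION MARK
# and U+06D4 ARABIC FULL STOP. Split on whitespace that follows one of them.
SENTENCE_END = re.compile(r"(?<=[.!?\u061F\u06D4])\s+")

def split_sentences(text: str) -> list[str]:
    """Split text after sentence-final punctuation, dropping empty pieces."""
    return [s for s in SENTENCE_END.split(text.strip()) if s]

print(split_sentences("ہیں۔ انہیں ہوئی ہے۔"))
```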

Digits

Arabic-Indic digits are typically used in Middle Eastern and Gulf countries, whereas North African countries tend to use European digits. In neither area, however, is one digit style used exclusively.

٠١٢٣٤٥٦٧٨٩

There is a second set of digits in Unicode for use in languages such as Persian and Urdu. The glyph shapes are typically different for 4 of the digits (although there can also be differences between Persian and Urdu shapes).

۰۱۲۳۴۵۶۷۸۹
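Since both digit sets are in use, input handling often needs to accept them and map them to ASCII digits. Both sets have general category `Nd` (decimal digit) with a decimal value in the Unicode Character Database, so the sketch below uses Python's `unicodedata.decimal` rather than a hand-written table. The name `to_ascii_digits` is hypothetical. Note that it also converts decimal digits from other scripts, such as Devanagari, which may or may not be what you want.

```python
import unicodedata

def to_ascii_digits(text: str) -> str:
    """Replace every Unicode decimal digit (category Nd) with its ASCII digit."""
    return "".join(
        str(unicodedata.decimal(ch)) if unicodedata.category(ch) == "Nd" else ch
        for ch in text
    )

print(to_ascii_digits("٢٠١٧"))   # Arabic-Indic digits
print(to_ascii_digits("۲۰۱۷"))   # Extended Arabic-Indic digits (Persian, Urdu)
```

Python's own `int()` already accepts any `Nd` digits, so `int("١٢")` returns 12 without conversion.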

Text layout

Justification

Arabic script justification can use a number of different techniques. These include stretching the baseline and the glyphs of the text, expanding inter-word spaces, and applying ligatures or swash forms. Typically these techniques are applied in combination.

Some font styles, such as ruq'a, do not permit extensions of the baseline.

The following text sample can be used to test how a browser justifies Arabic text.

المادة 7 كل الناس سواسية أمام القانون ولهم الحق في التمتع بحماية متكافئة عنه دون أية تفرقة، كما أن لهم جميعاً الحق في حماية متساوية ضد أي تمييز يُخل بهذا الإعلان وضد أي تحريض على تمييز كهذا.
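Baseline stretching is often illustrated with U+0640 ARABIC TATWEEL (kashida), which draws an extension of the joining stroke. The sketch below is a deliberately naive illustration: it inserts tatweels at the first point in a word where two letters join, and returns the word unchanged when there is no join or when stretching is disallowed (the `allow_kashida` flag is a hypothetical stand-in for styles such as ruq'a). The joining table and the function name `elongate` are assumptions. Real justification engines choose stretch points by letter-pair rules and apply them at the glyph level during rendering; storing tatweels in the text causes the same search problems as other non-standard spellings.

```python
TATWEEL = "\u0640"  # ARABIC TATWEEL

# Hypothetical, incomplete joining table (see ArabicShaping.txt for real data).
RIGHT_JOINING = set("آأإاؤةدذرزو")
DUAL_JOINING = set("ئبتثجحخسشصضطظعغفقكلمنهيى")

def elongate(word: str, count: int, allow_kashida: bool = True) -> str:
    """Insert `count` tatweels at the first point where two letters join."""
    if not allow_kashida or count <= 0:
        return word
    for i in range(len(word) - 1):
        nxt = word[i + 1]
        # A join exists only if this letter can connect to the letter after it.
        if word[i] in DUAL_JOINING and (nxt in DUAL_JOINING or nxt in RIGHT_JOINING):
            return word[: i + 1] + TATWEEL * count + word[i + 1 :]
    return word

print(elongate("كتب", 2))
```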

Character list

The Arabic script characters in Unicode 7.0 are spread across 3 blocks:

  1. Arabic (255)
  2. Arabic Supplement (48)
  3. Arabic Extended-A (47)

There are two additional blocks for presentation forms, but (with the exception of a handful of code points) these characters are only for compatibility with legacy encodings, and should not be used. Sometimes they are used by people to get around problems with Arabic support in applications, but this is a bad idea since it corrupts the underlying data, making it difficult to search, spellcheck, or do many other things that rely on the use of standard characters and their properties.
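The search problem is easy to demonstrate. Below, a lam-alef stored as the compatibility character U+FEFB ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM does not match a search for the standard sequence lam (U+0644) followed by alef (U+0627). Presentation forms have compatibility decompositions, so NFKC normalization maps them back to the standard letters. The helper name `fold_presentation_forms` is hypothetical, and NFKC also folds many unrelated compatibility characters (for example full-width Latin letters), so this is a sketch of a repair step rather than a recommended pipeline.

```python
import unicodedata

def fold_presentation_forms(text: str) -> str:
    """Map presentation forms back to standard characters using NFKC.

    Caution: NFKC also folds other compatibility characters.
    """
    return unicodedata.normalize("NFKC", text)

stored = "\uFEFB"  # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM
print("\u0644\u0627" in stored)                           # False: search fails
print("\u0644\u0627" in fold_presentation_forms(stored))  # True after folding
```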

Many of the characters share a common base form, and are distinguished by the number and location of dots or other small diacritics, called i'jam. For example, س ‎ش ‎ݜ ‎ ݰ ‎ݽ ‎ݾ ‎ڛ ‎ښ ‎ڜ ‎ۺ.
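Unicode character names often spell out the i'jam, so a rough way to find variants of a base form is to match on the name. The sketch below lists letters named `ARABIC LETTER SEEN` or `ARABIC LETTER SEEN WITH ...` across U+0600..U+08FF. It is a heuristic, and the helper name `variants_of` is hypothetical: it misses letters with their own base name, such as ش (ARABIC LETTER SHEEN), and the result depends on the Unicode version in your Python.

```python
import unicodedata

def variants_of(base_name: str, first: int = 0x0600, last: int = 0x08FF) -> list[str]:
    """List characters named `base_name` or `base_name WITH ...` in a range."""
    found = []
    for cp in range(first, last + 1):
        name = unicodedata.name(chr(cp), "")  # "" for unassigned code points
        if name == base_name or name.startswith(base_name + " WITH "):
            found.append(chr(cp))
    return found

print(" ".join(variants_of("ARABIC LETTER SEEN")))
```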

The following is an incomplete list of languages and the number of characters they use, per version 31 of CLDR's lists of characters (exemplarCharacters).

Arabic

Main (36): ء آ أ ؤ إ ئ ا ب ة ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع غ ف ق ك ل م ن ه و ى ي
Combining characters (12): ً ٌ ٍ َ ُ ِ ّ ْ ٓ ٔ ٕ ٰ
Punctuation (19): - ‐ – — ، ؛ : ! ؟ . … ' " « » ( ) [ ]

Persian

Main (39): ء آ أ ؤ ئ ا ب پ ة ت ث ج چ ح خ د ذ ر ز ژ س ش ص ض ط ظ ع غ ف ق ک گ ل م ن ه و ي ی
Combining characters (6): ً ٌ ٍ ّ ٓ ٔ
Punctuation (23): ـ - ‐ ، ٫ ٬ ؛ : ! ؟ . … ‹ › « » ( ) [ ] * / \

Urdu

Main (47): ء آ أ ؤ ئ ا ب پ ة ت ث ٹ ج چ ح خ د ذ ڈ ر ز ڑ ژ س ش ص ض ط ظ ع غ ف ق ک گ ل م ن ں ه ھ ہ ۂ و ي ی ے
Combining characters (2): ٓ ٔ
Punctuation (13): ، ؍ ٫ ٬ ؛ : ؟ . ۔ ( ) [ ]
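Because each exemplar list is just a set, comparing languages is a matter of set arithmetic. The sketch below copies the Arabic and Urdu main lists given above (CLDR version 31) and computes which letters each uses that the other does not; the variable names are hypothetical and the data is only as current as those lists. For instance, Urdu uses ک (U+06A9 ARABIC LETTER KEHEH) where Arabic uses ك (U+0643 ARABIC LETTER KAF).

```python
# Main exemplar characters copied from the CLDR v31 lists above.
ARABIC = set("ء آ أ ؤ إ ئ ا ب ة ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع غ ف ق ك ل م ن ه و ى ي".split())
URDU = set("ء آ أ ؤ ئ ا ب پ ة ت ث ٹ ج چ ح خ د ذ ڈ ر ز ڑ ژ س ش ص ض ط ظ ع غ ف ق ک گ ل م ن ں ه ھ ہ ۂ و ي ی ے".split())

urdu_only = URDU - ARABIC     # letters Urdu adds, e.g. retroflex ٹ ڈ ڑ
arabic_only = ARABIC - URDU   # letters in the Arabic list but not the Urdu one
print(len(ARABIC), len(URDU))
print(" ".join(sorted(urdu_only)))
print(" ".join(sorted(arabic_only)))
```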
First published 11 Jul 2017. This version 2017-07-25 7:51 GMT.  •  Copyright r12a. Licence CC-By.