Urdu writing system

Updated 25 July, 2017 • tags arabic, urdu, scriptnotes

This page provides basic information about the Urdu writing system, a variant of the Arabic script. See also the Arabic script summary. For similar information related to other scripts, see the Script comparison table.



Sample (Urdu)

دفعہ ۱۔ تمام انسان آزاد اور حقوق و عزت کے اعتبار سے برابر پیدا ہوئے ہیں۔ انہیں ضمیر اور عقل ودیعت ہوئی ہے۔ اس لئے انہیں ایک دوسرے کے ساتھ بھائی چارے کا سلوک کرنا چاہیئے۔

دفعہ ۲۔ ہر شخص ان تمام آزادیوں اور حقوق کا مستحق ہے جو اس اعلان میں بیان کئے گئے ہیں، اور اس حق پر نسل، رنگ، جنس، زبان، مذہب اور سیاسی تفریق کا یا کسی قسم کے عقیدے، قوم، معاشرے، دولت یا خاندانی حیثیت وغیرہ کا کوئی اثر نہ پڑے گا۔ اس کے علاوہ جس علاقے یا ملک سے جو شخص تعلق رکھتا ہے اس کی سیاسی کیفیت دائرہ اختیار یا بین الاقوامی حیثیت کی بنا پر اس سے کوئی امتیازی سلوک نہیں کیا جائے گا۔ چاہے وہ ملک یا علاقہ آزاد ہو یا تولیتی ہو یا غیر مختار ہو یا سیاسی اقتدار کے لحاظ سے کسی دوسری بندش کا پابند ہو۔

Key features

Urdu uses the Arabic script with extensions. A number of the extensions are based on those developed for Persian (Farsi).

The script type is abjad, ie. the script is largely consonantal and short vowel sounds are typically not shown. Some of the consonant characters double as long vowels (eg. ی and و). The vowels are not usually clearly defined, but when necessary, vowel information can be represented by combining marks appearing above or below the base consonant. The absence of a vowel and doubling of consonants can be indicated in the same way.

The basic alphabet covers a much wider repertoire of sounds than found in Arabic, so several extensions have been added to the basic Arabic script. Many of these come via Persian. The alphabet includes aspirated letters that have to be composed with two Unicode characters and a je letter that uses different Unicode characters depending on the context.
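As a rough illustration of the two-character aspirates, the following Python sketch splits a word into alphabet letters, treating a consonant followed by ھ [U+06BE ARABIC LETTER HEH DOACHASHMEE] as a single letter. The function name split_letters is hypothetical, and the sketch assumes the input contains no combining marks.

```python
import unicodedata

DO_CHASHMI_HE = "\u06BE"  # ARABIC LETTER HEH DOACHASHMEE

def split_letters(word):
    """Split a word into alphabet letters, keeping aspirates together."""
    letters = []
    for ch in word:
        if ch == DO_CHASHMI_HE and letters:
            # Attach the aspiration marker to the preceding consonant.
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters

# لکھنا (lɪkʰnɑː) is five code points but four alphabet letters.
word = "\u0644\u06A9\u06BE\u0646\u0627"
for letter in split_letters(word):
    names = " + ".join(unicodedata.name(c) for c in letter)
    print(letter, names)
```

Counting or sorting by alphabet letter rather than by code point needs a grouping step of this kind.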

Although it is not always possible to guess the vowel sounds in a word, the consonants are largely reliable phonetically. There is mostly a one-to-one correspondence between letters and sounds.

Shaping and style

Since the script is cursive (ie. letters are typically joined) the letter forms can vary considerably according to position.

Urdu is typically written in a nasta'liq style; ie. the connected letters in a word tend to follow a sloping baseline. This is achieved by choosing a nasta'liq font – the underlying Unicode characters are the same for nasta'liq and other styles.

Consonants

Consonant clusters

The absence of a vowel sound can be indicated with the diacritic  ْ [U+0652 ARABIC SUKUN​], called sukūn or jazm, although this diacritic is not normally shown in text, eg. سَخْت saxt hard.

It has various possible forms, including a small round circle, something that looks like peʃ, and something like a circumflex.

This diacritic is never written above the final character in a word, because as a rule a short vowel is not pronounced in this position.

Consonant lengthening

Consonant sounds can be lengthened. In vowelled text, which is very rare, this is shown using the diacritic  ّ [U+0651 ARABIC SHADDA​], called taʃdiːd, eg. ستّر sattar, seventy. More often than not, this is not written.
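Because sukūn, taʃdiːd and the short vowel marks are usually left out, a vowelled and an unvowelled spelling of the same word differ at the code point level. The following is a minimal Python sketch that strips these marks before comparing. The set OPTIONAL_MARKS and the function name are illustrative choices, not a standard list. The sketch deliberately leaves U+0654 ARABIC HAMZA ABOVE alone, since that mark is part of the spelling.

```python
# Marks that ordinary Urdu text usually leaves out (an illustrative set).
OPTIONAL_MARKS = {
    "\u064E",  # ARABIC FATHA (zabar)
    "\u064F",  # ARABIC DAMMA (peʃ)
    "\u0650",  # ARABIC KASRA (zer)
    "\u0651",  # ARABIC SHADDA (taʃdiːd)
    "\u0652",  # ARABIC SUKUN (sukūn / jazm)
}

def strip_optional_marks(text):
    """Remove the optional vowel, sukun and shadda marks from text."""
    return "".join(ch for ch in text if ch not in OPTIONAL_MARKS)

vowelled = "\u0633\u064E\u062E\u0652\u062A"  # سَخْت saxt
plain = "\u0633\u062E\u062A"                 # سخت
print(strip_optional_marks(vowelled) == plain)
```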

Vowels

There are 10 vowel sounds, though there are also allophonic variants. They are usually grouped into pairs of 'short' and 'long' sounds - although the difference is qualitative, rather than just length. The basic phonemes are as follows:

ə ɪ ʊ ɛ ɔ
ɑː iː uː e o

The phoneme ə is sometimes written a in phonemic transcriptions in this material. (This is the letter usually used in other sources too.)

The following table shows the standard ways of indicating vowel sounds when diacritics are used. Note however, that context can change the value of a vowel diacritic (such as a following 'ain or he) - these are detailed below the table. Three short vowels are not typically found in final position. The examples only show diacritics for the sound currently being discussed.

Each entry gives the sound, the name of the base component in parentheses, and then the code points and an example for each position.

ə (zabar)
  final: (none)
  medial: U+064E ARABIC FATHA, eg. بَب bəb
  initial: U+0627 ARABIC LETTER ALEF + U+064E ARABIC FATHA, eg. اَب əb

ɪ (zer)
  final: (none)
  medial: U+0650 ARABIC KASRA, eg. دِن dɪn
  initial: U+0627 ARABIC LETTER ALEF + U+0650 ARABIC KASRA, eg. اِن ɪn

ʊ (peʃ)
  final: (none)
  medial: U+064F ARABIC DAMMA, eg. سُست sʊst
  initial: U+0627 ARABIC LETTER ALEF + U+064F ARABIC DAMMA, eg. اُس ʊs

ɑː (alɪf)
  final: U+0627 ARABIC LETTER ALEF, eg. لکھنا lɪkʰnɑː
  medial: U+0627 ARABIC LETTER ALEF, eg. باغ bɑːɣ
  initial: U+0622 ARABIC LETTER ALEF WITH MADDA ABOVE, eg. آج ɑːʤ

e (je)
  final: U+06D2 ARABIC LETTER YEH BARREE, eg. بجے baʤe
  medial: U+06CC ARABIC LETTER FARSI YEH, eg. بیٹا beʈɑː
  initial: U+0627 ARABIC LETTER ALEF + U+06CC ARABIC LETTER FARSI YEH, eg. ایک ek

iː (zer+je / je)
  final: U+06CC ARABIC LETTER FARSI YEH, eg. گاری gɑːriː
  medial: U+06CC ARABIC LETTER FARSI YEH + U+0650 ARABIC KASRA, eg. تِین tiːn
  initial: U+06CC ARABIC LETTER FARSI YEH + U+0650 ARABIC KASRA, eg. اِینٹ iːnʈ

ɛ (zabar+je)
  final: U+064E ARABIC FATHA + U+06D2 ARABIC LETTER YEH BARREE, eg. ہَے
  medial: U+064E ARABIC FATHA + U+06CC ARABIC LETTER FARSI YEH, eg. کَیسا kɛsɑː
  initial: U+0627 ARABIC LETTER ALEF + U+064E ARABIC FATHA + U+06CC ARABIC LETTER FARSI YEH, eg. اَیسا ɛsɑː

o (vɑːuː)
  final: U+0648 ARABIC LETTER WAW, eg. کو ko
  medial: U+0648 ARABIC LETTER WAW, eg. ٹوپی ʈopiː
  initial: U+0627 ARABIC LETTER ALEF + U+0648 ARABIC LETTER WAW, eg. اوس os

uː (peʃ+vɑːuː or vɑːuː+inverted peʃ)
  final: U+0648 ARABIC LETTER WAW + U+064F ARABIC DAMMA, eg. ہندُو hɪnduː; or U+0648 ARABIC LETTER WAW + U+0657 ARABIC INVERTED DAMMA, eg. ہندوٗ hɪnduː
  medial: U+0648 ARABIC LETTER WAW + U+064F ARABIC DAMMA, eg. پُورا puːrɑː; or U+0648 ARABIC LETTER WAW + U+0657 ARABIC INVERTED DAMMA, eg. پوٗرا puːrɑː
  initial: U+0627 ARABIC LETTER ALEF + U+064F ARABIC DAMMA + U+0648 ARABIC LETTER WAW, eg. اُوپر uːpar; or U+0627 ARABIC LETTER ALEF + U+0648 ARABIC LETTER WAW + U+0657 ARABIC INVERTED DAMMA, eg. اوٗپر uːpar

ɔ (zabar+vɑːuː)
  final: U+064E ARABIC FATHA + U+0648 ARABIC LETTER WAW, eg. نَو
  medial: U+064E ARABIC FATHA + U+0648 ARABIC LETTER WAW, eg. شَوق ʃɔq
  initial: U+0627 ARABIC LETTER ALEF + U+064E ARABIC FATHA + U+0648 ARABIC LETTER WAW, eg. اَور ɔr
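To check which code points a given example actually contains, a small Python helper can list them by name using the standard unicodedata module. The helper name describe is hypothetical.

```python
import unicodedata

def describe(text):
    """Return one 'U+XXXX NAME' string per code point, in stored order."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

# اَب əb: alef carrying zabar, followed by beh.
for line in describe("\u0627\u064E\u0628"):
    print(line)
```

Listing the stored order this way is a quick check on whether a diacritic comes before or after the letter it combines with in a given string.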

'ain

The letter ع [U+0639 ARABIC LETTER AIN] is used in words of Arabic origin. In these words it is typically not pronounced but can support vowels. In this way, at the beginning of a word it can fulfill the same function as the alif, eg. عَرب arab Arab. The Urdu word اَرَب arab necessity, though pronounced the same, becomes a completely different word by its spelling. Note, in particular, that the equivalent of آ [U+0622 ARABIC LETTER ALEF WITH MADDA ABOVE] ɑː is عا, as in عادت ɑːdat habit.

A following ع may also affect a short vowel diacritic to produce a long vowel sound as follows:

  1. ɑː from zabar followed by 'ain, eg. بَعد bɑːd after

  2. e from zer followed by 'ain, eg. شِعر ʃer verse

  3. o from peʃ followed by 'ain, eg. شُعلہ ʃolɑː flame

Choṭī he and baṛī he

The letters ہ [U+06C1 ARABIC LETTER HEH GOAL] and ح [U+062D ARABIC LETTER HAH] can also modify preceding short vowels as follows:

  1. ɛ from zabar followed by he, eg. اَحمد ɛhmad Ahmed, رَہنا rɛhnɑː to remain

  2. ɛ from zer followed by he, eg. مِہربانی mɛhrbɑːniː kindness, and واضِح vɑːzɛh clear

  3. o from peʃ followed by he, eg. شُہرت ʃohrat fame, and توجُّہ tavaʤːoh attention
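The 'ain and he rules above amount to a lookup on the diacritic and the letter that follows it. The Python sketch below encodes them as tables. The names (vowel_value and the dictionaries) are hypothetical. The sketch also applies the rules unconditionally, which is a simplification: the text only says that these letters may modify the vowel.

```python
FATHA, KASRA, DAMMA = "\u064E", "\u0650", "\u064F"  # zabar, zer, peʃ
AIN = "\u0639"                     # ARABIC LETTER AIN
HE_LETTERS = {"\u06C1", "\u062D"}  # choṭī he (HEH GOAL), baṛī he (HAH)

DEFAULT = {FATHA: "ə", KASRA: "ɪ", DAMMA: "ʊ"}
BEFORE_AIN = {FATHA: "ɑː", KASRA: "e", DAMMA: "o"}
BEFORE_HE = {FATHA: "ɛ", KASRA: "ɛ", DAMMA: "o"}

def vowel_value(mark, next_letter=None):
    """Return the vowel for a short-vowel mark, given the following letter."""
    if next_letter == AIN:
        return BEFORE_AIN[mark]
    if next_letter in HE_LETTERS:
        return BEFORE_HE[mark]
    return DEFAULT[mark]

print(vowel_value(FATHA, AIN))       # zabar before 'ain, as in بَعد
print(vowel_value(KASRA, "\u06C1"))  # zer before choṭī he, as in مِہربانی
print(vowel_value(DAMMA, "\u0646"))  # peʃ before any other letter
```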

The so-called 'silent' he that appears at the end of many words of Arabic or Persian derivation is pronounced ɑː, eg. مکَہ makːɑː Mecca.

Nasalisation

Vowels may be nasalised, as at the end of the French word élan. This is indicated in Urdu by a glyph called nun ghunna that looks like the letter nun except that in word-final position it has no dot, eg. ماں mãː mother, ٹاںگ tãːg leg, and کروں karũː I may do. In Unicode there are different characters for each of these uses.
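As a small sketch, the Python function below locates nun ghunna in a word and reports whether each occurrence is word-final. It assumes the text encodes nun ghunna as ں [U+06BA ARABIC LETTER NOON GHUNNA] in every position. The function name is hypothetical.

```python
NUN_GHUNNA = "\u06BA"  # ARABIC LETTER NOON GHUNNA

def nun_ghunna_positions(word):
    """Return (index, is_final) for each nun ghunna in the word."""
    last = len(word) - 1
    return [(i, i == last) for i, ch in enumerate(word) if ch == NUN_GHUNNA]

print(nun_ghunna_positions("\u0645\u0627\u06BA"))        # ماں
print(nun_ghunna_positions("\u0679\u0627\u06BA\u06AF"))  # ٹاںگ
```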

Vowel junctions

A hamzā plays more than one role in Urdu. One such role is to indicate the boundaries between vowel sounds when there is no intervening consonant. Depending on the vowels concerned, it is used in a number of different ways. It can also have two different shapes, one like the initial form of 'ain and the other more like an italic 's'.

In this example we see hamza in its isolated form, انشاءﷲ ɪnʃalːaː God willing.

When the second vowel is iː or e represented by ی [U+06CC ARABIC LETTER FARSI YEH] or ے [U+06D2 ARABIC LETTER YEH BARREE], the hamzā 'sits on a chair' before the letter representing the second vowel, eg. کئی kaiː several; تیئیس teiːs twenty-three; کوئی koiː someone; گئے gae they went; گائے gɑːe they sang.

The short vowel ɪ as a second vowel is also represented by hamzā 'on its chair', eg. کوئلہ koɪlɑː coal; لائن lɑːɪn queue.

To represent hamzā 'on a chair' for initial or medial positions with the Nafees Nastaleeq font you can use ئ [U+0626 ARABIC LETTER YEH WITH HAMZA ABOVE]. This is not an ideal solution, however, since the hamza is sitting on a yeh that is not actually appropriate. This becomes particularly problematic when decomposed for normalization. If you substitute the combination ي +  ٔ [U+064A ARABIC LETTER YEH + U+0654 ARABIC HAMZA ABOVE​] the Nafees Nastaleeq font will not render the glyphs correctly, other Urdu fonts don't position the hamza correctly, and often two dots appear below the yeh. It seems that a new Unicode character will be needed to resolve this issue, but use of ئ [U+0626 ARABIC LETTER YEH WITH HAMZA ABOVE] seems fairly widespread at the moment.
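The decomposition behaviour can be observed with Python's standard unicodedata module. The sketch below only inspects code points and says nothing about how any font renders them. Under NFD, U+0626 decomposes onto ي [U+064A ARABIC LETTER YEH] rather than the ی [U+06CC ARABIC LETTER FARSI YEH] otherwise used for Urdu, and there is no precomposed character that Farsi yeh plus hamza above would compose to.

```python
import unicodedata

YEH_HAMZA = "\u0626"  # ARABIC LETTER YEH WITH HAMZA ABOVE

# Canonical decomposition of the precomposed character.
decomposed = unicodedata.normalize("NFD", YEH_HAMZA)
print([f"U+{ord(c):04X} {unicodedata.name(c)}" for c in decomposed])

# Farsi yeh + hamza above has no precomposed form, so NFC leaves it unchanged.
farsi_pair = "\u06CC\u0654"
print(unicodedata.normalize("NFC", farsi_pair) == farsi_pair)
```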

When the second vowel is uː or o represented by و [U+0648 ARABIC LETTER WAW], the hamzā typically sits directly on top of the و, eg. آؤ ɑːo come; جاؤں ʤɑːũː I may go. Note that the hamzā is often omitted in this situation. To represent this in Unicode use ؤ [U+0624 ARABIC LETTER WAW WITH HAMZA ABOVE].

Many words have the vowel combinations iːɑ̃ iːe iːo, where hamzā is not typically used, eg. لڑکیاں laɽkiːɑ̃ː girls; چلیے ʧaliːe come on; لڑکیوں کا laɽkiːõ kɑː of the girls.

Hamzā is also used to represent izāfat when the preceding word ends in either choṭī he or ye (see below).

Izāfat

Izāfat ɪzɑːfat is the name given to the short vowel ɛ used to describe a relationship between two words. It may be translated as 'of', eg. as in the Lion of Punjab.

This sound is mostly represented using zer. Sometimes, however, the combining mark is not shown, even though it is pronounced. Examples: شیرِ پنجاب ʃer ɛ panʤɑːb Lion of Punjab; طالبِ علم tɑːlɪb ɛ ɪlm seeker of knowledge (a student).

Izāfat is represented by a combining hamzā when the preceding word ends in either choṭī he ہ [U+06C1 ARABIC LETTER HEH GOAL] or ye ی [U+06CC ARABIC LETTER FARSI YEH]: eg. قطرۂآب qatrah ɛ ɑːb drop of water; ولئکامل valiː ɛ kɑːmɪl perfect saint.

I have a question about the use of ARABIC YEH WITH HAMZA ABOVE, which the Nafees Nastaleeq font requires. Note also that the Nafees Nastaleeq font will not work properly if there is a space after the izāfat.

Izāfat may also be shown as ے [U+06D2 ARABIC LETTER YEH BARREE] with or without a combining hamzā when the preceding word ends in a long vowel: eg. صدا ۓ بلند sadɑː ɛ buland a high voice; رو ۓ زمین ruː ɛ zamiːn the surface of the ground.

I have a question about how the Nafees Nastaleeq font handles precomposed vs decomposed versions of yeh baree with hamza.
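Whatever a particular font does, the precomposed and decomposed spellings of these izāfat forms are canonically equivalent in Unicode, so text comparison can normalize both sides first. The following is a minimal Python sketch with a hypothetical helper name, same_spelling. It addresses string comparison only, not rendering.

```python
import unicodedata

def same_spelling(a, b):
    """Compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Yeh barree with hamza: precomposed U+06D3 vs U+06D2 + U+0654.
print(same_spelling("\u06D3", "\u06D2\u0654"))

# Heh goal with hamza: precomposed U+06C2 vs U+06C1 + U+0654.
print(same_spelling("\u06C2", "\u06C1\u0654"))
```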

Arabic definite article

The pronunciation of ال (alif followed by lām) varies when it represents the Arabic definite article. This affects many words in Urdu that have come from Arabic, in particular names and adverbial expressions.

The lām is not pronounced if it precedes one of the following characters:

ت [U+062A ARABIC LETTER TEH] te
ث [U+062B ARABIC LETTER THEH] se
د [U+062F ARABIC LETTER DAL] dāl
ذ [U+0630 ARABIC LETTER THAL] zāl
ر [U+0631 ARABIC LETTER REH] re
ز [U+0632 ARABIC LETTER ZAIN] ze
س [U+0633 ARABIC LETTER SEEN] sīn
ش [U+0634 ARABIC LETTER SHEEN] šīn
ص [U+0635 ARABIC LETTER SAD] svād
ض [U+0636 ARABIC LETTER DAD] zvād
ط [U+0637 ARABIC LETTER TAH] toe
ظ [U+0638 ARABIC LETTER ZAH] zoe
ل [U+0644 ARABIC LETTER LAM] lām
ن [U+0646 ARABIC LETTER NOON] nūn

Instead, the following sound is doubled. A tašdīd may sometimes be used to indicate this. Example: السلام علیکم asːalɑːm alaikum greetings.
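The rule can be sketched in Python as a set-membership test on the letter after ال. The function name lam_is_silent is hypothetical. The sketch assumes unvowelled input and that a leading ال really is the definite article, which is not true of every word that begins with those two letters.

```python
ALIF, LAM = "\u0627", "\u0644"

# The fourteen letters listed above, before which the lām is not pronounced.
ASSIMILATING = set(
    "\u062A\u062B\u062F\u0630\u0631\u0632\u0633"
    "\u0634\u0635\u0636\u0637\u0638\u0644\u0646"
)

def lam_is_silent(word):
    """True if the word starts with ال followed by one of the listed letters."""
    return (
        len(word) >= 3
        and word[0] == ALIF
        and word[1] == LAM
        and word[2] in ASSIMILATING
    )

print(lam_is_silent("\u0627\u0644\u0633\u0644\u0627\u0645"))  # السلام
print(lam_is_silent("\u0627\u0644\u062D\u0627\u0644"))        # الحال
```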

Often the alif is not pronounced after a short preceding word that ends in a vowel. If the preceding vowel was long, it is shortened in this process. Examples: بالکل bɪlkul absolutely; فی الحال filhɑːl at present.

Often the vowel is pronounced ʊ, eg. دارالحکومت dɑːrʊlhʊkuːmat capital.

List of basic symbols

Alphabet ا ب بھ پ پھ ت تھ ٹ ٹھ ث ج جھ چ چھ ح خ د دھ ڈ ڈھ ر ڑ ڑھ ز ژ س ش ص ض ط ظ ع غ ف ق ک کھ گ گھ ل م ن و ہ ی ے
Other characters not generally counted as part of the alphabet
ء ؤ ئ ۓ ۂ آ ں ھ  ً   ٌ    ٍ    ُ    ِ    ّ    َ    ْ    ٰ    ٖ    ٗ    ٘ 
Punctuation ، ؛ ؟ ۔ ٫ ،
Digits ۰ ۱ ۲ ۳ ۴ ۵ ۶ ۷ ۸ ۹
Signs & symbols   ؀       ؁                 ؂ ؃ ؏   ؐ    ؑ    ؒ    ؓ    ؔ  ؎
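When processing text, the digits may need converting to ASCII. The Python sketch below assumes the digits listed above are the Extended Arabic-Indic digits U+06F0 to U+06F9. The helper name to_ascii_digits is hypothetical. Python's built-in int() also accepts Unicode decimal digits directly.

```python
# Extended Arabic-Indic digits U+06F0..U+06F9 (assumed to be the ones listed).
URDU_DIGITS = "\u06F0\u06F1\u06F2\u06F3\u06F4\u06F5\u06F6\u06F7\u06F8\u06F9"
TO_ASCII = str.maketrans(URDU_DIGITS, "0123456789")

def to_ascii_digits(text):
    """Replace Urdu digits with ASCII digits, leaving other text untouched."""
    return text.translate(TO_ASCII)

print(to_ascii_digits("\u06F1\u06F2"))   # the digits ۱۲
print(int("\u06F2\u06F0\u06F1\u06F7"))   # int() parses the digits directly
```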

First published 7 Apr 2006. This version 2017-07-25 9:01 GMT.  •  Copyright r12a@w3.org. Licence CC-By.