Saraiki (draft)

Arabic script orthography notes

Updated 11 December, 2024

This page brings together basic information about the Arabic script and its use for the Saraiki language. It aims to provide a brief, descriptive summary of the modern, printed orthography and typographic features, and to advise how to write Saraiki using Unicode.

It was difficult to find sources of information in English about the Saraiki orthography. The information on this page is largely drawn from Wikipedia articles and an examination of the Saraiki lemmas in Wiktionary (especially for vowels). It covers the basics, but more information needs to be added.

Referencing this document

Richard Ishida, Saraiki (Arabic) Orthography Notes, 11-Dec-2024, https://r12a.github.io/scripts/arab/skr

Sample


جغرافیہ یونانی ٻولی دا لفظ ہے۔ جس دے معنی ہن زمین دا بیان۔ ”جغرافیہ اوہ علم ہے جس وچ زمین، اس دی خصوصیات، اس دے باشندیاں ،اس دے مظاہر تے اس دے نقوش دا مطالعہ کیتا ویندا ہے۔“ دنیا تے زمین دے علاقیاں دے حالات دا علم ہے، زمین دی سائنس کوں جغرافیہ آہدن۔ انساناں دی ساری ترقی جغرافیہ دے علم نال تھئی ہے۔ زمین انسان دا گھر ہے، ایں گھر کنوں فائدے چاوݨ کیتے جغرافیہ دی لوڑ ہے۔ ”جغرافیہ او علم ہے۔ جیندے وچ زمین، ایندی خصوصیات، ایندے باشندیاں ،ایں دے مظاہر تے ایں دے نقوش دا مطالعہ کیتا ویندا ہے۔“

Source: Saraiki Wikipedia, جغرافیہ

Usage & history

Origins of the Arabic script, 6thC – today.

Phoenician

└ Aramaic

└ Nabataean

└ Arabic

Saraiki is an Indo-Aryan language spoken by approximately 30 million people in the south-western half of the province of Punjab in Pakistan. The primary writing system for Saraiki is the Perso-Arabic script, which has been described as an extension of the Shahmukhi alphabet.

سرائیکی

The Arabic script was introduced to the region during the Arab conquest of Sindh in the 8th century, but Saraiki was historically written using the Multani alphabet. The name Saraiki was formally adopted in the 1960s as the name of the adapted form of Shahmukhi by regional social and political leaders who undertook to promote Saraiki dialects of the Punjabi language.

More information: History of Saraiki language

Basic features

The Arabic script is an abjad, ie. short vowels are not normally written.

The Saraiki Arabic orthography is derived from the Arabic and Persian abjads, in which the script normally represents long vowel sounds using matres lectionis. However, this orthography has been adapted to cope with the additional vowel sounds in Saraiki.

Saraiki text runs right to left in horizontal lines, but numbers and embedded Latin text are read left-to-right. There is no case distinction. Words are separated by spaces.

❯ consonantSummary

Saraiki represents consonant sounds using 30 basic letters, 9 additional letters for spellings of loan words that have not been assimilated into the basic Saraiki spelling, and 17 more digraphs for aspirated sounds.

Consonant clusters and consonant gemination occur in Saraiki text, but diacritics to indicate these are not used in normal text (although they may be used in vowelled text).

❯ basicV

In normal text, the Saraiki abjad indicates the location of 5 out of 9 vowel sounds using 4 letters. Four more sounds are not normally written. Since two of the letters also serve as consonants, the orthography relies heavily on the reader for disambiguation of sounds in a word.

When needed, all vowels can be unambiguously represented using the letters and 3 combining marks. Post-consonant vowel sounds are written using the same code points, regardless of the position within a word, except for e, which has a different shape and code point when word-final.

Word-initial standalone vowels are preceded by or represented by ا.

Word-medial nasalisation is indicated using ن, to which a special diacritic can be added in vowelled text. Word-final nasalisation is written using ں. Vowel absence is not normally marked.

Saraiki uses native digits, and a mixture of ASCII and Arabic code points for punctuation marks.

Character index

Letters

آ␣ؤ␣ئ␣ا␣ب␣ت␣ث␣ج␣ح␣خ␣د␣ذ␣ر␣ز␣س␣ش␣ص␣ض␣ط␣ظ␣ع␣غ␣ف␣ق␣ل␣م␣ن␣و␣ي␣ٹ␣ٻ␣پ␣ڄ␣چ␣ڈ␣ڑ␣ژ␣ک␣گ␣ڳ␣ں␣ھ␣ہ␣ی␣ے␣ۓ␣ݙ␣ݨ

Combining marks

ً␣ٌ␣ٍ␣َ␣ُ␣ِ␣ّ␣ْ␣ٓ␣ٔ␣ٖ␣ٗ␣٘␣ٰ

Numbers

۰␣۱␣۲␣۳␣۴␣۵␣۶␣۷␣۸␣۹

Punctuation

،␣؛␣؟␣۔

ASCII

!␣(␣)␣.␣:

Other

‌␣‍␣⁧␣‫␣⁦␣‪␣⁨␣⁩␣‬␣‏␣‎␣؜␣͏

To be investigated

,␣-␣«␣»␣ـ␣‑␣–␣—␣‘␣’␣“␣”␣…␣‹␣›␣﴾␣﴿

Phonology

The following represents the sound repertoire of the Saraiki language.

Phones in a lighter colour are non-native or allophones. Source: Wikipedia.

Vowel sounds

Plain vowels

i u ɪ ʊ e o ə ə ɛ a a

Consonant sounds

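stop: p b (labial), t d (alveolar), ʈ ɖ (retroflex), t͡ʃ d͡ʒ (palatal), k ɡ (velar), q (uvular), ʔ (glottal)
aspirated stop: ʈʰ ɖʰ (retroflex), t͡ʃʰ d͡ʒʰ (palatal), ɡʰ (velar)
implosive: ɓ (labial), ɗ (alveolar), ʄ (palatal), ɠ (velar)
fricative: f v (labio-dental), s z (alveolar), ʃ (palatal), x ɣ (velar), ɦ (glottal)
nasal: m (labial), n (alveolar), ɳ (retroflex), ɲ (palatal), ŋ (velar)
aspirated nasal: ɳʰ (retroflex)
approximant, trill, flap: r l (alveolar), ɽ (retroflex), j (palatal)
aspirated flap: ɽʰ (retroflex)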

Tone

Saraiki is not a tonal language.

Structure

tbd

Vowels

Vowel summary table

The following table summarises the main vowel to character assignments.

Vowel diacritics are shown in this table; in normal text these diacritics do not appear. The hyphens on the IPA show whether this is an initial (-x), medial (-x-), or final (x-) form. It's not clear how to represent the sound ɛ.

  front back
Plain vowels
‍ی␣◌ِ‍ی‍␣اِی‍
◌ِ‍و␣◌ُ‍و‍␣اُو‍
◌ِ␣اِ
◌ُ␣اُ
‍ے␣‍ی␣‍ی‍␣ای‍
‍و␣‍و‍␣او‍
 
◌َ␣اَ
 
‍ا␣‍ا‍␣آ

For additional details see Vowel sounds to characters below.

Post-consonant vowels


Plain vowels

After a consonant, Saraiki represents the following vowel sounds using letters.

ی␣و␣ی␣ے␣و␣ا

The vowels i, u, e, and o are written using the consonants ی and و as matres lectionis. However, a word-final e is written using ے.

As consonants, these letters represent j and w respectively. Because of this, and because no distinction is made between i and e (word-medially) or between u and o, it can often be difficult to know whether these letters represent consonants or vowels.

ویہہ

ݙو

دیگ

واسطے

گوشت

When ا appears in word-medial or word-final position, it always represents the long vowel a. (As a word-initial standalone vowel, however, it can represent several sounds.)

ݙاہ

پراݨا

The short vowels ɪ, ʊ, and ə are not written.

سر

ککڑ

تپ

Combining marks used for vowels

Where needed, either to disambiguate homographs or simply to clarify vowel pronunciations, vowel sounds can be indicated using one of the following diacritics.

َ␣ُ␣ِ

The basicV section above and the examples below show how these diacritics are used in combination with the letters just described. Note that i and e, and u and o, are distinguished by the presence of a diacritic for the former and its absence for the latter.

ویہہ

ݙو

دیگ

واسطے

گوشت

پراݨا

سر

ککڑ

تپ

Wikipedia says that the following additional diacritics also occur. They are probably only used in Arabic, Persian, or Urdu loan words that retain their original spelling; they appear to be rare, and none of them occurs in the terms found in Wiktionary.

ً␣ٌ␣ٍ␣ٰ␣ٖ␣ٗ

The following two additional combining marks occur only in decomposed text.

ٓ␣ٔ
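Because normal text omits the optional diacritics, search and comparison often need a vowel-less version of vowelled text. The Python sketch below is one way to do that; the function name `strip_optional_marks` and the exact set of marks removed are assumptions based on the lists above. It normalises to NFC first so that 0653 and 0654, which belong to the atomic letters آ ئ ؤ ۓ, are recomposed rather than stripped.

```python
import unicodedata

# Assumed set of optional marks, taken from the lists above: tanween,
# short vowels, shadda and sukun (064B..0652), subscript alef (0656),
# inverted damma (0657), noon ghunna mark (0658), superscript alef (0670).
OPTIONAL_MARKS = set(
    [chr(cp) for cp in range(0x064B, 0x0653)]
    + ["\u0656", "\u0657", "\u0658", "\u0670"]
)

def strip_optional_marks(text: str) -> str:
    """Return a copy of vowelled text with the optional diacritics removed."""
    # Compose first, so 0627 + 0653 becomes the atomic آ (0622) and the
    # maddah survives; 0653 and 0654 are never in OPTIONAL_MARKS anyway.
    composed = unicodedata.normalize("NFC", text)
    return "".join(ch for ch in composed if ch not in OPTIONAL_MARKS)

print(strip_optional_marks("\u0633\u064E\u0631"))        # سر
print(strip_optional_marks("\u0627\u0653\u0646\u0627"))  # آنا
```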

Nasalisation

ن␣ں␣٘

Word-medially, vowel nasalisation is normally represented using ن. In text with diacritics, ٘ (0658 ARABIC MARK NOON GHUNNA) can be added to distinguish this from an ordinary n sound.

منجھ

Word-finally, nasalisation is indicated using ں.

ڳاں

Standalone vowels

Word-initial standalone vowels in Saraiki begin with the vowel carrier ا, apart from long a, which is written using آ.

ای‍␣ا␣او␣ا␣ای‍␣او␣ا␣آ

Vowels that are only distinguished by diacritics are not distinguished in normal text, and there is a great deal of phonological ambiguity in these spellings. Examples follow:

اکی

ابھا

اڄ

او

اے

آنا

Word-medial standalone vowels appear to be indicated using a letter with hamza above. The following examples are from the Wiktionary term list.

ئی␣ؤ␣ئے

ٻئیٹھ

وتاؤں

ترائے

Vowel sounds to characters

This section maps Saraiki vowel sounds to common graphemes in the Arabic orthography.

Ignore the diacritics for normal text usage.

i

initial اِی

medial ِی eg. مہینہ

final ِی eg. اٻاسی

ɪ

initial اِ eg. اکی

medial ِ Not written in normal text, eg. سر

ʊ

initial اُ eg. اننجھا

medial ُ Not written in normal text, eg. بکھ

u

initial اُو

medial ُو eg. خوراک

final ُو eg. ݙو

e

initial ای

medial ی eg. ڈیرہ

final ے eg. تے

o

initial او eg. او

medial و eg. رووݨ

final و

ə

initial اَ eg. اٹھارھاں

medial َ Not written in normal text, eg. سڑن

a

initial آ eg. آلھݨا

medial ا eg. لانگھا

final ا eg. آنا

Consonants

Consonant summary table

The following table summarises the main consonant to character assignments.


Onsets
پ␣ب␣ت␣ط␣د␣ٹ␣ڈ␣چ␣ج␣ک␣گ␣ق
پھ␣بھ␣تھ␣دھ␣ٹھ␣ڈھ␣چھ␣جھ␣کھ␣گھ
ٻ␣ݙ␣ڄ␣ڳ
ف␣و␣وھ␣س␣ص␣ث␣ز␣ذ␣ظ␣ض␣ش␣ژ␣خ␣غ␣ح␣ہ
م␣ن␣ݨ␣ں
مھ␣نھ␣ݨھ
ر␣ڑ␣ل␣ی
رھ␣ڑھ␣لھ

For additional details see Consonant sounds to characters below.

Basic consonants

The following list shows letters used for the basic set of consonant sounds in native Saraiki.

پ␣ب␣ت␣د␣ٹ␣ڈ␣چ␣ج␣ک␣گ␣ق␣ٻ␣ݙ␣ڄ␣ڳ␣ف␣و␣س␣ز␣ش␣خ␣غ␣ہ␣م␣ن␣ݨ␣ر␣ڑ␣ل␣ی

The ijam marks for implosives and retroflexes are used consistently. Two vertically stacked dots below identify an implosive, and a small superscript TAH indicates a retroflex (so both are used in ݙ, for ɗ).

Observation: q is not found in the Wiktionary entries. It's not clear from other sources whether this is a native sound or not.

Other consonant letters

Nine more consonant letters are hangovers from the original spellings of loan words that have not been assimilated into the basic Saraiki alphabet.

ط␣ص␣ث␣ذ␣ظ␣ض␣ژ␣ح␣ع

Aspiration

Many Saraiki consonants have aspirated counterparts. Unlike in Sindhi, which has single letters for some aspirated sounds, no aspirated sound in Saraiki is represented by a single character.

The aspirated consonants, listed below, are represented by a digraph with ھ.

پھ␣بھ␣تھ␣دھ␣ٹھ␣ڈھ␣چھ␣جھ␣کھ␣گھ␣وھ␣مھ␣نھ␣ݨھ␣رھ␣ڑھ␣لھ

In vocalised text, vowel diacritics tend to be placed over the initial consonant letter in the digraph, rather than over the HEH.
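Because an aspirated consonant is two code points, tools that count or highlight letters may want to treat each digraph as one unit. The sketch below does that for the 17 base letters listed above; `segment_letters` is a hypothetical helper, not a standard API. Combining marks stay with the letter they follow, so a vowel sign or sukun written on the first letter of a digraph does not split it.

```python
import unicodedata

HEH_DOACHASHMEE = "\u06BE"  # ھ, second half of every aspirated digraph

# Base letters of the aspirated digraphs listed above:
# پ ب ت د ٹ ڈ چ ج ک گ و م ن ݨ ر ڑ ل
ASPIRABLE = set(
    "\u067E\u0628\u062A\u062F\u0679\u0688\u0686\u062C\u06A9"
    "\u06AF\u0648\u0645\u0646\u0768\u0631\u0691\u0644"
)

def segment_letters(word: str) -> list[str]:
    """Split a word into letter units, keeping aspirated digraphs together."""
    units: list[str] = []
    for ch in word:
        if units and unicodedata.combining(ch):
            units[-1] += ch  # attach the mark to the preceding letter
        elif (units and ch == HEH_DOACHASHMEE
              and units[-1][0] in ASPIRABLE
              and HEH_DOACHASHMEE not in units[-1]):
            units[-1] += ch  # complete an aspirated digraph
        else:
            units.append(ch)
    return units

print(segment_letters("\u0627\u0628\u06BE\u0631\u0768"))  # ['ا', 'بھ', 'ر', 'ݨ']
```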

Finals

Saraiki doesn't normally use any mark to indicate a consonant without a following vowel. Word-final consonants are apparently not marked even in text with other diacritics.

سڑن

بدل

Consonant clusters

No special mechanisms are used to indicate consonant clusters in normal text; they are simply written as a sequence of characters. However, in text with diacritics, ْ (0652 ARABIC SUKUN) may be used.

ہفتہ

ابھرݨ

Note how the vowel killer is attached to the first letter in an aspirated digraph (see the second example just above).

Note also that the shape in the nastaliq style is that of an inverted v, rather than the small circle shape used in Arabic language orthographies. The same code point is used for both, and the difference should be managed by using an appropriate font. The diacritic 065B should never be used for this (it was added to the Unicode Standard to serve as a vowel sign in African languages).

Consonant length

ّ

Geminated consonants are not normally marked in text, but in text with diacritics they may be indicated using 0651.

بھڄݨ

نک

Consonant sounds to characters

This section maps Saraiki consonant sounds to common graphemes in the Arabic orthography.

Each item gives the code point(s), a brief description, and the character or character sequence used.

Sounds listed as 'infrequent' are allophones, or sounds used for foreign words, etc. Light coloured characters occur infrequently.

p

067E consonant پ

067E 06BE aspirated consonant پھ

b

0628 consonant ب

0628 06BE aspirated consonant بھ

ɓ

067B consonant ٻ

t

062A consonant ت

0637 consonant ط Used in unassimilated spellings of loan words.

062A 06BE aspirated consonant تھ

t͡ʃ

0686 consonant چ

t͡ʃʰ

0686 06BE aspirated consonant چھ

d

062F consonant د

062F 06BE aspirated consonant دھ

d͡ʒ

062C consonant ج

d͡ʒʰ

062C 06BE aspirated consonant جھ

ʈ

0679 consonant ٹ

ʈʰ

0679 06BE aspirated consonant ٹھ

ɖ

0688 consonant ڈ

ɖʰ

0688 06BE aspirated consonant ڈھ

ɗ

0759 implosive consonant ݙ

ʄ

0684 implosive consonant ڄ

k

06A9 consonant ک

06A9 06BE aspirated consonant کھ

ɡ

06AF consonant گ

ɡʰ

06AF 06BE aspirated consonant گھ

ɠ

06B3 consonant ڳ

q

0642 consonant ق

f

0641 consonant ف

v

0648 consonant/vowel و

0648 06BE aspirated consonant وھ

s

0633 consonant س

0635 consonant ص Used in unassimilated spellings of loan words.

062B consonant ث Used in unassimilated spellings of loan words.

z

0632 consonant ز

0630 consonant ذ Used in unassimilated spellings of loan words.

0638 consonant ظ Used in unassimilated spellings of loan words.

0636 consonant ض Used in unassimilated spellings of loan words.

ʃ

0634 consonant ش

0698 consonant ژ Used in unassimilated spellings of loan words.

x

062E consonant خ

ɣ

063A consonant غ

h~ɦ

06C1 consonant ہ

062D consonant ح Used in unassimilated spellings of loan words.

m

0645 consonant م

0645 06BE aspirated consonant مھ

n

0646 consonant ن

0646 06BE aspirated consonant نھ

ɳ

0768 consonant ݨ

ɳʰ

0768 06BE aspirated consonant ݨھ

r

0631 consonant ر

0631 06BE aspirated consonant رھ

ɽ

0691 consonant ڑ

ɽʰ

0691 06BE aspirated consonant ڑھ

l

0644 consonant ل

0644 06BE aspirated consonant لھ

j

06CC consonant/vowel ی

Encoding choices

This section offers advice about characters or character sequences to avoid, and what to use instead. It takes into account the relevance of Unicode Normalisation Form D (NFD) and Unicode Normalisation Form C (NFC).

Although a recommended usage is given here, content authors may well be unaware of such recommendations. Applications should therefore look out for the non-recommended approach and treat it the same as the recommended approach wherever possible.

Canonically equivalent encodings

Four letters can be represented either as an atomic character (the norm) or as a sequence of a base letter plus a combining mark. The two encodings are canonically equivalent: Unicode Normalisation Form D (NFD) separates the parts, and Unicode Normalisation Form C (NFC) recomposes them.

Atomic (recommended) Decomposed (NOT recommended)
آ 0627 0653
ئ 064A 0654
ؤ 0648 0654
ۓ 06D2 0654

Normally, text will use the atomic form, and this is generally recommended by the Unicode Standard.
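A quick way to confirm the equivalences in the table above, and to compare strings regardless of which form an author used, is to normalise both sides. The sketch uses Python's standard `unicodedata` module; `same_text` is a hypothetical helper name. Note that ی (06CC) followed by hamza above does not compose to ئ, because ئ decomposes to ARABIC LETTER YEH (064A), not FARSI YEH.

```python
import unicodedata

# (atomic, decomposed) pairs from the table above
PAIRS = [
    ("\u0622", "\u0627\u0653"),  # آ
    ("\u0626", "\u064A\u0654"),  # ئ
    ("\u0624", "\u0648\u0654"),  # ؤ
    ("\u06D3", "\u06D2\u0654"),  # ۓ
]

for atomic, decomposed in PAIRS:
    assert unicodedata.normalize("NFD", atomic) == decomposed
    assert unicodedata.normalize("NFC", decomposed) == atomic

def same_text(a: str, b: str) -> bool:
    """Compare two strings as canonically equivalent (NFC on both sides)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(same_text("\u0627\u0653", "\u0622"))  # True
print(same_text("\u06CC\u0654", "\u0626"))  # False: 06CC + 0654 stays decomposed
```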

Confusables & spelling errors

This table lists characters that are often mistakenly used because they look the same as or similar to the code points used for Saraiki, or perhaps because the correct character is not available on the user's keyboard.

Correct Incorrect Notes
06CC 064A The Farsi YEH drops the dots below in isolate and final positions.
06A9 0643 Common fonts tend not to show the difference between these two characters, but the ability to search and compare text is impaired unless the application is aware of and takes counter-measures against this substitution.
0652 065B The function of this glyph is that of the sukun, so the correct semantic character should be used. Although 065B looks like the Saraiki jazm, it was introduced to Unicode to serve as a vowel sign for African languages. In order to produce the correct glyph using a font such as Noto it is essential to indicate that the language of the text is Saraiki. (In HTML this can be done using the attribute lang="skr".) Otherwise, the shape is likely to be a small circle.
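Applications that want to treat these substitutions as equivalent can build a folded key for searching and comparison, leaving the stored text untouched. This is a minimal sketch; the folding table and `fold_confusables` are hypothetical, built only from the rows above, and would wrongly fold genuine Arabic-language text if applied to it.

```python
import unicodedata

# Hypothetical folding table built from the confusables listed above:
# commonly substituted code point -> code point used for Saraiki.
CONFUSABLES = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    "\u065B": "\u0652",  # VOWEL SIGN INVERTED SMALL V ABOVE -> SUKUN (jazm)
})

def fold_confusables(text: str) -> str:
    """Build a search/comparison key that treats the confusables as equal."""
    # Compose first: a decomposed ئ (064A + 0654) becomes 0626 and is left
    # alone, instead of being turned into 06CC + 0654, which does not compose.
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(CONFUSABLES)
```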

Codepoint sequences

Combining marks always follow the base character.
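When several marks follow the same base, normalisation also puts them into canonical order by combining class. For example, shadda (class 33) typed before fatha (class 30) ends up after it, so vowelled text should be normalised before comparison. The sketch below shows this and adds a hypothetical check, `starts_with_mark`, for a mark with no base character.

```python
import unicodedata

BEH, FATHA, SHADDA = "\u0628", "\u064E", "\u0651"

typed = BEH + SHADDA + FATHA  # shadda entered before fatha
stored = unicodedata.normalize("NFC", typed)

print(unicodedata.combining(FATHA), unicodedata.combining(SHADDA))  # 30 33
print(stored == BEH + FATHA + SHADDA)  # True: marks reordered by class

def starts_with_mark(text: str) -> bool:
    """Flag a string whose first character is a nonspacing mark with no base."""
    return bool(text) and unicodedata.category(text[0]) == "Mn"
```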

Numbers

Digits

Saraiki uses the set of native digits in the Unicode Arabic block known as Eastern Arabic-Indic digits (06F0 to 06F9, named EXTENDED ARABIC-INDIC DIGIT ZERO to NINE in Unicode).

۰␣۱␣۲␣۳␣۴␣۵␣۶␣۷␣۸␣۹
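Because the ten digits are contiguous and in numeric order, converting between them and ASCII digits is a simple character mapping. A sketch with hypothetical helper names follows; Python's `int()` already accepts any Unicode decimal digits, so parsing needs no conversion.

```python
# Extended Arabic-Indic digits ۰..۹ occupy 06F0..06F9, in numeric order.
SARAIKI_DIGITS = "".join(chr(0x06F0 + d) for d in range(10))
ASCII_DIGITS = "0123456789"

TO_SARAIKI = str.maketrans(ASCII_DIGITS, SARAIKI_DIGITS)
TO_ASCII = str.maketrans(SARAIKI_DIGITS, ASCII_DIGITS)

def to_saraiki_digits(text: str) -> str:
    """Replace ASCII digits with Saraiki (Extended Arabic-Indic) digits."""
    return text.translate(TO_SARAIKI)

def to_ascii_digits(text: str) -> str:
    """Replace Saraiki digits with ASCII digits."""
    return text.translate(TO_ASCII)

print(to_saraiki_digits("10-12 1997"))     # ۱۰-۱۲ ۱۹۹۷
print(int("\u06F1\u06F9\u06F9\u06F7") + 1)  # 1998
```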

Text direction

Arabic script text is written horizontally and right-to-left in the main but, as in most right-to-left scripts, numbers and embedded text in other scripts are written left-to-right (producing 'bidirectional' text).

العاشر ليونيكود (Unicode Conference)، الذي سيعقد في 10-12 آذار 1997 بمدينة
Arabic words are read right-to-left, starting from the right of this line, but numbers and Latin text (highlighted) are read left-to-right.

The Unicode Bidirectional Algorithm automatically takes care of the ordering for all the text in the example above, as long as the 'base direction' is set to RTL. In HTML this can be set using the dir attribute, or in plain text using formatting controls.

If the base direction is not set appropriately, the directional runs will be ordered incorrectly, as shown in the example below, making the meaning very difficult to understand.

في XHTML 1.0 يتم تحقيق ذلك بإضافة العنصر المضمن bdo.
في XHTML 1.0 يتم تحقيق ذلك بإضافة العنصر المضمن bdo.
The exact same sequence of characters in Arabic language text with the base direction set to RTL (top), and with no base direction set on this LTR page (bottom). Certain items are highlighted to help track their position.

Show default bidi_class properties for characters in the Saraiki language.
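The default Bidi_Class values can also be queried with Python's standard `unicodedata` module. The selection of characters below is illustrative. The Saraiki digits have class EN, unlike the Arabic-Indic digits 0660 to 0669, which are AN; the Bidirectional Algorithm (rule W2) can still treat EN digits as AN when the preceding strong character is Arabic.

```python
import unicodedata

# A few characters relevant to Saraiki text (illustrative selection).
SAMPLES = [
    ("\u0627", "ALEF"),
    ("\u067B", "BEEH"),
    ("\u064E", "FATHA, a combining mark"),
    ("\u06F1", "EXTENDED ARABIC-INDIC DIGIT ONE (Saraiki digit)"),
    ("\u0661", "ARABIC-INDIC DIGIT ONE (not used for Saraiki)"),
    ("a", "Latin a"),
]

for ch, label in SAMPLES:
    print(f"U+{ord(ch):04X}  {unicodedata.bidirectional(ch):<3}  {label}")
```

The classes printed are AL, AL, NSM, EN, AN, and L respectively.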

For other aspects of dealing with right-to-left writing systems see the following sections:

For more information about how directionality and base direction work, see Unicode Bidirectional Algorithm basics. For information about plain text formatting characters see How to use Unicode controls for bidi text. And for working with markup in HTML, see Creating HTML Pages in Arabic, Hebrew and Other Right-to-left Scripts.

For authoring HTML pages, one of the most important things to remember is to use <html dir="rtl" … > at the top of the page. Also, use markup to manage direction, and do not use CSS styling.

Managing text direction

Unicode provides a set of 10 formatting characters that can be used to control the direction of text when displayed. These characters have no visual form in the rendered text; however, text editing applications may have a way to show their location.

202B (RLE), 202A (LRE), and 202C (PDF) are in widespread use to set the base direction of a range of characters. RLE/LRE comes at the start, and PDF at the end of a range of characters for which the base direction is to be set.

In Unicode 6.3, the Unicode Standard added a set of characters that do the same thing but also isolate the content from the surrounding characters, to avoid spillover effects. They are 2067 (RLI), 2066 (LRI), and 2069 (PDI). The Unicode Standard recommends that these be used instead.

There is also 2068 (FSI), which sets the base direction of the isolated text according to its first strongly directional character.

061C (ALM) is used to produce the correct sequencing of numeric data. See Expressions & sequences below for details.

200F (RLM) and 200E (LRM) are invisible characters with strong directional properties that are also sometimes used to produce the correct ordering of text.

For more information about how to use these formatting characters see How to use Unicode controls for bidi text. Note, however, that when writing HTML you should generally use markup rather than these control codes. For information about that, see Creating HTML Pages in Arabic, Hebrew and Other Right-to-left Scripts.
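For plain-text contexts where markup is unavailable, the sketch below wraps a string in a directional isolate and lists the invisible format characters a string contains, which helps when debugging. Both function names are hypothetical; in HTML, markup remains the better choice, as noted above.

```python
import unicodedata

RLI, LRI, FSI, PDI = "\u2067", "\u2066", "\u2068", "\u2069"

def isolate(text: str, direction: str = "rtl") -> str:
    """Wrap text in an isolate: 'rtl' (RLI), 'ltr' (LRI) or 'auto' (FSI)."""
    opener = {"rtl": RLI, "ltr": LRI, "auto": FSI}[direction]
    return opener + text + PDI

def show_controls(text: str) -> list[str]:
    """Name the invisible format (Cf) characters in a string."""
    return [unicodedata.name(ch) for ch in text if unicodedata.category(ch) == "Cf"]

label = isolate("Unicode Conference", "ltr")
print(show_controls(label))  # ['LEFT-TO-RIGHT ISOLATE', 'POP DIRECTIONAL ISOLATE']
```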

Expressions & sequences

Sequences of numbers are sets of numbers separated by punctuation or spaces, such as 10–12–2022. Sequences of digits, such as 123, in Arabic script text run LTR automatically. Expressions and sequences of numbers follow somewhat complicated rules, which are described in the Arabic language orthography notes.

Glyph shaping & positioning

Experiment with examples using the Saraiki character app.

Cursive script

See type samples.

Arabic script is always cursive, ie. letters in a word are joined up. Fonts need to produce the appropriate joining form for a letter, according to its visual context, but the code point used doesn't change. This results in four different shapes for most letters (including an isolated shape). Ligated forms also join with characters alongside them.

The highlights in the example below show the same letter, ع, with three different joining forms.

على • متعددة • وسيجمع

The letter ع (ain) in 3 different joining contexts.

Most Arabic script letters join on both sides. A few only join on the right-hand side: this involves 4 basic shapes for Modern Standard Arabic.

Cursive joining forms

Most dual-joining characters add or become a swash when they don't join to the left. A number of characters, however, undergo additional shape changes across the joining forms. The two tables below show the basic shapes in Modern Standard Arabic, what their joining forms look like, and the Saraiki letters that share each shape. Significant variations are highlighted.

isolated, right-joined, dual-joined, left-joined: Saraiki letters
ب ـب ـبـ بـ
پ␣ت␣ٹ␣ب␣ٻ␣ث
ن ـن ـنـ نـ
ن␣ݨ␣ں
ق ـق ـقـ قـ
ق
ف ـف ـفـ فـ
ف
س ـس ـسـ سـ
س␣ش
ص ـص ـصـ صـ
ص␣ض
ط ـط ـطـ طـ
ط␣ظ
ک ـک ـکـ کـ
گ␣ک␣ڳ
ل ـل ـلـ لـ
ل
ہ ـہ ـہـ ہـ
ہ
ھ ـھ ـھـ ھـ
ھ
م ـم ـمـ مـ
م
ع ـع ـعـ عـ
ع␣غ
ح ـح ـحـ حـ
چ␣ڄ␣خ␣ح␣ج
ي ـي ـيـ يـ
ی␣ئ
Joining forms for shapes that join on both sides.
isolated, right-joined: Saraiki letters
ا ـا
ا
ر ـر
ر␣ڑ␣ز␣ژ
د ـد
د␣ڈ␣ݙ␣ذ
و ـو
و␣ؤ
Joining forms for shapes that join on the right only.
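Choosing a joining form is done by the shaping engine and font, not by changing code points, but the underlying logic can be sketched. This simplified version assumes the word contains only letters and combining marks, treats marks as transparent, ignores ZWJ/ZWNJ and non-joining characters such as hamza, and uses a hard-coded set of right-joining letters taken from the table above plus ے and ۓ; real engines use the Joining_Type data in the Unicode file ArabicShaping.txt.

```python
import unicodedata

# Right-joining letters: ا آ ر ڑ ز ژ د ڈ ݙ ذ و ؤ ے ۓ.
# Every other letter is treated as dual-joining (a simplification).
RIGHT_JOINING = set(
    "\u0627\u0622\u0631\u0691\u0632\u0698\u062F"
    "\u0688\u0759\u0630\u0648\u0624\u06D2\u06D3"
)

def joining_forms(word: str) -> list[tuple[str, str]]:
    """Return (letter, form) pairs for a single word, in logical order."""
    letters = [ch for ch in word if not unicodedata.combining(ch)]
    result = []
    for i, ch in enumerate(letters):
        # Joins to the previous letter if that letter can join on its left.
        prev_joins = i > 0 and letters[i - 1] not in RIGHT_JOINING
        # Joins to the next letter if this letter can join on its left.
        next_joins = i + 1 < len(letters) and ch not in RIGHT_JOINING
        if prev_joins and next_joins:
            form = "medial"
        elif prev_joins:
            form = "final"
        elif next_joins:
            form = "initial"
        else:
            form = "isolated"
        result.append((ch, form))
    return result

print(joining_forms("\u062F\u06CC\u06AF"))  # دیگ: isolated, initial, final
```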

Managing glyph shaping

200D (ZWJ) and 200C (ZWNJ) are used to control the joining behaviour of cursive glyphs. They are particularly useful in educational contexts, but also have real world applications.

ZWJ permits a letter to form a cursive connection without a visible neighbour. For example, the marker for hijri dates in Arabic is displayed using the initial form of heh, even though no letter follows it, ie. ه‍. For this, use ZWJ immediately after the heh, eg. الاثنين 10 رجب 1415 ه‍.

ZWNJ prevents two adjacent letters forming a cursive connection with each other when rendered. For example, it is used in Persian for plural suffixes, some proper names, and Ottoman Turkish vowels. Ignoring or removing the ZWNJ will result in text with a different meaning or meaningless text, eg, تن‌ها is the plural of body, whereas تنها is the adjective alone.2 The only difference is the presence or absence of ZWNJ after noon.
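A common way to lose the ZWNJ is a cleanup routine that deletes every invisible format character. The sketch below contrasts such a naive cleanup with one that keeps ZWJ and ZWNJ; both function names are hypothetical.

```python
import unicodedata

ZWNJ, ZWJ = "\u200C", "\u200D"

def strip_all_format_chars(text: str) -> str:
    """Naive cleanup: drops every Cf character, including ZWJ and ZWNJ."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

def strip_format_chars_keep_joiners(text: str) -> str:
    """Drops Cf characters but keeps ZWJ/ZWNJ, which affect shaping and meaning."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch) != "Cf" or ch in (ZWJ, ZWNJ)
    )

plural = "\u062A\u0646" + ZWNJ + "\u0647\u0627"  # تن‌ها 'bodies'
alone = "\u062A\u0646\u0647\u0627"               # تنها 'alone'

print(strip_all_format_chars(plural) == alone)           # True: meaning lost
print(strip_format_chars_keep_joiners(plural) == alone)  # False: ZWNJ kept
```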

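034F (COMBINING GRAPHEME JOINER) is used in Arabic to produce a special ordering of diacritics. The name is a misnomer: it does not join anything, but because its canonical combining class is 0, normalisation does not reorder marks across it, so it can be used to break the normal sequence of diacritics.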

Context-based shaping & positioning

In addition to the cursive shaping, Arabic script glyphs also require context-dependent shaping and positioning. For more information, see the Arabic language orthography notes.

The usual mandatory ligature applies for لا.

لانگھا

ٻلا
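The ligature is a glyph-level feature: the text should contain ل (0644) followed by ا (0627), and the font produces the ligature. The legacy presentation form FEFB is only compatibility-equivalent to that sequence, as this short check shows; NFKC converts it, but NFC does not.

```python
import unicodedata

stored = "\u0644\u0627"  # لا as two characters: what text should contain
legacy = "\uFEFB"        # presentation-form ligature; avoid in new text

print(unicodedata.name(legacy))  # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM
print(unicodedata.normalize("NFC", legacy) == stored)   # False
print(unicodedata.normalize("NFKC", legacy) == stored)  # True
```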

Typographic units

Word boundaries

Words are separated by spaces.

Graphemes

tbd

Phrase, sentence, and section delimiters are described in Phrase & section boundaries.

Punctuation & inline features

Observation: The following punctuation marks have been seen in use while researching Saraiki, but there are likely to be more to document.

Phrase & section boundaries

،␣:␣؛␣.␣؟␣!␣—

Saraiki uses a mixture of ASCII, Arabic, and other punctuation.

phrase: ، ؛ :

sentence: ۔ . ؟ !

Some Saraiki texts use . as a full stop, whereas others use ۔.

Bracketed text

See type samples.

(␣)

Saraiki commonly uses ASCII parentheses to insert parenthetical information into text.

standard: start ( and end )

Mirrored characters

The words 'left' and 'right' in the Unicode names for parentheses, brackets, and other paired characters should be ignored. LEFT should be read as if it said START, and RIGHT as END. The direction in which the glyphs point will be automatically determined according to the base direction of the text.

a > b > c
ا > ب > ج
Both of these lines use > U+003E GREATER-THAN SIGN, but the direction it faces depends on the base direction at the point of display.

The number of characters that are mirrored in this way is around 550, most of which are mathematical symbols. Some are single characters, rather than pairs. The following are some of the more common ones.

(␣)␣<␣>␣[␣]␣{␣}␣«␣»␣‹␣›
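Whether a character is mirrored is given by its Bidi_Mirrored property, which Python's `unicodedata.mirrored()` exposes. The standard library does not say which glyph to substitute (that mapping is in the Unicode data file BidiMirroring.txt), so the sketch only detects such characters; `mirrored_chars` is a hypothetical helper.

```python
import unicodedata

def mirrored_chars(text: str) -> list[str]:
    """List the characters in text whose Bidi_Mirrored property is set."""
    return [ch for ch in text if unicodedata.mirrored(ch)]

for ch in "()<>[]{}«»‹›":
    assert unicodedata.mirrored(ch) == 1, ch

print(mirrored_chars("\u0627 (\u0628) > \u062C"))  # ['(', ')', '>']
```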

Line & paragraph layout

Line breaking & hyphenation

Lines are generally broken between words. They are not broken at the small gaps that appear where a character doesn't join on the left.

Line-edge rules

As in almost all writing systems, certain punctuation characters should not appear at the end or the start of a line. The Unicode line-break properties help applications decide whether a character should appear at the start or end of a line.

Show default line-breaking properties for characters in this orthography.

The following list gives examples of typical behaviours for characters affected by these rules. Context may affect the behaviour of some of these and other characters.

  • « “ ‘ (   should not be the last character on a line
  • » ” ’ ) . ⹁ ⁏ ؟ !   should not begin a new line
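The rules above can be checked on text that has already been broken into lines. This sketch uses only the characters in the list, so it is not exhaustive; a real line breaker should use the Unicode Line Breaking Algorithm (UAX #14), whose properties the Python standard library does not expose. `line_edge_violations` is a hypothetical helper, and it works on logical (memory) order, which does not depend on display direction.

```python
# Characters taken from the list above (not an exhaustive set).
NO_LINE_END = set("«“‘(")
NO_LINE_START = set("»”’).⹁⁏؟!")

def line_edge_violations(lines: list[str]) -> list[int]:
    """Return the indexes of lines that break a line-edge rule."""
    bad = []
    for i, line in enumerate(lines):
        text = line.strip()
        if not text:
            continue
        if text[-1] in NO_LINE_END or text[0] in NO_LINE_START:
            bad.append(i)
    return bad

print(line_edge_violations(["\u0627 (", "\u0628", "\u061F \u062C"]))  # [0, 2]
```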

Breaking between Latin words

When a line break occurs in the middle of an embedded left-to-right sequence, the items in that sequence need to be rearranged visually so that it isn't necessary to read lines upwards.

The example below shows how two Latin words are apparently reordered in the flow of text to accommodate this rule. Of course, the rearrangement affects only the visual glyphs: nothing changes the order of the characters in memory.

Text with no line break in Latin text.

Text with line break in Latin text.

In this Arabic language text, the lower of these two images shows the result of decreasing the line width, so that text wraps between a sequence of Latin words.

Page & book layout

General page layout & progression

Saraiki books, magazines, etc., are bound on the right-hand side, and pages progress from right to left.

عنوان كتاب

Binding configuration for Saraiki books, magazines, etc.

Columns are vertical but run right-to-left across the page.

Online resources

References