Wolof (draft)

Wolofal ajami orthography notes

Updated 7 July, 2024

This page brings together basic information about the Arabic script and its use for the Wolof language. It aims to provide a brief, descriptive summary of the modern, printed orthography and typographic features, and to advise how to write Wolof using Unicode.

Referencing this document

Richard Ishida, Wolof (Wolofal ajami) Orthography Notes, 07-Jul-2024, https://r12a.github.io/scripts/arab/wo

Sample


دࣷومِ آدَمَ يࣺݒّ دَنُيْ جُدُّ، يَمْ ݖِ تَوفࣹيخْ ݖِ سَگْ اَکْ سَݧْ-سَݧْ. نࣹکّ نَ اِتْ کُ خَمْ دࣴگّ تࣹ اࣵندْ نَ خࣹلَمْ، تࣹ وَرْ نَا جࣴفْلَنْتࣹ اَکْ نَوْلࣹينْ، تࣹ تࣹگْ کࣷ ݖِ وࣵلُّ مبࣷکّ.

Source: Unicode UDHR, article 1, in Wikipedia

Usage & history

Origins of the Arabic script, 6thC – today.

Phoenician

└ Aramaic

└ Nabataean

└ Arabic

The Wolof language is spoken by around 40% of Senegalese, and others in Mauritania and The Gambia. There are around 5.5 million native speakers in Senegal, and the total number of speakers is a little over 12 million.

Wolof is normally written in the Latin script, and a small number of people use the Garay script. Historically, and still occasionally, it is written using the Arabic ajami script.

وࣷلࣷفْ لࣵکّ

The use of the Arabic script has a much longer history than that of the Latin orthography, which was a colonial import. A number of letters were made obsolete by more recent standards for the orthography. Reforms under the umbrella of the Direction de la Promotion des Langues Nationales (DPLN) in 1986 and 1990 sought to harmonise the spelling of Latin and Arabic orthographies with that used in other countries of West Africa. In-country standards are currently maintained by the Centre de linguistique appliquée de Dakar (CLAD).

Basic features

The Arabic script is commonly used as an abjad, which means that in normal use the script represents only consonant and long vowel sounds. However, since Wolofal normally shows all the vowel diacritics, it actually functions as an alphabet. See the table to the right for a brief overview of features for Wolof using the Arabic script.

Wolof is written right to left in horizontal lines. Letters are joined (cursive) as is usual for the Arabic script. Words are separated by spaces. The script is unicameral.

The orthography for Wolofal described here has 24 basic consonants.

Wolof has prenasalised consonants, but they are spelled out using normal consonant letters (as in the Latin orthography).

Doubled consonants are common and are indicated using the shadda diacritic.


Wolofal functions as an alphabet in which vowel sounds are written using a mixture of combining marks and letters. Semitic languages such as Arabic build words on consonant patterns, and so normally hide vowel diacritics in the Arabic script. Wolof text, by contrast, can be difficult to read without the full vowel information, and therefore Wolofal retains all vowel diacritics in the text. Wolof has more vowel sounds than Arabic, so additional diacritics are used to write those.

The spelling is largely regular. The 5 letters used for vowels also double as consonants. 8 combining marks are used to write vowels, and the sukun is used to indicate vowel absence.

Word-initially, standalone vowels are attached to alef. Word-medial vowels following another vowel are rare, and are written by adding vowel characters to ع, rather than to the alef used for the Arabic language.

Line-breaking and justification are primarily based on inter-word spaces.

Joining forms

Because the Arabic script is 'cursive' (ie. joined-up) writing, letters tend to have different shapes depending on whether they join with adjacent letters or not (see cursive). In addition, vowels can be represented using different characters, depending on where in a word they appear.

In scripts such as Arabic, several characters have no left-joining form. In what follows we'll use the characters ي and د to illustrate shapes. The former can join on both sides, but the latter can only join on the right.

Left-joining glyphs are commonly called initial; dual-joining are called medial; and right-joining are called final. Glyphs that don't join on either side are called isolated. However, these glyph shapes can be found in various places within a single word.

Word-initial characters usually have initial glyph shapes (eg. 064A ). However, characters that only join to the right will use an isolated glyph shape (eg. 062F ). Furthermore, words beginning with a vowel are always preceded by a vowel carrier, which is normally ا (eg. 0627 06CC or 0627 064E ).

Word-medial characters will typically join on both sides (eg. 064A ) but those that only join to the right will use a final glyph (eg. 062F ). However, if either of those is preceded by another character that only joins to the right, the glyph shapes rendered will be initial (eg. 064A ) and isolated (eg. 062F ), respectively.

Word-final characters will typically use a final glyph shape (eg. 064A and 062F ). However, if the previous character joins only to the right, they will use isolated glyph shapes (eg. 064A and 062F ).
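
The positional rules described above can be sketched in Python. This is a minimal illustration rather than a shaping engine: Python's unicodedata module doesn't expose the Unicode Joining_Type property, so the set RIGHT_JOINING is a hand-picked assumption covering the right-joining letters of this orthography, every other letter is treated as dual-joining, and combining marks are treated as transparent. The function name joining_forms is hypothetical.

```python
import unicodedata

# Assumption: the letters of this orthography that only join to the right.
RIGHT_JOINING = set("\u0622\u0627\u062F\u0631\u0648")  # alef madda, alef, dal, reh, waw

def joining_forms(word):
    """Return (letter, form) pairs for the letters of a single word."""
    # Combining marks (category Mn) don't affect joining, so skip them.
    letters = [ch for ch in word if unicodedata.category(ch) != "Mn"]
    forms = []
    for i, ch in enumerate(letters):
        # A letter joins backwards only if the previous letter can join on its left.
        joins_prev = i > 0 and letters[i - 1] not in RIGHT_JOINING
        # A letter joins forwards only if it is dual-joining and not word-final.
        joins_next = i < len(letters) - 1 and ch not in RIGHT_JOINING
        if joins_prev and joins_next:
            form = "medial"
        elif joins_prev:
            form = "final"
        elif joins_next:
            form = "initial"
        else:
            form = "isolated"
        forms.append((ch, form))
    return forms

# beh + damma, waw, reh + sukun
word = "\u0628\u064F\u0648\u0631\u0652"
print([form for _, form in joining_forms(word)])  # ['initial', 'final', 'isolated']
```

Note how the word-final reh gets an isolated shape because the preceding waw only joins to the right.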

In all this contextual glyph shaping the basic shapes used for a character can vary significantly in a script like Arabic. This also includes some characters that only have ijam dots in certain contexts.

Character index

Letters


Consonants

آ␣ا␣ب␣ت␣ج␣خ␣د␣ر␣س␣ع␣ف␣ق␣ل␣م␣ن␣ه␣و␣ي␣ک␣گ␣ݒ␣ݖ␣ݝ␣ݧ

Obsolete

پ␣چ␣ڎ␣ڭ␣ݑ␣ݣ␣ݤ

Combining marks

َ␣ُ␣ِ␣ّ␣ْ␣ۛ␣ࣴ␣ࣵ␣ࣷ␣ࣸ␣ࣹ␣ࣺ

Obsolete

ٜ␣ٝ

Punctuation

،␣؛␣؟␣‘␣’␣“␣”␣…

ASCII

!␣(␣)␣-␣.␣:

Other

‌␣‍␣⁧␣‫␣⁦␣‪␣⁨␣⁩␣‬␣‏␣‎␣؜␣͏

To be investigated

%␣Z␣[␣]␣z␣§␣«␣»␣‑␣–␣—␣†␣‡␣;␣‰␣′␣″␣‹␣›

Phonology

The following represents the general repertoire of the Wolof languages and dialects.


Phones in a lighter colour are non-native or allophones. Source Wikipedia.

Vowel sounds

i u ə ə əː e o ɛ ɛː ɔ ɔː a

Long vowel sounds are distinctive.

Consonant sounds

               labial   alveolar   palatal   velar   glottal
stop           p b      t d        c ɟ       k ɡ     ʔ
pre-nasalised  ᵐb       ⁿd         ᶮɟ        ᵑɡ
fricative      f        s                    x
nasal          m        n          ɲ         ŋ
approximant    w        l          j
trill/flap              r

The glottal stop occurs before standalone vowels, but is not written.

اَکَرَ

Vowel harmony

Vowels in suffixes tend to be altered due to vowel harmony, based on the advanced tongue root (ATR) value of the word-initial vowel. There are some exceptions.

+ATR vowels are: i u é ó ë.

-ATR vowels are: e o a.

Authors differ in whether they reflect the vowel harmony in writing.

Tone

There is no tone in Wolof.

Structure

Gemination is common and occurs with all consonants except q, ʔ, f, s, and x.

Gemination and consonant clusters do not occur in word-initial position, but can occur medially and in final position, where they may be followed by a faint epenthetic schwa.

p, d, c, and k only occur formally in word-initial position, unless geminated (which is common), or following a nasal. However, word final b, j, and g are typically devoiced and become allophones of those consonants.

Vowels

Vowel summary table

The following table summarises the main vowel to character assignments.

Each table cell shows word-initial, word-medial, and word-final forms from right to left. The glyphs shown are illustrative; alternative shapes may occur (see joining_forms).

i اِ‍ ◌ِ ◌ِ
اِي‍ ◌ِ‍ي‍ ◌ِ‍ي
u اُ‍ ◌ُ ◌ُ
اُو ◌‍ُو ◌‍ُو
e اࣺ‍ ‍◌ࣺ‍ ◌ࣺ
اࣺي‍ ◌ࣺ‍ي‍ ◌ࣺ‍ي
o اࣸ‍ ◌ࣸ ◌ࣸ
اࣸو ◌ࣸ‍و ◌ࣸ‍و
ə اࣴ‍ ◌ࣴ ◌ࣴ
əː اࣴ‍عࣴ ◌ࣴ‍عࣴ ◌ࣴ‍عࣴ
ɛ اࣹ‍ ‍◌ࣹ‍ ◌ࣹ
ɛː اࣹي‍ ◌ࣹ‍ي‍ ◌ࣹ‍ي
ɔ اࣷ‍ ◌ࣷ ◌ࣷ
ɔː اࣷو ◌ࣷ‍و ◌ࣷ‍و
a اَ‍ ◌َ ◌َ
اَ‍ ◌َ‍ا ◌َ‍ا
Basic Wolofal vowels.

In word-initial position vowels are attached to 0627, which acts as a vowel carrier (see standalone). Otherwise, unlike orthographies for languages such as Arabic and Urdu, the spelling is very regular and characters used to represent a given vowel are normally the same, regardless of the position within a word. All vowel diacritics are always written.

The following is the set of characters needed to write vowels, as described in this section, grouped by general category.

ا␣ع␣و␣ي
َ␣ُ␣ِ␣ࣹ␣ࣺ␣ࣴ␣ࣷ␣ࣸ

Post-consonant vowels

Vowels that follow consonants are written using a mixture of combining marks and letters. Vowel diacritics are not hidden.

The spelling is largely regular. The 5 letters used for vowels also double as consonants. 8 combining marks are used to write vowels, and the sukun is used to indicate vowel absence.

Combining marks used for vowels

Wolof uses the following combining characters for vowels.

َ␣ُ␣ِ␣ࣴ␣ࣷ␣ࣸ␣ࣹ␣ࣺ

Vowel letters

Wolof uses the following letters to write vowels in combination with diacritics.

آ␣ا␣ع␣و␣ي

All word-initial vowels use 0627 as a vowel carrier, except for aː, which uses the single letter 0622. The remaining letters and alef are used after a vowel diacritic to indicate a long vowel.

In a standard Arabic orthography some of these characters would be regarded as matres lectionis, but since Wolof shows all vowel diacritics they don't have the same role here. Instead, they form part of a composite that distinguishes one vowel from another (see compositeV).

Vowel length

ا␣و␣ي␣آ␣ع

In most cases, vowel length is indicated by adding one of the first 3 letters above after the vowel diacritic. It is clear when these letters are being used to lengthen vowels (rather than as consonants) because they don't carry a vowel diacritic themselves. For details, see the table at basicV.

لࣺينِ

بُورْ

كࣵادُّ

One exception is the word-initial aː, which is represented using 0622 alone.

آکِمُ

The long vowel əː is also unusual in that it is represented by the combination 08F4 0639 08F4.

سُفࣴلࣴعࣴر
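
As a rough illustration, the vowel spellings described in this section can be captured in a small lookup table. The sketch below is a summary under stated assumptions rather than a complete speller: the names VOWELS, spell_vowel and code_points are hypothetical, the keys are the IPA labels used in this document (with ː added here for long i, u, e, o and a), and the variant 08F5 is left out.

```python
ALEF = "\u0627"
ALEF_MADDA = "\u0622"

# Vowel sound -> sequence written after a consonant (diacritic, plus any lengthening letters).
VOWELS = {
    "i": "\u0650",  "iː": "\u0650\u064A",
    "u": "\u064F",  "uː": "\u064F\u0648",
    "e": "\u08FA",  "eː": "\u08FA\u064A",
    "o": "\u08F8",  "oː": "\u08F8\u0648",
    "ə": "\u08F4",  "əː": "\u08F4\u0639\u08F4",
    "ɛ": "\u08F9",  "ɛː": "\u08F9\u064A",
    "ɔ": "\u08F7",  "ɔː": "\u08F7\u0648",
    "a": "\u064E",  "aː": "\u064E\u0627",
}

def spell_vowel(sound, initial=False):
    """Return the characters used to write a vowel sound."""
    if not initial:
        return VOWELS[sound]
    if sound == "aː":
        # Word-initial long a is written with alef madda alone.
        return ALEF_MADDA
    # Other word-initial vowels are carried by alef.
    return ALEF + VOWELS[sound]

def code_points(text):
    """Format a string as space-separated hex code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)

print(code_points(spell_vowel("əː")))                # 08F4 0639 08F4
print(code_points(spell_vowel("ɔ", initial=True)))   # 0627 08F7
print(code_points(spell_vowel("aː", initial=True)))  # 0622
```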

Nasalisation

Vowel nasalisation is not a distinctive feature of Wolof. (However, see also prenasalisation.)

Standalone vowels

Wolof tends not to have true standalone vowels. Orthographically, word-initial vowels are preceded by 0627, which represents the glottal stop ʔ, although that is not always written in IPA transcriptions.

اَکَرَ

In word-medial locations a sequence of vowels (usually in foreign words) is separated by 0639.

اِسْرَعࣹلْ

Obsolete vowels

In the past the following diacritics were used for Wolofal, but they are now obsolete.

ٜ␣ٝ

065C was used for both ɛ and e. 065D was used for ɔ.

Also, 064E was used where 08F5 is used nowadays, and 064F was used for ɔ instead of 08F7.

Vowel absence

ْ

Wolof uses 0652 to indicate that there is no vowel after a consonant. Unlike Standard Arabic, vowel absence is usually marked, including on syllable-final consonants.

اِتَمْ

وࣷرْسَگْ

The exceptions are letters used to lengthen vowels and nasal letters indicating prenasalisation. The regular use of the sukun provides a simple way to unambiguously tell when these letters are being used as consonants or for the functions just described.

سࣵنݖّ

سࣴيهْ

Another exception is that consonants which carry 0651 do not also carry the sukun.

ݒِݖّ
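
One way to make these rules concrete is to split a word into base letters with their combining marks, and then list the letters that carry no mark at all. Under the rules above, those should be vowel-lengthening letters or nasals that mark prenasalisation. The following is a minimal sketch: the function names are hypothetical, and the sample strings are assumed encodings of words like the examples in this section.

```python
import unicodedata

def clusters(word):
    """Split a word into [base letter, combining marks] pairs."""
    result = []
    for ch in word:
        if unicodedata.category(ch) == "Mn" and result:
            # Attach the combining mark to the preceding base letter.
            result[-1][1] += ch
        else:
            result.append([ch, ""])
    return result

def bare_letters(word):
    """Return the base letters that carry no combining mark."""
    return [base for base, marks in clusters(word) if not marks]

# seen + 08F5, noon (no mark), 0756 + shadda
prenasalised = "\u0633\u08F5\u0646\u0756\u0651"
# beh + damma, waw (no mark), reh + sukun
long_vowel = "\u0628\u064F\u0648\u0631\u0652"

print([f"{ord(ch):04X}" for ch in bare_letters(prenasalised)])  # ['0646']
print([f"{ord(ch):04X}" for ch in bare_letters(long_vowel)])    # ['0648']
```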

Vowel sounds to characters

This section maps Wolof vowel sounds to common graphemes in the Arabic orthography.

The columns run right to left and indicate typical word-initial, word-medial, and word-final usage. The joining forms shown are illustrative; alternative shapes may occur (see joining_forms). They are also fully-vowelled, although the examples show normal unvowelled usage as well as vowelled.


sound    word-final    word-medial    word-initial
i    U+0650 لࣺينِ    U+0650 ݒِݖّ    U+0627 U+0650 اِتَمْ
i (long)    U+0650 U+064A    U+0650 U+064A نجِيتْ    U+0627 U+0650 U+064A
u    U+064F دَانُ    U+064F دُگُبْ    U+0627 U+064F اُݒُّکَاي
u (long)    U+064F U+0648    U+064F U+0648 بُورْ    U+0627 U+064F U+0648
e    U+08FA لࣸرِيࣺ    U+08FA دࣺنکّ    U+0627 U+08FA
e (long)    U+08FA U+064A    U+08FA U+064A لࣺينِ    U+0627 U+08FA U+064A
o    U+08F8    U+08F8 نࣸبْ    U+0627 U+08F8
o (long)    U+08F8 U+0648    U+08F8 U+0648 وࣸورْ    U+0627 U+08F8 U+0648
ə    U+08F4    U+08F4 گࣴمْ    U+0627 U+08F4
əː    U+08F4 U+0639 U+08F4    U+08F4 U+0639 U+08F4 سُفࣴلࣴعࣴر    U+0627 U+08F4 U+0639 U+08F4
ɛ    U+08F9 اُودࣹ    U+08F9 سࣹنࣹگَالْ    U+0627 U+08F9
ɛː    U+08F9 U+064A    U+08F9 U+064A لࣹينْ    U+0627 U+08F9 U+064A
ɔ    U+08F7 آجࣷ    U+08F7 وࣷلࣷفْ    U+0627 U+08F7 اࣷوتࣷ
ɔː    U+08F7 U+0648    U+08F7 U+0648 وࣷورْ    U+0627 U+08F7 U+0648
a    U+064E    U+064E رَخَسْ    U+0627 U+064E اَکَرَ
a (used before geminated or prenasalised consonants)    -    U+08F5 كࣵادُّ    U+0627 U+08F5
a (long)    U+064E U+0627    U+064E U+0627 جَايْ    U+0622 آکِمُ

Consonants

Basic consonant letters

ݒ␣ب␣ت␣د␣ݖ␣ج␣ک␣گ␣ق␣ف␣س␣خ␣ه␣م␣ن␣ݧ␣ݝ␣و␣ر␣ل␣ي

The following letters are used in conjunction with vowels. See vowels.

ا␣آ␣ع

Pre-nasalised stops

مݒ␣مب␣ند␣نݖ␣نج␣نک␣نگ␣نق

Pre-nasalised sounds are written using digraphs, and occur frequently.

مبَارْ

نگَ

ندَوْ

The sounds ᵐp, ᶮc, and ᵑq don't occur word-initially.

سࣵنݖّ

سࣵمݒّ

Observation: The resources consulted don't mention nt, although it does occasionally occur. On the other hand, they seem to describe any nasal followed by one of these stops as prenasalised; there is no indication that the sequence may, for example, be -mp- rather than ᵐp. This is likely to have an impact on syllable segmentation.

Obsolete consonants

In the past the following were used for Wolofal, but they are now obsolete.

جۛ␣پ␣چ␣ڭ␣ݑ␣ݤ

Onsets

Other than prenasalised stops, consonant clusters in syllable onsets are quite rare in Wolof. There are no special mechanisms other than the use of the sukun and the prenasalisation.

Finals

Syllable-final consonants are marked with a sukun, or with a shadda if they are elongated.

Consonant clusters

Consonant clusters are not as common as in many other languages, but they do occur (especially in loan words). When they do, the consonant letter that has no following vowel is marked with 0652.

Consonant length

Consonant gemination is common and is phonemically distinctive in Wolof. Gemination is written using 0651. It is particularly common in word-final position.

بࣴتْ

بࣴتّ

Note that only one of 0651 and 0652 can appear above a letter.
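
A simple consistency check for this rule might look like the following sketch. It is an illustration only: the function name is hypothetical, and it assumes that every combining mark (general category Mn) belongs to the nearest preceding base character.

```python
import unicodedata

SHADDA = "\u0651"
SUKUN = "\u0652"

def shadda_sukun_conflicts(text):
    """Return the indexes of base characters that carry both shadda and sukun."""
    conflicts = []
    base_index = None
    marks = set()
    for i, ch in enumerate(text):
        if unicodedata.category(ch) == "Mn":
            marks.add(ch)
            continue
        # A new base character starts: check the marks collected for the previous one.
        if base_index is not None and {SHADDA, SUKUN} <= marks:
            conflicts.append(base_index)
        base_index, marks = i, set()
    # Check the marks on the last base character.
    if base_index is not None and {SHADDA, SUKUN} <= marks:
        conflicts.append(base_index)
    return conflicts

# beh + 08F4, teh + shadda: no conflict
print(shadda_sukun_conflicts("\u0628\u08F4\u062A\u0651"))        # []
# the same word with a sukun wrongly added to the geminated teh
print(shadda_sukun_conflicts("\u0628\u08F4\u062A\u0651\u0652"))  # [2]
```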

Consonant sounds to characters

This section maps Wolof consonant sounds to common graphemes in the Arabic orthography.

The right-hand column shows the various joining forms for each letter.


sound    code points    example    joining forms
p    U+0752    ݒِݖّ    U+0752 U+0752 U+0752
b    U+0628    بَاتْ    U+0628 U+0628 U+0628
t    U+062A    تَارْ    U+062A U+062A U+062A
d    U+062F    دَانُ    U+062F U+062F
c    U+0756    ݖَابِ    U+0756 U+0756 U+0756
ɟ    U+062C    جَايْ    U+062C U+062C U+062C
k    U+06A9    کࣹݒُّ    U+06A9 U+06A9 U+06A9
ɡ    U+06AF    گَالْ    U+06AF U+06AF U+06AF
q    U+0642    نَقَرْ    U+0642 U+0642 U+0642
ᵐp    U+0645 U+0752 (doesn't occur in word-initial position)    ݒِݖّ    U+0645 U+0752, U+0645 U+0752, U+0645 U+0752
ᵐb    U+0645 U+0628    بَاتْ    U+0645 U+0628, U+0645 U+0628, U+0645 U+0628
ⁿd    U+0646 U+062F    دَانُ    U+0646 U+062F, U+0646 U+062F
ᶮc    U+0646 U+0756 (doesn't occur in word-initial position)    ݖَابِ    U+0646 U+0756, U+0646 U+0756, U+0646 U+0756
ᶮɟ    U+0646 U+062C    جَايْ    U+0646 U+062C, U+0646 U+062C, U+0646 U+062C
ᵑk    U+0646 U+06A9    کࣹݒُّ    U+0646 U+06A9, U+0646 U+06A9, U+0646 U+06A9
ᵑɡ    U+0646 U+06AF    گَالْ    U+0646 U+06AF, U+0646 U+06AF, U+0646 U+06AF
ᵑq    U+0646 U+0642 (doesn't occur in word-initial position)    نَقَرْ    U+0646 U+0642, U+0646 U+0642, U+0646 U+0642
f    U+0641    فَارْ    U+0641 U+0641 U+0641
s    U+0633    سࣹنࣹگَالْ    U+0633 U+0633 U+0633
x    U+062E    خَمْ    U+062E U+062E U+062E
h    U+0647 (uncommon)    هَارْ    U+0647 U+0647 U+0647
m    U+0645    مَامْ    U+0645 U+0645 U+0645
n    U+0646    نَانْ    U+0646 U+0646 U+0646
ɲ    U+0767    ݧَانْ    U+0767 U+0767 U+0767
ŋ    U+075D    ݝَامَمْ    U+075D U+075D U+075D
w    U+0648    وَاوْ    U+0648 U+0648
r    U+0631    رَخَسْ    U+0631 U+0631
l    U+0644    لࣺينِ    U+0644 U+0644 U+0644
j    U+064A    يَايْ    U+064A U+064A U+064A

Other features

Ligatures

The combination لا is always written as a ligature. The underlying code points are, however, preserved. The shape varies slightly, depending on whether the ligature joins to the right or not. Compare:

لَاکَ

لِسْلَانْ

Observation: When diacritics are used with this ligature, they sometimes appear to be over the ALEF, rather than over the LAM, eg. قليلاً. This would require a typing order that is different from the spoken sequence.

Formatting characters

Arabic script text makes use of a relatively large set of invisible formatting characters, especially in plain text, many of which are used to manage text direction. Descriptions of these characters can be found in the sections Managing text direction and Managing glyph shaping.

Presentation Forms

The code points in the Unicode blocks Arabic Presentation Forms-A and Arabic Presentation Forms-B provide positional forms of Arabic letters and ligatures. They should not be used for ordinary text. Those code points are provided for compatibility with legacy code pages, and have (compatibility) character decomposition mappings. Normally, Arabic text should be written with code points from the main Arabic block and its extensions; positional forms are dealt with by the font and rendering algorithms.
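
The following sketch shows one way to detect presentation-form code points in Python and map them back to ordinary letters using their compatibility decompositions. The function names are hypothetical, and the block ranges (FB50 to FDFF and FE70 to FEFF) are hard-coded. Note that some code points in those ranges have no decomposition and are left unchanged by this approach.

```python
import unicodedata

def is_presentation_form(ch):
    """True if ch lies in Arabic Presentation Forms-A or Forms-B."""
    cp = ord(ch)
    return 0xFB50 <= cp <= 0xFDFF or 0xFE70 <= cp <= 0xFEFF

def find_presentation_forms(text):
    """List (index, hex code point) for each presentation form found."""
    return [(i, f"{ord(ch):04X}") for i, ch in enumerate(text) if is_presentation_form(ch)]

def to_standard_letters(text):
    """Replace presentation forms with their compatibility decompositions."""
    # Only presentation forms are touched; the rest of the text is left as typed.
    return "".join(
        unicodedata.normalize("NFKC", ch) if is_presentation_form(ch) else ch
        for ch in text
    )

legacy = "\uFEFB"  # lam-alef ligature, isolated form
print(find_presentation_forms(legacy))                       # [(0, 'FEFB')]
print([f"{ord(c):04X}" for c in to_standard_letters(legacy)])  # ['0644', '0627']
```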

For more information see the Arabic orthography notes.

Encoding choices

In the Wolofal orthography different sequences of Unicode characters may produce the same visual result. Here we look at those, and make notes on usage.

Codepoint sequences

When typing and in storage, combining marks always follow the base character they are associated with.

Special rendering rules

In principle, if more than one combining mark appears on the same side of the base character, Unicode expects applications to render the marks such that those marks closer to the base character in memory appear closer to the base character when rendered. (This is called the inside-out rule.) However, due to the reordering applied by the Unicode normalisation forms, some of the Arabic script diacritics end up in an inappropriate order on display.

For example, if a user types the sequence of characters in fig_amtra, the order of the marks will be changed such that applying the inside-out rule would render the shadda above the vowel (which is incorrect). (In fact, most application renderers have special rules to correct this.)

The Unicode Standard formally addresses this anomaly in Unicode Technical Report #53, Unicode® Arabic Mark Rendering (AMTRA), with a set of rules for how to render sequences of Arabic characters. The rules generally move shadda, hamza, round dots, etc. so that they are close to the base character.

User input: 0628 0651 064F, displayed as بُّ
Post-normalisation output: 0628 064F 0651, displayed (without the special rendering rules) as بُ͏ّ

A sequence of shadda and damma as the user is likely to input it (first line), and how it could potentially be arranged after normalisation (second line).

In the rare exceptions where the AMTRA rules should not change the rendering, this can be achieved by placing an invisible 034F character between the combining marks. (In fact, this is what was done to simulate the incorrect appearance in fig_amtra, because otherwise the browser rendering engine would have automatically produced the same output as for the user input.)
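
The reordering can be observed directly with Python's unicodedata module. This small sketch uses beh as the base character; the variable names are arbitrary. Damma has canonical combining class 31 and shadda has 33, so normalisation sorts damma first, while an intervening 034F (combining class 0) blocks the reordering.

```python
import unicodedata

BEH, DAMMA, SHADDA, CGJ = "\u0628", "\u064F", "\u0651", "\u034F"

def code_points(text):
    """Format a string as space-separated hex code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)

typed = BEH + SHADDA + DAMMA
normalised = unicodedata.normalize("NFC", typed)

print(unicodedata.combining(DAMMA), unicodedata.combining(SHADDA))  # 31 33
print(code_points(typed))       # 0628 0651 064F
print(code_points(normalised))  # 0628 064F 0651

# With 034F between the marks, normalisation leaves the order alone.
blocked = BEH + SHADDA + CGJ + DAMMA
print(code_points(unicodedata.normalize("NFC", blocked)))  # 0628 0651 034F 064F
```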

Numbers, dates, currency, etc

TBD.

Text direction

Arabic script text is written horizontally and right-to-left in the main but, as in most right-to-left scripts, numbers and embedded text in other scripts are written left-to-right (producing 'bidirectional' text).

العاشر ليونيكود (Unicode Conference)،الذي سيعقد في 10-12 آذار 1997 مبدينة
Arabic words are read right-to-left, starting from the right of this line, but numbers and Latin text (highlighted) are read left-to-right.

The Unicode Bidirectional Algorithm automatically takes care of the ordering for all the text in fig_bidi, as long as the 'base direction' is set to RTL. In HTML this can be set using the dir attribute, or in plain text using formatting controls.

If the base direction is not set appropriately, the directional runs will be ordered incorrectly as shown in fig_bidi_no_base_direction, making it very difficult to get the meaning.

في XHTML 1.0 يتم تحقيق ذلك بإضافة العنصر المضمن bdo.
في XHTML 1.0 يتم تحقيق ذلك بإضافة العنصر المضمن bdo.
The exact same sequence of characters with the base direction set to RTL (top), and with no base direction set on this LTR page (bottom). Certain items are highlighted to help track their position.
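
The bidi classes that drive the algorithm can be inspected with Python's unicodedata module. The sketch below also includes a simplified "first strong character" heuristic for guessing a base direction. Both function names are hypothetical, and the heuristic ignores embeddings and isolates, so it is only a rough approximation of what the Unicode Bidirectional Algorithm does; setting the base direction explicitly remains the more reliable approach.

```python
import unicodedata

def bidi_classes(text):
    """Return the Bidi_Class value of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]

def first_strong_direction(text, default="ltr"):
    """Guess a base direction from the first strongly-directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):
            return "rtl"
        if bidi_class == "L":
            return "ltr"
    return default

# beh + fatha, space, digit, Latin letter
print(bidi_classes("\u0628\u064E 1a"))          # ['AL', 'NSM', 'WS', 'EN', 'L']
print(first_strong_direction("10 \u0628"))      # rtl
print(first_strong_direction("bdo \u0628"))     # ltr
```

The last line shows why the heuristic can fail: a right-to-left sentence that happens to begin with a Latin word would be given the wrong base direction.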



For more information about how directionality and base direction work, see Unicode Bidirectional Algorithm basics. For information about plain text formatting characters see How to use Unicode controls for bidi text. And for working with markup in HTML, see Creating HTML Pages in Arabic, Hebrew and Other Right-to-left Scripts.

For authoring HTML pages, one of the most important things to remember is to use <html dir="rtl" … > at the top of the page. Also, use markup to manage direction, and do not use CSS styling.

Managing text direction

Unicode provides a set of 10 formatting characters that can be used to control the direction of text when displayed. These characters have no visual form in the rendered text; however, text editing applications may have a way to show their location.

202B (RLE), 202A (LRE), and 202C (PDF) are in widespread use to set the base direction of a range of characters. RLE/LRE comes at the start, and PDF at the end of a range of characters for which the base direction is to be set.

In Unicode 6.3, the Unicode Standard added a set of characters which do the same thing but also isolate the content from surrounding characters, in order to avoid spillover effects. They are 2067 (RLI), 2066 (LRI), and 2069 (PDI). The Unicode Standard recommends that these be used instead.

There is also 2068 (FSI), used initially to set the base direction according to the first recognised strongly-directional character.

061C (ALM) is used to produce correct sequencing of numeric data. See expressions for details.

200F (RLM) and 200E (LRM) are invisible characters with strong directional properties that are also sometimes used to produce the correct ordering of text.

For more information about how to use these formatting characters see How to use Unicode controls for bidi text. Note, however, that when writing HTML you should generally use markup rather than these control codes. For information about that, see Creating HTML Pages in Arabic, Hebrew and Other Right-to-left Scripts.
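
For plain text (not HTML), wrapping a run of text in isolate controls might look like the following sketch. The helper name isolate and its direction parameter are hypothetical; the code points are those of RLI, LRI, FSI and PDI.

```python
RLI, LRI, FSI, PDI = "\u2067", "\u2066", "\u2068", "\u2069"

def isolate(text, direction="auto"):
    """Wrap text in a directional isolate for use in plain text."""
    # "auto" uses FSI, which takes the direction from the first strong character.
    opener = {"rtl": RLI, "ltr": LRI, "auto": FSI}[direction]
    return opener + text + PDI

# Embed a Latin product name in a right-to-left sentence without spillover effects.
wrapped = isolate("Unicode 15.0", "ltr")
print([f"{ord(ch):04X}" for ch in (wrapped[0], wrapped[-1])])  # ['2066', '2069']
```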

Glyph shaping & positioning

You can experiment with examples using the Wolofal character app.

Cursive script

Arabic script is always cursive, ie. letters in a word are joined up. Fonts need to produce the appropriate joining form for a letter, according to its visual context, but the code point used doesn't change. This results in four different shapes for most letters (including an isolated shape). Ligated forms also join with characters alongside them.

The highlights in the example below show the same letter, ع, with three different joining forms.

على • متعددة • وسيجمع

The letter ع (ain) in 3 different joining contexts.

Most Arabic script letters join on both sides. A few only join on the right-hand side: this involves 4 basic shapes for Modern Standard Arabic.

Cursive joining forms

Most dual-joining characters add or become a swash when they don't join to the left. A number of characters, however, undergo additional shape changes across the joining forms. fig_joining_forms and fig_right_joining_forms show the basic shapes in Modern Standard Arabic and what their joining forms look like. Significant variations are highlighted.

isolated, right-joined, dual-join, left-joined; MSA letters on the following line
ب ـب ـبـ بـ
ب␣ت␣ث␣پ
ن ـن ـنـ نـ
ن
ق ـق ـقـ قـ
ق
ف ـف ـفـ فـ
ف␣ڤ
س ـس ـسـ سـ
س␣ش
ص ـص ـصـ صـ
ص␣ض
ط ـط ـطـ طـ
ط␣ظ
ك ـك ـكـ كـ
ك
ل ـل ـلـ لـ
ل
ه ـه ـهـ هـ
ه␣ة
م ـم ـمـ مـ
م
ع ـع ـعـ عـ
ع␣غ
ح ـح ـحـ حـ
ح␣خ␣ج␣چ
ي ـي ـيـ يـ
ي␣ئ␣ى
Joining forms for shapes that join on both sides.
isolated, right-joined; MSA letters on the following line
ا ـا
ا␣أ␣إ␣آ␣ٱ
ر ـر
ر␣ز
د ـد
د␣ذ
و ـو
و␣ؤ
Joining forms for shapes that join on the right only.

Managing glyph shaping

200D (ZWJ) and 200C (ZWNJ) are used to control the joining behaviour of cursive glyphs. They are particularly useful in educational contexts, but also have real world applications.

ZWJ permits a letter to form a cursive connection without a visible neighbour. For example, the marker for hijri dates is an initial form of heh, even though it doesn't join to the left, ie. ه‍. For this, use ZWJ immediately after the heh, eg. الاثنين 10 رجب 1415 ه‍..

ZWNJ prevents two adjacent letters forming a cursive connection with each other when rendered. For example, it is used in Persian for plural suffixes, some proper names, and Ottoman Turkish vowels. Ignoring or removing the ZWNJ will result in text with a different meaning or meaningless text, eg. تن‌ها is the plural of body, whereas تنها is the adjective alone. The only difference is the presence or absence of ZWNJ after noon.

034F (COMBINING GRAPHEME JOINER) is used in Arabic to produce special ordering of diacritics. The name is a misnomer, as it is generally used to break the normal sequence of diacritics.
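
Because these joiners are invisible, text-cleaning code can easily remove them by accident. The sketch below builds the two Persian words mentioned above from explicit code points and shows that a naive clean-up step (the hypothetical strip_joiners function) collapses the distinction between them.

```python
ZWJ, ZWNJ = "\u200D", "\u200C"

plural = "\u062A\u0646" + ZWNJ + "\u0647\u0627"  # teh, noon, ZWNJ, heh, alef
alone = "\u062A\u0646\u0647\u0627"               # the same letters without ZWNJ

def strip_joiners(text):
    """Naively remove ZWJ and ZWNJ (shown here as something to avoid)."""
    return text.replace(ZWJ, "").replace(ZWNJ, "")

print(plural == alone)                 # False
print(strip_joiners(plural) == alone)  # True: the distinction has been lost

# A heh followed by ZWJ, requesting a joining form with no visible neighbour.
hijri_marker = "\u0647" + ZWJ
print(len(hijri_marker))               # 2
```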

Context-based shaping & positioning

See just above for shaping related to cursive joining.

See also the section on glyph shaping in the Arabic orthography notes.

Wolof-specific font features

The SIL Scheherazade New and Harmattan fonts provide some special shaping to a number of Arabic characters when the language of the text is set to Wolof. These include the following.

  • The letter د has small vertical extensions on both ends.

  • When a consonant carries the combination 0651 0650, the vowel is positioned below the consonant, rather than below the shadda.

  • The glyph for 064F doesn't have a small end stroke sticking out to the right.

Typographic units

Word boundaries

Words are separated by spaces.

Some words are hyphenated, eg. سَݧْ-سَݧْ saɲ-saɲ or اَمْ-دِ-جَمَّ am-di-jamma.

Graphemes

tbd

Punctuation & inline features

Phrase & section boundaries

،␣؛␣:␣.␣؟␣!

Wolof uses a mixture of Arabic and ASCII punctuation.

phrase    ،␣؛␣:
sentence    .␣؟␣!

Bracketed text

(␣)

Wolof commonly uses ASCII parentheses to insert parenthetical information into text.

          start   end
standard  (       )

Mirrored characters

The words 'left' and 'right' in the Unicode names for parentheses, brackets, and other paired characters should be ignored. LEFT should be read as if it said START, and RIGHT as END. The direction in which the glyphs point will be automatically determined according to the base direction of the text.

a > b > c
ا > ب > ج
Both of these lines use > U+003E GREATER-THAN SIGN, but the direction it faces depends on the base direction at the point of display.

The number of characters that are mirrored in this way is around 550, most of which are mathematical symbols. Some are single characters, rather than pairs. The following are some of the more common ones.

(␣)␣<␣>␣[␣]␣{␣}␣«␣»␣‹␣›
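
Whether a given character is mirrored can be checked with Python's unicodedata.mirrored function, which reports the Unicode Bidi_Mirrored property. The wrapper name is_mirrored is just for illustration.

```python
import unicodedata

def is_mirrored(ch):
    """True if the character has the Bidi_Mirrored property."""
    return unicodedata.mirrored(ch) == 1

print(is_mirrored("("))       # True
print(is_mirrored(">"))       # True
print(is_mirrored("\u00AB"))  # True: left-pointing double angle quotation mark
print(is_mirrored("\u201C"))  # False: curly quotation marks are not mirrored
```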

Quotations & citations

See type samples.

”␣“␣’␣‘

The following type of quotation mark can be found in Wolofal texts. When quoted text appears within quoted text different characters are used, though usually of the same type. (Of course, depending on ease of input, quotations may also be surrounded by ASCII double and single quote marks.)

          start   end
primary   ”       “
nested    ’       ‘

Unlike brackets, these quote marks are not mirrored during display. As a result, LEFT means use on the left, and RIGHT means use on the right.

Line & paragraph layout

Line breaking & hyphenation

Lines are generally broken between words.

Line-edge rules

As in almost all writing systems, certain punctuation characters should not appear at the end or the start of a line. The Unicode line-break properties help applications decide whether a character should appear at the start or end of a line.


The following list gives examples of typical behaviours for characters affected by these rules. Context may affect the behaviour of some of these and other characters.

  • « “ ‘ (   should not be the last character on a line
  • » ” ’ ) . ، ؛ ؟ !   should not begin a new line
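
The two rules in the list above can be sketched as a simple check on the characters either side of a candidate break. This is an illustration only: the names are hypothetical, only inter-word spaces are considered as break opportunities, and a real application would normally rely on the Unicode line-break properties instead.

```python
# Characters that should not end a line, and characters that should not start one.
NO_LINE_END = set("\u00AB\u201C\u2018(")
NO_LINE_START = set("\u00BB\u201D\u2019).\u060C\u061B\u061F!")

def break_allowed(before, after):
    """True if a line may break between the characters before and after."""
    return before not in NO_LINE_END and after not in NO_LINE_START

def break_points(text):
    """Return indexes of spaces where a line break is acceptable."""
    points = []
    for i, ch in enumerate(text):
        if ch == " " and 0 < i < len(text) - 1:
            if break_allowed(text[i - 1], text[i + 1]):
                points.append(i)
    return points

print(break_points("a (b) c"))    # [1, 5]
print(break_points("a ( b"))      # [1]: no break straight after the opening parenthesis
print(break_points("a \u060Cb"))  # []: an Arabic comma should not begin a line
```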

Breaking between Latin words

When a line break occurs in the middle of an embedded left-to-right sequence, the items in that sequence need to be rearranged visually so that it isn't necessary to read lines upwards.

latin-line-breaks shows how two Latin words are apparently reordered in the flow of text to accommodate this rule. Of course, the rearrangement is only that of the visual glyphs: nothing affects the order of the characters in memory.

Text with no line break in Latin text.

Text with line break in Latin text.

In this Arabic language text, the lower of these two images shows the result of decreasing the line width, so that text wraps between a sequence of Latin words.

Text alignment & justification

The principal line-break opportunities are inter-word spaces.

Baselines, line height, etc.

tbd

Wolof uses the 'alphabetic' baseline.

Page & book layout

Online resources

  1. Dictionnaire wolof-français et français-wolof
  2. Wollof-English dictionary

References