Updated 12 December, 2024
This page brings together basic information about the Arabic script and its use for the Wolof language. It aims to provide a brief, descriptive summary of the modern, printed orthography and typographic features, and to advise how to write Wolof using Unicode.
Richard Ishida, Wolof (Wolofal ajami) Orthography Notes, 12-Dec-2024, https://r12a.github.io/scripts/arab/wo
دࣷومِ آدَمَ يࣺݒّ دَنُيْ جُدُّ، يَمْ ݖِ تَوفࣹيخْ ݖِ سَگْ اَکْ سَݧْ-سَݧْ. نࣹکّ نَ اِتْ کُ خَمْ دࣴگّ تࣹ اࣵندْ نَ خࣹلَمْ، تࣹ وَرْ نَا جࣴفْلَنْتࣹ اَکْ نَوْلࣹينْ، تࣹ تࣹگْ کࣷ ݖِ وࣵلُّ مبࣷکّ.
Source: Unicode UDHR, article 1, in Wikipedia
Origins of the Arabic script, 6thC – today.
Phoenician
└ Aramaic
└ Nabataean
└ Arabic
The Wolof language is spoken by around 40% of Senegalese, and others in Mauritania and The Gambia. There are around 5.5 million native speakers in Senegal, and the total number of speakers is a little over 12 million.
Wolof is normally written in the Latin script, and a small number of people use the Garay script. Historically, and still occasionally, it is also written using the Arabic ajami script.
وࣷلࣷفْ لࣵکّ
The use of the Arabic script has a much longer history than that of the Latin orthography, which was a colonial import. A number of letters were obsoleted by more recent standards for the orthography. Reforms under the umbrella of the Direction de la Promotion des Langues Nationales (DPLN) in 1986 and 1990 sought to harmonise the spelling of Latin and Arabic orthographies with that used in other countries of West Africa. In-country standards are currently maintained by the Centre de linguistique appliquée de Dakar (CLAD).
The Arabic script is commonly used as an abjad, which means that in normal use the script represents only consonant and long vowel sounds. However, since Wolofal normally shows all the vowel diacritics, it actually functions as an alphabet. See the table to the right for a brief overview of features for Wolof using the Arabic script.
Wolof is written right to left in horizontal lines. Letters are joined (cursive) as is usual for the Arabic script. Words are separated by spaces. The script is unicameral.
The orthography for Wolofal described here has 24 basic consonants.
Wolof has prenasalised consonants, but they are spelled out using normal consonant letters (as in the Latin orthography).
Doubled consonants are common and are indicated using the shadda diacritic.
Wolofal is an alphabet in which vowel sounds are written using a mixture of combining marks and letters. Semitic languages such as Arabic build words on consonant patterns, and so normally hide vowel diacritics in the Arabic script. Wolof text, by contrast, can be difficult to read without the full vowel information, and therefore Wolofal retains all vowel diacritics in the text. Wolof has more vowel sounds than Arabic, so additional diacritics are used to write those.
The spelling is largely regular. The 5 letters used for vowels also double as consonants. 8 combining marks are used to write vowels, and the sukun is used to indicate vowel absence.
Word-initially, standalone vowels are attached to aleph. Word-medial vowels following another vowel are rare, and are written by adding vowel characters to ع, rather than the alef used for the Arabic language.
Line-breaking and justification are primarily based on inter-word spaces.
Because the Arabic script is 'cursive' (ie. joined-up) writing, letters tend to have different shapes depending on whether they join with adjacent letters or not (see cursive). In addition, vowels can be represented using different characters, depending on where in a word they appear.
In scripts such as Arabic, several characters have no left-joining form. In what follows we'll use the characters ي and د to illustrate shapes. The former can join on both sides, but the latter can only join on the right.
Left-joining glyphs are commonly called initial; dual-joining are called medial; and right-joining are called final. Glyphs that don't join on either side are called isolated. However, these glyph shapes can be found in various places within a single word.
Word-initial characters usually have initial glyph shapes (eg. 064A ). However, characters that only join to the right will use an isolated glyph shape (eg. 062F ). Furthermore, words beginning with a vowel are always preceded by a vowel carrier, which is normally ا (eg. 0627 06CC or 0627 064E ).
Word-medial characters will typically join on both sides (eg. 064A ) but those that only join to the right will use a final glyph (eg. 062F ). However, if either of those is preceded by another character that only joins to the right, the glyph shapes rendered will be initial (eg. 064A ) and isolated (eg. 062F ), respectively.
Word-final characters will typically use a final glyph shape (eg. 064A and 062F ). However, if the previous character joins only to the right, they will use isolated glyph shapes (eg. 064A and 062F ).
In all this contextual glyph shaping the basic shapes used for a character can vary significantly in a script like Arabic. This also includes some characters that only have ijam dots in certain contexts.
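The positional rules above can be sketched in a few lines of Python. This is only an illustration: the `RIGHT_JOINING` set and the `joining_forms` function are hypothetical names, the set covers only right-joining letters that appear in this document, and the input is assumed to be a single word made up of Arabic letters and combining marks.

```python
import unicodedata

# Assumption: only right-joining letters mentioned in this document are listed
# (alef, alef with madda above, reh, dal, waw). A real shaper uses the full
# Unicode joining-type data.
RIGHT_JOINING = {"\u0627", "\u0622", "\u0631", "\u062F", "\u0648"}


def joining_forms(word):
    """Return (letter, form) pairs for the base letters of a single word."""
    # Combining marks (vowel signs, sukun, shadda) do not affect joining.
    letters = [ch for ch in word if not unicodedata.category(ch).startswith("M")]
    forms = []
    for i, ch in enumerate(letters):
        # Joins backwards only if the previous letter is able to join to its left.
        joins_prev = i > 0 and letters[i - 1] not in RIGHT_JOINING
        # Joins forwards only if this letter is itself able to join to its left.
        joins_next = i < len(letters) - 1 and ch not in RIGHT_JOINING
        if joins_prev and joins_next:
            form = "medial"
        elif joins_prev:
            form = "final"
        elif joins_next:
            form = "initial"
        else:
            form = "isolated"
        forms.append((ch, form))
    return forms
```

For example, `joining_forms("\u062F\u064A")` reports both letters as isolated, because د cannot join to the letter that follows it.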
The following represents the general repertoire of the Wolof languages and dialects.
Phones in a lighter colour are non-native or allophones. Source Wikipedia.
Long vowel sounds are distinctive.
| | labial | alveolar | palatal | velar | glottal |
|---|---|---|---|---|---|
| stop | p b | t d | c ɟ | k ɡ | ʔ |
| pre-nasalised | ᵐb | ⁿd | ᶮɟ | ᵑɡ | |
| fricative | f | s | | x | |
| nasal | m | n | ɲ | ŋ | |
| approximant | w | l | j | | |
| trill/flap | | r | | | |
The glottal stop occurs before standalone vowels, but is not written.
اَکَرَ
Vowels in suffixes tend to be altered due to vowel harmony, based on the advanced tongue root (ATR) value of the word-initial vowel. There are some exceptions.
+ATR vowels are: i u é ó ë.
-ATR vowels are: e o a.
Authors differ in whether they reflect the vowel harmony in writing.
There is no tone in Wolof.
Gemination is common and occurs with all consonants except q, ʔ, f, s, and x.
Gemination and consonant clusters do not occur in word-initial position, but can occur medially and in final position, where they may be followed by a faint epenthetic schwa.
p, d, c, and k only occur formally in word-initial position, unless geminated (which is common), or following a nasal. However, word final b, j, and g are typically devoiced and become allophones of those consonants.
The following table summarises the main vowel to character assignments.
Each row shows the word-initial, word-medial, and word-final forms. The glyphs shown are illustrative; alternative shapes may occur (see joining_forms).

sound | word-initial | word-medial | word-final |
---|---|---|---|
i | اِ | ◌ِ | ◌ِ |
iː | اِي | ◌ِي | ◌ِي |
u | اُ | ◌ُ | ◌ُ |
uː | اُو | ◌ُو | ◌ُو |
e | اࣺ | ◌ࣺ | ◌ࣺ |
eː | اࣺي | ◌ࣺي | ◌ࣺي |
o | اࣸ | ◌ࣸ | ◌ࣸ |
oː | اࣸو | ◌ࣸو | ◌ࣸو |
ə | اࣴ | ◌ࣴ | ◌ࣴ |
əː | اࣴعࣴ | ◌ࣴعࣴ | ◌ࣴعࣴ |
ɛ | اࣹ | ◌ࣹ | ◌ࣹ |
ɛː | اࣹي | ◌ࣹي | ◌ࣹي |
ɔ | اࣷ | ◌ࣷ | ◌ࣷ |
ɔː | اࣷو | ◌ࣷو | ◌ࣷو |
a | اَ | ◌َ | ◌َ |
aː | آ | ◌َا | ◌َا |
In word-initial position vowels are attached to 0627, which acts as a vowel carrier (see standalone). Otherwise, unlike orthographies for languages such as Arabic and Urdu, the spelling is very regular and characters used to represent a given vowel are normally the same, regardless of the position within a word. All vowel diacritics are always written.
Vowels that follow consonants are written using a mixture of combining marks and letters. Vowel diacritics are not hidden.
The spelling is largely regular. The 5 letters used for vowels also double as consonants. 8 combining marks are used to write vowels, and the sukun is used to indicate vowel absence.
Wolof uses the following combining characters for vowels.
Wolof uses the following letters to write vowels in combination with diacritics.
All word-initial vowels use 0627 as a vowel carrier, except for aː, which uses the single letter 0622. The remaining letters and aleph are used after a vowel diacritic to indicate a long vowel.
In a standard Arabic orthography some of these characters would be regarded as matres lectionis, but since Wolof shows all vowel diacritics they don't have the same role here. Instead, they form part of a composite that distinguishes one vowel from another (see compositeV).
In most cases, vowel length is indicated by adding one of the first 3 letters above after the vowel diacritic. It is clear when these letters are being used to lengthen vowels (rather than as consonants) because they don't carry a vowel diacritic themselves. For details, see the table at basicV.
لࣺينِ
بُورْ
كࣵادُّ
One exception is the word-initial aː, which is represented using 0622 alone.
آکِمُ
The long vowel əː is also unusual in that it is represented by the combination 08F4 0639 08F4.
سُفࣴلࣴعࣴر
Vowel nasalisation is not a distinctive feature of Wolof. (However, see also prenasalisation.)
Wolof tends not to have true standalone vowels. Orthographically, word-initial vowels are preceded by 0627, which represents the glottal stop ʔ, although that is not always written in IPA transcriptions.
اَکَرَ
In word-medial locations a sequence of vowels (usually in foreign words) is separated by 0639.
اِسْرَعࣹلْ
In the past the following diacritics were used for Wolofal, but they are now obsolete.
065C was used for both ɛ and e. 065D was used for ɔ.
Also, 064E was used where 08F5 is used nowadays, and 064F was used for ɔ instead of 08F7.
This section maps Wolof vowel sounds to common graphemes in the Arabic orthography.
The items indicate typical word-initial, word-medial, and word-final usage. The joining forms shown are illustrative; alternative shapes may occur (see joining_forms).
initial 0627 0650 eg. اِتَمْ
medial 0650 eg. ݒِݖّ
final 0650 eg. لࣺينِ
initial 0627 0650 064A
medial 0650 064A eg. نجِيتْ
final 0650 064A
initial 0627 064F eg. اُݒُّکَاي
medial 064F eg. دُگُبْ
final 064F eg. دَانُ
initial 0627 064F 0648
medial 064F 0648 eg. بُورْ
final 064F 0648
initial 0627 08FA
medial 08FA eg. دࣺنکّ
final 08FA eg. لࣸرِيࣺ
initial 0627 08FA 064A
medial 08FA 064A eg. لࣺينِ
final 08FA 064A
initial 0627 08F8
medial 08F8 eg. نࣸبْ
final 08F8
initial 0627 08F8 0648
medial 08F8 0648 eg. وࣸورْ
final 08F8 0648
initial 0627 08F4
medial 08F4 eg. گࣴمْ
final 08F4
initial 0627 08F4 0639 08F4
medial 08F4 0639 08F4 eg. سُفࣴلࣴعࣴر
final 08F4 0639 08F4
initial 0627 08F9
medial 08F9 eg. سࣹنࣹگَالْ
final 08F9 eg. اُودࣹ
initial 0627 08F9 064A
medial 08F9 064A eg. لࣹينْ
final 08F9 064A
initial 0627 08F7
medial 08F7 eg. وࣷلࣷفْ
final 08F7 eg. آجࣷ
initial 0627 08F7 0648
medial 08F7 0648 eg. وࣷورْ
final 08F7 0648
initial 0627 064E eg. اَکَرَ
initial 0627 08F5 Used before geminated or prenasalised consonants.
medial 064E eg. رَخَسْ
medial 08F5 Used before geminated or prenasalised consonants, eg. كࣵادُّ
final 064E
initial 0622 eg. آکِمُ
medial 064E 0627 eg. جَايْ
final 064E 0627
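The mapping above can be captured as a small lookup table. The following is a minimal sketch rather than a complete speller: `VOWELS` and `spell_vowel` are hypothetical names, the code point sequences are copied from the list above, and the 08F5 variant used before geminated or prenasalised consonants is left out for brevity.

```python
ALEF = "\u0627"        # vowel carrier for word-initial vowels
ALEF_MADDA = "\u0622"  # single letter used for word-initial long a

# Word-medial and word-final spellings, copied from the mapping above.
VOWELS = {
    "i": "\u0650", "iː": "\u0650\u064A",
    "u": "\u064F", "uː": "\u064F\u0648",
    "e": "\u08FA", "eː": "\u08FA\u064A",
    "o": "\u08F8", "oː": "\u08F8\u0648",
    "ə": "\u08F4", "əː": "\u08F4\u0639\u08F4",
    "ɛ": "\u08F9", "ɛː": "\u08F9\u064A",
    "ɔ": "\u08F7", "ɔː": "\u08F7\u0648",
    "a": "\u064E", "aː": "\u064E\u0627",
}


def spell_vowel(sound, initial=False):
    """Return the code point sequence for a vowel sound."""
    if not initial:
        return VOWELS[sound]
    if sound == "aː":
        # The one exception: no carrier plus diacritic, just 0622.
        return ALEF_MADDA
    return ALEF + VOWELS[sound]
```

For instance, `"\u0628" + spell_vowel("uː") + "\u0631\u0652"` yields the sequence used for بُورْ.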
The following letters are used in conjunction with vowels. See vowels.
Pre-nasalised sounds are written using digraphs, and occur frequently.
مبَارْ
نگَ
ندَوْ
The sounds ᵐp, ᶮc, and ᵑq don't occur word-initially.
سࣵنݖّ
سࣵمݒّ
Observation: The resources consulted don't mention nt, although it does occasionally occur. On the other hand, they seem to describe any nasal followed by one of these stops as prenasalised; there is no indication that the sequence may, for example, be -mp- rather than ᵐp. This is likely to have an impact on syllable segmentation.
In the past the following were used for Wolofal, but they are now obsolete.
Other than prenasalised stops, consonant clusters in syllable onsets are quite rare in Wolof. There are no special mechanisms other than the use of the sukun and the prenasalisation.
Wolof uses 0652 to indicate that there is no vowel after a consonant. Unlike in Standard Arabic, vowel absence is usually marked, including on syllable-final consonants.
اِتَمْ
وࣷرْسَگْ
The exceptions are letters used to lengthen vowels and nasal letters indicating prenasalisation. The regular use of the sukun provides a simple way to unambiguously tell when these letters are being used as consonants or for the functions just described.
سࣵنݖّ
سࣴيهْ
Another exception is that consonants which carry 0651 do not also carry the sukun.
ݒِݖّ
Consonant clusters are not as common as in many other languages, but they do occur (especially in loan words). When they do, the consonant letter that has no following vowel is marked with 0652.
Consonant gemination is common and is phonemically distinctive in Wolof. Gemination is written using 0651. It is particularly common in word-final position.
بࣴتْ
بࣴتّ
Note that only one of 0651 and 0652 can appear above a letter.
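A simple check for this rule might look like the following. It is a sketch only: `clusters` and `shadda_sukun_conflicts` are hypothetical helper names, and the grouping of marks with their base letter is a simplification of full grapheme cluster segmentation.

```python
import unicodedata

SHADDA = "\u0651"
SUKUN = "\u0652"


def clusters(text):
    """Group each base character with the combining marks that follow it."""
    groups = []
    for ch in text:
        if unicodedata.category(ch).startswith("M") and groups:
            groups[-1][1].append(ch)
        else:
            groups.append([ch, []])
    return [(base, "".join(marks)) for base, marks in groups]


def shadda_sukun_conflicts(text):
    """Return the letter-plus-marks sequences that carry both shadda and sukun."""
    return [
        base + marks
        for base, marks in clusters(text)
        if SHADDA in marks and SUKUN in marks
    ]
```

An empty result means that no letter in the text carries both marks.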
This section maps Wolof consonant sounds to common graphemes in the Arabic orthography.
The 'joining forms' column of each item gives the number of joining forms (4 for dual-joining graphemes, 2 for those that join only to the right).
Sounds listed as 'infrequent' are allophones, or sounds used for foreign words, etc. Light coloured characters occur infrequently.

code points | grapheme | type | joining forms | notes |
---|---|---|---|---|
0752 | ݒ | consonant | 4 | |
0628 | ب | consonant | 4 | |
062A | ت | consonant | 4 | |
062F | د | consonant | 2 | |
0756 | ݖ | consonant | 4 | |
062C | ج | consonant | 4 | |
06A9 | ک | consonant | 4 | |
06AF | گ | consonant | 4 | |
0642 | ق | consonant | 4 | |
0645 0752 | مݒ | prenasalised consonant | 4 | not word-initial |
0645 0628 | مب | prenasalised consonant | 4 | |
0646 062F | ند | prenasalised consonant | 2 | |
0646 0756 | نݖ | prenasalised consonant | 4 | not word-initial |
0646 062C | نج | prenasalised consonant | 4 | |
0646 06A9 | نک | prenasalised consonant | 4 | |
0646 06AF | نگ | prenasalised consonant | 4 | |
0646 0642 | نق | prenasalised consonant | 4 | not word-initial |
0641 | ف | consonant | 4 | |
0633 | س | consonant | 4 | |
062E | خ | consonant | 4 | |
0647 | ه | consonant | 4 | |
0645 | م | consonant | 4 | |
0646 | ن | consonant | 4 | |
0767 | ݧ | consonant | 4 | |
075D | ݝ | consonant | 4 | |
0648 | و | consonant | 2 | |
0631 | ر | consonant | 2 | |
0644 | ل | consonant | 4 | |
064A | ي | consonant | 4 | |
The combination لا is always written as a ligature. The underlying code points are, however, preserved. The shape varies slightly, depending on whether the ligature joins to the right or not. Compare:
لَاکَ
لِسْلَانْ
Observation: When diacritics are used with this ligature, they sometimes appear to be over the ALEF, rather than over the LAM, eg. قليلاً. This would require a typing order that is different from the spoken sequence.
Arabic script text makes use of a relatively large set of invisible formatting characters, especially in plain text, many of which are used to manage text direction. Descriptions of these characters can be found in the following sections:
The code points in the Unicode blocks Arabic Presentation Forms-A and Arabic Presentation Forms-B provide positional forms of Arabic letters and ligatures. They should not be used for ordinary text. Those code points are provided for compatibility with legacy code pages, and have (compatibility) character decomposition mappings. Normally, Arabic text should be written with code points from the main Arabic block and its extensions; positional forms are dealt with by the font and rendering algorithms.
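As a rough illustration, presentation-form code points can be detected by range and mapped back to ordinary letters through their compatibility decompositions. The function names below are hypothetical, the ranges are those of the two blocks named above (stopping short of FEFF, which is not a letter), and NFKC is used purely for demonstration: it also alters other compatibility characters, so it may not suit every workflow.

```python
import unicodedata


def has_presentation_forms(text):
    """True if text contains Arabic Presentation Forms-A or -B code points."""
    return any(
        0xFB50 <= ord(ch) <= 0xFDFF or 0xFE70 <= ord(ch) <= 0xFEFC
        for ch in text
    )


def to_ordinary_letters(text):
    """Replace positional forms using their compatibility decompositions."""
    return unicodedata.normalize("NFKC", text)


legacy = "\uFE91\uFE8E"  # BEH INITIAL FORM + ALEF FINAL FORM
print(has_presentation_forms(legacy))
print([f"{ord(ch):04X}" for ch in to_ordinary_letters(legacy)])
```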
For more information see the Arabic orthography notes.
In the Wolofal orthography different sequences of Unicode characters may produce the same visual result. Here we look at those, and make notes on usage.
When typing and in storage, combining marks always follow the base character they are associated with.
In principle, if more than one combining mark appears on the same side of the base character, Unicode expects applications to render the marks such that those marks closer to the base character in memory appear closer to the base character when rendered. (This is called the inside-out rule.) However, due to the reordering applied by the Unicode normalisation forms, some of the Arabic script diacritics end up in an inappropriate order on display.
For example, if a user types the sequence of characters in fig_amtra, the order of the marks will be changed such that applying the inside-out rule would render the shadda above the vowel (which is incorrect). (In fact, most application renderers have special rules to correct this.)
Unicode formally addresses this anomaly in Unicode Technical Report #53, Unicode Arabic Mark Rendering (AMTRA), with a set of rules for how to render sequences of Arabic characters. The rules generally move shadda, hamza, round dots, etc. so that they are close to the base character.
User input | Post-normalisation output |
---|---|
بُّ | بُ͏ّ |
ب ّ ُ | ب ُ ّ |
In the rare exceptions where the AMTRA rules should not change the rendering, this can be achieved by placing an invisible 034F character between the combining marks. (In fact, this is what was done to simulate the incorrect appearance in fig_amtra, because otherwise the browser rendering engine would have automatically produced the same output as in the first column. Clicking on the example will show the sequence used.)
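The reordering itself is easy to observe with Python's standard `unicodedata` module. The sketch below covers only the normalisation step, not rendering; the variable names are illustrative. It also shows that an intervening 034F, which has combining class 0, stops the two marks being reordered.

```python
import unicodedata

BEH, DAMMA, SHADDA, CGJ = "\u0628", "\u064F", "\u0651", "\u034F"


def describe(text):
    """List each code point with its canonical combining class."""
    return [(f"{ord(ch):04X}", unicodedata.combining(ch)) for ch in text]


typed = BEH + SHADDA + DAMMA  # shadda typed before the vowel
normalised = unicodedata.normalize("NFC", typed)
with_cgj = unicodedata.normalize("NFC", BEH + SHADDA + CGJ + DAMMA)

print(describe(typed))       # shadda (class 33) precedes damma (class 31)
print(describe(normalised))  # damma now precedes shadda
print(describe(with_cgj))    # order unchanged: 034F blocks the reordering
```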
TBD.
Arabic script text is written horizontally and right-to-left in the main but, as in most right-to-left scripts, numbers and embedded text in other scripts are written left-to-right (producing 'bidirectional' text).
The Unicode Bidirectional Algorithm automatically takes care of the ordering for all the text in fig_bidi, as long as the 'base direction' is set to RTL. In HTML this can be set using the dir attribute, or in plain text using formatting controls.
If the base direction is not set appropriately, the directional runs will be ordered incorrectly as shown in fig_bidi_no_base_direction, making it very difficult to get the meaning.
For other aspects of dealing with right-to-left writing systems see the following sections:
For more information about how directionality and base direction work, see Unicode Bidirectional Algorithm basics. For information about plain text formatting characters see How to use Unicode controls for bidi text. And for working with markup in HTML, see Creating HTML Pages in Arabic, Hebrew and Other Right-to-left Scripts.
For authoring HTML pages, one of the most important things to remember is to use <html dir="rtl" … > at the top of the page. Also, use markup to manage direction, and do not use CSS styling.
Unicode provides a set of 10 formatting characters that can be used to control the direction of text when displayed. These characters have no visual form in the rendered text; however, text editing applications may have a way to show their location.
202B (RLE), 202A (LRE), and 202C (PDF) are in widespread use to set the base direction of a range of characters. RLE/LRE comes at the start, and PDF at the end of a range of characters for which the base direction is to be set.
In Unicode 6.3, the Unicode Standard added a set of characters which do the same thing but also isolate the content from surrounding characters, in order to avoid spillover effects. They are 2067 (RLI), 2066 (LRI), and 2069 (PDI). The Unicode Standard recommends that these be used instead.
There is also 2068 (FSI), used initially to set the base direction according to the first recognised strongly-directional character.
061C (ALM) is used to produce correct sequencing of numeric data. Follow the link and see expressions for details.
200F (RLM) and 200E (LRM) are invisible characters with strong directional properties that are also sometimes used to produce the correct ordering of text.
For more information about how to use these formatting characters see How to use Unicode controls for bidi text. Note, however, that when writing HTML you should generally use markup rather than these control codes. For information about that, see Creating HTML Pages in Arabic, Hebrew and Other Right-to-left Scripts.
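For plain text, wrapping a run in isolate controls might be sketched as follows. The helper names are hypothetical, and `first_strong_direction` is a much simplified stand-in for the first-strong heuristic that FSI triggers; it is not an implementation of the Unicode Bidirectional Algorithm.

```python
import unicodedata

RLI, LRI, FSI, PDI = "\u2067", "\u2066", "\u2068", "\u2069"


def isolate(text, direction="auto"):
    """Wrap text in an isolate with the requested base direction."""
    opener = {"rtl": RLI, "ltr": LRI, "auto": FSI}[direction]
    return opener + text + PDI


def first_strong_direction(text):
    """Return 'rtl' or 'ltr' for the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):
            return "rtl"
        if bidi_class == "L":
            return "ltr"
    return None  # no strong character found
```

Digits and spaces are not strongly directional, so `first_strong_direction("123 \u0648")` returns `"rtl"`.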
You can experiment with examples using the Wolofal character app.
Arabic script is always cursive, ie. letters in a word are joined up. Fonts need to produce the appropriate joining form for a letter, according to its visual context, but the code point used doesn't change. This results in four different shapes for most letters (including an isolated shape). Ligated forms also join with characters alongside them.
The highlights in the example below show the same letter, ع, with three different joining forms.
Most Arabic script letters join on both sides. A few only join on the right-hand side: this involves 4 basic shapes for Modern Standard Arabic.
Most dual-joining characters add or become a swash when they don't join to the left. A number of characters, however, undergo additional shape changes across the joining forms. fig_joining_forms and fig_right_joining_forms show the basic shapes in Modern Standard Arabic and what their joining forms look like. Significant variations are highlighted.
isolated | right-joined | dual-join | left-joined | MSA letters |
---|---|---|---|---|
ب | ـب | ـبـ | بـ | |
ن | ـن | ـنـ | نـ | |
ق | ـق | ـقـ | قـ | |
ف | ـف | ـفـ | فـ | |
س | ـس | ـسـ | سـ | |
ل | ـل | ـلـ | لـ | |
ه | ـه | ـهـ | هـ | |
م | ـم | ـمـ | مـ | |
ع | ـع | ـعـ | عـ | |
ح | ـح | ـحـ | حـ | |
ي | ـي | ـيـ | يـ | |
isolated | right-joined | MSA letters |
---|---|---|
ا | ـا | |
ر | ـر | |
د | ـد | |
و | ـو | |
200D (ZWJ) and 200C (ZWNJ) are used to control the joining behaviour of cursive glyphs. They are particularly useful in educational contexts, but also have real world applications.
ZWJ permits a letter to form a cursive connection without a visible neighbour. For example, the marker for hijri dates is an initial form of heh, even though it doesn't join to the left, ie. ه. For this, use ZWJ immediately after the heh, eg. الاثنين 10 رجب 1415 ه..
ZWNJ prevents two adjacent letters forming a cursive connection with each other when rendered. For example, it is used in Persian for plural suffixes, some proper names, and Ottoman Turkish vowels. Ignoring or removing the ZWNJ will result in text with a different meaning or meaningless text, eg. تن‌ها is the plural of 'body', whereas تنها is the adjective 'alone'. The only difference is the presence or absence of ZWNJ after noon.
034F (COMBINING GRAPHEME JOINER) is used in Arabic to produce special ordering of diacritics. The name is a misnomer, as it is generally used to break the normal sequence of diacritics.
See just above for shaping related to cursive joining.
See also the section on glyph shaping in the Arabic orthography notes.
The SIL Scheherazade New and Harmattan fonts provide some special shaping to a number of Arabic characters when the language of the text is set to Wolof. These include the following.
The letter د has small vertical extensions on both ends.
When a consonant carries the combination 0651 0650, the vowel is positioned below the consonant, rather than below the shadda.
The glyph for 064F doesn't have a small end stroke sticking out to the right.
Words are separated by spaces.
Some words are hyphenated, eg. سَݧْ-سَݧْ saɲ-saɲ or اَمْ-دِ-جَمَّ am-di-jamma.
tbd
Wolof uses a mixture of ASCII and Arabic punctuation.
phrase | ، ؛ : |
---|---|
sentence | . ؟ ! |
Wolof commonly uses ASCII parentheses to insert parenthetical information into text.
| | start | end |
|---|---|---|
| standard | ( | ) |
The words 'left' and 'right' in the Unicode names for parentheses, brackets, and other paired characters should be ignored. LEFT should be read as if it said START, and RIGHT as END. The direction in which the glyphs point will be automatically determined according to the base direction of the text.
The number of characters that are mirrored in this way is around 550, most of which are mathematical symbols. Some are single characters, rather than pairs. The following are some of the more common ones.
See type samples.
The following type of quotation mark can be found in Wolofal texts. When quoted text appears within quoted text different characters are used, though usually of the same type. (Of course, depending on ease of input, quotations may also be surrounded by ASCII double and single quote marks.)
| | start | end |
|---|---|---|
| primary | ” | “ |
| nested | ’ | ‘ |
Unlike brackets, these quote marks are not mirrored during display. As a result, LEFT means use on the left, and RIGHT means use on the right.
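The difference between mirrored brackets and non-mirrored quote marks can be checked against the Unicode character properties. This is a small sketch using Python's standard `unicodedata` module; `is_mirrored` is just an illustrative wrapper name.

```python
import unicodedata


def is_mirrored(ch):
    """True if the character has the Bidi_Mirrored property."""
    return bool(unicodedata.mirrored(ch))


for ch in "()\u201C\u201D\u2018\u2019":
    print(f"{ord(ch):04X}", unicodedata.name(ch), is_mirrored(ch))
```

The parentheses report as mirrored, whereas the four quotation marks do not.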
Lines are generally broken between words.
As in almost all writing systems, certain punctuation characters should not appear at the end or the start of a line. The Unicode line-break properties help applications decide whether a character should appear at the start or end of a line.
The following list gives examples of typical behaviours for characters affected by these rules. Context may affect the behaviour of some of these and other characters.
When a line break occurs in the middle of an embedded left-to-right sequence, the items in that sequence need to be rearranged visually so that it isn't necessary to read lines upwards.
latin-line-breaks shows how two Latin words are apparently reordered in the flow of text to accommodate this rule. Of course, the rearrangement is only that of the visual glyphs: nothing affects the order of the characters in memory.
The principal line-break opportunities are inter-word spaces.
tbd
Wolof uses the 'alphabetic' baseline.