Uighur writing system

Updated 9 February, 2019 • tags arabic, scriptnotes

This page provides basic information about the Uighur writing system, a variant of the Arabic script, and builds on the more general information in the Arabic script summary. It is not authoritative, peer-reviewed information – these are just notes I have gathered or copied from various places as I learned. For character-specific details, follow the links to the Arabic character notes.

For similar information related to other scripts, see the Script comparison table.  

Sample (Uighur)

1 ماددا ھەممە ئادەم زانىدىنلا ئەركىن، ئىززەت-ھۆرمەت ۋە ھوقۇقتا باپباراۋەر بولۇپ تۇغۇلغان. ئۇلار ئەقىلغە ۋە ۋىجدانغا ئىگە ھەمدە بىر-بىرىگە قېرىنداشلىق مۇناسىۋىتىگە خاس روھ بىلەن موئامىلە قىلىشى كېرەك.

2 ماددا ھەممە ئادەم مۇشۇ خىتابنامىدە قەيت قىلىنغان بارلىق ھوقۇق ۋە ئەركىنلىكتىن بەھرىمەن بولۇش سالاھىيىتىگە ئىگە. ئۇلار ئىرقى، رەڭگى، جىنسى، تىلى، دىنى، سىياسىي قارىشى ياكى باشقا قارىشى، دۆلەت تەۋەلىكى ياكى ئىجتىمائىي كېلىپ چىقىشى، مۈلكى، تۇغۇلۇشى ياكى باشقا سالاھىيىتى جەھەتتىن قىلچە پەرقلەنمەيدۇ. ئۇنىڭ ئۇستىگە ھەممە ئادەم ئوزى تەۋە دۆلەت ياكى زېمىننىڭ سياسىي، مەمۇرىي لاكى خەلقئارا ئورنىنىڭ ئوخشاش بولماسلىقى بىلەن پەرقلەنمەيدۇ. بۇ زېمىننىڭ مۇستەقىل زېمىن، ۋاكالىتەن باشقۇرۇلۇۋاتقان زېمىن، ئاپتونومىيىسىز زېمىن ياكى باشقا ھەرقانداق ىگىلىك ھوقۇقىغا چەك قويۇلغان ھالەتتىكى زېمىن بولۇشىدىن قەتئىينەزەر.

Usage & history

From Wikipedia:

The Uyghur Perso-Arabic alphabet (Uyghur: ئۇيغۇر ئەرەب يېزىقى‎, ULY: Uyghur Ereb Yëziqi or UEY, USY: Уйғур Әрәб Йезиқи) is an Arabic alphabet used for writing the Uyghur language, primarily by Uyghurs living in China. It is one of several Uyghur alphabets, and has been the official alphabet of the Uyghur language since 1982.

The first Perso-Arabic derived alphabet for Uyghur was developed in the 10th century, when Islam was introduced there. The version used for writing the Chagatai language, which became the regional literary language, is now known as the Chagatay alphabet. It was used nearly exclusively up to the early 1920s. Alternative Uyghur scripts then began emerging and collectively largely displaced Chagatai; Kona Yëziq, meaning "old script", now distinguishes it and UEY from the alternatives that are not derived from Arabic. Between 1937 and 1954 the Perso-Arabic alphabet used to write Uyghur was modified by removing redundant letters and adding markings for vowels. A Cyrillic alphabet was adopted in the 1950s and a Latin alphabet in 1958. The modern Uyghur Perso-Arabic alphabet was made official in 1978 and reinstituted by the Chinese government in 1983, with modifications for representing Uyghur vowels.

The Arabic alphabet used before the modifications (Kona Yëziq) did not represent Uyghur vowels and according to Robert Barkley Shaw, spelling was irregular and long vowel letters were frequently written for short vowels since most Turki speakers were unsure of the difference between long and short vowels. The pre-modification alphabet used Arabic diacritics (zabar, zer, and pesh) to mark short vowels. ...

The reformed modern Uyghur Arabic alphabet eliminated letters whose sounds were found only in Arabic and spelt Arabic and Persian loanwords, including Islamic religious words, as they were pronounced in Uyghur, not as they were originally spelt in Arabic or Persian.

Distinctive features

The Arabic script is normally an abjad, ie. in normal use the script represents only consonant and long vowel sounds. This approach is helped by the strong emphasis on consonant patterns in Semitic languages. However, Uighur is not a Semitic language, and the modern version of the Arabic script used for Uighur is an alphabet. See the table to the right for a brief overview of the features of the general Arabic script, taken from the Script Comparison Table.

Uighur text is written horizontally, right-to-left, but numbers and embedded Latin text are read left-to-right. Words are separated by spaces, and contain a mixture of consonants and vowels. Initial vowels or those preceded by a vowel in a word are preceded by 'hamza on a tooth', eg. ئە.

The script is cursive, and some basic letter shapes change significantly, depending on their joining context.

Character lists

For information about the Arabic script in general, and for links to pages about other writing systems based on the script, see the Arabic script summary. This page will focus on the features of the Uighur writing system.

The following links give information about characters used for the Uighur language. The numbers in parentheses are for non-ASCII characters.

For character-specific details see the Arabic character notes.

Uighur uses the following characters over and above those listed for Arabic.

پ␣چ␣ژ␣ڭ␣گ␣ھ␣ۆ␣ۇ␣ۈ␣ۋ␣ې␣ە
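
As a rough aid for checking the list above, the following Python sketch prints the code point and Unicode name of each of the twelve letters. The code points are my reading of the characters as they appear on this page, and the names `EXTRA_LETTERS` and `describe` are made up for the illustration.

```python
import unicodedata

# The twelve letters listed above, written as escapes so that the code does
# not depend on how an editor displays right-to-left text.
# Assumption: these are the code points intended by the list on this page.
EXTRA_LETTERS = (
    "\u067e\u0686\u0698\u06ad\u06af\u06be"
    "\u06c6\u06c7\u06c8\u06cb\u06d0\u06d5"
)

def describe(chars):
    """Return (code point label, Unicode character name) for each character."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c)) for c in chars]

for label, name in describe(EXTRA_LETTERS):
    print(label, name)
```

The output is one line per letter, which makes it easy to compare against the Arabic character notes.
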

Text direction

Arabic script is written horizontally and right-to-left in the main, but as with most RTL scripts, numbers and embedded LTR script text are written left-to-right (producing 'bidirectional' text).

1899 - ئاسپىرىن (Aspirin) بازارغا سېلىندى.

Uighur words are read RTL, starting on the right, but numbers and Latin text are read left-to-right.
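
To see why the example line above is bidirectional, it can help to look at the bidirectional class that Unicode assigns to each character. The Python sketch below groups the example into runs of the same class. It does not implement the bidirectional algorithm itself, only the input to it, and the function name `bidi_runs` is just for illustration.

```python
import unicodedata
from itertools import groupby

def bidi_runs(text):
    """Group consecutive characters that share a Unicode bidirectional class."""
    return [
        (cls, "".join(chars))
        for cls, chars in groupby(text, key=unicodedata.bidirectional)
    ]

# Start of the example line above: a year, a hyphen, an Uighur word and a
# Latin word in parentheses (the Uighur word is written with escapes).
sample = "1899 - \u0626\u0627\u0633\u067e\u0649\u0631\u0649\u0646 (Aspirin)"

for cls, run in bidi_runs(sample):
    print(cls, repr(run))
```

The digits come out as class EN (European number), the Uighur letters as AL (Arabic letter) and the Latin letters as L, which is what makes the digits and the Latin word run left-to-right inside the right-to-left line.
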

Vowels

There are 8 vowels.

ا␣ە␣و␣ۇ␣ۆ␣ۈ␣ې␣ى
Letter   Transcription   IPA
ا        a               ɑ, a
ە        e               ɛ, æ
و        o               o, ɔ
ۇ        u               u, ʊ
ۆ        ö               ø
ۈ        ü               y, ʏ
ې        é               e
ى        i               i, ɨ

The forms shown above occur within or at the end of a word. When a vowel is alone, initial, or follows another vowel inside a word, it is always preceded by ئ [U+0626 ARABIC LETTER YEH WITH HAMZA ABOVE], which in theory represents the glottal stop, but which is not pronounced as such at the start of a word – rather, it is just a support for the vowel.

ئا␣ئە␣ئو␣ئۇ␣ئۆ␣ئۈ␣ئې␣ئى

Examples: ئارىسلان ’arislan lion, يېڭىسار yëŋisar Yangi Hissar, خوتەن χoten Khotan.
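
The rule for the vowel support can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the input is a single word made up of Uighur letters, and the names `VOWELS`, `HAMZA_SUPPORT` and `add_vowel_support` are invented here.

```python
HAMZA_SUPPORT = "\u0626"  # ARABIC LETTER YEH WITH HAMZA ABOVE

# The eight vowel letters listed above, written as escapes.
VOWELS = set("\u0627\u06d5\u0648\u06c7\u06c6\u06c8\u06d0\u0649")

def add_vowel_support(letters):
    """Insert the support before a vowel that starts the word or follows a vowel.

    Assumption: `letters` is one word with no spaces or punctuation.
    """
    out = []
    prev = None
    for ch in letters:
        if ch in VOWELS and (prev is None or prev in VOWELS):
            out.append(HAMZA_SUPPORT)
        out.append(ch)
        prev = ch
    return "".join(out)

# a-d-e-m gains a support before the initial vowel.
print(add_vowel_support("\u0627\u062f\u06d5\u0645"))
```

A support that is already present is not a vowel, so running the function on its own output does not add a second one.
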

Consonant clusters & gemination

There is apparently no special way to indicate consonant clusters or gemination. The Arabic diacritic ـّ [U+0651 ARABIC SHADDA] isn't used. This is presumably due to the alphabetic nature of the writing system.

Consonants

The following consonants are used for the Uighur language, which is largely written as it is spoken:

ب␣پ␣ت␣ج␣چ␣خ␣د␣ر␣ز␣ژ␣س␣ش␣غ␣ف␣ق␣ك␣گ␣ڭ␣ل␣م␣ن␣ھ

Two more letters are used as semivowels.

ۋ␣ي

The transcriptions shown are from the Uyghur Latin alphabet (ULY) system, and occasionally there can be ambiguities around the digraphs. In such cases, an apostrophe is used, eg. the transcription bashlan’ghuch for باشلانغۇچ bɑʃlɑnʁutʃ beginning disambiguates n-gh from ng-h.
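
The n-gh versus ng-h case can be illustrated with a small Python sketch. The mapping below is a deliberately partial table covering only the letters needed for the example, the apostrophe rule handles only the n + g/gh case, and it uses a plain ASCII apostrophe. The names `ULY_SUBSET` and `to_uly` are hypothetical, not part of any ULY tool.

```python
# Partial letter-to-ULY table (assumption: just enough for the example).
ULY_SUBSET = {
    "\u0628": "b",   # beh
    "\u0627": "a",   # alef
    "\u0634": "sh",  # sheen
    "\u0644": "l",   # lam
    "\u0646": "n",   # noon
    "\u063a": "gh",  # ghain
    "\u06af": "g",   # gaf
    "\u06ad": "ng",  # ng
    "\u06be": "h",   # heh doachashmee
    "\u06c7": "u",   # u
    "\u0686": "ch",  # tcheh
}

def to_uly(word):
    """Transcribe a word, separating n from a following g or gh.

    Raises KeyError for letters outside the partial table.
    """
    pieces = []
    for ch in word:
        latin = ULY_SUBSET[ch]
        # Without the apostrophe, n + gh would be read as ng + h.
        if pieces and pieces[-1] == "n" and latin.startswith("g"):
            pieces.append("'")
        pieces.append(latin)
    return "".join(pieces)

print(to_uly("\u0628\u0627\u0634\u0644\u0627\u0646\u063a\u06c7\u0686"))
```

Because every n + gh sequence is marked, an unmarked "ngh" can only come from the single letter ng followed by h.
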

Punctuation

The Uighur language typically uses only the following punctuation from the Arabic script block. For information about how these and punctuation marks from other blocks are used for the Uighur language, see the boundaries and numbers sections below.

،␣؛␣؟

Numbers

Uighur uses European digits.

Glyph shaping & positioning

Cursive script

Arabic script joins letters together. This results in four different shapes for most letters (including an isolated shape).

تۇغۇلغان

The letter غ [U+063A ARABIC LETTER GHAIN] in 2 different joining contexts.

A few Arabic script letters only join on the right-hand side.
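
The way a letter's shape depends on its neighbours can be sketched in Python. This is a simplified model rather than a real shaping engine: it assumes the word contains only Uighur letters, and the set of letters that join only on the right is my own reading of the Unicode joining data for the letters used here. `RIGHT_JOINING` and `joining_forms` are illustrative names.

```python
# Letters that join only to the preceding letter (assumed set for Uighur):
# alef, ae, dal, reh, zain, jeh, waw, oe, u, yu, ve.
RIGHT_JOINING = set(
    "\u0627\u06d5\u062f\u0631\u0632\u0698\u0648\u06c6\u06c7\u06c8\u06cb"
)

def joining_forms(word):
    """Return (letter, form) pairs for a word made only of Uighur letters."""
    forms = []
    last = len(word) - 1
    for i, ch in enumerate(word):
        # A letter joins backwards if the previous letter can join forwards.
        joins_prev = i > 0 and word[i - 1] not in RIGHT_JOINING
        # A letter joins forwards if it is dual-joining and not word-final.
        joins_next = i < last and ch not in RIGHT_JOINING
        if joins_prev and joins_next:
            form = "medial"
        elif joins_prev:
            form = "final"
        elif joins_next:
            form = "initial"
        else:
            form = "isolated"
        forms.append((ch, form))
    return forms

# The sample word shown above, written with escapes.
for letter, form in joining_forms("\u062a\u06c7\u063a\u06c7\u0644\u063a\u0627\u0646"):
    print(f"U+{ord(letter):04X}", form)
```

For the sample word, the two occurrences of ghain come out as initial and medial, which matches the two joining contexts mentioned above.
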

Contextual shaping

As in Arabic, lam followed by alef ligate, eg. ئىسلام ’islam islam Islam.

Structural boundaries & markers

Word boundaries

The concept of 'word' is difficult to define in any language (see What is a word?). Here, a word is a vaguely-defined, but recognisable semantic unit that is typically smaller than a phrase and may comprise one or more syllables.

Words are separated by spaces.

Phrase boundaries

Uighur uses a mixture of Western and Arabic punctuation.

For separators at the sentence level and below, the following are used in Uighur text. Each entry is labelled with its approximate equivalent in Latin script.

comma ، [U+060C ARABIC COMMA]
semi-colon ؛ [U+061B ARABIC SEMICOLON]
colon : [U+003A COLON]
sentence . [U+002E FULL STOP]
question mark ؟ [U+061F ARABIC QUESTION MARK] 
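
As a small illustration of the list above, the following Python sketch swaps the Latin comma, semicolon and question mark for their Arabic-block counterparts, and leaves the colon and full stop alone. It is a naive character substitution that takes no account of context, and the names `TO_UIGHUR` and `localise_punctuation` are invented for the example.

```python
# Latin punctuation mapped to the Arabic-block marks listed above.
# The colon and full stop are shared, so they are not in the table.
TO_UIGHUR = str.maketrans({
    ",": "\u060c",  # ARABIC COMMA
    ";": "\u061b",  # ARABIC SEMICOLON
    "?": "\u061f",  # ARABIC QUESTION MARK
})

def localise_punctuation(text):
    """Replace Latin comma, semicolon and question mark in `text`."""
    return text.translate(TO_UIGHUR)

print(localise_punctuation("a, b; c? d: e."))
```
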

TBD

Further information needed for this section includes:

Glyph shaping & positioning
    Cursive text
    Context-based shaping
    Multiple combining characters
    Context-based positioning
    Transforming characters

Structural boundaries & markers
    Grapheme, word & phrase boundaries
    Hyphens & dashes
    Bracketing information
    Quotations
    Abbreviations, ellipsis, & repetition
    Emphasis & highlights
    Inline notes & annotations

Inline layout
    Inline text spacing
    Bidirectional text

Line & paragraph layout
    Text direction
    Line breaking
    Hyphenation
    Text alignment & justification
    Counters, lists, etc.
    Styling initials
    Baselines & inline alignment

Page & book layout
    General page layout & progression
    Directional layout features
    Grids & tables
    Notes, footnotes, etc.
    Forms & user interaction
    Page numbering, running headers, etc.

Last changed 2019-02-09 22:21 GMT.  •  Licence CC-By © r12a.