
Uighur

Orthography notes

Updated 11 December, 2024

This page brings together basic information about the Arabic script and its use for the Uighur language. It aims to provide a brief, descriptive summary of the modern, printed orthography and typographic features, and to advise how to write Uighur using Unicode.

Referencing this document

Richard Ishida, Uighur Orthography Notes, 11-Dec-2024, https://r12a.github.io/scripts/arab/ug

 


Phonological transcriptions should be treated only as a guide. They are taken from the sources consulted, and may be narrow or broad, phonemic or phonetic, depending on what is available. They mostly represent the pronunciation of words in isolation. For more detailed information about allophones, alternations, sandhi, dialectal differences, and so on, follow the links to the cited references.

This is an interactive document. Click or tap on the following to reveal detailed information and examples for each character: (a) coloured characters in examples and lists; (b) link text on character names. If your browser supports it, your cursor will change as you hover over these items.

More about using this page

Character names. The names of characters in codepoint markup drop the initial ARABIC label (purely to reduce the length of the examples). In other places the full name can be found.

Navigation. The table of contents icon opens the table of contents in a popup window. Dismiss it by clicking on the X alongside it, or by pressing the ESC key.

Detailed character notes. Clicking on coloured characters in lists or on character names opens panels that give detailed information about each character. This information is taken from the companion document, Arabic Character Notes. (Those panels can be dismissed by pressing the ESC key.)

Transcriptions & transliterations. Phonological transcriptions are surrounded by ⌈corner brackets⌋, to indicate that they vary between narrow [phonetic] and broad /phonemic/ transcriptions.
Latin transcriptions between <angle brackets> represent the letters as commonly written in the Latin script.
A transliteration has also been developed especially for this orthography. It is generally based on the sound of a letter where possible, but where a letter has multiple pronunciations, the transliteration represents only one of them.
Transliterations provide perfect round-trip conversion between the native script and Latin, whereas Latin transcriptions rarely do.
When you click on an example to see its composition, the top of the panel that opens contains a transliteration, followed by the native text, then (if available) an IPA transcription.


Sample


1 ماددا ھەممە ئادەم زانىدىنلا ئەركىن، ئىززەت-ھۆرمەت ۋە ھوقۇقتا باپباراۋەر بولۇپ تۇغۇلغان. ئۇلار ئەقىلغە ۋە ۋىجدانغا ئىگە ھەمدە بىر-بىرىگە قېرىنداشلىق مۇناسىۋىتىگە خاس روھ بىلەن موئامىلە قىلىشى كېرەك.

2 ماددا ھەممە ئادەم مۇشۇ خىتابنامىدە قەيت قىلىنغان بارلىق ھوقۇق ۋە ئەركىنلىكتىن بەھرىمەن بولۇش سالاھىيىتىگە ئىگە. ئۇلار ئىرقى، رەڭگى، جىنسى، تىلى، دىنى، سىياسىي قارىشى ياكى باشقا قارىشى، دۆلەت تەۋەلىكى ياكى ئىجتىمائىي كېلىپ چىقىشى، مۈلكى، تۇغۇلۇشى ياكى باشقا سالاھىيىتى جەھەتتىن قىلچە پەرقلەنمەيدۇ. ئۇنىڭ ئۇستىگە ھەممە ئادەم ئوزى تەۋە دۆلەت ياكى زېمىننىڭ سياسىي، مەمۇرىي لاكى خەلقئارا ئورنىنىڭ ئوخشاش بولماسلىقى بىلەن پەرقلەنمەيدۇ. بۇ زېمىننىڭ مۇستەقىل زېمىن، ۋاكالىتەن باشقۇرۇلۇۋاتقان زېمىن، ئاپتونومىيىسىز زېمىن ياكى باشقا ھەرقانداق ىگىلىك ھوقۇقىغا چەك قويۇلغان ھالەتتىكى زېمىن بولۇشىدىن قەتئىينەزەر.

Source: Unicode UDHR, articles 1 & 2

Usage & history

Origins of the Arabic script, 6thC – today.

Phoenician

└ Aramaic

└ Nabataean

└ Arabic

The Perso-Arabic orthography described here is one of several alphabets used to write the Uighur language. It has been the official alphabet of Uighur since 1982, and is used primarily by Uighurs living in China.

ئۇيغۇر ئەرەب يېزىقى ʿuyʁur ʿereb yëziqi Uyghur Ereb Yëziqi Uighur Alphabet (UEY)

Wikipedia provides the following account of the development of the orthography.

The first Perso-Arabic derived alphabet for Uyghur was developed in the 10th century, when Islam was introduced there. The version used for writing the Chagatai language, which became the regional literary language, is now known as the Chagatay alphabet. It was used nearly exclusively up to the early 1920s. Alternative Uyghur scripts then began emerging and collectively largely displaced Chagatai; Kona Yëziq, meaning "old script", now distinguishes it and UEY from the alternatives that are not derived from Arabic. Between 1937 and 1954 the Perso-Arabic alphabet used to write Uyghur was modified by removing redundant letters and adding markings for vowels. A Cyrillic alphabet was adopted in the 1950s and a Latin alphabet in 1958. The modern Uyghur Perso-Arabic alphabet was made official in 1978 and reinstituted by the Chinese government in 1983, with modifications for representing Uyghur vowels.

The Arabic alphabet used before the modifications (Kona Yëziq) did not represent Uyghur vowels and according to Robert Barkley Shaw, spelling was irregular and long vowel letters were frequently written for short vowels since most Turki speakers were unsure of the difference between long and short vowels. The pre-modification alphabet used Arabic diacritics (zabar, zer, and pesh) to mark short vowels. ...

The reformed modern Uyghur Arabic alphabet eliminated letters whose sounds were found only in Arabic and spelt Arabic and Persian loanwords, including Islamic religious words, as they were pronounced in Uyghur, not as they were originally spelt in Arabic or Persian.

More information: Wikipedia

Script code: arab
Language code: ug
Script type: alphabet
Origin: W Asia
Native speakers: 25,000,000

Total characters: 52
Letters: 33
Combining marks: 2
Punctuation: 5
Other: 12
Possible other: 17
Unicode blocks: 7
Character counts above are for this orthography but exclude ASCII.

Text direction: rtl
Post-consonant vowels: letters
Standalone vowels: carrier ئـ
Case distinction: no
Cursive script: yes
Combining marks: no
Clusters marked: no
Other ligatures: yes
Word separator: space
Wraps at: word
Hyphenation: yes ـ
Grapheme clusters OK?: yes
Justification: spaces, baseline stretching
Baseline: romn

Basic features

The Arabic script is normally an abjad, ie. in normal use the script represents only consonant and long vowel sounds. This approach is helped by the strong emphasis on consonant patterns in Semitic languages. However, Uighur is not a Semitic language, and the modern version of the Arabic script used for Uighur is an alphabet. See the table to the right for a brief overview of the features of the modern Uighur orthography.

Uighur text is written horizontally, right-to-left, but numbers and embedded Latin text are read left-to-right.

Words contain a mixture of consonants and vowels.

The script is unicameral. Words are separated by spaces, but word-internal line breaks are allowed (unlike for the Arabic orthography when used for the Arabic language).

The script is cursive, and some basic letter shapes change significantly, depending on their joining context.

❯ Consonant summary table

Uighur has 25 consonant letters, including a character that serves as a vowel base.

Arabic sukun is not used to indicate consonant clusters or lack of a vowel. Similarly, geminated consonant sounds are written by doubling the letter, rather than using the Arabic shadda.

❯ Vowel summary table

The Uighur orthography is an alphabet where vowels are written using 8 vowel letters, in a straightforward way. Except in decomposed text, there are no combining marks. Unlike in Arabic, all the diacritics are ijam, and in normal text they are part of an atomic character.

The discussion of ijam vs. tashkil in the Arabic script overview has a bearing on several Uighur graphemes.

Word-initial standalone vowels or those following a vowel in a word are preceded by 'hamza on a tooth', ie. ئ‍ U+0626 LETTER YEH WITH HAMZA ABOVE.

Numbers use ASCII digits.

Punctuation marks use code points from the ASCII and Arabic Unicode ranges.

Joining forms

Because the Arabic script is 'cursive' (ie. joined-up) writing, letters tend to have different shapes depending on whether they join with adjacent letters or not (see Cursive script). In addition, vowels can be represented using different characters, depending on where in a word they appear.

In scripts such as Arabic, several characters have no left-joining form. In what follows we'll use the characters يU+064A LETTER YEH and دU+062F LETTER DAL to illustrate shapes. The former can join on both sides, but the latter can only join on the right.

Left-joining glyphs are commonly called initial; dual-joining are called medial; and right-joining are called final. Glyphs that don't join on either side are called isolated. However, these glyph shapes can be found in various places within a single word.

Word-initial characters usually have initial glyph shapes (eg. ي‍ ). However, characters that only join to the right will use an isolated glyph shape (eg. د ). Furthermore, in Uighur, words beginning with a vowel are always preceded by the vowel carrier ئU+0626 LETTER YEH WITH HAMZA ABOVE (eg. ئا or ئە ).

Word-medial characters will typically join on both sides (eg. ‍ي‍ ) but those that only join to the right will use a final glyph (eg. ‍د ). However, if either of those is preceded by another character that only joins to the right, the glyph shapes rendered will be initial (eg. ي‍ ) and isolated (eg. د ), respectively.

Word-final characters will typically use a final glyph shape (eg. ‍ي and ‍د ). However, if the previous character joins only to the right, they will use isolated glyph shapes (eg. ي and د ).

In all this contextual glyph shaping the basic shapes used for a character can vary significantly in a script like Arabic. This also includes some characters that only have ijam dots in certain contexts.
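The joining rules described above can be expressed as a short procedure. The following Python is a minimal, hypothetical sketch (the names `DUAL_JOINING`, `RIGHT_JOINING` and `contextual_forms` are invented for illustration). It assumes precomposed (NFC) text and ignores ZWJ, ZWNJ and transparent combining marks, all of which a real shaping engine handles using the full Unicode joining-type data.

```python
from __future__ import annotations

# Joining types for the 33 Uighur letters. This is a hand-made subset (assumption);
# a real shaper would read Unicode's ArabicShaping data instead.
# D = joins on both sides, R = joins only on the right.
DUAL_JOINING = set(
    "\u067E\u0628\u062A\u062C\u0686\u062E\u0633\u0634\u063A\u0641\u0642"
    "\u0643\u06AF\u06AD\u0644\u0645\u0646\u06BE\u064A\u0649\u06D0\u0626"
)
RIGHT_JOINING = set(
    "\u0627\u062F\u0631\u0632\u0698\u06CB\u0648\u06C7\u06C6\u06C8\u06D5"
)


def joining_type(ch: str) -> str:
    """Return 'D', 'R', or 'U' (non-joining, eg. spaces, digits, punctuation)."""
    if ch in DUAL_JOINING:
        return "D"
    if ch in RIGHT_JOINING:
        return "R"
    return "U"


def contextual_forms(word: str) -> list[str]:
    """Name the glyph form that each character takes in a word."""
    forms = []
    for i, ch in enumerate(word):
        if joining_type(ch) == "U":
            forms.append("non-joining")
            continue
        prev_ch = word[i - 1] if i > 0 else ""
        next_ch = word[i + 1] if i + 1 < len(word) else ""
        # Join to the previous letter only if that letter can join on its left.
        joins_prev = joining_type(prev_ch) == "D"
        # Join to the next letter only if this letter can join on its left
        # and the next character is itself a joining letter.
        joins_next = joining_type(ch) == "D" and joining_type(next_ch) != "U"
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_next:
            forms.append("initial")
        elif joins_prev:
            forms.append("final")
        else:
            forms.append("isolated")
    return forms


# تۇغۇلغان, the word used in the Cursive script figure
print(contextual_forms("\u062A\u06C7\u063A\u06C7\u0644\u063A\u0627\u0646"))
```

For تۇغۇلغان the sketch returns initial, final, initial, final, initial, medial, final, isolated: each right-joining ۇ or ا forces the following letter back into an initial or isolated shape, and the two ghains come out as initial and medial.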

Character index

The index points to locations where a character is mentioned in this page, and indicates whether it is used by the Arabic orthography described here.


Letters


Consonants

Each entry gives the character, its code point and name, its type, its sound, and its common Latin spelling.

پ U+067E ARABIC LETTER PEH · consonant · p · p
ب U+0628 ARABIC LETTER BEH · consonant · b · b
ت U+062A ARABIC LETTER TEH · consonant · t · t
د U+062F ARABIC LETTER DAL · consonant · d · d
ك U+0643 ARABIC LETTER KAF · consonant · k · k
گ U+06AF ARABIC LETTER GAF · consonant · ɡ · g
ق U+0642 ARABIC LETTER QAF · consonant · q · q
چ U+0686 ARABIC LETTER TCHEH · consonant · t͡ʃ · ch
ج U+062C ARABIC LETTER JEEM · consonant · d͡ʒ · j
ف U+0641 ARABIC LETTER FEH · consonant · f · f
س U+0633 ARABIC LETTER SEEN · consonant · s · s
ز U+0632 ARABIC LETTER ZAIN · consonant · z · z
ژ U+0698 ARABIC LETTER JEH · consonant · ʒ · zh
ش U+0634 ARABIC LETTER SHEEN · consonant · ʃ · sh
خ U+062E ARABIC LETTER KHAH · consonant · χ · x
غ U+063A ARABIC LETTER GHAIN · consonant · ʁ · gh
ھ U+06BE ARABIC LETTER HEH DOACHASHMEE · consonant · h · h
ر U+0631 ARABIC LETTER REH · consonant · r · r
ل U+0644 ARABIC LETTER LAM · consonant · l · l
م U+0645 ARABIC LETTER MEEM · consonant · m · m
ن U+0646 ARABIC LETTER NOON · consonant · n · n
ڭ U+06AD ARABIC LETTER NG · consonant · ŋ · ng
ۋ U+06CB ARABIC LETTER VE · semivowel · v w · w
ي U+064A ARABIC LETTER YEH · semivowel · j · y
ئ U+0626 ARABIC LETTER YEH WITH HAMZA ABOVE · initial vowel carrier · - - ’

Vowels

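Each entry gives the character, its code point and name, its type, its sound, and its common Latin spelling.

ى U+0649 ARABIC LETTER ALEF MAKSURA · vowel · i ɨ · i
ۈ U+06C8 ARABIC LETTER YU · vowel · y ʏ · ü
ۇ U+06C7 ARABIC LETTER U · vowel · u ʊ · u
ې U+06D0 ARABIC LETTER E · vowel · e · ë
و U+0648 ARABIC LETTER WAW · vowel · o ɔ · o
ە U+06D5 ARABIC LETTER AE · vowel · ɛ æ · e
ۆ U+06C6 ARABIC LETTER OE · vowel · ø · ö
ا U+0627 ARABIC LETTER ALEF · vowel · ɑ a · a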

Combining marks

ٔ U+0654 ARABIC HAMZA ABOVE (rare) · hamza above · used in decomposed text only.

Punctuation

، U+060C ARABIC COMMA · comma · ,
؛ U+061B ARABIC SEMICOLON · semicolon · ;
؟ U+061F ARABIC QUESTION MARK · question mark · ?
« U+00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK · quotation mark
» U+00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK · quotation mark

ASCII

! U+0021 EXCLAMATION MARK · exclamation mark · !
. U+002E FULL STOP · full stop · .
: U+003A COLON · colon · :
( U+0028 LEFT PARENTHESIS · parenthesis · (
) U+0029 RIGHT PARENTHESIS · parenthesis · )

Other


Formatting

ZWNJ U+200C ZERO WIDTH NON-JOINER · zero-width non-joiner
ZWJ U+200D ZERO WIDTH JOINER · zero-width joiner
RLI U+2067 RIGHT-TO-LEFT ISOLATE · rtl isolate
RLE U+202B RIGHT-TO-LEFT EMBEDDING · rtl embed
LRI U+2066 LEFT-TO-RIGHT ISOLATE · ltr isolate
LRE U+202A LEFT-TO-RIGHT EMBEDDING · ltr embed
FSI U+2068 FIRST STRONG ISOLATE · first-strong isolate
PDI U+2069 POP DIRECTIONAL ISOLATE · pop directional isolate
PDF U+202C POP DIRECTIONAL FORMATTING · pop directional formatting
RLM U+200F RIGHT-TO-LEFT MARK · rtl mark
LRM U+200E LEFT-TO-RIGHT MARK · ltr mark
ALM U+061C ARABIC LETTER MARK · arabic letter mark
CGJ U+034F COMBINING GRAPHEME JOINER · combining grapheme joiner

To be investigated

- U+002D HYPHEN-MINUS (tbc) · hyphen
[ U+005B LEFT SQUARE BRACKET (tbc) · bracket
] U+005D RIGHT SQUARE BRACKET (tbc) · bracket
ʼ U+02BC MODIFIER LETTER APOSTROPHE (tbc) · apostrophe
٪ U+066A ARABIC PERCENT SIGN (tbc) · percent sign · %
U+200B ZERO WIDTH SPACE (tbc) · zero-width space
U+2011 NON-BREAKING HYPHEN (tbc) · non-breaking hyphen
U+2013 EN DASH (tbc) · en dash
U+2014 EM DASH (tbc) · em dash
U+2018 LEFT SINGLE QUOTATION MARK (tbc) · quotation mark
U+2019 RIGHT SINGLE QUOTATION MARK (tbc) · quotation mark
U+201C LEFT DOUBLE QUOTATION MARK (tbc) · quotation mark
U+201D RIGHT DOUBLE QUOTATION MARK (tbc) · quotation mark
U+2026 HORIZONTAL ELLIPSIS (tbc) · ellipsis
U+2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK (tbc) · quotation mark
U+203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK (tbc) · quotation mark
U+2060 WORD JOINER (tbc) · word joiner

Phonology

These are sounds of the modern Uighur language.


Phones in a lighter colour are non-native or allophones.

Vowel sounds

i y ɨ ɯ u ɪ ʏ ʊ e ɤ o ɛ œ ʌ ɔ æ ɑ

Natively and phonemically, Uighur has only short vowel sounds, although historical assimilation and loan words have led to some longer sounds phonetically².

Uighur has no diphthongs, although hiatus may occur in some loanwords².

Uighur vowels participate in vowel harmony and vowel reduction. For more information see Uighur Phonology.

Consonant sounds

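Places of articulation, from front to back: labial, dental, alveolar, post-alveolar, palatal, velar, uvular, glottal. The sounds in each row are listed in that order, with commas separating different places.

stop: p b, t d, k ɡ, q, ʔ
affricate: t͡ʃ d͡ʒ
fricative: f v, s z, ʃ ʒ, χ ʁ, ɦ
nasal: m, n, ŋ
approximant: w, l, j
trill/flap: r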

Stops and affricates weaken (lenition) before dissimilar consonants, and r, l and j may be assimilated to the preceding vowel, which becomes lengthened, but none of this is reflected in the orthography¹.

Tone

Uighur is not a tonal language.

Structure

The general syllabic structure of Uighur² is

CV(C)(C)

Uighur syllables are primarily CV or CVC. Consonant clusters in the syllable coda are often phonetically altered by elision or epenthesis².

Any consonant can begin a syllable except for ŋ. Any consonant can appear in the coda except for ʔ².

Vowels


Vowel summary table

The following table summarises the main vowel to character assignments.

The right-hand column shows word-initial forms. The glyphs shown in the table are illustrative; alternative shapes may occur (see Joining forms).

Simple:

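Each row gives the sound; then the post-consonant letter, its Latin spelling, transliteration and code point; then the word-initial form with the same details.

i ɨ · ى · i · i · U+0649 | ئى · i · ʿi · U+0626 U+0649
y ʏ · ۈ · ü · ü · U+06C8 | ئۈ · ü · ʿü · U+0626 U+06C8
u ʊ · ۇ · u · u · U+06C7 | ئۇ · u · ʿu · U+0626 U+06C7
e · ې · ë · ë · U+06D0 | ئې · ë · ʿë · U+0626 U+06D0
ø · ۆ · ö · ø · U+06C6 | ئۆ · ö · ʿø · U+0626 U+06C6
o ɔ · و · o · o · U+0648 | ئو · o · ʿo · U+0626 U+0648
ɛ æ · ە · e · e · U+06D5 | ئە · e · ʿe · U+0626 U+06D5
ɑ a · ا · a · a · U+0627 | ئا · a · ʿa · U+0626 U+0627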

For additional details see Vowel sounds to characters.

Post-consonant vowels

Vowel letters

Uighur has 8 vowel letters.


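Each entry gives the letter, its sound, Latin spelling, transliteration, and code point.

ى · i ɨ · i · i · U+0649
ۈ · y ʏ · ü · ü · U+06C8
ۇ · u ʊ · u · u · U+06C7
ې · e · ë · ë · U+06D0
و · o ɔ · o · o · U+0648
ە · ɛ æ · e · e · U+06D5
ۆ · ø · ö · ø · U+06C6
ا · ɑ a · a · a · U+0627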

Examples: يېڭىسار yeŋisar Yengisar خوتەن xotæn Khotan

When used for standalone vowels, these letters are preceded by ئU+0626 LETTER YEH WITH HAMZA ABOVE (see Standalone vowels).

See also Encoding choices.

Combining marks


ٔ U+0654 (rare)

Because Uighur uses atomic characters for its vowels, Uighur text usually contains no vowel-dedicated combining marks. The only exception occurs in decomposed text, where ٔU+0654 HAMZA ABOVE appears as a separate combining mark after the base letter ي.

Vowel length

Uighur doesn't natively have long vowel sounds, and none are marked in the written orthography.²

Standalone vowels

Standalone vowels are vowel sounds that are not preceded by a consonant sound, or are preceded by only a glottal stop. They may appear at the beginning of a word or in the middle of a word after a preceding vowel.

When a vowel is alone, initial, or follows another vowel inside a word, it is always preceded by ئ‍ U+0626 LETTER YEH WITH HAMZA ABOVE, which in theory represents the glottal stop, but which is not pronounced as such at the start of a word – rather, it is just a support for the vowel.


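Each entry gives the letters, the sound, Latin spelling, transliteration, and code points.

ئى · i ɨ · i · ʿi · U+0626 U+0649
ئۈ · y ʏ · ü · ʿü · U+0626 U+06C8
ئۇ · u ʊ · u · ʿu · U+0626 U+06C7
ئې · e · ë · ʿë · U+0626 U+06D0
ئو · o ɔ · o · ʿo · U+0626 U+0648
ئە · ɛ æ · e · ʿe · U+0626 U+06D5
ئۆ · ø · ö · ʿø · U+0626 U+06C6
ئا · ɑ a · a · ʿa · U+0626 U+0627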
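Because the carrier rule for standalone vowels is mechanical, it can be checked automatically. The Python below is a hypothetical sketch (`VOWELS` and `missing_carriers` are illustrative names, not an existing tool). It assumes precomposed text and splits words on whitespace only, so a vowel that follows leading punctuation such as « is not checked.

```python
from __future__ import annotations

# The 8 Uighur vowel letters: ى ۈ ۇ ې و ە ۆ ا
VOWELS = set("\u0649\u06C8\u06C7\u06D0\u0648\u06D5\u06C6\u0627")


def missing_carriers(text: str) -> list[tuple[str, int]]:
    """Return (word, index) for each vowel letter written without a preceding ئ."""
    problems = []
    for word in text.split():
        for i, ch in enumerate(word):
            if ch not in VOWELS:
                continue
            # A vowel letter at the start of a word, or directly after another
            # vowel letter, should have ئ U+0626 written in front of it.
            if i == 0 or word[i - 1] in VOWELS:
                problems.append((word, i))
    return problems


# ئۆيمۇئۆي öymu'öy 'from door to door' is correctly written, so nothing is flagged.
print(missing_carriers("\u0626\u06C6\u064A\u0645\u06C7\u0626\u06C6\u064A"))
```

If both carriers are dropped from that word, the sketch flags index 0 (a word-initial vowel) and index 4 (a vowel that directly follows ۇ).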

Vowel sounds to characters

This section maps Uighur vowel sounds to common graphemes in the Arabic orthography.

The items here indicate typical word-initial, word-medial, and word-final usage. The joining forms shown are illustrative; alternative shapes may occur (see Joining forms).

i ɨ

initial ئى‍ U+0626 LETTER YEH WITH HAMZA ABOVE + U+0649 LETTER ALEF MAKSURA eg. ئىرادە ʔirɑdɛ willpower

medial ‍ى‍ U+0649 LETTER ALEF MAKSURA eg. ئالتىنچى ɑltint͡ʃi sixth

final ‍ىU+0649 LETTER ALEF MAKSURA eg. ئالتىنچى ɑltint͡ʃi sixth

y ʏ

initial ئۈ‍ U+0626 LETTER YEH WITH HAMZA ABOVE + U+06C8 LETTER YU eg. ئۈزۈك ʔyzyk finger ring

medial ‍ۈ‍ U+06C8 LETTER YU eg. سۈلھ sylh peace

final ‍ۈU+06C8 LETTER YU eg. كۆزگۈ køzɡy mirror

u ʊ

initial ئۇ‍ U+0626 LETTER YEH WITH HAMZA ABOVE + U+06C7 LETTER U eg. ئۇلۇغ ʔuluʁ great, chieftain

medial ‍ۇ‍ U+06C7 LETTER U eg. بۇرۇن burun nose

final ‍ۇU+06C7 LETTER U eg. جاڭيۇ d͡ʒɑŋju soy sauce

e

initial ئې‍ U+0626 LETTER YEH WITH HAMZA ABOVE + U+06D0 LETTER E eg. ئېغىز ʔeʁiz mouth

medial ‍ې‍ U+06D0 LETTER E eg. بېنزىن benzin gasoline

final ‍ېU+06D0 LETTER E eg. داشۈئې dɑʃyʔe university

ø

initial ئۆ‍ U+0626 LETTER YEH WITH HAMZA ABOVE + U+06C6 LETTER OE eg. ئۆلمەك œlmæk̚ to die

medial ‍ۆ‍ U+06C6 LETTER OE eg. تۆت tøt̚ four

final ‍ۆU+06C6 LETTER OE

o

initial ئو‍ U+0626 LETTER YEH WITH HAMZA ABOVE + U+0648 LETTER WAW eg. ئوتۇن ʔotun firewood

medial ‍و‍ U+0648 LETTER WAW eg. پروفېسسور propessor professor

final ‍وU+0648 LETTER WAW eg. جۇڭگو d͡ʒuŋɡo China

ɛ

initial ئە‍ U+0626 LETTER YEH WITH HAMZA ABOVE + U+06D5 LETTER AE eg. ئەتىۋار ʔɛtiwɑr value, worth

medial ‍ە‍ U+06D5 LETTER AE eg. ئاپەت ʔɑpɛt disaster, tragedy

final ‍ەU+06D5 LETTER AE eg. ئالتە ɑltɛ six

æ

initial ئە‍ U+0626 LETTER YEH WITH HAMZA ABOVE + U+06D5 LETTER AE eg. ئەپسۇس æp.sus pity

medial ‍ە‍ U+06D5 LETTER AE eg. خوتەن xotæn Khotan

final ‍ەU+06D5 LETTER AE

ɑ a

initial ئا‍ U+0626 LETTER YEH WITH HAMZA ABOVE + U+0627 LETTER ALEF eg. ئالاقە ʔɑlɑqɛ connection

medial ‍ا‍ U+0627 LETTER ALEF eg. بانان banan banana

final ‍اU+0627 LETTER ALEF eg. بانا bana excuse

Consonants


Consonant summary table

The following table summarises the main consonant to character assignments.

Consonants

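Each entry gives the sound, the letter, its Latin spelling, its transliteration, and its code point.

p · پ · p · p · U+067E
b · ب · b · b · U+0628
t · ت · t · t · U+062A
d · د · d · d · U+062F
k · ك · k · k · U+0643
ɡ · گ · g · g · U+06AF
q · ق · q · q · U+0642

t͡ʃ · چ · ch · č · U+0686
d͡ʒ · ج · j · ʤ · U+062C

f · ف · f · f · U+0641
v w · ۋ · w · w · U+06CB
s · س · s · s · U+0633
z · ز · z · z · U+0632
ʒ · ژ · zh · ʒ · U+0698
ʃ · ش · sh · ʃ · U+0634
χ · خ · x · χ · U+062E
ʁ · غ · gh · ʁ · U+063A
h · ھ · h · h · U+06BE

m · م · m · m · U+0645
n · ن · n · n · U+0646
ŋ · ڭ · ng · ŋ · U+06AD

w · ۋ · w · w · U+06CB
r · ر · r · r · U+0631
l · ل · l · l · U+0644
j · ي · y · y · U+064A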

For additional details see Consonant sounds to characters.

Consonant letters

The following consonants are used for the Uighur language, which is largely written as it is spoken. Whereas the table just above takes you from sounds to letters, the following simply lists the basic consonant letters (however, since the orthography is highly phonetic there is little difference in ordering).


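Each entry gives the letter, its sound, Latin spelling, transliteration, and code point.

پ · p · p · p · U+067E
ب · b · b · b · U+0628
ت · t · t · t · U+062A
د · d · d · d · U+062F
ك · k · k · k · U+0643
گ · ɡ · g · g · U+06AF
ق · q · q · q · U+0642
چ · t͡ʃ · ch · č · U+0686
ج · d͡ʒ · j · ʤ · U+062C
ف · f · f · f · U+0641
س · s · s · s · U+0633
ز · z · z · z · U+0632
ژ · ʒ · zh · ʒ · U+0698
ش · ʃ · sh · ʃ · U+0634
خ · χ · x · χ · U+062E
غ · ʁ · gh · ʁ · U+063A
ھ · h · h · h · U+06BE
م · m · m · m · U+0645
ن · n · n · n · U+0646
ڭ · ŋ · ng · ŋ · U+06AD
ۋ · v w · w · w · U+06CB
ر · r · r · r · U+0631
ل · l · l · l · U+0644
ي · j · y · y · U+064A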

Transcription note

In transcriptions using the Uyghur Latin alphabet (ULY) system, the digraphs can occasionally be ambiguous. In such cases an apostrophe is used; eg. in the transcription bashlan’ghuch of باشلانغۇچ beginning, the apostrophe shows that the sequence is n + gh rather than ng + h.
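The digraph ambiguity, together with the carrier rule for standalone vowels, can be seen in a small converter from ULY to Arabic script. This Python is a hypothetical and deliberately incomplete sketch: the mapping is taken from the tables on this page, the input must be a single lowercase word in NFC form, digraphs are matched greedily, and treating an apostrophe before a vowel as a signal for ئ is an assumption about ULY usage rather than something this page states.

```python
from __future__ import annotations

# Hypothetical ULY-to-Arabic mapping, built from the consonant and vowel tables
# on this page. Two-letter sequences are always tried before single letters.
CONSONANTS = {
    "ch": "\u0686", "zh": "\u0698", "sh": "\u0634", "gh": "\u063A", "ng": "\u06AD",
    "b": "\u0628", "p": "\u067E", "t": "\u062A", "j": "\u062C", "x": "\u062E",
    "d": "\u062F", "r": "\u0631", "z": "\u0632", "s": "\u0633", "f": "\u0641",
    "q": "\u0642", "k": "\u0643", "g": "\u06AF", "l": "\u0644", "m": "\u0645",
    "n": "\u0646", "h": "\u06BE", "w": "\u06CB", "y": "\u064A",
}
VOWELS = {
    "a": "\u0627", "e": "\u06D5", "\u00EB": "\u06D0", "i": "\u0649",  # ë -> ې
    "o": "\u0648", "\u00F6": "\u06C6", "u": "\u06C7", "\u00FC": "\u06C8",  # ö, ü
}
CARRIER = "\u0626"  # ئ LETTER YEH WITH HAMZA ABOVE
APOSTROPHES = ("'", "\u2019")


def uly_to_arabic(word: str) -> str:
    """Convert one lowercase, NFC-normalised ULY word to Uighur Arabic script."""
    out: list[str] = []
    prev_was_vowel = False
    after_apostrophe = False
    i = 0
    while i < len(word):
        ch = word[i]
        if ch in APOSTROPHES:
            # The apostrophe only separates letters; it is not written in Arabic script.
            after_apostrophe = True
            i += 1
            continue
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in CONSONANTS:  # greedy digraph match
            out.append(CONSONANTS[pair])
            i += 2
            prev_was_vowel = False
        elif ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            i += 1
            prev_was_vowel = False
        elif ch in VOWELS:
            # Word-initial vowels and vowels after another vowel take the carrier ئ.
            # Assumption: an apostrophe before a vowel also calls for the carrier.
            if not out or prev_was_vowel or after_apostrophe:
                out.append(CARRIER)
            out.append(VOWELS[ch])
            i += 1
            prev_was_vowel = True
        else:
            raise ValueError(f"unexpected character {ch!r} in {word!r}")
        after_apostrophe = False
    return "".join(out)


print(uly_to_arabic("bashlan'ghuch"))
print(uly_to_arabic("bashlanghuch"))
```

With the apostrophe, the sketch produces باشلانغۇچ. Without it, greedy matching reads the sequence as ng + h and produces باشلاڭھۇچ, which is exactly the ambiguity the apostrophe removes. Similarly, öymu'öy comes out as ئۆيمۇئۆي, with a carrier before each standalone ö.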

Consonant clusters & gemination

Consonant clusters have no special annotation or shaping. The Arabic sukun is not used to indicate missing vowels.

Geminated consonants are written by simply repeating the consonant; the Arabic shadda is not used, eg. تاللاش tallash selection, sample ئاپتاپپەرەس aptapperes sunflower ئۇسسۇل ussul dance

Consonant sounds to characters

This section maps Uighur consonant sounds to common graphemes in the Arabic orthography.

The right-hand side of each item shows the various joining forms for that character.

p

پ‍پ‍پ‍ پ‍ consonant پU+067E LETTER PEH

b

ب‍ب‍ب‍ ب‍ consonant بU+0628 LETTER BEH

t

ت‍ت‍ت‍ ت‍ consonant تU+062A LETTER TEH

t͡ʃ

چ‍چ‍چ‍ چ‍ consonant چU+0686 LETTER TCHEH

d

د‍ ‍د consonant دU+062F LETTER DAL

d͡ʒ

ج‍ج‍ج‍ ج‍ consonant جU+062C LETTER JEEM

k

ك‍ك‍ك‍ ك‍ consonant كU+0643 LETTER KAF

ɡ

گ‍گ‍گ‍ گ‍ consonant گU+06AF LETTER GAF

q

ق‍ق‍ق‍ ق‍ consonant قU+0642 LETTER QAF

f

ف‍ف‍ف‍ ف‍ consonant فU+0641 LETTER FEH

v

ۋ‍ ‍ۋ semivowel ۋU+06CB LETTER VE

w

ۋ‍ ‍ۋ semivowel ۋU+06CB LETTER VE

s

س‍س‍س‍ س‍ consonant سU+0633 LETTER SEEN

z

ز‍ ‍ز consonant زU+0632 LETTER ZAIN

ʃ

ش‍ش‍ش‍ ش‍ consonant شU+0634 LETTER SHEEN

ʒ

ژ‍ ‍ژ consonant ژU+0698 LETTER JEH

χ

خ‍خ‍خ‍ خ‍ consonant خU+062E LETTER KHAH

ʁ

غ‍غ‍غ‍ غ‍ consonant غU+063A LETTER GHAIN

h

ھ‍ھ‍ھ‍ ھ‍ consonant ھU+06BE LETTER HEH DOACHASHMEE

m

م‍م‍م‍ م‍ consonant مU+0645 LETTER MEEM

n

ن‍ن‍ن‍ ن‍ consonant نU+0646 LETTER NOON

ŋ

ڭ‍ڭ‍ڭ‍ ڭ‍ consonant ڭU+06AD LETTER NG

w

ۋ ـۋ semivowel ۋU+06CB LETTER VE

r

ر‍ ‍ر consonant رU+0631 LETTER REH

l

ل‍ل‍ل‍ ل‍ consonant لU+0644 LETTER LAM

j

ي‍ي‍ي‍ ي‍ semivowel يU+064A LETTER YEH

Encoding choices

Several of the vowel letters could be written by adding a combining mark to a base character (see the table), but in practice precomposed characters are used. The only letter that decomposes during NFD normalisation is ئU+0626 LETTER YEH WITH HAMZA ABOVE.

Use Do NOT use
ۆU+06C6 LETTER OE وٚ U+0648 LETTER WAW + U+065A VOWEL SIGN SMALL V ABOVE
ۈU+06C8 LETTER YU وٰ U+0648 LETTER WAW + U+0670 LETTER SUPERSCRIPT ALEF
ۇU+06C7 LETTER U وُ U+0648 LETTER WAW + U+064F DAMMA

This table reflects the fact that the marks associated with base characters here are of the ijam, rather than tashkil, kind (see ijam vs. tashkil in the Arabic script overview).
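This normalisation behaviour can be checked with Python's standard `unicodedata` module. The following is a minimal sketch; the helper name `decomposing_letters` is hypothetical. It confirms that, of the 33 Uighur letters, only ئ changes under NFD, and that a 'Do NOT use' sequence such as waw + damma is not composed into ۇ by NFC, so text using it would not match text using the precomposed letter.

```python
from __future__ import annotations
import unicodedata

# The 33 Uighur letters from this page's character index (consonants, then vowels).
UIGHUR_LETTERS = (
    "\u067E\u0628\u062A\u062F\u0643\u06AF\u0642\u0686\u062C\u0641\u0633\u0632"
    "\u0698\u0634\u062E\u063A\u06BE\u0631\u0644\u0645\u0646\u06AD\u06CB\u064A\u0626"
    "\u0649\u06C8\u06C7\u06D0\u0648\u06D5\u06C6\u0627"
)


def decomposing_letters(letters: str) -> dict[str, str]:
    """Map each letter that changes under NFD to the code points it becomes."""
    result = {}
    for ch in letters:
        nfd = unicodedata.normalize("NFD", ch)
        if nfd != ch:
            result[f"U+{ord(ch):04X}"] = " ".join(f"U+{ord(c):04X}" for c in nfd)
    return result


print(decomposing_letters(UIGHUR_LETTERS))  # {'U+0626': 'U+064A U+0654'}

# NFC recomposes YEH + HAMZA ABOVE, so precomposed text survives a round trip.
print(unicodedata.normalize("NFC", "\u064A\u0654") == "\u0626")  # True

# WAW + DAMMA is NOT composed into U+06C7 LETTER U: the two spellings stay distinct.
print(unicodedata.normalize("NFC", "\u0648\u064F") == "\u06C7")  # False
```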

Numbers

This section describes typographic features related to digits, dates, currencies, etc.

Digits

See type samples.

Uighur uses ASCII digits.

Dates

Observation: Figure 1 shows a day-month format using a tatweel-like connector; however, the text doesn't connect to the horizontal line.

Day-month date forms using a low horizontal connector.

Text direction

Arabic script text is written horizontally and right-to-left in the main, but as with most RTL scripts, numbers and embedded LTR script text are written left-to-right (producing 'bidirectional' text).

 1899 - ئاسپىرىن (Aspirin) بازارغا سېلىندى.
Uighur words are read RTL, starting on the right, but numbers and Latin text (highlighted here) are read left-to-right.

The Unicode Bidirectional Algorithm automatically takes care of the ordering for all the text in Figure 2, as long as the 'base direction' (ie. the surrounding directional context) is set to right-to-left (RTL).

Characters are all stored in the order in which they are spoken (and typed). This so-called 'logical' order is then rendered as bidirectional flows by the application at run time, as the text is displayed or printed. The relative placement of characters within a single directional flow is based on the strong directional properties (RTL or LTR) assigned to each character by the Unicode Standard. There is, however, also a set of neutral directional property values, mostly for punctuation, and the placement of characters with those values depends on the base direction.
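The directional property of each character can be inspected with Python's `unicodedata.bidirectional`, which returns its Unicode Bidi_Class. The helper `direction_of` below is hypothetical and groups the classes into the broad kinds described above; Unicode also distinguishes weak classes, such as EN for European digits, which this sketch lumps together with the neutrals for simplicity.

```python
import unicodedata


def direction_of(ch: str) -> str:
    """Classify a character as strong RTL, strong LTR, or weak/neutral."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class in ("R", "AL"):
        return "strong RTL"
    if bidi_class == "L":
        return "strong LTR"
    return "weak/neutral"  # eg. EN (digits), CS (commas), ON (parentheses), WS (space)


sample = "\u0626a1( .\u060C"  # ئ, a, 1, (, space, full stop, Arabic comma
for ch in sample:
    print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch), direction_of(ch))
```

Uighur letters report AL (Arabic Letter), a strong right-to-left class, while the digit, parenthesis, space and both commas fall into weak or neutral classes whose placement depends on their context and on the base direction.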


If the base direction is not set appropriately, the directional runs will be ordered incorrectly as shown in Figure 3, making it very difficult to get the meaning.

شىنجاڭ ئۇيغۇر ئاپتونوم رايونى (خەنچە: 新疆维吾尔自治区) جۇڭخۇا خەلق جۇمھۇرىيىتى توپراقلىرى ئىچىدە يەر ئالغان
The exact same sequence of characters with the base direction set to RTL (top), and with no base direction set on this LTR page (bottom). The arrows show how items are relocated.

In some circumstances the Unicode Bidirectional Algorithm requires additional assistance to correctly render the directionality of bidirectional text. For such cases the Unicode Standard provides invisible formatting characters for use in plain text. See Managing text direction.

In HTML, the base direction and higher-level controls can be set using the dir attribute or the bdi element. CSS should not be used to control direction. Unicode formatting codes should also not be used where markup is available.

For more information about how directionality and base direction work, see Unicode Bidirectional Algorithm basics. For information about plain text formatting characters see How to use Unicode controls for bidi text. And for working with markup in HTML, see Creating HTML Pages in Arabic, Hebrew and Other Right-to-left Scripts.

For authoring HTML pages, one of the most important things to remember is to use <html dir="rtl" … > at the top of a right-to-left page, and then use the dir attribute or bdi tag for ranges within the page, but only when you need to change the base direction. Also, use markup to manage direction, and do not use CSS styling.

For other aspects of dealing with right-to-left writing systems see the following sections:

Managing text direction

Unicode provides a set of 10 formatting characters that can be used to control the direction of text when displayed. These characters have no visual form in the rendered text; however, text editing applications may have a way to show their location.

‫U+202B RIGHT-TO-LEFT EMBEDDING (RLE), ‪U+202A LEFT-TO-RIGHT EMBEDDING (LRE), and ‬U+202C POP DIRECTIONAL FORMATTING (PDF) are in widespread use to set the base direction of a range of characters. RLE/LRE comes at the start, and PDF at the end of a range of characters for which the base direction is to be set.

In Unicode 6.3, the Unicode Standard added a set of characters which do the same thing but also isolate the content from surrounding characters, in order to avoid spillover effects. They are U+2067 RIGHT-TO-LEFT ISOLATE (RLI), U+2066 LEFT-TO-RIGHT ISOLATE (LRI), and U+2069 POP DIRECTIONAL ISOLATE (PDI). The Unicode Standard recommends that these be used instead.

There is also U+2068 FIRST STRONG ISOLATE (FSI), which is used at the start of a range to set its base direction according to the first strongly-directional character found in it. Like RLI and LRI, it is closed by PDI.
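Where markup is not available, plain text can be isolated programmatically. This Python sketch uses a hypothetical helper, `isolate`, which wraps a string in RLI, LRI or FSI and always closes it with PDI, so every isolate initiator it emits is matched.

```python
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

_OPENERS = {"rtl": RLI, "ltr": LRI, "auto": FSI}


def isolate(text: str, direction: str = "auto") -> str:
    """Wrap text in a directional isolate; direction is 'rtl', 'ltr' or 'auto'."""
    try:
        opener = _OPENERS[direction]
    except KeyError:
        raise ValueError(f"unknown direction: {direction!r}") from None
    return f"{opener}{text}{PDI}"


# Embed a Uighur word in an English (LTR) sentence.
name = "\u062E\u0648\u062A\u06D5\u0646"  # خوتەن Khotan
sentence = "The Uighur spelling of Khotan is " + isolate(name, "rtl") + "."
print(repr(sentence))
```

Defaulting to FSI lets the content decide its own direction, which suits strings whose language is not known in advance; in HTML, the dir attribute or bdi element remains the better choice.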

U+061C LETTER MARK (ALM) is used to produce correct sequencing of numeric data. See also expressions for details.

‏U+200F RIGHT-TO-LEFT MARK (RLM) and ‎U+200E LEFT-TO-RIGHT MARK (LRM) are invisible characters with strong directional properties that are also sometimes used to produce the correct ordering of text.

For more information about how to use these formatting characters see How to use Unicode controls for bidi text. Note, however, that when writing HTML you should generally use markup rather than these control codes. For information about that, see Creating HTML Pages in Arabic, Hebrew and Other Right-to-left Scripts.

Glyph shaping & positioning

This section describes typographic features related to font/writing styles, cursive text, context-based shaping, context-based positioning, letterform slopes, weights & italics, and case & other character transforms.

You can experiment with examples using the Uighur character app.

Cursive script

Do letters in this script join with each other by default? Is the basic shape of a letter radically changed? Is it sometimes not cursive? Are there any special features to note? Are Unicode joiner and non-joiner characters needed to override default joining behaviours?

Arabic script joins letters together. This results in four different shapes for most letters (including an isolated shape).

تۇغۇلغان
The letter غU+063A LETTER GHAIN in 2 different joining contexts.

A few Arabic script letters only join on the right-hand side.

Context-based shaping & positioning

Are special glyph forms needed, depending on the context in which a character is used? Do glyphs interact in some circumstances? Are there requirements to position diacritics or other items specially, depending on context? Does the script have multiple diacritics competing for the same location relative to the base?

As in Arabic, lam followed by alef forms a ligature, eg. ئىسلام islam Islam
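The lam-alef ligature is a glyph produced by the font; the stored text keeps the two letters. The Python sketch below uses a hypothetical helper, `lam_alef_positions`, to find where a font would be expected to form the ligature. It also shows that the legacy presentation-form character U+FEFB, which should not be used in running text, reduces to the same two letters under NFKC. The sketch ignores intervening marks or ZWJ/ZWNJ, which can affect ligation.

```python
from __future__ import annotations
import unicodedata

LAM, ALEF = "\u0644", "\u0627"


def lam_alef_positions(word: str) -> list[int]:
    """Indexes of each LAM that is immediately followed by ALEF."""
    return [i for i in range(len(word) - 1) if word[i] == LAM and word[i + 1] == ALEF]


islam = "\u0626\u0649\u0633\u0644\u0627\u0645"  # ئىسلام
print(lam_alef_positions(islam))  # [3]

# Legacy presentation form: ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM
print(unicodedata.decomposition("\uFEFB"))                    # <isolated> 0644 0627
print(unicodedata.normalize("NFKC", "\uFEFB") == LAM + ALEF)  # True
```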

Letterform slopes, weights, & italics

See type samples.

Observation: The image in Figure 5 shows italicisation where the glyphs lean in the direction of text (ie. to the left).

In the italicised text of the heading the glyphs lean to the left.

Typographic units

Word boundaries

Are words separated by spaces, or other characters? Are there special requirements when double-clicking on the text? Are words hyphenated?

The concept of 'word' is difficult to define in any language (see What is a word?). Here, a word is a vaguely-defined, but recognisable semantic unit that is typically smaller than a phrase and may comprise one or more syllables.

Words are separated by spaces.

Word-level segmentation is used for line-breaking and basic justification.

Uighur has hyphenated words, eg.

ئىززەت-ھۆرمەت izzet-hörmet dignity

Graphemes

Uighur principally uses word boundaries for line-breaking and basic justification, but uses grapheme boundaries for other operations that work at the sub-word level.

Phrase, sentence, and section delimiters are described in Phrase & section boundaries.

Grapheme clusters

A grapheme is a user-perceived unit of text. Text operations that use graphemes as a unit of text include line-breaking, forwards deletion, cursor movement & selection, character counts, text spacing, text insertion, justification, case conversions, and sorting. The Unicode Standard uses generalised rules to define 'grapheme clusters', which approximate the likely grapheme boundaries in a writing system; however, they don't work well with many complex scripts.

Base (Combining_mark)*

In Uighur, segmentation can be realised using Unicode grapheme clusters. A typographic unit is almost always equivalent to a letter, since precomposed code points are available for all letter and diacritic combinations. Only one letter, ئU+0626 LETTER YEH WITH HAMZA ABOVE, decomposes; in that case, the typographic unit includes both the base letter and the combining mark.

Examples:

ئاچقۇچ achquch key
ئۆيمۇئۆي öymu'öy from door to door

This kind of typographic unit can be used for forwards deletion, cursor movement & selection, character counts, text spacing, and text insertion.
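To make the pattern Base (Combining_mark)* concrete, here is a minimal Python sketch. The helper `simple_clusters` is hypothetical: it treats any character with general category Mn as a combining mark, which is a simplification and not the full extended grapheme cluster algorithm of UAX #29. It shows that ئاچقۇچ has 6 code points when precomposed and 7 after NFD normalization (which splits ئ into U+064A YEH and the combining U+0654 HAMZA ABOVE), yet both spellings yield the same 6 typographic units.

```python
import unicodedata


def simple_clusters(text: str) -> list[str]:
    """Group each base character with any following nonspacing marks.

    Simplified sketch of the pattern Base (Combining_mark)*; this is NOT the
    full UAX #29 extended grapheme cluster algorithm.
    """
    clusters: list[str] = []
    for ch in text:
        # General category "Mn" = nonspacing mark, e.g. U+0654 ARABIC HAMZA ABOVE
        if clusters and unicodedata.category(ch) == "Mn":
            clusters[-1] += ch  # attach the mark to the preceding base
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


# ئاچقۇچ (achquch, "key"), written with escapes to avoid bidi confusion
word = "\u0626\u0627\u0686\u0642\u06C7\u0686"
decomposed = unicodedata.normalize("NFD", word)  # U+0626 -> U+064A U+0654

print(len(word), len(decomposed))            # 6 7
print(len(simple_clusters(word)))            # 6
print(len(simple_clusters(decomposed)))      # 6
print([f"U+{ord(c):04X}" for c in simple_clusters(decomposed)[0]])
# ['U+064A', 'U+0654']
```

Because the cluster count is the same for both normalization forms, cursor movement or character counting built on such units behaves identically whether or not the text has been decomposed.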

Punctuation & inline features

This section describes typographic features related to word boundaries, phrase & section boundaries, bracketed text, quotations & citations, emphasis, abbreviation, ellipsis & repetition, inline notes & annotations, other punctuation, and other inline text decoration.

Phrase & section boundaries

What characters are used to indicate the boundaries of phrases, sentences, and sections?

See type samples.


Characters (6): ! U+0021 • ، U+060C • : U+003A • ؛ U+061B • . U+002E • ؟ U+061F

Uighur uses a mixture of ASCII and Arabic punctuation.

phrase

،U+060C COMMA

؛U+061B SEMICOLON

:U+003A COLON

sentence

.U+002E FULL STOP

؟U+061F QUESTION MARK

!U+0021 EXCLAMATION MARK
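The sentence terminators above can be used for a rough sentence split. The following Python sketch assumes that a terminator is followed by whitespace or the end of the text; `split_sentences` is a hypothetical helper, and it ignores abbreviations, decimal numbers, and quoted speech. Note that the phrase-level ، and ؛ are deliberately not treated as sentence ends.

```python
import re

# Sentence terminators listed above: FULL STOP, ARABIC QUESTION MARK, EXCLAMATION MARK
SENTENCE_END = "\u002E\u061F\u0021"


def split_sentences(text: str) -> list[str]:
    """Naively split text after a sentence terminator followed by whitespace.

    Sketch only: assumes terminators are followed by whitespace or end of text.
    """
    # Lookbehind keeps the terminator attached to its sentence
    pattern = rf"(?<=[{re.escape(SENTENCE_END)}])\s+"
    return [s for s in re.split(pattern, text.strip()) if s]


print(split_sentences("ئاچقۇچ؟ ئىسلام. ئىززەت!"))
```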

Observation: The comma can appear immediately after the previous word but, as shown in the figure below, it may also be surrounded by spaces.


Commas (in different documents) without (top) and with (bottom) leading space.

Bracketed text

See type samples.


Characters: ( U+0028 • ) U+0029

Uighur commonly uses ASCII parentheses to insert parenthetical information into text.

Standard parentheses:
start: (U+0028 LEFT PARENTHESIS
end: )U+0029 RIGHT PARENTHESIS


Mirrored characters

The words 'left' and 'right' in the Unicode names for parentheses, brackets, and other paired characters should be ignored. LEFT should be read as if it said START, and RIGHT as END. The direction in which the glyphs point will be automatically determined according to the base direction of the text.

a > b > c
ا > ب > ج
Both of these lines use > U+003E GREATER-THAN SIGN, but the direction it faces depends on the base direction at the point of display.

The number of characters that are mirrored in this way is around 550, most of which are mathematical symbols. Some are single characters, rather than pairs. The following are some of the more common ones.


Common mirrored characters (12): ( U+0028 • ) U+0029 • < U+003C • > U+003E • [ U+005B • ] U+005D • { U+007B • } U+007D • « U+00AB • » U+00BB • ‹ U+2039 • › U+203A
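Python's standard `unicodedata` module exposes the Bidi_Mirrored property through `unicodedata.mirrored()`, which can be used to confirm that the characters listed above are mirrored. This minimal sketch only reads the property: the actual glyph substitution is done by the text renderer when it applies the Unicode Bidirectional Algorithm. The total it counts depends on the Unicode version bundled with the Python build, so it may not match the figure of around 550 exactly.

```python
import unicodedata

# Code points from the list above
COMMON_MIRRORED = [0x0028, 0x0029, 0x003C, 0x003E, 0x005B, 0x005D,
                   0x007B, 0x007D, 0x00AB, 0x00BB, 0x2039, 0x203A]

for cp in COMMON_MIRRORED:
    ch = chr(cp)
    # mirrored() returns 1 if the character has Bidi_Mirrored=Yes, else 0
    print(f"U+{cp:04X} {unicodedata.name(ch)}: mirrored={unicodedata.mirrored(ch)}")

# Count every code point with Bidi_Mirrored=Yes in this Python's Unicode data
total = sum(1 for cp in range(0x110000) if unicodedata.mirrored(chr(cp)))
print("total mirrored:", total)
```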

Quotations & citations

What characters are used to indicate quotations? Do quotations within quotations use different characters? What characters are used to indicate dialogue? Are the same mechanisms used to cite words, or for scare quotes, etc? What about citing book or article names?

See type samples.


Characters: « U+00AB • » U+00BB

The following quotation marks can be found in Uighur texts. (Depending on ease of input, quotations may alternatively be surrounded by ASCII double and single quote marks.)

Primary quotation marks:
start: « [U+00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK]
end: » [U+00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK]

Because they are mirrored, when using these quotation marks, LEFT should be read as if it said START, and RIGHT as END.
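Where text was typed with ASCII double quotes, a simple conversion to the primary quotation marks is possible. This Python sketch uses a hypothetical function, `to_guillemets`, and assumes the quotes are properly paired and not nested. It stores « as the opening mark in memory and leaves display-time mirroring to the bidi algorithm. ASCII single quotes are left alone, because ' can also serve as an apostrophe (as in the Latin transcription öymu'öy).

```python
import re

# A straight double-quoted span that contains no further double quotes
_ASCII_DOUBLE = re.compile(r'"([^"]*)"')


def to_guillemets(text: str) -> str:
    """Replace paired ASCII double quotes with « (U+00AB) ... » (U+00BB).

    « is always the START mark in memory; renderers mirror the glyphs
    in right-to-left text.
    """
    return _ASCII_DOUBLE.sub("\u00AB\\1\u00BB", text)


print(to_guillemets('"ئاچقۇچ"'))
```

Unpaired quotes are left unchanged, so any leftover ASCII quotes after conversion can point to text that needs manual checking.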

Observation: Figure 8 appears to show double angle brackets being used as a quotation mark.

Quotation marks (?) using double angle brackets.

Observation: Figure 9 shows double angle brackets being used to cite lists of characters.

Examples, bracketed with double angle brackets.

Line & paragraph layout

This section describes typographic features related to line breaking & hyphenation, text alignment & justification, text spacing, baselines, line height, counters, lists, and styling initials.

Line breaking & hyphenation

Are there special rules about the way text wraps when it hits the end of a line? Does line-breaking wrap whole 'words' at a time, or characters, or something else (such as syllables in Tibetan and Javanese)? What characters should not appear at the end or start of a line, and what should be done to prevent that? Is hyphenation used, or something else? What rules are used? What difficulties exist?

See type samples.

Common practice is to break the sentence at any point when it reaches the end of a line.

In-word line-breaking

Uighur text can be hyphenated at the end of a line (see Figure 10).

Examples of line-end hyphenation in Uighur.

The glyphs before the hyphen and at the start of the next line are joined forms.

The hyphen sits on the baseline and looks like a tatweel. A very small gap appears between the hyphen and the last letter of the word at the end of the line.

Observation: The actual 'hyphen' looks like ـ [U+0640 ARABIC TATWEEL]. That would produce the expected joining form at the end of the line, although some additional mechanism would be needed to produce the form at the start of the next line. However, scans of various documents show a very small gap between the horizontal line and the last joining form at the end of the line, as can be seen in Figure 10, which would negate the joining produced by a tatweel.

Line-edge rules

As in almost all writing systems, certain punctuation characters should not appear at the end or the start of a line. The Unicode line-break properties help applications decide whether a character should appear at the start or end of a line.


The following list gives examples of typical behaviours for characters affected by these rules. Context may affect the behaviour of some of these and other characters.

  • « “ ‘ (   should not be the last character on a line
  • » ” ’ ) . ، ؛ ؟ !   should not begin a new line
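The two lists above can be turned into a simple check on already-broken lines. The following Python sketch hard-codes exactly those characters in the hypothetical sets `NO_LINE_END` and `NO_LINE_START`; a real line breaker would instead use the Unicode line-break properties (UAX #14) and take context into account, as noted above.

```python
# Characters from the lists above, written as escapes to avoid bidi confusion
NO_LINE_END = set("\u00AB\u201C\u2018(")                    # « “ ‘ (
NO_LINE_START = set("\u00BB\u201D\u2019).\u060C\u061B\u061F!")  # » ” ’ ) . ، ؛ ؟ !


def bad_breaks(lines: list[str]) -> list[int]:
    """Return each index i where the break between lines[i] and lines[i+1]
    leaves a forbidden character at a line edge."""
    bad = []
    for i in range(len(lines) - 1):
        end = lines[i].rstrip()
        start = lines[i + 1].lstrip()
        if (end and end[-1] in NO_LINE_END) or (start and start[0] in NO_LINE_START):
            bad.append(i)
    return bad


print(bad_breaks(["ئاچقۇچ", "، ئىسلام"]))
```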

Breaking between Latin words

When a line break occurs in the middle of an embedded left-to-right sequence, the items in that sequence need to be rearranged visually so that it isn't necessary to read lines from bottom to top.

Figure 11 shows how two Latin words are apparently reordered in the flow of text to accommodate this rule. Of course, the rearrangement affects only the visual glyphs: the order of the characters in memory is unchanged.

Text with no line break in Latin text.

Text with line break in Latin text.

In this Arabic language text, the lower of these two images shows the result of decreasing the line width, so that text wraps between a sequence of Latin words.

Text alignment & justification

Does text in a paragraph need to have flush lines down both sides? Does the script allow punctuation to hang outside the text box at the start or end of a line? Where adjustments are needed to make a line flush, how is that done? Does the script shrink/stretch space between words and/or letters? Are word baselines stretched, as in Arabic? What about paragraph indents?

See type samples.

Baseline lengthening is used to justify lines of text.

Figure 12 shows that baseline lengthening and hyphenation can both be used, and sometimes within the same word.

Kashida baseline lengthening and hyphenation used in the same word (2nd line down).

Baselines, line height, etc.

Does the script have special requirements for baseline alignment between mixed scripts and in general? Is line height special for this script? Are there other aspects that affect line spacing, or positioning of items vertically within a line?

Uighur uses the so-called 'alphabetic' baseline, which is the same as for Latin and many other scripts.

Font baselines should match the alphabetic baseline of Latin script text, and Arabic Uighur fonts should have relative sizes that match. However, Uighur also needs to look right alongside Chinese text, which has a slightly lower baseline and generally larger characters than Latin.

Uighur letters carry dots and other marks above and below the base shapes (for example, the marks on the vowel letters ۆ, ۈ, and ې). Several glyphs (especially in independent or final forms) also have long descenders or ascenders.

To give an approximate idea, Figure 13 compares Latin and Uighur glyphs from Noto fonts. The basic part of most Uighur letters is less than the Latin x-height; however, extenders and marks reach up to, and sometimes beyond, the Latin ascenders and descenders. That said, Noto fonts are relatively conservative in terms of glyph heights.

Hhqxغ‌گ‌جئائۆئې‌لخئۈ百万 Hhqxغ‌گ‌جئائۆئې‌لخئۈ百万
Font metrics for Latin text compared with Uighur glyphs in the Noto Naskh Arabic (top) and Noto Sans Arabic (bottom) fonts.

Figure 14 shows similar comparisons for the Scheherazade New and Microsoft Uighur fonts.

Hhqxغ‌گ‌جئائۆئې‌لخئۈ Hhqxغ‌گ‌جئائۆئې‌لخئۈ
Latin font metrics compared with Uighur glyphs in the Scheherazade New (top) and Microsoft Uighur (bottom) fonts.

Page & book layout

This section describes typographic features related to general page layout & progression, grids & tables, notes & footnotes, forms & user interaction, and page numbering & running headers.

Online resources

  1. Uyghur Wikipedia   

References & sources

1. Wikipedia, Uyghur language
2. Wikipedia, Uyghur phonology
3. Wikipedia, Quotation mark
4. Yannis Haralambous, Breaking Arabic: the creative inventiveness of Uyghur script reforms

Licence CC-By © r12a.