XXXX

XXXX script orthography notes

Updated 13 July, 2024

This page brings together basic information about the XXXX script and its use for the XXXX language. It aims to provide a brief, descriptive summary of the modern, printed orthography and typographic features, and to advise how to write XXXX using Unicode.

Referencing this document

Richard Ishida, Bangla (Bengali) Orthography Notes, 13-Jul-2024, https://r12a.github.io/scripts/beng/bn

Howtos

Character panels

A typical character panel looks like:

𖼁␣𖼋␣𖼏␣𖼟␣𖼢␣𖼸␣𖼯␣𖼛␣𖼅␣𖼑␣𖼕

The source text is:

<figure class="characterBox auto" data-cols="ipa,trans,transc">𖼁␣𖼋␣𖼏␣𖼟␣𖼢␣𖼸␣𖼯␣𖼛␣𖼅␣𖼑␣𖼕</figure>

Class: characterBox produces a yellow background. Other coloured backgrounds are only used occasionally; they include auxiliaryBox (mauve), etc.

class:small renders the character at 80% size. Good for long lists of conjuncts, list counters, etc.

class:hideCh renders the character at 20% size with a border. Good for invisible characters that are represented by images (see data-notes).

class:noexpansion prevents the production of the downwards arrow that shows details for all characters.

class:nolist prevents production of the arrow that lists all characters.

class:ipaplus adds the contents of the ipa+ column in the spreadsheet (usually inherent vowel).

data-ignore a (single) character that should be ignored when listing the hex code point values; typically used for dotted circles.

data-ipa value is a comma-separated list of ipa transcriptions that will be aligned with the items displayed. You should normally remove 'ipa' from the data-cols field.

data-translit tbd.

data-links associates a link with each item.

data-extra tbd.

data-notes a comma-separated list of texts that are displayed below the main item. This is particularly useful for invisible characters because it can be used to pull in images, eg.
data-notes="<img src='../../c/Miao/large/16F8F.png' alt='ZWNJ' style='vertical-align:middle; height:2rem'>, ..."

data-last indicates that the hex code point value refers to the last item; used mostly for bicameral listings.

Codepoints

A typical reference to a code point or sequence of code points looks like the following 4 examples. The page converts simple markup to more complicated markup during rendering.

𞋀

𞋖𞋭

𞋮

200B

Basic markup for these is, respectively:

<span class="ch">𞋀</span> or <span class="hx">1E2C0</span>

<span class="ch">𞋖𞋭</span> or <span class="hx">1E2D6 1E2ED</span>

<span class="ch">𞋮</span> or <span class="hx">1E2EE</span>

<span class="hx img">200B</span>

The generated markup will look something like this:

<span class="codepoint" translate="no"><bdi lang="th">𞋀</bdi><a href="javascript:void(0)" target=""><span class="uname">U+1E2C0 WANCHO LETTER AA</span></span>

Each orthography notes page globally declares a language tag to be used with example text. The markup automatically adds a lang attribute for that language to the generated markup. If you want a different lang attribute to be applied, simply add lang="xx" to the span tag (where xx is the lang value).

Every span tag must contain one of the following classes:

ch The span element content contains one or more characters (with no separators).

hx The span element content contains one or more hex code point values, separated by a space.

Optional, additional class names are as follows. Most of these can be combined with others.

img A PNG image will be used to show the character.

svg An SVG image will be used to show the character.

noname Show the character but suppress the name & codepoint value.

split Renders a sequence as 𞋖𞋭, rather than 𞋖𞋭.

circle Adds a dotted circle before the listed character(s) on display.

coda Adds a dotted circle after the listed character(s) on display. This is useful for showing character sequences from scripts like Thai, where the pronunciation is different in syllables with and without a coda.

init Adds ZWJ after the character(s), when displayed.

medi Adds ZWJ before and after the character(s), when displayed.

fina Adds ZWJ before the character(s), when displayed.

skip Adds ZWJ before the second character(s), when displayed. This is useful for cases where, for a cursive script, you want to show a combining mark followed by a joining letter.

noindex Prevents this item being picked up for inclusion in the index.

References

References can be added to the markup using <tt>...</tt>.

To indicate a source link that is listed in the refs.js file, use <tt> with the key for that resource. After a comma you can add a number for a page, or '#fragid', eg.<tt>ul,22</tt>

To indicate a source link that is NOT listed in the refs.js file, use <tt> containing '@shortname,url', eg. <tt>@Github,https://github.com/w3c/sealreq/issues/45</tt>

If this is something the reader should also look at, rather than just a source link, add class="more" to the tt element.

On devices that don't have hover any page number will be shown alongside the reference number.

Sample

Select part of this sample text to show a list of characters, with links to more details.
Change size:   CHANGE_ME

Sample_text_goes_here

Source: Universal Declaration of Human Rights - Chakma, articles 1 & 2

Usage & history

Origins of the Oriya script, 1051 – today.

Phoenician

└ Aramaic

└ Brahmi

└ Tamil-Brahmi

└ Pallava

└ Grantha

└ Malayalam

+ Tigalari

+ Dhives Akuru

+ Saurashtra

Speakers of the Eastern Cham language number about 132,000 in Bình Thuận, Ninh Thuận, and Đồng Nai provinces in southern Vietnam, as well as in Hồ Chí Minh City. The Ethnologue estimates the L1 literacy rate to be 5%–10%. The Cham script is the primary orthography for the Eastern Cham, but the largely muslim Western Cham peoples of Cambodia prefer the Arabic script.eth Historically, the Eastern Cham script was learned by boys once they reached a certain age, but not by women and girls.ws

ꨀꨇꩉ ꨌꩌ

Coming to Southeast Asia with the expansion of Indian religions, Cham was one of the first scripts to develop from the Pallava script some time around 200 CE.ws

More information: WikipediaEndangered Alphabets

Basic features

The XXXX script is an abugida, ie. each consonant contains an inherent vowel sound. See the table to the right for a brief overview of features for the modern ZZZZ orthography.

The XXXX script is an alphabet, ie. consonants and vowels are written separately. See the table to the right for a brief overview of features for the ZZZZ language.

The XXXX script is an abjad, ie. short vowels are not normally written. See the table to the right for a brief overview of features for the ZZZZ language.

The ZZZZ XXXX orthography is derived from the Arabic/Persian abjads, where in normal use the script represents only consonant and long vowel sounds. However, the script has been adapted in this orthography in order to cope with the many more vowels sounds in Kashmiri, and this is one of the Arabic orthographies that regularly indicates all vowel sounds, making it an alphabet. See the table to the right for a brief overview of features for the modern Kashmiri orthography using the Arabic script.

XXXX text runs left-to-right in horizontal lines, but numbers and embedded Latin text are read left-to-right. There is no case distinction. Words are separated by spaces.

❯ consonantSummary

Modern XXXX represents native consonant sounds using 19 basic letters and 6 aspirated digraphs, but can use 13 more consonants to spell words loaned from Persian, Arabic and Urdu.

A special letter is used to indicate palatalisation, which is common in Kashmiri. Similarly to the other yeh used in Kashmiri, it has a circle below when used in syllable onsets, and a swash with no circle after a syllable coda.

Unlike other Arabic orthographies, 0652 (jazm), normally used to show vowel absence, is placed over the second consonant in an onset cluster (such as tr). That letter may therefore carry both the jazm diacritic and a vowel diacritic, which is quite unusual.

❯ basicV

Kashmiri is an alphabet where 16 vowel sounds (far more than in Arabic or Persian) are written using a mixture of 10 combining marks and 10 letters. Unlike Arabic, Persian, and Urdu, all vowel diacritics are always visible in Kashmiri texts. Representation of vowel sounds is complicated because the code points used to represent a given vowel typically differ according to its position within a word.

The distinction between ijam vs. tashkil has a bearing on several Kashmiri graphemes, and the choice between precomposed and decomposed realisations of a vowel letter can be complicated (see encoding).

Word-initial standalone vowels are preceded by or attached to either 0627 or 0639.

The jazm is used over a word-medial 0646 to indicate nasalisation of a preceding vowel sound. Apart from this use and that for medial consonants, it is not typically used to indicate vowel absence.

Kashmiri uses native digits, and Arabic code points for several of the more common punctuation marks.

Character index

Letters

Show

Basic consonants

xxxxx

Extended consonants

xxxxx

Vowels

xxxxx

Vocalics

xxx

Other

xxxx

Not used for XXXXX

xxxxx

Combining marks

Show

Vowels

xxxxx

Vocalics

xxx

Tones

xxx

Medials

xxx

Finals

xxx

Other

xxxxx

Not used for XXXXX

xxxxx

Numbers

Show
०␣१␣२␣३␣४␣५␣६␣७␣८␣९

Punctuation

Show
xxxxx

ASCII

!␣(␣)␣,␣-␣.␣:␣;␣?

CLDR additions

xxxx

Not used for XXXXX

xxx

Symbols

Show
xxx

Not used for XXXXX

xxx

Other

Show
‌␣‍␣⁧␣‫␣⁦␣‪␣⁨␣⁩␣‬␣‏␣‎␣؜␣͏

To be investigated

!␣?
Items to show in lists

Phonology

The following represents the repertoire of the ZZZZ language.

Click on the sounds to reveal locations in this document where they are mentioned.

Phones in a lighter colour are non-native or allophones. Source Wikipedia.

Vowel sounds

Plain vowels

ĩ i y y ɨ ɨ ʉ ʉ ɯ ɯ u ũ ɪ ʏ ʊ e ø ø ɘ ɘ ɵ ɵ ɤ ɤ o õ ə ə̃ ə əː ɛ̃ ɛː ɛ œ œ ɜ ɜ ɞ ɞ ʌ ʌ ɔ ɔː ɔ̃ æ ɐ ɐ ã a ɶ ɶ a a ɑː ɑ ɑː ɑ ɒ

Complex vowels

ia iɯʔ ua
ea oa
ɛə ɔə
auʔ

Consonant sounds

labial labio-
dental
dental alveolar post-
alveolar
retroflex palatal velar uvular pharyngeal
epiglottal
glottal
stop p b     t d   ʈ ɖ c ɟ k ɡ q ɢ ʡ ʔ
        ʈʰ ɖʰ ɟʰ ɡʰ      
  ɓ     ɗ     ʄ ɠ ʛ    
  ᵐp ᵐb     ⁿt ⁿd   ᶯʈ ᶯɖ ᶮcɟ ᵑk ᵑɡ ᶰq ᶰɢ    
affricate       t͡s d͡z t͡ʃ d͡ʒ ʈ͡ʂ ɖ͡ʐ t͡ɕ d͡ʑ        
        t͡sʰ d͡zʰ t͡ʃʰ d͡ʒʰ ʈ͡ʂʰ ɖ͡ʐʰ t͡ɕʰ d͡ʑʰ        
fricative ɸ β f v θ ð s z ʃ ʒ ʂ ʐ ç ʝ x ɣ χ ʁ ħ ʕ h ɦ
        ɬ ɮ ɧ   ɕ ʑ        
nasal m ɱ   n   ɳ ɲ ŋ ɴ  
approximant ʍ w ʋ   l ɫ ɹ   ɻ ɭ j ʎ ɰ ʟ    
trill/flap ʙ   r ɾ ɺ   ɽ ʀ ʜ ʢ
  

Consonant sounds OLD

labial dental alveolar post-
alveolar
retroflex palatal velar uvular pharyngeal epiglottal glottal
stop p b
t d
    ʈ ɖ
ʈʰ ɖʰ
c ɟ k ɡ
ɡʰ
q ɢ   ʡ ʔ
affricate   t͡s d͡z   t͡ʃ d͡ʒ
t͡ʃʰ d͡ʒʰ
t͡ɕ d͡ʑ
t͡ɕʰ d͡ʑʰ
             
fricative f v θ ð s z
ɬ ɮ
ʃ ʒ
ɕ ʑ
ɧ
ʂ ʐ ç ʝ x ɣ χ ʁ ħ ʕ ʜ ʢ h ɦ
nasal m   n   ɳ ɲ ŋ ɴ    
approximant ʋ w   l ɫ ɹ   ɻ ɭ j ʎ ɰ
ʟ
     
trill/flap     r ɾ ɺ   ɽ ʀ

Tone

Thai is a contour tone language, with 5 tones: high, mid, low, falling, & rising.

tbd

Structure

tbd

Alphabet

Click on the characters to find where they are mentioned in this page.

The ZZZZ alphabet has 23 consonants and 5 vowels. Each has upper and lowercase forms; shown above and below, respectively.

𞤢␣𞤣␣𞤤␣𞤥␣𞤦␣𞤧␣𞤨␣𞤩␣𞤪␣𞤫␣𞤬␣𞤭␣𞤮␣𞤯␣𞤰␣𞤱␣𞤲␣𞤳␣𞤴␣𞤵␣𞤶␣𞤷␣𞤸␣𞤹␣𞤺␣𞤻␣𞤼␣𞤽
𞤀␣𞤁␣𞤂␣𞤃␣𞤄␣𞤅␣𞤆␣𞤇␣𞤈␣𞤉␣𞤊␣𞤋␣𞤌␣𞤍␣𞤎␣𞤏␣𞤐␣𞤑␣𞤒␣𞤓␣𞤔␣𞤕␣𞤖␣𞤗␣𞤘␣𞤙␣𞤚␣𞤛

Syllables

Vowels

Vowel summary table

The following table summarises the main vowel to character assigments.

ⓘ represents the inherent vowel. Diacritics are added to the vowels to indicate nasalisation (not shown here).

  post-consonant word-final standalone word-medial standalone word-initial standalone
Simple:
 
Vocalics:

For additional details see vowel_mappings.

Inherent vowel

ka U+1A20 LETTER HIGH KA

a following a consonant is not written, but is seen as an inherent part of the consonant letter, so ka is written by simply using the consonant letter.

Post-consonant vowels

Combining marks used for vowels

ᬓᬶ ki U+1B13 BALINESE LETTER KA + U+1B36 BALINESE VOWEL SIGN ULU

Balinese uses the following dedicated combining marks for vowels. They may be used on their own, or in combination with others (see composite_vowels). They are all vowel signs.

xxxxxxx

To represent the sounds or , Balinese uses vocalic letters. A sequence such as *ᬭᭂ [U+1B2D BALINESE LETTER RA + U+1B42 BALINESE VOWEL SIGN PEPET] is not used. See vocalics.

Six of the vowel signs are spacing marks, meaning that they consume horizontal space when added to a base consonant.

All vowel signs are typed and stored after the base consonant, and the glyph rendering system takes care of the positioning at display time. The glyphs used to represent vowels, whether alone or in multipart vowels, are arranged around a syllable onset, which may be 2 consonants, rather than just around the immediately preceding consonant. See prebase and circumgraphs.

Vowel letters

something

Dedicated vowel letters

These are the dedicated vowel letters.

xxxxxx

Consonants used for vowels

Kashmiri uses the following consonant characters to write vowels, generally in combination with diacritics, but also alone when representing eː and oː in non-initial positions.

xxxxxx

Multipart vowels

Multipart vowels are only produced when text is decomposed; 5 of the circumgraphs split off the 1B35 glyph, to create the following pairs:

ᭀ␣ᭃ␣ᭁ␣ᬻ␣ᬽ

Pre-base vowel signs

𑐶

The short i sound is written using 𑐶 [U+11436 NEWA VOWEL SIGN I], which appears to the left of the base consonant letter or cluster.

ꤱꥉ
A pre-base vowel sign (highlighted). It is positioned to the left of the consonant after which it is typed, stored, and pronounced.
show composition

ꤷꥉꤼꥉꤿ꥓

This combining mark is always typed and stored after the base consonant. The rendering process places the glyph before the base consonant at the time of display.

When an orthographic syllable begins with a consonant cluster that is rendered as a conjunct, the vowel sign is rendered before the start of the syllable, eg. here are 3 sets of consonant clusters, each followed by i when spoken, but the vowel sign appears to the left of each cluster.𑐗𑑂𑐏𑐶 𑐳𑑂𑐟𑐶 𑐧𑑂𑐬𑐶 jkhi sti bri

When an orthographic syllable begins with a consonant cluster, the vowel sign is actually placed before the start of the onset, ie. to the left of the syllable as a whole. See fig_prebase.

ꨚꨴꨯꨱꩃ
A pre-base vowel glyph (light highlight colour) placed before both consonants in an onset cluster, rather than just before the consonant pronounced immediately before it.
show composition

ꨚꨴꨯꨱꩃ

However, note that if the cluster is split by a visible virama, this creates two orthographic syllables and the pre-base vowel sign appears after the consonant with the virama.

Circumgraphs

ᬓᭀ ko U+1B13 BALINESE LETTER KA + U+1B40 BALINESE VOWEL SIGN TALING TEDUNG

Five vowel signs are usually produced by a single combining character with visually separate parts, that appear on different sides of the consonant onset.

ᭀ␣ᭃ␣ᭁ␣ᬻ␣ᬼ␣ᬽ

This section includes some vowel signs described in the section vocalics.

Like pre-base glyphs, these are combining marks that are always stored after the base consonant. When rendered, the single code point produces multiple glyphs, which are placed on different sides of the base consonant. Click on fig_circumgraphs to see the sequence of characters in storage.

ꤷꥇꥆꥋꥆ
A circumgraph. The vowel stored after the consonant as a single character in memory is rendered with separate glyphs on 2 sides of the base.
show composition

ꤷꥇꥆꥋꥆ

Glyphs can appear on up to 3 sides of the base. Some of the glyphs merge with the base character's glyph (see context).

These circumgraphs have canonically equivalent decomposed forms (see vs_encoding).

Vowel length

tbd

Nasalisation

tbd

Multipart vowels

Multipart vowels are only produced when text is decomposed; 5 of the circumgraphs split off the 1B35 glyph, to create the following pairs:

ᭀ␣ᭃ␣ᭁ␣ᬻ␣ᬽ

Standalone vowels

ᝉ␣ᝊ␣ᝆ␣ᝇ␣ᝃ␣ᝄ␣ ␣ᝐ␣ᝑ␣ ␣ᝋ␣ᝈ␣ᝅ␣ ␣ᝏ␣ᝍ␣ᝎ␣ᝌ

Standalone vowels are written using the normal vowel signs with no special additional mechanisms.

𞋀𞋊𞋞

Tones

tbd

Vowel absence

tbd

Vowel sounds to characters

This section maps XXXX vowel sounds to common graphemes in the XXXX orthography.

The left column shows dependent vowels, and the right column independent vowel letters.

Click on a grapheme to find other mentions on this page (links appear at the bottom of the page). Click on the character name to see examples and for detailed descriptions of the character(s) shown.

Plain vowels

 
 
Open syllable
Closed syllable
Independent vowel
i
 

1752

ᝐᝒᝆ

1741

ᝁᝇᝓ

u
 

1753

ᝎᝓᝆᝓ

1742

ᝂᝇᝓ

Complex vowels

i
 

1752

ᝐᝒᝆ

1741

ᝁᝇᝓ

u
 

1753

ᝎᝓᝆᝓ

1742

ᝂᝇᝓ

Vocalics

ᬋ␣ᬌ␣ᬍ␣ᬎ␣ᬺ␣ᬻ␣ᬼ␣ᬽ

description

Consonants

Consonant summary table

The following table summarises the main consonant to character assigments.

The left column is lowercase, and the right uppercase.

  post-consonant word-final standalone word-medial standalone word-initial standalone
Onsets
Finals:

For additional details see consonant_mappings.

Basic consonants

Whereas the table just above takes you from sounds to letters, the following simply lists the basic consonant letters (however, since the orthography is highly phonetic there is little difference in ordering).

ᝉ␣ᝊ␣ᝆ␣ᝇ␣ᝃ␣ᝄ␣ ␣ᝐ␣ᝑ␣ ␣ᝋ␣ᝈ␣ᝅ␣ ␣ᝏ␣ᝍ␣ᝎ␣ᝌ

Repertoire extension

tbd

Onsets

tbd

Finals

tbd

Consonant clusters

The absence of a vowel sound between two or more consonants is visually indicated in one of the following ways.

  1. Create a conjunct. There are a number of possibilities here:
    1. Stacking : Reduce a non-initial consonant in size and shape and position it below the first.
    2. Conjoining : The two consonants sit side by side, but the second consonant has a special shape.
    3. Ligation : Create a fusion of the letter shapes, where it may be difficult to identify one or more of the components.
    4. The letter ra has its own idiosyncratic way of combining with other consonants, whether it precedes or follows them.
  2. Show a visible virama below the non-final consonants in the cluster.
  3. Use the anusvara .

In Unicode, the conjunct formation is achieved by adding [U+0B4D ORIYA SIGN VIRAMA] between the consonants. The font hides the virama glyph automatically when a conjunct is formed.

See also finals and gemination.

Stacking

tbd

Initial RA in clusters

tbd

Ligated forms

tbd

Conjoined consonants or a visible halanta

tbd

Triple-consonant clusters

tbd

Visible vowel-killer

tbd

Consonant length

tbd

Consonant sounds to characters

This section maps XXXXX consonant sounds to common graphemes in the XXXXXX orthography. Sounds listed as 'infrequent' are allophones, or sounds used for foreign words, etc.

The right-hand column shows xxxx

Click on a grapheme to find other mentions on this page (links appear at the bottom of the page). Click on the character name to see examples and for detailed descriptions of the character(s) shown.

 
 
 
Subjoined form
Joining forms
p
 

067E

xxx

067E067E067E

Symbols

tbd

Other features

XXXX

tbd

XXXXX

tbd

Encoding choices

This section offers advice about characters or character sequences to avoid, and what to use instead. It takes into account the relevance of Unicode Normalisation Form D (NFD) and Unicode Normalisation Form C (NFC)..

Although usage is recommended here, content authors may well be unaware of such recommendations. Therefore, applications should look out for the non-recommended approach and treat it the same as the recommended approach wherever possible.

Canonically equivalent encodings

One letter only can be represented as an atomic character (the norm), or as a sequence of base letter plus combining mark. The parts are separated in Unicode Normalisation Form D (NFD), and recomposed in Unicode Normalisation Form C (NFC), so both approaches should be treated as canonically equivalent.

Atomic (recommended) Decomposed ( NOT recommended )
ێ 064A 0654

Note that the base character in the decomposed sequence is 064A, and not 06CC, which is used elsewhere for yeh in Kurmanji. 064A is only used for this specific decomposed sequence; it is inappropriate to use it elsewhere in Kurmanji text.

Unresolved encodings

A couple of Sorani letters are encoded in different ways in different texts, and in fact can be mixed within the same text. At the moment there doesn't appear to be a clear ruling on which is expected. This section lists the alternatives.

Alternative 1 Alternative 2 Notes
06BE 0647 Wikipedia and Wiktionary represent the sound h using only HEH DOACHASHMEE, whereas gov.krd uses only ARABIC HEH. Other online resources examined are not completely one or the other. Note that Uighur uses HEH DOACHASHMEE to represent h.
06D5 0647 200E

The majority usage seems to favour AE, although again some resources mix both to some degree (although they tend to mostly use AE). This makes sense, since the use of HEH plus ZWNJ has the appearance of a hack intended to prevent HEH joining to the left, whereas AE will do this naturally, without any formatting code point. Use of AE also gets around practical difficulties that arise because the ZWNJ character is invisible and is not readily accessible from many keyboards – difficulties that are amplified by the fact that this vowel letter is one of the most commonly used letters in the Sorani alphabet. Note that Uighur also uses AE to represent a vowel, and a different character for the sound h.

Confusables & spelling errors

This table lists characters that are often mistakenly used because they look the same as or similar to the code points used for Kurmanji, or perhaps because the correct character is not available on the user's keyboard.

Incorrect Correct Notes
064A 06CC The Arabic YEH doesn't drop the dots below in isolate and final positions.
0643 06A9 Common fonts tend not to show the difference between these two characters, but the ability to search and compare text is impaired unless the application is aware of and takes counter-measures against this substitution.

False friends

The following atomic characters look as if they could be composed of parts, but in fact there is no equivalence during normalisation, and so the atomic characters only should be used.

Atomic Sequence ( DO NOT use! )
ڵ 0644 065A
ێ 06CC 065A
ۆ 0648 065A

Codepoint sequences

Combining marks always follow the based character, however for Sorani, which doesn't normally use combining marks, this is only relevant for ێ when it occurs in decomposed text.

Where present, characters in a syllable should always occur in the following order.

  1. A consonant or independent vowel.
  2. One or 2 medial consonants.
  3. A pre-base dependent vowel.
  4. A non-prebase dependent vowel.
  5. A vowel lengthener.
  6. A final consonant letter or combining mark.

Formatting characters

U+200C ZERO WIDTH NON-JOINER (ZWNJ) can be used to force the production of a visible virama, rather than a half-form (see visiblevirama). It can also be used to prevent the formation of vowel ligatures (see vowelligatures).

U+200D ZERO WIDTH JOINER (ZWJ) is used to produce special joining forms for YA (see consonant_syllable).

Numbers

Digits

@ XXXX has a set of native digits

߀␣߁␣߂␣߃␣߄␣߅␣߆␣߇␣߈␣߉

The CLDR standard-decimal pattern is #,##,##0.###. The standard-percent pattern is #,##,##0%.c

However, unlike other right-to-left scripts such as Arabic, Hebrew, Thaana, the numbers are displayed right-to-left, with the most significant digit first. This means that numbers don't produce bidirectional text in N'Ko.

ߝߌߟߊߣߊ߲ ߕߋ߬ߟߋ߫ ߂߇-߂߈/߂߀߁߀ ߕߊ߬ߡߌ߲߬ߣߍ߲ ߠߊ߫߸

A date in N'Ko. All characters are read right to left, including numbers.

Ordinal numbers

Ordinal numbers are produced using diacritics.

The first ordinal is produced using  ߭ [U+07ED NKO COMBINING SHORT RISING TONE​], eg. ߁߭ first.

Others use  ߲ [U+07F2 NKO COMBINING NASALIZATION MARK​], eg. ߂߲ second. When there are multiple digits in a number, the diacritic appears only under the last in sequence, eg. ߁߂߃߲ 123rd.

Currency

The CLDR standard format for currency is ¤#,##0.00;¤-#,##0.00, and the symbol for the Lao currency, Kip, is [U+20AD KIP SIGN].

Text direction

@ XXXX text runs left to right in horizontal lines.

Show default bidi_class properties for characters in the XXXXX orthography described here.

Glyph shaping & positioning

Experiment with examples using the XXXX character app.

Context-based shaping & positioning

Wancho letters don't interact, so no special shaping is needed.

Base characters carry only a single combining mark.

Some small adjustments may be needed for the placement of tone marks relative to the base letter, but these are mostly horizontal, since Wancho letters tend to have the same height. fig_gpos shows adjustments to the horizontal position of the tone mark, depending on the shape of the base character. It also shows kerning of the letters.

𞋀𞋬𞋫𞋬 𞋀𞋮𞋫𞋮
An illustration of context-sensitive glyph placement in Wancho.

Typographic units

Word boundaries

Words are separated by spaces.

Some words are hyphenated.

Graphemes

Graphemes in XXXX consist of single letters or letters with one or two combining marks. This means that text can be segmented into typographic units using grapheme clusters.

XXXX doesn't use word boundaries for text segmentation, relying instead on grapheme boundaries (sometimes called orthographic syllables).

Graphemes in XXXX consist of single letters or letters with a single combining mark.

When applied to XXXX, Unicode grapheme clusters split text into base characters plus any combining marks that follow. Therefore, grapheme clusters can be used to segment XXXX into typographic units.

Phrase, sentence, and section delimiters are described in phrase.

Punctuation & inline features

Phrase & section boundaries

,␣:␣;␣.␣?␣!␣।␣॥␣—

XXXX uses a mixture of ASCII and Arabic punctuation. Other languages using the Arabic script may use different punctuation, such as the full stop in Urdu.

XXX uses ASCII punctuation marks.

phrase

,

;

:

sentence

.

?

!

paragraph

A9CB

section Ditto.
general divider

A9CA

Bracketed text

See type samples.

(␣)

Wancho commonly uses ASCII parentheses to insert parenthetical information into text.

  start end
standard

(

)

Line & paragraph layout

Line breaking & hyphenation

Lines are generally broken between words.

Line-edge rules

As in almost all writing systems, certain punctuation characters should not appear at the end or the start of a line. The Unicode line-break properties help applications decide whether a character should appear at the start or end of a line.

Show line-breaking properties for characters in the modern XXXXXX orthography.

The following list gives examples of typical behaviours for certain characters. Context may affect the behaviour of some of these.

Click/tap on the characters to show what they are.

  • “ ‘ (   should not be the last character on a line.
  • ” ’ ) . , ; ! ? । ॥ %   should not begin a new line.
  •   should be kept with any number, even if separated by a space or parenthesis.

Line breaking should not move a danda or double danda to the beginning of a new line even if they are preceded by a space character.

Page & book layout

Historical information

Orthographic development & variants

tbd

Online resources

  1. Universal Declaration of Human Rights - Assyrian Neo-Aramaic
  2. The Bible in Assyrian Neo-Aramaic

References