XXXX orthography notes

Howtos

Character panels

A typical character panel looks like:

𖼁␣𖼋␣𖼏␣𖼟␣𖼢␣𖼸␣𖼯␣𖼛␣𖼅␣𖼑␣𖼕

The source text is:

Class: characterBox produces a yellow background. Other coloured backgrounds are only used occasionally; they include auxiliaryBox (mauve), etc.

class:small renders the character at 80% size. Good for long lists of conjuncts, list counters, etc.

class:hideCh renders the character at 20% size with a border. Good for invisible characters that are represented by images (see data-notes).

class:noexpansion prevents the production of the downwards arrow that shows details for all characters.

class:nolist prevents production of the arrow that lists all characters.

class:ipaplus adds the contents of the ipa+ column in the spreadsheet (usually inherent vowel).

data-ignore a (single) character that should be ignored when listing the hex code point values; typically used for dotted circles.

data-ipa value is a comma-separated list of ipa transcriptions that will be aligned with the items displayed. You should normally remove 'ipa' from the data-cols field.

data-translit tbd.

data-links associates a link with each item.

data-extra tbd.

data-notes a comma-separated list of texts that are displayed below the main item. This is particularly useful for invisible characters because it can be used to pull in images, eg.
data-notes="<img src='../../c/Miao/large/16F8F.png' alt='ZWNJ' style='vertical-align:middle; height:2rem'>, ..."

data-last indicates that the hex code point value refers to the last item; used mostly for bicameral listings.

For cased scripts, display lists side-by-side using <div class="cased">, eg.

𞤭␣𞤵␣𞤫␣𞤮␣𞤢

𞤋␣𞤓␣𞤉␣𞤌␣𞤀

Codepoints

A typical reference to a code point or sequence of code points looks like the following 4 examples. The page converts simple markup to more complicated markup during rendering.

𞋀

𞋖𞋭

𞋮

200B

Basic markup for these is, respectively:

𞋀 or 1E2C0

𞋖𞋭 or 1E2D6 1E2ED

𞋮 or 1E2EE

200B

The generated markup will look something like this:

<bdi lang="th">𞋀</bdi><a href="javascript:void(0)" target="">U+1E2C0 WANCHO LETTER AA

Each orthography notes page globally declares a language tag to be used with example text. The markup automatically adds a lang attribute for that language to the generated markup. If you want a different lang attribute to be applied, simply add lang="xx" to the span tag (where xx is the lang value).

Every span tag must contain one of the following classes:

ch The span element content contains one or more characters (with no separators).

hx The span element content contains one or more hex code point values, separated by a space.

Optional, additional class names are as follows. Most of these can be combined with others.

img A PNG image will be used to show the character.

svg An SVG image will be used to show the character.

noname Show the character but suppress the name & codepoint value.

split Renders a sequence as 𞋖𞋭, rather than 𞋖𞋭.

circle Adds a dotted circle before the listed character(s) on display.

coda Adds a dotted circle after the listed character(s) on display. This is useful for showing character sequences from scripts like Thai, where the pronunciation is different in syllables with and without a coda.

init Adds ZWJ after the character(s), when displayed.

medi Adds ZWJ before and after the character(s), when displayed.

fina Adds ZWJ before the character(s), when displayed.

skip Adds ZWJ before the second character(s), when displayed. This is useful for cases where, for a cursive script, you want to show a combining mark followed by a joining letter.

noindex Prevents this item being picked up for inclusion in the index.

References

References can be added to the markup using <tt>...</tt>.

To indicate a source link that is listed in the refs.js file, use <tt> with the key for that resource. After a comma you can add a number for a page, or '#fragid', eg.<tt>ul,22</tt>

To indicate a source link that is NOT listed in the refs.js file, use <tt> containing '@shortname,url', eg. <tt>@Github,https://github.com/w3c/sealreq/issues/45</tt>

If this is something the reader should also look at, rather than just a source link, add class="more" to the tt element.

On devices that don't have hover any page number will be shown alongside the reference number.

Sequence tables

General pattern for use in figures and outside figures.

ক␣্␣ত␣ক্ত

ঞ␣্␣চ␣ঞ্চ

ক␣্␣ষ␣্␣ম␣ক্ষ্ম

Ligated conjunct forms.

Used alongside an example term.

গ্রাম

গ্রা␣গ␣্␣র␣া

Sample

Select part of this sample text to show a list of characters, with links to more details.
Change size: CHANGE_ME

Sample_text_goes_here

Source: Universal Declaration of Human Rights - Chakma, articles 1 & 2

Usage & history

Origins of the Oriya script, 1051 – today.

Phoenician

└ Aramaic

└ Brahmi

└ Tamil-Brahmi

└ Pallava

└ Grantha

└ Malayalam

+ Tigalari

+ Dhives Akuru

+ Saurashtra

Speakers of the Eastern Cham language number about 132,000 in Bình Thuận, Ninh Thuận, and Đồng Nai provinces in southern Vietnam, as well as in Hồ Chí Minh City. The Ethnologue estimates the L1 literacy rate to be 5%–10%. The Cham script is the primary orthography for the Eastern Cham, but the largely muslim Western Cham peoples of Cambodia prefer the Arabic script.eth Historically, the Eastern Cham script was learned by boys once they reached a certain age, but not by women and girls.ws

ꨀꨇꩉ ꨌꩌ

Coming to Southeast Asia with the expansion of Indian religions, Cham was one of the first scripts to develop from the Pallava script some time around 200 CE.ws

More information: Wikipedia • Endangered Alphabets

Basic features

The XXXX script is an abugida, ie. each consonant contains an inherent vowel sound. See the table to the right for a brief overview of features for the modern ZZZZ orthography.

The XXXX script is an alphabet, ie. consonants and vowels are written separately. See the table to the right for a brief overview of features for the ZZZZ language.

The XXXX script is an abjad, ie. short vowels are not normally written. See the table to the right for a brief overview of features for the ZZZZ language.

The ZZZZ XXXX orthography is derived from the Arabic/Persian abjads, where in normal use the script represents only consonant and long vowel sounds. However, the script has been adapted in this orthography in order to cope with the many more vowels sounds in Kashmiri, and this is one of the Arabic orthographies that regularly indicates all vowel sounds, making it an alphabet. See the table to the right for a brief overview of features for the modern Kashmiri orthography using the Arabic script.

XXXX text runs left-to-right in horizontal lines, but numbers and embedded Latin text are read left-to-right. There is no case distinction. Words are separated by spaces.

❯ consonantSummary

Modern XXXX represents native consonant sounds using 19 basic letters and 6 aspirated digraphs, but can use 13 more consonants to spell words loaned from Persian, Arabic and Urdu.

A special letter is used to indicate palatalisation, which is common in Kashmiri. Similarly to the other yeh used in Kashmiri, it has a circle below when used in syllable onsets, and a swash with no circle after a syllable coda.

Unlike other Arabic orthographies, 0652 (jazm), normally used to show vowel absence, is placed over the second consonant in an onset cluster (such as tr). That letter may therefore carry both the jazm diacritic and a vowel diacritic, which is quite unusual.

❯ basicV

Kashmiri is an alphabet where 16 vowel sounds (far more than in Arabic or Persian) are written using a mixture of 10 combining marks and 10 letters. Unlike Arabic, Persian, and Urdu, all vowel diacritics are always visible in Kashmiri texts. Representation of vowel sounds is complicated because the code points used to represent a given vowel typically differ according to its position within a word.

The distinction between ijam vs. tashkil has a bearing on several Kashmiri graphemes, and the choice between precomposed and decomposed realisations of a vowel letter can be complicated (see encoding).

Word-initial standalone vowels are preceded by or attached to either 0627 or 0639.

The jazm is used over a word-medial 0646 to indicate nasalisation of a preceding vowel sound. Apart from this use and that for medial consonants, it is not typically used to indicate vowel absence.

Kashmiri uses native digits, and Arabic code points for several of the more common punctuation marks.

Notable features

no circumgraphs — vowel signs are doubled up
conjuncts form across word boundaries
independent vowels distinguish proper nouns or foreign words from ordinary words
vocalic letters exist, but no vocalic vowel signs
consonants classed as basic, murda, and mahaprana
dedicated medial consonants

xxxxx

Numbers

Show

०␣१␣२␣३␣४␣५␣६␣७␣८␣९

Punctuation

Show

xxxxx

ASCII

!␣(␣)␣,␣-␣.␣:␣;␣?

The following represents the repertoire of the ZZZZ language.

Click on the sounds to reveal locations in this document where they are mentioned.

Phones in a lighter colour are non-native or allophones. Source Wikipedia.

Vowel sounds

Plain vowels

Complex vowels

ia iɯʔ iə	ua uə
ea	oa
ɛə	ɔə
auʔ aɪ aʊ

Consonant sounds

	labial	labio- dental	dental	alveolar	post- alveolar	retroflex	palatal	velar	uvular	pharyngeal epiglottal	glottal
stop	p b			t d		ʈ ɖ	c ɟ	k ɡ	q ɢ	ʡ	ʔ
	pʰ bʰ			tʰ dʰ		ʈʰ ɖʰ	cʰ ɟʰ	kʰ ɡʰ
	ɓ			ɗ			ʄ	ɠ	ʛ
	ᵐp ᵐb			ⁿt ⁿd		ᶯʈ ᶯɖ	ᶮc ᶮɟ	ᵑk ᵑɡ	ᶰq ᶰɢ
affricate				t͡s d͡z	t͡ʃ d͡ʒ	ʈ͡ʂ ɖ͡ʐ	t͡ɕ d͡ʑ
				t͡sʰ d͡zʰ	t͡ʃʰ d͡ʒʰ	ʈ͡ʂʰ ɖ͡ʐʰ	t͡ɕʰ d͡ʑʰ
fricative	ɸ β	f v	θ ð	s z	ʃ ʒ	ʂ ʐ	ç ʝ	x ɣ	χ ʁ	ħ ʕ	h ɦ
				ɬ ɮ	ɧ		ɕ ʑ
nasal	m	ɱ		n		ɳ	ɲ	ŋ	ɴ
approximant	ʍ w	ʋ		l ɫ ɹ		ɻ ɭ	j ʎ	ɰ ʟ
trill/flap	ʙ	ⱱ		r ɾ ɺ		ɽ			ʀ	ʜ ʢ

Tone

Thai is a contour tone language, with 5 tones: high, mid, low, falling, & rising.

tbd

Structure

tbd

Alphabet

Click on the characters to find where they are mentioned in this page.

The ZZZZ alphabet has 23 consonants and 5 vowels. Each has upper and lowercase forms; shown above and below, respectively.

𞤢␣𞤣␣𞤤␣𞤥␣𞤦␣𞤧␣𞤨␣𞤩␣𞤪␣𞤫␣𞤬␣𞤭␣𞤮␣𞤯␣𞤰␣𞤱␣𞤲␣𞤳␣𞤴␣𞤵␣𞤶␣𞤷␣𞤸␣𞤹␣𞤺␣𞤻␣𞤼␣𞤽

𞤀␣𞤁␣𞤂␣𞤃␣𞤄␣𞤅␣𞤆␣𞤇␣𞤈␣𞤉␣𞤊␣𞤋␣𞤌␣𞤍␣𞤎␣𞤏␣𞤐␣𞤑␣𞤒␣𞤓␣𞤔␣𞤕␣𞤖␣𞤗␣𞤘␣𞤙␣𞤚␣𞤛

Syllables

Vowels

Vowel summary table

This table only summarises basic vowel to character assignments.

ⓘ represents the inherent vowel. Diacritics are added to the vowels to indicate nasalisation (not shown here).

	post-consonant	word-final standalone	word-medial standalone	word-initial standalone
Simple:




Vocalics:

For additional details see vowel_mappings.

Inherent vowel

ᨠ ka U+1A20 LETTER HIGH KA

The inherent vowel for XXXXXX is pronounced a or ʌ. So ka is written by simply using the consonant letter.

The inherent vowel is not usually pronounced at the end of a word.

The suppression of the inherent vowel in a syllable coda is indicated using the 𑜫 diacritic (see the examples above).

Post-consonant vowels

Combining marks used for vowels

ᬓᬶ ki U+1B13 BALINESE LETTER KA + U+1B36 BALINESE VOWEL SIGN ULU

Balinese uses the following dedicated combining marks for vowels. They may be used on their own, or in combination with others (see composite_vowels). They are all vowel signs.

xxxxxxx

To represent the sounds rə or lə, Balinese uses vocalic letters. A sequence such as *ᬭᭂ [U+1B2D BALINESE LETTER RA + U+1B42 BALINESE VOWEL SIGN PEPET] is not used. See vocalics.

Six of the vowel signs are spacing marks, meaning that they consume horizontal space when added to a base consonant.

All vowel signs are typed and stored after the base consonant, and the glyph rendering system takes care of the positioning at display time. The glyphs used to represent vowels, whether alone or in multipart vowels, are arranged around a syllable onset, which may be 2 consonants, rather than just around the immediately preceding consonant. See prebase and circumgraphs.

Vowel letters

something

Dedicated vowel letters

These are the dedicated vowel letters.

xxxxxx

Consonants used for vowels

Kashmiri uses the following consonant characters to write vowels, generally in combination with diacritics, but also alone when representing eː and oː in non-initial positions.

xxxxxx

Composite vowel signs

Multipart vowels are only produced when text is decomposed; 5 of the circumgraphs split off the 1B35 glyph, to create the following pairs:

ᭀ␣ᭃ␣ᭁ␣ᬻ␣ᬽ

Pre-base vowel signs

𑐶

The short i sound is written using 𑐶 [U+11436 NEWA VOWEL SIGN I], which appears to the left of the base consonant letter or cluster.

A pre-base vowel sign (highlighted). It is positioned to the left of the consonant after which it is typed, stored, and pronounced.

show composition

ꤷꥉꤼꥉꤿ꥓

This combining mark is always typed and stored after the base consonant. The rendering process places the glyph before the base consonant at the time of display.

When an orthographic syllable begins with a consonant cluster that is rendered as a conjunct, the vowel sign is rendered before the start of the syllable, eg. here are 3 sets of consonant clusters, each followed by i when spoken, but the vowel sign appears to the left of each cluster.𑐗𑑂𑐏𑐶 𑐳𑑂𑐟𑐶 𑐧𑑂𑐬𑐶 jkhi sti bri

When an orthographic syllable begins with a consonant cluster, the vowel sign is actually placed before the start of the onset, ie. to the left of the syllable as a whole. See fig_prebase.

show composition

ꨚꨴꨯꨱꩃ

However, note that if the cluster is split by a visible virama, this creates two orthographic syllables and the pre-base vowel sign appears after the consonant with the virama.

Circumgraphs

ᬓᭀ ko U+1B13 BALINESE LETTER KA + U+1B40 BALINESE VOWEL SIGN TALING TEDUNG

Five vowel signs are usually produced by a single combining character with visually separate parts, that appear on different sides of the consonant onset.

ᭀ␣ᭃ␣ᭁ␣ᬻ␣ᬼ␣ᬽ

This section includes some vowel signs described in the section vocalics.

Like pre-base glyphs, these are combining marks that are always stored after the base consonant. When rendered, the single code point produces multiple glyphs, which are placed on different sides of the base consonant. Click on fig_circumgraphs to see the sequence of characters in storage.

show composition

ꤷꥇꥆꥋꥆ

Glyphs can appear on up to 3 sides of the base. Some of the glyphs merge with the base character's glyph (see context).

These circumgraphs have canonically equivalent decomposed forms (see vs_encoding).

Vowel length

tbd

Nasalisation

tbd

Composite vowel signs

Multipart vowels are only produced when text is decomposed; 5 of the circumgraphs split off the 1B35 glyph, to create the following pairs:

ᭀ␣ᭃ␣ᭁ␣ᬻ␣ᬽ

Standalone vowels

Kirat Rai has no independent vowel letters, but instead uses 𖵃 as a carrier for standalone vowels. This can represent a zero or glottal stop onset. The vowel to be pronounced is indicated by attaching a vowel sign, eg.

𖵃𖵤𖵛𖵣

Used alone, this letter represents the standalone vowel a.

𖵃𖵈𖵫𖵄𖵣

The following list shows how to write each of the standalone vowels.

𖵃𖵤␣𖵃𖵦␣𖵃𖵥␣𖵃𖵧␣𖵃𖵩␣𖵃␣𖵃𖵣␣𖵃𖵨␣𖵃𖵪

Standalone vowels are written using the normal vowel signs with no special additional mechanisms, eg.

𞋀𞋊𞋞

Tones

tbd

Vowel sounds to characters

This section maps Fula vowel sounds to common graphemes in the Adlam orthography.

Important notes.

Sounds listed as 'infrequent' are allophones, or sounds used for foreign words, etc. Light coloured characters occur infrequently.

Plain vowels

iː

mixed ີ

mixed ິ

Diphthongs & rhymes

1752

ᝐᝒᝆ

1741

ᝁᝇᝓ

1753

ᝎᝓᝆᝓ

1742

ᝂᝇᝓ

Vocalics

ᬋ␣ᬌ␣ᬍ␣ᬎ␣ᬺ␣ᬻ␣ᬼ␣ᬽ

description

Consonants

Consonant summary table

This table only summarises basic consonant to character assignments.

The left column is lowercase, and the right uppercase. A number of letters have allophones which are not shown here, and consonant clusters can be unpredictable in pronunciation. See the following sections for details. Normal letters are used as final consonants, but the right-hand column lists some additional, dedicated finals.

	post-consonant	word-final standalone	word-medial standalone	word-initial standalone
Onsets




Finals:

For additional details see consonant_mappings.

Basic consonants

Basic consonant sounds in Bengali are written using the following letters

Click on each letter for usage notes, alternative pronunciations, and for examples of usage.

Sounds listed as 'infrequent' are allophones, or sounds used for foreign words, etc. Light coloured characters occur infrequently.

onset ပ

pʰ

onset ဖ

onset ဘ sometimes at the beginning of words or particles.

Other features

XXXX

tbd

XXXXX

tbd

Encoding choices

This section offers advice about characters or character sequences to avoid, and what to use instead. It takes into account the relevance of Unicode Normalisation Form D (NFD) and Unicode Normalisation Form C (NFC)..

Although usage is recommended here, content authors may well be unaware of such recommendations. Therefore, applications should look out for the non-recommended approach and treat it the same as the recommended approach wherever possible.

Canonically equivalent encodings

One letter only can be represented as an atomic character (the norm), or as a sequence of base letter plus combining mark. The parts are separated in Unicode Normalisation Form D (NFD), and recomposed in Unicode Normalisation Form C (NFC), so both approaches should be treated as canonically equivalent.

Atomic (recommended)	Decomposed ( NOT recommended )
ێ	064A 0654

Note that the base character in the decomposed sequence is 064A, and not 06CC, which is used elsewhere for yeh in Kurmanji. 064A is only used for this specific decomposed sequence; it is inappropriate to use it elsewhere in Kurmanji text.

Unresolved encodings

A couple of Sorani letters are encoded in different ways in different texts, and in fact can be mixed within the same text. At the moment there doesn't appear to be a clear ruling on which is expected. This section lists the alternatives.

Alternative 1	Alternative 2	Notes
06BE	0647	Wikipedia and Wiktionary represent the sound h using only HEH DOACHASHMEE, whereas gov.krd uses only ARABIC HEH. Other online resources examined are not completely one or the other. Note that Uighur uses HEH DOACHASHMEE to represent h.
06D5	0647 200E	The majority usage seems to favour AE, although again some resources mix both to some degree (although they tend to mostly use AE). This makes sense, since the use of HEH plus ZWNJ has the appearance of a hack intended to prevent HEH joining to the left, whereas AE will do this naturally, without any formatting code point. Use of AE also gets around practical difficulties that arise because the ZWNJ character is invisible and is not readily accessible from many keyboards – difficulties that are amplified by the fact that this vowel letter is one of the most commonly used letters in the Sorani alphabet. Note that Uighur also uses AE to represent a vowel, and a different character for the sound h.

Confusables & spelling errors

This section lists characters that may be mistakenly rendered using sequences that look the same as or similar to the atomic code points available for Bengali.

This table lists characters that are often mistakenly used because they look the same as or similar to the code points used for Kurmanji, or perhaps because the correct character is not available on the user's keyboard.

Incorrect	Correct	Notes
064A	06CC	The Arabic YEH doesn't drop the dots below in isolate and final positions.
0643	06A9	Common fonts tend not to show the difference between these two characters, but the ability to search and compare text is impaired unless the application is aware of and takes counter-measures against this substitution.

This information draws on the DoNotEmit tables.

False friends

The following atomic characters look as if they could be composed of parts, but in fact there is no equivalence during normalisation, and so the atomic characters only should be used.

Atomic	Sequence ( DO NOT use! )
ڵ	0644 065A
ێ	06CC 065A
ۆ	0648 065A

Codepoint sequences

Combining marks always follow the based character, however for Sorani, which doesn't normally use combining marks, this is only relevant for ێ when it occurs in decomposed text.

Where present, characters in a syllable should always occur in the following order.

A consonant or independent vowel.
One or 2 medial consonants.
A pre-base dependent vowel.
A non-prebase dependent vowel.
A vowel lengthener.
A final consonant letter or combining mark.

Formatting characters

U+200C ZERO WIDTH NON-JOINER (ZWNJ) can be used to force the production of a visible virama, rather than a half-form (see visiblevirama). It can also be used to prevent the formation of vowel ligatures (see vowelligatures).

U+200D ZERO WIDTH JOINER (ZWJ) is used to produce special joining forms for YA (see consonant_syllable).

Numbers

Digits

@ XXXX has a set of native, decimal digits

𖵰␣𖵱␣𖵲␣𖵳␣𖵴␣𖵵␣𖵶␣𖵷␣𖵸␣𖵹

The CLDR standard-decimal pattern is #,##,##0.###. The standard-percent pattern is #,##,##0%.c

However, unlike other right-to-left scripts such as Arabic, Hebrew, Thaana, the numbers are displayed right-to-left, with the most significant digit first. This means that numbers don't produce bidirectional text in N'Ko.

ߝߌߟߊߣߊ߲ ߕߋ߬ߟߋ߫ ߂߇-߂߈/߂߀߁߀ ߕߊ߬ߡߌ߲߬ߣߍ߲ ߠߊ߫߸

A date in N'Ko. All characters are read right to left, including numbers.

Ordinal numbers

Ordinal numbers are produced using diacritics.

The first ordinal is produced using ߭ [U+07ED NKO COMBINING SHORT RISING TONE], eg. ߁߭ first.

Others use ߲ [U+07F2 NKO COMBINING NASALIZATION MARK], eg. ߂߲ second. When there are multiple digits in a number, the diacritic appears only under the last in sequence, eg. ߁߂߃߲ 123rd.

Currency

The CLDR standard format for currency is ¤#,##0.00;¤-#,##0.00, and the symbol for the Lao currency, Kip, is ₭ [U+20AD KIP SIGN].

₭

Text direction

@ XXXX text runs left to right in horizontal lines.

Show default bidi_class properties for characters in the XXXXX orthography described here.

Glyph shaping & positioning

Experiment with examples using the XXXX character app.

Context-sensitive shaping & positioning

Wancho letters don't interact, so no special shaping is needed.

Base characters carry only a single combining mark.

Some small adjustments may be needed for the placement of tone marks relative to the base letter, but these are mostly horizontal, since Wancho letters tend to have the same height. fig_gpos shows adjustments to the horizontal position of the tone mark, depending on the shape of the base character. It also shows kerning of the letters.

Bengali fonts need to adapt glyphs based on their context.

Principal areas where context-sensitive shaping is required include conjunct formation and vowel ligation. Positioning of glyphs is also sometimes context-sensitive, particularly where multiple diacritics are applied to a single base.

The rest of this section provides examples of context-sensitive shaping and positioning.

𞋀𞋬𞋫𞋬 𞋀𞋮𞋫𞋮 — An illustration of context-sensitive glyph placement in Wancho.

Typographic units

Word boundaries

Words are separated by spaces.

Some words are hyphenated.

Graphemes

Graphemes in XXXX consist of single letters or letters with one or two combining marks. This means that text can be segmented into typographic units using grapheme clusters.

XXXX doesn't use word boundaries for text segmentation, relying instead on grapheme boundaries (sometimes called orthographic syllables).

Graphemes in XXXX consist of single letters or letters with a single combining mark.

When applied to XXXX, Unicode grapheme clusters split text into base characters plus any combining marks that follow. Therefore, grapheme clusters can be used to segment XXXX into typographic units.

Phrase, sentence, and section delimiters are described in phrase.

Punctuation & inline features

Phrase & section boundaries

,␣:␣;␣.␣?␣!␣।␣॥␣—

XXXX uses a mixture of ASCII and Arabic punctuation. Other languages using the Arabic script may use different punctuation, such as the full stop in Urdu.

XXX uses ASCII punctuation marks.

phrase	, ; :
sentence	. ? !
paragraph	A9CB
section	Ditto.
general divider	A9CA

Bracketed text

See type samples.

(␣)

Wancho commonly uses ASCII parentheses to insert parenthetical information into text.

	start	end
standard	(	)

Line & paragraph layout

Line breaking & hyphenation

Lines are generally broken between words.

Line-edge rules

As in almost all writing systems, certain punctuation characters should not appear at the end or the start of a line. The Unicode line-break properties help applications decide whether a character should appear at the start or end of a line.

Show line-breaking properties for characters in the modern XXXXXX orthography.

The following list gives examples of typical behaviours for certain characters. Context may affect the behaviour of some of these.

Click/tap on the characters to show what they are.

“ ‘ ( should not be the last character on a line.
” ’ ) . , ; ! ? । ॥ % should not begin a new line.
৳ should be kept with any number, even if separated by a space or parenthesis.

Line breaking should not move a danda or double danda to the beginning of a new line even if they are preceded by a space character.

Page & book layout

Historical information

Orthographic development & variants

tbd

XXXX (xxxx)

Howtos

Character panels

Codepoints

References

Sequence tables

Sample

Usage & history

Basic features

Notable features

Character index

Letters

Basic consonants

Extended consonants

Vowels

Vocalics

Other

Not used for XXXXX

Combining marks

Vowels

Vocalics

Tones

Medials

Finals

Other

Not used for XXXXX

Numbers

Punctuation

ASCII

CLDR additions

Not used for XXXXX

Symbols

Not used for XXXXX

Other

To be investigated

Phonology

Vowel sounds

Plain vowels

Complex vowels

Consonant sounds

Tone

Structure

Alphabet

Syllables

Vowels

Vowel summary table

Inherent vowel

Post-consonant vowels

Combining marks used for vowels

Vowel letters

Dedicated vowel letters

Consonants used for vowels

Composite vowel signs

Pre-base vowel signs

Circumgraphs

Vowel length

Nasalisation

Composite vowel signs

Standalone vowels

Tones

Vowel sounds to characters

Plain vowels

Diphthongs & rhymes

Vocalics

Consonants

Consonant summary table

Basic consonants

Repertoire extension

Onsets

Finals

Consonant clusters

Stacking

Initial RA in clusters

Ligated forms

Conjoined consonants or a visible halanta

Triple-consonant clusters

Visible vowel-killer

Consonant length

Consonant sounds to characters

Symbols