Tai Dam orthography notes

Usage & history

Origins of the Tai Viet script, 16thC – today.

Phoenician → Aramaic → Brahmi → Tamil-Brahmi → Pallava → Old Khmer → Sukhothai → Tai Viet (+ Khom Tai, + Lai Tay)

The Tai Viet script is used for writing the Tai Dam (Black Tai or Tai Noir), Tai Dón (White Tai or Tai Blanc), Tai Daeng, Thai Song (Lao Song or Lao Song Dam) and Tày Tac languages spoken in Vietnam, Laos, China and Thailand. There is also a diaspora in the United States, Australia and France.

The total population using three of these languages, across all countries, is estimated to be 1.3 million (Tai Dam 764,000, Tai Dón 490,000, Thai Song 32,000). The script is still used by the Tai people in Vietnam, and there is a desire to introduce it into formal education there.

ꪼꪕꪒꪾ

Little is known about the origin of the Tai Viet script. It appears to have been derived from the Thai script around the 16th century.

Significant variation occurs in the orthographic conventions of the Tai languages, as well as in their phonologies. A unified, standardized version of the script, with an agreed-upon core set of characters, was developed at a UNESCO-sponsored workshop in 2006, and subsequently accepted for encoding in The Unicode Standard.

Sources: Scriptsource, The Unicode Standard.

Basic features

The script is an alphabet. Both consonants and vowels are indicated by letters. See the table to the right for a brief overview of features for the modern Tai Dam orthography.

The Tai Viet script is heavily syllable-based, with exceptions being a very small number of unstressed initial syllables, and loan words.

Tai Viet text runs left to right in horizontal lines. Words are separated by spaces, although this is a recent innovation.

Tai Dam uses 42 consonant characters, divided into 2 classes. Each consonant belongs to either a high or a low class, which helps to indicate tone. Tone is indicated by a combination of the consonant class, the syllable type (checked/unchecked), plus any tone mark.

There are no conjuncts or subjoined consonants.

The only syllable-initial cluster involves labialisation, using ꪫ w.

Syllable-final consonant sounds use a subset of 8 ordinary consonant letters, but since there is no inherent vowel, it is still simple to detect syllable boundaries. Syllable-final consonant sounds are also built into 6 vowel-consonant graphemes.

The Tai Dam orthography is an alphabet (with no inherent vowel). Vowels are written using a mixture of 13 ordinary spacing characters (of which 5 are also consonants) and 7 combining marks.

Tai Viet uses visual placement: only the vowel components that appear above or below the consonant are combining marks; the others are ordinary spacing characters that are typed in the order seen.

This page lists 6 multipart vowels, made from 6 vowel signs and 3 consonants. Multipart vowels can involve up to 3 glyphs, though usually only 2, and glyphs can surround the base consonant(s) on 2 sides.

There are 5 pre-base vowel glyphs (all letters), but no circumgraphs.

There are no independent vowels, and standalone vowel sounds use a vowel sign attached to ꪮ or ꪯ.

Tone can be indicated either by diacritics or ordinary spacing characters. Both are a recent innovation. Combining tone marks always follow the root consonant and any combining vowels, ie. they come before any post-base vowel. Spacing tone marks always come at the very end of the syllable.

Phonology

These are sounds for the Tai Dam language.

Phones in a lighter colour are non-native or allophones.

Vowel sounds

Plain vowels

Diphthongs

Consonant sounds

	labial	alveolar	post-alveolar	palatal	velar	glottal
stop	p b	t d			k ɡ	ʔ
aspirated		tʰ
affricate			t͡ɕ
fricative	f v	s			x	h
nasal	m	n		ɲ	ŋ
approximant	w	l		j
trill/flap		r

r and ɡ are used in Vietnamese names.

Syllable-final

	labial	alveolar	palatal	velar	glottal
stop	p	t		k	ʔ
nasal	m	n		ŋ
approximant	w		j

Tone

tbd

Structure

The Tai languages are almost exclusively monosyllabic. A very small number of words have an unstressed initial syllable, and loan words may be polysyllabic.[b]

The essential character sequence of a Tai Viet syllable is:

pre-base vowel?, root consonant(s), combining vowel?, post-base vowel?, final consonant?

The root consonant(s) may be a cluster involving labialisation. Any combining vowel goes after the root consonant(s).

Tone marks expressed as combining characters always follow the root consonant(s) and any combining vowels, which means that they come before any post-base vowel.

Tone marks expressed as spacing characters always come at the very end.
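
The ordering rules above can be sketched as a regular expression. This is only an illustrative sketch, not a validated parser: the group names and the `parse_syllable` helper are invented here, the O letters are placed in the post-base vowel slot on the assumption that this is where they sit when used as vowels, and an ambiguous ꪫ after the root consonant is always read as part of the onset.

```python
import re

# Code point classes, taken from the Tai Viet block (U+AA80..U+AADF).
CONS = "\uAA80-\uAAAF"                     # consonant letters, LOW KO .. HIGH O
PRE_V = "\uAAB5\uAAB6\uAAB9\uAABB\uAABC"   # spacing vowels stored before the onset
COMB_V = "\uAAB0\uAAB2-\uAAB4\uAAB7\uAAB8\uAABE"  # combining vowel marks
POST_V = "\uAAB1\uAABA\uAABD\uAAAE\uAAAF"  # post-base vowels, plus O letters used as vowels (assumption)
COMB_T = "\uAABF\uAAC1"                    # combining tone marks
SPACE_T = "\uAAC0\uAAC2"                   # spacing tone marks

SYLLABLE = re.compile(
    f"(?P<pre_vowel>[{PRE_V}])?"
    f"(?P<onset>[{CONS}]\uAAAB?)"          # root consonant, optionally followed by HIGH VO
    f"(?P<comb_vowel>[{COMB_V}])?"
    f"(?P<comb_tone>[{COMB_T}])?"          # combining tone follows any combining vowel
    f"(?P<post_vowel>[{POST_V}])?"
    f"(?P<final>\uAA9A\uAABE|[{CONS}])?"   # LOW BO + VOWEL AM is the special -ap spelling
    f"(?P<space_tone>[{SPACE_T}])?"        # spacing tone comes at the very end
)

def parse_syllable(text):
    """Return the non-empty parts of one syllable, or None if the order is unexpected."""
    match = SYLLABLE.fullmatch(text)
    if match is None:
        return None
    return {name: part for name, part in match.groupdict().items() if part}

print(parse_syllable("\uAA95\uAAB3\uAABF\uAA89"))  # onset, combining vowel, combining tone, final
print(parse_syllable("\uAA81\uAA9A\uAABE"))        # the -ap spelling, with AM stored after LOW BO
```

A pattern like this is only useful for checking the stored order of code points. It cannot decide whether ꪫ is a medial or a final, which needs a tone mark or a dictionary.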

-ap. One other sequence occurs when writing the vowel plus final consonant combination -ap, which is written with a vowel placed over the final low-series b, rather than over the initial consonant, eg. ꪁꪚꪾ kap.

The sequence is: root consonant(s) + ꪚꪾ U+AA9A LETTER LOW BO + U+AABE VOWEL AM

See gpos, however, for a font variant setting that allows you to store the code points in the normal order, but still display the AM over the BO.

Vowels

Vowel summary table

This table only summarises basic vowel to character assignments.

Plain	ꪲ␣ꪳ␣␣ꪴ
	ꪹ◌ꪸ␣ ␣ꪶ◌
	ꪹ◌ꪷ
	ꪵ◌␣ ␣ꪷ␣ꪮ␣ꪯ
	ꪰ␣ꪱ
Complex	ꪸ␣ ␣ꪹ◌␣ ␣ꪺ◌
	ꪻ◌
	ꪵ◌ꪫ␣ꪵ◌ꪫꪥ
	ꪼ◌␣ꪹ◌ꪱ␣ꪾ␣ꪽ␣ꪚꪾ
Standalone	ꪮ␣ꪯ

For additional details see vowel_mappings.

Post-consonant vowels

Vowels are written using a mixture of 13 ordinary spacing characters (of which 5 are also consonants) and 7 combining marks. Tai Viet uses visual placement: only the vowel components that appear above or below the consonant are combining marks; the others are ordinary spacing characters that are typed in the order seen.

Combining marks used for vowels

ꪁꪲ ki U+AA81 TAI VIET LETTER HIGH KO + U+AAB2 TAI VIET VOWEL I

Tai Dam uses the following combining marks for vowels.

ꪲ␣ꪳ␣ꪴ␣ꪷ␣ꪰ␣ꪸ␣ꪾ

Dedicated vowel letters

ꪁꪺ kuə U+AA81 TAI VIET LETTER HIGH KO + U+AABA TAI VIET VOWEL UA

The following additional, vowel-specific characters are ordinary spacing characters, with the general category of 'letter'.

ꪶ␣ꪵ␣ꪱ␣ꪹ␣ꪺ␣ꪻ␣ꪼ␣ꪽ

Five of these are typed and stored before the onset consonant (see prebase), and only the following 3 appear after:

ꪱ
ꪽ
ꪺ
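
The split between combining marks and spacing letters can be checked against the Unicode character database. The following is a minimal sketch, assuming that the Python build in use ships Unicode data that includes the Tai Viet block (added in Unicode 5.2); the variable names are illustrative only.

```python
import unicodedata

# Vowel signs in the Tai Viet block: U+AAB0 MAI KANG .. U+AABE VOWEL AM.
VOWEL_SIGNS = [chr(cp) for cp in range(0xAAB0, 0xAABF)]

# General category Mn = nonspacing (combining) mark, Lo = other letter.
combining = [c for c in VOWEL_SIGNS if unicodedata.category(c) == "Mn"]
spacing = [c for c in VOWEL_SIGNS if unicodedata.category(c) == "Lo"]

for label, chars in (("combining", combining), ("spacing", spacing)):
    print(label, len(chars))
    for c in chars:
        print(f"  U+{ord(c):04X} {unicodedata.name(c)}")
```

The counts should agree with the 7 combining marks and 8 dedicated vowel letters listed above.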

Consonants used for vowels

ꪁꪮ kɔ U+AA81 TAI VIET LETTER HIGH KO + U+AAAE TAI VIET LETTER LOW O

The following characters are also used to create vowel sounds, either alone or as part of a multipart vowel.

ꪮ␣ꪯ␣ꪫ␣ꪥ␣ꪚ

ꪮ and ꪯ can represent vowels on their own. The following word in fact shows the same character being used as both consonant and vowel in the same word.[b]

ꪮꪮꪀ

The others are used in combination with other vowel signs, see compositeV.

Pre-base vowel signs

ꪶꪁ ko U+AAB6 TAI VIET VOWEL O + U+AA81 TAI VIET LETTER HIGH KO

Five vowel signs appear to the left of the onset consonant after which they are pronounced.

ꪹ␣ꪶ␣ꪵ␣ꪻ␣ꪼ

Like Lao, Tai Viet uses a visual encoding model, so these characters are not combining characters, but are typed and stored before the base. For example:

ꪵꪣꪫ

Note that ꪵ should not be typed as two successive ꪹ characters.
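
A simple clean-up pass for this mistake might look like the sketch below. It assumes that any two adjacent U+AAB9 characters are a mistyped U+AAB5, which seems safe given that only one pre-base vowel occurs per syllable, but the function name and the approach are illustrative rather than an established tool.

```python
VOWEL_E = "\uAAB5"    # TAI VIET VOWEL E
VOWEL_UEA = "\uAAB9"  # TAI VIET VOWEL UEA

def fix_doubled_uea(text: str) -> str:
    """Replace two successive VOWEL UEA characters with a single VOWEL E."""
    return text.replace(VOWEL_UEA * 2, VOWEL_E)

# A mistyped form of the example above is repaired to VOWEL E + HIGH MO + HIGH VO.
repaired = fix_doubled_uea("\uAAB9\uAAB9\uAAA3\uAAAB")
print([f"U+{ord(c):04X}" for c in repaired])
```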

These vowel signs are placed before the start of the syllable onset. This means that in a word with more than one consonant at the start (ie. a labialised consonant) the pre-base vowel is placed to the left of the syllable-initial consonant, rather than to the left of the consonant after which it is pronounced.

fig_prebase shows an example to graphically illustrate the relationships between the characters.

A vowel sign that appears 2 characters out of sequence from where it is pronounced, because the syllable onset is 2 characters long.

ꪵꪁꪫꪥ

Multipart vowels

ꪹꪁꪸ ke U+AAB9 VOWEL UEA + U+AA81 LETTER HIGH KO + U+AAB8 VOWEL IA

Vowels represented by combinations of the above characters include the following, which mostly add glyphs to different sides of the base:

ꪹꪱ␣ꪹꪸ␣ꪹꪷ␣ꪵꪫ␣ꪵꪫꪥ␣ꪚꪾ

Pre-base and post-base vowel glyphs are split around the syllable onset, which may be more than a single character. fig_prebase shows an example.

ꪫ can be ambiguous in this combination unless there is a tone mark. The sequence ꪵ–ꪫꪥ U+AAB5 VOWEL E + U+AAAB LETTER HIGH VO + U+AAA5 LETTER HIGH YO is sometimes used to remove that ambiguity. For details, see onsets.

The last item in the list is rather unusual. Some dialects use the combination ꪚꪾ to make -ap,[b,7] eg. ꪀꪚꪾ. There are 2 possible code point orders that can be used for this: see structure.

Characters that don't appear in the combinations:

ꪲ␣ꪳ␣ꪴ␣ꪶ␣ꪮ␣ꪯ␣ꪰ␣ꪺ␣ꪻ␣ꪼ␣ꪽ

Show which combinations contain a given character:

ꪹ	ꪹ-ꪸ␣ꪹ-ꪷ␣ ␣ꪹ-ꪱ
ꪵ	ꪵ-ꪫ
ꪱ	ꪹ-ꪱ
ꪸ	ꪹ-ꪸ
ꪷ	ꪹ-ꪷ
ꪫ	ꪵ-ꪫ
ꪚ	-ꪚꪾ
ꪾ	-ꪚꪾ

Show details about glyph positioning

The following list shows where vowel signs are positioned around a base consonant to produce vowels, and how many instances of each pattern there are.

5 pre-base, eg. ꪶꪁ ok
3 post-base, eg. ꪁꪱ kā
6 superscript, eg. ꪁꪲ ki
1 subscript, eg. ꪁꪴ ku
2 pre+post-base, eg. ꪹꪁꪱ ɨᵊkā (kaʷ)
2 pre+superscript, eg. ꪹꪁꪸ (ke)
1 post+superscript, eg. ꪁꪜꪾ kp̄aᵐ (kap)

Standalone vowels

There are no independent vowels. Tai Viet represents what look like standalone vowels using a vowel sign attached to ꪮ or ꪯ, and phonetic transcriptions include an initial glottal stop.

ꪮ꪿ꪱꪉ

ꪵꪮꪚ

Tones

Until the latter part of the 20th century Tai Viet didn't mark tones other than by the consonant class. Since then, however, 2 methods have developed.

Tai Dam speakers in the United States and speakers of the Song language borrowed combining tone marks from Lao/Thai.

꪿␣꫁␣

These tone marks are typed and stored immediately after any combining vowel sign, if there is one, otherwise after the initial consonant(s).

The Tai community in Vietnam developed an alternative approach, where tone is marked by ordinary spacing characters that are typed and stored after all other elements in the syllable.

ꫀ␣ꫂ

The following chart shows how to tell which tones are associated with a syllable.

Consonant	Checked?	Tone mark	Tone
high	checked	-	5
	open	-	4
		꪿ or ꫀ	5
		꫁ or ꫂ	6
low	checked	-	2
	open	-	1
		꪿ or ꫀ	2
		꫁ or ꫂ	3
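
The chart can be expressed as a lookup table. The sketch below is a direct transcription of the rows above, assuming that the rows with tone marks apply to open syllables only; the function and constant names are invented for illustration, and combinations the chart does not cover return `None`.

```python
FIRST_MARKS = {"\uAABF", "\uAAC0"}   # TONE MAI EK (combining), TONE MAI NUENG (spacing)
SECOND_MARKS = {"\uAAC1", "\uAAC2"}  # TONE MAI THO (combining), TONE MAI SONG (spacing)

# (consonant class, checked?, tone mark kind) -> tone number, as in the chart.
TONES = {
    ("high", True, None): 5,
    ("high", False, None): 4,
    ("high", False, 1): 5,
    ("high", False, 2): 6,
    ("low", True, None): 2,
    ("low", False, None): 1,
    ("low", False, 1): 2,
    ("low", False, 2): 3,
}

def tone_number(consonant_class, checked, mark=None):
    """Look up the tone for a syllable; returns None if the chart has no such row."""
    if mark is None:
        kind = None
    elif mark in FIRST_MARKS:
        kind = 1
    elif mark in SECOND_MARKS:
        kind = 2
    else:
        raise ValueError(f"not a Tai Viet tone mark: {mark!r}")
    return TONES.get((consonant_class, checked, kind))

print(tone_number("high", checked=False))                # 4
print(tone_number("low", checked=False, mark="\uAAC1"))  # 3
```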

Vowel sounds to characters

This section maps Tai Dam vowel sounds to common graphemes in the Tai Viet orthography.

Plain vowels

dependent vowel ꪲ

dependent vowel ꪳ

dependent vowel ꪴ

circumgraph vowel ꪹ◌ꪸ

prescript vowel ꪶ◌

circumgraph vowel ꪹ◌ꪷ

prescript vowel ꪵ◌

prescript vowel ꪷ

prescript vowel ꪯ

prescript vowel ꪮ

dependent vowel ꪰ

aː

dependent vowel ꪱ

Diphthongs and rhymes

iə

diphthong ꪸ

ɨə

prescript diphthong ꪹ◌

uə

diphthong ◌ꪺ

ʷɛ

circumgraph diphthong ꪵ◌ꪫ

circumgraph diphthong ꪵ◌ꪫꪥ in some dialects, to avoid ambiguity.

əw

prescript diphthong ꪻ◌

prescript vowel ꪼ◌

aːw

circumgraph diphthong ꪹ◌ꪱ

rhyme ꪾ

rhyme ꪽ

rhyme ꪚꪾ

Consonants

Consonant summary table

This table only summarises basic consonant to character assignments.

For initial consonants, the left column shows high class consonants, and the right low class.

	high class	low class
Onsets	ꪝ␣ꪛ␣ꪕ␣ꪓ␣ꪁ␣ꪇ␣ꪯ␣ ␣ꪗ	ꪜ␣ꪚ␣ꪔ␣ꪒ␣ꪀ␣ꪆ␣ꪮ␣ ␣ꪖ
	ꪋ	ꪊ
	ꪡ␣ꪫ␣ꪏ␣ꪅ␣ꪭ	ꪠ␣ꪪ␣ꪎ␣ꪄ␣ꪬ
	ꪣ␣ꪙ␣ꪑ␣ꪉ	ꪢ␣ꪘ␣ꪐ␣ꪈ
	ꪫ␣ꪧ␣ꪩ␣ꪥ	ꪦ␣ꪨ␣ꪤ
Medial	ꪫ
Finals	ꪚ␣ꪒ␣ꪀ
	ꪣ␣ꪙ␣ꪉ
	ꪫ␣ꪥ
	ꪾ␣ꪽ␣ꪜꪾ

For additional details see consonant_mappings.

Basic consonants

Basic consonant sounds in Tai Dam are written using the following letters.

The letters ʰ and ˡ below each character indicate whether the class is high or low.

ꪝ␣ꪜ␣ꪛ␣ꪚ␣ꪕ␣ꪔ␣ꪓ␣ꪒ␣ꪁ␣ꪀ␣ꪇ␣ꪆ␣ꪯ␣ꪮ␣ꪗ␣ꪖ␣ꪋ␣ꪊ␣ꪡ␣ꪠ␣ꪫ␣ꪪ␣ꪏ␣ꪎ␣ꪅ␣ꪄ␣ꪭ␣ꪬ␣ꪣ␣ꪢ␣ꪙ␣ꪘ␣ꪑ␣ꪐ␣ꪉ␣ꪈ␣ꪫ␣ꪧ␣ꪦ␣ꪩ␣ꪨ␣ꪥ␣ꪤ
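
Because the Unicode character names for the consonant letters include the class, the class of any letter can be recovered programmatically. The following is a small sketch, assuming a Python build whose Unicode data includes the Tai Viet block; `consonant_class` is an illustrative helper, not part of any library.

```python
import unicodedata

def consonant_class(ch):
    """Return 'high' or 'low' for a Tai Viet consonant letter, else None."""
    name = unicodedata.name(ch, "")
    if not name.startswith("TAI VIET LETTER "):
        return None
    if " HIGH " in name:
        return "high"
    if " LOW " in name:
        return "low"
    return None

# Count the classes across all encoded consonant letters, U+AA80..U+AAAF.
letters = [chr(cp) for cp in range(0xAA80, 0xAAB0)]
high = [c for c in letters if consonant_class(c) == "high"]
low = [c for c in letters if consonant_class(c) == "low"]
print(len(high), len(low))
```

Note that the block as a whole contains more letters than Tai Dam uses, since it also covers consonants needed by other languages.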

Other dialects

Three pairs of consonants are used for the Tai Dón language, but not for Tai Dam.[b,td] They are:

ꪟ␣ꪞ␣ꪍ␣ꪌ␣ꪃ␣ꪂ

Onsets

The consonant wa can appear immediately after the initial consonant in a syllable. It is written using ꪫ.

The pronunciation of a syllable containing WA in non-initial position can be ambiguous, unless there is a diacritic, since the WA may or may not be a final consonant.[b] Compare ꪀꪲꪫ ḵiw with ꪀꪫꪲ ḵwi, and ꪵꪀ꫁ꪫ ɛḵ²w (kʷɛ) with ꪵꪀꪫ꫁ ɛḵw² (kɛw).

In order to address the latter ambiguity, the character ꪥ is sometimes appended to the end of the sequence to indicate the second pronunciation, eg. ꪵꪁꪫꪥ. Since j never occurs after ɛ, this can be done without creating a new ambiguity. This spelling is only used in some dialects of the traditional script; however, it has been adopted as a standard in a project sponsored by the Son La Department of Science and Technology.[b]

The sound kʰʷ exists in Tai Dón, but not in Tai Dam. The sound kʷ exists in both languages.[b,td]

Finals

Syllable-final plosives are written using the following low class consonants. These create 'checked' syllables.

ꪀ␣ꪚ␣ꪒ

For open syllables ending with nasals or glides, the following high class consonants are used.

ꪣ␣ꪙ␣ꪉ␣ꪥ␣ꪫ
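
Since the two sets of final letters do not overlap, the syllable type can be read from the final consonant alone. This sketch classifies a single final letter using the two lists above; the function name is hypothetical, and it assumes the caller has already identified which character is the final (a plosive letter in onset position does not make a syllable checked).

```python
# Final plosives (low class): LOW KO, LOW BO, LOW DO.
CHECKED_FINALS = {"\uAA80", "\uAA9A", "\uAA92"}
# Final nasals and glides (high class): HIGH MO, HIGH NO, HIGH NGO, HIGH YO, HIGH VO.
OPEN_FINALS = {"\uAAA3", "\uAA99", "\uAA89", "\uAAA5", "\uAAAB"}

def final_kind(final):
    """Return 'checked' or 'open' for a final consonant letter, else None."""
    if final in CHECKED_FINALS:
        return "checked"
    if final in OPEN_FINALS:
        return "open"
    return None

print(final_kind("\uAA92"))  # checked
print(final_kind("\uAA89"))  # open
```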

In addition, several vowels carry a final consonant. See vowels. These include:

ꪾ␣ꪽ␣-ꪜꪾ␣ꪹ-ꪱ␣ꪼ␣ꪻ

Consonant clusters

Consonant clusters occur in the following circumstances:

When an initial consonant is labialised, see onsets.
Where a syllable ends with a consonant and another syllable begins.

No special characters or viramas are involved in either case. There are no conjunct forms or subjoined consonants.

Consonant sounds to characters

This section maps Tai Dam consonant sounds to common graphemes in the Tai Viet orthography.

The labels on the left show whether this consonant is high class, low class, or a coda.

Sounds listed as 'infrequent' are allophones, or sounds used for foreign words, etc. Light coloured characters occur infrequently.

high ꪝ

low ꪜ

coda ꪚ

coda ꪚꪾ

See also the rhymes for -ap.

high ꪛ

low ꪚ

high ꪕ

low ꪔ

coda ꪒ

tʰ

high ꪗ

low ꪖ

t͡ɕ

high ꪋ

low ꪊ

high ꪓ

low ꪒ

high ꪁ

low ꪀ

kon⁴

logograph ꫛ

high ꪇ

low ꪆ

high ꪯ

low ꪮ

coda ꪀ

high ꪡ

low ꪠ

high ꪫ

low ꪪ

high ꪏ

low ꪎ

high ꪅ

low ꪄ

high ꪭ

low ꪬ

high ꪣ

low ꪢ

coda ꪣ

coda ꪾ

high ꪙ

low ꪘ

coda ꪙ

coda ꪽ

nɨŋ⁵

logograph ꫜ

high ꪑ

low ꪐ

high ꪉ

low ꪈ

coda ꪉ

high ꪫ medial glide.

coda ꪫ See also the diphthongs ending in w.

high ꪧ

low ꪦ

high ꪩ

low ꪨ

high ꪥ

low ꪤ

coda ꪥ See also the diphthongs ending in j.

Glyph shaping & positioning

Experiment with examples using the Tai Viet character app.

Font styles

Glyph variants. The Tai Heritage Pro font also has font features that allow the following alternative glyph shapes for certain characters.

feature	code point	alternative shapes
`lcoa`	ꪊ
`htoa`	ꪕ
`hpho`	ꪟ
`auea`	ꪻ
`hoia`	꫞

Context-based shaping & positioning

Contextual positioning. Combining marks need to be positioned relative to the shape of the base that they are combined with. fig_vowp shows an example: the combining marks are higher to the right than the left, because of the size of the glyphs below.

Location of combining marks. The Tai Heritage Pro font offers a variant feature that allows placement of combining vowel signs and tones over the onset consonant, or over the final consonant in a closed syllable, see fig_vowp. The underlying sequence of code points is identical.

ꪕꪳ꪿ꪉ — Font feature `vowp` as default (left), and set to 2 (right).

Whereas the code point sequence remains the same for the example just shown, the same font feature can also be used to support a different code point sequence for AABE. By default, the code point order for the left-hand example in fig_vowp1 would be:

ꪊꪚꪾ

With the vowp feature set to 1, combining marks appear over the onset, except for this specific combination. This means that you can use the code point sequence:

ꪊꪾꪚ
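
Text that mixes the two code point orders could be normalised before searching or comparing. The sketch below converts between them; it is a heuristic only, assuming that LOW BO plus VOWEL AM directly after another consonant letter always represents -ap, and the function names are invented for illustration.

```python
import re

LOW_BO = "\uAA9A"    # TAI VIET LETTER LOW BO
VOWEL_AM = "\uAABE"  # TAI VIET VOWEL AM
CONS = "\uAA80-\uAAAF"  # all Tai Viet consonant letters

# Only swap when a consonant letter comes immediately before the pair,
# so that a syllable starting with LOW BO + AM after a space is left alone.
FINAL_ORDER = re.compile(f"(?<=[{CONS}]){LOW_BO}{VOWEL_AM}")
ONSET_ORDER = re.compile(f"(?<=[{CONS}]){VOWEL_AM}{LOW_BO}")

def to_onset_order(text):
    """Store AM before LOW BO, as allowed when the vowp feature is set to 1."""
    return FINAL_ORDER.sub(VOWEL_AM + LOW_BO, text)

def to_final_order(text):
    """Store AM after LOW BO, the default order described above."""
    return ONSET_ORDER.sub(LOW_BO + VOWEL_AM, text)

default = "\uAA8A\uAA9A\uAABE"
print([f"U+{ord(c):04X}" for c in to_onset_order(default)])
```

In unspaced text the heuristic may misread a syllable boundary, so any real conversion would need syllable segmentation first.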

Punctuation & inline features

Phrase & section boundaries

,␣.␣꫞␣꫟

phrase	,
sentence	.
poems	꫞ ꫟

Observation: The UDHR text contains regular ASCII punctuation, including commas, periods, and colons, as well as dashes to separate text. Some examples can be seen in the sample text at the start of this page.

1. ꪋꪴ ꫛ ꪝꪮꪣ ꪼꪒ ꪣꪲ ꪁꪫꪸꪙ ꪵꪮꪚ ꪭꪸꪙ - ꪼꪒ ꪹꪤꪸꪒ ꪕꪮꪥ ꪹꪊꪸ ꪶꪒ ꪤꪱꪫ ꪤꪴꪀ ꪹꪚꪱ ꪎꪸ ꪁꪱ ꪙꪮꪥ ꪹꪭꪸꪉ ꪁꪷ ꪼꪒ ꪵꪮꪚ ꪬꪮꪉ ꪋꪽ ꪔꪾꪣ ꪀꪾꪚ ꪤꪱꪫ ꪤꪴꪀ ꪘꪰꪉ ꪶꪠꪉ ꪶꪩ - ꪤꪱꪫ ꪤꪴꪀ ꪋꪽ ꪔꪾꪣ ꪭꪳ ꪵꪣꪙ ꪄꪮꪉ ꪄꪰꪒ ꪹꪭꪸꪉ, ꪤꪱꪫ ꪥꪴꪀ ꪀꪲ ꪗꪺꪒ ꪀꪾꪚ ꪝꪳꪉ ꪹꪉꪸ ꪭꪳ ꪹꪜꪸꪙ ꪼꪄ ꪀꪫꪱꪉ ꪀꪾꪚ ꪤꪱꪫ ꪥꪴꪀ ꪋꪽ ꪎꪴꪉ ( ꪀꪱꪫ ꪭꪮꪀ ) ꪹꪜꪸꪙ ꪕꪮꪥ ꪈꪫꪸꪙ ꪔꪰꪀ ꪹꪋꪷꪉ ꪝꪸꪉ ꪻꪬ ꪹꪚꪱ ꪁꪫꪱꪙ ꫛ ꪻꪒ ꪵꪮꪚ ꪼꪒ.

Observation: Example ASCII punctuation in UDHR. (source)

Poems & songs. The only punctuation in the Unicode Tai Viet block is for poems and songs: ꫞ marks the beginning and ꫟ marks the end of the text.

ꪉꪮꪙ ꪶꪕ ꪖꪳ 2 ꪣꪳ 05 ꪁꪾ ꪹꪚꪙ 8 ꪜꪲ 2019 – ꪁꪱꪫꪣ ꪶꪕ ꪵꪔ ꪶꪡꪉ ꪚꪱꪙ ꪙꪱ — Observation: Example of en-dash in Tai Viet. (source)

Bracketed text

(␣)

Tai Dam commonly uses ASCII parentheses to insert parenthetical information into text.

	start	end
standard	(	)

ꪉꪮꪙ ꪶꪕ ꪋꪴ ꪹꪋ ꫄ꪤꪙ ꪹꪣꪉ ꪀꪰꪚ ꪀꪺꪀ ꪶꪭꪥ (Ngon tô chu chựa dần mương cắp Quốc hội) — Observation: Examples of parentheses in Tai Viet. (source)

Abbreviation, ellipsis & repetition

ꫝ

Repetition. ꫝ indicates repetition of the previous word.

Notes, footnotes, etc

See inlinenotes for purely inline annotations, such as ruby or warichu. This section is about annotation systems that separate the reference marks and the content of the notes.
