Updated 10 January, 2020
This page gathers together basic information about the Tai Tham script and its use for the Khün language. It aims (generally) to provide an overview of the orthography and typographic features, and (specifically) to advise how to write Khün using Unicode; for greater detail, follow the footnote links (especially those with an arrow alongside them).
For character-specific details follow the links to the Tai Tham character notes. See also the Tai Khün character app, the Northern Thai character app, the All Tai Tham character app, and the notes on Northern Thai.
For similar information related to this and other scripts, see the script links pages.
Colours and annotations on panels listing characters are relevant to their use for the Tai Khün language.
Unless in parentheses, the transcriptions in italics that follow Tai Tham text are a transliteration developed for these pages. Those in parentheses are usually more standard transcriptions. Transcriptions in ⌈ brackets ⌋ may be phonemic or phonetic. (The transliterations may be slightly different for Northern Thai text.) Consonants with a line below in the transliteration are associated with low class by default. Mid class consonants have an inverted breve below.
ᨡᩳ᩶ 1 ᨣᩢ᩠ᨶᩉᩮᩖᩨᨠᩥ᩠ᨶ ᨣᩢᩐᩢᩣᨡᩣ᩠ᨿᨸᩮ᩠ᨶᨦᩫ᩠ᨶ ᨠᩮ᩠ᨷᩉᩬᨾᩋᩬᨾᩅᩱᩢᨯ᩠᩶ᨦᨶᩦ᩶ ᨴᩩᨠᪧᨸᩦᨾᩣᨷᩢᨡᩣ᩠ᨯ ᨧᩥ᩠᩵ᨦᨠ᩠ᨴᩣᩴᩉᩨ᩶ᨡᩮᩢᩣᨻᩳ᩵ᨾᩯ᩵ᩃᩪᨠ ᨷᩢᨯᩱᩢᨠᩢᩢ᩠ᨶᩈᩢ᩠ᨦᩈᩢ᩠ᨠᨩᩮᩨᩬ.
ᨡᩳ᩶ 2 ᨴᩩᨠᨤ᩠ᨶᩫᨾᩦᩈᩥᨴ᩠ᨵᩥᩓᩢᨻ᩠ᨦᩈᩁᨽᩣ᩠ᨷ ᨲᩣ᩠ᨾᨴᩦ᩵ᨯᩱᩢᨠ᩵ᩣ᩠ᩅᩅᩱᩢᨶᩱᨡᩳ᩶ᨠᨲᩥᨠᩣᩋ᩠ᨶᩢᨶᩦ᩶ ᨯᩰ᩠ᨿᨷᩢᨯᩱᩢᨲᩯ᩠ᨠᨲ᩵ᩣ᩠ᨦᨠ᩠ᨶᩢ ᨷᩢᩅᩤ᩵ᩋ᩠ᨶᩢᨯᩱ ᩉᩮ᩠ᨾᩨᩁᩅᩤ᩵ ᨩᩮ᩠ᩋᩨ᩶ᨩᩣ᩠ᨲ ᨹ᩠ᩅᩥ ᨻ᩠ᨿᨯ ᨽᩣᩈᩣ ᩈᩣᩈᨶᩣ ᨣ᩠ᩅᩣ᩠ᨾᨣ᩠ᨯᩧᩉ᩠ᨶᩢᨴᩤ᩠ᨦᨠᩣ᩠ᩁᨾᩮ᩠ᨦᩨ ᩉᩕᩨᨴᩤ᩠ᨦᩋ᩠ᨶᩨ᩵ᪧ ᨩᩣ᩠ᨲᨩᩮ᩠ᩋᩨ᩶ ᩈ᩠ᨦᩢᨣ᩠ᨾᩫ ᩈᩫ᩠ᨾᨷ᩠ᨲᩢ ᨩᩣ᩠ᨲᨠᩮ᩠ᨯᩨ ᩉᩕᩨᩈᨳᩣᨶᩋ᩠ᨶᩨ᩵ᪧ ᨳᩯ᩠ᨦᩢᨷᩕᨠᩣ᩠ᩁᩉ᩠ᨶᩧ᩵ᨦ ᨷᩢᨾᩦᨣ᩠ᩅᩣ᩠ᨾᨲᩯ᩠ᨠᨲ᩵ᩣ᩠ᨦᨠ᩠ᨶᩢ ᨷᩢᩅᩤ᩵ᨴᩤ᩠ᨦᨠᩣ᩠ᩁᨾᩮ᩠ᨦᩨ ᨠᩣ᩠ᩁᩈᩣ᩠ᩁᨠᩣ᩠ᩁᩃᩩᨾ ᨠᩣ᩠ᨦᨲ᩵ᩣ᩠ᨦᨷᩤ᩠ᨶᩢᨲ᩵ᩣ᩠ᨦᨾᩮ᩠ᨦᩨ ᩉᩕᩨᨠᩣ᩠ᩁᨶᩱᨯ᩠ᨶᩥᨯᩯ᩠ᨶᨴᩦ᩵ᨤ᩠ᨶᩫᩋᩣᩈᩱ᩠ᨿᩀᩪ᩵ ᨷᩢᩅᩤ᩵ᨯ᩠ᨶᩥᨯᩯ᩠ᨶᨶᩦ᩶ᨧᩢᨸᩮ᩠ᨶᩑᨠᩁᩣ᩠ᨩ ᩀᩪ᩵ᨶᩱᨣ᩠ᩅᩣ᩠ᨾᨸ᩠ᨠᩫᨣᩕ᩠ᩋᨦᨡ᩠ᩋᨦᨲ᩠ᨶᩫ ᩉᩕᩨᩀᩪ᩵ᨲᩱᩢᩋᩴᩣᨶᩣ᩠ᨧᨲ᩠ᨶᩫᨯᩱᪧᨴ᩠ᨦᩢᩈᩢ᩠ᨿᨦ
The script called Tai Tham is used for three living languages, Lue, Khuen, and Northern Thai, which are spoken in China, Myanmar, Northern Thailand, and surrounding areas. In addition, the script is used for Lao Tham (or Old Lao) and other dialect variants found in Buddhist palm leaves and notebooks. Although the script has no single, commonly recognized name across the region today, it is known by various language-specific and region-specific names, such as Old Xishuangbanna Dai or Old Tai Lue in China, Khün in Myanmar, and Tua Mueang, Lanna, or Yuan in Thailand.
Few of the six million speakers of Northern Thai are literate in the Tai Tham script, although there is some rising interest in the script among the young. There are about 690,000 speakers of Tai Lue. Of those, many people born before 1950 are literate in the Tai Tham script, and newspapers and other literature are regularly produced in the Xishuangbanna region of Yunnan using the script. Younger speakers are taught the New Tai Lue script, instead. ... The Tai Tham script continues to be taught in the Tai Lue monasteries. There are 107,000 speakers of Khün, for which Tai Tham is the only script.
The Tai Tham script, Lanna script (Thai: อักษรธรรมล้านนา) or Tua Mueang (Lanna: ᨲ᩠ᩅᩫᨾᩮᩥᩬᨦ, Northern Thai pronunciation: [tǔa.mɯ̄aŋ]; Tai Lü: ᨲ᩠ᩅᩫᨵᨾ᩠ᨾ᩼, Tham, "scripture"), is used for three living languages: Northern Thai (that is, Kham Mueang), Tai Lü and Khün. In addition, the Lanna script is used for Lao Tham (or Old Lao) and other dialect variants in Buddhist palm leaves and notebooks. The script is also known as Tham or Yuan script.
The Tai Tham script is an abugida, ie. consonants carry an inherent vowel sound that is overridden using vowel signs. In Khün, consonants carry an inherent vowel a. See the table to the right for a brief overview of features, taken from the Script Comparison Table.
The following list describes some distinctive characteristics of the Tai Tham script.
Although the same code points are used, there are some significant and consistent differences in the glyph shapes used for characters in the Tai Khün (top) and Northern Thai (bottom) repertoires.
The panel below shows the differences for the consonants for the representative fonts used for this page.
Tai Tham script is written horizontally and left to right.
Words in the Tai languages are mostly monosyllabic; however, multi-syllable borrowings and compound words occur.
Owen describes the stressed phonological syllable in Khün as C(C)V(C)T. The second consonant in an initial cluster is highly restricted. The onset consonant can be a glottal stop. Unstressed syllables are CVT, where the vowel is normally a.
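The C(C)V(C)T template can be sketched as a regular expression over a phonemic transcription. This is only an illustration: it assumes the sound inventory listed in this section, assumes that the only initial clusters are kw and kʰw, and assumes that tones are written as superscript digits 1 to 6, as in the transcriptions on this page. The function name is hypothetical.

```python
import re

# Aspirated stops come first so that "kʰ" is not read as "k" plus a stray "ʰ".
ONSET = r"(?:pʰ|tʰ|cʰ|kʰ|[pbtdckʔfshmnŋjrlw])"
CLUSTER = r"(?:kʰw|kw)"        # assumption: only kw and kʰw clusters occur
VOWEL = r"[iɯueɤoɛaɔ]ː?"       # a vowel, optionally marked long
CODA = r"[ptkʔmnŋwj]"          # final stops, nasals and approximants
TONE = r"[¹²³⁴⁵⁶]"             # assumption: superscript tone numbers 1-6

SYLLABLE = re.compile(f"(?:{CLUSTER}|{ONSET}){VOWEL}{CODA}?{TONE}")


def fits_syllable_template(ipa: str) -> bool:
    """Return True if the whole string is one C(C)V(C)T syllable."""
    return SYLLABLE.fullmatch(ipa) is not None


print(fits_syllable_template("kwaːj²"))  # cluster onset, long vowel, coda, tone
print(fits_syllable_template("eːk³"))    # no onset consonant is written here
```

Because the template requires an onset, a transcription that leaves an initial glottal stop unwritten will not match until the ʔ is supplied.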
Dialects differ, and experts also differ, but the basic sounds of Khün appear, broadly speaking, to include the following.
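Onset consonants:
Stops: p pʰ b k kʰ t tʰ d c cʰ ʔ
Fricatives: f s h
Nasals: m n ŋ
Approximants: j (r) l w
Vowels:
i iː ɯ ɯː u uː
e eː ɤ ɤː o oː
ɛ ɛː a aː ɔ ɔː
Final consonants:
Stops: p t k ʔ
Nasals: m n ŋ
Approximants: w j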
Final ʔ occurs only after short vowels.
These tables use a hyphen to indicate the location of a consonant or consonant cluster. Prescript vowel-signs have been stored before the hyphen because of the limitations of the font, but in reality all vowel-signs should occur after the consonant they modify.
|Independent||Open syllables||Closed syllables|
|High||short||ʔi ᩍ||ʔu ᩏ||iʔ -ᩥ||ɯʔ -ᩧ||uʔ -ᩩ||i -ᩥ-||ɯ -ᩧ-||u -ᩩ-|
|long||ʔi: ᩎ||ʔu: ᩐ||i: -ᩦ||ɯː -ᩨ||u: -ᩪ||i: -ᩦ-||ɯː -ᩨ-||u: -ᩪ-|
|Upper-mid||short||eʔ ᩮ-ᩡ||ɤʔ ᩮ-ᩬᩨᩡ||oʔ ᩰ-ᩡ -᩠ᩅᩫᩡ|
|long||ʔe: ᩑ||ʔo: ᩒ||e: ᩮ-||ɤ: ᩮ-ᩬᩨ||o: ᩰ- -᩠ᩅᩫ||e: ᩮ-- -᩠ᨿ-||ɤ: ᩮ-ᩨ-||o: ᩰ-- -᩠ᩅ-|
|Lower-mid||short||ɛʔ ᩯ-ᩡ||ɔʔ ᩰ-ᩬᩡ||ɔ -ᩫ-|
|long||ɛ ᩯ-||ɔ: -ᩳ||ɛ ᩯ--||ɔ:-ᩬ-|
|ʔa ᩋ||aʔ - -ᩡ||a -ᩢ-|
|long||ʔa: ᩋᩣ||a: -ᩣ -ᩤ||a: -ᩣ- -ᩤ-|
Green indicates that the same character(s) are used for open syllables
In the absence of any other consonant, short vowels are always followed by a glottal stop. Long vowels never occur before a glottal stop.o143
Rhymes ending in approximants.
Most rhymes ending in an approximant (j or w) are regular combinations of vowel nucleus and approximant coda. These include -iw -e:w -ɛːw -aːw -ɯj -ɯːj -ɤj -ɤːj -aːj -uj.o152
The following special forms exist.o153
|Front • Central • Back|
|Upper-mid||e:w -ᩴ᩠ᨿ||o:j -᩠ᩅ᩠ᨿ|
|Low||aw ᩮ-ᩢᩣ aj ᩱ- ᩱ-᩠ᨿ|
The Unicode proposal contains the following diphthong -ia ᩮ-᩠ᨿ.e4-5
Owen says that most archaic diphthongs have become plain vowels in Khün speech. Remnants of the former sounds can still be seen, however, in the earlier table, where eː ɤː oː each have a second written form made up of a sequence of vowel-signs.
The inherent vowel is usually transcribed and pronounced as a. So ᨠ is pronounced ka.
Other than the inherent vowel, vowel sounds that follow a consonant sound are represented using vowel-signs, eg. ᨠᩥ ki. This includes diphthongs, 5 prescript signs, and 22 circumgraphs.
Characters that produce vowel signs are all combining characters. In principle a single character is used per base consonant, but vowel signs can be combined to create additional sounds.
All vowel-signs are typed and stored after the base consonant, whether or not they precede it when displayed. The font takes care of the glyph positioning.
About half of the vowel-signs are spacing marks, meaning that they consume horizontal space when added to a base consonant.
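The storage order and the spacing/non-spacing distinction described above can be inspected with Python's standard `unicodedata` module. The sketch below is illustrative only; the helper names are hypothetical, and the general categories reported depend on the Unicode data shipped with the Python build in use.

```python
import unicodedata

KA = "\u1A20"      # TAI THAM LETTER HIGH KA
SIGN_E = "\u1A6E"  # TAI THAM VOWEL SIGN E, displayed to the left of the base
SIGN_I = "\u1A65"  # TAI THAM VOWEL SIGN I, displayed above the base


def describe(text: str) -> list[str]:
    """List each code point with its Unicode name and general category."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})"
        for ch in text
    ]


def is_spacing_mark(ch: str) -> bool:
    """True for spacing combining marks (category Mc)."""
    return unicodedata.category(ch) == "Mc"


# The base consonant is stored first, even though the vowel is drawn before it.
syllable = KA + SIGN_E
for line in describe(syllable):
    print(line)
```

Here `is_spacing_mark(SIGN_E)` is true while `is_spacing_mark(SIGN_I)` is false, which reflects the split between marks that take up horizontal space and marks that sit above or below the base.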
In the following list, transcriptions indicate which vowel-signs are used by Tai Khün.
ᩤ [U+1A64 TAI THAM VOWEL SIGN TALL AA] and ᩣ [U+1A63 TAI THAM VOWEL SIGN AA] both represent the same phoneme. The choice between them is a matter of spelling. The taller version is typically (Owen says only, for Khün o152) used after ᨷ ᩅ ᨴ ᨵ ᨣ, and avoids confusion with otherwise similar shapes, eg. ᩅᩣ looks like ᨲ. Some textbooks also recommend its use after ᨧ ᨻ ᩁ ᨽ. e
Northern Thai also uses ᩲ [U+1A72 TAI THAM VOWEL SIGN THAM AI]; however, ᩭ [U+1A6D TAI THAM VOWEL SIGN OY] is not used in Northern Thai.u655
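As a rough illustration of the choice between the two AA signs, the following sketch picks the tall form after the five bases listed above and the ordinary form otherwise. It is a simplification under stated assumptions: actual spelling is lexical, the helper name is hypothetical, and the sketch only looks at a single base consonant, not at stacks.

```python
AA = "\u1A63"       # TAI THAM VOWEL SIGN AA
TALL_AA = "\u1A64"  # TAI THAM VOWEL SIGN TALL AA

# Bases listed above as taking the tall form in Khün:
# BA, WA, LOW TA, LOW THA, LOW KA.
TALL_AA_BASES = {"\u1A37", "\u1A45", "\u1A34", "\u1A35", "\u1A23"}


def aa_sign_for(base: str) -> str:
    """Return the AA sign conventionally written after a single base consonant."""
    return TALL_AA if base in TALL_AA_BASES else AA


print(f"U+{ord(aa_sign_for(chr(0x1A45))):04X}")  # sign chosen after WA
print(f"U+{ord(aa_sign_for(chr(0x1A20))):04X}")  # sign chosen after HIGH KA
```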
The sequence ᩠ + ᨿ [U+1A60 TAI THAM SIGN SAKOT + U+1A3F TAI THAM LETTER LOW YA] is pronounced as the diphthong ia in Northern Thai and as eː in Khün when it appears alone.
The sequence ᩠ + ᩅ [U+1A60 TAI THAM SIGN SAKOT + U+1A45 TAI THAM LETTER WA] is pronounced as the diphthong ua when it appears alone.
Both of these characters also appear as a part of the combinations described below.
The various letters used for vowels can be mixed together to produce additional sounds, as shown in the examples below for sequences used in Khün. This list doesn't include rhymes that end in WA and YA in the normal way that codas are formed.
The following list shows where vowel-signs are positioned around a base consonant to produce vowels, and how many instances of that pattern there are. The figure after the + sign represents combinations of Unicode characters. The list includes subjoined WA and YA and the postfixed ᩋ.
Vowel components can occur concurrently on 4 sides of the base, eg. ᩅᩮᩬᩨᩡ.
Distribution of vowel elements is as follows:
|ᩢ ᩫ ᩳ ᩴᩘ|
|ᩮ ᩯ ᩱ ᩰ ᩲ||ᩡ ᩅ ᩣ ᩤ ᩋ||ᩡ|
|ᩩ ᩪ ᩬ||ᩭ ᩠ᨿ|
In Tai Tham some vowels that are not preceded by a consonant can be represented using a set of independent vowel letters. The set includes a character to represent the inherent vowel sound, but doesn't cover all such possible vowels.
The 6 vowel letters are used in syllable-initial position where there is no consonant onset, eg. ᩑᨠ ẹk eːk³ one.
The use of independent vowels is lexically constrained: for other vowels not preceded by a consonant Tai Tham uses ᩋ [U+1A4B TAI THAM LETTER A] with a vowel-sign, eg. ᩋᩧ᩠ᨷ ʔ̯ɯ˖b̯ ɯp² starve.
ᩒ [U+1A52 TAI THAM LETTER OO] is not used in Northern Thai.u654
The lists below show all consonants in the Tai Khün repertoire.
The consonants are associated with high, mid, or low class values related to tone. (Low class consonants are indicated using an underline in the transliteration, and mid class by an inverted breve.)
p ᨸ ᨻ
t ᨭ ᨴ
tʰ ᨳ ᨮ ᨵ ᨰ
k ᨠ ᨣ
kʰ ᨡ ᨥ ᨤ
|Fricative||f ᨺ ᨼ||s ᩈ ᩆ ᩇ ᨪ||h ᩉ ᩌ|
|Nasal||m ᨾ||n ᨶ ᨱ||ŋ ᨦ|
|Approximant||w ᩅ||l ᩃ ᩊ||j ᩀ ᨿ|
cʰ seems to be regarded as not a native Khün sound, but rather associated with reading the alphabet out loud and in learned pronunciation of Pali loanwords.o142
There seems to be general agreement that initial consonant clusters are limited to kw and kʰw, although some have found other sounds in words loaned from Burmese, Sanskrit or Pali.o142
The initial glottal stop is also pronounced before independent vowels, though they are not listed here.
Syllable-final consonants. o143
p̚ ᨸ ᨻ
||t̚ ᨭ ᨴ||k̚ ᨠ ᨣ||ʔ|
|Nasal||m ᨾ||n ᨶ ᨱ||ŋ ᨦ|
|Approximant||w ᩅ||j ᩀ ᨿ|
The glottal stop is unwritten and non-phonemic, but is pronounced after a short, open vowel.
High and low consonants usually come in pairs, but where they don't the high variant is normally given by subjoining the low consonant below ᩉ [U+1A49 TAI THAM LETTER HIGH HA], eg. ᩉ᩠ᨶᩧ᩵ᨦ h˖ṉɯ¹ṉ̇ nɯɲ³ one.
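The construction just described can be sketched as a code point sequence: HIGH HA, then SAKOT, then the low class consonant. The helper name is hypothetical, and the example word is assumed to follow the sequence implied by the transliteration given above.

```python
HIGH_HA = "\u1A49"  # TAI THAM LETTER HIGH HA
SAKOT = "\u1A60"    # TAI THAM SIGN SAKOT


def high_class_variant(low_consonant: str) -> str:
    """Subjoin a low class consonant below HIGH HA to give a high class onset."""
    return HIGH_HA + SAKOT + low_consonant


# Assumed spelling of the example above: HIGH HA, subjoined NA,
# VOWEL SIGN UE, TONE-1, then final NGA.
NA, SIGN_UE, TONE_1, NGA = "\u1A36", "\u1A67", "\u1A75", "\u1A26"
word = high_class_variant(NA) + SIGN_UE + TONE_1 + NGA
print(" ".join(f"U+{ord(ch):04X}" for ch in word))
```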
A few consonants have different phonetic realisations in Northern Thai, and ᨢ [U+1A22 TAI THAM LETTER HIGH KXA] is used in Northern Thai but not by Tai Khün.
ᩋ [U+1A4B TAI THAM LETTER A] represents a glottal stop. It can be used with vowels at the beginning of a syllable, eg. ᩋᩧ᩠ᨷ ʔ̯ɯ˖b̯ ɯp² starve. Note that this can have very different shapes, as shown by the Northern Thai font (ᩋ) and the Khün font (ᩋ).
These combining characters are used to represent the second consonant in syllable-initial clusters.
In addition, a subjoined w̱ is often found in a syllable-initial cluster, eg. ᨣ᩠ᩅᩣ᩠ᨿ ḵ˖w̱ā˖ɲ̱̇ kwaːj² buffalo.
Other syllable-initial clusters include the combination of ᩉ [U+1A49 TAI THAM LETTER HIGH HA] plus a subjoined low class consonant to give a high class version, as mentioned just above.
Medial characters are useful, as they can signal the difference between a consonant cluster and an initial-final sequence, ie. using a subjoined l. Some fonts, however, don't make that distinction clear.h
Tai Tham text commonly renders syllable-final consonants using regular consonant code points, eg. ᩑᨠ ẹk eːk³ one, but sometimes the special combining characters shown below are used.
When regular consonants are used they are commonly subjoined, eg. ᨠᩣ᩠ᩁ kā˖r kan¹ work, but not always. For example, when preceded by a subscript vowel a final consonant may be rendered on the baseline, eg. ᩃᩪᨠ ḻūk luːk⁴ offspring. On the other hand, sometimes in Khün the consonant is subjoined and the subscript vowel is moved to the side of the stack, eg. ᨧ᩠ᨷᩪ c˖b̯ū cuːp³ kiss. (In Lanna, all three may be stacked, ie. ᨧ᩠ᨷᩪ.) In either case, due to font design or USE (the Universal Shaping Engine) the characters may have to be typed in an order that departs from the spoken order so that they look as expected.
The following diacritics are sometimes used for syllable-final consonants. For more details about usage, click on the characters and follow the links to the character notes.
Owen says that the superscript consonants in Khün are limited to final r ( ᩺ [U+1A7A TAI THAM SIGN RA HAAM]) and ŋ ( ᩙ [U+1A59 TAI THAM CONSONANT SIGN FINAL NGA]) in syllables where a subscript vowel prevents the use of a subscript final consonant. Superscript forms are mainly found in handwritten text, whereas regular forms of these consonants in postscript position are the norm for printed texts. o145
᩺ [U+1A7A TAI THAM SIGN RA HAAM] has a different shape in Northern Thai, where it is used not for syllable-final consonants but as a silence marker. Note that a syllable-final r is pronounced n.
ᩴ [U+1A74 TAI THAM SIGN MAI KANG] may be regarded as a vowel, but it doesn't introduce any vowel sound other than the inherent vowel when used above a consonant on its own.
ᩘ [U+1A58 TAI THAM SIGN MAI KANG LAI] has very different shapes in Northern Thai (ᩅᩘ) and Khün (ᩅᩘ).
ᩜ [U+1A5C TAI THAM CONSONANT SIGN MA], ᩝ [U+1A5D TAI THAM CONSONANT SIGN BA] and ᩞ [U+1A5E TAI THAM CONSONANT SIGN SA] appear to be alternative shapes for the normal subjoined consonants, used per writer preference (follow the links for more information). The latter two are rarely used in Khün.
The first of these is a special-use consonant diacritic. The second two are ligatures.
ᩛ [U+1A5B TAI THAM CONSONANT SIGN HIGH RATHA OR LOW PA] represents two different functions with the same appearance. It represents ᨮ [U+1A2E TAI THAM LETTER HIGH RATHA] in ᩈᨱᩛᩣ᩠ᨶ sṇ̱ᵽā˖ṉ shape. e And it represents ᨻ [U+1A3B TAI THAM LETTER LOW PA] in ᩋᨾᩛ ʔ̯m̱ᵽ mango. (Compare with the somewhat rare subjoined form, eg. ᨷᩢᨱ᩠ᨻᨷᩩᩁᩩᩇ b̯áṇ̱˖p̄b̯uruṣ disciple.)e
Khün uses ᨭᩛ, which is ᨭ + ᩛ [U+1A2D TAI THAM LETTER RATA + U+1A5B TAI THAM CONSONANT SIGN HIGH RATHA OR LOW PA] instead of ᨮ [U+1A2E TAI THAM LETTER HIGH RATHA].e
ᩓ [U+1A53 TAI THAM LETTER LAE] represents the combination ᩃ + ᩯ [U+1A43 TAI THAM LETTER LA + U+1A6F TAI THAM VOWEL SIGN AE], eg. ᩈᩮᩓ᩠ᩅ᩶ selₔ˖w̱² seː¹lɛːw⁶ already.
ᩔ [U+1A54 TAI THAM LETTER GREAT SA] represents geminated ᩈ [U+1A48 TAI THAM LETTER HIGH SA].
ᩚ [U+1A5A TAI THAM CONSONANT SIGN LOW PA] moves the stack upwards. The normal rendering of kp̄˖p̄ʰ would be ᨠᨻ᩠ᨽ, but in some Tai Lü words this sign is used instead, eg. ᨠᨽᩚ.e Note, however, that this changes the typing order of the consonants, since a combining character has to be typed after the base. The transliteration now becomes kp̄ʰp̆. e
Tai Tham is unusual in that subjoined consonants do not only appear where there are consonant clusters. There is a natural tendency to attempt to stack consonants, usually 2 high, whenever possible.
᩠ [U+1A60 TAI THAM SIGN SAKOT] is an invisible character used to produce the subjoined form of a consonant, eg. ᨠᨠ vs. ᨠ᩠ᨠ.
Unlike the virama in most Brahmi-derived scripts, sakot doesn't necessarily kill the vowel between two consonants, nor does it create conjuncts in the sense of merged shapes. For example, in ᨨ᩠ᩃᩣ᩠ᨯ cʰ˖ḻā˖d̯ cʰa² laːt⁴ clever the inherent vowel after c is not suppressed.
Also unusually, sakot can follow a vowel-sign. For example, in ᩈᩣ᩠ᨾ sā˖m̱ saːm¹ three the sakot is used to position the final consonant in the syllable below the vowel-sign. This is quite common. A subjoined consonant can also follow a digit, eg. ᪓᩠ᨴ 3̣˖ṯ saːm tiː three times.
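To make the stored order concrete, the sketch below builds the word for "three" from the example above as base consonant, vowel-sign, SAKOT, and final consonant. The helper name is hypothetical; the point is only that SAKOT precedes the consonant it subjoins, even when a vowel-sign comes between the base and the subjoined letter.

```python
SAKOT = "\u1A60"  # TAI THAM SIGN SAKOT


def subjoin(consonant: str) -> str:
    """Return the sequence that requests the subjoined form of a consonant."""
    return SAKOT + consonant


HIGH_SA, SIGN_AA, MA = "\u1A48", "\u1A63", "\u1A3E"

# saːm "three": base, vowel-sign, then the subjoined final consonant.
three = HIGH_SA + SIGN_AA + subjoin(MA)
print([f"U+{ord(ch):04X}" for ch in three])
```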
Tai Tham will usually attempt to subjoin non-initial consonants, although generally only two characters deep. Sequences of 2 subjoined characters exist, but in Tai Khün the second subjoined character joins to the right of the stack, rather than sitting below it, eg. ᨠ᩠ᩅ᩠ᨿᩁ k˖w̱˖ȳr kweːn¹ ox-cart. In Northern Thai, however, they may all be stacked, eg. ᨠ᩠ᩅ᩠ᨿᩁ
A consequence of this shallow subjoining is that a subscript vowel will typically cause a final consonant to not be subjoined, eg. ᩃᩪᨠ ḻūk luːk⁴ offspring. However, this is not always the case. In Khün ᨧ᩠ᨷᩪ c˖b̯ū cuːp³ kiss, the final b is subjoined under the onset consonant, and the normally subscript vowel is moved to the side. (In Northern Thai, however, this may be displayed as a single stack, ie. ᨧ᩠ᨷᩪ.)
Subjoined consonants are not only syllable-final consonants. The first consonant in a following syllable may also be subjoined, eg. ᨳ᩠ᨶ᩻ᩫᩁ tʰ˖ṉ᩻ɔ̈ṟ tʰanon path (final r is pronounced as n). e u654
Not all consonants traditionally have subjoined forms, but modern innovations in borrowed terminology suggest that fonts should provide them for all consonants except the old vocalic letters. u654
With the high/low categorisation of consonants, Tai Tham writing needs only the two combining tone marks below to indicate one of 6 possible phonetic tones.
The Tai Tham Unicode block contains 3 more tone marks, although they are rarely used.
Owen describes various studies of tones in Khün which reach slightly different conclusions.→o In addition, some studies conclude that there are 6 tones in total, and others 5. The table below shows Owen's 6-tone system.
|mid glottalised||˧˧ʔ||33ʔ||kaː˧˧ʔ dance|
|high falling||˥˩||51||kaː˥˩ trade|
If there is a vowel over or below a consonant or consonant stack, the tone mark follows the vowel in storage, and is displayed above or alongside the vowel.
Otherwise, the tone is input after the consonant, ie. before a vowel sign that is displayed to the right or below, and appears over the consonant.e
The default fonts used here expect the tone mark to be typed after a left-side (prescript) vowel, if there is one; after a vowel above, if there is one; and before a vowel to the right. The order doesn't seem to matter with respect to a vowel below. See this test. Noto agrees except for left-side vowels.
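One way to approximate these ordering rules in code is to place the tone mark after the last non-spacing vowel-sign (one drawn above or below the base), or directly after the base when there is none. This is a sketch under assumptions, not a statement of what any font requires: the function name is hypothetical, Unicode general category Mn is used as a stand-in for "above or below", and a syllable whose only vowel-sign is a left-side one is handled the way Noto is reported to expect, which differs from the default fonts used on this page.

```python
import unicodedata


def with_tone(base: str, vowel_signs: list[str], tone: str) -> str:
    """Insert a tone mark after the last Mn vowel-sign, or right after the
    base when every vowel-sign is a spacing mark."""
    last_mn = -1
    for i, sign in enumerate(vowel_signs):
        if unicodedata.category(sign) == "Mn":
            last_mn = i
    head = vowel_signs[: last_mn + 1]  # signs the tone mark follows
    tail = vowel_signs[last_mn + 1:]   # signs typed after the tone mark
    return base + "".join(head) + tone + "".join(tail)


HIGH_KHA, SIGN_E, MAI_SAT, SIGN_AA = "\u1A21", "\u1A6E", "\u1A62", "\u1A63"
TONE_2 = "\u1A76"  # TAI THAM SIGN TONE-2

# Tone goes after the vowel above (MAI SAT) and before the vowel to the right.
rice = with_tone(HIGH_KHA, [SIGN_E, MAI_SAT, SIGN_AA], TONE_2)
print(" ".join(f"U+{ord(ch):04X}" for ch in rice))
```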
The following chart shows how to tell which tones are associated with a syllable.
Tai Tham has many combining marks. Here we list those that are not already listed in one of the following sections: vowelsigns, medials, finals, specials, and tones.
᩠ [U+1A60 TAI THAM SIGN SAKOT] is used to create a subjoined consonant. See subjoined.
᩼ [U+1A7C TAI THAM SIGN KHUEN-LUE KARAN] is written over a consonant (normally in final position) when that consonant is not to be pronounced. Frequently used in loans from languages with consonant clusters in the coda such as Pali, eg. ᩈᩫ᩠ᨾᨷᩪᩁ᩠᩼ᨱ sɔ̈˖m̱b̯ūr˖˚ṇ̱ som¹buːn² perfect or English ᨼᩥ᩠ᩃ᩼ᨾ f̱i˖ḻ˚m̱ fim² film.o149 Northern Thai would use ᩺ [U+1A7A TAI THAM SIGN RA HAAM], ie. ᨼᩥ᩠ᩃ᩺ᨾ f̱i˖ḻ˟m̱ fim².p
᩻ [U+1A7B TAI THAM SIGN MAI SAM] is used in Northern Thai either to indicate repetition of a word, eg. ᨲ᩵ᩣ᩠ᨦ᩻ t¹ā˖ŋ̱ʻ taːŋ taːŋ different in my view, or to identify double-acting consonants, or to indicate that a subjoined consonant begins a new syllable, eg. compare ᨳᩫ᩠ᨶᩁ tʰɔ̈˖ṉr tʰonra with ᨳᩫ᩠ᨶ᩻ᩁ tʰɔ̈˖ṉʻr tʰanon path (final r is pronounced as n).e
᩿ [U+1A7F TAI THAM COMBINING CRYPTOGRAMMIC DOT] is used in Northern Thai, singly or multiply beneath letters to give each letter a different value according to some hidden agreement between reader and writer.u655
The shapes of some of the above are significantly different in Northern Thai.
These characters are all described in boundaries.
The meaning of each of the logographs is shown above.
Two sets of digits are in common use: a secular set (Hora) and an ecclesiastical set (Tham). European digits are also found in books. u655
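Both digit sets carry decimal values in the Unicode character database, so a run of Hora or Tham digits can be converted to a number with the standard `unicodedata` module. The function name below is hypothetical, and the sketch does not check that a number uses only one of the two sets.

```python
import unicodedata


def tai_tham_number(text: str) -> int:
    """Convert a run of Hora or Tham digits to an integer.

    Raises ValueError if a character has no decimal digit value.
    """
    return int("".join(str(unicodedata.decimal(ch)) for ch in text))


hora_twelve = "\u1A81\u1A82"  # TAI THAM HORA DIGIT ONE, HORA DIGIT TWO
tham_three = "\u1A93"         # TAI THAM THAM DIGIT THREE
print(tai_tham_number(hora_twelve), tai_tham_number(tham_three))
```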
Are special glyph forms needed, depending on the context in which a character is used? Do glyphs interact in some circumstances?
Are there requirements to position diacritics or other items specially, depending on context? Does the script have multiple diacritics competing for the same location relative to the base?
There is not so much contextual shaping in Tai Tham as in many other Brahmi-descended scripts. One particularly noticeable example of contextual shaping is the realisation of ᨬ + ᩠ + ᨬ [U+1A2C TAI THAM LETTER NYA + U+1A60 TAI THAM SIGN SAKOT + U+1A2C TAI THAM LETTER NYA], which moves the initial character upwards rather than subjoining the second, ie. ᨬ᩠ᨬ.
Another common ligature is ᨶᩣ, which is composed of ᨶ + ᩣ [U+1A36 TAI THAM LETTER NA + U+1A63 TAI THAM VOWEL SIGN AA]. It forms even when NA has non-spacing subscripts, and even MEDIAL RA, eg. ᩋᩫᨶ᩠ᨲᩕᩣ᩠ᨿ ʔ̯oṉ˖tr̆ā˖ɲ̱̇ ʔontʰalaːi danger. Pali must regularly handle the nominative singular ending for present participles, ᨶ᩠ᨲᩮᩣ ṉ˖teā.r
Placement of tone marks often involves special shaping and positioning. See the positions in the examples below.
In the A Tai Tham KH New font, a 2nd-tone mark following ᩢ [U+1A62 TAI THAM VOWEL SIGN MAI SAT] loses its uptick to create two parallel lines, eg. ᨡᩮᩢ᩶ᩣ kʰeá²ā kʰaw⁵ rice.
Does the script have special requirements for baseline alignment between mixed scripts and in general?
Are italicisation, bolding, oblique, etc relevant? Do italic fonts lean in the right direction? Is synthesised italicisation problematic? Are there other problems relating to bolding or italicisation - perhaps relating to generalised assumptions of applicability?
If the script is bicameral, are there special rules about case conversion? Are there other correspondences between glyphs, such as half- vs fullwidth presentation forms?
Do Unicode grapheme clusters appropriately segment character units for the script? Are there special requirements when double-clicking on the text, or moving through the text with the cursor, or backspace, etc.?
Are words separated by spaces, or other characters? Are there special requirements when double-clicking on the text? Are words hyphenated?
Spaces separate phrases. There is no separation of individual words.
A new word may start with a subjoined consonant. Stacking is performed across word boundaries. This means that operations such as line-breaking, word highlighting, etc. have to use an orthographic syllable unit which differs from the underlying phonetic syllables.
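A very rough approximation of such orthographic units is to keep every combining mark with the preceding base, and to keep a consonant that follows SAKOT in the same unit as the stack it joins. The sketch below does only that; it is not a word or syllable segmenter, the function name is hypothetical, and it relies on the general categories (Mn, Mc) in Python's `unicodedata` module.

```python
import unicodedata

SAKOT = "\u1A60"  # TAI THAM SIGN SAKOT


def orthographic_clusters(text: str) -> list[str]:
    """Split text into units that should not be broken or split by selection."""
    clusters: list[str] = []
    for ch in text:
        is_mark = unicodedata.category(ch) in ("Mn", "Mc")
        # A mark attaches to the current unit; so does a letter right after SAKOT.
        if clusters and (is_mark or clusters[-1].endswith(SAKOT)):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# LA + UU + HIGH KA: the final consonant sits on the baseline, so two units.
print(len(orthographic_clusters("\u1A43\u1A6A\u1A20")))
# HIGH SA + AA + SAKOT + MA: the final is subjoined, so one unit.
print(len(orthographic_clusters("\u1A48\u1A63\u1A60\u1A3E")))
```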
What characters are used to indicate the boundaries of phrases, sentences, and sections?
The following punctuation marks have "progressive values of finality".
European punctuation marks such as question marks, exclamation marks, parentheses, and quotation marks are also used.
᪣ [U+1AA3 TAI THAM SIGN KEOW], ᪤ [U+1AA4 TAI THAM SIGN HOY], ᪥ [U+1AA5 TAI THAM SIGN DOKMAI], and ᪭ [U+1AAD TAI THAM SIGN CAANG] are all used as section starters, sometimes in conjunction with other punctuation, eg. ᪩᪥᪩ and ᪭ᩣ. e
To close a section, use ᪦ [U+1AA6 TAI THAM SIGN REVERSED ROTATED RANA] and/or ᪬ [U+1AAC TAI THAM SIGN HANG], eg. ᪦᪦᪩, ᪩᪦᪩, ᪩᪦᪩᪬, and ᪦᪦᪬.
What characters are used as parentheses, or to bracket information?
What characters are used to indicate quotations? Do quotations within quotations use different characters? What characters are used to indicate dialogue?
What characters are used to indicate abbreviation, ellipsis & repetition?
How are emphasis and highlighting achieved? If lines are drawn alongside, over or through the text, do they need to be a special distance from the text itself? Is it important to skip characters when underlining, etc? How do things change for vertically set text?
What mechanisms, if any, are used to create inline notes and annotations? (For referent-type notes such as footnotes, see below.)
Are there special rules about the way text wraps when it hits the end of a line? Does line-breaking wrap whole 'words' at a time, or characters, or something else (such as syllables in Tibetan and Javanese)? What characters should not appear at the end or start of a line, and what should be done to prevent that?
Opportunities for line breaking are lexical, but a line break may not be inserted between a base letter and a combining diacritic. u656
There is no insertion of visible hyphens at line boundaries. u656
Character properties. Characters used for the Tai Khün language have the following assignments related to line-break properties.
|NU||20||᪀ ᪁ ᪂ ᪃ ᪄ ᪅ ᪆ ᪇ ᪈ ᪉ ᪐ ᪑ ᪒ ᪓ ᪔ ᪕ ᪖ ᪗ ᪘ ᪙|
|SA||96||ᩮ ᩯ ᩱ ᩰ ᩣ ᩤ ᩡ ᩢ ᩥ ᩦ ᩧ ᩨ ᩩ ᩪ ᩭ ᩫ ᩬ ᩳ ᩠ ᨿ ᩠ ᩅ ᩋ ᩍ ᩎ ᩏ ᩐ ᩑ ᩒ ᨠ ᨡ ᨧ ᨨ ᨮ ᨲ ᨳ ᨸ ᨹ ᩀ ᩈ ᩆ ᩇ ᨺ ᩉ ᨯ ᨷ ᩋ ᨣ ᨥ ᨤ ᨦ ᨩ ᨫ ᨬ ᨭ ᨰ ᨱ ᨴ ᨵ ᨶ ᨻ ᨽ ᨾ ᩃ ᩊ ᩁ ᨿ ᩅ ᨪ ᨼ ᩌ ᩕ ᩖ ᩺ ᩙ ᩴ ᩘ ᩜ ᩝ ᩞ ᩛ ᩓ ᩔ ᩠ ᩵ ᩶ ᩼ ᪨ ᪩ ᪪ ᪫ ᪧ|
CM (combining mark) takes on the behaviour of its base character.
NU (number) behaves like ordinary characters (AL) in the context of most characters but activates the prefix and postfix behavior of prefix and postfix characters.
SA (Southeast Asian) requires morphological analysis to determine break opportunities, in a way similar to a hyphenation algorithm. No break opportunities will be found otherwise. Complex context analysis, often involving dictionary lookup of some form, is required to determine non-emergency line breaks. If such analysis is not available, it is recommended to treat these characters as AL.
ZW (ZERO WIDTH SPACE, ZWSP) enables invisible break opportunities wherever SPACE cannot be used. It has no width, and is treated as if it wasn't there during justification.
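Where word boundaries are already known (for example, from a dictionary-based segmenter, which is not shown here), ZWSP can be inserted between words to give a line breaker explicit opportunities. The helper names are hypothetical, and the list of words is assumed to come from elsewhere.

```python
ZWSP = "\u200B"  # ZERO WIDTH SPACE


def mark_break_opportunities(words: list[str]) -> str:
    """Join already-segmented words with invisible break opportunities."""
    return ZWSP.join(words)


def strip_break_opportunities(text: str) -> str:
    """Remove ZWSP again, eg. before comparing or searching text."""
    return text.replace(ZWSP, "")


# Two words from this page: LA + UU + HIGH KA, and HIGH SA + AA + SAKOT + MA.
words = ["\u1A43\u1A6A\u1A20", "\u1A48\u1A63\u1A60\u1A3E"]
marked = mark_break_opportunities(words)
print(len(marked), len(strip_break_opportunities(marked)))
```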
Is hyphenation used, or something else?
Does text in a paragraph need to have flush lines down both sides? Does the script need assistance to conform to a grid pattern? Does the script allow punctuation to hang outside the text box at the start or end of a line? Where adjustments are needed to make a line flush, how is that done? Does the script shrink/stretch space between words and/or letters? Are word baselines stretched, as in Arabic? What about paragraph indents?
Does the script create emphasis or other effects by spacing out the words, letters or syllables in a word? (For justification related spacing, see above.).
Are there list or other counter styles in use? If so, what is the format used? Do counters need to be upright in vertical text? Are there other aspects related to counters and lists that need to be addressed?
Does the script use special styling of the initial letter of a line or paragraph, such as for drop caps or similar? How about the size relationship between the large letter and the lines alongside? Where does the large letter anchor relative to the lines alongside? Is it normal to include initial quote marks in the large letter? Is the large letter really a syllable? etc.
How are the main text area and ancillary areas positioned and defined? Are there any special requirements here, such as dimensions in characters for the Japanese kihon hanmen? The book cover for scripts that are read right-to-left is on the right of the spine, rather than the left. When content can flow vertically and to the left or right, how to specify the location of objects, text, etc. relative to the flow? Do tables and grid layouts work as expected? How do columns work in vertical text? Can you mix blocks of vertical and horizontal text? Does text scroll in a different direction?
Does the script have special requirements for character grids or tables?
Does the script have special requirements for notes, footnotes, endnotes or other necessary annotations of this kind? (There is a section above for purely inline annotations, such as ruby or warichu. This section is more about annotation systems that separate the reference marks and the content of the notes.)
Are vertical form controls needed? Are scroll bars in an unusual position? Other special requirements for user interaction?
Are there special conventions for page numbering, or the way that running headers and the like are handled?
The Tai Tham script characters in Unicode 12.0 are in the following block:
Apart from ASCII characters, the Khün orthography described here uses 97 characters (and 22 more, used infrequently) from the following Unicode blocks:
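As a small aid for checking which characters a piece of text draws on, the sketch below sorts code points into ASCII, the Tai Tham block (U+1A20 to U+1AAF), and everything else. The function names and the three labels are hypothetical, and the sketch does not attempt to reproduce the character counts given above.

```python
from collections import Counter

TAI_THAM_BLOCK = range(0x1A20, 0x1AB0)  # U+1A20..U+1AAF inclusive


def classify(ch: str) -> str:
    """Label a character as 'ascii', 'tai_tham', or 'other'."""
    cp = ord(ch)
    if cp < 0x80:
        return "ascii"
    if cp in TAI_THAM_BLOCK:
        return "tai_tham"
    return "other"


def block_usage(text: str) -> Counter:
    """Count how many characters of the text fall under each label."""
    return Counter(classify(ch) for ch in text)


# HIGH KHA + OA ABOVE + TONE-2, followed by a space and an ASCII digit.
print(block_usage("\u1A21\u1A73\u1A76 1"))
```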
Character Usage has information about the following orthographies associated with this script: Khün (Tai Khün) • Northern Thai (Lanna, Kam Mueang)
For character-specific details see Tai Tham character notes.
According to ScriptSource, the Tai Tham script is used for the following languages: