Kurukh orthography notes

Sample


𑷕𑶳𑷐𑶺𑶵 𑶵𑷑𑶵𑷐𑶰𑶿 𑷕𑶴𑷊 𑷌𑶴𑷕𑶰 𑶸𑶵𑷐𑶱 𑶿𑶲 𑶺𑶴𑷑𑷑𑶰𑶿𑶻𑶵 𑶴𑷗𑶵𑷂𑶰 𑶴𑷐𑶵 𑶴𑷓𑷀𑶱𑶺 𑶺𑶴𑶿𑶿𑶵 𑷌𑶴𑷕𑶰 𑷕𑶴𑶻 𑷖𑶴𑷋𑶴𑷐𑷊𑶰 𑷐𑶴𑶰. 𑶵𑷐𑶰𑶿 𑷑𑶲𑷐 𑶴𑷐𑶵 𑷇𑶰𑷏𑶵 𑷌𑶴𑷕𑶰 𑷂𑶴𑶽 𑶸𑶴𑶲𑷔𑶵 𑷖𑶴𑷋𑶴𑷊𑶰𑷙 𑷐𑶴𑶰𑷙 𑶴𑷐𑶵 𑶻𑶲𑶺𑷕𑶱𑷙𑷓 𑶺𑶴𑷈𑶰 𑶿𑶲𑷙 𑶺𑶱𑷑-𑶶̰𑶱𑶺 𑷌𑶴𑷕𑶰 𑶸𑶱𑶽𑷕𑶵𑷐 𑶿𑶴𑶿𑶿𑶵 𑷅𑶴𑷕𑶰𑷙.

Source: Universal Declaration of Human Rights, article 1

Usage & history

Tolong Siki ( 𑶻𑶳𑷑𑶳𑷎 𑷔𑶰𑷊𑶰 ) is a South Asian monocameral alphabet used in India specifically for the North Dravidian Kurukh language. It was invented by Narayan Oraon in 1988 and formally published in 1999. Books and magazines have been published in Tolong Siki, and it was officially recognized by the state of Jharkhand in 2007. The Kurukh Literary Society of India has been instrumental in spreading the Tolong Siki script for Kurukh literature.

Tolong Siki is read horizontally, left to right. It is a relatively simple alphabet. Vowels are written using letters, with additional signs to indicate vowel length and nasalisation. There are no ligatures or conjuncts. There are combining marks, but no ascenders or descenders, so the need for context-sensitive positioning is low. Words are separated using spaces. It has its own set of digits.

Unicode 17 has 1 dedicated Tolong Siki block, comprising 54 characters, but several diacritics are sourced from other blocks.

More information: Unicode Proposal • Kobayashi et al

Basic features

The Tolong Siki script is an alphabet, i.e., all vowels are written explicitly, alongside consonants. Unlike an abugida, a consonant carries no inherent vowel; unlike an abjad, vowels are not systematically dropped; and unlike a syllabary, consonant and vowel are not combined in a single character.


Vowels: Vowels are written using 6 vowel letters. A breve above a vowel can be used to indicate non-native sounds.

All vowels can be lengthened using a colon-like sign, nasalised using a tilde above, or both.

Word-initial standalone vowels begin phonetically with a glottal stop, but they are written using the vowel letter alone. Word-medial standalone vowels, on the other hand, are typically preceded by an apostrophe, which represents the glottal stop and indicates that the sequence is not a diphthong.


Consonants: Tolong Siki has 36 basic consonant letters. Combining marks can be used to indicate non-native sounds.

Geminated consonant sounds are indicated by simply doubling the consonant letter.

Vowel absence: Since this is an alphabet, vowel absence in consonant clusters or after codas is marked simply by an absence of vowel letters. There is no special shaping or mark to indicate a consonant cluster.

Medial r can be written as a tilde below the preceding base consonant, and Tolong Siki also uses a dot above, like the Devanagari anusvara, to represent homorganic nasals at the end of a syllable.

Numbers: Tolong Siki has a set of native digits.

Layout: Tolong Siki text runs left to right in horizontal lines. Words are separated by spaces. There is no case distinction.

Punctuation is ASCII.

Notable features

word-medial standalone vowels are preceded by an apostrophe
indistinct differentiation between a and ɑ
a colon-like letter indicates vowel length
representation of extended consonants changed in 2015

Character index

Letters


Basic consonants

𑶶,𑶷,𑶸,𑶹,𑶺,𑶻,𑶼,𑶽,𑶾,𑶿,𑷀,𑷁,𑷂,𑷃,𑷄,𑷅,𑷆,𑷇,𑷈,𑷉,𑷊,𑷋,𑷌,𑷍,𑷎,𑷏,𑷐,𑷑,𑷒,𑷓,𑷔,𑷕,𑷖,𑷗,𑷘,𑷚

Vowels

𑶰,𑶱,𑶲,𑶳,𑶴,𑶵,𑷙

Other

𑷛

Combining marks


Vowels

̃,̆

Medials

̰

Finals

̇

Other

̤,̈

Not used for modern Kurukh

̄,̣,̱

Numbers


𑷠,𑷡,𑷢,𑷣,𑷤,𑷥,𑷦,𑷧,𑷨,𑷩

Punctuation


।,ʼ

ASCII

!,(,),␣,-,.,:,;,?

Phonology

The following represents the repertoire of the Kurukh language.


Phones in a lighter colour are non-native or allophones. Source: Wikipedia.

Vowel sounds

Plain vowels

Consonant sounds

	labial	alveolar	post-alveolar	retroflex	palatal	velar	glottal
stop	p b	t d		ʈ ɖ	c ɟ	k ɡ	ʔ
	pʰ bʰ	tʰ dʰ		ʈʰ ɖʰ	cʰ ɟʰ	kʰ ɡʰ
fricative		s	ʃ			x	h
nasal	m	n		ɳ	ɲ	ŋ
approximant	w	l			j
trill/flap		r		ɽ ɽʰ

Tone

Kurukh is not a tonal language.

Structure

tbd

Vowels

	Post-consonant	Long vowel	Nasalised vowel	Long, nasalised vowel
Simple:	𑶰,𑶲	𑶰𑷙,𑶲𑷙	𑶰̃,𑶲̃	𑶰̃𑷙,𑶲̃𑷙
	𑶱,𑶳	𑶱𑷙,𑶳𑷙	𑶱̃,𑶳̃	𑶱̃𑷙,𑶳̃𑷙
	𑶴,𑶵	𑶴𑷙,𑶵𑷙	𑶴̃,𑶵̃	𑶴̃𑷙,𑶵̃𑷙

Diacritics are added to the vowel letters to indicate length and nasalisation. The pattern is the same for every vowel letter.
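The four spellings in the table can be built mechanically. The Python sketch below assumes that the six vowel letters occupy U+11DB0 to U+11DB5 and that the length sign 𑷙 is U+11DD9, as implied by the ordering of the character index; `vowel_forms` is a hypothetical helper, not part of any library.

```python
# Code points are assumptions inferred from the character index.
VOWEL_LETTERS = [chr(cp) for cp in range(0x11DB0, 0x11DB6)]  # 𑶰 to 𑶵
NASAL = "\u0303"       # COMBINING TILDE, marks nasalisation
LENGTH = "\U00011DD9"  # 𑷙, the colon-like length sign (assumed code point)


def vowel_forms(vowel: str) -> dict[str, str]:
    """Return the plain, long, nasalised and long nasalised spellings."""
    return {
        "plain": vowel,
        "long": vowel + LENGTH,
        "nasalised": vowel + NASAL,
        # The tilde is stored first so that it attaches to the vowel
        # letter; the length sign follows it.
        "long_nasalised": vowel + NASAL + LENGTH,
    }


if __name__ == "__main__":
    for v in VOWEL_LETTERS:
        print(" ".join(vowel_forms(v).values()))
```

Building the strings in code makes the storage order explicit, which matters for the long nasalised column.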

Post-consonant vowels

Basic vowels are written in a straightforward way, using dedicated letters for each of the native vowel sounds. All vowels can be lengthened using a colon-like sign, nasalised using a tilde above, or both.

A breve above a vowel can be used to indicate non-native sounds.

Basic vowels

These are the basic vowels.

𑶰,𑶲,𑶱,𑶳,𑶴,𑶵

eg.

𑶼𑶴𑷐𑶰𑷏𑶵

𑶼,𑶴,𑷐,𑶰,𑷏,𑶵

Kobayashi et al. say that Kurukh has 5 cardinal vowels, and both 𑶴 and 𑶵 are generally transcribed as a. However, they describe the latter as only marginally longer than the former, but consistently pronounced further back. LETTER A is described as phonetically between ɐ and ə (kl§23). The distinction is reinforced by the fact that both letters can take the lengthening mark (see Vowel length). The two vowels are therefore represented here using different symbols, a and ɑ, to give more clarity in the phonological transcriptions.

Extended vowels

𑶵̆

The diacritic ̆ can be used to indicate non-native vowel sounds. An example is the English sound ɔ, which is written in a way that calls to mind the candra symbols in Devanagari.

eg.

𑶵̆𑶶𑶰𑷔

Diphthongs

Kurukh has diphthongs, including ɐɪ, ɐʊ, and oɪ. They are written using a sequence of vowel letters, with no intervening glottal stop character.

eg.

𑷏𑶴𑶰𑶽

Vowel length

Long vowels in Tolong Siki are indicated by the use of 𑷙.

eg.

𑶱𑷙𑷊𑶵

𑶺𑶴𑷈𑷕𑶰𑷙

The following panel shows examples of lengthened vowels.

𑶰𑷙,𑶲𑷙,𑶱𑷙,𑶳𑷙,𑶴𑷙,𑶵𑷙

Observation: Although 𑶵 is supposedly already long, instances of it followed by the vowel length sign are easy to find in texts, for example 𑶻𑶵𑷙𑷊𑶵. It's not clear what this means, although it lends support to the assumption that 𑶴 and 𑶵 are differentiated more by quality than by length (see Basic vowels).

Nasalisation

Vowel nasalisation is marked using ̃.

eg.

𑷊𑶲̃𑷗𑶲𑷖

𑷀𑶲𑷉𑶵̃

If another sign follows the vowel letter, such as the vowel length sign, the nasalisation mark is still positioned over the vowel letter.

eg.

𑶺𑶲̃𑷙𑷈𑶲𑷐ʼ𑶵

The following panel shows examples of long, nasalised vowels.

𑶰̃𑷙,𑶲̃𑷙,𑶱̃𑷙,𑶳̃𑷙,𑶴̃𑷙,𑶵̃𑷙

This diacritic was introduced in 2015. Prior to that, authors used ̇ both for vowel nasalisation and to represent nasal codas (for which it is still used).

Standalone vowels

Word-initial standalone vowels are phonetically preceded by a glottal stop, but are written using the normal vowel letters with no additional mechanism.

eg.

𑶵𑷗𑶰𑷙

𑶵̆𑶶𑶰𑷔

𑶰𑷐𑶰𑶸

Word-medial standalone vowels are typically preceded by ʼ, which indicates the syllable boundary within the word and, in careful speech, the glottal stop.

eg.

𑷕𑶴ʼ𑶵

𑷐𑶱𑶻ʼ𑶵

𑶱𑷙𑶼𑷐ʼ𑶵

This sign helps to distinguish between diphthongs and multiple syllables.
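As a rough illustration of how software might tell diphthongs from syllable breaks, the Python sketch below labels every vowel letter that directly follows another vowel letter (a diphthong) or follows ʼ (a syllable break). It assumes U+02BC MODIFIER LETTER APOSTROPHE for ʼ and U+11DB0 to U+11DB5 for the vowel letters; `vowel_junctions` is a hypothetical name, and real text may need more nuanced rules.

```python
import unicodedata

VOWELS = {chr(cp) for cp in range(0x11DB0, 0x11DB6)}  # 𑶰 to 𑶵 (assumed)
APOSTROPHE = "\u02BC"  # ʼ MODIFIER LETTER APOSTROPHE (assumed encoding)


def vowel_junctions(word: str) -> list[str]:
    """Label vowel letters that follow another vowel letter or ʼ.

    Combining marks such as U+0303 are skipped; any other character,
    including the length sign, resets the comparison.
    """
    labels = []
    prev = None
    for ch in word:
        if unicodedata.combining(ch):
            continue  # a mark belongs to the preceding letter
        if ch in VOWELS:
            if prev in VOWELS:
                labels.append("diphthong")
            elif prev == APOSTROPHE:
                labels.append("syllable break")
        prev = ch
    return labels


print(vowel_junctions("\U00011DCF\U00011DB4\U00011DB0\U00011DBD"))  # 𑷏𑶴𑶰𑶽 -> ['diphthong']
print(vowel_junctions("\U00011DD5\U00011DB4\u02BC\U00011DB5"))      # 𑷕𑶴ʼ𑶵 -> ['syllable break']
```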

Vowel sounds to characters

This section maps Kurukh vowel sounds to common graphemes in the Tolong Siki orthography.

Sounds listed as 'infrequent' are allophones, or sounds used for foreign words, etc. Light coloured characters occur infrequently.

Plain vowels

vowel 𑶰

vowel 𑶲

vowel 𑶱

vowel 𑶳

extended vowel 𑶵̆ Non-native sound in loan words.

vowel 𑶴

vowel 𑶵

Modifiers

◌̃

nasalisation ̃

vowel length 𑷙

Vowel absence

Vowel absence principally occurs either when a consonant is a syllable coda, or when a consonant is part of a consonant cluster.

Since this is an alphabet, the absence of vowel sounds in consonant clusters or after codas is marked simply by an absence of vowel letters.

Kurukh syllables can include reasonably long consonant clusters, especially word-medially but also in word-final position. The clusters can involve a number of sounds, which don't necessarily follow the normal principle of sonority sequencing. However, apart from medial r (see Onsets) and the nasal coda (see Codas), there are no special mechanisms for representing consonant clusters. Letters are simply juxtaposed.

eg.

𑶿𑶰𑷎𑷌𑷔𑷖𑶱𑷗𑶳𑷙

𑶹𑶱𑶿𑶽𑷌𑷑𑶱𑷙
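Because clusters are written by juxtaposition alone, they can be located in software as runs of consecutive consonant letters. A minimal sketch, assuming the consonant letters are U+11DB6 to U+11DD8 plus the glottal stop U+11DDA, as the character index suggests; the function name is hypothetical, and it also reports geminates, which are written the same way.

```python
import unicodedata

# Consonant letters as listed in the character index (assumed code points).
CONSONANTS = {chr(cp) for cp in range(0x11DB6, 0x11DD9)} | {"\U00011DDA"}


def consonant_runs(word: str) -> list[str]:
    """Return runs of two or more adjacent consonant letters.

    Combining marks (medial r, extended-repertoire marks, etc.) stay
    attached to the consonant they follow and do not break a run.
    """
    runs: list[str] = []
    current, count = "", 0
    for ch in word:
        if ch in CONSONANTS:
            current += ch
            count += 1
        elif current and unicodedata.combining(ch):
            current += ch
        else:
            if count >= 2:
                runs.append(current)
            current, count = "", 0
    if count >= 2:
        runs.append(current)
    return runs


for w in ["𑶿𑶰𑷎𑷌𑷔𑷖𑶱𑷗𑶳𑷙", "𑶹𑶱𑶿𑶽𑷌𑷑𑶱𑷙"]:
    print(w, consonant_runs(w))
```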

Consonants

Onsets	𑶶,𑶸,𑶻,𑶽,𑷀,𑷂,𑷅,𑷇,𑷊,𑷌,𑷚
	𑶷,𑶹,𑶼,𑶾,𑷁,𑷃,𑷆,𑷈,𑷋,𑷍
	𑷔,𑷖,𑷕
	𑶺,𑶿,𑷄,𑷉,𑷓,𑷎
	𑷒,𑷐,𑷗,𑷘,𑷑,𑷏
Medial	◌̰
Finals	◌̇,◌̇,◌̇,◌̇

Basic consonants

Basic consonant sounds in Kurukh are written using the following letters.


𑶶,𑶷,𑶸,𑶹,𑶻,𑶼,𑶽,𑶾,𑷀,𑷁,𑷂,𑷃,𑷅,𑷆,𑷇,𑷈,𑷊,𑷋,𑷌,𑷍,𑷚,𑷔,𑷖,𑷕,𑶺,𑶿,𑷄,𑷉,𑷓,𑷎,𑷒,𑷐,𑷗,𑷘,𑷑,𑷏

The sign for the glottal stop 𑷚 is apparently used after vowel sounds, although its use does not appear to be particularly common. Narayan Oraon gives the following example: 𑷔𑶰𑷚𑶿𑶵. The sign ʼ also indicates a glottal stop when it occurs before word-medial standalone vowels (see Standalone vowels).

Observation: It's not clear what the second palatal nasal is for.

Repertoire extension

̤,̈

Tolong Siki uses combining marks to indicate non-native sounds. Since 2015, this involves ̤ or ̈.

The table below gives mappings to sounds found in Hindi and Sanskrit. The right-hand column shows the Devanagari equivalent.

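Sequence	Code points	Sound	Devanagari
𑷊̤	U+11DCA U+0324	q	क़
𑷌̤	U+11DCC U+0324	ɣ	ग़
𑷅̤	U+11DC5 U+0324	kʂ	क्ष
𑷇̤	U+11DC7 U+0324	z	ज़
𑶷̤	U+11DB7 U+0324	f	फ़
𑷒̤	U+11DD2 U+0324	v	व़
𑷔̤	U+11DD4 U+0324	ʃ	श
𑷔̈	U+11DD4 U+0308	ʂ	ष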

Prior to 2015 several different combining marks were used, and these may still appear in texts. They include ̣, ̱, ̄, ̄̃, and ̄̇.
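The mapping table lends itself to a simple lookup. This Python sketch takes the code points from the table and writes the Devanagari side as base letter plus nukta (U+093C) rather than precomposed characters, since क़, ग़, ज़ and फ़ are composition exclusions that normalise to that form anyway. The names are hypothetical, and the pre-2015 marks are deliberately not covered.

```python
# Keys: base letter + U+0324 COMBINING DIAERESIS BELOW, or U+0308
# COMBINING DIAERESIS for ष, as listed in the mapping table.
EXTENDED_TO_DEVANAGARI = {
    "\U00011DCA\u0324": "\u0915\u093C",        # q  -> क़
    "\U00011DCC\u0324": "\u0917\u093C",        # ɣ  -> ग़
    "\U00011DC5\u0324": "\u0915\u094D\u0937",  # kʂ -> क्ष
    "\U00011DC7\u0324": "\u091C\u093C",        # z  -> ज़
    "\U00011DB7\u0324": "\u092B\u093C",        # f  -> फ़
    "\U00011DD2\u0324": "\u0935\u093C",        # v  -> व़
    "\U00011DD4\u0324": "\u0936",              # ʃ  -> श
    "\U00011DD4\u0308": "\u0937",              # ʂ  -> ष
}


def extended_consonants(text: str) -> list[tuple[str, str]]:
    """Find extended-repertoire sequences and pair them with Devanagari."""
    found = []
    for i in range(len(text) - 1):
        pair = text[i:i + 2]  # base letter + following mark
        if pair in EXTENDED_TO_DEVANAGARI:
            found.append((pair, EXTENDED_TO_DEVANAGARI[pair]))
    return found
```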

Onsets

Medial r can be written using ̰. The Unicode Proposal lists the following combinations.

𑷊̰,𑶻̰,𑶽̰,𑷔̰,𑷊̰𑶰

This diacritic was introduced in 2015.

Codas

Syllable codas are typically written using ordinary consonant letters.

eg.

𑶹𑶵𑶹𑶿𑶵

However, ̇ can be used to indicate a syllable-final nasal. The nasal is homorganic, i.e., its place of articulation depends on the consonant that follows.

eg.

𑷀𑶳̇𑷊𑶵
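The homorganic rule can be expressed as a lookup from the following consonant's place of articulation. This sketch works on IPA symbols rather than Tolong Siki letters, takes the places and nasals from the consonant chart in the Phonology section, and assumes, for illustration only, that the rule applies before stops, including aspirated ones; `coda_nasal` is a hypothetical helper.

```python
# Places of articulation from the consonant chart; aspirated stops are
# assumed to share the place of the corresponding plain stop.
NASAL_FOR_PLACE = {
    "labial": "m", "alveolar": "n", "retroflex": "ɳ",
    "palatal": "ɲ", "velar": "ŋ",
}
PLACE_OF_STOP = {
    "p": "labial", "b": "labial",
    "t": "alveolar", "d": "alveolar",
    "ʈ": "retroflex", "ɖ": "retroflex",
    "c": "palatal", "ɟ": "palatal",
    "k": "velar", "ɡ": "velar",  # IPA ɡ (U+0261)
}


def coda_nasal(following: str) -> str:
    """Return the nasal pronounced for a dot-above coda before a stop."""
    base = following.rstrip("ʰ")  # treat pʰ like p, etc.
    place = PLACE_OF_STOP.get(base)
    if place is None:
        raise ValueError(f"no stop place known for {following!r}")
    return NASAL_FOR_PLACE[place]
```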

Consonant length

Consonant gemination is common in Kurukh, and often occurs at the end of a word. Gemination is indicated by doubling the relevant consonant letter.

eg.

𑶰𑶻𑶻𑶵

𑷅𑶰𑷅𑷅

𑶲𑷎𑷊𑷊
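Detecting gemination in software only needs a comparison of adjacent letters. A minimal sketch, assuming consonant letters U+11DB6 to U+11DD8 and U+11DDA; it does not handle a doubled letter that carries a combining mark, and a letter written three times in a row would be reported twice.

```python
# Consonant letters as listed in the character index (assumed code points).
CONSONANTS = {chr(cp) for cp in range(0x11DB6, 0x11DD9)} | {"\U00011DDA"}


def geminates(word: str) -> list[str]:
    """Return consonant letters that are written twice in a row."""
    return [a for a, b in zip(word, word[1:]) if a == b and a in CONSONANTS]


for w in ["𑶰𑶻𑶻𑶵", "𑷅𑶰𑷅𑷅", "𑶲𑷎𑷊𑷊"]:
    print(w, geminates(w))
```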

Consonant sounds to characters

This section maps Kurukh consonant sounds to common graphemes in the Tolong Siki orthography.

Sounds listed as 'infrequent' are allophones, or sounds used for foreign words, etc. Light coloured characters occur infrequently.

consonant 𑶶

pʰ

consonant 𑶷

consonant 𑶸

bʰ

consonant 𑶹

consonant 𑶻

tʰ

consonant 𑶼

consonant 𑶽

dʰ

consonant 𑶾

consonant 𑷀

ʈʰ

consonant 𑷁

consonant 𑷂

ɖʰ

consonant 𑷃

consonant 𑷅

cʰ

consonant 𑷆

consonant 𑷇

ɟʰ

consonant 𑷈

consonant 𑷊

kʰ

consonant 𑷋

consonant 𑷌

ɡʰ

consonant 𑷍

glottal stop ʼ Used before vowels.

glottal stop 𑷚 Used after vowels.

consonant 𑷔

consonant 𑷖

consonant 𑷕

consonant 𑶺

consonant 𑶿

homorganic final nasal ̇ Coda.

consonant 𑷄

homorganic final nasal ̇ Coda.

consonant 𑷉

consonant 𑷓

homorganic final nasal ̇ Coda.

consonant 𑷎

homorganic final nasal ̇ Coda

consonant 𑷒

consonant 𑷐

medial r ̰ Medial consonant.

consonant 𑷗

ɽʰ

consonant 𑷘

consonant 𑷑

consonant 𑷏

Non-native sounds

kʂ

extended repertoire consonant 𑷅̤ Equivalent to क्ष.

extended repertoire consonant 𑷊̤ Equivalent to क़.

extended repertoire consonant 𑶷̤ Equivalent to फ़.

extended repertoire consonant 𑷒̤ Equivalent to व़.

extended repertoire consonant 𑷇̤ Equivalent to ज़.

extended repertoire consonant 𑷔̤ Equivalent to श.

extended repertoire consonant 𑷔̈ Equivalent to ष.

extended repertoire consonant 𑷌̤ Equivalent to ग़.

Other features

Auspicious sign

𑷛 is an auspicious sign. It has the general category of letter, and is pronounced ũɡɡu.

Encoding choices

Although a recommended usage is given here, content authors may well be unaware of such recommendations. Applications should therefore look out for non-recommended sequences and treat them the same as the recommended ones wherever possible.

Codepoint sequences

Combining marks always follow the base character.

When a vowel is both nasalised and long, the length sign needs to be added after the nasalisation diacritic, so that the diacritic sits above the vowel letter.

eg.

𑶺𑶲̃𑷙𑷈𑶲𑷐ʼ𑶵
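In line with the advice to treat non-recommended input like the recommended form, an application could repair the reverse order (length sign, then tilde). A minimal sketch, assuming U+0303 COMBINING TILDE and the length sign at U+11DD9; `fix_mark_order` is a hypothetical helper.

```python
NASAL = "\u0303"       # COMBINING TILDE
LENGTH = "\U00011DD9"  # 𑷙 length sign (assumed code point)


def fix_mark_order(text: str) -> str:
    """Rewrite 'length sign + tilde' as the recommended 'tilde + length sign'."""
    return text.replace(LENGTH + NASAL, NASAL + LENGTH)


# Hypothetical input typed with the marks in the wrong order (𑶺𑶲𑷙̃𑷈).
typed = "\U00011DBA\U00011DB2\U00011DD9\u0303\U00011DC8"
print(fix_mark_order(typed))
```

Text that already follows the recommended order passes through unchanged.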

Numbers

Digits

Tolong Siki has a set of native digits.

𑷠,𑷡,𑷢,𑷣,𑷤,𑷥,𑷦,𑷧,𑷨,𑷩
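Converting between these digits and integers is straightforward if, as assumed here, they form a contiguous run from 𑷠 (U+11DE0) to 𑷩 (U+11DE9) and are used positionally like ASCII decimal digits. Python's built-in `int()` may not recognise them unless its Unicode database includes Tolong Siki, so this sketch does the arithmetic itself; the function names are hypothetical.

```python
ZERO = 0x11DE0  # 𑷠, the digit zero (assumed code point)


def tolong_siki_to_int(text: str) -> int:
    """Parse a string of Tolong Siki digits as a decimal number."""
    if not text:
        raise ValueError("empty digit string")
    value = 0
    for ch in text:
        digit = ord(ch) - ZERO
        if not 0 <= digit <= 9:
            raise ValueError(f"not a Tolong Siki digit: {ch!r}")
        value = value * 10 + digit
    return value


def int_to_tolong_siki(n: int) -> str:
    """Write a non-negative integer with Tolong Siki digits."""
    if n < 0:
        raise ValueError("negative numbers are not handled in this sketch")
    return "".join(chr(ZERO + int(d)) for d in str(n))


print(int_to_tolong_siki(2007))
```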

Text direction

Tolong Siki text runs left to right in horizontal lines.


Glyph shaping & positioning

Experiment with examples using the Tolong Siki character app.

Context-based shaping & positioning

Tolong Siki letters don't interact, so no special shaping is needed.

Where a base character carries multiple combining marks, these need to be arranged so as not to overlap.

Typographic units

Word boundaries

Words are separated by spaces.

Some words are hyphenated.

eg.

𑷑𑶴𑶹-𑷑𑶴𑶹

Graphemes

Graphemes in Tolong Siki consist of single letters or letters with one or two combining marks. This means that text can be segmented into typographic units using grapheme clusters.

Phrase, sentence, and section delimiters are described in Phrase & section boundaries.
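Because a grapheme here is a letter plus any following combining marks, a crude segmenter can attach every character with a non-zero canonical combining class to the preceding character. This is a simplification of Unicode extended grapheme clusters (UAX #29): it ignores marks whose combining class is 0, and a full implementation, such as the `\X` pattern of the third-party `regex` module, is preferable in production.

```python
import unicodedata


def graphemes(text: str) -> list[str]:
    """Group each character with the combining marks that follow it."""
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # attach the mark to the current cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


print(graphemes("𑷀𑶳̇𑷊𑶵"))
```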

Punctuation & inline features

Phrase & section boundaries

Tolong Siki mostly uses ASCII punctuation, although । may apparently sometimes be used as well (up§4).

phrase	, ; :
sentence	. ? !

phrase

;

sentence

Bracketed text

See type samples.

Tolong Siki commonly uses ASCII parentheses to insert parenthetical information into text.

	start	end
standard	(	)

Line & paragraph layout

Line breaking & hyphenation

Lines are generally broken between words.
