Santali orthography notes

Basic features

The Ol Chiki script is an alphabet. Both consonants and vowels are indicated by letters. See the table to the right for a brief overview of features for the Santali language.

Ol Chiki is mostly a simple and small orthography. There are no combining characters, and no symbols. The script has no case distinction.

Ol Chiki runs left to right in horizontal lines. Words are separated by spaces.


Santali has 23 consonant letters. Consonants can be aspirated by forming a digraph with a special aspiration letter.

Ol Chiki has a distinctive way of handling syllable-final consonants. When they do not appear before a vowel, four of the voiced stops are typically pronounced unvoiced and unreleased. Where the full value of the letter should be retained, this can be indicated using a modifier called ahad. A second, hyphen-like modifier, phaarkaa, marks the reduction before a vowel where needed (for example, in certain verb forms).

There are no special forms for consonant clusters in Santali.


This orthography is an alphabet. Vowels are written using 6 basic vowel letters and 3 digraphs, where an existing vowel letter is followed by a dot. All vowels can be nasalised and lengthened by modifier letters. There are no combining marks.

Ol Chiki has native digit shapes.

Block-specific danda and double-danda are used as sentence and section dividers. Otherwise, most of the punctuation is ASCII.

Distinctive characteristics: an alphabet surrounded by abugidas; modifiers for voiced stops at word boundaries; no combining characters.

Phonology

These are the sounds of the Santali language.


Phones in a lighter colour are non-native or allophones. Source Wikipedia.

Vowel sounds

Plain vowels

Santali allows sequences of vowels without intervening consonants, but these sequences are not diphthongs of the kind that end in a glide.

Consonant sounds

Phones in a lighter colour are non-native or allophones. Source Wikipedia.

	labial	alveolar	retroflex	palatal	velar	glottal
stops	p b	t d	ʈ ɖ	c ɟ	k ɡ
aspirated	pʰ bʰ	tʰ dʰ	ʈʰ ɖʰ	cʰ ɟʰ	kʰ ɡʰ
fricatives		s				h
nasals	m	n	ɳ	ɲ	ŋ
approximants	w	l		j
trills/flaps		r	ɽ

The aspirated stops occur primarily, but not exclusively, in Indo-Aryan loanwords.wl,#Phonology

ɳ only appears as an allophone of n before ɖ.wl,#Phonology

A typical Munda feature is that word-final stops are "checked", i.e. glottalised and unreleased.wl,#Phonology

Tone

tbd

Structure

tbd

Vowels

Vowel summary table

This table summarises basic vowel to character assignments.

Long vowels are indicated by a following ᱻ and nasalisation by ᱸ or ᱺ (not shown here).

	ᱤ␣ ␣ᱩ
	ᱮ␣ ␣ᱳ
	ᱟᱹ
	ᱮᱹ␣ ␣ᱚ␣ᱚᱹ
	ᱟ

For additional details, see Vowel sounds to characters.

Post-consonant vowels

Vowels are written using 6 basic vowel letters and 3 digraphs, where an existing vowel letter is followed by a dot. All vowels can be nasalised and lengthened by modifier letters. There are no combining marks.

Basic vowels

The standard vowel sounds for Santali are written as follows.

ᱤ␣ᱩ␣ᱮ␣ᱳ␣ᱚ␣ᱟ

Additional vowels

ᱹ

Three additional vowel sounds are represented by following a basic vowel letter with ᱹ:

ᱮᱹ␣ᱚᱹ␣ᱟᱹ

ᱚᱹ is rarely used, and the phonetic difference between it and ᱚ is not clearly defined, but the ALA-LOC transcription page says that it has a lower pitch. The phonemic difference between the two may be only marginal.rp,9

Long vowels

ᱻ

To indicate a prolonged vowel sound, ᱻ is used,rp,9.

ᱢᱚᱹᱬᱮᱻ ᱢᱚᱸᱻᱦᱟ

ᱡᱭᱻᱭᱤ

Nasalisation

ᱸ␣ᱺ

Nasalisation of vowels is indicated using ᱸ,rp,9 eg. ᱦᱟᱸᱰᱮ

When a vowel letter followed by ᱹ is nasalised, a separate Unicode character is used instead of adding ᱸ after ᱹ. That character is ᱺ,rp,9 eg. ᱵᱮᱺᱫᱤ
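The modifier rules above can be illustrated with a small Python sketch that splits a word into vowel units: a basic vowel letter, at most one of ᱹ, ᱸ or ᱺ, and an optional ᱻ. The ordering (nasalisation before length) is inferred from the examples on this page, and the function name `vowel_units` is hypothetical. A ᱻ following a consonant letter, as in ᱡᱭᱻᱭᱤ, is not handled.

```python
import re

BASE_VOWELS = "ᱚᱟᱤᱩᱮᱳ"  # the six basic vowel letters
DOT = "ᱹ"        # forms the additional vowels ᱮᱹ ᱚᱹ ᱟᱹ
NASAL = "ᱸ"      # nasalisation
DOT_NASAL = "ᱺ"  # used instead of ᱹ followed by ᱸ
LONG = "ᱻ"       # prolongation

# A base vowel, then at most one of ᱹ/ᱸ/ᱺ, then an optional ᱻ.
VOWEL_UNIT = re.compile(
    "([" + BASE_VOWELS + "])([" + DOT + NASAL + DOT_NASAL + "]?)(" + LONG + "?)"
)


def vowel_units(word):
    """Return (text, dotted, nasal, long) for each vowel unit in word."""
    units = []
    for m in VOWEL_UNIT.finditer(word):
        _base, mod, length = m.groups()
        units.append((
            m.group(0),
            mod in (DOT, DOT_NASAL),    # ᱹ on its own or inside ᱺ
            mod in (NASAL, DOT_NASAL),  # ᱸ on its own or inside ᱺ
            length == LONG,
        ))
    return units


print(vowel_units("ᱢᱚᱸᱻᱦᱟ"))
# [('ᱚᱸᱻ', False, True, True), ('ᱟ', False, False, False)]
```

Treating ᱺ as "dotted and nasal" mirrors the description above rather than any Unicode property: the character has no decomposition, so software has to encode that equivalence itself.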

Vowel sounds to characters

This section maps Santali vowel sounds to common graphemes in the Ol Chiki orthography.

vowel ᱤ

vowel ᱤᱸ

vowel ᱩ

vowel ᱩᱸ

vowel ᱮ

ẽ

vowel ᱮᱸ

vowel ᱳ

vowel ᱳᱸ

extended vowel ᱟᱹ

ə̃

extended vowel ᱟᱺ

extended vowel ᱮᱹ

ɛ̃

extended vowel ᱮᱺ

vowel ᱚ

extended vowel ᱚᱹ

ɔ̃

vowel ᱚᱸ

extended vowel ᱚᱺ

vowel ᱟ

vowel ᱟᱸ

Consonants

Consonant summary table

This table summarises basic consonant to character assignments.

Onsets	ᱯ␣ᱵ␣ᱛ␣ᱫ␣ᱪ␣ᱰ␣ᱴ␣ᱡ␣ᱠ␣ᱜ
	ᱯᱷ␣ᱵᱷ␣ᱛᱷ␣ᱫᱷ␣ᱪᱷ␣ᱡᱷ␣ᱠᱷ␣ᱜᱷ
	ᱣ␣ᱥ␣ᱦ
	ᱢ␣ᱱ␣ᱧ␣ᱬ␣ᱝ
	ᱣ␣ᱶ␣ᱨ␣ᱲ␣ᱞ␣ᱭ
Finals	ᱵ␣ᱫ␣ᱡ␣ᱜ␣ ␣ᱵᱽ␣ᱫᱽ␣ᱡᱽ␣ᱜᱽ

For additional details, see Consonant sounds to characters.

Basic consonants

Basic consonant sounds in Santali are written using the following letters.


ᱯ␣ᱵ␣ᱛ␣ᱫ␣ᱪ␣ᱰ␣ᱴ␣ᱡ␣ᱠ␣ᱜ␣ᱥ␣ᱦ␣ᱢ␣ᱱ␣ᱧ␣ᱬ␣ᱝ␣ᱣ␣ᱶ␣ᱨ␣ᱲ␣ᱞ␣ᱭ

Aspiration

ᱷ

Aspirated consonant sounds are indicated using ᱷ after the consonant,fp,2 eg. ᱡᱷᱚᱛᱚ ᱛᱷᱚᱲᱟ
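As a sketch of how these digraphs might be handled in software, the hypothetical Python function below groups each consonant with a following ᱷ. The IPA values cover only the eight aspirated digraphs in the consonant summary table.

```python
ASPIRATION = "ᱷ"

# IPA values for the aspirated digraphs in the consonant summary table.
ASPIRATED_IPA = {
    "ᱯᱷ": "pʰ", "ᱵᱷ": "bʰ", "ᱛᱷ": "tʰ", "ᱫᱷ": "dʰ",
    "ᱪᱷ": "cʰ", "ᱡᱷ": "ɟʰ", "ᱠᱷ": "kʰ", "ᱜᱷ": "ɡʰ",
}


def letter_units(word):
    """Split a word into letters, keeping a consonant and a following ᱷ together."""
    units = []
    for ch in word:
        if ch == ASPIRATION and units:
            units[-1] += ch  # attach ᱷ to the preceding letter
        else:
            units.append(ch)
    return units


units = letter_units("ᱡᱷᱚᱛᱚ")
print(units)                                     # ['ᱡᱷ', 'ᱚ', 'ᱛ', 'ᱚ']
print([ASPIRATED_IPA.get(u, u) for u in units])  # ['ɟʰ', 'ᱚ', 'ᱛ', 'ᱚ']
```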

Finals

ᱽ␣ᱼ

Four voiced stops are pronounced unvoiced and unreleased when they are not followed by a vowel, especially in word-final position.

ᱵ	b	→	p̚	eg. ᱩᱵ
ᱫ	d	→	t̚	eg. ᱢᱮᱫ
ᱡ	ɟ	→	c̚	eg. ᱢᱩᱡ
ᱜ	ɡ	→	k̚	eg. ᱫᱚᱜ

Where the voicing needs to be maintained, ᱽ is added, eg.

ᱨᱚᱡᱽ	cf. ᱨᱚᱡᱚ
ᱫᱟᱜᱽ	cf. ᱫᱟᱜᱤ

In the opposite situation, where one of these voiced stops precedes a vowel but should still be devoiced, put ᱼ between the consonant and the vowel.rp,10 For example, see this verb form: ᱢᱮᱱᱟᱜᱼᱟ
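The Python sketch below applies these rules mechanically to ᱵ ᱫ ᱡ ᱜ: full value before a vowel letter or when followed by ᱽ, checked value otherwise (including before ᱼ). It is a simplification under stated assumptions: the text says the reduction happens "especially" word-finally, so real pronunciation may vary, and `stop_values` is a hypothetical name.

```python
# The four voiced stops, mapped to (full value, checked value) in IPA.
CHECKED_STOPS = {
    "ᱵ": ("b", "p\u031a"),
    "ᱫ": ("d", "t\u031a"),
    "ᱡ": ("\u025f", "c\u031a"),
    "ᱜ": ("\u0261", "k\u031a"),
}
VOWEL_LETTERS = set("ᱚᱟᱤᱩᱮᱳ")
AHAD = "ᱽ"        # keeps the full (voiced) value
ASPIRATION = "ᱷ"  # forms an aspirated digraph instead


def stop_values(word):
    """Predict the value of each of the four voiced stops in a word."""
    values = []
    for i, ch in enumerate(word):
        if ch not in CHECKED_STOPS:
            continue
        full, checked = CHECKED_STOPS[ch]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if nxt == ASPIRATION:
            values.append(full + "\u02b0")  # aspirated digraph, not a final
        elif nxt == AHAD or nxt in VOWEL_LETTERS:
            values.append(full)             # voicing is kept
        else:
            # End of word, another consonant, or ᱼ (phaarkaa) before a vowel.
            values.append(checked)
    return values


print(stop_values("ᱫᱟᱜᱽ"))  # ['d', 'ɡ']
```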

Consonant sounds to characters

This section maps Santali consonant sounds to common graphemes in the Ol Chiki orthography.

consonant ᱯ

consonant ᱵ Coda (unreleased).

pʰ

consonant ᱯᱷ

consonant ᱵ

consonant ᱵᱽ Coda.

bʰ

consonant ᱵᱷ

consonant ᱛ

consonant ᱫ Coda (unreleased).

tʰ

consonant ᱛᱷ

consonant ᱫ

consonant ᱫᱽ Coda (unreleased).

dʰ

consonant ᱫᱷ

ᱴ

pre-nasalised consonant ᱰ

consonant ᱪ

consonant ᱡ Coda (unreleased).

cʰ

consonant ᱪᱷ

consonant ᱡ

consonant ᱡᱽ Coda (unreleased).

ɟʰ

consonant ᱡᱷ

ᱠ

consonant ᱜ Coda (unreleased).

kʰ

consonant ᱠᱷ

consonant ᱜ

consonant ᱜᱽ Coda (unreleased).

ɡʰ

consonant ᱜᱷ

consonant ᱦ may occur when word-final.

consonant ᱥ

consonant ᱦ

consonant ᱢ

consonant ᱱ

ᱬ

consonant ᱧ

consonant ᱝ

consonant ᱣ

w̃

ᱶ

consonant ᱨ

ᱲ

consonant ᱞ

consonant ᱭ

Typographic units

Word boundaries

Word units are separated by spaces.

Paired words may be separated by ᱼ, eg.

ᱥᱩᱡᱷᱼᱵᱩᱡᱷ sujʰ-bujʰ

Graphemes

Since there are no combining marks or decompositions, grapheme clusters correspond to individual characters.
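A quick way to check this claim is with Python's standard `unicodedata` module. The sketch below (with a hypothetical helper name) confirms that the sample words contain no combining marks and no decompositions, so normalisation leaves them untouched. It also shows that NFC does not turn ᱹ followed by ᱸ into ᱺ, so any such substitution would have to be done explicitly.

```python
import unicodedata


def has_no_marks(text):
    """True if no character is a combining mark or has a decomposition."""
    return all(
        not unicodedata.category(ch).startswith("M")
        and unicodedata.combining(ch) == 0
        and unicodedata.decomposition(ch) == ""
        for ch in text
    )


sample = "ᱢᱚᱹᱬᱮᱻ ᱢᱚᱸᱻᱦᱟ ᱵᱮᱺᱫᱤ"
print(has_no_marks(sample))                            # True
print(unicodedata.normalize("NFD", sample) == sample)  # True

# NFC does not compose ᱹ + ᱸ into ᱺ.
print(unicodedata.normalize("NFC", "ᱮᱹᱸ") == "ᱮᱹᱸ")    # True
```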

Question: Should nasalisation or vowel extension dots be handled like combining characters, ie. form a grapheme with the preceding character?

Grapheme clusters

Base

Each base letter is a grapheme cluster, and there are no combining marks to extend it. Modifiers that follow a letter are themselves letters, so each one also forms its own grapheme cluster, separate from the letter it modifies. This includes ᱸ, ᱹ, ᱻ, etc.

Examples:

	ᱨᱚᱡᱚ
	ᱢᱚᱹᱬᱮᱻ ᱢᱚᱸᱻᱦᱟ
	ᱢᱮᱱᱟᱜᱼᱟ

Punctuation & inline features

Phrase & section boundaries

,␣;␣:␣᱾␣?␣!␣᱿

Santali uses mostly ASCII punctuation, but also some Indic punctuation from the Santali block.

phrase	, ; :
sentence	᱾ ? !
section	᱿

phrase

;

sentence

᱾

section

᱿

The ASCII full stop is not used, since it could be confused with other dots in the orthography; ᱾ is therefore the main sentence delimiter.rp,11

᱿ is used at the end of a paragraph or some other block of text.

The children’s primer Al Ita, in https://www.unicode.org/L2/L2005/05243r-n2984-ol-chiki.pdf — Example of mucaad and double mucaad in Santali text.`fp,6`

Observation: Samples in the Unicode proposals suggest that the mucaad and double mucaad punctuation is preceded by a space.
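A minimal Python sketch for splitting text into sentences, assuming ᱾, ?, ! and ᱿ are the only delimiters and that a space may precede ᱾ as in the samples. The break is made only at whitespace after a delimiter, so a space before ᱾ stays inside its sentence. The input string simply strings together example words from this page; it is not a real Santali sentence.

```python
import re

# Sentence and section delimiters: ᱾ (mucaad), ASCII ? and !, ᱿ (double mucaad).
DELIMITERS = "᱾?!᱿"

# Split only at whitespace that follows a delimiter, so a space placed
# before ᱾ stays inside the sentence.
_BOUNDARY = re.compile("(?<=[" + re.escape(DELIMITERS) + r"])\s+")


def split_sentences(text):
    """Return the sentences in text, each ending with its delimiter."""
    return [s for s in _BOUNDARY.split(text.strip()) if s]


print(split_sentences("ᱫᱚᱜ ᱢᱮᱫ ᱾ ᱨᱚᱡᱚ? ᱩᱵ ᱿"))
# ['ᱫᱚᱜ ᱢᱮᱫ ᱾', 'ᱨᱚᱡᱚ?', 'ᱩᱵ ᱿']
```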

Bracketed text

(␣)

Santali commonly uses ASCII parentheses to insert parenthetical information into text.

	start	end
standard	(	)

Quotations & citations

“␣”␣‘␣’

Santali texts use quotation marks around quotations. Because of keyboard layouts, quotations may also be surrounded by ASCII double and single quote marks.

	start	end
initial	“	”
nested	‘	’

Line & paragraph layout

Line breaking & hyphenation

tbd

Observation: Lines appear to be broken at word boundaries.

Line-edge rules

As in almost all writing systems, certain punctuation characters should not appear at the end or the start of a line. The Unicode line-break properties help applications decide whether a character should appear at the start or end of a line.


The following list gives examples of typical behaviours for some of the characters used in Ol Chiki. Context may affect the behaviour of some of these and other characters.


“ ‘ ( should not be the last character on a line.
” ’ ) . , ; ! ? ᱾ ᱿ % should not begin a new line.
ᱹ ᱸ ᱺ ᱻ ᱼ do not create line-break opportunities when surrounded by other letters.

Line breaking should not move a mucaad (᱾) or double mucaad (᱿) to the beginning of a new line, even if it is preceded by a space character.
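This rule can be sketched as a greedy line wrapper in Python that glues a standalone ᱾ or ᱿ to the preceding word before breaking at spaces. Width is counted in code points, which is only a rough proxy that works here because Ol Chiki has no combining marks; a real implementation would use the Unicode line breaking algorithm (UAX #14) and actual glyph widths. `wrap` is a hypothetical name.

```python
# Marks that must not start a line, even when preceded by a space.
NO_LINE_START = {"᱾", "᱿"}


def wrap(text, width):
    """Greedy line wrap at spaces, measuring width in code points."""
    # Glue a standalone ᱾ or ᱿ to the previous word so it cannot start a line.
    units = []
    for token in text.split():
        if token in NO_LINE_START and units:
            units[-1] += " " + token
        else:
            units.append(token)

    lines, current = [], ""
    for unit in units:
        candidate = unit if not current else current + " " + unit
        if current and len(candidate) > width:
            lines.append(current)  # line is full; start a new one
            current = unit
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


print(wrap("ᱫᱚᱜ ᱢᱮᱫ ᱾", 7))  # ['ᱫᱚᱜ', 'ᱢᱮᱫ ᱾']
```

A naive wrapper would fit "ᱫᱚᱜ ᱢᱮᱫ" (7 code points) on the first line and push ᱾ alone onto the second.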

Text alignment & justification

Observation: All but one of the samples in the Unicode submission document are fully justified. Justification is mostly achieved by stretching inter-word spacing, but in some words the space between characters is also stretched.

The magazine Bhanj Parayni, in https://www.unicode.org/L2/L2005/05243r-n2984-ol-chiki.pdf — Example of full justification, with the word at the end of the 3rd line from the bottom also showing signs of being stretched.`fp,7`

Baselines, line height, etc.

Santali uses the so-called 'alphabetic' baseline, which is the same as for Latin and many other scripts.

The height of Ol Chiki letters is very uniform: there are no combining marks to extend glyphs above or below the letters, and there are no descenders.

To give an approximate idea, the following figure compares Latin and Santali glyphs from the Noto font. The basic height of Santali letters is set to the Latin cap-height, and the Santali glyphs do not extend beyond the Latin glyphs.

Hhqxᱢᱚᱹᱚᱸᱬᱮᱻᱦᱟ᱓ — Font metrics for Latin text compared with Santali glyphs in the Noto Sans Santali font.

fig_baselines_other shows similar comparisons for the Nirmala UI font.

Counters, lists, etc.

You can experiment with counter styles using the Counter styles converter. Patterns for using these styles in CSS can be found in Ready-made Counter Styles, and we use the names of those patterns here to refer to the various styles.

Santali Wikipedia pages use numeric styles.

Numeric

The numeric style is decimal-based and uses these digits.

᱑␣᱒␣᱓␣᱔␣᱕␣᱖␣᱗␣᱘␣᱙␣᱐

Examples:

᱑␣᱒␣᱓␣᱔␣᱑᱑␣᱒᱒␣᱓᱓␣᱔᱔␣᱑᱑᱑␣᱒᱒᱒␣᱓᱓᱓␣᱔᱔᱔
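Because the Ol Chiki digits occupy a contiguous range (U+1C50 for ᱐ through U+1C59 for ᱙), converting between them and ASCII digits is a simple offset. A Python sketch with hypothetical function names; note that the built-in `int()` already accepts any Unicode decimal digits, so parsing needs no extra mapping.

```python
OL_CHIKI_ZERO = 0x1C50  # ᱐; ᱑..᱙ follow at U+1C51..U+1C59


def to_ol_chiki(n):
    """Write a non-negative integer with Ol Chiki digits."""
    if n < 0:
        raise ValueError("only non-negative integers are handled here")
    return "".join(chr(OL_CHIKI_ZERO + int(d)) for d in str(n))


def from_ol_chiki(s):
    """Parse Ol Chiki digits; int() accepts any Unicode decimal digits."""
    return int(s)


print(to_ol_chiki(2024))    # ᱒᱐᱒᱔
print(from_ol_chiki("᱑᱑"))  # 11
```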

Prefixes and suffixes

A range of prefixes and/or suffixes is used in Wikipedia. They include a simple period, parentheses on both sides, and no mark.

List counters with parens. — Separators for Santali list counters in Wikipedia.

List counters with dots. — Separators for Santali list counters in Wikipedia.
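The three separator patterns can be sketched in Python as format strings wrapped around a number written in Ol Chiki digits. The style names are hypothetical labels, and placing the period after the number is an assumption based on the figure captions.

```python
# Translation table from ASCII digits to Ol Chiki digits (U+1C50..U+1C59).
TO_OL_CHIKI = str.maketrans(
    "0123456789", "".join(chr(0x1C50 + i) for i in range(10))
)

# Hypothetical names for the separator patterns seen on Santali Wikipedia.
MARKER_FORMATS = {
    "period": "{}.",   # assumed: period after the number
    "parens": "({})",  # parentheses on both sides
    "none": "{}",      # no mark
}


def list_marker(n, style="period"):
    """Return the counter text for list item n in the given style."""
    return MARKER_FORMATS[style].format(str(n).translate(TO_OL_CHIKI))


for style in MARKER_FORMATS:
    print(style, list_marker(3, style))
```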

Notes, footnotes, etc

See inlinenotes for purely inline annotations, such as ruby or warichu. This section is about annotation systems that separate the reference marks and the content of the notes.
