Santali (draft)
Ol Chiki

Updated 14 November, 2022

This page brings together basic information about the Ol Chiki script and its use for the Santali language. It aims to provide a brief, descriptive summary of the modern, printed orthography and typographic features, and to advise how to write Santali using Unicode.

Sample

ᱱᱤᱭᱟᱹ ᱣᱤᱠᱤᱯᱤᱰᱤᱭᱟ ᱫᱚ ᱥᱟᱱᱛᱟᱲᱤ ᱛᱮ ᱚᱞ ᱟᱠᱟᱱᱟ᱾ ᱚᱨᱦᱚᱸ ᱮᱴᱟᱜ ᱯᱟᱹᱨᱥᱤᱛᱮ ᱦᱚᱸ ᱟᱭᱢᱟ ᱣᱤᱠᱤᱯᱤᱰᱤᱭᱟ ᱢᱮᱱᱟᱜᱼᱟ ᱾ ᱱᱚᱸᱰᱮ ᱠᱤᱪᱷᱩ ᱛᱟᱹᱞᱠᱟᱹ ᱮᱢ ᱦᱩᱭᱱᱟ ᱾

Usage & history

The Ol Chiki script was invented by Pandit Raghunath Murmu in the 1920s to provide Santali, spoken by around 6 million people, with a dedicated script, instead of the Latin, Bengali, Devanagari, and Odia scripts that had been, and still are, used. Ol Chiki is primarily used for the southern dialect of Santali, and has received some official recognition. It has also been used for other Munda languages.

ᱚᱞ ᱪᱤᱠᱤ ɔl ciki Ol Chiki ᱥᱟᱱᱛᱟᱲᱤ santaɽi Santali/Santhali

Sources: Unicode and Wikipedia.


Basic features

The Ol Chiki script is an alphabet. Both consonants and vowels are indicated by letters. See the table to the right for a brief overview of features for the Santali language.

Ol Chiki is mostly a simple and small orthography. There are no combining characters, and no symbols. The script has no case distinction.

Ol Chiki runs left to right in horizontal lines.

Words are separated by spaces.

Santali has 23 consonant letters. Consonants can be aspirated by forming a digraph with a special aspiration letter.

There are 6 basic vowel letters, and 3 digraphs, where an existing vowel letter is followed by a dot. All vowels can be nasalised and lengthened by modifiers.

Ol Chiki has a unique way of handling syllable-final consonants. Four of the voiced stops are typically pronounced unreleased and unvoiced when they don't appear before a vowel. Where the full value of the letter should be retained, this can be indicated using a modifier called ahad. Another, hyphen-like modifier, phaarkaa, applies the reduction before a vowel when needed (for example, for certain verb forms).

Ol Chiki has native digit shapes.

Block-specific danda and double-danda are used as sentence and section dividers. Otherwise, most of the punctuation is ASCII.

Distinctive characteristics: an alphabet surrounded by abugidas; modifiers for voiced stops at word boundaries; no combining characters.

Character index

Letters

Consonants

ᱯ␣ᱵ␣ᱛ␣ᱫ␣ᱪ␣ᱰ␣ᱴ␣ᱡ␣ᱠ␣ᱜ␣ᱥ␣ᱦ␣ᱢ␣ᱱ␣ᱧ␣ᱬ␣ᱝ␣ᱣ␣ᱶ␣ᱨ␣ᱲ␣ᱞ␣ᱭ

Vowels

ᱤ␣ᱩ␣ᱮ␣ᱳ␣ᱚ␣ᱟ

Other

ᱹ␣ᱸ␣ᱺ␣ᱻ␣ᱽ␣ᱼ␣ᱷ

Numbers

᱐␣᱑␣᱒␣᱓␣᱔␣᱕␣᱖␣᱗␣᱘␣᱙

Punctuation

᱾␣᱿␣“␣”␣‘␣’

Phonology

These are sounds for the Santali language.

Phones in a lighter colour are non-native or allophones. Source: Wikipedia.

Vowel sounds

Plain vowels

i ĩ u ũ e o ə ə̃ ə ə̃ ɛ ɛ̃ ɔ ɔ̃ a ã

Santali can have successive vowels without intervening consonants, but they are not diphthongs such as those that combine with a glide at the end.

Consonant sounds

The following table lists the consonant sounds of Santali. Phones in a lighter colour are non-native or allophones. Source: Wikipedia.

labial dental alveolar post-alveolar retroflex palatal velar glottal
stops p b t d     ʈ ɖ c ɟ k ɡ  
aspirated     ʈʰ ɖʰ ɟʰ ɡʰ  
fricatives     s         h
nasals m   n   ɳ ɲ ŋ
approximants w   l     j  
trills/flaps     r   ɽ

The aspirated stops occur primarily, but not exclusively, in Indo-Aryan loanwords. [wl,#Phonology]

ɳ only appears as an allophone of n before ɖ. [wl,#Phonology]

A typical Munda feature is that word-final stops are "checked", ie. glottalised and unreleased. [wl,#Phonology]

Vowels

Vowel sounds to characters

This section maps Santali vowel sounds to common graphemes in the Ol Chiki orthography.

Basic vowels

The standard vowel sounds for Santali are written as follows.

ᱤ␣ᱩ␣ᱮ␣ᱳ␣ᱚ␣ᱟ

Additional vowels

Three additional vowel sounds are represented using [U+1C79 OL CHIKI GAAHLAA TTUDDAAG]:

ᱮᱹ␣ᱚᱹ␣ᱟᱹ

ᱚᱹ [U+1C5A OL CHIKI LETTER LA + U+1C79 OL CHIKI GAAHLAA TTUDDAAG] is rarely used, and the phonetic difference between it and [U+1C5A OL CHIKI LETTER LA] is not clearly defined, although the ALA-LC transcription page says that it has a lower pitch. The phonemic difference between the two may be only marginal. [rp,9]

Nasalisation

ᱸ␣ᱺ

Nasalisation of vowels is indicated using [U+1C78 OL CHIKI MU TTUDDAG] [rp,9], eg. ᱦᱟᱸᱰᱮ

When a nasalised vowel also carries [U+1C79 OL CHIKI GAAHLAA TTUDDAAG], a single dedicated Unicode character is used rather than a sequence of the two marks. That character is [U+1C7A OL CHIKI MU-GAAHLAA TTUDDAAG] [rp,9], eg. ᱵᱮᱺᱫᱤ
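
The preference for the single character can be sketched as a small clean-up step. This is a minimal sketch, not an established normalisation: Unicode defines no canonical equivalence between the two-mark sequence and U+1C7A, the helper name `fold_mu_gaahlaa` is hypothetical, and treating either order of the two marks as a candidate is an assumption.

```python
import re

MU_TTUDDAG = "\u1C78"           # nasalisation mark
GAAHLAA_TTUDDAAG = "\u1C79"     # vowel-modifying dot
MU_GAAHLAA_TTUDDAAG = "\u1C7A"  # single character covering both functions

# Assumption: the two separate marks may have been typed in either order.
_PAIR = re.compile(
    f"{MU_TTUDDAG}{GAAHLAA_TTUDDAAG}|{GAAHLAA_TTUDDAAG}{MU_TTUDDAG}"
)


def fold_mu_gaahlaa(text: str) -> str:
    """Replace a sequence of the two separate marks with U+1C7A."""
    return _PAIR.sub(MU_GAAHLAA_TTUDDAAG, text)


if __name__ == "__main__":
    # OB + LE + MU TTUDDAG + GAAHLAA TTUDDAAG + UD + LI, typed as two marks
    typed = "\u1C75\u1C6E\u1C78\u1C79\u1C6B\u1C64"
    print(fold_mu_gaahlaa(typed))
```

Text that already uses U+1C7A, or that uses only one of the two marks, passes through unchanged.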

Long vowels

To indicate a prolonged vowel sound, [U+1C7B OL CHIKI RELAA] is used [rp,9], eg. ᱢᱚᱹᱬᱮᱻ ᱢᱚᱸᱻᱦᱟ

Consonants

There are no special forms for consonant clusters in Santali.

Consonant sounds to characters

This section maps Santali consonant sounds to common graphemes in the Ol Chiki orthography.

Sounds listed as 'infrequent' are allophones, or sounds used for foreign words, etc.

Basic consonants

ᱯ␣ᱵ␣ᱛ␣ᱫ␣ᱪ␣ᱰ␣ᱴ␣ᱡ␣ᱠ␣ᱜ
ᱥ␣ᱦ
ᱢ␣ᱱ␣ᱧ␣ᱬ␣ᱝ
ᱣ␣ᱶ␣ᱨ␣ᱲ␣ᱞ␣ᱭ

Final consonants

ᱽ␣ᱼ

Four voiced stops are pronounced unvoiced and unreleased when they are not followed by a vowel, especially in word-final position.

[U+1C75 OL CHIKI LETTER OB] b, eg. ᱩᱵ
[U+1C6B OL CHIKI LETTER UD] d, eg. ᱢᱮᱫ
[U+1C61 OL CHIKI LETTER AAJ] ɟ, eg. ᱢᱩᱡ
[U+1C5C OL CHIKI LETTER AG] ɡ, eg. ᱫᱚᱜ

Where the voicing needs to be maintained, [U+1C7D OL CHIKI AHAD] is added, eg.

ᱨᱚᱡᱽ cf. ᱨᱚᱡᱚ
ᱫᱟᱜᱽ cf. ᱫᱟᱜᱤ

In the opposite situation, where one of these voiced consonants appears before a vowel but should still be devoiced, put [U+1C7C OL CHIKI PHAARKAA] before the vowel. [rp,10] For example, see this verb form: ᱢᱮᱱᱟᱜᱼᱟ
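
The rules for the four voiced stops can be summarised as a small decision function. This is a minimal sketch of the description above, not a pronunciation engine: the function name `is_checked` is hypothetical, only the six basic vowel letters are treated as vowels, and anything that is neither a vowel nor ahad (including the end of the string) is assumed to trigger the reduced pronunciation.

```python
VOICED_STOPS = {"\u1C75", "\u1C6B", "\u1C61", "\u1C5C"}  # OB, UD, AAJ, AG
VOWELS = {"\u1C64", "\u1C69", "\u1C6E", "\u1C73", "\u1C5A", "\u1C5F"}  # LI, LU, LE, LO, LA, LAA
AHAD = "\u1C7D"      # keeps the full voiced value
PHAARKAA = "\u1C7C"  # allows the reduction before a vowel


def is_checked(text: str, i: int) -> bool:
    """Return True if the stop at text[i] is expected to be unvoiced and unreleased."""
    if text[i] not in VOICED_STOPS:
        return False
    following = text[i + 1] if i + 1 < len(text) else ""
    if following == AHAD:
        return False  # ahad retains the voiced value
    if following == PHAARKAA:
        return True   # phaarkaa applies the reduction even before a vowel
    # Otherwise the stop is reduced only when no vowel follows it.
    return following not in VOWELS


if __name__ == "__main__":
    word = "\u1C62\u1C6E\u1C71\u1C5F\u1C5C\u1C7C\u1C5F"  # verb form with phaarkaa
    print(is_checked(word, 4))
```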

Aspiration

Aspirated consonant sounds are indicated using [U+1C77 OL CHIKI LETTER OH] after the consonant [fp,2], eg. ᱡᱷᱚᱛᱚ ᱛᱷᱚᱲᱟ
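
Because aspiration is written as a digraph, software that needs to treat an aspirated consonant as one unit has to look for the consonant plus OH. A minimal sketch follows; the helper name `aspirated_digraphs` is hypothetical, and restricting the base letters to the ten stop letters listed under Basic consonants is an assumption.

```python
import re

OH = "\u1C77"  # OL CHIKI LETTER OH, the aspiration letter

# Assumption: only the ten stop letters take OH.
# EP, OB, AT, UD, UC, EDD, OTT, AAJ, AAK, AG
STOPS = "\u1C6F\u1C75\u1C5B\u1C6B\u1C6A\u1C70\u1C74\u1C61\u1C60\u1C5C"

_ASPIRATED = re.compile(f"[{STOPS}]{OH}")


def aspirated_digraphs(text: str) -> list[str]:
    """Return every stop + OH digraph found in the text, in order."""
    return _ASPIRATED.findall(text)


if __name__ == "__main__":
    # AAJ + OH + LA + AT + LA
    print(aspirated_digraphs("\u1C61\u1C77\u1C5A\u1C5B\u1C5A"))
```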

Symbols

The Ol Chiki Unicode block has no characters with the general category of Symbol.

Numbers

Digits

Ol Chiki has a set of native digits:

᱐␣᱑␣᱒␣᱓␣᱔␣᱕␣᱖␣᱗␣᱘␣᱙
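
Since these are ordinary decimal digits in Unicode, they can be read as numbers with standard tools. A minimal sketch in Python, where the helper names are hypothetical and the behaviour relies on the Unicode data bundled with the interpreter:

```python
import unicodedata


def ol_chiki_to_int(digits: str) -> int:
    """Parse a string of Ol Chiki digits as a decimal integer.

    Python's int() accepts any Unicode decimal digits, so no lookup table is needed.
    """
    return int(digits)


def digit_values(digits: str) -> list[int]:
    """Return the numeric value of each digit character."""
    return [unicodedata.decimal(ch) for ch in digits]


if __name__ == "__main__":
    print(ol_chiki_to_int("\u1C51\u1C52\u1C53"))  # digits one, two, three
```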

Text direction

Santali text runs left to right in horizontal lines.
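
The direction of the script can be checked against the Unicode Character Database. A minimal sketch using Python's standard `unicodedata` module; the helper name `bidi_classes` is hypothetical, and the result depends on the Unicode version shipped with the interpreter.

```python
import unicodedata

OL_CHIKI_FIRST = 0x1C50
OL_CHIKI_LAST = 0x1C7F


def bidi_classes(first: int = OL_CHIKI_FIRST, last: int = OL_CHIKI_LAST) -> set[str]:
    """Collect the distinct Bidi_Class values in a code point range."""
    return {unicodedata.bidirectional(chr(cp)) for cp in range(first, last + 1)}


if __name__ == "__main__":
    print(bidi_classes())
```

A result containing only the strong left-to-right class means that no character of the block needs special handling by the bidirectional algorithm.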

Glyph shaping & positioning

This section brings together information about the following topics: writing styles; cursive text; context-based shaping; context-based positioning; baselines, line height, etc.; font styles; case & other character transforms.

You can experiment with examples using the Ol Chiki character app.

Santali text is not cursive (ie. joined up like Arabic); however, there is some ligation in handwritten text which doesn't occur in printed content.

The orthography has no case distinction, and no special transforms are needed to convert between characters.

Font styling & weight

tbd

Graphemes

Grapheme clusters

Since there are no combining marks or decompositions, grapheme clusters correspond to individual characters.
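
The claim that the block contains no combining marks, no symbols, and no decompositions can be verified from character properties. A minimal sketch using Python's standard `unicodedata` module; the function names are hypothetical.

```python
import unicodedata
from collections import Counter

OL_CHIKI = range(0x1C50, 0x1C80)  # the Ol Chiki block, U+1C50..U+1C7F


def category_counts() -> Counter:
    """Count the General_Category values used in the block."""
    return Counter(unicodedata.category(chr(cp)) for cp in OL_CHIKI)


def has_marks_symbols_or_decompositions() -> bool:
    """True if any character is a mark (M*), a symbol (S*), combines, or decomposes."""
    for cp in OL_CHIKI:
        ch = chr(cp)
        if unicodedata.category(ch)[0] in "MS":
            return True
        if unicodedata.combining(ch) != 0 or unicodedata.decomposition(ch):
            return True
    return False


if __name__ == "__main__":
    print(category_counts())
    print(has_marks_symbols_or_decompositions())
```

The modifiers such as ahad and phaarkaa are classed as modifier letters (Lm) rather than combining marks, which is why default grapheme cluster segmentation treats each of them as a separate unit.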

Question: Should nasalisation or vowel extension dots be handled like combining characters, ie. form a grapheme with the preceding character?

Punctuation & inline features

Word boundaries

Word units are separated by spaces.

Paired words may be separated by [U+1C7C OL CHIKI PHAARKAA], eg. ᱥᱩᱡᱷᱼᱵᱩᱡᱷ sujʰ-bujʰ

Phrase & section boundaries

,␣;␣:␣᱾␣?␣!␣᱿

Santali uses mostly ASCII punctuation, but also some Indic punctuation from the Ol Chiki block.

phrase
, [U+002C COMMA]
; [U+003B SEMICOLON]
: [U+003A COLON]

sentence
᱾ [U+1C7E OL CHIKI PUNCTUATION MUCAAD]
? [U+003F QUESTION MARK]
! [U+0021 EXCLAMATION MARK]

section
᱿ [U+1C7F OL CHIKI PUNCTUATION DOUBLE MUCAAD]

The ASCII full stop is not used, since it could be confused with other dots in the orthography. Instead, [U+1C7E OL CHIKI PUNCTUATION MUCAAD] is the main sentence delimiter. [rp,11]

᱿ [U+1C7F OL CHIKI PUNCTUATION DOUBLE MUCAAD] is used at the end of a paragraph or some other block of text.

Example of mucaad and double mucaad in Santali text. [fp,6]

Observation: Samples in the Unicode proposals suggest that the mucaad and double mucaad punctuation is preceded by a space.
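
The delimiters described in this section can be used to split running text into sentences. A minimal sketch, assuming that mucaad, double mucaad, the question mark, and the exclamation mark all end a sentence and that any surrounding spaces belong to neither side; the function name `split_sentences` is hypothetical.

```python
import re

MUCAAD = "\u1C7E"
DOUBLE_MUCAAD = "\u1C7F"

# Optional spaces on either side of the terminator are swallowed,
# since samples show a space before mucaad and double mucaad.
_TERMINATOR = re.compile(f"\\s*([{MUCAAD}{DOUBLE_MUCAAD}?!])\\s*")


def split_sentences(text: str) -> list[tuple[str, str]]:
    """Return (sentence, terminator) pairs; the terminator is '' for trailing text."""
    parts = _TERMINATOR.split(text)
    # re.split with one capture group yields: body, terminator, body, terminator, ..., tail
    sentences = []
    for i in range(0, len(parts) - 1, 2):
        body = parts[i].strip()
        if body:
            sentences.append((body, parts[i + 1]))
    tail = parts[-1].strip()
    if tail:
        sentences.append((tail, ""))
    return sentences


if __name__ == "__main__":
    sample = "\u1C5A\u1C5E \u1C7E \u1C6B\u1C5A\u1C5C \u1C7F"
    for body, mark in split_sentences(sample):
        print(body, mark)
```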

Bracketed text

(␣)

Santali commonly uses ASCII parentheses to insert parenthetical information into text.

standard
start: ( [U+0028 LEFT PARENTHESIS]
end: ) [U+0029 RIGHT PARENTHESIS]

Quotations & citations

“␣”␣‘␣’

Santali texts use quotation marks around quotations. Of course, due to keyboard design, quotations may also be surrounded by ASCII double and single quote marks.

initial
start: “ [U+201C LEFT DOUBLE QUOTATION MARK]
end: ” [U+201D RIGHT DOUBLE QUOTATION MARK]

nested
start: ‘ [U+2018 LEFT SINGLE QUOTATION MARK]
end: ’ [U+2019 RIGHT SINGLE QUOTATION MARK]

Emphasis

tbd

Abbreviation, ellipsis & repetition

tbd

Inline notes & annotations

tbd

Other punctuation

tbd

Other inline text decoration

tbd

Line & paragraph layout

Line breaking & hyphenation

tbd

Observation: Lines appear to be broken at word boundaries.

Text alignment & justification

Observation: All but one of the samples in the Unicode submission document are fully justified. Mostly, the justification is achieved by stretching inter-word spacing; however, some words also have the space between characters stretched.

Example of full justification, with the word at the end of the 3rd line from the bottom also showing signs of being stretched. [fp,7]

Text spacing

tbd

This section looks at ways in which spacing is applied between characters over and above that which is introduced during justification.

Baselines, line height, etc.

tbd

Santali uses the so-called 'alphabetic' baseline, which is the same as for Latin and many other scripts.

Counters, lists, etc.

You can experiment with counter styles using the Counter styles converter. Patterns for using these styles in CSS can be found in Ready-made Counter Styles, and we use the names of those patterns here to refer to the various styles.

Santali Wikipedia pages use numeric styles.

Numeric

The numeric style is decimal-based and uses these digits.

᱑␣᱒␣᱓␣᱔␣᱕␣᱖␣᱗␣᱘␣᱙␣᱐

Examples:

᱑␣᱒␣᱓␣᱔␣᱑᱑␣᱒᱒␣᱓᱓␣᱔᱔␣᱑᱑᱑␣᱒᱒᱒␣᱓᱓᱓␣᱔᱔᱔

Prefixes and suffixes

A range of prefixes and/or suffixes is used in Wikipedia. They include a simple period, parentheses on both sides, and no mark.

List counters with parens. List counters with dots. List counters with no prefix/suffix.
Separators for Santali list counters in Wikipedia.
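
The numeric style and the three affix patterns seen in Wikipedia can be sketched as a small formatter. This is a minimal sketch: the function name `list_counter` and the style names "parens", "dot", and "plain" are hypothetical labels, and only non-negative integers are handled.

```python
OL_CHIKI_DIGITS = "\u1C50\u1C51\u1C52\u1C53\u1C54\u1C55\u1C56\u1C57\u1C58\u1C59"

# Hypothetical style names for the three observed patterns: (prefix, suffix)
AFFIXES = {
    "parens": ("(", ")"),
    "dot": ("", "."),
    "plain": ("", ""),
}


def list_counter(n: int, style: str = "dot") -> str:
    """Format a non-negative integer as a decimal list counter in Ol Chiki digits."""
    if n < 0:
        raise ValueError("only non-negative counters are supported")
    digits = "".join(OL_CHIKI_DIGITS[int(d)] for d in str(n))
    prefix, suffix = AFFIXES[style]
    return f"{prefix}{digits}{suffix}"


if __name__ == "__main__":
    for number in (1, 2, 11):
        print(list_counter(number, "parens"), list_counter(number), list_counter(number, "plain"))
```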

Styling initials

tbd

Page & book layout

This section is for any features that are specific to Ol Chiki and that relate to the following topics: general page layout & progression; grids & tables; notes, footnotes, etc; forms & user interaction; page numbering, running headers, etc.

References