Sinhala script summary

Updated 14 February, 2018 • tags sinhala, scriptnotes

This page provides basic information about the Sinhala script and its use for the Sinhalese language. It is not authoritative, peer-reviewed information – these are just notes I have gathered or copied from various places as I learned. For character-specific details follow the links to the Sinhala character notes.

For similar information related to other scripts, see the Script comparison table.

Sample (Sinhala)

1 වන වගන්තිය සියලු මනුෂ්‍යයෝ නිදහස්ව උපත ලබා ඇත. ගරුත්වයෙන් හා අයිතිවාසිකම්වලින් සමාන වෙති. යුක්ති අයුක්ති පිළිබඳ හැඟීමෙන් හා හෘදය සාක්ෂියෙන් යුත් ඔවුන්, ඔවුනොවුන්ට සැළකිය යුත්තේ සහෝදරත්වය පිළිබඳ හැඟීමෙනි.

2 වන වගන්තිය ජාති, වංශ, වර්ණ, ස්ත්‍රී පුරුෂ භාවය, භාෂාව, ආගම්, දේශපාලන ආදී කවර බේදයක් හෝ සමාජ, ජාතික, දේපළ, උපත ආදී කවර තත්ත්වයක විශේෂයක් හෝ නොමැතිව මේ ප්‍රකාශනයේ සඳහන් සියලු හිමිකම්වලට හා ස්වාධීනත්වයන්ට සෑම පුද්ගලයකුම උරුම වන්නේය. තවද යම් පුද්ගලයකු අයත්වන රටේ දේශපාලන, නීතිමය හෝ ජාත්‍යන්තර තත්ත්වයන් පිළිබඳ කිසිදු විශේෂයක් ද ඒ රටේ ස්වාධීන, භාරකාර, අස්වාධීන ආදී කවර තත්ත්වයක් පිළිබඳ විශේෂයක් ද නොමැතිව මේ හිමිකම් ඔහු සතු වන්නේය.

Usage & history

From Scriptsource:

The Sinhala script is used for writing the Sinhala language, spoken by approximately 15,500,000 people in Sri Lanka, and for transcribing the ancient Pali and Sanskrit languages. The script is derived from Brahmi, and shows close similarities to the Grantha script which was used in southern India until the 16th century. Sinhala is a diglossic language, that is, the spoken and written forms of the language show considerable variation. ...

There are two forms of the Sinhala script. The standard, 'pure', form which is taught in schools is called eḷu hōḍiya or śuddha hōḍiya. This system contains twenty consonant and twenty vowel letters and can be used to represent the sounds of the spoken language almost perfectly. However, to adhere to current spelling conventions - some of which represent archaic pronunciations - and to accurately transcribe Sanskrit, Pali, Hindi and English loanwords, a wider set of letters is needed. This set is called 'mixed alphabet' miśra hōḍiya and contains an additional eighteen consonant letters, many of which are aspirated equivalents of existing letters.

From Wikipedia:

The Sinhalese alphabet (Sinhalese: සිංහල අක්ෂර මාලාව) (Siṁhala Akṣara Mālāva) is an alphabet used by the Sinhalese people in Sri Lanka and elsewhere to write the Sinhalese language and also the liturgical languages Pali and Sanskrit. The Sinhalese alphabet, which is one of the Brahmic scripts, a descendant of the ancient Indian Brahmi script closely related to the South Indian Grantha script and Kadamba alphabet.

Sinhalese is often considered two alphabets, or an alphabet within an alphabet, due to the presence of two sets of letters. The core set, known as the śuddha siṃhala (pure Sinhalese, ශුද්ධ සිංහල) or eḷu hōḍiya (Eḷu alphabet එළු හෝඩිය), can represent all native phonemes. In order to render Sanskrit and Pali words, an extended set, the miśra siṃhala (mixed Sinhalese, මිශ්‍ර සිංහල), is available.

Key features

The Sinhala script is an abugida, ie. consonants carry an inherent vowel sound that is overridden using vowel signs. In Sinhala, consonants carry an inherent vowel a. See the table to the right for a brief overview of features, taken from the Script Comparison Table.

Modern Sinhala can be written using a subset of the letters available in the Sinhala Unicode block. The remainder are used for representing the sounds of Sanskrit, Pali and other languages, and include all the aspirated consonants (which are pronounced in the same way as the unaspirated ones). Unusually for indic scripts, there is also a set of prenasalised consonants, and there is an æ vowel.

The virama is usually displayed in consonant clusters. However, it is also possible to render clusters using conjunct forms (ligatures or reduced glyphs). Placing a zero width joiner (ZWJ) after the virama requests such a conjunct. Putting the ZWJ before the virama produces another form of conjunct, where adjacent consonants touch each other, but this is not used for modern Sinhalese.

Text runs from left to right.

Character lists

The Sinhala script characters in Unicode 10.0 are spread across 2 blocks:

  1. Sinhala (59 letters, 20 marks, 1 punctuation, 10 numbers : total 90)
  2. Sinhala Archaic Numbers (20 numbers : total 20)
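The per-category counts above can be roughly reproduced from the Unicode character database. The following is a minimal sketch using Python's standard `unicodedata` module; the function name `tally_block` and the grouping into four broad categories are illustrative choices, not something taken from the Unicode Standard. The totals depend on the Unicode version bundled with the Python interpreter, so they may be slightly higher than the Unicode 10.0 figures quoted here.

```python
import unicodedata
from collections import Counter

# The Sinhala block spans U+0D80..U+0DFF.
SINHALA_BLOCK = range(0x0D80, 0x0E00)

def tally_block(block=SINHALA_BLOCK):
    """Count assigned code points in a block, grouped by broad category."""
    groups = {"L": "letters", "M": "marks", "P": "punctuation", "N": "numbers"}
    counts = Counter()
    for cp in block:
        category = unicodedata.category(chr(cp))
        if category == "Cn":  # unassigned in this Unicode version
            continue
        counts[groups.get(category[0], "other")] += 1
    return counts

counts = tally_block()
print(dict(counts), "total:", sum(counts.values()))
```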

For character-specific details see Sinhala character notes.

Consonants

Sinhala uses a core set of consonants for writing the modern language of Sinhalese, but has an extended set used for writing Sanskrit, Pali, and Tamil words. Each consonant has the inherent vowel a.

The core set, or śuddha hōḍiya, is based on the classical grammar of the middle ages (called එළු හෝඩිය eḷu hōḍiya) and contains the following consonants:

Transliteration   IPA
k                 k
g                 g
ňg                ᵑɡ
c                 ʧ
j                 ʤ
ṭ                 ʈ
ḍ                 ɖ
ṇ                 n
ňḍ                ⁿɖ
t                 t
d                 d
ňd                ⁿd
p                 p
b                 b
m                 m
m̌b                ᵐb
y                 j
r                 r
l                 l
ḷ                 l
v                 ʋ
s                 s
h                 ɦ

The full set of consonants, known as miśra hōḍiya (mixed alphabet), includes the following additional consonants.

Transliteration   IPA
kh                k
gh                g
ṅ                 ŋ
ch                ʧ
jh                ʤ
ñ                 ɲ
ṭh                ʈ
ḍh                ɖ
th                t
dh                d
n                 n
ph                p
bh                b
ś                 ʃ
ṣ                 ʃ
f                 f

Note that the aspirated miśra consonants are mapped to the same sounds as the unaspirated śuddha ones, and the retroflex ṇ and ḷ are each pronounced without retroflexion.

Sinhalese has a new character for f, [U+0DC6 SINHALA LETTER FAYANNA]. Sometimes, instead, a character is used that combines the Latin letter 'f' with the Sinhalese p, [U+0DB4 SINHALA LETTER ALPAPRAANA PAYANNA].

Prenasalised consonants

A peculiarity of Sinhalese among indic scripts is the inclusion of prenasalised consonants, representing a nasal sound followed by a stop. The orthography distinguishes these graphemes from the more straightforward nasal consonant followed by a stop. For example, compare අඬ aňḍa aⁿɖa sound and අණ්ඩ aṇḍa aɳɖa egg.
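The difference between the two spellings in this example is visible in the underlying code points. This is a small sketch using Python's standard `unicodedata` module; the helper name `describe` is just for illustration.

```python
import unicodedata

# a + prenasalised ňḍa: a single consonant letter carries the nasal.
PRENASALISED = "\u0D85\u0DAC"
# a + ṇa + al-lakuna (virama) + ḍa: a separate nasal followed by a stop.
NASAL_PLUS_STOP = "\u0D85\u0DAB\u0DCA\u0DA9"

def describe(text):
    """Return 'U+XXXX NAME' for each code point in the text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

for word in (PRENASALISED, NASAL_PLUS_STOP):
    print(word)
    for line in describe(word):
        print("   ", line)
```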

Transliteration   IPA
ňg                ᵑɡ
ňj                ?
jña               ?
ňḍ                ⁿɖ
ňd                ⁿd
m̌b                ᵐb

The prenasalised shapes are formed from a combination of the shapes of the participating characters.

Consonant clusters & gemination

Consonant cluster handling is a little unusual in Sinhala, compared to other indic scripts.

There are 3 ways of managing consonant clusters. Modern Sinhala uses only the first two alternatives.

In all cases, ◌් [U+0DCA SINHALA SIGN AL-LAKUNA], called al lakuna, is used as the virama. However, different joining behaviour can be produced by adding ZWJ [U+200D ZERO WIDTH JOINER] before or after it.

The first approach simply shows the virama visually over the preceding consonant(s). The shape of the virama can take two forms, depending on the base character it is appended to: with k you get ක්; with kh you get ඛ්.

The second approach is principally used when combining r or y with another consonant (both before and after, in the case of r), and produces a reduced or ligated form. For this use:
◌් + ZWJ [U+0DCA SINHALA SIGN AL-LAKUNA + U+200D ZERO WIDTH JOINER].

The forms with r or y look like this when combined with k: ර්‍ක rk, ක්‍ර kr, and ක්‍ය ky. There are also forms using both, eg. ක්‍ය්‍ර kyra and කාර්‍ය්‍යාලය kāryyālaya. Although the use of the conjunct with r is required in normal Sinhalese text, it is possible to not use it: both කර්ම karma and කර්‍ම karma are valid. [S]

Wikipedia [W] lists several more conjuncts, some of which are reproduced below. The availability of these conjuncts is font dependent, eg. ඳ්‍ව ňdva doesn't ligate using the default font of this page, but may with another.

ක්‍ව   kva
ක්‍ෂ   kṣa
ත්‍ථ   ttha
ත්‍ව   tva
න්‍ද   nda
න්‍ධ   ndha
න්‍ද්‍ර   ndra

The third approach is used in ancient scriptures but is not used in modern Sinhala. [W] It hides the virama and moves the consonants alongside each other, so that they are touching, eg. ම‍්ම mm (cf. මම). For this use ZWJ first, ie.:
ZWJ + ◌් [U+200D ZERO WIDTH JOINER + U+0DCA SINHALA SIGN AL-LAKUNA​].
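The three approaches differ only in whether, and where, ZWJ is placed relative to the al lakuna. A minimal sketch of the resulting code point sequences follows; the function `cluster` and its `style` names are hypothetical labels for this illustration, and whether a font actually renders the requested form is a separate matter.

```python
AL_LAKUNA = "\u0DCA"  # SINHALA SIGN AL-LAKUNA (virama)
ZWJ = "\u200D"        # ZERO WIDTH JOINER

def cluster(first, second, style="visible"):
    """Join two consonants using one of the three approaches described above."""
    if style == "visible":    # approach 1: virama shown over the first consonant
        return first + AL_LAKUNA + second
    if style == "conjunct":   # approach 2: reduced or ligated form
        return first + AL_LAKUNA + ZWJ + second
    if style == "touching":   # approach 3: touching letters (not modern usage)
        return first + ZWJ + AL_LAKUNA + second
    raise ValueError(f"unknown style: {style}")

KA, RA, MA = "\u0D9A", "\u0DBB", "\u0DB8"

for style in ("visible", "conjunct", "touching"):
    text = cluster(KA, RA, style)
    print(style, text, " ".join(f"U+{ord(ch):04X}" for ch in text))
```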

Vowels

The vowel letters of Sinhala are also divided into a core and extended set.

The core (śuddha) alphabet includes the following.

Transliteration   IPA
a                 a, ə
ā                 aː, a
æ                 æ
ǣ                 æː
i                 i
ī
u                 u
ū
e                 e
ē
o                 o
ō

The extended (miśra) letters are:

Transliteration   IPA
r̥                 ri, ru
r̥̄                 riː, ruː
ai                ɑj
au                ɑw
l̥                 li
l̥̄                 liː

The pronunciations of [U+0D85 SINHALA LETTER AYANNA] and [U+0D86 SINHALA LETTER AAYANNA] vary, but in a fairly predictable way. The former is a in the first syllable, except for a few words, and before double consonants or clusters, and ə word finally and before single consonants. The latter represents aː everywhere except word-finally, where it may be a, depending on the word structure. Similar length rules apply to e and o in final position.

One particular affix, යි yi, is pronounced j and treated as a final consonant.

Multipart vowels

The following vowel signs have strokes that appear on both sides of the base. In decomposed text (NFD) these could be represented by multiple code points, but the Sinhala encoding standard recommends that only the single code point is used.

ේ   ē
ො   o
ෝ   ō
ෞ   au

The list above shows both precomposed and decomposed forms of the vowels.
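The relationship between the single code points and their decomposed forms can be inspected with Unicode normalization. This sketch uses Python's standard `unicodedata.normalize`; the list of four signs is taken from the list above, and the helper `code_points` is only for display.

```python
import unicodedata

# The four two-part vowel signs listed above: ē, o, ō, au.
TWO_PART_SIGNS = ["\u0DDA", "\u0DDC", "\u0DDD", "\u0DDE"]

def code_points(text):
    """Format a string as a space-separated list of U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

for sign in TWO_PART_SIGNS:
    decomposed = unicodedata.normalize("NFD", sign)
    recomposed = unicodedata.normalize("NFC", decomposed)
    print(code_points(sign), "-> NFD:", code_points(decomposed),
          "-> NFC:", code_points(recomposed))
```

Normalizing stored text to NFC is one way to end up with the single code point form; whether that is appropriate for a given application is a separate decision.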

Combining characters

Of the 20 combining characters in the Sinhala block, 17 are vowel signs. The remainder include the virama (described above), the anusvara, and the visarga.

-්   virama
-ං   anusvara
-ඃ   visarga

◌ං [U+0D82 SINHALA SIGN ANUSVARAYA​] usually represents the sound ŋ, eg. සිංහල siṃhala siŋhala Sinhala.

◌ඃ [U+0D83 SINHALA SIGN VISARGAYA] is also in the repertoire. It is not clear how it is used in Sinhala.

Either of these 'semi-consonants' must be used after a vowel or after a consonant+vowel (including the inherent vowel), and must be the last combining character in the syllable.
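The placement rule above can be expressed as a simple check. The following is a rough sketch only: the function name `semi_consonants_well_placed` is hypothetical, and the test for "vowel or consonant+vowel" is approximated by requiring the previous character to be a letter or a vowel sign (anything but the al lakuna), which is an assumption rather than a rule taken from an encoding standard.

```python
import unicodedata

SEMI_CONSONANTS = {"\u0D82", "\u0D83"}  # anusvaraya, visargaya
AL_LAKUNA = "\u0DCA"

def is_combining(ch):
    """True for nonspacing and spacing combining marks."""
    return unicodedata.category(ch) in ("Mn", "Mc")

def semi_consonants_well_placed(text):
    """Check that each anusvara/visarga follows a vowel and ends its syllable."""
    for i, ch in enumerate(text):
        if ch not in SEMI_CONSONANTS:
            continue
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        # Must follow a letter (inherent or independent vowel) or a vowel sign.
        follows_vowel = (
            bool(prev)
            and prev != AL_LAKUNA
            and prev not in SEMI_CONSONANTS
            and (unicodedata.category(prev) == "Lo" or is_combining(prev))
        )
        # Must be the last combining character in the syllable.
        ends_syllable = not (nxt and is_combining(nxt))
        if not (follows_vowel and ends_syllable):
            return False
    return True

print(semi_consonants_well_placed("\u0DC3\u0DD2\u0D82\u0DC4\u0DBD"))  # siṃhala
```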

Numbers

Sinhala uses European digits.

There is, however, a set of native digits, that were used into the 20th century, but mostly associated with horoscopes. The shapes of some of these are identical to characters used for other purposes.

෦ 0
෧ 1
෨ 2
෩ 3
෪ 4
෫ 5
෬ 6
෭ 7
෮ 8
෯ 9

There is also a historic number system, called Sinhala Illakkam, that was used prior to 1815. Its characters are all in the Sinhala Archaic Numbers block.

𑇡 1
𑇢 2
𑇣 3
𑇤 4
𑇥 5
𑇦 6
𑇧 7
𑇨 8
𑇩 9
𑇪 10
𑇫 20
𑇬 30
𑇭 40
𑇮 50
𑇯 60
𑇰 70
𑇱 80
𑇲 90
𑇳 100
𑇴 1000
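The numeric values of both sets of native numbers are recorded in the Unicode character database, so they can be read programmatically. This is a small sketch with Python's standard `unicodedata` module. It assumes the native digits occupy U+0DE6..U+0DEF and the archaic numbers U+111E1..U+111F4, and it needs a Python version whose Unicode data includes these characters (they were added in Unicode 7.0).

```python
import unicodedata

# Assumed code point ranges (see the lead-in above).
NATIVE_DIGITS = [chr(cp) for cp in range(0x0DE6, 0x0DF0)]
ARCHAIC_NUMBERS = [chr(cp) for cp in range(0x111E1, 0x111F5)]

# Decimal digits have a digit value; the archaic numbers only a numeric value.
native_values = {ch: unicodedata.digit(ch) for ch in NATIVE_DIGITS}
archaic_values = {ch: int(unicodedata.numeric(ch)) for ch in ARCHAIC_NUMBERS}

for ch, value in archaic_values.items():
    print(f"U+{ord(ch):05X} {unicodedata.name(ch)} = {value}")
```

Note that the archaic set has no zero and has separate signs for the tens, 100 and 1000, so it cannot be converted digit by digit the way the native decimal digits can.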

Context-based glyph changes

Contextual shaping

Similarly to the Tamil script, the u and ū vowels assume various different shapes and connection points, depending on what consonant they follow.

Base   -u   -ū
k      කු   කූ
p      පු   පූ
r      රු   රූ
ḷ      ළු   ළූ

Shape variants for the u and ū vowel signs.

Other idiosyncratic combinations are also possible, such as the rendering of ræ and rǣ.

Base   -æ   -ǣ
r      රැ   රෑ

Shape variants for the æ and ǣ vowel signs.

Combining characters may need to be adapted to fit the consonants they are attached to.

ක් ඛ්    පි රි ඬි

Two different versions of hal kirīma (left); differently shaped i in pi, ri and ňḍi (right).

As described above, consonant clusters may cause conjuncts to form, as a way of indicating that there are no intervening vowels. Conjunct ligations are generally expected for r and y, and other conjuncts depend on font availability. Generally, a conjunct is formed by reducing the non-final consonant shapes.

ක්‍ව    ක්ව

Conjoined kv (left), and kv with hal kirīma (right).

Context-based positioning

Vowel signs may appear above, below, to the right, to the left, or on both sides of the base consonant.

ක කි කු කැ කෙ කො

Position of vowel signs for the sequence ka ki ku kæ ke ko.

Vowel signs are positioned around an orthographic syllable, rather than around a specific consonant. So a part of a vowel sign that appears to the left of its base will appear to the left of a conjunct.

ක්‍වො

In the syllable kvo the vowel sign appears on either side of the conjunct, not the letter v.

When a u vowel (or the long vowel) appears below a conjunct, it is placed below the final consonant, eg. ක්‍යු kyu.

Text layout

Text direction

Sinhala script is written horizontally and left to right.

Text delimiters

Words are separated by spaces.

Sinhala uses western punctuation.

The punctuation character [U+0DF4 SINHALA PUNCTUATION KUNDDALIYA] once functioned to indicate the end of a paragraph, but is not used for modern Sinhala content.

TBD

Other features to be investigated in this section include: text delimiters, emphasis & highlighting, text decoration, abbreviations & ellipsis, hyphens & dashes, character transforms, quotations, ruby, repetition, line breaking, hyphenation, justification & alignment, first-letter styling, vertical text, notes & footnotes, page layout.

Input

The Sinhala keyboard has dead keys which change the assignments of keys around them when pressed. For example, pressing the key for e will change several keys to letters that start with the same symbol.

Sinhala keyboard in default state.

Sinhala keyboard after the key for e is pressed.

Note also, in the bottom left corner, that the keyboard has a key for the combination of ◌් + ZWJ + ර [U+0DCA SINHALA SIGN AL-LAKUNA + U+200D ZERO WIDTH JOINER + U+0DBB SINHALA LETTER RAYANNA], ie. the conjoined -r. The shifted layout has a similar key for -y.

There is a rephaya key (for the sequence ර + ◌් + ZWJ [U+0DBB SINHALA LETTER RAYANNA + U+0DCA SINHALA SIGN AL-LAKUNA + U+200D ZERO WIDTH JOINER]), but it is typed after the consonant that normally follows it in memory. The input method then has to rearrange the codepoints in canonical order.

Effectively, you type characters or parts of multipart characters in visual order, and the system then has to rearrange things to produce the expected codepoint order.
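To make the reordering concrete, here is a toy sketch of what such a rearrangement might look like. It is not the algorithm of any real Sinhala keyboard or input method: the function `visual_to_logical`, the idea of receiving key outputs as a list, and the handling of only two cases (the left-side vowel sign and the rephaya) are all assumptions made for illustration.

```python
import unicodedata

KOMBUVA = "\u0DD9"              # vowel sign e, displayed to the left of its base
REPHAYA = "\u0DBB\u0DCA\u200D"  # ra + al-lakuna + ZWJ

def is_consonant(key):
    """Rough test: a single code point in the Sinhala consonant range."""
    return len(key) == 1 and "\u0D9A" <= key <= "\u0DC6"

def visual_to_logical(keys):
    """Rearrange key outputs typed in visual order into storage order."""
    out = []
    pending_left = ""
    for key in keys:
        if key == KOMBUVA:
            pending_left = key      # hold until its base consonant is typed
        elif key == REPHAYA:
            # Typed after the consonant, but stored before it.
            i = len(out) - 1
            while i >= 0 and not is_consonant(out[i]):
                i -= 1
            if i >= 0:
                out.insert(i, key)
            else:
                out.append(key)
        else:
            out.append(key)
            if pending_left and is_consonant(key):
                out.append(pending_left)
                pending_left = ""
    # NFC merges the parts of a two-part vowel sign into a single code point.
    return unicodedata.normalize("NFC", "".join(out))

KA = "\u0D9A"
print(visual_to_logical([KOMBUVA, KA]))            # ke
print(visual_to_logical([KA, REPHAYA]))            # rka
print(visual_to_logical([KOMBUVA, KA, "\u0DCF"]))  # ko
```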

References

  1. [U] The Unicode Standard v10.0, Sinhala, pp513-514.
  2. [D] Peter T. Daniels and William Bright, The World's Writing Systems, Oxford University Press, ISBN 0-19-507993-0, pp408-412
  3. [W] Sinhalese alphabet.
  4. [S] Sri Lanka Standard, Sinhala Character Code for Information Exchange.
Last changed 2018-02-14 7:02 GMT.  •  Licence CC-By © r12a.