Malayalam

orthography notes

Updated 13 April, 2025

This page brings together basic information about the Malayalam script and its use for the Malayalam language. It aims to provide a brief, descriptive summary of the modern, printed orthography and typographic features, and to advise how to write Malayalam using Unicode.

Referencing this document

Richard Ishida, Malayalam Orthography Notes, 13-Apr-2025, https://r12a.github.io/scripts/mlym/ml

 

Phonological transcriptions should be treated as a guide, only. They are taken from the sources consulted, and may be narrow or broad, phonemic or phonetic, depending on what is available. They mostly represent pronunciation of words in isolation. For more detailed information about allophones, alternations, sandhi, dialectal differences, and so on, follow the links to cited references.

This is an interactive document. Click/tap on the following to reveal detailed information and examples for each character: (a) coloured characters in examples and lists; (b) link text on character names. If your browser supports it, your cursor will change as you hover over these items.

More about using this page

Character names. The names of characters in codepoint markup drop the initial MALAYALAM label (purely to reduce the length of the examples). In other places the full name can be found.

Navigation. The Toggle images icon opens the table of contents in a popup window. Dismiss it by clicking on the X alongside it, or by hitting the ESC key.

Detailed character notes. Clicking on coloured characters in lists or on character names opens panels that give detailed information about each character. This information is taken from the companion document, Malayalam Character Notes. (Those panels can be dismissed by pressing on the ESC key.)

Transcriptions & transliterations. Phonological transcriptions are surrounded by ⌈corner brackets⌋, to indicate that they vary between narrow, [phonetic] and broad, /phonemic/ transcriptions.
Latin transcriptions between <angle brackets> represent the letters as commonly written in the Latin script.
A transliteration has also been developed especially for this orthography, and is generally based on the sound of a letter where possible, but where a letter has multiple pronunciations, the transliteration represents only one.
Transliterations provide perfect round-trip conversion between the native script and Latin, whereas Latin transcriptions rarely do.
When you click on an example to see its composition, the top of the panel that opens contains a transliteration, followed by the native text, then (if available) an IPA transcription.

Sample

വകുപ്പ്‌ 1. മനുഷ്യരെല്ലാവരും തുല്യാവകാശങ്ങളോടും അന്തസ്സോടും സ്വാതന്ത്ര്യത്തോടുംകൂടി ജനിച്ചിട്ടുള്ളവരാണ്‌. അന്യോന്യം ഭ്രാതൃഭാവത്തോടെ പെരുമാറുവാനാണ്‌ മനുഷ്യന്നു വിവേകബുദ്ധിയും മനസ്സാക്ഷിയും സിദ്ധമായിരിക്കുന്നത്‌.

വകുപ്പ്‌ 2. ജാതി, മതം, നിറം, ഭാഷ, സ്ത്രീപുരുഷഭേദം, രാഷ്ട്രീയാഭിപ്രായം സ്വത്ത്‌, കുലം എന്നിവയെ കണക്കാക്കാതെ ഈ പ്രഖ്യാപനത്തില്‍ പറയുന്ന അവകാശങ്ങള്‍ക്കും സ്വാതന്ത്ര്യത്തിനും സര്‍വ്വജനങ്ങളും അര്‍ഹരാണ്‌. രാഷ്ട്രീയ സ്ഥിതിയെ അടിസ്ഥാനമാക്കി (സ്വതന്ത്രമോ, പരിമിത ഭരണാധികാരത്തോടു കൂടിയതോ ഏതായാലും വേണ്ടതില്ല) ഈ പ്രഖ്യാപനത്തിലെ അവകാശങ്ങളെ സംബന്ധിച്ചേടത്തോളം യാതൊരു വ്യത്യാസവും യാതൊരാളോടും കാണിക്കാന്‍ പാടുള്ളതല്ല.

Source: Unicode UDHR, articles 1 & 2

Usage & history

Origins of the Malayalam script, 13thC – today.

Phoenician

└ Aramaic

└ Brahmi

└ Tamil-Brahmi

└ Pallava

└ Grantha

└ Malayalam

+ Tigalari

+ Dhives Akuru

+ Saurashtra

The Malayalam script is used to write the Malayalam language of Kerala state, which is spoken by 35 million people including the diaspora. According to the Ethnologue, the script is also used for another 10 minority languages. It is also widely used for writing Sanskrit texts in Kerala.

മലയാളലിപി mlyāɭlipi mələjɑːɭə lɪpɪ Malayalam script

Originally descended from Brahmi, the Malayalam script is a Vatteluttu alphabet extended with symbols from the Grantha alphabet to represent Indo-Aryan loanwords. Throughout its history, it has absorbed words from Tamil, Sanskrit, Arabic, and English.

In the 1970s and 1980s, Malayalam underwent orthographic reform due to printing difficulties. A significant change involved the introduction of a visible virama (chandrakkala) rather than conjunct forms, and simplification of a number of forms, including consonant plus -u/-uu combinations.

Sources: The Unicode Standard, Wikipedia.

Script code: mlym
Language code: ml
Script type: abugida
Origins: asia
Native speakers: 38,000,000

Total characters: 93
Letters: 61
Combining marks: 15
Symbols: 1
Punctuation: 4
Numbers: 10
Other: 2
Possible other: 21
Unicode blocks: 1

Character counts above are for this orthography but exclude ASCII.

Text direction: ltr
Post-consonant vowels: 1 inherent vowel, marks, vocalics, composite vowels, pre-base marks, circumgraphs
Standalone vowels: letters
Case distinction: no
Cursive script: no
Combining marks: no
Clusters marked: no
Dedicated finals: marks
Other ligatures: yes
Word separator: space
Wraps at: word
G Clusters OK?: no
Justification: spaces
Baseline: romn

Basic features

The script is an abugida. Consonants carry an inherent vowel which can be modified by appending vowel signs to the consonant. See the table to the right for a brief overview of features for the modern Malayalam orthography.

The Malayalam script was significantly simplified at the beginning of the 1970s. Prior to the orthographic reform there were many more ligated forms. In particular, the vowels u/ū, and r in 2nd position in a consonant cluster, were reduced from ligated forms to simple, unchanging glyphs alongside a consonant.

Generally, words are separated by spaces. However, the number of characters between spaces can be quite high, because spaces are sometimes used to indicate phonological pauses rather than lexical boundaries.

❯ Consonant summary table

Malayalam uses 36 basic consonant letters.

Consonant clusters are typically indicated in modern Malayalam using the visible chandrakkala mark (virama), which indicates that no vowel follows a consonant. Conjunct forms are also expressed using stacked consonants, and conjoined consonants, where the chandrakkala is still used but hidden, and special chillu shapes.

As part of a cluster, RA has special forms. As a medial consonant in the modern orthography it appears as a simple glyph to the left of the letters spoken before it. When initial in the cluster its glyph includes a cillu hook at the top right. There are also special rules involving clusters of multiple RA letters.

Syllable-final consonant sounds may be represented by 2 dedicated combining marks (anusvara & visarga), but are generally ordinary consonants with chandrakkala, or 6 chillu forms. The word-final virama sometimes represents a half-u sound, rather than completely killing the inherent vowel. Because of this, Malayalam uses a set of syllable-final consonants called chillus that have no vowel sound associated with them.

❯ Vowel summary table

The Malayalam orthography is an abugida with one inherent vowel. It represents other vowels using 12 vowel signs, all combining marks. Also the word-final half-u sound is written in modern Malayalam using U+0D4D SIGN VIRAMA (candrakkala).

The orthography includes 3 pre-base vowels and 3 circumgraphs. All circumgraphs can be decomposed, creating composite vowels. The only composite vowels are those created by decomposition of the circumgraphs, and involve 2 glyphs, one on each side of the base consonant(s).

Standalone vowel sounds are written using 12 independent vowels, one for each vowel sound, including the inherent vowel.

There is also a set of vocalics.

There is an archaic set of numbers that include digits beyond the normal 0-9 range, and include a number of fractional symbols.
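The numeric values of these characters can be inspected programmatically. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `describe_number` is an assumption made for this illustration, and the values reported depend on the Unicode data shipped with the Python build in use.

```python
import unicodedata

def describe_number(ch):
    """Return (Unicode name, numeric value) for one numeric character."""
    return unicodedata.name(ch), unicodedata.numeric(ch)

# The ordinary decimal digits can be parsed directly by int().
modern = "\u0d67\u0d68\u0d69"  # MALAYALAM DIGIT ONE, TWO, THREE
print(int(modern))

# Archaic number signs carry values beyond the 0-9 range, or fractions.
for ch in "\u0d70\u0d71\u0d72\u0d74":  # TEN, ONE HUNDRED, ONE THOUSAND, ONE HALF
    print(describe_number(ch))
```

Note that `int()` only understands the decimal digits; the archaic signs have a numeric value but are not positional digits, so they need to be handled separately.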

Character index

The index points to locations where a character is mentioned in this page, and indicates whether it is used by the Malayalam orthography described here.

Letters

Basic consonants

U+0D2A MALAYALAM LETTER PA, consonant: p p
U+0D2C MALAYALAM LETTER BA, consonant: b b
U+0D24 MALAYALAM LETTER TA, consonant: t
U+0D26 MALAYALAM LETTER DA, consonant: d
U+0D1F MALAYALAM LETTER TTA, consonant: ʈ
U+0D21 MALAYALAM LETTER DDA, consonant: ɖ
U+0D15 MALAYALAM LETTER KA, consonant: k k
U+0D17 MALAYALAM LETTER GA, consonant: ɡ g
U+0D2B MALAYALAM LETTER PHA, consonant: ph
U+0D2D MALAYALAM LETTER BHA, consonant: bh
U+0D25 MALAYALAM LETTER THA, consonant: t̪ʰ th
U+0D27 MALAYALAM LETTER DHA, consonant: d̪ʰ dh
U+0D20 MALAYALAM LETTER TTHA, consonant: ʈʰ ṭh
U+0D22 MALAYALAM LETTER DDHA, consonant: ɖʰ ḍh
U+0D16 MALAYALAM LETTER KHA, consonant: kh
U+0D18 MALAYALAM LETTER GHA, consonant: ɡʰ gh
U+0D1A MALAYALAM LETTER CA, consonant: t͡ʃ c
U+0D1C MALAYALAM LETTER JA, consonant: ɟ j
U+0D1B MALAYALAM LETTER CHA, consonant: t͡ʃʰ ch
U+0D1D MALAYALAM LETTER JHA, consonant: ɟʰ jh
U+0D38 MALAYALAM LETTER SA, consonant: s s
U+0D37 MALAYALAM LETTER SSA, consonant: ʂ
U+0D36 MALAYALAM LETTER SHA, consonant: ɕ~ʃ ś
U+0D39 MALAYALAM LETTER HA, consonant: ɦ h
U+0D2E MALAYALAM LETTER MA, consonant: m m
U+0D28 MALAYALAM LETTER NA, consonant: n̪~n n
U+0D23 MALAYALAM LETTER NNA, consonant: ɳ
U+0D1E MALAYALAM LETTER NYA, consonant: ɲ ñ
U+0D19 MALAYALAM LETTER NGA, consonant: ŋ
U+0D35 MALAYALAM LETTER VA, consonant: ʋ v
U+0D30 MALAYALAM LETTER RA, consonant: r
U+0D31 MALAYALAM LETTER RRA, consonant: r t
U+0D34 MALAYALAM LETTER LLLA, consonant: ɻ
U+0D32 MALAYALAM LETTER LA, consonant: l l
U+0D33 MALAYALAM LETTER LLA, consonant: ɭ
U+0D2F MALAYALAM LETTER YA, consonant: j y

Vowels

U+0D07 MALAYALAM LETTER I, independent vowel: i i
U+0D08 MALAYALAM LETTER II, independent vowel: ī
U+0D09 MALAYALAM LETTER U, independent vowel: u u
U+0D0A MALAYALAM LETTER UU, independent vowel: ū
U+0D0E MALAYALAM LETTER E, independent vowel: e e
U+0D0F MALAYALAM LETTER EE, independent vowel: ē
U+0D12 MALAYALAM LETTER O, independent vowel: o o
U+0D13 MALAYALAM LETTER OO, independent vowel: ō
U+0D05 MALAYALAM LETTER A, independent vowel: a a
U+0D06 MALAYALAM LETTER AA, independent vowel: ā
U+0D10 MALAYALAM LETTER AI, independent vowel: ai̯ ai
U+0D14 MALAYALAM LETTER AU, independent vowel: au̯ au

Vocalics

U+0D0B MALAYALAM LETTER VOCALIC R, vocalic independent vowel: r̥̣
U+0D60 (rare) MALAYALAM LETTER VOCALIC RR, vocalic independent vowel: Very rare. rɨː r̥̣̄
U+0D0C (rare) MALAYALAM LETTER VOCALIC L, vocalic independent vowel: Used in one Sanskrit word only. l̥̣
U+0D61 (rare) MALAYALAM LETTER VOCALIC LL, vocalic independent vowel: Very rare. lɨː l̥̣̄

Dead consonants

ൿ U+0D7F MALAYALAM LETTER CHILLU K, chillu consonant: k k^
U+0D7A MALAYALAM LETTER CHILLU NN, chillu consonant: ɳ ṇ^
U+0D7B MALAYALAM LETTER CHILLU N, chillu consonant: n n^
U+0D7C MALAYALAM LETTER CHILLU RR, chillu consonant: r r^
U+0D7D MALAYALAM LETTER CHILLU L, chillu consonant: l l^
U+0D7E MALAYALAM LETTER CHILLU LL, chillu consonant: ɭ ḷ^
U+0D54 MALAYALAM LETTER CHILLU M, chillu consonant: m
U+0D55 MALAYALAM LETTER CHILLU Y, chillu consonant: j
U+0D56 MALAYALAM LETTER CHILLU LLL, chillu consonant: l?

Not used for modern Malayalam

U+0D29 (archaic) MALAYALAM LETTER NNNA, consonant: n
U+0D3A (archaic) MALAYALAM LETTER TTTA, consonant: t

Combining marks

Vowels

U+0D46 MALAYALAM VOWEL SIGN E, vowel sign: e e
U+0D47 MALAYALAM VOWEL SIGN EE, vowel sign: ē
U+0D48 MALAYALAM VOWEL SIGN AI, vowel sign: ai̯ ai
U+0D4A MALAYALAM VOWEL SIGN O, vowel sign: o o
U+0D4B MALAYALAM VOWEL SIGN OO, vowel sign: ō
ി U+0D3F MALAYALAM VOWEL SIGN I, vowel sign: i i
U+0D40 MALAYALAM VOWEL SIGN II, vowel sign: ī
U+0D41 MALAYALAM VOWEL SIGN U, vowel sign: u u
U+0D42 MALAYALAM VOWEL SIGN UU, vowel sign: ū
U+0D3E MALAYALAM VOWEL SIGN AA, vowel sign: ā
U+0D57 MALAYALAM AU LENGTH MARK, vowel sign: au̯ au
U+0D4C (archaic) MALAYALAM VOWEL SIGN AU, vowel sign: Historic use only. au̯ au

Vocalics

U+0D43 MALAYALAM VOWEL SIGN VOCALIC R, vocalic vowel sign
U+0D44 (unused) MALAYALAM VOWEL SIGN VOCALIC RR, vocalic vowel sign: Very rare. rɨː r̥̄
U+0D62 (unused) MALAYALAM VOWEL SIGN VOCALIC L, vocalic vowel sign: Very rare.
U+0D63 (unused) MALAYALAM VOWEL SIGN VOCALIC LL, vocalic vowel sign: Very rare. lɨː l̥̄

Bindu

U+0D02 MALAYALAM SIGN ANUSVARA, final consonant: m

Virama

U+0D4D MALAYALAM SIGN VIRAMA, virama

Visarga

U+0D03 MALAYALAM SIGN VISARGA, final consonant: ɦ

Not used for modern Malayalam

U+0D4E (archaic) MALAYALAM LETTER DOT REPH, repha: Used before the 1970s reform.

Numbers

U+0D66 (infrequent) MALAYALAM DIGIT ZERO, digit: 0
U+0D67 (infrequent) MALAYALAM DIGIT ONE, digit: 1 1
U+0D68 (infrequent) MALAYALAM DIGIT TWO, digit: 2 2
U+0D69 (infrequent) MALAYALAM DIGIT THREE, digit: 3 3
U+0D6A (infrequent) MALAYALAM DIGIT FOUR, digit: 4 4
U+0D6B (infrequent) MALAYALAM DIGIT FIVE, digit: 5 5
U+0D6C (infrequent) MALAYALAM DIGIT SIX, digit: 6 6
U+0D6D (infrequent) MALAYALAM DIGIT SEVEN, digit: 7 7
U+0D6E (infrequent) MALAYALAM DIGIT EIGHT, digit: 8 8
U+0D6F (infrequent) MALAYALAM DIGIT NINE, digit: 9 9

Not used for modern Malayalam

U+0D70 (archaic) MALAYALAM NUMBER TEN, number sign
U+0D71 (archaic) MALAYALAM NUMBER ONE HUNDRED, number sign
U+0D72 (archaic) MALAYALAM NUMBER ONE THOUSAND, number sign
U+0D73 (archaic) MALAYALAM FRACTION ONE QUARTER, number sign
U+0D74 (archaic) MALAYALAM FRACTION ONE HALF, number sign
U+0D75 (archaic) MALAYALAM FRACTION THREE QUARTERS, number sign
U+0D58 (archaic) MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH, fraction
U+0D59 (archaic) MALAYALAM FRACTION ONE FORTIETH, fraction
U+0D5A (archaic) MALAYALAM FRACTION THREE EIGHTIETHS, fraction
U+0D5B (archaic) MALAYALAM FRACTION ONE TWENTIETH, fraction
U+0D5C (archaic) MALAYALAM FRACTION ONE TENTH, fraction
U+0D5D (archaic) MALAYALAM FRACTION THREE TWENTIETHS, fraction
U+0D5E (archaic) MALAYALAM FRACTION ONE FIFTH, fraction
U+0D76 (archaic) MALAYALAM FRACTION ONE SIXTEENTH, fraction
U+0D77 (archaic) MALAYALAM FRACTION ONE EIGHTH, fraction
U+0D78 (archaic) MALAYALAM FRACTION THREE SIXTEENTHS, fraction
U+A830 (archaic) NORTH INDIC FRACTION ONE QUARTER, fraction
U+A831 (archaic) NORTH INDIC FRACTION ONE HALF, fraction
U+A832 (archaic) NORTH INDIC FRACTION THREE QUARTERS, fraction

Punctuation

U+0964 (archaic) DEVANAGARI DANDA, phrase separator for older texts: .
U+0965 (archaic) DEVANAGARI DOUBLE DANDA, phrase separator for older texts

ASCII

U+2018 LEFT SINGLE QUOTATION MARK, quotation mark
U+2019 RIGHT SINGLE QUOTATION MARK, quotation mark
U+201C LEFT DOUBLE QUOTATION MARK, quotation mark
U+201D RIGHT DOUBLE QUOTATION MARK, quotation mark
( U+0028 LEFT PARENTHESIS, parenthesis: (
) U+0029 RIGHT PARENTHESIS, parenthesis: )
, U+002C COMMA, comma: ,
. U+002E FULL STOP, full stop: .
: U+003A COLON, colon: :
; U+003B SEMICOLON, semicolon: ;
? U+003F QUESTION MARK, question mark: ?
! U+0021 EXCLAMATION MARK, exclamation mark: !

Symbols

U+0D79 (rare) MALAYALAM DATE MARK, date sign: Usage is fading.

Not used for modern Malayalam

U+0D4F (archaic) MALAYALAM SIGN PARA, measure of rice

Other

ZWNJ U+200C ZERO WIDTH NON-JOINER, zero-width non-joiner
ZWJ U+200D ZERO WIDTH JOINER, zero-width joiner

To be investigated

% U+0025 (tbc) PERCENT SIGN, percentage mark
[ U+005B (tbc) LEFT SQUARE BRACKET, bracket: [
] U+005D (tbc) RIGHT SQUARE BRACKET, bracket: ]
§ U+00A7 (tbc) SECTION SIGN, section sign: §
« U+00AB (tbc) LEFT-POINTING DOUBLE ANGLE QUOTATION MARK, quotation mark
» U+00BB (tbc) RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK, quotation mark
ʼ U+02BC (tbc) MODIFIER LETTER APOSTROPHE, apostrophe: ʼ
U+034F (tbc) COMBINING GRAPHEME JOINER, combining grapheme joiner
U+0D00 (tbc) MALAYALAM SIGN COMBINING ANUSVARA ABOVE
U+0D01 (tbc) MALAYALAM SIGN CANDRABINDU, nasalisation: ̃ ̃
U+0D3B (tbc) MALAYALAM SIGN VERTICAL BAR VIRAMA
U+0D3C (tbc) MALAYALAM SIGN CIRCULAR VIRAMA
U+0D3D (tbc) MALAYALAM SIGN AVAGRAHA, avagraha
U+200B (tbc) ZERO WIDTH SPACE, zero-width space
U+2011 (tbc) NON-BREAKING HYPHEN, non-breaking hyphen
U+2013 (tbc) EN DASH, en dash
U+2014 (tbc) EM DASH, em dash
U+2020 (tbc) DAGGER, dagger
U+2021 (tbc) DOUBLE DAGGER, double dagger
U+2026 (tbc) HORIZONTAL ELLIPSIS, ellipsis
U+2030 (tbc) PER MILLE SIGN, per mille mark
U+2032 (tbc) PRIME, prime
U+2033 (tbc) DOUBLE PRIME, double prime
U+2039 (tbc) SINGLE LEFT-POINTING ANGLE QUOTATION MARK, quotation mark
U+203A (tbc) SINGLE RIGHT-POINTING ANGLE QUOTATION MARK, quotation mark
U+2060 (tbc) WORD JOINER, word joiner

Phonology

These are sounds for the Malayalam language.

Click on the sound groups to see where else in the document each of the sounds is referred to.

Phones in a lighter colour are non-native or allophones. Source: Wikipedia.

Vowel sounds

Plain vowels

i ɨ̆ ɨ̆ u e o ə ə a

Diphthongs

ai au

Consonant sounds

labial dental alveolar post-alveolar retroflex palatal velar glottal
stops p b t d t   ʈ ɖ   k ɡ  
aspirates     ʈʰ ɖʰ   ɡʰ  
affricates       t͡ʃ d͡ʒ        
aspirates       t͡ʃʰ d͡ʒʰ        
fricatives f   s ɕ ʂ     h
nasals m n   ɳ ɲ ŋ
approximants ʋ   l   ɻ ɭ j  
trills/flaps     ɾ   ɽ

Tone

Malayalam is not a tonal language.

Structure

tbd

Vowels

The Malayalam orthography is an abugida with one inherent vowel. It represents other vowels using 12 vowel signs, all combining marks. Also the word-final half-u sound is written in modern Malayalam using U+0D4D SIGN VIRAMA (candrakkala).

The orthography includes 3 pre-base vowels and 3 circumgraphs. All circumgraphs can be decomposed, creating composite vowels. The only composite vowels are those created by decomposition of the circumgraphs, and involve 2 glyphs, one on each side of the base consonant(s).

Standalone vowel sounds are written using 12 independent vowels, one for each vowel sound, including the inherent vowel.

There is also a set of vocalics.

Vowel summary table

The following table summarises the main vowel to character assignments.

ⓘ represents the inherent vowel. Diacritics are added to the vowels to indicate nasalisation (not shown here).

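Plain:
i: post-consonant ി U+0D3F; standalone U+0D07
ī: post-consonant U+0D40; standalone U+0D08
u: post-consonant U+0D41; standalone U+0D09
ū: post-consonant U+0D42; standalone U+0D0A
e: post-consonant U+0D46; standalone U+0D0E
ē: post-consonant U+0D47; standalone U+0D0F
o: post-consonant U+0D4A; standalone U+0D12
ō: post-consonant U+0D4B; standalone U+0D13
ə: post-consonant U+0D4D; standalone എ് U+0D0E + U+0D4D
a: post-consonant ⓘ (inherent vowel); standalone U+0D05
ā: post-consonant U+0D3E; standalone U+0D06

Diphthongs:
ai̯ (ai): post-consonant U+0D48; standalone U+0D10
au̯ (au): post-consonant U+0D57, archaic U+0D4C; standalone U+0D14

Vocalics:
r̥̣: post-consonant U+0D43; standalone U+0D0B
r̥̣̄ (rare): standalone U+0D60
l̥̣ (rare): standalone U+0D0C
l̥̣̄ (rare): standalone U+0D61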

For additional details see Vowel sounds to characters.

Inherent vowel

ka U+0D15 MALAYALAM LETTER KA

An inherent vowel is a vowel sound that is automatically pronounced after a consonant letter, unless specifically suppressed.

The inherent vowel for Malayalam is pronounced a, so ka is written by simply using the consonant letter, eg.

കനത്ത kanat̪t̪a heavy
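As a rough illustration of the rule, the sketch below lists the consonants in a string that keep their inherent vowel, ie. those not followed by a vowel sign or a virama. The function names and the consonant range U+0D15 to U+0D39 are assumptions made for this example, and the half-u reading of the virama is ignored.

```python
import unicodedata

VIRAMA = "\u0d4d"  # MALAYALAM SIGN VIRAMA (chandrakkala)

def is_consonant(ch):
    # Assumption: basic consonants occupy U+0D15 (KA) to U+0D39 (HA).
    return "\u0d15" <= ch <= "\u0d39"

def is_vowel_sign(ch):
    # Dependent vowels are named "... VOWEL SIGN ..."; U+0D57 is the AU length mark.
    return "VOWEL SIGN" in unicodedata.name(ch, "") or ch == "\u0d57"

def inherent_vowel_carriers(text):
    """Return short names of consonants not followed by a vowel sign or virama."""
    carriers = []
    for i, ch in enumerate(text):
        if not is_consonant(ch):
            continue
        nxt = text[i + 1:i + 2]  # empty string at the end of the text
        if nxt and (nxt == VIRAMA or is_vowel_sign(nxt)):
            continue
        carriers.append(unicodedata.name(ch)[len("MALAYALAM LETTER "):])
    return carriers

word = "\u0d15\u0d28\u0d24\u0d4d\u0d24"  # the example word: KA NA TA VIRAMA TA
print(inherent_vowel_carriers(word))
```

For the example word, the first TA is excluded because the virama that follows it suppresses the vowel; the other three consonants are pronounced with a.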

Vowel absence

ക് k U+0D15 MALAYALAM LETTER KA + U+0D4D MALAYALAM SIGN VIRAMA​

Malayalam uses U+0D4D SIGN VIRAMA (in Malayalam called ചന്ദ്രക്കല cn͓d͓rk͓kl (candrakkala) ʧand̪r̪akkala) to kill the inherent vowel after a consonant, eg. ക്U+0D15 LETTER KA + U+0D4D SIGN VIRAMA explicitly represents just the sound k.

However, in modern text, U+0D4D SIGN VIRAMA may also represent the sound ə̆ or ɨ̆, depending on dialect (see Half-U).

Word-final consonants are typically written using chillu characters, rather than using the virama (see Finals), and in conjuncts the virama is hidden (see Consonant clusters).

Post-consonant vowels

A vowel sign is attached to a consonant base to express a following vowel sound. Sometimes vowel signs have multiple parts, which are displayed on different sides of the base consonant or cluster. They are known as 'matras' in Sanskrit.

Post-consonant vowels are written using 12 vowel signs, all combining marks. Also the word-final half-u sound may be written in modern Malayalam using U+0D4D SIGN VIRAMA.

The orthography includes 3 pre-base vowels and 3 circumgraphs. Normally, Malayalam text has no composite vowels, but all circumgraphs can be decomposed, creating composite vowels that involve 2 glyphs, one on each side of the base consonant(s).

All of the vowel signs are spacing marks, meaning that they consume horizontal space when added to a base consonant.

All vowel signs are typed and stored after the base consonant, and the glyph rendering system takes care of the positioning at display time. Conjuncts are treated as indivisible units when it comes to rendering vowel signs, meaning that pre-base vowel signs and left-side glyphs of circumgraphs are rendered before the conjunct as a whole (see Pre-base vowel signs).

Vowel signs may also be attached to digits (see Ordinals).

The Unicode block also contains U+0D3D SIGN AVAGRAHA and U+0D01 SIGN CANDRABINDU, for use with Sanskrit texts.5

Basic plain vowel sounds

കി ki U+0D15 MALAYALAM LETTER KA + U+0D3F MALAYALAM VOWEL SIGN I

Malayalam uses the following dedicated combining marks for plain vowel sounds.


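ി U+0D3F VOWEL SIGN I, i
U+0D40 VOWEL SIGN II, ī
U+0D41 VOWEL SIGN U, u
U+0D42 VOWEL SIGN UU, ū
U+0D46 VOWEL SIGN E, e
U+0D47 VOWEL SIGN EE, ē
U+0D4A VOWEL SIGN O, o
U+0D4B VOWEL SIGN OO, ō
U+0D3E VOWEL SIGN AA, ā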

In the older orthography, the u and ū vowel signs, and to some extent the i and ī signs, tend to form ligatures with the base consonant. The shape of the u vowel sign has also changed recently, to avoid the complications of the older ligated forms. See Vowel ligatures & orthographic reforms.

Half-U

Although it normally represents the virama, at the end of a word in modern Malayalam text the combination ക്U+0D15 LETTER KA + U+0D4D SIGN VIRAMA may also represent the sound kə̆ or kɨ̆ (depending on dialect). The transcription for this is usually ŭ, and it is called half-u.

കട്ടയാക് kaʈʈajaːkɨ̆ to freeze

In older documents the half-u was typically written with a u vowel sign plus chandrakkala, which is not ambiguous. It is unusual for a virama to occur after a vowel sign, like this. പാലു് paːlə milk, pink

The Unicode Standard provides examples of half-u occurring in positions that are not word-final, such as before U+0D02 SIGN ANUSVARA, eg.

ഐശീല്ം ai̯ɕiːləm than ice

In another example, the chandrakkala is attached to an independent vowel letter, and overrides the sound of that letter, eg. എ്ന്നാ ənnaː on which day?

The chandrakkala is always written after any vowel sign.
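The two word-final spellings described above can be told apart mechanically, although a program cannot know from the code points alone whether a plain final virama is pronounced as half-u. The following sketch is only an illustration; the function name and the returned labels are assumptions made for this example.

```python
VIRAMA = "\u0d4d"        # MALAYALAM SIGN VIRAMA (chandrakkala)
VOWEL_SIGN_U = "\u0d41"  # MALAYALAM VOWEL SIGN U
ZWNJ = "\u200c"          # sometimes typed after a word-final virama

def final_virama_kind(word):
    """Classify how a word ends with respect to the chandrakkala."""
    word = word.rstrip(ZWNJ)  # ignore a trailing ZERO WIDTH NON-JOINER
    if word.endswith(VOWEL_SIGN_U + VIRAMA):
        return "u+virama"     # older spelling, unambiguously half-u
    if word.endswith(VIRAMA):
        return "virama"       # may be half-u or a plain vowel killer
    return None

older = "\u0d2a\u0d3e\u0d32\u0d41\u0d4d"  # PA AA LA U VIRAMA
modern = "\u0d15\u0d47\u0d1f\u0d4d"       # KA EE TTA VIRAMA
print(final_virama_kind(older), final_virama_kind(modern))
```

The trailing ZWNJ is stripped because some text, including the sample at the top of this page, stores one after a word-final virama.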

Diphthongs

Two Malayalam diphthongs are also written using vowel signs.


U+0D48 VOWEL SIGN AI, ai̯ (ai)
U+0D57 AU LENGTH MARK, au̯ (au)
U+0D4C VOWEL SIGN AU (archaic), au̯ (au)

Although it was originally just an indicator of vowel length, U+0D57 AU LENGTH MARK has been used for some time as the normal way to write the sound au̯. Previously, that sound was represented by a vowel sign that surrounded the base character. If you want to use that form in your text you should use U+0D4C VOWEL SIGN AU; however, some fonts still hide the left-hand part from display.

Composite vowels

A composite vowel sign is a single vowel sound or diphthong that is represented by more than one code point from the set of vowel signs, repurposed consonants, and diacritics available. It is the opposite of a circumgraph.

Composite vowels only occur in decomposed text, where the glyphs in circumgraphs are split into separate code points.


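ൊ o: U+0D46 VOWEL SIGN E + U+0D3E VOWEL SIGN AA
ോ ō: U+0D47 VOWEL SIGN EE + U+0D3E VOWEL SIGN AA
ൌ au̯ (au): U+0D46 VOWEL SIGN E + U+0D57 AU LENGTH MARK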

An unusual feature of Malayalam is that there are circumstances where a decomposed circumgraph has to be used. See Clusters with RA.

Pre-base vowel signs

കെ ke U+0D15 MALAYALAM LETTER KA + U+0D46 MALAYALAM VOWEL SIGN E


U+0D46 VOWEL SIGN E, e
U+0D47 VOWEL SIGN EE, ē
U+0D48 VOWEL SIGN AI, ai̯ (ai)

Three vowel signs appear to the left of the base consonant letter or cluster, eg.

കെട്ട് keʈʈɨ̆ to tie

കേട് keːʈɨ̆ bad

കൈപ്പ് kɐi̯ppɨ̆ bitterness (taste)

These are combining marks that are always typed and stored after the base consonant(s), ie. the codepoints follow the order in which the items are pronounced. The rendering process places the glyph before the base consonant without changing the code points. The following shows the sequence of code points that make up the first word just above.

KA + E + TTA + virama + TTA + virama
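The stored order can be checked by listing the code points of the word. This is a minimal sketch using Python's standard `unicodedata` module; the helper name `short_names` is an assumption made for this example, and it drops the MALAYALAM label in the same way as the rest of this page.

```python
import unicodedata

def short_names(text):
    """Return the Unicode name of each code point, without the MALAYALAM label."""
    return [unicodedata.name(ch).replace("MALAYALAM ", "", 1) for ch in text]

# The first example word above, written with escapes to make the order explicit.
word = "\u0d15\u0d46\u0d1f\u0d4d\u0d1f\u0d4d"
for ch, name in zip(word, short_names(word)):
    print(f"U+{ord(ch):04X} {name}")
```

The vowel sign E appears second in the list, after KA, even though its glyph is displayed to the left of that letter.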

These vowel signs do not split a conjunct. This means that a word with a consonant cluster at the start separates the pre-base vowel from the position where it is pronounced by more than one consonant character, eg.

ഡ്രൈവ് ɖrɐi̯ʋɨ̆ drive

ഡ്രൈ ɖrɐi̯ DDA+virama+RA+AI

ഇദ്ദേഹം
A prebase vowel, pronounced after a consonant cluster, but rendered to the left of the conjunct.

ഇദ്ദേഹം iddeham he

However, if the cluster is split by a visible virama, the pre-base vowel sign appears after the consonant with the virama. If you view the composition of the example below, you'll see that the characters and code point orders are the same as for the previous example (apart from the addition of the ZWNJ to force the virama to appear), but the location of the pre-base vowel sign is now immediately before the consonant after which it is pronounced.

ഇദ്‌ദേഹം
The same word, but without the conjunct. The vowel is now rendered to the left of the last consonant in the cluster.
ഇദ്ദേഹം iddeham he
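The difference between the two spellings is a single ZERO WIDTH NON-JOINER after the virama. The sketch below inserts one after each virama that precedes a consonant; the function name and the consonant range U+0D15 to U+0D39 are assumptions made for this example, and how the result is displayed still depends on the font and shaping engine.

```python
import re

VIRAMA = "\u0d4d"  # MALAYALAM SIGN VIRAMA
ZWNJ = "\u200c"    # ZERO WIDTH NON-JOINER

# A virama that is directly followed by a consonant (assumed range KA..HA).
VIRAMA_BEFORE_CONSONANT = re.compile("\u0d4d(?=[\u0d15-\u0d39])")

def show_virama(text):
    """Request a visible virama by adding ZWNJ inside consonant clusters."""
    return VIRAMA_BEFORE_CONSONANT.sub(VIRAMA + ZWNJ, text)

conjunct = "\u0d07\u0d26\u0d4d\u0d26\u0d47\u0d39\u0d02"  # I DA VIRAMA DA EE HA ANUSVARA
split = show_virama(conjunct)
print([f"U+{ord(ch):04X}" for ch in split])
```

Removing the ZWNJ again restores the original sequence, so the pre-base vowel sign keeps the same stored position in both spellings.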

Circumgraphs

കൊ ko U+0D15 MALAYALAM LETTER KA + U+0D4A MALAYALAM VOWEL SIGN O

When a single vowel sign code point produces glyphs on more than one side of the consonant base, it is referred to here as a circumgraph.


U+0D4A VOWEL SIGN O, o
U+0D4B VOWEL SIGN OO, ō
U+0D4C VOWEL SIGN AU (archaic), au̯ (au)

Three vowels are produced by a single combining character with visually separate parts, that appear on opposite sides of the consonant onset, eg.

ഒരൊറ്റ oɾotɐ single

ആഗോള äːɡoːɭɐ global

As for pre-base vowel signs, these do not split a conjunct, but instead they treat the conjunct as a single unit and place glyphs either side of it, eg.

ക്രോധം kroːd̪ʱam anger, fury

ക്രോ kroː KA+virama+RA+OO

കൊച്ച്
A circumgraph vowel: a single code point with glyphs on both sides of the consonant after which it is pronounced.

കൊച്ച് kott͡ʃɨ̆ child

In modern text, U+0D57 AU LENGTH MARK has become a dominant way to write the vowel au̯, rather than U+0D4C VOWEL SIGN AU, eg. സൗന്ദര്യം sɐun̪d̪ɐɾjɐm beauty

All of these circumgraphs can be written as a single character, or as two. Whichever approach is used, the vowel signs must be typed and stored after the consonant characters they surround, and if the vowel signs are decomposed, they must be typed and stored in left to right order. See Encoding circumgraphs.
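The single and two-part encodings are canonically equivalent in Unicode, so normalization converts between them. The following is a small sketch with Python's standard `unicodedata` module; the helper name `same_text` is an assumption made for this example.

```python
import unicodedata

# Each circumgraph decomposes (NFD) into a left part followed by a right part.
for ch in "\u0d4a\u0d4b\u0d4c":  # VOWEL SIGN O, OO, AU
    parts = unicodedata.normalize("NFD", ch)
    print(f"U+{ord(ch):04X} ->", " + ".join(f"U+{ord(p):04X}" for p in parts))

def same_text(a, b):
    """Compare two strings while ignoring composed/decomposed differences."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

single = "\u0d15\u0d4a"          # KA + VOWEL SIGN O
two_part = "\u0d15\u0d46\u0d3e"  # KA + VOWEL SIGN E + VOWEL SIGN AA
print(same_text(single, two_part))
```

Normalizing before comparison or search is one way to treat both spellings alike. Text that deliberately relies on a decomposed circumgraph should not be recomposed without checking the context.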

Vowel length

Vowel length is indicated by the vowel sign used (see Basic plain vowel sounds).

Standalone vowels

Standalone vowels are vowel sounds that are not preceded by a consonant sound, or are preceded by only a glottal stop. They may appear at the beginning of a word or in the middle of a word after a preceding vowel.

Malayalam represents standalone vowels using a set of independent vowel letters.


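U+0D07 LETTER I, i
U+0D08 LETTER II, ī
U+0D09 LETTER U, u
U+0D0A LETTER UU, ū
U+0D0E LETTER E, e
U+0D0F LETTER EE, ē
U+0D12 LETTER O, o
U+0D13 LETTER OO, ō
U+0D05 LETTER A, a
U+0D06 LETTER AA, ā
U+0D10 LETTER AI, ai̯ (ai)
U+0D14 LETTER AU, au̯ (au)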

For example:

എല്ലാ ellaː all

ഓടുക oːɖuɡɐ to run

സിമേഈ simei Simei (place in Singapore)

The set includes a character to represent the inherent vowel sound (U+0D05 LETTER A).

The Unicode block also contains U+0D5F LETTER ARCHAIC II, which is an archaic form for U+0D08 LETTER II.5

Vowel sounds to characters

This section maps Malayalam vowel sounds to common graphemes in the Malayalam orthography.

Code points are labelled as either dependent or standalone.

Sounds listed as 'infrequent' are allophones, or sounds used for foreign words, etc. Light coloured characters occur infrequently.

Plain vowels

i

dependent ിU+0D3F VOWEL SIGN I

standalone U+0D07 LETTER I

dependent U+0D40 VOWEL SIGN II

standalone U+0D08 LETTER II

u

dependent U+0D41 VOWEL SIGN U

standalone U+0D09 LETTER U

dependent U+0D42 VOWEL SIGN UU

standalone U+0D0A LETTER UU

e

dependent U+0D46 VOWEL SIGN E

standalone U+0D0E LETTER E

dependent U+0D47 VOWEL SIGN EE

standalone U+0D0F LETTER EE

o

dependent U+0D4A VOWEL SIGN O

standalone U+0D12 LETTER O

dependent U+0D4B VOWEL SIGN OO

standalone U+0D13 LETTER OO

ə~ɨ̆

dependent U+0D4D SIGN VIRAMA at the end of a word.

dependent ു്U+0D41 VOWEL SIGN U + U+0D4D SIGN VIRAMA at the end of a word in older texts.

standalone എ്U+0D0E LETTER E + U+0D4D SIGN VIRAMA

a

inherent vowel eg. കടുവ kɐɖuʋɐ tiger

standalone U+0D05 LETTER A

dependent U+0D3E VOWEL SIGN AA

standalone U+0D06 LETTER AA

Diphthongs and other combinations

ai̯

dependent U+0D48 VOWEL SIGN AI

standalone U+0D10 LETTER AI

au̯

dependent U+0D57 AU LENGTH MARK

standalone U+0D14 LETTER AU

Vocalics

Vocalics are letters derived from Sanskrit that generally behave like vowels, but represent r/l followed by a vowel. They are often available both as vowel signs and independent vowel letters.

The modern orthography generally only uses the following.5


U+0D43 VOWEL SIGN VOCALIC R
U+0D0B LETTER VOCALIC R, r̥̣

Examples:

മൃഗം mriɡɐm animal

ഋഷി riʂi rishi, sage

The items in the list below are rare and typically used only to write Sanskrit in Malayalam.5


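U+0D44 VOWEL SIGN VOCALIC RR (unused), rɨː r̥̄
U+0D60 LETTER VOCALIC RR (rare), rɨː r̥̣̄
U+0D62 VOWEL SIGN VOCALIC L (unused)
U+0D0C LETTER VOCALIC L (rare), l̥̣
U+0D63 VOWEL SIGN VOCALIC LL (unused), lɨː l̥̄
U+0D61 LETTER VOCALIC LL (rare), lɨː l̥̣̄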

Consonants

Malayalam uses 36 basic consonant letters.

Consonant clusters are typically indicated in modern Malayalam using the visible chandrakkala mark (virama), which indicates that no vowel follows a consonant. Conjunct forms are also expressed using stacked consonants, and conjoined consonants, where the chandrakkala is still used but hidden, and special chillu shapes.

As part of a cluster, RA has special forms. As a medial consonant in the modern orthography it appears as a simple glyph to the left of the letters spoken before it. When initial in the cluster its glyph includes a cillu hook at the top right. There are also special rules involving clusters of multiple RA letters.

Syllable-final consonant sounds may be represented by 2 dedicated combining marks (anusvara & visarga), but are generally ordinary consonants with chandrakkala, or 6 chillu forms. The word-final virama sometimes represents a half-u sound, rather than completely killing the inherent vowel. Because of this, Malayalam uses a set of syllable-final consonants called chillus that have no vowel sound associated with them.

Consonant summary table

The following table summarises the main consonant to character assignments.

For additional details see Vowel sounds to characters.

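onsets: U+0D2A PA, U+0D2C BA, U+0D24 TA, U+0D26 DA, U+0D1F TTA, U+0D21 DDA, U+0D15 KA, U+0D17 GA
finals: ൿ U+0D7F CHILLU K (-k)

onsets: U+0D2B PHA, U+0D2D BHA, U+0D25 THA, U+0D27 DHA, U+0D20 TTHA, U+0D22 DDHA, U+0D16 KHA, U+0D18 GHA

onsets: U+0D1A CA, U+0D1C JA, U+0D1B CHA, U+0D1D JHA

onsets: U+0D35 VA, U+0D38 SA, U+0D37 SSA, U+0D36 SHA, U+0D39 HA
finals: U+0D03 SIGN VISARGA

onsets: U+0D2E MA, U+0D28 NA, U+0D23 NNA, U+0D1E NYA, U+0D19 NGA
finals: U+0D02 SIGN ANUSVARA (-m), U+0D7B CHILLU N (-n), U+0D7A CHILLU NN

onsets: U+0D30 RA, U+0D31 RRA, U+0D34 LLLA, U+0D32 LA, U+0D33 LLA, U+0D2F YA
finals: U+0D7C CHILLU RR (-r), U+0D7D CHILLU L (-l), U+0D7E CHILLU LL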

Basic consonants

The basic consonant sounds in Malayalam are written using the following letters. Click on each letter for more details and for example usage.


U+0D2A PA, U+0D2B PHA, U+0D2C BA, U+0D2D BHA
U+0D24 TA, U+0D25 THA, U+0D26 DA, U+0D27 DHA
U+0D1F TTA, U+0D20 TTHA, U+0D21 DDA, U+0D22 DDHA
U+0D15 KA, U+0D16 KHA, U+0D17 GA, U+0D18 GHA
U+0D1A CA, U+0D1B CHA, U+0D1C JA, U+0D1D JHA
U+0D35 VA, U+0D38 SA, U+0D37 SSA, U+0D36 SHA, U+0D39 HA
U+0D2E MA, U+0D28 NA, U+0D23 NNA, U+0D1E NYA, U+0D19 NGA
U+0D30 RA, U+0D31 RRA, U+0D34 LLLA, U+0D32 LA, U+0D33 LLA, U+0D2F YA

The conjunct ക്ഷU+0D15 LETTER KA + U+0D4D SIGN VIRAMA + U+0D37 LETTER SSA is conventionally regarded as an additional letter in the Malayalam alphabet.1420

U+0D31 LETTER RRA has some complicated mappings to sounds. See Clusters with RRA.

Other consonants

Two other letters, U+0D29 LETTER NNNA and U+0D3A LETTER TTTA, are historic and used rarely in scholarly texts to represent alveolar sounds. In ordinary texts, U+0D28 LETTER NA and U+0D31 LETTER RRA are used instead.7

Onsets

Clusters of consonant letters at the beginning of an orthographic syllable occur in Malayalam, and are typically handled as described in the section Consonant clusters. Examples:

വ്യക്തി ʋjɐkt̪i person

സ്വാഗതം sʋɐːɡɐd̪ɐm welcome

A medial RA is rendered to the left of an initial consonant or conjunct (see Medial RA).

ഗ്രാമ്പൂ ɡɾaːmbuː clove

In some cases, a medial consonant is represented using a vocalic vowel sign, rather than a consonant, eg.

കൢപ്തം kɭipt̪ɐm fixed, limited

Medial RA

U+0D30 LETTER RA when non-initial in a cluster is displayed to the left of the other consonant(s) in the reformed orthography. This transposition is done during the font rendering – the typed and stored order remains the same as the spoken order.

ചക്രം t͡ʃɐkrɐm wheel

ക്രkr̪aKA+virama+RA

When RA follows more than one consonant, it is displayed to the left of any conjunct, not just to the left of the preceding consonant, eg.

ചന്ദ്രക്കല ʧand̪r̪akkala chandrakala

ന്ദ്രnd̪r̪aNA+virama+DA+virama+RA
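The statement that the stored order follows the spoken order, even though RA is displayed to the left, can be checked by listing the code points of a cluster. The following is a minimal Python sketch; the helper name `codepoints` is an assumption made for illustration and is not part of any library.

```python
import unicodedata

def codepoints(text):
    """Return (U+XXXX, character name) pairs in stored (logical) order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# NA + virama + DA + virama + RA, as in the second example above.
cluster = "\u0D28\u0D4D\u0D26\u0D4D\u0D30"

for code, name in codepoints(cluster):
    print(code, name)
```

RA is the last code point in memory, although a font following the reformed orthography draws it first.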

Finals

Malayalam words can end with a consonant sound or an inherent vowel, and the difference is usually marked. However, unlike many other Brahmi-derived scripts, the virama (chandrakkala) is not normally used to kill the inherent vowel at the end of a word. This is because, in Malayalam, the chandrakkala actually can indicate a short vowel sound (see Half-U). Instead, Malayalam uses a set of special characters, called chillus, that don't have a following vowel, or a final-consonant diacritic.

The chillu characters can also appear as word-medial syllable codas, but often these codas are written using either a chandrakkala or a conjunct (see Consonant clusters).

Codas can also be written using a couple of diacritics, described below.

Chillus

Contemporary chillu (or cillakṣaram) characters include the following.


6
ൿkk^0D7F
nn^0D7B
ɳṇ^ɳ̽0D7A
rr^0D7C
ll^0D7D
ɭḷ^ɭ̽0D7E

ൿU+0D7F LETTER CHILLU K is relatively rare in modern texts.5

Examples:

വില്ലൻ ʋillan villain

അയാൾ ajaːɭ he

അവർ aʋar they

Although they represent 'dead' consonants, chillu forms can sometimes be followed by a virama and subjoined consonant, although this is only likely to occur in modern texts in the combination ൻ്റU+0D7B LETTER CHILLU N + U+0D4D SIGN VIRAMA + U+0D31 LETTER RRA (see Clusters with RRA).

The Unicode Standard lists a small number of other, historical combinations of a chillu consonant as part of a stack.

Unicode v9 introduced 3 more chillu letters, U+0D54 LETTER CHILLU M, U+0D55 LETTER CHILLU Y, and U+0D56 LETTER CHILLU LLL, which are only found in historical texts.5


3
m0D54
j0D55
l?0D56

In older Unicode text chillus were written by following the consonant with ് + ‍U+0D4D SIGN VIRAMA + U+200D ZERO WIDTH JOINER, but since the introduction of the chillu characters in Unicode v5.1 the use of the atomic characters is recommended, instead.5

RA coda

Malayalam has 3 ways of writing an r coda before a following consonant.5

Chillu RA Since the orthographic reform, this has been written as U+0D7C LETTER CHILLU RR followed by the next consonant in the cluster, eg.

നേർത്ത n̪eːɾt̪t̪a thin

ഈർപ്പമുള്ള iːɾppamuɭɭa wet

Special form The form ര്യU+0D30 LETTER RA + U+0D4D SIGN VIRAMA + U+0D2F LETTER YA is only used before U+0D2F LETTER YA5, eg.

സൗന്ദര്യം sɐun̪d̪ɐɾjɐm beauty

ഭാര്യ bʱaːɾjɐ wife

Dot reph Before the 1970s, a dot or small vertical stroke was used over the following consonant, in a similar way to the repha in other Indic scripts. It may still be in use by people educated before that time. The character U+0D4E LETTER DOT REPH is used to reproduce this, eg. compare:

തൎക്കം tarkkam argument

തർക്കം tarkkam argument

The reph dot is not a combining character; it has the general category of letter. It is typed and stored in the same place as you would expect to find the RA + VIRAMA, ie. before the consonant it appears over, and then the font needs to position the glyph over the consonant that follows it.
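As a quick check of the two points above (general category and storage position), the following Python sketch inspects the dot reph in the first of the two example words. The variable names are illustrative only.

```python
import unicodedata

DOT_REPH = "\u0D4E"  # MALAYALAM LETTER DOT REPH
KA = "\u0D15"        # MALAYALAM LETTER KA

# TA + DOT REPH + KA + virama + KA + anusvara (the old-style spelling above)
word = "\u0D24\u0D4E\u0D15\u0D4D\u0D15\u0D02"

category = unicodedata.category(DOT_REPH)             # a letter category, not a mark
stored_before_base = word.index(DOT_REPH) < word.index(KA)

print(category, stored_before_base)
```

Because the character is a letter, code that attaches combining marks to a preceding base (for example, simple grapheme segmentation) will not treat the dot reph as part of the consonant that precedes it in memory.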

Anusvara & visarga


both
m0D02
ɦ0D03

Malayalam also uses the anusvara and visarga as syllable-final characters, eg. the following word contains both:

ദുഃഖം d̪uɦkʰam sorrow

The anusvara normally represents the sound m, but may be assimilated to another nasal consonant. It can be used multiple times after a vowel,5 eg.

ഈംംംം ị̄m̽m̽m̽m̽

Consonant clusters

A consonant cluster is a sequence of consonant sounds with no intervening vowels.

A conjunct is a consonant cluster where the lack of intervening vowels is indicated by one or more of stacking, changing and merging the shapes of the constituent letter forms (usually in abugidas). Not all consonant clusters are displayed as conjuncts.

The absence of a vowel sound between two or more consonants is visually indicated in one of the following ways.

  1. A visible chandrakkala character above the top right of the initial consonant.
  2. Conjuncts. There are a number of possibilities here.
    1. Stacked consonants, where the non-initial consonant appears below the initial, usually with a reduced or ligated form.
    2. Conjoined consonants, where consonants sit side-by-side but with some ligation or different forms than usual.
    3. Special 'chillu' shapes for the initial consonant (a special case of conjoined consonants).
    4. Special RA forms : The letter RA has its own idiosyncratic ways of combining with other consonants, whether it precedes or follows them. See RA coda, Medial RA, and Clusters with RRA.

See also syllable-final consonants, which may be followed by a regular consonant.

If the font supports it, ‌U+200C ZERO WIDTH NON-JOINER (ZWNJ) and ‍U+200D ZERO WIDTH JOINER (ZWJ) can be used to control the shaping of conjuncts. See Explicit shaping controls.

Conjunct formation

In Unicode, the stacking and conjoining behaviour is achieved by adding U+0D4D SIGN VIRAMA between the consonants. The font hides the glyph automatically.

NYA+virama+CAഞ്ചɲt͡ʃ
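A minimal sketch of how such a sequence is assembled in code, assuming a hypothetical helper named `cluster`: the consonants are simply joined with the virama, and any stacking or conjoining is left to the font.

```python
VIRAMA = "\u0D4D"  # MALAYALAM SIGN VIRAMA (chandrakkala)

def cluster(*consonants):
    """Join consonant letters with VIRAMA; shaping is the font's job."""
    return VIRAMA.join(consonants)

NYA, CA = "\u0D1E", "\u0D1A"
nya_ca = cluster(NYA, CA)   # the NYA + virama + CA example above

print([f"U+{ord(ch):04X}" for ch in nya_ca])
```

The same string may be rendered as a conjunct or with a visible chandrakkala, depending on the font.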

The rendered shape of the conjunct may also vary from font to font. Traditional fonts have more ligatures than modern ones. For example, compare the sequences below which are identical except that Noto Serif Malayalam is used for the top row, and Manjari is used for the bottom row.

PA+virama+TAപ്തpt̪
PA+virama+TAപ്തpt̪

The link at the bottom of this section shows all combinations of two consonants and allows you to observe the effect of changing the font. Versions of the table with conjuncts highlighted are available for Noto Serif Malayalam and for Thoolika Traditional Unicode; the latter is split into conjuncts that are stacked, etc, and those using chillu shapes. The tables contain 129 and 219+156 conjuncts, respectively.

There doesn't appear to be much in the way of a systematic approach to shaping. With a few exceptions, the conjuncts are specific to particular pairs of characters. Sequences involving more than two consonants in a cluster can combine a variety of methods. The example in Figure 4 shows 3 conjoined consonants in the middle, and a conjoined cluster stacked below another letter at the end.

ഇന്ത്യയ്ക്ക്
The word ഇന്ത്യയ്ക്ക് ịn͓t͓yy͓k͓k͓ to India in a traditional font.

ഇന്ത്യയ്ക്ക് to India

Malayalam has 2 other virama code points, both of which are only found in historical texts, and used to indicate a pure consonant in a particular orthography. They are U+0D3B SIGN VERTICAL BAR VIRAMA and U+0D3C SIGN CIRCULAR VIRAMA.5

See a table of 2-consonant clusters.
The table allows you to test results for various fonts.

Visible chandrakkala

This was promoted as the default by the orthographic reforms of the 1970s. It is also the fallback if the font doesn't contain conjunct forms for a particular cluster of consonants.

Examples include ആഴ്ച äːɻt͡ʃɐ week ഗുല്ഫം gulfam ankle നമസ്തേ n̪amast̪eː hello

Stacking

The non-initial consonant is drawn below the initial consonant, and with a slightly different shape.

KA+virama+LAക്ലkl
YA+virama+YAയ്യjj
SA+virama+RRA+virama+RRAസ്റ്റst(t)
Examples of conjuncts formed by subjoining non-initial consonants.

Stacks tend to be particularly common for geminated consonants, even when those consonants don't participate in other conjunct pairings. In such cases, the second consonant is sometimes represented by a small triangle.

Otherwise, the subjoined consonant may be a reduced version of the original, or may be ligated. Note that LA has a very different shape from normal when in subjoined position.

Although Malayalam consonant clusters can often involve 3 consonants, the 3-character stack in the third example in Figure 5 is unusual.

Show stacked conjuncts in the Noto Serif Malayalam font

27
ക്ടk͓ʈ␣k͓l␣g͓g␣c͓c␣c͓cʰ␣ʈ͓ʈ␣ɖ͓ɖ␣ɖ͓ɖʰ␣ɳ͓ɳ␣t͓l␣d͓d␣p͓p␣p͓l␣b͓d␣b͓dʰ␣b͓b␣b͓l␣m͓l␣ṙ͓ṙ␣y͓y␣l͓p␣l͓l␣l͓ʋ␣ʋ͓ʋ␣ʃ͓l␣ʃ͓ʃ␣s͓l␣s͓s␣ʂ͓ʈ␣h͓l0D15
0D1F
ക്ല 0D15
0D32
ഗ്ഗ 0D17
0D17
ച്ച 0D1A
0D1A
ച്ഛ 0D1A
0D1B
ട്ട 0D1F
0D1F
ഡ്ഡ 0D21
0D21
ഡ്ഢ 0D21
0D22
ണ്ണ 0D23
0D23
ത്ല 0D24
0D32
ദ്ദ 0D26
0D26
പ്പ 0D2A
0D2A
പ്ല 0D2A
0D32
ബ്ബ 0D2C
0D2C
ബ്ല 0D2C
0D32
മ്ല 0D2E
0D32
റ്റ 0D31
0D31
യ്യ 0D2F
0D2F
ല്പ 0D32
0D2A
ല്ല 0D32
0D32
ല്വ 0D32
0D35
വ്വ 0D35
0D35
ശ്ല 0D36
0D32
ശ്ശ 0D36
0D36
സ്ല 0D38
0D32
സ്സ 0D38
0D38
ഹ്ല 0D39
0D32

Conjoined consonants

Conjuncts where the consonants remain side-by-side, typically merging the shapes of the consonants.

KA+virama+SSAക്ഷ
TA+virama+TAത്തt̪t̪
NA+virama+TAന്തnt̪
MA+virama+MAമ്മmm
Examples of conjuncts formed by conjoining consonants.

Three consonants have very standardised glyphs when they appear in non-initial position, and those glyphs don't merge with the other consonant. They are U+0D2F LETTER YA, U+0D35 LETTER VA, and U+0D30 LETTER RA. Figure 7 shows them combined with the letter KA.

KA+virama+YAക്യkj
KA+virama+VAക്വkʋ
KA+virama+RAക്രkr
Standard shaping for 3 conjoined pairs.

The isolated, pre-base shape for RA was introduced by the reformed orthography. In the old orthography RA as the second element in a conjunct was represented by a ligated swash below the initial consonant.

ക്ര ഖ്ര ഗ്ര ഘ്ര ങ്ര
Examples of consonant+RA ligatures in the old orthography.

An exception to the standardised shapes illustrated in Figure 7 is:

YA+virama+YAയ്യjj
Show conjoined conjuncts in the Noto Serif Malayalam font

33
ക്കk͓k␣k͓ʂ␣g͓d␣g͓n␣g͓m␣ŋ͓k␣ŋ͓ŋ␣j͓j␣j͓ɲ␣ɲ͓c␣ɲ͓cʰ␣ɲ͓ɲ␣ɳ͓ʈ␣ɳ͓ɖ␣ɳ͓ɖʰ␣ɳ͓m␣t͓t␣t͓tʰ␣t͓n␣t͓bʰ␣t͓m␣t͓s␣d͓dʰ␣n͓t␣n͓tʰ␣n͓d␣n͓dʰ␣n͓n␣n͓m␣m͓p␣m͓m␣ʃ͓c␣ʃ͓cʰ␣s͓tʰ␣h͓n␣h͓m0D15
0D15
ക്ഷ 0D15
0D37
ഗ്ദ 0D17
0D26
ഗ്ന 0D17
0D28
ഗ്മ 0D17
0D2E
ങ്ക 0D19
0D15
ങ്ങ 0D19
0D19
ജ്ജ 0D1C
0D1C
ജ്ഞ 0D1C
0D1E
ഞ്ച 0D1E
0D1A
ഞ്ഞ 0D1E
0D1E
ണ്ട 0D23
0D1F
ണ്ഡ 0D23
0D21
ണ്മ 0D23
0D2E
ത്ത 0D24
0D24
ത്ഥ 0D24
0D25
ത്ന 0D24
0D28
ത്ഭ 0D24
0D2D
ത്മ 0D24
0D2E
ത്സ 0D24
0D38
ദ്ധ 0D26
0D27
ന്ത 0D28
0D24
ന്ഥ 0D28
0D25
ന്ദ 0D28
0D26
ന്ധ 0D28
0D27
ന്ന 0D28
0D28
ന്മ 0D28
0D2E
മ്പ 0D2E
0D2A
മ്മ 0D2E
0D2E
ശ്ച 0D36
0D1A
സ്ഥ 0D38
0D25
ഹ്ന 0D39
0D28
ഹ്മ 0D39
0D2E

Chillu-style initials

In some fonts the initial consonant in a cluster may take a chillu shape, followed by an ordinary glyph for the second character.

In the Thoolika Traditional Unicode font this applies to the following consonants in initial position.


6
k͓ʃ␣ɳ͓ʃ␣n͓ʃ␣r͓ʃ␣l͓ʃ␣ɭ͓ʃ0D15
0D23
0D28
0D30
0D32
0D33

Figure 9 shows the same sequence of characters in the Thoolika Traditional Unicode font. Note how the shape of the second consonant remains the same as normal - there is no ligation or repositioning. The examples in Figure 9 all use SHA in the second position. Note that the chillu code points are not used here – this is just font styling on normal consonants.

ക്ശ ണ്ശ ന്ശ ര്ശ ല്ശ ള്ശ
ക്ശ ണ്ശ ന്ശ ര്ശ ല്ശ ള്ശ
Chillu-style initials used in a modern font (top) and traditional font (bottom) for consonant clusters.

Clusters with RA

When RA occurs in a cluster, either as a medial consonant or a coda followed by another consonant, there are special rules for rendering. See Medial RA and RA coda for details.

Clusters with RRA

The conjunct റ്റU+0D31 LETTER RRA + U+0D4D SIGN VIRAMA + U+0D31 LETTER RRA is always pronounced tt, eg.

പാറ്റ paːttɐ cockroach

ഉറ്റോർ uttoːr relatives

റ്റോttoːRRA+virama+RRA+O

Until the 1960s, the geminated tt was generally written using just two side-by-side RRA letters, although circumgraphs and pre-base vowel signs would span the digraph as if it were a single unit. For example:

പാററ paːttɐ cockroach

This is problematic in 2 ways:

  1. because there is no virama creating a single typographic unit of the digraph, the decomposed form of circumgraphs has to be used, with parts of the vowel sign after each RRA letter, eg.

    മാറെറാലി maːṯṯoli echo

    റെറാttoRRA+E+RRA+A
  2. the digraph sometimes represents two separate consonants followed by vowel sounds, making the sequence ambiguous, eg.

    ടെംപററി ʈemparari temporary

    ററിrariRRA+RRA+ിI

It would be particularly ambiguous when there are more than 2 RRA characters side by side. For example, compare

ബാറ്ററി baːttɐri battery

ബാറററി baːttɐri?   baːrɐtti?   baːrɐrɐri?

Use in /nta/. When stacked, ൻ്റU+0D7B LETTER CHILLU N + U+0D4D SIGN VIRAMA + U+0D31 LETTER RRA is always pronounced nta, eg. ആൻ്റോ aːntoː proper name

According to the Unicode Standard, an alternative spelling exists without the stack, but this can also lead to ambiguity4506, ie. ആൻേറാ ận̽ēṙā aːntoː

Note that again we had to split the vowel to achieve this spelling.

Consonant length

Gemination/consonant lengthening is very common in Malayalam. It is usually indicated by doubling the consonant, in the same way as for consonant clusters (see Consonant clusters).

അപ്പൻ appan father

തൂക്ക് t̪uːkkɨ̆ to rub, wipe

The sound tt is unusual in that it can be written റ്റU+0D31 LETTER RRA + U+0D4D SIGN VIRAMA + U+0D31 LETTER RRA (see Clusters with RRA).

പാറ്റ paːttɐ cockroach

മാറ്റം maːttɐm change

Consonant sounds to characters

This section maps Malayalam consonant sounds to common graphemes in the Malayalam orthography.

The left column contains ordinary consonants, and the right column contains dedicated syllable-final consonants.

p

consonant U+0D2A LETTER PA

consonant U+0D2B LETTER PHA

b

consonant U+0D2C LETTER BA

consonant U+0D2D LETTER BHA

consonant U+0D24 LETTER TA

t̪ʰ

consonant U+0D25 LETTER THA

t͡ʃ

consonant U+0D1A LETTER CA

t͡ʃʰ

consonant U+0D1B LETTER CHA

consonant U+0D26 LETTER DA

d̪ʰ

consonant U+0D27 LETTER DHA

d͡ʒ/ɟ

consonant U+0D1C LETTER JA

d͡ʒʰ/ɟʰ

consonant U+0D1D LETTER JHA

ʈ

consonant U+0D1F LETTER TTA

ʈʰ

consonant U+0D20 LETTER TTHA

ɖ

consonant U+0D21 LETTER DDA

ɖʰ

consonant U+0D22 LETTER DDHA

k

consonant U+0D15 LETTER KA

chillu consonant ൿU+0D7F LETTER CHILLU K Coda. (Chillu consonant)

consonant U+0D16 LETTER KHA

ɡ

consonant U+0D17 LETTER GA

ɡʰ

consonant U+0D18 LETTER GHA

s

consonant U+0D38 LETTER SA

ʂ

consonant U+0D37 LETTER SSA

ʃ~ɕ

consonant U+0D36 LETTER SHA

ɦ

consonant U+0D39 LETTER HA

final consonant U+0D03 SIGN VISARGA Coda.

m

consonant U+0D2E LETTER MA

final consonant U+0D02 SIGN ANUSVARA Coda.

n

consonant U+0D28 LETTER NA

chillu consonant U+0D7B LETTER CHILLU N Coda. (Chillu consonant)

ɲ

consonant U+0D1E LETTER NYA

ɳ

consonant U+0D23 LETTER NNA

chillu consonant U+0D7A LETTER CHILLU NN Coda. (Chillu consonant)

ŋ

consonant U+0D19 LETTER NGA

ʋ

consonant U+0D35 LETTER VA

consonant U+0D30 LETTER RA

r

consonant U+0D31 LETTER RRA

chillu consonant U+0D7C LETTER CHILLU RR Coda. (Chillu consonant)

vocalic vowel sign U+0D43 VOWEL SIGN VOCALIC R

vocalic independent vowel U+0D0B LETTER VOCALIC R

rɨː

vocalic vowel sign U+0D44 VOWEL SIGN VOCALIC RR Very rare. Used for Sanskrit words.

vocalic independent vowel U+0D60 LETTER VOCALIC RR Very rare. Used for Sanskrit words.

ɻ

consonant U+0D34 LETTER LLLA

l

consonant U+0D32 LETTER LA

chillu consonant U+0D7D LETTER CHILLU L Coda. (Chillu consonant)

ɭ

consonant U+0D33 LETTER LLA

chillu consonant U+0D7E LETTER CHILLU LL Coda. (Chillu consonant)

vocalic independent vowel U+0D0C LETTER VOCALIC L Used in one Sanskrit word only.

lɨː

vocalic independent vowel U+0D61 LETTER VOCALIC LL Very rare.

j

consonant U+0D2F LETTER YA

Encoding choices

This section looks at alternative strategies for typing and storing text in Malayalam, taking into consideration the effects of normalising the text using Unicode Normalisation Form D (NFD), and Normalisation Form C (NFC).

Encoding circumgraphs

The 3 circumgraphs can be written as a single character, or as two characters (in decomposed text).

The single code point per vowel sign is the form preferred by the Unicode Standard and the form in common use for Malayalam. The parts are separated, however, in Unicode Normalisation Form D (NFD), and recomposed in Unicode Normalisation Form C (NFC), so both approaches are canonically equivalent.

Whichever approach is used, the vowel signs must be typed and stored after the consonant characters they surround. In the case of decomposed vowel signs, the order is also important and must be as shown in the table below.

Precomposed Decomposed
U+0D4A VOWEL SIGN O ൊU+0D46 VOWEL SIGN E + U+0D3E VOWEL SIGN AA
U+0D4B VOWEL SIGN OO ോU+0D47 VOWEL SIGN EE + U+0D3E VOWEL SIGN AA
U+0D4C VOWEL SIGN AU ൌU+0D46 VOWEL SIGN E + U+0D57 AU LENGTH MARK
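The canonical equivalence described above can be verified with Python's standard `unicodedata` module. This is a small sketch: the dictionary and function names are assumptions made for illustration, and the pairs are taken from the table above.

```python
import unicodedata

# Precomposed circumgraph -> decomposed pair, as listed in the table above.
CIRCUMGRAPHS = {
    "\u0D4A": "\u0D46\u0D3E",  # O  = E  + AA
    "\u0D4B": "\u0D47\u0D3E",  # OO = EE + AA
    "\u0D4C": "\u0D46\u0D57",  # AU = E  + AU LENGTH MARK
}

def round_trips(precomposed, decomposed):
    """True if NFD splits the sign and NFC recomposes it."""
    return (unicodedata.normalize("NFD", precomposed) == decomposed
            and unicodedata.normalize("NFC", decomposed) == precomposed)

results = {f"U+{ord(p):04X}": round_trips(p, d) for p, d in CIRCUMGRAPHS.items()}
print(results)

# The same holds after a consonant: KA + E + AA recomposes to KA + O.
ka_o = unicodedata.normalize("NFC", "\u0D15\u0D46\u0D3E")
```

Normalising to one form before comparing or searching text means both spellings of a circumgraph are treated as the same.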

Inappropriate glyph combinations

In some cases, visually similar or identical glyph patterns can be made from a sequence of code points rather than the single code point that Unicode provides. These are not made the same by normalisation, and they are not semantically equivalent. These inappropriate sequences should be avoided because they will cause the meaning of the text to change; searches, matching and other aspects of the text will fail to be understood by the application or the font. In the table below, the single code point on the left should be used, and not the sequence on the right. In some cases, fonts will indicate that there is a problem by forcing the appearance of a dotted circle or otherwise failing to render the text correctly, but this may not always be the case.

Use Do not use
U+0D48 VOWEL SIGN AI െെU+0D46 VOWEL SIGN E + U+0D46 VOWEL SIGN E
U+0D08 LETTER II ഇൗU+0D07 LETTER I + U+0D57 AU LENGTH MARK
U+0D0A LETTER UU ഉൗU+0D09 LETTER U + U+0D57 AU LENGTH MARK
U+0D13 LETTER OO ഒാU+0D12 LETTER O + U+0D3E VOWEL SIGN AA
U+0D10 LETTER AI എെU+0D0E LETTER E + U+0D46 VOWEL SIGN E
U+0D14 LETTER AU ഒൗU+0D12 LETTER O + U+0D57 AU LENGTH MARK
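Because normalisation does not unify these sequences, an application that wants to clean up such text has to handle them explicitly. The sketch below assumes that every occurrence of a sequence in the right-hand column is a mistake for the character on the left; that assumption may not hold for all data, so flagging matches for review may be safer than replacing them blindly. The names `PREFERRED` and `repair_sequences` are hypothetical.

```python
import unicodedata

# Inappropriate sequence -> single code point to use instead (from the table above).
PREFERRED = {
    "\u0D46\u0D46": "\u0D48",  # E sign + E sign     -> VOWEL SIGN AI
    "\u0D07\u0D57": "\u0D08",  # I + AU LENGTH MARK  -> LETTER II
    "\u0D09\u0D57": "\u0D0A",  # U + AU LENGTH MARK  -> LETTER UU
    "\u0D12\u0D3E": "\u0D13",  # O + AA sign         -> LETTER OO
    "\u0D0E\u0D46": "\u0D10",  # E + E sign          -> LETTER AI
    "\u0D12\u0D57": "\u0D14",  # O + AU LENGTH MARK  -> LETTER AU
}

def repair_sequences(text):
    """Replace each listed sequence with the single recommended code point."""
    for bad, good in PREFERRED.items():
        text = text.replace(bad, good)
    return text

def canonically_equivalent(a, b):
    """Compare two strings after NFD normalisation."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

fixed = repair_sequences("\u0D15\u0D46\u0D46")          # KA + E + E
unified = [canonically_equivalent(b, g) for b, g in PREFERRED.items()]
```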

Chillus

In older Unicode text chillu letters were written using the combination C+VIRAMA+ZWJ, but since the introduction of the chillu characters in Unicode v5.1 these new atomic characters are recommended. The sequences are not canonically equivalent.

The default Noto font used for this page doesn't render the K glyph sequence in this table the same as the atomic character, but older fonts such as Malayalam MN and ThoolikaTraditionalUnicode do.

Atomic (recommended) Decomposed (do not use)
ൿU+0D7F LETTER CHILLU K ക്‍U+0D15 LETTER KA + U+0D4D SIGN VIRAMA + U+200D ZERO WIDTH JOINER
U+0D7A LETTER CHILLU NN ണ്‍U+0D23 LETTER NNA + U+0D4D SIGN VIRAMA + U+200D ZERO WIDTH JOINER
U+0D7B LETTER CHILLU N ന്‍U+0D28 LETTER NA + U+0D4D SIGN VIRAMA + U+200D ZERO WIDTH JOINER
U+0D7C LETTER CHILLU RR ര്‍U+0D30 LETTER RA + U+0D4D SIGN VIRAMA + U+200D ZERO WIDTH JOINER
U+0D7D LETTER CHILLU L ല്‍U+0D32 LETTER LA + U+0D4D SIGN VIRAMA + U+200D ZERO WIDTH JOINER
U+0D7E LETTER CHILLU LL ള്‍U+0D33 LETTER LLA + U+0D4D SIGN VIRAMA + U+200D ZERO WIDTH JOINER
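Since the two spellings are not canonically equivalent, converting legacy text requires an explicit mapping. The following Python sketch uses the pairs in the table above and assumes that every consonant + VIRAMA + ZWJ sequence in the input was intended as a chillu; the names `LEGACY_TO_ATOMIC` and `to_atomic_chillus` are hypothetical.

```python
import unicodedata

VIRAMA, ZWJ = "\u0D4D", "\u200D"

# Legacy sequence (base consonant + VIRAMA + ZWJ) -> atomic chillu letter.
LEGACY_TO_ATOMIC = {
    "\u0D15" + VIRAMA + ZWJ: "\u0D7F",  # KA  -> CHILLU K
    "\u0D23" + VIRAMA + ZWJ: "\u0D7A",  # NNA -> CHILLU NN
    "\u0D28" + VIRAMA + ZWJ: "\u0D7B",  # NA  -> CHILLU N
    "\u0D30" + VIRAMA + ZWJ: "\u0D7C",  # RA  -> CHILLU RR
    "\u0D32" + VIRAMA + ZWJ: "\u0D7D",  # LA  -> CHILLU L
    "\u0D33" + VIRAMA + ZWJ: "\u0D7E",  # LLA -> CHILLU LL
}

def to_atomic_chillus(text):
    """Replace legacy chillu sequences with the atomic characters."""
    for legacy, atomic in LEGACY_TO_ATOMIC.items():
        text = text.replace(legacy, atomic)
    return text

legacy_word = "\u0D05\u0D35\u0D30" + VIRAMA + ZWJ      # legacy spelling: A + VA + RA + VIRAMA + ZWJ
atomic_word = to_atomic_chillus(legacy_word)           # A + VA + CHILLU RR
unchanged_by_nfc = unicodedata.normalize("NFC", legacy_word) == legacy_word
```

Applying the same conversion to both stored text and search strings avoids mismatches between the two encodings.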

Codepoint order

When 2 vowel signs are used for a circumgraph, the encoded order of the combining marks should match the displayed order, left to right.

Numbers, dates, currency, etc

This section describes typographic features related to digits, dates, currencies, etc.

Digits

There is a set of Malayalam digits, but they are not in use for modern texts.


10
infreq.00D66
infreq.10D67
infreq.20D68
infreq.30D69
infreq.40D6A
infreq.50D6B
infreq.60D6C
infreq.70D6D
infreq.80D6E
infreq.90D6F
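Where Malayalam digits do occur, they can be converted to and from integers using the code point range listed above. A minimal Python sketch, with hypothetical function names; it relies on Python's `int()` accepting any Unicode decimal digits.

```python
MALAYALAM_ZERO = 0x0D66  # MALAYALAM DIGIT ZERO; the digits run to U+0D6F

def from_malayalam_digits(text):
    """Parse a string of Malayalam digits as an integer."""
    return int(text)  # int() accepts Unicode decimal digits (category Nd)

def to_malayalam_digits(number):
    """Format a non-negative integer using Malayalam digits."""
    if number < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    return "".join(chr(MALAYALAM_ZERO + int(d)) for d in str(number))

as_text = to_malayalam_digits(355)
as_number = from_malayalam_digits(as_text)
```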

Older texts also used the following additional numeric characters.


16
archaic{10}0D70
archaic{100}0D71
archaic{1000}0D72
archaic¼0D73
archaic½0D74
archaic¾0D75
archaic{1/160}0D58
archaic{1/40}0D59
archaic{3/8}0D5A
archaic{1/20}0D5B
archaic{1/10}0D5C
archaic{3/20}0D5D
archaic{1/5}0D5E
archaic{1/16}0D76
archaic{1/8}0D77
archaic{3/16}0D78

U+0D4F SIGN PARA was used historically to measure rice.

According to the Unicode Standard, Malayalam also used the following characters in the Common Indic Number Forms block.5


3
archaic¼␣½␣¾A830
archaic A831
archaic A832

Ordinals

U+0D02 SIGN ANUSVARA may be attached to digits to indicate ordinal numbers5, eg.

355ാം 355th

Dates


rare0D79

U+0D79 DATE MARK can be used like the 'th' in English dates, but its use is fading in modern text.5

Text direction

Malayalam text runs left to right in horizontal lines.

Show default bidi_class properties for characters in the Malayalam orthography described here.

Glyph shaping & positioning

This section describes typographic features related to font/writing styles, cursive text, context-based shaping, context-based positioning, letterform slopes, weights & italics, and case & other character transforms.

You can experiment with examples using the Malayalam character app.

Context-based shaping & positioning

Are special glyph forms needed, depending on the context in which a character is used? Do glyphs interact in some circumstances? Are there requirements to position diacritics or other items specially, depending on context? Does the script have multiple diacritics competing for the same location relative to the base?

Malayalam is not cursive, but display technology needs to provide shaping for conjunct formation.

Display technology must correctly position pre-base vowels to the left of the consonant or consonant cluster, and place the separate glyphs of 2-part vowels around those also (see Pre-base vowel signs and Circumgraphs).

It must do a similar thing for display of RA using the orthographic reforms (see RA coda, Medial RA, and Clusters with RRA).

Vowel ligatures & orthographic reforms

As in Tamil, consonants in the traditional version of the Malayalam script tend to produce ligated forms when they combine with U+0D41 VOWEL SIGN U and U+0D42 VOWEL SIGN UU.

During orthographic reforms in the 1970s and 1980s a simpler approach was introduced, to make printing easier. Both vowels were represented by an unchanging, post-base vowel sign as shown below. No change is needed to the underlying code points in Unicode; this is purely a font difference.

The list below shows traditional forms for each of the basic consonants with U+0D41 VOWEL SIGN U. The top line shows the traditional form, and the line just below (in the darker colour) shows the modern version.


36
പുപു pu  0D2A
0D41
ഫുഫു pʰu  0D2B
0D41
ബുബു bu  0D2C
0D41
ഭുഭു bʰu  0D2D
0D41
തുതു tu  0D24
0D41
ഥുഥു tʰu  0D25
0D41
ദുദു du  0D26
0D41
ധുധു dʰu  0D27
0D41
ടുടു ʈu  0D1F
0D41
ഠുഠു ʈʰu  0D20
0D41
ഡുഡു ɖu  0D21
0D41
ഢുഢു ɖʰu  0D22
0D41
കുകു ku  0D15
0D41
ഖുഖു kʰu  0D16
0D41
ഗുഗു gu  0D17
0D41
ഘുഘു gʰu  0D18
0D41
    
ചുചു cu  0D1A
0D41
ഛുഛു cʰu  0D1B
0D41
ജുജു ju  0D1C
0D41
ഝുഝു jʰu  0D1D
0D41
    
വുവു ʋu  0D35
0D41
സുസു su  0D38
0D41
ഷുഷു ʂu  0D37
0D41
ശുശു ʃu  0D36
0D41
ഹുഹു hu  0D39
0D41
    
മുമു mu  0D2E
0D41
നുനു nu  0D28
0D41
ണുണു ɳu  0D23
0D41
ഞുഞു ɲu  0D1E
0D41
ങുങു ŋu  0D19
0D41
    
രുരു ru  0D30
0D41
റുറു ṙu  0D31
0D41
ഴുഴു ɻu  0D34
0D41
ലുലു lu  0D32
0D41
ളുളു ɭu  0D33
0D41
യു  yu  0D2F
0D41

This next list shows the same, but the consonant is followed by U+0D42 VOWEL SIGN UU.


36
പൂപൂ puː  0D2A
0D42
ഫൂഫൂ pʰuː  0D2B
0D42
ബൂബൂ buː  0D2C
0D42
ഭൂഭൂ bʰuː  0D2D
0D42
തൂതൂ tuː  0D24
0D42
ഥൂഥൂ tʰuː  0D25
0D42
ദൂദൂ duː  0D26
0D42
ധൂധൂ dʰuː  0D27
0D42
ടൂടൂ ʈuː  0D1F
0D42
ഠൂഠൂ ʈʰuː  0D20
0D42
ഡൂഡൂ ɖuː  0D21
0D42
ഢൂഢൂ ɖʰuː  0D22
0D42
കൂകൂ kuː  0D15
0D42
ഖൂഖൂ kʰuː  0D16
0D42
ഗൂഗൂ guː  0D17
0D42
ഘൂഘൂ gʰuː  0D18
0D42
     
ചൂചൂ cuː  0D1A
0D42
ഛൂഛൂ cʰuː  0D1B
0D42
ജൂജൂ juː  0D1C
0D42
ഝൂഝൂ jʰuː  0D1D
0D42
     
വൂവൂ ʋuː  0D35
0D42
സൂസൂ suː  0D38
0D42
ഷൂഷൂ ʂuː  0D37
0D42
ശൂശൂ ʃuː  0D36
0D42
ഹൂഹൂ huː  0D39
0D42
     
മൂമൂ muː  0D2E
0D42
നൂനൂ nuː  0D28
0D42
ണൂണൂ ɳuː  0D23
0D42
ഞൂഞൂ ɲuː  0D1E
0D42
ങൂങൂ ŋuː  0D19
0D42
     
രൂരൂ ruː  0D30
0D42
റൂറൂ ṙuː  0D31
0D42
ഴൂഴൂ ɻuː  0D34
0D42
ലൂലൂ luː  0D32
0D42
ളൂളൂ ɭuː  0D33
0D42
യൂയൂ yuː  0D2F
0D42

See a table of all consonants and all vowel signs.
The table allows you to test results for various fonts.

Explicit shaping controls

Figure 10 summarises the various ways in which a consonant cluster can be rendered (on the right), and on the left indicates which code sequences may produce those forms.

KA+virama+RAക്രkrɐ
KA+virama+ZWNJ+RAക്‌രkrɐ
KA+ZWNJ+virama+RAക‌്രkrɐ
Uses of ZWNJ (zero-width non-joiner) in Malayalam where, by default, the font produces a ligature for the sequence pronounced krɐ.

Assuming that you have fonts that produce the expected behaviours, the Unicode Standard describes the use of the joiner characters ‌U+200C ZERO WIDTH NON-JOINER (ZWNJ) and ‍U+200D ZERO WIDTH JOINER (ZWJ) as follows:

Typographic units

Word boundaries

Are words separated by spaces, or other characters? Are there special requirements when double-clicking on the text? Are words hyphenated?

The concept of 'word' is difficult to define in any language (see What is a word?). Here, a word is a vaguely-defined, but recognisable semantic unit that is typically smaller than a phrase and may comprise one or more syllables.

Spaces are often used between words, but it is not uncommon for writers to use spacing to indicate phonological pauses, rather than lexical boundaries.7

Sequences of characters between spaces are often quite long in Malayalam, eg. അറിയപ്പെടുന്നുവെങ്കിലുംạ̄ṙiyp͓peʈun͓nuʋeŋ͓kilum̽

Graphemes

A grapheme is a user-perceived unit of text. Text operations that use graphemes as a unit of text include line-breaking, forwards deletion, cursor movement & selection, character counts, text spacing, text insertion, justification, case conversions, and sorting. The Unicode Standard uses generalised rules to define 'grapheme clusters', which approximate the likely grapheme boundaries in a writing system; however, they don't work well with many complex scripts.

The term orthographic syllable is not clearly defined in the Unicode Standard. In the orthography notes on this site we define it to mean a typographic unit that includes more than one grapheme cluster. This is commonly the case for Brahmi-derived scripts, such as for Devanagari conjuncts, or Balinese stacks. Orthographic syllables do not correspond to phonetic syllables.

In many cases, grapheme clusters can be used to segment Malayalam words, since the virama is often visible and in principle allows for a segment break immediately (like Tamil). However, consonant cluster sequences often form conjuncts which should not be broken during edit operations such as letter-spacing, first-letter highlighting, and in-word line breaking. For the operations mentioned, one needs to segment the text using orthographic syllables.

The choice of visible virama vs. conjunct tends to vary from sequence to sequence and from font to font, but given that there is only one Malayalam virama, the application needs to interpret the virama in two different ways for segmentation: (1) as a simple vowel-killer, and (2) as a conjunct initiator. Choosing the right behaviour requires the application to understand the rendered glyphs, but this is asking a lot of an application.

The Malayalam virama (chandrakkala) is U+0D4D SIGN VIRAMA, which has an Indic Syllabic Category of Virama.

Grapheme clusters

Base ZW(N)J? Combining_mark* ZW(N)J?

Combining marks may include one of the following types of character.

  1. Dependent vowels [13] (see combiningvowels) There is usually only one vowel sign per base consonant, however in decomposed text circumgraphs are represented by 2 combining characters. Vowel signs may also be attached to numbers (see vowelsigns).
  2. Final consonants [2] (see Finals) One of 2 possible combining marks, at the end of a grapheme cluster sequence. May also occur after independent vowels.
  3. Virama (chandrakkala) [1] (see Anusvara & visarga and absence) The chandrakkala is used between consonants in a cluster. Sometimes it is simply rendered as a diacritic over the non-final consonant(s) in the cluster; at other times it causes conjunct creation and is invisible (see Larger typographic units). It is also used (usually at the end of a word) to indicate the half-u vowel sound. In older texts this would follow the U vowel sign. Finally, it may also be used to modify a vowel sound, such as in the 3rd example below.

Placing a ZWNJ before the chandrakkala is supposed to produce the modern 'open' form of the conjunct in fonts that would otherwise produce a traditional conjunct (see Explicit shaping controls). A ZWNJ can also be used after a chandrakkala to prevent the formation of a conjunct form.

ZWJ can be used before the chandrakkala to produce a traditional conjunct form in fonts that produce the open form by default but have the glyphs for the traditional forms too (see Explicit shaping controls).
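The three uses of the joiners described above differ only in which control character is inserted and on which side of the chandrakkala. A minimal Python sketch that builds each sequence follows; the function names are hypothetical, and whether a given font honours the request is outside the control of this code.

```python
VIRAMA = "\u0D4D"  # MALAYALAM SIGN VIRAMA (chandrakkala)
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"     # ZERO WIDTH JOINER

def prevent_conjunct(first, second):
    """ZWNJ after the chandrakkala: ask for no conjunct form."""
    return first + VIRAMA + ZWNJ + second

def request_open_form(first, second):
    """ZWNJ before the chandrakkala: ask for the modern 'open' form."""
    return first + ZWNJ + VIRAMA + second

def request_traditional_form(first, second):
    """ZWJ before the chandrakkala: ask for the traditional conjunct."""
    return first + ZWJ + VIRAMA + second

KA, RA = "\u0D15", "\u0D30"
variants = {
    "default": KA + VIRAMA + RA,
    "no conjunct": prevent_conjunct(KA, RA),
    "open": request_open_form(KA, RA),
    "traditional": request_traditional_form(KA, RA),
}
for label, seq in variants.items():
    print(label, [f"U+{ord(ch):04X}" for ch in seq])
```

Note that the joiners become part of the stored text, so searching and matching code may need to ignore them.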

The following examples show a variety of grapheme clusters, several of which show the virama used in a different way from its use in other words:

Click on the text version of these words to see more detail about the composition.

കാഴ്ച käːɻt͡ʃɐ sight, vision
കോത് koːt̪ɨ̆ to cut
എ്ന്നാ ənnaː on which day?
നമ്മൾ n̪ammaɭ we
നല്ലത് n̪ɐllɐd̪ɨ̆ goodness
നമസ്തേ n̪amast̪eː hello

In many cases a non-final consonant in a cluster is these days rendered using a special chillu codepoint, rather than a consonant with virama. Chillus are also used for word final consonants that are not followed by a vowel. These chillu characters stand alone as grapheme clusters. See the example below, where the 2nd and final graphemes are chillus.

പെൻസിൽ pensil pencil

Larger typographic units

(Consonant Chandrakkala)* Grapheme_cluster

Malayalam commonly stacks or conjoins glyphs, to form conjuncts. The conjuncts represent consonant clusters.

Grapheme clusters terminate after a sequence of marks that ends with a chandrakkala, but editorial operations that change the visual appearance of the text, such as letter-spacing, first-letter highlighting, line-breaking, and justification, should never split conjunct forms apart. For this reason, an alternative way of segmenting graphemes is needed. This may not apply, however, for some other operations such as cursor movement or backwards delete.

Where conjuncts appear, a typographic unit contains multiple grapheme clusters. The non-final grapheme clusters all end with U+0D4D SIGN VIRAMA, and the final grapheme cluster begins with a consonant.

The following are examples.

Click on the text version of these words to see more detail about the composition.

നമ്മൾ n̪ammaɭ we
നല്ലത് n̪ɐllɐd̪ɨ̆ goodness
അങ്കക്കളരി aŋɡakkaɭaɾi arena
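The pattern (Consonant Chandrakkala)* Grapheme_cluster can be sketched in Python as a two-step process: first split the text into simplified grapheme clusters, then merge any cluster that ends with a chandrakkala into a following cluster that begins with a consonant. This is an illustration under stated assumptions rather than a definitive implementation: the cluster step is a rough approximation (a base plus any following marks or joiners) and not the full Unicode segmentation algorithm, and all function names are hypothetical.

```python
import unicodedata

VIRAMA = "\u0D4D"         # MALAYALAM SIGN VIRAMA (chandrakkala)
JOINERS = "\u200C\u200D"  # ZWNJ and ZWJ

def is_consonant(ch):
    # Consonant letters U+0D15..U+0D3A; chillu letters are not included.
    return "\u0D15" <= ch <= "\u0D3A"

def grapheme_clusters(text):
    """Approximate clusters: a base followed by combining marks and joiners."""
    clusters = []
    for ch in text:
        extends = unicodedata.category(ch).startswith("M") or ch in JOINERS
        if clusters and extends:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

def orthographic_syllables(text):
    """Merge clusters ending in VIRAMA with a following consonant-initial cluster."""
    units = []
    for cluster in grapheme_clusters(text):
        joins_previous = (
            bool(units)
            and units[-1].rstrip(JOINERS).endswith(VIRAMA)
            and is_consonant(cluster[0])
        )
        if joins_previous:
            units[-1] += cluster
        else:
            units.append(cluster)
    return units

word = "\u0D28\u0D2E\u0D4D\u0D2E\u0D7E"  # the first example above
print(grapheme_clusters(word))       # 4 clusters
print(orthographic_syllables(word))  # 3 units; the geminate stays together
```

The merge is applied whether or not the font actually draws a conjunct, because the code points alone do not reveal how the chandrakkala will be rendered.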

Complicating factors

Malayalam has only one virama code point, but it can be used to indicate a conjunct and disappear, or it may simply be displayed as a diacritic over the non-final consonant(s) in a cluster. Often both approaches will appear in the same cluster. The codepoints in memory give no indication as to which will result – that may also vary by font. There is no additional code point, like in some Southeast Asian scripts, that users can choose to indicate that they want a visible chandrakkala rather than a conjunct.

The problem is that, in principle, you would expect line-breaks, etc. to be allowed after a consonant with a visible chandrakkala, just like in Tamil. But without a way to distinguish how the font is rendering the codepoints, this is not possible. Therefore, applications may keep cluster components together for Malayalam even when the chandrakkala is visible.

The following example shows 3 chandrakkala characters that are used in different ways. The first just appears above its base, the second creates a conjunct and disappears, and the third represents a vowel sound (a completely different usage). Note that the app that generated the orthographic syllable keeps everything together as one unbreakable typographic unit.

Click on the text version of this word to see more detail about the composition.

തുടയ്ക്ക് t̪uʈajkkɨ̆ to wipe
Same word with orthographic syllable rules applied.

Treatment as grapheme clusters rather than conjuncts can also affect vowel sign positioning. An illustration of this can be seen when a consonant cluster is followed (phonetically) by a vowel rendered as a vowel sign glyph that is displayed to the left of the base. For example, observe below how the pre-base vowel U+0D47 VOWEL SIGN EE appears to the left of the tr conjunct, but doesn't get rendered at the beginning of the str cluster.

Click on the text version of this word to see more detail about the composition.

ഓസ്ട്രേലിയ oːsʈreːlijä Australia
Same word with orthographic syllable rules applied.
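The stored order of the characters can be checked with a short Python sketch. It assumes nothing beyond the standard `unicodedata` module, and the variable names are illustrative. In memory, the vowel sign follows the last consonant of the cluster; how far to the left it is drawn is decided at rendering time.

```python
import unicodedata

VOWEL_SIGN_EE = "\u0D47"  # MALAYALAM VOWEL SIGN EE, drawn to the left of its base

# The word for "Australia" from the example above.
word = "\u0D13\u0D38\u0D4D\u0D1F\u0D4D\u0D30\u0D47\u0D32\u0D3F\u0D2F"

# Offset of the vowel sign in the stored (logical) order.
ee_index = word.index(VOWEL_SIGN_EE)

# The five code points before it: SA + virama + TTA + virama + RA.
cluster_before = word[1:ee_index]

print(ee_index, [unicodedata.name(ch) for ch in cluster_before])
```

Whether the glyph appears before the whole cluster or only before its final conjunct therefore depends on how the renderer groups those five code points.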

Punctuation & inline features

This section describes typographic features related to word boundaries, phrase & section boundaries, bracketed text, quotations & citations, emphasis, abbreviation, ellipsis & repetition, inline notes & annotations, other punctuation, and other inline text decoration.

Phrase & section boundaries

What characters are used to indicate the boundaries of phrases, sentences, and sections?


! U+0021
, U+002C
: U+003A
; U+003B
. U+002E
। U+0964 (archaic)
॥ U+0965 (archaic)
? U+003F

Malayalam uses western punctuation.

phrase
  , U+002C COMMA
  ; U+003B SEMICOLON
  : U+003A COLON
  । U+0964 DEVANAGARI DANDA

sentence
  . U+002E FULL STOP
  ? U+003F QUESTION MARK
  ! U+0021 EXCLAMATION MARK
  ॥ U+0965 DEVANAGARI DOUBLE DANDA

। U+0964 DEVANAGARI DANDA and ॥ U+0965 DEVANAGARI DOUBLE DANDA are used in older texts to separate phrases.[5]

Bracketed text


( U+0028
) U+0029

Malayalam commonly uses ASCII parentheses to insert parenthetical information into text.

            start                        end
standard    ( U+0028 LEFT PARENTHESIS    ) U+0029 RIGHT PARENTHESIS

Quotations & citations

What characters are used to indicate quotations? Do quotations within quotations use different characters? What characters are used to indicate dialogue? Are the same mechanisms used to cite words, or for scare quotes, etc? What about citing book or article names?


‘ U+2018
’ U+2019
“ U+201C
” U+201D

Malayalam texts use quotation marks around quotations. Of course, due to keyboard design, quotations may also be surrounded by ASCII double and single quote marks.

           start                                  end
initial    “ U+201C LEFT DOUBLE QUOTATION MARK    ” U+201D RIGHT DOUBLE QUOTATION MARK

Line & paragraph layout

This section describes typographic features related to line breaking & hyphenation, text alignment & justification, text spacing, baselines, line height, counters, lists, and styling initials.

Line breaking & hyphenation

Are there special rules about the way text wraps when it hits the end of a line? Does line-breaking wrap whole 'words' at a time, or characters, or something else (such as syllables in Tibetan and Javanese)? What characters should not appear at the end or start of a line, and what should be done to prevent that? Is hyphenation used, or something else? What rules are used? What difficulties exist?

Spaces provide the main line-break opportunities. However, Malayalam is an agglutinative language and its words can be long. This can lead to large gaps during justification, and sometimes to words that are longer than the available column width, so it is desirable to also hyphenate words.

Line-edge rules

As in almost all writing systems, certain punctuation characters should not appear at the end or the start of a line. The Unicode line-break properties help applications decide whether a character should appear at the start or end of a line.


The following list gives examples of typical behaviours for some of the characters used in modern Malayalam. Context may affect the behaviour of some of these and other characters.


  • “ ‘ (   should not be the last character on a line.
  • ” ’ ) . , ; ! ? । ॥ %   should not begin a new line.
  •   should be kept with any number, even if separated by a space or parenthesis.
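The first two rules in the list can be expressed as a small check. This is a minimal Python sketch, not an implementation of the Unicode line-breaking algorithm: the two character sets are copied from the list above, and the function name `break_allowed` is hypothetical.

```python
# Characters taken from the list above.
NO_LINE_END = set("“‘(")              # should not be the last character on a line
NO_LINE_START = set("”’).,;!?।॥%")    # should not begin a new line


def break_allowed(text, i):
    """Return True if breaking between text[i-1] and text[i] keeps
    both line-edge rules. Offsets at the very start or end of the
    text are not break positions."""
    if i <= 0 or i >= len(text):
        return False
    return text[i - 1] not in NO_LINE_END and text[i] not in NO_LINE_START


print(break_allowed("a (b", 3))   # break right after "("
print(break_allowed("a) b", 1))   # break right before ")"
print(break_allowed("ab cd", 3))  # break after the space
```

A real application would combine a check like this with the space-based break opportunities described earlier, and would take its character classes from the Unicode line-break properties rather than from a hard-coded list.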

In-word line-breaks

'Hyphenation' here refers to an extra set of rules applied after the basic line-break algorithm to split words at syllable or morphological boundaries in order to improve the layout of a paragraph. Hyphenation may or may not be indicated using a visual marker at the end or start of a line, however it is commonly marked by a hyphen or other glyph.

Because of the length of Malayalam words, in-word line-breaks are very common and needed during layout, especially in narrow columns, such as newsprint.

The breaks mostly take place at syllable boundaries, although there are occasional exceptions and special cases. Usually, no visual marker is associated with the mid-word line break.[3]

Newspaper clipping

Text from the Malayalam newspaper, Deshabhimani, showing hyphenated words with yellow highlighting.
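To give a rough idea of where such in-word breaks could fall, the Python sketch below groups a word into orthographic syllables and reports the offsets between them. It is a simplification under stated assumptions: the function names are hypothetical, a consonant joins the previous unit only when it directly follows U+0D4D, combining marks always attach to the preceding unit, and chillu letters, joiners, and the exceptions mentioned above are not handled.

```python
import unicodedata

VIRAMA = "\u0D4D"  # MALAYALAM SIGN VIRAMA (chandrakkala)


def orthographic_syllables(text):
    """Group characters into approximate orthographic syllables."""
    units = []
    for ch in text:
        category = unicodedata.category(ch)
        is_mark = category in ("Mn", "Mc")
        # A letter directly after a virama continues the cluster.
        joins_cluster = bool(units) and units[-1].endswith(VIRAMA) and category == "Lo"
        if units and (is_mark or joins_cluster):
            units[-1] += ch
        else:
            units.append(ch)
    return units


def break_opportunities(word):
    """Return the offsets inside the word that fall between units."""
    offsets, position = [], 0
    for unit in orthographic_syllables(word)[:-1]:
        position += len(unit)
        offsets.append(position)
    return offsets


wipe = "\u0D24\u0D41\u0D1F\u0D2F\u0D4D\u0D15\u0D4D\u0D15\u0D4D"
australia = "\u0D13\u0D38\u0D4D\u0D1F\u0D4D\u0D30\u0D47\u0D32\u0D3F\u0D2F"
print(break_opportunities(wipe), break_opportunities(australia))
```

Because the sketch works only on code points, it keeps every cluster together regardless of whether the font shows a visible chandrakkala, which mirrors the application behaviour described in the section on complicating factors.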

Baselines, line height, etc.

Does the script have special requirements for baseline alignment between mixed scripts and in general? Is line height special for this script? Are there other aspects that affect line spacing, or positioning of items vertically within a line?

Malayalam uses the so-called 'alphabetic' baseline, which is the same as for Latin and many other scripts.

Malayalam characters have ascenders and descenders, and combining marks appear above and below the letters. However, generally speaking, the extensions involved don't extend far beyond those of Latin text.

To give an approximate idea, Figure 12 compares Latin and Malayalam glyphs from Noto fonts. The basic height of Malayalam letters is typically around the Latin x-height, however extenders and combining marks reach slightly beyond the Latin ascenders and descenders, creating a need for slightly larger line spacing.

Xhqxളീൻകൂട്ടസ്സട് Xhqxളീൻകൂട്ടസ്സട്
Font metrics for Latin text compared with Malayalam glyphs in the Noto Serif Malayalam (top) and Noto Sans Malayalam (bottom) fonts.

Figure 13 shows similar comparisons for the Malayalam MN and Kartika fonts.

Xhqxളീൻകൂട്ടസ്സട് Xhqxളീൻകൂട്ടസ്സട്
Latin font metrics compared with Malayalam glyphs in the Malayalam MN (top) and Kartika (bottom) fonts.

Counters, lists, etc.

Are there list or other counter styles in use? If so, what is the format used? Do counters need to be upright in vertical text? Are there other aspects related to counters and lists that need to be addressed?

You can experiment with counter styles using the Counter styles converter. Patterns for using these styles in CSS can be found in Ready-made Counter Styles, and we use the names of those patterns here to refer to the various styles.

The modern Malayalam orthography uses a native numeric style.

Numeric

The malayalam numeric style is decimal-based and uses these digits.[2]


൧ 1 U+0D67 (infreq.)
൨ 2 U+0D68 (infreq.)
൩ 3 U+0D69 (infreq.)
൪ 4 U+0D6A (infreq.)
൫ 5 U+0D6B (infreq.)
൬ 6 U+0D6C (infreq.)
൭ 7 U+0D6D (infreq.)
൮ 8 U+0D6E (infreq.)
൯ 9 U+0D6F (infreq.)
൦ 0 U+0D66 (infreq.)

Examples:


൧ = 1      ൨ = 2      ൩ = 3      ൪ = 4
൧൧ = 11    ൨൨ = 22    ൩൩ = 33    ൪൪ = 44
൧൧൧ = 111  ൨൨൨ = 222  ൩൩൩ = 333  ൪൪൪ = 444

Prefixes and suffixes

Malayalam commonly uses a full stop + space as a suffix.

Examples:

൧. ൨. ൩. ൪. ൫.
Separator for Malayalam list counters: full stop + space.
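The decimal numeric style and its suffix can be sketched in a few lines of Python. The function names `to_malayalam_numeric` and `list_marker` are hypothetical. The sketch relies only on the fact that the ten digits occupy consecutive code points from U+0D66 (zero) to U+0D6F (nine).

```python
MALAYALAM_DIGIT_ZERO = 0x0D66  # digits 0-9 are U+0D66..U+0D6F


def to_malayalam_numeric(n):
    """Convert a non-negative integer to Malayalam decimal digits."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    return "".join(chr(MALAYALAM_DIGIT_ZERO + int(d)) for d in str(n))


def list_marker(n, suffix=". "):
    """Build a list counter: the number followed by full stop + space."""
    return to_malayalam_numeric(n) + suffix


for value in (1, 2, 3, 11, 222):
    print(repr(list_marker(value)))
```

In a web page the same result would normally come from a CSS counter style rather than from application code.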

Page & book layout

This section describes typographic features related to general page layout & progression; grids & tables; notes, footnotes, etc.; forms & user interaction; and page numbering, running headers, etc.

References & sources

[1] Peter T. Daniels and William Bright, The World's Writing Systems, Oxford University Press, ISBN 0-19-507993-0

[2] Richard Ishida, Ready-made Counter Styles

[3] Santhosh Thottingal, Personal correspondence

[4] Unicode Consortium, The Unicode Standard, Version 13.0, Chapter 12.9: South and Central Asia-I, Malayalam, 508-515, ISBN 978-1-936213-16-0

[5] Unicode Consortium, The Unicode Standard, Version 16.0, Chapter 12.9: South and Central Asia-I, Malayalam

[6] Unicode Consortium, Unicode Line Breaking Algorithm (UAX#14)

[7] Wikipedia, Malayalam script

Licence CC-By © r12a.