Dhivehi

Thaana orthography notes

Updated 27 January, 2024

This page brings together basic information about the Thaana script and its use for the Maldivian language, Dhivehi. It aims to provide a brief, descriptive summary of the modern, printed orthography and typographic features, and to advise how to write Dhivehi using Unicode.

Referencing this document

Richard Ishida, Dhivehi (Thaana) Orthography Notes, 27-Jan-2024, https://r12a.github.io/scripts/thaa/dv

Sample


1 ވަނަ މާއްދާ ހުރިހާ އިންސާނުންވެސް ދުނިޔެއަށް އުފަންވަނީ، މިނިވަންކަމުގައި، ހަމަހަމަ ޙައްޤުތަކަކާއެކު، ހަމަހަމަ ދަރަޖައެއްގައި ކަމޭހިތެވިގެންވާ ބައެއްގެ ގޮތުގައެވެ. ހެޔޮ ވިސްނުމާއި، ހެޔޮބުއްދީގެ ބާރު އެމީހުންނަށް ލިބިގެންވެއެވެ. އަދި އެކަކު އަނެކަކާމެދު އެމީހުން މުޢާމަލާތް ކުރަންވާނީ، އުޚުއްވަތްތެރިކަމުގެ ރޫޙެއްގައެވެ.

2 ވަނަ މާއްދާ ހަމަ ކޮންމެ މީހަކަށްމެ، މިޤަރާރުގައި ބަޔާންކޮށްފައިވާ ހުރިހާ ޙައްޤުތަކަކާއި މިނިވަންކަމުގެ މިންގަނޑުތަކެއް ހޯދުމާއި، ލިބިގަތުމުގެ ޙައްޤު ލިބިގެންވެއެވެ. އެޙައްޤުތަކާއި އެމިންގަނޑުތައް ލިބިދެނީ، ނަސްލާއި، ކުލައާއި، ޖިންސާއި، ބަހާއި، ދީނާއި، ސިޔާސީގޮތުން ނުވަތަ އެހެންވެސް ކަމަކާ ގުޅޭގޮތުން ވިސްނުން ގެންގުޅޭ ގޮތާއި، ވަކި ޤައުމަކަށް ނުވަތަ މުޖުތަމަޢަކަށް ނިސްބަތްވުމާއި، މުދާ ލިބިހުރުމާއި، އުފަންވީ ޢާއިލާއެއްގެ ސަބަބުން ޤަދަރުވެރިވުމާއި، އެހެންވެސް ސަބަބަކާ ހުރެ ޤަދަރުވެރިވުންފަދަ، އެއްވެސް ބާވަތެއްގެ މިންގަނޑަކުން ތަފާތު ކުރުމެއް ނެތިއެވެ. އަދި މީހަކު ނިސްބަތްވާ ޤައުމަކީ، ނުވަތަ ސަރަޙައްދަކީ ސިޔާސީ ގޮތުން، ނުވަތަ އެޤައުމުގެ ބާރު ހިނގާ ސަރަޙައްދުގެ މިންވަރުގެ ގޮތުން، ނުވަތަ އެޤައުމުގެ ބައިނަލްއަޤްވާމީ ހައިސިއްޔަތުގެ ގޮތުން ނަމަވެސް، އެއްވެސް ތަފާތުކުރުމެއް ގެންގުޅެގެން ނުވާނޭ ގޮތުންނެވެ. އެޤައުމަކީ، ނުވަތަ މީހަކު ނިސްބަތްވާ އެސަރަޙައްދަކީ، މިނިވަން ޤައުމަކަށް ވިޔަސް، ނުވަތަ އެކުވެރި ދައުލަތްތަކުގެ ބެލުމުގެ ދަށުން އެހެން ޤައުމަކުންބަލަހައްޓަމުންދާ ޤައުމެއް ކަމުގައިވިޔަސް، ނުވަތަ އަމިއްލަ ވެރިކަމެއް ނެތް ޤައުމެއް ކަމުގައި ވިޔަސް، ނުވަތަ އެހެންވެސް ގޮތަކުން ސިޔާދަތީ ބާރު މަޙްދޫދު ކުރެވިގެންވާ ޤައުމަކަށް ވީނަމަވެހެވެ.

Usage & history

The Thaana script is used for writing the Maldivian language, also known as Dhivehi, spoken by about 370,000 people in the Maldives and in Maldivian communities in India.

ތާނަ tə̄nə təːnə Thaana

It is thought that speakers of Dhivehi have written their language for over two thousand years, and that writing was developed by Maldivian Buddhist monks translating the Buddhist scriptures.

Over time, the script evolved, slanting the letters by 45 degrees and adding spaces between words. The earliest known sample of the Thaana script (rather than the Dhives Akuru alphabet) is inscribed on the main Friday mosque of the island and dates back to AD 1599.

Sources: Scriptsource, Wikipedia.

Basic features

Dhivehi is an alphabetic abjad. All vowels are written, but as diacritics above or below the consonants. See the summary below for a brief overview of features for the modern Dhivehi orthography using the Thaana script.

Thaana text is written horizontally, right to left, but unlike Arabic or N'Ko, the text is not cursive. Multi-digit numbers are displayed left-to-right. Words are separated by spaces. The script is monocameral. The visual forms of letters don't usually interact.


Dhivehi has 24 basic consonant letters, but the repertoire also contains 14 extra consonants to represent the sounds of Arabic and English.


Thaana is an alphabet, but post-consonant vowels are written using 10 combining marks (no letters).

Diphthongs are written as a vowel diacritic over a consonant followed by a standalone vowel.

Thaana represents standalone vowels using އ [U+0787 THAANA LETTER ALIFU] as a base, to which vowel diacritics are attached.

A diacritic is used to indicate consonant clusters, or the absence of a vowel after a consonant base.

Numbers use ASCII digits.

Punctuation mixes ASCII and Arabic characters.

Character index

Letters


Basic consonants

ޕ␣ބ␣ތ␣ދ␣ޗ␣ޖ␣ޓ␣ޑ␣ކ␣ގ␣ފ␣ވ␣ސ␣ޒ␣ށ␣ހ␣މ␣ނ␣ޏ␣ރ␣ލ␣ޅ␣ޔ␣އ

Extended consonants

ޠ␣ޟ␣ޤ␣ޘ␣ޛ␣ޞ␣ޡ␣ޝ␣ޜ␣ޚ␣ޣ␣ޙ␣ޢ␣ޥ

Other

ޕ

Not used for modern Dhivehi

ޱ

Combining marks


Vowels

ި␣ީ␣ު␣ޫ␣ެ␣ޭ␣ަ␣ާ␣ޮ␣ޯ

Other

ް

Punctuation

،␣؛␣؟␣“␣”

ASCII

(␣)␣.␣:␣!

Other


To be investigated

,␣/␣;␣[␣]␣⹁

Phonology

These are sounds for the Dhivehi language.


Phones in a lighter colour are non-native or allophones. Source wp.

Vowel sounds

i u e o ə ə əː

Consonant sounds

Places of articulation (columns): labial, dental, alveolar, post-alveolar, retroflex, palatal, velar, uvular, pharyngeal, glottal
stops: p b t d ʈ ɖ k ɡ q
pre-nasalised: ᵐb ⁿd ᶯɖ ᵑɡ
affricate: t͡ʃ d͡ʒ
fricatives: f θ ð s z ʃ ʒ ʂ ɕ x ɣ ħ ʕ h
nasal: m n ɳ ɲ ŋ
approximant: ʋ w l ɭ j
trill/flap: r

Tone

Dhivehi is not a tonal language.

Structure

Native Maldivian words do not allow initial consonant clusters. The syllable structure is (C)V(C), i.e. one vowel with an optional consonant in the onset and/or coda. This affects the adaptation of loanwords, such as is.kuːl from English school. Source wp.

Vowels

Vowel summary table

The following table summarises the main vowel to character assignments.

Simple:
ި␣ީ␣ ␣ު␣ޫ
ެ␣ޭ␣ ␣ޮ␣ޯ
ަ␣ާ
Standalone:
އ

For additional details see Vowel sounds to characters.

Vowel diacritics

All vowels are always written, but as diacritics above or below the consonant they follow.

ި␣ީ␣ު␣ޫ␣ެ␣ޭ␣ަ␣ާ␣ޮ␣ޯ

Diphthongs & glides

Diphthongs are written as a vowel diacritic over a consonant followed by a standalone vowel.

ވައި

މިއުޒިކް

ސައުވީސް
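
The spelling pattern described above can be sketched in Python. This is a minimal illustration, not an established tool: the function name `vowel_sequences` is made up for this example, and the character ranges are assumptions based on the layout of the Unicode Thaana block (consonant letters at U+0780 to U+07A5, vowel marks at U+07A6 to U+07AF).

```python
import re

ALIFU = "\u0787"                 # THAANA LETTER ALIFU, the base for standalone vowels
CONSONANT = "[\u0780-\u07A5]"    # Thaana consonant letters (assumed range)
VOWEL_MARK = "[\u07A6-\u07AF]"   # the ten vowel diacritics (assumed range)

# A consonant carrying a vowel mark, followed by alifu carrying a second vowel mark.
PATTERN = re.compile(CONSONANT + VOWEL_MARK + ALIFU + VOWEL_MARK)

def vowel_sequences(word: str) -> list[str]:
    """Return every consonant+vowel, alifu+vowel sequence found in word."""
    return PATTERN.findall(word)
```

The match is purely orthographic: it reports any two adjacent written vowels, and does not decide whether they are pronounced as a diphthong or as two syllables.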

Standalone vowels

Thaana represents standalone vowels using އ [U+0787 THAANA LETTER ALIFU] as a base, to which vowel diacritics are attached.

އާއިލާ

އިރުގައި

Vowel length

tbd

Nasalisation

tbd

Vowel absence

ް

The diacritic ◌ް [U+07B0 THAANA SUKUN] indicates that no vowel follows the consonant it sits on. It is always used, with one exception: when ނ [U+0782 THAANA LETTER NOONU] is written with no diacritic, it indicates prenasalisation of a following stop, eg. ކަނޑު. This is the only case where a letter can appear without a diacritic.
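
As a rough way to check the rule above, the following Python sketch lists the Thaana letters in a string that are not followed by a vowel mark or sukun. The name `unmarked_letters` is illustrative only, and the code point ranges are assumptions taken from the Unicode Thaana block layout.

```python
LETTERS = range(0x0780, 0x07A6)  # Thaana consonant letters (assumed range)
MARKS = range(0x07A6, 0x07B1)    # ten vowel marks plus sukun (assumed range)

def unmarked_letters(text: str) -> list[tuple[int, str]]:
    """Return (index, letter) for each Thaana letter with no diacritic after it."""
    found = []
    for i, ch in enumerate(text):
        if ord(ch) not in LETTERS:
            continue
        nxt = text[i + 1] if i + 1 < len(text) else ""
        # A letter is unmarked when nothing follows it, or a non-mark follows it.
        if not nxt or ord(nxt) not in MARKS:
            found.append((i, ch))
    return found
```

For the example word ކަނޑު the only letter reported should be the one that signals prenasalisation; any other hit in ordinary Dhivehi text probably points to a missing diacritic.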

Vowel sounds to characters

This section maps Dhivehi vowel sounds to common graphemes in the Thaana orthography.

Sounds listed as 'infrequent' are allophones, or sounds used for foreign words, etc.

Plain vowels

Consonants

Consonant summary table

The following table summarises the main consonant to character assignments.

Stops
ޕ␣ބ␣ތ␣ޠ␣ދ␣ޟ␣ޓ␣ޑ␣ކ␣ގ␣ޤ␣އް␣ށް␣ތް
ނޑ
Affricates
ޗ␣ޖ
Fricatives
ފ␣ވ␣ޘ␣ޛ␣ސ␣ޒ␣ޞ␣ޡ␣ށ␣ޝ␣ޜ␣ޚ␣ޣ␣ޙ␣ޢ␣ހ
Nasals
މ␣ނ␣ޏ
Other
ޥ␣ރ␣ލ␣ޅ␣ޔ

For additional details see Consonant sounds to characters.

Basic consonants

For writing the languages of the Maldives there are 24 consonants in this block.

ޕ␣ބ␣ތ␣ދ␣ޓ␣ޑ␣ކ␣ގ
ޗ␣ޖ
ފ␣ވ␣ސ␣ޒ␣ށ␣ހ
މ␣ނ␣ޏ
ރ␣ލ␣ޅ␣ޔ␣އ

އ [U+0787 THAANA LETTER ALIFU] has no sound of its own, and is used in a number of situations, including as a support for standalone vowels (see Standalone vowels), to indicate gemination (see Gemination), and to produce a glottal stop at the end of a word (see Glottal stop).

Repertoire extension

A further 14 dotted versions of the normal consonants are available for transcribing Arabic sounds and the English sound ʒ (ޜ).

ޠ␣ޟ␣ޤ␣ޘ␣ޛ␣ޞ␣ޡ␣ޝ␣ޜ␣ޚ␣ޣ␣ޙ␣ޢ␣ޥ
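
The split between the basic and extended repertoire can be illustrated with a short Python sketch. It assumes the two sets correspond to contiguous ranges of the Unicode Thaana block (U+0780 to U+0797 and U+0798 to U+07A5); the function names are hypothetical and exist only for this example.

```python
def thaana_category(ch: str) -> str:
    """Classify one character by its position in the Thaana block (assumed ranges)."""
    cp = ord(ch)
    if 0x0780 <= cp <= 0x0797:
        return "basic consonant"
    if 0x0798 <= cp <= 0x07A5:
        return "extended consonant"
    if 0x07A6 <= cp <= 0x07AF:
        return "vowel mark"
    if cp == 0x07B0:
        return "sukun"
    if cp == 0x07B1:
        return "obsolete consonant"   # THAANA LETTER NAA
    return "other"

def count_by_category() -> dict[str, int]:
    """Count the code points U+0780..U+07B1 in each category."""
    counts: dict[str, int] = {}
    for cp in range(0x0780, 0x07B2):
        cat = thaana_category(chr(cp))
        counts[cat] = counts.get(cat, 0) + 1
    return counts
```

Under these assumptions `count_by_category()` yields 24 basic consonants, 14 extended consonants and 10 vowel marks, matching the figures given in the text.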

Glottal stop

Several combinations of characters at the end of a word indicate a glottal stop.

އް [U+0787 THAANA LETTER ALIFU + U+07B0 THAANA SUKUN] is the most common (d,566), although Wiktionary sometimes transcribes this as h.

އެކެއް

A word-final ށް [U+0781 THAANA LETTER SHAVIYANI + U+07B0 THAANA SUKUN] or ތް [U+078C THAANA LETTER THAA + U+07B0 THAANA SUKUN] also indicates a glottal stop, except that the latter also adds j before the stop.

ރަށް

ރަތް

Prenasalisation

Dhivehi, like Sinhala, has prenasalised stops that contrast with sequences of nasal plus stop.

These are not always written, but when they are, the stop is preceded by ނ [U+0782 THAANA LETTER NOONU]. No sukun is used (in fact, this is the only situation where a consonant character isn't followed by a diacritic).

ކަނޑު

ކަޑު
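
The two spellings above differ only in the bare noonu, which makes written prenasalisation easy to detect. A minimal Python sketch follows; the function name is illustrative, and the consonant range U+0780 to U+07A5 is an assumption based on the Unicode Thaana block.

```python
import re

# NOONU with no diacritic, directly followed by another consonant letter.
BARE_NOONU = re.compile("\u0782(?=[\u0780-\u07A5])")

def has_written_prenasalisation(word: str) -> bool:
    """True if word contains a noonu that carries no vowel mark or sukun."""
    return BARE_NOONU.search(word) is not None
```

Because prenasalisation is not always written, a False result says only that the spelling does not mark it.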

Obsolete consonant

ޱ [U+07B1 THAANA LETTER NAA] was abolished from Maldivian official documents around 1953, but it is still seen in reprints of old books like the Bodu Tartheebu, and is used by the people of Addu Atoll and Fuvahmulah.

ޱ

Onsets

tbd

Finals

tbd

Consonant clusters

The diacritic ◌ް [U+07B0 THAANA SUKUN] attached to a consonant signifies that it is not followed by a vowel.

Gemination

Gemination is common in Dhivehi. It is mostly written using އް [U+0787 THAANA LETTER ALIFU + U+07B0 THAANA SUKUN] before the doubled consonant.

ބައްޓެއް

However, if the consonant in question is a nasal, it is preceded instead by ން [U+0782 THAANA LETTER NOONU + U+07B0 THAANA SUKUN] (d,566).

އެންމެ

Gemination is also produced after ށް [U+0781 THAANA LETTER SHAVIYANI + U+07B0 THAANA SUKUN] and ތް [U+078C THAANA LETTER THAA + U+07B0 THAANA SUKUN], except that the latter also adds j before the doubled consonant.

އަށްޑިހަ

އަތްޕުޅު
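
The gemination spellings described in this section can be located with two regular expressions. The Python below is a heuristic sketch under stated assumptions: the name `gemination_sites` is made up, the consonant range is taken from the Unicode Thaana block, and the nasal set (meemu, noonu, gnaviyani) is this example's reading of "nasal", not something the text specifies.

```python
import re

SUKUN = "\u07B0"
# Alifu, shaviyani or thaa with sukun, followed by any consonant letter.
ORAL = re.compile("([\u0787\u0781\u078C]" + SUKUN + ")([\u0780-\u07A5])")
# Noonu with sukun, followed by a nasal letter (assumed set).
NASAL = re.compile("(\u0782" + SUKUN + ")([\u0782\u0789\u078F])")

def gemination_sites(word: str) -> list[tuple[int, str, str]]:
    """Return (index, marker, doubled consonant) for each gemination spelling."""
    sites = []
    for pattern in (ORAL, NASAL):
        for m in pattern.finditer(word):
            sites.append((m.start(), m.group(1), m.group(2)))
    return sorted(sites)
```

A marker at the very end of a word is not reported, since there it spells a glottal stop and no doubled consonant follows.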

Consonant sounds to characters

This section maps Dhivehi consonant sounds to common graphemes in the Thaana orthography.

Sounds listed as 'infrequent' are allophones, or sounds used for foreign words, Sanskrit, etc.

Stops

Affricates

Fricatives

Nasals

 

ނ [U+0782 THAANA LETTER NOONU] indicates prenasalisation when it occurs without a vowel diacritic.

Other

Numbers, dates, currency, etc.

See type samples.

Either European digits or Arabic-Indic digits are used.

Text direction

Thaana script is written horizontally and right-to-left in the main, but as with most RTL scripts, numbers and embedded LTR script text are written left-to-right (producing 'bidirectional' text).

7.2.2 ބައްތެލި
The section number in this example is read LTR, i.e. 7.2.2, but the Thaana word following is RTL.
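
To see why such a line is bidirectional, the Unicode bidi class of each character can be inspected with Python's standard `unicodedata` module. This is a small sketch; the helper names are invented for the example, and treating European digits as the left-to-right side is a simplification of the full Unicode Bidirectional Algorithm.

```python
import unicodedata

def bidi_classes(text: str) -> list[tuple[str, str]]:
    """Pair each character with its Unicode Bidi_Class value."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

def is_bidirectional(text: str) -> bool:
    """True if text mixes right-to-left letters with LTR letters or European digits."""
    classes = {unicodedata.bidirectional(ch) for ch in text}
    has_rtl = bool(classes & {"R", "AL"})   # strong right-to-left classes
    has_ltr = bool(classes & {"L", "EN"})   # Latin letters, European numbers
    return has_rtl and has_ltr
```

Calling `bidi_classes` on the example line shows the digits with a number class, the Thaana letters with a right-to-left class, and the vowel marks as non-spacing marks that follow their base.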


Glyph shaping & positioning

This section brings together information about the following topics: font/writing styles; cursive text; context-based shaping; context-based positioning; letterform slopes, weights, & italics; case & other character transforms.

Experiment with examples using the Thaana character app.

The Thaana script is not cursive, and involves no significant context-based shaping or positioning.

The script is unicameral and needs no transforms to convert between code points.

Context-based shaping & positioning

Thaana letters don't interact, so no special shaping is needed.

Base characters carry only a single combining mark.

Graphemes

Dhivehi graphemes are straightforward, and can be mapped to Unicode grapheme clusters.

Grapheme clusters

Base Combining_mark?

Nearly all base letters in Dhivehi are accompanied by a combining mark to indicate a vowel (or absence of a vowel). There is only one mark per base. All such sequences conform to Unicode grapheme clusters.


ލަކްޝަދީބު
އެކެއް
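
The Base Combining_mark? pattern can be illustrated by splitting a word into base-plus-mark units. The Python sketch below is a simplification of Unicode grapheme cluster segmentation, sufficient only for this simple pattern; `grapheme_clusters` is a hypothetical helper, and it relies on the Thaana diacritics having the general category Mn (nonspacing mark).

```python
import unicodedata

def grapheme_clusters(text: str) -> list[str]:
    """Group each base character with the nonspacing marks that follow it."""
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch) == "Mn":
            clusters[-1] += ch      # attach the mark to the preceding base
        else:
            clusters.append(ch)     # start a new cluster
    return clusters
```

For well-formed Dhivehi text every cluster returned should be at most two code points long, reflecting the rule of one mark per base.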

Punctuation & inline features

Word boundaries

Words are separated by spaces.

Phrase & section boundaries

See type samples.

،␣:␣؛␣.␣؟␣!

Dhivehi punctuation uses a mixture of western and Arabic punctuation. The latter includes, in particular, ، [U+060C ARABIC COMMA], ؛ [U+061B ARABIC SEMICOLON] and ؟ [U+061F ARABIC QUESTION MARK].

phrase

، [U+060C ARABIC COMMA]

؛ [U+061B ARABIC SEMICOLON]

: [U+003A COLON]

sentence

. [U+002E FULL STOP]

؟ [U+061F ARABIC QUESTION MARK]

! [U+0021 EXCLAMATION MARK]

In the sample text at the top of the page Arabic commas and ASCII full stops are mixed together.

އުފަންވަނީ،
An excerpt from the sample text above showing the use of an Arabic comma.

Bracketed text

See type samples.

(␣)

Dhivehi commonly uses ASCII parentheses to insert parenthetical information into text.

  start end
standard ( [U+0028 LEFT PARENTHESIS] ) [U+0029 RIGHT PARENTHESIS]

Mirrored characters

The words 'left' and 'right' in the Unicode names for parentheses, brackets, and other paired characters should be ignored. LEFT should be read as if it said START, and RIGHT as END. The direction in which the glyphs point will be automatically determined according to the base direction of the text.

a > b > c
ا > ب > ج
Both of these lines use > [U+003E GREATER-THAN SIGN], but the direction it faces depends on the base direction at the point of display.

The number of characters that are mirrored in this way is around 550, most of which are mathematical symbols. Some are single characters, rather than pairs. The following are some of the more common ones.

(␣)␣<␣>␣[␣]␣{␣}
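
Whether a character is mirrored in this way is recorded in the Unicode Bidi_Mirrored property, which Python exposes through `unicodedata.mirrored`. The short sketch below filters a string down to its mirrored characters; the function name is illustrative only.

```python
import unicodedata

def mirrored_chars(chars: str) -> list[str]:
    """Return the characters whose Bidi_Mirrored property is set."""
    return [ch for ch in chars if unicodedata.mirrored(ch)]
```

Passing the common characters listed above should return all of them, while letters and full stops are filtered out.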

Quotations & citations

”␣“
  start end
initial 201D 201C

The default quote marks are ” [U+201D RIGHT DOUBLE QUOTATION MARK] at the start, and “ [U+201C LEFT DOUBLE QUOTATION MARK] at the end. Unlike the parentheses, these characters are not mirrored during display. This means that LEFT means use on the left, and RIGHT means use on the right, producing a sequential order that is the opposite of English usage.

Of course, due to keyboard design, quotations may also be surrounded by ASCII double and single quote marks.
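
Because the quote marks are not mirrored, software that inserts them has to choose the code points explicitly. The following Python sketch shows one way to do that; the names `OPEN_QUOTE`, `CLOSE_QUOTE` and `quote` are assumptions made for this example.

```python
import unicodedata

OPEN_QUOTE = "\u201D"    # RIGHT DOUBLE QUOTATION MARK, used at the start
CLOSE_QUOTE = "\u201C"   # LEFT DOUBLE QUOTATION MARK, used at the end

def quote(text: str) -> str:
    """Wrap text in the default quote marks, in logical (typing) order."""
    return OPEN_QUOTE + text + CLOSE_QUOTE

# Neither mark has the Bidi_Mirrored property, unlike the parentheses,
# so the glyphs are displayed exactly as stored.
assert not unicodedata.mirrored(OPEN_QUOTE)
assert not unicodedata.mirrored(CLOSE_QUOTE)
```

In right-to-left display the start of the quotation is on the right, so the mark stored first appears on the right-hand side.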

Line & paragraph layout

Line breaking & hyphenation

The primary break-point for line-wrapping is the inter-word space.

Line-edge rules

As in almost all writing systems, certain punctuation characters should not appear at the end or the start of a line. The Unicode line-break properties help applications decide whether a character should appear at the start or end of a line.


The following list gives examples of typical behaviours for some of the characters used in modern Dhivehi. Context may affect the behaviour of some of these and other characters.


  • “ ‘ (   should not be the last character on a line.
  • ” ’ ) . ، ؛ ! ؟ %   should not begin a new line.
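
The two rules in the list above can be expressed as a simple check on the characters either side of a candidate break. This Python sketch is a deliberate simplification and not an implementation of the Unicode line breaking algorithm; the set and function names are invented for the example, and the sets contain only the characters listed above.

```python
# Characters that should not be the last character on a line.
NO_LINE_END = set("\u201C\u2018(")
# Characters that should not begin a new line.
NO_LINE_START = set("\u201D\u2019).\u060C\u061B!\u061F%")

def break_allowed(before: str, after: str) -> bool:
    """True if a line break may fall between the characters before and after."""
    return before not in NO_LINE_END and after not in NO_LINE_START
```

For instance, a break directly before an Arabic comma is rejected, so the comma stays on the same line as the word it follows.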

Text alignment & justification

See type samples.

Justification of Dhivehi text appears to be common, and a common approach involves stretching inter-word spacing.

Baselines, line height, etc.

Dhivehi uses the so-called 'alphabetic' baseline, which is the same as for Latin and many other scripts.

Dhivehi places vowel marks above and below base characters but there is only ever one.

To give an approximate idea, the figure below compares Latin and Dhivehi glyphs from a Noto font. The basic height of Dhivehi letters is typically below the Latin x-height; however, extenders for some characters push below the baseline, and the combining marks can reach beyond the Latin ascenders and descenders, creating a need for slightly larger line spacing.

Hhqxއިރުޗޯއާއުފަންވަނީޓީ،2
Font metrics for Latin text compared with Dhivehi glyphs in the Noto Serif Thaana font.

The next figure shows a similar comparison for the MV Boli font.

Hhqxއިރުޗޯއާއުފަންވަނީޓީ،2
Latin font metrics compared with Dhivehi glyphs in the MV Boli font.

Page & book layout

Online resources

  1. Dharu.com
  2. Dhivehi.mv
  3. Jazeera.mv

References