Hindi

Updated 3 October, 2021

This page brings together basic information about the Devanagari script and its use for the Hindi language. It aims to provide a brief, descriptive summary of the modern, printed orthography and typographic features, and to advise how to write Hindi using Unicode.

Phonetic transcriptions on this page should be treated as an approximate guide only. Many are more phonemic than phonetic, and there may be variations depending on the source of the transcription.


Sample (hindi)


अनुच्छेद १. सभी मनुष्यों को गौरव और अधिकारों के मामले में जन्मजात स्वतन्त्रता और समानता प्राप्त है । उन्हें बुद्धि और अन्तरात्मा की देन प्राप्त है और परस्पर उन्हें भाईचारे के भाव से बर्ताव करना चाहिए ।

अनुच्छेद २. सभी को इस घोषणा में सन्निहित सभी अधिकारों और आज़ादियों को प्राप्त करने का हक़ है और इस मामले में जाति, वर्ण, लिंग, भाषा, धर्म, राजनीति या अन्य विचार-प्रणाली, किसी देश या समाज विशेष में जन्म, सम्पत्ति या किसी प्रकार की अन्य मर्यादा आदि के कारण भेदभाव का विचार न किया जाएगा । इसके अतिरिक्त, चाहे कोई देश या प्रदेश स्वतन्त्र हो, संरक्षित हो, या स्त्रशासन रहित हो या परिमित प्रभुसत्ता वाला हो, उस देश या प्रदेश की राजनैतिक, क्षेत्रीय या अन्तर्राष्ट्रीय स्थिति के आधार पर वहां के निवासियों के प्रति कोई फ़रक़ न रखा जाएगा ।

Usage & history

Devanagari is used in India and Nepal for almost 200 languages, making it the fourth most widely adopted writing system in the world. Among others, it is the script used for writing Sanskrit and Hindi.

देवनागरी d̪eːʋˈnaːɡri

Devanagari is a descendant of the 3rd century BCE Brahmi script through the Gupta script and then the closely related Nagari script. The modern standardised form of Devanagari was in use by about 1000 CE. An early version in the Kutila inscription of Bareilly, dated to 992 CE, demonstrates the emergence of the horizontal bar to group letters belonging to a word.

It has traditionally been used by religiously educated people throughout South Asia to record and transmit information, and often appears in parallel with a wide variety of local scripts.

Sources: Scriptsource, Wikipedia

Basic features

Devanagari is an abugida. Consonant letters have an inherent vowel sound. Combining vowel-signs are attached to the consonant to indicate that a different vowel follows the consonant.

Devanagari text runs left-to-right in horizontal lines.

Orthographic syllables (as opposed to phonetic syllables) play a significant role in Devanagari. An orthographic syllable starts at the beginning of any cluster of consonants and incorporates the whole cluster plus any following vowels and diacritics.

Phonetically, Hindi, like other Indic languages, has four forms of plosives, illustrated here with the bilabial stop: unvoiced p, voiced b, aspirated pʰ, and murmured bʱ. It also has a set of retroflex consonants. These are all represented separately in the orthography.

The 33 consonant letters used for Hindi are supplemented by repertoire extensions for 8 more non-native sounds by applying the nukta diacritic to characters.

Consonant clusters at any location are normally indicated using the virama between consonants. This results in a large number of conjunct forms expressed using half-forms, stacked consonants, and ligated glyphs. Occasionally, a visible virama is used.

As part of a cluster, RA has special forms. When initial in an orthographic syllable it appears as a hook at the top right of the whole syllable. When non-initial it appears as one of 2 special marks applied to the other consonants.

Word-final consonant sounds may be represented by 2 dedicated combining marks (anusvara & visarga), but are generally ordinary consonants that are not marked by a virama. Also, the inherent vowel of a penultimate consonant in a word of 3 syllables that ends in a non-inherent vowel is usually elided, and not marked as such.

The Hindi orthography has an inherent vowel, and represents vowels using 9-11 vowel-signs, including 1 prescript and no circumgraphs. All vowel-signs are combining marks, and are stored after the base character.

There are 10-12 independent vowels, one for each vowel sound, including the inherent vowel, and these are used to write all standalone vowel sounds.

There are no composite vowels.

Vowels may be nasalised, using the candrabindu diacritic.

Hindi uses native number digits.

The Unicode Devanagari block contains more characters than the blocks for other Indic scripts, partly because it serves as a pivot script for transliterations of other scripts.

Character index

Letters

Consonants

प␣फ␣ब␣भ␣त␣थ␣द␣ध␣ट␣ठ␣ड␣ढ␣क␣ख␣ग␣घ␣च␣छ␣ज␣झ␣स␣श␣ष␣ह␣म␣न␣ञ␣ण␣ङ␣व␣र␣ल␣य␣क़␣फ़␣ज़␣ख़␣ग़␣ड़␣ढ़

Independent vowels

इ␣ई␣उ␣ऊ␣ए␣अ␣ओ␣ऐ␣औ␣आ
ऍ␣ऑ

Other

ऽ␣ॐ

Combining marks

Vowel-signs

ि␣ी␣ु␣ू␣े␣ो␣ै␣ौ␣ा
ॅ␣ॉ

Other

़␣्␣ँ␣ं␣ः

Numbers

०␣१␣२␣३␣४␣५␣६␣७␣८␣९

Punctuation

।␣॥␣॰␣‘␣’␣“␣”␣—

ASCII

!␣(␣)␣,␣.␣:␣;␣?

Phonology

Phones in a lighter colour are non-native or allophones. Source Wikipedia.

Vowel sounds

ɪ ʊ ə ɛː ɔː æ ɑː

The phoneme ə is often written a in phonemic transcriptions. Its pronunciation may also be slightly lower (such as ɐ).

iː and uː in word-final position are typically shortened to i and u (Wikipedia, Vowels), eg. शक्ति वस्तु

Where ɦ has inherent vowels on either side, those vowels may become ɛ, eg. कहना khnā kɛɦɛnaː to say. A similar process occurs for word-final ɦ (Wikipedia, Vowels), eg. कह kh kəɦ say!

For more details, see Wikipedia.

Consonant sounds

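Places of articulation, left to right: labial, dental, alveolar, post-alveolar, retroflex, palatal, velar, uvular, glottal.

stop: labial p b; dental t d; retroflex ʈ ɖ ʈʰ ɖʱ; velar k ɡ ɡʱ; uvular q; glottal ʔ
affricate: post-alveolar t͡ʃ d͡ʒ t͡ʃʰ d͡ʒʱ
fricative: labial f v; alveolar s z; post-alveolar ʃ; retroflex ʂ; velar x ɣ; glottal h ɦ
nasal: labial m; alveolar n; retroflex ɳ; palatal ɲ; velar ŋ
approximant: labial ʋ w; alveolar l; palatal j
trill/flap: alveolar r ɾ; retroflex ɽ ɽʱ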

Hindi, like other Indic languages, has four forms of plosives, illustrated here with the bilabial stop: unvoiced p, voiced b, aspirated pʰ, and murmured bʱ. It also has a set of retroflex consonants.

v and w are allophones of ʋ in Hindi. w typically occurs between a consonant and vowel (Wikipedia, Allophony of [v] and [w]), eg. compare पकवान and व्रत

For more details, see Wikipedia.

Structure

The effective unit of the writing system is the orthographic syllable, consisting of a consonant and vowel (CV) core and, optionally, one or more preceding consonants, with a canonical structure of (((C)C)C)V.

Consonant letters by themselves constitute a CV unit, where the V is an inherent vowel, whose exact phonetic value may vary by writing system. Independent vowels also constitute a CV unit, where the C is considered to be null. A dependent vowel sign is used to represent a V in CV units where C is not null and V is not the inherent vowel.

In some cases, a phonological diphthong, such as Hindi जाओ ɟāọ̄, is actually written as two orthographic CV units, where the second of these units is an independent vowel letter.

Two diacritics (generally classified as vowels) can be used to represent a syllable-final nasal or an unvoiced aspiration. Medial consonants are catered for by the consonant cluster model. Diacritics are also used to nasalise vowel sounds.
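The structure described above can be approximated with a regular expression. The following is a minimal sketch, not a full implementation of Unicode segmentation rules: the character ranges are simplified assumptions that cover the modern Hindi repertoire, and the name `orthographic_syllables` is purely illustrative. It treats a virama followed by ZWNJ, or a virama with no following consonant, as the end of a syllable.

```python
import re

# Simplified character classes (assumptions; Sanskrit-only signs are ignored).
CONSONANT = "[\u0915-\u0939\u0958-\u095F]\u093C?"  # consonant + optional nukta
VIRAMA = "\u094D"
ZWJ = "\u200D"
ZWNJ = "\u200C"
VOWEL_SIGN = "[\u093E-\u094C]"
INDEPENDENT_VOWEL = "[\u0904-\u0914]"
MODIFIER = "[\u0901-\u0903]"  # candrabindu, anusvara, visarga

# Zero or more "consonant + virama" links, then the final consonant.
CLUSTER = f"(?:{CONSONANT}{VIRAMA}{ZWJ}?)*{CONSONANT}"

# A syllable is a cluster with an optional vowel-sign (or a trailing visible
# virama), or an independent vowel; either may carry modifiers.
# Anything else (spaces, punctuation) is returned as a single character.
SYLLABLE = re.compile(
    f"(?:{CLUSTER}(?:{VIRAMA}{ZWNJ}?|{VOWEL_SIGN})?|{INDEPENDENT_VOWEL})"
    f"{MODIFIER}*|.",
    re.DOTALL,
)


def orthographic_syllables(text: str) -> list[str]:
    """Split text into approximate orthographic syllables."""
    return SYLLABLE.findall(text)


if __name__ == "__main__":
    for word in ("\u0936\u0915\u094D\u0924\u093F", "\u091C\u093E\u0913"):
        print(orthographic_syllables(word))
```

With this sketch, शक्ति yields two units (श and क्ति), and जाओ yields जा followed by the independent vowel ओ.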

Vowels

For a mapping of sounds to graphemes, see the section Vowel to script mapping.

Inherent vowel

a following a consonant is not written, but is seen as an inherent part of the consonant letter, so ka is written by simply using the consonant letter क [U+0915 DEVANAGARI LETTER KA].

Vowel-signs

Non-inherent vowel sounds that follow a consonant are represented using vowel-signs, eg. kiː is written की [U+0915 DEVANAGARI LETTER KA + U+0940 DEVANAGARI VOWEL SIGN II].

An orthography that uses vowel-signs is different from one that uses simple diacritics or letters for vowels, in that the vowel-signs are generally attached to the syllable, rather than just applied to the letter of the immediately preceding consonant (see the section Pre-base vowel-sign for an example).

Devanagari vowel-signs are all combining characters. Each vowel-sign is a single Unicode character applied to a base consonant or cluster, and there are no vowel-signs with multiple parts. All vowel-signs are typed and stored after the base consonant, and the font puts them in the correct place for display.

Half the vowel-signs are spacing combining characters, meaning that they consume horizontal space when added to a base consonant.
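The spacing versus non-spacing distinction can be checked from the Unicode general category: `Mc` marks are spacing combining characters and `Mn` marks are non-spacing. The sketch below uses Python's standard `unicodedata` module; the `VOWEL_SIGNS` string and the `describe` helper are illustrative names, and the list is the set of 11 vowel-signs given on this page.

```python
import unicodedata

# The 9 native vowel-signs plus the 2 used for loan words.
VOWEL_SIGNS = "\u093F\u0940\u0941\u0942\u0947\u094B\u0948\u094C\u093E\u0945\u0949"


def describe(mark: str) -> str:
    """Return the code point, name, and spacing behaviour of a mark."""
    kind = "spacing" if unicodedata.category(mark) == "Mc" else "non-spacing"
    return f"U+{ord(mark):04X} {unicodedata.name(mark)}: {kind}"


spacing = [m for m in VOWEL_SIGNS if unicodedata.category(m) == "Mc"]
non_spacing = [m for m in VOWEL_SIGNS if unicodedata.category(m) == "Mn"]

if __name__ == "__main__":
    for mark in VOWEL_SIGNS:
        print(describe(mark))
    print(len(spacing), "spacing,", len(non_spacing), "non-spacing")
```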

See also the section Vocalics.

Combining marks used for vowels

Hindi uses the following dedicated combining marks for vowels.

ि␣ी␣ु␣ू␣े␣ो␣ै␣ौ␣ा

It also includes 2 vowel-signs used for sounds in foreign (especially English) loan words.

ॅ␣ॉ

Pre-base vowel-sign

ि

One vowel-sign appears to the left of the base consonant letter or cluster, eg. दिन

This is a combining mark that is always typed and stored after the base consonant(s). The font places the glyph before the base consonant.

It is actually placed before the start of the syllable. This means that a word with a consonant cluster at the start displays the pre-base vowel more than one consonant character away from the place where it is pronounced, eg. शक्ति

Note, however, that if the cluster is split by a visible virama, this creates two syllables and the pre-base vowel-sign appears after the consonant with the virama. If you click on the example below, you'll see that the characters and code point orders are the same as for the previous example (apart from the addition of the ZWNJ to force the virama to appear), but the location of the pre-base vowel-sign is now immediately before the consonant after which it is pronounced. शक्‌ति
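The storage order can be inspected directly. This small sketch lists the code points of the two spellings discussed above, written with escapes so that the invisible ZWNJ is explicit; `code_points` is an illustrative helper name. In both cases the vowel-sign U+093F is stored last, after the consonant it follows in pronunciation, even though it is displayed further left.

```python
def code_points(text: str) -> list[str]:
    """List the code points of a string in storage order."""
    return [f"U+{ord(ch):04X}" for ch in text]


conjunct = "\u0936\u0915\u094D\u0924\u093F"           # SHA KA VIRAMA TA I
visible_virama = "\u0936\u0915\u094D\u200C\u0924\u093F"  # same, ZWNJ after the virama

if __name__ == "__main__":
    print(code_points(conjunct))
    print(code_points(visible_virama))
```

Removing the ZWNJ from the second string gives exactly the first, which is why the two differ only in rendering and segmentation.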

Standalone vowels

Devanagari represents standalone vowels using a set of independent vowel letters. The set contains a character to represent the inherent vowel sound.

Independent vowels used by Hindi:

इ␣ई␣उ␣ऊ␣ए␣अ␣ओ␣ऐ␣औ␣आ

Two more are used for sounds in loan words.

ऍ␣ऑ

The following combinations are also counted as letters of the alphabet.

अं␣अः␣अँ

Note the sound difference between the use of a standalone vowel vs. a vowel-sign after a consonant: नई nị̄ nəiː vs. नी niː

Nasalisation

ँ␣ं

Any vowel in Hindi can be nasalised, except for the vocalics.

Nasalisation is usually indicated using [U+0901 DEVANAGARI SIGN CANDRABINDU], eg. मुँह

When a vowel-sign rises above the head line, the glyph for this character may be simplified to just a dot, which can be written using [U+0902 DEVANAGARI SIGN ANUSVARA] instead of candrabindu, eg. हैं

The distinction between use of [U+0901 DEVANAGARI SIGN CANDRABINDU] and [U+0902 DEVANAGARI SIGN ANUSVARA] is not always clearly defined. For example, snake can be written in both of the following ways: साँप सांप

Vowel lengthening

An extra-long, sustained vowel sound can be indicated using [U+093D DEVANAGARI SIGN AVAGRAHA], eg. आईऽऽऽ! <aiii!>

This was originally used as a vowel elision marker in Sanskrit.

Consonants with no following vowel

The inherent vowel is not always pronounced. For example in Hindi it is not usually pronounced at the end of a word, although a ghost echo may appear after a word-final cluster of consonants, eg. योग्य yōg͓y jogjᵊ राष्ट्र rāʂ͓ʈ͓r ɾəstɾᵊ

In addition, Hindi has a general rule that when a word has three or more syllables and ends in a vowel other than the inherent a, the penultimate vowel is not pronounced, eg. compare समझ smjʱ səməɟʱ with समझा smjʱā səmɟʱaː, and रहन rhn rəhən with रहना rhnā rəhnaː

(For a number of reasons, however, this rule does not always hold.)

Devanagari uses [U+094D DEVANAGARI SIGN VIRAMA] (called halant in Hindi) to kill the inherent vowel after a consonant. The virama is rarely seen. As just mentioned, no virama is used at the end of a word, or in the penultimate syllable where the above rules apply. The virama is also usually hidden when the consonant is part of a consonant cluster (see the section Consonant clusters). The virama is visible, however, if it isn't followed by a consonant, eg. the following explicitly represents just the sound k: क्

Vowel to script mapping

The following tables show how the above vowel sounds map to characters or sequences of characters. Both dependent vowel-signs (d) and independent vowels (i) are shown.

Plain vowels

iː
d: ी [U+0940 DEVANAGARI VOWEL SIGN II], eg. तीन.
i: ई [U+0908 DEVANAGARI LETTER II], eg. ईंट.

ɪ
d: ि [U+093F DEVANAGARI VOWEL SIGN I], eg. दिन.
i: इ [U+0907 DEVANAGARI LETTER I], eg. इन्सान.

ʊ
d: ु [U+0941 DEVANAGARI VOWEL SIGN U], eg. सुस्त.
i: उ [U+0909 DEVANAGARI LETTER U], eg. उड़ना.

uː
d: ू [U+0942 DEVANAGARI VOWEL SIGN UU], eg. फूल.
i: ऊ [U+090A DEVANAGARI LETTER UU], eg. ऊपर.

eː
d: े [U+0947 DEVANAGARI VOWEL SIGN E], eg. बेटा.
i: ए [U+090F DEVANAGARI LETTER E], eg. एक.

oː
d: ो [U+094B DEVANAGARI VOWEL SIGN O], eg. टोपी.
i: ओ [U+0913 DEVANAGARI LETTER O], eg. ओस.

ɛː
d: ै [U+0948 DEVANAGARI VOWEL SIGN AI]
May also replace inherent vowels alongside ɦ, per the description above.

◌̃
ँ [U+0901 DEVANAGARI SIGN CANDRABINDU], eg. दाँत.
ं [U+0902 DEVANAGARI SIGN ANUSVARA]  hiᵑͫdi hiⁿͫdi

Sources: Wikipedia, and Google Translate.

Vocalics

In Devanagari, vocalics are available both as vowel-signs and independent vowels.

Hindi generally uses just one vocalic.

ृ␣ऋ

Other vocalics are used for Sanskrit.

ॄ␣ॢ␣ॣ␣ॠ␣ऌ␣ॡ

Consonants

For a mapping of sounds to graphemes, see the section Consonant sounds to characters.

Basic consonants

Basic set of consonants, used for Hindi and Sanskrit. (Phonetic information for Hindi.)

Stops

प␣फ␣ब␣भ␣त␣थ␣द␣ध␣ट␣ठ␣ड␣ढ␣क␣ख␣ग␣घ

Affricates

च␣छ␣ज␣झ

Fricatives

व␣स␣श␣ष␣ह

Nasals

म␣न␣ञ␣ण␣ङ

Liquids

व␣र␣ल␣य

Other

Hindi also counts 3 character combinations as consonantal letters of the alphabet.

त्र␣ज्ञ␣क्ष

Repertoire extension

[U+093C DEVANAGARI SIGN NUKTA] is used to represent foreign sounds, eg. in the following example the dot changes ख kʰ to ख़ x: ख़ारीदारी

The following graphemes used in Hindi combine nukta with an existing consonant. These are all counted as letters of the Hindi alphabet. The 5th one is very rare.

क़␣फ़␣ज़␣झ़␣श़␣ख़␣ग़␣ड़␣ढ़

The nukta should always be typed and stored immediately after the consonant it modifies, and before any combining vowels or diacritics.
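A simple check for this ordering rule can be sketched as follows. It flags any nukta that does not immediately follow a consonant letter, for example one typed after a vowel-sign. The consonant range U+0915 to U+0939 and the function name `misplaced_nuktas` are assumptions made for illustration; the check does not attempt to repair the text.

```python
import re

NUKTA = "\u093C"
# Assumed consonant range for this sketch: KA..HA.
MISPLACED = re.compile(f"(?<![\u0915-\u0939]){NUKTA}")


def misplaced_nuktas(text: str) -> list[int]:
    """Return the indexes of nuktas that do not directly follow a consonant."""
    return [m.start() for m in MISPLACED.finditer(text)]


if __name__ == "__main__":
    good = "\u091C\u093C\u093E"  # JA + NUKTA + AA
    bad = "\u091C\u093E\u093C"   # JA + AA + NUKTA (nukta after the vowel-sign)
    print(misplaced_nuktas(good), misplaced_nuktas(bad))
```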

The Unicode block also contains the following precomposed code points for the sequences used in Hindi.

क़␣फ़␣ज़␣ख़␣ग़␣ड़␣ढ़

The Unicode Standard recommends against using the precomposed code points for Hindi, and in favour of the base+nukta sequences. See also the Nuktas section under Encoding choices for more information.

Syllable-final consonants

Although traditionally classified as vowels, 2 diacritics represent syllable-final consonant sounds.

ं␣ः

Nasal sounds m n ŋ that are homorganic with a following consonant are commonly written using [U+0902 DEVANAGARI SIGN ANUSVARA]. This mark is positioned over the previous consonant, eg. हिंदी

Most words that use the anusvara can also be written using the consonant itself, eg. हिन्दी

In some cases, however, the anusvara form is more common. For example, the first of the two following alternatives is much more common पंजाब *पञ्जाब
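The homorganic behaviour of the anusvara can be sketched as a lookup on the consonant that follows it. This is a rough illustration based only on the three sounds listed above (m, n, ŋ): the labial and velar sets are taken from the stop and nasal letters on this page, and treating every other consonant as producing n is a simplifying assumption, not a claim about Hindi phonology. The name `anusvara_sound` is hypothetical.

```python
LABIALS = set("\u092A\u092B\u092C\u092D\u092E")  # PA PHA BA BHA MA
VELARS = set("\u0915\u0916\u0917\u0918\u0919")   # KA KHA GA GHA NGA


def anusvara_sound(next_consonant: str) -> str:
    """Approximate the nasal that an anusvara represents before a consonant."""
    if next_consonant in LABIALS:
        return "m"
    if next_consonant in VELARS:
        return "ŋ"
    # Assumption: all remaining consonants are treated as taking n.
    return "n"


if __name__ == "__main__":
    # The consonant after the anusvara in हिंदी is DA.
    print(anusvara_sound("\u0926"))
```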

Some words, mostly Sanskrit loan words, may end with a voiceless h after a vowel which can be written using [U+0903 DEVANAGARI SIGN VISARGA], eg. पुनः दुःखी

See also the candrabindu diacritic, which nasalises a vowel.

Consonant clusters

When the shapes of constituent consonants in a cluster are changed or merged to indicate the lack of intervening vowels, this is referred to as a conjunct.

The absence of a vowel sound between two or more consonants can be visually indicated in one of the following ways.

  1. Create a conjunct. There are a number of possibilities here:
    1. Half-forms : Reduce the shape of all consonants in the cluster except the last to a 'half-form' by removing the vertical stroke.
    2. Stacking : Reduce a non-initial consonant in size and shape and position it below the first.
    3. Special ligation : Create a fusion of the two shapes, where one or other of the components may not be easily recognisable.
    4. The letter ra has its own idiosyncratic way of combining with other consonants, whether it precedes or follows them.
  2. Show a visible virama below the non-final consonants in the cluster.
  3. No indication, although there are usually generalised pronunciation rules that allow readers to spot these locations. Examples of these rules are given in the section about the inherent vowel.

See a table of combinations.

In all cases except the last, the underlying mechanism in terms of code points involves adding [U+094D DEVANAGARI SIGN VIRAMA] between the consonants in the cluster, eg. क्ष is produced by the sequence क + ् + ष [U+0915 DEVANAGARI LETTER KA + U+094D DEVANAGARI SIGN VIRAMA + U+0937 DEVANAGARI LETTER SSA].
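In code, building the underlying sequence for a cluster amounts to joining the consonants with the virama. The sketch below shows this with an illustrative helper named `cluster`; how the result is displayed (half-form, stack, ligature, or visible virama) remains up to the font.

```python
VIRAMA = "\u094D"


def cluster(*consonants: str) -> str:
    """Join consonant letters with a virama to form a cluster sequence."""
    return VIRAMA.join(consonants)


if __name__ == "__main__":
    ksha = cluster("\u0915", "\u0937")            # KA + VIRAMA + SSA
    tsva = cluster("\u0924", "\u0938", "\u0935")  # TA + VIRAMA + SA + VIRAMA + VA
    print(ksha, tsva)
```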

The font usually determines which visual method is used, although it is possible to influence this (see below).

Conjoined half-forms

A half-form is typically created by removing the vertical line in the consonant shape, where there is one. (The vertical line is associated with the inherent vowel, and around two-thirds of Devanagari consonant shapes contain one.) There is often some additional tweaking of glyphs in order to join the components neatly. The last consonant in the cluster retains its full shape.

तव→त्व कक→क्क तसव→त्स्व
Examples of conjuncts formed by using half-forms.

A small number of half-forms are only minimally different from side-by-side characters.

छग छ्ग
An example of a conjunct with a subtle difference between separate consonants with intervening vowel (left), and a conjunct cluster (right). The difference is highlighted on the left.

Vertical stacks

This is more common for Sanskrit, and few modern fonts reorder glyphs in this way, or do so for a limited number of combinations.

कक→क्क दब→द्ब हव→ह्व
Conjuncts formed by subjoining non-initial consonants.

Ligated conjuncts

Typically, only a small number of clusters are combined in a way that makes it difficult to spot the component parts. This is, however, the default for two particular clusters: क्ष k͓ʂ kṣ ज्ञ ɟ͓ɲ ɡj

कष→क्ष जञ→ज्ञ कत→क्त
Conjuncts formed by ligation.

Conjuncts with ra

When [U+0930 DEVANAGARI LETTER RA] follows another consonant, it is typically rendered as a small, diagonal line to the left, eg. क्र ग्र भ्र. After 6 consonants, however, it is rendered as an upside-down v shape below, ie. ट्र ठ्र ड्र ढ्र ड़्र छ्र. After [U+0924 DEVANAGARI LETTER TA] it produces त्र

कर→क्र टर→ट्र तर→त्र
Conjuncts formed by a following ra.

When ra precedes another consonant, it is rendered as a small hook above the vertical line in the cluster, eg. र्क r͓k र्ल r͓l. Where it precedes a cluster using half-forms, it is aligned with the vertical line of the trailing consonant, eg. र्स्प r͓s͓p. However, if there is a spacing vowel-sign with a vertical line to the right of the cluster, it aligns with that, eg. र्का r͓kā र्की r͓kī. (This illustrates how the basic units of the script are orthographic syllables.)

र्क र्ल र्स्प र्का
The horizontal position of the hook for conjuncts formed by a preceding ra follows the main vertical bar of the syllable.

Visible virama

The ability to form conjuncts depends on the richness of the font. Where a font is not able to produce a half-form or ligature, etc., it will leave a visible virama glyph below the initial consonant(s) to indicate the missing vowel sound, as illustrated in the figure below.

ङ्ख ङ्ख
A consonant cluster for which there exists a conjunct form in the Tiro Hindi font (left), but not in the Noto Serif Devanagari font (right). The latter indicates that this is a cluster by showing a visible virama.

Examples of clusters that the default font used for this page is unable to render as a conjunct form: स्विट्ज़रलैंड रीट्वीट

An important consequence of representing clusters in this way is that the syllable boundaries are different. For example, if we follow the cluster with a left-positioned vowel-sign, it will now appear after the virama, rather than before the cluster, eg. compare the position of the pre-base vowel-sign in the figure below. This change is also reflected in segmentation of the text for line-breaking, inter-character spacing, etc.

ङ्खि ङ्खि
Positioning of the pre-base vowel-sign in relation to the same consonant cluster where a conjunct forms (left) vs. where a visible virama appears (right).

A visible virama may also be used with a single consonant, to indicate that it is to be pronounced without the inherent vowel, eg. क् k

Consonant lengthening

Lengthened (geminated) consonants are indicated in the script using the same mechanisms as for clusters.

Most native consonants may be lengthened, but not bʱ, ɽ, ɽʱ, or ɦ. Geminate consonants are always medial and preceded by one of ə, ɪ, or ʊ (Wikipedia, Consonants).

Using ZWJ & ZWNJ

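ZWNJ It's possible to prevent the formation of conjuncts by adding U+200C ZERO WIDTH NON-JOINER (ZWNJ) after the virama, which leaves the virama visible, as in the visible-virama example in the section Pre-base vowel-sign.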

ZWJ To produce a half-form, rather than a ligated form, use U+200D ZERO WIDTH JOINER (ZWJ). For example, क्‍ष   →   क्ष

It can also be used to produce standalone half-forms (for educational text) such as घ्‍
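Because ZWJ and ZWNJ are invisible, it can help to build these sequences from explicit escapes. The following sketch shows the three cases described here; the helper names `visible_virama` and `half_form` are illustrative, and the actual rendering still depends on the font.

```python
VIRAMA = "\u094D"
ZWJ = "\u200D"   # ZERO WIDTH JOINER
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER


def visible_virama(first: str, second: str) -> str:
    """Request a visible virama instead of a conjunct."""
    return first + VIRAMA + ZWNJ + second


def half_form(first: str, second: str = "") -> str:
    """Request a half-form of `first`, optionally followed by `second`."""
    return first + VIRAMA + ZWJ + second


if __name__ == "__main__":
    print(visible_virama("\u0915", "\u0937"))  # KA with visible virama, then SSA
    print(half_form("\u0915", "\u0937"))       # half-form KA followed by SSA
    print(half_form("\u0918"))                 # standalone half-form of GHA
```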

Consonant sounds to characters

The following maps Hindi consonant sounds to common graphemes.

p
प [U+092A DEVANAGARI LETTER PA], eg. पानी.

pʰ
फ [U+092B DEVANAGARI LETTER PHA], eg. फल.

b
ब [U+092C DEVANAGARI LETTER BA], eg. बहुत.

bʱ
भ [U+092D DEVANAGARI LETTER BHA], eg. भारी.

t
त [U+0924 DEVANAGARI LETTER TA], eg. तीन.

tʰ
थ [U+0925 DEVANAGARI LETTER THA], eg. थूकना.

d
द [U+0926 DEVANAGARI LETTER DA], eg. दो.

dʱ
ध [U+0927 DEVANAGARI LETTER DHA], eg. धूल.

ʈ
ट [U+091F DEVANAGARI LETTER TTA], eg. टांग.

ʈʰ
ठ [U+0920 DEVANAGARI LETTER TTHA], eg. ठंडा.

ɖ
ड [U+0921 DEVANAGARI LETTER DDA], eg. अंडा.

k
क [U+0915 DEVANAGARI LETTER KA], eg. कुत्ता.

kʰ
ख [U+0916 DEVANAGARI LETTER KHA], eg. खाना.

ɡ
ग [U+0917 DEVANAGARI LETTER GA], eg. गर्दन.

ɡʱ
घ [U+0918 DEVANAGARI LETTER GHA], eg. घास.

q
क़ [U+0915 DEVANAGARI LETTER KA + U+093C DEVANAGARI SIGN NUKTA], eg. क़लम.
[U+0958 DEVANAGARI LETTER QA] (decomposes in NFC and doesn't recompose)

t͡ʃ
च [U+091A DEVANAGARI LETTER CA], eg. चार.

t͡ʃʰ
छ [U+091B DEVANAGARI LETTER CHA], eg. छोटा.

d͡ʒ
ज [U+091C DEVANAGARI LETTER JA], eg. जानवर.

d͡ʒʱ
झ [U+091D DEVANAGARI LETTER JHA], eg. झील.

f
फ़ [U+092B DEVANAGARI LETTER PHA + U+093C DEVANAGARI SIGN NUKTA], eg. सफ़ेद.
[U+095E DEVANAGARI LETTER FA] (decomposes in NFC and doesn't recompose)

v
व [U+0935 DEVANAGARI LETTER VA] as an allophone of ʋ, eg. व्रत.

s
स [U+0938 DEVANAGARI LETTER SA], eg. सूरज.

z
ज़ [U+091C DEVANAGARI LETTER JA + U+093C DEVANAGARI SIGN NUKTA], eg. नज़दीक.
[U+095B DEVANAGARI LETTER ZA] (decomposes in NFC and doesn't recompose)

ʃ
श [U+0936 DEVANAGARI LETTER SHA], eg. बारिश.

ʒ
झ़ [U+091D DEVANAGARI LETTER JHA + U+093C DEVANAGARI SIGN NUKTA]. Very rare, used in loanwords from Persian, Portuguese and English. Hindi speakers often use z or d͡ʒ instead.

x
ख़ [U+0916 DEVANAGARI LETTER KHA + U+093C DEVANAGARI SIGN NUKTA], eg. ख़ून.
[U+0959 DEVANAGARI LETTER KHHA] (decomposes in NFC and doesn't recompose)

ɣ
ग़ [U+0917 DEVANAGARI LETTER GA + U+093C DEVANAGARI SIGN NUKTA]
[U+095A DEVANAGARI LETTER GHHA] (decomposes in NFC and doesn't recompose)

h
ः [U+0903 DEVANAGARI SIGN VISARGA] (mostly Sanskrit words)

ɦ
ह [U+0939 DEVANAGARI LETTER HA], eg. हड्डी.

m
म [U+092E DEVANAGARI LETTER MA], eg. मछली.
ं [U+0902 DEVANAGARI SIGN ANUSVARA] when followed by a labial consonant.

n
न [U+0928 DEVANAGARI LETTER NA], eg. नाक.
ं [U+0902 DEVANAGARI SIGN ANUSVARA] when followed by an alveolar consonant, eg. ठंडा.

ɳ
ण [U+0923 DEVANAGARI LETTER NNA]. Brought into Hindi via Sanskrit loan words, but casually pronounced n, and ɽ̃ is a common allophone. Does not appear initially (Wikipedia, External borrowing).

ŋ
ङ [U+0919 DEVANAGARI LETTER NGA]
ं [U+0902 DEVANAGARI SIGN ANUSVARA] when followed by a velar consonant, eg. टांग

ʋ
व [U+0935 DEVANAGARI LETTER VA], eg. त्वचा, हवा.

w
व [U+0935 DEVANAGARI LETTER VA] as a variant of ʋ commonly occurring between a consonant and vowel, eg. पकवान.

ɾ
र [U+0930 DEVANAGARI LETTER RA], eg. रात rāt ɾɑːt̪ night.

ɽ
ड़ [U+0921 DEVANAGARI LETTER DDA + U+093C DEVANAGARI SIGN NUKTA], eg. बड़ा.
[U+095C DEVANAGARI LETTER DDDHA] (decomposes in NFC and doesn't recompose)

ɽʱ
ढ़ [U+0922 DEVANAGARI LETTER DDHA + U+093C DEVANAGARI SIGN NUKTA], eg. गाढ़ा.
[U+095D DEVANAGARI LETTER RHA] (decomposes in NFC and doesn't recompose)

l
ल [U+0932 DEVANAGARI LETTER LA], eg. लाल.

j
य [U+092F DEVANAGARI LETTER YA], eg. नया.

Sources: Wikipedia, and Google Translate.

Encoding choices

This section looks at alternative strategies for typing and storing vowel-signs and independent vowels used by Hindi, taking into consideration the effects of normalising the text using Unicode Normalisation Form D (NFD), and Normalisation Form C (NFC).

Vowel-signs

The single code points on the left should be used, and not the sequences on the right, because they are not made the same by normalisation. Therefore the content will be regarded as different, which will affect searching and other operations on the text.

Use: ो [U+094B DEVANAGARI VOWEL SIGN O]. Do not use: ा + े [U+093E DEVANAGARI VOWEL SIGN AA + U+0947 DEVANAGARI VOWEL SIGN E].
Use: ौ [U+094C DEVANAGARI VOWEL SIGN AU]. Do not use: ा + ै [U+093E DEVANAGARI VOWEL SIGN AA + U+0948 DEVANAGARI VOWEL SIGN AI].

Independent vowels

Again, the single code points on the left should be used, and not the sequences on the right, because they are not made the same by normalisation.

Use: आ [U+0906 DEVANAGARI LETTER AA]. Do not use: अ + ा [U+0905 DEVANAGARI LETTER A + U+093E DEVANAGARI VOWEL SIGN AA].
Use: ओ [U+0913 DEVANAGARI LETTER O]. Do not use: अ + ो [U+0905 DEVANAGARI LETTER A + U+094B DEVANAGARI VOWEL SIGN O].
Use: औ [U+0914 DEVANAGARI LETTER AU]. Do not use: अ + ौ [U+0905 DEVANAGARI LETTER A + U+094C DEVANAGARI VOWEL SIGN AU].
Use: ऐ [U+0910 DEVANAGARI LETTER AI]. Do not use: ए + े [U+090F DEVANAGARI LETTER E + U+0947 DEVANAGARI VOWEL SIGN E].
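Since normalisation does not unify these spellings, an application that wants consistent text has to map the discouraged sequences itself. The sketch below is one possible approach, using only the pairs listed in the two tables in this section; the `DISCOURAGED` table and the `repair` function are illustrative names, not part of any standard API.

```python
import unicodedata

# Discouraged sequence -> recommended single code point (from the tables above).
DISCOURAGED = {
    "\u093E\u0947": "\u094B",  # AA + E sign  -> O sign
    "\u093E\u0948": "\u094C",  # AA + AI sign -> AU sign
    "\u0905\u093E": "\u0906",  # A + AA sign  -> letter AA
    "\u0905\u094B": "\u0913",  # A + O sign   -> letter O
    "\u0905\u094C": "\u0914",  # A + AU sign  -> letter AU
    "\u090F\u0947": "\u0910",  # E + E sign   -> letter AI
}


def repair(text: str) -> str:
    """Replace each discouraged sequence with the recommended code point."""
    for sequence, single in DISCOURAGED.items():
        text = text.replace(sequence, single)
    return text


if __name__ == "__main__":
    typed = "\u0905\u093E"  # A + AA sign
    # NFC leaves the sequence alone, so it stays different from U+0906.
    print(unicodedata.normalize("NFC", typed) == "\u0906")
    print(repair(typed) == "\u0906")
```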

Nuktas

The way the Unicode Standard recommends typing and storing graphemes with nuktas is a little unusual for Devanagari. Here we look at alternative strategies for all uses of the nukta in the Devanagari block (usage recommendations for Hindi are given in the section Repertoire extension), and consider the effects of normalising the text using Unicode Normalisation Form D (NFD), and Normalisation Form C (NFC).

For the following alternatives (unusually) the decomposed form is recommended by the Unicode Standard. NFC does not recombine the parts into precomposed characters. Instead, normalisation produces decomposed forms for both approaches using NFC and NFD, so both approaches are canonically equivalent.

Precomposed Decomposed (recommended)
[U+0958 DEVANAGARI LETTER QA] क़ [U+0915 DEVANAGARI LETTER KA + U+093C DEVANAGARI SIGN NUKTA]
[U+095E DEVANAGARI LETTER FA] फ़ [U+092B DEVANAGARI LETTER PHA + U+093C DEVANAGARI SIGN NUKTA]
[U+095B DEVANAGARI LETTER ZA] ज़ [U+091C DEVANAGARI LETTER JA + U+093C DEVANAGARI SIGN NUKTA]
[U+0959 DEVANAGARI LETTER KHHA] ख़ [U+0916 DEVANAGARI LETTER KHA + U+093C DEVANAGARI SIGN NUKTA]
[U+095A DEVANAGARI LETTER GHHA] ग़ [U+0917 DEVANAGARI LETTER GA + U+093C DEVANAGARI SIGN NUKTA]
[U+095C DEVANAGARI LETTER DDDHA] ड़ [U+0921 DEVANAGARI LETTER DDA + U+093C DEVANAGARI SIGN NUKTA]
[U+095D DEVANAGARI LETTER RHA] ढ़ [U+0922 DEVANAGARI LETTER DDHA + U+093C DEVANAGARI SIGN NUKTA]
[U+095F DEVANAGARI LETTER YYA] य़ [U+092F DEVANAGARI LETTER YA + U+093C DEVANAGARI SIGN NUKTA]

The next batch of characters produces precomposed characters under NFC, and decomposed under NFD. Both approaches are therefore canonically equivalent, even though the behaviour is different. In this case, the Unicode Standard recommends using the precomposed form.

Precomposed (recommended) Decomposed
[U+0929 DEVANAGARI LETTER NNNA] ऩ [U+0928 DEVANAGARI LETTER NA + U+093C DEVANAGARI SIGN NUKTA]
[U+0931 DEVANAGARI LETTER RRA] ऱ [U+0930 DEVANAGARI LETTER RA + U+093C DEVANAGARI SIGN NUKTA] 
[U+0934 DEVANAGARI LETTER LLLA]  ऴ [U+0933 DEVANAGARI LETTER LLA + U+093C DEVANAGARI SIGN NUKTA]

The final grapheme exists only in decomposed form; there is no precomposed equivalent.

Precomposed Decomposed
n/a झ़ [U+091D DEVANAGARI LETTER JHA + U+093C DEVANAGARI SIGN NUKTA]
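The three behaviours described in this section can be observed with Python's standard `unicodedata.normalize`. The sketch below takes one example from each table; the helper names `nfc` and `nfd` are just shorthand.

```python
import unicodedata


def nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)


def nfd(text: str) -> str:
    return unicodedata.normalize("NFD", text)


QA_PRECOMPOSED = "\u0958"          # DEVANAGARI LETTER QA
QA_DECOMPOSED = "\u0915\u093C"     # KA + NUKTA
NNNA_PRECOMPOSED = "\u0929"        # DEVANAGARI LETTER NNNA
NNNA_DECOMPOSED = "\u0928\u093C"   # NA + NUKTA
JHA_NUKTA = "\u091D\u093C"         # JHA + NUKTA (no precomposed form)

if __name__ == "__main__":
    # First table: both NFC and NFD yield the decomposed sequence.
    print(nfc(QA_PRECOMPOSED) == QA_DECOMPOSED, nfd(QA_PRECOMPOSED) == QA_DECOMPOSED)
    # Second table: NFC composes, NFD decomposes.
    print(nfc(NNNA_DECOMPOSED) == NNNA_PRECOMPOSED, nfd(NNNA_PRECOMPOSED) == NNNA_DECOMPOSED)
    # Third table: the sequence is unchanged by either form.
    print(nfc(JHA_NUKTA) == JHA_NUKTA == nfd(JHA_NUKTA))
```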

In practice, it's hard to envisage content authors being aware of, let alone respecting, rules about whether they should use precomposed or decomposed forms. Keyboards or other input mechanisms, or applications that automatically normalise, can perhaps guide users to the recommended practice, but it's likely that Devanagari text will always contain a mixture of forms for these graphemes.

Symbol

[U+0950 DEVANAGARI OM] is a religious symbol used in both Hinduism and Buddhism. 

Numbers, dates, currency, etc

Devanagari has a set of digits, that can be referred to as 'hindi' numerals. They are used regularly.

०␣१␣२␣३␣४␣५␣६␣७␣८␣९
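Devanagari digits carry the Unicode decimal-digit property, so many standard libraries can parse them without a lookup table. As a small sketch, Python's built-in `int` and `unicodedata.digit` both accept them; the variable name `year` is just an example value.

```python
import unicodedata

year = "\u0968\u0966\u0968\u0967"  # Devanagari digits two, zero, two, one

if __name__ == "__main__":
    print(int(year))                                # parsed as a decimal number
    print([unicodedata.digit(ch) for ch in year])  # value of each digit
```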

The CLDR standard-decimal pattern is #,##,##0.###. The standard-percent pattern is #,##,##0%.

An interesting feature of large numbers written in India is that, to the left of the final group of three digits, commas separate groups of two digits rather than three (even when using European digits).

20,00,000

Two million, written with Indian comma separators.
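The grouping rule can be sketched in a few lines: keep the last three digits as one group, then split the remaining digits into groups of two from the right. The function names `indian_group` and `to_devanagari` are illustrative, and the sketch handles integers only; production code would normally rely on a CLDR-aware library instead.

```python
DEVANAGARI_DIGITS = "\u0966\u0967\u0968\u0969\u096A\u096B\u096C\u096D\u096E\u096F"
TO_DEVANAGARI = str.maketrans("0123456789", DEVANAGARI_DIGITS)


def indian_group(n: int) -> str:
    """Format an integer with Indian-style comma grouping (3, then 2, 2, ...)."""
    digits = str(abs(n))
    head, tail = digits[:-3], digits[-3:]
    groups = []
    while head:
        groups.insert(0, head[-2:])  # take two digits at a time from the right
        head = head[:-2]
    groups.append(tail)
    return ("-" if n < 0 else "") + ",".join(groups)


def to_devanagari(text: str) -> str:
    """Replace European digits with Devanagari digits."""
    return text.translate(TO_DEVANAGARI)


if __name__ == "__main__":
    print(indian_group(2000000))
    print(to_devanagari(indian_group(2000000)))
```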

Currency

The CLDR standard format for currency is ¤#,##,##0.00.

[U+20B9 INDIAN RUPEE SIGN] is the symbol introduced by the Government of India in 2010 as the official currency symbol for the Indian rupee (INR).

It is distinguished from [U+20A8 RUPEE SIGN], which is an older symbol not formally tied to any particular currency.

Text direction

Text is normally written horizontally, left to right.
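The default bidirectional classes of the characters involved can be looked up with Python's `unicodedata.bidirectional`. In this sketch, letters, spacing vowel-signs, digits, and the danda report `L` (left-to-right), while non-spacing marks report `NSM`; the helper name `bidi_classes` is illustrative.

```python
import unicodedata


def bidi_classes(text: str) -> list[str]:
    """Return the default Bidi_Class of each character in the text."""
    return [unicodedata.bidirectional(ch) for ch in text]


if __name__ == "__main__":
    print(bidi_classes("\u0939\u0948\u0902"))        # HA, AI sign, anusvara
    print(bidi_classes("\u0915\u0940\u0967\u0964"))  # KA, II sign, digit one, danda
```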

Glyph shaping & positioning

This section brings together information about the following topics: writing styles; cursive text; context-based shaping; context-based positioning; baselines, line height, etc.; font styles; case & other character transforms.

Devanagari text is not cursive (ie. joined up like Arabic); however, there is a significant amount of interaction between glyphs around consonant clusters.

The orthography has no case distinction, and no special transforms are needed to convert between characters.

Context-based shaping

The shape of a character when displayed can vary, often dramatically, according to the context.

One very common example in most indic scripts is the handling of 'conjunct consonants', ie. groups of consonants with no intervening vowel sounds. Since consonants in indic scripts have an inherent vowel sound, when two consonants are combined this way you have to indicate that the vowel of the initial consonant is suppressed. This is normally done by altering the shape of the first consonant, or merging the shape of the two consonants.

To tell the font to do this, in Unicode you add ् [U+094D DEVANAGARI SIGN VIRAMA] between the two consonants. This produces the change in the shapes of the glyphs that indicates to the reader that this is a conjunct. The actual outcome is font dependent. For the word below, which contains a conjunct of two [U+0932 DEVANAGARI LETTER LA] characters (making a long L sound), you may see a 'half-form' used for the first LA (shown on the left) or you may see a ligated form (shown on the right).

दिल्ली दिल्‍ली
Alternative representations of a geminated l consonant.
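The underlying code point sequence can be examined with a short Python sketch. The plain sequence leaves the choice of shape to the font. The second sequence adds U+200D ZERO WIDTH JOINER after the virama, which is the Unicode convention for requesting a half-form; whether a given font honours it is not guaranteed. The variable and function names here are hypothetical.

```python
import unicodedata

VIRAMA = "\u094D"
ZWJ = "\u200D"
LA = "\u0932"

# DA + VOWEL SIGN I + LA + VIRAMA + LA + VOWEL SIGN II
font_decides = "\u0926\u093F" + LA + VIRAMA + LA + "\u0940"
# Same word, with ZWJ after the virama to request a half-form.
half_form_requested = "\u0926\u093F" + LA + VIRAMA + ZWJ + LA + "\u0940"


def describe(text: str) -> list[str]:
    """List the Unicode character names that make up the text."""
    return [unicodedata.name(ch) for ch in text]


for name in describe(font_decides):
    print(name)
```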

There are other types of context-based shaping, which are font specific. One is shown below. The width of the glyph for  ि [U+093F DEVANAGARI VOWEL SIGN I​] differs according to the base character to which it is attached.

हालाँकि प्रचलित
Context-sensitive shaping of the glyph for i.

Multiple combining characters

Diacritics regularly combine with a vowel-sign attached to the same consonant or consonant cluster. The example below shows two combining characters that are positioned above the base character in a very common form of the verb 'to be'. One is [U+0948 DEVANAGARI VOWEL SIGN AI​], and the other the nasalisation mark [U+0902 DEVANAGARI SIGN ANUSVARA​].

हैं
Multiple combining characters over one base character.
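In memory, the word above is one base letter followed by two combining marks. The following Python sketch separates them using the Unicode general category; the helper name split_base_and_marks is an assumption made for illustration.

```python
import unicodedata

# HA + VOWEL SIGN AI + SIGN ANUSVARA
word = "\u0939\u0948\u0902"


def split_base_and_marks(text: str) -> tuple[str, str]:
    """Split text into its non-mark characters and its combining marks."""
    base = "".join(ch for ch in text if not unicodedata.category(ch).startswith("M"))
    marks = "".join(ch for ch in text if unicodedata.category(ch).startswith("M"))
    return base, marks


base, marks = split_base_and_marks(word)
print(len(word), "code points:", [unicodedata.name(ch) for ch in word])
print("marks:", [unicodedata.category(ch) for ch in marks])
```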

Context-based positioning

Combining characters need to be placed in different positions, according to the context.

The example on the left below displays the dot (anusvara) immediately over the long vertical stroke. The example on the right has moved the dot slightly to the right in order to accommodate the vowel sign.

अंधे में
Context-sensitive placement of the anusvara diacritic.

In the following example, the image on the left shows the normal position of  ू [U+0942 DEVANAGARI VOWEL SIGN UU], beneath the first letter. The image on the right shows that character displayed higher up and to the right when combined with the base character [U+0930 DEVANAGARI LETTER RA].

पूजा परू
Context-dependent placement of the glyph for the vowel sign uu when combined with ra.

Font styles

tbd

Punctuation & inline features

Graphemes

A number of text operations work on the basis of graphemes, rather than code points, such as line-breaking, cursor movement, backspacing, vertical setting, justification, etc. Usually the minimal unit correlates with the Unicode concept of grapheme clusters, but not always.

Conjuncts

Conjuncts and any dependent combining characters should never be split.

This creates a problem when dealing with Unicode grapheme clusters, because they stop after reaching a virama. So conjuncts usually contain multiple grapheme clusters. This produces incorrect segmentation, as seen on the left in the figure below. Applications need to tailor the grapheme cluster rules to avoid splitting conjuncts.

हि‍ ‍न्‍ ‍दी हि‍ ‍न्दी
Segmentation of the word हिन्दी़ hin͓dị̄: using grapheme clusters (left), and how it should be (right).

Unfortunately, this is harder than it seems, because whether a conjunct is formed or not usually depends on the capabilities of the font – it cannot be determined solely by looking at the code points in memory. If a font doesn't contain the glyphs to create a conjunct it will render the consonant cluster with a visible virama. In that case, the grapheme cluster approach is appropriate.
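A minimal Python sketch of the two segmentations is shown below. It is a deliberately simplified approximation, not an implementation of the Unicode grapheme cluster rules: it treats any combining mark (and ZWJ/ZWNJ) as extending the previous unit, and then merges a unit ending in a virama with a following consonant. It works only on code points, so it cannot take account of what the font actually renders. All function names are hypothetical.

```python
import unicodedata

VIRAMA = "\u094D"
ZWJ = "\u200D"
ZWNJ = "\u200C"


def is_consonant(ch: str) -> bool:
    """Rough test for a Devanagari consonant (KA..HA, plus QA..YYA)."""
    return "\u0915" <= ch <= "\u0939" or "\u0958" <= ch <= "\u095F"


def simple_clusters(text: str) -> list[str]:
    """Base character plus following marks; stops after a virama."""
    clusters: list[str] = []
    for ch in text:
        extends = unicodedata.category(ch).startswith("M") or ch in (ZWJ, ZWNJ)
        if clusters and extends:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def conjunct_clusters(text: str) -> list[str]:
    """Tailored segmentation that keeps virama + consonant together."""
    merged: list[str] = []
    for cluster in simple_clusters(text):
        previous_ends_in_virama = bool(merged) and merged[-1].rstrip(ZWJ).endswith(VIRAMA)
        if previous_ends_in_virama and is_consonant(cluster[0]):
            merged[-1] += cluster
        else:
            merged.append(cluster)
    return merged


word = "\u0939\u093F\u0928\u094D\u0926\u0940"  # the word for 'Hindi'
print(simple_clusters(word))    # three units: the conjunct is split
print(conjunct_clusters(word))  # two units: the conjunct is kept whole
```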

Observation: It may be that justification and letter-spacing place spacing between vowel-sign glyphs and the glyph of their base characters in some circumstances. This kind of unit requires manipulation of the glyphs in the text, rather than the code points.

Word boundaries

Words are separated by spaces.

Devanagari has hyphenated words – mainly conjoined nouns,i eg. लाभ-हानि माता-पिता

Phrase & section boundaries

!␣,␣:␣;␣.␣।␣॥␣?␣—
phrase
, [U+002C COMMA]
; [U+003B SEMICOLON]
: [U+003A COLON]

sentence
। [U+0964 DEVANAGARI DANDA]
. [U+002E FULL STOP]
? [U+003F QUESTION MARK]
! [U+0021 EXCLAMATION MARK]

section
॥ [U+0965 DEVANAGARI DOUBLE DANDA]

Devanagari uses standard Latin punctuation, but also has its own version of a full stop, [U+0964 DEVANAGARI DANDA]. Most style guides recommend no space before this punctuation mark, and a space after it.

Observation: A number of Hindi style guides consulted require that the danda follow the last letter in the sentence, with no intervening space.

चाहिए।

Devanagari danda, used like a Western full stop.
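The spacing recommendation for the danda could be applied to existing text with something like the following Python sketch. The function name normalise_danda_spacing and the exact set of characters allowed to follow the danda without a space (closing quotes and a closing parenthesis) are assumptions; the double danda is deliberately left alone, since it may be set off by spaces on both sides.

```python
import re

DANDA = "\u0964"


def normalise_danda_spacing(text: str) -> str:
    """Remove space before a danda and ensure a space after it."""
    # Drop spaces, tabs or no-break spaces that precede the danda.
    text = re.sub(r"[ \t\u00A0]+" + DANDA, DANDA, text)
    # Add a space after the danda unless whitespace, a closing quote,
    # a closing parenthesis or the end of the text follows.
    text = re.sub(DANDA + r"(?=[^\s\u201D\u2019)])", DANDA + " ", text)
    return text


print(normalise_danda_spacing("\u091A\u093E\u0939\u093F\u090F \u0964"))
```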

For boundaries of text above the sentence level there is [U+0965 DEVANAGARI DOUBLE DANDA].

The dandas are commonly used in Sanskrit and Prakrit poetry, and the double danda may appear both before and after benedictory headings, rest-stops, etc., eg. ॥ श्रीसीताराम ॥

Both of these code points are also used in a number of other indic scripts.

Parentheses & brackets

(␣)
standard
start: ( [U+0028 LEFT PARENTHESIS]
end: ) [U+0029 RIGHT PARENTHESIS]

Quotations

‘␣’␣“␣”
top level
start: “ [U+201C LEFT DOUBLE QUOTATION MARK]
end: ” [U+201D RIGHT DOUBLE QUOTATION MARK]

nested
start: ‘ [U+2018 LEFT SINGLE QUOTATION MARK]
end: ’ [U+2019 RIGHT SINGLE QUOTATION MARK]

The default quote marks for Hindi are [U+201C LEFT DOUBLE QUOTATION MARK] at the start, and [U+201D RIGHT DOUBLE QUOTATION MARK] at the end.cldr

When an additional quote is embedded within the first, the quote marks are [U+2018 LEFT SINGLE QUOTATION MARK] and [U+2019 RIGHT SINGLE QUOTATION MARK].cldr
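The two levels of quotation marks can be applied programmatically, as in this small Python sketch. The function name quote is hypothetical, and alternating the two pairs for any nesting deeper than the second level is an assumption; only the first two levels are described above.

```python
QUOTE_PAIRS = [
    ("\u201C", "\u201D"),  # top level: double quotation marks
    ("\u2018", "\u2019"),  # nested: single quotation marks
]


def quote(text: str, level: int = 0) -> str:
    """Wrap text in the quotation marks for the given nesting level."""
    # Assumption: deeper levels simply alternate between the two pairs.
    start, end = QUOTE_PAIRS[level % len(QUOTE_PAIRS)]
    return f"{start}{text}{end}"


inner = quote("\u0939\u093E\u0901", level=1)
print(quote("\u0909\u0938\u0928\u0947 \u0915\u0939\u093E " + inner))
```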

Emphasis

Italicisation and bolding are not traditionally used for highlighting text in Devanagari.

Abbreviation, ellipsis & repetition

[U+0970 DEVANAGARI ABBREVIATION SIGN] is used to indicate abbreviations of words, eg. रुपया rupyā (rupayā) 'rupee' can be abbreviated as रु॰

Inline notes & annotations

tbd

Other inline ranges

tbd

Other punctuation

tbd

Line & paragraph layout

Line breaking & hyphenation

Devanagari is normally wrapped at word boundaries.


Hyphenation

Devanagari text can be hyphenated during line wrap, though it is not very common (unlike several south Indian scripts). This is partly because Hindi contains mostly short words.st

Hyphenation is much more common when writing Sanskrit in the Devanagari script, because words tend to be much longer.

Hyphenation adds a hyphen at the end of the line when a word is broken.

Line-edge rules

According to ilreq, a line should not start with any of the following characters.

,␣.␣:␣;␣।␣॥␣)␣]␣}␣>␣+␣*␣/␣=␣_␣|␣~␣%

Line breaking should also not move a danda or double danda to the beginning of a new line, even if they are preceded by a space character. These punctuation characters should behave in the same way as a full stop does in English text.

Presumably, similar rules apply for the end of a line.
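The line-start restriction listed above can be expressed as a simple check on candidate break positions. The Python sketch below only considers breaks at spaces and only the line-start list; it is not a full line-breaking algorithm, and the function names are hypothetical.

```python
# Characters that should not begin a line, taken from the list above.
NO_LINE_START = set(",.:;\u0964\u0965)]}>+*/=_|~%")


def break_allowed_before(text: str, index: int) -> bool:
    """Return True if a new line may start at text[index]."""
    if index <= 0 or index >= len(text):
        return False
    return text[index] not in NO_LINE_START


def break_opportunities(text: str) -> list[int]:
    """Indexes following a space at which a new line may start."""
    return [
        i + 1
        for i, ch in enumerate(text)
        if ch == " " and break_allowed_before(text, i + 1) and text[i + 1] != " "
    ]


# A danda preceded by a space must stay on the same line as the word before it.
sample = "\u0939\u0948 \u0964 \u0909\u0928"
print(break_opportunities(sample))
```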

Text alignment & justification

Justification is done, principally, by adjusting the space between words. (I have no information about whether high-end systems also adjust inter-character spacing slightly if inter-word doesn't resolve the issue, or to improve aesthetics.)

Letter spacing

tbd

Counters, lists, etc.

You can experiment with counter styles using the Counter styles converter. Patterns for using these styles in CSS can be found in Ready-made Counter Styles, and we use the names of those patterns here to refer to the various styles.

The modern Hindi orthography uses numeric and alphabetic styles.

Numeric

The devanagari numeric style is decimal-based and uses these digits.rmcs

०␣१␣२␣३␣४␣५␣६␣७␣८␣९

Examples:

१␣२␣३␣४␣११␣२२␣३३␣४४␣१११␣२२२␣३३३␣४४४
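Because the style is purely decimal, a counter value can be produced by mapping each ASCII digit to the corresponding Devanagari digit. A minimal Python sketch, with a hypothetical function name, is:

```python
# Map ASCII digits 0-9 to DEVANAGARI DIGIT ZERO..NINE (U+0966..U+096F).
TO_DEVANAGARI = {ord(str(d)): chr(0x0966 + d) for d in range(10)}


def devanagari_counter(n: int) -> str:
    """Render a non-negative integer using Devanagari digits."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    return str(n).translate(TO_DEVANAGARI)


print(" ".join(devanagari_counter(n) for n in (1, 2, 3, 4, 11, 22, 111, 444)))
```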

Alphabetic

The hindi alphabetic style for the Hindi language uses these letters.

क␣ख␣ग␣घ␣ङ␣च␣छ␣ज␣झ␣ञ␣ट␣ठ␣ड␣ढ␣ण␣त␣थ␣द␣ध␣न␣प␣फ␣ब␣भ␣म␣य␣र␣ल␣व␣श␣ष␣स␣ह

Examples:

क␣ख␣ग␣घ␣ट␣फ␣ह␣कट␣गठ␣चभ␣ञग␣डण
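The examples above correspond to the counter values 1, 2, 3, 4, 11, 22, 33, 44, 111, 222, 333 and 444 under the usual alphabetic counter algorithm (a bijective numbering over the 33 letters, as used by CSS alphabetic counter styles). The Python sketch below is an illustration under that assumption; the function name hindi_alphabetic is hypothetical.

```python
HINDI_LETTERS = "कखगघङचछजझञटठडढणतथदधनपफबभमयरलवशषसह"


def hindi_alphabetic(n: int) -> str:
    """Render a positive integer in the hindi alphabetic counter style."""
    if n < 1:
        raise ValueError("alphabetic counters are only defined for n >= 1")
    base = len(HINDI_LETTERS)
    result = ""
    while n > 0:
        n -= 1  # shift to zero-based so the last letter maps correctly
        result = HINDI_LETTERS[n % base] + result
        n //= base
    return result


print(" ".join(hindi_alphabetic(n) for n in (1, 2, 3, 4, 11, 22, 33, 44, 111)))
```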

Prefixes and suffixes

The most common approach to writing lists in Hindi uses a full stop as a suffix.

Examples:

१. २. ३. ४. ५.
Separator for Hindi list counters: full stop + space.

Styling initials

Devanagari content does sometimes enlarge the first part of the first word in a paragraph, in a similar way to drop caps. Instead of enlarging just the first letter in the word, it is normal to enlarge the first syllable. If the first character is the beginning of a conjunct, the whole conjunct should be included in the styling.

Enlarged syllable styling at the start of a paragraph.

In theory, the top line of the enlarged characters should align with the top line of the first line of text that follows. However, it is easy to find examples where this is not the case.

It is very common to see such initial-syllable enlargement centred inside a coloured box.

Enlarged syllable styled inside a coloured box.

In the boxed style, the box itself is usually aligned with the top of the first line of text and the bottom of the last, and the highlighted character(s) are centred horizontally and vertically in the box.

In both styles shown above, any punctuation such as opening quotes and opening parentheses should also be included in the initial styling.
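The span of text to enlarge could be selected along the following lines. This Python sketch takes any opening punctuation, then the first consonant or vowel, then any conjunct continuation and combining marks. It is a simplification that works on code points only, so it cannot know whether the font actually forms the conjunct, and the function names are hypothetical.

```python
import unicodedata

VIRAMA = "\u094D"


def is_consonant(ch: str) -> bool:
    """Rough test for a Devanagari consonant (KA..HA, plus QA..YYA)."""
    return "\u0915" <= ch <= "\u0939" or "\u0958" <= ch <= "\u095F"


def initial_span(text: str) -> str:
    """Return the leading punctuation plus first syllable to be styled."""
    i = 0
    # Include opening quotes (Pi) and opening brackets (Ps).
    while i < len(text) and unicodedata.category(text[i]) in ("Ps", "Pi"):
        i += 1
    if i >= len(text):
        return text
    i += 1  # the first base character
    while i < len(text):
        ch = text[i]
        if unicodedata.category(ch).startswith("M"):
            i += 1  # vowel signs, virama, anusvara, etc.
        elif text[i - 1] == VIRAMA and is_consonant(ch):
            i += 1  # next consonant of the same conjunct
        else:
            break
    return text[:i]


print(initial_span("\u0938\u094D\u0935\u0924\u0928\u094D\u0924\u094D\u0930\u0924\u093E"))
```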

According to ilreq, Hindi generally doesn't use the raised or sunken styles, and contour-filling is not needed for Indian text.i

Page & book layout

This section is for any features that are specific to the Devanagari script and that relate to the following topics: general page layout & progression; grids & tables; notes, footnotes, etc; forms & user interaction; page numbering, running headers, etc.

Languages using the Devanagari script

According to ScriptSource, the Devanagari script is used for the following languages:

References