Gujarati

Updated 29 November, 2021

This page gathers basic information about the Gujarati script and its use for the Gujarati language. It aims (generally) to provide an overview of the orthography and typographic features, and (specifically) to advise how to write Gujarati using Unicode.

Phonetic transcriptions on this page should be treated as an approximate guide, only. Many are more phonemic than phonetic, and there may be variations depending on the source of the transcription.


Sample (Gujarati)


અનુચ્છેદ ૧: પ્રતિષ્ઠા અને અધિકારોની દૃષ્ટિએ સર્વ માનવો જન્મથી સ્વતંત્ર અને સમાન હોય છે. તેમનામાં વિચારશક્તિ અને અંતઃકરણ હોય છે અને તેમણે પરસ્પર બંધુત્વની ભાવનાથી વર્તવું જોઇએ.

અનુચ્છેદ ૨: દરેક વ્યક્તિને જાતિ, રંગ, લિંગ, ભાષા, ધર્મે, રાજકીય અથવા બીજા અભિપ્રાય, રાષ્ટ્રીય અથવા સામાજિક ઉદ્ભવસ્થાન, મિલકત, જન્મ અથવા મોભા જેવા કોઇપણ જાતના ભેદભાવ વગર આ ધોષણામાં રજૂ કરવામાં આવેલા સધળા અધિકારો અને સ્વતંત્રતા ભોગવવાનો હક્ક છે. વધુમાં કોઇપણ વ્યક્તિ તે સ્વતંત્ર, ટ્રસ્ટ હેઠળના સ્વશાસન હેઠળ ન હોય તેવા અથવા સાર્વભામત્વની બીજી કોઇપણ મર્યાદા હેઠળ આવેલા દેશ અથવા પ્રદેશની હોય તો પણ રાજકીય, હફમવવિષયક અથવા આંતરરાષ્ટ્રીય મોભાના ધોરણે તેની સાથે કોઇપણ ભેદભાવ રાખવામાં આવશે નહિ.

Usage & history

The Gujarati script is used for writing the Gujarati and Chodri languages, together spoken by almost 47 million people, and is used alongside Devanagari for languages of the Bhil people, one of India's largest indigenous groups. Until the mid-19th century it was used primarily for bookkeeping and personal correspondence, but since printing facilities became widely available to Gujarati speakers the script has been used in schools, for printing books and newspapers, in government offices and on public signage. It is one of the official scripts of India.

ગુજરાતી લિપિ gujǎrātī lipi Gujarati script

The Gujarati script was adapted from the Devanagari script to write the Gujarati language from the 10th century onwards. Since then it has gone through 3 distinct phases. The third phase, begun in the 17th century, saw the abandonment of the shiroreka (topline), part of an adaptation to enable ease and speed of writing. The Devanagari script continued to be used for literature and academic writings until the modern widespread use of the Gujarati script developed.

Sources: Scriptsource, Wikipedia

Basic features

The script is an abugida. Consonants carry an inherent vowel which can be modified by appending vowel-signs to the consonant. See the table to the right for a brief overview of features of the modern Gujarati orthography.

Gujarati text runs left to right in horizontal lines.

Words are separated by spaces.

Gujarati uses 34 consonant letters. The repertoire can be extended by applying the nukta diacritic to characters, or with additional characters, but these are used for Arabic and Avestan transliterations, rather than current Gujarati text. Gujarati doesn't use a shiroreka (top line) like its close relative Devanagari.

The Gujarati orthography has an inherent vowel, and represents other vowels using 11 vowel-signs, including 1 prescript and no circumgraphs. All vowel-signs are combining marks, and are stored after the base character.

There are 12 independent vowels, one for each vowel sound, including the inherent vowel, and these are used to write all standalone vowel sounds. Some vowel-signs can be visually analysed into subcomponents, but Unicode usage involves only one combining character per consonant.

There are no composite vowels.

Vowels may be nasalised, using the anusvara diacritic.

Consonant clusters at any location are normally indicated using the virama between consonants. This results in a large number of conjunct forms expressed using half-forms, stacked consonants, and ligated glyphs. Occasionally, a visible virama is used.

The absence of the inherent vowel is not always indicated. It is generally not pronounced at the end of a word, and it is sometimes elided within a word without any indication. In addition to forming conjuncts with the virama, participating consonants may simply be moved together so that they touch. Gujarati is also unusual in that clusters sometimes use Devanagari glyphs for one or more Gujarati characters.

Final consonant sounds may be represented by 2 dedicated combining marks (anusvara & visarga), but are generally ordinary consonants that are not marked by a virama.

Two vocalics are used in current text. Others were used historically.

Gujarati has a set of native digits.

Character index

Letters


Basic consonants

પ␣બ␣ભ␣ત␣થ␣દ␣ધ␣ટ␣ઠ␣ડ␣ઢ␣ક␣ખ␣ગ␣ઘ␣ચ␣છ␣જ␣ઝ␣ફ␣સ␣શ␣ષ␣હ␣મ␣ન␣ઞ␣ણ␣ઙ␣વ␣ર␣લ␣ળ␣ય

Vowels

ઇ␣ઈ␣ઊ␣ઉ␣એ␣ઓ␣અ␣ઑ␣ઍ␣આ␣ઔ␣ઐ

Vocalics

ઋ␣ૠ

Other

ઽ␣ૐ

Combining marks


Vowels

િ␣ી␣ુ␣ૂ␣ે␣ો␣ૉ␣ૅ␣ા␣ૈ␣ૌ

Vocalics

ૃ␣ૄ

Other

ં␣ઃ␣઼␣્

Numbers

૦␣૧␣૨␣૩␣૪␣૫␣૬␣૭␣૮␣૯

Punctuation

‘␣’␣“␣”␣(␣)␣,␣.␣:␣;␣?␣!
૰␣।␣॥

CLDR additions

§␣‐␣–␣—␣†␣‡␣…␣′␣″

Symbols


Other

‍␣‌

Phonology

These are sounds of the Gujarati language.


Phones in a lighter colour are non-native or allophones. Source Wikipedia.

Vowel sounds

Plain vowels

i u e o ə ɛ ɔ æ ɑ

Diphthongs

əʋ əj

Consonant sounds

stop: labial p b; dental t d; retroflex ʈ ɖ ʈʰ ɖʰ; velar k ɡ ɡʰ
affricate: post-alveolar t͡ʃ d͡ʒ t͡ʃʰ d͡ʒʰ
fricative: labial f; alveolar s z; post-alveolar ʃ; glottal ɦ
nasal: labial m; alveolar n; retroflex ɳ
approximant: labial ʋ; alveolar l; retroflex ɭ̆; palatal j
trill/flap: alveolar ɾ

Vowels

Inherent vowel

ə following a consonant is not written, but is seen as an inherent part of the consonant letter, so ka is written by simply using the consonant letter ક [U+0A95 GUJARATI LETTER KA]. The sound is transcribed as a.

Vowel-signs

Non-inherent vowel sounds that follow a consonant are represented using vowel-signs, eg. ki is written કી [U+0A95 GUJARATI LETTER KA + U+0AC0 GUJARATI VOWEL SIGN II].

The following combining marks are used to indicate vowel sounds.

િ␣ી␣ુ␣ૂ␣ે␣ો␣ૉ␣ૅ␣ા␣ૈ␣ૌ

Gujarati vowel-signs are all combining characters. All vowel-signs are stored after the base consonant, and the font puts them in the correct place for display.

About half of the vowel-signs are spacing marks, meaning that they consume horizontal space when added to a base consonant.
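As a rough illustration of the spacing/non-spacing split, the sketch below groups the 11 vowel-signs by their Unicode General_Category, assuming that Mc (spacing combining mark) and Mn (non-spacing mark) are a fair proxy for "consumes horizontal space". The helper name `split_by_spacing` is made up for this example.

```python
import unicodedata

# The 11 vowel-signs listed above, written as escapes so the marks stay visible.
VOWEL_SIGNS = [
    "\u0ABF", "\u0AC0", "\u0AC1", "\u0AC2", "\u0AC7", "\u0ACB",
    "\u0AC9", "\u0AC5", "\u0ABE", "\u0AC8", "\u0ACC",
]

def split_by_spacing(marks):
    """Return (spacing, nonspacing) lists based on General_Category (Mc vs Mn)."""
    spacing = [m for m in marks if unicodedata.category(m) == "Mc"]
    nonspacing = [m for m in marks if unicodedata.category(m) == "Mn"]
    return spacing, nonspacing

spacing, nonspacing = split_by_spacing(VOWEL_SIGNS)
for label, group in (("spacing (Mc)", spacing), ("non-spacing (Mn)", nonspacing)):
    print(label)
    for mark in group:
        print(f"  U+{ord(mark):04X} {unicodedata.name(mark)}")
```

The actual width of a mark is decided by the font, so this only reflects the character properties.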

There are separate vowel-signs for the historical short and long variants, but length is no longer distinctive in modern pronunciation. It is only found in metrical structures of verse.

[U+0AC5 GUJARATI VOWEL SIGN CANDRA E] and [U+0AC9 GUJARATI VOWEL SIGN CANDRA O] are used to represent the English æ and ɔ sounds, respectively.

See also vocalics.

Pre-base vowel-sign

One vowel-sign appears to the left of the base consonant letter or cluster, eg. કિ.

િ

This is a combining mark that is always stored after the base consonant. The font places the glyph before the base consonant.
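A minimal sketch of what this storage order means in practice: the snippet below lists the code points of the syllable in memory order, using only Python's standard `unicodedata` module. The variable name `syllable` is just for illustration.

```python
import unicodedata

# KA followed by VOWEL SIGN I; a font displays the vowel-sign to the left of KA.
syllable = "\u0A95\u0ABF"

for position, char in enumerate(syllable):
    print(position, f"U+{ord(char):04X}", unicodedata.name(char))

# The consonant comes first in memory even though the vowel-sign is drawn before it.
```

Code that walks through text (search, sorting, cursor movement) therefore sees the consonant before the vowel-sign, regardless of the visual order.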

Nasalisation

[U+0A82 GUJARATI SIGN ANUSVARA] nasalises the vowel in a syllable, eg. મેં me˜ I

The anusvara may also represent a nasal before a plosive.

Vowel-sign placement

The following list shows where vowel-signs are positioned around a base consonant to produce vowels, and how many instances of that pattern there are.

Vowel absence

The inherent vowel is not always pronounced, even if there is no visual indication of its absence.

For example, the inherent vowel is typically not pronounced at the end of a word, eg. ઘર. The inherent vowel is still dropped when the root word is followed by suffixes or in compounds, eg. ઘરપર, ઘરકામ.

The inherent vowel may also be elided when combining morphemes, eg. the root પકડ઼ is written the same way when inflected as પકડ઼ે, even though the pronunciation is pakṛe.

In other cases, the inherent vowel is simply missing, eg. વરસાદ is pronounced varsaːd.

Gujarati can also use [U+0ACD GUJARATI SIGN VIRAMA] (called halant in Gujarati) to explicitly kill the inherent vowel after a consonant. The virama is rarely seen. Apart from the situations just mentioned, the virama is also usually hidden when part of a consonant cluster (see clusters).

The virama is visible, however, if it isn't followed by a consonant, eg. the following explicitly represents just the sound k. ક્

Standalone vowels

Gujarati represents standalone vowels using a set of independent vowel letters. The set includes a character to represent the inherent vowel sound.

ઇ␣ઈ␣ઊ␣ઉ␣એ␣ઓ␣અ␣ઑ␣ઍ␣આ␣ઔ␣ઐ

Encoding choices

Visually, several of the standalone vowels and some vowel-signs look as if they could be composed of smaller parts. This section compares approaches and considers the relevance of Unicode Normalisation Form D (NFD) and Unicode Normalisation Form C (NFC) to give guidance on which approach is best.

Vowel-signs

The approaches listed here are not equivalent when the text is normalised, and therefore produce different content which creates problems for search or other operations. In all cases, only the precomposed approach in the left column should be used.

Use [U+0ACB GUJARATI VOWEL SIGN O]. Do not use [U+0ABE GUJARATI VOWEL SIGN AA + U+0AC7 GUJARATI VOWEL SIGN E].
Use [U+0AC9 GUJARATI VOWEL SIGN CANDRA O]. Do not use [U+0ABE GUJARATI VOWEL SIGN AA + U+0AC5 GUJARATI VOWEL SIGN CANDRA E].
Use [U+0ACC GUJARATI VOWEL SIGN AU]. Do not use [U+0ABE GUJARATI VOWEL SIGN AA + U+0AC8 GUJARATI VOWEL SIGN AI].
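The non-equivalence can be checked with a short sketch using Python's standard `unicodedata.normalize`. It compares the single-character and two-part spellings of ko under each normalisation form; the helper `same_after` is a name invented for this example.

```python
import unicodedata

KA = "\u0A95"
precomposed = KA + "\u0ACB"      # KA + VOWEL SIGN O
two_part = KA + "\u0ABE\u0AC7"   # KA + VOWEL SIGN AA + VOWEL SIGN E

def same_after(form, a, b):
    """True if the two strings become identical under the given normalisation form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form, same_after(form, precomposed, two_part))
```

Because none of the forms unify the two spellings, a plain string comparison or search treats them as different text even after normalisation.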

Independent vowels

The approaches listed here are also not equivalent when the text is normalised, and therefore only the precomposed approach in the left column should be used.

Use [U+0A86 GUJARATI LETTER AA]. Do not use [U+0A85 GUJARATI LETTER A + U+0ABE GUJARATI VOWEL SIGN AA].
Use [U+0A8F GUJARATI LETTER E]. Do not use [U+0A85 GUJARATI LETTER A + U+0AC7 GUJARATI VOWEL SIGN E].
Use [U+0A93 GUJARATI LETTER O]. Do not use [U+0A85 GUJARATI LETTER A + U+0ACB GUJARATI VOWEL SIGN O].
Use [U+0A8D GUJARATI VOWEL CANDRA E]. Do not use [U+0A85 GUJARATI LETTER A + U+0AC5 GUJARATI VOWEL SIGN CANDRA E].
Use [U+0A91 GUJARATI VOWEL CANDRA O]. Do not use [U+0A85 GUJARATI LETTER A + U+0AC9 GUJARATI VOWEL SIGN CANDRA O].
Use [U+0A90 GUJARATI LETTER AI]. Do not use [U+0A85 GUJARATI LETTER A + U+0AC8 GUJARATI VOWEL SIGN AI].
Use [U+0A94 GUJARATI LETTER AU]. Do not use [U+0A85 GUJARATI LETTER A + U+0ACC GUJARATI VOWEL SIGN AU].
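Since normalisation does not repair the discouraged sequences, text that may contain them has to be cleaned up explicitly. The following is a minimal sketch of one way to do that, assuming a blind find-and-replace is acceptable for the data in question; the function name `use_recommended_vowels` and the replacement tables are written for this example and simply mirror the two lists above.

```python
# Discouraged vowel-sign sequences and their single-character replacements.
SIGN_FIXES = [
    ("\u0ABE\u0AC7", "\u0ACB"),  # AA + E        -> O
    ("\u0ABE\u0AC5", "\u0AC9"),  # AA + CANDRA E -> CANDRA O
    ("\u0ABE\u0AC8", "\u0ACC"),  # AA + AI       -> AU
]

# Discouraged LETTER A + vowel-sign sequences and their independent vowels.
LETTER_FIXES = [
    ("\u0A85\u0ABE", "\u0A86"),  # A + AA        -> LETTER AA
    ("\u0A85\u0AC7", "\u0A8F"),  # A + E         -> LETTER E
    ("\u0A85\u0ACB", "\u0A93"),  # A + O         -> LETTER O
    ("\u0A85\u0AC5", "\u0A8D"),  # A + CANDRA E  -> VOWEL CANDRA E
    ("\u0A85\u0AC9", "\u0A91"),  # A + CANDRA O  -> VOWEL CANDRA O
    ("\u0A85\u0AC8", "\u0A90"),  # A + AI        -> LETTER AI
    ("\u0A85\u0ACC", "\u0A94"),  # A + AU        -> LETTER AU
]

def use_recommended_vowels(text):
    """Replace discouraged sequences; vowel-signs are fixed before letters."""
    for bad, good in SIGN_FIXES + LETTER_FIXES:
        text = text.replace(bad, good)
    return text

print(use_recommended_vowels("\u0A95\u0ABE\u0AC7"))  # KA + AA + E becomes KA + O
```

Fixing the vowel-signs first means that a three-part spelling such as A + AA + E ends up as the single letter O rather than as LETTER AA followed by a stray vowel-sign.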

Vocalics

In Gujarati, vocalics are available both as vowel-signs and independent vowels. According to CLDR, only two of the vocalics are regularly used for the Gujarati language.

ઋ␣ૠ␣ૃ␣ૄ

The other vocalics in the Gujarati block are:

ઌ␣ૡ␣ૢ␣ૣ

Consonants

Basic consonants

Stops

પ␣બ␣ભ␣ત␣થ␣દ␣ધ␣ટ␣ઠ␣ડ␣ઢ␣ક␣ખ␣ગ␣ઘ

Affricates

ચ␣છ␣જ␣ઝ

Fricatives

ફ␣સ␣શ␣ષ␣હ

Nasals

મ␣ન␣ઞ␣ણ␣ઙ

Liquids

વ␣ર␣લ␣ળ␣ય

Repertoire extension

The Unicode Gujarati block provides mechanisms for extending the basic set of consonants, in particular for the transcription of Arabic and Avestan.

A set of combining characters in the range 0AFA..0AFF was added to the block in Unicode v10 to allow representation of Arabic sounds. For more details, see the Unicode Standard.

[U+0ABC GUJARATI SIGN NUKTA] is used to transliterate sounds in Avestan found in the texts of the Zoroastrians, who fled to Gujarat from Persia and are known as Parsis. They include the following:

ત઼␣ંઘ઼␣જ઼␣ખ઼

The Gujarati block also has an additional consonant character, ૹ [U+0AF9 GUJARATI LETTER ZHA], to represent the letter ʒ in those transliterations.

For more information on this and other aspects of Avestan transliteration, see Proposal to encode Gujarati Letter ZHA.

Final consonants

Two combining characters can follow a consonant or vowel to produce a final consonant sound in a phonetic syllable.

[U+0A82 GUJARATI SIGN ANUSVARA] nasalises the vowel in a syllable (see nasalisation), or represents a homorganic nasal before a plosive.

[U+0A83 GUJARATI SIGN VISARGA] is a rarely used and silent hangover from Sanskrit, representing a final h.

Consonant clusters

The absence of a vowel sound between two or more consonants can be indicated visually in the following ways.

  1. Create a conjunct. There are a number of possibilities here:
    1. Half-forms : Reduce the shape of all consonants in the cluster except the last to a 'half-form'.
    2. Stacking : Reduce a non-initial consonant in size and shape and position it below the first. Sometimes the subjoined character is attached to the bottom left corner.
    3. Touching : Move the component consonants close together, so that they touch.
    4. Special ligation : Create a ligature combining the two shapes (where it may be difficult to identify one or more of the parts).
    5. The letter ra has its own idiosyncratic way of combining with other consonants, whether it precedes or follows them.
  2. Show a visible virama below the non-final consonants in the cluster.
  3. Use a final-consonant character before another consonant. See finals.
  4. No indication, although there are usually generalised pronunciation rules that allow readers to spot these locations. See absence.

Conjunct formation

See a table of 2-consonant clusters.
The table allows you to test results for various fonts.

To produce a conjunct, [U+0ACD GUJARATI SIGN VIRAMA] is added between the consonants in the cluster. There are exceptions, but this type of virama is usually not displayed, eg. the sequence શ + ્ + ચ [U+0AB6 GUJARATI LETTER SHA + U+0ACD GUJARATI SIGN VIRAMA + U+0A9A GUJARATI LETTER CA] produces શ્ચ.

The font usually determines which visual method is used, although it is possible to influence this (see joiner).

One slightly unusual aspect of Gujarati is that it sometimes borrows Devanagari glyph shapes for one or more components of a conjunct (though the characters are still the normal Gujarati ones).
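The way a conjunct is encoded can be sketched in a few lines of Python. The helper `cluster` below is a hypothetical name; it only builds the character sequence, and the choice between half-form, stack, ligature or visible virama is still left to the font.

```python
VIRAMA = "\u0ACD"  # GUJARATI SIGN VIRAMA

def cluster(*consonants):
    """Join consonant letters with a virama between each pair."""
    return VIRAMA.join(consonants)

shca = cluster("\u0AB6", "\u0A9A")  # SHA, CA
print(shca, [f"U+{ord(c):04X}" for c in shca])
```

The stored text is always three characters here, whatever shape the rendered conjunct takes.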


Half-forms

A half-form is typically created by removing the vertical line in the consonant shape, where there is one. (The vertical line is associated with the inherent vowel, and around two-thirds of Gujarati consonant shapes contain one.) There is often some additional tweaking of glyphs in order to join the components neatly.

ત્‌વ→ત્વ ણ્‌ઢ→ણ્ઢ થ્‌થ→થ્થ

Examples of conjuncts formed by using half-forms.

Vertical combinations

Vertical combinations are particularly common for gemination.

ટ્‌ટ→ટ્ટ ઢ્‌ઢ→ઢ્ઢ ટ્‌ઠ→ટ્ઠ

Examples of conjuncts formed by subjoining non-initial consonants.

Some vertical combinations hang the subjoined second consonant from the bottom-left corner.

દ્‌ઘ→દ્ઘ દ્‌ધ→દ્ધ દ્‌બ→દ્બ

Examples where subjoined consonants are attached to the bottom-left corner.

Using devanagari glyph shapes

Certain clusters use devanagari shapes for one or more of the consonants participating in the conjunct. This is a reminder that conjuncts are to a large extent a legacy of Sanskrit text.

હ્‌ય→હ્ય ઞ્‌જ→ઞ્જ દ્‌બ→દ્બ ઙ્‌ક→ઙ્ક

Examples where part(s) of the conjunct use a Devanagari shape.

Ligated conjuncts

Other clusters combine components into special ligated forms, often in a way that makes it difficult to spot the component parts.

ત્‌ત→ત્ત દ્‌ય→દ્ય શ્‌ચ→શ્ચ ક્‌ષ→ક્ષ

Conjuncts formed by ligation.

Touching consonants in conjuncts

Other clusters, particularly where there is no vertical stroke in the preceding consonant, move the components closer together, without major shape changes, so that they touch.

ક્‌ક→ક્ક ક્‌ય→ક્ય જ્‌જ→જ્જ

Consonants in a cluster that touch each other without substantial shape changes.

Conjuncts with ra

After a consonant. When ra follows another consonant, it is typically rendered as a small, diagonal line pointing downwards to the left, eg. ક્ર ગ્ર ભ્ર હ્ર શ્ર. After 5 consonants, however, it is rendered as an upside-down v shape below, ie. ટ્ર ઠ્ર ડ્ર ઢ્ર દ્ર. After ત it produces ત્ર.

ગ્‌ર→ગ્ર ટ્‌ર→ટ્ર ત્‌ર→ત્ર

Conjuncts formed by a following ra.

Before a consonant. When ra precedes another consonant, it is rendered as a small hook above the following consonant, or a following vertical line in the cluster, eg. ર્ક r͓k, ર્સ r͓s. Where it precedes a cluster, it is aligned with the vertical line of the trailing consonant, eg. ર્સ્પ r͓s͓p. However, if there is a spacing vowel-sign with a vertical line to the right of the cluster, it aligns with that, eg. ર્કા r͓kā, ર્કી r͓kī. (This illustrates that the basic units of the script are orthographic syllables.)

ર્ક ર્વા ર્સ્પ ર્સ્પા

Examples of positioning of the hook for conjuncts formed by a preceding ra.

Visible virama

The ability to form conjuncts depends on the richness of the font. Where a font is not able to produce a half-form or ligature, etc., it will leave a visible virama glyph below the initial consonant(s) to indicate the missing vowel sound, eg. ટ્બ ʈ͓b

Other letters

In addition to the consonants and vowel letters already mentioned, the Gujarati block contains the following letters, which CLDR lists as needed for writing the Gujarati language.

ઽ␣ૐ

[U+0ABD GUJARATI SIGN AVAGRAHA] is used to indicate elision when writing Sanskrit.

[U+0AD0 GUJARATI OM] is used for religious texts.

Numbers

Digits

Gujarati has a set of native digits, used in the same way as Latin digits.

૦␣૧␣૨␣૩␣૪␣૫␣૬␣૭␣૮␣૯
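Because the native digits are ordinary decimal digits, converting between them and ASCII digits is a one-to-one mapping. The sketch below shows one way to do this with `str.translate`; the function names are invented for this example.

```python
# U+0AE6..U+0AEF are GUJARATI DIGIT ZERO..NINE.
GUJARATI_DIGITS = "".join(chr(0x0AE6 + i) for i in range(10))
ASCII_DIGITS = "0123456789"

TO_GUJARATI = str.maketrans(ASCII_DIGITS, GUJARATI_DIGITS)
TO_ASCII = str.maketrans(GUJARATI_DIGITS, ASCII_DIGITS)

def to_gujarati_digits(text):
    """Replace ASCII digits with Gujarati digits, leaving other text alone."""
    return text.translate(TO_GUJARATI)

def to_ascii_digits(text):
    """Replace Gujarati digits with ASCII digits, leaving other text alone."""
    return text.translate(TO_ASCII)

print(to_gujarati_digits("29 November, 2021"))
print(int(to_gujarati_digits("123")))  # int() accepts Unicode decimal digits
```

Python's `int()` parses Gujarati digits directly, so the explicit conversion to ASCII is mainly useful when passing values to software that only understands ASCII digits.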

Currency

The abbreviation for રૂપિયો 'Rupee' can be written using the dedicated character ૱ [U+0AF1 GUJARATI RUPEE SIGN], or using the initial syllable followed by an abbreviation mark or a full stop, ie. રૂ૰ or રૂ.

Text direction

Gujarati text runs left to right in horizontal lines.
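For readers who want to check the directional properties themselves, here is a small sketch that looks up the Bidi_Class of a few representative characters with Python's standard `unicodedata.bidirectional`. The sample selection is arbitrary and chosen only for this example.

```python
import unicodedata

samples = {
    "letter KA": "\u0A95",
    "vowel-sign AA": "\u0ABE",
    "virama": "\u0ACD",
    "digit one": "\u0AE7",
}

# Map each label to the character's Bidi_Class value.
bidi = {label: unicodedata.bidirectional(char) for label, char in samples.items()}

for label, value in bidi.items():
    print(f"{label}: {value}")
```

Letters and digits report a strong left-to-right class, while non-spacing marks such as the virama take their direction from the base character they attach to.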


Glyph shaping & positioning

This section brings together information about the following topics: writing styles; cursive text; context-based shaping; context-based positioning; baselines, line height, etc.; font styles; case & other character transforms.

You can experiment with examples using the Gujarati character app.

Gujarati text is not cursive (ie. joined up like Arabic).

The orthography has no case distinction, and no special transforms are needed to convert between characters.

Explicit shaping controls

U+200C ZERO WIDTH NON-JOINER can be used to force the production of a visible virama, rather than a half-form. For example, થ + ્ + ZWNJ + થ [U+0AA5 GUJARATI LETTER THA + U+0ACD GUJARATI SIGN VIRAMA + U+200C ZERO WIDTH NON-JOINER + U+0AA5 GUJARATI LETTER THA] produces થ્‌થ, rather than થ્થ.

U+200D ZERO WIDTH JOINER can be used to produce a half-form, such as શ્‍ચ rather than શ્ચ. It can also be used to produce standalone half-forms (for educational text), such as સ્‍.
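The two controls differ only in which invisible character follows the virama. The sketch below builds both kinds of sequence; the helper names `visible_virama` and `half_form` are made up for this example, and the result still depends on the font supporting the requested form.

```python
VIRAMA = "\u0ACD"  # GUJARATI SIGN VIRAMA
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"     # ZERO WIDTH JOINER

THA, SHA, CA = "\u0AA5", "\u0AB6", "\u0A9A"

def visible_virama(first, second):
    """Request a visible virama under `first` instead of a conjunct."""
    return first + VIRAMA + ZWNJ + second

def half_form(first, second=""):
    """Request the half-form of `first`, optionally followed by `second`."""
    return first + VIRAMA + ZWJ + second

for text in (visible_virama(THA, THA), half_form(SHA, CA), half_form(SHA)):
    print(text, [f"U+{ord(c):04X}" for c in text])
```

Both controls are invisible, so it is worth printing the code points, as here, when debugging unexpected shapes.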

Font styles

tbd

Punctuation & inline features

Grapheme boundaries

tbd

Word boundaries

Words are separated by spaces.

Phrase & section boundaries

phrase

, [U+002C COMMA]

; [U+003B SEMICOLON]

: [U+003A COLON]

sentence

. [U+002E FULL STOP]

? [U+003F QUESTION MARK]

! [U+0021 EXCLAMATION MARK]

। [U+0964 DEVANAGARI DANDA] (less common)

section

॥ [U+0965 DEVANAGARI DOUBLE DANDA] (infrequent)

Gujarati uses standard western punctuation, but may also use the Devanagari version of a full stop, [U+0964 DEVANAGARI DANDA].

Infrequently, [U+0965 DEVANAGARI DOUBLE DANDA] is used for boundaries of text above the sentence level.

Parentheses & brackets

standard: start ( [U+0028 LEFT PARENTHESIS], end ) [U+0029 RIGHT PARENTHESIS]

Quotations

initial: start “ [U+201C LEFT DOUBLE QUOTATION MARK], end ” [U+201D RIGHT DOUBLE QUOTATION MARK]
nested: start ‘ [U+2018 LEFT SINGLE QUOTATION MARK], end ’ [U+2019 RIGHT SINGLE QUOTATION MARK]

Single quotation marks are used for quotations within quotations.

Emphasis

tbd

Abbreviation, ellipsis & repetition

[U+0AF0 GUJARATI ABBREVIATION SIGN] is a commonly used character in Gujarati and appears in printed materials. It is used to write abbreviations of words, eg. ડોક્ટર can be abbreviated as ડો૰. The Latin full stop is used interchangeably with this character, eg. ડો.

Inline notes & annotations

tbd

Other inline ranges

tbd

Other punctuation

CLDR also lists the following non-ASCII characters.

§␣‐␣–␣—␣†␣‡␣…␣′␣″

Line & paragraph layout

Line breaking & hyphenation

By default, Gujarati breaks lines on the spaces between words.


Text alignment & justification

tbd

Letter spacing

tbd

Counters, lists, etc.

You can experiment with counter styles using the Counter styles converter. Patterns for using these styles in CSS can be found in Ready-made Counter Styles, and we use the names of those patterns here to refer to the various styles.

The modern Gujarati orthography uses a native numeric style.

Numeric

The gujarati numeric style is decimal-based and uses these digits.

૧␣૨␣૩␣૪␣૫␣૬␣૭␣૮␣૯␣૦

Examples:

૧␣૨␣૩␣૪␣૧૧␣૨૨␣૩૩␣૪૪␣૧૧૧␣૨૨૨␣૩૩૩␣૪૪૪

Prefixes and suffixes

Gujarati commonly uses a full stop + space as a suffix.

Examples:

૧. ૨. ૩. ૪. ૫.
Separator for Gujarati list counters: full stop + space.
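As a rough sketch of how such a counter could be generated outside CSS, the function below formats a non-negative integer with Gujarati digits and appends the full stop + space suffix. The name `gujarati_counter` and its `suffix` parameter are assumptions made for this example, not part of any counter-style specification.

```python
def gujarati_counter(n, suffix=". "):
    """Format a non-negative integer as a Gujarati decimal list counter."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative counters")
    # U+0AE6 is GUJARATI DIGIT ZERO; the other digits follow in order.
    digits = "".join(chr(0x0AE6 + int(d)) for d in str(n))
    return digits + suffix

for item_number in (1, 2, 3, 11, 222):
    print(gujarati_counter(item_number) + "item")
```

Passing a different `suffix` covers lists that use another separator.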

Styling initials

tbd

Page & book layout

This section is for any features that are specific to Gujarati and that relate to the following topics: general page layout & progression; grids & tables; notes, footnotes, etc; forms & user interaction; page numbering, running headers, etc.

Languages using the Gujarati script

According to ScriptSource, the Gujarati script is used for the following languages:
