Gujarati (draft)

Updated 6 July, 2020

This page gathers together basic information about the Gujarati script and its use for the Gujarati language. It aims (generally) to provide an overview of the orthography and typographic features, and (specifically) to advise how to write Gujarati using Unicode.

See also the companion document, Gujarati character notes, for detailed information about specific Unicode characters.

Phonetic transcriptions on this page should be treated as an approximate guide only. Many are more phonemic than phonetic, and there may be variations depending on the source of the transcription.


Sample (Gujarati)


અનુચ્છેદ ૧: પ્રતિષ્ઠા અને અધિકારોની દૃષ્ટિએ સર્વ માનવો જન્મથી સ્વતંત્ર અને સમાન હોય છે. તેમનામાં વિચારશક્તિ અને અંતઃકરણ હોય છે અને તેમણે પરસ્પર બંધુત્વની ભાવનાથી વર્તવું જોઇએ.

અનુચ્છેદ ૨: દરેક વ્યક્તિને જાતિ, રંગ, લિંગ, ભાષા, ધર્મે, રાજકીય અથવા બીજા અભિપ્રાય, રાષ્ટ્રીય અથવા સામાજિક ઉદ્ભવસ્થાન, મિલકત, જન્મ અથવા મોભા જેવા કોઇપણ જાતના ભેદભાવ વગર આ ધોષણામાં રજૂ કરવામાં આવેલા સધળા અધિકારો અને સ્વતંત્રતા ભોગવવાનો હક્ક છે. વધુમાં કોઇપણ વ્યક્તિ તે સ્વતંત્ર, ટ્રસ્ટ હેઠળના સ્વશાસન હેઠળ ન હોય તેવા અથવા સાર્વભામત્વની બીજી કોઇપણ મર્યાદા હેઠળ આવેલા દેશ અથવા પ્રદેશની હોય તો પણ રાજકીય, હફમવવિષયક અથવા આંતરરાષ્ટ્રીય મોભાના ધોરણે તેની સાથે કોઇપણ ભેદભાવ રાખવામાં આવશે નહિ.

Usage & history

From Scriptsource:

The Gujarati script is used for writing the Gujarati and Chodri languages, together spoken by almost 47 million people. It is also used alongside the Devanagari script for writing a number of languages used by the Bhil people, one of India's largest indigenous groups. The script is related to Devanagari, with modifications to some of the letters, and without the headstroke which characterizes most of the Nagari scripts. The loss of the headstroke reflects the script's origins in informal writing; until the mid-19th century it was used primarily for bookkeeping and personal correspondence, but since printing facilities have become widely available to Gujarati speakers the script is used in schools, for printing books and newspapers, in government offices and public signage, and is one of the official scripts of India.

From Wikipedia:

The Gujarati script was adapted from the Devanagari script to write the Gujarati language. Gujarati language and script developed in three distinct phases — 10th to 15th century, 15th to 17th century and 17th to 19th century. The first phase is marked by use of Prakrit, Apabramsa and its variants such as Paisaci, Shauraseni, Magadhi and Maharashtri. In second phase, Old Gujarati script was in wide use. The earliest known document in the Old Gujarati script is a handwritten manuscript Adi Parva dating from 1591–92, and the script first appeared in print in a 1797 advertisement. The third phase is the use of script developed for ease and fast writing. The use of shirorekha (the topline as in Sanskrit) was abandoned. Until the 19th century it was used mainly for writing letters and keeping accounts, while the Devanagari script was used for literature and academic writings. It is also known as the śarāphī (banker's), vāṇiāśāī (merchant's) or mahājanī (trader's) script. This script became basis for modern script. Later the same script was adopted by writers of manuscripts. Jain community also promoted its use for copying religious texts by hired writers.

Basic features

The script is an abugida. Consonants carry an inherent vowel which can be modified by appending vowel-signs to the consonant. See the table to the right for a brief overview of features of the modern Gujarati orthography. (See the key. Character counts exclude ASCII characters.)

The following list describes some distinctive characteristics of the Gujarati script.

Text direction

Gujarati text is written horizontally, left to right.


Vowels

Vowel sounds


Plain vowels

i u e o ə ɛ ɔ æ ɑ

Source: [wp]. Phones in a lighter colour are non-native or allophones.

Diphthongs

əʋ əj

Inherent vowel

The inherent vowel is usually transcribed as a and pronounced ə. So, for example, ક is pronounced kə.

Vowel-signs

Other than the inherent vowel, vowel sounds that follow a consonant sound are represented using vowel-signs, eg. કી kī.

િ␣ી␣ુ␣ૂ␣ે␣ો␣ૉ␣ૅ␣ા␣ૈ␣ૌ

Gujarati vowel-signs are all combining characters. A single Unicode character is used per base consonant, and there are no vowel-signs with multiple parts. All vowel-signs are typed and stored after the base consonant, whether or not they precede it when displayed, and the font puts them in the correct place for display.
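The storage order described above can be checked with a short Python sketch. It uses only the standard `unicodedata` module; the variable name `syllable` is illustrative, and the example syllable is written with escapes so that the order in memory is unambiguous.

```python
import unicodedata

# The syllable "ki": the vowel-sign I is displayed to the LEFT of the
# consonant, but it is typed and stored AFTER the consonant.
syllable = "\u0A95\u0ABF"  # KA followed by VOWEL SIGN I

names = [unicodedata.name(ch) for ch in syllable]
for ch, name in zip(syllable, names):
    print(f"U+{ord(ch):04X} {name}")
```

The consonant comes first in the list even though the rendered glyph for the vowel-sign appears before it; reordering for display is left to the font and shaping engine.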

About half of the vowel-signs are spacing marks, meaning that they consume horizontal space when added to a base consonant.
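One way to see which vowel-signs are spacing marks is to look at their Unicode general category: `Mc` (spacing combining mark) versus `Mn` (nonspacing mark). The following is a minimal sketch that assumes the general category is an adequate proxy for "consumes horizontal space"; the helper name `split_by_spacing` is hypothetical.

```python
import unicodedata

# The eleven vowel-signs listed above, written as escapes so that the
# combining marks stay readable in source code.
VOWEL_SIGNS = [
    "\u0ABF", "\u0AC0", "\u0AC1", "\u0AC2", "\u0AC7", "\u0ACB",
    "\u0AC9", "\u0AC5", "\u0ABE", "\u0AC8", "\u0ACC",
]

def split_by_spacing(signs):
    """Return (spacing, nonspacing) lists based on general category."""
    spacing = [c for c in signs if unicodedata.category(c) == "Mc"]
    nonspacing = [c for c in signs if unicodedata.category(c) == "Mn"]
    return spacing, nonspacing

spacing, nonspacing = split_by_spacing(VOWEL_SIGNS)
print("spacing:   ", [f"U+{ord(c):04X}" for c in spacing])
print("nonspacing:", [f"U+{ord(c):04X}" for c in nonspacing])
```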

There are separate vowel-signs for the historical short and long variants, but length is no longer distinctive in modern pronunciation. It is only found in metrical structures of verse. [w]

[U+0AC5 GUJARATI VOWEL SIGN CANDRA E] and [U+0AC9 GUJARATI VOWEL SIGN CANDRA O] are used to represent the English æ and ɔ sounds, respectively. [w]

See also vocalics.

Nasalisation

[U+0A82 GUJARATI SIGN ANUSVARA] nasalises the vowel in a syllable, eg. મેં mẽ (I).

The anusvara may also represent a nasal before a plosive.

Vowel-sign placement

The following list shows where vowel-signs are positioned around a base consonant to produce vowels, and how many instances of that pattern there are.

Vowel absence

The inherent vowel is not always pronounced, even if there is no visual indication of its absence.

For example, the inherent vowel is typically not pronounced at the end of a word, eg. ઘર. The inherent vowel is still dropped when the root word is followed by suffixes or in compounds, eg. ઘરપર and ઘરકામ. [w]

The inherent vowel may also be elided when combining morphemes, eg. the root પકડ઼ is written the same even when inflected as પકડ઼ે, even though the pronunciation is pakṛe. [w]

In other cases, the inherent vowel is simply missing, eg. વરસાદ is pronounced varsaːd. [w]

Gujarati can also use [U+0ACD GUJARATI SIGN VIRAMA] (called halant in Gujarati) to explicitly kill the inherent vowel after a consonant. The virama is rarely seen. Apart from the situations just mentioned, the virama is also usually hidden when part of a consonant cluster (see clusters).

The virama is visible, however, if it isn't followed by a consonant, eg. ક્ explicitly represents just the sound k.

Standalone vowels

Gujarati represents standalone vowels using a set of independent vowel letters. The set includes a character to represent the inherent vowel sound.

ઇ␣ઈ␣ઊ␣ઉ␣એ␣ઓ␣અ␣ઑ␣ઍ␣આ␣ઔ␣ઐ

Visual composition

Visually, several of the standalone vowels and some vowel-signs look as if they could be composed of smaller parts, juxtaposed. For example, [U+0A86 GUJARATI LETTER AA] looks like it could be composed of અ + ા [U+0A85 GUJARATI LETTER A + U+0ABE GUJARATI VOWEL SIGN AA]. Similarly, [U+0ACB GUJARATI VOWEL SIGN O] looks like ા + ે [U+0ABE GUJARATI VOWEL SIGN AA + U+0AC7 GUJARATI VOWEL SIGN E].

These compositions and decompositions do not exist in Unicode normalisation forms, and the Unicode Standard requires the use of single codepoints rather than sequences in all cases.
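The point about normalisation can be illustrated with Python's standard `unicodedata.normalize`. This is a small sketch, not a validation tool; the names `sequence` and `single` are illustrative.

```python
import unicodedata

sequence = "\u0A85\u0ABE"  # LETTER A + VOWEL SIGN AA (looks like LETTER AA)
single = "\u0A86"          # LETTER AA, the codepoint that should be used

# No normalisation form maps one spelling onto the other, so text that
# uses the two-part sequence will not match text that uses the single
# codepoint in comparisons or searches.
still_different = {
    form: unicodedata.normalize(form, sequence) != unicodedata.normalize(form, single)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}
print(still_different)
```

Because normalisation does not repair the look-alike sequence, any clean-up of such text has to be done by an explicit, application-specific replacement.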

Vocalics

In Gujarati, vocalics are available both as vowel-signs and independent vowels. According to CLDR, only two of the vocalics are regularly used for the Gujarati language.

ઋ␣ૠ␣ૃ␣ૄ

The other vocalics in the Gujarati block are:

ઌ␣ૡ␣ૢ␣ૣ

Consonants

Consonant sounds


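Places of articulation: labial, dental, alveolar, post-alveolar, retroflex, palatal, velar, glottal.

stop: p b (labial); t d (dental); ʈ ɖ ʈʰ ɖʰ (retroflex); k ɡ ɡʰ (velar)
affricate: t͡ʃ d͡ʒ t͡ʃʰ d͡ʒʰ (post-alveolar)
fricative: f (labial); s z (alveolar); ʃ (post-alveolar); ɦ (glottal)
nasal: m (labial); n (alveolar); ɳ (retroflex)
approximant: ʋ (labial); l (alveolar); ɭ̆ (retroflex); j (palatal)
trill/flap: ɾ (alveolar)

Source: [wp]. Phones in a lighter colour are non-native or allophones.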

Basic consonants

પ␣બ␣ભ␣ત␣થ␣દ␣ધ␣ટ␣ઠ␣ડ␣ઢ␣ક␣ખ␣ગ␣ઘ
ચ␣છ␣જ␣ઝ
ફ␣સ␣શ␣ષ␣હ
મ␣ન␣ઞ␣ણ␣ઙ
વ␣ર␣લ␣ળ␣ય

Repertoire extension

The Unicode Gujarati block provides mechanisms for extending the basic set of consonants, in particular for the transcription of Arabic and Avestan.

A set of combining characters in the range 0AFA..0AFF was added to the block in Unicode v10 to allow representation of Arabic sounds. For more details, see the Unicode Standard [u, p. 480].
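To see what this range contains, the characters can be listed with Python's `unicodedata` module. This is a sketch that assumes a Python build whose Unicode database is version 10 or later (Python 3.7 and newer); the list name `extended` is illustrative.

```python
import unicodedata

# Collect (codepoint, general category, name) for U+0AFA..U+0AFF.
extended = []
for cp in range(0x0AFA, 0x0B00):
    ch = chr(cp)
    name = unicodedata.name(ch, "<not in this Unicode database>")
    extended.append((cp, unicodedata.category(ch), name))

for cp, category, name in extended:
    print(f"U+{cp:04X} {category} {name}")
```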

[U+0ABC GUJARATI SIGN NUKTA] is used to transliterate sounds in Avestan found in the texts of the Zoroastrians, who fled to Gujarat from Persia and are known as Parsis. They include the following:

ત઼␣ંઘ઼␣જ઼␣ખ઼

The Gujarati block also has an additional consonant character, [U+0AF9 GUJARATI LETTER ZHA], to represent the letter ʒ in those transliterations.

For more information on this and other aspects of Avestan transliteration, see Proposal to encode Gujarati Letter ZHA.

Final consonants

Two combining characters can follow a consonant or vowel to produce a final consonant sound in a phonetic syllable.

[U+0A82 GUJARATI SIGN ANUSVARA] nasalises the vowel in a syllable (see nasalisation), or represents a homorganic nasal before a plosive.

[U+0A83 GUJARATI SIGN VISARGA] is a rarely used and silent hangover from Sanskrit, representing a final h.

Consonant clusters

The absence of a vowel sound between two or more consonants can be indicated visually in the following ways.

  1. Create a conjunct. There are a number of possibilities here:
    1. Reduce the shape of all consonants in the cluster except the last to a 'half-form'.
    2. Reduce a non-initial consonant in size and shape and position it below the first. Sometimes the subjoined character is attached to the bottom left corner.
    3. Move the component consonants close together, so that they touch.
    4. Create a ligature combining the two shapes (where it may be difficult to identify one or more of the parts).
    5. The letter ra has its own idiosyncratic way of combining with other consonants, whether it precedes or follows them.
  2. Show a visible virama below the non-final consonants in the cluster.
  3. Use a final-consonant character before another consonant. See finals.
  4. No indication, although there are usually generalised pronunciation rules that allow readers to spot these locations. See absence.

One slightly unusual aspect of Gujarati is that it sometimes borrows Devanagari glyph shapes for one or more components of a conjunct (though the characters are still the normal Gujarati ones).

In all cases except the last, the underlying mechanism in terms of codepoints involves adding [U+0ACD GUJARATI SIGN VIRAMA] between the consonants in the cluster, eg. શ્ચ is produced by the sequence શ + ્ + ચ [U+0AB6 GUJARATI LETTER SHA + U+0ACD GUJARATI SIGN VIRAMA + U+0A9A GUJARATI LETTER CA].
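The codepoint mechanism can be sketched in a few lines of Python. The helper name `make_cluster` is hypothetical; it simply joins consonants with the virama and says nothing about which visual form the font will choose.

```python
import unicodedata

VIRAMA = "\u0ACD"  # GUJARATI SIGN VIRAMA

def make_cluster(*consonants):
    """Join consonant letters with a virama between each pair."""
    return VIRAMA.join(consonants)

# SHA + VIRAMA + CA, the sequence behind the conjunct in the text above.
shca = make_cluster("\u0AB6", "\u0A9A")
for ch in shca:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```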

The font usually determines which visual method is used, although it is possible to influence this (see below).

Half-forms

A half-form is typically created by removing the vertical line in the consonant shape, where there is one. (The vertical line is associated with the inherent vowel, and around two-thirds of Gujarati consonant shapes contain one.) There is often some additional tweaking of glyphs in order to join the components neatly.

ત્‌વ→ત્વ ણ્‌ઢ→ણ્ઢ થ્‌થ→થ્થ

Examples of conjuncts formed by using half-forms.

Vertical combinations

Vertical combinations are particularly common for gemination.

ટ્‌ટ→ટ્ટ ઢ્‌ઢ→ઢ્ઢ ટ્‌ઠ→ટ્ઠ

Examples of conjuncts formed by subjoining non-initial consonants.

Some vertical combinations hang the subjoined second consonant from the bottom-left corner.

દ્‌ઘ→દ્ઘ દ્‌ધ→દ્ધ દ્‌બ→દ્બ

Examples where subjoined consonants are attached to the bottom-left corner.

Using Devanagari glyph shapes

Certain clusters use Devanagari shapes for one or more of the consonants participating in the conjunct. This is a reminder that conjuncts are to a large extent a legacy of Sanskrit text.

હ્‌ય→હ્ય ઞ્‌જ→ઞ્જ દ્‌બ→દ્બ ઙ્‌ક→ઙ્ક

Examples where part(s) of the conjunct use a Devanagari shape.

Ligated conjuncts

Other clusters combine components into special ligated forms, often in a way that makes it difficult to spot the component parts.

ત્‌ત→ત્ત દ્‌ય→દ્ય શ્‌ચ→શ્ચ ક્‌ષ→ક્ષ

Conjuncts formed by ligation.

Touching consonants in conjuncts

Other clusters, particularly where there is no vertical stroke in the preceding consonant, move the components closer together, without major shape changes, so that they touch.

ક્‌ક→ક્ક ક્‌ય→ક્ય જ્‌જ→જ્જ

Consonants in a cluster that touch each other without substantial shape changes.

Conjuncts with ra

When ra follows another consonant, it is typically rendered as a small, diagonal line pointing downwards to the left, eg. ક્ર ગ્ર ભ્ર હ્ર શ્ર. After five consonants, however, it is rendered as an upside-down v shape below, ie. ટ્ર ઠ્ર ડ્ર ઢ્ર દ્ર. After ત it produces ત્ર.

ગ્‌ર→ગ્ર ટ્‌ર→ટ્ર ત્‌ર→ત્ર

Conjuncts formed by a following ra.

When ra precedes another consonant, it is rendered as a small hook above the following consonant, or a following vertical line in the cluster, eg. ર્ક r͓k and ર્સ r͓s. Where it precedes a cluster, it is aligned with the vertical line of the trailing consonant, eg. ર્સ્પ r͓s͓p. However, if there is a spacing vowel-sign with a vertical line to the right of the cluster, it aligns with that, eg. ર્કા r͓kā, and ર્કી r͓kī. (This illustrates how the basic units of the script are orthographic syllables.)
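The idea of an orthographic syllable can be approximated with a regular expression. The sketch below is a deliberate simplification and not the segmentation used by any shaping engine: the pattern, the constant names and the choice to keep sequences containing ZWJ or ZWNJ inside one unit are all assumptions made for illustration.

```python
import re

CONSONANT = "[\u0A95-\u0AB9\u0AF9]\u0ABC?"   # consonant letter, optional nukta
JOIN = "\u0ACD[\u200C\u200D]?"               # virama, optional ZWNJ or ZWJ
VOWEL_SIGNS = "[\u0ABE-\u0ACC\u0AE2\u0AE3]*" # dependent vowel-signs
MODIFIERS = "[\u0A81-\u0A83]*"               # candrabindu, anusvara, visarga
INDEPENDENT = "[\u0A85-\u0A94\u0AE0\u0AE1]"  # standalone vowel letters

# Either a consonant cluster with its vowel-signs, or a standalone vowel,
# followed in both cases by any final modifiers.
SYLLABLE = re.compile(
    f"(?:{CONSONANT}(?:{JOIN}{CONSONANT})*(?:{JOIN})?{VOWEL_SIGNS}"
    f"|{INDEPENDENT}){MODIFIERS}"
)

def orthographic_syllables(text):
    """Split Gujarati text into approximate orthographic syllables."""
    return SYLLABLE.findall(text)

# RA + VIRAMA + SA + VIRAMA + PA + VOWEL SIGN AA forms a single unit.
rspa = "\u0AB0\u0ACD\u0AB8\u0ACD\u0AAA\u0ABE"
print(len(orthographic_syllables(rspa)))
```

With this pattern the cluster beginning with ra and ending in a vowel-sign is kept as one unit, which is the unit within which the hook for ra is positioned.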

ર્ક ર્વા ર્સ્પ ર્સ્પા

Examples of positioning of the hook for conjuncts formed by a preceding ra.

Visible virama

The ability to form conjuncts depends on the richness of the font. Where a font is not able to produce a half-form or ligature, etc., it will leave a visible virama glyph below the initial consonant(s) to indicate the missing vowel sound, eg. ટ્બ ʈ͓b.

Using ZWJ & ZWNJ

The left-hand side of each example in the preceding subsections used U+200C ZERO WIDTH NON-JOINER to force the production of a visible virama, rather than a half-form or other conjunct. For example, થ + ્ + ZWNJ + થ [U+0AA5 GUJARATI LETTER THA + U+0ACD GUJARATI SIGN VIRAMA + U+200C ZERO WIDTH NON-JOINER + U+0AA5 GUJARATI LETTER THA] produces થ્‌થ, rather than થ્થ.

U+200D ZERO WIDTH JOINER can be used to produce a half-form, such as શ્‍ચ, rather than શ્ચ. It can also be used to produce standalone half-forms (for educational text) such as સ્‍.
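The sequences involved can be written out explicitly in Python. This is only a sketch of the codepoint order; whether a given font honours the request is outside the code's control, and the helper `strip_joiners` is a hypothetical convenience for comparing text that differs only in these invisible characters.

```python
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"     # ZERO WIDTH JOINER
VIRAMA = "\u0ACD"  # GUJARATI SIGN VIRAMA
THA, SHA, CA = "\u0AA5", "\u0AB6", "\u0A9A"

# Ask for a visible virama instead of a conjunct.
visible_virama = THA + VIRAMA + ZWNJ + THA

# Ask for a half-form instead of a ligature.
half_form = SHA + VIRAMA + ZWJ + CA

# A half-form on its own, with no following consonant.
standalone_half = SHA + VIRAMA + ZWJ

def strip_joiners(text):
    """Remove ZWJ and ZWNJ so that only the underlying letters remain."""
    return text.replace(ZWNJ, "").replace(ZWJ, "")

print([f"U+{ord(c):04X}" for c in visible_virama])
```

Stripping the joiners is useful for searching and matching, since the joiners change only the rendering and not the letters themselves.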

Other letters

In addition to the consonants and vowel letters already mentioned, the Gujarati block contains the following letters, which CLDR lists as needed for writing the Gujarati language.

ઽ␣ૐ

[U+0ABD GUJARATI SIGN AVAGRAHA] is used to indicate elision when writing Sanskrit.

[U+0AD0 GUJARATI OM] is used for religious texts.

Combining marks

In addition to the vowel-signs and vocalics already mentioned, the Gujarati block contains the following diacritics, which are also described in the sections on nasalisation, finals, extensions, and absence.

ઁ␣ં␣ઃ␣઼␣્

[U+0A81 GUJARATI SIGN CANDRABINDU] and [U+0A82 GUJARATI SIGN ANUSVARA] produce nasalisation (see nasalisation), and anusvara can also represent a final consonant, as does [U+0A83 GUJARATI SIGN VISARGA] (see finals).

[U+0ACD GUJARATI SIGN VIRAMA] kills the inherent vowel and creates conjuncts (see absence).

[U+0ABC GUJARATI SIGN NUKTA] and the set of combining characters just below are used to extend the consonant repertoire (see extensions).

ૺ␣ૻ␣ૼ␣૽␣૾␣૿

Punctuation

Gujarati uses western punctuation, including the following non-ASCII characters.

‘␣’␣“␣”

CLDR also lists the following non-ASCII characters.

§␣‐␣–␣—␣†␣‡␣…␣′␣″

The Unicode Gujarati block contains only one character with the general property of punctuation, [U+0AF0 GUJARATI ABBREVIATION SIGN]. It is used for abbreviations (see abbrev). CLDR for Gujarati lists this as auxiliary.
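The claim about punctuation in the block can be checked by scanning the range U+0A80..U+0AFF for characters whose general category starts with P. This is a minimal sketch using the standard `unicodedata` module; the function name `chars_with_category` is hypothetical, and the result reflects whatever Unicode version the Python build ships with.

```python
import unicodedata

def chars_with_category(prefix, start=0x0A80, end=0x0AFF):
    """Return characters in [start, end] whose general category starts with prefix."""
    return [
        chr(cp)
        for cp in range(start, end + 1)
        if unicodedata.category(chr(cp)).startswith(prefix)
    ]

punctuation = chars_with_category("P")
for ch in punctuation:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```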

Gujarati sometimes uses the dandas from the Unicode Devanagari block (see phrase). CLDR for Gujarati doesn't mention these.

।␣॥

Symbols

The only character in the Unicode Gujarati block with the symbol property is the Gujarati rupee sign.

This can also be written using ordinary characters, ie. રૂ૰, which is short for રૂપિયો. Note that rupee can also be abbreviated using a period, ie. રૂ..

Numbers

There is a set of Gujarati digits, and they are used in the same way as Latin digits.

૦␣૧␣૨␣૩␣૪␣૫␣૬␣૭␣૮␣૯
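Because the Gujarati digits are ordinary decimal digits in Unicode (U+0AE6..U+0AEF), Python can read them directly. The sketch below shows two ways to handle them; the helper name `to_ascii_digits` is hypothetical.

```python
import unicodedata

GUJARATI_DIGITS = "".join(chr(cp) for cp in range(0x0AE6, 0x0AF0))

# Translation table from Gujarati digits to ASCII digits.
_TO_ASCII = {ord(g): str(i) for i, g in enumerate(GUJARATI_DIGITS)}

def to_ascii_digits(text):
    """Replace Gujarati digits with ASCII digits, leaving other text alone."""
    return text.translate(_TO_ASCII)

# int() accepts any Unicode decimal digits, so no conversion is needed
# when the string contains only a number.
value = int("\u0AE7\u0AE8\u0AE9")  # Gujarati digits one, two, three
print(value, to_ascii_digits("\u0AE8\u0AE6\u0AE8\u0AE6"))
```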

Glyph shaping & positioning

You can experiment with examples using the Gujarati character app.

Context-based shaping

tbd

Context-based positioning

tbd

Baselines & inline alignment

tbd

Font styles

tbd

Transforming characters

tbd

Structural boundaries & markers

Grapheme boundaries

tbd

Word boundaries

Words are separated by spaces.

Phrase & section boundaries

Gujarati uses standard western punctuation, but may also use the Devanagari version of a full stop, [U+0964 DEVANAGARI DANDA] (although an ASCII full stop is seen in the sample text above).

For boundaries of text above the sentence level there is [U+0965 DEVANAGARI DOUBLE DANDA].

Bracketing & range markers

tbd

Quotations

tbd

Emphasis

tbd

Abbreviation, ellipsis & repetition

[U+0AF0 GUJARATI ABBREVIATION SIGN] is a commonly used character in Gujarati and appears in printed materials. It is used to write abbreviations of words in Gujarati, eg. ડોક્ટર can be abbreviated as ડો૰. The Latin full stop is used interchangeably with this character, eg. ડો..

Other punctuation

tbd

Inline notes & annotations

tbd

Line & paragraph layout

Line breaking & hyphenation

Generally, Gujarati text breaks on the spaces between words.

Character properties

Characters used for the Gujarati language have the following assignments related to line-break properties.

AL (51): ૰ ઇ ઈ ઊ ઉ એ ઓ અ ઑ ઍ આ ઔ ઐ ઋ ૠ પ બ ભ ત થ દ ધ ટ ઠ ડ ઢ ક ખ ગ ઘ ચ છ જ ઝ ફ સ શ ષ હ મ ન ઞ ણ ઙ વ ર લ ળ ય ઽ ૐ
BA (2): । ॥
CM (18): િ ી ુ ૂ ે ો ૉ ૅ ા ૈ ૌ ૃ ૄ ઁ ં ઃ ઼ ્
NU (10): ૦ ૧ ૨ ૩ ૪ ૫ ૬ ૭ ૮ ૯
PR (1): ૱
QU (4): ‘ ’ “ ”

AL (ordinary alphabetic and symbol characters) requires other characters to provide break opportunities; otherwise, unless tailored rules are applied, no line breaks are allowed between pairs of them.

BA (break after) indicates that it is normal to break after that character. 

CM (combining mark) takes on the behaviour of its base character.

NU (number) behaves like ordinary characters (AL) in the context of most characters but activates the prefix and postfix behavior of prefix and postfix characters.

PR (numeric prefix) may not be separated from following numeric characters or following opening characters, even if a space character intervenes. For example, there is no break opportunity in “฿ (100.00)”.

QU (quotation) characters can be opening or closing, or even both, depending on usage. The default is to treat them as both opening and closing.

Text alignment & justification

tbd

Letter spacing

tbd

Counters, lists, etc.

Ready-made Counter Styles lists one, numeric, counter style for use with the Gujarati language. You can experiment with these styles using the Counter styles converter.

Input            gujarati (numeric)
1 2 3 4          ૧ ૨ ૩ ૪
11 22 33 44      ૧૧ ૨૨ ૩૩ ૪૪
111 222 333 444  ૧૧૧ ૨૨૨ ૩૩૩ ૪૪૪
Counters produced by the Gujarati counter style.

Numeric

The gujarati numeric style is decimal-based and uses the digits shown below.

૦␣૧␣૨␣૩␣૪␣૫␣૬␣૭␣૮␣૯
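A decimal counter style of this kind amounts to a digit-for-digit substitution. The following sketch shows the idea in Python, using the Gujarati digits U+0AE6..U+0AEF; the function name `gujarati_counter` is hypothetical, and the use of an ASCII hyphen-minus for negative values is an assumption rather than something stated on this page.

```python
# Gujarati digits zero to nine, U+0AE6..U+0AEF.
DIGITS = [chr(cp) for cp in range(0x0AE6, 0x0AF0)]

def gujarati_counter(n):
    """Format an integer as a decimal counter using Gujarati digits."""
    if n < 0:
        # Assumption: negative values are prefixed with an ASCII hyphen-minus.
        return "-" + gujarati_counter(-n)
    return "".join(DIGITS[int(d)] for d in str(n))

for n in (1, 11, 111, 444):
    print(n, gujarati_counter(n))
```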

Styling initials

tbd

Page & book layout

General page layout & progression

tbd

Grids & tables

tbd

Notes, footnotes, etc

tbd

Forms & user interaction

tbd

Page numbering, running headers, etc

tbd

Character lists

The Gujarati script characters in Unicode 11.0 are in the Gujarati block (U+0A80–U+0AFF).

The modern Gujarati orthography described here uses characters from the following Unicode blocks.

General Punctuation (4): ‘ ’ “ ”
Gujarati (79): ઁ ં ઃ અ આ ઇ ઈ ઉ ઊ ઋ ઍ એ ઐ ઑ ઓ ઔ ક ખ ગ ઘ ઙ ચ છ જ ઝ ઞ ટ ઠ ડ ઢ ણ ત થ દ ધ ન પ ફ બ ભ મ ય ર લ ળ વ શ ષ સ હ ઼ ઽ ા િ ી ુ ૂ ૃ ૄ ૅ ે ૈ ૉ ો ૌ ્ ૐ ૠ ૦ ૧ ૨ ૩ ૪ ૫ ૬ ૭ ૮ ૯ ૱

The infrequently used characters come from these blocks.

Devanagari (2): । ॥
General Punctuation (8): ‐ – — † ‡ … ′ ″
Gujarati (1): ૰
Latin-1 Supplement (1): §

See also the Character usage lookup page, and the Script Comparison Table.

Languages using the Gujarati script

According to ScriptSource, the Gujarati script is used for the following languages:

References

  1. [ d ] Peter T. Daniels and William Bright, The World's Writing Systems, Oxford University Press, ISBN 0-19-507993-0
  2. [ ss ] ScriptSource, Gujarati
  3. [ u ] The Unicode Standard v11.0
  4. [ w ] Wikipedia, Gujarati alphabet
  5. [ wp ] Wikipedia, Gujarati phonology
  6. [ z ] Vinodh Rajan, Proposal to encode Gujarati Letter ZHA