Updated 2 April, 2019 • tags bengali, scriptnotes
This page provides basic information about the Bengali script, and its use for the Bangla language. It is not authoritative, peer-reviewed information – these are just notes I have gathered or copied from various places as I learned. For character-specific details follow the links to the Bengali character notes.
For similar information related to other scripts, see the Script comparison table.
ধারা ১ সমস্ত মানুষ স্বাধীনভাবে সমান মর্যাদা এবং অধিকার নিয়ে জন্মগ্রহণ করে। তাঁদের বিবেক এবং বুদ্ধি আছে; সুতরাং সকলেরই একে অপরের প্রতি ভ্রাতৃত্বসুলভ মনোভাব নিয়ে আচরণ করা উচিত।
ধারা ২ এ ঘোষণায় উল্লেখিত স্বাধীনতা এবং অধিকারসমূহে গোত্র, ধর্ম, বর্ণ, শিক্ষা, ভাষা, রাজনৈতিক বা অন্যবিধ মতামত, জাতীয় বা সামাজিক উত্পত্তি, জন্ম, সম্পত্তি বা অন্য কোন মর্যাদা নির্বিশেষে প্রত্যেকেরই সমান অধিকার থাকবে। কোন দেশ বা ভূখণ্ডের রাজনৈতিক, সীমানাগত বা আন্তর্জাতিক মর্যাদার ভিত্তিতে তার কোন অধিবাসীর প্রতি কোনরূপ বৈষম্য করা হবেনা; সে দেশ বা ভূখণ্ড স্বাধীনই হোক, হোক অছিভূক্ত, অস্বায়ত্বশাসিত কিংবা সার্বভৌমত্বের অন্য কোন সীমাবদ্ধতায় বিরাজমান।
The Bengali (also called Bangla) script is used for writing the Bengali language, spoken by over 180,000,000 people mostly in Bangladesh and India. It is also used for a number of other Indian languages including Sylheti and, with one or two modifications, Assamese. It is a Brahmic script although its exact derivation is disputed. Bengali writing shares some similarities with the Dravidian-language scripts, particularly in the shapes of some vowel letters, but it is generally more similar to the Aryan-language scripts, in particular Devanagari.
The Bengali alphabet or Bangla alphabet (Bengali: বাংলা বর্ণমালা, bangla bôrnômala) or Bengali script (Bengali: বাংলা লিপি, bangla lipi) is the writing system for the Bengali language and, together with the Assamese alphabet, is the fifth most widely used writing system in the world. The script is used for other languages like Meithei and Bishnupriya Manipuri, and has historically been used to write Sanskrit within Bengal.
Bengali is an abugida, and is called বাংলা baɱla in the Bangla language. See the table to the right for a brief overview of features, taken from the Script Comparison Table.
An abugida is characterised by consonant characters that include an inherent vowel sound. The inherent vowel can be overridden using vowel signs appended to the character. There are also independent vowel letters to represent vowels that are not preceded by consonants. The syllable is the unit for various aspects of the behaviour of the script.
The alphabet is split into vowels and consonants. With one exception (ɔ-kar), each vowel is represented by both an independent version and a combining vowel sign.
Text runs horizontally, left to right, and lines typically break at the spaces between words. The script has no upper-/lowercase distinction.
The basic unit for text segmentation is the syllable. Unicode grapheme clusters don't cover consonant clusters, so some additional processing is needed to identify text unit boundaries.
The Bengali script characters in Unicode 12.0 are in the following block:
The following links give information about characters used for languages associated with this script. The numbers in parentheses are for non-ASCII characters.
For character-specific details see Bengali character notes.
The effective unit of the Bengali writing systems is the orthographic syllable.
An orthographic syllable can be defined by the following sequence of code points. Lowercase letters represent combining characters. Some vowel signs may be displayed at the start of the sequence, although the code points representing them always appear after the base consonant.
[C[n]h] [C[n]h (ᶻʷʲ)] C[n] [h | (ᶻʷⁿʲ) v [v] (c)] [f]
The core of a consonant-based syllable is a base consonant character, which may or may not additionally represent an inherent vowel if it stands alone. If it is followed by a vowel-sign or hasant, there is no inherent vowel. At the end of a word, there may or may not be an inherent vowel, whether or not there is a hasant.
The base consonant may also be a combination of consonant code point plus nukta.
The base consonant can be preceded by up to two consonant+hasant pairs (where the consonant may also be a combination of consonant+nukta), but only if those consonants form conjuncts (ie. the hasant is invisible), eg. ক্ক,ম্প,ক্ষ, ন্ত্র k͓k,m͓p,k͓ʃ̇, n͓t͓r. If the preceding consonants carry visible hasant symbols, those are treated as separate orthographic syllables.
In some cases, a RA + hasant followed by YA may introduce a ZWJ character after the hasant, in order to specify special shaping rules for the YA.
Note that the variable use of the hasant in Bengali means that a phonetic cluster of consonants can constitute a larger series of orthographic syllables. For example, করতাল krtɑl kɔrtɑl has two phonetic syllables, but 3 orthographic since the rt combination is not combined.
The base consonant can be followed by either one or two code points representing vowel-signs, eg. কী, কি, কো, or instead by a hasant, eg. ক্ k͓. Vowel code points may be followed by a nasalisation diacritic.
A base consonant may be followed by ZWNJ + vowel code point where the author wants to prevent ligation of the following vowel sign, eg. শু ʃᶻʷⁿʲu.
Unless the base consonant is followed by a hasant, the syllable may end with a final consonant represented by khanda ta, anusvara, or visarga.
Vowel-based syllables begin with a standalone vowel, which is represented by a single independent vowel or vocalic.
An independent vowel may be followed by an anusvara, visarga or candrabindu (nasalisation), eg. উঃ, আঁ ụh̽, ɑm̽.
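The pattern described above can be approximated with a regular expression. The following is a minimal sketch in Python, not a definitive implementation: the character classes and the names (`orthographic_syllables`, `CONS`, `VSIGN`, etc.) are my own assumptions, and because the code cannot know whether a font shows a hasant visibly, it treats every consonant+hasant followed by a consonant as part of a conjunct.

```python
import re

ZWJ, ZWNJ = "\u200D", "\u200C"
NUKTA, HASANT, CANDRABINDU = "\u09BC", "\u09CD", "\u0981"

# Approximate character classes (assumptions made for this sketch).
CONS = "[\u0995-\u09B9\u09DC\u09DD\u09DF\u09F0\u09F1]"    # consonant letters
VSIGN = "[\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC\u09D7]"   # vowel signs
FINAL = "[\u0982\u0983\u09CE]"                            # anusvara, visarga, khanda ta
INDEP = "[\u0985-\u0994\u09E0\u09E1]"                     # independent vowels, vocalics

HALF = f"{CONS}{NUKTA}?{HASANT}"      # consonant (+ nukta) + hasant
BASE = f"{CONS}{NUKTA}?"              # base consonant (+ nukta)
VOWEL_PART = f"(?:{HASANT}|{ZWNJ}?{VSIGN}{{1,2}}{CANDRABINDU}?)"

# [C[n]h] [C[n]h (zwj)] C[n] [h | (zwnj) v [v] (c)] [f]
CONS_SYLLABLE = f"(?:{HALF})?(?:{HALF}{ZWJ}?)?{BASE}{VOWEL_PART}?{FINAL}?"
# Independent vowel, optionally followed by candrabindu, anusvara or visarga.
VOWEL_SYLLABLE = f"{INDEP}[\u0981\u0982\u0983]?"

# Anything that fits neither pattern is returned as a single character.
TOKEN = re.compile(f"{CONS_SYLLABLE}|{VOWEL_SYLLABLE}|.", re.DOTALL)

def orthographic_syllables(text):
    """Split text into approximate orthographic syllables."""
    return TOKEN.findall(text)

# বাংলা is split into two units: বাং and লা.
print(orthographic_syllables("\u09AC\u09BE\u0982\u09B2\u09BE"))
# ক্ষুদ্র is split into two units: ক্ষু and দ্র.
print(orthographic_syllables("\u0995\u09CD\u09B7\u09C1\u09A6\u09CD\u09B0"))
```

A real implementation would need to take into account which conjuncts the font actually forms, since consonants with a visible hasant count as separate orthographic syllables.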
Text is normally written horizontally, left to right.
Each consonant symbol has an inherent following vowel sound, typically transcribed as a, and pronounced as ɔ or o. (And sometimes halfway between these two, when influenced by surrounding sounds.) Bengalis are not always aware of these sound differences – thinking of this as one sound. So ক [U+0995 BENGALI LETTER KA] is actually pronounced kɔ or ko.
Note that there is also a vowel pronounced o. This can lead to inconsistent spellings, eg. bhalo, good, well, can be spelled either ভালো or ভাল. Verb forms tend to be particularly inconsistent, with the choice sometimes based on what looks good in a particular context.
The rules for determining the sound of the inherent vowel are not simple. Partly it is a question of vowel harmony. The following tendencies can help:
In words with inherent vowels in two consecutive syllables, the sound will usually be ɔ..o, not o..ɔ, eg. গরম gɔrôm, hot. However, exceptions occur for prefixes, such as prɔ-, ɔ-, and sɔ-. [Radice] p8
When pronounced at the end of a word after a conjunct consonant, the inherent vowel is always o, eg. যুদ্ধ ýud͓dʰ ʤuddʰo war. [Radice] p8
The pronunciation tends to be o when followed by one of i, j, u, w either immediately or in the next syllable, but ɔ otherwise. [Daniels] p400
The inherent vowel is pronounced at the end of some words and not others, eg. গরম gɔrôm, hot vs. গড়ান gɔɽɑnô, to roll. There is no real way to tell when it is pronounced and when not in this position, except that it is usually pronounced following a word-final consonant cluster.
Bengali uses ্ [U+09CD BENGALI SIGN VIRAMA] (called hasant in Bengali) to kill the inherent vowel after a consonant. The virama is rarely seen. As just mentioned, no virama is used at the end of a word, or in many other situations.
The virama is used when the consonant is part of a consonant cluster, but there too it is usually hidden (see the discussion of consonant clusters below).
The virama is visible, however, if it isn't followed by a consonant, eg. ক্ k͓ explicitly represents just the sound k.
Refs: Radice 3, 7-8, 21, 148; Daniels 400
The pronunciation of a vowel can be affected by the vowel in the following syllable. Radice provides the following table, though this is a simplification and there are many exceptions.
| Followed by i or u | Followed by ɔ, o, e or a |
| o → u | o → ɔ |
| ɔ → o | u → o |
| e → i | e → æ |
| æ → e | i → e |
For example, the verb শোনা ʃonɑ to hear with an i ending becomes ʃuni, দেখা dækʰa to see becomes dekʰi, etc. This sometimes accounts for the pronunciation of the inherent vowel, eg. অতিথি otitʰi guest and অনুবাদ onubad translation start with o rather than ɔ.
To produce a different vowel than the inherent one, Bengali attaches vowel-signs (Sanskrit matra) to the preceding consonant, eg. কী kiː.
Bengali vowel signs are all combining characters. In principle a single character is used per base consonant, but 2 vowel signs decompose to more than one character.
All vowel-signs are typed and stored after the base consonant, whether or not they precede it when displayed. The font takes care of the glyph positioning.
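As a rough illustration of this storage order, the following Python sketch lists the code points of কি, where the vowel sign is displayed to the left of the consonant but stored after it. It uses only the standard `unicodedata` module; the helper name `describe` is made up for this example.

```python
import unicodedata

def describe(text):
    """Return (code point, character name) pairs in storage order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# কি: the vowel sign is rendered before KA, but stored after it.
for code_point, name in describe("\u0995\u09BF"):
    print(code_point, name)
```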
Bengali has lost the distinction between short and long vowels in pronunciation, but retains the difference in spelling.
Almost all of the vowel-signs are spacing combining characters, meaning that they consume horizontal space when added to a base consonant.
See also vocalics.
The following list shows where vowel-signs are positioned around a base consonant to produce vowels, and how many instances of that pattern there are. The figure after the + sign represents combinations of Unicode characters.
Two vowel-signs have parts on both sides of the base consonant or cluster. It is possible to construct these vowel-signs from a sequence of individual shapes, and in fact in Normalisation Form D they do decompose, but the Unicode Standard recommends that authors use the precomposed code points.
| ে + ৗ | 09C7 BENGALI VOWEL SIGN E + 09D7 BENGALI AU LENGTH MARK | ৌ | 09CC BENGALI VOWEL SIGN AU |
| ে + া | 09C7 BENGALI VOWEL SIGN E + 09BE BENGALI VOWEL SIGN AA | ো | 09CB BENGALI VOWEL SIGN O |
Whether you use a single code point or two, the vowel-signs always come after the base consonant when typing and storing. You also need to enter and store the vowel-sign components in left-to-right order.
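The relationship between the precomposed and decomposed forms can be checked with Python's standard `unicodedata` module. This is only a small sketch; the helper name `decomposed` is an assumption of the example.

```python
import unicodedata

VOWEL_SIGN_O = "\u09CB"
VOWEL_SIGN_AU = "\u09CC"

def decomposed(ch):
    """Return the NFD form of ch as a list of code point labels."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]

print(decomposed(VOWEL_SIGN_O))    # E followed by AA
print(decomposed(VOWEL_SIGN_AU))   # E followed by AU LENGTH MARK

# NFC recombines the two parts into the precomposed vowel sign,
# here for KA + VOWEL SIGN E + VOWEL SIGN AA.
ko_decomposed = "\u0995\u09C7\u09BE"
ko_composed = unicodedata.normalize("NFC", ko_decomposed)
print(len(ko_decomposed), len(ko_composed))
```

Normalising text to NFC before comparing or searching is one way to make sure both spellings of these vowel-signs match.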
ঁ [U+0981 BENGALI SIGN CANDRABINDU] nasalises the vowel in a syllable, eg. হ্যাঁ h͓ýɑm̽ yes, হাঁপান hɑm̽pɑn hɑ̃pɑn to pant.
This appears over the top of an independent vowel, but over the base consonant when a vowel sign is attached, not over the vowel sign. In the sequence of characters, however, it should occur after any combining vowel sign associated with the same syllable.
This positioning is not evident when using the Noto Sans webfont, but if you apply another font, such as Kohinoor Bangla, it appears in the correct location. Note how the base consonant is identified correctly in the second word below, even though it is 4 code points away.
Sometimes vowel signs (particularly u) form ligatures with a preceding base consonant. The table below shows ligated and non-ligated forms for several combinations. In certain contexts it may not be appropriate to ligate (eg. newspapers and modern typefaces). Both forms are equivalent in every way but visually.
The default behaviour of a given font can be modified using the zero-width non-joiner character in Unicode content. For example, a font that produces the ligature গু gu can be made to show the simpler form গু by the sequence গ + ZWNJ + ু [U+0997 BENGALI LETTER GA + U+200C ZERO WIDTH NON-JOINER + U+09C1 BENGALI VOWEL SIGN U].
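In code, the sequence is just a matter of placing the ZWNJ between the consonant and the vowel sign. The sketch below shows this in Python; the function name `unligated` is hypothetical, and whether the result actually looks different depends entirely on the font.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER

def unligated(consonant, vowel_sign):
    """Build consonant + ZWNJ + vowel sign to ask the font not to ligate."""
    return consonant + ZWNJ + vowel_sign

# GA + ZWNJ + VOWEL SIGN U
gu = unligated("\u0997", "\u09C1")
print([f"U+{ord(ch):04X}" for ch in gu])
```

Note that the ZWNJ is part of the stored text, so a search for the plain two-character sequence will not match unless the ZWNJ is ignored or stripped first.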
Bengali represents syllable-initial vowels using a set of independent vowel letters.
One vocalic is in common use, shown here in standalone and vowel-sign forms.
Three more vocalics, ঌ [U+098C BENGALI LETTER VOCALIC L], ৠ [U+09E0 BENGALI LETTER VOCALIC RR] and ৡ [U+09E1 BENGALI LETTER VOCALIC LL], are historic and only used to write Sanskrit in Bengali.
Two more characters are specifically for Assamese.
The consonant cluster ক্ষ k͓ʃ̇ is called khiyɔ and is often treated as a letter of the alphabet in that some dictionaries give it its own section, eg. ক্ষুদ্র k͓ʃ̇ud͓r kʰudro small.
় [U+09BC BENGALI SIGN NUKTA] is used to create 3 additional letters, eg. the dot changes ড ɖ to ড় ɽ. Here is a list of graphemes that combine nukta with an existing consonant.
The Unicode Standard recommends that content authors use decomposed sequences for these letters. However, the Unicode block also contains the precomposed code points shown below.
Decomposed sequences are not recomposed by Unicode Normalisation Form C (NFC).
The nukta should always be typed and stored immediately after the consonant it modifies, and before any combining vowels or diacritics.
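The normalisation behaviour can be observed with Python's standard `unicodedata` module. In this sketch the three precomposed code points (U+09DC, U+09DD and U+09DF) are taken from the Unicode code charts rather than from the text above, and the variable names are made up for the example.

```python
import unicodedata

NUKTA = "\u09BC"  # BENGALI SIGN NUKTA

# Precomposed letter -> base consonant (assumed from the Unicode charts).
PRECOMPOSED = {
    "\u09DC": "\u09A1",  # RRA = DDA + nukta
    "\u09DD": "\u09A2",  # RHA = DDHA + nukta
    "\u09DF": "\u09AF",  # YYA = YA + nukta
}

for precomposed, base in PRECOMPOSED.items():
    nfc = unicodedata.normalize("NFC", precomposed)
    # NFC yields the decomposed sequence, even for precomposed input.
    print(unicodedata.name(precomposed), [f"U+{ord(c):04X}" for c in nfc])
```

One consequence is that normalising to NFC is enough to make the precomposed and decomposed spellings compare as equal.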
One letter and 2 diacritics represent syllable-final consonant sounds.
ৎ [U+09CE BENGALI LETTER KHANDA TA] is a variant form of ত [U+09A4 BENGALI LETTER TA] that was added to Unicode 4.1 as a separate character. In some words a tɔ that has no following inherent or other vowel may have this shape. It either comes at the end of words, eg. হঠাৎ hʈʰɑt̽ hɔʈʰɑt suddenly, or before a consonant that doesn't naturally combine with tɔ, eg. উৎসব ụt̽ʃ̈b utʃɔb festival or সৎমা ʃ̈t̽mɑ ʃɔtmɑ step-mother. Many such words, however, use ত, eg. সদাত্মা ʃ̈dɑt͓mɑ ʃadɑtːɑ. It's not possible to say which will be used.
ং [U+0982 BENGALI SIGN ANUSVARA] is a final nasal, eg. বাংলা bɑŋ̽lɑ bɑŋlɑ Bengali (language).
Sometimes spelling is inconsistent, especially when this or ঙ [U+0999 BENGALI LETTER NGA] are used in a conjunct, eg. সাঙঘাতিক ʃ̈ɑŋgʰɑtik or সাংঘাতিক ʃ̈ɑŋ̽gʰɑtik ʃ̈ɑŋgʰɑtik terrible; রঙ rŋ or রং rŋ̽ rɔŋ colour. However, in certain words the spelling is fixed. বাংলা is one such word. But, since this cannot support vowel signs, the word for Bengali nation (rather than language) has to be spelled with ঙ [U+0999 BENGALI LETTER NGA], ie. বাঙালী bɑŋɑlī bɑŋgɑlī.
ঃ [U+0983 BENGALI SIGN VISARGA] has two different effects:
The only common colloquial words containing this sign are বাঃ bɑh̽ bɑːɦ left and দুঃখ duh̽kʰ dukkʰo sorrow.
In the sequence of characters, these should all occur after any combining vowel sign associated with the same syllable. None carry vowel signs.
See also the candrabindu diacritic, which nasalises a vowel.
The absence of vowels between consonants can be represented in the following ways:
Unlike languages written in the Devanagari script, consonant clusters are often not represented as conjuncts in Bengali. It is necessary to just know that the vowel should not be pronounced, eg. রিকশা rikʃɑ rikʃɑ rickshaw. Grammatical suffixes and endings are typically written without conjuncts, eg. খাননা kʰɑnnɑ kʰɑnnɑ, which is the present tense form khan plus negative suffix na; করছ krcʰ korcʰo, which is stem kôr from kôra plus present continuous ending chô.
In all cases except the one just described, the underlying mechanism in terms of code points involves adding ্ [U+09CD BENGALI SIGN VIRAMA] (called হসন্ত hʃ̈n͓t hɔʃonto in Bengali) between the consonants in the cluster, eg. ক্ষ k͓ʃ̇ is produced by the sequence ক + ্ + ষ [U+0995 BENGALI LETTER KA + U+09CD BENGALI SIGN VIRAMA + U+09B7 BENGALI LETTER SSA].
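As a minimal sketch of that mechanism in Python, a cluster can be built by joining the consonants with the virama. The function name `cluster` is hypothetical; how the result is displayed (conjunct or visible virama) is up to the font.

```python
HASANT = "\u09CD"  # BENGALI SIGN VIRAMA

def cluster(*consonants):
    """Join consonants with a virama between each pair."""
    return HASANT.join(consonants)

# KA + VIRAMA + SSA
kssa = cluster("\u0995", "\u09B7")
print([f"U+{ord(ch):04X}" for ch in kssa])

# Three consonants, as in the n-t-r cluster: NA + VIRAMA + TA + VIRAMA + RA
ntra = cluster("\u09A8", "\u09A4", "\u09B0")
print(len(ntra))
```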
The font usually determines which visual method is used, although it is possible to influence this (see below).
Quite often, clustered consonants are pronounced differently than you would expect. In particular, in conjuncts ending with ব [U+09AC BENGALI LETTER BA] or ম [U+09AE BENGALI LETTER MA], the final consonant tends not to be pronounced; instead, the consonant before it is doubled in length.
Nasals in conjuncts tend to conform to phonological patterns. Velar consonants (k, kh, g, etc.) combine with ঙ ŋɔ, palatal consonants (c, ch, etc.) with ঞ ñɔ, retroflex consonants with ণ ɳɔ, dental consonants with ন nɔ, and labial consonants with ম mɔ.
This set of tables shows conjunct forms for a range of characters. It is not exhaustive. The shapes shown here are those contained in the Noto Sans Bengali font. Other fonts may combine components in different ways.
Components arranged vertically.
Components arranged side-by-side, frequently with simplification of the initial consonant.
Arranged in a way that involves ligation, significantly altering one or more of the components.
Conjuncts ending with র r typically have a wavy line below.
Conjuncts ending with ম m typically have a long component to the right.
Conjunct shapes are most commonly formed by arranging the components vertically, reducing and combining the shapes of the individual components as needed, eg. স+থ→স্থ ʃ͓̈tʰ in আস্থা ɑʃ͓̈tʰɑ ɑʃtʰɑ trust, or ল+ল→ল্ল l͓l in ঝিল্লি ʤʰil͓li ʤʰilli grasshopper.
Many conjuncts are formed by combining components horizontally, eg. ম+প→ম্প m͓p in ক্যম্পাস k͓ým͓pɑʃ̈ kyæmpas campus, or চ+চ→চ্চ c͓c in উচ্চারণ ục͓cɑrn̈ ut͡ʃɑrɔn pronunciation.
A small set of conjuncts combines the consonants into a ligated shape, where individual components can't be easily discerned, eg. ষ+ট→ষ্ট ʃ͓̇ʈ in খ্রিষ্টান kʰ͓riʃ͓̇ʈɑn kʰriʂʈan Christian, or ক+ষ→ক্ষ k͓ʃ̇ in ক্ষণ k͓ʃ̇n̈ kʃon moment.
Different fonts may combine the same letters in different ways. The following figures show characters that are combined in different ways by different fonts.
Like other scripts, Bengali displays র [U+09B0 BENGALI LETTER RA] in a non-standard way in consonant clusters.
A rɔ that is pronounced at the start of a cluster is displayed as a mark above the following consonant(s), eg. rt in গর্ত gɔrtô hole. Unlike Devanagari, it doesn't appear to be displayed above the vowel-sign of the orthographic syllable, eg. কুর্তা kur͓tɑ kurtɑ Indian shirt. Like other consonant clusters, it may not involve a conjunct at all, eg. কারসাজি kɑrʃ̈ɑʤi kɑrʃɑʤi trickery.
A trailing rɔ is typically displayed as a wavy line below the other consonants, eg. gr in গ্রাম gram village.
A cluster-final m is also displayed in a characteristic way, as a long line to the right with an appendage to the left at the bottom, eg. উন্মত্ত ụn͓mt͓t unmɔtto insane.
Bengali also has a particular way of representing a cluster-final j semi-vowel. This is typically represented using the full form of the preceding consonant followed by a special form of য [U+09AF BENGALI LETTER YA], ্য, known as y̌ɔ-phɔla, eg. হ্যাঁ hyæ̃ yes.
The effect of yo-phola at the end of a conjunct is generally (a) to double the length of the preceding consonant, and (b) to change the value of the following vowel if it is inherent or a. For more details, see the character notes.
When the virama is used it may be because the font doesn't have a particular conjunct ligature, but it may also be visible in places where the phonology is unusual, eg. ফ্ল্যাট pʰ͓l͓ýɑʈ pʰlæʈ flat; লান্চ lɑn͓c lɑnt͡ʃ lunch (though these may also be spelled with conjuncts, eg. ফ্ল্যাট pʰ͓l͓ýɑʈ). It is also quite common to see উদ্যাপন ụd͓ýɑpn to distinguish it from words like উদ্যান ụd͓ýɑn. These words are etymologically related, but distinct phonetically.
Again, if a visible virama is wanted but the font doesn't produce one by default, it can be forced by inserting a ZWNJ character after the virama.
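A small Python sketch of that technique follows. The function name `force_visible_hasant` and the consonant range used in the lookahead are assumptions of this example; it inserts a ZWNJ after every virama that is directly followed by a consonant.

```python
import re

HASANT = "\u09CD"  # BENGALI SIGN VIRAMA
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER

# A virama that is immediately followed by a consonant letter
# (approximate range, assumed for this sketch).
CLUSTER_HASANT = re.compile(f"{HASANT}(?=[\u0995-\u09B9])")

def force_visible_hasant(text):
    """Insert ZWNJ after each virama that sits between two consonants."""
    return CLUSTER_HASANT.sub(HASANT + ZWNJ, text)

# LA + AA + NA + VIRAMA + CA, with the n-c cluster kept unjoined.
lunch = force_visible_hasant("\u09B2\u09BE\u09A8\u09CD\u099A")
print([f"U+{ord(ch):04X}" for ch in lunch])
```

Applying the function twice gives the same result as applying it once, because a virama already followed by ZWNJ no longer matches.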
As mentioned earlier, Bengali represents some final consonants using diacritics. Such syllable-final diacritics are followed by ordinary consonant shapes in consonant clusters.
Besides the vowels and consonants described above, the Unicode Bengali block contains the following letters.
ঽ [U+09BD BENGALI SIGN AVAGRAHA] is a Sanskrit-derived symbol that is used in modern Bengali to lengthen vowel sounds, eg. কিঽঽঽ? kiiii Whaaatt?, or শুনঽঽঽ ʃunooo Listennn.
The other letters don't appear to be commonly used in Bangla.
Apart from the vowel-signs, the Bengali block contains the following combining characters, all of which are described elsewhere on this page.
Unicode 11 also introduced the following sandhi mark for use with Sanskrit.
The only punctuation in the Unicode Bengali block is this abbreviation symbol (see the notes on abbreviations below).
Bengali also uses two punctuation marks from the Devanagari block. See the notes on phrase-level punctuation below.
The following symbols are used for the modern Bangla language.
The first is the Bengali currency symbol (see currency). The second, ৺ [U+09FA BENGALI ISSHAR] is used alongside the names of deceased persons.
The Bengali block contains 2 more symbols, both also associated with currency.
U+200C ZERO WIDTH NON-JOINER (ZWNJ) can be used to force the production of a visible virama, rather than a half-form (see the discussion of the visible virama above). It can also be used to prevent the formation of vowel ligatures (see the discussion of vowel ligatures above).
U+200D ZERO WIDTH JOINER (ZWJ) is used to produce special joining forms for YA (see the description of consonant-based syllables above).
Bengali has a set of native digits, which are used regularly in text. They are decimal-based.
See also the section Counters below.
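Because the native digits are ordinary decimal digits in Unicode, many programming environments can read them directly. As a small sketch, Python's built-in `int()` and the standard `unicodedata` module both understand them; the sample string is just an illustration.

```python
import unicodedata

year = "\u09E8\u09E6\u09E7\u09EF"  # ২০১৯ written with Bengali digits

# int() accepts any Unicode decimal digits, not just ASCII ones.
value = int(year)
print(value + 1)

# Each digit character carries its numeric value as a Unicode property.
print([unicodedata.digit(ch) for ch in year])
```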
৳ [U+09F3 BENGALI RUPEE SIGN] is the Bengali rupee sign.
There are also a number of currency symbols, used in older texts, including ৲ [U+09F2 BENGALI RUPEE MARK] and the following currency denominator signs.
These were used in an additive/subtractive system for specifying the number of ānā in the Bengali notation for currency used up to 1957, eg. ৷৷৶৹ 11 ānā (11 ana); ৸৶৹ 15 ānā (15 ana). There are 16 ana in one rupee, and the system works in multiples of 4. For a detailed explanation of usage, see [Pandey].
Words are separated by spaces.
The danda, । [U+0964 DEVANAGARI DANDA], is used for sentence final punctuation. I haven't seen much evidence for the use of the double danda, ॥ [U+0965 DEVANAGARI DOUBLE DANDA].
Western punctuation, such as commas, semicolons, colons, quotation marks and hyphens, is also used quite commonly.
The bisɔrgô ঃ [U+0983 BENGALI SIGN VISARGA] is sometimes used to mark initial abbreviations.
A sign called urdha-comma can be used to indicate truncation of words, eg. কʼরে kô're after, and ʼপরে 'pôre above. The Unicode Standard recommends use of ʼ [U+02BC MODIFIER LETTER APOSTROPHE], but in Wikipedia a normal apostrophe seems to be used [Unicode] p460.
The Unicode block also has the punctuation ৽ [U+09FD BENGALI ABBREVIATION SIGN], but it's not clear to me how it is used. It doesn't appear to be in common use.
Italicisation, bolding, and underlining are not generally used for Bengali text.
Bengali uses a single counter style according to Ready-made Counter Styles.
The numeric style uses the Bengali digits: '০' '১' '২' '৩' '৪' '৫' '৬' '৭' '৮' '৯'.
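A numeric counter of this kind simply swaps each ASCII digit for the corresponding Bengali digit. The following Python sketch shows one way to do that; the function name `to_bengali_numeric` is made up for the example.

```python
BENGALI_DIGITS = "\u09E6\u09E7\u09E8\u09E9\u09EA\u09EB\u09EC\u09ED\u09EE\u09EF"  # ০ to ৯

# Translation table from ASCII digits to Bengali digits.
TO_BENGALI = str.maketrans("0123456789", BENGALI_DIGITS)

def to_bengali_numeric(number):
    """Format an integer using Bengali digits."""
    return str(number).translate(TO_BENGALI)

# Counter values for the first three list items.
print([to_bengali_numeric(n) for n in (1, 2, 3)])
print(to_bengali_numeric(2019))
```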
You can experiment with these styles using the Counter styles converter.
Initial letter styling can be applied to Bengali text.