Bengali (Bangla)
orthography notes

Updated 1 April, 2025 • recent changes scripts/beng/bn • leave a comment

This page brings together basic information about the Bengali script and its use for the Bangla language. It aims to provide a brief, descriptive summary of the modern, printed orthography and typographic features, and to advise how to write Bengali using Unicode.

Referencing this document

Richard Ishida, Bengali (Bangla) Orthography Notes, 01-Apr-2025, https://r12a.github.io/scripts/beng/bn

 

Phonological transcriptions should be treated as a guide, only. They are taken from the sources consulted, and may be narrow or broad, phonemic or phonetic, depending on what is available. They mostly represent pronunciation of words in isolation. For more detailed information about allophones, alternations, sandhi, dialectal differences, and so on, follow the links to cited references.

This is an interactive document. Click/tap on the following to reveal detailed information and examples for each character: (a) coloured characters in examples and lists; (b) link text on character names. If your browser supports it, your cursor will change as you hover over these items.

More about using this page

Character names. The names of characters in codepoint markup drop the initial BENGALI label (purely to reduce the length of the examples). In other places the full name can be found.

Navigation. The Toggle images icon opens the table of contents in a popup window. Dismiss it by clicking on the X alongside it, or by hitting the ESC key.

Detailed character notes. Clicking on coloured characters in lists or on character names opens panels that give detailed information about each character. This information is taken from the companion document, Bengali Character Notes. (Those panels can be dismissed by pressing on the ESC key.)

Transcriptions & transliterations. Phonological transcriptions are surrounded by ⌈corner brackets⌋, to indicate that they vary between narrow, [phonetic] and broad, /phonemic/ transcriptions.
Latin transcriptions between <angle brackets> represent the letters as commonly written in the Latin script.
A transliteration has also been developed especially for this orthography, and is generally based on the sound of a letter where possible, but where a letter has multiple pronunciations, the transliteration represents only one.
Transliterations provide perfect round-trip conversion between the native script and Latin, whereas Latin transcriptions rarely do.
When you click on an example to see its composition, the top of the panel that opens contains a transliteration, followed by the native text, then (if available) an IPA transcription.


Sample


ধারা ১ সমস্ত মানুষ স্বাধীনভাবে সমান মর্যাদা এবং অধিকার নিয়ে জন্মগ্রহণ করে। তাঁদের বিবেক এবং বুদ্ধি আছে; সুতরাং সকলেরই একে অপরের প্রতি ভ্রাতৃত্বসুলভ মনোভাব নিয়ে আচরণ করা উচিত।

ধারা ২ এ ঘোষণায় উল্লেখিত স্বাধীনতা এবং অধিকারসমূহে গোত্র, ধর্ম, বর্ণ, শিক্ষা, ভাষা, রাজনৈতিক বা অন্যবিধ মতামত, জাতীয় বা সামাজিক উত্‍পত্তি, জন্ম, সম্পত্তি বা অন্য কোন মর্যাদা নির্বিশেষে প্রত্যেকের‌ই সমান অধিকার থাকবে। কোন দেশ বা ভূখণ্ডের রাজনৈতিক, সীমানাগত বা আন্তর্জাতিক মর্যাদার ভিত্তিতে তার কোন অধিবাসীর প্রতি কোনরূপ বৈষম্য করা হবেনা; সে দেশ বা ভূখণ্ড স্বাধীন‌ই হোক, হোক অছিভূক্ত, অস্বায়ত্বশাসিত কিংবা সার্বভৌমত্বের অন্য কোন সীমাবদ্ধতায় বিরাজমান।

Source: Unicode UDHR, articles 1 & 2

Usage & history

Origins of the Bengali script, 11thC – today.

Phoenician
└ Aramaic
└ Brahmi
└ Gupta
└ Siddham
└ Gaudi
└ Bengali
+ Oriya
+ Tirhuta
+ Nagari
+ Nepalese

The Bengali or Bangla script is used by over 180 million people in Bangladesh and India to write the Bengali language, and a number of other Indian languages including Sylheti, Meithei, Bishnupriya Manipuri, and, with one or two modifications, Assamese. It has historically been used to write Sanskrit within Bengal. It ranks 5th in the world for writing system usage.

বাংলা বর্ণমালা bɑŋ̽lɑ br͓n̈mɑlɑ (bangla bôrnômala) Bengali/Bangla alphabet
বাংলা লিপি bɑŋ̽lɑ lipi (bangla lipi) Bangla script

The script is descended from Brahmi, but there is some dispute about its derivation, since it shares shapes with both Dravidian and Aryan scripts.

More information: Scriptsource, Wikipedia

Script code: beng
Language code: bn
Script type: abugida
Origins: asia
Native speakers: 260,000,000

Total characters: 84
Letters: 49
Combining marks: 15
Symbols: 2
Punctuation: 6
Numbers: 10
Other: 2
Possible other: 7
Unicode blocks: 1

Character counts above are for this orthography but exclude ASCII.

Text direction: ltr
Post-consonant vowels: 2 inherent vowels, marks, pre-base marks, circumgraphs
Standalone vowels: letters
Case distinction: no
Cursive script: no
Combining marks: >1 per base
Clusters marked: yes
Dedicated medials: marks, letters
Dedicated finals: marks, letters
Consonant clusters: ligated glyphs, stacks, conjoined glyphs, killer type: v
Other ligatures: yes
Word separator: space
Wraps at: word
Hyphenation: (yes) -
G Clusters OK?: no
Justification: spaces
Baseline: hang

Basic features

The Bengali script is an abugida. Consonants carry an inherent vowel which can be modified by appending vowel signs to the consonant. See the table to the right for a brief overview of features for Bangla.

The orthographic letters of the Bengali script are derived from Sanskrit, and in some cases don't quite fit the needs of modern Bangla (eg. lack of simple vowels for the sounds ɛ and æ, letters for only 2 of many diphthongs, long and short letters where pronunciation no longer distinguishes those sounds, etc.)

Bengali runs left to right in horizontal lines. Words are separated by spaces. There are no case distinctions.

❯ Consonant summary table

The 33 consonant letters used for Bangla are supplemented by repertoire extensions for 3 more sounds by applying the nukta diacritic to characters.

Consonant clusters at any location are normally indicated using a virama (hasant) between consonants. This results in a large number of conjunct forms expressed using stacked consonants, conjoined consonants, and ligated glyphs. Bengali stacked conjuncts are usually the same height as ordinary letters. Conjuncts often have different pronunciations than might be expected from the letters involved, and in particular gemination is very common. Occasionally, a visible virama is used; however, clusters are often not marked at all.

Syllable-final consonant sounds may be represented by a special letter, U+09CE LETTER KHANDA TA, or by 2 dedicated combining marks (anusvara & visarga), but are generally ordinary consonants that are not marked by a virama.

❯ Vowel summary table

The Bangla orthography is an abugida with 2 inherent vowels, pronounced ɔ and o. Other post-consonant vowels are written using 9 combining marks (vowel signs) and a specialised use of the y consonant letter.

Vowel harmony plays a significant role in the pronunciation of vowel-related code points.

Vowels may be nasalised, using U+0981 SIGN CANDRABINDU.

There are 3 pre-base and 2 circumgraph vowel signs. In principle, there are no multipart vowels; however, in decomposed text each of the 2 circumgraphs splits into 2 parts.

Standalone vowels are written using 10 independent vowel letters, one for each vowel sound, including the inherent vowel and 2 diphthongs. The final sound of numerous diphthongs is also represented using independent vowels.

Bengali has native digits.

Character index

The index points to locations where a character is mentioned in this page, and indicates whether it is used by the Bengali orthography described here.

Letters

Basic consonants

list all 32
U+09AA BENGALI LETTER PA (consonant): p p
U+09AC BENGALI LETTER BA (consonant): b ∅- b
U+09AB BENGALI LETTER PHA (consonant): pf pʰ f PH
U+09AD BENGALI LETTER BHA (consonant): bʰ v BH
U+09A4 BENGALI LETTER TA (consonant): t t
U+09A6 BENGALI LETTER DA (consonant): d d
U+09A5 BENGALI LETTER THA (consonant): TH
U+09A7 BENGALI LETTER DHA (consonant): DH
U+099F BENGALI LETTER TTA (consonant): ʈ
U+09A1 BENGALI LETTER DDA (consonant): ɖ
U+09A0 BENGALI LETTER TTHA (consonant): ʈʰ ṬH
U+09A2 BENGALI LETTER DDHA (consonant): ɖʰ ḌH
U+0995 BENGALI LETTER KA (consonant): k k
U+0997 BENGALI LETTER GA (consonant): ɡ g
U+0996 BENGALI LETTER KHA (consonant): KH
U+0998 BENGALI LETTER GHA (consonant): ɡʰ GH
U+099A BENGALI LETTER CA (consonant): t͡ʃ c
U+099B BENGALI LETTER CHA (consonant): t͡ʃʰ CH
U+099C BENGALI LETTER JA (consonant): d͡ʒ z j
U+099D BENGALI LETTER JHA (consonant): d͡ʒʰ JH
U+09AF BENGALI LETTER YA (consonant/consonant lengthener): d͡ʒ- -æ y
U+09B6 BENGALI LETTER SHA (consonant): ʃ s ś
U+09B7 BENGALI LETTER SSA (consonant): ʃ
U+09B8 BENGALI LETTER SA (consonant): ʃ s s
U+09B9 BENGALI LETTER HA (consonant): h h
U+09AE BENGALI LETTER MA (consonant): m m
U+09A8 BENGALI LETTER NA (consonant): n n
U+09A3 BENGALI LETTER NNA (consonant): n
U+099E BENGALI LETTER NYA (consonant): n ñ
U+0999 BENGALI LETTER NGA (consonant): ŋ ŋɡ
U+09B0 BENGALI LETTER RA (consonant): r ɾ r
U+09B2 BENGALI LETTER LA (consonant): l l

Extended consonants

list all 3
U+09DC BENGALI LETTER RRA (atomic consonant, infrequent): ɽ ɽ
U+09DD BENGALI LETTER RHA (atomic consonant, rare): ɽʱ ɽ
U+09DF BENGALI LETTER YYA (atomic consonant, infrequent): j e̯

Independent vowels

list all 10
U+0987 BENGALI LETTER I (independent vowel): i i̯ I
U+0988 BENGALI LETTER II (independent vowel): i iː Ī
U+0989 BENGALI LETTER U (independent vowel): u u̯ U
U+098A BENGALI LETTER UU (independent vowel): u Ū
U+098F BENGALI LETTER E (independent vowel): e æ E
U+0993 BENGALI LETTER O (independent vowel): o o̯ O
U+0985 BENGALI LETTER A (independent vowel): ɔ A
U+0986 BENGALI LETTER AA (independent vowel): a Ā
U+0990 BENGALI LETTER AI (independent vowel): oi̯ AI
U+0994 BENGALI LETTER AU (independent vowel): ou̯ AU

Vocalic

list
U+098B BENGALI LETTER VOCALIC R (independent vocalic): ri

Other

list all 3
U+09CE BENGALI LETTER KHANDA TA (final consonant): -t
U+09BD BENGALI SIGN AVAGRAHA (vowel lengthener)
ʼ U+02BC MODIFIER LETTER APOSTROPHE (apostrophe): ʼ

Not used for modern Bangla

list all 3
U+09E0 BENGALI LETTER VOCALIC RR (independent vocalic, archaic): Only used to write Sanskrit in Bengali.
U+098C BENGALI LETTER VOCALIC L (independent vocalic, archaic): Only used to write Sanskrit in Bengali. li
U+09E1 BENGALI LETTER VOCALIC LL (independent vocalic, archaic): Only used to write Sanskrit in Bengali.

Combining marks

Vowel signs

list all 9
ি U+09BF BENGALI VOWEL SIGN I (vowel sign): i e i
U+09C7 BENGALI VOWEL SIGN E (vowel sign): e æ e
U+09C8 BENGALI VOWEL SIGN AI (vowel sign): oi̯ ai
U+09CB BENGALI VOWEL SIGN O (vowel sign): o ʊ ɔ o
U+09CC BENGALI VOWEL SIGN AU (vowel sign): ou̯ au
U+09C0 BENGALI VOWEL SIGN II (vowel sign): i ī
U+09C1 BENGALI VOWEL SIGN U (vowel sign): u u
U+09C2 BENGALI VOWEL SIGN UU (vowel sign): u ū
U+09BE BENGALI VOWEL SIGN AA (vowel sign): a æ ā

Vocalic

list
U+09C3 BENGALI VOWEL SIGN VOCALIC R (vocalic vowel sign): ri

Other

list all 6
U+0981 BENGALI SIGN CANDRABINDU (vowel nasalisation marker):  ̃
U+0982 BENGALI SIGN ANUSVARA (final consonant)
U+0983 BENGALI SIGN VISARGA (final consonant/consonant lengthener)
U+09BC BENGALI SIGN NUKTA (nukta)
U+09CD BENGALI SIGN VIRAMA (virama)
U+09D7 BENGALI AU LENGTH MARK (length mark, deprecated)

Not used for modern Bangla

list all 3
U+09C4 BENGALI VOWEL SIGN VOCALIC RR (vocalic vowel sign, archaic): Only used to write Sanskrit in Bengali. R̥̄
U+09E2 BENGALI VOWEL SIGN VOCALIC L (vocalic vowel sign, archaic): Only used to write Sanskrit in Bengali. li
U+09E3 BENGALI VOWEL SIGN VOCALIC LL (vocalic vowel sign, archaic): Only used to write Sanskrit in Bengali. L̥̄

Numbers

list all 10
U+09E6 BENGALI DIGIT ZERO (digit)
U+09E7 BENGALI DIGIT ONE (digit): 1
U+09E8 BENGALI DIGIT TWO (digit): 2
U+09E9 BENGALI DIGIT THREE (digit): 3
U+09EA BENGALI DIGIT FOUR (digit): 4
U+09EB BENGALI DIGIT FIVE (digit): 5
U+09EC BENGALI DIGIT SIX (digit): 6
U+09ED BENGALI DIGIT SEVEN (digit): 7
U+09EE BENGALI DIGIT EIGHT (digit): 8
U+09EF BENGALI DIGIT NINE (digit): 9

Not used for Bangla

list all 6
U+09F4 BENGALI CURRENCY NUMERATOR ONE (currency numerator, archaic)
U+09F5 BENGALI CURRENCY NUMERATOR TWO (currency numerator, archaic)
U+09F6 BENGALI CURRENCY NUMERATOR THREE (currency numerator, archaic)
U+09F7 BENGALI CURRENCY NUMERATOR FOUR (currency numerator, archaic)
U+09F8 BENGALI CURRENCY NUMERATOR ONE LESS THAN THE DENOMINATOR (currency numerator, archaic)
U+09F9 BENGALI CURRENCY DENOMINATOR SIXTEEN (currency numerator, archaic)

Punctuation

list all 6
U+0964 DEVANAGARI DANDA (danda): .
U+0965 DEVANAGARI DOUBLE DANDA (double danda)
U+201C LEFT DOUBLE QUOTATION MARK (quotation mark)
U+201D RIGHT DOUBLE QUOTATION MARK (quotation mark)
U+2018 LEFT SINGLE QUOTATION MARK (quotation mark)
U+2019 RIGHT SINGLE QUOTATION MARK (quotation mark)

ASCII

list all 8
! U+0021 EXCLAMATION MARK (exclamation mark): !
( U+0028 LEFT PARENTHESIS (parenthesis): (
) U+0029 RIGHT PARENTHESIS (parenthesis): )
, U+002C COMMA (comma): ,
. U+002E FULL STOP (full stop): .
: U+003A COLON (colon): :
; U+003B SEMICOLON (semicolon): ;
? U+003F QUESTION MARK (question mark): ?

Symbols

list both
U+09F3 BENGALI RUPEE SIGN (rupee sign)
U+09FA BENGALI ISSHAR (death symbol)

Not used for Bangla

list both
U+09F2 BENGALI RUPEE MARK (rupee mark, unused)
U+09FB BENGALI GANDA MARK (unused)

Other

list both
ZWJ U+200D ZERO WIDTH JOINER (zero-width joiner)
ZWNJ U+200C ZERO WIDTH NON-JOINER (zero-width non-joiner)

To be investigated

list all 12
§ U+00A7 SECTION SIGN (section sign, tbc): §
U+034F COMBINING GRAPHEME JOINER (combining grapheme joiner, tbc)
U+0980 BENGALI ANJI (tbc)
U+09FC BENGALI LETTER VEDIC ANUSVARA (tbc)
U+09FD BENGALI ABBREVIATION SIGN (abbreviation marker, tbc)
U+2013 EN DASH (en dash, tbc)
U+2014 EM DASH (em dash, tbc)
U+2020 DAGGER (dagger, tbc)
U+2021 DOUBLE DAGGER (double dagger, tbc)
U+2026 HORIZONTAL ELLIPSIS (ellipsis, tbc)
U+2032 PRIME (prime, tbc)
U+2033 DOUBLE PRIME (double prime, tbc)

Phonology

Click on the sound groups to see where else in the document each of the sounds is referred to.

Phones in a lighter colour are non-native or allophones. Source Wikipedia.

Vowel sounds

Plain vowels

i ĩ u ũ ʊ e o õ ɛ ɛ̃ ɔ ɔ̃ æ a ã

Complex vowels

There are a large number of diphthongs in Bangla, and the chart below shows an incomplete set.12

ii̯ iu̯ ui̯ ei̯ eu̯ oi̯ ou̯ oe̯ oo̯ ɛe̯ ɔe̯ ɔo̯ æe̯ æo̯ ai̯ au̯ ae̯ ao̯

Vowel harmony

The pronunciation of a vowel can be affected by the vowel in the following syllable. Radice provides the following table, though this is a simplification and there are many exceptions.

Followed by i or u | Followed by ɔ, o, e or a
o → u              | o → ɔ
ɔ → o              | u → o
e → i              | e → æ
æ → e              | i → e

For example, the verb শোনা ʃonɑ to hear with an i ending becomes ʃuni, দেখা dækʰa to see becomes dekʰi, etc. This sometimes accounts for the pronunciation of the inherent vowel, eg. অতিথি otitʰi guest and অনুবাদ onubad translation start with o rather than ɔ.
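The table above can be read as a lookup keyed on the vowel and on the class of the vowel in the following syllable. The Python sketch below encodes only the simplified table as quoted here. The names `HIGH`, `NON_HIGH`, `HARMONY` and `harmonise` are invented for illustration, and because the rule has many exceptions this should not be taken as a pronunciation predictor.

```python
# Simplified vowel-harmony lookup, transcribed from the table above.
# All names are illustrative; real Bangla pronunciation has many exceptions.

HIGH = {"i", "u"}                # triggers in the left-hand column
NON_HIGH = {"ɔ", "o", "e", "a"}  # triggers in the right-hand column

HARMONY = {
    "high": {"o": "u", "ɔ": "o", "e": "i", "æ": "e"},
    "non_high": {"o": "ɔ", "u": "o", "e": "æ", "i": "e"},
}


def harmonise(vowel, next_vowel):
    """Return the vowel as the table says it changes before next_vowel."""
    if next_vowel in HIGH:
        return HARMONY["high"].get(vowel, vowel)
    if next_vowel in NON_HIGH:
        return HARMONY["non_high"].get(vowel, vowel)
    return vowel  # no rule listed: leave the vowel unchanged


print(harmonise("o", "i"))  # u, as in ʃonɑ with an i ending becoming ʃuni
print(harmonise("æ", "i"))  # e, as in dækʰa becoming dekʰi
```

Vowels that the table does not list are returned unchanged, which keeps the sketch from claiming more than the table does.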

Consonant sounds

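             | labial | alveolar | post-alveolar | retroflex | palatal | velar | glottal
stops        | p b    | t d      |               | ʈ ɖ       |         | k ɡ   |
aspirated    | pf     |          |               | ʈʰ ɖʰ     |         | ɡʰ    |
affricates   |        |          | t͡ʃ d͡ʒ         |           |         |       |
aspirated    |        |          | t͡ʃʰ d͡ʒʰ       |           |         |       |
fricatives   | f v    | s z      | ʃ             |           |         |       | ɦ h
nasals       | m      | n        |               |           |         | ŋ     |
approximants | w      | l        |               |           | j       |       |
trills/flaps |        | r ɾ      |               | ɽ         |         |       |
aspirated    |        |          |               | ɽʰ        |         |       |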

pf, pʰ and f are alternative pronunciations of the same phoneme, depending on where the speaker is from, and all are written using U+09AB LETTER PHA.

True retroflex (murdhonno) consonants are not found in Bengali. They are apical postalveolar in Western Dialects. In other dialects, they are fronted to apical alveolar.12

r occurs word-initially, whereas ɾ occurs medially and finally. Both sounds are written using U+09B0 LETTER RA.12

s and ʃ are often merged. z is found mainly in foreign words.12

In the Bangla spoken in Dhaka, ɾ and ɽ are often indistinct phonemically,12 eg. the following two words can be homophonous: করা kɔɾa/kɔɹa/kɔɽa to do কড়া kɔɽa/kɔɾa strict

j and w are pronunciations of য় U+09AF LETTER YA + U+09BC SIGN NUKTA when it appears between certain vowels.

Structure

The effective unit of the Bengali writing system is the orthographic syllable.

An orthographic syllable can be defined as one of the code point sequences described below. Lowercase letters represent combining characters. Some vowel signs may be displayed at the start of the sequence, although the code points representing them always appear after the base consonant.

Consonant-based orthographic syllables

[C[n]h] [C[n]h] C[n] [h | v (n)] [f]

Legend
C: Consonant.
Cn: Consonant followed by nukta.
h: Hasant.
v: Vowel sign.
n: Nasalisation diacritic (candrabindu).
f: Final consonant (one of khanda ta, anusvara, or visarga).

The core of a consonant-based syllable is a base consonant character, which may or may not additionally represent an inherent vowel if it stands alone.

There is no inherent vowel if the consonant is followed by a vowel sign, eg. কী কি ki কো ko, or by a hasant, eg. ক্.

At the end of a word, there may or may not be an inherent vowel, even if there is no hasant.

Any base consonant may be a combination of consonant code point plus nukta.

The base consonant can be preceded by up to two consonant+hasant pairs (where the consonant may also be a combination of consonant+nukta), but only if those consonants form conjuncts (ie. the hasant is invisible), eg. ক্ক k͓k ম্প m͓p ক্ষ k͓ʃ̇ ন্ত্র n͓t͓r. If the preceding consonants carry visible hasant symbols, those are treated as separate orthographic syllables.

Likewise, the variable use of the hasant in Bengali means that a phonetic cluster of consonants can constitute a larger series of orthographic syllables. For example, this word for cymbal has two phonetic syllables, but 3 orthographic since the rt combination is not combined: করতাল kɔrtɑl cymbal

A vowel sign may optionally be followed by a nasalisation diacritic.

Unless the base consonant is followed by a hasant, the syllable may be terminated by a final consonant represented by khanda ta, anusvara, or visarga.
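The pattern for consonant-based orthographic syllables can be sketched as a regular expression. This is only an approximation under stated assumptions: the code point ranges chosen for each class, and the names `SYLLABLE` and `split_syllables`, are made up for this example, and a regular expression cannot know whether a given font actually renders a consonant+hasant pair as a conjunct.

```python
import re

# Character classes for the legend above. The ranges are assumptions made
# for this sketch, not definitions taken from the page.
C = "[\u0995-\u09B9\u09DC\u09DD\u09DF]"        # consonant letters
NUKTA = "\u09BC"
H = "\u09CD"                                    # hasant (virama)
V = "[\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC]"  # vowel signs
CANDRABINDU = "\u0981"
F = "[\u09CE\u0982\u0983]"                      # khanda ta, anusvara, visarga

# [C[n]h] [C[n]h] C[n] [h | v (n)] [f], with no final allowed after a hasant.
SYLLABLE = re.compile(
    f"(?:{C}{NUKTA}?{H}){{0,2}}"              # up to two consonant+hasant pairs
    f"{C}{NUKTA}?"                            # base consonant, optional nukta
    f"(?:{H}|(?:{V}{CANDRABINDU}?)?{F}?)"     # hasant, or vowel sign and final
)


def split_syllables(text):
    """Return the consonant-based orthographic syllables found in text."""
    return [m.group(0) for m in SYLLABLE.finditer(text)]


# n + hasant + t + hasant + r is matched as one unit.
print(len(split_syllables("\u09A8\u09CD\u09A4\u09CD\u09B0")))  # 1
# ba + aa, ka + i are matched as two units.
print(len(split_syllables("\u09AC\u09BE\u0995\u09BF")))        # 2
```

Vowel-based syllables and any characters outside the assumed classes are simply skipped by this sketch.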

Consonant clusters

Native Bengali words do not allow initial consonant clusters, and word-final clusters are rare. However, words borrowed from Sanskrit, English, etc. have introduced many such features.

Many Bengali speakers, however, retain the native phonology, even when using Sanskrit or English borrowings, such as গেরাম gerɑm (CV.CVC) for গ্রাম g͓rɑm village (CCVC), or ইস্কুল ịʃ͓̈kul (VC.CVC) for স্কুল ʃ͓̈kul school.12

Most word-final clusters were introduced from English, eg. লিফ্ট lipʰ͓ʈ lift, elevator or ব্যাংক b͓ʲɑŋ̽k bank. In some dialects, a final nasal+stop is written as a cluster, whereas in standard Bengali it would use nasalisation, eg. চান্দ cɑn͓d vs. চাঁদ cɑm̽d moon.12

For more information, see Wikipedia12.

Vowel-based orthographic syllables

Vowel-based syllables begin with a standalone vowel, which is represented by a single independent vowel or vocalic.

An independent vowel may be followed by an anusvara, visarga or candrabindu (nasalisation), eg. উঃ, আঁ ụh̽, ɑm̽

Vowels

The Bangla orthography is an abugida with 2 inherent vowels, pronounced ɔ and o. Other post-consonant vowels are written using 9 combining marks (vowel signs) and a specialised use of the y consonant letter.

Vowel harmony plays a significant role in the pronunciation of vowel-related code points.

Vowels may be nasalised, using U+0981 SIGN CANDRABINDU.

There are 3 pre-base and 2 circumgraph vowel signs. In principle, there are no multipart vowels; however, in decomposed text each of the 2 circumgraphs splits into 2 parts.

Standalone vowels are written using 10 independent vowel letters, one for each vowel sound, including the inherent vowel and 2 diphthongs. The final sound of numerous diphthongs is also represented using independent vowels.

Vowel summary table

The following table summarises the main vowel to character assignments.

The table shows dependent vowels on the left, and standalone vowels on the right. It doesn't capture sound changes produced by vowel harmony.

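Plain:

Position | Sound | Transcription | Transliteration | Character(s)
dependent | i | i | i | ি U+09BF
dependent | i | ī | ī | U+09C0
dependent | u | u | u | U+09C1
dependent | u | ū | ū | U+09C2
standalone | i | I | | U+0987
standalone | i | Ī | ị̄ | U+0988
standalone | u | U | | U+0989
standalone | u | Ū | ụ̄ | U+098A
dependent | e | e | e | U+09C7
dependent | o | o | o | U+09CB
standalone | e | E | | U+098F
standalone | o | O | | U+0993
dependent | ɔ | o | o | U+09CB
standalone | ɔ | A | ɔ̣ | U+0985
dependent | æ | |  ͓ʥɑ | ্যা U+09CD U+09AF U+09BE
dependent | æ | |  ͓ʥ | ্য U+09CD U+09AF
dependent | æ | ā | ɑ | U+09BE
dependent | a | ā | ɑ | U+09BE
standalone | a | Ā | ɑ̣ | U+0986

Diphthongs:

Position | Sound | Transcription | Transliteration | Character(s)
dependent | oi̯ | ai | | U+09C8
dependent | ou̯ | au | | U+09CC
dependent | -i̯ | | | -ই U+0987
dependent | -u̯ | | | -উ U+0989
dependent | | | | -য় U+09AF U+09BC
dependent | | | | -ও U+0993
standalone | oi̯ | AI | ọʲ | U+0990
standalone | ou̯ | AU | ọʷ | U+0994

Vocalics:

Position | Sound | Transcription | Transliteration | Character(s)
dependent | ri | | | U+09C3
standalone | ri | | r̥̣ | U+098B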

For additional details see Vowel sounds to characters.

Inherent vowel

An inherent vowel is a vowel sound that is automatically pronounced after a consonant letter, unless specifically suppressed.

kɔ ~ ko U+0995 BENGALI LETTER KA

The inherent vowel is typically transcribed as a, and pronounced as ɔ or o. (And sometimes halfway between these two, when influenced by surrounding sounds.) Bengalis are not always aware of these sound differences, thinking of this as one sound. So kɔ or ko are written by simply using the consonant letter.

There is also a vowel sign pronounced o. This can lead to inconsistent spellings, eg. bhalo, good, well, can be spelled either ভালো bʱɑlo good or ভাল bʱɑlo good. Verb forms tend to be particularly inconsistent, sometimes basing the rationale on what looks good in a particular context.

The rules for determining the sound of the inherent vowel are not simple. Partly it is a question of vowel harmony. The following two tendencies can help:

Inherent vowel suppression

Bengali uses U+09CD SIGN VIRAMA (called হসন্ত hʃ̈n͓t hasant hɔsonto in Bengali) to indicate that the inherent vowel is not pronounced after a consonant, eg. ক্ U+0995 LETTER KA + U+09CD SIGN VIRAMA explicitly represents just the sound k.

The hasant is rarely seen. It is not used at the end of a word even though the inherent vowel is pronounced at the end of some words and not others, eg. গরম gɔrôm, hot vs. গড়ান gɔɽɑnô, to roll. There is no real way to tell when the inherent vowel is pronounced and when not in this position, except that it is usually pronounced following a word-final consonant cluster.

Within a word also, some clusters don't use the hasant in Bengali, and the reader simply has to know that the inherent vowel is not pronounced, eg. করতাল kɔrtɑl cymbal

This is particularly common at morpheme boundaries, for example in verb forms.

Consonant clusters that are represented by conjunct forms use the hasant between consonants to invoke the shape changes. If the font has the glyphs needed to produce the conjunct the hasant is hidden (see Consonant clusters).
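At the code point level, a conjunct and a cluster with a visible hasant differ very little. The sketch below builds the k + ʃ cluster both ways and lists what is stored. Using U+200C ZERO WIDTH NON-JOINER after the hasant to request a visible hasant is the general Unicode convention for Indic scripts rather than something stated in this section, and the helper name `describe` is invented for the example.

```python
import unicodedata

KA, SSA = "\u0995", "\u09B7"
HASANT = "\u09CD"   # U+09CD SIGN VIRAMA
ZWNJ = "\u200C"     # assumption: used here to request a visible hasant

conjunct = KA + HASANT + SSA         # hasant is hidden if the font has the conjunct
visible = KA + HASANT + ZWNJ + SSA   # hasant is expected to stay visible


def describe(text):
    """List each stored code point with its Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for line in describe(conjunct):
    print(line)
print("---")
for line in describe(visible):
    print(line)
```

Whether the first sequence actually displays as a conjunct depends on the font and rendering system, so the code can only show the stored sequence, not the rendered result.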

Refs: Radice 3, 7-8, 21, 148; Daniels 400

Post-consonant vowels

কী ki U+0995 BENGALI LETTER KA + U+09C0 BENGALI VOWEL SIGN II

Post-consonant vowels are written using combining marks (vowel signs) and a specialised use of the consonant letter U+09AF LETTER YA. Vowel harmony plays a significant role in the pronunciation of vowel-related code points.

There are 3 pre-base and 2 circumgraph vowel signs. In principle, there are no multipart vowels; however, in decomposed text each of the 2 circumgraphs splits into 2 parts.

Seven vowel signs are spacing combining characters, meaning that they consume horizontal space when added to a base consonant.

All vowel signs are typed and stored after the base consonant, whether or not they precede it when displayed. The glyph rendering system takes care of the positioning at display time. Conjuncts are treated as indivisible units when it comes to rendering vowel signs, meaning that pre-base vowel signs and left-side glyphs of circumgraphs are rendered before the conjunct as a whole (see Pre-base vowel signs).

Plain vowels

The following panel shows the primary vowels for Bengali, and how they are written. Most sounds are written using a single, dedicated code point, but one is represented by a sequence that includes a character usually used as a consonant (see Jô-phôla for more details).


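Sound | Transcription | Transliteration | Character(s)
i | i | i | ি U+09BF
i | ī | ī | U+09C0
u | u | u | U+09C1
u | ū | ū | U+09C2
e | e | e | U+09C7
o | o | o | U+09CB
æ | |  ͓ʥɑ | ্যা U+09CD U+09AF U+09BE
a | ā | ɑ | U+09BE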

Bengali has lost the distinction between short and long vowels in pronunciation, but retains the difference in spelling.

Other sounds are associated with some of the above graphemes. The variation in pronunciation for the vowel signs can often be explained by vowel harmony.

Click on the characters in the panel for more details.


Sound | Transcription | Transliteration | Character(s)
e | i | i | ি U+09BF
æ | e | e | U+09C7
ɔ | o | o | U+09CB
æ | ā | ɑ | U+09BE

U+09CB VOWEL SIGN O was originally pronounced ʊ, and that pronunciation sometimes persists alongside the o that came from Sanskrit, eg. নোংরা nʊŋra foul

Vowel signs (particularly U) may form ligatures with a preceding base consonant. The use of ligatures may vary with the type of content (see Vowel ligatures).

Jô-phôla

ক্য U+0995 BENGALI KA + U+09CD SIGN VIRAMA + U+09AF LETTER YA


Sound | Character(s)
ɛ æ | ্যা U+09CD U+09AF U+09BE
æ e | ্য U+09CD U+09AF

When it occurs as the last member of a consonant cluster U+09AF LETTER YA has the special shape ্য and is called ʤɔ-pfɔlɑ (য-ফলা). One of its functions is to create the sound æ.

হ্যাঁ
The sound æ, created with the sequence ্যা U+09CD SIGN VIRAMA + U+09AF LETTER YA + U+09BE VOWEL SIGN AA.

হ্যাঁ hæ̃ː yes

There are exceptions to the previous rule, when the a-kar produces its normal value, eg. ব্যাখ্যা bækkʰa explanation

সন্ধ্যা ʃonddʰa evening

The sound æ can also be produced when there is no vowel sign, eg.

ব্যথা bætʰɑ pain

According to Radice, vowel harmony may phonetically produce e when the following syllable is i.

Unusually for Indian scripts, jô-phôla can also be used after independent vowels to create the standalone sound æ. The sequences অ্যা U+0985 LETTER A + U+09CD SIGN VIRAMA + U+09AF LETTER YA + U+09BE VOWEL SIGN AA and এ্যা U+098F LETTER E + U+09CD SIGN VIRAMA + U+09AF LETTER YA + U+09BE VOWEL SIGN AA are used mostly in transliterations of borrowed words, eg. অ্যাটর্নি ɔæʈorni attorney এ্যাডভোকেট eæɖbʰokeʈ advocate

See also Composite vowels, which explains how independent vowels are used for the off-glide of diphthongs.

Diphthongs

As in several other Brahmi-derived scripts, the following 2 diphthongs are each written using a single character.


Sound | Character(s)
oi̯ | U+09C8
ou̯ | U+09CC

However, most of the many additional diphthongs are represented by a sequence of vowels13, where the off-glide is typically represented using one of the following independent vowels:

The following are examples of diphthongs. Hyphens indicate a consonant with inherent vowel. The detailed character notes contain examples of Bangla words containing the diphthongs shown.


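Sequence | Sound | Code points
িই | ii̯ | U+09BF U+0987
ুই | ui̯ | U+09C1 U+0987
েই | ei̯ | U+09C7 U+0987
 | oi̯ | U+09C8
-ই | oi̯ | U+002D U+0987
াই | ai̯ | U+09BE U+0987
িউ | iu̯ | U+09BF U+0989
েউ | eu̯ | U+09C7 U+0989
 | ou̯ | U+09CC
াউ | au̯ | U+09BE U+0989
-য় | ɔe̯ | U+002D U+09AF U+09BC
ায় | ae̯ | U+09BE U+09AF U+09BC
-ও | ɔo̯ oo̯ | U+002D U+0993
াও | ao̯ | U+09BE U+0993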

Pre-base vowel signs

কে ke U+0995 BENGALI KA + U+09C7 VOWEL SIGN E

Three vowel signs appear to the left of the base consonant letter or cluster.


Sound | Transcription | Transliteration | Character(s)
i e | i | i | ি U+09BF
e æ | e | e | U+09C7
oi̯ | ai | | U+09C8

These combining marks are always stored after the base consonant, ie. the codepoints follow the order in which the items are pronounced. The rendering process places the glyph before the base consonant without changing the code points. Click on the following word to see the sequence of characters in storage.

বাকি baki left over
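The stored order can also be checked programmatically. The sketch below, using only the Python standard library, finds each pre-base vowel sign in a string and reports which code point sits immediately before it in memory. The names `PREBASE` and `prebase_positions` are invented for this example.

```python
import unicodedata

# The three pre-base vowel signs listed above: I, E and AI.
PREBASE = {"\u09BF", "\u09C7", "\u09C8"}

word = "\u09AC\u09BE\u0995\u09BF"  # বাকি, written as escapes to make the order explicit


def prebase_positions(text):
    """Return (index, preceding character) for every pre-base vowel sign."""
    return [(i, text[i - 1]) for i, ch in enumerate(text) if ch in PREBASE and i > 0]


for index, before in prebase_positions(word):
    print(index, unicodedata.name(word[index]), "is stored after", unicodedata.name(before))
```

For this word the vowel sign I is at index 3, directly after LETTER KA, even though its glyph is displayed to the left of that consonant.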

The vowel sign is actually placed before the start of a conjunct. In Figure 2 the sequence of glyphs for the orthographic syllable is rendered VCC, whereas the pronunciation is CCV. In conjuncts with 3 consonants, it will still be rendered before all the consonants.

ব্যক্তি ব্যক্তি
Two examples of a prebase vowel, pronounced after a consonant cluster, but rendered to the left of the conjunct.

ব্যক্তি bækt̪i person

However, if the cluster doesn't form a conjunct, or is split by a visible virama, the cluster becomes two orthographic syllables and the pre-base vowel sign appears after the last consonant that has a virama. The sequence of displayed glyphs is now CVC. If the conjunct contains 3 consonants, the displayed order will be CCVC.

ব্যক্তি ব্যক্তি
Two examples of the same prebase vowel, pronounced after consonant clusters that don't form conjuncts, and rendered to the left of the last consonant in the cluster.

বালতি ˈbal.t̪iˑ bucket


প্‌লিজ pliz please

Circumgraphs

When a single vowel sign code point produces glyphs on more than one side of the consonant base, it is referred to here as a circumgraph.

কো ko U+0995 BENGALI KA + U+09CB VOWEL SIGN O

Two vowels are represented by circumgraphs, producing glyphs on opposite sides of the consonant onset.


Sound | Transcription | Transliteration | Character(s)
o ʊ ɔ | o | o | U+09CB
ou̯ | au | | U+09CC

Like pre-base glyphs, these are combining marks that are always stored after the base consonant. The rendering process places the glyphs around the base consonant, as needed. Click on the following word to see the sequence of characters in storage.

নৌকো nou̯ko boat

Again, like pre-base vowel signs, in Bangla the circumgraph surrounds the whole consonant cluster that is rendered as a conjunct.

অক্টোবর
The circumgraph U+09CB BENGALI VOWEL SIGN O is rendered on two sides of the consonant stack.

অক্টোবর ɔk.ʈo.bɔɾ October

These circumgraph vowel signs are normally a single code point, but in decomposed text the base consonant can be followed by two code points, eg. compare the following equivalent ways of writing ko:

কো U+0995 LETTER KA + U+09CB VOWEL SIGN O​

 

কো U+0995 LETTER KA + U+09C7 VOWEL SIGN E​ + U+09BE VOWEL SIGN AA​

An example of precomposed (top) and decomposed (bottom) approaches to circumgraph vowels.
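Because the two spellings are canonically equivalent in Unicode, normalisation converts between them. The sketch below uses Python's standard `unicodedata.normalize`; the helper name `same_text` is invented for this example, and comparing under NFC is one reasonable choice rather than the only one.

```python
import unicodedata

precomposed = "\u0995\u09CB"        # KA + VOWEL SIGN O
decomposed = "\u0995\u09C7\u09BE"   # KA + VOWEL SIGN E + VOWEL SIGN AA

# The raw strings differ, so a naive comparison or search would miss a match.
print(precomposed == decomposed)    # False

# NFD splits the circumgraph into its two parts; NFC joins them again.
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True


def same_text(a, b):
    """Compare two strings after normalising both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_text(precomposed, decomposed))  # True
```

Normalising text before comparing, searching or sorting avoids treating the two spellings of ko as different words.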

Vowel elongation

U+09BD SIGN AVAGRAHA is a Sanskrit-derived symbol that is used in modern Bengali to lengthen vowel sounds14, eg. কিঽঽঽ? kiiii Whaaatt? শুনঽঽঽ ʃunooo Listennn

Nasalisation

U+0981 SIGN CANDRABINDU nasalises the vowel in a syllable, eg. হ্যাঁ hæ̃ː yes হাঁপান hɑ̃pɑn to pant Nasalised vowels include ĩ ũ ẽ õ ɛ̃ ɔ̃ ã.

The candrabindu should be placed over the top of an independent vowel, but over the base consonant when a vowel sign is attached – not over the vowel sign. In the code point sequence, however, this should occur after any combining vowel sign associated with the same syllable. Note how the base consonant is identified correctly in the second word of Figure 6, even though the candrabindu is 4 code points away. Some fonts do not position the candrabindu correctly.

হাঁপান হ্যাঁ
The candrabindu is positioned over the base consonant, even though it is the last code point in the syllable. (The arrow gives the approximate location of the code point.)

হাঁপান hɑ̃pɑn to pant


হ্যাঁ hæ̃ː yes
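The ordering rule for the candrabindu lends itself to a simple check on stored text. The following sketch flags any candrabindu that is immediately followed by a vowel sign, which would mean the two were typed in the wrong order. The vowel sign range and the name `misordered_candrabindu` are assumptions made for this example, and the check says nothing about how a font positions the mark.

```python
import re

VOWEL_SIGNS = "\u09BE-\u09C4\u09C7\u09C8\u09CB\u09CC"  # assumed set of vowel signs
CANDRABINDU = "\u0981"

# A candrabindu directly followed by a vowel sign is in the wrong order.
MISORDERED = re.compile(f"{CANDRABINDU}[{VOWEL_SIGNS}]")


def misordered_candrabindu(text):
    """Return the index of each candrabindu stored before a vowel sign."""
    return [m.start() for m in MISORDERED.finditer(text)]


hapan = "\u09B9\u09BE\u0981\u09AA\u09BE\u09A8"  # হাঁপান
hyan = "\u09B9\u09CD\u09AF\u09BE\u0981"         # হ্যাঁ

print(misordered_candrabindu(hapan))  # []
print(hyan.index(CANDRABINDU))        # 4: four code points after the base consonant
print(misordered_candrabindu("\u09B9\u0981\u09BE"))  # [1]
```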

Composite vowels

কেউ keu̯ U+0995 BENGALI KA + U+09C7 VOWEL SIGN E + U+0989 VOWEL LETTER U

Composite vowels in Bengali may occur when U+09CB VOWEL SIGN O and U+09CC VOWEL SIGN AU are decomposed, however this is not common. See Encoding choices for more details.


Sound | Character(s)
o | ো U+09C7 U+09BE
ou̯ | ৌ U+09C7 U+09D7

Although 2 of the vowel signs (and independent vowels) represent diphthongs (oi̯ and ou̯) with a single code point, most of the many diphthongs are represented by a sequence of vowels,13 eg. কেউ keu̯ somebody

Standalone vowels

Standalone vowels are vowel sounds that are not preceded by a consonant sound, or are preceded by only a glottal stop. They may appear at the beginning of a word or in the middle of a word after a preceding vowel.


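Sound | Transcription | Transliteration | Character(s)
i i̯ | I | | U+0987
i iː | Ī | ị̄ | U+0988
u u̯ | U | | U+0989
u | Ū | ụ̄ | U+098A
e æ | E | | U+098F
o o̯ | O | | U+0993
ɔ | A | ɔ̣ | U+0985
a | Ā | ɑ̣ | U+0986
oi̯ | AI | ọʲ | U+0990
ou̯ | AU | ọʷ | U+0994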

Bengali represents syllable-initial vowels using a set of independent vowel letters, eg. ওস্তাদ ost̪ad̪ teacher উট camel ঔষুধ ou̯ʃudʰ medicine

Standalone æ appears mostly in loan words. It involves the use of U+09AF LETTER YA. See Jô-phôla for more details.

Vowel sounds to characters

This section maps Bengali vowel sounds to common graphemes in the Bengali orthography.

Dependent vowels appear in the left column, standalone in the right. Hyphens indicate a consonant with inherent vowel.

Sounds listed as 'infrequent' are allophones, or sounds used for foreign words, etc. Light coloured characters occur infrequently.

Plain vowels

i

dependent িU+09BF VOWEL SIGN I

 

standalone U+0987 LETTER I

 

dependent U+09C0 VOWEL SIGN II

 

standalone U+0988 LETTER II

dependent U+09C3 VOWEL SIGN VOCALIC R vocalic, contains i.

independent U+098B LETTER VOCALIC R vocalic, contains i.

u

dependent U+09C1 VOWEL SIGN U

standalone U+0989 LETTER U

dependent U+09C2 VOWEL SIGN UU

standalone U+098A LETTER UU

dependent U+09CB VOWEL SIGN O with vowel harmony before one of u or u.

standalone U+098A LETTER UU with vowel harmony before one of u or u.

standalone U+0989 LETTER U

e

dependent U+09C7 VOWEL SIGN E

standalone U+098F LETTER E

dependent িU+09BF VOWEL SIGN I with vowel harmony before one of ɔ o e a.

standalone U+0987 LETTER I with vowel harmony before one of ɔ o e a.

dependent ্যU+09CD SIGN VIRAMA + U+09AF LETTER YA with the inherent vowel before i.

special য়U+09AF LETTER YA + U+09BC SIGN NUKTA as part of a diphthong, esp after ɔ a o.

o

inherent vowel eg. ওজন od͡ʒon weight.

standalone U+0993 LETTER O

dependent U+09CB VOWEL SIGN O

dependent U+0985 LETTER A with vowel harmony before one of i u.

dependent U+09C1 VOWEL SIGN U with vowel harmony before one of ɔ o e a.

dependent U+0989 LETTER U with vowel harmony before one of ɔ o e a.

ɔ

inherent vowel eg. করবী kɔɾɔbi oleander.

standalone U+0985 LETTER A

dependent U+09CB VOWEL SIGN O with vowel harmony before one of ɔ o e a.

standalone U+0993 LETTER O with vowel harmony before one of ɔ o e a.

æ

dependent U+09BE VOWEL SIGN AA after জ্ঞU+099C LETTER JA + U+09CD SIGN VIRAMA + U+099E LETTER NYA.

standalone U+098F LETTER E with vowel harmony before one of ɔ o e a.

dependent ্যাU+09CD SIGN VIRAMA + U+09AF LETTER YA + U+09BE VOWEL SIGN AA

special ্যU+09CD SIGN VIRAMA + U+09AF LETTER YA with the inherent vowel and not followed by i.

dependent U+09C7 VOWEL SIGN E with vowel harmony before one of ɔ o e a.

special এ্যাU+098F LETTER E + U+09CD SIGN VIRAMA + U+09AF LETTER YA + U+09BE VOWEL SIGN AA

special অ্যাU+0985 LETTER A + U+09CD SIGN VIRAMA + U+09AF LETTER YA + U+09BE VOWEL SIGN AA in loan words.

a

dependent U+09BE VOWEL SIGN AA

standalone U+0986 LETTER AA

Diphthongs and other combinations

oi̯

dependent U+09C8 VOWEL SIGN AI

standalone U+0990 LETTER AI

dependentU+0987 LETTER I

standalone ওইU+0993 LETTER O + U+0987 LETTER I

ou̯

dependent U+09CC VOWEL SIGN AU

standalone U+0994 LETTER AU

oo̯

dependentU+0993 LETTER O

ɔe̯
ɔo̯

dependentU+0993 LETTER O

◌̃

nasalisation U+0981 SIGN CANDRABINDU

Vocalics

Vocalics are letters derived from Sanskrit that generally behave like vowels, but represent r/l followed by a vowel. They are often available both as vowel signs and independent vowel letters.


both
rir̥̣098B
ri09C3

Only one vocalic is in common use for modern Bangla. It is used in standalone and vowel sign forms, eg. ঋতু ɾit̪u season বৃহৎ brihɔt huge

Three more vocalics, U+098C LETTER VOCALIC L, U+09E0 LETTER VOCALIC RR and U+09E1 LETTER VOCALIC LL, and their dependent forms, are historic and only used to write Sanskrit in Bengali.8


6
archaic r̥̣̄09E0
archaicl̥̣098C
archaic l̥̣̄09E1
archaicR̥̄r̥̄09C4
archaic09E2
archaicL̥̄l̥̄09E3

Consonants

The 33 consonant letters used for Bangla are supplemented by repertoire extensions for 3 more sounds by applying the nukta diacritic to characters.

Consonant clusters at any location are normally indicated using a virama (hasant) between consonants. This results in a large number of conjunct forms expressed using stacked consonants, conjoined consonants, and ligated glyphs. Bengali stacked conjuncts are usually the same height as ordinary letters. Conjuncts often have different pronunciations than might be expected from the letters involved, and in particular gemination is very common. Occasionally a visible virama is used; however, clusters are often not marked at all.

Syllable-final consonant sounds may be represented by a special letter, ৎ, or by 2 dedicated combining marks (anusvara & visarga), but are generally ordinary consonants that are not marked by a virama.

Consonant summary table

The following table summarises the main consonant to character assignments.

The left column is lowercase, and the right uppercase.

A number of letters have allophones which are not shown here. See the following sections for details. Normal letters are used as final consonants, but we list here some additional, dedicated finals.

Onsets

8
ppp09AA
bbb09AC
tBH09AD
ddd09A6
ʈʈ099F
ɖɖ09A1
kkk0995
ɡgg0997

9
PH09AB
tt09A4
TH09A5
DH09A7
ʈʰṬHʈʰ09A0
ɖʰḌHɖʰ09A2
KH0996
ক্ষ k͓ʃ̇0995
09CD
09B7
ɡʰGH0998

5
t͡ʃcc099A
t͡ʃʰCH099B
d͡ʒjʤ099C
d͡ʒyʥ09AF
d͡ʒʰJHʤʰ099D

4
ʃśʃ09B6
ʃʃ̇09B7
ʃsʃ̈09B8
hhh09B9

5
mmm09AE
nnn09A8
n09A3
nññ099E
ŋŋ0999

5
r rr09B0
ɽড় ɽɖˑ09A1
09BC
ɽʱঢ়rareɽɖʰˑ09A2
09BC
l ll09B2
jয় ʲˑ09AF
09BC
Finals

3
-t09CE
0983
ŋ̽0982

For more details see Consonant sounds to characters.

Basic consonants

Basic consonant sounds in Bengali are written using the following letters. Click on each letter for more details and for example usage.


32
ppp09AA
b ∅-bb09AC
pf pʰ fPH09AB
bʰ vBH09AD
ttt09A4
ddd09A6
TH09A5
DH09A7
ʈʈ099F
ɖɖ09A1
ʈʰṬHʈʰ09A0
ɖʰḌHɖʰ09A2
kkk0995
ɡgg0997
KH0996
ɡʰGH0998
   
t͡ʃcc099A
t͡ʃʰCH099B
d͡ʒ zjʤ099C
d͡ʒʰJHʤʰ099D
d͡ʒ- -æyʥ09AF
   
ʃ sśʃ09B6
ʃʃ̇09B7
ʃ ssʃ̈09B8
hhh09B9
   
mmm09AE
nnn09A8
n09A3
nññ099E
ŋ ŋɡŋ0999
   
r ɾrr09B0
lll09B2

Conjuncts ending with U+09AC LETTER BA or U+09AE LETTER MA tend to not pronounce the latter, but double the length of the consonant before it (see Consonant length).

Khiyɔ

ক্ষU+0995 LETTER KA + U+09CD SIGN VIRAMA + U+09B7 LETTER SSA is called khiyɔ, pronounced or kːʰ, and is often treated as a letter of the alphabet in that some dictionaries give it its own section, eg. ক্ষুদ্র kʰudro small

Repertoire extension

U+09BC SIGN NUKTA is used to create 3 additional letters, eg. the dot changes ɖ to ড় ɽ. Here is a list of graphemes that combine nukta with an existing consonant.

To reveal detailed notes about usage see the list of precomposed characters a little lower.


3
ড় ɽɽɖˑ09A1
09BC
ঢ়rareɽʱɽɖʰˑ09A2
09BC
য় j e̯ʲˑ09AF
09BC

য়U+09AF LETTER YA + U+09BC SIGN NUKTA represents j, w or , depending on what vowels occur alongside it. See the character notes for details.

The nukta should always be typed and stored immediately after the consonant it modifies, and before any combining vowels or diacritics.

The Unicode Standard recommends that content authors use decomposed sequences for these letters. However, the Unicode block also contains the precomposed code points shown below.


3
infreq.ɽɽɽ09DC
rareɽʱɽɽ̇09DD
infreq.j e̯09DF

Decomposed sequences are not recomposed by Unicode Normalisation Form C (NFC).
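The behaviour can be checked with a short Python sketch based on the standard unicodedata module. The helper name code_points is just for this illustration.

```python
import unicodedata

# Precomposed RRA, RHA and YYA from the Bengali block.
PRECOMPOSED = ["\u09DC", "\u09DD", "\u09DF"]

def code_points(text):
    """Format a string as a space-separated list of U+XXXX values."""
    return " ".join(f"U+{ord(c):04X}" for c in text)

results = {}
for char in PRECOMPOSED:
    nfc = unicodedata.normalize("NFC", char)
    nfd = unicodedata.normalize("NFD", char)
    results[char] = (nfc, nfd)
    print(code_points(char), "NFC:", code_points(nfc), "NFD:", code_points(nfd))
```

Both normalisation forms return the base consonant followed by U+09BC SIGN NUKTA, so text that has passed through NFC will not contain the precomposed code points.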

Assamese & Sanskrit

Two more letters in the Unicode Bengali block are specifically for Assamese.


both
unusedɹrɹ09F0
unusedw βvv09F1

Onsets

Clusters of consonant letters at the beginning of an orthographic syllable occur in Bengali, and they are handled as described in the section Consonant clusters.

Special behaviours include handling of the following types of cluster at the beginning of an orthographic syllable: RA, MA, and jô-phôla.

Medial RA

A trailing U+09B0 LETTER RA in a conjunct is displayed as a wavy line below the other consonant(s), eg. gr in

গ্রাম ɡram village

Examples of clusters with a trailing r. The Latin text is a transliteration.


31
প্রp͞r
ফ্রpʰ͞r
ব্রb͞r
ভ্রbʰ͞r
ত্রt͞r
ত্রুt͞ru
দ্রd͞r
থ্রtʰ͞r
ধ্রdʰ͞r
জ্রʤ͞r
ট্রʈ͞r
ড্রɖ͞r
চ্ছ্রc͞cʰ͞r
ক্রk͞r
খ্রkʰ͞r
গ্রg͞r
ঘ্রgʰ͞r
ন্ধ্রn͞dʰ͞r
শ্রʃ͞r
ষ্ট্রʃ̇͞ʈ͞r
ষ্ক্রʃ̇͞k͞r
স্রʃ̈͞r
স্ট্রʃ̈͞ʈ͞r
স্প্রʃ̈͞p͞r
হ্রh͞r
ম্রm͞r
ম্প্রm͞p͞r
ম্ভ্রm͞bʰ͞r
ন্ত্রn͞t͞r
ন্দ্রn͞d͞r
ণ্ড্রn̈͞ɖ͞r

RA+jô-phôla

In rare cases, the RA at the start of a cluster is a syllable onset followed by ্য jô-phôla, rather than the typical situation where a cluster-initial RA is a syllable coda and is written as a repha mark.

When the RA is an onset, a full-sized RA should be followed by the jô-phôla form for YA. To create this shape, add a U+200D ZERO WIDTH JOINER character before the hasant (see Figure 7).

র্য U+09B0 RA + U+09CD VIRAMA​ + U+09AF YA

 

র‍্য U+09B0 RA + U+200D ZWJ + U+09CD VIRAMA​ + U+09AF YA

RA as a syllable coda (top), and as a syllable onset followed by jô-phôla (bottom).
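A minimal Python sketch of the two sequences in the figure follows. The constant and helper names are invented for the example, and how each string is actually displayed still depends on the font and rendering engine.

```python
RA, VIRAMA, YA, ZWJ = "\u09B0", "\u09CD", "\u09AF", "\u200D"

# RA as a coda: RA + VIRAMA + YA (the RA is written as a repha mark).
ra_coda = RA + VIRAMA + YA

# RA as an onset: the ZWJ goes before the virama, asking for a
# full-sized RA followed by the jô-phôla form of YA.
ra_onset = RA + ZWJ + VIRAMA + YA

def code_points(text):
    """Format a string as a space-separated list of U+XXXX values."""
    return " ".join(f"U+{ord(c):04X}" for c in text)

print(code_points(ra_coda))
print(code_points(ra_onset))
```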

The following is an example of a term with RA as a coda, followed by a YA onset. I was unable to find an example of RA followed by jô-phôla.

সূর্য ʃurd͡ʒo sun

Finals

Syllable codas are commonly written using ordinary letters, but there are also a few dedicated symbols in Bengali.

At the end of a word, a coda doesn't usually have a visible virama attached, but in a few cases it does.

গপ ɡɔp gossip

আল্লাহ্ alːa Allah

A coda followed by a consonant onset will often, but not always, form a conjunct (see No special rendering). Compare:

অঞ্চল ɔnt͡ʃɔl region

ইনকিলাব in.ki.lab revolution

Some special rendering rules apply for a syllable ending in r when it is part of a consonant cluster. See RA coda followed by a consonant.

One letter and 2 diacritics represent syllable-final consonant sounds only.


3
-t09CE
ŋ̽0982
0983

In a sequence of characters, these should all occur after any combining vowel sign associated with the same syllable. None carry vowel signs.
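The ordering rule lends itself to a simple check. The following Python sketch flags any of the three dedicated finals that is stored directly before a vowel sign. The function names are hypothetical, and the code point range used for vowel signs is an assumption that simply spans the dependent vowels in the Bengali block.

```python
# KHANDA TA, ANUSVARA and VISARGA.
FINALS = {"\u09CE", "\u0982", "\u0983"}

def is_vowel_sign(char):
    # Assumed range: dependent vowel signs U+09BE..U+09CC,
    # plus U+09D7 AU LENGTH MARK.
    return "\u09BE" <= char <= "\u09CC" or char == "\u09D7"

def misplaced_finals(text):
    """Return indexes of finals stored immediately before a vowel sign."""
    return [
        i
        for i in range(len(text) - 1)
        if text[i] in FINALS and is_vowel_sign(text[i + 1])
    ]

# BA + AA + ANUSVARA + LA + AA: the anusvara follows the vowel sign.
well_formed = "\u09AC\u09BE\u0982\u09B2\u09BE"
# BA + ANUSVARA + AA: the anusvara wrongly precedes the vowel sign.
ill_formed = "\u09AC\u0982\u09BE"

print(misplaced_finals(well_formed), misplaced_finals(ill_formed))
```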

RA coda followed by a consonant

U+09B0 LETTER RA at the start of a cluster is normally displayed as a mark above the following consonant(s). Unlike Devanagari, it is typically displayed above the consonant glyph, rather than above the vowel sign of the orthographic syllable.

গর্ত gɔrtô hole

শার্ট ʃaɹt shirt

Like other consonant clusters, the sound may also be written without a conjunct at all.

কারসাজি kɑrʃɑd͡ʒi trickery

See also RA+jô-phôla.

Khanda ta

U+09CE LETTER KHANDA TA, pronounced , is a variant form of U+09A4 LETTER TA that was added to Unicode 4.1 as a separate character. This shape occurs in words where the consonant is not followed by an inherent or other vowel sound. It either comes at the end of words, or before a consonant that doesn't naturally combine with TA, eg.

হঠাৎ hɔʈʰɑt suddenly

উৎসব utʃɔb festival

Khanda ta is not used before the following characters:8


7
ttt09A4
TH09A5
nnn09A8
bbb09AC
mmm09AE
d͡ʒyʥ09AF
rrr09B0

Many words, however, use U+09A4 LETTER TA in the same situations, and it's not possible to guess which will be used for a given word, eg.

হঠাত্ ɦɔʈʰat̪ suddenly

This character replaces and obsoletes an earlier approach that required the use of the sequence ত্‍U+09A4 LETTER TA + U+09CD SIGN VIRAMA + U+200D ZERO WIDTH JOINER.
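Applications that meet the older sequence in legacy data could map it to the dedicated character. The following Python sketch shows one way to do that; the function name is hypothetical, and it assumes that every occurrence of TA + VIRAMA + ZWJ in the data was intended as khanda ta.

```python
LEGACY_KHANDA_TA = "\u09A4\u09CD\u200D"  # TA + VIRAMA + ZWJ
KHANDA_TA = "\u09CE"

def modernise_khanda_ta(text):
    """Replace the legacy three-character sequence with U+09CE."""
    return text.replace(LEGACY_KHANDA_TA, KHANDA_TA)

# 'suddenly' spelled with the legacy sequence: HA + TTHA + AA + TA + VIRAMA + ZWJ
legacy_word = "\u09B9\u09A0\u09BE" + LEGACY_KHANDA_TA
modern_word = modernise_khanda_ta(legacy_word)
```

Note that a plain TA + VIRAMA without the joiner is left untouched, since the text above indicates that this spelling is also in use.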

Anusvara

U+0982 SIGN ANUSVARA is a final nasal ŋ, eg. বাংলা ˈbaŋla Bengali, Bangla (language)

Sometimes spelling is inconsistent, especially when this or U+0999 LETTER NGA are used in a conjunct, eg. compare these pairs: সাঙঘাতিক ʃɑŋɡʰɑtik terrible সাংঘাতিক ʃɑŋɡʰɑtik terrible রঙ rɔŋ colour রং rɔŋ colour

However, in certain words the spelling is fixed. One such word is বাংলা ˈbaŋla Bengali, Bangla (language) But, since this cannot support vowel signs, the word for Bengali nation (rather than language) has to be spelled with U+0999 LETTER NGA, ie. বাঙালী bɑŋɑlī bɑŋgɑlī

See also the candrabindu diacritic, which nasalises a vowel.

Visarga

When used to represent a word-final consonant, U+0983 SIGN VISARGA produces vigorous final aspiration ɦ, eg. বাঃ baʱ wow, bravo/left

It doesn't appear in many common words.

One of the other uses of the visarga is to lengthen a following consonant, in which case there is no aspiration, eg. নিঃশব্দ niʃʃɔbdo silence

Consonant clusters

A consonant cluster is a sequence of consonant sounds with no intervening vowels.

A conjunct is a consonant cluster where the lack of intervening vowels is indicated by one or more of stacking, changing and merging the shapes of the constituent letter forms (usually in abugidas). Not all consonant clusters are displayed as conjuncts.

Consonant clusters are written using:

  1. No special rendering. This is a common occurrence in Bengali.
  2. Conjuncts. There are a number of possibilities here.
    1. Fused vertically : Reduce the component shapes and combine them vertically, usually approximately within the normal character height.
    2. Conjoined : The two consonants sit side by side, but the first consonant has an altered shape.
    3. Ligated : A fusion of the component letter shapes where it may be difficult to tell them apart.
    4. Special forms : These apply to cluster-final YA (see Jô-phôla), the letter RA (see Medial RA and RA coda followed by a consonant), and cluster-final MA (see Cluster-final MA).
  3. A visible virama below the non-final consonants in the cluster. In Bengali this may be a conscious decision, and not just a gap in font support.
  4. Final consonant letters or marks followed by another consonant. There is no interaction between the finals and the following character (see Finals).

See also Consonant length.

No special rendering

Unlike languages written in the Devanagari script, consonant clusters are often not represented as conjuncts in Bengali. It is necessary to just know that the vowel should not be pronounced, eg. রিকশা rik.ʃɑ rickshaw

Morphological boundaries, such as those before grammatical suffixes and endings, are typically written without conjuncts, eg. the present tense form khan plus the negative suffix na is written খাননা kʰɑnnɑ not eat

The stem kôr from kôra plus present continuous ending chô is written করছ korcʰo doing

Conjunct formation

To produce a conjunct, U+09CD SIGN VIRAMA is added between the consonants in the cluster. There are exceptions, but this type of virama is usually not displayed, eg. the sequence U+0995 LETTER KA + U+09CD SIGN VIRAMA + U+09B7 LETTER SSA produces ক্ষ k͓ʃ̇

The font usually determines how a cluster is rendered, although it is possible to influence this (see Formatting characters). Different fonts may combine the same letters in different ways. The following figure shows characters that are combined in different ways by different fonts.

ল্গ ল্প হ্ব জ্ঝ ষ্ক হ্ণ ঞ্ঝ
Conjuncts composed in different ways by the Noto Sans Bengali font (top) and Solaimon Lipi font (bottom). (Click for list of code points.)

Quite often, clustered consonants are pronounced differently than you would expect. In particular, conjuncts ending with U+09AC LETTER BA or U+09AE LETTER MA tend to not pronounce the latter, but double the length of the consonant before it (see Consonant length).

Nasals in conjuncts tend to conform to phonological patterns. Velar consonants (k, kh, g, etc) combine with ŋɔ, palatal consonants (c, ch, ..) combine with ñɔ, retroflex ɳɔ, dental , and labial .

See a table of 2-consonant clusters.
The table allows you to test results for various fonts.

The sections below show examples of the various types of conjunct forms. The lists are not exhaustive. The shapes shown are by default those contained in the Noto Sans Bengali webfont. Other fonts may combine components in different ways. Click on the characters if you want to see the components.

Vertical conjuncts

Conjunct shapes are most commonly formed by arranging the components vertically, reducing and combining the shapes of the individual components as needed. These stacks don't extend below the lower baseline, as they do in conjuncts for many other scripts.

সথ→স্থ
sthô
লল→ল্ল
llô
Vertically fused conjunct forms.

The conjuncts in Figure 9 used in words: আস্থা ɑʃtʰɑ trust ঝিল্লি d͡ʒʰilli grasshopper

Examples of components arranged vertically.


61
প্পp͞p
প্তp͞t
প্নp͞n
প্লp͞l
ফ্লpʰ͞l
ব্লb͞l
ত্তt͞t
ত্বt͞b
ত্ত্বt͞t͞b
থ্বtʰ͞b
ত্নt͞n
দ্বd͞b
দ্ভd͞bʰ
ট্টʈ͞ʈ
ক্বk͞b
ক্টk͞ʈ
ক্কk͞k
ক্লk͞l
গ্ধg͞dʰ
গ্বg͞b
গ্গg͞g
গ্নg͞n
ঘ্নgʰ͞n
গ্লg͞l
শ্বʃ͞b
শ্নʃ͞n
শ্লʃ͞l
ষ্কʃ̇͞k
স্বʃ̈͞b
স্তʃ̈͞t
স্ত্রʃ̈͞t͞r
স্থʃ̈͞tʰ
স্কʃ̈͞k
স্খʃ̈͞kʰ
স্নʃ̈͞n
স্লʃ̈͞l
হ্ণh͞n̈
হ্লh͞l
ম্বm͞b
ম্ভm͞bʰ
ম্নm͞n
ম্লm͞l
ন্বn͞b
ন্তn͞t
ন্ত্বn͞t͞b
ন্থn͞tʰ
ন্নn͞n
ঞ্ছñ͞cʰ
ণ্বn̈͞b
ণ্ডn̈͞ɖ
ণ্ড্রn̈͞ɖ͞r
ণ্ণn̈͞n̈
ণ্মn̈͞m
ঞ্জñ͞ʤ
ঞ্ঝñ͞ʤʰ
ঙ্কŋ͞k
ল্পl͞p
ল্বl͞b
ল্কl͞k
ল্গl͞g
ল্লl͞l

Conjoined conjuncts

Many conjuncts are formed by combining components horizontally. Usually the initial consonant glyph is reduced.

মপ→ম্প
mpô
চ+চ→চ্চ
ccô
Conjoined conjunct forms.

The conjuncts in Figure 10 used in words: ক্যম্পাস kyæmpas campus উচ্চারণ ut͡ʃt͡ʃɑrɔn pronunciation

Some people argue that at least some of these conjoined, half-forms (reminiscent of Devanagari) don't belong in the Bengali script, but that instead a visible virama should be shown.§ In this case, the examples above would look like this:

ক্যম্‌পাস kyæmpas campus

উচ্‌চারণ ut͡ʃt͡ʃɑrɔn pronunciation

Other examples of components arranged side-by-side, frequently with simplification of the initial consonant. The Latin characters are transliterations.


35
প্টp͞ʈ
প্সp͞ʃ̈
ব্বb͞b
ব্দb͞d
দ্দd͞d
ধ্বdʰ͞b
ড্ডɖ͞ɖ
চ্চc͞c
চ্ছc͞cʰ
চ্ঞc͞ñ
জ্বʤ͞b
জ্জʤ͞ʤ
জ্ঝʤ͞ʤʰ
জ্জ্বʤ͞ʤ͞b
ষ্পʃ̇͞p
ষ্ফʃ̇͞pʰ
শ্চʃ͞c
শ্ছʃ͞cʰ
স্পʃ̈͞p
স্ফʃ̈͞pʰ
স্টʃ̈͞ʈ
হ্বh͞b
ল্ফl͞pʰ
ম্পm͞p
ম্ফm͞pʰ
ন্দn͞d
ন্সn͞ʃ̈
ণ্টn̈͞ʈ
ণ্ঠn̈͞ʈʰ
ণ্ঠn̈͞ʈʰ
ণ্ঢn̈͞ɖʰ
ঙ্খŋ͞kʰ
ঙ্মŋ͞m
ল্টl͞ʈ
ল্ডl͞ɖ

Ligating conjuncts

A small set of conjuncts combine the consonants into a ligated shape, where individual components can't always be easily discerned.

ষট→ষ্ট
ṣṭô
কষ→ক্ষ
kṣô
Ligated conjunct forms.

The conjuncts in Figure 11 used in words: খ্রিষ্টান kʰriʂʈan christian ক্ষণ kʃon moment

Examples of conjuncts arranged in a way that involves ligation, significantly altering one or more of the components.


20
ব্ধb͞dʰ
ব্জb͞ʤ
ত্থt͞tʰ
দ্ধd͞dʰ
দ্ধ্বd͞dʰ͞b
জ্ঞʤ͞ñ
ট্টʈ͞ʈ
ক্তk͞t
ক্ষ্মk͞ʃ̇͞m
ক্সk͞ʃ̈
ষ্টʃ̇͞ʈ
ষ্ঠʃ̇͞ʈʰ
ষ্ণʃ̇͞n̈
হ্মh͞m
হ্নh͞n
ন্ধn͞dʰ
ন্ধ্রn͞dʰ͞r
ঞ্চñ͞c
ঞ্ঝñ͞ʤʰ
ঙ্গŋ͞g

Jô-phôla

Bengali has a particular way of representing a cluster-final U+09AF LETTER YA, known as jô-phôla (pronounced ʤɔ-pfɔlɑ). This is typically represented with the shape ◌্যU+09CD SIGN VIRAMA + U+09AF LETTER YA, using the full form of the preceding consonant followed by a special form of YA, eg.

হ্যাঁ hæ̃ː yes

Jô-phôla at the end of a conjunct usually has two possible effects:

  • to double the length of the preceding consonant (see Consonant length), or
  • to change the value of the following vowel if it is inherent or a (see Jô-phôla).

Cluster-final MA

A cluster-final U+09AE LETTER MA is also displayed in a characteristic way. The initial consonant is reduced, and the m is rendered as a long vertical line to the right with an appendage to the left at the bottom, producing a kind of diagonal grouping, eg. উন্মত্ত unmɔtto insane

Used this way, this letter is typically silent but produces a lengthening of the previous consonant sound (see Consonant length).

Examples of cluster-final MA.


10
ত্মt͞m
দ্মd͞m
ক্মk͞m
গ্মg͞m
শ্মʃ͞m
ষ্মʃ̇͞m
স্মʃ̈͞m
ন্মn͞m
ম্মm͞m
ল্মl͞m

Visible virama

When the virama is used it may be because the font doesn't have a particular conjunct ligature, but it may also be visible in places where the phonology is unusual, eg. ফ্‌ল্যাট pʰlæʈ flat লান্‌চ lɑnt͡ʃ lunch although these may also be spelled with conjuncts, eg. ফ্ল্যাট pʰlæʈ flat

It is also quite common to see it used to distinguish words such as the following, which are etymologically related, but phonetically distinct:উদ্‌যাপন ụd͓‌ýɑpnউদ্যান ụd͓ýɑn

If a visible virama is wanted but not what the font does by default, it is possible to force it by inserting a ZWNJ character after the virama (see Formatting characters).
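As a sketch of that technique, the Python function below inserts a ZWNJ after each virama that sits between two consonants. The function name and the consonant range are assumptions made for the example. It is deliberately blunt: it affects every conjunct in the string it is given, so in practice it would be applied only to the specific cluster that needs a visible virama. It also ignores consonants that carry a nukta.

```python
import re

VIRAMA, ZWNJ = "\u09CD", "\u200C"
# Assumed consonant range: U+0995 KA .. U+09B9 HA.
CONSONANT = "[\u0995-\u09B9]"

_CLUSTER = re.compile(f"({CONSONANT}{VIRAMA})(?={CONSONANT})")

def force_visible_virama(text):
    """Insert ZWNJ after any virama that directly joins two consonants."""
    return _CLUSTER.sub(lambda match: match.group(1) + ZWNJ, text)

# DA + VIRAMA + YA becomes DA + VIRAMA + ZWNJ + YA.
forced = force_visible_virama("\u09A6\u09CD\u09AF")
```

Running the function a second time leaves the string unchanged, because the virama is then followed by the ZWNJ rather than by a consonant.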

Formatting characters

‌U+200C ZERO WIDTH NON-JOINER (ZWNJ) can be used to force the production of a visible virama, rather than a half-form (see Visible virama). It can also be used to prevent the formation of vowel ligatures (see Vowel ligatures).

‍U+200D ZERO WIDTH JOINER (ZWJ) is used to produce special joining forms for RA (see RA coda followed by a consonant) and YA (see Jô-phôla).

Consonant length

There are a number of ways of producing a lengthened consonant sound in Bangla.

A straightforward approach is to duplicate the consonant sound in conjunct form. For example, a long l can be written ল্লU+09B2 LETTER LA + U+09CD SIGN VIRAMA + U+09B2 LETTER LA, eg. ঝিল্লি d͡ʒʰilli grasshopper

Another common way of doubling the length of a consonant is to use a conjunct ending with U+09AC LETTER BA or U+09AE LETTER MA, eg. ভস্ম bʰɔ̃ʃʃo ashes বিশ্ব biʃʃo universe

The jô-phôla (্যU+09CD SIGN VIRAMA + U+09AF LETTER YA) can also lengthen the consonant it follows, eg. জন্য d͡ʒɔnno for

U+0983 SIGN VISARGA can also lengthen the following consonant, with no aspiration, eg. নিঃশব্দ niʃʃɔbdo silence

Consonant sounds to characters

This section maps Bengali consonant sounds to common graphemes in the Bengali orthography.

Sounds listed as 'infrequent' are allophones, or sounds used for foreign words, etc. Light coloured characters occur infrequently.

p

consonant U+09AA LETTER PA

pʰ~pf

consonant U+09AB LETTER PHA

b

consonant U+09AC LETTER BA

t

consonant U+09A4 LETTER TA

final consonant U+09CE LETTER KHANDA TA Coda.

consonant U+09A5 LETTER THA

t͡ʃ

consonant U+099A LETTER CA

t͡ʃʰ

consonant U+099B LETTER CHA

d

consonant U+09A6 LETTER DA

consonant U+09A7 LETTER DHA

d͡ʒ

consonant U+099C LETTER JA

consonant/consonant lengthener U+09AF LETTER YA when word-initial.

d͡ʒʰ

consonant U+099D LETTER JHA

ʈ

consonant U+099F LETTER TTA

ʈʰ

consonant U+09A0 LETTER TTHA

ɖ

consonant U+09A1 LETTER DDA

ɖʰ

consonant U+09A2 LETTER DDHA

k

consonant U+0995 LETTER KA

ɡ

consonant U+0997 LETTER GA

conjunct জ্ঞU+099C LETTER JA + U+09CD SIGN VIRAMA + U+099E LETTER NYA when word-initial.

ɡɡ

consonant জ্ঞU+099C LETTER JA + U+09CD SIGN VIRAMA + U+099E LETTER NYA when between vowels.

ɡʰ

consonant U+0998 LETTER GHA

f

consonant U+09AB LETTER PHA

v

consonant U+09AD LETTER BHA sometimes.

s

consonant U+09B8 LETTER SA in Bangladesh, or in English words that use the sound s.

consonant U+09B6 LETTER SHA in Bangladesh, or in English words that use the sound s.

z

consonant U+099C LETTER JA especially in Bangladesh, and with words of Perso-Arabic origin.

ʃ

consonant U+09B6 LETTER SHA

consonant U+09B7 LETTER SSA

consonant U+09B8 LETTER SA

h

consonant U+09B9 LETTER HA

final consonant/consonant lengthener U+0983 SIGN VISARGA Coda.

Other

w

consonant+nukta ওয়U+0993 LETTER O + U+09AF LETTER YA + U+09BC SIGN NUKTA (light, like French 'oui') between o...a, eg. দাওয়াত d̪awat̪ invitation.

r ɾ

consonant U+09B0 LETTER RA

ri

vocalic vowel sign U+09C3 VOWEL SIGN VOCALIC R

independent vocalic U+098B LETTER VOCALIC R

ɽ
ɽʱ
l

consonant U+09B2 LETTER LA

j

consonant+nukta য়U+09AF LETTER YA + U+09BC SIGN NUKTA between i...e, a...u, or e...e.

Other features

Other letters

Besides the vowels and consonants described above, the Unicode Bengali block contains the following letters. They don't appear to be commonly used in Bangla.


both
 0980
 09FC

Encoding choices

This section offers advice about characters or character sequences to avoid, and what to use instead. It takes into account the relevance of Unicode Normalisation Form D (NFD) and Unicode Normalisation Form C (NFC).

Although usage is recommended here, content authors may well be unaware of such recommendations. Therefore, applications should look out for the non-recommended approach and treat it the same as the recommended approach wherever possible.

Canonically equivalent circumgraphs

The 2 circumgraphs can be written as a single character, or as two characters (in decomposed text).

The single code point per vowel sign is the form in common use for Bengali. The parts are separated in Unicode Normalisation Form D (NFD), and recomposed in Unicode Normalisation Form C (NFC), so both approaches are canonically equivalent.

Whichever approach is used, the vowel signs must be typed and stored after the consonant characters they surround. In the case of decomposed vowel signs, the order is also important and must be as shown above.

Precomposed Decomposed
U+09CB VOWEL SIGN O োU+09C7 VOWEL SIGN E + U+09BE VOWEL SIGN AA
U+09CC VOWEL SIGN AU ৌU+09C7 VOWEL SIGN E + U+09D7 AU LENGTH MARK

U+09D7 AU LENGTH MARK is only used in this combination and never used on its own.8
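The canonical equivalence of the two spellings can be confirmed with the standard unicodedata module, as in this short Python sketch. The helper canonically_equal is a name made up for the illustration.

```python
import unicodedata

VOWEL_SIGN_O = "\u09CB"
VOWEL_SIGN_AU = "\u09CC"

# NFD separates each circumgraph into its two parts.
o_parts = unicodedata.normalize("NFD", VOWEL_SIGN_O)    # U+09C7 + U+09BE
au_parts = unicodedata.normalize("NFD", VOWEL_SIGN_AU)  # U+09C7 + U+09D7

def canonically_equal(first, second):
    """Compare two strings after normalising both to NFC."""
    return unicodedata.normalize("NFC", first) == unicodedata.normalize("NFC", second)

# KA with precomposed O, and KA with the decomposed pair.
ko_precomposed = "\u0995\u09CB"
ko_decomposed = "\u0995\u09C7\u09BE"
print(canonically_equal(ko_precomposed, ko_decomposed))
```

Comparing raw strings with == would report the two spellings as different, which is why normalising before comparison is useful for search and matching.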

Sequences to avoid

The following atomic characters look as if they could be composed of parts, but in fact there is no equivalence during normalisation, and so the atomic characters only should be used.

The single code point on the left should be used, and not the sequence on the right, because they are not made the same by normalisation, and they are not semantically equivalent. Using the right-hand sequence will cause searches and machine understanding of the data to fail.

Use U+0986 LETTER AA, not U+0985 LETTER A + U+09BE VOWEL SIGN AA.
Use U+09E0 LETTER VOCALIC RR, not U+098B LETTER VOCALIC R + U+09C3 VOWEL SIGN VOCALIC R.
Use U+09E1 LETTER VOCALIC LL, not U+098C LETTER VOCALIC L + U+09E2 VOWEL SIGN VOCALIC L.
Use U+09CE LETTER KHANDA TA, not U+09A4 LETTER TA + U+09CD SIGN VIRAMA + U+200D ZERO WIDTH JOINER.

This information draws on the DoNotEmit tables.
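Because normalisation does not help here, an application may want to scan for the discouraged sequences itself. The Python sketch below is one possible approach; the table is copied from the list above, and the names DO_NOT_EMIT and find_discouraged are invented for the example.

```python
import unicodedata

# Discouraged sequence -> preferred single code point.
DO_NOT_EMIT = {
    "\u0985\u09BE": "\u0986",        # A + VOWEL SIGN AA -> LETTER AA
    "\u098B\u09C3": "\u09E0",        # VOCALIC R + SIGN VOCALIC R -> VOCALIC RR
    "\u098C\u09E2": "\u09E1",        # VOCALIC L + SIGN VOCALIC L -> VOCALIC LL
    "\u09A4\u09CD\u200D": "\u09CE",  # TA + VIRAMA + ZWJ -> KHANDA TA
}

def find_discouraged(text):
    """Return sorted (index, sequence, preferred) tuples for each hit."""
    hits = []
    for sequence, preferred in DO_NOT_EMIT.items():
        start = text.find(sequence)
        while start != -1:
            hits.append((start, sequence, preferred))
            start = text.find(sequence, start + 1)
    return sorted(hits)

# NFC leaves A + VOWEL SIGN AA as two code points; it is not LETTER AA.
still_distinct = unicodedata.normalize("NFC", "\u0985\u09BE") != "\u0986"
```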

Nuktas

The way the Unicode Standard recommends to type and store graphemes with nuktas is slightly inconsistent for Bengali. Here we look at alternative strategies for all uses of the nukta in the Bengali block (usage recommendations for Bangla are given in the section Repertoire extension), and consider the effects of normalising the text using Unicode Normalisation Form D (NFD), and Normalisation Form C (NFC).

For the following alternatives the decomposed form is recommended by the Unicode Standard. NFC does not recombine the parts into precomposed characters. Instead, both NFC and NFD normalisation produce decomposed forms.

Decomposed (recommended) Precomposed
ড়U+09A1 LETTER DDA + U+09BC SIGN NUKTA U+09DC LETTER RRA
ঢ়U+09A2 LETTER DDHA + U+09BC SIGN NUKTA U+09DD LETTER RHA
য়U+09AF LETTER YA + U+09BC SIGN NUKTA U+09DF LETTER YYA

In the next case, the Unicode Standard recommends using the precomposed form. Neither form is converted into the other by normalisation, so they are not equivalent. It is therefore best to use the precomposed form only, so that text matches other text as expected.

Use Do not use
U+09B0 LETTER RA ব়U+09AC LETTER BA + U+09BC SIGN NUKTA

In practice, it's hard to envisage content authors being aware of, let alone respecting, rules about whether they should use precomposed or decomposed forms. Keyboards or other input mechanisms, and applications that automatically normalise, may guide users towards the recommended practice, but it's likely that Bengali text will always contain a mixture of forms for these graphemes, and matching algorithms will need to be prepared to equate them all.
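One way a matching algorithm might equate the alternatives is to fold text before comparing it. The following Python sketch is an assumption about how that could be done, not a standard algorithm, and the function name fold_for_matching is hypothetical. It relies on NFC to bring the three nukta letters to their decomposed form, and adds an explicit mapping for BA + NUKTA, which normalisation does not touch.

```python
import unicodedata

def fold_for_matching(text):
    """Reduce alternative nukta spellings to a single form for comparison."""
    # NFC decomposes U+09DC, U+09DD and U+09DF, so the precomposed and
    # decomposed spellings of those letters become identical.
    text = unicodedata.normalize("NFC", text)
    # Assumption based on the table above: treat BA + NUKTA as RA.
    return text.replace("\u09AC\u09BC", "\u09B0")

same_rra = fold_for_matching("\u09DC") == fold_for_matching("\u09A1\u09BC")
same_ra = fold_for_matching("\u09AC\u09BC") == fold_for_matching("\u09B0")
```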

Codepoint sequences

Nuktas must immediately follow the base consonant they modify.

When 2 vowel signs are used for a circumgraph, the encoded order of the combining marks should match the displayed order, left to right.
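The nukta rule can be tested mechanically. This Python sketch reports any U+09BC SIGN NUKTA that does not directly follow a consonant; the function names and the consonant range are assumptions made for the example.

```python
NUKTA = "\u09BC"

def is_consonant(char):
    # Assumed consonant range: U+0995 KA .. U+09B9 HA.
    return "\u0995" <= char <= "\u09B9"

def misplaced_nuktas(text):
    """Return indexes of nuktas not immediately preceded by a consonant."""
    return [
        i
        for i, char in enumerate(text)
        if char == NUKTA and (i == 0 or not is_consonant(text[i - 1]))
    ]

# DDA + NUKTA + AA: the nukta directly follows the consonant.
correct_order = "\u09A1\u09BC\u09BE"
# DDA + AA + NUKTA: the vowel sign wrongly comes first.
wrong_order = "\u09A1\u09BE\u09BC"

print(misplaced_nuktas(correct_order), misplaced_nuktas(wrong_order))
```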

Numbers, dates, currency, etc.

This section describes typographic features related to digits, dates, currencies, etc.

Bengali has a set of native digits, which are used regularly in text. They are decimal-based.


10
009E6
109E7
209E8
309E9
409EA
509EB
609EC
709ED
809EE
909EF

See also the section Counters below.
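Since the native digits are ordinary decimal digits, converting between them and ASCII digits is straightforward. The Python sketch below shows one approach; the function names are invented for the illustration. It relies on the fact that Python's int() accepts any Unicode decimal digit.

```python
# Map ASCII 0-9 to Bengali digits U+09E6..U+09EF.
BENGALI_DIGITS = "".join(chr(0x09E6 + value) for value in range(10))
ASCII_TO_BENGALI = str.maketrans("0123456789", BENGALI_DIGITS)

def to_bengali_digits(number):
    """Render an integer using Bengali digits."""
    return str(number).translate(ASCII_TO_BENGALI)

def from_bengali_digits(text):
    """Parse a string of Bengali digits as an integer."""
    return int(text)

year = to_bengali_digits(2025)
value = from_bengali_digits("\u09E7\u09E8\u09E9")
```

This only converts the digits themselves. Digit grouping and other locale conventions are outside the scope of the sketch.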

Currency


09F3

U+09F3 RUPEE SIGN is the Bengali rupee sign.

There are also a number of currency symbols, used in older texts, including U+09F2 RUPEE MARK and the following currency denominator signs.


8
unused09F2
archaic09F4
archaic09F5
archaic09F6
archaic09F7
archaic09F8
archaic09F9
unused09FB

These were used in an additive/subtractive system for specifying the number of ānā in the Bengali notation for currency used up to 1957, eg. ৷৷৶৹ 11 ānā (11 ana); ৸৶৹ 15 ānā (15 ana). There are 16 ana in one rupee, and the system works in multiples of 4. For a detailed explanation of usage, see [Pandey].
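The two examples above can be reproduced with a small additive calculation. The Python sketch below is only an approximation: the per-sign values are assumptions inferred from the Unicode character names (CURRENCY NUMERATOR ONE to FOUR) and from the two examples in the text, it treats U+09F9 as a denominator marker that adds nothing, and it does not model any subtractive behaviour. See [Pandey] for the actual rules.

```python
# Assumed value of each sign, in ana.
ANA_VALUES = {
    "\u09F4": 1,
    "\u09F5": 2,
    "\u09F6": 3,
    "\u09F7": 4,
    "\u09F8": 12,
}
DENOMINATOR = "\u09F9"  # marks the amount as sixteenths; adds nothing

def ana_total(notation):
    """Sum the assumed ana values of the signs in a notation string."""
    return sum(ANA_VALUES[char] for char in notation if char != DENOMINATOR)

eleven = ana_total("\u09F7\u09F7\u09F6\u09F9")  # 4 + 4 + 3
fifteen = ana_total("\u09F8\u09F6\u09F9")       # 12 + 3
```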

Text direction

Text is normally written horizontally, left to right.

Glyph shaping & positioning

This section describes typographic features related to font/writing styles, cursive text, context-based shaping, context-based positioning, letterform slopes, weights & italics, and case & other character transforms.

You can experiment with examples using the Bengali character app.

Writing styles

How are fonts grouped into recognisable writing styles? How is each writing style used?

The Noto Sans Bengali font eliminates 'knots' from the letter shapes. Some people feel that this is incorrect for a Bengali font.§ The knots are small round elements attached to the strokes for a letter.

System/Noto fonts that have knots include Tiro Bangla, Noto Serif Bengali, Bangla MN, Bangla Sangam MN, Shonar Bangla, and Vrinda. Those without knots include Noto Sans Bengali, Baloo Da 2, and Kohinoor Bangla.

অ আ ই ঈ এ ঐ ও ঔ ক খ ঞ ট ঢ ণ ত থ ধ ন ফ ভ ম ল শ হ ঢ় ১ ৩ ৯ অ আ ই ঈ এ ঐ ও ঔ ক খ ঞ ট ঢ ণ ত থ ধ ন ফ ভ ম ল শ হ ঢ় ১ ৩ ৯
Comparison of letters in Noto Sans Bengali (top), without knots, and Noto Serif Bengali (bottom), with knots.

Context-sensitive shaping & positioning

Are special glyph forms needed, depending on the context in which a character is used? Do glyphs interact in some circumstances? Are there requirements to position diacritics or other items specially, depending on context? Does the script have multiple diacritics competing for the same location relative to the base?

Bengali fonts need to adapt glyphs based on their context.

Principal areas where context-sensitive shaping is required include conjunct formation (see Consonant clusters) and vowel ligation (see Vowel ligatures).

Positioning of glyphs is also sometimes context-sensitive, particularly where multiple diacritics are applied to a single base. Pre-base vowel signs and circumgraphs require glyphs to be positioned relative to the base consonant(s) to which they are applied.

The rest of this section provides examples of context-sensitive shaping and positioning.

Vowel ligatures

Vowel signs (particularly U) may form ligatures with a preceding base consonant. Figure 13 shows ligated (top) and non-ligated (bottom) forms for several combinations. In certain contexts it may be less appropriate to ligate (eg. newspapers and modern typefaces). Both forms are equivalent in every way but visually.8

রু র‌ু রূ র‌ূ হৃ হ‌ৃ হু হ‌ু ন্তু ন্ত‌ু শু শ‌ু গু গ‌ু
Ligated (top) and non-ligated (bottom) forms for several combinations of consonant+vowel.

See a matrix of consonants followed by vowel signs for Bengali.

The table below indicates which sound combinations are rendered by default as ligatures by which fonts. The results cover Noto fonts, and system fonts for macOS (Sequoia) and Windows 11. The shapes resemble those just above. The link takes you to another page which shows all combinations and highlights the shapes indicated here.

Font gu shu hu hri ntu ru link
Noto Serif Bengali link
Noto Sans Bengali link
Bangla MN link
Bangla Sangam MN link
Tiro Bangla link
Baloo Da 2 link
Kohinoor Bangla link
November Bangla Traditional link
Shonar Bangla ✓* ✓* link
Vrinda ✓* ✓* link

* The Windows fonts (Shonar Bangla, Vrinda) create non-ligated forms for Assamese RA by default. All the other fonts create ligated forms, like Bengali RA.

Observation: The shapes in the Bangla MN and Bangla Sangam MN fonts for are identical to those used in other fonts for hri. Is this a mistake in the font?

The default behaviour of a given font can be modified using ‌U+200C ZERO WIDTH NON-JOINER or ‍U+200D ZERO WIDTH JOINER before the vowel to produce the simple form.8 In principle, if a font produces a ligature by default, and if it has the necessary logic, the ZWNJ should produce the non-ligated form, and vice versa for the ZWJ. In practice, the behaviour varies.

In general use, these joiner characters are rarely needed. A choice of default ligated or non-ligated forms can be made by choosing an appropriate font.
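Because the ligated and non-ligated forms differ only visually, an application that searches or compares text may want to ignore joiners placed between a consonant and a vowel sign. The Python sketch below shows one way to do so; the function name and the code point ranges are assumptions made for the example. Joiners that follow a virama are left alone, since those affect conjunct formation instead.

```python
import re

# ZWNJ or ZWJ between a consonant (assumed U+0995..U+09B9)
# and a vowel sign (assumed U+09BE..U+09CC).
_JOINER_BEFORE_VOWEL = re.compile(
    "(?<=[\u0995-\u09B9])[\u200C\u200D](?=[\u09BE-\u09CC])"
)

def strip_vowel_joiners(text):
    """Remove joiners that only control consonant+vowel ligation."""
    return _JOINER_BEFORE_VOWEL.sub("", text)

ru_default = "\u09B0\u09C1"           # RA + VOWEL SIGN U
ru_with_zwnj = "\u09B0\u200C\u09C1"   # RA + ZWNJ + VOWEL SIGN U
matches = strip_vowel_joiners(ru_with_zwnj) == ru_default
```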

The next table shows the effect of adding ZWJ and ZWNJ between the consonant and vowel for the combinations gu, shu, hu, hri, Bengali ru, , and Assamese ru, . Without a joiner these fonts all render the sequence as ligated, except for Shonar Bangla and Vrinda, where the Assamese ru, is non-ligated. Blink and Gecko browsers appear to render these combinations differently in a textarea element compared to regular HTML, so results are shown for both. Results also indicate the browser engine: B for Blink (Chrome, Edge, ...), G for Gecko (Firefox, ...), and W for Webkit (Safari, ...). L indicates Ligated, NL means Non-Ligated. L indicates that the sequence breaks, with a dotted circle between consonant and vowel.

Font | ZWJ: HTML | ZWJ: Textarea | ZWNJ: HTML | ZWNJ: Textarea
Noto Serif Bengali | B: NL; G: NL; W: NL | B: NL; G: NL; W: NL | B: NL; G: NL; W: NL | B: L; G: NL; W: NL
Noto Sans Bengali | B: L; G: L; W: L | B: L; G: L; W: L | B: NL; G: NL; W: NL | B: L; G: NL; W: NL
Bangla MN | B: NL; G: NL; W: NL | B: NL; G: NL; W: NL | B: NL; G: NL; W: NL | B: NL; G: NL; W: NL
Bangla Sangam MN | B: NL; G: NL; W: NL | B: NL; G: NL; W: NL | B: NL; G: NL; W: NL | B: NL; G: NL; W: NL
Tiro Bangla | B: NL; G: NL; W: ?** | B: NL; G: NL; W: ?** | B: NL; G: NL; W: ?** | B: L; G: NL; W: ?**
Baloo Da 2 | B: NL EX ru, ; G: NL EX ru, ; W: ?** | B: NL EX ru, ; G: NL EX ru, ; W: ?** | B: NL; G: NL; W: ?** | B: L; G: NL; W: ?**
Kohinoor Bangla | B: L; G: L; W: L EX ru, * | B: L; G: L; W: L EX ru, * | B: NL; G: NL; W: NL | B: L; G: NL; W: NL
November Bangla Traditional | B: NL; G: NL; W: ?** | B: NL; G: NL; W: ?** | B: NL; G: NL; W: ?** | B: L; G: NL; W: ?**
Shonar Bangla | B: L EX hri; G: L EX hri, as_ru/ū | B: L EX hri, as_ru/ū, as_ru/ū; G: L EX hri, as_ru/ū | B: NL; G: NL | B: L; G: NL
Vrinda | B: L EX hri, as_ru/ū; G: L EX hri, as_ru/ū | B: L EX hri, as_ru/ū; G: L EX hri, as_ru/ū | B: NL; G: NL | B: L; G: NL
Nirmala UI | B: NL; G: NL | B: NL; G: NL | B: NL; G: NL | B: L; G: NL

?** indicates fonts that are not rendered on Safari.

as_ru/ū indicates Assamese RA.

Typographic units

Word boundaries

Are words separated by spaces, or other characters? Are there special requirements when double-clicking on the text? Are words hyphenated?

The concept of 'word' is difficult to define in any language (see What is a word?). Here, a word is a vaguely-defined, but recognisable semantic unit that is typically smaller than a phrase and may comprise one or more syllables.

Words are separated by spaces.

Graphemes

A grapheme is a user-perceived unit of text. Text operations that use graphemes as a unit of text include line-breaking, forward deletion, cursor movement & selection, character counts, text spacing, text insertion, justification, case conversions, and sorting. The Unicode Standard uses generalised rules to define 'grapheme clusters', which approximate the likely grapheme boundaries in a writing system; however, they don't work well with many complex scripts.

The term orthographic syllable is not clearly defined in the Unicode Standard. In the orthography notes on this site we define it to mean a typographic unit that includes more than one grapheme cluster. This is commonly the case for Brahmi-derived scripts, such as for Devanagari conjuncts, or Balinese stacks. Orthographic syllables do not correspond to phonetic syllables.

Changes introduced in Unicode 15.1 mean that grapheme clusters alone are now sufficient to represent typographic units in Bengali. Conjuncts are common and must not be split apart by edit operations that visually change the text (such as letter-spacing, first-letter highlighting, and in-word line breaking). Before those changes, such operations needed to segment the text using orthographic syllables, which string grapheme clusters together with a virama.

The Bengali virama (hasant) is U+09CD SIGN VIRAMA, which has an Indic Syllabic Category of Virama.

Simple grapheme clusters

Base ZW(N)J? Combining_mark* ZW(N)J?

Combining marks may include zero or more of the following types of character.

  1. Nukta [1] (see Repertoire extension) Only one per grapheme cluster, typed and stored immediately after the base consonant.
  2. Dependent vowels [10] (see Plain vowels and Vocalics) Usually a single code point, but in decomposed text can be two (see Circumgraphs).
  3. Nasalisation mark [1] (see Nasalisation) Occurs over an independent vowel, or over a consonant when it is followed by a vowel sign. It is nevertheless typed and stored after any vowel signs.
  4. Final consonant marks [2] (see Finals) One of 2 possible combining marks, at the end of a grapheme cluster sequence. May also occur after independent vowels.
  5. Virama (hasant) (see Consonant clusters and Inherent vowel suppression) Normally occurs immediately after a consonant (and optional nukta) at the beginning of a cluster, but also occurs after independent vowels, particularly when writing the sound æ. It may also occur after RA+ZWJ to force a particular rendered shape for RA (see below).

In some cases, a ZWJ is inserted between RA and a following hasant + YA in order to specify special shaping rules (see RA coda followed by a consonant).

A base consonant may be followed by ZWNJ before vowel sign code points where the author wants to prevent ligation of the following vowel sign (see Vowel ligatures). A ZWNJ may also be used after a virama to prevent conjunct formation and force the virama to be rendered visibly to the reader (see Visible virama).

The following examples show a variety of grapheme clusters:

Click on the text version of these words to see more detail about the composition.

বালতি ˈbal.t̪iˑ bucket
ইংরেজ iŋred͡ʒ English
দুঃখও duk.kʰoo̯ sadness
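
The pattern Base ZW(N)J? Combining_mark* ZW(N)J? can be sketched as a regular expression. This is an approximation under stated assumptions, not the Unicode segmentation algorithm: the code point classes below are hand-written for the Bengali block, and the names BASE, MARK, JOINER and simple_clusters are hypothetical. Anything that doesn't match the pattern falls through as a single code point.

```python
import re

# Hand-written Bengali classes (an assumption of this sketch).
CONSONANT = "\u0995-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09DC\u09DD\u09DF\u09F0\u09F1"
INDEPENDENT_VOWEL = "\u0985-\u098C\u098F\u0990\u0993\u0994\u09E0\u09E1"
BASE = f"[{CONSONANT}{INDEPENDENT_VOWEL}\u09CE]"  # U+09CE is KHANDA TA
# Candrabindu, anusvara, visarga, nukta, vowel signs, virama, AU length mark.
MARK = "[\u0981-\u0983\u09BC\u09BE-\u09C4\u09C7\u09C8\u09CB-\u09CD\u09D7\u09E2\u09E3]"
JOINER = "[\u200C\u200D]"  # ZWNJ or ZWJ

SIMPLE_CLUSTER = re.compile(f"{BASE}{JOINER}?{MARK}*{JOINER}?|.", re.DOTALL)


def simple_clusters(text):
    """Split text into simple grapheme clusters, per the pattern above."""
    return SIMPLE_CLUSTER.findall(text)


if __name__ == "__main__":
    for word in ("\u09AC\u09BE\u09B2\u09A4\u09BF",   # bucket
                 "\u0987\u0982\u09B0\u09C7\u099C",   # English
                 "\u09A6\u09C1\u0983\u0996\u0993"):  # sadness
        print(" | ".join(simple_clusters(word)))
```

Because the virama is treated here as just another combining mark, this pattern on its own splits a conjunct after the virama.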

Larger typographic units

(Consonant Nukta? Virama)+ Simple_grapheme_cluster

Bengali commonly stacks or conjoins glyphs, to form conjuncts (see Consonant clusters). The conjuncts represent consonant clusters or gemination.

Editorial operations that change the visual appearance of the text, such as letter-spacing, first-letter highlighting, line-breaking, and justification, should never split conjunct forms apart. For this reason, the extended grapheme cluster rules were modified in Unicode 15.1 to cover a number of named scripts, including Bengali. Since Bengali's U+09CD SIGN VIRAMA has an Indic_Conjunct_Break property value of Linker, the hasant effectively extends the grapheme cluster that began the conjunct to include the grapheme cluster that follows it.

The following are examples.

Click on the text version of these words to see more detail about the composition.

ঝিল্লি d͡ʒʰilli grasshopper
ক্যম্পাস kyæmpas campus
ক্ষণ kʃon moment
কুর্তা kurtɑ Indian shirt
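
The pattern (Consonant Nukta? Virama)+ Simple_grapheme_cluster can be sketched in the same way. This is a rough illustration rather than an implementation of the Unicode 15.1 rules: the classes are hand-written for Bengali, the name typographic_units is hypothetical, and the prefix is made optional (* instead of +) so that one expression also matches simple grapheme clusters.

```python
import re

# Hand-written Bengali classes (an assumption of this sketch).
CONSONANT = "\u0995-\u09A8\u09AA-\u09B0\u09B2\u09B6-\u09B9\u09DC\u09DD\u09DF\u09F0\u09F1"
INDEPENDENT_VOWEL = "\u0985-\u098C\u098F\u0990\u0993\u0994\u09E0\u09E1"
BASE = f"[{CONSONANT}{INDEPENDENT_VOWEL}\u09CE]"
MARK = "[\u0981-\u0983\u09BC\u09BE-\u09C4\u09C7\u09C8\u09CB-\u09CD\u09D7\u09E2\u09E3]"
JOINER = "[\u200C\u200D]"
NUKTA = "\u09BC"
VIRAMA = "\u09CD"

# (Consonant Nukta? Virama)* Base ZW(N)J? Combining_mark* ZW(N)J?
UNIT = re.compile(
    f"(?:[{CONSONANT}]{NUKTA}?{VIRAMA})*{BASE}{JOINER}?{MARK}*{JOINER}?|.",
    re.DOTALL,
)


def typographic_units(text):
    """Split text so that consonant + virama stays with the following cluster."""
    return UNIT.findall(text)


if __name__ == "__main__":
    for word in ("\u099D\u09BF\u09B2\u09CD\u09B2\u09BF",                   # conjunct LLA
                 "\u0995\u09C1\u09B0\u09CD\u09A4\u09BE",                   # RA + TA
                 "\u09AB\u09CD\u200C\u09B2\u09CD\u09AF\u09BE\u099F"):      # ZWNJ after virama
        print(" | ".join(typographic_units(word)))
```

In this sketch a ZWNJ directly after the virama ends the unit, because the prefix requires the virama to be followed immediately by a base character.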

Note that consonant clusters aren't always rendered as conjuncts, and in those cases the individual letters remain as separate graphemes. For example, consonants without an inherent vowel may be written using a consonant letter on its own at the end of a word or in other locations (see Inherent vowel suppression).

Click on the text version of these words to see more detail about the composition.

রিকশা rik.ʃɑ rickshaw
খাননা kʰɑnnɑ not eat

On the occasions when a virama needs to be visible even though it is followed by another base, an invisible character must be added to prevent it joining with the following base. ‌U+200C ZERO WIDTH NON-JOINER can achieve that.

ফ্‌ল্যাট pʰlæʈ flat
লান্‌চ lɑnt͡ʃ lunch

Complicating factors

Behaviour is font-dependent. If a font doesn't have a conjunct form for a particular combination of characters it will make the virama visible.

What's important to note here is that it is normally possible to break a line after the grapheme cluster containing the virama when the virama is visible. This is currently difficult to manage because the decision as to whether the text is segmented into 2 graphemes or one depends only on the capabilities of the font used (ie. the rendered result); the code point sequence is identical for both cases, and gives no clues to which approach to segmentation is applicable.

Visible viramas can also affect vowel sign positioning. For the purposes of illustration, see Figure 14, where the placement of the pre-base vowel varies. In the conjunct form on the left, the vowel sign is rendered to the left of the whole conjunct. If the sequence is not rendered as a conjunct, as in the second example, the pre-base glyph precedes the TA, not the SA. (The underlying sequence of characters is the same in both cases.)

কুস্তি
কুস্‌তি
Placement of pre-base vowel glyphs.

The grapheme cluster rules introduced in Unicode 15.1 mean that a sequence with a visible virama is not segmented after the virama. This is not ideal, but given the difficulty in distinguishing between situations where the virama is shown and those where it is not, where a choice has to be made it is probably better that the rules keep sequences together when the virama is invisible.

Browser behaviour

Test in your browser. Some of the words below test units that equate to simple grapheme clusters, and others include conjuncts. First, the text is displayed in a contenteditable paragraph, then in a textarea. Results are reported for Gecko (Firefox), Blink (Chrome), and WebKit (Safari) on a Mac.

ইংরেজ ক্যম্পাস ফ্‌ল্যাট কুর্তা

Cursor movement. Move the cursor through the text.
Gecko, Blink, and WebKit browsers step through the text using grapheme clusters, including the new rules around conjuncts (ie. they step over a stack and all associated combining characters in one jump), except that WebKit doesn't include the ZWNJ in the grapheme cluster it ends, requiring an extra jump to get past it (although the cursor doesn't move).

Selection. Place the cursor next to a character and hold down shift while pressing an arrow key.
The behaviour is the same as for cursor movement.

Deletion. Forward deletion works in the same way as cursor movement. The backspace key deletes code point by code point, for all browsers.

Line-break. See this test. The CSS sets the value of the line-break property to anywhere. Change the size of the box to slowly move the line break point.
The behaviour is the same as for cursor movement.

Punctuation & inline features

This section describes typographic features related to word boundaries, phrase & section boundaries, bracketed text, quotations & citations, emphasis, abbreviation, ellipsis & repetition, inline notes & annotations, other punctuation, and other inline text decoration.

Phrase & section boundaries

What characters are used to indicate the boundaries of phrases, sentences, and sections?



Bangla uses a mixture of ASCII and Bengali punctuation.

phrase

,U+002C COMMA

;U+003B SEMICOLON

:U+003A COLON

sentence

.U+002E FULL STOP

।U+0964 DEVANAGARI DANDA

?U+003F QUESTION MARK

!U+0021 EXCLAMATION MARK

The danda, U+0964 DEVANAGARI DANDA, is used for sentence final punctuation.

Observation: I haven't seen much evidence for the use of the double danda, U+0965 DEVANAGARI DOUBLE DANDA.

Western punctuation, such as commas, semicolons, colons, quotation marks, and hyphens, is also used quite commonly.

Bracketed text



Bengali commonly uses ASCII parentheses to insert parenthetical information into text.

  | start | end
standard | ( U+0028 LEFT PARENTHESIS | ) U+0029 RIGHT PARENTHESIS

Quotations & citations

What characters are used to indicate quotations? Do quotations within quotations use different characters? What characters are used to indicate dialogue? Are the same mechanisms used to cite words, or for scare quotes, etc? What about citing book or article names?



Bengali texts typically use quotation marks. Of course, due to keyboard design, quotations may also be surrounded by ASCII double and single quote marks.

  | start | end
initial | “ U+201C LEFT DOUBLE QUOTATION MARK | ” U+201D RIGHT DOUBLE QUOTATION MARK
nested | ‘ U+2018 LEFT SINGLE QUOTATION MARK | ’ U+2019 RIGHT SINGLE QUOTATION MARK

Emphasis

How are emphasis and highlighting achieved? If lines are drawn alongside, over or through the text, do they need to be a special distance from the text itself? Is it important to skip characters when underlining, etc? How do things change for vertically set text?

Italicisation, bolding, and underlining are not traditionally features of Bengali text.

Abbreviation, ellipsis & repetition

What characters are used to indicate abbreviation, ellipsis & repetition?



The bisɔrgô U+0983 SIGN VISARGA is sometimes used to mark initial abbreviations.

A sign called urdha-comma can be used to indicate truncation of words, eg. কʼরে kôʼre after, and ʼপরে ʼpôre above. The Unicode Standard recommends use of ʼU+02BC MODIFIER LETTER APOSTROPHE.9,460

Observation: Wikipedia seems to use a normal apostrophe.

The Unicode Bengali block also has the punctuation U+09FD ABBREVIATION SIGN. It is possible that U+2026 HORIZONTAL ELLIPSIS is used.

Observation: Information is needed about how it is used. It doesn't appear to be in common use.

Other inline features

Any other form of highlighting or marking of text, such as underlining, numeric overbars, etc. What characters or methods (eg. text decoration) are used to convey information about a range of text? If lines are drawn alongside, over or through the text, do they need to be a special distance from the text itself? Is it important to skip characters when underlining, etc? How do things change for vertically set text?

Death marker



U+09FA ISSHAR is used alongside the names of deceased persons.

Line & paragraph layout

This section describes typographic features related to line breaking & hyphenation, text alignment & justification, text spacing, baselines, line height, counters, lists, and styling initials.

Line breaking

Are there special rules about the way text wraps when it hits the end of a line? Does line-breaking wrap whole 'words' at a time, or characters, or something else (such as syllables in Tibetan and Javanese)? What characters should not appear at the end or start of a line, and what should be done to prevent that? Is hyphenation used, or something else? What rules are used? What difficulties exist?

Bengali is preferably wrapped at word boundaries.

In-word line-breaks

'Hyphenation' here refers to an extra set of rules applied after the basic line-break algorithm to split words at syllable or morphological boundaries in order to improve the layout of a paragraph. Hyphenation may or may not be indicated using a visual marker at the end or start of a line, however it is commonly marked by a hyphen or other glyph.

Bengali text can be hyphenated during line wrap, though it is not very common (unlike several south Indian scripts). This is partly because Bengali contains mostly short words.7

Hyphenation adds a hyphen at the end of the line when a word is broken.

Line-edge rules

As in almost all writing systems, certain punctuation characters should not appear at the end or the start of a line. The Unicode line-break properties help applications decide whether a character should appear at the start or end of a line.

Show line-breaking properties for characters in the modern Bangla orthography.

The following list gives examples of typical behaviours for some of the characters used in modern Bangla. Context may affect the behaviour of some of these and other characters.

Click/tap on the Bangla characters to show what they are.

  • “ ‘ (   should not be the last character on a line.
  • ” ’ ) . , ; ! ? । ॥ %   should not begin a new line.
  •   should be kept with any number, even if separated by a space or parenthesis.

Line breaking should not move a danda or double danda to the beginning of a new line even if they are preceded by a space character.
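
The line-edge behaviours listed above can be turned into a simple check on already-wrapped lines. This is a sketch only: a real implementation would follow the Unicode line-breaking properties, and the names NO_LINE_END, NO_LINE_START and line_edge_violations are hypothetical. Leading and trailing spaces are ignored, so a danda preceded by a space still counts as starting the line.

```python
# Characters taken from the list above.
NO_LINE_END = set("\u201C\u2018(")                         # opening quotes, (
NO_LINE_START = set("\u201D\u2019).,;!?\u0964\u0965%")     # closing marks, danda, double danda


def line_edge_violations(lines):
    """Return (line number, 'start' or 'end', character) for each violation."""
    problems = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            continue  # empty lines cannot violate either rule
        if stripped[0] in NO_LINE_START:
            problems.append((number, "start", stripped[0]))
        if stripped[-1] in NO_LINE_END:
            problems.append((number, "end", stripped[-1]))
    return problems


if __name__ == "__main__":
    wrapped = ["\u0995\u09A5\u09BE (", " \u0964 \u0995\u09A5\u09BE"]
    for problem in line_edge_violations(wrapped):
        print(problem)
```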

Baselines, line height, etc.

Does the script have special requirements for baseline alignment between mixed scripts and in general? Is line height special for this script? Are there other aspects that affect line spacing, or positioning of items vertically within a line?


Bangla uses the so-called 'alphabetic' baseline, which is the same as for Latin and many other scripts.

It also has a 'hanging baseline', which may be used for text alignment in things such as initial letter highlighting. The hanging baseline is based on the top bar that joins the letters.

Bangla requires slightly more vertical space than Latin text. To give an approximate idea, Figure 15 compares Latin and Bangla glyphs from Noto fonts. The basic height of Bangla letters, including conjunct stacks, is typically slightly higher than the Latin x-height. However, certain letters and combining marks extend beyond the Latin ascenders, creating a need for larger line spacing.

Xhqxঐল্লিম্ভস্ট্রর্তদ্‌ঢ়ীড়ৃঁকৣ Xhqxঐল্লিম্ভস্ট্রর্তদ্‌ঢ়ীড়ৃঁকৣ
Font metrics for Latin text compared with Bangla glyphs in the Noto Serif Bengali (top) and Noto Sans Bengali (bottom) fonts.

Figure 16 shows similar comparisons for the Bangla MN and Vrinda fonts.

Xhqxঐল্লিম্ভস্ট্রর্তদ্‌ঢ়ীড়ৃঁকৣ Xhqxঐল্লিম্ভস্ট্রর্তদ্‌ঢ়ীড়ৃঁকৣ
Latin font metrics compared with Bangla glyphs in the Bangla MN (top) and Vrinda (bottom) fonts.

Counters, lists, etc.

Are there list or other counter styles in use? If so, what is the format used? Do counters need to be upright in vertical text? Are there other aspects related to counters and lists that need to be addressed?

You can experiment with counter styles using the Counter styles converter. Patterns for using these styles in CSS can be found in Ready-made Counter Styles, and we use the names of those patterns here to refer to the various styles.

The modern Bangla orthography uses ASCII digit numbering, but also has a native numeric style.

Numeric

The bengali numeric style is decimal-based and uses these digits.6


০ = 0 (U+09E6)
১ = 1 (U+09E7)
২ = 2 (U+09E8)
৩ = 3 (U+09E9)
৪ = 4 (U+09EA)
৫ = 5 (U+09EB)
৬ = 6 (U+09EC)
৭ = 7 (U+09ED)
৮ = 8 (U+09EE)
৯ = 9 (U+09EF)

Examples:


1 → ১, 2 → ২, 3 → ৩, 4 → ৪, 11 → ১১, 22 → ২২, 33 → ৩৩, 44 → ৪৪, 111 → ১১১, 222 → ২২২, 333 → ৩৩৩, 444 → ৪৪৪

Prefixes and suffixes

Generally, Bangla lists use a full stop plus a space as a suffix.

Examples:

১. ২. ৩. ৪. ৫.
Separator for Bangla list counters: full stop + space.
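
As a small sketch of the numeric style and suffix described above, the following Python maps each decimal digit to its Bengali counterpart and appends a full stop plus space. The function names to_bengali_numeric and list_marker are hypothetical; in CSS the same effect would normally come from a counter style rather than from code like this.

```python
# U+09E6 to U+09EF: BENGALI DIGIT ZERO to BENGALI DIGIT NINE, in order.
BENGALI_DIGITS = "\u09E6\u09E7\u09E8\u09E9\u09EA\u09EB\u09EC\u09ED\u09EE\u09EF"


def to_bengali_numeric(number):
    """Render a non-negative integer with Bengali decimal digits."""
    if number < 0:
        raise ValueError("number must be non-negative")
    return "".join(BENGALI_DIGITS[int(digit)] for digit in str(number))


def list_marker(number, suffix=". "):
    """Return a list counter: Bengali digits followed by the suffix."""
    return to_bengali_numeric(number) + suffix


if __name__ == "__main__":
    print("".join(list_marker(n) for n in range(1, 6)))
```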

Page & book layout

This section describes typographic features related to general page layout & progression; grids & tables; notes, footnotes, etc; forms & user interaction; and page numbering, running headers, etc.

References & sources

1. Tanmoy Bhattacharya, Private correspondence, July 2004

2. Peter T. Daniels and William Bright, The World's Writing Systems, Oxford University Press, ISBN 0-19-507993-0

3. Richard Ishida, Ready-made Counter Styles

4. Anshuman Pandey, Proposal to Encode the Ganda Currency Mark for Bengali in ISO/IEC 10646

5. William Radice, Teach Yourself Bengali, Hodder & Stoughton, ISBN 0-340-86029-4

6. Richard Ishida, Ready-made Counter Styles

7. Santhosh Thottingal, Personal correspondence

8. Unicode Consortium, The Unicode Standard, Version 16.0, Chapter 12.2: South and Central Asia-I, Official Scripts of South Asia, Bengali (Bangla)

9. Unicode Consortium, The Unicode Standard, Version 13.0, Chapter 12.5: South and Central Asia-I, Bengali (Bangla), 473-479, ISBN 978-1-936213-16-0.

10. Unicode Consortium, Unicode Line Breaking Algorithm (UAX#14)

11. Wikipedia, Bengali language

12. Wikipedia, Bengali phonology

13. Wikipedia, Bengali alphabet

14. Wikipedia, Bengali–Assamese script

See recent changes.  •  Make a comment.  •  Licence CC-By © r12a.