Batak languages (draft)
Batak

Updated 8 May, 2022

This page brings together basic information about the Batak script and its use for the Mandailing, Simalungun, Toba, Pakpak, and Karo languages. It aims to provide a brief, descriptive summary of the modern, printed orthography and typographic features, and to advise how to write languages using the Batak script with Unicode.

Sample

ᯤ​ᯉᯪ​ᯞᯂ᯲​ᯖᯮ​ᯞᯪ​ᯘᯉ᯲​ᯂᯪ​ᯖ ᯑᯪ​ᯅᯩᯉ᯲​ᯖᯮᯂ᯲​ᯀᯬ​ᯞᯩ​ᯂ᯲​ᯖ​ᯝ​ᯉ᯲​ᯖ​ᯝ​ᯉ᯲​ᯂᯪ​ᯖ. ᯀ​ᯞ​ᯖ᯲​ᯀ​ᯞ​ᯖ᯲​ᯂᯪ​ᯖ, ᯘᯩ​ᯐ​ᯒ​ᯄ᯲​ᯂᯪ​ᯖ. ᯂᯩ​ᯄᯪ​ᯞ​ᯝ​ᯉ᯲​ᯀ​ᯂ​ᯘ​ᯒ​ᯤ​ᯉᯪ, ᯅᯩ​ᯒ᯲​ᯔ​ᯂ᯲​ᯉ​ᯂᯩ​ᯄᯪ​ᯞ​ᯝ​ᯉ᯲​ᯑᯪ​ᯒᯪ​ᯂᯪ​ᯖ​ᯘᯩᯉ᯲​ᯑᯪ​ᯒᯪ.

Zero-width spaces have been added to the sample text so that it will wrap after orthographic syllables, rather than shoot off the right side of the page.

Usage & history

The Batak script is used on the island of Sumatra to write the five Batak dialects Karo, Mandailing, Pakpak, Simalungun, and Toba, which can differ from one another as much as the related languages English and Dutch do. The script is taught in schools more for cultural purposes than as a practical writing system for Batak. The overwhelming majority of writing by Bataks is in Indonesian (as elsewhere in Indonesia). Batak script can be found in the signage of shops and governmental institutions. [ek]

ᯘᯮᯒᯖ᯲ᯅᯖᯂ᯲

It is thought that the script may be derived from the Kawi and Pallava scripts, both ultimately derived from the Indian Brahmi script, or from the hypothetical Proto-Sumatran script influenced by Pallava. [ws] For much more detail on Batak origins and development see Sejarah Aksara Batak. [uk]

Basic features

The Batak script is an abugida. Consonants carry an inherent vowel which can be modified by appending vowel-signs to the consonant. See the table to the right for a brief overview of features for the modern Batak orthography (the character counts are for a superset of all 5 languages described here). Each language uses its own set of consonants and vowel-signs from this set, and several Unicode code points are specific to a particular language.

Batak text runs left to right in horizontal lines.

Words are not separated by spaces, and text segmentation doesn't pay attention to word boundaries.

The 33 consonant letters in the Unicode Batak block include many duplicates because of variations in the orthography from language to language. Repertoire extensions for 2 non-native sounds are achieved in Mandailing only by applying the tompi diacritic to characters.

There are no conjuncts or stacks.

Syllable-initial clusters appear to be restricted to prenasalised consonants (for which there are 2 dedicated letters) or nasal+consonant, which are written using an unpronounced standalone vowel, and these only occur in Karo.

Word-final consonant sounds may be represented by 2 final-consonant diacritics. Otherwise, if nothing follows, they are ordinary consonants followed by a visible [U+1BF2 BATAK PANGOLAT] (or [U+1BF3 BATAK PANONGONAN] in Karo and Simalungun).

A peculiarity of Batak is that a syllable CVCv (where 'v' represents the vowel-killer) is rendered as CCVv when the vowel is expressed using a vowel-sign.

The Batak orthography has an inherent vowel, and represents vowels using language-specific selections from the superset of 9 vowel-signs. There are no pre-base vowels or circumgraphs. All vowel-signs are combining marks, and are stored after the base character.

Batak has 3 independent vowels (one with 2 alternate shapes) which are optionally mixed with standalone vowels represented by vowel-signs applied to [U+1BC0 BATAK LETTER A] (or [U+1BC1 BATAK LETTER SIMALUNGUN A] in Simalungun).

Batak has very little punctuation, and the punctuation marks that are used generally indicate section boundaries.

Character index

Letters

Basic consonants

ᯂ␣ᯃ␣ᯄ␣ᯅ␣ᯆ␣ᯇ␣ᯈ␣ᯉ␣ᯋ␣ᯌ␣ᯍ␣ᯎ␣ᯏ␣ᯐ␣ᯑ␣ᯒ␣ᯓ␣ᯔ␣ᯕ␣ᯖ␣ᯗ␣ᯘ␣ᯙ␣ᯚ␣ᯛ␣ᯜ␣ᯝ␣ᯞ␣ᯟ␣ᯠ␣ᯡ␣ᯢ␣ᯣ

Extended consonants

ᯄ᯦␣ᯚ᯦

Vowels

ᯀ␣ᯁ␣ᯤ␣ᯥ

Combining marks

Vowels

ᯧ␣ᯨ␣ᯩ␣ᯪ␣ᯫ␣ᯬ␣ᯭ␣ᯮ␣ᯯ

Finals

ᯰ␣ᯱ

Other

᯲␣᯳␣᯦

Punctuation

᯼␣᯽␣᯾␣᯿

Structure

Batak words typically have a root that consists of 2 syllables, with the patterns CVCCVC, CVCV, CVCVC, or CVCCV.

Affixes added to the word can then make it longer.

Phonology

The following represents the repertoire of the Batak languages Toba, Karo, Mandailing, Simalungun, and Pakpak.

Phones in a lighter colour are non-native or allophones. Source: Wikipedia.

Vowel sounds

Plain vowels

i u e o ə ə a

ə is present only in Karo and Pakpak. The languages of the southern group substitute the sound o in words that are otherwise the same.

Consonant sounds

labial dental alveolar palatal velar glottal
stop p b t d     k ɡ ʔ
prenasalised ᵐb ⁿd        
affricate     t͡ʃ d͡ʒ      
fricative     s     h
nasal m   n ɲ ŋ
approximant w   l j  
trill/flap     r

Final h doesn't occur in Toba or Mandailing.

w and j are recent additions to the Toba and Simalungun languages, inherited from Indonesian loan words.

Vowels

Vowel sounds to characters

This section maps the vowel sounds of several Batak languages to common graphemes in the Batak orthography, grouped by vowel-sign (vs) or standalone (s).

Sounds listed as 'infrequent' are allophones, or sounds used for foreign words, etc.

Plain vowels

i
vs

[U+1BEA BATAK VOWEL SIGN I] for Karo, Mandailing, Pakpak, and Toba

[U+1BEB BATAK VOWEL SIGN KARO I] for Simalungun & as an alternative for Karo 

 
s

[U+1BE4 BATAK LETTER I] for all languages

u
vs

[U+1BEE BATAK VOWEL SIGN U] for Mandailing, Pakpak, Simalungun, & Toba

[U+1BEC BATAK VOWEL SIGN O] for Karo 

[U+1BEF BATAK VOWEL SIGN U FOR SIMALUNGUN SA] for Simalungun, used only with the letter [U+1BD9 BATAK LETTER SIMALUNGUN SA].

 
s

[U+1BE5 BATAK LETTER U] for all languages

e
vs

[U+1BE9 BATAK VOWEL SIGN EE] for Karo, Mandailing, Pakpak, Simalungun, & Toba

o
vs

[U+1BEC BATAK VOWEL SIGN O] for Mandailing, Pakpak, Simalungun, & Toba

[U+1BED BATAK VOWEL SIGN KARO O] for Karo

[U+1BE8 BATAK VOWEL SIGN PAKPAK E] for Karo 

ə
vs
a
-

Inherent vowel

 
s

[U+1BC0 BATAK LETTER A] for Mandailing, Toba, Karo & Pakpak

[U+1BC1 BATAK LETTER SIMALUNGUN A] for Simalungun 

Diphthongs and other combinations

ou
vs

[U+1BED BATAK VOWEL SIGN KARO O] for Simalungun

Inherent vowel

a following a consonant is not written, but is seen as an inherent part of the consonant letter, so ka is written by simply using the consonant letter [U+1BC2 BATAK LETTER HA].

Vowel-signs

Non-inherent vowel sounds that follow a consonant are represented using vowel-signs, eg. kiː is written ᯂᯪ [U+1BC2 BATAK LETTER HA + U+1BEA BATAK VOWEL SIGN I].

An orthography that uses vowel-signs is different from one that uses simple diacritics or letters for vowels in that the vowel-signs may be attached to the syllable as a whole, rather than just applied to the letter of the immediately preceding consonant. See Final consonants.

Batak vowel-signs are all combining characters. All vowel-signs are stored after the base consonant. There are no vowel-signs displayed before the base, and there are no circumgraphs.

Four vowel-signs are spacing marks, meaning that they consume horizontal space when added to a base consonant.

Combining marks used for vowels

Batak languages use the following dedicated combining marks for vowels. Some languages have additional sounds, and some assign the same symbols to different sounds.

Toba & Mandailing

ᯪ␣ᯮ␣ᯩ␣ᯬ

Pakpak

ᯪ␣ᯮ␣ᯩ␣ᯬ␣ᯨ

Simalungun

ᯫ␣ᯮ␣ᯯ␣ᯩ␣ᯬ␣ᯭ

Karo

ᯪ␣ᯫ␣ᯬ␣ᯩ␣ᯨ␣ᯭ␣ᯧ

Different characters may be used for the same sound in Karo.

U ligatures

The vowel-sign [U+1BEE BATAK VOWEL SIGN U] often ligates with the base character, as can be seen in the following examples:

ᯇᯮ␣ᯅᯮ␣ᯗᯮ␣ᯖᯮ␣ᯑᯮ␣ᯂᯮ␣ᯎᯮ␣ᯐᯮ␣ᯘᯮ␣ᯀᯮ␣ᯔᯮ␣ᯉᯮ␣ᯝᯮ␣ᯋᯮ␣ᯍᯮ␣ᯒᯮ␣ᯞᯮ␣ᯛᯮ

Standalone vowels

Batak has two ways to represent standalone vowels.

Vowel-signs

ᯀᯪ␣ᯀᯫ␣ᯀᯬ␣ᯀᯩ␣ᯀᯧ␣ᯀᯨ␣ᯀᯭ

The normal approach combines a vowel sign with [U+1BC0 BATAK LETTER A] (or [U+1BC1 BATAK LETTER SIMALUNGUN A] in Simalungun).

ᯀᯮᯅᯉ᯲ ᯀᯮᯑᯉ᯲

Independent vowels

ᯤ␣ᯥ␣ᯀ␣ᯁ

Batak has independent vowels for 3 sounds. The use of [U+1BE4 BATAK LETTER I] and [U+1BE5 BATAK LETTER U], rather than the vowel-sign combinations shown above, is optional, and spelling of words may vary even within the same document. [ab]

ᯤᯉ ᯥᯑᯉ᯲ ᯀᯎᯂ᯲

[U+1BC0 BATAK LETTER A] may also represent the sound ha for Pakpak and Karo. The reading is ambiguous.

Suppressing the inherent vowel

To indicate that a consonant is not followed by an inherent vowel, Mandailing, Pakpak, and Toba use [U+1BF2 BATAK PANGOLAT], whereas Karo and Simalungun use [U+1BF3 BATAK PANONGONAN]. These characters are used after final consonants whether they appear word-medially or word-finally.

ᯗᯉ᯲ᯑᯰ ᯀᯮᯑᯉ᯲ ᯂᯉ᯳ᯘᯰ ᯀᯉᯂ᯳

See also Final consonants, which describes some unusual behaviour for final consonants that are preceded by a vowel-sign.
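
The choice of vowel-killer described above can be sketched as a simple lookup. This is only an illustration: the language keys and the helper name kill_vowel are hypothetical labels introduced here, not standard identifiers.

```python
PANGOLAT = "\u1bf2"    # BATAK PANGOLAT
PANONGONAN = "\u1bf3"  # BATAK PANONGONAN

# Language keys are illustrative labels (an assumption of this sketch).
VOWEL_KILLER = {
    "mandailing": PANGOLAT,
    "pakpak": PANGOLAT,
    "toba": PANGOLAT,
    "karo": PANONGONAN,
    "simalungun": PANONGONAN,
}


def kill_vowel(consonant: str, language: str) -> str:
    """Append the vowel-killer used by the given language to a consonant letter."""
    return consonant + VOWEL_KILLER[language]


# BATAK LETTER NA with its inherent vowel suppressed, for two languages.
for language in ("toba", "karo"):
    sequence = kill_vowel("\u1bc9", language)
    print(language, [f"U+{ord(ch):04X}" for ch in sequence])
```

The stored sequence is always consonant followed by killer, whatever the position of the consonant in the word.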

Consonants

Consonant sounds to characters

This section maps consonant sounds for languages using the Batak script to common graphemes in the orthography.

Sounds listed as 'infrequent' are allophones, or sounds used for foreign words, Sanskrit, etc. If no language is specified, the assignment is valid for all languages covered here.

Stops

p
l

[U+1BC7 BATAK LETTER PA] for Karo, Mandailing, Pakpak, & Toba

[U+1BC8 BATAK LETTER SIMALUNGUN PA] for Simalungun 

b
l

[U+1BC5 BATAK LETTER BA] for Mandailing, Pakpak, Toba, & Simalungun

[U+1BC6 BATAK LETTER KARO BA] for Karo 

ᵐb
l

[U+1BE3 BATAK LETTER MBA] for Karo

t
l

[U+1BD6 BATAK LETTER SOUTHERN TA] for Mandailing, Simalungun, and Toba

[U+1BD7 BATAK LETTER NORTHERN TA] for Karo, Pakpak, and also Toba 

d
l

[U+1BD1 BATAK LETTER DA] for all languages

ⁿd
l

[U+1BE2 BATAK LETTER NDA] for Karo

k
l

[U+1BC2 BATAK LETTER HA] for Pakpak & Karo in both syllable-initial and -final positions, and in syllable-final position only for Toba

[U+1BC3 BATAK LETTER SIMALUNGUN HA] for Simalungun (syllable-final only)

[U+1BC4 BATAK LETTER MANDAILING HA] for Mandailing in syllable-final position only 

ᯄ᯦ [U+1BC4 BATAK LETTER MANDAILING HA + U+1BE6 BATAK SIGN TOMPI] for Mandailing in syllable-initial position using the extended repertoire

ɡ
l

[U+1BCE BATAK LETTER GA] for Karo, Mandailing, Pakpak, & Toba

[U+1BCF BATAK LETTER SIMALUNGUN GA] for Simalungun 

Affricates

t͡ʃ
l

[U+1BD8 BATAK LETTER SA] for Pakpak 

[U+1BE1 BATAK LETTER CA] for Karo

[U+1BE0 BATAK LETTER NYA] also for Karo 

ᯚ᯦ [U+1BDA BATAK LETTER MANDAILING SA + U+1BE6 BATAK SIGN TOMPI] for Mandailing

d͡ʒ
l

[U+1BD0 BATAK LETTER JA] for all languages

Fricatives

s
l

[U+1BD8 BATAK LETTER SA] for Toba, Pakpak, and Karo

[U+1BDA BATAK LETTER MANDAILING SA] for Mandailing

[U+1BD9 BATAK LETTER SIMALUNGUN SA] for Simalungun

h
l

[U+1BC4 BATAK LETTER MANDAILING HA] for Mandailing in syllable-initial position

[U+1BCC BATAK LETTER SIMALUNGUN WA] for Simalungun in syllable-initial position

[U+1BC2 BATAK LETTER HA] for Toba in syllable-initial position

  [U+1BC0 BATAK LETTER A] for Pakpak and Karo

Nasals

m
l

[U+1BD4 BATAK LETTER MA] in Karo, Mandailing, Pakpak, & Toba

[U+1BD5 BATAK LETTER SIMALUNGUN MA] for Simalungun 

n
l

[U+1BC9 BATAK LETTER NA] for all languages

[U+1BCA BATAK LETTER MANDAILING NA] as an alternative for Mandailing

ɲ
l

[U+1BE0 BATAK LETTER NYA] for Mandailing, Simalungun, & Toba

ŋ
l

[U+1BDD BATAK LETTER NGA] for all languages

Other

w
l

[U+1BCB BATAK LETTER WA] for Mandailing, Toba, Pakpak, & Karo

[U+1BCC BATAK LETTER SIMALUNGUN WA] for Simalungun loan words

[U+1BCD BATAK LETTER PAKPAK WA] an alternative shape used for Pakpak & Toba

r
l

[U+1BD2 BATAK LETTER RA] for Karo, Mandailing, Pakpak, & Toba

[U+1BD3 BATAK LETTER SIMALUNGUN RA] for Simalungun 

l
l

[U+1BDE BATAK LETTER LA] for Karo, Mandailing, Pakpak, & Toba

[U+1BDF BATAK LETTER SIMALUNGUN LA] for Simalungun 

j
l

[U+1BDB BATAK LETTER YA] for Mandailing, Karo, & Pakpak, and in loan words for Toba

[U+1BDC BATAK LETTER SIMALUNGUN YA] for Simalungun loan words

Basic consonants

Stops

Simalungun
ᯈ␣ᯅ␣ᯖ␣ᯑ␣ᯃ␣ᯏ
Mandailing
ᯇ␣ᯅ␣ᯖ␣ᯑ␣ᯄ᯦␣ᯎ
Pakpak
ᯇ␣ᯅ␣ᯗ␣ᯑ␣ᯂ␣ᯎ
Toba
ᯇ␣ᯅ␣ᯖ␣ᯗ␣ᯑ␣ᯂ␣ᯎ
Karo
ᯇ␣ᯆ␣ᯗ␣ᯑ␣ᯂ␣ᯎ␣ᯢ␣ᯣ

Affricates

Simalungun
Mandailing
ᯚ᯦␣ᯐ
Pakpak
ᯘ␣ᯐ
Toba
Karo
ᯡ␣ᯐ

Fricatives

Simalungun
ᯙ␣ᯌ
Mandailing
ᯚ␣ᯄ
Pakpak
ᯘ␣ᯀ
Toba
ᯘ␣ᯂ
Karo
ᯘ␣ᯀ

Nasals

Simalungun
ᯕ␣ᯉ␣ᯠ␣ᯝ
Mandailing & Toba
ᯔ␣ᯉ␣ᯠ␣ᯝ
Pakpak & Karo
ᯔ␣ᯉ␣ᯝ

Approximants

Simalungun
ᯌ␣ᯓ␣ᯟ␣ᯜ
Mandailing & Karo
ᯋ␣ᯒ␣ᯞ␣ᯛ
Toba & Pakpak
ᯍ␣ᯒ␣ᯞ␣ᯛ

Repertoire extension

ᯄ᯦␣ᯚ᯦

Mandailing uses [U+1BE6 BATAK SIGN TOMPI] to change the sound of a couple of letters.

It changes the sound of [U+1BC4 BATAK LETTER MANDAILING HA] from h to k, and changes the sound of [U+1BDA BATAK LETTER MANDAILING SA] from s to t͡ʃ.

Note the difference in position for [U+1BF1 BATAK CONSONANT SIGN H] and [U+1BE6 BATAK SIGN TOMPI].
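
As a rough illustration of the Mandailing extension described above, the following sketch builds the two tompi sequences and prints the Unicode names of their parts. The dictionary name MANDAILING_EXTENSIONS is a hypothetical label; the sound values are those given in the text.

```python
import unicodedata

TOMPI = "\u1be6"  # BATAK SIGN TOMPI

# Base letter -> (plain sound, sound with tompi), as described in the text.
MANDAILING_EXTENSIONS = {
    "\u1bc4": ("h", "k"),    # BATAK LETTER MANDAILING HA
    "\u1bda": ("s", "t͡ʃ"),  # BATAK LETTER MANDAILING SA
}

for base, (plain, extended) in MANDAILING_EXTENSIONS.items():
    sequence = base + TOMPI  # the diacritic is stored after its base letter
    names = " + ".join(unicodedata.name(ch) for ch in sequence)
    print(f"{plain} -> {extended}: {names}")
```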

Onset consonants

The only syllable-initial consonant clusters in Batak appear to be nasal+consonant or prenasalised consonants. These appear to be limited to the Karo language. [uk]

ᯣ␣ᯢ

For the Karo language there are two dedicated characters for prenasalised sounds. They are [U+1BE3 BATAK LETTER MBA] and [U+1BE2 BATAK LETTER NDA].

ᯀᯧᯣᯱ

The Surat Batak courseware describes an approach for other nasal+consonant combinations in their description of the Karo language. This involves writing a syllable-final nasal consonant after an unpronounced, standalone schwa (ᯀᯧ [U+1BC0 BATAK LETTER A + U+1BE7 BATAK VOWEL SIGN E]). An initial ŋ however is written ᯀᯰ [U+1BC0 BATAK LETTER A + U+1BF0 BATAK CONSONANT SIGN NG]. Here are some examples.

ᯀᯧᯉ᯳ᯗᯧᯒᯧᯔ᯳

ᯀᯧᯉ᯳ᯠᯪᯑᯱᯂᯧᯉ᯳

ᯀᯧᯉ᯳ᯐᯬᯒ᯳ᯐᯬᯒ᯳

ᯀᯰᯂᯧᯒᯪᯂᯧᯉ᯳

Final consonants

ᯰ␣ᯱ

Batak uses one of 2 diacritics to indicate syllable-final consonants.

[U+1BF0 BATAK CONSONANT SIGN NG] is used for final -ng in all languages.

[U+1BF1 BATAK CONSONANT SIGN H] is used for final -h in Karo, Pakpak, and Simalungun. (This consonant doesn't appear in syllable-final position in Toba and Mandailing.)

Otherwise a normal consonant letter is used, followed by the vowel-killer, which is either [U+1BF2 BATAK PANGOLAT] (Mandailing, Pakpak, Toba) or [U+1BF3 BATAK PANONGONAN] (Karo, Simalungun).

ᯞᯂ᯲ᯞᯂ᯲ ᯥᯑᯉ᯲

Vowel reordering

Batak has a unique behaviour if a vowel-sign is used between onset and coda consonants.

If a vowel-sign is used, the glyphs for the onset and coda consonants are placed side by side, and are followed by the glyphs for the intervening vowel and the vowel-killer. This reordering is produced by the font, and the typed order or order stored in memory remains the same as the spoken order. The figure below shows an example.

ᯀᯝᯪᯉ᯲
The word a.ŋin ('wind'), showing the location of the onset and coda consonants and the vowel-sign after they have been rearranged by the font. The in-memory or typed order is [a ŋ i n ∅].

This also applies to non-spacing marks, such as -u in the following.

ᯐᯒᯮᯔ᯲

Consonant clusters

Batak consonant letters do not interact to create conjuncts.

Where a syllable onset follows a syllable coda, the lack of vowels is indicated using a vowel-killer sign. See Suppressing the inherent vowel and Final consonants.

Encoding choices

There appears to be little to report with regard to Batak encoding choices. No characters decompose during NFD normalisation, no characters have an appearance that can be created by combining others, and no glyphs are easily mistaken for other Unicode characters.
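
The normalisation claim can be checked with a few lines of Python. This is a minimal sketch that tests each code point of the Batak block (U+1BC0 to U+1BFF) on its own; it says nothing about longer sequences, and the result depends on the Unicode data shipped with the Python version in use.

```python
import unicodedata

BATAK_BLOCK = range(0x1BC0, 0x1C00)  # U+1BC0..U+1BFF

# Collect any code point whose NFD or NFC form differs from the original.
changed = [
    f"U+{cp:04X}"
    for cp in BATAK_BLOCK
    if unicodedata.normalize("NFD", chr(cp)) != chr(cp)
    or unicodedata.normalize("NFC", chr(cp)) != chr(cp)
]

print(changed)
```

An empty list means that no single Batak character is altered by normalisation.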

CVC ordering

One potential trap to be aware of is described in Vowel reordering. Essentially, characters making up syllables with the shape CVCv are stored in memory in that order, and not in the order CCVv in which they are displayed. The display order is produced by the font, and not by typing the characters in the visual order.
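
The difference between stored and displayed order can be illustrated with the word a.ŋin. The sketch below is not how a font works internally; display_order is a hypothetical helper that only mimics the visible result for the simple pattern consonant, vowel-sign, consonant, vowel-killer. The character ranges are assumptions based on the layout of the Unicode Batak block, and the tompi diacritic is ignored for brevity.

```python
import unicodedata


def is_base(ch):
    """Letters U+1BC0..U+1BE5 (consonants and independent vowels)."""
    return "\u1bc0" <= ch <= "\u1be5"


def is_vowel_sign(ch):
    """Vowel-signs U+1BE7..U+1BEF."""
    return "\u1be7" <= ch <= "\u1bef"


def is_killer(ch):
    """BATAK PANGOLAT or BATAK PANONGONAN."""
    return ch in ("\u1bf2", "\u1bf3")


def display_order(stored):
    """Approximate the glyph order: C V C killer is shown as C C V killer."""
    chars = list(stored)
    i = 0
    while i + 3 < len(chars):
        if (is_base(chars[i]) and is_vowel_sign(chars[i + 1])
                and is_base(chars[i + 2]) and is_killer(chars[i + 3])):
            chars[i + 1], chars[i + 2] = chars[i + 2], chars[i + 1]
            i += 4
        else:
            i += 1
    return "".join(chars)


stored = "\u1bc0\u1bdd\u1bea\u1bc9\u1bf2"  # a, nga, vowel sign i, na, pangolat
for label, text in (("stored", stored), ("displayed", display_order(stored))):
    print(label, [unicodedata.name(ch) for ch in text])
```

Text should always be typed and stored in the first of the two orders printed; the second is shown only to make the visual rearrangement explicit.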

Numbers

tbd

Text direction

Batak text runs left to right in horizontal lines that flow from top to bottom.

References to vertical, bottom-to-top writing really refer to sideways writing on things such as bamboo, rather than to a vertical writing mode, according to Everson and Kozok. [ek]
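
The default bidirectional classes of the characters in the Batak block can be listed with Python's unicodedata module. This is a small sketch; the grouping into a dictionary named classes is just one way to summarise the data, and the values come from the Unicode database bundled with the Python version in use.

```python
import unicodedata

classes = {}
for cp in range(0x1BC0, 0x1C00):
    ch = chr(cp)
    if unicodedata.name(ch, ""):  # skip unassigned code points
        bidi_class = unicodedata.bidirectional(ch)
        classes.setdefault(bidi_class, []).append(f"U+{cp:04X}")

for bidi_class, code_points in sorted(classes.items()):
    print(bidi_class, len(code_points), "characters")
```

Letters report the strong left-to-right class L, which is consistent with the text direction described above.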

Glyph shaping & positioning

This section brings together information about the following topics: writing styles; cursive text; context-based shaping; context-based positioning; baselines, line height, etc.; font styles; case & other character transforms.

You can experiment with examples using the Batak character app.

Batak text is not cursive (ie. joined up like Arabic).

The orthography has no case distinction, and no special transforms are needed to convert between characters.

Context-based shaping & positioning

The main shaping behaviour takes place where CVC syllables swap the displayed order of the nucleus and coda (although the in-memory sequence retains the same order as the sounds are spoken). For more information see Vowel reordering.

Although it is a Brahmi-derived script, there are no conjunct forms.

Care needs to be taken in certain cases where multiple diacritics are applied to the same base. When both [U+1BE9 BATAK VOWEL SIGN EE] and [U+1BF0 BATAK CONSONANT SIGN NG] occur together they are rendered side by side, rather than one atop the other. The same goes for [U+1BE9 BATAK VOWEL SIGN EE] and [U+1BF1 BATAK CONSONANT SIGN H].

ᯒᯩᯰ
Multiple combining marks rendered side by side above the base.

Another important aspect of shaping occurs where [U+1BEE BATAK VOWEL SIGN U] ligates with or is positioned differently according to its base. For example:

ᯖᯮ ᯇᯮ ᯌᯮ ᯞᯮ

Examples of shaping and positioning applied to [U+1BEE BATAK VOWEL SIGN U] when it follows a particular base character.

 

Font styles

tbd

Typographic units

The rules for segmenting Batak are not clear. See a discussion of approaches to line-breaking.

Section delimiters are described in Phrase & section boundaries.

Word boundaries

Words are not separated by spaces, and words are not relevant in determining boundaries for typographic units.

Punctuation & inline features

Phrase & section boundaries

Batak is normally written as an unbroken sequence of letters, without punctuation other than a few paragraph or section dividers.

The Unicode repertoire includes code points for the following bindu.

᯼␣᯽␣᯾␣᯿
phrase

᯿ [U+1BFF BATAK SYMBOL BINDU PANGOLAT]

paragraph

[U+1BFC BATAK SYMBOL BINDU NA METEK]

[U+1BFD BATAK SYMBOL BINDU PINARBORAS] 

title separator [U+1BFE BATAK SYMBOL BINDU JUDUL]

᯿ [U+1BFF BATAK SYMBOL BINDU PANGOLAT] is used to disambiguate text and follows a word, partially surrounding the final letter.

[U+1BFC BATAK SYMBOL BINDU NA METEK] and [U+1BFD BATAK SYMBOL BINDU PINARBORAS] are used to indicate the beginning of paragraphs and stanzas. Either can be written as a large sign that physically separates the sections of text, eg. by means of a long trailing line leading from it.

[U+1BFE BATAK SYMBOL BINDU JUDUL] is used to separate a title from the following text.

Batak also uses symbols to begin texts which do not appear to have Unicode code points. These are often decorative and take many forms.

A pustaha text (see General page layout & progression) will often begin with a godang bindu, while bamboo texts will commonly begin with the pinarjolma bindu.

godang bindu

pinarjolma bindu
Examples of the godang bindu (top), and the pinarjolma bindu (bottom).

Parentheses & brackets

tbd

Quotations

tbd

Emphasis

tbd

Abbreviation, ellipsis & repetition

tbd

Other inline ranges

tbd

Text spacing

tbd

Inline notes & annotations

tbd

Other punctuation

tbd

Line & paragraph layout

Line breaking & hyphenation

The rules for where line-break opportunities occur are not very clear. The following paragraphs describe alternative approaches. After each description, the effect is shown on line-break opportunities for the word ᯂᯔ᯳ᯇᯪᯞ᯳ .

Currently, the Unicode line-break property assigned to Batak letters is AL (ordinary alphabetic and symbol characters), which relies on other characters to provide break opportunities; unless tailored rules are applied, no line breaks are allowed between pairs of letters. This is clearly inappropriate for Batak, given the lack of spaces and the paucity of punctuation marks. As a result, Batak text usually runs off the right edge of a web page, because the line is not broken.

Everson & Kozok, in their Batak Unicode proposal [ek], say that lines are broken after a 'full orthographic syllable', which they define to be C(V(Cp|F)), where a consonant C may be followed by a vowel V, which may in turn be followed either by a killed consonant Cp or by a final -ng or -h F. This actually represents a full phonetic syllable, including cases such as ᯂᯂ᯲ kak.
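
The syllable definition above can be approximated with a regular expression over the stored character order. The following is a sketch under stated assumptions, not a tested line-breaking algorithm: the character classes are taken from the layout of the Unicode Batak block, the vowel-sign is treated as optional so that syllables with the inherent vowel (such as kak) also match, and the names SYLLABLE, syllables and add_break_opportunities are hypothetical. It inserts a zero-width space after each matched syllable, which is one way to create the break opportunities that the default properties do not provide.

```python
import re

BASE = "[\u1bc0-\u1be5]"    # consonant letters and independent vowels
TOMPI = "\u1be6?"           # optional BATAK SIGN TOMPI
VOWEL = "[\u1be7-\u1bef]?"  # optional vowel-sign (absent for the inherent vowel)
FINAL = "[\u1bf0\u1bf1]"    # BATAK CONSONANT SIGN NG / H
KILLER = "[\u1bf2\u1bf3]"   # BATAK PANGOLAT / PANONGONAN

# C, optionally V, optionally followed by a final sign or a killed consonant.
SYLLABLE = re.compile(
    f"{BASE}{TOMPI}{VOWEL}(?:{FINAL}|{BASE}{TOMPI}{KILLER})?"
)

ZWSP = "\u200b"  # ZERO WIDTH SPACE


def syllables(text):
    """Return the orthographic syllables found in text, in stored order."""
    return SYLLABLE.findall(text)


def add_break_opportunities(text):
    """Insert a zero-width space after every matched syllable."""
    return SYLLABLE.sub(lambda m: m.group(0) + ZWSP, text)


# ha, ma, panongonan, pa, vowel sign i, la, panongonan
word = "\u1bc2\u1bd4\u1bf3\u1bc7\u1bea\u1bde\u1bf3"
for syllable in syllables(word):
    print([f"U+{ord(ch):04X}" for ch in syllable])
```

Because the vowel-sign and the killed consonant stay inside one match, a break inserted this way never separates the characters that the font needs to reorder.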

On the other hand, proposals are currently being discussed by the Unicode Script Ad Hoc committee to break after a BCCS except where a vowel-killer appears – in which case, the whole phonetic syllable would be kept together. The latter exception is primarily motivated by the limitations of current rendering systems, which would struggle to achieve the reordering of final-consonant and vowel-sign before a vowel-killer if a line-break intervenes.

However, it is not hard to find examples of written Batak where line-breaks can occur before a vowel-sign or a vowel-killer, indicating that line-break opportunities can occur before any spacing glyph. Note that the vowel-sign and vowel-killer are represented in Unicode as combining marks, so we are splitting the BCCS unit here. Note also that the order of characters doesn't change. For examples, see handwritten examples of line-initial vowel-killer, and vowel-sign, and both in this printed text.

If line-breaks were introduced at grapheme cluster boundaries, the order of characters in a syllable rendered as CCVv would presumably not be maintained, so this does not appear to be a viable approach.

Text alignment & justification

tbd

Counters, lists, etc.

tbd

Styling initials

tbd

Page & book layout

This section is for any features that are specific to Batak and that relate to the following topics: general page layout & progression; grids & tables; notes, footnotes, etc; forms & user interaction; page numbering, running headers, etc.

General page layout & progression

The following is a description of media used for writing Batak, taken from Wikipedia. [wsi, #Media]

Batak letters are traditionally written in a number of media, among which the most common are bamboo, bone, and bark. Manuscripts with these media can be found in various sizes and sophistication.

Common everyday writings are inscribed on the surface of bamboo or bone with a small knife. These strokes are then blackened with soot to improve readability. Bamboo and bones inscribed with Batak letters are commonly used as everyday objects, for example as storage tubes for areca nut, or as necklaces and amulets to ward off evil.

Traditional Batak priests (datu) write their knowledge on concertina-like scrolls called pustaha. To make pustaha, the bark of the agarwood tree (Aquilaria malaccensis) is cut and mashed into long sheets called laklak. The length of these sheets can range from 60 cm to 7 m, but the largest known pustaha (now stored in the Tropenmuseum, Netherlands) is around 15 m long. This sheet of laklak is then folded, and both ends are glued to a wooden cover called lampak, which often has a Boraspati lizard engraved on it. Unlike the bamboo and bone texts, the pustaha text is written in ink, using a pen made from the ribs of palm leaves (Arenga pinnata) called suligi or a pen made from buffalo horn called tahunan.

Paper was only used in limited quantities from the mid-19th century onwards, but bamboo, bone, and bark continued to be used as the main medium for writing Batak script until the 20th century when the tradition of writing Batak script began to disappear.

References