Buhid orthography notes v32

This page brings together basic information about the Buhid script and its use for the Buhid language. It aims to provide a brief, descriptive summary of the modern, printed orthography and typographic features, and to advise how to write Buhid using Unicode.

Citing this document

Richard Ishida, Buhid Orthography Notes, 27-Apr-2026, https://r12a.github.io/scripts/buhd/bku

Note: Given the difficulty of finding term lists written in the Buhid orthography, the examples cited here were derived manually by applying the rules of the orthography to Latin transcriptions. Buhid is a simple enough script that these should be reliable, except that there is some uncertainty about the representation of the consonant f. On this page that sound is represented using ᝉ. This tallies with usage for Baybayin.

Usage & history

Origins of the Buhid script, 18thC – today.

Phoenician
└ Aramaic
└ Brahmi
└ Pallava
└ Old Kawi
└ Baybayin
└ Buhid
+ Hanunó'o
+ Kulitan
+ Tagalog
+ Tagbanwa
+ Ibalnan
+ Balinese
+ Batak
+ Javanese
+ Lontara
+ Sundanese
+ Rencong
+ Rejang

The Buhid language is spoken by around 11,000 [e] Mangyans on the island of Mindoro, Philippines.

The Buhid script is currently endangered, and authorities in the area where it is spoken are trying to encourage its use by the younger generation. One particularly common former use was for writing ambahan, traditional poetry.

When the Spaniards arrived in the Philippines in the 1500s they were surprised to find that the inhabitants were largely literate in scripts of which Buhid is one survivor. The scripts have the characteristics of Brahmi-derived scripts, but the pathway that led to this orthography is not clear. It is thought that the script may have come via Java and arrived in the Philippines between the 10th and 14th centuries [me].

Unicode 17 has 1 dedicated block, comprising 30 characters.

More information: Lorenzo Catapang

Basic features

The Buhid script is an abugida, ie. each consonant contains an inherent vowel sound.


Vowels The inherent vowel is generally pronounced a, but sometimes ʌ.

Post-consonant vowels are written using one of only 2 combining marks (vowel signs), representing 4 sounds.

There are no pre-base glyphs, circumgraphs, or composite vowel signs.

Standalone vowel sounds are written using 3 independent vowels, which may occur word-initially or word-medially.


Consonants Buhid has 15 consonant letters, but they are only used to indicate syllable onsets. Syllable codas are not written. This can lead to some word ambiguity, and also means that the text doesn't indicate any consonant clusters.

Vowel absence Because codas are not written, Buhid has no conjuncts or other special mechanisms for handling consonant clusters, which in any case normally occur only when a syllable with a coda precedes a syllable with a consonant onset.

Numbers Buhid has no native digits.

Layout Text runs left to right in horizontal lines. Words are separated by spaces. There are no case distinctions.

Buhid traditionally uses just 2 native punctuation marks.

Notable features

only 2 combining marks are used for the 4 post-consonant vowel sounds other than the inherent vowel
syllable codas are not written

Phonology

The following shows the sound repertoire of the Buhid language.


Phones in a lighter colour are non-native or allophones. Source: Barham.

Vowel sounds

Plain vowels

a and ʌ are sometimes interchangeable.

Consonant sounds

	labial	alveolar	palatal	velar	glottal
stop	p b	t d		k ɡ	ʔ
fricative	f	s		x ɣ	h
nasal	m	n		ŋ
approximant	w	l	j
trill/flap		r

Tone

Buhid is not a tonal language.

Structure

Barham [mb] reports 2 syllable types:

CV | CVC

These are combined into words with the following structures:

CVC | CVCV | CVCVC | CVCCVC | CVCCV

Barham [mb §9] reports that certain Tagalog words with the structure CVCVC have corresponding Buhid words with the structure CVC, in which the single vowel tends to be lengthened, but only in slow speech.

The following restrictions apply:

Onset: p only appears in loan words. Otherwise all consonants can appear.
Nucleus: Includes any vowel.
Coda: Can be any consonant except f and h.

Barham (p8) provides additional detail about which consonant sequences can appear in clusters.
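The structural rules above can be checked mechanically. The following Python sketch assumes a transcription with one symbol per phoneme, using the symbols from the phonology tables (ʔ for the glottal stop), and syllables that have already been separated; the function names are hypothetical. It does not check the loan-word restriction on onset p.

```python
# Sketch only: symbols follow the phonology tables on this page.
# ASCII "g" is accepted as well as IPA "ɡ" for convenience.
CONSONANTS = set("pbtdkgɡʔfsxɣhmnŋwljr")
VOWELS = set("ieuoaʌ")


def syllable_shape(syllable: str) -> str:
    """Return the C/V shape of a syllable, e.g. 'bus' -> 'CVC'."""
    shape = []
    for ch in syllable:
        if ch in CONSONANTS:
            shape.append("C")
        elif ch in VOWELS:
            shape.append("V")
        else:
            raise ValueError(f"unexpected symbol {ch!r}")
    return "".join(shape)


def is_valid_syllable(syllable: str) -> bool:
    """True for CV, or for CVC whose coda is not f or h."""
    shape = syllable_shape(syllable)
    if shape == "CV":
        return True
    if shape == "CVC":
        return syllable[-1] not in ("f", "h")
    return False


print(syllable_shape("bus"))      # CVC
print(is_valid_syllable("ʔa"))    # True
print(is_valid_syllable("bah"))   # False: h cannot be a coda
```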

Vowels

Sound	Post-consonant	Standalone
i	ᝒ	ᝁ
e	ᝒ	ᝁ
u	ᝓ	ᝂ
o	ᝓ	ᝂ
a	ⓘ	ᝀ
ʌ	ⓘ	ᝀ

ⓘ represents the inherent vowel.

Inherent vowel

ᝃ ka

The inherent vowel is usually pronounced a (sometimes ʌ). So ma is written using the consonant letter alone.

eg.

ᝋᝌᝏ

ᝋ,ᝌ,ᝏ

Post-consonant vowels

Post-consonant vowels are written using only 2 combining marks (for 4 sounds). The two marks have the same basic shape and are distinguished by their position relative to the base.

There are no pre-base glyphs, circumgraphs, or composite vowel signs.

Vowel signs

ᝃᝒ ki

Buhid uses the following two combining marks to override the inherent vowel.

ᝒ,ᝓ

Each vowel sign represents one of 2 sounds: U+1752 BUHID VOWEL SIGN I (ᝒ) represents either i or e; U+1753 BUHID VOWEL SIGN U (ᝓ) represents either u or o.

eg.

ᝐᝒᝆ

ᝐᝒᝌᝓ

ᝐᝓᝑᝓ

ᝐᝓᝎ

In principle, the two glyphs look the same, and the distinction is made by position: i ~ e goes above the base, and o ~ u goes below. In practice, however, although the relative height distinction is always preserved, the way the vowel sign connects with the base varies from consonant to consonant. The differences are significant enough to make it worthwhile to show all possible combinations in the table below.

Consonant	No vowel sign	With i~e	With o~u
p	ᝉ	ᝉᝒ	ᝉᝓ
b	ᝊ	ᝊᝒ	ᝊᝓ
t	ᝆ	ᝆᝒ	ᝆᝓ
d	ᝇ	ᝇᝒ	ᝇᝓ
k	ᝃ	ᝃᝒ	ᝃᝓ
ɡ	ᝄ	ᝄᝒ	ᝄᝓ
s	ᝐ	ᝐᝒ	ᝐᝓ
h	ᝑ	ᝑᝒ	ᝑᝓ
m	ᝋ	ᝋᝒ	ᝋᝓ
n	ᝈ	ᝈᝒ	ᝈᝓ
ŋ	ᝅ	ᝅᝒ	ᝅᝓ
w	ᝏ	ᝏᝒ	ᝏᝓ
r	ᝍ	ᝍᝒ	ᝍᝓ
l	ᝎ	ᝎᝒ	ᝎᝓ
j	ᝌ	ᝌᝒ	ᝌᝓ

Placement of vowel signs with Buhid consonants.
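Whatever the visual placement, in encoded text each vowel sign is stored after the consonant letter it modifies. The short Python sketch below builds syllables from code points and prints their Unicode names with the standard `unicodedata` module; the helper `with_vowel` is hypothetical. Because one sign covers two sounds, ki and ke produce identical code point sequences.

```python
import unicodedata

KA = "\u1743"      # BUHID LETTER KA
SIGN_I = "\u1752"  # BUHID VOWEL SIGN I: pronounced i or e
SIGN_U = "\u1753"  # BUHID VOWEL SIGN U: pronounced u or o

# Vowel sound -> sign appended after the consonant letter.
# a and ʌ are the inherent vowel, so nothing is appended.
SIGN_FOR = {"a": "", "ʌ": "", "i": SIGN_I, "e": SIGN_I, "u": SIGN_U, "o": SIGN_U}


def with_vowel(consonant_letter: str, vowel: str) -> str:
    """Return consonant letter + vowel sign (if any), in storage order."""
    return consonant_letter + SIGN_FOR[vowel]


for vowel in ("a", "i", "u"):
    syllable = with_vowel(KA, vowel)
    names = ", ".join(unicodedata.name(ch) for ch in syllable)
    print(f"k{vowel}: {syllable}  {names}")

print(with_vowel(KA, "i") == with_vowel(KA, "e"))  # True: same encoding
```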

Standalone vowels

ᝀ a

Standalone vowels in Buhid are written using 3 independent vowels.

ᝀ,ᝁ,ᝂ

Vowels at the beginning of a word or following another vowel are actually transcribed in IPA with a preceding glottal stop (ʔ), but they are written using one of 3 independent vowel letters.

As with the vowel signs, each of these letters represents one of two possible sounds (see the vowel table at the start of the Vowels section).

eg.

ᝀᝊᝓᝑ

ᝁᝇᝓ

ᝄᝓᝂ
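A minimal Python sketch of this rule, assuming syllables transcribed with an explicit ʔ onset as described above; `glottal_syllable` is a hypothetical name. The ʔ and any coda are dropped, leaving only the independent vowel letter.

```python
# Independent vowel letters; each stands for two sounds.
INDEPENDENT = {
    "a": "\u1740", "ʌ": "\u1740",  # BUHID LETTER A
    "i": "\u1741", "e": "\u1741",  # BUHID LETTER I
    "u": "\u1742", "o": "\u1742",  # BUHID LETTER U
}


def glottal_syllable(syllable: str) -> str:
    """Write a ʔV or ʔVC syllable: the ʔ and any coda are not written."""
    if len(syllable) not in (2, 3) or syllable[0] != "ʔ":
        raise ValueError("expected ʔV or ʔVC")
    return INDEPENDENT[syllable[1]]


print(glottal_syllable("ʔi") == glottal_syllable("ʔe"))  # True
```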

Vowel sounds to characters

This section maps Buhid vowel sounds to common graphemes in the Buhid orthography.

Sound	Dependent	Standalone
i	ᝒ	ᝁ
u	ᝓ	ᝂ
e	ᝒ	ᝁ
o	ᝓ	ᝂ
a, ʌ	inherent vowel (eg. ᝆᝊᝓ, ᝀᝊᝐ)	ᝀ
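Reading Buhid reverses this mapping, and every vowel grapheme leaves the reader two candidate sounds. A minimal Python sketch, assuming the input is a single unit (a letter plus an optional vowel sign); the names are illustrative only.

```python
CONSONANT_LETTERS = {chr(cp) for cp in range(0x1743, 0x1752)}  # KA..HA

# Vowel sounds a reader must choose between, per the mapping above.
SIGN_READINGS = {"": ("a", "ʌ"), "\u1752": ("i", "e"), "\u1753": ("u", "o")}
INDEPENDENT_READINGS = {"\u1740": ("a", "ʌ"), "\u1741": ("i", "e"), "\u1742": ("u", "o")}


def vowel_readings(cluster: str) -> tuple[str, ...]:
    """Possible vowel sounds for a letter plus optional vowel sign."""
    if cluster in INDEPENDENT_READINGS:
        return INDEPENDENT_READINGS[cluster]
    base, sign = cluster[:1], cluster[1:]
    if base not in CONSONANT_LETTERS or sign not in SIGN_READINGS:
        raise ValueError(f"not a Buhid letter + vowel sign: {cluster!r}")
    return SIGN_READINGS[sign]


print(vowel_readings("\u1743\u1752"))  # ('i', 'e')
```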

Consonants

stop	ᝉ,ᝊ,ᝆ,ᝇ,ᝃ,ᝄ
fricative	ᝉ,ᝃ,ᝄ,ᝐ,ᝑ
nasal	ᝋ,ᝈ,ᝅ
approximant, trill/flap	ᝏ,ᝍ,ᝎ,ᝌ

Basic consonants

Buhid consonants are few and simple. There is no repertoire extension mechanism.


ᝉ,ᝊ,ᝆ,ᝇ,ᝃ,ᝄ,ᝐ,ᝑ,ᝋ,ᝈ,ᝅ,ᝏ,ᝍ,ᝎ,ᝌ

ᝃ may be pronounced x when word-medial and before a vowel; likewise, ᝄ may be pronounced ɣ.

The glottal stop ʔ occurs in Buhid pronunciation as a syllable onset, but it is never written. Instead, one of the standalone vowels is used.

The f phoneme

The phonetic transcriptions by Barham [mb] indicate common use of the phoneme f, but no character for that sound exists in the Buhid Unicode block. The Unicode proposal document mentions a possible character for f introduced during a script reform, but does not propose encoding it because it is 'wanting attestation'. Room was left in the block for later additions, if necessary.

It has been difficult to find evidence of how this sound is written, but Sato [ts] mentions that in Philippine Latin orthography f is usually written as p. This page therefore uses U+1749 BUHID LETTER PA (ᝉ) to represent this sound. This needs to be checked against actual usage.

Onsets

Buhid syllable onsets are straightforward. They don't involve consonant clusters.

Codas

Like some other neighbouring scripts, Buhid does not write syllable codas. This can, of course, lead to a certain amount of ambiguity.

eg.

ᝄᝋ

ᝏᝐ
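The note at the top of this page explains that the examples were derived by applying the orthography's rules to Latin transcriptions. The Python sketch below automates that procedure under stated assumptions: syllables are already separated, each phoneme is one symbol, syllables with a glottal onset carry an explicit ʔ, and f is written ᝉ as on this page (unconfirmed). The names are hypothetical. It shows how codas disappear: the made-up syllable sequences ga-ma and ga-mas give the same Buhid text.

```python
# Consonant sound -> Buhid letter. x and ɣ are allophones of k and ɡ.
# f -> ᝉ (PA) is this page's working assumption, not attested usage.
LETTER = {
    "p": "\u1749", "f": "\u1749", "b": "\u174A", "t": "\u1746",
    "d": "\u1747", "k": "\u1743", "x": "\u1743", "g": "\u1744",
    "ɡ": "\u1744", "ɣ": "\u1744", "s": "\u1750", "h": "\u1751",
    "m": "\u174B", "n": "\u1748", "ŋ": "\u1745", "w": "\u174F",
    "r": "\u174D", "l": "\u174E", "j": "\u174C",
}
SIGN = {"a": "", "ʌ": "", "i": "\u1752", "e": "\u1752", "u": "\u1753", "o": "\u1753"}
INDEPENDENT = {"a": "\u1740", "ʌ": "\u1740", "i": "\u1741", "e": "\u1741",
               "u": "\u1742", "o": "\u1742"}


def to_buhid(syllables: list[str]) -> str:
    """Convert CV/CVC syllables to Buhid; codas are silently dropped."""
    out = []
    for syl in syllables:
        if len(syl) not in (2, 3):
            raise ValueError(f"expected CV or CVC, got {syl!r}")
        onset, vowel = syl[0], syl[1]  # syl[2], if present, is a coda
        if onset == "ʔ":
            out.append(INDEPENDENT[vowel])  # glottal onset: vowel letter
        else:
            out.append(LETTER[onset] + SIGN[vowel])
    return "".join(out)


print(to_buhid(["ga", "ma"]) == to_buhid(["ga", "mas"]))  # True
```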

Consonant sounds to characters

This section maps Buhid consonant sounds to common graphemes in the Buhid orthography.

Syllable-final consonants are never written. For each sound, the Shapes column shows the letter alone, combined with vowel sign I, and combined with vowel sign U, respectively.

Sounds marked 'infrequent' are allophones, or sounds used for foreign words, etc.

Sound	Shapes	Written as
p (infrequent)	ᝉ ᝉᝒ ᝉᝓ	consonant ᝉ
b	ᝊ ᝊᝒ ᝊᝓ	consonant ᝊ
t	ᝆ ᝆᝒ ᝆᝓ	consonant ᝆ
d	ᝇ ᝇᝒ ᝇᝓ	consonant ᝇ
k	ᝃ ᝃᝒ ᝃᝓ	consonant ᝃ
ɡ	ᝄ ᝄᝒ ᝄᝓ	consonant ᝄ
f	ᝉ ᝉᝒ ᝉᝓ	consonant ᝉ
s	ᝐ ᝐᝒ ᝐᝓ	consonant ᝐ
x (infrequent)	as for k	consonant ᝃ when word-medial and followed by a vowel
ɣ (infrequent)	as for ɡ	consonant ᝄ when word-medial and followed by a vowel
h	ᝑ ᝑᝒ ᝑᝓ	consonant ᝑ
m	ᝋ ᝋᝒ ᝋᝓ	consonant ᝋ
n	ᝈ ᝈᝒ ᝈᝓ	consonant ᝈ
ŋ	ᝅ ᝅᝒ ᝅᝓ	consonant ᝅ
w	ᝏ ᝏᝒ ᝏᝓ	consonant ᝏ
r	ᝍ ᝍᝒ ᝍᝓ	consonant ᝍ
l	ᝎ ᝎᝒ ᝎᝓ	consonant ᝎ
j	ᝌ ᝌᝒ ᝌᝓ	consonant ᝌ
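Reading goes the other way: some letters stand for more than one consonant sound. The Python sketch below lists the candidate onset sounds for each letter, following the list above; it assumes ᝉ for f, as this page does, and the names `READINGS` and `onset_readings` are hypothetical.

```python
# Candidate sounds for each consonant letter, per the list above.
# ᝉ for f is this page's assumption; x and ɣ occur only when the
# consonant is word-medial and followed by a vowel.
READINGS = {
    "\u1749": ("p", "f"), "\u174A": ("b",), "\u1746": ("t",),
    "\u1747": ("d",), "\u1743": ("k", "x"), "\u1744": ("ɡ", "ɣ"),
    "\u1750": ("s",), "\u1751": ("h",), "\u174B": ("m",),
    "\u1748": ("n",), "\u1745": ("ŋ",), "\u174F": ("w",),
    "\u174D": ("r",), "\u174E": ("l",), "\u174C": ("j",),
}
ALLOPHONIC = {"\u1743", "\u1744"}  # letters with a word-medial allophone


def onset_readings(letter: str, word_medial: bool) -> tuple[str, ...]:
    """Candidate onset sounds for a letter at the given word position."""
    sounds = READINGS[letter]
    if letter in ALLOPHONIC and not word_medial:
        return sounds[:1]  # x / ɣ are not used word-initially
    return sounds


print(onset_readings("\u1743", word_medial=True))  # ('k', 'x')
```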

Encoding choices

This section offers advice about characters or character sequences to avoid, and what to use instead. It takes into account the relevance of Unicode Normalisation Form D (NFD) and Unicode Normalisation Form C (NFC). It also takes into account Unicode's Do Not Emit guidelines.

Although recommended usage is given here, content authors may well be unaware of these recommendations. Applications should therefore look out for the non-recommended approach and, wherever possible, treat it the same as the recommended approach.

Confusables & spelling errors

This section lists characters that may mistakenly be represented using sequences that look the same as, or similar to, the atomic code points available.

Incorrect	Correct	Notes
ᝀᝓ	ᝁ	Use the atomic character, since the two are not canonically equivalent.
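Because Unicode normalisation does not map this sequence to the atomic character, an application that wants to treat the two alike, as recommended above, needs an explicit mapping. A minimal Python sketch; `CONFUSABLES` and `fix_confusables` are hypothetical names, and the mapping holds only the single entry from the table.

```python
import unicodedata

# Non-recommended sequence -> recommended atomic character.
# The single entry is copied from the table above; extend as needed.
CONFUSABLES = {"\u1740\u1753": "\u1741"}


def fix_confusables(text: str) -> str:
    """Replace known look-alike sequences with the atomic character."""
    for wrong, right in CONFUSABLES.items():
        text = text.replace(wrong, right)
    return text


seq = "\u1740\u1753"
# Normalisation cannot help: there is no canonical equivalence here.
print(unicodedata.normalize("NFC", seq) == seq)  # True
print(fix_confusables(seq) == "\u1741")          # True
```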

Codepoint sequences

Combining marks always follow the base character.
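A simple Python check for this ordering rule is sketched below. It assumes, as the vowel-sign section describes, that vowel signs attach only to consonant letters (U+1743 to U+1751), so a sign at the start of a string, after a space, after an independent vowel, or after another sign is reported. The function name is hypothetical.

```python
CONSONANT_LETTERS = {chr(cp) for cp in range(0x1743, 0x1752)}  # KA..HA
VOWEL_SIGNS = {"\u1752", "\u1753"}


def misplaced_signs(text: str) -> list[int]:
    """Return indexes of vowel signs not stored right after a consonant."""
    return [
        i
        for i, ch in enumerate(text)
        if ch in VOWEL_SIGNS and (i == 0 or text[i - 1] not in CONSONANT_LETTERS)
    ]


print(misplaced_signs("\u1743\u1752"))  # []
print(misplaced_signs("\u1752\u1743"))  # [0]: sign typed before its base
```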

Glyph shaping & positioning

You can experiment with examples using the Buhid character workbench.

Context-based shaping & positioning

Buhid letters don't interact with each other, but the vowel signs require context-sensitive placement and, in some cases, reshaping of the letter. The various combinations are shown in the table 'Placement of vowel signs with Buhid consonants' in the Vowel signs section.

A Buhid base letter never carries more than one combining mark, and there is no other shaping to consider.

Letterform slopes, weights, & italics


Since printed examples of Buhid text are very hard to find, there is probably no standard approach to the use of oblique and bold forms, if they are used at all. The Noto Sans Buhid font has only a regular face.

Typographic units

Word boundaries

Words are separated by spaces.

Graphemes

Buhid is a simple orthography and typographic units can be easily segmented using grapheme clusters.

Phrase, sentence, and section delimiters are described under Phrase & section boundaries.

Grapheme clusters

Base Combining_mark*

Buhid typographic units consist of a letter or a letter with a single combining mark (one of two vowel signs). Both of these units fit the definition of a grapheme cluster.

As previously noted, syllable codas are not written in Buhid text, and so the segmentation only captures onsets and the syllable nucleus.
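Because each unit is a base letter optionally followed by a vowel sign, a simple loop is enough to segment Buhid text. The Python sketch below is a simplified stand-in for full Unicode grapheme cluster segmentation (UAX #29), which the Python standard library does not provide; the third-party `regex` module's `\X` pattern is one general-purpose alternative. The function name is hypothetical.

```python
VOWEL_SIGNS = {"\u1752", "\u1753"}  # BUHID VOWEL SIGN I, BUHID VOWEL SIGN U


def buhid_clusters(text: str) -> list[str]:
    """Split text into units of Base Combining_mark* (Buhid only)."""
    clusters: list[str] = []
    for ch in text:
        if ch in VOWEL_SIGNS and clusters:
            clusters[-1] += ch  # attach the mark to the preceding unit
        else:
            clusters.append(ch)  # any other character starts a new unit
    return clusters


print(buhid_clusters("\u1750\u1753\u1751\u1753"))  # two units: ᝐᝓ and ᝑᝓ
```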

Boundary	Mark
phrase	᜵ (U+1735 PHILIPPINE SINGLE PUNCTUATION)
sentence	᜶ (U+1736 PHILIPPINE DOUBLE PUNCTUATION)
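Splitting Buhid text at these two marks can be sketched in Python as follows, assuming ᜶ always ends a sentence and ᜵ always ends a phrase, as listed above. The function name is hypothetical, and the sample input is placeholder letters, not a real sentence.

```python
import unicodedata

PHRASE_MARK = "\u1735"    # PHILIPPINE SINGLE PUNCTUATION
SENTENCE_MARK = "\u1736"  # PHILIPPINE DOUBLE PUNCTUATION


def split_text(text: str) -> list[list[str]]:
    """Return sentences, each as a list of phrases, with marks removed."""
    sentences = []
    for sentence in text.split(SENTENCE_MARK):
        phrases = [p.strip() for p in sentence.split(PHRASE_MARK) if p.strip()]
        if phrases:
            sentences.append(phrases)
    return sentences


print(unicodedata.name(PHRASE_MARK))
print(len(split_text("\u1740 \u1741\u1735 \u1742\u1736 \u1743\u1736")))  # 2
```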

Line & paragraph layout

Line breaking & hyphenation

The primary line-break opportunity occurs at word boundaries.

Line-edge rules

As in almost all writing systems, certain punctuation characters should not appear at the end or the start of a line. The Unicode line-break properties help applications decide whether a character should appear at the start or end of a line.


Baselines, line height, etc.

Buhid uses the so-called 'alphabetic' baseline, which is the same as for Latin and many other scripts.

Buhid letters vary slightly in height but are mostly of a similar height, with no ascenders or descenders. Vowel signs may appear above or below some letters, but these are on horizontal dashes.

To give an approximate idea, the figure below compares Latin glyphs from Noto Sans with Buhid glyphs from Noto Sans Buhid. The basic height of Buhid letters is typically around the Latin x-height; however, some taller letters and combining marks reach just beyond the Latin ascender height (though not below the Latin descender), so slightly larger line spacing is needed.

Figure: Font metrics for Latin text compared with Buhid glyphs in the Noto Sans Buhid font (sample text: Hhqx ᝉᝇᝏᝒᝌᝒᝐᝒᝁ᜶).

Notes, footnotes, etc

See the section on inline notes for purely inline annotations, such as ruby or warichu. This section is about annotation systems that separate the reference marks and the content of the notes.
