This app segments text in three different ways:
- BCv graphemes start with a base character and add all following combining marks. If the base character is preceded by a character with the virama or invisible stacker Indic property, it extends the previous grapheme instead of starting a new one. This may produce inaccurate results where a virama is a visible marker signalling the end of a syllable rather than a conjunct.
- BC graphemes start with a base character and add all following combining marks. They don't extend the grapheme where there are viramas or stackers. That means that conjunct graphemes are split into separate parts.
- Unicode grapheme clusters are an approximation of user-perceived graphemes, where the boundaries are established by rules applied to code point sequences according to UAX #29. The rules tend to be biased towards producing the units of text needed for cursor positioning. (There is a different set of rules for establishing break opportunities for line-breaking.) Grapheme clusters may also be tailored for particular languages.
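The difference between BC and BCv segmentation can be sketched roughly as follows. This is a minimal illustration, not the app's actual code. It assumes that combining marks can be detected by a general category starting with M, and that canonical combining class 9 is an acceptable stand-in for the virama and invisible stacker Indic properties (it also matches some visible killers, so it over-extends in places). The function names are hypothetical.

```python
import unicodedata


def is_mark(ch):
    # Assumption: any general category starting with "M" is a combining mark.
    return unicodedata.category(ch).startswith("M")


def is_joiner(ch):
    # Assumption: canonical combining class 9 approximates the virama and
    # invisible stacker properties; Python's unicodedata does not expose
    # Indic_Syllabic_Category directly.
    return unicodedata.combining(ch) == 9


def segment(text, extend_after_virama):
    """Split text into BC graphemes, or BCv graphemes if extend_after_virama."""
    graphemes = []
    for ch in text:
        if not graphemes:
            graphemes.append(ch)
        elif is_mark(ch):
            # Combining marks always attach to the current grapheme.
            graphemes[-1] += ch
        elif extend_after_virama and is_joiner(graphemes[-1][-1]):
            # BCv only: a base after a virama/stacker extends the grapheme.
            graphemes[-1] += ch
        else:
            graphemes.append(ch)
    return graphemes


# Devanagari KA + VIRAMA + SSA
sample = "\u0915\u094d\u0937"
print(len(segment(sample, extend_after_virama=False)))  # 2 (BC splits the conjunct)
print(len(segment(sample, extend_after_virama=True)))   # 1 (BCv keeps it whole)
```

With BC the conjunct is split after the virama, while BCv keeps the whole stack as one unit.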
To pass a string in the URL, use one of:
?bcv=<string>
?bc=<string>
?gc=<string>
To indicate in the URL the font you want to use for the display, add &font=<font_name>.
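As a rough sketch, a query string using these parameters could be built as shown here. The helper name is hypothetical, and percent-encoding the values is an assumption about what the app accepts (browsers normally decode it transparently). Append the result to the app's own address.

```python
from urllib.parse import quote, urlencode


def build_query(mode, text, font=None):
    """Return a query string such as ?bcv=<string>&font=<font_name>."""
    if mode not in ("bcv", "bc", "gc"):
        raise ValueError("mode must be one of: bcv, bc, gc")
    params = {mode: text}
    if font is not None:
        params["font"] = font
    # quote_via=quote encodes spaces as %20 rather than "+".
    return "?" + urlencode(params, quote_via=quote)


print(build_query("bcv", "a b", font="Noto Sans"))
# ?bcv=a%20b&font=Noto%20Sans
```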
See also the ICU line-break segmenter.