This tool parses a string and shows extended grapheme cluster boundaries (except for Korean jamo and emoji character sequences).

Click on the segments to reveal the character names.

This app segments text in 3 different ways:

  • BCv graphemes start with a base character and add all following combining marks. If the base character is preceded by a character with the virama or invisible stacker Indic property, it extends the previous grapheme instead of starting a new one. This may produce inaccurate results if a virama is meant to signal the end of a syllable with a visible marker.
  • BC graphemes start with a base character and add all following combining marks. They don't extend the grapheme where there are viramas or stackers. That means that conjunct graphemes are split into separate parts.
  • Unicode grapheme clusters are an approximation to user-perceived graphemes, where the boundaries are established by rules applied to code point sequences according to UAX #29. The rules tend to be biased towards producing the units of text needed for cursor positioning. (There is a different set of rules for establishing break opportunities for line-breaking.) Grapheme clusters may also be tailored for particular languages.
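
The difference between the BC and BCv rules can be sketched in a few lines of Python. This is only an illustration, not the code this app uses. It assumes that "combining mark" means any character whose general category starts with M, and it uses canonical combining class 9 as a stand-in for the virama or invisible stacker property, which is an approximation. The function names are hypothetical, and joiners such as ZWJ are not handled.

```python
import unicodedata

# Assumption: canonical combining class 9 ("Virama") approximates the
# virama / invisible stacker property described above.
VIRAMA_CCC = 9


def is_mark(ch):
    """True for combining marks (general categories Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")


def segment(text, join_after_virama):
    """Split text into base-plus-marks segments.

    When join_after_virama is True, a base character that directly follows
    a virama-like character extends the previous segment (BCv behaviour).
    """
    segments = []
    for ch in text:
        extends = False
        if segments:
            previous = segments[-1][-1]
            if is_mark(ch):
                extends = True
            elif join_after_virama and unicodedata.combining(previous) == VIRAMA_CCC:
                extends = True
        if extends:
            segments[-1] += ch
        else:
            segments.append(ch)
    return segments


def bc_graphemes(text):
    """BC: base character plus following combining marks."""
    return segment(text, join_after_virama=False)


def bcv_graphemes(text):
    """BCv: as BC, but a base after a virama joins the previous segment."""
    return segment(text, join_after_virama=True)
```

For the Devanagari sequence U+0915 U+094D U+0937 U+093F (क्षि), `bc_graphemes` returns two segments, क् and षि, while `bcv_graphemes` keeps the conjunct together as a single segment.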

To pass a string in the URL, use one of:

  • ?bcv=<string>
  • ?bc=<string>
  • ?gc=<string>

To indicate in the URL the font you want to use for the display, add &font=<font_name>.
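
As a rough sketch, such a URL could be assembled like this in Python. The base address `https://example.org/app/` is a placeholder rather than the real location of this app, and `build_url` is a hypothetical helper. Percent-encoding spaces as `%20` (instead of `+`) is an assumption about what the app accepts.

```python
from urllib.parse import quote, urlencode

# Placeholder only: substitute the actual address of the app.
BASE_URL = "https://example.org/app/"


def build_url(mode, text, font=None, base=BASE_URL):
    """Build a URL that passes text using one of bcv, bc or gc."""
    if mode not in ("bcv", "bc", "gc"):
        raise ValueError("mode must be one of: bcv, bc, gc")
    params = {mode: text}
    if font is not None:
        # The font name is appended as &font=<font_name>.
        params["font"] = font
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return base + "?" + urlencode(params, quote_via=quote)
```

For example, `build_url("bcv", "क्", font="Noto Sans")` produces a query string beginning with `?bcv=` followed by the UTF-8 percent-encoded text, then `&font=Noto%20Sans`.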

See also the ICU line-break segmenter.

Updated 4 April, 2022

Licence CC-By © r12a