Updated 25 April, 2024 • recent changes scripts/cher/chr • leave a comment
This page brings together basic information about the Cherokee script and its use for the Cherokee language. It aims to provide a brief, descriptive summary of the modern, printed orthography and typographic features, and to advise how to write Cherokee using Unicode.
It is remarkably difficult to find actual phonetic transcriptions of Cherokee words, and most so-called 'phonetic' transcriptions use one of the Latin orthographies, which also lack the more fine-grained distinctions needed to understand how to pronounce the text properly. Take, for example, the word for apple (ᏒᎦᏔ), which is written svgata in the Latin orthography, but is actually pronounced sə̃̌ːkʰtʰ. Therefore, the pronunciation information here and in the character notes is necessarily somewhat vague.
Richard Ishida, Cherokee Orthography Notes, 25-Apr-2024, https://r12a.github.io/scripts/cher/chr
Phonological transcriptions should be treated as a guide, only. They are taken from the sources consulted, and may be narrow or broad, phonemic or phonetic, depending on what is available. They mostly represent pronunciation of words in isolation. For more detailed information about allophones, alternations, sandhi, dialectal differences, and so on, follow the links to cited references.
Ꭰꮿꮩꮈ 1 Ꮒꭶꮣ ꭰꮒᏼꮻ ꭴꮎꮥꮕꭲ ꭴꮎꮪꮣꮄꮣ ꭰꮄ ꭱꮷꮃꭽꮙ ꮎꭲ ꭰꮲꮙꮩꮧ ꭰꮄ ꭴꮒꮂ ꭲᏻꮎꮫꮧꭲ. Ꮎꮝꭹꮎꮓ ꭴꮅꮝꭺꮈꮤꮕꭹ ꭴꮰꮿꮝꮧ ꮕᏸꮅꮫꭹ ꭰꮄ ꭰꮣꮕꮦꮯꮣꮝꮧ ꭰꮄ ꭱꮅꮝꮧ ꮟᏼꮻꭽ ꮒꮪꮎꮣꮫꮎꮥꭼꭹ ꮎ ꮧꮎꮣꮕꮯ ꭰꮣꮕꮩ ꭼꮧ.
Ꭰꮿꮩꮈ 2 Ꮒꭶꮫ ꭰꮒᏼꮻ ꭴꮎꮣꮒꮬ ꮎꭲ ꮒꭶꮣ ꭴꮒꮂ ꭲᏻꮎꮫꮑꮧꭲ ꭰꮄ ꮩꭿ ꭰꮥꮧꭲ ꮎꭲ ꮥꭶꭷꮕꭹ ꭿꭰ ꮧꭶꮓꮳꮃꮕꭲ, ꭴꮎꮴꮅꮫ ꮔꮎꮰꮿꮝꮫꮎ ꮎꭲ ꮒꭶꭵꮙ ꮷꮣꮄꮕꮣ, ꮥꭷꮑꭲꮝꮤꮕꭿ ꮷꮎꮣꮄꮕꮣ ꭰꮒᏼꮻ, ꮧꭸꭶꭶꮕꮧꭲ, ꭰꭸꮿ ꭰꮄ ꭰꮝꭶꮿ, ꭶꮼꮒꭿꮝꮧ, ꮷꮎꮑꮅꮧ, ꮧꮎꮩꭹꮿꮝꭹ ꭰꮄ ꮠꭲ ꮎꮒꮅꮝꭼꭹ, ꭰᏸꮅ ꭴꮎꮩꮲꭿ ꭰꮄ ᏼꮻ ꮒꮩꮣᏻꮎꮣꮄꮕꭹ, ꮔꮕꮏꮕ, ꭴꮥꮕ ꭰꮄ ꮠꭲ ꮔꮝꮧꮣꮕꭲ. Ꭴꮧꮧꮲꭲꭸꮝꮩꮧ, ꮭ ꮔꮎꮰꮿꮝꮫꮎ ꭴꮩꭿᏻꮢꮎ ꮎꮝꭹꮓ ꮧꮎꮩꭹꮿꮝꭹ ꮒꮣᏻꮅꮝꮩꮤꮕ ꮎꮝꭹ ꭴꮩꮲꮕꭲ, ꭲᏻꮎꮫꮑꮅꮣꮝꮧ ꭴꮒꮂꭹ ꭰꮄ ꭰᏸꮅ ꮪꮎꮩꮲꮢ ꮔꮝꮧꮣꮕ ꮎꮝꭹ ꮒꭼꮎꮫꭲ ꭰꮄ ꮝꭶꮪꭹ ꮎꮝꭹꮓ ꭰꮒᏼꮻ ꭰꮎꮑꮈꭹ, ꭲᏻꮓꮝꮚ ꮎꮝꭹꮎꭲ ꭴꮎꮣꮴꮅꮣ, ꭶꭸꭶꮕꮨ ꭸꮢꭲ, ꭼꮒꭼꭼ-ꭴꮹꮢ-ꭴꭶꮞꮝꮧꮥꭹ ꭽꮻꮒꮧꮲ ꮒꭶꭵ ꮠꭲ ꮕꮒᏺꭲꮝꮣꮑꮂꮎ ꮎꭲ ꭴꮒꮂ ꭴꮎꮣꮴꮅꭶꮿ.
ᎠᏯᏙᎸ 1 ᏂᎦᏓ ᎠᏂᏴᏫ ᎤᎾᏕᏅᎢ ᎤᎾᏚᏓᎴᏓ ᎠᎴ ᎡᏧᎳᎭᏉ ᎾᎢ ᎠᏢᏉᏙᏗ ᎠᎴ ᎤᏂᎲ ᎢᏳᎾᏛᏗᎢ. ᎾᏍᎩᎾᏃ ᎤᎵᏍᎪᎸᏔᏅᎩ ᎤᏠᏯᏍᏗ ᏅᏰᎵᏛᎩ ᎠᎴ ᎠᏓᏅᏖᏟᏓᏍᏗ ᎠᎴ ᎡᎵᏍᏗ ᏏᏴᏫᎭ ᏂᏚᎾᏓᏛᎾᏕᎬᎩ Ꮎ ᏗᎾᏓᏅᏟ ᎠᏓᏅᏙ ᎬᏗ.
ᎠᏯᏙᎸ 2 ᏂᎦᏛ ᎠᏂᏴᏫ ᎤᎾᏓᏂᏜ ᎾᎢ ᏂᎦᏓ ᎤᏂᎲ ᎢᏳᎾᏛᏁᏗᎢ ᎠᎴ ᏙᎯ ᎠᏕᏗᎢ ᎾᎢ ᏕᎦᎧᏅᎩ ᎯᎠ ᏗᎦᏃᏣᎳᏅᎢ, ᎤᎾᏤᎵᏛ ᏄᎾᏠᏯᏍᏛᎾ ᎾᎢ ᏂᎦᎥᏉ ᏧᏓᎴᏅᏓ, ᏕᎧᏁᎢᏍᏔᏅᎯ ᏧᎾᏓᎴᏅᏓ ᎠᏂᏴᏫ, ᏗᎨᎦᎦᏅᏗᎢ, ᎠᎨᏯ ᎠᎴ ᎠᏍᎦᏯ, ᎦᏬᏂᎯᏍᏗ, ᏧᎾᏁᎵᏗ, ᏗᎾᏙᎩᏯᏍᎩ ᎠᎴ ᏐᎢ ᎾᏂᎵᏍᎬᎩ, ᎠᏰᎵ ᎤᎾᏙᏢᎯ ᎠᎴ ᏴᏫ ᏂᏙᏓᏳᎾᏓᎴᏅᎩ, ᏄᏅᎿᏅ, ᎤᏕᏅ ᎠᎴ ᏐᎢ ᏄᏍᏗᏓᏅᎢ. ᎤᏗᏗᏢᎢᎨᏍᏙᏗ, Ꮭ ᏄᎾᏠᏯᏍᏛᎾ ᎤᏙᎯᏳᏒᎾ ᎾᏍᎩᏃ ᏗᎾᏙᎩᏯᏍᎩ ᏂᏓᏳᎵᏍᏙᏔᏅ ᎾᏍᎩ ᎤᏙᏢᏅᎢ, ᎢᏳᎾᏛᏁᎵᏓᏍᏗ ᎤᏂᎲᎩ ᎠᎴ ᎠᏰᎵ ᏚᎾᏙᏢᏒ ᏄᏍᏗᏓᏅ ᎾᏍᎩ ᏂᎬᎾᏛᎢ ᎠᎴ ᏍᎦᏚᎩ ᎾᏍᎩᏃ ᎠᏂᏴᏫ ᎠᎾᏁᎸᎩ, ᎢᏳᏃᏍᏊ ᎾᏍᎩᎾᎢ ᎤᎾᏓᏤᎵᏓ, ᎦᎨᎦᏅᏘ ᎨᏒᎢ, ᎬᏂᎬᎬ-ᎤᏩᏒ-ᎤᎦᏎᏍᏗᏕᎩ ᎭᏫᏂᏗᏢ ᏂᎦᎥ ᏐᎢ ᏅᏂᏲᎢᏍᏓᏁᎲᎾ ᎾᎢ ᎤᏂᎲ ᎤᎾᏓᏤᎵᎦᏯ.
Source: Unicode UDHR, articles 1 & 2. Cased version generated by hand from that.
It is estimated that only around 2,000 Cherokee people speak the language. However, those who do speak the language use the script widely for writing letters, recipes, folktales, diaries, and for personal record-keeping. It is also used in some legal, governmental and religious documents and, in some areas, public signage. Efforts are being made to revive both the language and the script; to that end it is used in a limited capacity in education. Knowledge of the script is considered a prerequisite for full Cherokee citizenship.6
ᏣᎳᎩ tsalagi Cherokee
The script was developed by a Cherokee named Sequoyah and presented to the Cherokee Nation in 1821. It was popular and most Cherokee were literate in the script by 1828, when Sequoyah and Samuel Worcester reformed the orthography during the process of preparing it for printing.
From the 1870s to the early 1900s, the US government actively suppressed the Cherokee language and culture, sending children away from their parents and creating a generation that was unfamiliar with the language and script. The ultimate result of this policy is that the Cherokee language is now considered endangered to moribund. There are, however, efforts to increase usage, and users are able to use the language and script for social media on mobile devices.
Sources: Scriptsource, Wikipedia.
Script code | cher |
---|---|
Language code | chr |
Script type | syllabary |
Origin | nam |
Native speakers | 1,520 |
Total characters | 184 |
Letters | 170 |
Combining marks | 10 |
Punctuation | 4 |
Possible other | 14 |
Unicode blocks | 2 |
Character counts above are for this orthography but exclude ASCII. | |
Text direction | ltr |
Post-consonant vowels | letters |
Standalone vowels | |
Case distinction | yes |
Cursive script | no |
Combining marks | no |
Clusters marked | no |
Other ligatures | no |
Word separator | space |
Wraps at | word |
Hyphenation | ? |
G Clusters OK? | yes |
Justification | spaces |
Baseline | romn |
Cherokee is a syllabary. Letters typically represent a combination of consonants and vowels. See the table to the right for a brief overview of features of modern Cherokee.
Cherokee text runs left to right in horizontal lines. Words are separated by spaces.
The script is becoming bicameral, after a long period when syllabic characters resembled uppercase letters.
The Cherokee syllabary has 85 characters, of which 6 represent syllables that start with either no consonant or with ʔ (Ꭰ Ꭱ Ꭲ Ꭳ Ꭴ Ꭵ), and one character represents the non-syllabic consonant sound s (Ꮝ). The rest nominally represent a combination of consonant plus vowel, though the actual practice is a little more nuanced, and there is a degree of vagueness in the script when it comes to phonetically transcribing spoken sounds.1 It is a simple syllabary where letter shapes don't follow any systematic pattern.
The script doesn't fully represent the sounds of the spoken language. Vowel length is not distinguished. With some exceptions, syllable-final consonants and syllable-initial aspiration are not reflected in the orthography, and the user has to figure out when to drop the vowel of a CV letter to make consonant clusters. Some readers are beginning to use diacritics to indicate pronunciation more accurately.
The spoken language is tonal, but tones are not written. A set of diacritics exists, however, to enable linguists to indicate tones.
There is no standard spelling. The way a word is written may vary, according to the pronunciation of the writer, or choices they make for dealing with consonant clusters.
The visual forms of letters don't interact. There are no combining characters or diacritics, and ASCII digits are used.
These are sounds for the Cherokee language.
Phones in a lighter colour are non-native or allophones. Source Wikipedia.
| | labial | alveolar | post-alveolar | palatal | velar | glottal |
|---|---|---|---|---|---|---|
| stop | | t tʰ | | | k kʰ | ʔ |
| labialised stop | | | | | kw̥ kʰw̥ | |
| affricate | | t͡s tʰ͡s | t͡ʃ tʰ͡ʃ t͡l t͡ɬ | | | |
| fricative | | s ɬ | | | | h |
| nasal | m | n n̥ | | | | |
| approximant | w w̥ | l | | j j̥ | | |
| trill/flap | | | | | | |
Although some transcriptions suggest it, Cherokee doesn't contrast voiced vs. unvoiced stops and affricates; all are unvoiced. Instead, it contrasts aspirated and unaspirated forms334.
t͡s is pronounced t͡ʃ by some speakers. This also applies to the aspirated forms.
The glottal stop appears between vowels, and also sometimes between a vowel and a consonant, though less frequently. However, it is not written when using the syllabary, and so minimal pairs may be spelled the same, eg. ᎪᎢ kòʔi oil, grease ᎠᏓ àta/áʔta wood/young animal
Cherokee has 6 tones, 2 level and 4 contoured. They are shown in the table with Latin transcriptions used by Scancarelli (1986), Montgomery-Anderson (2008,2015), and Feeling (2003)/Uchihara (2016), respectively10.
IPA | short | long | name |
---|---|---|---|
˨ | à, a, a | à:, aa, aa | low |
˧ | á, á, á | á:, áa, áá | high |
˨˧ | | ǎ:, aá, aá | rising |
˧˨ | | â:, áà, áà | falling |
˨˩ | | ȁ:, aà, àà, àa | lowfall |
˧˦ | | a̋:, áá, aa̋ | superhigh |
The level tones can appear with short or long vowels, but the others only occur with long vowels.
Tones are not marked in the Cherokee orthography, although this doesn't often create ambiguity10. Nor are they used in the Latin orthographies, except in dictionaries. See also Tone marks.
For the superhigh tone, Wikipedia says: The superhigh tone, also called "highfall" by Montgomery-Anderson, has a distinctive morphosyntactical function, primarily appearing on adjectives, nouns derived from verbs, and on subordinate verbs. It is mobile and falls on the rightmost long vowel. If the final short vowel is dropped and the superhigh tone becomes in word-final position, it is shortened and pronounced like a slightly higher final tone (notated as a̋ in most orthographies). There can only be one superhigh tone per word, constraint not shared by the other tones. For these reasons, this contour exhibits some accentual properties and has been referred to as an "accent" (or stress) in the literature.
Cherokee, like many other North American languages, builds words into short phrases by adding prefixes and suffixes to the word root. A number of phonological changes are applied during this process. Typically, but not always, the result of these changes is captured by the orthography.
In certain circumstances, vowels in the underlying morphemic model are dropped when particles are added to a word root. The rule is as follows365:
t, k, j, w, y, n, kw, l + short vowel + h + plosive or vowel → removal of the short vowel and aspiration of the initial letter in the sequence
For example, compare the forms glossed You're heading there, I'm heading there, and Are you already going? The first undergoes vowel deletion, the second does not, and the third does.
This phonological change is generally reflected in the orthography; however, the kʰ in the first example is still written with a non-aspirated letter.
Cherokee syllabic letters don't provide all the information needed to detect the underlying sounds if you are not familiar with the language. Features that are not generally expressed by the orthography include syllable-initial aspiration, syllable-final consonants, vowel length and unpronounced vowels, and tone. These are described in more detail below. Montgomery-Anderson lists the following possible sounds that are represented by Ꮩ (U+13D9 LETTER DO):
tò, tó, tòː, tóː, tǒː, tôː, tȍː, tőː, tòh, tóh, tòʔ, tóʔ,
tʰò, tʰó, tʰòː, tʰóː, tʰǒː, tʰôː, tʰȍː, tʰőː, tʰòh, tʰóh, tʰòʔ, tʰóʔ
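The list above can be read as a cross product: two possible onsets (plain or aspirated), combined with eight vowel nuclei that differ in length and tone, plus four rhymes where a short vowel is closed by an unwritten h or ʔ. The following Python sketch regenerates the list on that assumption. The names `ONSETS`, `OPEN_RHYMES`, `CLOSED_RHYMES` and `possible_readings` are illustrative only and do not come from the source.

```python
# Onsets: the letter can stand for a plain or an aspirated stop.
ONSETS = ["t", "tʰ"]

# Vowel nuclei with the tone marks used in the list above:
# two short level tones, followed by six long tones.
OPEN_RHYMES = ["ò", "ó", "òː", "óː", "ǒː", "ôː", "ȍː", "őː"]

# Unwritten codas (h or glottal stop) after the two short vowels.
CLOSED_RHYMES = [vowel + coda for coda in ("h", "ʔ") for vowel in ("ò", "ó")]


def possible_readings():
    """Return every onset + rhyme combination, in the order listed above."""
    rhymes = OPEN_RHYMES + CLOSED_RHYMES
    return [onset + rhyme for onset in ONSETS for rhyme in rhymes]


readings = possible_readings()
print(len(readings))  # 24
print(", ".join(readings))
```

A single written letter therefore corresponds to 24 candidate pronunciations before any context-dependent phonological rules are applied.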
The realisation of Cherokee sounds is also often affected by phonological rules that are determined by context – creating another step away from the sounds implied by the syllabic letters.
In addition, vowels at the end of a word or in syllable onsets may be dropped, eg. ᏥᎩᎵ t͡ski.li ghost
As mentioned in the introduction, it is difficult to find precise information about how Cherokee syllables and words are pronounced, so while we try to provide what phonetic information we can here, most of the transcriptions are in one of the rather imprecise Latin orthographies. There are also different versions of the Latin orthographies, some preferring to make a distinction between d and t, whereas others map to t and th.
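As a rough illustration of how little a Latin-orthography transcription adds, the sketch below derives one from Unicode character names, using Python's standard `unicodedata` module. It assumes that the last word of each Cherokee character name (for example TSA in CHEROKEE LETTER TSA) is an acceptable stand-in for the Latin syllable. That is a convenience for this example, not a rule from the sources cited here, and the function name `romanize` is hypothetical.

```python
import unicodedata


def romanize(text: str) -> str:
    """Replace each Cherokee letter with the syllable in its Unicode name.

    Non-Cherokee characters (spaces, punctuation, digits) pass through unchanged.
    """
    out = []
    for ch in text:
        name = unicodedata.name(ch, "")
        if name.startswith("CHEROKEE ") and " LETTER " in name:
            # e.g. "CHEROKEE LETTER TSA" or "CHEROKEE SMALL LETTER TSA" -> "tsa"
            out.append(name.rsplit(" ", 1)[-1].lower())
        else:
            out.append(ch)
    return "".join(out)


print(romanize("\u13e3\u13b3\u13a9"))  # tsalagi
print(romanize("\u13d2\u13a6\u13d4"))  # svgata
```

The output carries no more information than the syllabary itself: length, tone, aspiration and dropped vowels are all still missing, which is why svgata tells you so little about how the word for apple is actually pronounced.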
The six vowel characters, when they appear at the start of a word, represent plain vowel sounds. They may be short or long, and will be modified by tone, but none of those things are expressed by the orthography, eg. ᎠᎹ ama/aːma water/salt
Elsewhere they represent a syllable starting with ʔ,1 eg. ᎯᎠ hiʔa this
The vowel in a CV syllable doesn't distinguish between short and long vowel sounds, nor does it indicate tonal values, eg. the following sequence of Cherokee characters represents two different words, each having different lengths and tones (low vs. high, respectively)1: ᎠᎹ ama/aːma water/salt
A 3rd fricative, ɬ, appears below as an aspirated form of l.
See also Consonant 's'.
One syllable is archaic and not used.
Because it is not followed by a vowel, this character can be used to form consonant clusters at the start of a syllable1, eg. ᏍᎪᎯ skoːhi ten
It is also used for syllables that end with an s sound, eg. ᎯᏴᏫᏯᏍ hijə̃ːwiːjaːs Are you a Native American?
Some manuscripts precede syllables beginning with an s sound with this character. Sequoyah spelled his name like that, ie. ᏍᏏᏉᏯ s-si-quo-ya
Most syllables can start with aspirated forms, but only 6 pairs of letters distinguish between aspirated and non-aspirated sounds in the onset.
Five pairs of characters make this distinction for stops or affricates: Ꭶ+Ꭷ, Ꮣ+Ꮤ, Ꮥ+Ꮦ, Ꮧ+Ꮨ, Ꮬ+Ꮭ. For example, it is possible to distinguish between the first two syllables of ᎧᎦᎵ kʰaːkaʔli February but not1590 between the two meanings of ᎪᎳ kőːla/ kʰǒːla winter/bone
Only one nasal syllable makes this distinction, ie. compare ᎬᎾ kə̃́.na/kə̃̀ː.na turkey/I'm alive ᎬᎿ kə̃ː.hn̥a he is alive
However, the following could have two different meanings ᎬᏂᎭ kə̃ːniha/kə̃ːhn̥iha I am/(s)he is striking it
Aspiration can also arise when there is a non-written h sound in a syllable. Most syllables can have a coda with this sound, which then interacts with the sounds around it as morpheme prefixes and suffixes are attached to the base word. In some cases, it may produce transformations in other syllables.
The following 2 words illustrate non-written h sounds, the first in the syllable coda, and the second in the onset. ᎤᏂᎷᏨ ȕː.nì.lúh.tʰ͡ʃə̃ they arrived ᎠᎨᏯ à.kěː.hj̥a woman
In spoken Cherokee, vowels at the end of a word are often dropped, although the orthography indicates what the vowel would have been396. Figure 2 shows an example based on Montgomery-Anderson of the pronunciation of the sentence: The hungry man ate all the good food.
Additional vowel loss occurs as a result of phonological changes. See Phonological processes.
Each character may not only end with a vowel, but may also end with ʔ or h, eg. the following are written with just two characters ᏑᏗ suhti fishhook ᏔᎵ tʰaʔ.li two
There is one distinctive pair related to syllables ending with h, ie. compare: Ꮎ na Ꮐ nah
Syllables that end with an s sound can be written using Ꮝ (U+13CD LETTER S), eg. ᎯᏴᏫᏯᏍ hijə̃ːwiːjaːs Are you a Native American?
With one exception, consonant clusters are managed by using a normal syllabic character but ignoring the ('dummy') vowel, eg. ᎦᎵᏉᎩ kaɬkʷoːki seven ᎬᏙᎠ ktʰoːʔa it's hanging. The character chosen is largely up to the writer, but some words bring in etymological connections.
The exception is Ꮝ (U+13CD LETTER S), which is not followed by a vowel,1 eg. ᏍᎪᎯ skoːhi ten
Some manuscripts precede syllables beginning with an s sound with Ꮝ (U+13CD LETTER S), and Sequoyah spelled his name like that, ie. ᏍᏏᏉᏯ s-si-quo-ya
Spoken Cherokee has tones, but they are not shown in the text.7
Linguists who want to show tones do so using standard allocations of combining characters. The following list shows diacritics used to express tones. (Mid is the default, and doesn't need marking.)25
Everson reports that some combining diacritical marks are now used in Cherokee text by ordinary readers and especially children.25
These diacritics are in the Unicode Combining Diacritical Marks block. The Cherokee block has no combining characters.
◌̣ (U+0323 COMBINING DOT BELOW) indicates shifts in consonant readings, such as voiced to voiceless or voiceless to voiced; for example, where Ꭺ is ko, Ꭺ̣ would be kʰo.
◌̱ (U+0331 COMBINING MACRON BELOW) indicates the dropping of a vowel; for example, Oklahoma could be written ᎣᎦ̱ᎳᎰᎹ òːklàːhőːma Oklahoma
When a consonant is both shifted and has its vowel dropped, ◌̤ (U+0324 COMBINING DIAERESIS BELOW) is used.
Nasalisation is only very rarely marked: in such cases, it can be indicated using ◌̰ (U+0330 COMBINING TILDE BELOW).
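The choice between the three 'below' marks described above can be sketched in Python as follows. The function name `annotate` and its two boolean parameters are hypothetical, introduced only for this illustration. The code points are the ones named in the text, and each mark is simply appended after its base letter.

```python
DOT_BELOW = "\u0323"        # COMBINING DOT BELOW: consonant reading shifted
MACRON_BELOW = "\u0331"     # COMBINING MACRON BELOW: vowel dropped
DIAERESIS_BELOW = "\u0324"  # COMBINING DIAERESIS BELOW: shifted and vowel dropped


def annotate(letter: str, shifted: bool = False, vowel_dropped: bool = False) -> str:
    """Append the pronunciation mark that matches the two flags, if any."""
    if shifted and vowel_dropped:
        return letter + DIAERESIS_BELOW
    if shifted:
        return letter + DOT_BELOW
    if vowel_dropped:
        return letter + MACRON_BELOW
    return letter


# Oklahoma: O, GA (vowel dropped), LA, HO, MA
oklahoma = "".join([
    "\u13a3",
    annotate("\u13a6", vowel_dropped=True),
    "\u13b3",
    "\u13b0",
    "\u13b9",
])
print(oklahoma)
```

Because the marks are ordinary combining characters placed after the base letter, no Cherokee-specific encoding support is needed beyond a font that positions them well.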
This section describes typographic features related to digits, dates, currencies, etc.
Sequoyah, the inventor of the script, created a set of Cherokee numbers, but they were not adopted and are not encoded in Unicode.7 The shapes of the numbers can be seen on the Omniglot page.
Cherokee text runs left-to-right in horizontal lines.
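The directionality of the letters can be checked with Python's standard `unicodedata` module. This minimal sketch assumes a Python build whose Unicode database includes the lowercase Cherokee letters (Unicode 8.0 or later). The range table `CHEROKEE_RANGES` is written out by hand for this example.

```python
import unicodedata

# Assigned letter ranges in the two Unicode blocks used for Cherokee.
CHEROKEE_RANGES = [
    (0x13A0, 0x13F5),  # uppercase letters, Cherokee block
    (0x13F8, 0x13FD),  # lowercase letters, Cherokee block
    (0xAB70, 0xABBF),  # lowercase letters, Cherokee Supplement block
]

letters = [chr(cp) for first, last in CHEROKEE_RANGES for cp in range(first, last + 1)]
bidi_classes = {unicodedata.bidirectional(ch) for ch in letters}
print(bidi_classes)  # {'L'}
```

All of the letters share the strong left-to-right class, so Cherokee text needs no special handling from the bidirectional algorithm.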
This section describes typographic features related to font/writing styles, cursive text, context-based shaping, context-based positioning, letterform slopes, weights & italics, and case & other character transforms.
Are special glyph forms needed, depending on the context in which a character is used? Do glyphs interact in some circumstances? Are there requirements to position diacritics or other items specially, depending on context? Does the script have multiple diacritics competing for the same location relative to the base?
There is no interaction between the glyphs in Cherokee.
Normally, there are no combining marks in Cherokee text. Such marks are only found in special cases, such as specialised educational or linguistic contexts.
Are italicisation, bolding, oblique, etc relevant? Do italic fonts lean in the right direction? Is synthesised italicisation problematic? Are there other problems relating to bolding or italicisation - perhaps relating to generalised assumptions of applicability?
Cherokee users would like their fonts to have italic and bold styles, although this is not currently common. These alternate styles would be used in the same way as for the Latin script.25
Is the orthography bicameral? Are there other character pairings, especially when transforms are needed to convert between the two?
In 2015, a set of lowercase letters was added to the Unicode repertoire in version 8.0, to complement the original set. This is discussed in more detail in Case.
Applications should provide for transformations between upper and lower case forms. However, the situation is slightly unusual: pre-existing text is now treated as uppercase, so transforms in some cases need to treat lowercasing as the default operation. The following is from the Unicode Standard:
This exceptional introduction of a lowercase set to change a unicameral encoding into a bicameral encoding has important implications that implementers of the Cherokee script need to keep in mind. First, in order to preserve case folding stability, Cherokee case folds to the previously encoded uppercase letters, rather than to the newly encoded lowercase letters. This exceptional case folding behavior impacts identifiers, and so can trip up implementations if they are not prepared for it. Second, representation of cased Cherokee text requires using the new lowercase letters for most of the body text, instead of just changing a few initial letters to uppercase. That means that representation of traditional text such as the Cherokee New Testament requires substantial re-encoding of the text. Third, the fact that uppercase Cherokee still represents the default and is most widely supported in fonts means that input systems which are extended to support the new lowercase letters face unusual design choices.
Lowercase characters were introduced in Unicode 8.0, to cover growing use of bicameral content in modern typesetting, as well as some older texts such as the Cherokee New Testament. The lowercase text above is likely to be displayed as tofu (boxes), since it is currently difficult to find a font that includes lowercase forms.
It is unusual for the majority of content to be in uppercase, and for lowercase to come in later, and implementers may need to take care in introducing the new characters. For example, Cherokee case-folds to uppercase, rather than lower. For more details see the Unicode Standard.7
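The following Python sketch illustrates both points: case folding maps Cherokee to the uppercase letters, and converting legacy (all-uppercase) text to cased text means lowercasing most of it. It assumes a Python version whose built-in Unicode data is at version 8.0 or later (Python 3.5 onwards). The helper names `same_ignoring_case` and `sentence_case` are hypothetical and exist only for this example.

```python
UPPER_A = "\u13a0"  # CHEROKEE LETTER A
LOWER_A = "\uab70"  # CHEROKEE SMALL LETTER A


def same_ignoring_case(a: str, b: str) -> bool:
    """Caseless comparison using Unicode case folding."""
    return a.casefold() == b.casefold()


def sentence_case(word: str) -> str:
    """Keep the first letter uppercase and lowercase the rest."""
    return word[:1].upper() + word[1:].lower()


# Ordinary case mapping works as it does for other bicameral scripts.
print(UPPER_A.lower() == LOWER_A)  # True
print(LOWER_A.upper() == UPPER_A)  # True

# Case folding, however, produces the uppercase letter, not the lowercase one.
print(LOWER_A.casefold() == UPPER_A)  # True
print(UPPER_A.casefold() == UPPER_A)  # True

# Legacy text is stored as uppercase, so casing a word lowercases most of it.
legacy_word = "\u13a0\u13ef\u13d9\u13b8"  # first word of the UDHR sample above
cased_word = sentence_case(legacy_word)
print(same_ignoring_case(legacy_word, cased_word))  # True
```

Code that treats `lower()` as a substitute for case folding will treat legacy and cased spellings of the same word as different strings, which is the kind of identifier-matching problem the Unicode Standard warns about.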
The shapes of the upper- vs. lower-cased letters don't change radically (as they do in Latin or Cyrillic). The lowercase letters are often simply smaller; however, they may have ascenders and descenders in some fonts25.
ᎠᎾᎦᎵᏍᎬ anagalisgv lightning
Are words separated by spaces, or other characters? Are there special requirements when double-clicking on the text? Are words hyphenated?
Words are separated by spaces.
See type samples.
Some words are hyphenated.
Since there are no combining marks or decompositions in normal Cherokee text, grapheme clusters correspond to individual characters. Where combining marks are attached to letters, the combination of base and combining mark still fits within the definition of a grapheme cluster.
Base (Mark?)
Each letter is a grapheme cluster, even if (rare) combining marks are attached.
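The Base (Mark?) pattern can be approximated in a few lines of Python. This is a simplified sketch that only attaches characters with a Mark general category to the preceding character. It is not a full implementation of the Unicode text segmentation rules, and the function name `clusters` is hypothetical.

```python
import unicodedata


def clusters(text: str) -> list[str]:
    """Split text into base characters, each with any following combining marks."""
    result: list[str] = []
    for ch in text:
        if result and unicodedata.category(ch).startswith("M"):
            # Combining mark: attach it to the preceding base character.
            result[-1] += ch
        else:
            result.append(ch)
    return result


# Oklahoma, with COMBINING MACRON BELOW on the second letter
word = "\u13a3\u13a6\u0331\u13b3\u13b0\u13b9"
print(len(word))            # 6 code points
print(len(clusters(word)))  # 5 grapheme clusters
```

For unmarked Cherokee text the two counts are identical, since every code point is its own cluster.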
ᎠᎺᏉᎢ a.méː.kʷő.ʔi ocean
ᎠᎨᏯ à.kěː.hj̥a woman
ᎬᏙᎠ ktʰoːʔa it's hanging
ᎣᎦ̱ᎳᎰᎹ òːklàːhőːma Oklahoma
This section describes typographic features related to word boundaries, phrase & section boundaries, bracketed text, quotations & citations, emphasis, abbreviation, ellipsis & repetition, inline notes & annotations, other punctuation, and other inline text decoration.
What characters are used to indicate the boundaries of phrases, sentences, and sections?
See type samples.
Cherokee uses standard Latin punctuation.7
In some cases, it has been known for full stops to be raised above the baseline.1
See type samples.
Cherokee commonly uses ASCII parentheses to insert parenthetical information into text.
What characters are used to indicate quotations? Do quotations within quotations use different characters? What characters are used to indicate dialogue? Are the same mechanisms used to cite words, or for scare quotes, etc? What about citing book or article names?
See type samples.
Cherokee texts typically use quotation marks. Of course, due to keyboard design, quotations may also be surrounded by ASCII double and single quote marks.
This section describes typographic features related to line breaking & hyphenation, text alignment & justification, text spacing, baselines, line height, counters, lists, and styling initials.
Are there special rules about the way text wraps when it hits the end of a line? Does line-breaking wrap whole 'words' at a time, or characters, or something else (such as syllables in Tibetan and Javanese)? What characters should not appear at the end or start of a line, and what should be done to prevent that? Is hyphenation used, or something else? What rules are used? What difficulties exist?
By default, lines are broken at inter-word spaces.
As in almost all writing systems, certain punctuation characters should not appear at the end or the start of a line. The Unicode line-break properties help applications decide whether a character should appear at the start or end of a line.
The following list gives examples of typical behaviours for some of the characters used in Cherokee. Context may affect the behaviour of some of these and other characters.
Does text in a paragraph need to have flush lines down both sides? Does the script allow punctuation to hang outside the text box at the start or end of a line? Where adjustments are needed to make a line flush, how is that done? Does the script shrink/stretch space between words and/or letters? Are word baselines stretched, as in Arabic? What about paragraph indents?
Justification is done, principally, by adjusting the space between words.
Does the script have special requirements for baseline alignment between mixed scripts and in general? Is line height special for this script? Are there other aspects that affect line spacing, or positioning of items vertically within a line?
Cherokee uses the so-called 'alphabetic' baseline, which is the same as for Latin and many other scripts.
Cherokee character glyphs are generally the same height, and only rarely descend (a short way) below the baseline. There are no combining marks in normal text.
To give an approximate idea, Figure 4 compares Latin and Cherokee glyphs from a Noto font. The height of uppercase Cherokee letters is that of the Latin cap-height, and lowercase is set to the Latin x-height.
Figure 5 shows similar comparisons for the Galvji and Gadugi fonts.
This section describes typographic features related to general page layout & progression; grids & tables, notes, footnotes, etc, forms & user interaction, and page numbering, running headers, etc.
1. Peter T. Daniels and William Bright, The World's Writing Systems, Oxford University Press, ISBN 0-19-507993-0
2. Michael Everson & Durbin Feeling, Revised proposal for the addition of Cherokee characters to the UCS
3. Brad Montgomery-Anderson (2008), A reference grammar of Oklahoma Cherokee, PhD dissertation
4. Omniglot, Cherokee
5. Margaret Peake Raymond (2008), The Cherokee Nation and its Language, tsalagi ayeli ale uniwonishisdi (retr. Dec 2021)
6. ScriptSource, Cherokee
7. Unicode Consortium, The Unicode Standard, Version 13.0, Chapter 20.1: Americas, Cherokee, 788-789, ISBN 978-1-936213-16-0.
8. Uchihara, Hiroto (2007), Cherokee Phonology and Verb Morphology, University of Tokyo, MA Thesis (retr. Dec 2021)
9. Unicode Consortium, Unicode Line Breaking Algorithm (UAX#14)
10. Wikipedia, Cherokee syllabary