Updated 24 December, 2024
This page brings together basic information about the Latin script and its use for the Kurmanji language written in the Hawar orthography. It aims to provide a brief, descriptive summary of the modern, printed orthography and typographic features, and to advise how to write Kurmanji using Unicode.
Richard Ishida, Kurmanji (Hawar) Orthography Notes, 24-Dec-2024, https://r12a.github.io/scripts/latn/kmr
Hemû mirov azad û di weqar û mafan de wekhev tên dinyayê. Ew xwedî hiş û şuûr in û divê li hember hev bi zihniyeteke bratiyê bilivin.
Heryek, bê tu cihêyî, nemaze ya nijad, reng, zayend (cisn), ziman, ol, ramana siyasî an her ramana din, eslê neteweyî an civakî, serwet, zayîn an her rewşeke din, xwediyê hemû maf û hemû azadiyên ku di vê Danezanê de hatine daxuyankirî ye.
Bi serde, ewê tu cihêyî neyê kirin ji ber statuya siyasî, hiqûqî an navneteweyî ya welat an erdê ku kesek jê tê, heke ev welat an erd serbixwe, li jêr dest, ne xweser (otonom) an li jêr her tahdîda din a serweriyê be an na.
Source: UDHR, articles 1 & 2
Origins of the Latin script, 7thC – today: Phoenician → Greek → Old Italic → Latin. Related scripts: Glagolitic, Cyrillic, Armenian, Georgian, Coptic, Runes.
The Kurmanji (or Northern Kurdish) language is spoken by almost 16 million people. Of these, 9 million are in Turkey, although overall usage is beginning to decline there.ekmr The Latin script is the primary orthography, but the Arabic script is used in Iran, Iraq, Syria, and Lebanon.ekmr
Kurmanji is widely written in the Latin script for poetry, general literature, education, and political documents.ekmr
Kurmancî
The Hawar orthography described here was devised by Celadet Bedirxan and his brother Kamuran Alî Bedirxan and launched in 1932. They aimed to create an alphabet that did not use two letters to represent a single sound. In addition to the older Latin orthography, Kurmanji has in the past been written in Arabic, Armenian, and Cyrillic orthographies.
The following map of Kurdish dialects was created for Wikipedia. The Wikipedia article on Sorani contains useful additional details about the use of Sorani since the 1700s.
The Latin script is an alphabet. This means that it is largely phonetic in nature: each letter represents a basic sound. See the table to the right for a brief overview of the features of the modern Kurmanji orthography using the Latin script.
Kurmanji text runs left-to-right in horizontal lines. Words are separated by spaces. The orthography is bicameral. The visual forms of letters don't usually interact.
Kurmanji has 23 basic consonant letters (each of which has upper- and lowercase forms). The alphabet is largely phonemic; however, four of the consonant letters represent more than one consonant sound. Three additional letters are listed here that represent individual sounds, but their use is not standard.
This orthography is an alphabet in which vowels are written using eight vowel letters, each with upper- and lowercase forms. (Combining marks occur only in decomposed text.) Long vowels are indicated by letters with a circumflex above.
No special features are used for standalone vowels.
Numbers use ASCII digits.
Line-breaking and justification are primarily based on inter-word spaces.
The following table shows the phonological repertoire (the set of sounds) of the Kurmanji language.
Phones in a lighter colour are non-native sounds or allophones. Source: Wikipedia.
Thaxton reports that the phoneme ɛ is often transcribed as æ or ə, but that for many speakers it is closer to ɛ in all environments. The examples in this page, which are transcribed from Wiktionary, all use ɛ.wmt
manner | labial | alveolar | post-alveolar | palatal | velar | uvular | pharyngeal | glottal |
---|---|---|---|---|---|---|---|---|
stop | p b | t d | | | k ɡ | q | | ʔ |
affricate | | | t͡ʃ d͡ʒ | | | | | |
fricative | f v | s z | ʃ ʒ | | x ɣ | χ | ħ ʕ | h |
nasal | m | n | | | ŋ | | | |
approximant | w | l ɫ | | j | | | | |
trill/flap | | r ɾ | | | | | | |
Unlike transcriptions of Sorani, some transcriptions of Kurmanji add phonemes for unaspirated voiceless stops (pˤ, tˤ, kˤ, and t͡ʃˤ), which are also slightly pharyngealized. However, this distinction doesn't apply to all dialects, and in some areas it is dying out. The distinction is not reflected in the Latin orthography (but is in the Cyrillic). The examples in this page do not make that distinction.ua
The sounds ħ and ʕ occur only in some dialects, in Arabic loanwords. Elsewhere, h and ʔ are used instead,ua as they are in most of the examples in this page.
Kurmanji is not a tonal language.
The following table summarises the main vowel-to-character assignments.
Lowercase on the left, uppercase on the right.
vowel type | lowercase | uppercase |
---|---|---|
simple | î i u û ê o e a | Î I U Û Ê O E A |
For additional details, see the vowel mappings section.
These are dedicated vowel letters.
Multiple code points are only used when text is decomposed. See encoding.
Vowel length is indicated by the vowel letter used.
No special features are used for standalone vowels. The vowel letters shown above are used on their own.
afret
eylo
otomobîl
The same applies to standalone vowels within a word.
reaktor
qenaet
biencam
This section maps Kurmanji vowel sounds to common graphemes in the Hawar Latin orthography.
Uppercase forms are shown to the right.
Î vowel î
I i
U vowel u
Û vowel û
Ê vowel ê
O vowel o
E vowel e In some transcriptions, words containing this phoneme may be transcribed with æ or ə.
A vowel a
The following table summarises the main consonant-to-character assignments.
Uppercase is not shown, but it has no unexpected values.
manner | letters |
---|---|
Stops | p b t d k g q |
Affricates | ç c |
Fricatives | f v s z ş j x h |
Nasals | m n |
Approximants, trills & flaps | w r l y |
For additional details, see the consonant mappings section.
Whereas the table just above takes you from sounds to letters, the following simply lists the basic consonant letters (however, since the orthography is highly phonemic, there is little difference in ordering).
Most letters relate to sounds as one might expect, but the use of c (for d͡ʒ) and j (for ʒ) differs from that in many other Latin-based orthographies.
As in Sorani, the distinction between ɾ and r exists in the phonology, but not in the writing. On the other hand, unlike Sorani, ɫ is not a phoneme in Kurmanji.ua
ḧ and ẍ were proposed by the script inventors but not widely adopted.
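To illustrate, here is a minimal Python sketch (assuming Python 3.9+) of a check that a proofing tool might run to flag letters outside the basic alphabet of 23 consonant and 8 vowel letters, such as the proposed ḧ and ẍ. The function name `nonstandard_letters` is hypothetical, and the letter set is an assumption based on the basic Hawar alphabet, not an official repertoire file.

```python
import unicodedata

# Assumed basic Hawar alphabet (lowercase): 23 consonant + 8 vowel letters.
HAWAR_LETTERS = set("abcçdeêfghiîjklmnopqrsştuûvwxyz")

def nonstandard_letters(text: str) -> list[str]:
    """Return letters in `text` that fall outside the basic Hawar alphabet.

    Text is normalised to NFC first, so a decomposed sequence such as
    'i' + U+0302 is checked as the single letter 'î'.
    """
    found: list[str] = []
    for ch in unicodedata.normalize("NFC", text).lower():
        if ch.isalpha() and ch not in HAWAR_LETTERS and ch not in found:
            found.append(ch)
    return found

print(nonstandard_letters("Hemû mirov azad"))  # []
print(nonstandard_letters("\u1e27 \u1e8d a"))  # ['ḧ', 'ẍ']
```

Normalising first matters: without it, a decomposed î would be seen as i followed by a stray mark rather than as a standard letter.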
This section maps Kurmanji consonant sounds to common graphemes in the Latin Hawar orthography.
Uppercase forms are shown to the right.
Sounds listed as 'infrequent' are allophones, sounds used in foreign words, and the like. Light-coloured characters occur infrequently.
P consonant p
B consonant b
T consonant t
Ç consonant ç
D consonant d
C consonant c
K consonant k
G consonant g
Q consonant q
Glottal stop ʔ: not written. It sometimes occurs in transcriptions of standalone vowels.
F consonant f
V consonant v
S consonant s
Z consonant z
Ş consonant ş
J consonant j
X consonant x
consonant x
consonant ẍ Proposed by the script inventors but not widely adopted.
consonant h
consonant ḧ Proposed by the script inventors but not widely adopted.
H consonant h
M consonant m
N consonant n
consonant n as part of a multi-consonant coda followed by a velar consonant.
Ň consonant ň
W consonant w
R consonant r
R consonant r
consonant rr Only used by some authors who want to distinguish the flap from the trill in writing.
L consonant l
Y consonant y
This section offers advice about characters or character sequences to avoid, and what to use instead. It takes into account the relevance of Unicode Normalisation Form D (NFD) and Unicode Normalisation Form C (NFC).
Although recommended usage is given here, content authors may well be unaware of such recommendations. Applications should therefore look out for the non-recommended approach and, wherever possible, treat it the same as the recommended approach.
Five letters can be represented either as an atomic character (the norm) or as a sequence of a base letter plus a combining mark. The parts are separated in Unicode Normalisation Form D (NFD) and recomposed in Unicode Normalisation Form C (NFC). The two forms are canonically equivalent, so applications should treat them as the same text.
Precomposed | Decomposed |
---|---|
î | 0069 0302 |
ê | 0065 0302 |
û | 0075 0302 |
ç | 0063 0327 |
ş | 0073 0327 |
In decomposed text, the combining mark must be typed and stored after the base letter.
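The following minimal sketch uses Python's standard `unicodedata.normalize` to recompose decomposed text into the precomposed letters listed in the table, and to compare two strings as canonically equivalent. The helper names `to_nfc` and `same_text` are hypothetical.

```python
import unicodedata

def to_nfc(text: str) -> str:
    """Recompose base letter + combining mark into precomposed letters."""
    return unicodedata.normalize("NFC", text)

def same_text(a: str, b: str) -> bool:
    """Treat precomposed and decomposed spellings as the same text."""
    return to_nfc(a) == to_nfc(b)

# Base letters followed by U+0302 (circumflex) or U+0327 (cedilla).
decomposed = "i\u0302 e\u0302 u\u0302 c\u0327 s\u0327"
precomposed = to_nfc(decomposed)
print(precomposed)                                                 # î ê û ç ş
print(" ".join(f"{ord(c):04X}" for c in precomposed if c != " "))  # 00EE 00EA 00FB 00E7 015F
print(same_text("şîr", "s\u0327i\u0302r"))                         # True
# A mark typed before its base letter has nothing to combine with.
print(len(to_nfc("\u0302i")))                                      # 2
```

Comparing after normalisation, rather than comparing raw code points, is what lets an application treat the non-recommended decomposed spelling the same as the recommended precomposed one.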
ASCII digits are used.
Kurmanji text written in the Hawar orthography runs left to right in horizontal lines.
This section brings together information about the following topics: font/writing styles; cursive text; context-based shaping; context-based positioning; letterform slopes, weights, & italics; case & other character transforms.
Letters don't interact, and although the diacritics in decomposed text need to be positioned correctly relative to the base, there is no variation in placement for a given diacritic.
The Latin orthography used for Kurmanji is bicameral, and applications may need to enable transforms to allow the user to switch between cases. Capital letters are used at the beginning of sentences or titles, and for proper nouns.
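As a sketch, assuming Python 3.9+: Hawar pairs i with I and î with Î, with no dotless ı or dotted İ as in Turkish, so the default, locale-independent Unicode case mapping used by Python's `str.upper` and `str.lower` gives the expected results. Applying Turkish-specific case rules instead would wrongly turn i into İ. The helper names below are hypothetical.

```python
import unicodedata

def kurmanji_upper(text: str) -> str:
    # Default Unicode case mapping: i -> I, î -> Î, ş -> Ş, ç -> Ç.
    # No Turkish İ/ı special-casing, which would be wrong for Kurmanji.
    return unicodedata.normalize("NFC", text).upper()

def kurmanji_lower(text: str) -> str:
    return unicodedata.normalize("NFC", text).lower()

def sentence_case(text: str) -> str:
    # Uppercase only the first character; leave the rest (e.g. proper nouns) as typed.
    text = unicodedata.normalize("NFC", text)
    return text[:1].upper() + text[1:]

print(kurmanji_upper("şîr û çay"))  # ŞÎR Û ÇAY
print(kurmanji_lower("DINYAYÊ"))    # dinyayê
print(sentence_case("hemû mirov"))  # Hemû mirov
```

`sentence_case` deliberately avoids `str.capitalize`, which would also lowercase every character after the first and so flatten any capitalised proper nouns.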
Words are separated by spaces.
Some words are hyphenated.
ancax-ancax
Since normal Kurmanji text contains no combining marks or decomposed sequences, grapheme clusters correspond to individual characters. Where combining marks do appear, in decomposed text, each base letter plus its combining mark still forms a single grapheme cluster.
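Python's standard library has no UAX #29 grapheme segmentation, so the hypothetical helper below is a simplified sketch (assuming Python 3.9+): it attaches any character with a non-zero canonical combining class, as reported by `unicodedata.combining`, to the preceding base. That is adequate for Kurmanji, where the only marks expected are U+0302 and U+0327 in decomposed text, but it is not a general grapheme-cluster algorithm.

```python
import unicodedata

def simple_clusters(text: str) -> list[str]:
    """Group each base character with any combining marks that follow it.

    Simplified stand-in for extended grapheme clusters; sufficient for the
    circumflex (U+0302) and cedilla (U+0327) used in decomposed Kurmanji.
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # attach mark to the previous base letter
        else:
            clusters.append(ch)
    return clusters

decomposed = "s\u0327i\u0302r"  # ş + î + r, written with combining marks
print(len(decomposed))                                          # 5
print(len(simple_clusters(decomposed)))                         # 3
print(len(simple_clusters(unicodedata.normalize("NFC", decomposed))))  # 3
```

Either spelling yields the same number of user-perceived characters, which is the behaviour cursor movement and character counting should follow.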
Kurmanji uses ASCII punctuation.
level | characters |
---|---|
phrase | , ; : |
sentence | . ? ! |
Kurmanji commonly uses ASCII parentheses to insert parenthetical information into text.
type | start | end |
---|---|---|
standard | ( | ) |
Kurmanji texts use the quotation marks “ and ” around quotations. Because of keyboard design, quotations may also be surrounded by ASCII double or single quote marks.
type | start | end |
---|---|---|
initial | “ | ” |
Lines are generally broken between words.
As in almost all writing systems, certain punctuation characters should not appear at the end or the start of a line. The Unicode line-break properties help applications decide whether a character should appear at the start or end of a line.
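Because Kurmanji punctuation follows the preceding word without a space, breaking lines only at spaces already keeps , ; : . ? ! away from the start of a line. Below is a minimal sketch using Python's standard `textwrap` module; the function name is hypothetical, and treating hyphens as non-breaking is a design choice made here. A full implementation would instead follow the Unicode line-breaking algorithm (UAX #14), for example via ICU.

```python
import textwrap

def wrap_kurmanji(text: str, width: int) -> list[str]:
    # Break only at spaces: punctuation stays attached to the word before it.
    # break_on_hyphens=False keeps hyphenated words like "ancax-ancax" whole;
    # break_long_words=False never splits a word longer than `width`.
    return textwrap.wrap(text, width=width,
                         break_on_hyphens=False, break_long_words=False)

SAMPLE = "Hemû mirov azad û di weqar û mafan de wekhev tên dinyayê."
for line in wrap_kurmanji(SAMPLE, 24):
    print(line)
```

Since the sample uses single spaces, joining the returned lines with a space reproduces the original text exactly.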
The following list gives examples of typical behaviours for some of the characters used in Kurmanji. Context may affect the behaviour of some of these and other characters.
The principal justification opportunities are inter-word spaces.
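Justifying a line then comes down to distributing the leftover space among the inter-word gaps. The hypothetical sketch below (assuming Python 3.9+) counts characters rather than measuring glyph widths, as a real layout engine would, and gives any remainder to the leftmost gaps.

```python
def justify_line(words: list[str], width: int) -> str:
    """Pad inter-word spaces so the line is exactly `width` characters.

    Extra space is spread as evenly as possible; leftover single spaces
    go to the leftmost gaps. Character counts stand in for glyph widths.
    """
    if len(words) == 1:
        return words[0].ljust(width)
    gaps = len(words) - 1
    spaces = width - sum(len(w) for w in words)
    if spaces < gaps:
        raise ValueError("words do not fit in the given width")
    base, extra = divmod(spaces, gaps)
    parts = []
    for i, word in enumerate(words[:-1]):
        parts.append(word + " " * (base + (1 if i < extra else 0)))
    parts.append(words[-1])
    return "".join(parts)

print(repr(justify_line(["Hemû", "mirov", "azad"], 16)))  # 'Hemû  mirov azad'
```

Only the spaces stretch; the words themselves are never letter-spaced, which matches inter-word justification as described above.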
Kurmanji Hawar orthography uses the 'alphabetic' baseline.
Line height is the same as for other simple Latin orthographies: normal text contains no combining marks, and any diacritics above or below the base letter fall within the ascender/descender height.