Kurmanji (draft)

Hawar orthography notes

Updated 20 April, 2024

This page brings together basic information about the Latin script and its use for the Kurmanji language written in the Hawar orthography. It aims to provide a brief, descriptive summary of the modern, printed orthography and typographic features, and to advise how to write Kurmanji using Unicode.

Referencing this document

Richard Ishida, Kurmanji (Hawar) Orthography Notes, 20-Apr-2024, https://r12a.github.io/scripts/latn/kmr

Sample

Hemû mirov azad û di weqar û mafan de wekhev tên dinyayê. Ew xwedî hiş û şuûr in û divê li hember hev bi zihniyeteke bratiyê bilivin.

Heryek, bê tu cihêyî, nemaze ya nijad, reng, zayend (cisn), ziman, ol, ramana siyasî an her ramana din, eslê neteweyî an civakî, serwet, zayîn an her rewşeke din, xwediyê hemû maf û hemû azadiyên ku di vê Danezanê de hatine daxuyankirî ye.

Bi serde, ewê tu cihêyî neyê kirin ji ber statuya siyasî, hiqûqî an navneteweyî ya welat an erdê ku kesek jê tê, heke ev welat an erd serbixwe, li jêr dest, ne xweser (otonom) an li jêr her tahdîda din a serweriyê be an na.

Source: UDHR, articles 1 & 2

Usage & history

Origins of the Latin script, 7thC – today.

Phoenician

└ Greek

└ Old Italic

└ Latin

+ Glagolitic

+ Cyrillic

+ Armenian

+ Georgian

+ Coptic

+ Runes

The Kurmanji (or Northern Kurdish) language is spoken by almost 16 million people. 9 million speakers are in Turkey, although overall usage is beginning to decline there [ekmr]. The Latin script is the primary orthography, but the Arabic script is used in Iran, Iraq, Syria, and Lebanon [ekmr].

Kurmanji is widely written in the Latin script for poetry, general literature, education, and political documents [ekmr].

Kurmancî

The Hawar orthography described here was devised by Celadet Bedirxan and his brother Kamuran Alî Bedirxan and launched in 1932. They aimed to create an alphabet that didn't use two letters for representing one sound. In addition to the older Latin orthography, Kurmanji has been written in the past in Arabic, Armenian, and Cyrillic orthographies.

The following map of Kurdish dialects was created for Wikipedia. The Wikipedia article on Sorani contains useful additional details about the use of Sorani since the 1700s.

Map of Kurdish language use.

Basic features

The Latin script is an alphabet. This means that it is largely phonetic in nature, where each letter represents a basic sound. See the table to the right for a brief overview of features for the modern Kurmanji orthography using the Latin script.

Kurmanji text runs left-to-right in horizontal lines. Words are separated by spaces. The orthography is bicameral. The visual forms of letters don't usually interact.

Kurmanji has 23 basic consonant letters (each of which is duplicated in upper- and lowercase). The alphabet is largely phonemic; however, four of the consonant letters represent more than one consonant sound. Three additional letters are listed here that represent individual sounds, but their use is not standard.

This orthography is an alphabet in which vowels are written using 8 vowel letters, each with upper and lowercase forms. (Two combining marks only occur in decomposed text.) Long vowels are indicated by letters with a circumflex above.

No special features are used for standalone vowels.

Numbers use ASCII digits.

Line-breaking and justification are primarily based on inter-word spaces.

Character index

Letters

Consonants

ç␣ş
Ç␣Ş

Vowels

ê␣î␣û
Ê␣Î␣Û

Non-standard letters

ň␣ḧ␣ẍ
Ň␣Ḧ␣Ẍ

ASCII

a␣b␣c␣d␣e␣f␣g␣h␣i␣j␣k␣l␣m␣n␣o␣p␣q␣r␣s␣t␣u␣v␣w␣x␣y␣z
A␣B␣C␣D␣E␣F␣G␣H␣I␣J␣K␣L␣M␣N␣O␣P␣Q␣R␣S␣T␣U␣V␣W␣X␣Y␣Z

Combining marks

Decomposed text only

̧

Punctuation

–␣—␣…␣‰␣‘␣’␣“␣”␣‑

ASCII

!␣%␣(␣)␣,␣-␣.␣:␣;␣?␣[␣]

Other

To be investigated

‹␣›␣«␣»

Phonology

The following represents the repertoire of the Kurmanji language.

Phones in a lighter colour are non-native or allophones. Source Wikipedia.

Vowel sounds

Plain vowels

iː ɪ ʊ uː
eː oː
ɛ
ɑː

Thackston reports that the phoneme ɛ is often transcribed æ or ə, but that it is closer to ɛ in all environments for many speakers. The examples transcribed in this page from Wiktionary all use the latter [wmt].

Consonant sounds

             labial  alveolar  post-alveolar  palatal  velar  uvular  pharyngeal  glottal
stop         p b     t d                               k ɡ    q                   ʔ
affricate                      t͡ʃ d͡ʒ
fricative    f v     s z       ʃ ʒ                     x ɣ    χ       ħ ʕ         h
nasal        m       n                                 ŋ
approximant  w       l ɫ                      j
trill/flap           r ɾ

Unlike Sorani, some transcriptions of Kurmanji add a series of unaspirated voiceless stops and the affricate t͡ʃˤ, which are also slightly pharyngealized. However, this distinction doesn't apply to all dialects, and in some areas is dying out. The distinction is not reflected in the Latin orthography (but is in the Cyrillic). The examples in this page do not make that distinction [ua].

The sounds ħ and ʕ only occur in some dialects with Arabic loan words. In other cases h and ʔ are used instead [ua], as they are in most of the examples in this page.

Tone

Kurmanji is not a tonal language.

Structure

tbd

Vowels

Vowel summary table

The following table summarises the main vowel to character assignments.

Lowercase on the left, uppercase on the right.

Simple:
î␣i␣u␣û
Î␣I␣U␣Û
ê␣o
Ê␣O
e
E
a
A

For additional details see Vowel sounds to characters.

Here is the set of characters described in this section.

A␣E␣I␣O␣U␣a␣e␣i␣o␣u␣Ê␣Î␣Û␣ê␣î␣û

Vowels after consonants

Vowel letters

These are dedicated vowel letters.

î␣i␣u␣û␣ê␣o␣e␣a
Î␣I␣U␣Û␣Ê␣O␣E␣A

Multipart vowels

Multipart vowels are only produced when text is decomposed. See Encoding choices.

Vowel length

Vowel length is indicated by the vowel letter used.

Standalone vowels

No special features are used for standalone vowels. The vowel letters shown above are used on their own.

afret

eylo

otomobîl

The same applies for standalone vowels within a word.

reaktor

qenaet

biencam

Vowel sounds to characters

This section maps Kurmanji vowel sounds to common graphemes in the Hawar orthography.

Uppercase is not shown here.

Plain vowels

Sound  Letter  Code point  Example
iː     î       00EE        inglîzî
ɪ      i       0069        ihtiyac
ʊ      u       0075        sîxur
uː     û       00FB        hêrûg
eː     ê       00EA        bêdê
oː     o       006F        otomobîl
ɛ      e       0065        enextar
ɑː     a       0061        agahdarî

In some transcriptions words written here with the phoneme ɛ (letter e, 0065) may be transcribed as æ or ə.

Consonants

Consonant summary table

The following table summarises the main consonant to character assignments.

Uppercase is not shown, but it has no unexpected values.

Stops
p␣b␣t␣d␣k␣g␣q
Affricates
ç␣c
Fricatives
f␣v␣s␣z␣ş␣j␣x␣h
Nasals
m␣n
Approximants
trills & flaps
w␣r␣l␣y

For additional details see Consonant sounds to characters.

Basic consonant letters

Whereas the table just above takes you from sounds to letters, the following simply lists the basic consonant letters (however, since the orthography is highly alphabetic there is little difference in ordering).

p␣b␣t␣d␣k␣g␣q␣ç␣c␣f␣v␣s␣z␣ş␣j␣x␣h␣m␣n␣w␣r␣l␣y
P␣B␣T␣D␣K␣G␣Q␣Ç␣C␣F␣V␣S␣Z␣Ş␣J␣X␣H␣M␣N␣W␣R␣L␣Y

Most letters relate to sounds as one might expect, but c represents d͡ʒ and j represents ʒ, which differs from their use in many other Latin-script orthographies.

As in Sorani, the distinction between ɾ and r exists in the phonology, but not in the writing. On the other hand, unlike Sorani, ɫ is not a phoneme in Kurmanji [ua].

Additional letters

ḧ␣ẍ␣ň
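
The letter lists in this page can be turned into a simple repertoire check. The following is a minimal Python sketch, assuming that only the 23 consonant letters and 8 vowel letters listed here count as standard; the function name `unexpected_letters` and the `allow_non_standard` flag are hypothetical, introduced only for illustration. Loan words and foreign names may legitimately contain other letters, so a non-empty result is a hint rather than an error.

```python
import unicodedata

# Lowercase, precomposed forms of the letters listed in this page.
CONSONANTS = "pbtdkgq\u00E7cfvsz\u015Fjxhmnwrly"  # includes ç and ş
VOWELS = "\u00EEiu\u00FB\u00EAoea"                 # includes î, û and ê
NON_STANDARD = "\u1E27\u1E8D\u0148"                # ḧ, ẍ and ň


def unexpected_letters(text: str, allow_non_standard: bool = False) -> list[str]:
    """Return the letters in text that are outside the listed Hawar letters."""
    allowed = set(CONSONANTS + VOWELS)
    if allow_non_standard:
        allowed |= set(NON_STANDARD)
    # Normalise first so that decomposed ç, ş, î, ê, û are recognised.
    normalised = unicodedata.normalize("NFC", text).lower()
    return sorted({ch for ch in normalised if ch.isalpha() and ch not in allowed})


if __name__ == "__main__":
    print(unexpected_letters("Hem\u00FB mirov azad"))  # no unexpected letters
    print(unexpected_letters("\u1E8Dox"))              # reports the non-standard letter
```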

Consonant sounds to characters

This section maps Kurmanji consonant sounds to common graphemes in the Latin Hawar orthography.

Uppercase is not shown. Characters in a column to the right are alternatives that are not in general use.

Sounds listed as 'infrequent' are allophones, dialectal variants, or sounds used for foreign words, etc.

Sound  Letter  Code point  Example     Alternative
p      p       0070        pûşper
b      b       0062        berbijar
t      t       0074        tişt
d      d       0064        dadmend
k      k       006B        keskesor
ɡ      g       0067        girtîgeh
q      q       0071        qozeqer
ʔ      Not written. Occurs sometimes in transcriptions of standalone vowels.
t͡ʃ     ç       00E7        çoçik
d͡ʒ     c       0063        carcaran
f      f       0066        ferfûr
v      v       0076        veavakirin
s      s       0073        sist
z      z       007A        zanist
ʃ      ş       015F        şimşûr
ʒ      j       006A        mijmij
x      x       0078        xox
χ      x       0078        kuxîn       ẍ 1E8D
ħ      h       0068        heft        ḧ 1E27
h      h       0068        herherî
m      m       006D        misilman
n      n       006E        navendî
ŋ      n       006E        aheng       ň 0148
w      w       0077        weza
ɾ      r       0072        rêwî
r      r       0072        rûpel       rr 0072 0072
l      l       006C        kilîl
j      y       0079        yek

Notes:

ẍ (1E8D) and ḧ (1E27) were proposed by the script inventors but not widely adopted.

n (006E) represents ŋ as part of a multi-consonant coda followed by a velar consonant.

rr (0072 0072) is only used by some authors when they want to make a distinction between the flap and the trill in writing.

Encoding choices

This section offers advice about characters or character sequences to avoid, and what to use instead. It takes into account the relevance of Unicode Normalisation Form D (NFD) and Unicode Normalisation Form C (NFC).

Although usage is recommended here, content authors may well be unaware of such recommendations. Therefore, applications should look out for the non-recommended approach and treat it the same as the recommended approach wherever possible.
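
One way for an application to treat both approaches the same is to normalise text before comparing it. The following is a minimal Python sketch using the standard `unicodedata` module; the helper name `same_text` is hypothetical and chosen only for this illustration.

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalising both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "\u00E7o\u00E7ik"   # çoçik, with ç stored as U+00E7
decomposed = "c\u0327oc\u0327ik"  # çoçik, with ç stored as c + U+0327

if __name__ == "__main__":
    print(precomposed == decomposed)           # raw code point comparison
    print(same_text(precomposed, decomposed))  # comparison after normalisation
```

The same normalisation step would typically be applied before searching, sorting, or de-duplicating text as well.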

Canonically equivalent encodings

Five letters can be represented as an atomic character (the norm), or as a sequence of base letter plus combining mark. The parts are separated in Unicode Normalisation Form D (NFD), and recomposed in Unicode Normalisation Form C (NFC), so both approaches should be treated as canonically equivalent.

Precomposed Decomposed
î 0069 0302
ê 0065 0302
û 0075 0302
ç 0063 0327
ş 0073 0327
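
The rows of this table can be reproduced from the Unicode character database. The sketch below, in Python with the standard `unicodedata` module, decomposes each precomposed letter with NFD and confirms that NFC restores it; the name `decomposition_table` is hypothetical.

```python
import unicodedata

# î, ê, û, ç, ş as precomposed characters.
LETTERS = "\u00EE\u00EA\u00FB\u00E7\u015F"


def decomposition_table(letters: str) -> list[tuple[str, str]]:
    """Return (precomposed code point, decomposed code points) pairs as hex."""
    rows = []
    for ch in letters:
        nfd = unicodedata.normalize("NFD", ch)
        # NFC must bring the decomposed sequence back to the single character.
        assert unicodedata.normalize("NFC", nfd) == ch
        rows.append((f"{ord(ch):04X}", " ".join(f"{ord(c):04X}" for c in nfd)))
    return rows


if __name__ == "__main__":
    for precomposed, decomposed in decomposition_table(LETTERS):
        print(precomposed, "->", decomposed)
```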

Codepoint sequences

In decomposed text each combining mark must be typed and stored immediately after the base letter it modifies.
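
In decomposed text a combining circumflex or cedilla is stored after the letter it modifies. The following Python sketch flags marks that do not follow a letter (or another mark on a letter). The function name `misplaced_marks` is hypothetical, and the check is deliberately simple: it relies on `unicodedata.combining` and `str.isalpha` rather than on a full implementation of the Unicode text segmentation rules.

```python
import unicodedata


def misplaced_marks(text: str) -> list[int]:
    """Return the indexes of combining marks that do not follow a letter."""
    bad = []
    has_base = False  # True while the preceding base character is a letter
    for index, ch in enumerate(text):
        if unicodedata.combining(ch):
            if not has_base:
                bad.append(index)
        else:
            has_base = ch.isalpha()
    return bad


if __name__ == "__main__":
    print(misplaced_marks("s\u0327ims\u0327u\u0302r"))  # marks follow their bases
    print(misplaced_marks("\u0327simsur"))              # mark with no base before it
```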

Numbers, dates, currency, etc

ASCII digits are used.

Text direction

Kurmanji text written in the Hawar orthography runs left to right in horizontal lines.

Glyph shaping & positioning

This section brings together information about the following topics: font/writing styles; cursive text; context-based shaping; context-based positioning; letterform slopes, weights, & italics; case & other character transforms.

Experiment with examples using the Kurmanji character app.

Context-based shaping & positioning

tbd

Letters don't interact, and although the diacritics in decomposed texts need to be positioned correctly relative to the base there is no variation in placement for a given diacritic.

Transforming characters

The Latin orthography used for Kurmanji is bicameral, and applications may need to enable transforms to allow the user to switch between cases. Capital letters are used at the beginning of sentences or titles, and for proper nouns.
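
As an illustration of such transforms, the Python sketch below uses the built-in `str.upper` and `str.lower`, which apply the default Unicode case mappings. The helper names are hypothetical. Text is normalised to NFC first so that decomposed and precomposed input give the same result.

```python
import unicodedata


def to_upper(text: str) -> str:
    """Uppercase text using the default Unicode case mappings."""
    return unicodedata.normalize("NFC", text).upper()


def to_lower(text: str) -> str:
    """Lowercase text using the default Unicode case mappings."""
    return unicodedata.normalize("NFC", text).lower()


def capitalise_first(text: str) -> str:
    """Uppercase only the first character, as at the start of a sentence."""
    text = unicodedata.normalize("NFC", text)
    return text[:1].upper() + text[1:]


if __name__ == "__main__":
    print(to_upper("\u015Fim\u015F\u00FBr"))     # şimşûr in capitals
    print(capitalise_first("\u00E7o\u00E7ik"))   # çoçik with an initial capital
```

The default mappings pair i with I, which matches the upper- and lowercase pairs listed in this page. Software that applies Turkish-specific case rules would instead map i to İ, a letter that does not appear in the character lists here, so it may be worth checking which rules a given library applies.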

Typographic units

Word boundaries

Words are separated by spaces.

Some words are hyphenated.

ancax-ancax

Graphemes

Since there are no combining marks or decompositions in normal Kurmanji text, grapheme clusters correspond to individual characters. Where combining marks appear in decomposed text, the combination of base and combining mark still fits within the definition of a grapheme cluster.
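
The point can be checked with a small Python sketch. It groups each base character with any combining marks that follow it, which is a simplification of the Unicode grapheme cluster rules that is sufficient for the letters described in this page; the function name `clusters` is hypothetical.

```python
import unicodedata


def clusters(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it."""
    out: list[str] = []
    for ch in text:
        if out and unicodedata.combining(ch):
            out[-1] += ch  # attach the mark to the preceding base
        else:
            out.append(ch)
    return out


word_nfc = "\u015Fim\u015F\u00FBr"                 # şimşûr, precomposed
word_nfd = unicodedata.normalize("NFD", word_nfc)  # same word, decomposed

if __name__ == "__main__":
    print(len(word_nfc), len(word_nfd))                      # code point counts differ
    print(len(clusters(word_nfc)), len(clusters(word_nfd)))  # cluster counts match
```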

Punctuation & inline features

Phrase & section boundaries

,␣;␣:␣.␣?␣!

Kurmanji uses ASCII punctuation.

phrase    , ; :
sentence  . ? !

Bracketed text

(␣)

Kurmanji commonly uses ASCII parentheses to insert parenthetical information into text.

          start  end
standard  (      )

Quotations & citations

“␣”␣‘␣’

Kurmanji texts use quotation marks around quotations. Of course, due to keyboard design, quotations may also be surrounded by ASCII double and single quote marks.

         start  end
initial  “      ”

Line & paragraph layout

Line breaking & hyphenation

Lines are generally broken between words.
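
A minimal Python sketch of breaking lines at inter-word spaces, using the standard `textwrap` module. The wrapper name `break_lines` is hypothetical. Two settings are assumptions made for this illustration rather than rules stated in this page: words longer than the line are left unbroken, and no break is taken at a hyphen, so a word such as ancax-ancax stays on one line.

```python
import textwrap

SAMPLE = "Hem\u00FB mirov azad \u00FB di weqar \u00FB mafan de wekhev t\u00EAn dinyay\u00EA."


def break_lines(text: str, width: int) -> list[str]:
    """Break text into lines of at most width characters, at spaces only."""
    return textwrap.wrap(
        text,
        width=width,
        break_long_words=False,  # assumption: never split inside a word
        break_on_hyphens=False,  # assumption: keep hyphenated words whole
    )


if __name__ == "__main__":
    for line in break_lines(SAMPLE, 24):
        print(line)
```

Widths here are counted in code points, so the text is assumed to be in NFC.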

Line-edge rules

As in almost all writing systems, certain punctuation characters should not appear at the end or the start of a line. The Unicode line-break properties help applications decide whether a character should appear at the start or end of a line.

The following list gives examples of typical behaviours for some of the characters used in Kurmanji. Context may affect the behaviour of some of these and other characters.

  • “ ‘ (   should not be the last character on a line.
  • ” ’ ) . , ; ! ? %   should not begin a new line.
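
The two rules in this list can be sketched as a simple check over lines that have already been broken. This is a Python illustration only: the function name `edge_violations` is hypothetical, and a real application would normally rely on the Unicode line-break properties rather than on fixed character sets like these.

```python
# Characters taken from the list above.
NO_LINE_END = set("\u201C\u2018(")           # “ ‘ (
NO_LINE_START = set("\u201D\u2019).,;!?%")   # ” ’ ) . , ; ! ? %


def edge_violations(lines: list[str]) -> list[tuple[int, str, str]]:
    """Return (line index, 'start' or 'end', character) for each violation."""
    problems = []
    for index, line in enumerate(lines):
        stripped = line.strip()
        if not stripped:
            continue
        if stripped[-1] in NO_LINE_END:
            problems.append((index, "end", stripped[-1]))
        if stripped[0] in NO_LINE_START:
            problems.append((index, "start", stripped[0]))
    return problems


if __name__ == "__main__":
    # A break after the opening parenthesis is reported for line 0.
    print(edge_violations(["zayend (", "cisn), ziman"]))
```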

Text alignment & justification

The principal justification opportunities are inter-word spaces.
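
As a rough illustration of justification that uses only inter-word spaces, the Python sketch below pads a line to a target width by distributing extra spaces across the gaps between words, leftmost gaps first. The function name `justify` is hypothetical, and the sketch assumes a monospaced display where one code point occupies one column; a real layout engine would adjust space widths using font metrics instead.

```python
import unicodedata


def justify(line: str, width: int) -> str:
    """Pad line to width by widening the spaces between words."""
    # NFC keeps each letter to one code point so len() matches column count.
    words = unicodedata.normalize("NFC", line).split()
    if len(words) < 2:
        return " ".join(words)      # nothing to stretch
    gaps = len(words) - 1
    spaces = width - sum(len(word) for word in words)
    if spaces < gaps:
        return " ".join(words)      # line does not fit; leave it unjustified
    base, extra = divmod(spaces, gaps)
    parts = []
    for index, word in enumerate(words[:-1]):
        # The first `extra` gaps receive one additional space each.
        parts.append(word + " " * (base + (1 if index < extra else 0)))
    parts.append(words[-1])
    return "".join(parts)


if __name__ == "__main__":
    print(repr(justify("Hem\u00FB mirov azad", 20)))
```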

Baselines, line height, etc.

tbd

Kurmanji Hawar orthography uses the 'alphabetic' baseline.

Line height is the same as for other simple Latin orthographies: there are no combining marks, and any diacritics occurring above or below the base fall within the ascender/descender height.

Page & book layout

Online resources

  1. Kurdish Wikipedia

References