Kurmanji Hawar orthography notes

Sample


Hemû mirov azad û di weqar û mafan de wekhev tên dinyayê. Ew xwedî hiş û şuûr in û divê li hember hev bi zihniyeteke bratiyê bilivin.

Heryek, bê tu cihêyî, nemaze ya nijad, reng, zayend (cisn), ziman, ol, ramana siyasî an her ramana din, eslê neteweyî an civakî, serwet, zayîn an her rewşeke din, xwediyê hemû maf û hemû azadiyên ku di vê Danezanê de hatine daxuyankirî ye.

Bi serde, ewê tu cihêyî neyê kirin ji ber statuya siyasî, hiqûqî an navneteweyî ya welat an erdê ku kesek jê tê, heke ev welat an erd serbixwe, li jêr dest, ne xweser (otonom) an li jêr her tahdîda din a serweriyê be an na.

Source: UDHR, articles 1 & 2

Usage & history

The Kurmanji (or Northern Kurdish) language is spoken by almost 16 million people. 9 million of these speakers are in Turkey, although overall usage is beginning to decline there. [ekmr] The Latin script is the primary orthography, but the Arabic script is used in Iran, Iraq, Syria, and Lebanon. [ekmr]

Kurmanji is widely written in the Latin script for poetry, general literature, education, and political documents. [ekmr]

Kurmancî

The Hawar orthography described here was devised by Celadet Bedirxan and his brother Kamuran Alî Bedirxan and launched in 1932. They aimed to create an alphabet that didn't use two letters for representing one sound. In addition to the older Latin orthography, Kurmanji has been written in the past in Arabic, Armenian, and Cyrillic orthographies.

The following map of Kurdish dialects was created for Wikipedia. The Wikipedia article on Sorani contains useful additional details about the use of Sorani since the 1700s.

Basic features

The Latin script is an alphabet. This means that it is largely phonetic in nature, where each letter represents a basic sound. See the table to the right for a brief overview of features for the modern Kurmanji orthography using the Latin script.

Kurmanji text runs left-to-right in horizontal lines. Words are separated by spaces. The orthography is bicameral. The visual forms of letters don't usually interact.


Kurmanji has 23 basic consonant letters (each of which is duplicated in upper- and lowercase). The alphabet is largely phonemic; however, four of the consonant letters represent more than one consonant sound. Three additional letters are listed here that represent individual sounds, but their use is not standard.


This orthography is an alphabet in which vowels are written using 8 vowel letters, each with upper and lowercase forms. (Two combining marks only occur in decomposed text.) Long vowels are indicated by letters with a circumflex above.

No special features are used for standalone vowels.

Numbers use ASCII digits.

Line-breaking and justification are primarily based on inter-word spaces.

Phonology

The following represents the repertoire of the Kurmanji language.


Phones in a lighter colour are non-native or allophones. Source: Wikipedia.

Vowel sounds

Plain vowels

Thackston reports that the phoneme ɛ is often transcribed æ or ə, but it is also closer to ɛ in all environments for many speakers. The examples transcribed in this page from Wiktionary all use the latter. [wmt]

Consonant sounds

	labial	alveolar	post-alveolar	palatal	velar	uvular	pharyngeal	glottal
stop	p b	t d			k ɡ	q		ʔ
affricate			t͡ʃ d͡ʒ
fricative	f v	s z	ʃ ʒ		x ɣ	χ	ħ ʕ	h
nasal	m	n			ŋ
approximant	w	l ɫ		j
trill/flap		r ɾ

Unlike Sorani, some transcriptions of Kurmanji add phonemes for unaspirated voiceless stops, pˤ, tˤ, kˤ, and t͡ʃˤ, which are also slightly pharyngealized. However, this distinction doesn't apply to all dialects, and in some areas is dying out. The distinction is not reflected in the Latin orthography (but is in the Cyrillic). The examples in this page do not make that distinction. [ua]

The sounds ħ and ʕ only occur in some dialects with Arabic loan words. In other cases h and ʔ are used instead [ua], as they are in most of the examples in this page.

Tone

Kurmanji is not a tonal language.

Structure

tbd

Vowels

	î,i,u,û
	ê,o
	e
	a

Uppercase forms:

	Î,I,U,Û
	Ê,O
	E
	A

Post-consonant vowels

Vowel letters

These are dedicated vowel letters.

î,i,u,û,ê,o,e,a

Î,I,U,Û,Ê,O,E,A

Composite vowels

Multiple code points are only used when text is decomposed. See encoding.

Vowel length

Vowel length is indicated by the vowel letter used.

Standalone vowels

No special features are used for standalone vowels. The vowel letters shown above are used on their own.

e.g.

afret

eylo

otomobîl

The same applies for standalone vowels within a word.

e.g.

reaktor

qenaet

biencam

Vowel sounds to characters

This section maps Kurmanji vowel sounds to common graphemes in the Hawar Latin orthography.

Uppercase forms are shown to the right.

iː

Î vowel î

I vowel i

U vowel u

uː

Û vowel û

eː

Ê vowel ê

oː

O vowel o

E vowel e In some transcriptions words written here with this phoneme may be transcribed as æ or ə.

ɑː

A vowel a
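The mapping above can be sketched as a small lookup that reports whether each vowel letter in a word is long or short. This is only an illustration: the names `LONG_VOWELS`, `SHORT_VOWELS`, and `vowel_lengths` are hypothetical, the input is assumed to be in NFC, and treating i, u, and e as short is an assumption based on the absence of a length mark for them in this section.

```python
# Long vowels and their sounds, as listed in this section (ː marks length).
LONG_VOWELS = {"î": "iː", "û": "uː", "ê": "eː", "o": "oː", "a": "ɑː"}

# Assumption: no length mark is shown for these letters, so they are
# treated as short here.
SHORT_VOWELS = {"i", "u", "e"}


def vowel_lengths(word: str) -> list[tuple[str, str]]:
    """List each vowel letter in `word` together with 'long' or 'short'."""
    result = []
    for ch in word.lower():
        if ch in LONG_VOWELS:
            result.append((ch, "long"))
        elif ch in SHORT_VOWELS:
            result.append((ch, "short"))
    return result


print(vowel_lengths("biencam"))
```

Because length is carried by the choice of letter, no context is needed: each letter is classified on its own.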

Consonants

	p,b,t,ç,d,c,k,g,q
	f,v,s,z,ş,j,x,x,ẍ,h,ḧ,h
	m,n,n,ň
	w,r,r,rr,l,y

Uppercase forms:

	P,B,T,Ç,D,C,K,G,Q
	F,V,S,Z,Ş,J,X,X,Ẍ,H,Ḧ,H
	M,N,N,Ň
	W,R,R,Rr,L,Y

Basic consonant letters

Basic consonant sounds in Kurmanji are written using the following letters.


p,b,t,d,k,g,q,ç,c,f,v,s,z,ş,j,x,h,m,n,w,r,l,y

P,B,T,D,K,G,Q,Ç,C,F,V,S,Z,Ş,J,X,H,M,N,W,R,L,Y
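As a rough illustration, the letter lists given in this document can be turned into a simple check that flags letters outside the basic Hawar alphabet. The names `VOWELS`, `CONSONANTS`, and `non_hawar_letters` are hypothetical, and the sketch assumes that anything not in the two basic lists (including the non-standard additional letters) should be reported.

```python
import unicodedata

# Lowercase letters as listed in this document, normalised to NFC so that
# each letter is a single code point.
VOWELS = unicodedata.normalize("NFC", "î i u û ê o e a").split()
CONSONANTS = unicodedata.normalize(
    "NFC", "p b t d k g q ç c f v s z ş j x h m n w r l y"
).split()
HAWAR_LETTERS = set(VOWELS) | set(CONSONANTS)


def non_hawar_letters(text: str) -> list[str]:
    """Return the letters in `text` that are not basic Hawar letters."""
    # NFC makes decomposed î, ê, û, ç, ş match the precomposed forms.
    normalised = unicodedata.normalize("NFC", text).lower()
    found = []
    for ch in normalised:
        if ch.isalpha() and ch not in HAWAR_LETTERS and ch not in found:
            found.append(ch)
    return found


print(len(CONSONANTS), len(VOWELS))
print(non_hawar_letters("Hemû mirov azad û di weqar û mafan de wekhev tên dinyayê."))
```

Spaces, digits, and punctuation are ignored because only alphabetic characters are tested.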

Most letters relate to sounds as one might expect, but the use of c (for d͡ʒ) and j (for ʒ) differs from that in many other Latin-script orthographies.

As in Sorani, the distinction between ɾ and r exists in the phonology, but not in the writing. On the other hand, unlike Sorani, ɫ is not a phoneme in Kurmanji. [ua]

Additional letters

ḧ,ẍ,ň

ḧ and ẍ were proposed by the script inventors but not widely adopted.

Consonant sounds to characters

This section maps Kurmanji consonant sounds to common graphemes in the Hawar Latin orthography.

Uppercase forms are shown to the right.

Sounds listed as 'infrequent' are allophones, or sounds used for foreign words, etc. Light coloured characters occur infrequently.

P consonant p

B consonant b

T consonant t

t͡ʃ

Ç consonant ç

D consonant d

d͡ʒ

C consonant c

K consonant k

G consonant g

Q consonant q

Not written. Occurs sometimes in transcriptions of standalone vowels.

F consonant f

V consonant v

S consonant s

Z consonant z

Ş consonant ş

J consonant j

X consonant x

consonant x

consonant ẍ Proposed by the script inventors but not widely adopted.

consonant h

consonant ḧ Proposed by the script inventors but not widely adopted.

H consonant h

M consonant m

N consonant n

consonant n as part of a multi-consonant coda followed by a velar consonant.

Ň consonant ň

W consonant w

R consonant r

consonant rr Only used by some authors when they want to make a distinction between the flap and the trill in writing.

L consonant l

Y consonant y

Encoding choices

This section offers advice about characters or character sequences to avoid, and what to use instead. It takes into account the relevance of Unicode Normalisation Form D (NFD) and Unicode Normalisation Form C (NFC).

Although usage is recommended here, content authors may well be unaware of such recommendations. Therefore, applications should look out for the non-recommended approach and treat it the same as the recommended approach wherever possible.
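One way an application might treat both approaches the same is to normalise text before comparing or searching it. The sketch below uses Python's standard `unicodedata` module; the helper names `nfc`, `same_text`, and `contains` are hypothetical, and choosing NFC as the common form is an assumption (NFD would work equally well as long as both sides use the same form).

```python
import unicodedata


def nfc(text: str) -> str:
    """Return `text` in Unicode Normalisation Form C."""
    return unicodedata.normalize("NFC", text)


def same_text(a: str, b: str) -> bool:
    """Compare two strings, ignoring precomposed/decomposed differences."""
    return nfc(a) == nfc(b)


def contains(haystack: str, needle: str) -> bool:
    """Substring search that ignores precomposed/decomposed differences."""
    return nfc(needle) in nfc(haystack)


precomposed = "hem\u00fb"   # û as a single code point
decomposed = "hemu\u0302"   # u followed by a combining circumflex
print(precomposed == decomposed, same_text(precomposed, decomposed))
```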

Canonically equivalent encodings

Five letters can be represented as an atomic character (the norm), or as a sequence of base letter plus combining mark. The parts are separated in Unicode Normalisation Form D (NFD), and recomposed in Unicode Normalisation Form C (NFC), so both approaches should be treated as canonically equivalent.

Precomposed	Decomposed
î	0069 0302
ê	0065 0302
û	0075 0302
ç	0063 0327
ş	0073 0327
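The table can be reproduced with Python's standard `unicodedata` module, which is a convenient way to check the sequences. The helper name `decomposed_code_points` is hypothetical.

```python
import unicodedata

# The five precomposed letters from the table: î ê û ç ş
PRECOMPOSED = ["\u00ee", "\u00ea", "\u00fb", "\u00e7", "\u015f"]


def decomposed_code_points(ch: str) -> str:
    """Return the NFD code points of `ch` as space-separated hex values."""
    nfd = unicodedata.normalize("NFD", ch)
    return " ".join(f"{ord(c):04X}" for c in nfd)


for ch in PRECOMPOSED:
    # Each letter decomposes to a base letter plus one combining mark,
    # and NFC recomposes it to the original single code point.
    nfd = unicodedata.normalize("NFD", ch)
    assert unicodedata.normalize("NFC", nfd) == ch
    print(ch, decomposed_code_points(ch))
```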

Codepoint sequences

In decomposed text the combining marks must be typed and stored after the base letters they modify.
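A minimal sketch of a check for this ordering follows. It assumes that only the two combining marks from the table of canonically equivalent encodings are of interest, and that each should directly follow one of the base letters listed there. The names `EXPECTED_BASES` and `misplaced_marks` are hypothetical.

```python
# Combining marks used in decomposed Kurmanji text, with the base letters
# each one is expected to follow (taken from the decomposition table).
EXPECTED_BASES = {
    "\u0302": set("ieuIEU"),  # combining circumflex accent
    "\u0327": set("csCS"),    # combining cedilla
}


def misplaced_marks(text: str) -> list[int]:
    """Return the indexes of combining marks that lack an expected base."""
    bad = []
    for i, ch in enumerate(text):
        allowed = EXPECTED_BASES.get(ch)
        if allowed is None:
            continue  # not one of the two combining marks
        prev = text[i - 1] if i > 0 else ""
        if prev not in allowed:
            bad.append(i)
    return bad


print(misplaced_marks("te\u0302n"))   # mark stored after its base letter
print(misplaced_marks("t\u0302en"))   # mark stored after the wrong letter
```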

Glyph shaping & positioning

This section brings together information about the following topics: font/writing styles; cursive text; context-based shaping; context-based positioning; letterform slopes, weights, & italics; case & other character transforms.

Experiment with examples using the Kurmanji character app.

Context-based shaping & positioning

tbd

Letters don't interact, and although the diacritics in decomposed text need to be positioned correctly relative to the base, there is no variation in placement for a given diacritic.

Transforming characters

The Latin orthography used for Kurmanji is bicameral, and applications may need to enable transforms to allow the user to switch between cases. Capital letters are used at the beginning of sentences or titles, and for proper nouns.
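As an illustration, Python's built-in `str.upper()` and `str.lower()` apply the default Unicode case mappings, which pair the letters of this orthography as listed earlier on this page. The helper names below are hypothetical. Note as a caution that environments with locale-sensitive casing (for example a Turkish tailoring, which maps i to İ) may give different results, so the locale-independent behaviour here is an assumption about the runtime.

```python
# Letters of the orthography as listed on this page, in matching order.
LOWER = "îiuûêoeapbtdkgqçcfvszşjxhmnwrly"
UPPER = "ÎIUÛÊOEAPBTDKGQÇCFVSZŞJXHMNWRLY"


def to_upper(text: str) -> str:
    """Uppercase using the default (locale-independent) Unicode mappings."""
    return text.upper()


def to_lower(text: str) -> str:
    """Lowercase using the default (locale-independent) Unicode mappings."""
    return text.lower()


def sentence_case(text: str) -> str:
    """Capitalise only the first character, leaving the rest untouched."""
    return text[:1].upper() + text[1:]


print(to_upper(LOWER) == UPPER, to_lower(UPPER) == LOWER)
print(sentence_case("ew xwedî hiş û şuûr in."))
```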

Punctuation & inline features

Phrase & section boundaries

Kurmanji uses ASCII punctuation.

phrase	, ; :
sentence	. ? !


Bracketed text

Kurmanji commonly uses ASCII parentheses to insert parenthetical information into text.

	start	end
standard	(	)

Quotations & citations

Kurmanji texts use quotation marks around quotations. Of course, due to keyboard design, quotations may also be surrounded by ASCII double and single quote marks.

	start	end
initial	“	”

Line & paragraph layout

Line breaking & hyphenation

Lines are generally broken between words.
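Breaking only between words can be sketched with Python's standard `textwrap` module. The function name `break_lines` is hypothetical, and the settings are assumptions chosen to match the description above: long words are never split and no break is taken at hyphens. Width is counted in code points, so the text is assumed to be in NFC.

```python
import textwrap


def break_lines(text: str, width: int) -> list[str]:
    """Greedily break `text` into lines of at most `width` characters,
    breaking only at inter-word spaces."""
    return textwrap.wrap(
        text,
        width=width,
        break_long_words=False,   # never split inside a word
        break_on_hyphens=False,   # spaces are the only break opportunities
    )


for line in break_lines("Hemû mirov azad û di weqar û mafan de wekhev tên dinyayê.", 20):
    print(line)
```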

Line-edge rules

As in almost all writing systems, certain punctuation characters should not appear at the end or the start of a line. The Unicode line-break properties help applications decide whether a character should appear at the start or end of a line.


The following list gives examples of typical behaviours for some of the characters used in Kurmanji. Context may affect the behaviour of some of these and other characters.


“ ‘ ( should not be the last character on a line.
” ’ ) . , ; ! ? % should not begin a new line.
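The two rules above can be sketched as a simple check over lines that have already been broken. This is not an implementation of the Unicode line-breaking algorithm: it only tests the characters listed here, and the names `NO_LINE_END`, `NO_LINE_START`, and `line_edge_problems` are hypothetical.

```python
# Characters that should not be the last character on a line: “ ‘ (
NO_LINE_END = set("\u201c\u2018(")
# Characters that should not begin a new line: ” ’ ) . , ; ! ? %
NO_LINE_START = set("\u201d\u2019).,;!?%")


def line_edge_problems(lines: list[str]) -> list[tuple[int, str]]:
    """Return (line number, description) for each line-edge rule violation."""
    problems = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            continue  # blank lines have no edge characters to check
        if stripped[0] in NO_LINE_START:
            problems.append((number, f"starts with {stripped[0]!r}"))
        if stripped[-1] in NO_LINE_END:
            problems.append((number, f"ends with {stripped[-1]!r}"))
    return problems


print(line_edge_problems(["zayend (", ", ziman"]))
```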

Text alignment & justification

The principal justification opportunities are inter-word spaces.
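A minimal sketch of justification that uses only inter-word spaces is shown below. It assumes a monospaced measure where every code point has a width of 1 (so the text should be in NFC), which real layout engines do not assume. The function name `justify` is hypothetical.

```python
def justify(line: str, width: int) -> str:
    """Pad the gaps between words so that the line is `width` characters long."""
    words = line.split()
    if len(words) < 2:
        return line.strip()  # no inter-word space to stretch
    gaps = len(words) - 1
    total_spaces = width - sum(len(word) for word in words)
    if total_spaces < gaps:
        return " ".join(words)  # line is already too long to justify
    # Share the spaces evenly; leftover spaces go to the leftmost gaps.
    base, extra = divmod(total_spaces, gaps)
    parts = []
    for i, word in enumerate(words[:-1]):
        parts.append(word + " " * (base + (1 if i < extra else 0)))
    parts.append(words[-1])
    return "".join(parts)


print(repr(justify("di weqar de", 14)))
```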

Baselines, line height, etc.

tbd

Kurmanji Hawar orthography uses the 'alphabetic' baseline.

Line height is the same as for other simple Latin orthographies: there are no combining marks, and any diacritics occurring above or below the base fall within the ascender/descender height.

Notes, footnotes, etc

See inlinenotes for purely inline annotations, such as ruby or warichu. This section is about annotation systems that separate the reference marks and the content of the notes.
