Greek script summary (Draft)

Updated 16-Apr-2019 • tags greek, scriptnotes

This page provides basic information about the Greek script, and how it is used for the modern Greek language. It is not authoritative, peer-reviewed information – these are just notes I have gathered or copied from various places as I learned. For character-specific details follow the links to the Greek character notes.

For similar information related to other scripts, see the Script comparison table.


Sample (Modern Greek)

ΑΡΘΡΟ 1 Όλοι οι άνθρωποι γεννιούνται ελεύθεροι και ίσοι στην αξιοπρέπεια και τα δικαιώματα. Είναι προικισμένοι με λογική και συνείδηση, και οφείλουν να συμπεριφέρονται μεταξύ τους με πνεύμα αδελφοσύνης.

ΑΡΘΡΟ 2 Κάθε άνθρωπος δικαιούται να επικαλείται όλα τα δικαιώματα και όλες τις ελευθερίες που προκηρύσσει η παρούσα Διακήρυξη, χωρίς καμία απολύτως διάκριση, ειδικότερα ως προς τη φυλή, το χρώμα, το φύλο, τη γλώσσα, τις θρησκείες, τις πολιτικές ή οποιεσδήποτε άλλες πεποιθήσεις, την εθνική ή κοινωνική καταγωγή, την περιουσία, τη γέννηση ή οποιαδήποτε άλλη κατάσταση. Δεν θα μπορεί ακόμα να γίνεται καμία διάκριση εξαιτίας του πολιτικού, νομικού ή διεθνούς καθεστώτος της χώρας από την οποία προέρχεται κανείς, είτε πρόκειται για χώρα ή εδαφική περιοχή ανεξάρτητη, υπό κηδεμονία ή υπεξουσία, ή που βρίσκεται υπό οποιονδήποτε άλλον περιορισμό κυριαρχίας.

Usage & history

From Scriptsource:

The Greek script was the first documented writing system to represent consonants and vowels with distinct symbols, making it the oldest true alphabet. It has been used since the 8th century BC for writing the Greek language, and selected letters are also currently used in writing mathematical and scientific concepts, such as π (pi) and Ω (ohm). Some symbols have been incorporated into the International Phonetic Alphabet. The script has also been used for writing a number of minority languages, including Urum, Albanian Tosk, and Balkan Gagauz Turkish.

The Phoenician prince Cadmus is traditionally credited with introducing Phoenician writing to the Greeks, who then adapted it, most significantly by adding vowel letters, to write the Greek language. Historically, there were a number of variants of the alphabet, including an East/West variation. The Western variant was called Chalcidian, from which the Old Italic and Latin alphabets descended, and the Eastern variant was called Ionic, from which the modern Greek alphabet descended. A third variant, called the Attic script, was used in Athens until around 400 BC, when Athens adopted the Ionic script. Subsequently the Ionic script became standard throughout Greece. Early Greek was written from right to left or in boustrophedon style, though it is now written from left to right.

From Wikipedia:

The Greek alphabet has been used to write the Greek language since the late 9th century BC or early 8th century BC. It was derived from the earlier Phoenician alphabet, and was the first alphabetic script to have distinct letters for vowels as well as consonants. It is the ancestor of the Latin and Cyrillic scripts. Apart from its use in writing the Greek language, in both its ancient and its modern forms, the Greek alphabet today also serves as a source of technical symbols and labels in many domains of mathematics, science and other fields.

Key features

Greek is an alphabet: letters typically represent a consonant or vowel sound. For a brief overview of the script's features, see the Script comparison table.

Text is written horizontally, left to right, and the visual forms of letters don't usually interact. The script is bicameral.

Modern Greek comes in 2 flavours: monotonic and polytonic. Monotonic Greek generally uses only the tonos diacritic to mark the stressed syllable of a word, although it occasionally also uses the dialytika to show that adjacent vowels are pronounced separately.

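Polytonic Greek uses a larger set of diacritics (three accents, two breathing marks and the iota subscript), and a single vowel often carries more than one of them.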

Accented Greek letters are, in a sense, encoded twice: Unicode has a sizeable set of precomposed characters, but the equivalent decomposed sequence (a base letter followed by combining marks) can always be written instead.
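A minimal Python sketch of this double encoding, using only the standard `unicodedata` module: the precomposed ά (U+03AC) and the sequence α + U+0301 look identical but compare unequal as raw strings, and only match once both are normalised to the same form. The helper name `same_greek_text` is hypothetical, chosen for illustration.

```python
import unicodedata

def same_greek_text(a: str, b: str) -> bool:
    """Return True if a and b are canonically equivalent (equal after NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

precomposed = "\u03ac"       # ά GREEK SMALL LETTER ALPHA WITH TONOS
decomposed = "\u03b1\u0301"  # α + U+0301 COMBINING ACUTE ACCENT

print(precomposed == decomposed)                 # False: different code points
print(same_greek_text(precomposed, decomposed))  # True: canonically equivalent
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```

In practice this means text should be normalised (usually to NFC) before it is compared, searched or indexed.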

Character lists

The Greek script characters in Unicode 10.0 are spread across 2 blocks, Greek and Coptic (U+0370–U+03FF) and Greek Extended (U+1F00–U+1FFF), plus the Ancient Greek Numbers block (U+10140–U+1018F). This count does not include the phonetic blocks or any of the combining character blocks.


For character-specific details see Greek Character notes.


Text direction

Modern Greek text is written horizontally, left to right.

Vowels

Basic vowels

The basic set of vowels used in modern Greek includes the following.

Αα␣Εε␣Ηη␣Ιι␣Οο␣Υυ␣Ωω

Diphthongs & digraphs

In modern Greek, pairs of vowel letters may represent a single sound, or something other than two consecutive basic vowel sounds. These are the combinations used in modern text.

αι␣ει␣οι␣υι␣αυ␣ευ␣ηυ␣ου

Polytonic Greek has additional digraphs involving iota.

Three items in the list above, αυ, ευ and ηυ, are pronounced with v (av, ev, iv) before a vowel or voiced consonant, and with f (af, ef, if) elsewhere.
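The v/f rule for αυ, ευ and ηυ depends only on the letter that follows the digraph, so it can be sketched as a small lookup. The Python below is a hypothetical illustration, not a complete transliterator: it takes a single modern monotonic word, treats β γ δ ζ λ μ ν ρ as the voiced consonants, and assumes that a tonos on the first vowel or a dialytika on υ means the two letters do not form a digraph. The function names are invented for this example.

```python
import unicodedata

VOWELS = set("\u03b1\u03b5\u03b7\u03b9\u03bf\u03c5\u03c9")                 # α ε η ι ο υ ω
VOICED_CONSONANTS = set("\u03b2\u03b3\u03b4\u03b6\u03bb\u03bc\u03bd\u03c1")  # β γ δ ζ λ μ ν ρ
FIRST_LETTERS = set("\u03b1\u03b5\u03b7")                                    # α ε η
UPSILON = "\u03c5"    # υ
DIALYTIKA = "\u0308"  # COMBINING DIAERESIS

def letter_clusters(word: str) -> list:
    """Split a word into [base letter, [combining marks]] pairs (lowercase, NFD)."""
    clusters = []
    for ch in unicodedata.normalize("NFD", word.lower()):
        if unicodedata.combining(ch) and clusters:
            clusters[-1][1].append(ch)
        else:
            clusters.append([ch, []])
    return clusters

def upsilon_digraph_sounds(word: str) -> list:
    """Return 'v' or 'f' for each αυ, ευ or ηυ digraph found in a single word."""
    cl = letter_clusters(word)
    sounds = []
    for k in range(len(cl) - 1):
        (first, first_marks), (second, second_marks) = cl[k], cl[k + 1]
        # A tonos on the first vowel, or a dialytika on υ, means the two
        # letters are pronounced separately and do not form a digraph.
        if (first in FIRST_LETTERS and second == UPSILON
                and not first_marks and DIALYTIKA not in second_marks):
            following = cl[k + 2][0] if k + 2 < len(cl) else None
            voiced = following in VOWELS or following in VOICED_CONSONANTS
            sounds.append("v" if voiced else "f")
    return sounds

print(upsilon_digraph_sounds("αύριο"))      # ['v']
print(upsilon_digraph_sounds("ευχαριστώ"))  # ['f']
print(upsilon_digraph_sounds("άυλος"))      # []
```

Working on decomposed text keeps the accent check simple: the tonos on ύ in αύριο is just a mark attached to υ and does not stop the pair being read as a digraph.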

Monotonic vowel diacritics

In words of more than one syllable, the stressed syllable carries a tonos diacritic. This can be written by following the above vowel characters with  ́ [U+0301 COMBINING ACUTE ACCENT], but Unicode also has a set of precomposed characters.

Άά␣Έέ␣Ήή␣Ίί␣Όό␣Ύύ␣Ώώ

Note how the diacritic appears to the left of the uppercase letters. The uppercase letters shown here are only used if the first letter of a word is capitalised and that letter happens to be a vowel with a tonos. If the whole word is uppercased, the tonos is dropped. See Case conversion below.
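The composition described in the tonos paragraph above can be checked with a short Python sketch: append U+0301 to each basic vowel and normalise to NFC, and each pair collapses to one of the precomposed tonos characters just listed. Only the standard `unicodedata` module is used; `with_tonos` is a hypothetical helper name.

```python
import unicodedata

BASIC_VOWELS = "\u03b1\u03b5\u03b7\u03b9\u03bf\u03c5\u03c9"  # α ε η ι ο υ ω
TONOS = "\u0301"  # COMBINING ACUTE ACCENT, used for tonos in decomposed text

def with_tonos(vowel: str) -> str:
    """Attach a tonos to a vowel and return the NFC (precomposed) form."""
    return unicodedata.normalize("NFC", vowel + TONOS)

for v in BASIC_VOWELS + BASIC_VOWELS.upper():
    composed = with_tonos(v)
    # Each result is a single precomposed code point.
    print(v, composed, f"U+{ord(composed):04X}", unicodedata.name(composed))
```

For example, `with_tonos("α")` gives U+03AC ά and `with_tonos("Α")` gives U+0386 Ά.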

Additionally, monotonic Greek on occasion uses a dialytika diacritic to indicate that two adjacent vowel letters don't form a digraph. Again, it is possible to use  ̈ [U+0308 COMBINING DIAERESIS] with the basic vowel or vowel+tonos, or to use one of the following precomposed characters.

Ϊϊ␣Ϋϋ␣ΰ␣ΐ

Tonos and dialytika may appear together above a vowel. Unlike tonos, dialytika is not dropped for capital letters, but may be produced from a tonos in some circumstances (see Case conversion below).

The code point   ̈́ [U+0344 COMBINING GREEK DIALYTIKA TONOS] exists, but the Unicode Standard discourages its use in favour of   ̈ + ́ [U+0308 COMBINING DIAERESIS + U+0301 COMBINING ACUTE ACCENT]. [u]
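A small Python sketch (standard `unicodedata` only) shows how normalisation handles these marks: ι + U+0308 composes to ϊ, adding U+0301 gives the precomposed ΐ, and U+0344 is always replaced by the recommended U+0308 U+0301 sequence, so it ends up producing the same ΐ. The helper names `nfc` and `codepoints` are just for illustration.

```python
import unicodedata

def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)

def codepoints(s: str) -> str:
    return " ".join(f"U+{ord(c):04X}" for c in s)

print(codepoints(nfc("\u03b9\u0308")))        # U+03CA  ϊ
print(codepoints(nfc("\u03b9\u0308\u0301")))  # U+0390  ΐ
# U+0344 decomposes to U+0308 U+0301 ...
print(codepoints(unicodedata.normalize("NFD", "\u0344")))  # U+0308 U+0301
# ... so ι + U+0344 normalises to the same precomposed ΐ.
print(codepoints(nfc("\u03b9\u0344")))        # U+0390
```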

Polytonic vowel diacritics

In polytonic Greek, stressed syllables are identified using one of 3 diacritics: oxia (called tonos in monotonic Greek), varia, or perispomeni. The original distinctions represented by these 3 marks are no longer relevant to modern Greek, and they simply reflect much older spellings. There are precomposed characters for most combinations, but decomposed sequences use   ́ [U+0301 COMBINING ACUTE ACCENT] for oxia and   ̀ [U+0300 COMBINING GRAVE ACCENT] for varia. Perispomeni can be rendered as a circumflex, a tilde, or occasionally a macron, so a special code point is available for it:   ͂ [U+0342 COMBINING GREEK PERISPOMENI].

A vowel that begins a word carries one of two breathing marks: the rough breathing mark (dasia) indicates the presence of h, and the smooth breathing mark (psili) its absence. (The h sound no longer exists in modern Greek.)   ̔ [U+0314 COMBINING REVERSED COMMA ABOVE] represents the rough breathing mark, and  ̓ [U+0313 COMBINING COMMA ABOVE] the smooth breathing mark. The code point  ̓ [U+0343 COMBINING GREEK KORONIS] also represents the smooth breathing mark, but exists for compatibility with other encodings and should not be used. [u]
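As a quick check in Python (standard `unicodedata` only), each breathing mark composes with a vowel into a single Greek Extended character, and U+0343 is simply normalised away to U+0313, which is one reason not to rely on it. The constant names are illustrative.

```python
import unicodedata

def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)

ALPHA = "\u03b1"    # α
PSILI = "\u0313"    # COMBINING COMMA ABOVE (smooth breathing)
DASIA = "\u0314"    # COMBINING REVERSED COMMA ABOVE (rough breathing)
KORONIS = "\u0343"  # COMBINING GREEK KORONIS (compatibility only)

print(f"U+{ord(nfc(ALPHA + PSILI)):04X}")  # U+1F00  ἀ
print(f"U+{ord(nfc(ALPHA + DASIA)):04X}")  # U+1F01  ἁ
# U+0343 is canonically equivalent to U+0313, so normalisation replaces it.
print(nfc(KORONIS) == PSILI)                       # True
print(nfc(ALPHA + KORONIS) == nfc(ALPHA + PSILI))  # True
```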

The ypogegrammeni (or iota subscript) represents the former offglide for what were long diphthongs in ancient Greek, and in decomposed text can be written using  ͅ [U+0345 COMBINING GREEK YPOGEGRAMMENI]. It is used with 3 vowel letters, α, η, and ω, ie. ᾳῃῳ.

Polytonic Greek also uses  ̈ [U+0308 COMBINING DIAERESIS] to indicate that two adjacent vowel letters are pronounced separately rather than as a diphthong.

The Greek Extended Unicode block provides precomposed characters for most of the combinations of Greek letters and diacritics. Normalisation to NFC produces these precomposed code points from the equivalent decomposed sequences.

Ὰὰ␣Ἀἀ␣Ἁἁ␣Ἂἂ␣Ἃἃ␣Ἄἄ␣Ἅἅ␣Ἆἆ␣Ἇἇ␣ᾲ␣ᾴ␣ᾶ␣ᾷ␣Ᾰᾰ␣Ᾱᾱ␣ᾼᾳ␣ᾈᾀ␣ᾉᾁ␣ᾊᾂ␣ᾋᾃ␣ᾌᾄ␣ᾍᾅ␣ᾎᾆ␣ᾏᾇ␣Ὲὲ␣Ἐἐ␣Ἑἑ␣Ἒἒ␣Ἓἓ␣Ἔἔ␣Ἕἕ␣Ὴὴ␣Ἠἠ␣Ἡἡ␣Ἢἢ␣Ἣἣ␣Ἤἤ␣Ἥἥ␣Ἦἦ␣Ἧἧ␣ῆ␣ῌῃ␣ῂ␣ῄ␣ῇ␣ᾘᾐ␣ᾙᾑ␣ᾚᾒ␣ᾛᾓ␣ᾜᾔ␣ᾝᾕ␣ᾞᾖ␣ᾟᾗ␣Ὶὶ␣Ἰἰ␣Ἱἱ␣Ἲἲ␣Ἳἳ␣Ἴἴ␣Ἵἵ␣Ἶἶ␣Ἷἷ␣Ῐῐ␣Ῑῑ␣ῒ␣ῖ␣ῗ␣Ὸὸ␣Ὀὀ␣Ὁὁ␣Ὂὂ␣Ὃὃ␣Ὄὄ␣Ὅὅ␣Ὺὺ␣ὐ␣ὒ␣ὔ␣ὖ␣Ὑὑ␣Ὓὓ␣Ὕὕ␣Ὗὗ␣Ῠῠ␣Ῡῡ␣ῢ␣ῦ␣ῧ␣Ὼὼ␣Ὠὠ␣Ὡὡ␣Ὢὢ␣Ὣὣ␣Ὤὤ␣Ὥὥ␣Ὦὦ␣Ὧὧ␣ῼῳ␣ᾨᾠ␣ᾩᾡ␣ᾪᾢ␣ᾫᾣ␣ᾬᾤ␣ᾭᾥ␣ᾮᾦ␣ᾯᾧ␣ῲ␣ῴ␣ῶ␣ῷ
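When a vowel carries several diacritics, their order in decomposed text matters. Breathing marks and accents share canonical combining class 230, so normalisation never reorders them relative to each other: the sequence that composes is breathing first, then accent. The ypogegrammeni (U+0345) has class 240 and is always moved after them, so its position in the input does not matter. A Python sketch using only `unicodedata`:

```python
import unicodedata

def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)

def codepoints(s: str) -> str:
    return " ".join(f"U+{ord(c):04X}" for c in s)

ALPHA, PSILI, OXIA, YPOGEGRAMMENI = "\u03b1", "\u0313", "\u0301", "\u0345"

# Canonical combining classes: breathing and accent share 230, iota subscript is 240.
for mark in (PSILI, OXIA, YPOGEGRAMMENI):
    print(codepoints(mark), unicodedata.combining(mark))

# Breathing then accent composes fully to ἄ.
print(codepoints(nfc(ALPHA + PSILI + OXIA)))                  # U+1F04
# Accent then breathing is not canonically equivalent, and only partly composes.
print(codepoints(nfc(ALPHA + OXIA + PSILI)))                  # U+03AC U+0313
# The iota subscript is reordered after the other marks, so both give ᾄ.
print(codepoints(nfc(ALPHA + YPOGEGRAMMENI + PSILI + OXIA)))  # U+1F84
print(codepoints(nfc(ALPHA + PSILI + OXIA + YPOGEGRAMMENI)))  # U+1F84
```

Input methods and converters that build polytonic text from separate marks therefore need to emit the breathing before the accent if they want the precomposed forms listed above.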

In addition to the characters just listed, there is a set of characters that duplicate characters used in monotonic Greek, but with oxia instead of tonos in the character name. These probably shouldn't be used, since they normalise to characters in the main Greek block (which don't get converted back to these characters).

Άά␣Έέ␣Ήή␣Ίί␣ΐ␣Όό␣Ύύ␣ΰ␣Ώώ
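The one-way mapping can be seen with a short Python sketch over the lowercase oxia characters (standard `unicodedata` only): each normalises to its tonos counterpart in the Greek and Coptic block, and no normalisation form produces the oxia character again.

```python
import unicodedata

# GREEK SMALL LETTER ... WITH OXIA (Greek Extended) -> ... WITH TONOS (Greek and Coptic)
OXIA_TO_TONOS = {
    "\u1f71": "\u03ac",  # alpha
    "\u1f73": "\u03ad",  # epsilon
    "\u1f75": "\u03ae",  # eta
    "\u1f77": "\u03af",  # iota
    "\u1f79": "\u03cc",  # omicron
    "\u1f7b": "\u03cd",  # upsilon
    "\u1f7d": "\u03ce",  # omega
}

for oxia, tonos in OXIA_TO_TONOS.items():
    normalised = unicodedata.normalize("NFC", oxia)
    print(unicodedata.name(oxia), "->", unicodedata.name(normalised), normalised == tonos)

# The mapping is one-way: no normalisation form turns U+03AC back into U+1F71.
print(all(unicodedata.normalize(form, "\u03ac") != "\u1f71"
          for form in ("NFC", "NFD", "NFKC", "NFKD")))  # True
```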

The Greek Extended Unicode block also contains a set of spacing diacritics. These should be used only for educational purposes. Note that those used in monotonic Greek normalise to different modifier characters, so if they are used, care is needed to ensure that normalisation doesn't take place.

´␣῾␣᾽␣ι␣᾿␣῀␣῁␣῍␣῎␣῏␣῝␣῞␣῟␣῭␣΅␣`
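To see which characters are affected, a Python sketch (standard `unicodedata` only) can normalise a few of these spacing diacritics and print what they become. The three chosen here, GREEK OXIA, GREEK VARIA and GREEK DIALYTIKA AND OXIA, have canonical singleton decompositions, so even NFC replaces them with characters from other blocks.

```python
import unicodedata

# U+1FFD GREEK OXIA, U+1FEF GREEK VARIA, U+1FEE GREEK DIALYTIKA AND OXIA
SPACING = ["\u1ffd", "\u1fef", "\u1fee"]

for ch in SPACING:
    normalised = unicodedata.normalize("NFC", ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> "
          f"U+{ord(normalised):04X} {unicodedata.name(normalised)}")
```

The expected replacements are U+00B4 ACUTE ACCENT, U+0060 GRAVE ACCENT and U+0385 GREEK DIALYTIKA TONOS respectively, which is why text that deliberately uses the Greek Extended forms must be kept away from normalising processes.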

Consonants

Consonants used in modern Greek.

Ββ␣Γγ␣Δδ␣Ζζ␣Θθ␣Κκ␣Λλ␣Μμ␣Νν␣Ξξ␣Ππ␣Ρρ␣Σςσ␣Ττ␣Φφ␣Χχ␣Ψψ

ς [U+03C2 GREEK SMALL LETTER FINAL SIGMA] is a word-final form of σ [U+03C3 GREEK SMALL LETTER SIGMA]. Due to legacy implementations, Unicode has a separate code point for this glyph (see Context-based shaping below).

Accented rho

Although not a vowel, an initial letter ρ [U+03C1 GREEK SMALL LETTER RHO] can also carry a rough breathing mark. When rho is doubled, the first ρ always has a smooth breathing mark and the second a rough one, ie. ῤῥ. [w]

Precomposed characters are available in the Greek Extended block.

ῤ␣Ῥῥ

Other letters

Apart from the letters mentioned so far, the Unicode Greek block contains a significant number of other characters with the general category property of letter. (This list does not include those characters with COPTIC in their name.)

The following are described by Unicode as archaic.

Ͱ␣ͱ␣Ͳ␣ͳ␣Ͷ␣ͷ␣Ϙ␣ϙ␣Ϛ␣ϛ␣Ϝ␣ϝ␣Ϟ␣ϟ␣Ϡ␣ϡ␣Ϸ␣ϸ␣Ϻ␣ϻ

A number are described as variant letterforms.

Ϗ␣ϐ␣ϑ␣ϒ␣ϓ␣ϔ␣ϕ␣ϖ␣ϗ␣ϰ␣ϱ␣ϲ␣ϴ␣ϵ␣Ϲ

The rest are an assortment of signs, additional letters, editorial symbols, etc.

ʹ␣ͺ␣ι␣ͻ␣ͼ␣ͽ␣Ϳ␣ϳ␣ϼ␣Ͻ␣Ͼ␣Ͽ

Combining marks

The Unicode Greek block contains no combining marks, but in decomposed text combining characters from the common blocks may appear.

́␣̈
̀␣̄␣̆␣̓␣̔␣͂␣ͅ

Punctuation

Modern Greek uses standard western punctuation, including the following.

«␣»␣;␣·

See Phrase boundaries below for more information.

The Unicode Greek block contains only 2 punctuation marks, and neither is recommended for use.

;␣·

; [U+037E GREEK QUESTION MARK] was originally intended to represent the Greek question mark, but Unicode recommends using ; [U+003B SEMICOLON] instead. During normalisation this character is changed to the ASCII semicolon.

The other punctuation mark in the Greek block is · [U+0387 GREEK ANO TELEIA]. During normalisation, this is changed to · [U+00B7 MIDDLE DOT].
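Both replacements happen under every normalisation form, including NFC, as this Python sketch (standard `unicodedata` only) shows. The sample question is illustrative.

```python
import unicodedata

GREEK_QUESTION_MARK = "\u037e"  # U+037E GREEK QUESTION MARK
GREEK_ANO_TELEIA = "\u0387"     # U+0387 GREEK ANO TELEIA

text = "Τι κάνεις" + GREEK_QUESTION_MARK
nfc = unicodedata.normalize("NFC", text)
print(nfc.endswith(";"))  # True: U+037E became U+003B SEMICOLON
print(unicodedata.normalize("NFC", GREEK_ANO_TELEIA) == "\u00b7")  # True: U+00B7 MIDDLE DOT
```

Because of this, software that must preserve U+037E or U+0387 exactly cannot normalise the text; in practice it is simpler to use U+003B and U+00B7 from the start.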

Symbols

The spacing diacritics in the Greek Extended Unicode block, listed above under Polytonic vowel diacritics, have the general category of symbol. As noted there, they should be used only for educational purposes, and those used in monotonic Greek normalise to different modifier characters.

´␣῾␣᾽␣ι␣᾿␣῀␣῁␣῍␣῎␣῏␣῝␣῞␣῟␣῭␣΅␣`

The basic Greek block contains 4 more symbols, which also do not appear to be used in modern Greek text.

͵␣΄␣΅␣϶

Glyph shaping & positioning

Context-based shaping

The letter sigma in Greek varies in shape, depending on whether it appears in the middle or at the end of a word.

κόσμος

Two different shapes for sigma, depending on word position.

However, this shaping is not done by rendering rules. Unicode has two separate code points for the lowercase σ [U+03C3 GREEK SMALL LETTER SIGMA] and ς [U+03C2 GREEK SMALL LETTER FINAL SIGMA], and the standard keyboard has separate keys for them.
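Since the choice between σ and ς is made in the text itself, a tool that receives text typed with σ everywhere has to fix it up explicitly. The Python sketch below is a hypothetical, simplified helper: it treats a σ as word-final when it follows a letter and is not followed by one, using `\w` from the standard `re` module. It ignores subtler cases that the Unicode Final_Sigma condition handles, such as intervening apostrophes.

```python
import re
import unicodedata

# A lowercase σ (U+03C3) that follows a letter and is not followed by one.
_WORD_FINAL_SIGMA = re.compile(r"(?<=\w)\u03c3(?!\w)")

def apply_final_sigma(text: str) -> str:
    """Hypothetical helper: replace word-final σ (U+03C3) with ς (U+03C2)."""
    # Compose accented letters first so that \w sees each one as a single letter.
    text = unicodedata.normalize("NFC", text)
    return _WORD_FINAL_SIGMA.sub("\u03c2", text)

print(apply_final_sigma("ο κοσμοσ"))  # ο κοσμος
print(apply_final_sigma("σασ"))       # σας
```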

Case conversion

There are different rules around the use of accents with uppercase Greek letters, depending on whether the context is ALL-CAPS or Titlecase. The following description focuses on modern, monotonic Greek.

The tonos is retained only in titlecase, ie. when just the first letter of a word is capitalised and that letter is a vowel with tonos. When the whole word is capitalised, the tonos is dropped, eg. Έλληνας (Éllinas, ‘Greek’), but ΕΛΛΗΝΑΣ.

The dialytika, on the other hand, is never dropped. A letter with both tonos and dialytika above drops the tonos but keeps the dialytika, eg. ευφυΐα (eyfyḯa, ‘intelligence’) becomes ΕΥΦΥΪΑ.

There are, however, some additional rules.

In all-caps, a vowel pair that would otherwise read as a diphthong but has a tonos over its first vowel (showing that the vowels are pronounced separately) loses the tonos but gains a dialytika over the second vowel, eg. νεράιδα (neráıða, ‘fairy’) becomes ΝΕΡΑΪΔΑ.

Also, all-caps Greek does not drop the tonos on the disjunctive eta (usually meaning ‘or’), eg. ήσουν ή εγώ ή εσύ becomes ΗΣΟΥΝ Ή ΕΓΩ Ή ΕΣΥ (note that the initial eta is not disjunctive, and so does drop the tonos). This keeps the disjunctive ή (‘either/or’) distinct from η, the nominative singular feminine form of the definite article.
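The all-caps rules above can be sketched in Python on top of decomposed (NFD) text, where tonos (U+0301) and dialytika (U+0308) are separate code points that are easy to drop or add. This is a hypothetical, simplified implementation, not a library API: it splits words only on whitespace (so an ή with punctuation attached is not recognised as disjunctive), handles only the eight vowel pairs listed under Diphthongs & digraphs, and does not try to cover every edge case of Greek orthography.

```python
import re
import unicodedata

TONOS = "\u0301"      # COMBINING ACUTE ACCENT (tonos in NFD text)
DIALYTIKA = "\u0308"  # COMBINING DIAERESIS (dialytika in NFD text)
# αι ει οι υι αυ ευ ηυ ου
DIPHTHONGS = {"\u03b1\u03b9", "\u03b5\u03b9", "\u03bf\u03b9", "\u03c5\u03b9",
              "\u03b1\u03c5", "\u03b5\u03c5", "\u03b7\u03c5", "\u03bf\u03c5"}
DISJUNCTIVE_ETA = {"\u03ae", "\u0389"}  # ή, Ή

def _all_caps_word(word: str) -> str:
    if unicodedata.normalize("NFC", word) in DISJUNCTIVE_ETA:
        return "\u0389"  # the disjunctive ή keeps its tonos: Ή
    # Group the decomposed word into [base letter, [combining marks]] clusters.
    clusters = []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and clusters:
            clusters[-1][1].append(ch)
        else:
            clusters.append([ch, []])
    for k, (base, marks) in enumerate(clusters):
        if TONOS not in marks:
            continue
        marks.remove(TONOS)  # all-caps drops the tonos; any dialytika stays
        if k + 1 < len(clusters):
            next_base, next_marks = clusters[k + 1]
            # Tonos on the first vowel of a would-be diphthong becomes
            # a dialytika on the second vowel.
            if (base + next_base).lower() in DIPHTHONGS and DIALYTIKA not in next_marks:
                next_marks.append(DIALYTIKA)
    caps = "".join(base.upper() + "".join(marks) for base, marks in clusters)
    return unicodedata.normalize("NFC", caps)

def greek_all_caps(text: str) -> str:
    """Convert monotonic Greek text to all-caps (simplified sketch)."""
    # Keep the whitespace separators so the original spacing survives.
    return "".join(_all_caps_word(part) for part in re.split(r"(\s+)", text))
```

For example, `greek_all_caps("νεράιδα")` should give ΝΕΡΑΪΔΑ, and `greek_all_caps("ήσουν ή εγώ ή εσύ")` should give ΗΣΟΥΝ Ή ΕΓΩ Ή ΕΣΥ. Note that the built-in `str.upper()` alone would keep every tonos.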

The consequences of these rules are that:

  1. it is relatively straightforward to convert lowercase Greek letters to uppercase, but all-caps conversion involves more than mapping each letter to its uppercase code point.
  2. all-caps uppercase letters cannot be easily transformed to lowercase because only context determines whether the conversion should insert tonos marks. For example, does ΑΘΗΝΑ convert to Αθηνά (the goddess) or Αθήνα (capital of Greece)?

When lowercasing, Greek converts uppercase sigma to either a final or non-final form, depending on its position in the word, eg. ΟΔΥΣΣΕΥΣ becomes οδυσσευς. This contextual difference is easy to manage, however, compared to the lexical issues described in the list above.
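A quick check in Python illustrates both points: the built-in `str.lower()` follows the Unicode default case algorithm, which includes the Final_Sigma condition, so it handles sigma automatically, but it has no way to restore the tonos that all-caps text dropped. For caseless matching, `str.casefold()` maps both σ and ς to σ.

```python
# str.lower() applies the Unicode Final_Sigma condition,
# so Σ becomes ς at the end of a word and σ elsewhere.
print("ΟΔΥΣΣΕΥΣ".lower())  # οδυσσευς

# Lowercasing cannot restore a tonos that all-caps text dropped.
print("ΑΘΗΝΑ".lower())  # αθηνα (neither Αθηνά nor Αθήνα)

# For caseless matching, casefold() maps both σ and ς to σ.
print("ΟΔΥΣΣΕΥΣ".casefold() == "οδυσσευς".casefold())  # True
```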

Structural boundaries & markers

Word boundaries

The concept of 'word' is difficult to define in any language (see What is a word?). Here, a word is a vaguely-defined, but recognisable semantic unit that is typically smaller than a phrase and may comprise one or more syllables.

Words are separated by spaces.

Phrase boundaries

Greek uses standard Latin punctuation, except that it uses the semicolon character (;) as its question mark.

There is a Greek question mark character in the Unicode Greek and Coptic block, but the Unicode Standard recommends using the ASCII semicolon instead.

Line & paragraph layout

Text alignment & justification

Justification is done principally by adjusting the space between words.


Όλοι είναι ίσοι απέναντι στον νόμο και έχουν δικαίωμα σε ίση προστασία του νόμου, χωρίς καμία απολύτως διάκριση. Όλοι έχουν δικαίωμα σε ίση προστασία από κάθε διάκριση που θα παραβίαζε την παρούσα Διακήρυξη και από κάθε πρόκληση για μια τέτοια δυσμενή διάκριση.

TBD

Further information needed for this section includes:

Glyph shaping & positioning
    Cursive text
    Context-based shaping
    Multiple combining characters
    Context-based positioning
    Transforming characters

Structural boundaries & markers
    Grapheme, word & phrase boundaries
    Hyphens & dashes
    Bracketing information
    Quotations
    Abbreviations, ellipsis, & repetition
    Emphasis & highlights
    Inline notes & annotations

Inline layout
    Inline text spacing
    Bidirectional text

Line & paragraph layout
    Text direction
    Line breaking
    Hyphenation
    Text alignment & justification
    Counters, lists, etc.
    Styling initials
    Baselines & inline alignment

Page & book layout
    General page layout & progression
    Directional layout features
    Grids & tables
    Notes, footnotes, etc.
    Forms & user interaction
    Page numbering, running headers, etc.

References

  1. [u] The Unicode Standard v11.0, Greek, esp. pp. 303-309.
  2. [D] Peter T. Daniels and William Bright, The World's Writing Systems, Oxford University Press, ISBN 0-19-507993-0, pp. 559-563.
  3. [w] Wikipedia, Greek alphabet.
  4. [S] Scriptsource, Greek.
Last changed 2019-04-16 7:17 GMT.  •  Licence CC-By © r12a.