Getting Started with sumer

1. Introduction

The sumer package provides tools for translating and analyzing Sumerian cuneiform texts. It converts between different text representations, offers dictionary lookup, and includes an interactive translation tool.

Modern scholars typically work with Sumerian texts in transliterated form, that is, a phonetic rendering in Latin characters. For example, the Sumerian word for king is transliterated as lugal. However, translation does not depend on pronunciation. The meaning of a sign depends on the sign itself, not on how it is read aloud. Even when there is reason to believe that words with similar pronunciations have similar meanings, dictionaries can be based solely on the cuneiform characters. The same cuneiform sign can have several different readings (transliterations), but it always has exactly one sign name and one cuneiform character.

library(sumer)
#> Watch for dictionary updates:
#>     https://founder-hypothesis.com/en/sumerian-mythology/downloads/

2. Representations of Cuneiform Signs

Each cuneiform sign has three representations:

Representation   Example  Description
Transliteration  lugal    Phonetic transcription in lowercase letters
Sign name        LUGAL    Canonical name in uppercase letters
Cuneiform        𒈗        Unicode character (U+12000 to U+12500)

The package works internally with cuneiform characters and sign names. Transliteration serves as a convenient input method.

Note on display: Cuneiform characters require a font supporting the Unicode Cuneiform block (U+12000–U+12500). In RStudio, the AGG graphics backend should be enabled (Tools > Global Options > General > Graphics > Backend > AGG).
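
If a sign does not render, it can help to check that the string really contains cuneiform code points and not something else. The following is a minimal base R sketch; in_cuneiform_range() is a hypothetical helper, not a function of the sumer package, and it simply tests the range quoted above.

```r
# Hypothetical helper (not part of sumer): TRUE for each character of x
# whose code point lies in the range U+12000..U+12500 quoted above.
in_cuneiform_range <- function(x) {
  cp <- utf8ToInt(x)
  cp >= 0x12000 & cp <= 0x12500
}

lugal <- intToUtf8(0x12217)     # the sign LUGAL as a one-character string
in_cuneiform_range(lugal)       # TRUE
in_cuneiform_range("lugal")     # FALSE for each of the five Latin letters
```

A TRUE result only says that the code point is in range; whether a glyph is shown still depends on the installed font.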

2.1 Retrieving sign information

The function info() retrieves all available information about a sign or sign sequence:

info("lugal")
#> 𒈗    LUGAL   lugal, lillan, rab3, šarrum
#> 
#> syllables      : lugal
#> sign names     : LUGAL
#> cuneiform text : 𒈗

For compound expressions, all contained signs are analyzed:

info("d-en-lil2")
#> 𒀭    AN  an, d, diĝir, il3, am6, naggax
#> 𒂗    EN  en, in4, ru12, uru16
#> 𒆤    KID ke4, kid, lil2, ge2, gi2
#> 
#> syllables      : d-en-lil2
#> sign names     : AN.EN.KID
#> cuneiform text : 𒀭𒂗𒆤

Each sign (d, en, lil2) is shown with its sign name (AN, EN, KID) and cuneiform character. The alternatives column lists all possible readings; for instance, EN can also be read as ru12 or uru16.

2.2 Conversion between representations

Two functions convert entire texts:

# Transliteration -> Cuneiform
as.cuneiform("lugal-e")
#> 𒈗𒂊
as.cuneiform(c("d-en-lil2", "an-ki"))
#> 𒀭𒂗𒆤
#> 𒀭𒆠

# Transliteration -> Sign names
as.sign_name("lugal-e")
#> LUGAL.E
as.sign_name(c("d-en-lil2", "an-ki"))
#> AN.EN.KID
#> AN.KI

Within a word, hyphens (-) separate syllables; dots (.) separate sign names; spaces separate words.
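
The separator conventions can be made concrete with a small base R sketch. The two helpers below are hypothetical and only illustrate the rule stated above; they are not the package's own parser (the package provides split_sumerian(), whose exact behaviour is not assumed here).

```r
# Split a transliterated line into words (spaces) and syllables (hyphens).
split_translit <- function(x) {
  words <- strsplit(trimws(x), "[[:space:]]+")[[1]]
  lapply(words, function(w) strsplit(w, "-", fixed = TRUE)[[1]])
}

# Split a sign-name string into individual sign names (dots).
split_sign_names <- function(x) strsplit(x, ".", fixed = TRUE)[[1]]

split_translit("d-en-lil2 an-ki")   # list of two words: d/en/lil2 and an/ki
split_sign_names("AN.EN.KID")       # "AN" "EN" "KID"
```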

3. Dictionary Lookup

3.1 Loading a dictionary

The package includes a built-in dictionary:

dic <- read_dictionary()
#>  ###---------------------------------------------------------------
#>  ###                Sumerian Dictionary
#>  ###
#>  ### Author:  Robin Wellmann
#>  ### Year:    2026
#>  ### Version: 0.5
#>  ### Watch for Updates: https://founder-hypothesis.com/en/sumerian-mythology/downloads/
#>  ###---------------------------------------------------------------

The vignette “Translating Sumerian Texts” describes how you can create your own dictionary.

3.2 Forward lookup: Sumerian -> English

look_up("lugal", dic)
#> 
#> ──────────────────────────────────────────────────────────────────────────────────────────
#>  Search: lugal
#> ──────────────────────────────────────────────────────────────────────────────────────────
#> 
#>  Cuneiform:   𒈗
#>  Sign Names:  LUGAL
#> 
#>  ▶ Translations:
#>   [16] S      great one with human body {king}
#>   [ 3] S      kingship
#>   [ 1] S      great one with human body
#> 
#> ──────────────────────────────────────────────────────────────────────────────────────────

The output shows the sign name, cuneiform character, translations with frequency counts and grammatical types, and entries for individual signs and substrings. For compound expressions, all partial combinations are looked up as well:

look_up("d-suen", dic)
#> 
#> ──────────────────────────────────────────────────────────────────────────────────────────
#>  Search: d-suen
#> ──────────────────────────────────────────────────────────────────────────────────────────
#> 
#>  Cuneiform:   𒀭𒂗𒍪
#>  Sign Names:  AN.EN.ZU
#> 
#>  ▶ Translations:
#>   [ 8] S      god of heaven who is a cultural leader with knowledge {god Suen}
#>   [ 1] S      divine cultural leader with knowledge
#> 
#>  ▶ Individual Signs / Substrings:
#> 
#>   AN  𒀭
#>   [29] S      god of heaven
#>   [18] ☒S→S   divine S
#>   [5]  S      sky/heaven
#>   [1]  A☒→A   A by divine intervention
#>   [1]  S      divine one
#> 
#>   EN  𒂗
#>   [17] S      cultural leader
#>   [4]  S☒→A   who acts as S
#>   [1]  S      cultural leadership
#> 
#>   ZU  𒍪
#>   [1]  S      knowledge
#>   [1]  ☒S→A   with knowledge about S
#> 
#> ──────────────────────────────────────────────────────────────────────────────────────────

3.3 Reverse lookup: English -> Sumerian

To find the Sumerian sign for an English term, use lang = "en":

look_up("Enki", dic, "en")
#> 
#> ──────────────────────────────────────────────────────────────────────────────────────────
#>  Search: Enki
#> ──────────────────────────────────────────────────────────────────────────────────────────
#> 
#>  ▶ Matches for 'Enki':
#> 
#>   AN.EN.KI  𒀭𒂗𒆠
#>   [1] S      god of heaven who is the cultural leader of the earth {god Enki}
#> 
#>   AN.ZA.GUD×KUR.BI.I.A  𒀭𒍝𒄠𒁉𒄿𒀀
#>   [1] S      divine radiance of the strong invincible one, the raw material of the
#>              god Enki, the god of heaven who is the cultural leader of the earth
#>              {Zambija}
#> 
#>   I.A  𒄿𒀀
#>   [1] S      life force with transformative power {god Enki}
#> 
#> ──────────────────────────────────────────────────────────────────────────────────────────

The reverse lookup searches all translations and displays matching entries with their sign names and cuneiform characters.

4. The Type System

Each dictionary entry has a grammatical type in addition to its translation. These types describe the function of a sign in a sentence. Since the same sign can serve different functions depending on context, it may have multiple entries with different types.

4.1 Basic types

There are three basic types:

Type  Name         Description                    Example
S     Substantive  Noun phrases and substantives  “king”, “Earth”
V     Verb         Verbs and verbal expressions   “create”, “go”
A     Attribute    Modifying clauses              “who is strong”

You can see the different types of a sign with look_up():

look_up("an", dic)
#> 
#> ──────────────────────────────────────────────────────────────────────────────────────────
#>  Search: an
#> ──────────────────────────────────────────────────────────────────────────────────────────
#> 
#>  Cuneiform:   𒀭
#>  Sign Names:  AN
#> 
#>  ▶ Translations:
#>   [29] S      god of heaven
#>   [18] ☒S→S   divine S
#>   [ 5] S      sky/heaven
#>   [ 1] A☒→A   A by divine intervention
#>   [ 1] S      divine one
#> 
#> ──────────────────────────────────────────────────────────────────────────────────────────

The sign AN appears both as a noun (S: “sky/heaven”) and as an operator that transforms other expressions, which brings us to the next topic.

4.2 Operators

In addition to basic types, there are operators. An operator takes one or two expressions of a certain type as arguments and produces an expression of a (possibly different) type. The notation describes where the arguments stand and what type is produced:

Notation  Alternative notation  Meaning
Sx->V     S☒→V                  Takes an S to the left, produces a V
xS->S     ☒S→S                  Takes an S to the right, produces an S
Sx->A     S☒→A                  Takes an S to the left, produces an A
Sx->S     S☒→S                  Takes an S to the left, produces an S
xV->V     ☒V→V                  Takes a V to the right, produces a V
Vx->V     V☒→V                  Takes a V to the left, produces a V
SSx->V    SS☒→V                 Takes two S to the left, produces a V

The x marks the position of the operator itself. In Sx->V, the argument S stands to the left of the operator; in xS->S, the argument stands to the right. Some operators with two arguments (like SSx->V) take both arguments from the same side. The symbols x and -> have the alternative Unicode representations ☒ and → that may appear in dictionary entries and line files.
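
Since both notations may occur side by side, it can be convenient to normalise type strings before comparing them. The sketch below uses only base R; to_unicode() and to_ascii() are hypothetical helper names, and the approach assumes that a lowercase x occurs in a type string only as the operator marker (type letters are uppercase).

```r
# Convert between the ASCII notation (x, ->) and the Unicode notation.
to_unicode <- function(type) {
  type <- gsub("->", "\u2192", type, fixed = TRUE)   # -> becomes the arrow
  gsub("x", "\u2612", type, fixed = TRUE)            # x becomes the boxed x
}
to_ascii <- function(type) {
  type <- gsub("\u2192", "->", type, fixed = TRUE)
  gsub("\u2612", "x", type, fixed = TRUE)
}

types <- c("Sx->V", "xS->S", "SSx->V")
to_unicode(types)
to_ascii(to_unicode(types))   # returns the original ASCII strings
```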

In translations, the placeholder S (or V) stands for the argument. For example, an operator xS->S with the translation “community of S” means: take the noun to the right of this sign and insert it where the S placeholder stands. For operators with two arguments of the same type, the placeholders are numbered: S1 and S2.

Let us trace through a concrete example. Consider the expression un-ma-gi from “Enki and the World Order” (line 16), which consists of three signs:

Syllable  Cuneiform Sign  Type   Translation
un        𒌧               ☒S→S   community of S
ma        𒈠               S      container
gi        𒄀               S☒→S   the permanent S

We build up the noun phrase step by step:

  1. 𒈠 is a simple noun (S): “container”.
  2. 𒄀 is an operator S☒→S that takes the S to its left. It binds 𒈠 and replaces the placeholder: “the permanent container”. The result is again an S.
  3. 𒌧 is an operator ☒S→S that takes the S to its right. It binds the result from step 2: “community of the permanent container”. The result is again an S.

Note the order of application: the operator whose argument stands to its left (𒄀) is applied before the operator whose argument stands to its right (𒌧), so 𒌧 takes the complete phrase “the permanent container” as its argument. The final noun phrase is “community of the permanent container” (type S). In this context, “community” refers to the people and “permanent container” refers to the land of Sumer. The expression un-ma-gi thus means “the people of Sumer”. In the dictionary, such context-dependent meanings are recorded using the notation “literal meaning {specific meaning}”. The dictionary entry for this expression would be “community of the permanent container {people of Sumer}”. The literal meaning documents the compositional structure, while the specific meaning in curly braces gives a contextual interpretation.
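
The three steps can be mimicked with plain string substitution. This is only an illustrative base R sketch: apply_operator() is a hypothetical helper that replaces the stand-alone placeholder in an operator translation, and it says nothing about how the package composes entries internally.

```r
# Hypothetical helper: insert an argument into an operator translation by
# replacing the stand-alone placeholder (by default "S").
apply_operator <- function(template, argument, placeholder = "S") {
  sub(paste0("\\b", placeholder, "\\b"), argument, template, perl = TRUE)
}

step1 <- "container"                                # ma: simple noun (S)
step2 <- apply_operator("the permanent S", step1)   # gi takes the S to its left
step3 <- apply_operator("community of S", step2)    # un takes the S to its right
step3
#> [1] "community of the permanent container"
```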

4.3 Verb types

A simple verb (type V) stands on its own, for example “to be used as a resource”. It combines directly with a subject noun phrase (S + V -> SEN) to form a sentence.

Many Sumerian verbs, however, take a noun phrase as their object. The type Vt describes such a transitive verb: it takes an S as its object and produces a complex intransitive verb (V). The translation contains an S placeholder for the object, for example “to equip S”. The resulting V must then be combined with a subject (S) to form a complete sentence (SEN). Vt is a generalization of Sx->V that also works correctly when the verb has prefixes or suffixes.

In Sumerian, verbs are often preceded by verb prefixes: signs that modify the verb's meaning (expressing modality, aspect, or other nuances). A verb prefix has the type ☒V→V: it takes a verb to its right and produces a modified verb. Since each prefix produces a V, multiple prefixes can chain together. They bind from right to left, wrapping around the core verb like layers. Conversely, a verb suffix has the type V☒→V and binds from left to right. Prefixes and suffixes can co-occur.

Consider the verb chain gan-ig-la from line 8 of “Enki and the World Order”:

Syllable  Cuneiform Sign  Type   Translation
gan       𒃶               ☒V→V   may V
ig        𒅅               ☒V→V   V with the task of establishing sustenance of human existence
la        𒆷               Vt     to equip S

The verb builds up from the core outward:

𒃶          𒅅                         𒆷
☒V→V         ☒V→V                        Vt
"may V"      "V with the task ..."        "to equip S"
              └────────────────────────────┘
              𒅅(𒆷) = "equip S with the task ..."
   └──────────────────────┘
𒃶(𒅅(𒆷)) = "may equip S with the task ..."

𒆷 (Vt: “to equip S”) is the core verb. 𒅅 (☒V→V) wraps it with additional meaning. 𒃶 (☒V→V) adds modality. The final composed verb is: “may equip S with the task of establishing sustenance of human existence” (Vt). This complex verb still has its S placeholder, which will be filled when the verb meets its object during sentence composition.
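
The right-to-left wrapping of prefixes corresponds to a fold over the prefix list. The following base R sketch reproduces the composed verb of this example; fill_v() is a hypothetical helper, and dropping the leading “to” of the core verb before substitution is an assumption read off the diagram above, not a documented package rule.

```r
prefixes <- c(
  gan = "may V",
  ig  = "V with the task of establishing sustenance of human existence"
)
core <- "to equip S"   # la, type Vt

# Hypothetical helper: put a verb phrase where the stand-alone V stands.
fill_v <- function(template, verb) sub("\\bV\\b", verb, template, perl = TRUE)

# Drop the infinitive marker, then apply the prefixes from right to left:
# first ig wraps the core verb, then gan wraps the result.
composed <- Reduce(
  function(verb, template) fill_v(template, verb),
  rev(unname(prefixes)),
  sub("^to ", "", core)
)
composed
#> [1] "may equip S with the task of establishing sustenance of human existence"
```

The S placeholder survives the fold unchanged, which matches the statement that the composed verb is still of type Vt.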

4.4 Composition rules

When two expressions stand side by side and neither is an operator, they are combined by composition rules:

Left  Right  Result  Translation pattern
S     S      S       “X of/with Y”
S     A      S       “X Y” (juxtaposition)
S     V      SEN     “X Y” (subject + verb, “to” stripped)
SEN   SEN    SEN     “X. Y” (sentences joined with period)

The type SEN stands for a complete sentence. It arises when a noun phrase (S) meets a verb (V). For example, if “separated groups” (S) is followed by “to be created” (V), the composition strips the “to” and produces the sentence “Separated groups be created.” (SEN). This raw composition must then be finished by hand; in this case, the verb form is adjusted to produce “Separated groups are created.”
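
The table and the S + V example can be written down as a small lookup plus one string operation. This is a base R sketch under stated assumptions: compose_type() and compose_sentence() are hypothetical names, and capitalising the first letter and appending a period are inferred from the example sentence above.

```r
# Result types for two adjacent non-operator expressions (table above).
rules <- c("S S" = "S", "S A" = "S", "S V" = "SEN", "SEN SEN" = "SEN")
compose_type <- function(left, right) unname(rules[paste(left, right)])

# S + V: strip the leading "to", join, capitalise, and add a period.
compose_sentence <- function(subject, verb) {
  raw <- paste(subject, sub("^to ", "", verb))
  paste0(toupper(substring(raw, 1, 1)), substring(raw, 2), ".")
}

compose_type("S", "V")
#> [1] "SEN"
compose_sentence("separated groups", "to be created")
#> [1] "Separated groups be created."
```

Pairs that are not in the table (for example V followed by S) yield NA, which signals that no composition rule applies.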

These rules, together with the operator types from the previous sections, are sufficient to translate entire Sumerian sentences from their individual parts. Because signs in Old Sumerian texts encode phrases rather than syllables, this implies that the Old Sumerian language is actually not a natural language. It is a formal language that can be pronounced in any natural language that is sufficiently complex.

5. Text Analysis

5.1 N-gram analysis

A good first step when working with a new text is to search for frequently recurring sign combinations (n-grams). Such patterns are valuable clues: if a certain sequence of cuneiform signs appears repeatedly, it is likely a fixed term, a compound word, or an idiomatic expression.

The package includes the example text “Enki and the World Order”:

path <- system.file("extdata", "project", "enki_and_the_world_order.txt", package = "sumer")
text <- readLines(path, encoding = "UTF-8")

The function ngram_frequencies() finds recurring combinations:

freq <- ngram_frequencies(text, min_freq = c(6, 4, 2))
head(freq, 10)
#>    frequency length          combination
#> 1          2     20 ๐’€ญ๐’‚—๐’† ๐’ˆ—๐’ช๐’€Š๐’†ค๐’…Ž๐’ƒฒ๐’ˆพ๐’†ณ๐’†ช๐’ฒ๐’ฃ๐’‰ˆ๐’Œ‹๐’Œ‹๐’Œ‹๐’ˆพ๐’‚Š
#> 2          2     16     ๐’€ญ๐’‚—๐’†ค๐’ˆ—๐’†ณ๐’†ณ๐’Š๐’Š๐’‚—๐’†ค๐’† ๐’‚ ๐’ƒถ๐’ˆพ๐’€Š๐’บ
#> 3          2     15      ๐’€Š๐’ˆฌ๐’Œง๐’ƒป๐’€Š๐’†ฌ๐’‚ต๐’€€๐’€ญ๐’Šฎ๐’‰๐’ƒด๐’†“๐’€€๐’€ญ
#> 4          2     15      ๐’€€๐’ˆพ๐’€€๐’Š๐’€ญ๐’‡ฒ๐’€€๐’ˆพ๐’€€๐’Š๐’€Š๐’ˆญ๐’‚Š๐’‰ˆ๐’‚—
#> 5          3     14       ๐’‚๐’†ณ๐’Š‘๐’‚๐’€ญ๐’‚—๐’†ค๐’‡ฒ๐’†ค๐’ƒป๐’……๐’†ท๐’‰†๐’‹›
#> 6          2     14       ๐’‰ฃ๐’† ๐’† ๐’†ฌ๐’† ๐’†—๐’†—๐’†ท๐’€ธ๐’ˆจ๐’ˆค๐’‹—๐’‹พ๐’€€
#> 7          2     14       ๐’ˆฅ๐’Œ…๐’ˆง๐’€ฒ๐’Š•๐’‚Š๐’Œ‹๐’Œ‹๐’Œ‹๐’ˆฌ๐’‰Œ๐’‰บ๐’„ธ๐’บ
#> 8          2     12         ๐’†ค๐’ˆฌ๐’Œง๐’•๐’„พ๐’‚—๐’†ค๐’† ๐’…—๐’‰Œ๐’€€๐’€ญ
#> 9         11     10           ๐’€ญ๐’‚—๐’† ๐’†ค๐’ ๐’€๐’‰†๐’ˆช๐’…”๐’บ
#> 10         2     10           ๐’…†๐’…Ž๐’ˆ ๐’€ญ๐’ป๐’ป๐’„€๐’Œ‹๐’Œ‹๐’Œ‹

The min_freq parameter controls the minimum frequency for different n-gram lengths. The default value c(6, 4, 2) means: single signs must occur at least 6 times, pairs at least 4 times, and all longer combinations at least 2 times.

The analysis works from the longest combinations down to the shortest. When a long combination is identified as frequent, duplicated occurrences are masked so that shorter sub-combinations are not falsely counted as frequent just because they are part of the longer combination.
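
To make the idea of counting sign combinations concrete, here is a deliberately simplified base R sketch. It is not the algorithm of ngram_frequencies(): count_ngrams() is a hypothetical helper, it works on dot-separated sign names rather than cuneiform text, and it does not mask occurrences that are already covered by a longer frequent combination. The input lines are made up.

```r
# Count all n-grams of length n in lines given as dot-separated sign names.
count_ngrams <- function(lines, n) {
  grams <- unlist(lapply(strsplit(lines, ".", fixed = TRUE), function(s) {
    if (length(s) < n) return(character(0))
    starts <- seq_len(length(s) - n + 1)
    vapply(starts, function(i) paste(s[i:(i + n - 1)], collapse = "."),
           character(1))
  }))
  sort(table(grams), decreasing = TRUE)
}

lines   <- c("AN.EN.KI.LUGAL", "LUGAL.AN.EN.KI", "AN.EN.KID")  # made-up input
bigrams <- count_ngrams(lines, 2)
bigrams[bigrams >= 2]   # keep pairs that occur at least twice
```

In this toy input, AN.EN is counted three times and EN.KI twice. Without masking, a frequent trigram such as AN.EN.KI inflates the counts of both pairs, which is exactly the effect the longest-first procedure described above avoids.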

With mark_ngrams(), the identified patterns are marked in the text with curly braces:

text_marked <- mark_ngrams(text, freq)
cat(text_marked[1:5], sep = "\n")
#> 
#> 1    ๐’‚—๐’ˆค๐’ฒ {๐’€ญ๐’† } ๐’‰ช๐’……๐’…Ž๐’‹ผ๐’ˆพ
#> 2     { {๐’€€ {๐’€€๐’€ญ} ๐’‚—} ๐’† } ๐’„ž๐’ฎ๐’€€๐’Š‘๐’€€๐’„  {๐’ƒฒ๐’‚Š๐’Œ…๐’•} 
#> 3     {๐’Šฉ {๐’…—๐’‚ต} }  {๐’†ณ๐’ƒฒ { { {๐’€ญ๐’‚—} ๐’†ค} ๐’‡ท} }  {๐’† ๐’‰˜}  {๐’€ญ {๐’†ฌ๐’‚ต} } 
#> 4    ๐’ˆ—๐’„‘ {๐’ˆฉ {๐’ช๐’€Š} }  {๐’€€๐’†• {๐’€€๐’†ณ๐’†ณ} ๐’‹ซ๐’…๐’†ท}

You can also search for a specific pattern in the annotated text:

term    <- "IGI.DIB.TU"
pattern <- mark_ngrams(term, freq)
pattern
#> [1] " { {๐’…†๐’ณ} ๐’Œ…} "
result  <- text_marked[grepl(pattern, text_marked, fixed = TRUE)]
cat(result, sep = "\n")
#> 12   ๐’„‹ { {๐’…†๐’ณ} ๐’Œ…} ๐’‡ป๐’…† { { {๐’…†๐’ณ} ๐’Œ…} ๐’•} 
#> 13   ๐’Šพ { {๐’…†๐’ณ} ๐’Œ…} ๐’Šพ๐’‡ { { {๐’…†๐’ณ} ๐’Œ…} ๐’•} 
#> 53    {๐’€ญ๐’‰ก๐’ถ๐’„ท๐’„ญ} ๐’‡‡ {๐’ฃโŸจฤœA2โŸฉ { {๐’Œ“๐’บ} ๐’‰Œ} } ๐’ƒข {๐’ฃ๐’ƒถ { {๐’…†๐’ณ} ๐’Œ…} } 
#> 54   ๐’€–๐’†ฐ { {๐’Œ“๐’บ} ๐’‰Œ} ๐’€ซ {๐’ฃ๐’ƒถ { {๐’…†๐’ณ} ๐’Œ…} } 
#> 55   ๐’š {๐’ฃโŸจฤœA2โŸฉ { {๐’Œ“๐’บ} ๐’‰Œ} } ๐’ˆง {๐’ฃ๐’ƒถ { {๐’…†๐’ณ} ๐’Œ…} } 
#> 80    { { {๐’…†๐’ณ} ๐’Œ…} ๐’•} ๐’Œ‰๐’Š• {๐’€ญ {๐’†ฌ๐’‚ต} }  {๐’ˆจ๐’‚—} 
#> 196  ๐’Œฃ๐’ฃ๐’† ๐’€ญ { {๐’…†๐’ณ} ๐’Œ…} ๐’๐’€ญ๐’ถ๐’‹—๐’‰ก๐’‹ผ๐’‚ท
#> 197   { {๐’ˆ— { {๐’…†๐’ณ} ๐’Œ…} } ๐’ˆฝ๐’ฃ๐’†Ÿ๐’‰ˆ} 
#> 198   { {๐’‚— { {๐’…†๐’ณ} ๐’Œ…} } ๐’Š•๐’ƒž {๐’‚ท๐’‚ท} } 
#> 258   { {๐’€€๐’‡‰} ๐’ˆฆ๐’„˜๐’ƒผ} ๐’„ ๐’ƒฒ๐’ถ๐’Šฎ๐’…Ž๐’„พ { {๐’…†๐’ณ} ๐’Œ…} ๐’€ {๐’ˆฌ๐’‰Œ} โ€ฆ
#> 280  ๐’ƒป๐’†Ÿ๐’•๐’‰Œ { {๐’…†๐’ณ} ๐’Œ…}  {๐’‰ก {๐’Œ“๐’บ} } 
#> 296   {๐’‰ฃ๐’ƒฒ} XXX { {๐’…†๐’ณ} ๐’Œ…} โ€ฆ
#> 298  ๐’€Š๐’ƒป๐’„ญ๐’€Š๐’ƒป { {๐’…†๐’ณ} ๐’Œ…} โ€ฆ
#> 402   {๐’ˆ— { {๐’…†๐’ณ} ๐’Œ…} }  {๐’‚— { {๐’…†๐’ณ} ๐’Œ…} } ๐’‰ {๐’‹—๐’‰Œ๐’€€ { {๐’ƒถ๐’‚—} ๐’……} } 
#> 410   { {๐’ˆ— { {๐’…†๐’ณ} ๐’Œ…} } ๐’ˆฝ๐’ฃ๐’†Ÿ๐’‰ˆ} 
#> 411   { {๐’‚— { {๐’…†๐’ณ} ๐’Œ…} } ๐’Š•๐’ƒž {๐’‚ท๐’‚ท} } ๐’‹—๐’ˆพ { {๐’ƒถ๐’‚—} ๐’……}

5.2 Grammar probabilities

To understand the structure of a sentence, it is helpful to know which grammatical role each individual sign is likely to play. The function sign_grammar() looks up each sign of a string in the dictionary and counts how often it occurs with each grammatical type:

sg  <- sign_grammar("a-ma-ru ba-ur3 ra", dic)

The raw frequencies can be refined into probabilities using a Bayesian model. First, compute the prior distribution of types across all signs in the dictionary:

prior <- prior_probs(dic, sentence_prob = 0.25)

The sentence_prob parameter corrects a systematic bias: if a dictionary was primarily built from noun phrases (rather than complete sentences), verbs are underrepresented. A value of 0.25 means that an estimated 25% of the dictionary entries come from complete sentences. Verb probabilities are then upweighted accordingly.

Next, grammar_probs() computes the posterior probabilities for each sign:

gp <- grammar_probs(sg, prior, dic)

For signs with many dictionary entries, the observed frequencies dominate; for rare signs, the result falls back to the prior distribution. The position of a sign in the sequence is currently not taken into account for calculating probabilities.
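
The interplay between observed frequencies and the prior can be illustrated with a generic smoothing formula. The sketch below is an assumption made for illustration only: it uses a Dirichlet-style pseudo-count model with a weight alpha, and the actual model inside grammar_probs() may differ. The helper posterior_probs() and all numbers are made up.

```r
# Assumed model (not taken from the package): add alpha pseudo-counts that
# are distributed according to the prior, then normalise.
posterior_probs <- function(counts, prior, alpha = 5) {
  (counts + alpha * prior) / (sum(counts) + alpha)
}

prior_demo <- c(S = 0.6, V = 0.3, A = 0.1)   # made-up prior over three types
frequent   <- c(S = 50, V = 2, A = 0)        # sign with many dictionary entries
rare       <- c(S = 0,  V = 1, A = 0)        # sign with a single entry

round(posterior_probs(frequent, prior_demo), 2)
round(posterior_probs(rare, prior_demo), 2)
```

For the frequent sign the result stays close to the observed proportions, while for the rare sign a single V entry moves the distribution only moderately away from the prior.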

The function plot_sign_grammar() presents the results as a stacked bar chart:

plot_sign_grammar(gp, sign_names = TRUE)

Each bar represents a sign position in the sentence. The colours represent grammatical types: green for nouns (S), red shades for verbs (V) and verb operators, blue shades for attribute operators, orange for adjective-like operators (S☒→S), and grey shades for all other operators. A tall bar in a particular colour indicates that the sign likely has that grammatical function.

The chart can also be saved to a file:

plot_sign_grammar(gp, output_file = "grammar.png")

5.3 Grammatical structure of a cuneiform text

Once you have assigned grammatical types to each sign, the function grammatical_structure() shows how the parts are grouped according to the operator binding and composition rules. The output uses typed brackets to indicate the role of each group: () for substantives (S), <> for verbs (V), [] for attributes (A), and {} for sentences (SEN).

Consider the expression mec3-ki-aj2-ga-ce-er ce du:

x <- "mec3-ki-aj2-ga-ce-er-ce-du"
x <- paste0(info(x)$reading, collapse = "-")
x
#> [1] "meš3-ki-aĝ2-ga-še-er-še-du"
expr <- split_sumerian(x)$signs
type <- c("S", "S", "Sx->A", "xS->A", "S", "Sx->S", "S", "Sx->V")

grammatical_structure(x, type, expr)
#> {(((meš3)[(ki)aĝ2])[ga((še)er)])<(še)du>}

The following figure shows the same result with colour coding:

The figure shows that the sentence has the typical structure of an Old Sumerian sentence with the subject (mec3) at its beginning, followed by some specifications of the subject (here in square brackets), followed by the object (ce), and the verb (du) that absorbs the object. This example demonstrates that many Sumerian proper names are self-explanatory. The term “mec3-ki-aj2-ga-ce-er” stands for the proper name “Meskiagasher”, but can also be read as a noun phrase.

This visualization makes the grammatical structure explicit and can help verify that the type assignments produce a sensible grouping.

6. Interactive Translation with translate()

The function translate() opens an interactive Shiny gadget for translating Sumerian text. To demonstrate, we use a fragment from line 16 of “Enki and the World Order”:

x <- as.cuneiform("cag4-kalam-ma-gi-hal. hal-la-gin7.")
result <- translate(x)

This expression contains eight cuneiform signs. Our task is to assign each sign a grammatical type and translation, and then compose them into coherent English sentences.

6.1 Recognizing sentence boundaries

The input actually contains two sentences. You must recognize sentence boundaries yourself, because they are not detected automatically. In general, sentence boundaries follow directly after verbs.

A striking feature of this Old Sumerian text is that duplicated signs often mark sentence boundaries: the left occurrence functions as a verb at the end of one sentence, while the right occurrence functions as a noun at the beginning of the next sentence. In our example, the sign HAL (𒄬) appears twice. The first HAL is a verb (Vt: “to split S into separate groups”) ending the first sentence, while the second HAL is a noun (S: “separated groups”) beginning the second sentence.

The two sentences are:

  1. cag4-kalam-ma-gi-hal (𒊮𒌧𒈠𒄀𒄬): “The central administration splits the people of Sumer into separate groups.”
  2. hal-la-gin7 (𒄬𒆷𒁶): “Places for the separated groups are created.”
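
The duplicated-sign heuristic is easy to try out on a vector of sign names. The base R sketch below only lists candidate positions; boundary_candidates() is a hypothetical helper, and whether a duplicate really marks a sentence boundary remains a judgement you have to make yourself.

```r
# Sign names of the example fragment ("\u0160A3" is the sign name for cag4).
signs <- c("\u0160A3", "UN", "MA", "GI", "HAL", "HAL", "LA", "DIM2")

# Positions i where sign i equals sign i + 1: candidate sentence boundaries.
boundary_candidates <- function(signs) {
  which(head(signs, -1) == tail(signs, -1))
}

cut <- boundary_candidates(signs)
cut
#> [1] 5

# Split after the first candidate: five signs, then three signs.
split(signs, as.integer(seq_along(signs) > cut[1]))
```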

6.2 Structure of the translate gadget

When translate() opens, you see a scrollable page divided into several sections. The gadget is described in more detail in the vignette “Translating Sumerian Texts”.

6.3 Looking up and adopting dictionary entries

When the gadget opens, each sign is pre-filled with its most frequent translation from the dictionary. These suggestions are not always correct; they are simply the entries with the highest count.

Consider the sign gi=GI=𒄀 in our example. The automatic suggestion may show a noun entry (S) if that is the most frequent type for 𒄀 in the dictionary. However, in this context, 𒄀 functions as an adjective operator S☒→S meaning “permanent S”.

To correct this:

  1. Click the green arrow button next to the 𒄀 entry. This opens the dictionary panel below, showing all entries for 𒄀 with their counts and types.
  2. Find the correct entry, in this case S☒→S: “permanent S”.
  3. Click the dictionary row to adopt its type and translation into the skeleton.

If you use multiple dictionaries, the first one has priority for the automatic suggestions. All dictionaries are displayed in the lookup panel, so you can choose from any of them.

6.4 Defining structure with brackets

In the bracket input field (next to the “Update Skeleton” button), you can control how the skeleton is structured by inserting brackets:

Round brackets ( ) group signs into a compound expression. The skeleton will show an entry for the group in addition to entries for its individual signs. Hence, the brackets tell the tool that these signs form a coherent phrase and add a line to the skeleton where its translation can be entered.

Angle brackets < > mark a fixed term (typically a proper name). No individual entries are generated for the signs inside. For instance, <d-en-ki> would be treated as a single unit “Enki” without breaking it into AN, EN, KI.

Curly braces { } mark operator arguments. In most cases this is not necessary, because operators and their arguments are detected automatically. You need to specify operator arguments explicitly with curly braces only when the automatic detection fails, for instance in ambiguous groupings.

After editing the brackets, click “Update Skeleton” to rebuild the template. All previously entered translations are preserved.

6.5 Composing entries with the compose button

Once you have assigned types and translations to the individual signs of a group, you can click the brown compose button (𒃻) next to the parent entry. This automatically combines the children into a composed translation, applying the operator and composition rules described in Section 4.

For example, after filling in the three children of (un-ma-gi):

Syllable  Cuneiform Sign  Type   Translation
un        𒌧               ☒S→S   community of S
ma        𒈠               S      container
gi        𒄀               S☒→S   the permanent S

clicking the compose button on the parent entry produces: type S, translation “community of the permanent container”.

The composed translation often needs manual finishing. In this case, you would edit the translation to add the specific meaning: “community of the permanent container {people of Sumer}”. Other common adjustments include adding articles or conjugating verbs. The compose button provides a starting point that you then refine.

6.6 Result and next steps

When you click “Done”, translate() returns a skeleton object: a character vector containing the completed translation in pipe format. This can be saved as a text file:

result <- translate(x)
writeLines(result, "my_translation.txt")

The saved file serves as input for building a custom dictionary (see Vignette 2).

A completed translation for our example looks like this:

Structure: (𒊮(𒌧𒈠𒄀)𒄬). (𒄬𒆷𒁶).

|cag4-kalam-ma-gi-hal-hal-la-gin7: SEN: The central administration splits
  the people of Sumer into separate groups. Places for the separated
  groups are created.

|cag4-kalam-ma-gi-hal=ŠA3.UN.MA.GI.HAL: SEN: The central administration
  splits the people of Sumer into separate groups.
|   cag4=ŠA3=𒊮: S: center {the central administration}
|   kalam-ma-gi=UN.MA.GI=𒌧𒈠𒄀: S: community of the permanent container {people of Sumer}
|       kalam=UN=𒌧: ☒S→S: community of S
|       ma=MA=𒈠: S: container
|       gi=GI=𒄀: S☒→S: the permanent S
|   hal=HAL=𒄬: Vt: to split S into separate groups

|hal-la-gin7=HAL.LA.DIM2=𒄬𒆷𒁶: SEN: Places for the separated groups
  are created.
|   hal=HAL=𒄬: S: separated groups
|   la=LA=𒆷: S☒→S: place for S
|   gin7=DIM2=𒁶: V: to be created

Each line starting with | is a dictionary entry. The indentation reflects the hierarchical structure: the overall sentence at the top, word groups below, and individual signs at the deepest level.
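
If you want to inspect such a file programmatically, a single-line entry can be taken apart with a regular expression. The sketch below is an assumption-laden illustration in base R: parse_entry() is a hypothetical helper, the field layout (“key: type: translation” after the bar) is read off the listing above, and entries wrapped over several lines are not handled.

```r
# Hypothetical parser for a single-line entry in pipe format.
# Returns NULL for lines that do not start with "|" in the expected layout.
parse_entry <- function(line) {
  pattern <- "^\\|([[:space:]]*)([^:]+): ([^:]+): (.*)$"
  m <- regmatches(line, regexec(pattern, line))[[1]]
  if (length(m) == 0) return(NULL)
  list(depth = nchar(m[2]),        # number of spaces after the bar
       key = m[3],                 # reading=SIGN.NAME=cuneiform
       type = m[4],
       translation = m[5])
}

# "\U0001212C" is the cuneiform sign HAL.
entry <- parse_entry("|   hal=HAL=\U0001212C: Vt: to split S into separate groups")
entry$type          # "Vt"
entry$translation   # "to split S into separate groups"
```

The depth field gives a rough handle on the hierarchy: deeper indentation means the entry belongs to the group above it.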

Learning by example. The package includes an example project with lines 1–31 of “Enki and the World Order” already translated. You can open any of these lines to study the translations and learn how the type system works in practice:

path <- system.file("extdata", package = "sumer")

file.copy(
  from = file.path(path, "project"),
  to   = tempdir(),
  recursive = TRUE
)

ctx <- translation_context(
  line_folder   = file.path(tempdir(), "project/lines"),
  text          = file.path(tempdir(), "project/enki_and_the_world_order.txt"),
  dic           = file.path(path, "sumer-dictionary.txt"),
  sentence_prob = 0.25
)

# Open line 16 to see the full translation of our example
translate_line(16, ctx)

The second vignette (“Translating Sumerian Texts”) describes the complete workflow for translating a document line by line and building a dictionary from the results.