The sumer package provides tools for translating and
analyzing Sumerian cuneiform texts. It converts between different text
representations, offers dictionary lookup, and includes an interactive
translation tool.
Modern scholars typically work with Sumerian texts in
transliterated form: a phonetic rendering in Latin
characters. For example, the Sumerian word for king is transliterated as
lugal. However, translation does not depend on
pronunciation. The meaning of a sign depends on the sign itself, not on
how it is read aloud. Even when there is reason to believe that words
with similar pronunciations have similar meanings, dictionaries can be
based solely on the cuneiform characters. The same cuneiform sign can
have several different readings (transliterations), but it always has
exactly one sign name and one cuneiform character.
Each cuneiform sign has three representations:
| Representation | Example | Description |
|---|---|---|
| Transliteration | lugal | Phonetic transcription in lowercase letters |
| Sign name | LUGAL | Canonical name in uppercase letters |
| Cuneiform | 𒈗 | Unicode character (U+12000 to U+12500) |
The package works internally with cuneiform characters and sign names. Transliteration serves as a convenient input method.
Note on display: Cuneiform characters require a font supporting the Unicode Cuneiform block (U+12000 to U+12500). In RStudio, the AGG graphics backend should be enabled (Tools > Global Options > General > Graphics > Backend > AGG).
The function info() retrieves all available information
about a sign or sign sequence:
info("lugal")
#> 𒈗 LUGAL lugal, lillan, rab3, šarrum
#>
#> syllables : lugal
#> sign names : LUGAL
#> cuneiform text : 𒈗
For compound expressions, all contained signs are analyzed:
info("d-en-lil2")
#> 𒀭 AN an, d, diĝir, il3, am6, naggax
#> 𒂗 EN en, in4, ru12, uru16
#> 𒆤 KID ke4, kid, lil2, ge2, gi2
#>
#> syllables : d-en-lil2
#> sign names : AN.EN.KID
#> cuneiform text : 𒀭𒂗𒆤
Each sign (d, en, lil2) is shown with its sign name (AN, EN, KID), its
cuneiform character, and all of its possible readings. For instance,
EN can also be read as ru12 or uru16.
Two functions convert entire texts:
# Transliteration -> Cuneiform
as.cuneiform("lugal-e")
#> 𒈗𒂊
as.cuneiform(c("d-en-lil2", "an-ki"))
#> 𒀭𒂗𒆤
#> 𒀭𒆠
# Transliteration -> Sign names
as.sign_name("lugal-e")
#> LUGAL.E
as.sign_name(c("d-en-lil2", "an-ki"))
#> AN.EN.KID
#> AN.KI
Within a word, hyphens (-) separate syllables and dots (.) separate
sign names; spaces separate words.
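To make the separator convention concrete, here is a minimal base-R sketch. The helper `split_translit()` is hypothetical and not part of sumer; the package parses transliterations itself (for example with `split_sumerian()`, used later in this vignette), and its rules may differ in detail.

```r
# Hypothetical helper, not part of the sumer package: splits a text into
# words at spaces, and each word into its parts at hyphens (syllables)
# or dots (sign names).
split_translit <- function(x) {
  words <- strsplit(x, " ", fixed = TRUE)[[1]]
  words <- words[nzchar(words)]  # drop empty strings from repeated spaces
  lapply(words, function(w) strsplit(w, "[-.]")[[1]])
}

split_translit("d-en-lil2 lugal-e")  # list(c("d", "en", "lil2"), c("lugal", "e"))
split_translit("AN.EN.KID")          # list(c("AN", "EN", "KID"))
```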
The package includes a built-in dictionary:
dic <- read_dictionary()
#> ###---------------------------------------------------------------
#> ### Sumerian Dictionary
#> ###
#> ### Author: Robin Wellmann
#> ### Year: 2026
#> ### Version: 0.5
#> ### Watch for Updates: https://founder-hypothesis.com/en/sumerian-mythology/downloads/
#> ###---------------------------------------------------------------
The vignette “Translating Sumerian Texts” describes how you can create your own dictionary.
look_up("lugal", dic)
#>
#> ────────────────────────────────────────────────────────────
#> Search: lugal
#> ────────────────────────────────────────────────────────────
#>
#> Cuneiform: 𒈗
#> Sign Names: LUGAL
#>
#> ▶ Translations:
#> [16] S great one with human body {king}
#> [ 3] S kingship
#> [ 1] S great one with human body
#>
#> ────────────────────────────────────────────────────────────
The output shows the sign name, the cuneiform character, and the translations with their frequency counts and grammatical types. For compound expressions, entries for the individual signs and all partial combinations (substrings) are listed as well:
look_up("d-suen", dic)
#>
#> ────────────────────────────────────────────────────────────
#> Search: d-suen
#> ────────────────────────────────────────────────────────────
#>
#> Cuneiform: 𒀭𒂗𒍪
#> Sign Names: AN.EN.ZU
#>
#> ▶ Translations:
#> [ 8] S god of heaven who is a cultural leader with knowledge {god Suen}
#> [ 1] S divine cultural leader with knowledge
#>
#> ▶ Individual Signs / Substrings:
#>
#> AN 𒀭
#> [29] S god of heaven
#> [18] xS->S divine S
#> [5] S sky/heaven
#> [1] Ax->A A by divine intervention
#> [1] S divine one
#>
#> EN 𒂗
#> [17] S cultural leader
#> [4] Sx->A who acts as S
#> [1] S cultural leadership
#>
#> ZU 𒍪
#> [1] S knowledge
#> [1] xS->A with knowledge about S
#>
#> ────────────────────────────────────────────────────────────
To find the Sumerian sign for an English term, use
lang = "en":
look_up("Enki", dic, "en")
#>
#> ────────────────────────────────────────────────────────────
#> Search: Enki
#> ────────────────────────────────────────────────────────────
#>
#> ▶ Matches for 'Enki':
#>
#> AN.EN.KI 𒀭𒂗𒆠
#> [1] S god of heaven who is the cultural leader of the earth {god Enki}
#>
#> AN.ZA.GUD×KUR.BI.I.A ๐ญ๐๐ ๐๐ฟ๐
#> [1] S divine radiance of the strong invincible one, the raw material of the
#> god Enki, the god of heaven who is the cultural leader of the earth
#> {Zambija}
#>
#> I.A 𒄿𒀀
#> [1] S life force with transformative power {god Enki}
#>
#> ────────────────────────────────────────────────────────────
The reverse lookup searches all translations and displays the matching entries with their sign names and cuneiform characters.
Each dictionary entry has a grammatical type in addition to its translation. These types describe the function of a sign in a sentence. Since the same sign can serve different functions depending on context, it may have multiple entries with different types.
There are three basic types:
| Type | Name | Description | Example |
|---|---|---|---|
| S | Substantive | Noun phrases and substantives | “king”, “Earth” |
| V | Verb | Verbs and verbal expressions | “create”, “go” |
| A | Attribute | Modifying clauses | “who is strong” |
You can see the different types of a sign with
look_up():
look_up("an", dic)
#>
#> ────────────────────────────────────────────────────────────
#> Search: an
#> ────────────────────────────────────────────────────────────
#>
#> Cuneiform: 𒀭
#> Sign Names: AN
#>
#> ▶ Translations:
#> [29] S god of heaven
#> [18] xS->S divine S
#> [ 5] S sky/heaven
#> [ 1] Ax->A A by divine intervention
#> [ 1] S divine one
#>
#> ────────────────────────────────────────────────────────────
The sign AN appears both as a noun (S: “sky/heaven”) and as an operator that transforms other expressions (xS->S: “divine S”), which brings us to the next topic.
In addition to basic types, there are operators. An operator takes one or two expressions of a certain type as arguments and produces an expression of a (possibly different) type. The notation describes where the arguments stand and what type is produced:
| Notation | Meaning |
|---|---|
| Sx->V | Takes an S to the left, produces a V |
| xS->S | Takes an S to the right, produces an S |
| Sx->A | Takes an S to the left, produces an A |
| Sx->S | Takes an S to the left, produces an S |
| xV->V | Takes a V to the right, produces a V |
| Vx->V | Takes a V to the left, produces a V |
| SSx->V | Takes two S to the left, produces a V |
The x marks the position of the operator itself. In
Sx->V, the argument S stands to the left of the
operator; in xS->S, the argument stands to the right.
Some operators with two arguments (like SSx->V) take
both arguments from the same side. The symbols x and
-> also have alternative Unicode representations
that may appear in dictionary entries
and line files.
In translations, the placeholder S (or
V) stands for the argument. For example, an operator
xS->S with the translation “community of S” means: take
the noun to the right of this sign and insert it where the S placeholder
stands. For operators with two arguments of the same type, the
placeholders are numbered: S1 and S2.
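The placeholder mechanism amounts to a text substitution. The sketch below assumes that a translation is a plain string and that placeholders are whole words (`S`, `V`, `S1`, `S2`); `apply_operator()` is a hypothetical helper, not sumer's API. It uses the translations from the un-ma-gi example in this vignette.

```r
# Hypothetical helper (not sumer API): insert the argument(s) of an operator
# into its translation by replacing the placeholder. One argument replaces
# "S" (or "V"); several arguments replace the numbered placeholders S1, S2.
apply_operator <- function(op, ..., placeholder = "S") {
  args <- c(...)
  if (length(args) == 1) {
    return(sub(paste0("\\b", placeholder, "\\b"), args, op, perl = TRUE))
  }
  for (i in seq_along(args)) {
    op <- sub(paste0("\\b", placeholder, i, "\\b"), args[i], op, perl = TRUE)
  }
  op
}

# un-ma-gi: gi (Sx->S) binds ma, then un (xS->S) binds the result
ma_gi    <- apply_operator("the permanent S", "container")
un_ma_gi <- apply_operator("community of S", ma_gi)
un_ma_gi  # "community of the permanent container"

# a made-up two-argument translation, only to show S1 and S2
apply_operator("S1 and S2", "heaven", "earth")  # "heaven and earth"
```

Word boundaries (`\\b`) keep the placeholder from matching inside words such as “Sumer”.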
Let us trace through a concrete example. Consider the expression
un-ma-gi from “Enki and the World Order” (line 16), which
consists of three signs:
| Syllable | Cuneiform Sign | Type | Translation |
|---|---|---|---|
| un | ๐ง | xS->S | community of S |
| ma | 𒈠 | S | container |
| gi | 𒄀 | Sx->S | the permanent S |
We build up the noun phrase step by step:

1. 𒈠 (ma) is a noun (S): “container”.
2. 𒄀 (gi) is an operator of type Sx->S that takes the S to its left. It binds 𒈠 and replaces the placeholder: “the permanent container”. The result is again an S.
3. un is an operator of type xS->S that takes the S to its right. It binds the result from step 2: “community of the permanent container”. The result is again an S.

Note how the operators are ordered: the operator with its argument on the left-hand side (gi) is applied first, and the operator with its argument on the right-hand side (un) then takes the complete resulting phrase as its argument. The final noun phrase is “community of the permanent container” (type S). In this context, “community” refers to the people and “permanent container” refers to the land of Sumer. The expression un-ma-gi thus means “the people of Sumer”. In the dictionary, such context-dependent meanings are recorded using the notation “literal meaning {specific meaning}”. The dictionary entry for this expression would be “community of the permanent container {people of Sumer}”. The literal meaning documents the compositional structure, while the specific meaning in curly braces gives a contextual interpretation.
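Because the two meanings share one translation string, a tool reading the dictionary has to separate them. A minimal sketch, assuming at most one `{...}` group at the end of the translation; `split_meaning()` is hypothetical and not part of sumer.

```r
# Hypothetical helper (not sumer API): split a translation written as
# "literal meaning {specific meaning}" into its two parts.
split_meaning <- function(x) {
  m <- regmatches(x, regexec("^(.*?)\\s*\\{(.*)\\}\\s*$", x, perl = TRUE))[[1]]
  if (length(m) == 0) {
    # no curly braces: only a literal meaning is recorded
    return(c(literal = trimws(x), specific = NA_character_))
  }
  c(literal = m[2], specific = m[3])
}

split_meaning("community of the permanent container {people of Sumer}")
split_meaning("container")
```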
A simple verb (type V) stands on its own, for example “to be used as a resource”. It combines directly with a subject noun phrase (S + V -> SEN) to form a sentence.
Many Sumerian verbs, however, take a noun phrase as their object. The
type Vt describes such a transitive
verb: it takes an S as its object and produces a complex
intransitive verb (V). The translation contains an S placeholder for the
object, for example “to equip S”. The resulting V must then be combined
with a subject (S) to form a complete sentence (SEN). Vt is
a generalization of Sx->V that also works correctly when
the verb has prefixes or suffixes.
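The type change from Vt to V can be sketched by treating an expression as a pair of type and text. This is an illustration under that assumption: `expr()` and `fill_object()` are hypothetical helpers, and the object “the people of Sumer” is chosen only as an example.

```r
# Sketch under assumptions: an expression is a list holding a grammatical
# type and an English text. expr() and fill_object() are hypothetical.
expr <- function(type, text) list(type = type, text = text)

fill_object <- function(verb, object) {
  stopifnot(verb$type == "Vt", object$type == "S")
  # the object replaces the S placeholder; the result is an intransitive V
  expr("V", sub("\\bS\\b", object$text, verb$text, perl = TRUE))
}

v <- fill_object(expr("Vt", "to equip S"), expr("S", "the people of Sumer"))
v$type  # "V"
v$text  # "to equip the people of Sumer"
```

The resulting V still lacks a subject; only combining it with an S yields a sentence (SEN).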
In Sumerian, verbs are often preceded by verb
prefixes, that is, signs that modify the verb’s meaning (expressing
modality, aspect, or other nuances). A verb prefix has the type
xV->V: it takes a verb to its right and produces a modified
verb. Since each prefix produces a V, multiple prefixes can chain
together. They bind from right to left, wrapping around the core verb
like layers. Conversely, a verb suffix has the type
Vx->V and binds from left to right. Prefixes and suffixes
can co-occur.
Consider the verb chain gan-ig-la from line 8 of “Enki
and the World Order”:
| Syllable | Cuneiform Sign | Type | Translation |
|---|---|---|---|
| gan | 𒃶 | xV->V | may V |
| ig | 𒅅 | xV->V | V with the task of establishing sustenance of human existence |
| la | 𒆷 | Vt | to equip S |
The verb builds up from the core outward:
𒃶          𒅅                        𒆷
xV->V      xV->V                     Vt
"may V"    "V with the task ..."     "to equip S"
           └───────────────────────────────┘
           𒅅(𒆷) = "equip S with the task ..."
└──────────────────────────────────────────┘
𒃶(𒅅(𒆷)) = "may equip S with the task ..."

𒆷 (Vt: “to equip S”) is the core verb. 𒅅 (xV->V) wraps it with
additional meaning, and 𒃶 (xV->V) adds modality. The final composed
verb is “may equip S with the task of establishing sustenance of human
existence” (Vt). This complex verb still has its S placeholder, which
will be filled when the verb meets its object during sentence
composition.
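Right-to-left wrapping is a right fold over the prefixes. The sketch below uses base R's `Reduce()` with `right = TRUE`; `wrap_prefix()` is a hypothetical helper, and dropping the infinitive “to” from the core verb is an assumption made to reproduce the wording above.

```r
# Sketch: verb prefixes of type xV->V wrap the core verb from right to left.
# wrap_prefix() is a hypothetical helper; translations are from the table.
wrap_prefix <- function(prefix, verb) sub("\\bV\\b", verb, prefix, perl = TRUE)

prefixes <- c(
  gan = "may V",
  ig  = "V with the task of establishing sustenance of human existence"
)
core <- "equip S"  # la (Vt, "to equip S") with the infinitive "to" dropped

# right = TRUE computes wrap_prefix(gan, wrap_prefix(ig, core))
verb_chain <- Reduce(wrap_prefix, prefixes, core, right = TRUE)
verb_chain
# "may equip S with the task of establishing sustenance of human existence"
```

A suffix chain (Vx->V) would be the mirror image: a left fold starting from the core verb.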
When two expressions stand side by side and neither is an operator, they are combined by composition rules:
| Left | Right | Result | Translation pattern |
|---|---|---|---|
| S | S | S | “X of/with Y” |
| S | A | S | “X Y” (juxtaposition) |
| S | V | SEN | “X Y” (subject + verb, “to” stripped) |
| SEN | SEN | SEN | “X. Y” (sentences joined with period) |
The type SEN stands for a complete sentence. It arises when a noun phrase (S) meets a verb (V). For example, if “separated groups” (S) is followed by “to be created” (V), the composition strips the “to” and produces the sentence “Separated groups be created.” (SEN). This raw composition must then be finished by hand; in this case, the verb form is adjusted to produce “Separated groups are created.”
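The composition table can be read as a lookup on the pair of types. A minimal sketch, assuming expressions are lists with a type and a text; `compose()` and `cap()` are hypothetical, and for the S + S rule the sketch always picks “of” although the table allows “of/with”.

```r
# Sketch of the composition rules table; compose() is hypothetical and
# expressions are lists with a type and an English text.
cap <- function(x) paste0(toupper(substr(x, 1, 1)), substring(x, 2))

compose <- function(left, right) {
  key <- paste(left$type, right$type)
  switch(key,
    "S S"     = list(type = "S", text = paste(left$text, "of", right$text)),
    "S A"     = list(type = "S", text = paste(left$text, right$text)),
    "S V"     = list(type = "SEN",  # subject + verb, leading "to" stripped
                     text = paste0(cap(left$text), " ",
                                   sub("^to ", "", right$text), ".")),
    "SEN SEN" = list(type = "SEN", text = paste(left$text, right$text)),
    stop("no composition rule for ", key)
  )
}

s <- compose(list(type = "S", text = "separated groups"),
             list(type = "V", text = "to be created"))
s$text  # "Separated groups be created."
```

As in the text, the raw result still needs manual polishing of the verb form.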
These rules, together with the operator types from the previous sections, are sufficient to translate entire Sumerian sentences from their individual parts. Since signs in Old Sumerian texts encode phrases rather than syllables, this implies that Old Sumerian is not actually a natural language. It is a formal language that can be pronounced in any sufficiently complex natural language.
A good first step when working with a new text is to search for frequently recurring sign combinations (n-grams). Such patterns are valuable clues: if a certain sequence of cuneiform signs appears repeatedly, it is likely a fixed term, a compound word, or an idiomatic expression.
The package includes the example text “Enki and the World Order”:
path <- system.file("extdata", "project", "enki_and_the_world_order.txt", package = "sumer")
text <- readLines(path, encoding = "UTF-8")
The function ngram_frequencies() finds recurring
combinations:
freq <- ngram_frequencies(text, min_freq = c(6, 4, 2))
head(freq, 10)
#> frequency length combination
#> 1 2 20 ๐ญ๐๐ ๐๐ช๐๐ค๐๐ฒ๐พ๐ณ๐ช๐ฒ๐ฃ๐๐๐๐๐พ๐
#> 2 2 16 ๐ญ๐๐ค๐๐ณ๐ณ๐๐๐๐ค๐ ๐ ๐ถ๐พ๐๐บ
#> 3 2 15 ๐๐ฌ๐ง๐ป๐๐ฌ๐ต๐๐ญ๐ฎ๐๐ด๐๐๐ญ
#> 4 2 15 ๐๐พ๐๐๐ญ๐ฒ๐๐พ๐๐๐๐ญ๐๐๐
#> 5 3 14 ๐๐ณ๐๐๐ญ๐๐ค๐ฒ๐ค๐ป๐๐ท๐๐
#> 6 2 14 ๐ฃ๐ ๐ ๐ฌ๐ ๐๐๐ท๐ธ๐จ๐ค๐๐พ๐
#> 7 2 14 ๐ฅ๐๐ง๐ฒ๐๐๐๐๐๐ฌ๐๐บ๐ธ๐บ
#> 8 2 12 ๐ค๐ฌ๐ง๐๐พ๐๐ค๐ ๐๐๐๐ญ
#> 9 11 10 ๐ญ๐๐ ๐ค๐ ๐๐๐ช๐๐บ
#> 10 2 10 ๐๐๐ ๐ญ๐ป๐ป๐๐๐๐
The min_freq parameter controls the minimum frequency
for different n-gram lengths. The default value c(6, 4, 2)
means: single signs must occur at least 6 times, pairs at least 4 times,
and all longer combinations at least 2 times.
The analysis works from the longest combinations down to the shortest. When a long combination is identified as frequent, duplicated occurrences are masked so that shorter sub-combinations are not falsely counted as frequent just because they are part of the longer combination.
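The longest-first idea can be sketched in base R. This is a simplified illustration, not sumer's implementation: each text line is assumed to be a character vector of signs, `min_freq` is indexed as described above (the last value applies to all longer n-grams), and “masking” is taken to mean replacing every occurrence after the first by `NA`. All function names are hypothetical.

```r
# Simplified sketch of longest-first n-gram counting with masking; this is
# not sumer's implementation. Each text line is a character vector of signs.
count_ngrams <- function(lines, n, min_freq) {
  grams <- unlist(lapply(lines, function(s) {
    if (length(s) < n) return(character(0))
    vapply(seq_len(length(s) - n + 1), function(i) {
      w <- s[i:(i + n - 1)]
      if (anyNA(w)) NA_character_ else paste(w, collapse = " ")  # skip masked
    }, character(1))
  }))
  tab <- table(grams[!is.na(grams)])
  tab[tab >= min_freq]
}

# Replace every occurrence of `gram` after the first one by NA.
mask_repeats <- function(lines, gram) {
  n <- length(gram)
  seen <- FALSE
  for (j in seq_along(lines)) {
    s <- lines[[j]]
    i <- 1
    while (i + n - 1 <= length(s)) {
      if (identical(s[i:(i + n - 1)], gram)) {
        if (seen) s[i:(i + n - 1)] <- NA
        seen <- TRUE
        i <- i + n
      } else {
        i <- i + 1
      }
    }
    lines[[j]] <- s
  }
  lines
}

frequent_ngrams <- function(lines, max_n, min_freq = c(6, 4, 2)) {
  result <- integer(0)
  for (n in max_n:1) {
    # min_freq[1] for single signs, min_freq[2] for pairs, last value beyond
    tab <- count_ngrams(lines, n, min_freq[min(n, length(min_freq))])
    for (g in names(tab)) {
      result[g] <- as.integer(tab[[g]])
      lines <- mask_repeats(lines, strsplit(g, " ", fixed = TRUE)[[1]])
    }
  }
  result
}

lines <- list(c("A", "B", "C", "D"), c("A", "B", "C", "E"), c("X", "A", "B"))
frequent_ngrams(lines, max_n = 3, min_freq = c(3, 2, 2))
```

In this toy input, “A B” would occur three times without masking; after the repeated “A B C” is masked it counts twice, once inside the first “A B C” and once on its own.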
With mark_ngrams(), the identified patterns are marked
in the text with curly braces:
text_marked <- mark_ngrams(text, freq)
cat(text_marked[1:5], sep = "\n")
#>
#> 1 ๐๐ค๐ฒ {๐ญ๐ } ๐ช๐๐๐ผ๐พ
#> 2 { {๐ {๐๐ญ} ๐} ๐ } ๐๐ฎ๐๐๐๐ {๐ฒ๐๐๐}
#> 3 {๐ฉ {๐๐ต} } {๐ณ๐ฒ { { {๐ญ๐} ๐ค} ๐ท} } {๐ ๐} {๐ญ {๐ฌ๐ต} }
#> 4 ๐๐ {๐ฉ {๐ช๐} } {๐๐ {๐๐ณ๐ณ} ๐ซ๐๐ท}
You can also search for a specific pattern in the annotated text:
term <- "IGI.DIB.TU"
pattern <- mark_ngrams(term, freq)
pattern
#> [1] " { {๐๐ณ} ๐} "
result <- text_marked[grepl(pattern, text_marked, fixed = TRUE)]
cat(result, sep = "\n")
#> 12 ๐ { {๐๐ณ} ๐} ๐ป๐ { { {๐๐ณ} ๐} ๐}
#> 13 ๐พ { {๐๐ณ} ๐} ๐พ๐ { { {๐๐ณ} ๐} ๐}
#> 53 {๐ญ๐ก๐ถ๐ท๐ญ} ๐ {๐ฃ⟨ĜA2⟩ { {๐๐บ} ๐} } ๐ข {๐ฃ๐ถ { {๐๐ณ} ๐} }
#> 54 ๐๐ฐ { {๐๐บ} ๐} ๐ซ {๐ฃ๐ถ { {๐๐ณ} ๐} }
#> 55 ๐ {๐ฃ⟨ĜA2⟩ { {๐๐บ} ๐} } ๐ง {๐ฃ๐ถ { {๐๐ณ} ๐} }
#> 80 { { {๐๐ณ} ๐} ๐} ๐๐ {๐ญ {๐ฌ๐ต} } {๐จ๐}
#> 196 ๐ฃ๐ฃ๐ ๐ญ { {๐๐ณ} ๐} ๐๐ญ๐ถ๐๐ก๐ผ๐ท
#> 197 { {๐ { {๐๐ณ} ๐} } ๐ฝ๐ฃ๐๐}
#> 198 { {๐ { {๐๐ณ} ๐} } ๐๐ {๐ท๐ท} }
#> 258 { {๐๐} ๐ฆ๐๐ผ} ๐ ๐ฒ๐ถ๐ฎ๐๐พ { {๐๐ณ} ๐} ๐ {๐ฌ๐} …
#> 280 ๐ป๐๐๐ { {๐๐ณ} ๐} {๐ก {๐๐บ} }
#> 296 {๐ฃ๐ฒ} XXX { {๐๐ณ} ๐} …
#> 298 ๐๐ป๐ญ๐๐ป { {๐๐ณ} ๐} …
#> 402 {๐ { {๐๐ณ} ๐} } {๐ { {๐๐ณ} ๐} } ๐ {๐๐๐ { {๐ถ๐} ๐} }
#> 410 { {๐ { {๐๐ณ} ๐} } ๐ฝ๐ฃ๐๐}
#> 411 { {๐ { {๐๐ณ} ๐} } ๐๐ {๐ท๐ท} } ๐๐พ { {๐ถ๐} ๐}
To understand the structure of a sentence, it is helpful to know
which grammatical role each individual sign is likely to play. The
function sign_grammar() looks up each sign of a string in
the dictionary and counts how often it occurs with each grammatical
type:
The raw frequencies can be refined into probabilities using a Bayesian model. First, compute the prior distribution of types across all signs in the dictionary:
The sentence_prob parameter corrects a systematic bias:
if a dictionary was primarily built from noun phrases (rather than
complete sentences), verbs are underrepresented. A value of 0.25 means
that an estimated 25% of the dictionary entries come from complete
sentences. Verb probabilities are then upweighted accordingly.
Next, grammar_probs() computes the posterior
probabilities for each sign:
For signs with many dictionary entries, the observed frequencies dominate; for rare signs, the result falls back to the prior distribution. The position of a sign in the sequence is currently not taken into account for calculating probabilities.
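The behaviour described above (counts dominate for frequent signs, the prior for rare ones) is characteristic of a Dirichlet-multinomial posterior mean. The sketch below assumes that model, a made-up prior, and a hand-picked prior weight `alpha`; `grammar_probs()` may well use a different formula, and the `sentence_prob` correction is not modelled. `posterior_probs()` is hypothetical.

```r
# Assumed model for illustration (not necessarily grammar_probs()):
# posterior mean of a Dirichlet-multinomial with prior weight alpha.
posterior_probs <- function(counts, prior, alpha = 5) {
  stopifnot(identical(names(counts), names(prior)), abs(sum(prior) - 1) < 1e-8)
  (counts + alpha * prior) / (sum(counts) + alpha)
}

prior <- c(S = 0.6, V = 0.2, A = 0.2)  # made-up prior over three types
many  <- c(S = 2, V = 40, A = 0)       # frequent sign: the counts dominate
rare  <- c(S = 0, V = 1, A = 0)        # rare sign: stays close to the prior

round(posterior_probs(many, prior), 3)  # S 0.106, V 0.872, A 0.021
round(posterior_probs(rare, prior), 3)  # S 0.500, V 0.333, A 0.167
```

Larger values of `alpha` pull every sign more strongly towards the prior.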
The function plot_sign_grammar() presents the results as
a stacked bar chart:
Each bar represents a sign position in the sentence. The colours represent grammatical types: green for nouns (S), red shades for verbs (V) and verb operators, blue shades for attribute operators, orange for adjective-like operators (Sx->S), and grey shades for all other operators. A tall bar in a particular colour indicates that the sign likely has that grammatical function.
The chart can also be saved to a file:
Once you have assigned grammatical types to each sign, the function
grammatical_structure() shows how the parts are grouped
according to the operator binding and composition rules. The output uses
typed brackets to indicate the role of each group: () for
substantives (S), <> for verbs (V), []
for attributes (A), and {} for sentences (SEN).
Consider the expression mec3-ki-aj2-ga-ce-er ce du:
x <- "mec3-ki-aj2-ga-ce-er-ce-du"
x <- paste0(info(x)$reading, collapse = "-")
x
#> [1] "meš3-ki-aĝ2-ga-še-er-še-du"
expr <- split_sumerian(x)$signs
type <- c("S", "S", "Sx->A", "xS->A", "S", "Sx->S", "S", "Sx->V")
grammatical_structure(x, type, expr)
#> {(((meš3)[(ki)aĝ2])[ga((še)er)])<(še)du>}
The following figure shows the same result with colour coding:
The figure shows that the sentence has the typical structure of an Old Sumerian sentence: the subject (mec3) at its beginning, followed by some specifications of the subject (here in square brackets), then the object (ce), and finally the verb (du) that absorbs the object. This example also demonstrates that many Sumerian proper names are self-explanatory: the term “mec3-ki-aj2-ga-ce-er” stands for the proper name “Meskiagasher”, but can also be read as a noun phrase.
This visualization makes the grammatical structure explicit and can help verify that the type assignments produce a sensible grouping.
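The bracket notation itself is easy to reproduce by hand, which can help when checking a type assignment. In the sketch below, `wrap_type()` is a hypothetical helper that mimics the printed notation for part of the example above; it does not derive the structure automatically.

```r
# Sketch mirroring the typed brackets printed by grammatical_structure():
# wrap_type() is a hypothetical helper, not part of sumer.
type_brackets <- list(S = c("(", ")"), V = c("<", ">"),
                      A = c("[", "]"), SEN = c("{", "}"))
wrap_type <- function(type, text) {
  b <- type_brackets[[type]]
  if (is.null(b)) stop("unknown type: ", type)
  paste0(b[1], text, b[2])
}

se   <- wrap_type("S", "\u0161e")           # (še)
obj  <- wrap_type("S", paste0(se, "er"))    # er (Sx->S) binds še
spec <- wrap_type("A", paste0("ga", obj))   # ga (xS->A) binds ((še)er)
verb <- wrap_type("V", paste0(se, "du"))    # du (Sx->V) binds še
spec  # "[ga((še)er)]"
verb  # "<(še)du>"
```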
translate()
The function translate() opens an interactive Shiny
gadget for translating Sumerian text. To demonstrate, we use a fragment
from line 16 of “Enki and the World Order”:
This expression contains eight cuneiform signs. Our task is to assign each sign a grammatical type and translation, and then compose them into coherent English sentences.
The input actually contains two sentences. You must recognize sentence boundaries yourself; they are not detected automatically. In general, sentence boundaries follow directly after verbs.
A striking feature of this Old Sumerian text is that duplicated signs often mark sentence boundaries: the left occurrence functions as a verb at the end of one sentence, while the right occurrence functions as a noun at the beginning of the next sentence. In our example, the sign HAL (𒄬) appears twice. The first HAL is a verb (Vt: “to split S into separate groups”) ending the first sentence, while the second HAL is a noun (S: “separated groups”) beginning the second sentence.
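The duplicated-sign heuristic can be expressed as a search for equal neighbours. The sketch below is hypothetical (`boundary_candidates()` is not part of sumer) and only proposes candidates; whether the left occurrence really is a verb still has to be decided by the translator.

```r
# Sketch of the heuristic: adjacent identical signs are candidate sentence
# boundaries. boundary_candidates() is hypothetical, not part of sumer.
boundary_candidates <- function(signs) {
  # i is returned when sign i equals sign i + 1; a boundary would then
  # fall between the two occurrences
  which(head(signs, -1) == tail(signs, -1))
}

signs <- c("ŠA3", "UN", "MA", "GI", "HAL", "HAL", "LA", "DIM2")
b <- boundary_candidates(signs)   # 5
sentences <- split(signs, cumsum(seq_along(signs) %in% (b + 1)))
sentences  # ŠA3 UN MA GI HAL | HAL LA DIM2
```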
The two sentences are:

- cag4-kalam-ma-gi-hal (𒊮๐ง𒈠𒄀𒄬): “The central administration splits the people of Sumer into separate groups.”
- hal-la-gin7 (𒄬𒆷𒁶): “Places for the separated groups are created.”

When translate() opens, you see a scrollable page with
the following sections. The gadget is described in more detail in the
vignette “Translating Sumerian Texts”.
When the gadget opens, each sign is pre-filled with its most frequent translation from the dictionary. These suggestions are not always correct; they are simply the entries with the highest count.
Consider the sign gi=GI=𒄀 in our example. The automatic
suggestion may show a noun entry (S) if that is the most frequent type
for 𒄀 in the dictionary. However, in this context, 𒄀 functions as an
adjective operator Sx->S meaning “permanent S”.
To correct this, set the entry for 𒄀 to Sx->S: “permanent S”.
If you use multiple dictionaries, the first one has priority for the automatic suggestions. All dictionaries are displayed in the lookup panel, so you can choose from any of them.
In the bracket input field (next to the “Update Skeleton” button), you can control how the skeleton is structured by inserting brackets:

- Round brackets ( ) group signs into a compound expression. The skeleton will show an entry for the group in addition to entries for its individual signs. The brackets thus tell the tool that these signs form a coherent phrase, and the tool adds a line to the skeleton where the translation of the phrase can be entered.
- Angle brackets < > mark a fixed term (typically a proper name). No individual entries are generated for the signs inside. For instance, <d-en-ki> would be treated as a single unit “Enki” without breaking it into AN, EN, KI.
- Curly braces { } mark operator arguments. In most cases this is not necessary, because operators and their arguments are detected automatically. Only when the automatic detection fails, for instance in ambiguous groupings, do you need to specify operator arguments explicitly with curly braces.
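To see which parts of a bracket string are groups and which are fixed terms, a regular-expression pass is enough for simple, non-nested cases. This is a hypothetical sketch, not the gadget's parser; `bracket_parts()` and the example string (which combines signs from this vignette with <d-en-ki>) are made up for illustration, and for nested round brackets only the innermost groups are returned.

```r
# Hypothetical sketch (not the gadget's parser): list the innermost groups
# in round brackets and the fixed terms in angle brackets.
bracket_parts <- function(x) {
  groups <- regmatches(x, gregexpr("\\([^()]*\\)", x))[[1]]
  fixed  <- regmatches(x, gregexpr("<[^<>]*>", x))[[1]]
  list(groups = gsub("[()]", "", groups), fixed_terms = gsub("[<>]", "", fixed))
}

bracket_parts("<d-en-ki> cag4 (kalam-ma-gi) hal")
# $groups "kalam-ma-gi"; $fixed_terms "d-en-ki"
```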
After editing the brackets, click “Update Skeleton” to rebuild the template. All previously entered translations are preserved.
When you click “Done”, translate()
returns a skeleton object: a character vector containing
the completed translation in pipe format. This can be saved as a text
file:
The saved file serves as input for building a custom dictionary (see Vignette 2).
A completed translation for our example looks like this:
Structure: (𒊮(๐ง𒈠𒄀)𒄬). (𒄬𒆷𒁶).
|cag4-kalam-ma-gi-hal-hal-la-gin7: SEN: The central administration splits the people of Sumer into separate groups. Places for the separated groups are created.
|cag4-kalam-ma-gi-hal=ŠA3.UN.MA.GI.HAL: SEN: The central administration splits the people of Sumer into separate groups.
| cag4=ŠA3=𒊮: S: center {the central administration}
| kalam-ma-gi=UN.MA.GI=๐ง𒈠𒄀: S: community of the permanent container {people of Sumer}
| kalam=UN=๐ง: xS->S: community of S
| ma=MA=𒈠: S: container
| gi=GI=𒄀: Sx->S: the permanent S
| hal=HAL=𒄬: Vt: to split S into separate groups
|hal-la-gin7=HAL.LA.DIM2=𒄬𒆷𒁶: SEN: Places for the separated groups are created.
| hal=HAL=𒄬: S: separated groups
| la=LA=𒆷: Sx->S: place for S
| gin7=DIM2=𒁶: V: to be created
Each line starting with | is a dictionary entry. The
indentation reflects the hierarchical structure: the overall sentence at
the top, word groups below, and individual signs at the deepest
level.
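Because each entry is a single line, the pipe format is simple to read back. The sketch below infers the field layout (reading, sign names and cuneiform joined by `=`, then type and translation separated by `: `) from the example above, and assumes that the indentation is stored as spaces after the `|`. `parse_entry()` is hypothetical and not part of sumer.

```r
# Hypothetical parser (not part of sumer) for one line of the pipe format
# above. Assumed layout: "|", indentation, "reading=SIGN.NAMES=cuneiform",
# then ": type: translation". The top-level entry has only a reading.
parse_entry <- function(line) {
  body  <- sub("^\\|", "", line)
  depth <- nchar(body) - nchar(sub("^ +", "", body))  # leading spaces
  parts <- strsplit(trimws(body), ": ", fixed = TRUE)[[1]]
  key   <- strsplit(parts[1], "=", fixed = TRUE)[[1]]
  list(
    depth       = depth,
    reading     = key[1],
    sign_names  = if (length(key) >= 2) key[2] else NA_character_,
    cuneiform   = if (length(key) >= 3) key[3] else NA_character_,
    type        = parts[2],
    # re-join in case the translation itself contains ": "
    translation = paste(parts[-(1:2)], collapse = ": ")
  )
}

e <- parse_entry("| gi=GI=\U00012100: Sx->S: the permanent S")
str(e)
```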
Learning by example. The package includes an example project with lines 1–31 of “Enki and the World Order” already translated. You can open any of these lines to study the translations and learn how the type system works in practice:
path <- system.file("extdata", package = "sumer")
file.copy(
from = file.path(path, "project"),
to = tempdir(),
recursive = TRUE
)
ctx <- translation_context(
line_folder = file.path(tempdir(), "project/lines"),
text = file.path(tempdir(), "project/enki_and_the_world_order.txt"),
dic = file.path(path, "sumer-dictionary.txt"),
sentence_prob = 0.25
)
# Open line 16 to see the full translation of our example
translate_line(16, ctx)
The second vignette (“Translating Sumerian Texts”) describes the complete workflow for translating a document line by line and building a dictionary from the results.