In this mini-tutorial we will implement a small domain-specific language and a corresponding Eclipse IDE integration with Xtext. Later, we will create a code generator that reads the models you create with the DSL editor and processes them, i.e. generates some Java code from them.
Download and install the latest version of Xtext and set up a fresh workspace. We want to develop a small language used to define domain models. It is a stripped-down version of our Domain-Model Example which is shipped with Xtext and can be materialized into your workspace using the example wizard. The following sample of a domain model should give you an idea about the language.
datatype String
datatype Bool
entity Session {
title: String
isTutorial : Bool
}
entity Conference {
name : String
attendees : Person*
speakers : Speaker*
}
entity Person {
name : String
}
entity Speaker extends Person {
sessions : Session*
}
In order to get started we first need to create some Eclipse projects. Use the Xtext wizard to do so:
File -> New -> Project... -> Xtext -> Xtext project
Choose a meaningful project name, language name and file extension, e.g.
Main project name: | org.eclipse.xtext.example.domainmodel |
Language name: | org.eclipse.xtext.example.Domainmodel |
DSL-File extension: | dmodel |
Keep “Create generator project” checked, as we will also create a code generator in a second step. If you want to materialize the full-blown Domainmodel example that ships with Xtext, it is strongly recommended to use another language name and file extension for this tutorial.
Click on Finish to create the projects.
In the Package Explorer you can see three new projects. In org.eclipse.xtext.example.domainmodel the grammar is defined and all runtime aspects, such as linking, scoping and validation, are developed. The IDE aspects, such as the editor, any views, and the incremental project builder, go into org.eclipse.xtext.example.domainmodel.ui. Both projects consist of generated classes derived from the grammar and manually written code, such as the grammar itself and further classes that adapt the default behavior. Although Xtext makes use of code generation, most of the code is actually written as libraries, which are referenced by means of OSGi through the MANIFEST.MF.
It is good to be clear and unambiguous whether the code is generated or is to be manipulated by the developer. Thus, the generated code should be held separately from the manually written code. We follow this pattern by having a folder src/ and a folder src-gen/ in each project. Keep in mind not to make changes in the src-gen/ folder. They will be overwritten by the generator without any further warning.
A third project, org.eclipse.xtext.example.domainmodel.generator, will later contain an Xpand code generator that leverages the model created with the DSL editor. This project is fully optional, and so is the use of Xpand: you can use any programming language running on the JVM to implement a code generator for your Xtext models. It is also perfectly fine to have no code generator at all and to use the models dynamically at runtime. We call this kind of model processor an interpreter.
Please note: If you encounter strange errors after copying and pasting a snippet from this document into your Eclipse editor, your documentation viewer has most likely put characters other than plain spaces into your clipboard. Re-enter the whitespace or type the text by hand to be sure everything works.
The wizard will automatically open the example grammar file Domainmodel.xtext from the first project in the editor. A grammar has two purposes: First, it is used to describe the concrete syntax of your language. Second, it contains information about how the parser shall create a model during parsing.
The first line of the grammar is the grammar declaration:
grammar org.eclipse.xtext.example.Domainmodel
with org.eclipse.xtext.common.Terminals
In Xtext each grammar has a unique name which, like the name of a public Java class, needs to reflect the location of the file within the Java classpath. In our case the grammar file is located in /org/eclipse/xtext/example/Domainmodel.xtext, therefore the name of the grammar is org.eclipse.xtext.example.Domainmodel. The second part of that statement (with org.eclipse.xtext.common.Terminals) states that this grammar reuses, and can override, rules from the specified grammar. org.eclipse.xtext.common.Terminals is a library grammar shipped with Xtext that predefines the most common terminal rules, such as ID, STRING and INT. It also introduces single-line and multi-line comments as well as rules for whitespace, which may occur everywhere by default. You can open that grammar in the editor to have a look at these rules. This set of rules is needed so often, and is so often the same, that most Xtext languages extend this grammar. However, it is just a library, so you don’t have to use it if it doesn’t fit your needs. You can also use the grammar inheritance mechanism for your own grammar libraries, of course.
The next statement declares an EMF Ecore package, aka EPackage, to be derived from the grammar:
generate domainmodel "http://www.eclipse.org/xtext/example/Domainmodel"
Ecore EPackages are effectively a set of classes (in EMF they are called EClasses) which are used to represent the in-memory model of a text file. Instead of model we sometimes also refer to it as the semantic model or Abstract Syntax Tree (AST). Note that in the case of Xtext it’s actually a graph rather than a tree, since it also contains cross-links. But as the term AST is so commonly used and well known, we ignore this minor detail.
In order to tell the parser which EClasses to use when creating the AST, we need to make them available, either by importing existing Ecore packages or, as in our example, by letting Xtext derive a package from the grammar. An EPackage has a mandatory name and namespace URI (short: nsURI), so we have to declare these two values in that statement.
That’s all for the prelude, now we have to define the parser rules which effectively define a bi-directional mapping from text to a constructed AST. It is bi-directional because Xtext not only derives a parser from the grammar but also a serializer. Anyway, we will mostly talk about parsing in this chapter.
Ignore and delete the parser rules generated by the wizard and start by declaring an entry rule for our language. The entry rule for the parser will be called DomainModel. Xtext is fine with any name, it just picks the first rule in a grammar and interprets it as the entry rule.
As a DomainModel consists of any number of Entity entries, this rule delegates to another rule named Entity, which will be defined later on. As we can have an arbitrary number of entities within a model, including none, the cardinality is '*'. There are four different cardinalities available.
(no operator) | exactly one |
? | zero or one |
* | zero or more |
+ | one or more |
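The four operators can also be read as lower and upper occurrence bounds. The following is only an illustrative sketch: the enum Cardinality and its members are hypothetical names and not part of Xtext, and -1 is used here to stand for "unbounded".

```java
// Hypothetical helper, not part of Xtext: models the four cardinality
// operators as occurrence bounds. An upper bound of -1 means "unbounded".
enum Cardinality {
    EXACTLY_ONE("", 1, 1),
    ZERO_OR_ONE("?", 0, 1),
    ZERO_OR_MORE("*", 0, -1),
    ONE_OR_MORE("+", 1, -1);

    final String operator;
    final int lowerBound;
    final int upperBound;

    Cardinality(String operator, int lowerBound, int upperBound) {
        this.operator = operator;
        this.lowerBound = lowerBound;
        this.upperBound = upperBound;
    }

    // True if an element with this cardinality may occur the given number of times.
    boolean accepts(int occurrences) {
        return occurrences >= lowerBound
                && (upperBound == -1 || occurrences <= upperBound);
    }

    // Looks up the cardinality for an operator; the empty string means "no operator".
    static Cardinality fromOperator(String operator) {
        for (Cardinality c : values()) {
            if (c.operator.equals(operator)) {
                return c;
            }
        }
        throw new IllegalArgumentException("Unknown cardinality operator: " + operator);
    }
}
```

With these bounds, an empty model file is accepted by a rule using '*' but would be rejected by one using '+'.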
Each rule ends with a semicolon. So our first rule reads for now as
DomainModel :
Entity*;
An Xtext grammar does not only describe rules for the parser but also the structure of the resulting AST. Usually, each parser rule will create a new object in that tree. The type of that element can be specified after the rule name using the keyword returns. If the type’s name is the same as the rule’s name, it can be omitted as in our case. Which effectively means that
DomainModel returns DomainModel: ...
is the same as
DomainModel : ...
In order to connect the different objects together, we have to assign the elements returned by called rules to some feature of the current rule’s object. This is done by using so called assignments.
DomainModel :
(elements+=Entity)*;
The assignment with the '+=' operator adds every object returned by the Entity rule to the elements reference of the AST type DomainModel. If you use '+=' as opposed to '=', the feature needs to be a list, i.e. its upperBound has to be -1. In addition to these two assignment operators there is a third one, '?=', which is called boolean assignment and sets a feature to true if the part on the right-hand side was parsed successfully. Here’s the list of available assignment operators:
feature=... | corresponds to setFeature(...) |
list+=... | corresponds to getList().add(...) |
condition?=... | corresponds to setCondition(true) |
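To make the table more tangible, here is a minimal sketch of the calls a parser would effectively make for each operator. The class Node and the rule shown in the comment are made up for illustration; the classes Xtext actually derives are EMF-based and look different in detail.

```java
import java.util.ArrayList;
import java.util.List;

final class AssignmentSketch {
    // Hand-written stand-in for a model class. Assumed hypothetical rule:
    //   Node: name=ID (flagged?='!')? (children+=Node)*;
    static final class Node {
        private String name;
        private boolean flagged;
        private final List<Node> children = new ArrayList<>();

        String getName() { return name; }
        void setName(String name) { this.name = name; }
        boolean isFlagged() { return flagged; }
        void setFlagged(boolean flagged) { this.flagged = flagged; }
        List<Node> getChildren() { return children; }
    }

    // Mimics what happens while parsing the text: Conference ! Person
    static Node build() {
        Node parent = new Node();
        parent.setName("Conference");    // name=ID          -> setName(...)
        parent.setFlagged(true);         // flagged?='!'     -> setFlagged(true)

        Node child = new Node();
        child.setName("Person");
        parent.getChildren().add(child); // children+=Node   -> getChildren().add(...)
        return parent;
    }
}
```

Note that the boolean feature is never set to false explicitly: if the optional part is absent, the flag simply keeps its default value.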
The next rule on our list is the rule Entity. Looking at our target syntax, each entity begins with the keyword 'entity' followed by the entity’s name and an opening curly brace (we will handle the extends clause in a second step). Then, an entity defines a number of features and ends with a closing curly brace. The rule looks like this:
Entity :
'entity' name=ID '{'
(features+=Feature)*
'}';
Keywords are simply declared as string literals within the rules. The name uses the ID rule from the mixed-in grammar org.eclipse.xtext.common.Terminals. The ID rule is a so-called terminal rule; terminal rules always return simple data types, such as String, Date, or Integer. Actually, any Java class can be a simple EMF EDataType. The ID rule recognizes alphanumeric words similar to identifiers in Java and returns the value as a String. You can navigate to its declaration using F3. As mentioned before, the assignment operator '=' denotes a single-valued feature.
We want to allow inheritance for entities and therefore now add an optional 'extends' part.
Entity :
'entity' name=ID ('extends' superType=[Entity])? '{'
(features+=Feature)*
'}';
The question mark marks an element or, as in this case, an entire group of elements as optional. The other new thing is the right-hand side of the assignment superType=[Entity]. This is a cross-reference literal and states that a cross-reference to an Entity declared elsewhere should be established. So in this case Entity does not point to the corresponding rule but to the EClass. As both have the same name, this might cause a little confusion at first. The full syntax for a cross-reference is [TypeName|RuleCall], where RuleCall defaults to ID. This means that [Entity|ID] would work equally well. The parser only parses the name of the cross-referenced element using the ID rule and stores it internally. Later on, the linker establishes the cross-reference using the name, the defined cross-reference’s type (Entity in this case) and the defined scoping rules. Scoping is not covered in this tutorial, since the domain model DSL works with the default scoping semantics. By default, namespace-based scoping is applied, which means that all entities living in the same namespace are possible candidates for a cross-reference. We talk a bit more about this later in this chapter when we introduce namespaces.
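The two steps described above, storing the name first and resolving it afterwards, can be sketched as follows. This is a deliberately simplified illustration with made-up class names and a single flat scope; it is not how the Xtext linker is implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LinkingSketch {
    // Hypothetical stand-in for the Entity model class.
    static final class Entity {
        final String name;
        final String superTypeName; // text consumed for the cross-reference, or null
        Entity superType;           // filled in by the linking step

        Entity(String name, String superTypeName) {
            this.name = name;
            this.superTypeName = superTypeName;
        }
    }

    // Resolves each stored name against all entities of one flat scope
    // and returns a message for every name that could not be resolved.
    static List<String> link(List<Entity> entities) {
        Map<String, Entity> scope = new HashMap<>();
        for (Entity entity : entities) {
            scope.put(entity.name, entity);
        }
        List<String> errors = new ArrayList<>();
        for (Entity entity : entities) {
            if (entity.superTypeName == null) {
                continue; // no 'extends' part was parsed
            }
            Entity target = scope.get(entity.superTypeName);
            if (target == null) {
                errors.add("Unresolved reference: " + entity.superTypeName);
            } else {
                entity.superType = target;
            }
        }
        return errors;
    }
}
```

Because resolution happens after parsing, an entity can extend another entity that is declared further down in the file.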
In our introductory example model we had not only defined entities but also two data types. So the DomainModel rule needs to be able to parse data types as well. As both entities and data types will be referenced from the yet to be defined Feature rule, we should introduce a common super type for them.
DomainModel :
(elements+=Type)*;
Type:
DataType | Entity;
DataType:
'datatype' name=ID;
A DomainModel now consists of types, where a Type can be either a DataType or an Entity. Our rule Type just delegates to either of them, using the '|' alternatives operator. Now that we have a common super type for Entity and DataType, we are able to refer to both types of elements with a single cross-reference.
Next up we need to define the syntax of Feature, which shouldn’t be surprising to you, as we do not use any new concepts here.
Feature:
name=ID ':' type=TypeRef;
In many target platforms there is a difference between the notion of a simple attribute and a reference. In relational databases, for instance, a table contains the values of attributes directly, but references are “modeled” by means of foreign keys. In object-relational persistence technologies such as JPA or Hibernate, references can have additional life-cycle semantics, which define what to do with referenced objects in certain circumstances. Therefore you typically make a distinction between the two concepts, as we do in the Domain Model Example you can create using the wizard. However, in this small scenario we do not need such a separation, so we stick with one single concept for features.
As the type reference has an additional property for the multiplicity (many or one), we make it a separate model element and parser rule TypeRef. The presence of the postfix '*' in an input file should set a boolean flag to true, indicating a multi-valued type reference in the AST model. This is the purpose of the assignment operator '?='. The parser rule looks like this:
TypeRef :
referenced=[Type] (multi?='*')?;
Again we have specified a cross-reference, which this time can refer not only to entities but to all instances of Type. So far this only includes data types and entities, but you may want to introduce additional concepts in a future version, for instance a Value Object as proposed in Eric Evans’s book Domain-Driven Design.
In the end your grammar editor should look like this:
grammar org.eclipse.xtext.example.Domainmodel
with org.eclipse.xtext.common.Terminals
generate domainmodel "http://www.eclipse.org/xtext/example/Domainmodel"
DomainModel :
(elements+=Type)*;
Type:
DataType | Entity;
DataType:
'datatype' name=ID;
Entity:
'entity' name=ID ('extends' superType=[Entity])? '{'
(features+=Feature)*
'}';
Feature:
name=ID ':' type=TypeRef;
TypeRef:
referenced=[Type] (multi?='*')?;
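To get a feeling for the in-memory model this grammar describes, the following sketch rebuilds a small part of the sample model by hand. The plain Java classes are only an approximation written for this illustration; the types Xtext derives are EMF EClasses, and the field names simply mirror the feature names used in the grammar.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ModelSketch {
    // Common super type of DataType and Entity, as introduced by the rule Type.
    static abstract class Type {
        final String name;
        Type(String name) { this.name = name; }
    }

    static final class DataType extends Type {
        DataType(String name) { super(name); }
    }

    static final class Entity extends Type {
        final Entity superType; // null if there is no 'extends' part
        final List<Feature> features = new ArrayList<>();
        Entity(String name, Entity superType) {
            super(name);
            this.superType = superType;
        }
    }

    static final class TypeRef {
        final Type referenced; // cross-reference, not containment
        final boolean multi;   // true if the postfix '*' was present
        TypeRef(Type referenced, boolean multi) {
            this.referenced = referenced;
            this.multi = multi;
        }
    }

    static final class Feature {
        final String name;
        final TypeRef type;
        Feature(String name, TypeRef type) {
            this.name = name;
            this.type = type;
        }
    }

    static final class DomainModel {
        final List<Type> elements = new ArrayList<>();
    }

    // Corresponds to: datatype String, entity Session { title : String },
    // entity Person { name : String },
    // entity Speaker extends Person { sessions : Session* }
    static DomainModel sample() {
        DataType string = new DataType("String");
        Entity session = new Entity("Session", null);
        session.features.add(new Feature("title", new TypeRef(string, false)));
        Entity person = new Entity("Person", null);
        person.features.add(new Feature("name", new TypeRef(string, false)));
        Entity speaker = new Entity("Speaker", person);
        speaker.features.add(new Feature("sessions", new TypeRef(session, true)));

        DomainModel model = new DomainModel();
        model.elements.addAll(Arrays.asList(string, session, person, speaker));
        return model;
    }
}
```

The references from Speaker to Person and from the sessions feature to Session point at objects that are contained elsewhere, which is why the model is a graph rather than a strict tree.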
Now that we have the grammar in place, we need to run the generator that will derive the various language components. To do so, locate the file GenerateDomainmodel.mwe2 next to the grammar file in the Package Explorer view. From its context menu, choose
Run As -> MWE2 Workflow.
That will trigger the Xtext language generator. It generates the parser and serializer and some additional infrastructure code. You will see its logging messages in the Console View.
If the code generation succeeded, we are now able to test the IDE integration. Right-click on the Xtext project and choose
Run As -> Eclipse Application.
This will spawn a new Eclipse workbench with your projects installed as plug-ins. In the new workbench, create a new project (File -> New -> Project... -> General -> Project) and therein a new file with the file extension you chose in the beginning (*.dmodel). This will open the generated entity editor. Try it out and discover the default functionality for code completion (Ctrl+Space, even for cross-references), syntax highlighting, syntactic validation, linking errors, outline, find references etc.
At the current stage, our language puts all types into the same scope. As cross-references are specified by means of names, this can easily yield collisions as the model size increases. Avoiding such name clashes is why most programming languages invent concepts like namespaces or modules. We will now introduce the concept of a Package, which effectively is a namespace.
Our language’s definition of a Package is as follows: A package has a name and contains other named elements and namespace imports. A named element is either another Package or a Type. Within a package, elements must have unique names. The fully qualified name (FQN) of a named element consists of the FQN of its package, a separator '.' and its own simple name. Elements refer to each other using their qualified names, which may be relative to their common namespaces or absolute. By means of import statements, these names can be further abbreviated.
That might sound a bit complicated at first, but the good thing is that Xtext comes with exactly such default semantics for namespaces, which feel very natural in practice. Although all aspects of these semantics are adaptable, we will stick to the defaults within this tutorial.
Before we extend the grammar, let’s have a look at what the example model would look like when using the newly introduced concepts:
package my.types {
datatype String
datatype Boolean
}
package my.entities {
import my.types.*
entity Session {
title: String
isTutorial : Boolean
}
entity Conference {
name : String
attendees : Person*
speakers : Speaker*
}
entity Person {
name : String
}
entity Speaker extends Person {
sessions : Session*
}
}
Let’s start to modify our grammar now. First we introduce the rule AbstractElement:
AbstractElement:
PackageDeclaration | Type;
DomainModel should now call AbstractElement instead of Type:
DomainModel:
(elements+=AbstractElement)*;
To allow qualified names, we need a new rule QualifiedName. This rule returns a string. But we do not want this to be a terminal rule, as terminal rules have special semantics (they are used context-less in the lexer, which often causes problems). Instead we want it to be a parser rule, but one which doesn’t return an instance of an EClass from the referenced EPackage but just a plain String. Such rules are called data type rules, because they return instances of EDataType as opposed to instances of EClass. String is the most often used EDataType in this context and is therefore the default return type for data type rules. The rule looks like this:
QualifiedName:
ID ('.' ID)*;
Note that Xtext automatically figures out that the rule is a data type rule rather than a normal parser rule, because it doesn’t contain any assignments and all rule calls go to either other data type rules or terminal rules. For a data type rule the parser simply concatenates the consumed text and returns it as a string. The transformation to a user-defined data type is done by so-called ValueConverters. But as we use strings, we don’t need to care about ValueConverters here.
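What "concatenates the consumed text" means can be sketched by hand. The method below works on an already tokenized input and mirrors the rule ID ('.' ID)*; the class name and the simplified ID pattern are assumptions made for this illustration, and the parser Xtext generates is of course not written this way.

```java
import java.util.List;

final class QualifiedNameSketch {
    // Simplified approximation of the ID terminal rule.
    private static boolean isId(String token) {
        return token.matches("[a-zA-Z_][a-zA-Z_0-9]*");
    }

    // Consumes ID ('.' ID)* from the token list and returns the concatenated text.
    static String parseQualifiedName(List<String> tokens) {
        if (tokens.isEmpty() || !isId(tokens.get(0))) {
            throw new IllegalArgumentException("Expected an ID at the start");
        }
        StringBuilder text = new StringBuilder(tokens.get(0));
        int index = 1;
        // Each further segment needs both the separator and a following ID.
        while (index + 1 < tokens.size()
                && ".".equals(tokens.get(index))
                && isId(tokens.get(index + 1))) {
            text.append('.').append(tokens.get(index + 1));
            index += 2;
        }
        if (index != tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + tokens.get(index));
        }
        return text.toString();
    }
}
```

For the tokens my, ., entities the method returns the single string my.entities, so no model object is created for the name, only a plain value.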
Next up we need to define the syntax of PackageDeclarations, which makes use of the QualifiedName rule but doesn’t make use of anything we haven’t yet talked about.
PackageDeclaration:
'package' name=QualifiedName '{'
(elements+=AbstractElement)*
'}';
Having qualified names at hand, we want to specify cross-references that way, too. As mentioned, Xtext by default assumes the rule ID as the syntax for cross-references, which has been fully sufficient so far. But now we want to allow fully qualified names as well, so we explicitly specify the syntax rule after a delimiter '|'. Note that this '|' has nothing to do with the previously introduced alternatives operator; in the context of a cross-reference it is simply a delimiter. The rules now read:
Entity:
'entity' name=ID ('extends' superType=[Entity|QualifiedName])? '{'
(features+=Feature)*
'}';
and
TypeRef:
referenced=[Type|QualifiedName] (multi?='*')?;
As the last step, we introduce imports. An Import is an instance of AbstractElement, too, so that it can occur as a child of DomainModel as well as of PackageDeclaration.
AbstractElement:
PackageDeclaration | Type | Import;
An imported namespace is not just a qualified name but also allows an optional wildcard character '*' at the end, so that multiple names can be imported with one statement. That requires a new data type rule, which we name QualifiedNameWithWildCard:
QualifiedNameWithWildCard:
QualifiedName '.*'?;
The declaration of the Import rule should look familiar by now:
Import:
'import' importedNamespace=QualifiedNameWithWildCard;
The default scoping implementation is based on naming conventions. First, everything that has a name is referenceable. By default, something has a name if it has a property 'name'. If such an EAttribute is available, the default implementation computes a qualified name by asking the container for its name and concatenating the two names, separated by a dot. The computation of qualified names can be changed arbitrarily by implementing an IQualifiedNameProvider. The other naming convention is that if an element has an EAttribute 'importedNamespace', its value is used as a namespace import and is automatically prefixed to any simple names used within that namespace. Also, the asterisk '*' is used as a wildcard by default.
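The two conventions can be illustrated with a small sketch. It is a strongly simplified model of the idea, not the Xtext implementation: the class and method names are made up, only one enclosing namespace is considered, and the lookup order (enclosing namespace, then imports, then absolute name) is an assumption of this example.

```java
import java.util.List;
import java.util.Set;

final class ScopingSketch {
    // Qualified name = qualified name of the container + '.' + own simple name.
    static String qualifiedName(String containerName, String simpleName) {
        if (containerName == null || containerName.isEmpty()) {
            return simpleName;
        }
        return containerName + "." + simpleName;
    }

    // Tries to find the qualified name that a used name refers to.
    // Returns null if nothing matches.
    static String resolve(String usedName, String enclosingNamespace,
                          List<String> importedNamespaces, Set<String> knownNames) {
        // 1. Relative to the enclosing namespace.
        String local = qualifiedName(enclosingNamespace, usedName);
        if (knownNames.contains(local)) {
            return local;
        }
        // 2. Via namespace imports.
        for (String imported : importedNamespaces) {
            if (imported.endsWith(".*")) {
                // "my.types.*" -> prefix "my.types." is put in front of the used name
                String prefix = imported.substring(0, imported.length() - 1);
                if (knownNames.contains(prefix + usedName)) {
                    return prefix + usedName;
                }
            } else if (imported.endsWith("." + usedName) && knownNames.contains(imported)) {
                return imported;
            }
        }
        // 3. As an absolute name.
        return knownNames.contains(usedName) ? usedName : null;
    }
}
```

Applied to the example model, the simple name String used inside package my.entities is found through the import my.types.*, while Session is found directly in the enclosing package.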
That’s all for the grammar. It should now read as
grammar org.eclipse.xtext.example.Domainmodel
with org.eclipse.xtext.common.Terminals
generate domainmodel "http://www.eclipse.org/xtext/example/Domainmodel"
DomainModel:
(elements+=AbstractElement)*;
AbstractElement:
PackageDeclaration | Type | Import;
Import:
'import' importedNamespace=QualifiedNameWithWildCard;
PackageDeclaration:
'package' name=QualifiedName '{'
(elements+=AbstractElement)*
'}';
Type:
Entity | DataType;
DataType:
'datatype' name=ID;
Entity:
'entity' name=ID ('extends' superType=[Entity|QualifiedName])? '{'
(features+=Feature)*
'}';
Feature:
name=ID ':' type=TypeRef;
TypeRef:
referenced=[Type|QualifiedName] (multi?='*')?;
QualifiedName:
ID ('.' ID)*;
QualifiedNameWithWildCard:
QualifiedName '.*'?;
Now you should regenerate the language infrastructure as described in the previous section, and give the editor a try. You can even split up your model into smaller parts and have cross-references across file boundaries, as long as the referenced models are on the classpath.