Now that we know how to create a language we should talk about processing the parsed models somehow. There are typically two useful things one can do with Xtext models: One is translating them to another programming language, i.e. writing a code generator, the other is loading them at runtime and use them dynamically. We’ll talk about code generation later. In this chapter we want to see how to interact with Xtext models programmatically.
Text files parsed by Xtext are represented as object graphs in memory. We call these object graphs Abstract Syntax Tree (AST), semantic model or simply model interchangeably. In Xtext models are implemented using the Eclipse Modeling Framework (EMF), which can be seen as a very powerful version of JavaBeans. It not only provides the typical getter and setter methods for the different features of a model element but also comes with an long list of advanced concepts and semantics, which are extremely useful in the context of Xtext.
EMF models can be persisted by the means of a so called Resource. Xtext languages implement the Resource interface which is why you can use the EMF API to load a model into memory (and also save them):
new DomainmodelStandaloneSetup().createInjectorAndDoEMFRegistration();
ResourceSet rs = new ResourceSetImpl();
Resource resource = rs.getResource(URI.createURI("./mymodel.dmodel"), true);
EObject eobject = resource.getContents().get(0);
The first line initializes the language infrastructure to run in standalone mode. That is EMF is designed to work in Eclipse and therefore makes use of Equinox extension points in order to register factories and the like. In a vanilla Java project there is no Equinox, hence we do the registration programmaticly. The generated MyLanguageStandaloneSetup class does just that. You don’t have to do this kind of initialization when you run your plug-ins within Eclipse, since in that case the extension point declarations are used.
The other thing the StandaloneSetup takes care of is creating a Guice injector. The use of Guice and Dependency Injection is explained here.
Now that the language infrastructure is initialized and the different contributions to EMF are registered, we want to load a Resource. To do so we first create a ResourceSet, which as the name suggests represents a set of Resources. If one Resource references another Resource, EMF will automatically load that other Resource into the same ResourceSet as soon as the cross-reference is resolved. Resolution of cross-references is done lazy, i.e. on first access.
The 4th line loads the Resource using the resource set. We pass in a URI which points to the file in the file system. EMF’s URI is a powerful concept. It supports a lot of different schemes to load resources from file system, web sites, jars, OSGi bundles or even from Java’s classpath. And if that is not enough you can come up with your own schemes. Also a URI can not only point to a resource but also to any EObject in a resource. This is done by appending a so called URI fragment to the URI.
The second parameter denotes whether the resource should automatically be loaded if it wasn’t already before. Alternatively we could have written
Resource resource = rs.getResource(URI.createURI("./mymodel.dmodel"), false);
resource.load(null);
The load method optionally takes a map of properties, which allows to define a contract between a client and the specific implementation. In Xtext, for instance, we use the map to state whether cross-references should be eagerly resolved. In order to find out what properties are supported, it’s usually best to look into the concrete implementations. That said, in most cases you don’t need to pass any properties at all.
The last line
EObject eobject = resource.getContents().get(0);
assigns the root element to a local variable. Actually it is the first element from the contents list of a Resource, but in Xtext a Resource always has just one root element.
We have previously talked about Ecore models, which effectively is an EPackage containing a any number of EClasses with EAttributes and EReferences. Ecore defines additional concepts but they are not so important here. An EObject is an instance of an EClass. For instance, the root element of a domain model would be an instance of the EClass DomainModel, as defined in the grammar in the last chapter. EMF provides a reflection layer to work with EObjects in a generic way.
assertEquals("DomainModel", eobject.eClass().getName());
It is also possible to create new instances or get and set values using the reflection layer. That reflection layer is very helpful when creating generic libraries for EMF models, however if you know what kind of model you are working with it’s much nicer to program against the generated Java classes. As we know that the root element is an instance of DomainModel we can just cast it to the Java interface generated by EMF:
DomainModel dm = (DomainModel) eobject;
The generated Java types provide getter and setter methods for each EAttribute and EReference, so that you can easily navigate the model using Java:
EList<AbstractElement> elements = domainModel.getElements();
for (AbstractElement abstractElement : elements) {
if (abstractElement instanceof Entity) {
Entity entity = ((Entity)abstractElement);
System.out.println("entity "
+ entity.getName()
+ " extends "
+ entity.getSuperType().getName());
}
}
Note that you’ll find a lot of convenience API in EMF’s EcoreUtil and Xtext’s EcoreUtil2.
In many situations the information from the AST is sufficient, but in some situations you need additional syntactical information. In Xtext not only an AST is constructed while parsing but also a so called parse tree, which contains all the textual information chunked in so called tokens. The parse tree, also called node model, consists of two different kinds of nodes.
LeafNodes as the name suggests represent the leafs of the parse tree. Each leaf node represents one token. If you go through all leaf nodes of a parse tree and concatenate all the tokens to one string, you’ll end up with the whole textual representation of the model, including all hidden tokens such as whitespace and comments. The following code does exactly that:
CompositeNode node = NodeUtil.getNode(domainModel);
Iterable<AbstractNode> contents = NodeUtil.getAllContents(node);
StringBuffer text = new StringBuffer();
for (AbstractNode abstractNode : contents) {
if (abstractNode instanceof LeafNode)
text.append(((LeafNode)abstractNode).getText());
}
System.out.println(text);
In addition to the text LeafNodes also hold information about the line, the offset and the length of that token.
The other node type is called CompositeNode and is created for almost each grammar element. A composite node can contain other composite nodes and leaf nodes. The super type of both node types is AbstractNode. One can find a couple of convenience methods in NodeUtil and ParseTreeUtil.
Also the grammar is represented as an EMF model and can be used in Java. In fact each node of the node model references the element from the grammar which was responsible for parsing or lexing that node:
DomainModel domainModel = (DomainModel) eObject;
CompositeNode node = NodeUtil.getNode(domainModel);
ParserRule parserRule = (ParserRule) node.getGrammarElement();
assertEquals("DomainModel", parserRule.getName());
In a running Xtext Workbench, there are a number of components which access the semantic model of an open editor, i.e. the parser, the linker, the validator, the outline, the index builder etc. While some of these components are executed by the display thread, others like the parser or the indexer use different concurrent threads to not deteriorate the editing experience. If you for example want to have a consistent outline of your model, it is essential to keep other threads from modifying the model while the outline component reads it.
Many of the prominent locations where users can hook in their own code in Xtext are already called from within a thread safe context, e.g. the API for quick fixes. Consequently, the following usually applies only if you add additional functionality on top of Xtext, e.g. custom UI actions.
Each XtextEditor uses an IXtextDocument to store its model. To avoid synchronization trouble, neither of them provides direct access to the XtextResource storing the semantic model. Instead, the IXtextDocument has two methods readOnly() and modify(). Both take an argument of type IUnitOfWork (<T>, IXtextResource) which defines a method <T> exec(IXtextResource) that contains what you want to do with the model and allows to deliver a result of arbitrary type.
So here is an example of safely reading a model:
IXtextDocument myDocument = ...;
String rootElementName = myDocument.readOnly(
new IUnitOfWork(){
public String exec(IXtextResource resource) {
MyType type = (MyType)resource.getContents().get(0);
return myType.getName();
}
});
Direct write-access on the document is usually only performed inside the framework. If you want to change a document by means of its semantic model, you should rather use an IDocumentEditor which uses the modify() method internally but takes care of synchronizing the node model, too:
@Inject
private IDocumentEditor documentEditor;
public void setRootName(IXtextDocument myDocument,
final String newName) {
documentEditor.process(
new IUnitOfWork.Void() {
public void process(IXtextResource resource) {
MyType type = (MyType)resource.getContents().get(0);
myType.setName(newName);
}
}, myDocument);
}
Let’s summarize what we have learned: An Xtext model is loaded by an EMF Resource. The main model is represented as an instance of so called EClasses which are themselves declared within Ecore models. A parse tree is created as well, which effectively acts as a tracing model between the text, the AST and the grammar. The following diagram illustrates the four different kinds of models.