A formatter can be implemented via the IFormatter service. Technically speaking, a formatter is a token stream which inserts/removes/modifies hidden tokens (whitespace, line-breaks, comments).
The formatter is invoked during the serialization phase and when the user triggers formatting in the editor (for example, using the CTRL+SHIFT+F shortcut).
Xtext ships with two formatters:
The OneWhitespaceFormatter simply writes one whitespace between all tokens.
The AbstractDeclarativeFormatter allows advanced configuration using a FormattingConfig. Both are explained below.
A declarative formatter can be implemented by sub-classing AbstractDeclarativeFormatter, as shown in the following example:
public class ExampleFormatter extends AbstractDeclarativeFormatter {

  @Override
  protected void configureFormatting(FormattingConfig c) {
    ExampleLanguageGrammarAccess f = (ExampleLanguageGrammarAccess) getGrammarAccess();

    c.setAutoLinewrap(120);

    // find common keywords and specify formatting for them
    for (Pair<Keyword, Keyword> pair : f.findKeywordPairs("(", ")")) {
      c.setNoSpace().after(pair.getFirst());
      c.setNoSpace().before(pair.getSecond());
    }
    for (Keyword comma : f.findKeywords(",")) {
      c.setNoSpace().before(comma);
    }

    // formatting for grammar rule Line
    c.setLinewrap(2).after(f.getLineAccess().getSemicolonKeyword_1());
    c.setNoSpace().before(f.getLineAccess().getSemicolonKeyword_1());

    // formatting for grammar rule TestIndentation
    c.setIndentationIncrement().after(
        f.getTestIndentationAccess().getLeftCurlyBracketKeyword_1());
    c.setIndentationDecrement().before(
        f.getTestIndentationAccess().getRightCurlyBracketKeyword_3());
    c.setLinewrap().after(
        f.getTestIndentationAccess().getLeftCurlyBracketKeyword_1());
    c.setLinewrap().after(
        f.getTestIndentationAccess().getRightCurlyBracketKeyword_3());

    // formatting for grammar rule Param
    c.setNoLinewrap().around(f.getParamAccess().getColonKeyword_1());
    c.setNoSpace().around(f.getParamAccess().getColonKeyword_1());

    // formatting for Comments
    c.setLinewrap(0, 1, 2).before(f.getSL_COMMENTRule());
    c.setLinewrap(0, 1, 2).before(f.getML_COMMENTRule());
    c.setLinewrap(0, 1, 1).after(f.getML_COMMENTRule());
  }
}
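If the formatter is not already wired up by the generated formatter fragment, it can be bound in the language's Guice runtime module. A minimal sketch, with assumed module class names for the example language:

public class ExampleLanguageRuntimeModule extends AbstractExampleLanguageRuntimeModule {
  // the declarative binding convention (bind<TypeName>) makes Xtext use
  // ExampleFormatter as the IFormatter implementation for this language
  public Class<? extends IFormatter> bindIFormatter() {
    return ExampleFormatter.class;
  }
}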
The formatter has to implement the method configureFormatting(...) which declaratively sets up a FormattingConfig.
The FormattingConfig consists of general settings and a set of formatting instructions:
setAutoLinewrap(int) defines the number of characters after which a line-break should be dynamically inserted between two tokens. The instructions setNoLinewrap(), setNoSpace() and setSpace(space), combined with any of the matching criteria below, suppress this behavior locally. The default is 80 characters.
By default, the declarative formatter inserts one whitespace character between two tokens. Instructions can be used to specify a different behavior. They consist of two parts: when to apply the instruction and what to do.
To understand when an instruction is applied, think of a stream of tokens, where each token is associated with its corresponding grammar element. The instructions are matched against these grammar elements. The following matching criteria exist (a short sketch of the less common criteria follows the list):
after(ele): The instruction is applied after the grammar element ele has been matched. For example, if your grammar uses the keyword “;” to end lines, this can instruct the formatter to insert a line break after the semicolon.
before(ele): The instruction is executed before the matched element. For example, if your grammar contains lists which separate their values with the keyword “,”, you can instruct the formatter to suppress the whitespace before the comma.
around(ele): This is the same as before(ele) combined with after(ele).
between(ele1, ele2): This matches if ele2 directly follows ele1 in the document, i.e. there must be no other tokens between ele1 and ele2.
bounds(ele1, ele2): This is the same as after(ele1) combined with before(ele2).
range(ele1, ele2): The rule is enabled when ele1 is matched, and disabled when ele2 is matched. Thereby, the rule is active for the complete region which is surrounded by ele1 and ele2.
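The between, bounds and range criteria are not used in the example above. A minimal sketch of how they combine with the instructions, assuming a hypothetical rule Block with the keywords 'begin' and 'end' (all accessor names below are made up):

// between: applies only if the two elements follow each other directly
c.setNoSpace().between(
    f.getBlockAccess().getBeginKeyword_0(),
    f.getBlockAccess().getEndKeyword_2());
// bounds: shorthand for after(ele1) combined with before(ele2)
c.setLinewrap().bounds(
    f.getBlockAccess().getBeginKeyword_0(),
    f.getBlockAccess().getEndKeyword_2());
// range: the instruction stays active for the whole region between the elements
c.setNoLinewrap().range(
    f.getBlockAccess().getBeginKeyword_0(),
    f.getBlockAccess().getEndKeyword_2());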
The term token is used slightly differently here than in the parser/lexer. Here, a token is a keyword or the string that is matched by a terminal rule, data type rule or cross-reference. In the terminology of the lexer, a data type rule can match a composition of multiple tokens.
The parameter ele can be a grammar’s AbstractElement or a grammar’s AbstractRule. All grammar rules and almost all abstract elements can be matched. This includes rule calls, parser rules, groups and alternatives. The semantics of before(ele), after(ele), etc. for rule calls and parser rules are identical to when the parser would “pass” this part of the grammar. The stack of called rules is taken into account. The following abstract elements cannot have formatting instructions assigned:
Actions. E.g. {MyAction} or {MyAction.myFeature=current}.
Grammar elements nested in data type rules. This is due to the fact that tokens matched by a data type rule are treated as atomic by the serializer. To format these tokens, please implement a ValueConverter (see the sketch after this list).
Grammar elements nested in CrossReferences.
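A value converter controls how the text of a data type rule token is parsed and, more importantly for formatting, how it is serialized again. A minimal sketch, assuming a hypothetical data type rule DECIMAL and using IValueConverter and friends from org.eclipse.xtext.conversion:

public class ExampleValueConverterService extends DefaultTerminalConverters {

  // invoked for every token matched by the DECIMAL rule (the rule name is an assumption)
  @ValueConverter(rule = "DECIMAL")
  public IValueConverter<BigDecimal> DECIMAL() {
    return new IValueConverter<BigDecimal>() {
      public BigDecimal toValue(String string, INode node) throws ValueConverterException {
        if (string == null || string.trim().isEmpty())
          throw new ValueConverterException("Empty value.", node, null);
        return new BigDecimal(string.trim());
      }
      public String toString(BigDecimal value) {
        // this is the text the serializer writes, so the formatting
        // of the token is decided here
        return value.toPlainString();
      }
    };
  }
}

Such a service has to be bound to IValueConverterService in the language's runtime module to take effect.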
After having explained how rules can be activated, this is what they can do:
setIndentationIncrement() increments indentation by one unit at this position. Whether one unit consists of one tab character or of spaces is defined by IIndentationInformation (see the sketch after this list). The default implementation consults Eclipse’s PreferenceStore.
setIndentationDecrement() decrements indentation by one unit.
setLinewrap(): Inserts a line-wrap at this position.
setLinewrap(int count): Inserts count line-wraps at this position.
setLinewrap(int min, int default, int max): If the number of line-wraps that were present at this position before formatting can be determined (e.g. when a node model is present), then the number of line-wraps is adjusted to be within the interval [min, max] and is then reused. In all other cases, default line-wraps are inserted. Example: setLinewrap(0, 0, 1) will preserve existing line-wraps, but won’t allow more than one line-wrap between two tokens.
setNoLinewrap(): Suppresses automatic line wrap, which may occur when the line’s length exceeds the defined limit.
setSpace(String space): Inserts the string space at this position. If you use this to insert something other than whitespace, tabs or newlines, a small puppy will die somewhere in this world.
setNoSpace(): Suppresses the whitespace between tokens at this position. Be aware that between some tokens a whitespace is required to maintain a valid concrete syntax.
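The indentation unit mentioned above for setIndentationIncrement()/setIndentationDecrement() can be customized by implementing org.eclipse.xtext.formatting.IIndentationInformation. A minimal sketch that indents with two spaces instead of the default (class name assumed):

public class TwoSpaceIndentation implements IIndentationInformation {
  // the returned string is inserted once per indentation level
  @Override
  public String getIndentString() {
    return "  ";
  }
}

// in the language's runtime module (class name assumed), the declarative
// binding would replace the default, preference-store based implementation:
public Class<? extends IIndentationInformation> bindIIndentationInformation() {
  return TwoSpaceIndentation.class;
}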
Sometimes, if a grammar contains many similar elements for which the same formatting instructions ought to apply, it can be tedious to specify them for each grammar element individually. The IGrammarAccess provides convenience methods for this. The find methods are available for the grammar and for each parser rule, as sketched after this list.
findKeywords(String... keywords) returns all keywords that equal one of the parameters.
findKeywordPairs(String leftKw, String rightKw): returns tuples of keywords from the same grammar rule. Pairs are matched nested and sequentially. Example: for Rule: ‘(’ name=ID (‘(’ foo=ID ‘)’) ‘)’ | ‘(’ bar=ID ')' findKeywordPairs(“(”, “)”) returns three pairs.
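The same find methods can also be called on the access object of a single parser rule, so that the instructions only affect keywords of that rule. A minimal sketch, assuming a hypothetical rule Entity that uses commas and curly braces (the accessor name getEntityAccess() is made up):

for (Keyword comma : f.getEntityAccess().findKeywords(",")) {
  c.setNoSpace().before(comma);
  c.setLinewrap().after(comma);
}
for (Pair<Keyword, Keyword> braces : f.getEntityAccess().findKeywordPairs("{", "}")) {
  c.setIndentationIncrement().after(braces.getFirst());
  c.setIndentationDecrement().before(braces.getSecond());
  c.setLinewrap().after(braces.getFirst());
  c.setLinewrap().before(braces.getSecond());
}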
Although inter-Xtext linking is not done by URIs, you may want to be able to reference your EObject from non-Xtext models. In those cases URIs are used, which are made up of a part identifying the resource and a second part that points to an object. Each EObject contained in a resource can be identified by a so-called fragment.
A fragment is a part of an EMF URI and needs to be unique per resource.
The generic resource shipped with EMF provides a generic path-like computation of fragments. These fragment paths are unique by default and do not have to be serialized. On the other hand, they can be easily broken by reordering the elements in a resource.
With an XMI or other binary-like serialization it is also common and possible to use UUIDs. UUIDs are usually binary and technical, so you don’t want them in human readable representations.
However with a textual concrete syntax we want to be able to compute fragments out of the human readable information. We don’t want to force people to use UUIDs (i.e. synthetic identifiers) or fragile, relative, generic paths in order to refer to EObjects.
Therefore one can contribute a so-called IFragmentProvider per language. It has two methods: getFragment(EObject, Fallback) to calculate the fragment of an EObject and getEObject(Resource, String, Fallback) to go the opposite direction. The Fallback interface allows delegating to the default strategy – usually the fragment paths described above.
The following snippet from the GMF Example shows how to use qualified names as fragments:
public class QualifiedNameFragmentProvider implements IFragmentProvider {

  @Inject
  private IQualifiedNameProvider qualifiedNameProvider;

  @Override
  public String getFragment(EObject obj, Fallback fallback) {
    String qualifiedName = qualifiedNameProvider.getQualifiedName(obj);
    return qualifiedName != null ? qualifiedName : fallback.getFragment(obj);
  }

  @Override
  public EObject getEObject(Resource resource, String fragment, Fallback fallback) {
    if (fragment != null) {
      Iterator<EObject> i = EcoreUtil.getAllContents(resource, false);
      while (i.hasNext()) {
        EObject eObject = i.next();
        String candidateFragment = (eObject.eIsProxy())
            ? ((InternalEObject) eObject).eProxyURI().fragment()
            : getFragment(eObject, fallback);
        if (fragment.equals(candidateFragment))
          return eObject;
      }
    }
    return fallback.getEObject(fragment);
  }
}
For performance reasons it is usually a good idea to navigate the resource based on the fragment information instead of traversing it completely. If you know that your fragment is computed from qualified names and your model contains something like NamedElements, you should split your fragment into those parts and query the root elements, the children of the best match and so on.
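A minimal sketch of such a fragment-guided lookup, replacing the getEObject(...) implementation above. It assumes the fragment is a dot-separated qualified name and that each segment can be matched against the 'name' attribute of a child element (resolved here via org.eclipse.xtext.util.SimpleAttributeResolver):

public EObject getEObject(Resource resource, String fragment, Fallback fallback) {
  if (fragment == null)
    return fallback.getEObject(fragment);
  List<? extends EObject> scope = resource.getContents();
  EObject current = null;
  for (String segment : fragment.split("\\.")) {
    current = null;
    for (EObject candidate : scope) {
      // match the segment against the candidate's 'name' attribute
      if (segment.equals(SimpleAttributeResolver.NAME_RESOLVER.getValue(candidate))) {
        current = candidate;
        break;
      }
    }
    if (current == null)
      return fallback.getEObject(fragment); // no match: use the default strategy
    scope = current.eContents();
  }
  return current;
}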
Furthermore it’s a good idea to have some kind of conflict resolution strategy to be able to distinguish between equally named elements that actually are different, e.g. properties may have the very same qualified name as entities.
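A simple way to get such a conflict resolution is to make the element's type part of the fragment, so that an entity and a property with the same qualified name still get distinct fragments. A sketch of the corresponding getFragment(...) variant (getEObject(...) would have to strip the prefix again):

public String getFragment(EObject obj, Fallback fallback) {
  String qualifiedName = qualifiedNameProvider.getQualifiedName(obj);
  // prefix with the EClass name, e.g. "Property:my.example.name"
  return qualifiedName != null
      ? obj.eClass().getName() + ":" + qualifiedName
      : fallback.getFragment(obj);
}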