ANTLR Grammar Optimisation: What Worked and What Didn't
Atfinity currently relies on ANTLR to properly parse data when configuring our software. We have therefore committed ourselves to getting the most out of ANTLR, optimising the process as much as possible.
In this article, we want to cover our most ambitious round of optimisations so far, tackling the ANTLR grammar, the parsing process, visitors, listeners, and more. This project was done in collaboration with the ZHAW School of Engineering in Zurich, with whom we have worked in the past, and Luca Marceca, who directly worked on and authored the project.
With 25 different methods performed and measured, we want to go over what worked, what backfired, and what failed to move the needle, to hopefully help you achieve similar results.

Understanding ANTLR 4 grammar
First off, we’d like to explain what ANTLR is currently capable of. The move from ANTLR 3 to ANTLR 4 brought many important improvements, as LL(*) parsing was replaced by ALL(*), or Adaptive LL(*), parsing. This addressed many issues present in ANTLR 3, such as:
- Elimination of backtracking – Unlike ANTLR 3, which relies on backtracking as a fallback strategy when DFA-based lookahead fails, ANTLR 4 dynamically analyses multiple parsing paths, reducing inefficiencies caused by backtracking.
- Dynamic DFA construction – ANTLR 4 does not compute all possible lookahead sequences in advance. Instead, it creates DFAs dynamically based on the input sequence, improving efficiency compared to ANTLR 3’s static DFA computation.
- Support for left recursion – ANTLR 4 introduces a left-recursion detection and elimination mechanism, allowing it to handle left-recursive rules, which ANTLR 3 cannot process.
- Subparser execution for decision-making – Instead of relying on static DFA-based decisions like ANTLR 3, ANTLR 4 spawns subparsers for different parsing alternatives. These subparsers run in pseudo-parallel, and unsuccessful paths are discarded dynamically.
- Graph-Structured Stack (GSS) – ANTLR 4 introduces GSS, which links multiple parser contexts efficiently, avoiding cycles that could cause infinite loops. This enhances handling of recursive and nested structures.
With all of that being said, ANTLR 4 is optimised for user-friendliness rather than raw speed. Therefore, many large and small tweaks can be made to the grammar, parsers, visitors and listeners to significantly lower the processing time.
What worked
In this section, we will cover 12 optimisations that, when implemented together, have led to an optimisation of 85.6% for unit tests with the generated ANTLR Python parser. As you will see later on, there is potential for additional optimisations, but currently these 12 produced the best overall speed increase when implemented together.
With that being said, it’s worth keeping in mind that optimisations of this nature are dependent on the grammar being used. Therefore, while this can serve as a good starting point for figuring out how to optimise your own grammar, results may vary.
1. Inlining built-in functions (optimisation of 50%)
Currently, ANTLR 4, using the ALL(*) algorithm, follows each nonterminal and creates new subparsers for each branch, which slows things down significantly. However, we’ve found that removing explicitly defined lexer rules drastically reduces the number of subparsers that need to be created.
We built a proof-of-concept model and found that while the original parser has 12347 lines of code, the newly optimised parser has only 7325. This number can be lowered even further, to just 6880 lines, by outsourcing the built-in-function-call rule from the grammar to the visitor, but this comes with added issues which we will talk about later on.
Before:
buildin_function_call:
buildin_functions OPENING_PARENTHESES (args | named_args)?
CLOSING_PARENTHESES;
buildin_functions: (add_days_ | add_months_ | …) #method_name;
add_days_: 'ADD_DAYS' | 'add_days';
add_months_: 'ADD_MONTHS' | 'add_months';
After:
buildin_function_call: name OPENING_PARENTHESES (args |
named_args)? CLOSING_PARENTHESES;
name: 'ADD_DAYS' | 'add_days' | 'ADD_MONTHS' | 'add_months' | …;
2. 2-stage parsing (optimisation of 15.2%)
Currently, ANTLR 4 automatically switches between SLL and LL, which can lead to some wasted effort. To speed things up, we used a 2-stage parsing strategy in tandem with Speedy ANTLR, which we will talk about in a later section. We also implemented a strict order to ensure that the switch is only made when necessary. Namely:
- Parsing is initially done in SLL. Most decisions are SLL (context-free) and therefore it should have priority.
- Only when an error appears does it switch to LL mode (context-aware but slower) and reprocess.
To facilitate this in practice, we implemented the following error strategy:
- SLL mode → BailErrorStrategy (abort on errors, preventing wasted effort).
- LL mode → DefaultErrorStrategy (standard error recovery).
Defining more LL(1) productions in the grammar further reduced unnecessary switches, making parsing even faster.
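To make the control flow concrete, here is a minimal, self-contained Python sketch of the strategy. The `parse` callable stands in for configuring and running the generated parser in a given prediction mode; with the real antlr4-python3-runtime, the first stage corresponds to setting the parser’s prediction mode to SLL with a BailErrorStrategy, and the second to re-parsing with the DefaultErrorStrategy. The names `SllBailout` and `parse_two_stage` are our own for illustration, not part of ANTLR.

```python
class SllBailout(Exception):
    """Stand-in for the exception BailErrorStrategy raises when fast
    SLL prediction hits an error (ParseCancellationException in ANTLR)."""


def parse_two_stage(parse):
    """Run the fast SLL stage first; only re-parse in full LL on failure.

    `parse(mode)` stands in for running an ANTLR parser in the given
    prediction mode ("SLL" or "LL") and returning a parse tree.
    """
    try:
        return parse("SLL")   # stage 1: fast, context-free prediction
    except SllBailout:
        return parse("LL")    # stage 2: slower, context-aware re-parse
```

For inputs that are valid under SLL, which in our experience is the vast majority, the second stage never runs, which is where the speed-up comes from.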
3. Generalisation of the method call rules (optimisation of 6%)
This optimisation merges redundant rules for built-in and user-defined function calls into a single, more flexible rule. Instead of maintaining separate definitions, a generalised function_call rule is introduced, allowing both function types to share a common structure.
Before:
buildin_function_call:
function_name OPENING_PARENTHESES (args | named_args)?
CLOSING_PARENTHESES;
user_function_call:
function_name OPENING_PARENTHESES args? CLOSING_PARENTHESES;
After:
function_call:
name OPENING_PARENTHESES (args | named_args)?
CLOSING_PARENTHESES;
name: /* Buildin functions… */ | IDENTIFIER;
This approach simplifies the grammar, reduces redundancy, and enhances flexibility, allowing for future expansions without modifying the core syntax.
4. Left-factorisation of the declaration sub-production (optimisation of 2.2%)
The declaration production contains left-factorisable structures, which means that multiple alternatives can be merged. For example, in the following existence_identifier, the three IDENTIFIER alternatives can be merged into just one.
Before:
existence_identifier:
IDENTIFIER IS #single_match
| IDENTIFIER IS_OPTIONAL #single_match_optional
| IDENTIFIER IS_ALL #multiple_matches_single_identifier
| identifiers_to_unpack rest_identifier IS_ALL
#multiple_matches_multiple_identifiers;
After:
existence_identifier:
IDENTIFIER (IS | IS_OPTIONAL | IS_ALL) #single_match
| identifiers_to_unpack rest_identifier IS_ALL
#multiple_matches_multiple_identifiers;
5. Rule resolutions in declaration sub-production (optimisation of 2.7%)
We inlined the identifiers_to_unpack sub-production within the existence_identifier rule to reduce unnecessary lookahead checks.
Previously, existence_identifier had two alternatives, both starting with IDENTIFIER. This meant ANTLR 4 required a lookahead greater than 1 to determine whether the input should match the first or second alternative.
By directly integrating identifiers_to_unpack, the parser can now immediately decide which alternative to apply, improving efficiency in SLL mode. The logic for handling single vs multiple identifiers was shifted to the visitor, where we simply check whether the single-match label is None in Python.
Before:
existence_identifier:
IDENTIFIER (IS | IS_OPTIONAL | IS_ALL) #single_match
| identifiers_to_unpack rest_identifier IS_ALL
#multiple_matches_multiple_identifiers;
identifiers_to_unpack:
IDENTIFIER (LIST_SEPARATOR IDENTIFIER)*;
After:
existence_identifier:
IDENTIFIER (single=(IS | IS_OPTIONAL | IS_ALL) |
(LIST_SEPARATOR IDENTIFIER)* rest_identifier
IS_ALL)#match;
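On the visitor side, that check is a one-liner. The sketch below is hypothetical (the real generated context class has more structure than this); it only illustrates how the merged #match rule is disambiguated after parsing by inspecting the single token label:

```python
def visit_match(ctx):
    """Distinguish single vs multiple matches for the merged #match rule.

    `ctx.single` corresponds to the `single=(IS | IS_OPTIONAL | IS_ALL)`
    token label in the grammar: it is None whenever the parser took the
    multiple-identifiers branch instead.
    """
    if ctx.single is not None:
        return ("single_match", ctx.single)
    return ("multiple_matches", None)
```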
6. Merging the operators (optimisation of 3%)
This optimisation focuses on simplifying operator handling while preserving precedence and associativity in expressions. It can be broken down into the following adjustments.
Merging infix operators
- Previously, some operators (like AND) were defined indirectly through separate non-terminals.
- The new approach directly integrates lexer rules into the expression rule, making all operators structurally uniform.
- This reduces unnecessary productions and improves parsing efficiency.
Grouping operators with equal precedence
- Operators with the same priority (e.g., AND and AND_THEN) are combined into single rules using |.
- This eliminates redundancy while maintaining correct parsing behavior.
Preserving precedence in arithmetic operators
- Unlike logical operators, arithmetic operators cannot be merged indiscriminately because different operators have different precedence (e.g., TIMES has higher precedence than PLUS).
- Merging them into a single rule would break the order of operations, leading to incorrect parse trees.
Handling associativity correctly
- ANTLR defaults to left-associativity, which works for most operators.
- However, exponentiation (POWER) should be right-associative (i.e., a^b^c should be parsed as a^(b^c)).
- Using <assoc=right> ensures that expressions like a^b^c are interpreted correctly.
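The effect is easy to verify outside ANTLR: Python’s ** operator is itself right-associative, so it computes exactly what <assoc=right> makes the grammar produce, while the parenthesised form shows what an (incorrect) left-associative parse would yield:

```python
# 2 ^ 3 ^ 2 under right associativity: 2 ^ (3 ^ 2) = 2 ^ 9
right_assoc = 2 ** 3 ** 2
# ...and under (incorrect) left associativity: (2 ^ 3) ^ 2 = 8 ^ 2
left_assoc = (2 ** 3) ** 2

assert right_assoc == 512
assert left_assoc == 64
```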
7. Merging variable accesses using Optional (optimisation of 1.4%)
When two member access expression definitions each begin with the same two structures, they can be merged by making one alternative optional.
Before:
expression:
base=IDENTIFIER member_access+ #alternative1
| base=IDENTIFIER member_access+ '.key()' #alternative2
Since .key() is the only difference between the two alternatives, they can be merged in the following way:
expression:
base=IDENTIFIER member_access+ key='.key()'? #alternative1
This way, a lookahead of at least two is avoided and the logic for .key() can be handled later on rather than on first parse.
8. Merging variable accesses - clever left factorisation (optimisation of 2.1%)
Common prefixes can be factorised into a single rule in order to avoid ambiguity and the need for lookahead. For example, in the code:
expression:
structure=expression member_access+ #alternative1
| base=IDENTIFIER member_access* translated_member_access
#alternative2
| structure=expression translated_member_access #alternative3
Since both alternative1 and alternative3 begin with expression, left factorisation can be used in the following way:
expression:
structure=expression (alt1 = member_access+ | alt2 =
member_access* translated_member_access)
#structure_member_access_expression
By merging the alternatives into a single structure (structure_member_access_expression), the parser can decide the correct path after recognising expression, instead of having to guess between alternatives upfront.
9. Removing the bracket expression (optimisation of 2%)
The expression:
expression: OPENING_PARENTHESES expression CLOSING_PARENTHESES
is used in many expression rules of language grammars, such as Rula, to ensure that an opening parenthesis is always followed by a closing parenthesis. However, in the previous grammar, this expression is already covered by another subrule, list_of_expressions, which can lead to ambiguity.
Therefore, that expression can be completely removed from the expression production by utilising the following format:
list_of_expressions':
OPENING_PARENTHESES (expression) CLOSING_PARENTHESES;
10. Factorisation of the built-in-function-call rule (optimisation of >1%)
The previous buildin_function_call rule has three alternative paths that differ only in their argument definitions. In principle, anywhere from zero to arbitrarily many arguments can be passed. The new grammar rule combines this information into one rule.
Before:
buildin_function_call:
(buildin_functions OPENING_PARENTHESES args
CLOSING_PARENTHESES)
| (buildin_functions OPENING_PARENTHESES named_args
CLOSING_PARENTHESES)
| (buildin_functions OPENING_PARENTHESES CLOSING_PARENTHESES);
After:
buildin_function_call:
buildin_functions OPENING_PARENTHESES (args | named_args)?
CLOSING_PARENTHESES;
11. Factorisation of the user function call rule (optimisation of >1%)
The user function calls are defined syntactically in a similar way to the built-in function calls. They can also be reduced to a single subrule.
Before:
user_function_call:
(function_name OPENING_PARENTHESES args CLOSING_PARENTHESES)
| (function_name OPENING_PARENTHESES CLOSING_PARENTHESES);
After:
user_function_call:
function_name OPENING_PARENTHESES args? CLOSING_PARENTHESES;
12. Lexer rule pattern: comment*
This optimisation is unique as it doesn’t directly affect the parsing time. However, it does help straighten out the syntax, which is why it is still very valuable.
The existing line comment lexer rule contained a | operator within the expression [\n|\r], which incorrectly allowed | as a valid character instead of representing "or" between \n and \r.
To correct this, we revised the lexer rule to properly define line break characters (\n or \r) without including |, ensuring accurate comment termination and preventing unintended parsing behaviour.
Before:
COMMENT: '#' ~[\n|\r]* -> channel(HIDDEN);
After:
COMMENT: '#' ~[\r\n]* -> channel(HIDDEN);
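The difference is easy to reproduce with equivalent Python regular expressions (a simplified stand-in for the generated lexer, not the lexer itself): the buggy character set silently terminates a comment at the first |.

```python
import re

# ~[\n|\r] translated to a regex: '|' is wrongly part of the excluded set
buggy = re.compile(r"#[^\n|\r]*")
# ~[\r\n] translated to a regex: only real line breaks end the comment
fixed = re.compile(r"#[^\r\n]*")

line = "# a | b\n"
assert buggy.match(line).group() == "# a "     # comment cut off at '|'
assert fixed.match(line).group() == "# a | b"  # full comment captured
```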
Previous optimisation - Speedy ANTLR
Since ANTLR is crucial for Atfinity’s software, this isn’t the first time we’ve put in the work to optimise the process. Specifically, we’ve used Speedy ANTLR to automatically create both a Python and C++ backend. This significantly speeds up the parsing process as we can take advantage of the better performance that comes with C++ and then automatically translate it all back into Python.
What backfired
We also want to give attention to a special case of sorts. Namely, the following optimisation actually ended up increasing the processing time, showcasing a potential pitfall when attempting to adjust ANTLR’s grammar.
1. Merging variable accesses using semantic predicates (optimisation of -6%)
Semantic predicates can be used to solve cases of ambiguity. For example, in the following code:
base=IDENTIFIER member_access+ #alternative1
| structure=expression member_access+ #alternative2
| structure=expression translated_member_access #alternative3
| base=IDENTIFIER member_access+ '.key()' #alternative4
| structure=expression OPENING_SQUARE_BRACKET member=expression
CLOSING_SQUARE_BRACKET #alternative5
There are five different rules for variable access, some of which overlap. Namely, alternative1 and alternative2 can both start with IDENTIFIER, leading to ambiguity. This happens because expression (used in alternative2) can also begin with IDENTIFIER, making it unclear which rule to follow.
To fix this, the rules are merged, and a semantic predicate is added. The new version ensures that the alt1 branch (member access) only activates if the previous token is not an IDENTIFIER. This eliminates the ambiguity by adding an extra condition at runtime rather than relying only on lookahead parsing.
base=IDENTIFIER member_access+ key='.key()'? #alternative1
| structure=expression ({self._input.LT(-1).type !=
self.IDENTIFIER}? alt1=member_access+
| alt2=OPENING_SQUARE_BRACKET member=expression
CLOSING_SQUARE_BRACKET
| alt3=translated_member_access) #alternative2
What didn’t work
In this section, we’d like to cover a few optimisations that didn’t pan out as expected. To be more precise, all of the following optimisations ended up not affecting the processing time in any meaningful way. Therefore, while they might serve a purpose in a different context, in terms of just optimising ANTLR’s grammar for faster processing, they are not worth implementing in their current state.
1. Setting available storage space
As ANTLR is largely implemented in Java, it is recommended to use the ANTLR tool as a JAR file and execute it with Java. The Java parameter -Xmx sets the maximum heap size (for example, -Xmx500M allocates 500 megabytes). A larger heap can, in principle, allow larger inputs to be processed without frequent garbage collection. However, raising the limit to -Xmx12g, and thus allocating 12 gigabytes, did not measurably change processing speed in our tests.
2. Same parser instance via multiple parsing
In ANTLR 4 there is currently no persistence of the lookahead DFAs, which emerge from the static ATNs and are required to make a decision dynamically. To try and remedy this issue in a more straightforward way, the Python parser generator template (sa_X.pyt) in the SA library was extended with the following snippet:
# Parse
global parser
if parser is None:
parser = {{grammar_name}}Parser(token_stream)
else:
parser.setTokenStream(token_stream) # includes parser.reset()
This way, the parser is only instantiated the first time. On each subsequent call, i.e. each time SA is invoked with a new input, the existing parser instance is reused and the DFA carries over.
However, this only helps within the current process, and the DFA can quickly grow without bound. For this reason, there is a discussion about alternative caching strategies within the open-source ANTLR project that could be more effective.
3. Explicit definition of lexer rules
In the generated Python parser, symbolicNames store token names, while literalNames contain lexemes. However, both lists included <INVALID> entries due to implicitly defined lexemes within productions.
To resolve this, we explicitly defined all lexemes as lexer rules, ensuring a clean mapping between symbolicNames and literalNames. This adjustment can potentially help streamline parsing and improve performance by reducing unnecessary lookups.
4. Lexer rule pattern: whitespace and newline
For spaces and line breaks, the following pattern can be used, which combines all cases in one rule.
Before:
WHITESPACE: (' ' | '\t')+ -> channel(HIDDEN);
NEWLINE: ('\r'? '\n' | '\r')+ -> channel(HIDDEN);
After:
WHITESPACE: [ \t\r\n\u000C]+ -> skip;
5. Reordering subrules - declaration
Reordering the subrules can improve efficiency, especially in 2-stage parsing with SLL mode.
For example, let’s say the first subrule starts with existence_identifier, which reduces to an IDENTIFIER token. The second subrule begins with the keyword no, which is itself a valid IDENTIFIER. This created ambiguity: when encountering no, the parser could mistakenly match the first rule before realising the second rule was the correct one.
In ANTLR 3, this would trigger backtracking, forcing the parser to restart and try the correct rule. While ANTLR 4 evaluates subrules in parallel, SLL mode (used in 2-stage parsing) relies on a simpler, faster lookahead mechanism. If subrules are ordered more predictably, SLL can resolve them without unnecessary retries, reducing parsing overhead and improving performance.
6. Left-factorisation of the declaration rule
In Rula, when two subrules of a declaration rule differ only in the first symbol, left factorisation can be carried out to merge the two rules. This reduces the number of subrules for the declaration rule, however, both visitors must also be merged.
7. Use of concrete ANTLR operators in the type_or_not subproduction
Two sub-rules of type_or_not can be reduced to one by using the optional operator ?. In addition, the asterisk operator * is changed to the plus operator + in role_choice_block. This eliminates the ambiguity between role_choice_block and type_or_not: previously, a simple IDENTIFIER token could be matched by both productions.
Before:
type_reference:
(type_or_not | role_choice_block) (LIST_SEPARATOR
(type_or_not | role_choice_block))*;
type_or_not:
(IDENTIFIER | 'not' IDENTIFIER);
role_choice_block:
IDENTIFIER (OR IDENTIFIER)*;
After:
type_reference:
(type_or_not | role_choice_block) (LIST_SEPARATOR
(type_or_not | role_choice_block))*;
type_or_not:
'not'? IDENTIFIER;
role_choice_block:
IDENTIFIER (OR IDENTIFIER)+;
8. Subrule rearrangement - expression
Subrule order is particularly important for ALL(*) in ANTLR 4 when there is an ambiguity conflict. With SLL, as used in 2-stage parsing, the order also matters with regard to backtracking, as ANTLR 4 then tests each subrule individually, starting with the first. Reordering can therefore be carried out to achieve an optimisation in these scenarios.
From this, the strategy can be derived to move SLL-parseable rules higher up, so that ANTLR does not first have to try rules that are not SLL-parseable and then switch to the second phase of 2-stage parsing. However, the order cannot be changed at will, as certain sub-rules have precedence over others. In this case, the order would be:
- Variable accesses
- Statements
- Primaries
- Token-dependent structures
- Function and method calls
- Arithmetic and logical operations
9. Elimination of left-recursion
Left recursion occurs when a rule refers to itself at the beginning of an alternative, causing infinite recursion in classical LL(k) parsing. To eliminate this, the grammar is restructured using right recursion or iteration, ensuring that parsing moves forward step by step.
This is done by:
- Defining a base case (e.g., literals, function calls) with the highest precedence, ensuring a clear starting point.
- Creating a top-level rule that references only the lowest-precedence operation, maintaining usability.
- Building a precedence chain, where higher-precedence operations are progressively defined in terms of lower ones, preventing ambiguity and infinite loops.
Instead of recursion, repetition operators ((...)*) or right-recursive structures are used to ensure efficient and predictable parsing. This keeps the parse tree shallower and avoids unnecessary backtracking, making it easier to traverse with visitors.
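As a self-contained illustration of this precedence-chain pattern (a toy calculator in Python, not the Rula grammar), each level parses the next-higher-precedence level first and then iterates over its own operators, so no rule ever begins by calling itself:

```python
import re


def tokenize(src):
    """Split an arithmetic string into numbers, operators and parentheses."""
    return re.findall(r"\d+|[+*()]", src)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):                  # lowest precedence: '+'
        value = self.term()
        while self.peek() == "+":    # iteration replaces left recursion
            self.next()
            value += self.term()
        return value

    def term(self):                  # higher precedence: '*'
        value = self.atom()
        while self.peek() == "*":
            self.next()
            value *= self.atom()
        return value

    def atom(self):                  # base case: number or parentheses
        if self.peek() == "(":
            self.next()
            value = self.expr()
            self.next()              # consume ')'
            return value
        return int(self.next())


def evaluate(src):
    return Parser(tokenize(src)).expr()
```

Because each level only ever consumes input before recursing, the parse always moves forward, and precedence falls out of the chain structure for free.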
Other potential optimisations
Lastly, we’d like to cover a few optimisations that have shown potential but don’t completely fit into the picture right now - be it because other methods are more optimal for our goals or because their implementation is outside the scope of the original research. However, we believe they’re still worth covering as they may open the doors for further optimisations when used in a different context.
1. Outsourcing the built-in-function-call rule from the grammar to the visitor (optimisation of 55%)
The function names of the built-in functions are all listed individually in the original grammar. In this optimisation step, they are moved to the respective visitor: a list of the names previously defined in the grammar is kept in the corresponding visitor, which checks whether the parsed function name is included. If so, the previous format is returned in order to achieve the same result.
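A minimal sketch of the visitor-side check (the names BUILTIN_FUNCTIONS and resolve_function_name are our own, not the generated code, and only a few built-ins are shown): the grammar matches any identifier as a function name, and the visitor decides afterwards whether it was a built-in.

```python
# Names previously enumerated in the grammar, now kept as plain Python data.
BUILTIN_FUNCTIONS = {"ADD_DAYS", "add_days", "ADD_MONTHS", "add_months"}


def resolve_function_name(name):
    """Classify a parsed function name, mirroring what the grammar encoded."""
    if name in BUILTIN_FUNCTIONS:
        # Normalise to the canonical upper-case form of the built-in.
        return ("buildin", name.upper())
    return ("user", name)
```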
You might be wondering why this optimisation isn’t part of the “optimal build”. Simply put, it works better in theory than in practice: while unit tests run much faster, visitor times become much slower. Furthermore, it is difficult to implement together with all of the other optimisation methods we’ve covered so far. Therefore, while it has potential, the first optimisation in this article is currently a much better choice, as it provides similar results while working together with other methods and not overloading the visitors.
2. Generalisation of the function call rules (optimisation of 3%)
The following three method call categories differ only in the method name and the argument type. They can therefore be reduced to one production.
Before:
method_call:
'.' method OPENING_PARENTHESES args? CLOSING_PARENTHESES;
method: 'keys';
arrow_function_method: 'map' | 'filter';
named_args_method: 'format';
named_args_method_call:
'.' named_args_method OPENING_PARENTHESES named_args
CLOSING_PARENTHESES;
arrow_function_method_call:
'.' arrow_function_method OPENING_PARENTHESES arrow_function
CLOSING_PARENTHESES;
After:
method_call:
'.' method OPENING_PARENTHESES (args | named_args |
arrow_function)? CLOSING_PARENTHESES;
method: 'keys' | 'format' | 'map' | 'filter';
However, when making the proof of concept, this optimisation method did not show measurable results.
3. Merging variable accesses v2 (optimisation of >1%)
Three member access expression definitions each begin with the same left-recursive term expression. These can be converted into a rule using left factorisation. ANTLR therefore only has to convert one rule from left recursion and the expression rule also becomes more compact.
Before:
expression:
structure=expression member_access+ #alternative1
| structure=expression OPENING_SQUARE_BRACKET member=expression
CLOSING_SQUARE_BRACKET #alternative2
| structure=expression translated_member_access #alternative3
After:
expression:
structure=expression (alt1=member_access+ |
alt2=OPENING_SQUARE_BRACKET member=expression
CLOSING_SQUARE_BRACKET |
alt3=translated_member_access) #alternative1
4. Event listeners and visitors
All of the optimisations we’ve talked about here have to do with the parser itself, with the overall optimisation of 85.6% relating only to the Python parser and the transformation from Rula to R3. However, upon inspecting the results, we found that event listeners and visitors play a crucial role in processing time.
Namely, we’ve found that evaluation visitors account for around 2 ms of visitor time in our environment, leaving room for potential optimisations. Caching strategies for the evaluation visitors could also yield further improvements.
More about ANTLR and Atfinity
As a no-code platform that helps financial institutions automate key processes such as KYC/KYB, loan origination and customer onboarding, the ability to take raw data and efficiently parse, structure, and pass it on to relevant systems is essential for Atfinity. Furthermore, we pride ourselves on our ability to adapt to large changes in the industry swiftly and efficiently, implementing the needed changes for our clients in only a few days or weeks.
For all of these reasons, getting the most that we can from ANTLR was a necessity. And with this being our second round of major optimisations, we believe we are now in a better position than ever. However, that doesn’t mean that we’re going to stop any time soon. Therefore, if you would like to stay in the loop on further developments and optimisations, make sure to keep a close eye on Atfinity.
Similarly, if you’ve made it to the end of this article and see the unique potential both Atfinity and ANTLR have, we suggest you take a look at our career page - if you’re interested in data parsing and grammar optimisation, working on the actual software should be a win for all parties involved.
Conclusion
Lastly, we’re going to cover some background information about these optimisations and this article in general. Namely, all of the optimisations mentioned above come from a much longer and more detailed paper, written by Luca Marceca as part of a collaboration between Atfinity and the ZHAW School of Engineering in Zurich.
Therefore, if you want to learn about these optimisations in more detail, as well as see how exactly all of the tests were created and performed, you can do so by reading the full paper. And, of course, I have to give props to Luca for making all of this possible!