This post covers the basics of ANTLR and the output of the proof of concept (POC).
The Big Picture
* The lexical analyzer, or lexer, breaks up the input stream into tokens.
* Tokens are vocabulary symbols emanating from the lexical analyzer.
* The parser feeds off this token stream and tries to recognize the sentence structure.
* An abstract syntax tree (AST) is a tree representation of the language syntax.
* The output can be an AST or text (generated with StringTemplate).
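To make the lexer step concrete, here is a minimal hand-written sketch of breaking a character stream into tokens. It is not the code ANTLR generates; the class name TinyLexer and the TYPE:text token format are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal hand-written lexer sketch (hypothetical; not ANTLR-generated code). */
final class TinyLexer {
    /** Splits the input into tokens written as TYPE:text, skipping blanks and tabs. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t') {            // whitespace is skipped, not emitted
                i++;
            } else if (Character.isDigit(c)) {      // INT: one or more digits
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                tokens.add("INT:" + input.substring(start, i));
            } else if (Character.isLetter(c)) {     // ID: one or more letters
                int start = i;
                while (i < input.length() && Character.isLetter(input.charAt(i))) i++;
                tokens.add("ID:" + input.substring(start, i));
            } else {                                // any other character is its own token
                tokens.add("OP:" + c);
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("x = 3 + 42"));
        // prints [ID:x, OP:=, INT:3, OP:+, INT:42]
    }
}
```

A parser would then consume this token list instead of raw characters, which is the split between lexer and parser described above.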
Definitions, Tools and usage
* A grammar describes the syntax of a language.
* ANTLRWorks: an IDE to write, verify, test, and debug the grammar
* Command line: java org.antlr.Tool grammar.g
* The tool generates the token vocabulary and the lexer and parser Java classes
Imagine a maze with a single entrance and single exit that has words written on the floor. Every path from entrance to exit generates a sentence by “saying” the words in sequence. In a sense, the maze is analogous to a grammar that defines a language.
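The maze analogy can be sketched in code: each rule is a junction, each alternative is a corridor, and walking every path from entrance to exit lists every sentence of the language. The toy grammar below (sentence, subject, verb) and the class name MazeGrammar are hypothetical, and the sketch only works for grammars without recursion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Enumerates every sentence of a tiny, non-recursive toy grammar (hypothetical example). */
final class MazeGrammar {
    // sentence : subject verb ;   subject : 'cats' | 'dogs' ;   verb : 'sleep' | 'run' ;
    static final Map<String, List<List<String>>> RULES = new LinkedHashMap<>();
    static {
        RULES.put("sentence", Arrays.asList(Arrays.asList("subject", "verb")));
        RULES.put("subject", Arrays.asList(Arrays.asList("cats"), Arrays.asList("dogs")));
        RULES.put("verb", Arrays.asList(Arrays.asList("sleep"), Arrays.asList("run")));
    }

    /** Returns all sentences derivable from the given symbol. */
    static List<String> generate(String symbol) {
        List<List<String>> alternatives = RULES.get(symbol);
        if (alternatives == null) {
            return Collections.singletonList(symbol);   // terminal: a word written on the floor
        }
        List<String> results = new ArrayList<>();
        for (List<String> alternative : alternatives) { // each alternative is one corridor
            List<String> partial = new ArrayList<>();
            partial.add("");
            for (String part : alternative) {           // walk the corridor symbol by symbol
                List<String> next = new ArrayList<>();
                for (String prefix : partial) {
                    for (String tail : generate(part)) {
                        next.add(prefix.isEmpty() ? tail : prefix + " " + tail);
                    }
                }
                partial = next;
            }
            results.addAll(partial);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(generate("sentence"));
        // prints [cats sleep, cats run, dogs sleep, dogs run]
    }
}
```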
Grammar Example
grammar Expr;
prog : stat+ ;
stat : expr NEWLINE
| ID '=' expr NEWLINE ;
expr : multExpr (('+' |'-' ) multExpr)* ;
multExpr : atom ('*' atom)* ;
atom : INT
| ID
| '(' expr ')' ;
ID : ('a'..'z' |'A'..'Z' )+ ;
INT : '0'..'9' + ;
NEWLINE : '\r'? '\n' ;
WS : (' ' |'\t' )+ {skip();} ;
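The expr, multExpr, and atom rules map naturally onto one method per rule. The following is a hand-written recursive-descent sketch of that idea, not the parser ANTLR generates; the class name ExprEval is an assumption, it evaluates the expression directly instead of building a tree, and it leaves out ID because the sketch has no variable storage.

```java
/** Hand-written recursive-descent sketch of expr/multExpr/atom (hypothetical, no ANTLR runtime). */
final class ExprEval {
    private final String src;
    private int pos = 0;

    private ExprEval(String src) { this.src = src; }

    /** Evaluates an arithmetic expression made of integers, '+', '-', '*', and parentheses. */
    static int eval(String text) {
        ExprEval p = new ExprEval(text);
        int value = p.expr();
        if (p.peek() != '\0') {
            throw new IllegalArgumentException("unexpected input at index " + p.pos);
        }
        return value;
    }

    /** Skips blanks and returns the next character, or '\0' at end of input. */
    private char peek() {
        while (pos < src.length() && src.charAt(pos) == ' ') pos++;
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    // expr : multExpr (('+' | '-') multExpr)* ;
    private int expr() {
        int value = multExpr();
        while (peek() == '+' || peek() == '-') {
            char op = src.charAt(pos++);
            int right = multExpr();
            value = (op == '+') ? value + right : value - right;
        }
        return value;
    }

    // multExpr : atom ('*' atom)* ;
    private int multExpr() {
        int value = atom();
        while (peek() == '*') {
            pos++;
            value *= atom();
        }
        return value;
    }

    // atom : INT | '(' expr ')' ;
    private int atom() {
        if (peek() == '(') {
            pos++;
            int value = expr();
            if (peek() != ')') throw new IllegalArgumentException("expected ')' at index " + pos);
            pos++;
            return value;
        }
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("expected INT or '(' at index " + pos);
        return Integer.parseInt(src.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(eval("3+4*5"));        // prints 23
        System.out.println(eval("(3 + 4) * 5"));  // prints 35
    }
}
```

Because multExpr is called from inside expr, multiplication binds tighter than addition, which is exactly what the layering of the grammar rules expresses.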
Test Class
import org.antlr.runtime.*;

public class Test {
    public static void main(String[] args) throws Exception {
        // Create an input character stream from standard in
        ANTLRInputStream input = new ANTLRInputStream(System.in);
        // Create an ExprLexer that feeds from that stream
        ExprLexer lexer = new ExprLexer(input);
        // Create a stream of tokens fed by the lexer
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        // Create a parser that feeds off the token stream
        ExprParser parser = new ExprParser(tokens);
        // Begin parsing at rule prog
        parser.prog();
    }
}
AST vs Parse Tree for 3+4*5
An AST is to be distinguished from a parse tree, which represents the sequence of rule invocations used to match an input stream.
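The difference can be shown by building both trees for 3+4*5 by hand and printing them in a LISP-like notation. This is only a sketch: the Node class and the helper names are hypothetical, the trees are constructed manually instead of by a parser, and the parse tree is shown from rule expr downward.

```java
import java.util.Arrays;
import java.util.List;

/** Builds an AST and a parse tree for 3+4*5 by hand (hypothetical illustration). */
final class TreeDemo {
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        /** Prints a leaf as its label and an inner node as (label child child ...). */
        String toStringTree() {
            if (children.isEmpty()) return label;
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node child : children) sb.append(' ').append(child.toStringTree());
            return sb.append(')').toString();
        }
    }

    /** AST: operators are roots, operands are children; rule names do not appear. */
    static Node ast() {
        return new Node("+", new Node("3"), new Node("*", new Node("4"), new Node("5")));
    }

    /** Parse tree: one inner node per rule invocation, with tokens as leaves. */
    static Node parseTree() {
        return new Node("expr",
                new Node("multExpr", new Node("atom", new Node("3"))),
                new Node("+"),
                new Node("multExpr",
                        new Node("atom", new Node("4")),
                        new Node("*"),
                        new Node("atom", new Node("5"))));
    }

    public static void main(String[] args) {
        System.out.println(ast().toStringTree());
        // prints (+ 3 (* 4 5))
        System.out.println(parseTree().toStringTree());
        // prints (expr (multExpr (atom 3)) + (multExpr (atom 4) * (atom 5)))
    }
}
```

The AST keeps only what later phases need (operators and operands), while the parse tree records every rule that was entered along the way.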
Grammar File Structure
Filename : grammarName.g
/** This is a document comment */
grammarType grammar grammarName;
«optionsSpec»
«tokensSpec»
«attributeScopes»
«actions»
/** doc comment */
rule1 : ... | ... | ... ;
rule2 : ... | ... | ... ;
The order of the grammar sections must be as shown, with the rules appearing after all the other sections.
Sample Grammar
grammar T;
options {
    output = template; // required for the -> template rewrite below
}
@members {
    String s;
    public String changeCase(String in) {
        return in.toUpperCase();
    }
}
r : ID '#' {s = $ID.text; System.out.println("found "+s);} -> declTemplate(id={$ID.text}) ;
ID : 'a'..'z' + ;
WS : (' ' |'\n' |'\r' )+ {skip();} ; // ignore whitespace
Template file : mytemplate.stg
declTemplate (id) ::= <<
Actions embedded within Rules
grammar T;
options {language=Java;}
@header {
    package org.antlr.test;
}
r[int a, String b] returns [int n]
@init {
    $n=$a; // init return value
}
@after {
    System.out.println("returning value n="+$n);
}
    : ... {$n=23;}
    | ... {$n=9;}
    | ... {$n=1;}
    | // use initialized value of n
    ;
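One way to read the rule above is as an ordinary Java method: rule parameters become method parameters, the returns clause becomes the return value, @init runs first, and @after runs once an alternative has matched. The sketch below is a hand-written analogue under those assumptions, not the code ANTLR emits; the class name and the strings used to pick an alternative are hypothetical stand-ins for the elided alternatives.

```java
/** Hypothetical hand-written analogue of rule r; not the code ANTLR generates. */
final class RuleActionsSketch {
    // r[int a, String b] returns [int n]
    static int r(int a, String b) {
        int n = a;                        // @init: runs before any alternative is tried
        switch (b) {                      // stand-in for choosing one of the alternatives
            case "first":  n = 23; break; // action of the first alternative
            case "second": n = 9;  break; // action of the second alternative
            case "third":  n = 1;  break; // action of the third alternative
            default:       break;         // empty alternative: keep the initialized value
        }
        System.out.println("returning value n=" + n);  // @after: runs after a match
        return n;
    }

    public static void main(String[] args) {
        r(5, "first");  // prints returning value n=23
        r(5, "other");  // prints returning value n=5
    }
}
```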
Rules Error Handling
void rule() throws RecognitionException {
    try {
        // rule body
    }
    catch (RecognitionException re) {
        reportError(re);
        recover(input, re);
    }
}
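The try/catch shape above amounts to "report the error, resynchronize, and keep parsing". Here is a self-contained sketch of that pattern in a hand-written parser. Everything in it is an assumption for illustration: the class RecoverySketch, the SyntaxError exception, the toy rule stat : INT ';', and the recovery strategy of skipping to just past the next ';'.

```java
import java.util.ArrayList;
import java.util.List;

/** Report-and-recover sketch for a toy grammar: prog : stat+ ; stat : INT ';' ; (hypothetical). */
final class RecoverySketch {
    static final class SyntaxError extends Exception {
        SyntaxError(String message) { super(message); }
    }

    private final String[] tokens;
    private int pos = 0;
    final List<String> errors = new ArrayList<>();
    final List<Integer> values = new ArrayList<>();

    /** Tokens are assumed to be separated by whitespace. */
    RecoverySketch(String input) { tokens = input.trim().split("\\s+"); }

    void prog() {
        while (pos < tokens.length) stat();
    }

    void stat() {
        try {
            int value = matchInt();        // rule body
            match(";");
            values.add(value);
        } catch (SyntaxError re) {
            errors.add(re.getMessage());   // report the problem
            recover();                     // resynchronize, then let prog() continue
        }
    }

    private int matchInt() throws SyntaxError {
        if (pos < tokens.length && tokens[pos].matches("[0-9]+")) {
            return Integer.parseInt(tokens[pos++]);
        }
        throw new SyntaxError("token " + pos + ": expected INT, found " + current());
    }

    private void match(String expected) throws SyntaxError {
        if (pos < tokens.length && tokens[pos].equals(expected)) {
            pos++;
            return;
        }
        throw new SyntaxError("token " + pos + ": expected '" + expected + "', found " + current());
    }

    private String current() {
        return pos < tokens.length ? "'" + tokens[pos] + "'" : "EOF";
    }

    /** Consumes tokens up to and including the next ';' so parsing can resume. */
    private void recover() {
        while (pos < tokens.length && !tokens[pos].equals(";")) pos++;
        if (pos < tokens.length) pos++;
    }

    public static void main(String[] args) {
        RecoverySketch parser = new RecoverySketch("1 ; x ; 3 ;");
        parser.prog();
        System.out.println(parser.values);  // prints [1, 3]
        System.out.println(parser.errors);  // prints [token 2: expected INT, found 'x']
    }
}
```

The bad statement is reported once and the statements around it are still parsed, which is the point of recovering inside the rule instead of aborting.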
Overriding default error handling
To get richer error messages, override two methods from BaseRecognizer:
* displayRecognitionError()
* getTokenErrorDisplay()
Example of a rich message
$ java Test
<< 1+;
<< EOF
>> line 1:2 [prog, stat, expr, multExpr, atom] no viable alternative,
token=[@2,2:2=';',<7>,1:2] (decision=5 state 0)
decision=<<35:1: atom : ( INT | '(' expr ')' );>>
found expr: 1+;
Grammar Level Options
options {
    language = Java / C#;
    output = template / AST;
    backtrack = true / false;
    memoize = true / false;
    rewrite = true / false;
}
Rule Attributes
Attribute   Type             Description
---------   --------------   -------------------------------
text        String           Text matched by the rule
start       Token            First token matched by the rule
stop        Token            Last token matched by the rule
tree        Object           AST computed by the rule
st          StringTemplate   Template computed for this rule
r : ID '#' {s = $ID.text; System.out.println("found "+s);} -> declTemplate(id={$ID.text}) ;
POC : PLSQL to JAVA Converter
