Technology, explored !: Exploring ANTLR

Getting ANTLR to work, atleast as a proof of concept, wasn't trivial as I was very new to the whole concept of defining language constructs, writing compilers and so on. But once you get accustomed to the terms and this technology, you feel really empowered. You can theoretically write your own language or convert anything, into anything else :)

I am going to cover basics about ANTLR here and the output of the POC.

The Big Picture


*   The lexical analyzer, or lexer, breaks up the input stream into tokens.
*   Tokens are vocabulary symbols emanating from the lexical analyzer.
*   The parser feeds off this token stream and tries to recognize the sentence structure.
*   Abstract syntax tree (AST) is a TREE representation of the language syntax.
*   Output can be AST or text ( using StringTemplates ).

Definitions, Tools and usage


*   Grammar describes the syntax of a language.
*   ANTLTWorks – IDE to write, verify, test, debug the grammar
*   Commandline - java org.antlr.Tool grammar.g
*   Generates Tokens, Lexer, Parser java classes

Imagine a maze with a single entrance and single exit that has words written on the floor. Every path from entrance to exit generates a sentence by “saying” the words in sequence. In a sense, the maze is analogous to a grammar that defines a language.

Grammar Example


prog : stat+ ;
stat : expr NEWLINE
       | ID '=' expr NEWLINE
       | NEWLINE;
expr : multExpr (('+' |'-' ) multExpr)* ;
multExpr : atom ('*' atom)* ;
atom : INT 
       | ID
       | '(' expr ')’ ;
ID  : ('a'..'z' |'A'..'Z' )+ ;
INT : '0'..'9' + ;
WS  : (' ' |'\t' |'\n' |'\r' )+ {skip();} ;

Test Class


public class Test {
   public static void main(String[] args) throws Exception {
   // Create an input character stream from standard in
   ANTLRInputStream input = new ANTLRInputStream(System.in);

   // Create an ExprLexer that feeds from that stream
   ExprLexer lexer = new ExprLexer(input);

   // Create a stream of tokens fed by the lexer
   CommonTokenStream tokens = new CommonTokenStream(lexer);

   // Create a parser that feeds off the token stream
   ExprParser parser = new ExprParser(tokens);

   // Begin parsing at rule prog
   parser.prog();
   }
}

AST vs Parse Tree for 3+4*5

An AST is to be distinguished from a parse tree, which represents the sequence of rule invocations used to match an input stream.

Grammar File Structure

Filename : grammarName.g


/** This is a document comment */
grammarType grammarName;
«optionsSpec»
«tokensSpec»
«attributeScopes»
«actions»
/** doc comment */
rule1 : ... | ... | ... ;
rule2 : ... | ... | ... ;
...

The order of the grammar sections must be as shown, with the rules appearing after all the other sections.

Sample Grammar


grammar T;
options {
 language=Java;
}
@members {
 String s;
 public String changeCase(String in) 
 {
  return in.toUpperCase();
 }
}
r  : ID '#' {s = $ID.text; System.out.println("found "+s);} -> declTemplate(id={$ID.text}) ;
ID : 'a'..'z' + ;
WS : (' ' |'\n' |'\r' )+ {skip();} ; // ignore whitespace

Template file : mytemplate.stg


declTemplate (id) ::=  <<
   var  ;
>>

Actions embedded within Rules


grammar T;
options {language=Java;}
@header {
      package org.antlr.test;
} 
r[int a, String b] returns [int n]
@init {
 $n=$a; // init return value
}
@after {
 System.out.println("returning value n="+$n);
}
 : ... {$n=23;}
 | ... {$n=9;}
 | ... {$n=1;}
 | // use initialized value of n
;

Rules Error Handling


void rule() throws RecognitionException {
  try {
    «rule-body»
  }
  catch (RecognitionException re) {
  reportError(re);
  recover(input,re);
 }
}

Overriding default error handling

To get these rich error messages, override two methods from BaseRecognizer,
displayRecognitionError( )
getTokenErrorDisplay( )

Example of a rich message


$ java Test
<< 1+;
<< EOF
>> line 1:2 [prog, stat, expr, multExpr, atom] no viable alternative,
token=[@2,2:2=';',<7>,1:2] (decision=5 state 0)
decision=<<35:1: atom : ( INT | '(' expr ')' );>>
found expr: 1+;

Grammar Level Options


options {
language = Java / C#;
output = template / AST;
backtrack = true / false;
memorize = true / false;
rewrite = true / false;
}

Rule Attributes


Attribute  Type
---------  ------------------------------------------------------------
text    String   // text matched by the rule
start    Token   // first token matched by the rule
stop    Token   // last token matched by the rule
tree    Object  // AST computed by the rule
st         StringTemplate  // Template computed for this rule

Example


r  : ID '#' {s = $ID.text; System.out.println("found "+s);} -> declTemplate(id={$ID.st}) ;

POC : PLSQL to JAVA Converter

References


Websites
http://www.antlr.org/
http://www.stringtemplate.org/
http://www.antlr.org/grammar/list
http://www.antlr.org/grammar/1279318813752/PLSQL.g

Books
 The.Definitive.ANTLR.Reference.pdf

Technology, explored !

Monday 2 April 2012

Exploring ANTLR

No comments:

Post a Comment