ANTLRチュートリアル
をテンプレートにして作成
[
トップ
] [
新規
|
一覧
|
単語検索
|
最終更新
|
ヘルプ
]
開始行:
[[構文解析の記事一覧]]
*趣旨 [#r1ba6363]
自分の英語力で何となくわかる箇所のうち、ひょっとしたら興...
、
拙い英語力で和訳を試みる。
いまのところ、はじめにの箇所を訳している
http://supportweb.cs.bham.ac.uk/docs/tutorials/docsystem/...
*目次 [#ad270d48]
#contents
*チュートリアル作者 [#u9a69d9e]
アーシュリー マイルズさん
**メール [#b9dd1bdd]
ashley@ashleymills.com
英語で質問しよう
**著作権 [#uaf48e8a]
Copyright � 2005 バーミンガム大学
*はじめに [#ff0dbbc5]
ANTLR (もうひとつの言語認識のためのツール) is a parser an...
-Java http://java.sun.com/
-C++ http://anubis.dkuug.dk/jtc1/sc22/wg21/ or
-Sather http://www.icsi.berkeley.edu/~sather/.
ANTLR は PRED-LL(k) 法のパーサを作ります。
ANTLRとは何かに対するTerrance Parrさんによる答えは、以下...
http://www.jguru.com/faq/view.jsp?EID=77
* 背景となる事情 [#j1394d6a]
ANTLR is a compiler tool hence it's developer base is gen...
**単語の切り出し [#c88f241e]
Other names: Scanner, lexical analyser, tokenizer.
Programming languages are made up of keywords, and strict...
A source program is written using some kind of editing to...
A source file is streamed to a lexer on a character by ch...
A lexer usually generates errors pertaining to sequences ...
**パーサ [#v3e1eec0]
別名を構文解析ともいう。
A lexer groups sequences of characters it recognises in t...
The parser checks to see if the tokens conform to the syn...
通常パーサは認識した単語の並びをパターンマッチするAST(Abs...
An AST is easier to translate to a target language becaus...
The parser generates one or more symbol table(s) which co...
A parser usually generates errors pertaining to sequences...
Both lexers and parsers are recognizers, lexers recognize...
It is recommended that you read Building Recognizers By H...
**ANTLRを構成している部品について [#bdfb25dd]
ANTLR lets you define the rules that the lexer should use...
*インストール方法 [#b354ba6c]
The documentation for the installation is written under t...
Obtain the ANTLR download by following the download secti...
安全な場所に解凍します。
Add, /path/to/where/you/unzipped/ANTLR/antlr.jar and /pat...
*ANTLR文法雛形 [#d2ec8620]
An ANTLR grammar file has a number of components, some of...
header {
// stuff that is placed at the top of <all> generated f...
}
options { options for entire grammar file }
{ optional class preamble - output to generated classfile
immediately before the definition of the class }
class YourLexerClass extends Lexer;
// definition extends from here to next class definition
// (or EOF if no more class defs)
options { YourOptions }
tokens...
lexer rules...
myrule[args] returns [retval]
options { defaultErrorHandler=false; }
: // body of rule...
;
{ optional class preamble - output to generated classfile
immediately before the definition of the class }
class YourParserClass extends Parser;
options { YourOptions }
tokens...
parser rules...
{ optional class preamble - output to generated classfile
immediately before the definition of the class }
class YourTreeParserClass extends TreeParser;
options { YourOptions }
tokens...
tree parser rules...
// arbitrary lexers, parsers and treeparsers may be incl...
header {
// stuff that is placed at the top of <all> generated f...
}
The textual content of the header will be copied verbatim...
{ optional class preamble - output to generated classfile
immediately before the definition of the class }
class YourLexerClass extends Lexer;
// definition extends from here to next class definition
// (or EOF if no more class defs)
options { YourOptions }
tokens...
lexer rules...
This begins with an optional class preamble, any text pla...
options {
k = 2; // Set token lookahead to...
tokenVocabulary = Befunge; // Call it's vocabulary "...
defaultErrorHandler = false; // Don't generate parser ...
}
Extensive detail pertaining to the various options availa...
tokens {
EXPR; // Imaginary token
THIS="that"; // Literal definition
INT="int"; // Literal definition
}
The lexer rules come next and have the general form:
rulename [args] returns [retval]
options { defaultErrorHandler=false; }
{ optional initiation code }
: alternative_1
| alternative_2
...
| alternative_n
;
**例: [#bfa463c7]
INT : ('0'..'9')+; // Matches an integer
There can be an arbitrary number of rules.
The other two sections have the same layout as the one de...
*ANTLRでつかうワイルドカード記号 [#m4ca8a9c]
ANTLR specifies it's lexical rules and parser rules using...
**ゼロまたは複数にマッチさせる記号* [#o233a365]
ANTLR uses the notation (expression)* to indicate that th...
**1つまたは複数にマッチさせる記号+ [#i91001b0]
ANTLR uses the notation (expression)+ to indicate that th...
**ゼロまたは1つにマッチさせる記号? [#g1653cd4]
ANTLR uses the notation (expression)? to indicate that th...
*単語切り出し例 [#z6068f24]
This example illustrates a very simple lexer that matches...
class SimpleLexer extends Lexer;
options { k=1; filter=true; }
ALPHA : ('a'..'z'|'A'..'Z')+
{ System.out.println("Found alpha: "+getText()); }
;
NUMERIC : ('0'..'9')+
{ System.out.println("Found numeric: "+getText())...
;
EXIT : '.' { System.exit(0); } ;
**解説 [#y606b0e5]
***class [#eae4a64b]
This will be explained in sections, lets begin with the f...
class SimpleLexer extends Lexer;
This is the lexer declaration, pretty straightforward, it...
***options [#lad5bd90]
options { k=1; filter=true; }
Here some basic options are set. k is set to one, k is th...
SILLY1 : "ab" ;
SILLY2 : "ac" ;
And when trying to parse a file containing these lexer ru...
warning: lexical nondeterminism between rules SILLY1 and ...
silly.g:0: k==1:'a'
Because if the lexer encountered "ab", with a lookahead o...
The filter=true option sets filtering on, which means tha...
options { filter=BLAH; }
protected BLAH : { _ttype = Token.SKIP; } ;
Has the same effect as setting filter to true. However, u...
**アルファベットのマッチ [#j6570c95]
ALPHA : ('a'..'z'|'A'..'Z')+
{ System.out.println("Found alpha: "+getText()); }
;
This is an example of a lexer rule, it matches any sequen...
**数値のマッチ [#fe3b767f]
NUMERIC : ('0'..'9')+
{ System.out.println("Found numeric: "+getText())...
;
This is very similar to the ALPHA rule but instead matche...
**終端 [#k1ff7148]
EXIT : '.' { System.exit(0); } ;
This matches the literal character '.', (when unquoted '....
**複数行の扱い方 [#vb85c9ae]
When using the filter=true option, if one is processing a...
NEWLINE : ( "\r\n" // DOS
| '\r' // MAC
| '\n' // Unix
)
{ newline();
$setType(Token.SKIP);
}
;
***無視される文字について
The ANTLR directive $setType(tokenType) is used to indica...
***Javaとかの生成やエラーについて [#n6c8e5c7]
After defining our lexer, ANTLR s ran on the source file ...
java antlr.Tool simple.g
The command displays the version information and then, if...
***警告例 [#dd5834f4]
ANTLR Parser Generator Version 2.7.1 1989-2000 jGuru...
simple.g:13: warning:did you forget to terminate previou...
The line number indicated is the start of the rule that f...
SimpleLexer.java, the lexer produced implements TokenStre...
SimpleLexerTokenTypes.java contains the token type defini...
import java.io.*;
public class Main {
public static void main(String[] args) {
SimpleLexer simpleLexer = new SimpleLexer(System.in);
while(true) {
try {
simpleLexer.nextToken();
} catch(Exception e) {}
}
}
}
A new instance of SimpleLexer is created with the constru...
All of the java files are compiled by issuing:
javac *.java
The Main.class produced is executed by issuing:
java Main
An example session using this Lexer is shown below:
This Lexer recognises strings and numbers: hello 22 good...
Found alpha: This
Found alpha: Lexer
Found alpha: recognises
Found alpha: strings
Found alpha: and
Found alpha: numbers
Found alpha: hello
Found numeric: 22
Found alpha: goodbye
Found numeric: 33
It ignores everything else: -=+/#
Found alpha: It
Found alpha: ignores
Found alpha: everything
Found alpha: else
.
This Lexer exclusively recognises alpha and numeric conte...
11aa33hi
Found numeric: 11
Found alpha: aa
Found numeric: 33
Found alpha: hi
.
That about wraps up this Lexer introduction but it should...
*シンプルな文字切り出しとパーサの例 [#ndf321e9]
A simple lexer and parser will be discussed. The Job of t...
class SimpleParser extends Parser;
entry : (d:DOB n:NAME a:AGE(SEMI)
{
System.out.println(
"Name: " +
n.getText() +
", Age: " +
a.getText() +
", DOB: " +
d.getText()
);
})*
;
class SimpleLexer extends Lexer;
NAME : ('a'..'z'|'A'..'Z')+;
DOB : ('0'..'9' '0'..'9' '/')=>
(('0'..'9')('0'..'9')'/')(('0'..'9')('0'..'9')'/')...
| ('0'..'9')+ { $setType(AGE); } ;
WS :
(' '
| '\t'
| '\r' '\n' { newline(); }
| '\n' { newline(); }
)
{ $setType(Token.SKIP); }
;
SEMI : ';' ;
The parser is the first class to be specified, however, o...
entry : (d:DOB n:NAME a:AGE(SEMI)
{
System.out.println(
"Name: " +
n.getText() +
", Age: " +
a.getText() +
", DOB: " +
d.getText()
);
})*
;
You can see that the first rule is entry, this is the rul...
DOB NAME AGE;
If it is successful in finding this sequence of tokens, t...
NAME : ('a'..'z'|'A'..'Z')+;
A simple sequence of one or more lower-case OR upper-case...
DOB : ('0'..'9' '0'..'9' '/')=>
(('0'..'9')('0'..'9')'/')(('0'..'9')('0'..'9')'/')...
| ('0'..'9')+ { $setType(AGE); } ;
If DOB and AGE had been specified like this:
DOB : ('0'..'9')('0'..'9')'/' ('0'..'9')('0'..'9')'/' ('0...
AGE : ('0'..'9')+ ;
ANTLR would have issued the warning:
warning: lexical nondeterminism between rules DOB and AG...
simple2.g:0: k==1:'0'..'9'
This is because both of the rules mentioned begin with ('...
One way around this would be to specify a parser lookahea...
options { k=3; }
This would enable the parser to look forward a maximum of...
If there are a lot of alternatives all that can be distin...
DOB : ('0'..'9' '0'..'9' '/')=>
This says that the lexer should see if it can find two di...
(('0'..'9')('0'..'9')'/')(('0'..'9')('0'..'9')'/')('0'.....
If it is successful in doing so then the token will be ma...
| ('0'..'9')+ { $setType(AGE); } ;
The implication is that if the rule is matched then the a...
WS :
(' '
| '\t'
| '\r' '\n' { newline(); }
| '\n' { newline(); }
)
{ $setType(Token.SKIP); }
;
The WS rule matches white space, hence "WS". The rule wil...
SEMI : ';' ;
This simply matches a semicolon (';').
The Lexer and Parser are setup using a main method, this ...
import java.io.*;
public class Main {
public static void main(String args[]) {
DataInputStream input = new DataInputStream(System.in);
SimpleLexer lexer = new SimpleLexer(input);
SimpleParser parser = new SimpleParser(lexer);
try {
parser.entry();
} catch(Exception e) {}
}
}
Pretty straightforward, instantiation of an input-stream ...
06/06/82 Peter 20;
03/04/83 Rosie 19;
04/05/81 Mikey 21;
After creating the classes with:
java antlr.Tool Simple2.g
And compiling everything:
java *.java
Main is invoked:
java Main < test.txt
This produces the output:
Name: Peter, Age: 20, DOB: 06/06/82
Name: Rosie, Age: 19, DOB: 03/04/83
Name: Mikey, Age: 21, DOB: 04/05/81
An error can be simulated by changing Mikey's age to "a21...
Name: Peter, Age: 20, DOB: 06/06/82
Name: Rosie, Age: 19, DOB: 03/04/83
line 3: expecting AGE, found 'a'
*文法を評価する例 [#we3cd2c9]
It has come to that time where it is necessary to step in...
class ExpressionParser extends Parser;
options { buildAST=true; }
expr : sumExpr SEMI!;
sumExpr : prodExpr ((PLUS^|MINUS^) prodExpr)* ;
prodExpr : powExpr ((MUL^|DIV^|MOD^) powExpr)* ;
powExpr : atom (POW^ atom)? ;
atom : INT ;
class ExpressionLexer extends Lexer;
PLUS : '+' ;
MINUS : '-' ;
MUL : '*' ;
DIV : '/' ;
MOD : '%' ;
POW : '^' ;
SEMI : ';' ;
protected DIGIT : '0'..'9' ;
INT : (DIGIT)+ ;
{import java.lang.Math;}
class ExpressionTreeWalker extends TreeParser;
expr returns [double r]
{ double a,b; r=0; }
: #(PLUS a=expr b=expr) { r=a+b; }
| #(MINUS a=expr b=expr) { r=a-b; }
| #(MUL a=expr b=expr) { r=a*b; }
| #(DIV a=expr b=expr) { r=a/b; }
| #(MOD a=expr b=expr) { r=a%b; }
| #(POW a=expr b=expr) { r=Math.pow(a,b); }
| i:INT { r=(double)Integer.parseInt(i.getText()); }
;
The grammar definition begins with the parser section, th...
class ExpressionParser extends Parser;
options { buildAST=true; }
The class is declared normally and then the option buildA...
expr : sumExpr SEMI!;
sumExpr : prodExpr ((PLUS^|MINUS^) prodExpr)* ;
The top level rule of the expression is expr, this simply...
The sumExpr is an expression that consists of a prodExpr ...
prodExprPLUSprodExprMINUSprodExpr
Because zero or more of these sequences must be present f...
An AST is a special kind of tree that can have an arbitra...
The PLUS and MINUS token references are postfixed with th...
The AST structure is defined in such a way that operator ...
sumExpr : prodExpr ((PLUS^|MINUS^) prodExpr)* ;
Dictates that the first and second children of a sumExpr ...
prodExpr : powExpr ((MUL^|DIV^|MOD^) powExpr)* ;
This says that a prodExpr must consist of a powExpr follo...
The children of the root will be evaluated before applyin...
powExpr : atom (POW^ atom)? ;
atom : INT ;
It can be seen that a powExpr consists of an atom followe...
powExpr : atom (POW^ atom)* ;
Because this powExpr is broken. What happens if more than...
The point is that the subtrees will be evaluated first, s...
**備考 [#e5ca8fe1]
You may find it easier to think of the AST defined here a...
Let's take another look at the parser in one block:
class ExpressionParser extends Parser;
options { buildAST=true; }
expr : sumExpr SEMI!;
sumExpr : prodExpr ((PLUS^|MINUS^) prodExpr)* ;
prodExpr : powExpr ((MUL^|DIV^|MOD^) powExpr)* ;
powExpr : atom (POW^ atom)? ;
atom : INT ;
How can an expression be just an atom? Assume we are tryi...
The first component of powExpr is an atomso the parser tr...
Figure�1.�INT SEMI Match
The red lines (or dark lines if you are in greyscale), ar...
Do the rules handle precedence correctly? Let's take a lo...
The parser can only generate the tree:
Figure�2.�1+(2*5) Binary Tree
Because a sumExpr cannot be a subtree of a prodExpr, as i...
Figure�3.�1+(2*5) AST
An AST can have an arbitrary number of children, the chil...
Figure�4.�Sum AST
Where sum is some function that accepts multiple argument...
class ExpressionLexer extends Lexer;
PLUS : '+' ;
MINUS : '-' ;
MUL : '*' ;
DIV : '/' ;
MOD : '%' ;
POW : '^' ;
SEMI : ';' ;
protected DIGIT : '0'..'9' ;
INT : (DIGIT)+ ;
These rules are pretty self-explanatory and define the to...
The next section of the grammar is the tree parser:
{import java.lang.Math;}
class ExpressionTreeWalker extends TreeParser;
expr returns [double r]
{ double a,b; r=0; }
: #(PLUS a=expr b=expr) { r=a+b; }
| #(MINUS a=expr b=expr) { r=a-b; }
| #(MUL a=expr b=expr) { r=a*b; }
| #(DIV a=expr b=expr) { r=a/b; }
| #(MOD a=expr b=expr) { r=a%b; }
| #(POW a=expr b=expr) { r=Math.pow(a,b); }
| i:INT { r=(double)Integer.parseInt(i.getText()); }
;
The line before the the class declaration is a header tha...
expr returns [double r]
{ double a,b; r=0; }
Immediately preceding the opening of the rule is a Java c...
: #(PLUS a=expr b=expr) { r=a+b; }
| #(MINUS a=expr b=expr) { r=a-b; }
| #(MUL a=expr b=expr) { r=a*b; }
| #(DIV a=expr b=expr) { r=a/b; }
| #(MOD a=expr b=expr) { r=a%b; }
| #(POW a=expr b=expr) { r=Math.pow(a,b); }
| i:INT { r=(double)Integer.parseInt(i.getText()); }
;
The rule definitions all take use the AST syntax, that is:
#(ROOT child.1 child.2 ... child.n);
The first rule says, if PLUS is found as a root, assign t...
In this case, it is obvious with a lookahead of one which...
| i:INT { r=(double)Integer.parseInt(i.getText()); }
This alternative is more interesting than the others so w...
The treeparser, is being used to walk the tree and evalua...
import java.io.*;
import antlr.CommonAST;
import antlr.collections.AST;
import antlr.debug.misc.ASTFrame;
public class Main {
public static void main(String args[]) {
try {
DataInputStream input = new DataInputStream(System....
ExpressionLexer lexer = new ExpressionLexer(input);
ExpressionParser parser = new ExpressionParser(lexe...
parser.expr();
CommonAST parseTree = (CommonAST)parser.getAST();
System.out.println(parseTree.toStringList());
ASTFrame frame = new ASTFrame("The tree", parseTree);
frame.setVisible(true);
ExpressionTreeWalker walker = new ExpressionTreeWal...
double r = walker.expr(parseTree);
System.out.println("Value: "+r);
} catch(Exception e) { System.err.println("Exception:...
}
}
First we import all the necessary classes and open the cl...
DataInputStream input = new DataInputStream(System....
A DataInputStream is setup.
ExpressionLexer lexer = new ExpressionLexer(input);
The lexer is created and told to accept data from the inp...
ExpressionParser parser = new ExpressionParser(lexe...
parser.expr();
The parser is created, using the lexer to deliver tokens....
CommonAST parseTree = (CommonAST)parser.getAST();
Remember that options { buildAST=true; } was specified? H...
System.out.println(parseTree.toStringList());
The AST is printed using the CommonAST toStringList() met...
ASTFrame frame = new ASTFrame("The tree", parseTree);
frame.setVisible(true);
A new ASTFrame is created, which is a frame designed for ...
double r = walker.expr(parseTree);
System.out.println("Value: "+r);
A new double is defined and assigned the value of the exp...
Let's go through an example expression to see what output...
1+2-3*4/5^6;
Is placed in a file called test.txt and used as input to ...
java Main < test.txt
The tree expressed as a list via toStringList() is output:
( - ( + 1 2 ) ( / ( * 3 4 ) ( ^ 5 6 ) ) ) ;
The AST comes up in the ASTFrame:
Figure�5.�AST of (1+2-3*4/5^6)
And the value is output:
Value: 2.999232
That concludes the basic expression evaluator, the next s...
*文法評価機能の拡張について [#r206b4cf]
**入れ子状の表現 [#w0dadb6b]
The old expression evaluator did not allow nesting of bra...
expr : sumExpr SEMI;
sumExpr : prodExpr ((PLUS^|MINUS^) prodExpr)*;
prodExpr : powExpr ((MUL^|DIV^|MOD^) powExpr)* ;
powExpr : atom (POW^ atom)? ;
atom : INT ;
powExpr has a higher precedence than prodExpr and is made...
#(PLUS a=expr b=expr) { r=a+b; }
See that a and b have to be evaluated first because the o...
atom : INT | expr ;
This would generate some infinite recursion errors:
expression.g:8: infinite recursion to rule sumExpr from ...
expression.g:7: infinite recursion to rule sumExpr from ...
expression.g:6: infinite recursion to rule sumExpr from ...
expression.g:5: infinite recursion to rule sumExpr from ...
expression.g:8: infinite recursion to rule sumExpr from ...
Imagine the case of an out of place token. The parser wou...
expr : LPAREN^ sumExpr RPAREN! ;
This redefinition is crucial, LPAREN and RPAREN are token...
If the out of place token arose again, the parser could n...
expr : LPAREN^ sumExpr RPAREN! ;
sumExpr : prodExpr ((PLUS^|MINUS^) prodExpr)* ;
prodExpr : powExpr ((MUL^|DIV^|MOD^) powExpr)* ;
powExpr : atom (POW^ atom)? ;
atom : INT | expr ;
So now an expr can recursively contain as many expr's as ...
| #(LPAREN a=expr) { r=a;}
This rule matches an AST with a LPAREN as root and an exp...
Eventually all leaves of the AST must be INT's, unless an...
In the normal world, the calls to expr will return a fina...
To restate: Whenever an LPAREN is encountered, the sub-ex...
An AST illustrating this should help to clarify things, h...
Figure�6.�AST of (5*(2+3))
The root of the tree is an expr signified by the open bra...
Figure�7.�Another View of (5*(2+3))
Here is the whole grammar: expression2.g. The Main method...
**備考 [#tb532217]
Adding subexpressions also solves the problem of being li...
**符号の追加 [#kfc5fd9e]
The sign operator has a higher precedence than any of the...
imaginaryTokenDefinitions :
SIGN_MINUS
SIGN_PLUS
;
expr : LPAREN^ sumExpr RPAREN! ;
sumExpr : prodExpr ((PLUS^|MINUS^) prodExpr)* ;
prodExpr : powExpr ((MUL^|DIV^|MOD^) powExpr)* ;
powExpr : signExpr (POW^ signExpr)? ;
signExpr : (
m:MINUS^ {#m.setType(SIGN_MINUS);}
| p:PLUS^ {#p.setType(SIGN_PLUS);}
)? atom ;
atom : INT | expr ;
Ignore the imaginary token business for now, this will be...
Consider the matching of the expression (-3--2). The firs...
Is the first token of signExpr a MINUS? Yes, it is! Brill...
sumExpr : prodExpr ((PLUS^|MINUS^) prodExpr)* ;
The next token of the sumExpr rule is PLUS OR MINUS, the ...
The parser has just matched the MINUS in sumExpr, it now ...
Hopefully I have not obfuscated the working of this rule....
signExpr : (
m:MINUS^ {#m.setType(SIGN_MINUS);}
| p:PLUS^ {#p.setType(SIGN_PLUS);}
)? atom ;
First of all, note that the first part of signExpr is enc...
m:MINUS^
Assigns the root node MINUS to the variable 'm'
{#m.setType(SIGN_MINUS);}
Replaces this root node in the tree with the token SIGN_M...
| #(MINUS a=expr b=expr) { r=a-b; }
In order not to have some kind of syntactic predicate to ...
imaginaryTokenDefinitions :
SIGN_MINUS
SIGN_PLUS
;
When parsing the AST, the tree parser knows exactly which...
{import java.lang.Math;}
class ExpressionTreeWalker extends TreeParser;
expr returns [double r]
{ double a,b; r=0; }
: #(PLUS a=expr b=expr) { r=a+b; }
| #(MINUS a=expr b=expr) { r=a-b; }
| #(MUL a=expr b=expr) { r=a*b; }
| #(DIV a=expr b=expr) { r=a/b; }
| #(MOD a=expr b=expr) { r=a%b; }
| #(POW a=expr b=expr) { r=Math.pow(a,b); }
| #(LPAREN a=expr) { r=a; }
| #(SIGN_MINUS a=expr) { r=-1*a; }
| #(SIGN_PLUS a=expr) { if(a<0)r=0-a; else r=a; }
| i:INT { r=(double)Integer.parseInt(i.getText()); }
;
If a SIGN_MINUS root is encountered, the desired conseque...
**備考 [#ac077c87]
As pointed out to me by Safak Oekmen, this interpretation...
Let's look at an example run through of the expression (-...
( ( ( - ( - 3 ) ( - 2 ) ) )
Value: -1.0
Here is the AST produced:
Figure�8.�(-3--2) AST
Here are some more AST's:
Figure�9.�A few AST's
**備考 [#ab49b80a]
In the parser for the expression evaluator, a top level e...
expr : LPAREN^ sumExpr RPAREN! ;
If we were reading the expression from a file, this rule ...
expr : LPAREN^ sumExpr RPAREN! EOF! ;
So that the EOF(End Of File), token is matched and the wh...
*CSV から XHTML テーブルへの変換例 [#sc41e24e]
A Lexer translates from a stream of characters to a strea...
A CSV is a very simple kind of data structure; variables ...
"STUDENT ID, "NAME, "DATE OF BIRTH
"129384, "Davy Jones, "03/04/81
"328649, "Clare Manstead, "30/11/81
"237090, "Richard Stoppit, "22/06/82
"523343, "Brian Hardwick, "15/11/81
"908423, "Sally Brush, "06/06/81
"328453, "Elisa Strudel, "12/09/82
"883632, "Peter Smith, "03/05/83
"542033, "Ryan Alcot, "04/12/80
The translator developed in this section will translate C...
The CSV file is composed of one or more lines and termina...
file : ( line (NEWLINE line)* (NEWLINE)? EOF )
A line consists of one or more records, NEWLINE is handle...
line : ( (record)+ ) ;
A record consists of a RECORD token, optionally followed ...
record : ( (r:RECORD) (COMMA)? ) ;
Notice that the last line of the CSV is an exceptional ca...
class CSVParser extends Parser;
file : ( line (NEWLINE line)* (NEWLINE)? EOF )
line : ( (record)+ ) ;
record : ( (r:RECORD) (COMMA)? ) ;
It is coupled with this lexer:
class CSVLexer extends Lexer;
options { charVocabulary='\3'..'\377'; }
RECORD : '"'! (~(','|'\r'|'\n'))+ ;
COMMA : ',' ;
NEWLINE : ('\r''\n')=> '\r''\n' //DOS
| '\r' //MAC
| '\n' //UNIX
{ newline(); }
;
WS : (' '|'\t') { $setType(Token.SKIP); } ;
First of all, charVocabulary is set to '\3'..'\377', this...
RECORD is an example of an "everything except" rule and s...
������Dave,����21
������Richard,�55
������Peter,���98
����
Was used, then the records would be "Dave",
"����21"
, "Richard", "55 ", "Peter" and
"����98"
, which is probably not what is desired. Of course, one c...
If this lexer and parser were used to process a CSV file,...
class CSVParser extends Parser;
options { k=2; }
file {System.out.println("file called");}
: ( line (NEWLINE line)* (NEWLINE)? EOF)
{System.out.println("file matched");}
;
line {System.out.println("line called");}
: ( (record)+ )
{System.out.println("line matched");}
;
record {System.out.println("record called");}
: ( (r:RECORD) (COMMA)? )
{System.out.println("record = " + r.getText());
System.out.println("record matched");}
;
Notice the lookahead of two to distinguish between (NEWLI...
"David Sprocket, "89
"Cindy Brocket, "18
"Michael Rocket, "33
The output that was produced is shown below (reformatted ...
file called
line called
record called
record = David Sprocket
record matched
record called
record = 89
record matched
line matched
line called
record called
record = Cindy Brocket
record matched
record called
record = 18
record matched
line matched
line called
record called
record = Michael Rocket
record matched
record called
record = 33
record matched
line matched
file matched
The ANTLR grammar so far can be downloaded here: csv1.g. ...
class CSVParser extends Parser;
options { k=2; }
file {System.out.println("<table align=\"center\" bord...
: ( line (NEWLINE line)* (NEWLINE)? EOF)
{System.out.println("</table>");}
;
line {System.out.println(" <tr>");}
: ( (record)+ )
{System.out.println(" </tr>");}
;
record {System.out.print(" <td>");}
: ( (r:RECORD) (COMMA)? )
{System.out.print(r.getText());
System.out.println("</td>");}
;
The output produced when processing the same file as befo...
<table align="center" border="1">
<tr>
<td>David Sprocket</td>
<td>89</td>
<tr/>
<tr>
<td>Cindy Brocket</td>
<td>18</td>
<tr/>
<tr>
<td>Michael Rocket</td>
<td>33</td>
<tr/>
</table>
Which is what was intended of the translator. Apart from ...
import java.io.*;
public class Main {
public static void main(String args[]) {
if(args.length==0) { error(); }
FileInputStream fileInput = null;
try {
fileInput = new FileInputStream(args[0]);
} catch(Exception e) { error(); }
try {
DataInputStream input = new DataInputStream(file...
CSVLexer csvLexer = new CSVLexer(input);
CSVParser csvParser = new CSVParser(csvLexer);
csvParser.file();
} catch(Exception e) { System.err.println(e.getMess...
}
private static void error() {
System.out.println("*-----------------------*");
System.out.println("| USAGE: |");
System.out.println("| java Main inputfile |");
System.out.println("*-----------------------*");
System.exit(0);
}
}
The program first checks that the one compulsory command ...
The Main program used to run the translator can be downlo...
The full grammar file can be downloaded here: csv2.g.
Some test data can be downloaded here: test.txt
At the moment the parser outputs to stdout, this is not v...
class CSVParser extends Parser;
options { k=2; }
file returns[String table = new String()]
{String lineData; table+="<table align=\"center\" ...
: ( lineData=line {table+=lineData;}
(NEWLINE lineData=line {table+=lineData;} )*
(NEWLINE)? EOF )
{table+="</table>";}
;
line returns [String lineData = new String()]
{String recordData; lineData+=" <tr>\n";}
: ( (recordData=record {lineData+=recordData;})+ )
{lineData+=" <tr/>\n";}
;
record returns [String recordData = new String()]
{recordData+=(" <td>");}
: ( (rec:RECORD) (COMMA)? )
{recordData+=(rec.getText());
recordData+="</td>\n";}
;
The full grammar can be downloaded here: csv3.g.
Within file: Immediately the opening table element is app...
line returns [String lineData = new String()]
{String recordData; lineData+=" <tr>\n";}
: ( (recordData=record {lineData+=recordData;})+ )
{lineData+=" <tr/>\n";}
;
Within line: Immediately the opening <tr> element is appe...
record returns [String recordData = new String()]
{recordData+=(" <td>");}
: ( (rec:RECORD) (COMMA)? )
{recordData+=(rec.getText());
recordData+="</td>\n";}
;
Within record: Immediately the opening td element is appe...
line receives recordData and this is used in the construc...
file receives lineData and this is used in the constructi...
Here is the new main method containing class:
import java.io.*;
public class CSVhtml {
public static void main(String args[]) {
if(args.length!=2) { error(); }
FileInputStream fileInput = null;
DataOutputStream fileOutput = null;
try {
fileInput = new FileInputStream(args[0]);
fileOutput = new DataOutputStream(new FileOutput...
} catch(Exception e) { error2(); }
try {
DataInputStream input = new DataInputStream(file...
CSVLexer csvLexer = new CSVLexer(input);
CSVParser csvParser = new CSVParser(csvLexer);
String p ="";
p ="<?xml version=\"1.0\" encoding=\"utf-8\"?>\n";
p+="<!DOCTYPE html\n";
p+="PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional/...
p+="\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-tra...
p+="<html>\n";
p+=" <head>\n";
p+=" <title>A Table</title>\n";
p+=" <meta http-equiv=\"Content-Type\" conte...
p+=" </head>\n";
p+=" <body>\n";
p+= csvParser.file();
p+=" </body>\n";
p+="</html>";
fileOutput.writeBytes(p);
fileOutput.close();
} catch(Exception e) { System.err.println(e.getMess...
}
private static void error() {
System.out.println("*------------------------------...
System.out.println("| USAGE: ...
System.out.println("| java CSVhtml inputfile outp...
System.out.println("*------------------------------...
System.exit(0);
}
private static void error2() {
System.out.println("*------------------------------...
System.out.println("| You must specify a valid inpu...
System.out.println("*------------------------------...
System.exit(0);
}
}
The program shown above can be downloaded here: CSVHTML.j...
After the grammar is converted to a lexer, parsed by ANTL...
java CSVHTML inputfile outputfile
The table produced from executing the command:
java CSVHTML test.txt output.html
Looks, under a certain proprietary web-browser, like this:
Figure�10.�test.txt expressed as a table
*Snippets From Behind The Scenes [#g621b422]
When the lexer tokenizes the input stream, each token enc...
// $ANTLR 2.7.1: "csv.g" -> "CSVParser.java"$
public interface CSVParserTokenTypes {
int EOF = 1;
int NULL_TREE_LOOKAHEAD = 3;
int NEWLINE = 4;
int RECORD = 5;
int COMMA = 6;
int WS = 7;
}
There is a class called TokenBuffer whose job it is to bu...
// Note, I have cleaned this up a little, ANTLR generate...
// { {instructions} {instructions} } which can be repres...
// { instructions instructions }
public final void file() throws RecognitionException, To...
try { // for error handling
System.out.println("file called");
int _cnt3=0;
_loop3:
do {
if ((LA(1)==RECORD)) {
line();
} else {
if ( _cnt3>=1 ) { break _loop3; }
else {throw new NoViableAltException(LT(1), g...
}
_cnt3++;
} while (true);
} catch (RecognitionException ex) {
reportError(ex);
consume();
consumeUntil(_tokenSet_0);
}
TokenBuffer provides tokens via LT and tokens via LA. Tok...
// Cleaned up a little by me
public Token nextToken() throws TokenStreamException {
Token theRetToken=null;
tryAgain:
for (;;) {
Token _token = null;
int _ttype = Token.INVALID_TYPE;
resetText();
try { // for char stream error handling
try { // for lexical error handling
switch ( LA(1)) {
case ',': {
mCOMMA(true);
theRetToken=_returnToken;
break;
}
case '\n': case '\r': {
mNEWLINE(true);
theRetToken=_returnToken;
break;
}
case '\t': case ' ': {
mWS(true);
theRetToken=_returnToken;
break;
}
default:
if ((_tokenSet_0.member(LA(1)))) {
mRECORD(true);
theRetToken=_returnToken;
} else {
if (LA(1)==EOF_CHAR) {uponEOF(); _retur...
else {throw new NoViableAltForCharExcep...
}
}
if ( _returnToken==null ) continue tryAgain; ...
_ttype = _returnToken.getType();
_ttype = testLiteralsTable(_ttype);
_returnToken.setType(_ttype);
return _returnToken;
} catch (RecognitionException e) {
throw new TokenStreamRecognitionException(e);
}
} catch (CharStreamException cse) {
if ( cse instanceof CharStreamIOException ) {
throw new TokenStreamIOException(((CharStream...
} else {
throw new TokenStreamException(cse.getMessage...
}
}
}
}
Notice the bit that says:
if (LA(1)==EOF_CHAR) {uponEOF(); _returnToken = makeToke...
else {throw new NoViableAltForCharException((char)LA(1),...
This says if the token found is an EOF_CHAR, call the upo...
The code below shows rec being assigned the Token returne...
{
rec = LT(1);
match(RECORD);
}
.
.
.
recordData+=(rec.getText());
recordData+="</td>\n";
*謝辞 [#x091c61a]
Thanks go to Bogdan Mitu for showing me the way with the ...
*参考リンク [#ve123230]
**書籍 [#uf731d80]
http://www.antlr.org/book/index.html
Practical�Computer�Language�Recognition�and�Translation
A�guide�for�building�source-to-source�translators�with�AN...
Copyright�1999�Terence�Parr
Updated�2/1/99
����������
**ANTLRに関する記事 [#j46e6a9c]
http://www.antlr.org/article/list
ANTLR articles page - lots of interesting things.
**とりあえずANTLRをはじめる方法 [#ye6b85de]
http://www.antlr.org/wiki/display/ANTLR3/FAQ+-+Getting+St...
Getting Started with ANTLR.
**ANTLRのチュートリアル [#y254c77c]
http://javadude.com/articles/antlrtut/
An ANTLR Tutorial by Scott Stanchfield
**コンパイラーに関する本 [#jfd9d1ac]
http://www.amazon.com/Compiler-Design-Theory-Systems-prog...
Compiler Theory And Design
The ANTLR Reference Manual
Comes included with the ANTLR installation
終了行:
[[構文解析の記事一覧]]
*趣旨 [#r1ba6363]
自分の英語力で何となくわかる箇所のうち、ひょっとしたら興...
、
拙い英語力で和訳を試みる。
いまのところ、はじめにの箇所を訳している
http://supportweb.cs.bham.ac.uk/docs/tutorials/docsystem/...
*目次 [#ad270d48]
#contents
*チュートリアル作者 [#u9a69d9e]
アーシュリー マイルズさん
**メール [#b9dd1bdd]
ashley@ashleymills.com
英語で質問しよう
**著作権 [#uaf48e8a]
Copyright � 2005 バーミンガム大学
*はじめに [#ff0dbbc5]
ANTLR (もうひとつの言語認識のためのツール) is a parser an...
-Java http://java.sun.com/
-C++ http://anubis.dkuug.dk/jtc1/sc22/wg21/ or
-Sather http://www.icsi.berkeley.edu/~sather/.
ANTLR は PRED-LL(k) 法のパーサを作ります。
ANTLRとは何かに対するTerrance Parrさんによる答えは、以下...
http://www.jguru.com/faq/view.jsp?EID=77
* 背景となる事情 [#j1394d6a]
ANTLR is a compiler tool hence it's developer base is gen...
**単語の切り出し [#c88f241e]
Other names: Scanner, lexical analyser, tokenizer.
Programming languages are made up of keywords, and strict...
A source program is written using some kind of editing to...
A source file is streamed to a lexer on a character by ch...
A lexer usually generates errors pertaining to sequences ...
**パーサ [#v3e1eec0]
別名を構文解析ともいう。
A lexer groups sequences of characters it recognises in t...
The parser checks to see if the tokens conform to the syn...
通常パーサは認識した単語の並びをパターンマッチするAST(Abs...
An AST is easier to translate to a target language becaus...
The parser generates one or more symbol table(s) which co...
A parser usually generates errors pertaining to sequences...
Both lexers and parsers are recognizers, lexers recognize...
It is recommended that you read Building Recognizers By H...
**ANTLRを構成している部品について [#bdfb25dd]
ANTLR lets you define the rules that the lexer should use...
*インストール方法 [#b354ba6c]
The documentation for the installation is written under t...
Obtain the ANTLR download by following the download secti...
安全な場所に解凍します。
Add, /path/to/where/you/unzipped/ANTLR/antlr.jar and /pat...
*ANTLR文法雛形 [#d2ec8620]
An ANTLR grammar file has a number of components, some of...
header {
// stuff that is placed at the top of <all> generated f...
}
options { options for entire grammar file }
{ optional class preamble - output to generated classfile
immediately before the definition of the class }
class YourLexerClass extends Lexer;
// definition extends from here to next class definition
// (or EOF if no more class defs)
options { YourOptions }
tokens...
lexer rules...
myrule[args] returns [retval]
options { defaultErrorHandler=false; }
: // body of rule...
;
{ optional class preamble - output to generated classfile
immediately before the definition of the class }
class YourParserClass extends Parser;
options { YourOptions }
tokens...
parser rules...
{ optional class preamble - output to generated classfile
immediately before the definition of the class }
class YourTreeParserClass extends TreeParser;
options { YourOptions }
tokens...
tree parser rules...
// arbitrary lexers, parsers and treeparsers may be incl...
header {
// stuff that is placed at the top of <all> generated f...
}
The textual content of the header will be copied verbatim...
{ optional class preamble - output to generated classfile
immediately before the definition of the class }
class YourLexerClass extends Lexer;
// definition extends from here to next class definition
// (or EOF if no more class defs)
options { YourOptions }
tokens...
lexer rules...
This begins with an optional class preamble, any text pla...
options {
k = 2; // Set token lookahead to...
tokenVocabulary = Befunge; // Call it's vocabulary "...
defaultErrorHandler = false; // Don't generate parser ...
}
Extensive detail pertaining to the various options availa...
tokens {
EXPR; // Imaginary token
THIS="that"; // Literal definition
INT="int"; // Literal definition
}
The lexer rules come next and have the general form:
rulename [args] returns [retval]
options { defaultErrorHandler=false; }
{ optional initiation code }
: alternative_1
| alternative_2
...
| alternative_n
;
**例: [#bfa463c7]
INT : ('0'..'9')+; // Matches an integer
There can be an arbitrary number of rules.
The other two sections have the same layout as the one de...
*ANTLRでつかうワイルドカード記号 [#m4ca8a9c]
ANTLR specifies it's lexical rules and parser rules using...
**ゼロまたは複数にマッチさせる記号* [#o233a365]
ANTLR uses the notation (expression)* to indicate that th...
**1つまたは複数にマッチさせる記号+ [#i91001b0]
ANTLR uses the notation (expression)+ to indicate that th...
**ゼロまたは1つにマッチさせる記号? [#g1653cd4]
ANTLR uses the notation (expression)? to indicate that th...
*単語切り出し例 [#z6068f24]
This example illustrates a very simple lexer that matches...
class SimpleLexer extends Lexer;
options { k=1; filter=true; }
ALPHA : ('a'..'z'|'A'..'Z')+
{ System.out.println("Found alpha: "+getText()); }
;
NUMERIC : ('0'..'9')+
{ System.out.println("Found numeric: "+getText())...
;
EXIT : '.' { System.exit(0); } ;
**解説 [#y606b0e5]
***class [#eae4a64b]
This will be explained in sections, lets begin with the f...
class SimpleLexer extends Lexer;
This is the lexer declaration, pretty straightforward, it...
***options [#lad5bd90]
options { k=1; filter=true; }
Here some basic options are set. k is set to one, k is th...
SILLY1 : "ab" ;
SILLY2 : "ac" ;
And when trying to parse a file containing these lexer ru...
warning: lexical nondeterminism between rules SILLY1 and ...
silly.g:0: k==1:'a'
Because if the lexer encountered "ab", with a lookahead o...
The filter=true option sets filtering on, which means tha...
options { filter=BLAH; }
protected BLAH : { _ttype = Token.SKIP; } ;
Has the same effect as setting filter to true. However, u...
**アルファベットのマッチ [#j6570c95]
ALPHA : ('a'..'z'|'A'..'Z')+
{ System.out.println("Found alpha: "+getText()); }
;
This is an example of a lexer rule, it matches any sequen...
**数値のマッチ [#fe3b767f]
NUMERIC : ('0'..'9')+
{ System.out.println("Found numeric: "+getText())...
;
This is very similar to the ALPHA rule but instead matche...
**終端 [#k1ff7148]
EXIT : '.' { System.exit(0); } ;
This matches the literal character '.', (when unquoted '....
**複数行の扱い方 [#vb85c9ae]
When using the filter=true option, if one is processing a...
NEWLINE : ( "\r\n" // DOS
| '\r' // MAC
| '\n' // Unix
)
{ newline();
$setType(Token.SKIP);
}
;
***無視される文字について
The ANTLR directive $setType(tokenType) is used to indica...
***Javaとかの生成やエラーについて [#n6c8e5c7]
After defining our lexer, ANTLR s ran on the source file ...
java antlr.Tool simple.g
The command displays the version information and then, if...
***警告例 [#dd5834f4]
ANTLR Parser Generator Version 2.7.1 1989-2000 jGuru...
simple.g:13: warning:did you forget to terminate previou...
The line number indicated is the start of the rule that f...
SimpleLexer.java, the lexer produced implements TokenStre...
SimpleLexerTokenTypes.java contains the token type defini...
import java.io.*;
public class Main {
public static void main(String[] args) {
SimpleLexer simpleLexer = new SimpleLexer(System.in);
while(true) {
try {
simpleLexer.nextToken();
} catch(Exception e) {}
}
}
}
A new instance of SimpleLexer is created with the constru...
All of the java files are compiled by issuing:
javac *.java
The Main.class produced is executed by issuing:
java Main
An example session using this Lexer is shown below:
This Lexer recognises strings and numbers: hello 22 good...
Found alpha: This
Found alpha: Lexer
Found alpha: recognises
Found alpha: strings
Found alpha: and
Found alpha: numbers
Found alpha: hello
Found numeric: 22
Found alpha: goodbye
Found numeric: 33
It ignores everything else: -=+/#
Found alpha: It
Found alpha: ignores
Found alpha: everything
Found alpha: else
.
This Lexer exclusively recognises alpha and numeric conte...
11aa33hi
Found numeric: 11
Found alpha: aa
Found numeric: 33
Found alpha: hi
.
That about wraps up this Lexer introduction but it should...
*シンプルな文字切り出しとパーサの例 [#ndf321e9]
A simple lexer and parser will be discussed. The Job of t...
class SimpleParser extends Parser;
entry : (d:DOB n:NAME a:AGE(SEMI)
{
System.out.println(
"Name: " +
n.getText() +
", Age: " +
a.getText() +
", DOB: " +
d.getText()
);
})*
;
class SimpleLexer extends Lexer;
NAME : ('a'..'z'|'A'..'Z')+;
DOB : ('0'..'9' '0'..'9' '/')=>
(('0'..'9')('0'..'9')'/')(('0'..'9')('0'..'9')'/')...
| ('0'..'9')+ { $setType(AGE); } ;
WS :
(' '
| '\t'
| '\r' '\n' { newline(); }
| '\n' { newline(); }
)
{ $setType(Token.SKIP); }
;
SEMI : ';' ;
The parser is the first class to be specified, however, o...
entry : (d:DOB n:NAME a:AGE(SEMI)
{
System.out.println(
"Name: " +
n.getText() +
", Age: " +
a.getText() +
", DOB: " +
d.getText()
);
})*
;
You can see that the first rule is entry, this is the rul...
DOB NAME AGE;
If it is successful in finding this sequence of tokens, t...
NAME : ('a'..'z'|'A'..'Z')+;
A simple sequence of one or more lower-case OR upper-case...
DOB : ('0'..'9' '0'..'9' '/')=>
(('0'..'9')('0'..'9')'/')(('0'..'9')('0'..'9')'/')...
| ('0'..'9')+ { $setType(AGE); } ;
If DOB and AGE had been specified like this:
DOB : ('0'..'9')('0'..'9')'/' ('0'..'9')('0'..'9')'/' ('0...
AGE : ('0'..'9')+ ;
ANTLR would have issued the warning:
warning: lexical nondeterminism between rules DOB and AG...
simple2.g:0: k==1:'0'..'9'
This is because both of the rules mentioned begin with ('...
One way around this would be to specify a parser lookahea...
options { k=3; }
This would enable the parser to look forward a maximum of...
If there are a lot of alternatives all that can be distin...
DOB : ('0'..'9' '0'..'9' '/')=>
This says that the lexer should see if it can find two di...
(('0'..'9')('0'..'9')'/')(('0'..'9')('0'..'9')'/')('0'.....
If it is successful in doing so then the token will be ma...
| ('0'..'9')+ { $setType(AGE); } ;
The implication is that if the rule is matched then the a...
WS :
(' '
| '\t'
| '\r' '\n' { newline(); }
| '\n' { newline(); }
)
{ $setType(Token.SKIP); }
;
The WS rule matches white space, hence "WS". The rule wil...
SEMI : ';' ;
This simply matches a semicolon (';').
The Lexer and Parser are setup using a main method, this ...
import java.io.*;
public class Main {
public static void main(String args[]) {
DataInputStream input = new DataInputStream(System.in);
SimpleLexer lexer = new SimpleLexer(input);
SimpleParser parser = new SimpleParser(lexer);
try {
parser.entry();
} catch(Exception e) {}
}
}
Pretty straightforward, instantiation of an input-stream ...
06/06/82 Peter 20;
03/04/83 Rosie 19;
04/05/81 Mikey 21;
After creating the classes with:
java antlr.Tool Simple2.g
And compiling everything:
java *.java
Main is invoked:
java Main < test.txt
This produces the output:
Name: Peter, Age: 20, DOB: 06/06/82
Name: Rosie, Age: 19, DOB: 03/04/83
Name: Mikey, Age: 21, DOB: 04/05/81
An error can be simulated by changing Mikey's age to "a21...
Name: Peter, Age: 20, DOB: 06/06/82
Name: Rosie, Age: 19, DOB: 03/04/83
line 3: expecting AGE, found 'a'
*文法を評価する例 [#we3cd2c9]
It has come to that time where it is necessary to step in...
class ExpressionParser extends Parser;
options { buildAST=true; }
expr : sumExpr SEMI!;
sumExpr : prodExpr ((PLUS^|MINUS^) prodExpr)* ;
prodExpr : powExpr ((MUL^|DIV^|MOD^) powExpr)* ;
powExpr : atom (POW^ atom)? ;
atom : INT ;
class ExpressionLexer extends Lexer;
PLUS : '+' ;
MINUS : '-' ;
MUL : '*' ;
DIV : '/' ;
MOD : '%' ;
POW : '^' ;
SEMI : ';' ;
protected DIGIT : '0'..'9' ;
INT : (DIGIT)+ ;
{import java.lang.Math;}
class ExpressionTreeWalker extends TreeParser;
expr returns [double r]
{ double a,b; r=0; }
: #(PLUS a=expr b=expr) { r=a+b; }
| #(MINUS a=expr b=expr) { r=a-b; }
| #(MUL a=expr b=expr) { r=a*b; }
| #(DIV a=expr b=expr) { r=a/b; }
| #(MOD a=expr b=expr) { r=a%b; }
| #(POW a=expr b=expr) { r=Math.pow(a,b); }
| i:INT { r=(double)Integer.parseInt(i.getText()); }
;
The grammar definition begins with the parser section, th...
class ExpressionParser extends Parser;
options { buildAST=true; }
The class is declared normally and then the option buildA...
expr : sumExpr SEMI!;
sumExpr : prodExpr ((PLUS^|MINUS^) prodExpr)* ;
The top level rule of the expression is expr, this simply...
The sumExpr is an expression that consists of a prodExpr ...
prodExprPLUSprodExprMINUSprodExpr
Because zero or more of these sequences must be present f...
An AST is a special kind of tree that can have an arbitra...
The PLUS and MINUS token references are postfixed with th...
The AST structure is defined in such a way that operator ...
sumExpr : prodExpr ((PLUS^|MINUS^) prodExpr)* ;
Dictates that the first and second children of a sumExpr ...
prodExpr : powExpr ((MUL^|DIV^|MOD^) powExpr)* ;
This says that a prodExpr must consist of a powExpr follo...
The children of the root will be evaluated before applyin...
powExpr : atom (POW^ atom)? ;
atom : INT ;
It can be seen that a powExpr consists of an atom followe...
powExpr : atom (POW^ atom)* ;
Because this powExpr is broken. What happens if more than...
The point is that the subtrees will be evaluated first, s...
**備考 [#e5ca8fe1]
You may find it easier to think of the AST defined here a...
Let's take another look at the parser in one block:
class ExpressionParser extends Parser;
options { buildAST=true; }
expr : sumExpr SEMI!;
sumExpr : prodExpr ((PLUS^|MINUS^) prodExpr)* ;
prodExpr : powExpr ((MUL^|DIV^|MOD^) powExpr)* ;
powExpr : atom (POW^ atom)? ;
atom : INT ;
How can an expression be just an atom? Assume we are tryi...
The first component of powExpr is an atomso the parser tr...
Figure�1.�INT SEMI Match
The red lines (or dark lines if you are in greyscale), ar...
Do the rules handle precedence correctly? Let's take a lo...
The parser can only generate the tree:
Figure�2.�1+(2*5) Binary Tree
Because a sumExpr cannot be a subtree of a prodExpr, as i...
Figure�3.�1+(2*5) AST
An AST can have an arbitrary number of children, the chil...
Figure�4.�Sum AST
Where sum is some function that accepts multiple argument...
class ExpressionLexer extends Lexer;
PLUS : '+' ;
MINUS : '-' ;
MUL : '*' ;
DIV : '/' ;
MOD : '%' ;
POW : '^' ;
SEMI : ';' ;
protected DIGIT : '0'..'9' ;
INT : (DIGIT)+ ;
These rules are pretty self-explanatory and define the to...
The next section of the grammar is the tree parser:
{import java.lang.Math;}
class ExpressionTreeWalker extends TreeParser;
expr returns [double r]
{ double a,b; r=0; }
: #(PLUS a=expr b=expr) { r=a+b; }
| #(MINUS a=expr b=expr) { r=a-b; }
| #(MUL a=expr b=expr) { r=a*b; }
| #(DIV a=expr b=expr) { r=a/b; }
| #(MOD a=expr b=expr) { r=a%b; }
| #(POW a=expr b=expr) { r=Math.pow(a,b); }
| i:INT { r=(double)Integer.parseInt(i.getText()); }
;
The line before the the class declaration is a header tha...
expr returns [double r]
{ double a,b; r=0; }
Immediately preceding the opening of the rule is a Java c...
: #(PLUS a=expr b=expr) { r=a+b; }
| #(MINUS a=expr b=expr) { r=a-b; }
| #(MUL a=expr b=expr) { r=a*b; }
| #(DIV a=expr b=expr) { r=a/b; }
| #(MOD a=expr b=expr) { r=a%b; }
| #(POW a=expr b=expr) { r=Math.pow(a,b); }
| i:INT { r=(double)Integer.parseInt(i.getText()); }
;
The rule definitions all take use the AST syntax, that is:
#(ROOT child.1 child.2 ... child.n);
The first rule says, if PLUS is found as a root, assign t...
In this case, it is obvious with a lookahead of one which...
| i:INT { r=(double)Integer.parseInt(i.getText()); }
This alternative is more interesting than the others so w...
The treeparser, is being used to walk the tree and evalua...
import java.io.*;
import antlr.CommonAST;
import antlr.collections.AST;
import antlr.debug.misc.ASTFrame;
public class Main {
public static void main(String args[]) {
try {
DataInputStream input = new DataInputStream(System....
ExpressionLexer lexer = new ExpressionLexer(input);
ExpressionParser parser = new ExpressionParser(lexe...
parser.expr();
CommonAST parseTree = (CommonAST)parser.getAST();
System.out.println(parseTree.toStringList());
ASTFrame frame = new ASTFrame("The tree", parseTree);
frame.setVisible(true);
ExpressionTreeWalker walker = new ExpressionTreeWal...
double r = walker.expr(parseTree);
System.out.println("Value: "+r);
} catch(Exception e) { System.err.println("Exception:...
}
}
First we import all the necessary classes and open the cl...
DataInputStream input = new DataInputStream(System....
A DataInputStream is setup.
ExpressionLexer lexer = new ExpressionLexer(input);
The lexer is created and told to accept data from the inp...
ExpressionParser parser = new ExpressionParser(lexe...
parser.expr();
The parser is created, using the lexer to deliver tokens....
CommonAST parseTree = (CommonAST)parser.getAST();
Remember that options { buildAST=true; } was specified? H...
System.out.println(parseTree.toStringList());
The AST is printed using the CommonAST toStringList() met...
ASTFrame frame = new ASTFrame("The tree", parseTree);
frame.setVisible(true);
A new ASTFrame is created, which is a frame designed for ...
double r = walker.expr(parseTree);
System.out.println("Value: "+r);
A new double is defined and assigned the value of the exp...
Let's go through an example expression to see what output...
1+2-3*4/5^6;
Is placed in a file called test.txt and used as input to ...
java Main < test.txt
The tree expressed as a list via toStringList() is output:
( - ( + 1 2 ) ( / ( * 3 4 ) ( ^ 5 6 ) ) ) ;
The AST comes up in the ASTFrame:
Figure�5.�AST of (1+2-3*4/5^6)
And the value is output:
Value: 2.999232
That concludes the basic expression evaluator, the next s...
*文法評価機能の拡張について [#r206b4cf]
**入れ子状の表現 [#w0dadb6b]
The old expression evaluator did not allow nesting of bra...
expr : sumExpr SEMI;
sumExpr : prodExpr ((PLUS^|MINUS^) prodExpr)*;
prodExpr : powExpr ((MUL^|DIV^|MOD^) powExpr)* ;
powExpr : atom (POW^ atom)? ;
atom : INT ;
powExpr has a higher precedence than prodExpr and is made...
#(PLUS a=expr b=expr) { r=a+b; }
See that a and b have to be evaluated first because the o...
atom : INT | expr ;
This would generate some infinite recursion errors:
expression.g:8: infinite recursion to rule sumExpr from ...
expression.g:7: infinite recursion to rule sumExpr from ...
expression.g:6: infinite recursion to rule sumExpr from ...
expression.g:5: infinite recursion to rule sumExpr from ...
expression.g:8: infinite recursion to rule sumExpr from ...
Imagine the case of an out of place token. The parser wou...
expr : LPAREN^ sumExpr RPAREN! ;
This redefinition is crucial, LPAREN and RPAREN are token...
If the out of place token arose again, the parser could n...
expr : LPAREN^ sumExpr RPAREN! ;
sumExpr : prodExpr ((PLUS^|MINUS^) prodExpr)* ;
prodExpr : powExpr ((MUL^|DIV^|MOD^) powExpr)* ;
powExpr : atom (POW^ atom)? ;
atom : INT | expr ;
So now an expr can recursively contain as many expr's as ...
| #(LPAREN a=expr) { r=a;}
This rule matches an AST with a LPAREN as root and an exp...
Eventually all leaves of the AST must be INT's, unless an...
In the normal world, the calls to expr will return a fina...
To restate: Whenever an LPAREN is encountered, the sub-ex...
An AST illustrating this should help to clarify things, h...
Figure�6.�AST of (5*(2+3))
The root of the tree is an expr signified by the open bra...
Figure�7.�Another View of (5*(2+3))
Here is the whole grammar: expression2.g. The Main method...
**備考 [#tb532217]
Adding subexpressions also solves the problem of being li...
**符号の追加 [#kfc5fd9e]
The sign operator has a higher precedence than any of the...
imaginaryTokenDefinitions :
SIGN_MINUS
SIGN_PLUS
;
expr : LPAREN^ sumExpr RPAREN! ;
sumExpr : prodExpr ((PLUS^|MINUS^) prodExpr)* ;
prodExpr : powExpr ((MUL^|DIV^|MOD^) powExpr)* ;
powExpr : signExpr (POW^ signExpr)? ;
signExpr : (
m:MINUS^ {#m.setType(SIGN_MINUS);}
| p:PLUS^ {#p.setType(SIGN_PLUS);}
)? atom ;
atom : INT | expr ;
Ignore the imaginary token business for now, this will be...
Consider the matching of the expression (-3--2). The firs...
Is the first token of signExpr a MINUS? Yes, it is! Brill...
sumExpr : prodExpr ((PLUS^|MINUS^) prodExpr)* ;
The next token of the sumExpr rule is PLUS OR MINUS, the ...
The parser has just matched the MINUS in sumExpr, it now ...
Hopefully I have not obfuscated the working of this rule....
signExpr : (
m:MINUS^ {#m.setType(SIGN_MINUS);}
| p:PLUS^ {#p.setType(SIGN_PLUS);}
)? atom ;
First of all, note that the first part of signExpr is enc...
m:MINUS^
Assigns the root node MINUS to the variable 'm'
{#m.setType(SIGN_MINUS);}
Replaces this root node in the tree with the token SIGN_M...
| #(MINUS a=expr b=expr) { r=a-b; }
In order not to have some kind of syntactic predicate to ...
imaginaryTokenDefinitions :
SIGN_MINUS
SIGN_PLUS
;
When parsing the AST, the tree parser knows exactly which...
{import java.lang.Math;}
class ExpressionTreeWalker extends TreeParser;
expr returns [double r]
{ double a,b; r=0; }
: #(PLUS a=expr b=expr) { r=a+b; }
| #(MINUS a=expr b=expr) { r=a-b; }
| #(MUL a=expr b=expr) { r=a*b; }
| #(DIV a=expr b=expr) { r=a/b; }
| #(MOD a=expr b=expr) { r=a%b; }
| #(POW a=expr b=expr) { r=Math.pow(a,b); }
| #(LPAREN a=expr) { r=a; }
| #(SIGN_MINUS a=expr) { r=-1*a; }
| #(SIGN_PLUS a=expr) { if(a<0)r=0-a; else r=a; }
| i:INT { r=(double)Integer.parseInt(i.getText()); }
;
If a SIGN_MINUS root is encountered, the desired conseque...
**備考 [#ac077c87]
As pointed out to me by Safak Oekmen, this interpretation...
Let's look at an example run through of the expression (-...
( ( ( - ( - 3 ) ( - 2 ) ) )
Value: -1.0
Here is the AST produced:
Figure�8.�(-3--2) AST
Here are some more AST's:
Figure�9.�A few AST's
**備考 [#ab49b80a]
In the parser for the expression evaluator, a top level e...
expr : LPAREN^ sumExpr RPAREN! ;
If we were reading the expression from a file, this rule ...
expr : LPAREN^ sumExpr RPAREN! EOF! ;
So that the EOF(End Of File), token is matched and the wh...
*CSV から XHTML テーブルへの変換例 [#sc41e24e]
A Lexer translates from a stream of characters to a strea...
A CSV is a very simple kind of data structure; variables ...
"STUDENT ID, "NAME, "DATE OF BIRTH
"129384, "Davy Jones, "03/04/81
"328649, "Clare Manstead, "30/11/81
"237090, "Richard Stoppit, "22/06/82
"523343, "Brian Hardwick, "15/11/81
"908423, "Sally Brush, "06/06/81
"328453, "Elisa Strudel, "12/09/82
"883632, "Peter Smith, "03/05/83
"542033, "Ryan Alcot, "04/12/80
The translator developed in this section will translate C...
The CSV file is composed of one or more lines and termina...
file : ( line (NEWLINE line)* (NEWLINE)? EOF )
A line consists of one or more records, NEWLINE is handle...
line : ( (record)+ ) ;
A record consists of a RECORD token, optionally followed ...
record : ( (r:RECORD) (COMMA)? ) ;
Notice that the last line of the CSV is an exceptional ca...
class CSVParser extends Parser;
file : ( line (NEWLINE line)* (NEWLINE)? EOF )
line : ( (record)+ ) ;
record : ( (r:RECORD) (COMMA)? ) ;
It is coupled with this lexer:
class CSVLexer extends Lexer;
options { charVocabulary='\3'..'\377'; }
RECORD : '"'! (~(','|'\r'|'\n'))+ ;
COMMA : ',' ;
NEWLINE : ('\r''\n')=> '\r''\n' //DOS
| '\r' //MAC
| '\n' //UNIX
{ newline(); }
;
WS : (' '|'\t') { $setType(Token.SKIP); } ;
First of all, charVocabulary is set to '\3'..'\377', this...
RECORD is an example of an "everything except" rule and s...
������Dave,����21
������Richard,�55
������Peter,���98
����
Was used, then the records would be "Dave",
"����21"
, "Richard", "55 ", "Peter" and
"����98"
, which is probably not what is desired. Of course, one c...
If this lexer and parser were used to process a CSV file,...
class CSVParser extends Parser;
options { k=2; }
file {System.out.println("file called");}
: ( line (NEWLINE line)* (NEWLINE)? EOF)
{System.out.println("file matched");}
;
line {System.out.println("line called");}
: ( (record)+ )
{System.out.println("line matched");}
;
record {System.out.println("record called");}
: ( (r:RECORD) (COMMA)? )
{System.out.println("record = " + r.getText());
System.out.println("record matched");}
;
Notice the lookahead of two to distinguish between (NEWLI...
"David Sprocket, "89
"Cindy Brocket, "18
"Michael Rocket, "33
The output that was produced is shown below (reformatted ...
file called
line called
record called
record = David Sprocket
record matched
record called
record = 89
record matched
line matched
line called
record called
record = Cindy Brocket
record matched
record called
record = 18
record matched
line matched
line called
record called
record = Michael Rocket
record matched
record called
record = 33
record matched
line matched
file matched
The ANTLR grammar so far can be downloaded here: csv1.g. ...
class CSVParser extends Parser;
options { k=2; }
file {System.out.println("<table align=\"center\" bord...
: ( line (NEWLINE line)* (NEWLINE)? EOF)
{System.out.println("</table>");}
;
line {System.out.println(" <tr>");}
: ( (record)+ )
{System.out.println(" </tr>");}
;
record {System.out.print(" <td>");}
: ( (r:RECORD) (COMMA)? )
{System.out.print(r.getText());
System.out.println("</td>");}
;
The output produced when processing the same file as befo...
<table align="center" border="1">
<tr>
<td>David Sprocket</td>
<td>89</td>
<tr/>
<tr>
<td>Cindy Brocket</td>
<td>18</td>
<tr/>
<tr>
<td>Michael Rocket</td>
<td>33</td>
<tr/>
</table>
Which is what was intended of the translator. Apart from ...
import java.io.*;
public class Main {
public static void main(String args[]) {
if(args.length==0) { error(); }
FileInputStream fileInput = null;
try {
fileInput = new FileInputStream(args[0]);
} catch(Exception e) { error(); }
try {
DataInputStream input = new DataInputStream(file...
CSVLexer csvLexer = new CSVLexer(input);
CSVParser csvParser = new CSVParser(csvLexer);
csvParser.file();
} catch(Exception e) { System.err.println(e.getMess...
}
private static void error() {
System.out.println("*-----------------------*");
System.out.println("| USAGE: |");
System.out.println("| java Main inputfile |");
System.out.println("*-----------------------*");
System.exit(0);
}
}
The program first checks that the one compulsory command ...
The Main program used to run the translator can be downlo...
The full grammar file can be downloaded here: csv2.g.
Some test data can be downloaded here: test.txt
At the moment the parser outputs to stdout, this is not v...
class CSVParser extends Parser;
options { k=2; }
file returns[String table = new String()]
{String lineData; table+="<table align=\"center\" ...
: ( lineData=line {table+=lineData;}
(NEWLINE lineData=line {table+=lineData;} )*
(NEWLINE)? EOF )
{table+="</table>";}
;
line returns [String lineData = new String()]
{String recordData; lineData+=" <tr>\n";}
: ( (recordData=record {lineData+=recordData;})+ )
{lineData+=" <tr/>\n";}
;
record returns [String recordData = new String()]
{recordData+=(" <td>");}
: ( (rec:RECORD) (COMMA)? )
{recordData+=(rec.getText());
recordData+="</td>\n";}
;
The full grammar can be downloaded here: csv3.g.
Within file: Immediately the opening table element is app...
line returns [String lineData = new String()]
{String recordData; lineData+=" <tr>\n";}
: ( (recordData=record {lineData+=recordData;})+ )
{lineData+=" <tr/>\n";}
;
Within line: Immediately the opening <tr> element is appe...
record returns [String recordData = new String()]
{recordData+=(" <td>");}
: ( (rec:RECORD) (COMMA)? )
{recordData+=(rec.getText());
recordData+="</td>\n";}
;
Within record: Immediately the opening td element is appe...
line receives recordData and this is used in the construc...
file receives lineData and this is used in the constructi...
Here is the new main method containing class:
import java.io.*;
public class CSVhtml {
public static void main(String args[]) {
if(args.length!=2) { error(); }
FileInputStream fileInput = null;
DataOutputStream fileOutput = null;
try {
fileInput = new FileInputStream(args[0]);
fileOutput = new DataOutputStream(new FileOutput...
} catch(Exception e) { error2(); }
try {
DataInputStream input = new DataInputStream(file...
CSVLexer csvLexer = new CSVLexer(input);
CSVParser csvParser = new CSVParser(csvLexer);
String p ="";
p ="<?xml version=\"1.0\" encoding=\"utf-8\"?>\n";
p+="<!DOCTYPE html\n";
p+="PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional/...
p+="\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-tra...
p+="<html>\n";
p+=" <head>\n";
p+=" <title>A Table</title>\n";
p+=" <meta http-equiv=\"Content-Type\" conte...
p+=" </head>\n";
p+=" <body>\n";
p+= csvParser.file();
p+=" </body>\n";
p+="</html>";
fileOutput.writeBytes(p);
fileOutput.close();
} catch(Exception e) { System.err.println(e.getMess...
}
private static void error() {
System.out.println("*------------------------------...
System.out.println("| USAGE: ...
System.out.println("| java CSVhtml inputfile outp...
System.out.println("*------------------------------...
System.exit(0);
}
private static void error2() {
System.out.println("*------------------------------...
System.out.println("| You must specify a valid inpu...
System.out.println("*------------------------------...
System.exit(0);
}
}
The program shown above can be downloaded here: CSVHTML.j...
After the grammar is converted to a lexer, parsed by ANTL...
java CSVHTML inputfile outputfile
The table produced from executing the command:
java CSVHTML test.txt output.html
Looks, under a certain proprietary web-browser, like this:
Figure�10.�test.txt expressed as a table
*Snippets From Behind The Scenes [#g621b422]
When the lexer tokenizes the input stream, each token enc...
// $ANTLR 2.7.1: "csv.g" -> "CSVParser.java"$
public interface CSVParserTokenTypes {
int EOF = 1;
int NULL_TREE_LOOKAHEAD = 3;
int NEWLINE = 4;
int RECORD = 5;
int COMMA = 6;
int WS = 7;
}
There is a class called TokenBuffer whose job it is to bu...
// Note, I have cleaned this up a little, ANTLR generate...
// { {instructions} {instructions} } which can be repres...
// { instructions instructions }
public final void file() throws RecognitionException, To...
try { // for error handling
System.out.println("file called");
int _cnt3=0;
_loop3:
do {
if ((LA(1)==RECORD)) {
line();
} else {
if ( _cnt3>=1 ) { break _loop3; }
else {throw new NoViableAltException(LT(1), g...
}
_cnt3++;
} while (true);
} catch (RecognitionException ex) {
reportError(ex);
consume();
consumeUntil(_tokenSet_0);
}
TokenBuffer provides tokens via LT and tokens via LA. Tok...
// Cleaned up a little by me
public Token nextToken() throws TokenStreamException {
Token theRetToken=null;
tryAgain:
for (;;) {
Token _token = null;
int _ttype = Token.INVALID_TYPE;
resetText();
try { // for char stream error handling
try { // for lexical error handling
switch ( LA(1)) {
case ',': {
mCOMMA(true);
theRetToken=_returnToken;
break;
}
case '\n': case '\r': {
mNEWLINE(true);
theRetToken=_returnToken;
break;
}
case '\t': case ' ': {
mWS(true);
theRetToken=_returnToken;
break;
}
default:
if ((_tokenSet_0.member(LA(1)))) {
mRECORD(true);
theRetToken=_returnToken;
} else {
if (LA(1)==EOF_CHAR) {uponEOF(); _retur...
else {throw new NoViableAltForCharExcep...
}
}
if ( _returnToken==null ) continue tryAgain; ...
_ttype = _returnToken.getType();
_ttype = testLiteralsTable(_ttype);
_returnToken.setType(_ttype);
return _returnToken;
} catch (RecognitionException e) {
throw new TokenStreamRecognitionException(e);
}
} catch (CharStreamException cse) {
if ( cse instanceof CharStreamIOException ) {
throw new TokenStreamIOException(((CharStream...
} else {
throw new TokenStreamException(cse.getMessage...
}
}
}
}
Notice the bit that says:
if (LA(1)==EOF_CHAR) {uponEOF(); _returnToken = makeToke...
else {throw new NoViableAltForCharException((char)LA(1),...
This says if the token found is an EOF_CHAR, call the upo...
The code below shows rec being assigned the Token returne...
{
rec = LT(1);
match(RECORD);
}
.
.
.
recordData+=(rec.getText());
recordData+="</td>\n";
*謝辞 [#x091c61a]
Thanks go to Bogdan Mitu for showing me the way with the ...
*参考リンク [#ve123230]
**書籍 [#uf731d80]
http://www.antlr.org/book/index.html
Practical�Computer�Language�Recognition�and�Translation
A�guide�for�building�source-to-source�translators�with�AN...
Copyright�1999�Terence�Parr
Updated�2/1/99
����������
**ANTLRに関する記事 [#j46e6a9c]
http://www.antlr.org/article/list
ANTLR articles page - lots of interesting things.
**とりあえずANTLRをはじめる方法 [#ye6b85de]
http://www.antlr.org/wiki/display/ANTLR3/FAQ+-+Getting+St...
Getting Started with ANTLR.
**ANTLRのチュートリアル [#y254c77c]
http://javadude.com/articles/antlrtut/
An ANTLR Tutorial by Scott Stanchfield
**コンパイラーに関する本 [#jfd9d1ac]
http://www.amazon.com/Compiler-Design-Theory-Systems-prog...
Compiler Theory And Design
The ANTLR Reference Manual
Comes included with the ANTLR installation
ページ名: