I thought it would be nice if at least the titles on the following page were in Japanese, so I am attempting a Japanese translation: http://www.antlr.org/wiki/display/ANTLR3/ANTLR+v3+FAQ
http://www.antlr.org/api/Java/index.html
http://www.antlr.org/api/C/index.html
http://www.antlr.org/api/Python/index.html
Broken link
http://www.antlr.org/api/ActionScript/index.html
On the following site, look for the section labeled "Examples": http://www.antlr.org/download.html
ANTLR must parse ahead to see if something matches. If it fails, then ANTLR tries the next viable alt. Upon failure, it's pretty hard to undo actions in general so ANTLR gates actions out with something like this:
if ( backtracking==0 ) { SymbolTable.getPredefinedType("void"), symtab.getDefaultPkg()); }
Labels are still defined, because semantic preds might need them:
packageDefinition1=packageDefinition();
AST actions are off during backtracking:
if ( backtracking==0 ) list_packageDefinition.add(packageDefinition1.tree);
Upon success, ANTLR still rewinds the input and then does the same parse again "with feeling".
Clearly you don't want actions executed when that alt ultimately doesn't succeed. Similarly, when the alt does succeed, you do not want actions executed twice. This all makes sense.
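The gating described above can be illustrated with a small hand-written parser. This is only a sketch, not ANTLR's generated code: the class name GatedActionsSketch, the token strings, and the actionLog list are hypothetical, and the grammar is reduced to 'package' ID ';'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written parser illustrating action gating; not ANTLR output.
final class GatedActionsSketch {
    private final String[] tokens;
    private int pos = 0;
    private int backtracking = 0;                  // > 0 while speculating
    final List<String> actionLog = new ArrayList<>();

    GatedActionsSketch(String... tokens) { this.tokens = tokens; }

    private boolean match(String expected) {
        if (pos < tokens.length && tokens[pos].equals(expected)) { pos++; return true; }
        return false;
    }

    // packageDefinition : 'package' ID ';'
    private boolean packageDefinition() {
        if (!match("package")) return false;
        if (pos >= tokens.length) return false;
        String name = tokens[pos++];               // the label is always assigned
        if (!match(";")) return false;
        if (backtracking == 0) {                   // the action is gated
            actionLog.add("define " + name);
        }
        return true;
    }

    // Speculate first, rewind, then parse again "with feeling".
    boolean parse() {
        int mark = pos;
        backtracking++;
        boolean ok = packageDefinition();          // speculative pass, actions off
        backtracking--;
        pos = mark;                                // rewind even on success
        return ok && packageDefinition();          // real pass, actions on
    }

    public static void main(String[] args) {
        GatedActionsSketch p = new GatedActionsSketch("package", "foo", ";");
        System.out.println(p.parse() + " " + p.actionLog);
    }
}
```

The action runs once, on the second pass, even though the rule body is executed twice.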
The problem arises when you want actions to execute during backtracking so that semantic predicates make sense during backtracking. If you are combining backtracking (syntactic predicates) and semantic predicates for tough languages like C++ then you must execute actions during the backtrack but then avoid them during the parse.
To be finished when I know how to solve...
Actions are specific to one grammar. When using a combined parser/lexer grammar, use
// applies only to the parser:
@header {package foo;}
// applies only to the lexer:
@lexer::header {package foo;}
not just the @header.
When using a separate (non-combined) parser, @header applies to the parser. Likewise, when using a separate (non-combined) lexer, @header applies to the lexer (it is not necessary to use @lexer::header in this case).
Added by Terence Parr, last edited by Terence Parr on Oct 18, 2009
ANTLR writes files to the current directory by default. However, the output location depends on both the output directory and the directory where the grammar file was found. If you reference a grammar with a (relative or absolute) path, ANTLR will pull the grammar from that directory and write the output to that directory:
$ cd ~parrt
$ java org.antlr.Tool T.g        # write to ~parrt/TParser.java
$ java org.antlr.Tool /foo/T.g   # write to /foo/TParser.java
If you specify an output directory with -o then ANTLR will put the output files in that directory or underneath if you have a relative path on the grammar file:
$ java org.antlr.Tool -o /tmp T.g       # write to /tmp/TParser.java
$ java org.antlr.Tool -o /tmp sub/T.g   # write to /tmp/sub/TParser.java
The output directory -o value takes precedence over the grammar's path when the output directory is absolute.
$ java org.antlr.Tool -o /tmp /usr/lib/T.g   # write to /tmp/TParser.java
$ java org.antlr.Tool -o ick /usr/lib/T.g    # write to ick/TParser.java
$ java org.antlr.Tool -o . /usr/lib/T.g      # write to TParser.java
Use the -fo option to force output to go explicitly into a directory, ignoring any path on the input grammar name.
$ java org.antlr.Tool -fo /tmp T.g            # write to /tmp/TParser.java
$ java org.antlr.Tool -fo /tmp sub/T.g        # write to /tmp/TParser.java
$ java org.antlr.Tool -fo /tmp /usr/lib/T.g   # write to /tmp/TParser.java
Note: If the outputDir set by -o is not present it will be created.
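The rules above can be summarized as one small function. This is a sketch that reproduces the command-line examples in this section, not the code inside org.antlr.Tool; the class OutputDirSketch and the method outputDir are hypothetical names, and Unix-style paths are assumed.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper mirroring the -o / -fo examples in the text (Unix-style paths assumed).
final class OutputDirSketch {
    /**
     * @param outDir      value of -o or -fo, or null if neither was given
     * @param force       true if the directory was given with -fo
     * @param grammarPath path to the grammar file as typed on the command line
     * @return the directory the generated files would be written to
     */
    static String outputDir(String outDir, boolean force, String grammarPath) {
        Path grammar = Paths.get(grammarPath);
        Path grammarDir = grammar.getParent();     // null for a bare "T.g"
        if (outDir == null) {
            // No -o: write next to the grammar ("." when there is no path).
            return grammarDir == null ? "." : grammarDir.toString();
        }
        if (force || grammar.isAbsolute() || grammarDir == null) {
            // -fo, an absolute grammar path, or no grammar path: use outDir as-is.
            return outDir;
        }
        // -o with a relative grammar path: write underneath outDir.
        return Paths.get(outDir).resolve(grammarDir).toString();
    }

    public static void main(String[] args) {
        System.out.println(outputDir("/tmp", false, "sub/T.g"));
        System.out.println(outputDir("/tmp", true, "sub/T.g"));
    }
}
```

Note that the sketch follows the examples, in which any -o value wins over an absolute grammar path, including the relative values ick and ".".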
Added by Terence Parr, last edited by Terence Parr on May 05, 2007 [See also Migrating from ANTLR 2 to ANTLR 3#NewinANTLR3]
A number of things have changed for the better in v3, including:
- a brand-new, very powerful extension to LL(k) called LL(*)
- an auto backtracking mode
- partial parsing result memoization to increase the speed of backtracking
- a really nice AST rewrite rule mechanism
- integration of the StringTemplate template engine for generating structured text
- improved error reporting and recovery
- a truly retargetable code generator that makes it easy to build backends ("targets"); currently we have the following Code Generation Targets: Java, C#, C, Objective-C, Python, with others in development
- BSD license

Perhaps most importantly, Jean Bovet built the amazing ANTLRWorks grammar development environment.
http://www.antlr.org/works/index.html
For ANTLR v3 I decided to make the most common tasks easy by default. This means that some of the basic objects are heavier weight than some speed demons would like, but they are free to pare them down, leaving most programmers the luxury of having it "just work." For example, reading in some input, tweaking it, and writing it back out while preserving whitespace is easy in v3.
ANTLR v3 does not require you to specify a lookahead depth (k option in v2). ANTLR's new LL(*) algorithm allows parser lookahead to roam arbitrarily far ahead looking for a token or sequence that disambiguates a decision. LL(k) can only look a fixed k symbols ahead. It is the difference between using a cyclic DFA and an acyclic DFA to make lookahead decisions.
method
    : type ID '(' arg* ')' ';'
    | type ID '(' arg* ')' '{' body '}'
    ;
This is not LL(k) for any finite k. From the left edge, the lookahead needed to see the ';' vs '{' is unbounded. We need arbitrary lookahead because of the arg*. If you have actions after ID, you can't easily refactor the rule. ANTLR v3 handles this with no problem as long as rule arg is not recursive. ANTLR builds a tight little DFA to scan ahead looking for ';' vs '{'.
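To make the cyclic DFA idea concrete, here is a hand-written sketch of such a lookahead decision. It is not what ANTLR generates: the class MethodLookaheadSketch is a hypothetical name, tokens are plain strings, and each arg is assumed to be a single token so that arg* becomes a loop on one state.

```java
import java.util.List;

// Hypothetical cyclic lookahead DFA for rule "method"; assumes each arg is one token.
final class MethodLookaheadSketch {
    // Returns 1 for the ';' alternative, 2 for the '{' alternative, 0 if neither applies.
    static int predict(List<String> tokens) {
        int state = 0;
        for (String t : tokens) {
            switch (state) {
                case 0: state = 1; break;                         // type
                case 1: state = 2; break;                         // ID
                case 2:                                           // '('
                    if (!t.equals("(")) return 0;
                    state = 3;
                    break;
                case 3:                                           // arg* : the cycle
                    if (t.equals(")")) state = 4;
                    break;
                case 4:                                           // the deciding token
                    if (t.equals(";")) return 1;
                    if (t.equals("{")) return 2;
                    return 0;
                default:
                    return 0;
            }
        }
        return 0;                                                 // ran out of input
    }

    public static void main(String[] args) {
        System.out.println(predict(List.of("int", "f", "(", "a", "b", ")", ";")));
        System.out.println(predict(List.of("int", "f", "(", "a", "b", ")", "{", "body", "}")));
    }
}
```

State 3 loops on itself for every argument token, which is why the number of arguments does not matter. A fixed-k lookahead would need one state per argument position.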
Sometimes even LL(*) will not be powerful enough to handle a particular decision or multiple decisions within a grammar. Rather than have to add a bunch of syntactic predicates in those decisions as you would in v2, you can turn on the auto backtracking feature (set option backtrack=true). This feature essentially tells ANTLR that if it fails to analyze a decision, simply figure it out at run time by backtracking across the alternatives within that decision.
Take the method rule from the previous section. If rule arg is now recursive (as it is in C), ANTLR will fail to build a DFA because DFAs have no stack. A full parser is required to match the argument list. Just turn on the auto backtracking feature and ANTLR will gladly accept just about any non-left-recursive grammar.
grammar C;
options { backtrack=true; }
method
    : type ID '(' arg* ')' ';'
    | type ID '(' arg* ')' '{' body '}'
    ;
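What backtracking across alternatives means at run time can be sketched by hand. The following is an illustration only, not ANTLR's generated code: the class AutoBacktrackSketch is hypothetical, type, ID and body are each reduced to a single word token, and arg is given the made-up recursive definition arg : word | '(' arg ')'.

```java
// Hypothetical recursive-descent sketch of trying alternatives in order with rewind.
final class AutoBacktrackSketch {
    private final String[] tokens;
    private int pos = 0;

    AutoBacktrackSketch(String... tokens) { this.tokens = tokens; }

    private boolean match(String s) {
        if (pos < tokens.length && tokens[pos].equals(s)) { pos++; return true; }
        return false;
    }

    // Stand-in for type, ID and body: any token that starts with a letter.
    private boolean word() {
        if (pos < tokens.length && Character.isLetter(tokens[pos].charAt(0))) { pos++; return true; }
        return false;
    }

    // arg : word | '(' arg ')' ;  recursive, so a DFA alone cannot skip over it
    private boolean arg() {
        if (word()) return true;
        int mark = pos;
        if (match("(") && arg() && match(")")) return true;
        pos = mark;                                // undo partial consumption
        return false;
    }

    // type ID '(' arg* ')'
    private boolean header() {
        if (!(word() && word() && match("("))) return false;
        while (arg()) { /* consume arg* */ }
        return match(")");
    }

    private boolean alt1() { return header() && match(";"); }
    private boolean alt2() { return header() && match("{") && word() && match("}"); }

    // Try each alternative from the same starting point; returns 1, 2, or 0 if none match.
    int method() {
        int mark = pos;
        if (alt1()) return 1;
        pos = mark;                                // rewind before the next attempt
        if (alt2()) return 2;
        pos = mark;
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(new AutoBacktrackSketch(
            "int", "f", "(", "(", "(", "x", ")", ")", "y", ")", "{", "b", "}").method());
    }
}
```

The cost is that the shared prefix is parsed again for each attempted alternative, which is what the memoization mentioned earlier is meant to reduce.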
v3 introduces a really nice way to build ASTs. The following example demonstrates the AST rewrite mechanism. Anything after -> is considered a tree grammar. ANTLR figures out how to map the parser grammar to tree grammar.
packageDefinition
    : 'package' classname ';' -> ^('package' classname)
    ;
This builds a tree with 'package' at the root and classname as its first child. Here's another example:
formalArgs
    : typename declarator (',' typename declarator )*
      -> ^(ARG typename declarator)+
    ;
This builds tree sequences like:
^(ARG int v1) ^(ARG int v2)
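As a rough picture of what the ^(ARG typename declarator)+ rewrite produces, the sketch below builds one ARG-rooted tree per (typename, declarator) pair. It does not use the ANTLR runtime tree classes; RewriteSketch, Node and formalArgs are hypothetical names, and the printing format simply mimics the ^(...) notation used above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree builder mimicking the effect of -> ^(ARG typename declarator)+
final class RewriteSketch {
    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String text, Node... kids) {
            this.text = text;
            children.addAll(List.of(kids));
        }

        // Leaves print as their text; subtrees print as ^(root child child ...)
        @Override public String toString() {
            if (children.isEmpty()) return text;
            StringBuilder sb = new StringBuilder("^(").append(text);
            for (Node c : children) sb.append(' ').append(c);
            return sb.append(')').toString();
        }
    }

    // Each element of args is a {typename, declarator} pair.
    static List<Node> formalArgs(String[][] args) {
        List<Node> trees = new ArrayList<>();
        for (String[] a : args) {
            trees.add(new Node("ARG", new Node(a[0]), new Node(a[1])));
        }
        return trees;
    }

    public static void main(String[] args) {
        for (Node n : formalArgs(new String[][] {{"int", "v1"}, {"int", "v2"}})) {
            System.out.println(n);
        }
    }
}
```

The ',' tokens never appear in the result because the rewrite does not mention them.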
ANTLR v3 tightly integrates the StringTemplate template engine, which is useful for generating any kind of structured text such as source code or XML. In the following grammar, the rule matches a simple assignment statement and implicitly returns a template.
The template is an instance of the assign template pulled from a group of templates (set by you in your main()). The arguments to the template are a series of template attribute assignments. This is the interface between the template mechanism and the parser attribute mechanism.
assign_statement
    : ID '=' INT ';' -> assign(x={$ID.text},y={$INT.text})
    ;
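The attribute assignments x=... and y=... amount to filling named holes in a template. The sketch below imitates that with plain string replacement. It is a stand-in and deliberately does not use the StringTemplate library or its API; the class AssignTemplateSketch, the render method, and the template text "$x$ = $y$;" are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a template with named attributes; not the StringTemplate API.
final class AssignTemplateSketch {
    // Assumed body of the "assign" template, with $name$ marking an attribute.
    static final String ASSIGN = "$x$ = $y$;";

    // Replace each $name$ placeholder with the corresponding attribute value.
    static String render(String template, Map<String, String> attrs) {
        String out = template;
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            out = out.replace("$" + e.getKey() + "$", e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("x", "i");      // plays the role of $ID.text
        attrs.put("y", "42");     // plays the role of $INT.text
        System.out.println(render(ASSIGN, attrs));
    }
}
```

The parser only supplies attribute values. How they are laid out as text is decided by the template, which is why the same grammar can emit different output formats by switching template groups.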
Here's a rule that creates a template called 'import' for each import definition found in the input stream:
grammar Java;
options { output=template; }
...
importDefinition
    : 'import' identifierStar SEMI
      -> import(name={$identifierStar.st},
                begin={$identifierStar.start},
                end={$identifierStar.stop})
    ;
The attributes are set via assignments in the argument list. The arguments are actions with arbitrary expressions in the target language. The .st label property is the result template from a rule reference. There is a nice shorthand in actions too:
... to z [languages like Python without ';' must still use the ';', which the code generator is free to remove during code generation]
Same as '(x).setAttribute("y", z);'
See Template construction for more information.
http://www.stringtemplate.org/
http://www.antlr.org/wiki/display/ST/Five+minute+Introduction
Lexers are much easier due to the LL(*) algorithm as well. Previously these two lexer rules would cause trouble because ANTLR couldn't distinguish between them with finite lookahead to see the decimal point:
INT : ('0'..'9')+ ;
FLOAT : INT '.' INT ;

The syntax is almost identical to v2 for features in common, but note that labels always use '=' rather than ':'. So write id=ID, not id:ID.
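For the INT and FLOAT lexer rules above, the decision comes down to scanning past an arbitrary run of digits to see whether a decimal point follows. A hand-written sketch of that decision, with hypothetical names NumberLexerSketch and tokenType, could look like this:

```java
// Hypothetical hand-written version of the INT vs FLOAT lexer decision.
final class NumberLexerSketch {
    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    // Returns "FLOAT" or "INT" for the number starting at pos, or null if none starts there.
    static String tokenType(String input, int pos) {
        int i = pos;
        while (i < input.length() && isDigit(input.charAt(i))) i++;   // unbounded scan
        if (i == pos) return null;                                    // no leading digit
        boolean dotThenDigit = i + 1 < input.length()
                && input.charAt(i) == '.'
                && isDigit(input.charAt(i + 1));
        return dotThenDigit ? "FLOAT" : "INT";
    }

    public static void main(String[] args) {
        System.out.println(tokenType("123", 0));
        System.out.println(tokenType("123.45", 0));
    }
}
```

Because the digit run can be any length, no fixed lookahead depth is enough to reach the '.', which is the problem the text describes for finite lookahead.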
You can do combined lexer/parser grammars again (à la PCCTS): both lexer and parser rules are defined in the same file. See the examples. Really nice. You can reference strings and characters in the grammar and ANTLR will generate the lexer for you.
Semantic predicate hoisting: Introduced in ANTLR v1, semantic predicate hoisting is back in v3 (it was absent in v2).
Attributes: The attribute structure has been enhanced. Rules may have multiple return values, for example. Further, there are dynamically scoped attributes, whereby a rule may define a value usable by any rule it invokes directly or indirectly without having to pass a parameter all the way down.
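The idea behind dynamically scoped attributes can be pictured as a stack that the defining rule pushes on entry and pops on exit, with nested rules reading the top entry. The sketch below is only an analogy in plain Java, not the code ANTLR generates; DynamicScopeSketch and its rule methods are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration of a dynamically scoped attribute using an explicit stack.
final class DynamicScopeSketch {
    // One entry per active invocation of method(); holds the attribute "name".
    private final Deque<String> methodScope = new ArrayDeque<>();

    String method(String name) {
        methodScope.push(name);            // define the attribute on entry
        try {
            return block();                // no parameter is passed down
        } finally {
            methodScope.pop();             // scope ends when the rule returns
        }
    }

    private String block() { return statement(); }

    // Invoked indirectly by method(), yet it can still see the attribute.
    private String statement() {
        return "statement in " + methodScope.peek();
    }

    public static void main(String[] args) {
        System.out.println(new DynamicScopeSketch().method("f"));
    }
}
```

Using a stack rather than a single field keeps the value correct when the defining rule is invoked recursively.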
finally block: Code in the finally {...} action executes in the finally block (Java target) after everything else, such as rule memoization. Example:
foo : FOO ;
finally { System.out.println("on the way out of foo"); }
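In plain Java terms, the effect is comparable to the following try/finally shape. This is a simplified sketch rather than the method ANTLR generates for rule foo: the class FinallySketch, the log list, and the "memoize foo" entry standing in for the rule's bookkeeping are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shape of a rule method whose finally action runs last.
final class FinallySketch {
    final List<String> log = new ArrayList<>();

    void foo() {
        try {
            log.add("match FOO");                  // the rule body
        } finally {
            log.add("memoize foo");                // stand-in for rule bookkeeping
            log.add("on the way out of foo");      // the user's finally action
        }
    }

    public static void main(String[] args) {
        FinallySketch p = new FinallySketch();
        p.foo();
        System.out.println(p.log);
    }
}
```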
Source code: The ANTLR source code is much prettier and there are about 800 unit tests so far.
You'll also note that the run-time classes are conveniently encapsulated in the org.antlr.runtime package.