List of parsing articles

Purpose

I thought it would be nice if at least the titles of the page below were in Japanese, so I am attempting a translation: http://www.antlr.org/wiki/display/ANTLR3/ANTLR+v3+FAQ

Contents

API documentation by language

Java API

http://www.antlr.org/api/Java/index.html

C API

http://www.antlr.org/api/C/index.html

Python API

http://www.antlr.org/api/Python/index.html

C# API

Broken link

ActionScript API

http://www.antlr.org/api/ActionScript/index.html

Downloadable examples

On the following site, look for the sections labeled "Examples": http://www.antlr.org/download.html

FAQ - Actions

Executing actions during backtracking

ANTLR must parse ahead to see if something matches. If it fails, then ANTLR tries the next viable alt. Upon failure, it's pretty hard to undo actions in general so ANTLR gates actions out with something like this:

if ( backtracking==0 ) {
    // ... user action; the statement is truncated in this copy:
    // ...(SymbolTable.getPredefinedType("void"), symtab.getDefaultPkg());
}

Labels are still defined, because semantic preds might need them:

packageDefinition1=packageDefinition();

AST actions are off during backtracking:

if ( backtracking==0 ) list_packageDefinition.add(packageDefinition1.tree);

Upon success, ANTLR still rewinds the input and then does the same parse again "with feeling".

Clearly you don't want actions executed when that alt ultimately doesn't succeed. Similarly, when the alt does succeed, you do not want actions executed twice. This all makes sense.
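The gating idea above can be sketched in plain Java. This is an illustrative model, not ANTLR's actual generated code; the names (`backtracking`, `log`, `speculate`) are invented for the sketch:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of how a backtracking parser gates user actions:
// 'backtracking' counts nested speculative parses; user actions fire
// only at depth 0, i.e., during the real ("with feeling") parse.
public class GatedActions {
    static int backtracking = 0;
    static List<String> log = new ArrayList<>();

    static void rule() {
        // ... token matching would happen here ...
        if (backtracking == 0) {        // the gate
            log.add("user action");
        }
    }

    // Try the rule speculatively; actions inside are suppressed.
    static boolean speculate() {
        backtracking++;
        try {
            rule();
            return true;                // the alt would match
        } finally {
            backtracking--;
        }
    }

    public static void main(String[] args) {
        speculate();                    // backtracking parse: no action runs
        rule();                         // real parse: action runs once
        System.out.println(log.size()); // prints 1
    }
}
```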

The problem arises when you want actions to execute during backtracking so that semantic predicates make sense during backtracking. If you are combining backtracking (syntactic predicates) and semantic predicates for tough languages like C++ then you must execute actions during the backtrack but then avoid them during the parse.

To be finished when I know how to solve...

Why doesn't my own header show up in the lexer the way it does in the parser?

Actions are specific to one grammar. When using a combined parser/lexer grammar, use

// applies only to the parser:
@header {package foo;}

// applies only to the lexer:
@lexer::header {package foo;}

not just the @header.

When using a separate (non-combined) parser, @header applies to the parser. Likewise, when using a separate (non-combined) lexer, @header applies to the lexer (it is not necessary to use @lexer::header in this case).
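Putting both directives together, a minimal combined grammar that places the generated parser and lexer into package foo might look like this (the grammar name and rules are made up for illustration):

```antlr
grammar Foo;

@header        { package foo; }   // emitted into FooParser.java
@lexer::header { package foo; }   // emitted into FooLexer.java

r  : ID ;
ID : ('a'..'z')+ ;
WS : (' '|'\t'|'\n')+ { $channel = HIDDEN; } ;
```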

Command-line options

Where does ANTLR write its output files?

Added by Terence Parr, last edited by Terence Parr on Oct 18, 2009.

ANTLR writes files to the current directory by default. However, the output filename is sensitive to the output directory and to the directory where the grammar file was found. If you reference a grammar with a (relative or absolute) path, ANTLR will pull the grammar from that directory and write the output to that directory:

$ cd ~parrt
$ java org.antlr.Tool T.g       # write to ~parrt/TParser.java
$ java org.antlr.Tool /foo/T.g  # write to /foo/TParser.java

If you specify an output directory with -o then ANTLR will put the output files in that directory or underneath if you have a relative path on the grammar file:

$ java org.antlr.Tool -o /tmp T.g      # write to /tmp/TParser.java
$ java org.antlr.Tool -o /tmp sub/T.g  # write to /tmp/sub/TParser.java

The output directory -o value takes precedence over the grammar's path when the output directory is absolute.

$ java org.antlr.Tool -o /tmp /usr/lib/T.g  # write to /tmp/TParser.java
$ java org.antlr.Tool -o ick /usr/lib/T.g   # write to ick/TParser.java 
$ java org.antlr.Tool -o . /usr/lib/T.g     # write to TParser.java 

Use the -fo option to force output into a specific directory, ignoring any path on the input grammar file name.

$ java org.antlr.Tool -fo /tmp T.g           # write to /tmp/TParser.java
$ java org.antlr.Tool -fo /tmp sub/T.g       # write to /tmp/TParser.java
$ java org.antlr.Tool -fo /tmp /usr/lib/T.g  # write to /tmp/TParser.java

Note: if the output directory set by -o does not exist, it will be created.

Debugging

How do I broadcast debug events to multiple listeners?

How do I debug a tree grammar using ANTLRWorks?

How do I set up remote debugging anyway?

When do I need to use remote debugging?

FAQ - Error handling

How can I print out the text of the line where an error occurs?

FAQ - General

How do I make ANTLRWorks and Visual Studio work together? (CSharp as target language)

How do I rebuild ANTLR v3?

How do I turn off the warnings that antlr generated classes show?

What is ANTLR?

What is the difference between ANTLR v2 and v3?

Added by Terence Parr, last edited by Terence Parr on May 05, 2007. [See also Migrating from ANTLR 2 to ANTLR 3#NewinANTLR3]

There are a number of things that have changed for the better in v3, including:

a brand-new, very powerful extension to LL(k) called LL(*)
an auto-backtracking mode
partial parsing result memoization to increase the speed of backtracking
a really nice AST rewrite rule mechanism
integration of the StringTemplate template engine for generating structured text
improved error reporting and recovery
a truly retargetable code generator that makes it easy to build backends ("targets"); current code generation targets are Java, C#, C, Objective-C, and Python, with others in development
a BSD license

Perhaps most importantly, Jean Bovet built the amazing ANTLRWorks grammar development environment.

Reference: ANTLRWorks

http://www.antlr.org/works/index.html

For ANTLR v3 I decided to make the most common tasks easy by default rather than fast by default. This means that some of the basic objects are heavier weight than some speed demons would like, but they are free to pare it down, leaving most programmers the luxury of having it "just work." For example, reading in some input, tweaking it, and writing it back out while preserving whitespace is easy in v3.

LL(*) parsing

ANTLR v3 does not require you to specify a lookahead depth (k option in v2). ANTLR's new LL(*) algorithm allows parser lookahead to roam arbitrarily far ahead looking for a token or sequence that disambiguates a decision. LL(k) can only look a fixed k symbols ahead. It is the difference between using a cyclic DFA and an acyclic DFA to make lookahead decisions.

method : type ID '(' arg* ')' ';'
      | type ID '(' arg* ')' '{' body '}'
      ;

This is not LL(k) for any finite k: because of the arg*, lookahead from the left edge must roam arbitrarily far ahead to see the ';' vs the '{'. If you have actions after ID, you can't easily refactor the rule. ANTLR v3 handles this with no problem as long as rule arg is not recursive: it builds a tight little DFA to scan ahead looking for ';' vs '{'.

Auto-backtracking

Sometimes even LL(*) will not be powerful enough to handle a particular decision or multiple decisions within a grammar. Rather than have to add a bunch of syntactic predicates in those decisions as you would in v2, you can turn on the auto backtracking feature (set option backtrack=true). This feature essentially tells ANTLR that if it fails to analyze a decision, simply figure it out at run time by backtracking across the alternatives within that decision.

Take the method rule from the previous section. If rule arg is now recursive (as it is in C), ANTLR will fail to build a DFA because DFA have no stacks. A full parser is required to match the argument list. Just turn on the auto backtracking feature and ANTLR will gladly accept just about any non-left-recursive grammar.

grammar C;
options {
 backtrack=true;
}
method : type ID '(' arg* ')' ';'
      | type ID '(' arg* ')' '{' body '}'
      ;

AST rewrite rules

v3 introduces a really nice way to build ASTs. The following example demonstrates the AST rewrite mechanism. Anything after -> is considered a tree grammar. ANTLR figures out how to map the parser grammar to tree grammar.

packageDefinition
	:	'package' classname ';' -> ^('package' classname)
	;

This builds a tree with 'package' at the root and classname as its first child. Here's another example:

formalArgs
	:	typename declarator (',' typename declarator )*
		-> ^(ARG typename declarator)+
	;

This builds tree sequences like:

^(ARG int v1) ^(ARG int v2)
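A tree grammar can later consume those ARG subtrees with the same ^(...) notation. The fragment below is a hypothetical sketch; typename and declarator are assumed to be defined elsewhere in the tree grammar:

```antlr
formalArgs
    :   ^(ARG typename declarator)+
    ;
```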

StringTemplate integration

ANTLR v3 tightly integrates the StringTemplate template engine, which is useful for generating any kind of structured text such as source code or XML. In the following grammar, the rule matches a simple assignment statement and implicitly returns a template.

The template is an instance of the assign template pulled from a group of templates (set by you in your main()). The arguments to the template are a series of template attribute assignments. This is the interface between the template mechanism and the parser attribute mechanism.

assign_statement : ID '=' INT ';' -> assign(x={$ID.text},y={$INT.text}) ;
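The assign template itself lives in a template group that your main() loads. A minimal sketch of such a group file follows; the group name, the $...$ delimiter choice, and the ":=" output format are assumptions for illustration, not part of the original FAQ:

```text
group Assign;

assign(x,y) ::= "$x$ := $y$;"
```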

Here's a rule that creates a template called 'import' for each import definition found in the input stream:

grammar Java;
options {
  output=template;
}
...
importDefinition
   :   'import' identifierStar SEMI
       -> import(name={$identifierStar.st},
               begin={$identifierStar.start},
               end={$identifierStar.stop})
   ;

The attributes are set via assignments in the argument list. The arguments are actions with arbitrary expressions in the target language. The .st label property is the result template from a rule reference. There is a nice shorthand in actions too:

%x.y = z; // set attribute y of x to z

(Languages like Python that have no ';' must still use the ';' here, which the code generator is free to remove during code generation.) This is the same as '(x).setAttribute("y", z);'.

See Template construction for more information.

Reference: the StringTemplate engine

http://www.stringtemplate.org/

A five-minute introduction to StringTemplate

http://www.antlr.org/wiki/display/ST/Five+minute+Introduction

Integrated lexers and parsers

Lexers are much easier thanks to the LL(*) algorithm as well. Previously, these two lexer rules would cause trouble because ANTLR couldn't distinguish between them with finite lookahead to see the decimal point:

INT : ('0'..'9')+ ;
FLOAT : INT '.' INT ;

The syntax is almost identical for features in common, but note that labels are always '=', not ':'. So write id=ID, not id:ID.

You can do combined lexer/parser grammars again (a la PCCTS): both lexer and parser rules are defined in the same file. See the examples. Really nice. You can reference strings and characters in the grammar, and ANTLR will generate the lexer for you.

Semantic predicate hoisting

Introduced in ANTLR v1, semantic predicate hoisting is back in v3 (it was absent in v2).

Attributes

The attribute structure has been enhanced. Rules may have multiple return values, for example. Further, there are dynamically scoped attributes, whereby a rule may define a value usable by any rule it invokes directly or indirectly, without having to pass a parameter all the way down.
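For example, a rule with multiple return values and a rule with a dynamically scoped attribute might look like the following hypothetical sketch (rule and attribute names are invented):

```antlr
// Two return values from one rule:
decl returns [String name, String typeName]
    :   t=type ID { $name = $ID.text; $typeName = $t.text; }
    ;

// A dynamically scoped attribute: any rule invoked (directly or
// indirectly) from 'block' can read and write $block::symbols.
block
scope { List symbols; }
    :   { $block::symbols = new ArrayList(); } '{' stmt* '}'
    ;
```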

finally block

Code in the finally {...} action executes in the generated finally block (Java target) after all other work, such as rule memoization. Example:

foo : FOO ;
    finally { System.out.println("on the way out of foo"); }

Source code

The ANTLR source code is much prettier, and there are about 800 unit tests so far.

You'll also note that the run-time classes are conveniently encapsulated in the org.antlr.runtime package.

FAQ - Getting Started

Building ANTLR Projects with Maven

How do I install this damn thing?

How do I use ANTLR v3 from the command line?

How do I use ANTLR v3 generated Lexer and Parser from Java?

Where do I find this damn thing?

FAQ - Grammar analysis

How to get a list of all valid options for the next token?

FAQ - Lexical analysis

Can I see a more complete example?

How can I allow keywords as identifiers?

How can I emit an error token upon lexical error?

How can I emit more than a single token per lexer rule?

How can I make the lexer exit upon first lexical error?

How do I access hidden tokens from parser?

How do I combine fuzzy parsing and stream rewriting?

How do I fetch tokens on demand not all at once up front?

How do I get case insensitivity?

How do I handle abbreviated keywords?

How do I implement include files?

How do I match multi-line comments?

How do I selectively ignore tokens depending on parser context?

How do I strip quotes?

How do I use a custom token object?

Lexing floating point numbers, dot operator and range operator

What is the intended behavior of the lexer?

FAQ - Parsing

Why does the generated parser code tolerate illegal expressions?

FAQ - Runtime libraries

FAQ - Translation

FAQ - Tree construction

Can you explain ANTLR's tree construction facilities?

How can I build a different AST node type for each token type?

How can I build parse trees not ASTs?

How do I display ASTs graphically?

How do I track whitespace, comments, and other hidden channels during AST construction?

What does the error 'cannot find tokenRefBangTrack.st' mean?

FAQ - Tree Parsing

How do I get detailed tree parser error messages?

Tree Parsing in the Eclipse Modeling Framework with Ecore Metamodel

Why do I get a ClassCastException when parsing a tree?

Why do I get a null pointer exception when I try to reference text attributes in my tree parser?

Last-modified: 2011-01-30 (Sun) 22:18:46