[[JAVAの記事一覧]]

[[構文解析の記事一覧]]


&topicpath;

*趣旨 [#x0113801]
jparsecの日本語訳のページがなかったので
適当に気が向いたときに翻訳してみるページです。

わかりやすいと思った記事を勝手に追加するのを良しとします。

英語はそんなに得意じゃないんだけどね。

ソースコード読めばなんとなくわかるから、そっちの方がいいかも。

と、おもったけど、きのせいだったみたい。

*目次 [#bca26b28]
#contents

*オリジナルページURL [#z8ecfe5c]
http://jparsec.codehaus.org/jparsec+Overview

*jparsec 概要 [#w06934a5]
In a typical parser program written in jparsec, programmer creates a bunch of Parser objects and combines them together. These Parser objects each represent a piece of parsing logic.
*jparsec の概要 [#w06934a5]
jparsecで記述された典型的な構文解析プログラムで、プログラマーは構文解析木やそれらを寄せ集めた束を作ることができます。これらのパーサはそれぞれ部分部分の構文の解析をします。

*始め方 [#h1540338]
jparsecは、パーサ生成コードを元にパーサを構築します。パーサを作ればあとは次のようにパースすることができます。

With jparsec, one constructs Parser object in terms of the production rule of the grammar. Once a Parser object is created, it can be used as in:

 parser.parse("code to parse");
Depending on your need, this return value can be either the calculation result or an abstract syntax tree.
あなたのお好みに合わせて、戻り値は計算結果や構文解析木にすることができます。

So how does one create Parser object? The following are the most important classes:
では、どのようにパーサを作るのでしょうか?下記に最も重要なクラスを上げます

-Parser
**Parserクラス [#m29d1670]
A parser encapsulates a piece of parsing logic and simple Parser objects can be combined to create more complex parser.
-Parsers
Common parser implementation.
-Scanners
A scanner is a parser that scans the source string and recognizes certain patterns.
-Terminals
Provides tokenizers and lexers for common terminals such as identifier, integer, scientific number etc.
-OperatorTable
Supports operator precedence grammar. Programmer declares operators in an OperatorTable and the framework will take care of constructing a full-blown expression parser.

*What are the top 5 combinators that I need to familiarize myself with? [#hff6279d]
**Parsersクラス [#o43d4ecd]
パーサが共通に実装してます。

**or [#oa4441a3]
This is the logical alternative combinator. A production rule "A ::= B|C|D" can be represented as:
**Scannersクラス [#u3d6113c]
スキャナーが解析対象のソースコードの文字列をスキャンしてパターンによって認識に認識します。

**Terminalsクラス [#l6183fab]
提供するトークナイザは、識別子、整数、科学的な数などの一般的な末端の字句解析器です。

**OperatorTableクラス [#n44bf222]
演算子の構文における優先順位を担当します。 プログラマは演算子を演算子一覧表に使用する演算子を宣言します。そうすることでフレームワークは本格的なパーサの構築する手助けをします。

*パーサ定義をこのフークワーク用に記述し直す例のトップ5[#hff6279d]

**orの記述 A ::= B|C|D [#oa4441a3]
これは論理的などちらかを選択する概念です。次のルール 
 A ::= B|C|D
は次のように記述できます。
 Parser<Foo> a = Parsers.or(b, c, d);
**next/sequence [#cd7b12ba]
This is the sequential combinator. Production rule "A ::= BCD" can be represented as:

**シーケンスの記述  A ::= BCD[#cd7b12ba]
これは、シーケンスのコンビネータです。 
生成ルール
 "A ::= BCD" 
は次のように記述できます。
 Parser<Foo> a = Parsers.sequence(b,c,d);
With alternative and sequential combinator, one can start building sophisticated parsers.

**map/sequence [#f99b7d60]
When constructing a parser, we typically want to not only recognize a certain grammar, but also to build some object or perform some computation based on the recognized grammar. This family of map/sequence combinators can be used to perform such computation. For example, in order to use the  parser result of B, C, D to create an object of A, one can implement the callback interface Map3, which accepts the parser result of B, C and D as input parameter and returns the A object as result.
パーサを作る時, we typically want to not only recognize a certain grammar, but also to build some object or perform some computation based on the recognized grammar. This family of map/sequence combinators can be used to perform such computation. For example, in order to use the  parser result of B, C, D to create an object of A, one can implement the callback interface Map3, which accepts the parser result of B, C and D as input parameter and returns the A object as result.

Implementing anonymous class for the Map interfaces could be verbose though. A convenience Mapper class is provided to simplify the syntax. It requires additional dependency on cglib.
many/many1
These combinators implement the "kleene star" and "kleene cross" logic in BNF. "A ::= B*" is represented as:

**many/many1 [#c02acfad]
These combinators implement the "kleene star" and "kleene cross" logic in BNF.

 "A ::= B*"
は次のように記述できます。
 
 Parser<Foo> foo = ...;
 Parser<Void> a = foo.skipMany();
or
または

 Parser<Foo> foo = ...;
 Parser<List<Foo>> a = foo.many();
where the latter will additionally return a list of Foo object as the parser result.

**lazy [#e177fd23]
Production rules can involve recursion. (for example, an expression with binary operators is represented recursively in production rule). The lazy combinator allows a parser to reference a parser that will be set later.
**lazy 後での評価 [#e177fd23]
生成規則は再帰処理を記述可能です。 (例えば, an expression with binary operators is represented recursively in production rule). 後で評価するコンビネータはパーサがパーサが設定されるあとで参照されます。

*Lexical analysis vs. syntactical analysis [#h7557774]
*字句解析対構文解析 [#h7557774]

In a simple scenario, all work can be done in the scanning phase. For example:
簡単な例では, スキャン段階ではすべて動作します。 例えば:

 Parser<List<String>> numbers = Scanners.INTEGER.sepBy(Scanners.isChar(','));
 assertEquals(Arrays.asList("1", "2", "3"), numbers.parse("1,2,3"));
However, when the complexity of the grammar rule scales up and there are whitespaces and comments to ignore from the grammar, one-pass parsing becomes awkward. A 2-pass approach can then be used. That is, a lexical analysis phase scans the source as a list of Tokens and then a second syntactical analysis phase parses the tokens.
しかしながら, 構文の規則が複雑になる時 、それと、スペース文字やコメントを無視するようになる時、解析がぎこちなくなります。次の段階でトークンの解析をします。 

The Terminals class provides common tokenizers that scans the source string and turns them into tokens. It also provides corresponding syntactic parsers that recognize these tokens in the syntactical analysis phase.
Terminalsクラスは共通のトークン化処理を提供しており、ソース文字列をスキャンしてトークン化します. It also provides corresponding syntactic parsers that recognize these tokens in the syntactical analysis phase.

A syntactical parser takes a list of tokens as input, this list needs to come from the output of a lexer. The Parser.from() API can be used to chain a syntactical parser with a lexer.

*What are the typical steps in creating a conventional 2-pass parser? [#ob164d22]
*2パスパーサの典型的な構築手順 [#ob164d22]

**Step 1: Terminals [#t492dde1]
**手順 1: Terminals 終端 [#t492dde1]

Use the pre-defined tokenizers and terminal syntactical parsers in Terminals to define the atoms of your language.

For example, the following parser parses a list of integers separated by a comma, with hitespaces and block comments ignored.

 Terminals operators = Terminals.operators(","); // only one operator supported so far 
 Parser<?> integerTokenizer = Terminals.IntegerLiteral.TOKENIZER;
 Parser<String> integerSyntacticParser = Terminals.IntegerLiteral.PARSER;
 Parser<?> ignored = Parsers.or(Scanners.JAVA_BLOCK_COMMENT, Scanners.WHITESPACES);
 Parser<?> tokenizer = Parsers.or(operators.tokenizer(), integerTokenizer); // tokenizes the operators and integer
 Parser<List<String>> integers = integerSyntacticParser.sepBy(operators.token(","))
    .from(tokenizer, ignored.skipMany());
 assertEquals(Arrays.asList("1", "2", "3"), integers.parse("1, /*this is comment*/2, 3");
**Step 2: Production rule [#w9c81006]

The next step is to build the syntactical parser following production rules. The "integers" parser used above is a simple example. Real parsers can be arbitrarily complex. For operator precedence grammar, OperatorTable can be used to declare operator precedences and associativities and construct parser based on the declaration.
**手順2: 生成ルール [#w9c81006]

次の手順では以下にしめす生成ルールで文法解析機を作ります。The "integers" parser used above is a simple example. Real parsers can be arbitrarily complex. For operator precedence grammar, OperatorTable can be used to declare operator precedences and associativities and construct parser based on the declaration.

As in most recursive descent parsers, left-recursion needs to be avoided. Beware not to write a parser like this:

 Parser.Reference<Expr> ref = Parser.newReference();
 Parser<Expr> expr = Parsers.sequence(ref.lazy(), operators.token("+"), number); // left recursion!
ref.set(expr);
It will fail with stack overflow!

A less obvious left-recursion is a production rule that looks like:

 Parser.Reference<Expr> ref = Parser.newReference();
 Parser<Expr> expr = Parsers.sequence(operators.token("-").many(), ref.lazy());
ref.set(expr);
 ref.set(expr);
As many can occur 0 times, we have a potential left recursion here.

Although left recursive grammar isn't generally supported, the most common case of left recursion stems from left associative binary operator, which is handled by OperatorTable.

Tips
*便利技 [#f3702c8c]

Please see jparsec Tips for tips and catches.

http://jparsec.codehaus.org/jparsec+Tips

*Haskell版persecのwikiはこちら [#w16a8de9]
http://www.haskell.org/haskellwiki/Parsec

トップ   新規 一覧 単語検索 最終更新   ヘルプ   最終更新のRSS