|
MollyPages.org
"You were wrong case. To live here is to live." |
Pages /
Database /
Forms /
Servlet /
Javadocs
/
License & Download /
Tutorials /
Cookbook
/
Contact & Hire
|
This suggests a recursive-descent procedure.
If our molly grammar was working with tokens, it may look like:
A -> code_open_tok foo code_close_tok -> exp_open_tok foo exp_close_tokwhere, for example, the code_open_tok is '[[' and exp_open_tok is '[='
The above grammar is already LL(1). Since we are working with code_open or exp_open tokens already, we don't have to transform the grammar - here the fact that exp_open and code_open share a common prefix of '[' is irrelevant because the regular-expression tokenizer will disambiguate between them.
Generally, we can move the analysis of tokens into the grammar itself. For example, to recognize the following sentences:
Our grammar looks like:[[ foo ]] [ bar ]
To make this LL(1), our grammar now looks like:A -> [[ foo ]] B -> [ bar ]
Since there is no nesting heirachy to these 'tokens' the above can be parsed with a regular grammar. A combined tokenizer/parser approach is simpler for small grammars like molly pages. The tokenizer looks at as many characters as it needs to disambiguate tokens (for example, it reads 2 characters to distinguish between [[ and [!).A -> [ B B -> [ C ]] -> D ]
Based on the type of identified token, we use recursive-descent and call a method to further parse the construct that starts with that token.
The molly grammar is:
S -> (HTML_Text | Molly)*
Molly -> Expression | Code | Declaration | Comment | DirectiveOrHdoc
Expression -> ExpOpen EText ExpClose
Declaration -> DeclarationOpen EText DeclarationClose
Comment -> CommentOpen EText CommentClose
Code -> CodeOpen EText|Hash CodeClose
Hash -> HashOpen EText|Expression HashClose
HTML_Text -> Any chars. Any tags that open molly sections must be escaped.
EText -> Any chars. The closing tag (if any) must be escaped
within EText.
DeclarationOpen -> [!
DeclarationClose -> !]
CodeOpen -> [[
CodeClose -> ]]
ExpOpen -> [=
ExpClose -> ]
HashOpen -> #
HashClose -> #
DeclarationOpen -> [include
[include-file
[include-decl
[forward
[import
DeclarationClose -> ]
CommentOpen -> [/*
CommentClose -> */]
DirectiveOrHdoc -> [page Directive|Heredoc
Directive -> DirectiveOpen DirectiveClose
Heredoc -> HeredocOpen ] HeredocClose
DirectiveOpen -> .. a=b, a = "c d" .. etc..
DirectiveClose -> ]
HeredocOpen -> <<<EText ]
HeredocClose -> EText (same as starting EText)
Since there are no ε-productions, we don't have to examine follow sets.
A simple visual inspection shows that all first sets are clearly
disjoint.
This is a very simple grammar !
The entire parser is found in the class PageParser.java in the molly package web.page. The syntax and start-end tokens can be easily changed and customized to one's individual liking.