"You were wrong case.
To live here is to live."
This suggests a recursive-descent procedure.
If our molly grammar was working with tokens, it may look like:
A -> code_open_tok foo code_close_tok -> exp_open_tok foo exp_close_tokwhere, for example, the code_open_tok is '[[' and exp_open_tok is '[='
The above grammar is already LL(1). Since we are working with code_open or exp_open tokens already, we don't have to transform the grammar - here the fact that exp_open and code_open share a common prefix of '[' is irrelevant because the regular-expression tokenizer will disambiguate between them.
Generally, we can move the analysis of tokens into the grammar itself. For example, to recognize the following sentences:
Our grammar looks like:[[ foo ]] [ bar ]
To make this LL(1), our grammar now looks like:A -> [[ foo ]] B -> [ bar ]
Since there is no nesting heirachy to these 'tokens' the above can be parsed with a regular grammar. A combined tokenizer/parser approach is simpler for small grammars like molly pages. The tokenizer looks at as many characters as it needs to disambiguate tokens (for example, it reads 2 characters to distinguish between [[ and [!).A -> [ B B -> [ C ]] -> D ]
Based on the type of identified token, we use recursive-descent and call a method to further parse the construct that starts with that token.
The molly grammar is:
S -> (HTML_Text | Molly)* Molly -> Expression | Code | Declaration | Comment | DirectiveOrHdoc Expression -> ExpOpen EText ExpClose Declaration -> DeclarationOpen EText DeclarationClose Comment -> CommentOpen EText CommentClose Code -> CodeOpen EText|Hash CodeClose Hash -> HashOpen EText|Expression HashClose HTML_Text -> Any chars. Any tags that open molly sections must be escaped. EText -> Any chars. The closing tag (if any) must be escaped within EText. DeclarationOpen -> [! DeclarationClose -> !] CodeOpen -> [[ CodeClose -> ]] ExpOpen -> [= ExpClose -> ] HashOpen -> # HashClose -> # DeclarationOpen -> [include [include-file [include-decl [forward [import DeclarationClose -> ] CommentOpen -> [/* CommentClose -> */] DirectiveOrHdoc -> [page Directive|Heredoc Directive -> DirectiveOpen DirectiveClose Heredoc -> HeredocOpen ] HeredocClose DirectiveOpen -> .. a=b, a = "c d" .. etc.. DirectiveClose -> ] HeredocOpen -> <<<EText ] HeredocClose -> EText (same as starting EText)Since there are no ε-productions, we don't have to examine follow sets. A simple visual inspection shows that all first sets are clearly disjoint. This is a very simple grammar !
The entire parser is found in the class PageParser.java in the molly package web.page. The syntax and start-end tokens can be easily changed and customized to one's individual liking.