A parser combinator library with optional memoization and convenient syntax.
Parse streams are an abstraction to treat ports as proper streams so that we can backtrack from previous states. A single Parse-Stream record represents a single buffered chunk of text.Create a parse stream open on the givenfilename
, with a
possibly already opened port
.Open filename
and create a parse stream on it.Create a parse stream on a string str
.Access the next buffered chunk of a parse stream.Returns true iff i
is the first character position in the
parse stream source
.Returns true iff i
is the last character position in the
parse stream source
.Returns the character in parse stream source
indexed by
i
.Returns a string composed of the characters starting at parse
stream s0
index i0
(inclusive), and ending at s1
index i1
(exclusive).
Combinator to indicate failure.Call the parser combinator f
on the parse stream
source
, starting at index index
, passing the result to
the given success continuation sk
, which should be a
procedure of the form (result source index fail)
. The
optional failure continuation should be a procedure of the form
(source index reason)
, and defaults to just returning
#f
.Call the parser combinator f
on the parse stream
source
, at index index
, and return the result, or
#f
if parsing fails.Call the parser combinator f
on the parse stream
source
, at index index
. If the entire source is not
parsed, raises an error, otherwise returns the result.The fundamental parse iterator. Repeatedly applies the parser
combinator f
to source
, starting at index
, as
long as a valid parse is found. On each successful parse applies
the procedure kons
to the parse result and the previous
kons
result, beginning with knil
. If no parses
succeed returns knil
.Parse as many of the parser combinator f
from the parse
stream source
, starting at index
, as possible, and
return the result as a list.As parse->list
but requires the entire source be parsed
with no left over characters, signalling an error otherwise.Return a new parser combinator with the same behavior as f
,
but on failure replaces the reason with reason
. This can be
useful to provide more descriptive parse failure reasons when
chaining combinators. For example, parse-string
just
expects to parse a single fixed string. If it were defined in
terms of parse-char
, failure would indicate some char
failed to match, but it's more useful to describe the whole string
we were expecting to see.
Parse nothing successfully.Parse any single character successfully. Fails at end of input.Always fail to parse.The disjunction combinator. Returns the first combinator that
succeeds parsing from the same source and index.The conjunction combinator. If both f
and g
parse
successfully starting at the same source and index, returns the
result of g
. Otherwise fails.The negation combinator. If f
succeeds, fails, otherwise
succeeds with #t
.The sequence combinator. Each combinator is applied in turn just
past the position of the previous. If all succeed, returns a list
of the results in order, skipping any ignored values.Convert the list of parser combinators ls
to a
parse-seq
sequence.The optional combinator. Parse the combinator f
(in
sequence with any additional combinator args o
), and return
the result, or parse nothing successully on failure.The repetition combinator. Parse f
repeatedly and return a
list of the results. lo
is the minimum number of parses
(deafult 0) to be considered a successful parse, and hi
is
the maximum number (default infinite) before stopping.Parse f
one or more times.Parse f
and apply the procedure proc
to the result on success.Parse f
and apply the procedure proc
to the substring
of the parsed data. proc
defaults to the identity.Parses the same streams as f
but ignores the result on
success. Inside a parse-seq
the result will not be
included in the list of results. Useful for discarding
boiler-plate without the need for post-processing results.Parse with f
and further require check?
to return true
when applied to the result.Parse with f
once and keep the first result, not allowing
further backtracking within f
.Parse with f
once, keep the first result, and commit to the
current parse path, discarding any prior backtracking options.
Returns true iff index
is the first index of the first parse
stream source
.Returns true iff index
is the last index of the last parse
stream source
.Returns true iff source
, index
indicate the beginning
of a line (or the entire stream).Returns true iff source
, index
indicate the end of a
line (or the entire stream).Returns true iff source
, index
indicate the beginning
of a word (or the entire stream).Returns true iff source
, index
indicate the end of a
word (or the entire stream).Parse the combinator word
(default a parse-token
of
char-alphabetic?
or underscores), ensuring it begins and
ends on a word boundary.As parse-word
, but instead of an arbitrary word
combinator takes a character predicate pred
(conjoined with
char-alphabetic?
or underscore), and parses a sequence of
those characters with parse-token
. Returns the parsed
substring.Parse a single char which matches x
, which can be a
character, character set, or arbitrary procedure.Parse a single char which does not match x
, which can be a
character, character set, or arbitrary procedure.Parse the exact string str
.Parse a sequence of characters matching x
as with
parse-char
, and return the resulting substring.We provide an extended subset of SRE syntax (see
SRFI 115),
taking advantage of more general parsing features. These are just
translated directly into parser combinators, with characters and
strings implicitly matching themselves. For example, '(or
"foo" "bar")
matches either of the strings "foo"
or
"bar"
. Existing parser combinators may be embedded directly.
This is of course more powerful than SREs since it is not
restricted to regular languages (or in fact any languages), though
it does not provide the same performance guarantees.
A delayed combinator. This is equivalent to the parser combinator
f
, but is delayed so it can be more efficient if never used
and f
is expensive to compute. Moreover, it can allow
self-referentiality as in:
(letrec* ((f (parse-lazy (parse-or (parse-seq g f) h))))
...)
Parse the same strings as f
, but memoize the result at each
source and index to avoid exponential backtracking. name
is
provided for debugging only.
The four basic interfaces are grammar
,
define-grammar
, and their unmemoized variants
grammar/unmemoized
and
define-grammar/unmemoized
. This is optimized for the
common case - generally you want to memoize grammars, and may or
may not want to memoize the smaller lexical components.
Describe an grammar for the given named rules
and return the
rule named init
. The rules are parser combinators which
match the first clause
which succeeds, and returns the
corresponding action. Each clause
is an SRE parser as in
parse-sre
, which may include embdedded parser combinators
with unquote
(,). In particular, the rule itself and any
other rules can be referenced in this way. The optional
action
, which defaults to the normal result of the clause
parser, is a normal Scheme expression with all ->
named
expressions in clause bound to the corresponding result.
Alternately, action
can be of the form => receiver
to send the results directly to a success continuation as in
call-with-parse
.
Equivalent to grammar
but memoizes each clause. Parsers
nested within each clause are not automatically memoized, so if
necessary should be memoized explicitly or split out into separate
rules.
Similar to grammar/unmemoized
, instead of returning a
single entry point parser defines each rule
as its own
parser. Also defines name
as an alist mapping rule names to
their values.
The memoized version of define-grammar/unmemoized
.
Example:
(define-grammar calc
(space ((* ,(parse-char char-whitespace?))))
(number ((-> n (+ ,(parse-char char-numeric?)))
(string->number (list->string n))))
(simple ((-> n ,number) n)
((: "(" (=> e1 ,term) ")") e1))
(term-op ("*" *)
("/" /)
("%" modulo))
(term ((: (-> e1 ,simple) ,space (-> op ,term-op) ,space (-> e2 ,term))
(op e1 e2))
((-> e1 ,simple)
e1)))
(parse term "12 / (2*3)")
=> 2