This section describes a functional interface for building regular
expressions and matching them against strings.
The matching is done using the POSIX regular expression package.
Regular expressions are in the structure regexps.
A regular expression is either a character set, which matches any character in the set, or a composite expression containing one or more subexpressions. A regular expression can be matched against a string to determine success or failure, and to determine the substrings matched by particular subexpressions.
Character sets may be defined using a list of characters and strings, using a range or ranges of characters, or by using set operations on existing character sets.
(set character-or-string ...) -> char-set
(range low-char high-char) -> char-set
(ranges low-char high-char ...) -> char-set
(ascii-range low-char high-char) -> char-set
(ascii-ranges low-char high-char ...) -> char-set
Set returns a set that contains the character arguments and the
characters in any string arguments. Range returns a character
set that contains all characters between low-char and high-char,
inclusive. Ranges returns a set that contains all characters in
the given ranges. Range and ranges use the ordering induced by
char->integer. Ascii-range and ascii-ranges use the
ASCII ordering.
It is an error for a high-char to be less than the preceding
low-char in the appropriate ordering.
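The constructors above can be sketched as follows. This is a minimal illustration, assuming the regexps structure has been opened; the names vowels, a-to-z, and hex-chars are made up for the example.

```scheme
; Assumes the regexps structure is open, e.g. via  ,open regexps
; at the Scheme 48 command prompt.

; Characters and strings may be mixed freely in a call to set.
(define vowels (set #\a #\e "iou"))

; All characters from a through z, inclusive.
(define a-to-z (range #\a #\z))

; Several low/high pairs in a single call.
(define hex-chars (ranges #\0 #\9 #\a #\f #\A #\F))

; A character set is itself a regular expression, so it can be
; matched directly against a one-character string.
(exact-match? vowels "e")   ; should return #t
(exact-match? a-to-z "Q")   ; should return #f, matching is case-sensitive
```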
(negate char-set) -> char-set
(intersection char-set char-set) -> char-set
(union char-set char-set) -> char-set
(subtract char-set char-set) -> char-set
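As a hedged sketch of the set operations, the definitions below combine the predefined sets lower-case and numeric with sets built by set. The names on the left are hypothetical, and the example assumes that subtract removes the members of its second argument from its first, which the signatures alone do not state.

```scheme
; Assumes the regexps structure is open (,open regexps).

; Lower-case letters that are not vowels
; (assumed argument order: first set minus second set).
(define consonants (subtract lower-case (set "aeiou")))

; Letters, digits, and underscore.
(define identifier-char
  (union (union lower-case upper-case)
         (union numeric (set #\_))))

; Any character that is not a decimal digit.
(define non-digit (negate numeric))

; Characters that are both lower-case letters and hex digits: a through f.
(define lower-hex-letters
  (intersection lower-case (set "0123456789abcdef")))
```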
The following character sets are predefined:
lower-case    (set "abcdefghijklmnopqrstuvwxyz")
upper-case    (set "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
alphabetic    (union lower-case upper-case)
numeric       (set "0123456789")
alphanumeric  (union alphabetic numeric)
punctuation   (set "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~")
graphic       (union alphanumeric punctuation)
printing      (union graphic (set #\space))
control       (negate printing)
blank         (set #\space (ascii->char 9)) ; 9 is tab
whitespace    (union (set #\space) (ascii-range 9 13))
hexdigit      (set "0123456789abcdefABCDEF")
The above are taken from the default locale in POSIX. The characters in
whitespace are space, tab, newline (= line feed), vertical tab, form feed, and
carriage return.
String-start returns a regular expression that matches the beginning
of the string being matched against; string-end returns one that matches
the end.
Sequence matches the concatenation of its arguments; one-of matches
any one of its arguments.
Text returns a regular expression that matches the characters in
string, in order.
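The composite constructors described above might be combined as in this small sketch. It assumes that string-start is called with no arguments, which the text implies but does not show; pet-id and starts-with-cat are names invented for the example.

```scheme
; Assumes the regexps structure is open (,open regexps).

; "cat" or "dog", followed by a single digit.
(define pet-id
  (sequence (one-of (text "cat") (text "dog"))
            numeric))

(exact-match? pet-id "dog7")          ; should return #t
(any-match? pet-id "my cat3 sleeps")  ; should return #t
(any-match? pet-id "catfish")         ; should return #f, no digit follows

; Anchor the pattern to the beginning of the string
; (assumed: string-start takes no arguments).
(define starts-with-cat
  (sequence (string-start) (text "cat")))

(any-match? starts-with-cat "catalog")  ; should return #t
(any-match? starts-with-cat "bobcat")   ; should return #f
```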
Repeat returns a regular expression that matches repeated
occurrences of its reg-exp argument. With no count the result
will match any number of times, including zero (reg-exp*). With a single
count the returned expression will match
reg-exp exactly that number of times.
With both min and max, it will match from min to max
repetitions, inclusive.
Max may be #f, in which case there
is no maximum number of matches.
Count and min should be exact, non-negative integers;
max should either be an exact non-negative integer or #f.
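The three forms of repeat could look like the sketch below. The argument order (counts before the reg-exp) is an assumption, since the text names the arguments but does not show the call forms; the defined names are hypothetical.

```scheme
; Assumes the regexps structure is open (,open regexps).
; Assumed argument order: counts first, regular expression last.

(define any-digits   (repeat numeric))        ; zero or more digits
(define three-digits (repeat 3 numeric))      ; exactly three digits
(define two-to-four  (repeat 2 4 numeric))    ; two, three, or four digits
(define two-or-more  (repeat 2 #f numeric))   ; at least two, no upper limit

(exact-match? any-digits "")        ; should return #t, zero repetitions
(exact-match? three-digits "123")   ; should return #t
(exact-match? three-digits "12")    ; should return #f
(exact-match? two-to-four "12345")  ; should return #f, five is over the maximum
(exact-match? two-or-more "12345")  ; should return #t
```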
Regular expressions are normally case-sensitive.
The value returned by ignore-case is identical to its argument except that case will be
ignored when matching.
The value returned by use-case is protected
from future applications of ignore-case.
The expressions returned
by use-case and ignore-case are unaffected by later uses of
these procedures.
By way of example, the following matches "ab" but not "aB",
"Ab", or "AB":
(text "ab")
while
(ignore-case (text "ab"))
matches "ab", "aB",
"Ab", and "AB", and
(ignore-case (sequence (text "a")
                       (use-case (text "b"))))
matches "ab" and "Ab" but not "aB" or "AB".
A subexpression within a larger expression can be marked as a submatch. When an expression is matched against a string, the success or failure of each submatch within that expression is reported, as well as the location of the substring matched by each successful submatch.
Submatch returns a regular expression that matches its argument and
causes the result of matching its argument to be reported by the match
procedure.
Key is used to indicate the result of this particular submatch
in the alist of successful submatches returned by match.
Any value may be used as a key.
No-submatches returns an expression identical to its
argument, except that all submatches have been elided.
(any-match? reg-exp string) -> boolean
(exact-match? reg-exp string) -> boolean
(match reg-exp string) -> match or #f
(match-start match) -> index
(match-end match) -> index
(match-submatches match) -> alist
Any-match? returns #t if string matches reg-exp or
contains a substring that does, and #f otherwise.
Exact-match? returns #t if string matches
reg-exp and #f otherwise.
Match returns #f if reg-exp does not match string
and a match record if it does match.
A match record contains three values: the beginning and end of the substring
that matched
the pattern and an a-list of submatch keys and corresponding match records
for any submatches that also matched.
Match-start returns the index of
the first character in the matching substring and match-end gives the index
of the first character after the matching substring.
Match-submatches returns an alist of submatch keys and match records.
Only the top match record returned by match has a submatch alist.
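Because match-end is the index just past the match, the two indices can be passed straight to substring. The helper below is a hypothetical convenience, not part of the regexps structure; it assumes symbol keys (so assq is adequate) and that the single-argument form of repeat matches zero or more times, as described earlier.

```scheme
; Assumes the regexps structure is open (,open regexps).

; Hypothetical helper: return the text matched by the submatch KEY,
; or #f if that submatch did not take part in the match.
(define (submatch-string m key str)
  (let ((entry (assq key (match-submatches m))))
    (and entry
         (substring str
                    (match-start (cdr entry))
                    (match-end (cdr entry))))))

; letters, "=", digits, with both parts marked as submatches
(define setting
  (sequence (submatch 'name (repeat alphabetic))
            (text "=")
            (submatch 'value (repeat numeric))))

(let* ((line "  width=80;")
       (m (match setting line)))
  (and m
       (list (submatch-string m 'name line)
             (submatch-string m 'value line))))
; should return ("width" "80")
```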
Matching occurs according to POSIX.
The match returned is the one with the lowest starting index in string.
If there is more than one such match, the longest is returned.
Within that match the longest possible submatches are returned.
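A small sketch of these rules, using only procedures described in this section. The results in the comments follow from the rules as stated and have not been checked against an implementation.

```scheme
; Assumes the regexps structure is open (,open regexps).

; Both alternatives can start at index 2; the longer one should win.
(let ((m (match (one-of (text "ab") (text "abcd")) "xxabcdxx")))
  (list (match-start m) (match-end m)))
; should return (2 6)

; The lowest starting index takes priority over length:
; "b" starts at index 1, "cdef" only at index 2.
(let ((m (match (one-of (text "b") (text "cdef")) "abcdef")))
  (list (match-start m) (match-end m)))
; should return (1 2)
```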
All three matching procedures cache a compiled version of reg-exp.
Subsequent calls with the same reg-exp will be more efficient.
The C interface to the POSIX regular expression code uses ASCII nul
as an end-of-string marker.
The matching procedures will ignore any characters following an
embedded ASCII nul in string.
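The consequence can be illustrated as below. This is a sketch only: it assumes ascii->char is available (in Scheme 48 it is exported by the ascii structure), and the results in the comments are what the description above implies rather than verified output.

```scheme
; Assumes the regexps and ascii structures are open
; (,open regexps ascii).

; The string "ab", then a nul character, then "cd".
(define with-nul
  (string #\a #\b (ascii->char 0) #\c #\d))

(any-match? (text "ab") with-nul)  ; should return #t, "ab" precedes the nul
(any-match? (text "cd") with-nul)  ; should return #f, text after the nul is ignored
```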
(define pattern (text "abc"))
(any-match? pattern "abc") -> #t
(any-match? pattern "abx") -> #f
(any-match? pattern "xxabcxx") -> #t
(exact-match? pattern "abc") -> #t
(exact-match? pattern "abx") -> #f
(exact-match? pattern "xxabcxx") -> #f
(match pattern "abc") -> (#{match 0 3})
(match pattern "abx") -> #f
(match pattern "xxabcxx") -> (#{match 2 5})
(let ((x (match (sequence (text "ab")
                          (submatch 'foo (text "cd"))
                          (text "ef"))
                "xxxabcdefxx")))
  (list x (match-submatches x)))
  -> (#{match 3 9} ((foo . #{match 5 7})))
(match-submatches
  (match (sequence (set "a")
                   (one-of (submatch 'foo (text "bc"))
                           (submatch 'bar (text "BC"))))
         "xxxaBCd"))
  -> ((bar . #{match 4 6}))