This section describes a functional interface for building regular
 expressions and matching them against strings.
The matching is done using the POSIX regular expression package.
Regular expressions are in the structure regexps.
A regular expression is either a character set, which matches any character in the set, or a composite expression containing one or more subexpressions. A regular expression can be matched against a string to determine success or failure, and to determine the substrings matched by particular subexpressions.
Character sets may be defined using a list of characters and strings, using a range or ranges of characters, or by using set operations on existing character sets.
(set character-or-string ...) -> char-set 
(range low-char high-char) -> char-set 
(ranges low-char high-char ...) -> char-set 
(ascii-range low-char high-char) -> char-set 
(ascii-ranges low-char high-char ...) -> char-set 
Set returns a set that contains the character arguments and the
characters in any string arguments.  Range returns a character
set that contains all characters between low-char and high-char,
inclusive.  Ranges returns a set that contains all characters in
the given ranges.  Range and ranges use the ordering induced by
char->integer.  Ascii-range and ascii-ranges use the
 ASCII ordering.
It is an error for a high-char to be less than the preceding
 low-char in the appropriate ordering.
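The constructors above can be tried out as in the following minimal sketch. It assumes the regexps structure has been opened (for example with ,open regexps at the Scheme 48 command prompt); the names vowels, digits, and letters are illustrative only and are not part of the library.

```scheme
;; Assumes the regexps structure is open (e.g. ",open regexps").

;; A set built from individual characters plus the characters of a string.
(define vowels (set #\a #\e "iou"))

;; Every character from #\0 through #\9 in char->integer order.
(define digits (range #\0 #\9))

;; Several ranges at once: low/high pairs for lower- and upper-case letters.
(define letters (ranges #\a #\z #\A #\Z))

;; A character set is itself a regular expression matching one character.
(any-match? vowels "xyz")     ; => #f, no vowel anywhere in the string
(any-match? digits "abc7")    ; => #t, the string contains a digit
```

Any-match? is one of the matching procedures described later in this section.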
(negate char-set) -> char-set 
(intersection char-set char-set) -> char-set 
(union char-set char-set) -> char-set 
(subtract char-set char-set) -> char-set 
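The text does not spell out the set operations, so the sketch below assumes they behave as their names suggest: negate yields the complement, and subtract removes the members of its second argument from its first. The names lower, vowels, and so on are hypothetical.

```scheme
;; Assumes the regexps structure is open (e.g. ",open regexps").
;; The behavior noted in the comments is an assumption based on the
;; procedure names, not a statement taken from the text above.

(define lower  (range #\a #\z))
(define vowels (set "aeiou"))

;; Lower-case letters that are not vowels.
(define consonants (subtract lower vowels))

;; Characters that are lower-case letters or decimal digits.
(define lower-or-digit (union lower (range #\0 #\9)))

;; Characters that are in both sets (here, just the vowels).
(define lower-vowels (intersection lower vowels))

;; Every character that is not a lower-case letter.
(define not-lower (negate lower))
```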
The following character sets are predefined:
  lower-case     (set "abcdefghijklmnopqrstuvwxyz")
  upper-case     (set "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
  alphabetic     (union lower-case upper-case)
  numeric        (set "0123456789")
  alphanumeric   (union alphabetic numeric)
  punctuation    (set "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~")
  graphic        (union alphanumeric punctuation)
  printing       (union graphic (set #\space))
  control        (negate printing)
  blank          (set #\space (ascii->char 9))   ; 9 is tab
  whitespace     (union (set #\space) (ascii-range 9 13))
  hexdigit       (set "0123456789abcdefABCDEF")
The above are taken from the default locale in POSIX. The characters in
whitespace are space, tab,
 newline (= line feed), vertical tab, form feed, and
 carriage return.
String-start returns a regular expression that matches the beginning
 of the string being matched against; string-end returns one that matches
 the end.
Sequence matches the concatenation of its arguments, one-of matches
any one of its arguments.
Text returns a regular expression that matches the characters in
 string, in order.
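As a rough illustration of how these constructors combine, the sketch below builds a small pattern and checks it with the matching procedures described later in this section. It assumes the regexps structure is open and that string-start is called with no arguments; pet-id and anchored are hypothetical names.

```scheme
;; Assumes the regexps structure is open (e.g. ",open regexps").

;; "cat" or "dog", followed by a single decimal digit.
(define pet-id
  (sequence (one-of (text "cat") (text "dog"))
            (range #\0 #\9)))

(any-match? pet-id "my dog7 barks")   ; => #t, a substring matches
(exact-match? pet-id "dog7")          ; => #t, the whole string matches
(exact-match? pet-id "my dog7")       ; => #f, extra leading characters

;; Anchoring to the start of the string (assumes string-start takes
;; no arguments and returns a regular expression).
(define anchored (sequence (string-start) (text "dog")))

(any-match? anchored "dog7")          ; => #t
(any-match? anchored "my dog7")       ; => #f, "dog" is not at the start
```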
Repeat returns a regular expression that matches repeated
occurrences of its reg-exp argument.  With no count the result
will match zero or more times (reg-exp*).  With a single
count the returned expression will match
 reg-exp exactly that number of times.
With both min and max it will match from min to max
 repetitions, inclusive.
Max may be #f, in which case there
 is no maximum number of matches.
Count and min should be exact, non-negative integers;
 max should either be an exact non-negative integer or #f.
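The three forms of repeat might be used as sketched below. The argument order shown (counts first, reg-exp last) is an assumption, since the text does not show the call forms; the variable names are illustrative only.

```scheme
;; Assumes the regexps structure is open (e.g. ",open regexps").
;; Assumed call forms: (repeat reg-exp), (repeat count reg-exp),
;; and (repeat min max reg-exp).

(define digit (range #\0 #\9))

(define any-digits   (repeat digit))        ; zero or more digits
(define three-digits (repeat 3 digit))      ; exactly three digits
(define two-to-four  (repeat 2 4 digit))    ; two, three, or four digits
(define two-or-more  (repeat 2 #f digit))   ; at least two, no upper bound

(exact-match? three-digits "123")     ; => #t
(exact-match? three-digits "12")      ; => #f, too few repetitions
(exact-match? two-to-four "12345")    ; => #f, five exceeds max
(exact-match? two-or-more "12345")    ; => #t
```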
Regular expressions are normally case-sensitive.
The value returned by ignore-case is identical to its argument except that case will be
 ignored when matching.
The value returned by use-case is protected
 from future applications of ignore-case.
The expressions returned
 by use-case and ignore-case are unaffected by later uses of
 these procedures.
By way of example, the following matches "ab" but not "aB",
 "Ab", or "AB":
  (text "ab")
while
  (ignore-case (text "ab"))
matches "ab", "aB",
 "Ab", and "AB", and
  (ignore-case (sequence (text "a")
                         (use-case (text "b"))))
matches "ab" and "Ab" but not "aB" or "AB".
A subexpression within a larger expression can be marked as a submatch. When an expression is matched against a string, the success or failure of each submatch within that expression is reported, as well as the location of the substring matched by each successful submatch.
Submatch returns a regular expression that matches its argument and
 causes the result of matching its argument to be reported by the match
 procedure.
Key is used to indicate the result of this particular submatch 
 in the alist of successful submatches returned by match.
 Any value may be used as a key.
No-submatches returns an expression identical to its
 argument, except that all submatches have been elided.
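The difference between a marked expression and its no-submatches counterpart can be sketched as follows. The sketch assumes the regexps structure is open and that submatch takes the key first and the reg-exp second; tagged and untagged are hypothetical names, and match and match-submatches are described later in this section.

```scheme
;; Assumes the regexps structure is open (e.g. ",open regexps").
;; Assumed call form: (submatch key reg-exp).

;; "a" followed by "bc", with the "bc" part reported under the key rest.
(define tagged
  (sequence (text "a")
            (submatch 'rest (text "bc"))))

;; The same pattern with its submatch markers removed.
(define untagged (no-submatches tagged))

;; Expected to hold an entry keyed by rest.
(match-submatches (match tagged "abc"))

;; Expected to be empty, since untagged has no submatches to report.
(match-submatches (match untagged "abc"))
```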
(any-match? reg-exp string) -> boolean 
(exact-match? reg-exp string) -> boolean 
(match reg-exp string) -> match or  #f 
(match-start match) -> index 
(match-end match) -> index 
(match-submatches match) -> alist 
Any-match? returns #t if string matches reg-exp or
 contains a substring that does, and #f otherwise.
Exact-match? returns #t if string matches
 reg-exp and #f otherwise.
Match returns #f if reg-exp does not match string
 and a match record if it does match.
A match record contains three values: the beginning and end of the substring
 that matched
 the pattern and an a-list of submatch keys and corresponding match records
 for any submatches that also matched.
Match-start returns the index of
 the first character in the matching substring and match-end gives the index
 of the first character after the matching substring.
Match-submatches returns an alist of submatch keys and match records.
Only the top match record returned by match has a submatch alist.
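Because match records hold indices rather than strings, a small helper is useful for pulling out the matched text. The sketch below is one possible way to do this; submatch-substring and key=value are hypothetical names, not part of the regexps structure, and the argument orders of repeat and submatch are assumed to be counts or key first, reg-exp last.

```scheme
;; Assumes the regexps structure is open (e.g. ",open regexps").

;; One or more lower-case letters, "=", then one or more digits.
(define key=value
  (sequence (submatch 'key   (repeat 1 #f (range #\a #\z)))
            (text "=")
            (submatch 'value (repeat 1 #f (range #\0 #\9)))))

;; Hypothetical helper: return the text matched by the submatch named
;; key, or #f if that submatch did not take part in the match.
(define (submatch-substring m key str)
  (let ((entry (assq key (match-submatches m))))
    (and entry
         (substring str
                    (match-start (cdr entry))
                    (match-end (cdr entry))))))

;; Extract both parts from a longer string; yields #f if nothing matches.
(let* ((str "set width=80 now")
       (m   (match key=value str)))
  (and m
       (list (submatch-substring m 'key str)
             (submatch-substring m 'value str))))
```

Since match-end is the index just past the match, the pair of indices can be passed directly to substring.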
Matching occurs according to POSIX.
The match returned is the one with the lowest starting index in string.
If there is more than one such match, the longest is returned.
Within that match the longest possible submatches are returned.
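The leftmost-longest rule can be observed with alternatives where one is a prefix of the other. This is a minimal sketch assuming the regexps structure is open and a conforming POSIX implementation underneath; the name m is illustrative.

```scheme
;; Assumes the regexps structure is open (e.g. ",open regexps").

;; Both alternatives can match starting at index 2 of the string.
(define m (match (one-of (text "ab") (text "abcd")) "xxabcdxx"))

;; The match starts at the lowest possible index ...
(match-start m)   ; expected 2

;; ... and, of the candidates starting there, the longest one is chosen,
;; so the end index should reflect "abcd" rather than "ab".
(match-end m)     ; expected 6
```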
All three matching procedures cache a compiled version of reg-exp.
Subsequent calls with the same reg-exp will be more efficient.
The C interface to the POSIX regular expression code uses ASCII nul
 as an end-of-string marker.
The matching procedures will ignore any characters following an
 embedded ASCII nul in string.
  (define pattern (text "abc"))

  (any-match? pattern "abc")         -> #t
  (any-match? pattern "abx")         -> #f
  (any-match? pattern "xxabcxx")     -> #t

  (exact-match? pattern "abc")       -> #t
  (exact-match? pattern "abx")       -> #f
  (exact-match? pattern "xxabcxx")   -> #f

  (match pattern "abc")              -> (#{match 0 3})
  (match pattern "abx")              -> #f
  (match pattern "xxabcxx")          -> (#{match 2 5})

  (let ((x (match (sequence (text "ab")
                            (submatch 'foo (text "cd"))
                            (text "ef"))
                  "xxxabcdefxx")))
    (list x (match-submatches x)))
      -> (#{match 3 9} ((foo . #{match 5 7})))

  (match-submatches
    (match (sequence (set "a")
                     (one-of (submatch 'foo (text "bc"))
                             (submatch 'bar (text "BC"))))
           "xxxaBCd"))
      -> ((bar . #{match 4 6}))