Libraries

Use the ,open command (section 3.4) or the module language (chapter 2.6) to open the structures described below.

5.1  General utilities

These are in the big-util structure.

(atom? x) is the same as (not (pair? x)).

Returns true for the empty list, false for a pair, and signals an error otherwise.

(neq? x y) is the same as (not (eq? x y)).

(n= x y) is the same as (not (= x y)).

These both just return their argument. No-op is guaranteed not to be compiled in-line, identity may be.

Returns true if value is in list, false otherwise.

Returns true if predicate is true for any element of list.

Returns true if predicate is true for every element of list.

Any returns some element of list for which predicate is true, or false if there are none. First does the same except that it returns the first element for which predicate is true.

Returns a list containing all of the elements of list for which predicate is true. The order of the elements is preserved. Filter! may reuse the storage of list.

The same as filter except the returned list contains the results of applying procedure instead of elements of list. (filter-map p l) is the same as (filter identity (map p l)).

The first return value contains those elements list for which predicate is true, the second contains the remaining elements. The order of the elements is preserved. Partition-list! may reuse the storage of the list.

Returns its argument with all duplicate elements removed. The first instance of each element is preserved.

All three of these return list with some elements removed. Delq removes all elements eq? to value. Delq! does the same and may modify the list argument. Delete removes all elements for which predicate is true. Both delq and delete may reuse some of the storage in the list argument, but won’t modify it.

Destructively reverses list.

Returns the symbol whose name is produced by concatenating the displayed representations of value ....

(concatenate-symbol 'abc "-" 4) ===⇒ 'abc-4

5.2  Pretty-printing

These are in the pp structure.

Pretty-print value The current output port is used if no port is specified. Position is the starting offset. Value will be pretty-printed to the right of this column.

5.3  Bitwise integer operations

These functions use the two’s-complement representation for integers. There is no limit to the number of bits in an integer. They are in the structures bitwise and big-scheme.

These perform various logical operations on integers on a bit-by-bit basis. ‘ior’ is inclusive OR and ‘xor’ is exclusive OR.

Shifts the integer by the given bit count, which must be an integer, shifting left for positive counts and right for negative ones. Shifting preserves the integer’s sign.

Counts the number of bits set in the integer. If the argument is negative a bitwise NOT operation is performed before counting.

5.4  Byte vectors

These are homogeneous vectors of small integers (0 ≤ i ≤ 255). The functions that operate on them are analogous to those for vectors. They are in the structure byte-vectors.

5.5  Sparse vectors

These are vectors that grow as large as they need to. That is, they can be indexed by arbitrarily large nonnegative integers. The implementation allows for arbitrarily large gaps by arranging the entries in a tree. They are in the structure sparse-vectors.

Make-sparse-vector, sparse-vector-ref, and sparse-vector-set! are analogous to make-vector, vector-ref, and vector-set!, except that the indices passed to sparse-vector-ref and sparse-vector-set! can be arbitrarily large. For indices whose elements have not been set in a sparse vector, sparse-vector-ref returns #f.

Sparse-vector->list is for debugging: It returns a list of the consecutive elements in a sparse vector from 0 to the highest element that has been set. Note that the list will also include all the #f elements for the unset elements.

5.6  Cells

These hold a single value and are useful when a simple indirection is required. The system uses these to hold the values of lexical variables that may be set!.

5.7  Queues

These are ordinary first-in, first-out queues. The procedures are in structure queues.

Make-queue creates an empty queue, queue? is a predicate for identifying queues, and queue-empty? tells you if a queue is empty. List->queue returns a queue containing values, preserving their order. Enqueue! adds one value to the queue; enqueue-many! adds a list of values to the queue. Queue-head-or-value, queue-head-or-thunk, queue-head, and maybe-queue-head return the first value in queue if it is not empty; if the queue is empty, queue-head-or-value returns value, queue-head-or-thunk tail-calls thunk, queue-head raises an error, and maybe-queue-head returns #f. Dequeue-or-value!, dequeue-or-thunk!, dequeue!, and maybe-dequeue! remove a value from the queue if one is available; if the queue is empty, dequeue-or-value! returns value, dequeue-or-thunk! tail-calls thunk, dequeue! raises an error, and maybe-dequeue! returns #f. Empty-queue! removes all values from queue.

(Dequeue-or-value! q value) is more efficient than, but otherwise equivalent to:

(ensure-atomicity
 (if (queue-empty? q)
     value
     (dequeue! q)))

Because queue-head and dequeue! raise exceptions if they are called on an empty queue, they must not be called with a proposal already active unless queue-empty? has returned #f with the same proposal active.

The following procedures are not used in the Scheme 48 system, and are very slow. These operations may be removed from the queues structure in a future revision.

Queue-length returns the number of values in queue. Queue->list returns the values in queue as a list, in the order in which the values were added. Delete-from-queue! removes the first instance of value from queue, using eqv? for comparisons. Delete-from-queue! returns #t if it removes an element and #f if it does not. On-queue? returns #t if value is in the queue (using eqv? for comparisons) and #f if it is not.

5.8  Arrays

These provide N-dimensional, zero-based arrays and are in the structure arrays. The array interface is derived from one invented by Alan Bawden.

Make-array makes a new array with the given dimensions, each of which must be a non-negative integer. Every element is initially set to value. Array Returns a new array with the given dimensions and elements. Dimensions must be a list of positive integers, The number of elements should be the equal to the product of the dimensions. The elements are stored in row-major order.

(make-array 'a 2 3) => {Array 2 3}

(array '(2 3) 'a 'b 'c 'd 'e 'f)
    => {Array 2 3}

Copy-array returns a copy of array. The copy is identical to the array but does not share storage with it.

Returns #t if value is an array.

Array-ref returns the specified array element and array-set! replaces the element with value.

(let ((a (array '(2 3) 'a 'b 'c 'd 'e 'f)))
  (let ((x (array-ref a 0 1)))
    (array-set! a 'g 0 1)
    (list x (array-ref a 0 1))))
    => '(b g)

Array->vector returns a vector containing the elements of array in row-major order. Array-shape returns the dimensions of the array as a list.

Make-shared-array makes a new array that shares storage with array and uses linear-map to map indexes to elements. Linear-map must accept as many arguments as the number of dimensions given and must return a list of non-negative integers that are valid indexes into array.

(array-ref (make-shared-array a f i0 i1 ...)
           j0 j1 ...)

is equivalent to

(apply array-ref a (f j0 j1 ...))

As an example, the following function makes the transpose of a two-dimensional array:

(define (transpose array)
  (let ((shape (array-shape array)))
    (make-shared-array array
                       (lambda (x y)
                         (list y x))
                       (cadr shape)
                       (car shape))))

(array->vector
  (transpose
    (array '(2 3) 'a 'b 'c 'd 'e 'f)))
      => '(a d b e c f)

5.9  Records

New types can be constructed using the define-record-type macro from the define-record-types structure The general syntax is:

(define-record-type [tag] type-name
  (constructor-name field-tag ...)
  predicate-name
  (field-tag accessor-name [modifier-name])
  ...)

This makes the following definitions:

Type-name is the record type itself, and can be used to specify a print method (see below). Constructor-name is a constructor that accepts values for the fields whose tags are specified. Predicate-name is a predicate that returns #t for elements of the type and #f for everything else. The accessor-names retrieve the values of fields, and the modifier-name’s update them. Tag is used in printing instances of the record type and the field-tags are used in the inspector and to match constructor arguments with fields. If tag is not specified, type-name is used instead.

Define-record-discloser determines how records of type type are printed. Discloser should be procedure which takes a single record of type type and returns a list whose car is a symbol. The record will be printed as the value returned by discloser with curly braces used instead of the usual parenthesis.

For example

(define-record-type pare :pare
  (kons x y)
  pare?
  (x kar set-kar!)
  (y kdr))

defines kons to be a constructor, kar and kdr to be accessors, set-kar! to be a modifier, and pare? to be a predicate for a new type of object. The type itself is named :pare. Pare is a tag used in printing the new objects.

By default, the new objects print as #{Pare}. The print method can be modified using define-record-discloser:

(define-record-discloser :pare
  (lambda (p) `(pare ,(kar p) ,(kdr p))))

will cause the result of (kons 1 2) to print as #{Pare 1 2}.

Define-record-resumer (section 8.7.6) can be used to control how records are stored in heap images.

5.9.1  Low-level access to records

Records are implemented using primitive objects exactly analogous to vectors. Every record has a record type (which is another record) in the first slot. Note that use of these procedures, especially record-set!, breaks the record abstraction described above; caution is advised.

These procedures are in the structure records.

These the same as the standard vector- procedures except that they operate on records. The value returned by record-length includes the slot holding the record’s type. (record-type x) is equivalent to (record-ref x 0).

5.9.2  Record types

Record types are themselves records of a particular type (the first slot of :record-type points to itself). A record type contains four values: the name of the record type, a list of the names its fields, and procedures for disclosing and resuming records of that type. Procedures for manipulating them are in the structure record-types.

These procedures construct the usual record-manipulating procedures. Record-constructor returns a constructor that is passed the initial values for the fields specified and returns a new record. Record-predicate returns a predicate that return true when passed a record of type record-type and false otherwise. Record-accessor and record-modifier return procedures that reference and set the given field in records of the appropriate type.

Record-types is the initial exporter of define-record-discloser (re-exported by define-record-types described above) and define-record-resumer (re-exported by external-calls (section 8.7.6)).

The procedures described in this section can be used to define new record-type-defining macros.

(define-record-type pare :pare
  (kons x y)
  pare?
  (x kar set-kar!)
  (y kdr))

is (semantically) equivalent to

(define :pare (make-record-type 'pare '(x y)))
(define kons (record-constructor :pare '(x y)))
(define kar (record-accessor :pare 'x))
(define set-kar! (record-modifier :pare 'x))
(define kdr (record-accessor :pare 'y))

The ‘‘(semantically)’’ above is because define-record-type adds declarations, which allows the type checker to detect some misuses of records, and uses more efficient definitions for the constructor, accessors, and modifiers. Ignoring the declarations, which will have to wait for another edition of the manual, what the above example actually expands into is:

(define :pare (make-record-type 'pare '(x y)))
(define (kons x y) (record :pare x y))
(define (kar r) (checked-record-ref r :pare 1))
(define (set-kar! r new)
  (checked-record-set! r :pare 1 new))
(define (kdr r) (checked-record-ref r :pare 2))

Checked-record-ref and Checked-record-set! are low-level procedures that check the type of the record and access or modify it using a single VM instruction.

5.10  Finite record types

The structure finite-types has two macros for defining ‘finite’ record types. These are record types for which there are a fixed number of instances, all of which are created at the same time as the record type itself. The syntax for defining an enumerated type is:

(define-enumerated-type tag type-name
  predicate-name
  vector-of-instances-name
  name-accessor
  index-accessor
  (instance-name ...))

This defines a new record type, bound to type-name, with as many instances as there are instance-name’s. Vector-of-instances-name is bound to a vector containing the instances of the type in the same order as the instance-name list. Tag is bound to a macro that when given an instance-name expands into an expression that returns corresponding instance. The name lookup is done at macro expansion time. Predicate-name is a predicate for the new type. Name-accessor and index-accessor are accessors for the name and index (in vector-of-instances) of instances of the type.

(define-enumerated-type color :color
  color?
  colors
  color-name
  color-index
  (black white purple maroon))

(color-name (vector-ref colors 0)) => black
(color-name (color white))         => white
(color-index (color purple))       => 2

Finite types are enumerations that allow the user to add additional fields in the type. The syntax for defining a finite type is:

(define-finite-type tag type-name
  (field-tag ...)
  predicate-name
  vector-of-instances-name
  name-accessor
  index-accessor
  (field-tag accessor-name [modifier-name])
  ...((instance-name field-value ...)
   ...))

The additional fields are specified exactly as with define-record-type. The field arguments to the constructor are listed after the type-name; these do not include the name and index fields. The form ends with the names and the initial field values for the instances of the type. The instances are constructed by applying the (unnamed) constructor to these initial field values. The name must be first and the remaining values must match the field-tags in the constructor’s argument list.

(define-finite-type color :color
  (red green blue)
  color?
  colors
  color-name
  color-index
  (red   color-red)
  (green color-green)
  (blue  color-blue)
  ((black    0   0   0)
   (white  255 255 255)
   (purple 160  32 240)
   (maroon 176  48  96)))

(color-name (color black))         => black
(color-name (vector-ref colors 1)) => white
(color-index (color purple))       => 2
(color-red (color maroon))         => 176

5.11  Sets over finite types

The structure enum-sets has a macro for defining types for sets of elements of finite types. These work naturally with the finite types defined by the finite-types structure, but are not tied to them. The syntax for defining such a type is:

(define-enum-set-type id type-name predicate constructor
   element-syntax element-predicate all-elements element-index-ref)

This defines id to be syntax for constructing sets, type-name to be a value representing the type, predicate to be a predicate for those sets, and constructor a procedure for constructing one from a list.

Element-syntax must be the name of a macro for constructing set elements from names (akin to the tag argument to define-enumerated-type). Element-predicate must be a predicate for the element type, all-elements a vector of all values of the element type, and element-index-ref must return the index of an element within the all-elements vector.

Enum-set->list converts a set into a list of its elements. Enum-set-member? tests for membership. Enum-set=? tests two sets of equal type for equality. (If its arguments are not of the same type, enum-set=? raises an exception.) Enum-set-subset? tests, for two sets of equal type, if the first set is a subset of the second one. Enum-set-union computes the union of two sets of equal type, enum-set-intersection computes the intersection, enum-set-difference computes the intersection, and enum-set-negation computes the complement of a set.

Here is an example. Given an enumerated type:

(define-enumerated-type color :color
  color?
  colors
  color-name
  color-index
  (red blue green))

we can define sets of colors:

(define-enum-set-type color-set :color-set
                      color-set?
                      make-color-set
  color color? colors color-index)

> (enum-set->list (color-set red blue))
(#{Color red} #{Color blue})
> (enum-set->list (enum-set-negation (color-set red blue)))
(#{Color green})
> (enum-set-member? (color-set red blue) (color blue))
#t

5.12  Hash tables

These are generic hash tables, and are in the structure tables. Strictly speaking they are more maps than tables, as every table has a value for every possible key (for that type of table). All but a finite number of those values are #f.

The first four functions listed make various kinds of tables. Make-table returns a table whose keys may be symbols, integer, characters, booleans, or the empty list (these are also the values that may be used in case expressions). As with case, comparison is done using eqv?. The comparison procedures used in symbol, string, and integer tables are eq?, string=?, and =.

Make-table-maker takes two procedures as arguments and returns a nullary table-making procedure. Compare-proc should be a two-argument equality predicate. Hash-proc should be a one argument procedure that takes a key and returns a non-negative integer hash value. If (compare-proc x y) returns true, then (= (hash-proc x) (hash-proc y)) must also return true. For example, make-integer-table could be defined as (make-table-maker = abs).

Make-table-immutable! prohibits future modification to its argument.

Table? is the predicate for tables. Table-ref and table-set! access and modify the value of key in table. Table-walk applies procedure, which must accept two arguments, to every associated key and non-#f value in table. Table->entry-list returns a list with the values of the table.

Default-hash-function is the hash function used in the tables returned by make-table, string-hash is the one used by make-string-table, and symbol-hash is the one used by make-symbol-table.

5.13  Port extensions

These procedures are in structure extended-ports.

Make-string-input-port returns an input port that that reads characters from the supplied string. An end-of-file object is returned if the user reads past the end of the string. Make-string-output-port returns an output port that saves the characters written to it. These are then returned as a string by string-output-port-output.

(read (make-string-input-port "(a b)"))
    => '(a b)

(let ((p (make-string-output-port)))
  (write '(a b) p)
  (let ((s (string-output-port-output p)))
    (display "c" p)
    (list s (string-output-port-output p))))
    => '("(a b)" "(a b)c")

Procedure is called on an output port. Output written to that port is copied to output-port until n characters have been written, at which point limit-output returns. If procedure returns before writing n characters, then limit-output also returns at that time, regardless of how many characters have been written.

Make-tracking-input-port and make-tracking-output-port return ports that keep track of the current row and column and are otherwise identical to their arguments. Closing a tracking port does not close the underlying port. Current-row and current-column return port’s current read or write location. They return #f if port does not keep track of its location. Fresh-line writes a newline character to output-port if (current-row port) is not 0.

(define p (make-tracking-output-port (open-output-file "/tmp/temp")))
(list (current-row p) (current-column p))
    => '(0 0)
(display "012" p)
(list (current-row p) (current-column p))
    => '(0 3)
(fresh-line p)
(list (current-row p) (current-column p))
    => '(1 0)
(fresh-line p)
(list (current-row p) (current-column p))
    => '(1 0)

5.14  Fluid bindings

These procedures implement dynamic binding and are in structure fluids. A fluid is a cell whose value can be bound dynamically. Each fluid has a top-level value that is used when the fluid is unbound in the current dynamic environment.

Make-fluid returns a new fluid with value as its initial top-level value. Fluid returns fluid’s current value. Let-fluid calls thunk, with fluid bound to value until thunk returns. Using a continuation to throw out of the call to thunk causes fluid to revert to its original value, while throwing back in causes fluid to be rebound to value. Let-fluid returns the value(s) returned by thunk. Let-fluids is identical to let-fluid except that it binds an arbitrary number of fluids to new values.

(let* ((f (make-fluid 'a))
       (v0 (fluid f))
       (v1 (let-fluid f 'b
             (lambda ()
               (fluid f))))
       (v2 (fluid f)))
  (list v0 v1 v2))
  => '(a b a)

(let ((f (make-fluid 'a))
      (path '())
      (c #f))
  (let ((add (lambda ()
               (set! path (cons (fluid f) path)))))
    (add)
    (let-fluid f 'b
      (lambda ()
        (call-with-current-continuation
          (lambda (c0)
            (set! c c0)))
        (add)))
    (add)
    (if (< (length path) 5)
        (c)
        (reverse path))))
  => '(a b a b a)

5.15  OS strings

On common operating systems such as Unix and Windows, various parameters to OS functionality---such as file names, user names, command-line arguments etc.---appear as text in most contexts, but are really byte sequences: On Unix, the byte sequence may be interpreted as text through some locale-determined encoding. On Windows, such parameters are typically represented as sequences of UTF-16 code units. In both cases, not every such byte sequence has a string equivalent: On Unix, a byte sequence encoding a file name using Latin-1 often cannot be decoded using UTF-8. On Windows, unpaired UTF-16 surrogates are admissible in encodings, and no lossless text decoding for them exists.

For representing such string-like parameters, Scheme 48 uses an abstraction called OS strings. An OS string is created from either a string or a NUL-terminated byte sequence stored in a byte vector, and has an associated text codec (see section 6.6.1) that is able to convert from one representation to the other. The exact meaning of a NUL-terminated byte sequence is dependent on this text codec. However, only codecs for encodings that are a conservative extension of ASCII (such as ASCII itself, Latin-1, or UTF-8) should be used here, to allow a minimal set of portable file names. (The Windows port uses a special synthetic encoding called UTF-8of16 compatible with UTF-8 but capable of encoding even invalid UTF-16 internally, but uses the UTF-8 codec at the Scheme level.)

Most procedures accepting OS strings also accept strings or byte vectors, which are then used to construct a OS string. In the headers of the specifications of these procedures, such arguments occur as os-string-thing.The standard Scheme procedures such as open-input-file that take file names all accept os-string-thing arguments. OS strings are in the os-strings structure.

The os-string? predicate returns #t if its argument is an OS string, #f otherwise.

The make-os-string procedure creates an OS string from a text codec and a byte vector or string that defines its contents. If the argument is a byte vector, it does not matter if it is NUL-terminated or not.

These procedures create an OS string from a string, a byte-vector (whose last value should be 0), and an os-string-thing argument, respectively, always using the standard OS-string text codec (see below).

These procedures yield the contents of an OS string. For an OS string created from a string, os-string->string will return a string with the same contents; for an OS string created from a byte vector, os-string->byte-vector will return a byte vector with the same contents. For the other cases, data loss as determined by the text codec is possible.

This procedure returns the text codec of the OS string.

This procedure returns #t if its arguments denote the same byte sequence, #f otherwise.

These are convenience procedures: The first is the composition of x->os-string and os-string->byte-vector, and the second is the composition of string->os-string and os-string->byte-vector.

The current-os-string-text-codec returns the current text codec used for creating new OS strings. The initial default is determined by the operating system. (On Unix, this is the text codec determined by the locale. On Windows, this is UTF-8.) The call-with-os-string-text-codec procedure dynamically binds the current text codec to text-codec during the invocation of thunk.

5.16  Shell commands

Structure c-system-function provides access to the C system() function.

Have-system? returns true if the underlying C implementation has a command processor. (System string) passes string to the C system() function and returns the result.

(begin
  (system "echo foo > test-file")
  (call-with-input-file "test-file" read))
=> 'foo

5.17  Sockets

Structure sockets provides access to TCP/IP sockets for interprocess and network communication.

Open-socket creates a new socket. If no port-number is supplied the system picks one at random. Socket-port-number returns a socket’s port number. Close-socket closes a socket, preventing any further connections. Socket-accept accepts a single connection on socket, returning an input port and an output port for communicating with the client. If no client is waiting socket-accept blocks until one appears. Get-host-name returns the network name of the machine.

Socket-client connects to the server at port-number on the machine named host-name. Socket-client blocks until the server accepts the connection.

The following simple example shows a server and client for a centralized UID service.

(define (id-server)
  (let ((socket (open-socket)))
    (display "Waiting on port ")
    (display (socket-port-number socket))
    (newline)
    (let loop ((next-id 0))
      (call-with-values
        (lambda ()
          (socket-accept socket))
        (lambda (in out)
          (display next-id out)
          (close-input-port in)
          (close-output-port out)
          (loop (+ next-id 1)))))))
         
(define (get-id machine port-number)
  (call-with-values
    (lambda ()
      (socket-client machine port-number))
    (lambda (in out)
      (let ((id (read in)))
        (close-input-port in)
        (close-output-port out)
        id))))

5.18  Profiling

The profiler can be used programmatically from within the code and its results can be processed dynamically.

The structure profiler offers basic data structures and functions. All functions need a profile-data record argument that can be created with make-empty-profile-data.

profile-thunk executes a thunk under the profiler. The data is stored in the profile-data record passed as argument. Optionally the timeout for the profiler interrupt can be passed (in milliseconds). The return value of profile-thunk is the return value of thunk. By default, non-instrumented code will be profiled by the sampling process. By passing with-non-instr? = #f only instrumented functions will be profiled.

After profiling data, can be retrieved with the following accessors:

Times are in milliseconds, memory usage in bytes.

The following functions produce the same output as the ,profile command (see section 3.6). They all take an optional port argument. Default is the current-output-port.

profile-display prints the full output of the profiler. The other profile-display-... functions only display the respective part of the output.

The single fields in the flat profile can be retrieved with the following accessors:

Here names is the list of names specifying the function, optionally with it’s module. For example, names = ("dynamic-wind", "wind") would specify the dynamic-wind function in the module wind, if it was seen while profiling. If two or more functions match, the first one is used. If no function matches, the functions return #{Unspecific}.

The argument names can also be a plain string, as in (profile-display-function-flat prof-data "module"). This will display all flat function profiles that have ‘‘module’’ either as name or module.

The meanings of the fields that the functions return are as follows:

The following shows a short example of the usage of the profiler interface, where main is the function to be profiled:

(define prof-data (make-empty-profile-data))

(profile-thunk prof-data (lambda () (main 22)))
(display "Samples: ")
(display (profile-data-samples prof-data))
(newline)

(profile-display-overview prof-data)
(profile-display-flat prof-data (current-output-port))
(profile-display-tree prof-data)

; print only function "a"
(profile-display-function-flat prof-data '("a"))

; print only function "a" in file "x.scm"
(profile-display-function-flat prof-data '("a" "x.scm"))

; print all profiled functions in file "x.scm"
(profile-display-function-flat prof-data "x.scm")

; print percentage of time "a" was running
(display (* (profile-function-timeshare prof-data '("a")) 100))

5.19  Macros for writing loops

Iterate and reduce are extensions of named-let for writing loops that walk down one or more sequences, such as the elements of a list or vector, the characters read from a port, or an arithmetic series. Additional sequences can be defined by the user. Iterate and reduce are in structure reduce.

5.19.1  Iterate

The syntax of iterate is:

  (iterate loop-name
           ((sequence-type element-variable sequence-data ...)
            ...)
           ((state-variable initial-value)
            ...)
    body-expression
    [final-expression])

Iterate steps the element-variables in parallel through the sequences, while each state-variable has the corresponding initial-value for the first iteration and have later values supplied by body-expression. If any sequence has reached its limit the value of the iterate expression is the value of final-expression, if present, or the current values of the state-variables, returned as multiple values. If no sequence has reached its limit, body-expression is evaluated and either calls loop-name with new values for the state-variables, or returns some other value(s).

The loop-name and the state-variables and initial-values behave exactly as in named-let. The named-let expression

  (let loop-name ((state-variable initial-value) ...)
    body ...)

is equivalent to an iterate expression with no sequences (and with an explicit let wrapped around the body expressions to take care of any internal defines):

  (iterate loop-name
           ()
           ((state-variable initial-value) ...)
    (let () body ...))

The sequence-types are keywords (they are actually macros of a particular form; it is easy to add additional types of sequences). Examples are list* which walks down the elements of a list and vector* which does the same for vectors. For each iteration, each element-variable is bound to the next element of the sequence. The sequence-data gives the actual list or vector or whatever.

If there is a final-expression, it is evaluated when the end of one or more sequences is reached. If the body-expression does not call loop-name the final-expression is not evaluated. The state-variables are visible in final-expression but the sequence-variables are not.

The body-expression and the final-expression are in tail-position within the iterate. Unlike named-let, the behavior of a non-tail-recursive call to loop-name is unspecified (because iterating down a sequence may involve side effects, such as reading characters from a port).

5.19.2  Reduce

If an iterate expression is not meant to terminate before a sequence has reached its end, body-expression will always end with a tail call to loop-name. Reduce is a macro that makes this common case explicit. The syntax of reduce is the same as that of iterate, except that there is no loop-name. The body-expression returns new values of the state-variables instead of passing them to loop-name. Thus body-expression must return as many values as there are state variables. By special dispensation, if there are no state variables then body-expression may return any number of values, all of which are ignored.

The syntax of reduce is:

  (reduce ((sequence-type element-variable sequence-data ...)
            ...)
           ((state-variable initial-value)
            ...)
    body-expression
    [final-expression])

The value(s) returned by an instance of reduce is the value(s) returned by the final-expression, if present, or the current value(s) of the state variables when the end of one or more sequences is reached.

A reduce expression can be rewritten as an equivalent iterate expression by adding a loop-var and a wrapper for the body-expression that calls the loop-var.

(iterate loop
         ((sequence-type element-variable sequence-data ...)
          ...)
         ((state-variable initial-value)
          ...)
  (call-with-values (lambda ()
                      body-expression)
                    loop)
  [final-expression])

5.19.3  Sequence types

The predefined sequence types are:

For lists, vectors, and strings the element variable is bound to the successive elements of the list or vector, or the characters in the string.

For list-spine* the element variable is bound to the successive pairs in the spine of the list. List-spine-cycle-safe* is similar, but calls on-cycle-thunk with no arguments and with the continuation of the loop macro at an unspecified time if list contains a cycle.

For count* the element variable is bound to the elements of the sequence

 start, start + step, start + 2step, ..., end

inclusive of start and exclusive of end. The default step is 1. The sequence does not terminate if no end is given or if there is no N > 0 such that end = start + Nstep (= is used to test for termination). For example, (count* i 0 -1) doesn’t terminate because it begins past the end value and (count* i 0 1 2) doesn’t terminate because it skips over the end value.

For bits*, the element variable is bound to a sequence of representations of successive bit-fields of i, from least to most significant. If size is present, it must be a positive exact integer, and the element variable is bound to a sequence of size-bit integers. If size is omitted, bits* iterates through single bits, and the element variable is bound to a sequence of booleans. #t represents 1, and #f represents 0.

For input* the elements are the results of successive applications of read-procedure to input-port. The sequence ends when read-procedure returns an end-of-file object.

For a stream, the procedure takes the current data value as an argument and returns two values, the next value of the sequence and a new data value. If the new data is #f then the previous element was the last one. For example,

  (list* elt my-list)

is the same as

  (stream* elt list->stream my-list)

where list->stream is

  (lambda (list)
    (if (null? list)
        (values 'ignored #f)
        (values (car list) (cdr list))))

5.19.4  Synchronous sequences

When using the sequence types described above, a loop terminates when any of its sequences reaches its end. To help detect bugs it is useful to have sequence types that check to see if two or more sequences end on the same iteration. For this purpose there is second set of sequence types called synchronous sequences. These are identical to the ones listed above except that they cause an error to be signalled if a loop is terminated by a synchronous sequence and some other synchronous sequence did not reach its end on the same iteration.

Sequences are checked for termination in order, from left to right, and if a loop is terminated by a non-synchronous sequence no further checking is done.

The synchronous sequences are:

Note that the synchronous count% must have an end, unlike the nonsynchronous count*.

5.19.5  Examples

Gathering the indexes of list elements that answer true to some predicate.

(lambda (my-list predicate)
  (reduce ((list* elt my-list)
           (count* i 0))
          ((hits '()))
    (if (predicate elt)
        (cons i hits)
        hits)
    (reverse hits))

Looking for the index of an element of a list.

(lambda (my-list predicate)
  (iterate loop
           ((list* elt my-list)
            (count* i 0))
           ()                                ; no state
    (if (predicate elt)
        i
        (loop))))

Reading one line.

(define (read-line port)
  (iterate loop
           ((input* c port read-char))
           ((chars '()))
    (if (char=? c #\newline)
        (list->string (reverse chars))
        (loop (cons c chars)))
    (if (null? chars)
        (eof-object)
        ; no newline at end of file
        (list->string (reverse chars)))))

Counting the lines in a file. We can’t use count* because we need the value of the count after the loop has finished.

(define (line-count name)
  (call-with-input-file name
    (lambda (in)
      (reduce ((input* l in read-line))
              ((i 0))
        (+ i 1)))))

5.19.6  Defining sequence types

The sequence types are object-oriented macros similar to enumerations. A non-synchronous sequence macro needs to supply three values: #f to indicate that it isn’t synchronous, a list of state variables and their initializers, and the code for one iteration. The first two methods are CPS’ed: they take another macro and argument to which to pass their result. The sync method gets no additional arguments. The state-vars method is passed a list of names which will be bound to the arguments to the sequence. The final method, for the step, is passed the list of names bound to the arguments and the list of state variables. In addition there is a variable to be bound to the next element of the sequence, the body expression for the loop, and an expression for terminating the loop.

The definition of list* is

(define-syntax list*
  (syntax-rules (sync state-vars step)
    ((list* sync (next more))
     (next #f more))
    ((list* state-vars (start-list) (next more))
     (next ((list-var start-list)) more))
    ((list* step (start-list) (list-var)
            value-var loop-body final-exp)
     (if (null? list-var)
         final-exp
         (let ((value-var (car list-var))
               (list-var (cdr list-var)))
           loop-body)))))

Synchronized sequences are the same, except that they need to provide a termination test to be used when some other synchronized method terminates the loop.

(define-syntax list%
  (syntax-rules (sync done)
    ((list% sync (next more))
     (next #t more))
    ((list% done (start-list) (list-var))
     (null? list-var))
    ((list% stuff ...)
     (list* stuff ...))))

5.19.7  Expanded code

The expansion of

  (reduce ((list* x '(1 2 3)))
          ((r '()))
    (cons x r))

is

  (let ((final (lambda (r) (values r)))
        (list '(1 2 3))
        (r '()))
    (let loop ((list list) (r r))
      (if (null? list)
          (final r)
          (let ((x (car list))
                (list (cdr list)))
            (let ((continue (lambda (r)
                              (loop list r))))
              (continue (cons x r)))))))

The only inefficiencies in this code are the final and continue procedures, both of which could be substituted in-line. The macro expander could do the substitution for continue when there is no explicit proceed variable, as in this case, but not in general.

5.20  Sorting lists and vectors

(This section, as the libraries it describes, was written mostly by Olin Shivers for the draft of SRFI 32.)

The sort libraries in Scheme 48 include

5.20.1  Design rules

What vs. how

There are two different interfaces: ‘‘what’’ (simple) and ‘‘how’’ (detailed).

Simple
you specify semantics: datatype (list or vector), mutability, and stability.

Detailed
you specify the actual algorithm (quick, heap, insert, merge). Different algorithms have different properties, both semantic and pragmatic, so these exports are necessary.

It is necessarily the case that the specifications of these procedures make statements about execution ‘‘pragmatics.’’ For example, the sole distinction between heap sort and quick sort---both of which are provided by this library----is one of execution time, which is not a ‘‘semantic’’ distinction. Similar resource-use statements are made about ‘‘iterative’’ procedures, meaning that they can execute on input of arbitrary size in a constant number of stack frames.

Consistency across procedure signatures

The two interfaces share common procedure signatures wherever possible, to facilitate switching a given call from one procedure to another.

Less-than parameter first, data parameter after

These procedures uniformly observe the following parameter order: the data to be sorted comes after the comparison procedure. That is, we write

  (sort < list)

not

  (sort list <)

Ordering, comparison procedures and stability

These routines take a < comparison procedure, not a ≤ comparison procedure, and they sort into increasing order. The difference between a < spec and a ≤ spec comes up in two places:

We say that a data set (a list or vector) is sorted or ordered if it contains no adjacent pair of values ... x, y ... such that y < x.

In other words, scanning across the data never takes a ‘‘downwards’’ step.

If you use a ≤ procedure where these algorithms expect a < procedure, you may not get the answers you expect. For example, the list-sorted? procedure will return false if you pass it a ≤ comparison procedure and an ordered list containing adjacent equal elements.

A ‘‘stable’’ sort is one that preserves the pre-existing order of equal elements. Suppose, for example, that we sort a list of numbers by comparing their absolute values, i.e., using comparison procedure

(lambda (x y) (< (abs x) (abs y)))

If we sort a list that contains both 3 and -3:

... 3, ..., −3 ...

then a stable sort is an algorithm that will not swap the order of these two elements, that is, the answer is guaranteed to to look like

... 3, −3 ...

not

... −3, 3 ...

Choosing < for the comparison procedure instead of ≤ affects how stability is coded. Given an adjacent pair x, y, (< y x) means ‘‘x should be moved in front of x’’---otherwise, leave things as they are. So using a ≤ procedure where a < procedure is expected will invert stability.

This is due to the definition of equality, given a < comparator:

    (and (not (< x y))
         (not (< y x)))

The definition is rather different, given a ≤ comparator:

    (and (<= x y)
         (<= y x))

A ‘‘stable’’ merge is one that reliably favors one of its data sets when equal items appear in both data sets. All merge operations in this library are stable, breaking ties between data sets in favor of the first data set---elements of the first list come before equal elements in the second list.

So, if we are merging two lists of numbers ordered by absolute value, the stable merge operation list-merge

    (list-merge (lambda (x y) (< (abs x) (abs y)))
                '(0 -2 4 8 -10) '(-1 3 -4 7))

reliably places the 4 of the first list before the equal-comparing -4 of the second list:

    (0 -1 -2 4 -4 7 8 -10)

Some sort algorithms will not work correctly if given a ≤ when they expect a < comparison (or vice-versa).

In short, if your comparison procedure f answers true to (f x x), then

Note that you can synthesize a < procedure from a ≤ procedure with

    (lambda (x y) (not (<= y x)))

if need be.

Precise definitions give sharp edges to tools, but require care in use. ‘‘Measure twice, cut once.’’

All vector operations accept optional subrange parameters

The vector operations specified below all take optional start/end arguments indicating a selected subrange of a vector’s elements. If a start parameter or start/end parameter pair is given to such a procedure, they must be exact, non-negative integers, such that

0 ≤
start
end
(vector-length
vector
)

where vector is the related vector parameter. If not specified, they default to 0 and the length of the vector, respectively. They are interpreted to select the range [start,end), that is, all elements from index start (inclusive) up to, but not including, index end.

Required vs. allowed side-effects

List-sort! and List-stable-sort! are allowed, but not required, to alter their arguments’ cons cells to construct the result list. This is consistent with the what-not-how character of the group of procedures to which they belong (the sorting structure).

The list-delete-neighbor-dups!, list-merge! and list-merge-sort! procedures, on the other hand, provide specific algorithms, and, as such, explicitly commit to the use of side-effects on their input lists in order to guarantee their key algorithmic properties (e.g., linear-time operation).

5.20.2  Procedure specification

Structure name Functionality
sorting General sorting for lists and vectors
sorted Sorted predicates for lists and vectors
list-merge-sortList merge sort
vector-merge-sort Vector merge sort
vector-heap-sort Vector heap sort
vector-quick-sort Vector quick sort
vector-quick-sort3 Vector quick sort with 3-way comparisons
vector-insert-sort Vector insertion sort
delete-neighbor-duplicates List and vector delete neighbor duplicates
binary-searches Vector binary search
Note that there is no ‘‘list insert sort’’ package, as you might as well always use list merge sort. The reference implementation’s destructive list merge sort will do fewer set-cdr!s than a destructive insert sort.

Procedure naming and functionality

Almost all of the procedures described below are variants of two basic operations: sorting and merging. These procedures are consistently named by composing a set of basic lexemes to indicate what they do.

Lexeme Meaning
sortThe procedure sorts its input data set by some < comparison procedure.
mergeThe procedure merges two ordered data sets into a single ordered result.
stable This lexeme indicates that the sort is a stable one.
vectorThe procedure operates upon vectors.
list The procedure operates upon lists.
! Procedures that end in ! are allowed, and sometimes required, to reuse their input storage to construct their answer.

Types of parameters and return values

In the procedures specified below,

Passing values to procedures with these parameters that do not satisfy these types is an error.

If a procedure is said to return ‘‘unspecified,’’ this means that nothing at all is said about what the procedure returns, not even the number of return values. Such a procedure is not even required to be consistent from call to call in the nature or number of its return values. It is simply required to return a value (or values) that may be passed to a command continuation, e.g. as the value of an expression appearing as a non-terminal subform of a begin expression. Note that in R5RS, this restricts such a procedure to returning a single value; non-R5RS systems may not even provide this restriction.

5.20.2.1  sorting---general sorting package

This library provides basic sorting and merging functionality suitable for general programming. The procedures are named by their semantic properties, i.e., what they do to the data (sort, stable sort, merge, and so forth).

Procedure Suggested algorithm
list-sort vector heap or quick
list-sort! list merge sort
list-stable-sort vector merge sort
list-stable-sort! list merge sort
vector-sort heap or quick sort
vector-sort! heap or quick sort
vector-stable-sort vector merge sort
vector-stable-sort! merge sort
List-Sorted? and vector-sorted? return true if their input list or vector is in sorted order, as determined by their < comparison parameter.

All four merge operations are stable: an element of the initial list list1 or vector vector1 will come before an equal-comparing element in the second list list2 or vector vector2 in the result.

The procedures

do not alter their inputs and are allowed to return a value that shares a common tail with a list argument.

The procedure

are ‘‘linear update’’ operators---they are allowed, but not required, to alter the cons cells of their arguments to produce their results.

On the other hand, the list-merge! procedure make only a single, iterative, linear-time pass over its argument list, using set-cdr!s to rearrange the cells of the list into the final result ---it works ‘‘in place.’’ Hence, any cons cell appearing in the result must have originally appeared in an input. The intent of this iterative-algorithm commitment is to allow the programmer to be sure that if, for example, list-merge! is asked to merge two ten-million-element lists, the operation will complete without performing some extremely (possibly twenty-million) deep recursion.

The vector procedures

do not alter their inputs, but allocate a fresh vector for their result, of length endstart.

The vector procedures

sort their data in-place. (But note that vector-stable-sort! may allocate temporary storage proportional to the size of the input .)

Vector-merge returns a vector of length (end1start1 + (end2start2).

Vector-merge! writes its result into vector v, beginning at index start, for indices less than end = start + (end1start1) + (end2start2). The target subvector v[start,end) may not overlap either source subvector vector1[start1,end1) vector2[start2,end2).

The ...-delete-neighbor-dups-... procedures: These procedures delete adjacent duplicate elements from a list or a vector, using a given element-equality procedure. The first/leftmost element of a run of equal elements is the one that survives. The list or vector is not otherwise disordered.

These procedures are linear time---much faster than the O(n2) general duplicate-element deletors that do not assume any ‘‘bunching’’ of elements (such as the ones provided by SRFI 1). If you want to delete duplicate elements from a large list or vector, you can sort the elements to bring equal items together, then use one of these procedures, for a total time of O(nlog(n)).

The comparison procedure = passed to these procedures is always applied ( = x y) where x comes before y in the containing list or vector.

Examples:

(list-delete-neighbor-dups = '(1 1 2 7 7 7 0 -2 -2))
  ===⇒ (1 2 7 0 -2)

(vector-delete-neighbor-dups = '#(1 1 2 7 7 7 0 -2 -2))
  ===⇒ #(1 2 7 0 -2)

(vector-delete-neighbor-dups = '#(1 1 2 7 7 7 0 -2 -2) 3 7)
  ===⇒ #(7 0 -2)

5.20.2.2  Algorithm-specific sorting packages

These packages provide more specific sorting functionality, that is, specific commitment to particular algorithms that have particular pragmatic consequences (such as memory locality, asymptotic running time) beyond their semantic behaviour (sorting, stable sorting, merging, etc.). Programmers that need a particular algorithm can use one of these packages.

sorted---sorted predicates

Return #f iff there is an adjacent pair ... x, y ... in the input list or vector such that y < x. The optional start/end range arguments restrict vector-sorted? to the indicated subvector.

list-merge-sort---list merge sort

The sort procedures sort their data using a list merge sort, which is stable. (The reference implementation is, additionally, a ‘‘natural’’ sort. See below for the properties of this algorithm.)

The ! procedures are destructive---they use set-cdr!s to rearrange the cells of the lists into the proper order. As such, they do not allocate any extra cons cells---they are ‘‘in place’’ sorts.

The merge operations are stable: an element of list1 will come before an equal-comparing element in list2 in the result list.

vector-merge-sort---vector merge sort

The sort procedures sort their data using vector merge sort, which is stable. (The reference implementation is, additionally, a ‘‘natural’’ sort. See below for the properties of this algorithm.)

The optional start/end arguments provide for sorting of subranges, and default to 0 and the length of the corresponding vector.

Merge-sorting a vector requires the allocation of a temporary ‘‘scratch’’ work vector for the duration of the sort. This scratch vector can be passed in by the client as the optional temp argument; if so, the supplied vector must be of size ≤ end, and will not be altered outside the range [start,end). If not supplied, the sort routines allocate one themselves.

The merge operations are stable: an element of vector1 will come before an equal-comparing element in vector2 in the result vector.

vector-heap-sort---vector heap sort

These procedures sort their data using heap sort, which is not a stable sorting algorithm.

Vector-heap-sort returns a vector of length endstart. Vector-heap-sort! is in-place, leaving its result in vector[start,end).

vector-quick-sort---vector quick sort

These procedures sort their data using quick sort, which is not a stable sorting algorithm.

Vector-quick-sort returns a vector of length endstart. Vector-quick-sort! is in-place, leaving its result in vector[start,end).

vector-quick-sort3---vector quick sort with 3-way comparisons

These procedures sort their data using quick sort, which is not a stable sorting algorithm.

Vector-quick-sort3 returns a vector of length endstart. Vector-quick-sort3! is in-place, leaving its result in vector[start,end).

These procedures implement a variant of quick-sort that takes a three-way comparison procedure C. C compares a pair of elements and returns an exact integer whose sign indicates their relationship:

(C x y) < 0 x<y
(C x y) = 0 x = y
(C x y) > 0 x>y

*

To help remember the relationship between the sign of the result and the relation, use the procedure − as the model for C: (− x y) < 0 means that x < y; (− x y) > 0 means that x > y.

vector-insert-sort---vector insertion sort

These procedures stably sort their data using insertion sort.

delete-neighbor-duplicates---list and vector delete neighbor duplicates

These procedures delete adjacent duplicate elements from a list or a vector, using a given element-equality procedure = . The first/leftmost element of a run of equal elements is the one that survives. The list or vector is not otherwise disordered.

These procedures are linear time---much faster than the O(n2) general duplicate-element deletors that do not assume any ‘‘bunching’’ of elements (such as the ones provided by SRFI 1). If you want to delete duplicate elements from a large list or vector, you can sort the elements to bring equal items together, then use one of these procedures, for a total time of O(nlog(n)).

The comparison procedure = passed to these procedures is always applied

( =  x y)

where x comes before y in the containing list or vector.

Examples:

(list-delete-neighbor-dups = '(1 1 2 7 7 7 0 -2 -2))
  ===⇒ (1 2 7 0 -2)

(vector-delete-neighbor-dups = '#(1 1 2 7 7 7 0 -2 -2))
  ===⇒ #(1 2 7 0 -2)

(vector-delete-neighbor-dups = '#(1 1 2 7 7 7 0 -2 -2) 3 7)
  ===⇒ #(7 0 -2)

;; Result left in v[3,9):
(let ((v (vector 0 0 0 1 1 2 2 3 3 4 4 5 5 6 6)))
  (cons (vector-delete-neighbor-dups! = v 3)
        v))
   ===⇒ (9 . #(0 0 0 1 2 3 4 5 6 4 4 5 5 6 6))

binary-searches---vector binary search

vector-binary-search searches vector in range [start,end) (which default to 0 and the length of vector, respectively) for an element whose associated key is equal to key. The procedure elt->key is used to map an element to its associated key. The elements of the vector are assumed to be ordered by the < relation on these keys. That is,

(vector-sorted? (lambda (x y) (< (elt->key x) (elt->key y)))
                vector start end) ===⇒ true

An element e of vector is a match for key if it’s neither less nor greater than the key:

(and (not (< (elt->key e) key))
     (not (< key (elt->key e))))

If there is such an element, the procedure returns its index in the vector as an exact integer. If there is no such element in the searched range, the procedure returns false.

(vector-binary-search < car 4 '#((1 . one) (3 . three)
                                 (4 . four) (25 . twenty-five)))
===⇒ 2

(vector-binary-search < car 7 '#((1 . one) (3 . three)
                                 (4 . four) (25 . twenty-five)))
===⇒ #f

Vector-binary-search3 is a variant that uses a three-way comparison procedure compare-proc. Compare-proc compares its parameter to the search key, and returns an exact integer whose sign indicates its relationship to the search key.

array
rclcrcl
(
compare-proc
 x)   <   0   ⇒   x   <  
search-key
(
compare-proc
 x)    =    0   ⇒   x    =   
search-key
(
compare-proc
 x)   >   0   ⇒   x   >  
search-key
endarray

(vector-binary-search3 (lambda (elt) (- (car elt) 4))
                       '#((1 . one) (3 . three)
                          (4 . four) (25 . twenty-five)))
===⇒ 2

5.20.3  Algorithmic properties

Different sort and merge algorithms have different properties. Choose the algorithm that matches your needs:

Vector insert sort
Stable, but only suitable for small vectors---O(n2).
Vector quick sort
Not stable. Is fast on average---O(nlog(n))---but has bad worst-case behaviour. Has good memory locality for big vectors (unlike heap sort). A clever pivot-picking trick (median of three samples) helps avoid worst-case behaviour, but pathological cases can still blow up.
Vector heap sort
Not stable. Guaranteed fast---O(nlog(n)) worst case. Poor locality on large vectors. A very reliable workhorse.
Vector merge sort
Stable. Not in-place---requires a temporary buffer of equal size. Fast---O(nlog(n))---and has good memory locality for large vectors.

The implementation of vector merge sort provided by this implementation is, additionally, a ‘‘natural’’ sort, meaning that it exploits existing order in the input data, providing O(n) best case.

Destructive list merge sort
Stable, fast and in-place (i.e., allocates no new cons cells). ‘‘Fast’’ means O(nlog(n)) worst-case, and substantially better if the data is already mostly ordered, all the way down to linear time for a completely-ordered input list (i.e., it is a ‘‘natural’’ sort).

Note that sorting lists involves chasing pointers through memory, which can be a loser on modern machine architectures because of poor cache and page locality. Sorting vectors has inherently better locality.

This implementation’s destructive list merge and merge sort implementations are opportunistic---they avoid redundant set-cdr!s, and try to take long already-ordered runs of list structure as-is when doing the merges.

Pure list merge sort
Stable and fast---O(nlog(n)) worst-case, and possibly O(n), depending upon the input list (see discussion above).

Algorithm Stable? Worst case Average case In-place
Vector insert YesO(n2) O(n2)Yes
Vector quick No O(n2) O(nlog(n))Yes
Vector heap No O(nlog(n))O(nlog(n))Yes
Vector merge YesO(nlog(n))O(nlog(n))No
List merge YesO(nlog(n))O(nlog(n))Either

5.21  Regular expressions

This section describes a functional interface for building regular expressions and matching them against strings. The matching is done using the POSIX regular expression package. Regular expressions are in the structure regexps.

A regular expression is either a character set, which matches any character in the set, or a composite expression containing one or more subexpressions. A regular expression can be matched against a string to determine success or failure, and to determine the substrings matched by particular subexpressions.

Returns #t if value is a regular expression created using the functional interface for regular expressions, and #f otherwise.

5.21.1  Character sets

Character sets may be defined using a list of characters and strings, using a range or ranges of characters, or by using set operations on existing character sets.

Set returns a set that contains the character arguments and the characters in any string arguments. Range returns a character set that contain all characters between low-char and high-char, inclusive. Ranges returns a set that contains all characters in the given ranges. Range and ranges use the ordering induced by char->integer. Ascii-range and ascii-ranges use the ASCII ordering. It is an error for a high-char to be less than the preceding low-char in the appropriate ordering.

These perform the indicated operations on character sets.

The following character sets are predefined:

lower-case (set "abcdefghijklmnopqrstuvwxyz")
upper-case (set "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
alphabetic (union lower-case upper-case)
numeric (set "0123456789")
alphanumeric (union alphabetic numeric)
punctuation (set "!\"#$%&'()*+,‑./:;<=>?@[\\]^_`{|}~")
graphic (union alphanumeric punctuation)
printing (union graphic (set #\space))
control (negate printing)
blank (set #\space (ascii->char 9)) ; 9 is tab
whitespace (union (set #\space) (ascii-range 9 13))
hexdigit (set "0123456789abcdefABCDEF")

The above are taken from the default locale in POSIX. The characters in whitespace are space, tab, newline (= line feed), vertical tab, form feed, and carriage return.

5.21.2  Anchoring

String-start returns a regular expression that matches the beginning of the string being matched against; string-end returns one that matches the end.

5.21.3  Composite expressions

Sequence matches the concatenation of its arguments, one-of matches any one of its arguments.

Text returns a regular expression that matches the characters in string, in order.

Repeat returns a regular expression that matches zero or more occurrences of its reg-exp argument. With no count the result will match any number of times (reg-exp*). With a single count the returned expression will match reg-exp exactly that number of times. The final case will match from min to max repetitions, inclusive. Max may be #f, in which case there is no maximum number of matches. Count and min should be exact, non-negative integers; max should either be an exact non-negative integer or #f.

5.21.4  Case sensitivity

Regular expressions are normally case-sensitive.

The value returned by ignore-case is identical its argument except that case will be ignored when matching. The value returned by use-case is protected from future applications of ignore-case. The expressions returned by use-case and ignore-case are unaffected by later uses of the these procedures. By way of example, the following matches "ab" but not "aB", "Ab", or "AB".

(text "ab")

while

(ignore-case (test "ab"))

matches "ab", "aB", "Ab", and "AB" and

(ignore-case (sequence (text "a")
                       (use-case (text "b"))))

matches "ab" and "Ab" but not "aB" or "AB".

5.21.5  Submatches and matching

A subexpression within a larger expression can be marked as a submatch. When an expression is matched against a string, the success or failure of each submatch within that expression is reported, as well as the location of the substring matched be each successful submatch.

Submatch returns a regular expression that matches its argument and causes the result of matching its argument to be reported by the match procedure. Key is used to indicate the result of this particular submatch in the alist of successful submatches returned by match. Any value may be used as a key. No-submatches returns an expression identical to its argument, except that all submatches have been elided.

Any-match? returns #t if string matches reg-exp or contains a substring that does, and #f otherwise. Exact-match? returns #t if string matches reg-exp and #f otherwise.

Match returns #f if reg-exp does not match string and a match record if it does match. A match record contains three values: the beginning and end of the substring that matched the pattern and an a-list of submatch keys and corresponding match records for any submatches that also matched. Match-start returns the index of the first character in the matching substring and match-end gives index of the first character after the matching substring. Match-submatches returns an alist of submatch keys and match records. Only the top match record returned by match has a submatch alist.

Matching occurs according to POSIX. The match returned is the one with the lowest starting index in string. If there is more than one such match, the longest is returned. Within that match the longest possible submatches are returned.

All three matching procedures cache a compiled version of reg-exp. Subsequent calls with the same reg-exp will be more efficient.

The C interface to the POSIX regular expression code uses ASCII nul as an end-of-string marker. The matching procedures will ignore any characters following an embedded ASCII nuls in string.

(define pattern (text "abc"))
(any-match? pattern "abc")         => #t
(any-match? pattern "abx")         => #f
(any-match? pattern "xxabcxx")     => #t

(exact-match? pattern "abc")       => #t
(exact-match? pattern "abx")       => #f
(exact-match? pattern "xxabcxx")   => #f

(match pattern "abc")              => #{match 0 3}
(match pattern "abx")              => #f
(match pattern "xxabcxx")          => #{match 2 5}

(let ((x (match (sequence (text "ab")
                          (submatch 'foo (text "cd"))
                          (text "ef"))
                "xxxabcdefxx")))
  (list x (match-submatches x)))
  => (#{match 3 9} ((foo . #{match 5 7}))

(match-submatches
  (match (sequence
           (set "a")
           (one-of (submatch 'foo (text "bc"))
                   (submatch 'bar (text "BC"))))
         "xxxaBCd"))
  => ((bar . #{match 4 6}))

5.22  SRFIs

‘SRFI’ stands for ‘Scheme Request For Implementation’. An SRFI is a description of an extension to standard Scheme. Draft and final SRFI documents, a FAQ, and other information about SRFIs can be found at the SRFI web site.

Scheme 48 includes implementations of the following (final) SRFIs:

Documentation on these can be found at the web site mentioned above.

SRFI 4 specifies an external representation for homogeneous numeric vectors that is incompatible with R5RS. The Scheme 48 version of SRFI 4 does not support this external representation.

SRFI 14 includes the procedure ->char-set which is not a standard Scheme identifier (in R5RS the only required identifier starting with - is - itself). In the Scheme 48 version of SRFI 14 we have renamed ->char-set as x->char-set.

SRFI bindings can be accessed either by opening the appropriate structure (the structure srfi-n contains SRFI n) or by loading structure srfi-7 and then using the ,load-srfi-7-program command to load an SRFI 7-style program. The syntax for the command is

,load-srfi-7-program name filename

This creates a new structure and associated package, binds the structure to name in the configuration package, and then loads the program found in filename into the package.

As an example, if the file test.scm contains

(program (code (define x 10)))

this program can be loaded as follows:

> ,load-package srfi-7
> ,load-srfi-7-program test test.scm
[test]
> ,in test
test> x
10
test>