Chapter 5

Libraries

Use the ,open command (section 3.4) or the module language (chapter 2.6) to open the structures described below.

5.1 General utilities

These are in the big-util structure.

(atom? value) –> boolean

(atom? x) is the same as (not (pair? x)).

(null-list? list) –> boolean

Returns true for the empty list, false for a pair, and signals an error otherwise.

(neq? value value) –> boolean

(neq? x y) is the same as (not (eq? x y)).

(n= number number) –> boolean

(n= x y) is the same as (not (= x y)).

(identity value) –> value
(no-op value) –> value

These both just return their argument. No-op is guaranteed not to be compiled in-line, identity may be.

(memq? value list) –> boolean

Returns true if value is in list, false otherwise.

(any? predicate list) –> boolean

Returns true if predicate is true for any element of list.

(every? predicate list) –> boolean

Returns true if predicate is true for every element of list.

(any predicate list) –> value
(first predicate list) –> value

Any returns some element of list for which predicate is true, or false if there are none. First does the same except that it returns the first element for which predicate is true.

(filter predicate list) –> list
(filter! predicate list) –> list

Returns a list containing all of the elements of list for which predicate is true. The order of the elements is preserved. Filter! may reuse the storage of list.

(filter-map procedure list) –> list

The same as filter except the returned list contains the results of applying procedure instead of elements of list. (filter-map p l) is the same as (filter identity (map p l)).

(partition-list predicate list) –> list list
(partition-list! predicate list) –> list list

The first return value contains those elements list for which predicate is true, the second contains the remaining elements. The order of the elements is preserved. Partition-list! may reuse the storage of the list.

(remove-duplicates list) –> list

Returns its argument with all duplicate elements removed. The first instance of each element is preserved.

(delq value list) –> list
(delq! value list) –> list
(delete predicate list) –> list

All three of these return list with some elements removed. Delq removes all elements eq? to value. Delq! does the same and may modify the list argument. Delete removes all elements for which predicate is true. Both delq and delete may reuse some of the storage in the list argument, but won’t modify it.

(reverse! list) –> list

Destructively reverses list.

(concatenate-symbol value ...) –> symbol

Returns the symbol whose name is produced by concatenating the displayed representations of value ....

(concatenate-symbol 'abc "-" 4) ===⇒ 'abc-4

5.2 Pretty-printing

These are in the pp structure.

(p value)
(p value output-port)
(pretty-print value output-port position)

Pretty-print value The current output port is used if no port is specified. Position is the starting offset. Value will be pretty-printed to the right of this column.

5.3 Bitwise integer operations

These functions use the two’s-complement representation for integers. There is no limit to the number of bits in an integer. They are in the structures bitwise and big-scheme.

(bitwise-and integer integer) –> integer
(bitwise-ior integer integer) –> integer
(bitwise-xor integer integer) –> integer
(bitwise-not integer) –> integer

These perform various logical operations on integers on a bit-by-bit basis. ‘ior’ is inclusive OR and ‘xor’ is exclusive OR.

(arithmetic-shift integer bit-count) –> integer

Shifts the integer by the given bit count, which must be an integer, shifting left for positive counts and right for negative ones. Shifting preserves the integer’s sign.

(bit-count integer) –> integer

Counts the number of bits set in the integer. If the argument is negative a bitwise NOT operation is performed before counting.

5.4 Byte vectors

These are homogeneous vectors of small integers (0 ≤ i ≤ 255). The functions that operate on them are analogous to those for vectors. They are in the structure byte-vectors.

(byte-vector? value) –> boolean
(make-byte-vector k fill) –> byte-vector
(byte-vector b ...) –> byte-vector
(byte-vector-length byte-vector) –> integer
(byte-vector-ref byte-vector k) –> integer
(byte-vector-set! byte-vector k b)

5.5 Sparse vectors

These are vectors that grow as large as they need to. That is, they can be indexed by arbitrarily large nonnegative integers. The implementation allows for arbitrarily large gaps by arranging the entries in a tree. They are in the structure sparse-vectors.

(make-sparse-vector) –> sparse-vector
(sparse-vector-ref sparse-vector k) –> value
(sparse-vector-set! sparse-vector k value)
(sparse-vector->list sparse-vector) –> list

Make-sparse-vector, sparse-vector-ref, and sparse-vector-set! are analogous to make-vector, vector-ref, and vector-set!, except that the indices passed to sparse-vector-ref and sparse-vector-set! can be arbitrarily large. For indices whose elements have not been set in a sparse vector, sparse-vector-ref returns #f.

Sparse-vector->list is for debugging: It returns a list of the consecutive elements in a sparse vector from 0 to the highest element that has been set. Note that the list will also include all the #f elements for the unset elements.

5.6 Cells

These hold a single value and are useful when a simple indirection is required. The system uses these to hold the values of lexical variables that may be set!.

(cell? value) –> boolean
(make-cell value) –> cell
(cell-ref cell) –> value
(cell-set! cell value)

5.7 Queues

These are ordinary first-in, first-out queues. The procedures are in structure queues.

(make-queue) –> queue
(queue? value) –> boolean
(queue-empty? queue) –> boolean
(enqueue! queue value)
(dequeue! queue) –> value

Make-queue creates an empty queue, queue? is a predicate for identifying queues, queue-empty? tells you if a queue is empty, enqueue! and dequeue! add and remove values.

(queue-length queue) –> integer
(queue->list queue) –> values
(list->queue values) –> queue
(delete-from-queue! queue value) –> boolean

Queue-length returns the number of values in queue. Queue->list returns the values in queue as a list, in the order in which the values were added. List->queue returns a queue containing values, preserving their order. Delete-from-queue removes the first instance of value from queue, using eq? for comparisons. Delete-from-queue returns #t if value is found and #f if it is not.

5.8 Arrays

These provide N-dimensional, zero-based arrays and are in the structure arrays. The array interface is derived from one invented by Alan Bawden.

(make-array value dimension₀ ...) –> array
(array dimensions element₀ ...) –> array
(copy-array array) –> array

Make-array makes a new array with the given dimensions, each of which must be a non-negative integer. Every element is initially set to value. Array Returns a new array with the given dimensions and elements. Dimensions must be a list of non-negative integers, The number of elements should be the equal to the product of the dimensions. The elements are stored in row-major order.

(make-array 'a 2 3) => {Array 2 3}

(array '(2 3) 'a 'b 'c 'd 'e 'f)
    => {Array 2 3}

Copy-array returns a copy of array. The copy is identical to the array but does not share storage with it.

(array? value) –> boolean

Returns #t if value is an array.

(array-ref array index₀ ...) –> value
(array-set! array value index₀ ...)
(array->vector array) –> vector
(array-shape array) –> list

Array-ref returns the specified array element and array-set! replaces the element with value.

(let ((a (array '(2 3) 'a 'b 'c 'd 'e 'f)))
  (let ((x (array-ref a 0 1)))
    (array-set! a 'g 0 1)
    (list x (array-ref a 0 1))))
    => '(b g)

Array->vector returns a vector containing the elements of array in row-major order. Array-shape returns the dimensions of the array as a list.

(make-shared-array array linear-map dimension₀ ...) –> array

Make-shared-array makes a new array that shares storage with array and uses linear-map to map indexes to elements. Linear-map must accept as many arguments as the number of dimensions given and must return a list of non-negative integers that are valid indexes into array.

(array-ref (make-shared-array a f i0 i1 ...)
           j0 j1 ...)

is equivalent to

(apply array-ref a (f j0 j1 ...))

As an example, the following function makes the transpose of a two-dimensional array:

(define (transpose array)
  (let ((shape (array-shape array)))
    (make-shared-array array
                       (lambda (x y)
                         (list y x))
                       (cadr shape)
                       (car shape))))

(array->vector
  (transpose
    (array '(2 3) 'a 'b 'c 'd 'e 'f)))
      => '(a d b e c f)

5.9 Records

New types can be constructed using the define-record-type macro from the define-record-types structure The general syntax is:

(define-record-type tag type-name
  (constructor-name field-tag ...)
  predicate-name
  (field-tag accessor-name [modifier-name])
  ...)

This makes the following definitions:

type-name (type)
(constructor-name field-init ...) –> type-name
(predicate-name value) –> boolean
(accessor-name type-name) –> value
(modifier-name type-name value)

Type-name is the record type itself, and can be used to specify a print method (see below). Constructor-name is a constructor that accepts values for the fields whose tags are specified. Predicate-name is a predicate that returns #t for elements of the type and #f for everything else. The accessor-names retrieve the values of fields, and the modifier-name’s update them. Tag is used in printing instances of the record type and the field-tags are used in the inspector and to match constructor arguments with fields.

(define-record-discloser type discloser)

Define-record-discloser determines how records of type type are printed. Discloser should be procedure which takes a single record of type type and returns a list whose car is a symbol. The record will be printed as the value returned by discloser with curly braces used instead of the usual parenthesis.

For example

(define-record-type pare :pare
  (kons x y)
  pare?
  (x kar set-kar!)
  (y kdr))

defines kons to be a constructor, kar and kdr to be accessors, set-kar! to be a modifier, and pare? to be a predicate for a new type of object. The type itself is named :pare. Pare is a tag used in printing the new objects.

By default, the new objects print as #{Pare}. The print method can be modified using define-record-discloser:

(define-record-discloser :pare
  (lambda (p) `(pare ,(kar p) ,(kdr p))))

will cause the result of (kons 1 2) to print as #{Pare 1 2}.

Define-record-resumer (section 8.7.3) can be used to control how records are stored in heap images.

5.9.1 Low-level access to records

Records are implemented using primitive objects exactly analogous to vectors. Every record has a record type (which is another record) in the first slot. Note that use of these procedures, especially record-set!, breaks the record abstraction described above; caution is advised.

These procedures are in the structure records.

(make-record n value) –> record
(record value ...) –> record-vector
(record? value) –> boolean
(record-length record) –> integer
(record-type record) –> value
(record-ref record i) –> value
(record-set! record i value)

These the same as the standard vector- procedures except that they operate on records. The value returned by record-length includes the slot holding the record’s type. (record-type x) is equivalent to (record-ref x 0).

5.9.2 Record types

Record types are themselves records of a particular type (the first slot of :record-type points to itself). A record type contains four values: the name of the record type, a list of the names its fields, and procedures for disclosing and resuming records of that type. Procedures for manipulating them are in the structure record-types.

(make-record-type name field-names) –> record-type
(record-type? value) –> boolean
(record-type-name record-type) –> symbol
(record-type-field-names record-type) –> symbols

(record-constructor record-type field-names) –> procedure
(record-predicate record-type) –> procedure
(record-accessor record-type field-name) –> procedure
(record-modifier record-type field-name) –> procedure

These procedures construct the usual record-manipulating procedures. Record-constructor returns a constructor that is passed the initial values for the fields specified and returns a new record. Record-predicate returns a predicate that return true when passed a record of type record-type and false otherwise. Record-accessor and record-modifier return procedures that reference and set the given field in records of the approriate type.

(define-record-discloser record-type discloser)
(define-record-resumer record-type resumer)

Record-types is the initial exporter of define-record-discloser (re-exported by define-record-types described above) and define-record-resumer (re-exported by external-calls (section 8.7.3)).

The procedures described in this section can be used to define new record-type-defining macros.

(define-record-type pare :pare
  (kons x y)
  pare?
  (x kar set-kar!)
  (y kdr))

is (sematically) equivalent to

(define :pare (make-record-type 'pare '(x y)))
(define kons (record-constructor :pare '(x y)))
(define kar (record-accessor :pare 'x))
(define set-kar! (record-modifier :pare 'x))
(define kdr (record-accessor :pare 'y))

The “(semantically)” above is because define-record-type adds declarations, which allows the type checker to detect some misuses of records, and uses more efficient definitions for the constructor, accessors, and modifiers. Ignoring the declarations, which will have to wait for another edition of the manual, what the above example actually expands into is:

(define :pare (make-record-type 'pare '(x y)))
(define (kons x y) (record :pare x y))
(define (kar r) (checked-record-ref r :pare 1))
(define (set-kar! r new)
  (checked-record-set! r :pare 1 new))
(define (kdr r) (checked-record-ref r :pare 2))

Checked-record-ref and Checked-record-set! are low-level procedures that check the type of the record and access or modify it using a single VM instruction.

5.10 Finite record types

The structure finite-types has two macros for defining ‘finite’ record types. These are record types for which there are a fixed number of instances, all of which are created at the same time as the record type itself. The syntax for defining an enumerated type is:

(define-enumerated-type tag type-name
  predicate-name
  vector-of-instances-name
  name-accessor
  index-accessor
  (instance-name ...))

This defines a new record type, bound to type-name, with as many instances as there are instance-name’s. Vector-of-instances-name is bound to a vector containing the instances of the type in the same order as the instance-name list. Tag is bound to a macro that when given an instance-name expands into an expression that returns corresponding instance. The name lookup is done at macro expansion time. Predicate-name is a predicate for the new type. Name-accessor and index-accessor are accessors for the name and index (in vector-of-instances) of instances of the type.

(define-enumerated-type color :color
  color?
  colors
  color-name
  color-index
  (black white purple maroon))

(color-name (vector-ref colors 0)) => black
(color-name (color white))         => white
(color-index (color purple))       => 2

Finite types are enumerations that allow the user to add additional fields in the type. The syntax for defining a finite type is:

(define-finite-type tag type-name
  (field-tag ...)
  predicate-name
  vector-of-instances-name
  name-accessor
  index-accessor
  (field-tag accessor-name [modifier-name])
  ...((instance-name field-value ...)
   ...))

The additional fields are specified exactly as with define-record-type. The field arguments to the constructor are listed after the type-name; these do not include the name and index fields. The form ends with the names and the initial field values for the instances of the type. The instances are constructed by applying the (unnamed) constructor to these initial field values. The name must be first and the remaining values must match the field-tags in the constructor’s argument list.

(define-finite-type color :color
  (red green blue)
  color?
  colors
  color-name
  color-index
  (red   color-red)
  (green color-green)
  (blue  color-blue)
  ((black    0   0   0)
   (white  255 255 255)
   (purple 160  32 240)
   (maroon 176  48  96)))

(color-name (color black))         => black
(color-name (vector-ref colors 1)) => white
(color-index (color purple))       => 2
(color-red (color maroon))         => 176

5.11 Sets over finite types

The structure enum-sets has a macro for defining types for sets of elements of finite types. These work naturally with the finite types defined by the finite-types structure, but are not tied to them. The syntax for defining such a type is:

(define-enum-set-type id type-name predicate constructor
   element-syntax element-predicate all-elements element-index-ref)

This defines id to be syntax for constructing sets, type-name to be a value representing the type, predicate to be a predicate for those sets, and constructor a procedure for constructing one from a list.

Element-syntax must be the name of a macro for constructing set elements from names (akin to the tag argument to define-enumerated-type). Element-predicate must be a predicate for the element type, all-elements a vector of all values of the element type, and element-index-ref must return the index of an element within the all-elements vector.

(enum-set->list enum-set) –> list
(enum-set-member? enum-set enumerand) –> boolean
(enum-set=? enum-set enum-set) –> boolean
(enum-set-union enum-set enum-set) –> enum-set
(enum-set-intersection enum-set enum-set) –> enum-set
(enum-set-negation enum-set) –> enum-set

Enum-set->list converts a set into a list of its elements. Enum-set-member? tests for membership. Enum-set=? tests two sets of equal type for equality. (If its arguments are not of the same type, enum-set=? raises an exception.) Enum-set-union computes the union of two sets of equal type, enum-set-intersection computes the intersection, and enum-set-negation computes the complement of a set.

Here is an example. Given an enumerated type:

(define-enumerated-type color :color
  color?
  colors
  color-name
  color-index
  (red blue green))

we can define sets of colors:

(define-enum-set-type color-set :color-set
                      color-set?
                      make-color-set
  color color? colors color-index)

> (enum-set->list (color-set red blue))
(#{Color red} #{Color blue})
> (enum-set->list (enum-set-negation (color-set red blue)))
(#{Color green})
> (enum-set-member? (color-set red blue) (color blue))
#t

5.12 Hash tables

These are generic hash tables, and are in the structure tables. Strictly speaking they are more maps than tables, as every table has a value for every possible key (for that type of table). All but a finite number of those values are #f.

(make-table) –> table
(make-symbol-table) –> symbol-table
(make-string-table) –> string-table
(make-integer-table) –> integer-table
(make-table-maker compare-proc hash-proc) –> procedure
(make-table-immutable! table)

The first four functions listed make various kinds of tables. Make-table returns a table whose keys may be symbols, integer, characters, booleans, or the empty list (these are also the values that may be used in case expressions). As with case, comparison is done using eqv?. The comparison procedures used in symbol, string, and integer tables are eq?, string=?, and =.

Make-table-maker takes two procedures as arguments and returns a nullary table-making procedure. Compare-proc should be a two-argument equality predicate. Hash-proc should be a one argument procedure that takes a key and returns a non-negative integer hash value. If (compare-proc x y) returns true, then (= (hash-proc x) (hash-proc y)) must also return true. For example, make-integer-table could be defined as (make-table-maker = abs).

Make-table-immutable! prohibits future modification to its argument.

(table? value) –> boolean
(table-ref table key) –> value or #f
(table-set! table key value)
(table-walk procedure table)

Table? is the predicate for tables. Table-ref and table-set! access and modify the value of key in table. Table-walk applies procedure, which must accept two arguments, to every associated key and non-#f value in table.

(default-hash-function value) –> integer
(string-hash string) –> integer

Default-hash-function is the hash function used in the tables returned by make-table, and string-hash it the one used by make-string-table.

5.13 Port extensions

These procedures are in structure extended-ports.

(make-string-input-port string) –> input-port
(make-string-output-port) –> output-port
(string-output-port-output string-output-port) –> string

Make-string-input-port returns an input port that that reads characters from the supplied string. An end-of-file object is returned if the user reads past the end of the string. Make-string-output-port returns an output port that saves the characters written to it. These are then returned as a string by string-output-port-output.

(read (make-string-input-port "(a b)"))
    => '(a b)

(let ((p (make-string-output-port)))
  (write '(a b) p)
  (let ((s (string-output-port-output p)))
    (display "c" p)
    (list s (string-output-port-output p))))
    => '("(a b)" "(a b)c")

(limit-output output-port n procedure)

Procedure is called on an output port. Output written to that port is copied to output-port until n characters have been written, at which point limit-output returns. If procedure returns before writing n characters, then limit-output also returns at that time, regardless of how many characters have been written.

(make-tracking-input-port input-port) –> input-port
(make-tracking-output-port output-port) –> output-port
(current-row port) –> integer or #f
(current-column port) –> integer or #f
(fresh-line output-port)

Make-tracking-input-port and make-tracking-output-port return ports that keep track of the current row and column and are otherwise identical to their arguments. Closing a tracking port does not close the underlying port. Current-row and current-column return port’s current read or write location. They return #f if port does not keep track of its location. Fresh-line writes a newline character to output-port if (current-row port) is not 0.

(define p (open-output-port "/tmp/temp"))
(list (current-row p) (current-column p))
    => '(0 0)
(display "012" p)
(list (current-row p) (current-column p))
    => '(0 3)
(fresh-line p)
(list (current-row p) (current-column p))
    => '(1 0)
(fresh-line p)
(list (current-row p) (current-column p))
    => '(1 0)

5.14 Fluid bindings

These procedures implement dynamic binding and are in structure fluids. A fluid is a cell whose value can be bound dynamically. Each fluid has a top-level value that is used when the fluid is unbound in the current dynamic environment.

(make-fluid value) –> fluid
(fluid fluid) –> value
(let-fluid fluid value thunk) –> value(s)
(let-fluids fluid₀ value₀ fluid₁ value₁ ...thunk) –> value(s)

Make-fluid returns a new fluid with value as its initial top-level value. Fluid returns fluid’s current value. Let-fluid calls thunk, with fluid bound to value until thunk returns. Using a continuation to throw out of the call to thunk causes fluid to revert to its original value, while throwing back in causes fluid to be rebound to value. Let-fluid returns the value(s) returned by thunk. Let-fluids is identical to let-fluid except that it binds an arbitrary number of fluids to new values.

(let* ((f (make-fluid 'a))
       (v0 (fluid f))
       (v1 (let-fluid f 'b
             (lambda ()
               (fluid f))))
       (v2 (fluid f)))
  (list v0 v1 v2))
  => '(a b a)

(let ((f (make-fluid 'a))
      (path '())
      (c #f))
  (let ((add (lambda ()
               (set! path (cons (fluid f) path)))))
    (add)
    (let-fluid f 'b
      (lambda ()
        (call-with-current-continuation
          (lambda (c0)
            (set! c c0)))
        (add)))
    (add)
    (if (< (length path) 5)
        (c)
        (reverse path))))
  => '(a b a b a)

5.15 OS strings

On common operating systems such as Unix and Windows, various parameters to OS functionality—such as file names, user names, command-line arguments etc.—appear as text in most contexts, but are really byte sequences: On Unix, the byte sequence may be interpreted as text through some locale-determined encoding. On Windows, such parameters are typically represented as sequences of UTF-16 code units. In both cases, not every such byte sequence has a string equivalent: On Unix, a byte sequence encoding a file name using Latin-1 often cannot be decoded using UTF-8. On Windows, unpaired UTF-16 surrogates are admissible in encodings, and no lossless text decoding for them exists.

For representing such string-like parameters, Scheme 48 uses an abstraction called OS strings. An OS string is created from either a string or a NUL-terminated byte sequence stored in a byte vector, and has an associated text codec (see section 6.6.1) that is able to convert from one representation to the other. The exact meaning of a NUL-terminated byte sequence is dependent on this text codec. However, only codecs for encodings that are a conservative extension of ASCII (such as ASCII itself, Latin-1, or UTF-8) should be used here, to allow a minimal set of portable file names. (The Windows port uses a special synthetic encoding called UTF-8of16 compatible with UTF-8 but capable of encoding even invalid UTF-16 internally, but uses the UTF-8 codec at the Scheme level.)

Most procedures accepting OS strings also accept strings or byte vectors, which are then used to construct a OS string. In the headers of the specifications of these procedures, such arguments occur as os-string-thing.The standard Scheme procedures such as open-input-file that take file names all accept os-string-thing arguments. OS strings are in the os-strings structure.

(string->os-string string) –> os-string
(byte-vector->os-string byte-vector) –> os-string
(x->os-string os-string-thing) –> os-string

These procedures create an OS string from a string, a byte-vector (whose last value should be 0), and an os-string-thing argument, respectively, always using the standard OS-string text codec (see below).

(os-string->byte-vector os-string) –> byte-vector
(os-string->string os-string) –> string

These procedures yield the contents of an OS string. For an OS string created from a string, os-string->string will return a string with the same contents; for an OS string created from a byte vector, os-string->byte-vector will return a byte vector with the same contents. For the other cases, data loss as determined by the text codec is possible.

(current-os-string-text-codec) –> text-codec
(call-with-os-string-text-codec text-codec thunk) –> value(s)

The current-os-string-text-codec returns the current text codec used for creating new OS strings. The initial default is determined by the operating system. (On Unix, this is the text codec determined by the locale. On Windows, this is UTF-8.) The call-with-os-string-text-codec procedure dynamically binds the current text codec to text-codec during the invocation of thunk.

5.16 Shell commands

Structure c-system-function provides access to the C system() function.

(have-system?) –> boolean
(system os-string-thing) –> integer

Have-system? returns true if the underlying C implementation has a command processor. (System string) passes string to the C system() function and returns the result.

(begin
  (system "echo foo > test-file")
  (call-with-input-file "test-file" read))
=> 'foo

5.17 Sockets

Structure sockets provides access to TCP/IP sockets for interprocess and network communication.

(open-socket) –> socket
(open-socket port-number) –> socket
(socket-port-number socket) –> integer
(close-socket socket)
(socket-accept socket) –> input-port output-port
(get-host-name) –> string

Open-socket creates a new socket. If no port-number is supplied the system picks one at random. Socket-port-number returns a socket’s port number. Close-socket closes a socket, preventing any further connections. Socket-accept accepts a single connection on socket, returning an input port and an output port for communicating with the client. If no client is waiting socket-accept blocks until one appears. Get-host-name returns the network name of the machine.

(socket-client host-name port-number) –> input-port output-port

Socket-client connects to the server at port-number on the machine named host-name. Socket-client blocks until the server accepts the connection.

The following simple example shows a server and client for a centralized UID service.

(define (id-server)
  (let ((socket (open-socket)))
    (display "Waiting on port ")
    (display (socket-port-number socket))
    (newline)
    (let loop ((next-id 0))
      (call-with-values
        (lambda ()
          (socket-accept socket))
        (lambda (in out)
          (display next-id out)
          (close-input-port in)
          (close-output-port out)
          (loop (+ next-id 1)))))))
         
(define (get-id machine port-number)
  (call-with-values
    (lambda ()
      (socket-client machine port-number))
    (lambda (in out)
      (let ((id (read in)))
        (close-input-port in)
        (close-output-port out)
        id))))

5.18 Macros for writing loops

Iterate and reduce are extensions of named-let for writing loops that walk down one or more sequences, such as the elements of a list or vector, the characters read from a port, or an arithmetic series. Additional sequences can be defined by the user. Iterate and reduce are in structure reduce.

5.18.1 `Iterate`

The syntax of iterate is:

  (iterate loop-name
           ((sequence-type element-variable sequence-data ...)
            ...)
           ((state-variable initial-value)
            ...)
    body-expression
    [final-expression])

Iterate steps the element-variables in parallel through the sequences, while each state-variable has the corresponding initial-value for the first iteration and have later values supplied by body-expression. If any sequence has reached its limit the value of the iterate expression is the value of final-expression, if present, or the current values of the state-variables, returned as multiple values. If no sequence has reached its limit, body-expression is evaluated and either calls loop-name with new values for the state-variables, or returns some other value(s).

The loop-name and the state-variables and initial-values behave exactly as in named-let. The named-let expression

  (let loop-name ((state-variable initial-value) ...)
    body ...)

is equivalent to an iterate expression with no sequences (and with an explicit let wrapped around the body expressions to take care of any internal defines):

  (iterate loop-name
           ()
           ((state-variable initial-value) ...)
    (let () body ...))

The sequence-types are keywords (they are actually macros of a particular form; it is easy to add additional types of sequences). Examples are list* which walks down the elements of a list and vector* which does the same for vectors. For each iteration, each element-variable is bound to the next element of the sequence. The sequence-data gives the actual list or vector or whatever.

If there is a final-expression, it is evaluated when the end of one or more sequences is reached. If the body-expression does not call loop-name the final-expression is not evaluated. The state-variables are visible in final-expression but the sequence-variables are not.

The body-expression and the final-expression are in tail-position within the iterate. Unlike named-let, the behavior of a non-tail-recursive call to loop-name is unspecified (because iterating down a sequence may involve side effects, such as reading characters from a port).

5.18.2 `Reduce`

If an iterate expression is not meant to terminate before a sequence has reached its end, body-expression will always end with a tail call to loop-name. Reduce is a macro that makes this common case explicit. The syntax of reduce is the same as that of iterate, except that there is no loop-name. The body-expression returns new values of the state-variables instead of passing them to loop-name. Thus body-expression must return as many values as there are state variables. By special dispensation, if there are no state variables then body-expression may return any number of values, all of which are ignored.

The syntax of reduce is:

  (reduce ((sequence-type element-variable sequence-data ...)
            ...)
           ((state-variable initial-value)
            ...)
    body-expression
    [final-expression])

The value(s) returned by an instance of reduce is the value(s) returned by the final-expression, if present, or the current value(s) of the state variables when the end of one or more sequences is reached.

A reduce expression can be rewritten as an equivalent iterate expression by adding a loop-var and a wrapper for the body-expression that calls the loop-var.

(iterate loop
         ((sequence-type element-variable sequence-data ...)
          ...)
         ((state-variable initial-value)
          ...)
  (call-with-values (lambda ()
                      body-expression)
                    loop)
  [final-expression])

5.18.3 Sequence types

The predefined sequence types are:

(list* elt-var list) (syntax)
(vector* elt-var vector) (syntax)
(string* elt-var string) (syntax)
(count* elt-var start [end [step]]) (syntax)
(input* elt-var input-port read-procedure) (syntax)
(stream* elt-var procedure initial-data) (syntax)

For lists, vectors, and strings the element variable is bound to the successive elements of the list or vector, or the characters in the string.

For count* the element variable is bound to the elements of the sequence

 start, start + step, start + 2step, ..., end

inclusive of start and exclusive of end. The default step is 1. The sequence does not terminate if no end is given or if there is no N > 0 such that end = start + Nstep (= is used to test for termination). For example, (count* i 0 -1) doesn’t terminate because it begins past the end value and (count* i 0 1 2) doesn’t terminate because it skips over the end value.

For input* the elements are the results of successive applications of read-procedure to input-port. The sequence ends when read-procedure returns an end-of-file object.

For a stream, the procedure takes the current data value as an argument and returns two values, the next value of the sequence and a new data value. If the new data is #f then the previous element was the last one. For example,

  (list* elt my-list)

is the same as

  (stream* elt list->stream my-list)

where list->stream is

  (lambda (list)
    (if (null? list)
        (values 'ignored #f)
        (values (car list) (cdr list))))

5.18.4 Synchronous sequences

When using the sequence types described above, a loop terminates when any of its sequences reaches its end. To help detect bugs it is useful to have sequence types that check to see if two or more sequences end on the same iteration. For this purpose there is second set of sequence types called synchronous sequences. These are identical to the ones listed above except that they cause an error to be signalled if a loop is terminated by a synchronous sequence and some other synchronous sequence did not reach its end on the same iteration.

Sequences are checked for termination in order, from left to right, and if a loop is terminated by a non-synchronous sequence no further checking is done.

The synchronous sequences are:

(list% elt-var list) (syntax)
(vector% elt-var vector) (syntax)
(string% elt-var string) (syntax)
(count% elt-var start end [step]) (syntax)
(input% elt-var input-port read-procedure) (syntax)
(stream% elt-var procedure initial-data) (syntax)

Note that the synchronous count% must have an end, unlike the nonsynchronous count%.

5.18.5 Examples

Gathering the indexes of list elements that answer true to some predicate.

(lambda (my-list predicate)
  (reduce ((list* elt my-list)
           (count* i 0))
          ((hits '()))
    (if (predicate elt)
        (cons i hits)
        hits)
    (reverse hits))

Looking for the index of an element of a list.

(lambda (my-list predicate)
  (iterate loop
           ((list* elt my-list)
            (count* i 0))
           ()                                ; no state
    (if (predicate elt)
        i
        (loop))))

Reading one line.

(define (read-line port)
  (iterate loop
           ((input* c port read-char))
           ((chars '()))
    (if (char=? c #\newline)
        (list->string (reverse chars))
        (loop (cons c chars)))
    (if (null? chars)
        (eof-object)
        ; no newline at end of file
        (list->string (reverse chars)))))

Counting the lines in a file. We can’t use count* because we need the value of the count after the loop has finished.

(define (line-count name)
  (call-with-input-file name
    (lambda (in)
      (reduce ((input* l in read-line))
              ((i 0))
        (+ i 1)))))

5.18.6 Defining sequence types

The sequence types are object-oriented macros similar to enumerations. A non-synchronous sequence macro needs to supply three values: #f to indicate that it isn’t synchronous, a list of state variables and their initializers, and the code for one iteration. The first two methods are CPS’ed: they take another macro and argument to which to pass their result. The synchronized? method gets no additional arguments. The state-vars method is passed a list of names which will be bound to the arguments to the sequence. The final method, for the step, is passed the list of names bound to the arguments and the list of state variables. In addition there is a variable to be bound to the next element of the sequence, the body expression for the loop, and an expression for terminating the loop.

The definition of list* is

(define-syntax list*
  (syntax-rules (synchronized? state-vars step)
    ((list* synchronized? (next more))
     (next #f more))
    ((list* state-vars (start-list) (next more))
     (next ((list-var start-list)) more))
    ((list* step (start-list) (list-var)
            value-var loop-body final-exp)
     (if (null? list-var)
         final-exp
         (let ((value-var (car list-var))
               (list-var (cdr list-var)))
           loop-body)))))

Synchronized sequences are the same, except that they need to provide a termination test to be used when some other synchronized method terminates the loop.

(define-syntax list%
  (syntax-rules (sync done)
    ((list% sync (next more))
     (next #t more))
    ((list% done (start-list) (list-var))
     (null? list-var))
    ((list% stuff ...)
     (list* stuff ...))))

5.18.7 Expanded code

The expansion of

  (reduce ((list* x '(1 2 3)))
          ((r '()))
    (cons x r))

  (let ((final (lambda (r) (values r)))
        (list '(1 2 3))
        (r '()))
    (let loop ((list list) (r r))
      (if (null? list)
          (final r)
          (let ((x (car list))
                (list (cdr list)))
            (let ((continue (lambda (r)
                              (loop list r))))
              (continue (cons x r)))))))

The only inefficiencies in this code are the final and continue procedures, both of which could be substituted in-line. The macro expander could do the substitution for continue when there is no explicit proceed variable, as in this case, but not in general.

5.19 Sorting lists and vectors

(This section, as the libraries it describes, was written mostly by Olin Shivers for the draft of SRFI 32.)

The sort libraries in Scheme 48 include

vector insert sort (stable)
vector heap sort
vector merge sort (stable)
pure and destructive list merge sort (stable)
stable vector and list merge
miscellaneous sort-related procedures: vector and list merging, sorted predicates, vector binary search, vector and list delete-equal-neighbor procedures.
a general, non-algorithmic set of procedure names for general sorting and merging

5.19.1 Design rules

What vs. how

There are two different interfaces: “what” (simple) and “how” (detailed).


Simple: you specify semantics: datatype (list or vector), mutability, and stability.
Detailed: you specify the actual algorithm (quick, heap, insert, merge). Different algorithms have different properties, both semantic and pragmatic, so these exports are necessary.
It is necessarily the case that the specifications of these procedures make statements about execution “pragmatics.” For example, the sole distinction between heap sort and quick sort—both of which are provided by this library—-is one of execution time, which is not a “semantic” distinction. Similar resource-use statements are made about “iterative” procedures, meaning that they can execute on input of arbitrary size in a constant number of stack frames.

Consistency across procedure signatures

The two interfaces share common procedure signatures wherever possible, to facilitate switching a given call from one procedure to another.

Less-than parameter first, data parameter after

These procedures uniformly observe the following parameter order: the data to be sorted comes after the comparison procedure. That is, we write

  (sort < list)

not

  (sort list <)

Ordering, comparison procedures and stability

These routines take a < comparison procedure, not a ≤ comparison procedure, and they sort into increasing order. The difference between a < spec and a ≤ spec comes up in two places:

the definition of an ordered or sorted data set, and
the definition of a stable sorting algorithm.

We say that a data set (a list or vector) is sorted or ordered if it contains no adjacent pair of values ... x, y ... such that y < x.

In other words, scanning across the data never takes a “downwards” step.

If you use a ≤ procedure where these algorithms expect a < procedure, you may not get the answers you expect. For example, the list-sorted? procedure will return false if you pass it a ≤ comparison procedure and an ordered list containing adjacent equal elements.

A “stable” sort is one that preserves the pre-existing order of equal elements. Suppose, for example, that we sort a list of numbers by comparing their absolute values, i.e., using comparison procedure

(lambda (x y) (< (abs x) (abs y)))

If we sort a list that contains both 3 and -3:

... 3, ..., - 3 ...

then a stable sort is an algorithm that will not swap the order of these two elements, that is, the answer is guaranteed to to look like

... 3, - 3 ...

not

... - 3, 3 ...

Choosing < for the comparison procedure instead of ≤ affects how stability is coded. Given an adjacent pair x, y, (< y x) means “x should be moved in front of x”—otherwise, leave things as they are. So using a ≤ procedure where a < procedure is expected will invert stability.

This is due to the definition of equality, given a < comparator:

    (and (not (< x y))
         (not (< y x)))

The definition is rather different, given a ≤ comparator:

    (and (<= x y)
         (<= y x))

A “stable” merge is one that reliably favors one of its data sets when equal items appear in both data sets. All merge operations in this library are stable, breaking ties between data sets in favor of the first data set—elements of the first list come before equal elements in the second list.

So, if we are merging two lists of numbers ordered by absolute value, the stable merge operation list-merge

    (list-merge (lambda (x y) (< (abs x) (abs y)))
                '(0 -2 4 8 -10) '(-1 3 -4 7))

reliably places the 4 of the first list before the equal-comparing -4 of the second list:

    (0 -1 -2 4 -4 7 8 -10)

Some sort algorithms will not work correctly if given a ≤ when they expect a < comparison (or vice-versa).

In short, if your comparison procedure f answers true to (f x x), then

using a stable sorting or merging algorithm will not give you a stable sort or merge,
list-sorted? may surprise you.

Note that you can synthesize a < procedure from a ≤ procedure with

    (lambda (x y) (not (<= y x)))

if need be.

Precise definitions give sharp edges to tools, but require care in use. “Measure twice, cut once.”

All vector operations accept optional subrange parameters

The vector operations specified below all take optional start/end arguments indicating a selected subrange of a vector’s elements. If a start parameter or start/end parameter pair is given to such a procedure, they must be exact, non-negative integers, such that

0 ≤ start ≤ end ≤ (vector-length vector)

where vector is the related vector parameter. If not specified, they default to 0 and the length of the vector, respectively. They are interpreted to select the range [start,end), that is, all elements from index start (inclusive) up to, but not including, index end.

Required vs. allowed side-effects

List-sort! and List-stable-sort! are allowed, but not required, to alter their arguments’ cons cells to construct the result list. This is consistent with the what-not-how character of the group of procedures to which they belong (the sorting structure).

The list-delete-neighbor-dups!, list-merge! and list-merge-sort! procedures, on the other hand, provide specific algorithms, and, as such, explicitly commit to the use of side-effects on their input lists in order to guarantee their key algorithmic properties (e.g., linear-time operation).

5.19.2 Procedure specification

Structure name	Functionality
`sorting`	General sorting for lists and vectors
`sorted`	Sorted predicates for lists and vectors
`list-merge-sort`	List merge sort
`vector-merge-sort`	Vector merge sort
`vector-heap-sort`	Vector heap sort
`vector-insert-sort`	Vector insertion sort
`delete-neighbor-duplicates`	List and vector delete neighbor duplicates
`binary-searches`	Vector binary search

Note that there is no “list insert sort” package, as you might as well always use list merge sort. The reference implementation’s destructive list merge sort will do fewer set-cdr!s than a destructive insert sort.

Procedure naming and functionality

Almost all of the procedures described below are variants of two basic operations: sorting and merging. These procedures are consistently named by composing a set of basic lexemes to indicate what they do.

Lexeme	Meaning
`sort`	The procedure sorts its input data set by some < comparison procedure.
`merge`	The procedure merges two ordered data sets into a single ordered result.
`stable`	This lexeme indicates that the sort is a stable one.
`vector`	The procedure operates upon vectors.
`list`	The procedure operates upon lists.
`!`	Procedures that end in `!` are allowed, and sometimes required, to reuse their input storage to construct their answer.

Types of parameters and return values

In the procedures specified below,

A < or = parameter is a procedure accepting two arguments taken from the specified procedure’s data set(s), and returning a boolean;
Start and end parameters are exact, non-negative integers that serve as vector indices selecting a subrange of some associated vector. When specified, they must satisfy the relation

0 ≤ start ≤ end ≤ (vector-length vector)

where vector is the associated vector.

Passing values to procedures with these parameters that do not satisfy these types is an error.

If a procedure is said to return “unspecified,” this means that nothing at all is said about what the procedure returns, not even the number of return values. Such a procedure is not even required to be consistent from call to call in the nature or number of its return values. It is simply required to return a value (or values) that may be passed to a command continuation, e.g. as the value of an expression appearing as a non-terminal subform of a begin expression. Note that in R⁵RS, this restricts such a procedure to returning a single value; non-R⁵RS systems may not even provide this restriction.

5.19.2.1 `sorting`—general sorting package

This library provides basic sorting and merging functionality suitable for general programming. The procedures are named by their semantic properties, i.e., what they do to the data (sort, stable sort, merge, and so forth).

(list-sorted? < list) –> boolean
(list-merge < list₁ list₂) –> list
(list-merge! < list₁ list₂) –> list
(list-sort < lis) –> list
(list-sort! < lis) –> list
(list-stable-sort < list) –> list
(list-stable-sort! < list) –> list
(list-delete-neighbor-dups = list) –> list
(vector-sorted? < v [start [end]]) –> boolean
(vector-merge < v₁ v₂ [start1 [end1 [start2 [end2]]]]) –> vector
(vector-merge! < v v₁ v₂ [start [start1 [end1 [start2 [end2]]]]])
(vector-sort < v [start [end]]) –> vector
(vector-sort! < v [start [end]])
(vector-stable-sort < v [start [end]]) –> vector
(vector-stable-sort! < v [start [end]])
(vector-delete-neighbor-dups = v [start [end]]) –> vector

Procedure	Suggested algorithm
`list-sort`	vector heap or quick
`list-sort!`	list merge sort
`list-stable-sort`	vector merge sort
`list-stable-sort!`	list merge sort
`vector-sort`	heap or quick sort
`vector-sort!` or quick sort
`vector-stable-sort`	vector merge sort
`vector-stable-sort!` merge sort

List-Sorted? and vector-sorted? return true if their input list or vector is in sorted order, as determined by their < comparison parameter.

All four merge operations are stable: an element of the initial list list₁ or vector vector₁ will come before an equal-comparing element in the second list list₂ or vector vector₂ in the result.

The procedures

list-merge
list-sort
list-stable-sort
list-delete-neighbor-dups

do not alter their inputs and are allowed to return a value that shares a common tail with a list argument.

The procedure

list-sort!
list-stable-sort!

are “linear update” operators—they are allowed, but not required, to alter the cons cells of their arguments to produce their results.

On the other hand, the list-merge! procedure make only a single, iterative, linear-time pass over its argument list, using set-cdr!s to rearrange the cells of the list into the final result —it works “in place.” Hence, any cons cell appearing in the result must have originally appeared in an input. The intent of this iterative-algorithm commitment is to allow the programmer to be sure that if, for example, list-merge! is asked to merge two ten-million-element lists, the operation will complete without performing some extremely (possibly twenty-million) deep recursion.

The vector procedures

vector-sort
vector-stable-sort
vector-delete-neighbor-dups

do not alter their inputs, but allocate a fresh vector for their result, of length end - start.

The vector procedures

vector-sort!
vector-stable-sort!

sort their data in-place. (But note that vector-stable-sort! may allocate temporary storage proportional to the size of the input .)

Vector-merge returns a vector of length (end₁ - start₁ + (end₂ - start₂).

Vector-merge! writes its result into vector v, beginning at index start, for indices less than end = start + (end₁ - start₁) + (end₂ - start₂). The target subvector v[start,end) may not overlap either source subvector vector₁[start₁,end₁) vector₂[start₂,end₂).

The ...-delete-neighbor-dups-... procedures: These procedures delete adjacent duplicate elements from a list or a vector, using a given element-equality procedure. The first/leftmost element of a run of equal elements is the one that survives. The list or vector is not otherwise disordered.

These procedures are linear time—much faster than the O(n²) general duplicate-element deletors that do not assume any “bunching” of elements (such as the ones provided by SRFI 1). If you want to delete duplicate elements from a large list or vector, you can sort the elements to bring equal items together, then use one of these procedures, for a total time of O(nlog(n)).

The comparison procedure = passed to these procedures is always applied ( = x y) where x comes before y in the containing list or vector.

List-delete-neighbor-dups does not alter its input list; its answer may share storage with the input list.
Vector-delete-neighbor-dups does not alter its input vector, but rather allocates a fresh vector to hold the result.

Examples:

(list-delete-neighbor-dups = '(1 1 2 7 7 7 0 -2 -2))
  ===⇒ (1 2 7 0 -2)

(vector-delete-neighbor-dups = '#(1 1 2 7 7 7 0 -2 -2))
  ===⇒ #(1 2 7 0 -2)

(vector-delete-neighbor-dups = '#(1 1 2 7 7 7 0 -2 -2) 3 7)
  ===⇒ #(7 0 -2)

5.19.2.2 Algorithm-specific sorting packages

These packages provide more specific sorting functionality, that is, specific committment to particular algorithms that have particular pragmatic consequences (such as memory locality, asymptotic running time) beyond their semantic behaviour (sorting, stable sorting, merging, etc.). Programmers that need a particular algorithm can use one of these packages.

`sorted`—sorted predicates

(list-sorted? < list) –> boolean
(vector-sorted? < vector) –> boolean
(vector-sorted? < vector start) –> boolean
(vector-sorted? < vector start end) –> boolean

Return #f iff there is an adjacent pair ... x, y ... in the input list or vector such that y < x. The optional start/end range arguments restrict vector-sorted? to the indicated subvector.

`list-merge-sort`—list merge sort

(list-merge-sort < list) –> list
(list-merge-sort! < list) –> list
(list-merge list₁ < list₂) –> list
(list-merge! list₁ < list₂) –> list

The sort procedures sort their data using a list merge sort, which is stable. (The reference implementation is, additionally, a “natural” sort. See below for the properties of this algorithm.)

The ! procedures are destructive—they use set-cdr!s to rearrange the cells of the lists into the proper order. As such, they do not allocate any extra cons cells—they are “in place” sorts.

The merge operations are stable: an element of list₁ will come before an equal-comparing element in list₂ in the result list.

`vector-merge-sort`—vector merge sort

(vector-merge-sort < vector [start [end [temp]]]) –> vector
(vector-merge-sort! < vector [start [end [temp]]])
(vector-merge < vector₁ vector₂ [start₁ [end₁ [start₂ [end₂]]]]) –> vector
(vector-merge! < vector vector₁ vector₂ [start [start₁ [end₁ [start₂ [end₂]]]]])

The sort procedures sort their data using vector merge sort, which is stable. (The reference implementation is, additionally, a “natural” sort. See below for the properties of this algorithm.)

The optional start/end arguments provide for sorting of subranges, and default to 0 and the length of the corresponding vector.

Merge-sorting a vector requires the allocation of a temporary “scratch” work vector for the duration of the sort. This scratch vector can be passed in by the client as the optional temp argument; if so, the supplied vector must be of size ≤ end, and will not be altered outside the range [start,end). If not supplied, the sort routines allocate one themselves.

The merge operations are stable: an element of vector₁ will come before an equal-comparing element in vector₂ in the result vector.

Vector-merge-sort! leaves its result in vector[start,end).
Vector-merge-sort returns a vector of length end - start.
Vector-merge returns a vector of length (end₁ - start₁) + (end₂ - start₂).
Vector-merge! writes its result into vector, beginning at index start, for indices less than end = start + (end₁ - start₁) + (end₂ - start₂). The target subvector

vector[start,end)

may not overlap either source subvector

vector₁[start₁,end₁), or vector₂[start₂,end₂).

`vector-heap-sort`—vector heap sort

(vector-heap-sort < vector [start [end]]) –> vector
(vector-heap-sort! < vector [start [end]])

These procedures sort their data using heap sort, which is not a stable sorting algorithm.

Vector-heap-sort returns a vector of length end - start. Vector-heap-sort! is in-place, leaving its result in vector[start,end).

`vector-insert-sort`—vector insertion sort

(vector-insert-sort < vector [start [end]]) –> vector
(vector-insert-sort! < vector [start [end]])

These procedures stably sort their data using insertion sort.

Vector-insert-sort returns a vector of length end - start.
Vector-insert-sort! is in-place, leaving its result in vector[start,end).

`delete-neighbor-duplicates`—list and vector delete neighbor duplicates

(list-delete-neighbor-dups = list) –> list
(list-delete-neighbor-dups! = list) –> list
(vector-delete-neighbor-dups = vector [start [end]]) –> vector
(vector-delete-neighbor-dups! = vector [start [end]]) –> end′

These procedures delete adjacent duplicate elements from a list or a vector, using a given element-equality procedure = . The first/leftmost element of a run of equal elements is the one that survives. The list or vector is not otherwise disordered.

The comparison procedure = passed to these procedures is always applied

( =  x y)

where x comes before y in the containing list or vector.

List-delete-neighbor-dups does not alter its input list; its answer may share storage with the input list.
Vector-delete-neighbor-dups does not alter its input vector, but rather allocates a fresh vector to hold the result.
List-delete-neighbor-dups! is permitted, but not required, to mutate its input list in order to construct its answer.
Vector-delete-neighbor-dups! reuses its input vector to hold the answer, packing its answer into the index range [start,end′), where end′ is the non-negative exact integer returned as its value. It returns end′ as its result. The vector is not altered outside the range [start,end′).

Examples:

(list-delete-neighbor-dups = '(1 1 2 7 7 7 0 -2 -2))
  ===⇒ (1 2 7 0 -2)

(vector-delete-neighbor-dups = '#(1 1 2 7 7 7 0 -2 -2))
  ===⇒ #(1 2 7 0 -2)

(vector-delete-neighbor-dups = '#(1 1 2 7 7 7 0 -2 -2) 3 7)
  ===⇒ #(7 0 -2)

;; Result left in v[3,9):
(let ((v (vector 0 0 0 1 1 2 2 3 3 4 4 5 5 6 6)))
  (cons (vector-delete-neighbor-dups! = v 3)
        v))
   ===⇒ (9 . #(0 0 0 1 2 3 4 5 6 4 4 5 5 6 6))

`binary-searches`—vector binary search

(vector-binary-search < elt->key key vector [start [end]]) –> integer or #f
(vector-binary-search3 compare-proc vector [start [end]]) –> integer or #f

vector-binary-search searches vector in range [start,end) (which default to 0 and the length of vector, respectively) for an element whose associated key is equal to key. The procedure elt->key is used to map an element to its associated key. The elements of the vector are assumed to be ordered by the < relation on these keys. That is,

(vector-sorted? (lambda (x y) (< (elt->key x) (elt->key y)))
                vector start end) ===⇒ true

An element e of vector is a match for key if it’s neither less nor greater than the key:

(and (not (< (elt->key e) key))
     (not (< key (elt->key e))))

If there is such an element, the procedure returns its index in the vector as an exact integer. If there is no such element in the searched range, the procedure returns false.

(vector-binary-search < car 4 '#((1 . one) (3 . three)
                                 (4 . four) (25 . twenty-five)))
===⇒ 2

(vector-binary-search < car 7 '#((1 . one) (3 . three)
                                 (4 . four) (25 . twenty-five)))
===⇒ #f

Vector-binary-search3 is a variant that uses a three-way comparison procedure compare-proc. Compare-proc compares its parameter to the search key, and returns an exact integer whose sign indicates its relationship to the search key.

arrayrclcrcl (compare-proc x) &<& 0& ⇒& x &<& search-key
(compare-proc x) & = & 0& ⇒& x & = & search-key
(compare-proc x) &>& 0& ⇒& x &>& search-key endarray

(vector-binary-search3 (lambda (elt) (- (car elt) 4))
                       '#((1 . one) (3 . three)
                          (4 . four) (25 . twenty-five)))
===⇒ 2

5.19.3 Algorithmic properties

Different sort and merge algorithms have different properties. Choose the algorithm that matches your needs:

Vector insert sort

Stable, but only suitable for small vectors—O(n²).

Vector heap sort

Not stable. Guaranteed fast—O(nlog(n)) worst case. Poor locality on large vectors. A very reliable workhorse.

Vector merge sort

Stable. Not in-place—requires a temporary buffer of equal size. Fast—O(nlog(n))—and has good memory locality for large vectors.

The implementation of vector merge sort provided by this implementation is, additionally, a “natural” sort, meaning that it exploits existing order in the input data, providing O(n) best case.

Destructive list merge sort

Stable, fast and in-place (i.e., allocates no new cons cells). “Fast” means O(nlog(n)) worse-case, and substantially better if the data is already mostly ordered, all the way down to linear time for a completely-ordered input list (i.e., it is a “natural” sort).

Note that sorting lists involves chasing pointers through memory, which can be a loser on modern machine architectures because of poor cache and page locality. Sorting vectors has inherently better locality.

This implementation’s destructive list merge and merge sort implementations are opportunistic—they avoid redundant set-cdr!s, and try to take long already-ordered runs of list structure as-is when doing the merges.

Pure list merge sort

Stable and fast—O(nlog(n)) worst-case, and possibly O(n), depending upon the input list (see discussion above).

Algorithm	Stable?	Worst case	Average case	In-place
Vector insert	Yes	O(n²)	O(n²)	Yes
Vector quick	No	O(n²)	O(nlog(n))	Yes
Vector heap	No	O(nlog(n))	O(nlog(n))	Yes
Vector merge	Yes	O(nlog(n))	O(nlog(n))	No
List merge	Yes	O(nlog(n))	O(nlog(n))	Either

5.20 Regular expressions

This section describes a functional interface for building regular expressions and matching them against strings. The matching is done using the POSIX regular expression package. Regular expressions are in the structure regexps.

A regular expression is either a character set, which matches any character in the set, or a composite expression containing one or more subexpressions. A regular expression can be matched against a string to determine success or failure, and to determine the substrings matched by particular subexpressions.

(regexp? value) –> boolean

Returns #t if value is a regular expression created using the functional interface for regular expressions, and #f otherwise.

5.20.1 Character sets

Character sets may be defined using a list of characters and strings, using a range or ranges of characters, or by using set operations on existing character sets.

(set character-or-string ...) –> char-set
(range low-char high-char) –> char-set
(ranges low-char high-char ...) –> char-set
(ascii-range low-char high-char) –> char-set
(ascii-ranges low-char high-char ...) –> char-set

Set returns a set that contains the character arguments and the characters in any string arguments. Range returns a character set that contain all characters between low-char and high-char, inclusive. Ranges returns a set that contains all characters in the given ranges. Range and ranges use the ordering induced by char->integer. Ascii-range and ascii-ranges use the ASCII ordering. It is an error for a high-char to be less than the preceding low-char in the appropriate ordering.

(negate char-set) –> char-set
(intersection char-set char-set) –> char-set
(union char-set char-set) –> char-set
(subtract char-set char-set) –> char-set

These perform the indicated operations on character sets.

The following character sets are predefined:

`lower-case`	`(set "abcdefghijklmnopqrstuvwxyz")`
`upper-case`	`(set "ABCDEFGHIJKLMNOPQRSTUVWXYZ")`
`alphabetic`	`(union lower-case upper-case)`
`numeric`	`(set "0123456789")`
`alphanumeric`	`(union alphabetic numeric)`
`punctuation`	`(set "`!\"#$%&'()*+,-./:;<=>?@[\\]^_`{\|}~`")`
`graphic`	`(union alphanumeric punctuation)`
`printing`	`(union graphic (set #\space))`
`control`	`(negate printing)`
`blank`	`(set #\space (ascii->char 9))` ; 9 is tab
`whitespace`	`(union (set #\space) (ascii-range 9 13))`
`hexdigit`	`(set "0123456789abcdefABCDEF")`

The above are taken from the default locale in POSIX. The characters in whitespace are space, tab, newline (= line feed), vertical tab, form feed, and carriage return.

5.20.2 Anchoring

(string-start) –> reg-exp
(string-end) –> reg-exp

String-start returns a regular expression that matches the beginning of the string being matched against; string-end returns one that matches the end.

5.20.3 Composite expressions

(sequence reg-exp ...) –> reg-exp
(one-of reg-exp ...) –> reg-exp

Sequence matches the concatenation of its arguments, one-of matches any one of its arguments.

(text string) –> reg-exp

Text returns a regular expression that matches the characters in string, in order.

(repeat reg-exp) –> reg-exp
(repeat count reg-exp) –> reg-exp
(repeat min max reg-exp) –> reg-exp

Repeat returns a regular expression that matches zero or more occurences of its reg-exp argument. With no count the result will match any number of times (reg-exp*). With a single count the returned expression will match reg-exp exactly that number of times. The final case will match from min to max repetitions, inclusive. Max may be #f, in which case there is no maximum number of matches. Count and min should be exact, non-negative integers; max should either be an exact non-negative integer or #f.

5.20.4 Case sensitivity

Regular expressions are normally case-sensitive.

(ignore-case reg-exp) –> reg-exp
(use-case reg-exp) –> reg-exp

The value returned by ignore-case is identical its argument except that case will be ignored when matching. The value returned by use-case is protected from future applications of ignore-case. The expressions returned by use-case and ignore-case are unaffected by later uses of the these procedures. By way of example, the following matches "ab" but not "aB", "Ab", or "AB".

(text "ab")

while

(ignore-case (test "ab"))

matches "ab", "aB", "Ab", and "AB" and

(ignore-case (sequence (text "a")
                       (use-case (text "b"))))

matches "ab" and "Ab" but not "aB" or "AB".

5.20.5 Submatches and matching

A subexpression within a larger expression can be marked as a submatch. When an expression is matched against a string, the success or failure of each submatch within that expression is reported, as well as the location of the substring matched be each successful submatch.

(submatch key reg-exp) –> reg-exp
(no-submatches reg-exp) –> reg-exp

Submatch returns a regular expression that matches its argument and causes the result of matching its argument to be reported by the match procedure. Key is used to indicate the result of this particular submatch in the alist of successful submatches returned by match. Any value may be used as a key. No-submatches returns an expression identical to its argument, except that all submatches have been elided.

(any-match? reg-exp string) –> boolean
(exact-match? reg-exp string) –> boolean
(match reg-exp string) –> match or #f
(match-start match) –> index
(match-end match) –> index
(match-submatches match) –> alist

Any-match? returns #t if string matches reg-exp or contains a substring that does, and #f otherwise. Exact-match? returns #t if string matches reg-exp and #f otherwise.

Match returns #f if reg-exp does not match string and a match record if it does match. A match record contains three values: the beginning and end of the substring that matched the pattern and an a-list of submatch keys and corresponding match records for any submatches that also matched. Match-start returns the index of the first character in the matching substring and match-end gives index of the first character after the matching substring. Match-submatches returns an alist of submatch keys and match records. Only the top match record returned by match has a submatch alist.

Matching occurs according to POSIX. The match returned is the one with the lowest starting index in string. If there is more than one such match, the longest is returned. Within that match the longest possible submatches are returned.

All three matching procedures cache a compiled version of reg-exp. Subsequent calls with the same reg-exp will be more efficient.

The C interface to the POSIX regular expression code uses ASCII nul as an end-of-string marker. The matching procedures will ignore any characters following an embedded ASCII nuls in string.

(define pattern (text "abc"))
(any-match? pattern "abc")         => #t
(any-match? pattern "abx")         => #f
(any-match? pattern "xxabcxx")     => #t

(exact-match? pattern "abc")       => #t
(exact-match? pattern "abx")       => #f
(exact-match? pattern "xxabcxx")   => #f

(match pattern "abc")              => (#{match 0 3})
(match pattern "abx")              => #f
(match pattern "xxabcxx")          => (#{match 2 5})

(let ((x (match (sequence (text "ab")
                          (submatch 'foo (text "cd"))
                          (text "ef"))
                "xxxabcdefxx")))
  (list x (match-submatches x)))
  => (#{match 3 9} ((foo . #{match 5 7}))

(match-submatches
  (match (sequence
           (set "a")
           (one-of (submatch 'foo (text "bc"))
                   (submatch 'bar (text "BC"))))
         "xxxaBCd"))
  => ((bar . #{match 4 6}))

5.21 SRFIs

‘SRFI’ stands for ‘Scheme Request For Implementation’. An SRFI is a description of an extension to standard Scheme. Draft and final SRFI documents, a FAQ, and other information about SRFIs can be found at the SRFI web site.

Scheme 48 includes implementations of the following (final) SRFIs:

SRFI 1 – List Library
SRFI 2 – and-let*
SRFI 4 – Homogeneous numeric vector datatypes (see note below)
SRFI 5 – let with signatures and rest arguments
SRFI 6 – Basic string ports
SRFI 7 – Program configuration
SRFI 8 – receive
SRFI 9 – Defining record types
SRFI 11 – Syntax for receiving multiple values
SRFI 13 – String Library
SRFI 14 – Character-Set Library (see note below)
SRFI 16 – Syntax for procedures of variable arity
SRFI 17 – Generalized set!
SRFI 19 – Time Data Types and Procedures
SRFI 22 – Running Scheme Scripts on Unix
SRFI 23 – Error reporting mechanism
SRFI 25 – Multi-dimensional Array Primitives
SRFI 26 – Notation for Specializing Parameters without Currying
SRFI 27 – Sources of Random Bits
SRFI 28 – Basic Format Strings
SRFI 31 – A special form rec for recursive evaluation
SRFI 34 – Exception Handling for Programs
SRFI 35 – Conditions
SRFI 36 – I/O Conditions
SRFI 37 – args-fold: a program argument processor
SRFI 40 – A Library of Streams
SRFI 42 – Eager Comprehensions
SRFI 43 – Vector library
SRFI 45 – Primitives for Expressing Iterative Lazy Algorithms
SRFI 60 – Integers as Bits
SRFI 61 – A more general cond clause
SRFI 63 – Homogeneous and Heterogeneous Arrays
SRFI 66 – Octet Vectors
SRFI 67 – Compare Procedures
SRFI 74 – Octet-Addressed Binary Blocks
SRFI 78 – Lightweight testing

Documentation on these can be found at the web site mentioned above.

SRFI 4 specifies an external representation for homogeneous numeric vectors that is incompatible with R⁵RS. The Scheme 48 version of SRFI 4 does not support this external representation.

SRFI 14 includes the procedure ->char-set which is not a standard Scheme identifier (in R⁵RS the only required identifier starting with - is - itself). In the Scheme 48 version of SRFI 14 we have renamed ->char-set as x->char-set.

With the exception of SRFI 62 (which is supported by default), the SRFI bindings can be accessed either by opening the appropriate structure (the structure srfi-n contains SRFI n) or by loading structure srfi-7 and then using the ,load-srfi-7-program command to load an SRFI 7-style program. The syntax for the command is

,load-srfi-7-program name filename

This creates a new structure and associated package, binds the structure to name in the configuration package, and then loads the program found in filename into the package.

As an example, if the file test.scm contains

(program (code (define x 10)))

this program can be loaded as follows:

> ,load-package srfi-7
> ,load-srfi-7-program test test.scm
[test]
> ,in test
test> x
10
test>