Libraries
Use the ,open command (section 3.4) or the module language (chapter 2.6) to open the structures described below.
5.1 General utilities
These are in the big-util structure.
(atom? x) is the same as (not (pair? x)).
Returns true for the empty list, false for a pair, and signals an error otherwise.
(neq? x y) is the same as (not (eq? x y)).
(n= x y) is the same as (not (= x y)).
These both just return their argument. No-op is guaranteed not to be compiled in-line, identity may be.
Returns true if value is in list, false otherwise.
Returns true if predicate is true for any element of list.
Returns true if predicate is true for every element of list.
Any returns some element of list for which predicate is true, or false if there are none. First does the same except that it returns the first element for which predicate is true.
Returns a list containing all of the elements of list for which predicate is true. The order of the elements is preserved. Filter! may reuse the storage of list.
The same as filter except the returned list contains the results of applying procedure instead of elements of list. (filter-map p l) is the same as (filter identity (map p l)).
The first return value contains those elements list for which predicate is true, the second contains the remaining elements. The order of the elements is preserved. Partition-list! may reuse the storage of the list.
Returns its argument with all duplicate elements removed. The first instance of each element is preserved.
All three of these return list with some elements removed. Delq removes all elements eq? to value. Delq! does the same and may modify the list argument. Delete removes all elements for which predicate is true. Both delq and delete may reuse some of the storage in the list argument, but won't modify it.
Destructively reverses list.
Returns the symbol whose name is produced by concatenating the displayed representations of value ....
(concatenate-symbol 'abc "-" 4) ===> 'abc-4
5.2 Pretty-printing
These are in the pp structure.
Pretty-print value The current output port is used if no port is specified. Position is the starting offset. Value will be pretty-printed to the right of this column.
5.3 ASCII character encoding
These are in the structure ascii.
These are identical to char->integer and integer->char except that they use the ASCII encoding (appendix A).
Ascii-limit is one more than the largest value that char->ascii may return. Ascii-whitespaces is a list of the ASCII values of whitespace characters (space, horizontal tab, line feed (= newline), vertical tab, form feed, and carriage return).
5.4 Bitwise integer operations
These functions use the two's-complement representation for integers. There is no limit to the number of bits in an integer. They are in the structures bitwise and big-scheme.
These perform various logical operations on integers on a bit-by-bit basis. `ior' is inclusive OR and `xor' is exclusive OR.
Shifts the integer by the given bit count, which must be an integer, shifting left for positive counts and right for negative ones. Shifting preserves the integer's sign.
Counts the number of bits set in the integer. If the argument is negative a bitwise NOT operation is performed before counting.
5.5 Byte vectors
These are homogeneous vectors of small integers (0 < i < 255). The functions that operate on them are analogous to those for vectors. They are in the structure byte-vectors.
5.6 Sparse vectors
These are vectors that grow as large as they need to. That is, they can be indexed by arbitrarily large nonnegative integers. The implementation allows for arbitrarily large gaps by arranging the entries in a tree. They are in the structure sparse-vectors.
Make-sparse-vector, sparse-vector-ref, and sparse-vector-set! are analogous to make-vector, vector-ref, and vector-set!, except that the indices passed to sparse-vector-ref and sparse-vector-set! can be arbitrarily large. For indices whose elements have not been set in a sparse vector, sparse-vector-ref returns #f.
Sparse-vector->list is for debugging: It returns a list of the consecutive elements in a sparse vector from 0 to the highest element that has been set. Note that the list will also include all the #f elements for the unset elements.
5.7 Cells
These hold a single value and are useful when a simple indirection is required. The system uses these to hold the values of lexical variables that may be set!.
5.8 Queues
These are ordinary first-in, first-out queues. The procedures are in structure queues.
Make-queue creates an empty queue, queue? is a predicate for identifying queues, queue-empty? tells you if a queue is empty, enqueue! and dequeue! add and remove values.
Queue-length returns the number of values in queue. Queue->list returns the values in queue as a list, in the order in which the values were added. List->queue returns a queue containing values, preserving their order. Delete-from-queue removes the first instance of value from queue, using eq? for comparisons. Delete-from-queue returns #t if value is found and #f if it is not.
5.9 Arrays
These provide N-dimensional, zero-based arrays and are in the structure arrays. The array interface is derived from one invented by Alan Bawden.
Make-array makes a new array with the given dimensions, each of which must be a non-negative integer. Every element is initially set to value. Array Returns a new array with the given dimensions and elements. Dimensions must be a list of non-negative integers, The number of elements should be the equal to the product of the dimensions. The elements are stored in row-major order.
(make-array 'a 2 3)=>
{Array 2 3} (array '(2 3) 'a 'b 'c 'd 'e 'f)=>
{Array 2 3}
Copy-array returns a copy of array. The copy is identical to the array but does not share storage with it.
Returns #t if value is an array.
Array-ref returns the specified array element and array-set! replaces the element with value.
(let ((a (array '(2 3) 'a 'b 'c 'd 'e 'f)))
(let ((x (array-ref a 0 1)))
(array-set! a 'g 0 1)
(list x (array-ref a 0 1))))
=>
'(b g)
Array->vector returns a vector containing the elements of array in row-major order. Array-shape returns the dimensions of the array as a list.
Make-shared-array makes a new array that shares storage with array and uses linear-map to map indexes to elements. Linear-map must accept as many arguments as the number of dimensions given and must return a list of non-negative integers that are valid indexes into array.
(array-ref (make-shared-array a f i0 i1 ...) j0 j1 ...)
is equivalent to
(apply array-ref a (f j0 j1 ...))
As an example, the following function makes the transpose of a two-dimensional array:
(define (transpose array)
(let ((shape (array-shape array)))
(make-shared-array array
(lambda (x y)
(list y x))
(cadr shape)
(car shape))))
(array->vector
(transpose
(array '(2 3) 'a 'b 'c 'd 'e 'f)))
=>
'(a d b e c f)
5.10 Records
New types can be constructed using the define-record-type macro from the define-record-types structure The general syntax is:
(define-record-type tag type-name (constructor-name field-tag ...) predicate-name (field-tag accessor-name [modifier-name]) ...)
This makes the following definitions:
type-name (type)
(constructor-name field-init ...) -> type-name
(predicate-name value) -> boolean
(accessor-name type-name) -> value
(modifier-name type-name value)
Type-name is the record type itself, and can be used to specify a print method (see below). Constructor-name is a constructor that accepts values for the fields whose tags are specified. Predicate-name is a predicate that returns #t for elements of the type and #f for everything else. The accessor-names retrieve the values of fields, and the modifier-name's update them. Tag is used in printing instances of the record type and the field-tags are used in the inspector and to match constructor arguments with fields.
Define-record-discloser determines how records of type type are printed. Discloser should be procedure which takes a single record of type type and returns a list whose car is a symbol. The record will be printed as the value returned by discloser with curly braces used instead of the usual parenthesis.
For example
(define-record-type pare :pare (kons x y) pare? (x kar set-kar!) (y kdr))
defines kons to be a constructor, kar and kdr to be accessors, set-kar! to be a modifier, and pare? to be a predicate for a new type of object. The type itself is named :pare. Pare is a tag used in printing the new objects.
By default, the new objects print as #{Pare}. The print method can be modified using define-record-discloser:
(define-record-discloser :pare (lambda (p) `(pare ,(kar p) ,(kdr p))))
will cause the result of (kons 1 2) to print as #{Pare 1 2}.
Define-record-resumer (section 7.8.3) can be used to control how records are stored in heap images.
5.10.1 Low-level access to records
Records are implemented using primitive objects exactly analogous to vectors. Every record has a record type (which is another record) in the first slot. Note that use of these procedures, especially record-set!, breaks the record abstraction described above; caution is advised.
These procedures are in the structure records.
These the same as the standard vector- procedures except that they operate on records. The value returned by record-length includes the slot holding the record's type. (record-type x) is equivalent to (record-ref x 0).
5.10.2 Record types
Record types are themselves records of a particular type (the first slot of :record-type points to itself). A record type contains four values: the name of the record type, a list of the names its fields, and procedures for disclosing and resuming records of that type. Procedures for manipulating them are in the structure record-types.
These procedures construct the usual record-manipulating procedures. Record-constructor returns a constructor that is passed the initial values for the fields specified and returns a new record. Record-predicate returns a predicate that return true when passed a record of type record-type and false otherwise. Record-accessor and record-modifier return procedures that reference and set the given field in records of the approriate type.
Record-types is the initial exporter of define-record-discloser (re-exported by define-record-types described above) and define-record-resumer (re-exported by external-calls (section 7.8.3)).
The procedures described in this section can be used to define new record-type-defining macros.
(define-record-type pare :pare (kons x y) pare? (x kar set-kar!) (y kdr))
is (sematically) equivalent to
(define :pare (make-record-type 'pare '(x y))) (define kons (record-constructor :pare '(x y))) (define kar (record-accessor :pare 'x)) (define set-kar! (record-modifier :pare 'x)) (define kdr (record-accessor :pare 'y))
The ``(semantically)'' above is because define-record-type adds declarations, which allows the type checker to detect some misuses of records, and uses more efficient definitions for the constructor, accessors, and modifiers. Ignoring the declarations, which will have to wait for another edition of the manual, what the above example actually expands into is:
(define :pare (make-record-type 'pare '(x y))) (define (kons x y) (record :pare x y)) (define (kar r) (checked-record-ref r :pare 1)) (define (set-kar! r new) (checked-record-set! r :pare 1 new)) (define (kdr r) (checked-record-ref r :pare 2))
Checked-record-ref and Checked-record-set! are low-level procedures that check the type of the record and access or modify it using a single VM instruction.
5.11 Finite record types
The structure finite-types has two macros for defining `finite' record types. These are record types for which there are a fixed number of instances, all of which are created at the same time as the record type itself. The syntax for defining an enumerated type is:
(define-enumerated-type tag type-name predicate-name vector-of-instances-name name-accessor index-accessor (instance-name ...))
This defines a new record type, bound to type-name, with as many instances as there are instance-name's. Vector-of-instances-name is bound to a vector containing the instances of the type in the same order as the instance-name list. Tag is bound to a macro that when given an instance-name expands into an expression that returns corresponding instance. The name lookup is done at macro expansion time. Predicate-name is a predicate for the new type. Name-accessor and index-accessor are accessors for the name and index (in vector-of-instances) of instances of the type.
(define-enumerated-type color :color color? colors color-name color-index (black white purple maroon)) (color-name (vector-ref colors 0))=>
black (color-name (color white))=>
white (color-index (color purple))=>
2
Finite types are enumerations that allow the user to add additional fields in the type. The syntax for defining a finite type is:
(define-finite-type tag type-name (field-tag ...) predicate-name vector-of-instances-name name-accessor index-accessor (field-tag accessor-name [modifier-name]) ...((instance-name field-value ...) ...))
The additional fields are specified exactly as with define-record-type. The field arguments to the constructor are listed after the type-name; these do not include the name and index fields. The form ends with the names and the initial field values for the instances of the type. The instances are constructed by applying the (unnamed) constructor to these initial field values. The name must be first and the remaining values must match the field-tags in the constructor's argument list.
(define-finite-type color :color (red green blue) color? colors color-name color-index (red color-red) (green color-green) (blue color-blue) ((black 0 0 0) (white 255 255 255) (purple 160 32 240) (maroon 176 48 96))) (color-name (color black))=>
black (color-name (vector-ref colors 1))=>
white (color-index (color purple))=>
2 (color-red (color maroon))=>
176
5.12 Sets over finite types
The structure enum-sets has a macro for defining types for sets of elements of finite types. These work naturally with the finite types defined by the finite-types structure, but are not tied to them. The syntax for defining such a type is:
(define-enum-set-type id type-name predicate constructor element-syntax element-predicate all-elements element-index-ref)
This defines id to be syntax for constructing sets, type-name to be a value representing the type, predicate to be a predicate for those sets, and constructor a procedure for constructing one from a list.
Element-syntax must be the name of a macro for constructing set elements from names (akin to the tag argument to define-enumerated-type). Element-predicate must be a predicate for the element type, all-elements a vector of all values of the element type, and element-index-ref must return the index of an element within the all-elements vector.
Enum-set->list converts a set into a list of its elements. Enum-set-member? tests for membership. Enum-set=? tests two sets of equal type for equality. (If its arguments are not of the same type, enum-set=? raises an exception.) Enum-set-union computes the union of two sets of equal type, enum-set-intersection computes the intersection, and enum-set-negation computes the complement of a set.
Here is an example. Given an enumerated type:
(define-enumerated-type color :color color? colors color-name color-index (red blue green))
we can define sets of colors:
(define-enum-set-type color-set :color-set color-set? make-color-set color color? colors color-index)
> (enum-set->list (color-set red blue)) (#{Color red} #{Color blue}) > (enum-set->list (enum-set-negation (color-set red blue))) (#{Color green}) > (enum-set-member? (color-set red blue) (color blue)) #t
5.13 Hash tables
These are generic hash tables, and are in the structure tables. Strictly speaking they are more maps than tables, as every table has a value for every possible key (for that type of table). All but a finite number of those values are #f.
The first four functions listed make various kinds of tables. Make-table returns a table whose keys may be symbols, integer, characters, booleans, or the empty list (these are also the values that may be used in case expressions). As with case, comparison is done using eqv?. The comparison procedures used in symbol, string, and integer tables are eq?, string=?, and =.
Make-table-maker takes two procedures as arguments and returns a nullary table-making procedure. Compare-proc should be a two-argument equality predicate. Hash-proc should be a one argument procedure that takes a key and returns a non-negative integer hash value. If (compare-proc x y) returns true, then (= (hash-proc x) (hash-proc y)) must also return true. For example, make-integer-table could be defined as (make-table-maker = abs).
Make-table-immutable! prohibits future modification to its argument.
Table? is the predicate for tables. Table-ref and table-set! access and modify the value of key in table. Table-walk applies procedure, which must accept two arguments, to every associated key and non-#f value in table.
Default-hash-function is the hash function used in the tables returned by make-table, and string-hash it the one used by make-string-table.
5.14 Port extensions
These procedures are in structure extended-ports.
Make-string-input-port returns an input port that that reads characters from the supplied string. An end-of-file object is returned if the user reads past the end of the string. Make-string-output-port returns an output port that saves the characters written to it. These are then returned as a string by string-output-port-output.
(read (make-string-input-port "(a b)"))=>
'(a b) (let ((p (make-string-output-port))) (write '(a b) p) (let ((s (string-output-port-output p))) (display "c" p) (list s (string-output-port-output p))))=>
'("(a b)" "(a b)c")
Procedure is called on an output port. Output written to that port is copied to output-port until n characters have been written, at which point limit-output returns. If procedure returns before writing n characters, then limit-output also returns at that time, regardless of how many characters have been written.
Make-tracking-input-port and make-tracking-output-port return ports that keep track of the current row and column and are otherwise identical to their arguments. Closing a tracking port does not close the underlying port. Current-row and current-column return port's current read or write location. They return #f if port does not keep track of its location. Fresh-line writes a newline character to output-port if (current-row port) is not 0.
(define p (open-output-port "/tmp/temp")) (list (current-row p) (current-column p))=>
'(0 0) (display "012" p) (list (current-row p) (current-column p))=>
'(0 3) (fresh-line p) (list (current-row p) (current-column p))=>
'(1 0) (fresh-line p) (list (current-row p) (current-column p))=>
'(1 0)
5.15 Fluid bindings
These procedures implement dynamic binding and are in structure fluids. A fluid is a cell whose value can be bound dynamically. Each fluid has a top-level value that is used when the fluid is unbound in the current dynamic environment.
Make-fluid returns a new fluid with value as its initial top-level value. Fluid returns fluid's current value. Let-fluid calls thunk, with fluid bound to value until thunk returns. Using a continuation to throw out of the call to thunk causes fluid to revert to its original value, while throwing back in causes fluid to be rebound to value. Let-fluid returns the value(s) returned by thunk. Let-fluids is identical to let-fluid except that it binds an arbitrary number of fluids to new values.
(let* ((f (make-fluid 'a))
(v0 (fluid f))
(v1 (let-fluid f 'b
(lambda ()
(fluid f))))
(v2 (fluid f)))
(list v0 v1 v2))
=>
'(a b a)
(let ((f (make-fluid 'a))
(path '())
(c #f))
(let ((add (lambda ()
(set! path (cons (fluid f) path)))))
(add)
(let-fluid f 'b
(lambda ()
(call-with-current-continuation
(lambda (c0)
(set! c c0)))
(add)))
(add)
(if (< (length path) 5)
(c)
(reverse path))))
=>
'(a b a b a)
5.16 Shell commands
Structure c-system-function provides access to the C system() function.
Have-system? returns true if the underlying C implementation has a command processor. (System string) passes string to the C system() function and returns the result.
(begin
(system "echo foo > test-file")
(call-with-input-file "test-file" read))
=>
'foo
5.17 Sockets
Structure sockets provides access to TCP/IP sockets for interprocess and network communication.
Open-socket creates a new socket. If no port-number is supplied the system picks one at random. Socket-port-number returns a socket's port number. Close-socket closes a socket, preventing any further connections. Socket-accept accepts a single connection on socket, returning an input port and an output port for communicating with the client. If no client is waiting socket-accept blocks until one appears. Get-host-name returns the network name of the machine.
Socket-client connects to the server at port-number on the machine named host-name. Socket-client blocks until the server accepts the connection.
The following simple example shows a server and client for a centralized UID service.
(define (id-server) (let ((socket (open-socket))) (display "Waiting on port ") (display (socket-port-number socket)) (newline) (let loop ((next-id 0)) (call-with-values (lambda () (socket-accept socket)) (lambda (in out) (display next-id out) (close-input-port in) (close-output-port out) (loop (+ next-id 1))))))) (define (get-id machine port-number) (call-with-values (lambda () (socket-client machine port-number)) (lambda (in out) (let ((id (read in))) (close-input-port in) (close-output-port out) id))))
5.18 Macros for writing loops
Iterate and reduce are extensions of named-let for writing loops that walk down one or more sequences, such as the elements of a list or vector, the characters read from a port, or an arithmetic series. Additional sequences can be defined by the user. Iterate and reduce are in structure reduce.
5.18.1 Iterate
The syntax of iterate is:
(iterate loop-name ((sequence-type element-variable sequence-data ...) ...) ((state-variable initial-value) ...) body-expression [final-expression])
Iterate steps the element-variables in parallel through the sequences, while each state-variable has the corresponding initial-value for the first iteration and have later values supplied by body-expression. If any sequence has reached its limit the value of the iterate expression is the value of final-expression, if present, or the current values of the state-variables, returned as multiple values. If no sequence has reached its limit, body-expression is evaluated and either calls loop-name with new values for the state-variables, or returns some other value(s).
The loop-name and the state-variables and initial-values behave exactly as in named-let. The named-let expression
(let loop-name ((state-variable initial-value) ...) body ...)
is equivalent to an iterate expression with no sequences (and with an explicit let wrapped around the body expressions to take care of any internal defines):
(iterate loop-name () ((state-variable initial-value) ...) (let () body ...))
The sequence-types are keywords (they are actually macros of a particular form; it is easy to add additional types of sequences). Examples are list* which walks down the elements of a list and vector* which does the same for vectors. For each iteration, each element-variable is bound to the next element of the sequence. The sequence-data gives the actual list or vector or whatever.
If there is a final-expression, it is evaluated when the end of one or more sequences is reached. If the body-expression does not call loop-name the final-expression is not evaluated. The state-variables are visible in final-expression but the sequence-variables are not.
The body-expression and the final-expression are in tail-position within the iterate. Unlike named-let, the behavior of a non-tail-recursive call to loop-name is unspecified (because iterating down a sequence may involve side effects, such as reading characters from a port).
5.18.2 Reduce
If an iterate expression is not meant to terminate before a sequence has reached its end, body-expression will always end with a tail call to loop-name. Reduce is a macro that makes this common case explicit. The syntax of reduce is the same as that of iterate, except that there is no loop-name. The body-expression returns new values of the state-variables instead of passing them to loop-name. Thus body-expression must return as many values as there are state variables. By special dispensation, if there are no state variables then body-expression may return any number of values, all of which are ignored.
The syntax of reduce is:
(reduce ((sequence-type element-variable sequence-data ...) ...) ((state-variable initial-value) ...) body-expression [final-expression])
The value(s) returned by an instance of reduce is the value(s) returned by the final-expression, if present, or the current value(s) of the state variables when the end of one or more sequences is reached.
A reduce expression can be rewritten as an equivalent iterate expression by adding a loop-var and a wrapper for the body-expression that calls the loop-var.
(iterate loop ((sequence-type element-variable sequence-data ...) ...) ((state-variable initial-value) ...) (call-with-values (lambda () body-expression) loop) [final-expression])
5.18.3 Sequence types
The predefined sequence types are:
(list* elt-var list) (syntax)
(vector* elt-var vector) (syntax)
(string* elt-var string) (syntax)
(count* elt-var start [end [step]]) (syntax)
(input* elt-var input-port read-procedure) (syntax)
(stream* elt-var procedure initial-data) (syntax)
For lists, vectors, and strings the element variable is bound to the successive elements of the list or vector, or the characters in the string.
For count* the element variable is bound to the elements of the sequence
start, start + step, start + 2step, ..., end
inclusive of start and exclusive of end. The default step is 1. The sequence does not terminate if no end is given or if there is no N > 0 such that end = start + Nstep (= is used to test for termination). For example, (count* i 0 -1) doesn't terminate because it begins past the end value and (count* i 0 1 2) doesn't terminate because it skips over the end value.
For input* the elements are the results of successive applications of read-procedure to input-port. The sequence ends when read-procedure returns an end-of-file object.
For a stream, the procedure takes the current data value as an argument and returns two values, the next value of the sequence and a new data value. If the new data is #f then the previous element was the last one. For example,
(list* elt my-list)
is the same as
(stream* elt list->stream my-list)
where list->stream is
(lambda (list) (if (null? list) (values 'ignored #f) (values (car list) (cdr list))))
5.18.4 Synchronous sequences
When using the sequence types described above, a loop terminates when any of its sequences reaches its end. To help detect bugs it is useful to have sequence types that check to see if two or more sequences end on the same iteration. For this purpose there is second set of sequence types called synchronous sequences. These are identical to the ones listed above except that they cause an error to be signalled if a loop is terminated by a synchronous sequence and some other synchronous sequence did not reach its end on the same iteration.
Sequences are checked for termination in order, from left to right, and if a loop is terminated by a non-synchronous sequence no further checking is done.
The synchronous sequences are:
(list% elt-var list) (syntax)
(vector% elt-var vector) (syntax)
(string% elt-var string) (syntax)
(count% elt-var start end [step]) (syntax)
(input% elt-var input-port read-procedure) (syntax)
(stream% elt-var procedure initial-data) (syntax)
Note that the synchronous count% must have an end, unlike the nonsynchronous count%.
5.18.5 Examples
Gathering the indexes of list elements that answer true to some predicate.
(lambda (my-list predicate) (reduce ((list* elt my-list) (count* i 0)) ((hits '())) (if (predicate elt) (cons i hits) hits) (reverse hits))
Looking for the index of an element of a list.
(lambda (my-list predicate) (iterate loop ((list* elt my-list) (count* i 0)) () ; no state (if (predicate elt) i (loop))))
Reading one line.
(define (read-line port)
(iterate loop
((input* c port read-char))
((chars '()))
(if (char=? c #\
newline)
(list->string (reverse chars))
(loop (cons c chars)))
(if (null? chars)
(eof-object)
; no newline at end of file
(list->string (reverse chars)))))
Counting the lines in a file. We can't use count* because we need the value of the count after the loop has finished.
(define (line-count name) (call-with-input-file name (lambda (in) (reduce ((input* l in read-line)) ((i 0)) (+ i 1)))))
5.18.6 Defining sequence types
The sequence types are object-oriented macros similar to enumerations. A non-synchronous sequence macro needs to supply three values: #f to indicate that it isn't synchronous, a list of state variables and their initializers, and the code for one iteration. The first two methods are CPS'ed: they take another macro and argument to which to pass their result. The synchronized? method gets no additional arguments. The state-vars method is passed a list of names which will be bound to the arguments to the sequence. The final method, for the step, is passed the list of names bound to the arguments and the list of state variables. In addition there is a variable to be bound to the next element of the sequence, the body expression for the loop, and an expression for terminating the loop.
The definition of list* is
(define-syntax list* (syntax-rules (synchronized? state-vars step) ((list* synchronized? (next more)) (next #f more)) ((list* state-vars (start-list) (next more)) (next ((list-var start-list)) more)) ((list* step (start-list) (list-var) value-var loop-body final-exp) (if (null? list-var) final-exp (let ((value-var (car list-var)) (list-var (cdr list-var))) loop-body)))))
Synchronized sequences are the same, except that they need to provide a termination test to be used when some other synchronized method terminates the loop.
(define-syntax list% (syntax-rules (sync done) ((list% sync (next more)) (next #t more)) ((list% done (start-list) (list-var)) (null? list-var)) ((list% stuff ...) (list* stuff ...))))
5.18.7 Expanded code
The expansion of
(reduce ((list* x '(1 2 3))) ((r '())) (cons x r))
is
(let ((final (lambda (r) (values r))) (list '(1 2 3)) (r '())) (let loop ((list list) (r r)) (if (null? list) (final r) (let ((x (car list)) (list (cdr list))) (let ((continue (lambda (r) (loop list r)))) (continue (cons x r)))))))
The only inefficiencies in this code are the final and continue procedures, both of which could be substituted in-line. The macro expander could do the substitution for continue when there is no explicit proceed variable, as in this case, but not in general.
5.19 Sorting lists and vectors
(This section, as the libraries it describes, was written mostly by Olin Shivers for the draft of SRFI 32.)
The sort libraries in Scheme 48 include
vector insert sort (stable)
vector heap sort
vector merge sort (stable)
pure and destructive list merge sort (stable)
stable vector and list merge
miscellaneous sort-related procedures: vector and list merging, sorted predicates, vector binary search, vector and list delete-equal-neighbor procedures.
a general, non-algorithmic set of procedure names for general sorting and merging
5.19.1 Design rules
What vs. how
There are two different interfaces: ``what'' (simple) and ``how'' (detailed).
- Simple
- you specify semantics: datatype (list or vector),
mutability, and stability.
- Detailed
- you specify the actual algorithm (quick, heap,
insert, merge). Different algorithms have different properties,
both semantic and pragmatic, so these exports are necessary.
It is necessarily the case that the specifications of these procedures make statements about execution ``pragmatics.'' For example, the sole distinction between heap sort and quick sort -- both of which are provided by this library -- -is one of execution time, which is not a ``semantic'' distinction. Similar resource-use statements are made about ``iterative'' procedures, meaning that they can execute on input of arbitrary size in a constant number of stack frames.
Consistency across procedure signatures
The two interfaces share common procedure signatures wherever possible, to facilitate switching a given call from one procedure to another.
Less-than parameter first, data parameter after
These procedures uniformly observe the following parameter order: the data to be sorted comes after the comparison procedure. That is, we write
(sort < list)
not
(sort list <)
Ordering, comparison procedures and stability
These routines take a < comparison procedure, not a < comparison procedure, and they sort into increasing order. The difference between a < spec and a < spec comes up in two places:
the definition of an ordered or sorted data set, and
the definition of a stable sorting algorithm.
We say that a data set (a list or vector) is sorted or ordered if it contains no adjacent pair of values ... x, y ... such that y < x.
In other words, scanning across the data never takes a ``downwards'' step.
If you use a < procedure where these algorithms expect a < procedure, you may not get the answers you expect. For example, the list-sorted? procedure will return false if you pass it a < comparison procedure and an ordered list containing adjacent equal elements.
A ``stable'' sort is one that preserves the pre-existing order of equal elements. Suppose, for example, that we sort a list of numbers by comparing their absolute values, i.e., using comparison procedure
(lambda (x y) (< (abs x) (abs y)))
If we sort a list that contains both 3 and -3:
... 3, ..., - 3 ... |
then a stable sort is an algorithm that will not swap the order of these two elements, that is, the answer is guaranteed to to look like
... 3, - 3 ... |
not
... - 3, 3 ... |
Choosing < for the comparison procedure instead of < affects how stability is coded. Given an adjacent pair x, y, (< y x) means ``x should be moved in front of x'' -- otherwise, leave things as they are. So using a < procedure where a < procedure is expected will invert stability.
This is due to the definition of equality, given a < comparator:
(and (not (< x y)) (not (< y x)))
The definition is rather different, given a < comparator:
(and (<= x y) (<= y x))
A ``stable'' merge is one that reliably favors one of its data sets when equal items appear in both data sets. All merge operations in this library are stable, breaking ties between data sets in favor of the first data set -- elements of the first list come before equal elements in the second list.
So, if we are merging two lists of numbers ordered by absolute value, the stable merge operation list-merge
(list-merge (lambda (x y) (< (abs x) (abs y))) '(0 -2 4 8 -10) '(-1 3 -4 7))
reliably places the 4 of the first list before the equal-comparing -4 of the second list:
(0 -1 -2 4 -4 7 8 -10)
Some sort algorithms will not work correctly if given a < when they expect a < comparison (or vice-versa).
In short, if your comparison procedure f answers true to (f x x), then
using a stable sorting or merging algorithm will not give you a stable sort or merge,
list-sorted? may surprise you.
Note that you can synthesize a < procedure from a < procedure with
(lambda (x y) (not (<= y x)))
if need be.
Precise definitions give sharp edges to tools, but require care in use. ``Measure twice, cut once.''
All vector operations accept optional subrange parameters
The vector operations specified below all take optional start/end arguments indicating a selected subrange of a vector's elements. If a start parameter or start/end parameter pair is given to such a procedure, they must be exact, non-negative integers, such that
0 < start < end < (vector-length vector) |
where vector is the related vector parameter. If not specified, they default to 0 and the length of the vector, respectively. They are interpreted to select the range [start,end), that is, all elements from index start (inclusive) up to, but not including, index end.
Required vs. allowed side-effects
List-sort! and List-stable-sort! are allowed, but not required, to alter their arguments' cons cells to construct the result list. This is consistent with the what-not-how character of the group of procedures to which they belong (the sorting structure).
The list-delete-neighbor-dups!, list-merge! and list-merge-sort! procedures, on the other hand, provide specific algorithms, and, as such, explicitly commit to the use of side-effects on their input lists in order to guarantee their key algorithmic properties (e.g., linear-time operation).
5.19.2 Procedure specification
|
Procedure naming and functionality
Almost all of the procedures described below are variants of two basic operations: sorting and merging. These procedures are consistently named by composing a set of basic lexemes to indicate what they do.
|
Types of parameters and return values
In the procedures specified below,
A < or = parameter is a procedure accepting two arguments taken from the specified procedure's data set(s), and returning a boolean;
Start and end parameters are exact, non-negative integers that serve as vector indices selecting a subrange of some associated vector. When specified, they must satisfy the relation
0 < start < end < (vector-length vector) where vector is the associated vector.
Passing values to procedures with these parameters that do not satisfy these types is an error.
If a procedure is said to return ``unspecified,'' this means that nothing at all is said about what the procedure returns, not even the number of return values. Such a procedure is not even required to be consistent from call to call in the nature or number of its return values. It is simply required to return a value (or values) that may be passed to a command continuation, e.g. as the value of an expression appearing as a non-terminal subform of a begin expression. Note that in R5RS, this restricts such a procedure to returning a single value; non-R5RS systems may not even provide this restriction.
5.19.2.1 sorting -- general sorting package
This library provides basic sorting and merging functionality suitable for general programming. The procedures are named by their semantic properties, i.e., what they do to the data (sort, stable sort, merge, and so forth).
(vector-merge < v1 v2 [start1 [end1 [start2 [end2]]]]) -> vector
(vector-merge! < v v1 v2 [start [start1 [end1 [start2 [end2]]]]])
|
All four merge operations are stable: an element of the initial list list1 or vector vector1 will come before an equal-comparing element in the second list list2 or vector vector2 in the result.
The procedures
list-merge
list-sort
list-stable-sort
list-delete-neighbor-dups
do not alter their inputs and are allowed to return a value that shares a common tail with a list argument.
The procedure
list-sort!
list-stable-sort!
are ``linear update'' operators -- they are allowed, but not required, to alter the cons cells of their arguments to produce their results.
On the other hand, the list-merge! procedure make only a single, iterative, linear-time pass over its argument list, using set-cdr!s to rearrange the cells of the list into the final result -- it works ``in place.'' Hence, any cons cell appearing in the result must have originally appeared in an input. The intent of this iterative-algorithm commitment is to allow the programmer to be sure that if, for example, list-merge! is asked to merge two ten-million-element lists, the operation will complete without performing some extremely (possibly twenty-million) deep recursion.
The vector procedures
vector-sort
vector-stable-sort
vector-delete-neighbor-dups
do not alter their inputs, but allocate a fresh vector for their result, of length end - start.
The vector procedures
vector-sort!
vector-stable-sort!
sort their data in-place. (But note that vector-stable-sort! may allocate temporary storage proportional to the size of the input .)
Vector-merge returns a vector of length (end1 - start1 + (end2 - start2).
Vector-merge! writes its result into vector v, beginning at index start, for indices less than end = start + (end1 - start1) + (end2 - start2). The target subvector v[start,end) may not overlap either source subvector vector1[start1,end1) vector2[start2,end2).
The ...-delete-neighbor-dups-... procedures: These procedures delete adjacent duplicate elements from a list or a vector, using a given element-equality procedure. The first/leftmost element of a run of equal elements is the one that survives. The list or vector is not otherwise disordered.
These procedures are linear time -- much faster than the O(n2) general duplicate-element deletors that do not assume any ``bunching'' of elements (such as the ones provided by SRFI 1). If you want to delete duplicate elements from a large list or vector, you can sort the elements to bring equal items together, then use one of these procedures, for a total time of O(nlog(n)).
The comparison procedure = passed to these procedures is always applied ( = x y) where x comes before y in the containing list or vector.
List-delete-neighbor-dups does not alter its input list; its answer may share storage with the input list.
Vector-delete-neighbor-dups does not alter its input vector, but rather allocates a fresh vector to hold the result.
Examples:
(list-delete-neighbor-dups = '(1 1 2 7 7 7 0 -2 -2)) ===> (1 2 7 0 -2) (vector-delete-neighbor-dups = '#(1 1 2 7 7 7 0 -2 -2)) ===> #(1 2 7 0 -2) (vector-delete-neighbor-dups = '#(1 1 2 7 7 7 0 -2 -2) 3 7) ===> #(7 0 -2)
5.19.2.2 Algorithm-specific sorting packages
These packages provide more specific sorting functionality, that is, specific committment to particular algorithms that have particular pragmatic consequences (such as memory locality, asymptotic running time) beyond their semantic behaviour (sorting, stable sorting, merging, etc.). Programmers that need a particular algorithm can use one of these packages.
sorted -- sorted predicates
Return #f iff there is an adjacent pair ... x, y ... in the input list or vector such that y < x. The optional start/end range arguments restrict vector-sorted? to the indicated subvector.
list-merge-sort -- list merge sort
The sort procedures sort their data using a list merge sort, which is stable. (The reference implementation is, additionally, a ``natural'' sort. See below for the properties of this algorithm.)
The ! procedures are destructive -- they use set-cdr!s to rearrange the cells of the lists into the proper order. As such, they do not allocate any extra cons cells -- they are ``in place'' sorts.
The merge operations are stable: an element of list1 will come before an equal-comparing element in list2 in the result list.
vector-merge-sort -- vector merge sort
(vector-merge < vector1 vector2 [start1 [end1 [start2 [end2]]]]) -> vector
(vector-merge! < vector vector1 vector2 [start [start1 [end1 [start2 [end2]]]]])
The sort procedures sort their data using vector merge sort, which is stable. (The reference implementation is, additionally, a ``natural'' sort. See below for the properties of this algorithm.)
The optional start/end arguments provide for sorting of subranges, and default to 0 and the length of the corresponding vector.
Merge-sorting a vector requires the allocation of a temporary ``scratch'' work vector for the duration of the sort. This scratch vector can be passed in by the client as the optional temp argument; if so, the supplied vector must be of size < end, and will not be altered outside the range [start,end). If not supplied, the sort routines allocate one themselves.
The merge operations are stable: an element of vector1 will come before an equal-comparing element in vector2 in the result vector.
Vector-merge-sort! leaves its result in vector[start,end).
Vector-merge-sort returns a vector of length end - start.
Vector-merge returns a vector of length (end1 - start1) + (end2 - start2).
Vector-merge! writes its result into vector, beginning at index start, for indices less than end = start + (end1 - start1) + (end2 - start2). The target subvector
vector[start,end) may not overlap either source subvector
vector1[start1,end1), or vector2[start2,end2).
vector-heap-sort -- vector heap sort
These procedures sort their data using heap sort, which is not a stable sorting algorithm.
Vector-heap-sort returns a vector of length end - start. Vector-heap-sort! is in-place, leaving its result in vector[start,end).
vector-insert-sort -- vector insertion sort
These procedures stably sort their data using insertion sort.
Vector-insert-sort returns a vector of length end - start.
Vector-insert-sort! is in-place, leaving its result in vector[start,end).
delete-neighbor-duplicates -- list and vector delete neighbor duplicates
(vector-delete-neighbor-dups = vector [start [end]]) -> vector
(vector-delete-neighbor-dups! = vector [start [end]]) -> end'
These procedures delete adjacent duplicate elements from a list or a vector, using a given element-equality procedure = . The first/leftmost element of a run of equal elements is the one that survives. The list or vector is not otherwise disordered.
These procedures are linear time -- much faster than the O(n2) general duplicate-element deletors that do not assume any ``bunching'' of elements (such as the ones provided by SRFI 1). If you want to delete duplicate elements from a large list or vector, you can sort the elements to bring equal items together, then use one of these procedures, for a total time of O(nlog(n)).
The comparison procedure = passed to these procedures is always applied
( = x y)
where x comes before y in the containing list or vector.
List-delete-neighbor-dups does not alter its input list; its answer may share storage with the input list.
Vector-delete-neighbor-dups does not alter its input vector, but rather allocates a fresh vector to hold the result.
List-delete-neighbor-dups! is permitted, but not required, to mutate its input list in order to construct its answer.
Vector-delete-neighbor-dups! reuses its input vector to hold the answer, packing its answer into the index range [start,end'), where end' is the non-negative exact integer returned as its value. It returns end' as its result. The vector is not altered outside the range [start,end').
Examples:
(list-delete-neighbor-dups = '(1 1 2 7 7 7 0 -2 -2)) ===> (1 2 7 0 -2) (vector-delete-neighbor-dups = '#(1 1 2 7 7 7 0 -2 -2)) ===> #(1 2 7 0 -2) (vector-delete-neighbor-dups = '#(1 1 2 7 7 7 0 -2 -2) 3 7) ===> #(7 0 -2) ;; Result left in v[3,9): (let ((v (vector 0 0 0 1 1 2 2 3 3 4 4 5 5 6 6))) (cons (vector-delete-neighbor-dups! = v 3) v)) ===> (9 . #(0 0 0 1 2 3 4 5 6 4 4 5 5 6 6))
binary-searches -- vector binary search
(vector-binary-search < elt->key key vector [start [end]]) -> integer or #f
(vector-binary-search3 compare-proc vector [start [end]]) -> integer or #f
vector-binary-search searches vector in range [start,end) (which default to 0 and the length of vector, respectively) for an element whose associated key is equal to key. The procedure elt->key is used to map an element to its associated key. The elements of the vector are assumed to be ordered by the < relation on these keys. That is,
(vector-sorted? (lambda (x y) (< (elt->key x) (elt->key y))) vector start end) ===> true
An element e of vector is a match for key if it's neither less nor greater than the key:
(and (not (< (elt->key e) key)) (not (< key (elt->key e))))
If there is such an element, the procedure returns its index in the vector as an exact integer. If there is no such element in the searched range, the procedure returns false.
(vector-binary-search < car 4 '#((1 . one) (3 . three) (4 . four) (25 . twenty-five))) ===> 2 (vector-binary-search < car 7 '#((1 . one) (3 . three) (4 . four) (25 . twenty-five))) ===> #f
Vector-binary-search3 is a variant that uses a three-way comparison procedure compare-proc. Compare-proc compares its parameter to the search key, and returns an exact integer whose sign indicates its relationship to the search key.
arrayrclcrcl
(compare-proc x) &<& 0& ==>& x &<& search-key (compare-proc x) & = & 0& ==>& x & = & search-key (compare-proc x) &>& 0& ==>& x &>& search-key endarray |
(vector-binary-search3 (lambda (elt) (- (car elt) 4)) '#((1 . one) (3 . three) (4 . four) (25 . twenty-five))) ===> 2
5.19.3 Algorithmic properties
Different sort and merge algorithms have different properties. Choose the algorithm that matches your needs:
- Vector insert sort
- Stable, but only suitable for small vectors -- O(n2).
- Vector heap sort
- Not stable. Guaranteed fast -- O(nlog(n)) worst case. Poor locality on large vectors. A very reliable workhorse.
- Vector merge sort
-
Stable. Not in-place -- requires a temporary buffer of equal size.
Fast -- O(nlog(n)) -- and has good memory locality for large vectors.
The implementation of vector merge sort provided by this implementation is, additionally, a ``natural'' sort, meaning that it exploits existing order in the input data, providing O(n) best case.
- Destructive list merge sort
-
Stable, fast and in-place (i.e., allocates no new cons cells). ``Fast''
means O(nlog(n)) worse-case, and substantially better if the data
is already mostly ordered, all the way down to linear time for
a completely-ordered input list (i.e., it is a ``natural'' sort).
Note that sorting lists involves chasing pointers through memory, which can be a loser on modern machine architectures because of poor cache and page locality. Sorting vectors has inherently better locality.
This implementation's destructive list merge and merge sort implementations are opportunistic -- they avoid redundant set-cdr!s, and try to take long already-ordered runs of list structure as-is when doing the merges.
- Pure list merge sort
- Stable and fast -- O(nlog(n)) worst-case, and possibly O(n), depending upon the input list (see discussion above).
|
5.20 Regular expressions
This section describes a functional interface for building regular expressions and matching them against strings. The matching is done using the POSIX regular expression package. Regular expressions are in the structure regexps.
A regular expression is either a character set, which matches any character in the set, or a composite expression containing one or more subexpressions. A regular expression can be matched against a string to determine success or failure, and to determine the substrings matched by particular subexpressions.
Returns #t if value is a regular expression created using the functional interface for regular expressions, and #f otherwise.
5.20.1 Character sets
Character sets may be defined using a list of characters and strings, using a range or ranges of characters, or by using set operations on existing character sets.
Set returns a set that contains the character arguments and the characters in any string arguments. Range returns a character set that contain all characters between low-char and high-char, inclusive. Ranges returns a set that contains all characters in the given ranges. Range and ranges use the ordering induced by char->integer. Ascii-range and ascii-ranges use the ASCII ordering. It is an error for a high-char to be less than the preceding low-char in the appropriate ordering.
These perform the indicated operations on character sets.
The following character sets are predefined:
|
5.20.2 Anchoring
String-start returns a regular expression that matches the beginning of the string being matched against; string-end returns one that matches the end.
5.20.3 Composite expressions
Sequence matches the concatenation of its arguments, one-of matches any one of its arguments.
Text returns a regular expression that matches the characters in string, in order.
Repeat returns a regular expression that matches zero or more occurences of its reg-exp argument. With no count the result will match any number of times (reg-exp*). With a single count the returned expression will match reg-exp exactly that number of times. The final case will match from min to max repetitions, inclusive. Max may be #f, in which case there is no maximum number of matches. Count and min should be exact, non-negative integers; max should either be an exact non-negative integer or #f.
5.20.4 Case sensitivity
Regular expressions are normally case-sensitive.
The value returned by ignore-case is identical its argument except that case will be ignored when matching. The value returned by use-case is protected from future applications of ignore-case. The expressions returned by use-case and ignore-case are unaffected by later uses of the these procedures. By way of example, the following matches "ab" but not "aB", "Ab", or "AB".
(text "ab")
while
(ignore-case (test "ab"))
matches "ab", "aB", "Ab", and "AB" and
(ignore-case (sequence (text "a") (use-case (text "b"))))
matches "ab" and "Ab" but not "aB" or "AB".
5.20.5 Submatches and matching
A subexpression within a larger expression can be marked as a submatch. When an expression is matched against a string, the success or failure of each submatch within that expression is reported, as well as the location of the substring matched be each successful submatch.
Submatch returns a regular expression that matches its argument and causes the result of matching its argument to be reported by the match procedure. Key is used to indicate the result of this particular submatch in the alist of successful submatches returned by match. Any value may be used as a key. No-submatches returns an expression identical to its argument, except that all submatches have been elided.
Any-match? returns #t if string matches reg-exp or contains a substring that does, and #f otherwise. Exact-match? returns #t if string matches reg-exp and #f otherwise.
Match returns #f if reg-exp does not match string and a match record if it does match. A match record contains three values: the beginning and end of the substring that matched the pattern and an a-list of submatch keys and corresponding match records for any submatches that also matched. Match-start returns the index of the first character in the matching substring and match-end gives index of the first character after the matching substring. Match-submatches returns an alist of submatch keys and match records. Only the top match record returned by match has a submatch alist.
Matching occurs according to POSIX. The match returned is the one with the lowest starting index in string. If there is more than one such match, the longest is returned. Within that match the longest possible submatches are returned.
All three matching procedures cache a compiled version of reg-exp. Subsequent calls with the same reg-exp will be more efficient.
The C interface to the POSIX regular expression code uses ASCII nul as an end-of-string marker. The matching procedures will ignore any characters following an embedded ASCII nuls in string.
(define pattern (text "abc")) (any-match? pattern "abc")=>
#t (any-match? pattern "abx")=>
#f (any-match? pattern "xxabcxx")=>
#t (exact-match? pattern "abc")=>
#t (exact-match? pattern "abx")=>
#f (exact-match? pattern "xxabcxx")=>
#f (match pattern "abc")=>
(#{match 0 3}) (match pattern "abx")=>
#f (match pattern "xxabcxx")=>
(#{match 2 5}) (let ((x (match (sequence (text "ab") (submatch 'foo (text "cd")) (text "ef")) "xxxabcdefxx"))) (list x (match-submatches x)))=>
(#{match 3 9} ((foo . #{match 5 7})) (match-submatches (match (sequence (set "a") (one-of (submatch 'foo (text "bc")) (submatch 'bar (text "BC")))) "xxxaBCd"))=>
((bar . #{match 4 6}))
5.21 SRFIs
`SRFI' stands for `Scheme Request For Implementation'. An SRFI is a description of an extension to standard Scheme. Draft and final SRFI documents, a FAQ, and other information about SRFIs can be found at the SRFI web site.
Scheme 48 includes implementations of the following (final) SRFIs:
SRFI 1 - List Library
SRFI 2 - and-let*
SRFI 4 - Homogeneous numeric vector datatypes (see note below)
SRFI 5 - let with signatures and rest arguments
SRFI 6 - Basic string ports
SRFI 7 - Program configuration
SRFI 8 - receive
SRFI 9 - Defining record types
SRFI 11 - Syntax for receiving multiple values
SRFI 13 - String Library
SRFI 14 - Character-Set Library (see note below)
SRFI 16 - Syntax for procedures of variable arity
SRFI 17 - Generalized set!
SRFI 22 - Running Scheme Scripts on Unix
SRFI 23 - Error reporting mechanism
SRFI 25 - Multi-dimensional Array Primitives
SRFI 26 - Notation for Specializing Parameters without Currying
SRFI 27 - Sources of Random Bits
SRFI 28 - Basic Format Strings
SRFI 31 - A special form rec for recursive evaluation
SRFI 34 - Exception Handling for Programs
SRFI 35 - Conditions
SRFI 36 - I/O Conditions
SRFI 37 - args-fold: a program argument processor
SRFI 40 - A Library of Streams
SRFI 42 - Eager Comprehensions
SRFI 43 - Vector library
SRFI 45 - Primitives for Expressing Iterative Lazy Algorithms
SRFI 60 - Integers as Bits
SRFI 61 - A more general cond clause
SRFI 63 - Homogeneous and Heterogeneous Arrays
SRFI 66 - Octet Vectors
SRFI 67 - Compare Procedures
SRFI 74 - Octet-Addressed Binary Blocks
SRFI 78 - Lightweight testing
Documentation on these can be found at the web site mentioned above.
SRFI 4 specifies an external representation for homogeneous numeric vectors that is incompatible with R5RS. The Scheme 48 version of SRFI 4 does not support this external representation.
SRFI 14 includes the procedure ->char-set which is not a standard Scheme identifier (in R5RS the only required identifier starting with - is - itself). In the Scheme 48 version of SRFI 14 we have renamed ->char-set as x->char-set.
With the exception of SRFI 62 (which is supported by default), the SRFI bindings can be accessed either by opening the appropriate structure (the structure srfi-n contains SRFI n) or by loading structure srfi-7 and then using the ,load-srfi-7-program command to load an SRFI 7-style program. The syntax for the command is
,load-srfi-7-program name filename
This creates a new structure and associated package, binds the structure to name in the configuration package, and then loads the program found in filename into the package.
As an example, if the file test.scm contains
(program (code (define x 10)))
this program can be loaded as follows:
> ,load-package srfi-7 > ,load-srfi-7-program test test.scm [test] > ,in test test> x 10 test>