The Standard ML Basis Library

The `StreamIO` functor

The optional StreamIO functor builds a stream I/O layer on top of an arbitrary primitive I/O implementation. For example, given an implementation of readers and writers for pairs of integers, one can define streams of pairs of integers.

Synopsis

functor StreamIO ( ... ) : STREAM_IO

Functor argument interface

structure PrimIO : PRIM_IO
structure Vector : MONO_VECTOR
structure Array : MONO_ARRAY
sharing type PrimIO.elem = Vector.elem = Array.elem
sharing type PrimIO.vector = Vector.vector = Array.vector
sharing type PrimIO.array = Array.array
val someElem : PrimIO.elem

Description

structure PrimIO: This is the underlying primitive I/O structure.
structure Vector
structure Array
sharing type PrimIO.elem
sharing type PrimIO.vector
sharing type PrimIO.array
someElem: This is an arbitrary element used to initialize buffer arrays.

Discussion

The Vector and Array structures provide vector and array operations for manipulating the vectors and arrays used in PrimIO and StreamIO. The element someElem is used to initialize buffer arrays; any element will do.
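
As an illustration of the argument interface, the following sketch collects the functor arguments for binary (Word8.word) streams in one structure and checks them against the sharing constraints. The signature name STREAM_IO_ARG and the structure name BinArg are made up for this example; BinPrimIO, Word8Vector and Word8Array are the usual Basis structures. Because the functor is optional, the application itself appears only in a comment: an implementation may not provide StreamIO at all, or may expect arguments beyond those listed here.

```sml
(* Hypothetical signature that restates the functor argument interface. *)
signature STREAM_IO_ARG =
sig
  structure PrimIO : PRIM_IO
  structure Vector : MONO_VECTOR
  structure Array : MONO_ARRAY
  sharing type PrimIO.elem = Vector.elem = Array.elem
  sharing type PrimIO.vector = Vector.vector = Array.vector
  sharing type PrimIO.array = Array.array
  val someElem : PrimIO.elem
end

(* Arguments for streams of bytes; any Word8.word would do for someElem. *)
structure BinArg : STREAM_IO_ARG =
struct
  structure PrimIO = BinPrimIO
  structure Vector = Word8Vector
  structure Array = Word8Array
  val someElem = 0w0 : Word8.word
end

(* On a system that provides the functor with exactly this interface,
   the application would look roughly like:

   structure MyBinStreamIO =
     StreamIO (structure PrimIO = BinPrimIO
               structure Vector = Word8Vector
               structure Array = Word8Array
               val someElem = 0w0 : Word8.word)
*)
```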

The types instream and outstream in the result of the StreamIO functor must be abstract.

If flushOut finds that it can do only a partial write (i.e., writeVec or a similar function returns a "number of elements written" less than its sz argument), then flushOut must adjust its buffer for the items written and then try again. If the first or any successive write attempt returns zero elements written (or raises an exception), then flushOut raises the IO.Io exception.
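
The retry rule for partial writes can be sketched as a loop over the unwritten remainder of the buffer. This is only an illustration under simplifying assumptions: the buffer is passed in as a string rather than held inside an outstream, flushAll and writeAtMost3 are hypothetical names, the writer is a stand-in that accepts at most three characters per call, and the choice of Fail as the cause for a zero-length write is arbitrary.

```sml
(* Wrap a cause in IO.Io; the name and function fields are placeholders. *)
fun ioFail cause =
  IO.Io {name = "<sketch>", function = "flushOut", cause = cause}

(* Keep writing the unwritten remainder until nothing is left.
   A write of zero elements, or an exception from the writer,
   is reported as IO.Io. *)
fun flushAll (write : CharVectorSlice.slice -> int) (buf : string) : unit =
  let
    fun loop sl =
      if CharVectorSlice.length sl = 0 then ()
      else
        let
          val n = write sl handle e => raise ioFail e
        in
          if n <= 0 then raise ioFail (Fail "zero elements written")
          else loop (CharVectorSlice.subslice (sl, n, NONE))
        end
  in
    loop (CharVectorSlice.full buf)
  end

(* Stand-in for a writer's writeVec: takes at most 3 elements per call
   and records what it was given. *)
val sink : string list ref = ref []
fun writeAtMost3 (sl : CharVectorSlice.slice) : int =
  let
    val n = Int.min (3, CharVectorSlice.length sl)
    val part = CharVectorSlice.subslice (sl, 0, SOME n)
  in
    sink := CharVectorSlice.vector part :: !sink;
    n
  end

(* Three partial writes are needed to drain eight characters. *)
val () = flushAll writeAtMost3 "abcdefgh"
```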

If an exception occurs during any stream I/O operation, then the module must, of course, leave itself in a consistent state, without losing or duplicating data.

In some ML systems, a user interrupt aborts execution and returns control to a top-level prompt, without raising any exception that the current execution can handle. In that case it may be impossible to avoid either losing or duplicating some information. Data (input or output) must never be duplicated, but may be lost. This can be accomplished without stream I/O doing any explicit masking of interrupts or locking. On output, the internal state (saying how much has been written) should be updated before doing the write operation; on input, the read should be done before updating the count of valid characters in the buffer.
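
The ordering described above can be made concrete with a small sketch. The record types outbuf and inbuf and the functions flushBuf and fill are hypothetical, the writer and reader are passed in as plain functions, and partial writes are ignored for brevity. The point is only the order of the two steps: an interruption between them can lose data but cannot cause the same data to be written or delivered twice.

```sml
(* Hypothetical output buffer: pos counts the elements waiting to be written. *)
type outbuf = {data : CharArray.array, pos : int ref}

fun flushBuf (write : string -> unit) ({data, pos} : outbuf) : unit =
  let
    val n = !pos
    val pending = CharArraySlice.vector (CharArraySlice.slice (data, 0, SOME n))
  in
    pos := 0;        (* state first: the buffer is now recorded as empty *)
    write pending    (* an interrupt here loses pending, but never repeats it *)
  end

(* Hypothetical input buffer: valid counts the elements available to readers.
   fill is meant to be called only when the buffer has been used up. *)
type inbuf = {data : CharArray.array, valid : int ref}

fun fill (readArr : CharArraySlice.slice -> int) ({data, valid} : inbuf) : unit =
  let
    val n = readArr (CharArraySlice.full data)   (* read first *)
  in
    valid := n       (* then publish the count of valid elements *)
  end
```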

Implementation note:

Here are some suggestions for efficient performance:

Operations on the underlying readers and writers (readVec, etc.) are expected to be expensive (involving a system call, with context switch).
Small input operations can be done from a buffer; the readVec or readVecNB operation of the underlying reader can replenish the buffer when necessary.
Each reader may provide only a subset of readVec, readVecNB, block, canInput, etc. An augmented reader that provides more operations can be constructed using PrimIO.augmentReader, but it may be more efficient to use the functions directly provided by the reader, instead of relying on the constructed ones. The same applies to augmented writers.
Keep the position of the beginning of the buffer on a multiple-of-chunkSize boundary, and do read or write operations with a multiple-of-chunkSize number of elements.
For very large inputAll or inputN operations, it is (somewhat) inefficient to read one chunkSize at a time and then concatenate all the results together. Instead, it is good to try to do the read all in one large system call; that is, readVec(n). In a typical implementation of readVec, however, this requires pre-allocating a vector of size n. In inputAll(), the size of the vector is not known a priori, and if the argument to inputN is large, allocating a much-too-large buffer is wasteful. Therefore, for large input operations, query the size of the reader using endPos, subtract the current position, and try to read that much, keeping the request rounded to a multiple of chunkSize.
The use of endPos to try to do (large) read operations of just the right size will be inaccurate on translated readers. But this inaccuracy can be tolerated: if the translation is anything close to 1-1, endPos will still provide a very good hint about the order-of-magnitude size of the file.
Similar suggestions apply to very large output operations. Small outputs go through a buffer; the buffer is written with writeArr. Very large outputs can be written directly from the argument string using writeVec.
A lazy functional instream can (should) be implemented as a sequence of immutable (vector) buffers, each with a mutable ref to the next "thing," which is either another buffer, the underlying reader, or an indication that the stream has been truncated.
The input function should return the largest sequence that is most convenient. Usually this means "the remaining contents of the current buffer."
To support non-blocking input, use readVecNB if it exists, otherwise do canInput followed (if appropriate) by readVec.
To support blocking input, use readVec if it exists, otherwise do readVecNB followed (if it would block) by block, and then another readVecNB.
To support lazy functional streams, readArr and readArrNB are not useful. If necessary, readVec should be synthesized from readArr and readVecNB from readArrNB.
writeArr should, if necessary, be synthesized from writeVec and vice versa. Similarly for writeArrNB and writeVecNB.
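
The buffer-chain representation suggested above for lazy functional instreams might be sketched as follows. All names here (next, buffer, instream, mkInstream, input, stringReader) and the chunkSize of 4 are assumptions for illustration, not the internals of any particular implementation; the underlying reader is reduced to a single function standing in for readVec, and elements are characters. Each buffer is immutable except for its ref to whatever comes next, so reading twice from the same instream value yields the same data.

```sml
(* What follows a buffer: more data already read, the underlying
   reader (nothing read yet), or a truncated stream. *)
datatype next
  = Buf of buffer
  | Reader of int -> string      (* stand-in for the reader's readVec *)
  | Truncated
withtype buffer = {data : string, more : next ref}

(* An instream is an immutable position within a buffer. *)
type instream = buffer * int

val chunkSize = 4                (* assumed; a real reader reports its own *)

fun mkInstream (rd : int -> string) : instream =
  ({data = "", more = ref (Reader rd)}, 0)

(* Return the remaining contents of the current buffer, refilling
   from the reader only when the buffer is exhausted. *)
fun input ((b as {data, more}, i) : instream) : string * instream =
  if i < size data then
    (String.extract (data, i, NONE), (b, size data))
  else
    case !more of
      Buf b' => input (b', 0)
    | Truncated => ("", (b, i))
    | Reader rd =>
        let
          val v = rd chunkSize
        in
          if size v = 0 then ("", (b, i))   (* end of stream *)
          else
            let
              val b' = {data = v, more = ref (Reader rd)}
            in
              more := Buf b';               (* extend the chain once *)
              input (b', 0)
            end
        end

(* Hypothetical reader over an in-memory string, for experimentation. *)
fun stringReader (s : string) : int -> string =
  let
    val pos = ref 0
  in
    fn n =>
      let
        val k = Int.min (n, size s - !pos)
        val v = String.substring (s, !pos, k)
      in
        pos := !pos + k;
        v
      end
  end
```

Because the chain is extended by assigning to the ref exactly once per buffer, an old instream value keeps seeing the data that was read after it, which is what makes the stream functional.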