The StreamIO functor provides a way to build a stream I/O stack on top of an arbitrary primitive I/O implementation. For example, given an implementation of readers and writers for pairs of integers, one can define streams of pairs of integers.
functor StreamIO ( ... ) : STREAM_IO
structure PrimIO : PRIM_IO
structure Vector : MONO_VECTOR
structure Array : MONO_ARRAY
sharing type PrimIO.elem = Vector.elem = Array.elem
sharing type PrimIO.vector = Vector.vector = Array.vector
sharing type PrimIO.array = Array.array
val someElem : PrimIO.elem
Array structures provide vector and array operations for manipulating the vectors and arrays used in StreamIO. The element someElem is used to initialize buffer arrays; any element will do.
outstream in the result of the StreamIO functor must be abstract.
If flushOut finds that it can do only a partial write (i.e., writeVec or a similar function returns a "number of elements written" less than its sz argument), then flushOut must adjust its buffer for the items written and then try again. If the first or any successive write attempt returns zero elements written (or raises an exception), then flushOut raises the IO.Io exception.
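The retry rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the functor's actual implementation: flushAll, its name argument, and the record-taking write function are hypothetical, and a string stands in for the element vector. Only IO.Io itself comes from the Basis Library.

```sml
(* Hypothetical sketch: push all of `buf` through `write`, retrying after
   partial writes. `write` is given the buffer and the index of the first
   unwritten element, and returns how many elements it accepted. *)
fun flushAll (name : string)
             (write : {buf : string, i : int} -> int)
             (buf : string) : unit =
  let
    val len = String.size buf
    (* Every failure is reported as IO.Io, as the text requires. *)
    fun fail cause =
      raise IO.Io {name = name, function = "flushOut", cause = cause}
    fun loop pos =
      if pos >= len then ()
      else
        let
          (* An exception from the writer becomes the cause of IO.Io. *)
          val n = write {buf = buf, i = pos} handle e => fail e
        in
          if n <= 0
          then fail (Fail "zero elements written")
          else loop (pos + n)   (* account for what was written, try again *)
        end
  in
    loop 0
  end
```

With a writer that accepts at most three characters per call, flushAll keeps calling it with an advancing index until the whole buffer is written; a writer that returns zero makes it raise IO.Io on that attempt.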
If an exception occurs during any stream I/O operation, then the module must, of course, leave itself in a consistent state, without losing or duplicating data.
In some ML systems, a user interrupt aborts execution and returns control to a top-level prompt, without raising any exception that the current execution can handle. In that case it may be unavoidable that some information is either lost or duplicated. Data (input or output) must never be duplicated, but may be lost. This can be accomplished without stream I/O doing any explicit masking of interrupts or locking. On output, the internal state (saying how much has been written) should be updated before doing the write operation; on input, the read should be done before updating the count of valid characters in the buffer.
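The ordering rule can be illustrated with a small sketch. The names writeChunk and fillBuffer, the ref cells, and the function arguments are all hypothetical; a real implementation keeps this state inside the stream representation, and the buffer is assumed to be empty before it is refilled.

```sml
(* Output: record the data as written first, then perform the write.
   An interrupt between the two steps loses the chunk, but the chunk
   can never be written twice. *)
fun writeChunk (written : int ref) (write : string -> unit) (chunk : string) =
  ( written := !written + String.size chunk
  ; write chunk )

(* Input: perform the read first, then publish the count of valid
   characters. An interrupt between the two steps discards what was
   read, but the same characters are never delivered twice. *)
fun fillBuffer (valid : int ref)
               (buf : CharArray.array)
               (readInto : CharArray.array -> int) =
  let
    val n = readInto buf   (* step 1: read into the buffer *)
  in
    valid := n             (* step 2: only now mark the data as valid *)
  end
```

In both directions the bookkeeping errs on the side of losing data, which is the outcome the text permits.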
Here are some suggestions for efficient performance:
- Operations on the underlying readers and writers (readVec, etc.) are expected to be expensive (involving a system call, with context switch).
- Small input operations can be done from a buffer; the readVecNB operation of the underlying reader can replenish the buffer when necessary.
- Each reader may provide only a subset of canInput, etc. An augmented reader that provides more operations can be constructed using PrimIO.augmentIn, but it may be more efficient to use the functions directly provided by the reader, instead of relying on the constructed ones. The same applies to augmented writers.
- Keep the position of the beginning of the buffer on a multiple-of-chunkSize boundary, and do read or write operations with a multiple-of-chunkSize number of elements.
- For very large inputN operations, it is (somewhat) inefficient to read one chunkSize at a time and then concatenate all the results together. Instead, it is good to try to do the read all in one large system call; that is, readBlock(n). In a typical implementation of readVec, however, this requires pre-allocating a vector of size n. In inputAll(), the size of the vector is not known a priori, and if the argument to inputN is large, the allocation of a much-too-large buffer is wasteful. Therefore, for large input operations, query the size of the reader using endPos, subtract the current position, and try to read that much. But one should also keep things rounded to the nearest
- The use of endPos to try to do (large) read operations of just the right size will be inaccurate on translated readers. But this inaccuracy can be tolerated: if the translation is anything close to 1-1, endPos will still provide a very good hint about the order-of-magnitude size of the file.
- Similar suggestions apply to very large output operations. Small outputs go through a buffer; the buffer is written with writeArr. Very large outputs can be written directly from the argument string using
- A lazy functional instream can (should) be implemented as a sequence of immutable (vector) buffers, each with a mutable ref to the next "thing," which is either another buffer, the underlying reader, or an indication that the stream has been truncated.
- The input function should return the largest sequence that is most convenient. Usually this means "the remaining contents of the current buffer."
- To support non-blocking input, use readVecNB if it exists, otherwise do canInput followed (if appropriate) by
- To support blocking input, use readVec if it exists, otherwise do readVecNB followed (if it would block) by block, and then another
- To support lazy functional streams, readArrNB are not useful. If necessary, readVec should be synthesized from
- writeArr should, if necessary, be synthesized from writeVec and vice versa. Similarly for
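The fallback rules for blocking and non-blocking input in the list above might look like the following sketch. The reader record here is a hypothetical cut-down type whose field names mirror the optional reader functions mentioned in the text, with strings standing in for vectors. The source text is cut off after "followed (if appropriate) by" and "and then another", so the sketch assumes, without confirmation from this page, that readVec follows a successful canInput and that readVecNB is retried after block.

```sml
(* Hypothetical cut-down reader: every operation is optional. *)
type reader =
  { readVec   : (int -> string) option
  , readVecNB : (int -> string option) option
  , block     : (unit -> unit) option
  , canInput  : (unit -> bool) option }

(* Non-blocking read: prefer readVecNB; otherwise ask canInput and, if
   input is available, use readVec (assumed; the source elides this). *)
fun nonBlockingRead ({readVecNB = SOME nb, ...} : reader) n = nb n
  | nonBlockingRead {canInput = SOME can, readVec = SOME rd, ...} n =
      if can () then SOME (rd n) else NONE
  | nonBlockingRead _ _ = raise Fail "non-blocking input not supported"

(* Blocking read: prefer readVec; otherwise try readVecNB and, if it
   would block, call block and try readVecNB again (assumed). The sketch
   loops rather than retrying only once. *)
fun blockingRead ({readVec = SOME rd, ...} : reader) n = rd n
  | blockingRead {readVecNB = SOME nb, block = SOME blk, ...} n =
      let
        fun retry () =
          case nb n of
            SOME v => v
          | NONE => (blk (); retry ())
      in
        retry ()
      end
  | blockingRead _ _ = raise Fail "blocking input not supported"
```

Matching directly on the optional fields keeps the preference order visible: the first clause of each function is the operation the reader provides natively, and the later clauses are the synthesized fallbacks.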
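The lazy functional instream suggested in the list above can be sketched as a chain of immutable buffers linked by mutable refs. All names here (next, buffer, Buffer, Reader, Truncated, and this input function) are hypothetical, strings stand in for vector buffers, and the sketch assumes the reader signals end of stream by returning an empty string.

```sml
(* The "thing" that follows a buffer: another buffer, the underlying
   reader (nothing read yet), or a marker for a truncated stream. *)
datatype next
  = Buffer of buffer
  | Reader of unit -> string
  | Truncated
withtype buffer = {data : string, rest : next ref}

(* An instream is a ref to the next thing. input returns the remaining
   contents of the current buffer together with the stream that follows
   it, reading from the underlying reader only when no buffer is cached. *)
fun input (s : next ref) : string * next ref =
  case !s of
    Buffer {data, rest} => (data, rest)
  | Truncated => ("", s)
  | Reader read =>
      let
        val data = read ()
      in
        if data = "" then ("", s)   (* end of stream: state unchanged *)
        else
          let
            val rest = ref (Reader read)
          in
            s := Buffer {data = data, rest = rest};   (* cache for re-reads *)
            (data, rest)
          end
      end
```

Because the buffer is cached in the ref after the first read, calling input twice on the same stream value yields the same data, which is the behavior a functional stream needs.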
Last Modified May 10, 1996
Comments to John Reppy.
Copyright © 1997 Bell Labs, Lucent Technologies