The Standard ML Basis Library

SML'96 Changes

Concurrent with, but mainly independent of, the design of the SML Basis Library, the authors of the SML language have been working to revise the language definition. In addition to simplifying and clarifying certain aspects of the original definition, the revision includes modest changes to the language that affect the programmer's use of the language and that address issues raised during the design of the library. For example, the revised language supports character literals, which greatly extend the expressiveness of the library's character types.

This chapter discusses the most significant of these changes, at least from the library's viewpoint. In addition, it describes in passing the changes concerning imperative/weak types and structure sharing, and notes incompatibilities between the current library proposal and the initial basis described in the original definition. A complete and authoritative discussion of the language changes is given, of course, in the revised definition.

Literals

The new character type and the possibility of multiple implementations of the numeric types require addressing the issue of literals.

Character literals

The revised definition extends the allowed escape sequences for characters to include:

  \a       Alert (ASCII 0x07)
  \b       Backspace (ASCII 0x08)
  \v       Vertical tab (ASCII 0x0B)
  \f       Form feed (ASCII 0x0C)
  \r       Carriage return (ASCII 0x0D)
  \uxxxx   The character whose encoding is the number xxxx 
           consisting of four hexadecimal digits.

There is additional notation for character literals:

#"c"

where c is any legal string representing a single character. This notation has the advantage that existing legal SML code will not be affected.
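
As an illustration of the notation above, the following sketch binds a few character literals, including some of the newly allowed single-letter escapes, and uses the library's ord function to inspect their codes. The value names are chosen here for the example only.

```sml
(* Character literals use the #"c" notation. *)
val a      = #"a"
val bell   = #"\a"    (* alert, ASCII 0x07 *)
val formFd = #"\f"    (* form feed, ASCII 0x0C *)
val cr     = #"\r"    (* carriage return, ASCII 0x0D *)

(* ord maps each character to its integer code. *)
val codes = map ord [bell, #"\b", #"\v", formFd, cr]
```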

Numeric literals

Hexadecimal integer constants are part of the revised language. Hexadecimal literals have the notation:

[~]{0x}[0123456789abcdefABCDEF]+

The language supports word types, i.e., nonnegative integers with modular arithmetic corresponding to machine words. The revised definition provides decimal and hexadecimal word literals. Word literals have a ``0w'' prefix (``0wx'' for hexadecimal); for example: 0w0, 0w10, or 0wxFF. Word literals do not have a sign.

The specification of real literals has been relaxed to allow either `E' or `e' for the exponent.
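
The literal forms described in this section might be used as in the following sketch. The value names are illustrative only; each literal takes its default type because no other type information is supplied.

```sml
val h   = 0xFF     (* hexadecimal integer literal: 255 *)
val neg = ~0x10    (* negated hexadecimal integer literal: ~16 *)
val w   = 0w10     (* decimal word literal *)
val wx  = 0wxFF    (* hexadecimal word literal *)
val r1  = 1E3      (* real literal with an upper-case exponent marker *)
val r2  = 1e3      (* the same real value with a lower-case exponent marker *)
```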

Overloading on literals

With the possibility of multiple representations of the basic types in a given implementation (e.g., Int32 and LargeInt), it is convenient to be able to resolve literals to various specific types without the programmer having to supply specific type information. The revised definition specifies that literals are viewed as overloaded symbols that, in the absence of additional type information, are given a default representation. Thus, the top-level binding

val x = 1

would give x the type int, while

val x = (1 : LargeInt.int)
val x : LargeInt.int = 1

would both give x the type LargeInt.int. In addition, if f has type LargeInt.int -> unit, the expression f 1 would typecheck.

In general, without additional implicit or explicit type constraints, integer literals default to type int, word literals become type word, real literals become type real, string literals become type string, and character literals become type char.

Note that, after overload resolution has determined a specific representation, literals out of range of that representation should be detected at compile time.
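
The same resolution applies to word literals. The sketch below assumes the Word8 and LargeInt structures of the library; the function toBig is a hypothetical helper introduced only for this example.

```sml
val b : Word8.word = 0wxFF      (* resolved to Word8.word by the constraint *)
val w = 0wxFF                   (* no constraint: defaults to word *)

fun toBig (n : LargeInt.int) = n
val big = toBig 1               (* the literal 1 is resolved to LargeInt.int *)

(* val bad : Word8.word = 0wx100
   The literal is out of range for Word8.word, so a binding like this one
   should be rejected at compile time. *)
```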

Overloaded functions

In addition to overloaded literals, the revised language continues to allow overloading on a restricted set of identifiers. These identifiers include the standard arithmetic and relational operators. A complete list is given in Chapter 3. As with literals, the value identifiers have a default type that is adopted in the absence of any type information supplied by the surrounding context. All overloaded value identifiers default to an int-based type except for the operator /, whose default type is real * real -> real. Thus, the following code would typecheck:

fun f(x,y) = x <= y

val x = (1 : LargeInt.int)
val y = x + 1

fun g x = x + x before ignore (x + 0w0)

with f, y and g having types int * int -> bool, LargeInt.int, and word -> word, respectively.
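
The special default for / can be seen in a further sketch. The function names are illustrative only.

```sml
fun ratio (x, y) = x / y        (* real * real -> real, the default for / *)
fun twice x = x + x             (* int -> int, the default for + *)
fun twiceR (x : real) = x + x   (* real -> real, selected by the constraint *)
```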

Imperative types

As is well-known, imperative features such as ref and polymorphism cannot be combined naively without compromising type safety. Attempts to deal with this problem, using imperative type variables or weak types, have proven unsatisfactory, both because they are complex and unintuitive, and because they violate abstraction by exposing the pure or imperative nature of a computation in its type.

The revised definition of SML adopts value polymorphism to solve this problem. Specifically, in the expression

let val x = e in e' end

x is given a polymorphic type only if e is a syntactic value, i.e., e is a constant, a variable, a lambda expression, or a record, tuple or non-ref datatype value whose component parts are all syntactic values. This solution is not upward-compatible, in that certain expressions that are valid under the original definition will no longer type check. However, there is evidence that this solution is quite viable in practice. Most SML programs already restrict polymorphism to values, and in most cases where non-value polymorphism is used, value polymorphism can be introduced by a small syntactic change. Given the enormous simplification this change effects, value polymorphism seems like the right solution.
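
The kind of small syntactic change mentioned above is typically an eta-expansion. The following sketch, with names chosen only for illustration, shows a binding that would not be generalized and a rewritten form that is.

```sml
fun id x = x

(* val idid = id id
   The right-hand side is an application, not a syntactic value, so idid
   would not be given the polymorphic type 'a -> 'a. *)

(* Eta-expansion makes the right-hand side a lambda expression, which is a
   syntactic value, so idid is generalized. *)
fun idid x = id id x

val n = idid 1        (* idid used at type int *)
val s = idid "one"    (* idid used at type string *)

(* A ref cell is not a syntactic value; here it is given one fixed type. *)
val cell : int list ref = ref []
val () = cell := [1, 2, 3]
```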

Structure sharing

The original definition gave structure sharing a very restrictive meaning. While retaining the original definition of type sharing, the revised definition reinterprets structure sharing as an abbreviation for a collection of type sharing specifications on the common type names among the specified structures.
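
A minimal sketch of the revised reading follows; all signature, functor and structure names are hypothetical. The structures One and Two are distinct, but they agree on the type t, which is all the sharing specification now requires.

```sml
signature S =
sig
  type t
  val x : t
end

signature PAIR =
sig
  structure A : S
  structure B : S
  sharing A = B      (* read as: sharing type A.t = B.t *)
end

(* The list below type checks only because A.t and B.t are shared. *)
functor Collect (P : PAIR) =
struct
  val both = [P.A.x, P.B.x]
end

structure One = struct type t = int val x = 1 end
structure Two = struct type t = int val x = 2 end

structure C = Collect (struct structure A = One structure B = Two end)
```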

Type abbreviations in signatures

Previously, types could occur in signatures only as a simple name or as a datatype definition. Although there are technical reasons for this decision, in practice this is too restrictive. In the revised language, type abbreviations can occur in signatures as well as in structures. There is also a where type notation, which allows a programmer to extend a signature by adding definitions for its type components.
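
Both features might be used as in the following sketch. The signature and structure names are assumptions made for the example.

```sml
signature POINT =
sig
  type coord = int * int      (* type abbreviation inside a signature *)
  type t
  val make : coord -> t
  val coords : t -> coord
end

(* where type adds a definition for the type component t. *)
signature PAIR_POINT = POINT where type t = int * int

structure P : PAIR_POINT =
struct
  type coord = int * int
  type t = coord
  fun make c = c
  fun coords p = p
end

(* Allowed because P.t is known to be int * int. *)
val (px, py) = P.make (3, 4)
```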

Opaque signature matching

To increase abstraction, it will be possible to match structures against signatures such that, unless the signature specifies the definition of a type as a datatype or a type abbreviation, the representation of the type is hidden outside of the structure.
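
In the revised language, opaque matching is written with :> in place of the colon. The sketch below uses a hypothetical COUNTER signature; outside the structure, Counter.t can be manipulated only through the operations the signature provides.

```sml
signature COUNTER =
sig
  type t
  val zero : t
  val next : t -> t
  val toInt : t -> int
end

structure Counter :> COUNTER =
struct
  type t = int
  val zero = 0
  fun next n = n + 1
  fun toInt n = n
end

val two = Counter.toInt (Counter.next (Counter.next Counter.zero))

(* Counter.zero + 1 would be rejected: the representation of Counter.t as
   int is hidden by the opaque match. *)
```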

Special types and values

The boolean constructors true and false, the list constructors nil and ::, and the reference constructor ref are treated specially. They are bound at top-level in the initial environment as datatype constructors, and cannot be rebound. Effectively, this makes them additional keywords, though technically they could be used as names for types, signatures, structures or functors. Note, in addition, that the bool and list types are defined at top-level and not in any module.

Basis library incompatibilities

The SML Basis Library is largely a conservative extension of the basis described in the original definition, but there are a few points of incompatibility worth noting:

Certain exceptions have been eliminated or replaced by new exceptions.
The type carried by the Io exception has changed.
The rules for associativity of infixed operators were fixed.
Non-equality of zero length arrays is specified.
The I/O interfaces. Operations are not at top-level, and some of the functions have changed.
The semantics of overloading.
The implode and explode functions.
The types of ord and chr.
The math functions (sin, etc.) are not bound at top-level.
Real values are now explicitly IEEE floating-point with non-trapping semantics.
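
Some of the points in the list above can be illustrated with a short sketch, assuming the char-based types that the library gives these functions and the Math structure for the math functions. The value names are illustrative only.

```sml
val code  = ord #"A"                (* ord : char -> int *)
val ch    = chr 97                  (* chr : int -> char *)
val chars = explode "abc"           (* explode : string -> char list *)
val str   = implode [#"x", #"y"]    (* implode : char list -> string *)
val root  = Math.sqrt 16.0          (* math functions live in the Math structure *)
```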

The initial basis described in appendices C and D of the revised definition has been pruned to the bare minimum necessary to specify the semantics of the special constants and derived forms described in the definition.

Further information on the differences between the two bases can be found in the SML90 structure.
