ML Kit Download  |  Documentation  |  Papers  |  Bugs  |  People  |  About

Home :: Static Interpretation

Static Interpretation

A New Approach to Compiling Standard ML Modules

The Kit uses a novel approach to compilation of Modules. The Kit regards the Modules language as a linking language, which describes how pieces of code are glued together. The dynamic semantics of Modules in the Definition of Standard ML can be regarded as describing an interpreter which combines runtime environments. However, nothing in the dynamic semantics of Modules depends on runtime values. Therefore, the dynamic semantics can be used as a specification of how code fragments can be combined.

By taking this approach, the Kit is able to perform all module-level execution during compilation. Neither structures, nor signatures nor functors exist at runtime. Conceptually, the modules language is simply used for putting together pieces of declarations from the Core Language.

Functors are elaborated when they are declared, as in most ML compilers. Thus type errors in functors are caught already when the functor is declared (as opposed to when it is applied). However, when the Kit has elaborated a functor declaration, it postpones code generation for the functor till the functor is applied. Indeed, if a functor is applied to two different arguments, code for its body will be generated twice.

We refer to this approach to Module compilation as static interpretation.

Static interpretation has important advantages:

Smart Recompilation

Delaying code generation till functor application time is not feasible, unless it is integrated with a scheme for smart recompilation of modules. For an example, consider a program consisting of the following three compilation units:

     unit1:
         functor f(X: ...) = struct ... end
     unit2:
         functor g(Y: ...) = struct ... end
     unit3:
         structure A = f( ... )
         structure B = g(A)

Suppose the user makes a change to the body of functor f in unit1. Some separate compilation schemes will say that since the declaration of f has changed and since f is used in unit3, unit3 has to be recompiled as well. But if every functor application results in generation of code for the functor body in question, then recompilation of unit3 implies recompilation of the body of g - although g has not changed. More generally, if a system is constructed by first declaring a large number of functors and then applying all the functors in one compilation unit at the end, a change to one module might result in recompilation of the entire system. Clearly, this would be unacceptable.

The Kit provides a solution to this problem. To explain the solution, let us cosider the above example once more.

Since the declaration of f has changed and since f is applied in unit3, we do need to re-interpret unit3. Since the declaration of f has changed, the static interpretation of the application f(...) in unit3 results in fresh code for the body of f. However, we might be able to avoid recompilation of the body of g second time around. When the Kit first generates code for the body of g (while interpreting unit3), it stores a snapshot of the compilation environment at the entry point to the body of g. This environment is the combination of the compile time environment that existed when g was declared and the compile time environment for the actual argument to g (i.e., A).

When the Kit interprets unit3 for the second time and reaches the application g(A), it checks whether the current compiler environment enriches the compiler environment previously stored. (This enrichment check only involves the identifiers that are free in the body of g.) If the current compiler environment enriches the previously stored environment, the Kit simply re-uses the previously generated code for the application g(A).

For this to work, the Kit has to try to make the result of the second compilation of the appliation f(...) in unit3 to be the same as the result of the first compilation of that application, if possible. This is non-trivial, because compilation generates ``fresh'' names of various kinds (type names, lambda variables, region variables, and even machine code labels.) Nonetheless, the Kit matches the result of the recompilation to the result of the previous compilation and tries to make them fit.

The theory of this recompilation scheme and a proof of its correctness may be found in Martin Elsman's Ph.D. thesis.


The ML Kit is hosted by the IT University of Copenhagen mlkit@it.edu