Generic C# Sample Programs

Generic C# is an extension of the C# programming language with generic types and methods. Generic C# was developed and implemented in 2000-2003 by Don Syme and Andrew Kennedy at Microsoft Research, Cambridge UK, who also implemented the high-tech support for generics in the Microsoft Common Language Runtime (CLR). They have written a PLDI paper that describes Generic C# and the Generic CLR (Postscript and PDF).

News

Resources

Authorship and legalese

The sample programs shown here were developed by me while visiting Microsoft Research, Cambridge UK in October-December 2001.

These programs are made available here in the hope that they will be interesting to look at, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. They are presented here solely to indicate some programming styles and potential uses of generics in C#. In particular, nobody assumes any responsibility for any legal or real consequences of using these sample programs.

Overview of sample Generic C# programs

The following programs have been implemented in Generic C#:
  1. A class hierarchy representing a tiny language of typed expressions
  2. A convex hull algorithm
  3. Converting regular expressions to deterministic finite automata
  4. Several different interfaces to quicksort
  5. Generic collection classes
  6. Creating and printing an index of word occurrences
  7. A typed parser combinator library
  8. Rings, complex numbers, and polynomials
  9. Generic method memoization
  10. A tricky example due to Erik Ernst

1. Representing typed expressions

See file Phantom.cs

The example is a class hierarchy that implements a language of typed expressions in a typesafe way, that is, a typesafe embedded language. It uses type parameters in much the same way as Leijen and Meijer do, but here the type parameters are not `phantoms'; they play an obvious role as the type of the stored value in literals, and the return type of an evaluation function.

Reference

2. Finding the convex hull in the plane

See file GConvexhull.cs

An algorithm for finding the convex hull in the plane was modified to use generics instead of a specialized sorting function and a specialized linked-list class. It now uses a generic arraysort and a generic cyclic doubly-linked list class.

References

3. Converting regular expressions to deterministic finite automata

See file GNfaToDfa.cs

The program converts a regular expression to a nondeterministic finite automaton, which is converted to a deterministic finite automaton, from which it generates a script for the `dot' graph-drawing program. It uses hashsets of integers, queues of hashsets of integers, hashmaps from integer to hashmap from string to integer, and even hashmap from hashset of integer to hashmap from string to hashset of integer. Having generics really makes the program more understandable as well as safer, and eliminates many typecasts.

The typesafe generic hashset, hashmap, queue and arraylist classes are implemented simply as wrappers around the existing dynamically typed non-generic classes Queue, ArrayList, and Hashtable from System.Collections, and hence do not improve the efficiency of the algorithm, as they would if implemented in a typesafe manner from scratch.

September 2003: This has been rewritten to use the generic collection library of .Net 2.0, and to use properties, foreach, and so on.

4. A comparison of generic and non-generic quicksorts

See file Gsort.cs

We compare four different ways to implement generic quicksort (arraysort) and four different ways to implement non-generic quicksort. This experiment demonstrates the efficiency benefits of generics, besides its type safety benefits.

In all cases the algorithms were used to sort an array of one million positive pseudo-random integers (for string sorting, see below). The execution time was measured for 20 different such data sets on a 1 GHz Pentium III with 256 MB RAM, Windows XP, and CLR 1.0.3417 (Generic) in retail mode (bbt), with shared code disabled.

The results are as follows:

The straightforward non-generic approach to general sorting (store each integer in an IComparable array, and sort the array) is the slowest one, 10.7 times slower than the fastest (but non-general) sorting function. The reason seems to be that the comparison i1.CompareTo(i2) is slow, presumably because i1 and i2 get boxed even before the method is called; and this happens even when i1 and i2 are definitely known to be ints.

Wrapping instead the integers in a class OrderedInt that implements IComparable, defining its own CompareTo method by this expression

   i1 < i2 ? -1 : i1 > i2 ? +1 : 0;
is faster, only 6.6 times slower than the fastest one. The sorting function is exactly the same, so the only difference is that the integer wrapper objects carry a different CompareTo functions. Hence referring to integers through wrappers, and calling functions via interfaces, is not the major source of inefficiency. This approach still requires typecasts and thus is not typesafe.

The straightforward specialized integer sort, using an int array and the built-in integer comparison (<) is, not surprisingly, the fastest one. But it is not general; a handcrafted sorting function is required for every type that needs to be sorted.

The obvious generics version of general sorting defines a generic version IGComparable<T> of the IComparable interface, and passes an array of IGComparable<int> objects, each with a CompareTo function, to sort ints. This is as efficient as the OrderedInt : IComparable approach above (6.6 times slower than the fastest one), but still requires a typecast in CompareTo and thus is not typesafe.

Defining a different generics version IGSelfComparable<T> of IComparable and using a bounded type parameter T : IGSelfComparable<T> in the sorting function, one can avoid the typecasts and obtain a general sorting function. This is not only typesafe but also more efficient: only 5.0 times slower than the fastest one.

Taking instead an approach more similar to functional programming, where the comparer is kept separate from the values, one obtains a solution that is general, typesafe, and highly efficient: only 2.3 times slower than the fastest one. The overhead relative to the fastest (non-general) function is due only to the passing and invocation of the comparer, as was shown by measuring a non-general non-generic integer sort in which the comparer is passes explicitly.

Again taking the functional approach but now implementing the comparer as a delegate causes a slowdown; this is 4.0 times slower than the fastest one.

Here the sorting functions are summarized in the order they are mentioned above. The times are given both for all sharing disables and all sharing enabled in the generic runtime:

GeneralTypesafe GenericArraytypeComparer Time/s
No sharing Sharing
yesnonoIComparable[]IComparable4.954.99
yesnonoOrderedInt[]IComparable3.033.08
noyesnoint[]<(int,int)0.460.47
yesnoyesIGComparable[]IGComparable3.083.15
yesyesyesIGSelfComp[]IGSelfComparable2.332.45
yesyesyesT[]IGComparer1.041.14
noyesnoint[]IntComparer1.041.06
yesyesyesT[]DGComparer1.831.91

Enabling sharing (of method instances etc) in the generic runtime slows down the generic versions by 5 to 10 per cent.

Making wrapper classes sealed does not seem to affect these results.

In conclusion, only generics offer generality as well as type safety, and (much) improved efficiency: the best generic typesafe solution is 2.9 times faster than the best non-generic (non-typesafe) object-oriented solution, and 4.7 times faster than the non-typesafe solution that many programmers would naively write.

We repeated the above experiments on arrays of strings instead of integers, and sorted 200000 random strings of random length between 5 and 19, averaging execution time over twenty repetitions:

GeneralTypesafeGenericsArraytypeComparerTime/s
yesnonoIComparable[]IComparable3.18
yesnonoOrderedString[]IComparable2.58
noyesnostring[]string.Compare2.10
yesnoyesIGComparable[]IGComparable2.57
yesyesyesIGSelfComp[]IGSelfComparable2.54
yesyesyesT[]IGComparer2.19
noyesnostring[]StringComparer2.19
yesyesyesT[]DGComparer2.83

The difference in execution times is not as dramatic as for integers, reflecting that strings need to be boxed anyway, and that their comparison requires a function call anyway. But still, the only way to obtain typesafety and generality at the same time is to use generics, and they in addition provide faster execution than the usual object-based general but non-typesafe functions. The functional-style generic sort (passing the comparer explicitly) at 2.19 sec is less than 5 per cent slower than a custom string sort at 2.10, and the measurements show that the only overhead is in passing the comparer; the use of generics in itself cost nothing.

5. Generic collection classes

Note: For serious work, use the C5 generic collection library for C# instead of these rudimentary generic collection classes.

If you want to look at these rudimentary collection classes anyway, the implementation is in file GCollections.cs and the documentation is in file collections.txt.

A namespace GCollections for representing generic collection classes has been defined. It provides

The HashSet and HashMap implementations currently just provide typesafe wrappers around an instance of System.Collections.Hashtable but should be implemented from scratch.

The possibility of passing parameters by reference in C# helps writing recursive functions that modify trees in the implementation of red-black trees.

Java's and C#'s object-based collections (Maps) return null when an element is not found. In general this does not work for generic collections (e.g. when the element type is int), so we throw an exception instead. Actually, the null idea is dubious even in object-based collections (as in Java), because one may deliberately want to store a null value in a table, and subsequently cannot distinguish the presence of a null from the absence of any value at all.

Enumerators are made to resemble .NET Framework collections: they are `fail-early' so that MoveNext throws InvalidOperationException if the collection has been structurally modified (the current .NET Framework documentation does not make it clear whether only structural modifications matter, or also updates to collection elements). Also, Current throws InvalidOperationException before MoveNext has been called, and after it has returned false. (Current .NET Framework documentation does not say what Current is supposed to do after MoveNext has thrown an exception).

There is an interaction between object-oriented method overloading and generics: When instantiating Linkedlist<T> or ArrayList<T> with T:=int, there will be two Remove(int) methods. This gives a compile-time error only when one tries to call a Remove(int) method, but not when declaring it. Presumably the compiler could work out when there exists an instantiation that would cause problems.

Sadly, generic TreeSet with parameter T : IComparable is considerably slower than object-based Rbset, because Rbset boxes the integer once, before passing it to Add(IComparable), and then just invokes CompareTo on the boxed int forever after, whereas the generic TreeSet stores the integer unboxed and then boxes at every comparison (of which there are many). One should not base generic collections on IComparable; it is a performance trap. Or the JIT should inline a three-way int comparison for the expression o1.CompareTo(o2) when o1 and o2 have compile-time type int; and reduce o1.CompareTo(o2) < 0 to o1 < o2. (This is safe because Int32 is a struct and cannot be subclassed).

TreeSet<T> is implemented simply as a TreeMap<T,int> in which the int is always zero. When is GC# allows instantiation with void, possibly it could be made TreeMap<T,void>, and with luck, no storage will be set aside for the absent value.

TreeMap<K,V> works for the following four cases:

This is implemented by moving the four operations that perform comparisons (get, add, contains, del) to auxiliary TreeOps classes. These operations then need to call auxiliary static generic methods in the Node class (lbal, rbal, balleft, balrght, append).

The TreeMap generic class itself then simply contains an ITreeOps<K,V> field and the requisite public methods, that call the get, add, contains and del methods in the ITreeOps object. TreeMap has no need to look at comparisons etc.

Only the methods in the four classes implementing ITreeOps use comparisons, and they are guaranteed to have elements that implement System.IComparable or to have an explicit comparer of the right type. Hence compile-time typesafety.

This scheme causes duplication of the code for the four methods add, del, get, contains, but that seems unavoidable. At least it seems impossible to share code between the IComparer and IComparable versions.

At first it seems possible that the System.Collections.IComparer and the IComparer<T> versions could share the same source code, at the expense of wrapping up the System.Collections.IComparer as an IComparer<object>, and passing an extra type parameter to GTreeOps<K,V,T>, where T=object in the object-based case, and T=K in the generics case. But GTreeOps<K,V,T> cannot compile, because there is no implicit or explicit conversion between two type parameters K and T that are not related by a constraint, and one cannot use a naked type parameter T as a constraint on K. One could use Antonio Cisternino's trick: perform a cast via class object, as in (T)(object)k where k has type K. However, that would imply a boxing when K is a value type, and in the present context completely defy the purpose of having a type instance at the value type K.

Efficiency is preserved because all methods are non-virtual in the TreeOps classes. There is a cost for the call through the interface ITreeOps, but every operation on a collection implies at most one such call.

I have done no performance engineering of the collection implementations, but the performance of the object-based version of red-black trees (under the generics retail build, but using no generics) is comparable to that of Java 2 JDK 1.4beta2 TreeSets (slightly faster for int objects, slightly slower for string objects, but that turns out to be because CLR's string comparison is surprisingly slow; a problem that will probably be fixed).

6. Creating and printing an index of word occurrences

See file Fileindex.cs

The indexer program reads a text file and outputs a sorted list of all words in the file, and for every word, a sorted list of all the lines on which the word occurs. Using System.Text.RegularExpressions, System.IO.TextReader, and the TreeMap and TreeSet classes, this can be done very efficiently in only 23 lines of code. A text file of 400 KB or 10700 lines can be processed in ca. 6 seconds, including the writing to a file of the list of all occurrences of the 3600 different words. The presentation of the corresponding Pascal program (using records, pointers, etc) took up an entire chapter in Welsh and Elder's textbook.

7. A typed parser combinator library

See file Parsers.cs

We have defined a small typed parser combinator library, as known from functional languages, that reads from a character source. A parser combinator of type Parser<T> either returns Succ<T>(result, rest) where result of type T is the result of parsing a prefix of the source, and rest is the remainder of the source; or it returns Fail<T>(). Various auxiliary types are needed to return pairs and sums, to transform one type of parser into another etc. The generic LinkedList<T> is used to store the result of the Star<T> and Plus<T> parser combinators.

Recursive parsers are constructed using a technique that could possibly be used for other typed fixpoints, eg. for defining well-typed XML processors. Basically, a protoparser, an object containing a function that maps a Parser<T> to a Parser<T>, must be constructed first. Then a recursive Parser<T> is created as an object rp whose initialization passes the object rp itself to the protoparser's Build function, and binds the resulting Parser<T> in a field tp in the object. All calls to the Parse of rp function simply invoke tp.Parse, and tp.Parse in turn can invoke rp.Parse recursively.

As an example, a simple parser for Lisp S-expressions (trees) is built.

8. Generic memoization of methods

See file Memoization.cs

We implement function argument memoization for recursive functions in two ways:

Functional-style, where a method to be memoized must be written as a function transformer, essentially of type (A -> R) * A -> R, represented by a ProtoMethod<R,A> delegate. Given a ProtoMethod<R,A>, the memoization class Memoize<R,A> then recursively defines a function, essentially of type A -> R, as the memoized fixed-point of the function transformer, thus implementing the interface Method<R,A>.

The use of delegates permits mutually recursive memoized methods, and memoization of static as well as non-static methods.

   public interface Method<R, A> {
     R call(A a);
   }

   public delegate R ProtoMethod<R, A>(Method<R, A> method, A a);

   class Memoize<R, A> : Method<R, A> {
     private HashMap<A, R> table;
     private ProtoMethod<R, A> protomethod;

     public Memoize(ProtoMethod<R, A> protomethod) {
       this.protomethod = protomethod;
       table = new HashMap<A, R>();
     }

     public R call(A a) {
       if (table.Contains(a))
	 return table[a];
       else 
	 return table[a] = protomethod(this, a);
     }
   }
An alternative implementation, using an abstract class and subclasses, initially looks simpler, but does not permit memoization of static methods, nor of mutually recursive methods.

9. Rings, complex numbers, and polynomials

See file Polynomial.cs

Inspired by Meyer's Ring/ComplexRing/VectorRing/ComplexVector example (taken from Thorup 1997) we model rings, complex numbers and the ring of polynomials over a ring using recursive constraints (F-bounded polymorphism). This seems to work well, but there are two problems: (1) the impossibility of obtaining an element of the type described by a type parameter de novo, without already having some element to clone from; and (2) the lack of an official way to obtain a default element from a type parameter. Problem (1) exist in plain C# and Java also: constructors are not inheritable, and static methods cannot be described by interfaces or abstract classes.

Thus a ring element must have a method that permits us to obtain the zero of the ring. For this purpose it is practical to require the array cs of coefficients of a polynomial to have at least one member, but it can be avoided by more careful coding of Times (because the Zero polynomial can be represented by an empty array) and Evaluate (because an argument of type E is given, from which a zero result can be obtained if the polynomial is zero).

This example demonstrates F-bounded polymorphism, and a somewhat fancy subclass relation for Polynomial<E>.

   interface Ring<E : Ring<E>> {
     E Zero();     // Lack of constructor or static specification
     E Plus(E e);
     E Times(E e);
     E Negate();
     E Minus(E e);
   }

   struct Complex : Ring<Complex> { ... }

   abstract class RingC<E : Ring<E>> : Ring<E> { ... }

   class Polynomial<E : Ring<E>> : RingC<Polynomial<E>> { 
     // Coefficients of x^0, x^1, ...; absent coefficients are zero.
     // Invariant: cs != null && cs.Length >= 1, so cs[0].Zero() is a zero for E.
     private readonly E[] cs;  
     ...
   }

References

10. A tricky example due to Erik Ernst

We try to model Erik Ernst's typed observer pattern example. Thorup renders the example using virtual types (type members that can be specialized/further specified in subclasses). In the end I don't think it can be made typesafe, neither with virtual types nor with Generic C#.

We want Observers to listen to notifications from Subjects, but only Subjects of type SType. Subjects talk to Observers, but only to Observers of type OType:

   interface Observer {
     typedef SType as Subject;
     void notify(SType subj);
   }

   class Subject {
     typedef OType as Observer;
     OType[] observers;
     void notifyObservers() {
       foreach (OType observer in observers) 
         observer.notify(this);
     }
   }
Subclassing Observers and Subjects together, we should be able to obtain a typesafe interaction between Observers and Subjects:
   interface WindowObserver extends Observer {
     typedef SType as WindowSubject;
   }

   class WindowSubject extends Subject { 
     typedef OType as WindowObserver;
     void Resize(int w, int h) { ... notifyObservers() ... }
   }
So a class implementing WindowObserver should receive notifications only from subjects that are at least WindowSubjects, fine. However, let us further specialize these types:
   class FileWindowObserver implements WindowObserver {
     typedef SType as FileWindowSubject;
     void notify(SType fws) { 
       ... fws.ChosenFile ...
     }
   }

   class FileWindowSubject extends WindowSubject { ... }
Now it seems that with the usual type rules, an object of type FileWindowObserver can appear in the array called observers in a WindowSubject, and so it seems that a plain WindowSubject can pass itself to a FileWindowObserver, which cannot at all be the programmer's intention? Thorup does not discuss this. According to Mads Torgersen, a runtime checkcast is needed, either at every call to FileWindowObserver.notify() --- the method itself may do this checkcast --- or when storing the FileWindowObserver in the observers array. Hence it is not clear what Thorup's example demonstrates.

Here is my naive attempt to fix this type safety problem using recursive constraints in GC#:

   interface Observer<S : Subject<Observer<S>>> {
      void notify(S subj);
   }

   class Subject<O : Observer<Subject<O>>> {
     LinkedList<O> observers;
     void notifyObservers() {
       foreach (O observer in observers) 
         observer.notify(this);
     }
   }
The idea is that an observer of type O = Observer<S> listens to an observed subject of type S = Subject<O>, which may notify the observer. An observed subject of type S = Subject<O> can notify all its observers, each of which must have type O = Observed<S>, by sending itself to the observer.

But clearly (in retrospect), the above GC# declarations are not legal, because it cannot be verified that the Observer and Subject respect each other's type parameter constraints.

If the constraints were equalities (which obviously would not work for other reasons), and type equations have unique (maximal) solutions, the example should work. Namely,
given S = Subject<Observer<S>>
and O = Observer<Subject<O>>
we could conclude from

   Observer<S> = Observer<Subject<Observer<S>>>
that Observer<S> = O and so clearly
   Observer<S> : Observer<Subject<O>> 
as required in the constraint on O in Subject. Similarly for the constraint on S in Observer.

But with (:) being subtyping, we cannot reason that way.
From S : Subject<Observer<S>>
and O : Observer<Subject<O>>
we cannot conclude that

   Observer<S> : Observer<Subject<O>>
because we only know S : Subject<Observer<S>>, not S = Subject<O>. It seems that co-variant subtyping on polymorphic classes would not help either.

Reference


Peter Sestoft (sestoft@itu.dk) 2002-02-23, 2002-11-13, 2004-10-06