GENERIC COLLECTIONS FOR GENERIC C# ================================== sestoft@itu.dk * 2001-12-02, 2003-08-11 [NOTE added 26 July 2004: The collection library C5 is a redesigned and vastly extended and improved descendant of this library; get it from http://www.itu.dk/research/c5/] This is an implementation of generic collections for Generic C#. Only the most basic operations are provided, but they have the types and the asymptotic complexity one would expect. They have not been systematically tested yet. The general design is inspired by that of object-based collections in Java 1.2, which is more complete and has more convenient names than System.Collections in the .NET Framework. This may be awkward for non-technical reasons, but has the virtue that Java programmers can use the present interfaces and classes with little effort. Some changes have been made, partly to exploit C#-specific features such as properties and indexers, and partly because in generic collections one cannot use the return value null to signify the absence of a result. We first give an overview of the extent of the implementation and then describe the contracts imposed by interfaces, and the asymptotic complexity of the implementations, in more detail. 1. OVERVIEW OF THE COLLECTION LIBRARY ===================================== All these are public members of the namespace GCollections. Enumeration (iteration) ----------------------- IEnumerator produces T values; supports the foreach statement; IEnumerable produces an IEnumerator; Collections ----------- ICollection a collection of T elements; extends IEnumerable Lists, stacks and queues ------------------------ IList a collection that preserves insertion order; extends ICollection LinkedList a doubly-linked list implementation; suitable for implementing a queue; implements IList ArrayList an array-based list implementation; suitable for implementing a stack; implements IList Sets ---- ISet a duplicate-free collection of T elements; extends ICollection ISortedSet a set whose elements are ordered; extends ISet HashSet an unsorted hashtable implementation of sets; implements ISet TreeSet a binary tree implementation of sets; implements ISortedSet OTreeSet a binary tree using object-based T : IComparable; extends TreeSet GTreeSet a binary tree using generic T : IComparable; extends TreeSet Maps (dictionaries) ------------------- IMap a map from keys K to data values V; extends ICollection< MapEntry > ISortedMap a map whose keys are ordered; extends IMap { } HashMap an unsorted hashtable implementation of maps; implements IMap TreeMap a sorted binary tree implementation of maps; implements ISortedMap OTreeMap a binary tree using object-based K : IComparable; extends TreeMap GTreeMap a binary tree using generic K : IComparable; extends TreeMap MapEntry a struct that is a pair of a key and a value Comparison ---------- IComparer three-way comparer of two T values IComparable three-way comparison to a T value 2. DETAILED SPECIFICATION OF THE INTERFACES AND CLASSES ======================================================= The collections are not thread-safe. The IEnumerator interface (iteration) ---------------------------------------- public interface IEnumerator : System.Collections.IEnumerator { T Current { get; } bool MoveNext(); void Reset(); } Current returns the current element of the enumerator; throws InvalidOperationException before the first call to MoveNext, after a call to Reset, and at the end of the enumeration (when MoveNext has returned false). MoveNext() advances the enumerator; returns true in case there was an element to advance to, false if there were no more elements. Reset() resets the enumerator to before the first element. The IEnumerable interface (iteration) ---------------------------------------- public interface IEnumerable : System.Collections.IEnumerable { IEnumerator GetEnumerator(); } GetEnumerator() returns an enumerator. The enumerator adheres to the .NET conventions. Thus an enumerator created from a collection fails early (by MoveNext throwing InvalidOperationException) if the underlying collection has been (structurally) modified. Creating an enumerator from a collection is a constant time operation. The ICollections interface ----------------------------- public interface ICollection : IEnumerable { int Count { get; } } Count returns the number of elements in the collection. This is a constant-time operation. All collection classes override the methods GetHashCode() and Equals(object) inherited from class object, calling the GetHashCode() and Equals(object) methods of their elements as further specified below. The IList interface ---------------------- public interface IList : ICollection { bool Add(T item); bool Add(int i, T item); T Remove(); T RemoveAt(int i); T Remove(T item); bool Contains(T item); // using Equals from object T this[int index] { get; set; } } An IList is a collection of T elements that preserves insertion order. The implementing classes LinkedList and ArrayList are well-suited for implementing double-ended queues (fifo buffers) and stacks (lifo buffers) respectively. Add(t) adds t to the end of the list; this is a constant-time operation. Returns true. Add(i, t) inserts t before position i in the list, counting from 0. Returns true. Throws IndexOutOfRangeException unless 0<=i<=Count. Remove() removes and returns an element of the list. Generally takes the elements that is cheapest to remove: the first element of a LinkedList, the element of an ArrayList. Take constant time. Throws IndexOutOfRangeException if the list is empty. RemoveAt(i) removes and returns element number i in the list, counting from zero. Throws IndexOutOfRangeException unless 0<=i or if does not have Count elements. Otherwise, evaluates x.Equals(y), drawing elements x from this list and y from lst2 in order, until it gives false or there are no more elements. Returns false in the former case, and true in the latter. GetEnumerator() returns an enumerator over the list, which produces the elements in their list order. The LinkedList class ----------------------- public class LinkedList : IList { ... public LinkedList(); public bool AddLast(T item); public bool AddFirst(T item); public bool RemoveFirst(); public bool RemoveLast(); } A LinkedList is an implementation of IList as a doubly-linked list. It is well-suited for implementing double-ended queues (fifo buffers), because Add inserts elements at the end, and Remove takes them off the front of the list. new LinkedList() returns an empty list. Add(t) adds t to the end of the list. Takes constant time. AddLast(t) is the same. Add(i, t) inserts t before position i in the list, counting from 0. Throws IndexOutOfRangeException unless 0<=i<=Count. Takes time O(min(i, Count-i)). AddFirst(t) is Add(0, t). Takes constant time. Remove() removes and returns the first element of the list; throws IndexOutOfRangeException if the list is empty. Takes constant time. RemoveFirst() is the same. RemoveAt(i) removes and returns element number i in the list, counting from zero. Throws IndexOutOfRangeException unless 0<=i class ---------------------- public class ArrayList : IList { ... public ArrayList(); public bool AddLast(T item); } An ArrayList is an implementation of IList using an underlying array. It is well-suited for implementing stacks (lifo buffers), because Add inserts elements at the end, and Remove takes them off the end also. new ArrayList() returns an empty list. Add(t) adds t to the end of the list. Takes constant time. AddLast(t) is the same. Add(i, t) inserts t before position i in the list, counting from 0. Throws IndexOutOfRangeException unless 0<=i<=Count. Takes time O(Count-i). Remove() removes and returns the last element of the list; throws IndexOutOfRangeException if the list is empty. Takes constant time. RemoveAt(i) removes and returns element number i in the list, counting from zero. Throws IndexOutOfRangeException unless 0<=i interface --------------------- public interface ISet : ICollection { bool Add(T item); // return true if item was added T Remove(T item); // return removed item bool Contains(T item); } An ISet is a duplicate-free collection of T elements. The concept of element equality (and thus the notion of `duplicate') depend on what kind of set is considered. A HashSet uses the element's Equals method inherited from object. A SortedSet uses some form of element ordering (three-way comparison). Iteration over the collection is done in some order determined by the representation; for ISortedSets it is in element order. Add(t) attempts to add t to the set. Returns true if t was added, false if t was in the set already. Remove(t) attempts to remove t from the set. If there is some element x in the set for which x.Equals(t), returns x; otherwise throws ElementNotFoundException. Contains(t) returns true if there is some element x in the set for which x.Equals(t); otherwise returns false. Equals(set2) returns false if set2 is not an ISet or if set2 does not have Count elements. Otherwise checks that every element of this set is in set2; returns true if so, otherwise false. GetHashCode() returns the sum of the hashcodes of the set's elements. The HashSet class -------------------- public class HashSet : ISet { ... } A HashSet relies on the GetHashCode() and Equals(object) methods of the element type T. These must satisfy x.Equals(y) implies x.GetHashCode() = y.GetHashCode() as usual. Methods Remove and Contains take constant time. Method Add takes amortized constant time. Note that because these object-based functions are used, HashSet will be no more (and no less) efficient for primitive types than object-based hashtables. In fact, currently the implementation uses the .NET Framework System.Collections.Hashtable. The ISortedSet interface --------------------------- public interface ISortedSet : ISet { } An ISortedSet is a set whose elements T are ordered, either because T implements object-based IComparable (OTreeSet), or because T implements IComparable, or because an explicit comparer method was provided for T when the set was created. Enumeration of an ISortedSet produces its elements according to the ordering on T. The IComparer interface -------------------------- public interface IComparer { int Compare(T v1, T v2); } Compare(v1, v2) must return a negative number if v1 is less than v2, zero if v1 equals v2, and a positive number if v1 is greater than v2. This is a typed version of System.Collections.IComparer. The IComparable interface ---------------------------- public interface IComparable { int CompareTo(T that); } v1.CompareTo(v2) must return a negative number if v1 is less than v2, zero if v1 equals v2, and a positive number if v1 is greater than v2. This is a typed version of System.IComparable. The TreeSet class -------------------- public class TreeSet : ISortedSet { public TreeSet(System.Collections.IComparer comparer); public TreeSet(IComparer comparer) ... } A TreeSet is an ISortedSet implemented using red-black balanced ordered binary trees. Operations Add, Remove and Contains all take time O(log n) where n is the size of the set. The elements T must be ordered. The element ordering can be specified in one of four different ways: 1. Type T implements object-based System.IComparable. Use subclass OTreeSet 2. Type T implements generic IComparable. Use subclass GTreeSet> 3. An explicit object-based comparer System.Collections.IComparer is given when the set is created. Use the TreeSet constructor TreeSet(System.Collections.IComparer) 4. An explicit generic comparer IComparer is given when the set is created. Use the TreeSet constructor TreeSet(IComparer) The IMap interface ----------------------- public interface IMap : ICollection< MapEntry > { bool Add(K key, V val); MapEntry Remove(K key); V this[K key] { get; set; } bool Contains(K key); } An IMap is a mapping (dictionary) from keys of type K to values of type V. A pair (k,v) in a mapping is called an entry. A map cannot contain duplicate entries (k, v1) and (k, v2) for the same key k. The concept of key equality (and thus the notion of `duplicate') depend on what kind of map is considered. A HashMap uses the key's Equals method inherited from object. A SortedMap uses some form of key ordering (three-way comparison). Count returns the number of entries in the map. This takes constant time. Add(k, v) adds the entry (k,v) to the map if it is not there already, in which case it returns true; otherwise it returns false and leave the map unchanged. Remove(k) attempts to an entry with key k from the map. If there is some entry (x,y) in the map for which x equals k, returns (x,y); otherwise throws ElementNotFoundException. Contains(k) returns true if the map contains an entry (x,y) for which x equals k, otherwise false. map[k] returns the value associated with key k if any, otherwise throws ElementNotFoundException. (The indexer is used as rvalue). map[k] = v replaces the value associated with key k with v, if any value was associated with k; otherwise adds the entry (k,v) to the map. (The indexer is used as lvalue). GetHashCode() returns the sum of the hashcodes of the entries in the map. Equals(map2) returns false if map2 is not an IMap or does not have Count entries. Otherwise checks that every entry in this map is in map2; returns true if so, otherwise false. GetEnumerator() returns an enumerator that produces all the map's entries, as MapEntry structs. The MapEntry struct ------------------------ public struct MapEntry { public MapEntry(K key, V val); public K Key { get; } public V Value { get; } } A MapEntry is a pair of a key and a value. new MapEntry(k,v) produces the entry (k,v). Key is the key k in an entry (k,v). Value is the value v in an entry (k,v). The HashMap class ---------------------- public class HashMap : IMap { ... } A HashMap relies on the GetHashCode() and Equals(object) methods of the key type K. These must satisfy x.Equals(y) implies x.GetHashCode() = y.GetHashCode() as usual. Methods Remove and Contains take constant time. Method Add takes amortized constant time. Note that because these object-based functions are used, HashMap will be no more (and no less) efficient for primitive types than object-based hashtables. In fact, currently the implementation uses the .NET Framework System.Collections.Hashtable. The ISortedMap interface ----------------------------- public interface ISortedMap : IMap { } An ISortedMap is a map whose entries are ordered, either because the keys K implement object-based IComparable (OTreeMap), or because K implements IComparable, or because an explicit comparer method was provided for K when the map was created. Enumeration of an ISortedMap produces its entries according to the ordering on T. The TreeMap class ---------------------- public class TreeMap : ISortedMap { public TreeMap(System.Collections.IComparer comparer); public TreeMap(IComparer comparer); ... } A TreeMap is an ISortedMap implemented using red-black balanced ordered binary trees. Operations Add, Remove, Contains, map[k] and map[k] = v all take time O(log n) where n is the number of entries in the map. The keys K must be ordered. The key ordering can be specified in one of four different ways: 1. Type K implements object-based System.IComparable. Use subclass OTreeMap 2. Type K implements generic IComparable. Use subclass GTreeMap> 3. An explicit object-based comparer System.Collections.IComparer is given when the map is created. Use the TreeMap constructor TreeMap(System.Collections.IComparer) 4. An explicit generic comparer IComparer is given when the map is created. Use the TreeSet constructor TreeMap(IComparer)