line art

Introduction to Algorithms and Data Structures, Episode 4

line art
Fall 2005 Course Description Schedule Resources
line art

Lecture, September 21

Subject : Radix-Sort

Text : Read CLRS 8.3—8.4 and the following paper:

Arne Andersson. Stefan Nilsson. Implementing radixsort. Journal of Experimental Algorithmics (JEA). Volume 3, 1998. Article No. 7 Year of Publication: 1998 ISSN:1084-6654.

Notice that you need to be at ITU to download the pdf from ACM. ITU has an institution-wide registration, so ACM will not ask you for a password if you download from an ITU machine. Feel free to skip section 3 on Adaptive Radix Sort while reading. Also note that the paper is accompanied by the source code of the algorithms implemented in C. The code may help you to understand intricacies of the algorithm descriptions and you get to play with it :).

Some pseudocode is available. Please report bugs, if you see any.

Prerequisites : big-O notation, counting sort, stable/unstable sorting

Time : Wednesday, 21 September, 9:00–12:00, Room 4A12

Tutorial : There will be no tutorial following this lecture (on September 28!). Work on the contest instead.

line art

Assignments

line art

The Grand IADS Programming Contest 2005 !

Task

Implement a mutation of Radix-Sort that will outperform implementations of your colleagues. You are allowed to use any modifications proposed in the paper and add your own ideas. It is important though that you algorithm remains radix-based, i.e. sorts on digits of specific radix separately. A little simplification (but also a twist) is that we will be implementing an array-based version of algorithm—not the version based on linked-lists. It is not requried that your implementation is stable.

You are supposed to sort an array of objects of type Datum. Each of such objects has a key field of type String. The array you return must contain the same objects as the input aray, but reordered so that it is sorted lexicographically on the key field i.e. "Copenhagen" comes before "Denmark", and "note" comes before "notebook".

The lexicographical ordering is induced by order of characters: use the order of ASCII codes as definitive (no special collate sequences, for those that know about them). You may assume that all the characters have codes between 0 and 255 inclusive. When creating benchmarks be careful not to introduce any special characters, as due to Java representing characters in Unicode, you may get higher codes than 255.

The strings for keys are coming from a text file. You will get a little runtime library, which will read the text files for you, create the array and start your sorting algorithm.

The fact that we sort arrays does not preclude using other data structures, like lists, during sorting. Nevertheless the intuition (which might be wrong) indicates that it is fastest to use the array directly, if an array has to be returned.

Reward and Evaluation

The winner gets a bottle of a decent red-wine and a doze of everlasting fame.

But listen now... everybody gains, not only the winner! Your solution will be evaluated. You will get feedback on its quality. If it is correct and its running time is linear in the size of the input array you will get up to 100 points as for a regular weekly assignment. You may choose not to run in the contest, and you can still get maximum points in the exam qualification evaluation, but if you run in the contest, your contest result is added to your assignment points. These bonus points are not counted as a regular assignment, so they make it easier for you to achieve the exam entry treshold.

The winner of the contest is the author (authors) of the correct program that fullfils requirements of the task and performs fastest on a series of benchmarks. Some benchmarks will be made available here.

Any student of IADS 2005E can run in the contest. You may run alone or in group. The same rules as when handing in assignments. If you run in a group everybody gets a potential bonus, but you only get one bottle of wine :).

Technicalities

You must implement your a solution as a Java class that conforms to this interface, with this definition of the Datum class. All benchmarking will be done fully automatically, so if you fail to comply with this rule, you get automatically disqualified.

As you can see in the definition of the Datum class, you will be sorting data records, with keys expressed as strings, and some unknown data attached. Of course you are supposed to preserve this unknown data, but stability is not required. Your main class file should be have a comment on top, describing, who are the group members, what is the variant of the algorithm you have used, with what improvements, etc. This comment plays an important role in assigning your bonus points.

You are not allowed to use any sorting algortihms from Java Standard Library, or to call any external sorting programs. You do not have to use (but you are allowed to use) the Datum.compareTo function in your submission. Note that this makes it possible to test/benchmark against comparison based sorting algorithms.

Do not use Java 5 features. You are allowed to assume that all characters in the input strings have codes below 255.

I will compile the your code myself, using Sun's JDK 1.4.2 with an option "-source 1.4". Code that does not compile, or throws exceptions on test cases, or returns an unsorted array is automatically disqualified (both for the reward andfor the assignment points).

Submission

The deadline for submission is Monday, October 3, 13:00. You should submit by email to Andrzej Wasowski (wasowski@itu.dk). You are expected to send only .java files (perhaps only one, but more is allowed if you need auxiliary classes).

Contest Package 0.04

Download the following jar file: Contest.jar and put in your current development directory.

We will use the DummySorter class as an example. Please compile it with:

javac -source 1.4 -classpath Contest.jar DummySorter.java

Then you can run it on the sample test (for example download ulyss10.txt) like that:

java -Xcomp -Xbatch -jar Contest.jar DummySorter ulyss10.txt 5

The contest package will try to create and run the DummySorter 5 times and measure the average running time. Each call to sort() is applied to a newly created instance of your sorter. The creation time is counted in the same way as sorting, so do not plan any expensive stuff going on in the default constructor of your sorter.

In this case (DummySorter), the contest package will report an error, instead of the time, but if your sorter actually sorts, you will get a measurement on output. I recommend using -Xcomp -Xbatch options, when benchmarking (they should increase reliabality of benchmarks).

It is illegal to try to reuse information in the sorter across the runs (i.e. the sorter class should always do the sorting, and not only sort the thing the first time and keep the result aside, just returning it for remaining 4 calls).

The content package tests whether your output is sorted, i.e. it tests lexicographical ordering on keys in the output array (using String.compareTo method) and whether the output is a permuation of the input.

Benchmarks

The following is a complete list of benchmarks used in the competition.

Benchmarks used for correctness tests: empty, nasty, short. Benchmarks used both for testing correctness and running time: ulyss10, uniq, camera, numbers1, numbers2.

The winner must correctly run on all the benchmarks. Each program will be benchmarked on all 5 efficiency benchmarks listed above. Programs will be ordered by the position they take in each benchmark and awarded points equal to the positions taken in all benchmarks. The winner is the program that is assigned least number of points.

History