# Performance and Test, Spring 2007: Episode 2.1

## Lecture Wednesday, February 7, 13:30—15:30, Room 2A14

Subject : Array searching & Merge-Sort

Text : RS 2.6, 6.0, 6.1 pp. 265—270 (p. 266 is very important for understanding how all the other sorting algorithms are described in RS, while you can safely skip p. 267), RS 6.3—6.4, 8.0—8.1, 8.3, 8.6 [30pp, 3hr30min]

Prerequisites: asymptotic complexity notions from the first lecture.

Comments: We shall introuduce several fundamental algorithms and analyze them using the apparatus from the previous lecture. It is essential that after studying this chapter you have an excellent grasp of linear search, binary search, merge, and merge-sort, including their analysis.

## Exercises, Friday, February 9, 13:00—16:00, Room 2A14

Some of the following exercises will be discussed at the exercise session:

1. RS exercise 2.47 p. 58 [note a typo in the book: instead of (alpha * N) succesful searches assume (alpha * 100%) percentage of succesfull searches]

2. RS exercise 8.1 p. 350

3. RS exercises 8.9, 8.10, 8.11 p. 356

4. Design an efficient algorithm that for a given array of integer numbers would find two numbers a and b contained in it that minimize |a-b| (so find the two closest numbers in a sequence of numbers).

5. Consider `SelectionSort` (see Program 6.12 in RS p. 284). We want to run this algorithm on a hardware architecture where writing to memory is much more expensive than reading (for example: Java Smart Card or anything with a combination of regular memory and flash memory). Assume that reads and writes to regular variables are very fast (negligable cost). Reads from the `a` array are also very fast (negligable cost), but writes to the array are very expensive. Assume that the cost of a single write is given by a constant time w.

1. Calculate the expected running time of selection sort when sorting an array of N elements under the above assumptions.
2. Calculate the expected running time done by insertion sort (Program 6.13 p. 286 in RS) on an array of N elements under the above assumptions (use information given on p. 290 in RS).
3. Which of the two algorithms is more suitable in this situation?

The solutions to the following two exercises should be handed in to the course pigeon hole, before Wednesday, February 14, 13:30 [expected time 2h]

1. Given two sets A and B represented as sorted linked lists describe an efficient algorithm for computing A sym-diff B, which is the set of elements that are in A or in B, but not in both. [after C-11.2 GT p. 534]

2. Consider selection sort again. We run it on a machine with a memory cache. On this machine the variables are kept in registers, but the arrays are kept in main memory and are accessed via the cache. The cache has space enough for B consecutive objects of array a.

If the cell accessed by the algorithm is already in the cache the access time is fast, but if the cell is not in the cache, the whole current contents of cache is discarded and B consecutive objects starting with the accessed cell are loaded from memory to the cache. This is a very slow operation.

Assume that a0 is the running time of a single loop iteration if there is no cache miss (ideal conditions) and that b0 is the time it takes to reload the cache. Find the formula describing the total cost of running selection sort parameterized by these two constants.

Now assume that a is less than 10-3b. Discuss for what values of B and N it is justifiable to only take into account cache misses when predicting the running time of an algorithm.