Advanced Models and Programs F2009 ---------------------------------- sestoft@itu.dk 2009-02-18 ASSIGNMENT 4 (pointers, MicroC, JVM/CLR bytecode) There are five subproblems that should be adequately answered. In addition there are two bonus problems (4.6A and 4.7) that need not be answered. The assignment is due Wednesday 25 March 2009. You may hand in a physical document briefly describing the necessary changes in Danish or English, showing the code changes you have made, and stating your name and the assignment number. You may instead submit the resulting modified files to sestoft@itu.dk, with easily searchable comments clearly indicating what changes you have made, and what subproblem is addressed by each change. GETTING STARTED Download the following files: * MicroC.ATG, the lexer and parser specification for MicroC * MicroC.cs, the checker and compiler for MicroC * Machine.cs, the abstract stack machine * test-c.zip, an archive with illustrative MicroC programs To check that everything works out of the box, do the following: Coco -namespace MicroC MicroC.ATG -- Ignore warnings from Coco csc Micro.cs Scanner.cs Parser.cs unzip test-c.zip Micro test5.c check -- Should print "Program OK!" Micro test5.c compile csc Machine.cs Machine a.out -- Should print "6" Machine /trace a.out -- Should print trace and "6" PROBLEM 4.1: Write, compile and run MicroC programs that do the following: * Write a MicroC program answer5-1-sum.c containing a function void sum(int n, int ns[], int *sump) that returns, using the sump reference, the sum of the first n numbers of array ns. Your main function must create an array holding the four numbers 7, 13, 9, 8, call function sum on that array, and print the result. Remember that MicroC is very limited compared to C: You cannot write "int i=0;" but must write "int i; i=0;" instead; there is no for-loop (unless you implement one, see Problem 4.3); and so on. Also remember to initialize all variables and array elements; this doesn't happen automatically in Micro-C or C. * Write a MicroC program answer5-1-histogram.c containing a function void histogram(int n, int ns[], int max, int freq[]) such that after a call to histogram, freq[c] equals the number of times that number c appears among the first n elements of ns, for 0<=c<=max. You can assume that all numbers in ns are between 0 and max, inclusive. [What happens if the latter is not true?] That is, if your main function creates an array ns holding the seven numbers 1 2 1 1 1 2 0 and calls histogram(7, ns, 3, freq), then afterwards freq[0] is 1, freq[1] is 4, freq[2] is 2, and freq[3] is 0. Of course, freq must be an array with at least four elements. [What happens if it is not?] The array freq should be declared and allocated in the main method, and passed to histogram function. It does not work (in C) to allocate the array in histogram and somehow return it to the main method. Your main method should print the contents of array freq after the call. PROBLEM 4.2: Modify MicroC to allow equality comparison (==, !=) of a pointer and an integer. This is useful only for checking whether a pointer is 0 (null), by convention meaning that it does not point at anything. You only need to modify MicroC.cs. Test that this works. [Note that in Java and (safe) C#, if a pointer p is not null, then it is guaranteed to point at something useful (an object). In MicroC, C and C++ there's no such guarantee.] PROBLEM 4.3: Extend MicroC with a for-loop, permitting for instance for (i=0; i<100; i=i+1) sum = sum+i; To do this, you must modify the parser specification in MicroC.ATG. You can also extend the MicroC abstract syntax in MicroC.cs by defining a new subclass ForLoop : Statement, with suitable Check and Compile methods. But actually, with a small amount of cleverness, you do not need to introduce special abstract syntax for for-loops. Namely, a for-loop of the general form for (e1; e2; e3) stmt is equivalent to a block { e1; while (e2) { stmt e3; } } Hence it suffices to let the semantic action (. ... .) in the parser construct abstract syntax using the existing Block, While, and ExprStatement classes. Write a MicroC program answer5-3-for.c that uses your new for-loop to compute and print the sum of the numbers from 0 to 99. PROBLEM 4.4: Extend MicroC with the prefix ++ operator, so one can write ++n or ++p where n is an integer and p is a pointer. Remember that ++n is an expression, whose effect is to increment n by 1, and whose value is the value of n after the increment. (This is slightly easier to compile for our simple stack machine than postfix increment n++). You must: * slightly extend the CoCo/R lexer and parser specification in MicroC.ATG where the other prefix operators get parsed; * declare a new abstract syntax class PrefixIncrement : Expression which holds an LvalueExpression, in file MicroC.cs; * define a suitable static checking method (Check), * define a suitable compilation method (Compile). For code generation, write down a compilation scheme on paper and think about it *before* you hack away on the Compile method. Hint: you need the ADD, CSTI, DUP, LDI and STI instructions in some order. Check that this works, by writing, compiling and running a MicroC example program such as this: void main() { int sum; int i; for (i=0; i<100; ++i) sum = sum + i; write sum; int a[2]; i = -1; a[++i] = 12; a[++i] = 13; write a[0]; write a[1]; write i; } PROBLEM 4.5: Add the conditional operator e1 ? e2 : e3 to Micro-C. Its precedence is higher than that of assignment and lower than that of the logical "or" operators (||) and (|). Also, the conditional operator is right associative, so x<0 : -1 : x>0 ? 1 : 0 means x<0 : -1 : (x>0 ? 1 : 0). Type checking of e1 ? e2 : e3 should require that e1 has type bool and that the types of e2 and e3 agree. Exactly what "agree" means is up to interpretation; for instance, (b ? &x : 0) might be considered a valid expression. Or you may require e2 and e3 to have identical types. The compilation of e1 ? e2 : e3 should produce code that evaluates e2 only if e1 is true and evaluates e3 only if e1 is false. The compilation scheme should be the same as for if (e1) e2 else e3, but the expression e2 and expression e3 must leave its value on the stack top, so the entire expression e1 ? e2 : e3 leaves its value on the stack top. PROBLEM 4.6: Consider the following C# method from file Selsort.cs (there's also a corresponding Java file Selsort.java): public static void SelectionSort(int[] arr) { for (int i = 0; i < arr.Length; i++) { int least = i; for (int j = i+1; j < arr.Length; j++) if (arr[j] < arr[least]) least = j; int tmp = arr[i]; arr[i] = arr[least]; arr[least] = tmp; } } Compile it using Microsoft's C# compiler with the optimize flag: csc /o Selsort.cs and then disassemble it, saving the output to file Selsort.il: ildasm /text Selsort.exe > Selsort.il Load Selsort.il into an editor, find the declaration of method SelectionSort, and delete everything else. Now try to understand the purpose of each bytecode instruction. Write comments to the right of the instructions (or between them) as you discover their purpose. Also describe which local variables in the bytecode (local 0, 1, ...) correspond to which variables in the source code. Hand in the edited bytecode file with your comments. If you prefer the Java version, do: javac Selsort.java javap -c Selsort > Selsort.bytecode and then proceed to investigate and edit Selsort.bytecode as above. BONUS PROBLEM 4.6A, answer NOT required: Compare the speed of the Java or C# program with a similar program written in real C. Investigate the x86 machine code generated by the runtime JIT compilers for one of the above programs. BONUS *PROJECT* 4.7, answer NOT required: Modify the MicroC compiler to generate x86 symbolic assembly code that the nasm assembler (http://sourceforge.net/projects/nasm) can then assemble into an executable file for Windows, Linux or Mac. Make some benchmarks to see how much faster your MicroC programs run when compiled to assembler than when compiled to bytecode for our simple abstract machine.