Advanced Models and Programs F2009
----------------------------------
sestoft@itu.dk 2009-02-18


ASSIGNMENT 4 (pointers, MicroC, JVM/CLR bytecode)

There are five subproblems that should be adequately answered.  In
addition there are two bonus problems (4.6A and 4.7) that need not be
answered.

The assignment is due Wednesday 25 March 2009.

You may hand in a physical document briefly describing the necessary
changes in Danish or English, showing the code changes you have made,
and stating your name and the assignment number.  You may instead
submit the resulting modified files to sestoft@itu.dk, with easily
searchable comments clearly indicating what changes you have made, and
what subproblem is addressed by each change.


GETTING STARTED

Download the following files:

 * MicroC.ATG, the lexer and parser specification for MicroC
 * MicroC.cs, the checker and compiler for MicroC
 * Machine.cs, the abstract stack machine
 * test-c.zip, an archive with illustrative MicroC programs

To check that everything works out of the box, do the following:

 Coco -namespace MicroC MicroC.ATG      -- Ignore warnings from Coco
 csc Micro.cs Scanner.cs Parser.cs
 unzip test-c.zip
 Micro test5.c check                    -- Should print "Program OK!"
 Micro test5.c compile  
 csc Machine.cs
 Machine a.out                          -- Should print "6"
 Machine /trace a.out                   -- Should print trace and "6"


PROBLEM 4.1: 

Write, compile and run MicroC programs that do the following:

 * Write a MicroC program answer5-1-sum.c containing a function 
      void sum(int n, int ns[], int *sump) 
   that returns, using the sump reference, the sum of the first n
   numbers of array ns.  Your main function must create an array
   holding the four numbers 7, 13, 9, 8, call function sum on that 
   array, and print the result.

   Remember that MicroC is very limited compared to C: You cannot
   write "int i=0;" but must write "int i; i=0;" instead; there is no
   for-loop (unless you implement one, see Problem 4.3); and so on.

   Also remember to initialize all variables and array elements; this
   doesn't happen automatically in Micro-C or C.

 * Write a MicroC program answer5-1-histogram.c containing a function 
      void histogram(int n, int ns[], int max, int freq[]) 
   such that after a call to histogram, freq[c] equals the number of
   times that number c appears among the first n elements of ns, for
   0<=c<=max.  You can assume that all numbers in ns are between 0 and
   max, inclusive.  [What happens if the latter is not true?]

   That is, if your main function creates an array ns holding the
   seven numbers 1 2 1 1 1 2 0 and calls histogram(7, ns, 3, freq),
   then afterwards freq[0] is 1, freq[1] is 4, freq[2] is 2, and
   freq[3] is 0.  Of course, freq must be an array with at least four
   elements.  [What happens if it is not?]  The array freq should be
   declared and allocated in the main method, and passed to histogram
   function.  It does not work (in C) to allocate the array in
   histogram and somehow return it to the main method.  Your main
   method should print the contents of array freq after the call.


PROBLEM 4.2: 

Modify MicroC to allow equality comparison (==, !=) of a pointer and
an integer.  This is useful only for checking whether a pointer is 0
(null), by convention meaning that it does not point at anything.  You
only need to modify MicroC.cs.

Test that this works.

[Note that in Java and (safe) C#, if a pointer p is not null, then it
is guaranteed to point at something useful (an object).  In MicroC, C
and C++ there's no such guarantee.]


PROBLEM 4.3:

Extend MicroC with a for-loop, permitting for instance 
   for (i=0; i<100; i=i+1)  
     sum = sum+i;
To do this, you must modify the parser specification in MicroC.ATG.
You can also extend the MicroC abstract syntax in MicroC.cs by
defining a new subclass ForLoop : Statement, with suitable Check and
Compile methods.  

But actually, with a small amount of cleverness, you do not need to
introduce special abstract syntax for for-loops.  Namely, a for-loop
of the general form

   for (e1; e2; e3) 
     stmt

is equivalent to a block 

   { 
     e1; 
     while (e2) { 
       stmt 
       e3; 
     } 
   }

Hence it suffices to let the semantic action (. ... .) in the parser
construct abstract syntax using the existing Block, While, and
ExprStatement classes.

Write a MicroC program answer5-3-for.c that uses your new for-loop to
compute and print the sum of the numbers from 0 to 99.


PROBLEM 4.4:

Extend MicroC with the prefix ++ operator, so one can write ++n or ++p
where n is an integer and p is a pointer.  Remember that ++n is an
expression, whose effect is to increment n by 1, and whose value is
the value of n after the increment.  (This is slightly easier to
compile for our simple stack machine than postfix increment n++).

You must:

 * slightly extend the CoCo/R lexer and parser specification in 
   MicroC.ATG where the other prefix operators get parsed;
 * declare a new abstract syntax class PrefixIncrement : Expression 
   which holds an LvalueExpression, in file MicroC.cs;
 * define a suitable static checking method (Check),
 * define a suitable compilation method (Compile).

For code generation, write down a compilation scheme on paper and
think about it *before* you hack away on the Compile method.  Hint:
you need the ADD, CSTI, DUP, LDI and STI instructions in some order.

Check that this works, by writing, compiling and running a MicroC
example program such as this:

void main() {
  int sum;
  int i;
  for (i=0; i<100; ++i) 
    sum = sum + i;
  write sum;
  int a[2];
  i = -1;
  a[++i] = 12;
  a[++i] = 13;
  write a[0];
  write a[1];
  write i;
}


PROBLEM 4.5:

Add the conditional operator e1 ? e2 : e3 to Micro-C.  Its precedence
is higher than that of assignment and lower than that of the logical
"or" operators (||) and (|).  Also, the conditional operator is right
associative, so x<0 : -1 : x>0 ? 1 : 0 means x<0 : -1 : (x>0 ? 1 : 0). 

Type checking of e1 ? e2 : e3 should require that e1 has type bool and
that the types of e2 and e3 agree.  Exactly what "agree" means is up
to interpretation; for instance, (b ? &x : 0) might be considered a
valid expression.  Or you may require e2 and e3 to have identical types.

The compilation of e1 ? e2 : e3 should produce code that evaluates e2
only if e1 is true and evaluates e3 only if e1 is false.  The
compilation scheme should be the same as for if (e1) e2 else e3, but
the expression e2 and expression e3 must leave its value on the stack
top, so the entire expression e1 ? e2 : e3 leaves its value on the
stack top.


PROBLEM 4.6:

Consider the following C# method from file Selsort.cs (there's also a
corresponding Java file Selsort.java):

  public static void SelectionSort(int[] arr) {
    for (int i = 0; i < arr.Length; i++) {
      int least = i;                                      
      for (int j = i+1; j < arr.Length; j++)
        if (arr[j] < arr[least])
          least = j;
      int tmp = arr[i]; arr[i] = arr[least]; arr[least] = tmp;
    }
  }

Compile it using Microsoft's C# compiler with the optimize flag:
   csc /o Selsort.cs
and then disassemble it, saving the output to file Selsort.il:
   ildasm /text Selsort.exe > Selsort.il

Load Selsort.il into an editor, find the declaration of method
SelectionSort, and delete everything else.  Now try to understand the
purpose of each bytecode instruction.  Write comments to the right of
the instructions (or between them) as you discover their purpose.
Also describe which local variables in the bytecode (local 0, 1, ...)
correspond to which variables in the source code.  Hand in the edited
bytecode file with your comments.

If you prefer the Java version, do:
   javac Selsort.java
   javap -c Selsort > Selsort.bytecode
and then proceed to investigate and edit Selsort.bytecode as above.


BONUS PROBLEM 4.6A, answer NOT required:

Compare the speed of the Java or C# program with a similar program
written in real C.  Investigate the x86 machine code generated by the
runtime JIT compilers for one of the above programs.


BONUS *PROJECT* 4.7, answer NOT required:

Modify the MicroC compiler to generate x86 symbolic assembly code that
the nasm assembler (http://sourceforge.net/projects/nasm) can then
assemble into an executable file for Windows, Linux or Mac.  Make some
benchmarks to see how much faster your MicroC programs run when
compiled to assembler than when compiled to bytecode for our simple
abstract machine.