C briefing

(These notes were originally written by Jørgen Steensgaard-Madsen from IMM, DTU and modified by Henrik Hulgaard.) This is a brief introduction to the programming language C. The objective is to provide a short introduction covering some, but not all, essential aspects. The book The C Programming Language by Brian W. Kernighan and Dennis M. Ritchie is a good reference.

The presentation is strictly top-down, justified by the fact that the intended audience knows the basics of programming. It starts with the composition of programs from separately compiled parts. It goes on to header files, with an emphasis on the notion of an environment. Next come the details of the language syntax, which is where the few examples appear. Finally the topic of preprocessor commands is touched upon, but only sparsely.

1  Program composition

C is a low-level programming language intended for writing large programs, e.g. operating systems. An essential implication is that a C program is distributed over several text files. Some text files contain declarations of variables and functions (in the standard terminology of C, not in a mathematical meaning). These text files can be compiled into object files, which can be conceived as program parts written in an even lower-level (binary) language. Eventually, object files can be linked together into a program. Object files can also be collected into libraries intended for reuse in several programs.

1.1  Program files

Program files describe allocation of variables and computations that may use these variables. The computations are mostly expressed as functions that can be called with appropriate arguments. A function declaration tells in terms of parameters how to perform a computation. The parameters are names that will denote arguments' values when the computation is actually carried out due to a function call. A function may return a value and, as usual, a function call may be used in expressions where its returned value has to be used in the evaluation of the expression.

Each program file essentially consists of a simple sequence of variable and function declarations. A name of a declaration should appear sequentially before the name is used. Function declarations may contain (local) variable declarations for use in the function's computation only, but no function declarations (except that some compilers extend C in this respect). Recursive function declarations are allowed.

A program file may refer to names declared in other program files. It can be a reference that reads a variable or assigns a value to it. Or it can be a reference that calls a function. It is common to say that the program file contains external references.

A program file may declare names intended for use in other program files. Ideally one might expect that such declarations be marked in a special way, PUBLIC say. Unfortunately the fathers of C have reversed things, so declarations not marked static may be used from other program files.

C program files typically have names ending in .c . This and other conventional name endings are important, since C-compilers react in ways that may depend on such endings.

For the time being, do not be concerned with how computations can be expressed. The mere fact that it can be done should suffice for building an adequate picture of what constitutes a program.

1.2  Compilation into object files

A binary program is a sequence of bits that is stored in a computer. A simplified picture of a compiler's task is that it transforms a program text into a binary program. This simplified view is only applicable when a program text is not split into several files that can be handled separately by the compiler.

A far better view of a compiler's task is that it transforms a program file into a sequence of bits augmented with information about the external references (which and where) and about the meaning of names that may be used from other program files. Such augmented sequences of bits are called object programs and they are stored in object files.

Object files typically have names ending in .o . Object programs are unsuitable for the human eye, but programs exist that can be used to inspect the content of object files. On computers with Unix-like operating systems you may want to use the programs called nm and objdump . Information about these programs can be found using commands man nm and man objdump . This uses the program man for which a description can be had by the command man man .

Given a program file mytest.c, a corresponding object file can be generated by the C-compiler, called cc on Unix-like systems, by the command

cc -c mytest.c

Try it out!

1.3  Linking object files into binary programs

Once all text parts of a program have been compiled into object programs the task remains to combine them into a binary program ready for execution. This task is called linking. A program specialised for this task is called a linker, but on Unix-systems it is rarely used directly. The most common way is to use the C-compiler as an intermediary.

Linking involves something like editing the bit sequences of the object files, so that external references are replaced by bits indicating the position of the referenced entities in the final binary program. It also involves putting the bit sequences of the object programs together, and in practice this involves an initial splitting of each object program into several bit subsequences, corresponding to data and code for instance.

Assume that program files mytest.c, mypart_A.c, and mypart_B.c have been compiled into object files mytest.o, mypart_A.o, and mypart_B.o . They can then be linked into the binary program mytest by calling the C-compiler as follows:

cc -o mytest mytest.o mypart_A.o mypart_B.o

The desired name of the program is given after the option -o . If that is missing the resulting program will be put into a file with a default name, e.g. a.out .

1.4  Libraries of object files

Splitting one's own program into several parts can be a great help, since each part might be trusted to solve a particular subproblem. Some subproblems are so common that it is nice to have solutions that can be used by several programs and several programmers. Examples are: computations of mathematical functions, input and output of various kinds, window management.

Object programs for such common subproblems are put into program library files. Most Unix-like operating systems have one kind of program library files with names of the form libxxx.a with xxx replaced by a characteristic library name, e.g. m for the standard mathematical functions.

Assume that the program text mytest.c, or another part of the intended program mytest uses the mathematical function sinh . Linking the program as illustrated above might give an error message, if sinh could not be found. Since an object program for sinh can be found in libm.a one might modify the command for linking through the C-compiler as follows

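cc -o mytest mytest.o mypart_A.o mypart_B.o -lm

The characteristic name for the library of mathematical functions is m so that name is given with option -l to the compiler. The option is placed after the object files because many linkers search a library only for names that are still unresolved at that point. The compiler is then responsible for interfacing to the actual linker.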

A sceptic may ask why there is more than one library. A simple reason is that several authors might have great difficulty synchronising their work. It is much easier with several libraries maintained separately. Another reason for having several libraries is to reduce the size of programs using libraries: a program that uses a library routine gets its own copy of library code included in the binary program. This may appear unbelievably expensive in terms of storage, but it is a fact of Unix-life. Some savings are associated with libraries with an internal structure that differs from the traditional one used in the libxxx.a library files, but that story is too complex to be told here.

2  Header files

A program file can only in very special cases directly use a name declared in another program file. Function calls must, for instance, have the number of arguments with their intended types as given in a declaration. A declaration of intent, with just the information needed for correct use, must be present in the program file before the name is actually used. Such declarations of intent, and associated descriptions, can be collected into files, which can be included in a program file, i.e. literally replacing a command for inclusion.

Conventionally such files are called header files and they typically have names ending in .h . It seems appropriate to start a presentation of C by focusing on the language subset that is used in header files: it is smaller and provides an overview of C essentials.

2.1  Comments

Comments are included in parentheses-like symbols: /* This is a C comment */, and they cannot be nested, i.e.

/* A nested comment: /* hmmmm */ - oh my */

will cause problems, since the text - oh my */ will be seen as a non-comment.

Comments may extend over several lines. Since the C99 standard (and in many earlier compilers as an extension) another comment convention is supported in addition: // This is a comment extending from // until the end-of-line.

2.2  Header files for routines in standard libraries

A C programmer may somehow come across a name of a library routine that may be relevant. One may use the man command to obtain more information. Here is an extract from a manual page:

SINH(3)             Linux Programmer's Manual             SINH(3)

NAME
       sinh - hyperbolic sine function

SYNOPSIS
       #include <math.h>

       double sinh(double x);

DESCRIPTION
       The  sinh()  function  returns the hyperbolic
       sine of x, which is defined mathematically as
       (exp(x) - exp(-x)) / 2.

The synopsis provides a declaration of intent for sinh, so that the reader gets the information for calling it. Furthermore it tells that this declaration of intent may be included from a file called math.h which resides somewhere in the file system. The precise location is not important because on a correctly configured system the C compiler will know where to look for it - the information is given to the C compiler by enclosing the file name in < ... > .

All the programmer needs to do is to insert a line #include <math.h> early in any program file that refers to sinh . The declaration of intent shown in the manual page for sinh relieves the programmer of looking into the file math.h, which probably contains several such declarations of intent. Of course the programmer must understand the implications of the declaration of intent. Here it may be worth noting that double is a type name for double precision floating point numbers. One of the following subsections will provide more information.

2.3  Simple values

The purpose of a header file is to establish an environment for writing part of a program. Informally speaking an environment is a mapping of names into their types, i.e. into syntactic rules for their use. A more formal characterisation will describe it as a more complex function, but pursuing that would diverge into the topic of formal semantics.

The C language is a fairly old language, so its type system is not as cleanly designed as that of many newer languages, especially when it comes to associating types with functions. The description of sinh might in a newer language be given as

sinh : double -> double

There are good reasons for considering a declaration of intent for a function as a typing of the function name, but that is not the original point of view in C.

2.3.1  Predefined types

The following simple values are commonly used in C

int       Signed integral numbers.
float     Signed floating point numbers.
char      Integral (one byte) numbers; whether plain char is signed depends on the compiler.
double    Signed (long) floating point numbers.

The full story is more complex, since some modifiers like long, short and unsigned may be combined with some of the names above. Furthermore, the number of bits required to hold values of the various types may depend on the compiler.

2.3.2  Declarations

A declaration of intent for a variable to hold simple values may be

extern int i,j,k;

The semicolon is an integral part of this, not a separator. The general pattern for simple declarations of intent is:

extern type_expression declarator, declarator, ...;

The emphasis font is used to set out parts in the pattern that have to obey rules associated with the emphasised names. The example thus illustrates that a type expression can be a name, as can a declarator. A declarator will always contain a name that is described in the context. In the sequel that name will be referred to as the declarator name.

A declaration of intent is characterised by the presence of the symbol extern. A declaration of intent has no effect, in terms of storage allocation for instance. It is just a description that somewhere a proper declaration should exist. For variables a proper declaration is obtained by dropping the extern symbol.

2.4  Composite values

Like many languages C offers primitives to build structures with components to hold values. Types in C can be seen as motivated by the structure of a computer's store: values can be put next to each other in typical patterns, and for a large chunk of memory one can have names that identify parts of it relative to the lowest address in it, say.

Declarations of intent for simple values take a form that applies also for composite values, provided a name is associated with their type. Consequently the first subsection deals with the way type names can be introduced. Later subsections deal individually with the various ways types can be described.

2.4.1  Naming new types

Some programmers may prefer the name real for float so they may introduce real as a type name in the following way

typedef float real;

Again the semicolon is part of the construct. The general pattern for type definitions is

typedef type_expression declarator, declarator, ...;

In such a context the declarator names become type names. A type name defined in one program file cannot be referenced from another. Thus when a type definition is in a header file, each program file including the header file gets its own definition.

2.4.2  Enumerations

An example type expression for an enumeration type is

enum{red,blue,green}

and an appropriate use of it will be in a type definition like

typedef enum{red,blue,green} color_t;

The ending _t in the chosen type name is not mandatory, but it follows some programming style recommendations.

The values for named constants red, blue, and green will be their position number in the sequence, starting with 0 for the first.

There are alternative ways of introducing enumerations, but this is convenient and suffices.

2.4.3  Structures

Structures are groups of component values that need not be of the same type. An example type expression for a structure is

struct body {
  color_t color;
  enum{triangular,circular,quadratic,star} shape;
  float height, weight;
}

The type of the shape-field has been given as a composite type expression rather than introducing a type name first. It emphasises the possibility, but use of a type name would be nicer.

Here is another example with a field described by a type expression using the above:

struct stock {
  int count;
  struct body item;
}

Note that there is no field list after the name body here. It serves as a reference to another type expression, such as the one above. However, body is not considered a type name by itself. Again it would have been nicer to introduce a type name.

The general pattern for type expressions for structures is

struct name {
   type_expression declarator, declarator, ...;
   type_expression declarator, declarator, ...;
   ...
}

The declarator names in this context are called field names. Structures can be defined recursively, but that involves use of pointers, which will be described in a subsection later.

2.4.4  Unions

Use of unions may easily lead to errors that can be very hard to find. Programming languages depend on types to ensure that, for instance, an integer variable is not sometimes used as a real. Type checking is done during compilation, and a type checking compiler will reject programs that perform silly things as just described. Unions are for those who want to do such dangerous things despite the compiler's effort!

The pattern of a type expression for a union is just like that for structures, except that the first word is union, not struct . The meaning is, however, very different, since the `fields' of a union all have the same start position in the computer's store, whereas for a structure they have different start positions, chosen such that there is no danger of values overlapping in the store.

No more will be said of unions here. Do not use unions unless you really know the implications. Readers who do know will appreciate this warning.

2.4.5  Arrays

An array has components of one particular type. However, there is no type expression for arrays! A type name can nevertheless be introduced for an array, as for instance in

typedef int month_t[12];

So: a declarator can have the form declarator_name[int_constant] . A constant, n say, characterises a set of indices {0,1,...,n-1} for selecting entries.

A declaration of intent follows the general pattern:

extern month_t m;

The declaration of intent may be used in a program file that contains references to m[0], m[1], m[2], ..., m[11] - and no other index values.

A declaration of an array need not use a type name for the array:

float u[3],v[3],w[3];

This implies allocation of store for 3*3 floating point numbers.

An important design view must be mentioned here, since C differs in this respect from Pascal-like languages. The form of a declarator is intended to remind programmers of how they should use the declarator name in referring to a value of the type given by the type expression. So u[3] is a declarator to remind the programmer that u[...] is the form to be used to refer to a floating point number.

Multi dimensional arrays can be described similarly, e.g.

typedef float transformation_t[3][3][3];
extern transformation_t rot;

i.e. rot[i][j][k] is a float for i,j,k ∈ {0,1,2} .

Since there is no type expression for arrays, type definitions assume a special role for arrays in function declarations. More will be said later.

2.5  Pointers

The previous subsection described the notion of a declarator and illustrated the notion for arrays. Another kind of declarator is used for references to values of a given type. An example:

struct tree *root;

A declarator of the form *declarator_name reminds the programmer that the declarator name might be preceded by an asterisk to obtain a value of the type in the declaration, here a struct tree value.

So, what type has the value of root itself? It is called a pointer to the type of value given in the declaration. As for arrays there is no type expression for pointer types, but type definitions can be used to introduce names for such types:

typedef struct tree *tree_t;

and then a declaration of intent follows the originally given pattern:

extern tree_t root;

Pointers are needed to build complex data structures that can best be described as having a recursively defined type. Such a structure might be a binary tree with integer values in the internal nodes. Here is an invalid type expression:

NB: invalid example:

struct tree{
  int data;
  struct tree left,right;
}

This is invalid because, with 4 bytes for an int, no positive integer solves the equation

x = 4 + 2x where x ≡ sizeof(struct tree)

A valid type expression is

struct tree{
  int data;
  struct tree *left,*right;
}

Use of type names is nicer than complex type expressions, and it might seem impossible to use a type name for the fields left and right. This is only an apparent difficulty - the appropriate paradigm is:

typedef struct tree *tree_t;
struct tree{
  int data;
  tree_t left,right;
};

The appearance of a name after the struct symbol does not require a preceding characterisation of the name, which is a rare case since C mostly adheres to the definition before use principle. The final semicolon means that struct tree{...}; becomes a structure declaration (with no declarators).

2.6  Function prototypes

A declaration of intent for a function has the following form

extern function_prototype;

A function prototype introduces a name for a function, tells how many arguments must be given in a call, and tells for each required argument what type it must have. Finally it tells what type a returned value will have, if any.

An example of a function prototype for the sinh function has been shown earlier. Here is a pattern for function prototypes

type_expression function_name(
   type_expression declarator,
   type_expression declarator,
   ... ,
   type_expression declarator)

So: a declarator can have the form of a declarator name followed by a list of parameter descriptions. The declarator name in such a context becomes a function name. A declarator name can be omitted in a parameter description's declarator in a declaration of intent. (As mentioned earlier, the term `function' as used in C does not mean the same as in mathematics.)

3  Syntax of program files

A program file is a sequence of declarations of intent and proper declarations, possibly mixed. Most of the declarations of intent belong in header files, which should describe reusable environments. An additional kind of element in program files is preprocessor commands, of which the #include ... is an example. The preprocessor commands will be described in the next section. They will be replaced by a program called a preprocessor and the resulting text should have the structure described in this section.

One of the program files should declare a function main. A program executes as if main was called from outside the program.

3.1  Proper declarations

The previous section focused on declarations of intent. It mentioned proper declarations of variables, and it should be remembered that if a variable is used according to a declaration of intent, then precisely one proper declaration of the variable must be given in one of the program files used to build a program.

Some proper declarations may not be intended for applications outside the program file in which they occur. This ought to be the default seen with present day eyes, but unfortunately a special indication is required. A proper declaration may be preceded by the symbol static to ensure that the declared name cannot be used from other program files. This applies for both variables and functions.

3.1.1  Variables

A proper declaration of a variable may assign an initial value. Constants may be used as initial values, but this is not all. However, a complete discussion of the various initialisers is too detailed for the present version of this introduction. Illustrations of a few common cases must suffice:

3.1.2  Functions

A proper declaration for a function can consist of a function prototype followed by a compound statement, i.e. something enclosed in braces { ... } - possibly preceded by static. A preceding extern is allowed, and it would be nice if the default behaviour of C compilers were different. As it is, such a use of extern is at least debatable. Compound statements will be presented in a following subsection.

Another (non-prototype) pattern for function declaration exists for backwards compatibility.

type_expression function_name
   (parameter_name,parameter_name,...,parameter_name)
   declaration declaration ... declaration
compound_statement

The declarations between the parameter list and the compound statement must describe the types of the parameter names. They may, of course, not contain initialisers.

3.2  Expression syntax

Expressions are built from elementary terms that may be constants, (components of) variables, or function calls. A component of a variable may be selected as in m[10]. Other forms of component selections are

x.field_name if x denotes a structure or union
y->field_name equivalent to (*y).field_name

An elementary term may also be a function call, i.e. a function name followed by a parentheses with a possibly empty, comma separated list of expressions. The elementary terms may be combined by use of operators and grouping by parentheses in the usual way.

The binary operators are

Syntax                      Meaning
expression + expression     addition
expression - expression     subtraction
expression * expression     multiplication
expression / expression     division
expression % expression     integer remainder
expression << expression    left shift the left (integer) operand
expression >> expression    right shift the left (integer) operand
expression < expression     less than
expression <= expression    less than or equal
expression == expression    equal
expression != expression    not equal
expression >= expression    greater than or equal
expression > expression     greater than
expression & expression     bitwise and
expression ^ expression     bitwise exclusive or
expression | expression     bitwise or
expression && expression    logical conjunction
expression || expression    logical disjunction

The unary operators are

Syntax                    Meaning
* expression              selection of value pointed to
& lvalue                  address of (component of) variable
- expression              sign reversal
! expression              1 for a zero bit pattern, 0 otherwise
~ expression              one's complement
(type_name) expression    prescribing a type for whatever value

Assignments are in C considered as expressions. The value of an assignment expression is the assigned value.

Syntax                   Meaning
lvalue = expression      conventional assignment
lvalue += expression     x += y   ≡   x = x+y
lvalue -= expression     x -= y   ≡   x = x-y
lvalue *= expression     ...
lvalue /= expression
lvalue %= expression
lvalue >>= expression
lvalue <<= expression
lvalue &= expression
lvalue ^= expression
lvalue |= expression
++ lvalue                ++x   ≡   x = x+1 (in `units' of a type)
-- lvalue                --x   ≡   x = x-1 (in `units' of a type)
lvalue ++                x++   ≡   (x = x+1)-1
lvalue --                x--   ≡   (x = x-1)+1

The term lvalue is used to denote the kind of expressions that may indicate the receiver of a value in an assignment. A variable name, an array element, and the dereferenced value of a pointer denote lvalues where that is expected - and they may denote stored values (also called rvalues in C) where an expression is expected! In contrast, x+1 may not denote an lvalue.

A conditional expression selects dynamically between evaluation of two subexpressions. Its form is

expression ? expression : expression

A comma may be an operator when it cannot be mistaken for a separator. Use parentheses when in doubt. Like most binary operators it associates to the left, i.e. x,y,z is short for (x,y),z. Both operands are evaluated, the left one first, and the value of the expression is the value of the right operand.

3.3  Statement syntax

The bottom level of the hierarchy of statements consists of expression statements which have the form

expression;

Once again: note that the semicolon is part of the construct, not a separator.

Statements are often grouped into compound statements:

{ declaration declaration ... statement statement ... }

i.e. a brace with a list of declarations followed by a list of statements. Either list may be empty.

The declarations of a compound statement may not contain a proper declaration of a function. Declarations preceded by the static symbol describe (variable-)names that just as well might have been declared at the outermost level. Their position in a compound statement restricts their use to within that statement only.

Any statement may be labeled by prefixing it:

identifier :

provided the identifier is unique among the labels of the function where it occurs. Goto statements may indicate transfer of control to such a label, but only within the current function.

Here is a table showing the other forms of statements, mostly well-known

3.4  Examples: pointer juggling and memory management

Pointer values can be obtained by use of the & operator. This is most frequently used when a function requires a pointer value, primarily in order to be able to assign to the variable pointed at:

#include <stdio.h>

/* Exchange the values of the two variables pointed to by x and y. */
static void swap(int *x, int *y){
  int aux;
  aux = *x; *x = *y; *y = aux;
}

int main(void){
  int a = 100, b = -100;
  swap(&a,&b);
  printf("a = %d, b = %d\n",a,b);
  return 0;
}

Compiled, linked and executed this program results in the output of:

a = -100, b = 100

C has nothing like `variable parameters' as known from several other programming languages. Pointers must be used for the purposes achieved with variable parameters. An array parameter is treated as a pointer to the first element of the array, even when a type name is used for the entire array.

Pointer values needed to build data structures like lists and trees have to be obtained differently. The following function upto builds the list [0,1,2,...,n-1]:

#include <stdio.h>
#include <stdlib.h>

typedef struct intlist *intlist_t;

struct intlist{
  int data;
  intlist_t next;
};

/* Build the list [0,1,...,n-1]; the empty list (NULL) if n <= 0. */
intlist_t upto(int n){
  intlist_t result, elem;
  result = NULL;
  while (n-- > 0) {
    elem = (intlist_t)malloc(sizeof(*elem));
    if (elem == NULL) exit(EXIT_FAILURE);   /* out of memory */
    elem->data = n; elem->next = result; result = elem;
  }
  return result;
}

For each value 0,1,2,...,n-1 a pointer value is generated in the loop of upto and saved in the variable `elem'. It is generated by calling the malloc function (malloc: memory allocation). The argument needed by malloc tells how much of the store is needed for the value that the pointer will point to. The declaration of intent for malloc cannot know the type of `elem', so malloc returns a generic pointer (of type void *). The cast (intlist_t) makes explicit that this value is to be accepted as a value of the prescribed type; this is a case where such forcing is reasonably safe. In standard C a void * value is converted implicitly when assigned to another pointer type, so the cast may be omitted. Since malloc returns NULL when no memory is available, its result should be checked before use.

Memory for variables is allocated automatically, i.e. the programmer may immediately assign values to any variable. For variables declared in compound statements memory is allocated on the program stack when control reaches the statement. Otherwise it is allocated in a memory area known as the data section. Memory allocated by means of malloc belongs in what colloquially is called the heap.

Sometimes allocated memory may have to be released for possible reuse. Again this is done automatically for variables. However, for memory allocated on the heap programmers have to release it explicitly, when needed. Of course the entire memory used by a program will be available for reuse after the program terminates. As an example to illustrate releasing of memory allocated on the heap, here is a function declaration to release the memory used to represent a list:

void free_intlist(intlist_t list){
  intlist_t elem;
  while ((elem = list) != NULL) {     /* assign, then test for end of list */
    list = list->next; free(elem);    /* advance before releasing elem     */
  }
}

A common source of problems is variables holding pointer values that correspond to memory that has been released. Once memory has been released no pointer value to it should be dereferenced.

Juggling pointers is only fun when done with great care and precision. In other words: use of pointers is error-prone. Here is a function for reversing a list in situ (i.e. without allocating new memory).

void reverse(intlist_t *list){
  intlist_t rest, elem;
  rest = *list; *list = NULL;
  /* INV: concatenate(reverse(rest),*list) is constant */
  while ((elem = rest) != NULL) {
    rest = rest->next; elem->next = *list; *list = elem;
  }
  /* rest == NULL, concatenate(reverse(rest),*list) unchanged */
}

It can actually be done a little more nicely by turning it into a function that returns a value:

intlist_t reverse(intlist_t list){
  intlist_t rest, elem;
  rest = list; list = NULL;
  while ((elem = rest) != NULL) {
    rest = rest->next; elem->next = list; list = elem;
  }
  return list;
}

3.5  Example: matrix multiplication

Two-dimensional arrays are handled as arrays of arrays; when passed to a function, such an array is treated as a pointer to its first row. Standard C, in its original form, requires the length of an array to be known `at compile time'. This makes it problematic to write certain library routines.

First consider a program with a routine to multiply two 3 by 3 matrices:

#define N 3
typedef float matrix[N][N];

void matmult(matrix a, matrix b, matrix c){    /* c = a x b */
  int i,j,k;    
  float s;

  for (i=0;i<N;i++)                      /* i = 0,1,...,N-1 */
    for (j=0;j<N;j++) { 
      for (s=0,k=0; k<N; k++) s += a[i][k]*b[k][j]; 
      c[i][j] = s; 
    }
} 

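#include <stdio.h>                       /* declares printf, used in main */

int main(void){
  int p,q;
  matrix  z,
    x = {{1.5,0.0,0.0},
         {0.0,1.5,0.0},
         {0.0,0.0,1.5}},
    y = {{1.0,2.0,3.0},
         {0.0,2.0,3.0},
         {2.0,3.0,0.0}};

  matmult(x,y,z);                        /* z = x times y */
  for (p=0;p<N;p++){                     /* print z row by row */
    for (q=0;q<N;q++) printf("%6g ",z[p][q]);
    printf("\n");
  }
  return 0;
}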

 

A macro, N, is used to concentrate the dependency on the particular size. Thus the need for text editing for new cases is reduced, which is an obvious advantage of macros. For the example the fixed limit is actually an advantage, since the matrices used for illustrations are given as initialised variables in the program text.

Once the fixed limits are accepted the program is straightforward: line 10 computes the inner product of row i in a and column j in b. Iteration over the index range 0,1,...,N-1 as expressed in the for-statements is a standard idiom of C, i.e. a pattern of expression that every C programmer should know by heart. Line 10 is a little special, since the initialisation part of the for-statement also initialises the variable s, although s has no part in the control of the iteration.

The initialisation in line 10 is actually an example of the comma expression: a comma-separated list of expressions results in evaluation of all the expressions, from left to right, and the value of the comma expression is the value of the last one in the list. This is a specialty, and there would be little reason to mention it if it were not the cause of a possible confusion: do not confuse a[i,k] and a[i][k] in C. The former contains a comma expression and therefore means a[k].
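
The following sketch illustrates both points. The function names are invented, and the cast to void is only there to tell the compiler that discarding the value of i is intended.

```c
/* Returns the value of a comma expression: both operands are evaluated,
   left to right, and the result is the value of the right one. */
int comma_value(void){
  int i = 0, k;
  k = (i = 5, i + 1);           /* i becomes 5, then k becomes 6 */
  return k;
}

/* a[i,k] is parsed as a[(i,k)], i.e. a[k]: a whole row, not an element.
   Returns 1 if that is indeed the case. */
int comma_index_is_row(void){
  int a[2][3] = {{1, 2, 3}, {4, 5, 6}};
  int i = 0, k = 1;
  int *row = a[((void)i, k)];   /* same as a[k]; i is evaluated and discarded */
  return row == a[1] && a[i][k] == 2;
}
```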

You may now skip to the next section, or go on reading if you want to dive into a nasty aspect of C.

GNU C implements extensions to C and has means to handle arrays with lengths that are determined at run time. These extensions make it possible to write a general matrix multiplication routine that agrees nicely with the pattern above. However, not everyone uses the GNU C compiler, so one should be cautious about relying on them if portability is a concern.

That is not to say that a general routine cannot be written in standard C. However, it requires that programmers rely on address arithmetic, whereby a+i as an address is equal to &a[i] if a is a pointer. It is this general view that lets a C compiler accept a[i,k] with a meaning quite different from that of a[i][k].
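
A minimal sketch of such a general routine, assuming that the n by n matrices are stored row by row in flat arrays of n*n elements. The name matmult_n and this storage convention are choices made for the illustration, not something prescribed by C.

```c
/* c = a x b for n by n matrices stored row by row in flat arrays.
   Element (i,j) lives at offset i*n + j from the start. */
void matmult_n(int n, const float *a, const float *b, float *c){
  int i, j, k;
  float s;

  for (i = 0; i < n; i++)
    for (j = 0; j < n; j++) {
      for (s = 0, k = 0; k < n; k++)
        s += *(a + i*n + k) * *(b + k*n + j);   /* a[i][k] * b[k][j] */
      *(c + i*n + j) = s;
    }
}
```

The price is that the index computation i*n + j, which the compiler does silently for a[i][j] when the size is fixed, has to be written out by the programmer. The result array c must not overlap a or b.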

3.6  Example: coroutines

This example shows an application of a library that offers some simple coroutine management primitives. A coroutine may be conceived as a simple notion related to the more advanced notion of a process. An advanced system may consist of several processes working together in parallel. A simpler system may consist of several coroutines working together in merged execution.

void coroutine_1(void){
 ... co_yield(); ...
}
void coroutine_2(void){
 ... co_yield(); ...
}
void coroutine_3(void){
 ... co_yield(); ...
}
...

It is useful to think of a coroutine as having a private instruction counter that tells how far execution has reached within the routine. Merged execution means that only one instruction counter is modified at a time, whereas true parallel processing would allow them to be modified simultaneously. Moreover, in a system of coroutines, only one progresses and it continues, either until it terminates or until it explicitly gives up this property. The operation co_yield() in the outline above indicates how a coroutine gives up its property of progressing, ready to receive it again later.

When a coroutine gives up progressing, an administration routine determines which one is next. That routine is called a scheduler and it is not itself a coroutine. It works on a representation of coroutines which among other details holds the instruction counter. A coroutine is represented by a value that may be stored in an appropriate data structure, e.g. a list. A simple scheduler could be

static co_state_list all, current;
co_state *co_schedule(){
  if (!(current = current->next)) current = all;
  return &current->data;
}

This scheduler lets the coroutines represented have their chance in turn, shifting between them in a circular pattern. Schedulers may easily be written that operate on more complex data structures, e.g. with priorities associated with coroutines. The possibility for users to write suitable schedulers is characteristic of this library.
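
To give an impression of what a scheduler with priorities might look like, here is a sketch that can be compiled without the library. All names beginning with demo_ are invented, and the library's co_state is replaced by a plain int, so this is not the library's interface.

```c
#include <stddef.h>

/* Stand-in types, assumed for illustration only. */
typedef int demo_state;
typedef struct demo_node *demo_list;
struct demo_node { demo_state data; int priority; demo_list next; };

static demo_list all_nodes, current_node;   /* both must be non-NULL */

/* Round robin, as in the scheduler above: advance, wrapping at the end. */
demo_state *demo_schedule(void){
  if (!(current_node = current_node->next)) current_node = all_nodes;
  return &current_node->data;
}

/* Priority variant: always pick the entry with the highest priority
   (the first such entry wins when priorities are equal). */
demo_state *demo_schedule_priority(void){
  demo_list p, best = all_nodes;
  for (p = all_nodes; p != NULL; p = p->next)
    if (p->priority > best->priority) best = p;
  current_node = best;
  return &best->data;
}
```

A strict priority rule like this one lets low-priority entries starve for as long as a higher-priority entry remains in the list; whether that is acceptable depends on the application.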

The type of the values that represent coroutines is called co_state above. The name is dictated by the library of routines for handling coroutines, and so is the name co_schedule for the scheduler.

The coroutine representations need to be generated, and the library contains a routine to do so. It also contains a routine to transfer control to one coroutine, i.e. to start the system running. These can be used in the routine main as illustrated in

int main(void){
  /* Each group adds one element in front of the list and creates its coroutine */
  NEW(current); current->next = all; all = current;
  current->data = co_mk_coroutine(coroutine_1,terminate,5000);

  NEW(current); current->next = all; all = current;
  current->data = co_mk_coroutine(coroutine_2,terminate,1000);

  NEW(current); current->next = all; all = current;
  current->data = co_mk_coroutine(coroutine_2,terminate,1000);

  NEW(current); current->next = all; all = current;
  current->data = co_mk_coroutine(coroutine_1,terminate,5000);

  NEW(current); current->next = all; all = current;
  current->data = co_mk_coroutine(coroutine_3,terminate,2000);

  co_initialise(&current->data);   /* start the system running */
  return 0;
}

The five coroutines, some of which execute according to shared subroutine definitions, are represented as data in the list. The representations are generated by the library routine co_mk_coroutine. The three arguments needed by this routine identify the function definition to use, a function to be called at termination, and a number that tells how big a stack to use for the subroutines.

Every coroutine may call a function in the usual way and each such call requires space allocated on the stack. Since nothing prevents a coroutine from calling a routine that releases control (i.e. calls co_yield), one stack cannot be used for more than one coroutine. Hence, each coroutine needs a stack of its own, and in turning an ordinary function definition into the code of a coroutine the library routine needs to know how much memory to allocate for the coroutine's private stack. Note that it is the user's responsibility to ask for sufficient space for the maximum size of the stack.

When a coroutine finishes, i.e. when the corresponding function returns, it should be reflected in the scheduler's data structure. The function given as second argument to co_mk_coroutine should take care of this. Furthermore, it is required to return 0 only when the last coroutine terminates. So a possibility is

int terminate(void){
  co_state_list t1,t2;
  int count = 0;                 /* number of coroutines in the list */

  t2 = current;
  do {
    t1 = t2; if (!(t2 = t1->next)) t2 = all;
    /* t1==current => t2==schedule() */
    count++;
  } while (t2 != current);
  /* t2 == current, and t1 is its predecessor in the circular order */

  if (current == all) all = all->next;
  else t1->next = t2->next;

  current = t1;
  free(t2); count--;

  return count;
}

The library routines guarantee that the private stack allocated for a coroutine will be deallocated when the coroutine terminates.

The file that describes the routines is meant to be included before the text above. It has the contents shown below. The systematic use of a prefix, like co_ , is typical for libraries related to a specific application area.

typedef unsigned long *co_state;

typedef void (*co_action)(void);
typedef int  (*co_termination)(void);

/* The following routines must be defined by an application */

extern co_state *co_schedule(void);

/* The following are defined in co_lib.o */

extern void co_initialise(co_state * first);
extern co_state 
   co_mk_coroutine(co_action P,co_termination Q,int stacksize);
extern void co_yield(void);

Note the description of co_mk_coroutine. The first two parameters are pointers to functions, and such values can be given names as illustrated. The only new bit of information is that a co_state is a pointer to an unsigned long (integer). You are probably not going to use that value for anything but coroutine management, so the details can just as well be hidden.
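
Named function pointer types are useful outside the coroutine library too. The sketch below uses a type of the same shape as co_action; the names action, bump, bump_twice, hits and run_all are invented for the illustration.

```c
/* Same shape as co_action: a pointer to a function taking no
   arguments and returning nothing. */
typedef void (*action)(void);

static int hits = 0;
static void bump(void){ hits += 1; }
static void bump_twice(void){ hits += 2; }

/* Call every function in the table, in order. */
void run_all(const action *table, int n){
  int i;
  for (i = 0; i < n; i++) table[i]();   /* same as (*table[i])() */
}
```

A function name used as a value, as in action table[] = { bump, bump_twice, bump };, denotes a pointer to the function, so no & operator is needed.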

Note also that the library not only provides facilities, it also requires a function to be defined by users. This illustrates an obvious principle for dividing responsibilities that deserves to be exploited more than it seems to be.

4  The preprocessor

Text written by programmers is not necessarily what a C compiler gets to work on. A simple example is comments. A comment may be removed literally before the text is passed to the C compiler. This and similar actions are done by a program known as a preprocessor.

Besides removing comments (and replacing each one with a single space character) the preprocessor removes occurrences of a backslash immediately followed by a newline. This allows a programmer to present long lines in a form readable by humans.

The preprocessor performs special actions on lines with # as the leftmost visible character. An identifier must follow, possibly after blank space, and it must be one of a few that distinguish a preprocessor action. One such identifier is `include' which makes the preprocessor replace the line by the contents of some file indicated after the identifier:

#include <pathname>

The preprocessor will look for a file identified by the given pathname relative to a directory found by searching in a special path. It uses the first file found this way.

#include "pathname"

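The preprocessor will first look for a file identified by the pathname in an implementation-defined place, typically the directory of the including file or the working directory. If no file is found there, the search continues as for the form with angle brackets.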

#define name expansion

The preprocessor will replace occurrences of the name in subsequent text by the expansion. The expansion is the text of the rest of the line. It must be separated from the name by at least one space. Long lines can be broken into readable form by using the backslash-newline sequence. This kind of preprocessor command is often used to introduce names for constants, so that it becomes easy to change the size of tables, for instance.
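
As a small illustration, with invented names, a table size can be given a name once and used wherever the size matters. The second definition shows the backslash-newline sequence.

```c
#define TABLE_SIZE 8              /* change the size here only */
#define TABLE_BYTES \
  (TABLE_SIZE * sizeof(int))      /* expansion continued on a new line */

static int table[TABLE_SIZE];

/* Fill table[i] with i*i and return the number of entries written. */
int fill_squares(void){
  int i;
  for (i = 0; i < TABLE_SIZE; i++) table[i] = i * i;
  return TABLE_SIZE;
}
```

The parentheses around the expansion of TABLE_BYTES are part of the expansion text; they keep the product together wherever the name is used.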

#define name(parameter,parameter,...) expansion

The preprocessor replaces occurrences of apparent calls of name as a function by the expansion, after occurrences of the parameters in the expansion have been replaced by the apparent arguments.

The expansion is the text of the rest of the line. It must be separated from the closing parenthesis by at least one space. No space may separate the name and the opening parenthesis. Long lines can be broken into readable form by using the backslash-newline sequence.

This command is tricky when one considers details about the form of the expansion. There is little reason to spend much time on learning such details before a reasonably complete understanding of C has been obtained. Here is a simple example that relates to memory allocation as shown earlier:

#define NEW(var) \
var = malloc(sizeof(*var))

A line in one of the examples might be simplified to NEW(elem); with this definition of a macro.
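
One of the tricky details is that arguments are substituted as text, without regard to operator precedence. The two macros below are invented to show the effect of parentheses in the expansion.

```c
#define SQUARE_BAD(x)  x * x          /* no parentheses: fragile */
#define SQUARE(x)      ((x) * (x))    /* parenthesised: robust   */

/* Expands to 1 + 2 * 1 + 2, which is 5. */
int bad_result(void){  return SQUARE_BAD(1 + 2); }

/* Expands to ((1 + 2) * (1 + 2)), which is 9. */
int good_result(void){ return SQUARE(1 + 2); }
```

Even the parenthesised version mentions its argument twice, so an argument with a side effect, such as i++, should not be passed to it.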

#if condition
#elif condition
#else
#endif

The preprocessor can evaluate simple conditions and include/exclude part of the text accordingly. These preprocessor commands form a simple language, but it will lead too far to go into that here.
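
Just to give the flavour, here is a sketch in which the names VERBOSE, LEVEL_NAME and level_name are invented. Exactly one of the three definitions of LEVEL_NAME is passed on to the compiler.

```c
#ifndef VERBOSE
#define VERBOSE 0                 /* default when nothing else is said */
#endif

#if VERBOSE >= 2
#define LEVEL_NAME "chatty"
#elif VERBOSE == 1
#define LEVEL_NAME "normal"
#else
#define LEVEL_NAME "quiet"
#endif

/* Returns the text selected by the preprocessor. */
const char *level_name(void){ return LEVEL_NAME; }
```

Many compilers on Unix-like systems accept an option of the form -DVERBOSE=1 to define such a name from the command line, so the choice can be made without editing the program text.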

5  Interface between C and assembly code

This section assumes some familiarity with assembly code for the Intel 386 architecture. Assemblers differ with respect to the syntax of assembly code, which complicates matters a little.

Some kinds of computations cannot be expressed in C, but that is rare. Examples typically involve access to particular registers on a given architecture. Execution speed is rarely a sufficient excuse for using assembly code. Consequently it is useful to be able to write a small part of a program in assembly code and the rest in C.

Some C compilers provide means to write inline assembly code, i.e. statements expressed in assembly code mixed with C statements. It has the advantage that a programmer can concentrate on very short sequences of assembly instructions. The disadvantages are a complex interface and code that is hard to handle with another C compiler (even for the same machine architecture).

A nicer interface can be obtained by requiring that each function be written entirely in C or assembly code. It means that a programmer should be able to write assembly code for a function that can be called from a C program file. Conversely it may imply that a programmer should be able to write assembly code to call a function written in C.

One important aspect of this is the register conventions that tell who is responsible for saving meaningful values in registers.

Caller saved registers
are those whose values must be saved before a function call, since the called function is allowed to overwrite values in the registers.
Callee saved registers
are those whose values must be unchanged when a called function returns, so if a function needs to use one of these registers it must save its value and restore it before return.

Some registers are devoted to special purposes and should be used accordingly.

Here is an overview of register usage for the Intel 386 architecture with a Unix-like operating system (Minix or Linux):

Registers        Classification of use
(e)ax            function result register
(e)cx, (e)dx     caller saved
(e)si, (e)di     callee saved
(e)bx            compiler dependent!
ebp              frame pointer (base address for local variables)
esp              stack top pointer
segment          changed only by OS-functions
floating point   caller saved

Treat the (e)bx register with care, i.e. save its value as both caller and callee when writing assembly code to interface to a C program.

A typical C compiler in a Unix-like OS can be called to produce assembly code. It means that a sensible way to produce a function definition in assembly is

  1. Write a function prototype in C for the routine.
  2. Produce assembly code from an approximate definition in C.
  3. Edit the assembly code to fit the requirements of the situation.

One needs to know the correspondence between C program names that are externally visible and assembly code names. This depends on the C compiler: a C program name may be used literally in the assembly code, or an underscore may be prepended. Inspection of assembly code generated through use of the C compiler should reveal the details.

The assembly code produced by a C compiler has the following structure (illustrated with GNU assembler syntax):

_my_fct:
    pushl %ebp        # Store the base pointer on the stack
    movl %esp,%ebp    # Set the base pointer to the stack pointer
    subl $N,%esp      # N is the number of bytes needed for local variables

    ...               # Computation...

    leave
    ret

Parameters and local variables are addressed relative to the contents of the ebp register. It can be useful to generate assembly code that addresses parameters in otherwise useless computations, just to get the addressing details. This applies for local variables also.
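
Such an `otherwise useless computation' might look like the sketch below. The function probe and its names are hypothetical. With many Unix compilers the assembly code can be requested with an option such as -S, but both the option and the resulting offsets depend on the compiler and its settings.

```c
/* Hypothetical probe: every parameter and local variable is touched
   once, so the generated assembly shows how each one is addressed. */
int probe(int first, int second, int third){
  volatile int local_a, local_b;   /* volatile keeps them in memory */

  local_a = first;
  local_b = second;
  return local_a + local_b + third;
}
```

On a 386 target one would expect the parameters at positive offsets from ebp and the local variables at negative offsets, but the generated code is the authority, not this expectation.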

6  Indexed terms

assembler interface to C
assignment
data section
#define
declaration of intent
declarator
   for an array
   for a function
   for a pointer
   programmers view
   empty list
environment
extern
field names
free
heap
index
   indices
initialisation
label
lvalue
macro
malloc
memory management
   see also malloc and free
named constants
non-prototype declaration
operator
   binary
   unary
   assignment
   comma
   conditional
prototype of function
rvalue
stack
type expression
   enumeration
   structure
   union
type modifier
   long
   short
   unsigned




Maintained by: Jørgen Steensgaard-Madsen, IMM, DTU
Last modified: Jan 22, 2001

