## Sunday, March 7, 2010

### Honour thy compiler and thy linker

Remember how, when you were learning to program, your tutor/teacher/zen programming master always told you to turn on all compiler output and to make sure that you dealt with all the warnings?

It's vaguely possible that they knew what they were talking about.

In the most recent version of the code, I wrote a wrapper function to streamline the procedure for executing a set of actions. As I explained in this post, the tables for robobuilder's actions consist of Nx16 unsigned char multidimensional arrays which are in the PROGMEM area of the chip.

A multidimensional array can be expressed in two ways. Interestingly, most of the time the two types are functionally and syntactically equivalent - they only differ by the method by which they are stored in memory. However, these subtleties can make all the difference when you are performing low level operations.

The first way is as an indexed array, which stores all the data in a sequential memory block. Take a matrix declared as:

int matrix[3][2];

You address the elements of the matrix by row, then by column:

matrix[2][1] = 6;

This assigns the value 6 to the last element in your matrix. Because the compiler knows that the array is supposed to be 3 rows by 2 columns, this statement is equivalent to:

((int*)matrix)[(2*COLUMNS)+1] = 6;

(Given that COLUMNS = 2. Don't forget that C arrays start at 0.)
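As a minimal sketch of that equivalence (the names ROWS, set_indexed and get_flat are made up for illustration, and a hosted C compiler is assumed), one function writes with the two-index syntax and the other reads the same cell back through a hand-computed offset:

```c
#include <stddef.h>

#define ROWS 3
#define COLUMNS 2

/* Write through the usual two-index syntax. */
void set_indexed(int m[ROWS][COLUMNS], size_t row, size_t col, int value)
{
    m[row][col] = value;
}

/* Read the same cell by computing the row-major offset by hand,
   the way the compiler does behind the scenes. */
int get_flat(int m[ROWS][COLUMNS], size_t row, size_t col)
{
    const int *flat = (const int *)m;
    return flat[row * COLUMNS + col];
}
```

Calling set_indexed(m, 2, 1, 6) and then get_flat(m, 2, 1) should hand back the same 6, since both paths land on offset 2*COLUMNS+1.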

The second way of defining a multidimensional array is as an array of pointers, each pointing to a separate row array. In this case, the rows are not necessarily arranged sequentially in memory. The following declaration:

int r0[2], r1[2], r2[2];

int* matrix[3] = {r0,r1,r2};

will create an array of three pointers, each pointing to the start of one row array. However, r0, r1 and r2 may not be adjacent in memory. Some compilers will make an effort to put them together, but there are no guarantees.

Elements can be referenced exactly the same way as before:

matrix[2][1] = -6;

This assigns -6 to the last element of the array. It works because of the order of evaluation: you dereference the third pointer in the array 'matrix' first, and then address the second element of THAT array. No matter how you define your array, you can use the same syntax to manipulate it.
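Here is a small sketch of the pointer-table form, with illustrative names (rows, set_via_pointers, last_of_r2) that are not from the robot code. The indexing syntax is identical, but the first index loads an address from the table before the second index is applied:

```c
#include <stddef.h>

/* Three independent rows; nothing forces them to be adjacent in memory. */
static int r0[2], r1[2], r2[2];

/* The "matrix" is really a table of three row addresses. */
static int *rows[3] = { r0, r1, r2 };

/* rows[row] loads a pointer first, then [col] indexes the row it points at. */
void set_via_pointers(size_t row, size_t col, int value)
{
    rows[row][col] = value;
}

/* Look at r2 directly to confirm the write landed in the separate row. */
int last_of_r2(void)
{
    return r2[1];
}
```

After set_via_pointers(2, 1, -6), the value shows up in r2[1], even though r2 was declared on its own and never mentioned in the assignment.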

So far so good, but here comes the tricky part.

When you pass an array as an argument to a function, you effectively pass it by reference: the array is never copied, and the function receives a pointer to its first element. For one dimensional arrays, it doesn't matter if you use:

void func(int * array);

or

void func(int array[]);

in either case, what is being passed to the function is a pointer to the start of the data block.
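To illustrate, here is a sketch with two hypothetical functions (sum_ptr and sum_arr) that differ only in how the parameter is spelled. Both receive a plain pointer, which is also why the length has to travel as a separate argument:

```c
#include <stddef.h>

/* Parameter spelled as a pointer. */
int sum_ptr(const int *array, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += array[i];
    return total;
}

/* Parameter spelled as an array; the compiler treats it as const int *. */
int sum_arr(const int array[], size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += array[i];
    return total;
}
```

Either function can be called with the same array and will give the same answer.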

People are also introduced to the concept of a "string as an array of characters", so an array of strings gets the type "pointer to a pointer to char" (char **). Because an array of strings really is an array of pointers, each one pointing to its own null terminated string, this will work fine and you shouldn't receive any warnings.

All this leads to the mentality that arrays and pointers are interchangeable - but this is only sometimes true. This is exactly the trap that I fell into when I declared this function:

void runAction(unsigned char ** flashpos, ...);

Officially, this is not a problem as long as I dereference the variable flashpos correctly - after all, it is pointing to the start of an array somewhere. It points to the start of a sequential Nx16 table of chars. As long as I use the correct number of dereference operators in the right order, I should be able to access its delicious chocolatey elements.

Hence, there was no compiler error.

What I got instead was the warning: "Incompatible data types in arg 1 of blah blah blah."

Warnings are usually thrown when the compiler is letting you know that you might be doing something you didn't intend, but it's not something illegal so it figures you know what you're doing.

Foolishly, I assumed that the warning was being raised due to a signed vs unsigned conflict and went ahead and cast the argument being passed to an (unsigned char**).

Then, I proceeded to access the data like the multidimensional array that it was:

Robobob went haywire. Can anyone see what went wrong?

The problem was that the compiler didn't know how many columns the table had. It's the equivalent of me giving you a book and asking for the 4th page in the 10th column. It doesn't make any sense unless you know how to arrange the pages!

You see, it would have worked fine if I had just used the form:

But in the absence of the necessary metadata, the compiler had attempted to use the second form of referencing described above - i.e., the jth byte pointed to by the ith pointer in the variable flashpos. Which of course sent it searching for arbitrary values all over the place.
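The following sketch tries to show where those arbitrary values come from. It assumes MOTORS is 16 (the tables are described as Nx16) and uses a made-up helper, bogus_row_address. It copies out the bytes that flashpos[row] would have loaded under the unsigned char ** interpretation: a pointer-sized chunk of the table's own data, which then gets used as an address. memcpy is used so that the sketch itself stays well defined instead of repeating the bad cast:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MOTORS 16  /* assumed row width, from the Nx16 description */

/* Returns the "row address" that p[row] would yield if the table were
   (wrongly) treated as unsigned char **p. The bytes come straight from
   the table contents, not from any real pointer. */
uintptr_t bogus_row_address(const unsigned char table[][MOTORS], size_t row)
{
    uintptr_t bits = 0;
    const unsigned char *raw = (const unsigned char *)table;
    memcpy(&bits, raw + row * sizeof(unsigned char *), sizeof(unsigned char *));
    return bits;
}
```

With a table full of zeros, every "row" comes back as address 0, and with real servo positions in the table it comes back as whatever number those position bytes happen to spell.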

After I realised what was going on, all I had to do was simply change the declaration to:

void runAction(unsigned char flashpos[][MOTORS], ...);

This tells the compiler the element type and row width of the multidimensional array being referenced.
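A stripped-down sketch of a function using that form of declaration might look like this. read_position is a hypothetical stand-in rather than the real runAction, MOTORS is assumed to be 16, and the table is read from ordinary memory so that it compiles anywhere:

```c
#include <stddef.h>

#define MOTORS 16  /* assumed: the tables are described as Nx16 */

/* Hypothetical stand-in for runAction: returns one entry of the table. */
unsigned char read_position(const unsigned char flashpos[][MOTORS],
                            size_t step, size_t motor)
{
    /* The compiler knows each row is MOTORS bytes wide, so it works out
       step * MOTORS + motor by itself. */
    return flashpos[step][motor];
}
```

On the actual chip the table lives in PROGMEM, so the read would presumably go through something like pgm_read_byte(&flashpos[step][motor]) from avr/pgmspace.h rather than a plain array access.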

Always remember that warnings are there for a reason! Maybe you do know best, but the compiler is letting you know that you've done something ambiguous or potentially dangerous.