Array Bounds Checking in C

Most languages (Java, C#, Python, JavaScript) prevent the programmer from going past the end of an array. This check, performed at runtime by the language implementation, is usually called bounds checking.

Unfortunately, C gives the programmer no general way to determine the size of an array at runtime (not quite true, as I’ll show below), so it falls to the programmer to keep track of the length of the array. In other languages, one can declare an array and then ask for its length at runtime. For example, in JavaScript the programmer can get the number of elements from the array’s length property:

for (var i=0; i<array.length; i++) {
   // Do something with array[i]
}

In C the programmer is required to keep track of the length of the array. Indexing outside the array is undefined behavior: it may result in a segfault or protection fault, or it may silently corrupt memory. Most programmers define the array length somewhere.

#define ARRAY_LENGTH     (25)
for (size_t i=0; i<ARRAY_LENGTH; i++) {
   // Do something with array[i]
}

This becomes a problem if the array length changes and the programmer forgets to update the #define ARRAY_LENGTH to match. A quick fix is to use the sizeof operator as follows:

int array[] = {10, 20, 30, 40, 50, 60, 70, 80};
for (size_t i=0; i<sizeof array / sizeof array[0]; i++) {
   // use array[i]
}

(sizeof array) evaluates to the total number of bytes that the array occupies, and (sizeof array[0]) to the number of bytes that a single element occupies. Dividing the first by the second gives you the number of elements in the array. This works for arrays of any element type, including pointers, function pointers, structs, etc.

struct foo_t {
   int a; float b; char *c;
};
struct foo_t array[] = {
   {10, 12.12, "First Item"},
   {20, 23.34, "Second Item"},
   {30, 34.12, "Third item"},
   {40, 54.32, "Fourth Item"},
};
for (size_t i=0; i<sizeof array / sizeof array[0]; i++) {
   // Use array[i].a and array[i].b and array[i].c
}
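
The sizeof division is often wrapped in a macro so the expression is not repeated at every loop. The following is only a sketch: the names ARRAY_LEN, values and sum_values are made up for illustration and do not come from any standard header.

```c
#include <stddef.h>

// Hypothetical helper macro: number of elements in a true array.
// Only meaningful where `a` is still an array, not a pointer.
#define ARRAY_LEN(a) (sizeof (a) / sizeof (a)[0])

static const int values[] = {10, 20, 30, 40, 50, 60, 70, 80};

// Sum every element of `values`. The loop bound follows the
// initializer list, so adding or removing items needs no other change.
long sum_values (void)
{
   long total = 0;
   for (size_t i=0; i<ARRAY_LEN(values); i++) {
      total += values[i];
   }
   return total;
}
```

Because sizeof is applied to the array itself, the macro gives the right answer only in the scope where the array is declared.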

The problem comes with function calls, when you pass your array as an argument to another function. In C an array decays to a pointer to its first element when passed as an argument, so the called function only gets that pointer. For example, in the following snippet func_foo() receives array as a pointer to a struct foo_t, hence the sizeof trick I used above will not work:

struct foo_t {
   int a; float b; char *c;
};
struct foo_t array[] = {
   {10, 12.12, "First Item"},
   {20, 23.34, "Second Item"},
   {30, 34.12, "Third item"},
   {40, 54.32, "Fourth Item"},
};

void func_foo (struct foo_t array[])
{
   // (sizeof array) = the size of a pointer to a struct foo_t
   // (4 on a typical 32-bit platform)
   // (sizeof array[0]) = the size of a struct foo_t (12 on the
   // same platform)
   // Therefore the length will be calculated as (4 / 12), which
   // is zero using integer division. In this case, the loop won't
   // run at all!
   for (size_t i=0; i<sizeof array / sizeof array[0]; i++) {
      // Use array[i].a and array[i].b and array[i].c
   }
}
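
For comparison with the workaround discussed next, the most direct fix is to compute the element count in the caller, where array is still a real array, and hand it to the function as a second argument. A minimal sketch, assuming a hypothetical function sum_a() that is not part of the original code:

```c
#include <stddef.h>

struct foo_t {
   int a; float b; char *c;
};

// Hypothetical variant of func_foo() that receives the element count
// explicitly, since sizeof cannot recover it from the pointer.
int sum_a (const struct foo_t *items, size_t count)
{
   int total = 0;
   for (size_t i=0; i<count; i++) {
      total += items[i].a;
   }
   return total;
}
```

The caller would invoke it as sum_a (array, sizeof array / sizeof array[0]), which only works at a point where array has not yet decayed to a pointer.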

The problem is that a parameter of the form (type name[]) is turned into (type *name). A common workaround is to terminate all your arrays with a special sentinel value. If you have an array of integers, use INT_MAX (from limits.h) as the special value to signify end of array. For all pointers use NULL. If NULL can legitimately appear in your array, use some other distinguished pointer value instead, such as the address of a dedicated dummy object; (NULL - 1) is tempting but is not portable C. Thus your array iteration code will look like this:

int array[] = {10, 12, 13, 14, 15, INT_MAX};
for (size_t i=0; array[i]!=INT_MAX; i++) {
   // Use array[i]
}

Same with pointers of any type:

char *array[] = {
   "One", "Two", "Three", "Four", NULL,
};
for (size_t i=0; array[i]; i++) {
   // Use array[i]
}
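
The same loop can be packaged as a helper that recovers the length from the terminator. This is a sketch with a hypothetical function name, count_strings(), and it assumes the caller really did append the NULL.

```c
#include <stddef.h>

// Hypothetical helper: count the entries before the NULL terminator.
// The caller must guarantee the terminator is present; without it the
// loop reads past the end of the array.
size_t count_strings (char *const *list)
{
   size_t n = 0;
   while (list[n] != NULL) {
      n++;
   }
   return n;
}
```

For the array above, count_strings (array) yields 4, because the NULL itself is not counted.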

It even works across function calls: you can pass this array to functions and not have to worry about passing along a separate length value. The function has to know how to check for end-of-array.

struct foo_t {
   int a; float b; char *c;
};
struct foo_t array[] = {
   {10, 12.12, "First Item"},
   {20, 23.34, "Second Item"},
   {30, 34.12, "Third item"},
   {40, 54.32, "Fourth Item"},
   {0, 0.0, NULL},
};

void func_foo (struct foo_t array[])
{
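   // Because each element is a struct, the best convention to
   // represent end-of-array is to use all NULL or zero values
   // for all fields. Keep going while any field is non-zero, so
   // that a real element with a == 0 does not end the loop early.
   for (size_t i=0; array[i].a || array[i].b || array[i].c; i++) {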
      // Use array[i].a and array[i].b and array[i].c
   }
}

Obviously for that last code snippet you can make your convention be “The last field is NULL”, or “All the fields are NULL or Zero”. Personally, I use NULL wherever possible.
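
As an illustration of the “last field is NULL” convention, here is a sketch of a helper that looks only at the c field. The name foo_count() is hypothetical. Checking a single field means the other fields are free to hold zero in real elements.

```c
#include <stddef.h>

struct foo_t {
   int a; float b; char *c;
};

// Hypothetical helper using the "c == NULL marks the end" convention.
// Returns the number of real elements before the terminator.
size_t foo_count (const struct foo_t *items)
{
   size_t n = 0;
   while (items[n].c != NULL) {
      n++;
   }
   return n;
}
```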

The only drawback is that no real element of the array can have the same value as your end-of-array terminator. I haven’t run into this problem in the roughly 15 years I’ve been doing this, so you should be safe using this convention.
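
If you ever do hit that collision with an array of pointers, one possible way out is to terminate the array with the address of a dedicated object instead of NULL. The sketch below assumes hypothetical names (end_marker, LIST_END, count_entries) and is not something the convention above requires.

```c
#include <stddef.h>

// Hypothetical terminator object. Its address differs from NULL and
// from every other object, so it can end a list that contains NULLs.
static char end_marker;
#define LIST_END (&end_marker)

// Count the entries before LIST_END; NULL entries count as real data.
size_t count_entries (char *const *list)
{
   size_t n = 0;
   while (list[n] != LIST_END) {
      n++;
   }
   return n;
}
```

An array such as {"One", NULL, "Three", LIST_END} then has three entries, with the NULL treated as ordinary data.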