Dynamic arrays in C

(NOTE: This code has been rolled up into the open-source libxc – see https://sourceforge.net/p/libxc/wiki/Home/. While you can still use this code as is, it’s recommended to use libxc instead which has the same functionality and more).

Dynamic arrays are remarkably easy to implement in plain C with two caveats:
1. No destructors, so the user of the array has to free the array elements one item at a time (not such a problem!)
2. No copy constructors, so the user of the array has to ensure that structures with pointers are copied correctly.

I implemented the dynamic array by simply storing an array of void pointers, and only storing pointers to elements (a pointer to void can point to anything), hence this lets me have a dynamic array that can store elements of anything, even elements of different types (although this is a bad idea – unreadable code is not the goal!). You can download the code that implements this as a library.

The “array” is simply a struct like so:

struct rsarray_t {
   size_t alloc_len;   /* Store the number of allocations */
   size_t used_len;   /* Store the of cells we have thus far used */
   void **data;        /* The actual array of void pointers */
};

There are no new_array or create_array functions that the user must call. All the user has to call to append a pointer onto the end of the array is rsarray_ins_tail(array, element), where element is the new pointer to be appended to array, which is of type struct rsarray_t*.

If array is NULL, then an array is created and returned with element as the first and only element of the array (which is why there is no need for an “array_create” function). If array is not NULL, then the element is appended to the array using realloc if there is not enough space to store another element. Usage is dead-easy:

struct rsarray_t *array;
int a =0, b = 1, c = 2;
array = rsarray_ins_tail (NULL, &a);
array = rsarray_ins_tail (array, &b);
array = rsarray_ins_tail (array, &c);
/* array now contains pointers to a, b and c */

You may not always want to store many references to the variable, though. For example, at this point in the code we may need to modify the values of a, b and c and we don’t want those modifications to take effect on the values already stored in the array. To this end, I provide a function rsa_dup, which will make a duplicate of any object passed to it (with the size of the object specified by the user). The above code can then be written as:

int a = 0, b = 1, c = 2;
array = rsarray_ins_tail (NULL, rsa_dup (&a, sizeof a));
array = rsarray_ins_tail (array, rsa_dup (&b, sizeof b));
array = rsarray_ins_tail (array, rsa_dup (&c, sizeof c));
/* array now contains copies of a, b and c */
a = 42; /* Element 0 of array is still 0 */
/* Change the value of b */
*(int *)array->array[1] = 43;

The above is still a little dirty. The duplication of the variable in the insertion is bound to lead to problems, and the verbose semantics of accessing an element (seen in the last line) is simply unreadable. They can both be fixed with macros:

/* unsafe macro, 'x' should not have side-effects */
#define RSA_INS_TAIL(a,x) \
   rsarray_ins_tail (a, rsa_dup (&x, sizeof x))
#define RSA_INDEX(a,i,t)   \
   *(t *)a->data[i]
...
int a = 0, b = 1, c = 2;
array = RSA_INS_TAIL (NULL, a);
array = RSA_INS_TAIL (array, b);
array = RSA_INS_TAIL (array, c);
/* array now contains copies of a, b and c */
a = 42; /* Element 0 of array is still 0 */
/* Access the second element i.e. array[1] */
RSA_INDEX (array, 1, int) = 43;

All well and good, how do we iterate? Easy, again.

/* unsafe macro, 'x' should not have side-effects */
#define RSA_INS_TAIL(a,x) \
   rsarray_ins_tail (a, rsa_dup (&x, sizeof x))
#define RSA_INDEX(a,i,t)   \
   *(t *)a->data[i]
#define RSA_LENGTH(a)    \
   a->used_len
...
struct rsarray_t *array = NULL;
int i;
/* Insert ten elements into array */
for (i=0; i<10; i++) {
   array = RSA_INS_TAIL (array, i);
}
/* Change all the values stored in the array */
for (i=0; i<RSA_LENGTH; i++) {
    RSA_INDEX (array, i, int) = 42 + i;
}

Freeing is as simple as freeing each element (because you are certain that there are no other references to it – the memory was allocated when we appended elements using rsa_dup, so no other code should be sharing the pointer), so we can simply iterate over the array and free each of the void*. In fact, since iteration over the array is a common occurrence, we can just go ahead and create a function to do it for us, so that we never have to write the actual loop:

void rsarray_iterate (rsarray_t *a, void (*f) (void *))
{
   size_t i;
   for (i=0; iused_len; i++) {
      f (a->data[i]);
   }
}
...
struct rsarray_t * array = NULL;
[... here we append to the array ...]
/* Now we free each element */
rsarray_iterate (array, free);
/* Free the array that held all the elements */
free (array->array);
/* Free the structure that held the array */
free (array);

Because this isn’t rare either, we can put the above 3 lines into a function call (for example) rsarray_free (). Of course, once you’ve got that, you can use the resizable array to implement a host of other data structures. You can download the code that does this over here.

It was written about 8 years ago, and I’ve been using it every since (dynamic arrays are more useful than you think!). I go a lot further in the download above, as the download is actually written as a library, and uses the dynamic array as a basis to implement growable arrays, stacks and queues. Check the download for example usage, further avenues to explore, for example creating a ring-buffer, or implementing copy-construction so that composite datatypes consisting of pointers are sensibly copied when an insertion is made.

5 thoughts on “Dynamic arrays in C

  1. Hi,
    A few notes over your code:
    1) in checkspace you can safely use realloc for both cases (and actually, make it just one case) because realloc is guaranteed to act like malloc when passed a null pointer
    2) I would make a smarter increment – usually reallocations cost quite a lot because memory allocations cost a lot and resizing may mean copying the previous data. For that, a better way to increment is to double the allocated size. It’s also better because chunks of the power of two size can be easier to allocate (if the allocator uses a buddy system for allocation).
    3) Find an alternative on systems that don’t have realloc. Some embedded systems have it, some don’t.
    4) For large arrays working at the head of the array may be costy. One simple solution is to implement the storage as circular list in the preallocated space. This way adding to the front of the list becomes a constant time + (possible) reallocation time.
    I hope you don’t mind the suggestions.

  2. Thanks Dorin, all very good suggestions. I was younger (and stupider) when I first wrote this, and have in fact made some changes to it since. However, its got a liberal license, so feel free to change it and redistribute.

    • I wouldn’t say stupider. It’s very cool what you did, especially publishing your code with explanations – I am pretty sure it was helpful for some people. :)

  3. Fantastic website. Lots of useful info here. I’m sending it to some friends ans also sharing in delicious. And naturally, thanks for your sweat!

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Powered by sweet Captcha