Implementing Opaque Datatypes


Posted by Lelanthran
2017-12-09

An opaque data type enforces encapsulation by preventing the internal details of the data type from being revealed to the caller. This article demonstrates the general implementation of an Opaque Data Type.

The opaque data type [1] is a composite data type whose fields are hidden from the caller. Private fields in a C++ class are still visible to the user in the class definition, even though the user cannot access them; an opaque data type prevents the user from even seeing the fields. The only way for the user to see them is for the implementation to be distributed in source form.

Enforcing the inaccessibility of private fields within a composite data type has a number of benefits. It prevents Leaky Abstractions, enforces encapsulation, simplifies maintenance and encourages unit-testing by forcing the data type into a single unit.

This article implements a simple data type to represent a user’s login details for some hypothetical system. Each user_t datatype will have at least the following fields:

   - username
   - email
   - salt
   - phash (hash of the password and salt)

The operations available to the caller will be the following:

   - create
   - delete
   - set_name and get_name
   - set_email and get_email
   - set_salt and get_salt
   - set_phash and get_phash

Firstly, we write a header file called user.h containing the public definition of the user data type:

   typedef struct user_t user_t;

All that does is inform the compiler that a struct user_t (to be defined further on) exists. At this point all the compiler knows is that the structure will exist. In the same header we list the operations. For now we’ll stick only to creation/destruction of the user_t object.

   user_t *user_create (void);
   void user_delete (user_t *user);

The compiler can, and will, happily pass around pointers to a user_t object even if it doesn’t yet know what the actual structure looks like. The caller (user) of this data type will be unable to access the internal fields of the structure but will be able to pass around pointers to the structure. You can pass around, declare and assign to pointers all day long without needing anything more than a typedef to tell the compiler that it is a type, but if you want to dereference the pointer to get at the actual object then the compiler is going to need the actual type to dereference.
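As a minimal sketch of that distinction, the following uses a hypothetical widget_t type (not part of the user module) that is declared but never defined. Everything that only handles the pointer compiles; the operations listed in the final comment would not.

```c
#include <stddef.h>

/* Hypothetical type, for illustration only. It is declared here but its
 * fields are never defined in this file, so it stays incomplete. */
typedef struct widget_t widget_t;

/* Comparing a pointer against NULL needs no knowledge of the fields. */
int widget_is_null (const widget_t *w)
{
   return w == NULL;
}

/* Declaring a pointer, assigning to it and returning it are all fine. */
widget_t *widget_same (widget_t *w)
{
   widget_t *copy = w;
   return copy;
}

/* Neither of the following would compile at this point, because the
 * compiler knows neither the size nor the fields of widget_t:
 *
 *    size_t n = sizeof (widget_t);
 *    w->field = 0;
 */
```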

Right now we only have a header file, user.h, but the actual implementation of the user_t data type will be in an implementation file called user.c and to complete the module properly we’ll also create a program to test our new data type, main.c.

In fact, now is a good time to write the unit-testing program. A program that tests the user data type will make it easier to test each piece as we develop it. Our main program, main.c, looks like this:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#include "user.h"

bool user_test (void)
{
   // Note that we can declare a pointer to a structure even though we
   // don't know what the structure looks like internally.
   user_t *user;

   user = user_create ();

   // We can also pass around the pointer to the structure even though the
   // structure definition itself is yet to be created.
   user_delete (user);

   return true;
}

int main (void)
{
   bool result = user_test ();

   printf ("User Test: %s\n", result ? "passed" : "failed");

   return result ? EXIT_SUCCESS : EXIT_FAILURE;
}

Compiling that main program results in a linkage error; this is because we only declared those two functions to the compiler in the header, effectively telling the compiler “Hey GCC, these two functions exist”, but we didn’t actually create them.

   $ gcc -o main.elf main.c
   /tmp/ccsLSBeA.o: In function `user_test':
   main.c:(.text+0x9): undefined reference to `user_create'
   main.c:(.text+0x19): undefined reference to `user_delete'
   collect2: error: ld returned 1 exit status

To fix the above errors we need to create the two functions in the user.c implementation file.

#include <stdlib.h>
#include <string.h>

#include "user.h"

user_t *user_create (void)
{
   user_t *ret;

   // It is always safer to use sizeof on the dereferenced pointer than to
   // use sizeof on the type. After all, the type might change during
   // maintenance and if it does the poor maintainer will have to hunt
   // through the code to change all code from "sizeof (old_type)" to
   // "sizeof (new_type)". Using "sizeof *var" will be correct no matter
   // what type "var" is.
   ret = malloc (sizeof *ret);
   if (ret)
      memset (ret, 0, sizeof *ret);

   return ret;
}

void user_delete (user_t *user)
{
   free (user);
}

The above still won’t compile, because sizeof *ret requires the complete definition of struct user_t and so far the compiler has only seen its declaration.

    $ gcc -o user.o user.c
    user.c: In function ‘user_create’:
    user.c:18:25: error: dereferencing pointer to incomplete type ‘user_t
    {aka struct user_t}’
        ret = malloc (sizeof *ret);

While the compiler will allow us to use pointers to undefined structures, it cannot dereference those pointers [2], nor can it figure out the size of the structure, because it does not know what fields exist in the structure!

To fix the error we need to define the structure user_t so that the compiler can determine the size of the structure and can dereference pointers to that structure. The definition of the structure user_t must be placed in the implementation file user.c:

// This must go in user.c before any of the functions are defined.
struct user_t {
   char *name;
   char *email;
   char *salt;
   char *phash;
};

Also, now that we have fields within the structure we should ensure that those fields are initialised when we create an instance of the structure; luckily we already did this with a memset() when we created the instance of the structure, effectively setting all the fields to NULL. However we still need to ensure that the destruction of the structure using user_delete also frees up all the fields. We change the user_delete function to this:

void user_delete (user_t *user)
{
   if (!user)
      return;

   free (user->name);
   free (user->email);
   free (user->salt);
   free (user->phash);
   free (user);
}
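As an aside on the memset() used in user_create: the C standard does not promise that a null pointer is represented by all-zero bytes, although this holds on the common platforms. A variant that avoids relying on it could look like the sketch below. This is an alternative I am suggesting, not the version used in the rest of this article, and the structure is repeated only so that the sketch compiles on its own.

```c
#include <stdlib.h>

typedef struct user_t user_t;

/* Repeated from user.c so that this sketch is self-contained. */
struct user_t {
   char *name;
   char *email;
   char *salt;
   char *phash;
};

user_t *user_create (void)
{
   user_t *ret = malloc (sizeof *ret);

   if (ret) {
      /* Members that are not named in the initialiser are set to null
       * pointers, whatever bit pattern the platform uses for them. */
      *ret = (user_t) { .name = NULL };
   }

   return ret;
}
```

The compound literal needs C99 or later.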

This time the implementation compiles and links properly:

 $ gcc -c -o user.o user.c
 $ gcc -c -o main.o main.c
 $ gcc user.o main.o -o main
 $ ./main
  User Test: passed

The main program (the “user” of the implementation) can create and destroy user_t objects without ever knowing what fields are contained within it. The only thing left to do now is to allow the caller to set and retrieve the fields in the user_t object without needing to know what the internal fields are. We do this by providing getter/setter functions in the user.h header. To keep things simple and short for this article I make a single function perform both the setting and the getting, with the setting of the field being optional.

In the user.h header file we add in the following function declarations:

   // Taking shortcuts here; instead of providing two separate functions
   // to get/set a field we provide a single function that always returns
   // the value of the field, and only sets the field if the provided
   // value is non-NULL.
   const char *user_name    (user_t *user, const char *name);
   const char *user_email   (user_t *user, const char *email);
   const char *user_salt    (user_t *user, const char *salt);
   const char *user_phash   (user_t *user, const char *phash);

In the user.c implementation file we add in the function definitions:

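static void reset_field (char **dst, const char *src)
{
   char *copy;

   if (!src)
      return;

   // Allocate the new copy before freeing the old value. If the
   // allocation fails the field keeps its previous value, and passing
   // the field's own current value as src remains safe.
   copy = malloc (strlen (src) + 1);
   if (!copy)
      return;

   strcpy (copy, src);
   free (*dst);
   *dst = copy;
}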

const char *user_name (user_t *user, const char *name)
{
   reset_field (&user->name, name);
   return user->name;
}

const char *user_email (user_t *user, const char *email)
{
   reset_field (&user->email, email);
   return user->email;
}

const char *user_salt (user_t *user, const char *salt)
{
   reset_field (&user->salt, salt);
   return user->salt;
}

const char *user_phash (user_t *user, const char *phash)
{
   reset_field (&user->phash, phash);
   return user->phash;
}
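The combined getter/setter above is a shortcut. A more conventional split into two functions might look like the following sketch, shown for the name field only. The names user_set_name and user_get_name and the bool return value are my assumptions for illustration; they are not part of the user.h developed in this article, and the structure is cut down to one field so that the sketch compiles on its own.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct user_t user_t;

/* Reduced to a single field to keep the sketch short. */
struct user_t {
   char *name;
};

/* Hypothetical setter: returns false, leaving the old value in place,
 * if an argument is NULL or the copy cannot be allocated. */
bool user_set_name (user_t *user, const char *name)
{
   char *copy;

   if (!user || !name)
      return false;

   copy = malloc (strlen (name) + 1);
   if (!copy)
      return false;

   strcpy (copy, name);
   free (user->name);
   user->name = copy;
   return true;
}

/* Hypothetical getter: it never modifies the object, so it can take a
 * pointer to const. */
const char *user_get_name (const user_t *user)
{
   return user ? user->name : NULL;
}
```

With this split the caller can tell whether a set actually succeeded, at the cost of twice as many functions in the header.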

Finally, we need to change the test program to check that this all works. In a real program you would test exhaustively; in this example I am only testing the happy path. Proper testing is a whole field in itself and is thus beyond the scope of this article. In the testing program in main.c we change the function user_test to:

bool user_test (void)
{
   // Note that we can declare a pointer to a structure even though we
   // don't know what the structure looks like internally.
   user_t *user;

   user = user_create ();

   // Set the values for this opaque data object
   user_name  (user, "Testing Name");
   user_email (user, "Testing root@localhost");
   user_salt  (user, "Random salt goes here");
   user_phash (user, "Hashed salt+password go here");

   // Use the values in the opaque data object.
   printf ("User name: %s\n", user_name (user, NULL));
   printf ("User email: %s\n", user_email (user, NULL));
   printf ("User salt: %s\n", user_salt (user, NULL));
   printf ("User phash: %s\n", user_phash (user, NULL));

   // Destroy the opaque data object; failure to do so will cause a memory
   // leak
   user_delete (user);

   return true;
}

The results of compiling everything and running the test program:

    $ gcc -c -o user.o user.c
    $ gcc -c -o main.o main.c
    $ gcc user.o main.o -o main
    $ ./main
    User name: Testing Name
    User email: Testing root@localhost
    User salt: Random salt goes here
    User phash: Hashed salt+password go here
    User Test: passed

Once again I must stress: in a real program you will want the test to be much more exhaustive than the user_test function here. Also, in a real-world program the opaque data type is likely to be much more complex. With a more complex data type the opacity helps even more, as it hides every field of the data type from the caller.
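To give a rough idea of what "more exhaustive" could mean, here is a sketch of a test for the name field that goes a little beyond the happy path. The function user_test_name and the helper same are hypothetical. The block also carries a cut-down stand-in for user.c so that it compiles on its own; in this article's layout the test would include user.h and link against user.o instead.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for user.c, reduced to the name field. */
typedef struct user_t user_t;
struct user_t {
   char *name;
};

user_t *user_create (void)
{
   user_t *ret = calloc (1, sizeof *ret);
   return ret;
}

void user_delete (user_t *user)
{
   if (!user)
      return;
   free (user->name);
   free (user);
}

const char *user_name (user_t *user, const char *name)
{
   if (name) {
      char *copy = malloc (strlen (name) + 1);
      if (copy) {
         strcpy (copy, name);
         free (user->name);
         user->name = copy;
      }
   }
   return user->name;
}

/* Hypothetical helper: NULL-safe string comparison. */
static bool same (const char *a, const char *b)
{
   return a && b && strcmp (a, b) == 0;
}

/* Hypothetical extended test for the name field. */
bool user_test_name (void)
{
   bool ok = true;
   char buf[] = "third";
   user_t *user = user_create ();

   if (!user)
      return false;

   /* A freshly created object has no name yet. */
   ok = ok && user_name (user, NULL) == NULL;

   /* Setting a value returns that value. */
   ok = ok && same (user_name (user, "first"), "first");

   /* Passing NULL reads the field without changing it. */
   ok = ok && same (user_name (user, NULL), "first");

   /* A second set replaces the first value. */
   ok = ok && same (user_name (user, "second"), "second");

   /* The object keeps its own copy, so changing the caller's buffer
    * afterwards must not affect the stored name. */
   user_name (user, buf);
   buf[0] = 'X';
   ok = ok && same (user_name (user, NULL), "third");

   user_delete (user);
   return ok;
}
```

Note that every check goes through the public functions only, which is all a caller of an opaque type can do anyway.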

What happens if the caller tries to access the fields anyway? Let’s find out by changing the user_test function to access the name field directly:

bool user_test (void)
{
   // Note that we can declare a pointer to a structure even though we
   // don't know what the structure looks like internally.
   user_t *user;

   user = user_create ();

   // Set the values for this opaque data object
   // user_name  (user, "Testing Name");
   // We try to access the field directly
   user->name = "Direct Access To Name Field";
   user_email (user, "Testing root@localhost");
   user_salt  (user, "Random salt goes here");
   user_phash (user, "Hashed salt+password go here");

   // Use the values in the opaque data object.
   printf ("User name: %s\n", user_name (user, NULL));
   printf ("User email: %s\n", user_email (user, NULL));
   printf ("User salt: %s\n", user_salt (user, NULL));
   printf ("User phash: %s\n", user_phash (user, NULL));

   // Destroy the opaque data object; failure to do so will cause a memory
   // leak
   user_delete (user);

   return true;
}

Now, with the direct access in main.c the compiler complains that the type is incomplete.

$ gcc -c -o main.o main.c
main.c: In function ‘user_test’:
main.c:19:8: error: dereferencing pointer to incomplete type ‘user_t {aka
struct user_t}’
    user->name = "Direct Access To Name Field";

So using an opaque type allows access to the fields of that type only within the implementation file (in this example, user.c). Trying to peek into the actual implementation is prevented by the compiler.

In general it is better to implement user-defined types as opaque data types. This enforces encapsulation, data hiding, black-box testing and a whole lot of other qualities shared by well-written software. It also eases testing; after any change, the test program for that single module can be run in isolation so that any change in behaviour is detected.

Just because you are using C does not mean that you cannot enforce strict encapsulation and data-hiding.


Footnotes


  1. See https://en.wikipedia.org/wiki/Opaque_data_type

  2. Note the comments on the usage of sizeof in user_create above. Whenever you use sizeof, prefer applying it to a variable rather than to a data type. There are very few occasions where you will need to apply the sizeof operator to a data type.