“A good programmer is someone who looks both ways before crossing a one-way street.” – Doug Linder
If you’ve read the original post on “Parse, Don’t Validate” you may have noticed that it focuses primarily on conceptual correctness. Here, I’ll build on that by showing how this technique can be used outside of niche academic languages by demonstrating it in a language that is as practical as it is dangerous - C.
In this blog post you will see three techniques for reducing the risk of exploitable errors in C.
The basic idea is this:
Your first instinct, when your system receives an email address as input (for example), is to perform validateEmail(untrustedInput) and then pass the validated string further into the depths of the system for usage.
The problem is that code deep within the rest of the system cannot tell whether the string it just received has already been validated, so every function that processes it still needs to validate it first.
I’ll bet good money that the processing functions will attempt to validate their input. Because they’re logically far away from the boundary, they’ll either do it a different way or fail to do it altogether.
So instead of this:
// Pseudocode
if (validateEmail(untrustedInput) != true) {
    return error;
}
// Rest of system uses `untrustedInput`
Do this:
// Pseudocode
email_t theEmail = parseEmail(untrustedInput);
if (theEmail == PARSE_ERROR) {
    return error;
}
// Rest of system uses `theEmail`
This removes any opportunity for errors to creep in within the rest of the system, such as some other code using a different validateEmail function on the untrustedInput, for example.
What does this have to do with C strings? Good question.
Much to the surprise of, well, everybody, C actually has type safety. Sure, it isn’t as enforceable as (for example) Rust… and, sure, if you are willing to do extra work you can bypass it, but, at the end of the day, the compiler will still warn you if you try to add a number to a string and assign the result to a function.
With some exceptions, when you mismatch types, the compiler will tell you about it.
The problem isn’t that C lacks type safety (it clearly enforces most types in most expressions), it’s that raw pointers do not encode semantics (e.g., a char * doesn’t tell you if it’s an email, a name, or a filename).
This is pretty much the same in every language; if you have a function store_user() which accepts two strings, an email and a user name, then no type safety in the world is going to save you if you accidentally swap the arguments around when calling the function.
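To make the swapped-argument problem concrete, here is a minimal sketch. The `store_user` body, the `last_email` and `last_name` variables, and `demo_swapped_call` are all hypothetical; they exist only so the effect of the mistake can be observed.

```c
#include <stddef.h>

// Hypothetical sink: remembers what it was given so we can inspect it.
static const char *last_email;
static const char *last_name;

void store_user (const char *email, const char *name)
{
    last_email = email;
    last_name = name;
}

void demo_swapped_call (void)
{
    const char *name = "Alice";
    const char *email = "alice@example.com";

    // The arguments are in the wrong order, yet the call is perfectly
    // valid C: both parameters are plain `const char *`.
    store_user (name, email);
}
```

After `demo_swapped_call()` runs, `last_email` points at "Alice" and `last_name` points at the email address. Nothing in the type system had a chance to object.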
But, you still have options – even in C – by creating new string types; one for email and another for user name.
When writing in C, instead of passing char * around as strings, or (safer, but not by much) using an existing string library that stores length + buffer, create an opaque type for each kind of value.
You parse the input into the correct type once, and then functions which accept that type will produce a compile error if you mix things up.
To create opaque types, read this post.
When you create the correct types for data entering the system, you can then do this:
// C code, not pseudocode
email_t *email = parse_email(untrusted_input);
if (!email) {
    // Handle error
}
In addition to the safety from using opaque types, there are even more levels of safety here, starting with:
You remove all occurrences of char * values from the interior of your system. The only remaining occurrences of char * values are forced to the boundary of your system, where all input is untrusted anyway!
This is not only for char * types though; you can do it to all values entering your system. You parse them once into the correct data type, and then code deep in the belly of the system becomes much harder to compromise with malicious input, because the only data that the rest of the system will see is data that has been parsed into specific types.
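As an illustration of applying the same idea to a non-string value, here is a minimal sketch that parses an untrusted port number. The `port_t` type and `port_parse` function are hypothetical names invented for this example, and the accepted range (1 to 65535) is an assumption about what the rest of the system wants.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical wrapper type: a port number that has already been checked.
typedef struct {
    uint16_t value;
} port_t;

// Parses a decimal port number in the range 1..65535.
// Returns true and fills `*out` on success, false otherwise.
// Note: strtol() tolerates leading whitespace and a sign; a stricter
// parser would reject those too.
bool port_parse (const char *untrusted, port_t *out)
{
    if (!untrusted || !out)
        return false;

    char *end = NULL;
    errno = 0;
    long n = strtol (untrusted, &end, 10);

    // Reject input with no digits, trailing garbage, or overflow.
    if (end == untrusted || *end != '\0' || errno == ERANGE)
        return false;

    // Reject values outside the assumed valid range.
    if (n < 1 || n > 65535)
        return false;

    out->value = (uint16_t)n;
    return true;
}
```

Functions further in would then take a `port_t` rather than an `int` or a `char *`, so they can only ever be handed a value that went through `port_parse`.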
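When your functions never accept char * parameters your risk of pwnage is reduced. By leveraging the typing guarantees in C, you can ensure that the compiler complains when some heretic decides that they want to pass a char * to a function expecting an email_t *. If your compiler only warns about incompatible pointer types by default, promote that warning to an error (for example with -Werror=incompatible-pointer-types in GCC or Clang) so that the system won’t compile.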
Only the functions on the boundary of the system, interfacing to the outside world, should parse input. Everything else should accept only type-checkable parameters.
That alone is a big reason to use this approach, but I’ll point out two more opportunities to reduce the attack surface of your system using an actual compilable example consisting of separate compilation units. The untrusted inputs, email and name, come from outside the system.
First, the header declaring your custom types:
// callee.h
typedef struct email_t email_t;
typedef struct name_t name_t;

#ifdef __cplusplus
extern "C" {
#endif

email_t *email_parse (const char *untrusted);
name_t *name_parse (const char *untrusted);

// Additional tip: letting the callee set the caller's pointer to NULL
// when the value is freed prevents double-frees
void email_del (email_t **email);
void name_del (name_t **name);

#ifdef __cplusplus
}
#endif
Then, the implementation:
// callee.c
#include <string.h>
#include <stdlib.h>
#include "callee.h"

struct email_t {
    // In a real program, you might want to store the two components
    // of the email address (before and after the `@`) separately.
    // This example simply copies the input.
    char *email;
};

struct name_t {
    char *name;
};

email_t *email_parse (const char *untrusted)
{
    if (!untrusted)
        return NULL;

    email_t *ret = malloc (sizeof *ret);
    if (!ret)
        return NULL;

    // In a real program, you'll parse this correctly.
    // Note: strdup() is POSIX (and standard C as of C23).
    if (!(ret->email = strdup (untrusted))) {
        free (ret);
        ret = NULL;
    }
    return ret;
}

name_t *name_parse (const char *untrusted)
{
    if (!untrusted)
        return NULL;

    name_t *ret = malloc (sizeof *ret);
    if (!ret)
        return NULL;

    if (!(ret->name = strdup (untrusted))) {
        free (ret);
        ret = NULL;
    }
    return ret;
}

void email_del (email_t **email)
{
    if (email && *email) {
        free ((*email)->email);
        free (*email);
        *email = NULL;
    }
}

void name_del (name_t **name)
{
    if (name && *name) {
        free ((*name)->name);
        free (*name);
        *name = NULL;
    }
}
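The implementation above copies its input without inspecting it. As a rough sketch of what a first check inside email_parse might look like before the copy is made, consider the helper below. The name `email_looks_plausible` is hypothetical, the 254-byte cap is an arbitrary limit chosen for this sketch, and the rules are far looser than the real email address grammar.

```c
#include <stdbool.h>
#include <string.h>

// Hypothetical helper, not part of the original example.
// Accepts only strings that have exactly one '@', a non-empty part on
// each side of it, and no whitespace or control characters.
bool email_looks_plausible (const char *s)
{
    if (!s)
        return false;

    // Shortest acceptable form is "a@b"; the upper bound is an
    // arbitrary cap for this sketch.
    size_t len = strlen (s);
    if (len < 3 || len > 254)
        return false;

    // Exactly one '@', not at the start and not at the end.
    const char *at = strchr (s, '@');
    if (!at || at == s || at[1] == '\0')
        return false;
    if (strchr (at + 1, '@'))
        return false;

    // No spaces, control characters, or DEL anywhere in the string.
    for (const char *p = s; *p; p++) {
        unsigned char c = (unsigned char)*p;
        if (c <= ' ' || c == 0x7f)
            return false;
    }
    return true;
}
```

Because the check would live inside email_parse, it runs exactly once per value, and any code holding an email_t * can rely on it having passed.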
And, of course, the caller:
#include <stdio.h>
#include <stdbool.h>
#include <stdlib.h>
#include "callee.h"

void store_record_old (char *email, char *name)
{
    // Do something with the parameters here
    (void)email;
    (void)name;
}

void store_record_new (email_t *email, name_t *name)
{
    // Do something with the parameters here
    (void)email;
    (void)name;
}

bool rx_untrusted_input (char *untrusted_name, char *untrusted_email)
{
    email_t *email = email_parse (untrusted_email);
    name_t *name = name_parse (untrusted_name);
    if (!email || !name) {
        email_del (&email);
        name_del (&name);
        return false;
    }

    // Whoops - we accidentally specified the parameters in the wrong order!
    // Compiler cannot tell that this is a mistake!
    store_record_old (untrusted_name, untrusted_email);

    // Same mistake with opaque types, but now the compiler catches it!
    // error: incompatible pointer types passing 'name_t *' to parameter of type 'email_t *'
    store_record_new (name, email);
    return true;
}
There is now literally no way for any non-boundary code in your system to accidentally use an email value in place of a name value.
This is a practical way of hardening your system against attacks: Parse, Don’t Validate.
Another one is shown in the code snippet above - your “destructor” functions which free a value should always be written to take the address of a pointer to that value.
Why, you ask? It’s because then the destructor function can set the pointer at the caller’s location to NULL, so even if a caller accidentally calls the email_del() destructor function twice on the same pointer, nothing will happen the second time around.
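Here is the destructor pattern in isolation, as a minimal self-contained sketch. The `buf_t` type and the `buf_new` and `buf_del` functions are hypothetical stand-ins for email_t and email_del().

```c
#include <stdlib.h>

// Hypothetical type used only for this sketch.
typedef struct {
    int *data;
} buf_t;

buf_t *buf_new (void)
{
    buf_t *b = malloc (sizeof *b);
    if (!b)
        return NULL;

    b->data = calloc (16, sizeof *b->data);
    if (!b->data) {
        free (b);
        return NULL;
    }
    return b;
}

// Takes the address of the caller's pointer so that the pointer can
// be cleared once the value is freed.
void buf_del (buf_t **b)
{
    if (b && *b) {
        free ((*b)->data);
        free (*b);
        *b = NULL;
    }
}
```

A caller that writes `buf_del (&b);` twice in a row frees the value once; the second call sees a NULL pointer and does nothing. The protection only covers the pointer variable that was passed in, though: any other copies of the pointer held elsewhere are still dangling after the first call.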
And finally, the last upside: with different type names for different kinds of values, a caller who accidentally swaps the parameters around in a function call gets a diagnostic from the compiler, even though the two types are identical under the hood!
By applying Parse, Don’t Validate, you gain three benefits:
encapsulation & safety
- Raw char * strings are only handled at system boundaries, preventing misuse.
- Strong types (email_t, name_t, etc.) ensure data is structured correctly from the start.
reduced attack surface
- Untrusted input is immediately transformed into safe, structured data.
- Functions deep in the system never deal with unvalidated input, reducing risk.
compiler-enforced type safety
- Accidentally swapping parameters (e.g., passing name instead of email) becomes a compile-time error rather than a runtime bug.
- Functions expect well-defined types, preventing unexpected behavior.
By leveraging the typing guarantees we eliminate entire classes of bugs while making the code more robust and maintainable. Instead of just checking values for correctness, we parse them once and then the compiler enforces some typing guarantees for us.