“The most amazing achievement of the computer software industry is its continuing cancellation of the steady and staggering gains made by the computer hardware industry.” – Henry Petroski
All the results below were the best of 10 consecutive executions.
You often need to time certain parts of your code. How accurate you need it to be depends on what you are timing, but for most purposes you don’t want the timing itself to contribute to any performance degradation.
Here is a little test to see how clock_gettime()
performs for each of the different counters.
#include <stdio.h>
#include <time.h>
int main (int argc, char **argv)
{static const struct {
clockid_t id;const char *s;
} clockids[] = {"CLOCK_REALTIME" }, // 0
{ CLOCK_REALTIME, "CLOCK_REALTIME_COARSE" }, // 1
{ CLOCK_REALTIME_COARSE, "CLOCK_MONOTONIC" }, // 2
{ CLOCK_MONOTONIC, "CLOCK_MONOTONIC_COARSE" }, // 3
{ CLOCK_MONOTONIC_COARSE, "CLOCK_MONOTONIC_RAW" }, // 4
{ CLOCK_MONOTONIC_RAW, "CLOCK_BOOTTIME" }, // 5
{ CLOCK_BOOTTIME, "CLOCK_PROCESS_CPUTIME_ID" }, // 6
{ CLOCK_PROCESS_CPUTIME_ID, "CLOCK_THREAD_CPUTIME_ID" }, // 7
{ CLOCK_THREAD_CPUTIME_ID,
};
void)argc;
(if (!argv[1]) {
"Need to specify a clock to use (0-7)\n");
fprintf (stderr, return -1;
}
size_t index = (size_t)-1;
if ((sscanf (argv[1], "%zu", &index) != 1) || index > 7) {
"Index %s is invalid\n", argv[1]);
fprintf (stderr, return -1;
}
"Testing %s\n", clockids[index].s);
printf (for (size_t j=0; j<5000 * 100000; j++) {
struct timespec ts;
clock_gettime (clockids[index].id, &ts);
}return 0;
}
Here’s the results, timed with the time
command for each of the clock counters available:
Counter | Duration |
---|---|
CLOCK_REALTIME |
10.805s |
CLOCK_REALTIME_COARSE |
4.180s |
CLOCK_MONOTONIC |
10.878s |
CLOCK_MONOTONIC_COARSE |
4.049s |
CLOCK_MONOTONIC_RAW |
10.724s |
CLOCK_BOOTTIME |
10.597s |
CLOCK_PROCESS_CPUTIME_ID |
227.224s |
CLOCK_THREAD_CPUTIME_ID |
225.883s |
COARSE
counters.CPUTIME_ID
counter.Everyone knows that allocating memory is expensive; in a tight loop you want to use as much of the stack or preallocated memory as you can. What you want vs what you get are two different things. The first program below is idiomatic C++, the second is not. The second program takes about 40% of the time that the first program does.
int main (void) {
std::string line;
for (int player_num=0; player_num<1000; player_num++) {
for (int points=0; points<100; points++) {
for (int moves=0; moves<100; moves++) {
line ="Player " + std::to_string(player_num) + " " +
"scored " + std::to_string(points) + " points " +
"in " + std::to_string(moves) + " moves." +
"\n";
std::cout << line << "\n";
}
}
}
return 0;
}
int main (void) {
char line[1024];
for (int player_num=0; player_num<1000; player_num++) {
for (int points=0; points<100; points++) {
for (int moves=0; moves<100; moves++) {
sizeof line, "Player %i scored %i points in %i moves\n",
snprintf (line,
player_num, points, moves);std::cout << line << "\n";
}
}
}
return 0;
}
Here is the result of the best of ten runs:
Program | Duration |
---|---|
std::string |
real 0m8.348s |
snprintf |
real 0m3.263s |
Unless your strings are very very tiny, simply using std::string
in a performance sensitive hot path is a really bad idea.
Sure, it’s nice when an application lets the user specify a regex as a search term, but exactly how much slower will it be when the search term is an exact string and not a pattern?
In other words, most users don’t know that the search term they are supplying is being parsed as a regular expression.
Here’s a little program that compare the execution speed of an exact match search and a regex search.
#define SEARCH_TYPE_NONE -1
#define SEARCH_TYPE_EXACT 1
#define SEARCH_TYPE_REGEX 2
int main (int argc, char **argv)
{void)argc;
(
if (!argv[1]) {
"Missing search type (exact or regex\n");
fprintf (stderr, return -1;
}
int stype = SEARCH_TYPE_NONE;
if ((strcmp (argv[1], "exact")) == 0) {
stype = SEARCH_TYPE_EXACT;
}if ((strcmp (argv[1], "regex")) == 0) {
stype = SEARCH_TYPE_REGEX;
}if (stype != SEARCH_TYPE_EXACT && stype != SEARCH_TYPE_REGEX) {
"Search type [%s] unrecognised\n", argv[1]);
fprintf (stderr, return -1;
}
if (!argv[2]) {
"Missing search term\n");
fprintf (stderr, return -1;
}
if (!argv[3]) {
"Missing repeat count for matching\n");
fprintf (stderr, return -1;
}
size_t count = 0;
if ((sscanf (argv[3], "%zu", &count)) != 1) {
"Count value [%s] is invalid\n", argv[3]);
fprintf (stderr, return -1;
}
regex_t re;
if ((regcomp (&re, argv[2], REG_EXTENDED | REG_NOSUB)) != 0) {
"Failed to compile regex\n");
fprintf (stderr, return -1;
}
char input[1024];
while (count-- != 0 &&
!ferror (stdin) &&
!feof (stdin) &&sizeof input, stdin)) {
fgets (input, char *eol = strchr (input, '\n');
if (eol)
0;
*eol = if (stype == SEARCH_TYPE_EXACT) {
"exact [%i]\n", strcmp (argv[2], input));
printf (else {
} "re [%i]\n", regexec (&re, input, 0, NULL, 0));
printf (
}
}
regfree (&re);return 0;
}
The tests were run on multiple inputs of abc
, abcdef
and abcdefghi
, all generated by the yes
program and piped into the program above. The search count was set to 10m.
The results are as follows:
Search term | Search type | Duration |
---|---|---|
abc | Regex | real 0m2.493s |
abcdef | Regex | real 0m2.620s |
abcdefghi | Regex | real 0m2.893s |
abc | Exact match | real 0m1.492s |
abcdef | Exact match | real 0m1.480s |
abcdefghi | Exact match | real 0m1.454s |
Unless your users:
You’re better off using the exact match function in your code1. The exact match searching executes in about 60% of the time that the regular expression does, with better performance on longer strings.2
Your language of choice almost certainly includes a very highly optimised string matching function.↩︎
I would be surprised if strcmp()
does not switch to a vectorised implementation pass a certain length. This would explain why the performance is better when the string length goes over a particular point.↩︎