Some Performance Observations


Posted by Lelanthran
2022-06-05

“The most amazing achievement of the computer software industry is its continuing cancellation of the steady and staggering gains made by the computer hardware industry.” – Henry Petroski

All the results below were the best of 10 consecutive executions.

Getting the Time

You often need to time certain parts of your code. How accurate you need it to be depends on what you are timing, but for most purposes you don’t want the timing itself to contribute to any performance degradation.

Here is a little test to see how clock_gettime() performs for each of the different counters.

#include <stdio.h>
#include <time.h>

int main (int argc, char **argv)
{
   static const struct {
      clockid_t id;
      const char *s;
   } clockids[] = {
      { CLOCK_REALTIME,            "CLOCK_REALTIME"            },  // 0
      { CLOCK_REALTIME_COARSE,     "CLOCK_REALTIME_COARSE"     },  // 1
      { CLOCK_MONOTONIC,           "CLOCK_MONOTONIC"           },  // 2
      { CLOCK_MONOTONIC_COARSE,    "CLOCK_MONOTONIC_COARSE"    },  // 3
      { CLOCK_MONOTONIC_RAW,       "CLOCK_MONOTONIC_RAW"       },  // 4
      { CLOCK_BOOTTIME,            "CLOCK_BOOTTIME"            },  // 5
      { CLOCK_PROCESS_CPUTIME_ID,  "CLOCK_PROCESS_CPUTIME_ID"  },  // 6
      { CLOCK_THREAD_CPUTIME_ID,   "CLOCK_THREAD_CPUTIME_ID"   },  // 7
   };

   (void)argc;
   if (!argv[1]) {
      fprintf (stderr, "Need to specify a clock to use (0-7)\n");
      return -1;
   }

   size_t index = (size_t)-1;
   if ((sscanf (argv[1], "%zu", &index) != 1) || index > 7) {
      fprintf (stderr, "Index %s is invalid\n", argv[1]);
      return -1;
   }

   printf ("Testing %s\n", clockids[index].s);
   for (size_t j=0; j<5000 * 100000; j++) {
      struct timespec ts;
      clock_gettime (clockids[index].id, &ts);
   }
   return 0;
}

Here’s the results, timed with the time command for each of the clock counters available:

Counter Duration
CLOCK_REALTIME 10.805s
CLOCK_REALTIME_COARSE 4.180s
CLOCK_MONOTONIC 10.878s
CLOCK_MONOTONIC_COARSE 4.049s
CLOCK_MONOTONIC_RAW 10.724s
CLOCK_BOOTTIME 10.597s
CLOCK_PROCESS_CPUTIME_ID 227.224s
CLOCK_THREAD_CPUTIME_ID 225.883s

Conclusion

  1. Read the manual carefully.
  2. If you can, use the COARSE counters.
  3. Make sure you really, really need a CPUTIME_ID counter.

Unintended Allocations

Everyone knows that allocating memory is expensive; in a tight loop you want to use as much of the stack or preallocated memory as you can. What you want vs what you get are two different things. The first program below is idiomatic C++, the second is not. The second program takes about 40% of the time that the first program does.

int main (void) {
   std::string line;
   for (int player_num=0; player_num<1000; player_num++) {
      for (int points=0; points<100; points++) {
         for (int moves=0; moves<100; moves++) {
            line =
               "Player " + std::to_string(player_num) + " " +
               "scored " + std::to_string(points) + " points " +
               "in " + std::to_string(moves) + " moves." +
               "\n";
            std::cout << line << "\n";
         }
      }
   }

   return 0;
}
int main (void) {
   char line[1024];
   for (int player_num=0; player_num<1000; player_num++) {
      for (int points=0; points<100; points++) {
         for (int moves=0; moves<100; moves++) {
            snprintf (line, sizeof line, "Player %i scored %i points in %i moves\n",
                      player_num, points, moves);
            std::cout << line << "\n";
         }
      }
   }

   return 0;
}

Here is the result of the best of ten runs:

Program Duration
std::string real 0m8.348s
snprintf real 0m3.263s

Conclusion

Unless your strings are very very tiny, simply using std::string in a performance sensitive hot path is a really bad idea.

Using Regexes for Simple String Matching

Sure, it’s nice when an application lets the user specify a regex as a search term, but exactly how much slower will it be when the search term is an exact string and not a pattern?

In other words, most users don’t know that the search term they are supplying is being parsed as a regular expression.

Here’s a little program that compare the execution speed of an exact match search and a regex search.

#define SEARCH_TYPE_NONE      -1
#define SEARCH_TYPE_EXACT     1
#define SEARCH_TYPE_REGEX     2

int main (int argc, char **argv)
{
   (void)argc;

   if (!argv[1]) {
      fprintf (stderr, "Missing search type (exact or regex\n");
      return -1;
   }

   int stype = SEARCH_TYPE_NONE;
   if ((strcmp (argv[1], "exact")) == 0) {
      stype = SEARCH_TYPE_EXACT;
   }
   if ((strcmp (argv[1], "regex")) == 0) {
      stype = SEARCH_TYPE_REGEX;
   }
   if (stype != SEARCH_TYPE_EXACT && stype != SEARCH_TYPE_REGEX) {
      fprintf (stderr, "Search type [%s] unrecognised\n", argv[1]);
      return -1;
   }

   if (!argv[2]) {
      fprintf (stderr, "Missing search term\n");
      return -1;
   }

   if (!argv[3]) {
      fprintf (stderr, "Missing repeat count for matching\n");
      return -1;
   }

   size_t count = 0;
   if ((sscanf (argv[3], "%zu", &count)) != 1) {
      fprintf (stderr, "Count value [%s] is invalid\n", argv[3]);
      return -1;
   }

   regex_t re;
   if ((regcomp (&re, argv[2], REG_EXTENDED | REG_NOSUB)) != 0) {
      fprintf (stderr, "Failed to compile regex\n");
      return -1;
   }

   char input[1024];
   while (count-- != 0 &&
         !ferror (stdin) &&
         !feof (stdin) &&
         fgets (input, sizeof input, stdin)) {
      char *eol = strchr (input, '\n');
      if (eol)
         *eol = 0;
      if (stype == SEARCH_TYPE_EXACT) {
         printf ("exact [%i]\n", strcmp (argv[2], input));
      } else {
         printf ("re [%i]\n", regexec (&re, input, 0, NULL, 0));
      }
   }

   regfree (&re);
   return 0;
}

The tests were run on multiple inputs of abc, abcdef and abcdefghi, all generated by the yes program and piped into the program above. The search count was set to 10m.

The results are as follows:

Search term Search type Duration
abc Regex real 0m2.493s
abcdef Regex real 0m2.620s
abcdefghi Regex real 0m2.893s
abc Exact match real 0m1.492s
abcdef Exact match real 0m1.480s
abcdefghi Exact match real 0m1.454s

Conclusion

Unless your users:

  1. Know how to construct regular expression, and
  2. Have a burning need for very flexible search terms

You’re better off using the exact match function in your code1. The exact match searching executes in about 60% of the time that the regular expression does, with better performance on longer strings.2


Posted by Lelanthran
2022-06-05

Footnotes


  1. Your language of choice almost certainly includes a very highly optimised string matching function.↩︎

  2. I would be surprised if strcmp() does not switch to a vectorised implementation pass a certain length. This would explain why the performance is better when the string length goes over a particular point.↩︎