Parallel problem-solving

I’ve just put up Rexel (Remote Execution Environment for Linux) for download. Download the source tarball or the thesis from this website.

In short, Rexel allows the developer to write a single native-code C program that can execute on multiple computers concurrently. This is not the same as writing multiple programs which, via the clever passing of messages, allow all the programs to work on the same problem. Rexel allows a single C program to execute different parts of itself on remote computers.

One of the examples of Rexel usage is to run an unmodified Sqlite3 program within the Rexel environment. This allowed the Sqlite3 binary program to unknowingly perform replication when Rexel redistributed all the filesystem calls across a cluster. Whenever the program modified the file, those modifications were instantly replicated across the cluster.

Another example is the naive Lisp implementation within Rexel. This simple Lisp interpreter written to take advantage of Rexel facilities automatically offloads computations on to remote nodes (without the programmer specifying as such).

Anyway, download and play with it.

Remote Function Invocation

Full MSc thesis

Files_ManickumLelanthran

Reference management in HTML, Part II


A while back I proposed a reference management scheme for HTML blog posts (incurring the wrath of the semantic web people, no doubt :)). I’ve updated it somewhat, making it easier to use. Now it’s easier than ever to ensure that your HTML content is well-referenced.

Usage is simple:

<script src="jsnote.js"></script>
<script src="references.js"></script>
<p>This is how we cite the first reference<cit>FirstRef</cit>.
We can cite the second reference like this<cit>SecondRef</cit>
and the third like so<cit>ThirdRef</cit> and the fourth reference
like this<cit>FourthRef</cit></p>
... more content goes here ... pages and pages if you want ...
<h2>Bibliography</h2>

<p><bibliography>plainstyle</bibliography></p>

<script>jsNote()</script>

The first javascript file that is included contains all the routines for making references work. The second javascript file contains a list of all the references used in the example article, namely  FirstRef, SecondRef, ThirdRef and FourthRef. All the text within <cite> tags are used as a lookup into the references.js file to find the reference and the generated list of references are placed within all the <bibliography> tags found (although why an article would need more than a single bibliography is beyond me).

The resulting article has all of the <cite>tags replaced with the reference number in square brackets, which is a link to the actual reference. The above example HTML code looks like this:


This is how we cite the first referenceFirstRef. We can cite the second reference like thisSecondRef and the third like soThirdRef and the fourth reference like thisFourthRef… more content goes here … pages and pages if you want …


The actual references appear at the end of the article (click on one, see what happens).

The function jsNote() starts all the javascript to make the reference-magic happen. Authors can now easily reference all their material in a thorough and complete manner without resorting to manually inserting links to their references.

You can download jsnote.js, the example references file references.js and the example article jsnote-example.js from the links below.

  • jsnote.js (Include this file in your article)
  • references.js (Use this file as a template and example for all your references)
  • jsnote-example.js (Small example of usage)
  • jsnote.pdf (The as-yet-unpublished paper. This link will be enabled once I publish the paper)

 

 

Bibliography

plainstyle

Array Bounds Checking in C

Most languages (Java, C#, Python, Javascript) prevent the programmer from going past the end of an array. This process, performed at runtime by the language implementation, is frequently called Bounds Checking.

Unfortunately in C there is no way for the programmer to determine the size of an array at runtime (not quite true, as I’ll show below) and so it becomes necessary for the programmer to keep track of the length of the array. In other languages, one can declare an array and then at runtime get the length of that array. For example, in Javascript the programmer can get the length of an array using the length property of the array:

for (var i=0; i<array.length; i++) {
   // Do something with array[i]
}

In C the programmer is required to keep track of the length of the array. Failure to keep array references to within the array result in segfaults, protection faults, etc. Most programmers define the array length somewhere.

#define ARRAY_LENGTH     (25)
for (size_t i=0; i<ARRAY_LENGTH; i++) {
   // Do something with array[i]
}

This becomes a problem if the array length changes and the programmer didn’t reflect that change in the #define ARRAY_LENGTH. A quick fix is to use the sizeof operator as follows:

int array[] = {10, 20, 30, 40, 50, 60, 70, 80};
for (size_t i=0; i<sizeof array / sizeof array[0]; i++) {
   // use array[i]
}

The evaluation of (sizeof array) is the total number of bytes that the array uses. The evaluation of (sizeof array[0]) is the number of bytes that array[0] uses. Dividing the total number of bytes that the array uses by the number of bytes that each element uses gives you the number of elements in the array. It works for all data types, including pointers, function pointers, structs, etc.

struct foo_t {
   int a; float b; char *c;
};
struct foo_t array[] = {
   {10, 12.12, "First Item"},
   {20, 23.34, "Second Item"},
   {30, 34.12, "Third item"},
   {40, 54.32, "Fourth Item"},
};
for (size_t i=0; i<sizeof array / sizeof array[0]; i++) {
   // Use array[i].a and array[i].b and array[i].c
}

The problem comes in with function calls, and passing your array as arguments to other functions. In C all arrays decay to pointers when passed as an argument. Thus the called function only gets a pointer to the first element. For example in the following snippet function func_foo() gets the array array as a pointer a foo_t struct, hence the sizeof trick I used above will not work:

struct foo_t {
   int a; float b; char *c;
};
struct foo_t array[] = {
   {10, 12.12, "First Item"},
   {20, 23.34, "Second Item"},
   {30, 34.12, "Third item"},
   {40, 54.32, "Fourth Item"},
};

void func_foo (struct foo_t array[])
{
   // (sizeof array) = the size of the pointer to a struct foo_t (4)
   // (sizeof array[0]) = the size of a struct foo_t (12)
   // Therefore the length will be calculated as (4 / 12), which
   // is zero using integer division. In this case, the loop won't
   // run at all!
   for (size_t i=0; i<sizeof array / sizeof array[0]; i++) {
      // Use array[i].a and array[i].b and array[i].c
   }
}

The problem is that a parameter of the form (type name[]) is turned into (type *name). A common workaround is to simply have all your arrays terminated by a special character. If you have an array of integers use INT_MAX (from limits.h) as the special value to signify end of array. For all pointers use NULL (if NULL can legitimately appear in your array then use (NULL -1) instead). Thus your array iteration code will look like this.

int array[] = {10, 12, 13, 14, 15, INT_MAX};
for (size_t i=0; array[i]!=INT_MAX; i++) {
   // Use array[i]
}

Same with pointers of any type:

char *array[] = {
   "One", "Two", "Three", "Four", NULL,
};
for (size_t i=0; array[i]; i++) {
   // Use array[i]
}

It even works across function calls: you can pass this array to functions and not have to worry about passing along a separate length value. The function has to know how to check for end-of-array.

struct foo_t {
   int a; float b; char *c;
};
struct foo_t array[] = {
   {10, 12.12, "First Item"},
   {20, 23.34, "Second Item"},
   {30, 34.12, "Third item"},
   {40, 54.32, "Fourth Item"},
   {0, 0.0, NULL},
};

void func_foo (struct foo_t array[])
{
   // Because each element is a struct, the best convention to
   // represent end-of-array is to use all NULL or zero values
   // for all fields.
   for (size_t i=0; array[i].a && array[i].c; i++) {
      // Use array[i].a and array[i].b and array[i].c
   }
}

Obviously for that last code snippet you can make your convention be “The last field is NULL”, or “All the fields are NULL or Zero”. Personally, I use NULL wherever possible.

The only drawback is that you cannot have an element of the array that corresponds to your end-of-array terminator.I haven’t encountered this problem yet though … since I started doing this about 15 years ago … so you should be safe using this convention.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

iBurst USB Drivers for Linux

Got iBurst yesterday, along with a USB modem. The salesperson assured me that the modem works under Linux. Well, chalk one up to the forces of sales and marketing, because the modem definitely did not work, and the enclosed CD definitely did not have drivers for Linux. I downloaded the open source drivers (ibdriver-1.3.5) from sourceforge, but the compilation failed.

So, having “fixed” the compilation problems by means of removing the pcmcia functionality, I offer the updated drivers to the world. This iBurst USB modem driver will work under Ubuntu, Debian, Mint, Redhat, CentOS, slackware or anything else that has a linux kernel 2.6 and above (including kernel 3.0). Download it here (choose your kernel).

Note that this is self-extracting script, so before running it make sure that it has execute permissions set. Please let me know if this does not work by emailing me the output of the executable. If it does work, then you can thank me in your will (of course, if you don’t own much, then just leave a comment here thanking me:)

Download iBurst USB Modem Linux driver (kernel v2.6.x)

Download iBurst USB Modem Linux driver (kernel v3.0.0)

Download iBurst USB Modem Linux driver (kernel v3.2.0)

Google with egg on their face?

The Google Android Developer Challenge/Africa is well underway. App submission deadline was 1-July-2011, semi-finalists will be announced 15-July-2011. Winners will be announced 12-September-2011.

The point of this, presumably, is to encourage African developers to develop for Android. The biggest motivation to do something is money, so participants are allowed to enter paid apps into the challange (under “Eligibility”). Sadly, it seems that the organisers of ADC spend very little time with the Google Checkout team, because Google checkout does not support many of the countries that are eligible for ADC/Africa, and there is no other way to get paid for apps in Google marketplace.

While I know that, sometimes, in large companies its a case of “the right hand not knowing what the left hand is doing”, in this case it seems even the left hand has only a very hazy notion as well. I’m working on porting my application to iPhone. Of course, the longer it takes for Google to get this working, the larger the number of ADC apps that get ported. It could be a case of Google simply giving app developers motivation to come up with a good concept that then goes to iPhone because the concept could not be sold for Android.

Heads up to Google: those devs that came up with a new, nifty and/or novel app will certainly not make it freely available for Android (not even as trial) if they can make it for money for iPhone. Making a trial version for Android simply means putting a good idea out there that some iPhone dev is going to implement on iPhone. To keep the first-mover advantage of a novel concept, the dev has to be the first one with the app. If it is so novel and nice, people would buy it from elsewhere.

Of course, its always possible that Apple also won’t allow devs from South Africa to sell applications either – with Apple you don’t get this information until after you pay their $99 fee, so if anyone knows if South African developers are allowed paid apps in Apples App Store, let me know.

Dynamic arrays in C

(NOTE: This code has been rolled up into the open-source libxc – see https://sourceforge.net/p/libxc/wiki/Home/. While you can still use this code as is, it’s recommended to use libxc instead which has the same functionality and more).

Dynamic arrays are remarkably easy to implement in plain C with two caveats:
1. No destructors, so the user of the array has to free the array elements one item at a time (not such a problem!)
2. No copy constructors, so the user of the array has to ensure that structures with pointers are copied correctly.

I implemented the dynamic array by simply storing an array of void pointers, and only storing pointers to elements (a pointer to void can point to anything), hence this lets me have a dynamic array that can store elements of anything, even elements of different types (although this is a bad idea – unreadable code is not the goal!). You can download the code that implements this as a library.

The “array” is simply a struct like so:

struct rsarray_t {
   size_t alloc_len;   /* Store the number of allocations */
   size_t used_len;   /* Store the of cells we have thus far used */
   void **data;        /* The actual array of void pointers */
};

There are no new_array or create_array functions that the user must call. All the user has to call to append a pointer onto the end of the array is rsarray_ins_tail(array, element), where element is the new pointer to be appended to array, which is of type struct rsarray_t*.

If array is NULL, then an array is created and returned with element as the first and only element of the array (which is why there is no need for an “array_create” function). If array is not NULL, then the element is appended to the array using realloc if there is not enough space to store another element. Usage is dead-easy:

struct rsarray_t *array;
int a =0, b = 1, c = 2;
array = rsarray_ins_tail (NULL, &a);
array = rsarray_ins_tail (array, &b);
array = rsarray_ins_tail (array, &c);
/* array now contains pointers to a, b and c */

You may not always want to store many references to the variable, though. For example, at this point in the code we may need to modify the values of a, b and c and we don’t want those modifications to take effect on the values already stored in the array. To this end, I provide a function rsa_dup, which will make a duplicate of any object passed to it (with the size of the object specified by the user). The above code can then be written as:

int a = 0, b = 1, c = 2;
array = rsarray_ins_tail (NULL, rsa_dup (&a, sizeof a));
array = rsarray_ins_tail (array, rsa_dup (&b, sizeof b));
array = rsarray_ins_tail (array, rsa_dup (&c, sizeof c));
/* array now contains copies of a, b and c */
a = 42; /* Element 0 of array is still 0 */
/* Change the value of b */
*(int *)array->array[1] = 43;

The above is still a little dirty. The duplication of the variable in the insertion is bound to lead to problems, and the verbose semantics of accessing an element (seen in the last line) is simply unreadable. They can both be fixed with macros:

/* unsafe macro, 'x' should not have side-effects */
#define RSA_INS_TAIL(a,x) \
   rsarray_ins_tail (a, rsa_dup (&x, sizeof x))
#define RSA_INDEX(a,i,t)   \
   *(t *)a->data[i]
...
int a = 0, b = 1, c = 2;
array = RSA_INS_TAIL (NULL, a);
array = RSA_INS_TAIL (array, b);
array = RSA_INS_TAIL (array, c);
/* array now contains copies of a, b and c */
a = 42; /* Element 0 of array is still 0 */
/* Access the second element i.e. array[1] */
RSA_INDEX (array, 1, int) = 43;

All well and good, how do we iterate? Easy, again.

/* unsafe macro, 'x' should not have side-effects */
#define RSA_INS_TAIL(a,x) \
   rsarray_ins_tail (a, rsa_dup (&x, sizeof x))
#define RSA_INDEX(a,i,t)   \
   *(t *)a->data[i]
#define RSA_LENGTH(a)    \
   a->used_len
...
struct rsarray_t *array = NULL;
int i;
/* Insert ten elements into array */
for (i=0; i<10; i++) {
   array = RSA_INS_TAIL (array, i);
}
/* Change all the values stored in the array */
for (i=0; i<RSA_LENGTH; i++) {
    RSA_INDEX (array, i, int) = 42 + i;
}

Freeing is as simple as freeing each element (because you are certain that there are no other references to it – the memory was allocated when we appended elements using rsa_dup, so no other code should be sharing the pointer), so we can simply iterate over the array and free each of the void*. In fact, since iteration over the array is a common occurrence, we can just go ahead and create a function to do it for us, so that we never have to write the actual loop:

void rsarray_iterate (rsarray_t *a, void (*f) (void *))
{
   size_t i;
   for (i=0; iused_len; i++) {
      f (a->data[i]);
   }
}
...
struct rsarray_t * array = NULL;
[... here we append to the array ...]
/* Now we free each element */
rsarray_iterate (array, free);
/* Free the array that held all the elements */
free (array->array);
/* Free the structure that held the array */
free (array);

Because this isn’t rare either, we can put the above 3 lines into a function call (for example) rsarray_free (). Of course, once you’ve got that, you can use the resizable array to implement a host of other data structures. You can download the code that does this over here.

It was written about 8 years ago, and I’ve been using it every since (dynamic arrays are more useful than you think!). I go a lot further in the download above, as the download is actually written as a library, and uses the dynamic array as a basis to implement growable arrays, stacks and queues. Check the download for example usage, further avenues to explore, for example creating a ring-buffer, or implementing copy-construction so that composite datatypes consisting of pointers are sensibly copied when an insertion is made.

Software Project Management

[facebook_ilike]

The Polaris Missile project was developed in the late 1950′s. The project involved 250 major contractors, over 9000 subcontractors and hundreds of thousands of individual tasks.[1] Many of the items and components needed for the project existed only on the drawing board at the time.[2]

The project was completed two years ahead of schedule.With 1950′s technology (i.e. pencil and paper) managing it. The complexity involved was almost unheard of at the time.

I find it hard to be convinced that the lack of success seen in most projects can be attributed to complexity, especially as most of the dev teams working on corporate Java (or C#, or whatever)  projects are probably working on problems that have already been solved in a different guise, or that need very little in the way of inspired or creative thinking. The hard mathematical bits are already done. The reason for most projects seem to be “lets reinvent $FOO in a different language/platform/design”.

Why then are project managers unable to accurately predict where the project would be after $X months? Could it be the lack of rigour, in that it’s hard to be mathematically rigorous when you are simply listing the activities with thumb-sucked guesses in a spreadsheet, without weighing each activity with a risk? Perhaps it’s because a technical meeting of 12 people is almost always going to result in 2 people talking at any given time and 10 not listening (‘cos the current conversation has no effect on them, nor should it).

Could it be that PM’s have almost no background in statistical methods, leaving them much more vulnerable to getting blind-sided by confirmation bias or misjudging statistical significance as statistical noise?

However, my singular experience with professional software development for the previous 12 years has put me in touch with numerous SW-dev PM’s, at many different companies, and none of them employed any of the methods or techniques used and created for the development of the Polaris Missile. Many of them were not even aware of it, having a more modern certification (PM’s on the Polaris Missile project had no PM certification) that presumably works better.

Perhaps.

Or maybe I’m simply wrong, and development in Java or C# really is much more complicated than the advanced calculus and materials engineering needed to build a rocket.

==========
[1] Production and Operations Management, Khanna, PHI Learning Pvt. Ltd., 2007

[2]Harvey M. Sapolsky, THE POLARIS SYSTEM DEVELOPMENT,
Harvard University Press, 1972
[facebook_ilike]