We had many goals when we built OpenAMQ, but one of the most central was to build a server that could not be broken, hacked, or compromised. To achieve this in ANSI C, a language which has zero security features, means defining strong policies and applying these ruthlessly.
Security also means reliability. While writing secure software requires certain specialised techniques, these also contribute to reliability. Each buffer overflow exploit makes use of a bug that can, in other conditions, cause the application to behave badly, or crash.
We assume that any successful server that is widely used on the Internet will be the target of constant attacks by motivated, clever, and immoral individuals who at the least seek credit for finding exploits and at worst seek entry into key systems. A server has to withstand such attacks without blinking. Thus, reliability also means security.
One noticeable thing about OpenAMQ is that it is built on a large stack of pure iMatix technology. Yes, it also uses several external packages such as Apache APR. But 95% of the code is ours. That is bad because it means a lot more work for us, but good because we can enforce our standards across all that code.
We enforce standards in many ways, some less obvious than others. These are the main ones:
These are some of the programming techniques we use to enforce reliability and security.
We restrict ourselves to three types of byte arrays (strings):
Before:
char
my_string [100];
strcpy (my_string, random_unsafe_data);
After:
icl_shortstr_t
my_string;
icl_shortstr_cpy (my_string, random_unsafe_data);
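The difference between the two cases can be sketched with a bounded copy. This is a minimal illustration and not the iMatix implementation: the names sketch_shortstr_t and sketch_shortstr_cpy and the 255-character capacity are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/*  Hypothetical capacity, excluding the null terminator  */
#define SKETCH_SHORTSTR_MAX 255

/*  Hypothetical fixed-size string type, always large enough for
    SKETCH_SHORTSTR_MAX characters plus the terminator  */
typedef char sketch_shortstr_t [SKETCH_SHORTSTR_MAX + 1];

/*  Copy src into dest, truncating at SKETCH_SHORTSTR_MAX characters.
    The result is always null-terminated. Returns the number of
    characters copied, so the caller can detect truncation.  */
static size_t
sketch_shortstr_cpy (char *dest, const char *src)
{
    size_t
        length = 0;

    assert (dest);
    if (src) {
        while (length < SKETCH_SHORTSTR_MAX && src [length] != '\0') {
            dest [length] = src [length];
            length++;
        }
    }
    dest [length] = '\0';
    return (length);
}
```

Because the destination type has a fixed, known size, the copy routine needs no length argument from the caller, and oversized input is cut short instead of overrunning the array.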
Many of the security problems in communications software stem from protocols that use variable-length strings which the software must parse and scan. Parsing and scanning text is a discipline in itself, and hard to do entirely safely. The wave of human-readable protocols developed in the 1980s and 1990s did not improve things.
The AMQ protocol does not attempt to be human readable, but instead to be safely and easily processed by software. This means prefixing all strings with a length indicator.
AMQP strings are variable length and represented by an integer length followed by zero or more octets of data. AMQP defines two string types:
OpenAMQ internally uses matching string types. Most texts destined for human input and output are treated as short strings. All texts that need more capacity are treated as long strings. We use two classes that implement these strings, and methods to operate on them.
An AMQP string can be verified before it is parsed. If it is too long, we know this in advance, and we can reject the entire frame and close the connection without pity.
This effectively eliminates all scope for buffer overflow attacks based on submitting fraudulent data to the server.
Before:
GET /my/long/url?long_arguments HTTP/1.1
Host: somedomain.com
After (no, this is not really AMQP, it's a sketch):
[13]BASIC.CONSUME [17] QUEUE=MY.QUEUE
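Verifying a length-prefixed string before parsing it can be sketched as follows. The example assumes a short string is encoded as a one-octet length followed by that many octets; the function name sketch_get_shortstr and its return convention are hypothetical and do not come from the OpenAMQ code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*  Assumed maximum for a one-octet length prefix  */
#define SKETCH_SHORT_MAX 255

/*  Read one short string from frame, starting at *offset.
    The declared length is checked against the remaining frame size
    before any data is copied. On success, dest holds the string
    (null-terminated), *offset is advanced, and 0 is returned.
    On failure, -1 is returned and dest and *offset are untouched.  */
static int
sketch_get_shortstr (const unsigned char *frame, size_t frame_size,
                     size_t *offset, char dest [SKETCH_SHORT_MAX + 1])
{
    size_t
        length;

    assert (frame && offset && dest);
    if (*offset >= frame_size)
        return (-1);                    /*  No room for the length octet     */
    length = frame [*offset];
    if (length > frame_size - *offset - 1)
        return (-1);                    /*  Declared length overruns frame   */

    memcpy (dest, frame + *offset + 1, length);
    dest [length] = '\0';
    *offset += 1 + length;
    return (0);
}
```

A caller that gets -1 back can reject the whole frame and close the connection, since nothing has been copied at that point.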
When we destroy an object, and possibly free its data, we first nullify it. That ensures that any further access to the object will fail, and that if the memory is reused elsewhere, it cannot contain 'interesting' data.
Before:
free (pointer);
pass_illegal_data (pointer->data);
After:
// In the application:
my_object_destroy (&reference);
// And in the object's destroy method:
memset (*self, 0, sizeof (my_object_t));
*self = NULL;                   // Parent reference is now NULL
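A fuller sketch of such a destroy method is shown here, under stated assumptions: sketch_object_t and its functions are hypothetical names, and the object holds a single sensitive field for illustration. Note that an optimising compiler is permitted to drop a memset that directly precedes free, so production code may need a wipe routine the compiler cannot remove.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*  Hypothetical object holding data we do not want to leak  */
typedef struct {
    char secret [32];
} sketch_object_t;

/*  Allocate a zeroed object and store a truncated copy of secret  */
static sketch_object_t *
sketch_object_new (const char *secret)
{
    sketch_object_t
        *self = calloc (1, sizeof (sketch_object_t));

    if (self)
        strncpy (self->secret, secret, sizeof (self->secret) - 1);
    return (self);
}

/*  Wipe the object, free it, and nullify the caller's reference.
    Calling it again on the same reference is harmless.  */
static void
sketch_object_destroy (sketch_object_t **self_p)
{
    assert (self_p);
    if (*self_p) {
        memset (*self_p, 0, sizeof (sketch_object_t));
        free (*self_p);
        *self_p = NULL;                 /*  Parent reference is now NULL     */
    }
}
```

Passing the address of the reference is what lets the destroy method clear it on the caller's behalf.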
When references are released, either by an unlink or a destroy method, the reference is nullified. This kind of coding style and practice makes it impossible to use released memory through that reference. It's enforceable because all internal APIs are code-generated, meaning an OpenAMQ developer gets safe code by default.
Before:
free (pointer);
pass_illegal_data (pointer->data);
After:
my_object_unlink (&reference);
// And in the object's unlink method:
*self = NULL;                   // Parent reference is now NULL
We always use the second case - you'll not find a single malloc or free call in the OpenAMQ code except in dedicated memory management layers.
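The unlink case can be sketched in the same way, assuming that unlink releases one counted link to a shared object and frees the object when the last link goes. The link counter and the sketch_shared_* names are assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>

/*  Hypothetical shared object with a link counter  */
typedef struct {
    int links;
} sketch_shared_t;

/*  Create an object that starts with one link, held by the caller  */
static sketch_shared_t *
sketch_shared_new (void)
{
    sketch_shared_t
        *self = calloc (1, sizeof (sketch_shared_t));

    if (self)
        self->links = 1;
    return (self);
}

/*  Take one more link to the object and return it  */
static sketch_shared_t *
sketch_shared_link (sketch_shared_t *self)
{
    assert (self && self->links > 0);
    self->links++;
    return (self);
}

/*  Drop one link, free the object when no links remain, and
    nullify the caller's reference in every case  */
static void
sketch_shared_unlink (sketch_shared_t **self_p)
{
    assert (self_p);
    if (*self_p) {
        assert ((*self_p)->links > 0);
        if (--(*self_p)->links == 0)
            free (*self_p);
        *self_p = NULL;                 /*  Parent reference is now NULL     */
    }
}
```

Each holder ends up with a NULL reference once it has unlinked, whether or not the object itself was freed.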
Even a code-generated framework can have errors, since these frameworks are made by humans, and humans make mistakes. So we add extra paranoia into every object method, which we call "sanity checking". It's quite simple:
Which gives us several useful extra checks against rare but not unheard of errors:
// In every live object method
assert (self->object_tag == ALIVE);
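One way such a tag check could work is sketched here. The tag values and the sketch_tagged_* names are invented for the example. The object lives in caller-owned memory so that inspecting it after termination stays well defined.

```c
#include <assert.h>

/*  Hypothetical tag values; any two distinct, unlikely patterns do  */
#define SKETCH_ALIVE 0xFABBu
#define SKETCH_DEAD  0xDEADu

typedef struct {
    unsigned int object_tag;            /*  SKETCH_ALIVE while usable        */
    int value;
} sketch_tagged_t;

/*  Mark the object as alive and reset its state  */
static void
sketch_tagged_init (sketch_tagged_t *self)
{
    assert (self);
    self->object_tag = SKETCH_ALIVE;
    self->value = 0;
}

/*  Non-aborting form of the check, useful for tests and logging  */
static int
sketch_tagged_is_alive (const sketch_tagged_t *self)
{
    return (self && self->object_tag == SKETCH_ALIVE);
}

/*  Mark the object as dead; terminating twice trips the assertion  */
static void
sketch_tagged_term (sketch_tagged_t *self)
{
    assert (sketch_tagged_is_alive (self));
    self->object_tag = SKETCH_DEAD;
}

/*  A typical method: refuse to run on a null, dead, or corrupt object  */
static int
sketch_tagged_increment (sketch_tagged_t *self)
{
    assert (sketch_tagged_is_alive (self));
    return (++self->value);
}
```

With a check like this, a method called on a terminated, uninitialised, or overwritten object stops at the assertion instead of continuing with bad data.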
Why do some applications present a real security risk? Part of the problem is that to use TCP/IP ports below 1024, such as the HTTP port 80, applications need to run as root on Unix systems. Well-designed applications run as root only for the short time it takes to open a port, then they switch to a less powerful user. But this still leaves a window of opportunity for malign software to start the server, then immediately compromise it, and thus get root access to the system.
OpenAMQ solves this elegantly by using a user-space port, 5672, which needs no special authorisation to open. OpenAMQ never runs as root, and this makes it even safer to run.
Before:
sudo my_web_server
After:
amq_server --port 5672
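A minimal sketch of opening a listening socket on a user-space port with the POSIX sockets API follows. The function names, the backlog of 16, and the policy of refusing privileged ports outright are assumptions for illustration, not a description of how amq_server does it.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*  On Unix systems, ports below this number traditionally need root  */
#define SKETCH_FIRST_USER_PORT 1024

/*  Return 1 if binding to the port would traditionally need root  */
static int
sketch_port_needs_root (int port)
{
    return (port > 0 && port < SKETCH_FIRST_USER_PORT);
}

/*  Open a listening TCP socket on an unprivileged port.
    Returns the socket handle, or -1 if the port is invalid or
    privileged, or if any socket call fails.  */
static int
sketch_listen (int port)
{
    struct sockaddr_in
        address;
    int
        handle;

    if (port <= 0 || port > 65535 || sketch_port_needs_root (port))
        return (-1);                    /*  Never ask for a root-only port   */

    handle = socket (AF_INET, SOCK_STREAM, 0);
    if (handle < 0)
        return (-1);

    memset (&address, 0, sizeof (address));
    address.sin_family      = AF_INET;
    address.sin_addr.s_addr = htonl (INADDR_ANY);
    address.sin_port        = htons ((unsigned short) port);

    if (bind (handle, (struct sockaddr *) &address, sizeof (address)) < 0
    ||  listen (handle, 16) < 0) {
        close (handle);
        return (-1);
    }
    return (handle);
}
```

A call such as sketch_listen (5672) needs no special privileges, while sketch_listen (80) is refused before any system call is made.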
The OpenAMQ code never attempts to recover from internal errors. It makes heavy use of assertions to ensure that errors in one layer cannot affect other layers.
// In every object method
assert (self);