JORDAN CAMPBELL
R&D SOFTWARE ENGINEER
cybernaut/0.1.4

C struct

Structures in C are collections of data in (almost) contiguous memory. Grouping variables is mostly a convenience for programmers, but because the members sit close together in memory (in declaration order, possibly with padding between them), structs can also help performance, for example by keeping data that is used together in the same cache lines.

The simplest struct is an anonymous declaration that collects multiple datatypes into a single variable:

struct { int x; double y; double z; } obj;
obj.x = 17;
obj.y = 21.57;
obj.z = obj.x + obj.y;

Anonymous declarations aren't useful very often, so it's much more common to give the struct a name (a tag), which lets us declare more variables of the same type later:

struct Object 
{ 
  int x; 
  double y; 
  double z; 
} obj;

obj.x = 17;
obj.y = 21.57;
obj.z = obj.x + obj.y;

where the type of obj is struct Object.

To declare a variable of a struct type in C you must use the struct keyword, e.g. struct Object sym;. This is in contrast to C++, where the struct keyword can be omitted. To aid legibility, the typedef keyword is often used to give a struct type a shorter name, e.g.:

typedef struct Object 
{ 
  int x; 
} Object; 

Object sym;

Struct Memory Alignment

Given the memory address of a struct variable, what are the memory addresses of the individual elements?

An initial (and reasonable) answer might be that the elements are all packed next to each other (contiguously) in memory, and therefore each memory address is found by taking a pointer to the object and then incrementing it by the size of each element.

Let's try this with the following program:

#include <stdio.h>

int main(void) {
    struct Object {
        char c;
        char e;
        double d;
    };

    struct Object obj;   // in C the 'struct' keyword is required here
    obj.c = 'a';
    obj.e = 'b';
    obj.d = 3.14;

    // get a pointer to the start of the struct.
    // We use a char pointer so that incrementing it
    // by one gives us successive bytes in memory.
    char* ptr = (char*)&obj;

    // print the values at each of the three expected
    // memory locations.
    printf("c: %c\n", *ptr); // ptr is already pointing to obj.c

    // a char is a single byte, so increment ptr by one.
    ++ptr;
    printf("e: %c\n", *ptr);

    // a char is a single byte, so increment ptr by one again.
    // We expect this to now be pointing at our double, 'd';
    ++ptr;

    // Note that we need to cast ptr to a pointer of type double, 
    // otherwise when we dereference it it will give us a single byte.
    // Caution: if our assumption about the layout is wrong, this reads from
    // the wrong (and possibly misaligned) address, which is undefined behavior.
    printf("d: %.2lf\n", *((double*)(void*)(ptr)));

    return 0;
}

Unfortunately, when we run this code we get:

c: a        // correct!
e: b        // correct!
d: -0.00    // wrong -- we expected '3.14'

So what happened?

The compiler has aligned the members of our struct for us, inserting padding bytes where needed.

If we print out the memory locations and sizes of each of the variables in our struct, we can see what is going on:

c [loc] ...45728, [size] 1
e [loc] ...45729, [size] 1
d [loc] ...45736, [size] 8

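    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
  | c | e | . | . | . | . | . | . | d | d | d | d | d | d | d | d |

We can see that c occupies byte 0 and e occupies byte 1, but d does not start at byte 2: the compiler has inserted six bytes of padding (bytes 2 to 7, shown as '.') so that d starts at byte 8, an offset that is a multiple of 8. Our second ++ptr therefore left the pointer on padding, and the "double" we read was made of six padding bytes plus the first two bytes of d.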

So, why is this?

For performance and correctness!

Different platforms have different hardware requirements for accessing memory, and many require (or are much faster when) values are aligned to specific boundaries, typically an address that is a multiple of the value's size for primitive types such as double. The compiler satisfies this by inserting padding, which means the memory required for a struct is often greater than the sum of the sizes of its members. In our example above, the two chars and the double need (1 + 1 + 8) = 10 bytes of data, but the size of the obj struct is actually (8 + 8) = 16 bytes: the two chars plus six bytes of padding fill the first eight bytes, and the double fills the next eight.