Structs, Alignment, and Padding
CS 301 Lecture, Dr. Lawlor
(Also read Chapters 3.9 and 3.10 of the Bryant and O'Hallaron textbook for more info on structs)
Structs
A "struct" or "class" is just an object that contains a bunch of subobjects, called "fields". For example:
struct bar {
int x;
int y;
};
This is a struct, bar, that contains two fields x and y. x and y
are usually laid out in memory right next to each other, so with
4-byte ints bar is a total of 8 bytes long: the first 4 bytes are x,
and the last 4 bytes are y.
There's a cool macro called "offsetof(type,field)" (from <stddef.h>,
or <cstddef> in C++) that returns the number of bytes between the
start of the struct and the start of that field. So
offsetof(bar,x)==0 and offsetof(bar,y)==4, while sizeof(bar)==8 bytes.
In assembly, if eax was pointing to the start of a bar,
mov DWORD [eax+4],0x117
would set that bar's y field to 0x117, because the y field starts 4 bytes from the start of bar.
Structs in C
In plain C, not C++, declaring "struct bar" doesn't make "bar" a
type name on its own: you have to write "struct bar" every time. So you
need to use a typedef to make both a "struct tag" and an actual
typename at the same time:
typedef struct bar_tag {
int x;
int y;
} bar;
So now "bar" is a typedef for "struct bar_tag", which is just a struct
like in C++. This is one case where the C++ version is so much
better that the C way has been almost totally forgotten.
Structs and Alignment
"Alignment" means an object must sit in memory at an address
divisible by some power of two, usually its own size; for example, a
4-byte int must sit at an address divisible by 4. On some CPUs
(PowerPC and DEC Alpha are prominent examples), reading an int from an
unaligned address like 0x10000003 can be 1000x slower than reading
from an aligned address like 0x10000004! The penalty for unaligned
access on x86 machines is usually undetectable (a few percent at most),
but is occasionally a fewfold slowdown.
To avoid unaligned accesses, the compiler may insert "padding" (unused space) into your structs to improve alignment.
For example,
#include <cstddef>   // offsetof
#include <iostream>

struct bar {
	int x;
	char z;
	int y;
};

int main() {
	std::cout<<"sizeof(bar)=="<<sizeof(bar)<<"\n";
	std::cout<<"offsetof(bar,x)=="<<offsetof(bar,x)<<"\n";
	std::cout<<"offsetof(bar,z)=="<<offsetof(bar,z)<<"\n";
	std::cout<<"offsetof(bar,y)=="<<offsetof(bar,y)<<"\n";
	return 0;
}
In a perfect world, this would be a 9-byte struct: two 4-byte ints and
one 1-byte char. But to avoid an unaligned access to y, the compiler
sticks in 3 bytes of padding after the char, so y starts at offset 8
and sizeof(bar)==12.
Field     | Size
x         | 4 bytes
z         | 1 byte
(padding) | 3 bytes (to a total of 4)
y         | 4 bytes
On 32-bit x86 Linux, char is 1-byte aligned (in other words, a char
never needs padding before it), short is 2-byte aligned (meaning its
address must be a multiple of 2), and everything else (int, long, long
long, and even double) is 4-byte aligned inside structs. On most other
machines, including PowerPC and 64-bit x86-64, a builtin type of N bytes
must be N-byte aligned; so double is 8-byte aligned, and a char followed
by a double wastes 7 bytes on padding!
Fighting Padding Waste
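Padding wastes space. It also causes trouble when a struct is written directly to a disk file or sent over the network: the padding bytes hold whatever garbage happened to be in memory, and a different compiler or CPU may pad the same struct differently, so the bytes won't line up on the other end. So we often want to avoid padding.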
- If everything's the same type (for example, all chars, or all
ints), alignment will be perfect and there will never be any padding.
- If the types already have good alignment (for example, four
chars, and then an int), the compiler won't stick in any extra
padding. Often you can help the compiler out just by grouping
fields of the same size together in the struct or class: first
all the chars, then all the shorts, then all the ints, etc. Going
from largest to smallest avoids padding between fields entirely on
machines where each builtin type is aligned to its own size.
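- Some compilers have options to adjust alignment, but usually it's to increase the alignment requirements (as C++11 alignas does), not eliminate them! The main exception is "#pragma pack" (supported by GCC, Clang, and Visual C++), which can remove padding entirely; the price is unaligned fields, with the slowdowns described above.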
Bitfields
A "bitfield" is a struct field where you tell the compiler you only
care about a subset of its bits. The syntax is just to put a colon
and a number of bits after each field. For example, "t" is just 2
bits long here because of the ":2":
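#include <cstdio>

struct bar {
	unsigned char src:3;
	unsigned char dest:3;
	unsigned char t:2; /* just 2 bits long! */
};

int main() {
	bar b;
	b.t=3;
	b.dest=6;
	b.src=2;
	/* Peek at the raw byte; which bits hold which field is up to the compiler */
	printf("b in octal is 0%o\n", (unsigned)*(unsigned char *)&b);
	printf("sizeof(b)==%u\n", (unsigned)sizeof(b));
	return 0;
}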
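The overall struct is just 1 byte (8 bits), which is pretty cool.
With GCC on x86, which puts the first field in the lowest bits, this
prints "b in octal is 0362". This example is actually the funk_emu 03ds
byte, which is the x86 ModR/M byte: src is the low 3-bit "r/m" field,
dest is the middle 3-bit "reg" field, and t is the top 2-bit "mod" field.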
Be warned that the usual padding and alignment rules apply even to
bitfields; so replacing "unsigned char" with "int" above results in a
4-byte struct, because the compiler makes sure "int"s are 4-byte
aligned, even in a bitfield!