Memory Allocation and Access, in Assembly and C

In assembly language, we use "db" (define byte) to allocate some space and fill it with a string.

mov rdi, daString ; pointer to string
extern puts
call puts ; print the string
ret

daString:
	db `No.`,0    ; sets bytes of string in memory

(Try this in NetRun now!)

You can read the bytes of the string directly from memory, as below, for example to return a byte's ASCII value as a number, like 0x4E for 'N'. The assembler syntax for reading memory uses square brackets, [], inside of which you put the address (pointer) you want to read. It's a good idea to explicitly write the storage size, which is BYTE for a normal string.

mov rdi, daString ; pointer to string
mov rax,0 ; zero out high bits of return value
mov al, BYTE [rdi+0] ; read first byte of the string (copy to al)
ret

daString:
	db `No.`,0    ; sets bytes of string in memory

(Try this in NetRun now!)

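We can't copy the BYTE value directly to rax, because a byte is 8 bits, and rax is 64 bits. Instead, the code above zeroes all of rax first, and then loads the byte into al, which is the low 8 bits of rax.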
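The same size mismatch shows up in C whenever a char is stored into a long. Here is a minimal sketch of the idea; the names byte_as_long and print_byte_hex are made up for illustration, not part of the lecture code.

```c
#include <stdio.h>

/* Hypothetical helper (not from the lecture): fetch byte number `index`
   of a string and widen it to a long. */
long byte_as_long(const char *str, int index)
{
    unsigned char b = (unsigned char)str[index]; /* 8-bit load, like BYTE [rdi+index] */
    return (long)b; /* widening an unsigned value fills the upper bits with zeros */
}

/* Print the byte in the 0x4E style used in the text. */
void print_byte_hex(const char *str, int index)
{
    printf("0x%02lX\n", (unsigned long)byte_as_long(str, index));
}
```

Calling byte_as_long("No.", 0) returns 78, which is 0x4E, the ASCII code for 'N'. The cast through unsigned char plays the role of "mov rax,0": it guarantees the high bits come out zero rather than being sign-extended.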

C datatype	Bits	Bytes	Register	Access memory	Allocate memory
char	8	1	al	BYTE [ptr]	db
short	16	2	ax	WORD [ptr]	dw
int	32	4	eax	DWORD [ptr]	dd
long	64	8	rax	QWORD [ptr]	dq
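The C side of this table can be checked with sizeof. This is only a sketch: the helper names bits_in and print_type_sizes are hypothetical, and the numbers printed depend on your compiler and platform.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper (not from the lecture): convert bytes to bits. */
unsigned long bits_in(unsigned long bytes)
{
    return bytes * CHAR_BIT; /* CHAR_BIT is the number of bits in one byte */
}

/* Print one line per type; the numbers depend on the compiler and platform. */
void print_type_sizes(void)
{
    printf("char  %lu bits\n", bits_in(sizeof(char)));
    printf("short %lu bits\n", bits_in(sizeof(short)));
    printf("int   %lu bits\n", bits_in(sizeof(int)));
    printf("long  %lu bits\n", bits_in(sizeof(long)));
}
```

On 64-bit Linux I'd expect the sizes to match the table. Be careful with "long" elsewhere: 64-bit Windows compilers make long only 32 bits.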

We can put full 64-bit numbers into memory using "dq", and then read them back out, just like we read individual bytes from a string.

mov rdi, myNum ; pointer to long
mov rax, QWORD [rdi+0] ; read the long from memory
ret

myNum:
	dq 117    ; puts one long in memory

(Try this in NetRun now!)

The pointer value you use in a memory access can be surprisingly complex, like BYTE [rdi+3], or even QWORD [rdi+rcx*8]. This is one of the few places you can actually do arithmetic *inside* an instruction in assembly!
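To see what an address like QWORD [rdi+rcx*8] is computing, here is roughly the same calculation written out by hand in C. This is a sketch: read_scaled is a hypothetical name, and it assumes long is 8 bytes so that sizeof(long) matches the *8 in the assembly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: read element `i` of an array of longs by computing
   the byte address by hand, mirroring QWORD [rdi+rcx*8]. */
long read_scaled(const long *base, size_t i)
{
    const char *bytes = (const char *)base;      /* rdi: the base address, in bytes */
    const char *addr = bytes + i * sizeof(long); /* rdi + rcx*8 when long is 8 bytes */
    long value;
    memcpy(&value, addr, sizeof value);          /* the QWORD load itself */
    return value;
}
```

For any in-range index, read_scaled(arr, i) gives the same answer as plain arr[i]; the array indexing syntax just hides the multiply.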

For example, if we have two numbers back to back, we can read the second number by using pointer arithmetic to push the pointer from the first number to the second. In assembly, pointer arithmetic is done in bytes (not bits, not ints, always bytes!), so I add 8 bytes to get to the next long.

mov rdi, myNum ; pointer to long
mov rax, QWORD [rdi+8] ; read *next* long from memory
ret

myNum:
	dq 117    ; puts one long in memory      [myNum+0]
	dq 42    ; puts another long in memory   [myNum+8]

(Try this in NetRun now!)

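It's easy to mess this up, and grab part of one integer mixed with part of the next! For example, QWORD [rdi+4] reads the last four bytes of the first long followed by the first four bytes of the second.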
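The mix-up is easy to reproduce in C with a byte-offset read. This is a sketch under a few assumptions: read_qword_at is a made-up helper, int64_t stands in for the 8-byte long, and memcpy does the read so a misaligned offset is still legal C.

```c
#include <stdint.h>
#include <string.h>

/* Two 64-bit numbers back to back, like the two dq lines. */
static const int64_t myNum[2] = {117, 42};

/* Hypothetical helper: read 8 bytes starting `offset` bytes past myNum.
   Only offsets 0 through 8 stay inside the 16-byte array. */
int64_t read_qword_at(size_t offset)
{
    int64_t value;
    memcpy(&value, (const unsigned char *)myNum + offset, sizeof value);
    return value;
}
```

Offsets 0 and 8 return 117 and 42. Offset 4 returns neither: on a little-endian machine like x86-64 it comes back as 180388626432, which is 42 shifted up by 32 bits, because the low half of the result is the (all zero) high half of 117.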

Pointer Arithmetic in C

Plain C can also do pointer arithmetic, so you can extract bytes from a string exactly like in assembly:

const char *str="No."; // allocate a string
char c=*(str+0); // pull out one char

printf("The char is '%c'\n",c); // print the char

(Try this in NetRun now!)

The syntax here is "*(ptr)", which dereferences the pointer, accessing what the pointer points to. Unlike in assembly, pointer arithmetic on a "pointer to long" happens in units of longs, not bytes: adding 1 moves the pointer forward by one whole long. C can do this because every pointer has a type, so the compiler knows the size of what it points to. Assembly has no types, so it always counts in bytes.

long someLongs[2]={7,13}; // allocate two long integers
long *ptr=someLongs; // point to the first one
long v=*(ptr+1); // pointer arithmetic to jump to the second one
return v;

(Try this in NetRun now!)
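You can measure the size of that jump yourself by casting both pointers to char pointers and subtracting. A small sketch, with the hypothetical helper name long_step_in_bytes:

```c
#include <stddef.h>

/* Hypothetical helper: how many bytes does ptr+1 move a long pointer? */
size_t long_step_in_bytes(void)
{
    long someLongs[2] = {7, 13};
    long *ptr = someLongs;                       /* point to the first long */
    long *next = ptr + 1;                        /* one long further along */
    return (size_t)((char *)next - (char *)ptr); /* distance measured in bytes */
}
```

The result always equals sizeof(long), which is 8 on the 64-bit Linux machines these notes assume.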

Sometimes I need to do pointer arithmetic measured in bytes in C or C++, for example when skipping a header on some network data. To do this, I need to typecast the pointer to a byte pointer, add the number of bytes to skip, and then typecast back. It's a pain!

long someLongs[2]={7,13}; // allocate two long integers
long *ptr=someLongs; // point to the first one

char *cptr=(char *)ptr; // typecast to byte pointer
cptr=cptr+8; // move pointer down to next long (8 bytes)
ptr=(long *)cptr; // typecast back

long v=*ptr; // dereference the new pointer
return v;

(Try this in NetRun now!)
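For the network header case, the data after the header often isn't aligned, and dereferencing a misaligned long pointer is not guaranteed to work in C. One common alternative is to do the byte arithmetic on an unsigned char pointer and then memcpy the value out. This is only a sketch: the function name and the packet layout (some header bytes followed by a 32-bit value) are invented for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: skip `header_bytes` bytes of a packet, then copy out
   the 32-bit value stored right after the header. */
uint32_t read_after_header(const unsigned char *packet, size_t header_bytes)
{
    const unsigned char *payload = packet + header_bytes; /* byte pointers step in bytes */
    uint32_t value;
    memcpy(&value, payload, sizeof value); /* works even if payload is misaligned */
    return value;
}
```

The value comes back in the machine's own byte order, so real network code would still need to convert from network byte order afterward.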

Unlike in C++, plain C tends to do pointer arithmetic without shame. In plain C, you can even assign a pointer to one type directly to a pointer to another type without using a typecast--C compilers have traditionally only warned about this (some newer compilers reject it by default), but it's an error in C++.
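The one cast-free pointer assignment that plain C officially allows, with no warning at all, is to or from "void *". That's why C code can use the result of malloc directly, while C++ demands a typecast. A small sketch, with the hypothetical helper name second_of_two:

```c
#include <stdlib.h>

/* Hypothetical helper: store two longs on the heap and read back the second. */
long second_of_two(long a, long b)
{
    long *ptr = malloc(2 * sizeof *ptr); /* void * becomes long * with no cast in C */
    long v;
    if (ptr == NULL) return 0;           /* allocation failed */
    ptr[0] = a;
    ptr[1] = b;
    v = *(ptr + 1);                      /* pointer arithmetic in longs */
    free(ptr);
    return v;
}
```

This compiles cleanly as C, but a C++ compiler rejects the malloc line until you write (long *)malloc(...).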

Making Memory Writeable in Assembly

By default, a string defined with "db" is stored alongside the program's executable code, which is read-only, so the string's bytes can't be modified: if you write to BYTE [rdi], the program will crash with a write to unwriteable memory. But you can use "section .data" to tell the assembler to put the string into modifiable memory, which you can read or write:

mov rdi, daString ; pointer to string
mov BYTE [rdi+0], 'Y'; change the string's bytes
extern puts
call puts ; print the string
ret

section .data ; switch storage mode to modifiable data
daString:
	db `No.`,0    ; sets bytes of string in memory

(Try this in NetRun now!)
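C has the same split between writeable and unwriteable strings. A minimal sketch, assuming a typical compiler: a char array gets its own modifiable copy of the bytes, while a string literal reached through a pointer usually lives in read-only memory. The function names here are hypothetical.

```c
/* Array: a writeable copy of the bytes, normally placed in .data. */
static char daString[] = "No.";

/* Pointer to a string literal: the bytes are normally read-only, and
   writing through this pointer is undefined behavior (typically a crash). */
static const char *roString = "No.";

/* Change the first byte of the writeable string, like mov BYTE [rdi+0], 'Y'. */
const char *make_yes(void)
{
    daString[0] = 'Y';
    return daString;
}

/* The read-only string can still be read, just not written. */
const char *read_only_text(void)
{
    return roString;
}
```

After make_yes() runs, daString holds "Yo.", the same string the assembly version prints.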

There are a number of "sections" you can use. By default everything's in the code section ".text". The section directive applies to everything listed in the code until you hit another section directive.

Name	Use	Discussion
section .data	r/w data	This data is initialized, but can be modified.
section .rodata	r/o data	This data can't be modified, which lets it be shared across copies of the program.
section .bss	r/w space	This is automatically initialized to zero, meaning the contents don't need to be stored explicitly.
section .text	r/o code	This is the program's executable machine code (it's binary data, not plain text!).

In C or C++, global or static variables get stored in section .data if you give them an initial value. If they don't have an initial value, they're put in section .bss to get zero-initialized on load. If they're "const", they go in .rodata.
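Here are the three cases side by side, as a sketch. The variable names are made up, and the section named in each comment is only where a typical compiler puts that kind of variable, not a guarantee.

```c
long initialized_counter = 5; /* has an initial value: typically .data */
long zeroed_counter;          /* no initial value: typically .bss, starts at 0 */
const long fixed_limit = 117; /* const: typically .rodata */

/* Hypothetical helper: read all three globals. */
long sum_of_globals(void)
{
    return initialized_counter + zeroed_counter + fixed_limit;
}
```

Before anything modifies them, sum_of_globals() returns 122, since zeroed_counter is guaranteed to start at zero. On Linux you can check where each variable actually ended up by running a tool like nm or objdump on the compiled object file.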

CS 301 Lecture Note, 2014, Dr. Orion Lawlor, UAF Computer Science Department.