In assembly language, we use "db" (data byte) to allocate some space, and fill it with a string.
    mov rdi, daString ; pointer to string
    extern puts
    call puts ; print the string
    ret

    daString:
        db `No.`,0 ; sets bytes of string in memory
You can actually pull out the bytes of the string directly from memory like below, for example to print their ASCII values as a number, like 0x4E for 'N'. The syntax in assembler for reading memory bytes uses square brackets, [], inside of which you put the address (pointer) you want to read. It's a good idea to explicitly write the storage size, which is BYTE for a normal string.
    mov rdi, daString ; pointer to string
    mov rax,0 ; zero out high bits of return value
    mov al, BYTE [rdi+0] ; read first byte of the string (copy to al)
    ret

    daString:
        db `No.`,0 ; sets bytes of string in memory
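We can't copy the BYTE value directly to rax, because a byte is 8 bits, and rax is 64 bits. That's why the code above first zeros all of rax, and then loads the byte into al, the low 8 bits of rax. (The movzx instruction can do both steps at once.)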
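For comparison, here is a rough C sketch of the same byte-at-a-time read. The function names byte_at and print_bytes are made up for this illustration. Casting through unsigned char plays the role of zeroing rax before loading al, so the high bits of the result are zero.

```c
#include <stdio.h>

/* Hypothetical helper: read one byte of a string and widen it to a long.
   Casting through unsigned char zero-extends the byte, like "mov rax,0"
   followed by "mov al, BYTE [rdi+i]" in assembly. */
long byte_at(const char *str, int i) {
    return (unsigned char)str[i];
}

/* Print each byte of the string as a hex ASCII code. */
void print_bytes(const char *str) {
    for (int i = 0; str[i] != 0; i++)
        printf("0x%02lX ", (unsigned long)byte_at(str, i));
    printf("\n");
}
```

Calling print_bytes("No.") prints 0x4E 0x6F 0x2E, the ASCII codes for 'N', 'o', and '.'.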
C datatype | Bits | Bytes | Register | Access memory | Allocate memory |
char | 8 | 1 | al | BYTE [ptr] | db |
short | 16 | 2 | ax | WORD [ptr] | dw |
int | 32 | 4 | eax | DWORD [ptr] | dd |
long | 64 | 8 | rax | QWORD [ptr] | dq |
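The sizes in this table match a typical 64-bit Linux or Mac compiler, but C itself doesn't guarantee them (on 64-bit Windows, for example, long is only 4 bytes). A quick way to check your own machine is sizeof; this is just a sketch, and the function name print_sizes is made up.

```c
#include <stdio.h>

/* Print the size in bytes of each C type from the table.
   Returns sizeof(long) so a caller can check it too. */
long print_sizes(void) {
    printf("char:  %zu bytes\n", sizeof(char));
    printf("short: %zu bytes\n", sizeof(short));
    printf("int:   %zu bytes\n", sizeof(int));
    printf("long:  %zu bytes\n", sizeof(long));
    return (long)sizeof(long);
}
```

If long comes back as something other than 8, then "long" and QWORD don't match on that machine, and a fixed-width type like int64_t from <stdint.h> is the safer match for dq.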
We can put full 64-bit numbers into memory using "dq", and then read them back out, just like we read individual bytes from a string.
    mov rdi, myNum ; pointer to long
    mov rax, QWORD [rdi+0] ; read the long from memory
    ret

    myNum:
        dq 117 ; puts one long in memory
The pointer value you use in a memory access can be surprisingly complex, like BYTE [rdi+3], or even QWORD [rdi+rcx*8]. This is one of the few places you can actually do arithmetic *inside* an instruction in assembly!
For example, if we have two numbers back to back, we can read the second number by using pointer arithmetic to push the pointer from the first number to the second. In assembly, pointer arithmetic is done in bytes (not bits, not ints, always bytes!), so I add 8 bytes to get to the next integer.
    mov rdi, myNum ; pointer to long
    mov rax, QWORD [rdi+8] ; read *next* long from memory
    ret

    myNum:
        dq 117 ; puts one long in memory [myNum+0]
        dq 42 ; puts another long in memory [myNum+8]
It's easy to mess this up, and grab part of one integer mixed with part of the next!
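To see what "part of one integer mixed with part of the next" looks like, here is a small C sketch that reads 8 bytes starting at an arbitrary byte offset into the same two numbers. The function name read_at_byte_offset is made up, and offset is assumed to be between 0 and 8. I use memcpy so the misaligned read is well-defined C.

```c
#include <stdint.h>
#include <string.h>

/* Read 8 bytes starting at a byte offset (0 to 8) into two 64-bit numbers
   stored back to back, like the two dq lines in the assembly example. */
uint64_t read_at_byte_offset(int offset) {
    uint64_t nums[2] = {117, 42};
    uint64_t v;
    /* Byte pointer + offset, then copy 8 bytes out. */
    memcpy(&v, (const unsigned char *)nums + offset, sizeof v);
    return v;
}
```

Offset 0 returns 117 and offset 8 returns 42. Offset 4 returns neither: on a little-endian machine like x86-64 it returns 180388626432 (0x0000002A00000000), which is the high half of 117 glued to the low half of 42.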
Plain C can also do pointer arithmetic, so you can extract bytes from a string exactly like in assembly:
    const char *str="No."; // allocate a string
    char c=*(str+0); // pull out one char
    printf("The char is '%c'\n",c); // print the char
The syntax here is "*(ptr)", which dereferences the pointer, accessing what the pointer points to. Unlike in assembly, pointer arithmetic on a "pointer to long" happens in units of longs, not bytes. C can do this because its pointers have types; assembly has no type information, so address arithmetic there is always in bytes.
    long someLongs[2]={7,13}; // allocate two long integers
    long *ptr=someLongs; // point to the first one
    long v=*(ptr+1); // pointer arithmetic to jump to the second one
    return v;
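You can watch the compiler's scaling happen by measuring how many bytes apart ptr and ptr+1 really are. The sketch below also writes the scaling out by hand, the way an assembly access like QWORD [rdi+rcx*8] does. Both function names are made up for illustration.

```c
#include <stddef.h>

/* How many bytes apart are ptr and ptr+1 for a pointer to long? */
size_t bytes_per_step(void) {
    long someLongs[2] = {7, 13};
    long *ptr = someLongs;
    return (size_t)((char *)(ptr + 1) - (char *)ptr);
}

/* The same scaling done by hand: address = base + index * sizeof(long).
   This computes the same address that base[index] would use. */
long read_scaled(const long *base, size_t index) {
    const char *addr = (const char *)base + index * sizeof(long);
    return *(const long *)addr;
}
```

bytes_per_step() returns sizeof(long), which is 8 on a typical 64-bit Linux machine, and read_scaled(someLongs, 1) returns the same value as someLongs[1].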
Sometimes I need to do pointer arithmetic measured in bytes in C or C++, for example when skipping a header on some network data. To do this, I need to typecast the pointer to a byte pointer, add the number of bytes to skip, and then typecast back. It's a pain!
    long someLongs[2]={7,13}; // allocate two long integers
    long *ptr=someLongs; // point to the first one
    char *cptr=(char *)ptr; // typecast to byte pointer
    cptr=cptr+8; // move pointer down to next long (8 bytes)
    ptr=(long *)cptr; // typecast back
    long v=*ptr; // dereference the new pointer
    return v;
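Here is roughly what the header-skipping case might look like. The packet layout is invented for this sketch: HEADER_BYTES bytes of header followed by one 32-bit value in the machine's own byte order. Real network formats define their own sizes and byte order. I copy the value out with memcpy instead of casting back to an int pointer, because data in the middle of a packet isn't guaranteed to be aligned.

```c
#include <stdint.h>
#include <string.h>

#define HEADER_BYTES 4  /* hypothetical header size, for illustration only */

/* Skip a fixed-size header and read the 32-bit value that follows it. */
uint32_t read_after_header(const void *packet) {
    const unsigned char *bytes = (const unsigned char *)packet; /* byte pointer */
    bytes += HEADER_BYTES;               /* pointer arithmetic in bytes */
    uint32_t value;
    memcpy(&value, bytes, sizeof value); /* works even if bytes is misaligned */
    return value;
}
```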
Unlike in C++, plain C tends to do pointer arithmetic without shame. In plain C, you can even assign a pointer to one type directly to a pointer to another type without even using a typecast--this is a warning in C, but it's an error in C++.
By default, a string defined with "db" is treated as part of the program's executable code, so the string's bytes can't be modified--if you write to BYTE [rdi], the program will crash with a write to unwritable memory. But you can tell the assembler with "section .data" to put the string into modifiable memory, which you can read or write:
    mov rdi, daString ; pointer to string
    mov BYTE [rdi+0], 'Y' ; change the string's bytes
    extern puts
    call puts ; print the string
    ret

    section .data ; switch storage mode to modifiable data
    daString:
        db `No.`,0 ; sets bytes of string in memory
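C has the same split between writable and read-only strings. In this sketch (the function name change_and_print is made up), the string is stored in a global char array, which compilers typically place in modifiable memory, so changing a byte works.

```c
#include <stdio.h>

/* An initialized, non-const global array: typically placed in .data,
   so its bytes can be changed at run time. */
char daString[] = "No.";

/* Overwrite the first byte, like "mov BYTE [rdi+0], 'Y'", then print. */
const char *change_and_print(void) {
    daString[0] = 'Y';
    puts(daString);
    return daString;
}
```

This prints Yo. By contrast, writing through a pointer to a string literal, as in char *p="No."; p[0]='Y';, is undefined behavior in C, and on most modern systems it crashes just like the assembly version without "section .data".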
There are a number of "sections" you can use. By default everything's in the code section ".text". A section directive applies to everything that follows it in the file, until you hit another section directive.
Name | Use | Discussion |
section .data | r/w data | This data is initialized, but can be modified. |
section .rodata | r/o data | This data can't be modified, which lets it be shared across copies of the program. |
section .bss | r/w space | This is automatically initialized to zero, meaning the contents don't need to be stored explicitly. |
section .text | r/o code | This is the program's executable machine code (it's binary data, not plain text!). |
In C or C++, global or static variables get stored in section .data if you give them an initial value. If they don't have an initial value, they're put in section .bss to get zero-initialized on load. If they're "const", they go in .rodata.
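As a small sketch of those three cases, here are three globals with comments on where a typical compiler would put each one. The variable and function names are made up, and the exact placement is up to the compiler and linker, so treat the section names as the usual case rather than a guarantee.

```c
int initialized = 117;    /* has an initial value: typically .data */
int uninitialized;        /* no initial value: typically .bss, starts at zero */
const int constant = 42;  /* const: typically .rodata */

/* Sum the three globals; the zero-initialized one contributes nothing. */
int sum_globals(void) {
    return initialized + uninitialized + constant;
}
```

sum_globals() returns 159, because uninitialized is guaranteed to start at zero. A symbol-listing tool such as nm can show which section each variable actually landed in.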
CS 301 Lecture Note, 2014, Dr. Orion Lawlor, UAF Computer Science Department.