C/C++ datatype | Bits | Bytes | Register | Access memory | Allocate memory |
char | 8 | 1 | al | BYTE [ptr] | db |
short | 16 | 2 | ax | WORD [ptr] | dw |
int | 32 | 4 | eax | DWORD [ptr] | dd |
long | 64 | 8 | rax | QWORD [ptr] | dq |
For example,
we can put full 64-bit numbers into memory using "dq", and then
read them back out with QWORD[yourLabel].
If you allocate more than one constant with dq, they appear at
ascending addresses. So this reads the 5, like you'd expect:
dos_equis:
dq 5 ; writes this constant into a "Data Qword" (8 byte block)
dq 13 ; writes another constant, at [dos_equis+8] (bytes)
foo:
mov rax, [dos_equis] ; read memory at this label
ret
Adding 8 bytes (the size of a dq, an 8-byte / 64-bit QWORD) to the
address of the first constant puts us directly on top of the second constant, 13:
dos_equis:
dq 5 ; writes this constant into a "Data Qword" (8 byte block)
dq 13 ; writes another constant, at [dos_equis+8] (bytes)
foo:
mov rax, [dos_equis+8] ; read memory at this label, plus 8 bytes
ret
An "array" is just a sequence
of values stored at ascending addresses in memory. If we
list our data with "dq", the values show up in memory in that order,
so we can do pointer arithmetic to pick out the value we
want. This returns 7:
mov rcx,my_arr ; rcx == address of the array
mov rax,QWORD [rcx+1*8] ; load element 1 of array
ret
my_arr:
dq 4 ; array element 0, stored at [my_arr]
dq 7 ; array element 1, stored at [my_arr+8]
dq 9 ; array element 2, stored at [my_arr+16]
Did you ever wonder why the first array element is [0]? It's because it's zero bytes from the start of the pointer!
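The same byte arithmetic can be sketched in C. The helper name load_at_byte_offset is made up for illustration; it mimics QWORD [rcx+i*8] by adding a raw byte offset to the array's address, while load_element uses ordinary indexing and lets the compiler do the multiplication. The sketch assumes nothing about the size of long beyond what sizeof reports.

```c
#include <stddef.h>
#include <string.h>

/* Same values as my_arr in the assembly above. */
static const long my_arr[3] = {4, 7, 9};

/* Hypothetical helper: read one long starting byte_offset bytes past base,
   like mov rax, QWORD [rcx+byte_offset]. */
long load_at_byte_offset(const long *base, size_t byte_offset) {
    long value;
    memcpy(&value, (const char *)base + byte_offset, sizeof value);
    return value;
}

/* Ordinary C indexing: the compiler multiplies i by sizeof(long) for us. */
long load_element(const long *base, size_t i) {
    return base[i];
}
```

Both load_at_byte_offset(my_arr, 1*sizeof(long)) and load_element(my_arr, 1) pick out the 7.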
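The stride has to match the element size. With 4-byte values declared using dd, element 1 sits 4 bytes in, so this returns 0xc001007:
mov rcx,my_arr ; rcx == address of the array
mov eax,DWORD [rcx+1*4] ; load element 1 of array
ret
my_arr:
dd 0xaaabbbcc ; array element 0, stored at [my_arr]
dd 0xc001007 ; array element 1, stored at [my_arr+4]
It's extremely easy to have a mismatch between these sizes: the size you declared the data with, the stride in the pointer arithmetic, and the size of the register you load into. For example, if I declare values with dw (2 byte shorts), but load them into eax (4 bytes), I'll have loaded two values into one register. So this code returns 0xbeefaabb, which is two 16-bit values combined into one 32-bit register (x86 is little-endian, so the first value lands in the low 16 bits):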
mov rcx,my_arr ; rcx == address of the array
mov eax,[rcx] ; load element 0 of array (OOPS! 32-bit load!)
ret
my_arr:
dw 0xaabb ; array element 0, stored at [my_arr]
dw 0xbeef ; array element 1, stored at [my_arr+2]
You can reduce the likelihood of this type of error by adding an explicit memory size specifier, like "WORD" below. That turns the mistake into an error when the code is assembled ("error: mismatch in operand sizes") instead of returning the wrong value at runtime.
mov rcx,my_arr ; rcx == address of the array
mov eax, WORD [rcx] ; rejected: 16-bit memory operand, 32-bit register
ret
my_arr:
dw 0xaabb ; array element 0, stored at [my_arr]
dw 0xbeef ; array element 1, stored at [my_arr+2]
(If we really wanted to load a 16-bit value into a 32-bit register, we could use "movzx" (unsigned) or "movsx" (signed) instead of a plain "mov".)
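The size mismatch and the two widening loads can be mimicked in C. This is a sketch that assumes a little-endian machine such as x86-64, and the function names are invented for illustration. memcpy stands in for the raw loads, since casting the pointer to a different type would break C's aliasing rules.

```c
#include <stdint.h>
#include <string.h>

/* Same data as the dw lines above. */
static const uint16_t my_arr[2] = {0xaabb, 0xbeef};

/* Like mov eax,[rcx]: grabs 4 bytes, so it swallows two 16-bit elements. */
uint32_t load_32_bits(const uint16_t *p) {
    uint32_t value;
    memcpy(&value, p, sizeof value);
    return value;
}

/* Like movzx eax, WORD [rcx]: the upper 16 bits are filled with zeros. */
uint32_t load_zero_extended(const uint16_t *p) {
    return (uint32_t)*p;
}

/* Like movsx eax, WORD [rcx]: the upper 16 bits copy the sign bit. */
int32_t load_sign_extended(const uint16_t *p) {
    int16_t value;
    memcpy(&value, p, sizeof value);  /* reinterpret the same 2 bytes as signed */
    return value;                     /* widening a signed value sign-extends */
}
```

On a little-endian machine, load_32_bits(my_arr) gives 0xbeefaabb, load_zero_extended(my_arr) gives 0x0000aabb, and load_sign_extended(my_arr) gives the bit pattern 0xffffaabb.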
C++ | Bits | Bytes | Assembly Create | Assembly Read |
char | 8 | 1 | db (data byte) | mov al, BYTE [rcx+i*1] |
short | 16 | 2 | dw (data WORD) | mov ax, WORD [rcx+i*2] |
int | 32 | 4 | dd (data DWORD) | mov eax, DWORD [rcx+i*4] |
long | 64 | 8 | dq (data QWORD) | mov rax, QWORD [rcx+i*8] |
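If you want to confirm the sizes in this table on your own machine, a C sketch like the following reports them. It assumes a 64-bit Linux or macOS compiler, where long is 8 bytes; 64-bit Windows compilers use a 4-byte long, so the last row would not match there. The function names bits_in and element_offset are just for illustration.

```c
#include <limits.h>
#include <stddef.h>

/* Convert a size in bytes (as returned by sizeof) to a size in bits. */
size_t bits_in(size_t bytes) {
    return bytes * CHAR_BIT;
}

/* Byte offset of element i in an array whose elements are elem_bytes wide,
   i.e. the i*1, i*2, i*4, or i*8 used when reading from memory. */
size_t element_offset(size_t i, size_t elem_bytes) {
    return i * elem_bytes;
}
```

Under that assumption, bits_in(sizeof(long)) is 64 and element_offset(3, sizeof(int)) is 12.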
Human | C++ | Assembly |
Declare a long integer. | long y; | rdx (nothing to declare, just use a register) |
Copy one long integer to another. | y=x; | mov rdx,rax |
Declare a pointer to a long. | long *p; | rax (nothing to declare, use any 64-bit register) |
Dereference (look up) the long. | y=*p; | mov rdx,QWORD [rax] |
Find the address of a long. | p=&y; | mov rax,place_you_stored_Y |
Access an array (easy way) | y=p[2]; | (sorry, no easy way exists!) |
Access an array (hard way) | p=p+2; y=*p; | add rax,2*8 (move forward by two 8-byte longs), then mov rdx, QWORD [rax] (grab that long) |
Access an array (too clever) | y=*(p+2) | mov rdx, QWORD [rax+2*8]; (yes, that actually works!) |
Loading from
the wrong place, or loading the wrong amount of data, is an
INCREDIBLY COMMON problem when using pointers, in any
language. You WILL make this mistake at some point over the
course of the semester, so be careful!
In plain C, you can put a string on the screen with the standard C library "puts" function:
puts("Yo!");
You can expand this out a bit, by declaring a string variable. In C, strings are stored as (constant) character pointers, or "const char *":
const char *theString="Yo!";
puts(theString);
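Internally, the compiler does two things: it allocates memory for the string and fills it with the bytes 'Y', 'o', '!', plus a zero byte (the nul terminator) that marks the end of the string; then it points theString at that memory.
In assembly, these are separate steps: you store the bytes yourself with db (data byte), including the terminating 0, and you put a label in front of them to act as the pointer.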
Here's an example:
mov rdi, theString ; rdi points to our string
extern puts ; declare the function
call puts ; call it
ret
theString: ; label, just like for jumping
db `Yo!`,0 ; data bytes for string (don't forget nul!)
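To see the nul terminator from the C side, this sketch spells the same string out byte by byte, the way db does. The names the_string and string_bytes are illustrative, not part of any library.

```c
#include <stddef.h>

/* Equivalent of:  theString: db `Yo!`,0  */
static const char the_string[] = {'Y', 'o', '!', 0};

/* Total bytes the string occupies in memory, terminator included. */
size_t string_bytes(const char *s) {
    size_t n = 0;
    while (s[n] != 0) n++;   /* count characters up to the nul */
    return n + 1;            /* plus one byte for the nul itself */
}
```

Here string_bytes(the_string) is 4: three visible characters plus the terminator. Leave the 0 off and the loop would keep reading whatever bytes happen to follow in memory.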
In assembly,
there's no obvious way to tell the difference between a
label designed for a jump instruction (a block of code), a label
designed for a call instruction (a function), a label designed as
a pointer (like a string), or many other uses--it's just a
pointer!
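A common C idiom walks a pointer down a string until it reaches the nul terminator:
while (*p++!=0) { /* do something to *p */ }
If you unpack
this a bit, you find: *p reads the char that p currently points to; p++ then moves the pointer to the next char (the increment happens after the read); and !=0 ends the loop once the char just read was the nul terminator. Note that inside the loop body, p has already moved on to the next char.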
Here's a typical example, in C:
char s[]="string"; // declare a string
char *p=s; // point to the start
while (*p++!=0) if (*p=='i') *p='a'; // replace i with a
puts(s);
Here's a similar pointer-walking trick, in assembly:
mov rdi,stringStart
again:
add rdi,1 ; move pointer down the string
cmp BYTE[rdi],'a' ; did we hit the letter 'a'?
jne again ; if not, keep looking
extern puts
call puts
ret
stringStart:
db 'this is a great string',0
(We'll see how to declare modifiable strings later.)
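For comparison, here is roughly what that assembly loop does, written as a C function. The name find_after_first is invented for this sketch. Like the assembly, it steps the pointer before testing, so it never examines the first character, and it assumes the target letter really does occur; otherwise it would walk off the end of the string.

```c
/* Mirror of the assembly loop: advance, then compare, until we hit target. */
const char *find_after_first(const char *s, char target) {
    do {
        s++;                 /* add rdi,1 */
    } while (*s != target);  /* cmp BYTE[rdi],'a' then jne again */
    return s;                /* pointer to the first match past s[0] */
}
```

Calling puts(find_after_first("this is a great string", 'a')) prints "a great string", because puts starts at wherever the pointer ended up.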