| Register | AKA | Use |
|---|---|---|
| r0 | | Return value, first function argument |
| r1-r3 | | Function arguments and general scratch |
| r4-r11 | | Saved registers |
| r12 | ip | Intra-procedure scratch register, rarely used by the linker |
| r13 | sp | Stack pointer, a pointer to the end of the stack. Moved by push and pop. |
| r14 | lr | Link register, storing the address to return to when the function is done. Written by "bl" (branch and link, like a function call), often saved with a push/pop sequence, and read by "bx lr" (branch to link register) or the pop. |
| r15 | pc | Program counter, the current memory address being executed. It's very unusual, but handy, to have the program counter just be another register: for example, you can do program-counter-relative addressing very easily, just by loading from [pc,#offset]. (In ARM mode, reading pc actually gives the address of the current instruction plus 8.) |
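In C++ terms, the table describes what a compiler does for an ordinary function call on 32-bit ARM. The sketch below is only an illustration, and sum5 is a made-up name. Under the standard AAPCS calling convention, the first four int arguments would arrive in r0-r3, the fifth would be passed on the stack, and the result would come back in r0.

```cpp
#include <cassert>  // assert(), handy for checking the results

// Illustration only. On 32-bit ARM (AAPCS), a compiler would typically pass
// a..d in r0..r3 and e on the stack, then return the sum in r0 via "bx lr".
// The C++ code itself cannot observe which registers are used.
int sum5(int a, int b, int c, int d, int e) {
    return a + b + c + d + e;
}
```

If you have an ARM cross-compiler handy, compiling a function like this with -S and reading the assembly output is a quick way to check these register assignments.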
mov r0,#17 @ r0 is return value register
bx lr @ return from function

Save some registers, and do some three-operand arithmetic.
push {r4-r7,lr} @ save r4-r7 (saved registers) and the return address
mov r4,#10
mov r5,#100
add r0,r4,r5 @ three operands: r0 = r4 + r5 = 110
pop {r4-r7,pc} @ interesting hack: pop into the program counter to return from function

Call a function.
push {lr} @ must save link register, since our own bl will overwrite it
mov r0,#123 @ r0 is first function parameter
bl print_int @ branch-and-link (exactly like PowerPC)
pop {pc} @ interesting hack: pop into the program counter to return from function

Memory addressing is a little weird. As far as I can tell, you need to first load the memory address, then do the actual memory access.
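adr r2,mydata @ r2 is our memory address (program counter relative)
ldr r0,[r2] @ actually load data
bx lr
mydata:
.word 123

There's also a similar syntax using an equals sign, ldr r2,=mydata. It puts the same address in r2, but by a different mechanism. It's an assembler pseudo-instruction: rather than computing pc+offset like adr, the assembler stores the full address of mydata in a nearby literal pool and loads it from there. To me it's more confusing, but it isn't just a GNU thing, since ARM's own assembler supports it too.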
ldr r2,=mydata @ r2 is our memory address (loaded from a literal pool)
ldr r0,[r2] @ actually load data
bx lr
mydata:
.word 123

Here we're loading the address of an array to use as a function argument.
push {lr} @ must save lr since we call a function
adr r0,mydata @ first parameter: array memory address (program counter relative)
mov r1,#2 @ second parameter: array length
bl iarray_print
pop {pc} @ function return
mydata:
.word 123
.word 456
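For comparison, here is roughly what that last call looks like from C++. The caller passes the array's address in r0 and its length in r1. NetRun supplies iarray_print, and I don't know its exact output format. So the sketch uses a hypothetical stand-in, iarray_print_sketch, that just formats the values into a space-separated string.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for NetRun's iarray_print. On ARM, the caller
// would pass arr in r0 and n in r1, just like the assembly above.
std::string iarray_print_sketch(const int* arr, int n) {
    assert(n >= 0);  // a negative length makes no sense
    std::string out;
    for (int i = 0; i < n; i++) {
        if (i > 0) out += ' ';
        out += std::to_string(arr[i]);
    }
    return out;
}

// Read-only data, like the ".word 123 / .word 456" constants.
static const int mydata[] = {123, 456};

std::string call_example() {
    return iarray_print_sketch(mydata, 2);  // arr -> r0, n -> r1
}
```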
Generally, ARM integer instructions are similar to PowerPC.
(Note: I just added ".syntax unified" to NetRun's boilerplate code, so you no longer need # in front of constants.)

push {r4,lr} @ (note: we push r4 too, just for 8-byte stack alignment)
sub sp,sp,32 @ make plenty of space on the stack
adr r0,.myfloats @ makes r0 point to myfloats
flds s0,[r0] @ load single-precision float (from constant below)
fadds s0,s0,s0 @ add to itself
fsts s0,[sp] @ store out to the stack
mov r0,sp @ location of floats to print
mov r1,1 @ number of floats to print
bl farray_print @ print some floats (FAILS if stack is not 8-byte aligned!)
add sp,sp,32 @ hand back stack space
pop {r4,pc} @ restore link register, and return
.myfloats: @ Note that this is read-only constant space (segfault on store!)
.word 0x3F9E0419 @ floating point 1.2345
@ Generate constants above via C++: "float x=10.0; return *(int *)&x;"
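The C++ one-liner in the comment above, *(int *)&x, usually works in practice, but it technically violates C++'s strict aliasing rules. A safer way to get the same bit pattern is to copy the bytes with std::memcpy (C++20 also offers std::bit_cast). Here's a minimal sketch; float_bits is just a name I made up. It should reproduce the .word constants used in these examples.

```cpp
#include <cassert>  // assert(), handy for checking the results
#include <cstdint>
#include <cstring>

// Return the IEEE-754 single-precision bit pattern of x, suitable for a
// ".word" directive. memcpy avoids the undefined behavior of *(int *)&x.
std::uint32_t float_bits(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "need 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return bits;
}
```

Adding a float to itself is exact (barring overflow), so fadds s0,s0,s0 on 1.2345 should give 0x401E0419: the same mantissa, with the exponent bumped by one.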
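The VFP floating-point unit also has a short vector mode, controlled by the LEN and STRIDE fields of the FPSCR control register. With LEN set to 4, a single instruction like fadds operates on four consecutive registers.

push {r4,lr} @ (note: we push r4 too, just for 8-byte stack alignment)
sub sp,sp,32 @ make plenty of space on the stack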
@ Enter vector compute mode
FMRX r12,FPSCR @ copy FPSCR into r12
BIC r12,r12,#0x00370000 @ clears STRIDE and LEN
ORR r12,r12,#0x00030000 @ sets STRIDE = 1, LEN = 4
FMXR FPSCR,r12 @ copy r12 back into FPSCR
adr r0,.myfloats @ makes r0 point to myfloats
fldmias r0,{s8-s11} @ load four single-precision floats (from constants below)
fadds s8,s8,s8 @ add *four* floats (from LEN above)
fstmias sp,{s8-s11} @ store four single-precision floats (to the stack)
@ Leave vector compute mode
BIC r12,r12,#0x00370000 @ clears STRIDE and LEN, back to STRIDE = 1, LEN = 1 (scalar mode)
FMXR FPSCR,r12 @ copy r12 back into FPSCR
mov r0,sp @ location of floats to print
mov r1,4 @ number of floats to print
bl farray_print @ print some floats (FAILS if stack is not 8-byte aligned!)
add sp,sp,32 @ hand back stack space
pop {r4,pc} @ restore link register, and return
.myfloats: @ Note that this is read-only constant space (segfault on store!)
.word 0x3F9E0419 @ floating point 1.2345
.word 0x42C80000 @ floating point 100.0
.word 0x41200000 @ floating point 10.0
.word 0x4048F5C3 @ floating point 3.14
@ Generate constants above via C++: "float x=10.0; return *(int *)&x;"
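The FPSCR constants can be checked with a little integer arithmetic. The sketch assumes the VFPv2 FPSCR layout: LEN sits in bits 16-18 and holds the vector length minus 1, and STRIDE sits in bits 20-21, where 0b00 means stride 1 and 0b11 means stride 2. The helper names are made up for illustration. vector_fadds_model is only a scalar model of what the single vector fadds computes, not how the hardware schedules it.

```cpp
#include <cassert>
#include <cstdint>

// Assumed VFPv2 FPSCR short-vector fields: LEN in bits [18:16] stores
// length-1; STRIDE in bits [21:20] stores 0b00 (stride 1) or 0b11 (stride 2).
const std::uint32_t kLenMask    = 0x00070000u;
const std::uint32_t kStrideMask = 0x00300000u;

int vector_len(std::uint32_t fpscr) {
    return static_cast<int>((fpscr & kLenMask) >> 16) + 1;
}

int vector_stride(std::uint32_t fpscr) {
    std::uint32_t field = (fpscr & kStrideMask) >> 20;
    assert(field == 0u || field == 3u);  // other encodings are not valid strides
    return field == 0u ? 1 : 2;
}

// Apply the same BIC/ORR sequence as the assembly to an FPSCR value.
std::uint32_t enter_vector_mode(std::uint32_t fpscr) {
    fpscr &= ~0x00370000u;  // BIC: clear STRIDE and LEN
    fpscr |= 0x00030000u;   // ORR: LEN field = 3 (length 4), STRIDE field = 0 (stride 1)
    return fpscr;
}

// Scalar model of "fadds s8,s8,s8" with LEN = 4, STRIDE = 1:
// s8..s11 are each added to themselves.
void vector_fadds_model(float s[4]) {
    for (int i = 0; i < 4; i++) s[i] = s[i] + s[i];
}
```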
Generally, the vector operations seem to be quite fast, taking only a little longer than the scalar versions. In addition, unlike many chip designers, ARM publishes detailed execution information, including cycle counts, pipeline hazards and scoreboarding, so you have something to start with during optimization!