Address  Machine Code      Assembly Code
0:       55                push ebp
1:       89 e5             mov ebp,esp
3:       b8 07 00 00 00    mov eax,0x7
8:       5d                pop ebp
9:       c3                ret
mov eax,1234 ; I'm returning 1234, like the homework says...
Size   | Register names                                                                    | Meaning (note: not the official meanings!)           | Introduced in
8-bit  | al,ah, bl,bh, cl,ch, dl,dh                                                        | "Low" and "High" parts of bigger registers           | 1972, Intel 8008
16-bit | ax, bx, cx, dx, si, di, sp, bp                                                    | "eXtended" versions of the original 8-bit registers  | 1978, Intel 8086/8088
32-bit | eax, ebx, ecx, edx, esi, edi, esp, ebp                                            | "Extended eXtended" registers                        | 1985, Intel 80386
64-bit | rax, rbx, rcx, rdx, rsi, rdi, rsp, rbp, r8, r9, r10, r11, r12, r13, r14, r15      | "Really eXtended" registers                          | 2003, AMD Opteron / Athlon64; 2004, Intel EM64T CPUs
mov eax,0 ; Clear eax
mov ah,0xAB ; Move "0xAB" into bits 8-15 of eax (its second-lowest byte), so eax is now 0x0000AB00
Like x86, PowerPC machine code consists of bytes, with addresses, that represent assembly instructions and operands. PowerPC machine code also spends most of its time manipulating values in registers. Here "li" (load immediate) puts 7 into r3, the return-value register, and "blr" returns by branching to the link register:

Address  Machine Code   Assembly Code
0:       38 60 00 07    li r3,7
4:       4e 80 00 20    blr
MIPS is a 4-byte-per-instruction RISC machine, quite similar to PowerPC. Here $2 is the return-value register and $31 holds the return address:

li $2,0xb0b
jr $31
On x86, the "add" instruction is really "a+=b":

mov eax,7
mov ecx,2
add eax,ecx
ret
On MIPS, the "add" instruction is really "a = b + c":

li $5,7
li $6,2
add $2,$5,$6
jr $31
nop          # branch delay slot: always executes, even though jr is taken
return 0xdeadbeef;

On x86, there's a one-byte opcode (b8, "load eax with a 32-bit constant"), followed by the 4-byte little-endian constant:

0:  b8 ef be ad de    mov eax,0xdeadbeef
5:  c3                ret

On PowerPC, because instructions are just 32 bits, you've got to split the 4-byte constant across two instructions: "load immediate shifted" (lis) loads the high 16 bits, then "or immediate" (ori) pastes in the low 16 bits. PowerPC is big-endian, so the constant's bytes appear in the machine code in their natural order. (The disassembler prints the halves in decimal: -8531 is 0xdead read as a signed 16-bit value, and 48879 is 0xbeef.)

0:  3c 60 de ad    lis r3,-8531
4:  60 63 be ef    ori r3,r3,48879
8:  4e 80 00 20    blr

On MIPS, you also have the low half/high half split. MIPS has a "branch delay slot" after every branch, which always gets executed even if the branch is taken, so here the "ori" that finishes the constant runs in the delay slot of the "jr":

c:   3c 02 de ad    lui r2,0xdead
10:  03 e0 00 08    jr r31
14:  34 42 be ef    ori r2,r2,0xbeef

On SPARC, you also have a branch delay slot, and the constant is split across instructions. But it's split oddly: first you get the high 22 bits with "sethi", and then the low 10 bits with "or":

0:  11 37 ab 6f    sethi %hi(0xdeadbc00), %o0
4:  81 c3 e0 08    retl
8:  90 12 22 ef    or %o0, 0x2ef, %o0    ! deadbeef

On DEC Alpha, the only big surprise is that the machine code is little-endian. "lda" actually adds, not ORs, its sign-extended constant: -16657 is 0xffffbeef, and adding that to the 0xdeae0000 built by "ldah" (-8530 is 0xdeae) gives 0xdeadbeef in register v0 on return.

0:  ae de 1f 24    ldah v0,-8530
4:  ef be 00 20    lda v0,-16657(v0)
8:  01 80 fa 6b    ret

On Itanium (IA-64), instructions come in 16-byte bundles: a 5-bit template (here MLX and MFB) plus three 41-bit instruction slots (5 + 3*41 = 128 bits). The constant lands in r8, the return-value register, via a single "movl" that fills two slots of the MLX bundle:

0:  04 00 00 00 01 80    [MLX] nop.m 0x0
6:  de ff ff ff 7f 00          movl r8=0xffffffffdeadbeef
c:  f1 b6 f5 6d
10: 1d 00 00 00 01 00    [MFB] nop.m 0x0
16: 00 00 00 02 00 80          nop.f 0x0
1c: 08 00 84 00                br.ret.sptk.many b0;;

Overall, you can see that all these RISC machines use four bytes per instruction, and Itanium spends 32 bytes on this one-line function. That extra space actually adds up, as we'll see next.
Sizes in bytes, by platform:

Program                  alpha      amd64      i386       ppc
/bin/ls                  36231      30844      25781      29843
/bin/ksh                 314181     242843     198403     240676
/usr/bin/env             10392      9770       6690       7397
/usr/bin/yacc            112103     86140      71033      82783
/usr/bin/g++             170865     140117     121689     142463
/usr/bin/vi              440233     374261     308357     354142
/usr/bin/cvs             854584     652278     573478     649416
binary/sets/base.tgz     32500081   27790777   24908886   27816531

Plain x86 (i386) executables are reliably the smallest. They're about 12-18% smaller than either 64-bit x86 or PowerPC (env is the outlier: about 31% smaller than amd64, but just under 10% smaller than PowerPC), and about 29-37% smaller than DEC Alpha executables. The compressed base.tgz sets show the same ordering. Likely, this depends somewhat on the compiler and runtime system.
Instruction usage breakdown (by popularity):
42.4% mov instructions
5.0% lea instructions
4.9% cmp instructions
4.7% call instructions
4.5% je instructions
4.4% add instructions
4.3% test instructions
4.3% nop instructions
3.7% jmp instructions
2.9% jne instructions
2.9% pop instructions
2.6% sub instructions
2.2% push instructions
1.4% movzx instructions
1.3% ret instructions
...
This makes a little more sense broken into categories:
Load and store: about 50% total
42.4% mov instructions
2.9% pop instructions
2.2% push instructions
1.4% movzx instructions
0.3% xchg instructions
0.2% movsx instructions
Branch (counting the cmp and test that set up conditional branches): about 28% total
4.9% cmp instructions
4.7% call instructions
4.5% je instructions
4.3% test instructions
3.7% jmp instructions
2.9% jne instructions
1.3% ret instructions
0.4% jle instructions
0.4% ja instructions
0.4% jae instructions
0.3% jbe instructions
0.3% js instructions
Arithmetic: about 15% total
5.0% lea instructions (uses address calculation arithmetic)
4.4% add instructions
2.6% sub instructions
1.0% and instructions
0.5% or instructions
0.3% shl instructions
0.3% shr instructions
0.2% sar instructions
0.1% imul instructions

So for this piece of code, the most numerically common instructions on x86 are actually just data movement (mov, push, or pop), much of it loads and stores to memory, followed by branches, and finally arithmetic. This low arithmetic density was a surprise to me! You can get a little more detail by looking at what appears in each instruction:
Registers used:
30.9% "eax" lines (eax is the return result register, and general scratch)
5.7% "ebx" lines (in 32-bit position-independent code, such as shared libraries, ebx usually holds the global offset table pointer used to reach globals)
10.3% "ecx" lines
15.5% "edx" lines
11.7% "esp" lines (note that "push" and "pop" implicitly change esp, so this should be about 5% higher)
25.9% "ebp" lines (the bread-and-butter stack access base register)
12.0% "esi" lines
8.6% "edi" lines

x86 does a good job of optimizing access to the eax register: many instructions have special shorter eax-only versions. But it should clearly be doing the same thing for ebp, and it doesn't have any special instructions for ebp-relative access.
Features used:
66.0% "0x" lines (hex constants, both immediates and address offsets like [ebp-0x8])
69.6% "," lines (two-operand instructions)
36.7% "+" lines (address calculated as sum)
1.2% "*" lines (address calculated with a scaled index)
48.1% "\[" lines (explicit memory accesses)
2.8% "BYTE PTR" lines (char-sized memory access)
0.4% "WORD PTR" lines (short-sized memory access)
40.7% "DWORD PTR" lines (int or float-sized memory)
0.1% "QWORD PTR" lines (double-sized memory)

So the "typical" x86 instruction would be an int-sized load or store between a register, often eax, and a memory location, often something on the stack referenced by ebp with an immediate-mode offset. The counts bear this out: nearly half of all instructions (48.1%) access memory explicitly, and most of those (40.7% of all instructions) are int-sized DWORD accesses.
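#!/bin/sh
# Disassemble an executable and report how often each instruction,
# register, and addressing feature appears (percent of all instructions).
# Usage: $0 executable      (needs GNU objdump)
file="$1"
d="dis.txt"
[ -n "$file" ] || { echo "usage: $0 executable" >&2; exit 1; }

# objdump -d lines look like "  addr:<TAB>hex bytes<TAB>mnemonic operands".
# Keep only the third tab-separated field, the instruction text.
# (Splitting on ":" would truncate operands such as "ds:0x804a01c".)
objdump -drC -M intel "$file" | \
  awk -F'\t' 'NF >= 3 && $3 != "" {print $3}' > "$d"

tot=$(wc -l < "$d" | awk '{print $1}')
[ "$tot" -gt 0 ] || { echo "no instructions found in $file" >&2; exit 1; }
echo "$tot instructions total"

echo "Instruction usage breakdown:"
# Count each mnemonic (first word), most popular first.
awk '{print $1}' "$d" | sort | uniq -c | sort -n -r | \
  awk -v tot="$tot" '{printf(" %.1f%% %s instructions\n", $1*100.0/tot, $2);}' \
  > dis_instructions.txt
head -15 dis_instructions.txt

echo "Register and feature usage:"
# "[ ,]WORD PTR" avoids also counting DWORD, QWORD, and XMMWORD accesses.
for reg in eax ebx ecx edx esp ebp esi edi \
    "0x" "," "+" "*" "\[" \
    "BYTE PTR" "[ ,]WORD PTR" "DWORD PTR" "QWORD PTR"
do
  c=$(grep -c -- "$reg" "$d")
  pct=$(awk -v c="$c" -v tot="$tot" 'BEGIN {printf("%.1f", c*100.0/tot)}')
  printf ' %s%% "%s" lines\n' "$pct" "$reg"
done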