CPU and Instruction Set Design

CISC vs RISC vs VLIW

Back in the 1970s, when bits were expensive, the typical CPU encoding used exactly as many bytes as each instruction needed and no more. For example, a "return" instruction might use one byte (0xc3), while a "load a 32-bit constant" instruction might use five bytes (0xb8 <32-bit constant>). Machines with these variable-sized instructions are (retroactively) called Complex Instruction Set Computing (CISC), and x86 is basically the last surviving CISC machine.

During the 1980s, folks realized they could build simpler CPU decoders if all the instructions took the same number of bytes, usually four bytes. This idea is called Reduced Instruction Set Computing (RISC), and was built into MIPS, PowerPC, SPARC, DEC Alpha, and other commercial CPUs. Here's a good but long retrospective article on the RISC-vs-CISC war, which got pretty intense during the 1990s. Nowadays, RISC machines might compress their instructions (like CISC), while CISC machines usually decode their instructions into fixed-size micro-operations (like RISC), so the war ended in the best possible way--both sides have basically joined forces!

During the late 1980s and early 1990s, several companies created machines with even longer instructions, called Very Long Instruction Word (VLIW), where basically each part of the CPU has corresponding bits in every instruction. This makes for very simple decoding, and allows some interesting parallel tricks, but each instruction might be a hundred bytes or more! Some graphics processors have been VLIW internally (AMD's TeraScale generation, for example), and there are several strange digital signal processor chips that are VLIW, but the concept hasn't really caught on for the main CPU. And any company that produced a successful VLIW chip would have a big problem building an improved processor, since each instruction specifically describes what should happen on each part of the old chip.

Assembly Basics

So here's some assembly language:

        Machine Code:           Assembly Code:
Address                    Instruction Operands
   0:	55                   	push   ebp
   1:	89 e5                	mov    ebp,esp
   3:	b8 07 00 00 00       	mov    eax,0x7
   8:	5d                   	pop    ebp
   9:	c3                   	ret

Address. A byte count, indicating where you are in a big list of bytes. The first byte has address zero. An address can be thought of as an array index into a big array of chars.
Machine Code. A set of bytes that the CPU can treat as instructions and execute. For example, the byte "0xC3" tells the CPU to execute the instruction "ret" (return from function). Human beings can write machine code (just wait for HW3!), but usually humans write Assembly Code instead.
Assembly Code. The human-readable counterpart to machine code. Assembly code is line-oriented, human-readable text that lots of people write by hand. You use an "assembler" to turn assembly code into executable machine code. You can use a "disassembler" to turn executable code back into assembly code (try the NetRun "Disassemble" checkbox for the code above!).
Instruction. One command for the CPU to execute. Can be thought of as: one line of assembly code, or one set of machine code bytes. Depending on the context, sometimes "instruction" includes the operands ("the 'push ebp' instruction"), sometimes it doesn't ("the 'push' instruction, with operand 'ebp'.").
Operands. The parameters taken by various instructions. These parameters can be constants, like the "0x7" above, addresses in memory, or registers.
Registers. Registers are little storage locations built into the CPU. They're used like variables in assembly language--you spend most of your time putting values into registers, doing arithmetic on register values, and moving values around between registers. For example, "ebp", "esp", and "eax" above are all registers. Unlike variables, there are a fixed number of registers (built into the design of the CPU), and the registers have fixed names (built into the assembler).
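
To make the address / machine code / assembly relationship concrete, here is a minimal sketch of a disassembler that only understands the five instructions in the listing above. The function name `disassemble` and the table of cases are my own illustration, not NetRun's disassembler; a real x86 disassembler needs a far larger decoding table.

```python
# Machine code bytes copied from the listing above.
CODE = bytes([0x55, 0x89, 0xE5, 0xB8, 0x07, 0x00, 0x00, 0x00, 0x5D, 0xC3])

def disassemble(code):
    """Return (address, raw bytes, assembly text) for each instruction.

    Hypothetical helper: handles only the five encodings in the listing.
    """
    out = []
    addr = 0                       # address = index into the array of bytes
    while addr < len(code):
        op = code[addr]
        if op == 0x55:
            length, text = 1, "push ebp"
        elif op == 0x5D:
            length, text = 1, "pop ebp"
        elif op == 0xC3:
            length, text = 1, "ret"
        elif op == 0x89 and code[addr + 1:addr + 2] == b"\xe5":
            length, text = 2, "mov ebp,esp"
        elif op == 0xB8:
            # The 32-bit constant follows the opcode, low byte first.
            imm = int.from_bytes(code[addr + 1:addr + 5], "little")
            length, text = 5, f"mov eax,{imm:#x}"
        else:
            raise ValueError(f"unknown opcode {op:#04x} at address {addr}")
        out.append((addr, code[addr:addr + length], text))
        addr += length             # the next instruction starts right after
    return out

for addr, raw, text in disassemble(CODE):
    print(f"{addr:4x}: {raw.hex(' '):<15} {text}")
```

The addresses it prints (0, 1, 3, 8, 9) match the left column of the listing, because each address is just the running total of the instruction lengths before it.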

Here's a typical line of assembly code. It's one CPU instruction, with a comment:

	mov eax,1234 ;  I'm returning 1234, like the homework says...

(executable NetRun link)
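
For a feel of what the assembler does with that line, here is a small sketch that builds the machine code by hand. It assumes the same encoding shown in the earlier listing (opcode 0xb8 followed by the 32-bit constant, low byte first); the function name `encode_mov_eax_imm32` is made up for this example.

```python
import struct

def encode_mov_eax_imm32(value):
    """Encode 'mov eax,value' as opcode 0xB8 plus a little-endian 32-bit constant."""
    # "<" = little-endian with no padding, "B" = one byte, "I" = 32-bit unsigned
    return struct.pack("<BI", 0xB8, value & 0xFFFFFFFF)

machine_code = encode_mov_eax_imm32(1234)
print(machine_code.hex(" "))  # b8 d2 04 00 00
```

1234 is 0x4d2 in hex, so the constant shows up as the bytes d2 04 00 00.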

Hardware Implementation

Each of the features of assembly language above corresponds to a hardware circuit structure. I claim it's useful to know how these work.

You need to keep track of your address in the code, so there's a program counter.
Instructions need to get loaded from somewhere (ROM or RAM), so there's an instruction fetch unit.
Once you have an instruction, you need to figure out which operands and registers it's talking about. This is done in the decode unit.
Registers need to be stored, so there's a register file. You need a way to read and write them, so there are read and write ports.
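
The register file is the easiest of these to model in software. This is only a sketch: the class name, the two-read-port / one-write-port arrangement, the four registers, and the 8-bit width are all assumptions chosen for illustration, not a description of any particular circuit.

```python
class RegisterFile:
    """Toy register file: two read ports, one write port (assumed layout)."""

    def __init__(self, count=4, bits=8):
        self.mask = (1 << bits) - 1    # registers hold only `bits` bits
        self.regs = [0] * count

    def read(self, index_a, index_b):
        """Both read ports deliver a value at the same time."""
        return self.regs[index_a], self.regs[index_b]

    def write(self, index, value, write_enable):
        """The write port only stores when its enable line is high."""
        if write_enable:
            self.regs[index] = value & self.mask

rf = RegisterFile()
rf.write(1, 7, write_enable=True)
rf.write(2, 300, write_enable=True)   # wraps to 8 bits: 300 & 0xFF == 44
rf.write(1, 99, write_enable=False)   # ignored, enable line is low
a, b = rf.read(1, 2)
print(a, b)  # 7 44
```

The `write_enable` argument stands in for a control line: the data can be sitting on the write port's wires all the time, but nothing is stored until that line is raised.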

The simplest place to start is just to look at the registers and arithmetic. We need control lines, highlighted in purple here, to activate the different parts of the computation. Click this image to get a runnable Logisim .circ file.

Manually controlled registers and processor.

Here's the two-register, 12-bit instruction machine we designed in class today. Conceptually, this is the same as above, except I'm driving all the little control lines from a ROM; this stored program dramatically simplifies the user interface.

Circuit diagram simple CPU: instructions are fetched, decoded, and executed.
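
The idea of driving control lines from a ROM can be sketched in a few lines of software. The bit layout below is hypothetical (I'm not reproducing the actual wiring of the circuit): bits 0-7 are a constant, bit 8 selects "add" instead of "pass the constant", and bits 9 and 10 are the write-enable lines for registers A and B.

```python
MASK8 = 0xFF

def control_word(const=0, add=0, write_a=0, write_b=0):
    """Pack the assumed control lines into one 12-bit ROM word."""
    return (write_b << 10) | (write_a << 9) | (add << 8) | (const & MASK8)

ROM = [
    control_word(const=5, write_a=1),   # A = 5
    control_word(const=3, write_b=1),   # B = 3
    control_word(add=1, write_a=1),     # A = A + B
]

def run(rom):
    a = b = 0
    for pc in range(len(rom)):          # program counter steps through the ROM
        word = rom[pc]                  # fetch
        const = word & MASK8            # "decode" is just slicing off wires
        add = (word >> 8) & 1
        write_a = (word >> 9) & 1
        write_b = (word >> 10) & 1
        result = (a + b) & MASK8 if add else const   # adder plus a mux
        if write_a:
            a = result
        if write_b:
            b = result
    return a, b

print(run(ROM))  # (8, 3)
```

Each ROM word here is nothing more than the switch settings you would otherwise flip by hand, stored one row per clock tick.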

Here's a slightly more complex four-register machine. Each instruction is 16 bits, and the encoding is designed for future compatibility with up to 16 registers. This is vaguely similar to ARM.

Circuit diagram for simple four-register CPU.
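
Here's one way a 16-bit encoding can leave room for 16 registers while only building four. The field layout and opcode numbers are assumptions for this sketch, not the ones in the circuit: four 4-bit fields holding an opcode, a destination register, and two source registers (or, for a load-constant instruction, an 8-bit constant split across the last two fields).

```python
NUM_REGS = 4          # implemented today; the 4-bit fields leave room for 16

OP_LOADI, OP_ADD, OP_SUB = 0x1, 0x2, 0x3   # hypothetical opcode numbers

def encode(op, rd, ra=0, rb=0):
    """Pack four 4-bit fields into one 16-bit instruction."""
    return (op << 12) | (rd << 8) | (ra << 4) | rb

def decode(word):
    """Split a 16-bit instruction back into its four 4-bit fields."""
    return (word >> 12) & 0xF, (word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF

def execute(program):
    regs = [0] * NUM_REGS
    for word in program:
        op, rd, ra, rb = decode(word)
        # LOADI uses ra:rb as a constant, so only rd names a register there.
        used = [rd] if op == OP_LOADI else [rd, ra, rb]
        if any(r >= NUM_REGS for r in used):
            raise ValueError(f"register not implemented in {word:#06x}")
        if op == OP_LOADI:
            regs[rd] = (ra << 4) | rb
        elif op == OP_ADD:
            regs[rd] = (regs[ra] + regs[rb]) & 0xFFFF
        elif op == OP_SUB:
            regs[rd] = (regs[ra] - regs[rb]) & 0xFFFF
        else:
            raise ValueError(f"unknown opcode {op:#x}")
    return regs

program = [
    encode(OP_LOADI, 0, 0x2, 0xA),   # r0 = 0x2A (42)
    encode(OP_LOADI, 1, 0x0, 0x5),   # r1 = 5
    encode(OP_ADD, 2, 0, 1),         # r2 = r0 + r1
    encode(OP_SUB, 3, 0, 1),         # r3 = r0 - r1
]
print(execute(program))  # [42, 5, 47, 37]
```

Because register numbers already get four bits each, a later chip with more registers could run the same machine code unchanged; old programs simply never name registers 4 through 15.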

Notice how:

The wiring of the CPU is what *determines* the instruction set, so to change instruction sets, you need to use solder! Many machines have a look-up table called microcode that's used to decode instructions, which allows some flexibility to fix hardware bugs. Common instructions often bypass microcode via dedicated logic, for speed.
If you're storing the instructions on-chip, weird instruction lengths like 12 bits work fine. (For example, mid-range PIC microcontrollers have 14-bit-wide instructions.) If the instructions were coming from an off-chip 8-bit or 16-bit RAM, this size mismatch would be a problem.
I've left out RAM access, branches, and error checking. These are all important in a real CPU.

Both CPUs above are RISC architectures, which makes the instruction decoder very simple: every fetch returns exactly one instruction. In a CISC architecture, you need to keep fetching bytes until you have one complete instruction.
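
The difference in the fetch step can be sketched like this. The length table is an assumption that covers only three x86 opcodes from this section, and it pretends the first byte always tells you the total length; real x86 also has prefixes and addressing bytes that can change the length partway through.

```python
def fetch_fixed(memory, pc, width=2):
    """RISC-style: every instruction is `width` bytes, so one fetch is enough."""
    return memory[pc:pc + width], pc + width

# Assumed length table for three x86 opcodes seen earlier in this section.
LENGTH_BY_OPCODE = {0xC3: 1, 0x55: 1, 0xB8: 5}

def fetch_variable(memory, pc):
    """CISC-style: read the first byte, then keep fetching until complete."""
    fetched = bytes([memory[pc]])
    needed = LENGTH_BY_OPCODE[fetched[0]]   # first byte gives the total length
    while len(fetched) < needed:
        fetched += bytes([memory[pc + len(fetched)]])
    return fetched, pc + needed

code = bytes([0x55, 0xB8, 0x07, 0x00, 0x00, 0x00, 0xC3])
pc = 0
lengths = []
while pc < len(code):
    insn, pc = fetch_variable(code, pc)
    lengths.append(len(insn))
print(lengths)  # [1, 5, 1]
```

Notice that `fetch_fixed` knows the next program counter value before looking at the instruction at all, while `fetch_variable` can't find where the next instruction starts until it has at least partly decoded this one.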