Out-of-order Execution

A processor that executes instructions strictly one after another may use its resources inefficiently, which leads to poor performance. Performance can be improved in two ways: by executing different sub-steps of sequential instructions simultaneously (pipelining), or by executing whole instructions simultaneously (as in superscalar designs). Further improvement can be achieved through out-of-order execution[2], in which instructions are executed in an order different from the one in which they appear in the program[1].

Out-of-order execution is an approach used in high-performance microprocessors. It makes efficient use of instruction cycles (the process by which a computer retrieves a program instruction from memory, determines what action the instruction requires, and carries out that action) and reduces costly delays. The processor executes instructions in an order governed by the availability of their data or operands rather than by their original order in the program. By doing so, the processor avoids sitting idle while data is retrieved for the next instruction in the program[1].

In other words, a processor with multiple execution units may complete instructions in an order different from program order. For example, let I-1 and I-2 be two instructions, where I-1 comes before I-2 in the program. With out-of-order execution, the processor can execute I-2 before I-1 has completed, provided I-2 does not depend on the result of I-1. This flexibility improves performance because instructions spend less time waiting.[1]

The first machine to use out-of-order execution was the CDC 6600 (1964), which used a scoreboard to resolve conflicts. In 1966 IBM introduced Tomasulo's algorithm, which supports full out-of-order execution.

In older processors, instructions are processed in order. An in-order processor performs the following steps:
1. The processor retrieves program instructions from its memory.
2. If the input operands are available in registers, the instruction is sent to the appropriate execution unit.
3. If one or more operands are unavailable during the current clock cycle (typically because they are still being fetched from memory), the processor stalls until they become available.
4. The instruction is executed by the appropriate execution unit.
5. After execution, the execution unit writes the result back to the register file.[1]
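
The in-order steps above can be sketched as a small cycle-counting model. This is only an illustration under simplifying assumptions, not a description of any real processor: the `Instr` record, the register names, the latencies, and the rule that at most one instruction starts per cycle are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    dest: str          # register written by the instruction
    srcs: tuple = ()   # registers read by the instruction
    latency: int = 1   # cycles until the result is written back (assumed)


def run_in_order(program):
    """Return {instruction name: cycle in which it starts executing}."""
    ready_at = {}   # register -> first cycle in which its value is available
    start = {}
    cycle = 0
    for ins in program:                       # step 1: fetch in program order
        # Steps 2-3: stall until every source operand is available.
        # Registers never written in this program count as ready at cycle 0.
        operands_ready = max((ready_at.get(r, 0) for r in ins.srcs), default=0)
        cycle = max(cycle, operands_ready)
        start[ins.name] = cycle               # step 4: execute
        ready_at[ins.dest] = cycle + ins.latency   # step 5: write back
        cycle += 1                            # assumed: one instruction starts per cycle
    return start


program = [
    Instr("I1", dest="r1", latency=4),             # slow load from memory
    Instr("I2", dest="r2", srcs=("r1", "r0")),     # needs the result of I1
    Instr("I3", dest="r3", srcs=("r4", "r5")),     # independent of I1 and I2
]
start = run_in_order(program)
print(start)   # I1 starts at cycle 0, I2 at cycle 4, I3 at cycle 5
```

In this sketch I3 has its operands ready from the beginning, yet it cannot start until cycle 5 because it sits behind the stalled I2.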

In an out-of-order processor, instructions are executed in the order in which their operands become available. An out-of-order processor performs the following steps:
1. The processor retrieves program instructions from its memory.
2. Instructions are sent to an instruction queue, also called an instruction buffer or reservation stations.
3. An instruction waits in the queue until its input operands are available. It does not have to wait for its turn in program order: as soon as its operands are available, it can leave the queue for execution, even ahead of older instructions.
4. The instruction is sent to the appropriate execution unit and executed.
5. The results are queued.
6. Only after all older instructions have had their results written back to the register file is the result of this instruction written back to the register file.[1]
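
The out-of-order steps above can be sketched as a small cycle-counting model. It is a simplified illustration, not Tomasulo's algorithm or any real design. The `Instr` record, the register names, and the latencies are hypothetical, and the model assumes that at most one instruction leaves the queue per cycle and that only true data dependencies matter (as if registers had already been renamed).

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    dest: str          # register written by the instruction
    srcs: tuple = ()   # registers read by the instruction
    latency: int = 1   # cycles until the result is available (assumed)


def run_out_of_order(program):
    """Return (start, retire): the cycle each instruction starts executing
    and the cycle its result is written back to the register file."""
    # Link each source operand to the most recent older instruction writing it.
    last_writer, deps = {}, {}
    for ins in program:
        deps[ins.name] = [last_writer[r] for r in ins.srcs if r in last_writer]
        last_writer[ins.dest] = ins.name

    queue = list(program)        # step 2: instruction queue, oldest first
    start, finish = {}, {}
    cycle = 0
    while queue:
        # Step 3: the oldest instruction whose operands are ready leaves the queue.
        for ins in queue:
            if all(finish.get(d, float("inf")) <= cycle for d in deps[ins.name]):
                queue.remove(ins)
                start[ins.name] = cycle                  # step 4: execute
                finish[ins.name] = cycle + ins.latency   # step 5: result is queued
                break
        cycle += 1

    # Step 6: write back in program order, after all older instructions.
    retire, last = {}, 0
    for ins in program:
        last = max(last, finish[ins.name])
        retire[ins.name] = last
    return start, retire


program = [
    Instr("I1", dest="r1", latency=4),             # slow load from memory
    Instr("I2", dest="r2", srcs=("r1", "r0")),     # needs the result of I1
    Instr("I3", dest="r3", srcs=("r4", "r5")),     # independent of I1 and I2
]
start, retire = run_out_of_order(program)
print(start)    # I1 starts at cycle 0, I3 at cycle 1, I2 at cycle 4
print(retire)   # I1 retires at cycle 4, I2 and I3 at cycle 5
```

Here I3 starts in cycle 1, ahead of the older I2, which is still waiting for the result of I1. Its result is nevertheless held in the result queue until I1 and I2 have written back, so the register file is still updated in program order.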

The main advantage of an out-of-order processor is that it avoids a class of stalls that occur when the data needed to perform an operation is unavailable: it avoids the stall that occurs in step 3 of in-order execution above.[1]

References:
1. http://en.wikipedia.org/wiki/Out-of-order_execution
2. http://www.pcguide.com/ref/cpu/arch/int/featOOE-c.html
3. http://en.wikipedia.org/wiki/Very_long_instruction_word