Friday 24 January 2014

PIPELINING


Pipelining is an implementation technique in which multiple instructions are overlapped in execution. The computer pipeline is divided into stages, and each stage completes a part of an instruction in parallel with the others. The stages are connected one to the next to form a pipe: instructions enter at one end, progress through the stages, and exit at the other end.

 

·         F: Fetch

·         D: Decode

·         C: Calculate the address of the operand

·         E: Execute

·         W: Write back
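As a rough back-of-the-envelope sketch (in Python, not part of the lecture): with a five-stage pipe and one instruction entering per cycle, n instructions finish in 5 + (n − 1) cycles instead of 5·n.

```python
# Minimal sketch of why pipelining pays off: each instruction passes
# through the stages F, D, C, E, W, and a new instruction enters the
# pipe every cycle once the previous one has moved on.

STAGES = ["F", "D", "C", "E", "W"]

def total_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles to finish n instructions: pipelined vs. purely sequential."""
    pipelined = n_stages + (n_instructions - 1)   # one completion per cycle after fill
    sequential = n_stages * n_instructions        # no overlap at all
    return pipelined, sequential

print(total_cycles(5))  # (9, 25): 5 instructions finish in 9 cycles, not 25
```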

 

Types of hazards:

1. Structural hazards

2. Data hazards

3. Control hazards

 

Structural Hazards

Structural hazards occur when a certain resource (memory, a functional unit) is requested by more than one instruction at the same time.

Clock cycle →  1  2  3  4  5  6  7  8  9  10 11 12

Instr. i       F  D  C  F  E  W
Instr. i+1        F  D  C  F  E  W
Instr. i+2           F  D  C  F  E  W
Instr. i+3              .  F  D  C  F  E  W    (i+3 is stalled one cycle; . = stall)
Instr. i+4                    F  D  C  F  E  W

 

The first F fetches the instruction; the second F fetches the operand. Both need memory, so instruction i's operand fetch in cycle 4 collides with instruction i+3's instruction fetch.

In class the pipeline was described with three execute operations after decode; here the first two of those (operand-address calculation and operand fetch) are given specific names to make the flow easier to follow.

Penalty: 1 cycle
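The collision can be made concrete with a small Python sketch (an assumed model, not from the lecture): with a single-ported memory, the instruction fetch (offset 0 in F D C F E W) and the operand fetch (offset 3) of back-to-back instructions land on the same cycle.

```python
# Hypothetical structural-hazard detector: count, per cycle, how many
# instructions want the (assumed single-ported) memory. Stage offsets
# 0 and 3 are the instruction fetch and the operand fetch.

from collections import Counter

def memory_conflicts(start_cycles, mem_stages=(0, 3)):
    """Cycles in which more than one instruction needs memory."""
    use = Counter()
    for start in start_cycles:
        for s in mem_stages:
            use[start + s] += 1
    return sorted(c for c, n in use.items() if n > 1)

# five instructions issued back to back in cycles 1..5:
print(memory_conflicts([1, 2, 3, 4, 5]))  # [4, 5]
```

The first collision (cycle 4) is exactly the one the diagram resolves by stalling instruction i+3.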

 

Data Hazards

We have two instructions, I1 and I2, with I2 entering the pipeline right after I1.

If, in a certain stage of the pipeline, I2 needs a result produced by I1 but that result has not yet been generated, we have a data hazard; I2 is said to be dependent on I1, i.e. there is a data dependency.

I1: MUL R2,R3        R2 ← R2 * R3

I2: ADD R1,R2        R1 ← R1 + R2

 

Before executing its second F (operand fetch) stage, the ADD instruction is stalled until the MUL instruction has written its result into R2.

Penalty: 2 cycles
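A hazard of this kind can be detected by comparing register names; the sketch below uses an assumed encoding (each instruction as a (destination, sources) pair) and is illustrative only.

```python
# Read-after-write (RAW) check: the consumer reads a register that a
# still-unfinished earlier instruction writes.

def raw_hazard(producer, consumer):
    dest, _ = producer
    _, sources = consumer
    return dest in sources

i1 = ("R2", ("R2", "R3"))   # MUL R2,R3  ->  R2 = R2 * R3
i2 = ("R1", ("R1", "R2"))   # ADD R1,R2  ->  R1 = R1 + R2
print(raw_hazard(i1, i2))   # True: ADD must wait for MUL's result in R2
```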

 

Control Hazards

 

This type of hazard is caused by uncertainty about the execution path: will the branch be taken or not? It arises when we branch to a new location in the program, invalidating everything we have already loaded into the pipeline.

The pipeline stalls until the branch target is known.

Clock cycle →  1  2  3  4  5  6  7  8  9  10 11 12

BR TARGET      F  D  C  F  E  W
target                     F  D  C  F  E  W    (cycles 2, 3 and 4 are stalled)

The instruction at the branch target is not executed until the target address is known and the target instruction has been fetched.
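Under the assumptions of these notes (the target address becomes known only in the branch's E stage), the penalty can be computed directly; the sketch below is a hypothetical Python rendering, not the lecture's method.

```python
# Branch penalty: cycles between when the target *could* have been
# fetched (cycle 2, right after the branch's own F) and when it
# actually can be (the cycle the branch resolves).

def branch_penalty(stages=("F", "D", "C", "F", "E", "W"),
                   resolve_stage="E"):
    resolve_cycle = stages.index(resolve_stage) + 1  # 1-based cycle
    return resolve_cycle - 2

print(branch_penalty())  # 3, matching "cycles 2, 3 and 4 are stalled"
```

Resolving the branch earlier shrinks the penalty: if the target were known after the address-calculation stage C, only one cycle would be lost.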

 

Example of In-order and Reordered Execution

Z ← X + Y                        C ← A * B

Instr. 1>  R1 ← Mem(X)           Instr. 5>  R5 ← Mem(A)
Instr. 2>  R2 ← Mem(Y)           Instr. 6>  R6 ← Mem(B)
Instr. 3>  R3 ← R1 + R2          Instr. 7>  R7 ← R5 * R6
Instr. 4>  Mem(Z) ← R3           Instr. 8>  Mem(C) ← R7

 

 

Clock cycle →  1  2  3  4  5  6  7  8  9  10 11 12 13 14

Instr. 1       F  D  I  C  F  E  W
Instr. 2          F  D  I  C  F  E  W
Instr. 3             F  D  .  .  I  C  F  E  W
Instr. 4                F  .  .  .  .  D  I  C  F  E  W

(. = stall cycle)

               

 

Now, if the instructions execute in order, i.e. 1 2 3 4 5 6 7 8, there is stalling: instructions 3 and 4 wait on the loads of X and Y.

But if they are reordered as 1 2 5 6 3 4 7 8, the stalling can be reduced, because the independent loads of A and B fill the cycles that would otherwise be wasted waiting for R1 and R2.
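The claim can be checked with a toy in-order issue model (a sketch with an assumed 3-cycle producer-to-consumer latency; the numbers are illustrative, not from the lecture):

```python
# Count stall cycles for a given instruction order. Each instruction is
# (dest, sources); a consumer is assumed to need its operand LATENCY
# cycles after the producer issues.

LATENCY = 3

def stalls(order):
    ready = {}       # register -> cycle its value becomes available
    cycle = 0
    stalled = 0
    for dest, sources in order:
        start = cycle + 1
        need = max((ready.get(s, 0) for s in sources), default=0)
        if need > start:             # operand not ready yet: stall
            stalled += need - start
            start = need
        ready[dest] = start + LATENCY
        cycle = start
    return stalled

I = {
    1: ("R1", ()), 2: ("R2", ()),              # loads of X, Y
    3: ("R3", ("R1", "R2")), 4: ("MZ", ("R3",)),
    5: ("R5", ()), 6: ("R6", ()),              # loads of A, B
    7: ("R7", ("R5", "R6")), 8: ("MC", ("R7",)),
}
inorder   = [I[k] for k in (1, 2, 3, 4, 5, 6, 7, 8)]
reordered = [I[k] for k in (1, 2, 5, 6, 3, 4, 7, 8)]
print(stalls(inorder), stalls(reordered))  # 8 4: reordering halves the stalls
```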

 

 

Static pipelining:
>There is no reordering of instructions.
>If one clock cycle is empty for one instruction, it will also be empty for the following instructions.

Dynamic pipelining:
>There is reordering of instructions.
>Empty cycles need not stay empty: due to reordering, another instruction can be executed in them.

 

Mechanism for Instruction Pipelining:

 

Here the use of caching, collision avoidance, multiple functional units, register tagging, and internal forwarding is explained; these mechanisms smooth the pipeline flow and remove bottlenecks.

 

Prefetch Buffer:

In one access time, one block of memory is loaded into the prefetch buffer. The block access time can be reduced by using a cache or interleaved memory modules.

Types of Prefetch Buffers:

 

1. Sequential Buffers

2. Target Buffers

3. Loop Buffers

 

Sequential and Target Buffers:

Sequential instructions are loaded into a pair of sequential buffers for in-sequence pipelining. Instructions from a branch target are loaded into a pair of target buffers for out-of-sequence pipelining. Both buffers operate in FIFO fashion. A conditional branch (e.g. an if condition) causes both the sequential and the target buffers to fill with instructions. Once the branch condition is known, instructions are taken from the corresponding buffer and the instructions in the other buffer are discarded. Within each pair, one buffer can be used to load instructions from memory while the other feeds instructions to the pipeline.
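A minimal sketch of the buffer-pair selection, with assumed names and Python deques standing in for the hardware FIFOs:

```python
# On a conditional branch, fall-through instructions fill a sequential
# buffer and target-path instructions fill a target buffer; when the
# branch resolves, one buffer feeds the pipeline, the other is discarded.

from collections import deque

def resolve_branch(sequential_buf, target_buf, taken):
    keep = target_buf if taken else sequential_buf
    (sequential_buf if taken else target_buf).clear()  # discard the other
    return keep

seq = deque(["I5", "I6"])   # fall-through path
tgt = deque(["T1", "T2"])   # branch-target path
feed = resolve_branch(seq, tgt, taken=True)
print(list(feed))  # ['T1', 'T2']; the sequential buffer is now empty
```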

 

Loop Buffer:

This buffer stores the sequential instructions contained in a small loop. Loop buffers are maintained by the fetch stage of the pipeline. The prefetched instructions in the loop body are executed repeatedly until all iterations complete. It works in two steps:

 

1. It holds a pointer to the instruction just ahead of the current instruction. This saves instruction fetch time from memory.

2. It recognizes when the target of a branch falls within the loop boundary.
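Step 2 can be pictured as a simple lookup: if the branch target's address is already inside the buffer, the instruction is served from the buffer instead of memory. A sketch with assumed addresses:

```python
# Loop buffer as an address -> instruction map: a taken branch whose
# target is inside the buffer avoids a memory fetch.

def serve_branch(loop_buffer, target_addr):
    """Return the buffered instruction for target_addr, or None on a miss."""
    return loop_buffer.get(target_addr)

buf = {100: "ADD", 104: "CMP", 108: "BR 100"}  # small loop body
print(serve_branch(buf, 100), serve_branch(buf, 200))  # ADD None
```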

 

Multiple Functional units:

Sometimes a certain pipeline stage becomes a bottleneck; this corresponds to the stage with the maximum number of checkmarks in the reservation table. The problem is solved by using multiple copies of that stage, which leads to the use of multiple functional units.

To resolve data or resource dependences among successive instructions entering the pipeline, a reservation station is used with each functional unit. Operations wait in the station until their dependences are resolved. This removes the bottleneck in the pipeline.
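The effect of replicating a bottleneck stage can be sketched as follows (a toy model with an assumed 3-cycle multiplier, not the book's reservation-table algorithm):

```python
# With one multiplier, back-to-back multiplies serialize; with two
# copies, each operation grabs the first copy that is free.

def schedule(ops, n_units, unit_latency):
    free_at = [0] * n_units          # cycle each unit copy becomes free
    finish = []
    for issue_cycle, _ in enumerate(ops, start=1):
        u = min(range(n_units), key=lambda i: free_at[i])
        start = max(issue_cycle, free_at[u])
        free_at[u] = start + unit_latency
        finish.append(free_at[u])
    return finish[-1]                # cycle the last op completes

muls = ["MUL"] * 4
print(schedule(muls, n_units=1, unit_latency=3),
      schedule(muls, n_units=2, unit_latency=3))  # 13 8
```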

 

Internal Data Forwarding:

Why do it: the throughput of a pipelined processor can be improved with internal data forwarding between functional units. Moreover, some memory operations can be replaced by register-transfer operations.

 

Types:

 

1.Store Load forwarding

2.Load Load forwarding

3.Store Store forwarding

 

Store Load forwarding:

Here a load from memory into a register (LD R2, M) can be replaced by a move instruction (MOVE R2, R1) from register R1 to register R2, when R1 holds the value just stored to M, since register transfers are faster than memory accesses.
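Viewed as a peephole rewrite over an assumed three-field instruction tuple, store-load forwarding looks like this (illustrative sketch only):

```python
# A load from an address that the previous instruction just stored to is
# replaced by a register-to-register move.

def forward_store_load(code):
    out = []
    for op, dst, src in code:
        if (op == "LD" and out
                and out[-1][0] == "ST" and out[-1][1] == src):
            out.append(("MOVE", dst, out[-1][2]))  # take value from register
        else:
            out.append((op, dst, src))
    return out

prog = [("ST", "M1", "R1"),   # Mem[M1] <- R1
        ("LD", "R2", "M1")]   # R2 <- Mem[M1]
print(forward_store_load(prog))  # the LD becomes ('MOVE', 'R2', 'R1')
```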

 

 

By Siddharth Saluja and Varun Jain

 

 

 

 

 

 

1 comment:

  1. An overview of what is done in class is good - I liked the summary. But also field queries on, and offer info about, what happens in real processors (say the Pentium or SPARC family):
    Do they allow load-store forwarding? Out-of-order instruction scheduling? What causes structural dependencies? How deep is the instruction pipeline in a real processor (how many stages)? How many pipelines are there? Is there a loop buffer? A branch target buffer? Etc.
