COE United Parallel Processing Discussions: Summary for class on 21st April

Minimum Average Latency

The term Minimum Average Latency (henceforth referred to as MAL) comes into the picture when we consider non-linear pipelines.

In non-linear pipelines, as we know, the stages can be used in different orders or even multiple stages can be used in the same cycle. So, it becomes difficult to determine when to initiate each successive task in order to obtain maximum performance without conflict.

Reservation Tables

Here we see a sample reservation table for a 3-stage pipeline and an 8-cycle task. The 'X' marks represent the corresponding pipeline stages used in different cycles by the task. For example: The task uses Stage S₃ of the pipeline in the 3^rd cycle.

Latency, Forbidden Latencies & Collision Vectors

Now, we must wait for some cycles before we can initiate another task on the pipeline without conflict. The no. of cycles between initiations is called latency.

Now, some latencies are forbidden because they lead to collision, i.e. a simultaneous need for the 'same stage' by two different tasks. These are called Forbidden Latencies.

For example: for the above reservation table, if we initiate another task after two cycles, then it'll lead to collision [ the stage S₂will then be used by the cycle 4 of task 1 & cycle 2 of task 2 ] and hence the initiation isn't possible, thus '2' is a forbidden latency.

There is a systematic way to represent all forbidden latencies for a reservation table, called the Collision Vector.

Collision Vector is a n-bit vector, where n=no. of cycles needed for the task-1

C= (C_nC_n-1C_n-2…… C₁), C is the collision vector

Now, C_i=1, if the latency 'i' causes collision & C_i=0, if the latency 'i' is permissible

The collision vector for the above reservation table is 1011010.

State Transitions & Latency Cycles

State diagrams can be constructed to specify the permissible transitions among successive initiations of tasks.

[ Here 'state' means that, the Collision Vector gives us info. about the pipeline that which latencies are allowed and which are not, for task initiation; this info. is being referred to here as 'state'. So each CV is regarded as the state of the pipeline. ]

The next state of the pipeline at time t+p can be obtained as follows-

Right-shift the present Collision Vector for the reservation table by p bits and then logically OR the obtained Collision Vector with the original Collision Vector.

For example: We want to find the state of the pipeline after 1 latency (corresponding to the above Collision Vector and Reservation Table)

Original Collision Vector(CV): 1011010

1-bit right shifted CV: 0101101

New CV (corresponding to the new state): 1011010 | 0101101 = 1111111

Latency Cycles are the set of latencies which upon successively applying on the original state (CV) of the pipeline results in a constant Collision Vector.

For example: for the above Reservation Table

Original state: 1011010

State-1 after a 1-bit shift on original state: 1111111

State-2 after an 8-bit shift on State-1: 1011010

Now, here we see that we get the original state back after operating 2 latencies successively: 1 & 8. Now if we again apply 1 & 8, then we will keep getting the same state. Thus 1,8 is called a latency cycle.

ß Other Latency cycles for the above reservation table.

Average Latency & Minimum Average Latency

Average latencies are defined for the different latency cycles for a reservation table. And the minimum of the average latencies give us the Minimum Average Latency (MAL).

For example:

For the latency cycle 1,8 for the above reservation table, the

Average Latency = Sum of latencies in the cycle / No. of latencies in the cycle.

Thus, average latency for cycle 1,8 = (1+8)/2 = 4.5

Bounds on Minimum Average Latency (MAL)

Lower Bound: Maximum no. of checkmarks in a row in the original reservation table

Upper Bound: No. of 1's in the original Collision Vector + 1

Pipeline Optimization Using Delays

To further optimize the pipeline usage even after finding MAL, we can use the concept of introducing 'non-compute delays' in the pipeline. Using these delays, we can change the reservation table of the pipeline so that collisions that were occurring earlier do not happen.

I'll try to explain it through an example:

Suppose we have a 3-stage pipeline with 5 cycles and the following reservation table:

Original CV: 1011 and Minimum Average Latency: 3

Now, let's say we introduce a delay of one clock cycle after the 3^rd cycle and the 5^th cycle, then we get the following reservation table:

Now the new CV: 100010

For the above reservation table the MAL comes out to be: 2 (for the latency cycle 1,3)

Thus, here we see that the introduction of a delay in the pipeline results in even more optimization due to the reduction of MAL.

References:

Advanced Computer Architecture by Kai Hwang

#No direct copying of content done from any sources from the internet or otherwise.

Links For Further Study:

http://iris.nyit.edu/~mcolef/eeng633/extras/Pipelining&Superscalar_Techniques_ch6.ppt

http://vlsi.catholic.ac.kr/pds/data/class/advcom/adcom4.pdf

Compiled By:

Vipul Gaur 368/CO/11

Vijay 364/CO/11

COE United Parallel Processing Discussions

Thursday, 24 April 2014

Summary for class on 21st April

2 comments: