Wednesday 22 January 2014

PIPELINING

PIPELINING     

 

A pipeline is a set of data processing elements connected in series, where the output of one element is the input of the next one. The elements of a pipeline are often executed in parallel or in time-sliced fashion ,in that case, some amount of buffer storage is often inserted between elements.

Example of pipeling :

Consider the assembly of a car: assume that certain steps in the assembly line are to install the engine, install the hood, and install the wheels (in that order, with arbitrary interstitial steps). A car on the assembly line can have only one of the three steps done at once. After the car has its engine installed, it moves on to having its hood installed, leaving the engine installation facilities available for the next car. The first car then moves on to wheel installation, the second car to hood installation, and a third car begins to have its engine installed. If engine installation takes 20 minutes, hood installation takes 5 minutes, and wheel installation takes 10 minutes, then finishing all three cars when only one car can be assembled at once would take 105 minutes. On the other hand, using the assembly line, the total time to complete all three is 75 minutes. At this point, additional cars will come off the assembly line at 20 minute increments.

There are two types of pipelining :-

1.   Linear pipelining

  In this, processing stages are placed in cascading manner  ,such that each stage is executed one after the other  i.e   a stage is executed only when previous stages are being executed , to perform a fixed function over a stream of data.

They are static pipelines because they are used to nperform fixed functions.

Linear pipelines are applied for instruction execution, arithmeticn computation and memory access operations.

 

It is further divided into two categories :

 

è Synchronous linear pipeline

In this, every stage transfers data to the next stage at same point of time(i.e at same clock pulse) using latches and flip-flops.The pipelining stages are combinational logic circuits . so, it is desired to have approximately equal delay in every stage.

è Asynchronous linear pipeline

In this ,it is not necessary to have transfer of data in every stage, at the same point of time.Data flow between adjacent stages is controlled by handshaking protocol.

Used in MPI (message passing interface).

 

Reservation table  :

The reservation table mainly displays the time space flow of data through the pipeline for a function. Different functions in a reservation table follow different paths.

The number of columns in a reservation table specifies the evaluation time of a given function.

Determination of clock cycle 'Ƭ' of a pipeline :

 

         Ƭ = max (Ƭi)ᴷ₁+d = Ƭm + d

 OR  Ƭ = m  if (Ƭm >> d)

Where, Ƭm= maximum stage delay

Ƭi =time delay of stage Si

d = time delay of a latch

        Pipeline frequency, f = 1⁄ Ƭ

Suppose if we are not using pipeling , we have 'n' number of tasks and 'k' number of stages

Then, required clock cycles = nk

Total time required, T₁ = nk * Ƭ

By using pipelining,

Required clock cycles = k+(n-1)

Total time required, Tk = [k +(n-1)]* Ƭ

Speed up factor, Sk = Ƭ₁ / Ƭk

                                  =nk /[k+(n-1)] 

Efficiency, Ek = Sk/k = nk/k[k+(n-1)]

                       = n/k+(n-1)

Throughput, Hk    = number of tasks / total time

                               = n/[k+(n-1)]Ƭ

The larger the number 'k' of subdivided pipelined stages,the higher the potential speedup performance.

'k' cannot increase indefinitely due to practical constraints on cost,control complexity and circuitary.

  

   PCR(performance to cost ratio)= f/(C+kh)

                                                                    (where C =cost of all logic states, h=cost of each latch,k=no of stages)

 Using the PCR , the optimal number of stages can be found out.

 

      No of optimal stages , =  sqrt(t*C/(d*h))

                                              (where t = total flow through delay, d=latch delay)

                              

 

 

 

 

 

2.Non linear/Dynamic Pipelining

  A non-linear pipeline (also called dynamic pipeline) can be configured to perform various functions at different times. In a dynamic pipeline there is also feed forward or feedback connection.

 

 

  Reservation Table/State Time diagram

  The reservation table in non linear pipelining are intersterting as they don't follow a linear pattern. It dsplay time-space flow of data for evaluation of one or more functions. The checkmarks in each row correspond to time instants that particular stage will be used.

There may be multiple checkmarks in a row indicating repeated usage of same stage in different cycles.

                                        

STATE-TIME DIAGRAM

 

Let the sequence of stages in a task be S1->S2->S3->S2->S3->S1->S3->S1, then its state-time diagram would be:

 

 



1

2

3

4

5

6

7

8

S1

 -------







 

 -------



 -------

S2



 -------



 -------







S3





 -------



 -------



 -------





------ INDICATES THE ACTIVE STAGE IN A GIVEN CLOCK CYCLE

 

 

 

If two or more processes attempt to use same pipeline stage at the same time, Collision/Resource Conflict occurs.

To resolve the collisions,some scheduling is done.

 

Latency

Latency is defined as no of clock cycles between two initiations of a pipeline or the no of clock cycles after which the next task needs to be started.

Latencies that causes collisions are called forbidden latencies.

Forbidden latencies are detected by checking the distance between any 2 checkmarks in the same row of the reservation table.

 

For eg.

In the above table the forbidden latencies are 2,4,5,7.

 

Let m be the max forbidden latency

      p be permissible latency values (values where collisions don't occur)

&  n be the total number of clock pulses /column number in reservation table ,

Then

                m<=n-1

and        1<=p<=m-1.

 

Collision vector

A collision vector is a m-bit binary vector C . The value of Ci=1 if i causes a collision and 0 if latency I is permissible.

 

For example in the above reservation table the forbidden latency are 2,4,5,7.

So the collision vector will be 1011010.

 

Latency Squence

Latency sequence is a sequence of permissible non forbidden latencies between successive task initiations.

 

Latency cycle

Latency cycle is a latency sequence which repeats the same subsequence indefinitely.

For example , one of the latency cycle in the above example could be 2,5,2,5,2,5,…,

This implies successive intiaitions of new task  are by 2 and 5 cycles alternately.

 

Average latency

Avg latency is obtained by dividing the sum of all latencies by no of latencies in the cycle.

The avg latency is cycle 2,5,2,5,2,5.. would be (2+5)/2 = 3.5.

 

Constant cycle

Constant cycle is a latency cycle which contains only one latency value.

 

State Diagrams

State diagrams specify the permissible state transitions among successive initiations.

The next state is obtained by ORing the initial state and the latency time right shifted state.

 

 

 

NOTE :

 

Supercomputers in India

 

According to the latest news India's PARAM is listed among world's most power efficient supercomputer

 

Development of Advanced Computing (C-DAC) said its super computer--PARAM Yuva II was ranked on the first place inIndia in the Green500 List of Supercomputers in the World. PARAM YUVA II has been ranked number 9 in the Asia PacificRegion and stands at the 44th place in the world among the most power efficient systems as per the list that was announced on November 20, at the Supercomputing Conference (SC'2013) in Denver, Colorado, USA.

 

C-DAC is the 2nd organisation in the world to have carried the level-3 measurement of power as compared to performance for Green500 list, which is an indication of the most rigorous level of measurement exercise performed for such ranking.

 

Information about Supercomputers In India

 

Top Supercomputers-India is an effort to list the most powerful supercomputers in India. It is supported by Supercomputer Education and Research Centre, Indian Institute of Science , Bangalore. 

The project is meant to create and promote healthy competition among the supercomputing initiatives in India and can substantially lead to significant supercomputing advancement in the nation.
It list the top 500 supercomputer in india with regular updations.
Below is the link

http://topsupercomputers-india.iisc.ernet.in/

 

 



By Shefali Singh and Upasana Mehta

4 comments:

  1. Interesting compilation on India's supercomputing power.
    At one point of time, India's EKA (CRL Lab, Tata) recorded 4th in terms of performance too! Sanely, India's focus is now on minimizing power consumption.

    ReplyDelete
  2. http://www.green500.org/lists/green201311&green500from=1&green500to=100

    this is the current list of top 100 green computers...with a downloadable file alongside giving top 500 green computers....india ranks 44 with PARAM YUVA II having 30056 cores :-O


    further,i was also able to get top 500 fastest supercomputers list as of nov 2013

    http://www.top500.org/lists/2013/11/

    one can find it on this site in the downloads tab at the right side...highest ranked indian supercomputer is again at 44 with iDataPlex DX360M4, Xeon E5-2670 8C 2.600GHz, Infiniband FDR made by IBM present at Indian institute of tropical meteorology with 38016 cores :-O

    P.S:one can check various features which unable the construction of these ranks..in the download lists provided...Also, i have asked the mod's to add these files in a post so that i can be viewed by everyone..thanks a lot.:)

    ReplyDelete
    Replies
    1. I was shocked to see the number of cores used...if i am right the laptops we use have 4 cores...the total number of cores in the fastest computer (Tianhe-2) is about 3,120,000...thats 7,80,000 times the number we have in our laptop...it sure cant be this much...i may have interpreted the data wrongly...if anyone can clarify me on this???

      Delete
  3. Just one clarification - laptops have a single processor embedded with a few cores. In modern supercomputers, Intel processors really rule the roost - they are constructed in a scalable manner using several such processors (cores are processor circuits within a chip). So these cores are on different processors, each having 7 to 9 cores.

    So what is the architecture of modern supercomputers?
    Typically, a hierarchy of MIMD clusters (cluster within a cluster within a cluster), the lowest level a UMA system, the individual processors themselves being UMA with SMP, and each core is a small-vector SIMD system (Intel Netburst architecture) containing deep instruction instruction pipelines. A thorough mix of various flavors!

    India's PARAM series started very uniquely though - later....

    ReplyDelete