Tuesday 25 February 2014

Insights from the 25 February class lecture

Parity calculation: suppose there are 8 bits. If the bits are combined pairwise (a binary-tree reduction), seven processors are required (4 + 2 + 1 XOR operations).

Therefore, in this case:

·         PRAM model: time taken is log(s)

·         Serial processor: time taken is s

Here s is the size of the problem (the number of data items; in the parity example, the number of bits).
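The tree reduction above can be sketched in a few lines of Python (an assumed illustration, not from the lecture): with s = 8 bits, the pairwise XOR halves the data each round, giving log2(8) = 3 rounds and 4 + 2 + 1 = 7 XOR operations, one per PRAM processor.

```python
def parity_tree(bits):
    """Pairwise (tree) XOR reduction: each round halves the list."""
    rounds, ops = 0, 0
    while len(bits) > 1:
        ops += len(bits) // 2                  # XORs done this round
        bits = [bits[i] ^ bits[i + 1] for i in range(0, len(bits), 2)]
        rounds += 1
    return bits[0], rounds, ops

par, rounds, ops = parity_tree([1, 0, 1, 1, 0, 0, 1, 0])   # s = 8 bits
print(par, rounds, ops)   # 0 3 7 -> log2(8) rounds, 7 XOR "processors"
```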


Hence speedup in PRAM = s/log(s)

Now let us consider a realistic situation, i.e. a linear array of n processors.

 

Computation time = s/n

 

Collecting all the partial results along the array takes O(n) time, therefore total time = s/n + n

Differentiating the above expression with respect to n and setting the derivative to zero, we get: n = sqrt(s)

Now substituting this value of n, total time = 2 sqrt(s)
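A quick numerical check of the optimisation step (an assumed example, with s = 256 chosen for illustration): a brute-force search over n agrees with the calculus result n = sqrt(s), and the minimum total time is indeed 2*sqrt(s).

```python
import math

def total_time(s, n):
    return s / n + n          # computation s/n plus O(n) result collection

s = 256
n_star = math.isqrt(s)        # calculus optimum: n = sqrt(s) = 16
n_best = min(range(1, s + 1), key=lambda n: total_time(s, n))
print(n_star, n_best, total_time(s, n_star))   # 16 16 32.0 (= 2*sqrt(s))
```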

 

Scalability = speedup of the realistic machine / speedup of the PRAM model = (sqrt(s)/2) / (s/log(s)) = log(s) / (2 sqrt(s)) < 1 ; ideally it should be = 1
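Evaluating this ratio for a few problem sizes (assumed values, base-2 logarithm) shows that it is always below 1 and shrinks as s grows, i.e. the linear array falls further behind the PRAM on larger problems:

```python
import math

def scalability(s):
    speedup_pram = s / math.log2(s)            # PRAM: s / log s
    speedup_array = s / (2 * math.sqrt(s))     # linear array: sqrt(s) / 2
    return speedup_array / speedup_pram        # = log2(s) / (2 * sqrt(s))

for s in (64, 1024, 2 ** 20):
    print(s, scalability(s))                   # all values < 1, decreasing
```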

Along similar lines we can calculate the scalability of a hypercube, a mesh, etc.

If the expression for scalability contains log(n), this indicates good scalability.

 

Decreasing order of scalability:

Hypercube > Mesh > Linear array

  • The hypercube has the greatest scalability in this case due to its compact (small-diameter) structure.
  • The mesh has lower scalability because its border nodes have loose ends (no wrap-around links).

 

 

PIPELINING AND THE OVERHEADS INVOLVED

In pipelining, the overheads do not arise from the network or from inter-process communication (IPC); they come from the pipeline itself.

Speedup = kn / [k + (n-1)]

For efficiency, divide this expression by k, the ideal speedup of a k-stage pipeline; the ideal is not reached in practice partly because of the additional latches added to the chip between stages.

Therefore efficiency = n / (k + n - 1)

"Pipelining increases the CPU instruction throughput - the number of instructions completed per unit of time. But it does not reduce the execution time of an individual instruction. In fact, it usually slightly increases the execution time of each instruction due to overhead in the pipeline control. The increase in instruction throughput means that a program runs faster and has lower total execution time."

Refer http://www.cs.iastate.edu/~prabhu/Tutorial/PIPELINE/perform.html

Throughput = [n / (k + n - 1)] x f

 

Maximum throughput (approaching f) is achieved only when instructions keep streaming into the pipeline one per cycle, without interruption.

Here    k = number of pipeline stages    , f = clock frequency (operations per unit time)
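The three formulas above can be tried out together (a sketch with assumed numbers: a 5-stage pipeline, frequency normalised to f = 1). As n grows, the speedup approaches k, the efficiency approaches 1, and the throughput approaches f:

```python
def speedup(k, n):
    return k * n / (k + n - 1)

def efficiency(k, n):
    return n / (k + n - 1)        # = speedup / k

def throughput(k, n, f):
    return efficiency(k, n) * f   # tends to f as n grows

k, f = 5, 1.0                     # assumed: 5-stage pipeline, unit clock
for n in (5, 50, 5000):
    print(n, speedup(k, n), throughput(k, n, f))
```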


Overheads in instruction pipelining

  • Stalling
  • Branching

 

Branching

As soon as a branch occurs, all the instructions already in the pipeline are flushed and the pipeline must fill again; hence the first instruction after the branch again takes k steps. This is the case for an unconditional branch.

 

Let:

P = probability that an instruction is a branch

Q = probability that the branch is taken (successful)

b = number of stages lost per taken branch; at worst b = k-1 (the whole pipeline behind the branch is flushed)

  

Now the speedup, considering the overhead due to branching:

Speedup = kn / (k + n - 1 + PQnb)

 

Asymptotic throughput = f / (1 + PQb), with b = k-1 at worst (therefore the longer the pipeline, the more the branch penalty, the lower the throughput)
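Plugging in some assumed numbers (P = 0.2, i.e. 20% of instructions are branches; Q = 0.6 taken; worst-case penalty b = k-1; f normalised to 1) shows the claimed trend: deeper pipelines lose more asymptotic throughput to branches.

```python
def asymptotic_throughput(f, p, q, b):
    return f / (1 + p * q * b)       # penalty term P*Q*b per instruction

p, q = 0.2, 0.6                      # assumed branch statistics
for k in (4, 8, 16):
    b = k - 1                        # worst case: refill k-1 stages
    print(k, asymptotic_throughput(1.0, p, q, b))
```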

 

FOR CONDITIONAL BRANCHING, A MECHANISM CALLED BRANCH PREDICTION IS USED.

  • STATIC PREDICTION : assume that every branch will be taken. Its advantage is simplicity and a fairly small number of wrong predictions.
  • Profile-based prediction : profile the direction of the branches beforehand by running (simulating) the program on sample inputs.
  • Dynamic prediction : a real-time system depends on inputs from the environment around it, so there can be no fixed prediction, as the branch direction keeps changing. For this the Branch Target Buffer (BTB) comes into play: it records the history of every branch, so the next outcome can be predicted with the help of a state diagram. Refer to the following link: http://www-ee.eng.hawaii.edu/~tep/EE461/Notes/ILP/buffer.html
  • Speculation is another way: the system follows the predicted path and executes its instructions. Instructions from the wrong path are poisoned (squashed) and are not committed at write-back.
  • Compiler-initiated (delayed branching) : whenever a branch instruction appears, the compiler inserts delay slots, i.e. instructions (or NOPs) that do not depend on the branch, so the pipelined instructions need not be flushed.
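The per-branch state diagram mentioned for the Branch Target Buffer is commonly a 2-bit saturating counter; here is a small sketch (an assumed illustration, with states 0-1 predicting not-taken and 2-3 predicting taken). For a loop branch that is taken 8 times, exits once, and is taken 8 more times, only the single exit mispredicts:

```python
def predict(state):                # states 0-1: not taken, 2-3: taken
    return state >= 2

def update(state, taken):          # 2-bit saturating counter
    return min(state + 1, 3) if taken else max(state - 1, 0)

state, hits = 3, 0                 # start in "strongly taken"
history = [True] * 8 + [False] + [True] * 8    # loop branch, one exit
for taken in history:
    hits += predict(state) == taken
    state = update(state, taken)
print(hits, len(history))          # 16 17: only the loop exit mispredicts
```

The key property shown: one not-taken outcome only weakens the counter (3 -> 2), so the very next taken branch is still predicted correctly.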

 

 

SUPERSCALAR PIPELINED PROCESSOR : multiple pipeline units in a single processor, as in today's laptops.

 

To add temporal parallelism to the superscalar pipelined processor, increase the number of stages; it then becomes a SUPER-PIPELINED processor.

Refer to Wikipedia for more insights:

   http://en.wikipedia.org/wiki/Superscalar

 

 

 BY TINA BISWAS  & SURAJ GUPTA

1 comment:

  1. "Size of instruction=s"

    It is size of the problem (no of data items, in the parity problem it is number of bits)

    ReplyDelete