COE United Parallel Processing Discussions: Superpipeline and Superscalar

Tuesday, 11 March 2014

Superpipeline and Superscalar

Summary of class on 10/03/2014

Super Scalar and Super Pipeline

In a superpipeline we increase the number of stages in the pipeline. These processors are based on the fact that many pipeline stages need less than half a clock cycle. The speed of a pipeline is limited by its slowest stage.

Suppose we increase the number of stages by a factor of m, from k to mk.

T(for single pipeline with k stages) = [k+(n-1)]*τ

where n = number of instructions and τ = clock period of the k-stage pipeline

T(for mk stages) = (mk+(n-1))*τ/m

SpeedUp= [(k+(n-1))*τ]/[(mk+(n-1))*τ/m]

=[mk+m(n-1)]/[mk+(n-1)]

Asymptotic speedup (n->∞) =m

Here the clock period is reduced to (1/m) of the original clock period.
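The formulas above can be checked numerically. This is only a minimal sketch, assuming an ideal pipeline with no stalls; the function names and the sample values of k, n and m are made up for illustration.

```python
def pipeline_time(k, n, tau=1.0):
    """Time for n instructions on a k-stage pipeline with clock period tau."""
    return (k + (n - 1)) * tau


def superpipeline_speedup(k, n, m, tau=1.0):
    """Speedup of an mk-stage pipeline clocked at tau/m over the k-stage base."""
    base = pipeline_time(k, n, tau)
    deep = pipeline_time(m * k, n, tau / m)
    return base / deep


# Illustrative values (assumed): 4 base stages, each split into m = 3 sub-stages.
for n in (1, 10, 100, 10_000):
    print(n, round(superpipeline_speedup(4, n, 3), 3))
```

For a single instruction there is no gain at all, and the speedup only approaches m as n grows.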

For example, the stages of a carry-lookahead adder can be separated by latches.

Super Scalar Processors :

In superscalar processors we increase the number of pipelines. We can either have a wider instruction stream or a greater number of instruction streams. Superscalar processors can execute more than one instruction per clock cycle.

A prefetch buffer provides the instructions to the different pipelines.

SpeedUp =[(k+n-1)*τ]/[(k+(n-m)/m)*τ]

=[m(k+n-1)]/[mk+n-m]

Asymptotic speedup = m
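Here is a small sketch of the superscalar formula, again assuming ideal conditions: no dependencies, and n taken as a multiple of m so that (n-m)/m is a whole number of cycles. The function names and sample values are assumptions for illustration.

```python
def superscalar_time(k, n, m, tau=1.0):
    """Time for n instructions on m parallel k-stage pipelines.

    The first m instructions complete after k cycles; the remaining
    n - m instructions then complete m per cycle.
    """
    return (k + (n - m) / m) * tau


def superscalar_speedup(k, n, m):
    """Speedup over a single k-stage pipeline running the same n instructions."""
    base = k + n - 1  # in units of tau
    return base / superscalar_time(k, n, m)


# Illustrative values (assumed): k = 4 stages, m = 3 pipelines.
for n in (3, 12, 120, 12_000):
    print(n, round(superscalar_speedup(4, n, 3), 3))
```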

In practice, however, dependencies between the instructions in the pipelines reduce the speedup that can actually be achieved.
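To see how dependencies eat into the speedup, here is a toy model, not a description of any real processor. It assumes in-order issue, counts only issue cycles (pipeline fill time is ignored), and stops filling an issue group as soon as an instruction reads or overwrites a register written earlier in the same group. The instruction format (destination, sources) and the register names are hypothetical.

```python
def issue_cycles(program, width):
    """Count issue cycles for an in-order machine issuing up to `width`
    instructions per cycle. Each instruction is (dest, sources)."""
    cycles = 0
    i = 0
    while i < len(program):
        written = set()  # registers written by this cycle's issue group
        issued = 0
        while i < len(program) and issued < width:
            dest, sources = program[i]
            if written & (set(sources) | {dest}):
                break  # RAW or WAW hazard inside the group: wait a cycle
            written.add(dest)
            issued += 1
            i += 1
        cycles += 1
    return cycles


independent = [("r1", ("r0",)), ("r2", ("r0",)), ("r3", ("r0",)), ("r4", ("r0",))]
chain = [("r1", ("r0",)), ("r2", ("r1",)), ("r3", ("r2",)), ("r4", ("r3",))]

print(issue_cycles(independent, 2))  # 2 cycles: both pipelines stay busy
print(issue_cycles(chain, 2))        # 4 cycles: no better than one pipeline
```

With the fully dependent chain the second pipeline is never used, so the two-way machine gains nothing over a scalar one.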

The Intel Pentium (P5) was the first superscalar x86 microprocessor. It has two integer pipelines: one (the U pipe) that can execute any instruction and another (the V pipe) that handles only simple instructions.

Superscalar vs Superpipeline :

Image reference www.uwgb.edu/breznayp/cs353/slides/ch_14.ppt

Superscalar Superpipeline :

Modern day processors are both superscalar and superpipelined.

The PowerPC G5 processor in the Power Mac G5 has a superscalar, superpipelined execution core that can handle up to 215 in-flight instructions, and uses a 128-bit, 162-instruction SIMD unit.

SpeedUp =SU(superscalar)*SU(superpipeline)
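Taking the product rule above at face value, the combined speedup can be sketched by multiplying the two closed-form expressions from the earlier sections. The parameter names m_pipe (superpipeline degree) and m_scalar (number of pipelines) are introduced here only to keep the two m's apart, and the model is still the ideal, dependency-free one.

```python
def superpipeline_speedup(k, n, m):
    """[mk + m(n-1)] / [mk + (n-1)] from the superpipeline section."""
    return (m * k + m * (n - 1)) / (m * k + (n - 1))


def superscalar_speedup(k, n, m):
    """[m(k+n-1)] / [mk + n - m] from the superscalar section."""
    return m * (k + n - 1) / (m * k + n - m)


def combined_speedup(k, n, m_pipe, m_scalar):
    """Product of the two speedups, as in the rule stated in the post."""
    return superscalar_speedup(k, n, m_scalar) * superpipeline_speedup(k, n, m_pipe)


# Illustrative values (assumed): k = 4, superpipeline degree 3, two pipelines.
for n in (10, 1_000, 1_000_000):
    print(n, round(combined_speedup(4, n, 3, 2), 3))
```

For large n the product tends to m_pipe * m_scalar, which is 6 for the values used here.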

To improve the overall performance of the processor, we apply the following techniques:

->Hyper Threading

->Branch Target Predictor

->Speculation Unit

->Distributed Shared memory

Sun and Ni's law :

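It is a generalization of Amdahl's and Gustafson's laws.

SpeedUp = [w1+G(n)wn]/[w1+G(n)wn/n]

where w1 = sequential part of the workload, wn = parallel part of the workload, n = number of processors

G(n)=n^x

For Amdahl's law x=0, so G(n)=1

For Gustafson's law x=1, so G(n)=n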
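The two special cases can be checked with a short sketch. It uses the standard forms G(n) = 1 for Amdahl's law and G(n) = n for Gustafson's law; the names w_seq and w_par stand for the sequential and parallel parts of the workload in the formula, and the sample workload split is an assumption.

```python
def sun_ni_speedup(w_seq, w_par, n, g):
    """Sun and Ni's speedup for n processors with workload scaling g = G(n)."""
    return (w_seq + g * w_par) / (w_seq + g * w_par / n)


def amdahl_speedup(w_seq, w_par, n):
    """Fixed workload: G(n) = 1."""
    return sun_ni_speedup(w_seq, w_par, n, 1.0)


def gustafson_speedup(w_seq, w_par, n):
    """Workload scaled with the number of processors: G(n) = n."""
    return sun_ni_speedup(w_seq, w_par, n, float(n))


# Illustrative split (assumed): 20% sequential, 80% parallel, 4 processors.
print(amdahl_speedup(0.2, 0.8, 4))
print(gustafson_speedup(0.2, 0.8, 4))
```

With the work normalised so that w_seq + w_par = 1, the first case reduces to 1/(w_seq + w_par/n) and the second to w_seq + n*w_par.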

By Ravi Kumar Nimmi and Rishav Pipal

Comments:

Unknown, 11 March 2014 at 10:31
Correction
T(for mk stages) = (mk+(n-1))*τ/m

There is some problem with the blog: the links are not working. So either add this blog to your reading list on blogger.com or copy and paste the following link:
http://coeunitedparallelprocessing.blogspot.in/2014/03/superpipeline-and-superscalar.html
Shampa chakraverty, 11 March 2014 at 11:02
Good summary. One correction is needed in this sentence: "These processors are based on the fact that many pipeline stages need less than half a clock cycle."
Why half?
It is simply that when the number of stages is increased by splitting each stage into two or more sub-stages, the clock period reduces because each sub-stage now does simpler work with less delay.
Unknown, 11 March 2014 at 21:34
In a superscalar pipeline we have to ensure that the instructions issued together are independent. Because of this, its degree is restricted to 2-4 pipelines.
Shampa chakraverty, 12 March 2014 at 22:55
Besides, branch mispredictions have a far reaching effect on superscalar pipeline performance - a single misprediction may cause all pipelines to flush.
Unknown, 12 March 2014 at 00:26
There is one correction required in the post. We have

G(n) = n^x (n to the power x)
Then

For Amdahl's law: x=0
For Gustafson's law: x=1