Tuesday 11 March 2014

Superpipeline and Superscalar




Summary of class on 10/03/2014

Superscalar and Superpipeline


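Superpipeline Processors:

In a superpipeline we increase the number of stages in the pipeline. These processors exploit the fact that many pipeline stages perform work that needs less than half a clock cycle, so such a stage can be split into two sub-stages and the pipeline clocked twice as fast. The speed of a pipeline is set by its slowest stage, so splitting every stage into shorter sub-stages is what allows the clock period to shrink.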


If we split each of the k stages into m sub-stages, i.e. increase the number of stages by a factor of m:

T(for single pipeline with k stages) =  [k+(n-1)]*τ

                            where n=no. of instructions

T(for mk stages) = [mk+(n-1)]*τ/m


SpeedUp= [(k+(n-1))*τ]/[(mk+(n-1))*τ/m]

         =[mk+m(n-1)]/[mk+(n-1)]

Asymptotic speedup (n->∞) =m


Here the clock period is reduced to (1/m) of the original clock period.

For example, the stages of a carry-lookahead adder can be separated by latches, turning the adder itself into a pipeline.
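The timing formulas above can be checked numerically. The following Python sketch (the function names are illustrative, not from the lecture) computes the time for n instructions on an ideal k-stage pipeline with clock period τ and on the superpipelined version with mk stages and clock period τ/m, then compares their ratio with the closed-form speedup. It assumes no stalls of any kind.

```python
def pipeline_time(k, n, tau):
    """Time for n instructions on an ideal k-stage pipeline, clock period tau."""
    return (k + (n - 1)) * tau


def superpipeline_time(k, n, tau, m):
    """Each stage split into m sub-stages: mk stages, clock period tau / m."""
    return (m * k + (n - 1)) * tau / m


def superpipeline_speedup(k, n, m):
    """Closed form [mk + m(n-1)] / [mk + (n-1)]; tau cancels out."""
    return (m * k + m * (n - 1)) / (m * k + (n - 1))


if __name__ == "__main__":
    k, m, tau = 4, 2, 1.0  # example values, chosen arbitrarily
    for n in (1, 10, 100, 10_000):
        measured = pipeline_time(k, n, tau) / superpipeline_time(k, n, tau, m)
        print(f"n={n:>6}  ratio={measured:.4f}  "
              f"formula={superpipeline_speedup(k, n, m):.4f}")
```

The two columns agree, and as n grows the speedup approaches m = 2. For n = 1 there is no gain: a single instruction still has to traverse the whole pipeline, so its latency kτ is unchanged.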



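Superscalar Processors:

In superscalar processors we increase the number of pipelines. This can take the form of a wider instruction stream or a greater number of instruction streams. Superscalar processors can issue and execute more than one instruction per clock cycle.

A prefetch buffer supplies instructions to the different pipelines. With m pipelines of k stages each, the first m instructions complete after k cycles and the remaining n-m instructions then complete m per cycle, so T = [k+(n-m)/m]*τ.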

SpeedUp =[(k+n-1)*τ]/[(k+(n-m)/m)*τ]

         =[m(k+n-1)]/[mk+n-m]

Asymptotic speedup (n->∞) = m
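A similar Python sketch for a superscalar processor of degree m, under the same ideal assumptions (no dependencies, n ≥ m); the function names are again illustrative.

```python
def pipeline_time(k, n, tau):
    """Time for n instructions on a single ideal k-stage pipeline."""
    return (k + (n - 1)) * tau


def superscalar_time(k, n, tau, m):
    """m parallel k-stage pipelines: the first m instructions finish after
    k cycles, then the remaining n - m finish m per cycle."""
    return (k + (n - m) / m) * tau


def superscalar_speedup(k, n, m):
    """Closed form m(k + n - 1) / (mk + n - m)."""
    return m * (k + n - 1) / (m * k + n - m)


if __name__ == "__main__":
    k, m, tau = 4, 2, 1.0  # example values, chosen arbitrarily
    for n in (2, 10, 100, 10_000):
        measured = pipeline_time(k, n, tau) / superscalar_time(k, n, tau, m)
        print(f"n={n:>6}  ratio={measured:.4f}  "
              f"formula={superscalar_speedup(k, n, m):.4f}")
```

Like the textbook formula, the code treats (n-m)/m as a real number; when n - m is not a multiple of m, a real machine needs one extra, partly filled cycle at the end.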


In practice, however, dependencies between instructions (data, control and resource hazards) reduce the overall speedup below this ideal value.
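To see how dependencies eat into the ideal speedup, here is a deliberately simplified Python model (an assumption for illustration, not how any particular processor issues instructions): instructions issue in order, up to m per cycle, and an instruction cannot issue in the same cycle as an instruction it depends on. Forwarding latencies, structural hazards and branches are ignored, and only issue cycles (not pipeline fill) are counted. The function name `issue_cycles` is hypothetical.

```python
def issue_cycles(deps, m):
    """Count issue cycles for an in-order, m-wide issue model.

    deps[i] is the set of indices j < i that instruction i depends on.
    A new issue group (cycle) starts when the current group is full or
    when instruction i depends on an instruction already in the group.
    """
    cycles = 0
    group = []
    for i, d in enumerate(deps):
        if len(group) == m or any(j in group for j in d):
            cycles += 1  # close the current group
            group = []
        group.append(i)
    if group:
        cycles += 1  # last, possibly partial, group
    return cycles


if __name__ == "__main__":
    n, m = 8, 2
    independent = [set() for _ in range(n)]
    chain = [set()] + [{i - 1} for i in range(1, n)]  # each uses the previous result
    print("independent:", issue_cycles(independent, m))
    print("dependent chain:", issue_cycles(chain, m))
```

With two pipelines, eight independent instructions issue in 4 cycles, but a chain in which each instruction needs the previous one's result takes 8 cycles, no better than a single pipeline.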


The Pentium (P5) was the first superscalar x86 microprocessor. It has two pipelines: the V pipe, which executes only simple instructions, and the U pipe, which can execute any instruction.


Superscalar vs Superpipeline :

Image reference www.uwgb.edu/breznayp/cs353/slides/ch_14.ppt


Superscalar and Superpipeline combined:

Modern processors are both superscalar and superpipelined.

The PowerPC G5 processor in the Power Mac G5 has a superscalar, superpipelined execution core that can handle up to 216 in-flight instructions, and uses a 128-bit, 162-instruction SIMD unit.

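For a large number of instructions, SpeedUp = SU(superscalar)*SU(superpipeline), i.e. the asymptotic speedup is the product of the two degrees.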
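The two timing models can be combined for a degree-m superscalar processor whose k stages are each split into p sub-stages (p is a name introduced here for the superpipeline degree, to keep it apart from m). Under the same ideal assumptions, the first m instructions finish after kτ and the remaining n - m finish m per minor cycle of length τ/p, so T = [k + (n-m)/(mp)]*τ. This Python sketch (illustrative names) checks that the resulting speedup reduces to the superscalar formula when p = 1, to the superpipeline formula when m = 1, and tends to m*p as n grows.

```python
def pipeline_time(k, n, tau):
    """Baseline: n instructions on one ideal k-stage pipeline."""
    return (k + (n - 1)) * tau


def superscalar_superpipeline_time(k, n, tau, m, p):
    """m pipelines, each of the k stages split into p sub-stages
    (clock period tau / p). Assumes no stalls and n >= m."""
    return (k + (n - m) / (m * p)) * tau


def combined_speedup(k, n, m, p):
    """Closed form m*p*(k + n - 1) / (m*p*k + n - m)."""
    return m * p * (k + n - 1) / (m * p * k + n - m)


if __name__ == "__main__":
    k, m, p, tau = 4, 2, 3, 1.0  # example values, chosen arbitrarily
    for n in (10, 1_000, 1_000_000):
        measured = pipeline_time(k, n, tau) / superscalar_superpipeline_time(k, n, tau, m, p)
        print(f"n={n:>9}  speedup={measured:.4f}  "
              f"formula={combined_speedup(k, n, m, p):.4f}  limit={m * p}")
```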

To improve the overall performance of the processor further, the following techniques are applied:

->Hyper-Threading

->Branch Target Predictor

->Speculation Unit

->Distributed Shared Memory



Sun and Ni's law:

It is a generalization of Amdahl's and Gustafson's laws.

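SpeedUp = [w1+G(n)wn]/[w1+G(n)wn/n]

where w1 is the sequential part of the workload, wn is the parallelizable part, n is the number of processors, and G(n) describes how the parallel workload grows as memory grows with n.

G(n) = n^x

For Amdahl's law, x = 0, so G(n) = 1 (fixed workload).

For Gustafson's law, x = 1, so G(n) = n (workload scaled with n).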
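A small Python sketch of Sun and Ni's formula, assuming G(n) = n^x (the form under which x = 0 and x = 1 recover the two classical laws). Here w1 is the sequential part of the workload, wn the parallelizable part and n the number of processors (not the number of instructions used earlier); the function names are illustrative.

```python
def sun_ni_speedup(w1, wn, n, x):
    """Sun and Ni's speedup with workload growth G(n) = n**x."""
    g = n ** x
    return (w1 + g * wn) / (w1 + g * wn / n)


def amdahl_speedup(w1, wn, n):
    """Fixed workload: the parallel part is simply divided among n processors."""
    return (w1 + wn) / (w1 + wn / n)


def gustafson_speedup(w1, wn, n):
    """Scaled workload: the parallel part grows n times, run time stays w1 + wn."""
    return (w1 + n * wn) / (w1 + wn)


if __name__ == "__main__":
    w1, wn, n = 1.0, 9.0, 10  # 10% sequential, 90% parallel (example values)
    print("Amdahl:   ", amdahl_speedup(w1, wn, n), sun_ni_speedup(w1, wn, n, 0))
    print("Gustafson:", gustafson_speedup(w1, wn, n), sun_ni_speedup(w1, wn, n, 1))
    print("x = 1.5:  ", sun_ni_speedup(w1, wn, n, 1.5))
```

With 10% sequential work on 10 processors, x = 0 gives about 5.26 and x = 1 gives 9.1. Because the speedup increases with G(n), a workload that grows faster than n (x > 1) gives a still larger speedup.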


By Ravi Kumar Nimmi and Rishav Pipal

Comments:

  1. Correction
    T(for mk stages) = (mk+(n-1))*τ/m

    There is some problem with the blog; the links are not working. So either add this blog to your reading list on blogger.com or copy and paste the following link:
    http://coeunitedparallelprocessing.blogspot.in/2014/03/superpipeline-and-superscalar.html

  2. Good summary - there's some correction in this sentence: "These processors are based on the fact that many pipeline stages need less than half a clock cycle."
    Why half?
    It is simply that when number of stages increase by splitting each stage into two or more sub-stages, then the clock period reduces because each sub-stage now does simpler work with lesser delay.

  3. In the case of a superscalar pipeline we have to ensure that the instructions are independent; due to this, its degree is restricted to 2-4 pipelines.

  4. Besides, branch mispredictions have a far reaching effect on superscalar pipeline performance - a single misprediction may cause all pipelines to flush.

  5. There is one correction required in the post..we have

    G(n)=x(to power n)
    Then

    For amdahl's law :- n=0
    For Gustafson's law:- n=1
