COE United Parallel Processing Discussions: Class Summary-26/3/2014

Review of Significant Concepts :

1) SIMD :

Single instruction, multiple data (SIMD) is a class of parallel computers in Flynn's taxonomy. It describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously. Thus, such machines exploit data-level parallelism, but not concurrency: there are simultaneous (parallel) computations, but only a single process (instruction) at a given moment.

2) Types of parallelism :

a) Spatial parallelism : Duplicate hardware performs multiple tasks at once (SIMD).

b) Temporal parallelism : A task is broken into multiple stages (pipelining).

Example : Ben is baking cookies. It takes 5 minutes to roll a tray of cookies and 10 minutes to bake it. The simplest approach is for Ben to roll a tray, put it in the oven, wait, and take it out after 10 minutes before starting on the next tray. This is the sequential procedure, and it wastes a lot of time, because Ben is idle while the oven is busy and the oven is idle while Ben is rolling.

Q) What if Ben uses parallelism?

A) He could do so in two ways :

1) Spatial parallelism : Ben asks his sister Allysa to help, using her own oven. The diagram for this scenario is :

2) Temporal parallelism : Ben breaks the task into two stages : rolling and baking. While the first batch is baking, he rolls the next batch, and so on. The diagram for this scenario is :

Clearly, time is saved in both cases, which is why parallelism is used in modern computing.
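To put numbers on the cookie example, here is a minimal sketch that compares the three approaches. The function names and the choice of two workers are illustrative assumptions; the 5-minute and 10-minute figures come from the example above.

```python
def sequential_time(trays, roll=5, bake=10):
    """Ben rolls and bakes each tray before starting the next one."""
    return trays * (roll + bake)


def spatial_time(trays, workers=2, roll=5, bake=10):
    """Spatial parallelism: each worker has an oven and works sequentially
    on a share of the trays. The busiest worker sets the total time."""
    trays_per_worker = -(-trays // workers)  # ceiling division
    return trays_per_worker * (roll + bake)


def temporal_time(trays, roll=5, bake=10):
    """Temporal parallelism (pipelining): the next tray is rolled while the
    current one bakes, so after the first tray the slower stage sets the pace."""
    if trays == 0:
        return 0
    return roll + bake + (trays - 1) * max(roll, bake)


if __name__ == "__main__":
    for n in (1, 4, 10):
        print(n, sequential_time(n), spatial_time(n), temporal_time(n))
```

For 4 trays this gives 60 minutes sequentially, 30 minutes with two bakers, and 45 minutes with pipelining. Pipelining needs no extra oven, which is why it is attractive in hardware.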

3) Pipelining of functional units :

Apart from instruction pipelining, functional units such as floating-point adders can also be pipelined. The functional unit is broken down into stages, and the same process is followed as in instruction pipelining. Let's take the example of a floating-point adder that adds two normalized floating-point numbers. The whole process can be divided into four stages :

1) Compare the exponents of the two inputs by subtraction.

2) Align the mantissas if the exponents are unequal.

3) Add the mantissas.

4) Normalize the result.

The diagram is as follows :

The figure above is self-explanatory. If any doubts persist, please drop a comment.
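The four stages can be sketched in software as follows. This is a simplified illustration, not a real floating-point unit: it assumes positive numbers stored as a (mantissa, exponent) pair with value mantissa * 2^exponent, a hypothetical 8-bit mantissa, and truncation instead of proper rounding. All names are made up for this sketch.

```python
MANT_BITS = 8  # hypothetical mantissa width; normalized means the top bit is set


def compare_exponents(a, b):
    """Stage 1: subtract the exponents to find the alignment shift."""
    return a[1] - b[1]


def align_mantissas(a, b, diff):
    """Stage 2: shift right the mantissa of the operand with the smaller
    exponent (low bits are simply dropped in this sketch)."""
    (ma, ea), (mb, eb) = a, b
    if diff >= 0:
        return ma, mb >> diff, ea
    return ma >> -diff, mb, eb


def add_mantissas(ma, mb, exp):
    """Stage 3: add the aligned mantissas."""
    return ma + mb, exp


def normalize(m, exp):
    """Stage 4: shift the sum back into MANT_BITS bits with the top bit set."""
    if m == 0:
        return 0, 0
    while m >= (1 << MANT_BITS):
        m >>= 1
        exp += 1
    while m < (1 << (MANT_BITS - 1)):
        m <<= 1
        exp -= 1
    return m, exp


def fp_add(a, b):
    """Run one operand pair through the four stages in order."""
    diff = compare_exponents(a, b)
    ma, mb, exp = align_mantissas(a, b, diff)
    total, exp = add_mantissas(ma, mb, exp)
    return normalize(total, exp)


def to_float(x):
    """Helper for checking results: convert (mantissa, exponent) to a float."""
    return x[0] * 2.0 ** x[1]
```

For example, `fp_add((192, 0), (128, -1))` adds 192 and 64 and returns `(128, 1)`, that is, 256. In a pipelined adder each stage is separate hardware, so while one pair of operands is being normalized, the next pair can already be in the add stage.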

4) Road ahead :

A scalar processor, despite having these pipelined functional units, cannot operate efficiently on vectors. The reason is the absence of vector registers: the inputs to these pipelines arrive sequentially from scalar registers, which is a hindrance to fast computation on vectors.

Solution : What if we have an architecture with scalar as well as vector registers? This architecture is known as a vector processor and is explained below.

Vector Processors :

A vector processor, or array processor, is a central processing unit (CPU) that implements an instruction set containing instructions that operate on one-dimensional arrays of data called vectors. This is in contrast to a scalar processor, whose instructions operate on single data items. Vector processors can greatly improve performance on certain workloads, notably numerical simulation and similar tasks. Vector machines appeared in the early 1970s and dominated supercomputer design through the 1970s into the 1990s, notably the various Cray platforms. The rapid fall in the price-to-performance ratio of conventional microprocessor designs led to the vector supercomputer's demise in the later 1990s.

A typical instruction set for a vector processor may include instructions like :

1) V1 ← 0 : Initialize all the elements of vector V1 to 0. (V1 is a vector.)

2) V3 ← V1 x V2 : Cross product of two vectors V1 and V2, stored in V3. (V1, V2 and V3 are all vectors.)

3) A ← V1 . V2 : Dot product of two vectors V1 and V2, stored in a scalar A. (V1 and V2 are vectors and A is a scalar.)

and so on.
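As a rough illustration of what these three instructions compute, here is a sketch using plain Python lists as stand-ins for vector registers. The function names are hypothetical, and the cross product is assumed to be the usual 3-element one. A real vector processor would execute each of these as a single instruction on pipelined hardware, which this sequential code does not model.

```python
def vzero(length):
    """V1 <- 0 : a vector with every element set to 0."""
    return [0] * length


def vcross(v1, v2):
    """V3 <- V1 x V2 : cross product of two 3-element vectors."""
    a1, a2, a3 = v1
    b1, b2, b3 = v2
    return [a2 * b3 - a3 * b2,
            a3 * b1 - a1 * b3,
            a1 * b2 - a2 * b1]


def vdot(v1, v2):
    """A <- V1 . V2 : dot product, reducing two vectors to one scalar."""
    return sum(x * y for x, y in zip(v1, v2))


if __name__ == "__main__":
    print(vzero(4))
    print(vcross([1, 0, 0], [0, 1, 0]))
    print(vdot([1, 2, 3], [4, 5, 6]))
```

Note that the first two instructions produce a vector while the dot product produces a scalar, which is one reason the hardware needs both vector and scalar registers.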

Some Vector processors :

Motivation :

•Maths problems involving physical processes present different difficulties for computation

—Aerodynamics, seismology, meteorology

—Continuous field simulation

•High precision.

•Repeated floating point calculations on large arrays of numbers

•Supercomputers handle these types of problem

—Hundreds of millions of flops

— costs $10-15 million.

—Optimised for calculation rather than multitasking and I/O

—Limited market

–Research, government agencies, meteorology

•Vector processor

—Alternative to supercomputer

—Just run vector portion of problems

HARDWARE NEEDS :

1) Vector registers, as well as ways to store vectors in memory.

2) Scalar registers, as well as ways to store scalars in memory.

3) Pipelined functional units.

Vector-access memory schemes :

The flow of vector operands between main memory and the vector registers is usually pipelined with multiple access paths. We have so far discussed the S-access memory organisation.

S-access memory organisation :

All memory modules are accessed simultaneously. The higher-order address bits (b) select the same word offset in every memory module, so at the end of the memory cycle m = 2^a consecutive words are latched in the data buffers simultaneously. The lower-order address bits (a) are then used to multiplex the m words out, one per minor cycle.

The diagram is as follows :

For concurrent access (C-access), the lower-order bits select the memory module and the higher-order bits select the word within that module, with accesses to different modules overlapped. Please do read about C-access too.
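The address split can be sketched as below. This assumes low-order interleaving as described in Hwang's book and in the correction posted in the comments: the low-order a bits identify the module, and the high-order bits give the word offset inside each module. The value a = 2 (so m = 4 modules) and all function names are assumptions for illustration only.

```python
A_BITS = 2            # hypothetical: a low-order bits, so m = 2**a modules
M = 1 << A_BITS       # m = 4 memory modules


def split_address(addr):
    """Split a word address into (offset within module, module index)."""
    offset = addr >> A_BITS    # high-order bits: same offset in every module
    module = addr & (M - 1)    # low-order a bits: which module
    return offset, module


def build_memory(words):
    """Low-order interleaving: word i is stored in module i % M at offset i // M."""
    rows = (len(words) + M - 1) // M
    modules = [[None] * rows for _ in range(M)]
    for i, word in enumerate(words):
        offset, module = split_address(i)
        modules[module][offset] = word
    return modules


def s_access_fetch(modules, addr):
    """S-access: read the same offset from all modules in one memory cycle.
    The returned list models the data latches; the hardware would then
    multiplex these m words out one per minor cycle, in module order."""
    offset, _ = split_address(addr)
    return [module[offset] for module in modules]


if __name__ == "__main__":
    memory = build_memory(list(range(100, 116)))
    print(split_address(6))
    print(s_access_fetch(memory, 4))
```

With 16 words holding the values 100 to 115, a fetch at address 4 latches the four consecutive words 104, 105, 106 and 107, one from each module.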

SUBMITTED BY :-

SHASHANK GUPTA

SHAURYA SHARMA

Sources :

http://en.wikipedia.org/wiki/Vector_processor

www.eng.auburn.edu/~agrawvd/COURSE/E6200_Fall07/.../rjm0002.ppt

https://www.youtube.com/watch?v=T9B_DtYTKCc

Advanced Computer Architecture by Kai Hwang

3 comments:

Unknown, 26 March 2014 at 13:22
Vector processors are so expensive, yet they are needed to perform various vector operations as well. So what can we possibly do to maintain a balance between cost and performance?

Unknown, 26 March 2014 at 20:58
Want to make a correction: the lower-order bits are used to select the memory module and the higher-order bits are used to select the word in that module.

Vijay Sahil Sondhi, 26 March 2014 at 21:00
"Extracting value for money out of vector processing hardware depends very much upon the problem you are trying to solve, the compiler you have, and the idiosyncrasies of the vector hardware on your machine.
If your purpose in life is to deal with the "classical supercomputing" problems which involve mostly large matrix operations, then vector processors like Crays are the best approach. The basic problem is rich in primitive vector operations and the machine can be driven to its best.
In practice, most problems involve a mix of scalar and vector computations."

source : www.ausairpower.net/OSR-0600.html

Wednesday, 26 March 2014
