Monday 3 February 2014

Granularity and Speedup

GRANULARITY

Granularity is the extent to which a system is broken down into small parts, either the system itself or its description or observation; in other words, it is the extent to which a larger entity is subdivided. For example, a yard broken into inches has finer granularity than a yard broken into feet.

Coarse-grained systems consist of fewer, larger components than fine-grained systems; a coarse-grained description of a system regards large subcomponents while a fine-grained description regards smaller components of which the larger ones are composed.

The terms granularity, coarse, and fine are relative, used when comparing systems or descriptions of systems. An example of increasingly fine granularity: a list of nations in the United Nations, a list of all states/provinces in those nations, a list of all cities in those states, etc.

 

·         The programmer divides the program into coarse grains, the compiler into fine grains, and the OS into middle grains (a sketch of programmer-level coarse-grained decomposition follows this list).
The programmer is not concerned about fine-grain dependencies.

·         Coarse-grained systems are implemented in distributed environments, i.e. MIMD - NORMA (no remote memory access).

·         Middle-grained systems are implemented in UMA (uniform memory access) environments.

·         Fine-grained systems are implemented in cores.
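
As a small illustration of the first bullet above, here is a hypothetical Python sketch (not from the post): the programmer creates coarse grains by splitting the work into a few large chunks, one per worker process, and leaves the finer, instruction-level grains to the compiler and hardware.

from multiprocessing import Pool

def process_chunk(chunk):
    # One coarse grain: a large unit of work that runs with no
    # communication until its single result is returned.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    n_workers = 4
    # The programmer chooses the grain size: a few large chunks.
    size = len(data) // n_workers
    chunks = [data[i * size:(i + 1) * size] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        print(sum(pool.map(process_chunk, chunks)))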

 

Speed Up (decrease in overall computation time with parallelism)

Suppose we have a program with 'N1' instructions, and let 't1' be the time taken by the processor to execute a single instruction.

So the total time taken by a single-processor machine to execute all the instructions will be N1*t1.

Now, if we have 'N2' processors, then the overall time will depend on:

·         the time taken per processor to execute its set of instructions:

(t1*N1)/N2

·         the communication time: (t2*N3)

Where 't2' is the time taken by a single communication

           'N3' is the number of times communication is made

·         OS overheads (we will neglect these)

So the new time = (t1*N1)/N2 + (t2*N3)
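
To make the formula concrete, here is a small Python sketch (the variable names follow the derivation above; the numbers are made up purely for illustration):

def serial_time(N1, t1):
    # Single-processor time: all N1 instructions run in sequence.
    return N1 * t1

def parallel_time(N1, t1, N2, t2, N3):
    # Per-processor compute time plus total communication time;
    # OS overheads are neglected, as stated above.
    return (t1 * N1) / N2 + t2 * N3

# Made-up numbers for illustration:
N1, t1 = 1_000_000, 1e-9   # instructions, seconds per instruction
N2 = 8                     # processors
t2, N3 = 1e-6, 100         # seconds per communication, number of communications

print(serial_time(N1, t1))                # 0.001 s on one processor
print(parallel_time(N1, t1, N2, t2, N3))  # 0.000225 s on 8 processors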

Speed up       =          normal (serial) time / time during parallel execution

                        =          (N1*t1)/ ((t1*N1)/N2 + (t2*N3))

Dividing the numerator and denominator by (t1*N1)/N2

We have, 

Speed up       =          N2 / (1 + (t2/t1)*(N3/(N1/N2)))

We can write N1/N2 = N (the size of computation per processor)

So,

Speed up       =          N2 / (1 + (t2/t1)*(N3/N))

Also, N/N3 (the size of computation divided by the number of communications) is the size of computation between two communications, i.e. the size of grain.

So,

Speed up       =          N2 / (1 + (t2/t1)*(1/size of grain))
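
As a sanity check on the algebra, the two forms of the speedup can be compared numerically. A small Python sketch (same made-up numbers as before):

def speedup_direct(N1, t1, N2, t2, N3):
    # Speedup as serial time divided by parallel time.
    return (N1 * t1) / ((t1 * N1) / N2 + t2 * N3)

def speedup_grain(N2, t2_over_t1, grain):
    # The same speedup rewritten in terms of grain size.
    return N2 / (1 + t2_over_t1 * (1 / grain))

N1, t1, N2, t2, N3 = 1_000_000, 1e-9, 8, 1e-6, 100
N = N1 / N2        # size of computation per processor
grain = N / N3     # computation between two communications

print(speedup_direct(N1, t1, N2, t2, N3))  # ~4.44
print(speedup_grain(N2, t2 / t1, grain))   # same value, ~4.44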

 

Observations from above relation

From the above relation,

For constant speedup, the term in the denominator should stay constant:

=> if the size of grain is increased, then t2/t1 can be increased proportionally, and

=> if the size of grain is decreased, then t2/t1 must also decrease.

Here 't2' is the communication time, so we conclude that "the communication time we can afford is proportional to the size of grain".

So, for systems with a large grain size we can afford more communication time; therefore, in "coarse-grained systems" we use the "message-passing paradigm".

With a small grain size, on the other hand, we cannot afford a large 't2', otherwise the speedup decreases significantly; so in "fine-grained systems" we use the "shared-memory paradigm".
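
The conclusion is easy to see numerically: holding the communication cost ratio t2/t1 fixed, a coarse grain keeps the speedup close to N2, while a fine grain destroys it. A small Python sketch with assumed numbers:

def speedup(N2, t2_over_t1, grain):
    return N2 / (1 + t2_over_t1 * (1 / grain))

# Assumed: 8 processors; one communication costs as much as 1000 instructions.
N2, t2_over_t1 = 8, 1000
for grain in (100, 1_000, 10_000, 100_000):
    print(f"grain = {grain:6d}: speedup = {speedup(N2, t2_over_t1, grain):.2f}")

# grain =    100: speedup = 0.73   <- fine grain + costly communication: slower than serial
# grain =   1000: speedup = 4.00
# grain =  10000: speedup = 7.27
# grain = 100000: speedup = 7.92   <- coarse grain: close to the ideal N2 = 8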

 

4 comments:

  1. Adding to the blog.
    "The goal is to determine the right amount of granularity ('usually' larger is better) for parallel tasks, while avoiding load imbalance and communication overhead to achieve the best performance. If granularity is too fine, then performance can suffer from communication overhead. If granularity is too coarse, then performance can suffer from load imbalance."

  2. I think that the most efficient granularity depends on the algorithm and the hardware environment in which it runs. Since in most cases the overheads are associated with communication, it is advantageous to have coarse granularity.

  3. 1. Deadlock is essentially a condition of circular wait. One process (A) needs the shared resource that another process (B) has locked, and so on like in a chain, until the last one requires a resource locked by the first one in the chain (A).
    Your observation is correct: locking shared-memory resources in certain sequences can lead to deadlocks. The situation becomes more complicated when different processes have different priorities. Care is indeed needed. One of the Mars robots, Pathfinder, actually suffered repeated system resets because of a subtle priority-inversion situation that arose.
    Find out about this, and about priority inheritance versus priority ceiling!

    2. About the interesting viewpoints on granularity (others do add more): any modern application is granularized at various levels. The programmer partitions it at a coarse level, the compiler-cum-hardware granularizes it at a finer level, while the OS manages the grains by scheduling them among available resources, in a manner that achieves the best compromise between performance and load balancing...

  4. "Fine-grained systems are implemented in cores" - this line basically refers to the PIPELINING which is implemented inside a single core.

    Also, considering the ideal situation, the MESSAGE PASSING PARADIGM takes more communication time because CREATING messages and PASSING them from one BUFFER to the other takes time, whereas in the SHARED MEMORY PARADIGM each processor has DIRECT ACCESS to the memory.
