Monday 17 February 2014

Summary of Monday's class

Static Networks

 

Hi all,

Previously in class we discussed some of the dynamic networks, including cross-bar networks, multiported memories and multistage networks. There are other dynamic networks too, such as the Benes network, the baseline network, etc. Today, however, we discussed a static network. We have already seen one static network, the linear static network (we considered this model during prefix computation).

 

Hypercube Interconnection Network

 

Interconnection schemes were designed to reduce the amount of time processors spend on simple message routing, and the hypercube and the mesh are two of the most popular such schemes. Here we are going to discuss the Hypercube Interconnection Network.

 

·         A k-dimensional hypercube contains 2^k processors (nodes).

·         Each processing node contains a switch.

·         The nodes are numbered from 0 to 2^k − 1, usually in binary.

·         Two nodes are joined by a single link if the binary values of their numbers differ in exactly one bit, i.e. they are neighbours.

The following figure shows hypercubes of various dimensions along with the binary labels of their nodes.

http://www.cdac.in/HTmL/events/beta-test/archives/promcore-2008/mpi-1x-promcore-2008/point-to-point-mpi-images/hypercube.gif

Note: as can be seen from the figure, the number of bits in a node's binary label is equal to the log of the number of processors N, i.e. logN, which is also the degree of a processor.
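The bit-flip rule above can be sketched in a few lines of Python (an illustrative helper, not part of the class material): the neighbours of a node are exactly the labels obtained by flipping one bit, so every node has degree k = log2(N).

```python
# Sketch (assumed helper, not from the notes): in a k-dimensional hypercube,
# the neighbours of a node are the labels that differ from it in exactly one bit.
def neighbours(node, k):
    return [node ^ (1 << bit) for bit in range(k)]

k = 3                          # a 3-cube has 2^3 = 8 processors
print(neighbours(0b000, k))    # [1, 2, 4] -> labels 001, 010, 100

# Every node has exactly k = log2(N) neighbours, i.e. the degree is logN.
assert all(len(neighbours(v, k)) == k for v in range(2 ** k))
```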

 

Now we discuss the hypercube network using the example of a 4-dimensional network, analysing various factors such as:

·         Latency

·         Scalability

·         Diameter

·         Bisection Width

·         No. of Links

·         Dimension

Later on we'll calculate the overall complexity of the matrix multiplication algorithm.

 

·         Latency: logN (the maximum path length a message has to travel).

·         Scalability: static networks of this kind are not scalable, since the node degree grows with the network size; linear networks, by contrast, have constant degree (except at the corner nodes).

 

·         Diameter: 2*logN for the N*N processors considered here (the diameter indicates the communication overhead: it is the minimum number of steps it takes for one processor to send a message to the processor that is farthest away).

 

·         Bisection width: it tells how well the two halves of an interconnection network are connected.

In a linear network the bisection width is constant, but here it depends upon the number of processors, i.e. for N processors it is equal to N/2.

 

·         No. of links: N*logN/2 for N processors.

 

·         Dimension of a hypercube of NxN processors:

total no. of nodes = N*N

dimension = log(N*N) = 2*logN
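The metrics above can be collected in a short sketch (hypothetical helper, written here to make the formulas concrete; P is used for the processor count to avoid overloading N):

```python
import math

# Sketch: the standard metrics for a hypercube of P = 2^k processors.
def hypercube_metrics(P):
    k = int(math.log2(P))            # dimension = number of bits per label
    return {
        "dimension": k,
        "diameter": k,               # one bit flipped per hop, at most k hops
        "bisection_width": P // 2,   # cutting one dimension severs P/2 links
        "links": P * k // 2,         # k links per node, each shared by 2 nodes
    }

print(hypercube_metrics(16))
# For an NxN grid of processors, P = N*N, so the dimension (and diameter)
# comes out as 2*logN, matching the value above.
```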

 

 

Multiplication of two 4x4 matrices

 

Using the above interconnection technique, we mainly have to perform the following 5 steps (the time required is specified in brackets):

1) Broadcast A (N*logN)

2) Transpose B (N)

3) Broadcast B (N*logN)

4) Multiplication and addition of rows and columns on the N*N processors

   (specifically, N multiplications and N-1 additions)

5) Rearrange the matrix to get the required elements (N*logN)
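The five step costs above can be summed in a small sketch (this is just the cost model from the list, not an implementation of the algorithm):

```python
import math

# Sketch (assumed cost model from the five steps above): total time for
# multiplying two NxN matrices on an N*N-processor hypercube.
def total_cost(N):
    logN = math.log2(N)
    broadcast_A = N * logN   # step 1
    transpose_B = N          # step 2
    broadcast_B = N * logN   # step 3
    mult_add    = N          # step 4: N multiplications, N-1 additions
    rearrange   = N * logN   # step 5
    return broadcast_A + transpose_B + broadcast_B + mult_add + rearrange

# The three N*logN terms dominate, so the total is O(N*logN).
print(total_cost(4))   # 8 + 4 + 8 + 4 + 8 = 32
```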

 

In broadcasting A we have to perform the following:

        First apply R1 and R2 at processors 2 and 7 (a diagonal edge), and R3 and R4 at 8 and 13 respectively.

1st stage - send the rows to their respective 4th neighbour (the other neighbours remain masked).

2nd stage - now simultaneously broadcast the information to their respective 3rd neighbours.

This simplifies the control unit, as there is a single control line for all the processors.

Hence, in logN stages (in general) we will have R1, R2, R3, R4 on the front face of the cube, i.e. at the 0th, 1st, 2nd and 3rd processors we have R1, R2, R3, R4 respectively.

Similarly on the other faces too.
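The staged broadcast described above is the standard recursive-doubling pattern; a minimal sketch (an illustration of the pattern, not the exact masking used in class) shows why it finishes in logN stages:

```python
# Sketch of a staged hypercube broadcast (recursive doubling): at stage d,
# every node that already holds the data sends it across dimension d, so the
# set of informed nodes doubles each stage and all P nodes are reached in
# log2(P) stages.
def broadcast_stages(P, source=0):
    have = {source}
    d = 0
    while len(have) < P:
        have |= {node ^ (1 << d) for node in have}  # send across dimension d
        d += 1
    return d

assert broadcast_stages(16) == 4   # log2(16) stages for a 4-cube
```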

 

After transposing the columns of matrix B, we broadcast the transposes of C1, C2, C3 and C4 to their respective processors. From here on, C1, C2, C3 and C4 denote the transposed columns.

        First apply C1 and C2 at processors 0 and 1, and C3 and C4 at 8 and 12 respectively.

1st stage - send the columns to their respective 1st neighbour (the other neighbours remain masked).

2nd stage - now simultaneously broadcast the information to their respective 2nd neighbours.

After this we'll have C1 at all processors of the front face of the cube.

       

Now the multiplication and addition of R1 and the transpose of C1 is performed on the 0th processor, of R2 and C1 on the 1st processor, of R3 and C1 on the 2nd processor, and of R4 and C1 on the 3rd processor; the same happens simultaneously on the other processors for the other columns.
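The local computation on each processor is just a dot product; a minimal sketch (run sequentially here, whereas on the hypercube all N*N processors do it at once):

```python
# Sketch: after the broadcasts, processor (i, j) holds row R_i of A and the
# transposed column C_j of B, and computes one element of the product locally
# using N multiplications and N-1 additions (step 4 above).
def local_dot(row, col):
    return sum(a * b for a, b in zip(row, col))

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
Bt = list(map(list, zip(*B)))                 # step 2: transpose B
C = [[local_dot(A[i], Bt[j]) for j in range(2)] for i in range(2)]
print(C)   # [[19, 22], [43, 50]], i.e. A x B
```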

Now we rearrange the elements of the hypercube to form the final resultant matrix.

 

Conclusion

Considering all the above steps, we get the overall complexity to be O(N*logN).

The value obtained when we considered N^3 processors was O(logN).

Here broadcasting is done in order to exploit parallelism, and hence the factor logN comes into the picture as an overhead.

 

 

Submitted by:-

Sweety Yadav and Uzma.

 

 

13 comments:

  1. During the analysis of the PRAM model, on the same matrix multiplication example, the logN term comes into the picture because of the step that accumulates the products; here, however, it comes from the latency, which is logN.

    ReplyDelete
  2. An important issue in the design of interconnection networks for massively parallel computers is scalability. A new scalable interconnection network topology, called Double-Loop Hypercube (DLH), has been proposed. The DLH network combines the positive features of the hypercube topology, such as small diameter, high connectivity, symmetry and simple routing, with the scalability and constant node degree of a new double-loop topology. The DLH network can maintain a constant node degree regardless of the increase in the network size.

    ReplyDelete
  3. Hypercube network: classically this is a multidimensional mesh of processors with exactly two processors in each dimension. An example of such a system is the Intel iPSC/860 computer. Some newer projects incorporate the idea of several processors in each node, which results in a fat hypercube, i.e. an indirect network topology. An example is the SGI/Cray Origin2000 computer.

    ReplyDelete
  4. Several modifications of hypercubes are:
    Folded Hypercubes: where more links are established between nodes.
    Extended Hypercubes: formed by the addition of nodes.
    Cube-Connected Cycles: where another structure is mapped onto the hypercube.
    Banyan Hypercube: a synthesis of Banyans and Hypercubes.
    Modular Hypercube: where the degree of each node is (2n-1), compared to n in hypercubes.

    http://www.ece.uic.edu/~ajohar/paper7.pdf

    ReplyDelete
  5. Diameter for hypercube made of N processors should be logN and not 2logN as mentioned in the post.

    ReplyDelete
  6. I have 2 doubts:
    1. How is the diameter 2logN for N processors? The maximum length of the minimum-length paths is 4 (= logN) for N = 16 processors, which is the path between 0 and 15.

    2. In logN steps, why can't we broadcast the rows in parallel, i.e. R1 to its fourth neighbour and R2 to its fourth neighbour at the same time, so that the overall complexity is O(logN)?

    Could somebody explain?

    ReplyDelete
    Replies
    1. Ravi, I think I've cleared your first doubt, but your second doubt is the same as mine. Hope we'll get it cleared soon. Maybe it is done to reduce the complexity of the control lines.

      Delete
  7. By the definition of diameter, here the diameter is the length of the path taken to send information from the 0th processor to the 15th processor (which are farthest apart). One possible path is 0-1-5-7-15, as a processor can communicate directly only with its neighbours. Hence 2*logN, but under different assumptions we can say it is logN. This is what I've understood; corrections are welcome.

    ReplyDelete
  8. I want to make one thing clear: the above-mentioned diameter, 2logN, is specifically for matrix multiplication, where NxN is the dimension of the matrix. If we take N as the total number of processors, then it is logN, but the way of finding the largest of the shortest paths remains the same.

    ReplyDelete
  9. A major drawback of the hypercube network is its lack of scalability, which limits its use in building large-size systems out of small-size systems with little change in configuration. In addition, it is more difficult to design and fabricate the nodes of the hypercube because of the large fan-out.

    ReplyDelete
  10. @uzma So isn't the diameter logN (N = 16, logN = 4, and the diameter is also 4, as mentioned by you)? I think you are confusing N with the dimension of the hypercube, which is 4 here. N is the number of processors in the hypercube, which is equal to the square of the dimension of the hypercube (n): N = n^2, so logN = 2*logn.

    ReplyDelete
  11. Yes, that's what I've said in my previous comment, but what we have written isn't wrong: as you can see, we've clearly mentioned that the total number of nodes = N*N, hence it is 2*logN. And yes, I agree this is confusing; we should have used two symbols, n and N, for the purpose.

    ReplyDelete
  12. A little doubt: as cube-connected networks are restricted to have degree three, does this restriction degrade the performance of the cube network?
    From this, one thing is clear: cube networks have higher scalability than hypercubes; furthermore, cube networks have a larger diameter than hypercubes.

    ReplyDelete