Friday, 14 February 2014

Multiported Memory Network and Multistage Network

Multiported Memory Network

The crossbar network closely resembled a PRAM model. It’s scalability was not great, and the cost of the network increased as we added more processors/memory modules.

In a multiported memory, we have as many ports as there are memory modules. A wire goes into the port from each memory module. On the other side are the processors. In this way, each processor can communicate with any memory.

So, in a multiported model, we can have all the possible permutations of memory-processor connection.

Let ‘W’ represent the width of a single bus [ideally, data bus should be considered]

Minimum Bandwidth = w

Maximum Bandwidth = O(w*n)

In this model, the reliability increase. If one port malfunctions, the processors can still access the rest of the modules. Also, we can add additional memories by simply increasing the number of ports. This improves scalability.

Data security is also increases because we can map each of the ports to limit the access of different processors to the memory modules.

Multistage Network

In this, the processors are connected to the memory modules using multiple stages of switching elements.The output from each stage is connected to the inputs of the next stage using a perfect shuffle (logical circular shift left) connection system.

In terms of binary representation of the processors, each stage of the perfect shuffle can be thought of as a cyclic logical shift left; each bit in the address is shifted once to the left, with the most significant bit moving to the least significant bit.

Different settings of a 2*2 switch box:

Multistage network makes use of shuffling.
There is a unique path from a processor to a memory.
Some of the permutations of memory-processor connections are not allowed (as can be seen from the figures below).

In the above diagram:

Single stage Shuffle-Exchange is show on left
Perfect shuffle mapping function is shown on right
Perfect shuffle operation: cyclic shift 1 place left, eg 101 --> 011
Exchange operation: invert least significant bit, e.g. 101 --> 100

Complexity = O((logk n) * n/k) [for the figure shown below, k = 2]

Wiring complexity = O(n log n)

Latency = O(log n)

Switching complexity = O(n log n)

Omega Network

The table shows the various complexities of different interconnecting networks:

Network	Latency	Switching Complexity	Wiring Complexity	Blocking
Bus	O(n)	O(1)	O(w)	Yes
Multistage	O(log n)	O(n log n)	O(n*w logn)	Yes
Crossbar	O(1)	O(n*n)	O(nnw)	No

This network is more reliable than a crossbar. If a switch box stops working, only a part of the memory becomes inaccessible.
It is more scalable than bus-based networks. Its scalability cost is also less than that of a bus’.
The total number of switches used in a multistage network of n switches (each switch k X k) is (n/k)logkn i.e., its cost is O(n log n) as compared to O(n*n) for a crossbar.

There exist a single path from source to destination, thus contrary to a crossbar network, omega network is a blocking network.
This is shown here as:

--the path from 010 -->110 (red)

--the path from 110 -->100 (blue)

has blockage as at input 110 has to wait till 010 has passed otherwise it results in collision.

Another type of network is Benes Network.

In this network, for 8 bits:

I Stage = 4 shuffles of 8 objects

II Stage = 2 shuffles of 4 objects

Similarly, for 16 bits:

I Stage = 8 shuffles of 16 objects

II Stage = 4 shuffles of 8 objects

III Stage = 2 shuffles of 4 objects

by Rishabh Jain and Rudra Pratap Singh Nirvan

12 comments:

Unknown14 February 2014 at 22:01
the main disadvantage that i find against omega networks or multistage networks is Blocking . In this scenario i think multiported networks are more useful as they are reliable and scalable too and do not support blocking.
ReplyDelete
Replies
Unknown15 February 2014 at 00:20
Yes multistage network is called blocking bcoz its result in conflicts in the use of swtiches or communication links.
ReplyDelete
Replies
Unknown15 February 2014 at 05:48
Omega networks may be used as connectors between the CPU's and their Shared Memory, in order to decrease the probability that the CPU-to-memory connection becomes a bottleneck.
This class of Networks has been built into the Illinois Cedar Multiprocessor, into the IBM RP3, and into the NYU Ultracomputer.
ReplyDelete
Replies
Unknown15 February 2014 at 07:47
what is happening in upper and lower broadcast in 2*2 switch box??
ReplyDelete
Replies
Unknown15 February 2014 at 08:33
in broadcast case a single input is passed on to both output ports of the switch, as a result a single input can reach to multiple switches. Based on the position of input in the switch it is defined as upper or lower broadcast.
ReplyDelete
Replies
Unknown15 February 2014 at 09:07
to understand benes network, have a look at this!!
http://en.wikipedia.org/wiki/Clos_network
ReplyDelete
Replies
Shampa chakraverty15 February 2014 at 10:13
Certain points:
1. Baseline and Benes networks: Benes is the non-blocking version of baseline - will discuss .
2. Switching complexity of bus is O(n), since each processor/ memory/IO taps to the bus by a separate switch (trans-receivers).
3. Wiring complexity of crossbar is O(n+m)w.

ReplyDelete
Replies
Unknown15 February 2014 at 20:41
It was previously said that latency does not remain constant if the network is scalable. So, in case of multiported memory,the network is scalable then will the latency not vary..??

For a multistage network, if we add a stage to the network, the connections would change because the no. of bits changed. In order to change the connections, will the chip not be fabricated again..? Then why is this network scalable as fabricating a new chip with different connections will increase the cost.?
ReplyDelete
Replies
Shampa chakraverty16 February 2014 at 01:48
Sweety - I dunno how u got this impression (did i say that ??? ), but it is NOT true that scalable networks have variable latency. In fact, the opposite is true: scalability means a hassle free upgradation: ideally u add one extra unit of hardware and the performance of the system (meaning speedup) also goes up one unit - and without any extra time overhead either!!!
Now that is possible only in a EREW PRAM model - any realistic network will have some flaw or the other. Therefore, scalability is best defined by the ratio of performance (speedup) on a EREW PRAM model to that of a real machine.
That also brings to light an interesting observation : scalability is a relative term for real machines (no one is perfect). Crossbars have have O(n^2) switching complexity while Multiport memory have the same Wiring complexity !! (however, most ppl agree that switch cost is a more sensitive matter than wire cost - tho some dont agree here). Both crossbar and Multiport have CONSTANT average latency which is exactly the bane in MSN: It has a O(lg n) latency! Ofcourse MSN has O(nLg n) switching and wiring complexity. Now different applications need different n/w. In fact, as our reporters have pointed out, FPGAs indeed use crossbar and not MSN networks owning to its constant latency, despite its so-called high sw complexity because VLSI allows that!!

Now for yr second question - it is quite another aspect of scalability : Do we retain the old motherboard/backplane/chip when we upgrade? Does scalability mean a pluggable upgradation?
Usually no! An upgradation means the manufacturer offers a new computer system with enhanced abilities. It may ofcourse have backward compatibility though.
Note however, in older machines the MSN were constructed using cables and standard available 2X2 switches - so that it could be easily upgraded.

I know this is turning out long but an aside: Talking of upgradation, we may want to upgrade an Omega network from 8 to 10 processors. In that case, the definition of shuffle (*2 mod n-1) suggests that we would require a much larger network catering to 16 processors for just 2 more. So let me give a more acceptable definition of shuffle where n is not necessarily a power of 2.

s shuffle (say 2) of s*m objects (say 10 objects, m = 5):

shuffle of object i = (s*i + Floor(i/m)) MOD s*m
Try it out:)

Shuffling ensures a fair and even distribution of one set of objects (say processors or cards) to another set of entities (say memories or players) because there is an equal prob of a person getting any card.

ReplyDelete
Replies
Unknown6 March 2014 at 08:13
Latency of bus system should be O(1) I suppose.
ReplyDelete
Replies

Add comment