Ma'am
Pasted below is the summary of the 10 Feb (Monday) lecture on the topic "Prefix Computation and Linear Network".
Regards
Rudra Pratap Singh Nirvan
COE 3 (6th Semester)
______________________________________________________________________________________________
Prefix Computation
It is a binary, closed and associative computation.
Let '(o)' represent a computation, and x0, x1, x2, … x(n-1) be a set of 'n' numbers. Then,
C1 = x0 [C1 = computation 1]
C2 = x0 (o) x1
C3 = x0 (o) x1 (o) x2
.
.
Cn = x0 (o) x1 (o) … (o) x(n-1)
Algorithm:
for i = 1 to (n - 1) {
sum[i] = x[i] + x[i-1]
}
for i = 2 to (n - 1) {
sum[i] = sum[i] + sum[i-2]
}
We keep adding such loops, doubling the stride each time, and get the final loop as:
for i = n/2 to (n - 1) {
sum[i] = sum[i] + sum[i - n/2]
}
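The loop sequence above can be sketched in Python. The snapshot copy stands in for the PRAM's simultaneous reads (on a real PRAM all iterations of the inner loop run at once); n is assumed to be a power of two and '+' is taken as the operation:

```python
def prefix_scan(x):
    sums = list(x)          # sums[i] starts as x[i]
    n = len(sums)
    stride = 1
    while stride < n:       # strides 1, 2, 4, ..., n/2 — one pass per loop above
        # on a PRAM this inner loop's iterations run in parallel;
        # reading from a snapshot models the simultaneous reads
        prev = list(sums)
        for i in range(stride, n):
            sums[i] = prev[i] + prev[i - stride]
        stride *= 2
    return sums

print(prefix_scan([1, 2, 3, 4]))  # → [1, 3, 6, 10]
```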
The above code is written using high level syntax. Changing it to machine level instructions results in the following:
for i = 1 to (n - 1) {
Read M[i] into R1
Read M[i-1] into R2
Write (R1 + R2) into M[i]
}
M[i] = ith memory location
Read = memory read operation
Write = memory write operation
R1, R2 = registers
The same algorithm can be written in a compact form:
for j = 0 to log n - 1 {
for i = 2^j to (n - 1) {
Read M[i] into R1         \
Read M[i - 2^j] into R2    |--> Simultaneous Read/Write, O(1)
Write (R1 + R2) into M[i] /
}
}
Time Complexity = O(log n)
Processors = O(n)
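A direct (sequentially simulated) transcription of the compact form, assuming '+' as the operation and n a power of two, might look like:

```python
import math

def prefix_sums_compact(M):
    M = list(M)
    n = len(M)
    for j in range(int(math.log2(n))):  # j = 0 .. log n - 1
        snapshot = list(M)              # models the exclusive simultaneous reads
        for i in range(2 ** j, n):      # each i is one processor's O(1) step
            R1 = snapshot[i]            # Read M[i] into R1
            R2 = snapshot[i - 2 ** j]   # Read M[i - 2^j] into R2
            M[i] = R1 + R2              # Write (R1 + R2) into M[i]
    return M

print(prefix_sums_compact([1, 1, 1, 1, 1, 1, 1, 1]))  # → [1, 2, 3, 4, 5, 6, 7, 8]
```

The outer loop runs log n times and each pass is O(1) on O(n) processors, giving the O(log n) time stated above.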
-
The read/write operations used above are "exclusive" read/write operations (matrix multiplication had used concurrent read/write).
Access time is the same for every processor because it is a shared memory model.
-
The operations occur in parallel. Hence, it is a PRAM model.
The PRAM model can be considered almost an SIMD model (which has one CU and several PUs) with masking.
As an MIMD model, the code has to be written for each processor separately.
-
We are ignoring network delays and the PRAM model is an idealistic RAM model.
Some concurrent write models:
Common model
Random model
Aggregation model
Priority model
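As an illustration (not from the lecture), the four policies can be seen as rules for resolving several simultaneous writes to one memory cell. The function name and the (processor_id, value) tuple format below are hypothetical:

```python
import random

def resolve(writes, policy):
    """writes: list of (processor_id, value) pairs, all targeting one cell."""
    if policy == "common":
        # all processors must attempt to write the same value
        values = {v for _, v in writes}
        assert len(values) == 1, "common model requires identical values"
        return values.pop()
    if policy == "random":
        # an arbitrary attempt succeeds
        return random.choice(writes)[1]
    if policy == "aggregation":
        # the attempts are combined, e.g. by summation
        return sum(v for _, v in writes)
    if policy == "priority":
        # the lowest-numbered processor wins
        return min(writes)[1]

writes = [(2, 5), (0, 7), (1, 3)]
print(resolve(writes, "priority"))     # processor 0 wins → 7
print(resolve(writes, "aggregation"))  # 5 + 7 + 3 = 15
```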
Linear Network
It employs the message passing paradigm, not the shared memory paradigm. It is a static network. The data is not read in O(1) time but has to travel through the network in a linear fashion.
So, this type of network involves the concept of "routing".
We change the "Read" operation in the above mentioned algorithm to "Route" operation to show this change.
for j = 0 to log n - 1 {
for i = 2^j to (n - 1) {
Read M[i] into R1
Route M[i - 2^j] into R2 ← "Read" changed to "Route"
Write (R1 + R2) into M[i]
}
}
Time complexity (routing a distance of 2^j takes 2^j steps in phase j):
1 + 2 + 2^2 + 2^3 + … (log n terms)
= 2^(log n) - 1
= n - 1
= O(n)
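The geometric sum can be checked numerically for a few powers of two:

```python
import math

# 1 + 2 + 4 + ... + n/2 = 2^(log n) - 1 = n - 1
for n in [4, 8, 16, 1024]:
    total = sum(2 ** j for j in range(int(math.log2(n))))
    print(n, total)  # total is always n - 1
```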
-
In the shared memory model, any processor could read from any memory location without delay. This is not the case in a linear network.
It is a static network.
-
This network is a fine granularity model.
-
Diameter is the largest of the shortest paths between any two nodes of the network.
Diameter of the above network = n - 1
-
Our aim should be to minimize this diameter.
-
If we connect the first node with the last node, the diameter changes.
New diameter = floor( n / 2 )
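Both diameters can be verified with a small BFS over each topology (an illustrative sketch, not part of the lecture):

```python
from collections import deque

def diameter(adj):
    """Largest of the shortest paths between any two nodes."""
    def farthest(src):
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return max(dist.values())
    return max(farthest(v) for v in adj)

n = 8
# linear array: node i connects to i-1 and i+1
line = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
# ring: first node also connected to the last node
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
print(diameter(line))  # n - 1 = 7
print(diameter(ring))  # floor(n / 2) = 4
```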
In the above doubling algorithm, we used a bottom-up approach. A top-down approach could also have been used.
By Rishabh Jain and Rudra Pratap Singh Nirvan
About implementation of PRAM model,
The PRAM model can be implemented using SRAM (Static Random Access Memory) blocks of a Field Programmable Gate Array (FPGA), as DRAMs do not allow concurrent access.
Uzi Vishkin, a leading computer scientist in the field of parallel processing, has given the concept of "PRAM-on-chip". The basic approach behind the concept is that an indefinite number of instructions available for concurrent execution execute immediately (Immediate Concurrent Execution).
He also gave a paradigm for programming parallel computers called XMT (Explicit Multi-Threading) and an extension of the C language called XMTC.
Sources:
http://en.wikipedia.org/wiki/Parallel_Random_Access_Machine
http://en.wikipedia.org/wiki/Uzi_Vishkin