Ma'am
Pasted below is the summary of the 10 Feb (Monday) lecture on the topic "Prefix Computation and Linear Network".
Regards
Rudra Pratap Singh Nirvan
COE 3 (6th Semester)
______________________________________________________________________________________________
Prefix Computation
It is a binary, closed and associative computation.
Let '(o)' represent a computation, and x0, x1, x2, … x(n-1) be a set of 'n' numbers. Then,
C1 = x0 [C1 = computation 1]
C2 = x0 (o) x1
C3 = x0 (o) x1 (o) x2
.
.
Cn = x0 (o) x1 (o) … (o) x(n-1)
Algorithm:
for i = 1 to (n - 1) {
sum[i] = x[i] + x[i-1]
}
for i = 2 to (n - 1) {
sum[i] = sum[i] + sum[i-2]
}
We keep adding such loops, doubling the stride each time, and get the final loop as:
for i = n/2 to (n - 1) {
sum[i] = sum[i] + sum[i - n/2]
}
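The loop sequence above can be sketched in Python. The snapshot copy stands in for the PRAM's simultaneous reads (on a real PRAM all iterations of the inner loop run at once); n is assumed to be a power of two and '+' is taken as the operation:

```python
def prefix_scan(x):
    sums = list(x)          # sums[i] starts as x[i]
    n = len(sums)
    stride = 1
    while stride < n:       # strides 1, 2, 4, ..., n/2 — one pass per loop above
        # on a PRAM this inner loop's iterations run in parallel;
        # reading from a snapshot models the simultaneous reads
        prev = list(sums)
        for i in range(stride, n):
            sums[i] = prev[i] + prev[i - stride]
        stride *= 2
    return sums

print(prefix_scan([1, 2, 3, 4]))  # → [1, 3, 6, 10]
```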
The above code is written using high level syntax. Changing it to machine level instructions results in the following:
for i = 1 to (n - 1) {
Read M[i] into R1
Read M[i-1] into R2
Write (R1 + R2) into M[i]
}
M[i] = ith memory location
Read = memory read operation
Write = memory write operation
R1, R2 = registers
The same algorithm can be written in a compact form:
for j = 0 to log n - 1 {
for i = 2^j to (n - 1) {
Read M[i] into R1         \
Read M[i - 2^j] into R2    |--> Simultaneous Read/Write, O(1)
Write (R1 + R2) into M[i] /
}
}
Time Complexity = O(log n)
Processors = O(n)
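A direct (sequentially simulated) transcription of the compact form, assuming '+' as the operation and n a power of two, might look like:

```python
import math

def prefix_sums_compact(M):
    M = list(M)
    n = len(M)
    for j in range(int(math.log2(n))):  # j = 0 .. log n - 1
        snapshot = list(M)              # models the exclusive simultaneous reads
        for i in range(2 ** j, n):      # each i is one processor's O(1) step
            R1 = snapshot[i]            # Read M[i] into R1
            R2 = snapshot[i - 2 ** j]   # Read M[i - 2^j] into R2
            M[i] = R1 + R2              # Write (R1 + R2) into M[i]
    return M

print(prefix_sums_compact([1, 1, 1, 1, 1, 1, 1, 1]))  # → [1, 2, 3, 4, 5, 6, 7, 8]
```

The outer loop runs log n times and each pass is O(1) on O(n) processors, giving the O(log n) time stated above.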
-
The read/write operations used above are "exclusive" read/write operations (matrix multiplication had used concurrent read/write).
Access time is the same for every processor because it is a shared memory model.
-
The operations occur in parallel. Hence, it is a PRAM model.
The PRAM model can be considered almost an SIMD model (which has one CU and several PUs) with masking.
As an MIMD model, the code has to be written for each processor separately.
-
We are ignoring network delays and the PRAM model is an idealistic RAM model.
Some concurrent write models:
Common model
Random model
Aggregation model
Priority model
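As an illustration (not from the lecture), the four policies can be seen as rules for resolving several simultaneous writes to one memory cell. The function name and the (processor_id, value) tuple format below are hypothetical:

```python
import random

def resolve(writes, policy):
    """writes: list of (processor_id, value) pairs, all targeting one cell."""
    if policy == "common":
        # all processors must attempt to write the same value
        values = {v for _, v in writes}
        assert len(values) == 1, "common model requires identical values"
        return values.pop()
    if policy == "random":
        # an arbitrary attempt succeeds
        return random.choice(writes)[1]
    if policy == "aggregation":
        # the attempts are combined, e.g. by summation
        return sum(v for _, v in writes)
    if policy == "priority":
        # the lowest-numbered processor wins
        return min(writes)[1]

writes = [(2, 5), (0, 7), (1, 3)]
print(resolve(writes, "priority"))     # processor 0 wins → 7
print(resolve(writes, "aggregation"))  # 5 + 7 + 3 = 15
```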
Linear Network
It employs the message passing paradigm, not the shared memory paradigm. It is a static network. The data is not read in O(1) time but has to travel through the network in a linear fashion.
So, this type of network involves the concept of "routing".
We change the "Read" operation in the above mentioned algorithm to "Route" operation to show this change.
for j = 0 to log n - 1 {
for i = 2^j to (n - 1) {
Read M[i] into R1
Route M[i - 2^j] into R2 ← "Read" changed to "Route"
Write (R1 + R2) into M[i]
}
}
Time complexity (routing a distance of 2^j takes 2^j steps in phase j):
1 + 2 + 2^2 + 2^3 + … (log n terms)
= 2^(log n) - 1
= n - 1
= O(n)
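The geometric sum can be checked numerically for a few powers of two:

```python
import math

# 1 + 2 + 4 + ... + n/2 = 2^(log n) - 1 = n - 1
for n in [4, 8, 16, 1024]:
    total = sum(2 ** j for j in range(int(math.log2(n))))
    print(n, total)  # total is always n - 1
```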
-
In the shared memory model, any processor could read from any memory location without delay. This is not the case in a linear network.
It is a static network.
-
This network is a fine granularity model.
-
Diameter is the largest of the shortest paths between any two nodes of the network.
Diameter of the above network = n - 1
-
Our aim should be to minimize this diameter.
-
If we connect the first node with the last node, the diameter changes.
New diameter = floor( n / 2 )
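Both diameters can be verified with a small BFS over each topology (an illustrative sketch, not part of the lecture):

```python
from collections import deque

def diameter(adj):
    """Largest of the shortest paths between any two nodes."""
    def farthest(src):
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return max(dist.values())
    return max(farthest(v) for v in adj)

n = 8
# linear array: node i connects to i-1 and i+1
line = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
# ring: first node also connected to the last node
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
print(diameter(line))  # n - 1 = 7
print(diameter(ring))  # floor(n / 2) = 4
```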
In the above doubling algorithm, we used a bottom-up approach. A top-down approach could also have been used.
By Rishabh Jain and Rudra Pratap Singh Nirvan
About implementation of PRAM model,
The PRAM model can be implemented using SRAM (Static Random Access Memory) blocks of a Field Programmable Gate Array (FPGA), as DRAMs do not allow concurrent access.
Uzi Vishkin, a leading computer scientist in the field of parallel processing, has given the concept of "PRAM-on-chip". The basic approach behind the concept is that an indefinite number of instructions available for concurrent execution execute immediately (Immediate Concurrent Execution).
He also gave a paradigm for programming parallel computers called XMT (Explicit Multi-Threading) and an extension of the C language called XMTC.
Sources:
http://en.wikipedia.org/wiki/Parallel_Random_Access_Machine
http://en.wikipedia.org/wiki/Uzi_Vishkin