Loop Transformations
Loop transformations are manipulations of code loops. Their purpose is to improve speed by reducing overhead or increasing the number of statements which can be executed in parallel.[1]
Loop skewing transformation:
Loop skewing takes a given nested loop and changes the indexes in such a way that, increased parallelization becomes possible.[1]
It changes the shape of the iteration space without changing the dependencies. Hence, the skewing transformation is safe.
Skew transformation matrix is given by:
Example:
Code for computation of elements in an array:
do i = 1,3
do j = 1,4
a[i,j] = ½( a[i-1,j] + a[i,j-1])
enddo
enddo
For this nested loop the distance vectors are (1,0) and (0,1).
Iteration space for this nested loop is:
Since there is flow dependence in both ith and jth dimensions, therefore we cannot parallelize the inner and outer loops. But it can be parallelized diagonally. To exploit the parallelism in the loop, first skew the inner loop and then apply the interchange transformation on the nested loop.
By applying skew transformation on the inner loop with respect to the outer loop, we get
do i = 1,3
do j = 1+i,4+i
a[i,j-i] = ½( a[i-1,j-i] + a[I,j-i-1])
enddo
enddo
Distance vectors now become: (1,1) and (0,1).
Iteration space becomes:
Due to skew transformation iteration space shape changes into rhombus shape.
Now we apply interchange transformation on the above nested loop, we get
do j = 2,7
do i = max(1 ,j-4) , min(3 ,j-1)
a[i,j-i] = ½( a[i-1,j-i] + a[i-1,j-i-1])
enddo
enddo
here the ith dimension can be parallelized, therefore the code now becomes:
do j = 2,7
doall i = max(1,j-4) , min(3 ,j-1)
a[i,j-i] = ½( a[i-1,j-i] + a[i-1,j-i-1])
enddoall
enddo
This transform is a wavefront transformation because it causes iterations along the diagonal of the original loop nest to execute in parallel.[2]
Distance vectors : (1,0) and (1,1).
In general this nested loop can be written as:
do j = imin + jmin , 2imax + jmax
doall i = max(imin ,j- jmax) , min(imax ,j- jmin)
a(i,j-i) = ½( a[i-1,j-i] + a[i-1,j-i-1])
enddoall
enddo
where:
imin : minimum no. of rows in an array
imax : maximum no. of rows in an array
jmin : minimum no. of columns in an array
jmax : maximum no. of columns in an array
Loop skewing transformation combined with loop interchanging transformation generates the wavefront transformation.
General wavefront transformation is given by:
Loop Coalescing:
Loop coalescing combines the nested loops. It converts a multi-dimensional iteration space into a one-dimensional iteration space, thus both simplifying the scheduling problem and reducing overheads.[1]
Parallelized code:
doall i = 1,n
doall j = 1,m
a[i,j] = a[i,j] + c
enddoall
enddoall
Let n=3, m=3
If we have 2 processors then
Iteration 1 (1,1) is performed in 1st processor
Iteration 2 (1,2) is performed in 2nd processor
Both the iterations 1 & 2 can be parallelized and take 1 CPU cycle.
In the next CPU cycle only the 3rd iteration (1,3) is performed either in 1st or 2nd processor, thus if the number of processors are lesser than DOP then there is a wastage of CPU cycles. Here DOP is 3.
When such problem occur compiler tries to coalescing the loop, i.e. it combines the nested loop to form 1 loop.
After coalescing the code becomes:
doall T = 1 to n*m
i = (T-1) div m + 1
j = (T-1) mod m + 1
a[i,j] = a[i,j] + c
enddoall
This approach takes half the time as before.
References:
[1] “Loop Transformations”, retrieved 6 April, 2014 from
http://cs.mwsu.edu/~passos/caesar/tutorial1/index.htm.
[2] Advanced Computer Architecture - By Kai Hwang.
By-
Urvashi Meena
Vaibhav Gautam
Corrections:
ReplyDeleteimin : Lower limit of the iteration variable i
imax : Upper limit of the iteration variable i
jmin : Lower limit of the iteration variable j
jmax : Upper limit of the iteration variable j