Sunday 6 April 2014

Summary of 4th April Class

Loop Transformations

Loop transformations are manipulations of code loops. Their purpose is to improve speed by reducing overhead or increasing the number of statements which can be executed in parallel.[1]


Loop skewing transformation:

Loop skewing takes a given nested loop and changes the indexes in such a way that, increased parallelization becomes possible.[1]

It changes the shape of the iteration space without changing the dependencies. Hence, the skewing transformation is safe.

 

Skew transformation matrix is given by:


Example:

Code for computation of elements in an array:

do i = 1,3

          do j = 1,4

                   a[i,j] = ½( a[i-1,j] + a[i,j-1])

          enddo

enddo

For this nested loop the distance vectors are (1,0) and (0,1).

Iteration space for this nested loop is:


Since there is flow dependence in both ith and jth dimensions, therefore we cannot parallelize the inner and outer loops. But it can be parallelized diagonally. To exploit the parallelism in the loop, first skew the inner loop and then apply the interchange transformation on the nested loop.   


By applying skew transformation on the inner loop with respect to the outer loop, we get

do i = 1,3

          do j = 1+i,4+i

                   a[i,j-i] = ½( a[i-1,j-i] + a[I,j-i-1])

enddo

enddo


Distance vectors now become: (1,1) and (0,1).

Iteration space becomes:

Due to skew transformation iteration space shape changes into rhombus shape.


Now we apply interchange transformation on the above nested loop, we get

do j = 2,7

          do i = max(1 ,j-4) , min(3 ,j-1)

                   a[i,j-i] = ½( a[i-1,j-i] + a[i-1,j-i-1])

          enddo

enddo

here the ith dimension can be parallelized, therefore the code now becomes:

do j = 2,7

          doall i = max(1,j-4) , min(3 ,j-1)

                   a[i,j-i] = ½( a[i-1,j-i] + a[i-1,j-i-1])

          enddoall

enddo


This transform is a wavefront transformation because it causes iterations along the diagonal of the original loop nest to execute in parallel.[2]

Distance vectors : (1,0) and (1,1).

In general this nested loop can be written as:

do j = imin + jmin , 2imax + jmax

          doall i = max(imin ,j- jmax) , min(imax ,j- jmin)

                   a(i,j-i) = ½( a[i-1,j-i] + a[i-1,j-i-1])

          enddoall

enddo

where:

imin : minimum no. of rows in an array

imax : maximum no. of rows in an array

jmin : minimum no. of columns in an array

jmax : maximum no. of columns in an array

 

Loop skewing transformation combined with loop interchanging transformation generates the wavefront transformation.


General wavefront transformation is given by:


 

Loop Coalescing:

Loop coalescing combines the nested loops. It converts a multi-dimensional iteration space into a one-dimensional iteration space, thus both simplifying the scheduling problem and reducing overheads.[1]


Parallelized code:

doall i = 1,n

          doall j = 1,m

                   a[i,j] = a[i,j] + c

          enddoall

enddoall


Let n=3, m=3

If we have 2 processors then

Iteration 1 (1,1) is performed in 1st processor

Iteration 2 (1,2) is performed in 2nd processor

Both the iterations 1 & 2 can be parallelized and take 1 CPU cycle.

In the next CPU cycle only the 3rd iteration (1,3) is performed either in 1st or 2nd processor, thus if the number of processors are lesser than DOP then there is a wastage of CPU cycles. Here DOP is 3.

When such problem occur compiler tries to coalescing the loop, i.e. it combines the nested loop to form 1 loop.

After coalescing the code becomes:

doall T = 1 to n*m

          i = (T-1) div m + 1

          j = (T-1) mod m + 1

          a[i,j] = a[i,j] + c

enddoall

This approach takes half the time as before.

 

References:

[1] “Loop Transformations”, retrieved 6 April, 2014 from

http://cs.mwsu.edu/~passos/caesar/tutorial1/index.htm.

[2] Advanced Computer Architecture - By Kai Hwang.

 

By-

Urvashi Meena

Vaibhav Gautam

1 comment:

  1. Corrections:
    imin : Lower limit of the iteration variable i

    imax : Upper limit of the iteration variable i

    jmin : Lower limit of the iteration variable j

    jmax : Upper limit of the iteration variable j

    ReplyDelete