LOOP OPTIMIZATION
Loop optimization is the process of the increasing execution speed and reducing the overheads associated of loops. It plays an important role in improving cache performance and making effective use of parallel processing capabilities .Loop optimization can be viewed as the application of a sequence of specific loop transformations.
TILING
Loop tiling, also known as loop blocking, or strip mine and interchange, is a loop optimization technique used by compilers to make the execution of certain types of loops more efficient. It reduce the synchronization cost and improve the data locality of parallelized loops. It transforms the loop such that arrays are processed in blocks fitting into the cache thus eliminating cache capacity misses and reducing cache line interference. Tiling improves the data locality of numerical algorithms. Tiling can be used at different levels of memory hierarchy such as physical memory,caches and registers. Multilevel tiling can be used to achieve locality at multiple levels of memory hierarchy simultaneously.
Example of matrix multiplication,
Do i =1, N
Do j=1, N
Do k=1, N
C( i,k) =C(I,k) +A(i,j) x B(j,k)
Enddo
Enddo
Enddo
In this code, although the same rows of C and B are reused in the next iteration of the middle and outer loops, respectively, the large volume of data used in intervening iterations may replace the data from the register file or the cache before they can be reused. Tiling reorders the execution sequence such that iterations from loops of the outer dimensions are executed before all the iterations of the inner loop are completed. The tiled matrix multiplication is:
Do L=1,N,S
Do M=1,N,S
Do I=1,N
Do j=l,min(L+S-1,N)
Do k=m,min(M+S-1,N)
Enddo
Enddo
Enddo
Enddo
Enddo
Tiling reduces the no of intervening iterations and thus data fetched between data reuses. This allows reused data to still be in cache or register file and hence reduces memory accesses. The tile size S can be chosen to allow the maximum reuse for a specific level of memory hierarchy. For example, the tile size is revelent to the cache size used or the register file size used.
Reference: Advanced Computer Architecture by Kai Hwang
http://en.wikipedia.org/wiki/Loop_optimization
By
Tarun Kumar 352/CO/11
Udayan Yadav 355/CO/11
TILE SIZE
ReplyDeleteIt is not always easy to decide what value of tiling size is optimal for one loop because it demands an accurate estimate of accessed array regions in the loop and the cache size of the target machine. The order of loop nests (loop interchange) also plays an important role in achieving better cache performance. Explicit blocking requires choosing a tile size based on these factors. By contrast, cache-oblivious algorithms are designed to make efficient use of cache without explicit blocking.
For more information on cache-obvios algos visit
http://en.wikipedia.org/wiki/Cache-oblivious_algorithm