Factbites
 Where results make sense

Topic: Loop nest optimization



  
  Definition of Loop nest optimization
Loop Nest Optimization (LNO) was a crucial development in compiler technology that made possible large reductions in the cache bandwidth necessary for some pervasive codes.
The loop requires registers to hold both the accumulators and the loaded and reused A and B values.
That trick is reducing the size of the stripe of the B matrix by blocking the k loop, so that the stripe is of size ib x kb.
www.wordiq.com /definition/Loop_nest_optimization   (1590 words)
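As a rough illustration of the blocking idea in this excerpt, the sketch below multiplies two matrices with the i and k loops blocked, so each pass works on a stripe of at most ib x kb elements. Python is used only to show the iteration order; the function names and the block sizes passed in are assumptions for illustration, not taken from the cited article.

```python
def matmul_naive(a, b):
    """Reference triple loop: c = a * b."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_blocked(a, b, ib, kb):
    """Same product with the i and k loops blocked by ib and kb."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, ib):              # block of rows
        for kk in range(0, m, kb):          # block of the k dimension
            for j in range(p):
                for i in range(ii, min(ii + ib, n)):
                    acc = c[i][j]           # accumulator kept in a local
                    for k in range(kk, min(kk + kb, m)):
                        acc += a[i][k] * b[k][j]
                    c[i][j] = acc
    return c
```

Both functions add the k terms in the same order, so they return identical results; only the order in which memory is visited changes.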

  
  PGI User's Guide - 3 Optimization Features
Loops that are candidates for the vectorizer are countable; that is, the number of loop iterations is determined prior to the loop's execution, and the loop counter is incremented or decremented by a fixed amount at each iteration.
An expandable scalar is a scalar appearing in an innermost loop for which every use is reached by a single assignment to the scalar, or where all paths from the beginning of a loop to the scalar use contain a definition of that scalar.
When the address of a user-defined loop index is passed as an argument to a subprogram, the vectorizer must create yet another temporary to hold the address of the expression that yields the same value as the original loop index.
www.unc.edu /depts/case/pgi/pgiws_ug/pgi31u04.htm   (5603 words)

  
 SGI TPL (IRIX 6.5: Developer/Pragmas - Chapter 8. Loop Nest Optimization #pragma Directives)
Fusion is attempted on each pair of adjacent loops and the level, by default, is determined by the maximal perfectly nested loop levels of the fused loops, although partial fusion is allowed.
If the loop that this directive immediately precedes is not innermost, then outer loop unrolling (unroll and jam) is performed (version 7.0 and later).
Loops are often not unrollable in C because of potential aliasing.
techpubs.sgi.com /library/tpl/cgi-bin/getdoc.cgi/0650/bks/SGI_Developer/books/Pragmas/sgi_html/ch08.html   (1555 words)
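A minimal sketch of fusing a pair of adjacent loops, as mentioned in this excerpt. The arrays and the arithmetic are made up for illustration; the fusion is legal here because the second loop reads b[i] only after the same iteration has written it.

```python
def separate_loops(a):
    """Two adjacent loops over the same index range."""
    n = len(a)
    b = [0] * n
    c = [0] * n
    for i in range(n):
        b[i] = a[i] * 2
    for i in range(n):
        c[i] = b[i] + 1
    return b, c


def fused_loop(a):
    """The same work after fusion: one loop, b[i] is reused immediately."""
    n = len(a)
    b = [0] * n
    c = [0] * n
    for i in range(n):
        b[i] = a[i] * 2
        c[i] = b[i] + 1
    return b, c
```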

  
 National Partnership for Advanced Computational Infrastructure: Archives
If we have a loop with 100 iterations and 10 streams are available to work on it, the compiler assigns a contiguous block of 10 iterations to each stream.
Loops where the execution time of a particular iteration is data dependent can still result in load-balance problems.
A minor disadvantage is that the generated code for inner loops is not as good when the stride of the induction variables is unknown.
www.npaci.edu /MTA/tera-doc/pg/html/Parallel_loops1.html   (997 words)
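The block assignment in this excerpt (100 iterations, 10 streams, 10 contiguous iterations each) can be sketched as follows. The helper names and the cost function are hypothetical; the second helper only illustrates how a data-dependent iteration cost can leave equal-sized blocks with unequal work.

```python
def block_assignment(iterations, streams):
    """Give each stream one contiguous block of iterations.

    Assumes, for simplicity, that streams divides iterations evenly.
    """
    size = iterations // streams
    return [range(s * size, (s + 1) * size) for s in range(streams)]


def work_per_stream(blocks, cost):
    """Total cost handled by each stream for a given per-iteration cost."""
    return [sum(cost(i) for i in block) for block in blocks]
```

With a made-up cost of `lambda i: i`, the first stream does far less work than the last even though both own 10 iterations.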

  
 +O2 level optimizations
The register reassociation optimization dedicates a register to track the value of the virtual memory address expression for one or more array references in a loop and updates the register appropriately in each iteration of a loop.
The register is initialized outside the loop to the loop-invariant portion of the virtual memory address expression, and the register is incremented or decremented within the loop by the loop-variant portion of the virtual memory address expression.
After performing the register reassociation optimization, the loop variable may be needed only to control the iteration count of the loop.
docs.hp.com /en/B6056-96002/ch03s05.html   (1427 words)
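A small sketch of the idea behind register reassociation, with a Python variable standing in for the dedicated register. The names base and stride are assumptions used for illustration; a real compiler applies this to machine addresses, not Python integers.

```python
def addresses_recomputed(base, stride, n):
    """Address expression rebuilt from the loop variable on every iteration."""
    return [base + i * stride for i in range(n)]


def addresses_reassociated(base, stride, n):
    """Same addresses, tracked in one running value."""
    addr = base                 # initialized outside the loop (loop-invariant part)
    out = []
    for _ in range(n):          # loop variable only controls the iteration count
        out.append(addr)
        addr += stride          # bumped by the loop-variant part each iteration
    return out
```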

  
 Loop nest optimization - Wikipedia, the free encyclopedia
Loop nest optimization (LNO) is a special case of loop transformation that deals with nested loops and makes possible large reductions in the cache bandwidth necessary for some pervasive algorithms.
The loop requires registers to hold both the accumulators and the loaded and reused A and B values.
That trick is reducing the size of the stripe of the B matrix by blocking the k loop, so that the stripe is of size ib x kb.
en.wikipedia.org /wiki/Loop_nest_optimization   (1416 words)

  
 Loop Analysis Tools   (Site not responding. Last check: 2007-10-21)
If such a loop were parallelized, multiple copies of the loop might instantiate the function call simultaneously, trample on each other's use of any variables local to that function, or trample on return values, and generally invalidate the function's purpose.
Often loops that are labeled with this hint may also be labeled "parallelized," meaning that the compiler generated two versions of the loop (see Hint 2), and that it will be decided at runtime whether the parallel version or the serial version should be used.
The loop indices of an inner and an outer loop have been swapped, to move data dependencies as far away from the inner loop as possible, and to enable this nested loop to be parallelized.
docs.sun.com /source/806-3562/analyzingloops.html   (2959 words)
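The loop interchange described in the last hint can be sketched on a made-up example in which each element depends on its left neighbor. In the original order the dependence is carried by the inner j loop; after swapping the indices it is carried by the outer loop, so the inner iterations over i are independent of one another.

```python
def row_prefix_original(b):
    """i outer, j inner: the inner loop carries the dependence on a[i][j - 1]."""
    rows, cols = len(b), len(b[0])
    a = [row[:] for row in b]
    for i in range(rows):
        for j in range(1, cols):
            a[i][j] = a[i][j - 1] + b[i][j]
    return a


def row_prefix_interchanged(b):
    """j outer, i inner: inner iterations no longer depend on each other."""
    rows, cols = len(b), len(b[0])
    a = [row[:] for row in b]
    for j in range(1, cols):
        for i in range(rows):
            a[i][j] = a[i][j - 1] + b[i][j]
    return a
```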

  
 LoopTools
Loop fusion improves cache reuse by reducing the reuse distance between accesses to the same data or data that share the same cache line.
In a multilevel fused loop nest, the dependences may be carried by different loops within the fused loop nest.
To facilitate unroll-and-jam, the loop bounds of the core are adjusted so that the number of iterations is divisible by the unroll factor in the dimension to be unrolled.
www.hipersoft.rice.edu /looptool   (820 words)
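A sketch of unroll-and-jam with an unroll factor of 2, assuming a rectangular matrix. The core loop covers a number of outer iterations divisible by the unroll factor, as the excerpt describes, and a remainder loop handles what is left; the row-sum computation itself is only an illustrative stand-in.

```python
def row_sums_plain(m):
    """Reference loop nest: sum each row."""
    sums = [0] * len(m)
    for i in range(len(m)):
        for j in range(len(m[i])):
            sums[i] += m[i][j]
    return sums


def row_sums_unroll_and_jam(m):
    """Outer loop unrolled by 2, with the two inner loops jammed into one."""
    rows = len(m)
    cols = len(m[0]) if rows else 0
    sums = [0] * rows
    core = rows - rows % 2              # core trip count divisible by the factor
    for i in range(0, core, 2):
        s0 = s1 = 0
        for j in range(cols):           # one inner loop serves two outer iterations
            s0 += m[i][j]
            s1 += m[i + 1][j]
        sums[i], sums[i + 1] = s0, s1
    for i in range(core, rows):         # remainder iterations outside the core
        for j in range(cols):
            sums[i] += m[i][j]
    return sums
```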

  
 TJS paper
Loop transformations are widely used by automatic parallelizers to generate efficient code for a variety of high performance computers [1,4,20,24].
denote the nest vector of a loop nest.
The loop lower bound is the ceiling of the maximum corresponding to the former while the upper bound is the floor of the minimum corresponding to the latter.
www.ece.lsu.edu /jxr/revised/revised.html   (5041 words)
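One way to read the bound rule in the last sentence: if a loop index must be at least every value in one set and at most every value in another, and those values may be fractional, the integer iterations run from the ceiling of the largest lower limit to the floor of the smallest upper limit. The sketch below assumes exactly that reading; the function name and the sample limits are hypothetical and do not come from the paper.

```python
import math
from fractions import Fraction


def integer_bounds(lowers, uppers):
    """Return (lo, hi): the tightest integer bounds implied by the limits."""
    lo = math.ceil(max(lowers))     # ceiling of the maximum lower limit
    hi = math.floor(min(uppers))    # floor of the minimum upper limit
    return lo, hi
```

For example, limits of 1/2 and 7/3 from below and 19/2 and 8 from above leave the integer range 3 through 8.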

  
 Code optimization - user techniques
Proper loop nesting: Operating systems (except DOS) don't have to load all of your program into memory during execution. The memory your program needs to store code or data is partitioned into pages, and the pages are read from and written to the disk as necessary.
There is overhead inherent in every loop, both upon initializing the loop and on every iteration (see the chapter on DO loops); the overhead may be larger if the loop does little, e.g.
This horrible loop nest has more overhead in loop initializing and control, but paging activity is minimized and reusing data already in cache is maximized.
www.ibiblio.org /pub/languages/fortran/ch1-9.html   (1798 words)
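Fortran stores arrays in column-major order, so the nesting order decides whether consecutive accesses are adjacent in memory. The sketch below lists the linear offsets touched by each nesting order for a small array; the function name and the "first"/"second" labels are assumptions for illustration.

```python
def column_major_offsets(rows, cols, inner_index):
    """Offsets visited when traversing a column-major rows x cols array.

    inner_index is "first" when the innermost loop runs over the first
    subscript, and "second" when it runs over the second subscript.
    """
    def offset(i, j):
        return i + j * rows         # column-major: first subscript varies fastest

    if inner_index == "first":
        return [offset(i, j) for j in range(cols) for i in range(rows)]
    return [offset(i, j) for i in range(rows) for j in range(cols)]
```

With the first subscript innermost, the offsets are consecutive, which keeps accesses within the same page and cache line for as long as possible. With the second subscript innermost, each access jumps by a whole column.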

  
 [No title]
The optimizations at this level are generally conservative, in the sense that they (1) are virtually always beneficial, (2) provide improvements commensurate to the compile time spent to achieve them, and (3) avoid changes which affect such things as floating point accuracy.
The optimizations at this level are distinguished from -O2 by their aggressiveness, generally seeking highest-quality generated code even if it requires extensive compile time.
Although the optimizations are generally safe, they may affect floating point accuracy due to rearrangement of computations.
www.spec.org /osg/cpu/flags/SGI-20000523.txt   (1784 words)

  
 Chapter 7. Using Loop Nest Optimization
One of the noncache optimizations that the LNO performs is outer loop unrolling (sometimes called register blocking, because it tends to get a small block of array values into registers for multiple operations).
Outer loop unrolling is one optimization that the LNO performs; it chooses the proper amount of unrolling for loop nests such as this matrix multiply kernel.
Loop peeling is the technique of removing iterations from the beginning and/or ending of a loop so that the index range will match that of another loop.
www.cs.wfu.edu /~torgerse/Kokua/More_SGI/007-3430-003/sgi_html/ch07.html   (6141 words)
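A minimal sketch of loop peeling as described in this excerpt: the first loop runs one iteration longer than the second, so its last iteration is peeled off to make the index ranges match, after which the two loops can share one body. The arrays and arithmetic are made up for illustration.

```python
def unpeeled(n):
    """Two loops with different index ranges (0..n and 0..n-1)."""
    a = [0] * (n + 1)
    b = [0] * n
    for i in range(n + 1):
        a[i] = i * i
    for i in range(n):
        b[i] = a[i] + 1
    return a, b


def peeled_and_fused(n):
    """Last iteration of the first loop peeled so both ranges are 0..n-1."""
    a = [0] * (n + 1)
    b = [0] * n
    for i in range(n):
        a[i] = i * i
        b[i] = a[i] + 1
    a[n] = n * n                    # the peeled iteration
    return a, b
```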

  
 SGI TPL (IRIX 6.5: Developer/OrOn2_PfTune - Chapter 7. Using Loop Nest Optimization)
By increasing the unrolling, it may be possible to convert the loop from memory-bound to floating point-bound.
Loop interchange and outer loop unrolling can be combined to solve some performance problems that neither technique can solve on its own.
Loop fission also needs to be balanced with loop fusion, which has its own benefits and liabilities.
techpubs.sgi.com /library/tpl/cgi-bin/getdoc.cgi/0650/bks/SGI_Developer/books/OrOn2_PfTune/sgi_html/ch07.html   (6205 words)

  
 Auto-vectorization in GCC - GNU Project - Free Software Foundation (FSF)
Checks the control flow properties of the loop (number of basic-blocks it consists of, nesting, single entry/exit, etc.), in order to determine whether the control flow of the loop falls within the range of loop forms that are supported by this vectorizer.
Build the loop dependence graph (for scalar and array references); Detect Strongly Connected Components (SCCs) in the graph (statements that are involved in a dependence cycle); Perform a topological sort on the reduced graph (in which each SCC is represented by a single node); Only singleton nodes w/o self dependencies can be vectorized.
Vectorize loops that can't be vectorized using the classic vectorizer (until the proper loop transformations are developed) by applying SLP vectorization (a la "Exploiting Superword Level Parallelism with Multimedia Instruction Sets" by Amarasinghe and Larsen).
gcc.gnu.org /projects/tree-ssa/vectorization.html   (5365 words)

  
 eko
Disable the loop interchange transformation in the loop nest optimizer.
This option specifies that for loops with 3-deep (or deeper) loop nests, the compiler should outer unroll the wind-down loops that result from outer unrolling loops further out.
The optimizations at this level are generally conservative, in the sense that they are virtually always beneficial, provide improvements commensurate to the compile time spent to achieve them, and avoid changes which affect such things as floating point accuracy.
www.pathscale.com /docs/eko.html   (8498 words)

  
 Strip Mining and Loop Interchange Are Not Enough
At each time step (each iteration of the t loop) at every grid point, the value of u(i) is updated by using the data at the three grid points i-1, i, and i+1 from the previous time step, t-1.
We start from the assumption that the computation is a nested loop of depth k in which there are some loop-carried dependences with fixed displacements in the index space.
We then consider the problem of determining which loop index transformations A permit the resulting index-transformed loop nest to be successfully tiled through strip mining and interchange.
www.netlib.org /utk/papers/autoblock/node4.html   (729 words)
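The access pattern in the first sentence can be sketched as a time-stepped three-point stencil. The combining rule (a plain sum) and the fixed boundary points are placeholders chosen for illustration; the excerpt specifies only which grid points from step t-1 are read.

```python
def stencil_steps(u0, steps):
    """Advance a 1-D grid: u(i) at step t reads i-1, i, i+1 from step t-1."""
    prev = list(u0)
    for _ in range(steps):                      # the t loop
        cur = prev[:]                           # boundary points stay fixed
        for i in range(1, len(prev) - 1):       # interior grid points
            cur[i] = prev[i - 1] + prev[i] + prev[i + 1]
        prev = cur
    return prev
```

Because every point reads its neighbors from the previous step, the t loop carries dependences with displacements of -1, 0, and +1 in i, which is the kind of fixed-displacement dependence the excerpt goes on to assume.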

  
 C H A P T E R 9 - Performance and Optimization
If a DO loop with a variable loop limit can be unrolled, both an unrolled version and the original loop are compiled.
Loop unrolling, especially with simple one or two statement loops, increases the amount of computation done per iteration and provides the optimizer with better opportunities to schedule registers and simplify operations.
Then, carefully analyze the loop or loop nest to eliminate coding that might either inhibit the optimizer from generating optimal code or otherwise degrade performance.
docs.sun.com /source/817-6694/9_perform.html   (2748 words)

  
 [No title]   (Site not responding. Last check: 2007-10-21)
Loop characteristics, such as which transformations were applied to the loop, are shown as part of the source code display.
This information is derived by the compiler during its optimization phase, and may not exactly match the source line numbering and source nesting.
If a loop is parallelized, it is credited with the elapsed time for all of the instances of the loop.
elvis.rowan.edu /csdoc/ForteSuite/SUNWspro/man/man1/looptool.1   (747 words)

  
 [No title]
This is the default when optimization levels -O0, -O1 and -O2 are in effect.
The number of iterations of the loop is divided by the number of threads in the team and rounded up to give the chunk size.
Loop iterations are grouped into chunks of this size and assigned to threads in order of increasing thread id (within the team).
www.spec.org /omp/results/flags/HP-Pathscale-20051026.txt   (1893 words)
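The chunking rule in this excerpt can be sketched as below: the chunk size is the iteration count divided by the number of threads, rounded up, and chunks are handed out in order of increasing thread id. The function name is hypothetical, and the sketch models only the arithmetic, not a real threading runtime.

```python
def static_chunks(iterations, threads):
    """Return, per thread id, the list of iteration numbers it is assigned."""
    chunk = -(-iterations // threads)           # ceiling division
    return [
        list(range(t * chunk, min((t + 1) * chunk, iterations)))
        for t in range(threads)
    ]
```

When the thread count does not divide the iteration count, the last thread or threads receive a shorter chunk, or none at all.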

  
 The SQLite Query Optimizer Overview
The LIKE optimization might occur if the column named on the left of the operator uses the BINARY collating sequence (which is the default) and case_sensitive_like is turned on.
The default order of the nested loops in a join is for the left-most table in the FROM clause to form the outer loop and the right-most table to form the inner loop.
Inner joins to the left and right of the outer join might be reordered if the optimizer thinks that is advantageous but the outer joins are always evaluated in the order in which they occur.
www.sqlite.org /optoverview.html   (2583 words)
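The default loop order described here corresponds to a plain nested-loop join, sketched below with rows as tuples. This models only the loop structure; the function and parameter names are assumptions and this is not how SQLite itself is implemented.

```python
def nested_loop_join(left, right, left_key, right_key):
    """Join two lists of row tuples on the given column positions."""
    out = []
    for lrow in left:               # outer loop: left-most table in FROM
        for rrow in right:          # inner loop: right-most table in FROM
            if lrow[left_key] == rrow[right_key]:
                out.append((lrow, rrow))
    return out
```

The inner table is scanned once for every row of the outer table, which is why the choice of which table forms the outer loop can matter.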

  
 SGI TechPubs Library Display (f90.z)   (Site not responding. Last check: 2007-10-21)
Optimizations performed at this level are almost always beneficial.
Optimizations performed at this level may generate results that differ from those obtained when -O2 is specified.
The optimizations performed may differ from release to release and among the supported platforms.
www.hipecc.twsu.edu /f90.htm   (5589 words)

  
 GCC Summit
We will describe a loop optimization infrastructure based on improved induction variable, scalar evolution, and data dependence analysis.
A matrix-based transformation method for rearranging loop nests to optimize locality, cache reuse, and remove inner loop dependencies (to help vectorization and parallelization).
This method can perform any legal combination of loop interchange, scaling, skewing, and reversal to a loop nest, and provides a simple interface to doing it.
www.gccsummit.org /2004/view_abstract.php?content_key=9   (272 words)

  
 IPDPS 2001
We will address questions that must be answered by compiler researchers and students in deciding on a compiler infrastructure to be used in their work, or by others who must use compilers, especially for development of high-performance parallel software.
The tutorial will provide a general overview of the Pro64 infrastructure, in particular the optimization strategy and methods.
We will emphasize loop nest optimization and parallelization, but will also address traditional global optimization and code generation.
www.ipdps.org /ipdps2001/2001_tutorial5.html   (125 words)

  
 Gelato :: Community
The main idea is applying divide and conquer recursively to optimize for unknown memory hierarchy.
I guess the lesson here is that how compilers optimize and how users write their high-level programs somehow need to be correlated.
The principle of Loop Nest Optimization is of course to
www.gelato.org /community/answer_threaded.php?id=1_934   (1549 words)

  
 [No title]   (Site not responding. Last check: 2007-10-21)
Third bullet: The loop can be run in parallel if M > N, because the array reference does not overlap.
However, the MIPSpro APO does not know the value of the variables and therefore cannot make the loop parallel.
Array IB is asserted to be a permutation array for both loops in SUB1() in this example.
www.hipecc.twsu.edu /training/APO.ppt   (458 words)

  
 Absoft Fortran Compilers, Debuggers: Windows Macintosh Linux   (Site not responding. Last check: 2007-10-21)
Absoft Fortran compilers for 32-bit and 64-bit Linux combine state-of-the-art code generation and optimization technology with solid reliability and the industry's most complete list of tools and libraries into a single package with prices starting as low as $299 for Absoft Fortran Express.
New Absoft Fortran Compilers for Linux use state-of-the-art optimization technology to deliver superior execution speed and the best real-world application performance on x64 AMD® Opteron and Intel® Xeon™ processors based on Linux systems.
To ensure customers obtain maximum performance, options are suggested for optimizations and compatibility.
www.absoft.com /Products/Compilers/Fortran/Linux/fortran95   (2747 words)

  
 SGI - Developer Central: IRIX Development Tools: Languages: Automatic Parallelization Option
Beginning with the 7.2 release of the MIPSpro compilers, automatic parallelization (analysis and restructuring of loops) for -n32 and -64 binaries on -mips3 and -mips4 platforms can be incorporated into the other optimizations performed by the MIPSpro back end.
Although they suffer a slight run-time performance penalty on single-processor systems, parallelized programs can be created and debugged on any SGI® system with the AP and 7.2, 7.3, and 7.4 compilers.
Starting with the 7.2 release, the MIPSpro auto-parallel compilers integrated automatic parallelization, provided by the AP, with the other compiler optimizations, such as interprocedural analysis (IPA) and loop nest optimization (LNO).
www.sgi.com /products/software/irix/tools/apo.html   (312 words)

  
 [No title]
To report the experience at the University of Delaware using this infrastructure for academic research.
Part II: Back-end optimization: overview and illustration with a case study.
Discuss capabilities of IPA, Loop Nest Optimization and Parallelization (LNO), Global Optimization (WOPT), and Code Generation (CG).
www.ee.udel.edu /~hu/tutorial.html   (182 words)

  
 TACC > TACC Talk: Bob Blainey; Production Compiler Technology at IBM
Bob will discuss some of the compiler externals (what the compiler does to optimize your code) and then present some notable parts of the internal structure of the compiler (how and why the compiler transforms your code).
Bob is an expert in compiler optimization with specific interest in the areas of whole program analysis, loop nest optimization, dynamic compilation, and the interaction of compilers with processor architecture and microarchitecture.
Bob graduated from the University of Toronto in 1988 and has been working at IBM in the area of compilation technology for 12 years.
www.tacc.utexas.edu /general/news/archive/20021106_01.php   (206 words)

Copyright © 2005-2007 www.factbites.com Usage implies agreement with terms.