Unrolling AltiVec, Part 3: Down and dirty loop optimization (Site not responding. Last check: 2007-10-24)
The Unrolling AltiVec series has looked at the history and usage of the SIMD components of the PowerPC architecture; these go by various names, including Velocity Engine and VMX, but Freescale's name for them is AltiVec, which is the name the previous installments use.
It could be unrolled further, but when you're performing the same operation on a 32-bit value four times in a row, that suggests that maybe it'd be more efficient to use AltiVec.
The time to run this version of the loop is only about a million microseconds -- not exactly the huge speedup AltiVec should offer when well-tuned, but it's nice to note that, even without a serious effort at further unrolling, performance has improved.
www.ibm.com/developerworks/power/library/pa-unrollav3/index.html?ca=drs-tp1405 (2921 words)
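
The excerpts above mention a loop that performs the same operation on a 32-bit value four times in a row. The following is a minimal sketch of that pattern in portable C, not the article's own code: the function names `add_scalar` and `add_unrolled4` and the choice of "add a constant" as the operation are assumptions made purely for illustration. Four 32-bit values fill one 128-bit AltiVec register, which is why a body of this shape hints that a vector instruction might do the same work in one step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical baseline: apply the operation to one element per iteration. */
void add_scalar(uint32_t *data, size_t n, uint32_t k)
{
    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        data[i] += k;
    }
}

/* Hypothetical unrolled version: the same operation on four consecutive
 * 32-bit values per iteration. Each group of four is the amount of data
 * a single 128-bit AltiVec register would hold. */
void add_unrolled4(uint32_t *data, size_t n, uint32_t k)
{
    assert(data != NULL || n == 0);
    size_t i = 0;

    /* Main body: process full groups of four elements. */
    for (; i + 4 <= n; i += 4) {
        data[i]     += k;
        data[i + 1] += k;
        data[i + 2] += k;
        data[i + 3] += k;
    }

    /* Tail: handle the 0 to 3 elements left over. */
    for (; i < n; i++) {
        data[i] += k;
    }
}
```

Both functions should produce identical results for any length; the unrolled form only reduces loop overhead and exposes the four-wide pattern. Whether it is actually faster depends on the compiler and hardware, so it would need to be timed rather than assumed.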