| |
| | CS267: Notes for Lecture 2 (part 2), Jan 18, 1996 |
 | | When unblocked matmul is compiled with no optimization, the speed is a disappointing 4.5 Mflops, under 2% of the machine's 266 Mflops peak speed. |
 | | DMR compiled this way gave erroneous answers for n>64 and still reached only 186/266=70% of the peak machine speed. The erroneous answers were a bug, not a feature, and are meant as an early warning that leading-edge technology is not always as reliable as less ambitious technology! |
 | | There is yet another optimization level, not illustrated here, which would "pattern match" to discover that the unblocked implementation is really doing matrix multiplication and replace everything with a call to ESSL's dgemm, which then nearly attains peak speed. |
| www.cs.berkeley.edu/~demmel/cs267/lecture03.html (3170 words) |
|
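For readers who want to see what "unblocked matmul" refers to, here is a minimal sketch in Python. It is not the code timed in the lecture: the function names `matmul_unblocked`, `matmul_blocked`, and `fraction_of_peak`, and the block-size parameter `bs`, are all assumptions made for illustration, and pure Python will not come anywhere near the Mflops figures quoted above. The blocked variant is included only as a plausible contrast to the unblocked loop; the helper simply redoes the 186/266 arithmetic.

```python
def matmul_unblocked(a, b):
    """Plain triple-loop product C = A*B for square n-by-n lists of lists."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c


def matmul_blocked(a, b, bs=2):
    """Same product, computed one bs-by-bs tile at a time (hypothetical bs)."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, bs):
        for jj in range(0, n, bs):
            for kk in range(0, n, bs):
                # Update the (ii, jj) tile of C using the (ii, kk) tile of A
                # and the (kk, jj) tile of B; min() handles ragged edges.
                for i in range(ii, min(ii + bs, n)):
                    for j in range(jj, min(jj + bs, n)):
                        s = c[i][j]
                        for k in range(kk, min(kk + bs, n)):
                            s += a[i][k] * b[k][j]
                        c[i][j] = s
    return c


def fraction_of_peak(mflops, peak_mflops):
    """Achieved speed as a fraction of the machine's peak speed."""
    return mflops / peak_mflops
```

The two routines perform the same 2n^3 floating-point operations; blocking only reorders them so that each tile can be reused while it is still in fast memory, which is why it matters on a real machine even though it makes no difference to the result.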