Factbites
 Where results make sense

Topic: MFLOPs


  
 [No title]   (Site not responding. Last check: 2007-10-22)
MFLOPS ratings are provided for each module, but the program's overall results are summarized by the MFLOPS(1), MFLOPS(2), MFLOPS(3), and MFLOPS(4) outputs.
The minimum MFLOPS rating is 15.1 MFLOPS for module 7, which does 25% FDIVs.
The MFLOPS rating is obtained by dividing the number of floating-point instructions in the loop by the Runtime (in microseconds).
sunsite.utk.edu /ftp/usr-436-1/freebsd/distfiles/flops.doc   (800 words)
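
The division described in this excerpt can be sketched as follows. This is a minimal illustration, not code from flops.doc; the function name and the sample figures are assumptions chosen so that the result matches the 15.1 MFLOPS quoted above.

```python
def mflops(fp_ops: int, runtime_us: float) -> float:
    """Operations divided by runtime in microseconds gives millions of ops per second."""
    if runtime_us <= 0:
        raise ValueError("runtime must be positive")
    return fp_ops / runtime_us


# Hypothetical figures: 3,020,000 floating-point operations in 200,000 microseconds.
rating = mflops(3_020_000, 200_000.0)
print(f"{rating:.1f} MFLOPS")
```

Dividing by microseconds rather than seconds is what removes the explicit factor of one million from the formula.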

  
 What is Whetstone? CPU Benchmark Integer Performance Mflops   (Site not responding. Last check: 2007-10-22)
A comparison of the single processor Whetstone performance on a variety of machines, including vector supercomputers, minisupers, super-workstations and workstations, together with that obtained on a number of vector CPUs and on single nodes of various MPP machines.
Data provided includes the Mflop performance on a variety of floating point vector loops (VL =1024), together with the total CPU time to execute the benchmark, and the MWips performance.
The primary aim of this benchmark is to provide a performance measure of both floating point (FP) and integer arithmetic; thus while trends in the VL Mflop ratings are of interest, only a small part of the total CPU time is actually involved in these operations.
bugclub.org /beginners/processors/whetstone.html   (210 words)

  
 GLOSSARY - MIPS, MOPS, MFLOPS, Performance estimation   (Site not responding. Last check: 2007-10-22)
MIPS and MOPS are usually specified for fixed point (Integer) processors while MFLOPS are specified for floating point processors.
The processor performance required for a particular application may be estimated by calculating or estimating (dummy code which counts each operation performed to execute the algorithm) the number of operations required to implement the algorithm and how many times a second the algorithm needs to run.
However, for computer-type processors, the specified MOPS or MFLOPS will usually need derating by a factor of 3, 4 or even more, because even optimized code cannot sustain the peak performance of the processor, which is the specified value.
www.magma.ca /~masonjl/Glossary/glos2ayb.htm   (332 words)
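
The estimation procedure in this excerpt (count the operations per run, multiply by the required runs per second, then derate) might look like the sketch below. The function name and the workload figures are hypothetical; only the derating factor of 3 comes from the excerpt.

```python
def required_peak_mops(ops_per_run: float, runs_per_second: float,
                       derating: float = 3.0) -> float:
    """Estimate the specified (peak) MOPS or MFLOPS a processor needs.

    sustained = operations per run * runs per second, in millions per second;
    the peak rating must exceed that by the derating factor.
    """
    sustained_mops = ops_per_run * runs_per_second / 1e6
    return sustained_mops * derating


# Hypothetical algorithm: 50,000 operations per run, 1,000 runs per second.
print(required_peak_mops(50_000, 1_000))         # derated by 3
print(required_peak_mops(50_000, 1_000, 4.0))    # derated by 4
```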

  
 [No title]
For data in the cache, the integer units are each capable of loading or storing in one cycle one integer register, one floating-point register, or even two floating-point registers using a so-called quad-load or quad-store instruction.
The number of floating-point operations in the factorization was 1630 million, giving a computational rate of about 175 MFLOPS for the numerical factorization alone and 104 MFLOPS for the entire solution.
The relative improvement due to reordering alone is larger than on the POWER2 machine (for the same matrices), because of the smaller size of the primary cache on the UltraSPARC I. Replacing integer indices with pointers did not improve performance, and neither did prefetching; also, these optimizations did not slow down the algorithm.
www.research.ibm.com /journal/rd/416/toledo.txt   (8316 words)
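
The two rates quoted here imply run times, which a short calculation can recover. This sketch assumes both rates were computed from the same 1630 million operations, which the excerpt suggests but does not state outright; since the rates are given as approximate, so are the times.

```python
def seconds_from_rate(mflop_count: float, mflops_rate: float) -> float:
    """Millions of operations divided by millions of operations per second."""
    return mflop_count / mflops_rate


factorization_s = seconds_from_rate(1630, 175)   # numerical factorization alone
whole_solution_s = seconds_from_rate(1630, 104)  # entire solution
print(f"factorization: about {factorization_s:.1f} s")
print(f"entire solution: about {whole_solution_s:.1f} s")
```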

  
 What is MFLOPS? Million Floating Point Operations Per Second
People often use MFLOPS to mean different things, but a general definition would be the number of full word-size fp multiply operations that can be performed per second (the M stands for 'Million').
There are just a wide variety of benchmarks and one must use the most appropriate test as a basis for decision making.
All this talk of MFLOPS is fine, but it misses one very important point: memory bandwidth.
bugclub.org /beginners/math/mflops.html   (444 words)

  
 Performance Tuning   (Site not responding. Last check: 2007-10-22)
FPU should be waiting on the other and we should expect a performance near the theoretical peak of 1500 MFLOPS (million floating-point operations per second).
MFLOPS at -O2 MFLOPS at -O3 MFLOPS at -O4 pipe1.f
Although the MFLOPS rating is a good indicator of the overall performance of a routine, it is misleading in this case.
www-rcd.cc.purdue.edu /Performance/power3/tuning.html   (3784 words)

  
 CRAY Performance Monitoring
With 8 processors, the peak performance of the J90 is 0.2 x 8 = 1.6 GFLOPS.
Counting the number of operations in a large code is, of course, impossible.
If it is "low" (a MFLOP value below 50 is very poor, and a value above 250 is very good), that is the routine you should look into to get the most benefit for your optimization effort.
www.pdcl.eng.wayne.edu /training/PerfMon/PerfMon.html   (3118 words)
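
The peak figure and the rule of thumb in this excerpt can be written down directly. The thresholds (50 and 250) and the per-processor peak (0.2 GFLOPS) are taken from the text; the function names and the "in between" label are assumptions added for illustration.

```python
def peak_gflops(per_processor_gflops: float, processors: int) -> float:
    """Machine peak is the per-processor peak times the processor count."""
    return per_processor_gflops * processors


def rate_routine(mflops: float) -> str:
    """Apply the excerpt's rule of thumb to one routine's measured MFLOPS."""
    if mflops < 50:
        return "very poor"
    if mflops > 250:
        return "very good"
    return "in between"  # the excerpt gives no label for this range


print(peak_gflops(0.2, 8))
for value in (30, 120, 300):
    print(value, rate_routine(value))
```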

  
 DRAG Benchmarking
The greater the Mflops and the lower the ms/fft, the better the performance.
The intention of this study was to measure the Mflops performance on virtual machine (Xen virtual machine in our experiments) against that on native Linux by running the DRAG benchmark.
For both the Mflops and ms/fft results presented above, we observe that the performance variation in both Xen domain 0 and the user domain is less than 3% compared with native Linux.
people.cs.uchicago.edu /~hai/vm1/drag   (652 words)

  
 Linpack benchmark
I ran this calculation on a Mac G5 a while back, and I got INF for the MFLOPS rate because the entire calculation finished within a single timing clock-tick.
On a 2GHz Mac G5, for example, you can see 7500 MFLOPS computation rates for dgemm(); the theoretical max for this hardware is 8000 MFLOPS.
As you have seen for yourself, it is about a factor of 10 slower than an optimal implementation for the linpack benchmark.
www.codecomments.com /message209059.html   (919 words)
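
The INF result mentioned in this excerpt comes from dividing by an elapsed time of zero. One common way around it is to repeat the calculation until a minimum amount of time has passed. The sketch below assumes a pure-Python dot product as a stand-in kernel, not the Linpack code under discussion; the names and the 0.2 second threshold are illustrative.

```python
import time


def measure_mflops(kernel, flops_per_call: int, min_seconds: float = 0.2) -> float:
    """Call kernel repeatedly until at least min_seconds have elapsed."""
    calls = 0
    start = time.perf_counter()
    elapsed = 0.0
    while elapsed < min_seconds:
        kernel()
        calls += 1
        elapsed = time.perf_counter() - start
    # elapsed >= min_seconds > 0 here, so the division cannot produce INF.
    return calls * flops_per_call / elapsed / 1e6


def dot_kernel(n: int = 1000) -> float:
    """Stand-in workload: one multiply and one add per element (2n flops)."""
    x = [1.0] * n
    y = [2.0] * n
    total = 0.0
    for a, b in zip(x, y):
        total += a * b
    return total


print(f"{measure_mflops(dot_kernel, 2 * 1000):.2f} MFLOPS (interpreted Python)")
```

Reading the clock on every iteration adds a little overhead, and the list setup inside the kernel is timed as well, so the printed rate understates the arithmetic itself. For a rough sketch that trade-off seemed acceptable.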

  
 MFLOPS - OneLook Dictionary Search
MFLOPS : Butterfly Glossary (networking terminology) [home, info]
MFLOPS : Rane Professional Audio Reference [home, info]
Words similar to MFLOPS: megaflops, million floating point operations per second, more...
www.onelook.com /cgi-bin/cgiwrap/bware/dofind.cgi?word=MFLOPS   (137 words)

  
 Latest system benchmarks - Discussion@SR
I often use MFLOPS as a benchmark because there is no way for me to buy (and/or rent/lease/borrow) every single system/processor combination that exists.
If you are actually using Sandra MFLOPS for this, then you seriously misunderstand floating point performance, and if you are also doing heavy CFD, etc then you have a problem..
Finally, MFLOPS is a weird metric, because not all operations are equal (fp divide is much slower than fp add, and is not always pipelined) - the instruction mix can affect your MFLOPS greatly.
forums.storagereview.net /index.php?act=ST&f=2&t=23425   (5540 words)
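
The remark about instruction mix can be made concrete with a simple model: if each kind of operation has its own sustained rate, the rate for a mix is the weighted harmonic mean of those rates. The rates below (100 MFLOPS for adds, 10 MFLOPS for divides) are hypothetical, and the model ignores pipelining and overlap between operation types.

```python
def effective_mflops(mix):
    """mix is a list of (fraction_of_ops, mflops_for_that_op_alone) pairs."""
    total_fraction = sum(fraction for fraction, _ in mix)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    # Seconds needed per million operations of this mix.
    seconds_per_mflop = sum(fraction / rate for fraction, rate in mix)
    return 1.0 / seconds_per_mflop


adds_only = effective_mflops([(1.0, 100.0)])
with_divides = effective_mflops([(0.75, 100.0), (0.25, 10.0)])
print(f"adds only: {adds_only:.1f} MFLOPS")
print(f"25% divides: {with_divides:.1f} MFLOPS")
```

Under these assumed rates, a quarter of the operations being divides cuts the overall rating to less than a third of the add-only figure.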

  
 Manual Page - PAPI_flops(3)
It is usually calculated directly from a counter measurement and may be different from platform to platform.
Mflop/s, or millions of floating point operations per second, is intended to represent the number of floating point arithmetic operations per second.
Attempts are made to massage the counter values to produce the theoretically expected value by, for instance, doubling FMA counts or subtracting floating point loads and stores if necessary.
icl.cs.utk.edu /projects/papi/files/html_man3/papi_flops.html   (331 words)
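
The counter adjustments described in this excerpt amount to simple arithmetic on raw counts. The sketch below does not call PAPI and is not its implementation; the function and parameter names are hypothetical, and it assumes a counter that counts each FMA once and may also include floating-point loads and stores.

```python
def adjusted_flop_count(fp_counted: int, fma_counted: int = 0,
                        fp_loads_stores_counted: int = 0) -> int:
    """Return a flop count closer to the theoretically expected value.

    Adding fma_counted once more doubles the FMA contribution (one multiply
    plus one add); loads and stores are removed because they are not arithmetic.
    """
    return fp_counted + fma_counted - fp_loads_stores_counted


def mflops_from_counters(flop_count: int, elapsed_us: float) -> float:
    """Operations divided by elapsed microseconds gives Mflop/s."""
    return flop_count / elapsed_us


# Hypothetical raw values read from a hardware counter.
flops = adjusted_flop_count(1_000_000, fma_counted=200_000,
                            fp_loads_stores_counted=100_000)
print(flops, mflops_from_counters(flops, 50_000.0))
```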

  
 The Performance of the Intel TFLOPS Supercomputer, Performance Tracking (Intel Technology Journal)
The performance per node is reported in MFLOPS for a 4-node multiplication of order 300 matrices.
The compiled code is a factor of two slower than the assembly code, which is not unusual compared to Fortran compilers on other high-end workstations.
Our analysis showed that this kernel should run somewhere between 110 MFLOPS to 130 MFLOPS (depending on the state of the L2 cache prior to the kernel's operation).
developer.intel.com /technology/itj/q11998/articles/art_2e.htm   (626 words)

  
 [No title]
% rmv: packfile threads locks mflops secs [comp/comm/%] mflops/secs [min/max/%]
comm is the time spent reducing the vectors to a single vector.
% mmv: packfile mflops secs [comp/comm/%] mflops/secs [min/max/%]
% hmv: packfile mflops secs [comp/comm/%] mflops/secs [min/max/%]
comm is the time spent assembling the partially assembled output vectors.
However, because the number of nodes on the boundaries of subdomains is small relative to the number of nodes in the interiors of subdomains, the Mflops/sec numbers are fine for rough qualitative comparisons.
www.cs.cmu.edu /People/quake/spark98/spark98/README   (1692 words)

  
 libperfmon Tutorial: Example Instructions
The time it took to do the print of the first Mflops is thus included in the result of the second printing of Mflops.
Add to that the extremely short execution time of this test program, and the result is that the print time severely perturbs the time of the overall program execution.
Thus, while the printmflops() routine correctly samples the Mflops counter before any of these internal floating point operations are performed, they are still accumulated and added to the overall sum if the counter is not stopped.
www.sandia.gov /ASCI/Red/usage/tutorial/perfmon/example/instructions.html   (340 words)

  
 Dunigan's Sparc3 Testing
The 14th kernel is a rough estimate of peak FORTRAN performance since it has a high re-use of operands.
The following table compares the performance (Mflops) of a simple FORTRAN matrix (REAL*8 400x400) multiply compared with the performance of DGEMM from the vendor math library (-lcxml for the Alpha, -lessl for the SP, -lsunperf for the Sparc).
Also the Mflops for 1000x1000 Linpack are reported from netlib.
www.csm.ornl.gov /~dunigan/sparc3   (1973 words)

  
 MFLOPS - Million Floating point Operation Per Second
It is used to measure the number of arithmetic operations that a computer can perform in one second -- fast computers can do larger numbers of MFLOPS.
A loop, for instance, that runs at 24 MFLOPS executes 24 million floating-point operations each second.
Every attempt has been made to provide you with the correct acronym for MFLOPS.
www.auditmypc.com /acronym/MFLOPS.asp   (182 words)

  
 [No title]
There I got in-cache performance of about 1000 to 1200 Mflops (for matrices of size 100 or so, adding up a thousand times to get a long enough time to clock).
And the repeats first caused the x1, x2, etc. to overflow the floating point numbers, so the initial rate was 2 Mflops (of meaningless flops).
Then the rate was about 1100 Mflops for the intel ifc compiler.
www.ncsu.edu /itd/hpc/Courses/3incache.html   (2670 words)

  
 Replicated VLSI to Supercomputing
The Caltech Cosmic Cube is a demonstration six-dimensional hypercube using 0.05 MFLOP nodes based on the Intel 8086/8087.
Given 10 MFLOPs (and 2 watts) per chip-scale module, total performance of one gigaflop will be possible from only a few hundred chips dissipating a few hundred watts.
It must be factored to derive the pattern seen by the radar detectors; this factoring requires almost three trillion floating-point operations and exceeds 99 percent of the total run.
www.scl.ameslab.gov /Publications/Gus/Replicated/ReplicatedVLSI.html   (7303 words)

  
 [No title]
In the serial case, even the smallest loop sizes run at the "peak" rate of 1300 Mflops.
In the fastest parallel cases (where the data fits in cache and the loop sizes are large enough to amortize the cost of the PARALLEL DO), the peak rate is 2300 Mflops (not quite twice as fast as the serial case).
We see that for small enough loops the parallel bottleneck is the time to execute the PARALLEL DO (fork and then join with implicit barrier).
www.ncsu.edu /itd/hpc/Courses/10shared.html   (2461 words)

  
 QG_GYRE and Memory Bandwidth
The STREAM Triad MFLOPS represent the sustainable memory bandwidth for uncached operands.
The SPECfp92 measure is not in units of MFLOPS, but the size is very close to the "Peak MFLOPS" and so it is convenient to use it without further scaling.
The Peak MFLOPS is the "guaranteed not to exceed" specification from the manufacturer.
home.austin.rr.com /mccalpin/papers/balance/qgbox/case_study.html   (1161 words)
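
A bandwidth figure can be expressed in MFLOPS because the STREAM Triad kernel, a[i] = b[i] + q*c[i], performs 2 floating-point operations while moving 3 eight-byte words (24 bytes) per iteration. The conversion below is a sketch under that accounting; the 1200 MB/s input is a hypothetical figure, not one from the case study.

```python
BYTES_PER_ITERATION = 3 * 8   # read b[i], read c[i], write a[i], 8 bytes each
FLOPS_PER_ITERATION = 2       # one multiply and one add


def triad_mflops(bandwidth_mb_per_s: float) -> float:
    """Convert sustained Triad bandwidth (MB/s, MB = 10**6 bytes) to MFLOPS."""
    million_iterations_per_s = bandwidth_mb_per_s / BYTES_PER_ITERATION
    return million_iterations_per_s * FLOPS_PER_ITERATION


print(triad_mflops(1200.0))
```

In other words, for uncached operands the Triad MFLOPS figure is one twelfth of the sustained bandwidth in MB/s.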

  
 Introduction to DSP - Programming a DSP processor: MIPS, MOPS and Mflops
The development of efficient assembly language code shows how efficient a DSP processor can be: each assembler instruction is performing several useful operations.
But if we don't use any of these operations, we are throwing away the potential of the processor and may be slowing it down drastically.
Then the Mflops rating falls from a respectable 40 Mflops to a pitiful 20 Mflops.
www.bores.com /courses/intro/program/7_mops.htm   (528 words)

  
 CS267: Notes for Lecture 2 (part 2), Jan 18, 1996
When unblocked matmul with no optimization is used, the speed is a disappointing 4.5 Mflops, a small fraction of peak speed.
DMR compiled this way gave erroneous answers for n>64, and still got only 186/266=70% of the peak machine speed (this was a bug, not a feature, and is meant as an early warning that leading edge technology is not always as reliable as less ambitious technology!).
There is yet another optimization level, not illustrated here, which would "pattern match" to discover that the unblocked implementation was really doing matrix multiplication, and replace everything by a call to ESSL's dgemm, which then nearly attains peak speed.
www.cs.berkeley.edu /~demmel/cs267/lecture03.html   (3170 words)
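
The 186/266 figure in these notes is a fraction-of-peak calculation, which is easy to reproduce. The sketch assumes the 4.5 Mflops unblocked result was measured on the same machine with the same 266 Mflops peak; the function name is an illustration.

```python
def percent_of_peak(measured_mflops: float, peak_mflops: float) -> float:
    """Measured rate as a percentage of the machine's peak rate."""
    return 100.0 * measured_mflops / peak_mflops


print(f"optimized: {percent_of_peak(186, 266):.0f}% of peak")
print(f"unblocked, no optimization: {percent_of_peak(4.5, 266):.1f}% of peak")
```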

  
 ARSC T3D Users' Newsletter 48
Both compilers support other performance switches (the gnu C compiler presents a "switch Heaven" for those inclined), but I did not test them.
What is shown above is the harmonic mean for all 24 loops when run with loop lengths short (average 18), medium (average 89) and long (average 468).
On both compilers, there is a slight increase in the MFLOPS rate with increasing loop length.
www.arsc.edu /support/news/T3Dnews/T3Dnews48.shtml   (957 words)
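
The harmonic mean mentioned in this excerpt is the usual way to average rates when every loop is given equal weight, because it averages time per operation instead of operations per time. The per-loop rates below are hypothetical, not the newsletter's measurements.

```python
import statistics

# Hypothetical MFLOPS ratings for three loops (the newsletter uses 24).
loop_mflops = [10.0, 20.0, 40.0]

harmonic = statistics.harmonic_mean(loop_mflops)
arithmetic = statistics.mean(loop_mflops)

print(f"harmonic mean:   {harmonic:.2f} MFLOPS")
print(f"arithmetic mean: {arithmetic:.2f} MFLOPS")
```

The harmonic mean is never larger than the arithmetic mean, so one slow loop pulls the summary figure down more than one fast loop pulls it up.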

  
 MIPS/MFLOPS (CPU Performance)...
There are dozens of different processor and system benchmarks, such as SPEC, Linpack, MFLOP, STREAM, Viewperf, etc. One should always use the test that is most relevant to one's area of interest and the system concerned.
With games consoles, however, this is a bit of a problem because no one has yet made a 'games console' benchmark test - people have to use existing benchmarks which were never designed for the job.
Since I mentioned fp calculations, that leads nicely onto the MFLOPS benchmark.
www.futuretech.blinkenlights.nl /perf.html   (1482 words)

  
 Fixed Time Performance
The bottom line is consistent enough to suggest that peak bandwidth instead of peak MFLOPS is a better simplistic predictor for this application.
If peak MFLOPS correlated even weakly with observed rankings of computers, it would still be a practical tool for predicting performance.
In the early days of hypercube computers, it was common to scale the problem with the number of processors, but measure the resulting speedup in an unclear manner.
www.scl.ameslab.gov /Publications/Gus/FixedTime/FixedTime.html   (7196 words)

  
 Automatically Tuned Linear Algebra Software   (Site not responding. Last check: 2007-10-22)
We have developed a general methodology for the generation of the Level 3 BLAS and describe here how this approach is carried out and some of the preliminary results we have achieved.
Our top MFLOPS on this system was on the order of 400MFLOPS, far below both the theoretical peak and that enjoyed by the vendor-supplied BLAS.
The LIB column is overloaded to convey both the BLAS used (A for ATLAS/Superscaler, S for system, F for Fortran77), and the blocking factor chosen (for instance, A(40) in this column indicates a run using ATLAS's DGEMM, using a blocking factor of 40).
www.cs.utk.edu /~rwhaley/ATL/INDEX.HTM   (8542 words)

  
 HPC Computing Environments   (Site not responding. Last check: 2007-10-22)
Whether you should work to maximize MIPS or MFLOPS depends on the actual work done by your code as well as the platform.
For codes that perform mainly floating-point calculations, MFLOPS should be maximized, of course.
On such a platform, high MFLOPS with low MIPS is preferred, as this indicates that a large number of FLOPS are being performed using the vector hardware.
www.osc.edu /hpc/computing/metrics.shtml   (316 words)


Copyright © 2005-2007 www.factbites.com Usage implies agreement with terms.