
War of the benchmark means: time for a truce

Published: 01 September 2004

Abstract

For decades, computer benchmarkers have fought a War of Means. Although many have raised concerns about the geometric mean (GM), it continues to be used by SPEC and others. This war is an unnecessary misunderstanding due to inadequately articulated implicit assumptions, plus confusion among populations, their parameters, sampling methods, and sample statistics. In fact, all the Means have their uses, sometimes in combination. Metrics may be algebraically correct, but statistically irrelevant or misleading if applied to population distributions for which they are inappropriate. Normal (Gaussian) distributions are so useful that they are often assumed without question, but many important distributions are not normal. They require different analyses, most commonly by finding a mathematical transformation that yields a normal distribution, computing the metrics, and then back-transforming to the original scale. Consider the distribution of relative performance ratios of programs on two computers. The normal distribution is a good fit only when variance and skew are small, but otherwise generates logical impossibilities and misleading statistical measures. A much better choice is the lognormal (or log-normal) distribution, not just on theoretical grounds, but through the (necessary) validation with real data. Normal and lognormal distributions are similar for low variance and skew, but the lognormal handles skewed distributions reasonably, unlike the normal. Lognormal distributions occur frequently elsewhere, are well understood, and have standard methods of analysis. Everyone agrees that "Performance is not a single number," ... and then argues about which number is better. It is more important to understand populations, appropriate methods, and proper ways to convey uncertainty.
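The contrast between a normal and a lognormal summary of skewed performance ratios can be sketched in a few lines. This is only an illustration under stated assumptions: the `ratios` values are made up for the example (they are not data from the paper), and the "mean plus or minus two standard deviations" range is used as a rough descriptive interval, not as a formal confidence interval.

```python
import math
import statistics

# Hypothetical relative performance ratios of programs on two computers.
# These values are invented for illustration; they are not from the paper.
ratios = [0.5, 0.8, 1.0, 1.3, 2.0, 4.0, 8.0]

# Normal model on the original scale: mean +/- 2 sample standard deviations.
mean = statistics.mean(ratios)
sd = statistics.stdev(ratios)
normal_low = mean - 2 * sd
normal_high = mean + 2 * sd

# Lognormal model: transform to logs, compute the same statistics,
# then back-transform the interval endpoints to the original scale.
logs = [math.log(r) for r in ratios]
log_mean = statistics.mean(logs)
log_sd = statistics.stdev(logs)
lognormal_low = math.exp(log_mean - 2 * log_sd)
lognormal_high = math.exp(log_mean + 2 * log_sd)

print(f"normal range:    {normal_low:.3f} to {normal_high:.3f}")
print(f"lognormal range: {lognormal_low:.3f} to {lognormal_high:.3f}")
```

For this skewed sample the normal range extends below zero, which is impossible for a performance ratio, while the back-transformed lognormal range is positive by construction.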
When population parameters are estimated via samples, statistically correct methods must be used to produce the appropriate means, measures of dispersion, skew, confidence levels, and perhaps goodness-of-fit estimators. If the wrong Mean is chosen, it is difficult to achieve much. The GM predicts the mean relative performance of programs, not of workloads. The usual GM formula is rather unintuitive, and is often claimed to have no physical meaning. However, it is the back-transformed average of a lognormal distribution, as can be seen by the mathematical identity below. Its use is not only statistically appropriate in some cases, but enables straightforward computation of other useful statistics.

$$\mathrm{GM} = \left(\prod_{i=1}^{n} x_i\right)^{1/n} = \exp\left(\frac{1}{n}\sum_{i=1}^{n} \ln x_i\right)$$

"If a man will begin in certainties, he shall end in doubts, but if he will be content to begin with doubts, he shall end with certainties."
- Francis Bacon, in Savage [1].
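The claim that the GM is the back-transformed average of the logarithms can be checked numerically. The sketch below is a minimal illustration, not the author's code: the function names and the sample values are hypothetical, and it assumes Python 3.8 or later for `math.prod` and `statistics.geometric_mean`.

```python
import math
import statistics


def geometric_mean_product(xs):
    """Usual GM formula: the n-th root of the product of n positive values."""
    return math.prod(xs) ** (1.0 / len(xs))


def geometric_mean_log(xs):
    """Same quantity computed as the back-transformed mean of the logs."""
    mean_of_logs = sum(math.log(x) for x in xs) / len(xs)
    return math.exp(mean_of_logs)


# Hypothetical performance ratios, chosen so the GM is easy to verify by hand:
# the product is 8 and the cube root of 8 is 2.
sample = [1.0, 2.0, 4.0]

gm_product = geometric_mean_product(sample)
gm_log = geometric_mean_log(sample)
gm_stdlib = statistics.geometric_mean(sample)

print(gm_product, gm_log, gm_stdlib)
```

The log form is usually the more practical of the two, since the mean and standard deviation of the logs are then available for further statistics on the same transformed scale.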

References

[1]
Savage, S., "Some Gratuitous Inflammatory Remarks on the Accounting Industry," http://www.stanford.edu/dept/MSandE/faculty/savage/AccountingRemarks.pdf
[2]
Fleming, P., Wallace, J., "How Not to Lie With Statistics: The Correct Way to Summarize Benchmarks," Comm ACM, Vol 29, No. 3, pp. 218-221, March 1986.
[3]
Smith, J., "Characterizing Computer Performance with a Single Number," Comm ACM, Vol 31, No. 10, pp. 1202--1206, October 1988.
[4]
John, L., "More on finding a Single Number to indicate Overall Performance of a Benchmark Suite," Computer Architecture News, Vol. 32, No 1, pp. 3--8, March 2004.
[5]
Jain, R., The Art of Computer Systems Performance Analysis, John Wiley and Sons, New York, 1991.
[6]
Lilja, D., Measuring Computer Performance -- A Practitioner's Guide, Cambridge University Press, 2000.
[7]
Hennessy, J, Patterson, D., Computer Architecture -- A Quantitative Approach, Third Edition, Morgan Kaufmann Publishers, 2003. Earlier editions 1990 and 1996.
[8]
McMahon, F., "The Livermore Fortran kernels: A Computer test of numerical performance range," Tech. Rep. UCRL-55745, Lawrence Livermore National Laboratory, Univ. of California, Livermore, 1986. See also:
[9]
McMahon, F., "L.L.N.L Fortran Kernels Test" source. www.netlib.org/benchmark/livermore or www.llnl.gov/asci_benchmarks/asci/limited/lfk/README.html
[10]
Digital Review, "At the speed of light through a PRISM," Digital Review, pp. 39-42, December 19, 1988.
[11]
Spanier, S., "Sun-3 Benchmarks," Sun Microsystems, Aug 1985.
[12]
Hewlett Packard, "HP 9000 Series 800 Performance Brief," May 1987.
[13]
MIPS Computer Systems, "Performance Brief Part 1: CPU Benchmarks, Issue 3.0," October 1987.
[14]
AMD, "Am29000 Performance Analysis," May 1988.
[15]
Apollo Computer, "Apollo Performance Report Version 1.2," Digital Equipment 1988.
[16]
Digital Equipment, "RISC Workstation Performance Summary," July 11, 1989.
[17]
Hewlett Packard, "Series 300 HP-UX 6.5 Performance Brief," April 1989.
[18]
Ralph Humphries, "Performance Report, Revision 1.4, July 1, 1989," Silicon Graphics Computer Systems, Mountain View, CA.
[19]
McInnis, D., Kusik, B., Bhandarkar, D., "VAX 8800 System Overview," Proc. COMPCON 1987, pp. 316--321, San Francisco, CA, Feb 1987. Note: VAX 8800 uses two 8700 CPUs.
[20]
SPEC, SPEC Newsletter, Vol. 1, No. 1, Fall 1989.
[21]
SPEC, "Benchmark Results," SPEC Newsletter, Vol. 2, Issue 1, Winter 1990.
[22]
Mashey, J., "SPEC Results Help Normalize Vendor Mips-Ratings for Sensible Comparison," SPEC Newsletter, Vol. 2, Issue 3, Summer 1990.
[23]
Giladi, R., Ahituv, N., "SPEC as a Performance Evaluation Measure," Computer, Vol. 28, No. 8, pp. 33--42, Aug 1995.
[24]
Mighafori, N., Jacoby, M., and Patterson, D., "Truth in SPEC Benchmarks," Computer, Vol 28, No. 8, pp. 33--43, Aug 1995.
[25]
DeCoursey, W., Statistics and Probability for Engineering Applications with Microsoft Excel, Newnes, Amsterdam, 2003.
[26]
Good, P., Hardin, J., Common Errors in Statistics (and How to Avoid Them), Wiley-Interscience, Hoboken, NJ, 2003.
[27]
Limpert, E., Stahel, W., "Life is log-normal! Science and art, life and statistics," ETH Zurich, 1998, http://www.inf.ethz.ch/personal/gutc/lognormal/brochure.html
[28]
Limpert, E., Stahel, W., and Abbt, W., "Log-Normal Distributions across the Sciences: Keys and Clues," BioScience Vol 51, No. 5, pp. 341--352, May 2001. Also in: http://www.inf.ethz.ch/personal/gutc/lognormal/bioscience.pdf
[29]
SPEC, www.specbench.org



Published In

ACM SIGARCH Computer Architecture News  Volume 32, Issue 4
September 2004
41 pages
ISSN:0163-5964
DOI:10.1145/1040136
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. benchmarking
  2. geometric mean
  3. lognormal distribution

Qualifiers

  • Article


Cited By

  • (2024) A benchmark suite and performance analysis of user-space provenance collectors. Proceedings of the 2nd ACM Conference on Reproducibility and Replicability, pp. 85-95. DOI: 10.1145/3641525.3663627. Online publication date: 18-Jun-2024
  • (2024) R.I.P. Geomean Speedup Use Equal-Work (Or Equal-Time) Harmonic Mean Speedup Instead. IEEE Computer Architecture Letters 23:1, pp. 78-82. DOI: 10.1109/LCA.2024.3361925. Online publication date: 5-Feb-2024
  • (2023) Hmem: A Holistic Memory Performance Metric for Cloud Computing. Benchmarking, Measuring, and Optimizing, pp. 171-187. DOI: 10.1007/978-981-97-0316-6_11. Online publication date: 3-Dec-2023
  • (2021) Methodological Principles for Reproducible Performance Evaluation in Cloud Computing. IEEE Transactions on Software Engineering 47:8, pp. 1528-1543. DOI: 10.1109/TSE.2019.2927908. Online publication date: 1-Aug-2021
  • (2021) Interactions, Impacts, and Coincidences of the First Golden Age of Computer Architecture. IEEE Micro 41:6, pp. 131-139. DOI: 10.1109/MM.2021.3112876. Online publication date: 1-Nov-2021
  • (2021) Revisiting Issues in Benchmark Metric Selection. Performance Evaluation and Benchmarking, pp. 35-47. DOI: 10.1007/978-3-030-84924-5_3. Online publication date: 4-Aug-2021
  • (2020) A Rigorous Benchmarking and Performance Analysis Methodology for Python Workloads. 2020 IEEE International Symposium on Workload Characterization (IISWC), pp. 83-93. DOI: 10.1109/IISWC50251.2020.00017. Online publication date: Oct-2020
  • (2020) Computer comparisons in the presence of performance variation. Frontiers of Computer Science: Selected Publications from Chinese Universities 14:1, pp. 21-41. DOI: 10.1007/s11704-018-7319-2. Online publication date: 1-Feb-2020
  • (2020) Metrics. Systems Benchmarking, pp. 45-70. DOI: 10.1007/978-3-030-41705-5_3. Online publication date: 29-Aug-2020
  • (2019) Spread-n-share. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-15. DOI: 10.1145/3295500.3356152. Online publication date: 17-Nov-2019
