
Contrasting characteristics and cache performance of technical and multi-user commercial workloads

Published: 01 November 1994

Abstract

Experience has shown that many widely used benchmarks are poor predictors of the performance of systems running commercial applications. Research into this anomaly has long been hampered by a lack of address traces from representative multi-user commercial workloads. Using traces of industry-standard commercial benchmarks, this paper examines the characteristic differences between technical and commercial workloads and illustrates how those differences affect cache performance.
Commercial and technical environments differ in their branch behavior, operating system activity, I/O, and dispatching characteristics. A wide range of uniprocessor instruction and data cache geometries was studied. The instruction cache results for commercial workloads demonstrate that instruction cache performance can no longer be neglected, because these workloads have much larger code working sets than technical applications. For database workloads, a breakdown of kernel and user behavior reveals that the application component can behave much like the operating system and can therefore experience equally high miss rates. This paper also indicates that “dispatching,” or process switching, characteristics must be considered when designing level-two caches. The data presented show that increasing the associativity of second-level caches can reduce miss rates significantly. Overall, the results of this research should help system designers choose a cache configuration that will perform well in commercial markets.
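
To illustrate the kind of interaction between process switching and associativity that the abstract describes, the following is a minimal sketch of a trace-driven cache model. It is not the simulator used in the paper. The LRU replacement policy, the function names `miss_rate` and `ping_pong_trace`, the cache parameters, and the synthetic trace are all assumptions made for illustration. The trace mimics two processes that are dispatched alternately and whose working sets lie exactly one cache size apart in the address space.

```python
from collections import OrderedDict


def miss_rate(trace, cache_bytes, line_bytes, ways):
    """Miss rate of a set-associative cache with LRU replacement (assumed policy)."""
    num_sets = cache_bytes // (line_bytes * ways)
    # One ordered dict per set; insertion order tracks recency (oldest first).
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        line = addr // line_bytes
        tag = line // num_sets
        lines_in_set = sets[line % num_sets]
        if tag in lines_in_set:
            lines_in_set.move_to_end(tag)  # hit: mark as most recently used
        else:
            misses += 1
            if len(lines_in_set) >= ways:
                lines_in_set.popitem(last=False)  # evict least recently used
            lines_in_set[tag] = True
    return misses / len(trace)


def ping_pong_trace(cache_bytes, line_bytes, lines_touched, rounds):
    """Hypothetical trace: two processes alternate, one cache size apart."""
    trace = []
    for _ in range(rounds):
        for base in (0, cache_bytes):  # "process A", then "process B"
            for i in range(lines_touched):
                trace.append(base + i * line_bytes)
    return trace


# Assumed geometry: 4 KB cache, 32-byte lines, 16 lines touched per dispatch.
trace = ping_pong_trace(cache_bytes=4096, line_bytes=32, lines_touched=16, rounds=100)
for ways in (1, 2, 4):
    print(ways, miss_rate(trace, cache_bytes=4096, line_bytes=32, ways=ways))
```

In this contrived trace the direct-mapped configuration misses on every access, because each process evicts the other's lines, while the two-way configuration incurs only the initial cold misses. Real workloads are far less regular, so the sketch shows the mechanism rather than the magnitude of the effect reported in the paper.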


Published In

ACM SIGOPS Operating Systems Review, Volume 28, Issue 5
Dec. 1994
323 pages
ISSN:0163-5980
DOI:10.1145/381792

ASPLOS VI: Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
November 1994
341 pages
ISBN:0897916603
DOI:10.1145/195473

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. cache performance
  2. commercial workloads
  3. memory subsystems
  4. operating system activity
  5. technical applications

