DOI: 10.1145/3629526.3645046
research-article
Open access

Disambiguating Performance Anomalies from Workload Changes in Cloud-Native Applications

Published: 07 May 2024

Abstract

Modern cloud-native applications are adopting the microservice architecture, in which applications are deployed in lightweight containers that run inside a virtual machine (VM). Containers running different services are often co-located inside the same VM. While this enables better resource optimization, it can cause interference among applications, which can lead to performance degradation. Detecting the cause of performance degradation at runtime is crucial for choosing the correct remediation action, such as scaling or migrating. We propose a non-intrusive detection technique that differentiates between degradation caused by load and degradation caused by interference. First, we define an operational zone for the application. Then we define a disambiguation method that uses models to classify interference and normal load. In contrast to previous work, our proposed detection technique does not require intrusive application instrumentation and incurs minimal performance overhead. We demonstrate how to design effective machine learning models that generalize to detect interference from different types of applications. We evaluate our technique using realistic microservice benchmarks on AWS EC2. The results show that our approach outperforms existing interference detection techniques in F_1 score by at least 2.75% and at most 53.86%.

Cited By

  • (2024) Exploring Approaches to Integrate Performance Prediction and Anomaly Detection in Microservices Systems. 2024 34th International Conference on Collaborative Advances in Software and COmputiNg (CASCON), 1-4. DOI: 10.1109/CASCON62161.2024.10838219. Online publication date: 11-Nov-2024

Published In

ICPE '24: Proceedings of the 15th ACM/SPEC International Conference on Performance Engineering
May 2024
310 pages
ISBN: 9798400704444
DOI: 10.1145/3629526
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cloud computing
  2. interference
  3. machine learning
  4. microservice

Qualifiers

  • Research-article

Conference

ICPE '24

Acceptance Rates

Overall Acceptance Rate 252 of 851 submissions, 30%

Article Metrics

  • Downloads (Last 12 months): 299
  • Downloads (Last 6 weeks): 53
Reflects downloads up to 26 Jan 2025
