
Demystifying performance in the public cloud

Key Red Hat Enterprise Linux benchmarks

Introduction
In this fast-paced world of technological advancements, rising customer demands, and rapidly increasing competition, using a public cloud is a popular option for deploying business-critical workloads. In a recent Qualtrics survey, improved customer experience and reduced total cost of ownership were among the top outcomes that businesses expect when moving workloads to the cloud.1 However, the convenience of using public cloud infrastructure introduces some unique challenges. Customers expect excellent performance, regardless of their chosen deployment model. Linux® is a critical component of cloud infrastructure, as it is often chosen as the foundation for modern cloud services and emerging use cases.

With capabilities that facilitate uninterrupted workload migration and more efficient management, Red Hat® Enterprise Linux delivers the consistency you need to streamline how you manage hardware and workload performance across your entire hybrid cloud infrastructure. You can detect performance lag or anomalies to determine the reason behind application performance issues. Intelligent tooling helps you build a comprehensive view of overall system performance and provides user-friendly tuning of the kernel for optimum function. Use best practices for performance tuning with common tuning profiles that optimize hardware and workload performance.

Testing parameters
In 2021, Red Hat performance experts conducted extensive internal testing on public cloud
environments. This paper details the results of that testing, highlighting important factors that
affect Red Hat Enterprise Linux performance in the cloud.

Our testing measured Red Hat Enterprise Linux performance on two popular public cloud platforms
using central processing unit (CPU) and memory-bound workloads. We selected three well-known
benchmarks—LINPACK, STREAM, and SPECjbb2005—and used them to create a CPU aggregate
score that we use for price-performance comparisons. Details on performance characteristics, test
suites, benchmarks, and cloud instance types are outlined in the sections that follow. Keep in mind
that shared cloud infrastructure often has varying performance due to other workloads running on
the same systems. Your price to performance ratio could fluctuate based on other users’ activity.

facebook.com/redhatinc
@RedHat
linkedin.com/company/red-hat

1. Dan Juengst, “Insights into hybrid cloud: Here’s what to consider.” Red Hat blog, 28 May 2020.

redhat.com Detail Demystifying performance in the public cloud


Key performance characteristics and test suites
When it comes to cloud performance, several performance characteristics stand out. The complexity of the cloud requires IT teams to configure their infrastructure to respond to change without necessarily knowing what the changes will be, meaning performance considerations should take these constant shifts into account. We considered these performance features in our tests:

1. Peak load is the maximum number of concurrent operations on a server within a certain time period. Peak load measurements are important because they help enterprises properly size their systems before the busy period hits.

2. Memory bandwidth is the amount of data that can be moved to and from a given memory destination by the CPU. This metric is important because it demonstrates how quickly the operating system (OS) can get data into and out of memory for processing. If memory bandwidth is low, the processor wastes cycles waiting for memory to respond; if it is high, processor cycles are not wasted.

3. Compute throughput is the number of concurrent compute operations performed per second. Higher compute throughput means more responsive applications and a better user experience.

4. Price-performance ratio helps balance the price of the solution against its effectiveness. The lower the price-performance ratio, the better, since you get more performance value at a lower cost. Workloads that can scale out, such as containerized applications, will benefit more than workloads that can only scale up, such as monolithic single-node applications.
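As a concrete illustration, price/performance as used throughout this paper is simply cost divided by benchmark score, so lower is better. The prices and scores below are invented for the sketch, not measured figures from this study:

```python
# Price/performance ratio: on-demand price divided by benchmark score
# (lower is better). All numbers here are hypothetical placeholders.
instances = {
    # name: (on-demand $/hour, aggregate benchmark score)
    "small":  (0.38, 1.00),
    "medium": (1.52, 3.10),
    "large":  (3.04, 5.70),
}

for name, (price, score) in instances.items():
    ratio = price / score
    print(f"{name}: {ratio:.3f} $/hr per unit of performance")
```

With these made-up numbers, the medium and large instances cost 4x and 8x the small one but deliver only about 3.1x and 5.7x the performance, so the small instance has the best (lowest) ratio, a pattern the results below repeatedly show.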

There are hundreds, if not thousands, of programs that stress the CPU and memory components of a
system in different ways. When selecting which benchmarks to try, there are several considerations.
The benchmarks should:

• Be standardized and well-known.
• Scale as the systems grow.
• Be predictable and have low variances.
• Be simple to set up.

Benchmarks
LINPACK

The LINPACK benchmark solves a dense system of linear equations focused on floating-point
compute capabilities of the CPU. As all of the instance types in this document are based on Intel
CPUs, we will use the version of LINPACK that Intel ships as part of its Intel Math Kernel Library.
Other CPU types in future work will use a version of LINPACK optimized for their architecture.

STREAM

STREAM is a simple synthetic benchmark program that measures sustainable memory bandwidth (in
MB/s). This benchmark is run by increasing the load starting at one thread until there are two threads
per virtual central processing unit (vCPU) in the system. We test four separate sets of operations
and we include all four in our aggregate score because each highlights a slightly different aspect
of performance.
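For intuition, the arithmetic STREAM uses to turn bytes moved into MB/s can be sketched in a few lines of Python. This is only an illustration of the Copy kernel (16 bytes per element: 8 read, 8 written); the real benchmark is a compiled C program and reports much higher, more stable numbers:

```python
import array
import time

# STREAM-style "Copy" kernel sketch (illustrative only; the real STREAM
# benchmark is compiled C and should be used for actual measurements).
N = 2_000_000                        # 2M doubles = 16 MB per array
a = array.array("d", bytes(8 * N))   # zero-filled arrays of doubles
b = array.array("d", bytes(8 * N))

best = float("inf")
for _ in range(5):                   # report the best of several trials
    t0 = time.perf_counter()
    b[:] = a                         # copy kernel: b[i] = a[i]
    best = min(best, time.perf_counter() - t0)

# Copy moves 16 bytes per element (8 bytes read + 8 bytes written)
mb_per_s = 16 * N / best / 1e6
print(f"Copy bandwidth: {mb_per_s:.0f} MB/s")
```

The slice assignment runs as a single C-level copy, so the timing loop measures memory traffic rather than Python interpreter overhead.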



SPECjbb2005

SPECjbb2005 is the now-retired Standard Performance Evaluation Corporation (SPEC) benchmark for measuring the performance of server-side Java™. The benchmark is run by increasing the load starting at one thread until there are two threads per CPU in the system. Rather than reporting the official SPECjbb2005 metric, which is measured from (# of CPUs) to (2 * # of CPUs), we choose to report the peak throughput, as we have found it is more relevant to the loads customers will run.

CPU aggregate

The CPU aggregate score is calculated as the geometric mean of the above benchmarks. This score
allows us to calculate a single metric per system from the results of multiple benchmarks that are
measured using different metrics without any one benchmark overwhelming the rest of the results.
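The paper does not publish its exact aggregation formula beyond naming the geometric mean. A plausible sketch, assuming each benchmark is first normalized against a baseline instance so that results in different units (GFLOPS, MB/s, BOPS) combine as unitless ratios:

```python
from statistics import geometric_mean

# Hypothetical per-benchmark results for one instance. A plain average
# would let the largest raw number dominate, which is why the scores
# are normalized and combined with a geometric mean instead.
scores = {
    "LINPACK (GFLOPS)":    950.0,
    "STREAM triad (MB/s)": 120_000.0,
    "SPECjbb2005 (BOPS)":  410_000.0,
}
baseline = {                         # assumed reference instance
    "LINPACK (GFLOPS)":    800.0,
    "STREAM triad (MB/s)": 100_000.0,
    "SPECjbb2005 (BOPS)":  400_000.0,
}

ratios = [scores[k] / baseline[k] for k in scores]
aggregate = geometric_mean(ratios)
print(f"CPU aggregate score: {aggregate:.3f}")
```

Because the geometric mean multiplies the ratios and takes the cube root, a large win on one benchmark cannot mask a regression on another, which is the property the paragraph above describes.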

The right level of performance testing ensures that your workloads meet expectations and deliver a superior user experience. The testing highlights potential problems before your system is put into production. Our study is limited to just a few benchmarks. We recommend that you adequately test your workload prior to deploying it into production.

Cloud instance types and sizes


Instance types define the hardware configuration of a virtual machine (VM) running in the cloud.
Cloud providers typically offer a variety of instance types, each optimized to fit a different use case.
With each instance type, you get a mix of CPU, memory, storage, and networking resources, provid-
ing the flexibility to choose what works best for your application. Red Hat Enterprise Linux supports
different instance types across many certified public cloud providers.

In our tests, we used the following instance types available in the public cloud:

• General-purpose instances provide a good balance of compute, memory, and networking resources. They are suitable for a variety of workloads.
• Compute-optimized instances serve compute-bound applications that benefit from high-performance, modern CPU architectures.
• Storage-optimized instances are for workloads that require low latency and high random input/output operations per second (IOPS).

Across each of these instance types, we have selected a representative instance in the small, medium,
and large size categories. We chose instances with 8, 32, and 64 CPUs. Because some instance types
also scale beyond that, we also chose the largest CPU count supported if it is higher than 64. In the
case where some instance types do not have sizes available at 32 and 64 CPUs, we have picked the
next-largest available size.

Instance type naming


To keep all of these results sufficiently anonymous, we have adopted the following instance naming
scheme that is used in all result charts and discussions.

CVX_{C|G|S}YYCPU where:

• X: Cloud vendor X, either cloud vendor 1 (CV1) or cloud vendor 2 (CV2)
• C|G|S: Compute-optimized, general purpose, or storage-optimized instance class
• YY: Number of vCPUs in this particular instance type

E.g., CV1_S80CPU equals a cloud vendor 1 storage-optimized instance with 80 vCPUs.
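For readers scripting over the result data, this naming scheme is easy to parse mechanically. The helper below is our own illustration, not tooling from the study:

```python
import re

# Parser for the instance-naming scheme used in the result charts:
#   CVX_{C|G|S}YYCPU, e.g. CV1_S80CPU
PATTERN = re.compile(r"^CV(?P<vendor>\d)_(?P<klass>[CGS])(?P<vcpus>\d+)CPU$")

CLASS_NAMES = {
    "C": "compute-optimized",
    "G": "general purpose",
    "S": "storage-optimized",
}

def parse_instance(name: str) -> dict:
    """Split an anonymized instance name into vendor, class, and vCPU count."""
    m = PATTERN.match(name)
    if not m:
        raise ValueError(f"not a valid instance name: {name!r}")
    return {
        "vendor": int(m.group("vendor")),
        "class": CLASS_NAMES[m.group("klass")],
        "vcpus": int(m.group("vcpus")),
    }

print(parse_instance("CV1_S80CPU"))
# {'vendor': 1, 'class': 'storage-optimized', 'vcpus': 80}
```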

A short note on pricing


Pricing in the public cloud is a complicated subject, and even within a given set of parameters, there
is significant cost variance. In most public clouds, there is a range of pricing options, depending on
factors including the type of commitment, down payment discounts, and spot prices. For simplicity,
we have picked on-demand pricing for each cloud for our calculations.

Performance results
The results below are based on internal benchmark tests run using public cloud infrastructures.

General-purpose class instances

General-purpose class instances are described by cloud vendor 1 and cloud vendor 2 as “balanced” configurations with 4GB of memory per vCPU. They are intended as a good solution for most workloads if not optimized for any specific cases. An immediate difference between the cloud vendor 1 and cloud vendor 2 general-purpose instances is that the cloud vendor 1 instances are running on the older Intel Broadwell CPU, while the general purpose instances in cloud vendor 2 are running Intel Skylake or Intel Cascade Lake. This gives the cloud vendor 2 instances much improved per-thread performance due to improved microarchitecture, faster memory support, Intel UltraPath Interconnect (UPI), Intel Advanced Vector Extensions 512 (AVX-512), and other features.

LINPACK

Figure 1 explanation

We immediately see the impact of the newer architecture in the LINPACK benchmark results. The small and medium instances saw a 12-24% increase, and the large 64 vCPU instance saw a 57% increase. The option of a 96 vCPU instance in cloud vendor 2, which is not offered in cloud vendor 1, gives an 86% improvement over the largest general purpose cloud vendor 1 instance type.

Figure 1. General purpose LINPACK performance



Figure 2 explanation

For small and medium instances, there were no great price savings to be gained. The largest instances should only be selected when the problem size is too large for smaller instances due to memory size, input/output (I/O), or network bandwidth requirements. In general, cloud vendor 2 instances perform better for a given price point.

Figure 2. General purpose LINPACK price/performance

STREAM

STREAM

Figure 3 explanation

The STREAM benchmark shows a slightly different picture than LINPACK due to its heavy reliance on memory bandwidth. Once that bandwidth is exhausted, no gains are achieved by adding more vCPUs, and additional contention for that bandwidth can actually cause a reduction in overall benchmark throughput.

Figure 3. General purpose STREAM triad performance



While we see improved performance going from the small (8 vCPU) instance sizes to the medium (32
vCPU) instance sizes, it is only scaling at about 60% (i.e., for every additional vCPU added, we only
see 60% of the gain we would expect with perfect scaling.) This result suggests that the memory
bandwidth of the underlying hardware is exhausted somewhere before the full 32 vCPUs. The
massive jump in performance at the 64 and 96 vCPU counts is due mostly to the second CPU socket
and the associated additional memory bandwidth available.
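The ~60% scaling figure above can be reproduced with simple arithmetic: divide the observed speedup by the ideal (linear) speedup implied by the vCPU increase. The numbers below are illustrative, not measured results from this paper:

```python
# Scaling efficiency: fraction of the ideal (linear) speedup actually
# observed when an instance grows. Values below are hypothetical.
def scaling_efficiency(small_vcpus: int, small_score: float,
                       big_vcpus: int, big_score: float) -> float:
    ideal = big_vcpus / small_vcpus    # perfect-scaling factor (e.g. 4x)
    actual = big_score / small_score   # observed factor (e.g. 2.4x)
    return actual / ideal

# e.g. 8 -> 32 vCPUs, but throughput only grows 2.4x instead of 4x
eff = scaling_efficiency(8, 100.0, 32, 240.0)
print(f"scaling efficiency: {eff:.0%}")    # 60% of ideal
```

An efficiency well below 100% on a memory-bound benchmark is the telltale sign that some shared resource, here memory bandwidth, is exhausted before the added vCPUs can be put to work.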

Figure 4 explanation

When the price is factored in, we see that the smallest instances are far in the lead. This result can make for a more complicated sizing decision if the workload does not fit on an 8 vCPU instance. Can the workload be split into multiple instances (e.g., multihost Redis or clustered Oracle DB)? Or is the decreased price/performance ratio of a single larger instance necessary due to other factors (e.g., node-to-node latency, network bandwidth constraints)?

Figure 4. General purpose STREAM triad price/performance



SPECjbb2005

Figure 5 explanation

The SPECjbb2005 benchmark is generally very parallelizable when you control for systems that have multiple non-uniform memory access (NUMA) nodes. In benchmarking, we control for this by running one Java VM per NUMA node. With that caveat, we see very good scaling of this benchmark across all instance sizes.

Figure 5. General purpose SPECjbb2005 peak throughput

Figure 6 explanation

This scalability has the interesting effect of flattening out the price/performance comparison. Customers can get a roughly equal return on their investment across instance sizes, so they should pick the instance size that meets their requirements without worrying about wasted costs or buying excess compute cycles.

Figure 6. General purpose SPECjbb2005 peak price/performance



CPU aggregate score

Figure 7 explanation

The CPU aggregate score, as explained above, tracks largely in line with the vCPU count of the instances while taking into account the improved performance of the cloud vendor 2 instances due to their newer Intel Skylake architecture.

Figure 7. General purpose aggregate CPU performance score

Figure 8 explanation

Factoring in price, we again see that the smallest instance types have the best overall aggregate price/performance, with the Skylake-powered cloud vendor 2 instances slightly outperforming the Broadwell-powered cloud vendor 1 instances.

Figure 8. General purpose aggregate CPU price/performance



Compute optimized

The compute-optimized class of instances in cloud vendor 1 and cloud vendor 2 are much closer in configuration than what is found in the general purpose category, with both supporting 2GB of memory per vCPU, the cloud vendor 1 class coming with either Skylake or Cascade Lake processors, and the cloud vendor 2 class coming with Cascade Lake processors. With the more limited memory per vCPU, customers should select this instance type when their workload’s working set is relatively small or the improvement in price/performance is sufficient to justify the reduced memory size.

Note for the results below that cloud vendor 2 has chosen non-power-of-two CPU counts for their medium and large instance sizes: instead of 32 vCPUs, their closest is 36, and instead of 64, their closest is 72. For workloads that scale best with powers of two (common in some high-performance computing (HPC)-style workloads), this is a factor to keep in mind.

LINPACK

Figure 9 explanation

With instances nearly identical in configuration, the similar performance of the LINPACK benchmark is not surprising. Notice the significant jump in performance from the 32 vCPU instances to the 64 vCPU instances as a second socket is added.

Figure 9. Compute optimized LINPACK performance



Figure 10 explanation

The slight difference in pricing between the similarly sized instances in cloud vendor 1 and cloud vendor 2 shows in the slightly higher price/performance for similarly sized cloud vendor 1 instances. These results are quite different from the general purpose category, given the significantly higher price/performance of the smallest instances. Larger instance sizes have roughly equivalent price/performance, so if the 8 vCPU instances are too small for the working set (or, as before, if there are network or storage requirements the 8 vCPU instances cannot meet), then the smallest instance that will meet the workload needs will be the best choice.

Figure 10. Compute optimized LINPACK price/performance

STREAM

The memory bandwidth-hungry STREAM benchmark sees almost no gain going from 8 vCPU instances to 32 vCPU instances. This result suggests that with a sufficiently optimized workload, 8 vCPUs can saturate the full memory bandwidth available on the underlying hardware of these instances. If the workload is predominantly memory bandwidth limited, using the 32 vCPU instances likely does not make sense. However, moving to larger instances provides a solid increase in performance due to the addition of the second CPU socket and attached memory.



Figure 11 explanation

The drop in performance between the 64 and 72 vCPU configurations in cloud vendor 1 could suggest either contention for memory resources with the increased vCPU count, or possibly that the 64 vCPU instance tested was spun up on a Cascade Lake-based system and the 72 vCPU instance on a Skylake-based system.

Figure 11. Compute optimized STREAM triad performance

Figure 12 explanation

The price/performance analysis for the STREAM benchmark is even more drastically weighted toward the 8 vCPU instances. Even the more-than-doubled throughput of the 64-and-larger vCPU instances does not overcome the price premium for memory bandwidth-limited workloads.

Figure 12. Compute optimized STREAM triad price/performance



SPECjbb2005

Figure 13 explanation

The performance curves for these instances look very similar to those of the general purpose instances, with the notable exception of the CV2_C72CPU instance. It is possible that during testing we were allocated slower or more heavily shared instances than for other runs. This is a feature of the public cloud: some days workloads simply run slower due to impacts that are outside of your ability to measure. Aside from that anomaly, this workload scales fairly well as vCPU count increases.

Figure 13. Compute optimized SPECjbb2005 peak throughput

Figure 14 explanation

When we factor in price, we again see that the 8 vCPU instances have a significant lead in price/performance. While the workload scales well, it does not scale well enough to overcome the linear increase in price as vCPU count increases.

Figure 14. Compute optimized SPECjbb2005 peak price/performance



CPU aggregate score

Figure 15 explanation

The aggregate performance for this instance class is heavily affected by the memory bandwidth constraints exposed by the STREAM benchmark. Since the other workloads scale, the lack of memory bandwidth scalability is magnified even in this aggregated score.

Figure 15. Compute optimized aggregate CPU performance score

Figure 16 explanation

These price/performance scores should not be any surprise after looking at the previous results. Worth noting: the degree to which the 8 vCPU instances outscore the larger instances is significant.

Figure 16. Compute optimized aggregate CPU price/performance



Storage optimized

The storage-optimized instance classes in cloud vendor 1 and cloud vendor 2 differ even more than the general purpose classes. In cloud vendor 1, the instances are based on the AMD Naples CPU, and the cloud vendor 2 instances are based on Intel Skylake CPUs. These CPU architectures are drastically different from each other, as seen in the following benchmark results. While most customers who select these instance classes are looking for the advantages gained by instance-local storage, the applications that run on them still depend on CPU horsepower.

LINPACK

Figure 17 explanation

Right away we see the significant difference between the Naples-powered cloud vendor 1 instances and the Skylake-powered cloud vendor 2 instances. Note: the versions of the LINPACK benchmark differ between the two instances. Intel provides a highly optimized LINPACK kit that we use on all Intel-based systems. AMD has provided instructions on building and running the HPL kit with their provided BLIS library.

Figure 17. Storage optimized LINPACK performance

A further improvement for the AMD-based results could have been made using the AMD Optimized
compiler, but at this time we have not studied its impact on high-performance LINPACK (HPL) or
LINPACK results. Customers may find that applications that have been highly optimized for Intel
CPUs might need work to perform well on AMD CPUs.



Figure 18 explanation

Due to its significantly better performance, the cloud vendor 2 instances also show considerably better price/performance, though it should be noted that price/performance decreases as instance size increases. We recommend selecting the smallest instance size appropriate for the workload. While considerably slower and with lower price/performance, the cloud vendor 1 instances do show very consistent price/performance, likely due to the high proportion of NUMA nodes to vCPUs (8 vCPUs per node).

Figure 18. Storage optimized LINPACK price/performance

STREAM

Figure 19 explanation

The performance differences between the instances are less pronounced in the STREAM benchmark due to the increased memory bandwidth available on the Naples-based cloud vendor 1 class. Even with that consideration, these instances do not approach the performance of the cloud vendor 2 instances at similar vCPU counts.

Figure 19. Storage optimized STREAM triad performance



Figure 20 explanation

As with the LINPACK results, we see a significant drop-off in price/performance for both cloud vendor 1 and cloud vendor 2 as instance size grows. In the case of this class of instances, purchasing decisions will likely be driven by storage size or storage bandwidth/IOPS. This result again confirms that the smallest instance possible for a given workload is the most price-efficient choice.

Figure 20. Storage optimized STREAM triad price/performance

SPECjbb2005

Figure 21 explanation

Note that a large part of the scaling of this benchmark on these instances is due to running multiple Java VMs, one for each NUMA node. If this approach is not practical for a customer’s workload, then the scaling may degrade, possibly significantly.

Figure 21. Storage optimized SPECjbb2005 peak throughput



Figure 22 explanation

The question of scalability vs. performance is made even clearer by the price/performance results. The cloud vendor 2 instances show relatively consistent price/performance but are outperformed in the small and medium sizes, while slightly outperforming in the largest instance size. Keep in mind the above caveat regarding NUMA tuning of the application.

Figure 22. Storage optimized SPECjbb2005 peak price/performance

CPU aggregate score

Figure 23 explanation

The most striking aspect of the aggregate score for this class of instances is the significant jump in performance for the CV2_S96CPU. Combining 96 vCPUs with two sockets’ worth of memory bandwidth provides a powerful configuration for these benchmarks.

Figure 23. Storage optimized aggregate CPU performance score




Figure 24 explanation

The price/performance results should be no surprise at this point. The smallest instances provide the best price/performance, and it steadily degrades as the instances get larger.

Figure 24. Storage optimized aggregate CPU price/performance

Conclusion

With Red Hat Enterprise Linux, you get the same experience whether you are running on-premises or in one of our certified cloud providers. This consistency is invaluable and empowers organizations like yours to retain existing skills, standards, and processes to support peak performance in public, private, multicloud, and hybrid cloud environments. Based on the performance benchmarks, scaling out instances provides much better performance for the price. That said, the best way to experience the performance of Red Hat Enterprise Linux is to try it for yourself and explore the various standard performance tools that come with Red Hat Enterprise Linux.

Tools like the web console and BCC tools can help you identify performance lag or anomalies and compensate for a lack of performance expertise or troubleshooting resources, allowing you to quickly identify the underlying causes behind performance issues.

When it comes to analysis, Red Hat Enterprise Linux provides an excellent framework for collecting and visualizing performance metrics. With Performance Co-Pilot (PCP) and Grafana, you can collect and visualize performance metrics across your Red Hat Enterprise Linux deployments in the cloud.

About Red Hat

Red Hat is the world’s leading provider of enterprise open source software solutions, using a community-powered approach to deliver reliable and high-performing Linux, hybrid cloud, container, and Kubernetes technologies. Red Hat helps customers develop cloud-native applications, integrate existing and new IT applications, and automate and manage complex environments. A trusted adviser to the Fortune 500, Red Hat provides award-winning support, training, and consulting services that bring the benefits of open innovation to any industry. Red Hat is a connective hub in a global network of enterprises, partners, and communities, helping organizations grow, transform, and prepare for the digital future.

In addition, Red Hat Enterprise Linux comes with recommended out-of-the-box best practices for
performance tuning, empowering customers to optimize workload performance. With TuneD, you
can manage and select from a variety of performance profiles to meet your use cases.

See how well your systems are performing in the cloud using the tools we outlined above. Read
our performance blog series on how you can measure and tune your Red Hat Enterprise Linux
performance.


Copyright © 2021 Red Hat, Inc. Red Hat and the Red Hat logo are trademarks or registered trademarks of Red Hat, Inc. or its subsidiaries in the United States and other countries. Linux® is the registered trademark of Linus Torvalds in the U.S. and other countries. Java is the registered trademark of Oracle America, Inc. in the United States and other countries. All other trademarks are the property of their respective owners.

redhat.com #F30497_1121
