Intel IT Data Center Strategy Evolution
May 2023
IT@Intel:
Data Center Strategy
Leading Intel’s Business Transformation
Executive Summary
Intel IT Authors
Shesha Krishnapura, Intel Fellow and Intel IT CTO
Shaji Kootaal Achuthan, Senior Staff Engineer
Murty Ayyalasomayajula, Senior Staff Engineer
Vipul Lal, Senior Principal Engineer
Raju Nallapa, Senior Principal Engineer
Sanjay Rungta, Senior Principal Engineer
Ty Tang, Senior Principal Engineer
Himayun Zia, Technical Program Manager

Intel IT runs Intel data center services like a factory, effecting change in a disciplined manner and applying breakthrough technologies, solutions, and processes. This enables us to optimally meet Intel's business requirements while providing our internal customers with effective data center infrastructure capabilities and innovative business services. Building on previous investments and techniques, our data center strategy has generated savings exceeding USD 7.5 billion from 2010 to 2022.

We are constantly enhancing our data center strategy to continue our data center transformation. Using disruptive server, storage, network, infrastructure software, and data center facility technologies can lead to unprecedented quality-of-service (QoS) levels and reductions in total cost of ownership (TCO) for business applications. They also enable us to continue to improve IT operational efficiency and be environmentally responsible.
2013+
Focus on Resource and Energy Efficiency
• Breakthrough disaggregated server architecture innovation
• Centralized batch computing capacity in two mega-hubs
• Combined high-frequency servers and optimal workloads for platform pairings
• Centralized management of servers and resources
• Converted older wafer fabrication facilities into data centers
• Custom rack design to optimize space, compute, and power density
• Environmental sustainability: either free-air cooling or evaporative cooling-tower water to condition the data centers
• State-of-the-art electrical density and distribution system
Intel Data Center: 31 MW in 30K sq. ft. at 1.06 PUE
2010-2013
Transform Business Capabilities
• TCO assessment of Infrastructure as a Service
• Introduction of data center MOR
• Unit-costing model to plan improvement targets and benchmark
• Pulse dashboard for comprehensive state of Infrastructure-as-a-Service capacity and utilization

Table of Contents
Background ................................................. 2
Intel IT Data Center Strategy Evolution ................... 2
Intel IT Data Center Transformation Strategy .............. 4
Defining a Model of Record ................................ 4
Intel IT Data Center Transformation Strategy
We operate our data center service like a factory by applying breakthrough technologies,
solutions, and processes to achieve industry leadership.
[Figure 2 diagram: current capabilities (plan of record) converge over time toward best achievable capabilities (model of record), balancing three KPIs: quality of service (service-level agreements, Tier-1 to Tier-3), cost per service unit (targeting a 10% year-over-year reduction), and resource utilization (targeting more than 80%). Optimization vectors span servers, storage, OS and management, and headcount; the approach is to optimize business structure to support critical business functions, maximize business value through optimization vectors, and seek transformation instead of incremental change.]
Figure 2. Maximizing the business value of Intel’s data center infrastructure requires continued business-driven
innovation in the areas of compute, storage, network, and facilities, while balancing KPIs to achieve the MOR.
… the last few years to develop a robust business continuity plan. Our plan keeps factories running even in the case of a catastrophic data center failure.

In our Manufacturing environment, we pursue a methodical, proven infrastructure deployment approach to support high reliability and rapid implementation. This "copy-exact" approach deploys new solutions in a single factory first; once successfully deployed, we copy that implementation across other factory environments. This approach reduces the time needed to upgrade the infrastructure that supports new process technologies, thereby accelerating time to market for Intel products. The copy-exact methodology allows us to quickly deploy new platforms and applications throughout the Manufacturing environment. This helps us meet a 13-week infrastructure deployment goal 95% of the time, compared to less than 50% without using the copy-exact methodology.
Office and Enterprise

To improve IT agility and the business velocity of our private enterprise cloud, we have implemented an on-demand self-service model. This model has reduced the time to provision servers from three months to on-demand provisioning. We have achieved a mature level of virtualization in our Office and Enterprise computing environment and have started deploying container technology to further improve agility in managing infrastructure and applications; software development and testing; and scalable service delivery.

Defining a Model of Record

Our transformational data center strategy involves running Intel data centers and the underlying infrastructure as if they were factories, with a disciplined approach to change management. Applying breakthrough technologies, solutions, and processes in an effective, controlled manner can help us be an industry leader and keep up with the accelerating pace of Intel's business.

Based on improvements each year in technologies, solutions, and processes, we use three key performance indicators (KPIs) to define a model of record (MOR) for the year. These KPIs, which are discussed in more detail in subsequent sections, include the following: best achievable quality of service (QoS) and service-level agreements (SLAs); lowest achievable unit cost; and highest achievable resource utilization.

We set investment priorities based on the KPIs to move toward the MOR goal. As shown in Figure 2, each year we get closer to the MOR while at the same time balancing the KPIs.

We use five primary tactics to achieve our MOR goals:
• Embrace disruptive servers
• Adopt tiered storage
• Increase facilities efficiency
• Drive network efficiency
• Improve operational efficiency

More information is provided about each of these tactics in subsequent sections.
… not enough storage available, that is only 66% effective utilization of compute capacity. Or, if a customer consumes only 4 GB of a 10-GB storage allocation, the remaining 6 GB is wasted storage. Even though it is allocated, it does not represent effective utilization of this asset. Our goal for the effective utilization KPI is to achieve 80% effective utilization of all IT assets.
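To make the KPI arithmetic concrete, here is a minimal sketch of the effective-utilization calculation behind the two examples above. The function name and sample numbers are illustrative only, not Intel IT tooling.

```python
# Minimal sketch of the effective-utilization KPI described above.

def effective_utilization(used: float, provisioned: float) -> float:
    """Fraction of a provisioned asset that is doing useful work."""
    if provisioned <= 0:
        raise ValueError("provisioned amount must be positive")
    return used / provisioned

# A customer using 4 GB of a 10 GB allocation: 40% effective utilization;
# the remaining 6 GB is allocated but wasted.
print(f"{effective_utilization(4, 10):.1%}")   # 40.0%

# Only two of three servers usable because storage ran out: roughly 66%
# effective utilization of compute capacity, against the 80% goal.
print(f"{effective_utilization(2, 3):.1%}")    # 66.7%
```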
Implementing a New Unit-Cost Financial Model

We evolved our financial model from project- and component-based accounting to a more holistic unit-costing model. For example, we previously used a "break/fix" approach to data center retrofits. We would upgrade a data center facility or a portion of the facility in isolation, looking only at the project costs and the expected return on that investment. We had no holistic view of the impact on service unit output. In contrast, today we focus on TCO per service unit, using the entire data center cost stack per unit of service delivered. This cost stack includes all cost elements associated with delivering business services, and we now consider the worldwide view of all data centers in the assessment of our investments.

Figure 3 shows the six major categories of cost to consider: headcount; facilities; servers; OS and manageability; storage and backup/recovery; and network. By adding these costs and then dividing them by the total number of appropriate service units for the environment, we arrive at a cost per service unit.

Service-based unit costing enables us to benchmark ourselves and prioritize data center investments. Determining service-based unit costs also allows us to measure and compare the performance of individual data centers. This comparison helps us identify which data centers are not performing optimally and decide whether to upgrade or consolidate them.

To show how the new unit-based costing model works, Figure 4 compares Design cost data with Office and Enterprise cost data. The headcount category shows an equal percentage of total cost in Office and Enterprise and in Design. In contrast, servers are a larger cost factor in Design than they are in Office and Enterprise. Knowing our exact unit cost in each environment, as well as the breakdown of that cost, enables us to develop optimized solutions for each environment that will have the greatest effect on cost efficiency and ROI.

[Figure 4 chart: 2022 unit-based costing of IaaS, comparing the cost breakdown of the Design environment with the Office and Enterprise environment.]
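As a concrete illustration of the cost-stack arithmetic described above, the following minimal sketch sums the six cost categories and divides by the service units delivered. All figures are invented for illustration; they are not Intel IT's actual model or costs.

```python
# Sketch of the unit-costing calculation: total cost stack / service units.
# Category names follow the six categories listed in the text; the numbers
# are hypothetical.

COST_CATEGORIES = ("headcount", "facilities", "servers",
                   "os_and_manageability", "storage_and_backup_recovery",
                   "network")

def cost_per_service_unit(costs: dict, service_units: int) -> float:
    missing = set(COST_CATEGORIES) - set(costs)
    if missing:
        raise ValueError(f"missing cost categories: {sorted(missing)}")
    return sum(costs[c] for c in COST_CATEGORIES) / service_units

design = {"headcount": 2_000_000, "facilities": 1_500_000,
          "servers": 5_000_000, "os_and_manageability": 800_000,
          "storage_and_backup_recovery": 1_200_000,
          "network": 500_000}   # USD per year, hypothetical

print(cost_per_service_unit(design, service_units=1_000_000))  # 11.0 USD/unit
```

Computing the same figure for each environment and each data center is what makes the benchmarking comparison described above possible.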
Determining the Cost per Service Unit

We have dramatically improved performance and reduced costs for our data centers (Table 2), including more efficient Office and Enterprise compute and storage.

• Tier-4 servers have the highest capacity but are used for low-frequency access and read-only archived data.
• We have initiated work to automatically tier unused blocks from these higher tiers to an on-premises object storage solution.

Our strategy has been updated to account for the computational scale of the site. This helps us determine the appropriate performance level required for each tier and improves our ability to meet quality, SLA, and cost targets. Our automated systems monitor file server responsiveness and use that information to regulate jobs through suspension and ramp controls. At the same time, the automated systems generate and analyze file access patterns to determine which jobs, users, and files are experiencing the highest access rates. We selectively use storage QoS to isolate and mitigate the impact of very-high-IOPS workloads.
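The suspension-and-ramp mechanism above can be pictured as a simple feedback loop. The sketch below is a conceptual illustration only, with assumed thresholds and step sizes; Intel IT's actual automated systems are internal and considerably more sophisticated.

```python
# Conceptual sketch of a suspension-and-ramp control loop: when file-server
# latency exceeds a threshold, suspend a slice of running jobs; when it is
# healthy again, ramp suspended jobs back in gradually. All constants are
# assumptions for illustration.

RESPONSE_LIMIT_MS = 50   # assumed file-server latency threshold
RAMP_STEP = 100          # jobs resumed per healthy control interval

def regulate(latency_ms: float, running: int, suspended: int):
    """Return new (running, suspended) job counts for one control interval."""
    if latency_ms > RESPONSE_LIMIT_MS:
        # Overloaded: suspend 10% of running jobs (at least one).
        to_suspend = max(1, running // 10)
        return running - to_suspend, suspended + to_suspend
    # Healthy: release suspended jobs back in measured steps.
    to_resume = min(RAMP_STEP, suspended)
    return running + to_resume, suspended - to_resume

# One overloaded interval followed by one healthy interval.
state = regulate(80.0, running=5_000, suspended=0)    # (4500, 500)
state = regulate(20.0, *state)                        # (4600, 400)
print(state)
```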
We have applied several other storage techniques to further enhance storage efficiency and reduce costs, including scale-out storage, storage refresh cycles, and data reduction.

Scale-out Storage

We have executed a strategic shift from a fragmented scale-up storage model to a pooled scale-out storage model. Scale-out storage better supports on-demand requests for performance and capacity. In addition, scale-out storage enables transparent data migration capabilities. It also increases the effective utilization of space freed by using storage-efficiency technologies such as deduplication, compression, and compaction. We are performing storage scaling on demand for read-only storage areas, which require extremely high access rates. We use mount options to increase attribute caching and avoid wasteful locking options on read-only areas. This reduces the storage load by more than 50% and improves job throughput. We have also enabled high-performance shared scratch spaces to meet the demand from our hyperscale EDA compute environment. As we march towards significantly higher compute scale, where the impact of storage overload is becoming more costly, we are shifting our bias towards achieving higher resiliency. This is achieved through increased redundancy and moderation of our storage capacity utilization targets.
Storage Refresh Cycle

To improve performance and reduce costs, we implemented an efficiency-based refresh cycle. This enables us to take advantage of storage servers with better performance and more efficient energy use. This approach has reduced both capital and expense costs. For example, a more energy-efficient server can reduce data center power usage. A more powerful server that replaces several older servers can also reduce our data center footprint. It also helps us deliver better performance for our customers at a similar or lower cost per TB. Over the last few years, our refresh cycle has enabled us to shift from tape-based backup to disk-based backup with newer technology and architecture. This shift has made business continuity and rapid recovery from disaster a reality while reducing the backup cost and enhancing the SLA. We are also using this transition to further reduce our backup footprint. Our approach is to avoid backing up data for which it is more cost effective to regenerate it than to recover it from backup.
Data Reduction

The introduction of new storage to support company growth and our commitment to timely refresh are enabling us to use the latest generation of Intel Xeon processors. These processors provide us with the processing power to handle data deduplication, compaction, and compression on our primary and backup storage servers. They have freed more than 144 PB of capacity, which we are making available for our users.

We continue to work closely with our internal design teams to achieve the following goals:
• Optimize their design flows to reduce the growth rate of their data and IOPS requirements.
• Dynamically adjust allocations based on usage.
• Over-allocate capacity.

We have historically used efficient scanning algorithms to determine the age of files and then used that data to right-tier entire areas or subdirectories (see the sketch after this paragraph). We are now using block-level transparent data tiering to tier aged data to object storage. We combine the aging information with I/O activity to make more intelligent decisions to remove unused data within three to six months.
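The age-based scanning approach can be illustrated with a short sketch: walk a project area and flag subdirectories whose newest file access is older than a cutoff. The path and the six-month cutoff are assumptions for illustration, not Intel IT's production scanner.

```python
# Simplified sketch of an age-based scan: nominate directories for
# down-tiering when no file in them has been accessed within the cutoff.

import os
import time

AGE_CUTOFF_DAYS = 180   # "three to six months" from the text; 6 assumed here

def stale_directories(root: str, cutoff_days: int = AGE_CUTOFF_DAYS):
    """Yield directories where no file was accessed within cutoff_days."""
    cutoff = time.time() - cutoff_days * 86_400
    for dirpath, _dirnames, filenames in os.walk(root):
        atimes = []
        for name in filenames:
            try:
                atimes.append(os.stat(os.path.join(dirpath, name)).st_atime)
            except OSError:
                continue  # file vanished mid-scan; skip it
        if atimes and max(atimes) < cutoff:
            yield dirpath  # candidate for object-storage tiering

for d in stale_directories("/proj/design_area"):   # hypothetical mount point
    print("down-tier candidate:", d)
```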
Increasing Facilities Efficiency

We used our new investment model to evaluate the number of data centers we currently have and the number we should have. The new investment model identified opportunities to reduce the number of data centers using techniques such as the following:
• Closing, retrofitting, or reclassifying data centers and improving efficiency.
• Co-locating local infrastructure with Design and Manufacturing data centers or providing services from a server closet.
• Managing local infrastructure sites remotely.
• Improving facility power efficiency through strategic investments.

We have targeted 32 inefficient data centers since 2011. Our efforts have eliminated 66,375 square feet and converted 23,609 square feet of data center space to low-cost infrastructure rooms. This has saved Intel USD 25.45 million annually.
Figure 6 shows how we have consolidated our data center facilities from 2003 to 2022. We have reduced the total square footage by up to 21% and reduced the number of data centers from 152 to 54. Simultaneously, we increased our data center compute capacity and commissioned power by up to 108%, from 50 MW to 105 MW, over the last ten years. From 2012 to 2022, we have saved over 1.3 billion kWh compared to industry-standard data centers.

Driving Network Efficiency

Data center growth is continually placing greater demands on Intel's network. In response, in 2010 Intel IT began to convert our data center network architecture to 10 GbE connections. Around 2015, we introduced 40 GbE to meet the inter-switch link capacity demand. In 2019, we started a multiyear journey to make 100 GbE pervasive within our data centers to keep up with the demand. Figure 7 illustrates the growth in data center network port deployments.
[Figure 6 chart: data center modules, 2003-2022. Total square footage fell roughly 21%, from 482k sq. ft. in 2003 to 380k sq. ft. in 2022, while commissioned power rose 108%, from 50 MW to 105 MW, and the total number of data center modules fell from 152 to 54.]
Figure 6. Innovative data center designs have enabled us to decrease data center square footage while increasing power density and capacity.

[Figure 7 chart: 10/40/100 GbE port deployments, 2010-2022. 10 GbE ports grew to 217,558 by 2022; 40 GbE, introduced in 2015, reached 14,097 ports; 100 GbE, introduced in 2017, reached 78,561 ports.]
Figure 7. Implementing a 10/40/100 GbE data center fabric design accommodates current capacity growth.
To meet today's scale and capacity demand, we are now migrating the data center architecture to a leaf-spine architecture. We are also transitioning our switch interconnects to 100 GbE and multi-100 GbE. Our new 100 GbE data center fabric design accommodates our current annual network capacity growth of more than 30%.

In 2022, we increased our 100 GbE capacity from 52,476 to 78,561 ports (see Table 3). All switch interconnects are being migrated to 100 GbE going forward. However, 40 GbE and 10 GbE will continue to be a key part of the data center technology. We currently have about 217,558 10 GbE ports deployed.
Table 3. 100 GbE Port Count Growth

Year    100 GbE Port Count    Annual Growth Rate (% increase)
2017    520                   –
2018    1,619                 211%
2019    3,758                 132%
2020    17,842                374%
2021    52,476                194%
2022    78,561                49.7%
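Table 3's growth-rate column follows directly from the port counts; this small calculation reproduces it (for example, 78,561 / 52,476 - 1 gives the 49.7% figure for 2022).

```python
# Reproduce Table 3's year-over-year growth rates from the port counts.

ports = {2017: 520, 2018: 1_619, 2019: 3_758,
         2020: 17_842, 2021: 52_476, 2022: 78_561}

years = sorted(ports)
for prev, curr in zip(years, years[1:]):
    growth = (ports[curr] / ports[prev] - 1) * 100
    print(f"{curr}: {ports[curr]:,} ports ({growth:.1f}% growth)")
# Final line: 2022: 78,561 ports (49.7% growth), matching the table.
```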
In addition to increasing the network capacity, we have also increased the effective utilization of network ports over the last 13 years from 40% to 68% (a 1.7x increase). Higher utilization means we do not have to purchase additional ports to meet network capacity demand growth. Figure 8 shows the continual increase in port utilization.
[Figure 8 chart: effective 10 GbE port utilization, 2010-2022 (higher is better), rising from 40% in 2010 through 45% (2011), 51% (2012), 60% (2013), 61% (2014), and 62% (2015) to about 70%: a 1.7x increase in utilization from 2010 to 2022.]

Intel IT adopts higher-speed network technology in its data centers almost as soon as it is available.

[Chart: Intel adoption of Ethernet speeds (0.1 to 100,000 GbE, log scale) closely tracks market availability over time.]
We are also focusing on improving data center stability. In the past, we used a large installation of layer 2-based technology. We have migrated to a layer 3-based network. This new architecture enables us to use all available bandwidth on primary and secondary paths at the same time, so we can use our network capacity more effectively. We are also able to eliminate the spanning-tree protocol within our data centers; this protocol does not scale well for large networks. Using a layer 3-based, scalable architecture within Intel's data centers lets us plan for scale and resiliency. Also, we are using other technologies such as overlay, multi-chassis link aggregation, and tunneling to extend layer 2 across data centers, over the layer 3 topology.

Due to the scale of the data center and new landings, we made zero-touch provisioning and automation a key part of the new architecture. With the new simplified modular design, each key building block has been converted into a module of the automation system, as sketched below. This approach allows us to provision the network within minutes with minimal effort. In addition, we can maintain consistency across the network and investigate anomalies.
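The following is a minimal sketch of the modular, template-driven idea behind zero-touch provisioning: each building block is rendered from a template plus a few parameters, so every instance comes out consistent. The template fields, device names, and config syntax are invented for illustration; they do not represent Intel IT's actual automation system.

```python
# Toy illustration of template-driven provisioning: a leaf switch config is
# generated from a module template and a handful of parameters, so hundreds
# of leaves can be provisioned quickly and identically.

LEAF_TEMPLATE = """\
hostname {hostname}
interface uplink1
  speed 100g
  peer {spine1}
interface uplink2
  speed 100g
  peer {spine2}
"""

def render_leaf(hostname: str, spine1: str, spine2: str) -> str:
    """Render one leaf-switch building block from the module template."""
    return LEAF_TEMPLATE.format(hostname=hostname, spine1=spine1, spine2=spine2)

print(render_leaf("leaf-042", "spine-01", "spine-02"))
```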
We tend to adopt higher-speed network technology almost as soon as it is available in the market. We started adoption of 40 GbE in data centers in 2015 and adoption of 100 GbE technology in 2017, to keep pace with network demand.

In 2015, we also made two key architecture changes within Design data centers. We reduced the oversubscription through the infrastructure and shifted from chassis-based switches to fixed form-factor switches for better cost and upgrade efficiency.

With this move, we reduced the oversubscription from 8:1 to 6:1 on the compute side and from 8:1 to 3:1 on the file server side. Over the same period, we transitioned 70% of our Design data centers to fixed form-factor switches using a modular design. Now, with the new leaf-spine architecture, we have maintained the same oversubscription ratios even though the file servers are transitioning to 40 GbE. This is possible by using 8x 100 GbE interconnect links and 16x 100 GbE spine-to-universal-spine links.
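Oversubscription here is simply the ratio of host-facing to fabric-facing bandwidth. The sketch below shows the arithmetic, with port counts and speeds assumed purely to reproduce the 6:1 and 3:1 ratios quoted above; the paper does not specify the actual configurations.

```python
# Oversubscription = downstream (host-facing) bandwidth / upstream bandwidth.

def oversubscription(host_ports: int, host_gbps: int,
                     uplinks: int, uplink_gbps: int) -> float:
    return (host_ports * host_gbps) / (uplinks * uplink_gbps)

# 48 hosts at 25 GbE behind 2x100 GbE uplinks -> 6:1 (values assumed)
print(oversubscription(48, 25, 2, 100))   # 6.0
# 24 file servers at 25 GbE behind 2x100 GbE uplinks -> 3:1 (values assumed)
print(oversubscription(24, 25, 2, 100))   # 3.0
```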
We expect the number of cores to continue to increase. We plan to measure data center performance based on number of cores, number of racks, power consumed, and the extent to which we meet the meaningful indicator of performance per system (MIPS) demand.

As shown in Figure 10, our HPC solution has enabled an up to 519x growth in tapeout compute capacity from 2005 to 2022. We are now using the 6th generation of our HPC solution and will continue to develop new HPC generations as Intel process technology advances. The figure also shows our commitment to quality. Through a disciplined approach to change management (running our data centers as if they are factories), we have reduced the number of compute issues that impact tapeout by 322x.
[Figure 9 chart: Design compute and storage demand, 2009-2022, plotting Design servers (thousands), EDA-MIPS (tens of thousands), cores (tens of thousands), and raw storage (PB). Compute demand (EDA-MIPS) grew an average of 32% per year and raw storage demand an average of 39% per year from 2010 to 2022. High-density racks (140-180 servers per rack) were introduced in 2013, and disaggregated servers (280 servers per rack) in 2016.]
Figure 9. Despite continuing growth in compute and storage demand, our Design data centers are using powerful Intel technology to meet demand.

[Figure 10 chart: Intel tapeout computing metrics, 2005-2023. Normalized tapeout processing capacity (higher is better) rises from 1.00x pre-HPC to 518.71x across HPC-1 (45nm) through HPC-6 (Intel 4/Intel 3), while compute issues impacting tapeout (issues per 1,000 masks; lower is better) decline by 322x.]
Figure 10. Our HPC solution, combined with disciplined change management, has steadily increased compute capacity and improved QoS.
Increased Design Throughput Using NUMA-Booster

Overall data center optimization includes more than simply looking at server performance and facility efficiency. Application performance and workload optimization can also be contributing factors. We developed a system software capability called NUMA-Booster. This feature automatically and transparently intercepts all Design workloads running on two-socket batch servers and performs workload scheduling better than the default OS scheduling capability. Our tests have shown an average 17% improvement in Design performance on these two-socket servers. We are also deploying large-scale single-socket servers when possible. These servers do not need the NUMA-Booster feature and can further increase overall Design performance.
Increased Design Throughput Using SSDs as Fast Local Data Cache Drives

Intel silicon chip Design engineers face the challenge of integrating more features into ever-shrinking silicon chips, resulting in more complex designs. The increasing design complexity creates large electronic design automation workloads that have considerable memory and compute requirements. We typically run the workloads on servers that need to be configured to meet these requirements in the most cost-effective way.

Intel IT has deployed over 40 PB of SSD storage in over 20,000 servers as fast local data cache drives. This approach improves workload performance due to reduced network traffic and storage demand.

We are now able to provide a game-changing remote interactive computing user experience by using User Datagram Protocol (UDP) instead of Transmission Control Protocol (TCP) for interactive jobs over the WAN. Using UDP has provided up to 4.5x faster response for computer-aided design (CAD) modeling.3 We have reached the stage where our international design team members have a better user experience and higher throughput when working from home on systems in the US hubs than with their local data centers.

We also delivered up to 9x improvement in data transfer rates across the WAN through in-depth collaboration with internal and external technology experts.3 This collaboration optimized the TCP stack so that it can take full advantage of high-speed WAN links. The interactive computing and data replication improvements were achieved within existing WAN bandwidth. Combined, these achievements enable us to provide rapid turnaround through the hubs for the model build, design synthesis, layout, and tapein cycle.

3 According to internal Intel IT measurements, February 2020
Optimizing Servers to Meet Compute Demand

Intel silicon design is continually increasing in complexity. To achieve correspondingly faster time-to-market improvements, Intel IT provides a global framework for parallel hardware and software design of numerous System on a Chip platforms and IP blocks.
Matching single-socket servers and highly scalable server configurations in our data centers yields 25 to 30% faster product design and architecture validation processes. We use a global scheduling mechanism that pools the compute capacity of over 358,000 servers at multiple sites around the world. In this way, our design hub provides scalable capacity and delivers optimal memory and compute capability in a shorter amount of time.
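As a toy illustration of pooling capacity across sites (the global scheduler itself is internal Intel IT software), the sketch below sends a job to the eligible site with the most free slots. The site names, capacities, and placement policy are invented for illustration.

```python
# Toy sketch of pooled, multi-site job placement: pick the site that has free
# capacity and a server configuration large enough for the job's memory need.

from dataclasses import dataclass

@dataclass
class Site:
    name: str
    free_slots: int
    max_mem_gb: int   # largest per-slot memory configuration at the site

def place(job_mem_gb: int, sites: list) -> Site:
    eligible = [s for s in sites
                if s.free_slots > 0 and s.max_mem_gb >= job_mem_gb]
    if not eligible:
        raise RuntimeError("no capacity anywhere in the pool; queue the job")
    return max(eligible, key=lambda s: s.free_slots)

sites = [Site("us-hub-1", 12_000, 1024), Site("us-hub-2", 8_500, 2048),
         Site("remote-site", 900, 512)]
print(place(job_mem_gb=768, sites=sites).name)   # us-hub-1
```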
Since the first disaggregated server design in 2016, we have continued to evolve the concept. We currently have deployed more than 310,000 disaggregated servers, using 13 different blade designs including both single-socket and two-socket servers. We use the Intel Xeon processor E family, Intel Xeon processor W family, and Intel Xeon Scalable processors. The various models are targeted to meet specific workload requirements, such as different memory capacity, throughput or number of performance cores, high bandwidth, high-IOPS storage needs, or the ability to add accelerator cards on demand.

Design Environment Improvement Examples
Efficiency improvements and cost savings from 2010 through 2022

Computing
Intel IT innovations in the Design computing data center include disaggregated server innovation (44% savings during refresh); the NUMA-Booster solution (17% higher performance); SSDs (27% higher capacity at lower cost); faster servers (35% higher performance); and single-day dock-to-production deployment and procurement efficiency.

Storage
We have implemented Design computing data center storage efficiency improvements by adopting innovative technology capabilities and increasing utilization.
Design Zones Enable Highly Resilient Scaling at the Hubs

The dramatic increase in computing scale in a shared network-attached storage (NAS) environment with tens of thousands of compute servers can overwhelm the storage server. It can also introduce significant efficiency and reliability concerns when 10,000 or more such systems share the same Network File System (NFS) area and expect extremely high IOPS or throughput rates. We addressed this in our mission-critical tapeout environment, which runs parallel workflows that span the entire compute environment. We introduced the concept of partitioning the compute in the two major hubs into smaller, self-contained sites. Each site has its own NFS storage and management infrastructure. We worked with our tapeout team to update the tools, flows, and work methods, along with IT software. As a result, we were able to scale while maintaining efficiency and improving resiliency and scalability.

We later experienced the same scaling challenges for the rest of the HPC design environment in the hub. These issues were caused by the increased sharing at a higher scale and could not be addressed cost effectively or efficiently by storage changes alone. We built on the tapeout "sites" concept to introduce design zones into the design hub computing environment, as sketched below. We successfully scaled multiple zones and achieved adequate separation to provide the necessary increased scale and reliability in a cost-effective manner. This is a challenging and ongoing […] containers, will enable us to achieve truly independent, scalable, and resilient zones without sacrificing efficiency or the agility to respond to peak computing demands.
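A minimal sketch of the zone idea follows, assuming a fixed number of zones and a stable hash from project to zone so that a project's jobs and its NFS area stay together. The zone count and naming are hypothetical, not Intel IT's actual mapping.

```python
# Sketch of partitioning a hub into self-contained zones: each project is
# pinned to one zone (with its own NFS namespace), so a storage overload in
# one zone cannot ripple across the whole hub.

import hashlib

ZONES = [f"zone-{i:02d}" for i in range(8)]   # assumed 8 zones in the hub

def zone_for(project: str) -> str:
    """Stable project-to-zone mapping; deterministic across schedulers."""
    digest = hashlib.sha256(project.encode()).digest()
    return ZONES[digest[0] % len(ZONES)]

print(zone_for("cpu_core_a"))   # always the same zone for this project
```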
More Efficient Office and Enterprise Compute and Storage

Like our Design environment, the compute and storage demand in our Office and Enterprise environment is also growing quickly. Nevertheless, as shown in Figure 11, we continue to meet that demand while maintaining the number of physical servers over the last three years.

From 2009 to 2017, we achieved an approximate 19x increase in the number of virtual OS instances. We also greatly increased average VM density per physical server, from 11 VMs in 2009 to 30 VMs in 2017, due to improved server platforms. In 2018, we implemented an aggressive VM reclamation strategy that led to a reduction of about 5,400 VMs. New workloads that were more cost effective to deploy on cheaper physical platforms than on a virtualized platform led to an increase in physical server counts.

In 2019, we brought additional existing virtualized workloads, VMs, and hosts into our private cloud environment for centralized management, increasing the footprint by up to 1.77x. Process improvements and enhanced automation led to additional savings, and we are now deploying performance-based VMs.

[Figure 11 chart: Office and Enterprise compute and storage demand, 2009-2022, plotting physical servers, VM hosts, virtual OS instances, and raw storage (PB). Callouts: a 20.4x increase in virtual OS instances and an 8.2x increase in raw storage (to 49.05 PB) from 2009 to 2022; a ~1.5x VM density increase; the 2018 reclamation strategy reduced VMs by ~5,400; in 2019, existing VMs, workloads, and hosts were integrated under the centralized enterprise private cloud umbrella.]
Figure 11. A high rate of virtualization combined with Intel architecture has enabled us to meet growing Office and Enterprise compute and storage demand while significantly decreasing the number of virtualization host servers.

[Photo: 14 disaggregated servers in a 6U blade chassis with integrated network switch.]
… stack: compute, storage, networking, and facilities. Page 18 provides a summary of the best practices we have developed and the business value they have generated.

Our investment model has enabled us to reduce unit costs in the Design environment by 91% and in the Office and Enterprise environment by 86%.

[Figure 12 chart: unit-cost decrease using internal hosting, 2010-2022 (lower is better). Relative unit cost falls from the 100% baseline in 2010 to 8% in 2022.]
Figure 12. Unit cost including servers, storage, network, and operational costs shows private cloud hosting of our data center workloads is significantly less expensive than using public cloud services.
2010-2022
Intel IT Data Center Strategy Results

[Infographic: From 2010 to 2022, the Design environment grew more than 25x in relative performance (to roughly 2,500% of the 2010 baseline) while its total per-unit cost decreased 91%; the 2022 Design cost breakdown is dominated by servers. The Office and Enterprise environment grew roughly 4x in relative performance over the same period while its total per-unit cost decreased 86%; its 2022 cost breakdown is split mainly across servers, headcount, and other costs.]
2010-2022
Intel IT Data Center Best Practices

Servers

Adopt disaggregated servers
• Saves at least 44% over a full acquisition (rip-and-replace) refresh
• Reduces provisioning time (IT technician labor) by as much as 77%
• Decreases shipping weight of refreshed server material by 82%

Adopt elastic computing services and technologies
• Virtualized most of the Office and Enterprise servers
• Reduced the time it takes to provision a server from 90 days to on-demand provisioning using virtualization
• Enabled containers as a service

Enable one-day dock-to-production for physical servers
• Upfront planning and process enhancement to order long-lead-time items and rack readiness, reducing the dock-to-production release from 10+ days to one day

Regularly refresh servers using the latest Intel® Xeon® processors
• Virtualization ratios of up to 60:1 and optimal workload-to-platform pairing
• Reduced Design environment energy consumption by 10% annually between 2008 and 2013

Deploy SSDs as the standard local disk in all new servers
• Improved performance for I/O-intensive workloads and expected reduction of disk failure rates

Migrate applications from RISC to Intel® architecture
• Enabled significant savings and IT efficiencies
• Allowed us to realize the benefits of industry-standard operating systems and hardware

Deploy HPC
• 519x increase in capacity during HPC-6, with a 322x increase in stability
• Saved USD 44.72 million net present value during HPC-1 itself

Enhance server performance through software optimization
• Increased Design job throughput up to 49%
• Delivered various optimizations including disaggregated servers, NUMA-Booster, fast local data cache based on SSDs, and high-frequency servers
• Significant performance improvement of data replication (up to 9x) and interactive jobs (up to 4.5x) over the WAN (internal Intel IT measurements, February 2020)

Storage

Refresh and modernize storage using the latest generations of Intel Xeon processors
• Take advantage of innovative technology to increase storage capacity, quality, velocity, and efficiency at a lower cost
• More than twice the I/O throughput of older systems
• Reduced our data center storage hardware footprint by more than 50% in 2011-2012
• Reduced backup infrastructure cost due to greater sharing of resources
• Tiered backup solutions to optimize backup costs and improve reliability

Right-size storage solutions using a tiered model
• Provide storage resources based on business needs: performance, reliability, capacity, and cost
• Better management of storage costs while still enabling easy access to necessary data
• Transition to scale-out storage to reduce operational complexity in shared scratch spaces

Implement thin provisioning and deduplication for storage resources
• Helps control costs and increase resource utilization without adversely affecting performance
• Increased effective storage utilization in Design from 46% in 2011 to more than 75% now

Automatically down-tier inactive blocks while monitoring and reclaiming unused data
• Policy-based down-tiering of blocks that have not been recently accessed, to reduce capacity demand rapidly and automatically for high-performance storage
• Continuously monitor and delete transient (non-IP) data that has not been accessed for 6 months or more, based on customer expectations

Scale storage on demand and provide high-performance data tiering
• Automated policy-based data migration between tiers
• Enables higher workload throughput for read-only storage areas that require high access

Network

Upgrade data center LAN architecture to support 10/40/100 GbE
• Increased data center network bandwidth by 400% over three years, enabling us to respond faster to business needs and accommodate growth
• Increased network utilization from 40% to 68% between 2010 and 2022
• Eliminated spanning tree with multi-chassis link aggregation and Layer 3 protocol
• Reduced network complexity due to fewer NIC and LAN ports
• Reduced network cost in our virtualized environment by 18 to 25%

Open the data center network to multiple suppliers
• Generated more than USD 60 million in cost avoidance over five years with new network technology

Deploy Intel® Silicon Photonics optical transceivers
• For large-scale 100 GbE deployment, leveraged Intel® Silicon Photonics to significantly reduce the per-port cost

Facilities

Increase cooling efficiency
• From 2012-2022, we have saved over 1.3 billion kWh compared to industry-standard data centers

Use a tiered approach to redundancy, availability, and physical hardening
• Better matching of data center redundancy and availability features to business requirements
• Reduced wasted power by more than 7% by eliminating redundant power distribution systems within a data center

Retrofit and consolidate data centers using a modular design
• Retrofitted an old wafer fabrication plant to high-density, high-efficiency data center modules with an industry-leading PUE of 1.06
• Utilized free-air cooling and environmentally efficient evaporative cooling for maximum energy efficiency
• Avoided capital expenditures by not equipping the entire facility with generators
• Quickly responded to changing data center needs with minimal effort and cost
• Increase facilities efficiency. Use techniques such as higher ambient temperature for specific data center locations to take advantage of newer equipment specifications, which will help reduce cooling needs.
• Drive network efficiency. Continue to drive LAN utilization toward 75% and pursue software-defined networking to support agile, ultra-high-density data center designs. Continue to migrate to 100 GbE with Intel® Silicon Photonics optics where appropriate and cost-effective, to meet network capacity demands. Drive automation deeper into our day-to-day work.
• Improve operational efficiency. Increase the telemetry within the data center to improve operational efficiency.

• One-day dock-to-production for new physical server deployment in our data center hub.
• We developed a system software capability called NUMA-Booster, which has saved millions while delivering additional usable server capacity.
• We deployed more than 40 PB of SSDs as fast local data cache drives. This increased workload performance due to lower network traffic and storage demand.
• Six generations of HPC in our design computing environment created a 519x capacity increase and a 322x quality improvement.
• We adopted new storage capabilities like deduplication and compression, accelerated storage refresh, focused on increasing utilization, removed unneeded data, and implemented policy-based tiering. All of these have resulted in getting additional usable capacity out of storage while reducing cost and providing higher performance.
• We deployed more than 78,000 100 GbE network ports, 14,000 40 GbE network ports, and 217,000 10 GbE network ports.

We are now applying our MOR approach across our entire infrastructure stack.
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex. Performance results are
based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration
details. No product or component can be absolutely secure. Cost-reduction scenarios described are intended as examples of how
a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings.
Circumstances will vary. Intel does not guarantee any costs or cost reduction. Your costs and results may vary. Intel technologies may
require enabled hardware, software or service activation. © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks
of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. Copyright 2023 Intel
Corporation. All rights reserved. 0523/WWES/KC/PDF