
White Paper

May 2023

IT@Intel:
Data Center Strategy
Leading Intel’s Business Transformation

As we continue to apply breakthrough technologies and solutions while evolving our processes, we enable the acceleration of Intel's business.

Executive Summary

Intel IT Authors
Shesha Krishnapura, Intel Fellow and Intel IT CTO
Shaji Kootaal Achuthan, Senior Staff Engineer
Murty Ayyalasomayajula, Senior Staff Engineer
Vipul Lal, Senior Principal Engineer
Raju Nallapa, Senior Principal Engineer
Sanjay Rungta, Senior Principal Engineer
Ty Tang, Senior Principal Engineer
Himayun Zia, Technical Program Manager

Intel IT runs Intel data center services like a factory, effecting change in a disciplined manner and applying breakthrough technologies, solutions, and processes. This enables us to optimally meet Intel's business requirements while providing our internal customers with effective data center infrastructure capabilities and innovative business services. Building on previous investments and techniques, our data center strategy has generated savings exceeding USD 7.5 billion from 2010 to 2022.

We are constantly enhancing our data center strategy to continue our data center transformation. Using disruptive server, storage, network, infrastructure software, and data center facility technologies can lead to unprecedented quality-of-service (QoS) levels and reduction in total cost of ownership (TCO) for business applications. These technologies also enable us to continue to improve IT operational efficiency and be environmentally responsible.

$7.5 Billion in Savings
• 44% savings during refresh with a Disaggregated Server Design compared to a full-acquisition server refresh
• 1-day deployment of new physical servers using our Process Transformation
• 400% increase in data transfer rates between sites through international WAN links
• 519x increase in our HPC environment capacity, with a 322x improvement in quality

Table of Contents
Background .................................................... 2
Intel IT Data Center Strategy Evolution ....................... 2
Intel IT Data Center Transformation Strategy .................. 4
Defining a Model of Record .................................... 4
Results: Building on the Past, Building for the Future ........ 8
2010-2022 Data Center Results ................................ 17
2010-2022 Data Center Best Practices ......................... 18
Plans for 2023 and Beyond .................................... 19
Conclusion ................................................... 19

Background
Intel IT operates 54 data center modules at 15 data center sites. These sites have a total capacity of 105 megawatts, housing more than 390,000 servers that underpin the computing needs of more than 131,000 employees.¹ To support the business needs of Intel's critical business functions—Design, Office, Manufacturing and Enterprise (DOME)—while operating our data centers as efficiently as possible, Intel IT has engaged in a multiyear evolution of our data center strategy, as outlined in Figure 1.

¹ Number of data centers and servers as of December 2022. To define "data center," Intel uses IDC's data center size classification: "any room greater than 100 square feet that houses servers and other infrastructure components."

Intel IT Data Center Strategy Evolution
Cumulative cost savings 2010-2022: $7.5 B

2013+: Focus on Resource and Energy Efficiency
• Breakthrough disaggregated server architecture innovation
• Centralized batch computing capacity in two mega-hubs
• Combined high-frequency servers and optimal workloads for platform pairings
• Centralized management of servers and resources
• Converted older wafer fabrication facilities into data centers
• Custom rack design to optimize space, compute, and power density
• Environmental sustainability—either free-air cooling or evaporative cooling-tower water to condition the data centers
• State-of-the-art electrical density and distribution system
(Photo: Intel Data Center: 31 MW in 30K sq ft, 1.06 PUE.)

2010-2013: Transform Business Capabilities
• TCO assessment of Infrastructure as a Service
• Introduction of data center MOR
• Unit-costing model to plan improvement targets and benchmark
• Pulse dashboard for comprehensive state of Infrastructure-as-a-Service capacity and utilization

2006-2010: Foundation for Efficient Growth
• Business-focused investments for DOME
• Proactive server and infrastructure refresh
• Virtualization and enterprise private cloud
• Storage optimization and IT sustainability

2000-2006: Standardization and Cost Control
• Formed data center team
• Completed RISC to Intel Architecture migration in Design
• Standardized data center designs
• Began data center consolidation efforts

Pre-2000: Ad-hoc/Unstructured Growth
• No centralized strategy or ownership
• Built data centers to support acquisitions
• Decentralized procurement and management
• RISC migration to Intel® architecture begins

Figure 1. Intel's data center strategy is a continuous improvement process.

Meeting Compute Environment Challenges
In the past, we focused our data center investments on improving IT infrastructure to deliver a foundation for the efficient growth of Intel's business. Our primary goal was cost reduction through data center efficiency and infrastructure simplification while reducing energy consumption and our CO2 footprint to improve IT sustainability.

Over the last several years, we have reduced data center energy consumption and greenhouse gas emissions. At the same time, we have met the constantly increasing demand for data center resources. We anticipate these annual growth rates to continue or even increase further:
• 30 to 40% in compute capacity requirements
• 35 to 50% in storage needs
• 30 to 40% in demand for network capacity

We needed to address these challenges without negatively impacting service delivery. We developed and continue to rely on many established industry best practices in all areas of our data center investment portfolio. These areas include servers, storage, networking, and facility innovation.

Since 2010, these techniques, described in detail later, have enabled us to realize USD 7.5 billion in cost savings while supporting significant growth.

Breakthrough Disaggregated Server Architecture
By decoupling the CPU/DRAM and NIC/Drives modules from other server components, we can independently refresh servers' CPU and memory without replacing other server components. This results in faster technology adoption, which in turn puts new technology at our Design engineers' fingertips.
(Diagram: a disaggregated server design separating the CPU and DRAM module and the I/O module from the other components.)
Learn More:
• In this Document: Disaggregated Server Innovation Reduces TCO and TCE
• White Paper: Disaggregated Servers Drive Data Center Efficiency and Innovation
• Blog: Disaggregated Servers
• Video: Mission - Green Computing

Aligning Data Center Investments with Business Needs
We have learned that a one-size-fits-all architecture is not the best approach for Intel's unique business functions. We worked closely with business leaders to understand their requirements. As a result, we chose to invest in vertically integrated architecture solutions that meet the specific needs of individual business functions.

Design
Design engineers run more than 273 million compute-intensive batch jobs every week. Each job can take a few seconds to several days to complete. In addition, interactive Design applications are sensitive to high latencies caused by hosting these applications on remote servers. We have used several approaches in our Design computing data centers to provide enough compute capacity and performance to support requirements. These approaches include high-performance computing (HPC), grid computing and clustered local workstation computing.² We used SSDs as fast local data cache drives, single-socket servers, and a specialized algorithm that increases the performance of the heaviest Design workloads. Together, these investments enable Design engineers to run up to 49% more jobs on the same compute capacity. This equates to faster design and time to market.

Because Design engineers need to access Design data frequently and quickly, we did not simply choose the least expensive storage method for this environment. Instead, we have invested in clustered and higher performance scale-out, network-attached storage in combination with caching on local storage and automatic block tiering to on-premises low-cost object storage for our HPC needs. We use storage area networks for specific storage needs such as databases.

Manufacturing
IT systems must be available 24/7 in Intel's Manufacturing environment, so we use dedicated data centers co-located with the factories for Manufacturing.

² Intel uses grid computing for silicon design and tapeout functions. Intel's compute grid represents thousands of interconnected compute servers, accessed through clustering and job scheduling software. Additionally, Intel's tapeout environment uses an HPC approach, which optimizes all key components such as servers, storage, network, OS, applications, and monitoring capabilities cohesively for overall performance, reliability, and throughput benefits. For more information on HPC at Intel, refer to "High-Performance Computing for Silicon Design," Intel Corp., December 2015.

We have invested heavily over the last few years to develop a robust business continuity plan. Our plan keeps factories running even in the case of a catastrophic data center failure.

In our Manufacturing environment, we pursue a methodical, proven infrastructure deployment approach to support high reliability and rapid implementation. This "copy-exact" approach deploys new solutions in a single factory first and, once successfully deployed, we copy that implementation across other factory environments. This approach reduces the time needed to upgrade the infrastructure that supports new process technologies—thereby accelerating time to market for Intel products. The copy-exact methodology allows us to quickly deploy new platforms and applications throughout the Manufacturing environment. This helps us meet a 13-week infrastructure deployment goal 95% of the time—compared to less than 50% without using copy-exact methodology.

Office and Enterprise
To improve IT agility and the business velocity of our private enterprise cloud, we have implemented an on-demand self-service model. This model has reduced the time to provision servers from three months to on-demand provisioning. We have achieved a mature level of virtualization in our Office and Enterprise computing environment and have started deploying container technology to further improve agility in managing infrastructure and applications; software development and testing; and scalable service delivery.

In contrast to the Design environment, in the Office and Enterprise environments we rely primarily on a storage area network, with limited network-attached storage for file-based data sharing.

Intel IT Data Center Transformation Strategy
We operate our data center service like a factory by applying breakthrough technologies, solutions, and processes to achieve industry leadership.

Figure 2. Maximizing the business value of Intel's data center infrastructure requires continued business-driven innovation in the areas of compute, storage, network, and facilities, while balancing KPIs to achieve the MOR. (Figure elements: the scope covers servers, storage, network, facilities, and OS and management. Approach: seek transformation instead of incremental change; maximize business value through optimization vectors; optimize business structure to support critical business functions. KPIs: quality of service (service-level agreements); cost per service unit, improving 10% year over year; resource utilization of 80% or more. Tactics: embrace disruptive servers, adopt tiered storage, drive network efficiency, increase facility efficiency, and improve operational efficiency. The aim is to continue to close the gap between current capabilities (plan of record) and best achievable capabilities (model of record).)

Defining a Model of Record
Our transformational data center strategy involves running Intel data centers and underlying infrastructure as if they were factories, with a disciplined approach to change management. Applying breakthrough technologies, solutions and processes in an effective, controlled manner can help us be an industry leader and keep up with the accelerating pace of Intel's business.

Based on improvements each year in technologies, solutions, and processes, we use three key performance indicators (KPIs) to define a model of record (MOR) for the year. These KPIs—which are discussed in more detail in subsequent sections—include the following: best achievable quality of service (QoS) and service-level agreements (SLAs); lowest achievable unit cost; and highest achievable resource utilization.

We set investment priorities based on the KPIs to move toward the MOR goal. As shown in Figure 2, each year we get closer to the MOR while at the same time balancing the KPIs.

We use five primary tactics to achieve our MOR goals:
• Embrace disruptive servers
• Adopt tiered storage
• Increase facilities efficiency
• Drive network efficiency
• Improve operational efficiency

More information is provided about each of these tactics in subsequent sections.

We believe our new approach to data center costing and investment evaluation, along with a continued focus on meeting business needs, has stimulated a bolder approach to continuous innovation. Our efforts have improved the quality, velocity, and efficiency of Intel IT's business services, creating a sustained competitive advantage for Intel's business. For details, see "Results: Building on the Past, Building for the Future."

Achieving Economic Value
Our new data center investment model encourages innovation and provides significant business results. We have realized substantial cost savings since 2006 by proactively refreshing our infrastructure. For example, Intel® Xeon® processor-based servers have contributed significant economic value. During this time, we have delivered substantially higher computational throughput as measured by a practical electronic design automation (EDA) workload. Further cost savings result from adopting cloud computing-like technologies, updating our network, pursuing IT sustainability, and consolidating data centers. In addition, we have supported business growth and capability improvements by deploying unique solutions that benefit Intel's critical business functions—DOME.

(Photo: Intel IT Supercomputer, #81 in the Top 500, 2015.)

Defining KPIs and Goals
The KPIs provide a means to measure the effectiveness of data center investments. Because the service output for each business function is different, we evaluate them separately. In our data center investment decisions, we seek to balance and meet all business requirements while optimizing the KPIs.

Quality of Service
We use a tiered approach to SLAs, tailored to each business function's sensitivity to performance, uptime, mean time to repair and cost. Our goal for this KPI is to meet specific performance-to-SLA requirements for defined tiering levels. For example, for our most mission-critical applications, we aim for a higher performance to SLA than for second-tier applications, which are less critical. The end goal and true measure of IT QoS is zero business impact from IT issues.

Cost per Service Unit
As shown in Table 1, different business functions have a different service unit that we can measure. This unit represents the capacity we enable for our business users. Our goal for this KPI is to achieve a 10% improvement in data center cost efficiency every year. This goal does not necessarily mean we will spend less each year, but that we will get more for each dollar we spend. For example, we may spend less for the same number of service units, or we may spend the same amount but get more service output.

Table 1. Service Unit for Each Business Function
Function: Service Unit
Design: Cost per EDA-MIPS
Office and Enterprise: Cost per OS instance
Manufacturing: Cost per integrated factory compute environment



Closing the Gap
Our new data center investment model encourages innovation and provides significant business results. (Figure: over time, our KPIs continue to close the gap between current capabilities, the plan of record, and best achievable capabilities, the model of record.)

Effective Resource Utilization
Our refined data center strategy represents a dramatic shift in how we view resource utilization. Historically, we measured utilization of IT assets—compute, storage, network, and facilities—by simply determining how busy or loaded an asset was. For example, if a server was working at peak capacity 90% of the time, we considered it 90% utilized. If 80% of available storage was allocated, we considered that 80% utilization.

In contrast, we now focus on the actual output of an asset—that is, effective utilization. For example, suppose Intel's Design engineers start a million design jobs—thereby keeping the servers very busy. If a third of those jobs terminate before completion because there was not enough storage available, that is only 66% effective utilization of compute capacity. Or, if a customer consumes only 4 GB of a 10-GB storage allocation, the remaining 6 GB is wasted storage. Even though it is allocated, it does not represent effective utilization of this asset. Our goal for the effective utilization KPI is to achieve 80% effective utilization of all IT assets.
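To make the effective-utilization arithmetic concrete, here is a minimal sketch in Python; the job records, allocation figures, and helper names are hypothetical illustrations, not Intel IT data or tooling.

```python
# Minimal sketch: effective utilization is useful output divided by the capacity
# consumed or allocated. All job records and figures below are hypothetical.

def compute_effective_utilization(jobs):
    """Fraction of consumed core-hours spent on jobs that actually completed."""
    consumed = sum(job["core_hours"] for job in jobs)
    useful = sum(job["core_hours"] for job in jobs if job["completed"])
    return useful / consumed if consumed else 0.0

def storage_effective_utilization(used_gb, allocated_gb):
    """Fraction of an allocation that actually holds data."""
    return used_gb / allocated_gb if allocated_gb else 0.0

# A server can be 100% busy yet far from 100% effectively utilized:
jobs = [
    {"core_hours": 100, "completed": True},
    {"core_hours": 100, "completed": True},
    {"core_hours": 100, "completed": False},  # terminated early: not enough storage
]
print(f"Compute effective utilization: {compute_effective_utilization(jobs):.0%}")   # ~67%
print(f"Storage effective utilization: {storage_effective_utilization(4, 10):.0%}")  # 40%
```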

Stimulating Bold Innovation through a New Investment Model
Our efforts are based on a time-tested methodology that has proven successful in Intel's Manufacturing environment over multiple process technology generations. We adopted a new data center investment decision model that compares current data center capabilities to a "best achievable model." This model guides us to make investments with the highest impact.

Previously, Intel data center planning teams looked at existing capabilities and funding to establish a plan of record. This plan drove incremental improvements in our existing capabilities; our goal was to minimize total cost of ownership (TCO) and deliver positive return on investment (ROI).

In contrast, the MOR ignores the constraints imposed by what we have today. Instead, it identifies the minimum amount of resources we should ideally have to support business objectives—thereby establishing an optimal state with available technology.

By setting a standard of maximum achievable performance, the new model enables us to:
• Determine which investments will have the highest ROI.
• Identify the benefits of using disruptive infrastructure technologies and breakthrough approaches that deliver more optimal data center solutions across all aspects of our infrastructure.
• Make data center location decisions, including identifying potential data centers to consolidate, upgrade, or close.

The new model focuses limited available resources in specific areas for maximum holistic gain.

Because technology is always changing, peak performance also changes—the maximum achievable performance keeps improving through innovation. We know that resource constraints make it difficult to achieve the standard set by the new investment model. However, our HPC environment comes very close to that goal. The model enables us to identify gaps between where we are and where we would like to be. We can then identify the biggest gaps in capability to prioritize our budget allocation toward the highest value investments first.
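As a simple illustration of how such a gap analysis can drive prioritization, the sketch below compares hypothetical current (plan of record) figures against best-achievable (MOR) targets and ranks the largest gaps first; the capability names and numbers are invented for the example and are not Intel IT data.

```python
# Illustrative only: rank capability gaps between current capabilities (POR)
# and best-achievable capabilities (MOR). All figures are hypothetical.

por = {"compute_utilization": 0.62, "storage_utilization": 0.55,
       "facility_efficiency": 0.90, "network_port_utilization": 0.68}
mor = {"compute_utilization": 0.80, "storage_utilization": 0.80,
       "facility_efficiency": 0.94, "network_port_utilization": 0.80}

# Sort capabilities by how far current performance falls short of the MOR target.
gaps = sorted(((name, mor[name] - value) for name, value in por.items()),
              key=lambda item: item[1], reverse=True)

for name, gap in gaps:
    print(f"{name:26s} gap to MOR: {gap:+.0%}")
# The largest gaps become the first candidates for budget allocation.
```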

Implementing a New Unit-Cost Financial Model
We evolved our financial model from project- and component-based accounting to a more holistic unit-costing model. For example, we previously used a "break/fix" approach to data center retrofits. We would upgrade a data center facility or a portion of the facility in isolation, looking only at the project costs and the expected return on that investment. We had no holistic view as to the impact on service unit output. In contrast, today we focus on TCO per service unit—using the entire data center cost stack per unit of service delivered. This cost stack includes all cost elements associated with delivering business services and now considers the worldwide view of all data centers in the assessment of our investments.

Figure 3 shows the six major categories of cost to consider: headcount, facilities, servers, OS and manageability, storage and backup/recovery, and network. By adding these costs and then dividing them by the total number of appropriate service units for the environment, we arrive at a cost per service unit.

(Figure 3 diagram: Determining the Cost per Service Unit. Total data center cost categories (headcount + facilities + servers + OS and management + storage and BaR + network) divided by total DOME-specific service units for Design, Manufacturing, and Office and Enterprise.)

Figure 3. We arrive at a data center unit cost by considering all categories of cost and dividing by the number of units for that environment. Unit examples include EDA-MIPS in Design and OS instances in Office and Enterprise.

Service-based unit costing enables us to benchmark ourselves and prioritize data center investments. Determining service-based unit costs also allows us to measure and compare the performance of individual data centers to each other. This comparison helps us identify which data centers are not performing optimally and decide whether to upgrade or consolidate them.

To show how the new unit-based costing model works, Figure 4 compares Design cost data and Office and Enterprise cost data. The headcount category shows an equal percentage of total cost in Office and Enterprise and in Design. In contrast, servers are more of a cost factor in Design than they are in Office and Enterprise. Knowing our exact unit cost in each environment, as well as the breakdown of that cost, enables us to develop optimized solutions for each environment that will have the greatest effect on cost efficiency and ROI.

(Figure 4 chart: 2022 unit-based costing of IaaS, showing the percentage breakdown of headcount, facilities, servers, OS/management, storage/BaR, and network costs for the Design environment and the Office and Enterprise environment.)

Figure 4. Knowing total unit costs and individual cost category figures for each business environment, we can better choose IT investments that lower costs the most.
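For illustration, here is a minimal sketch of the unit-cost arithmetic described above (Figure 3); the cost figures and service-unit counts are hypothetical placeholders, not Intel IT data.

```python
# Minimal sketch of the Figure 3 calculation: sum the six cost categories and
# divide by the service units delivered. All figures below are hypothetical.

COST_CATEGORIES = ("headcount", "facilities", "servers",
                   "os_and_management", "storage_and_bar", "network")

def cost_per_service_unit(costs: dict, service_units: float) -> float:
    total = sum(costs[category] for category in COST_CATEGORIES)
    return total / service_units

# Example: a hypothetical Design environment measured in EDA-MIPS.
design_costs = {
    "headcount": 4.0e6, "facilities": 3.0e6, "servers": 10.0e6,
    "os_and_management": 1.5e6, "storage_and_bar": 2.5e6, "network": 1.0e6,
}
design_units = 2.0e6  # EDA-MIPS delivered (hypothetical)

print(f"Cost per EDA-MIPS: ${cost_per_service_unit(design_costs, design_units):.2f}")
```

Tracking the same calculation year over year is what lets us measure progress against the 10% annual cost-efficiency improvement goal.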

Intel IT Data Center Dashboard
To better monitor and manage our worldwide network of data centers, we developed and deployed an integrated Intel IT Data Center Dashboard. This dashboard is modeled on a dashboard used in Intel's Manufacturing environment.

This dashboard helps us monitor our KPIs by highlighting the current state and opportunities for optimization. We can thereby achieve overall improvements that align with our data center strategy goals.

For example, the dashboard can report on effective utilization of several data center resources, including EDA-MIPS; raw and utilized storage capacity; and facilities space, power, and cooling. This data can report statistics by business function or by data center and can be used to compare KPIs and metrics across several data centers. A sample Design environment dashboard shows a gauge reading 82.1% effective utilized MIPS.

Results: Building on the Past, Building for the Future
This section details some of the improvements and cost savings our data center strategy has enabled over the years, using our five primary tactics of embracing disruptive servers, adopting tiered storage, increasing facilities efficiency, driving network efficiency, and improving operational efficiency. We are building on previous successes. Therefore, some of the results shown here are cumulative; others have been achieved over the last few years as a direct result of our MOR strategy. Our refined data center strategy enables us to support the growth of Intel's customers, products, and acquisitions. It also helps to enhance the quality, velocity, and efficiency of the services we offer to Intel business groups.

We have dramatically improved performance and reduced costs for our data centers (Table 2).

Table 2. Data Center Improvements from 2003-2022
Data Center-wide
• Smaller total data center footprint
• Improved overall storage and network practices
• Increased data center facilities efficiency
• Global street-to-server audit helps prioritize investments
Design Environment
• Deployed disaggregated servers
• More efficient Design compute and storage
• Increased Design throughput using NUMA-Booster
• Faster Design throughput using SSDs
Office and Enterprise Environment
• More efficient Office and Enterprise compute and storage

Disaggregated Server Architecture
The first major server innovation since the introduction of blade servers in 2005

As shown below, Intel IT has developed a disaggregated server architecture. The architecture separates the CPU/DRAM module and the NIC/Drives module on the motherboard. Redesigning the server to be modular enables us to upgrade the CPU/DRAM module while retaining the other components that are not ready for end-of-life. These include fans, power supplies, cables, network switches, drives, add-on module/accelerator, and chassis. The disaggregated server architecture is characterized by a CPU/DRAM complex or module and a NIC/Drives module. These modules can be refreshed independently of each other and of the rest of the server components. We have found that the disaggregated design offers the following benefits:
• No need to replace perfectly good components.
• No need to reinstall the OS.
• Cuts refresh costs by a minimum of 44%.
• Reduces technician time spent on refresh by 77%.
• Decreases refresh materials' shipping weight by 82%.

(Diagrams: example of a 1-socket disaggregated server and example of a 2-socket disaggregated server. Each chassis combines CPU and DRAM modules, I/O modules, accelerator modules, a network switch, a chassis manager, fans, power supplies, and battery packs.)

Disaggregated Server Innovation Reduces TCO and TCE
One of our leading tactics to achieve our MOR goals is to adopt disruptive server technology. To this end, we are deploying disaggregated servers throughout our data centers. It makes little sense to replace an entire light fixture when all that is needed is a more energy-efficient or powerful light bulb. Likewise, replacing an entire server is not necessary when all that is needed is a more advanced CPU and DRAM.

Our disaggregated server architecture has the potential to dramatically change how data centers around the world perform server refreshes. It will lead to significant refresh savings (see Figure 5) and the opportunity to quickly take advantage of the latest compute technology. This technology is already being used in Intel's data centers in Santa Clara, California. These data centers have the world's best power usage effectiveness (PUE) rating of 1.06.

Refresh Savings
• Full Acquisition (baseline, 0% savings): rip and replace the entire system.
• 14x Blade Refresh Only (17% savings): keep the chassis with network switch, power supply, and fans.
• CPU + DRAM Refresh Only (44% savings): replace only the CPU and DRAM.

Figure 5. Refreshing the CPU/DRAM module in a disaggregated server saves at least 44% compared to a full-acquisition server refresh. Based on Intel internal testing, March 2017.

(Photo: disaggregated servers at scale, an example of a 3U chassis with 14 blades.)

The ability to spend less time and money on refreshing servers means Intel IT can afford to refresh faster, bringing the most advanced Intel Xeon processor-based technology into Intel's data centers. We are excited about the resulting opportunities to boost data center efficiency and more effectively power Intel's silicon design jobs. We have deployed more than 310,000 disaggregated servers so far, based on multiple generations of Intel Xeon processors.

In addition to the TCO benefits of 44% lower refresh cost over a full acquisition (rip-and-replace) refresh, reduced provisioning time of 77% and reduced shipping costs, disaggregated servers have total cost to environment (TCE) benefits of an 82% reduction in material shipping weight and significantly reduced e-waste.

Adopting Tiered Storage and Other Storage Techniques
A significant focus on effective utilization in our Design environment has enabled us to improve resource utilization from 46% to more than 75%. Our goal is to reach 80%.

Tiered storage is foundational to meeting our MOR goals. A four-tier approach to storage helped increase the effective utilization of storage resources, improve our performance to SLAs, and reduce TCO for Design storage. The tiers of Design storage servers are based on performance, capacity, and cost.
• Tier-1 servers have the highest performance and the least storage capacity to support tens of thousands of extremely high Network File System operations per second (NFSops) HPC jobs.
• Tier-2 servers offer medium performance but greater storage capacity; these are targeted to support thousands of intermittently high NFSops HPC jobs.
• Tier-3 servers provide lower performance but emphasize capacity.

• Tier-4 servers have the highest capacity but are used for low-frequency access and read-only archived data.
• We have initiated work to automatically tier unused blocks from these higher tiers to an on-premises object storage solution.

Our strategy has been updated to account for the computational scale of the site. This helps us to determine the appropriate performance level required for each tier and improves our ability to meet quality, SLA, and cost targets. Our automated systems monitor file server responsiveness and use that information to regulate the jobs through suspension and ramp controls. At the same time, the automated systems generate and analyze file access patterns to determine which jobs, users and files are experiencing the highest access rates. We selectively use storage QoS to isolate and mitigate the impact of very-high-IOPS workloads.
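The control loop below is a simplified sketch of that idea, not Intel IT's implementation: it polls a file-server latency metric and suspends or resumes batch jobs to keep the shared storage responsive. The latency probe, thresholds, and scheduler interface are hypothetical stand-ins.

```python
# Simplified sketch (not Intel IT's implementation): throttle batch jobs when a
# shared file server becomes slow, and ramp them back up when it recovers.
import random
import time

LATENCY_HIGH_MS = 20.0   # hypothetical threshold: suspend jobs above this latency
LATENCY_LOW_MS = 5.0     # hypothetical threshold: resume jobs below this latency

def probe_nfs_latency_ms() -> float:
    """Hypothetical probe; a real system would time an NFS metadata operation."""
    return random.uniform(1.0, 30.0)

class BatchScheduler:
    """Hypothetical scheduler handle exposing suspend/ramp controls."""
    def suspend_heaviest_jobs(self, count: int) -> None:
        print(f"suspending {count} highest-I/O jobs")

    def resume_suspended_jobs(self, count: int) -> None:
        print(f"resuming {count} suspended jobs")

def regulate(scheduler: BatchScheduler, cycles: int = 5, poll_seconds: float = 1.0) -> None:
    """Poll file server responsiveness and apply suspension/ramp controls."""
    for _ in range(cycles):
        latency = probe_nfs_latency_ms()
        if latency > LATENCY_HIGH_MS:
            scheduler.suspend_heaviest_jobs(count=10)
        elif latency < LATENCY_LOW_MS:
            scheduler.resume_suspended_jobs(count=10)
        time.sleep(poll_seconds)

regulate(BatchScheduler())
```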
We have applied several other storage techniques to further enhance storage efficiency and reduce costs, including scale-out storage, refresh cycles for storage, and data reduction.

Scale-out Storage
We have executed a strategic shift from a fragmented scale-up storage model to a pooled scale-out storage model. Scale-out storage better supports on-demand requests for performance and capacity. In addition, scale-out storage enables transparent data migration capabilities. It also increases the effective utilization of space freed by using storage-efficiency technologies such as deduplication, compression, and compaction. We are performing storage scaling on-demand for read-only storage areas, which require extremely high access rates. We use mount options to increase attribute caching and avoid wasteful locking options on read-only areas. This reduces the storage load by more than 50% and improves job throughput. We have also enabled high-performance shared scratch spaces to meet the demand from our hyperscale EDA compute environment. As we march towards significantly higher compute scale, where the impact of storage overload is becoming more costly, we are shifting our bias towards achieving higher resiliency. This is achieved through increased redundancy and moderation of our storage capacity utilization targets.

Storage Refresh Cycle
To improve performance and reduce costs, we implemented an efficiency-based refresh cycle. This enables us to take advantage of storage servers with better performance and more efficient energy use. This approach has reduced both capital and expense costs. For example, a more energy-efficient server can reduce data center power usage. A more powerful server that replaces several older servers can also reduce our data center footprint. It also helps us deliver better performance for our customers at a similar or lower cost per TB. Over the last few years, our refresh cycle has enabled us to shift from tape-based backup to disk-based backup with newer technology and architecture. This shift has made business continuity and rapid recovery from disaster a reality while reducing the backup cost and enhancing the SLA. We are also using this transition to further reduce our backup footprint. Our approach is to avoid backing up data for which it is more cost effective to regenerate it than to recover it from backup.

Data Reduction
The introduction of new storage to support company growth and our commitment to timely refresh are enabling us to use the latest generation of Intel Xeon processors. These processors provide us with the processing power to handle data deduplication, compaction, and compression on our primary and backup storage servers. They have freed more than 144 PB of capacity, which we are making available for our users.

We continue to work closely with our internal design teams to achieve the following goals:
• Optimize their design flows to reduce the growth rate of their data and IOPS requirements.
• Dynamically adjust the allocations based on usage.
• Over-allocate capacity.

We have historically used efficient scanning algorithms to determine the age of files and then used that data to right-tier entire areas or subdirectories. We are now using block-level transparent data tiering to tier aged data to object storage. We combine the aging information with I/O activity to make more intelligent decisions to remove unused data within three to six months.
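As a rough illustration of age-based right-tiering (not the production tooling), the sketch below walks a directory tree and flags files whose last access or modification is older than a cutoff as candidates for the object/archive tier. The root path and threshold are hypothetical.

```python
# Rough illustration only (not Intel IT's production tooling): flag files unused
# for longer than a cutoff as candidates for a lower (object/archive) storage tier.
import os
import time

AGE_CUTOFF_DAYS = 180          # roughly the "three to six months" window
ROOT = "/proj/example_design"  # hypothetical project area

def find_tiering_candidates(root: str, cutoff_days: int = AGE_CUTOFF_DAYS):
    cutoff = time.time() - cutoff_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                stats = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if max(stats.st_atime, stats.st_mtime) < cutoff:
                yield path, stats.st_size

total_bytes = 0
for path, size in find_tiering_candidates(ROOT):
    total_bytes += size
    print(f"tier-down candidate: {path}")
print(f"~{total_bytes / 1e12:.2f} TB eligible for the object/archive tier")
```

A production system would combine this aging signal with I/O activity data, as described above, before moving or removing anything.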
Increasing Facilities Efficiency
We used our new investment model to evaluate the number of data centers we currently have and the number we should have. The new investment model identified opportunities to reduce the number of data centers using techniques such as the following:
• Closing, retrofitting, or reclassifying data centers and improving efficiency.
• Co-locating local infrastructure with Design and Manufacturing data centers or providing services from a server closet.
• Managing local infrastructure sites remotely.
• Improving facility power efficiency through strategic investments.

We have targeted 32 inefficient data centers since 2011. Our efforts have eliminated 66,375 square feet and converted 23,609 square feet of data center space to low-cost infrastructure rooms. This has saved Intel USD 25.45 million annually.

Figure 6 shows how we have consolidated our data center facilities from 2003-2022. We have reduced the total square footage by up to 21% and reduced the number of data centers from 152 to 54. Simultaneously, we increased our data center compute capacity and commissioned power by up to 108%, from 50 MW to 105 MW, over the last ten years. From 2012-2022, we have saved over 1.3 billion kWh compared to industry-standard data centers.

(Figure 6 chart: 2003-2022 data center modules, showing square footage, commissioned power, and total data center modules by year: roughly a 21% decrease in square footage and a 108% increase in commissioned power since 2003, with modules reduced from 152 to 54.)

Figure 6. Innovative data center designs have enabled us to decrease data center square footage while increasing power density and capacity.

Driving Network Efficiency
Data center growth is continually placing greater demands on Intel's network. In response, in 2010 Intel IT began to convert our data center network architecture to 10 GbE connections. Around 2015, we introduced 40 GbE to meet the inter-switch link capacity demand. In 2019, we started a multiyear journey to make 100 GbE pervasive within our data centers to keep up with the demand. Figure 7 illustrates the growth in data center network port deployments.

(Figure 7 chart: 10/40/100 GbE port deployments by year, 2010-2022, with 40 GbE introduced in 2015 and 100 GbE introduced in 2017.)

Figure 7. Implementing 10/40/100 GbE data center fabric design accommodates current capacity growth.

Data Center Evolution at Intel
Driving Up Density While Driving Down Power Usage Effectiveness (PUE)

Gen 1 (1990s): Characterized by forced chilled air from the ceiling and no hot/cold air segregation, these early data centers could accommodate 42U racks with a power consumption of 5 kW, resulting in a PUE of >2.0. Data centers that used chilled air from the row end had a PUE of ~1.40.

Gen 2 (early to mid-2000s): With improvements such as raised-floor forced chilled air or hot/cold air segregation including chimney racks, density stayed at 42U, but power consumption delivered to the racks increased to as much as 30 kW, resulting in a lower PUE of ~1.18.

Gen 3 (2013 and beyond): Our modern data centers use free air cooling or close-coupled evaporative cooling to achieve an industry-leading PUE of 1.06, with an extreme rack density of 60U and up to 43 kW/rack.

Read More: Extremely Energy-Efficient High-Density Data Centers White Paper



To meet today's scale and capacity demand, we are now migrating the data center architecture to a leaf-spine architecture. We are also transitioning our switch interconnects to 100 GbE and multi-100 GbE. Our new 100 GbE data center fabric design accommodates our current annual network capacity growth of more than 30%.

In 2022, we increased our 100 GbE capacity from 52,476 to 78,561 ports (see Table 3). All the switch interconnects are being migrated to 100 GbE going forward. However, 40 GbE and 10 GbE will continue to be a key part of the data center technology. We currently have deployed about 217,558 10 GbE ports.

Table 3. 100 GbE Port Count Growth
Year: 100 GbE Port Count (Annual Growth Rate)
2017: 520
2018: 1,619 (211%)
2019: 3,758 (132%)
2020: 17,842 (374%)
2021: 52,476 (194%)
2022: 78,561 (49.7%)

In addition to increasing the network capacity, we have also increased the effective utilization of network ports over the last 13 years from 40% to 68% (a 1.7x increase). Higher utilization means we do not have to purchase additional ports to meet network capacity demand growth. Figure 8 shows the continual increase in port utilization.

(Figure 8 chart: effective 10 GbE port utilization by year, rising from 40% in 2010 to 68% in 2022; higher is better.)

Figure 8. Effective utilization of network ports has increased by up to 1.7x between 2010-2022.

We are also focusing on improving data center stability. In the past, we used a large installation of layer 2-based technology. We have migrated to a layer 3-based network. This new architecture enables us to use all available bandwidth on primary and secondary paths at the same time. Therefore, we can use our network capacity more effectively. We are also able to eliminate the spanning-tree protocol within our data centers; this protocol does not scale well for large networks. Using layer 3-based, scalable architecture within Intel's data center lets us plan for scale and resiliency. Also, we are using other technologies such as overlay, multi-chassis link aggregation and tunneling to extend layer 2 across data centers, over the layer 3 topology.

Due to the scale of the data center and new landings, we made zero-touch provisioning and automation a key part of the new architecture. With the new simplified modular design, each key building block has been converted into a module of the automation system. This approach allows us to provision the network within minutes with minimum effort. In addition, we can maintain consistency across the network and investigate anomalies.

We tend to adopt higher-speed network technology almost as soon as it is available in the market. We started adoption of 40 GbE in data centers in 2015 and adoption of 100 GbE technology in 2017, to keep pace with network demand.

(Graphic: Early Adoption. Intel IT adopts higher-speed network technology in its data centers almost as soon as it is available in the market.)

In 2015, we also made two key architecture changes within Design data centers. We reduced the oversubscription through the infrastructure and shifted from chassis-based switches to fixed form-factor switches for better cost and upgrade efficiency.

With this move, we reduced the oversubscription from 8:1 to 6:1 on the compute side and from 8:1 to 3:1 on the file server side. Over the same period, we transitioned 70% of our Design data centers to fixed form-factor switches using a modular design. Now, with the new leaf-spine architecture, we have maintained the same level of oversubscription ratios even though the file servers are transitioning to 40 GbE. This is possible by using 8x 100 GbE interconnect links and 16x 100 GbE spine-to-universal-spine links.
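For reference, an oversubscription ratio is simply aggregate server-facing (downlink) bandwidth divided by aggregate uplink bandwidth on a switch. The worked example below uses hypothetical port counts chosen only to reproduce a 6:1-style ratio; it is not a description of Intel's switch configurations.

```python
# Worked example of an oversubscription ratio. Port counts are hypothetical.

def oversubscription(downlink_ports: int, downlink_gbps: float,
                     uplink_ports: int, uplink_gbps: float) -> float:
    """Aggregate downlink bandwidth divided by aggregate uplink bandwidth."""
    return (downlink_ports * downlink_gbps) / (uplink_ports * uplink_gbps)

# For example, 48 x 10 GbE server ports with 2 x 40 GbE uplinks: 480 / 80 = 6:1.
ratio = oversubscription(48, 10, 2, 40)
print(f"oversubscription ratio: {ratio:.0f}:1")
```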

Achieving More Efficient Design Compute and Storage
One of the major challenges in our Design environment is that server and storage growth is occurring at a high rate. The average annual growth rate of compute capacity demand over the last 13 years is up to 32%, while storage has grown annually at up to 39% (see Figure 9).

We expect the number of cores to continue to increase. We plan to measure data center performance based on number of cores, number of racks, power consumed, and the extent to which we meet the meaningful indicator of performance per system (MIPS) demand.

(Figure 9 chart: Design compute and storage demand, 2009-2022, plotting Design servers, EDA-MIPS, cores, and raw storage in PB. Compute demand grew up to 32% per year and raw storage up to 39% per year from 2010-2022. High-density racks of 140-180 servers were introduced in 2013, and disaggregated servers at 280 servers per rack in 2016.)

Figure 9. Despite continuing growth in compute and storage demand, our Design data centers are using powerful Intel technology to meet demand.

6th Generation of HPC
Designing Intel® microprocessors is compute intensive. Tapeout is the last stage in silicon design, and its computation demand is growing exponentially for each generation of silicon process technology. Intel IT adopted HPC to address this large computational scale and realized significant improvements in computing performance, reliability, and cost.

As shown in Figure 10, our HPC solution has enabled up to 519x growth in tapeout compute capacity from 2005 to 2022. We are now using the 6th generation of our HPC solution and will continue to develop new HPC generations as Intel process technology advances. The figure also shows our commitment to quality. Through a disciplined approach to change management (running our data centers as if they are factories), we have reduced the number of compute issues that impact tapeout by 322x.

(Figure 10 chart: Intel tapeout computing metrics, 2005-2023, showing normalized tapeout processing capacity (higher is better) rising up to 519x across HPC generations from Pre-HPC through HPC-1 (45nm), HPC-2 (32nm), HPC-3 (22nm), HPC-4 (14nm), HPC-5 (10nm), and HPC-6 (Intel 4/Intel 3), and compute issues impacting tapeout (issues per 1,000 masks, lower is better) falling by 322x.)

Figure 10. Our HPC solution, combined with disciplined change management, has steadily increased compute capacity and improved QoS.

Increased Design Throughput Using NUMA-Booster
Overall data center optimization includes more than simply looking at server performance and facility efficiency. Application performance and workload optimization can also be contributing factors. We developed a system software capability called NUMA-Booster. This feature automatically and transparently intercepts all Design workloads running on two-socket batch servers and performs workload scheduling better than the default OS scheduling capability. Our tests have shown an average 17% improvement in Design performance on these two-socket servers. We are also deploying large-scale single-socket servers when possible. These servers do not need the NUMA-Booster feature and can further increase overall Design performance.
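NUMA-Booster is an internal Intel IT capability; the sketch below is not its implementation, but it shows the general idea of NUMA-aware placement on a two-socket server: launch each batch job with its CPUs and memory bound to a single NUMA node rather than letting the default OS scheduler spread them across both sockets. The job command lines are hypothetical.

```python
# Illustrative sketch of NUMA-aware job placement on a two-socket server.
# Not the NUMA-Booster implementation; job names below are hypothetical.
import itertools
import subprocess

NUMA_NODES = (0, 1)                 # a two-socket server, one NUMA node per socket
_node_picker = itertools.cycle(NUMA_NODES)

def launch_numa_bound(cmd):
    """Start a job with its CPUs and memory confined to a single NUMA node."""
    node = next(_node_picker)
    numa_cmd = ["numactl", f"--cpunodebind={node}", f"--membind={node}", *cmd]
    return subprocess.Popen(numa_cmd)

# Example: two hypothetical EDA batch jobs, each pinned to its own socket so
# neither pays cross-socket memory-access penalties.
procs = [launch_numa_bound(["./eda_batch_job", f"--input=design_{i}.db"]) for i in (0, 1)]
for proc in procs:
    proc.wait()
```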

Increased Design Throughput Using SSDs as Fast Local Data Cache Drives
Intel silicon chip Design engineers face the challenge of integrating more features into ever-shrinking silicon chips, resulting in more complex designs. The increasing design complexity creates large electronic design automation workloads that have considerable memory and compute requirements. We typically run the workloads on servers that need to be configured to meet these requirements in the most cost-effective way.

Intel IT has deployed over 40 PB of SSD storage in over 20,000 servers as fast local data cache drives. This approach improves workload performance due to reduced network traffic and storage demand.

Optimizing Servers to Meet Compute Demand
Intel silicon design is continually increasing in complexity. To achieve concomitant faster time-to-market improvements, Intel IT provides a global framework for parallel hardware and software design of numerous System on a Chip platforms and IP blocks.

Matching single-socket servers and highly scalable server configurations in our data centers yields 25 to 30% faster product design and architecture validation processes. We use a global scheduling mechanism that pools the compute capacity of over 358,000 servers at multiple sites around the world. In this way, our design hub provides scalable capacity and delivers optimal memory and compute capability in a shorter amount of time.

Since the first disaggregated server design in 2016, we have continued to evolve the concept. We currently have deployed more than 310,000 disaggregated servers, using 13 different blade designs including both single-socket and two-socket servers. We use the Intel Xeon processor E family, Intel Xeon processor W family and Intel Xeon Scalable processors. The various models are targeted to meet specific workload requirements, such as different memory capacity, throughput or number of performance cores, high bandwidth, high IOPS storage needs, or the ability to add accelerator cards on demand.

Enhanced User Experience across the Global HPC Design Community
We were able to successfully consolidate batch activities into our global compute hubs. Further consolidation was limited by the following:
• We did not want to negatively impact user experience for interactive users across the globe.
• We needed to provide local copies of critical data rapidly over the high-latency international WAN links.

We are now able to provide a game-changing remote interactive computing user experience by using User Datagram Protocol (UDP) instead of Transmission Control Protocol (TCP) for interactive jobs over the WAN. Using UDP has provided up to 4.5x faster response for computer-aided design (CAD) modeling.³ We have reached the stage where our international design team members have better user experience and higher throughput when working from home with systems in the US hubs than from their local data centers. We also delivered up to 9x improvement in data transfer rates across the WAN through in-depth collaboration with internal and external technology experts.³ This collaboration optimized the TCP stack, which can take full advantage of high-speed WAN links. The interactive computing and data replication improvements were achieved within existing WAN bandwidth. Combined, these achievements enable us to provide rapid turnaround through the hubs for the model build, design synthesis, layout and tapein cycle.

³ According to internal Intel IT measurements, February 2020.

Design Environment Improvement Examples
Efficiency improvements and cost savings from 2010 through 2022

Computing: Intel IT innovations in the Design computing data center include disaggregated server innovation (44% savings during refresh); the NUMA-Booster solution (17% higher performance); SSDs (27% higher capacity at lower cost); faster servers (35% higher performance); and single-day dock-to-production deployment and procurement efficiency.

Storage: We have implemented Design computing data center storage efficiency improvements by adopting innovative technology capabilities and increasing utilization.

Network: We adopted a multi-vendor strategy for our Design computing data center network, combined with a focus on reducing the expensive maintenance costs associated with older equipment. As we adopt 100 GbE, we are focusing on Intel® Silicon Photonics-based optics because that technology has a significant cost advantage over laser-based optics.

Design Zones Enable Highly Resilient In 2019, we brought additional existing virtualized workloads,
Scaling at the Hubs
The dramatic increase in computing scale in a shared network-attached storage (NAS) environment with tens of thousands of compute servers can overwhelm the storage server. It can also introduce significant efficiency and reliability concerns when 10,000 or more such systems share the same Network File System (NFS) area and expect extremely high IOPS or throughput rates. We addressed this in our mission-critical tapeout environment, which runs parallel workflows that span the entire compute environment. We introduced the concept of partitioning the compute in the two major hubs into smaller, self-contained sites, each with its own NFS storage and management infrastructure. We worked with our tapeout team to update the tools, flows, and work methods, along with IT software. As a result, we were able to scale while maintaining efficiency and improving resiliency and scalability.

We later experienced the same scaling challenges for the rest of the HPC design environment in the hub. These issues were caused by increased sharing at a higher scale and could not be addressed cost-effectively or efficiently by storage changes alone. We built on the tapeout "sites" concept to introduce design zones into the design hub computing environment. We successfully scaled multiple zones and achieved adequate separation to provide the necessary increased scale and reliability in a cost-effective manner. This is a challenging and ongoing effort: we must contend with decades' worth of legacy interdependencies across projects and business units, expressed through symbolic links and shared source files, tools, and flows. We expect that the profiling work we are doing (illustrated in the sketch following Figure 11), combined with our efforts with containers, will enable us to achieve truly independent, scalable, and resilient zones without sacrificing efficiency or the agility to respond to peak computing demands.

More Efficient Office and Enterprise Compute and Storage
Like our Design environment, the compute and storage demand in our Office and Enterprise environment is also growing quickly. Nevertheless, as shown in Figure 11, we continue to meet that demand while holding the number of physical servers steady over the last three years. From 2009 to 2017, we achieved an approximately 19x increase in the number of virtual OS instances. We also greatly increased average VM density per physical server, from 11 VMs in 2009 to 30 VMs in 2017, due to improved server platforms. In 2018, we implemented an aggressive VM reclamation strategy that reduced our VM count by about 5,400. New workloads that were more cost effective to deploy on cheaper physical platforms than on a virtualized platform led to an increase in physical server counts. In 2019, we integrated existing VMs, workloads, and hosts into our private cloud environment for centralized management, increasing the footprint by up to 1.77x. Process improvements and enhanced automation led to additional savings, and we are now deploying performance-based VMs.

[Figure 11 chart: Office and Enterprise Compute and Storage Demand, 2009-2022, plotting physical servers, VM hosts, virtual OS instances, and raw storage (PB). Highlights: 8.2x raw storage increase, 20.4x virtual OS instance increase, and ~1.5x VM density increase from 2009 to 2022.]

Figure 11. A high rate of virtualization combined with Intel architecture has enabled us to meet growing Office and Enterprise compute and storage demand while significantly decreasing the number of virtualization host servers.

[Photo: 14 disaggregated servers in a 6U blade chassis with integrated network switch.]
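The zone-profiling work mentioned under "Scaling at the Hubs" hinges on finding legacy interdependencies, such as symbolic links that resolve outside a zone. The short sketch below is only a simplified illustration of that idea, not Intel IT's actual tooling; the zone root paths and the report format are hypothetical placeholders.

#!/usr/bin/env python3
"""Illustrative only: report symbolic links that point outside their design zone."""
import os
from pathlib import Path

# Hypothetical zone roots; real project trees would be discovered from inventory data.
ZONE_ROOTS = {
    "zone_a": Path("/proj/zone_a"),
    "zone_b": Path("/proj/zone_b"),
}

def cross_zone_links(zone_root: Path):
    """Yield (link, target) pairs where a symlink resolves outside its own zone root."""
    root = zone_root.resolve()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            p = Path(dirpath) / name
            if p.is_symlink():
                target = Path(os.path.realpath(p))
                if root not in target.parents and target != root:
                    yield p, target

if __name__ == "__main__":
    for zone, root in ZONE_ROOTS.items():
        for link, target in cross_zone_links(root):
            print(f"{zone}: {link} -> {target}")

An inventory like this makes it possible to decide which cross-zone links should be replicated into a zone, re-pointed, or packaged into containers before that zone can be treated as truly self-contained.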

Summary of Results from 2010 to 2022
Our strategic approach has enabled us to deliver a data center infrastructure best suited to meet our complex and ever-increasing compute needs while transforming our cost structure. By applying the innovative data center techniques listed in this paper, we have achieved unit-cost levels that are significantly lower than if we were to host our workloads on public cloud infrastructure (Figure 12). Our workloads and our ability to achieve high server utilization are particularly well suited to private cloud investment.

Over the 13-year period from 2010 to 2022, we have garnered combined capital and operational savings in excess of USD 7.5 billion, which help fuel our continuous innovation cycle.

[Figure 12 chart: Design Relative Unit-Cost Comparison, internal hosting versus external hosting, 2010-2022 (relative unit cost, lower is better), showing a 3.36x unit-cost decrease using internal hosting over the period.]

Figure 12. Unit cost including servers, storage, network, and operational costs shows private cloud hosting of our data center workloads is significantly less expensive than if we use public cloud services.

Reducing Unit Costs
The graphs on page 17 detail how our budget has remained relatively flat while unit growth has continued to rise in both the Design and Office and Enterprise environments. Our investment model has enabled us to reduce unit costs in the Design environment by 91% and in the Office and Enterprise environment by 86%.

Before implementing our data center strategy, we spent a third or more of our Design environment budget on facilities and only a quarter or less on servers, yet the servers are what power Intel's business success. Our new investment model has enabled us to reverse that ratio: we now spend only ~16% on facilities and more than 50% on servers. A similar transformation has occurred in the Office and Enterprise environment, with much of the growth driven by newer analytics and security workloads.

Summary of Best Practices
Over the last decade, we have made many strategic investments and developed solutions to enable our data centers to be more efficient and to better serve the needs of Intel's business. We are now applying our MOR approach across our entire infrastructure stack: compute, storage, networking, and facilities. Page 18 provides a summary of the best practices we have developed and the business value they have generated.
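The unit-cost comparisons above (Figure 12 and the Reducing Unit Costs discussion) reduce to a simple ratio: everything spent on a capability, divided by the units of service output it delivers, tracked over time and across hosting options. The sketch below is a minimal, hypothetical illustration of that bookkeeping; the cost categories mirror the paper, but the numbers and the service definitions are invented for the example and are not Intel IT data.

"""Minimal unit-cost model sketch. All figures are illustrative, not Intel IT data."""
from dataclasses import dataclass

@dataclass
class ServiceCost:
    facilities: float    # annual facility cost
    servers: float       # server capital and refresh cost
    storage: float       # storage and backup/restore cost
    network: float       # LAN/WAN cost
    operations: float    # headcount, OS, and management cost
    output_units: float  # units of service output delivered (e.g., normalized compute)

    def unit_cost(self) -> float:
        """Cost of one unit of service output."""
        total = (self.facilities + self.servers + self.storage
                 + self.network + self.operations)
        return total / self.output_units

# Hypothetical internal (private cloud) vs. external (public cloud) figures.
internal = ServiceCost(facilities=1.6, servers=5.0, storage=1.5, network=0.8,
                       operations=1.1, output_units=1000)
external = ServiceCost(facilities=0.0, servers=12.0, storage=3.0, network=1.0,
                       operations=2.0, output_units=1000)  # bundled cloud charges, illustrative

print(f"internal unit cost: {internal.unit_cost():.4f}")
print(f"external unit cost: {external.unit_cost():.4f}")
print(f"relative (external / internal): {external.unit_cost() / internal.unit_cost():.2f}x")

Tracking the same ratio year over year, rather than total spend alone, is what produces relative comparisons such as the 91% and 86% unit-cost reductions cited above.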

2010-2022: Intel IT Data Center Strategy Results
[Results dashboard, with one set of panels for the Design environment and one for the Office and Enterprise environment:
• Environment Growth (relative performance, indexed to 2010): 25x growth in the Design environment and ~4x growth in the Office and Enterprise environment.
• Total Spend vs. Per Unit Cost (relative spending and cost, indexed to 2010): a 91% total per-unit-cost decrease in the Design environment and an 86% total per-unit-cost decrease in the Office and Enterprise environment.
• Total Cost Breakdown by year across facilities, servers, storage/BaR, network, headcount, and OS/management, as a percent of total cost.
• Exploded View of Change in Spending Type, 2010 vs. 2022, comparing facilities, servers, headcount, and other costs.]

2010-2022
Intel IT Data Center Best Practices
Servers
Adopt disaggregated servers
• Saves at least 44% over a full-acquisition (rip-and-replace) refresh
• Reduces provisioning time (IT technician labor) by as much as 77%
• Decreases shipping weight of refreshed server material by 82%

Adopt elastic computing services and technologies
• Virtualized most of the Office and Enterprise servers
• Reduced the time it takes to provision a server from 90 days to on-demand provisioning using virtualization
• Enabled containers as a service

Enable one-day dock-to-production for physical servers
• Upfront planning and process enhancements to order long-lead-time items and ensure rack readiness, reducing the dock-to-production release from 10+ days to one day

Regularly refresh servers using the latest Intel® Xeon® processors
• Virtualization ratios of up to 60:1 and optimal workload-to-platform pairing
• Reduced Design environment energy consumption by 10% annually between 2008 and 2013

Deploy SSDs as the standard local disk in all new servers
• Improved performance for I/O-intensive workloads and expected reduction of disk failure rates

Migrate applications from RISC to Intel® architecture
• Enabled significant savings and IT efficiencies
• Allowed us to realize the benefits of industry-standard operating systems and hardware

Deploy HPC
• 519x increase in capacity during HPC-6, with a 322x increase in stability
• Saved USD 44.72 million net present value during HPC-1 itself

Enhance server performance through software optimization
• Increased Design job throughput up to 49%
• Delivered various optimizations including disaggregated servers, NUMA-Booster, fast local data cache based on SSDs, and high-frequency servers
• Significant performance improvement of data replication (up to 9x) and interactive jobs (up to 4.5x) over the WAN (Internal Intel IT, February 2020)

Storage
Refresh and modernize storage using the latest generations of Intel Xeon processors
• Take advantage of innovative technology to increase storage capacity, quality, velocity, and efficiency at a lower cost
• More than twice the I/O throughput of older systems
• Reduced our data center storage hardware footprint by more than 50% in 2011-2012
• Reduced backup infrastructure cost due to greater sharing of resources
• Tiered backup solutions to optimize backup costs and improve reliability

Right-size storage solutions using a tiered model
• Provide storage resources based on business needs: performance, reliability, capacity, and cost
• Better management of storage costs while still enabling easy access to necessary data
• Transition to scale-out storage to reduce operational complexity in shared scratch spaces

Implement thin provisioning and deduplication for storage resources
• Helps control costs and increase resource utilization without adversely affecting performance
• Increased effective storage utilization in Design from 46% in 2011 to more than 75% now

Automatically down-tier inactive blocks while monitoring and reclaiming unused data
• Policy-based down-tiering of blocks that have not been recently accessed, to rapidly and automatically reduce capacity demand for high-performance storage
• Continuously monitor and delete transient (non-IP) data that has not been accessed for 6 months or more, based on customer expectations

Scale storage on demand and provide high-performance data tiering
• Automated policy-based data migration between tiers
• Enables higher workload throughput for read-only storage areas that require high access rates
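As a rough illustration of the reclamation policy described above (flagging transient data that has not been accessed for six months or more), the sketch below walks a scratch area and reports stale files. It is a simplified stand-in rather than Intel IT's tooling: the path and the 180-day threshold are hypothetical placeholders, and it only reports rather than deletes.

"""Illustrative stale-data report for a scratch area. Path and threshold are placeholders."""
import os
import time
from pathlib import Path

SCRATCH_ROOT = Path("/scratch/projectX")   # hypothetical shared scratch space
STALE_AFTER_DAYS = 180                     # "not accessed for 6 months or more"

def stale_files(root: Path, stale_after_days: int):
    """Yield (path, days_since_access, size_bytes) for files past the access threshold."""
    cutoff = time.time() - stale_after_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                st = path.stat()
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if st.st_atime < cutoff:
                days = int((time.time() - st.st_atime) // 86400)
                yield path, days, st.st_size

if __name__ == "__main__":
    total_bytes = 0
    for path, days, size in stale_files(SCRATCH_ROOT, STALE_AFTER_DAYS):
        total_bytes += size
        print(f"{path}  last accessed {days} days ago  {size} bytes")
    print(f"Reclaimable (reported only): {total_bytes / 1e12:.2f} TB")

A production policy would also honor per-customer retention expectations and could feed the same access metadata into automated down-tiering instead of deletion.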

Network
Upgrade data center LAN architecture to support 10/40/100 GbE
• Increased data center network bandwidth by 400% over three years, enabling us to respond faster to business needs and accommodate growth
• Increased network utilization from 40% to 68% between 2010 and 2022
• Eliminated spanning tree with multi-chassis link aggregation and Layer 3 protocols
• Reduced network complexity due to fewer NIC and LAN ports
• Reduced network cost in our virtualized environment by 18 to 25%

Open the data center network to multiple suppliers
• Generated more than USD 60 million in cost avoidance over five years with new network technology

Deploy Intel® Silicon Photonics optical transceivers
• For large-scale 100 GbE deployment, leveraged Intel® Silicon Photonics to significantly reduce the per-port cost

Facilities
Increase cooling efficiency
• From 2012 to 2022, we saved over 1.3 billion kWh compared to industry-standard data centers

Use a tiered approach to redundancy, availability, and physical hardening
• Better matching of data center redundancy and availability features to business requirements
• Reduced wasted power by more than 7% by eliminating redundant power distribution systems within a data center

Retrofit and consolidate data centers using a modular design
• Retrofitted an old wafer fabrication plant into high-density, high-efficiency data center modules with an industry-leading PUE of 1.06
• Utilized free-air cooling and environmentally efficient evaporative cooling for maximum energy efficiency
• Avoided capital expenditures by not equipping the entire facility with generators
• Quickly responded to changing data center needs with minimal effort and cost
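PUE (power usage effectiveness) is the ratio of total facility energy to the energy consumed by the IT equipment itself, so a PUE of 1.06 means only about 6% of the facility's energy goes to cooling, power distribution, and other overhead. A minimal illustration of the arithmetic, using made-up energy figures rather than measured Intel IT data:

"""PUE arithmetic with illustrative numbers (not measured Intel IT data)."""
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    return total_facility_kwh / it_equipment_kwh

it_load = 10_000_000   # hypothetical annual IT equipment energy, kWh
overhead = 600_000     # hypothetical cooling and power-distribution overhead, kWh
print(f"PUE = {pue(it_load + overhead, it_load):.2f}")  # -> PUE = 1.06

# For comparison, a facility with the same IT load but a PUE of 1.8
# would draw 10,000,000 * 1.8 = 18,000,000 kWh in total.

Differences of this size, multiplied across many megawatts and years of operation, are what drive facility-level savings such as the 1.3 billion kWh figure cited above.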

Intel IT MOR Tactics: Embrace Disruptive Servers | Adopt Tiered Storage | Drive Network Efficiency | Increase Facility Efficiency | Improve Operational Efficiency

Plans for 2023 and Beyond
Our data center strategy is continuously improving. We are always striving to close the gap between current achievements and the best possible scenario. To that end, we plan to continue to apply the MOR approach to our primary enabling tactics:

• Embrace disruptive servers. Deploy ultra-dense, power-optimized disaggregated server nodes to reduce data center space and power consumption for computing needs.
• Adopt standards-based storage. Use industry-standard hardware and software for scale-up and scale-out storage to take advantage of the latest hardware. This will enable higher throughput more rapidly. Use strategic planning and storage protection technologies to deliver both backup and disaster-recovery coverage while reducing backup cost. Enhance automation to achieve fully autonomous performance and capacity management while providing greater visibility and control to our customers.
• Increase facilities efficiency. Use techniques such as higher ambient temperatures at specific data center locations to take advantage of newer equipment specifications, which will help reduce cooling needs.
• Drive network efficiency. Continue to drive LAN utilization toward 75% and pursue software-defined networking to support agile, ultra-high-density data center designs. Continue to migrate to 100 GbE with Intel® Silicon Photonics optics where appropriate and cost-effective, to meet network capacity demands. Drive automation deeper into our day-to-day work.
• Improve operational efficiency. Increase telemetry within the data center to improve operational efficiency.

Conclusion
We are committed to providing a foundation for continuous innovation that will improve the quality, velocity, and efficiency of Intel IT's business services. To that end, we have refined our data center strategy, building on the practices established over the last decade. Our refined data center strategy has created new business value exceeding USD 7.5 billion from 2010 to 2022. Our data center transformation strategy is critical for Intel IT to stay competitive.

Key achievements include the following:
• Our breakthrough disaggregated server design allows independent refresh of the CPU and memory without replacing other server components. This new design results in faster data center innovation and a minimum of 44% cost savings compared to a full-acquisition refresh. Along with this TCO reduction, the disaggregated server innovation enables significant TCE reduction (82% of the material weight in a new server is eliminated with just a CPU-complex upgrade).
• One-day dock-to-production for new physical server deployment in our data center hub.
• We developed a system software capability called NUMA-Booster, which has saved millions while delivering additional usable server capacity (a simplified illustration of NUMA-aware placement follows this list).
• We deployed more than 40 PB of SSDs as fast local data cache drives. This increased workload performance due to lower network traffic and storage demand.
• Six generations of HPC in our design computing environment created a 519x capacity increase and a 322x quality improvement.
• We adopted new storage capabilities like deduplication and compression, accelerated storage refresh, focused on increasing utilization, removed unneeded data, and implemented policy-based tiering. All of these have yielded additional usable capacity from our storage while reducing cost and providing higher performance.
• We deployed more than 78,000 100 GbE network ports, 14,000 40 GbE network ports, and 217,000 10 GbE network ports.
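NUMA-Booster itself is an internal Intel IT capability and its implementation is not described in this paper. As a rough illustration of the underlying idea only (keeping a job's threads and memory on a single NUMA node so it avoids remote-memory penalties), the sketch below wraps a command with numactl; the round-robin node-selection policy is purely hypothetical.

"""Illustrative NUMA-aware job launcher (not NUMA-Booster). Requires Linux with numactl installed."""
import glob
import itertools
import subprocess
import sys

def numa_nodes():
    """Discover NUMA node IDs from sysfs (e.g., /sys/devices/system/node/node0)."""
    return sorted(int(p.rsplit("node", 1)[1])
                  for p in glob.glob("/sys/devices/system/node/node[0-9]*"))

_node_cycle = itertools.cycle(numa_nodes() or [0])

def launch_on_node(cmd):
    """Bind a command's CPUs and memory to one NUMA node (simple round-robin policy)."""
    node = next(_node_cycle)
    full_cmd = ["numactl", f"--cpunodebind={node}", f"--membind={node}", *cmd]
    return subprocess.Popen(full_cmd)

if __name__ == "__main__":
    # Example: python numa_launch.py ./design_job --args...
    proc = launch_on_node(sys.argv[1:])
    sys.exit(proc.wait())

The best-practices summary on page 18 credits NUMA-Booster, alongside SSD-based local caching and high-frequency servers, with Design job throughput gains of up to 49%.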


We have achieved these results by running Intel data centers like a factory, implementing change in a disciplined manner, and applying breakthrough technologies, solutions, and processes. Transformational elements of our data center strategy include:

• A focus on three primary KPIs. These metrics enable us to measure the success of data center transformation: meet growing customer demand (SLAs and QoS) within constrained spending targets (remaining cost-competitive) while optimally increasing infrastructure asset utilization (asset efficiency).
• Stimulating bolder innovation by changing our investment model. Comparing our current capabilities to a "best achievable model" encourages us to strive for innovation that will transform our infrastructure at a faster rate than if we sought only incremental change.
• A new unit-costing financial model. This model enables us to better assess our data center TCO based on the business capabilities our infrastructure is supporting. It measures the cost of a unit of service output and enables us to compare investments and make informed trade-off decisions across business functions, maximizing ROI and business value.

Acronyms
DOME      Design, Office, Manufacturing, Enterprise
EDA-MIPS  electronic design automation MIPS
HPC       high-performance computing
KPI       key performance indicator
LAN       local area network
MIPS      meaningful indicator of performance per system
MOR       model of record
NFS       Network File System
NFSops    Network File System operations per second
NIC       network interface card
NUMA      non-uniform memory access
PUE       power usage effectiveness
QoS       quality of service
ROI       return on investment
SLA       service-level agreement
TCE       total cost to environment
TCO       total cost of ownership
VM        virtual machine
WAN       wide area network

Related Content
If you liked this paper, you may also be interested in these:
• Fuel Cells: An Alternative Energy Source for Intel's Data Centers white paper
• Green Computing at Scale white paper
• Intel Takes On E-Waste With Disaggregated Servers blog

For more on Intel IT best practices, visit intel.com/IT.

IT@Intel
We connect IT professionals with their IT peers inside Intel. Our IT department solves some of today's most demanding and complex technology issues. We want to share these lessons directly with our fellow IT professionals in an open peer-to-peer forum.

Our goal is simple: improve efficiency throughout the organization and enhance the business value of IT investments. Follow us and join the conversation on Twitter or LinkedIn. Visit us today at intel.com/IT or contact your local Intel representative if you would like to learn more.

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex. Performance results are
based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration
details. No product or component can be absolutely secure. Cost-reduction scenarios described are intended as examples of how
a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings.
Circumstances will vary. Intel does not guarantee any costs or cost reduction. Your costs and results may vary. Intel technologies may
require enabled hardware, software or service activation. © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks
of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. Copyright 2023 Intel
Corporation. All rights reserved. 0523/WWES/KC/PDF
