IBM Systems                                                   June 2019
White Paper
Implement high-performance
object storage with MinIO
and IBM
Achieve robust performance for AI, IoT and more using MinIO
and IBM Power Systems servers with POWER9 processors
Executive summary
Object storage presents several important benefits for accommodating fast-growing volumes of unstructured data. With the right object storage solution and hardware infrastructure, organizations can also achieve the robust performance required for supporting computationally intensive workloads, including artificial intelligence (AI)/machine learning, Internet of Things (IoT), and big data analytics.

Recent benchmark testing shows that MinIO object storage running on IBM Power Systems servers with IBM POWER9 processors can deliver exceptional throughput performance (up to 25 GB/s in aggregate for four servers) plus linear scalability as clusters grow. That level of performance enables organizations to unlock the full value of their data while also capitalizing on the scalability, accessibility, data protection, and cost-effectiveness of object storage.

Launching data-intensive initiatives
Across industries, organizations are launching new technology initiatives that require them to store, access, and analyze large, fast-growing volumes of data. Whether they are implementing artificial intelligence (AI)/machine learning, capitalizing on Internet of Things (IoT) technology, or employing other big data solutions, these organizations might need to store and analyze tens, or even hundreds, of petabytes of data.

Much of that data is unstructured. From multimedia files and text documents to web pages and log files, unstructured data can be difficult to query, making it challenging for organizations to work with all of the data they are collecting. Traditional hierarchical file storage systems and block storage are not the best fit for these unstructured data volumes.

Object storage offers an important alternative to file- and block-based storage for big data, as proven by organizations with hyperscale environments. Object storage provides the right combination of cost-effective scalability, data integrity, and accessibility that many organizations need. It also offers the flexibility to disaggregate storage from compute resources, enabling organizations to optimize compute and storage for specific workflows. As a result, object storage is fast becoming the default storage option for these organizations.

Using the right hardware infrastructure, object storage can also provide a fundamentally different performance profile than other types of storage, enabling organizations to implement new use cases and launch more ambitious projects. High-performance object storage can support workloads ranging from training AI algorithms to analyzing IoT data. Running MinIO object storage with IBM Power Systems servers based on IBM POWER9 processors can deliver this level of performance, opening important opportunities for enterprises deploying workloads in private cloud or multicloud environments.

Recognizing the advantages of object storage
For storing large, rapidly expanding volumes of unstructured data, object storage can present your organization with several advantages over more traditional file- or block-based storage.

Scalability
Object storage is designed to scale. Instead of the nested files and folders used by hierarchical file systems, object storage uses a flat structure. That structure enables you to store billions of files without the complexity and performance issues that can develop as you scale hierarchical environments. Object storage also lets you scale incrementally: you can scale performance or capacity simply by adding racks of clusters.

Fast retrieval
With MinIO object storage, each object has metadata and uses the URL as a unique identifier. These tags and ID numbers help eliminate the need to know the exact location of data within the storage environment. Every object is accessible from anywhere through its unique URL; only standard IP routing and DNS mechanisms are required. The right object storage solution can also avoid the bottleneck of a centralized metadata server, storing the metadata alongside objects.
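
The flat structure and URL-based addressing described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the class names, the endpoint https://storage.example.com, and the bucket and key names are hypothetical and are not part of MinIO or any real API.

```python
from dataclasses import dataclass, field
from urllib.parse import quote


@dataclass
class StoredObject:
    """An object's bytes kept together with its own metadata."""
    data: bytes
    metadata: dict = field(default_factory=dict)


class FlatObjectStore:
    """Toy store: one flat dictionary per bucket, keyed by the full object name."""

    def __init__(self, endpoint: str):
        # Hypothetical endpoint used only to build illustrative URLs.
        self.endpoint = endpoint.rstrip("/")
        self.buckets = {}

    def put(self, bucket: str, key: str, data: bytes, **metadata) -> str:
        # "2019/06/sensor-17.json" is a single flat key; the slashes create no folders.
        self.buckets.setdefault(bucket, {})[key] = StoredObject(data, dict(metadata))
        return self.url(bucket, key)

    def url(self, bucket: str, key: str) -> str:
        # The URL doubles as the object's unique identifier.
        return f"{self.endpoint}/{quote(bucket)}/{quote(key)}"

    def get(self, bucket: str, key: str) -> StoredObject:
        # One dictionary lookup; no directory tree to walk, no separate metadata server.
        return self.buckets[bucket][key]


store = FlatObjectStore("https://storage.example.com")
url = store.put("iot-logs", "2019/06/sensor-17.json", b'{"temp": 21.5}',
                content_type="application/json")
obj = store.get("iot-logs", "2019/06/sensor-17.json")
print(url)
print(obj.metadata)
```

In this model the metadata travels with the object, so retrieving an object never requires consulting a second, centralized service.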
Data protection and preservation
Object storage solutions protect and preserve data more efficiently than other types of storage architectures. By using data protection capabilities such as erasure coding, object storage can protect data using far less raw storage capacity than RAID-based architectures. Data protection capabilities can also help quickly repair problems on a per-object basis, instead of on a per-disk basis, helping to avoid data loss and to maintain high availability of data.

Cost-effectiveness
The ability of object storage to scale incrementally, without forklift upgrades, can help you control storage costs. In addition, object storage data protection capabilities help eliminate the need for numerous copies of files, reducing the raw storage capacity required to safeguard data and driving down capital expenditures.

Unlocking the full value of data with high-performance object storage
Object storage has not always been used for high-performance workloads. In fact, some organizations employ object storage as a backup environment or a long-term disk-based archive.

Object storage does have advantages for these use cases. By storing objects along with metadata, object storage can make it easier for users to find and retrieve the files, media clips, or entire projects they need among millions or billions of files. At the same time, data protection capabilities can help securely preserve data over the long term.

Yet to maximize the value of data residing in object storage, you need to be able to consume it quickly. High-performance object storage solutions can help you extend the benefits of object storage to new use cases and extract more value from your stored data. If you can achieve sufficient throughput, you can use object storage for big data and IoT analytics, as well as AI/machine learning workloads.

Until recently, big data, IoT, and AI workloads often drove organizations to employ Hadoop Distributed File System (HDFS) storage. With HDFS, you bring the algorithm to the data. Each node computes a part of the algorithm using local storage and then sends the results back to a centralized server, where results are aggregated. This approach can work well for some algorithms, and it can offer scalability for large-scale collections of data.

However, object storage presents several advantages over HDFS. For example, object storage can provide greater flexibility for balancing compute and storage across your environment. Using high-speed networking with your object storage environment, you can consume your compute and storage resources in the optimal way for each particular workload.

Object storage also requires less capacity than HDFS to ensure data protection for the same amount of data. While HDFS stores multiple copies of each file, object storage can use data protection capabilities such as erasure coding to protect data more efficiently. Object storage also helps eliminate the risk of using a single master node, which can become a single point of failure. Overall, high-performance object storage provides a more efficient and reliable way to support data-intensive workloads than HDFS.

Capitalizing on MinIO high-performance object storage with enterprise capabilities
MinIO high-performance distributed object storage is designed for large-scale data environments. It is a well-suited Amazon S3–compatible replacement for HDFS, especially when used for AI/machine learning, IoT, and other big data workloads.
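
The capacity argument in this section (multiple full copies versus erasure coding) can be made concrete with a little arithmetic. The sketch below assumes three full copies, which is the default HDFS replication factor, and an illustrative erasure-coding layout of 8 data shards plus 4 parity shards; the paper does not state which layout was used, so treat both numbers as assumptions.

```python
def raw_capacity_replication(usable_tb: float, copies: int) -> float:
    """Raw capacity needed when every object is stored as `copies` full copies."""
    return usable_tb * copies


def raw_capacity_erasure(usable_tb: float, data_shards: int, parity_shards: int) -> float:
    """Raw capacity needed when each object is split into data shards plus parity shards."""
    return usable_tb * (data_shards + parity_shards) / data_shards


usable = 100.0  # TB of application data to protect (illustrative)

replicated = raw_capacity_replication(usable, copies=3)                # assumed: 3 copies
erasure = raw_capacity_erasure(usable, data_shards=8, parity_shards=4) # assumed: 8 + 4 layout

print(f"3x replication: {replicated:.0f} TB raw, survives loss of 2 copies")
print(f"8+4 erasure coding: {erasure:.0f} TB raw, survives loss of any 4 shards")
```

Under these assumptions, erasure coding protects the same 100 TB with half the raw capacity of three-way replication while tolerating more simultaneous failures.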
MinIO object storage comprises a server, optional client, and optional software development kits (SDKs):

•	   MinIO Server is a distributed object storage server that includes an array of enterprise-grade capabilities.
•	   MinIO Client (“mc”) is a modern alternative to UNIX commands such as ls, cat, cp, mirror, and diff that supports web-scale object storage deployments.
•	   MinIO Client SDKs include simple APIs for accessing any Amazon S3–compatible object storage.

MinIO is an open source solution that offers several enterprise capabilities for protecting data, maintaining data integrity, tightening security, and maximizing flexibility.

Data protection and integrity
Per-object, inline erasure coding protects against data loss and maintains availability of data, even if multiple drives or devices are lost. Bitrot protection avoids reading corrupted data caused by aging drives, firmware bugs, accidental overwrites, and other problems.

Security
MinIO supports multiple, sophisticated server-side encryption schemes to protect data wherever it resides. MinIO Server encrypts each object with a unique object key. Even if an individual object is compromised, the same decryption key cannot be used with any other object. In addition, MinIO offers a write-once, read-many (WORM) mode, which disables all APIs that can potentially mutate the object data and metadata: once written, data becomes tamperproof.

Support for advanced standards in identity management creates centralized access with temporary and rotated passwords. Fine-grained, configurable access policies facilitate simple support of multitenant and multi-instance deployments.

Flexibility
MinIO allows you to combine multiple data instances to form a unified global namespace. As a result, you can support geographically distributed users while accommodating a variety of applications from a single console. By using the Amazon S3 API, MinIO also gives you the flexibility to support multiple clouds, and to incorporate existing storage, while ensuring that your view of data looks exactly the same.

Achieving robust object storage performance with MinIO optimized for POWER9
IBM Power Systems servers based on POWER9 processors provide the high-performance infrastructure required by MinIO high-performance object storage software. Together, these solutions can support demanding workloads such as AI, IoT analytics, and big data analytics.

For many organizations, Power Systems servers offer the right combination of performance, reliability, cloud flexibility, and security.

•	   Robust performance: Outstanding core performance plus high memory bandwidth help deliver industry-leading performance.
•	   Reliability: IBM Power Systems servers provide dependable on-premises infrastructure to meet around-the-clock user demands.
•	   Cloud flexibility: These servers integrate easily into private cloud and multicloud strategies.
•	   Security: Strong security capabilities, such as accelerated encryption built into the chip, help ensure data remains protected.
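
To illustrate why a unique key per object limits the damage from a single compromised key, as the Security discussion in this section describes, the following sketch derives a separate 256-bit key for each object from one master key. It is an assumption-laden illustration, not MinIO's actual encryption scheme: the function name, the HMAC-SHA-256 derivation, and the bucket and object names are all hypothetical, and a production system would rely on a vetted key management service and an authenticated cipher.

```python
import hashlib
import hmac
import secrets


def derive_object_key(master_key: bytes, bucket: str, name: str, nonce: bytes) -> bytes:
    """Derive a 256-bit key bound to one object (illustrative; not MinIO's scheme)."""
    # Binding the key to a random per-object nonce plus the object path means
    # no two objects share a key, even under the same master key.
    context = nonce + f"{bucket}/{name}".encode("utf-8")
    return hmac.new(master_key, context, hashlib.sha256).digest()


master_key = secrets.token_bytes(32)   # hypothetical master key held by a key service

# Each object gets its own random nonce, stored alongside the object's metadata.
nonce_a = secrets.token_bytes(32)
nonce_b = secrets.token_bytes(32)

key_a = derive_object_key(master_key, "media", "clip-a.mp4", nonce_a)
key_b = derive_object_key(master_key, "media", "clip-b.mp4", nonce_b)

print(key_a != key_b)                                                        # distinct keys
print(derive_object_key(master_key, "media", "clip-a.mp4", nonce_a) == key_a)  # reproducible
```

Because each derived key depends on the object's own nonce and path, learning the key for one object reveals nothing usable for decrypting another.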
To achieve the object storage performance needed for AI, IoT, and big data workloads, the POWER9-based servers take advantage of PCIe 4.0 technology. PCIe 4.0 doubles the bandwidth offered by PCIe 3.0, which remains the standard used by other CPU architectures.

In addition, these servers support nonvolatile memory express (NVMe) storage technology, through which each processor core communicates directly with storage devices using the PCIe bus. NVMe drives can deliver superior performance compared to previous-generation, flash-based storage. These drives also enable you to achieve that performance in dense environments that help control infrastructure costs.

Fast networking is critical for maximizing bandwidth across object storage clusters. By supporting multiple 100 Gb/s Ethernet networking links per server, the Power Systems servers help eliminate networking bottlenecks.

Several POWER9-based servers also feature a storage-rich design that supports processing and analysis of very large data volumes. The Power Systems LC922, which offers the highest storage capacity in the Power Systems portfolio, supports up to 120 TB of capacity in a 2U form factor.

Benchmarking IBM POWER9-based servers with MinIO
MinIO engineers conducted benchmark testing to demonstrate the extreme performance that is possible using MinIO Server with POWER9-based systems. The testing deployed four IBM Power Systems LC922 servers, equipped with POWER9 processors, along with four POWER8-based servers as clients. The POWER9 servers included NVMe-based flash drives in addition to hard-disk drives. The environment used a high-speed 100 Gb/s private network.

To fully capitalize on the throughput performance of POWER9-based servers, the MinIO team optimized and accelerated MinIO Server for the POWER9 architecture using Go (Plan 9) assembly.

[Figure 1 diagram: a 100 GbE top-of-rack switch connecting 4x IBM Power Systems S822LC servers (clients) and 4x IBM Power Systems LC922 servers]
Figure 1: The test environment included four IBM Power Systems LC922 POWER9 servers (right), four IBM Power Systems S822LC servers as clients, and 100 GbE networking.
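
The bandwidth figures in this section can be sanity-checked with simple arithmetic. The sketch below uses the published PCIe signaling rates (8 GT/s per lane for PCIe 3.0, 16 GT/s for PCIe 4.0, both with 128b/130b encoding) and converts 100 Gb/s Ethernet to bytes per second. The x16 slot width is an assumption for illustration, since the paper does not state lane counts, and the results are theoretical maximums that ignore protocol overhead.

```python
def pcie_lane_gbytes_per_s(gigatransfers_per_s: float) -> float:
    """Theoretical one-direction bandwidth of a single PCIe 3.0/4.0 lane in GB/s."""
    # 128b/130b encoding: 128 payload bits for every 130 bits on the wire.
    return gigatransfers_per_s * (128 / 130) / 8


def ethernet_gbytes_per_s(gigabits_per_s: float) -> float:
    """Convert an Ethernet line rate from Gb/s to GB/s."""
    return gigabits_per_s / 8


LANES = 16  # assumed slot width for illustration

pcie3_x16 = LANES * pcie_lane_gbytes_per_s(8.0)    # PCIe 3.0: 8 GT/s per lane
pcie4_x16 = LANES * pcie_lane_gbytes_per_s(16.0)   # PCIe 4.0: 16 GT/s per lane

link = ethernet_gbytes_per_s(100)   # one 100 Gb/s link
cluster = 4 * link                  # four servers, one link each

print(f"PCIe 3.0 x16: {pcie3_x16:.2f} GB/s")
print(f"PCIe 4.0 x16: {pcie4_x16:.2f} GB/s")
print(f"One 100 Gb/s link: {link:.1f} GB/s; four links: {cluster:.1f} GB/s")
```

A single 100 Gb/s link carries at most 12.5 GB/s, so a four-server cluster with one link per server has an aggregate network ceiling of 50 GB/s before any protocol overhead.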
The MinIO team first evaluated throughput performance for accelerated versions of two computationally intensive algorithms: erasure coding and HighwayHash (for bitrot detection).

Erasure coding
With MinIO, erasure coding is designed to take place inline on a per-object basis. When you store 1 GB of data, MinIO splits up that data across a large number of drives and creates the appropriate amount of parity data on separate drives. Depending on the parity configuration you choose, you can afford to lose up to half of the servers and half of the drives and still be able to reconstruct all of your data. Running erasure coding inline, instead of offline, enables you to start protecting data the moment you store it, but it inherently demands high-performance object storage, which MinIO is able to provide.

In the benchmark testing, the optimized erasure coding algorithm running on POWER9 systems achieved throughput of 7–9 GB/s per core, which is critical for saturating the fast 100 Gb/s network. This level of throughput for the optimized algorithm reflects the robust performance of the POWER9 system architecture, which is particularly well suited for this type of high-throughput workload.

Bitrot detection
As with erasure coding, MinIO is designed to run bitrot detection on the fly. MinIO’s implementation of the HighwayHash algorithm helps prevent the reading of corrupt data. The algorithm computes a hash on read and verifies the hash on write from the application. Any change in the hash fingerprint indicates data corruption and requires the use of parity data instead of the corrupted data.

Hashing operations require considerable CPU resources, but the POWER9-based servers can deliver the required performance. In the benchmark testing, the optimized HighwayHash algorithm running on the POWER9 servers achieved throughput of 5 GB/s per core, so a few cores are enough to keep pace with the 100 Gb/s (12.5 GB/s) network.

COSBench
The team also ran COSBench, a commonly used open source benchmarking tool, to measure the performance of object storage services. COSBench testing used four POWER9-based systems, each with four NVMe drives and connected with 100 Gb/s networking.

The team ran COSBench on the four clients with 256 threads per client (1024 total). Each test typically took about an hour, with a prepare (WRITE) stage of 20–30 minutes, a 20-minute main (READ) stage, and a final cleanup stage. The team uploaded and downloaded more than 10 TB of data to mitigate any memory caching effects that could inflate the performance numbers.

Object-size benchmarks: The team used the four-node cluster to benchmark MinIO object storage read and write throughput for objects of increasing size. Read performance reached 18 GB/s at 20 MB and stayed roughly constant through the 32 MB and 64 MB object sizes. For the largest objects (64 MB), write performance reached more than 50 percent of read performance (10.1 GB/s versus 18.0 GB/s), which is a strong result.

Object size      10 MB    20 MB    32 MB    64 MB
Read (GB/s)       14.9     18.1     18.7     18.0
Write (GB/s)                5.7      7.3     10.1

Figure 2: Read performance reached 18 GB/s for objects of 20 MB or larger.
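
The interplay between erasure coding and bitrot detection described in this section can be sketched in a few lines. This is a deliberately simplified model, not MinIO's implementation: it uses a single XOR parity shard (which can rebuild only one lost or corrupted shard) in place of a full erasure code with configurable parity, and BLAKE2b from the Python standard library as a stand-in for HighwayHash. All function names are hypothetical.

```python
import hashlib
from functools import reduce


def checksum(shard: bytes) -> bytes:
    # Stand-in for HighwayHash, which is not in the Python standard library.
    return hashlib.blake2b(shard, digest_size=32).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, data_shards: int):
    """Split data into equal shards, append one XOR parity shard, hash every shard."""
    shard_len = -(-len(data) // data_shards)              # ceiling division
    padded = data.ljust(shard_len * data_shards, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(data_shards)]
    shards.append(reduce(xor_bytes, shards))              # parity shard
    return shards, [checksum(s) for s in shards]


def decode(shards, hashes, original_len: int) -> bytes:
    """Verify each shard's hash; rebuild at most one corrupted shard from the rest."""
    bad = [i for i, (s, h) in enumerate(zip(shards, hashes)) if checksum(s) != h]
    if len(bad) > 1:
        raise ValueError("single XOR parity can repair only one shard")
    if bad:
        healthy = [s for i, s in enumerate(shards) if i != bad[0]]
        shards = list(shards)
        shards[bad[0]] = reduce(xor_bytes, healthy)        # XOR of the rest restores it
    return b"".join(shards[:-1])[:original_len]


payload = b"sensor readings to protect against bitrot"
shards, hashes = encode(payload, data_shards=4)

shards[2] = b"\xff" * len(shards[2])    # simulate silent corruption on one drive
recovered = decode(shards, hashes, len(payload))
print(recovered == payload)
```

The hash mismatch identifies which shard is corrupt, and the parity data then replaces it, which mirrors the behavior the text describes: a changed fingerprint triggers the use of parity data instead of the corrupted data.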
Cluster scaling benchmarks: The team also benchmarked MinIO cluster scaling by increasing the number of nodes used in the test. The COSBench test demonstrated a maximum read performance of about 25 GB/s in aggregate (25.4 GB/s measured) for the four POWER9-based servers.

Expanding the cluster could also boost read performance. Because MinIO clusters can grow to any number of servers, and overall throughput increases as cluster size increases, the total read performance could be higher than 25 GB/s.

Number of servers        1       2       3       4
Throughput (GB/s)     10.5    19.4    24.1    25.4

Figure 3: MinIO Server performance increases as the cluster size expands.

Benchmarking summary
Results from the erasure coding, bitrot, and COSBench testing all show the impressive throughput performance that can be achieved with MinIO Server on POWER9-based systems. The results of the erasure coding and bitrot detection algorithm testing highlight how well this architecture handles these two specific computationally intensive processes. But the results also suggest that this architecture could deliver strong results for computationally intensive AI, IoT, and big data workloads. The COSBench testing illustrates how this distributed object storage architecture can deliver outstanding aggregate throughput performance across a cluster, enabling clients to take full advantage of the high-performance nature of MinIO object storage. Whether your organization is running a private or multicloud environment, you can use this architecture to gain the performance you need for parallel processing of large sets of unstructured data.

Moving forward with MinIO and IBM
Object storage provides an important alternative to file and block storage for large and growing volumes of unstructured data. By selecting high-performance object storage, your organization can extend the benefits of object storage to new use cases, including AI/machine learning, IoT, and other big data workloads. Employing MinIO in combination with IBM Power Systems servers based on POWER9 processors can deliver the performance to support those workloads and unlock greater value from data.

Learn more
To discover more about MinIO benefits for AI, IoT, and additional big data workloads, visit: https://min.io

To learn more about the complete line of the IBM Power Systems family, visit: ibm.com/it-infrastructure/power
© Copyright IBM Corporation 2019
IBM Global Services
Route 100
Somers, NY 10589
USA
Produced in the United States of America
June 2019
All Rights Reserved
IBM, the IBM logo and ibm.com are trademarks or registered trademarks
of International Business Machines Corporation in the United States, other
countries, or both. If these and other IBM trademarked terms are marked
on their first occurrence in this information with a trademark symbol
(® or ™), these symbols indicate US registered or common law trademarks
owned by IBM at the time this information was published. Such trademarks
may also be registered or common law trademarks in other countries.
A current list of IBM trademarks is available on the Web at “Copyright and
trademark information” at ibm.com/legal/copytrade.shtml. Other company,
product and service names may be trademarks or service marks of others.
References in this publication to IBM products and services do not
imply that IBM intends to make them available in all countries in which
IBM operates.