Facebook’s Database Handling Billions of Messages (Cassandra Deep Dive)
ByteByteGo
Mar 11
Cassandra is a powerful database system designed to store and manage
massive amounts of data across many computers.
Facebook originally developed it to support a feature called Inbox Search,
which allows users to quickly search through their messages. The goal was
to support billions of messages sent by Facebook users every day.
Storing and efficiently searching through such a massive amount of data is a
big challenge. Traditional databases, like MySQL, struggled to handle this
workload because they were not designed to scale easily.
To solve this, Facebook engineers took inspiration from two existing
technologies:
Amazon Dynamo: A system that makes sure data is always available,
even if some nodes fail. It does this by copying data to multiple
machines and using a peer-to-peer structure where every node
(computer) is equal.
Google Bigtable: A database used by Google to store large amounts
of structured data efficiently. It introduced the idea of a column-based
storage model, which makes it faster to access specific pieces of data.
By combining the best parts of these two systems, Facebook created
Cassandra, which became a decentralized, highly scalable, and fault-tolerant
database. Later, it was released as open-source software, allowing
companies like Netflix, Twitter, and Apple to use and improve it.
In this article, we’ll take a deep dive into Cassandra and understand what
makes it special.
The Key Features of Cassandra
Some key features of Cassandra are as follows:
Distributed Storage: Data is spread across many machines instead of
being stored on a single server.
High Availability: Even if some machines fail, Cassandra continues to
work without interruption.
No Single Point of Failure: Since there is no central control system,
there is no weak spot that can bring everything down.
Scalability: It can easily handle increasing amounts of data by simply
adding more machines to the network.
Cassandra’s Data Model
Cassandra’s data model is quite different from traditional relational databases
like MySQL.
At its core, Cassandra’s data model is like a multi-dimensional map (or
dictionary), where each piece of data is indexed by a row key. This means
that instead of rigidly defining tables and columns in advance, data can be
stored in a way that best suits the needs of the application.
The data is organized into column families, which come in two types:
Simple Column Family: It is a collection of standard key-value pairs
where each key points to a set of columns. For example, if storing user
information, the row key could be the User ID, and the columns could
be name, email, phone number, etc.
Super Column Family: It is a more complex, nested structure that
groups multiple columns under a "Super Column." This allows
hierarchical data organization. For example, if storing a user’s
messages, the row key could be User ID. Super Columns could be
different conversations, and within each Super Column, individual
messages could be stored.
Columns can be sorted by timestamp or name, depending on the
application’s needs. Primary key lookup is the main way to retrieve data.
Instead of running complex queries like in SQL databases, Cassandra
retrieves data by directly accessing the row key.
The structure of a column consists of the following parts:
Name: The identifier for the column.
Value: The actual data stored in the column.
Timestamp: A timestamp that records when the data was written,
helping in managing updates and conflict resolution.
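To make this concrete, here is a minimal Python sketch that models a simple and a super column family as nested dictionaries, with each column carrying a name, value, and timestamp. The user IDs, conversation names, and field values are hypothetical, and real Cassandra persists this structure on disk rather than holding it in plain dicts.

```python
import time

# A column is the smallest unit of storage: name, value, and write timestamp.
def make_column(name, value):
    return {"name": name, "value": value, "timestamp": time.time()}

# Simple column family: row key -> {column name -> column}
users = {
    "user_42": {
        "name": make_column("name", "Alice"),
        "email": make_column("email", "alice@example.com"),
    }
}

# Super column family: row key -> {super column -> {column name -> column}}
messages = {
    "user_42": {
        "conversation_with_bob": {
            "msg_001": make_column("msg_001", "Hey Bob, any project update?"),
            "msg_002": make_column("msg_002", "Meeting moved to 3pm."),
        }
    }
}

# Retrieval is always anchored on the row key:
print(messages["user_42"]["conversation_with_bob"]["msg_001"]["value"])
```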
Cassandra API Overview
Cassandra follows a key-based lookup approach, meaning every operation
revolves around the row key. Unlike relational databases that support
complex queries (like JOINs or subqueries), Cassandra prioritizes speed and
scalability by keeping its API lightweight.
Therefore, Cassandra provides a simple API structure that allows
applications to interact with the database using three main operations.
1 - Insert Data
The interface is insert(table, key, rowMutation). This command adds new data
to Cassandra.
The “table” is where the data will be stored and the “key” uniquely identifies
the row. The rowMutation represents the changes made to the row, such as
adding new columns or updating existing ones.
2 - Retrieve Data
The API interface is get(table, key, columnName). It fetches data from the
database.
The “table” specifies where to look and the “key” identifies which row to
retrieve. The “columnName” specifies which part of the row is needed.
3 - Delete Data
The interface is delete(table, key, columnName).
This command removes data from the database. It can delete an entire row
or just a specific column within a row.
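The snippet below sketches how client code built around these three calls might look. The CassandraClient class, its in-memory store, and the exact signatures are illustrative stand-ins for the simple API described above, not a real driver.

```python
# Hypothetical in-memory client mirroring the three operations described above.
class CassandraClient:
    def __init__(self):
        self.store = {}  # {table: {row key: {column name: value}}}

    def insert(self, table, key, row_mutation):
        # row_mutation is a dict of columns to add or overwrite in the row.
        row = self.store.setdefault(table, {}).setdefault(key, {})
        row.update(row_mutation)

    def get(self, table, key, column_name):
        return self.store.get(table, {}).get(key, {}).get(column_name)

    def delete(self, table, key, column_name=None):
        row = self.store.get(table, {}).get(key)
        if row is None:
            return
        if column_name is None:
            del self.store[table][key]      # remove the entire row
        else:
            row.pop(column_name, None)      # remove a single column

client = CassandraClient()
client.insert("users", "user_42", {"name": "Alice", "email": "alice@example.com"})
print(client.get("users", "user_42", "email"))
client.delete("users", "user_42", "email")
```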
Cassandra System Architecture
Cassandra is designed as a highly scalable and fault-tolerant distributed
database.
It does not rely on a single central server but instead follows a peer-to-peer
model, where all nodes in the system are equal.
Cassandra organizes its nodes (servers) in a ring structure. Each piece of
data is assigned to a node using consistent hashing, which ensures even
distribution across all nodes. When new nodes are added, Cassandra
automatically rebalances the data without requiring a complete
reorganization.
See the diagram below that shows how consistent hashing works.
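To complement the diagram, here is a minimal consistent-hashing sketch in Python. It uses MD5 purely for illustration; the node names and the bisect-based ring walk are also assumptions made to keep the example small.

```python
import hashlib
from bisect import bisect_right

class HashRing:
    def __init__(self, nodes):
        # Place each node at a position on the ring derived from its name.
        self.ring = sorted((self._hash(n), n) for n in nodes)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key):
        # Walk clockwise from the key's position to the first node at or after it.
        positions = [pos for pos, _ in self.ring]
        idx = bisect_right(positions, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user_42"))  # the key always maps to the same node

# Adding a node only moves the keys that fall between the new node and its
# predecessor on the ring; every other key stays where it was.
bigger_ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
print(bigger_ring.node_for("user_42"))
```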
There is no master node, meaning any node can handle read and write
requests. Since all nodes are equal, there is no single point of failure. If a
node fails, other nodes in the system can continue handling requests without
disruption.
Replication Mechanisms
Cassandra ensures that data is copied across multiple nodes to prevent data
loss and improve availability. Developers can choose between different
replication strategies:
Rack-Unaware Replication: Each data item is copied to the N-1 nodes that immediately follow its coordinator on the ring (see the sketch after this list), so if one node goes down, its successors still hold a copy. This method works well for small-scale deployments but does not account for hardware or network failures across different locations.
Rack-Aware Replication: Uses Zookeeper, a coordination service, to
manage which nodes store replicas. This ensures that data copies are
distributed across different racks (physical groupings of servers) in a
data center.
Datacenter-Aware Replication: Distributes data copies across multiple
data centers to ensure high availability even if an entire data center
goes offline.
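Here is a minimal sketch of the successor-based placement used by the rack-unaware strategy, reusing the consistent-hashing idea from earlier. The replication factor of 3 and the node names are assumptions; a rack- or datacenter-aware strategy would additionally filter candidate nodes by rack or data center metadata (coordinated via Zookeeper in early Cassandra).

```python
import hashlib
from bisect import bisect_right

def _hash(value):
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def replica_nodes(nodes, key, replication_factor=3):
    # Rack-unaware placement: the node that owns the key plus its
    # replication_factor - 1 successors, walking clockwise around the ring.
    ring = sorted((_hash(n), n) for n in nodes)
    positions = [pos for pos, _ in ring]
    start = bisect_right(positions, _hash(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]

print(replica_nodes(["node-a", "node-b", "node-c", "node-d"], "user_42"))
```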
Gossip Protocols in Cassandra
Cassandra uses a gossip protocol to allow nodes (servers) in the system to
communicate with each other efficiently.
This protocol is inspired by how rumors spread in real life. Instead of requiring
a central system to keep track of everything, information is passed from one
node to another in small, periodic updates.
Gossip protocols are great because they have a low network overhead.
Instead of flooding the system with updates, nodes exchange small bits of
information at regular intervals. Even if some nodes go offline, others can still
function because they share information across the network.
Cassandra uses Scuttlebutt, a specialized Gossip Protocol, to keep track of
which nodes are active or inactive. Each node periodically exchanges
information about itself and other nodes with its neighbors, ensuring that the
entire cluster remains up to date.
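The toy exchange below illustrates the Scuttlebutt idea of versioned state spreading peer to peer: each node keeps a version number for what it knows about every other node and, when gossiping, both sides keep whichever entry is newer. Node names and the state payloads are made up for illustration.

```python
class GossipNode:
    def __init__(self, name):
        self.name = name
        self.version = 0
        # state[node] = (version, payload): what this node believes about peers.
        self.state = {name: (0, "alive")}

    def update_self(self, payload):
        self.version += 1
        self.state[self.name] = (self.version, payload)

    def gossip_with(self, peer):
        # Both sides keep whichever entry carries the higher version number.
        for node, (version, payload) in peer.state.items():
            if node not in self.state or self.state[node][0] < version:
                self.state[node] = (version, payload)
        for node, (version, payload) in list(self.state.items()):
            if node not in peer.state or peer.state[node][0] < version:
                peer.state[node] = (version, payload)

a, b = GossipNode("node-a"), GossipNode("node-b")
a.update_self("alive, load=0.3")
a.gossip_with(b)
print(b.state["node-a"])  # node-b now knows node-a's latest state
```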
Instead of a simple "up or down" status, Cassandra assigns a suspicion level
to each node.
If a node stops responding, its suspicion value starts increasing over
time.
If the value crosses a certain threshold, the system considers the node
as "dead" and reroutes traffic to other nodes.
In other words, Cassandra’s failure detection is probabilistic: it adapts to actual network conditions rather than relying on rigid timeout rules. This helps prevent false alarms caused by temporary delays or slow responses.
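Below is a toy suspicion-based detector with an arbitrarily chosen scale and threshold. Cassandra’s real implementation (the Phi Accrual Failure Detector) derives the suspicion value from the statistical distribution of past heartbeat intervals rather than a constant.

```python
import time

class SuspicionDetector:
    """Toy failure detector: suspicion grows with silence, resets on heartbeat."""

    def __init__(self, threshold=8.0, scale_seconds=1.0):
        self.last_heartbeat = {}
        self.threshold = threshold
        self.scale = scale_seconds

    def heartbeat(self, node):
        # Record that we just heard from the node; suspicion drops back to zero.
        self.last_heartbeat[node] = time.time()

    def suspicion(self, node):
        # Unknown nodes are treated as just heard from, for simplicity.
        silence = time.time() - self.last_heartbeat.get(node, time.time())
        return silence / self.scale

    def is_alive(self, node):
        return self.suspicion(node) < self.threshold

detector = SuspicionDetector()
detector.heartbeat("node-a")
print(detector.is_alive("node-a"))  # True while heartbeats keep arriving
```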
Query Execution in Cassandra
Cassandra is designed to handle high-speed data writes and efficient reads
while ensuring durability and fault tolerance.
Instead of storing data like traditional relational databases, which write
changes immediately to disk, Cassandra follows a log-structured storage
model that optimizes speed and reliability.
How Cassandra Handles Writes
Cassandra follows a multi-step process when writing data. The process
consists of three main components:
Commit Log (Disk): Every time data is written to Cassandra, it is first
recorded in the Commit Log. The Commit Log is stored on disk and
ensures that data is not lost even if the system crashes before it is fully
processed. This step makes Cassandra fault-tolerant.
Memtable (RAM): After writing to the Commit Log, the data is stored in
memory in a structure called the Memtable. The Memtable acts as a
temporary, in-memory cache for fast access. Because reading from
RAM is faster than reading from disk, this speeds up query
performance.
SSTables (Disk): When the Memtable reaches a certain size, it is
flushed to disk as an SSTable (Sorted String Table). SSTables are
immutable, meaning they are never modified after being written. Instead
of updating existing files, Cassandra writes new SSTables and merges
them later through a compaction process to optimize storage.
This write process is efficient because, unlike traditional databases that
modify data in place (causing random disk writes), Cassandra writes data
sequentially, which is much faster and more efficient. Since SSTables are
never modified, Cassandra avoids the overhead of complex locking
mechanisms found in relational databases. Also, Cassandra can recover lost
data if a node crashes because every write is first recorded in the Commit
Log.
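The sketch below walks a write through the three components just described: append to a commit log on disk, apply to an in-memory memtable, and flush to an immutable, sorted SSTable once the memtable grows past a threshold. The log file name, flush threshold, and in-memory SSTable list are assumptions made to keep the example small.

```python
import json

class WritePath:
    """Toy write path: append to a commit log, buffer in a memtable, and
    flush to an immutable, sorted SSTable when the memtable gets large."""

    def __init__(self, log_path="commit.log", memtable_limit=4):
        self.log_path = log_path
        self.memtable = {}          # row key -> columns (in RAM)
        self.memtable_limit = memtable_limit
        self.sstables = []          # flushed, never-modified tables (newest last)

    def write(self, key, columns):
        # 1. Durability first: append the mutation to the commit log on disk.
        with open(self.log_path, "a") as log:
            log.write(json.dumps({"key": key, "columns": columns}) + "\n")
        # 2. Apply the mutation to the in-memory memtable.
        self.memtable.setdefault(key, {}).update(columns)
        # 3. Flush once the memtable is "full".
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # SSTables are sorted by row key and immutable after this point;
        # compaction would later merge them into fewer files.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

wp = WritePath()
wp.write("user_42", {"name": "Alice"})
```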
How Cassandra Handles Reads
Unlike traditional databases that rely on complex indexing, Cassandra
optimizes read performance using a combination of in-memory lookups and
efficient disk scans.
Here’s a step-by-step look at the read process:
Check Memtable: When a read request comes in, Cassandra first
checks the Memtable because it contains the most recent data. If the
data is found in the Memtable, the result is returned immediately,
making the read extremely fast.
Check SSTables on Disk: If the requested data is not in the Memtable,
Cassandra searches for it in the SSTables stored on disk. Since
SSTables are never updated, multiple versions of a row may exist in
different SSTables, so Cassandra must scan multiple files.
Use Bloom Filters: To improve efficiency, Cassandra uses Bloom
Filters, which are probabilistic data structures that help quickly
determine if an SSTable might contain the requested data. If the Bloom
Filter suggests that an SSTable does not contain the data, Cassandra
skips that file entirely, reducing the number of disk reads. If the Bloom
Filter indicates that an SSTable might contain the data, Cassandra
checks the file for the requested row.
Merge and Return Most Recent Data: Since Cassandra writes new
SSTables instead of modifying existing ones, multiple versions of a row
might exist across different SSTables. The system merges all versions
of the row, applying the latest updates based on timestamps, and
returns the final result to the client.
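A simplified version of this read path, with a tiny Bloom filter, is sketched below. The row format, the single-timestamp-per-row assumption, and the MD5-based hashing are simplifications; real Cassandra merges at column granularity and uses purpose-built Bloom filter implementations.

```python
import hashlib

class BloomFilter:
    # Tiny Bloom filter: k hash positions per key over a fixed-size bit field.
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all(self.bits & (1 << pos) for pos in self._positions(key))

def read(key, memtable, sstables):
    """memtable: {key: (timestamp, columns)};
    sstables: list of (bloom_filter, {key: (timestamp, columns)})."""
    if key in memtable:                       # 1. the memtable holds the freshest data
        return memtable[key]
    newest = None
    for bloom, table in sstables:             # 2. otherwise consult SSTables on disk
        if not bloom.might_contain(key):      # 3. Bloom filters let whole files be skipped
            continue
        row = table.get(key)
        if row and (newest is None or row[0] > newest[0]):
            newest = row                      # 4. keep the version with the latest timestamp
    return newest

bloom = BloomFilter()
bloom.add("user_42")
sstables = [(bloom, {"user_42": (1700000000, {"name": "Alice"})})]
print(read("user_42", {}, sstables))
```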
Facebook Inbox Search Use Case
As mentioned, Cassandra was originally developed at Facebook to solve the
challenge of storing and searching billions of messages efficiently. Before
Cassandra, Facebook used MySQL for storing these messages, but as the
platform grew, MySQL struggled to handle the increasing volume of data and
high query load.
To address this, Facebook deployed Cassandra on a 150-node cluster, which
stored over 50 terabytes (TB) of messages. The system needed to support
fast and scalable searches while handling constant write operations as users
sent and received messages.
Facebook’s Inbox Search allows users to find messages using two types of
queries:
Term Search (Keyword-Based Search): It allows users to search for messages that contain specific words or phrases. For example, if a user searches for "project update," Cassandra retrieves all messages containing those words. The system stores word-to-message mappings using Super Column Families, where the row key is the user ID, the super columns are the words, and the sub-columns store the IDs of messages containing those words (see the sketch after this list).
Interaction Search (User-Based Search): It allows users to find messages exchanged with a specific person. For example, if a user searches for "Alice," Cassandra retrieves all messages exchanged between the user and Alice. This is implemented using another Super Column Family, where the row key is the user ID, the super columns are the contacts, and the sub-columns contain the IDs of the messages exchanged with that contact.
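Here is a hypothetical rendering of these two layouts as nested dictionaries, in the same row key → super column → sub-column shape used earlier. All user, contact, and message IDs are made up.

```python
# Term search: row key = user ID, super columns = words,
# sub-columns = IDs of messages containing that word.
term_index = {
    "user_42": {
        "project": {"msg_001": "", "msg_007": ""},
        "update":  {"msg_001": ""},
    }
}

# Interaction search: row key = user ID, super columns = contacts,
# sub-columns = IDs of messages exchanged with that contact.
interaction_index = {
    "user_42": {
        "alice": {"msg_003": "", "msg_004": ""},
        "bob":   {"msg_001": ""},
    }
}

# Searching "project update" for user_42 intersects the two word entries:
hits = set(term_index["user_42"]["project"]) & set(term_index["user_42"]["update"])
print(hits)  # {'msg_001'}
```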
One of the biggest challenges in Facebook’s messaging system was
ensuring low-latency searches across a massive dataset. Cassandra’s highly
optimized architecture allowed it to achieve impressive performance:
Minimum Latency: As low as 7-8 milliseconds, meaning some
searches return almost instantly.
Median Latency: Around 15-18 milliseconds, ensuring consistently fast
responses for most queries.
Maximum Latency: In worst-case scenarios, searches could take up to
44 milliseconds, which is still very fast given the large dataset.
Conclusion
Cassandra is a highly scalable, distributed database system designed to
handle large volumes of data while ensuring fault tolerance and high
availability.
Its peer-to-peer architecture and ring-based design make it particularly well-
suited for applications that require continuous uptime and seamless scaling
across multiple data centers. One of Cassandra’s key strengths is its ability to
handle high write-throughput efficiently, making it ideal for real-time
applications, such as messaging platforms, recommendation systems, and
IoT data storage.
However, Cassandra is not a replacement for traditional relational databases.
It is not optimized for complex queries, joins, or transactional consistency,
which makes it less suitable for applications requiring strong relational
integrity.
For businesses and developers building large-scale, distributed systems,
Cassandra provides a robust, flexible, and highly available solution that can
grow with demand while maintaining performance and reliability.