AWS Database Services
History of Databases
Relational databases have been around for more than 50 years. The first relational database was created
by Edgar F. Codd in 1970. The key characteristic of relational databases is the organization of data into
rows and columns. Rows in one table are often related to rows in other tables using keys, which establish
relationships between tables. SQL (Structured Query Language) was developed in the 1970s by IBM
researchers Raymond Boyce and Donald Chamberlin and became the standard language for interacting
with relational databases, including inserting, updating, and retrieving data.
As technology advanced in the 1990s, relational databases began to encounter scalability issues,
particularly as diverse data types emerged. This led to the development of NoSQL databases, designed to
overcome the limitations of relational databases.
The term "NoSQL" was first coined by Carlo Strozzi in 1998 for a relational database that did not use SQL.
However, in 2009, Eric Evans and Johan Oskarsson redefined the term to refer to non-relational
databases, marking a shift in database design to accommodate the growing need for more flexible and
scalable systems.
Relational databases typically guarantee transactional integrity through the ACID model (Atomicity, Consistency, Isolation, Durability):
• Atomicity: All operations within a transaction must succeed or fail as a single unit. If one part of
the transaction fails, the entire transaction is rolled back. For example, when withdrawing money
from an ATM, the system ensures that both the withdrawal and the deduction from your account
either happen together or not at all.
• Consistency: This ensures that the database remains in a valid state before and after a
transaction. After any transaction, the database will maintain its structural integrity.
• Isolation: Transactions are processed independently, without interference from others. This
prevents data conflicts when multiple users access the database concurrently.
• Durability: Once a transaction is committed, the changes are permanent, even in the event of a
system failure.
While the ACID model is reliable and robust, it can limit scalability and performance, particularly for
global applications. As a result, an alternative consistency model known as BASE was introduced.
• Basically Available: The system guarantees availability most of the time, even though consistency
might not always be immediate.
• Soft state: The state of the system may change over time without requiring immediate
consistency. Different replicas of data may not always be perfectly synchronized.
• Eventual consistency: Given enough time without new updates, all replicas of the data will
converge to the same value.
While ACID prioritizes strong consistency, BASE is designed to provide scalability and availability, which is
often necessary for NoSQL databases used in large-scale applications. NoSQL databases, such as
Amazon DynamoDB and Cassandra, typically implement the BASE model to support highly scalable
systems.
AWS offers a range of purpose-built, managed database services:
• Amazon RDS: A managed relational database service that supports popular database engines
such as MySQL, PostgreSQL, Oracle, SQL Server, MariaDB, and Amazon Aurora.
• Amazon Aurora: A relational database service that provides high performance and availability at
a lower cost compared to commercial databases.
• Amazon DynamoDB: A fully managed NoSQL database service that provides fast and
predictable performance with seamless scalability.
• Amazon ElastiCache: An in-memory data store that supports real-time applications, caching,
and internet-scale workloads.
• Amazon Neptune: A managed graph database service designed for applications with highly
connected data.
Migrating databases to AWS is also made simple with AWS Database Migration Service (DMS), which
enables cost-effective migration with minimal downtime.
Relational Databases
Relational databases remain the dominant type of database in use today. Their origins date back to the
1970s when Edgar F. Codd, working for IBM, introduced the relational model. Relational databases are
used in a variety of applications, from social media and e-commerce websites to enterprise-level
systems.
A relational database organizes data into tables, with rows representing individual records and columns
defining attributes such as names, addresses, or dates. Each table defines a primary key, a column (or
set of columns) that uniquely identifies each row and is used to relate data between different tables.
Relational databases are categorized as either Online Transaction Processing (OLTP) or Online
Analytical Processing (OLAP) systems. OLTP systems are used for transaction-heavy applications such
as e-commerce sites, where data is frequently written and updated. OLAP systems, on the other hand,
are used for data analysis and reporting. These databases are optimized for querying large datasets and
are typically found in data warehouses.
Amazon RDS simplifies the setup and management of relational databases by offering support for several
popular database engines. Amazon RDS is primarily designed for OLTP workloads, although it can also
handle lighter OLAP use cases.
Data Warehouses
A data warehouse serves as a central repository for data gathered from multiple sources. Unlike OLTP
systems, which are updated frequently, data warehouses are often updated in batches and optimized for
querying large amounts of historical data. Data warehouses use the OLAP model for reporting and
analytics.
Organizations often separate their databases into OLTP systems for day-to-day operations and OLAP
systems for data analysis. AWS offers Amazon Redshift as a high-performance data warehouse solution,
capable of handling complex analytical queries on large datasets.
NoSQL Databases
NoSQL databases offer a flexible, scalable solution for applications that require high performance with
large volumes of data. Unlike relational databases, which use rigid table and column structures, NoSQL
databases are more flexible and can scale horizontally. They are often used for managing session state,
user profiles, shopping carts, and time-series data.
Amazon DynamoDB, one of the leading NoSQL services offered by AWS, provides seamless scalability
and performance, making it ideal for high-demand applications. It supports key-value and document-
based storage models and can automatically manage distributed clusters across multiple data centres.
NoSQL databases have been embraced by organizations looking to achieve fast performance and
scalability, with popular choices including MongoDB, Cassandra, CouchDB, and HBase. AWS users can
run any type of NoSQL database on EC2 instances or use Amazon DynamoDB for a fully managed
solution.
For example, Amazon RDS makes it easy to replicate your data to increase availability, improve durability,
or scale beyond the capacity of a single database instance for read-heavy workloads. Amazon RDS
exposes a database endpoint to which client software can connect and execute SQL. However, Amazon
RDS does not provide shell access to Database (DB) Instances and restricts access to certain system
procedures and tables that require advanced privileges. You can typically use the same tools to query,
analyze, modify, and administer the database. Current Extract, Transform, Load (ETL) tools and reporting
tools can connect to Amazon RDS databases in the same way with the same drivers, and often all it takes
to reconfigure is changing the hostname in the connection string.
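As a sketch of the point above, moving an application to RDS often means changing only the hostname in the connection string. The DSN format and both hostnames below are hypothetical examples, not real endpoints:

```python
def build_dsn(host, db="inventory", user="app_user", port=5432):
    """Build a PostgreSQL-style connection string; only the host
    differs between an on-premises server and an RDS endpoint."""
    return f"postgresql://{user}@{host}:{port}/{db}"

# Hypothetical hostnames for illustration only.
on_prem = build_dsn("db01.corp.internal")
rds = build_dsn("mydb.abc123xyz.us-east-1.rds.amazonaws.com")
```

Everything except the host (driver, user, database name, port) stays the same, which is why existing ETL and reporting tools usually reconnect without further changes.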
Amazon RDS was designed by AWS to simplify the management of crucial transactional applications by
providing an easy-to-use platform for setting up, operating, and scaling a relational database in the cloud.
With RDS, laborious administrative tasks such as hardware provisioning, database configuration,
patching, and backups are automated, and a scalable capacity is provided in a cost-efficient manner.
RDS is available on various database instance types, optimized for memory, performance, or I/O, and
supports six well-known database engines, including Amazon Aurora (compatible with MySQL and
PostgreSQL), MySQL, PostgreSQL, MariaDB, SQL Server, and Oracle.
• Community (Postgres, MySQL, and MariaDB): AWS offers RDS with three different open-source
offerings. This is a good option for development environments, low-usage deployments, defined
workloads, and non-critical applications that can afford some downtime.
• Amazon Aurora (Postgres and MySQL): Postgres and MySQL are available here, as they are in
the community editions. Delivering these applications within the Aurora wrapper can add many
benefits to a community deployment. Amazon started offering the MySQL service in 2014 and
added the Postgres version in 2017. Some of these benefits include:
o Automatic six-way replication across availability zones to improve availability and fault
tolerance
• Commercial (Oracle and SQL Server): Many organizations still run Oracle workloads, so AWS
offers RDS with an Oracle flavour (and a Microsoft SQL Server flavour).
Amazon RDS provides the following key features:
1. Multi-AZ deployment
2. Read replicas
3. Automated backup
4. Database snapshots
5. Data storage
6. Scalability
7. Monitoring
8. Security
Multi-AZ Deployments
Multi-AZ deployments in RDS provide improved availability and durability for database instances, making
them an ideal choice for production database workloads. With Multi-AZ DB instances, RDS synchronously
replicates data to a standby instance in a different Availability Zone (AZ) for enhanced resilience. You can
change your environment from Single-AZ to Multi-AZ at any time. Each AZ runs on its distinct,
independent infrastructure and is built to be highly dependable. In the event of an infrastructure failure,
RDS initiates an automatic failover to the standby instance, allowing you to resume database operations
as soon as the failover is complete. Additionally, the endpoint for your DB instance remains the same
after a failover, eliminating manual administrative intervention and enabling your application to resume
database operations seamlessly.
Read Replicas
RDS makes it easy to create read replicas of your database and automatically keeps them in sync with the
primary database (for MySQL, PostgreSQL, and MariaDB engines). Read replicas are helpful for both read
scaling and disaster recovery use cases. You can add read replicas to handle read workloads, so your
master database doesn’t become overloaded with read requests. Depending on the database engine,
you may be able to position your read replica in a different region than your master, providing you with the
option of having a read location that is closer to a specific locality. Furthermore, read replicas provide an
additional option for failover in case of an issue with the master, ensuring you have coverage in the event
of a disaster.
While both Multi-AZ deployments and read replicas can be used independently, they can also be used
together to provide even greater availability and performance for your database. In this case, you would
create a Multi-AZ deployment for your primary database and then create one or more read replicas of that
primary database. This would allow you to benefit from the automatic failover capabilities of Multi-AZ
deployments and the performance improvements provided by read replicas.
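One common way to combine the two features above is to send writes to the Multi-AZ primary endpoint and spread reads across replica endpoints. The sketch below shows this routing idea with hypothetical endpoint names; real applications would plug these endpoints into their database driver:

```python
class EndpointRouter:
    """Route writes to the Multi-AZ primary endpoint and reads to a
    pool of read-replica endpoints (round-robin). All endpoint names
    here are hypothetical placeholders."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self._i = 0

    def endpoint_for(self, operation):
        # Writes must always hit the primary; reads rotate across replicas.
        if operation in ("INSERT", "UPDATE", "DELETE") or not self.replicas:
            return self.primary
        ep = self.replicas[self._i % len(self.replicas)]
        self._i += 1
        return ep

router = EndpointRouter(
    "mydb.abc.us-east-1.rds.amazonaws.com",
    ["mydb-ro-1.abc.us-east-1.rds.amazonaws.com",
     "mydb-ro-2.abc.us-east-1.rds.amazonaws.com"],
)
```

Because a Multi-AZ failover keeps the primary endpoint name unchanged, this routing layer needs no modification after a failover.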
Automated Backup
With RDS, a scheduled backup is automatically performed once a day during a time window that you can
specify, and it is monitored as a managed service to ensure its successful completion within the
specified time window. The backups include both your entire instance and transaction logs. The retention
period for your backups can be up to 35 days. While automated backups are available for 35 days, you
can retain longer backups using the manual snapshots feature provided by RDS. RDS keeps multiple
copies of your backup in each AZ where you have an instance deployed to ensure their durability and
availability.
During the automatic backup window, storage I/O might be briefly suspended, which may cause a short
period of elevated latency. However, no I/O suspension occurs for Multi-AZ DB deployments because the
backup is taken from the standby instance, which helps maintain performance for latency-sensitive,
always-on applications.
Database Snapshots
You can manually create backups of your instance stored in Amazon S3, which are retained until you
decide to remove them. You can use a database snapshot to create a new instance whenever needed.
Even though database snapshots function as complete backups, you are charged only for incremental
storage usage.
Data Storage
Amazon RDS supports the most demanding database applications by utilizing Amazon Elastic Block
Store (Amazon EBS) volumes for database and log storage. There are two SSD-backed storage options to
choose from: General Purpose SSD for cost-effective storage that suits a broad range of workloads, and
Provisioned IOPS SSD for I/O-intensive workloads that require consistent, low-latency performance.
Amazon RDS automatically stripes across multiple Amazon EBS volumes to improve performance based
on the requested storage amount.
Scalability
You can often scale your RDS database compute and storage resources without downtime. You can
choose from over 25 instance types to find the best fit for your CPU, memory, and price requirements. If
you need to scale your database instance up or down, you can do so to handle a higher load or preserve
resources when you have a lower load. This flexibility allows you to control costs if you have regular
periods of high and low usage.
Monitoring
RDS offers a set of 15-18 monitoring metrics that are automatically available for you. You can access
these metrics through the RDS or CloudWatch APIs. These metrics enable you to monitor crucial aspects
such as CPU utilization, memory usage, storage, and latency. You can view the metrics individually or in
multiple graphs, or integrate them into your existing monitoring tool. RDS provides Enhanced Monitoring,
which offers access to more than 50 additional metrics. By enabling Enhanced Monitoring, you can
specify the granularity at which you want to view the metrics, ranging from one-second to sixty-second
intervals. This feature is available for all six database engines supported by RDS.
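The metrics above are retrieved through the CloudWatch API. As a sketch, the dictionary below shows the kind of parameters one would pass to CloudWatch's GetMetricStatistics call for an RDS instance's CPU utilization; the instance identifier is a hypothetical example, and no API call is actually made:

```python
from datetime import datetime, timedelta

def cpu_metric_request(db_instance_id, minutes=60, period=60):
    """Parameters for a CloudWatch GetMetricStatistics request that
    would fetch CPUUtilization for one RDS instance over the last
    hour, one datapoint per minute (request shape only)."""
    now = datetime.utcnow()
    return {
        "Namespace": "AWS/RDS",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "DBInstanceIdentifier",
                        "Value": db_instance_id}],
        "StartTime": now - timedelta(minutes=minutes),
        "EndTime": now,
        "Period": period,           # seconds per datapoint
        "Statistics": ["Average"],
    }

req = cpu_metric_request("mydb-instance")   # hypothetical identifier
```

The same shape works for other RDS metrics such as FreeableMemory or ReadLatency by changing MetricName.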
Security
Controlling network access to your database is made simple with RDS. You can run your database
instances in Amazon Virtual Private Cloud (Amazon VPC) to isolate them and establish an industry-
standard encrypted IPsec VPN to connect with your existing IT infrastructure. Additionally, most RDS
engine types offer encryption at rest, and all engines support encryption in transit. RDS also offers a wide
range of compliance readiness, including HIPAA eligibility.
Amazon RDS increases the operational reliability of your databases by applying a very consistent
deployment and operational model. This level of consistency is achieved in part by limiting the types of
changes that can be made to the underlying infrastructure and through the extensive use of automation.
For example, with Amazon RDS, you cannot use Secure Shell (SSH) to log in to the host instance and
install a custom piece of software. However, you can connect using SQL administrator tools or use DB
option groups and DB parameter groups to change the behaviour or feature configuration for a DB
Instance. If you want full control of the OS or require elevated permissions, consider installing your
database on Amazon EC2 instead of Amazon RDS.
• DB Parameter Group: A DB parameter group acts as a container for engine configuration values
that can be applied to your database instance. You can change the behaviour of the DB engine by
modifying the parameters in the group. Once you create a DB parameter group, you can then
associate it with one or more DB instances. Some DB engines expose many configuration options
and require changes to the default DB parameter group.
• DB Option Group: The option group allows you to add and configure additional features for your
DB instances, such as Oracle Enterprise Manager and other database features. These option
groups are not interchangeable between engines, and if you need to switch engines, you will
need to configure a new option group for the new engine type.
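To make the DB parameter group concept concrete, the sketch below builds the request shape for the RDS ModifyDBParameterGroup API. The group name and parameter value are illustrative assumptions, and no call is actually made:

```python
def modify_parameter_group_request(group_name, parameters):
    """Shape of an RDS ModifyDBParameterGroup request. Static
    parameters only take effect after a reboot, hence the
    'pending-reboot' apply method (request shape only)."""
    return {
        "DBParameterGroupName": group_name,
        "Parameters": [
            {"ParameterName": name,
             "ParameterValue": str(value),
             "ApplyMethod": "pending-reboot"}
            for name, value in parameters.items()
        ],
    }

# Hypothetical custom group raising MySQL's max_connections limit.
req = modify_parameter_group_request("custom-mysql8", {"max_connections": 500})
```

Once modified, the parameter group is associated with one or more DB instances to change their engine configuration.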
If you want more control over your environment than what is provided by Amazon RDS, consider
deploying your database on Amazon EC2 instead.
PostgreSQL
Amazon RDS supports DB instances running several versions of PostgreSQL, including 9.5.x, 9.4.x, and
9.3.x. You can manage Amazon RDS PostgreSQL using standard tools like pgAdmin, and it supports
standard JDBC/ODBC drivers. Additionally, Amazon RDS PostgreSQL supports Multi-AZ deployment for
high availability and read replicas for horizontal scaling.
MariaDB
Amazon RDS provides support for DB instances running MariaDB, which is a popular open-source
database engine developed by the creators of MySQL, enhanced with enterprise tools and functionality.
MariaDB adds features that improve the performance, availability, and scalability of MySQL. AWS
supports MariaDB version 10.0.17. Amazon RDS fully supports the XtraDB storage engine for MariaDB DB
instances and, like Amazon RDS MySQL and PostgreSQL, has support for Multi-AZ deployment and read
replicas.
Oracle
Oracle is one of the most popular relational databases used in enterprises, and it is fully supported by
Amazon RDS. Amazon RDS supports DB instances running several editions of Oracle 11g and Oracle 12c.
You can access schemas on a DB instance using any standard SQL client application, such as Oracle
SQL Plus. Amazon RDS Oracle supports three different editions of the popular database engine: Standard
Edition One, Standard Edition, and Enterprise Edition.
Amazon RDS offers two licensing models for commercial engines:
• License Included: The license is held by AWS and is included in the Amazon RDS instance price.
o SQL Server, License Included provides licensing for SQL Server Express Edition, Web
Edition, and Standard Edition.
• Bring Your Own License (BYOL): You provide your own license.
o For Oracle, you must have an Oracle Database license for the DB instance class and
Oracle Database edition you want to run (Standard Edition One, Standard Edition, and
Enterprise Edition).
o For SQL Server, you provide your own license under the Microsoft License Mobility
program for Microsoft SQL Standard Edition and Enterprise Edition.
Amazon DynamoDB
DynamoDB is a fully managed, multi-Region, multi-active database that delivers exceptional
performance, with single-digit-millisecond latency, at any scale. It can handle more than 10 trillion daily
requests and support peaks of over 20 million requests per second, making it an ideal choice for internet-
scale applications. DynamoDB offers built-in security, backup and restore features, and in-memory
caching. Its elastic scaling allows for seamless growth as the number of users and required I/O
throughput increases. You pay only for the storage and I/O throughput you provision or on a consumption-
based model if you choose on-demand. DynamoDB also provides fine-grained access control and
support for end-to-end encryption to ensure data security.
• Multi-master deployment
DynamoDB is a table-based database. When creating a table, you specify three main components:
• Keys: The primary key has two parts: a partition key used to retrieve the data, and an optional
sort key used to sort and retrieve a batch of items in a given range. For example, a transaction ID
can be your partition key, and the transaction date-time can be the sort key.
• WCU: Write capacity unit (1 KB/sec) defines the rate at which you want to write your data in
DynamoDB.
• RCU: Read capacity unit (4 KB/sec) defines the rate at which you want to read from your given
DynamoDB table.
The size of the table automatically increases as you add more items, with a hard limit on item size of
400 KB. As the table grows, it is automatically partitioned for you, and the table’s size and provisioned
capacity are distributed evenly across all partitions.
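The three components above come together in the CreateTable API. The sketch below shows the request shape for a table with a composite primary key and provisioned throughput; the table and attribute names are hypothetical, and no call is made:

```python
def create_table_request(table, partition_key, sort_key, rcu, wcu):
    """Shape of a DynamoDB CreateTable request with a composite
    primary key (partition + sort) and provisioned throughput."""
    return {
        "TableName": table,
        "AttributeDefinitions": [
            {"AttributeName": partition_key, "AttributeType": "S"},
            {"AttributeName": sort_key, "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": partition_key, "KeyType": "HASH"},   # partition key
            {"AttributeName": sort_key, "KeyType": "RANGE"},       # sort key
        ],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": rcu,    # 4 KB/sec read units
            "WriteCapacityUnits": wcu,   # 1 KB/sec write units
        },
    }

# Hypothetical table keyed on transaction ID and date-time.
req = create_table_request("Transactions", "TransactionId", "TxDateTime", 10, 5)
```

Passing this dictionary to an AWS SDK's create_table call would provision the table with the requested capacity.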
As data is stored in tables, you can think of a table as the database, and within the table, you have items.
Each attribute in an item is a name/value pair. An attribute can be a single-valued or multi-valued set. For
example, a book item can have title and authors attributes. Each book has one title but can have many
authors. The multi-valued attribute is a set, and duplicate values are not allowed. Data is stored in
Amazon DynamoDB in key/value pairs such as the following:
{
  "Id": 101,
  "ISBN": "123-1234567890",
  "Price": 2.88,
  "PageCount": 500,
  "InPublication": 1,
  "ProductCategory": "Book"
}
Applications can connect to the Amazon DynamoDB service endpoint and submit requests over HTTP/S
to read and write items to a table or even create and delete tables. DynamoDB provides a web service API
that accepts requests in JSON format. While you could program directly against the web service API
endpoints, most developers choose to use the AWS Software Development Kit (SDK) to interact with their
items and tables. The AWS SDK is available in many different languages and provides a simplified, high-
level programming interface.
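On the wire, DynamoDB's JSON API types every attribute value explicitly. The simplified serializer below sketches how the book item above would look in that typed format; real SDKs perform this conversion for you, and this version only handles a few types as an illustration:

```python
def to_dynamodb_json(item):
    """Convert plain Python values to DynamoDB's typed wire format.
    Handles only strings, numbers, and booleans for this sketch."""
    def encode(value):
        if isinstance(value, bool):          # bool check must come before int
            return {"BOOL": value}
        if isinstance(value, (int, float)):
            return {"N": str(value)}         # numbers travel as strings
        return {"S": str(value)}
    return {name: encode(value) for name, value in item.items()}

wire = to_dynamodb_json({"Id": 101, "ISBN": "123-1234567890", "Price": 2.88})
```

This is why a numeric attribute like Id appears as {"N": "101"} in raw API requests and responses.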
When you create a table or a secondary index, you must specify the names and data types of each
primary key attribute (partition key and sort key). Amazon DynamoDB supports a wide range of data types
for attributes, which fall into three major categories:
• Scalar Data Types: A scalar type represents exactly one value. Amazon DynamoDB supports the
following five scalar types:
o String: Text and variable-length characters up to 400 KB. Supports Unicode with UTF-8
encoding.
o Number: Positive or negative exact-value decimals and integers.
o Binary: Binary data, images, and compressed objects up to 400 KB in size.
o Boolean: True or false values.
o Null: Represents a blank, empty, or unknown state. String, Number, Binary, and Boolean
cannot be empty.
• Set Data Types: Sets represent a unique list of one or more scalar values. Each value in a set
must be unique and of the same data type. Sets do not guarantee order. Amazon DynamoDB
supports three set types: String Set, Number Set, and Binary Set.
• Document Data Types: This is useful for representing multiple nested attributes, similar to the
structure of a JSON file. Amazon DynamoDB supports two document types:
o List: Each List can store an ordered list of attributes of different data types.
o Map: Each Map can store an unordered list of key/value pairs. Maps can represent the
structure of any JSON object.
Partition Key
The primary key is made of one attribute, a partition (or hash) key. Amazon DynamoDB builds an
unordered hash index on this primary key attribute.
Provisioned Capacity
When you create an Amazon DynamoDB table, you provision read and write capacity to handle your
expected workloads. Based on your configuration settings, DynamoDB will
provision the right amount of infrastructure capacity to meet your requirements with sustained, low-
latency response times. Overall capacity is measured in read and write capacity units, and these values
can later be scaled up or down by using an UpdateTable action. Each operation against an Amazon
DynamoDB table will consume some of the provisioned capacity units. The specific amount of capacity
units consumed depends largely on the size of the item, as well as other factors. For read operations, the
amount of capacity consumed also depends on the read consistency selected in the request.
For example, for a table without a local secondary index, a strongly consistent read consumes 1 capacity
unit for an item that is 4KB or smaller. Similarly, for write operations, you will consume 1 capacity unit if
you write an item that is 1KB or smaller. This means that a strongly consistent read of a 110KB item
consumes 28 capacity units, calculated as 110 / 4 = 27.5, rounded up to 28. An eventually consistent
read consumes half that amount, or 14 in this example. You can use Amazon CloudWatch to
monitor your Amazon DynamoDB capacity and make scaling decisions. There is a rich set of metrics,
including ConsumedReadCapacityUnits and ConsumedWriteCapacityUnits. If you exceed your
provisioned capacity for a period of time, requests will be throttled and can be retried later. You can
monitor and alert on the ThrottledRequests metric using Amazon CloudWatch to notify you of changing
usage patterns.
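The capacity arithmetic above can be sketched as a small helper, assuming the published model: one read capacity unit covers a strongly consistent read of up to 4 KB per second (eventually consistent reads cost half), and one write capacity unit covers a 1 KB write per second:

```python
import math

def read_capacity_units(item_kb, strongly_consistent=True):
    """RCUs consumed reading one item of the given size (KB).
    Eventually consistent reads consume half as much."""
    units = math.ceil(item_kb / 4)      # 4 KB per read capacity unit
    return units if strongly_consistent else units / 2

def write_capacity_units(item_kb):
    """WCUs consumed writing one item of the given size (KB)."""
    return math.ceil(item_kb / 1)       # 1 KB per write capacity unit
```

This reproduces the worked example: a 110KB strongly consistent read costs 28 units, and the eventually consistent equivalent costs 14.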
Secondary indexes allow you to search a large table efficiently and avoid an expensive scan operation to
find items with specific attributes. These indexes enable you to support different query access patterns
and use cases beyond what is possible with only a primary key. Local secondary indexes must be defined
when the table is created (up to five per table), while global secondary indexes can be added at any time
(up to 20 per table by default). Amazon DynamoDB updates each
secondary index when an item is modified, and these updates consume write capacity units. For a local
secondary index, item updates will consume write capacity units from the main table, while global
secondary indexes maintain their own provisioned throughput settings separate from the table.
If your data size is small enough and you only need to query data based on a different sort key within the
same partition key, you should use an LSI. If the data size is larger, or you need to query data based on
attributes that are not part of the primary key or sort key, you should use a GSI. GSIs come with additional
costs and complexities in terms of provisioned throughput, index maintenance, and eventual
consistency. If an item collection’s data size exceeds 10 GB, the only option is to use a GSI, as an LSI
limits the total size of any one item collection to 10 GB. If eventual consistency is acceptable for your use
case, a GSI can be used and is suitable for most scenarios. DynamoDB is also very useful in designing
serverless event-driven architectures, allowing you to capture item-level data changes, such as PutItem,
UpdateItem, and DeleteItem, by using DynamoDB Streams.
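A common consumer of DynamoDB Streams is a Lambda function. The sketch below is a hypothetical handler that tallies the change types in a stream batch; the event shape follows the documented stream record format, reduced to the fields used here:

```python
def handler(event, context=None):
    """Count item-level changes in a DynamoDB Streams batch, as a
    Lambda function subscribed to the stream might do."""
    counts = {"INSERT": 0, "MODIFY": 0, "REMOVE": 0}
    for record in event.get("Records", []):
        counts[record["eventName"]] += 1
    return counts

# Minimal stream-shaped event for illustration.
sample = {"Records": [{"eventName": "INSERT"}, {"eventName": "MODIFY"}]}
```

In a real deployment, each record would also carry the item's old and new images, enabling downstream processing such as replication, auditing, or cache invalidation.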
Writing Items
Amazon DynamoDB provides three primary API actions to create, update, and delete items: PutItem,
UpdateItem, and DeleteItem. The PutItem action creates a new item with one or more attributes and will
update an existing item if the primary key already exists. It requires a table name and a primary key; any
additional attributes are optional. The UpdateItem action finds existing items based on the primary key
and replaces the attributes. This operation is useful for updating a single attribute while leaving other
attributes unchanged, and it can also create items if they don’t already exist. To remove an item from a
table, use DeleteItem and specify its primary key. The UpdateItem action also supports atomic
counters, which allow you to increment and decrement a value while ensuring consistency across
multiple concurrent requests. For example, a counter attribute used to track the overall score of a mobile
game can be updated by many clients simultaneously. These three actions support conditional
expressions, allowing you to perform validation before an action is applied. This is useful for preventing
accidental overwrites or enforcing business logic checks.
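Putting the atomic counter and conditional expression together, the sketch below shows the shape of an UpdateItem request for the mobile-game score example; the table and attribute names are hypothetical, and no call is made:

```python
def increment_score_request(table, player_id, by=1, max_score=1000):
    """Shape of a DynamoDB UpdateItem request that atomically
    increments a counter, but only while the score stays under a cap
    (condition is checked server-side before the write applies)."""
    return {
        "TableName": table,
        "Key": {"PlayerId": {"S": player_id}},
        "UpdateExpression": "ADD Score :inc",      # atomic counter
        "ConditionExpression": "Score < :cap",     # business-logic check
        "ExpressionAttributeValues": {
            ":inc": {"N": str(by)},
            ":cap": {"N": str(max_score)},
        },
    }

req = increment_score_request("GameScores", "player-42", by=5)
```

Because the increment is applied server-side, many clients can update the same counter concurrently without losing updates.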
Reading Items
After an item has been created, it can be retrieved through a direct lookup by calling the GetItem action or
through a search using the Query or Scan action. The GetItem action allows you to retrieve an item based
on its primary key, returning all of the item’s attributes by default, with the option to select individual
attributes to filter results. If a primary key is composed of a partition key, the entire partition key needs to
be specified to retrieve the item. If the primary key is a composite of a partition key and a sort key,
GetItem will require both the partition and sort key as well. Each call to GetItem consumes read capacity
units based on the item size and the consistency option selected. By default, a GetItem operation
performs an eventually consistent read.
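As a sketch, a GetItem request against a composite-key table looks like the following; the table, key, and attribute names are hypothetical, and no call is made:

```python
def get_item_request(table, partition_key, sort_key, strong=False):
    """Shape of a DynamoDB GetItem request for a composite-key table.
    Both key parts are required; ConsistentRead=True requests a
    strongly consistent read at double the read-capacity cost."""
    return {
        "TableName": table,
        "Key": {
            "TransactionId": {"S": partition_key},
            "TxDateTime": {"S": sort_key},
        },
        "ConsistentRead": strong,
        # Optional: return only selected attributes instead of all.
        "ProjectionExpression": "Price, ProductCategory",
    }

req = get_item_request("Transactions", "txn-001", "2024-01-01T00:00:00Z",
                       strong=True)
```

Omitting the strong flag leaves the default, an eventually consistent read.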
Eventual Consistency
When reading items from Amazon DynamoDB, the operation can be either eventually consistent or
strongly consistent. Amazon DynamoDB is a distributed system that stores multiple copies of an item
across an AWS Region to provide high availability and increased durability. When an item is updated, it
starts replicating across multiple servers. Because replication can take some time to complete, we refer
to the data as being eventually consistent, meaning a read request immediately after a write operation
might not show the latest change. In some cases, the application needs to guarantee that the data is the
latest, and Amazon DynamoDB offers an option for strongly consistent reads.
Searching Items
Amazon DynamoDB also offers two operations, Query and Scan, for searching a table or an index. A
Query operation is the primary search operation you can use to find items in a table or a secondary index
using only primary key attribute values. Each Query requires a partition key attribute name and a distinct
value to search. You can optionally provide a sort key value and use a comparison operator to refine the
search results. Results are automatically sorted by the sort key and are limited to 1MB. In contrast to
a Query, a Scan operation will read every item in a table or a secondary index. By default, a Scan
operation returns all data attributes for every item in the table or index. Each request can return up to 1MB
of data, and items can be filtered out using expressions, but this can be a resource-intensive operation. If
the result set for a Query or a Scan exceeds 1MB, you can page through the results in 1MB increments.
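The 1MB paging loop described above can be sketched with a stand-in query function. As in the real API, each page may carry a LastEvaluatedKey, which the client feeds back as ExclusiveStartKey to fetch the next page; the stand-in service below is purely illustrative:

```python
def paged_query(query_fn, request):
    """Collect all items from a paged Query/Scan-style API: keep
    re-issuing the request with ExclusiveStartKey until the response
    carries no LastEvaluatedKey."""
    items, start_key = [], None
    while True:
        if start_key is not None:
            request = {**request, "ExclusiveStartKey": start_key}
        page = query_fn(request)
        items.extend(page["Items"])
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:
            return items

# Stand-in for the service: returns two pages of results.
def fake_query(request):
    if "ExclusiveStartKey" not in request:
        return {"Items": [1, 2], "LastEvaluatedKey": {"pk": "2"}}
    return {"Items": [3]}

all_items = paged_query(fake_query, {"TableName": "Transactions"})
```

The same loop works for Scan responses, which page through the table in the same 1MB increments.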
A well-designed application will take the partition structure of a table into account to distribute read and
write transactions evenly, achieving high transaction rates at low latencies. Amazon DynamoDB stores
items for a single table across multiple partitions.
Amazon DynamoDB decides which partition to store the item in based on the partition key. The partition
key is used to distribute the new item among all available partitions, ensuring that items with the same
partition key will be stored on the same partition.
When a table is created, Amazon DynamoDB configures the table’s partitions based on the desired read
and write capacity. A single partition can hold about 10 GB of data and supports a maximum of 3,000 read
capacity units or 1,000 write capacity units. For partitions that are not fully utilizing their provisioned
capacity, Amazon DynamoDB provides burst capacity to handle spikes in traffic. A portion of your unused
capacity will be reserved to accommodate short bursts.
As storage or capacity requirements change, Amazon DynamoDB can split a partition to accommodate
more data or higher provisioned request rates. However, once a partition is split, it cannot be merged
back together. This is an important consideration when planning to increase provisioned capacity
temporarily and then lower it again. Each additional partition added will reduce its share of the
provisioned capacity.
IAM policies can be used to allow or deny operations on specific tables, and conditions can be utilized
to restrict access to individual items or attributes.
All operations must first be authenticated as valid user sessions. Applications that need to read from or
write to Amazon DynamoDB must obtain a set of temporary or permanent access control keys. While
these keys can be stored in configuration files, best practices recommend using IAM Amazon EC2
instance profiles to manage credentials. This method allows applications running on AWS to avoid storing
sensitive keys in configuration files that require additional security measures.
Amazon DynamoDB also supports fine-grained access control, which can restrict access to specific
items within a table or even specific attributes within an item. For instance, it may be necessary to limit a
user’s access to only their items within a table and prevent access to items associated with different
users. By using conditions in an IAM policy, you can restrict which actions a user can perform, on which
tables, and to which attributes a user can read or write.
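As a sketch of such a condition, the policy below (built as a Python dict) uses the dynamodb:LeadingKeys condition key to confine each caller to items whose partition key equals their own identity, and the dynamodb:Attributes key to limit which attributes they can read. The table ARN and attribute names are hypothetical:

```python
def per_user_policy(table_arn):
    """IAM policy restricting GetItem/Query to items whose partition
    key matches the caller's Cognito identity, and to two readable
    attributes (illustrative names)."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query"],
            "Resource": table_arn,
            "Condition": {
                "ForAllValues:StringEquals": {
                    # Partition key must equal the caller's identity.
                    "dynamodb:LeadingKeys":
                        ["${cognito-identity.amazonaws.com:sub}"],
                    # Only these attributes may be requested.
                    "dynamodb:Attributes": ["UserId", "Score"],
                },
            },
        }],
    }

policy = per_user_policy(
    "arn:aws:dynamodb:us-east-1:123456789012:table/GameScores")
```

Attached to a role assumed by application users, this policy enforces item-level isolation entirely on the service side, with no filtering logic in application code.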