
6

AWS Database Services
Databases play a crucial role in modern applications by storing vital data and records for users. A
database engine is essential for managing large volumes of data, providing access, and enabling efficient
searches. A well-architected application must ensure that the database meets performance
requirements, is highly available, and has strong recoverability features. Database systems are broadly
categorized into two types: Relational Database Management Systems (RDBMS) and NoSQL (non-
relational) databases. Many applications leverage both types to achieve the best of both worlds. Having a
strong grasp of essential database concepts, including Amazon RDS and Amazon DynamoDB, is critical
to using these services effectively.

History of Databases
Relational databases have been around for more than 50 years. The relational model that underpins
them was introduced by Edgar F. Codd in 1970, with the first relational database systems following soon
after. The key characteristic of relational databases is the organization of data into
rows and columns. Rows in one table are often related to rows in other tables using keys, which establish
relationships between tables. SQL (Structured Query Language) was developed in the 1970s by IBM
researchers Raymond Boyce and Donald Chamberlin and became the standard language for interacting
with relational databases, including inserting, updating, and retrieving data.

As technology advanced in the 1990s, relational databases began to encounter scalability issues,
particularly as diverse data types emerged. This led to the development of NoSQL databases, designed to
overcome the limitations of relational databases.

The term "NoSQL" was first coined by Carlo Strozzi in 1998 for a relational database that did not use SQL.
However, in 2009, Eric Evans and Johan Oskarsson redefined the term to refer to non-relational
databases, marking a shift in database design to accommodate the growing need for more flexible and
scalable systems.

Database Consistency Models


Maintaining data consistency during transactions is fundamental in database systems. When data is
written to a database, it must follow certain rules and constraints to ensure integrity. These rules ensure
that data remains accurate and reliable for all users. Two major data consistency models exist in modern
databases: the ACID model and the BASE model.


ACID Database Consistency Model


The ACID model, which has been a long-standing standard in database management, guarantees four
properties:

• Atomicity: All operations within a transaction must succeed or fail as a single unit. If one part of
the transaction fails, the entire transaction is rolled back. For example, when withdrawing money
from an ATM, the system ensures that both the withdrawal and the deduction from your account
either happen together or not at all.

• Consistency: This ensures that the database remains in a valid state before and after a
transaction. After any transaction, the database will maintain its structural integrity.

• Isolation: Transactions are processed independently, without interference from others. This
prevents data conflicts when multiple users access the database concurrently.

• Durability: Once a transaction is committed, the changes are permanent, even in the event of a
system failure.
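The atomicity guarantee from the ATM example above can be illustrated with any ACID-compliant engine. Here is a minimal sketch using SQLite via Python's standard library; the table and account names are made up for the example:

```python
import sqlite3

# In-memory database with two hypothetical accounts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('checking', 100), ('cash_drawer', 500)")
conn.commit()

def withdraw(conn, amount):
    """Both updates succeed together or the whole transaction rolls back."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = 'checking'",
                (amount,))
            cur = conn.execute("SELECT balance FROM accounts WHERE name = 'checking'")
            if cur.fetchone()[0] < 0:
                raise ValueError("insufficient funds")  # triggers rollback
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = 'cash_drawer'",
                (amount,))
        return True
    except ValueError:
        return False

print(withdraw(conn, 30))   # True: both rows change together
print(withdraw(conn, 999))  # False: the failed transaction changes neither row
```

After the failed withdrawal, both balances are exactly as they were before it started, which is atomicity in action.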

While the ACID model is reliable and robust, it can limit scalability and performance, particularly for
global applications. As a result, an alternative consistency model known as BASE was introduced.

BASE Database Consistency Model


The BASE model, which emerged with the rise of large-scale, distributed systems, offers a more flexible
approach compared to the ACID model. It relaxes some of the strict guarantees of ACID, allowing for
better scalability and availability in distributed environments. The BASE model is built on the following
principles:

• Basically Available: The system guarantees availability most of the time, even though consistency
might not always be immediate.

• Soft state: The state of the system may change over time without requiring immediate
consistency. Different replicas of data may not always be perfectly synchronized.

• Eventual Consistency: Although consistency is not immediate, it is eventually achieved. Over


time, all replicas of the data will converge to the same value, ensuring that the system is
eventually consistent.
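The three BASE principles can be pictured with a toy replication loop. This is a simulation only, not how any particular database implements replication: a write is acknowledged after reaching one replica, and a background sync pass later brings the others up to date.

```python
# Toy model of eventual consistency: three replicas of one value.
replicas = [{"value": "v1"}, {"value": "v1"}, {"value": "v1"}]

def write(replicas, value):
    """A write is acknowledged after reaching only one replica (basic availability)."""
    replicas[0]["value"] = value

def anti_entropy(replicas):
    """Background sync: copy the newest value to the lagging replicas."""
    latest = replicas[0]["value"]
    for r in replicas[1:]:
        r["value"] = latest

write(replicas, "v2")
# Soft state: replicas briefly disagree.
print([r["value"] for r in replicas])   # ['v2', 'v1', 'v1']
anti_entropy(replicas)
# Eventual consistency: all replicas converge to the same value.
print([r["value"] for r in replicas])   # ['v2', 'v2', 'v2']
```

A read served between the write and the sync pass may return the old value, which is exactly the trade-off BASE accepts in exchange for availability.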

While ACID prioritizes strong consistency, BASE is designed to provide scalability and availability, which is
often necessary for NoSQL databases used in large-scale applications. NoSQL databases, such as
Amazon DynamoDB and Cassandra, typically implement the BASE model to support highly scalable
systems.

AWS Database Services


AWS offers a wide array of fully managed database services that are purpose-built to meet a variety of use
cases. These services allow developers to build applications that can scale quickly while maintaining
high performance, availability, and security. The AWS database portfolio includes:

• Amazon RDS: A managed relational database service that supports popular database engines
such as MySQL, PostgreSQL, Oracle, SQL Server, MariaDB, and Amazon Aurora.


• Amazon Aurora: A relational database service that provides high performance and availability at
a lower cost compared to commercial databases.

• Amazon DynamoDB: A fully managed NoSQL database service that provides fast and
predictable performance with seamless scalability.

• Amazon ElastiCache: An in-memory data store that supports real-time applications, caching,
and internet-scale workloads.

• Amazon Neptune: A managed graph database service designed for applications with highly
connected data.

Migrating databases to AWS is also made simple with AWS Database Migration Service (DMS), which
enables cost-effective migration with minimal downtime.

Relational Databases
Relational databases remain the dominant type of database in use today. Their origins date back to the
1970s when Edgar F. Codd, working for IBM, introduced the relational model. Relational databases are
used in a variety of applications, from social media and e-commerce websites to enterprise-level
systems.

A relational database organizes data into tables, with rows representing individual records and columns
defining attributes such as names, addresses, or dates. Each table has a unique identifier known as a
primary key, which is used to relate data between different tables.

Relational databases are categorized as either Online Transaction Processing (OLTP) or Online
Analytical Processing (OLAP) systems. OLTP systems are used for transaction-heavy applications such
as e-commerce sites, where data is frequently written and updated. OLAP systems, on the other hand,
are used for data analysis and reporting. These databases are optimized for querying large datasets and
are typically found in data warehouses.

Amazon RDS simplifies the setup and management of relational databases by offering support for several
popular database engines. Amazon RDS can also be used for both OLTP and OLAP workloads.

Data Warehouses
A data warehouse serves as a central repository for data gathered from multiple sources. Unlike OLTP
systems, which are updated frequently, data warehouses are often updated in batches and optimized for
querying large amounts of historical data. Data warehouses use the OLAP model for reporting and
analytics.

Organizations often separate their databases into OLTP systems for day-to-day operations and OLAP
systems for data analysis. AWS offers Amazon Redshift as a high-performance data warehouse solution,
capable of handling complex analytical queries on large datasets.

NoSQL Databases
NoSQL databases offer a flexible, scalable solution for applications that require high performance with
large volumes of data. Unlike relational databases, which use rigid table and column structures, NoSQL


databases are more flexible and can scale horizontally. They are often used for managing session state,
user profiles, shopping carts, and time-series data.

Amazon DynamoDB, one of the leading NoSQL services offered by AWS, provides seamless scalability
and performance, making it ideal for high-demand applications. It supports key-value and document-
based storage models and can automatically manage distributed clusters across multiple data centres.

NoSQL databases have been embraced by organizations looking to achieve fast performance and
scalability, with popular choices including MongoDB, Cassandra, CouchDB, and HBase. AWS users can
run any type of NoSQL database on EC2 instances or use Amazon DynamoDB for a fully managed
solution.

Amazon Relational Database Service


Amazon RDS is a service that simplifies the setup, operations, and scaling of a relational database on
AWS. With Amazon RDS, you can spend more time focusing on the application and the schema while
letting Amazon RDS offload common tasks like backups, patching, scaling, and replication. Amazon RDS
helps you streamline the installation of the database software and the provisioning of infrastructure
capacity. Within a few minutes, Amazon RDS can launch one of many popular database engines that are
ready to start taking SQL transactions. After the initial launch, Amazon RDS simplifies ongoing
maintenance by automating common administrative tasks on a recurring basis. With Amazon RDS, you
can accelerate your development timelines and establish a consistent operating model for managing
relational databases.

For example, Amazon RDS makes it easy to replicate your data to increase availability, improve durability,
or scale up or beyond a single database instance for read-heavy database workloads. Amazon RDS
exposes a database endpoint to which client software can connect and execute SQL. However, Amazon
RDS does not provide shell access to Database (DB) Instances and restricts access to certain system
procedures and tables that require advanced privileges. You can typically use the same tools to query,
analyze, modify, and administer the database. Current Extract, Transform, Load (ETL) tools and reporting
tools can connect to Amazon RDS databases in the same way with the same drivers, and often all it takes
to reconfigure is changing the hostname in the connection string.
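The "change the hostname" point can be made concrete: a typical client keeps the same driver, port, credentials, and database name and swaps only the host. The endpoints below are invented for illustration, not real instances:

```python
# Repointing an application at RDS usually means editing only the host
# in the connection settings; everything else stays the same.
on_prem = {
    "host": "db01.corp.internal",  # hypothetical on-premises host
    "port": 5432,
    "dbname": "orders",
    "user": "app_user",
}

# Hypothetical RDS endpoint of the usual
# <instance>.<identifier>.<region>.rds.amazonaws.com form.
rds = dict(on_prem, host="orders-prod.abc123xyz.us-east-1.rds.amazonaws.com")

def dsn(cfg):
    """Build a libpq-style connection string from the settings."""
    return " ".join(f"{k}={v}" for k, v in cfg.items())

print(dsn(rds))
```

The same ETL and reporting tools then connect through their existing drivers with no other changes.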

Amazon RDS was designed by AWS to simplify the management of crucial transactional applications by
providing an easy-to-use platform for setting up, operating, and scaling a relational database in the cloud.
With RDS, laborious administrative tasks such as hardware provisioning, database configuration,
patching, and backups are automated, and a scalable capacity is provided in a cost-efficient manner.
RDS is available on various database instance types, optimized for memory, performance, or I/O, and
supports six well-known database engines, including Amazon Aurora (compatible with MySQL and
PostgreSQL), MySQL, PostgreSQL, MariaDB, SQL Server, and Oracle.

Amazon RDS’s flavours fall into three broad categories:

• Community (Postgres, MySQL, and MariaDB): AWS offers RDS with three different open-source
offerings. This is a good option for development environments, low-usage deployments, defined
workloads, and non-critical applications that can afford some downtime.

• Amazon Aurora (Postgres and MySQL): Postgres and MySQL are available here, as they are in
the community editions. Running these engines within the Aurora wrapper can add many
benefits over a community deployment. Amazon started offering the MySQL-compatible service
in 2014 and added the Postgres-compatible version in 2017. Some of these benefits include:

o Automatic allocation of storage space in 10 GB increments up to 64 TB


o Fivefold performance increase over the vanilla MySQL version

o Automatic six-way replication across availability zones to improve availability and fault
tolerance

• Commercial (Oracle and SQL Server): Many organizations still run Oracle workloads, so AWS
offers RDS with an Oracle flavour (and a Microsoft SQL Server flavour).

Amazon RDS Benefits


Amazon RDS offers multiple benefits as a managed database service provided by AWS, including:

1. Multi-AZ deployment

2. Read replicas

3. Automated backup

4. Database snapshots

5. Data Storage

6. Scalability

7. Monitoring

8. Amazon RDS Performance Insights

9. Security

Multi-AZ Deployments
Multi-AZ deployments in RDS provide improved availability and durability for database instances, making
it an ideal choice for production database workloads. With Multi-AZ DB instances, RDS synchronously
replicates data to a standby instance in a different Availability Zone (AZ) for enhanced resilience. You can
change your environment from Single-AZ to Multi-AZ at any time. Each AZ runs on its distinct,
independent infrastructure and is built to be highly dependable. In the event of an infrastructure failure,
RDS initiates an automatic failover to the standby instance, allowing you to resume database operations
as soon as the failover is complete. Additionally, the endpoint for your DB instance remains the same
after a failover, eliminating manual administrative intervention and enabling your application to resume
database operations seamlessly.

Read Replicas
RDS makes it easy to create read replicas of your database and automatically keeps them in sync with the
primary database (for MySQL, PostgreSQL, and MariaDB engines). Read replicas are helpful for both read
scaling and disaster recovery use cases. You can add read replicas to handle read workloads, so your
master database doesn’t become overloaded with read requests. Depending on the database engine,
you may be able to position your read replica in a different region than your master, providing you with the
option of having a read location that is closer to a specific locality. Furthermore, read replicas provide an
additional option for failover in case of an issue with the master, ensuring you have coverage in the event
of a disaster.

While both Multi-AZ deployments and read replicas can be used independently, they can also be used
together to provide even greater availability and performance for your database. In this case, you would
create a Multi-AZ deployment for your primary database and then create one or more read replicas of that


primary database. This would allow you to benefit from the automatic failover capabilities of Multi-AZ
deployments and the performance improvements provided by read replicas.
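The read-scaling pattern described above is typically implemented in the application or a proxy layer: writes go to the primary endpoint, reads are spread across the replica endpoints. A minimal sketch, with hypothetical endpoint names:

```python
import itertools

# Hypothetical endpoints: one writer, several read replicas.
PRIMARY = "mydb.abc123.us-east-1.rds.amazonaws.com"
REPLICAS = [
    "mydb-ro-1.abc123.us-east-1.rds.amazonaws.com",
    "mydb-ro-2.abc123.us-east-1.rds.amazonaws.com",
]
_replica_cycle = itertools.cycle(REPLICAS)

def endpoint_for(sql):
    """Send writes to the primary; round-robin reads across the replicas."""
    is_read = sql.lstrip().upper().startswith("SELECT")
    return next(_replica_cycle) if is_read else PRIMARY

print(endpoint_for("SELECT * FROM orders"))       # one of the replicas
print(endpoint_for("UPDATE orders SET total = 1"))  # always the primary
```

Real applications usually rely on a connection pooler or driver feature for this routing rather than string inspection, but the division of traffic is the same.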

Automated Backup
With RDS, a scheduled backup is automatically performed once a day during a time window that you can
specify, and it is monitored as a managed service to ensure its successful completion within the
specified time window. The backups include both your entire instance and transaction logs. The retention
period for your backups can be up to 35 days. While automated backups are available for 35 days, you
can retain longer backups using the manual snapshots feature provided by RDS. RDS keeps multiple
copies of your backup in each AZ where you have an instance deployed to ensure their durability and
availability.

During the automatic backup window, storage I/O might be briefly suspended, which may cause a brief
period of elevated latency. However, no I/O suspension occurs for Multi-AZ DB deployments because the
backup is taken from the standby instance, preserving performance if your application is time-
sensitive and needs to be always on.

Database Snapshots
You can manually create backups of your instance stored in Amazon S3, which are retained until you
decide to remove them. You can use a database snapshot to create a new instance whenever needed.
Even though database snapshots function as complete backups, you are charged only for incremental
storage usage.

Data Storage
Amazon RDS supports the most demanding database applications by utilizing Amazon Elastic Block
Store (Amazon EBS) volumes for database and log storage. There are two SSD-backed storage options to
choose from:

• A cost-effective general-purpose option

• A high-performance OLTP option

Amazon RDS automatically stripes across multiple Amazon EBS volumes to improve performance based
on the requested storage amount.

Scalability
You can often scale your RDS database compute and storage resources without downtime. You can
choose from over 25 instance types to find the best fit for your CPU, memory, and price requirements. If
you need to scale your database instance up or down, you can do so to handle a higher load or preserve
resources when you have a lower load. This flexibility allows you to control costs if you have regular
periods of high and low usage.

Monitoring
RDS offers a set of 15-18 monitoring metrics that are automatically available for you. You can access
these metrics through the RDS or CloudWatch APIs. These metrics enable you to monitor crucial aspects
such as CPU utilization, memory usage, storage, and latency. You can view the metrics individually or in
multiple graphs, or integrate them into your existing monitoring tool. RDS provides Enhanced Monitoring,
which offers access to more than 50 additional metrics. By enabling Enhanced Monitoring, you can
specify the granularity at which you want to view the metrics, ranging from one-second to sixty-second
intervals. This feature is available for all six database engines supported by RDS.


Amazon RDS Performance Insights


It is a performance monitoring tool for Amazon RDS databases. You can view a graphical representation
of your database’s performance over time and detailed performance metrics for specific database
operations. This tool helps you identify any potential performance bottlenecks or issues and take action
to resolve them. It also provides recommendations for improving the performance of your database.
These recommendations are based on best practices and the performance data collected by the tool,
helping you optimize your database configuration and application code to enhance the overall
performance of your application.

Security
Controlling network access to your database is made simple with RDS. You can run your database
instances in Amazon Virtual Private Cloud (Amazon VPC) to isolate them and establish an industry-
standard encrypted IPsec VPN to connect with your existing IT infrastructure. Additionally, most RDS
engine types offer encryption at rest, and all engines support encryption in transit. RDS also offers a wide
range of compliance readiness, including HIPAA eligibility.

Amazon RDS increases the operational reliability of your databases by applying a very consistent
deployment and operational model. This level of consistency is achieved in part by limiting the types of
changes that can be made to the underlying infrastructure and through the extensive use of automation.
For example, with Amazon RDS, you cannot use Secure Shell (SSH) to log in to the host instance and
install a custom piece of software. However, you can connect using SQL administrator tools or use DB
option groups and DB parameter groups to change the behaviour or feature configuration for a DB
Instance. If you want full control of the OS or require elevated permissions to run, then consider installing
your database on Amazon EC2 instead of Amazon RDS.

Amazon RDS – Database Instances


The Amazon RDS service itself provides an Application Programming Interface (API) that lets you create
and manage one or more DB instances. Each DB instance you create will be associated with a DB
parameter group and a DB option group.

• DB Parameter Group: A DB parameter group acts as a container for engine configuration values
that can be applied to your database instance. You can change the behaviour of the DB engine by
modifying the parameters in the group. Once you create a DB parameter group, you can then
associate it with one or more DB instances. Some DB engines expose many configuration options
and require changes to the default DB parameter group.

• DB Option Group: The option group allows you to add and configure additional features for your
DB instances, such as Oracle Enterprise Manager and other database features. These option
groups are not interchangeable between engines, and if you need to switch engines, you will
need to configure a new option group for the new engine type.

If you want more control over your environment than what is provided by Amazon RDS, consider
deploying your database on Amazon EC2 instead.

Amazon RDS – Database Engines


PostgreSQL
PostgreSQL is a widely used open-source database engine that boasts a rich set of features and


advanced functionality. Amazon RDS supports DB instances running several versions of PostgreSQL,
including 9.5.x, 9.4.x, and 9.3.x. You can manage Amazon RDS PostgreSQL using standard tools like
pgAdmin, and it supports standard JDBC/ODBC drivers. Additionally, Amazon RDS PostgreSQL supports
Multi-AZ deployment for high availability and read replicas for horizontal scaling.

MariaDB
Amazon RDS provides support for DB instances running MariaDB, which is a popular open-source
database engine developed by the creators of MySQL, enhanced with enterprise tools and functionality.
MariaDB adds features that improve the performance, availability, and scalability of MySQL. AWS
supports MariaDB version 10.0.17. Amazon RDS fully supports the XtraDB storage engine for MariaDB DB
instances and, like Amazon RDS MySQL and PostgreSQL, has support for Multi-AZ deployment and read
replicas.

Oracle
Oracle is one of the most popular relational databases used in enterprises, and it is fully supported by
Amazon RDS. Amazon RDS supports DB instances running several editions of Oracle 11g and Oracle 12c.
You can access schemas on a DB instance using any standard SQL client application, such as Oracle
SQL Plus. Amazon RDS Oracle supports three different editions of the popular database engine: Standard
Edition One, Standard Edition, and Enterprise Edition.

Microsoft SQL Server


Microsoft SQL Server is another very popular relational database used in enterprises. Amazon RDS allows
Database Administrators (DBAs) to connect to their SQL Server DB instance in the cloud using native
tools like SQL Server Management Studio. Amazon RDS provides support for several versions of Microsoft
SQL Server, including SQL Server 2008 R2, SQL Server 2012, and SQL Server 2014. Amazon RDS SQL
Server also supports four different editions of SQL Server: Express Edition, Web Edition, Standard Edition,
and Enterprise Edition.

Amazon RDS – Licensing


Amazon RDS Oracle and Microsoft SQL Server are commercial software products that require
appropriate licenses to operate in the cloud. AWS offers two licensing models:

• License Included: The license is held by AWS and is included in the Amazon RDS instance price.

o Oracle, License Included provides licensing for Standard Edition One.

o SQL Server, License Included provides licensing for SQL Server Express Edition, Web
Edition, and Standard Edition.

• Bring Your Own License (BYOL): You provide your own license.

o For Oracle, you must have an Oracle Database license for the DB instance class and
Oracle Database edition you want to run (Standard Edition One, Standard Edition, and
Enterprise Edition).

o For SQL Server, you provide your own license under the Microsoft License Mobility
program for Microsoft SQL Standard Edition and Enterprise Edition.

With BYOL, tracking and managing how the licenses are allocated remains your responsibility.


Amazon DynamoDB
DynamoDB is a fully managed, multi-Region, multi-active database that delivers exceptional
performance, with single-digit-millisecond latency, at any scale. It can handle more than 10 trillion daily
requests and support peaks of over 20 million requests per second, making it an ideal choice for internet-
scale applications. DynamoDB offers built-in security, backup and restore features, and in-memory
caching. Its elastic scaling allows for seamless growth as the number of users and required I/O
throughput increases. You pay only for the storage and I/O throughput you provision or on a consumption-
based model if you choose on-demand. DynamoDB also provides fine-grained access control and
support for end-to-end encryption to ensure data security.

Amazon DynamoDB Benefits


• Fully managed

• Supports multi-region deployment

• Multi-master deployment

• Fine-grained identity and access control

• Seamless integration with IAM security

• In-memory caching for fast retrieval

• Supports ACID transactions

• Encrypts all data by default

Amazon DynamoDB Backup Options


DynamoDB provides the option of on-demand backups for archiving data to meet regulatory
requirements. This feature enables you to create full backups of your DynamoDB table’s data.
Additionally, you can enable continuous backups for point-in-time recovery, allowing restoration to any
point in the last 35 days with per-second granularity. All backups are automatically encrypted,
catalogued, and retained until explicitly deleted. DynamoDB is built for high availability and durability. All
writes are persisted on SSD storage and replicated across three Availability Zones. Reads can be configured as
“strong” or “eventual.” There is no latency trade-off with either configuration; however, the read capacity
is used differently. Amazon DynamoDB Accelerator (DAX) is a managed, highly available, in-memory
cache that offers ten times faster performance.

Amazon DynamoDB – Data Model


The basic components of the Amazon DynamoDB data model include tables, items, and attributes. A
table is a collection of items, and each item is a collection of one or more attributes. Each item also has a
primary key that uniquely identifies it.

DynamoDB is a table-based database. While creating the table, you can specify at least three
components:

• Keys: The key has two parts: a partition key to retrieve the data and a sort key to sort and
retrieve a batch of data in a given range. For example, a transaction ID can be your partition key,
and the transaction date-time can be the sort key.


• WCU: Write capacity unit (1 KB/sec) defines the rate at which you want to write your data in
DynamoDB.

• RCU: Read capacity unit (4 KB/sec) defines the rate at which you want to read from your given
DynamoDB table.

The size of the table automatically increases as you add more items, with a hard limit on item size of 400
KB. As the size increases, the table is automatically partitioned for you. The size and provisioning capacity
of the table are equally distributed for all partitions.

As data is stored in tables, you can think of a table as the database, and within the table, you have items.

Each attribute in an item is a name/value pair. An attribute can be a single-valued or multi-valued set. For
example, a book item can have title and authors attributes. Each book has one title but can have many
authors. The multi-valued attribute is a set, and duplicate values are not allowed. Data is stored in
Amazon DynamoDB in key/value pairs such as the following:

{
    "Id": 101,
    "ProductName": "Book 101 Title",
    "ISBN": "123-1234567890",
    "Authors": ["Author 1", "Author 2"],
    "Price": 2.88,
    "Dimensions": "8.5 x 11.0 x 0.5",
    "PageCount": 500,
    "InPublication": 1,
    "ProductCategory": "Book"
}
Applications can connect to the Amazon DynamoDB service endpoint and submit requests over HTTP/S
to read and write items to a table or even create and delete tables. DynamoDB provides a web service API
that accepts requests in JSON format. While you could program directly against the web service API
endpoints, most developers choose to use the AWS Software Development Kit (SDK) to interact with their
items and tables. The AWS SDK is available in many different languages and provides a simplified, high-
level programming interface.
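To make the JSON wire format concrete, here is a sketch of a low-level GetItem request body in the shape the DynamoDB web API expects, with every attribute value wrapped in a type descriptor ("N" for number, "S" for string). The table name is hypothetical, and a real request would be signed and sent by an SDK:

```python
import json

# Hypothetical low-level GetItem request for the book item shown above.
request = {
    "TableName": "ProductCatalog",        # hypothetical table name
    "Key": {"Id": {"N": "101"}},          # typed attribute: a number sent as a string
    "ConsistentRead": True,               # request a strongly consistent read
}
body = json.dumps(request)
print(body)
```

The SDKs hide this envelope behind a high-level interface, but every call ultimately produces a JSON body of this form.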

Amazon DynamoDB – Data Types


Amazon DynamoDB offers significant flexibility with your database schema. Unlike a traditional relational
database that requires you to define your column types ahead of time, DynamoDB only requires a primary
key attribute. Each item added to the table can then have additional attributes, allowing you to expand
your schema over time without rebuilding the entire table and dealing with record version differences in
application logic.


When you create a table or a secondary index, you must specify the names and data types of each
primary key attribute (partition key and sort key). Amazon DynamoDB supports a wide range of data types
for attributes, which fall into three major categories:

• Scalar Data Types: A scalar type represents exactly one value. Amazon DynamoDB supports the
following five scalar types:

o String: Text and variable-length characters up to 400 KB. Supports Unicode with UTF-8
encoding.

o Number: Positive or negative numbers with up to 38 digits of precision.

o Binary: Binary data, images, and compressed objects up to 400 KB in size.

o Boolean: Binary flag representing a true or false value.

o Null: Represents a blank, empty, or unknown state. String, Number, Binary, and Boolean
cannot be empty.

• Set Data Types: Sets represent a unique list of one or more scalar values. Each value in a set
needs to be unique and must be of the same data type. Sets do not guarantee order. Amazon
DynamoDB supports three set types:

o String Set: Unique list of String attributes.

o Number Set: Unique list of Number attributes.

o Binary Set: Unique list of Binary attributes.

• Document Data Types: This is useful for representing multiple nested attributes, similar to the
structure of a JSON file. Amazon DynamoDB supports two document types:

o List: Each List can store an ordered list of attributes of different data types.

o Map: Each Map can store an unordered list of key/value pairs. Maps can represent the
structure of any JSON object.
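The three categories above map onto DynamoDB's typed-attribute notation (S, N, B, BOOL, NULL for scalars; SS, NS, BS for sets; L and M for document types). The encoder below is an illustrative sketch of that mapping, not an official SDK routine:

```python
import base64

def to_attr(value):
    """Encode a Python value in DynamoDB's typed-attribute notation:
    scalars -> S / N / B / BOOL / NULL, sets -> SS / NS,
    document types -> L / M (nesting allowed)."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):           # bool before int: True is an int in Python
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}          # numbers travel as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, bytes):
        return {"B": base64.b64encode(value).decode()}
    if isinstance(value, (set, frozenset)):
        members = sorted(value)           # sets are unique; order is not guaranteed
        if all(isinstance(m, str) for m in members):
            return {"SS": members}
        return {"NS": [str(m) for m in members]}
    if isinstance(value, list):
        return {"L": [to_attr(v) for v in value]}   # ordered, mixed types allowed
    if isinstance(value, dict):
        return {"M": {k: to_attr(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")

item = {"Title": "Book 101 Title", "Authors": {"Author 1", "Author 2"},
        "Meta": {"PageCount": 500, "InPublication": True}}
print({k: to_attr(v) for k, v in item.items()})
```

Note how the Map type lets the nested "Meta" object mirror a JSON document, while the Authors set enforces uniqueness.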

Amazon DynamoDB – Primary Key


In a relational database, the primary key uniquely identifies each item in the table. A primary key will point
to exactly one item. Amazon DynamoDB supports two types of primary keys, and this configuration
cannot be changed after a table has been created:

Partition Key
The primary key is made of one attribute, a partition (or hash) key. Amazon DynamoDB builds an
unordered hash index on this primary key attribute.

Partition and Sort Key


The primary key is made of two attributes. The first attribute is the partition key, and the second one is the
sort (or range) key. Each item in the table is uniquely identified by the combination of its partition and sort
key values. Two items can have the same partition key value but must have different sort key values.
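The roles of the two key parts can be sketched as follows: the partition key is hashed to pick a partition (the unordered hash index mentioned above), so items sharing a partition key land together, and the sort key orders items within that partition. This is a simulation only; DynamoDB's internal hashing and partition sizing are not public, and the partition count here is arbitrary:

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # illustrative; the service sizes partitions itself

def partition_for(partition_key):
    """Hash the partition key to choose a partition (unordered hash index)."""
    digest = hashlib.md5(str(partition_key).encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

partitions = defaultdict(list)
# Hypothetical (partition key, sort key) pairs: customer ID plus transaction date.
items = [("cust-1", "2024-01-05"), ("cust-1", "2024-02-11"), ("cust-2", "2024-01-09")]
for pk, sk in items:
    partitions[partition_for(pk)].append((pk, sk))

# Items with the same partition key always land in the same partition,
# where the sort key keeps them in order.
for bucket in partitions.values():
    bucket.sort(key=lambda item: item[1])
print(dict(partitions))
```

This is why a range query such as "all of cust-1's transactions in January" can be served from a single partition.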

Amazon DynamoDB – Capacity


When creating an Amazon DynamoDB table, you are required to provision a certain amount of read and


write capacity to handle your expected workloads. Based on your configuration settings, DynamoDB will
provision the right amount of infrastructure capacity to meet your requirements with sustained, low-
latency response times. Overall capacity is measured in read and write capacity units, and these values
can later be scaled up or down by using an UpdateTable action. Each operation against an Amazon
DynamoDB table will consume some of the provisioned capacity units. The specific amount of capacity
units consumed depends largely on the size of the item, as well as other factors. For read operations, the
amount of capacity consumed also depends on the read consistency selected in the request.

For example, on a table without a local secondary index, a strongly consistent read of an item that is
4KB or smaller consumes 1 capacity unit, while an eventually consistent read consumes half of that.
Similarly, for write operations, you will consume 1 capacity unit if you write an item that is 1KB or
smaller. This means that a strongly consistent read of a 110KB item consumes 28 capacity units,
calculated as 110 / 4 = 27.5 rounded up to 28; an eventually consistent read of the same item consumes
half that amount, or 14 units. You can use Amazon CloudWatch to monitor your Amazon DynamoDB
capacity and make scaling decisions. A rich set of metrics is available, including
ConsumedReadCapacityUnits and ConsumedWriteCapacityUnits. If you exceed your provisioned capacity
for a period of time, requests will be throttled and can be retried later. You can monitor and alert on
the ThrottledRequests metric using Amazon CloudWatch to be notified of changing usage patterns.
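The sizing arithmetic can be captured in a small helper; this is an illustrative sketch based on AWS's published rules (1 strongly consistent read unit per 4KB rounded up, half that for eventually consistent reads, and 1 write unit per 1KB rounded up):

```python
import math

def read_capacity_units(item_size_kb, strongly_consistent=False):
    """Capacity units consumed by reading one item of the given size.

    A strongly consistent read consumes 1 unit per 4KB (rounded up);
    an eventually consistent read consumes half as much.
    """
    units = math.ceil(item_size_kb / 4)
    return units if strongly_consistent else units / 2

def write_capacity_units(item_size_kb):
    """Writes consume 1 unit per 1KB of item size, rounded up."""
    return math.ceil(item_size_kb)

# A 110KB item: ceil(110 / 4) = 28 units for a strongly consistent
# read, half that (14) for an eventually consistent read.
```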

Amazon DynamoDB – Secondary Indexes


When you create a table with a partition and sort key (formerly known as a hash and range key), you can
optionally define one or more secondary indexes on that table. A secondary index lets you query the data
in the table using an alternate key, in addition to queries against the primary key. Amazon DynamoDB
supports two different kinds of indexes:

• Global Secondary Index


The global secondary index is an index with a partition and sort key that can be different from
those on the table. You can create or delete a global secondary index on a table at any time.

• Local Secondary Index


The local secondary index is an index that has the same partition key attribute as the primary key
of the table but a different sort key. You can only create a local secondary index when you create
a table.

Secondary indexes allow you to search a large table efficiently and avoid an expensive scan operation to
find items with specific attributes. These indexes enable you to support different query access patterns
and use cases beyond what is possible with only a primary key. A table can have up to five local
secondary indexes and, by default, up to twenty global secondary indexes. Amazon DynamoDB updates each
secondary index when an item is modified, and these updates consume write capacity units. For a local
secondary index, item updates consume write capacity units from the main table, while each global
secondary index maintains its own provisioned throughput settings, separate from the table's.
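As an illustrative sketch, a global secondary index is declared as part of the table definition; the index name, key attributes, and throughput values below are hypothetical:

```python
# Illustrative GlobalSecondaryIndexes entry for CreateTable/UpdateTable.
# Note the GSI carries its own ProvisionedThroughput, separate from the
# base table's, and may use entirely different key attributes.
gsi_definition = {
    "IndexName": "GameTitleIndex",
    "KeySchema": [
        {"AttributeName": "GameTitle", "KeyType": "HASH"},
        {"AttributeName": "TopScore", "KeyType": "RANGE"},
    ],
    "Projection": {"ProjectionType": "ALL"},  # copy all attributes into the index
    "ProvisionedThroughput": {
        "ReadCapacityUnits": 5,
        "WriteCapacityUnits": 5,
    },
}
```

A local secondary index would be declared similarly under LocalSecondaryIndexes, but must reuse the table's partition key and omits the separate throughput settings.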

If your data size is small enough and you only need to query data based on a different sort key within the
same partition key, you should use an LSI. If the data size is larger, or you need to query data based on
attributes that are not part of the primary key or sort key, you should use a GSI. GSIs come with additional
costs and complexities in terms of provisioned throughput, index maintenance, and eventual
consistency. If an item collection's data size may exceed 10 GB, the only option is to use a GSI, because
an LSI limits the total data size within a single partition key value to 10 GB. If eventual consistency is
acceptable for your use case, a GSI can be used, and it is suitable for most scenarios. DynamoDB is also
useful for designing serverless, event-driven architectures: you can capture item-level data changes, such
as those made by PutItem, UpdateItem, and DeleteItem, by using DynamoDB Streams.

Amazon DynamoDB – Writing & Reading Data


Amazon DynamoDB provides multiple operations that let you create, update, and delete individual items.
It also offers various querying options that allow you to search a table or an index or retrieve specific
items or a batch of items.

Writing Items
Amazon DynamoDB provides three primary API actions to create, update, and delete items: PutItem,
UpdateItem, and DeleteItem. The PutItem action creates a new item with one or more attributes and will
update an existing item if the primary key already exists. It requires a table name and a primary key; any
additional attributes are optional. The UpdateItem action finds an existing item based on its primary key
and updates the attributes you specify. This operation is useful for updating a single attribute while
leaving the others unchanged, and it can also create an item if it doesn't already exist. To remove an item
from a table, use DeleteItem and specify its primary key. The UpdateItem action also supports atomic
counters, which allow you to increment and decrement a value while ensuring consistency across
multiple concurrent requests. For example, a counter attribute used to track the overall score of a mobile
game can be updated by many clients simultaneously. These three actions support conditional
expressions, allowing you to perform validation before an action is applied. This is useful for preventing
accidental overwrites or enforcing business logic checks.
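An atomic counter combined with a condition expression can be sketched as boto3-style UpdateItem parameters (the table, key, and attribute names are hypothetical):

```python
# Illustrative UpdateItem parameters: an atomic increment guarded by a
# condition expression. All names here are hypothetical.
update_params = {
    "TableName": "GameScores",
    "Key": {
        "PlayerId": {"S": "player-42"},
        "GameTitle": {"S": "Asteroids"},
    },
    # Atomic counter: the increment is applied server-side, so many
    # clients can update the score concurrently without lost updates.
    "UpdateExpression": "SET Score = Score + :inc",
    # Validation before the write: only update an item that exists,
    # preventing UpdateItem from silently creating a new one.
    "ConditionExpression": "attribute_exists(PlayerId)",
    "ExpressionAttributeValues": {":inc": {"N": "1"}},
}
```

With boto3 this would be sent as `client.update_item(**update_params)`; if the condition fails, DynamoDB rejects the write with a ConditionalCheckFailedException.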

Reading Items
After an item has been created, it can be retrieved through a direct lookup by calling the GetItem action or
through a search using the Query or Scan action. The GetItem action allows you to retrieve an item based
on its primary key, returning all of the item’s attributes by default, with the option to select individual
attributes to filter results. If a primary key is composed of a partition key, the entire partition key needs to
be specified to retrieve the item. If the primary key is a composite of a partition key and a sort key,
GetItem will require both the partition and sort key as well. Each call to GetItem consumes read capacity
units based on the item size and the consistency option selected. By default, a GetItem operation
performs an eventually consistent read.
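A point lookup can be sketched as GetItem parameters (names hypothetical): ConsistentRead opts in to a strongly consistent read, and ProjectionExpression narrows the returned attributes instead of fetching the whole item:

```python
# Illustrative GetItem parameters for a table with a composite key.
# Both the partition key and the sort key must be supplied.
get_params = {
    "TableName": "GameScores",
    "Key": {
        "PlayerId": {"S": "player-42"},
        "GameTitle": {"S": "Asteroids"},
    },
    "ConsistentRead": True,              # default is eventually consistent
    "ProjectionExpression": "Score, GameTitle",  # return only these attributes
}
```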

Eventual Consistency
When reading items from Amazon DynamoDB, the operation can be either eventually consistent or
strongly consistent. Amazon DynamoDB is a distributed system that stores multiple copies of an item
across an AWS Region to provide high availability and increased durability. When an item is updated, it
starts replicating across multiple servers. Because replication can take some time to complete, we refer
to the data as being eventually consistent, meaning a read request immediately after a write operation
might not show the latest change. In some cases, the application needs to guarantee that the data is the
latest, and Amazon DynamoDB offers an option for strongly consistent reads.

Eventually Consistent Reads


When you read data, the response might not reflect the results of a recently completed write operation,
potentially including stale data. Consistency across all copies of the data is usually reached within a
second; if you repeat your read request after a short time, the response will return the latest data.

Strongly Consistent Reads


When you issue a strongly consistent read request, Amazon DynamoDB returns a response with the most
up-to-date data that reflects updates from all prior related write operations to which Amazon DynamoDB
returned a successful response. However, a strongly consistent read might be less available in the case
of a network delay or outage. You can request a strongly consistent read result by specifying optional
parameters in your request.

Amazon DynamoDB – Batch Operations


Amazon DynamoDB provides several operations designed for working with large batches of items,
including BatchGetItem and BatchWriteItem. Using the BatchWriteItem action, you can perform up to 25
put or delete requests with a single operation, minimizing the round-trip overhead of individual calls when
processing large numbers of items.
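Because BatchWriteItem caps each call at 25 requests, larger workloads are typically split client-side; a minimal sketch:

```python
def chunk_batch_writes(write_requests, batch_size=25):
    """Split a list of write requests into BatchWriteItem-sized chunks.

    BatchWriteItem accepts at most 25 put/delete requests per call, so
    anything larger must be divided before sending.
    """
    return [write_requests[i:i + batch_size]
            for i in range(0, len(write_requests), batch_size)]
```

Each chunk would then be sent as one BatchWriteItem call; a production client would also retry any UnprocessedItems reported in the response, ideally with exponential backoff.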

Searching Items
Amazon DynamoDB also offers two operations, Query and Scan, for searching a table or an index. A
Query operation is the primary search operation you can use to find items in a table or a secondary index
using only primary key attribute values. Each Query requires a partition key attribute name and a distinct
value to search. You can optionally provide a sort key value and use a comparison operator to refine the
search results. Query results are automatically sorted by the sort key value, and each response is limited to 1MB. In contrast to
a Query, a Scan operation will read every item in a table or a secondary index. By default, a Scan
operation returns all data attributes for every item in the table or index. Each request can return up to 1MB
of data, and items can be filtered out using expressions, but this can be a resource-intensive operation. If
the result set for a Query or a Scan exceeds 1MB, you can page through the results in 1MB increments.
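Paging through a Query or Scan follows the LastEvaluatedKey/ExclusiveStartKey handshake. The sketch below abstracts the actual API call behind a `fetch_page` callable (an assumption made here for illustration): it receives the key where the previous page stopped, or None for the first page, and returns a response dict.

```python
def paginate(fetch_page):
    """Yield every item from a paged Query/Scan-style operation.

    `fetch_page(start_key)` stands in for one Query or Scan call with
    ExclusiveStartKey=start_key. When a response carries no
    LastEvaluatedKey, the final page has been reached.
    """
    start_key = None
    while True:
        response = fetch_page(start_key)
        yield from response.get("Items", [])
        start_key = response.get("LastEvaluatedKey")
        if start_key is None:
            break
```

In a real client, `fetch_page` would wrap `client.query(...)` or `client.scan(...)`, passing ExclusiveStartKey only when a start key is present.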

Amazon DynamoDB – Scaling & Partitioning


Amazon DynamoDB is a fully managed service that abstracts away most of the complexity involved in
building and scaling a NoSQL cluster. It allows users to create tables that can scale up to hold a virtually
unlimited number of items while maintaining consistent low-latency performance. An Amazon
DynamoDB table can scale horizontally through the use of partitions to meet the storage and
performance requirements of your application. Each individual partition represents a unit of compute and
storage capacity.

A well-designed application will take the partition structure of a table into account to distribute read and
write transactions evenly, achieving high transaction rates at low latencies. Amazon DynamoDB stores
items for a single table across multiple partitions.

Amazon DynamoDB decides which partition to store the item in based on the partition key. The partition
key is used to distribute the new item among all available partitions, ensuring that items with the same
partition key will be stored on the same partition.
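This placement rule can be illustrated with a toy hash function. DynamoDB's internal hash is not public, so md5 below is purely illustrative; the point is only that hashing is deterministic, so items sharing a partition key always map to the same partition.

```python
import hashlib

def assign_partition(partition_key, num_partitions):
    """Map a partition key to a partition index (illustrative only).

    The same key always hashes to the same partition, which is why a
    skewed choice of partition key concentrates load on one partition.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions
```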

When a table is created, Amazon DynamoDB configures the table’s partitions based on the desired read
and write capacity. A single partition can hold about 10 GB of data and supports a maximum of 3,000 read
capacity units or 1,000 write capacity units. For partitions that are not fully utilizing their provisioned
capacity, Amazon DynamoDB provides burst capacity to handle spikes in traffic. A portion of your unused
capacity will be reserved to accommodate short bursts.

As storage or capacity requirements change, Amazon DynamoDB can split a partition to accommodate
more data or higher provisioned request rates. However, once a partition is split, it cannot be merged
back together. This is an important consideration when planning to increase provisioned capacity
temporarily and then lower it again: because provisioned capacity is divided across partitions, each
additional partition reduces the share of capacity available to any single partition.

Amazon DynamoDB – Security


Amazon DynamoDB provides granular control over access rights and permissions for users and
administrators by integrating with the IAM service. This integration allows for strong control over
permissions through the use of policies. Users can create one or more policies that allow or deny specific
operations on specific tables, and conditions can be utilized to restrict access to individual items or
attributes.

All operations must first be authenticated as valid user sessions. Applications that need to read from or
write to Amazon DynamoDB must obtain a set of temporary or permanent access keys. While these keys
can be stored in configuration files, the best practice is to use IAM roles delivered through Amazon EC2
instance profiles to manage credentials. This approach allows applications running on AWS to avoid
storing sensitive keys in configuration files, which would otherwise require additional security measures.

Amazon DynamoDB also supports fine-grained access control, which can restrict access to specific
items within a table or even specific attributes within an item. For instance, it may be necessary to limit a
user’s access to only their items within a table and prevent access to items associated with different
users. By using conditions in an IAM policy, you can restrict which actions a user can perform, on which
tables, and to which attributes a user can read or write.
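Fine-grained access of this kind is expressed with the dynamodb:LeadingKeys condition key. In the hypothetical policy below (the account ID, Region, table name, and identity variable are placeholders), a federated user may only read or write items whose partition key equals their own identity:

```python
# Hypothetical IAM policy document restricting a user to their own items.
fine_grained_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem"],
        "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/GameScores",
        "Condition": {
            # Only allow requests whose partition key value matches the
            # caller's federated identity.
            "ForAllValues:StringEquals": {
                "dynamodb:LeadingKeys": [
                    "${cognito-identity.amazonaws.com:sub}"
                ]
            }
        },
    }],
}
```

Attribute-level restriction works the same way, using the dynamodb:Attributes condition key alongside a Select/ProjectionExpression requirement.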
