
Copyright © Cloud Certification Store | All Rights Reserved Page 1

PREVIEW COPY PLEASE SHARE | Full version at https://cloudcertificationstore.com/b/5UMiy

AWS Certified Data Engineer Associate DEA-C01 Practice Exam Questions

(AWS-DEA-C01-0010)

© Cloud Certification Store All rights reserved.

Amazon Web Services (AWS) is a registered trademark of Amazon.com, Inc. or its affiliates.

This practice set is an original work for educational use and is NOT endorsed by or affiliated with Amazon Web Services. “AWS,” “AWS Certified Data Engineer – Associate,” and related marks are trademarks of Amazon.com, Inc., used here for identification only.

DISCLAIMER
● This practice test includes questions compiled from various exam preparation platforms.
● Important: some answers were curated using generative AI with human review. Verify accuracy with official documentation before relying on this material.
● Users are strongly encouraged to double-check all content against official documentation and trusted sources before using it for exam preparation or making important decisions.
● The creators of this material assume no responsibility for any errors, inaccuracies, or outcomes, including exam results, based on the use of this content.
● Some questions might be duplicated or close to previous ones. This is done on purpose as a way to show possible scenarios and to reinforce your learning.
● Single-user licence only:
  ○ Includes one unique Payhip Licence Key per purchase, along with a Product Key.
  ○ Redistribution, resale, or public posting is prohibited. We can trace any file to the purchaser through the purchased Licence Key, Product Key, and a watermark at the top-left corner of each page containing the purchaser's email.
Copyright © Cloud Certification Store | All Rights Reserved Page 2
PREVIEW COPY PLEASE SHARE | Full version at https://cloudcertificationstore.com/b/5UMiy

AWS Certified Data Engineer Associate DEA-C01


Practice Exam Questions (AWS-DEA-C01-0010)

Earners of the AWS Certified Data Engineer – Associate certification have an in-depth understanding of how to use AWS services to implement data pipelines and to monitor, troubleshoot, and optimize cost and performance issues in accordance with best practices. Badge owners have technical expertise to understand the effects of volume, variety, and velocity on data ingestion. They are familiar with transformation, modeling, security, governance, privacy, schema design, and optimal data store design.

Issued by Amazon Web Services Training and Certification
https://aws.amazon.com/certification/certified-data-engineer-associate/

AWS Certified Data Engineer - Associate validates skills and knowledge in core data-related AWS services, ability to ingest and transform data, orchestrate data pipelines while applying programming concepts, design data models, manage data life cycles, and ensure data quality.


Exam overview

AWS Certified Data Engineer - Associate

Category: Associate
Exam duration: 130 minutes
Exam format: 65 questions; either multiple choice or multiple response
Cost: 150 USD. Visit Exam pricing for additional cost information, including foreign exchange rates.
Testing options: Pearson VUE testing center or online proctored exam
Languages offered: English, Japanese, Korean, and Simplified Chinese


Prepare for the exam

Gain confidence by following AWS Skill Builder's 4-step exam prep plan. Enroll in the complete plan
or choose specific courses tailored to your needs, ensuring you're ready for exam day.
1. Get to know the exam with exam-style questions

Follow the 4-step plan.

Review the exam guide.



Take the AWS Certification Official Practice Question Set to understand exam-style questions.

Take the AWS Certification Official Pretest to identify any areas where you need to refresh your AWS
knowledge and skills.

2. Refresh your AWS knowledge and skills

Enroll in digital courses to fill gaps in knowledge and skills, and practice with AWS Builder Labs, AWS Cloud Quest, and AWS Jam.

3. Review and practice for your exam

Review the scope of the exam. Explore each exam domain’s topics and how they align to AWS
services. Reinforce your knowledge and identify learning gaps with exam-style questions and
flashcards. Follow instructors as they walk through exam-style questions and provide test-taking
strategies. Continue practicing with AWS Builder Labs and/or AWS SimuLearn.

4. Assess your exam readiness

Take the AWS Certification Official Practice Exam.

Key FAQs to help you get started

Who should take this exam?

The ideal candidate for this exam has the equivalent of 2-3 years of experience in data engineering
or data architecture and a minimum of 1-2 years of hands-on experience with AWS services.
How will the AWS Certified Data Engineer - Associate help my career?

This is an in-demand role with a low supply of skilled professionals. AWS Certified Data Engineer -
Associate and accompanying prep resources offer you a means to build your confidence and
credibility in data engineer, data architect, and other data-related roles.

What certification(s) should I earn next after AWS Certified Data Engineer -
Associate?

The AWS Certified Security - Specialty certification is a recommended next step for cloud data
professionals to validate their expertise in cloud data security and governance. View AWS
Certification paths to learn more and plan your AWS Certification journey.

How long is this certification valid for?

This certification is valid for 3 years. Before your certification expires, you can recertify by passing
the latest version of this exam.

Practice Questions

Question 1

A company uploads .csv files to an Amazon S3 bucket. The company’s data platform team has set up an AWS Glue crawler to perform data discovery and to create the tables and schemas. An AWS Glue job writes processed data from the tables to an Amazon Redshift database. The AWS Glue job handles column mapping and creates the Amazon Redshift tables in the Redshift database appropriately.

If the company reruns the AWS Glue job for any reason, duplicate records are introduced into the Amazon Redshift tables. The company needs a solution that will update the Redshift tables without duplicates.

Which solution will meet these requirements?

A. Modify the AWS Glue job to copy the rows into a staging Redshift table. Add SQL commands to update the existing rows with new values from the staging Redshift table.
B. Modify the AWS Glue job to load the previously inserted data into a MySQL database. Perform an upsert operation in the MySQL database. Copy the results to the Amazon Redshift tables.
C. Use Apache Spark’s DataFrame dropDuplicates() API to eliminate duplicates. Write the data to the Redshift tables.
D. Use the AWS Glue ResolveChoice built-in transform to select the value of the column from the most recent record.

✅ Correct answer: A. Modify the AWS Glue job to copy the rows into a staging Redshift table. Add SQL commands to update the existing rows with new values from the staging Redshift table.

📌 This approach allows controlled upsert operations within Redshift by staging data first and then merging or updating it, ensuring no duplicates are introduced.

Incorrect answers:

❌ B. Modify the AWS Glue job to load into MySQL first – Adding MySQL as an intermediary increases complexity, cost, and latency without necessity when Redshift supports direct upserts from staging tables.
❌ C. Use Apache Spark’s dropDuplicates() API – This only removes duplicates within the Spark job’s data frame and doesn’t prevent duplication in the persistent Redshift table if data is appended.
❌ D. Use ResolveChoice built-in transform – ResolveChoice is for handling schema conflicts, not for deduplication or merging existing table rows.
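To make the stage-and-merge idea in option A concrete, here is a runnable simulation. It uses SQLite only so the sketch executes anywhere; the table and column names (target, staging, id, value) are invented for the example, and in Redshift the same DELETE-then-INSERT pair would run inside one transaction against the staging table the Glue job loads.

```python
import sqlite3

# Simulation of the Redshift stage-and-merge pattern from option A, using
# SQLite so the sketch is runnable anywhere. Table and column names are
# invented for the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target  (id INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE staging (id INTEGER, value TEXT);
    INSERT INTO target  VALUES (1, 'old'), (2, 'keep');
    INSERT INTO staging VALUES (1, 'new'), (3, 'added');  -- output of a job rerun
""")

# Merge step: drop the target rows the staging data replaces, then insert
# everything from staging. Re-running these two statements is idempotent,
# so a job rerun cannot introduce duplicates.
conn.execute("DELETE FROM target WHERE id IN (SELECT id FROM staging)")
conn.execute("INSERT INTO target SELECT id, value FROM staging")
conn.commit()

rows = conn.execute("SELECT id, value FROM target ORDER BY id").fetchall()
print(rows)  # [(1, 'new'), (2, 'keep'), (3, 'added')]
```

Because the delete and insert are driven entirely by the staging table, running the merge twice leaves the target unchanged, which is exactly why reruns stop producing duplicates.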
W
Question 2

A data engineer notices that Amazon Athena queries are held in a queue before the queries run.

How can the data engineer prevent the queries from queueing?

A. Increase the query result limit.
B. Configure provisioned capacity for an existing workgroup.
C. Use federated queries.
D. Grant the users who run the Athena queries access to an existing workgroup.

✅ Correct answer: B. Configure provisioned capacity for an existing workgroup.

📌 Provisioned capacity in Athena ensures queries start immediately by reserving dedicated resources for the workgroup.

Incorrect answers:

❌ A. Increase the query result limit – This controls output size, not concurrency or queuing behavior.
❌ C. Use federated queries – Federated queries expand data sources Athena can query but do not address queuing delays.
❌ D. Grant users access to an existing workgroup – Simply granting access doesn’t guarantee faster execution; resource allocation is the bottleneck.

Question 3

A company ingests data from multiple data sources and stores the data in an Amazon S3 bucket. An AWS Glue extract, transform, and load (ETL) job transforms the data and writes the transformed data to an Amazon S3 based data lake. The company uses Amazon Athena to query the data that is in the data lake.

The company needs to identify matching records even when the records do not have a common unique identifier.

Which solution will meet this requirement?

A. Use Amazon Macie pattern matching as part of the ETL job.
B. Train and use the AWS Glue PySpark Filter class in the ETL job.
C. Partition tables and use the ETL job to partition the data on a unique identifier.
D. Train and use the AWS Lake Formation FindMatches transform in the ETL job.

✅ Correct answer: D. Train and use the AWS Lake Formation FindMatches transform in the ETL job.

📌 FindMatches uses machine learning to identify duplicate or matching records without a unique identifier, making it ideal for record linkage tasks.

Incorrect answers:

❌ A. Use Amazon Macie – Macie is for sensitive data detection and classification, not matching related records.
❌ B. Use PySpark Filter class – The Filter class removes rows based on conditions but doesn’t perform fuzzy matching or deduplication logic.
❌ C. Partition tables on a unique identifier – Partitioning is only effective if a reliable unique key exists, which is not the case here.
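As a rough illustration of what record linkage without a shared key means, the toy sketch below matches records by string similarity. FindMatches itself trains an ML model from labeled examples; the field names and the 0.8 threshold here are arbitrary choices for the demonstration, not part of the service.

```python
from difflib import SequenceMatcher

# Toy record linkage: no shared unique ID, so records are linked by fuzzy
# similarity on a name field plus an exact match on city. The threshold
# and fields are invented for this example.
def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

records_a = [{"name": "Jon Smith", "city": "Berlin"}]
records_b = [{"name": "John Smith", "city": "Berlin"},
             {"name": "Ana Lopez", "city": "Madrid"}]

matches = [
    (ra, rb)
    for ra in records_a
    for rb in records_b
    if similar(ra["name"], rb["name"]) and ra["city"] == rb["city"]
]
print(len(matches))  # 1 -- "Jon Smith" links to "John Smith"
```

A hand-written rule like this degrades quickly as fields get messier, which is the gap the ML-trained FindMatches transform is meant to fill.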

Question 4

A company wants to migrate data from an Amazon RDS for PostgreSQL DB instance in the eu-east-1 Region of an AWS account named Account_A. The company will migrate the data to an Amazon Redshift cluster in the eu-west-1 Region of an AWS account named Account_B.

Which solution will give AWS Database Migration Service (AWS DMS) the ability to replicate data between two data stores?

A. Set up an AWS DMS replication instance in Account_B in eu-west-1.
B. Set up an AWS DMS replication instance in Account_B in eu-east-1.
C. Set up an AWS DMS replication instance in a new AWS account in eu-west-1.
D. Set up an AWS DMS replication instance in Account_A in eu-east-1.

✅ Correct answer: B. Set up an AWS DMS replication instance in Account_B in eu-east-1.

📌 The replication instance must be in the same Region as the source database to connect efficiently and replicate to the target in another Region.

Incorrect answers:

❌ A. Instance in eu-west-1 – Placing the replication instance in the target’s Region increases latency and may fail to connect to the source efficiently.
❌ C. Instance in a new account – This adds unnecessary complexity without benefit.
❌ D. Instance in Account_A in eu-east-1 – This limits management control to the source account instead of the target account, complicating access and cost management.

Question 5

A banking company uses an application to collect large volumes of transactional data. The company uses Amazon Kinesis Data Streams for real-time analytics. The company’s application uses the PutRecord action to send data to Kinesis Data Streams.

A data engineer has observed network outages during certain times of day. The data engineer wants to configure exactly-once delivery for the entire processing pipeline.

Which solution will meet this requirement?

A. Design the application so it can remove duplicates during processing by embedding a unique ID in each record at the source.
B. Update the checkpoint configuration of the Amazon Managed Service for Apache Flink data collection application to avoid duplicate processing of events.
C. Design the data source so events are not ingested into Kinesis Data Streams multiple times.
D. Stop using Kinesis Data Streams. Use Amazon EMR instead. Use Apache Flink and Apache Spark Streaming in Amazon EMR.



✅ Correct answer: B. Update the checkpoint configuration of the Amazon Managed Service for Apache Flink data collection application to avoid duplicate processing of events.

📌 Configuring checkpointing ensures stateful stream processing that supports exactly-once semantics, preventing duplicate processing during failures or network issues.

Incorrect answers:

❌ A. Embed a unique ID – While this helps deduplication downstream, it doesn’t enforce exactly-once delivery through the pipeline.
❌ C. Prevent multiple ingestion at source – It’s impractical to fully prevent duplicate ingestion in real-world streaming scenarios; handling must occur during processing.
❌ D. Replace with EMR – This is an unnecessary rearchitecture and increases operational overhead compared to enabling checkpointing.

Question 6

A data engineer set up an AWS Lambda function to read an object that is stored in an Amazon S3 bucket. The object is encrypted by an AWS KMS key.

The data engineer configured the Lambda function’s execution role to access the S3 bucket. However, the Lambda function encountered an error and failed to retrieve the content of the object.

What is the likely cause of the error?

A. The data engineer misconfigured the permissions of the S3 bucket. The Lambda function could not access the object.
B. The Lambda function is using an outdated SDK version, which caused the read failure.
C. The S3 bucket is located in a different AWS Region than the Region where the data engineer works. Latency issues caused the Lambda function to encounter an error.
D. The Lambda function’s execution role does not have the necessary permissions to access the KMS key that can decrypt the S3 object.

✅ Correct answer: D. The Lambda function’s execution role does not have the necessary permissions to access the KMS key that can decrypt the S3 object.

📌 Even if the Lambda role has S3 access, it must also have kms:Decrypt permission for the key to read encrypted content.

Incorrect answers:

❌ A. Misconfigured bucket permissions – The scenario states the role was configured for S3 access; the failure here is due to encryption key permissions, not bucket policy.
❌ B. Outdated SDK – SDK versions rarely cause decryption permission errors; the root issue is IAM/KMS access.
❌ C. Region latency – Cross-Region reads may be slower, but won’t directly cause permission-based decryption failures.
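A minimal sketch of the policy shape the execution role needs in this scenario: S3 read access plus kms:Decrypt on the key. The bucket name, account ID, and key ID below are placeholders, not real resources.

```python
import json

# Hypothetical execution-role policy fragment: s3:GetObject alone is not
# enough when the object is SSE-KMS encrypted -- the role also needs
# kms:Decrypt on the specific key. All ARNs are placeholders.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
        {
            "Effect": "Allow",
            "Action": ["kms:Decrypt"],
            "Resource": "arn:aws:kms:eu-west-1:111122223333:key/example-key-id",
        },
    ],
}
print(json.dumps(policy, indent=2))
```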


Question 7

A company receives test results from testing facilities that are located around the world. The company stores the test results in millions of 1 KB JSON files in an Amazon S3 bucket. A data engineer needs to process the files, convert them into Apache Parquet format, and load them into Amazon Redshift tables. The data engineer uses AWS Glue to process the files, AWS Step Functions to orchestrate the processes, and Amazon EventBridge to schedule jobs.

The company recently added more testing facilities. The time required to process files is increasing. The data engineer must reduce the data processing time.

Which solution will MOST reduce the data processing time?

A. Use AWS Lambda to group the raw input files into larger files. Write the larger files back to Amazon S3. Use AWS Glue to process the files. Load the files into the Amazon Redshift tables.
B. Use the AWS Glue dynamic frame file-grouping option to ingest the raw input files. Process the files. Load the files into the Amazon Redshift tables.
C. Use the Amazon Redshift COPY command to move the raw input files from Amazon S3 directly into the Amazon Redshift tables. Process the files in Amazon Redshift.
D. Use Amazon EMR instead of AWS Glue to group the raw input files. Process the files in Amazon EMR. Load the files into the Amazon Redshift tables.

✅ Correct answer: B. Use the AWS Glue dynamic frame file-grouping option to ingest the raw input files.

📌 Dynamic frame file grouping reduces the number of small files processed individually, minimizing overhead and improving Glue job performance.

Incorrect answers:

❌ A. Group files in Lambda first – While possible, it adds complexity and cost compared to Glue’s built-in grouping.
❌ C. Load raw files directly to Redshift – COPY works best with fewer large files, not millions of tiny files; performance would suffer.
❌ D. Use EMR instead – Switching to EMR adds more management overhead when Glue already has a native feature for this.
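For reference, Glue's file grouping is switched on through connection options such as groupFiles and groupSize. The snippet below only builds the options dict so the shape is visible; the S3 path is a placeholder, and in a real job the dict would be passed to create_dynamic_frame.from_options.

```python
# Sketch of the Glue file-grouping connection options from option B.
# "groupFiles": "inPartition" tells Glue to coalesce many small files,
# and "groupSize" sets the target group size in bytes (as a string).
# The S3 path is a placeholder.
connection_options = {
    "paths": ["s3://example-bucket/test-results/"],
    "recurse": True,
    "groupFiles": "inPartition",
    "groupSize": str(128 * 1024 * 1024),  # ~128 MB per read group
}
print(connection_options["groupSize"])  # "134217728"
```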

Question 8

A company stores logs in an Amazon S3 bucket. When a data engineer attempts to access several log files, the data engineer discovers that some files have been unintentionally deleted. The data engineer needs a solution that will prevent unintentional file deletion in the future.

Which solution will meet this requirement with the LEAST operational overhead?

A. Manually back up the S3 bucket on a regular basis.
B. Enable S3 Versioning for the S3 bucket.
C. Configure replication for the S3 bucket.
D. Use an Amazon S3 Glacier storage class to archive the data that is in the S3 bucket.

✅ Correct answer: B. Enable S3 Versioning for the S3 bucket.

📌 S3 Versioning retains previous versions of objects, enabling recovery from accidental deletion without complex workflows.

Incorrect answers:

❌ A. Manual backups – Labor-intensive and prone to human error.
❌ C. Replication – Protects against Region-level disasters, not accidental deletion unless combined with versioning.
❌ D. Glacier storage – Glacier is for archival; it doesn’t inherently prevent deletions.


Question 9

A retail company stores data from a product lifecycle management (PLM) application in an on-premises MySQL database. The PLM application frequently updates the database when transactions occur.

The company wants to gather insights from the PLM application in near real time. The company wants to integrate the insights with other business datasets and to analyze the combined dataset by using an Amazon Redshift data warehouse.

The company has already established an AWS Direct Connect connection between the on-premises infrastructure and AWS.

Which solution will meet these requirements with the LEAST development effort?

A. Run a scheduled AWS Glue ETL job using JDBC to pull MySQL updates into Redshift.
B. Run a full load plus CDC task in AWS Database Migration Service (AWS DMS) to continuously replicate MySQL changes into Redshift.
C. Use the Amazon AppFlow SDK to build a custom connector for MySQL and send changes to Redshift.
D. Run scheduled AWS DataSync tasks to sync MySQL data into Redshift.

✅ Correct answer: B. Run a full load plus CDC task in AWS Database Migration Service (AWS DMS) to continuously replicate MySQL changes into Redshift.

📌 AWS DMS supports full load plus ongoing change data capture (CDC) to keep Redshift updated with minimal custom coding.
Incorrect answers:

❌ A. Glue ETL job – This would require scheduling and does not offer real-time replication.
❌ C. AppFlow SDK – No native MySQL connector; requires significant custom coding.
❌ D. DataSync – Geared towards file transfers, not database transaction replication.
Question 10

A data engineer maintains a materialized view that is based on an Amazon Redshift database. The view has a column named load_date that stores the date when each row was loaded. The data engineer needs to reclaim database storage space by deleting all the rows from the materialized view.

Which command will reclaim the MOST database storage space?

A. DELETE FROM materialized_view_name WHERE 1=1
B. TRUNCATE materialized_view_name
C. VACUUM table_name WHERE load_date<=current_date materializedview
D. DELETE FROM materialized_view_name WHERE load_date<=current_date

✅ Correct answer: B. TRUNCATE materialized_view_name

📌 TRUNCATE removes all rows efficiently and releases storage space immediately, compared to DELETE, which marks rows for deletion.

Incorrect answers:

❌ A. DELETE all rows – Leaves storage allocated until VACUUM is run.
❌ C. VACUUM with WHERE – VACUUM reorganizes space but does not inherently remove all rows.
❌ D. DELETE with condition – Partial delete still leaves storage fragments until a vacuum occurs.
Question 11

A company wants to migrate an application and an on-premises Apache Kafka server to AWS. The application processes incremental updates that an on-premises Oracle database sends to the Kafka server. The company wants to use the replatform migration strategy instead of the refactor strategy.

Which solution will meet these requirements with the LEAST management overhead?

A. Amazon Kinesis Data Streams
B. Amazon Managed Streaming for Apache Kafka (Amazon MSK) provisioned cluster
C. Amazon Kinesis Data Firehose
D. Amazon Managed Streaming for Apache Kafka (Amazon MSK) Serverless

✅ Correct answer: D. Amazon Managed Streaming for Apache Kafka (Amazon MSK) Serverless.

📌 MSK Serverless removes the need to manage cluster infrastructure, while still offering Kafka compatibility for a replatformed workload.

Incorrect answers:

❌ A. Kinesis Data Streams – Different API; would require refactoring the application away from Kafka.
❌ B. MSK provisioned cluster – Requires cluster sizing and scaling management.
❌ C. Kinesis Data Firehose – Focused on data delivery to sinks, not full Kafka-compatible streaming.
Question 12

A company ingests data into Amazon Managed Streaming for Apache Kafka (Amazon MSK) and writes it to Amazon Keyspaces, Amazon OpenSearch Service, and Avro objects in Amazon S3. The company needs the data visualizations to have the lowest possible latency.

Which solution will achieve this?

A. Create OpenSearch Dashboards using data from OpenSearch Service.
B. Use Amazon Athena with a Hive metastore to query the Avro objects in S3 and connect Athena to Grafana.
C. Use Athena to query Avro objects in S3, configure Keyspaces as the data catalog, and connect QuickSight to Athena.
D. Use AWS Glue to catalog Avro objects and S3 Select to query them for QuickSight.

✅ Correct answer: A. Create OpenSearch Dashboards using data from OpenSearch Service.

📌 Data is already in OpenSearch Service, which supports low-latency visualization directly with OpenSearch Dashboards.

Incorrect answers:

❌ B. Athena with Hive metastore – Adds query latency unsuitable for real-time dashboards.
❌ C. Athena with Keyspaces catalog – Still query-based, not real-time from OpenSearch.
❌ D. S3 Select – Optimized for object-level queries, not streaming dashboards.
Question 13

A company has implemented a lake house architecture in Amazon Redshift and needs to give users the ability to authenticate into the Redshift query editor using a third-party identity provider (IdP).

What is the first step the data engineer should take?


A. Register the third-party IdP as an identity provider in the Redshift cluster configuration.
B. Register the third-party IdP from within Amazon Redshift.
C. Register the third-party IdP for AWS Secrets Manager and configure Redshift to use it for credentials.
D. Register the third-party IdP for AWS Certificate Manager (ACM) and configure Redshift to use ACM for credentials.

✅ Correct answer: A. Register the third-party IdP as an identity provider in the Redshift cluster configuration.

📌 The IdP must be integrated at the cluster level for authentication to work with query editor SSO.

Incorrect answers:

❌ B. Register within Redshift – There is no internal Redshift-only registration; it’s done at the cluster config level.
❌ C. Secrets Manager – Stores credentials, but doesn’t perform federated IdP authentication.
❌ D. ACM – Manages certificates, not identity provider integration.

Question 14

A company is using an AWS Glue crawler to catalog data in an S3 bucket containing both .csv and .json files. The crawler is configured to exclude .json files, yet Athena queries still process them. The data engineer wants the fastest queries without losing .csv access.

Which solution meets the requirement?

A. Adjust the Glue crawler settings to ensure .json files are excluded.
B. Use the Athena console to exclude .json files in queries.
C. Relocate .json files to a different path in the S3 bucket.
D. Use S3 bucket policies to block .json file access.

✅ Correct answer: C. Relocate .json files to a different path in the S3 bucket.

📌 Athena queries scan files based on S3 paths in the table definition; moving .json files out of the query path prevents unnecessary scanning.

Incorrect answers:

❌ A. Adjust crawler – Already configured; this issue is about query scope, not cataloging.
❌ B. Console exclusion – Manual per-query filtering still scans .json data, increasing cost/time.
❌ D. S3 policies – Would also block legitimate .json access if needed elsewhere.
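A toy model of why relocating the files helps: Athena scans every object under the table's LOCATION prefix, regardless of what the crawler excluded from the catalog. The keys and prefixes below are made up for the illustration.

```python
# Athena reads everything under the table's LOCATION prefix, so crawler
# exclude patterns do not shrink the scan. Keys and prefixes are invented.
keys = [
    "data/results/a.csv",
    "data/results/b.csv",
    "data/results/raw.json",
]

table_location = "data/results/"
scanned = [k for k in keys if k.startswith(table_location)]
print(len(scanned))  # 3 -- the .json file is scanned too

# After moving the .json objects to a sibling prefix outside LOCATION:
moved = [k.replace("data/results/", "data/json/") if k.endswith(".json") else k
         for k in keys]
scanned_after = [k for k in moved if k.startswith(table_location)]
print(len(scanned_after))  # 2 -- only the .csv files remain in scope
```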

Question 15

A company uses Amazon Redshift to store employee data. The Employee table uses Region ID, Department ID, and Role ID as a compound sort key.

Which queries will benefit most from the compound sort key? (Choose two.)


A. SELECT * FROM Employee WHERE Region ID='North America';
B. SELECT * FROM Employee WHERE Region ID='North America' AND Department ID=20;
C. SELECT * FROM Employee WHERE Department ID=20 AND Region ID='North America';
D. SELECT * FROM Employee WHERE Role ID=50;
E. SELECT * FROM Employee WHERE Region ID='North America' AND Role ID=50;

✅ Correct answers: A and B.

📌 Compound sort keys benefit most when queries filter starting with the first sort key and, optionally, subsequent keys in order.

Incorrect answers:

❌ C. Department first – Sort keys are ordered; filtering on the second key first doesn’t optimize.
❌ D. Role only – Skips both leading keys; no benefit.
❌ E. Region + Role – Skips the second key, limiting optimization.
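The prefix rule the explanation states can be encoded as a small helper. This models the question's reasoning (the filter columns must match the sort key's leading columns, in order); the lowercase column names are hypothetical identifiers standing in for the Employee table's keys.

```python
# Encode the rule as this question applies it: a query benefits when its
# filter columns form a leading prefix of the compound sort key, in order.
# Column names are hypothetical stand-ins for Region ID, Department ID,
# and Role ID.
SORT_KEY = ["region_id", "department_id", "role_id"]

def benefits_from_sort_key(filters: list[str]) -> bool:
    return len(filters) > 0 and filters == SORT_KEY[: len(filters)]

print(benefits_from_sort_key(["region_id"]))                   # A -> True
print(benefits_from_sort_key(["region_id", "department_id"]))  # B -> True
print(benefits_from_sort_key(["department_id", "region_id"]))  # C -> False
print(benefits_from_sort_key(["role_id"]))                     # D -> False
print(benefits_from_sort_key(["region_id", "role_id"]))        # E -> False
```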

Question 16

A data engineer needs Athena queries to finish faster. Data is in uncompressed .csv format, and most queries filter on a specific column.

Which solution will improve performance most?

A. Change format to JSON with Snappy compression.
B. Compress .csv with Snappy.
C. Change format to Apache Parquet with Snappy compression.
D. Compress .csv with gzip.

✅ Correct answer: C. Change format to Apache Parquet with Snappy compression.

📌 Parquet is columnar, reducing scanned data when filtering on specific columns, and Snappy offers efficient compression.
Incorrect answers:

❌ A. JSON with Snappy – JSON is row-based, less efficient than columnar for analytics.
❌ B. CSV with Snappy – Compression helps storage, but CSV remains row-based.
❌ D. CSV with gzip – Compression helps storage, not scan efficiency.
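A toy model of the row-versus-columnar difference: with a columnar layout, a filter on one column touches only that column's values, while a row layout touches every field of every record. The records below are invented for the illustration.

```python
# Row vs. columnar scans for a single-column filter, counted in cells
# touched. Records are invented; real Parquet adds compression and
# per-column statistics on top of this layout advantage.
rows = [
    {"id": 1, "status": "ok",   "payload": "x" * 100},
    {"id": 2, "status": "fail", "payload": "y" * 100},
    {"id": 3, "status": "ok",   "payload": "z" * 100},
]

# Row-oriented scan: every field of every record is read for the filter.
row_cells_read = sum(len(r) for r in rows)

# Column-oriented scan: only the filtered 'status' column is read.
columns = {k: [r[k] for r in rows] for k in rows[0]}
col_cells_read = len(columns["status"])

print(row_cells_read, col_cells_read)  # 9 3
```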

Question 17

A company plans to use Amazon Kinesis Data Firehose to store 2 MB .csv files in S3, converting them first to JSON, then to Apache Parquet.

Which option meets the requirements with the least development effort?

A. Firehose converts .csv to JSON, then Lambda stores in Parquet.
B. Firehose converts .csv to JSON and stores directly in Parquet.
C. Firehose invokes Lambda to convert to JSON and store in Parquet.
D. Firehose invokes Lambda to convert to JSON, then Firehose stores in Parquet.

✅ Correct answer: B. Firehose converts .csv to JSON and stores directly in Parquet.

📌 Firehose supports direct transformation to Parquet, minimizing the need for custom Lambda functions.

Incorrect answers:

❌ A, C, D – All add Lambda unnecessarily, increasing complexity.



Question 18

A data engineer is building an ETL pipeline in AWS Glue to process compressed files in S3. The pipeline must support incremental data processing.



Which Glue feature should be used?

A. Workflows​

B. Triggers​

C. Job bookmarks​

D. Classifiers

✅ Correct answer: C. Job bookmarks.


📌 Job bookmarks track previously processed data, enabling incremental processing.

Incorrect answers:

❌ A. Workflows – Orchestrate multiple jobs, not incremental tracking.​


❌ B. Triggers – Control execution timing, not data state.​

❌ D. Classifiers – Identify schema, not process state.
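In a real Glue job, bookmarks are switched on with the `--job-bookmark-option job-bookmark-enable` job argument, readers are given a `transformation_ctx`, and `Job.commit()` persists the state at the end of the run. The underlying idea can be sketched locally (a conceptual illustration of a persisted high-water mark only, not the Glue API):

```python
# A minimal local illustration of the job-bookmark idea: keep a
# high-water mark and process only records that arrived after it.
def incremental_run(files, bookmark):
    """Process files newer than the bookmark; return names and new bookmark."""
    new_files = [f for f in files if f["modified"] > bookmark]
    processed = [f["name"] for f in sorted(new_files, key=lambda f: f["modified"])]
    new_bookmark = max((f["modified"] for f in new_files), default=bookmark)
    return processed, new_bookmark

files = [
    {"name": "part-0001.gz", "modified": 100},
    {"name": "part-0002.gz", "modified": 200},
]

# First run: nothing has been seen yet, so everything is processed.
run1, bm = incremental_run(files, bookmark=0)

# Second run: a new file arrives; only it is processed.
files.append({"name": "part-0003.gz", "modified": 300})
run2, bm = incremental_run(files, bookmark=bm)

print(run1, run2)  # ['part-0001.gz', 'part-0002.gz'] ['part-0003.gz']
```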

Question 19

A data engineer is configuring an AWS Glue job to read from an S3 bucket. The Glue job fails due to S3 VPC gateway endpoint issues.


Which action resolves this?

A. Update Glue security group to allow inbound from S3 gateway endpoint.​

B. Add S3 bucket policy to allow Glue access.​

C. Ensure Glue connection uses a fully qualified domain name.​



D. Verify VPC route table has routes for the S3 gateway endpoint.

✅ Correct answer: D. Verify VPC route table has routes for the S3 gateway endpoint.

📌 S3 gateway endpoint traffic must be routed correctly in the VPC for Glue to connect.

Incorrect answers:

❌ A. Security group inbound – Gateway endpoints use routing, not inbound SG rules.​
❌ B. Bucket policy – Might be needed, but the error here is connectivity, not permissions.​
❌ C. Fully qualified domain name – Not relevant to endpoint routing.
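The route in question shows up in EC2 `DescribeRouteTables` output as a managed prefix list destination (`pl-…`) whose target is the endpoint ID (`vpce-…`). A small helper over data shaped like that response (the IDs below are hypothetical) shows what "verify the route table" actually checks:

```python
# Check whether a route table (shaped like a DescribeRouteTables entry)
# contains a route to an S3 gateway endpoint: a prefix-list destination
# targeting a VPC endpoint. IDs below are hypothetical.
def has_s3_gateway_route(route_table):
    return any(
        route.get("DestinationPrefixListId", "").startswith("pl-")
        and route.get("GatewayId", "").startswith("vpce-")
        for route in route_table.get("Routes", [])
    )

good = {"Routes": [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    {"DestinationPrefixListId": "pl-63a5400a", "GatewayId": "vpce-0abc1234"},
]}
bad = {"Routes": [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
]}

print(has_s3_gateway_route(good), has_s3_gateway_route(bad))  # True False
```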

Question 20

A data engineer is processing terabytes of raw data in S3, preparing it for Redshift analytics, and wants to avoid complex ETL or infrastructure management.

Which solution meets this with least overhead?

A. EMR for prep, Step Functions to load into Redshift, QuickSight for queries.​

B. Glue DataBrew for prep, Glue to load into Redshift, query in Redshift.​

C. Lambda for prep, Kinesis Firehose to load into Redshift, Athena for queries.​

D. Glue for prep, DMS to load into Redshift, Redshift Spectrum for queries.

✅ Correct answer: B. Glue DataBrew for prep, Glue to load into Redshift, query in Redshift.

📌 DataBrew is serverless and purpose-built for visual data prep, Glue ETL loads the data efficiently, and Redshift supports complex analytics without extra infrastructure to manage.

Incorrect answers:

❌ A. EMR + Step Functions – Adds unnecessary complexity and cluster management.​


❌ C. Lambda + Firehose – Not ideal for large batch analytics.​
❌ D. Glue + DMS – DMS is for replication, not batch ETL.
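Once the prepared data lands in S3 as Parquet, the load into Redshift is typically a single COPY statement. A sketch of building one (the table, bucket, and role names are hypothetical placeholders):

```python
# Build the Redshift COPY statement an ETL step would typically issue
# to bulk-load prepared Parquet files from S3. All names are
# hypothetical placeholders.
def build_copy_statement(table, s3_prefix, iam_role):
    return (
        f"COPY {table} "
        f"FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role}' "
        f"FORMAT AS PARQUET;"
    )

sql = build_copy_statement(
    "analytics.events",
    "s3://example-curated-bucket/events/",
    "arn:aws:iam::123456789012:role/example-redshift-copy-role",
)
print(sql)
```

COPY with `FORMAT AS PARQUET` lets Redshift ingest the columnar files in parallel, which is why the Glue-to-Redshift path needs no extra conversion step.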

(END OF PREVIEW QUESTIONS)

This is a Preview Copy. Get the Full Version at https://cloudcertificationstore.com/b/5UMiy

𝐅𝐢𝐧𝐚𝐥 𝐑𝐞𝐯𝐢𝐞𝐰 𝐂𝐡𝐞𝐜𝐤𝐥𝐢𝐬𝐭 & 𝐄𝐱𝐚𝐦 𝐑𝐞𝐚𝐝𝐢𝐧𝐞𝐬𝐬 𝐒𝐜𝐨𝐫𝐞𝐜𝐚𝐫𝐝

✅ How to Use the Final Review Checklist


This section is meant to validate your hands-on skills and theoretical readiness across all
exam topics.

Step-by-step:

1.​ Print it or load it in a note-taking app (Notion, Google Docs, OneNote, etc.).​

2.​ Go through each checkbox:​

○​ ✅ Check it if you fully understand and can implement the topic without looking up documentation.​

○​ ❌ Leave it unchecked if you feel unsure or haven't practiced the task.​


3.​ Prioritize unchecked topics by reviewing:​

○​ The official documentation​



○​ Practice exams​

○​ Hands-on labs

4.​ For each unchecked item, write a short action plan or resource link next to it.​

📈 How to Use the Exam Readiness Scorecard


This part helps you self-assess your confidence level and focus your revision time wisely.

Instructions:

1.​ For each domain (e.g., "Hybrid connectivity and routing"), rate yourself from 1 to 5:​

○​ 1️⃣ = No understanding or hands-on practice​

○​ 3️⃣ = Moderate familiarity, but need review​

○​ 5️⃣ = Mastered topic and can apply it in real-world use​

2.​ Add Notes / Action Items to explain:​

○​ Why you scored yourself low​

○​ What resources you'll use to improve (YouTube, whitepapers, exam guides)​

○​ Practice test scores if relevant​

3.​ Reassess 2–3 days before your exam, and compare scores to measure improvement.​

🧠 Bonus Tips
●​ Do timed mock exams and cross-reference errors with checklist topics​

●​ Use the scorecard to simulate an exam debrief: where did you fail? What must you strengthen?​

●​ Once all checklist items are ✅, all categories are at 4–5 stars, and you're consistently scoring 85%+ on full practice exams with confidence in scenario-based reasoning, you're likely ready to book the real exam. 🎯

✅ 𝐅𝐢𝐧𝐚𝐥 𝐑𝐞𝐯𝐢𝐞𝐰 𝐂𝐡𝐞𝐜𝐤𝐥𝐢𝐬𝐭


📦 Data Ingestion & Transformation
●​ Configure data ingestion with Kinesis Data Streams, Kinesis Data Firehose, and AWS
Glue crawlers

●​ Implement ETL/ELT pipelines using AWS Glue, AWS Data Pipeline, and Step
Functions​

●​ Handle streaming vs batch ingestion patterns​

●​ Optimize transformations using PySpark in Glue or EMR​

🗄 Data Storage & Management


●​ Choose optimal storage between S3, Redshift, DynamoDB, and RDS/Aurora
●​ Implement S3 lifecycle policies, object locking, and versioning​

●​ Understand Redshift distribution keys, sort keys, and compression​



●​ Partition and bucket data for performance in Athena and Glue​

📊 Data Analysis & Querying



●​ Use Amazon Athena for serverless querying of S3 data



●​ Connect BI tools to Redshift, Athena, or RDS​

●​ Optimize queries with proper joins, filters, and partitions​

●​ Integrate AWS QuickSight for dashboards and visual analytics​

🔄 Data Movement & Integration


●​ Replicate data with DMS (Database Migration Service)

●​ Transfer and transform datasets between AWS regions/accounts​

●​ Set up cross-account access for data sharing​

●​ Integrate AWS with on-premises and third-party data sources​

🔐 Security & Compliance

●​ Configure IAM roles, policies, and least privilege access

●​ Use KMS for encryption at rest and TLS for in transit​

●​ Apply Lake Formation for fine-grained permissions​

●​ Enable CloudTrail, CloudWatch, and S3 access logs for auditing​

🛠 Operations, Monitoring & Optimization


●​ Monitor pipelines with CloudWatch metrics, Glue job metrics, and Redshift
monitoring

●​ Optimize cost with data compression, tiered storage, and spot instances​

●​ Troubleshoot slow queries and failed jobs​



●​ Automate housekeeping tasks with Lambda and EventBridge​



📈 𝐄𝐱𝐚𝐦 𝐑𝐞𝐚𝐝𝐢𝐧𝐞𝐬𝐬 𝐒𝐜𝐨𝐫𝐞𝐜𝐚𝐫𝐝

Domain                               Confidence (1–5)    Notes / Action Items

Data ingestion and transformation    ☐1 ☐2 ☐3 ☐4 ☐5
Data storage and management          ☐1 ☐2 ☐3 ☐4 ☐5
Data querying and analysis           ☐1 ☐2 ☐3 ☐4 ☐5
Data movement and integration        ☐1 ☐2 ☐3 ☐4 ☐5
Security and compliance              ☐1 ☐2 ☐3 ☐4 ☐5
Monitoring and optimization          ☐1 ☐2 ☐3 ☐4 ☐5
Time management (130-min pacing)     ☐1 ☐2 ☐3 ☐4 ☐5    Practice a timed 65-question set


🎯 You’re exam-ready when:


●​ Each domain scores 4 stars or more

●​ You consistently score 80–85%+ on practice tests

●​ You can confidently explain both what the relevant AWS data services do and how to apply them in scenario-based questions​


💫 Congratulations!! You are on the right path to certification.


All of our practice exams include 300+ questions. This is only a Preview Copy; you can get the full version at https://cloudcertificationstore.com/b/5UMiy

Our writers, who have taken the exam recently, and the reviewers who purchased these materials agree that over 90% of the questions matched what they saw on the live test.

Invest in your future: browse the full catalogue of Cloud practice exams at our store

