Snowflake Data Engineering Guide


Question: 1 CertyIQ

A Data Engineer is investigating a query that is taking a long time to return. The Query Profile shows the following:

What step should the Engineer take to increase the query performance?

A.Add additional virtual warehouses.


B.Increase the size of the virtual warehouse.
C.Rewrite the query using Common Table Expressions (CTEs).
D.Change the order of the joins and start with smaller tables first.

Answer: B
Explanation:

Increase the size of the virtual warehouse. A larger warehouse provides more compute and memory per cluster, which speeds up a single long-running query; adding additional warehouses or clusters only improves concurrency across many queries, not the performance of one.
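For reference, a warehouse is resized with a single ALTER statement; the warehouse name and target size below are placeholders for illustration.

ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'XLARGE';
-- Queued and newly submitted queries use the new size; queries already running finish on the old size.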

Question: 2 CertyIQ
How can the following relational data be transformed into semi-structured data using the LEAST amount of
operational overhead?

A.Use the TO_JSON function.


B.Use the PARSE_JSON function to produce a VARIANT value.
C.Use the OBJECT_CONSTRUCT function to return a Snowflake object.
D.Use the TO_VARIANT function to convert each of the relational columns to VARIANT.

Answer: C

Explanation:

Use the OBJECT_CONSTRUCT function to return a Snowflake object.

Reference:

https://docs.snowflake.com/en/sql-reference/functions/object_construct.
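As a minimal sketch (table and column names are hypothetical), OBJECT_CONSTRUCT turns each relational row into a semi-structured OBJECT in a single SELECT, with no intermediate JSON string:

-- Build an OBJECT from explicit key/value pairs
SELECT OBJECT_CONSTRUCT('id', customer_id, 'name', customer_name) AS customer_obj
FROM customers;

-- Or include every column automatically
SELECT OBJECT_CONSTRUCT(*) AS customer_obj
FROM customers;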

Question: 3 CertyIQ
A Data Engineer executes a complex query and wants to make use of Snowflake’s query results caching
capabilities to reuse the results.
Which conditions must be met? (Choose three.)

A.The results must be reused within 72 hours.


B.The query must be executed using the same virtual warehouse.
C.The USED_CACHED_RESULT parameter must be included in the query.
D.The table structure contributing to the query result cannot have changed.
E.The new query must have the same syntax as the previously executed query.
F.The micro-partitions cannot have changed due to changes to other data in the table.

Answer: DEF

Explanation:

D.The table structure contributing to the query result cannot have changed.
E.The new query must have the same syntax as the previously executed query.

F.The micro-partitions cannot have changed due to changes to other data in the table.
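Note that result reuse is governed by the USE_CACHED_RESULT session/account parameter (enabled by default), not by a clause inside the query, which is why option C is incorrect. A minimal illustration:

-- Result caching is on by default; it can be toggled per session
ALTER SESSION SET USE_CACHED_RESULT = TRUE;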

Question: 4 CertyIQ
A Data Engineer needs to load JSON output from some software into Snowflake using Snowpipe.
Which recommendations apply to this scenario? (Choose three.)

A.Load large files (1 GB or larger).


B.Ensure that data files are 100-250 MB (or larger) in size, compressed.
C.Load a single huge array containing multiple records into a single table row.
D.Verify each value of each unique element stores a single native data type (string or number).
E.Extract semi-structured data elements containing null values into relational columns before loading.
F.Create data files that are less than 100 MB and stage them in cloud storage at a sequence greater than once
each minute.

Answer: BDE

Explanation:

B.Ensure that data files are 100-250 MB (or larger) in size, compressed.

D.Verify each value of each unique element stores a single native data type (string or number).

E.Extract semi-structured data elements containing null values into relational columns before loading.

Question: 5 CertyIQ
Given the table SALES which has a clustering key of column CLOSED_DATE, which table function will return the
average clustering depth for the SALES_REPRESENTATIVE column for the North American region?

A.select system$clustering_information('Sales', 'sales_representative', 'region = ''North America''');


B.select system$clustering_depth('Sales', 'sales_representative', 'region = ''North America''');
C.select system$clustering_depth('Sales', 'sales_representative') where region = 'North America';
D.select system$clustering_information('Sales', 'sales_representative') where region = 'North America’;

Answer: B

Explanation:

select system$clustering_depth('Sales', 'sales_representative', 'region = ''North America''');

Question: 6 CertyIQ
A large table with 200 columns contains two years of historical data. When queried, the table is filtered on a single
day. Below is the Query Profile:
Using a size 2XL virtual warehouse, this query took over an hour to complete.
What will improve the query performance the MOST?

A.Increase the size of the virtual warehouse.


B.Increase the number of clusters in the virtual warehouse.
C.Implement the search optimization service on the table.
D.Add a date column as a cluster key on the table.

Answer: D

Explanation:
Add a date column as a cluster key on the table.
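A minimal sketch of adding a date cluster key and then checking the clustering state (table and column names are illustrative):

ALTER TABLE big_history_table CLUSTER BY (event_date);

-- Inspect how well the table is clustered on that column
SELECT SYSTEM$CLUSTERING_INFORMATION('big_history_table', '(event_date)');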

Question: 7 CertyIQ
A Data Engineer is working on a Snowflake deployment in AWS eu-west-1 (Ireland). The Engineer is planning to
load data from staged files into target tables using the COPY INTO command.
Which sources are valid? (Choose three.)

A.Internal stage on GCP us-central1 (Iowa)


B.Internal stage on AWS eu-central-1 (Frankfurt)
C.External stage on GCP us-central1 (Iowa)
D.External stage in an Amazon S3 bucket on AWS eu-west-1 (Ireland)
E.External stage in an Amazon S3 bucket on AWS eu-central-1 (Frankfurt)
F.SSD attached to an Amazon EC2 instance on AWS eu-west-1 (Ireland)

Answer: CDE

Explanation:

C.External stage on GCP us-central1 (Iowa).

D.External stage in an Amazon S3 bucket on AWS eu-west-1 (Ireland).

E.External stage in an Amazon S3 bucket on AWS eu-central-1 (Frankfurt).
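As an illustrative sketch (bucket, credentials, table, and file format are placeholders), loading from an external S3 stage, even one in another region, looks like this:

CREATE OR REPLACE STAGE my_s3_stage
  URL = 's3://my-bucket/exports/'
  CREDENTIALS = (AWS_KEY_ID = '<key>' AWS_SECRET_KEY = '<secret>');

COPY INTO target_table
FROM @my_s3_stage
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);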

Question: 8 CertyIQ
A Data Engineer wants to create a new development database (DEV) as a clone of the permanent production
database (PROD). There is a requirement to disable Fail-safe for all tables.
Which command will meet these requirements?

A.CREATE DATABASE DEV -

CLONE PROD -
FAIL_SAFE = FALSE;
B.CREATE DATABASE DEV -
CLONE PROD;
C.CREATE TRANSIENT DATABASE DEV -
CLONE PROD;
D.CREATE DATABASE DEV -

CLONE PROD -
DATA_RETENTION_TIME_IN DAYS = 0;

Answer: C

Explanation:

CREATE TRANSIENT DATABASE DEV CLONE PROD;

Transient databases (and the tables created in them) have no Fail-safe period, so cloning PROD as a transient database meets the requirement.

Reference:
https://docs.snowflake.com/en/user-guide/tables-temp-transient

Question: 9 CertyIQ
Which query will show a list of the 20 most recent executions of a specified task, MYTASK, that have been
scheduled within the last hour that have ended or are still running?

A.

B.

C.

D.

Answer: B

Explanation:

To query only those tasks that have already completed or are currently running, include WHERE query_id IS NOT NULL as a filter. The QUERY_ID column in the TASK_HISTORY output is populated only when a task has started running.
https://docs.snowflake.com/en/sql-reference/functions/task_history
A - returns all scheduled executions, including those that have not run yet.
C - a schedule can be skipped or cancelled, so it will not return all of the runs.
D - it will not return the most recent executions.
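The answer options are not reproduced in this copy, but a query of the shape option B describes would look roughly like this (a sketch only):

SELECT *
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(
       SCHEDULED_TIME_RANGE_START => DATEADD('HOUR', -1, CURRENT_TIMESTAMP()),
       RESULT_LIMIT => 20,
       TASK_NAME => 'MYTASK'))
WHERE QUERY_ID IS NOT NULL;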

Question: 10 CertyIQ
Which methods can be used to create a DataFrame object in Snowpark? (Choose three.)

A.session.jdbc_connection()
B.session.read.json()
C.session.table()
D.DataFrame.write()
E.session.builder()
F.session.sql()

Answer: BCF

Explanation:
B.session.read.json().

C.session.table().

F.session.sql().

Reference:

https://docs.snowflake.com/en/developer-guide/snowpark/python/working-with-dataframes

Question: 11 CertyIQ
A new CUSTOMER table is created by a data pipeline in a Snowflake schema where MANAGED ACCESS is
enabled.
Which roles can grant access to the CUSTOMER table? (Choose three.)

A.The role that owns the schema


B.The role that owns the database
C.The role that owns the CUSTOMER table
D.The SYSADMIN role
E.The SECURITYADMIN role
F.The USERADMIN role with the MANAGE GRANTS privilege

Answer: AEF

Explanation:

A.The role that owns the schema.

E.The SECURITYADMIN role.

F.The USERADMIN role with the MANAGE GRANTS privilege.
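For illustration (database, schema, and role names are hypothetical), in a managed access schema the grant must be issued by the schema owner or by a role with the MANAGE GRANTS privilege:

-- Run as the schema owner, SECURITYADMIN, or another role with MANAGE GRANTS
GRANT SELECT ON TABLE my_db.my_schema.CUSTOMER TO ROLE analyst_role;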

Question: 12 CertyIQ
What is the purpose of the BUILD_STAGE_FILE_URL function in Snowflake?

A.It generates an encrypted URL for accessing a file in a stage.


B.It generates a staged URL for accessing a file in a stage.
C.It generates a permanent URL for accessing files in a stage.
D.It generates a temporary URL for accessing a file in a stage.

Answer: C

Explanation:

It generates a permanent URL for accessing files in a stage.
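A minimal example (stage and file path are placeholders):

SELECT BUILD_STAGE_FILE_URL(@my_stage, 'reports/2023/summary.pdf');
-- The returned file URL does not expire; accessing it still requires appropriate privileges on the stage.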

Question: 13 CertyIQ
The JSON below is stored in a VARIANT column named V in a table named jCustRaw:
Which query will return one row per team member (stored in the teamMembers array) along with all of the
attributes of each team member?

A.

B.
C.

D.

Answer: B

Explanation:

Reference:

https://docs.snowflake.com/user-guide/semistructured-considerations#using-flatten-to-list-distinct-key-names

Question: 14 CertyIQ
A company has an extensive script in Scala that transforms data by leveraging DataFrames. A Data Engineer
needs to move these transformations to Snowpark.
What characteristics of data transformations in Snowpark should be considered to meet this requirement? (Choose
two.)

A.It is possible to join multiple tables using DataFrames.


B.Snowpark operations are executed lazily on the server.
C.User-Defined Functions (UDFs) are not pushed down to Snowflake.
D.Snowpark requires a separate cluster outside of Snowflake for computations.
E.Columns in different DataFrames with the same name should be referred to with square brackets.

Answer: AB

Explanation:
A.It is possible to join multiple tables using DataFrames.

B.Snowpark operations are executed lazily on the server.

Question: 15 CertyIQ
The following is returned from SYSTEM$CLUSTERING_INFORMATION() for a table named ORDERS with a DATE
column named O_ORDERDATE:

What does the total_constant_partition_count value indicate about this table?

A.The table is clustered very well on O_ORDERDATE, as there are 493 micro-partitions that could not be
significantly improved by reclustering.
B.The table is not clustered well on O_ORDERDATE, as there are 493 micro-partitions where the range of
values in that column overlap with every other micro-partition in the table.
C.The data in O_ORDERDATE does not change very often, as there are 493 micro-partitions containing rows
where that column has not been modified since the row was created.
D.The data in O_ORDERDATE has a very low cardinality, as there are 493 micro-partitions where there is only a
single distinct value in that column for all rows in the micro-partition.

Answer: A

Explanation:

The table is clustered very well on O_ORDERDATE, as there are 493 micro-partitions that could not be
significantly improved by reclustering.

Question: 16 CertyIQ
A company is building a dashboard for thousands of Analysts. The dashboard presents the results of a few
summary queries on tables that are regularly updated. The query conditions vary by topic according to what data
each Analyst needs. Responsiveness of the dashboard queries is a top priority, and the data cache should be
preserved.
How should the Data Engineer configure the compute resources to support this dashboard?

A.Assign queries to a multi-cluster virtual warehouse with economy auto-scaling. Allow the system to
automatically start and stop clusters according to demand.
B.Assign all queries to a multi-cluster virtual warehouse set to maximized mode. Monitor to determine the
smallest suitable number of clusters.
C.Create a virtual warehouse for every 250 Analysts. Monitor to determine how many of these virtual
warehouses are being utilized at capacity.
D.Create a size XL virtual warehouse to support all the dashboard queries. Monitor query runtimes to determine
whether the virtual warehouse should be resized.

Answer: B

Explanation:

Assign all queries to a multi-cluster virtual warehouse set to maximized mode. Monitor to determine the
smallest suitable number of clusters.
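For reference, maximized mode simply means MIN_CLUSTER_COUNT equals MAX_CLUSTER_COUNT, so every cluster (and its data cache) stays available while the warehouse runs. A sketch with illustrative values:

CREATE OR REPLACE WAREHOUSE dashboard_wh
  WAREHOUSE_SIZE = 'MEDIUM'
  MIN_CLUSTER_COUNT = 4
  MAX_CLUSTER_COUNT = 4   -- maximized: min = max
  AUTO_SUSPEND = 600;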

Question: 17 CertyIQ
A Data Engineer has developed a dashboard that will issue the same SQL select clause to Snowflake every 12
hours.
How long will Snowflake use the persisted query results from the result cache, provided that the underlying data
has not changed?

A.12 hours
B.24 hours
C.14 days
D.31 days

Answer: D

Explanation:
Each time the persisted result for a query is reused, Snowflake resets the 24-hour retention period for the
result, up to a maximum of 31 days from the date and time that the query was first executed. After 31 days,
the result is purged and the next time the query is submitted, a new result is generated and persisted.

https://docs.snowflake.com/en/user-guide/querying-persisted-results

Question: 18 CertyIQ
A Data Engineer ran a stored procedure containing various transactions. During the execution, the session abruptly
disconnected, preventing one transaction from committing or rolling back. The transaction was left in a detached
state and created a lock on resources.
What step must the Engineer take to immediately run a new transaction?

A.Call the system function SYSTEM$ABORT_TRANSACTION.


B.Call the system function SYSTEM$CANCEL_TRANSACTION.
C.Set the LOCK_TIMEOUT to FALSE in the stored procedure.
D.Set the TRANSACTION_ABORT_ON_ERROR to TRUE in the stored procedure.

Answer: A

Explanation:

Call the system function SYSTEM$ABORT_TRANSACTION.
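A sketch of the typical sequence (the transaction id shown is a placeholder obtained from SHOW LOCKS or SHOW TRANSACTIONS):

SHOW LOCKS IN ACCOUNT;   -- identify the detached transaction holding the lock

SELECT SYSTEM$ABORT_TRANSACTION(1234567890123456789);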

Question: 19 CertyIQ
A database contains a table and a stored procedure defined as:

The log_table is initially empty and a Data Engineer issues the following command:
CALL insert_log(NULL::VARCHAR);
No other operations are affecting the log_table.
What will be the outcome of the procedure call?

A.The log_table contains zero records and the stored procedure returned 1 as a return value.
B.The log_table contains one record and the stored procedure returned 1 as a return value.
C.The log_table contains one record and the stored procedure returned NULL as a return value.
D.The log_table contains zero records and the stored procedure returned NULL as a return value.

Answer: D
Explanation:

The log_table contains zero records and the stored procedure returned NULL as a return value.

Question: 20 CertyIQ
When would a Data Engineer use TABLE with the FLATTEN function instead of the LATERAL FLATTEN
combination?

A.When TABLE with FLATTEN requires another source in the FROM clause to refer to.
B.When TABLE with FLATTEN requires no additional source in the FROM clause to refer to.
C.When the LATERAL FLATTEN combination requires no other source in the FROM clause to refer to.
D.When TABLE with FLATTEN is acting like a sub-query executed for each returned row.

Answer: B

Explanation:

Reference:

https://docs.snowflake.com/en/sql-reference/functions/flatten
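A minimal illustration of the distinction: with TABLE(FLATTEN(...)) the input can be any expression, so no other source is needed in the FROM clause, while LATERAL FLATTEN is correlated to a table that appears earlier in the same FROM clause (table and column names below are illustrative).

-- TABLE(FLATTEN()) with no other FROM source
SELECT f.value
FROM TABLE(FLATTEN(INPUT => PARSE_JSON('[1, 2, 3]'))) f;

-- LATERAL FLATTEN correlated to a preceding table
SELECT t.id, f.value
FROM my_table t, LATERAL FLATTEN(INPUT => t.json_col) f;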
QUESTION 21
Which are the valid options for the validation mode parameter in the COPY COMMAND?
A. RETURN_n_ROWS (e.g. RETURN_10_ROWS)
B. RETURN_ERROR
C. RETURN_ERRORS
D. RETURN_ALL_ERRORS

Correct Answer: A, C, D

Explanation/Reference:

VALIDATION_MODE = RETURN_n_ROWS | RETURN_ERRORS | RETURN_ALL_ERRORS


String (constant) that instructs the COPY command to validate the data files instead of loading them
into the specified table; i.e. the COPY command tests the files for errors but does not load them. The
command validates the data to be loaded and returns results based on the validation option specified:
Supported values:
RETURN_n_ROWS (e.g. RETURN_10_ROWS) - Validates the specified number of rows, if no errors are encountered; otherwise, fails at the first error encountered in the rows.
RETURN_ERRORS - Returns all errors (parsing, conversion, etc.) across all files specified in the COPY statement.
RETURN_ALL_ERRORS - Returns all errors across all files specified in the COPY statement, including files with errors that were partially loaded during an earlier load because the ON_ERROR copy option was set to CONTINUE during the load.
https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html#optional-parameters
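For example (table, stage, and file format are placeholders), a validation-only COPY looks like this:

COPY INTO my_table
FROM @my_stage
FILE_FORMAT = (TYPE = CSV)
VALIDATION_MODE = 'RETURN_ERRORS';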

QUESTION 22

Which of the below functions are recommended to be used to understand the clustering ratio of a
table?

A. SYSTEM$CLUSTERING_RATIO
B. SYSTEM$CLUSTERING_DEPTH
C. SYSTEM$CLUSTERING_INFORMATION

Correct Answer: B,C


Explanation/Reference:

SYSTEM$CLUSTERING_RATIO is deprecated; SYSTEM$CLUSTERING_DEPTH and SYSTEM$CLUSTERING_INFORMATION are the recommended functions.
https://docs.snowflake.com/en/sql-reference/functions/system_clustering_ratio.html

QUESTION 23

You have many files which are loaded onto the cloud storage. Most of them are less than 200 MB in
size, but a few are 1GB or more. You need to process them using SNOWPIPE. Which of the below
options is recommended

A. Split the 1 GB files into smaller file sizes


B. Use a dedicated large warehouse for the 1 GB files.
C. Merge the smaller 200 MB files into a single file.

Correct Answer: A

Explanation/Reference:

Explanation
https://docs.snowflake.com/en/user-guide/data-load-considerations-prepare.html#general-file-sizing-recommendations
The number of load operations that run in parallel cannot exceed the number of data files to be
loaded. To optimize the number of parallel operations for a load, we recommend aiming to produce
data files roughly 100-250 MB (or larger) in size compressed.
Note
Loading very large files (e.g. 100 GB or larger) is not recommended.
If you must load a large file, carefully consider the ON_ERROR copy option value. Aborting or skipping
a file due to a small number of errors could result in delays and wasted credits. In addition, if a data
loading operation continues beyond the maximum allowed duration of 24 hours, it could be aborted
without any portion of the file being committed.
Aggregate smaller files to minimize the processing overhead for each file. Split larger files into a
greater number of smaller files to distribute the load among the compute resources in an active
warehouse. The number of data files that are processed in parallel is determined by the amount of
compute resources in a warehouse. We recommend splitting large files by line to avoid records that
span chunks.
If your source database does not allow you to export data files in smaller chunks, you can use a third-
party utility to split large CSV files.

QUESTION 24

The employee project details has the project names as array against each employee as shown below
Which of the query below will convert the array into individual rows?
A. select emp_id,
emp_name,
p.value::string as project_names
from employee_project_details,table(flatten(employee_project_details.project_names)) p
;

B. select emp_id,
emp_name,
p.value::string as project_names
from employee_project_details,lateral(flatten(employee_project_details.project_names)) p
;
C. select emp_id,
emp_name,
p.value::string as project_names
from employee_project_details,lateral flatten(employee_project_details.project_names)) p

Correct Answer: A

Explanation/Reference:

Explanation
Try this out in your snowflake instance
Step 1 - Create the table
create or replace table employee_project_details(emp_id varchar, emp_name varchar, project_names
array);
Step 2 - Insert values
insert into employee_project_details
select '1','john',array_cat(to_array('it'),to_array('prod'));
Step 3 - Convert to rows
select emp_id,
emp_name,
p.value::string as project_names
from employee_project_details,table(flatten(employee_project_details.project_names)) p

QUESTION 25

For using Snowflake with Spark, which of the below privileges are required?
A. USAGE on the schema that contains the table that you will read from or write to
B. CREATE STAGE on the schema that contains the table that you will read from or write to
C. Accountadmin
D. Sysadmin

Correct Answer: A,B

Explanation/Reference:

Explanation
https://docs.snowflake.com/en/user-guide/spark-connector-install.html#requirements
Requirements
To install and use Snowflake with Spark, you need the following:
A supported operating system. For a list of supported operating systems, see Operating System
Support.
Snowflake Connector for Spark.
Snowflake JDBC Driver (the version compatible with the version of the connector).
Apache Spark environment, either self-hosted or hosted in any of the following:
Qubole Data Service.
Databricks.
Amazon EMR.
In addition, you can use a dedicated Amazon S3 bucket or Azure Blob storage container as a staging
zone between the two systems; however, this is not required with version 2.2.0 (and higher) of the
connector, which uses a temporary Snowflake internal stage (by default) for all data exchange.
The role used in the connection needs USAGE and CREATE STAGE privileges on the schema that
contains the table that you will read from or write to.
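A sketch of the required grants (database, schema, and role names are placeholders):

GRANT USAGE ON SCHEMA my_db.my_schema TO ROLE spark_role;
GRANT CREATE STAGE ON SCHEMA my_db.my_schema TO ROLE spark_role;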

QUESTION 26

Which of the below privileges are required to add or remove search optimization?
A. OWNERSHIP privilege on the table
B. ADD SEARCH OPTIMIZATION privilege on the schema that contains the table
C. ACCOUNTADMIN privilege
D. ALL OF THE ABOVE

Correct Answer: A,B

Explanation/Reference:

Explanation
What Access Control Privileges Are Needed For the Search Optimization Service?
To add or remove search optimization for a table, you must have the following privileges:
You must have OWNERSHIP privilege on the table.
You must have ADD SEARCH OPTIMIZATION privilege on the schema that contains the table.
GRANT ADD SEARCH OPTIMIZATION ON SCHEMA <schema_name> TO ROLE <role_name>;
To use the search optimization service for a query, you just need SELECT privileges on the table.
You do not need any additional privileges. Because search optimization is a table property, it is
automatically detected and used (if appropriate) when querying a table.
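Once the privileges are in place, enabling or removing the feature is a single ALTER statement (table name is illustrative):

ALTER TABLE my_table ADD SEARCH OPTIMIZATION;

ALTER TABLE my_table DROP SEARCH OPTIMIZATION;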

QUESTION 27

While using the Kafka connector, what charges are applied to your account?
A. Snowpipe processing time
B. Data storage
C. Kafka connector usage

Correct Answer: A,B

Explanation/Reference:

Explanation
Billing Information
There is no direct charge for using the Kafka connector. However, there are indirect costs:
Snowpipe is used to load the data that the connector reads from Kafka, and Snowpipe processing time
is charged to your account.
Data storage is charged to your account

QUESTION 28

External functions must be scalar functions


A. TRUE
B. FALSE

Correct Answer: A

Explanation/Reference:

Explanation
https://docs.snowflake.com/en/sql-reference/external-functions-introduction.html#execution-time-limitations-and-issues
Execution-time Limitations and Issues
Because the remote service is opaque to Snowflake, the optimizer might not be able to perform some
optimizations that it could perform for equivalent internal functions.
External functions have more overhead than internal functions (both built-in functions and internal
UDFs) and usually execute more slowly.
Currently, external functions must be scalar functions. A scalar external function returns a single
value for each input row.
Currently, external functions cannot be shared with data consumers via Secure Data Sharing.
The maximum response size per batch is 10MB.
External functions cannot be used in the following situations:
As part of a database object (e.g. table, view, UDF, or masking policy) shared via Secure Data
Sharing. For example, you cannot create a shared view that uses an external function. The following
is not supported:
create view my_shared_view as select my_external_function(x) ...;
create share things_to_share;
grant select on view my_shared_view to share things_to_share;
A DEFAULT clause of a CREATE TABLE statement. In other words, the default value for a column
cannot be an expression that calls an external function. If you try to include an external function in a
DEFAULT clause, then the CREATE TABLE statement fails.
A COPY transformation.
External functions can raise additional security issues. For example, if you call a third party’s function,
that party could keep copies of the data passed to the function.

QUESTION 29

Which option needs to be followed to allow a user to have only the OWNERSHIP privilege on a table, while not being able to manage privilege grants on the object?

A. Create a managed schema


B. Create a regular schema
C. By default object owner will not be able to manage privileges on the object

Correct Answer: A

Explanation/Reference:

Explanation
https://docs.snowflake.com/en/sql-reference/sql/create-schema.html#optional-parameters
WITH MANAGED ACCESS
Specifies a managed schema. Managed access schemas centralize privilege management with the
schema owner.
In regular schemas, the owner of an object (i.e. the role that has the OWNERSHIP privilege on the
object) can grant further privileges on their objects to other roles. In managed schemas, the schema
owner manages all privilege grants, including future grants, on objects in the schema. Object owners
retain the OWNERSHIP privileges on the objects; however, only the schema owner can manage
privilege grants on the objects.
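A minimal sketch (database and schema names are illustrative):

CREATE SCHEMA my_db.controlled_schema WITH MANAGED ACCESS;

-- An existing schema can also be converted
ALTER SCHEMA my_db.regular_schema ENABLE MANAGED ACCESS;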

QUESTION 30

The Snowflake Kafka Connector does not guarantee that rows are inserted in the order that they
were originally published.
A. TRUE
B. FALSE

Correct Answer: A

Explanation/Reference:

Explanation
There is no guarantee that rows are inserted in the order that they were originally published.
https://docs.snowflake.com/en/user-guide/kafka-connector-overview.html#fault-tolerance

QUESTION 31

Which system table will you use to get the total credit consumption over a specific time period?

A. WAREHOUSE_METERING_HISTORY
B. WAREHOUSE_CREDIT_USAGE_HISTORY
C. WAREHOUSE_USAGE_HISTORY

Correct Answer: A

Explanation/Reference:

The WAREHOUSE_METERING_HISTORY view in the ACCOUNT_USAGE schema can be used to get the desired information. Run the query below to try it out; add a filter on START_TIME/END_TIME to restrict it to a specific time period.
SELECT WAREHOUSE_NAME,
       SUM(CREDITS_USED_COMPUTE) AS CREDITS_USED_COMPUTE_SUM
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
GROUP BY 1
ORDER BY 2 DESC;
