TEST QUESTIONS
1.What is the difference between DELETE TABLE and TRUNCATE TABLE in
SQL?
Although they are both used to delete data, a key difference is that DELETE is a database manipulation
language command, whereas TRUNCATE is a data definition language command.
Therefore, DELETE removes specific data from a table but TRUNCATE removes all the rows of a table
without maintaining the table’s structure. Another difference between the two is that DELETE can be used with
the WHERE clause but TRUNCATE cannot. In this case, the DELETE TABLE would remove all the data from
within the table while maintaining the structure. In contrast, TRUNCATE TABLE would delete the table in its
entirety.
2. Write an SQL query to select all records of employees with last names
between “Bailey” and “Frederick”.
For this question, assume the table is called Employees and the last name column is LastName. The query
should look like this:
SELECT * FROM Employees WHERE LastName BETWEEN 'Bailey' AND 'Frederick'
3. Write an SQL query to find the year from a YYYY-MM-DD date.
EXTRACT allows you to pull temporal data types like date, time, timestamp, and interval from the date and
time values. If you wanted to find the year from 2022-03-22, you would write EXTRACT( FROM), as shown
below:
SELECT EXTRACT(YEAR FROM DATE '2022-03-22') AS year;
4. Write an SQL query to select the second highest salary in the engineering
department.
To answer this question, you need the name of the department to be associated with each employee in the
employees table in order to understand which department each employee is a part of.
The department_id field in the employees table is associated with the “id” field in the departments table. You
can call the department_id a foreign key because it is a column that references the primary key of another table,
which in this case is the “id” field in the departments table.
Based on this shared field, you can join both tables using INNER JOIN to associate the department name to
their employees as exemplified below:
SELECT salary
FROM employees
INNER JOIN departments
ON employees.department_id = departments.id
With the department name in place, you can now look at the employees of the engineering team and sort by
salary to find the second highest paid.
5. Explain the different types of joins in SQL.
A JOIN is a clause in SQL that’s used to join rows from two or more tables based on a common column
between the tables. This clause is used for merging tables and retrieving data. The most common types of
JOIN commands include:
INNER JOIN- An INNER JOIN is the most common type of JOIN command and is used to return all rows
from two or more tables when the JOIN condition is met.
LEFT JOIN - This type of JOIN command returns rows from the left table when a matching row from the
right meets the JOIN condition.
RIGHT JOIN - This JOIN command is similar to a LEFT JOIN, but rows are returned instead from the right
table when the JOIN condition on the left is met.
FULL JOIN - FULL JOIN returns all rows from the left and the right when there is a match in any of the
tables.
6. What is the PRIMARY KEY in SQL?
PRIMARY KEY is a constraint that uniquely identifies each record. Notably, the constraint cannot have
NULL values, and all values must be UNIQUE. A table can have only one PRIMARY KEY, but the
PRIMARY KEY can consist of single or multiple columns.
7. What are constraints?
Constraints in SQL are rules that can be applied to the type of data in a table. They are used to limit the type of
data that can be stored in a particular column within a table. Some common constraints are:
NOT NULL - This constraint prevents null values from being stored in a column.
UNIQUE - This constraint says that values in a column must be unique. PRIMARY KEY uses the UNIQUE
constraint.
PRIMARY KEY - This constraint is used to specify which field is the primary key.
FOREIGN KEY - This constraint uniquely identifies a row in another table.
8. We’re given two tables, a users table with demographic information and the
neighborhood they live in, and a neighborhoods table. Write a query that returns
all of the neighborhoods that have 0 users.
users table
Columns Type
id INTEGER
name VARCHAR
neighborhood_id INTEGER
created_at DATETIME
neighborhoods table
Columns Type
id INTEGER
name VARCHAR
city_id INTEGER
Output:
Columns Type
name VARCHAR
Hint: The predicament here is finding all the neighborhoods without users. In a sense, you need all the
neighborhoods that do not have a singular user living in them. This means you have to introduce a concept of
existence of a column in one table but not in the other.
9. Given two tables: accounts and downloads, find the average number of
downloads for free vs paying accounts, broken down by day.
To answer this question, start by joining the two tables using an INNER JOIN since we only need accounts
that had downloads during the day. This type of join will exclude accounts with no download records, ensuring
accurate calculations. Then, calculate the average downloads using the AVG() function, grouping results by
the columns you want to differentiate: download_date and paying_customer. Finally, use the ROUND()
function to format the average to two decimal places.
10. Given three tables, representing customer transactions and customer
attributes, write a query to get the average order value by gender.
For this problem, note that we are going to assume that the question states average order value for all users that
have ordered at least once. Therefore, we can apply an INNER JOIN between users and transactions.
Intermediate Business Analyst SQL Questions
For mid-level business analyst roles, expect intermediate SQL questions to challenge your knowledge and
skill. Medium SQL questions fall into two categories:
Complex SQL queries - Intermediate SQL questions ask you to perform joins, sub-queries, self-joins, and
window functions.
SQL/Analytics case studies - Many intermediate questions take the form of case studies or ask you to perform
analysis on the data you pull from a query.
Here are some intermediate business analyst SQL interview questions:
11 . Write a query to obtain the top five most expensive projects by the following
criteria: budget to employee count ratio.
For more context, you are given two tables. One is named projects and the other maps employees to the
projects they are working on. Exclude projects with zero employees. Assume each employee works on only
one project.
To start, think about how to calculate the combined budget for each project and then determine what the
budget per employee attached to the project equals. After that, you can think about how to rank these project
values from most to least expensive to determine which are the most expensive.
12. Given a transactions table with date timestamps, sample every fourth row
ordered by date.
Here’s a hint for this question to get you started: if you are sampling from this table and you want to
specifically sample every fourth value, you will probably have to use a window function.
A general rule of thumb to follow is when a question states or asks for some Nth value (like the third purchase
of each customer or the tenth notification sent), then a window function is the best option. Window functions
allow you to use the RANK() or ROW_NUMBER() function to provide a numerical index based on a certain
order.
13. Write a query to get the number of customers that were upsold by
purchasing additional products.
For this problem, you are given a table of product purchases. Each row in the table represents an individual
product purchase.
Note: If the customer purchased two things on the same day, that does not count as an upsell because they
were purchased within a similar time frame. You are looking for a customer returning on a different date to
purchase a product.
This question is slightly tricky because you have to note the dates that each user purchased products. You can’t
just group by the user_id to find where the number of products purchased is greater than one because of the
upsell condition.
You have to group by both the date field and the user_id to obtain each transaction broken out by day and user:
SELECT
user_id
, DATE(created_at) AS date
FROM transactions
GROUP BY 1,2
The query above will now provide a user_id and date field for each row. If there is a duplicate user_id, then you
know that the user purchased on multiple days, which satisfies the upsell condition. What comes next?
14. Write a query to support or disprove the hypothesis: clickthrough rate is
dependent on search rating.
This question provides a table that represents search results on Facebook, including a query, a position, and a
human rating.
15. Select the largest three departments with ten or more employees and rank
them according to the percentage of employees making over $100,000.
In this problem, you are given two tables: an employees table and a departments table.
Example:
Input:
employees table
Columns Type
id INTEGER
first_name VARCHAR
last_name VARCHAR
salary INTEGER
department_id INTEGER
departments table
Columns Type
id INTEGER
name VARCHAR
Output:
Column Type
percentage_over_100k FLOAT
department_name VARCHAR
number_of_employees INTEGER
First, break down the question to understand what it’s asking. Specifically, break the question down into three
clauses of conditions:
Top three departments by employee count.
Percent of employees making over $100,000 a year.
Departments with at least ten employees.
Then, think about how you would associate employees with their department, calculate and display the
percentage of employees making over $100,000 a year, and order those results to provide an answer to the
original question.
Hard Business Analyst SQL Questions
Advanced business analyst SQL interview questions are common for mid and senior-level roles, and they
require you to write advanced SQL queries or work through complex logic-based case studies. The two main
types of hard SQL questions are:
Advanced SQL writing - Writing queries to debug code, using indices to tune SQL queries, and using
advanced SQL clauses.
Logic-based questions - More challenging analytics case studies or queries that first require you to solve a
logic-based problem.
16. Write an SQL query that creates a cumulative distribution of the number of
comments per user. Assume bin buckets class intervals of one.
To solve this cumulative distribution practice problem, you are given two tables, a users table, and a comments
table.
frequency cumulative
0 10
1 25
2 27
17. Given a users table, write a query to get the cumulative number of new users
added by day, with the total reset every month.
Input:
users table
Columns Type
id INTEGER
name VARCHAR
created_at DATETIME
Output:
DATE INTEGER
2020-01-01 5
2020-01-02 12
… …
2020-02-01 8
2020-02-02 17
2020-02-03 23
Hint: This question first seems like it could be solved by just running a COUNT(*) and grouping by date or
that maybe it’s just a regular cumulative distribution function. But it is important to notice that you are actually
grouping by a specific interval of month and date, and when the next month comes around, you want to reset
the count of the number of users.
18. Given a table of product subscriptions with a subscription start date and end
date for each user, write a query that returns true or false whether or not each
user has a subscription date range that overlaps with any other user.
Input:
subscriptions table
Column Type
user_id INTEGER
start_date DATETIME
end_date DATETIME
user_id start_date end_date
1 2019-01-01 2019-01-31
2 2019-01-15 2019-01-17
3 2019-01-29 2019-02-04
4 2019-02-05 2019-02-10
Output:
user_id overlap
1 1
2 1
3 1
4 0
Hint: Take a look at each of the conditions first and see how they could be triggered. Given two date ranges,
what determines if the subscriptions would overlap?
19. Let’s say you have two tables, transactions, and products. Hypothetically, the
transactions table consists of over a billion rows of purchases bought by users.
You are trying to find paired products that are often purchased together by the same user, such as wine and
bottle openers, chips and beer, etc.
Write a query to find the top five paired products and their names.
Note: For the purposes of satisfying the test case, P1 should be the item that comes first in the alphabet.
Input:
transactions table
Column Type
id INTEGER
user_id INTEGER
created_at DATETIME
product_id INTEGER
quantity INTEGER
products table
Column Type
id INTEGER
name VARCHAR
price FLOAT
Output:
Column Type
p1 VARCHAR
p2 VARCHAR
qty INTEGER
Hint: To solve this question, you need to break it into several steps. First, you should find a way to
select all the instances in which a user purchased two or more products at the same time. How can
you use user_id and created_at to accomplish this?
20. Calculate the first-touch attribution for each user_id that converted.
The schema below is for a retail online shopping company consisting of two tables, attribution
and user_sessions. Here are some details of the two tables:
The attribution table logs a session visit for each row. If the conversion is true, then the user
converted to a purchase on that session.
The channel column represents which advertising platform the user was attributed to for that specific
session. Lastly, the user_sessions table maps session visits back to one user from a single visit all the
way up to several on the same day.
First-touch attribution is defined as the channel to which the converted user was associated when
they first discovered the website. It is helpful to sketch out the attribution model for converting users:
1st session - User sees Facebook ad -> Clicks to order -> Leaves.
2nd session - User sees Google ad -> Leaves.
3rd session - User types in website -> Clicks to order -> Purchases.
How do you figure out the beginning path of the Facebook ad and connect it to the end purchasing
user?
You need to take two actions:
1. Subset all of the users that converted to customers.
2. Figure out their first session visit to attribute the actual channel.