Six Steps of Data Analysis Process
1. Define the Problem or Research Question
2. Collect Data
3. Data Cleaning
4. Analyzing the Data
5. Data Visualization
6. Presenting Data
1. Define the Problem or Research Question (Ask Phase)
In the first step, I focus on understanding the business problem or objective. This
involves communicating clearly with stakeholders to identify their pain points,
expectations, and desired outcomes. I ensure that I ask the right questions to
uncover the root cause of the issue and align my analysis with the stakeholders'
goals. It’s also important to remain focused and avoid distractions during this
phase to fully grasp the scope of the problem.
Key Questions I Ask:
What problem are we trying to solve?
What are the stakeholders expecting as a solution?
2. Collect the Data (Prepare Phase)
Once the problem is clearly defined, I move to data collection. I gather relevant
data from multiple sources — both internal (like company databases) and external
(such as public datasets or third-party providers).
Types of Data:
First-party: Data collected directly by the organization.
Second-party: Shared data from another organization.
Third-party: Data purchased or collected externally.
Data is commonly collected through surveys, interviews, feedback forms, and
stored in formats like Excel spreadsheets or SQL databases.
3. Clean the Data (Process Phase)
Cleaning is one of the most critical steps. I remove duplicates, correct
inconsistencies, and ensure the data is in the correct format. Clean data is
essential for reliable analysis. I also check for any biases in the data to ensure fair
and accurate insights.
Tools I Use:
Excel functions (e.g., Remove Duplicates, TRIM)
SQL queries (e.g., filtering NULLs, removing duplicates)
4. Analyze the Data (Analyze Phase)
With clean data, I perform analysis to identify patterns, correlations, and trends. I
use tools like Excel (pivot tables, formulas) and SQL (joins, groupings,
aggregations). I may also use programming languages like Python or R for
advanced statistical analysis or automation.
Objective: Find meaningful insights that address the business problem.
5. Visualize the Data (Share Phase)
Visualization helps convey complex data in a simple and understandable form. I
use tools like Tableau or Power BI to create interactive dashboards and charts.
These visuals are especially helpful for non-technical stakeholders.
Goal: Make data easy to understand and insights more impactful.
6. Present the Data (Act Phase)
Finally, I present the data story. This is more than just showing visuals — I
interpret the findings, connect them back to the original problem, and
recommend actions. I tailor the presentation to suit both technical and non-
technical audiences, using storytelling to guide them through the insights.
Output Formats:
Reports
Slide decks
Interactive dashboards
The aim is to support informed decision-making and drive positive business
outcomes.
Summary
Each step builds on the previous one — from understanding the problem to
delivering actionable insights. I follow this structured approach to ensure my
analysis is accurate, insightful, and aligned with business goals.
Perform Data Analysis in Excel
Preparing Data for Analysis
How to Clean Data in Excel
1. Remove Duplicates: Use Data > Remove Duplicates to eliminate redundancy.
Also, there are many ways to identify duplicates for checking
=UNIQUE([array], [by column], [TRUE-[returns items that appear only once]
FALSE-[return every distinct item]])
DATA -> REMOVE DUPLICATES -> (remove duplicates by column-wise)
By using CONDITIONAL FORMATTING-> HIGHLIGHT DUPLICATES-> (can
highlight duplicate values by color and remove after checking)
2. Use TRIM and CLEAN Functions:
TRIM removes unnecessary spaces.
CLEAN removes non-printable characters (hidden characters, weird characters,
also removes multiple lines text)
3. Sort and Structure Data:
Use Text to columns wizard option to segregate or split columns based on
preferences.
SQL
What is a Database?
A database is a digital system designed for
the storage and arrangement of data. Think of it as an online filing
system that allows you to store and quickly access a vast amount of
information. Databases facilitate the efficient management of data,
enabling the simple addition, modification, removal, and access of
information. They serve numerous uses such as websites, applications, and
enterprises to manage extensive data in an organized and secure manner.
What is MySQL?
MySQL is an open-source relational database management
system (RDBMS) that uses Structured Query Language
(SQL) to manage data. Developed by MySQL AB and now owned
by Oracle Corporation, it’s widely used due to its reliability, speed,
and ease of use. MySQL is a key component in many web applications,
forming the backbone of popular websites and services.
It allows users to create, modify, and maintain databases, supporting
operations like data insertion, querying, updating, and deletion. Ideal
for both small and large-scale applications, MySQL powers various types of
systems, from personal projects to complex enterprise environments.
SQL is a Structured query language used to access and manipulate
data in databases. SQL is a query language that communicates with
databases.
SQL Uses:
SQL's integration with various technologies makes it essential for managing
and querying data in databases. Whether it's in traditional relational
databases (RDBMS) or modern technologies such as machine learning, AI,
and blockchain, SQL plays a key role. It works seamlessly
with DBMS (Database Management Systems) to help users interact with
data, whether stored in structured RDBMS or other types of databases.
Data Science & Analytics: Used for querying large datasets, data
cleaning, and analysis. Analysts use SQL to generate reports and insights
that inform business decisions.
How SQL work?
When you interact with a database, you typically use SQL commands to
perform these operations. These commands are translated into actions by
the SQL Engine, the core component responsible for processing queries.
The SQL Engine parses and compiles SQL queries, optimizing and executing
them to interact with the stored data.
Different DBMS tools (like MySQL, SQL Server, etc.) provide an interface
and APIs that users can use to interact with the database. These tools
provide a user-friendly way to write and execute SQL queries, but
internally, they rely on their respective SQL Engines to process these
commands.
For example, MySQL uses its own SQL Engine to parse, optimize, and
execute queries, while SQL Server has a different SQL Engine for the same
task.
Important Terminologies
These are some important terminologies that are used in terms of relation.
Table STUDENT
ROLL_NO NAME ADDRESS PHONE AGE
1 RAM DELHI 9455123451 18
2 RAMESH GURGAON 9652431543 18
3 SUJIT ROHTAK 9156253131 20
ROLL_NO NAME ADDRESS PHONE AGE
4 SURESH DELHI 9156768971 18
Attribute:Attributes are the properties that define a relation.
e.g.; ROLL_NO, NAME etc.
Tuple:Each row in the relation is known as tuple. The above relation
contains 4 tuples, one of which is shown as:
Degree:The number of attributes in the relation is known as degree of the
relation. The STUDENT relation defined above has degree 5.
Cardinality:The number of tuples in a relation is known as cardinality.
The STUDENTrelation defined above has cardinality 4.
Column:Column represents the set of values for a particular attribute. The
column ROLL_NO is extracted from relation STUDENT.
Create Database
The first step to storing the information electronically using SQL includes
creating database. And in this section we will learn how to Create, Select,
Drop, and Rename databases with examples.
CREATE Database
DROP Database
RENAME Database
SELECT Database
Tables
The cornerstone of any SQL database is the table. Basically, these structure
functions is very similar to spreadsheets, which store data in very
organized grid format. In this section, you will learn how to Create, Drop,
Delete, and more related to Table.
CREATE TABLE
DROP TABLE
RENAME TABLE
TRUNCATE TABLE
COPY TABLE
TEMP TABLE
ALTER TABLE
CRUD Operations
In this section, you will learn about the SQL Queries like SELECT statement,
SELECT LAST, and more. Explore this section and learn how to use these
queries.
SELECT Statement
INSERT INTO
INSERT Multiple Rows
UPDATE Statement
DELETE Statement
DELETE Duplicate Rows
Clauses
Unlock the power of SQL Clauses with this SQL tutorial. Here in this section,
you will learn how to use SELECT, WHERE, JOIN, GROUP BY, and more to
query databases effectively.
WHERE Clause
WITH Clause
HAVING Clause
ORDER By Clause
Group By Clause
LIMIT Clause
Distinct Clause
FETCH
Aliases
Operators
SQL Operators" refers to the fundamental symbols and keywords within the
SQL that enable users to perform various operations and SQL AND, OR,
LIKE, NOT, and more operators on databases. Here, we have discussed all
the SQL operators in a detailed manner with examples.
AND Operator
OR Operator
Logical Operators
LIKE Operator
IN Operator
NOT Operator
NOT EQUAL Operator
IS NULL Operator
UNION Operator
UNION ALL Operator
EXCEPT Operator
BETWEEN Operator
ALL and ANY
INTERSECT Operator
EXISTS Operator
CASE Operator
Aggregate Functions
Whether you are calculating the total sales revenue for a particular
product, finding the average age of customers, or determining the highest
value in a dataset, SQL Aggregate Functions make these tasks
straightforward and manageable.
Aggregate Function
Count() Function
SUM() Function
MIN() Function
MAX() Function
AVG() Function
Data Constraints
Constraints act as rules or conditions imposed on the data, dictating what
values are permissible and what actions can be taken. They play a crucial
role in maintaining the quality and coherence of the database by
preventing errors. So, explore this section to get a hand on SQL Data
Constraints.
NOT NULL Constraints
Primary Key Constraints
Foreign Key Constraints
Composite Key
Unique Constraints
Alternate Key
CHECK Constraints
DEFAULT Constraints
Joining Data
SQL joins serve as the weaver's tool, allowing you to seamlessly merge
data from multiple tables based on common threads. So explore this
section to learn how to use JOIN command.
JOIN
Outer Join
Left Join
Right Join
Full Join
Cross Join
Self Join
UPDATE with JOIN
DELETE JOIN
Recursive Join
Functions
SQL functions offer an efficient and versatile approach to data analysis. By
leveraging these functions within your queries, you can enhance the depth
and accuracy of your insights, transforming raw data into actionable
knowledge.
Date Functions
String Functions
Numeric Functions
Statistical Functions
JSON Functions
Conversion Functions
Datatype Functions
LTRIM Function
UPPER Function
RTRIM Function
Views
Views makes easier for anyone to access the information they need,
without getting bogged down in complicated queries. Views also act like a
helpful security guard, keeping the most sensitive information in the back
room, while still allowing access to what's needed.
CREATE VIEW
UPDATE VIEW
RENAME VIEW
DELETE VIEW
Indexes
Indexes work by organizing specific columns in a particular order, allowing
the database to quickly pinpoint the information you need. And in this
section, we have listed all the points that one has to learn while learning
SQL.
Indexes
Create Index
Drop Index
Show Indexes
Unique Index
Clustered Index vs Non-Clustered Index
Subquery
Subqueries allow you to perform nested queries within a larger query,
enabling more complex data retrieval. They help in filtering data or
performing operations on data that would otherwise require multiple
queries.
Subquery
Correlated Subqueries
Nested Queries
Miscellaneous Topics
In this miscellaneous section, you will encounter concepts like stored
procedures for automating repetitive tasks, triggers for automated actions
based on data changes, and window functions for complex calculations
within a single query.
Wildcards Operators
Comments
Pivot and Unpivot
Trigger
Hosting
Performance Tuning
Stored Procedures
Transactions
Sub Queries
Using Sequences
Auto Increment
Window functions
Cursors
Common Table Expressions
Database Tuning
Dynamic
Regular Expressions
SQL commands are categorized based on their specific functionalities:
1. Data Definition Language (DDL),
2. Data Manipulation Language (DML),
3. Data Control Language (DCL),
4. Data Query Language (DQL), and
5. Transaction Control Language (TCL).
1. DDL - Data Definition Language
can be used for defining, altering, and deleting database structures
such as tables, indexes, and schemas. It simply deals with descriptions
of the database schema and is used to create and modify the structure of
database objects in the database.
Command Description Syntax
Create database or its
CREATE TABLE table_name
objects (table, index,
CREATE (column1 data_type, column2
function, views, store
data_type, ...);
procedure, and triggers)
Delete objects from the
DROP DROP TABLE table_name;
database
ALTER TABLE table_name ADD
Alter the structure of the
ALTER COLUMN column_name
database
data_type;
Remove all records from a
table, including all spaces
TRUNCATE TRUNCATE TABLE table_name;
allocated for the records
are removed
Add comments to the data COMMENT 'comment_text' ON
COMMENT
dictionary TABLE table_name;
Rename an object existing RENAME TABLE old_table_name
RENAME
in the database TO new_table_name;
2. DQL - Data Query Language
DQL statements are used for performing queries on the data
within schema objects. The purpose of the DQL Command is to get
some schema relation based on the query passed to it. This command
allows getting the data out of the database to perform operations with it.
When a SELECT is fired against a table or tables the result is compiled into
a further temporary table, which is displayed.
Comma
nd Description Syntax
It is used to retrieve data SELECT column1, column2, ...FROM
SELECT
from the database table_name WHERE condition;
3. DML - Data Manipulation Language
The SQL commands that deal with the manipulation of data present in
the database. It is the component of the SQL statement that controls
access to data and to the database.
Command Description Syntax
Insert data into a INSERT INTO table_name (column1, column2, ...)
INSERT
table VALUES (value1, value2, ...);
Update existing data UPDATE table_name SET column1 = value1,
UPDATE
within a table column2 = value2 WHERE condition;
Delete records from
DELETE DELETE FROM table_name WHERE condition;
a database table
Table control
LOCK LOCK TABLE table_name IN lock_mode;
concurrency
Call a PL/SQL or JAVA
CALL CALL procedure_name(arguments);
subprogram
EXPLAIN Describe the access
EXPLAIN PLAN FOR SELECT * FROM table_name;
PLAN path to data
4. DCL - Data Control Language
DCL (Data Control Language) includes commands such
as GRANT and REVOKE which mainly deal with the rights, permissions,
and other controls of the database system. These commands are used
to control access to data in the database by granting or revoking
permissions.
Command Description Syntax
Assigns new privileges to a
user account, allowing GRANT privilege_type
[(column_list)] ON [object_type]
GRANT access to specific database object_name TO user [WITH
objects, actions, or GRANT OPTION];
functions.
Removes previously granted
privileges from a user REVOKE [GRANT OPTION FOR]
privilege_type [(column_list)] ON
REVOKE account, taking away their [object_type] object_name FROM
access to certain database user [CASCADE];
objects or actions.
5. TCL - Transaction Control Language
Transactions group a set of tasks into a single execution unit. Each
transaction begins with a specific task and ends when all the tasks in the
group are successfully completed. If any of the tasks fail, the transaction
fails. Therefore, a transaction has only two results: success or failure.
Command Description Syntax
BEGIN BEGIN TRANSACTION
Starts a new transaction
TRANSACTION [transaction_name];
Saves all changes made during
COMMIT COMMIT;
the transaction
Undoes all changes made
ROLLBACK ROLLBACK;
during the transaction
Creates a savepoint within the
SAVEPOINT SAVEPOINT savepoint_name;
current transaction
SQL CONSTRAINTS:
SQL constraints are essential elements in relational database
design that ensure the integrity, accuracy, and reliability of the data
stored in a database. By enforcing specific rules on table columns, SQL
constraints help maintain data consistency, preventing invalid data entries
and optimizing query performance.
In this article, we will explain the most common SQL constraints in detail,
providing clear examples and explaining how to implement them
effectively.
What Are SQL Constraints?
SQL constraints are rules applied to columns or tables in a relational
database to limit the type of data that can be inserted, updated,
or deleted. These rules ensure the data is valid, consistent, and adheres to
the business logic or database requirements. Constraints can be
enforced during table creation or later using the ALTER TABLE statement.
They play a vital role in maintaining the quality and integrity of your
database.
Types of SQL Constraints
SQL provides several types of constraints to manage different aspects of
data integrity. These constraints are essential for ensuring that data meets
the requirements of accuracy, consistency, and validity. Let’s go
through each of them with detailed explanations and examples.
1. NOT NULL Constraint
The NOT NULL constraint ensures that a column cannot contain NULL
values. This is particularly important for columns where a value is essential
for identifying records or performing calculations. If a column is defined
as NOT NULL, every row must include a value for that column.
Example:
CREATE TABLE Student
(
ID int(6) NOT NULL,
NAME varchar(10) NOT NULL,
ADDRESS varchar(20)
);
Explanation: In the above example, both the ID and NAME columns are
defined with the NOT NULL constraint, meaning every student must have
an ID and NAME value.
2. UNIQUE Constraint
The UNIQUE constraint ensures that all values in a column are distinct
across all rows in a table. Unlike the PRIMARY KEY, which requires
uniqueness and does not allow NULLs, the UNIQUE constraint allows NULL
values but still enforces uniqueness for non-NULL entries.
Example:
CREATE TABLE Student
(
ID int(6) NOT NULL UNIQUE,
NAME varchar(10),
ADDRESS varchar(20)
);
Explanation: Here, the ID column must have unique values, ensuring that
no two students can share the same ID. We can have more than
one UNIQUE constraint in a table.
3. PRIMARY KEY Constraint
A PRIMARY KEY constraint is a combination of the NOT
NULL and UNIQUE constraints. It uniquely identifies each row in a table. A
table can only have one PRIMARY KEY, and it cannot accept NULL values.
This is typically used for the column that will serve as the identifier of
records.
Example:
CREATE TABLE Student
(
ID int(6) NOT NULL UNIQUE,
NAME varchar(10),
ADDRESS varchar(20),
PRIMARY KEY(ID)
);
Explanation: In this case, the ID column is set as the primary key,
ensuring that each student’s ID is unique and cannot be NULL.
4. FOREIGN KEY Constraint
A FOREIGN KEY constraint links a column in one table to the primary
key in another table. This relationship helps maintain referential
integrity by ensuring that the value in the foreign key column matches a
valid record in the referenced table.
Orders Table:
O_I ORDER_N C_I
D O D
1 2253 3
2 3325 3
3 4521 2
O_I ORDER_N C_I
D O D
4 8532 1
Customers Table:
C_I ADDRES
D NAME S
1 RAMESH DELHI
2 SURESH NOIDA
DHARMES
3 GURGAON
H
As we can see clearly that the field C_ID in Orders table is the primary
key in Customers table, i.e. it uniquely identifies each row in
the Customers table. Therefore, it is a Foreign Key in Orders table.
Example:
CREATE TABLE Orders
(
O_ID int NOT NULL,
ORDER_NO int NOT NULL,
C_ID int,
PRIMARY KEY (O_ID),
FOREIGN KEY (C_ID) REFERENCES Customers(C_ID)
)
Explanation: In this example, the C_ID column in the Orders table is a
foreign key that references the C_ID column in the Customers table. This
ensures that only valid customer IDs can be inserted into the Orders table.
5. CHECK Constraint
The CHECK constraint allows us to specify a condition that data must
satisfy before it is inserted into the table. This can be used to enforce rules,
such as ensuring that a column’s value meets certain criteria (e.g., age
must be greater than 18)
Example:
CREATE TABLE Student
(
ID int(6) NOT NULL,
NAME varchar(10) NOT NULL,
AGE int NOT NULL CHECK (AGE >= 18)
);
Explanation: In the above table, the CHECK constraint ensures that only
students aged 18 or older can be inserted into the table.
6. DEFAULT Constraint
The DEFAULT constraint provides a default value for a column when no
value is specified during insertion. This is useful for ensuring that certain
columns always have a meaningful value, even if the user does not provide
one. This helps maintain consistent data and simplifies data entry.
Example:
CREATE TABLE Student
(
ID int(6) NOT NULL,
NAME varchar(10) NOT NULL,
AGE int DEFAULT 18
);
Explanation: Here, if no value is provided for AGE during an insert, the
default value of 18 will be assigned automatically.
How to Specify Constraints in SQL
Constraints can be specified during the table creation process using
the CREATE TABLE statement. Additionally, constraints can be modified or
added to existing tables using the ALTER TABLE statement.
Syntax for Creating Constraints:
CREATE TABLE table_name
(
column1 data_type [constraint_name],
column2 data_type [constraint_name],
column3 data_type [constraint_name],
...
);
We can also add or remove constraints after a table is created:
Example to Add a Constraint:
ALTER TABLE Student
ADD CONSTRAINT unique_student_id UNIQUE (ID);
Conclusion
SQL constraints are essential for maintaining data integrity and ensuring
consistency in relational databases. Understanding and implementing these
constraints effectively will help in designing robust, error-free databases.
By leveraging NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY,
CHECK, DEFAULT, and INDEX, you can ensure your database is
optimized for accuracy and performance.
Questions:
1. What is the difference between CHAR and VARCHAR2 data types?
CHAR: Fixed-length storage. If the defined length is not fully used, it is padded
with spaces.
VARCHAR2: Variable-length storage. Only the actual data is stored, saving
space when the full length is not needed.
2. What is a primary key?
A primary key is a unique identifier for each record in a table. It ensures
that no two rows have the same value in the primary key column(s), and
it does not allow NULL values.
3. What is a foreign key?
A foreign key is a column (or set of columns) in one table that refers to
the primary key in another table. It establishes and enforces a
relationship between the two tables, ensuring data integrity.
4. What is normalization in databases?
Normalization is the process of organizing data in a database to reduce
redundancy and improve data integrity. This involves dividing large
tables into smaller, related tables and defining relationships between
them to ensure consistency and avoid anomalies.
5. What is a query in SQL?
A query is a SQL statement used to retrieve, update, or manipulate data
in a database.
6. What is a view in SQL?
A view is a virtual table created by a SELECT query. It does not store
data itself, but presents data from one or more tables in a structured
way. Views simplify complex queries, improve readability, and enhance
security by restricting access to specific rows or columns.
7. What is a subquery?
A subquery is a query nested within another query. It is often used in
the WHERE clause to filter data based on the results of another query,
making it easier to handle complex conditions.
8. What is the difference between the WHERE and HAVING clauses?
WHERE: Filters rows before any grouping takes place.
HAVING: Filters grouped data after the GROUP BY clause has been
applied.
In short, WHERE applies to individual rows, while HAVING applies to
groups.
9. What are indexes, and why are they used?
Indexes are database objects that improve query performance by
allowing faster retrieval of rows. They function like a book’s index,
making it quicker to find specific data without scanning the entire table.
However, indexes require additional storage and can slightly slow
down data modification operations.
10. What is the difference between DELETE and TRUNCATE
commands?
DELETE: Removes rows one at a time and records each deletion in the
transaction log, allowing rollback. It can have a WHERE clause.
TRUNCATE: Removes all rows at once without logging individual row
deletions. It cannot have a WHERE clause and is faster than DELETE for
large data sets.
11. What are the differences between SQL and NoSQL databases?
SQL Databases:
Use structured tables with rows and columns.
Rely on a fixed schema.
Offer ACID properties.
NoSQL Databases:
Use flexible, schema-less structures (e.g., key-value pairs, document
stores).
Are designed for horizontal scaling.
Often focus on performance and scalability over strict consistency.
12. What is a trigger in SQL?
A trigger is a set of SQL statements that automatically execute in
response to certain events on a table, such as INSERT, UPDATE,
or DELETE. Triggers help maintain data consistency, enforce business
rules, and implement complex integrity constraints.
13. What is the purpose of the ALTER command in SQL?
The ALTER command is used to modify the structure of an existing
database object. This command is essential for adapting our database
schema as requirements evolve.
Add or drop a column in a table.
Change a column’s data type.
Add or remove constraints.
Rename columns or tables.
Adjust indexing or storage settings.
14. What is a composite primary key?
A composite primary key is a primary key made up of two or more
columns. Together, these columns must form a unique combination for
each row in the table. It’s used when a single column isn’t sufficient to
uniquely identify a record.
Example:
Consider an Orders table where OrderID and ProductID together uniquely
identify each record because multiple orders might include the same
product, but not within the same order.
CREATE TABLE OrderDetails (
OrderID INT,
ProductID INT,
Quantity INT,
PRIMARY KEY (OrderID, ProductID)
);
15. How is data integrity maintained in SQL databases?
Data integrity refers to the accuracy, consistency, and reliability of
the data stored in the database. SQL databases maintain data integrity
through several mechanisms:
Constraints: Ensuring that certain conditions are always met. For
example, NOT NULL ensures a column cannot have missing values, FOREIGN
KEY ensures a valid relationship between tables, and UNIQUE ensures no
duplicate values.
Transactions: Ensuring that a series of operations either all succeed or
all fail, preserving data consistency.
Triggers: Automatically enforcing rules or validations before or after
changes to data.
Normalization: Organizing data into multiple related tables to minimize
redundancy and prevent anomalies.
These measures collectively ensure that the data remains reliable and
meaningful over time.
16. What are the advantages of using stored procedures?
Improved Performance: Stored procedures are precompiled and
cached in the database, making their execution faster than sending
multiple individual queries.
Reduced Network Traffic: By executing complex logic on the server,
fewer round trips between the application and database are needed.
Enhanced Security: Stored procedures can restrict direct access to
underlying tables, allowing users to execute only authorized operations.
Reusability and Maintenance: Once a procedure is written, it can be
reused across multiple applications. If business logic changes, you only
need to update the stored procedure, not every application that uses it.
17. What is the difference between UNION and UNION ALL?
UNION: Combines result sets from two queries and removes duplicate
rows, ensuring only unique records are returned.
UNION ALL: Combines the result sets without removing duplicates,
meaning all records from both queries are included.
Performance-wise, UNION ALL is faster than UNION because it doesn’t
perform the additional operation of eliminating duplicates.
18. What are scalar functions in SQL?
Scalar functions operate on individual values and return a single value as
a result. They are often used for formatting or converting data. Common
examples include:
LEN(): Returns the length of a string.
ROUND(): Rounds a numeric value.
CONVERT(): Converts a value from one data type to another.
Example:
SELECT LEN('Example') AS StringLength;
19. What is the difference between the NVL and NVL2 functions?
NVL(): Replaces a NULL value with a specified replacement
value. Example: NVL(Salary, 0) will replace NULL with 0.
NVL2(): Evaluates two arguments:
If the first argument is NOT NULL, returns the second argument.
If the first argument is NULL, returns the third argument.
Example:
SELECT NVL(Salary, 0) AS AdjustedSalary FROM Employees; -- Replaces NULL
with 0
SELECT NVL2(Salary, Salary, 0) AS AdjustedSalary FROM Employees; -- If Salary is
NULL, returns 0; otherwise, returns Salary.
20. How does the RANK() function differ from DENSE_RANK()?
RANK(): Assigns a rank to each row, with gaps if there are ties.
DENSE_RANK(): Assigns consecutive ranks without any gaps.
Example:
SELECT Name, Salary, RANK() OVER (ORDER BY Salary DESC) AS Rank
FROM Employees;
If two employees have the same salary, they get the same rank,
but RANK() will skip a number for the next rank, while DENSE_RANK() will
not.
21. What is the difference between ROW_NUMBER() and RANK()?
ROW_NUMBER(): Assigns a unique number to each row regardless of
ties.
RANK(): Assigns the same number to tied rows and leaves gaps for
subsequent ranks.
Example:
SELECT Name, ROW_NUMBER() OVER (ORDER BY Salary DESC) AS RowNum
FROM Employees;
22. What are common table expressions (CTEs) in SQL?
A CTE is a temporary result set defined within a query. It improves query
readability and can be referenced multiple times.
Example:
WITH TopSalaries AS (
SELECT Name, Salary
FROM Employees
WHERE Salary > 50000
)
SELECT * FROM TopSalaries WHERE Name LIKE 'A%';
23. What are window functions, and how are they used?
Window functions allow you to perform calculations across a set of
table rows that are related to the current row within a result set, without
collapsing the result set into a single row. These functions can be used to
compute running totals, moving averages, rank rows, etc.
Example: Calculating a running total
SELECT Name, Salary,
SUM(Salary) OVER (ORDER BY Salary) AS RunningTotal
FROM Employees;
24. How does indexing improve query performance?
Indexing allows the database to locate and access the rows
corresponding to a query condition much faster than scanning the
entire table. Instead of reading each row sequentially, the database uses
the index to jump directly to the relevant data pages. This reduces the
number of disk I/O operations and speeds up query execution,
especially for large tables.
Example:
CREATE INDEX idx_lastname ON Employees(LastName);
SELECT * FROM Employees WHERE LastName = 'Smith';
The index on LastName lets the database quickly find all rows matching
‘Smith’ without scanning every record.
25. What are the trade-offs of using indexes in SQL databases?
Advantages
Faster query performance, especially for SELECT queries
with WHERE clauses, JOIN conditions, or ORDER BY clauses.
Improved sorting and filtering efficiency.
Disadvantages:
Increased storage space for the index structures.
Additional overhead for write operations (INSERT, UPDATE, DELETE), as
indexes must be updated whenever the underlying data changes.
Potentially slower bulk data loads or batch inserts due to the need to
maintain index integrity.
In short, indexes make read operations faster but can slow down write
operations and increase storage requirements.
26. What is the difference between clustered and non-clustered
indexes?
1. Clustered Index:
Organizes the physical data in the table itself in the order of the indexed
column(s).
A table can have only one clustered index.
Improves range queries and queries that sort data.
Example: If EmployeeID is the clustered index, the rows in the table are
stored physically sorted by EmployeeID.
2. Non-Clustered Index:
Maintains a separate structure that contains a reference (or pointer) to
the physical data in the table.
A table can have multiple non-clustered indexes.
Useful for specific query conditions that aren’t related to the primary
ordering of the data.
Example: A non-clustered index on LastName allows fast lookups by last
name even if the table is sorted by another column.
27. What is a materialized view, and how does it differ from a
standard view?
Standard View:
A virtual table defined by a query.
Does not store data; the underlying query is executed each time the
view is referenced.
A standard view shows real-time data.
Materialized View:
A physical table that stores the result of the query.
Data is precomputed and stored, making reads faster.
Requires periodic refreshes to keep data up to date.
materialized view is used to store aggregated sales data, updated
nightly, for fast reporting.
28. What is a sequence in SQL?
A sequence is a database object that generates a series of unique
numeric values. It’s often used to produce unique identifiers for
primary keys or other columns requiring sequential values.
Example:
CREATE SEQUENCE seq_emp_id START WITH 1 INCREMENT BY 1;
SELECT NEXT VALUE FOR seq_emp_id; -- Returns 1
SELECT NEXT VALUE FOR seq_emp_id; -- Returns 2
29. What is the purpose of the SQL MERGE statement?
The MERGE statement combines multiple operations INSERT, UPDATE,
and DELETE into one. It is used to synchronize two tables by:
Inserting rows that don’t exist in the target table.
Updating rows that already exist.
Deleting rows from the target table based on conditions
Example:
MERGE INTO TargetTable T
USING SourceTable S
ON T.ID = S.ID
WHEN MATCHED THEN
UPDATE SET T.Value = S.Value
WHEN NOT MATCHED THEN
INSERT (ID, Value) VALUES (S.ID, S.Value);
30. How can you handle duplicates in a query without using
DISTINCT?
1. GROUP BY: Aggregate rows to eliminate duplicates
SELECT Column1, MAX(Column2)
FROM TableName
GROUP BY Column1;
2. ROW_NUMBER(): Assign a unique number to each row and filter by
that
WITH CTE AS (
SELECT Column1, Column2, ROW_NUMBER() OVER (PARTITION BY Column1
ORDER BY Column2) AS RowNum
FROM TableName
)
SELECT * FROM CTE WHERE RowNum = 1;
31. What are the best practices for writing optimized SQL queries?
1. Write Simple, Clear Queries:
Avoid overly complex joins and subqueries.
Use straightforward, well-structured SQL that is easy to read and
maintain.
2. Filter Data Early:
Apply WHERE clauses as early as possible to reduce the amount of data
processed.
Consider using indexed columns in WHERE clauses for faster lookups.
3. **Avoid SELECT *:
Retrieve only the columns needed. This reduces I/O and improves
performance.
4. Use Indexes Wisely:
Create indexes on columns that are frequently used in WHERE clauses,
JOIN conditions, and ORDER BY clauses.
Regularly review index usage and remove unused indexes.
5. Leverage Query Execution Plans:
Use execution plans to identify bottlenecks, missing indexes, or
inefficient query patterns.
6. Use Appropriate Join Types:
Choose INNER JOIN, LEFT JOIN, or OUTER JOIN based on the data
relationships and performance requirements.
7. Break Down Complex Queries:
Instead of a single monolithic query, use temporary tables or CTEs to
process data in stages.
8. Optimize Aggregations:
Use GROUP BY and aggregate functions efficiently.
Consider pre-aggregating data if queries frequently require the same
computations.
9. Monitor Performance Regularly:
Continuously analyze query performance and fine-tune as data volumes
grow or usage patterns change.
32. What is the purpose of the SQL PIVOT operator?
The PIVOT operator transforms rows into columns, making it easier to
summarize or rearrange data for reporting.
Example:
Converting a dataset that lists monthly sales into a format that displays each
month as a separate column.
SELECT ProductID, [2021], [2022]
FROM (
SELECT ProductID, YEAR(SaleDate) AS SaleYear, Amount
FROM Sales
) AS Source
PIVOT (
SUM(Amount)
FOR SaleYear IN ([2021], [2022])
) AS PivotTable;
Query Based SQL Interview Questions
33. Write a query to find the second-highest salary of an employee in a
table.
SELECT MAX(Salary) AS SecondHighestSalary
FROM Employee
WHERE Salary < (SELECT MAX(Salary) FROM Employee);
Explanation:
This query identifies the second-highest salary by selecting the
maximum salary that is less than the overall highest salary. The
subquery determines the top salary, while the outer query finds the next
highest value.
Write a query to retrieve employees who earn more than the average salary.
SELECT *
FROM Employee
WHERE Salary > (SELECT AVG(Salary) FROM Employee);
Explanation:
This query fetches details of employees whose salary exceeds the
average salary. The subquery calculates the average salary, and the
main query filters rows based on that result.
34. Write a query to fetch the duplicate values from a column in a
table.
SELECT ColumnName, COUNT(*)
FROM TableName
GROUP BY ColumnName
HAVING COUNT(*) > 1;
Explanation:
The query uses GROUP BY to group identical values and HAVING COUNT(*) >
1 to identify values that appear more than once in the specified column.
84. Write a query to find the employees who joined in the last 30 days.
SELECT *
FROM Employee
WHERE JoiningDate > DATE_SUB(CURDATE(), INTERVAL 30 DAY);
Explanation:
By comparing the JoiningDate to the current date minus 30 days, this
query retrieves all employees who joined within the last month.
35. Write a query to fetch top 3 earning employees.
SELECT *
FROM Employee
ORDER BY Salary DESC
LIMIT 3;
Explanation:
The query sorts employees by salary in descending order and uses LIMIT
3 to return only the top three earners.
36. Write a query to delete duplicate rows in a table without using the
ROWID keyword.
DELETE FROM Employee
WHERE EmployeeID NOT IN (
SELECT MIN(EmployeeID)
FROM Employee
GROUP BY Column1, Column2
);
Explanation:
This query retains only one row for each set of duplicates by keeping the
row with the smallest EmployeeID. It identifies duplicates using GROUP
BY and removes rows not matching the minimum ID.
87. Write a query to fetch common records from two tables.
SELECT *
FROM TableA
INNER JOIN TableB ON TableA.ID = TableB.ID;
Explanation:
An INNER JOIN is used to find rows present in both tables by matching a
common column (in this case, ID).
37. Write a query to fetch employees whose names start and end with
‘A’.
SELECT *
FROM Employee
WHERE Name LIKE 'A%' AND Name LIKE '%A';
Explanation:
The query uses LIKE with wildcard characters to filter rows where
the Name column starts and ends with the letter 'A'.
89. Write a query to display all departments along with the number of employees
in each.
SELECT DepartmentID, COUNT(*) AS EmployeeCount
FROM Employee
GROUP BY DepartmentID;
Explanation:
By grouping employees by their DepartmentID and counting rows in each
group, the query produces a list of departments along with the employee
count.
90. Write a query to find employees who do not have managers.
SELECT *
FROM Employee
WHERE ManagerID IS NULL;
Explanation:
This query selects employees whose ManagerID column is NULL, indicating
they don’t report to a manager.
38. Write a query to fetch the 3rd and 4th highest salaries.
WITH SalaryRank AS (
SELECT Salary, RANK() OVER (ORDER BY Salary DESC) AS Rank
FROM Employee
)
SELECT Salary
FROM SalaryRank
WHERE Rank IN (3, 4);
Explanation:
This query uses the RANK() window function to rank the salaries in
descending order. The outer query then selects the 3rd and 4th highest
salaries by filtering for those ranks.
39. Write a query to transpose rows into columns.
SELECT
MAX(CASE WHEN ColumnName = 'Condition1' THEN Value END) AS Column1,
MAX(CASE WHEN ColumnName = 'Condition2' THEN Value END) AS Column2
FROM TableName;
Explanation:
This query converts specific row values into columns using conditional
aggregation with CASE. Each column’s value is determined based on a
condition applied to rows.
40. Write a query to fetch records updated within the last hour.
SELECT *
FROM TableName
WHERE UpdatedAt >= NOW() - INTERVAL 1 HOUR;
Explanation:
By comparing the UpdatedAt timestamp to the current time minus one
hour, the query retrieves rows updated in the last 60 minutes.
41. Write a query to list employees in departments that have fewer
than 5 employees.
SELECT *
FROM Employee
WHERE DepartmentID IN (
SELECT DepartmentID
FROM Employee
GROUP BY DepartmentID
HAVING COUNT(*) < 5
);
Explanation:
The subquery counts employees in each department, and the main query
uses those results to find employees working in departments with fewer
than 5 members.
42. Write a query to check if a table contains any records.
SELECT CASE
WHEN EXISTS (SELECT * FROM TableName) THEN 'Has Records'
ELSE 'No Records'
END AS Status;
Explanation:
The query uses EXISTS to determine if any rows exist in the table,
returning a status of 'Has Records' or 'No Records' based on the result.
43. Write a query to find employees whose salaries are higher than
their managers.
SELECT e.EmployeeID, e.Salary
FROM Employee e
JOIN Employee m ON e.ManagerID = m.EmployeeID
WHERE e.Salary > m.Salary;
Explanation:
This query joins the Employee table with itself to compare employee
salaries to their respective managers’ salaries, selecting those who earn
more.
44. Write a query to fetch alternating rows from a table.
WITH RowNumbered AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RowNum
FROM Employee
)
SELECT *
FROM RowNumbered
WHERE RowNum % 2 = 0;
Explanation:
This query assigns a sequential number to each row using ROW_NUMBER(),
then selects rows where the row number is even, effectively fetching
alternating rows. The ORDER BY (SELECT NULL) is used to avoid any specific
ordering and just apply a sequential numbering.
45. Write a query to find departments with the highest average salary.
SELECT DepartmentID
FROM Employee
GROUP BY DepartmentID
ORDER BY AVG(Salary) DESC
LIMIT 1;
Explanation:
Grouping by DepartmentID and ordering by the average salary in
descending order, the query returns the department with the highest
average.
46. Write a query to fetch the nth record from a table.
WITH OrderedEmployees AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RowNum
FROM Employee
)
SELECT *
FROM OrderedEmployees
WHERE RowNum = n;
Explanation:
This query uses ROW_NUMBER() to generate a sequential number for
each row. The outer query then retrieves the row where the number
matches the desired nth position. The approach is portable across most
databases.
47. Write a query to find employees hired in the same month of any
year.
SELECT *
FROM Employee
WHERE MONTH(JoiningDate) = MONTH(CURDATE());
Explanation:
By comparing the month of JoiningDate to the current month, the query
selects all employees who were hired in that month regardless of the
year.