DSA-SQL For Data Analysis
DSA-SQL For Data Analysis
Big Query
SQLite Database
The Facilitator
SQL for Data Analysis Course breakdown
Week 2: SQL Operators and Clauses 3rd Week of SQL for Data Analysis
SQL Aggregation Part 1 SQL Aggregation Part 2
➢ SQL – Clause: where, order by, Group ➢ Introduction to SQL Aggregation
by, Having, Top. ➢ Introduction to Null
➢ SQL – Operators: Like, AND, OR, ➢ Aggregation: COUNT, SUM, AVERAGE,
Between, Not Between, IN, NOT IN. MIN, MAX etc
➢ SQL - UNION Vs UNION ALL, INTERSECT.
➢ SQL- Wild Card SQL Views
➢ Introduction to SQL Views
SQL Joins ➢ SQL – Create views
➢ SQL - update views
➢ Introduction to Join ➢ SQL – Drops
➢ Type of Join ➢ SQL – Rename views
Physical Server
Structured Data
PostgreSQL,
MySQL, Power BI
Oracle DB
Tableau
Python
SQL follows the following rules: A database is an organized collection of data that
is stored and managed in a structured way to allow
• Structure query language is for easy access, retrieval, and manipulation.
not case sensitive. Generally,
keywords of SQL are written in
uppercase. Advantage of Databases:
➢ Data Integrity: Ensures accuracy and
• Statements of SQL are consistency of data.
dependent on text lines. We ➢ Security: Protects data from unauthorized
can use a single SQL statement access and breaches.
on one or multiple text line.
➢ Backup and Recovery: Allows data to be
• Using the SQL statements, you recovered in case of loss or corruption.
can perform most of the ➢ Concurrency: Supports multiple users accessing
actions in a database. the database simultaneously.
• SQL depends on tuple ➢ Scalability: Ability to handle increasing
relational calculus and amounts of data and users without performance
relational algebra degradation.
➢ Efficient Data Management: Databases are
optimized for quick access to large volumes of
data, enabling fast retrieval of information
through queries.
How Databases Store Data
Databases store data in a structured format using tables, which are composed of rows and columns. Each
table represents a specific type of data, and each row (or record) in the table represents a single entry,
such as a customer or transaction. The columns (or fields) define the attributes of the data, such as a
customer’s name, age, or account number.
➢ If you have used Excel, you should already be familiar with tables
➢ Tables have rows and columns just like Excel.
➢ Database tables for instance are organized by column
➢ Each column must have a unique name,
➢ You will notice, that some columns contain numbers, while other contain texts. In a spreadsheet,
each cell can have its own data types.
➢ But in databases tables, all the data in a column must be of the same type.
➢ This makes performing analysis on database tables pretty simples, while the data type must be
consistent.
How Databases Store Data
➢ Data in Databases is stored in tables that can be thought of just like Excel spreadsheets.
➢ All the data in the same column must match in terms of data type.
➢ Consistent columns types are one of the main reasons working with databases is fast.: Often
databases hold a LOT of data. So, knowing that the columns are all the same type of data means that
obtaining data from a database can still be fast
Why SQL
There are some major advantages to using traditional relational databases, which
we interact with using SQL.
The five most apparent are:
SQL Commands
✓ SQL commands are instructions. It is used to communicate with the database.
✓ SQL can perform various tasks like create a table, add data to tables, drop the table,
modify the table, set permission for users.
DDL changes the structure of the table like creating a table, deleting a table, altering
a table, etc.
All the command of DDL are auto-committed that means it permanently save all the changes
in the database
➢ Insert: The INSERT statement is a SQL query. It is used to insert data into the row
of a table.
➢ Update: This command is used to update or modify the value of a column in the table.
TCL commands are used to manage transactions in the database. These are used to manage the changes made DML
Statement (INSERT, DELETE and UPDATE only). It also allows statements to be grouped into logical transactions
COMMIT
Commit command is used to permanently save any transaction.
Commit command is used to save all the transactions to the database.
Example:
DELETE FROM CUSTOMERS WHERE AGE = 25
COMMIT;
Rollback
This command restores the database to last committed state
Rollback command is used to undo transactions that have not already been saved to the database.
Example:
DELETE FROM CUSTOMERS WHERE AGE = 25;
ROLLBACK;
Savepoint:
It is used to roll the transaction back to a certain point without rolling back the entire transaction.
Savepoint command is used to temporarily save a transaction so that you can rollback to that point
whenever necessary.
SQL - Data Types
✓ SQL Datatype is used to define the values that a column can contain.
✓ Every column is required to have a name and data type in the database table.
SQL Datatype
Numeric Datatype
E.g. String Datatype Date Datatype
Binary Datatype Bit, tinyint, smallint, E.g.: E.g:
E.g. int, bigint, decimal, Char, Varchar, nchar, Date, Datetime,
Yes/No, True/False money, float, real etc. Varbinary, nvarchar Timestamp.
Database Keys
Surrogate Key
Candidate Key
Primary Key
Primary Key:
A special type of key that uniquely identifies each record in a table. Each table can have only one
primary key.
Example: Employee_id in the Employee table.
Foreign Key:
A field in one table that uniquely identifies a row of another table, creating a relationship between the
two tables.
Example: Employee_id in the Salary table is a foreign key (FK) that references the Employee_id in the
Employee table (PK)
Surrogate Key:
A surrogate key is a unique identifier for each record in a table, typically created by the database
itself (e.g. an auto-incrementing integer)
Surrogate Key Vs Primary Key
Primary Key can have a real meaning, like Driving License, Matric No, while surrogate key is usually auto-
incrementing integer with no real meaning.
Composite Key:
Composite key (also known as compound key concatenated key) is a group of two or more columns that
identifies each row of a table uniquely.
Example: In salary tables, Employee_id and Salary_month_year are combined to identify each row uniquely
in salary table.
Candidate Key:
Candidate key is a key of a table which can be selected as primary key. A table can have
multiple candidate keys, out of which one can be selected as a primary key.
Example: Employee_id, License_Number, and Passport_Number
Alternate key:
Alternate key is a candidate key, currently not selected as a primary key of the table
Example: License_Number and Passport_Number
SQL Clauses
SQL clauses are essential components of SQL (Structured Query Language) that define how
queries interact with the database. They are used to specify conditions, modify data, and
control how results are returned. Here’s an overview of some of the most common SQL
clauses
SQL Clause
Group By
Having Clause Order By Clause
Clause
Group BY Clause
SQL GROUP BY statement is used to arrange identical data into groups.
• The GROUP BY statement is used with the SQL SELECT statement.
• The GROUP BY statement follows the WHERE clause in a SELECT statement and precedes the
ORDER BY clause.
• The GROUP BY statement is used with aggregation function
Having Clause
• HAVING clause is used to specify a search condition for a group or an aggregate.
• Having is used in a GROUP BY clause. If you are not using GROUP BY clause then you
can use HAVING function like a WHERE clause
Order BY
• The ORDER BY clause sorts the result-set in ascending or descending order.
• It sorts the records in ascending order by default. DESC keyword is used to sort the
records in descending order.
SQL Operators
An SQL operator is a reserved words or a character used primarily in an SQL statement's
WHERE clause to perform operation(s), such as comparisons and arithmetic operations.
These Operators are used to specify conditions in an SQL statement and to serve as
conjunctions for multiple conditions in a statement
SQL Operator
8 !< It checks if the left operand value is not less than the right operand value, if yes
then condition becomes true.
9 !> It checks if the left operand value is not greater than the right operand value, if
yes then condition becomes true.
SQL Logical Operators
S/N Operator Description
4 BETWEEN It is used to search for values that are within a set of values.
SQL Joins
LEFT JOIN
The SQL left join returns all the values from left table and the matching values from the right table. If there is
no matching join value, it will return NULL.
RIGHT JOIN
In SQL, RIGHT JOIN returns all the values from the values from the rows of right table and the matched values from
the left table. If there is no matching in both tables, it will return NULL.
SQL Aggregation
Function
2. SUM()Purpose:
Calculates the total sum of a numeric column.
Example: SELECT SUM(salary) FROM employees;
Use: Adds up all the values in a numeric column.
3. AVG()Purpose:
Calculates the average value of a numeric column.
Example: SELECT AVG(salary) FROM employees;
Use: Returns the mean of the values in a numeric column.
4. MIN()Purpose:
Returns the smallest (minimum) value in a column.
Example: SELECT MIN(salary) FROM employees;
Use: Finds the lowest value in a column.
5. MAX()Purpose:
Returns the largest (maximum) value in a column.
Example: SELECT MAX(salary) FROM employees;
Use: Finds the highest value in a column.
SQL Set Operation
SQL set operations allow you to combine the results of two or more SELECT queries. These
operations treat the result sets of each query as mathematical sets, enabling you to
perform set operations like union, intersection, and difference on them. SQL provides the
following set operations:
• An SQL View is a virtual table that is created based on the result set of a SQL query. Unlike a regular table, a
view does not store data itself; instead, it dynamically retrieves data from one or more underlying tables
whenever the view is queried.
• A view behaves like a table in SQL, allowing you to select, update, insert, and delete data (with some
limitations).
• The data in a view is not stored physically; it is generated dynamically when the view is accessed.
• A view is defined using a SELECT statement that can join multiple tables, filter rows, and select specific
columns.
Security
• Views can be used to restrict access to certain data in a table by exposing only specific columns or rows to the
user.
• For example, you can create a view that only shows certain fields of a sensitive table, hiding the rest from the
user.
Simplicity
• Views simplify complex queries. Instead of writing a complex query repeatedly, you can create a view and use it
as a simple table.
SELECT
product_name,
CASE category_id
WHEN 1 THEN 'Electronics'
WHEN 2 THEN 'Furniture'
WHEN 3 THEN 'Clothing'
ELSE 'Other'
END AS category_name
FROM products;
In this example, the CASE statement checks the category_id of each product and returns the
corresponding category name.
If category_id is 1, it returns 'Electronics’;
if 2, it returns 'Furniture’;
and so on.
Key Points:
✓ Flexibility:
CASE can be used in SELECT, UPDATE, INSERT, and ORDER BY clauses, making it very
versatile..
✓ Readability:
CASE statements can make your SQL queries more readable by replacing nested IF
statements.
✓ Efficiency:
Even though CASE adds some complexity to your queries, it often improves efficiency by
reducing the need for multiple queries or conditional logic in your application code.
Thank you