# Database Design and SQL: Comprehensive Guide for Developers
## Introduction to Databases
A database is an organized collection of structured data stored and accessed
electronically. Databases are fundamental to modern applications, enabling
efficient data storage, retrieval, and management. This guide covers relational
databases, SQL, database design principles, and best practices for building
scalable database systems.
## Relational Databases and Tables
### Understanding Relational Database Concepts
A relational database organizes data into tables with rows and columns. Tables are
related through keys, creating a structured system for data management.
**Tables**: Collections of related data organized in rows (records) and columns
(fields).
**Rows**: Individual records containing data for a single entity.
**Columns**: Fields containing specific attributes of the entity.
**Primary Key**: A unique identifier for each row in a table.
**Foreign Key**: A reference to a primary key in another table, creating
relationships.
**Schema**: The structure of a database including tables, columns, data types, and
constraints.
### Creating Tables
```sql
CREATE TABLE users (
id INT PRIMARY KEY AUTO_INCREMENT,
username VARCHAR(50) UNIQUE NOT NULL,
email VARCHAR(100) UNIQUE NOT NULL,
password_hash VARCHAR(255) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
CREATE TABLE posts (
id INT PRIMARY KEY AUTO_INCREMENT,
user_id INT NOT NULL,
title VARCHAR(200) NOT NULL,
content TEXT NOT NULL,
published BOOLEAN DEFAULT FALSE,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (user_id) REFERENCES users(id)
);
CREATE TABLE comments (
id INT PRIMARY KEY AUTO_INCREMENT,
post_id INT NOT NULL,
user_id INT NOT NULL,
content TEXT NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (post_id) REFERENCES posts(id),
FOREIGN KEY (user_id) REFERENCES users(id)
);
```
## Data Types
### Common SQL Data Types
**Numeric Types**:
- INT: Integer values (-2,147,483,648 to 2,147,483,647)
- BIGINT: Large integers (-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807)
- DECIMAL(precision, scale): Fixed-point numbers (e.g., prices)
- FLOAT/DOUBLE: Floating-point numbers
**String Types**:
- VARCHAR(length): Variable-length string up to specified length
- CHAR(length): Fixed-length string
- TEXT: Large text (up to 65,535 characters in MySQL)
- LONGTEXT: Very large text
**Date and Time Types**:
- DATE: Date in YYYY-MM-DD format
- TIME: Time in HH:MM:SS format
- DATETIME: Date and time combined
- TIMESTAMP: Date and time with automatic update
**Boolean**:
- BOOLEAN: TRUE or FALSE (stored as 0 or 1)
## Constraints
Constraints enforce data integrity and consistency:
```sql
CREATE TABLE employees (
id INT PRIMARY KEY AUTO_INCREMENT,
-- NOT NULL constraint
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
-- UNIQUE constraint
email VARCHAR(100) UNIQUE NOT NULL,
employee_id VARCHAR(20) UNIQUE,
-- DEFAULT constraint
department VARCHAR(50) DEFAULT 'Unassigned',
status VARCHAR(20) DEFAULT 'Active',
-- CHECK constraint
salary DECIMAL(10, 2) CHECK (salary > 0),
age INT CHECK (age >= 18),
-- Foreign key constraint
manager_id INT,
FOREIGN KEY (manager_id) REFERENCES employees(id),
-- Composite primary key
UNIQUE KEY unique_combo (first_name, last_name, email)
);
```
## SELECT Statements
### Basic SELECT Queries
```sql
-- Select all columns
SELECT * FROM users;
-- Select specific columns
SELECT id, username, email FROM users;
-- Select with WHERE clause
SELECT * FROM posts WHERE user_id = 5;
-- Multiple conditions
SELECT * FROM posts
WHERE user_id = 5 AND published = TRUE;
SELECT * FROM posts
WHERE published = TRUE OR user_id IN (5, 10, 15);
-- NOT operator
SELECT * FROM posts WHERE published != TRUE;
SELECT * FROM posts WHERE NOT published;
```
### Filtering with WHERE
```sql
-- Comparison operators
SELECT * FROM products WHERE price > 100;
SELECT * FROM products WHERE price >= 100;
SELECT * FROM products WHERE price < 50;
SELECT * FROM products WHERE price <= 50;
SELECT * FROM products WHERE price <> 100;
-- LIKE pattern matching
SELECT * FROM users WHERE username LIKE 'john%'; -- Starts with 'john'
SELECT * FROM users WHERE username LIKE '%smith'; -- Ends with 'smith'
SELECT * FROM users WHERE username LIKE '%an%'; -- Contains 'an'
-- BETWEEN
SELECT * FROM orders WHERE order_date BETWEEN '2025-01-01' AND '2025-12-31';
-- IN operator
SELECT * FROM users WHERE role IN ('admin', 'moderator', 'user');
-- IS NULL
SELECT * FROM users WHERE middle_name IS NULL;
```
### Sorting and Limiting
```sql
-- ORDER BY
SELECT * FROM posts ORDER BY created_at DESC;
SELECT * FROM products ORDER BY price ASC, name ASC;
-- LIMIT
SELECT * FROM posts LIMIT 10; -- First 10 rows
SELECT * FROM posts LIMIT 10 OFFSET 20; -- Skip first 20, get 10
-- LIMIT for pagination
SELECT * FROM posts
ORDER BY created_at DESC
LIMIT 10 OFFSET (page - 1) * 10;
```
## Joins
Joins combine data from multiple tables:
```sql
-- INNER JOIN - Only matching rows
SELECT users.username, posts.title, posts.created_at
FROM posts
INNER JOIN users ON posts.user_id = users.id
WHERE posts.published = TRUE;
-- LEFT JOIN - All rows from left table + matching from right
SELECT users.username, COUNT(posts.id) as post_count
FROM users
LEFT JOIN posts ON posts.user_id = users.id
GROUP BY users.id, users.username;
-- RIGHT JOIN - All rows from right table + matching from left
SELECT posts.title, users.username
FROM posts
RIGHT JOIN users ON posts.user_id = users.id;
-- FULL OUTER JOIN - All rows from both tables
SELECT *
FROM users
FULL OUTER JOIN posts ON users.id = posts.user_id;
-- CROSS JOIN - Cartesian product
SELECT * FROM users CROSS JOIN roles;
-- Multiple joins
SELECT users.username, posts.title, comments.content
FROM comments
INNER JOIN posts ON comments.post_id = posts.id
INNER JOIN users ON comments.user_id = users.id
WHERE posts.id = 42;
```
## Aggregation Functions
Aggregate functions perform calculations on groups of rows:
```sql
-- COUNT
SELECT COUNT(*) FROM users; -- Total users
SELECT COUNT(DISTINCT user_id) FROM posts; -- Unique users with posts
-- SUM
SELECT SUM(amount) FROM orders;
SELECT SUM(amount) FROM orders WHERE user_id = 5;
-- AVG
SELECT AVG(price) FROM products;
-- MIN/MAX
SELECT MIN(price) as cheapest, MAX(price) as most_expensive FROM products;
SELECT MIN(created_at) as oldest_post FROM posts;
-- GROUP BY
SELECT user_id, COUNT(*) as post_count
FROM posts
GROUP BY user_id;
-- GROUP BY with multiple columns
SELECT category, status, COUNT(*) as count
FROM products
GROUP BY category, status;
-- HAVING clause (filtering groups)
SELECT user_id, COUNT(*) as post_count
FROM posts
GROUP BY user_id
HAVING COUNT(*) > 5;
```
## INSERT, UPDATE, DELETE
### Inserting Data
```sql
-- Single row insert
INSERT INTO users (username, email, password_hash)
VALUES ('john_doe', 'john@example.com', 'hashed_password');
-- Multiple rows insert
INSERT INTO users (username, email, password_hash)
VALUES
('alice', 'alice@example.com', 'hash1'),
('bob', 'bob@example.com', 'hash2'),
('charlie', 'charlie@example.com', 'hash3');
-- Insert with calculated values
INSERT INTO posts (user_id, title, content)
SELECT id, CONCAT('Welcome ', username), 'Welcome to our platform'
FROM users;
```
### Updating Data
```sql
-- Update specific records
UPDATE users
SET email = 'newemail@example.com'
WHERE id = 5;
-- Update multiple columns
UPDATE posts
SET published = TRUE, updated_at = NOW()
WHERE id = 42;
-- Conditional update
UPDATE products
SET price = price * 1.1
WHERE category = 'Electronics';
-- Update from another table
UPDATE orders
SET status = 'shipped'
WHERE user_id IN (
SELECT id FROM users WHERE country = 'USA'
);
```
### Deleting Data
```sql
-- Delete specific records
DELETE FROM comments WHERE id = 123;
-- Delete with condition
DELETE FROM posts WHERE published = FALSE;
-- Delete from multiple tables (careful!)
DELETE FROM orders WHERE user_id NOT IN (
SELECT id FROM users
);
-- Clear entire table (careful - no WHERE clause)
DELETE FROM temp_data;
```
## Database Normalization
Normalization reduces data redundancy and improves data integrity.
### First Normal Form (1NF)
Each column contains atomic (indivisible) values:
```sql
-- ✗ Not normalized - hobbies is a list
CREATE TABLE users_bad (
id INT PRIMARY KEY,
name VARCHAR(100),
hobbies VARCHAR(500) -- "reading, gaming, coding"
);
-- ✓ Normalized - separate table for hobbies
CREATE TABLE users_good (
id INT PRIMARY KEY,
name VARCHAR(100)
);
CREATE TABLE user_hobbies (
user_id INT,
hobby VARCHAR(50),
FOREIGN KEY (user_id) REFERENCES users_good(id)
);
```
### Second Normal Form (2NF)
All non-key attributes depend on the entire primary key:
```sql
-- ✗ Not normalized - instructor_phone depends only on instructor_id
CREATE TABLE classes_bad (
class_id INT,
instructor_id INT,
instructor_phone VARCHAR(20),
PRIMARY KEY (class_id, instructor_id)
);
-- ✓ Normalized - separate table for instructors
CREATE TABLE instructors (
id INT PRIMARY KEY,
phone VARCHAR(20)
);
CREATE TABLE classes_good (
id INT PRIMARY KEY,
instructor_id INT,
FOREIGN KEY (instructor_id) REFERENCES instructors(id)
);
```
### Third Normal Form (3NF)
Non-key attributes depend only on the primary key, not on other non-key attributes:
```sql
-- ✗ Not normalized - city depends on state
CREATE TABLE customers_bad (
id INT PRIMARY KEY,
state_id INT,
state_name VARCHAR(50),
city VARCHAR(50)
);
-- ✓ Normalized
CREATE TABLE states (
id INT PRIMARY KEY,
name VARCHAR(50)
);
CREATE TABLE customers_good (
id INT PRIMARY KEY,
state_id INT,
city VARCHAR(50),
FOREIGN KEY (state_id) REFERENCES states(id)
);
```
## Indexes
Indexes improve query performance by creating a data structure for faster lookups:
```sql
-- Create single column index
CREATE INDEX idx_email ON users(email);
-- Create composite index
CREATE INDEX idx_user_created ON posts(user_id, created_at);
-- Create unique index
CREATE UNIQUE INDEX idx_username ON users(username);
-- Create full-text index
CREATE FULLTEXT INDEX idx_content ON posts(content);
-- Drop index
DROP INDEX idx_email ON users;
-- View indexes
SHOW INDEX FROM users;
```
## Transactions
Transactions ensure data consistency by grouping operations:
```sql
-- Basic transaction
START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- Commit if successful
COMMIT;
-- Or rollback if there's an error
ROLLBACK;
-- Transaction with error handling (pseudocode)
BEGIN TRANSACTION
INSERT INTO logs VALUES (...)
UPDATE users SET last_login = NOW() WHERE id = 5
COMMIT
```
## Stored Procedures
Stored procedures are prewritten SQL statements stored in the database:
```sql
-- Create procedure
DELIMITER //
CREATE PROCEDURE GetUserPosts(IN user_id INT)
BEGIN
SELECT id, title, created_at
FROM posts
WHERE user_id = user_id
ORDER BY created_at DESC;
END//
DELIMITER ;
-- Call procedure
CALL GetUserPosts(5);
-- Procedure with output parameter
DELIMITER //
CREATE PROCEDURE CountUserPosts(IN user_id INT, OUT post_count INT)
BEGIN
SELECT COUNT(*) INTO post_count FROM posts WHERE user_id = user_id;
END//
DELIMITER ;
-- Call with output
CALL CountUserPosts(5, @count);
SELECT @count;
```
## Views
Views are virtual tables created from query results:
```sql
-- Create view
CREATE VIEW active_users AS
SELECT id, username, email, created_at
FROM users
WHERE status = 'Active';
-- Query view
SELECT * FROM active_users;
-- Update view
UPDATE active_users SET email = 'newemail@example.com' WHERE id = 5;
-- Drop view
DROP VIEW active_users;
-- Create complex view with joins
CREATE VIEW user_post_stats AS
SELECT
u.id,
u.username,
COUNT(p.id) as post_count,
MAX(p.created_at) as latest_post
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
GROUP BY u.id, u.username;
```
## Database Design Best Practices
### 1. Use Appropriate Data Types
```sql
-- ✓ Good - appropriate types
CREATE TABLE products (
id INT PRIMARY KEY,
name VARCHAR(200),
price DECIMAL(10, 2),
stock_quantity INT,
is_active BOOLEAN
);
-- ✗ Bad - inappropriate types
CREATE TABLE products_bad (
id VARCHAR(50) PRIMARY KEY,
name TEXT,
price VARCHAR(50),
stock_quantity VARCHAR(50),
is_active VARCHAR(10)
);
```
### 2. Use Foreign Keys for Referential Integrity
```sql
CREATE TABLE orders (
id INT PRIMARY KEY AUTO_INCREMENT,
user_id INT NOT NULL,
order_date DATETIME,
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);
```
### 3. Create Indexes on Frequently Queried Columns
```sql
-- Index on columns used in WHERE, JOIN, and ORDER BY
CREATE INDEX idx_user_id ON posts(user_id);
CREATE INDEX idx_created_at ON posts(created_at);
```
### 4. Use Naming Conventions
```sql
-- Consistent naming
CREATE TABLE users ( -- Plural table names
id INT PRIMARY KEY, -- id for primary key
user_id INT, -- foreign keys with entity_id
created_at TIMESTAMP, -- snake_case for columns
is_active BOOLEAN -- boolean columns start with 'is_' or 'has_'
);
```
### 5. Document Your Schema
```sql
-- Add comments to explain purpose
ALTER TABLE users COMMENT = 'Stores user account information';
ALTER TABLE users MODIFY COLUMN email VARCHAR(100) COMMENT = 'User email address
must be unique';
```
## Performance Optimization
### Query Optimization
```sql
-- ✗ Inefficient - full table scan
SELECT * FROM large_table WHERE column LIKE '%value%';
-- ✓ More efficient - with index
CREATE INDEX idx_column ON large_table(column);
SELECT * FROM large_table WHERE column = 'value';
-- ✗ Inefficient - subquery
SELECT * FROM orders
WHERE user_id IN (SELECT id FROM users WHERE country = 'USA');
-- ✓ More efficient - join
SELECT o.* FROM orders o
INNER JOIN users u ON o.user_id = u.id
WHERE u.country = 'USA';
```
### Connection Pooling
Use connection pooling in applications to reuse database connections efficiently
rather than creating new connections for each query.
## Conclusion
Database design and SQL are essential skills for modern developers. Understanding
relational concepts, normalization principles, and SQL syntax enables the creation
of efficient, maintainable databases. Proper schema design, appropriate indexing,
and query optimization are crucial for building scalable applications that perform
well as data volumes grow. Regular practice with these concepts ensures developers
can make informed decisions when designing and maintaining databases.