COMP101:
Foundations
                           of Information
                              Systems
Lecture 5:                      Grant Dick
The Relational Model and         Department of
                              Information Science
SQL (I)
Creating databases
Before we start…
• Lab this week — build SQL database
• Lab assessment via SQL submission
                                       1
Today
• Brief introduction to the relational model
  of databases (expanded in later lectures)
• Basics of Structured Query Language (SQL)
  • Creating a table
  • Data types
                                               2
Logical data models
• Abstract away details of physical data storage
  • i.e., bits, bytes, records, files, …
  • This is known as (physical) data independence
• Normally have three parts:
  • Structure defines how data are organised, e.g., files,
    relations, objects
  • Integrity rules define how data are kept correct and
    consistent
  • Operators define how data are manipulated
                   (“Data models in database management”, ACM SIGMOD Record 11(2), E. F. Codd, 1981)
                                                                                                  3
Relational model of data
The inspiration behind SQL databases
                                       4
The Relational Model of
Data                                                            E.F. Codd, 1923–2003
• Devised by Edgar Codd in early
  1970s, and further developed by Chris
  Date
                                                                C.J. Date, 1941-
• Predominant data model for databases
• Simple, logical structure based upon relations
  (tables), tuples (rows) and attributes (columns)
• High correlation to ERDs (but “relation” ≠ “relationship”!)
                                                                                   5
The Information Principle
In a relational database, all information is
represented by relations (specifically, values
within attributes within tuples within relations)
      (“A relational model of data for large shared data banks”, CACM 13(6), E. F. Codd, 1970)
  • Everything identified by value or name (a special
    kind of value)
  • All collections are sets of unordered, distinct values
  • No unnecessary duplication of data
                                                                                            6
Relational Model Components
• Attributes are named “value containers”:
  • Each attribute contains a single value from a known
    domain, describing aspects of a larger object
• Tuples are unordered sets of attributes that
  group together to describe an object
• Relations are unordered sets of tuples
  (describing a collection of like objects)
                                                          7
Properties of Relational Model
• All the tuples of a relation must have the same
  attributes
• Tuples are uniquely identified by a special sets
  of attributes (called keys)
• A relation has one primary key
• Tuples in one relation are linked through foreign
  keys that reference a primary key of another
  relation
                                                      8
Properties of Relational Model
• Manipulation of data through set
  operations (e.g., union, intersection, set
  difference)
• Because of the uniqueness property in the
  model, manipulating data can be rigorously
  defined (e.g., relational algebra, relational
  calculus)
                                                  9
Mapping from an ERD to relational
(more detail shortly)
    Lectures 3-4                            Today               Lectures 5-8
        ERDs                            Relational Model         SQL
        Entity type                     Relation                 Table
        Entity instance                 Tuple                    Row
        Attribute         Attribute                              Column
        Unique identifier Primary key                            Primary key
        Relationship      Foreign key†                           Foreign key†
†   1:1 and 1:M relationships only (i.e., must transform M:N)
                                                                                10
Database Management
Systems (DBMS)
Implementation and management of logical database models
                                                           11
    Every database has a schema
    • Definition of the database’s structure
      • In some language, e.g., SQL
•    Processed to create instance of database
     • By a database management system (DBMS)
                                                12
Database management system
• Software to create and manage databases
• Implements a logical data model (e.g., relational)
• Provides infrastructure for security, concurrent
  access, and failure recovery
• Often provide facilities to integrate data validation,
  integrity rules, and triggered/automated data
  operations.
                                                           13
COMP 101 uses PostgreSQL DBMS
• “The World's Most Advanced Open Source
  Relational Database”
• High standards compliance
• Available on Windows, Linux, and MacOS
• Stand-alone server implementation (we
  provide one in COMP 101) (demo)
                                           14
SQL
Structured Query Language
                            15
Structured Query Language
• Special-purpose language for
   • creating (DDL — Data Definition Language),
   • managing (DCL — Data Control Language), and
   • manipulating and querying (DML — Data Manipulation Language)
     databases
• Not (strictly speaking) pure relational:
   • Different terminology (tables, columns, rows)
   • Breaks some aspects (e.g., allows duplicate rows)
   • ”Good enough” for most purposes
• Designed as a stop-gap prototype (1974), became de facto
  standard (1986)
                                                                    16
Some notes on SQL
syntax
• Generally not case-sensitive:
   • cReAtE tAbLe somEtaBLE (a INTeGer); is valid
   • String comparison is case-sensitive ('A' ≠ 'a')
• Single quotes for strings ('abc' not "abc")
   • Watch out for ‘ ’ (invalid) vs. ' ' (valid)
• Whitespace isn’t significant (cf. Python)
• Statements end with semicolon (;)
• -- for line comments
• List syntax (a, b, c) is common
                                                       17
Creating databases in SQL
SQL Data Definition Language (DDL)
                                     18
The essential DDL statements
CREATE <schema
object> ALTER
<schema object> DROP
<schema object>
• e.g., CREATE TABLE, ALTER TABLE, DROP
                                     19
TABLE
        20
    Types of schema “object” in SQL
    (Note: not “objects” as in Java!)
    • In general, each has its own CREATE, ALTER,
      and
      DROP statement
                 Logical (schema)       Administrative Physical
                 TABLE                  USER           INDEX
Focus in
COMP 101         VIEW                   ROLE           CLUSTER
                 DOMAIN                 SCHEMA         TABLESP
                                                       ACE
                 …                      …              …
                                                                  21
Creating tables
CREATE TABLE <name> (
   <column-name> <data-type>
      [DEFAULT
      <expression>]
      [<inline-constraint>],
   <column-name> <data-type>
      [<default>]
      [<inline-constraint>],
   ...,
   [<out-of-line-constraint>, ...]
);
          Note
          1. Things inside angle brackets should be replaced appropriately
             (e.g.
          <name> with Student, Employee etc.).
          2. Anything inside square brackets [] is optional.                 21
COMP 101 SQL conventions
• Use clear singular nouns
  • e.g., Employee instead of Employees
• Use proper nouns (i.e., capitalise!)
  • e.g., Student, Customer, Patient, …
  • e.g., Student ID ⇒ Student_ID
• Replaces spaces between nouns with underscores
• SQL may be case insensitive, but you aren’t!
  • Use ALL CAPS for SQL keywords
                                                 22
SQL demonstration
Creating a table
                    23
A simple student table (Oracle)
-- Create a table called Student
CREATE TABLE Student (
   Student_ID
   VARCHAR(10),
   First_Name
   VARCHAR(50),
   Last_Name
   VARCHAR(50)
);
-- Insert a row into the table
INSERT INTO Student (Student_ID, First_Name, Last_Name)
   VALUES ('1121978', 'Grant', 'Dick');
-- Retrieve and display the data stored in the Student table
SELECT *
FROM Student;
-- Remove the table entirely
                                                               24
DROP TABLE Student;
                      25
Dealing with different types of
data: SQL data types
A whirlwind tour — this doesn’t really become well-understood until
you start using them in table definitions (essentially, this is reference
material for the labs)
                                                                            26
Data types
• SQL standard defines many data types
• Most DBMSs don’t implement all of the
  standard types
• Most DBMSs add their own proprietary, non-
  standard types
                                               27
Text data
types
CHAR(<n>)
  • Fixed-length of <n> bytes, padded with blanks
  • Use this if value length never changes
  • e.g., to store Paper_Code at Otago (e.g. COMP101), use CHAR(7)
VARCHAR(<n>)
  • Variable-length text, not blank-padded
  • Must specify maximum length (of <n> bytes) — DBMS typically limits to a few
    thousand
  • e.g., to store First_Name of a customer, use VARCHAR(50)
TEXT
  • Character Large Object (CLOB)
  • “long” variable-length text (multiple thousands of bytes)
  • e.g., to store a blog article or email, use TEXT
                                                                                  28
Numbers: Exact integer
SMALLINT (smaller than) INTEGER (smaller than) BIGINT
NUMERIC(<p>), DECIMAL(<p>)
  • <p> = precision = number of significant digits
  • e.g., NUMERIC(4) can store values –9999 to 9999
Only store values as a numeric data type if you
need to do calculations with them (otherwise
CHAR/VARCHAR)
                                                   29
Numbers: Exact decimal
NUMERIC(<p>,<s>), DECIMAL(<p>,<s>)
 • NUMERIC may be slightly preferable (but often identical)
 • <p> = precision = number of significant digits
 • <s> = scale = number of digits after decimal point
  (effectively)
 • e.g., NUMERIC(5,2) can store values –999.99 to
   999.99
 • Important to get scale & precision correct
  (especially for money!)
                                                              30
Numbers: Approximate decimal
REAL (also DOUBLE PRECISION)
CAUTION: Floating point values are always an
approximation (Lectures 17-19) and shouldn’t
be used for anything requiring precise
calculations (e.g., money)
                                               31
Boolean (True/False) data types
BOOLEAN
Two possible states (TRUE, FALSE), represented
by:
• TRUE: TRUE, 'yes', 't', 'y', 1, 'on'
• FALSE: FALSE, 'no', 'f', 'n', 0, 'off'
                                                 32
Dates and times
DATE
TIME [WITH TIME ZONE]
TIMESTAMP [WITH TIME ZONE]
Notes:
  •   DATE has precision of one day
  •   Range 1/1/4713 BC → 31/12/294276 AD (no year 0)
  •   Date/time literal values expressed as strings (e.g., '12-MAY-16', '2016-05-12’)
  •   CURRENT_DATE returns current date
  •   Date/Time manipulation by arithmetic operators (+, -)
                                                                                    33
                                   See “The Problem with Time & Timezones” in
                                   the Computerphile channel on YouTube for
                                   the full insanity of handling dates and times
                                   https://www.youtube.com/watch?v=-5wpm-gesOY
Dates and times are
extremely easy to get
wrong — be warned!
Time is heavily dependent on context!
                                                                                   34
What next in SQL? (in COMP 101)
• Add constraints to tables (e.g., primary keys,
  foreign keys, “check” clauses)
• CRUD (create, update, retrieve, delete)
• Views to create permanent queries
• Not covered: user management, security,
  performance management
                                                   35
Summary
• Databases store data in a more organised and consistent
  form than ad hoc solutions using an underlying data model
• SQL databases are heavily inspired by the relational model
• Creating tables in SQL requires careful thought about
  appropriate data types
• We’ve still only covered structural aspects — more SQL to
  come in labs and lectures (constraints and querying)
                                                               36
Thanks!
  Questions?