CS102
Database Management System
Introduction to Old Models
Data Categories Based on Source
Data are stored in documents (files) of three kinds:
Unstructured: a file stored on your PC; analyzed by Text Mining (about 90% of all data)
Semi-structured: a web page stored on the WWW; analyzed by Web Mining (WLM, WSM, WCM, WUM)
Structured: a database; analyzed by Data Mining
Related areas: Opinion Mining & Sentiment Analysis, Advice Mining
Another Data Categorization
Quantitative vs. Categorical
Quantitative data
Discrete (counting values, e.g., number of trees in a garden)
Continuous (measurement values, e.g., height of students in a class)
Interval-scaled (only differences are meaningful)
Ratio-scaled (both differences and ratios are meaningful)
Categorical data
Nominal (non-numeric values, e.g., color names: red, blue, black)
Ordinal (non-numeric values with an ordering relation, e.g., small < medium < large)
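The scales above differ in which operations make sense on their values. The sketch below encodes that rule; the `Scale` enum, the `meaningful_operations` function, and the example attributes (T-shirt size, temperature in °C) are illustrative assumptions, not part of the original slides.

```python
from enum import Enum

class Scale(Enum):
    NOMINAL = "nominal"    # categories only, e.g., color names
    ORDINAL = "ordinal"    # categories with an order, e.g., S < M < L
    INTERVAL = "interval"  # numeric; only differences are meaningful
    RATIO = "ratio"        # numeric; differences and ratios are meaningful

def meaningful_operations(scale: Scale) -> set[str]:
    """Return the operations that make sense on values of the given scale."""
    ops = {"equality"}                      # every scale supports = and !=
    if scale in (Scale.ORDINAL, Scale.INTERVAL, Scale.RATIO):
        ops.add("ordering")                 # < and > are meaningful
    if scale in (Scale.INTERVAL, Scale.RATIO):
        ops.add("difference")               # a - b is meaningful
    if scale is Scale.RATIO:
        ops.add("ratio")                    # a / b is meaningful
    return ops

# Hypothetical attributes, one per scale.
examples = {
    "color of a car": Scale.NOMINAL,
    "T-shirt size": Scale.ORDINAL,
    "temperature in °C": Scale.INTERVAL,    # 20 °C is not "twice as hot" as 10 °C
    "height of a student in cm": Scale.RATIO,
}
for attribute, scale in examples.items():
    print(f"{attribute}: {sorted(meaningful_operations(scale))}")
```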
Database, DBMS & Database System
Database: A collection of related data.
Database Management System (DBMS): A software package or system that facilitates the creation and maintenance of a computerized database.
Database System: The DBMS software together with the
data itself. Sometimes, the applications are also
included.
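As a small illustration of these three terms, the sketch below uses SQLite (through Python's built-in `sqlite3` module) as the DBMS, the rows as the data, and the script itself as an application. The `student` table and its rows are hypothetical.

```python
import sqlite3

# SQLite plays the role of the DBMS; ":memory:" keeps the database in RAM
# so the example leaves no file behind.
conn = sqlite3.connect(":memory:")

# Creation: define the structure of the database.
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Maintenance: insert and update data through the DBMS.
conn.executemany("INSERT INTO student (id, name) VALUES (?, ?)",
                 [(1, "Asha"), (2, "Ravi")])
conn.execute("UPDATE student SET name = ? WHERE id = ?", ("Ravi K.", 2))
conn.commit()

rows = conn.execute("SELECT id, name FROM student ORDER BY id").fetchall()
print(rows)  # [(1, 'Asha'), (2, 'Ravi K.')]
```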
Classification of DBMS
Single-user (typically used with personal computers)
vs. multi-user (most DBMSs).
Centralized (uses a single computer with one database)
vs. distributed (uses multiple computers, multiple
databases)
Centralized and Client-Server DBMS Architectures
Centralized DBMS:
Combines everything into a single system, including DBMS software, hardware, application programs, and user interface processing software.
Users can still connect through a remote terminal; however, all processing is done at the centralized site.
A Physical Centralized Architecture
Basic 2-tier Client-Server Architectures
Specialized Servers with Specialized functions
Print server
File server
DBMS server
Web server
Email server
Clients can access the specialized servers as needed
Logical two-tier client server architecture
Clients
Provide appropriate interfaces through a client software module
to access and utilize the various server resources.
Clients may be diskless machines, or PCs or workstations with disks that have only the client software installed.
Connected to the servers via some form of network (e.g., a local area network (LAN) or a wireless network).
DBMS Server
Provides database query and transaction services to the clients
Relational DBMS servers are often called SQL servers, query servers, or transaction servers.
Applications running on clients use an Application Program Interface (API) to access server databases via a standard interface such as:
ODBC: Open Database Connectivity standard
JDBC: for Java programming access
The client and server must have the appropriate client-module and server-module software installed for ODBC or JDBC.
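In Python, the DB-API 2.0 specification (PEP 249) plays a role roughly analogous to ODBC/JDBC: application code talks to a standard connection/cursor interface, and a driver module implements it for a particular DBMS. The sketch below assumes the built-in `sqlite3` driver; SQLite is embedded rather than client-server, so only the API side of the architecture is shown. The `course` table and the `courses_with_credits` function are hypothetical.

```python
import sqlite3

def courses_with_credits(conn, min_credits):
    """Query through the standard DB-API interface (cursor/execute/fetchall).

    Only the connect() call and the placeholder style ('?' in sqlite3)
    are driver-specific; the rest follows the common DB-API pattern.
    """
    cur = conn.cursor()
    # A bound parameter is passed separately from the SQL text.
    cur.execute("SELECT code FROM course WHERE credits >= ? ORDER BY code",
                (min_credits,))
    result = [row[0] for row in cur.fetchall()]
    cur.close()
    return result

# The driver-specific part: sqlite3 is Python's built-in DB-API driver.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (code TEXT PRIMARY KEY, credits INTEGER)")
conn.executemany("INSERT INTO course VALUES (?, ?)",
                 [("CS101", 3), ("CS102", 4), ("MA101", 2)])
print(courses_with_credits(conn, 3))  # ['CS101', 'CS102']
```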
Two Tier Client-Server Architecture
A client program may connect to several DBMSs, sometimes
called the data sources.
In general, data sources can be files or other non-DBMS software
that manages data.
Other variations of clients are possible: e.g., in some object
DBMSs, more functionality is transferred to clients including data
dictionary functions, optimization and recovery across multiple
servers, etc.
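A client that combines two data sources, one managed by a DBMS and one a plain file, might look like the sketch below. The table, the CSV layout, and the names are hypothetical, and `io.StringIO` stands in for a file on disk so the example is self-contained.

```python
import csv
import io
import sqlite3

# Data source 1: a DBMS (an in-memory SQLite database).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])

# Data source 2: a plain file that no DBMS manages.
marks_file = io.StringIO("student_id,mark\n1,88\n2,75\n")

# The client program combines both sources itself.
names = dict(db.execute("SELECT id, name FROM student"))
report = {names[int(row["student_id"])]: int(row["mark"])
          for row in csv.DictReader(marks_file)}
print(report)  # {'Asha': 88, 'Ravi': 75}
```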
Three Tier Client-Server Architecture
Common for Web applications
Intermediate Layer called Application Server or Web Server:
Stores the web connectivity software and the business logic part of the application used to
access the corresponding data from the database server
Acts like a conduit for sending partially processed data between the database server and
the client.
Three-tier Architecture Can Enhance Security:
Database server only accessible via middle tier
Clients cannot directly access database server
Three-tier client-server architecture
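The three tiers can be sketched as three layers of a single program: the client holds a reference only to the application tier, which holds the business logic and the only handle to the database. All class and method names below are hypothetical; in a real deployment the separation is enforced by the network (e.g., the database server accepting connections only from the middle tier), not by Python naming conventions.

```python
import sqlite3

class DatabaseTier:
    """Tier 3: the database server (here an in-memory SQLite database)."""
    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
        self._conn.execute("INSERT INTO account VALUES (1, 100.0)")

    def query(self, sql, params=()):
        return self._conn.execute(sql, params).fetchall()

class ApplicationTier:
    """Tier 2: business logic; the only component that talks to the database."""
    def __init__(self, db: DatabaseTier):
        self._db = db  # never handed out to clients

    def balance_report(self, account_id: int) -> str:
        rows = self._db.query("SELECT balance FROM account WHERE id = ?", (account_id,))
        if not rows:
            return "no such account"
        # Partially processed data: formatted text rather than raw rows.
        return f"Account {account_id}: {rows[0][0]:.2f}"

# Tier 1: the client sees only the application tier's interface.
app = ApplicationTier(DatabaseTier())
print(app.balance_report(1))  # Account 1: 100.00
print(app.balance_report(7))  # no such account
```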
Classification of DBMSs
Based on the data model used
Traditional: Relational, Network, Hierarchical.
Emerging: Object-oriented, Object-relational.
Other classifications
Single-user (typically used with personal computers)
vs. multi-user (most DBMSs).
Centralized (uses a single computer with one database)
vs. distributed (uses multiple computers, multiple databases)
Variations of Distributed DBMSs (DDBMSs)
Homogeneous DDBMS
Heterogeneous DDBMS
Federated or Multidatabase Systems
Distributed Database Systems have now come to be known as
client-server based database systems because:
They do not support a totally distributed environment, but rather a set
of database servers supporting a set of clients.
DATA MODELS
Data Models
Data Model:
A set of concepts to describe the structure of a database, the operations for manipulating
these structures, and certain constraints that the database should obey.
Data Model Structure and Constraints:
Constructs are used to define the database structure
Constructs typically include elements (and their data types) as well as groups of elements
(e.g. entity, record, table), and relationships among such groups
Constraints specify some restrictions on valid data; these constraints must be enforced at all
times
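In a relational DBMS, structure and constraints are declared together and the DBMS then enforces the constraints on every change. The sketch below assumes SQLite via `sqlite3`; the `student` table, the 0.0 to 4.0 GPA range, and the `try_insert` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Structure: a table construct grouping typed elements (columns).
# Constraints: a key, a NOT NULL rule, and a CHECK on valid values.
conn.execute("""
    CREATE TABLE student (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        gpa  REAL CHECK (gpa BETWEEN 0.0 AND 4.0)
    )
""")
conn.execute("INSERT INTO student VALUES (1, 'Asha', 3.6)")  # valid row

def try_insert(row):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert((2, "Ravi", 4.7)))  # False: violates the CHECK constraint
print(try_insert((1, "Mina", 3.0)))  # False: duplicate primary key
print(try_insert((3, None, 2.5)))    # False: name is NOT NULL
print(try_insert((4, "Mina", 3.0)))  # True
```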
Data Models (continued)
Data Model Operations:
These operations are used for specifying database retrievals and
updates by referring to the constructs of the data model.
Operations on the data model may include basic model operations (e.g.
generic insert, delete, update) and user-defined operations (e.g.
compute_student_gpa, update_inventory)
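The two kinds of operations can be contrasted in a short sketch: generic INSERT, UPDATE, and DELETE statements, followed by a user-defined `compute_student_gpa` operation built on top of them. The function name comes from the text, but its body, the `enrollment` table, and the credit-weighted GPA formula are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollment "
             "(student_id INTEGER, course TEXT, credits INTEGER, grade_points REAL)")

# Basic model operations: generic insert, update, delete.
conn.executemany("INSERT INTO enrollment VALUES (?, ?, ?, ?)",
                 [(1, "CS101", 3, 4.0), (1, "CS102", 4, 3.0), (1, "MA101", 2, 2.0)])
conn.execute("UPDATE enrollment SET grade_points = 3.5 "
             "WHERE student_id = 1 AND course = 'CS102'")
conn.execute("DELETE FROM enrollment WHERE course = 'MA101'")

def compute_student_gpa(conn, student_id):
    """User-defined operation (hypothetical): credit-weighted grade point average."""
    total_points, total_credits = conn.execute(
        "SELECT SUM(credits * grade_points), SUM(credits) "
        "FROM enrollment WHERE student_id = ?", (student_id,)).fetchone()
    if not total_credits:
        return None  # the student has no enrollments
    return round(total_points / total_credits, 2)

print(compute_student_gpa(conn, 1))  # 3.71  (i.e., (3*4.0 + 4*3.5) / 7)
```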
History of Data Models
Hierarchical Model
Network Model
Relational Model
Object-oriented Data Models
Object-Relational Models
History of Data Models (2)
Hierarchical Model:
Initially implemented in a joint effort by IBM and North
American Rockwell around 1965. Resulted in the IMS family of
systems.
Hierarchical model was formalized based on the IMS system.
This model is structured like a tree, with records forming the nodes and parent-child relationships (links) forming the branches of the tree.
History of Data Models (3)
Advantages:
Simple to construct and operate
Corresponds to a number of natural hierarchically organized domains, e.g.,
organization (“org”) chart
Language is simple:
Uses constructs like GET, GET UNIQUE, GET NEXT, GET NEXT WITHIN PARENT,
etc.
Disadvantages:
Navigational and procedural nature of processing
Database is visualized as a linear arrangement of records
Little scope for “query optimization”
Represents only one-to-many (1:N) relationships
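The navigational, record-at-a-time style can be sketched with a small in-memory tree. This does not reproduce IMS or its DL/I calls; `hierarchical_sequence` and `get_next_within_parent` are hypothetical helpers that only loosely mimic GET NEXT and GET NEXT WITHIN PARENT, and the org-chart data is made up.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    rtype: str                                      # record type, e.g. "DEPT" or "EMP"
    data: dict
    children: list = field(default_factory=list)    # parent-child (1:N) links

def hierarchical_sequence(root):
    """Yield records in preorder (parent before children, left to right),
    the order a GET NEXT-style scan follows."""
    yield root
    for child in root.children:
        yield from hierarchical_sequence(child)

def get_next_within_parent(parent, rtype):
    """Yield descendants of `parent` with the given record type, one at a time,
    loosely mimicking repeated GET NEXT WITHIN PARENT calls."""
    for rec in hierarchical_sequence(parent):
        if rec is not parent and rec.rtype == rtype:
            yield rec

# A hypothetical org chart: each record has exactly one parent.
company = Record("COMPANY", {"name": "Acme"}, [
    Record("DEPT", {"name": "Sales"}, [Record("EMP", {"name": "Asha"}),
                                       Record("EMP", {"name": "Ravi"})]),
    Record("DEPT", {"name": "R&D"}, [Record("EMP", {"name": "Mina"})]),
])
sales = company.children[0]
print([e.data["name"] for e in get_next_within_parent(sales, "EMP")])  # ['Asha', 'Ravi']
```

Because every record has a single parent, an employee who works in two departments would have to be stored twice; this is the one-to-many limitation listed above.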
History of Data Models (4)
Network Model:
In the network model, data are represented as records connected by links.
It is an improvement over the hierarchical model.
A record type can have multiple owners, and we can have many-to-many (M:N) relationships among records.
The first network DBMS was implemented by Honeywell in 1964-65 (the IDS system).
It was adopted heavily due to support from CODASYL (Conference on Data Systems Languages).
It was later implemented in a large variety of systems: IDMS, DMS 1100 (Unisys), IMAGE (HP), and VAX-DBMS.
History of Data Models (5)
Advantages:
Able to model complex relationships.
Can handle most situations for modeling using record types
and relationship types.
Language is navigational; uses constructs like FIND, FIND
member, FIND owner, FIND NEXT within set, GET, etc.
Disadvantages:
Navigational and procedural nature of processing
Database contains a complex array of pointers that thread
through a set of records.
Little scope for automated “query optimization”
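The owner/member sets and the way an M:N relationship is represented through a link record type with two owners can be sketched with plain Python objects and pointers. This is not CODASYL DML; `connect`, `find_members`, and `find_owner` are hypothetical helpers that loosely mirror connecting a member to a set, FIND NEXT within set, and FIND OWNER. The STUDENT/COURSE/ENROLL schema is made up for illustration.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # identity semantics: records are addressed like pointers
class Record:
    rtype: str
    key: str
    owners: dict = field(default_factory=dict, repr=False)   # set name -> owner record
    members: dict = field(default_factory=dict, repr=False)  # set name -> member records

def connect(owner, member, set_name):
    """Insert `member` into the set occurrence owned by `owner`."""
    owner.members.setdefault(set_name, []).append(member)
    member.owners[set_name] = owner

def find_members(owner, set_name):
    """Walk the members of one set occurrence (like repeated FIND NEXT within set)."""
    return list(owner.members.get(set_name, []))

def find_owner(member, set_name):
    """Follow the link from a member back to its owner (like FIND OWNER)."""
    return member.owners[set_name]

# M:N between STUDENT and COURSE via a link record type ENROLL with two owners.
asha, ravi = Record("STUDENT", "Asha"), Record("STUDENT", "Ravi")
db_course, os_course = Record("COURSE", "DB"), Record("COURSE", "OS")
for student, course in [(asha, db_course), (asha, os_course), (ravi, db_course)]:
    link = Record("ENROLL", f"{student.key}-{course.key}")
    connect(student, link, "STUDENT_ENROLL")
    connect(course, link, "COURSE_ENROLL")

# Navigational query: which courses does Asha take?
courses = [find_owner(e, "COURSE_ENROLL").key for e in find_members(asha, "STUDENT_ENROLL")]
print(courses)  # ['DB', 'OS']
```

The program has to spell out every step of the path through the pointers, which is why there is little room for an automatic query optimizer.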
History of Data Models (6)
Relational Model:
Data is organized into tables made up of rows and columns.
The tables are referred to as relations in a relational data model.
Rows of the table are referred to as tuples and the columns of a table
are referred to as attributes.
Proposed in 1970 by E.F. Codd (IBM)
Now in several commercial products (e.g. DB2, ORACLE, MS
SQL Server, SYBASE, INFORMIX).
Several free open-source implementations exist, e.g., MySQL and PostgreSQL; the relational model is the most dominant for developing database applications.
SQL relational standards: SQL-89 (SQL1), SQL-92 (SQL2), SQL-99 (SQL3), …
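The terminology maps directly onto what a relational DBMS returns: a table is a relation, each row a tuple, each column an attribute. The sketch below assumes SQLite via `sqlite3` and a hypothetical `student` table; note that the query states which tuples are wanted rather than how to navigate to them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A relation (table) with three attributes (columns).
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
# Each inserted row is a tuple of the relation.
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Asha", "CS"), (2, "Ravi", "EE"), (3, "Mina", "CS")])

# A declarative SQL query: it says *what* is wanted, not how to find it.
cur = conn.execute("SELECT name FROM student WHERE dept = ? ORDER BY name", ("CS",))
attributes = [d[0] for d in cur.description]  # attribute names of the result
tuples = cur.fetchall()
print(attributes, tuples)  # ['name'] [('Asha',), ('Mina',)]
```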
History of Data Models (7)
Advantages:
Ease of use: Data in tables consisting of rows and columns is
much easier to understand.
Flexibility: Different tables can be linked and data can be
extracted to give information in the form in which it is desired.
Precision: The use of relational algebra and relational calculus in manipulating the relations ensures that there is no ambiguity.
Security: Security control and authorization can be
implemented by moving sensitive attributes into a separate
relation with its own authorization controls.
Data Independence: It is achieved more easily with the normalized structure used in a relational database than with the more complicated tree or network structures.
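The flexibility point can be made concrete with a join: two tables are linked on a shared attribute and the result is shaped into the form required. The `dept` and `student` tables are hypothetical, and SQLite via `sqlite3` stands in for any relational DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept    (dept_id TEXT PRIMARY KEY, dept_name TEXT);
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT,
                          dept_id TEXT REFERENCES dept(dept_id));
    INSERT INTO dept VALUES ('CS', 'Computer Science'), ('EE', 'Electrical Eng.');
    INSERT INTO student VALUES (1, 'Asha', 'CS'), (2, 'Ravi', 'EE'), (3, 'Mina', 'CS');
""")

# Link the two tables on dept_id and shape the result as needed:
# here, the number of students per department name.
rows = conn.execute("""
    SELECT d.dept_name, COUNT(*) AS n_students
    FROM student AS s JOIN dept AS d ON s.dept_id = d.dept_id
    GROUP BY d.dept_name
    ORDER BY d.dept_name
""").fetchall()
print(rows)  # [('Computer Science', 2), ('Electrical Eng.', 1)]
```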
History of Data Models (8)
Disadvantages:
Poor Performance: If relationships must be established among a large number of tables, and the tables themselves are large, performance in responding to SQL queries can suffer.
Physical Storage Consumption: Operations such as joins can consume a lot of physical storage.
Poor Interpretability: It is difficult to trace relationships between different tuples, in comparison to the hierarchical and network models.
A Summarized View (Src. webeduclick.com)
History of Data Models (9)
Object-Oriented Model:
Several models have been proposed for implementation in a database system.
One set comprises models of persistent O-O programming languages such as C++ (e.g., in OBJECTSTORE or VERSANT) and Smalltalk (e.g., in GEMSTONE).
Additionally, there are systems such as O2, ORION (at MCC, later ITASCA), and IRIS (at H.P., used in Open OODB).
Object Database Standard: ODMG-93, ODMG-version 2.0,
ODMG-version 3.0.
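The core idea of a persistent O-O language, that program objects outlive the program run without being mapped to tables by hand, can be hinted at with Python's standard `shelve` module. This is only a sketch of object persistence: `shelve` is not an object database and offers none of the querying, transaction, or ODMG features listed above. The `Course` and `Student` classes are hypothetical.

```python
import os
import shelve
import tempfile
from dataclasses import dataclass, field

@dataclass
class Course:
    code: str
    title: str

@dataclass
class Student:
    name: str
    courses: list = field(default_factory=list)  # references to other objects

    def course_codes(self):
        return [c.code for c in self.courses]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "objects")
    # Store a whole object graph under a key; no table mapping is written by hand.
    with shelve.open(path) as store:
        store["asha"] = Student("Asha", [Course("CS102", "Database Management System")])

    # Reopening the store (as a later run of the program would) returns the
    # object with its references intact and its methods usable.
    with shelve.open(path) as store:
        restored = store["asha"]
        print(restored.course_codes())  # ['CS102']
```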
History of Data Models (10)
Object Relational Model:
Most Recent Trend. Started with Informix Universal Server.
Relational systems incorporate concepts from object
databases leading to object-relational.
Exemplified in the latest versions of Oracle (e.g., Oracle 10g), DB2, SQL Server, and other DBMSs.
Standards included in SQL-99 and expected to be enhanced in
future SQL standards.
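Object-relational systems let a relational column hold values of a user-defined type (SQL-99 added structured user-defined types for this). SQLite is not an object-relational DBMS, but Python's `sqlite3` driver can mimic the idea on the client side with adapters and converters, as in the sketch below. The `Point` type, the `POINT` column type, the `;`-separated storage format, and the `campus_building` table are all assumptions for illustration; in a true ORDBMS the type would be defined and understood by the server.

```python
import sqlite3
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    """A user-defined type to be stored in a relational column."""
    x: float
    y: float

# Tell the driver how to turn a Point into a storable value ...
sqlite3.register_adapter(Point, lambda p: f"{p.x};{p.y}")
# ... and how to rebuild a Point from columns declared with type POINT.
sqlite3.register_converter(
    "POINT", lambda raw: Point(*map(float, raw.decode().split(";"))))

conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE campus_building (name TEXT, location POINT)")
conn.execute("INSERT INTO campus_building VALUES (?, ?)", ("Library", Point(1.5, 2.0)))

name, loc = conn.execute("SELECT name, location FROM campus_building").fetchone()
print(name, loc)  # Library Point(x=1.5, y=2.0)
```

Note that adapters and converters registered this way are global to the `sqlite3` module, so they affect every connection in the process.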
Database Applications
Almost every information system
Traditional Applications:
Numeric and Textual Databases
Specialized Applications:
Multimedia Databases
Geographic Information Systems (GIS)
Data Warehouses
Real-time and Active Databases
Many other applications
Resources
Graph database (neo4j): https://neo4j.com/developer/graph-database/
NoSQL database (MongoDB): https://www.mongodb.com/nosql-explained
Database-related conferences