Data modeling
Prof. Amos DAVID
http://ui-n2.loria.fr
Course content
Why data modeling ? Entity-Relation model Relational model We will focus on
WHY WHAT
HOW
2
Why data modeling ?
Illustration with an example
From information problem statement to data specification
A case study
Problem statement
We want an information system on students
Questions
What is an information system ? What do we mean by students ? Why do we want the information system ?
The system should provide answers to what questions ?
4
What is an information system ?
Functional characteristics of an information system
Store information (creation) Retrieve information (access) Update information (modification, deletion)
Components of an information system
Users
End-user Information system manager/administrator
Information base (database) User interface
To implement the functional characteristics Between the end-user and the information base
General schema of an IRS : functional approach
User
Information problem
Information problem transformation into access expression
Common access methods * navigation * query
Information base
Matching
Store /Update
Results
Objects
6
What do we mean by students ?
Students viewed by who ?
Admission office ? Post graduate school ? Registrars office ? A university department ? Alumni association ? By a state government ? Foreign government ? By other structures ?
University structures
Government structures
Students considered over what period ?
From admission to graduation only ? Consider ex-students ?
Who are the end-users ?
This will determine how students are viewed This will determine the final use of the information to be accessed Examples of end-users in the university structure
The VC The registrar The Dean The HOD Any category of student (in-course, ex-student)
8
Why data modeling ?
Represent the real world
Focus on the use of the elements Represent only the necessary elements Represent the relationships between the selected elements DO NOT ignore or neglect necessary elements or relations
9
Why data modeling ?
For efficient computerization
Reduce data redundancy
Disk space problem Volume of data transfer Describe the computed elements user notice technical notice
Objects of documentation
For the programmer For the system designer / manager For the end-users
Guaranty data integrity
Valid information irrespective of context
10
Example on data integrity
Admission office
STUDENT (N, names, marital status, gender, age, degree, address)
The department
STUDENT (N, names, date of birth, courses, address)
The administrative office
PERSONNEL (N, names, marital status, faculty, department, address)
11
General schema of an IRS : Users and usage centered approach
IRS (What is seen by the end-user) Implemented for Database Management System (DBMS) Determined by
Developed using
Event Operation Data
Data modeling Real world
12
Entity-Relation model
13
Represent the real world elements with four main concepts
Entity Attributes Relation Cardinality Employs graphic representation Intuitive approach
14
Entity
The basic conceptual or real element Examples
A student A personnel A town
Entities have real existence (the instances) They are identifiable
Amos DAVID Charles ROBERT Ibadan
15
Entity
Each entity is associated with a set of attributes The instances of an entity have the same characteristics They have the same set of attributes Examples
All students have the same set of attributes All members of staff have the same set of attributes
16
Attributes
Attributes are used for describing the entities The entities and their attributes are determined according to the database project Taking into account the functions to be accomplished Examples
Represent students at the department for course registration Represents members of staff for salaries and promotions One of the attributes must be an IDENTIFIER Its value is unique for each entity
17
Attributes
How to reduce redundancy
Avoid structured attributes
Structured attributes should be decomposed
Example
Names First name, last name Address Street n, street name, town, local government, state
Decomposing structured attributes allows an easy access to the component elements
Example
The town element of an address can be easily extracted instead of performing string extraction on the structured element
18
Attributes
Examples
Name, Address Amos DAVID; Dept computer science, UI Ibadan, Ibadan, Oyo state Olu OJO; 23 Aderemo street, Agbowo, AgbowoLGA, Ibadan, Oyo state Uche KALU; 5 market road, Anambra, Anambra state
Problems with this representation
The addresses do not have the same number of elements, so how can one obtain a specific component ?
The nth element ? Starting from the nth character ? How can one locate the town within an address ?
19
Attributes
Examples Dissociate structured elements
Name, Street number, Street name, Town, Local government, State
Amos DAVID; Dept computer science; UI Ibadan; Ibadan; ;Oyo state Olu OJO; 23; Aderemo street; Agbowo; Ibadan; ; Oyo state Uche KALU; 5; market road; Anambra; ; Anambra state
Efficiency
Each entry has the same number of elements A component element can be easily extracted using its position Example
The town value is always at the 4th position The state value is always at the last position The position can be in string functions or for the colon numbers in tables
20
Attributes reducing redundancy
Avoid attributes whose value is a list ; a new entity should be created Example (memory redundancy)
Courses as attribute of Degree We do not know the number of courses for a degree
Create DEGREE and COURSE Associate the two entities (to be seen later)
21
Example (memory redundancy)
Computer science, course 1, course 2, course 3 Biology, course 3, course 6, course 7, course 20 Chemistry, course 7, course 3, course 8, course 9, course 10 In terms of memory allocation, how many courses should be anticipated ? Because of the unknown number of courses, the anticipated number will either be too few or too many
22
Attributes
How to ensure data integrity
Identify the functional dependency between attributes
Example of dependency
A town belongs to only one state Towns are unique there is dependency between town and state if the town is known, the state can be determined unambiguously
In a case of functional dependency, create a new entity to regroup the dependent attributes Create a relation between the new entity and the original one
23
Attributes
Examples
Name, Street number, Street name, Town, Local government, State 1.Amos DAVID; Dept computer science; UI Ibadan; Ibadan; Oyo state 2.Olu OJO; 23; Aderemo street; Agbowo; Ibadan; Oyo state 3.Samuel UCHE; 213 Sango road; Dugbe; Ibadan; Oyo state 4.Uche KALU; 5; market road; Anambra; ; Anambra state
Entities 1, 2 and 3 are redundant, prone to non integrity
Entering entities 1, 2 and 3 (town, state) three times may produce typographical error Should a town change from one state to another, all the entities are no longer valid
All the entities must be modified
24
Entity graphical representation
An entity is represented by a rectangle divided into two parts The name of the entity is represented at the upper part The names of the attributes are represented at the lower part The identifier is underlined
PERSON Number Last name First name Date of birth
TOWN Town name State Local government
25
Relation
A relation specifies the association between two or more entities Example
Town and Person
The relation should specify the semantic of the association
A person lives in a town
26
Relation
A relation is symbolized by an oval with its semantic inside the oval A relation is further specified by cardinalities that indicate the number of associated instances Example
A person lives in a minimum of one town and in a maximum of 1 town A town is inhabited by a minimum of one person and a maximum of n (indicating several)
TOWN Name Surface area State Local government
(1,1)
Lives in (1,n)
PERSON Number Last name First name Date of birth
27
Relation
A relation may sometimes have an attribute
The attribute describes the relation and not the entities associated Example
The number of an article bought by a client as well as the date are neither an attribute of the client nor that of the article, but an attribute of the association
The attributes of the relation are indicated at the lower part of the oval that represents the relation
28
Relation
CLIENT Number Last name First name Date of birth ARTICLE Name Unit price (1,m)
(1,n)
Bought Quantity Date
29
Relation
Maximum cardinality
This indicates the maximum cardinalities on either side of a relation
Example
[n:1] (1,1)
TOWN Name Surface area State Local government
Lives in (1,n)
PERSON Number Last name First name Date of birth
30
Relation
How to read the relations : recall
A person lives in a minimum of 1 town and in a maximum of 1 town A town is inhabited (is lived) by a minimum of one person and a maximum of n person (several)
[n:1]
TOWN Name Surface area State Local government
(1,1)
Lives in (1,n)
PERSON Number Last name First name Date of birth
31
Proposed methodology
Identify an entity Develop fully the entity and chose the identifier Associate the entity with existing ones if and where necessary Establish the cardinalities
32
Relation
A relation can be between two same entities Example
A person is the father of another person WARNING : A person is the father of 0 or many persons ; A person has as father 1 and only 1 person
(0,n) PERSON Number Last name First name Date of birth (1,1)
Father of
33
Exercise
In a shopping center, a client can buy one or more articles of various quantities. Propose an ER model for representing the elements of information necessary for billing the client.
34
REMARKS
The entity and the association as described above correspond to the description of the concepts. In the literature, they are termed entity-type and relationtype. Their instantiations (existence) are termed entity and relation.
For us we use entity and entity-type ; relation and relation-type indifferently.
35
Relational model
36
Relational model
The basic concepts
Relation Domain Attribute Key N-uplet
37
Relation (Table)
STUDENT
Last name
Attributes
First name Date of birth Degree
N-Uplets
Domain (same types of value : names)
38
Domain
Represents the data type of a column Can be defined in form of intention or extension In form of intention, it is specified by a formal definition
Example
Integer values Character set of less than 20 characters
In form of extension, it is specified as a finite list of values
Example
Town : {Oyo, Ibadan, Lagos}
39
Relation
A relation R is represented as R(A1, , An)
Where
A1 takes its values from D1 An takes its values from Dm m <= n
40
Attribute
An attribute specifies a constituent of the relation (a particular column of the table)
Attributes are unique within a relation (each column must be distinguished from the others)
Two columns should not have the same name
Two attributes may have the same domain
Example
First name, Last name : NAMES
41
Cartesian product of relation
Let R(A1, A2) be a relation The Cartesian product of the relation represents all the possible combinations of the values of the attributes Example
Cars-parked (Make, Color) Where
Make makes of cars (Toyota, Peugeot) Color colors of cars (red, black, white)
42
Cartesian product of a relation
Toyota, red Toyota, black Toyota, white Peugeot, red Peugeot, black Peugeot, white
43
The intention of a relation
The intention of a relation specifies how the relation should be interpreted Example
Cars-parked (Make, Color) Cars parked in front of the department of computer science, University of Ibadan
44
N-uplet
Represents the extension of a relation It is a Cartesian product of attributes
A line of the table Also called a record
Example
Person (First name, Surname)
Amos, David John, Olaoye
45
Schema of a relation
The schema of a relation specifies the intention of the relation and the associated integrity constraints
46
Constraint of data integrity
Constraint on a single attribute
Example
The values of vehicle makes should be German vehicles
Constraint based on two attributes
Example
The date of marriage should be date of birth
Constraint on the table n-uplets
Example
The number of registrations for a degree in one year should be limited to one
47
Maximum Key
A set of attributes of a relation whose values are distinct for each n-uplet Example
Person (Matriculation number, First name, Last name, Date of birth, email) All the attributes combined can form the maximum key
48
Minimum key (the key)
A key is the minimum set of attributes of a relation whose values are distinct for each n-uplet Examples
Student N, first name, last name
The Student N, first name combined can be used as key, but only the Student N is sufficient
Student N, first name, last name, email
Either Student N or email can be used as key
49
Functional dependency
In the relation R(X, Y, Z),
There is functional dependency between X and Y if and only if the value of Y is determined by the value of X and the value of X is unique
Example
Person (N, FN, LN, town, state)
There is FD between town and state
50
Relations in 1st normal form
A relation is said to be in 1st normal form if all the attributes are of single values
51
Relations in 2nd normal form
A relation is in 2nd normal form if and only if it is in 1st normal form and there is no FD between a subset of the key and the rest of the attributes
This mean that the key of the relation must be a minimum key
52
Relations in 3rd normal form
A relation is in 3rd normal form if it is in 2nd normal form and there is no FD between non key attributes
53
From ERM to RM
1. 2. 3. 4.
5.
To each entity corresponds a relation (Entity name relation name) To each attribute of an entity corresponds an attribute of the relation The identifier of the entity becomes the key of the relation For associations of maximum cardinality [1:n], add the key of the relation on the n side to the relation on the 1 side For associations of maximum cardinality [n:m], a new relation should be created using the concatenation of the keys of the associated relations as the key. The attributes of the association should be added as attributes of the new relation
54
REMARKS
A collection of relations obtained from an entity-relation model as described above will have the following characteristics
Each attribute is single-value The key contains the least number of attributes There are no dependencies between the attributes
A collection of relations that have the above characteristics are considered to be of 3rd normal form
55
Important problem
Some relations resulting from the translation may not have keys In this case, define a new key This happens occasionally particularly in the transformation of NM associations
56
Graph of relations
Specify by pointed arrows the origin of imported attributes Redraw the relations in form of rectangles Use the pointed arrows to link the relations REMARKS
There should be no linked circle
57
Practical
Model the following types of person in the university
the students the members of staff
Be sure to apply the methods for reducing redundancy and guaranteeing data integrity
58
Practical : University students
Description
Each student has a number, a name, an address A student is registered for a degree A student may not register for more than one degree simultaneously A student may take several degrees from the university To a degree is associated a set of courses A degree is managed by a department A course is offered by only one department
Example of questions
What are the courses associated with a degree ? What are the courses taken by a student for a degree ? What are the courses offered by a department ?
Propose an ER model
59
Practical Documentary information system
A library contains the following types of document
Books Journals that contain articles Proceedings that contain articles Write-ups for master and PhD works Books and write-ups are described using title, authors, and a list of keywords Journals and proceedings are described using the title, editor and year of publication Articles are represented using the title, authors, their addresses and a list of keywords The authorized keywords for describing the documents are represented using a thesaurus Propose an ER model and the associated MR to manage the information on the various types of document in the library as well as the thesaurus
60
A thesaurus
List of concepts linked by semantic links The semantic links are
Specific Generic link (hierarchy) See also (association) Used for (synonymous)
61
Example of a thesaurus
Transport
Specific/generic
Plane
Specific/generic
Boat
Specific/generic
Car
Vehicle
See also
Boeing
Airbus
Mercedes
Peugeot
62