DBMS - Unit 3 - Page 1-6

The document outlines the organization of file storage in databases, detailing the hierarchy of storage types including primary, secondary, and tertiary storage, as well as the characteristics of various storage media. It explains file organization methods such as heap files, sorted files, and hashing techniques, along with the importance of indexing for efficient data retrieval. Additionally, it covers operations on files, record types, and the structure of records, including fixed and variable length records, and methods for efficient data access and management.


File Organization

Databases are stored on physical storage media in a system. These storage media form a storage hierarchy that includes:
Primary Storage: Memory that can be accessed directly by the CPU, including main memory and cache memory. It provides fast access but is limited in size and expensive. Being volatile, its contents are lost on power-off or system crash.
Secondary Storage: The most commonly used storage, including magnetic disks and tapes. Nowadays SSDs (Solid State Drives) are also common.
Tertiary Storage: Optical media such as CDs, DVDs, etc.
Storage/Memory Hierarchies
There is a hierarchy of devices used to store data, ranging from the fastest memories, which are expensive and therefore small in capacity, down to slower but cheaper storage of much larger capacity.
Hierarchy at Primary Level
Cache Storage (SRAM): Static RAM, the most expensive memory, used mostly inside the CPU during instruction execution.
Main Memory (DRAM): Dynamic RAM, which provides the work area for programs and data while they are in operation. It is cheaper than SRAM but volatile and slower.
Hierarchy at Secondary Level
Magnetic Disks and Tapes: Hard disks, tape drives, etc.
Flash Memory: Faster than magnetic storage but slower than DRAM; non-volatile and able to store large amounts of data. Used in cameras, etc.
Optical Drives: CDs, DVDs.
Tape Jukeboxes: Very large capacity, in the terabyte range.
Hardware
This section describes the hard disk drive. The disk stores data on a magnetic surface as bits (0 or 1), grouped into bytes. We assume each character is stored as 8 bits, called a byte, and the capacity of a disk is measured in bytes. Old floppy disks stored 1.2 or 1.44 MB of data. Disks can be single sided or double sided. To increase capacity, several disk surfaces are assembled one on top of another into a disk pack.
On the disk surface, information is stored in concentric circles called tracks. A cylinder is a
group of all tracks of same diameter across all surfaces. The tracks are organized or circularly divided
into sectors.
(Figure: structure of a single-sided hard disk drive.)

A disk controller (either SATA or SCSI) interfaces the drive with the system. Another popular interface is SAS (Serial Attached SCSI), which is faster. Data blocks are written or read by means of the read/write head.
Seek time: Time required to position the read/write head on the correct track.
Latency time: This is the time taken to rotate the desired block under the head.
Block Transfer time: This is the time taken to transfer the data block.
Making Data Access More Efficient on Disks
Several techniques are used to make data access on disk faster:

• Buffering of data: The CPU is much faster than the disk, so main memory is used as a buffer between them (CPU → memory → HDD).
• Proper organization of data on disk: Keeping contiguous file blocks on contiguous cylinders avoids unnecessary seek and latency time.
• Reading data ahead of request: When a track is being read, the neighbouring blocks on the track are also read; this saves seek time for later requests. The data read in advance can be kept in the buffer.
• Proper scheduling of I/O requests: If several blocks are to be read, scheduling the requests properly (for example, in the order the head passes over them) reduces seek and latency time, and the read/write head can transfer the data in one sweep.
• Use of a log disk to temporarily hold writes: A separate disk, or a portion of a disk, is used for logging writes; all blocks to be written go to it sequentially, which avoids a separate seek for each write.
• Use of SSD or flash memory for recovery: Where updates are frequent, one way to safeguard information is to write it first to a non-volatile SSD/flash buffer and copy it to the HDD later.
Organization of File Records
Information in a database is stored as a set of records, which are in turn organized into files. Here we look at records, record types and files.
Records and Record Types
A record is a collection of fields; each field corresponds to an attribute of an entity and stores the data value of one item. Each value consists of one or more bytes of information.
A record type or record format is the collection of field names together with the data type of each field (that is, the structure of the record).
Consider a record type employee, declared in C as:

struct employee
{
    char *name;   /* variable-length name field      */
    int   age;    /* fixed-length integer field      */
    char  sno;    /* single-character code field     */
};
Some newer applications also need to store images, videos, etc. in their records. These are called BLOBs (Binary Large Objects). A BLOB is usually not stored inside the record structure itself; instead the record holds a pointer to the location where the BLOB is stored.
Fixed Length and Variable Length Records
We know that a file is a sequence of records. In most cases all records in a file are of the same record type. If every record in the file has exactly the same size in bytes, the file is said to be made up of fixed-length records.
Conversely, if different records in the file have different sizes, the file is said to be made up of variable-length records. There are several reasons why a file may have variable-length records:

• The records are of the same type, but one or more fields are of varying size (variable-length fields). E.g., the Name field in Employee can be a variable-length field.
• The records are of the same type, but one or more fields may have multiple values for individual records; such a field is called a repeating field. E.g., in the schema Student {name, courses}, courses may take repeating values, although storing and querying such a field is more complicated.
• The records are of the same type, but one or more fields are optional (optional fields).
• The file contains records of different record types, and hence of varying size (a mixed file).

Three kinds of separators can be used here. In the first case all fields in the records are of fixed size, so it is easy to know where the next field or record begins and no separator is needed. In the other cases the records contain variable-length fields, so separator characters are used between fields, between records, and so on.

Record Blocking and Spanned Versus Unspanned Records


Record Blocking:
Record blocking, also known as block organization or block structuring, is a technique used
in database management systems (DBMS) to improve the efficiency of data storage and retrieval. It
involves grouping multiple records together into a block, which is the smallest unit of data transfer
between the main memory and secondary storage (e.g., disk). The primary goal of record blocking is
to minimize the number of I/O operations required to access records.

Imagine that a file of fixed-length records has to be stored on disk. The records have to be allocated to disk blocks, because the block is the unit of data transfer. If the block size is larger than the record size, each block will hold several records, although some files may have records so large that they do not fit in one block.
Suppose the block size is B bytes and the file has fixed-length records of size R bytes, with B >= R. We can then fit bfr = ⌊B/R⌋ records per block, where bfr is called the blocking factor of the file.
In most cases R does not divide B exactly, so each block has unused space equal to
B − (bfr * R) bytes.
To utilize this unused space, we may store part of another record in it and keep a pointer to the block that holds the rest of that record.
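As a small sketch of this calculation in C (the block and record sizes below are arbitrary illustrative values, not taken from the notes):

#include <stdio.h>

int main(void)
{
    int B = 512;               /* block size in bytes (illustrative)        */
    int R = 100;               /* fixed record size in bytes (illustrative) */

    int bfr    = B / R;        /* blocking factor: integer division gives floor(B/R) */
    int unused = B - bfr * R;  /* bytes left unused in each block                    */

    printf("bfr = %d records per block, %d bytes unused per block\n", bfr, unused);
    return 0;
}

With B = 512 and R = 100 this prints a blocking factor of 5 and 12 unused bytes per block.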
Spanned and Unspanned Records
When a block has leftover space, or a record does not fit fully into one block, part of the record may be stored in the available space of one block and the remainder stored in another block. The two parts are connected by a pointer if the second block is not the next one in sequence. This is called spanned organization of records.
If records are not allowed to cross block boundaries, the organization is called unspanned.

(Figure: (a) unspanned and (b) spanned organization of records.)

Allocating File Blocks on Disk


This describes how the blocks of a file are allocated on disk. In contiguous allocation the file blocks are allocated to consecutive disk blocks, which makes reading the whole file fast and easy. In linked allocation each block contains a pointer to the next file block; this makes it easy to expand the file but slower to read and process the whole file. A third method is indexed allocation, where one or more index blocks contain pointers to the actual file blocks on disk.
File Headers
A file header or file descriptor contains information about the file that the system requires for processing it. The header includes the disk addresses of the file blocks, record lengths, the order of fields within the records and other details about the contents of the file. It also includes metadata (name, type, size, etc.) about the file itself, and may include file status flags (open, closed, being read, etc.).

Operations on File
Two sets of operations are associated with a file: retrieval operations and update operations. The former do not change the contents of the file; they only read and use the information. The latter change the file by inserting, deleting or updating records.
Listed below are the standard file operations that a DBMS typically uses:

• Open: This prepares the file for reading or writing. It makes the buffer available for the
retrieving of blocks. Sets the file pointer to the beginning of the file.
• Reset: It resets or positions the file pointer for an open file to the beginning of the file
• Find/Locate: Searches for the first record that satisfies a search condition. Then transfers the
block containing that record into the buffer/memory and pointer then points to the record in
the buffer- called the current record
• Read: It copies the current record from the buffer to a program variable in the application and
it may also advance the pointer to the next record.
• FindNext: it searches for the next record in the file that satisfies the search condition –
transfers the block into the buffer
• Delete: It deletes the current record and updates the file accordingly.
• Modify: It modifies the field values of the current record and updates the block.
• Insert: It inserts a new record in the file by locating the block where the record is to
be inserted, bringing it into the buffer, writing the record into it, and rewriting the block.
• Close: It completes the file operation by releasing the buffer and doing all clean up operations
• (other commands: Scan, FindAll, Find, FindOrdered, Reorganize )

Primary Methods of File Organization (How records are placed in the file)
Heap Files
This is the most basic type of organization. Records are placed in the file in the order in which they are inserted, so every new record goes at the end of the file. Such a file is called a heap or pile file.
Here, inserting a new record is very efficient. The last disk block from the file is copied into a
buffer, the new record is then added and the resultant block then is rewritten back to the file.
The disadvantage is searching for a record: it requires a linear search, in which the file is scanned block by block from beginning to end until the record is found or the end of the file is reached. So, while simple and flexible, heap files can be inefficient for searching and retrieval, especially for large datasets (a sketch of such a linear search is given below).
To delete a record, its block is copied into the buffer, the record is removed, and the block is rewritten to the file. Over time this leaves unused space in many blocks, so the free space has to be reclaimed occasionally by reorganizing the file.
An alternative is to keep a deletion marker with each record. The marker indicates whether the record is still valid, and search programs consider only records whose marker shows they have not been deleted. Records in a heap file may be stored using either spanned or unspanned organization.
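The following is a minimal sketch of such a linear search in C, assuming the heap file stores fixed-length records as packed structs in a binary file; the file name and field sizes are illustrative assumptions, not part of the notes:

#include <stdio.h>
#include <string.h>

struct employee_rec {            /* fixed-length version of the employee record */
    char name[30];
    int  age;
    char sno[10];
};

/* Linear search over a heap file: scan record by record until a match is found. */
int find_by_name(const char *filename, const char *key, struct employee_rec *out)
{
    FILE *fp = fopen(filename, "rb");
    if (fp == NULL)
        return 0;
    while (fread(out, sizeof *out, 1, fp) == 1) {   /* read the next record      */
        if (strcmp(out->name, key) == 0) {          /* compare the search field  */
            fclose(fp);
            return 1;                               /* found                     */
        }
    }
    fclose(fp);
    return 0;                                       /* reached end: not found    */
}

A real DBMS reads a whole block of records into the buffer at a time rather than one record per call, but the search logic is the same.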
Sorted Files
This is an improvement over the previous organization. The records of the file are stored on disk in order of the values of one of their fields, called the ordering field; the result is an ordered file. If the ordering field is also a key field, it is called the ordering key.
Some advantages

• Reading the records in order of the ordering key is easy and efficient.
• Having found one record, finding the next record in key order is easy, since it usually lies in the same block.
• Searching on the ordering key is fast, because binary search can be used on the ordered file (a sketch is given at the end of this subsection).
Disadvantages

• Because the records are kept in sorted order, inserting a new record is expensive, since the resulting file must remain sorted.
o One way to ease this is to leave some free space in each block for new records, but once this space is used up the problem returns.
o Another method is to use two files: a temporary unordered file called the transaction file and the ordered file called the master file. New records are added at the end of the transaction file, which is periodically sorted and merged with the master file.
Ordered files are rarely used on their own in file organization; when they are used, it is usually in combination with indexes.
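A minimal sketch of binary search on the ordering key, simplified here to an in-memory array of records sorted on an integer key (an actual DBMS performs the same halving over disk blocks; the record layout is an illustrative assumption):

#include <stddef.h>

struct rec {
    int  key;            /* the ordering key field             */
    char payload[60];    /* the rest of the record (arbitrary) */
};

/* Binary search over records sorted by key; returns the index, or -1 if absent. */
int search_ordered(const struct rec *recs, size_t n, int key)
{
    size_t lo = 0, hi = n;              /* search interval [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (recs[mid].key == key)
            return (int)mid;            /* found                     */
        if (recs[mid].key < key)
            lo = mid + 1;               /* continue in upper half    */
        else
            hi = mid;                   /* continue in lower half    */
    }
    return -1;                          /* not found                 */
}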

Hashing Techniques
A faster and more efficient organization is hashing. One field of the record is chosen as the hash field, and a hash function H is applied to it: H(k) → L maps the hash-field value k to the address L of the bucket (block) in which the record is stored. The set of buckets addressed in this way forms a hash table.
Common hashing methods include the division method (H(k) = k mod M), the folding method, and the mid-square method.
A collision occurs when two different key values hash to the same address. Collisions are resolved by open addressing, chaining, or multiple hashing.
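The sketch below illustrates the division method with collision resolution by chaining; the key type, bucket count and stored block address are illustrative assumptions:

#include <stdlib.h>

#define M 101                          /* number of buckets (a prime, illustrative) */

struct entry {                         /* one chained entry per hashed record       */
    int key;                           /* hash-field value                          */
    long block_addr;                   /* address of the block holding the record   */
    struct entry *next;
};

static struct entry *table[M];         /* the hash table: one chain per bucket      */

/* Division-method hash function: H(k) = k mod M. */
static int hash(int key) { return ((key % M) + M) % M; }

/* Insert a key; a collision simply adds the entry to the bucket's chain. */
void ht_insert(int key, long block_addr)
{
    struct entry *e = malloc(sizeof *e);
    int b = hash(key);
    e->key = key;
    e->block_addr = block_addr;
    e->next = table[b];                /* push onto the front of the chain */
    table[b] = e;
}

/* Look up a key; returns the stored block address, or -1 if not present. */
long ht_lookup(int key)
{
    for (struct entry *e = table[hash(key)]; e != NULL; e = e->next)
        if (e->key == key)
            return e->block_addr;
    return -1;
}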

INDEXING
The primary purpose of indexing is to provide a structure that allows the DBMS to locate and
access specific records efficiently, reducing the time required for search operations. The most
common types of indexes are based on ordered files. Here we consider the single-level indexes: primary, clustering and secondary indexes.
Primary Indexes
A primary index is built on the ordering key field of the file, which is the primary key, and the file itself is physically ordered on that field.
Each entry in the index has two fields: the first is the ordering key field (the primary key of the data file), and the second is a pointer to a disk block. The i-th index entry is written as
<K(i), P(i)>, where K(i) is the key value and P(i) is the address of the block that holds it. Taking the general form of an entry as <K(i), X>:

• X may be the physical address of the block in the file
• X may be the address of the record within its block
• X may be the logical address of the block
As an example, suppose the primary key is Name and is unique. An index entry then looks like
<K(1) = (Aaron, Ed), P(1) = address of block 1>.
As Figure 17.1 (in the textbook) shows, each index entry points to the first record of a block; that first record is called the block anchor, so the number of index entries equals the number of blocks in the data file. In general an index can be:
Dense index: there is an index entry (with a pointer) for every record, i.e. for every value of the key.
Sparse (nondense) index: not every key value has an entry; there are entries only for the block anchors, so each entry points to a block rather than to an individual record.

The index file occupies much less space than the data file, for two reasons:

• There are fewer entries in the index than there are records in the data file.
• Each index entry has only two fields, whereas a record has many.
If a record has primary key value K, it lies in the block pointed to by P(i), where K(i) <= K < K(i+1). A binary search on the index (rather than on the data file) finds that entry; the block P(i) is then brought into the buffer and the record is located within it. A sketch of this index lookup follows.
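A minimal sketch of locating the block through a sparse primary index; the key and pointer types are illustrative assumptions (an integer key and a block number):

/* One index entry <K(i), P(i)>: the block-anchor key and a block number. */
struct index_entry {
    int  key;        /* K(i): key of the first record (anchor) in the block */
    long block_no;   /* P(i): number/address of that block                  */
};

/* Binary search over the sparse index: return the block whose anchor key   */
/* K(i) satisfies K(i) <= K < K(i+1), or -1 if K precedes the first anchor. */
long find_block(const struct index_entry *idx, int n, int K)
{
    int lo = 0, hi = n - 1, found = -1;
    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        if (idx[mid].key <= K) {       /* this anchor may still be the right one */
            found = mid;
            lo = mid + 1;
        } else {
            hi = mid - 1;
        }
    }
    return (found >= 0) ? idx[found].block_no : -1;
}

The returned block is then read into the buffer and searched for the record with key K.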

The disadvantage is that when a record is inserted or deleted, changes must be made not only to the data blocks but also to the index entries, since the block anchors may move. This can be overcome by using an unordered overflow file for new records.
https://www.youtube.com/watch?v=29o99zuwwKs

Clustering Index
Here the data file is physically ordered on a nonkey field (not the primary key), called the clustering field, so records that have the same value of the clustering field are stored together. Each index entry again has two fields: the first is a distinct value of the clustering field and the second is a disk block pointer, which points to the first block containing a record with that clustering-field value. This is a nondense index, because it has one entry for each distinct value of the clustering field rather than one for every record.

https://www.youtube.com/watch?v=Ph656dyoQVA

Secondary Indexes
A secondary index provides an additional means of accessing a data file for which some primary access already exists. The records of the data file may be ordered, unordered or hashed. The secondary index can be created on a field that is a candidate key (unique) or on a nonkey field. Here too each index entry has two fields: the indexing field (a secondary key, a nonkey field, etc.) and a block pointer. Many secondary indexes can be created on the same file, one for each field that is frequently used for access.
A secondary index can be dense or sparse (nondense).

(Figures: a dense secondary index above and a nondense one below.)

https://www.youtube.com/watch?v=u5iHc46KOyo&t=10s

Make a comparative study of the three indexes

Transaction
A transaction is an executing program that forms a logical unit of database processing. It includes one or more database access operations, such as insertion, deletion, update or retrieval. A transaction can be embedded in an application program or submitted as a separate SQL statement.
A transaction usually works within the boundaries of Begin Transaction and End Transaction statements. A transaction may be read-only or read-write.
To access data items, which may be a record, a block, etc., a transaction uses the read(X) and write(X) operations. To hold these values temporarily while they are transferred to or from storage, data buffers in main memory are used. If all buffers are occupied when a new block is needed, a buffer replacement policy such as LRU (Least Recently Used) chooses which buffer to replace.
Two concerns must be handled while transactions are in progress: concurrency control and recovery.
Some of the problems that can arise from uncontrolled concurrency are:
Lost Updates:
Lost updates occur when two or more transactions read the same data simultaneously, and each
updates the data independently. As a result, changes made by one transaction may be overwritten by
another, leading to a loss of updates.
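As an illustrative example (not from the original notes): suppose X = 100, and T1 and T2 both read X; T1 adds 10 and writes X = 110, after which T2 subtracts 20 from its old copy and writes X = 80. T1's update is lost, since a serial execution of the two transactions would have left X = 90.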
Inconsistent Retrievals:
Inconsistent retrievals happen when a transaction reads data that is being modified by another
transaction. This can result in a transaction obtaining a mix of old and new values, leading to
inconsistencies.
Uncommitted Data (Dirty Read):
Dirty reads occur when one transaction reads data modified by another transaction that has not yet
been committed. If the second transaction is rolled back, the first transaction has read data that was
never committed, leading to potential inconsistencies.
Incorrect Summary Problem:
This is a concurrency control issue that can occur in a database system when multiple transactions are
executed concurrently, and the summary or aggregate values of the data are not updated correctly due
to interleaved operations. This problem arises in scenarios where transactions perform calculations or
aggregations on a set of data, and the final result may be inconsistent if the transactions are not
properly coordinated.
Non-Repeatable Read:
Non-repeatable reads occur when a transaction reads the same data multiple times, but the values
change between reads due to updates by other transactions. This inconsistency may lead to
unexpected results.
Phantom Reads:
Phantom reads occur when a transaction reads a set of records that satisfy a certain condition, but
another transaction inserts or deletes records that also meet the condition before the first transaction
completes. This can result in the first transaction seeing phantom data that did not exist initially.
Deadlocks:

Deadlocks occur when two or more transactions are blocked, each waiting for a resource that the other
holds. This can lead to a situation where none of the transactions can progress, causing a system
deadlock.
Recovery
A transaction either commits (all of its operations complete successfully) or aborts. When it cannot complete, a failure has occurred; failures can be classified as transaction, system or media failures. The main causes are:

• A computer failure: a hardware, software or network error during execution; this may also involve media failure.
• A transaction error: a program error such as integer overflow or division by zero.
• Local errors or exceptions: exceptional conditions detected by the transaction itself (for example, required data not being found).
• Concurrency control enforcement: the concurrency control method may abort a transaction.
• Disk failure: a disk block may lose its data because of a read/write malfunction or a head crash.
• Physical problems and catastrophes: power failure, fire, theft, and similar events.

ACID PROPERTIES
ACID properties are a set of characteristics that guarantee the reliability and consistency of
transactions in a database system. These properties ensure that database transactions are processed
reliably even in the face of failures, errors, or system crashes. The acronym ACID stands for
Atomicity, Consistency, Isolation, and Durability:
Atomicity (A):
Atomicity ensures that a transaction is treated as a single, indivisible unit of work. Either all the
changes made by the transaction are committed to the database, or none of them are. If any part of a
transaction fails (due to errors, crashes, or other reasons), the entire transaction is rolled back to its
previous state, ensuring that the database remains in a consistent state.
Consistency (C):
Consistency ensures that a database transitions from one consistent state to another after the
successful completion of a transaction. The consistency constraint specifies that the integrity
constraints and rules defined for the database must be maintained throughout the execution of a
transaction. If a transaction violates any integrity constraints, the entire transaction is rolled back to
maintain consistency.
Isolation (I):
Isolation ensures that the execution of one transaction is isolated from the execution of other
transactions. Each transaction appears to be executed in isolation, even if multiple transactions are
being processed concurrently. Isolation prevents interference and ensures that the intermediate state of
a transaction is not visible to other transactions until the transaction is committed.
Durability (D):
Durability ensures that once a transaction is committed, its effects persist in the database even in the
event of system failures (such as power outages, crashes, or hardware failures). Committed changes
are stored in a durable and permanent manner, typically through mechanisms like transaction logs or
write-ahead logging, so that the system can recover and restore the database to a consistent state after
a failure.

In summary:
Atomicity: Transactions are all-or-nothing; either they are fully completed, or none of their changes
take effect.

Consistency: Transactions bring the database from one consistent state to another, preserving integrity
constraints.
Isolation: Concurrent transactions are executed as if they are the only transactions in the system,
preventing interference between them.
Durability: Committed changes are permanent and survive system failures.

TRANSACTION STATES
During its lifetime a transaction performs the operations BEGIN_TRANSACTION, READ, WRITE, END_TRANSACTION, COMMIT_TRANSACTION and ROLLBACK. Based on these operations, a transaction passes through the following states:
Active State:
The transaction begins in the active state when it starts its execution. In this state, the transaction
performs various operations, such as reading or writing data, and it is actively contributing to changes
in the database.
Partially Committed (Prepared) State:
After a transaction has executed all its operations successfully, it enters the partially committed state.
In this state, the transaction has not yet been officially committed, but it is prepared to be committed.
The DBMS ensures that all changes made by the transaction are recorded and can be committed to the
database.
Committed State:
In the committed state, the transaction has been successfully executed, and all its changes are
permanently applied to the database. The DBMS ensures that the changes are durable, meaning they
will persist even in the event of system failures.
Rolled Back State:
If a transaction encounters an error during its execution or if it is explicitly rolled back, it enters the
rolled back state. In this state, any changes made by the transaction are undone, and the database is
restored to its state before the transaction began.
Aborted State:
An aborted state is similar to the rolled back state. It indicates that a transaction has been terminated
due to an error or an explicit request for abort. The key difference is that the term "aborted" is often
used when describing the result of an error or exceptional condition that forced the transaction to
terminate.
Failed State:
In the case of a system failure or crash, a transaction may enter a failed state. This means that the
transaction was active at the time of the failure and could not complete its execution. Transactions in
the failed state need to be recovered during the system's restart or recovery process.
Terminated State:

The terminated state is a general term that encompasses both committed and aborted states. Once a
transaction has completed its execution, whether by committing or aborting, it is considered
terminated.

Indeterminate State:
In some cases, a transaction may end up in an indeterminate state if there are uncertainties about its
outcome. This can happen in distributed systems where communication failures or network issues
make it challenging to determine the final status of a transaction.
(Figure: the operations above and the states that a transaction passes through.)

SCHEDULES

A schedule refers to the chronological order of execution of transactions in a multi-user or concurrent database environment. A schedule defines the sequence in which individual operations (read and write operations) of different transactions are interleaved or overlapped over time. Managing schedules is crucial for ensuring data consistency, integrity, and isolation in a concurrent database system.

A schedule S of n transactions T1, T2, …, Tn is an ordering of the operations of those transactions. Operations from different transactions can be interleaved in S.

Conflicting Operations in a Schedule:

Two operations in a schedule are said to conflict if they satisfy all three of the following conditions:

• They belong to different transactions
• They access the same item X
• At least one of the operations is a write(X)
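For example, r1(X) and w2(X) conflict, as do w1(X) and w2(X); but r1(X) and r2(X) do not conflict (neither is a write), and r1(X) and w2(Y) do not conflict (they access different items).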
Schedules can be two types:
- Serial Schedule
In a serial schedule, the next transaction begins only after the commit of the previous one; the operations of different transactions are never interleaved, so the ACID properties are easily preserved.

E.g.:
T1: read(X), write(X), commit
T2: read(X), write(X), commit
T3: read(X), write(X), commit
Here the execution order is T1 (commit) → T2 (commit) → T3 (commit).
- Concurrent schedule
In a concurrent schedule, the operations of two or more transactions are interleaved: some operations of T1, then some of T2, then T1 again, then back to T2, and so on. All the transactions are still executed completely, i.e. T1 and T2 each end with a commit. An example follows.
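For instance (an illustrative interleaving, not from the original notes):
T1: read(X); T2: read(Y); T1: write(X); T1: commit; T2: write(Y); T2: commit
The operations of T1 and T2 are interleaved, yet both transactions run to completion.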
In comparison:
- Serial schedule: no inconsistency problems, but poor resource utilization and throughput.
- Concurrent schedule: inconsistency problems can arise, but resource utilization and throughput are better.
Characterizing Schedules Based on Recoverability
For some schedules it is easy to recover from a transaction or system failure, whereas for others it is difficult. We now look at the types of schedules for which recovery is possible or relatively simple; this is a theoretical categorization of schedules with respect to recoverability.
Schedules can be characterized based on their recoverability. Recoverability refers to the ability to
restore the database to a consistent state in the event of a transaction failure or system crash. Different
types of schedules exhibit varying levels of recoverability. Here are the key characterizations of
schedules based on recoverability:
Recoverable Schedule:
A schedule is considered recoverable if, for every pair of transactions T1 and T2 where T2 reads a data item previously written by T1, T1 commits before T2 commits. This ensures that if T2 reads data modified by T1, T1's changes are durable (committed) before T2 makes its own changes permanent. (In short, we are always able to undo correctly and return to a consistent previous state; see the example below.)
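As an illustrative example (notation: wi/ri are writes/reads by Ti, ci a commit): the schedule w1(X); r2(X); c1; c2 is recoverable, because T1 commits before T2 does. The schedule w1(X); r2(X); c2; c1 is not recoverable: T2 has committed on the basis of a value that T1 may still roll back.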
https://www.youtube.com/watch?v=sF3y_40SnWE&list=PLYzc3veT5szJdASKHd_qHVa7Gb99dsFC6
Schedule with Cascading Rollback:
A schedule exhibits cascading rollback if a transaction that eventually gets rolled back has already had its modified data read by other, still uncommitted transactions; rolling it back then forces those transactions to roll back as well, producing a chain reaction of rollbacks. (E.g. T1 begins, then T2 and T3 begin and read values written by T1; if T1 rolls back, T2 and T3 must also roll back.)
Cascadeless Schedule (Avoiding Cascading Rollback):
A schedule is cascadeless if, for every pair of transactions T1 and T2 where T2 reads a data item previously written by T1, T1 commits before that read takes place. In other words, transactions read only values written by already committed transactions, so rolling back one transaction never forces another to roll back.

Strict Schedule:
A strict schedule is a type of recoverable schedule that ensures that the effects of a transaction are not
visible to other transactions until it is committed. Strict schedules provide a high level of isolation,
preventing anomalies such as non-repeatable reads or phantom reads.

Serializable Schedule
A schedule S is serializable if it is equivalent to some serial schedule S', meaning that its end result is the same as if the transactions had been executed one after another in some order. Serializable schedules therefore provide a high level of isolation and consistency and prevent concurrency-related anomalies, even though the transactions actually run concurrently.

Explanation for : Serial, Non Serial, Serializable schedules: → https://www.youtube.com/watch?v=UGA4xjEGi-0

Testing for Serializability:


https://www.youtube.com/watch?v=VqpfBwzFWUA
Precedence Graph:
A precedence graph is a directed graph that represents the order in which transactions read and write
data items. In the graph:
Each transaction is represented by a node. An edge from transaction Ti to Tj indicates that there is a
conflict operation between Ti and Tj. A conflict operation is a situation where one transaction reads or
writes a data item that another transaction writes.

The Procedure
Steps:
1. Create a node for each transaction
2. Create a directed edge Ti to Tj if Tj reads the value of an item written by Ti
3. Create a directed edge Ti to Tj if Tj writes the value of an item after Ti reads it
4. Create a directed edge Ti to Tj if Tj writes the value of an item after Ti writes it
The schedule is (conflict) serializable if and only if the precedence graph is acyclic; if the graph contains a cycle, the schedule is not serializable.
(Note: edges are added only for conflicting pairs on the same item: Ti writes and Tj later reads it (step 2), Ti reads and Tj later writes it (step 3), or Ti writes and Tj later writes it (step 4).)
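As an illustrative worked example (not from the original notes), consider the schedule r1(X); r2(X); w1(X); w2(X). The pair r2(X) before w1(X) gives an edge T2 → T1 (step 3), while r1(X) before w2(X) and w1(X) before w2(X) give edges T1 → T2 (steps 3 and 4). The graph contains the cycle T1 → T2 → T1, so this schedule (the classic lost-update interleaving) is not serializable.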
