This document summarizes a lecture on representing data elements in a database. It discusses storing fixed and variable length tuples, dealing with pointers, and issues with updates. It covers storing records in blocks, using offset tables and structured addresses, and managing pointers when blocks are moved between main memory and secondary storage using techniques like pointer swizzling.

Uploaded by

john

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

65 views28 pages

Lecture3 PDF

Uploaded by

john

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 28

Advanced Database Technology

Rasmus Pagh and S. Srinivasa Rao

IT University of Copenhagen
Spring 2006

Representing data elements

February 13, 2006

Based on Chapter 12 in GUW, [Pagh03] Sec. 1, and [CLRS01] pp. 405-409

This lecture: Representing data elements.

In this lecture we ask: How does one store relations in a blocked memory?

title           year   length   filmType
Star Wars       1977   124      color
Mighty Ducks    1991   104      color
Wayne’s World   1992   95       color

Block 1: Schema:Movie; Star Wars; 1977; 124; color; Mighty Ducks; 1991; 104; color
Block 2: Schema:Movie; Wayne’s World; 1992; 95; color; ...

Problems with updates:

What if we want to add “Episode 4” to “Star Wars” but there is not
sufficient space in the block?


Overview of this lecture

• Storing fixed-size tuples
• Dealing with pointers
• Variable-length tuples
• Updates
• Queues, stacks, and linked lists (I/O model and amortized analysis)
• Index structures (separate slide set)


Some terminology

• Attributes are stored as a sequence of bytes, called fields.

• Tuples are stored as a collection of fields, called records.
• Records are put together and are stored in blocks.
• A relation is a collection of records stored in blocks, called a file.


Attributes stored in fields

The schema of a relation specifies the types of its attributes. This decides
how much space is needed to store each attribute, and hence the relation.
(An attribute may be of variable size.)
• CHAR(7): a string of length 7, stored in 7 bytes.
• BIT(2): two bits. It can be stored in two bits, but often a whole byte is
used.
• {RED, GREEN, BLUE, YELLOW}: an enumerated type that can be
stored as 00, 01, 10, 11, i.e., two bits are enough.
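
The field encodings above can be sketched in a few lines of Python. This is only an illustration: the names Color, encode_char and pack_colors are hypothetical, and padding CHAR(n) values with spaces is an assumption, not something the slides prescribe.

```python
from enum import IntEnum


class Color(IntEnum):
    """Enumerated type stored as a 2-bit code (hypothetical name)."""
    RED = 0b00
    GREEN = 0b01
    BLUE = 0b10
    YELLOW = 0b11


def encode_char(value: str, n: int) -> bytes:
    """Store a CHAR(n) value in exactly n bytes (assumed: space padding)."""
    raw = value.encode("ascii")
    if len(raw) > n:
        raise ValueError("value too long for CHAR(%d)" % n)
    return raw.ljust(n, b" ")


def pack_colors(colors) -> bytes:
    """Pack up to four 2-bit codes into one byte, first code in the low bits."""
    if len(colors) > 4:
        raise ValueError("at most four 2-bit codes fit in one byte")
    byte = 0
    for i, color in enumerate(colors):
        byte |= int(color) << (2 * i)
    return bytes([byte])
```

Packing four enumeration values into one byte saves space, but every access then needs shifting and masking, which is the trade-off behind using a whole byte per field.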


Variable sized data

Some attributes may not have a fixed size. If the size varies a lot between
tuples, then we do not want to allocate enough memory in every tuple to
store an attribute of the maximum size.
However, that is what a simple implementation of VARCHAR(n) in SQL does:
n + 1 bytes are allocated for the string, even if the actual string is shorter.

Two solutions:
• Length + content: n + 1 bytes are allocated for a string of length n
(assuming n < 256, so that the length fits in one byte).
Example: 6 | S | t | r | i | n | g
• Null-terminated string: n + 1 bytes are allocated for a string of length n.
Example: S | t | r | i | n | g | ⊥
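
A minimal sketch of the two encodings, assuming ASCII strings and a zero byte as the terminator ⊥. The function names are hypothetical.

```python
def encode_length_prefixed(s: str) -> bytes:
    """One length byte followed by the content (requires len < 256)."""
    raw = s.encode("ascii")
    if len(raw) >= 256:
        raise ValueError("length must fit in one byte")
    return bytes([len(raw)]) + raw


def decode_length_prefixed(buf: bytes, pos: int = 0):
    """Return (string, position of the byte after the field)."""
    n = buf[pos]
    start = pos + 1
    return buf[start:start + n].decode("ascii"), start + n


def encode_null_terminated(s: str) -> bytes:
    """Content followed by a zero byte, which may not occur in the content."""
    raw = s.encode("ascii")
    if b"\x00" in raw:
        raise ValueError("content may not contain the terminator")
    return raw + b"\x00"


def decode_null_terminated(buf: bytes, pos: int = 0):
    """Scan forward to the terminator; return (string, next position)."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("ascii"), end + 1
```

Both encodings use n + 1 bytes. With a length prefix the end of the field is known after reading one byte, whereas the null-terminated form has to be scanned.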


Records

A tuple is stored in a record. The size is the sum of sizes of the fields in the
record. A record often also stores a ‘header’.

A record header might store information such as

• the schema for the record (or a pointer to it)
• Size of the record
• Timestamps (last read, last updated)
The schema is used to access specific fields in the record.
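
As an illustration, a fixed-length Movie record with a small header could be laid out with Python's struct module. The header fields (schema id, record size, timestamp) and all field widths below are assumptions chosen for the example, not a layout given in the lecture.

```python
import struct

# Assumed header: schema id (2 bytes), record size (2 bytes), timestamp (4 bytes).
HEADER = struct.Struct("<HHI")
# Assumed fields: title CHAR(30), year, length, filmType CHAR(10).
MOVIE_FIELDS = struct.Struct("<30sHH10s")


def pack_movie(schema_id, timestamp, title, year, length, film_type):
    """Build header + fields; short strings are padded with zero bytes."""
    body = MOVIE_FIELDS.pack(title.encode("ascii"), year, length,
                             film_type.encode("ascii"))
    header = HEADER.pack(schema_id, HEADER.size + len(body), timestamp)
    return header + body


def unpack_movie(record):
    """Use the fixed layout to locate each field in the record."""
    schema_id, size, timestamp = HEADER.unpack_from(record, 0)
    title, year, length, film_type = MOVIE_FIELDS.unpack_from(record, HEADER.size)
    return {
        "schema_id": schema_id,
        "size": size,
        "timestamp": timestamp,
        "title": title.rstrip(b"\x00").decode("ascii"),
        "year": year,
        "length": length,
        "filmType": film_type.rstrip(b"\x00").decode("ascii"),
    }
```

With these widths every record occupies 8 + 44 = 52 bytes, so the position of each field follows from the schema alone.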


Schema information

Tells us how the fields (attributes) are stored within a record (tuple).
It contains
• the attributes of the relation
• their types
• the order in which attributes appear in a tuple
• constraints on the attributes and the relation


Fixed-length records in blocks

Records are stored in blocks. Typically a block contains only one kind of
record.

The block may have a header with information such as:

• Index information, often in form of pointers. (More on indexes later
today.)
• Type of tuples in the block.
• Offset table for the records in the block. Needed if records are of
variable length.
• Block ID.
• Timestamps.
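
For fixed-length records, the position of each record in a block is simple arithmetic. The sketch below uses hypothetical sizes (a 4096-byte block, a 12-byte block header, 52-byte records) purely as an example.

```python
def records_per_block(block_size: int, header_size: int, record_size: int) -> int:
    """Number of whole fixed-length records that fit after the block header."""
    return (block_size - header_size) // record_size


def record_offset(i: int, header_size: int, record_size: int) -> int:
    """Byte offset of the i-th record (counting from 0) inside the block."""
    return header_size + i * record_size
```

With the assumed sizes, records_per_block(4096, 12, 52) gives 78 records, leaving 28 bytes unused at the end of the block.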


Problem session: Packing fields

Read the box on page 573 in the book and discuss:

• Do you agree with their conclusion?
• When is it a good/bad idea to pack fields?


Block and record addresses

Addresses (pointers) to fields, records and blocks are often part of records
and we have to deal with them in a special way. E.g., pointers to schemas
and pointers used in index structures are stored in records.

Why addresses are different from other kinds of data:

• Blocks are moved from secondary memory to main memory when they
are used.
• Records may move, both within a block and from one block to another.
• Records may be deleted.
• Attribute values may change size, i.e. data move within a record.


Block addresses in main and secondary memory

Block address for blocks in main memory:

The block has an internal memory address when it is loaded into a buffer in
main memory.

Block address for blocks in secondary memory:

The physical address has to be used. The physical address describes the
physical location of the block.


Physical and logical addresses

Physical address
Describes physical location, i.e. which disk, which cylinder, which track etc.
Typical size is 8-16 bytes.

Logical address
An arbitrary fixed-length string assigned to each record. A map table is used
to map logical addresses to physical addresses.
Useful when records are moved, since only the map table has to be updated,
and not the references to the record.
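
The role of the map table can be sketched as follows. The class name MapTable, the use of integers as logical addresses, and the tuple format of the physical addresses are all assumptions made for the example.

```python
class MapTable:
    """Maps logical addresses to physical addresses (illustrative sketch)."""

    def __init__(self):
        self._table = {}
        self._next_logical = 0

    def register(self, physical):
        """Assign a fresh logical address to a record at `physical`."""
        logical = self._next_logical
        self._next_logical += 1
        self._table[logical] = physical
        return logical

    def resolve(self, logical):
        """Translate a logical address; costs one extra lookup per access."""
        return self._table[logical]

    def move(self, logical, new_physical):
        """The record moved: only this one table entry changes."""
        self._table[logical] = new_physical
```

A record that stores the logical address keeps a valid reference after move(), at the price of an additional lookup each time the reference is followed.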


Structured addresses

Structured address
A combination of physical and logical addresses. E.g., only the physical
address for the block. To find a record, an offset table in the block or
another kind of search in the block is needed.

Reasons why structured addresses are useful:

• A record can move within a block and still have the same structured
address.
• When a record is removed it can be replaced by a tombstone that
marks it deleted. The structured address can still be unchanged. When
the record is looked up we know it is deleted.


Offset tables

How to organize an offset table:

• Grow the offset table from left to right and insert records from right to
left (since we do not know the size of the offset table when dealing with
variable length records and when using tombstones).
• If the entries of the offset table are large enough, references to other
blocks can be stored in them. This is useful when records are moved and we
do not want to update the address.
• The tombstone can be stored in the offset table and the space used by
the deleted record can be reused by another record.
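
A minimal sketch of a block with an offset table, under simplifying assumptions: the offset table is kept as a Python list instead of being serialized into the block, each entry is charged a notional slot_bytes of space, and a tombstone is represented by the value -1. A structured address would then be the pair (block address, slot number).

```python
TOMBSTONE = -1  # assumed marker for a deleted record


class SlottedBlock:
    def __init__(self, size: int = 64, slot_bytes: int = 4):
        self.data = bytearray(size)
        self.slot_bytes = slot_bytes  # notional size of one offset-table entry
        self.slots = []               # offset table, grows from the left
        self.free_end = size          # records occupy data[free_end:], grow leftwards

    def free_space(self) -> int:
        return self.free_end - len(self.slots) * self.slot_bytes

    def insert(self, record: bytes) -> int:
        """Place the record at the right end of the free area; return its slot."""
        if len(record) + self.slot_bytes > self.free_space():
            raise ValueError("not enough space in block")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append((self.free_end, len(record)))
        return len(self.slots) - 1

    def read(self, slot: int):
        entry = self.slots[slot]
        if entry == TOMBSTONE:
            return None  # the record was deleted
        offset, length = entry
        return bytes(self.data[offset:offset + length])

    def delete(self, slot: int) -> None:
        self.slots[slot] = TOMBSTONE

    def compact(self) -> None:
        """Move live records to the right end; slot numbers do not change."""
        live = [(i, self.read(i)) for i, e in enumerate(self.slots) if e != TOMBSTONE]
        self.free_end = len(self.data)
        for i, record in live:
            self.free_end -= len(record)
            self.data[self.free_end:self.free_end + len(record)] = record
            self.slots[i] = (self.free_end, len(record))
```

After delete() and compact(), a surviving record has a new byte offset but the same slot number, so structured addresses that refer to it stay valid.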


Pointer swizzling

How to manage pointers when blocks are moved between main memory
(memory addresses) and secondary memory (database addresses).
• When in secondary memory, database addresses are used.
• When in main memory, database or memory addresses may be used.
Using memory addresses is more efficient. Otherwise translation is needed.
A translation table is used to map database addresses to memory addresses.

Pointer swizzling
When a block is moved from secondary to main memory, pointers in the
block can be swizzled (translated) from database addresses to memory
addresses. A bit indicates the type of address.


Swizzling strategies

Automatic Swizzling
When a block is moved into main memory, all pointers in the block are
swizzled if possible. All addresses to blocks currently in memory are stored
in the translation table.

Swizzling on Demand
Pointers are swizzled when they are followed. When a block is moved into
memory only the translation table is updated.

No Swizzling
Pointers are never swizzled. The translation table is used all the time.
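
The difference between swizzling on demand and no swizzling can be sketched as below. Everything here is a simplification: a Python dictionary stands in for secondary memory, a Python object reference stands in for a memory address, and the names Pointer and BufferPool are hypothetical.

```python
class Pointer:
    def __init__(self, db_addr):
        self.swizzled = False   # the bit that indicates the type of address
        self.target = db_addr   # database address, or memory reference once swizzled


class BufferPool:
    def __init__(self, disk):
        self.disk = disk        # database address -> block (stand-in for the disk)
        self.translation = {}   # translation table: database address -> block in memory
        self.lookups = 0        # number of translation-table lookups performed

    def load(self, db_addr):
        """Bring a block into memory and record it in the translation table."""
        if db_addr not in self.translation:
            self.translation[db_addr] = self.disk[db_addr]
        return self.translation[db_addr]

    def follow(self, ptr, swizzle=True):
        """Follow a pointer; with swizzle=True it is translated only once."""
        if ptr.swizzled:
            return ptr.target
        self.lookups += 1
        block = self.load(ptr.target)
        if swizzle:
            ptr.target = block
            ptr.swizzled = True
        return block
```

A swizzled pointer costs one lookup in total, while an unswizzled pointer costs one lookup every time it is followed. The sketch leaves out the reverse step: before a block is written back to disk, its swizzled pointers have to be translated back to database addresses.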


Problem session: Swizzling

• Discuss the pros and cons of the three swizzling strategies:

– Automatic Swizzling,
– Swizzling on Demand, and
– No Swizzling.
When is it a good/bad idea to use them?
• What are the problems when a block is written back to disk? And how
can they be solved?


Variable-length data and records

Reasons why records do not always have the same size:

• Fields of variable length. Attribute content varies in size.
• Repeating fields. An attribute appears several times, but how
many times is not specified by the schema.
• Records of variable format. Different tuples in a relation have
different sets of attributes, e.g., if many attributes have no content.
• Enormous fields. Data like movies and pictures in the relation. The
record may not fit into one block.


Fields of variable length

When a field has variable size we still have to be able to find all fields in the
record. Since the offset cannot be read from the relation schema some extra
information is stored in the record header.

Example of how it can be solved:

• Store fixed length fields first in the record.
• Store the total size of the record.
• Store offsets for variable sized fields (except the first).
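
The three steps above can be sketched for a Movie record with two fixed-length fields (year, length) and two variable-length fields (title, filmType). The exact layout, with 2-byte integers and the header placed in front of the fixed fields, is an assumption for illustration.

```python
import struct

# Assumed prefix: record size, offset of filmType (the second variable-length
# field), then the fixed-length fields year and length.
PREFIX = struct.Struct("<HHHH")


def pack_movie(title: str, year: int, length: int, film_type: str) -> bytes:
    t = title.encode("ascii")
    f = film_type.encode("ascii")
    film_type_offset = PREFIX.size + len(t)  # title starts right after the prefix
    total = film_type_offset + len(f)
    return PREFIX.pack(total, film_type_offset, year, length) + t + f


def unpack_movie(record: bytes):
    total, film_type_offset, year, length = PREFIX.unpack_from(record, 0)
    # No offset is stored for the first variable-length field: it always
    # begins where the fixed-length part ends.
    title = record[PREFIX.size:film_type_offset].decode("ascii")
    film_type = record[film_type_offset:total].decode("ascii")
    return title, year, length, film_type
```

The stored record size marks the end of the last variable-length field, which is why one offset is enough for two variable-length fields.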


Repeating fields
A record may contain a variable number of occurrences of a field.
Store information in the record header to locate all occurrences of the field
in the record.
A method to deal with fields of variable size and a variable number of
occurrences:
• Keep the record fixed size.
• Store variable-length data in a separate block and use a pointer to it.
Advantages: Fixed-size records can be searched more efficiently, less
information is needed in the header, and moving records is easier.
Disadvantage: The number of I/Os increases, since a pointer has to be followed.
Mixed strategies may be a good solution.


Spanned records

A record is called a spanned record if it is split between two or more blocks.

Reasons for spanned records:

• Space utilization.
• Records larger than a block.
For each fragment of a record, extra information on where to find next and
previous fragment is needed.
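
A sketch of how a record might be cut into fragments that each know their neighbours. Fragments are modelled as dictionaries in a list, and the list index plays the role of a block address; both choices are assumptions made to keep the example short.

```python
def split_record(record: bytes, capacity: int):
    """Split a record into fragments of at most `capacity` payload bytes."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    chunks = [record[i:i + capacity] for i in range(0, len(record), capacity)]
    if not chunks:
        chunks = [b""]  # an empty record still gets one fragment
    last = len(chunks) - 1
    return [
        {
            "prev": i - 1 if i > 0 else None,     # where the previous fragment is
            "next": i + 1 if i < last else None,  # where the next fragment is
            "data": chunk,
        }
        for i, chunk in enumerate(chunks)
    ]


def reassemble(fragments, first: int = 0) -> bytes:
    """Follow the `next` links from the first fragment to rebuild the record."""
    out = bytearray()
    i = first
    while i is not None:
        out += fragments[i]["data"]
        i = fragments[i]["next"]
    return bytes(out)
```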


BLOBS
Binary, Large OBjectS = BLOBS
BLOBS can be images, movies, audio files and other very large values that
can be stored in fields.
Storing BLOBS
• Stored in several blocks.
• Preferable to store them consecutively on a cylinder for efficient
retrieval.
Retrieving BLOBS
• A client retrieving a movie may not want it all at the same time.
• Retrieving a specific part of the large data requires an index structure
to make it efficient.


Problem session: Updates

We will look at three types of updates:

• Insertions of new tuples
• Deletions of tuples
• Tuple updates
What problems may arise when updates are performed on the database?
Think of the different situations where we have:
• fixed length vs. variable length tuples
• no order vs. sorted tuples


Updates
Insert
No order: No problem, just find a block with enough space or use a new
block.
Fixed order: May be a problem if there is not enough room in the correct
block. Solutions:
1. Find space in nearby block and rearrange
2. Create an overflow block
Delete
Pack data in the block to prepare for new inserts. Remove overflow blocks
if possible. Leave a tombstone if there may be pointers to the record.
Update
Fixed length: No problem.
Variable length: Same as for insert and delete. (But no tombstones.)
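
Insertion in a sorted file using overflow blocks (solution 2 above) might look like the following sketch. A block is reduced to a sorted list of keys with a fixed capacity; the class name SortedBlock and the chaining of overflow blocks are assumptions.

```python
import bisect


class SortedBlock:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.keys = []        # keys in this block, kept in sorted order
        self.overflow = None  # overflow block, created only when needed

    def insert(self, key) -> None:
        if len(self.keys) < self.capacity:
            bisect.insort(self.keys, key)  # room in the correct block
        else:
            if self.overflow is None:
                self.overflow = SortedBlock(self.capacity)
            self.overflow.insert(key)      # no room: use the overflow block

    def scan(self):
        """All keys of this block and its overflow chain, in sorted order."""
        keys = list(self.keys)
        if self.overflow is not None:
            keys.extend(self.overflow.scan())
        return sorted(keys)
```

Every overflow block in the chain costs an extra I/O when the block is read, which is one reason to remove overflow blocks again when deletions make that possible.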


Stacks and Queues

A stack maintains a collection of items in which only the most recently
added item may be removed.
A queue maintains a collection of items in which only the earliest added
item may be accessed/removed.
How can we maintain a stack or queue in external memory?
– use buffering

The “macroscopic view” in external memory is the same as the “microscopic
view” in internal memory.
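
One way to realize the buffering idea for a stack is sketched below, assuming a block holds B items and that a buffer of 2B items fits in internal memory. The class name ExternalStack is hypothetical, and a Python list of blocks stands in for the disk.

```python
class ExternalStack:
    def __init__(self, block_size: int):
        self.B = block_size
        self.buffer = []  # top of the stack, at most 2B items in internal memory
        self.disk = []    # full blocks of B items each (stand-in for external memory)
        self.ios = 0      # number of block reads and writes

    def push(self, item) -> None:
        if len(self.buffer) == 2 * self.B:
            # Buffer full: write the B oldest items to disk as one block.
            self.disk.append(self.buffer[:self.B])
            self.buffer = self.buffer[self.B:]
            self.ios += 1
        self.buffer.append(item)

    def pop(self):
        if not self.buffer:
            if not self.disk:
                raise IndexError("pop from empty stack")
            # Buffer empty: read the most recently written block.
            self.buffer = self.disk.pop()
            self.ios += 1
        return self.buffer.pop()
```

After any I/O the buffer holds about B items, so at least B further operations are needed before the next I/O. This suggests an amortized cost of O(1/B) I/Os per operation.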


Problem session on linked lists


Summary

• Storing fixed-size tuples
• Variable-length tuples
– offset tables
– overflow blocks
• Dealing with pointers
– logical and physical addresses
– database and memory addresses
– pointer swizzling
• Updates
• Stacks, queues, and linked lists in external memory
