DBMS
 Data: Data is any known fact, or the smallest piece of information, that can be recorded and has an
   implicit meaning.
      Eg :- Sanjana, BSC(CS), DCA, 2004.
 Why do we need data?
Ans- To derive some information from it.
   Information :- When data is processed, organized, structured or presented in a given context to make
    it more useful, it is called information.

              Data                                               Information
 1) Data is a collection of raw facts and figures.     1) Information is processed data.
 2) Data is not arranged.                              2) Information is arranged.
 3) Data is unorganized.                               3) Information is organized.
 4) Data does not depend on information.               4) Information depends on data.
 5) Data is low-level knowledge.                       5) Information is the second level of knowledge.

   Data Base :- It is a collection of related data. Here related data means that if we are collecting
    the information of an employee, all of it should be related to that employee, and the database
    should hold the collection of this employee data.
Eg: Collection of related data
        Name      Age    Designation     Salary
        Sanjana   20     Clerk           19000
        Sana      23     Data Analyst    50000
        :
        :
        Sara      23     Data Analyst    80,000
 Data Base System:- It is a system that uses database technology to store a large amount of dynamic,
    associated data in an organized way, with the help of hardware, software (DBMS) and the OS.
   A database system is composed of 5 major parts - hardware, software (DBMS), people, procedures and
    data.
  Data Base Management System:- It is a set of software programs that allows users to create, edit and
   update data in database files, and to store and retrieve data from those database files.
Example- Oracle, MS SQL Server, MySQL, DB2 (IBM). SQL itself is the query language used by these
systems, not a DBMS.
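The create, store, update and retrieve operations named in the definition can be sketched with Python's built-in sqlite3 module standing in for the DBMS. This is only a minimal illustration: the table name `employee` and its column names are assumptions, and the rows are taken from the employee example above.

```python
import sqlite3

# In-memory SQLite database used as a stand-in DBMS (illustrative only).
conn = sqlite3.connect(":memory:")

# Create: define a table for employee data.
conn.execute(
    "CREATE TABLE employee (name TEXT, age INTEGER, designation TEXT, salary INTEGER)"
)

# Store: insert two rows.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [("Sanjana", 20, "Clerk", 19000), ("Sana", 23, "Data Analyst", 50000)],
)

# Update: change one stored value.
conn.execute("UPDATE employee SET salary = ? WHERE name = ?", (21000, "Sanjana"))

# Retrieve: read the data back.
rows = conn.execute("SELECT name, salary FROM employee ORDER BY name").fetchall()
print(rows)
```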
   A Data Base Management System (DBMS) is a collection of interrelated data and a set of programs to
    access those data.
   DBMS is used to organize the data in the form of a table, schema, view, report etc.
   The primary goal of a DBMS is to provide a way to store and retrieve database information that is both
    convenient and efficient.

              Database                                         DBMS
 A database is a collection of connected           A DBMS is a collection of programs that
 information about people, location or things.     allow you to create, manage and operate a
                                                   database.
   DBMS can also be defined as an interface between the application program and the OS to access and
    manipulate the database.
   A database management system is software which is used to manage the database.
Example- MySQL, Oracle etc. are very popular database systems which are used in different applications.
 Characteristics of DBMS :-
    1)   Self-describing nature of a database system (catalog).
    2)   It can provide a clear and logical view of the process that manipulates data.
    3)   DBMS contains automatic backup and recovery procedures.
    4)   It can reduce the complex relationship between data.
    5)   It is used to provide security of data.
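The self-describing nature (characteristic 1) means the DBMS keeps a catalog describing its own tables. A minimal sketch, assuming SQLite as the DBMS: `sqlite_master` and `PRAGMA table_info` are SQLite-specific, and other systems expose their catalog differently. The `student` table is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, course TEXT)"
)

# sqlite_master is SQLite's catalog: it is itself a table that can be queried.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# PRAGMA table_info returns one row per column; index 1 holds the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(student)")]

print(tables, columns)
```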
     Applications of DBMS:-
    1) Banking : For maintaining customer information, accounts, loans and banking transactions.
    2) Universities : For maintaining student records, course registrations and grades.
    3) Railway Reservation : For checking the availability of reservations in different trains, tickets etc.
    4) Airlines : For reservation and schedule information.
    5) Telecommunication : For keeping records of calls made, generating monthly bills etc.
    6) Finance : For storing information about holdings, sales and purchases of financial instruments.
    7) Sales : For customer, product and purchase information.
     Advantages of DBMS
     1) Controls data redundancy :- It controls data redundancy because it stores all the data in one single
    database file, and the recorded data is placed in that database.
    2) Data sharing :- In a DBMS, the authorized users of an organization can share the data among multiple users.
    3) Easy maintenance :- It is easy to maintain due to the centralized nature of the database system.
    4) Reduced time :- It reduces development time and maintenance needs.
    5) Backup : It provides backup and recovery subsystems which create automatic backups of data to guard
    against hardware and software failures, and restore the data if required.
    6) Multiple user interfaces : It provides different types of user interfaces, like graphical user
    interfaces and application program interfaces.
     Disadvantages of DBMS:
    1) Cost of hardware and software : It requires a high-speed data processor and a large memory size
    to run DBMS software.
    2) Size : It occupies a large amount of disk space and memory to run efficiently.
    3) Complexity : A database system creates additional complexity and requirements.
     Disadvantages of File System :-
    1) Data Redundancy and Inconsistency
    2) Difficulty in Accessing Data
    3) Data Isolation
    4) Integrity Problem
    5) Atomicity Problem
    6) Concurrent Access Anomalies
    7) Security Problem
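The atomicity problem is one that a DBMS addresses with transactions: a group of changes is applied completely or not at all. The following is a minimal sketch using sqlite3; the `account` table, the `transfer` helper and the amounts are hypothetical and only serve the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

# Overdrawing A violates the CHECK constraint, so the credit to B is rolled back too.
ok = transfer(conn, "A", "B", 500)
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, balances)
```

With plain files, a crash or error between the two updates would leave the money credited but never debited; here the rollback undoes the first update.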
     Types of databases :- There are various types of databases used for storing different varieties of data.
    1) Centralized Database:- It is the type of database that stores data at a centralized database system. It
    makes it convenient for users to access the stored data from different locations through several
    applications. These applications contain an authentication process to let users access data securely.
     An example of a centralized database is a central library that carries a central database of each library
    in a college/university.
    2) Distributed Database:- In distributed systems, data is distributed among different database systems of an
    organization. These database systems are connected via communication links. Such links help the end-users
    to access the data easily.
Examples of distributed databases are Apache Cassandra, HBase, Ignite, etc.
It is divided into two subtypes: homogeneous DDB and heterogeneous DDB.
   o Homogeneous DDB: Database systems which execute on the same operating system, use the same
      application process and run on the same hardware devices.
   o Heterogeneous DDB: Database systems which execute on different operating systems, under
      different application procedures, and run on different hardware devices.
Advantages of Distributed Database
   o Modular development is possible in a distributed database, i.e., the system can be expanded by
      including new computers and connecting them to the distributed system.
   o One server failure will not affect the entire data set.
3) Relational Database:- It stores data in the form of rows (tuples) and columns (attributes), which together
form a table (relation). A relational database uses SQL for storing, manipulating and maintaining the
data. E. F. Codd proposed the relational model in 1970. Each table in the database carries a key that makes
each row unique from the others.
Examples of relational databases are MySQL, Microsoft SQL Server, Oracle, etc.
4) NoSQL Database (stores data without a fixed structure):- NoSQL (Non-SQL / Not Only SQL) is a type of
database that is used for storing a wide range of data sets. It is not a relational database, as it stores
data not only in tabular form but in several different ways.
It is divided into 4 subtypes-
    a. Key-value storage
    b. Document-oriented Database
    c. Graph Databases
    d. Wide-column stores
Advantages of NoSQL Database
    o It is a better option for managing and handling large data sets.
    o It provides high scalability.
    o Users can quickly access data from the database through key-value.
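The key-value and document-oriented styles listed above can be imitated with plain Python dictionaries. This is only a toy sketch, not how any real NoSQL product is implemented; the keys, field names and the `find` helper are hypothetical.

```python
import json

# Key-value storage: a value is looked up directly by its key.
kv_store = {}
kv_store["session:42"] = "Sanjana"

# Document-oriented storage: each record is a self-describing document,
# and documents do not have to share the same fields.
documents = {
    "emp:1": {"name": "Sana", "designation": "Data Analyst", "skills": ["SQL", "Python"]},
    "emp:2": {"name": "Sara", "salary": 80000},
}

def find(docs, field, value):
    """Return the keys of all documents whose field equals value (a full scan)."""
    return [key for key, doc in docs.items() if doc.get(field) == value]

print(kv_store["session:42"])
print(find(documents, "name", "Sara"))
print(json.dumps(documents["emp:1"]))
```

Lookup by key is a single dictionary access, which is the reason key-value access is described as quick, while searching by any other field needs a scan or an extra index.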
5) Cloud Database:- A type of database where data is stored in a virtual environment and executes over a
cloud computing platform. It provides users with various cloud computing services (SaaS, PaaS, IaaS, etc.) for
accessing the database. There are numerous cloud platforms; some commonly cited options are:
    o Amazon Web Services(AWS)
    o Microsoft Azure
    o ScienceSoft
    o Google Cloud SQL, etc
6) Object-oriented Databases: It is the type of database that uses the object-based data model approach for
storing data in the database system. The data is represented and stored as objects which are similar to the
objects used in object-oriented programming languages.
7) Hierarchical Databases: It is the type of database that stores data in the form of parent-children
relationship nodes. Here, it organizes data in a tree-like structure.
Data is stored in the form of records that are connected via links. Each child record in the tree has
only one parent. On the other hand, each parent record can have multiple child records.
8) Network Databases: It is the type of database that follows the network data model. Here, the
representation of data is in the form of nodes connected via links between them. Unlike the hierarchical
database, it allows each record to have multiple children and multiple parent nodes, forming a generalized
graph structure.
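The difference between the hierarchical and network organizations can be sketched with two small Python structures. The node names are hypothetical examples; the point is only that a tree stores one parent per child while a graph stores a list of parents.

```python
# Hierarchical model: every child record has exactly one parent (a tree).
parent_of = {"Department": "College", "Course": "Department", "Student": "Department"}

def path_to_root(node):
    """Follow the single parent link upward until the root is reached."""
    path = [node]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path

# Network model: a record may have several parents (a generalized graph).
parents_of = {"Order": ["Store", "Customer"], "Item": ["Order", "Store"]}

print(path_to_root("Student"))
print(parents_of["Order"])
```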
   table/Relation :Everything in a relational database is stored in the form of relations. The RDBMS
    database uses tables to store data. A table is a collection of related data entries and contains rows and
    columns to store data.
Properties of a Relation:
   o Each relation has a unique name by which it is identified in the database.
   o Relation does not contain duplicate tuples.
   o The tuples of a relation have no specific order.
   o All attributes in a relation are atomic, i.e., each cell of a relation contains exactly one value.
    row or record: A row of a table is also called a record or tuple. It contains the specific information of
    each entry in the table. It is a horizontal entity in the table
Properties of a row:
   o No two tuples are identical to each other in all their entries.
   o All tuples of the relation have the same format and the same number of entries.
   o The order of the tuple is irrelevant. They are identified by their content, not by their position.
   column/attribute/fields :A column is a vertical entity in the table which contains all information
    associated with a specific field in a table.
Properties of an Attribute:
   o Every attribute of a relation must have a name.
   o Null values are permitted for the attributes.
   o Default values can be specified for an attribute; the default is automatically inserted if no other
       value is specified for the attribute.
   o Attributes that uniquely identify each tuple of a relation form the primary key.
   data item/Cells:- The smallest unit of data in the table is the individual data item. It is stored at the
    intersection of tuples and attributes.
Properties of data items:
   1) Data items are atomic.
   2) The data items for an attribute should be drawn from the same domain.
In the example below, the data items in the student table consist of Debraj, 20 and BSC, etc.
                   ID        Name      AGE         COURSE
                   1         Debraj    20          BSC
 Degree:The total number of attributes that comprise a relation is known as the degree of the table.
For example, the student table has 4 attributes, and its degree is 4.
                   ID        Name       AGE       COURSE
                   1         Sara       24        B.tech
                   2         Sana       20        C.A
                   3         Deb        20        BCA
                   4         Raj        22        MCA
                   5         Debraj     20        BSC
   Cardinality:The total number of tuples at any one time in a relation is known as the table's cardinality.
    The relation whose cardinality is 0 is called an empty table.
For example, the student table has 5 rows, and its cardinality is 5.
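Degree and cardinality can be checked mechanically. A minimal sketch with sqlite3, loading the five student rows from the table above; the lowercase column names are an assumption for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, course TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(1, "Sara", 24, "B.tech"), (2, "Sana", 20, "C.A"), (3, "Deb", 20, "BCA"),
     (4, "Raj", 22, "MCA"), (5, "Debraj", 20, "BSC")],
)

cursor = conn.execute("SELECT * FROM student")
degree = len(cursor.description)       # number of attributes (columns)
cardinality = len(cursor.fetchall())   # number of tuples (rows) right now

print(degree, cardinality)
```

The degree changes only when the table definition changes, while the cardinality changes with every insert or delete.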
   Domain:The domain refers to the possible values each attribute can contain. It can be specified using
    standard data types such as integers, floating numbers, etc. For example, An attribute entitled
    Marital_Status may be limited to married or unmarried values.
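One way a domain restriction like the Marital_Status example can be enforced is a CHECK constraint. This is a sketch under the assumption that SQLite is the DBMS; the `person` table and the sample names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        name TEXT NOT NULL,
        marital_status TEXT CHECK (marital_status IN ('married', 'unmarried'))
    )
""")

# A value inside the domain is accepted.
conn.execute("INSERT INTO person VALUES ('Sara', 'married')")

# A value outside the domain is rejected by the DBMS.
try:
    conn.execute("INSERT INTO person VALUES ('Raj', 'unknown')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
print(rejected, row_count)
```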
   Codd's Rules in RDBMS :- Dr. E. F. Codd was an IBM researcher who first developed the relational data
    model in 1970. In 1985, Dr. Codd published a list of 12 rules that define an ideal relational database
    and have provided a guideline for the design of all relational database systems.
Rule 1: The Information Rule: This rule simply requires that all data be represented in table form; this
is the basis of the relational model.
Rule 2: The Guaranteed Access Rule :Each data element is guaranteed to be accessible logically with a
combination of the table name, primary key (row value), and attribute name (column value).
Rule 3: Systematic Treatment of NULL Values:Every Null value in a database must be given a systematic
and uniform treatment.
Rule 4: Active Online Catalog Rule:The database catalog, which contains metadata about the database, must
be stored and accessed using the same relational database management system.
Rule 5: The Comprehensive Data Sub-language Rule: The database system must support at least one language
with a well-defined syntax that can be used for defining, querying and modifying the information within
the database.
Rule 6: The View Updating Rule: All views that are theoretically updatable must also be updatable by the
system.
Rule 7: High-level Insert, Update, and Delete: The database system must support high-level insertions,
updates and deletions that work on a whole set of rows, so that users can carry out these operations
through a single query.
Rule 8: Physical Data Independence:Application programs and activities should remain unaffected when
changes are made to the physical storage structures or methods.
Rule 9: Logical Data Independence :Application programs and activities should remain unaffected when
changes are made to the logical structure of the data, such as adding or modifying tables.
Rule 10: Integrity Independence:Integrity constraints should be specified separately from application
programs and stored in the catalog. They should be automatically enforced by the database system.
Rule 11: Distribution Independence:The distribution of data across multiple locations should be invisible to
users, and the database system should handle the distribution transparently.
Rule 12: Non-Subversion Rule:If the interface of the system is providing access to low-level records, then
the interface must not be able to damage the system and bypass security and integrity constraints.
Key                 DBMS                                            RDBMS
Query processing    There is no efficient query processing in       Efficient query processing is there in
                    the file system.                                DBMS.
User access         Only one user can access data at a time.        Multiple users can access data at a time.
1-Tier Architecture:- In this architecture, the database is directly available to the user. It means the user
can directly sit on the DBMS and use it. Any changes done here will directly be done on the database itself.
It doesn't provide a handy tool for end users. The 1-Tier architecture is used for the development of local
applications, where programmers can directly communicate with the database for a quick response.
    Application architecture of DBMS :-
    o The DBMS design depends upon its architecture. The basic client/server architecture is used to deal
       with a large number of PCs, web servers, database servers and other components that are connected
       with networks.
    o DBMS architecture depends upon how users are connected to the database to get their request done.
 Types of DBMS Architecture - 1-tier architecture, 2-tier architecture and 3-tier architecture
    Database architecture can be seen as single-tier or multi-tier. Logically, however, database
    architecture is of two types: 2-tier architecture and 3-tier architecture.
2-Tier Architecture
   o The 2-Tier architecture is the same as the basic client-server architecture. In the two-tier
       architecture, applications on the client end can directly communicate with the database at the
       server side. For this interaction, APIs such as ODBC and JDBC are used.
   o The user interfaces and application programs run on the client side.
   o The server side is responsible for providing functionality such as query processing and transaction
       management.
   Schema:- The schema defines the tables, the attributes along with their size and type, and the
    relationships between attributes (columns) and tables. The overall design of the database is called
    the database schema.
   Database Instance:- A database changes over time as information is inserted and deleted. The
    collection of information stored in the database at a particular moment is called a database instance.
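The distinction between schema and instance can be seen by inspecting both before and after an insert. A minimal sketch with sqlite3; the `student` table and the helper names `schema` and `instance` are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")

def schema(conn):
    """Column names and declared types: the design, which rarely changes."""
    return [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]

def instance(conn):
    """The rows stored at this moment: the content, which changes often."""
    return conn.execute("SELECT id, name FROM student ORDER BY id").fetchall()

schema_before, instance_before = schema(conn), instance(conn)
conn.execute("INSERT INTO student VALUES (1, 'Sara')")
schema_after, instance_after = schema(conn), instance(conn)

print(schema_before == schema_after)      # the schema is unchanged
print(instance_before, instance_after)    # the instance has changed
```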
Levels of database architecture:
1. Internal Level/Physical level :-The internal level has an internal schema which describes the physical
storage structure of the database.
2. Conceptual Level :The conceptual schema describes the design of a database at the conceptual level.
       Conceptual level is also known as logical level.
    o The conceptual schema describes the structure of the whole database.
    o In the conceptual level, internal details such as an implementation of the data structure are hidden.
    o Programmers and database administrators work at this level.
3. External Level / View Level / View Schema: At the external level, a database contains several schemas that
are sometimes called subschemas. A subschema is used to describe a different view of the database.
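An external-level subschema is commonly realized as a view over the conceptual-level tables. The sketch below assumes SQLite; the view name `employee_directory` is hypothetical and simply hides the salary column from one group of users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the full table.
conn.execute("CREATE TABLE employee (name TEXT, designation TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Sanjana", "Clerk", 19000), ("Sana", "Data Analyst", 50000)],
)

# External level: a view that exposes only part of the data.
conn.execute(
    "CREATE VIEW employee_directory AS SELECT name, designation FROM employee"
)

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY name")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```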
   Data Model:- A data model is the modeling of the data description, data semantics and consistency
    constraints of the data.
• A data model provides the conceptual tools for describing the design of a database at each level of data
   abstraction.
• A data model can also be defined as a collection of high-level data description constructs that hide many
   low-level storage details.
 There are mainly three types of data model
1) Object Based Data Model :- It is used to describe the data at the logical and view levels. Object based
data models provide flexible structuring capabilities and allow data constraints to be specified.
B) Object Oriented Data Model :- In an object oriented model, information or data is represented as an object,
and these objects store their values in instance variables. In this model, object oriented programming
concepts are used.
This model works with object oriented programming languages like Python, Java etc. It was developed in the
1980s.
2) Record Based Data Model:- It is used to describe data at the logical and view levels.
• This data model is used to specify the overall logical structure of the database and to provide a
   higher-level description of it.
 There are three types of record based data model-
B) Network Data Model :- In the network data model, data is organized into a graph, and a node can have more
than one parent node. It permits the modeling of many-to-many relationships in data.
(Figure: example network with the nodes Store, Order and Items.)
C) Hierarchical Data Model:- The hierarchical data model organizes data in a tree structure.
In this model each entity has only one parent and may have many children. There is only one entity in this
model without a parent, which we call the root.
(Figure: example tree with the root College and the child node Department Information.)
3)Physical Data model:-This data model is used to describe the data at low level
   DBMS Interfaces:- A database management system (DBMS) interface is a user interface that allows
    users to input queries to a database without using the query language itself.
 User-friendly interfaces provided by DBMS may include the following:
      Menu-Based Interfaces
      Forms-Based Interfaces
       Graphical User Interfaces
       Natural Language Interfaces
       Speech Input and Output Interfaces
       Interfaces for Parametric Users
       Interfaces for the Database Administrator (DBA)
1)   Menu-Based Interfaces:These interfaces present the user with lists of options (called menus) that lead
     the user through the formation of a request. The basic advantage of using menus is that they remove the
     tension of remembering specific commands and syntax of any query language.
2)   Forms-Based Interfaces: A forms-based interface displays a form to each user. Users can fill out all of
     the form entries to insert new data, or they can fill out only certain entries, in which case the DBMS will
     retrieve matching data for the remaining entries. Many DBMSs have form specification
     languages, which are special languages that help specify such forms.
3)   Graphical User Interface:A GUI typically displays a schema to the user in diagrammatic form. The user
     then can specify a query by manipulating the diagram. In many cases, GUI utilize both menus and forms.
     Most GUI use a pointing device such as a mouse, to pick a certain part of the displayed schema diagram.
4)   Natural Language Interfaces:These interfaces accept requests written in English or some other
     language and attempt to understand them. A Natural language interface has its own schema, which is
     similar to the database conceptual schema .
5)   Speech Input and Output Interfaces: The use of speech, whether for a query or for the answer to a
     request, is still limited but is becoming commonplace. Applications with a limited vocabulary,
     such as inquiries for a telephone directory, flight arrival/departure and bank account information,
     allow speech for input and output so that ordinary users can access this information.
     The speech input is detected using predefined words and used to set up the parameters that are supplied
     to the queries. For output, a similar conversion from text or numbers into speech takes place.
6)   Interface for Parametric Users:Interfaces for Parametric Users contain some commands that can be
     handled with a minimum of keystrokes. It is generally used in bank transactions for transferring money.
     These operations are performed repeatedly.
7)   Interfaces for Database Administrators (DBA):- Most database systems contain privileged commands
     that can be used only by the DBA's staff. These include commands for creating accounts, setting system
     parameters etc.
 Components of ER Diagram :-
1. Entity: It is a thing or object in the real world that is distinguishable from all other objects.
• Anything about which we store information is called an entity.
Entity Set :- It is a set of entities of the same type that share the same properties or attributes.
• An entity set can be represented as a rectangle.
1) Weak Entity Set :- An entity that depends on another entity is called a weak entity. The weak entity doesn't
contain any key attribute of its own. The weak entity is represented by a double rectangle.
2) Strong Entity Set :- A strong entity set is an entity set that contains sufficient attributes to uniquely
identify all its entities.
• A primary key exists for a strong entity set.
• A single rectangle is used to represent a strong entity set.
Types of attributes :-
1) Simple attribute : An attribute that cannot be further subdivided into components is a simple
   attribute. It is represented by an ellipse.
   Example: The roll number (Roll no) of a student, the id number of an employee.
2) Composite attribute : An attribute that can be split into components is a composite attribute.
   A composite attribute is represented by an ellipse, and the ellipses of its components are connected
   to that ellipse.
   Example: A name can be split into First Name and Last Name.
3) Multi-valued attribute : An attribute that can have more than one value is known as a
multi-valued attribute. A double ellipse is used to represent a multi-valued attribute.
Example: A student can have more than one phone number (Phone no).
4) Derived attribute : An attribute that can be derived from other attributes is a derived attribute.
It is represented by a dashed ellipse.
Example: A person's age changes over time and can be derived from another attribute like date of birth.
5) Key attribute : The key attribute is used to represent the main characteristic of an entity. It represents a
primary key. The key attribute is represented by an ellipse with the text underlined.
Example: Student-ID.
6)Single-valued attribute : The attribute which takes up only a single value for each entity instance is a
single-valued attribute.
Example: The age of a student.
7) Complex attribute : Attributes that are formed by nesting composite and multi-valued
attributes are called complex attributes. These attributes are rarely used in DBMS (Database
Management System).
8) Stored attribute : Stored attributes are those whose values are stored directly in the database and
do not need to be computed from other attributes.
Example: DOB (date of birth) is a stored attribute.
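The attribute types above can be mirrored in a small Python data structure. This is only an illustrative sketch: the class names, the surname and the phone numbers are hypothetical, and `age` takes the current date as a parameter so that the result is reproducible.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Name:
    """Composite attribute: it can be split into components."""
    first: str
    last: str

@dataclass
class Student:
    student_id: int                # key attribute (simple and single-valued)
    name: Name                     # composite attribute
    date_of_birth: date            # stored attribute
    phone_numbers: list = field(default_factory=list)   # multi-valued attribute

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month, self.date_of_birth.day)
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)

s = Student(1, Name("Sana", "Roy"), date(2004, 6, 15), ["555-0100", "555-0101"])
print(s.age(date(2024, 6, 15)))
```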
 Relationship/Mapping constraints :-
Relationship:- A relationship is used to describe the relation between entities. A diamond or rhombus is used
to represent the relationship.
a. One-to-one relationship:- When only one instance of each entity is associated with the relationship, then it
is known as a one-to-one relationship.
For example, a female can marry one male, and a male can marry one female.
b. One-to-many relationship:- When only one instance of the entity on the left, and more than one instance
of the entity on the right, are associated with the relationship, then this is known as a one-to-many
relationship.
For example, a scientist can invent many inventions, but each invention is made by only one specific scientist.
c. Many-to-one relationship: When more than one instance of the entity on the left, and only one instance of
the entity on the right, are associated with the relationship, then it is known as a many-to-one relationship.
For example, a student enrolls for only one course, but a course can have many students.
d. Many-to-many relationship: When more than one instance of the entity on the left, and more than one
instance of the entity on the right, are associated with the relationship, then it is known as a many-to-many
relationship.
For example, an employee can be assigned to many projects, and a project can have many employees.
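In a relational database, a many-to-many relationship such as employee and project is usually stored in a third table that pairs the two keys. A minimal sketch with sqlite3; the table name `assignment`, the column names and the sample rows are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project  (proj_id INTEGER PRIMARY KEY, title TEXT);
    -- Each row records one (employee, project) pair.
    CREATE TABLE assignment (
        emp_id  INTEGER REFERENCES employee(emp_id),
        proj_id INTEGER REFERENCES project(proj_id),
        PRIMARY KEY (emp_id, proj_id)
    );
    INSERT INTO employee VALUES (1, 'Sara'), (2, 'Raj');
    INSERT INTO project  VALUES (10, 'Payroll'), (20, 'Website');
    INSERT INTO assignment VALUES (1, 10), (1, 20), (2, 10);
""")

# One employee works on many projects ...
projects_of_sara = [r[0] for r in conn.execute(
    "SELECT p.title FROM project p JOIN assignment a ON a.proj_id = p.proj_id "
    "WHERE a.emp_id = 1 ORDER BY p.title")]

# ... and one project has many employees.
staff_on_payroll = [r[0] for r in conn.execute(
    "SELECT e.name FROM employee e JOIN assignment a ON a.emp_id = e.emp_id "
    "WHERE a.proj_id = 10 ORDER BY e.name")]

print(projects_of_sara, staff_on_payroll)
```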
   Notation of E-R Diagram :- A database can be represented using notations. In an ER diagram, many
    notations are used to express the cardinality. These notations are as follows:
   Construct an E-R diagram for a hospital with a set of patients and a set of medical doctors.
 Keys in DBMS
Keys:-
    o A key is an attribute, or set of attributes, whose value can always be used to uniquely identify an
        object instance.
    o It is used to uniquely identify any record or row of data in a table. It is also used to establish and
        identify relationships between tables.
For example, ID is used as a key in the Student table because it is unique for each student. In the PERSON
table, passport_number, license_number and SSN are keys since they are unique for each person.
Types of keys:
1. Primary key
    o The primary key can be defined as the candidate key that is chosen by the database designer
       as the principal means of identifying entities within an entity set.
    o It is a unique key.
    o It can identify only one tuple (a record) at a time.
    o It has no duplicate values; it has unique values.
    o It cannot be NULL.
    o A primary key need not be a single column; more than one column can together form the primary
       key of a table.
2. Candidate key
    o A candidate key is an attribute or minimal set of attributes that can uniquely identify a tuple.
    o The primary key is chosen from the candidate keys. The remaining candidate keys are as strong as
       the primary key.
For example: In the EMPLOYEE table, id is best suited for the primary key. The other identifying attributes,
like SSN, Passport_Number, License_Number, etc., are also candidate keys.
3. Super Key: A super key is a set of one or more attributes that, taken collectively, allow us to uniquely
identify an entity in the entity set.
For example: In the above EMPLOYEE table, for (EMPLOYEE_ID, EMPLOYEE_NAME), the names of two
employees can be the same, but their EMPLOYEE_ID can't be the same. Hence, this combination can also be a
key.
The super keys would be (EMPLOYEE_ID), (EMPLOYEE_ID, EMPLOYEE_NAME), etc.
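The relation between super keys and candidate keys (a candidate key is a super key with no unnecessary attribute) can be sketched in Python. Note the assumption: a key is really a property of the table design, so checking a handful of sample rows, as done here, can only suggest keys, not prove them. The rows, attribute names and helper functions are hypothetical.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """attrs is a super key for these rows if no two rows agree on all of attrs."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(rows)

def candidate_keys(rows, attributes):
    """Minimal super keys: super keys that contain no smaller super key."""
    found = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            contains_smaller_key = any(set(key) <= set(attrs) for key in found)
            if is_superkey(rows, attrs) and not contains_smaller_key:
                found.append(attrs)
    return found

employees = [
    {"employee_id": 1, "employee_name": "Sara", "ssn": "111"},
    {"employee_id": 2, "employee_name": "Sara", "ssn": "222"},
    {"employee_id": 3, "employee_name": "Raj", "ssn": "333"},
]

print(is_superkey(employees, ("employee_id", "employee_name")))
print(candidate_keys(employees, ["employee_id", "employee_name", "ssn"]))
```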
4. Foreign key
    o A foreign key is a column whose values are the same as the primary key values of another table.
    o It links two or more relations (tables) at a time.
    o It acts as a cross-reference between the tables.
    o A foreign key is the column of a table used to point to the primary key of another table.
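How a foreign key acts as a cross-reference can be sketched with sqlite3. The `department` and `employee` tables are hypothetical, and the `PRAGMA foreign_keys = ON` line is specific to SQLite, which does not enforce foreign keys unless asked to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite-specific: enable enforcement

conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES department(dept_id)   -- foreign key
    )
""")

conn.execute("INSERT INTO department VALUES (1, 'Accounts')")
conn.execute("INSERT INTO employee VALUES (100, 'Sanjana', 1)")   # valid reference

# A reference to a department that does not exist is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (101, 'Sana', 99)")
    dangling_allowed = True
except sqlite3.IntegrityError:
    dangling_allowed = False

print(dangling_allowed)
```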
5. Alternate key
There may be one or more attributes or a combination of attributes that uniquely identify each tuple in a
relation. These attributes or combinations of attributes are called the candidate keys. One key is chosen
as the primary key from these candidate keys, and each remaining candidate key, if any exists, is termed an
alternate key. In other words, the total number of alternate keys is the total number of candidate keys
minus the primary key. An alternate key may or may not exist. If there is only one candidate key in a
relation, it does not have an alternate key.
For example, the employee relation has two attributes, Employee_Id and PAN_No, that act as candidate keys. In
this relation, Employee_Id is chosen as the primary key, so the other candidate key, PAN_No, acts as the
alternate key.
6. Composite key
Whenever a primary key consists of more than one attribute, it is known as a composite key. This key is also
known as a concatenated key.
For example- In a student table with attributes (s_roll no, s_ID, s_name, s_branch), the composite key is
(s_roll no, s_ID).
7. Artificial key
Keys created using arbitrarily assigned data are known as artificial keys. These keys are created when a
primary key is large and complex and has no relationship with many other relations. The data values of
artificial keys are usually numbered in serial order.
For example, the primary key which is composed of Emp_ID, Emp_role and Proj_ID is large in the employee
relation. So it would be better to add a new virtual attribute to identify each tuple in the relation uniquely.
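The alternate-key example (Employee_Id chosen as primary key, PAN_No left as the other candidate key) can be sketched with a UNIQUE constraint, which is one common way to declare an alternate key. SQLite is assumed as the DBMS and the PAN value is a made-up placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,     -- the chosen primary key
        pan_no      TEXT NOT NULL UNIQUE,    -- the alternate key
        name        TEXT
    )
""")
conn.execute("INSERT INTO employee VALUES (1, 'ABCDE1234F', 'Sara')")

# The alternate key is still a key: a duplicate PAN_No is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (2, 'ABCDE1234F', 'Raj')")
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False

print(duplicate_accepted)
```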
 DBMS Generalization :-
    o Generalization is like a bottom-up approach in which two or more entities of lower level combine to
       form a higher level entity if they have some attributes in common.
    o In generalization, an entity of a higher level can also combine with the entities of the lower level to
       form a further higher level entity.
    o Generalization is more like subclass and superclass system, but the only difference is the approach.
       Generalization uses the bottom-up approach.
    o In generalization, entities are combined to form a more generalized entity, i.e., subclasses are
       combined to make a superclass.
For example, Faculty and Student entities can be generalized to create a higher level entity Person.
   DBMS Specialization :-
   o Specialization is a top-down approach, and it is opposite to Generalization. In specialization, one
      higher level entity can be broken down into two or more lower level entities.
   o Specialization is used to identify the subset of an entity set that shares some distinguishing
      characteristics.
   o Normally, the superclass is defined first, the subclass and its related attributes are defined next, and
      relationship set are then added.
For example: In an Employee management system, EMPLOYEE entity can be specialized as TESTER or
DEVELOPER based on what role they play in the company.
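[Figure: Person linked by "Is A" to Employee (attribute Salary) and Customer (attribute Credit-rating). Read
bottom-up the diagram shows Generalization; read top-down it shows Specialization.]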
    DBMS Aggregation :-
    o Aggregation is a technique to express a relationship among relationships.
    o Basic E-R modeling cannot express a relationship among relationships, so the concept of aggregation
       is used for this purpose.
    o Aggregation is an abstraction through which relationships are treated as entities.
    o In aggregation, the relation between two entities is treated as a single entity.
    o In aggregation, a relationship with its corresponding entities is aggregated into a higher level entity.
For example: the Center entity offering the Course entity acts as a single entity, which is in a
relationship with another entity, Visitor. In the real world, a visitor to a coaching center will
never enquire about the Course only or just about the Center; instead, he will enquire about both.
   DBMS Architecture/Structure/Component :-A database system is partitioned into modules that deal
    with each of the responsibilities of the overall system.
DBMS Architecture is divided into 4 parts-
1) DBMS Users
2) Query Processor
3) Storage Manager/Processor
4) Disk Storage.
1) DBMS User: Database users are categorized based on their interaction with the database.
A) Naive/End Users :-End users are unsophisticated users who don't have any DBMS knowledge but
frequently use database applications in their daily life to get the desired results.
For example- Railway ticket booking users are naive users. Clerks in any bank are naive users because they
do not have any DBMS knowledge but they still use the database and perform their given tasks.
B) Application Programmers :-Application programmers are the back-end programmers who write the code for
the application programs. They are computer professionals. These programs could be written in
programming languages such as .NET, C, C++, Java etc.
C) Sophisticated Users :-Sophisticated users can be engineers, scientists or business analysts who are
familiar with the database. They can develop their own database applications according to their requirements.
They don't write program code; they interact with the database by writing SQL queries directly through the
query processor.
D) Database Administrator (DBA) :-The DBA is a person/team who defines the schema and also controls the
3 levels of the database.
• The DBA creates a new account id and password for a user who needs to access the database.
• The DBA is also responsible for the security of the database and allows only authorized users to
  access/modify the database.
• The DBA monitors recovery and backup and provides technical support.
• The DBA has a DBA account in the DBMS, which is called a system or superuser account.
• The DBA repairs damage caused by hardware and/or software failures.
2) Query Processor :-It interprets the requests (queries) received from the end user via an application program
into instructions. It also executes the user request which is received from the DML compiler.
a) DDL Compiler : The DDL statements are sent to the DDL compiler, which converts these statements to a set of
tables. These tables contain the meta data concerning the database and are in a form that can be used by
other components of the DBMS.
b) DML pre-compiler and Query Processor :-The DML pre-compiler converts the DML statements embedded
in an application program to normal procedure calls in the host language.
   I)   DDL interpreter : It processes the DDL statements into a set of tables containing meta data (data about
                       data).
   II)  DML Compiler :-It processes the DML statements into low-level instructions (machine language) so that
                       they can be executed.
   III) Query Evaluation Engine :-It executes the low-level instructions generated by the DML compiler.
3) Storage Manager/Processor :-The storage manager is a program that provides an interface between the data
   stored in the database and the queries received. It is also known as the database control system. It maintains
   the consistency and integrity of the database by applying the constraints and executes the DCL statements.
   It is responsible for updating, storing, deleting and retrieving data in the database.
    o   The key attribute of an entity type is represented by the primary key:-In the given ER diagram,
        COURSE_ID, STUDENT_ID, SUBJECT_ID, and LECTURE_ID are the key attributes of the entities.
    o   A multi valued attribute is represented by a separate table:-In the student table, hobby is a
        multi valued attribute, and it is not possible to represent multiple values in a single column of the
        STUDENT table. Hence we create a table STUD_HOBBY with columns STUDENT_ID and HOBBY.
        Using both columns, we create a composite key.
    o   Derived attributes are not considered in the table:-In the STUDENT table, Age is a derived
        attribute. It can be calculated at any point of time by calculating the difference between the current date
        and Date of Birth.
   Using these rules, you can convert the ER diagram to tables and columns and assign the mapping
    between the tables. Table structure for the given ER diagram is as below:
Relational instance: In the relational database system, the relational instance is represented by a finite set
of tuples. Relation instances do not have duplicate tuples.
Relational schema: A relational schema contains the name of the relation and name of all columns or
attributes.
Relational key: A relational key is a set of one or more attributes that can identify a row in the
relation uniquely.
   Integrity Constraints :-
    o   Integrity constraints are a set of rules. They are used to maintain the quality of information.
    o   Integrity constraints ensure that the data insertion, updating, and other processes have to be
        performed in such a way that data integrity is not affected.
1. Domain constraints
    o Domain constraints can be defined as the definition of a valid set of values for an attribute.
    o The data type of domain includes string, character, integer, time, date, currency, etc. The value of the
      attribute must be available in the corresponding domain.
Example:
4. Key constraints
    o Keys are the attributes that are used to identify an entity within its entity set uniquely.
    o An entity set can have multiple keys, out of which one key will be the primary key. A primary key
       must contain unique values and cannot contain a null value in the relational table.
Example:
   Relational Algebra
Relational algebra is a procedural query language. It gives a step by step process to obtain the result of the
query.Relational algebra mainly provides theoretical foundation for relational databases and SQL. It uses
operators to perform queries.
Types of Relational operation
    Notation: σ p(r)
Where: σ is used for the selection predicate
        r is used for the relation
        p is used as a propositional logic formula which may use connectors like AND, OR and NOT. The
        formula can use relational operators like =, ≠, ≥, <, >, ≤.
For example: Student
      Name      Roll No    Address
      Sana      02         Purulia
      Sara      04         Bakura
      Deb       08         Delhi
      Raj       13         Bombay
Query 1 : Give all information of the student whose roll no is 04.
Solution : σ roll no=04(student)
Query 2 : Find all information of the student whose name is deb and whose address is Delhi.
Solution : σ Name="deb" and address="Delhi"(student)
If either condition is enough, use or instead of and:
           σ (Name="deb") V (address="Delhi")(student)                  [V = or]
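The selection operation can be sketched in Python as a filter over rows. This is only an illustration: the function name select and the representation of the Student relation as a list of dictionaries (with roll numbers stored as integers) are assumptions, not part of relational algebra itself.

```python
# The Student relation from the example, held as a list of rows (dicts).
student = [
    {"name": "Sana", "roll_no": 2, "address": "Purulia"},
    {"name": "Sara", "roll_no": 4, "address": "Bakura"},
    {"name": "Deb", "roll_no": 8, "address": "Delhi"},
    {"name": "Raj", "roll_no": 13, "address": "Bombay"},
]

def select(relation, predicate):
    """Selection: keep only the rows for which the predicate p is true."""
    return [row for row in relation if predicate(row)]

# Query 1: roll no = 04
query1 = select(student, lambda r: r["roll_no"] == 4)

# Query 2: name = Deb AND address = Delhi
query2 = select(student, lambda r: r["name"] == "Deb" and r["address"] == "Delhi")

# The OR form of Query 2
query2_or = select(student, lambda r: r["name"] == "Deb" or r["address"] == "Delhi")

print(query1)
print(query2)
```

Selection never changes the columns of a relation; it only decides which rows are kept.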
3. Union Operation(∪):
     o It performs binary union between two given relations and is defined as R ∪ S,
     where R and S are either database relations or relation result sets (temporary relations).
     Notation (set difference): R - S
Example - ∏ name (student 1) - ∏ name (student 2)
6. Cartesian product:
    o The Cartesian product is used to combine each row in one table with each row in the other table. It is
       also known as a cross product.
    o It is denoted by X.
      Notation: E X D
      Where E and D are relations and their output will be defined as-
                   E X D = {q t | q ∈ E and t ∈ D}
Example:
               σ Name='Kamal' (Student 1 X Student 2)
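A small sketch of the Cartesian product using itertools.product from the Python standard library. The contents of the two relations E and D are invented for illustration, since the rows of Student 1 and Student 2 are not listed here.

```python
from itertools import product

# Hypothetical relations, invented for illustration only.
E = [("Kamal", 1), ("Sana", 2)]
D = [("DBMS",), ("Java",), ("C",)]

# E X D: every row of E is paired with every row of D.
cross = [e + d for e, d in product(E, D)]

# A selection applied on top of the product, in the style of the example above.
kamal_rows = [row for row in cross if row[0] == "Kamal"]

print(len(cross))
print(kamal_rows)
```

If E has m rows and D has n rows, the product has m x n rows, which is why a selection is usually applied to it straight away.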
7.   Rename Operation:The rename operation is used to rename the output relation. It is denoted by rho (ρ).
Example: We can use the rename operator to rename STUDENT relation to STUDENT1.
ρ(STUDENT1, STUDENT)
                                              Normalization
   Functional Dependency :-The functional dependency is a relationship that exists between two attributes.
    It typically exists between the primary key and non-key attribute within a table.
                                   X → Y
The left side of an FD is known as the determinant, and the right side is known as the dependent.
For example: Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.
Here the Emp_Id attribute can uniquely identify the Emp_Name attribute of the employee table because if we know
the Emp_Id, we can tell the employee name associated with it.
Functional dependency can be written as: Emp_Id → Emp_Name
We can say that Emp_Name is functionally dependent on Emp_Id.
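Whether a set of rows is consistent with a functional dependency X → Y can be checked mechanically: no two rows may agree on X and differ on Y. The sketch below assumes a hypothetical helper named holds and invented employee rows. Passing the check on sample data does not prove that the dependency is a rule of the schema; it only shows that the data does not contradict it.

```python
def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for this lhs value; any later mismatch breaks the FD.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Invented sample rows for the employee table.
employee = [
    {"Emp_Id": 1, "Emp_Name": "Sana", "Emp_Address": "Purulia"},
    {"Emp_Id": 2, "Emp_Name": "Sara", "Emp_Address": "Delhi"},
    {"Emp_Id": 3, "Emp_Name": "Sana", "Emp_Address": "Delhi"},
]

id_determines_name = holds(employee, ["Emp_Id"], ["Emp_Name"])
name_determines_id = holds(employee, ["Emp_Name"], ["Emp_Id"])
print(id_determines_name, name_determines_id)
```

Here Emp_Id → Emp_Name is satisfied, while Emp_Name → Emp_Id is not, because two employees share the name Sana.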
Types of Functional dependency
   1. Trivial functional dependency:-
       o A → B has trivial functional dependency if B is a subset of A.
       o The following dependencies are also trivial like: A → A, B → B
   S.K             ABEF = {A,B,C,D,E,F}
                   A→C, C→D, D→B give A→B (transitive dependency), so B can be removed
   S.K             AEF = {A,B,C,D,E,F}
                   E→F, so F can be removed
   Candidate key   AE = {A,B,C,D,E,F}
   Prime attributes = (A,E). No prime attribute appears on the right hand side of any functional dependency, so
   the relation has only one candidate key (AE).
Problem-2: Find the possible candidate keys of the relation R(A,B,C,D) with functional dependencies
A→B, B→C, C→A.
Solution: S.K       ABCD → {A,B,C,D}
            S.K      ACD → {A,B,C,D}
            S.K      AD → {A,B,C,D}
AD is a candidate key, so the prime attributes are A and D.
The prime attribute A appears on the right side of the functional dependency C→A, so CD is another
candidate key.
Again, C appears on the right side of the functional dependency B→C, so BD is another candidate key.
So candidate keys = AD, CD, BD
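The result of Problem-2 can be cross-checked with a short brute-force sketch. The function names closure and candidate_keys are assumptions made for this illustration, and trying every attribute combination is only practical for small relations like this one.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: everything that attrs determines under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Brute force: minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # a superset of a known key is not minimal
            if closure(combo, fds) == set(attributes):
                keys.append(combo)
    return keys

# Problem-2: R(A, B, C, D) with A -> B, B -> C, C -> A
fds = [("A", "B"), ("B", "C"), ("C", "A")]
keys = candidate_keys("ABCD", fds)
print(keys)
```

D never appears on the right side of any dependency, so every key must contain D; the closure test then confirms that adding any one of A, B or C is enough.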
    What is Normalization?
    o Normalization is the process of organizing the data in the database.
    o Normalization is used to minimize the redundancy from a relation or set of relations. It is also used to
       eliminate undesirable characteristics like Insertion, Update, and Deletion Anomalies.
    o Normalization divides the larger table into smaller and links them using relationships.
    o The normal form is used to reduce redundancy from the database table
1NF (1st normal     A relation is in 1NF if every attribute contains only atomic values. It eliminates repeating
form)               groups.
2NF (2nd normal     A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent
form)               on the primary key. It eliminates partial functional dependency.
3NF (3rd normal     A relation is in 3NF if it is in 2NF and no transitive dependency exists. It eliminates
form)               transitive dependency.
BCNF (3.5 normal    A stronger definition of 3NF is known as Boyce Codd's normal form. It is also called 3.5 NF.
form)
4NF (4th normal     A relation is in 4NF if it is in Boyce Codd's normal form and has no multi-valued
form)               dependency. It eliminates multi-valued dependency.
5NF (5th normal     A relation is in 5NF if it is in 4NF and does not contain any join dependency; joining should be
form)               lossless. It eliminates join dependency.
    Advantages of Normalization
    o Normalization helps to minimize data redundancy.
    o Greater overall database organization.
    o Data consistency within the database.
    o Much more flexible database design.
    o Enforces the concept of relational integrity.
    Disadvantages of Normalization
    o You cannot start building the database before knowing what the user needs.
    o It is very time-consuming and difficult to normalize relations of a higher degree.
    o Careless decomposition may lead to a bad database design, leading to serious problems.
In the given table, non-prime attribute TEACHER_AGE is dependent on TEACHER_ID which is a proper subset
of a candidate key. That's why it violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:
           Depart.Name     Depart.location
           Accounts        102
           Sales           104
           Store           106
1.         EMP_ID → EMP_COUNTRY
2.         EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
     Candidate key: {EMP-ID, EMP-DEPT}
     The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.
     To convert the given table into BCNF, we decompose it into three tables:
     EMP_COUNTRY table:
                               Emp-id          Emp-Country
                                  264                 India
                                  364                 UK
     EMP_DEPT table:
                        Emp-Dept           Dept_Type          Emp_Dept No
                        Designing          D394               283
                        Testing            D394               300
                        Stored             D283               232
                        Developing         D283               549
     EMP_DEPT_MAPPING table:
                        Emp_ID             Emp_Dept
                        D394               283
                        D394               300
                        D283               232
                        D283               549
     Functional dependencies:
1.         EMP_ID → EMP_COUNTRY
2.          EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
     Candidate keys:
     For the first table: EMP_ID
     For the second table: EMP_DEPT
     For the third table: {EMP_ID, EMP_DEPT}
     Now, this is in BCNF because left side part of both the functional dependencies is a key.
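The BCNF test used above (the left side of every non-trivial functional dependency must be a super key) can be sketched as follows. The helper names closure and bcnf_violations are assumptions made for this illustration; the attribute and dependency names are the ones listed above.

```python
def closure(attrs, fds):
    """Attribute closure: everything that attrs determines under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Non-trivial FDs whose left side is not a super key of the relation."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(attributes)
    ]

attrs = ["EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"]
fds = [
    (["EMP_ID"], ["EMP_COUNTRY"]),
    (["EMP_DEPT"], ["DEPT_TYPE", "EMP_DEPT_NO"]),
]

# The original table: both dependencies violate BCNF.
original = bcnf_violations(attrs, fds)

# After decomposition, each table keeps only the dependency that fits inside it.
emp_country = bcnf_violations(["EMP_ID", "EMP_COUNTRY"], [fds[0]])
emp_dept = bcnf_violations(["EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"], [fds[1]])
print(len(original), emp_country, emp_dept)
```

The mapping table {EMP_ID, EMP_DEPT} carries no non-trivial dependency, so it is in BCNF as well.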
     The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities. Hence, there is
     no relationship between COURSE and HOBBY.
     In the STUDENT relation, a student with STU_ID, 21 contains two courses, Computer and Math and two
     hobbies, Dancing and Singing. So there is a Multi-valued dependency on STU_ID, which leads to unnecessary
     repetition of data.
     So to make the above table into 4NF, we can decompose it into two tables.
     STUDENT_COURSE
                                  Stu-Id    Course
                                  21        Computer
                                  21        Math
                                  34        Chemistry
                                  74        Biology
                                  59        Physics
STUDENT_HOBBY
             Stu-Id               Hobby
                    21            Dancing
                    21            Singing
                    34            Dancing
                    74            Cricket
                    59            Hockey
The given table is not in 4NF or 5NF. First we convert it to 4NF by splitting it into two sub-tables.
Table-1 Faculty-Subject
               Faculty     Subject
               Sana        DBMS
               Sana        Java
               Sana        C
Table-2 Faculty -committee:-
                 Faculty       Committee
                 Sana          Placement
                 Sana          Scholarship
To check for 5NF, we join Table 1 and Table 2. If the join gives the same result as the original table
(Faculty), then it is in 5NF; otherwise it is not in 5NF.
                                  Table 1 + Table 2
          Faculty          Subject        Committee
          Sana             DBMS           Placement
          Sana             DBMS           Scholarship
          Sana             Java           Placement
          Sana             Java           Scholarship
          Sana             C              Placement
          Sana             C              Scholarship
The result is equal to the original table, so it is in 5NF.
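The join of Table 1 and Table 2 can be reproduced with a few lines of Python. This is a sketch that represents each table as a list of tuples and joins them on the shared Faculty column; the variable names are assumptions.

```python
faculty_subject = [("Sana", "DBMS"), ("Sana", "Java"), ("Sana", "C")]
faculty_committee = [("Sana", "Placement"), ("Sana", "Scholarship")]

# Natural join on the shared Faculty column.
joined = {
    (faculty, subject, committee)
    for faculty, subject in faculty_subject
    for other_faculty, committee in faculty_committee
    if faculty == other_faculty
}

for row in sorted(joined):
    print(row)
```

Three subjects combined with two committees give the six rows listed in the joined table above.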
   Relational Decomposition:-
   o When a relation in the relational model is not in appropriate normal form then the decomposition of a
      relation is required.
   o In a database, it breaks the table into multiple tables.
   o If the relation has no proper decomposition, then it may lead to problems like loss of information.
   o Decomposition is used to eliminate some of the problems of bad design like anomalies, inconsistencies,
      and redundancy.
Types of Decomposition
Lossless Decomposition
   o If the information is not lost from the relation that is decomposed, then the decomposition will be
      lossless.
   o The lossless decomposition guarantees that the join of the decomposed relations will result in the same
      relation that was decomposed.
   o The decomposition is said to be lossless if the natural join of all the decomposed relations gives the
      original relation.
Example: Emp-info
    Emp_ID    Emp_Name        Emp_Age   Emp_location    Dept_ID   Dept_Name
    E001      Sana            29        Hariduwar       Dpt1      Operation
    E002      Sara            32        Dehradun        Dpt2      HR
    E003      Deb             22        Delhi           Dpt3      Finance
Decompose the above table into two tables :
1) Emp Details
    Emp_ID    Emp_Name        Emp_Age   Emp_location
    E001      Sana            29        Hariduwar
    E002      Sara            32        Dehradun
    E003      Deb             22        Delhi
2) Dept Details
    Emp_ID    Dept_ID    Dept_Name
    E001      Dpt1       Operation
    E002      Dpt2       HR
    E003      Dpt3       Finance
Now, natural join is applied on the above two tables.
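The natural join of the two decomposed tables can be sketched in Python to confirm that nothing is lost. The rows follow the Emp-info example, with the employee IDs written here as E001, E002 and E003; the variable names are assumptions.

```python
emp_info = [
    ("E001", "Sana", 29, "Hariduwar", "Dpt1", "Operation"),
    ("E002", "Sara", 32, "Dehradun", "Dpt2", "HR"),
    ("E003", "Deb", 22, "Delhi", "Dpt3", "Finance"),
]

# Decomposition: both tables keep the common attribute Emp_ID.
emp_details = [row[:4] for row in emp_info]               # Emp_ID, Emp_Name, Emp_Age, Emp_location
dept_details = [(row[0],) + row[4:] for row in emp_info]  # Emp_ID, Dept_ID, Dept_Name

# Natural join on Emp_ID.
rejoined = [
    emp + dept[1:]
    for emp in emp_details
    for dept in dept_details
    if emp[0] == dept[0]
]

lossless = sorted(rejoined) == sorted(emp_info)
print(lossless)
```

The join recovers the original rows because the common attribute Emp_ID is a key of both decomposed tables.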
   Dependency Preserving
   o It is an important constraint of the database.
   o In dependency preservation, every dependency must be satisfied by at least one decomposed table.
   o If a relation R is decomposed into relations R1 and R2, then the dependencies of R either must be a part
       of R1 or R2 or must be derivable from the combination of functional dependencies of R1 and R2.
   o For example, suppose there is a relation R (A, B, C, D) with functional dependency set (A->BC). The
       relation R is decomposed into R1(ABC) and R2(AD), which is dependency preserving because the FD
       A->BC is a part of relation R1(ABC).
 Lossy Decomposition :When a relation is decomposed into two or more relational schemas, the loss of
    information is unavoidable when the original relation is retrieved.
Example-Emp Info
    Emp_ID    Emp_Name        Emp_Age   Emp_location    Dept_ID   Dept_Name
    E001      Sana            29        Hariduwar       Dpt1      Operation
    E002      Sara            32        Dehradun        Dpt2      HR
    E003      Deb             22        Delhi           Dpt3      Finance
Decompose the above table into two tables :
< Emp Details >
    Emp_ID    Emp_Name      Emp_Age    Emp_location
    E001      Sana          29         Hariduwar
    E002      Sara          32         Dehradun
    E003      Deb           22         Delhi
       Transaction property:-The transaction has the four properties. These are used to maintain
       consistency in a database, before and after the transaction.
   Property of Transaction
      1. Atomicity
      2. Consistency
      3. Isolation
      4. Durability
   1)Atomicity:-
      o It states that either all operations of the transaction take place, or the transaction is aborted.
      o There is no midway, i.e., the transaction cannot occur partially. Each transaction is treated as one unit
          and either run to completion or is not executed at all.
   Atomicity involves the following two operations:
   Abort: If a transaction aborts then all the changes made are not visible.
   Commit: If a transaction commits then all the changes made are visible.
   Example: Let's assume a transaction T consisting of T1 and T2. A holds Rs 600 and B holds Rs 300.
   Transfer Rs 100 from account A to account B.
   T1                                             T2
   Read(A)                                        Read(B)
   A:= A-100                                      B:= B+100
   Write(A)                                       Write(B)
     After completion of the transaction, A consists of Rs 500 and B consists of Rs 400.
     If the transaction T fails after the completion of transaction T1 but before completion of transaction T2, then
     the amount will be deducted from A but not added to B. This shows the inconsistent database state. In order
     to ensure correctness of database state, the transaction must be executed in entirety.
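The transfer example can be sketched with Python's built-in sqlite3 module, using commit and rollback to get the all-or-nothing behaviour. The account table, the transfer function and the simulated failure are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 600), ("B", 300)])
conn.commit()

def transfer(conn, amount, fail_midway=False):
    """Move amount from A to B; roll back everything if any step fails."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,))
        if fail_midway:
            raise RuntimeError("simulated failure between T1 and T2")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,))
        conn.commit()    # commit: both writes are saved together
    except RuntimeError:
        conn.rollback()  # abort: the deduction from A is undone

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

transfer(conn, 100, fail_midway=True)
after_failure = balances(conn)

transfer(conn, 100)
after_success = balances(conn)
print(after_failure, after_success)
```

After the failed attempt the balances are unchanged, and after the successful one the total of Rs 900 is still the same, only distributed differently.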
     2)Consistency
         o The integrity constraints are maintained so that the database is consistent before and after the
             transaction.
         o The execution of a transaction will leave a database in either its prior stable state or a new stable
             state.
         o The consistent property of database states that every transaction sees a consistent database instance.
         o The transaction is used to transform the database from one consistent state to another consistent
             state.
     For example: The total amount must be the same before and after the transaction.
      States of Transaction
     In a database, the transaction can be in one of the following states -
     1)Active state:-The active state is the first state of every transaction. In this state, the transaction is being
     executed.
         For example: Insertion or deletion or updating a record is done here. But all the records are still not saved
         to the database.
     2)Partially committed:-In the partially committed state, a transaction executes its final operation, but the
     data is still not saved to the database.
         o In the total mark calculation example, a final display of the total marks step is executed in this state.
     3)Committed:-A transaction is said to be in a committed state if it executes all its operations successfully. In
     this state, all the effects are now permanently saved on the database system.
     4)Failed state:-If any of the checks made by the database recovery system fails, then the transaction is said
     to be in the failed state.
         o In the example of total mark calculation, if the database is not able to fire a query to fetch the marks,
             then the transaction will fail to execute.
     5)Aborted:If any of the checks fail and the transaction has reached a failed state then the database recovery
     system will make sure that the database is in its previous consistent state. If not then it will abort or roll
     back the transaction to bring the database into a consistent state.
    o   If the transaction fails in the middle of its execution, all the operations executed so far are
        rolled back so that the database returns to the consistent state it had before the transaction started.
    o   After aborting the transaction, the database recovery module will select one of the two operations:
            1. Re-start the transaction
            2. Kill the transaction
   Schedule:-A series of operations from one transaction to another transaction is known as a schedule. It is
    used to preserve the order of the operations in each of the individual transactions.
1. Serial Schedule:-The serial schedule is a type of schedule where one transaction is executed completely
before starting another transaction. In the serial schedule, when the first transaction completes its cycle,
then the next transaction is executed.
For example: Suppose there are two transactions T1 and T2 which have some operations. If it has no
interleaving of operations, then there are the following two possible outcomes:
    1. Execute all the operations of T1, followed by all the operations of T2.
    2. Execute all the operations of T2, followed by all the operations of T1.
    o In the given (a) figure, Schedule A shows the serial schedule where T1 is followed by T2.
    o In the given (b) figure, Schedule B shows the serial schedule where T2 is followed by T1.
2. Non-serial Schedule
    o If interleaving of operations is allowed, then there will be non-serial schedule.
    o It contains many possible orders in which the system can execute the individual operations of the
       transactions.
    o In the given figure (c) and (d), Schedule C and Schedule D are the non-serial schedules. It has
       interleaving of operations.
3. Serializable schedule
    o The serializability of schedules is used to find non-serial schedules that allow the transaction to
       execute concurrently without interfering with one another.
    o It identifies which schedules are correct when executions of the transaction have interleaving of their
       operations.
    o A non-serial schedule will be serializable if its result is equal to the result of its transactions executed
       serially.
Here,
Schedule A and Schedule B are serial schedule.
Schedule C and Schedule D are Non-serial schedule.
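The difference between a serial and a non-serial schedule can be sketched in Python by writing a schedule as a list of (transaction, operation) pairs. The two schedules and the function name is_serial below are invented for illustration; they are not the Schedules A to D of the figures.

```python
def is_serial(schedule):
    """A schedule is serial if no transaction resumes after another one has started."""
    finished = set()
    current = None
    for txn, _operation in schedule:
        if txn != current:
            if txn in finished:
                return False  # txn was interrupted earlier and is now resuming
            if current is not None:
                finished.add(current)
            current = txn
    return True

# Hypothetical schedules over two transactions T1 and T2.
serial = [("T1", "Read(A)"), ("T1", "Write(A)"), ("T2", "Read(A)"), ("T2", "Write(A)")]
interleaved = [("T1", "Read(A)"), ("T2", "Read(A)"), ("T1", "Write(A)"), ("T2", "Write(A)")]

print(is_serial(serial), is_serial(interleaved))
```

This check only looks at interleaving. Deciding whether a non-serial schedule is serializable needs the further comparison with a serial schedule described above.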
   View Serializability/schedule:-
   o A schedule is view serializable if it is view equivalent to a serial schedule.
   o If a schedule is conflict serializable, then it is also view serializable.
   o A view serializable schedule that is not conflict serializable contains blind writes.
View Equivalent
Two schedules S1 and S2 are said to be view equivalent if they satisfy the following conditions:
1. Initial Read:-An initial read of both schedules must be the same. Suppose two schedule S1 and S2. In
schedule S1, if a transaction T1 is reading the data item A, then in S2, transaction T1 should also read A.
Above two schedules are view equivalent because Initial read operation in S1 is done by T1 and in S2 it is also
done by T1.
2. Updated Read:-In schedule S1, if Ti is reading A which is updated by Tj then in S2 also, Ti should read A
which is updated by Tj.
Above two schedules are not view equal because, in S1, T3 is reading A updated by T2 and in S2, T3 is reading
A updated by T1.
3. Final Write:-The final write must be the same in both schedules. In schedule S1, if a transaction
T1 performs the last update of A, then in S2 the final write operation on A should also be done by T1.
The above two schedules are view equivalent because the final write operation in S1 is done by T3 and in S2, the
final write operation is also done by T3.
 File Organization
    o The File is a collection of records. Using the primary key, we can access the records. The type and
       frequency of access can be determined by the type of file organization which was used for a given set
       of records.
    o File organization is a logical relationship among various records. This method defines how file records
       are mapped onto disk blocks.
    o File organization is used to describe the way in which the records are stored in terms of blocks, and
       the blocks are placed on the storage medium.
    o The first approach to map the database to the file is to use the several files and store only one fixed
       length record in any given file. An alternative approach is to structure our files so that we can contain
       multiple lengths for records.
    o Files of fixed length records are easier to implement than the files of variable length records.
 Objective of file organization
    o It enables an optimal selection of records, i.e., records can be selected as fast as possible.
    o Insert, delete or update transactions on the records should be quick and easy.
    o Duplicate records should not be introduced as a result of insert, update or delete.
    o For the minimal cost of storage, records should be stored efficiently.
Types of file organization:
File organization contains various methods. These particular methods have pros and cons on the basis of
access or selection. In the file organization, the programmer decides the best-suited file organization method
according to his requirement.
Types of file organization are as follows:
   Insertion of the new record:-Suppose there is a preexisting sorted sequence of four records R1, R3 and
    so on upto R6 and R7. Suppose a new record R2 has to be inserted in the sequence, then it will be
    inserted at the end of the file, and then it will sort the sequence.
If we want to search, update or delete the data in heap file organization, then we need to traverse the data
from the start of the file until we get the requested record.
If the database is very large then searching, updating or deleting of record will be time-consuming because
there is no sorting or ordering of records. In the heap file organization, we need to check all the data until we
get the requested record.
Pros of Heap file organization
    o It is a very good method of file organization for bulk insertion. If there is a large number of data
        which needs to load into the database at a time, then this method is best suited.
    o In case of a small database, fetching and retrieving of records is faster than the sequential record.
Cons of Heap file organization
    o This method is inefficient for the large database because it takes time to search or modify the record.
   3)Hash File Organization:-Hash File Organization uses the computation of hash function on some fields
    of the records. The hash function's output determines the location of disk block where the records are to
    be placed.
When a record has to be retrieved using the hash key columns, then the address is generated, and the whole
record is retrieved using that address. In the same way, when a new record has to be inserted, then the
address is generated using the hash key and record is directly inserted. The same process is applied in the
case of delete and update.
In this method, there is no effort for searching and sorting the entire file. In this method, each record will be
stored randomly in the memory.
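The idea of hash file organization can be sketched in Python. The modulo hash function, the number of buckets and the emp_id field are all assumptions chosen for illustration; a real DBMS maps the hash value to a disk block address rather than to an in-memory list.

```python
NUM_BUCKETS = 4  # assumed number of disk blocks (buckets)

def bucket_address(key):
    """Hash function: maps a key value to a bucket number."""
    return key % NUM_BUCKETS

buckets = {address: [] for address in range(NUM_BUCKETS)}

def insert(record):
    # The address is computed from the hash key and the record is placed there directly.
    buckets[bucket_address(record["emp_id"])].append(record)

def find(emp_id):
    # Only one bucket is examined; the rest of the file is never scanned.
    for record in buckets[bucket_address(emp_id)]:
        if record["emp_id"] == emp_id:
            return record
    return None

for emp_id, name in [(101, "Sana"), (102, "Sara"), (105, "Deb")]:
    insert({"emp_id": emp_id, "name": name})

print(find(105))
print(find(999))
```

Keys 101 and 105 land in the same bucket here, which shows that a bucket may hold several records and must still be searched after the address is computed.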
    4)B+ File Organization
    o B+ tree file organization is the advanced method of an indexed sequential access method. It uses a
       tree-like structure to store records in File.
    o It uses the same concept of key-index where the primary key is used to sort the records. For each
       primary key, the value of the index is generated and mapped with the record.
    o The B+ tree is similar to a binary search tree (BST), but it can have more than two children. In this
       method, all the records are stored only at the leaf node. Intermediate nodes act as a pointer to the leaf
       nodes. They do not contain any records.
In cluster file organization, we can directly insert, update or delete any record. Data is sorted based on the
key with which searching is done. The cluster key is the key on which the tables are joined.
Types of Cluster file organization:
Cluster file organization is of two types:
1. Indexed Clusters:-In an indexed cluster, records are grouped based on the cluster key and stored together.
The above EMPLOYEE and DEPARTMENT relationship is an example of an indexed cluster. Here, all the
records are grouped based on the cluster key DEP_ID.
2. Hash Clusters:-This is similar to the indexed cluster. In a hash cluster, instead of storing the records based on
the cluster key itself, we generate a hash value for the cluster key and store together the records with the same
hash value.
Pros of Cluster file organization
    o Cluster file organization is used when there are frequent requests to join the tables with the same
       joining condition.
    o It gives efficient results when there is a 1:M mapping between the tables.
Cons of Cluster file organization
    o This method performs poorly for very large databases.
    o This method is not suitable for tables with a 1:1 mapping.
   Indexing in DBMS
   o Indexing is used to optimize the performance of a database by minimizing the number of disk accesses
       required when a query is processed.
   o The index is a type of data structure. It is used to locate and access the data in a database table
       quickly.
Index structure:
An index is built on one or more columns of a table and has two columns of its own.
    o The first column of the index is the search key, which contains a copy of the primary key or candidate
      key of the table. The key values are stored in sorted order so that the corresponding
      data can be accessed easily.
   o The second column of the index is the data reference. It contains a set of pointers holding the
      address of the disk block where the value of the particular key can be found.
Indexing Methods
Ordered indices
The indices are usually sorted to make searching faster. The indices which are sorted are known as ordered
indices.
Example: Suppose we have an employee table with thousands of records, each of which is 10 bytes long. The
IDs start with 1, 2, 3 and so on, and we have to search for the employee with ID 543.
    o In the case of a database with no index, we have to scan the disk blocks from the start until we reach
        543. The DBMS will read the record after reading 543*10=5430 bytes.
    o In the case of an index, assuming each index entry is 2 bytes long, the DBMS will read the record after
        reading 542*2=1084 bytes, which is far less than in the previous case.
Primary Index
    o If the index is created on the basis of the primary key of the table, then it is known as primary
        indexing. Primary keys are unique to each record, so there is a 1:1 relation between index entries and records.
    o As primary keys are stored in sorted order, the searching operation is quite
        efficient.
    o The primary index can be classified into two types: Dense index and Sparse index.
Dense index
    o The dense index contains an index record for every search key value in the data file. It makes
        searching faster.
    o In this, the number of records in the index table is same as the number of records in the main table.
    o It needs more space to store index record itself. The index records have the search key and a pointer
        to the actual record on the disk.
Sparse index
   o In a sparse index, index records appear only for some of the items in the data file. Each index record points to a block.
   o   Instead of pointing to each record in the main table, the index points to records in the
       main table at intervals; the remaining records are found by scanning from the nearest indexed one.
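The difference between the two kinds of index can be sketched in Python. The block contents, the block size of three records and the function names are assumptions chosen for illustration only.

```python
from bisect import bisect_right

# Hypothetical data file: sorted (key, value) records, three per block.
BLOCKS = [
    [(10, "A"), (20, "B"), (30, "C")],
    [(40, "D"), (50, "E"), (60, "F")],
    [(70, "G"), (80, "H"), (90, "I")],
]

# Dense index: one entry per search key -> (block number, slot in block).
dense_index = {
    key: (b, s)
    for b, block in enumerate(BLOCKS)
    for s, (key, _) in enumerate(block)
}

# Sparse index: one entry per block, holding the first key of that block.
sparse_keys = [block[0][0] for block in BLOCKS]


def dense_lookup(key):
    pos = dense_index.get(key)
    if pos is None:
        return None
    b, s = pos
    return BLOCKS[b][s][1]


def sparse_lookup(key):
    # Find the last index entry whose key is <= the search key ...
    b = bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None
    # ... then scan that single block sequentially.
    for k, value in BLOCKS[b]:
        if k == key:
            return value
    return None


print(len(dense_index), "dense entries,", len(sparse_keys), "sparse entries")
print(dense_lookup(50), sparse_lookup(50))
```

The dense index needs one entry per record, while the sparse index needs one entry per block and pays for it with a short scan inside the block.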
Clustering Index
   o A clustered index can be defined as an ordered data file. Sometimes the index is created on non-
       primary key columns which may not be unique for each record.
   o In this case, to identify the record faster, we will group two or more columns to get the unique value
       and create index out of them. This method is called a clustering index.
   o Records which have similar characteristics are grouped, and indexes are created for these groups.
Example: suppose a company has several employees in each department. Suppose we use a clustering
index, where all employees who belong to the same Dept_ID are considered to be within a single cluster, and
index pointers point to the cluster as a whole. Here Dept_ID is a non-unique key.
The previous scheme is a little confusing because one disk block is shared by records which belong to
different clusters. Using a separate disk block for each cluster is considered the better technique.
Secondary Index:-In sparse indexing, as the size of the table grows, the size of the mapping also grows.
These mappings are usually kept in primary memory so that address fetches are fast. The actual data is
then read from secondary memory using the address obtained from the mapping. If the mapping
grows too large, fetching the address itself becomes slower, and the sparse index is no longer efficient. To
overcome this problem, secondary indexing is introduced.
In secondary indexing, another level of indexing is introduced to reduce the size of the mapping. In this method,
large ranges of the column values are selected initially so that the mapping size of the first level stays small.
Then each range is further divided into smaller ranges. The mapping of the first level is stored in primary
memory, so that address fetch is faster. The mapping of the second level and the actual data are stored in
secondary memory (hard disk).
For example:
   o If you want to find the record of roll 111 in the diagram, the search first looks in the first level index for
      the highest entry which is smaller than or equal to 111. It gets 100 at this level.
   o Then in the second index level, it again takes the highest entry which is smaller than or equal to 111 and
      gets 110. Using the address for 110, it goes to the data block and searches each record until it finds 111.
   o This is how a search is performed in this method. Inserting, updating or deleting is also done in the
      same manner.
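The two-level lookup for roll 111 can be sketched in Python. The index entries and the block size of ten records are assumptions made for illustration, since the diagram itself is not reproduced here.

```python
from bisect import bisect_right

# Hypothetical layout, loosely following the roll-number example.
first_level = [0, 100, 200]            # small mapping, kept in primary memory
second_level = {                       # larger mapping, kept on disk
    0: [0, 10, 20],
    100: [100, 110, 120],
    200: [200, 210, 220],
}
# Each data block is assumed to hold ten consecutive roll numbers.
data_blocks = {
    start: list(range(start, start + 10))
    for starts in second_level.values()
    for start in starts
}


def floor_entry(entries, key):
    """Largest entry <= key, or None if every entry is larger."""
    i = bisect_right(entries, key) - 1
    return entries[i] if i >= 0 else None


def find_roll(roll):
    top = floor_entry(first_level, roll)            # first level: 111 -> 100
    if top is None:
        return None
    block_start = floor_entry(second_level[top], roll)   # second level: 111 -> 110
    for r in data_blocks[block_start]:              # sequential scan of one block
        if r == roll:
            return top, block_start, r
    return None


print(find_roll(111))
```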
 B+ Tree
   o The B+ tree is a balanced multiway search tree: a node can have more than two children. It follows a
      multi-level index format.
   o In the B+ tree, leaf nodes hold the actual data pointers. The B+ tree ensures that all leaf nodes remain at the
      same height.
   o In the B+ tree, the leaf nodes are linked using a linked list. Therefore, a B+ tree can support random
      access as well as sequential access.
 Structure of B+ Tree
   o In the B+ tree, every leaf node is at an equal distance from the root node. The B+ tree is of order n,
      where n is fixed for a given B+ tree.
   o It contains internal nodes and leaf nodes.
Internal node
    o An internal node of the B+ tree, except the root node, contains at least n/2 pointers.
    o At most, an internal node of the tree contains n pointers.
Leaf node
    o A leaf node of the B+ tree contains at least n/2 record pointers and n/2 key values.
    o At most, a leaf node contains n record pointers and n key values.
    o Every leaf node of the B+ tree contains one block pointer P that points to the next leaf node.
 Searching a record in B+ Tree:-Suppose we have to search 55 in the below B+ tree structure. First, we
     will look in the intermediary node for the branch that leads to the leaf node that can contain a record for 55.
In the intermediary node, 55 falls on the branch between the keys 50 and 75, so we are
directed to the third leaf node. There the DBMS performs a sequential search to find 55.
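The search for 55 can be sketched in Python. The tree below is hypothetical: only the keys 50, 55 and 75 come from the example, and the remaining keys and leaf contents are assumed for illustration.

```python
from bisect import bisect_right

# Hypothetical B+ tree: one intermediary node above four leaf nodes.
internal_keys = [25, 50, 75]
leaves = [
    [10, 15, 20],
    [25, 30, 35],
    [50, 55, 65, 70],
    [75, 80, 85],
]


def search(key):
    """Return (1-based leaf number, whether the key was found there)."""
    # Count how many separator keys are <= key: that is the branch to follow.
    leaf_index = bisect_right(internal_keys, key)
    # Sequential search inside the chosen leaf node.
    found = key in leaves[leaf_index]
    return leaf_index + 1, found


print(search(55))
```

Because 55 lies between the separator keys 50 and 75, the lookup follows the third branch and finds the key in the third leaf.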
   B+ Tree Insertion:-Suppose we want to insert a record 60 in the below structure. It will go to the 3rd
    leaf node after 55. The tree is balanced and this leaf node is already full, so we cannot simply insert 60
    there.
In this case, we have to split the leaf node so that 60 can be inserted into the tree without affecting the fill factor,
balance and order.
With 60 added, the 3rd leaf node would hold the values (50, 55, 60, 65, 70), and the key leading to it in the
intermediate node is 50. We split the leaf node in the middle so that the balance is not altered. So we can
group (50, 55) and (60, 65, 70) into 2 leaf nodes.
If these two are to be leaf nodes, the intermediate node cannot branch from 50 alone. It should have 60 added to it,
and then it can hold a pointer to the new leaf node.
This is how we insert an entry when there is overflow. In the normal case, we simply find the
node where the entry fits and place it in that leaf node.
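The leaf split in this example can be sketched in Python. The function name and the parent keys 25 and 75 are assumptions; the leaf values and the separator 60 follow the example.

```python
from bisect import insort


def split_leaf(keys):
    """Split an overfull sorted leaf in the middle.

    Returns (left, right, separator). The separator is the first key of the
    right half and is copied up into the intermediate node.
    """
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]


leaf = [50, 55, 65, 70]      # the full 3rd leaf node
insort(leaf, 60)             # tentatively add 60 in sorted position
left, right, separator = split_leaf(leaf)

parent = [25, 50, 75]        # hypothetical keys of the intermediate node
insort(parent, separator)    # add 60 so the new leaf can be reached

print(left, right, parent)
```

A full implementation would also have to split the intermediate node, and possibly the root, if adding the separator makes it overflow.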
 B+ Tree Deletion:-Suppose we want to delete 60 from the above example. In this case, we have to
    remove 60 from the intermediate node as well as from the 4th leaf node. If we remove it from the
    intermediate node, the tree will no longer satisfy the rules of the B+ tree, so we need to rearrange it to keep
    the tree balanced.
After deleting 60 from the above B+ tree and re-arranging the nodes, it will show as follows:
Basis of Comparison   B tree                                    B+ tree
Insertion             Insertion takes more time and it is       Insertion is easier and the results are
                      not predictable sometimes.                always the same.
Leaf Nodes            Leaf nodes are not stored as              Leaf nodes are stored as structural
                      structural linked list.                   linked list.
Access                Sequential access to nodes is not         Sequential access is possible just like
                      possible                                  linked list
Height                For a particular number of nodes          Height is lesser than B tree for the same
                      height is larger                          number of nodes
Number of Nodes       Number of nodes at any intermediary       Each intermediary node can have n/2 to n
                      level ‘l’ is 2^l.                         children.
5.    B-tree is used in DBMS (code indexing, etc).  While binary tree is used in Huffman coding
                                                    and Code optimization and many others.
6.    To insert the data or key in B-tree is more   While in binary tree, data insertion is not
      complicated than a binary tree.               more complicated than B-tree.
Definition of B-tree
A B-tree in DBMS is a self-balancing m-way tree. Because of their balanced structure, such trees are
frequently used to manage and organise enormous databases and to speed up searches. In a B-tree, each node
can have a maximum of m child nodes. In DBMS, a B-tree is an example of multilevel indexing. Leaf nodes and
internal nodes both hold record references. A B-tree is called a balanced tree because all the leaf nodes
are at the same level.
Properties of B-tree
    All leaves are at the same level.
    B-Tree is defined by the term minimum degree ‘t‘. The value of ‘t‘ depends upon disk block size.
    Every node except the root must contain at least t-1 keys. The root may contain a minimum of 1 key.
    All nodes (including root) may contain at most (2*t – 1) keys.
    Number of children of a node is equal to the number of keys in it plus 1.
    All keys of a node are sorted in increasing order. The child between two keys k1 and k2 contains all
      keys in the range from k1 to k2.
    B-Tree grows and shrinks from the root which is unlike Binary Search Tree. Binary Search Trees
      grow downward and also shrink from the bottom.
    Like other balanced Binary Search Trees, the time complexity to search, insert, and delete is O(log
      n).
    Insertion of a Node in B-Tree happens only at Leaf Node.
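The key-count rules above can be checked with a short Python sketch. The function names are hypothetical, and the check covers only a single node (key count, key order and child count), not the whole tree.

```python
def key_bounds(t, is_root=False):
    """(minimum keys, maximum keys) for a node of a B-tree with minimum degree t."""
    min_keys = 1 if is_root else t - 1
    max_keys = 2 * t - 1
    return min_keys, max_keys


def node_is_valid(keys, num_children, t, is_root=False):
    """Check one node against the key-count, ordering and child-count rules."""
    low, high = key_bounds(t, is_root)
    count_ok = low <= len(keys) <= high
    sorted_ok = all(a < b for a, b in zip(keys, keys[1:]))
    # A leaf has no children; an internal node has one more child than keys.
    children_ok = num_children in (0, len(keys) + 1)
    return count_ok and sorted_ok and children_ok


print(key_bounds(3))                        # non-root node with t = 3
print(node_is_valid([10, 20], 3, t=3))      # internal node with 2 keys, 3 children
print(node_is_valid([10], 2, t=3))          # too few keys for a non-root node
```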
Need of B-tree
    For having optimized searching we cannot increase a tree's height. Therefore, we want the tree to be
      as short as possible in height.
    Use of B-tree in DBMS, which has more branches and hence shorter height, is the solution to this
      problem. Access time decreases as branching grows and depth shrinks.
    Hence, use of B-tree is needed for storing data as searching and accessing time is decreased.
     The cost of accessing the disc is high when searching tables. Therefore, minimising disc access is our
      goal.
    So to decrease time and cost, we use B-tree for storing data as it makes the Index Fast.
Interesting Facts about B-Trees:
        The minimum height of a B-Tree that holds n keys, where m is the
          maximum number of children a node can have, is:
          $h_{min} = \lceil \log_m (n + 1) \rceil - 1$
        The maximum height of a B-Tree that holds n keys, where t is the
          minimum number of children that a non-root node can have, is:
          $h_{max} = \lfloor \log_t \frac{n + 1}{2} \rfloor$ and $t = \lceil \frac{m}{2} \rceil$
Traversal in B-Tree:
Traversal is also similar to Inorder traversal of Binary Tree. We start from the leftmost child, recursively
print the leftmost child, then repeat the same process for the remaining children and keys. In the end,
recursively print the rightmost child.
The above data is stored in sorted order according to the values, if we want to search for the node containing
the value 48, so the following steps will be applied:
     First, the parent node with key 100 is checked. As 48 is less than 100, the left child
       node of 100 is checked.
     The left child has 3 keys, so they are checked starting from the leftmost key, as the data is stored in sorted
       order.
     The leftmost key is 48, which matches the element being searched for, so that is how we
       find the element we wanted to search.
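Traversal and the search for 48 can be sketched in Python. The tree is hypothetical: only the root key 100 and the leftmost key 48 of a three-key left child come from the example, and the other keys are assumed.

```python
class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                  # sorted keys stored in this node
        self.children = children or []    # empty list for a leaf node


def traverse(node, out):
    """In-order style walk: child, key, child, key, ..., rightmost child."""
    for i, key in enumerate(node.keys):
        if node.children:
            traverse(node.children[i], out)
        out.append(key)
    if node.children:
        traverse(node.children[-1], out)
    return out


def search(node, key):
    """Return the node that holds key, or None if the key is absent."""
    i = 0
    while i < len(node.keys) and key > node.keys[i]:
        i += 1                            # skip keys smaller than the target
    if i < len(node.keys) and node.keys[i] == key:
        return node
    if not node.children:
        return None                       # reached a leaf without finding it
    return search(node.children[i], key)


# Hypothetical tree: root key 100, left child with three keys starting at 48.
root = BTreeNode([100], [BTreeNode([48, 50, 56]), BTreeNode([120, 150])])

print(traverse(root, []))
print(search(root, 48).keys)
```

Since 48 is less than 100, the search descends into the left child and stops at its leftmost key, which mirrors the steps listed above.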
Applications of B-Trees:
      It is used in large databases to access data stored on the disk
      Searching for data in a data set can be achieved in significantly less time using the B-Tree
      With the indexing feature, multilevel indexing can be achieved.
      Most of the servers also use the B-tree approach.
      B-Trees are used in CAD systems to organize and search geometric data.
      B-Trees are also used in other areas such as natural language processing, computer networks, and
    cryptography.
Advantages of B-Trees:
       B-Trees have a guaranteed time complexity of O(log n) for basic operations like insertion, deletion,
    and searching, which makes them suitable for large data sets and real-time applications.
       B-Trees are self-balancing.
      High-concurrency and high-throughput.
      Efficient storage utilization.
Disadvantages of B-Trees:
      B-Trees are based on disk-based data structures and can have a high disk usage.
      Not the best for all cases.
      Slow in comparison to other data structures.
1. Search O(log n)
2. Insert O(log n)
3. Delete O(log n)