Database: A database is a collection of interrelated data, typically organized as tables, that
supports efficient retrieval, insertion, and deletion of data.
     1. Centralized Database:
        A centralized database is stored, located, and maintained at a single location.
        It is modified and managed from that location, which is typically a single database
        server or a central computer system. Users reach it over a network connection
        (LAN, WAN, etc.).
        Centralized databases are mainly used by institutions and organizations.
 Advantages –
 Since all data is stored at a single location, it is easier to access and coordinate.
 Data redundancy is minimal, since all data is stored in one place.
 It is generally cheaper to set up and maintain than a distributed database.
 Disadvantages –
 All requests go to one location, so data traffic there is high and it can become a bottleneck.
 If the central system fails, the entire database becomes unavailable, and data may be lost
   if no backup exists.
     2. Distributed Database:
        A distributed database consists of multiple databases that are connected with each
        other and spread across different physical locations.
        The data stored at each location can be managed independently of the other
        locations. The databases at the different locations communicate over a computer
        network.
Advantages –
   It can be expanded easily, since the data is already spread across different physical
    locations.
   It can be accessed from different networks.
   It is more resilient than a centralized database: a failure at one site does not affect the
    data held at the other sites.
     More info
        Advantages of Distributed Databases
       Following are the advantages of distributed databases over centralized databases.
       Modular Development − If the system needs to be expanded to new locations or new units, in
        centralized database systems, the action requires substantial efforts and disruption in the
        existing functioning. However, in distributed databases, the work simply requires adding new
        computers and local data to the new site and finally connecting them to the distributed system,
        with no interruption in current functions.
       More Reliable − When a centralized database fails, the whole system comes to a halt.
        In a distributed system, when one component fails, the system keeps functioning,
        possibly at reduced performance. Hence a DDBMS is more reliable.
       Better Response − If data is distributed in an efficient manner, then user requests can be met
        from local data itself, thus providing faster response. On the other hand, in centralized systems,
        all queries have to pass through the central computer for processing, which increases the
        response time.
       Lower Communication Cost − In distributed database systems, if data is located locally where
        it is mostly used, then the communication costs for data manipulation can be minimized. This is
        not feasible in centralized systems.
Distributed database management has been proposed for various reasons, ranging from organizational
decentralization and economical processing to greater autonomy. Some of these advantages are
as follows:
 1. Management of data with different levels of transparency –
 Ideally, a database should be distribution transparent, in the sense of hiding the details of where
 each file is physically stored within the system. The following types of transparency are
 possible in a distributed database system:
 Network transparency:
    The user is freed from the operational details of the network.
    It is of two types: location transparency and naming transparency.
 Replication transparency:
    The user is unaware that copies exist. Copies of data may be stored at multiple sites
    for better availability, performance, and reliability.
 Fragmentation transparency:
    The user is unaware that fragments exist, whether the fragmentation is vertical or
    horizontal.
2. Increased Reliability and availability –
Reliability is defined here as the probability that a system is running at a certain point in time, whereas
availability is defined as the probability that the system is continuously available during a time
interval. When the data and DBMS software are distributed over several sites, one site may fail
while the other sites continue to operate. Only the data and software at the failed site become
inaccessible, which improves both reliability and availability.
3. Easier Expansion –
In a distributed environment, expanding the system by adding more data, increasing database
sizes, or adding more processors is much easier.
4. Improved Performance –
Interquery parallelism is achieved by executing multiple queries at different sites. Intraquery
parallelism is achieved by breaking a single query into a number of subqueries that execute in
parallel. Both improve performance.
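The intraquery parallelism described above can be illustrated with a small sketch. It assumes an account table that has been split into one fragment per site. The site names, the extra row A_104, and the function names are hypothetical, and threads stand in for real remote sites. Each site computes a partial sum, and the partial results are merged.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fragments of an account table, one per site: (Acc_No, Balance).
FRAGMENTS = {
    "site_pune": [("A_101", 5000), ("A_104", 7000)],
    "site_baroda": [("A_102", 10000)],
    "site_delhi": [("A_103", 25000)],
}


def local_subquery(site):
    """Subquery executed at one site: sum the balances stored there."""
    return sum(balance for _, balance in FRAGMENTS[site])


def total_balance():
    """Run one subquery per site in parallel, then merge the partial sums."""
    with ThreadPoolExecutor(max_workers=len(FRAGMENTS)) as pool:
        partial_sums = list(pool.map(local_subquery, FRAGMENTS))
    return sum(partial_sums)
```

Calling total_balance() adds up the per-site results. In a real DDBMS the subqueries would be shipped to the sites over the network instead of being run in local threads.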
Disadvantages –
   It is costly and difficult to maintain because of its complexity.
   It is difficult to provide a uniform view to users, since the data is spread across different
    physical locations.
    More info
    a) Complexity: DBAs may have to do extra work to ensure that the distributed nature of the system
    is transparent. Extra work must also be done to maintain multiple disparate systems instead of one
    big one. Extra database design work is needed to account for the disconnected nature of the
    database; for example, joins can become prohibitively expensive when performed across multiple
    systems.
    b) Economics: Increased complexity and a more extensive infrastructure mean extra labour costs.
    c) Security: Remote database fragments must be secured, and because they are not centralized, the
    remote sites must be secured as well. The infrastructure must also be secured (for example, by
    encrypting the network links between remote sites).
    d) Difficult to maintain integrity: In a distributed database, enforcing integrity over a network
    may require too much of the network's resources to be feasible.
    e) Inexperience: Distributed databases are difficult to work with, and in such a young field there is
    not much readily available experience in "proper" practice.
    f) Lack of standards: There are no tools or methodologies yet to help users convert a centralized
    DBMS into a distributed DBMS.
    g) Database design more complex: In addition to traditional database design challenges, the
    design of a distributed database has to consider fragmentation of data, allocation of fragments to
    specific sites, and data replication.
    h) Additional software is required.
    i) The operating system should support a distributed environment.
Functions of a distributed database system:
          Keeping track of data –
           The DDBMS keeps track of data distribution, fragmentation, and replication by
           extending the DDBMS catalog.
          Distributed Query Processing –
           The DDBMS can access remote sites and transmit queries and data among the
           various sites over a communication network.
          Replicated Data Management –
           The DDBMS decides which copy of a replicated data item to access and maintains
           the consistency of the copies.
          Distributed Database Recovery –
           The DDBMS can recover from individual site crashes and from new types of failures,
           such as the failure of communication links.
          Security –
           The DDBMS executes distributed transactions with proper management of data
           security and of the authorization and access privileges of users.
          Distributed Directory Management –
           A directory contains information about the data in the database. The directory may be
           global for the entire DDB or local to each site. Its placement and distribution raise
           design and policy issues.
          Distributed Transaction Management –
           The DDBMS devises execution strategies for queries and transactions that access data
           at more than one site, synchronizes access to distributed data, and maintains the
           integrity of the complete database.
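Two of the functions above, keeping track of data in a catalog and deciding which copy of a replicated item to access, can be sketched as follows. This is a minimal illustration, not how any particular DDBMS stores its catalog. The fragment names, site names, and the "prefer the local copy" rule are all assumptions made for the example.

```python
# Hypothetical catalog: fragment name -> sites that hold a copy of it.
CATALOG = {
    "account_pune": ["site_pune", "site_delhi"],
    "account_baroda": ["site_baroda"],
    "account_delhi": ["site_delhi"],
}


def choose_site(fragment, local_site):
    """Pick a copy of a fragment, preferring one stored at the local site."""
    sites = CATALOG[fragment]
    return local_site if local_site in sites else sites[0]


def plan(fragments, local_site):
    """Map each fragment a request needs to the site that will serve it."""
    return {f: choose_site(f, local_site) for f in fragments}


def is_local(fragments, local_site):
    """True if the request can be answered without contacting other sites."""
    return all(site == local_site for site in plan(fragments, local_site).values())
```

For a request issued at site_delhi that needs account_pune and account_baroda, plan() serves the first fragment from the local copy and the second from site_baroda, so the request has to contact another site.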
Distributed Database Management System
    A distributed database management system (DDBMS) is a software system that manages a
    distributed database as if it were all stored in a single location.
Features
          It is used to create, retrieve, update and delete distributed databases.
          It synchronizes the database periodically and provides access mechanisms by the virtue of
            which the distribution becomes transparent to the users.
          It ensures that the data modified at any site is universally updated.
          It is used in application areas where large volumes of data are processed and accessed by
            numerous users simultaneously.
          It is designed for heterogeneous database platforms.
          It maintains confidentiality and data integrity of the databases.
A distributed DBMS consists of a single logical database that is divided into a number of pieces
called fragments. In a DDBMS, each site is capable of independently processing user
requests.
Users access the DDBMS through applications, which are classified as follows:
1. Local Applications –
   Applications that do not require data from other sites.
2. Global Applications –
   Applications that require data from other sites.
Characteristics of a DDBMS:
A DDBMS has the following characteristics:
1. A collection of logically related shared data.
2. The data is split into a number of fragments.
3. Fragments may be replicated.
4. Fragments are allocated to sites.
   Applications
    1.   Manufacturing - especially multi-plant manufacturing
    2.   Military command and control
    3.   Electronic fund transfers and electronic trading
    4.   Corporate MIS
    5.   Airline reservations
    6.   Hotel chains
    7.   Any organization which has a decentralized organization structure
   Distributed database system:
   The data at each site is under the control of, and managed by, a DBMS.
   A distributed database is not limited to one system; it is spread over different sites,
   i.e., on multiple computers or over a network of computers. The sites of a distributed
   database system do not share physical components. Such a system may be needed when a
   particular database must be accessed by various users globally. It must be managed so that,
   to the users, it looks like one single database.
         Types:
            1. Homogeneous Database:
                In a homogeneous database, all sites store the database identically. The operating
                system, database management system, and data structures are the same at all
                sites. Hence, such databases are easy to manage.
            2. Heterogeneous Database:
                In a heterogeneous distributed database, different sites can use different schemas and
                software, which can lead to problems in query processing and transactions. A particular
                site might also be completely unaware of the other sites. Different computers may use
                different operating systems and different database applications. They may even use
                different data models for the database. Hence, translations are required for the sites
                to communicate.
Distributed Processing:
              –    Data is stored at a number of sites; each site logically consists of a single processor
              –    Processors at different sites are interconnected by a computer network
                   (multiprocessors are not considered in DDBMS, cf. parallel systems)
              –    A DDBS is a database, not a collection of files (cf. relational data model).
                   Placement and querying of data are influenced by the access patterns of the
                   users
              –    A DDBMS is a collection of DBMSs (not a remote file system)
Distributed DBMS Promises
  1. Transparent management of distributed, fragmented, and replicated data
  2. Improved reliability/availability through distributed transactions
  3. Improved performance
  4. Easier and more economical system expansion
Problem Areas of Distributed Databases
The following are the problem areas of distributed databases:
1) Distributed Concurrency Control: Distributed concurrency control is the synchronization of
access to the distributed database such that the integrity of the database is maintained. To control
concurrency in a distributed database, locking techniques based on mutual exclusion of access to
data can be used. Timestamping algorithms are also used, in which transactions are executed in
an order determined by their timestamps [1].
 2) Distributed Deadlock Management: In a distributed database, several users request resources
from the database. If a resource is available, the database grants it to the requesting user; if not,
the user has to wait until it is released by another user. Sometimes the resources are never
released, because the users holding them are themselves blocked waiting for resources held by
other users. This situation is known as deadlock. Distributed deadlock is managed using
techniques such as avoidance and detection algorithms.
 3) Replication Control: Replication is a technique that applies only to distributed systems. A database
is said to be replicated if the entire database or a portion of it (a table, some tables, one or more
fragments, etc.) is copied and the copies are stored at different sites. The issue with having more than
one copy of a database is maintaining the mutual consistency of the copies, that is, ensuring that all
copies have identical schema and data content [2].
4) Operating Environment: Implementing a distributed database environment requires an operating
system that meets the organization's needs. The operating system plays an important role in
managing the distributed database, and some operating systems do not support distributed
databases.
 5) Transparent Management: Transparent management of data is one of the major problem areas in
distributed databases. In a distributed database, data is located at multiple sites and is used by a
number of users. Transparent management of data is important for maintaining the integrity of the
database.
 6) Security and privacy: How to apply security policies to interdependent systems is a major
issue in distributed systems. Since distributed systems deal with sensitive data and information, the
system must have strong security and privacy measures. Protection of distributed system assets,
including base resources, storage, communications, and user-interface I/O, as well as higher-level
composites of these resources, such as processes, files, messages, display windows, and more complex
objects, is an important issue in distributed systems.
7) Resource management: In distributed systems, objects consisting of resources are located in
different places. Routing is an issue at the network layer of the distributed system and at the application
layer. Resource management in a distributed system also has to cope with its heterogeneous nature.
    Architectures of Distributed DBMS
    The basic types of distributed DBMS architecture are as follows:
    1. Client-server architecture
   A client-server architecture has a number of clients and a few servers connected in a network.
   A client sends a query to one of the servers. The earliest available server processes it and replies.
   A client-server architecture is simple to implement and run because the servers are centralized.
    2. Collaborating server architecture
   A collaborating server architecture is designed to run a single query on multiple servers.
   The servers break a single query into multiple smaller queries, and the combined result is sent to
    the client.
   A collaborating server architecture has a collection of database servers, each capable of
    executing transactions across the databases.
    3. Middleware architecture
   Middleware architectures are designed so that a single query can be executed on multiple
    servers.
   The system needs only one server that is capable of managing queries and transactions that span
    multiple servers.
   Local servers handle local queries and transactions.
   The software that executes queries and transactions across one or more
    independent database servers is called middleware.
    Distributed DBMS Architectures
    DDBMS architectures are generally developed depending on three parameters −
           Distribution − It states the physical distribution of data across the different sites.
           Autonomy − It indicates the distribution of control of the database system and the degree to
            which each constituent DBMS can operate independently.
           Heterogeneity − It refers to the uniformity or dissimilarity of the data models, system
            components and databases.
    Architectural Models
    Some of the common architectural models are −
           Client - Server Architecture for DDBMS
           Peer - to - Peer Architecture for DDBMS
           Multi - DBMS Architecture
    Client - Server Architecture for DDBMS
    This is a two-level architecture where the functionality is divided into servers and clients. The server
    functions primarily encompass data management, query processing, optimization and transaction
    management. Client functions mainly include the user interface, although clients also perform some
    functions such as consistency checking and transaction management.
    The two different client - server architectures are −
           Single Server Multiple Client
           Multiple Server Multiple Client (shown in the following diagram)
Peer- to-Peer Architecture for DDBMS
In these systems, each peer acts both as a client and as a server for imparting database services. The
peers share their resources with other peers and coordinate their activities.
This architecture generally has four levels of schemas −
       Global Conceptual Schema − Depicts the global logical view of data.
       Local Conceptual Schema − Depicts logical data organization at each site.
       Local Internal Schema − Depicts physical data organization at each site.
       External Schema − Depicts user view of data.
Multi - DBMS Architectures
This is an integrated database system formed by a collection of two or more autonomous database
systems.
Multi-DBMS can be expressed through six levels of schemas −
      Multi-database View Level − Depicts multiple user views, each comprising a subset of the
       integrated distributed database.
      Multi-database Conceptual Level − Depicts the integrated multi-database, comprising the global
       logical multi-database structure definitions.
      Multi-database Internal Level − Depicts the data distribution across different sites and multi-
       database to local data mapping.
      Local database View Level − Depicts public view of local data.
      Local database Conceptual Level − Depicts local data organization at each site.
      Local database Internal Level − Depicts physical data organization at each site.
There are two design alternatives for multi-DBMS −
      Model with multi-database conceptual level.
      Model without multi-database conceptual level.
Design Alternatives
The distribution design alternatives for the tables in a DDBMS are as follows −
      Non-replicated and non-fragmented
      Fully replicated
      Partially replicated
      Fragmented
      Mixed
Non-replicated & Non-fragmented
In this design alternative, different tables are placed at different sites. Data is placed so that it is at a
close proximity to the site where it is used most. It is most suitable for database systems where the
percentage of queries needed to join information in tables placed at different sites is low. If an
appropriate distribution strategy is adopted, then this design alternative helps to reduce the
communication cost during data processing.
Fully Replicated
In this design alternative, one copy of all the database tables is stored at each site. Since each site has
its own copy of the entire database, queries are very fast and require negligible communication cost. On
the other hand, the massive redundancy in data makes update operations very costly. Hence, this
    is suitable for systems where a large number of queries must be handled but the number of
    database updates is low.
    Partially Replicated
    Copies of tables or portions of tables are stored at different sites. The tables are distributed
    according to their frequency of access, which takes into account the fact that the frequency of
    accessing a table varies considerably from site to site. The number of copies of a table (or portion)
    depends on how frequently the access queries execute and on the sites that generate them.
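    One way to turn access frequencies into a placement decision is sketched below. The rule used here is only an assumption for illustration: a site receives a copy when it issues at least a given share of all accesses. The function name, the threshold, and the site names are hypothetical. Real allocation methods weigh update and communication costs as well.

```python
def place_copies(access_counts, min_share=0.25):
    """Return the sites that should hold a copy of a table or fragment.

    access_counts maps each site to the number of queries it issues against
    the table. A site gets a copy when it issues at least `min_share` of all
    accesses. The busiest site always gets one, so at least one copy exists.
    """
    total = sum(access_counts.values())
    busiest = max(access_counts, key=access_counts.get)
    chosen = {
        site
        for site, count in access_counts.items()
        if total and count / total >= min_share
    }
    chosen.add(busiest)
    return sorted(chosen)
```

    With access counts of 60, 30, and 10 for three sites, the first two sites each issue at least a quarter of the accesses and receive copies, while the third reads the data remotely.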
    Fragmented
    In this design, a table is divided into two or more pieces referred to as fragments or partitions, and each
    fragment can be stored at different sites. This considers the fact that it seldom happens that all data
    stored in a table is required at a given site. Moreover, fragmentation increases parallelism and provides
    better disaster recovery. Here, there is only one copy of each fragment in the system, i.e. no redundant
    data.
    The three fragmentation techniques are −
           Vertical fragmentation
           Horizontal fragmentation
           Hybrid fragmentation
    Mixed Distribution
    This is a combination of fragmentation and partial replications. Here, the tables are initially fragmented in
    any form (horizontal or vertical), and then these fragments are partially replicated across the different
    sites according to the frequency of accessing the fragments.
    Design Strategies
    Data Replication
    What is data replication?
    Data replication is the process of copying data to multiple locations (different
    computers or servers) to improve the availability of the data.
    Goals of data replication
    Data replication is done with an aim to:
   Increase the availability of data.
   Speed up the query evaluation.
    Types of data replication
    There are two types of data replication:
    1. Synchronous replication:
    In synchronous replication, every replica is updated as part of the transaction that changes
    the data, so the replicas never differ from the original data.
    2. Asynchronous replication:
    In asynchronous replication, the replicas are updated only after the transaction has committed
    on the original database, so they may briefly lag behind it.
    Replication Schemes
    The three replication schemes are as follows:
    1. Full Replication
    In the full replication scheme, a complete copy of the database is stored at every site in the
    communication network.
    Advantages of full replication
   High availability of data, as the database is available at every site.
   Faster execution of queries.
    Disadvantages of full replication
   Concurrency control is difficult to achieve.
   Update operations are slower, since every copy must be updated.
    2. No Replication
    No replication means that each fragment is stored at exactly one location.
    Advantages of no replication
   Concurrency control is simpler, since there is only one copy of each fragment to coordinate.
   Easy recovery of data.
    Disadvantages of no replication
   Poor availability of data.
   Slower query execution, as multiple clients access the same server.
    3. Partial replication
          Partial replication means that only some fragments of the database are replicated.
           Advantages of partial replication
           The number of replicas created for a fragment depends on the importance of the data in that
           fragment.
    What is fragmentation?
   The process of dividing the database into multiple smaller parts is called fragmentation.
   These fragments may be stored at different locations.
   Fragmentation should be carried out in such a way that the original database can be
    reconstructed from the fragments.
    Types of data Fragmentation
    There are three types of data fragmentation:
    1. Horizontal data fragmentation
    Horizontal fragmentation divides a relation (table) horizontally into groups of rows to create
    subsets of the table.
    Example:
    Account (Acc_No, Balance, Branch_Name, Type).
    Suppose the Branch_Name column holds the values Pune, Baroda, and Delhi.
    The fragment for one branch can then be selected with:
    SELECT * FROM Account WHERE Branch_Name = 'Baroda'
    Types of horizontal data fragmentation are as follows:
    1) Primary horizontal fragmentation
    Primary horizontal fragmentation is the process of fragmenting a single table row-wise using a
    set of conditions.
    Example:
    Acc_No     Balance     Branch_Name
    A_101      5000        Pune
    A_102      10000       Baroda
    A_103      25000       Delhi
    For the above table we can define simple conditions such as Branch_Name = 'Pune',
    Branch_Name = 'Delhi', and Balance < 50000.
    Fragmentation1:
    SELECT * FROM Account WHERE Branch_Name = 'Pune' AND Balance < 50000
    Fragmentation2:
    SELECT * FROM Account WHERE Branch_Name = 'Delhi' AND Balance < 50000
    2) Derived horizontal fragmentation
    Horizontal fragmentation of a relation that is derived from the fragmentation of a primary (owner)
    relation is called derived horizontal fragmentation.
    Example: Refer to the example of primary fragmentation given above.
    The following fragments are derived from the primary fragmentation.
    Fragmentation1:
    SELECT * FROM Account WHERE Branch_Name = 'Baroda' AND Balance < 50000
    Fragmentation2:
    SELECT * FROM Account WHERE Branch_Name = 'Delhi' AND Balance < 50000
    3) Complete horizontal fragmentation
   A complete horizontal fragmentation generates a set of horizontal fragments that together
    include every tuple (row) of the original relation.
   Completeness is required for reconstruction of the relation: every tuple must belong to at least
     one of the fragments.
    4) Disjoint horizontal fragmentation
    A disjoint horizontal fragmentation generates a set of horizontal fragments in which no two
    fragments have tuples in common. That is, every tuple of the relation belongs to exactly one
    fragment.
    5) Reconstruction of horizontal fragmentation
    The original relation can be reconstructed by applying the UNION operation to the
    fragments.
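    The completeness, disjointness, and UNION reconstruction properties above can be checked on the Account example with a small sketch. It uses an in-memory SQLite database purely for illustration. The fragment table names (Account_Pune and so on) are assumptions, and the Type column is left out because the sample rows do not give it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Account (Acc_No TEXT PRIMARY KEY, Balance INTEGER, Branch_Name TEXT);
    INSERT INTO Account VALUES ('A_101', 5000, 'Pune'),
                               ('A_102', 10000, 'Baroda'),
                               ('A_103', 25000, 'Delhi');
    -- One horizontal fragment per branch.
    CREATE TABLE Account_Pune   AS SELECT * FROM Account WHERE Branch_Name = 'Pune';
    CREATE TABLE Account_Baroda AS SELECT * FROM Account WHERE Branch_Name = 'Baroda';
    CREATE TABLE Account_Delhi  AS SELECT * FROM Account WHERE Branch_Name = 'Delhi';
""")


def rows(sql):
    """Run a query and return all result rows."""
    return conn.execute(sql).fetchall()


original = rows("SELECT * FROM Account ORDER BY Acc_No")

# Reconstruction: the UNION of all fragments gives back the original relation.
reconstructed = rows("""
    SELECT * FROM Account_Pune
    UNION SELECT * FROM Account_Baroda
    UNION SELECT * FROM Account_Delhi
    ORDER BY Acc_No
""")

# Disjointness: UNION ALL keeps duplicates, so equal counts mean no overlap.
rows_in_all_fragments = rows("""
    SELECT COUNT(*) FROM (
        SELECT * FROM Account_Pune
        UNION ALL SELECT * FROM Account_Baroda
        UNION ALL SELECT * FROM Account_Delhi
    )
""")[0][0]
is_disjoint = rows_in_all_fragments == len(reconstructed)
is_complete = reconstructed == original
```

    If a branch value were missing from the fragment conditions, is_complete would become false, because rows for that branch would belong to no fragment.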
    2. Vertical Fragmentation
    Vertical fragmentation divides a relation (table) vertically into groups of columns to create
    subsets of the table.
    Example:
    Acc_No     Balance     Branch_Name
    A_101      5000        Pune
    A_102      10000       Baroda
    A_103      25000       Delhi
    Fragmentation1:
    SELECT Acc_No, Balance FROM Account
    Fragmentation2:
    SELECT Acc_No, Branch_Name FROM Account
    The key Acc_No is repeated in both fragments so that the rows can be matched up again.
    Complete vertical fragmentation
   A complete vertical fragmentation generates a set of vertical fragments that together include all
    the attributes of the original relation.
   Reconstruction is performed by joining the fragments on their common key (a full outer join keeps
    every row even if it is missing from one of the fragments).
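    Reconstruction of a vertically fragmented relation can be sketched in the same way. The example assumes that each fragment keeps the key Acc_No, which is what makes the join possible. The fragment table names are hypothetical. An inner join is used because every key appears in both fragments here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Account (Acc_No TEXT PRIMARY KEY, Balance INTEGER, Branch_Name TEXT);
    INSERT INTO Account VALUES ('A_101', 5000, 'Pune'),
                               ('A_102', 10000, 'Baroda'),
                               ('A_103', 25000, 'Delhi');
    -- Two vertical fragments; both keep the key column Acc_No.
    CREATE TABLE Account_Bal    AS SELECT Acc_No, Balance     FROM Account;
    CREATE TABLE Account_Branch AS SELECT Acc_No, Branch_Name FROM Account;
""")

original = conn.execute(
    "SELECT Acc_No, Balance, Branch_Name FROM Account ORDER BY Acc_No"
).fetchall()

# Reconstruction: join the fragments on the shared key.
reconstructed = conn.execute("""
    SELECT b.Acc_No, b.Balance, r.Branch_Name
    FROM Account_Bal AS b
    JOIN Account_Branch AS r ON r.Acc_No = b.Acc_No
    ORDER BY b.Acc_No
""").fetchall()
```

    Without the key in each fragment there would be no reliable way to tell which balance belongs to which branch, so the original rows could not be recovered.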
    3. Hybrid Fragmentation
       Hybrid fragmentation is achieved by applying horizontal and vertical partitioning together.
       A hybrid (mixed) fragment is a group of rows and columns of a relation.
        Example: Consider the following table of employee information (called Employee in the
        queries below).
        Emp_ID     Emp_Name     Emp_Address     Emp_Age     Emp_Salary
        101        Surendra     Baroda          25          15000
        102        Jaya         Pune            37          12000
        103        Jayesh       Pune            47          10000
        Fragmentation1:
        SELECT Emp_ID, Emp_Name FROM Employee WHERE Emp_Age < 40
        Fragmentation2:
        SELECT Emp_ID FROM Employee WHERE Emp_Address = 'Pune' AND Emp_Salary < 14000
    
        The design issues of Distributed Database
              1. Distributed Database Design
                 • One of the main questions being addressed is how the database and the applications that
                 run against it should be placed across the sites.
                 • There are two basic alternatives for placing data: partitioned (or non-replicated) and replicated.
                 • In the partitioned scheme, the database is divided into a number of disjoint partitions, each of
                 which is placed at a different site. Replicated designs can be either fully replicated (also called fully
                 duplicated), where the entire database is stored at each site, or partially replicated (or partially
                 duplicated), where each partition of the database is stored at more than one site, but not at all the
                 sites.
                 • The two fundamental design issues are fragmentation, the separation of the database into
                 partitions called fragments, and distribution, the optimum distribution of fragments. The research
                 in this area mostly involves mathematical programming in order to minimize the combined cost of
                 storing the database, processing transactions against it, and message communication among
                 sites.
              2. Distributed Directory Management
                 • A directory contains information (such as descriptions and locations) about data items in the
                 database. Problems related to directory management are similar in nature to the database
                 placement problem discussed in the preceding section.
                 • A directory may be global to the entire DDBS or local to each site; it can be centralized at one
                 site or distributed over several sites; there can be a single copy or multiple copies.
              3. Distributed Query Processing
                 • Query processing deals with designing algorithms that analyze queries and convert them into a
                 series of data manipulation operations. The problem is how to decide on a strategy for executing
                 each query over the network in the most cost-effective way, however cost is defined.
                 • The factors to be considered are the distribution of data, communication cost, and lack of
                 sufficient locally available information. The objective is to use the inherent parallelism
   to improve the performance of executing the transaction, subject to the above-mentioned
   constraints.
4. Distributed Concurrency Control
   • Concurrency control involves the synchronization of access to the distributed database, such
   that the integrity of the database is maintained. It is, without any doubt, one of the most
   extensively studied problems in the DDBS field.
   • The concurrency control problem in a distributed context is somewhat different from that in a
   centralized framework. One not only has to worry about the integrity of a single database, but
   also about the consistency of multiple copies of the database. The condition that requires all
   values of multiple copies of every data item to converge to the same value is called mutual
   consistency.
   • The two general classes of algorithms are pessimistic, which synchronize the execution of
   user requests before the execution starts, and optimistic, which execute requests and then
   check whether the execution has compromised the consistency of the database.
   • Two fundamental primitives that can be used with both approaches are locking, which is based
   on the mutual exclusion of access to data items, and timestamping, where transaction
   executions are ordered based on timestamps.
   • There are variations of these schemes as well as hybrid algorithms that attempt to combine the
   two basic mechanisms.
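   The timestamping primitive can be illustrated with a sketch of basic timestamp ordering on a single data item. This is one textbook variant, shown under simplifying assumptions: timestamps are plain integers, each item remembers the largest timestamps that have read and written it, and a rejected operation simply raises an exception instead of restarting the transaction. The class and function names are hypothetical.

```python
class Abort(Exception):
    """Raised when an operation arrives too late and its transaction must restart."""


class Item:
    """A data item plus the largest timestamps that have read and written it."""

    def __init__(self, value=None):
        self.value = value
        self.read_ts = 0
        self.write_ts = 0


def read(item, ts):
    """Read on behalf of the transaction with timestamp ts."""
    if ts < item.write_ts:
        # A younger transaction has already overwritten the value.
        raise Abort(f"read by T{ts} rejected")
    item.read_ts = max(item.read_ts, ts)
    return item.value


def write(item, ts, value):
    """Write on behalf of the transaction with timestamp ts."""
    if ts < item.read_ts or ts < item.write_ts:
        # A younger transaction has already read or written the item.
        raise Abort(f"write by T{ts} rejected")
    item.value = value
    item.write_ts = ts
```

   For example, once the transaction with timestamp 2 has read an item, a write by the older transaction with timestamp 1 is rejected, because accepting it would contradict the timestamp order.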
5. Distributed Deadlock Management
   • The deadlock problem in DDBSs is similar in nature to that encountered in operating systems.
   • The competition among users for access to a set of resources (data, in this case) can result in a
   deadlock if the synchronization mechanism is based on locking. The well-known alternatives of
   prevention, avoidance, and detection/recovery also apply to DDBSs.
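   Deadlock detection is commonly described in terms of a wait-for graph, in which an edge from T1 to T2 means that T1 is waiting for a resource held by T2, and a cycle means deadlock. The sketch below assumes that each site reports its local wait-for graph and that a detector merges them. The function names and the dictionary representation are assumptions of this example.

```python
def merge(local_graphs):
    """Combine the wait-for graphs reported by individual sites into one."""
    merged = {}
    for graph in local_graphs:
        for txn, waits_on in graph.items():
            merged.setdefault(txn, set()).update(waits_on)
    return merged


def find_deadlock(wait_for):
    """Return the transactions on one cycle of the wait-for graph, or None.

    wait_for maps each transaction to the set of transactions it waits on.
    """
    unvisited, in_progress, done = 0, 1, 2
    state = {}
    path = []

    def visit(txn):
        state[txn] = in_progress
        path.append(txn)
        for other in wait_for.get(txn, ()):
            status = state.get(other, unvisited)
            if status == in_progress:
                # Reached a transaction already on the current path: a cycle.
                return path[path.index(other):]
            if status == unvisited:
                cycle = visit(other)
                if cycle:
                    return cycle
        path.pop()
        state[txn] = done
        return None

    for txn in list(wait_for):
        if state.get(txn, unvisited) == unvisited:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None
```

   A distributed deadlock can be invisible to every single site. If site A only knows that T1 waits for T2 and site B only knows that T2 waits for T1, neither local graph has a cycle, but the merged graph does.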
6. Reliability of Distributed DBMS
   • It is important that mechanisms be provided to ensure the consistency of the database as well
   as to detect failures and recover from them. The implication for DDBSs is that when a failure
   occurs and various sites become either inoperable or inaccessible, the databases at the
   operational sites remain consistent and up to date.
   • Furthermore, when the computer system or network recovers from the failure, the DDBS
   should be able to recover and bring the databases at the failed sites up to date. This may be
   especially difficult in the case of network partitioning, where the sites are divided into two or more
   groups with no communication among them.
7. Replication
   • If the distributed database is (partially or fully) replicated, it is necessary to implement protocols
   that ensure the consistency of the replicas, i.e. copies of the same data item have the same
   value.
   • These protocols can be eager, in that they force the updates to be applied to all the replicas
   before the transaction completes, or they may be lazy, so that the transaction updates one copy
   (called the master), from which updates are propagated to the others after the transaction
   completes.
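   The difference between eager and lazy protocols can be sketched with a single replicated data item. This is a simplified, single-process illustration: the class, its method names, and the explicit propagate() step are assumptions, and real protocols must also handle failures and concurrent transactions.

```python
class ReplicatedItem:
    """One data item with a master copy and several replica copies."""

    def __init__(self, value, n_replicas):
        self.master = value
        self.replicas = [value] * n_replicas
        self.pending = []  # updates not yet propagated (used by lazy writes only)

    def write_eager(self, value):
        """Eager: every copy is updated before the write counts as complete."""
        self.master = value
        self.replicas = [value] * len(self.replicas)

    def write_lazy(self, value):
        """Lazy: only the master is updated now; the replicas catch up later."""
        self.master = value
        self.pending.append(value)

    def propagate(self):
        """Apply the queued updates to the replicas in the order they were made."""
        for value in self.pending:
            self.replicas = [value] * len(self.replicas)
        self.pending.clear()

    def consistent(self):
        """True when every replica holds the same value as the master."""
        return all(replica == self.master for replica in self.replicas)
```

   After write_eager() the copies always agree. After write_lazy() the replicas hold the old value until propagate() runs, which is the window in which a read from a replica can return stale data.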