Mod8 Dbms
       Core Philosophy
       An OODBMS (Object-Oriented Database Management System) adopts the principles of
       object-oriented programming, specifically the encapsulation of state and behavior
       within objects, and brings them into the realm of
       persistent storage. That is, objects created in an OOP language can be stored and retrieved
       directly from the database, maintaining their identity, structure, and behavior.
       Objects are instances of classes, and classes define both the data (attributes) and the
       methods (operations) that can be performed on the data. In an OODBMS, these objects are
       not transient—meaning they do not cease to exist once the application terminates. Rather,
       the system manages their persistence, supporting operations such as object creation,
       deletion, updates, and queries.
       Characteristics
       1. Object Identity: Unlike relational databases where identity is based on primary key
          values, OODBMS supports intrinsic object identity. Every object has a unique object
          identifier (OID) that remains constant throughout its lifetime.
       2. Encapsulation: Objects encapsulate both data and behavior. Access to object state is
          ideally only via defined methods.
       3. Complex Objects: The system can manage arbitrarily complex data structures,
          including sets, lists, nested records, and user-defined types.
       4. Inheritance: OODBMS supports both single and multiple inheritance. This allows
          classes to inherit properties and behaviors from other classes, encouraging reuse and
          polymorphism.
       5. Persistence: Objects can be made persistent without requiring conversion to tables or
          flat files. This aligns well with applications built in OOP languages.
       6. Programming Language Integration: Many OODBMSs are tightly coupled with object-
          oriented programming languages (e.g., C++, Java), reducing the impedance mismatch
          between the programming model and the data model.
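The points above can be sketched with a toy in-memory object store. The ObjectStore and Account classes below are invented for illustration only and do not reflect any real OODBMS API:

```python
import itertools

class ObjectStore:
    """Toy store: maps immutable object identifiers (OIDs) to live objects."""
    _oids = itertools.count(1)

    def __init__(self):
        self._objects = {}

    def persist(self, obj):
        # Assign a unique OID that stays constant for the object's lifetime.
        oid = next(ObjectStore._oids)
        self._objects[oid] = obj
        return oid

    def fetch(self, oid):
        # Identity-based lookup: no primary-key values are involved.
        return self._objects[oid]

class Account:
    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance

store = ObjectStore()
oid = store.persist(Account("alice", 100))
store.fetch(oid).balance += 50   # state changes; identity (the OID) does not
```

A real OODBMS would additionally write the object graph to disk and preserve OIDs across process restarts, which this sketch omits.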
       Conceptual Foundation
       In an ORDBMS (Object-Relational Database Management System), data is still
       represented in terms of tables (relations), but these tables can
       now contain complex data types, user-defined types (UDTs), and inheritance structures.
       Rather than breaking complex objects down into multiple flat tables, ORDBMS allows these
       objects to be represented more natively, while still supporting SQL-based operations.
       This system retains the declarative querying power of SQL, and incorporates object-
       oriented modeling constructs such as encapsulation, polymorphism, and extensibility into
       the relational framework.
       Key Features
       1. Extended Type System: ORDBMS introduces abstract data types (ADTs) or user-
          defined types (UDTs) that can encapsulate structure and behavior. These are more
          expressive than primitive data types.
       2. Inheritance and Polymorphism: Tables (or types) can inherit from one another. This
          allows for schema evolution and reuse. Queries on parent types can automatically
          include data from all subtypes.
       3. Encapsulated Methods: Methods can be defined alongside UDTs. These methods are
          typically written in external languages (e.g., PL/SQL, Java, C) and can be invoked from
          within SQL.
       4. Complex Structures: ORDBMS supports arrays, sets, nested tables, and even
          multimedia objects (images, audio, video) as columns in a relational schema.
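One piece of this, invoking a user-defined method from within SQL, can be loosely mimicked with SQLite's Python driver. The points table and magnitude function here are purely illustrative, not a real ORDBMS type system:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x REAL, y REAL)")
conn.execute("INSERT INTO points VALUES (3.0, 4.0)")

def magnitude(x, y):
    # Behavior that, in a full ORDBMS, would live inside a user-defined type.
    return (x * x + y * y) ** 0.5

# Register the function so SQL queries can call it by name.
conn.create_function("magnitude", 2, magnitude)
row = conn.execute("SELECT magnitude(x, y) FROM points").fetchone()
# row[0] is 5.0
```

In a full ORDBMS such as PostgreSQL or Oracle, the method would instead be declared alongside the type in the schema itself.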
       Conclusion
       Object-Oriented and Object-Relational databases address the limitations of the flat
       relational model in different ways. OODBMS is more aligned with pure object-oriented
       applications, while ORDBMS extends the relational paradigm with object-oriented
       capabilities to support more complex data modeling without abandoning SQL. Both have
       their place in database history, but ORDBMS has found broader commercial acceptance
       due to its backward compatibility, extensibility, and standardization.
       A logical database defines what data is stored and how it is related, while remaining
       independent of storage mechanisms, access paths, and indexing methods.
       The logical database sits in the middle of the three-schema architecture:
       1. External Schema (View Level) – User-specific views of the data
       2. Conceptual Schema (Logical Level) – Logical structure of the entire database
       3. Internal Schema (Physical Level) – Storage structures
       The logical schema (or logical database) exists at the middle level, providing a unified
       and abstract description of the entire database, which is independent of physical
       storage and specific application requirements.
       2. Schema Definition
       A logical database includes entity and record definitions, attributes and their data
       types, relationships among entities, and integrity constraints.
       The conceptual level is the middle level of the architecture; the internal level is the
       lowest.
       Example:
       An E-R diagram may define entities such as Student and Course, their attributes, and
       the relationships between them. This schema represents the logical database — it
       defines the data and its structure, but not how it is stored or accessed.
       2. Portability
       The same logical schema can be mapped to different physical architectures or storage
       systems.
       3. Centralized Design
       Acts as a unified blueprint for managing data across different applications and user
       interfaces.
          Logical design is DBMS-independent and focuses on business rules, data flow, and
          structural constraints.
       8. Real-World Relevance
       Logical databases are widely used, for example, in application frameworks that
       generate database access layers from logical models.
       9. Limitations
          Logical databases do not account for performance optimization — those concerns are
          addressed during physical schema design.
          In some cases, logical abstraction may hide important hardware constraints that affect
          design decisions (e.g., storage size limits, disk layout).
       Conclusion
       A logical database is a crucial abstraction layer in DBMS that defines the structure,
       constraints, and relationships of data independently of how data is physically stored. It
       provides the foundation for data independence, system modularity, and application-
       agnostic database access. Mastery of logical schema design is essential for any database
       designer or architect aiming for scalable and maintainable systems.
       Web databases are integral to modern computing ecosystems, powering online banking,
       social media platforms, e-commerce systems, content management systems, and virtually
       all interactive websites.
       2. Definition
       A Web Database is a database system that is accessible over the internet or an intranet
       through web technologies such as HTTP, server-side scripting languages (like PHP, Python,
       Node.js), and client-server communication protocols.
       It typically works behind the scenes of dynamic websites, handling data retrieval,
       manipulation, and storage operations requested via web interfaces.
           Client (Browser): Communicates with the web server via HTTP or asynchronous
           technologies like AJAX.
           Uses server-side technologies such as PHP, Django (Python), Express (Node.js),
           ASP.NET, Java Servlets, etc.
           Platform Independent: Accessible from various devices and operating systems using
           standard internet protocols.
       These technologies work in tandem to enable seamless communication between the user
       and the database.
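This round trip can be sketched framework-free: a plain Python function stands in for a PHP/Django/Express handler, and the products table is invented for the example:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
db.execute("INSERT INTO products VALUES (1, 'Widget', 9.99)")

def handle_request(path):
    """Translate a URL like /product/1 into a parameterized SQL query."""
    if path.startswith("/product/"):
        pid = int(path.rsplit("/", 1)[1])
        row = db.execute(
            "SELECT name, price FROM products WHERE id = ?", (pid,)
        ).fetchone()
        if row:
            return 200, {"name": row[0], "price": row[1]}
    return 404, {}
```

A known product path returns a 200 status with the row's data; unknown paths or missing rows fall through to 404, mirroring how a real server-side script mediates between browser and database.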
       6. Use Cases and Applications
         E-commerce Websites (e.g., Amazon): To manage product catalogs, orders, users,
         payments
         Social Media Platforms (e.g., Facebook): To handle profiles, posts, interactions, and
         feeds
         Online Banking Systems: Securely store and manage financial transactions and
         customer data
         Content Management Systems (e.g., WordPress): Power blogs and websites with
         dynamic content
       3. Cost Efficiency: No need for specialized client software; browsers act as universal
         frontends.
4. Scalability: Can be scaled vertically and horizontally to handle growing user bases.
       5. Ease of Integration: Can be connected with third-party APIs, payment gateways, social
         login systems, etc.
       Challenges
       2. Concurrency Management: Must handle race conditions and maintain ACID properties
         under load.
3. Latency: Dependent on internet bandwidth; high latency can degrade user experience.
5. Downtime Risk: Server or network failures can make the application inaccessible.
       9. Example Scenario
       Consider a web-based learning management system (LMS):
           A student logs in through the browser, and the credentials are verified against the
           user table. Upon successful authentication, the student’s dashboard is rendered
           with data from multiple tables: courses, assignments, notifications.
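The dashboard step can be sketched as a fan-out over several related tables; the schema and names below are hypothetical:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE courses (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE enrollments (student_id INTEGER, course_id INTEGER);
CREATE TABLE assignments (course_id INTEGER, name TEXT);
INSERT INTO courses VALUES (1, 'Databases');
INSERT INTO enrollments VALUES (42, 1);
INSERT INTO assignments VALUES (1, 'ER modelling');
""")

def dashboard(student_id):
    # One page render pulls from courses, enrollments, and assignments.
    return db.execute("""
        SELECT c.title, a.name
        FROM enrollments e
        JOIN courses c ON c.id = e.course_id
        JOIN assignments a ON a.course_id = c.id
        WHERE e.student_id = ?
    """, (student_id,)).fetchall()
```

Calling dashboard(42) joins the three tables into the rows the page template would display.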
       Access mode: a web database is used via a browser/web interface, whereas a
       traditional database is typically accessed via desktop applications or a command
       line.
       Conclusion
       Web databases have become the backbone of modern, data-driven web applications. By
       enabling persistent, dynamic interaction between users and content via browsers, they
       revolutionize how data is stored, accessed, and manipulated. As web technologies evolve,
       web databases continue to be central to digital communication, commerce, education,
       governance, and virtually all aspects of online life. Their power lies in their ability to
       combine the structure and robustness of traditional databases with the interactivity and
       accessibility of the web.
       Centralized databases concentrate all data at a single site, creating bottlenecks and
       a single point of failure. To overcome these limitations, database systems evolved
       into distributed database systems, which
       distribute data across multiple physical sites but maintain logical consistency and unified
       access.
       A Distributed Database (DDB) is a type of database system in which data is logically
       related but physically stored at multiple locations, often connected via a network. These
       locations could be on different servers, within different geographical regions, or even
       spread across cloud environments.
       2. Definition
       A Distributed Database System (DDBS) is a collection of multiple, logically interrelated
       databases distributed over a computer network, where each site is capable of processing
       part of the database independently. Despite the distribution, the system appears as a
       single unified database to the user.
          The server(s) store and manage the distributed data and respond with the requested
          results.
          Local sites manage their own data and participate in global transactions coordinated by
          a central site.
       Homogeneous DDBS: All sites run the same DBMS, with uniform data models and query
       languages (e.g., all use PostgreSQL). Easier to manage and integrate.
       Heterogeneous DDBS: Sites may run different DBMSs. Greater flexibility, but adds
       complexity in translation, query processing, and integration.
       1. Fragmentation
              Horizontal Fragmentation: Rows are distributed across sites (e.g., customers
              partitioned by region).
              Vertical Fragmentation: Columns are distributed (e.g., separating personal info
              and financial info).
       2. Replication
          Copies of the same data are stored at multiple sites.
       3. Allocation
          Decides where to place data or fragments — based on access patterns, costs, and
          constraints.
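These strategies can be illustrated over a small in-memory relation (the customers data is made up):

```python
customers = [
    {"id": 1, "name": "Asha", "region": "EU", "balance": 120},
    {"id": 2, "name": "Ben",  "region": "US", "balance": 300},
]

# Horizontal fragmentation: rows are split by a predicate (here, region),
# so each site stores only its local customers.
eu_site = [r for r in customers if r["region"] == "EU"]
us_site = [r for r in customers if r["region"] == "US"]

# Vertical fragmentation: columns are split; the key ("id") is kept in every
# fragment so the original rows can be rejoined later.
personal_site = [{"id": r["id"], "name": r["name"]} for r in customers]
financial_site = [{"id": r["id"], "balance": r["balance"]} for r in customers]
```

Allocation then decides which physical site actually hosts each of these fragments, weighing access patterns against storage and network costs.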
       6. Key Features
       1. Location Transparency
Users can query data without knowing the physical location of data.
2. Replication Transparency
The system hides the fact that data is replicated across sites.
       3. Fragmentation Transparency
              Users are unaware of how data is divided or fragmented.
       4. Concurrency Control
              Concurrent transactions at different sites are coordinated so that the database
              remains consistent.
       5. Fault Tolerance
              The system keeps operating even if individual sites or network links fail.
       6. Distributed Query Processing
              Queries can involve data from multiple sites and must be optimized accordingly.
       7. Autonomy
              Each site retains a degree of control over its own data and local operations.
       7. Distributed Transactions
       A distributed transaction is one that accesses or modifies data at multiple sites. It must
       satisfy the ACID properties:
       Atomicity across sites is typically achieved with the two-phase commit (2PC)
       protocol, driven by a coordinator:
       1. Phase 1 (Prepare): The coordinator asks every participating site whether it is
          ready to commit, and each site votes.
       2. Phase 2 (Commit/Abort): If all respond with "ready", it sends commit; else, it sends
          abort.
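A toy coordinator makes the two phases concrete; the dictionaries of callbacks below stand in for real sites and network messages:

```python
def two_phase_commit(participants):
    # Phase 1 (Prepare): collect a vote from every site.
    votes = [p["prepare"]() for p in participants]
    if all(votes):
        # Phase 2 (Commit): every site voted "ready".
        for p in participants:
            p["commit"]()
        return "committed"
    # Phase 2 (Abort): at least one site could not prepare.
    for p in participants:
        p["abort"]()
    return "aborted"

log = []
def site(name, ready):
    return {
        "prepare": lambda: ready,
        "commit": lambda: log.append((name, "commit")),
        "abort": lambda: log.append((name, "abort")),
    }

outcome = two_phase_commit([site("A", True), site("B", False)])
# outcome is "aborted", and both sites are told to abort
```

A production 2PC implementation would also log each decision durably so that sites can recover the outcome after a crash, which this sketch omits.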
2. Scalability
             Easily scale out by adding new nodes and redistributing data.
       3. Local Autonomy
              Each site can manage and control its own data independently.
       4. Faster Access
              Users can access data stored at nearby sites, reducing response time.
       5. Reduced Communication Cost
              Local processing reduces the need to transfer large amounts of data across the
              network.
       2. Query Optimization
              Query planning must account for network latency, data location, and
              fragmentation.
       3. Data Security
              Data transmitted between sites must be protected, and access control must be
              enforced at every site.
       4. Concurrency Control
              Coordinating locks and concurrent transactions across sites is more complex
              than in a centralized system.
Telecom Networks: Call records and user data stored across regional servers.
         E-commerce: Data centers in multiple regions store product and customer data close to
         the users.
         Cloud Databases: Systems like Google Spanner or Amazon Aurora replicate and
         partition data across multiple zones.
       Conclusion
       A Distributed Database offers a robust, scalable, and fault-tolerant alternative to
       centralized systems, especially for modern applications that demand global access, high
       availability, and low latency. While it introduces complexity in terms of query processing,
       concurrency, and system management, it significantly enhances the performance,
       modularity, and flexibility of enterprise-scale database systems. Mastery of distributed
       database concepts is essential for building resilient, large-scale information systems in
       today’s interconnected digital world.
       Traditional operational databases are optimized for day-to-day transactions, not for
       complex analysis over historical data. To address this, the concept of a Data
       Warehouse was developed. A Data Warehouse is a
       centralized repository that stores integrated, subject-oriented, time-variant, and non-
       volatile data from multiple sources, optimized for querying and analysis rather than
       transaction processing.
       1.2 Definition
       A Data Warehouse is a large, centralized system that collects data from various
       heterogeneous sources, transforms it into a consistent format, and stores it for analytical
       querying and decision-making. It acts as the foundation of business intelligence (BI)
       systems.
       1. Subject-Oriented
          Organized around major subjects (e.g., customers, products, sales) rather than
          around day-to-day transaction processing.
       2. Integrated
          Consolidates data from multiple sources (e.g., relational databases, flat files, legacy
          systems), resolving naming conflicts and data format inconsistencies.
       3. Time-Variant
          Maintains historical data to support trend analysis and forecasting. Each record is time-
          stamped or associated with a period.
       4. Non-Volatile
          Once data is loaded into the warehouse, it is not updated or deleted through typical
          transactional operations. It is read-only for analysis purposes.
       1. Data Sources
          Operational databases, flat files, and external systems that feed the warehouse.
       2. ETL (Extract, Transform, Load)
          Extracts data from the sources, transforms it into a consistent format, and loads
          it into the warehouse.
       3. Data Storage
          Centralized data warehouse or data marts (smaller, department-specific warehouses).
       4. Metadata Repository
          Stores information about the data such as its origin, transformations, and structure.
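The transform step can be sketched as resolving naming and format conflicts between two made-up sources before loading:

```python
# Extract: two sources with inconsistent field names and types.
source_a = [{"cust_name": "Asha", "amt": "120.50"}]
source_b = [{"customer": "Ben", "amount": 300}]

def transform(record, name_key, amount_key):
    # Map source-specific fields onto one consistent warehouse format.
    return {"customer": record[name_key], "amount": float(record[amount_key])}

# Load: the integrated, uniformly typed rows land in the warehouse.
warehouse = [transform(r, "cust_name", "amt") for r in source_a] + \
            [transform(r, "customer", "amount") for r in source_b]
```

Real ETL pipelines add cleansing, deduplication, and time-stamping on top of this basic field mapping.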
       1.5 Benefits
          Improves data quality through integration and cleaning
       1.6 Challenges
          High cost of setup and maintenance
       2.2 Definition
       Data Mining is the process of automatically discovering patterns, trends, correlations, or
       anomalies in large datasets using techniques from statistics, machine learning, and
       database systems.
       2.4 Data Mining Process (Part of KDD)
       1. Data Selection: Identify relevant data from the warehouse
       2. Preprocessing: Clean the data and handle missing or noisy values
       3. Transformation: Convert data into suitable format for mining (e.g., feature
          extraction)
       4. Data Mining: Apply algorithms to extract patterns or models
       5. Interpretation/Evaluation: Validate the discovered patterns and present them to
          decision-makers
       2.5 Data Mining Techniques
       1. Classification
          Assigning data items to predefined categories (e.g., spam vs. not spam).
       2. Clustering
          Grouping similar data without predefined labels.
       3. Association Rule Mining
          Discovering relationships between items (e.g., market-basket analysis).
       4. Regression
          Predicting a continuous numeric value.
       5. Anomaly Detection
          Identifying data points that deviate significantly from the norm.
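As one concrete instance of the last technique, a simple z-score detector flags values far from the mean; the threshold of 2 standard deviations and the sample readings are arbitrary choices for illustration:

```python
import statistics

def anomalies(values, threshold=2.0):
    # Flag values more than `threshold` standard deviations from the mean.
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / stdev > threshold]

readings = [10, 11, 9, 10, 12, 10, 11, 50]
outliers = anomalies(readings)   # 50 stands far outside the cluster near 10
```

Production anomaly detectors typically use more robust statistics or learned models, but the principle of measuring deviation from the norm is the same.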
       2.6 Applications of Data Mining
          Marketing: Customer segmentation, recommendation engines
       2.7 Challenges and Ethical Concerns
          Bias and Fairness: Data mining systems may unintentionally reinforce societal biases
       In essence: the data warehouse stores and organizes the data, while data mining
       extracts knowledge and insight from it.
       Conclusion
       Data warehousing and data mining form two crucial pillars of modern data-driven decision-
       making systems. A data warehouse enables the efficient collection, storage, and
       management of vast volumes of organizational data, while data mining leverages that
       stored data to derive actionable insights, patterns, and predictions. Together, they support
       strategic planning, operational efficiency, and a deeper understanding of business and user
       behavior in nearly every sector.