History of Database Applications
Hehua Chi                                                                   Yihe Yang
                 University of Rochester                                                    University of Rochester
              Wegmans Hall 1210, Rochester, NY 14620                                  Wegmans Hall 1210, Rochester, NY 14620
                   hchi3@ur.rochester.edu                                                  yyang107@ur.rochester.edu
ABSTRACT                                                                   that we today associate with business intelligence, required complex
In this paper, we explore the history of database applications, which      coding [1].
centers on three revolutions in database technologies. The first
revolution was driven by the emergence of the electronic computer.         Arguably, no single person has had more influence over database
In the 20 years following the widespread adoption of electronic            technology than Edgar Codd. He harbored significant reservations
computers, a range of increasingly sophisticated database systems          about the design of existing databases. In particular, he identified the
emerged. The second revolution was the emergence of the relational         following shortcomings [1, 2]:
database. Shortly after the definition of the relational model in 1970,        1. Existing databases were too hard to use. Databases of the day
almost every significant database system shared a common                   could only be accessed by people with specialized programming
architecture. The three pillars of this architecture were the relational   skills.
model, ACID transactions, and the SQL language. However,                       2. Existing databases lacked a theoretical foundation. Codd’s
starting around 2008, the third revolution has resulted in an              mathematical background encouraged him to think about data in
explosion of non-relational database alternatives driven by the            terms of formal structures and logical operations.
demands of modern applications that require global scope and                   3. Existing databases mixed logical and physical
continuous availability. An explosion of new database systems              implementations.
occurred: key-value databases, document databases, graph databases,
column databases, and even SSD and in-memory databases. The next           Codd published an internal IBM paper outlining his ideas for a more
generation databases will be NoSQL, NewSQL and big data                    formalized model for database systems, which then led to his 1970
platforms [1].                                                             paper “A Relational Model of Data for Large Shared Data Banks.”
                                                                           This classic paper contained the core ideas that defined the relational
                                                                           database model that became the most significant—almost
1. INTRODUCTION                                                            universal—model for database systems for a generation [1, 2].
Wikipedia defines a database as an “organized collection of data.”
Although the term database entered our vocabulary only in the late         The relational model does not itself define the way in which the
1960s, collecting and organizing data has been an integral factor in       database handles concurrent data change requests. These changes are
the development of human civilization and technology. Books,               generally referred to as database transactions. Jim Gray defined the
libraries and other indexed archives of information represent              most widely accepted transaction model in the late 1970s. This soon
preindustrial equivalents of modern database systems [1].                  became popularized as ACID transactions: Atomic, Consistent,
                                                                           Isolated, and Durable [2].
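
The atomicity property can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The accounts table, the names in it, and the transfer helper are assumptions made for illustration only; they do not come from the cited sources. The sketch credits one account and then debits another inside a single transaction, so a failed debit also undoes the credit.

```python
import sqlite3

# In-memory database with a hypothetical accounts table.
# The CHECK constraint forbids negative balances.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one all-or-nothing transaction."""
    try:
        # The connection context manager commits on success and
        # rolls back if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; the credit was undone too.
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling transfer(conn, "alice", "bob", 500) on this data returns False and leaves both balances untouched, which is the behavior the Atomic property describes.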
The emergence of electronic computers following the Second World
War represented the first revolution in databases. The development         However, the restriction on scalability beyond a single data center
of indexing methods such as ISAM (Indexed Sequential Access                implied by the ACID transaction model has been a key motivator for
Method) and similar indexing structures powered the first electronic       the development of new database architectures. The difference in
databases. However, there were no database management systems              application architectures between the client-server era and the era of
(DBMS) that could minimize programmer overhead and ensure the              massive web-scale applications created pressures on the relational
performance and integrity of data access routines [1].                     database that could not be relieved through incremental innovation
                                                                           [1].
By the early 1970s, two major models of DBMS were competing for
dominance. The network model was formalized by the CODASYL                 A sort of database explosion occurred in 2008 and 2009:
standard and implemented in databases such as IDMS, while the              literally dozens of new database systems emerged in this short
hierarchical model provided a somewhat simpler approach as was             period. Especially in late 2009, the term NoSQL quickly caught on
most notably found in IBM’s IMS (Information Management                    as shorthand for any database system that broke with the traditional
System) [1].                                                               SQL database [3]. By 2011, the term NewSQL became popularized
                                                                           as a means of describing this new breed of databases that, while not
However, these systems had several notable drawbacks. First, the           representing a complete break with the relational model, enhanced
navigational databases were extremely inflexible in terms of data          or significantly modified the fundamental principles. Finally, the
structure and query capabilities. And it was extremely difficult to        term Big Data burst onto mainstream consciousness in early 2012.
add new data elements to an existing system. Second, the database          Although the term refers mostly to the new ways in which data is
systems were centered on record-at-a-time transaction processing.          being leveraged to create value, we generally understand "Big Data
Query operations, especially the sort of complex analytic queries          solutions" as convenient shorthand for technologies, such as Hadoop,
that support large and unstructured datasets. NoSQL, NewSQL,
and Big Data are in many respects vaguely defined, overhyped, and              Table 1. Pre-relational database system development
overloaded terms. However, they represent the most widely                       Year               Pre-Relational Database system
understood phrases for referring to next-generation database                    1951               Magnetic tape
technologies [1].                                                               1955               Magnetic Disk
                                                                                1961               ISAM
The remainder of this paper is organized as follows: the history of             1965               Hierarchical model
database applications is reviewed in Section 2. Two promising sub-              1968               IMS
areas of future database systems are introduced in Section 3. The
                                                                                1969               Network Model
considerations and requirements for choosing appropriate database
                                                                                1971               IDMS
systems in different applications are summarized in Section 4.
Finally, the conclusion is in Section 5.
                                                                          2.1.2 Relational database system
                                                                          Relational database theory, at its essence, describes
                                                                          how a given set of data should be presented to the user, rather than
2. HISTORY OF DATABASE                                                    how it should be stored on disk or in memory. A row in a table
APPLICATIONS                                                              should be identifiable and efficiently accessed by a unique key
2.1. Timeline of database development                                     value, and every column in that row must be dependent on that key
                                                                          value and no other identifier. Arrays and other structures that contain
                                                                          nested information are, therefore, not directly supported [1]. Table
                                                                          2 illustrates the development of relational database systems. While
                                                                          each of these systems attempts to differentiate by claiming superior
                                                                          performance, availability, functionality, or economy, they are
                                                                          virtually identical in their reliance on three key principles: Codd’s
                                                                          relational model, the SQL language, and the ACID transaction
                                                                          model [1].
                                                                                  Table 2: Relational database system development
                                                                           Year                Relational database system
                                                                           1970                Codd’s Paper
                                                                           1974                System R
                                                                           1978                Oracle
                                                                           1980                Commercial Ingres
                                                                           1981                Informix
                                                                           1984                DB2
                                                                           1987                Sybase
                                                                           1989                Postgres
                                                                           1989                SQL Server
                                                                           1995                MySQL
Figure 1. Illustrating three major eras in database technology [1].
                                                                          2.1.3 The next generation database system
The first revolution was driven by the emergence of the electronic        By the middle of the 2000s, the relational database seemed
computer, after which many pre-relational databases sprang up             completely entrenched. In fact, the era of complete relational
like mushrooms. The second revolution was driven by a classic             database supremacy was just about to end. The difference in
paper containing the core ideas that defined the relational database      application architectures between the client-server era and the era of
model that became the most significant - almost universal - model         massive web-scale applications created pressures on the relational
for database systems for a generation [5]. The third revolution for       database that could not be relieved through incremental innovation
next generation databases has resulted in an explosion of non-            [1]. Table 3 illustrates the development of current and future
relational database alternatives to meet the demands of massive web-scale databases. In Section 3, two promising sub-areas of future databases
and big data applications [4]. Figure 1 illustrates three major eras in   will be introduced in detail.
database technology. In this section, we’ll provide an overview of
these three waves of database technologies and discuss the                         Table 3: Today and future database development
forces leading to current and future next-generation databases.            Year               Today and future databases
                                                                           2003               MarkLogic
2.1.1 The pre-relational database system                                   2004               MapReduce
Early database systems enforced both a schema (a definition of the         2005               Hadoop
structure of the data within the database) and an access path (a fixed     2005               Vertica
means of navigating from one record to another). Table 1 illustrates       2007               Dynamo
the pre-relational database system development. By the early 1970s,        2008               Cassandra
two major models of DBMS were competing for dominance: the                 2008               HBase
network model and the hierarchical model [1].                              2008               NuoDB
 2009                 MongoDB                                            transactions are lost, and it becomes impossible to perform joins or
 2010                 VoltDB                                             maintain transactional integrity across shards [1].
 2010                 Hana
 2011                 Riak                                               Finally, the operational costs of sharding, together with the loss of
 2012                 Aerospike                                          relational features, made many seek alternatives to the Relational
 2014                 Splice Machine                                     Database Management System (RDBMS) [1].
                                                                         3.1 NoSQL database
2.2 Summary: three platforms corresponding to                            A NoSQL (Not Only SQL) database provides a mechanism for
                                                                         storage and retrieval of data that is modeled by means other than the
three waves of databases                                                 tabular relations used in relational databases. NoSQL databases
The three waves of databases roughly correspond to three waves
                                                                         operate without a schema, allowing you to freely add fields to
of computer applications. The three platforms shown in figure 2 are
                                                                         database records without having to define any changes in structure
often used to illustrate the development of database systems. The
                                                                         first. This is particularly useful when dealing with non-uniform data
first platform was the mainframe, which was supported by pre-
                                                                         and custom fields. In summary, the common characteristics of
relational database systems. The second platform, client-server and
                                                                         NoSQL databases are:
early web applications, was supported by relational databases. The
                                                                              (1) They do not use the relational model;
third platform is characterized by applications that involve cloud
                                                                              (2) They run well on clusters;
computing, mobile presence, social networking, and the Internet of
                                                                              (3) They are usually open source;
Things. The third platform demands a third wave of database
                                                                              (4) They’re built for the 21st century web estates;
technologies that include but are not limited to relational systems
                                                                              (5) They are, for the most part, schemaless;
[1]. Figure 2 summarizes how the three platforms correspond to the
                                                                              (6) The most important result of the rise of NoSQL is Polyglot
three waves of database revolutions.
                                                                                   Persistence.
                                                                         There are four main types of NoSQL data models: key-value
                                                                         databases, document databases, column databases, and graph
                                                                         databases.
                                                                         3.1.1 Key-Value databases
                                                                         A key-value database, or key-value store, is a data storage paradigm
                                                                         designed for storing, retrieving, and managing associative arrays,
                                                                         a data structure more commonly known today as a dictionary or
                                                                         hash table. In the following scenarios, it is beneficial to
                                                                         apply key-value databases [4].
                                                                         (1) Storing Session Information;
                                                                         Generally, every web session is unique and is assigned a unique
                                                                         SessionID value. Applications that store the SessionID on disk or in
                                                                         an RDBMS will greatly benefit from moving to a key-value store,
                                                                         since everything about the session can be stored by a single PUT
                                                                         request or retrieved using GET. This single-request operation makes
   Figure 2. Three platforms correspond to three waves of                it very fast, as everything about the session is stored in a single
database technology [1].                                                 object. Solutions such as Memcached are used by many web
                                                                         applications, and Riak can be used when availability is important [1].
3. TWO PROMISING FUTURE DATABASES                                        (2) User Profiles, Preferences;
                                                                         Almost every user has a unique UserId, Username, or some other
The relational database was already well established. However,
                                                                         attribute, as well as preferences such as language, color, time-zone
driven by the demands of modern applications that require global
                                                                         and so on. This can all be put into an object, so getting preferences
scope and continuous availability, relational databases were
                                                                         of a user takes a single GET operation. Product profiles can be
inadequate to deal with the volume and velocity of big data. In
                                                                         stored similarly.
particular, the difference in application architectures between the
client-server era and the era of massive web-scale applications
                                                                         (3) Shopping Cart Data;
created pressures on the relational database that could not be
                                                                         E-commerce websites have shopping carts tied to the user. As we
relieved through incremental innovation. Scalability challenges
                                                                         want the shopping carts to be available all the time, across browsers,
exist in scaling their infrastructure from thousands to millions of
                                                                         machines, and sessions, all the shopping information can be put into
users. Even the most expensive commercial Relational Database
                                                                         the value where the key is the userid. A Riak cluster would be best
Management System (RDBMS) such as Oracle could not provide
                                                                         suited for these kinds of applications.
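
The single-request pattern described in the scenarios above can be sketched with a toy in-memory store. The KeyValueStore class, its put and get methods, and the key format are hypothetical stand-ins chosen for illustration; they are not the API of Memcached, Riak, or any other product.

```python
import json


class KeyValueStore:
    """Toy in-memory stand-in for a key-value database (hypothetical API)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The whole value is serialized and stored as one opaque object.
        self._data[key] = json.dumps(value)

    def get(self, key, default=None):
        # One lookup returns everything stored under the key.
        raw = self._data.get(key)
        return default if raw is None else json.loads(raw)


store = KeyValueStore()

# Everything about the session is written with a single PUT ...
store.put(
    "session:ab12",
    {"user_id": 42, "language": "en", "cart": ["sku-1", "sku-2"]},
)

# ... and read back with a single GET.
session = store.get("session:ab12")
```

Because the store treats the value as opaque, it cannot query inside it; the application must know the key in order to retrieve the object.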
sufficient scalability to meet the demands of these sites. Sharding at
sites like Facebook has allowed a MySQL-based system to scale up
to massive levels. However, there are downsides to doing this
because many relational operations and database-level ACID               3.1.2 Document databases
A document database is designed to store semi-structured data as           and so on. Figure 3 shows the decisions involved in
documents, typically in JSON or XML format. It is beneficial to use        choosing the correct database.
document databases in the following scenarios: event logging;
content management systems and blogging platforms; web analytics or
real-time analytics; and e-commerce applications.
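
As a minimal sketch of the document model, the example below keeps JSON-like documents in a plain Python list. The posts collection and the find and has_tag helpers are assumptions made for illustration and do not correspond to the API of any particular document database.

```python
import json

# Two documents in the same collection with different fields:
# no schema has to be declared or altered first.
posts = [
    {"_id": 1, "title": "Hello", "tags": ["intro"], "author": {"name": "Ann"}},
    {"_id": 2, "title": "Sharding notes",
     "comments": [{"by": "Bob", "text": "Nice"}]},
]


def find(collection, **criteria):
    """Return documents whose top-level fields equal every criterion."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


def has_tag(collection, tag):
    """Return the ids of documents whose optional tags array contains tag."""
    return [doc["_id"] for doc in collection if tag in doc.get("tags", [])]


# Each document serializes to JSON as one self-contained unit,
# including its nested arrays and sub-documents.
serialized = json.dumps(posts[1])
```

Unlike a key-value store, the database can look inside the document, which is what makes queries such as has_tag possible.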
3.1.3 Column databases
A column store database is a type of database that stores data using
a column-oriented model. It is beneficial to use column databases
in the following scenarios: event logging; content management
systems and blogging platforms; counters; and expiring usage.
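
The difference between a row-oriented and a column-oriented layout can be sketched in a few lines. The sample data and the to_columns helper are assumptions made for illustration, and the sketch assumes every row has the same fields; it shows the general idea of column orientation rather than the storage format of any specific product.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"page": "/home", "visits": 120, "country": "US"},
    {"page": "/docs", "visits": 45, "country": "DE"},
    {"page": "/blog", "visits": 80, "country": "US"},
]


def to_columns(rows):
    """Pivot a list of row dicts into a dict of per-column value lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented layout: all values of one field are stored together.
columns = to_columns(rows)

# An aggregate over one field reads a single column and skips the rest.
total_visits = sum(columns["visits"])
```

Keeping the values of one field together is what makes counters and aggregates cheap in this layout, at the cost of reassembling a full record from several columns.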
3.1.4 Graph databases
A graph database is a database that uses graph structures for
semantic queries with nodes, edges and properties to represent and
store data. It is beneficial to use graph databases in the following
scenarios: connected data; routing, dispatch, and location-based
services; recommendation engines.
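
A recommendation query over connected data can be sketched with nodes, edges, and properties held in ordinary Python structures. The sample people, the FRIEND edge type, and the recommend helper are assumptions made for illustration; a real graph database would expose this through its own query language.

```python
from collections import defaultdict

# Nodes carry properties.
nodes = {
    "ann": {"city": "Rochester"},
    "bob": {"city": "Buffalo"},
    "cat": {"city": "Albany"},
    "dan": {"city": "Ithaca"},
}

# Edges connect two nodes and carry properties of their own.
edges = [
    ("ann", "bob", {"type": "FRIEND"}),
    ("bob", "cat", {"type": "FRIEND"}),
    ("cat", "dan", {"type": "FRIEND"}),
]

# Index the edges in both directions, since friendship is undirected here.
adjacency = defaultdict(set)
for src, dst, _props in edges:
    adjacency[src].add(dst)
    adjacency[dst].add(src)


def recommend(person):
    """Suggest friends of friends who are not already direct friends."""
    direct = adjacency[person]
    candidates = set()
    for friend in direct:
        candidates |= adjacency[friend]
    return sorted(candidates - direct - {person})
```

The query follows edges outward from one node instead of joining tables, which is why this kind of traversal suits graph databases.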
                                                                           Figure 3. Decisions involved in choosing the correct database
                                                                           [1].
3.2 NewSQL database
The term NewSQL is not quite as broad as NoSQL. NewSQL is a
term to describe a new group of databases that share much of the
functionality of traditional SQL relational databases, while offering      5. CONCLUSIONS
some of the benefits of NoSQL technologies. NewSQL systems                 It's an exciting time to be working in the database industry. For a
offer the best of both worlds: the relational data model and ACID          generation of software professionals, innovation in database
transactional consistency of traditional operational databases; the        technology occurred largely within the constraints of the ACID-
familiarity and interactivity of SQL; and the scalability and speed of     compliant relational databases. Now that the hegemony of the
NoSQL. Some offer stronger consistency guarantees than are                 RDBMS has been broken, we are free to design database systems
available with NoSQL solutions, although others limit this to              whose only constraint is our imagination. It's well known that failure
‘tunable’ consistency and thus aren’t fully ACID-compliant [1]. The        drives innovation. Some of these new database system concepts
NewSQL advantages include:                                                 might not survive the test of time; however, there seems little chance
    (1) Minimize application complexity, stronger consistency and          that a single model will dominate the immediate future as
         often full transactional support.                                 completely as had the relational model. Database professionals will
    (2) Familiar SQL and standard tooling.                                 need to choose the most appropriate technology for their
    (3) Richer analytics leveraging SQL and extensions.                    circumstances with care; in many cases, relational technology will
    (4) Many systems offer NoSQL-style clustering with more                continue to be a better fit, but not always [1].
         traditional data and query models.
                                                                           NoSQL, NewSQL, and Big Data are in many respects vaguely
The NewSQL disadvantages include:                                          defined, overhyped, and overloaded terms. However, they represent
   (1) No NewSQL systems are as general-purpose as traditional             the most widely understood phrases for referring to next-generation
       SQL systems set out to be.                                          database technologies [1].
   (2) In-memory architectures may be inappropriate for volumes
       exceeding a few terabytes.                                          Loosely speaking, NoSQL databases reject the constraints of the
   (3) They offer only partial access to the rich tooling of traditional   relational model, including strict consistency and schemas.
       SQL systems.                                                        NewSQL databases retain many features of the relational model but
                                                                           amend the underlying technology in significant ways. Big Data
                                                                           systems are generally oriented around technologies within the
                                                                           Hadoop ecosystem, increasingly including Spark [1].
4. DATABASE CONSIDERATIONS AND
REQUIREMENTS                                                               6. REFERENCES
The first and most obvious purpose of a database is to store, update,      [1] Harrison, Guy. Next Generation Databases. Publisher: Apress.
and access data. All database systems allow these operations in one        December 26, 2015
form or another. Other functional and nonfunctional system                 [2] Ramez Elmasri, Shamkant B. Navathe, Fundamentals of
considerations and requirements for choosing appropriate database          Database Systems (7th Edition). Publisher: Pearson. June 18, 2015
systems in different applications include: (1) consistency,                [3] Hugh E. Williams, David Lane. Web Database Applications with
availability, and partition tolerance (CAP); (2) robustness and            PHP and MySQL. Publisher: O’Reilly Media. May 2004
reliability; (3) scalability; (4) performance and speed; (5)               [4] Haseeb, Abdul, and Geeta Pattun. "A review on NoSQL:
partitioning ability; (7) in-database analytics and monitoring; (8)        Applications and challenges." International Journal of Advanced
operational and querying capabilities; (9) storage management; (10)        Research in Computer Science 8.1 (2017).
talent pool and availability of relevant skills; (11) database integrity   [5] Codd, Edgar F. "A relational model of data for large shared data
and constraints; (12) data model flexibility; (13) database security       banks." Communications of the ACM 13.6 (1970): 377-387.