NoSQL databases are currently a hot topic in some parts of computing, with
over a hundred
different NoSQL databases available.
NoSQL stands for:
Non-relational
No RDBMS
Not Only SQL
NoSQL is an umbrella term for all databases and data stores that don’t follow
the RDBMS principles
A class of products
A collection of several (related) concepts about data storage and
manipulation
Often related to large data sets
Non-relational DBMSs are not new
But NoSQL represents a new incarnation
Due to massively scalable Internet applications
Based on distributed and parallel computing
Development
Started at Google
First research paper published in 2003
Continued thanks to Lucene's developers/Apache (Hadoop) and Amazon
(Dynamo)
Then many products and much interest came from Facebook, Netflix, Yahoo, eBay,
Hulu, IBM, and many more
Three major papers were the seeds of the NoSQL
movement
BigTable (Google)
Dynamo (Amazon)
Distributed key-value data store
Eventual consistency
CAP Theorem (discussed shortly)
NoSQL comes from the Internet, so it is often related to the “big
data” concept
How big is “big data”?
Over a few terabytes: enough to start spanning multiple storage
units
Challenges
Efficiently storing and accessing large amounts of data is
difficult, even more considering fault tolerance and backups
Manipulating large data sets involves running massively
parallel processes
Managing continuously evolving schema and metadata for
semi-structured and unstructured data is difficult
Discussing NoSQL databases is complicated
because there is a variety of types:
Sorted, ordered column stores:
are optimized for queries over large datasets, and store
columns of data together instead of rows.
Document databases:
pair each key with a complex data structure known as a document.
Key-value stores:
are the simplest NoSQL databases. Every item in the
database is stored as an attribute name (or 'key') together with its
value.
Graph databases:
are used to store information about networks of data, such as social
connections.
Documents
Loosely structured sets of key/value pairs in documents, e.g., XML,
JSON, BSON
Encapsulate and encode data in some standard formats or
encodings
Are addressed in the database via a unique key
Documents are treated as a whole, avoiding splitting a document
into its constituent name/value pairs
Allow retrieving documents by key or by content
Notable examples:
MongoDB (used by Foursquare, GitHub, and more)
CouchDB (used by Apple, BBC, Canonical, CERN, and more)
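The document model above can be sketched in a few lines of Python. This is only a toy in-memory illustration, not how MongoDB or CouchDB work internally; the class name `DocumentStore` and its methods `insert`, `get`, and `find` are hypothetical names chosen for this example. It shows documents encoded as JSON, stored whole under a unique key, and retrieved either by key or by content.

```python
import json
import uuid


class DocumentStore:
    """Toy in-memory document store (hypothetical, for illustration only)."""

    def __init__(self):
        # Maps a unique key to one JSON-encoded document.
        self._docs = {}

    def insert(self, document, key=None):
        """Store the document as a whole and return its key."""
        key = key or uuid.uuid4().hex
        self._docs[key] = json.dumps(document)
        return key

    def get(self, key):
        """Retrieve by key; returns None if the key is unknown."""
        raw = self._docs.get(key)
        return None if raw is None else json.loads(raw)

    def find(self, **criteria):
        """Retrieve by content: documents whose top-level fields match."""
        results = []
        for raw in self._docs.values():
            doc = json.loads(raw)
            if all(doc.get(field) == value for field, value in criteria.items()):
                results.append(doc)
        return results


store = DocumentStore()
store.insert({"name": "Ada", "city": "London", "tags": ["math"]}, key="user:1")
store.insert({"name": "Linus", "city": "Helsinki"}, key="user:2")

by_key = store.get("user:1")                 # lookup by unique key
by_content = store.find(city="Helsinki")     # lookup by field value
```

Note that the two documents do not share the same fields: nothing in the store enforces a schema.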
Key-Value Stores
Store data in a schema-less way
Store data as maps
HashMaps or associative arrays
Provide very efficient average-case
running time for accessing data
Notable examples:
Couchbase (Zynga, Vimeo, NAVTEQ, ...)
Redis (Craigslist, Instagram, Stack Overflow,
Flickr, ...)
Amazon Dynamo (Amazon, Elsevier,
IMDb, ...)
Apache Cassandra (Facebook, Digg,
Reddit, Twitter, ...)
Voldemort (LinkedIn, eBay, …)
Riak (GitHub, Comcast, Mochi, ...)
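A key-value store behaves much like a hash map exposed as a service. The following is a minimal sketch under that assumption, using a plain Python `dict` (which gives average constant-time access by key); the class `KeyValueStore` and its `put`/`get`/`delete` methods are hypothetical names for illustration, not the API of any product listed above.

```python
class KeyValueStore:
    """Toy key-value store backed by a hash map (illustrative only)."""

    def __init__(self):
        # A dict is a hash map: average O(1) insert, lookup, and delete.
        self._data = {}

    def put(self, key, value):
        """Store any value under the key; no schema is imposed."""
        self._data[key] = value

    def get(self, key, default=None):
        """Return the value for the key, or default if it is absent."""
        return self._data.get(key, default)

    def delete(self, key):
        """Remove the key; return True if it existed."""
        if key in self._data:
            del self._data[key]
            return True
        return False


kv = KeyValueStore()
kv.put("session:42", {"user": "ada", "cart": ["book", "pen"]})
kv.put("counter:visits", 1017)

session = kv.get("session:42")
missing = kv.get("session:99", default="no such key")
removed = kv.delete("counter:visits")
```

The store only understands keys: it cannot query inside a value, which is the main difference from the document model.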
Column-Oriented Stores
Data are stored in a column-oriented way
Data efficiently stored
Avoids consuming space for storing nulls
Columns are grouped in column-families
Data isn’t stored as a single table but is stored by column families
Unit of data is a set of key/value pairs
Identified by “row-key”
Ordered and sorted based on row-key
Notable examples:
Google's Bigtable (used in many of
Google's services)
HBase (Facebook, StumbleUpon,
Hulu, Yahoo!, ...)
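The column-family layout described above can be approximated with nested dictionaries. This is a simplified sketch, not Bigtable's or HBase's actual storage format; `ColumnFamilyStore`, `put`, `get_row`, and `scan` are hypothetical names. It illustrates three points from the slide: data is kept per column family, absent columns take no space (no nulls are stored), and rows are read back in row-key order.

```python
class ColumnFamilyStore:
    """Toy column-family store (illustrative, in-memory only)."""

    def __init__(self, families):
        # Layout: family name -> row key -> {column name: value}.
        self._families = {name: {} for name in families}

    def put(self, row_key, family, column, value):
        """Write one cell; only cells that are written occupy space."""
        self._families[family].setdefault(row_key, {})[column] = value

    def get_row(self, row_key):
        """Assemble a row from every family that has data for the key."""
        row = {}
        for family, rows in self._families.items():
            if row_key in rows:
                row[family] = dict(rows[row_key])
        return row

    def scan(self, start_key, end_key):
        """Yield (row_key, row) in sorted order for start_key <= key < end_key."""
        keys = set()
        for rows in self._families.values():
            keys.update(rows)
        for key in sorted(keys):
            if start_key <= key < end_key:
                yield key, self.get_row(key)


table = ColumnFamilyStore(["profile", "activity"])
table.put("user#002", "profile", "name", "Linus")
table.put("user#002", "activity", "last_login", "2024-01-01")
table.put("user#001", "profile", "name", "Ada")
table.put("user#010", "profile", "name", "Grace")

first_row = table.get_row("user#001")
scanned_keys = [key for key, _ in table.scan("user#001", "user#010")]
```

A real system would keep the rows physically sorted on disk instead of sorting on every scan; the sketch sorts on demand only to stay short.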
CAP Theorem
• Choosing between consistency and availability is not a “binary”
decision
• AP systems relax consistency in favor of
availability, but are not inconsistent
• CP systems sacrifice availability for consistency,
but are not unavailable
• This suggests both AP and CP systems can offer a
degree of consistency and availability, as well as
partition tolerance
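One common way to see that this trade-off is a matter of degree is quorum-based replication. The slides do not mention it, so treat the following as an illustrative assumption: data is copied to `n` replicas, a write waits for `w` acknowledgements, and a read consults `r` replicas. When `r + w > n`, every read overlaps the latest write; smaller values tolerate more replica failures at the cost of possibly stale reads. The function names are hypothetical.

```python
def quorum_overlaps(n, w, r):
    """Return True if every read quorum intersects every write quorum."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return r + w > n


def tolerated_failures(n, w, r):
    """Replicas that can be down while both reads and writes still succeed."""
    return n - max(w, r)


# Three settings for the same 3-replica cluster (labels are informal).
configs = {
    "AP-leaning": (3, 1, 1),   # fast and available, reads may be stale
    "balanced": (3, 2, 2),     # overlapping quorums, one failure tolerated
    "CP-leaning": (3, 3, 1),   # writes need every replica to respond
}

summary = {
    label: (quorum_overlaps(n, w, r), tolerated_failures(n, w, r))
    for label, (n, w, r) in configs.items()
}
```

Here `summary` maps each label to a pair: whether reads are guaranteed to see the latest write, and how many replicas may fail before some operation is refused.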
THANK YOU