MongoDB
An overview of NoSQL
COS216
AVINASH SINGH
DEPARTMENT OF COMPUTER SCIENCE
UNIVERSITY OF PRETORIA
Big Data - Overview
“Big Data” is a recent buzzword
Although not a new concept, it became popular with recent hardware and
algorithmic advances
Big data refers to the large volume of data used by businesses on a day-to-day basis
The amount of data matters less than how the organization uses it
And how the data is processed
Big Data - V
Some of the Vs of Big Data
Volume: refers to the size of the data, in this case a very large amount in the
Petabyte or even Exabyte range
Velocity: refers to how quickly new incoming data can be processed and analysed.
If data is processed too slowly, it might already be outdated before it can be used.
Variety: refers to the different data types and files that have to be stored and
analysed. Recently hypermedia (video, audio, images, etc.) is being analysed more
often, rather than just plain text
SQL - Problems
SQL-based databases have been used for decades
However, they have problems, especially with modern requirements
SQL requires the database/table structure to be known before adding data
SQL has inherent problems storing and processing hypermedia and non-text data
Although SQL is relatively good with large tables (millions or billions of records), inserting,
indexing, and lookup queries can become slow with very large datasets
SQL has storage overhead which might consume storage space that could be utilized
better
Depending on the programming language used to query SQL databases, the data
representation is typically different, requiring the data to be converted before being usable
NoSQL - Overview
NoSQL can refer to multiple concepts
Not using SQL at all
Not only using SQL, but combining other mechanisms with SQL
Not using relational SQL
NoSQL databases provide a mechanism for storage and retrieval that is modelled by
means other than the tabular relations used by RDBMSs like MySQL
NoSQL has been around since the 1960s, but only gained widespread adoption
during the past decade due to the large datasets handled by companies such as
Google, Facebook, Twitter, and Amazon
NoSQL - Design
NoSQL systems are designed for:
Large scale data storage
Massively parallel data processing
Processing using a large number of low cost servers
NoSQL - Types
Four general types of NoSQL databases
Key-value stores: every item in the database is stored as an attribute name/key with a
corresponding value. For example Amazon Dynamo
Document databases: pair each key with a complex data structure, known as a document.
Documents can contain many different key-value pairs, arrays, and nested documents. For
example MongoDB
Wide-column stores: optimized for queries over large datasets. Stores columns of data
together, instead of rows. For example Cassandra and HBase
Graph databases: store data as graphs showing connections and networks. Often used for
social media where users have multiple relationships with other users. For example: Neo4j
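To make the four types more concrete, the sketch below models the same small dataset in each style using plain JavaScript values. The variable names and the sample users are illustrative assumptions only; this is not the API of any of the products named above.

```javascript
// Key-value store: an opaque value looked up by a single key.
const keyValueStore = new Map();
keyValueStore.set("user:1", JSON.stringify({ name: "Thandi", city: "Pretoria" }));

// Document database: the key maps to a structured, nested document.
const documentStore = {
  "user:1": { name: "Thandi", city: "Pretoria", follows: ["user:2"] }
};

// Wide-column store: values are grouped per column rather than per row.
const columnStore = {
  name: { "user:1": "Thandi", "user:2": "Sipho" },
  city: { "user:1": "Pretoria", "user:2": "Durban" }
};

// Graph database: nodes plus explicit edges describing relationships.
const graph = {
  nodes: ["user:1", "user:2"],
  edges: [{ from: "user:1", to: "user:2", type: "FOLLOWS" }]
};

// Reading every city touches a single column in the wide-column layout.
const cities = Object.values(columnStore.city);
```

The same information is present in every layout; what changes is which access pattern is cheap (lookup by key, nested reads, column scans, or relationship traversal).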
NoSQL - Benefits
NoSQL has some advantages
Manage large volumes of rapidly changing structured, semi-structured, and unstructured
data
Agile sprints, quick schema iterations, and frequent code pushes
Object-oriented programming that is easy to use and flexible
Geographically distributed scale-out architecture instead of expensive monolithic
architecture
The database is split across multiple cheap servers, instead of a single expensive server
MongoDB - Overview
MongoDB is a document-oriented database which provides high performance, high
availability, and high scalability
MongoDB has a dynamic schema, which means that you do not have to create a
structure for the database before being able to use it
Released in 2009
Free, open-source, and cross-platform
Written in C/C++ and JavaScript
Works well with JavaScript environments, such as Node.js
MongoDB - Overview
MongoDB documents are a set of key-value pairs
Similar to JavaScript objects
MongoDB makes use of Binary JSON (BSON)
BSON is very similar to JSON, but it supports more data types
MongoDB makes use of the following structure
Database
Collections
Documents
MongoDB - Terminology
RDBMS          MongoDB
Database       Database
Table          Collection
Row            Document
Column         Field
Primary Key    Primary Key (the _id field)
MongoDB - Documents
A document is equivalent to a row in a relational database
A document has a dynamic schema
The schema does therefore not have to be specified beforehand
Different documents in the same collection can also have different fields and structure
{"id" : "X47521", "city" : "Pretoria", "location" : [25.7479, 28.2293]}
MongoDB - Collections
A collection is a grouping of multiple documents
The documents are similar, or have a similar purpose, but do not need to have the
same structure or fields
A collection is equivalent to a table in a relational database
{"id" : "X47521", "city" : "Pretoria", "location" : [25.7479, 28.2293]}
{"id" : "X47561", "city" : "Johannesburg", "location" : [26.2041, 28.0473]}
{"id" : "X43542", "city" : "Durban", "location" : [29.8587, 31.0218]}
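As a rough illustration of the dynamic schema, a collection can be pictured as an array of plain JavaScript objects in which each document carries its own set of fields. This is an in-memory sketch only; the `coastal` field is a made-up addition to show a differing structure.

```javascript
// A collection pictured as an array of documents (plain objects).
const cities = [
  { id: "X47521", city: "Pretoria", location: [25.7479, 28.2293] },
  // No schema is enforced, so this document omits "location" and adds "coastal".
  { id: "X43542", city: "Durban", coastal: true }
];

// Each document reports its own field names.
const fieldSets = cities.map((doc) => Object.keys(doc));
```

Code that reads such a collection has to cope with fields that may be absent, for example by checking `"location" in doc` before using it.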
MongoDB - Processes
The following programs are available in MongoDB
Name          Description
mongod        Daemon/server for MongoDB that has to run in the background
mongo         The client used to connect to the server and execute queries
mongoimport   Import JSON documents into MongoDB
mongoexport   Export JSON documents from MongoDB
MongoDB - Connection
First start the MongoDB server (mongod)
Use the client to connect, similar to MySQL
mongo -u <user> -p <pass> --host <host> --port <port>
Or connect to localhost if no parameters given
mongo
Other commands for the server, client, import, and export are not discussed here
Read up on them if you need them
MongoDB – Data Types
Type        Description
String      UTF-8 strings
Boolean     True/false values
Integer     32-bit or 64-bit integers
Double      Floating-point numbers
Array       Arrays of any of the other types
Timestamp   Unix timestamp with an ordinal for operations within the same second
Date        Dates and times
Object      Embedded documents
Code        JavaScript code
Regex       Regular expressions
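Several of these types have a direct counterpart in plain JavaScript, which the sketch below shows with a hypothetical document. Note the assumption: JavaScript has a single number type, so the Integer/Double distinction only appears in the comments here, and the BSON-specific Timestamp type is left out.

```javascript
// A hypothetical document using JavaScript counterparts of the listed types.
const sample = {
  name: "Bitcoin",             // String
  active: true,                // Boolean
  rank: 1,                     // Integer (a plain JavaScript number)
  price: 8521.32,              // Double (also a plain JavaScript number)
  tags: ["coin", "crypto"],    // Array
  updated: new Date(0),        // Date
  meta: { source: "example" }, // Object (embedded document)
  pattern: /^Bit/              // Regex
};

// Report the built-in JavaScript kind of each field value.
const kinds = Object.fromEntries(
  Object.entries(sample).map(([field, value]) => [
    field,
    Object.prototype.toString.call(value).slice(8, -1)
  ])
);
```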
MongoDB – CRUD Operations
Operation           Mongo Call                     Description
Create Database     use mydb                       Selects a database; it is created when data
                                                   is first stored
Create Collection   db.mycollection.insert(doc)    Creates the collection if it does not exist,
or Document                                        otherwise adds the document to the collection
Read                db.mycollection.find(…)        Queries the specified collection
Update              db.mycollection.update(…)      Updates documents in the specified collection
Delete              db.mycollection.remove(…)      Removes specific documents
                    db.mycollection.drop()         Drops the specified collection
MongoDB - Insert
Insert a document into a collection
db.myCollection.insert({name : "Bitcoin", price : 8521.32});
Insert multiple documents at the same time
db.myCollection.insert([
{name : "Bitcoin", price : 8521.32},
{name : "Ethereum", price : 625.32}
]);
MongoDB - Find
Search a collection for documents where a field equals a specific value
db.myCollection.find({name : "Bitcoin"});
Search for values that are less than a given value
db.myCollection.find({price :{$lt : 5200}});
Other operators are available: less than ($lt), less than or equal ($lte), greater than
($gt), greater than or equal ($gte), not equal ($ne)
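The meaning of these operators can be sketched with a small in-memory matcher over an array of objects. This is only an illustration of the semantics under simplifying assumptions (flat documents, no nested fields); `operators`, `matches`, and `coins` are hypothetical names, not part of MongoDB.

```javascript
// The comparison operators from the slide as plain predicate functions.
const operators = {
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $ne: (a, b) => a !== b
};

// Returns true when `doc` satisfies every condition in `query`.
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const isOperatorObject = condition !== null && typeof condition === "object";
    if (!isOperatorObject) {
      return doc[field] === condition; // plain equality, e.g. {name : "Bitcoin"}
    }
    // Every operator listed for the field must hold.
    return Object.entries(condition).every(([op, value]) =>
      operators[op](doc[field], value)
    );
  });
}

const coins = [
  { name: "Bitcoin", price: 8521.32 },
  { name: "Ethereum", price: 625.32 }
];

const cheap = coins.filter((doc) => matches(doc, { price: { $lt: 5200 } }));
const notBitcoin = coins.filter((doc) => matches(doc, { name: { $ne: "Bitcoin" } }));
```

With the sample data, both `cheap` and `notBitcoin` contain only the Ethereum document.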
MongoDB - Find
Search according to multiple criteria (AND – all must match)
db.myCollection.find({name : "Bitcoin", price : {$lt : 10000}});
Note that each key can only appear once in a query document
Hence, when searching for a range (e.g. price greater than 500 AND less than 1000),
the key cannot be repeated
Instead, place both operators in one condition, {price : {$gt : 500, $lt : 1000}},
or combine separate conditions with $and
More operators and combinations can be found in MongoDB’s docs
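The single-key restriction follows from how object literals behave in JavaScript: a repeated key silently keeps only its last value. The sketch below shows that effect and two query documents that do express the range; the variable names are illustrative.

```javascript
// Repeating a key in an object literal keeps only the last value,
// so the $gt condition is lost.
const repeated = { price: { $gt: 500 }, price: { $lt: 1000 } };

// Both operators inside one condition object for the same key.
const combined = { price: { $gt: 500, $lt: 1000 } };

// The same range expressed as two separate conditions under $and.
const withAnd = { $and: [{ price: { $gt: 500 } }, { price: { $lt: 1000 } }] };
```

Either `combined` or `withAnd` could be passed to find(); the first form is shorter when all conditions concern the same field.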
MongoDB - Find
Search according to alternative criteria (OR – at least one must match)
db.myCollection.find({$or : [
{name : "Bitcoin"},
{name : "Ethereum"}
]});
MongoDB - Sort
Sort results based on one or more fields
Sort in ascending order (1) or descending order (-1)
db.myCollection.find({price : {$lt : 10000}})
.sort({price : -1, name : 1});
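How a multi-field sort specification orders results can be sketched with an in-memory comparator: fields are compared in the order listed, and later fields only break ties. `sortBy` is a hypothetical helper, and the Litecoin entry is invented sample data used to create a price tie.

```javascript
// Sort a copy of `docs` by a specification such as {price : -1, name : 1}.
function sortBy(docs, spec) {
  const keys = Object.entries(spec); // e.g. [["price", -1], ["name", 1]]
  return [...docs].sort((a, b) => {
    for (const [field, direction] of keys) {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
    }
    return 0; // equal on every listed field
  });
}

const coins = [
  { name: "Litecoin", price: 625.32 },
  { name: "Bitcoin", price: 8521.32 },
  { name: "Ethereum", price: 625.32 }
];

const sorted = sortBy(coins, { price: -1, name: 1 });
const sortedNames = sorted.map((doc) => doc.name);
```

Here the highest price comes first, and the two coins with equal prices are ordered alphabetically by name.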
MongoDB - Projection
Determine which fields to return from the collection
Fields can be shown (1) or hidden (0); apart from _id, shown and hidden fields
cannot be mixed in one projection
db.myCollection.find({price : {$lt : 10000}},
{_id : 0, price : 1});
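The effect of a projection can be sketched in memory as follows. This is a simplified model that treats a specification as either an inclusion list (values of 1) or an exclusion list (values of 0) and ignores the special handling of _id; `project` and the `symbol` field are assumptions made for illustration.

```javascript
// Apply a projection specification to a single document.
function project(doc, spec) {
  const included = Object.keys(spec).filter((field) => spec[field] === 1);
  if (included.length > 0) {
    // Inclusion mode: keep only the listed fields that exist in the document.
    return Object.fromEntries(
      included.filter((field) => field in doc).map((field) => [field, doc[field]])
    );
  }
  // Exclusion mode: keep everything except the fields marked 0.
  return Object.fromEntries(
    Object.entries(doc).filter(([field]) => spec[field] !== 0)
  );
}

const coin = { name: "Bitcoin", symbol: "BTC", price: 8521.32 };

const onlyPrice = project(coin, { price: 1 });
const withoutName = project(coin, { name: 0 });
```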
MongoDB - Update
Update the values of specific fields
First find documents and then set the values
multi determines whether all matches should be updated, or only the first one
db.myCollection.update({name : "Bitcoin"},
{$set : {price : 8652.23}},
{multi : false});
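The difference the multi option makes can be sketched with an in-memory version of the same call. This is a minimal model assuming equality-only filters and a $set change; `update` here is a hypothetical function and the `exchange` field is invented so that two documents share a name.

```javascript
// Apply {$set : ...} to matching documents; returns how many were modified.
function update(docs, filter, change, options = { multi: false }) {
  let modified = 0;
  for (const doc of docs) {
    const isMatch = Object.entries(filter).every(([key, value]) => doc[key] === value);
    if (!isMatch) continue;
    Object.assign(doc, change.$set); // overwrite or add the listed fields
    modified += 1;
    if (!options.multi) break; // without multi, stop after the first match
  }
  return modified;
}

const coins = [
  { name: "Bitcoin", exchange: "A", price: 8521.32 },
  { name: "Bitcoin", exchange: "B", price: 8530 }
];

const firstOnly = update(coins, { name: "Bitcoin" }, { $set: { price: 8652.23 } });
const afterFirst = coins.map((doc) => doc.price);

const allMatches = update(coins, { name: "Bitcoin" }, { $set: { price: 8700 } }, { multi: true });
const afterAll = coins.map((doc) => doc.price);
```

The first call changes only the first matching document; the second call, with multi set to true, changes both.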
MongoDB - Remove
Remove documents from a collection
Remove all documents
db.myCollection.remove({});
Remove specific documents
db.myCollection.remove({name : "Bitcoin"});
MongoDB - Aggregation
MongoDB supports a range of aggregated queries
$sum – calculate the sum
$avg – calculate the mean
$min – find the minimum value
$max – find the maximum value
MongoDB - Aggregation
Aggregated queries have the following syntax
The queries are pipelined from one to the other
db.myCollection.aggregate([
{ $match : <document criteria> }, // Limits data before grouping
{ $group : <group specification> }, // Grouping data
{ $match : <group criteria> }, // Limits results after grouping
{ $sort : <sort specification> }, // Sorts grouped data
{ $out : <collection> }, // The results are inserted into a collection
]) ;
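The pipelining idea can be sketched in memory: each stage takes an array of documents and returns a new array that feeds the next stage. The stage functions below are hypothetical stand-ins for $match, $group (with $max), and $sort, and the sample documents are invented.

```javascript
const coins = [
  { name: "Bitcoin", exchange: "A", price: 8521.32 },
  { name: "Bitcoin", exchange: "B", price: 8765.23 },
  { name: "Ethereum", exchange: "A", price: 625.32 }
];

// Stand-in for $match: limit the data before grouping.
const matchStage = (docs) => docs.filter((doc) => doc.price > 500);

// Stand-in for $group with $max: one output document per coin name.
const groupStage = (docs) => {
  const maxByName = new Map();
  for (const doc of docs) {
    const current = maxByName.get(doc.name);
    if (current === undefined || doc.price > current) {
      maxByName.set(doc.name, doc.price);
    }
  }
  return [...maxByName].map(([name, price]) => ({ _id: name, _price: price }));
};

// Stand-in for $sort: order the grouped data by descending price.
const sortStage = (docs) => [...docs].sort((a, b) => b._price - a._price);

// Pipelining: the output of one stage is the input of the next.
const result = [matchStage, groupStage, sortStage].reduce(
  (docs, stage) => stage(docs),
  coins
);
```

Because each stage only sees the output of the previous one, filtering early keeps the later stages working on less data.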
MongoDB - Aggregation
Example: find the maximum price recorded for each coin
db.myCollection.aggregate([
{
$group : {_id : "$name", _price : {$max : "$price"}}
}
])
Result (one document per coin name)
{"_id" : "Bitcoin", "_price" : 8765.23}
MongoDB - Aggregation
Example: find the total market cap of all coins
db.myCollection.aggregate([
{
$group : {_id : null, _cap : {$sum : "$marketcap"}}
}
])
Result
{"_id" : null, "_cap" : 390776273498}
MongoDB - Aggregation
Example: find all coins above $5000 and sort them in descending order according to
price
db.myCollection.aggregate([
{$group : {_id : "$name", _price : {$max : "$price"}}},
{$match : {_price : {$gt : 5000}}},
{$sort : {_price : -1}},
{$out : "CoinResults"}
]);