MongoDB
Advanced
MapReduce with MongoDB
• MongoDB provides the mapReduce command
• Syntax: db.<collection-name>.mapReduce(mapFunction, reduceFunction, options)
MapReduce Operation
• The map function is defined as a JavaScript function
• The reduce function is also defined as a JavaScript function, using the function keyword
MapReduce Operation
• query: the criteria used to select documents first; if it is not given, the map-reduce is applied to all documents in the collection
• out: specifies where to store the MapReduce output
• In the example, the output is written to a new collection named order_totals
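A minimal sketch of a call that uses both options, assuming an orders collection whose documents have cust_id, amount, and status fields (amount and status are assumed field names for illustration). The call itself only runs inside a mongo shell where db is defined:

```javascript
// Map: runs once per selected document; `this` is the current document.
// `emit` is supplied by MongoDB while the map function executes.
var mapOrderTotals = function () {
  emit(this.cust_id, this.amount); // `amount` is an assumed field name
};

// Reduce: receives one key and the array of values emitted for that key.
var reduceOrderTotals = function (key, values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
};

// Options: filter the input first, then write the results to order_totals.
var orderTotalsOptions = {
  query: { status: "A" }, // assumed filter; omit it to process every document
  out: "order_totals"
};

// Only contact the server when a mongo shell `db` handle is available.
if (typeof db !== "undefined") {
  db.orders.mapReduce(mapOrderTotals, reduceOrderTotals, orderTotalsOptions);
}
```

Keeping the options in a separate object makes it easy to see which part selects the input (query) and which part names the destination (out).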
MapReduce
• The map and reduce functions can be defined separately, outside the mapReduce call
• They are then passed by name to the mapReduce operator
db.testMR.mapReduce(mapFunction1,
reduceFunction1,
{ out: "map_reduce_example" }
)
Example
• The collection contains documents like the one shown below
{
cust_id: "abc123",
ord_date: new Date("Oct 04, 2012"),
status: 'A',
price: 25,
items: [ { sku: "mmm", qty: 5, price: 2.5 },
{ sku: "nnn", qty: 5, price: 2.5 } ]
}
• The MapReduce task: return the total price per customer
Define Map Function
• The map function reads each input document and emits cust_id and price as a key-value pair
• We can define the map function in JavaScript
var mapFunction1 = function() {
emit(this.cust_id, this.price);
};
* this refers to the document that the map function is processing
Define the reduce function
• The reduce function receives a cust_id as the key and an array of all prices emitted for that key as the value
• In JavaScript, we can define the reduce function as
var reduceFunction1 = function(keyCustId, valuesPrices) {
return Array.sum(valuesPrices);
};
Perform the MapReduce
db.orders.mapReduce(
mapFunction1,
reduceFunction1,
{ out: "sum_prices_per_customer" }
)
• We call the mapReduce operator on the collection
• pass the map function name
• pass the reduce function name
• and specify where to store the output
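To see what this operation computes without a running server, the map, group, and reduce steps can be emulated in plain JavaScript. This is only an in-memory sketch: runMapReduce is a hypothetical helper, the sample prices are invented, and emit is passed as an argument instead of being provided by MongoDB.

```javascript
// Sample documents shaped like the order document above (values are invented).
const orders = [
  { cust_id: "abc123", price: 25 },
  { cust_id: "abc123", price: 10 },
  { cust_id: "xyz789", price: 7 }
];

// Same logic as mapFunction1, but `emit` is passed in instead of being global.
const mapPrices = (doc, emit) => emit(doc.cust_id, doc.price);

// Same logic as reduceFunction1, without the shell-only Array.sum helper.
const reducePrices = (key, values) => values.reduce((a, b) => a + b, 0);

// Hypothetical in-memory stand-in for mapReduce: map, group by key, reduce.
function runMapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  for (const doc of docs) {
    mapFn(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  // Output documents have the shape { _id: key, value: reducedValue }.
  return [...groups].map(([key, values]) => ({
    _id: key,
    // Like MongoDB, skip the reduce call when a key has a single value.
    value: values.length === 1 ? values[0] : reduceFn(key, values)
  }));
}

const sumPricesPerCustomer = runMapReduce(orders, mapPrices, reducePrices);
console.log(sumPricesPerCustomer);
```

Each output entry pairs a customer id (_id) with the total of that customer's prices (value), which is the shape of the documents stored in the output collection.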
Other parameters
• Besides the map function, the reduce function, and out, the mapReduce operator can take
• query: performs selection (filters the input documents)
• sort: sorts the input documents
• limit: limits the number of input documents
Example
• count number of movies per year, starting from 2005
sample doc
{
title: "Anthropoid",
year: 2016,
actors: [ ObjectId("8") ]
}
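One way to express this task, sketched under the assumption that the documents live in a collection named movies and that the result goes to a collection named movies_per_year (both names are assumptions). The mapReduce call only runs inside a mongo shell where db is defined:

```javascript
// Map: emit the release year as the key and 1 as the value for each movie.
var mapMovieYear = function () {
  emit(this.year, 1);
};

// Reduce: add up the 1s (or partial counts) emitted for one year.
var reduceMovieCount = function (year, counts) {
  var total = 0;
  for (var i = 0; i < counts.length; i++) {
    total += counts[i];
  }
  return total;
};

var moviesPerYearOptions = {
  query: { year: { $gte: 2005 } }, // keep only movies from 2005 onwards
  out: "movies_per_year"           // assumed output collection name
};

// Only contact the server when a mongo shell `db` handle is available.
if (typeof db !== "undefined") {
  db.movies.mapReduce(mapMovieYear, reduceMovieCount, moviesPerYearOptions);
}
```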
Example
• count number of movies per year, starting from 2005, and
sort
sample doc
{
title: "Anthropoid",
year: 2016,
actors: [ ObjectId("8") ]
}
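The effect of adding a sort can be illustrated in memory, assuming the sort refers to the mapReduce sort option, which orders the input documents before they are mapped. The movie list below is invented apart from Anthropoid, and the filter and sort calls are plain JavaScript stand-ins for query: { year: { $gte: 2005 } } and sort: { year: 1 }.

```javascript
// Invented sample data; only "Anthropoid" comes from the sample document.
const movies = [
  { title: "Anthropoid", year: 2016 },
  { title: "Movie A", year: 2003 },
  { title: "Movie B", year: 2008 },
  { title: "Movie C", year: 2016 },
  { title: "Movie D", year: 2005 }
];

// Stand-ins for the query and sort options, applied to the input documents.
const selected = movies
  .filter((m) => m.year >= 2005)     // query: { year: { $gte: 2005 } }
  .sort((a, b) => a.year - b.year);  // sort: { year: 1 }

// Map emits (year, 1); reduce sums the 1s. Both steps are folded into a counter.
const counts = new Map();
for (const m of selected) {
  counts.set(m.year, (counts.get(m.year) || 0) + 1);
}

const moviesPerYear = [...counts].map(([year, n]) => ({ _id: year, value: n }));
console.log(moviesPerYear);
```

Because the input arrives ordered by year, all documents with the same key are processed next to each other and the result comes out in ascending year order.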
Index
• If the data is not indexed, the DB must scan the entire collection to find the documents that match the given conditions
• With indexes, MongoDB can find documents efficiently
• Indexes are a primary feature for performance
• Primary index is applied on the identifier field (_id)
• created automatically
• Secondary indexes can be applied on any field
• created manually
Index types
• Default: _id
• Single field
• user-defined on single field
• Compound fields
• user-defined on multiple fields
• Multikey index
• used to index content stored in arrays
• one index entry for each array element
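As a sketch, the user-defined index types could be created on the orders collection from the earlier example as follows. The choice of fields is an assumption for illustration, and the createIndex calls only run inside a mongo shell where db is defined:

```javascript
// Index specifications: field name -> 1 (ascending) or -1 (descending).
var singleFieldIndex = { cust_id: 1 };
var compoundIndex = { cust_id: 1, ord_date: -1 };

// Indexing a field inside the items array makes the index multikey:
// MongoDB stores one index entry per array element.
var multikeyIndex = { "items.sku": 1 };

// Only contact the server when a mongo shell `db` handle is available.
if (typeof db !== "undefined") {
  db.orders.createIndex(singleFieldIndex);
  db.orders.createIndex(compoundIndex);
  db.orders.createIndex(multikeyIndex);
  printjson(db.orders.getIndexes()); // the list also contains the default _id index
}
```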
Index Structures
• Ordered
• values in the indexed field are sorted either ascending or
descending
• Hashed
• index the hashing of the values
• Text
• index the text of the fields
• useful for full-text search
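The practical difference between an ordered and a hashed structure can be sketched in plain JavaScript. This is a toy model, not MongoDB's implementation: a sorted array stands in for the ordered index, a Map stands in for the hashed index, and the year values are invented.

```javascript
const years = [2016, 2003, 2008, 2016, 2005]; // invented values of an indexed field

// Ordered index: (value, position) entries kept sorted by value.
const ordered = years
  .map((value, pos) => ({ value, pos }))
  .sort((a, b) => a.value - b.value);

// Binary search for the first entry whose value is >= target.
function lowerBound(entries, target) {
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (entries[mid].value < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Range query (value >= 2005): one search, then read the sorted tail.
const fromPos = lowerBound(ordered, 2005);
const rangeHits = ordered.slice(fromPos).map((e) => e.pos);

// Hashed index stand-in: direct lookup by value, useful for equality only.
const hashed = new Map();
years.forEach((value, pos) => {
  if (!hashed.has(value)) hashed.set(value, []);
  hashed.get(value).push(pos);
});
const equalityHits = hashed.get(2016);

console.log(rangeHits, equalityHits);
```

The ordered structure answers range conditions because neighbouring values sit next to each other, while the hashed structure scatters them and therefore only helps with exact matches.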
Behind the scenes: BSON
• MongoDB uses BSON (Binary JSON) to store a binary representation of JSON documents
• JSON objects and arrays are serialized into binary
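To make the serialization concrete, here is a minimal hand-written encoder that follows the BSON layout for a flat document. It is a sketch that handles only string and 32-bit integer values and uses Node's Buffer; encodeBson is a hypothetical helper, not a MongoDB API, and real drivers ship a complete BSON library.

```javascript
// Encode a flat document whose values are strings or 32-bit integers.
function encodeBson(doc) {
  const parts = [];
  for (const [name, value] of Object.entries(doc)) {
    const key = Buffer.from(name + "\0", "utf8"); // element name as a C string
    if (typeof value === "string") {
      const text = Buffer.from(value + "\0", "utf8");
      const len = Buffer.alloc(4);
      len.writeInt32LE(text.length); // string length includes the trailing NUL
      parts.push(Buffer.from([0x02]), key, len, text); // 0x02 = string element
    } else if (Number.isInteger(value)) {
      const num = Buffer.alloc(4);
      num.writeInt32LE(value); // throws if the value does not fit in 32 bits
      parts.push(Buffer.from([0x10]), key, num); // 0x10 = int32 element
    } else {
      throw new TypeError("unsupported value type in this sketch");
    }
  }
  const body = Buffer.concat(parts);
  const size = Buffer.alloc(4);
  size.writeInt32LE(4 + body.length + 1); // size prefix + elements + terminator
  return Buffer.concat([size, body, Buffer.from([0x00])]);
}

const bytes = encodeBson({ hello: "world" });
console.log(bytes.toString("hex"));
// prints "160000000268656c6c6f0006000000776f726c640000"
```

The first four bytes hold the total document size (22 here, little-endian), each element starts with a one-byte type tag followed by its name, and a single zero byte closes the document.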
Behind the scenes: Replication
• Master/slave replication (called primary/secondary in current MongoDB replica sets)
• one replica is the master
• the other replicas are slaves
• clients perform operations on the master replica
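The idea can be illustrated with a toy in-memory model. Everything here is hypothetical (the Replica class, the log array, and the function names); it only shows clients writing to the master and slaves catching up by replaying the master's operations in order.

```javascript
// Hypothetical toy replica: a named store plus a cursor into the master's log.
class Replica {
  constructor(name) {
    this.name = name;
    this.docs = new Map(); // _id -> document
    this.applied = 0;      // number of log entries already applied
  }
  apply(op) {
    this.docs.set(op.doc._id, op.doc);
  }
}

const master = new Replica("master");
const slaves = [new Replica("slave-1"), new Replica("slave-2")];
const log = []; // ordered list of operations accepted by the master

// Clients send their operations to the master only.
function clientInsert(doc) {
  const op = { type: "insert", doc };
  master.apply(op);
  log.push(op);
}

// Each slave replays the log entries it has not applied yet.
function replicate() {
  for (const slave of slaves) {
    while (slave.applied < log.length) {
      slave.apply(log[slave.applied]);
      slave.applied += 1;
    }
  }
}

clientInsert({ _id: 1, title: "Anthropoid" });
replicate();
console.log(slaves.map((s) => s.docs.size));
```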
Behind the scenes: Sharding
• MongoDB automatically partitions the data
• MongoDB partitions a collection
• using an indexed key that is immutable (for example, the _id)
• the data is divided into chunks
• when a chunk grows beyond the configured limit, it is split
• In the background
• MongoDB runs a chunk migration process
• to achieve load balancing
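Chunk splitting can be sketched with a toy model of range-based chunks. The limit of four keys per chunk, the function names, and the median split rule are assumptions for illustration; the sketch assumes unique keys and does not model migration between shards.

```javascript
const MAX_KEYS_PER_CHUNK = 4; // hypothetical limit; real limits are size-based

// Each chunk covers the key range [min, max) and holds the keys in that range.
let chunks = [{ min: -Infinity, max: Infinity, keys: [] }];

function insertKey(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  chunk.keys.push(key);
  chunk.keys.sort((a, b) => a - b);
  if (chunk.keys.length > MAX_KEYS_PER_CHUNK) splitChunk(chunk);
}

// Split at the median key so both halves hold about the same number of keys.
function splitChunk(chunk) {
  const mid = Math.floor(chunk.keys.length / 2);
  const splitPoint = chunk.keys[mid];
  const left = { min: chunk.min, max: splitPoint, keys: chunk.keys.slice(0, mid) };
  const right = { min: splitPoint, max: chunk.max, keys: chunk.keys.slice(mid) };
  chunks.splice(chunks.indexOf(chunk), 1, left, right);
}

[10, 20, 30, 40, 50].forEach(insertKey);
console.log(chunks.map((c) => c.keys));
// prints [ [ 10, 20 ], [ 30, 40, 50 ] ]
```

The fifth insert pushes the single chunk over the limit, so it is split at key 30 into two chunks with adjacent ranges; a balancer could then move one of them to another shard.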
Summary
• MongoDB is a JSON document database
• Master/slave replication approach
• Query functionality
• CRUD operations
• Create, Read, Update, Delete
• MapReduce
• index structures