MongoDB Essentials Training
MongoDB, Inc.
Contents
1 Introduction
1.1 Warm Up
1.2 MongoDB - The Company
1.3 MongoDB Overview
1.4 MongoDB Stores Documents
1.5 MongoDB Data Types
1.6 Lab: Installing and Configuring MongoDB
2 CRUD
2.1 Creating and Deleting Documents
2.2 Reading Documents
2.3 Query Operators
2.4 Lab: Finding Documents
2.5 Updating Documents
2.6 Lab: Updating Documents
3 Indexes
3.1 Index Fundamentals
3.2 Lab: Basic Indexes
3.3 Compound Indexes
3.4 Lab: Optimizing an Index
3.5 Multikey Indexes
3.6 Hashed Indexes
3.7 Geospatial Indexes
3.8 Using Compass with Indexes
3.9 TTL Indexes
3.10 Text Indexes
3.11 Partial Indexes
3.12 Lab: Finding and Addressing Slow Operations
3.13 Lab: Using explain()
4 Storage
4.1 Introduction to Storage Engines
5 Drivers
5.1 Introduction to MongoDB Drivers
5.2 Lab: Driver Tutorial (Optional)
6 Aggregation
6.1 Intro to Aggregation
6.2 Aggregation - Utility
6.3 Aggregation - $lookup and $graphLookup
6.4 Lab: Using $graphLookup
6.5 Aggregation - Unwind
6.6 Lab: Aggregation Framework
6.7 Aggregation - $facet, $bucket, and $bucketAuto
6.8 Aggregation - Recap
9 Sharding
9.1 Introduction to Sharding
9.2 Balancing Shards
9.3 Shard Zones
9.4 Lab: Setting Up a Sharded Cluster
11 Security
11.1 Security Introduction
11.2 Authorization
11.3 Lab: Administration Users
11.4 Lab: Create User-Defined Role (Optional)
11.5 Authentication
11.6 Lab: Secure mongod
11.7 Auditing
11.8 Encryption
11.9 Log Redaction
11.10 Lab: Secured Replica Set - KeyFile (Optional)
11.11 Lab: LDAP Authentication & Authorization (Optional)
11.12 Lab: Security Workshop
12 Views
12.1 Views Tutorial
12.2 Lab: Vertical Views
12.3 Lab: Horizontal Views
12.4 Lab: Reshaped Views
1 Introduction
1.1 Warm Up
Introductions
• Who am I?
• My role at MongoDB
• My background and prior experience
MongoDB Experience
1.2 MongoDB - The Company
10gen
Origin of MongoDB
Learning Objectives
MongoDB is a Document Database
{
"_id" : "/apple-reports-second-quarter-revenue",
"headline" : "Apple Reported Second Quarter Revenue Today",
"date" : ISODate("2015-03-24T22:35:21.908Z"),
"author" : {
"name" : "Bob Walker",
"title" : "Lead Business Editor"
},
"copy" : "Apple beat Wall St expectations by reporting ...",
"tags" : [
"AAPL", "Earnings", "Cupertino"
],
"comments" : [
{ "name" : "Frank", "comment" : "Great Story" },
{ "name" : "Wendy", "comment" : "When can I buy an Apple Watch?" }
]
}
Vertical Scaling
[Figure: vertical scaling - growing a single server by adding CPU, RAM, and I/O capacity.]
Scaling with MongoDB
[Figure: horizontal scaling - a 1 TB collection (Collection1) partitioned across multiple shards.]
Database Landscape
[Figure: the database landscape plotted by scalability & performance versus depth of functionality - Memcached scores high on performance but low on functionality, an RDBMS the reverse, and MongoDB aims to provide both.]
MongoDB Deployment Models
Learning Objectives
JSON
{
"firstname" : "Thomas",
"lastname" : "Smith",
"age" : 29
}
{
"headline" : "Apple Reported Second Quarter Revenue Today",
"date" : ISODate("2015-03-24T22:35:21.908Z"),
"views" : 1234,
"author" : {
"name" : "Bob Walker",
"title" : "Lead Business Editor"
},
"tags" : [
"AAPL",
23,
{ "name" : "city", "value" : "Cupertino" },
{ "name" : "stockPrice", "value": NumberDecimal("143.51")},
[ "Electronics", "Computers" ]
]
}
1 http://json.org/
BSON
// JSON
{ "hello" : "world" }
// BSON
x16 x0 x0 x0 // document size
x2 // type 2=string
h e l l o x0 // name of the field, null terminated
x6 x0 x0 x0 // size of the string value
w o r l d x0 // string value, null terminated
x0 // end of document
// JSON
{ "BSON" : [ "awesome", 5.05, 1986 ] }
// BSON
x31 x0 x0 x0 // document size
x4 // type=4, array
B S O N x0 // name of first element
x26 x0 x0 x0 // size of the array, in bytes
x2 // type=2, string
x30 x0 // element name '0'
x8 x0 x0 x0 // size of value for array element 0
a w e s o m e x0 // string value for element 0
x1 // type=1, double
x31 x0 // element name '1'
x33 x33 x33 x33 x33 x33 x14 x40 // double value for array element 1
x10 // type=16, int32
x32 x0 // element name '2'
xc2 x7 x0 x0 // int32 value for array element 2
x0 // end of the embedded array document
x0 // end of the outer document
2 http://bsonspec.org/#/specification
Documents, Collections, and Databases
Learning Objectives
What is BSON?
BSON is a binary serialization of JSON, used to store documents and make remote procedure calls in MongoDB. For more in-depth coverage of BSON, refer to the specification at bsonspec.org.
Note: All official MongoDB drivers map BSON to native types and data structures.
BSON types
MongoDB supports a wide range of BSON types. Each data type has a corresponding number and string alias that can
be used with the $type operator to query documents by BSON type.
Type          Number   Alias
Double        1        "double"
String        2        "string"
Object        3        "object"
Array         4        "array"
Binary data   5        "binData"
ObjectId      7        "objectId"
Boolean       8        "bool"
Date          9        "date"
Null          10       "null"
ObjectId
> ObjectId()
ObjectId("58dc309ce3f39998099d6275")
Timestamps
BSON has a special timestamp type for internal MongoDB use; it is not associated with the regular Date type.
Date
BSON Date is a 64-bit integer that represents the number of milliseconds since the Unix epoch (Jan 1, 1970). This
results in a representable date range of about 290 million years into the past and future.
• Official BSON spec refers to the BSON Date type as UTC datetime
• Signed data type. Negative values represent dates before 1970.
Decimal
For specific information about how your preferred driver supports decimal128, see the driver documentation4.
In the Mongo shell, we use the NumberDecimal() constructor.
• Can be created with a string argument or a double
• Stored in the database as NumberDecimal(“999.4999”)
> NumberDecimal("999.4999")
NumberDecimal("999.4999")
> NumberDecimal(999.4999)
NumberDecimal("999.4999")
4 https://docs.mongodb.com/ecosystem/drivers/
Decimal Considerations
• If upgrading an existing database to use decimal128, it is recommended that a new field be added to reflect the new type. The old field may be deleted after verifying consistency.
• If any fields contain decimal128 data, they will not be compatible with previous versions of MongoDB. There
is no support for downgrading datafiles containing decimals
• decimal types are not strictly equal to their double representations, so use the NumberDecimal constructor in
queries.
Learning Objectives
Production Releases
64-bit production releases of MongoDB are available for the following platforms.
• Windows
• OS X
• Linux
• Solaris
Installing MongoDB
• Visit https://docs.mongodb.com/manual/installation/.
• Please install the Enterprise version of MongoDB.
• Click on the appropriate link, such as “Install on Windows” or “Install on OS X” and follow the instructions.
• Versions:
– Releases with an even second number are production releases, e.g., 2.4.x, 2.6.x.
– Releases with an odd second number are development releases, e.g., 2.5.x, 2.7.x.
Linux Setup
PATH=$PATH:<path to mongodb>/bin
Install on Windows
C:\Program Files\MongoDB\Server\<VERSION>\bin
md \data\db
Launch a mongod
<path to mongodb>/bin/mongod
Specify an alternate path for data files using the --dbpath option. (Make sure the directory already exists.) E.g.,
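# path shown is illustrative; the directory must already exist
mongod --dbpath /data/db2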
The MMAPv1 Data Directory
ls /data/db
unzip usb_drive.zip
cd usb_drive
cd dump
Note: If there is an error importing data directly from a USB drive, please copy the sampledata.zip file to your local
computer first.
Launch a Mongo Shell
Open another command shell. Then type the following to start the Mongo shell.
mongo
help
Explore Databases
show dbs
use <database_name>
db
Exploring Collections
show collections
db.<COLLECTION>.help()
db.<COLLECTION>.find()
Admin Commands
db.adminCommand( { shutdown : 1 } )
2 CRUD
Creating and Deleting Documents: Inserting documents into collections, deleting documents, and dropping collections
Reading Documents: The find() command, query documents, dot notation, and cursors
Query Operators: MongoDB query operators, including comparison, logical, element, and array operators
Lab: Finding Documents: Exercises for querying documents in MongoDB
Updating Documents: Using update methods and associated operators to mutate existing documents
Lab: Updating Documents: Exercises for updating documents in MongoDB
Learning Objectives
// For example
db.people.insertOne( { "name" : "Mongo" } )
Example: Inserting a Document
use sample
db.movies.find()
// succeeds
db.movies.insertOne( { "_id" : "Star Wars" } )
// malformed document
db.movies.insertOne( { "Star Wars" } )
insertMany()
Ordered insertMany()
• For ordered inserts, MongoDB stops processing upon encountering an error, meaning that only the inserts occurring before the error will complete.
• The default setting for db.<COLLECTION>.insertMany is an ordered insert.
• See the next exercise for an example.
Unordered insertMany()
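A minimal sketch of unordered insert behavior (data illustrative): with ordered : false, MongoDB attempts every document, so an error on one insert does not prevent later documents from being inserted.

db.movies.insertMany(
    [ { "_id" : "Batman" }, { "_id" : "Batman" }, { "_id" : "Jaws" } ],
    { ordered : false } )
// the duplicate "_id" fails, but "Jaws" is still inserted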
The Shell is a JavaScript Interpreter
db.stuff.find()
Deleting Documents
Using deleteOne()
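A minimal sketch (filter illustrative): deleteOne() removes at most one matching document.

db.movies.deleteOne( { "title" : "Batman" } )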
Using deleteMany()
Experiment with removing documents. Do a find() after each deleteMany() command below.
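For example (filters illustrative):

db.movies.deleteMany( { "imdb_rating" : { $lt : 7 } } )
db.movies.find()
// an empty filter matches, and removes, every document
db.movies.deleteMany( { } )
db.movies.find()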
Dropping a Collection
Example: Dropping a Collection
db.colToBeDropped.insertOne( { a : 1 } )
show collections // Shows the colToBeDropped collection
db.colToBeDropped.drop()
show collections // collection is gone
Dropping a Database
use tempDB
db.testcol1.insertOne( { a : 1 } )
db.testcol2.insertOne( { a : 1 } )
db.dropDatabase()
show collections // No collections
show dbs // The db is gone
Learning Objectives
The find() Method
Query by Example
• To query MongoDB, specify a document containing the key / value pairs you want to match
• You need only specify values for fields you care about.
• Other fields will not be used to exclude documents.
• The result set will include all documents in a collection that match.
db.movies.drop()
db.movies.insertMany( [
{ "title" : "Jaws", "year" : 1975, "imdb_rating" : 8.1 },
{ "title" : "Batman", "year" : 1989, "imdb_rating" : 7.6 }
] )
db.movies.find()
// Multiple Batman movies from different years, find the correct one
db.movies.find( { "year" : 1989, "title" : "Batman" } )
Querying Arrays
Example: Querying Arrays
db.movies.drop()
db.movies.insertMany(
[{ "title" : "Batman", "category" : [ "action", "adventure" ] },
{ "title" : "Godzilla", "category" : [ "action", "adventure", "sci-fi" ] },
{ "title" : "Home Alone", "category" : [ "family", "comedy" ] }
])
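For example (a sketch): querying an array field with a scalar matches documents where any element equals that value.

// matches both Batman and Godzilla
db.movies.find( { "category" : "action" } )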
"field1.field2" : value
• Put quotes around the field name when using dot notation.
db.movies.insertMany(
[ {
"title" : "Avatar",
"box_office" : { "gross" : 760,
"budget" : 237,
"opening_weekend" : 77
}
},
{
"title" : "E.T.",
"box_office" : { "gross" : 349,
"budget" : 10.5,
"opening_weekend" : 14
}
}
] )
// dot notation
db.movies.find( { "box_office.gross" : 760 } ) // expected value
Example: Arrays and Dot Notation
db.movies.insertMany( [
{ "title" : "E.T.",
"filming_locations" :
[ { "city" : "Culver City", "state" : "CA", "country" : "USA" },
{ "city" : "Los Angeles", "state" : "CA", "country" : "USA" },
{ "city" : "Cresecent City", "state" : "CA", "country" : "USA" }
] },
{ "title": "Star Wars",
"filming_locations" :
[ { "city" : "Ajim", "state" : "Jerba", "country" : "Tunisia" },
{ "city" : "Yuma", "state" : "AZ", "country" : "USA" }
] } ] )
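For example (a sketch), dot notation reaches into the documents inside the array:

// matches Star Wars via its second filming location
db.movies.find( { "filming_locations.city" : "Yuma" } )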
Projections
• You may choose to have only certain fields appear in result documents.
• This is called projection.
• You specify a projection by passing a second parameter to find().
db.movies.insertOne(
{
"title" : "Forrest Gump",
"category" : [ "drama", "romance" ],
"imdb_rating" : 8.8,
"filming_locations" : [
{ "city" : "Savannah", "state" : "GA", "country" : "USA" },
{ "city" : "Monument Valley", "state" : "UT", "country" : "USA" },
{ "city" : "Los Anegeles", "state" : "CA", "country" : "USA" }
],
"box_office" : {
"gross" : 557,
"opening_weekend" : 24,
"budget" : 55
}
})
Projection: Example
Projection Documents
Example: Projections
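A minimal sketch (fields illustrative): 1 includes a field, 0 excludes it; _id is included unless explicitly excluded.

db.movies.find(
    { "title" : "Forrest Gump" },
    { "_id" : 0, "title" : 1, "imdb_rating" : 1 } )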
Cursors
Example: Introducing Cursors
db.testcol.drop()
for (i=1; i<=10000; i++) {
db.testcol.insertOne( { a : Math.floor( Math.random() * 100 + 1 ),
b : Math.floor( Math.random() * 100 + 1 ) } )
}
db.testcol.find()
it // type "it" to display the next batch of results
it
Cursor Methods
db.testcol.drop()
for (i=1; i<=100; i++) { db.testcol.insertOne( { a : i } ) }
// all 100
db.testcol.count()
// just 41 docs
db.testcol.count( { a : { $lt : 42 } } )
Example: Using sort()
db.testcol.drop()
for (i=1; i<=20; i++) {
db.testcol.insertOne( { a : Math.floor( Math.random() * 10 + 1 ),
b : Math.floor( Math.random() * 10 + 1 ) } )
}
db.testcol.find()
// sort by b, then a
db.testcol.find().sort( { b : 1, a : 1 } )
Example: Using distinct()
db.movie_reviews.drop()
db.movie_reviews.insertMany( [
{ "title" : "Jaws", "rating" : 5 },
{ "title" : "Home Alone", "rating" : 1 },
{ "title" : "Jaws", "rating" : 7 },
{ "title" : "Jaws", "rating" : 4 },
{ "title" : "Jaws", "rating" : 8 } ] )
db.movie_reviews.distinct( "title" )
Learning Objectives
Upon completing this module students should understand the following types of MongoDB query operators:
• Comparison operators
• Logical operators
• Element query operators
• Operators on arrays
Example (Setup)
db.movies.drop()
db.movies.insertMany( [
    { "title" : "Batman", "category" : [ "action", "adventure" ], "imdb_rating" : 7.6 },
    { "title" : "Godzilla", "category" : [ "action", "adventure", "sci-fi" ], "imdb_rating" : 6.6 },
    { "title" : "Home Alone", "category" : [ "family", "comedy" ], "imdb_rating" : 7.4 }
] )
db.movies.find()
db.movies.find( { $or : [
{ "category" : "sci-fi" }, { "imdb_rating" : { $gte : 7 } }
] } )
// more complex $or: really good sci-fi movie or mediocre family movie
db.movies.find( { $or : [
{ "category" : "sci-fi", "imdb_rating" : { $gte : 8 } },
{ "category" : "family", "imdb_rating" : { $gte : 7 } }
] } )
Example: Element Query Operators
// type 1 is Double
db.movies.find( { "budget" : { $type : 1 } } )
Example: Array Operators
Example: $elemMatch
db.movies.insertOne( {
"title" : "Raiders of the Lost Ark",
"filming_locations" : [
{ "city" : "Los Angeles", "state" : "CA", "country" : "USA" },
{ "city" : "Rome", "state" : "Lazio", "country" : "Italy" },
{ "city" : "Florence", "state" : "SC", "country" : "USA" }
] } )
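For example (a sketch): $elemMatch requires a single array element to satisfy every condition.

// matches only if one location document is both in state CA and in the USA
db.movies.find( { "filming_locations" :
    { $elemMatch : { "state" : "CA", "country" : "USA" } } } )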
2.4 Lab: Finding Documents
In the sample database, how many documents in the grades collection have a student_id less than 65?
In the sample database, how many documents in the inspections collection have result “Pass” or “Fail”?
In the stories collection, write a query to find all stories where the view count is greater than 1000.
Find the news article that has the most comments in the stories collection
Find all digg stories where the topic name is “Television” or the media type is “videos”. Skip the first 5 results and
limit the result set to 10.
Query for all digg stories whose media type is either “news” or “images” and where the topic name is “Comedy”. (For
extra practice, construct two queries using different sets of operators to do this.)
Learning Objectives
The replaceOne() Method
Example: replaceOne()
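A minimal sketch (data illustrative): the replacement document overwrites every field of the matched document except _id.

db.movies.replaceOne(
    { "title" : "Batman" },
    { "title" : "Batman", "imdb_rating" : 7.7 } )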
The updateOne() Method
Example (Setup)
db.movies.insertMany( [
{
"title" : "Batman",
"category" : [ "action", "adventure" ],
"imdb_rating" : 7.6,
"budget" : 35
},
{
"title" : "Godzilla",
"category" : [ "action",
"adventure", "sci-fi" ],
"imdb_rating" : 6.6
},
{
"title" : "Home Alone",
"category" : [ "family", "comedy" ],
"imdb_rating" : 7.4
}
] )
Example: $set and $unset
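A minimal sketch (fields illustrative): $set creates or updates a field, $unset removes it.

db.movies.updateOne( { "title" : "Batman" },
    { $set : { "imdb_rating" : 7.7 } } )
db.movies.updateOne( { "title" : "Batman" },
    { $unset : { "budget" : "" } } )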
Update Operators
The updateMany() Method
Warning: Without an appropriate index, you may scan every document in the collection.
Example: updateMany()
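A minimal sketch (filter and field illustrative):

// flags every movie rated 7 or higher
db.movies.updateMany( { "imdb_rating" : { $gte : 7 } },
    { $set : { "reviewed" : true } } )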
Array Operators
db.movies.updateOne(
{ "title" : "Batman" },
{ $push : { "category" : "superhero" } } )
// $pushAll is deprecated; $push with the $each modifier is the equivalent
db.movies.updateOne(
{ "title" : "Batman" },
{ $push : { "category" : { $each : [ "villain", "comic-based" ] } } } )
db.movies.updateOne(
{ "title" : "Batman" },
{ $pop : { "category" : 1 } } )
db.movies.updateOne(
{ "title" : "Batman" },
{ $pull : { "category" : "action" } } )
db.movies.updateOne(
{ "title" : "Batman" },
{ $pullAll : { "category" : [ "villain", "comic-based" ] } } )
db.<COLLECTION>.updateOne(
{ <array> : value ... },
{ <update operator> : { "<array>.$" : value } }
)
6 http://docs.mongodb.org/manual/reference/operator/update/positional
Example: The Positional $ Operator
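A minimal sketch (values illustrative): $ refers to the first array element matched by the query document.

db.movies.updateOne(
    { "category" : "adventure" },
    { $set : { "category.$" : "action-adventure" } } )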
Upserts
Upsert Mechanics
Example: Upserts
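A minimal sketch (data illustrative): with upsert : true, a new document is inserted when no document matches the filter.

db.movies.updateOne(
    { "title" : "Jaws" },
    { $set : { "imdb_rating" : 8.1 } },
    { upsert : true } )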
save()
db.<COLLECTION>.save( <document> )
Example: save()
• If the document in the argument does not contain an _id field, then the save() method acts like the insertOne() method.
– An ObjectId will be assigned to the _id field.
• If the document in the argument contains an _id field, then the save() method is equivalent to a replaceOne() with the query argument on _id and the upsert option set to true.
// insert
db.movies.save( { "title" : "Beverly Hills Cops", "imdb_rating" : 7.3 })
// update via save(): fetch the document, modify it, save it back
db.movies.drop()
db.movies.insertOne( { "title" : "Jaws", "imdb_rating" : 7.3 } )
doc = db.movies.findOne( { "title" : "Jaws" } )
doc.imdb_rating = 7.4
db.movies.save( doc )
findOneAndUpdate() and findOneAndReplace()
Example: findOneAndUpdate()
db.worker_queue.findOneAndUpdate(
{ state : "unprocessed" },
{ $set: { "worker_id" : 123, "state" : "processing" } },
{ upsert: true } )
findOneAndDelete()
db.foo.drop();
db.foo.insertMany( [ { a : 1 }, { a : 2 }, { a : 3 } ] );
db.foo.find(); // shows the documents.
db.foo.findOneAndDelete( { a : { $lte : 3 } } );
db.foo.find();
2.6 Lab: Updating Documents
In the sample.inspections namespace, let’s imagine that we want to do a little data cleaning. We’ve decided to eliminate
the “Completed” inspection result and use only “No Violation Issued” for such inspection cases. Please update all
inspections accordingly.
db.product_metrics.insertOne(
{ name: "backpack",
purchasesPast7Days: [ 0, 0, 0, 0, 0, 0, 0] })
Each 0 within the "purchasesPast7Days" field corresponds to a day of the week (the first element is Monday, the second element is Tuesday, etc.).
Write an update statement to increment the number of backpacks sold on Friday by 200.
3 Indexes
Learning Objectives
Why Indexes?
Index on x
[Figure: a B-tree index on x. Internal nodes (9, 25; 3, 7, 8; 17; 27, 35) direct the search to ordered leaf entries (1, 2, 4, 5, 6, 8.5, 16, 26, 28, 33, 39, 55), each of which points to a document such as { x : 8.5, ... }.]
Types of Indexes
• Single-field indexes
• Compound indexes
• Multikey indexes
• Geospatial indexes
• Text indexes
Let’s explore what MongoDB does for the following query by using explain().
We are projecting only user.name so that the results are easy to read.
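A query consistent with the explain() output below (projecting only user.name, as noted above):

db.tweets.find(
    { "user.followers_count" : 1000 },
    { "user.name" : 1 }
).explain()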
Results of explain()
With the default explain() verbosity, you will see results similar to the following:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "twitter.tweets",
"indexFilterSet" : false,
"parsedQuery" : {
"user.followers_count" : {
"$eq" : 1000
}
},
"winningPlan" : {
"stage" : "COLLSCAN",
"filter" : {
"user.followers_count" : {
"$eq" : 1000
}
},
"direction" : "forward"
},
"rejectedPlans" : [ ]
},
...
}
explain() Verbosity Can Be Adjusted
• default: determines the winning query plan but does not execute query
• executionStats: executes query and gathers statistics
• allPlansExecution: runs all candidate plans to completion and gathers statistics
explain("executionStats")
...
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 8,
"executionTimeMillis" : 107,
"totalKeysExamined" : 0,
"totalDocsExamined" : 51428,
"executionStages" : {
"stage" : "COLLSCAN",
"filter" : {
"user.followers_count" : {
"$eq" : 1000
}
},
explain("executionStats") - Continued
"nReturned" : 8,
"executionTimeMillisEstimate" : 100,
"works" : 51430,
"advanced" : 8,
"needTime" : 51421,
"needFetch" : 0,
"saveState" : 401,
"restoreState" : 401,
"isEOF" : 1,
"invalidates" : 0,
"direction" : "forward",
"docsExamined" : 51428
}
...
}
explain("executionStats") Output
Other Operations
In addition to find(), we often want to use explain() to understand how other operations will be handled.
• aggregate()
• count()
• group()
• update()
• remove()
• findAndModify()
• insert()
db.<COLLECTION>.explain().find( <query> )
is equivalent to
db.<COLLECTION>.find( <query> ).explain()
and, since the default verbosity is "queryPlanner", also equivalent to
db.<COLLECTION>.find( <query> ).explain("queryPlanner")
Using explain() for Write Operations
Simulate the number of writes that would have occurred and determine the index(es) used:
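For example (a sketch): the explain() helper dry-runs write operations, so nothing is actually modified.

db.tweets.explain("executionStats").update(
    { "user.followers_count" : 1000 },
    { $set : { "flagged" : true } },
    { multi : true } )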
Single-Field Indexes
Creating an Index
db.tweets.createIndex( { "user.followers_count" : 1 } )
db.tweets.find( { "user.followers_count" : 1000 } ).explain()
explain() indicates that there will be a substantial performance improvement in handling this type of query.
Listing Indexes
db.tweets.getIndexes()
db.tweets.getIndexKeys()
• Indexes improve read performance for queries that are supported by the index.
• Inserts will be slower when there are indexes that MongoDB must also update.
• The speed of updates may be improved because MongoDB will not need to do a collection scan to find target
documents.
• An index is modified any time a document:
– Is inserted (applies to all indexes)
– Is deleted (applies to all indexes)
– Is updated in such a way that its indexed field changes
Index Options
• Sparse
• Unique
• Background
• Sparse indexes only contain entries for documents that have the indexed field.
db.<COLLECTION>.createIndex(
{ field_name : 1 },
{ sparse : true } )
db.<COLLECTION>.createIndex(
{ field_name : 1 },
{ unique : true } )
Building Indexes in the Background
db.<COLLECTION>.createIndex(
{ field_name : 1 },
{ background : true } )
• Begin by importing the routes collection from the USB drive into a running mongod process.
• You should import 66,985 documents.
# if no mongod running
mkdir -p data/db
mongod --port 30000 --dbpath data/db --logpath data/mongod.log --logappend --fork
# end if no mongod running
mongoimport --drop -d airlines -c routes routes.json
Executing a Query
• With the documents inserted, perform the following two queries, finding all routes for Delta
db.routes.find({"airline.id": 2009})
db.routes.find({"airline.id": 2009}).explain("executionStats")
Creating an Index
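A sketch of the step implied here: index the queried field, then re-run the explain to compare.

db.routes.createIndex( { "airline.id" : 1 } )
db.routes.find( { "airline.id" : 2009 } ).explain("executionStats")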
3.3 Compound Indexes
Learning Objectives
• Sorting, e.g.,
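A sketch of such a sort, assuming a compound index on { a : 1, b : 1 }:

db.testcol.find().sort( { a : 1, b : 1 } )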
Designing Compound Indexes
Requirements:
• Find all messages in a specified timestamp range.
• Select for whether the messages are anonymous or not.
• Sort by rating from highest to lowest.
Now let’s query for messages with timestamp in the range 2 through 4 inclusive.
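A sketch of the setup implied by the analysis below: a single-field index named myindex on timestamp, plus the range query.

db.messages.createIndex( { timestamp : 1 }, { name : "myindex" } )
db.messages.find( { timestamp : { $gte : 2, $lte : 4 } }
).explain("executionStats")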
Analysis:
• Explain plan shows good performance, i.e. totalKeysExamined = n.
• However, this does not satisfy our query.
• Need to query again with {username: "anonymous"} as part of the query.
Query Adding username
totalKeysExamined > n.
db.messages.dropIndex( "myindex" );
db.messages.createIndex( { timestamp : 1, username : 1 },
{ name : "myindex" } )
db.messages.find( { timestamp : { $gte : 2, $lte : 4 },
username : "anonymous" } ).explain("executionStats")
totalKeysExamined > n
timestamp   username
1           "anonymous"
2           "anonymous"
3           "sam"
4           "anonymous"
5           "martha"
db.messages.dropIndex( "myindex" );
db.messages.createIndex( { username : 1 , timestamp : 1 },
{ name : "myindex" } )
totalKeysExamined is 2. n is 2.
totalKeysExamined == n
username      timestamp
"anonymous"   1
"anonymous"   2
"anonymous"   4
"martha"      5
"sam"         2
db.messages.find( {
timestamp : { $gte : 2, $lte : 4 },
username : "anonymous"
} ).sort( { rating : -1 } ).explain("executionStats");
In-Memory Sorts
Let’s modify the index again to allow the database to sort for us.
db.messages.dropIndex( "myindex" );
db.messages.createIndex( { username : 1 , timestamp : 1, rating : 1 },
{ name : "myindex" } );
db.messages.find( {
timestamp : { $gte : 2, $lte : 4 },
username : "anonymous"
} ).sort( { rating : -1 } ).explain("executionStats");
• The explain plan remains unchanged, because the sort field comes after the range fields.
• The index does not store entries in order by rating.
• Note that this requires us to consider a tradeoff.
Avoiding an In-Memory Sort
db.messages.dropIndex( "myindex" );
db.messages.createIndex( { username : 1, rating : 1, timestamp : 1 },
{ name : "myindex" } );
db.messages.find( {
timestamp : { $gte : 2, $lte : 4 },
username : "anonymous"
} ).sort( { rating : -1 } ).explain("executionStats");
Covered Queries
• When a query and projection include only the indexed fields, MongoDB will return results directly from the
index.
• There is no need to scan any documents or bring documents into memory.
• These covered queries can be very efficient.
db.testcol.drop()
for (i=1; i<=20; i++) {
db.testcol.insertOne({ "_id" : i, "title" : i, "name" : i,
"rating" : i, "budget" : i })
};
db.testcol.createIndex( { "title" : 1, "name" : 1, "rating" : 1 } )
// Covered query!
db.testcol.find( { "title" : 3 },
{ "_id" : 0, "title" : 1, "name" : 1, "rating" : 1 }
).explain("executionStats")
performance.init()
The method above will build a sample data set in the “sensor_readings” collection. What index is needed for this
query?
Exercise: Avoiding an In-Memory Sort
What index is needed for the following query to avoid an in-memory sort?
db.sensor_readings.find(
{ x : { $in : [100, 200, 300, 400] } }
).sort( { tstamp : -1 })
Learning Objectives
Example: Array of Numbers
db.race_results.drop()
db.race_results.createIndex( { "lap_times" : 1 } )
a = [ { "lap_times" : [ 3, 5, 2, 8 ] },
{ "lap_times" : [ 1, 6, 4, 2 ] },
{ "lap_times" : [ 6, 3, 3, 8 ] } ]
db.race_results.insertMany( a )
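For example (a sketch): with a multikey index, a predicate can be satisfied by any element of the array.

// matches documents where some lap time is under 3
db.race_results.find( { "lap_times" : { $lt : 3 } } )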
db.blog.drop()
b = [ { "comments" : [
{ "name" : "Bob", "rating" : 1 },
{ "name" : "Frank", "rating" : 5.3 },
{ "name" : "Susan", "rating" : 3 } ] },
{ "comments" : [
{ "name" : "Megan", "rating" : 1 } ] },
{ "comments" : [
{ "name" : "Luke", "rating" : 1.4 },
{ "name" : "Matt", "rating" : 5 },
{ "name" : "Sue", "rating" : 7 } ] }]
db.blog.insertMany(b)
db.blog.createIndex( { "comments" : 1 } )
// vs
db.blog.createIndex( { "comments.rating" : 1 } )
Exercise: Array of Arrays, Part 1
Add some documents and create an index simulating a player in a game moving on an X,Y grid.
db.player.drop()
db.player.createIndex( { "last_moves" : 1 } )
c = [ { "last_moves" : [ [ 1, 2 ], [ 2, 3 ], [ 3, 4] ] },
{ "last_moves" : [ [ 3, 4 ], [ 4, 5 ] ] },
{ "last_moves" : [ [ 4, 5 ], [ 5, 6 ] ] },
{ "last_moves" : [ [ 3, 4 ] ] },
{ "last_moves" : [ [ 4, 5 ] ] } ]
db.player.insertMany(c)
db.player.find()
db.player.find( { "last_moves" : [ 3, 4 ] } )
db.player.find( { "last_moves" : 3 } )
db.player.find( { "last_moves.1" : [ 4, 5 ] } )
db.player.find( { "last_moves.2" : [ 2, 3 ] } )
Exercise: Multikey Indexes and Sorting
db.testcol.drop()
a = [ { x : [ 1, 11 ] }, { x : [ 2, 10 ] }, { x : [ 3 ] },
{ x : [ 4 ] }, { x : [ 5 ] } ]
db.testcol.insertMany(a)
db.testcol.createIndex( { x : 1 } )
• You cannot create a compound index using more than one array-valued field.
• This is because of the combinatorics.
• For a compound index on two array-valued fields you would end up with N * M entries for one document.
• You cannot have a hashed multikey index.
• You cannot have a shard key use a multikey index.
• We discuss shard keys in another module.
• The index on the _id field cannot become a multikey index.
db.testcol.drop()
db.testcol.createIndex( { x : 1, y : 1 } )
// no problems yet
db.testcol.insertOne( { _id : 1, x : 1, y : 1 } )
// still OK
db.testcol.insertOne( { _id : 2, x : [ 1, 2 ], y : 1 } )
// still OK
db.testcol.insertOne( { _id : 3, x : 1, y : [ 1, 2 ] } )
// Won't work
db.testcol.insertOne( { _id : 4, x : [ 1, 2 ], y : [ 1, 2 ] } )
3.6 Hashed Indexes
Learning Objectives
• Hashed indexes are based on field values like any other index.
• The difference is that the values are hashed and it is the hashed value that is indexed.
• The hashing function collapses sub-documents and computes the hash for the entire value.
• MongoDB can use the hashed index to support equality queries.
• Hashed indexes do not support multi-key indexes, i.e. indexes on array fields.
• Hashed indexes do not support range queries.
• In MongoDB, the primary use for hashed indexes is to support sharding a collection using a hashed shard key.
• In some cases, the field we would like to use to shard data would make it difficult to scale using sharding.
• Using a hashed shard key to shard a collection ensures an even distribution of data and overcomes this problem.
• See Shard a Collection Using a Hashed Shard Key7 for more details.
• We discuss sharding in detail in another module.
Limitations
• You may not create compound indexes that have hashed index fields.
• You may not specify a unique constraint on a hashed index.
• You can create both a hashed index and a non-hashed index on the same field.
7 http://docs.mongodb.org/manual/tutorial/shard-collection-with-a-hashed-shard-key/
Floating Point Numbers
• MongoDB hashed indexes truncate floating point numbers to 64-bit integers before hashing.
• Do not use a hashed index for floating point numbers that cannot be reliably converted to 64-bit integers.
• MongoDB hashed indexes do not support floating point values larger than 2^53.
Create a hashed index using an operation that resembles the following. This operation creates a hashed index for the
active collection on the a field.
db.active.createIndex( { a: "hashed" } )
Learning Objectives
Easiest to Start with 2 Dimensions
Location Field
• A geospatial index enables us to efficiently query a collection based on geometric relationships between documents and the query.
• For example, we can quickly locate all documents within a certain radius of our query location.
• In this example, we’ve illustrated a $near query in a 2d geospatial index.
Flat Geospatial Index
Creating a 2d Index
Creating a 2d index:
db.<COLLECTION>.createIndex(
{ field_name : "2d", <optional additional field> : <value> },
{ <optional options document> } )
Inserting Documents with a 2d Index
Write a query to find all documents in the testcol collection that have an xy field value that falls entirely within the
circle with center at [ -2.5, -0.5 ] and a radius of 3.
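One possible answer (a sketch, using the legacy $center shape supported by 2d indexes):

db.testcol.find( { "xy" : { $geoWithin :
    { $center : [ [ -2.5, -0.5 ], 3 ] } } } )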
Creating a 2dsphere Index
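A minimal sketch, following the same pattern as the 2d index (field name illustrative):

db.<COLLECTION>.createIndex( { "location" : "2dsphere" } )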
geoJSON Considerations
Polygons
Simple Polygon:
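A sketch of a simple GeoJSON polygon (coordinates illustrative; the first and last positions must be identical to close the ring):

var simplePolygon = {
    "type" : "Polygon",
    "coordinates" : [ [ [ 0, 0 ], [ 3, 6 ], [ 6, 1 ], [ 0, 0 ] ] ]
}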
Create a coordinate pair for each of the following airports. Create one variable per airport.
• LaGuardia (New York): 40.7772° N, 73.8726° W
• JFK (New York): 40.6397° N, 73.7789° W
• Newark (New York): 40.6925° N, 74.1686° W
• Heathrow (London): 51.4775° N, 0.4614° W
• Gatwick (London): 51.1481° N, 0.1903° W
• Stansted (London): 51.8850° N, 0.2350° E
• Luton (London): 51.9000° N, 0.4333° W
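For example, LaGuardia as a GeoJSON point (a sketch; GeoJSON lists longitude first, with west as negative):

var laguardia = { "type" : "Point",
                  "coordinates" : [ -73.8726, 40.7772 ] }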
Exercise: Inserting geoJSON Objects (2)
$near: Just like $geoNear, except in a few edge cases; check the docs.
$geoWithin: Only returns documents with a location completely contained within the query.
$geoIntersects: Returns documents with their indexed field intersecting any part of the shape in the query.
3.8 Using Compass with Indexes
Learning Objectives
Introduction
• Import the trips.json dataset into a database called citibike and a collection called trips
• Execute a geospatial query finding all trips that
– Begin within a 1.2 mile radius (1.93 kilometers) of the middle of Central Park:
* [ -73.97062540054321, 40.776398033956916]
– End within a 0.25 mile radius (.40 kilometers) of Madison Square Park:
* [-73.9879247077942, 40.742201076382784]
{
"start station location": { "$geoWithin": { "$centerSphere": [
[ -73.97062540054321, 40.776398033956916 ], 0.000302786 ] } },
"end station location": { "$geoWithin": { "$centerSphere": [
[ -73.9879247077942, 40.742201076382784 ], 0.00006308 ] } }
}
geoJSON Query Example
Query Explain (cont)
Creating an Index Example
Verifying the Index
{
"start station location": { "$geoWithin": { "$centerSphere": [
[ -73.97062540054321, 40.776398033956916 ], 0.000302786 ] } },
"end station location": { "$geoWithin": { "$centerSphere": [
[ -73.9879247077942, 40.742201076382784 ], 0.00006308 ] } }
}
Index Performance
Learning Objectives
TTL Index Basics
Create with:
db.<COLLECTION>.createIndex( { field_name : 1 },
{ expireAfterSeconds : some_number } )
Let's create a TTL index on the sessions collection that will delete documents older than 30 seconds. Write a script that will insert documents at a rate of one per second.
db.sessions.drop()
db.sessions.createIndex( { "last_user_action" : 1 },
{ "expireAfterSeconds" : 30 } )
i = 0
while (true) {
i += 1;
db.sessions.insertOne( { "last_user_action" : ISODate(), "b" : i } );
sleep(1000); // Sleep for 1 second
}
Then, leaving that window open, open up a new terminal and connect to the database with the mongo shell. This will
allow us to verify the TTL behavior.
3.10 Text Indexes
Learning Objectives
• A text index is based on the tokens (words, etc.) used in string fields.
• MongoDB supports text search for a number of languages.
• Text indexes drop language-specific stop words (e.g. in English “the”, “an”, “a”, “and”, etc.).
• Text indexes use simple, language-specific suffix stemming (e.g., “running” to “run”).
You create a text index a little bit differently than you create a standard index.
db.<COLLECTION>.createIndex( { field_name : "text" } )
Creating a Text Index with Weighted Fields
db.<COLLECTION>.createIndex(
{ "title" : "text", "keywords": "text", "author" : "text" },
{ "weights" : {
"title" : 10,
"keywords" : 5
}})
• Term match in “title” field has 10 times (i.e. 10:1) the impact as a term match in the “author” field.
• Continuing our example, you can treat the dialog field as a multikey index.
• A multikey index with each of the words in dialog as values.
• You can query the field using the $text operator.
db.montyPython.insertMany( [
{ _id : 1,
dialog : "What is the air-speed velocity of an unladen swallow?" },
{ _id : 2,
dialog : "What do you mean? An African or a European swallow?" },
{ _id : 3,
dialog : "Huh? I... I don’t know that." },
{ _id : 45,
dialog : "You’re using coconuts!" },
{ _id : 55,
dialog : "What? A swallow carrying a coconut?" } ] )
Exercise: Querying a Text Index
Using the text index, find all documents in the montyPython collection with the word “swallow” in it.
// Returns 3 documents.
db.montyPython.find( { $text : { $search : "swallow" } } )
• Find all documents in the montyPython collection with either the word ‘coconut’ or ‘swallow’.
• By default MongoDB ORs query terms together.
• E.g., if you query on two words, results include documents using either word.
db.<COLLECTION>.find(
    { $text : { $search : "swallow coconut" } },
    { textScore : { $meta : "textScore" } }
).sort(
    { textScore : { $meta : "textScore" } }
)
3.11 Partial Indexes
Learning Objectives
• Indexes with keys only for the documents in a collection that match a filter expression.
• Relative to standard indexes, benefits include:
– Lower storage requirements
* On disk
* In memory
– Reduced performance costs for index maintenance as writes occur
Example: Creating Partial Indexes (Continued)
db.integers.createIndex(
{ integer : 1 },
{ partialFilterExpression : { importance : "high" },
name : "high_importance_integers" } )
Filter Conditions
• As the value for partialFilterExpression, specify a document that defines the filter.
• The following types of expressions are supported.
• Use these in combinations that are appropriate for your use case.
• Your filter may stipulate conditions on multiple fields.
– equality expressions
– $exists: true expression
– $gt, $gte, $lt, $lte expressions
– $type expressions
– $and operator at the top-level only
• Both sparse indexes and partial indexes include only a subset of documents in a collection.
• Sparse indexes reference only documents for which at least one of the indexed fields exist.
• Partial indexes provide a richer way of specifying which documents to index than sparse indexes do.
db.integers.createIndex(
{ importance : 1 },
{ partialFilterExpression : { importance : { $exists : true } } }
) // similar to a sparse index
Quiz
Identifying Partial Indexes
> db.integers.getIndexes()
[
...,
{
"v" : 1,
"key" : {
"integer" : 1
},
"name" : "high_importance_integers",
"ns" : "test.integers",
"partialFilterExpression" : {
"importance" : "high"
}
},
...
]
Quiz
{
"v" : 1,
"key" : {
"score" : 1,
"student_id" : 1
},
"name" : "score_1_student_id_1",
"ns" : "test.scores",
"partialFilterExpression" : {
"score" : {
"$gte" : 0.65
},
"subject_name" : "history"
}
}
Quiz (Continued)
Set Up
• In this exercise, let's bring up a mongo shell and run the following:
performance.init()
• In a mongo shell run performance.b(). This will run in an infinite loop printing some output as it runs
various statements against the server.
• Now imagine we have detected a performance problem and suspect there is a slow operation running.
• Find the slow operation and terminate it. Every slow operation is assumed to run for 100ms or more.
• In order to do this, open a second window (or tab) and run a second instance of the mongo shell.
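A sketch of the usual approach in the second shell (the opid value comes from the currentOp output):

db.currentOp()     // inspect in-progress operations; look for high secs_running
db.killOp(<opid>)  // terminate the slow operation by its opid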
• What indexes can we introduce to make the slow queries more efficient? Disregard the index created in the
previous exercises.
Exercise: explain(“executionStats”)
mongo performance
> db.sensor_readings.dropIndexes()
db.sensor_readings.createIndex({ "active" : 1 } )
How many index entries and documents are examined for the following query? How many results are returned?
db.sensor_readings.find(
{ "active": false, "_id": { $gte: 99, $lte: 1000 } }
).explain("executionStats")
4 Storage
Learning Objectives
Storage Engine Journaling
With the release of MongoDB 3.2, three storage engine options are available:
• MMAPv1
• WiredTiger (default)
• In-memory storage (Enterprise only)
Use the --storageEngine parameter to specify which storage engine MongoDB should use. E.g.,
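# one possible invocation
mongod --storageEngine wiredTiger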
• MMAPv1 is MongoDB's original storage engine and was the default up to MongoDB 3.0.
• Specify the use of the MMAPv1 storage engine as follows:
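mongod --storageEngine mmapv1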
• MMAPv1 is based on memory-mapped files, which map data files on disk into virtual memory.
• As of MongoDB 3.0, MMAPv1 supports collection-level concurrency.
8 http://docs.mongodb.org/manual/reference/program/mongod/#storage-options
MMAPv1 Workloads
MMAPv1 excels at workloads where documents do not outgrow their original record size:
• High-volume inserts
• Read-only workloads
• In-place updates
• MongoDB 3.0 uses power-of-2 sized allocation as the default record allocation strategy for MMAPv1.
• With this strategy, records include the document plus extra space, or padding.
• Each record has a size in bytes that is a power of 2 (e.g. 32, 64, 128, ... 2MB).
• For documents larger than 2MB, allocation is rounded up to the nearest multiple of 2MB.
• This strategy enables MongoDB to efficiently reuse freed records to reduce fragmentation.
• In addition, the added padding gives a document room to grow without requiring a move.
– Saves the cost of moving a document
– Results in fewer updates to indexes
Compression in MongoDB
• Compression can significantly reduce the amount of disk space / memory required.
• The tradeoff is that compression requires more CPU.
• MMAPv1 does not support compression.
• WiredTiger does.
• The WiredTiger storage engine excels at all workloads, especially write-heavy and update-heavy workloads.
• Notable features of the WiredTiger storage engine that do not exist in the MMAPv1 storage engine include:
– Compression
– Document-level concurrency
• Default storage engine since MongoDB 3.2.
• For older versions, specify the use of the WiredTiger storage engine as follows.
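# for versions where WiredTiger is not yet the default
mongod --storageEngine wiredTiger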
WiredTiger Compression Options
• snappy (default): less CPU usage than zlib, less reduction in data size
• zlib: greater CPU usage than snappy, greater reduction in data size
• no compression
Use the wiredTigerCacheSizeGB parameter to designate the amount of RAM for the WiredTiger storage engine cache.
• By default, this value is set to the maximum of half of physical RAM or 1GB
• If the database server shares a machine with an application server, it is now easier to designate the amount of
RAM the database server can use
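For example (a sketch):

# cap the WiredTiger cache at 4 GB of RAM
mongod --storageEngine wiredTiger --wiredTigerCacheSizeGB 4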
• MMAPv1 uses write-ahead journaling to ensure consistency and durability between fsyncs.
• WiredTiger uses a write-ahead log in combination with checkpoints to ensure durability.
• Regardless of storage engine, always use journaling in production.
MMAPv1 Journaling Mechanics (Continued)
• Data is flushed from the shared view to data files every 60 seconds (configurable)
• The operating system may force a flush at a higher frequency than 60 seconds if the system is low on free
memory
• Once a journal file contains only flushed writes, it is no longer needed for recovery and can be deleted or re-used
• WiredTiger will commit a checkpoint to disk every 60 seconds or when there are 2 gigabytes of data to write.
• Between and during checkpoints the data files are always valid.
• The WiredTiger journal persists all data modifications between checkpoints.
• If MongoDB exits between checkpoints, it uses the journal to replay all data modified since the last checkpoint.
• By default, the WiredTiger journal is compressed using snappy.
Conclusion
5 Drivers
Learning Objectives
• C
• C++
• C#
• Java
• Node.js
• Perl
• PHP
• Python
• Ruby
• Scala
9 http://docs.mongodb.org/ecosystem/drivers/c
10 http://docs.mongodb.org/ecosystem/drivers/cpp
11 http://docs.mongodb.org/ecosystem/drivers/csharp
12 http://docs.mongodb.org/ecosystem/drivers/java
13 http://docs.mongodb.org/ecosystem/drivers/node-js
14 http://docs.mongodb.org/ecosystem/drivers/perl
15 http://docs.mongodb.org/ecosystem/drivers/php
16 http://docs.mongodb.org/ecosystem/drivers/python
17 http://docs.mongodb.org/ecosystem/drivers/ruby
18 http://docs.mongodb.org/ecosystem/drivers/scala
MongoDB Community Supported Drivers
Driver Specs
To ensure drivers have consistent functionality, there is a series of publicly available specification documents19 for:
• Authentication20
• CRUD operations21
• Index management22
• SDAM23
• Server Selection24
• Etc.
• Read preference
• Write concern
• Maximum operation time (maxTimeMS)
• Batch Size (batchSize)
• Exhaust cursor (exhaust)
• Etc.
• Connection timeout
• Connections per host
• Time that a thread will block waiting for a connection (maxWaitTime)
• Socket keep alive
• Sets the multiplier for number of threads allowed to block waiting for a connection
• Etc.
19 https://github.com/mongodb/specifications
20 https://github.com/mongodb/specifications/tree/master/source/auth
21 https://github.com/mongodb/specifications/tree/master/source/crud
22 https://github.com/mongodb/specifications/blob/master/source/index-management.rst
23 https://github.com/mongodb/specifications/tree/master/source/server-discovery-and-monitoring
24 https://github.com/mongodb/specifications/tree/master/source/server-selection
Insert a Document with the Java Driver
// Java
MongoDatabase db = mongoClient.getDatabase("test");
db.getCollection("blog").insertOne(myDocument);

# Python
client = MongoClient()
db = client['test']
db.blog.insert_one(myDocument)

// C++ (assuming myDocument is a bsoncxx document)
mongocxx::instance inst{};
mongocxx::client conn{};
auto db = conn["test"];
db["blog"].insert_one(myDocument.view());
5.2 Lab: Driver Tutorial (Optional)
Tutorial
25 http://api.mongodb.org/python/current/tutorial.html
6 Aggregation
Intro to Aggregation: An introduction to the aggregation framework, pipeline concept, and stages
Aggregation - Utility: Utility stages in the aggregation pipeline
Aggregation - $lookup and $graphLookup: $lookup and $graphLookup in the aggregation pipeline
Lab: Using $graphLookup: $graphLookup lab
Aggregation - Unwind: The $unwind stage in depth
Lab: Aggregation Framework: Aggregation labs
Aggregation - $facet, $bucket, and $bucketAuto: The $facet, $bucket, and $bucketAuto stages
Aggregation - Recap: Recap of the aggregation framework
Learning Objectives
Aggregation Basics
• Use the aggregation framework to transform and analyze data in MongoDB collections.
• For those who are used to SQL, aggregation covers the functionality of several SQL clauses such as GROUP BY, JOIN, and AS, along with several other operations for computing over datasets.
• The aggregation framework is based on the concept of a pipeline.
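A minimal pipeline sketch (collection and fields illustrative): documents flow through the stages in order, each stage transforming the stream.

db.tweets.aggregate( [
    { $match : { "user.followers_count" : { $gt : 100 } } },
    { $group : { _id : "$user.screen_name",
                 tweets : { $sum : 1 } } }
] )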
Aggregation Stages
The Project Stage
• $project allows you to shape the documents into what you need for the next stage.
– The simplest form of shaping is using $project to select only the fields you are interested in.
– $project can also create new fields from other fields in the input document.
* E.g., you can pull a value out of an embedded document and put it at the top level.
* E.g., you can create a ratio from the values of two fields and pass it along as a single field.
• $project produces 1 output document for every input document it sees.
A Twitter Dataset
• Let’s look at some examples that illustrate the MongoDB aggregation framework.
• These examples operate on a collection of tweets.
– As with any dataset of this type, it’s a snapshot in time.
– It may not reflect the structure of Twitter feeds as they look today.
{
"text" : "Something interesting ...",
"entities" : {
"user_mentions" : [
{
"screen_name" : "somebody_else",
...
}
],
"urls" : [ ],
"hashtags" : [ ]
},
"user" : {
"friends_count" : 544,
"screen_name" : "somebody",
"followers_count" : 100,
...
},
}
Analyzing Tweets
db.tweets.aggregate( [
{ $match: { "user.friends_count": { $gt: 0 },
"user.followers_count": { $gt: 0 } } },
{ $project: { ratio: {$divide: ["$user.followers_count",
"$user.friends_count"]},
screen_name : "$user.screen_name"} },
{ $sort: { ratio: -1 } },
{ $limit: 1 } ] )
• Of the users in the “Brasilia” timezone who have tweeted 100 times or more, who has the largest number of
followers?
• Time zone is found in the “time_zone” field of the user object in each tweet.
• The number of tweets for each user is found in the “statuses_count” field.
• A result document should look something like the following:
{ _id : ObjectId('52fd2490bac3fa1975477702'),
followers : 2597,
screen_name: ’marbles’,
tweets : 12334
}
The Group Stage
• For those coming from the relational world, $group is similar to the SQL GROUP BY statement.
• $group operations require that we specify which field to group on.
• Documents with the same identifier will be aggregated together.
• With $group, we aggregate values using accumulators.
Tweet Source
db.tweets.aggregate( [
{ "$group" : { "_id" : "$source",
"count" : { "$sum" : 1 } } },
{ "$sort" : { "count" : -1 } }
] )
Rank Users by Number of Tweets
For each user, aggregate all their tweets into a single array.
db.tweets.aggregate( [
{ "$group" : { "_id" : "$user.screen_name",
"tweet_texts" : { "$push" : "$text" },
"count" : { "$sum" : 1 } } },
{ "$sort" : { "count" : -1 } },
{ "$limit" : 3 }
] )
The Limit Stage
• Used to limit the number of documents passed to the next aggregation stage.
• Works like the limit() cursor method.
• Value is an integer.
• E.g., the following will only pass 3 documents to the stage that comes next in the pipeline.
– db.testcol.aggregate( [ { $limit: 3 }, ... ] )
The Sort Stage

// sorts by b ascending, then a descending
db.testcol.aggregate( [ { $sort : { b : 1, a : -1 } } ] )
The $count Stage
• Used to count the number of documents that this stage receives in input
• The following would count all documents in a users collection with a firstName field set to “Mary”
db.users.aggregate([
{ $match: { firstName: "Mary" } },
{ $count: "usersNamedMary" }
])
Example: $sample
db.companies.aggregate( [
{ $sample : { size : 5 } },
{ $project : { _id : 0, number_of_employees: 1 } }
] )
The $indexStats Stage
• Tells you how many times each index has been used since the server process began
• Must be the first stage of the pipeline
• Returns one document per index
• The accesses.ops field reports the number of times an index was used
Example: $indexStats
Issue each of the following commands in the mongo shell, one at a time.
db.companies.dropIndexes()
db.companies.createIndex( { number_of_employees : 1 } )
db.companies.aggregate( [ { $indexStats: {} } ] )
db.companies.find( { number_of_employees : { $gte : 100 } },
{ number_of_employees: 1 } ).next()
db.companies.find( { number_of_employees : { $gte : 100 } },
{ number_of_employees: 1 } ).next()
db.companies.aggregate( [ { $indexStats: {} } ] )
// $trunc example
db.companies.aggregate( [
{ $match : { number_of_employees: { $gte: 100, $lt: 1000 } } },
{ $group : { _id : null,
mean_employees: { $avg: "$number_of_employees" } } },
{ $project : { _id: 0,
truncated_mean_employees: { $trunc : "$mean_employees" } } }
] )
// $filter example
db.companies.aggregate( [
{ $match : { "funding_rounds.round_code": "e" } },
{ $project : {
_id: 0, name: 1,
series_e_funding: {
$filter: {
input: "$funding_rounds",
as: "series_e_funding",
cond: { $eq : [ "$$series_e_funding.round_code", "e" ] } } } }
}, {
$project : {
name: 1,
"series_e_funding.raised_amount": 1,
"series_e_funding.raised_currency_code": 1,
"series_e_funding.year": 1 }
} ] )
Example: $out
db.tweets.aggregate([
{ $group: {
_id: null,
users: { $push: {
user: "$$CURRENT.user.screen_name",
activity: "$$CURRENT.user.statuses_count"
}}
}},
{ $unwind: "$users" },
{ $project: { _id: 0, user: "$users.user", activity: "$users.activity" } },
{ $sort: { activity: -1 } },
{ $out: "usersByActivity"}
])
db.commentOnEmployees.insertMany( [
{ employeeCount: 405000,
comment: "Biggest company in the set." },
{ employeeCount: 405000,
comment: "So you get two comments." },
{ employeeCount: 100000,
comment: "This is a suspiciously round number." },
{ employeeCount: 99999,
comment: "This is a suspiciously accurate number." },
{ employeeCount: 99998,
comment: "This isn’t in the data set." }
] )
db.companies.aggregate( [
{ $match: { number_of_employees: { $in:
[ 405000, 388000, 100000, 99999, 99998 ] } } },
{ $project: { _id :0, name: 1, number_of_employees: 1 } },
{ $lookup: {
from: "commentOnEmployees",
localField: "number_of_employees",
foreignField: "employeeCount",
as: "example_comments"
} },
{ $sort : { number_of_employees: -1 } } ] )
The GraphLookup Stage
• Used to perform a recursive search on a collection, with options for restricting the search by recursion depth and
query filter.
• Has the following prototype form:
$graphLookup: {
from: <collection>,
startWith: <expression>,
connectFromField: <string>,
connectToField: <string>,
as: <string>,
maxDepth: <number>,
depthField: <string>,
restrictSearchWithMatch: <document>
}
$graphLookup Fields
• from: The collection to perform the recursive search on.
• startWith: Expression whose value starts the search; it is compared against the connectToField.
• connectFromField: Field whose value is followed from each traversed document.
• connectToField: Field matched against the connectFromField values.
• as: Name of the output array field that holds the traversed documents.
• maxDepth: Optional. Non-negative integral number specifying the maximum recursion depth.
• depthField: Optional. Name of the field to add to each traversed document in the search path. The value of this field is the recursion depth for the document.
• restrictSearchWithMatch: Optional. A document specifying additional conditions for the recursive search. The syntax is identical to query filter syntax.
Query Documentation28
28 https://docs.mongodb.com/manual/tutorial/query-documents/#read-operations-query-argument
$graphLookup Search Process
For each matching document, $graphLookup takes the value of the connectFromField and checks every document in the from collection for a matching connectToField value.
• For each match, $graphLookup adds the matching document in the from collection to an array field named
by the as parameter
• This step continues recursively until no more matching documents are found, or until it reaches the recursion
depth specified by maxDepth
$graphLookup Considerations
$graphLookup Example
use company;
db.employees.insertMany([
{ "_id" : 1, "name" : "Dev" },
{ "_id" : 2, "name" : "Eliot", "reportsTo" : "Dev" },
{ "_id" : 3, "name" : "Ron", "reportsTo" : "Eliot" },
{ "_id" : 4, "name" : "Andrew", "reportsTo" : "Eliot" },
{ "_id" : 5, "name" : "Asya", "reportsTo" : "Ron" },
{ "_id" : 6, "name" : "Dan", "reportsTo" : "Andrew" }
])
$graphLookup Example (continued)
db.employees.aggregate([
{
$match: { "name": "Dan" }
}, {
$graphLookup: {
from: "employees",
startWith: "$reportsTo",
connectFromField: "reportsTo",
connectToField: "name",
as: "reportingHierarchy"
}
}
]).pretty()
{
"_id" : 6,
"name" : "Dan",
"reportsTo" : "Andrew",
"reportingHierarchy" : [
{ "_id" : 1, "name" : "Dev" },
{ "_id" : 2, "name" : "Eliot", "reportsTo" : "Dev" },
{ "_id" : 4, "name" : "Andrew", "reportsTo" : "Eliot" }
]
}
For this exercise, incorporate the $graphLookup stage into an aggregation pipeline to find Delta Air Lines routes
from JFK to BOI. Find all routes that only have one layover.
• Start by importing the necessary dataset
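# illustrative; mirrors the earlier routes import
mongoimport --drop -d airlines -c routes routes.json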
6.5 Aggregation - Unwind
Learning Objectives
...
"entities" : {
"user_mentions" : [
{
"indices" : [
28,
44
],
"screen_name" : "LatinsUnitedGSX",
"name" : "Henry Ramirez",
"id" : 102220662
}
],
"urls" : [ ],
"hashtags" : [ ]
},
...
Using $unwind
db.tweets.aggregate([
    { $unwind: "$entities.user_mentions" },
    { $group: { _id: "$user.screen_name",
                count: { $sum: 1 } } },
    { $sort: { count: -1 } },
    { $limit: 1 }
])
107
$unwind Behavior
Use the aggregation framework to find the name of the individual who has made the most comments on a blog.
Start by importing the necessary data if you have not already.
To help you verify your work, the author with the fewest comments is Mariela Sherer and she commented 387 times.
Consider cities in the states of California (CA) and New York (NY) together. Taking only cities with populations
over 25,000, calculate the average population of this sample of cities.
Please note:
• Different states might have the same city name.
• A city might have multiple zip codes.
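A sketch of one possible pipeline, assuming the standard zips dataset fields (city, state, pop); the first $group handles the multiple-zip-codes-per-city note above:
db.zips.aggregate([
    { $match: { state: { $in: ["CA", "NY"] } } },
    // sum the zip-code documents of each (state, city) pair into one city total
    { $group: { _id: { state: "$state", city: "$city" }, pop: { $sum: "$pop" } } },
    { $match: { pop: { $gt: 25000 } } },
    // average the city totals
    { $group: { _id: null, avg_pop: { $avg: "$pop" } } }
])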
108
Exercise: Projection
Calculate the total number of people who live in a zip code in the US where the city starts with a digit.
$project can extract the first character of any string field using $substr. E.g.,
db.zips.aggregate([
{$project:
{
first_char: { $substr: ["$city", 0, 1] },
}
}
])
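One way to complete the exercise from there (a sketch, again assuming the standard zips fields; pop is kept alongside the extracted character):
db.zips.aggregate([
    { $project: { first_char: { $substr: ["$city", 0, 1] }, pop: 1 } },
    { $match: { first_char: { $in: ["0","1","2","3","4","5","6","7","8","9"] } } },
    { $group: { _id: null, total_pop: { $sum: "$pop" } } }
])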
From the grades collection, find the class (display the class_id) with the highest average student performance
on exams. To solve this problem you’ll want an average of averages.
First calculate the average exam score of each student in each class. Then determine the average class exam score
using these values. If you have not already done so, import the grades collection as follows.
Before you attempt this exercise, explore the grades collection a little to ensure you understand how it is structured.
For additional exercises, consider other statistics you might want to see with this data and how to calculate them.
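A sketch of the average-of-averages approach, assuming each grades document has class_id, student_id, and a scores array of { type, score } subdocuments:
db.grades.aggregate([
    { $unwind: "$scores" },
    { $match: { "scores.type": "exam" } },
    // average exam score per student per class
    { $group: { _id: { class_id: "$class_id", student_id: "$student_id" },
                avg_student: { $avg: "$scores.score" } } },
    // average of the student averages per class
    { $group: { _id: "$_id.class_id", avg_class: { $avg: "$avg_student" } } },
    { $sort: { avg_class: -1 } },
    { $limit: 1 }
])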
Learning Objectives
109
The $bucket stage
$bucket Form
{
$bucket: {
groupBy: <expression>,
boundaries: [ <lowerbound1>, <lowerbound2>, ... ],
default: <literal>,
output: {
<output1>: { <$accumulator expression> },
...
<outputN>: { <$accumulator expression> }
}
}
}
$bucket Exercise
• Using our twitter dataset, let’s group users by their tweet/retweet activity
• The bounds should be 0, 100, 500, 2000, 5000, 10000, and 25000
• Produce the following results
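A sketch of one possible solution; the output accumulator and the default bucket are assumptions, since the expected results are not reproduced here:
db.tweets.aggregate([
    { $bucket: {
        groupBy: "$user.statuses_count",
        boundaries: [0, 100, 500, 2000, 5000, 10000, 25000],
        default: "other", // catches users at or above 25000
        output: { count: { $sum: 1 } }
    } }
])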
29 https://docs.mongodb.com/manual/meta/aggregation-quick-reference/#agg-quick-reference-accumulators
110
$bucketAuto
$bucketAuto Form
{
$bucketAuto: {
groupBy: <expression>,
buckets: <number>,
output: {
<output1>: { <$accumulator expression> },
...
},
granularity: <string>
}
}
$bucketAuto Exercise
• Using our twitter dataset, use $bucketAuto to group documents into the following result
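A sketch along the same lines; the bucket count and output are assumptions, since the target result is not reproduced here:
db.tweets.aggregate([
    { $bucketAuto: {
        groupBy: "$user.statuses_count",
        buckets: 5,
        output: { count: { $sum: 1 } }
    } }
])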
$facet
111
$facet (cont)
{ $facet:
{
<outputField1>: [ <stage1>, <stage2>, ...<stageN>],
<outputField2>: [ <stage1>, <stage2>, ...<stageN>],
...
}
}
Behavior
Behavior (cont)
$facet Exercise
Using our twitter dataset, output a single document with the following fields:
• mostActive: <User with the most user.statuses_count >
– name: <user.screen_name>
– numTweetsAndRetweets: <user.statuses_count>
• leastActive: <Of users who have at least 1 tweet/retweet, the user with the least statuses_count and lexicographically first screen_name>
– name: <user.screen_name>
– numTweetsAndRetweets: <user.statuses_count>
112
$facet Exercise (cont)
{
"mostActive" : {
"name": "currentutc",
"numTweetsAndRetweets": 518702
},
"leastActive" : {
"name": "ACunninghamMP",
"numTweetsAndRetweets": 1
}
}
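A sketch of one way to produce that document with a single $facet stage; the final $project flattens each one-element facet array:
db.tweets.aggregate([
    { $facet: {
        mostActive: [
            { $sort: { "user.statuses_count": -1 } },
            { $limit: 1 },
            { $project: { _id: 0, name: "$user.screen_name",
                          numTweetsAndRetweets: "$user.statuses_count" } }
        ],
        leastActive: [
            { $match: { "user.statuses_count": { $gte: 1 } } },
            { $sort: { "user.statuses_count": 1, "user.screen_name": 1 } },
            { $limit: 1 },
            { $project: { _id: 0, name: "$user.screen_name",
                          numTweetsAndRetweets: "$user.statuses_count" } }
        ]
    } },
    { $project: {
        mostActive: { $arrayElemAt: ["$mostActive", 0] },
        leastActive: { $arrayElemAt: ["$leastActive", 0] }
    } }
])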
• Using the twitter dataset, determine how many unique users are in both the top ~10% by tweets/retweets and
the top ~10% by number of followers
Learning Objectives
Aggregation Stages
113
Aggregation Stages (continued)
• $sample: Randomly selects the specified number of documents from its input.
• $graphLookup: Performs a recursive search on a collection
• $indexStats: Returns statistics regarding the use of each index for the collection
• $out: Creates a new collection from the output of an aggregation pipeline
• $facet: Allows multiple independent aggregation stages to happen concurrently
• Even more stages available!
For more information please check the aggregation framework stages31 documentation page.
Which tweeter has mentioned the most unique users in their tweets?
db.tweets.aggregate([
{ $unwind: "$entities.user_mentions" },
{ $group: {
_id: "$user.screen_name",
mset: { $addToSet: "$entities.user_mentions.screen_name" } } },
{ $unwind: "$mset"},
{ $group: { _id: "$_id", count: { $sum: 1 } } },
{ $sort: { count: -1 } },
{ $limit: 1 }
])
31 https://docs.mongodb.com/manual/reference/operator/aggregation/#stage-operators
114
A Sample Dataset
• Insert the following documents into a collection called sales. We’ll be using them for the next few sections.
• Using a combination of operators, it is possible to query and transform our data in useful ways for study and
interpretation.
• For example, given sales data for a year, identify the months that over and under performed with some statistical
significance.
db.sales.aggregate([...])
{ "_id" : ObjectId("58f552fef704abcdc99b737b"), "month" : "April", "sales" : 6789314,
!→"outsideVariance" : true }
• Pipelines that begin with an exact $match on a shard key will run on that shard only
• For operations that must run on multiple shards, grouping and merging will route to a random shard unless they
require running on the primary shard. This avoids overloading the primary shard
• The $out and $lookup stages require running on the primary shard
115
7 Introduction to Schema Design
Schema Design Core Concepts (page 116) An introduction to schema design in MongoDB
Schema Evolution (page 123) Considerations for evolving a MongoDB schema design over an application’s lifetime
Schema Visualization With MongoDB Compass (page 127) Using Compass to visualize schema
Document Validation (page 132) An introduction to document validation in MongoDB
Lab: Document Validation (page 137) An exercise on document validation
Common Schema Design Models (page 140) Common design models for representing 1-1, 1-M, and M-M relationships and tree structures in MongoDB
Learning Objectives
What is a schema?
116
Example: Denormalized Version
User:
- userName
- firstName
- lastName

Book:
- title
- isbn
- language
- createdBy
- author
  - firstName
  - lastName
Four Considerations
Case Study
Author Schema
{ "_id": ObjectId,
"firstName": string,
"lastName": string
}
117
User Schema
{ "_id": ObjectId,
"userName": string,
"password": string
}
Book Schema
{ "_id": ObjectId,
"title": string,
"slug": string,
"author": int32,
"available": boolean,
"isbn": string,
"pages": int32,
"publisher": {
"city": string,
"date": date,
"name": string
},
"subjects": [ string, string ],
"language": string,
"reviews": [ { "user": int32, "text": string },
{ "user": int32, "text": string } ]
}
{ _id: ObjectId(...),
firstName: "F. Scott",
lastName: "Fitzgerald"
}
118
Example Documents: User
{ _id: ObjectId(...),
userName: "emily@mongodb.com",
password: "slsjfk4odk84k209dlkdj90009283d"
}
{
_id: ObjectId(...),
title: "The Great Gatsby",
slug: "9781857150193-the-great-gatsby",
author: 1,
available: true,
isbn: "9781857150193",
pages: 176,
publisher: {
name: "Everyman’s Library",
date: ISODate("1991-09-19T00:00:00Z"),
city: "London"
},
subjects: ["Love stories", "1920s", "Jazz Age"],
language: "English",
reviews: [
{ user: 1, text: "One of the best..." },
{ user: 2, text: "It’s hard to..." }
]
}
Embedded Documents
{ ...
publisher: {
name: "Everyman’s Library",
date: ISODate("1991-09-19T00:00:00Z"),
city: "London"
},
subjects: ["Love stories", "1920s", "Jazz Age"],
language: "English",
reviews: [
{ user: 1, text: "One of the best..." },
{ user: 2, text: "It’s hard to..." }
]
}
119
Embedded Documents: Pros and Cons
Linked Documents
{ ...
author: 1,
reviews: [
{ user: 1, text: "One of the best..." },
{ user: 2, text: "It’s hard to..." }
]
}
Arrays
• Array of scalars
• Array of documents
120
Array of Scalars
{ ...
subjects: ["Love stories", "1920s", "Jazz Age"],
}
Array of Documents
{ ...
reviews: [
{ user: 1, text: "One of the best..." },
{ user: 2, text: "It’s hard to..." }
]
}
Design a schema for users and their book reviews. UserNames are immutable.
• Users
– userName (string)
– email (string)
• Reviews
– text (string)
– rating (int32)
– created_at (date)
121
Solution B: Users and Book Reviews
122
How GridFS Works
Learning Objectives
Upon completing this module, students should understand the basic philosophy of evolving a MongoDB schema
during an application’s lifetime:
• Development Phase
• Production Phase
• Iterative Modifications
Development Phase
123
Development Phase: Known Query Patterns
Production Phase
Evolve the schema to meet the application’s read and write patterns.
{
_id: 1,
title: "The Great Gatsby",
author: {
firstName: "F. Scott",
lastName: "Fitzgerald"
}
// Other fields follow...
}
124
Production Phase: Write Patterns
review = {
user: 1,
text: "I thought this book was great!",
rating: 5
};
db.books.updateOne(
{ _id: 3 },
{ $push: { reviews: review }}
);
Caveats:
• Document size limit (16MB)
• Storage fragmentation after many updates/deletes
125
Solution: Recent Reviews, Update
myReview = {
_id: ObjectId("..."),
rating: 3,
text: "An average read.",
created_at: ISODate("2012-10-13T12:26:11.502Z")
};
db.reviews.updateOne(
{ _id: "bob-201210" },
{ $push: { reviews: myReview }}
);
cursor = db.reviews.find(
{ _id: /^bob-/ },
{ reviews: { $slice: -10 }}
).sort({ _id: -1 }).batchSize(5);
num = 0;
// walk the cursor until the 10 most recent reviews have been printed
while (cursor.hasNext() && num < 10) {
doc = cursor.next();
for (var i = 0; i < doc.reviews.length && num < 10; ++i, ++num) {
printjson(doc.reviews[i]);
}
}
Deleting a review
db.reviews.updateOne(
{ _id: "bob-201210" },
{ $pull: { reviews: { _id: ObjectId("...") }}}
);
126
7.3 Schema Visualization With MongoDB Compass
Learning Objectives
Lesson Setup
• Connect to your mongod with MongoDB Compass and select the trips collection
Schema Visualization
32 https://docs.mongodb.com/manual/reference/operator/aggregation/sample/
127
Schema Visualization Detail
128
Visualizing GeoJSON
129
Interactively Build a GeoJSON query
Documents Explorer
130
Documents Explorer Example
Updating a Document
131
Updating Detail
Learning Objectives
Introduction
Example
db.createCollection( "products",
{
validator: {
price : { $exists : true }
},
validationAction: "error"
}
)
132
Why Document Validation?
Anti-Patterns
• Using document validation at the database level without writing it into your application
– This would result in unexpected behavior in your application
• Allowing uncaught exceptions from the DB to leak into the end user’s view
– Catch it and give them a message they can parse
133
Details
validationLevel: “strict”
• Useful when:
– Creating a new collection
– Validating writes to an existing collection already in compliance
– Insert only workloads
– Changing a schema where updates should map documents to the new schema
• Note that strict validation applies to updates even of existing invalid documents
validationLevel: “moderate”
• Useful when:
– Changing a schema and you have not migrated fully
– Changing schema but the application can’t map the old schema to the new in just one update
– Changing a schema for new documents but leaving old documents with the old schema
validationAction: “error”
• Useful when:
– Your application will no longer support invalid documents
– Not all applications can be trusted to write valid documents
– Invalid documents create regulatory compliance problems
134
validationAction: “warn”
• Useful when:
– You need to receive all writes
– Your application can handle multiple versions of the schema
– Tracking schema-related issues is important
* For example, if you think your application is probably inserting compliant documents, but you want
to be sure
db.createCollection( "products",
{
validator: {
price: { $exists: true }
},
validationAction: "error"
}
)
To see what the validation rules are for all collections in a database:
db.getCollectionInfos()
And you can see the results when you try to insert:
db.products.drop()
db.products.insertOne( { name: "watch", price: 10000, currency: "USD" } )
db.products.insertOne( { name: "happiness" } )
db.runCommand( {
collMod: "products",
validator: {
price: { $exists: true }
},
validationAction: "error",
validationLevel: "moderate"
} )
db.products.updateOne( { name : "happiness" }, { $set : { note: "Priceless." } } )
db.products.updateOne( { name : "watch" }, { $unset : { price : 1 } } )
db.products.insertOne( { name : "inner peace" } )
135
Bypassing Document Validation
• Some writes may need to bypass document validation, for example:
– Restoring a backup
– Re-inserting an accidentally deleted document
• For deployments with access control enabled, this is subject to user roles restrictions
• See the MongoDB server documentation for details
136
Quiz
What are the validation levels available and what are the differences?
Quiz
What command do you use to determine what the validation rule is for the things collection?
Quiz
• Import the posts collection (from posts.json) and look at a few documents to understand the schema.
• Insert the following document into the posts collection
• Discuss: what are some restrictions on documents that a validator could and should enforce?
• Add a validator to the posts collection that enforces those restrictions
• Remove the previously inserted document and try inserting it again and see what happens
Create a collection employees with a validator that enforces the following restrictions on documents:
• The name field must exist and be a string
• The salary field must exist and be between 0 and 10,000 inclusive.
• The email field is optional but must be an email address in a valid format if present.
• The phone field is optional but must be a phone number in a valid format if present.
• At least one of the email and phone fields must be present.
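One possible validator, sketched with the same query-expression syntax as the products example earlier; the regular expressions are deliberately simplistic stand-ins for "a valid format":
db.createCollection("employees", {
    validator: { $and: [
        { name: { $exists: true, $type: "string" } },
        { salary: { $exists: true, $gte: 0, $lte: 10000 } },
        // optional fields must be well formed when present
        { $or: [ { email: { $exists: false } },
                 { email: { $regex: /^[^@]+@[^@]+\.[^@]+$/ } } ] },
        { $or: [ { phone: { $exists: false } },
                 { phone: { $regex: /^\d{3}-\d{3}-\d{4}$/ } } ] },
        // at least one of email and phone must be present
        { $or: [ { email: { $exists: true } }, { phone: { $exists: true } } ] }
    ] },
    validationAction: "error"
})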
137
Exercise: Create collection with validator (expected results)
// Valid documents
{"name":"Jane Smith", "salary":45, "email":"js@example.com"}
{"name":"Tim R. Jones", "salary":30,
"phone":"234-555-6789","email":"trj@example.com"}
{"name":"Cedric E. Oxford", "salary":600, "phone":"918-555-1234"}
// Invalid documents
{"name":"Connor MacLeod", "salary":9001, "phone":"999-555-9999",
"email":"thrcnbnly1"}
{"name":"Herman Hermit", "salary":9}
{"name":"Betsy Bedford", "salary":50, "phone":"", "email":"bb@example.com"}
Modify the validator for the employees collection to support the following additional restrictions:
• The status field must exist and must only be one of the following strings: “active”, “on_vacation”, “terminated”
• The locked field must exist and be a boolean
// Valid documents
{"name":"Jason Serivas", "salary":65, "email":"js@example.com",
"status":"terminated", "locked":true}
{"name":"Logan Drizt", "salary":39,
"phone":"234-555-6789","email":"ld@example.com", "status":"active",
"locked":false}
{"name":"Mann Edger", "salary":100, "phone":"918-555-1234",
"status":"on_vacation", "locked":false}
// Invalid documents
{"name":"Steven Cha", "salary":15, "email":"sc@example.com", "status":"alive
!→",
"locked":false}
{"name":"Julian Barriman", "salary":15, "email":"jb@example.com",
"status":"on_vacation", "locked":"no"}
138
Exercise: Change validation level
Now that the employees validator has been updated, some of the already-inserted documents are not valid. This can
be a problem when, for example, just updating an employee’s salary.
• Try to update the salary of “Cedric E. Oxford”. You should get a validation error.
• Now, change the validation level of the employees collection to allow updates of existing invalid documents,
but still enforce validation of inserted documents and existing valid documents.
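The moderate validation level does exactly this; a minimal sketch:
db.runCommand({ collMod: "employees", validationLevel: "moderate" })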
Now that we’ve explored document validation in the Mongo shell, let’s explore how easy it is to do with MongoDB
Compass.
• Click below for an overview of MongoDB Compass.
/modules/compass
• Connect to your local database with Compass
• Open the employees collection, and view the validation rules.
139
Exercise: Change validation action
In some cases, it may be desirable to simply log invalid actions, rather than prevent them.
• Change the validation action of the employees collection to reflect this behavior
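A minimal sketch; with the warn action, invalid writes succeed but are recorded in the mongod log:
db.runCommand({ collMod: "employees", validationAction: "warn" })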
Learning Objectives
Upon completing this module students should understand common design for modeling:
• One-to-One Relationships
• One-to-Many Relationships
• Many-to-Many Relationships
• Tree Structures
One-to-One Relationship
One-to-One: Linking
db.books.findOne()
{
_id: 1,
title: "The Great Gatsby",
slug: "9781857150193-the-great-gatsby",
author: 1,
// Other fields follow...
}
db.authors.findOne({ _id: 1 })
{
_id: 1,
firstName: "F. Scott",
lastName: "Fitzgerald"
book: 1,
}
140
One-to-One: Embedding
db.books.findOne()
{
_id: 1,
title: "The Great Gatsby",
slug: "9781857150193-the-great-gatsby",
author: {
firstName: "F. Scott",
lastName: "Fitzgerald"
}
// Other fields follow...
}
One-to-Many Relationship
db.authors.findOne()
{
_id: 1,
firstName: "F. Scott",
lastName: "Fitzgerald",
books: [1, 3, 20]
}
db.books.find({ author: 1 })
{
_id: 1,
title: "The Great Gatsby",
slug: "9781857150193-the-great-gatsby",
author: 1,
// Other fields follow...
}
{
_id: 3,
title: "This Side of Paradise",
slug: "9780679447238-this-side-of-paradise",
author: 1,
// Other fields follow...
}
141
One-to-Many: Array of Documents
db.authors.findOne()
{
_id: 1,
firstName: "F. Scott",
lastName: "Fitzgerald",
books: [
{ _id: 1, title: "The Great Gatsby" },
{ _id: 3, title: "This Side of Paradise" }
]
// Other fields follow...
}
Many-to-Many Relationship
db.books.findOne()
{
_id: 1,
title: "The Great Gatsby",
authors: [1, 5]
// Other fields follow...
}
db.authors.findOne()
{
_id: 1,
firstName: "F. Scott",
lastName: "Fitzgerald",
books: [1, 3, 20]
}
142
Many-to-Many: Array of IDs on One Side
db.books.findOne()
{
_id: 1,
title: "The Great Gatsby",
authors: [1, 5]
// Other fields follow...
}
book = db.books.findOne(
{ title: "The Great Gatsby" },
{ authors: 1 }
);
143
Tree Structures
db.subjects.findOne()
{
_id: 1,
name: "American Literature",
sub_category: {
name: "1920s",
sub_category: { name: "Jazz Age" }
}
}
db.subjects.find()
{ _id: "American Literature" }
{ _id : "1920s",
ancestors: ["American Literature"],
parent: "American Literature"
}
144
Find Sub-Categories
{
_id: "Jazz Age in New York",
ancestors: ["American Literature", "1920s", "Jazz Age"],
parent: "Jazz Age"
}
Summary
145
8 Replica Sets
Introduction to Replica Sets (page 146) An introduction to replication and replica sets
Elections in Replica Sets (page 149) The process of electing a new primary (automated failover) in replica sets
Replica Set Roles and Configuration (page 154) Configuring replica set members for common use cases
The Oplog: Statement Based Replication (page 155) The process of replicating data from one node of a replica set
to another
Lab: Working with the Oplog (page 157) A brief lab that illustrates how the oplog works
Write Concern (page 159) Balancing performance and durability of writes
Read Concern (page 164) Settings to minimize/prevent stale and dirty reads
Read Preference (page 171) Configuring clients to read from specific members of a replica set
Lab: Setting up a Replica Set (page 172) Launching members, configuring, and initiating a replica set
Learning Objectives
• High Availability
• Disaster Recovery
• Functional Segregation
146
Disaster Recovery (DR)
Functional Segregation
147
Replica Sets
[Diagram: a client application's driver sends writes and reads to the primary; data replicates from the primary to two secondaries.]
Primary Server
Secondaries
Heartbeats
[Diagram: the primary and the secondaries monitor one another by exchanging heartbeats.]
148
The Oplog
• The operations log, or oplog, is a special capped collection that is the basis for replication.
• The oplog maintains one entry for each document affected by every write operation.
• Secondaries copy operations from the oplog of their sync source.
Initial Sync
• Occurs when a new server is added to a replica set, or when we erase the underlying data (--dbpath) of an existing server
• All databases except the local database are copied
• As of MongoDB >= 3.4, all indexes are built while data is copied
• As of MongoDB >= 3.4, initial sync is more resilient to intermittent network failure/degradation
Learning Objectives
149
Calling Elections
[Diagram: when the secondaries stop receiving heartbeats from the primary, they call an election for a new primary.]
Priority
150
Optime
• Optime: Operation time, which is the timestamp of the last operation the member applied from the oplog.
• To be elected primary, a member must have the most recent optime.
• Only optimes of visible members are compared.
Connections
replSetStepDown Behavior
• Primary will attempt to terminate long running operations before stepping down.
• Primary will wait for electable secondary to catch up before stepping down.
• “secondaryCatchUpPeriodSecs” can be specified to limit the amount of time the primary will wait for a sec-
ondary to catch up before the primary steps down.
151
Scenario B: 3 Data Nodes in 2 DCs
Which member will become primary following this type of network partition?
152
Scenario D: 5 Nodes in 2 DCs
The following is similar to Scenario C, but with the addition of an arbiter in Data Center 1. What happens here?
What happens here if any one of the nodes/DCs fail? What about recovery time?
153
8.3 Replica Set Roles and Configuration
Learning Objectives
Configuration
154
Data Center 2
Learning Objectives
Binary Replication
155
Tradeoffs
• The good thing is that figuring out where to write, etc. is very efficient.
• But we must have a byte-for-byte match of our data files on the primary and secondaries.
• The problem is that this couples our replica set members in ways that are inflexible.
• Binary replication may also replicate disk corruption.
Statement-Based Replication
Example
db.foo.deleteMany({ age : 30 })
This will be represented in the oplog with records such as the following:
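For example, for a deleteMany that matches two documents, the oplog contains one delete entry per document, roughly of this shape (values illustrative; ts and other bookkeeping fields omitted):
{ "op" : "d", "ns" : "test.foo", "o" : { "_id" : ObjectId("...") } }
{ "op" : "d", "ns" : "test.foo", "o" : { "_id" : ObjectId("...") } }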
• One statement per document affected by each write: insert, update, or delete.
• Provides a level of abstraction that enables independence among the members of a replica set:
– With regard to MongoDB version.
– In terms of how data is stored on disk.
– Freedom to do maintenance without the need to bring the entire set down.
156
Operations in the Oplog are Idempotent
mkdir -p /data/db
mongo --nodb
Create a replica set by running the following command in the mongo shell.
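Given the description that follows, the command is presumably of this form (option names from the standard ReplSetTest constructor; exact options may differ):
replicaSet = new ReplSetTest({ nodes: 3 })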
157
ReplSetTest
• ReplSetTest is useful for experimenting with replica sets as a means of hands-on learning.
• It should never be used in production. Never.
• The command above will create a replica set with three members.
• It does not start the mongods, however.
• You will need to issue additional commands to do that.
replicaSet.startSet()
Issue the following command to configure replication for these mongods. You will need to issue this while output is
flying by in the shell.
replicaSet.initiate()
Status Check
• You should now have three mongods running on ports 20000, 20001, and 20002.
• You will see log statements from all three printing in the current shell.
• To complete the rest of the exercise, open a new shell.
use store
158
Perform an Update
Issue the following update. We might issue this update after a purchase of three items.
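An update along these lines fits the oplog output shown below (the filter and values here are illustrative); note that the $inc is recorded in the oplog as idempotent $set operations:
db.products.updateMany(
    { _id: { $in: [2, 5] } },
    { $inc: { inStock: -3 } }
)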
The oplog is a capped collection in the local database of each replica set member:
use local
db.oplog.rs.find()
{ "ts" : Timestamp(1406944987, 1), "h" : NumberLong(0), "v" : 2, "op" : "n",
"ns" : "", "o" : { "msg" : "initiating set" } }
...
{ "ts" : Timestamp(1406945076, 1), "h" : NumberLong("-9144645443320713428"),
"v" : 2, "op" : "u", "ns" : "store.products", "o2" : { "_id" : 2 },
"o" : { "$set" : { "inStock" : 19 } } }
{ "ts" : Timestamp(1406945076, 2), "h" : NumberLong("-7873096834441143322"),
"v" : 2, "op" : "u", "ns" : "store.products", "o2" : { "_id" : 5 },
"o" : { "$set" : { "inStock" : 49 } } }
Learning Objectives
159
Answer
160
Write Concern: { w : 1 }
[Diagram: with writeConcern { w: 1 }, the driver sends the write to the primary, which applies it and responds immediately.]
Example: { w : 1 }
Write Concern: { w : 2 }
[Diagram: with writeConcern { w: 2 }, the driver sends the write to the primary; the primary applies it, replicates it to the secondaries, and responds once one secondary has also applied the write.]
Example: { w : 2 }
161
Other Write Concerns
Example: { w : "majority" }
Suppose you have a replica set with 7 data nodes. Your application has critical inserts for which you do not want
rollbacks to happen. Secondaries may be taken down from time to time for maintenance, leaving you with a potential
4-server replica set. Which write concern is best suited for these critical inserts?
• {w:1}
• {w:2}
• {w:3}
• {w:4}
• { w : “majority” }
33 http://docs.mongodb.org/manual/reference/replica-configuration/#rsconf.writeConcernMajorityJournalDefault
162
Further Reading
See Write Concern Reference34 for more details on write concern configurations, including setting timeouts and iden-
tifying specific replica set members that must acknowledge writes (i.e. tag sets35 ).
34 http://docs.mongodb.org/manual/reference/write-concern
35 http://docs.mongodb.org/manual/tutorial/configure-replica-set-tag-sets/#replica-set-configuration-tag-sets
163
8.7 Read Concern
Learning Objectives
Read Concerns
• Local: Default
• Majority: Added in MongoDB 3.2, requires WiredTiger and election Protocol Version 1 (PV1)
• Linearizable: Added in MongoDB 3.4, works with MMAP or WiredTiger
Local
Majority
Linearizable
164
Example: Read Concern Level Majority
A new version of the document (W1) is written by App1, and the write is propagated to the secondaries
Once the write is journaled on a majority of nodes, App1 will get a confirmation of the commit on a majority (C1) of
nodes.
165
If App2 reads the document with a read concern level majority at any time before C1, it will get the value R0
However after the committed state (C1), it will get the new value for the document (R1)
• Reads that do not reflect the most recent writes are stale
• These can occur when reading from secondaries
• Systems with stale reads are “eventually consistent”
• Reading from the primary minimizes odds of stale reads
– They can still occur in rare cases
166
Stale Reads on a Primary
• In unusual circumstances, two members may simultaneously believe that they are the primary
– One can acknowledge { w : "majority" } writes
Quiz
167
Read Concern and Read Preference
* --enableMajorityReadConcern
– Specify the read concern level to the driver
• You should:
– Use write concern { w : "majority" }
– Otherwise, an application may not see its own writes
./launch_replset_for_majority_read_concern.sh
168
Example: Using Read Concern (Continued)
#!/usr/bin/env bash
echo 'db.testCollection.drop();' | mongo --port 27017 readConcernTest; wait
echo 'db.testCollection.insertOne({message: "probably on a secondary." } );' |
mongo --port 27017 readConcernTest; wait
echo 'db.fsyncLock()' | mongo --port 27018; wait
echo 'db.fsyncLock()' | mongo --port 27019; wait
echo 'db.testCollection.insertOne( { message : "Only on primary." } );' |
mongo --port 27017 readConcernTest; wait
echo 'db.testCollection.find().readConcern("majority");' |
mongo --port 27017 readConcernTest; wait
echo 'db.testCollection.find(); // read concern "local"' |
mongo --port 27017 readConcernTest; wait
echo 'db.fsyncUnlock()' | mongo --port 27018; wait
echo 'db.fsyncUnlock()' | mongo --port 27019; wait
echo 'db.testCollection.drop();' | mongo --port 27017 readConcernTest
Quiz
What must you do in order to make the database return documents that have been replicated to a majority of the replica
set members?
169
Replication Protocol Version 1 (continued)
Quiz
Further Reading
36 http://docs.mongodb.org/manual/reference/read-concern
170
8.8 Read Preference
• Read preference allows you to specify the nodes in a replica set to read from.
• Clients only read from the primary by default.
• There are some situations in which a client may want to read from:
– Any secondary
– A specific secondary
– A specific type of secondary
• Only read from a secondary if you can tolerate possibly stale data, as not all writes might have replicated.
Use Cases
• In general, do not read from secondaries to provide extra capacity for reads.
• Sharding37 increases read and write capacity by distributing operations across a group of machines.
• Sharding is a better strategy for adding capacity.
MongoDB drivers support the following read preferences. Note that hidden nodes will never be read from when
connected via the replica set.
• primary: Default. All operations read from the primary.
• primaryPreferred: Read from the primary but if it is unavailable, read from secondary members.
• secondary: All operations read from the secondary members of the replica set.
• secondaryPreferred: Read from secondary members but if no secondaries are available, read from the primary.
• nearest: Read from member of the replica set with the least network latency, regardless of the member’s type.
37 http://docs.mongodb.org/manual/sharding
171
Tag Sets
conf = rs.conf()
conf.members[0].tags = { dc : "east", use : "production" }
conf.members[1].tags = { dc : "east", use : "reporting" }
conf.members[2].tags = { use : "production" }
rs.reconfig(conf)
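With tags in place, a client can combine a read preference mode with a tag set; for example, in the mongo shell (the collection name is illustrative):
// prefer an eastern reporting node for this query
db.testcol.find().readPref("secondary", [ { dc: "east", use: "reporting" } ])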
Overview
• In this exercise we will setup a 3 data node replica set on a single machine.
• In production, each node should be run on a dedicated host:
– To avoid any potential resource contention
– To provide isolation against server failure
Since we will be running all nodes on a single machine, make sure each has its own data directory.
On Linux or Mac OS, run the following in the terminal to create the 3 directories ~/data/rs1, ~/data/rs2, and
~/data/rs3:
mkdir -p ~/data/rs{1,2,3}
172
Launch Each Member
Now start 3 instances of mongod in the foreground so that they are easier to observe and shut down.
On Linux or Mac OS, run each of the following commands in its own terminal window:
On Windows, run each of the following commands in its own Command Prompt or PowerShell window:
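For Linux or Mac OS the commands would be along these lines (the replica set name is a placeholder you choose; note the ports match the rs.add/rs.addArb calls below):
mongod --replSet <REPLICA-SET-NAME> --dbpath ~/data/rs1 --port 27017
mongod --replSet <REPLICA-SET-NAME> --dbpath ~/data/rs2 --port 27018
mongod --replSet <REPLICA-SET-NAME> --dbpath ~/data/rs3 --port 27019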
Status
• Connect to the one of the MongoDB instances with the mongo shell.
• To do so run the following command in the terminal, Command Prompt, or PowerShell:
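Assuming the first member runs on port 27017 as above, the command is likely just:
mongo --port 27017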
rs.initiate()
// wait a few seconds
rs.add('<HOSTNAME>:27018')
rs.addArb('<HOSTNAME>:27019')
173
Problems That May Occur When Initializing the Replica Set
> conf = {
_id: "<REPLICA-SET-NAME>",
members: [
{ _id : 0, host : "<HOSTNAME>:27017"},
{ _id : 1, host : "<HOSTNAME>:27018"},
{ _id : 2, host : "<HOSTNAME>:27019",
"arbiterOnly" : true},
]
}
> rs.initiate(conf)
While still connected to the primary (port 27017) with mongo shell, insert a simple test document:
db.testcol.insert({ a: 1 })
db.testcol.count()
exit // Or Ctrl-d
rs.slaveOk()
db.testcol.find()
use local
db.oplog.rs.find()
174
Changing Replica Set Configuration
To change the replica set configuration, first connect to the primary via mongo shell:
Let’s raise the priority of one of the secondaries. Assuming it is the 2nd node (e.g. on port 27018):
cfg = rs.conf()
cfg["members"][1]["priority"] = 10
rs.reconfig(cfg)
You will see errors like the following, which are expected:
rs.conf()
rs.status()
Further Reading
• Replica Configuration38
• Replica States39
38 http://docs.mongodb.org/manual/reference/replica-configuration/
39 http://docs.mongodb.org/manual/reference/replica-states/
175
9 Sharding
Learning Objectives
176
Vertical Scaling
Sharding Overview
177
When to Shard
• If you have more data than one machine can hold on its drives
• If your application is write heavy and you are experiencing too much latency.
• If your working set outgrows the memory you can allocate to a single machine.
[Diagram: a single 1 TB collection (Collection1) that has outgrown one machine.]
Sharding Concepts
Shard Key
178
Shard Key Ranges
[Diagram: a read for { a: "z1" } goes from the driver to a mongos, which routes it to the chunk whose shard key range contains "z1" and returns the results.]
Chunks
179
Sharded Cluster Architecture
[Diagram: 2 or more routers (mongos) in front of 2 or more shards, each shard deployed as a replica set.]
Mongos
• A mongos is responsible for accepting requests and returning results to an application driver.
• In a sharded cluster, nearly all operations go through a mongos.
• A sharded cluster can have as many mongos routers as required.
• It is typical for each application server to have one mongos.
• Always use more than one mongos to avoid a single point of failure.
Config Servers
[Diagram: the same sharded cluster architecture with config servers added alongside the routers (mongos) and shards.]
180
Config Server Hardware Requirements
Possible Imbalance?
• Depending on how you configure sharding, data can become unbalanced on your sharded cluster.
– Some shards might receive more inserts than others.
– Some shards might have documents that grow more than those in other shards.
• This may result in too much load on a single shard.
– Reads and writes
– Disk activity
• This would defeat the purpose of sharding.
Balancing Shards
• If a chunk grows too large MongoDB will split it into two chunks.
• The MongoDB balancer keeps chunks distributed across shards in equal numbers.
• However, a balanced sharded cluster depends on a good shard key.
181
With a Bad Shard Key
More Specifically
Cardinality
182
Non-Monotonic
• As the number of shards increases, the number of servers in your deployment increases.
• This increases the probability that one server will fail on any given day.
• With redundancy built into each shard you can mitigate this risk.
Learning Objectives
183
Chunks in a Newly Sharded Collection
• The range of a chunk is defined by the shard key values of the documents the chunk contains.
• When a collection is sharded it starts with just one chunk.
• The first chunk for a collection will have the range:
{ $minKey : 1 } to { $maxKey : 1 }
• All shard key values from the smallest possible to the largest fall in this chunk’s range.
Chunk Splits
[Diagram: on Shard A, a 64.2 MB chunk splits into two 32.1 MB chunks.]
Pre-Splitting Chunks
• You may pre-split data before loading data into a sharded cluster.
• Pre-splitting is useful if:
– You plan to do a large data import early on
– You expect a heavy initial server load and want to ensure writes are distributed
• A balancing round is initiated by the balancer process on the primary config server.
• This happens when the difference in the number of chunks between two shards becomes too large.
• Specifically, the difference between the shard with the most chunks and the shard with the fewest.
• A balancing round starts when the imbalance reaches:
– 2 when the cluster has < 20 chunks
– 4 when the cluster has 20-79 chunks
– 8 when the cluster has 80+ chunks
184
Balancing is Resource Intensive
• Chunk migration requires copying all the data in the chunk from one shard to another.
• Each individual shard can be involved in one migration at a time. Parallel migrations can occur for each shard
migration pair (source + destination).
• The amount of possible parallel chunk migrations for n shards is n/2 rounded down.
• MongoDB creates splits only after an insert operation.
• For these reasons, it is possible to define a balancing window to ensure the balancer will only run during sched-
uled times.
1. The balancer process sends the moveChunk command to the source shard.
2. The source shard continues to process reads/writes for that chunk during the migration.
3. The destination shard requests documents in the chunk and begins receiving copies.
4. After receiving all documents, the destination shard receives any changes to the chunk.
5. Then the destination shard tells the config db that it has the chunk.
6. The destination shard will now handle all reads/writes.
7. The source shard deletes its copy of the chunk.
Learning Objectives
185
Zones - Overview
Example: DateTime
• Documents older than one year need to be kept, but are rarely used.
• You set a part of the shard key as the ISODate of document creation.
• Add shards to the LTS zone.
• These shards can be on cheaper, slower machines.
• Invest in high-performance servers for more frequently accessed data.
Example: Location
186
Zones - Caveats
• Because tagged chunks will only be on certain servers, if you tag more than those servers can handle, you’ll
have a problem.
– You’re not only worrying about your overall server load, you’re worrying about server load for each of
your tags.
• Your chunks will distribute themselves evenly across the available zones. You cannot control placement at a finer granularity than your tags.
Learning Objectives
• Three shards:
1. A replica set on ports 27107, 27108, 27109
2. A replica set on ports 27117, 27118, 27119
3. A replica set on ports 27127, 27128, 27129
• Three config servers on ports 27217, 27218, 27219
• Two mongos servers at ports 27017 and 27018
187
Build Our Data Directories
On Linux or MacOS, run the following in the terminal to create the data directories we’ll need.
mkdir -p ~/data/cluster/config/{c0,c1,c2}
mkdir -p ~/data/cluster/shard0/{m0,m1,arb}
mkdir -p ~/data/cluster/shard1/{m0,m1,arb}
mkdir -p ~/data/cluster/shard2/{m0,m1,arb}
mkdir -p ~/data/cluster/{s0,s1}
188
Spin Up a Second Replica Set (Linux/MacOS)
189
rs.add('$HOSTNAME:27128');\
rs.addArb('$HOSTNAME:27129')"
Status Check
190
Launch Config Servers (Linux/MacOS)
mongod \
--dbpath ~/data/cluster/config/c0 \
--replSet csrs \
--logpath ~/data/cluster/config/c0/mongod.log \
--fork --port 27217 --configsvr
mongod \
--dbpath ~/data/cluster/config/c1 \
--replSet csrs \
--logpath ~/data/cluster/config/c1/mongod.log \
--fork --port 27218 --configsvr
mongod \
--dbpath ~/data/cluster/config/c2 \
--replSet csrs \
--logpath ~/data/cluster/config/c2/mongod.log \
--fork --port 27219 --configsvr
191
Launch the Mongos Processes (Linux/MacOS)
Now our mongos’s. We need to tell them about our config servers.
configseedlist="csrs/$HOSTNAME:27217,$HOSTNAME:27218,$HOSTNAME:27219"
mongos --logpath ~/data/cluster/s0/mongos.log --port 27017 \
--configdb $configseedlist
Note: Instead of doing this through a bash (or other) shell command, you may prefer to launch a mongo shell and
issue each command individually.
Enable sharding for the test database, shard a collection, and insert some documents.
sh.enableSharding("test")
sh.shardCollection("test.testcol", { a : 1, b : 1 } )
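A minimal insert loop that exercises the shard key (the document shape is an assumption):
use test
for (var i = 0; i < 100000; i++) {
    db.testcol.insertOne({ a: Math.floor(Math.random() * 1000), b: i })
}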
192
Observe What Happens
sh.status()
193
10 Application Engineering
What is MongoMart
MongoMart is an on-line store for buying MongoDB merchandise. We’ll use this application to learn more about
interacting with MongoDB through the driver.
• View Items
• View Items by Category
• Text Search
• View Item Details
• Shopping Cart
View Items
• http://localhost:8080
• Pagination and page numbers
• Click on a category
• http://localhost:8080/?category=Apparel
• Pagination and page numbers
• “All” is listed as a category, to return to all items listing
194
Text Search
• http://localhost:8080/search?query=shirt
• Search for any word or phrase in item title, description or slogan
• Pagination
• http://localhost:8080/item?id=1
• Star rating based on reviews
• Add a review
• Related items
• Add item to cart
Shopping Cart
• http://localhost:8080/cart
• Adding an item multiple times increments quantity by 1
• Change quantity of any item
• Changing quantity to 0 removes item
Introduction
• In this lab, we’ll set up and optimize an application called MongoMart. MongoMart is an on-line store for
buying MongoDB merchandise.
• Import the “item” collection to a standalone MongoDB server (without replication) as noted in the README.md
file of the /data directory of MongoMart
• Become familiar with the structure of the Java application in /java/src/main/java/mongomart/
• Modify the MongoMart.java class to properly connect to your local database instance
195
Lab: Populate All Necessary Database Queries
• After running the MongoMart.java class, navigate to “localhost:8080” to view the application
• Initially, all data is static and the application does not query the database
• Modify the ItemDao.java and CartDao.java classes to ensure all information comes from the database (do not
modify the method return types or parameters)
• It is important to use replication for production MongoDB instances; however, Lab 1 advised us to use a standalone server.
• Convert your local standalone mongod instance to a three node replica set named “shard1”
• Modify MongoMart’s MongoDB connection string to include at least two nodes from the replica set
• Modify your application’s write concern to MAJORITY for all writes to the “cart” collection, any writes to the
“item” collection should continue using the default write concern of W:1
Introduction
• Import the “item” collection to a standalone MongoDB server (without replication) as noted in the README.md
file of the /data directory of MongoMart
• Become familiar with the structure of the Python application in /
• Start the application by running “python mongomart.py”, stop it by using ctrl-c
• Modify the mongomart.py file to properly connect to your local database instance
196
Lab: Use a Local Replica Set with a Write Concern
• It is important to use replication for production MongoDB instances; however, Lab 1 advised us to use a standalone server.
• Convert your local standalone mongod instance to a three node replica set named “rs0”
• Modify MongoMart’s MongoDB connection string to include at least two nodes from the replica set
• Modify your application’s write concern to MAJORITY for all writes to the database
197
11 Security
Learning Objectives
198
Security Mechanisms
Authentication Options
• Community
– Challenge/response authentication using SCRAM-SHA-1 (username & password)
– X.509 Authentication (using X.509 Certificates)
• Enterprise
– Kerberos
– LDAP
• Predefined roles
• Custom roles
• LDAP authorization (MongoDB Enterprise)
– Query LDAP server for groups to which a user belongs.
– Distinguished names (DN) are mapped to roles on the admin database.
– Requires external authentication (x.509, LDAP, or Kerberos).
Transport Encryption
• TLS/SSL
– May use certificates signed by a certificate authority or self-signed.
• FIPS (MongoDB Enterprise)
199
Network Exposure Options
Security Flow
11.2 Authorization
Learning Objectives
200
Authorization vs Authentication
Authorization Basics
What is a resource?
• Databases?
• Collections?
• Documents?
• Users?
• Nodes?
• Shard?
• Replica Set?
201
Authorization Resources
• Databases
• Collections
• Cluster
Cluster Resources
[Diagram: the sharded cluster as a whole (routers, config servers, and shards) constitutes the cluster resource.]
Types of Actions
202
Specific Actions of Each Type
Authorization Privileges
Action: find
Privilege:
{
resource: {"db": "yourdb", "collection": "mycollection"},
actions: ["find"]
}
Authorization Roles
Built-in Roles
203
Built-in Roles
use admin
db.createUser(
{
user: "myUser",
pwd: "$up3r$3cr7",
roles: [
{role: "readAnyDatabase", db: "admin"},
{role: "dbOwner", db: "superdb"},
{role: "readWrite", db: "yourdb"}
]
}
)
Built-in Roles
use admin
db.grantRolesToUser(
"reportsUser",
[
{ role: "read", db: "accounts" }
]
)
User-defined Roles
use admin
db.createRole({
role: "insertAndFindOnlyMyDB",
privileges: [
{resource: { db: "myDB", collection: "" }, actions: ["insert", "find"]}
],
roles: []})
204
Role Privileges
To check the privileges of any particular role we can get that information using the getRole method:
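For example, to see the privileges of the role created above:
use admin
db.getRole("insertAndFindOnlyMyDB", { showPrivileges: true })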
LDAP Authorization
[
{
match: "(.+)@ENGINEERING",
substitution: "cn={0},ou=engineering,dc=example,dc=com"
}, {
match: "(.+)@DBA",
substitution:"cn={0},ou=dba,dc=example,dc=com"
}
]
205
Mongoldap
mongoldap can be used to test configurations between MongoDB and an LDAP server:
$ mongoldap -f mongod.conf \
--user "uid=alice,ou=Users,dc=example,dc=com" \
--password secret
Premise
db.createUser(
{
user: "securityofficer",
pwd: "doughnuts",
customData: { notes: ["admin", "the person that adds other persons"] },
roles: [
{ role: "userAdminAnyDatabase", db: "admin" }
]
})
206
Create DBA user
db.createUser(
{
user: "dba",
pwd: "i+love+indexes",
customData: { notes: ["admin", "the person that admins databases"] },
roles: [
{ role: "dbAdmin", db: "X" }
]
})
If we want to make sure this DBA can administer all databases of the system, which role(s) should he have? See the
MongoDB documentation42 .
Cluster administration is generally an operational role that differs from DBA in the sense that it is more focused on
deployment and cluster node management.
For a team managing a cluster, what roles enable individuals to do the following?
• Add and remove replica nodes
• Manage shards
• Do backups
• Cannot read data from any application database
Premise
207
Define Privileges
Create Role
• Given the privileges we just defined, we now need to create this role specific to database brands.
• The name of this role should be carlover
• What command do we need to issue?
We now want to grant this role to the user named ilikecars on the database brands.
use brands;
db.createUser(
{
user: "ilikecars",
pwd: "ferrari",
customData: {notes: ["application user"]},
roles: [
{role: "carlover", db: "brands"}
]
})
208
Revoke Role
• Let’s assume that the role carlover is no longer valid for user ilikecars.
• How do we revoke this role?
11.5 Authentication
Learning Objectives
Authentication
Authentication Mechanisms
209
Internal Authentication
For internal authentication purposes (mechanism used by replica sets and sharded clusters) MongoDB relies on:
• Keyfiles
– Shared password file used by replica set members
– A value of 6 to 1024 characters from the base64 character set
• X509 Certificates
To get started we just need to make sure we are launching our mongod instances with the --auth parameter.
For any connections to be established to this mongod instance, the system will require a username and password.
mongo -u user -p
Premise
It is time for us to get started setting up our first MongoDB instance with authentication enabled!
Launch mongod
mkdir /data/secure_instance_dbpath
mongod --dbpath /data/secure_instance_dbpath --port 28000
At this point there is nothing special about this setup. It is just an ordinary mongod instance ready to receive connections.
210
Root level user
use admin
db.createUser( {
user: "maestro",
pwd: "maestro+rules",
customData: { information_field: "information value" },
roles: [ {role: "root", db: "admin" } ]
} )
Enable Authentication
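To enable authentication, restart the instance with the --auth option (a minimal sketch, reusing the dbpath and port from above):
mongod --dbpath /data/secure_instance_dbpath --port 28000 --auth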
11.7 Auditing
Learning Objectives
Auditing
• MongoDB Enterprise includes an auditing capability for mongod and mongos instances.
• The auditing facility allows administrators and users to track system activity
• Important for deployments with multiple users and applications.
211
Audit Events
Once enabled, the auditing system can record the following operations:
• Schema
• Replica set and sharded cluster
• Authentication and authorization
• CRUD operations (DML, off by default)
Auditing Configuration
Auditing Message
The audit facility emits a message every time an auditable event occurs:
{
atype: <String>,
ts : { "$date": <timestamp> },
local: { ip: <String>, port: <int> },
remote: { ip: <String>, port: <int> },
users : [ { user: <String>, db: <String> }, ... ],
roles: [ { role: <String>, db: <String> }, ... ],
param: <document>,
result: <int>
}
212
Auditing Configuration
If we want to configure our audit system to generate a JSON file, we would need to express the following command:
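A command along these lines would do it (paths are illustrative; auditing requires MongoDB Enterprise):
mongod --dbpath /data/db \
    --auditDestination file --auditFormat JSON --auditPath /data/db/auditLog.json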
11.8 Encryption
Learning Objectives
Encryption
Network Encryption
• MongoDB enables TLS/SSL for transport layer encryption of traffic between nodes in a cluster.
• Three different network architecture options are available:
– Encryption of application traffic connections
– Full encryption of all connections
– Mixed encryption between nodes
213
Native Encryption
214
11.9 Log Redaction
Learning Objectives
db.adminCommand({
setParameter: 1, redactClientLogData: true
})
For this exercise we’re going to start a mongod process with verbose logging enabled and then enable log redaction
• Start a mongod with verbose logging enabled
mkdir -p data/db
mongod -v --dbpath data/db --logpath data/mongod.log --logappend --port 31000 --fork
tail -f data/mongod.log
215
Exercise: Enable Log Redaction (cont)
• In the log output, you should see something similar to the following:
Premise
Security and Replication are two aspects that are often neglected during the Development phase in favor of usability
and faster development.
These are also important aspects to take into consideration for your Production environments, since you probably don’t
want your production environment unsecured and without High Availability!
This lab is to get fully acquainted with all the necessary steps to create a secured replica set using a keyfile for the
cluster authentication mode.
216
Setup Secured Replica Set
Instantiate mongod
$ pwd
/data
$ mkdir -p /data/secure_replset/{1,2,3}; cd secure_replset/1
systemLog:
destination: file
path: "/data/secure_replset/1/mongod.log"
logAppend: true
storage:
dbPath: "/data/secure_replset/1"
wiredTiger:
engineConfig:
cacheSizeGB: 1
net:
port: 28001
processManagement:
fork: true
# setParameter:
# enableLocalhostAuthBypass: false
# security:
# keyFile: /data/secure_replset/1/mongodb-keyfile
43 https://docs.mongodb.org/manual/reference/configuration-options/
44 https://github.com/thatnerd/work-public/blob/master/mongodb_trainings/secure_replset_config.yaml
217
Instantiate mongod (cont’d)
After defining the basic configuration we just need to call mongod passing the configuration file.
mongod -f mongod.conf
We then need to create a clusterAdmin user to enable management of our replica set.
> db.createUser(
{
user: "pivot",
pwd: "i+like+nodes",
roles: [
{ role: "clusterAdmin", db: "admin" }
]
})
Generate a keyfile
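The standard approach (per the MongoDB documentation) is to generate random base64 content and restrict its permissions:
openssl rand -base64 741 > /data/secure_replset/1/mongodb-keyfile
chmod 600 /data/secure_replset/1/mongodb-keyfile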
218
Add keyfile to the configuration file
Now that we have the keyfile generated it’s time to add that information to our configuration file. Just un-comment the
last few lines.
systemLog:
destination: file
path: "/data/secure_replset/1/mongod.log"
logAppend: true
storage:
dbPath: "/data/secure_replset/1"
net:
port: 28001
processManagement:
fork: true
setParameter:
enableLocalhostAuthBypass: false
security:
keyFile: /data/secure_replset/1/mongodb-keyfile
Premise
• Authentication and authorization with an external service (like LDAP) is an important functionality for large
organizations that rely on centralized user management tools.
• This lab is designed to get you familiar with the procedure to run a mongod with authentication and authorization enabled with an external LDAP service.
219
Test Connection to LDAP (cont’d)
• Your goal is to fill in the following configuration file and get mongoldap to successfully talk to the LDAP
server with the following command:
...
security:
authorization: "enabled"
ldap:
servers: "XXXXXXXXXXXXXX:8389"
authz:
queryTemplate: "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
userToDNMapping: '[{match: "XXXX", substitution: "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"}]'
transportSecurity: "none"
bind:
method: "simple"
setParameter:
authenticationMechanisms: PLAIN
• Once you’ve successfully connected to LDAP with mongoldap you should be able to use the same config file
with mongod.
• From here you should be able to authenticate with alice and secret.
• After successfully authenticating with LDAP, you’ll need to take advantage of the localhost exception to enable
authorization with LDAP.
• Create a role that allows anyone who is a part of the cn=admins,ou=Users,dc=mongodb,dc=com LDAP group
to manage users (e.g., inheriting userAdminAnyDatabase).
• To confirm that you’ve successfully set up authorization, the following command should execute without error if
you’re authenticated as alice, since she’s a part of the group.
220
11.12 Lab: Security Workshop
Learning Objectives
Introduction
In this workshop, attendees will install and configure a secure replica set on servers running in AWS.
• We are going to secure the backend communications using TLS/SSL
• Enable authorization on the backend side
• Encrypt the storage layer
• Make sure that there are no “leaks” of information
221
Exercise: Accessing your instances from Linux or Mac
• Enable the keychain and ssh into node1, propagating your credentials
ssh-add -K AdvancedAdministrator.pem
ssh -i AdvancedAdministrator.pem -A centos@54.235.1.1
ssh -A node2
cat /etc/hosts
ls /share/downloads
ls /etc/ssl/mongodb
222
Starting MongoDB and configuring the replica set (cont)
• Installation
• Configure the 3 nodes as a replica set named SECURED, change bindIp to the 10.0.0.X address, plus 127.0.0.1
• Use /mongod-data/appdb for your dbpath
• All other defaults are fine for now
storage:
dbPath: /mongod-data/appdb/
...
replication:
replSetName: SECURED
net:
bindIp: 10.0.0.101,127.0.0.1
cfg = {
_id: "SECURED",
version: 1,
members: [
{_id: 0, host: "node1:27017"},
{_id: 1, host: "node2:27017"},
{_id: 2, host: "node3:27017"}
]
}
rs.initiate(cfg)
rs.status()
223
Exercise: Check the Connection to MongoDB
It’s time to connect our client application. Install the application on node4
cd ~
tar xzvf /share/downloads/apps/security_lab.tgz
cd mongo-messenger
npm install
npm start
• The connection string used by the application is in message.js and looks like this:
Once we have our sample application up and running, it is time to start securing the system.
You should start by enabling MongoDB authentication47
To do this, you will have to decide:
• Which authentication mechanism to use
• Which authorization support will you use
• Set of users required to operate this system
47 https://docs.mongodb.com/manual/core/authentication/
224
Exercise: Enable SSL between the nodes
• We restricted “bindIp” to a local network interface; however, if this were an outside address, that alone would not
be good enough
• Let’s ensure we limit the connections to a list of nodes we control
– Let’s use SSL certificates
– As a reminder, they are in /etc/ssl/mongodb/
• You will need to combine the mongomessenger.key and mongomessenger.pem files together to
quickly test connection in the mongo shell.
• After you have tested SSL from the mongo shell, update the client’s connection info to connect over SSL48 .
• Use mongomessenger.key, mongomessenger.pem, and messenger-CA.pem for your client con-
nection.
To fully secure our MongoDB deployment we need to consider the actual MongoDB instance files.
Your instructor has some scripts that will enable them to have a peek into your collection and index data files.
Don’t let them do so!
225
Auditing
At this point we have a secured MongoDB deployment hardened against outside attacks, and used Role-Based Access
Control to limit the access of users.
• The final step is to enable auditing, giving us a clear record of who performed an auditable action.
• Enable auditing for all operations, including CRUD operations, for your mongo-messenger user
• Output the log file in JSON format
• Output the log file to /mongod-data/audit/SECURED
• There are many filter options49
Putting it together
storage:
dbPath: /mongod-data/appdb/
...
net:
ssl:
mode: requireSSL
PEMKeyFile: /etc/ssl/mongodb/node1.pem
CAFile: /etc/ssl/mongodb/ca.pem
security:
clusterAuthMode: x509
enableEncryption : true
encryptionKeyFile : /etc/ssl/mongodb/mongodb-keyfile
redactClientLogData: true
auditLog:
destination: "file"
format: "JSON"
path: /mongod-data/audit/SECURED/audit.json
filter: ’{ users: { user: "mongo-messenger", db: "security-lab" } }’
49 https://docs.mongodb.com/manual/tutorial/configure-audit-filters/
226
Summary
What we did:
• Enabled basic authorization
• Used SSL certificates
• Encrypted the database at rest
• Redacted the mongod logs
• Configured auditing for a specific user
227
12 Views
Learning Objectives
What a View is
228
How to create and drop a view
Dropping Views
db.contact_info.drop()
229
12.2 Lab: Vertical Views
It is useful to create vertical views to give us a lens into a subset of our overall data.
• Start by importing the necessary data if you have not already.
To help you verify your work, there are 404816 entries in this dataset.
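Given the findOne() output below, the view was presumably created along these lines (the collection and field names are assumed from that output):
db.createView("companyComplaintsInNY", "complaints", [
    { $match: { state: "NY" } }
])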
db.companyComplaintsInNY.findOne()
{
"complaint_id" : 1416985,
"product" : "Debt collection",
"sub-product" : "",
"issue" : "Cont’d attempts collect debt not owed",
"sub-issue" : "Debt is not mine",
"state" : "NY",
"zip_code" : 11360,
"submitted_via" : "Web",
"date_received" : ISODate("2015-06-11T04:00:00Z"),
"date_sent_to_company" : ISODate("2015-06-11T04:00:00Z"),
"company" : "Transworld Systems Inc.",
"company_response" : "In progress",
"timely_response" : "Yes",
"consumer_disputed" : ""
}
230
Exercise: Vertical View Creation Validation Instructions
db.complaints.insert({
"complaint_id" : 987654,
"product" : "Food and Beverage",
"sub-product" : "Coffee",
"issue" : "Coffee is too hot",
"sub-issue" : "",
"state" : "NY",
"zip_code" : 11360,
"submitted_via" : "Web",
"date_received" : new Date(),
"date_sent_to_company" : "pending",
"company" : "CoffeeMerks",
"company_response" : "",
"timely_response" : "",
"consumer_disputed" : ""
})
Horizontal views allow us to provide a selective set of fields of the underlying collection of documents for efficiency
and role-based filtering of data.
• Let’s go ahead and create a horizontal view of our dataset.
• Start by importing the necessary data if you have not already.
To help you verify your work, there are 404816 entries in this dataset.
231
Exercise: Horizontal View Creation Instructions
Once you’ve verified the data import was successful, create a view that shows only the following fields:
• product
• company
• state
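A minimal sketch, using the view name from the verification call below:

// Horizontal view: expose only three fields, hiding everything else
db.createView("productComplaints", "complaints", [
    { $project: { _id: 0, product: 1, company: 1, state: 1 } }
])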
db.productComplaints.findOne()
{
"product" : "Debt collection",
"state" : "FL",
"company" : "Enhanced Recovery Company, LLC"
}
We can create a reshaped view of a collection to enable more intuitive data queries and make it easier for applications to perform analytics.
It is also possible to create a view from a view.
• Use the aggregation framework to create a reshaped view of our dataset.
• It is necessary to have completed Lab: Horizontal Views (page 231) first.
Create a view that can be queried by company name and that shows the number of complaints per state. The resulting data should look like:
{
"company" : "ROCKY MOUNTAIN MORTGAGE COMPANY",
"states" : [
{
"state" : "TX",
"count" : 4
}
]
}
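One possible pipeline, built on the productComplaints view from the previous lab (the new view’s name here is only a suggestion):

db.createView("complaintsByCompany", "productComplaints", [
    // Count complaints per (company, state) pair
    { $group: { _id: { company: "$company", state: "$state" },
                count: { $sum: 1 } } },
    // Collect the per-state counts under each company
    { $group: { _id: "$_id.company",
                states: { $push: { state: "$_id.state", count: "$count" } } } },
    // Reshape to match the expected output
    { $project: { _id: 0, company: "$_id", states: 1 } }
])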
232
13 Reporting Tools and Diagnostics
Performance Troubleshooting (page 233) An introduction to reporting and diagnostic tools for MongoDB
Learning Objectives
Upon completing this module students should understand basic performance troubleshooting techniques and tools
including:
• mongostat
• mongotop
• db.setProfilingLevel()
• db.currentOp()
• db.<COLLECTION>.stats()
• db.serverStatus()
// Workload generator: 10,000 batches of 1,000 documents each, with a
// point query, an _id lookup, and an update after every batch.
db.testcol.drop()
for (i=1; i<=10000; i++) {
    arr = [];
    for (j=1; j<=1000; j++) {
        doc = { _id: (1000 * (i-1) + j), a: i, b: j, c: (1000 * (i-1) + j) };
        arr.push(doc)
    };
    db.testcol.insertMany(arr);
    var x = db.testcol.find( { b : 255 } );                  // unindexed query
    x.next();
    var x = db.testcol.find( { _id : 1000 * (i-1) + 255 } ); // _id index query
    x.next();
    var x = "asdf";
    // pad() is a mongo shell string helper; it grows the field to 1,000 characters
    db.testcol.updateOne( { a : i, b : 255 }, { $set : { d : x.pad(1000) } });
    print(i)
}
233
Exercise: mongostat (run)
• In a third window, create an index when you see things slowing down:
db.testcol.createIndex( { a : 1, b : 1 } )
• Look at mongostat.
• Notice that things are going significantly faster.
• Then, let’s drop that and build another index.
db.testcol.dropIndexes()
db.testcol.createIndex( { b : 1, a : 1 } )
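If mongostat is not already running in another window, a typical invocation is (the trailing argument is the sampling interval in seconds):

mongostat 2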
Exercise: mongotop
db.testcol.drop()
for (i=1; i<=10000; i++) {
arr = [];
for (j=1; j<=1000; j++) {
doc = {_id: (1000*(i-1)+j), a: i, b: j, c: (1000*(i-1)+j)};
arr.push(doc)
};
db.testcol.insertMany(arr);
var x = db.testcol.find( {b: 255} ); x.next();
var x = db.testcol.find( {_id: 1000*(i-1)+255} ); x.next();
var x = "asdf";
db.testcol.updateOne( {a: i, b: 255}, {$set: {d: x.pad(1000)}});
print(i)
}
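While the script runs, watch per-collection read and write time from another window; as with mongostat, the trailing argument is the sampling interval in seconds:

mongotop 2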
234
db.currentOp()
Exercise: db.currentOp()
Do the following, then connect with a separate shell and repeatedly run db.currentOp().
db.testcol.drop()
for (i=1; i<=10000; i++) {
arr = [];
for (j=1; j<=1000; j++) {
doc = {_id: (1000*(i-1)+j), a: i, b: j, c: (1000*(i-1)+j)};
arr.push(doc)
};
db.testcol.insertMany(arr);
var x = db.testcol.find( {b: 255} ); x.next();
var x = db.testcol.find( {_id: 1000*(i-1)+255 }); x.next();
var x = "asdf";
db.testcol.updateOne( {a: i, b: 255}, {$set: {d: x.pad(1000)}});
print(i)
}
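db.currentOp() also accepts a filter document; a sketch that narrows the output to the test collection (assuming it lives in the default test database):

// Show only active operations against test.testcol
db.currentOp( { active: true, ns: "test.testcol" } )

// A runaway operation can then be killed by its opid:
// db.killOp(<opid>)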
db.<COLLECTION>.stats()
235
Exercise: Using Collection Stats
db.testcol.drop()
db.testcol.insertOne( { a : 1 } )
db.testcol.stats()
var x = "asdf"
db.testcol2.insertOne( { a : x.pad(10000000) } ) // ~10 MB document
db.testcol2.stats()
db.stats()
The Profiler
• Off by default.
• At level 1, it captures “slow” operations.
• You may define what “slow” means; the default is 100 ms: db.setProfilingLevel(1)
• E.g., to capture operations over 20 ms: db.setProfilingLevel(1, 20)
• At level 2, it captures all operations: db.setProfilingLevel(2)
• To turn profiling off again: db.setProfilingLevel(0)
db.setProfilingLevel(0)
db.testcol.drop()
db.system.profile.drop()
db.setProfilingLevel(2)
db.testcol.insertOne( { a : 1 } )
db.testcol.find()
var x = "asdf"
db.testcol.insertOne( { a : x.pad(10000000) } ) // ~10 MB
db.setProfilingLevel(0)
db.system.profile.find().pretty()
236
db.serverStatus()
db.testcol.drop()
var x = "asdf"
// Sustained write load; interrupt with Ctrl-C once you have observed enough
for (i=0; i<=10000000; i++) {
    db.testcol.insertOne( { a : x.pad(100000) } )   // ~100 KB per document
}
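While the load runs, db.serverStatus() can be sampled from another shell. A few commonly inspected sections of its output:

db.serverStatus().opcounters     // cumulative insert/query/update/delete counters
db.serverStatus().mem            // resident and virtual memory usage
db.serverStatus().connections    // current and available connections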
237
Performance Tips: Write Concern
Bulk Operations
• Bulk operations (such as insertMany and updateMany) can improve performance, especially when using a write concern greater than 1.
• They enable the server to amortize the cost of acknowledgement across many documents.
mkdir -p /data/replset/{1,2,3}
mongod --logpath /data/replset/1/mongod.log \
--dbpath /data/replset/1 --replSet mySet --port 27017 --fork
mongod --logpath /data/replset/2/mongod.log \
--dbpath /data/replset/2 --replSet mySet --port 27018 --fork
mongod --logpath /data/replset/3/mongod.log \
--dbpath /data/replset/3 --replSet mySet --port 27019 --fork
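Before running the comparisons below, the three mongods still need to be joined into a replica set. A minimal initiation, assuming all three run on localhost:

// From a shell connected to port 27017
rs.initiate( {
    _id: "mySet",
    members: [
        { _id: 0, host: "localhost:27017" },
        { _id: 1, host: "localhost:27018" },
        { _id: 2, host: "localhost:27019" }
    ]
} )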
// Baseline: one insertOne per document with { w : 1 }
db.testcol.drop()
for (i=1; i<=10000; i++) {
    for (j=1; j<=1000; j++) {
        db.testcol.insertOne( { _id : (1000 * (i-1) + j),
                                a : i, b : j, c : (1000 * (i-1) + j) },
                              { writeConcern : { w : 1 } } );
    };
    print(i);
}
238
Multiple insertOnes with { w : 3 }
db.testcol.drop()
for (i=1; i<=10000; i++) {
for (j=1; j<=1000; j++) {
db.testcol.insertOne(
{ _id: (1000 * (i-1) + j), a: i, b: j, c: (1000 * (i-1)+ j)},
{ writeConcern: { w: 3 } }
);
};
print(i);
}
The same workload issued as a single insertMany per batch, still with { w : 3 }:
db.testcol.drop()
for (i=1; i<=10000; i++) {
arr = []
for (j=1; j<=1000; j++) {
arr.push(
{ _id: (1000 * (i-1) + j), a: i, b: j, c: (1000 * (i-1)+ j) }
);
};
db.testcol.insertMany( arr, { writeConcern : { w : 3 } } );
print(i);
}
239
Schema Design
• Reads and writes that don’t use an index will cripple performance.
• In compound indexes, order matters (see the sketch after this list):
– Sort on a field that comes before any range used in the index.
– You can’t skip fields; they must be used in order.
– Revisit the indexing section for more detail.
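A sketch of the ordering rules, using a hypothetical orders collection and index:

// Equality field (status) first, then the sort field (orderDate),
// then the range field (total)
db.orders.createIndex( { status: 1, orderDate: -1, total: 1 } )

// This query can use the index for the filter, the sort, and the range:
db.orders.find( { status: "shipped", total: { $gt: 100 } } ).sort( { orderDate: -1 } )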
240
14 Backup and Recovery
Backup and Recovery (page 241) An overview of backup options for MongoDB
Disasters Do Happen
241
Human Disasters
• Recovery Point Objective (RPO): How much data can you afford to lose?
• Recovery Time Objective (RTO): How long can you afford to be offline?
Terminology: DR vs. HA
Quiz
242
Backup Options
• Document Level
– Logical
– mongodump, mongorestore
• File system level
– Physical
– Copy files
– Volume/disk snapshots
mongodump
$ mongodump --help
Export MongoDB data to BSON files.

options:
  --help                   produce help message
  -v [ --verbose ]         be more verbose (include multiple times for
                           more verbosity, e.g. -vvvvv)
  --version                print the program's version and exit
  -h [ --host ] arg        mongo host to connect to (<set name>/s1,s2 for sets)
  --port arg               server port (can also use --host hostname:port)
  -u [ --username ] arg    username
  -p [ --password ] arg    password
  --dbpath arg             directly access mongod database files in path
  -d [ --db ] arg          database to use
  -c [ --collection ] arg  collection to use (some commands)
  -o [ --out ] arg (=dump) output directory, or "-" for stdout
  -q [ --query ] arg       json query
  --oplog                  use oplog for point-in-time snapshotting
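A sketch of a point-in-time dump and restore (port and output path are placeholders; note that --oplog applies only when dumping all databases):

# Dump all databases, capturing oplog entries written during the dump
mongodump --port 27017 --oplog -o /backups/nightly

# Restore, replaying the captured oplog for a consistent point in time
mongorestore --port 27017 --oplogReplay /backups/nightly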
243
File System Level
Ensure Consistency
• Entire database
• Backup files will be large
• Fastest way to create a backup
• Fastest way to restore a backup
• mongorestore
• --oplogReplay replays the oplog to a point in time
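For file-system copies of a running mongod, writes can be flushed and blocked for the duration of the copy to ensure consistency; a minimal sketch from the mongo shell:

// Flush pending writes to disk and block new writes
db.fsyncLock()

// ... copy the dbPath, or take the volume/disk snapshot, from the OS ...

// Unblock writes once the copy completes
db.fsyncUnlock()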
244
Backup Sharded Cluster
• mongodump/mongorestore
– --oplog[Replay]
– --objcheck/--repair
– --dbpath
– --query/--filter
• bsondump
– inspect dumped BSON data at the console
• LVM snapshot time/space tradeoff
– Multi-EBS (RAID) backup
– clean up snapshots
245
15 MongoDB Atlas, Cloud & Ops Manager Fundamentals
MongoDB Cloud & Ops Manager (page 246) Learn about what Cloud & Ops Manager offers
Automation (page 248) Cloud & Ops Manager Automation
Lab: Cluster Automation (page 251) Set up a cluster with Cloud Manager Automation
Monitoring (page 252) Monitor a cluster with Cloud Manager
Lab: Create an Alert (page 254) Create an alert on Cloud Manager
Backups (page 254) Use Cloud Manager to create and administer backups
Learning Objectives
Deployment Options
246
Architecture
Cloud Manager
Ops Manager
247
Cloud & Ops Manager Use Cases
15.2 Automation
Learning Objectives
What is Automation?
248
Automation Agents
An administrator wants to create a 100-shard sharded cluster, with each shard a 3-node replica set:
• The administrator installs the automation agent on 300 servers
• The cluster environment/topology is created in Cloud / Ops Manager, then deployed to the agents
• The agents execute instructions until the 100-shard cluster is complete (usually several minutes)
• Upgrades without automation can be a manually intensive process (e.g., across 300 servers)
• Scripting upgrades involves many edge cases (e.g., one shard has problems, or one replica set is running mixed versions)
• With Cloud / Ops Manager Automation, the entire cluster can be upgraded with one click
249
Automation: Behind the Scenes
{
"groupId": "55120365d3e4b0cac8d8a52a737",
"state": "PUBLISHED",
"version": 4,
"cluster": { ...
Configuration File
When the version number of the configuration file on Cloud / Ops Manager is greater than the local version, the agent begins making a plan to implement the changes:
"replicaSets": [
{
"_id": "shard_0",
"members": [
{
"_id": 0,
"host": "DemoCluster_shard_0_0",
"priority": 1,
"votes": 1,
"slaveDelay": 0,
"hidden": false,
"arbiterOnly": false
},
...
250
Automation Goal State
The automation agent is considered to have reached goal state once all cluster changes relevant to that agent have been implemented.
Demo
• The instructor will demonstrate using Automation to set up a small cluster locally.
• Reference documentation:
• The Automation Agent50
• The Automation API51
• Configuring the Automation Agent52
Learning Objectives
Exercise #1
Create a cluster using Cloud Manager automation with the following topology:
• 3 shards
• Each shard is a 3 node replica set (2 data bearing nodes, 1 arbiter)
• Version 2.6.8 of MongoDB
• To conserve space, set “smallfiles” = true and “oplogSize” = 10
50 https://docs.cloud.mongodb.com/tutorial/nav/automation-agent/
51 https://docs.cloud.mongodb.com/api/
52 https://docs.cloud.mongodb.com/reference/automation-agent/
251
Exercise #2
15.4 Monitoring
Learning Objectives
• Alert on performance issues, to catch them before they turn into an outage
• Diagnose performance problems
• Historical performance analysis
• Monitor cluster health
• Capacity planning and scaling requirements
Monitoring Agent
252
Agent Configuration
Agent Security
Monitoring Demo
Visit https://www.mongodb.com/cloud
• Add charts to the view by clicking a chart’s name at the bottom of the host’s page
• Click the “i” icon next to each chart title to learn what the chart means
• Click and drag across a chart to zoom in
Metrics
Alerts
253
15.5 Lab: Create an Alert
Learning Objectives
Exercise #1
Create an alert through Cloud Manager for any node within your cluster that is down.
After the alert has been created, stop a node within your cluster to verify the alert.
15.6 Backups
Learning Objectives
• mongodump
• File system backups
• Cloud / Ops Manager Backups
254
Cloud / Ops Manager Backups
Architecture
255
Snapshotting
Backup Agent
256
16 MongoDB Cloud & Ops Manager Under the Hood
API (page 257) Using the Cloud & Ops Manager API
Lab: Cloud Manager API (page 258) Cloud & Ops Manager API exercise
Architecture (Ops Manager) (page 259) Ops Manager
Security (Ops Manager) (page 261) Ops Manager Security
Lab: Install Ops Manager (page 262) Install Ops Manager
16.1 API
Learning Objectives
API Documentation
https://docs.mms.mongodb.com/core/api/
257
Ingest Monitoring Data
The monitoring API can be used to ingest monitoring data into another system, such as Nagios, HP OpenView, or your
own internal dashboard.
Use the backup API to programmatically restore an integration or testing environment based on the last production
snapshot.
Configuration Management
Use the automation API to integrate with existing configuration management tools (such as Chef or Puppet) to automate creating and maintaining environments.
Learning Objectives
If Ops Manager is installed, it may be used in place of Cloud Manager for this exercise.
Exercise #1
Exercise #2
Modify and run the following curl command to return alerts for your Cloud Manager group:
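A representative call against the public API (username, API key, and group ID are placeholders):

curl --user "username:apiKey" --digest \
     "https://cloud.mongodb.com/api/public/v1.0/groups/GROUP-ID/alerts"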
258
Exercise #3
How would you find metrics for a given host within your Cloud Manager account? Create an outline for the API calls
needed.
Learning Objectives
Components
Architecture
Application Server
259
Application Database
Backup Infrastructure
Backup Database
• 3 sections:
– blockstore for blocks
– oplog
– sync for initial sync slices
• Usually a replica set; a standalone MongoDB node can also be used
• Must be sized carefully
• All snapshots are stored here
• Block-level de-duplication: the same block isn’t stored twice (significantly reduces database size for deployments with low/moderate writes)
260
Backup Daemon Process
Learning Objectives
• LDAP
• MongoDB-CR
• Kerberos (Linux only)
261
Encrypting Communications
• Read Only
• User Admin
• Monitoring Admin
• Backup Admin
• Automation Admin
• Owner
Learning Objectives
262
Install Ops Manager
Exercise #1
Prepare your environment for running all Ops Manager components: Monitoring, Automation, and Backups
• Set up a 3 node replica set for the Ops Manager application database (2 data bearing nodes, 1 arbiter)
• Set up a 3 node replica set for Ops Manager backups (2 data bearing nodes, 1 arbiter)
• Verify both replica sets have been installed and configured correctly
Exercise #2
Exercise #3
263
Exercise #4
Exercise #5
264
17 Introduction to MongoDB BI Connector
Learning Objectives
The MongoDB Connector for BI enables the execution of SQL statements against a MongoDB server.
It is a native connector implementation that enables Business Intelligence tools to read data from a MongoDB server.
How it works
265
BI Connector Package
The mongodrdl Schema
schema:
- db: <database name>
  tables:
  - table: <SQL table name>
    collection: <MongoDB collection name>
    pipeline:
    - <optional pipeline elements>
    columns:
    - Name: <MongoDB field name>
      MongoType: <MongoDB field type>
      SqlName: <mapped SQL column name>
      SqlType: <mapped SQL column type>
mongodrdl Example
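A sketch of generating a schema file from an existing database (host, database name, and output path are placeholders):

# Sample the collections in the training database and write a DRDL file
mongodrdl --host localhost --port 27017 --db training --out training.drdl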
266
Custom Filtering
mongodrdl allows you to define a --customFilter field for cases where we need to express native MongoDB queries from within our SQL query expression.
mongosqld Daemon
When authentication is enabled, clients pass the authentication options as part of the username, e.g.: grace?mechanism=PLAIN&source=$external
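A minimal sketch of starting the daemon with a schema file (schema path and connection URI are placeholders):

# Serve the DRDL mapping, proxying SQL queries to a local mongod
mongosqld --schema training.drdl --mongo-uri mongodb://localhost:27017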
mongosqld Encryption
267
SQL Compatibility53
• This means we can use a SQL client like mysql to query data on MongoDB
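Connecting with the mysql client (mongosqld listens on 127.0.0.1:3307 by default):

mysql --host 127.0.0.1 --port 3307

Then, at the mysql prompt: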
use training;
SELECT * FROM zips;
53 https://docs.mongodb.com/bi-connector/master/supported-operations/
268
Find out more: mongodb.com | mongodb.org | university.mongodb.com
Having trouble? File a JIRA ticket: jira.mongodb.org
Follow us on twitter: @MongoDB | @MongoDBInc