MongoDB backup and restore

Backup and restore are important routine jobs. If you are working with MongoDB then you must know how to take backups as well as how to restore them. MongoDB comes with wonderful command line tools for backup and restore, i.e. mongodump and mongorestore.

I will not explain every backup and restore method here; you can refer to the MongoDB documentation on backup and restore strategies for more insight. What I am going to show you are the most common and simple ways to back up and restore MongoDB.

Copying data Files/Folder
The simplest way to back up or restore MongoDB is to copy the database folder/files. To take a backup, copy the database folder/files to your backup location. To restore, copy the contents of your backup folder to the database folder.


CRUD operations in MongoDB

The acronym ‘CRUD’ stands for Create, Read, Update and Delete. In this section, we will learn basic CRUD operations. CRUD operations can be extremely complex sometimes but we will not go into the complex stuff for two reasons. First, it is not under the scope of this chapter. Second, we will be doing a lot of CRUD operations in the following chapters.

Before we start learning CRUD, I must tell you that we will be using the mongo shell for CRUD in the following sections.

Create or insert document
Data is added to MongoDB in the form of documents. To add a document, we pass it to the insert() function. Let’s see an example –


Indexes in MongoDB

MongoDB provides different kinds of indexes. Indexes are very important when the dataset is big: you cannot keep scanning the whole collection on disk again and again for every query. But with a huge dataset, chances are that one type of index will not fit everything, as the nature of the data differs. We must have indexes that match the nature of the data as well as the nature of the queries. Let’s see the different types of indexes MongoDB provides.

Default index
MongoDB has a default unique index on the ‘_id’ field in every collection. Because this index is unique, two documents with the same ‘_id’ value cannot co-exist in a collection. This is similar to a primary key in relational databases. I am sure you remember ‘_id’, as we discussed it in previous chapters. When a new document is inserted into a collection, MongoDB adds the ‘_id’ field if the user/application does not provide one.

As this is the default index in MongoDB, you don’t have to create an index on ‘_id’. MongoDB already does the hard work for you.

Single Field Indexes
MongoDB allows single field indexes. This is the most common type of index, and MongoDB’s default index on ‘_id’ is also a single field index. This type of index is based on a single field, which can be at the top level or inside an embedded document. Let’s see a few examples –

Here is our sample document on which we will perform some index operations.

{
"_id" : ObjectId(...),
"name" : "Andrew",
"age" : 25,
"address" : {
"city" : "New York",
"zipcode" : "10037"
}
}

We can create single field index on any top-level fields like this –

> db.mycoll.ensureIndex({ 'name' : 1 })
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}

Similarly, we can create an index on ‘age’. But the interesting thing is that we can also create an index on ‘zipcode’. Wondering how? See this.

> db.mycoll.ensureIndex({ 'address.zipcode' : 1 })
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 2,
"numIndexesAfter" : 3,
"ok" : 1
}

Let’s check that the index was created.

> db.mycoll.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "mydb.mycoll"
},
{
"v" : 1,
"key" : {
"name" : 1
},
"name" : "name_1",
"ns" : "mydb.mycoll"
},
{
"v" : 1,
"key" : {
"address.zipcode" : 1
},
"name" : "address.zipcode_1",
"ns" : "mydb.mycoll"
}
]

Bingo! With dot notation you can create an index on fields inside an embedded document.
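
To make the dot notation a little more concrete, here is a small sketch in plain JavaScript (it runs in Node.js, no MongoDB needed). The helper getByPath() is my own invention for illustration and not a MongoDB function; it only shows which value a dotted path like 'address.zipcode' points at inside our sample document.

```javascript
// Hypothetical helper (not part of MongoDB): follow a dotted path
// into a nested document and return the value found there.
function getByPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

// Same shape as the sample document used in this section.
const sampleDoc = {
  name: "Andrew",
  age: 25,
  address: { city: "New York", zipcode: "10037" },
};

console.log(getByPath(sampleDoc, "name"));            // top-level field
console.log(getByPath(sampleDoc, "address.zipcode")); // field inside the embedded document
```

The value found at the end of the path is what the index entry for that document is built from.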

If that is not interesting enough for you, let me tell you that you can index a whole embedded document. In our example, you can also create an index on the ‘address’ field.

> db.mycoll.ensureIndex({ 'address' : 1 })
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 3,
"numIndexesAfter" : 4,
"ok" : 1
}

> db.mycoll.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "mydb.mycoll"
},
{
"v" : 1,
"key" : {
"name" : 1
},
"name" : "name_1",
"ns" : "mydb.mycoll"
},
{
"v" : 1,
"key" : {
"address.zipcode" : 1
},
"name" : "address.zipcode_1",
"ns" : "mydb.mycoll"
},
{
"v" : 1,
"key" : {
"address" : 1
},
"name" : "address_1",
"ns" : "mydb.mycoll"
}
]

Single field indexes are very common. MongoDB will use these indexes if the query object contains a field that has an index. Considering the indexes we just made, here is how MongoDB handles the following queries –

> db.mycoll.find({ 'name' : 'Andrew' })

As we have an index on ‘name’, MongoDB will use it for this query.

> db.mycoll.find({ 'age' : 30 })

As we don’t have an index on ‘age’, MongoDB will scan the whole collection to return matching results.

> db.mycoll.find({ 'address.zipcode' : '110032' })

As we have an index on the nested field ‘address.zipcode’, MongoDB will use this index for this query.

Compound Indexes
In MongoDB, you can create an index on more than one field. This kind of index is very useful for certain types of queries, and is also very common. Creating a compound index is very simple once you know how to make a single field index: you just pass more than one field to the ensureIndex() method. Here is an example –

> use mydb
> db.mycoll.ensureIndex({ 'name' : 1, 'age' : 1 })
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 4,
"numIndexesAfter" : 5,
"ok" : 1
}

In this example, we created a compound index on the ‘name’ and ‘age’ fields. Let’s see what a compound index looks like.

> db.mycoll.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "mydb.mycoll"
},
{
"v" : 1,
"key" : {
"name" : 1,
"age" : 1
},
"name" : "name_1_age_1",
"ns" : "mydb.mycoll"
}
]

You can see the new index has both fields.

MongoDB can use a compound index for queries that include the indexed fields in their query object. For example, with the index we just created, any query that has ‘name’, or ‘name’ plus ‘age’, in its query object can use this index. The important thing here is the order of the fields in the index: for a query to use this index, it must include the index prefix. Let me give you an example.

For our index, i.e. { 'name' : 1, 'age' : 1 }, the following queries will use the index –

> db.mycoll.find({ 'name' : 'Andrew' })

> db.mycoll.find({ 'name' : 'Andrew', 'age' : 30 })

But the following query will not use this index, as the prefix, in this case ‘name’, is not provided.

> db.mycoll.find({ 'age' : 30 })
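
The prefix rule can be sketched in a few lines of plain JavaScript (runnable in Node.js). This is only a toy model of the rule described above, not how MongoDB's query planner is actually written, and the function name usablePrefixLength() is my own. It counts how many leading fields of the index the query provides; zero means the prefix is missing.

```javascript
// Toy model of the prefix rule (not MongoDB's real planner):
// walk the index fields in order and stop at the first one
// that the query does not mention.
function usablePrefixLength(indexFields, query) {
  let count = 0;
  for (const field of indexFields) {
    if (!(field in query)) break;
    count += 1;
  }
  return count;
}

// Field order of the compound index { 'name' : 1, 'age' : 1 }.
const nameAgeIndex = ["name", "age"];

console.log(usablePrefixLength(nameAgeIndex, { name: "Andrew" }));          // prefix 'name' present
console.log(usablePrefixLength(nameAgeIndex, { name: "Andrew", age: 30 })); // both fields present
console.log(usablePrefixLength(nameAgeIndex, { age: 30 }));                 // prefix missing
```

Note that it is the order of fields in the index that matters, not the order in which you write them in the query object.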

Multikey Indexes
MongoDB allows you to index a field that holds an array value. This is called a multikey index because each element in the array gets a separate index entry. Though this can be useful in specific cases, it is not one of the more popular index types in MongoDB. There is no special way to create a multikey index: when you create a single field index on a field that contains an array value, it is called a multikey index. Suppose you have the following document in your collection.

{
"_id" : ObjectId(...),
"name" : "Andrew",
"age" : 25,
"address" : {
"city" : "New York",
"zipcode" : "10037",
"hobbies" : [ "sports", "music", "reading" ]
}
}

Let’s create a multikey index now.

> db.mycoll.ensureIndex({ "address.hobbies" : 1 })
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 6,
"numIndexesAfter" : 7,
"ok" : 1
}

As ‘address.hobbies’ holds an array, this index is a multikey index. It has separate entries for ‘sports’, ‘music’ and ‘reading’, so queries for any of these values may use it. How can we prove this? Let’s run a query with the explain() method to see which index is used.

> db.mycoll.find({ "address.hobbies" : "music"}).explain()
{
"cursor" : "BtreeCursor address.hobbies_1",
"isMultiKey" : true,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 1,
"nscannedAllPlans" : 1,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"address.hobbies" : [
[
"music",
"music"
]
]
},
"server" : "localhost:27017",
"filterSet" : false
}

We will see the explain() method in detail later, but for now let me show you a few things. Firstly, you can see the “cursor” field says “BtreeCursor address.hobbies_1”. This means it used the index “address.hobbies_1”. Secondly, notice the “isMultiKey” field that says “true”. I guess this is self-explanatory.
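
If you are wondering what "a separate index entry for each element" means, here is a rough sketch in plain JavaScript (runnable in Node.js). It is a simplified picture, not MongoDB's actual index code, and indexKeysFor() is a name I made up. For a normal field it yields one key per document; for an array field it yields one key per element.

```javascript
// Simplified picture of multikey behaviour (not MongoDB's real code):
// return the list of index keys one document would contribute.
function indexKeysFor(doc, path) {
  const value = path
    .split(".")
    .reduce((v, key) => (v == null ? undefined : v[key]), doc);
  // An array contributes one key per element; anything else contributes one key.
  return Array.isArray(value) ? value.slice() : [value];
}

// Same shape as the sample document above.
const person = {
  name: "Andrew",
  age: 25,
  address: {
    city: "New York",
    zipcode: "10037",
    hobbies: ["sports", "music", "reading"],
  },
};

console.log(indexKeysFor(person, "name"));            // one key
console.log(indexKeysFor(person, "address.hobbies")); // three keys, one per hobby
```

That is why a query for "music" alone can be answered from the index: "music" has its own entry pointing back to the document.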

Text Indexes
MongoDB provides text search on fields that contain a string or an array of strings. This is really interesting. To make a field searchable, we need to create a text index on it. Just create the index and MongoDB will take care of the rest. For example, say you have documents in a collection that look something like this –

{
"author" : "Tom Smith",
"article" : "This is the super impressive article I wrote today.",
"tags" : ["history","news", "world"]
}

Now you want to make the “article” field searchable so that you can find the documents where “article” contains a particular word, let’s say “today”. For this we first need to create a text index on “article”. Let’s do it now.

> db.mycoll.ensureIndex({ "article" : "text" })

Now let’s try to find something. To do a search we need to use the “$text” operator in conjunction with “$search”. Here is an example where we search for “today” in the “article” field –

> db.mycoll.find({ "$text" : { "$search" : "today"} })
{
"author" : "Tom Smith",
"article" : "This is the super impressive article I wrote today.",
"tags" : ["history","news", "world"]
}

Hashed Indexes
MongoDB provides hashed indexes, which index a hash of the field’s value rather than the value itself. This type of index is also not used much; the most common use case is hash-based sharding. Hashed indexes don’t work with fields that hold array values. A hashed index can be created like this –

> db.mycoll.ensureIndex({ "name" : "hashed" })

Now let’s see it in use.

> db.mycoll.find({ 'name' : 'Andrew'}).explain()
{
"cursor" : "BtreeCursor name_hashed",
"isMultiKey" : false,
"n" : 2,
"nscannedObjects" : 2,
"nscanned" : 2,
"nscannedObjectsAllPlans" : 2,
"nscannedAllPlans" : 2,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"name" : [
[
NumberLong("4176813871854105885"),
NumberLong("4176813871854105885")
]
]
},
"server" : "localhost:27017",
"filterSet" : false
}

You can see that it used BtreeCursor “name_hashed”, which shows that it used our hashed index. Also, in “indexBounds” you can see the hash range it used for the query.
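
To get a feel for what happens here, below is a toy hashed index in plain JavaScript (runnable in Node.js). Please treat it as a sketch only: toyHash() is a simple stand-in hash that I picked for illustration, not the hash function MongoDB uses, and the Map stands in for the real index structure. The idea is the same though: the index stores hashes, and an equality query hashes the searched value and looks that hash up.

```javascript
// Stand-in hash for illustration only (32-bit FNV-1a);
// MongoDB uses its own hash function, not this one.
function toyHash(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Hypothetical collection contents.
const people = [{ name: "Andrew" }, { name: "Maria" }, { name: "Andrew" }];

// Build the toy index: hash of 'name' -> positions of matching documents.
const hashedIndex = new Map();
people.forEach((doc, position) => {
  const key = toyHash(doc.name);
  if (!hashedIndex.has(key)) hashedIndex.set(key, []);
  hashedIndex.get(key).push(position);
});

// Equality lookup: hash the searched value, fetch candidates,
// then re-check the real value in case two names share a hash.
function findByName(name) {
  const positions = hashedIndex.get(toyHash(name)) || [];
  return positions.map((p) => people[p]).filter((doc) => doc.name === name);
}

console.log(findByName("Andrew").length); // documents named Andrew
```

Because only the hash is stored, this kind of index is good for exact matches; the hashes say nothing about the alphabetical order of the original names.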

Geospatial indexes
MongoDB provides an easy and efficient way to handle location data. You can store location data in legacy form or GeoJSON form. A lot of interesting things are possible with location data, but due to the limited scope of this book we will restrict ourselves to index creation only. Geospatial queries are outside the scope of this chapter.

There are two types of geospatial indexes that are used the most, and we are going to discuss them here briefly.

2d Index
This type of index supports only the legacy coordinate pairs. Let’s see how to create a 2d index.

> db.mycoll.createIndex({ "loc" : "2d" })

The above command will create a “2d” index on a field called “loc”. The field name here is “loc”, but you can use any other name; the field containing location data does not have to be called “loc”.

The 2d index has a few other options, like the min/max boundaries and precision bits. I am not going into detail as those are not used very often. You can refer to the MongoDB documentation for details.

2dsphere Index
This index supports both legacy and GeoJSON data. You can have a 2dsphere index on a field that holds GeoJSON formatted data. This index is more commonly used and suits most use cases.

The index creation process is very similar. Here is an example –

> db.mycoll.createIndex({ "loc" : "2dsphere" })

You can also create compound indexes that include a geospatial index; geospatial indexes don’t stop you from doing so.
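
In case the difference between legacy and GeoJSON data is not clear, here is a small sketch in plain JavaScript (runnable in Node.js) of what the two shapes look like in a document. The place name, the coordinates and the helper locationFormat() are all made up for illustration. A legacy coordinate pair is just an array of two numbers, while a GeoJSON point is an object with “type” and “coordinates”, where the coordinates are in longitude, latitude order.

```javascript
// Legacy coordinate pair: a plain two-number array.
const legacyPlace = { name: "Some park", loc: [-73.97, 40.77] };

// GeoJSON Point: coordinates are [longitude, latitude].
const geoJsonPlace = {
  name: "Some park",
  loc: { type: "Point", coordinates: [-73.97, 40.77] },
};

// Hypothetical helper: tell the two shapes apart.
function locationFormat(value) {
  if (
    Array.isArray(value) &&
    value.length === 2 &&
    value.every((n) => typeof n === "number")
  ) {
    return "legacy pair";
  }
  if (
    value !== null &&
    typeof value === "object" &&
    typeof value.type === "string" &&
    Array.isArray(value.coordinates)
  ) {
    return "GeoJSON";
  }
  return "unknown";
}

console.log(locationFormat(legacyPlace.loc));
console.log(locationFormat(geoJsonPlace.loc));
```

As described above, the first shape is what a 2d index expects, while a 2dsphere index supports both.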


MongoDB GridFS using Python

Data is collected and stored in different forms depending on its nature. It can be text, images, videos, audio files and files in other formats. MongoDB can be used for storing all kinds of data, but so far we have used it for storing plain text information in MongoDB documents. As you know by now, a MongoDB document has a size limit of 16MB. Though 16MB is good enough in most cases, it looks tiny if you think of storing high-resolution images, PDF files, music, videos etc.

In this chapter, we are going to see how we can store more information in MongoDB. Till now we played around with storing text in JSON format but in this chapter we’ll see how to store images, PDFs, audio, video etc. I am sure you are going to enjoy this chapter. So, let’s get started!
