Indexes in MongoDB

MongoDB provides several kinds of indexes. Indexes are vital when the dataset is big, because without one MongoDB has to scan the whole collection for every query. On a large dataset, a single type of index rarely fits every case, as the nature of the data differs. We need indexes that match both the kind of data and the nature of the queries. Let’s see the different types of indexes MongoDB provides.

Default index

MongoDB creates a default unique index on the _id field of every collection. Because the index is unique, two documents with the same _id value cannot co-exist in a collection. It is similar to a primary key in relational databases. When a new document is inserted, MongoDB adds the _id field if the document supplied by the user or application does not already have one.

As this is the default index in MongoDB, you don’t have to create an index on _id. MongoDB already does the hard work for you.
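To make the two behaviours above concrete, here is a small in-memory sketch in plain JavaScript. It is not MongoDB code: TinyCollection and its counter-based ids are hypothetical stand-ins (real MongoDB generates ObjectId values), and the Map only plays the role of the unique _id index.

```javascript
// Toy in-memory model of a collection's default _id index.
// TinyCollection is a hypothetical class, not part of MongoDB.
class TinyCollection {
  constructor() {
    this.byId = new Map(); // plays the role of the unique _id index
    this.counter = 0;      // stand-in for ObjectId generation
  }

  // Produce an id that is not already present in the index.
  generateId() {
    do {
      this.counter += 1;
    } while (this.byId.has(this.counter));
    return this.counter;
  }

  insert(doc) {
    // Add an _id only when the caller did not supply one.
    const stored = "_id" in doc ? { ...doc } : { _id: this.generateId(), ...doc };
    if (this.byId.has(stored._id)) {
      throw new Error("duplicate key on _id: " + stored._id);
    }
    this.byId.set(stored._id, stored);
    return stored._id;
  }
}

const people = new TinyCollection();
const generatedId = people.insert({ name: "Andrew" }); // _id is generated
people.insert({ _id: "custom-1", name: "Maria" });     // _id is kept as given

let duplicateRejected = false;
try {
  people.insert({ _id: "custom-1", name: "Someone else" });
} catch (err) {
  duplicateRejected = true; // the second "custom-1" is refused
}
console.log(generatedId, duplicateRejected);
```

The point of the sketch is only that uniqueness is enforced at insert time and that a missing _id is filled in automatically.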

Single Field Indexes

MongoDB allows single field indexes, the most common kind of index. The index on the _id field is itself a single field index, since it covers just one field. This type of index is based on a single field in a document. The field can be at the top level or inside an embedded document. Let’s see a few examples.

Here is our sample document on which we will perform some index operations.

{
    "_id" : ObjectId(),
    "name" : "Andrew",
    "age" : 25,
    "address" : {
        "city" : "New York",
        "zipcode" : "10037"
    }
}

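We can create a single field index on any top-level field like this. (Most examples in this post use ensureIndex(), a deprecated alias of createIndex(); prefer createIndex() on current MongoDB versions.)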

> db.mycoll.ensureIndex({ 'name' : 1 })
{
    "createdCollectionAutomatically" : false,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}

Similarly, we can create an index on ‘age’. More interestingly, we can also create an index on ‘zipcode’, even though it sits inside the embedded ‘address’ document. Wondering how? See this.

> db.mycoll.ensureIndex({ 'address.zipcode' : 1 })
{
    "createdCollectionAutomatically" : false,
    "numIndexesBefore" : 2,
    "numIndexesAfter" : 3,
    "ok" : 1
}

Let’s check that the indexes were created.

> db.mycoll.getIndexes()
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "mydb.mycoll"
    },
    {
        "v" : 1,
        "key" : {
            "name" : 1
        },
        "name" : "name_1",
        "ns" : "mydb.mycoll"
    },
    {
        "v" : 1,
        "key" : {
            "address.zipcode" : 1
        },
        "name" : "address.zipcode_1",
        "ns" : "mydb.mycoll"
    }
]

Bingo! With dot notation, you can create an index on fields inside an embedded document.
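If you are wondering what a dotted key such as 'address.zipcode' actually points at, the following plain JavaScript sketch resolves such a path against our sample document. getByPath is a hypothetical helper written for this post, not a shell function, and it ignores the array handling that MongoDB's real dot notation also supports.

```javascript
// Resolve a dotted path such as "address.zipcode" against a document.
// getByPath is a hypothetical helper, not a MongoDB shell function.
function getByPath(doc, path) {
  let current = doc;
  for (const part of path.split(".")) {
    // Stop as soon as there is nothing left to descend into.
    if (current === null || typeof current !== "object") return undefined;
    current = current[part];
  }
  return current;
}

const person = {
  name: "Andrew",
  age: 25,
  address: { city: "New York", zipcode: "10037" },
};

const zipKey = getByPath(person, "address.zipcode");     // value the index entry is built from
const missingKey = getByPath(person, "address.country"); // field absent in this document
console.log(zipKey, missingKey);
```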

If that is not interesting enough for you, you can also index a whole embedded document. In our example, that means creating an index on the ‘address’ field.

> db.mycoll.ensureIndex({ 'address' : 1 })
{
    "createdCollectionAutomatically" : false,
    "numIndexesBefore" : 3,
    "numIndexesAfter" : 4,
    "ok" : 1
}

> db.mycoll.getIndexes()
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "mydb.mycoll"
    },
    {
        "v" : 1,
        "key" : {
            "name" : 1
        },
        "name" : "name_1",
        "ns" : "mydb.mycoll"
    },
    {
        "v" : 1,
        "key" : {
            "address.zipcode" : 1
        },
        "name" : "address.zipcode_1",
        "ns" : "mydb.mycoll"
    },
    {
        "v" : 1,
        "key" : {
            "address" : 1
        },
        "name" : "address_1",
        "ns" : "mydb.mycoll"
    }
]

Single field indexes are very common. MongoDB can use one of these indexes when the query object contains the indexed field. With the indexes we just made, here is what happens for a few queries:

> db.mycoll.find({ 'name' : 'Andrew' })

As we have an index on name, MongoDB will use it for this query.

> db.mycoll.find({ 'age' : 30 })

As we don’t have an index on ‘age’, MongoDB will scan the whole collection to return matching results.

> db.mycoll.find({ 'address.zipcode' : '110032' })

As we have an index on nested field ‘address.zipcode’, MongoDB will use this index for this query.
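The difference between the indexed and non-indexed queries above can be sketched in plain JavaScript by counting how many documents each approach has to look at. This is only a model under simple assumptions: the sample documents are made up, the Map stands in for MongoDB's real index structure, and buildIndex, findWithIndex and findWithScan are hypothetical helper names.

```javascript
// Made-up sample data for the sketch.
const docs = [
  { name: "Andrew", age: 25 },
  { name: "Maria", age: 30 },
  { name: "Chen", age: 30 },
  { name: "Andrew", age: 41 },
];

// Build a toy index: field value -> positions of the documents holding it.
function buildIndex(collection, field) {
  const index = new Map();
  collection.forEach((doc, position) => {
    if (!index.has(doc[field])) index.set(doc[field], []);
    index.get(doc[field]).push(position);
  });
  return index;
}

const nameIndex = buildIndex(docs, "name");

// Indexed lookup: only the matching documents are touched.
function findWithIndex(index, value) {
  const positions = index.get(value) || [];
  return { results: positions.map((p) => docs[p]), examined: positions.length };
}

// No index: every document has to be checked.
function findWithScan(field, value) {
  let examined = 0;
  const results = docs.filter((doc) => {
    examined += 1;
    return doc[field] === value;
  });
  return { results, examined };
}

const byName = findWithIndex(nameIndex, "Andrew"); // uses the toy index on name
const byAge = findWithScan("age", 30);             // no index on age, so full scan
console.log(byName.examined, byAge.examined);
```

Both queries return two documents here, but the indexed one examines two documents while the scan examines all four. On a large collection that gap is what makes the index worthwhile.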

Compound Indexes

In MongoDB, you can create an index on more than one field. This kind of index, called a compound index, is handy for queries that filter on several fields, and it is also very common. Creating one is simple once you know how to make a single field index: you just pass more than one field to the ensureIndex() method. Here is an example:

> use mydb
> db.mycoll.ensureIndex({ 'name' : 1, 'age' : 1 })
{
    "createdCollectionAutomatically" : false,
    "numIndexesBefore" : 4,
    "numIndexesAfter" : 5,
    "ok" : 1
}

In this example, we created a compound index on the ‘name’ and ‘age’ fields. Let’s see how a compound index looks (the listing below shows only the _id index and the new index).

> db.mycoll.getIndexes()
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "mydb.mycoll"
    },
    {
        "v" : 1,
        "key" : {
            "name" : 1,
            "age" : 1
        },
        "name" : "name_1_age_1",
        "ns" : "mydb.mycoll"
    }
]

You can see the new index has both fields.

MongoDB can use a compound index for queries that include the indexed fields in their query object. For example, with the index we just created, any query that has ‘name’, or ‘name’ plus ‘age’, in its query object can use this index. The important thing here is the order: for a query to use the index, it must include an index prefix, that is, the leading field or fields of the index. Let me give you an example.

For our index, i.e. { 'name' : 1, 'age' : 1 }, the following queries will use the index:

> db.mycoll.find({ 'name' : 'Andrew' })

> db.mycoll.find({ 'name' : 'Andrew', 'age' : 30 })

But the following query will not use this index, as the prefix, in this case ‘name’, is not provided.

> db.mycoll.find({ 'age' : 30 })
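The prefix rule from this section can be written down as a few lines of plain JavaScript. Treat it as a simplified model, not the actual query planner: usablePrefixLength is a hypothetical helper, and it only considers which fields appear in a plain equality query.

```javascript
// Hypothetical helper modelling the prefix rule described above.
// indexFields: ordered field names of the compound index.
// query: a plain equality query object.
// Returns how many leading index fields the query covers (0 means no prefix).
function usablePrefixLength(indexFields, query) {
  let length = 0;
  for (const field of indexFields) {
    if (!(field in query)) break; // the prefix ends at the first missing field
    length += 1;
  }
  return length;
}

const nameAgeIndex = ["name", "age"]; // fields of { 'name' : 1, 'age' : 1 }

const nameOnly = usablePrefixLength(nameAgeIndex, { name: "Andrew" });
const nameAndAge = usablePrefixLength(nameAgeIndex, { name: "Andrew", age: 30 });
const ageOnly = usablePrefixLength(nameAgeIndex, { age: 30 });
console.log(nameOnly, nameAndAge, ageOnly);
```

A result of 0 for the age-only query mirrors the last example: without ‘name’, there is no prefix for the index to work with.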

Multikey Indexes

MongoDB allows you to index a field that holds an array value. It is called a multikey index because each element in the array gets a separate index entry. Although it is useful in specific cases, it is used less often than the index types above. There is no special command for it: you create it the same way as a single field index, and when the indexed field contains an array value, the index is multikey. Suppose you have the following document in your collection.

{
    "_id" : ObjectId(...),
    "name" : "Andrew",
    "age" : 25,
    "address" : {
        "city" : "New York",
        "zipcode" : "10037",
        "hobbies" : [ "sports", "music", "reading" ]
    }
}

Let’s create a multikey index now.

> db.mycoll.ensureIndex({ "address.hobbies" : 1 })
{
    "createdCollectionAutomatically" : false,
    "numIndexesBefore" : 6,
    "numIndexesAfter" : 7,
    "ok" : 1
}

Because ‘address.hobbies’ holds an array, this is a multikey index. It has separate entries for ‘sports’, ‘music’ and ‘reading’, so a query for any of these values may use it. How can we prove this? Let’s run a query and use the explain() method to see which index is used.

> db.mycoll.find({ "address.hobbies" : "music"}).explain()
{
    "cursor" : "BtreeCursor address.hobbies_1",
    "isMultiKey" : true,
    "n" : 1,
    "nscannedObjects" : 1,
    "nscanned" : 1,
    "nscannedObjectsAllPlans" : 1,
    "nscannedAllPlans" : 1,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
        "address.hobbies" : [
            [
                "music",
                "music"
            ]
        ]
    },
    "server" : "localhost:27017",
    "filterSet" : false
}

We will see the explain() method in detail later, but for now, note two things. Firstly, the cursor field says BtreeCursor address.hobbies_1, which means the query used the index address.hobbies_1. Secondly, the isMultiKey field is true, which confirms that the index is multikey.
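To picture the "one entry per array element" idea, here is a plain JavaScript sketch that expands two documents into index entries. The second document and the buildMultikeyEntries helper are made up for illustration, and the sorted array is only a stand-in for the real index structure.

```javascript
// Sample documents; the second one is made up for this sketch.
const hobbyDocs = [
  { _id: 1, address: { hobbies: ["sports", "music", "reading"] } },
  { _id: 2, address: { hobbies: ["music", "chess"] } },
];

// One index entry per array element, each pointing back to its document.
function buildMultikeyEntries(collection) {
  const entries = [];
  for (const doc of collection) {
    for (const hobby of doc.address.hobbies) {
      entries.push({ key: hobby, id: doc._id });
    }
  }
  // Keep entries in key order (ties broken by document id), like an ordered index.
  return entries.sort((a, b) =>
    a.key < b.key ? -1 : a.key > b.key ? 1 : a.id - b.id
  );
}

const hobbyEntries = buildMultikeyEntries(hobbyDocs);

// A query for "music" only needs the entries whose key is "music".
const musicIds = hobbyEntries.filter((e) => e.key === "music").map((e) => e.id);
console.log(hobbyEntries.length, musicIds);
```

Two documents produce five entries here, which is also why multikey indexes on long arrays can grow large.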

Text Indexes

MongoDB provides text search on a field that contains a string or an array of strings. To make a field searchable, we need to create a text index on it. Just create the index, and MongoDB will take care of the rest. For example, suppose you have documents in a collection that look something like this:

{
    "author" : "Tom Smith",
    "article" : "This is the super impressive article I wrote today.",
    "tags" : ["history","news", "world"]
}

Now you want to make the article field searchable so that you can find the documents where article contains a given word, let’s say “today”. For this, we first need to create a text index on the article field. Let’s do it now.

> db.mycoll.ensureIndex({ "article" : "text" })

Now let’s try to find something. To search, we use the $text operator with its $search field. Here is an example where we search for “today” in the indexed field.

> db.mycoll.find({ "$text" : { "$search" : "today"} })
{
    "author" : "Tom Smith",
    "article" : "This is the super impressive article I wrote today.",
    "tags" : ["history","news", "world"]
}
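Roughly speaking, a text index maps words to the documents that contain them. The plain JavaScript sketch below shows that idea with a tiny inverted index. It is a simplification under stated assumptions: the second article is made up, tokenize, buildTextIndex and searchWord are hypothetical helpers, and the language-aware processing MongoDB applies (such as stemming and stop words) is left out.

```javascript
// Sample documents; the second one is made up for this sketch.
const articles = [
  { _id: 1, article: "This is the super impressive article I wrote today." },
  { _id: 2, article: "Yesterday was quiet, nothing to report." },
];

// Split text into lowercase words, dropping punctuation.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Build a toy inverted index: word -> ids of the documents containing it.
function buildTextIndex(collection, field) {
  const index = new Map();
  for (const doc of collection) {
    // A Set avoids listing the same document twice for a repeated word.
    for (const word of new Set(tokenize(doc[field]))) {
      if (!index.has(word)) index.set(word, []);
      index.get(word).push(doc._id);
    }
  }
  return index;
}

const textIndex = buildTextIndex(articles, "article");

// Look a single word up in the index, ignoring case.
function searchWord(word) {
  return textIndex.get(word.toLowerCase()) || [];
}

const todayMatches = searchWord("today");
const tomorrowMatches = searchWord("tomorrow");
console.log(todayMatches, tomorrowMatches);
```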

Hashed Indexes

MongoDB provides hashed indexes, which store a hash of the field’s value instead of the value itself. This index is also not used much; the most common use case is hash-based sharding. Hashed indexes don’t work with fields that hold array values. A hashed index can be created like this:

> db.mycoll.ensureIndex({ "name" : "hashed" })

Now let’s see how to use it.

> db.mycoll.find({ 'name' : 'Andrew'}).explain()
{
    "cursor" : "BtreeCursor name_hashed",
    "isMultiKey" : false,
    "n" : 2,
    "nscannedObjects" : 2,
    "nscanned" : 2,
    "nscannedObjectsAllPlans" : 2,
    "nscannedAllPlans" : 2,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
        "name" : [
            [
                NumberLong("4176813871854105885"),
                NumberLong("4176813871854105885")
            ]
        ]
    },
    "server" : "localhost:27017",
    "filterSet" : false
}

You can see that it used BtreeCursor name_hashed, which shows that the query used our hashed index. Also, in indexBounds, you can see the hash range used for the query; both bounds are the same value, the hash of 'Andrew'.
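Here is a plain JavaScript sketch of why a hashed index answers equality queries well. It uses a toy FNV-1a style hash purely as a stand-in; MongoDB uses its own hash function, so the numbers will not match the NumberLong values above. The names list, toyHash and findEqual are all assumptions made for this example.

```javascript
// Toy 32-bit FNV-1a style hash. A stand-in only, not MongoDB's hash function.
function toyHash(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash;
}

// Made-up values of the 'name' field, by document position.
const names = ["Andrew", "Maria", "Andrew", "Chen"];

// Build the toy hashed index: hash of the value -> document positions.
const hashedIndex = new Map();
names.forEach((name, position) => {
  const key = toyHash(name);
  if (!hashedIndex.has(key)) hashedIndex.set(key, []);
  hashedIndex.get(key).push(position);
});

// Equality lookup: hash the query value and probe a single key.
function findEqual(name) {
  const candidates = hashedIndex.get(toyHash(name)) || [];
  // Re-check the value because different strings can share a hash.
  return candidates.filter((position) => names[position] === name);
}

const andrewPositions = findEqual("Andrew");
const zoePositions = findEqual("Zoe");
console.log(andrewPositions, zoePositions);
```

Because hashing discards the ordering of the original values, an index like this helps with equality matches but not with range queries.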

Geospatial Indexes

MongoDB provides an easy and efficient way to handle location data. You can store location data in legacy form or GeoJSON form. A lot of exciting things are possible with location data, but in this post we will restrict ourselves to index creation only. Geospatial queries are outside its scope.

There are two types of Geospatial indexes that are used the most, and we are going to discuss them here briefly.

2d Index

This type of index supports only the legacy coordinate pairs. Let’s see how to create a 2d index.

> db.mycoll.createIndex({ "loc" : "2d" })

The above command creates a 2d index on a field called loc. The name loc is just an example; the field that holds the location data can have any name.

A 2d index has a few other options, such as the min-max boundaries and precision bits. I am not going into detail, as those are not often used. You can refer to the MongoDB documentation for them.

2dsphere Index

This index supports both legacy and GeoJSON data. You can have a 2dsphere index on a field that holds GeoJSON formatted data. This index is more common in use and suits most of the use cases.

The index creation process is very similar. Here is an example –

> db.mycoll.createIndex({ "loc" : "2dsphere" })

You can also create compound indexes that combine a geospatial field with other fields.
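To round off, here is a plain JavaScript sketch of the two storage forms mentioned in this section, plus the shape of a compound key that mixes a geospatial field with a regular one. The coordinates are made up, legacyPairToGeoJSON is a hypothetical helper, and pairing loc with name is only an illustrative choice.

```javascript
// Convert a legacy coordinate pair [longitude, latitude] into a GeoJSON Point.
// legacyPairToGeoJSON is a hypothetical helper written for this sketch.
function legacyPairToGeoJSON(pair) {
  const [longitude, latitude] = pair;
  if (longitude < -180 || longitude > 180 || latitude < -90 || latitude > 90) {
    throw new RangeError("coordinates out of range");
  }
  // GeoJSON also lists longitude first, then latitude.
  return { type: "Point", coordinates: [longitude, latitude] };
}

const legacyLoc = [-73.97, 40.77];                 // legacy form (made-up point)
const geoJsonLoc = legacyPairToGeoJSON(legacyLoc); // GeoJSON form of the same point

// Key document for a compound index: a 2dsphere field plus a regular field.
// This is the kind of object one would pass to createIndex().
const compoundKey = { loc: "2dsphere", name: 1 };

console.log(JSON.stringify(geoJsonLoc), Object.keys(compoundKey));
```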