While every situation is unique and may include special considerations, the general guidance is that index keys should be in the following order:
- Equality tests
- Sort fields
- Range filters
The reasons for this order are described below.
Characteristics of Optimal Queries
Optimal queries attempt to minimize the amount of work that the database must perform in order to retrieve the result set. In particular, a properly configured environment works to reduce or completely eliminate the following:
- Documents examined that do not match the query predicates
- In-memory sort operations
- Reads from disk
In MongoDB, as in many database systems, combining the appropriate index with a well-structured schema is the proper technique to achieve an optimal query. MongoDB documentation provides best practices for Indexing Strategies. This article contains supplementary material related to the order of fields for Compound Indexes.
Equality Tests
Selective queries include equality tests in their predicates. For example, to find documents where the value of the life field is exactly 42 and the value of the status field is 1337, we issue the following query:
db.foo.find({life:42, status: 1337})
{ "_id" : ObjectId("5890b05daa8624eb3d6638b7"), "life" : 42, "status" : 1337 }
The output of an explain() command for such a query indicates that exact equality tests were performed:
db.foo.explain().find({life:42, status: 1337})
...
"parsedQuery" : {
    "$and" : [
        {
            "life" : {
                "$eq" : 42
            }
        },
        {
            "status" : {
                "$eq" : 1337
            }
        }
    ]
},
...
Without an appropriate index to support the query, the database spends cycles examining documents which do not match the query predicates and which are not returned to the client:
db.foo.explain("executionStats").find({life:42, status: 1337})
...
"executionStats" : {
"nReturned" : 1,
"totalKeysExamined" : 0,
"totalDocsExamined" : 9968,
...
},
...
As outlined in the index documentation, creating an appropriate index substantially improves the performance of such a query. After creating a compound index on those two fields, the database can locate the matching document directly, without examining a single document that fails to match the predicates:
db.foo.createIndex({life:1, status:1})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
db.foo.explain("executionStats").find({life:42, status: 1337})
...
"executionStats" : {
"nReturned" : 1,
"totalKeysExamined" : 1,
"totalDocsExamined" : 1,
...
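The two executionStats excerpts above can be compared with a small helper. The sketch below is plain JavaScript, not part of the mongo shell or any driver; summarizeExecution is a hypothetical name, and it reads only the three fields shown in this article.

```javascript
// Hypothetical helper: condense the executionStats fields used in this
// article into a few overhead measures. Other fields are ignored.
function summarizeExecution({ nReturned, totalKeysExamined, totalDocsExamined }) {
  return {
    // Rough proxy only: an index scan that matches nothing also reports 0 keys.
    scannedIndex: totalKeysExamined > 0,
    // Documents examined but never returned to the client.
    wastedDocs: totalDocsExamined - nReturned,
    // Documents examined per document returned (1 is ideal); null if none returned.
    docsPerResult: nReturned > 0 ? totalDocsExamined / nReturned : null,
  };
}

// Figures copied from the two explain("executionStats") runs above.
const before = summarizeExecution({ nReturned: 1, totalKeysExamined: 0, totalDocsExamined: 9968 });
const after = summarizeExecution({ nReturned: 1, totalKeysExamined: 1, totalDocsExamined: 1 });

console.log(before); // scannedIndex: false, wastedDocs: 9967, docsPerResult: 9968
console.log(after); // scannedIndex: true, wastedDocs: 0, docsPerResult: 1
```

A docsPerResult close to 1 is the "no documents examined that do not match" characteristic listed at the start of the article.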
Note that the ordering of an equality field relative to another equality field in the index is not important in terms of performance:
db.foo.dropIndexes()
{
"nIndexesWas" : 2,
"msg" : "non-_id indexes dropped for collection",
"ok" : 1
}
//Reversed key order from the earlier example
db.foo.createIndex({status:1, life:1})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
db.foo.explain("executionStats").find({life:42, status: 1337})
...
"executionStats" : {
"nReturned" : 1,
"totalKeysExamined" : 1,
"totalDocsExamined" : 1,
...
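To see why the relative order of equality fields does not matter, the sketch below models a compound index as a sorted array of key tuples in plain JavaScript. It is a simplified illustration, not how MongoDB's B-tree indexes are implemented; buildIndex, equalityLookup and the synthetic documents are hypothetical. With either key order, the exact match is one contiguous run containing a single key.

```javascript
// Toy model, not MongoDB internals: a compound index is an array of
// { key, doc } entries sorted lexicographically by the key tuple.
const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

// Compare the first prefix.length components of key against prefix.
function comparePrefix(key, prefix) {
  for (let i = 0; i < prefix.length; i++) {
    const c = cmp(key[i], prefix[i]);
    if (c !== 0) return c;
  }
  return 0;
}

function buildIndex(docs, fields) {
  return docs
    .map((doc) => ({ key: fields.map((f) => doc[f]), doc }))
    .sort((p, q) => comparePrefix(p.key, q.key));
}

// Equality on every indexed field: binary-search the first matching key,
// then read forward while keys still match. Only keys inside the bounds are counted.
function equalityLookup(index, values) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (comparePrefix(index[mid].key, values) < 0) lo = mid + 1;
    else hi = mid;
  }
  const results = [];
  for (let i = lo; i < index.length && comparePrefix(index[i].key, values) === 0; i++) {
    results.push(index[i].doc);
  }
  return { results, keysExamined: results.length };
}

// Hypothetical data: 1000 non-matching documents plus one match.
const docs = Array.from({ length: 1000 }, (_, i) => ({ _id: i, life: i % 100, status: i }));
docs.push({ _id: "match", life: 42, status: 1337 });

const lifeStatus = equalityLookup(buildIndex(docs, ["life", "status"]), [42, 1337]);
const statusLife = equalityLookup(buildIndex(docs, ["status", "life"]), [1337, 42]);
console.log(lifeStatus.keysExamined, statusLife.keysExamined); // 1 1
```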
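However, order is important to consider when selecting indexes that serve multiple queries. MongoDB can use a compound index for queries on any prefix of its keys, so an index on {life:1, status:1} can also support a query on life alone, but not a query on status alone.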
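The prefix rule can be illustrated with the same kind of toy model: a predicate on the leading field maps to one contiguous run of keys, while a predicate on the second field alone matches keys scattered throughout the index. The helper names and data are hypothetical, and the model ignores the query planner; it only shows why such an index offers no contiguous range to scan.

```javascript
// Toy index model: { key, doc } entries sorted by their key tuples.
const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

function comparePrefix(key, prefix) {
  for (let i = 0; i < prefix.length; i++) {
    const c = cmp(key[i], prefix[i]);
    if (c !== 0) return c;
  }
  return 0;
}

function buildIndex(docs, fields) {
  return docs
    .map((doc) => ({ key: fields.map((f) => doc[f]), doc }))
    .sort((p, q) => comparePrefix(p.key, q.key));
}

// Predicate on a leading prefix: the matches form one contiguous run,
// so only that run is read.
function prefixScan(index, prefix) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (comparePrefix(index[mid].key, prefix) < 0) lo = mid + 1;
    else hi = mid;
  }
  let end = lo;
  while (end < index.length && comparePrefix(index[end].key, prefix) === 0) end++;
  return end - lo; // keys examined = length of the contiguous run
}

// Predicate on a non-leading field only: matches are scattered, so in
// this model every key has to be checked.
function nonPrefixScan(index, position, value) {
  let matches = 0;
  for (const { key } of index) if (cmp(key[position], value) === 0) matches++;
  return { keysExamined: index.length, matches };
}

const docs = Array.from({ length: 1000 }, (_, i) => ({ life: i % 100, status: i }));
docs.push({ life: 42, status: 1337 });
const index = buildIndex(docs, ["life", "status"]);

console.log(prefixScan(index, [42])); // 11: every entry with life 42
console.log(nonPrefixScan(index, 1, 1337)); // keysExamined: 1001, matches: 1
```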
Sort Fields
It is common for a query to include a sort request that the database must consider when determining an appropriate plan to retrieve the result set. There are effectively two ways that MongoDB can satisfy such a request:
- Identify the entire result set and then sort all of it
- Walk an index that is organized in the requested sort order
The former technique is often referred to as an in-memory or blocking sort, and it is suboptimal in most scenarios: no document can be returned until the entire result set has been identified and sorted. For more information, see Use Indexes to Sort Query Results.
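The difference between the two strategies can be sketched with plain arrays. In the hypothetical example below, a pre-sorted array stands in for an index on startDate, and the query asks for the first 5 matches, as a limit would. The blocking version has to look at every document before it can return anything, while the index-ordered walk stops as soon as it has enough results. This illustrates the idea only; it is not MongoDB's implementation.

```javascript
const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

// Blocking sort: every matching document is gathered and sorted
// before the first one can be handed back.
function blockingSortTopK(docs, predicate, field, k) {
  const matched = docs.filter(predicate);
  matched.sort((a, b) => cmp(a[field], b[field]));
  return { top: matched.slice(0, k), examined: docs.length };
}

// Index walk: entries are already in sort order, so the walk can stop
// as soon as k matches have been produced.
function indexOrderTopK(sortedDocs, predicate, k) {
  const top = [];
  let examined = 0;
  for (const d of sortedDocs) {
    if (top.length === k) break;
    examined++;
    if (predicate(d)) top.push(d);
  }
  return { top, examined };
}

// Hypothetical data: startDate values 0..999 in scrambled insertion order.
const docs = Array.from({ length: 1000 }, (_, i) => ({ startDate: (i * 7919) % 1000 }));
const byStartDate = [...docs].sort((a, b) => cmp(a.startDate, b.startDate)); // stands in for the index
const isEven = (d) => d.startDate % 2 === 0;

const blocking = blockingSortTopK(docs, isEven, "startDate", 5);
const streamed = indexOrderTopK(byStartDate, isEven, 5);
console.log(blocking.examined, streamed.examined); // 1000 9
```

Both calls return the same five documents; only the amount of work before the first result differs.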
Generally, fields for sorting should be placed after fields for equality in an index. To demonstrate the difference, first create an index with sort then equality:
db.foo.createIndex({startDate:1, life:1, status:1})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
db.foo.explain("executionStats").find({life:42, status: 1337}).sort({startDate:1})
...
"executionStats" : {
"nReturned" : 1,
"executionTimeMillis" : 13,
"totalKeysExamined" : 9968,
"totalDocsExamined" : 9968,
...
Then try the same query using an index with equality then sort:
db.foo.createIndex({life:1, status:1, startDate:1})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
db.foo.explain("executionStats").find({life:42, status: 1337}).sort({startDate:1})
...
"executionStats" : {
"nReturned" : 1,
"executionTimeMillis" : 0,
"totalKeysExamined" : 1,
"totalDocsExamined" : 1,
...
As this example demonstrates, placing the sort fields after the equality fields in the index key ordering can greatly reduce the amount of overhead work that the database must perform in order to satisfy a query. Keys examined dropped from 9968 to 1, and the reported execution time from 13 ms to 0 ms, even for this simple test where the entire collection resides in memory. The effects of this difference in execution behavior can become quite pronounced at scale.
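The same contrast can be reproduced in a toy model that builds both index layouts over hypothetical data and counts the keys each plan reads. It is a simplified sketch, not MongoDB's query engine; the data and helper names are invented for illustration.

```javascript
const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

function comparePrefix(key, prefix) {
  for (let i = 0; i < prefix.length; i++) {
    const c = cmp(key[i], prefix[i]);
    if (c !== 0) return c;
  }
  return 0;
}

function buildIndex(docs, fields) {
  return docs
    .map((doc) => ({ key: fields.map((f) => doc[f]), doc }))
    .sort((p, q) => comparePrefix(p.key, q.key));
}

// {life, status, startDate}: the equality values bound one contiguous run
// of keys, and that run is already ordered by startDate.
function equalityThenSort(docs, life, status) {
  const index = buildIndex(docs, ["life", "status", "startDate"]);
  const prefix = [life, status];
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (comparePrefix(index[mid].key, prefix) < 0) lo = mid + 1;
    else hi = mid;
  }
  const out = [];
  for (let i = lo; i < index.length && comparePrefix(index[i].key, prefix) === 0; i++) {
    out.push(index[i].doc);
  }
  return { out, keysExamined: out.length };
}

// {startDate, life, status}: no bounds on the leading field, so the whole
// index is walked in startDate order and the trailing fields are filtered.
function sortThenEquality(docs, life, status) {
  const index = buildIndex(docs, ["startDate", "life", "status"]);
  const out = index.filter(({ key }) => key[1] === life && key[2] === status).map(({ doc }) => doc);
  return { out, keysExamined: index.length };
}

// Hypothetical data; startDate is a day number (0..999) in scrambled order.
const docs = Array.from({ length: 1000 }, (_, i) => ({
  life: i % 100,
  status: i % 3 === 0 ? 1337 : 0,
  startDate: (i * 7919) % 1000,
}));

const es = equalityThenSort(docs, 42, 1337);
const se = sortThenEquality(docs, 42, 1337);
console.log(es.keysExamined, se.keysExamined); // 4 1000
console.log(es.out.map((d) => d.startDate)); // 298, 598, 698, 998 (same order from both plans)
```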
When a query sorts on multiple fields, those fields should appear in the index in the same order as in the query's sort specification, with directions that either all match the sort or are all reversed.
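As a rough check of the rule above, the hypothetical helper below decides whether a key pattern can return results in a requested sort order, given the fields the query pins with equality. It is a simplified approximation written for illustration, not MongoDB's planner logic, and the fields a and b in the last call are placeholders.

```javascript
// Simplified rule of thumb, not the actual MongoDB query planner.
// indexKeys and sortSpec are key patterns such as { life: 1, status: 1 };
// equalityFields lists the fields the query pins to a single value.
function indexSupportsSort(indexKeys, equalityFields, sortSpec) {
  const keys = Object.entries(indexKeys); // [field, direction] pairs in index order
  const sort = Object.entries(sortSpec);
  // Leading fields fixed by equality do not disturb the order of what follows.
  let i = 0;
  while (i < keys.length && equalityFields.includes(keys[i][0])) i++;
  if (i + sort.length > keys.length) return false;
  let same = true;
  let reversed = true;
  for (let j = 0; j < sort.length; j++) {
    const [field, dir] = keys[i + j];
    const [sortField, sortDir] = sort[j];
    if (field !== sortField) return false;
    if (dir !== sortDir) same = false;
    if (dir !== -sortDir) reversed = false;
  }
  // The index can be walked forwards (all directions match) or backwards (all reversed).
  return same || reversed;
}

console.log(indexSupportsSort({ life: 1, status: 1, startDate: 1 }, ["life", "status"], { startDate: -1 })); // true
console.log(indexSupportsSort({ life: 1, status: 1, total: 1, startDate: 1 }, ["life", "status"], { startDate: 1 })); // false
console.log(indexSupportsSort({ a: 1, b: -1 }, [], { a: 1, b: 1 })); // false
```

This check only answers whether a blocking sort can be avoided. The {startDate:1, life:1, status:1} index from the Sort Fields example also passes it, yet that plan still examined 9968 keys to return one document.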
Range Filters
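Finally, consider queries that include range predicates, such as a greater than operator. Initially, it may seem like a good idea to place these with the equality fields, ahead of the sort fields, for example with an index on {life:1, status:1, total:1, startDate:1}. A query against such an index produces the following plan: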
db.foo.explain("executionStats").find({life:42, status: 1337, total:{$gt:500}}).sort({startDate:1})
...
"winningPlan" : {
    "stage" : "SORT",
    "sortPattern" : {
        "startDate" : 1
    },
    "inputStage" : {
        "stage" : "SORT_KEY_GENERATOR",
        "inputStage" : {
            "stage" : "FETCH",
            "inputStage" : {
                "stage" : "IXSCAN",
                ...
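When reading explain() output, the SORT stage at the top of the winningPlan above is the sign of a blocking sort. The sketch below walks a winningPlan object and lists its stage names; planStages is a hypothetical helper, and it assumes the classic explain format shown here, where each stage nests its input under inputStage (or inputStages for stages with several children).

```javascript
// Hypothetical helper: collect stage names from the root of a
// winningPlan down to its leaves.
function planStages(plan) {
  const stages = [];
  const visit = (node) => {
    if (!node) return;
    stages.push(node.stage);
    if (node.inputStage) visit(node.inputStage);
    // Some stages (for example OR) have several children.
    (node.inputStages || []).forEach(visit);
  };
  visit(plan);
  return stages;
}

// Plan shape copied from the explain() excerpt above, other fields omitted.
const winningPlan = {
  stage: "SORT",
  sortPattern: { startDate: 1 },
  inputStage: {
    stage: "SORT_KEY_GENERATOR",
    inputStage: { stage: "FETCH", inputStage: { stage: "IXSCAN" } },
  },
};

const stages = planStages(winningPlan);
console.log(stages.join(" <- ")); // SORT <- SORT_KEY_GENERATOR <- FETCH <- IXSCAN
console.log("blocking sort:", stages.includes("SORT")); // blocking sort: true
```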
This is, however, generally not the best approach, because the query plan now requires a SORT stage: the index is no longer traversed in the sorted order. Consider the following 3 documents:
{ ..., "life" : 42, "status" : 1337, "total" : 1000, "startDate": ISODate("2017-01-30T00:00:00Z") },
{ ..., "life" : 42, "status" : 1337, "total" : 1500, "startDate": ISODate("2017-01-01T00:00:00Z") },
{ ..., "life" : 42, "status" : 1337, "total" : 2000, "startDate": ISODate("2017-01-15T00:00:00Z") }
These have been listed in the order in which they would appear in the following index:
db.foo.createIndex({life:1, status:1, total:1, startDate:1})
Specifically, now that the range filters have been introduced in between the equality and sort fields, traversing the index no longer returns documents in the order specified by the sort. We can see this in the example documents above: the startDate values appear as 2017-01-30, 2017-01-01, then 2017-01-15, which is not ascending order.
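The traversal order described above can be reproduced with the three example documents in a toy model of the {life:1, status:1, total:1, startDate:1} index. The helpers are hypothetical and the model simplifies a real index scan; the documents are deliberately inserted in startDate order to show that the index key order, not insertion order, decides the traversal.

```javascript
const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0); // Dates compare by timestamp

function comparePrefix(key, prefix) {
  for (let i = 0; i < prefix.length; i++) {
    const c = cmp(key[i], prefix[i]);
    if (c !== 0) return c;
  }
  return 0;
}

function buildIndex(docs, fields) {
  return docs
    .map((doc) => ({ key: fields.map((f) => doc[f]), doc }))
    .sort((p, q) => comparePrefix(p.key, q.key));
}

// The three example documents from the text, inserted in startDate order.
const docs = [
  { life: 42, status: 1337, total: 1500, startDate: new Date("2017-01-01T00:00:00Z") },
  { life: 42, status: 1337, total: 2000, startDate: new Date("2017-01-15T00:00:00Z") },
  { life: 42, status: 1337, total: 1000, startDate: new Date("2017-01-30T00:00:00Z") },
];

// Equality, Range, Sort: { life: 1, status: 1, total: 1, startDate: 1 }.
const ers = buildIndex(docs, ["life", "status", "total", "startDate"]);
const walked = ers
  .filter(({ key }) => key[0] === 42 && key[1] === 1337 && key[2] > 500)
  .map(({ key }) => key[3].toISOString().slice(0, 10));

console.log(walked.join(", ")); // 2017-01-30, 2017-01-01, 2017-01-15
const ascending = walked.every((d, i) => i === 0 || walked[i - 1] <= d);
console.log(ascending); // false, so a SORT stage would be needed
```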
In most situations, the appropriate configuration is to append range filters after the sort fields in the index. The equality fields then bound a contiguous run of index keys that is already ordered by the sort fields, and the range predicate is applied while that run is scanned in the requested sort order. This may result in the database scanning index keys that do not satisfy the range predicate, but it eliminates blocking sorts and allows batches to be returned as soon as they are identified. In our example the index would look like:
db.foo.createIndex({life:1, status:1, startDate:1, total:1})
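Rebuilding the toy model with the range field last shows both effects described above. The sketch reuses the three example documents but, as a hypothetical change, filters on total > 1200 instead of 500 so that one key inside the equality bounds fails the range test. The keys come back in startDate order, and one of the three examined keys is discarded.

```javascript
const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0); // Dates compare by timestamp

function comparePrefix(key, prefix) {
  for (let i = 0; i < prefix.length; i++) {
    const c = cmp(key[i], prefix[i]);
    if (c !== 0) return c;
  }
  return 0;
}

function buildIndex(docs, fields) {
  return docs
    .map((doc) => ({ key: fields.map((f) => doc[f]), doc }))
    .sort((p, q) => comparePrefix(p.key, q.key));
}

const docs = [
  { life: 42, status: 1337, total: 1000, startDate: new Date("2017-01-30T00:00:00Z") },
  { life: 42, status: 1337, total: 1500, startDate: new Date("2017-01-01T00:00:00Z") },
  { life: 42, status: 1337, total: 2000, startDate: new Date("2017-01-15T00:00:00Z") },
];

// Equality, Sort, Range: { life: 1, status: 1, startDate: 1, total: 1 }.
const esr = buildIndex(docs, ["life", "status", "startDate", "total"]);
let keysExamined = 0;
const returned = [];
for (const { key } of esr) {
  if (key[0] !== 42 || key[1] !== 1337) continue; // outside the equality bounds
  keysExamined++;
  // Hypothetical range predicate: total > 1200.
  if (key[3] > 1200) returned.push(key[2].toISOString().slice(0, 10));
}
console.log(keysExamined, returned.join(", ")); // 3 2017-01-01, 2017-01-15
```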
If there are multiple range filters specified in the query, they should be ordered by ascending cardinality, meaning that the first range field in the index should have the fewest distinct values. This reduces the number of extraneous keys that are examined while collecting the result set.
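The cardinality guideline can be applied mechanically if a representative sample of documents is available. The helper and the priority field below are hypothetical; the sketch simply counts distinct values per range field and sorts the fields from fewest to most.

```javascript
// Hypothetical helper: order range-filtered fields by how many distinct
// values each has in a sample of documents, fewest first. Values are
// assumed to be primitives so that Set counts them correctly.
function orderRangeFieldsByCardinality(docs, rangeFields) {
  const distinct = new Map(
    rangeFields.map((f) => [f, new Set(docs.map((d) => d[f])).size])
  );
  // Array.prototype.sort is stable, so ties keep their original order.
  return [...rangeFields].sort((a, b) => distinct.get(a) - distinct.get(b));
}

// Hypothetical sample: "total" has 100 distinct values, "priority" only 3.
const sample = Array.from({ length: 100 }, (_, i) => ({ total: i * 10, priority: i % 3 }));
const ordered = orderRangeFieldsByCardinality(sample, ["total", "priority"]);
console.log(ordered); // [ 'priority', 'total' ]
```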
Additional Information
Summary and Considerations
For a majority of use cases, the following index key order should be considered when designing schemas, queries, and indexes:
- Equality tests, in any order
- Sort fields, in the order requested by the query
- Range filters, in increasing order of cardinality
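The three rules can be condensed into a small helper that assembles a key pattern. This is a hypothetical sketch in plain JavaScript, not a MongoDB API; it assumes range fields are passed already ordered by increasing cardinality and that a field named in more than one role is indexed only once, at its first position.

```javascript
// Hypothetical helper: build a compound index key pattern in Equality,
// Sort, Range order. A field is only placed once.
function esrKeyPattern({ equality = [], sort = {}, range = [] }) {
  const pattern = new Map(); // preserves insertion order
  for (const field of equality) pattern.set(field, 1);
  for (const [field, dir] of Object.entries(sort)) {
    if (!pattern.has(field)) pattern.set(field, dir); // keep the requested direction
  }
  for (const field of range) {
    if (!pattern.has(field)) pattern.set(field, 1);
  }
  return Object.fromEntries(pattern);
}

const pattern = esrKeyPattern({
  equality: ["life", "status"],
  sort: { startDate: 1 },
  range: ["total"],
});
console.log(JSON.stringify(pattern)); // {"life":1,"status":1,"startDate":1,"total":1}
```

The resulting object has the same shape as the key pattern used in the Range Filters example, so in the mongo shell it could be passed to db.foo.createIndex().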
These rules usually support the characteristics of optimal queries by reducing the amount of unnecessary work done by the database in order to complete a request.
Please keep in mind that there are Operational Considerations for Indexes that should also be accounted for when determining an appropriate set of indexes for an application. It is important to test and examine the behavior of your configuration while planning in order to identify the optimal approach for a given scenario.