Why is the order of fields important when searching for embedded documents? Isn't JSON unordered?


MongoDB documents are stored on the server in a binary format called BSON (short for "Binary JSON"), which is a JSON-like format that supports additional data types. The JSON format was designed to be human readable and derived supported data types and behaviour from JavaScript. BSON was designed as a binary data interchange format with more precise control over data representation.
While BSON is JSON-like, there are notable differences:
  • A BSON document is an ordered list of fields, not an unordered dictionary. For example, BSON does not require field names to be unique (although drivers commonly default to a hash / dictionary / JSON-like interface that does not support duplicate field names). This ordering gives developers precise control, which can be important for some use cases, and it also avoids the potentially unnecessary server overhead of inspecting and recursively serializing/deserializing BSON into a predefined field order.
  • Where order is important, officially supported drivers rely on the underlying programming language to support an order-preserving data structure or provide their own. For example, the Python driver (aka PyMongo) includes a SON class for manipulating ordered objects similar to a normal Python dictionary.
  • BSON supports additional data types such as binary data, 32-bit and 64-bit integers, doubles, and 128-bit decimals (MongoDB 3.4+). As a simple contrast for numeric representation, JSON has only a single numeric type, which JavaScript parses as a Number (a double-precision floating-point value).
  • MongoDB has defined comparison and sort order rules for BSON values. For example, MongoDB uses simple binary comparison for strings by default, and MongoDB 3.4+ adds the option of language-specific collation.
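
The points about field order and numeric types can be seen from plain JavaScript, without a MongoDB server. This is a minimal sketch that should run in Node.js or any modern JavaScript engine. The variable names are made up for illustration, and `JSON.stringify` is used only as a stand-in for an order-preserving serialization; it is not BSON.

```javascript
// Two objects with the same fields and values, created in a different order.
const firstThenLast = { first: "Yukihiro", last: "Matsumoto" };
const lastThenFirst = { last: "Matsumoto", first: "Yukihiro" };

// JavaScript keeps insertion order for these string keys, so the serialized
// forms differ. Any comparison that is sensitive to field order would treat
// the two objects as different values.
const serializedA = JSON.stringify(firstThenLast);
const serializedB = JSON.stringify(lastThenFirst);
const sameSerialization = serializedA === serializedB; // false

// A JavaScript Number is a double, so integers above 2^53 lose precision.
// A 64-bit integer type, such as the one BSON provides, can hold 2^53 + 1 exactly.
const precisionLost = 2 ** 53 + 1 === 2 ** 53; // true

console.log(serializedA);
console.log(serializedB);
console.log(sameSerialization, precisionLost);
```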

How should I query for embedded documents?

  1. For exact matches against an embedded document, provide the full embedded document:
db.bios.find(
    {
       name: {
           first: "Yukihiro",
           last: "Matsumoto"
       }
    }
)
An exact match is sensitive to field order because the underlying data is ordered in BSON. This query performs a binary comparison between the BSON serialization of the embedded document in your query and the embedded document stored in MongoDB. You can take advantage of the field order by consistently putting a more selective field earlier in your embedded document, which may help query performance if you have indexed the entire embedded document.
  2. To find documents matching fields in an embedded document, specify the field names using dot notation:
db.bios.find(
    {
       "name.first": "Yukihiro",
       "name.last": "Matsumoto"
    }
)
With this query syntax, the field order in the embedded document is not significant. However, if you create a compound index to support your common queries, the selectivity, order, and sort direction of the keys in the index definition will be significant for efficient queries.
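
The difference between the two query styles can be imitated in plain JavaScript. This is only a rough sketch of the matching behaviour described above, not how the MongoDB server is implemented. `matchesExactly` and `matchesDotNotation` are hypothetical helpers, the stored document is made-up sample data, and the sketch handles only simple values in nested objects (no arrays, no query operators).

```javascript
// Sample stored document (illustrative data only).
const stored = { _id: 1, name: { first: "Yukihiro", last: "Matsumoto" } };

// Hypothetical helper: order-sensitive comparison of two flat embedded
// documents. Both the field names and their positions must agree.
function matchesExactly(embedded, query) {
  const a = Object.entries(embedded);
  const b = Object.entries(query);
  return a.length === b.length &&
    a.every(([key, value], i) => key === b[i][0] && value === b[i][1]);
}

// Hypothetical helper: resolve each dotted path and compare the value found
// there. The order of the conditions does not matter.
function matchesDotNotation(doc, conditions) {
  return Object.entries(conditions).every(([path, expected]) => {
    const actual = path.split(".").reduce(
      (value, key) => (value == null ? undefined : value[key]), doc);
    return actual === expected;
  });
}

const sameOrder = matchesExactly(stored.name, { first: "Yukihiro", last: "Matsumoto" });
const reversedOrder = matchesExactly(stored.name, { last: "Matsumoto", first: "Yukihiro" });
const dotted = matchesDotNotation(stored, { "name.last": "Matsumoto", "name.first": "Yukihiro" });

console.log(sameOrder, reversedOrder, dotted); // true false true
```

In this sketch, the reversed field order fails the exact match but passes the dot-notation match, which mirrors the behaviour of the two queries shown earlier.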
