Question
Each database in a sharded cluster has a primary shard that holds all the un-sharded collections for that database. Each database has its own primary shard. The primary shard has no relation to the primary in a replica set.
How is the primary shard selected for a new database in a sharded cluster running MongoDB 3.0 or newer?
Answer
In MongoDB 3.0 and above, a Primary Shard is selected based on:
- the lowest shard id, then
- the data size of the shard, from the output of
db.adminCommand({listDatabases:1}).totalSize
The "shard id" is the
_id
field of the shards
collection in the config
database.
The shard with the lowest values for the two criteria (shard id and data size) will be selected as the Primary Shard for a new database. This will be the shard with the least amount of data.
Example of primary shard selection process
The primary shard selection mechanism can be demonstrated by using three shards and following the steps:
- Start with a new sharded cluster with three shards named
shard01
,shard02
, andshard03
. - Insert a number of large documents in a new database, e.g. 10,000 documents around 3 kilobytes each into database
test1
. This database will be created inshard01
. - Insert a number of large documents in another new database, e.g.
test2
. This database will be created inshard02
. - Create multiple new databases with only one small document each, e.g. databases
t1
-t8
. All subsequent databases will be created inshard03
.
The output of
sh.status()
should look like:databases:
{ "_id": "test1", "primary": "shard01", "partitioned": false }
{ "_id": "test2", "primary": "shard02", "partitioned": false }
{ "_id": "t1", "primary": "shard03", "partitioned": false }
{ "_id": "t2", "primary": "shard03", "partitioned": false }
{ "_id": "t3", "primary": "shard03", "partitioned": false }
{ "_id": "t4", "primary": "shard03", "partitioned": false }
{ "_id": "t5", "primary": "shard03", "partitioned": false }
{ "_id": "t6", "primary": "shard03", "partitioned": false }
{ "_id": "t7", "primary": "shard03", "partitioned": false }
{ "_id": "t8", "primary": "shard03", "partitioned": false }
Note that the relatively large
test1
and test2
databases have their primary shards located in shard01
(from step 2) and shard02
(from step 3) respectively. This is because during the creation of database test1
, all the shards were empty and thus the lowest shard id (i.e. shard01
) was selected as the primary shard for test1
. For database test2
, shard0
2 was selected because after the creation of database test1
, it was the shard with the lowest id having the least amount of data.
The subsequent eight small databases were all created in
shard03
(from step 4), because after steps 2 and 3, the shard with the lowest shard id and the least amount of data would be shard03
.
Comments
Post a Comment