How do I determine the cause of replica set elections?

Typical reasons for an election happening are:

the primary is intentionally stepped down
a replica set member with a higher priority is available due to replica set reconfiguration or catching up on replication lag
the primary becomes unresponsive to heartbeat requests from other replica set members
a network interruption causes the primary to be isolated from the majority of other members
the mongod process of the primary has unexpectedly quit

Note

The examples given below use replica set protocol version 1 (pv1), which is the default for all new replica sets created with MongoDB 3.2 or later. The general diagnostic approach is also valid for protocol version 0 (pv0), but note that pv0 is deprecated as of MongoDB 3.6. Investigating replica set elections

Examine the mongod log files, looking for the following keywords:

Component REPL log lines:
- transition to (PRIMARY, SECONDARY, STARTUP2, RECOVERY)
- stepping down
- election
Component COMMAND log lines:
- replSetStepDown - an administrator or automation agent has requested a primary to step down
- serverStatus was very slow - replication does not use the serverStatus command, but if this command is slow then other commands are likely also affected
MongoDB starting / dbexit - establishes when mongod instances were running and when they stopped

Election example

REPL log lines provide an overview of replication activity.

$ grep REPL server-a.mongod.log  | head

2017-10-09T03:57:28.550+0200 I REPL     [ReplicationExecutor] re-evaluating sync source because our current sync source's most recent OpTime is (term: 10, timestamp: Oct  9 03:56:14:3) which is more than 30s behind member server-c:27017 whose most recent OpTime is (term: 10, timestamp: Oct  9 03:57:40:2)
2017-10-09T03:57:28.550+0200 I REPL     [ReplicationExecutor] could not find member to sync from
2017-10-09T03:57:34.551+0200 I REPL     [ReplicationExecutor] syncing from: server-b:27017
2017-10-09T03:57:34.567+0200 I REPL     [SyncSourceFeedback] setting syncSourceFeedback to server-b:27017
2017-10-09T04:57:21.984+0200 I REPL     [ReplicationExecutor] syncing from: server-c:27017
2017-10-09T04:57:21.999+0200 I REPL     [SyncSourceFeedback] setting syncSourceFeedback to server-c:27017
2017-10-09T07:35:33.471+0200 I REPL     [ReplicationExecutor] re-evaluating sync source because our current sync source's most recent OpTime is (term: 10, timestamp: Oct  9 07:34:09:1) which is more than 30s behind member server-b:27017 whose most recent OpTime is (term: 10, timestamp: Oct  9 07:35:45:17)
2017-10-09T07:35:33.471+0200 I REPL     [ReplicationExecutor] could not find member to sync from
2017-10-09T07:35:39.472+0200 I REPL     [ReplicationExecutor] syncing from: server-c:27017
2017-10-09T07:35:39.487+0200 I REPL     [SyncSourceFeedback] setting syncSourceFeedback to server-c:27017
...

These REPL lines contain contain several messages unrelated to elections. To get a better view of election activity, filter out lines containing sync. You can use the same approach to filter out lines about failing heartbeats and other messages if they interfere with your view of election activity.

Lines containing election or transition to events are the most useful information to view.

$ grep REPL server-a.mongod.log  | grep -v sync | head

2017-10-11T11:16:43.616+0200 I REPL     [ReplicationExecutor] Starting an election, since we've seen no PRIMARY in the past 10000ms
2017-10-11T11:16:43.637+0200 I REPL     [ReplicationExecutor] conducting a dry run election to see if we could be elected
2017-10-11T11:16:43.649+0200 I REPL     [ReplicationExecutor] dry election run succeeded, running for election
2017-10-11T11:16:43.657+0200 I REPL     [ReplicationExecutor] election succeeded, assuming primary role in term 11
2017-10-11T11:16:43.657+0200 I REPL     [ReplicationExecutor] transition to PRIMARY
2017-10-11T11:16:43.666+0200 I REPL     [rsSync] transition to primary complete; database writes are now permitted
2017-10-11T11:16:45.697+0200 I REPL     [ReplicationExecutor] Member server-c:27017 is now in state SECONDARY
...

Transition to RECOVERY then STARTUP2 is normal soon after a replica set member starts. Unexpected transitions between SECONDARY and PRIMARY are of more importance.

In the example above we can see that server-a was a secondary before 2017-10-11T11:16:43.616+0200, but began an election because it had seen no PRIMARY in the past 10000ms. It is not clear which of the other servers (server-b or server-c) was the primary before. We need to check the logs of the other members to get a complete picture.

The MongoDB starting lines (and the few lines before this showing the last pre-shutdown message) show that both server-b and server-c were running at the time server-a began an election.

~$ grep -B 2 'MongoDB starting' server-b.mongod.log | tail -n 8
--
2017-10-05T10:46:37.439+0200 I CONTROL  [conn119] dbexit:  rc: 0
2017-10-05T10:46:37.737+0200 I CONTROL  [main] ***** SERVER RESTARTED *****
2017-10-05T10:46:37.758+0200 I CONTROL  [initandlisten] MongoDB starting : pid=23793 port=27017 dbpath=/mongo_data 64-bit host=server-b.dmz.corp
--
2017-10-05T11:15:21.568+0200 I CONTROL  [conn508] dbexit:  rc: 0
2017-10-05T11:15:21.924+0200 I CONTROL  [main] ***** SERVER RESTARTED *****
2017-10-05T11:15:21.944+0200 I CONTROL  [initandlisten] MongoDB starting : pid=29221 port=27017 dbpath=/mongo_data 64-bit host=server-b.dmz.corp
~$ tail -n 1 server-b.mongod.log
2017-10-18T16:17:05.083+0200 I COMMAND  [conn187213] command ...
~$
~$ grep -B 2 'MongoDB starting' server-c.mongod.log | tail -n 8
--
2017-10-05T10:48:00.559+0200 I CONTROL  [conn128] dbexit:  rc: 0
2017-10-05T10:48:00.755+0200 I CONTROL  [main] ***** SERVER RESTARTED *****
2017-10-05T10:48:00.777+0200 I CONTROL  [initandlisten] MongoDB starting : pid=24705 port=27017 dbpath=/mongo_data 64-bit host=server-c.dmz.corp
--
2017-10-05T11:15:51.060+0200 I CONTROL  [conn480] dbexit:  rc: 0
2017-10-05T11:15:51.288+0200 I CONTROL  [main] ***** SERVER RESTARTED *****
2017-10-05T11:15:51.308+0200 I CONTROL  [initandlisten] MongoDB starting : pid=29858 port=27017 dbpath=/mongo_data 64-bit host=server-c.dmz.corp
~$ tail -n 1 server-c.mongod.log
2017-10-11T12:09:08.846+0200 I NETWORK  [initandlisten] connection accepted from ...

In this case, this information does not help determine which member was primary. Instead, a stepping down line shows that server-c was the previous primary:

~$ grep "stepping down" server-c.mongod.log
2017-10-11T11:16:43.804+0200 I REPL     [ReplicationExecutor] stepping down from primary, because a new term has begun: 11

Diagnosis

In this example, there are two possible causes for the election:

the network connection between server-c and the other two members failed
the performance of the server-c became so slow it did not respond to requests from the other members for over 10 seconds.

The logs display no evidence of slow commands, but there were transient, one-sided TCP socket exceptions around the time of the election. Thus we can conclude that network failure caused the election.

Mongodb Solutions

Search This Blog

How do I determine the cause of replica set elections?

Election example

Comments

Post a Comment