A capped collection behaves like a first-in-first-out queue. It stays at a fixed size, and the oldest documents are aged out to make room for new ones. The implementation of capped collections varies by storage engine.
The most common example of a capped collection is a replica set oplog. Unlike a regular capped collection, the oplog does not have an _id index, and you cannot create indexes on it. Instead, the oplog is accessed using the timestamps given in the ts field.
This article describes the capped collection implementation in WiredTiger and a specific optimization for the replication oplog.
Oplog capped collections
In a regular capped collection using the WiredTiger storage engine, once the collection size limit is reached some of the oldest documents are deleted synchronously as new documents are inserted into the collection.
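The synchronous behaviour described above can be pictured with a small toy model. This is only a sketch and not WiredTiger's implementation: the class name ToyCappedCollection, the max_bytes parameter, and the per-document sizes are all made up for illustration. The point it shows is that the eviction of old documents happens inside the insert call itself.

```python
from collections import deque


class ToyCappedCollection:
    """Toy size-capped FIFO collection (illustrative, not WiredTiger code)."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.total_bytes = 0
        self.docs = deque()  # oldest document sits on the left

    def insert(self, doc, size):
        # Add the new document first, then evict from the old end
        # until the collection fits under the cap again. The eviction
        # runs in the same call as the insert, i.e. synchronously.
        self.docs.append((doc, size))
        self.total_bytes += size
        removed = []
        while self.total_bytes > self.max_bytes and len(self.docs) > 1:
            old_doc, old_size = self.docs.popleft()
            self.total_bytes -= old_size
            removed.append(old_doc)
        return removed


coll = ToyCappedCollection(max_bytes=100)
for i in range(5):
    evicted = coll.insert({"n": i}, size=40)
    print(i, "evicted:", [d["n"] for d in evicted])
```

With a 100-byte cap and 40-byte documents, the third and later inserts each push out the oldest remaining document before returning.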
An oplog uses a dedicated background thread to perform capped collection deletion. The thread is woken as required and is not time-based like a time-to-live index. There is additional logic that enables fast truncation of the oplog and efficient removal of old records by recording milestones, also known as oplog stones.
The stones represent logical markers against the oplog that are used as truncation points. When a record is inserted, its size is added to the stone being filled. If the size of the stone exceeds the threshold, then a new stone is created. If the number of stones exceeds the threshold (between 10 and 100, based on the size of the oplog), then a background thread truncates the records that are contained within the oldest stone.
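The bookkeeping described above can be sketched roughly as follows. This is a toy model under assumed names (ToyOplogStones, min_bytes_per_stone, max_stones, reclaim), not the server's actual code, and the reclaim method simply stands in for the work the background thread would do.

```python
from collections import deque


class ToyOplogStones:
    """Toy model of oplog stones; names and thresholds are illustrative."""

    def __init__(self, min_bytes_per_stone, max_stones):
        self.min_bytes_per_stone = min_bytes_per_stone
        self.max_stones = max_stones
        self.stones = deque()        # full stones, oldest first: (last_ts, bytes)
        self.filling_bytes = 0       # bytes in the stone currently being filled
        self.truncated_up_to = None  # ts of the last record truncated so far

    def on_insert(self, ts, size):
        # Each inserted record adds its size to the stone being filled.
        self.filling_bytes += size
        if self.filling_bytes > self.min_bytes_per_stone:
            # The stone is full: record its last timestamp as a
            # truncation point and start filling a new stone.
            self.stones.append((ts, self.filling_bytes))
            self.filling_bytes = 0

    def reclaim(self):
        # Stand-in for the background thread: while there are too many
        # stones, truncate everything up to the end of the oldest one.
        while len(self.stones) > self.max_stones:
            last_ts, _ = self.stones.popleft()
            self.truncated_up_to = last_ts


stones = ToyOplogStones(min_bytes_per_stone=100, max_stones=3)
for ts in range(1, 11):
    stones.on_insert(ts, size=60)
    stones.reclaim()
print(list(stones.stones), stones.truncated_up_to)
```

Because a whole stone is dropped at once, truncation is a single range operation up to a known timestamp rather than a document-by-document delete.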
Oplog stones are not persisted, so new stones are chosen at startup based on the records in the oplog. For small oplogs or those containing few records, the entire oplog is scanned and the number of stones required is computed by packing records into a stone until the threshold is exceeded.
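For the small-oplog case, the startup scan amounts to a single pass that packs records into stones. The function below is a hypothetical sketch (stones_from_scan and its (ts, size) record format are assumptions for illustration), using the same "exceeds the threshold" rule as above.

```python
def stones_from_scan(records, min_bytes_per_stone):
    """Rebuild stones by scanning (ts, size) records in oplog order.

    Returns the list of full stones as (last_ts, bytes) plus the number
    of bytes left over in the stone that is still being filled.
    """
    stones = []
    filling = 0
    for ts, size in records:
        filling += size
        if filling > min_bytes_per_stone:
            # Threshold exceeded: close this stone at the current record.
            stones.append((ts, filling))
            filling = 0
    return stones, filling


records = [(ts, 30) for ts in range(1, 11)]  # ten 30-byte records
full_stones, leftover = stones_from_scan(records, min_bytes_per_stone=100)
print(full_stones, leftover)
```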
For larger oplogs or those with many records (>20,000), records are oversampled (by a factor of 10) at random from the oplog. Samples are then chosen such that they are expected to be near the right boundary of the logical section. As the oplog is truncated, the error in this estimation is reduced because the actual size of newly created stones is known with greater certainty.
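One way to picture the sampling approach is the sketch below. It is an approximation under stated assumptions, not the server's algorithm: the function name stones_from_sampling, sampling with replacement from an in-memory list of timestamps, and assigning every stone the same estimated size are all simplifications chosen for illustration. The idea it shows is that after drawing 10 samples per stone and sorting them, every 10th sample is expected to fall near the right boundary of its section.

```python
import random


def stones_from_sampling(timestamps, total_bytes, min_bytes_per_stone,
                         oversample=10, seed=0):
    """Estimate stone boundaries from random samples (illustrative only)."""
    num_stones = total_bytes // min_bytes_per_stone
    if num_stones == 0:
        return []
    rng = random.Random(seed)
    # Draw `oversample` random records per stone, then sort by timestamp.
    samples = sorted(rng.choice(timestamps)
                     for _ in range(num_stones * oversample))
    # Each stone's size is only an estimate: the total split evenly.
    est_bytes = total_bytes / num_stones
    # Every `oversample`-th sorted sample approximates the right
    # boundary of one logical section of the oplog.
    return [(samples[(i + 1) * oversample - 1], est_bytes)
            for i in range(num_stones)]


timestamps = list(range(1, 50001))       # 50,000 records
estimated = stones_from_sampling(timestamps,
                                 total_bytes=50000 * 100,
                                 min_bytes_per_stone=500000)
print(len(estimated), "estimated stones")
```

The boundaries are approximate at startup, which matches the point above: as these estimated stones are truncated and replaced by stones filled from real inserts, the sizes become exact.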
Removal of the documents: