How does the MMAPv1 freelist work?

Data files contain extents, which in turn contain records. A record is either a document or an index bucket.
When a document is moved or removed, the record it previously occupied is moved to the freelist, where it is considered for reuse. The freelist is not a single list of available records; rather, it is a set of sorted lists, called buckets, that contain records grouped and sorted by size. Thus, when the server requires an available record of a certain size, it does not need to examine the records in smaller buckets. There are 18 buckets, ranging in size from 32 bytes up to the maximum BSON document size of 16MB. The bucket sizes increase by powers of 2 up to 4MB, and everything larger than 4MB goes into the last bucket. In bytes, the bucket boundaries are: 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, 131072, 262144, 524288, 1048576, 2097152, 4194304, and 16777216.
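As a rough illustration of the grouping described above, the following Python sketch maps a record size to a bucket. It is not the server's code: the function name bucket_index is hypothetical, and the sketch assumes that each bucket covers the half-open range between two adjacent boundaries, with the last bucket covering everything from 4MB up to 16MB, which is one reading that yields 18 buckets from the 19 boundaries listed.

```python
from bisect import bisect_right

# Bucket boundaries in bytes, as listed in the text:
# 32, 64, ..., 4194304 (powers of 2), then 16777216.
BUCKET_BOUNDARIES = [32 << i for i in range(18)] + [16777216]


def bucket_index(size):
    """Return the index (0-17) of the bucket assumed to hold a record of `size` bytes.

    Assumption: bucket i covers [BUCKET_BOUNDARIES[i], BUCKET_BOUNDARIES[i + 1]),
    and the last bucket also includes the 16MB upper limit itself.
    """
    if size < BUCKET_BOUNDARIES[0] or size > BUCKET_BOUNDARIES[-1]:
        raise ValueError("size outside the range covered by the buckets")
    # bisect_right counts the boundaries that are <= size; subtracting 1
    # gives the index of the lower boundary of the matching range.
    index = bisect_right(BUCKET_BOUNDARIES, size) - 1
    # Clamp so that a size of exactly 16MB still lands in the last bucket.
    return min(index, len(BUCKET_BOUNDARIES) - 2)
```

Under these assumptions, a request for a 100-byte record would start its search at the bucket covering 64 to 128 bytes and would never look at the 32-byte bucket.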
For performance reasons, after the server has scanned a fixed maximum number of freelist entries per bucket (currently 30), the search jumps to the next bucket if a suitable record has not been found. This prevents spending a long time scanning many entries in a single bucket, especially when the size required is close to the upper limit of the bucket. If all buckets are searched unsuccessfully, then a new extent is allocated.
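The bounded search can be sketched in Python as follows. This is a simplified model, not the server's implementation: the function find_free_record and the representation of each bucket as a plain list of free record sizes are assumptions made for illustration. Only the limit of 30 entries per bucket and the fall-through to the next bucket come from the description above.

```python
MAX_SCANNED_PER_BUCKET = 30  # scan limit per bucket quoted in the text


def find_free_record(buckets, start_bucket, needed_size):
    """Look for a free record of at least `needed_size` bytes.

    `buckets` is a list of lists; each inner list holds the sizes of the
    free records in one bucket (a hypothetical, simplified representation).
    Returns (bucket_index, position) for the first suitable record, or
    None when every bucket was searched unsuccessfully, in which case the
    caller would allocate a new extent.
    """
    for b in range(start_bucket, len(buckets)):
        # Examine at most MAX_SCANNED_PER_BUCKET entries, then move on.
        for pos, size in enumerate(buckets[b][:MAX_SCANNED_PER_BUCKET]):
            if size >= needed_size:
                return b, pos
    return None
```

Note that a suitable record sitting beyond the 30th entry of a bucket is skipped in this model, which mirrors the trade-off described above: some reusable space may be missed in exchange for a bounded search time.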
The frequency with which the freelist is examined gives insight into how much space the server is reusing and how much time it spends attempting to reuse space. In MongoDB 2.6 and above, the db.serverStatus() command provides metrics measuring freelist scans:
  • storage.freelist.search.requests counts the number of times the server has searched the freelist for a record allocation.
  • storage.freelist.search.scanned counts the number of freelist bucket entries that have been examined.
  • storage.freelist.search.bucketExhausted counts the number of times a bucket has been fully searched, requiring advancement to the next bucket.
Using this information you could, for example, find the average number of freelist entries scanned per record allocation request by dividing scanned by requests. Whether this value is small or large relative to the limit of 30 entries per bucket indicates whether the server finds reusable space easily, or whether it must search the freelist extensively before it finds reusable space or allocates a new extent.
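The calculation can be sketched in Python as below. The function name average_scanned_per_request is hypothetical, and the input is assumed to be a dictionary holding the three counters described above, however it was extracted from the db.serverStatus() output. The sample numbers are invented purely to show the arithmetic.

```python
def average_scanned_per_request(search):
    """Return the average number of freelist entries scanned per request.

    `search` is assumed to be a dict with the counters "requests" and
    "scanned". Returns None when no requests have been recorded, to
    avoid dividing by zero.
    """
    requests = search["requests"]
    if requests == 0:
        return None
    return search["scanned"] / requests


# Invented sample values, not real server output.
sample = {"requests": 200, "scanned": 1500, "bucketExhausted": 10}
average = average_scanned_per_request(sample)
```

With these invented values the average is 7.5 entries per request, well under the limit of 30, which would suggest that reusable space is usually found early in a bucket.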
