MongoDB Sharding & Chunks

If you want to scale your reads or writes using MongoDB, you are going to use sharding.

The nice thing about sharding, is auto-sharding. Mongo doesn’t only distribute the data across nodes for you, it automatically splits collection data and keeps the nodes fairly equally loaded with data. The core mechanism supporting this arrangement is the chunk.

So, what are chunks anyway?

When you want to split data in a collection horizontally, you need a way to track which documents live where. When you define a sharded cluster you pick a sharding key. That sharding key is the data value which is used from each document when deciding on which physical shard that document shall be located. To keep track of all the documents, the config servers need to have some way to persist the shard association. If we kept each shard key value pointing to a shard for each document, it wouldn’t scale very well. A billion documents will require a billion entries, making the config servers work hard and consume lots of memory and I/O. If we defined a hash function that takes the key and distributes the key-value space across some nodes, we wouldn’t need to save each key value, and could predict the shard a document lives on by applying the function only. The trouble with that is that if we added or wanted to remove shards, we’d have a hard time balancing or re-balancing documents across available nodes, as the original function wouldn’t have considered a different amount of nodes. Further, a formulaic key to shard association would not necessarily result in even load on shards. Although a key space (like an integer) may be evenly round-robin’ed across shards (think Mod(x) where x is number of shards), the documents in reality may occur in clumps that end up on one shard vs. the theoretical key-space distribution. No, these methods have severe challenges. What Mongo does instead is define chunks.

A chunk is simply a small record that describes a key range, and the shard that key range is associated to. By having a chunk describe a key range rather than each discrete key, we get around the issue of storing every shard key value found. Although chunks are stored on the config servers, the storage size is much reduced compared to storing each key, since ranges can/would encompass a large number of keys in one small descriptor.

Initially, a chunk is created which encompasses all possible keys. For a sharded collection with a shard key “score” this may look like:

{ "score" : { "$minKey" : 1 } } -->> { "score" : {"$maxKey":1} } on : shard0000

This descriptor states that any document with any score value (from theoretical minimum to maximum) will be located on the shard named “shard0000”.

Once you start populating the collection with documents, Mongo will start splitting chunks and migrating them to other shards in an attempt to keep data evenly spread across the shards. This action takes place when Mongo sees a chunk containing 64MB worth of data. Once a chunk is split, Mongo will move the one of the two chunks to another shard. It will copy all the documents which fall into the chunk’s range over to the new shard, update the config servers to point the chunk to the new shard, and finally clean up the documents from the old shard. All this work is automatic and baked into the sharding mechanism for Mongo.

Chunks are split based on actual seen key values. Mongo computes a splitting point on the shard key based on the actual keys in the shard. The nice thing about this is that given an uneven distribution along a key-space, the splitting will follow the density and reduce it. Let’s say you have a shard key on a credit score of your customers. The theoretical range is 0 - 850. But in reality, your seen scores - the actual scores - are such that the majority are between 600 and 750, with a few outliers below 650, and a few above. The chunk-split progression may look something like:

// one chunk, covering all
{ "score" : { "$minKey" : 1 } } -->> { "score" : {"$maxKey":1} }
// then 2 chunks,
{ "score" : { "$minKey" : 1 } } -->> { "score" : 0 }
{ "score" : 0 } -->> { "score" : {"$maxKey":1} }
// then the upper chunk will split
{ "score" : { "$minKey" : 1 } } -->> { "score" : 0 }
{ "score" : 0 } -->> { "score" : 419 }
{ "score" : 419 } -->> { "score" : {"$maxKey":1} }
// since most scores fall above 419, that chunk will again split given enough data
{ "score" : { "$minKey" : 1 } } -->> { "score" : 0 }
{ "score" : 0 } -->> { "score" : 419 }
{ "score" : 419 } -->> 639 }
{ "score" : 639 } -->> { "score" : {"$maxKey":1} }

This is all data and insertion order related. Based on the addition of documents at a point in time, it may grow and need to split. Chunks remaining small enough are left alone. Over time, and given a good enough shard key, the data will be roughly evenly split across the shards. This mechanism does require active management on Mongo’s part as opposed to fixed hash function over the number of shards. But it has the advantage of allowing you to grow the cluster with ease. If you add another shard to the cluster, Mongo can migrate chunks (without splitting) draining some load off existing shards. Another issue with static hash functions is that it relies on theoretical values, not actual. Imagine all scores are even for some reason, and that your hash function is simply odd/even to place documents on 2 shards. All the even documents will end up on one shard and the other shard for odd numbers will be empty. Not so with Mongo’s chunk mechanism: Mongo will split chunks as they become full, and will move chunks to the least full shard at that time. It’s a dynamic thing. It’s easy to figure out the data load because you can just count chunks. Sure, there might be some pretty empty chunks, but the imbalance over a large enough dataset is negligible.

Mongo takes care of leveling the data size load on each shard. But what it can’t do is level the query load or write load on the shards. Given 3 shards - call them A, B, and C - Mongo doesn’t know if your next 3 documents will end up all on A, or go to B and C. It’s up to you. You must pick a shard key that will have such distribution properties. This is one of the tougher tasks in designing a sharded cluster. The effects of picking a good key can make or break your cluster. (And no, the “score” field is not a good shard key by any means). The principle considerations for picking the shard key are:

  1. We want writes to be spread across shards as evenly as possible
  2. We want reads to be as local as possible
  3. We want chunks to always be splitable

We’re sharding because we want to realize more I/O and processing bandwidth using more servers. Writes are I/O intensive, and therefore spreading the writes across shards evenly will give us the best write-performance. A bad shard key will cause consecutive document write load to go to one of the shards, leaving others idle.

mongos will route queries to the shards it thinks the documents are on. It is capable of clubbing together query results from multiple shards and hand them over to you as on result set. But if your queries can be served by one shard only, the rest of them will be free to answer more/other queries in parallel. Query locality has to do with picking a shard key that is likely present in your queries, and for which the documents largely or wholly live on one or few shards.

A source of imbalance can arise from a giant chunk. A chunk will grow beyond 64MB if Mongo can’t split it. This happens when the shard key is such that a chunk fills up, and all documents in it have the same shard key value. The score value in our previous example is susceptible to this. Over time, many customers may have a score of 720 let’s say. But once a chunk contains all of those, Mongo won’t be able to split it because it won’t find a center point between 720 and 720… That’s bad. It creates for a giant chunk, and the situation just worsens when more and more documents have that key value because they all end up on the same shard. More load will go to that shard relative to others because mongo can’t split that chunk and won’t move it either (balancing the load is based on chunk count). So it really pays to pick a key that has ample range in reality, with your expected documents. An integer such as score has a huge theoretical range, but when used for this problem domain of score it doesn’t.

Chunks are the basis of the sharding mechanism in Mongo. They offer some clear benefits over other distribution mechanisms. There are even more advanced scenarios supported by chunks, such as tag-aware sharding. As an administrator, you can do a few things to affect chunk splitting and migration, taking over some control when needed. But sharding is largely automatic and self managing, letting you focus on writing and deploying your awesome apps.

For a video tutorial on configuring sharding and much more, see this Pluralsight.com course).