The aggregation framework has a slew of new operators and pipeline improvements. One notable improvement is the more robust $unwind
pipeline stage.
One of the main motivations for using Mongo is its flexible document model. Any document in a collection can have arbitrarily different fields and data types in fields. This is great for many reasons. But when it came to aggregating array elements it posed a few problems. Unwinding an array is done with the $unwind
pipeline stage, but you could only specify the field to unwind.
Problem one was that the variety across documents created for non-existent arrays. Consider a cake collection, containing documents about cakes.
> db.cakes.find() |
Pound cake has a recipe field, containing ingredients. Brownies are defined in a different document schema, listing the main constituents of the brownie in a makeup field.
If we wanted to create a listing of possible cakes by ingredient, we’d write something like:
db.cakes.aggregate([ |
But this will skip brownies because brownies don’t have field recipe at all. We’d have to synthetically add a recipe field in order to fix that like so:
db.cakes.aggregate([ |
Adding an array with the single element ‘-‘ (or whatever token you want) was necessary because $unwind
insisted that the field being unwound would both be an array type and contain an element. No element, not an array? $unwind
didn’t emit the document at all.
But with 3.2, $unwind
would allow you to emit a document placeholder even if there is no field or the field contains null. You activate this option by adding a second argument to $unwind
named **preserveNullAndEmptyArrays ** with a value of true
or false
. If you don’t specify the extra argument, then no document is emitted for null or empty array fields. This allows us the more concise expression
db.cakes.aggregate([ |
Using preserveNullAndEmptyArrays
will emit documents that have either null value, or an empty array. The above aggregation example will produce a result with _id null for all cakes that don’t have a recipe field.
Nice.
But variety doesn’t stop at existence or non-existence of fields. What about a field that contains an array in some documents, but is a straight out string in another?
Consider these 3 cakes:
{ "_id" : "princess", "makeup" : [ "sponge", "jam", "sponge", "custard", "sponge", "whipped-cream", "marzipan" ] } |
The first two contain an array for their multi-part “cakiness” in the makeup field. But the brownie, well.. it’s made out of brownie! The value of the makeup field there is just a string.
$unwind
used to error out on this condition. If it encountered any document who’s field was not an underlying BSON type of array it would halt the pipeline and throw an error. But not anymore!
Running a straightforward aggregation:
db.cakes.aggregate([ |
Is perfectly acceptable in the new unwinding world. When $unwind
sees a field with a non-array value, it treats it as an array with that single value, and emits a single document containing that value. No more error. Since in the past such data condition would have produced an error, your legacy code should largely work OK under upgrade. If it didn’t error then this condition wasn’t present – presumably clean data or you took the time to fashion fancy $match
clause or expressions that prevented single-valued fields from entering the $unwind
stage.
Nice.
This feature adds nuances to a common scenario. Initially, a document schema contains only a single value because it is version 0.9 of the software, and the use case was simple. Later on, we discover that a single value doesn’t cover the future feature and convert a field to an array. But then we may have some limbo time when some documents are saved with one schema, and others with another. Either way, we can now aggregate across that field and handle single-valued fields as an array with single a value without the need for elaborate $project of $match.
Oh, and one more thing:
We may have a need to figure out how cake composition varies across cakes. For that, I want to aggregate around the ordinal position of a component across cakes. I’d like to have an idea across the layer 1’s, layer 2’s etc what’s the amount of variation. To assist with that, $unwind
has a new option to emit the array offset alongside the value.
> db.cakes.aggregate([ |
Which lets us see that our cakes thus far have sponge layers and a variety of fillings in between for the first 4 components (offsets 0 - 4)
When we evaluate our customer schema and use cases, we also spend time ensuring that we can produce requisite (and sometimes intricate) queries using available Mongo server syntax. The 3.2 release includes an improved $unwind
and a bevy of other operators which helps us cover more use cases in a more compact and efficient way.