For years now, I’ve been using mongo’s save() method on a collection. It’s convenient: hand it a document with an id, slam it in, and done. With the C# 2.0 driver (and other drivers as well), it’s now gone!
Will we miss it? Should we miss it? Let’s take a closer look:
First – what is the semantic meaning of “save”? The “save” function provided add-or-replace semantics. If a document with that id existed, it would be overwritten with the new document. If a document with that id did not exist, then the document at hand would become a new document. Seems legit, right?
Consider though, what would happen when a document already existed. It would be gone. Gone in the sense that the new document would overwrite the existing one. I know, I know. We know that! But not everyone catches on to this. Some people have in mind a merge-and-save behavior. A non-existent behavior where save will somehow:
1. Overwrite fields from the new document over any existing ones.
2. Add fields from the new document that didn’t exist before.
3. Leave existing fields in the old document which aren’t present in the new document alone.
Well, effectively, 1 and 2 would actually happen, but 3 will not. And more than one naïve developer would then be surprised to find skimpy documents “missing” previous values. The remedy, of course, is education. But on the other hand, maybe there’s a better way (please read on).
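To make the surprise concrete, here is a minimal in-memory sketch. Plain JavaScript objects stand in for stored documents, and the two helper functions (replaceDocument and mergeDocument) are names I made up for illustration. Neither is part of any driver.

```javascript
// In-memory sketch only: plain objects stand in for stored documents.
const existing = { _id: 1, name: "Bob", email: "bob@example.com", age: 42 };
const incoming = { _id: 1, name: "Robert", phone: "555-0100" };

// What save() effectively did: the incoming document replaces the stored one.
function replaceDocument(oldDoc, newDoc) {
  return { ...newDoc };
}

// What some developers expected (hypothetical behavior): a field-by-field merge.
function mergeDocument(oldDoc, newDoc) {
  return { ...oldDoc, ...newDoc };
}

const replaced = replaceDocument(existing, incoming);
const merged = mergeDocument(existing, incoming);

console.log(replaced); // email and age are gone
console.log(merged);   // email and age survive
```

The replaced result is the “skimpy” document: items 1 and 2 happened, item 3 did not.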
Second – what did save actually do? “It saved it” would be the first inclination. Yes, it did. But how? Turns out that it had a bit of logic behind it. If the new document you hand to save() didn’t have an id field defined, then save would attempt to assign it an id and then simply insert() the document. This depended on an id generator being assumed, present, or inferred. In the shell, an ObjectId() would be assigned. Language drivers had conventions and defaults to cover such scenarios.
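In pseudo code, this would look something like:
if (newDocument._id is undefined) {
    newDocument._id = generateId(); // generateId() is a placeholder, e.g. ObjectId() in the shell
    db.collectionName.insert(newDocument);
}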
If the document did have an id defined, then save() would turn around and execute an update(). That is, it would send an update command to the mongo server with the {upsert: true} option set, using the _id to identify which document to update. If a document with that id did not exist, the document would be created with that _id. That seems fine, right? But here is where things get interesting.
The update command can operate with 2 different interpretations of the “update” part of it.
When the update term is “plain”, Mongo would take the update term and use it as a verbatim document, setting the entire document to that update. Plain means that no fields in the update term started with the dollar sign (“$”). Plain means that the update term did not contain any operators.
If mongo sensed that the update term contained operators, then it would do a surgical update, carrying out only the field updates specified and potentially maintaining the values of fields not mentioned in the update.
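The two interpretations can be sketched in a few lines of plain JavaScript. This is an in-memory model under my own assumptions, not server code: hasOperators and applyUpdate are hypothetical names, and only the $set operator is modeled.

```javascript
// In-memory sketch; only the $set operator is modeled.
function hasOperators(updateTerm) {
  return Object.keys(updateTerm).some((key) => key.startsWith("$"));
}

function applyUpdate(existingDoc, updateTerm) {
  if (!hasOperators(updateTerm)) {
    // Plain update term: it becomes the whole document (the _id is kept).
    return { ...updateTerm, _id: existingDoc._id };
  }
  const unsupported = Object.keys(updateTerm).filter((key) => key !== "$set");
  if (unsupported.length > 0) {
    throw new Error("this sketch only models $set, got: " + unsupported.join(", "));
  }
  // Operator update term: touch only the named fields.
  return { ...existingDoc, ...updateTerm.$set };
}

const stored = { _id: 7, name: "Bob", email: "bob@example.com" };
const afterPlain = applyUpdate(stored, { name: "Robert" });
const afterSet = applyUpdate(stored, { $set: { name: "Robert" } });

console.log(afterPlain); // email is gone
console.log(afterSet);   // email is kept
```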
Since save() used the “plain” mode of update(), any existing document would have been replaced (the update() behavior is described quite well in the MongoDB documentation).
The pseudo code for this would just look like an update, since an id was guaranteed present (otherwise the insert() path would have been chosen), something along the lines of:
db.collectionName.update({_id: newDocument._id}, newDocument, {upsert: true});
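Putting both paths together, here is a rough in-memory model of the whole save() decision. A Map keyed by _id stands in for a collection, and the counter-based id generator is purely my assumption (the shell would assign an ObjectId()). Treat it as a sketch of the logic described above, not as what any driver actually ran.

```javascript
// In-memory sketch: a Map keyed by _id stands in for a collection.
let nextId = 1; // hypothetical id generator; the shell would use ObjectId()

function save(collection, doc) {
  if (doc._id === undefined) {
    // Insert path: assign an id, then insert.
    doc._id = nextId++;
    collection.set(doc._id, { ...doc });
    return "inserted";
  }
  // Update path: plain update with upsert, keyed by _id.
  const existed = collection.has(doc._id);
  collection.set(doc._id, { ...doc });
  return existed ? "replaced" : "upserted";
}

const people = new Map();
const bob = { name: "Bob" };
const first = save(people, bob);                          // no _id: insert path
const second = save(people, { _id: "x", name: "Ann" });   // unknown _id: upsert creates it
const third = save(people, { _id: "x", city: "Oslo" });   // known _id: whole document replaced

console.log(first, second, third);
console.log(people.get("x")); // name is gone, only _id and city remain
```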
Fine then, one might say. But why not just transform the new document into a bunch of $set operators? Well, that’s just not how update works. And even if it did, is this the correct behavior? If a user supplied a document with 3 fields, and the previous document had 5, did the user intend that the new document would contain the 3 new fields and the old 2? Or did the user intend that the new document contain only the new 3 fields?
Deprecation feels a bit like a loss. But the semantic meaning is, in fact, supported, albeit with a different syntax. Consider this C# snippet:
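var person = new Person { Id = "some_id", Name = "Bob" };
// Assumes collection is an IMongoCollection<Person>; replaces the document with that id, inserting it if absent.
await collection.ReplaceOneAsync(p => p.Id == person.Id, person, new UpdateOptions { IsUpsert = true });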
Given a person object with some assigned id, ReplaceOneAsync with IsUpsert = true will carry out the intended save(). The syntax is a bit more elaborate, but the meaning is clear.
The words “replace one” refer to the whole document, not individual fields. This conveys the meaning well. The “upsert” intent is also explicit. When the value is true, the document will be inserted if it doesn’t already exist. When false, the document will only be replaced if it exists. This syntax also has you specify, on your own, the filter that selects which document to update. You can, for instance, filter on a field other than _id.
Theoretically, this gives you the flexibility to not care about the _id at all. Technically, you can express a filter on a field other than _id. But in practice, this will go nowhere fast: the “upserted” document must have some _id. If the filter finds an existing document whose _id doesn’t match the incoming document’s, an error will occur. When we run mongo training courses, questions around these kinds of things arise quite often. Hopefully this sheds a bit of light on the why and how to properly address such concerns.
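The _id mismatch problem is easier to see in a toy model. The sketch below is in-memory JavaScript under my own assumptions: replaceOne here is a hypothetical stand-in for a replace-with-upsert call, an array stands in for the collection, and the error message is invented. It only aims to mirror the rule that a replacement cannot change the _id of the document it matched.

```javascript
// In-memory sketch of replace-one-with-upsert; not the driver's implementation.
function replaceOne(docs, filter, replacement, options = {}) {
  const index = docs.findIndex(filter);
  if (index === -1) {
    if (!options.upsert) return "no match";
    // A real server would also generate an _id here if the replacement had none.
    docs.push({ ...replacement });
    return "upserted";
  }
  const found = docs[index];
  if (replacement._id !== undefined && replacement._id !== found._id) {
    throw new Error("replacement would change the _id of the matched document");
  }
  docs[index] = { ...replacement, _id: found._id };
  return "replaced";
}

const contacts = [{ _id: 1, email: "bob@example.com", name: "Bob" }];
const byEmail = (p) => p.email === "bob@example.com";

// Filtering by email finds Bob, but the incoming document carries a different _id.
let mismatchError = null;
try {
  replaceOne(contacts, byEmail, { _id: 2, email: "bob@example.com", name: "Robert" }, { upsert: true });
} catch (err) {
  mismatchError = err;
}

// With a matching _id the same filter works fine.
const outcome = replaceOne(contacts, byEmail, { _id: 1, email: "bob@example.com", name: "Robert" }, { upsert: true });
console.log(mismatchError !== null, outcome);
```

So a filter on another field only works smoothly when the incoming document agrees with the stored one about the _id, which is why filtering on _id is usually the simplest choice.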
The save() function may be deprecated, but the intended functionality is not. In the new C# driver, you can achieve the same task using ReplaceOneAsync. I like software that says what it does and does what it says!
Developers should do better since things are explicit, and the nuances of save() vs. insert() vs. update() are less of a mystery.