Node Mongoose Demo Code to ignore

There are lots of “getting started” tutorials out there. Some are great, but some, well, shall we say “sub-optimal”?

When using Mongoose, you get an entity-centric model to work with, and very often it becomes the basis for a RESTful API. The verb mapping typically rips through POST, GET (list), GET (one by _id) and DELETE with no problem. PUT, though, is trickier. Generally understood simply as an “update”, PUT can get you into all sorts of funkiness depending on how you implement it.

The code to ignore generally looks something like this (error checking removed for brevity):

```javascript
var mongoose = require('mongoose');
var Schema = mongoose.Schema;

// define schema for Animal
var AnimalSchema = new Schema({
  name: String,
  isCute: Boolean
});
module.exports = mongoose.model('Animal', AnimalSchema);

/// then wire it up in some other place
var router = require('express').Router();
var Animal = mongoose.model('Animal'); // look up the model registered above

router.route('/animal/:animal_id')
  .put(function (req, res) {
    // dig up existing entity
    Animal.findById(req.params.animal_id, function (err, existing) {
      // update the field
      existing.name = req.body.name;
      // save the animal
      existing.save(function (err) {
        res.json({ message: 'Yey!' });
      });
    });
  });
```

TL;DR;

Don’t fetch an entity in order to update it.

Why? Performance, data loss and concurrency.

Let’s talk performance first. The code above makes two calls to the Mongo server, Animal.findById() and then .save(), to perform a single logical operation. This is wasteful. Mongo has explicit update syntax that lets you change a document in one round trip instead of two. Even if each call is fast, doing the work twice limits throughput at scale and consumes more resources on both the MongoDB side and the Node application side. It also raises the chance of failure, since there are now two operations that can go wrong. So how do you do an update? Here’s an example:

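```javascript
router.route('/animal/:animal_id')
  .put(function (req, res) {
    // Query term: match by _id. Update term: $set only the name field.
    // Model.update() is from older Mongoose releases; newer ones use updateOne().
    Animal.update(
      { _id: req.params.animal_id },
      { $set: { name: req.body.name } },
      function (err, updateResult) {
        res.json(updateResult);
      });
  });
```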

We are shipping the work of finding the right document and updating a field within it to the Mongo server, which saves us a round trip. Because only the command is shipped, the server doesn’t need to send us the entire document over the network only to have us send the same document (modified) back again. The larger the document, the bigger the drag on resources if you don’t use the update syntax.
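To make the round-trip and payload argument concrete, here is a toy simulation in plain Node.js. It is not Mongoose or the MongoDB driver: FakeServer is a hypothetical stand-in that counts one round trip per call and uses JSON byte length as a rough proxy for the BSON that actually travels over the wire, so the numbers only illustrate how the two approaches scale, not real measurements.

```javascript
// Toy model of a MongoDB server (NOT Mongoose): every call counts as one
// round trip, and JSON byte length stands in for the BSON sent over the wire.
function bytes(value) {
  return Buffer.byteLength(JSON.stringify(value), 'utf8');
}

class FakeServer {
  constructor(doc) {
    this.doc = doc;
    this.roundTrips = 0;
    this.bytesOnWire = 0;
  }
  // Read: the query goes up, the whole document comes back down.
  findById(id) {
    this.roundTrips += 1;
    this.bytesOnWire += bytes({ _id: id }) + bytes(this.doc);
    return { ...this.doc };
  }
  // Save: the whole (modified) document goes back up.
  save(doc) {
    this.roundTrips += 1;
    this.bytesOnWire += bytes(doc);
    this.doc = { ...doc };
  }
  // Update: only the query term and the $set term are sent.
  update(filter, update) {
    this.roundTrips += 1;
    this.bytesOnWire += bytes(filter) + bytes(update);
    Object.assign(this.doc, update.$set);
  }
}

// A hypothetical document with a large field that never changes.
const makeDoc = () => ({ _id: 1, name: 'Fido', isCute: false, notes: 'x'.repeat(10000) });

// Read-then-save
const a = new FakeServer(makeDoc());
const existing = a.findById(1);
existing.name = 'Rex';
a.save(existing);

// Update
const b = new FakeServer(makeDoc());
b.update({ _id: 1 }, { $set: { name: 'Rex' } });

console.log(a.roundTrips, a.bytesOnWire); // 2 round trips, roughly 20 KB
console.log(b.roundTrips, b.bytesOnWire); // 1 round trip, a few dozen bytes
```

Both paths end with the animal named Rex; the read-then-save path simply pays for the unchanged notes field twice.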

update() takes a query term as its first argument and an update term as its second. The query term here is just _id equality, so we know the lookup will be fast: the _id field is always indexed. The update term is also simple: we set the name field to the value sent by the API client.
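As a rough mental model of those two arguments, the sketch below applies a query term and a $set term to a small in-memory array. It is a simplified illustration, not how MongoDB is implemented: the update function name is hypothetical, and dotted paths and other update operators are deliberately left out.

```javascript
// Simplified model of update(query, { $set }) against a small collection:
// the query term (_id equality) picks the document, and the $set term
// overwrites only the listed fields. Other documents are returned untouched.
function update(collection, query, updateTerm) {
  return collection.map((doc) =>
    doc._id === query._id ? { ...doc, ...updateTerm.$set } : doc
  );
}

const animals = [
  { _id: 1, name: 'Fido', isCute: false },
  { _id: 2, name: 'Tom', isCute: true },
];

const updated = update(animals, { _id: 1 }, { $set: { name: 'Rex' } });
// Only document 1 changes, and only its name field; isCute keeps its value.
console.log(updated);
```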

TL;DR; – use the update() function!

What about the potential for data loss? Imagine two clients trying to update the same animal at roughly the same time: one wants to change only the isCute field, and the other wants to change only the name field. To support that, someone might modify the original code to look like this:

```javascript
router.route('/animal/:animal_id')
  .put(function (req, res) {
    // dig up existing entity
    Animal.findById(req.params.animal_id, function (err, existing) {
      // only overwrite fields the client actually sent
      // (!== undefined, so a submitted isCute: false is not skipped)
      if (req.body.name !== undefined) existing.name = req.body.name;
      if (req.body.isCute !== undefined) existing.isCute = req.body.isCute;
      // save the animal
      existing.save(function (err) {
        res.json({ message: 'Yey!' });
      });
    });
  });
```

Here the “improved” code first checks whether the client submitted a value for name or isCute at all, and replaces a field only if the caller supplied a value for it. It seems like it should work. But there is still a chance of data loss.

Let’s say the animal is currently {_id: 1, name: 'Fido', isCute: false}.

  1. Client A reads the animal and gets {_id: 1, name: 'Fido', isCute: false}.
  2. Client B reads the animal and gets {_id: 1, name: 'Fido', isCute: false}.
  3. Client A updates her in-memory copy, renaming the animal to Rex: {_id: 1, name: 'Rex', isCute: false}.
  4. Client B updates his in-memory copy, making the animal cute: {_id: 1, name: 'Fido', isCute: true}.
  5. Client A saves her in-memory object to Mongo. Mongo now has {_id: 1, name: 'Rex', isCute: false}.
  6. Client B saves his in-memory object to Mongo. Mongo now has {_id: 1, name: 'Fido', isCute: true}.

After both are done, we would have expected to see {_id: 1, name: 'Rex', isCute: true}, but that is not what is stored. Client B overwrote A’s update. Worse, client A has no idea that her rename from Fido to Rex failed. In fact, it even succeeded for the short window between steps 5 and 6, but the change is nonetheless lost.
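The interleaving above can be replayed with a plain in-memory object. This is a hedged sketch, not Mongoose: the db object and the read and saveWhole helpers are hypothetical, and it assumes each save writes the client’s whole in-memory copy back (a full-document replace), which is the situation the steps describe.

```javascript
// Toy in-memory store (NOT Mongoose). Assumption: each save writes the
// client's whole in-memory copy back, i.e. a full-document replace.
const db = { 1: { _id: 1, name: 'Fido', isCute: false } };

const read = (id) => ({ ...db[id] });                     // client gets a copy
const saveWhole = (doc) => { db[doc._id] = { ...doc }; }; // overwrite it all

const copyA = read(1);   // step 1: A reads Fido, not cute
const copyB = read(1);   // step 2: B reads the same thing
copyA.name = 'Rex';      // step 3: A renames in memory
copyB.isCute = true;     // step 4: B marks cute in memory
saveWhole(copyA);        // step 5
const afterStep5 = { ...db[1] };
saveWhole(copyB);        // step 6: B's stale name overwrites A's rename

console.log(afterStep5); // { _id: 1, name: 'Rex', isCute: false }
console.log(db[1]);      // { _id: 1, name: 'Fido', isCute: true }
```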

What should be done? You guessed it: update!

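```javascript
router.route('/animal/:animal_id')
  .put(function (req, res) {
    // collect only the fields the client actually sent
    var values = {};
    if (req.body.name !== undefined) values.name = req.body.name;
    if (req.body.isCute !== undefined) values.isCute = req.body.isCute;
    // let the server $set just those fields on the matching document
    Animal.update(
      { _id: req.params.animal_id },
      { $set: values },
      function (err, updateResult) {
        res.json(updateResult);
      });
  });
```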

Here we apply the same logic to “touch” only the fields the API client submitted. This time, however, we are telling Mongo to update the document and touch only the submitted fields, so the other field (not submitted) will not be affected!
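The “only the fields that were submitted” step can be pulled out into a small helper. The buildSet name and its allowedFields whitelist are hypothetical, not part of Mongoose; the sketch checks for undefined rather than truthiness, so a submitted isCute: false still makes it into the $set term.

```javascript
// Hypothetical helper: build a $set term from only the whitelisted fields
// the client actually sent. Checking !== undefined (not truthiness) keeps
// legitimate falsy values such as isCute: false.
function buildSet(body, allowedFields) {
  const values = {};
  for (const field of allowedFields) {
    if (body[field] !== undefined) values[field] = body[field];
  }
  return { $set: values };
}

// Client B from the example: only isCute is sent, and it is false.
const setB = buildSet({ isCute: false }, ['name', 'isCute']);
// Unknown fields are ignored; only name survives.
const setA = buildSet({ name: 'Rex', color: 'brown' }, ['name', 'isCute']);
```

The result would be passed as the second argument to the update call, in place of the hand-built { $set: values }.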

The sequence then becomes:

  1. Client A sends an update to Mongo: {_id: 1}, {$set: {name: 'Rex'}}
  2. Client B sends an update to Mongo: {_id: 1}, {$set: {isCute: true}}

Because the Mongo server applies each update atomically to the stored document, the result is that the animal is named Rex and isCute is true. It doesn’t matter whether 1 or 2 arrives first: each update touches a different field, so they won’t step on each other.
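A quick way to convince yourself that the order doesn’t matter is to apply the two $set terms in both orders. The applySet function is a hypothetical, simplified model of $set on a single document (no dotted paths, no other operators), not MongoDB’s implementation.

```javascript
// Simplified model of $set applied to one stored document: listed fields
// are overwritten, all other fields are kept.
function applySet(doc, set) {
  return { ...doc, ...set };
}

const start = { _id: 1, name: 'Fido', isCute: false };
const updateA = { name: 'Rex' };   // client A's $set term
const updateB = { isCute: true };  // client B's $set term

const aThenB = applySet(applySet(start, updateA), updateB);
const bThenA = applySet(applySet(start, updateB), updateA);

console.log(aThenB); // { _id: 1, name: 'Rex', isCute: true }
console.log(bThenA); // { _id: 1, name: 'Rex', isCute: true }
```

This only holds because the two terms touch different fields; two clients setting the same field would still race, with the last write winning.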

There are probably plugins or middleware that help build update() calls correctly, but I wanted to make the principle and the reasons clear here. Also, if you are building a REST API, consider distinguishing PUT from PATCH: a PUT might replace the whole entity with the submitted value alone (destructively, not field by field), while a PATCH touches only the parts of the entity it specifies. Whichever path you choose, take care not to expose yourself to the performance cost and data-loss potential of a read-then-save cycle.
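One way to keep that PUT/PATCH distinction explicit is to translate each verb into a different kind of server-side operation before touching the database. The toOperation helper and its return shape below are hypothetical; in recent Mongoose versions the two cases would correspond roughly to Model.replaceOne() and Model.updateOne(), but this sketch only builds the operation descriptions and does not call Mongoose.

```javascript
// Hypothetical mapping from HTTP verb to a MongoDB-style operation.
// PUT: the submitted body becomes the whole document (a replacement).
// PATCH: only the submitted fields are $set; everything else is untouched.
function toOperation(method, id, body) {
  const filter = { _id: id };
  if (method === 'PUT') {
    // keep the route's _id no matter what the body says
    return { kind: 'replace', filter, replacement: { ...body, _id: id } };
  }
  if (method === 'PATCH') {
    const values = {};
    for (const [key, value] of Object.entries(body)) {
      if (key !== '_id' && value !== undefined) values[key] = value;
    }
    return { kind: 'update', filter, update: { $set: values } };
  }
  throw new Error(`Unsupported method: ${method}`);
}

// PUT {name: 'Rex'} drops isCute entirely; PATCH {name: 'Rex'} leaves it alone.
const put = toOperation('PUT', 1, { name: 'Rex' });
const patch = toOperation('PATCH', 1, { name: 'Rex' });
```

Either way, the server receives a single command, so neither path goes through a read-then-save cycle.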