MongoDB - Don't be so (case) sensitive!

2017-06-02

I have a collection of people, named peeps:

db.peeps.insert({UserName: 'BOB'})
db.peeps.insert({UserName: 'bob'})
db.peeps.insert({UserName: 'Bob'})
db.peeps.insert({UserName: 'Sally'})

And I want to be able to find the user named “Bob”. Except I don’t know if the user name is lower, upper or mixed case:

db.peeps.find({UserName:'bob'}).count() // 1 result
db.peeps.find({UserName:'BOB'}).count() // 1 result
db.peeps.find({UserName:'Bob'}).count() // 1 result

But I don’t really want to require my users to type the name the same case as was entered into the database…

db.peeps.find({UserName:/bob/i}).count() // 3 results

Me: Yey, Regex!!!

MongoDB: Ahem! Not so fast… Look at the query plan.

db.peeps.find({UserName:/bob/}).explain()
// ugg, collection scan:
// "winningPlan" : {
//    "stage" : "COLLSCAN", ...

Me: Oh, I’ll create an index!


db.peeps.createIndex({UserName:1});
// query again...
db.peeps.find({UserName:/bob/}).explain()
// Ah! IXSCAN!
// "winningPlan" : {
//      "stage" : "FETCH",
//      "inputStage" : {
//              "stage" : "IXSCAN", ...

// ignore case, still uses index
db.peeps.find({UserName:/bob/i}).explain() 
// Ah! IXSCAN!
// "winningPlan" : {
//      "stage" : "FETCH",
//      "inputStage" : {
//              "stage" : "IXSCAN", ...

Me: Yey!!!

MongoDB: Dude, dig deeper… and don’t forget to left-anchor your query.

// Run explain(true) to get full blown details:
db.peeps.find({UserName:/^bob/i}).explain(true)
// "executionStats" : {
//      "executionSuccess" : true,
//      "nReturned" : 3,
//      "executionTimeMillis" : 0,
//      "totalKeysExamined" : 4, // <<<=== Oy! Each key examined.
//      "totalDocsExamined" : 3,

Me: Yey?

MongoDB: Each key in the index was examined! That’s not scalable… for a million documents, mongo will have to evaluate a million keys.

Me: But, but, but…

db.peeps.find({UserName:/^bob/}).explain(true)
//"executionStats" : {
//    "executionSuccess" : true,
//    "nReturned" : 1,
//    "executionTimeMillis" : 0,
//    "totalKeysExamined" : 1, // <<<=== Ok, 
//    "totalDocsExamined" : 1, // <<<=== Not Ok...

Me: This is back to exact match :-) Only one document returned. I want case insensitive match!

Old MongoDB: ¯\(ツ)/¯… Normalize string case for that field, or add another field where you store a lowercase version just for this comparison, then do an exact match?

Me:

New MongoDB: Dude: Collation!

Me: Oh?

Me: (Googles MongoDB Collation frantically…)

Me: Ahh!

db.peeps.createIndex({UserName:-1}, { collation: { locale: 'en', strength: 2 } )

db.peeps.find({UserName:'bob'}).collation({locale:'en',strength:2}) // 3 results!

// "executionStats" : {
//      "executionSuccess" : true,
//      "nReturned" : 3,
//      "executionTimeMillis" : 0,
//      "totalKeysExamined" : 3, // <<<=== Good! Only matching keys examined
//      "totalDocsExamined" : 3,

Me: Squee!

MongoDB: Indeed.

Collation is a very welcome addition to MongoDB.

You can set Collation on a whole collection, or use it in specific indexing strategies.

The main pain point it solves for me is the case-insensitive string match, which previously required either changing the schema just for that (ick!), or using regex (index supported, but not nearly as efficient as exact match).

Beyond case-sensitivity, collation also addresses character variants, diacritics, and sorting concerns. This is a very important addition to the engine, and critical for wide adoption in many languages.

Check out the docs: Collation

+N Consulting, Inc.

MongoDB - Don't be so (case) sensitive!

Notice

+N Consulting, Inc.

Newsletter

Contact Us Now

Recents

Notice