I have a collection of people, named peeps:
db.peeps.insert({UserName: 'BOB'})
And I want to be able to find the user named “Bob”. Except I don’t know if the user name is lower, upper or mixed case:
db.peeps.find({UserName:'bob'}).count() // 1 result
But I don’t really want to require my users to type the name in the same case it was entered into the database…
db.peeps.find({UserName:/bob/i}).count() // 3 results
Me: Yey, Regex!!!
MongoDB: Ahem! Not so fast… Look at the query plan.
db.peeps.find({UserName:/bob/}).explain()
Me: Oh, I’ll create an index!
Me: Yey!!!
MongoDB: Dude, dig deeper… and don’t forget to left-anchor your query.
// Run explain(true) to get full blown details:
Me: Yey?
MongoDB: Each key in the index was examined! That’s not scalable… for a million documents, mongo will have to evaluate a million keys.
Me: But, but, but…
db.peeps.find({UserName:/^bob/}).explain(true)
Me: This is back to exact match :-) Only one document returned. I want case insensitive match!
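Why left-anchoring helps: a case-sensitive prefix pattern like /^bob/ lets the server turn the regex into a range over the sorted index keys, while /bob/i gives it no such bounds. Here is a toy model of that idea in plain JavaScript. It is not MongoDB's actual implementation; the `keys` array and the `scanAll` and `scanPrefix` helpers are made up for illustration.

```javascript
// Toy model of an index: a sorted array of keys (hypothetical data).
const keys = ['ALICE', 'BOB', 'Bob', 'alice', 'bob', 'bobby', 'carol'].sort();

// Unanchored or case-insensitive regex: every key has to be tested.
function scanAll(sortedKeys, regex) {
  let examined = 0;
  const matches = sortedKeys.filter((k) => {
    examined++;
    return regex.test(k);
  });
  return { matches, examined };
}

// Case-sensitive prefix: binary-search to the first key >= prefix,
// then walk forward only while keys still start with the prefix.
function scanPrefix(sortedKeys, prefix) {
  let lo = 0;
  let hi = sortedKeys.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sortedKeys[mid] < prefix) lo = mid + 1;
    else hi = mid;
  }
  const matches = [];
  let examined = 0;
  for (let i = lo; i < sortedKeys.length; i++) {
    examined++;
    if (!sortedKeys[i].startsWith(prefix)) break;
    matches.push(sortedKeys[i]);
  }
  return { matches, examined };
}

console.log(scanAll(keys, /bob/i));     // looks at all 7 keys
console.log(scanPrefix(keys, 'bob'));   // looks at 3 keys
```

The prefix scan stays cheap as the index grows, but it only finds keys that start with the exact bytes "bob", which is why the case-insensitive requirement is still unmet.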
Old MongoDB: ¯\_(ツ)_/¯… Normalize string case for that field, or add another field where you store a lowercase version just for this comparison, then do an exact match?
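For reference, the "extra lowercase field" workaround could look roughly like the sketch below. The field name `UserNameLower` and both helper functions are assumptions made up for this example, not part of the original schema.

```javascript
// Store a lowercase copy next to the original value.
// `UserNameLower` is an assumed field name chosen for this sketch.
function withLowercaseName(doc) {
  return Object.assign({}, doc, { UserNameLower: doc.UserName.toLowerCase() });
}

// Build an exact-match filter from whatever case the user typed.
function nameFilter(input) {
  return { UserNameLower: input.toLowerCase() };
}

// In the mongo shell this would be used roughly like:
//   db.peeps.insert(withLowercaseName({ UserName: 'BOB' }))
//   db.peeps.createIndex({ UserNameLower: 1 })
//   db.peeps.find(nameFilter('Bob'))
console.log(withLowercaseName({ UserName: 'BOB' }));
console.log(nameFilter('Bob'));
```

It works, and the exact match is index-friendly, but every writer has to remember to keep the two fields in sync.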
Me:
New MongoDB: Dude: Collation!
Me: Oh?
Me: (Googles MongoDB Collation frantically…)
Me: Ahh!
db.peeps.createIndex({UserName:-1}, { collation: { locale: 'en', strength: 2 } })
Me: Squee!
MongoDB: Indeed.
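One detail worth spelling out: an index built with a collation is only used for string comparisons when the query asks for the same collation. A minimal sketch, assuming the `peeps` collection and `UserName` field from above; the `findPeep` helper and the `caseInsensitive` constant are hypothetical names, and `db` is passed in rather than taken from the shell's global.

```javascript
// Same collation document as the index definition.
const caseInsensitive = { locale: 'en', strength: 2 };

// `db` is the mongo shell's database handle, passed in explicitly
// so the sketch does not depend on a global.
function findPeep(db, name) {
  // An exact-match filter; the collation makes the comparison ignore case.
  return db.peeps.find({ UserName: name }).collation(caseInsensitive);
}

// In the shell: findPeep(db, 'bob').toArray()
```

Leave off `.collation(...)` and the query falls back to a plain binary comparison, so it is case-sensitive again and cannot use the collation index for the match.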
Collation is a very welcome addition to MongoDB.
You can set a default collation on a whole collection, or specify one for individual indexes and queries.
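A collection-level default might be set up as in the sketch below. The default has to be given when the collection is created, so this assumes `peeps` does not exist yet; `createPeeps` is a hypothetical wrapper that takes the shell's `db` handle as a parameter.

```javascript
// Create the collection with a default collation. Indexes and queries on it
// inherit this collation unless they specify a different one.
function createPeeps(db) {
  return db.createCollection('peeps', {
    collation: { locale: 'en', strength: 2 }
  });
}

// In the shell: createPeeps(db)
```

With that in place, a plain db.peeps.find({UserName: 'bob'}) compares case-insensitively without any per-query collation option.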
The main pain point it solves for me is the case-insensitive string match, which previously required either changing the schema just for that (ick!), or using regex (which can use an index, but not nearly as efficiently as an exact match).
Beyond case-sensitivity, collation also addresses character variants, diacritics, and sorting concerns. This is a very important addition to the engine, and critical for wide adoption in many languages.
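To get a feel for what the strength levels mean, JavaScript's built-in `Intl.Collator` can serve as a stand-in. Its `sensitivity` option is only a rough analogue of MongoDB's `strength` setting, so treat the mapping in the comments as an approximation rather than a specification.

```javascript
// Rough analogues of collation strength 1, 2 and 3.
const primary = new Intl.Collator('en', { sensitivity: 'base' });     // letters only
const secondary = new Intl.Collator('en', { sensitivity: 'accent' }); // letters + diacritics
const tertiary = new Intl.Collator('en', { sensitivity: 'variant' }); // letters + diacritics + case

// Two strings are "equal" under a collator when compare() returns 0.
const same = (collator, a, b) => collator.compare(a, b) === 0;

console.log(same(secondary, 'bob', 'BOB')); // true: case is ignored
console.log(same(secondary, 'bob', 'bób')); // false: diacritics still matter
console.log(same(primary, 'bob', 'bób'));   // true: diacritics ignored too
console.log(same(tertiary, 'bob', 'BOB'));  // false: case matters again

// Sorting with a collator orders by letter rather than by raw code point,
// so 'Alice' and 'alice' end up next to each other.
console.log(['bob', 'Alice', 'BOB', 'alice'].sort(tertiary.compare));
```

The `strength: 2` used in the index above sits at the middle level: "bob" and "BOB" match, but "bob" and "bób" do not.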
Check out the docs: Collation