Skip to main content

Understanding and using MongoDB indexes effectively

· 6 min read
Thomas Bodin
Thomas Bodin
Software Engineer
Cover

At Orus, MongoDB is a cornerstone of our technical stack. Its flexibility, thanks to its document‑oriented model, makes it an excellent choice for quickly modeling complex data. But, as with any powerful tool, that flexibility can become a pitfall if not handled correctly.

Index management, in particular, is still underestimated.

We've all, at some point early in our careers, added an index "just in case," copy‑pasted configuration from a neighboring repository, or assumed MongoDB would "choose the right index" for us. The result: useless indexes, performance regressions, and sometimes even counterproductive setups.

In this article, we share our experience on:

  • How MongoDB indexes really work (beyond the documentation)
  • Indexing patterns to avoid, even if they seem "safe"
  • A clear method to decide when and why to add an index

🗂️ What an index actually does in MongoDB

An index is a data structure MongoDB maintains alongside your collections to enable faster searches. Without an index, MongoDB must read every document one by one to find matches. That’s a collection scan, and it becomes prohibitively expensive as data volumes grow.

B+ Tree
info
Technical aside: What is a "B+ tree"?

MongoDB stores indexes as a B+ tree, a sorted tree structure that's highly efficient for searching, sorting, and finding value ranges.

Think of it like a dictionary where words are sorted alphabetically. If you're looking for “pineapple,” you don't read every single page, you jump to the letter P, then to the right section. That's what a B+ tree is: a fast way to search through large, sorted data sets.

🐌 Simple example: without an index

Assume a subscriptions collection with 100,000 documents. One document might look like:

{
"_id": "68651a9307617c719cadf169",
"subscriptionId": "sub_e256edef"
"userId": "user_764316e7",
"provider": "acme",
"status": "active"
}

If we run the following query:

db.subscriptions.find({ subscriptionId: "sub_e256edef" })

➡️ Without an index on subscriptionId, MongoDB must scan all documents, testing each one. That gets very slow as the data volume increases.

⚡ Now, with an index

We create an index on subscriptionId:

db.subscriptions.createIndex({ subscriptionId: 1 })

MongoDB can now jump directly to "sub_e256edef" in the B+ tree and retrieve matching documents without scanning the entire collection.

⚙️ Compound indexes: how MongoDB uses (or ignores) them

MongoDB also lets you create indexes on multiple fields at once, these are called compound indexes.

db.subscriptions.createIndex({ subscriptionId: 1, userId: 1 })

These are valuable when you frequently filter on both fields together, or need to sort on a combination.

But there's a critical rule:

➡️ MongoDB can only use a compound index if the query includes the first field(s) in the defined order.

✅ What works

db.subscriptions.find({ subscriptionId: "sub_e256edef" })

→ Uses the index partially (on subscriptionId only).

db.subscriptions.find({ 
subscriptionId: "sub_e256edef",
userId: "user_764316e7"
})

→ Fully uses the index (first subscriptionId, then userId).

❌ What won't work

db.subscriptions.find({ userId: "user_764316e7" })

→ MongoDB won't use the compound index, because subscriptionId comes before userId.

MongoDB can't jump straight to userId without knowing the preceding field's value.

💡 Tip: Often filtering on userId alone?

If you frequently query on userId without subscriptionId, it's best to add a separate index:

db.subscriptions.createIndex({ subscriptionId: 1, userId: 1 })
db.subscriptions.createIndex({ userId: 1 })
note
Golden rule

A compound index { a: 1, b: 1 } is used only if the query filters on a alone or on both a and b, in that exact order. It won’t be used if the query filters only on b.

📌 Key takeaways

  • An index can drastically speed up queries
  • Single-field index → for one field
  • Compound index → for filters on multiple fields in a defined order
  • MongoDB reads compound indexes left to right: without the first field in the query, the index is ignored

🎯 When (and when not) to add an index

Adding an index is tempting ("make queries faster"), but too many or unnecessary indexes can actually degrade performance.

At Orus, we always ask three questions before adding an index:

  1. Is this field used in frequent queries?
  2. Does the field have high cardinality (many distinct values)?
  3. Is this field already covered by another index?

Let's examine these questions with examples.

1. Is this field used in frequent queries?

If you never .find({ x: ... }) or .sort({ x: ... }) on a field, an index on it is useless.

2. Does the field have high cardinality?

Cardinality refers to the number of distinct values in a field. The higher the cardinality, the more relevant an index becomes for improving query speed.

Example:

Consider the subscriptions schema:

{
"_id": "68651a9307617c719cadf169",
"subscriptionId": "sub_e256edef",
"status": "active"
}
  • subscriptionId is unique per document → ✅ an index is useful to quickly retrieve a subscription.

A field with very few values (e.g., status: "active" | "suspended" | "cancelled") is unlikely to filter results efficiently. MongoDB may choose to ignore the index.

⚠️ However, in some cases, even low cardinality can justify an index:

  • If one of the values is rare and frequently queried
  • If the field is combined with a more selective one in a compound index
  • If the cardinality evolves over time (e.g., during progressive migrations, versioning, etc.)

Example:

In a collection of materialized views with a minorVersion field, there may initially be only two values (v1, v2), but v1 quickly becomes rare as updates are applied. An index remains very useful for quickly identifying documents to process, even if the absolute cardinality is low.

3. Is the field already part of an existing index?

Often, we think we need another index, but the field is already included in a compound index.

Example:

db.subscriptions.createIndex({ subscriptionId: 1, userId: 1 })

➡️ No separate index on subscriptionId is needed, it's already covered as the first field.

⚠️ Index costs

Indexes aren't free:

  • They must be maintained on every insert/update/delete
  • They can slow down writes
  • Too many indexes make index selection less efficient
  • They consume RAM (each index is an in-memory structure)

📌 Key takeaways

While we cannot write down an exact decision tree for which index to create or not, we can summarize the principles as follows :

  • Always keep in mind how your index will accelerate the search. Think in terms of returned documents vs scanned documents, and the trade-off with index overhead.
  • Be mindful about how these parameters will change over time