Links#
https://www.mongodb.com/docs/manual/administration/analyzing-mongodb-performance/
https://www.mongodb.com/docs/manual/applications/indexes/
https://www.mongodb.com/docs/manual/core/query-optimization/
https://www.mongodb.com/docs/manual/reference/explain-results/
https://www.mongodb.com/docs/manual/tutorial/manage-the-database-profiler/
https://www.mongodb.com/docs/manual/core/wiredtiger/
https://www.mongodb.com/docs/manual/troubleshooting/connection-storms/
https://www.mongodb.com/docs/manual/replication/
1. Tuning Principles#
调优顺序:
1. 先确认业务症状
latency high
timeout
CPU high
disk IO high
replication lag
connection exhausted
2. 再找证据
slow query log
profiler
explain("executionStats")
serverStatus
currentOp
mongostat / mongotop
exporter metrics
3. 再改最小范围
index
query shape
schema
connection pool
write concern
hardware / storage
shard key
4. 最后压测验证
p95 / p99 latency
docs examined / returned
keys examined / returned
CPU / disk / cache
replication lag
do not tune by guess:
不要看到慢就先加机器
不要看到慢就盲目加 index
不要看到 connection 多就只调 maxIncomingConnections
不要把所有读都切到 secondary
不要在没有 explain 的情况下改 query
2. Common Tuning Methods#
slow query#
db.setProfilingLevel(1, { slowms: 100 })
db.system.profile.find({
millis: { $gte: 100 }
}).sort({ ts: -1 }).limit(20).pretty()
watch:
ns
op
millis
command.filter
command.sort
keysExamined
docsExamined
nreturned
planSummary
explain#
db.orders.find({
user_id: "u-1001",
status: "PAID"
}).sort({ created_at: -1 }).limit(50).explain("executionStats")
good:
winningPlan has IXSCAN
totalDocsExamined close to nReturned
totalKeysExamined reasonable
executionTimeMillis within SLO
bad:
COLLSCAN
SORT stage with large input
docsExamined much larger than nReturned
many rejected plans
index tuning#
index rules:
build index for query shape, not for every field
compound index order usually follows ESR:
Equality
Sort
Range
remove unused indexes
avoid many overlapping indexes
partial index for sparse business states
TTL index for expiry data
db.orders.createIndex(
{ user_id: 1, status: 1, created_at: -1 },
{ name: "idx_user_status_created_at" }
)
db.orders.createIndex(
{ created_at: 1 },
{
name: "idx_pending_created_at",
partialFilterExpression: { status: "PENDING" }
}
)
db.orders.aggregate([{ $indexStats: {} }])
query tuning#
query rules:
filter early
project only needed fields
limit result size
avoid large skip
use cursor-based pagination
avoid unbounded $in
avoid regex without index strategy
avoid sorting without supporting index
db.orders.find(
{
user_id: "u-1001",
created_at: { $lt: ISODate("2026-05-29T10:00:00Z") }
},
{
_id: 1,
order_id: 1,
status: 1,
amount: 1,
created_at: 1
}
).sort({ created_at: -1, _id: -1 }).limit(50)
schema tuning#
schema rules:
embed small data read together
reference data that grows without bound
avoid unbounded arrays
avoid huge document close to 16 MB
split hot fields and cold fields when update/read pattern differs
keep frequently updated counters isolated if they cause document contention
aggregation tuning#
aggregation rules:
$match early
$project early to reduce payload
use index before $sort
avoid $lookup on huge unbounded sets
avoid memory-heavy $group on online path
move analytics to offline pipeline when needed
db.orders.aggregate([
{
$match: {
tenant_id: "t-1001",
created_at: { $gte: ISODate("2026-05-01T00:00:00Z") }
}
},
{
$project: {
_id: 0,
status: 1,
amount: 1
}
},
{
$group: {
_id: "$status",
total: { $sum: "$amount" },
count: { $sum: 1 }
}
}
])
connection pool#
connection tuning:
use one MongoClient per process, reuse it
do not create client per request
configure maxPoolSize based on app concurrency
configure minPoolSize for traffic spikes if connection storm appears
monitor connections.current and rejected connections
check app timeout before increasing database limits
example URI options:
maxPoolSize=100
minPoolSize=10
maxIdleTimeMS=60000
serverSelectionTimeoutMS=5000
WiredTiger cache#
WiredTiger:
uses internal cache + filesystem cache
default cache is usually automatically calculated
container / mixed workload should set explicit cacheSizeGB
watch:
cache dirty bytes
bytes read into cache
eviction activity
disk read latency
index size vs memory
storage:
wiredTiger:
engineConfig:
cacheSizeGB: 4
write tuning#
write path:
batch writes where possible
avoid too many indexes on write-heavy collection
use unordered bulkWrite when order is not required
choose writeConcern per business criticality
avoid multi-document transaction on hot path if schema can avoid it
db.orders.bulkWrite(
[
{
updateOne: {
filter: { order_id: "o-1001" },
update: { $set: { status: "PAID" } }
}
},
{
updateOne: {
filter: { order_id: "o-1002" },
update: { $set: { status: "CANCELLED" } }
}
}
],
{ ordered: false }
)
replication tuning#
replication:
monitor replication lag
keep oplog window larger than maintenance window
avoid slow secondary due to weak disk
avoid heavy reads on secondary that delay replication
use majority writeConcern for critical data
rs.printReplicationInfo()
rs.printSecondaryReplicationInfo()
sharding tuning#
sharding:
choose shard key from real workload
avoid low-cardinality shard key
avoid monotonically increasing hot shard
prefer targeted query over scatter-gather
monitor chunk distribution and jumbo chunks
3. Classic Tuning Cases#
case 1: COLLSCAN#
problem:
API p95 latency high
explain shows COLLSCAN
docsExamined much larger than nReturned
fix:
create index matching filter and sort
db.orders.createIndex(
{ user_id: 1, created_at: -1 },
{ name: "idx_user_created_at" }
)
verify:
winningPlan uses IXSCAN
docsExamined close to nReturned
case 2: sort without index#
problem:
query filter is indexed but sort is slow
explain has SORT stage
fix:
compound index includes equality fields first, then sort field
db.orders.createIndex(
{ tenant_id: 1, status: 1, created_at: -1 },
{ name: "idx_tenant_status_created_at" }
)
problem:
page 1 fast, page 1000 slow
query uses skip(50000)
fix:
use cursor-based pagination
db.orders.find({
user_id: "u-1001",
created_at: { $lt: ISODate("2026-05-29T10:00:00Z") }
}).sort({ created_at: -1 }).limit(50)
case 4: too many indexes slow writes#
problem:
insert/update latency high
collection has many overlapping indexes
index size grows faster than data size
fix:
review $indexStats
drop unused indexes
merge overlapping compound indexes
db.orders.aggregate([{ $indexStats: {} }])
db.orders.dropIndex("idx_unused_field")
case 5: unbounded array#
problem:
document grows continuously
update latency increases
document may approach 16 MB
fix:
move growing child records to separate collection
or use bucket pattern
bad:
user document contains all login events
good:
users
user_login_events
case 6: connection storm#
problem:
traffic spike causes connection spike
connections.current near limit
app creates MongoClient per request
fix:
create one MongoClient per process
tune maxPoolSize / minPoolSize
set app timeout
bad:
new MongoClient() inside every HTTP request
good:
create MongoClient during application startup
reuse client for all requests
case 7: hot working set larger than memory#
problem:
CPU not high, but latency high
disk read latency high
WiredTiger reads into cache high
fix:
reduce working set
add memory
archive cold data
keep hot indexes in memory
review cacheSizeGB in container
classic solution:
split hot recent orders and cold history
or add time-based query boundary
case 8: aggregation too heavy#
problem:
online request runs large $group / $lookup
memory and CPU spike
fix:
add early $match
project only needed fields
precompute summary
move report to offline job
classic solution:
order_stats_daily collection
update by async worker
API reads precomputed summary
case 9: replication lag#
problem:
secondary lag increases
reads from secondary return stale data
fix:
check slow secondary disk
reduce heavy secondary reads
check index build / migration
increase oplog size if oplog window too short
rs.printSecondaryReplicationInfo()
rs.printReplicationInfo()
case 10: bad shard key#
problem:
sharded cluster still has one hot shard
many queries are scatter-gather
fix:
choose shard key with high cardinality and query targeting
avoid created_at only
consider hashed or compound shard key
bad:
shard key = { created_at: 1 }
better:
shard key = { tenant_id: 1, order_id: "hashed" }
exact choice depends on workload
case 11: regex query slow#
problem:
search query uses unanchored regex
index cannot help effectively
fix:
use anchored prefix when possible
normalize search field
use text/search engine for real search
// better than /abc/ if prefix search is acceptable
db.users.find({ username: /^abc/ }).limit(20)
case 12: large document / blob#
problem:
document read/write is slow
only small metadata is needed on most requests
fix:
store large file in object storage or GridFS
keep metadata and pointer in MongoDB
split hot metadata from cold payload
4. Tuning Checklist#
query:
slow query enabled
explain checked
no COLLSCAN on hot path
docsExamined / nReturned reasonable
sort uses index
projection used
index:
compound indexes match query shapes
unused indexes removed
index size monitored
partial / TTL indexes used where appropriate
schema:
no unbounded arrays
no huge documents
hot and cold fields separated when needed
transaction usage reviewed
runtime:
connection pool reused and bounded
WiredTiger cache reviewed
disk latency monitored
ulimit / file descriptors reviewed
replication:
replication lag monitored
oplog window enough
secondary reads controlled
sharding:
shard key tested
targeted queries preferred
chunk distribution monitored