# Tracing MongoDB Queries in Production: Atlas and Self-Hosted

## The Problem

MongoDB performance problems usually start quietly.

An API that used to return in 150 ms starts taking 2 seconds. The CPU climbs on the primary. A background job begins scanning millions of documents. Atlas shows an index recommendation. A self-hosted server starts writing slow query lines to `mongod.log`. The application team sees only this:

```
GET /customers/123/orders took 4.2s
```

That is not enough. You need to answer database-level questions:

* Which query shape is slow?
* Which query runs too many times?
* Which collection is being scanned?
* Is the query using an index?
* Is it sorting in memory?
* Is the problem on the primary, a secondary, or a shard?
* Is one endpoint creating an N+1 query pattern?
* Did a new query shape appear after the latest deployment?
* Is the database actually slow, or is the application holding cursors open?

For MongoDB, a useful production tracing setup has several layers:

| Layer                          | Self-hosted MongoDB                                      | MongoDB Atlas                                            |
| ------------------------------ | -------------------------------------------------------- | -------------------------------------------------------- |
| Slow query visibility          | Diagnostic logs, `slowOpThresholdMs`, `slowOpSampleRate` | Query Profiler, logs, Performance Advisor                |
| Detailed per-operation tracing | Database Profiler, `system.profile`                      | Query Profiler first; Database Profiler only when needed |
| Query shape statistics         | `$queryStats` where available                            | `$queryStats` on supported Atlas tiers                   |
| Live operations                | `$currentOp`                                             | `$currentOp`, Atlas Real-Time Performance Panel          |
| Query plan analysis            | `explain("executionStats")`                              | `explain("executionStats")`, Atlas UI explain tools      |
| Index recommendations          | Manual analysis                                          | Performance Advisor                                      |

The goal is not to log every operation forever. The goal is to keep enough baseline visibility enabled so you can investigate quickly, then temporarily turn on deeper tracing only when the baseline is not enough.

***

## Choose One Mental Model

Self-hosted MongoDB and Atlas expose different tools, but the workflow is the same:

1. Start with aggregate or sampled evidence.
2. Identify the collection, operation type, and query shape.
3. Check whether it is slow once or repeated too often.
4. Use `explain` to understand the plan.
5. Fix the query, index, schema, or calling pattern.
6. Turn off temporary tracing after the investigation.

Atlas gives you more of this through the UI. Self-hosted MongoDB gives you more direct control over logs and profiler settings. In both cases, the production rule is the same: keep low-risk visibility on, and use high-volume tracing only in a controlled window.

***

## What Must Be Enabled Before the Incident?

Some MongoDB evidence exists only if you were already collecting it.

If a query caused a spike at 1:30 PM and you enable profiling at 2:00 PM, the profiler will only capture operations from 2:00 PM onward. It cannot reconstruct the missing 1:30 PM operations.

This is why the baseline matters.

### Good Always-On Baseline

For most production systems:

* Set a useful slow operation threshold.
* Keep slow operation logging enabled at a safe sample rate.
* Set application names in connection strings.
* Keep Atlas monitoring, Query Profiler, and Performance Advisor available where supported.
* Export or retain logs long enough to investigate incidents after the fact.
* Build alerts on slow queries, CPU, I/O, connections, replication lag, and queues.

### Temporary Investigation Tools

Turn these on only when you need deeper evidence:

* Database Profiler level `1` with a lower `slowms`.
* Profiler filters for a specific namespace or operation type.
* Higher slow query sample rate.
* Database Profiler level `2`, only for very short windows.
* Very verbose component logging.

Turn temporary tracing back down after the investigation. This is not a ceremony. It protects database performance, disk usage, and sensitive query data.

***

## Layer 1: Slow Query Visibility

MongoDB records slow operations in the diagnostic log. The default slow operation threshold is 100 ms, but production systems often tune it based on workload.

Slow query logs are the first useful layer because they tell you which operations crossed a threshold.

### Self-Hosted MongoDB

In `mongod.conf`, keep a reasonable baseline:

```yaml
operationProfiling:
  mode: off
  slowOpThresholdMs: 100
  slowOpSampleRate: 1.0
```

This keeps the database profiler off but configures slow operation logging. With normal `logLevel` behavior, MongoDB writes slow operations to the diagnostic log according to `slowOpSampleRate`.

If your log volume is too high, raise the threshold or reduce the sample rate:

```yaml
operationProfiling:
  mode: off
  slowOpThresholdMs: 250
  slowOpSampleRate: 0.2
```

That means operations slower than 250 ms are considered slow, and MongoDB samples those slow operations at 20%.
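To pick a sample rate, it helps to estimate the log volume first. The following is a minimal sketch of that arithmetic. The function name and the input rate of 50 slow operations per second are assumptions for illustration, not values from MongoDB.

```javascript
// Rough estimate of slow-query log lines written per hour.
// slowOpsPerSecond is a hypothetical measured rate of operations that exceed
// slowOpThresholdMs; sampleRate is the configured slowOpSampleRate (0 to 1).
function estimateLoggedSlowOpsPerHour(slowOpsPerSecond, sampleRate) {
  if (sampleRate < 0 || sampleRate > 1) {
    throw new RangeError("sampleRate must be between 0 and 1");
  }
  return Math.round(slowOpsPerSecond * sampleRate * 3600);
}

// Same workload at full sampling and at 20% sampling.
console.log(estimateLoggedSlowOpsPerHour(50, 1.0)); // 180000
console.log(estimateLoggedSlowOpsPerHour(50, 0.2)); // 36000
```

Sampling reduces volume, but it also means a rare slow operation may not be logged at all. Raising the threshold is usually the safer first step when the rare outliers are what you care about.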

You can also change these dynamically from `mongosh`:

```javascript
db.setProfilingLevel(0, { slowms: 250, sampleRate: 0.2 })
```

At level `0`, the database profiler is off, but the `slowms` and `sampleRate` values configure slow operation logging.

Check current settings:

```javascript
db.getProfilingStatus()
```

### Atlas

For Atlas, start with built-in tools:

1. Open your Atlas project.
2. Go to the target cluster.
3. Open the Query Profiler or Query Insights view.
4. Filter by time range, host, namespace, operation type, and latency.
5. Check the slowest operations and the documents examined to returned ratio.

Atlas Query Profiler is different from the Database Profiler. Atlas Query Profiler uses `mongod` query log data to identify slow operations. Changing the database profiler level is not the normal first step in Atlas.

Atlas also provides a Performance Advisor. Use it to review slow query shapes and index suggestions before changing indexes manually.

For deeper troubleshooting, Atlas lets you download MongoDB logs from the UI or CLI:

```bash
atlas logs download <hostname> mongodb.gz --projectId <projectId> --start <unixStart> --end <unixEnd> --decompress
```

Atlas retains MongoDB logs for a limited window, so for serious production systems, export logs to your observability stack or object storage.

***

## Layer 2: Add Application Context

Database traces are much more useful when you know which application, endpoint, or job created the query.

Set `appName` in every MongoDB connection string:

```
mongodb+srv://user:password@cluster.example.mongodb.net/appdb?appName=orders-api
```

For self-hosted MongoDB:

```
mongodb://user:password@mongo1:27017,mongo2:27017/appdb?replicaSet=rs0&appName=orders-api
```

This application name can appear in logs, profiler output, and current operation output.

For request-level tracing, add a query comment. Keep it short and never put secrets, emails, tokens, or customer personal data in it.

In `mongosh`:

```javascript
db.orders
  .find({ customerId: ObjectId("64b7f0b4f2a7d2f8d91a1111") })
  .comment("orders-api GET /customers/:id/orders trace=9f3a2c")
```

For aggregation:

```javascript
db.orders.aggregate(
  [
    { $match: { status: "PAID" } },
    { $group: { _id: "$customerId", total: { $sum: "$amount" } } }
  ],
  { comment: "orders-api daily-summary trace=9f3a2c" }
)
```
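Comments are easier to keep safe when one helper builds them. The sketch below is a hypothetical helper, not a driver API: `buildQueryComment` and its parameters are assumptions. It expects a route template such as `/customers/:id/orders`, not the concrete URL, so that identifiers do not leak into logs.

```javascript
// Hypothetical helper: builds a short, sanitized query comment.
// Pass the route TEMPLATE, never the concrete URL with real IDs or emails.
function buildQueryComment({ service, method, route, traceId }, maxLength = 120) {
  // Keep only a conservative character set; everything else is dropped.
  const clean = (value) => String(value).replace(/[^A-Za-z0-9_\-\/:.]/g, "");
  const comment = `${clean(service)} ${clean(method)} ${clean(route)} trace=${clean(traceId)}`;
  // Truncate so a bad input cannot bloat log lines.
  return comment.slice(0, maxLength);
}

console.log(
  buildQueryComment({
    service: "orders-api",
    method: "GET",
    route: "/customers/:id/orders",
    traceId: "9f3a2c",
  })
);
```

The returned string can be passed to `.comment(...)` or to the `comment` option shown above.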

### Atlas Steps

In Atlas:

1. Open Query Profiler.
2. Filter the time window around the request.
3. Search for the namespace and operation.
4. Look for the application name or comment in the slow operation details where available.
5. Cross-check the same trace ID in application logs.

### Self-Hosted Steps

On self-hosted MongoDB:

1. Search `mongod.log` for the `appName`, namespace, or comment.
2. If profiling was enabled, query `system.profile` for `command.comment`.
3. Use `$currentOp` if the operation is still running.

Example:

```javascript
db.system.profile.find(
  { "command.comment": /trace=9f3a2c/ },
  { ts: 1, millis: 1, ns: 1, op: 1, command: 1, planSummary: 1 }
).sort({ ts: -1 }).limit(20)
```
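If profiling was not enabled, the log is the remaining source. The sketch below filters slow query lines by `appName`, assuming the structured JSON log format used by MongoDB 4.4 and later, where slow operations carry `"msg": "Slow query"` and their details sit under `attr`. The function name and the sample log lines are made up for illustration.

```javascript
// Extract slow-query entries from structured (JSON) mongod log text,
// optionally restricted to one appName, slowest first.
function findSlowQueries(logText, appName) {
  const results = [];
  for (const line of logText.split("\n")) {
    if (!line.trim()) continue;
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip lines that are not JSON
    }
    if (entry.msg !== "Slow query" || !entry.attr) continue;
    if (appName && entry.attr.appName !== appName) continue;
    results.push({
      ns: entry.attr.ns,
      durationMillis: entry.attr.durationMillis,
      planSummary: entry.attr.planSummary,
      docsExamined: entry.attr.docsExamined,
      nreturned: entry.attr.nreturned,
    });
  }
  return results.sort((a, b) => b.durationMillis - a.durationMillis);
}

// Illustrative sample data, not real log output.
const sampleLog = [
  JSON.stringify({
    s: "I", c: "COMMAND", msg: "Slow query",
    attr: { ns: "appdb.orders", appName: "orders-api", planSummary: "COLLSCAN",
            docsExamined: 500000, nreturned: 20, durationMillis: 1800 },
  }),
  JSON.stringify({
    s: "I", c: "COMMAND", msg: "Slow query",
    attr: { ns: "appdb.audit_logs", appName: "reports-job", planSummary: "IXSCAN { tenantId: 1 }",
            docsExamined: 900, nreturned: 900, durationMillis: 320 },
  }),
  "a line that is not JSON",
].join("\n");

console.log(findSlowQueries(sampleLog, "orders-api"));
```

For large log files, read the file line by line as a stream instead of loading it into one string.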

***

## Layer 3: Atlas Query Profiler and Performance Advisor

If you are on Atlas, use Atlas-native tools first. They are designed for this job and reduce the temptation to enable heavyweight profiling too early.

### Query Profiler

Use Query Profiler to find:

* slow operations over a selected time range,
* namespaces with high latency,
* operations that scan too many documents,
* whether an operation used an index,
* outliers after a deployment,
* repeated query shapes.

Practical Atlas workflow:

1. Open the cluster.
2. Go to Query Profiler or Query Insights.
3. Select the incident time window.
4. Filter to the suspected database and collection.
5. Sort by execution time.
6. Open the slow operation details.
7. Check documents examined, documents returned, index usage, sort, and operation type.
8. Copy the query shape into `mongosh` or Compass and run `explain`.

The main thing to look for is not just "slow". Look for waste:

| Signal                           | Meaning                                   |
| -------------------------------- | ----------------------------------------- |
| Many docs examined, few returned | Missing index or low-selectivity index    |
| Sort without a useful index      | In-memory sort or blocking sort risk      |
| Same shape repeated many times   | N+1 query or polling pattern              |
| Slow on one host                 | Primary/secondary/shard-specific pressure |
| Slow after deployment            | New query shape or changed access pattern |

### Performance Advisor

Performance Advisor analyzes slow query logs and suggests indexes. It is useful, but do not blindly create every suggested index.

Before creating an index, check:

* Is this query part of a critical user flow?
* How often does the query run?
* Does the index match the filter and sort order?
* Will the index increase write cost too much?
* Is there an existing index that can be adjusted instead?
* Is this actually a schema or query design issue?

Example index:

```javascript
db.orders.createIndex(
  { customerId: 1, status: 1, createdAt: -1 },
  { name: "idx_orders_customer_status_createdAt" }
)
```

If you create the index, verify with `explain` and production metrics afterward.

***

## Layer 4: Database Profiler and `system.profile`

The MongoDB Database Profiler records detailed operation data into a capped collection named `system.profile` in each database.

It is off by default. That is intentional. Profiling can add overhead, increase disk writes, and expose query data.

### Profiler Levels

| Level | Meaning                                    | Production use                                 |
| ----- | ------------------------------------------ | ---------------------------------------------- |
| `0`   | Profiler off                               | Good default                                   |
| `1`   | Profile slow operations or matching filter | Useful for focused investigations              |
| `2`   | Profile all operations                     | Avoid except for very short controlled windows |

### Self-Hosted: Enable for One Database

First, record the current status so you can restore it afterward:

```javascript
db.getProfilingStatus()
```

Enable profiling for slow operations in the current database:

```javascript
db.setProfilingLevel(1, { slowms: 200, sampleRate: 0.5 })
```

Then inspect captured operations:

```javascript
db.system.profile.find(
  {},
  {
    ts: 1,
    millis: 1,
    ns: 1,
    op: 1,
    command: 1,
    planSummary: 1,
    keysExamined: 1,
    docsExamined: 1,
    nreturned: 1
  }
).sort({ ts: -1 }).limit(20)
```

Disable profiling after the investigation:

```javascript
db.setProfilingLevel(0)
```
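The enable, inspect, and disable sequence is easy to leave half-finished during an incident. One way to make the restore step automatic is a small wrapper, sketched here for `mongosh`. The helper name is an assumption. It restores only the level, `slowms`, and `sampleRate`; profiler filters are deliberately not handled.

```javascript
// Sketch for mongosh: run an investigation with temporary profiler settings,
// then restore the previous level, slowms, and sampleRate even if it throws.
// Assumes db.getProfilingStatus() returns { was, slowms, sampleRate, ... }.
function withTemporaryProfiling(db, { slowms, sampleRate }, investigate) {
  const previous = db.getProfilingStatus();
  db.setProfilingLevel(1, { slowms, sampleRate });
  try {
    return investigate(db);
  } finally {
    db.setProfilingLevel(previous.was, {
      slowms: previous.slowms,
      sampleRate: previous.sampleRate,
    });
  }
}

// Example use in mongosh (not executed here):
// withTemporaryProfiling(db, { slowms: 200, sampleRate: 0.5 }, (d) =>
//   d.system.profile.find().sort({ ts: -1 }).limit(20).toArray()
// );
```

Restoring the previous values, instead of always setting level `0`, keeps any slow-log threshold that was tuned before the investigation.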

### Atlas: Prefer Query Profiler First

In Atlas, use Query Profiler and Performance Advisor first. If you still need Database Profiler data:

1. Connect with `mongosh` to the cluster.
2. Switch to the database you want to investigate.
3. Check current profiler status.
4. Enable level `1` with a conservative threshold and sample rate.
5. Query `system.profile`.
6. Disable profiling after the investigation.

Example:

```javascript
use appdb

db.getProfilingStatus()

db.setProfilingLevel(1, { slowms: 200, sampleRate: 0.2 })

db.system.profile.find(
  {},
  { ts: 1, millis: 1, ns: 1, op: 1, command: 1, planSummary: 1 }
).sort({ ts: -1 }).limit(20)

db.setProfilingLevel(0)
```

Important Atlas notes:

* `db.setProfilingLevel()` is not supported on M0 and Flex clusters.
* Atlas Query Profiler is available on M10+ clusters.
* Changing profiler settings also changes slow operation logging, because `slowms` and `sampleRate` apply to the diagnostic log as well.
* Slow operation threshold changes made with `db.setProfilingLevel()` may reset after node restart.
* Profile data may include sensitive query contents.

Sharded cluster note:

* Profiling is not enabled on `mongos`.
* In a sharded cluster, enable profiling on the relevant `mongod` shard members.
* On `mongos`, `slowms` and `sampleRate` affect diagnostic logging, not `system.profile`.

***

## Find Slow Operations in `system.profile`

Once profiling is enabled, these queries are useful.

### Slowest Recent Operations

```javascript
db.system.profile.find(
  { millis: { $gte: 200 } },
  {
    ts: 1,
    millis: 1,
    ns: 1,
    op: 1,
    command: 1,
    planSummary: 1,
    keysExamined: 1,
    docsExamined: 1,
    nreturned: 1
  }
).sort({ millis: -1 }).limit(20)
```

### Operations Scanning Too Many Documents

```javascript
db.system.profile.find(
  {
    docsExamined: { $gt: 10000 },
    nreturned: { $lt: 100 }
  },
  {
    ts: 1,
    millis: 1,
    ns: 1,
    command: 1,
    planSummary: 1,
    docsExamined: 1,
    nreturned: 1
  }
).sort({ docsExamined: -1 }).limit(20)
```

If `docsExamined` is huge and `nreturned` is tiny, the database is doing too much work to find a small result set.
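The same check can be expressed as a ratio and applied to documents already fetched from `system.profile`. This is a minimal sketch: the function names and the threshold of 100 examined documents per returned document are assumptions, so tune the threshold to your workload.

```javascript
// Documents examined per document returned.
// Treats nreturned = 0 as 1 so empty results do not divide by zero.
function scanRatio(op) {
  const examined = op.docsExamined ?? 0;
  const returned = op.nreturned ?? 0;
  return examined / Math.max(returned, 1);
}

// Keep operations above the ratio threshold, worst first.
function flagWastefulOps(profileDocs, minRatio = 100) {
  return profileDocs
    .map((op) => ({ ns: op.ns, millis: op.millis, ratio: scanRatio(op) }))
    .filter((op) => op.ratio >= minRatio)
    .sort((a, b) => b.ratio - a.ratio);
}

// Illustrative sample documents shaped like system.profile entries.
const sampleProfileDocs = [
  { ns: "appdb.orders", millis: 1800, docsExamined: 500000, nreturned: 20 },
  { ns: "appdb.customers", millis: 12, docsExamined: 50, nreturned: 50 },
];

console.log(flagWastefulOps(sampleProfileDocs));
```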

### Operations with Collection Scans

```javascript
db.system.profile.find(
  { planSummary: /COLLSCAN/ },
  {
    ts: 1,
    millis: 1,
    ns: 1,
    command: 1,
    planSummary: 1,
    docsExamined: 1,
    nreturned: 1
  }
).sort({ ts: -1 }).limit(20)
```

A collection scan is not always bad. It may be fine for a tiny collection or one-off admin job. It is suspicious on large hot collections.

### Group by Namespace

```javascript
db.system.profile.aggregate([
  {
    $group: {
      _id: "$ns",
      count: { $sum: 1 },
      totalMillis: { $sum: "$millis" },
      maxMillis: { $max: "$millis" },
      avgMillis: { $avg: "$millis" }
    }
  },
  { $sort: { totalMillis: -1 } },
  { $limit: 20 }
])
```

This helps you find which collections are consuming the most profiled time.

***

## Find Repeated Query Patterns

MongoDB does not have a direct equivalent of PostgreSQL `pg_stat_statements` on every deployment. Depending on your version and environment, you have several options.

### Option A: Atlas Query Profiler

In Atlas:

1. Open Query Profiler.
2. Select a time range.
3. Filter by namespace.
4. Look for repeated shapes or binned operations.
5. Compare the pattern with application request logs.

This is usually the easiest way to detect repeated query shapes in Atlas.

### Option B: `$queryStats` Where Available

`$queryStats` returns runtime statistics for recorded query shapes. It is useful for aggregate query-shape analysis, but MongoDB documents it as unsupported and not guaranteed to have a stable output format. Use it for investigation, not as a hard dependency in application code.

Example:

```javascript
use admin

db.aggregate([
  { $queryStats: {} },
  {
    $project: {
      queryShapeHash: "$queryShapeHash",
      execCount: "$metrics.execCount",
      lastExecutionMicros: "$metrics.lastExecutionMicros",
      docsReturned: "$metrics.docsReturned.sum",
      docsExamined: "$metrics.docsExamined.sum",
      keysExamined: "$metrics.keysExamined.sum",
      key: 1
    }
  },
  { $sort: { execCount: -1 } },
  { $limit: 20 }
])
```

Atlas steps:

1. Use an M10+ cluster.
2. Connect with a user that has the required monitoring privileges, such as `clusterMonitor`.
3. Run the pipeline on the `admin` database.
4. Treat field paths as version-sensitive.
5. Use results to guide investigation, not as a permanent contract.

Self-hosted note: `$queryStats` availability and behavior depend on MongoDB version and deployment support. If it is unavailable, use logs, profiler data, and application metrics.
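When `$queryStats` is unavailable, a rough shape key can be derived from the `command` documents in profiler output. The sketch below replaces literal values with `?` and counts identical shapes. It is an approximation written for this article, not the algorithm MongoDB uses to compute query shapes, and it only looks at `command.filter`.

```javascript
// Replace every literal value with "?" while keeping field names and operators.
function shapeOf(value) {
  if (Array.isArray(value)) {
    const shapes = value.map(shapeOf);
    // Lists of plain values (for example $in) collapse to a single placeholder.
    return shapes.every((s) => s === "?") ? ["?"] : shapes;
  }
  const isPlainObject =
    value !== null && typeof value === "object" &&
    Object.getPrototypeOf(value) === Object.prototype;
  if (isPlainObject) {
    const out = {};
    for (const key of Object.keys(value).sort()) out[key] = shapeOf(value[key]);
    return out;
  }
  return "?"; // numbers, strings, dates, ObjectId instances, and so on
}

// Count profiler documents per (namespace, filter shape), most frequent first.
function countShapes(profileDocs) {
  const counts = new Map();
  for (const doc of profileDocs) {
    const key = `${doc.ns} ${JSON.stringify(shapeOf(doc.command?.filter ?? {}))}`;
    const entry = counts.get(key) ?? { shape: key, count: 0, totalMillis: 0 };
    entry.count += 1;
    entry.totalMillis += doc.millis ?? 0;
    counts.set(key, entry);
  }
  return [...counts.values()].sort((a, b) => b.count - a.count);
}

// Illustrative sample documents shaped like system.profile entries.
const sampleDocs = [
  { ns: "appdb.orders", millis: 40, command: { find: "orders", filter: { customerId: "c1", status: "PAID" } } },
  { ns: "appdb.orders", millis: 35, command: { find: "orders", filter: { status: "NEW", customerId: "c2" } } },
  { ns: "appdb.orders", millis: 90, command: { find: "orders", filter: { createdAt: { $gte: 1 } } } },
];

console.log(countShapes(sampleDocs));
```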

### Option C: Application-Level Counting

For N+1 problems, the application often has the best evidence. Count queries per request and log a warning when one request executes too many MongoDB operations.

Example symptom:

```javascript
const orders = await db.collection("orders")
  .find({ customerId })
  .toArray();

for (const order of orders) {
  order.items = await db.collection("order_items")
    .find({ orderId: order._id })
    .toArray();
}
```

If the customer has 500 orders, the endpoint runs 501 queries.

Better:

```javascript
const ordersWithItems = await db.collection("orders").aggregate([
  { $match: { customerId } },
  // Sort and limit before the join so $lookup runs for 50 orders, not all of them.
  { $sort: { createdAt: -1 } },
  { $limit: 50 },
  {
    $lookup: {
      from: "order_items",
      localField: "_id",
      foreignField: "orderId",
      as: "items"
    }
  }
]).toArray();
```

This may or may not be the final design. For very large arrays, `$lookup` can also become heavy. The important point is to make the repeated pattern visible, then choose the right data access pattern.
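Making the pattern visible can be as simple as counting operations per request. The class below is a minimal sketch with hypothetical names and a hypothetical threshold of 20 operations. How you call `record` depends on your stack; one possible hook is the Node.js driver's command monitoring events, but treat that wiring as an assumption about your setup.

```javascript
// Counts MongoDB operations per request and flags requests that exceed a threshold.
class RequestQueryCounter {
  constructor(warnThreshold = 20) {
    this.warnThreshold = warnThreshold;
    this.counts = new Map(); // requestId -> Map(collection -> count)
  }

  // Call once per database operation issued while handling the request.
  record(requestId, collection) {
    const perCollection = this.counts.get(requestId) ?? new Map();
    perCollection.set(collection, (perCollection.get(collection) ?? 0) + 1);
    this.counts.set(requestId, perCollection);
  }

  // Call when the request ends; returns a summary and frees the counters.
  finish(requestId) {
    const perCollection = this.counts.get(requestId) ?? new Map();
    this.counts.delete(requestId);
    const total = [...perCollection.values()].reduce((sum, n) => sum + n, 0);
    return {
      total,
      byCollection: Object.fromEntries(perCollection),
      suspicious: total > this.warnThreshold,
    };
  }
}

// Simulate the N+1 endpoint: 1 orders query plus 500 order_items queries.
const counter = new RequestQueryCounter();
counter.record("req-1", "orders");
for (let i = 0; i < 500; i++) counter.record("req-1", "order_items");
console.log(counter.finish("req-1"));
```

Logging the summary only when `suspicious` is true keeps the signal useful without adding a log line to every request.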

***

## Layer 5: Use `$currentOp` for Live Problems

When production is slow right now, look at live operations.

MongoDB recommends the `$currentOp` aggregation stage over the older `currentOp` command.

### Active Operations

Run on the `admin` database:

```javascript
use admin

db.aggregate([
  { $currentOp: { allUsers: true, idleConnections: false } },
  {
    $match: {
      active: true,
      secs_running: { $gte: 5 }
    }
  },
  {
    $project: {
      opid: 1,
      appName: 1,
      client: 1,
      ns: 1,
      op: 1,
      secs_running: 1,
      waitingForLock: 1,
      command: 1
    }
  },
  { $sort: { secs_running: -1 } }
])
```

Look for:

* operations running for a long time,
* lock waits,
* expensive aggregations,
* queries from a specific application name,
* operations against unexpected collections,
* open cursors or long-running `getMore` operations.
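When the pipeline returns dozens of operations, grouping them by application makes the pattern easier to see. The sketch below summarizes an array of `$currentOp` result documents, for example from `.toArray()` on the pipeline above. The function name and the sample documents are assumptions for illustration.

```javascript
// Group live operations by appName: how many, the longest runtime, and lock waits.
function summarizeByApp(ops) {
  const byApp = new Map();
  for (const op of ops) {
    const app = op.appName ?? "(no appName)";
    const entry = byApp.get(app) ??
      { appName: app, count: 0, maxSecsRunning: 0, waitingForLock: 0 };
    entry.count += 1;
    entry.maxSecsRunning = Math.max(entry.maxSecsRunning, op.secs_running ?? 0);
    if (op.waitingForLock) entry.waitingForLock += 1;
    byApp.set(app, entry);
  }
  return [...byApp.values()].sort((a, b) => b.maxSecsRunning - a.maxSecsRunning);
}

// Illustrative sample documents with the fields projected above.
const sampleOps = [
  { opid: 1, appName: "orders-api", ns: "appdb.orders", secs_running: 7, waitingForLock: false },
  { opid: 2, appName: "reports-job", ns: "appdb.audit_logs", secs_running: 95, waitingForLock: false },
  { opid: 3, appName: "orders-api", ns: "appdb.orders", secs_running: 12, waitingForLock: true },
  { opid: 4, ns: "appdb.sessions", secs_running: 6 },
];

console.log(summarizeByApp(sampleOps));
```

An entry for `(no appName)` is a reminder that some service is connecting without `appName` set.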

### Atlas Steps

In Atlas:

1. Open the cluster.
2. Check the Real-Time Performance Panel for current load.
3. Use Query Profiler for slow operations around the same time.
4. Connect with `mongosh` and run `$currentOp` if you need exact live operation details.

The user needs sufficient privileges to view other users' operations. Without that, you may only see your own operations.

### Self-Hosted Steps

On self-hosted MongoDB:

1. Connect to the primary or the affected replica set member.
2. Run `$currentOp` on `admin`.
3. Identify long-running operations.
4. If needed, use `db.killOp(<opid>)` only after confirming impact.

Killing an operation is an incident response action, not a performance fix. The real fix is usually an index, query rewrite, transaction change, or workload throttling.

***

## Layer 6: Use `explain` to Understand the Plan

Once you have a bad query, do not guess. Run `explain`.

Example:

```javascript
db.orders.explain("executionStats").find({
  customerId: ObjectId("64b7f0b4f2a7d2f8d91a1111"),
  status: "PAID"
}).sort({ createdAt: -1 }).limit(50)
```

Look for:

* `COLLSCAN`: collection scan.
* `IXSCAN`: index scan.
* `SORT`: blocking sort.
* `totalDocsExamined`: how many documents MongoDB had to inspect.
* `totalKeysExamined`: how many index keys MongoDB inspected.
* `nReturned`: how many documents came back.
* `executionTimeMillis`: how long execution took.

The important ratio is often:

```
docs examined : docs returned
```

If MongoDB examined 500,000 documents to return 20, you probably have an index or query-shape problem.
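Explain output is verbose, so a small summary function helps when comparing plans. The sketch below walks the winning plan and pulls out the fields listed above. It assumes an unsharded `explain("executionStats")` result; the helper names and sample document are hypothetical, and output from sharded clusters or other server versions may need different paths.

```javascript
// Collect stage names from a plan tree (stage, inputStage, inputStages).
function collectStages(plan, found = []) {
  if (!plan || typeof plan !== "object") return found;
  if (plan.stage) found.push(plan.stage);
  // Some versions nest the readable plan under queryPlan.
  if (plan.queryPlan) collectStages(plan.queryPlan, found);
  if (plan.inputStage) collectStages(plan.inputStage, found);
  for (const child of plan.inputStages ?? []) collectStages(child, found);
  return found;
}

// Reduce an explain("executionStats") document to the numbers worth comparing.
function summarizeExplain(explain) {
  const stats = explain.executionStats ?? {};
  const stages = collectStages(explain.queryPlanner?.winningPlan);
  return {
    stages,
    collectionScan: stages.includes("COLLSCAN"),
    blockingSort: stages.includes("SORT"),
    docsExamined: stats.totalDocsExamined,
    keysExamined: stats.totalKeysExamined,
    nReturned: stats.nReturned,
    executionTimeMillis: stats.executionTimeMillis,
    examinedPerReturned: stats.totalDocsExamined / Math.max(stats.nReturned, 1),
  };
}

// Illustrative sample, trimmed to the fields the summary reads.
const sampleExplain = {
  queryPlanner: { winningPlan: { stage: "SORT", inputStage: { stage: "COLLSCAN" } } },
  executionStats: { nReturned: 20, totalKeysExamined: 0, totalDocsExamined: 500000, executionTimeMillis: 1800 },
};

console.log(summarizeExplain(sampleExplain));
```

Running the summary before and after an index change gives a compact before/after comparison for the incident notes.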

### Atlas Steps

For Atlas:

1. Use Query Profiler to copy the slow query shape.
2. Run the query in Compass, Data Explorer, or `mongosh`.
3. Use Explain Plan or `explain("executionStats")`.
4. Compare the plan before and after adding or changing an index.

### Self-Hosted Steps

For self-hosted:

1. Find the query in logs or `system.profile`.
2. Recreate the query in `mongosh`.
3. Run `explain("executionStats")`.
4. Test candidate indexes on staging or during a safe production window.

Example index:

```javascript
db.orders.createIndex(
  { customerId: 1, status: 1, createdAt: -1 },
  { name: "idx_orders_customer_status_createdAt" }
)
```

Run explain again:

```javascript
db.orders.explain("executionStats").find({
  customerId: ObjectId("64b7f0b4f2a7d2f8d91a1111"),
  status: "PAID"
}).sort({ createdAt: -1 }).limit(50)
```

You want fewer documents examined, lower execution time, and a plan that uses the intended index.

***

## Common MongoDB Query Problems

### 1. Missing Index

Symptom:

* `COLLSCAN`
* high `docsExamined` (`totalDocsExamined` in `explain` output)
* low `nreturned` (`nReturned` in `explain` output)
* slow query appears in Query Profiler or logs

Fix:

```javascript
db.orders.createIndex({ customerId: 1, createdAt: -1 })
```

### 2. Wrong Compound Index Order

Query:

```javascript
db.orders.find({
  customerId: customerId,
  status: "PAID"
}).sort({ createdAt: -1 })
```

Better index:

```javascript
db.orders.createIndex({ customerId: 1, status: 1, createdAt: -1 })
```

The index should match the query shape. A common guideline is equality fields first, then sort fields, then range fields (often called the ESR guideline). Verify the result with `explain` for your access pattern.
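The ordering rule of equality, then sort, then range can be written down as a small helper that produces the key document for `createIndex`. This is a sketch with a hypothetical function name; it encodes the guideline only and does not know anything about your data distribution.

```javascript
// Build a compound index key document in equality -> sort -> range order.
// equality: field names matched exactly; sort: { field: 1 | -1 }; range: field names
// used with operators such as $gt or $lt. Duplicate fields keep their first position.
function buildEsrIndexKeys({ equality = [], sort = {}, range = [] }) {
  const keys = {};
  for (const field of equality) keys[field] = 1;
  for (const [field, direction] of Object.entries(sort)) {
    if (!(field in keys)) keys[field] = direction;
  }
  for (const field of range) {
    if (!(field in keys)) keys[field] = 1;
  }
  return keys;
}

// The query above: equality on customerId and status, sort on createdAt descending.
console.log(buildEsrIndexKeys({
  equality: ["customerId", "status"],
  sort: { createdAt: -1 },
}));
```

The result can be passed to `db.orders.createIndex(...)`. Field names here must not look like integers, because JavaScript would reorder such keys.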

### 3. Unbounded Query

Bad:

```javascript
db.audit_logs.find({ tenantId: tenantId }).toArray()
```

Better:

```javascript
db.audit_logs.find({
  tenantId: tenantId,
  createdAt: {
    $gte: ISODate("2026-04-01T00:00:00Z"),
    $lt: ISODate("2026-05-01T00:00:00Z")
  }
}).sort({ createdAt: -1 }).limit(500)
```

Add pagination, date ranges, and limits. For exports, use background jobs.
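For pagination on large collections, one common approach is keyset pagination: continue from the last document seen instead of using a growing `skip`. The sketch below builds the filter for the next page. The function name is hypothetical, and it assumes results are sorted by `{ createdAt: -1, _id: -1 }` with `_id` as the tie-breaker.

```javascript
// Build the filter for the next page of a { createdAt: -1, _id: -1 } sorted query.
// lastSeen is the final document of the previous page, or undefined for page one.
function nextPageFilter(baseFilter, lastSeen) {
  if (!lastSeen) return { ...baseFilter };
  return {
    ...baseFilter,
    $or: [
      // Strictly older documents...
      { createdAt: { $lt: lastSeen.createdAt } },
      // ...or the same timestamp with a smaller _id (tie-breaker).
      { createdAt: lastSeen.createdAt, _id: { $lt: lastSeen._id } },
    ],
  };
}

// Illustrative values; real code would pass Date and ObjectId values.
const firstPage = nextPageFilter({ tenantId: "t1" });
const secondPage = nextPageFilter({ tenantId: "t1" }, { createdAt: 1700000000, _id: "id-42" });
console.log(JSON.stringify(firstPage));
console.log(JSON.stringify(secondPage));

// Usage sketch (not executed here):
// db.audit_logs.find(secondPage).sort({ createdAt: -1, _id: -1 }).limit(500)
```

An index on `{ tenantId: 1, createdAt: -1, _id: -1 }` would be a natural companion for this filter, though that should be confirmed with `explain`.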

### 4. In-Memory Sort

Symptom:

* Query filters correctly, but the sort is slow.
* Explain shows a blocking sort.
* Query Profiler shows high latency for sorted results.

Fix:

Create an index that supports both filter and sort:

```javascript
db.orders.createIndex({ customerId: 1, createdAt: -1 })
```

### 5. N+1 Query Pattern

Symptom:

* Many fast queries from one endpoint.
* Database CPU and network rise.
* API latency grows with the number of parent records.

Fix options:

* Batch queries with `$in`.
* Use `$lookup` where appropriate.
* Embed data if the parent document naturally owns it.
* Cache read-mostly reference data.
* Change the API to return fewer nested details.
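The first option, batching with `$in`, turns N+1 queries into two. The sketch below uses the Node.js driver style from the earlier symptom example and the same collection names. The function name is hypothetical, and `db` is assumed to be a connected driver `Db` object.

```javascript
// Load orders and their items with two queries instead of one query per order.
async function loadOrdersWithItems(db, customerId) {
  const orders = await db.collection("orders").find({ customerId }).toArray();

  // One batched query for all items that belong to these orders.
  const orderIds = orders.map((order) => order._id);
  const items = await db.collection("order_items")
    .find({ orderId: { $in: orderIds } })
    .toArray();

  // Group items by order id in memory. String() gives a stable map key
  // for ObjectId values as well as plain strings.
  const itemsByOrder = new Map();
  for (const item of items) {
    const key = String(item.orderId);
    if (!itemsByOrder.has(key)) itemsByOrder.set(key, []);
    itemsByOrder.get(key).push(item);
  }

  return orders.map((order) => ({
    ...order,
    items: itemsByOrder.get(String(order._id)) ?? [],
  }));
}
```

For customers with very many orders, bound the first query with a limit or split `orderIds` into chunks, since a very large `$in` list has its own cost.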

### 6. Large Aggregation Pipeline

Symptom:

* Slow `aggregate`.
* High CPU.
* Sort/group stages process too much data.

Fix:

* Put `$match` as early as possible.
* Use indexes that support the initial `$match` and `$sort`.
* Reduce fields with `$project`.
* Avoid unbounded `$lookup`.
* Use `allowDiskUse` intentionally, not as a default escape hatch.

***

## Production Debugging Flow

{% stepper %}
{% step %}

### Step 1: Confirm the Symptom

From application monitoring:

* Which endpoint or job is slow?
* What time did it start?
* Was there a deployment?
* Which tenant or customer is affected?
* Is the problem read-heavy, write-heavy, or aggregation-heavy?
  {% endstep %}

{% step %}

### Step 2: Check Atlas or Logs

Atlas:

* Open Query Profiler.
* Select the incident time window.
* Filter by namespace and host.
* Check Performance Advisor for related index suggestions.

Self-hosted:

* Search `mongod.log` for slow operations.
* Filter by namespace, appName, comment, or timestamp.
* Check CPU, disk I/O, connections, and replication lag.
  {% endstep %}

{% step %}

### Step 3: Check Live Operations

Use `$currentOp`:

* Are operations stuck?
* Are they waiting for locks?
* Is one app creating many long-running operations?
* Is a shard or replica set member affected more than others?
  {% endstep %}

{% step %}

### Step 4: Capture Details If Baseline Is Not Enough

Enable temporary profiler level `1` for the affected database:

```javascript
db.setProfilingLevel(1, { slowms: 200, sampleRate: 0.2 })
```

For a narrower trace:

```javascript
db.setProfilingLevel(1, {
  filter: {
    ns: "appdb.orders",
    millis: { $gte: 200 }
  }
})
```

When a filter is set, `slowms` and `sampleRate` are not used for profiling. The filter controls what gets captured.
{% endstep %}

{% step %}

### Step 5: Explain the Query

Run:

```javascript
db.orders.explain("executionStats").find({
  customerId: customerId,
  status: "PAID"
}).sort({ createdAt: -1 }).limit(50)
```

Check:

* winning plan,
* index used,
* docs examined,
* keys examined,
* docs returned,
* execution time,
* sort behavior.
  {% endstep %}

{% step %}

### Step 6: Fix the Real Cause

Common fixes:

* Add or adjust an index.
* Rewrite the query.
* Add limits and pagination.
* Move heavy reports to background jobs.
* Reduce N+1 query patterns.
* Change the schema for the access pattern.
* Split hot and cold data.
* Reduce transaction duration.
* Scale only after query and index problems are understood.
  {% endstep %}

{% step %}

### Step 7: Turn Temporary Tracing Off

After the investigation:

```javascript
db.setProfilingLevel(0)
```

Then confirm:

```javascript
db.getProfilingStatus()
```

Also confirm:

* log volume returned to normal,
* disk usage is stable,
* Atlas Query Profiler still shows enough baseline visibility,
* alerts are not noisy,
* the fix improved application and database metrics.
  {% endstep %}
  {% endstepper %}

***

## What to Keep Enabled and What to Disable

| Setting or tool             | Atlas                                       | Self-hosted                             | Keep enabled?                      |
| --------------------------- | ------------------------------------------- | --------------------------------------- | ---------------------------------- |
| Application `appName`       | Connection string                           | Connection string                       | Yes                                |
| Query comments              | Driver/query option                         | Driver/query option                     | Yes, if sanitized                  |
| Query Profiler              | Atlas UI, M10+                              | Not applicable                          | Yes, use as baseline               |
| Performance Advisor         | Atlas UI, M10+                              | Not applicable                          | Yes                                |
| Slow operation logs         | Atlas-managed logs                          | `slowOpThresholdMs`, `slowOpSampleRate` | Yes, with safe thresholds          |
| Database Profiler level `1` | `db.setProfilingLevel(1, ...)` if supported | `db.setProfilingLevel(1, ...)`          | Temporary                          |
| Database Profiler level `2` | Avoid                                       | Avoid                                   | Very short window only             |
| `$currentOp`                | On demand                                   | On demand                               | On demand                          |
| `$queryStats`               | Investigation, supported tiers              | If available                            | Investigation, not hard dependency |
| Verbose component logging   | Limited use                                 | Limited use                             | Temporary                          |

My default production posture:

* Always keep application names and safe slow-query visibility.
* Use Atlas Query Profiler and Performance Advisor before enabling the Database Profiler.
* Use Database Profiler level `1` only for focused investigations.
* Avoid profiler level `2` unless the window is tiny and the system can tolerate it.
* Restore previous settings after the incident.

***

## Minimal Setup Checklist

### Atlas Baseline

1. Use M10+ clusters for Query Profiler and Performance Advisor support.
2. Set `appName` in every service connection string.
3. Use short sanitized query comments for important workflows.
4. Configure alerts for CPU, I/O, connections, replication lag, queues, and slow query symptoms.
5. Review Query Profiler during incidents.
6. Review Performance Advisor before adding indexes.
7. Download or export logs if you need retention beyond the Atlas UI window.

For production troubleshooting, avoid relying on M0 or Flex clusters as your operational model. Some profiler commands, downloadable logs, and performance tools are unavailable or limited there.

### Self-Hosted Baseline

In `mongod.conf`:

```yaml
operationProfiling:
  mode: off
  slowOpThresholdMs: 100
  slowOpSampleRate: 1.0
```

In applications:

```
mongodb://user:password@mongo1:27017,mongo2:27017/appdb?replicaSet=rs0&appName=orders-api
```

During investigation:

```javascript
db.getProfilingStatus()
db.setProfilingLevel(1, { slowms: 200, sampleRate: 0.2 })
```

After investigation:

```javascript
db.setProfilingLevel(0)
db.getProfilingStatus()
```

***

## Final Takeaway

MongoDB tracing is not one switch. It is a layered workflow.

For Atlas, start with Query Profiler, Performance Advisor, metrics, and logs. For self-hosted MongoDB, start with slow operation logs, safe thresholds, application names, and on-demand profiler usage.

The most important lesson is the same as with PostgreSQL: enable the low-risk baseline before the incident. If you wait until production is already slow, you only collect data from that moment forward.

Keep the baseline on. Turn deeper tracing on only when you need it. Turn it off when the investigation is done. Then verify the fix with both database evidence and application latency.

That is how you move from "MongoDB is slow" to a useful answer: which query shape, from which application, on which collection, using which plan, with what production impact.

***

## References

* [MongoDB documentation: Database Profiler](https://www.mongodb.com/docs/manual/tutorial/manage-the-database-profiler/)
* [MongoDB documentation: Database Profiler Output](https://www.mongodb.com/docs/manual/reference/database-profiler/)
* [MongoDB documentation: `db.setProfilingLevel()`](https://www.mongodb.com/docs/manual/reference/method/db.setProfilingLevel/)
* [MongoDB documentation: `$currentOp`](https://www.mongodb.com/docs/manual/reference/operator/aggregation/currentOp/)
* [MongoDB documentation: `db.collection.explain()`](https://www.mongodb.com/docs/manual/reference/method/db.collection.explain/)
* [MongoDB documentation: Interpret Explain Plan Results](https://www.mongodb.com/docs/manual/tutorial/analyze-query-plan/)
* [MongoDB documentation: `$queryStats`](https://www.mongodb.com/docs/manual/reference/operator/aggregation/queryStats/)
* [MongoDB Atlas documentation: Monitor Query Performance with Query Profiler](https://www.mongodb.com/docs/atlas/tutorial/query-profiler/)
* [MongoDB Atlas documentation: Performance Advisor](https://www.mongodb.com/docs/atlas/performance-advisor/)
* [MongoDB Atlas documentation: View and Download MongoDB Logs](https://www.mongodb.com/docs/atlas/mongodb-logs/)


