feat(progress): disambiguate filter bottleneck via sink-busy gauge + S3 inflight
The progress reporter's `bottleneck` field had been lying for a while:
`filter` triggered whenever the decompressed-line channel was full,
but downstream of that channel sits a lot more than the regex matcher:
`sink.ingest` does the codec compression, framing, per-prefix mutex
handoff, and (for S3) an mpsc send that blocks when the AWS transfer
manager (TM) can't drain.
A no-regex S3 run getting reported as `bottleneck=filter` was the
prompt: the label couldn't tell the user whether their CPU was pegged
on zstd or whether S3 was the floor.
Fix it with two new signals, both cheap:
1. `workers_in_ingest` — sink-agnostic AtomicUsize gauge. Filter
workers bump it via an RAII guard around every `sink.ingest`
call; the progress reporter samples it instantaneously at each
tick. ≥ half the filter workers inside ingest means the sink is
the lid; < half means workers are spending their time on
regex / channel receive / upstream stages.
2. S3-specific drill-down — when the gauge says "sink is the lid",
look at the S3 sink's `inflight_bytes` (resident bytes across
the per-upload mpsc channels and reader pending buffers) and
`active_uploads` count. High inflight per upload means TM is
slow to drain parts → network-bound. Low inflight means the
codec is the producer-side cost → CPU-bound.
Wired through a new `SinkObservability` struct on the `OutputSink`
trait. Default is empty; S3 implements; file / http / void return
empty (they don't have meaningful internal buffers to expose at
this level — for file, `iostat` is the next step).
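A minimal sketch of how that wiring could look, assuming a default trait method that returns an empty struct. The method name `observability`, the `Option<u64>` fields, and the `S3Sink` / `VoidSink` layouts are hypothetical; only the `SinkObservability` and `OutputSink` names come from this commit.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Internal sink signals exposed to the progress reporter.
/// `None` means "this sink has nothing meaningful to report".
/// Field names and types are assumptions for illustration.
#[derive(Debug, Default, Clone, PartialEq)]
struct SinkObservability {
    inflight_bytes: Option<u64>,
    active_uploads: Option<u64>,
}

trait OutputSink {
    /// Default is empty, so file / http / void sinks inherit it unchanged.
    fn observability(&self) -> SinkObservability {
        SinkObservability::default()
    }
}

/// Hypothetical void sink: relies on the empty default.
struct VoidSink;
impl OutputSink for VoidSink {}

/// Hypothetical S3 sink holding the two counters it reports.
struct S3Sink {
    inflight_bytes: AtomicU64,
    active_uploads: AtomicU64,
}

impl OutputSink for S3Sink {
    fn observability(&self) -> SinkObservability {
        SinkObservability {
            inflight_bytes: Some(self.inflight_bytes.load(Ordering::Relaxed)),
            active_uploads: Some(self.active_uploads.load(Ordering::Relaxed)),
        }
    }
}
```

With this shape the reporter can call the same method on any sink and treat `None` fields as "no drill-down available".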
New label set, all underscore-separated, no parens:
    download          dc channel mostly empty
    filter            dc full, sink not busy
    sink_s3_codec     dc full, sink busy, S3 mpsc empty
    sink_s3_network   dc full, sink busy, S3 mpsc backed up
    sink_file         dc full, sink busy, file sink
    sink_http         dc full, sink busy, HTTP sink (unusual)
    sink_void         dc full, sink busy, void sink (~never)
    sink_busy         fallback for unknown sink kinds
    compress (HTTP)   http line channel saturated    [unchanged]
    upload (HTTP)     http batch channel saturated   [unchanged]
Existing HTTP path inherits the gauge for the same disambiguation:
its old `filter` label is now split into `filter` vs `sink_http`
the same way.
The `Search progress` log line now also carries `workers_in_ingest`
and (when applicable) `sink_inflight_bytes` + `sink_active_uploads`
as structured fields so post-run log analysis has the raw signals
alongside the classifier's verdict.
Documentation: new "Reading the `bottleneck` label" section in
README.md enumerates every label, the signals it's computed from,
and operator guidance for what to investigate next per label.
Tests: 10 new classifier tests + 1 RAII gauge test cover every
label transition, the half-of-workers threshold for sink-busy,
the conservative fall-back to `sink_s3_codec` on missing inflight
signal, and the older HTTP priority order under the new signature.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
**CPU**: Use samply with the profiling build profile.
**Memory**: `cargo build --profile profiling --features dhat-heap`, then submit the generated `profiler.json` to [dh_view](https://nnethercote.github.io/dh_view/dh_view.html).
## Reading the `bottleneck` label
Every `Search progress` log record carries a `bottleneck` field that names the stage most likely limiting throughput at the time the report was emitted. The classifier reads several channel-fill percentages plus a sink-busy gauge; the label is a one-word summary of the dominant signal.
### Signals it looks at
- **`dc_pct`** — fill percentage of the decompressed-line channel between the download/decompress stage and the filter workers. High means the downstream stages can't keep up with what's being decompressed.
- **`workers_in_ingest`** — instantaneous count of filter workers currently inside `sink.ingest`. Compared against the total filter-worker count: ≥ half means the sink is where time is going (codec compression, framing, sink-internal queues), < half means filter-side work (regex / channel receive) is what's eating the workers.
- **`sink_inflight_bytes`** + **`sink_active_uploads`** — S3 sink only. Bytes resident in the per-upload mpsc channels and reader pending buffers, scaled by how many uploads are currently open. High means the AWS transfer manager (TM) is slow to drain parts to S3; low while workers are stuck in `sink.ingest` means the codec is the producer-side cost.
- HTTP-mode only: **`line_pct`** and **`batch_pct`** — fill of the HTTP writer's internal line and batch channels. Provide direct visibility into the compressor and uploader stages.
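
The `workers_in_ingest` gauge can be sketched as an RAII guard over a shared atomic counter. This is an illustration under assumptions, not the actual implementation: the `IngestGuard` type, the `enter` constructor, and the `sink_busy` helper are hypothetical names, and only the "at least half of the filter workers" rule is taken from the description above.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical RAII guard: bumps the gauge when a worker enters
/// `sink.ingest` and decrements it again when the guard is dropped,
/// including on early return or panic unwinding.
struct IngestGuard {
    gauge: Arc<AtomicUsize>,
}

impl IngestGuard {
    fn enter(gauge: &Arc<AtomicUsize>) -> Self {
        gauge.fetch_add(1, Ordering::Relaxed);
        IngestGuard { gauge: Arc::clone(gauge) }
    }
}

impl Drop for IngestGuard {
    fn drop(&mut self) {
        self.gauge.fetch_sub(1, Ordering::Relaxed);
    }
}

/// The sink counts as busy when at least half of the filter workers
/// are inside ingest at the moment the reporter samples the gauge.
fn sink_busy(workers_in_ingest: usize, filter_workers: usize) -> bool {
    filter_workers > 0 && workers_in_ingest * 2 >= filter_workers
}
```

A filter worker would hold `let _guard = IngestGuard::enter(&gauge);` for the duration of each `sink.ingest` call, and the reporter would read `gauge.load(Ordering::Relaxed)` once per tick. Relaxed ordering is assumed to be enough here because the value is only a sampled statistic.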
### Label meanings
- **`download`** — `dc_pct` is low. The download stage isn't filling the line channel fast enough. Causes: S3 per-connection throughput, network bandwidth, low `--max-parallel`, or storage class.
- **`filter`** — `dc_pct` high, `workers_in_ingest` low. Filter workers are spending their time on regex matching or waiting on channel receives. Causes: an expensive regex, lots of non-matching lines, or simply too few filter workers (`--filter-tasks`).
- **`sink_s3_codec`** — `dc_pct` high, `workers_in_ingest` high, sink mpsc is roughly empty. Workers are in `sink.ingest` but bytes are leaving fast, so the codec (zstd / gzip) is the producer-side cost. Try `--compression-format none`, raise the compression level only while watching this label, or look at per-prefix lock contention if you have many concurrent prefixes hitting the same per-prefix mutex.
- **`sink_s3_network`** — `dc_pct` high, `workers_in_ingest` high, sink mpsc is backed up. `ChannelWriter::blocking_send` is waiting because TM / S3 isn't accepting parts fast enough. Causes: network bandwidth ceiling, `multipart_concurrency` set too low, S3 throttling.
- **`sink_file`** — `dc_pct` high, `workers_in_ingest` high, file sink. Codec or the OS write path is the lid. The file sink doesn't have an internal queue to look at — reach for `iostat`/`vmstat`/`dmesg` to see whether it's the filesystem, dm-crypt, an NBD/EBS volume, or just disk pressure.
- **`sink_http`** — `dc_pct` high, `workers_in_ingest` high, HTTP sink, but the HTTP writer's own channels aren't full. Unusual; usually you'd see `compress` or `upload` for HTTP-bound runs.
- **`compress`** *(HTTP only)* — the HTTP writer's line channel is full. The compressor task pool isn't keeping up. Raise `--http-compressor-tasks` (or rely on the auto-inferred default).
- **`upload`** *(HTTP only)* — the HTTP writer's batch channel is full. The uploader pool can't get batches out to the API fast enough. Raise `--http-upload-tasks`, raise `--max-upload-rate`, or check the upstream API.
- **`sink_void`** — should never realistically appear; void's ingest is a counter bump. If it shows up, something is wrong.
- **`sink_busy`** — generic fallback if the sink kind doesn't match a known label. Means: filter workers are stuck inside `sink.ingest` but we couldn't be more specific.
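
The non-HTTP part of the label list can be read as a small decision tree. The sketch below is an approximation under stated assumptions, not the real classifier: the `Signals` struct, the `SinkKind` enum, and the `BACKED_UP_BYTES_PER_UPLOAD` cutoff are hypothetical, the 80% saturation threshold is applied to `dc_pct` as an assumption, and the HTTP-only `compress` / `upload` checks are left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum SinkKind {
    S3,
    File,
    Http,
    Void,
    Other,
}

/// Hypothetical snapshot of the signals sampled at one progress tick.
struct Signals {
    dc_pct: u8,
    workers_in_ingest: usize,
    filter_workers: usize,
    sink_kind: SinkKind,
    /// S3 only: (inflight_bytes, active_uploads); `None` if not reported.
    s3_inflight: Option<(u64, u64)>,
}

/// "Channel saturated" threshold; assumed here to apply to `dc_pct`.
const SATURATED_PCT: u8 = 80;
/// Assumed cutoff for "backed up" inflight bytes per open upload.
/// The real value is not stated in this document.
const BACKED_UP_BYTES_PER_UPLOAD: u64 = 8 * 1024 * 1024;

fn classify(s: &Signals) -> &'static str {
    if s.dc_pct < SATURATED_PCT {
        return "download";
    }
    let sink_busy = s.filter_workers > 0 && s.workers_in_ingest * 2 >= s.filter_workers;
    if !sink_busy {
        return "filter";
    }
    match s.sink_kind {
        SinkKind::S3 => match s.s3_inflight {
            Some((bytes, uploads))
                if uploads > 0 && bytes / uploads >= BACKED_UP_BYTES_PER_UPLOAD =>
            {
                "sink_s3_network"
            }
            // A missing or low inflight signal falls back to the codec label.
            _ => "sink_s3_codec",
        },
        SinkKind::File => "sink_file",
        SinkKind::Http => "sink_http",
        SinkKind::Void => "sink_void",
        SinkKind::Other => "sink_busy",
    }
}
```

For example, `dc_pct = 95` with 6 of 8 workers in ingest on an S3 sink and no inflight signal yields `sink_s3_codec`, while the same tick with 64 MiB inflight across 2 uploads yields `sink_s3_network` under the assumed cutoff.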
### Caveats
- The classifier reports the *dominant* signal at sampling time. A flapping pipeline (e.g. download bursts followed by sink bursts) will rotate labels across consecutive reports — that's a useful signal in itself.
- The threshold for "channel saturated" is 80% fill; for "sink busy" it's ≥ half of filter workers inside `sink.ingest`. Both are heuristics and may need adjustment as workloads shift.
- For the S3 sink, the codec-vs-network drill-down doesn't see *inside* the AWS transfer manager — once bytes leave our `ChannelWriter`, we lose visibility. If TM has its own internal queueing under pressure, we'd report `sink_s3_codec` (low local inflight) even though the actual bottleneck is downstream. Use the upload throughput numbers and `multipart_concurrency` setting alongside the label.