DoQ queries can outlive the DNS timeout and retain QUIC streams #4188

@N0zoM1z0

Operating system

Linux

System version

Linux 5.15.0-138-generic x86_64 GNU/Linux

Installation type

Others

If you are using a graphical client, please provide the version of the client.

Not applicable.

Version

Tested version:


sing-box version 1.13.12

Environment: go1.25.0 linux/amd64
Tags: with_quic,with_dhcp,with_wireguard,with_utls,with_acme,with_clash_api
CGO: enabled


The tested tag commit is `1086ab2563320e0da0c23b3a491d8dfa0939dff4`.

By static comparison, `dns/transport/quic/quic.go` and `dns/client.go` have no relevant diff between `v1.13.12` and `v1.13.13`. The dynamic reproduction below was run on `v1.13.12`.

Description

When sing-box is configured to use a DNS-over-QUIC upstream, a malicious or faulty DoQ server can keep DNS queries alive beyond the configured DNS timeout by accepting the QUIC stream, reading the DNS request, and then never sending a response while keeping the QUIC connection open.

In v1.13.12, the DoQ transport uses the request context while opening a QUIC stream, but the later response read is a plain blocking read:

stream, err := conn.OpenStreamSync(ctx)
// ...
response, err := transport.ReadMessage(stream)

The relevant source excerpt is included at attachments/evidence/quic-exchange-snippet.txt. The important part is that OpenStreamSync(ctx) is bounded by the context, but transport.ReadMessage(stream) is not. The deferred stream.CancelRead(0) only runs after ReadMessage returns, so it does not interrupt a silent peer.

This also interacts badly with the same-question cache/waiter logic. In dns/client.go, duplicate simple questions wait on cacheLock / transportCacheLock before the per-query DNS timeout context is created:

cond, loaded := c.cacheLock.LoadOrStore(question, make(chan struct{}))
if loaded {
    select {
    case <-cond:
    case <-ctx.Done():
        return nil, ctx.Err()
    }
}
// ...
ctx, cancel := context.WithTimeout(ctx, c.timeout)
response, err := transport.Exchange(ctx, message)

The relevant source excerpt is included at attachments/evidence/dns-client-cache-timeout-snippet.txt. As a result, a first hanging DoQ query can also make later identical queries wait without being bounded by the configured DNS timeout.

This is reachable through the normal command line program and normal DNS configuration. The reproduction uses:

  • type: "quic" DNS server pointing to 127.0.0.1:18853
  • local DNS capture on 127.0.0.1:15353
  • a local DoQ server that speaks QUIC/TLS with ALPN doq, reads valid DoQ requests, and keeps the streams open

The relevant threat model is a configured DoQ resolver that is malicious, compromised, or faulty. The client-side trigger traffic is ordinary DNS-over-TCP traffic sent to sing-box's local DNS entry point.

Reproduction

The attachment directory is self-contained except for the sing-box binary under test. It contains:

  • attachments/config/stock-doq-impact.json: minimal sing-box configuration
  • attachments/pocs/cmd/doq_hang_server/main.go: local DoQ upstream that accepts streams and never replies
  • attachments/pocs/cmd/dns_trigger/main.go: DNS-over-TCP trigger client
  • attachments/scripts/*.sh: reproducible test scripts

Run the tests from the report directory:

cd attachments
SING_BOX_BIN=/path/to/sing-box ./scripts/run_single_wait_65s.sh
SING_BOX_BIN=/path/to/sing-box ./scripts/run_abandon_100_60s.sh 100 60
SING_BOX_BIN=/path/to/sing-box ./scripts/run_stream_exhaustion_victim.sh 100
SING_BOX_BIN=/path/to/sing-box ./scripts/run_sameq_wait_65s.sh

The configuration uses only localhost listeners and does not depend on any remote server.

Observed results

Single DoQ query:

  • Configured DNS timeout: 10 seconds
  • Client wait budget: 65 seconds
  • Result: the client waited 65.002358055 seconds and gave up on its own 65-second budget; sing-box did not end the query at the 10-second DNS timeout
  • Upstream DoQ streams seen: 1
  • Open upstream DoQ streams: 1
  • Evidence: attachments/results/stock-single-wait-65s/summary.json

Abandoned client requests:

  • 100 DNS requests were sent to sing-box and the client connections were closed immediately
  • After 60 seconds, the DoQ server still observed 100 open upstream streams
  • sing-box goroutines increased from 13 to 117
  • Evidence: attachments/results/stock-abandon-100-60s/summary.json

Later unrelated victim query:

  • 100 hanging upstream streams were created first
  • A later query for a different name failed after about 10 seconds without creating a new upstream stream
  • The DoQ server still had 100 open streams
  • Evidence: attachments/results/stock-stream-exhaustion-victim-100/summary.json

Same-question waiter:

  • The first query for sameq-stock.repro.example. was sent and the client disconnected
  • A second query for the same name waited 65.005602402 seconds and timed out on the client side
  • The DoQ server saw only 1 upstream stream, while sing-box logged 2 exchanges for the same name
  • Evidence: attachments/results/stock-sameq-wait-65s/summary.json

Impact

This is an availability issue in the DoQ DNS path.

A single unanswered DoQ query can keep a DNS request blocked well beyond the configured DNS timeout. Abandoned local DNS requests can leave their upstream DoQ streams and goroutines alive after the clients are gone. In the reproduced run, 100 abandoned requests left 100 open DoQ streams after 60 seconds and increased the sing-box goroutine count by 104 (from 13 to 117).
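The goroutine growth is what one would expect if each pending read pins the goroutine that issued it. The sketch below models that under stated assumptions: `blockedReaders` is a hypothetical helper, `net.Pipe` replaces the QUIC streams, and the counter tracks goroutines that have not yet returned from their read. It is not a reproduction of the sing-box internals.

```go
package main

import (
	"fmt"
	"net"
	"sync"
	"sync/atomic"
	"time"
)

// blockedReaders starts n goroutines that each read from a silent peer.
// It returns a counter of goroutines that have not returned yet and a
// release function that interrupts every pending read.
func blockedReaders(n int) (active *atomic.Int64, release func()) {
	active = new(atomic.Int64)
	var wg sync.WaitGroup
	conns := make([]net.Conn, 0, 2*n)
	for i := 0; i < n; i++ {
		client, server := net.Pipe()
		conns = append(conns, client, server)
		active.Add(1)
		wg.Add(1)
		go func() {
			defer wg.Done()
			defer active.Add(-1)
			// Blocks until interrupted, because the server never writes.
			client.Read(make([]byte, 1))
		}()
	}
	release = func() {
		for _, c := range conns {
			c.SetReadDeadline(time.Now()) // unblock the pending read
		}
		wg.Wait()
		for _, c := range conns {
			c.Close()
		}
	}
	return active, release
}

func main() {
	active, release := blockedReaders(100)
	fmt.Println("pinned goroutines:", active.Load())
	release()
	fmt.Println("after interrupting reads:", active.Load())
}
```

Nothing in the sketch frees the goroutines until the reads are interrupted, which matches the observation that the streams were still open 60 seconds after the clients had disconnected.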

The issue can also affect later traffic. Once 100 hanging streams had accumulated on the upstream connection, a later query for a different name failed after about 10 seconds without opening a new upstream stream. For repeated identical questions, the second query can block behind the first one before the per-query timeout is applied, so it also outlives the configured DNS timeout.

Expected behavior: the configured DNS timeout should bound the whole query lifetime, including DoQ response reads and same-question waiter time. When the context expires, sing-box should cancel or close the affected QUIC stream and release any waiter state.

attachments.zip

Logs

The generated logs are included under:

- `attachments/logs/stock-single-wait-65s.server.log`
- `attachments/logs/stock-single-wait-65s.sing-box.stdout.log`
- `attachments/logs/stock-abandon-100-60s.server.log`
- `attachments/logs/stock-abandon-100-60s.sing-box.stdout.log`
- `attachments/logs/stock-stream-exhaustion-victim-100.server.log`
- `attachments/logs/stock-stream-exhaustion-victim-100.sing-box.stdout.log`
- `attachments/logs/stock-sameq-wait-65s.server.log`
- `attachments/logs/stock-sameq-wait-65s.sing-box.stdout.log`
- `attachments/logs/stock-doq-impact.sing-box.log`

Supporter

Integrity requirements

  • I confirm that I have read the documentation, understand the meaning of all the configuration items I wrote, and did not pile up seemingly useful options or default values.
  • I confirm that I have provided the server and client configuration files and process that can be reproduced locally, instead of a complicated client configuration file that has been stripped of sensitive data.
  • I confirm that I have provided the simplest configuration that can be used to reproduce the error I reported, instead of depending on remote servers, TUN, graphical interface clients, or other closed-source software.
  • I confirm that I have provided the complete configuration files and logs, rather than just providing parts I think are useful out of confidence in my own intelligence.
