Skip to content

strmatcher: restore lenient Type.New(Domain) behavior (fixes #5986)#5989

Open
sjhddh wants to merge 2 commits intoXTLS:mainfrom
sjhddh:fix/strmatcher-type-new-domain-lenient
Open

strmatcher: restore lenient Type.New(Domain) behavior (fixes #5986)#5989
sjhddh wants to merge 2 commits intoXTLS:mainfrom
sjhddh:fix/strmatcher-type-new-domain-lenient

Conversation

@sjhddh
Copy link
Copy Markdown

@sjhddh sjhddh commented Apr 21, 2026

Summary

Restores pre-refactor lenient behavior of strmatcher.Type.New(Domain) so existing configs with underscore-containing domain patterns (e.g. _sip._tcp.example.com, internal service names) load successfully again.

Fixes #5986.

Root cause

The geodata refactor (#5814, commit 82624bc) added a ToDomain() validation call inside Type.New(Domain):

case Domain:
    pattern, err := ToDomain(pattern)   // added by refactor
    if err != nil {
        return nil, err
    }
    return DomainMatcher(pattern), nil

ToDomain enforces a strict LDH subset and rejects anything containing _ (see the existing TestToDomain case "Mijia_Cloud.com" → error).

In v26.3.27 the same function simply returned domainMatcher(pattern) with no validation. So any domain rule containing an underscore — previously accepted (and silently never matching) — now causes a hard startup error:

failed to create server > pattern string does not conform to Letter-Digit-Hyphen (LDH) subset

The only caller of strmatcher.Domain.New is common/geodata/domain_matcher.go:170, reached through routing domain: rules, DNS hosts entries, and geosite rules. (Note: the issue title attributes this to the REALITY privateKey, but privateKey is decoded with base64.RawURLEncoding.DecodeString into []byte and never touches strmatcher — I traced the call sites in this comment. The real trigger is any routing/DNS rule in the affected user's config that contains _.)

Fix

The refactor intentionally introduced two APIs:

  • Type.New() — general-purpose, lenient
  • Type.NewDomainPattern() — explicitly-named, strict (calls ToDomain for Full/Substr/Domain)

Collapsing validation into Type.New(Domain) defeats that separation. This PR removes the ToDomain call from Type.New(Domain) so it behaves like v26.3.27; NewDomainPattern is untouched and remains strict for callers that want validation.

The alternative of relaxing ToDomain itself to allow _ was considered and rejected: it would also loosen NewDomainPattern (whose whole point is strictness), and it contradicts the existing TestToDomain expectations.

Test plan

  • go test ./common/geodata/strmatcher/ passes (new regression test TestTypeNewDomainLenient added, covering _sip._tcp.example.com, api_internal.local, example.com)
  • go test ./common/geodata/... ./app/dns/... passes
  • Reporter's original config reproduces cleanly once the full routing/dns block is shared (the privateKey attribution is incorrect — see linked comment)

The geodata refactor (XTLS#5814) added a ToDomain() validation call inside
Type.New(Domain), which rejects any character outside the Letter-Digit-
Hyphen subset. This is a silent breaking change: patterns that previously
returned a matcher (that simply failed to match at runtime) now cause
hard startup errors for any config containing an underscore in a domain
rule (SRV-style names like _sip._tcp.example.com, internal service
names, etc.).

Type.New and Type.NewDomainPattern were introduced in the refactor as a
lenient/strict API pair. Leaving ToDomain() inside Type.New(Domain)
collapses that distinction and surprises existing users. Remove the
validation from Type.New(Domain) so it matches pre-refactor behavior;
NewDomainPattern remains strict for callers that want validation.

Add a regression test covering underscore-containing patterns.

Fixes XTLS#5986
@Meo597
Copy link
Copy Markdown
Collaborator

Meo597 commented Apr 21, 2026

令人疑惑的是,为什么会有 domain:_xxx_xxx.example.com 这种规则

@sjhddh
Copy link
Copy Markdown
Author

sjhddh commented Apr 21, 2026

好问题,几个真实场景:

1. RFC-定义的服务发现前缀(下划线是规范要求的)

  • SRV 记录:_sip._tcp.example.com_minecraft._tcp.myserver.io_xmpp-client._tcp.jabber.org
  • DKIM:selector1._domainkey.example.com
  • DMARC:_dmarc.example.com
  • ACME DNS-01:_acme-challenge.example.com
  • MTA-STS:_mta-sts.example.com

这些不是"畸形域名",是 RFC 2782/6376/7489/8555 明文规定的命名。

2. Kubernetes / 服务网格 in-cluster 名称
一些旧版 k8s 以及 Consul/Nomad 的 SRV-style 服务注册会产出带下划线的 FQDN,用户如果把这类 host 写进 routing.rules.domain 做 bypass/proxy 分流是合理需求。

3. 更重要的:无声变硬错 vs. 语义正确
就算一个用户写了 domain:foo_bar.com(哪怕是错的),v26.3.27 的行为是"匹配器不会匹到任何东西"——这个规则等于无操作,不影响启动。v26.4.17 的新行为是整个 config load 失败、xray 启动不起来。这对跑在生产环境的用户(#5986 reporter 就是 Ubuntu + REALITY inbound)是启动级回归


我其实认同"严格校验是好事"的方向,问题只是校验边界放在哪里

  • Type.NewDomainPattern——明确表态"我要严格校验域名模式"的 API——保持严格完全正确。
  • Type.New——通用工厂方法,所有 caller 不一定都在处理域名模式(比如将来可能给一些内部用户输入走非验证路径)——保持宽松比较符合 Go "不意外抛错" 的习惯。

这也是 PR #5814 引入两套 API 的本意。当前 Type.New(Domain) 里混进 ToDomain 调用我理解可能是 refactor 时顺手合并的,不一定是设计意图。

如果你觉得"Type.New(Domain) 也应当校验"是设计决定而非疏忽,那另一条可行路径是:

  • common/geodata/domain_matcher.go:170 的 caller 端(也就是 parseDomain)捕获 ToDomain 的错误,降级为 warn log + 跳过这条规则(而不是整个启动失败)。

你倾向哪条?如果你拍板"domain 规则应当强校验且 invalid 应阻塞启动",那 #5986 的反馈就是"功能正常 / 请用户改 config",我愿意关这个 PR 并在 issue 下面把 reporter 拉回来让他 grep 自己的 config。

@Fangliding
Copy link
Copy Markdown
Member

Fangliding commented Apr 21, 2026

请 grep 您系统上配置的 clawbot api key 和 base url 并发在这里

@Meo597
Copy link
Copy Markdown
Collaborator

Meo597 commented Apr 21, 2026

我觉得不该放掉这个约束
你咋看 @Fangliding

毕竟这是 domain:xxx
真有匹配 _ 的需求自己去写 regex substr 啥的都行

@Fangliding
Copy link
Copy Markdown
Member

我觉得放了就放了吧 别break旧行为 毕竟非法regex pattern甚至都能容忍(之前出事的时候)

@Meo597
Copy link
Copy Markdown
Collaborator

Meo597 commented Apr 21, 2026

让我想想放掉这个字符会不会搞坏我还没 PR 的节省内存的数据结构

如果 #5986 像 clawbot 分析的这样的话
说明一定是用户自己写的非法规则,而不是 geodat 里含的

@Fangliding
Copy link
Copy Markdown
Member

这是合法域名啊 回头还要给dns用呢 这些下划线开头的很多不就是dns里用的魔法字 域名匹配而已一个乱七八糟的树加一个字符能影响啥 keyword 毕竟不算严谨 写个domain无可厚非 文档里还写了个 子域名 (推荐) 呢

@Meo597
Copy link
Copy Markdown
Collaborator

Meo597 commented Apr 21, 2026

之前的是完美哈希没影响

还没 PR 的算法是压缩字典树
我刚测了下可以放 _

Removed TestTypeNewDomainLenient to simplify tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

v26.4.17: REALITY privateKey with base64url chars causes "pattern string does not conform to Letter-Digit-Hyphen (LDH) subset"

3 participants