feat: short-message flood detection for unapproved users #400
Merged
Codifies the requirement that any new configurable parameter (CLI flag, env var, or tunable knob) must be wired through CLI/env definition, DB-backed config, web settings UI, e2e UI test, and README. Recurring source of partially-broken features when one layer is missed.
Adds an opt-in spam check that bans an unapproved user when their locator
message count minus approved count crosses --max-short-msg-count, and the
current message is short. Closes the gap left by content-based checks for
spammers probing the channel with innocuous one-word messages ("hi",
"hello", "yo") that individually evade similarity, classifier, LLM, and
duplicate detectors.
State is derived from the existing messages table via the
idx_messages_gid_user_id_time index — no new in-memory counter, no cache,
restart-resilient by construction. The check is implemented as an internal
method on Detector (isShortMsgFlood), gated by an outer guard so disabled
mode does no work, and bypasses LLM consensus by design (behavioral signal
not subject to content-based override).
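The trigger rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from lib/tgspam/detector.go: the `MessageCounter` signature, the `floodParams` struct, the `fakeCounter` test double, and the reading of "short" as "fewer runes than `MinMsgLen`" are all assumptions made for the example.

```go
package main

import "fmt"

// MessageCounter is a hypothetical stand-in for the detector's dependency;
// the real interface in lib/tgspam may have a different shape.
type MessageCounter interface {
	CountUserMessages(userID int64) (int, error)
}

// fakeCounter is an in-memory test double keyed by user ID (illustrative only).
type fakeCounter map[int64]int

func (f fakeCounter) CountUserMessages(userID int64) (int, error) { return f[userID], nil }

// floodParams groups the inputs of the trigger rule; all names are assumptions.
type floodParams struct {
	MaxShortMsgCount int  // 0 disables the check
	MinMsgLen        int  // assumed threshold below which a message counts as short
	ApprovedCount    int  // the user's approved count from the trigger formula
	Approved         bool // approved users are never checked
	CheckOnly        bool // CheckOnly requests are never checked
}

// isShortMsgFlood reports whether the current message completes a flood.
func isShortMsgFlood(mc MessageCounter, userID int64, msg string, p floodParams) (bool, error) {
	// outer guard: disabled mode does no work and never touches storage
	if p.MaxShortMsgCount <= 0 || mc == nil {
		return false, nil
	}
	if p.Approved || p.CheckOnly {
		return false, nil
	}
	// only short messages can trigger the check
	if len([]rune(msg)) >= p.MinMsgLen {
		return false, nil
	}
	total, err := mc.CountUserMessages(userID)
	if err != nil {
		return false, fmt.Errorf("count messages for user %d: %w", userID, err)
	}
	return total-p.ApprovedCount >= p.MaxShortMsgCount, nil
}

func main() {
	counter := fakeCounter{42: 5}
	p := floodParams{MaxShortMsgCount: 3, MinMsgLen: 50, ApprovedCount: 2}
	flood, err := isShortMsgFlood(counter, 42, "hi", p)
	fmt.Println(flood, err)
}
```

Because the count comes from storage on every call, the sketch keeps no state of its own, which is the property the commit message calls restart-resilient.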
Wired through all required layers per CLAUDE.md: CLI flag and env var
(--max-short-msg-count / MAX_SHORT_MSG_COUNT), DB-backed Settings with
zero-aware semantics, web settings UI form field plus read-only display,
e2e UI round-trip test, and README documentation. Respects training, dry,
and soft-ban modes via the existing pipeline. Disabled by default
(MaxShortMsgCount=0) and incompatible with paranoid mode (validated at
startup).
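The startup validation mentioned above might look something like the sketch below. The `options` struct, the function name, and the error text are hypothetical; only the rule itself (a non-zero MaxShortMsgCount is rejected together with paranoid mode, and zero means disabled) comes from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// options holds only the two settings relevant to this check (hypothetical shape).
type options struct {
	MaxShortMsgCount int
	Paranoid         bool
}

// errParanoidConflict uses an illustrative message, not the project's actual wording.
var errParanoidConflict = errors.New("max-short-msg-count can't be used with paranoid mode")

// validateShortMsgOpts rejects the incompatible combination at startup.
func validateShortMsgOpts(o options) error {
	if o.MaxShortMsgCount <= 0 {
		return nil // feature disabled, nothing to validate
	}
	if o.Paranoid {
		return errParanoidConflict
	}
	return nil
}

func main() {
	cases := []options{
		{MaxShortMsgCount: 0, Paranoid: true},
		{MaxShortMsgCount: 3, Paranoid: false},
		{MaxShortMsgCount: 3, Paranoid: true},
	}
	for _, o := range cases {
		fmt.Printf("%+v -> %v\n", o, validateShortMsgOpts(o))
	}
}
```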
Related to #399
Captures the design decisions, wiring layers, edge cases, and task breakdown for the short-message flood detection feature. Filed under completed/ since the implementation has landed. Related to #399
Two refinements from post-implementation review:
- README.md: drop the trigger-formula code block, locator-derivation paragraph, separate "When triggered" paragraph, dedicated "Rollout note", and split-out "Naturally terse legitimate users" paragraph. The section now matches the brevity of neighboring checker sections (single intro, one Important callout, Configure with). Implementation detail belongs in CLAUDE.md (already there).
- lib/tgspam/detector.go: add a comment on the approved-user fast-bail in isShortMsgFlood explaining why it exists. In production main.go forces FirstMessageOnly=true whenever FirstMessagesCount > 0, so approved users short-circuit at the pre-approved branch in Check and never reach this method. The guard is defensive against library consumers that construct Detector with FirstMessageOnly=false and FirstMessagesCount > 0.

Related to #399
Pull request overview
Adds an opt-in “short-message flood” behavioral check to the spam detector to catch unapproved users who send many very short messages that individually evade content-based and duplicate detection. The implementation is stateless (derives counts from the locator DB) and is fully wired through CLI/settings, web UI, and tests.
Changes:
- Introduce `MaxShortMsgCount` config/setting and a `MessageCounter` dependency to support short-message flood detection in `Detector.Check`.
- Extend `*storage.Locator` with per-user count + message ID retrieval to back the detector’s new check.
- Wire the new setting end-to-end (CLI/env → settings → web UI → e2e) and add unit + integration coverage.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| lib/tgspam/detector.go | Adds MaxShortMsgCount, MessageCounter, wiring setter, and the isShortMsgFlood check integrated into Detector.Check. |
| lib/tgspam/detector_test.go | Comprehensive unit tests covering enable/disable guards, threshold behavior, ID filtering, and error paths. |
| lib/tgspam/mocks/message_counter.go | Generated mock for the new MessageCounter interface used by tests. |
| app/storage/locator.go | Implements CountUserMessages and UserMessageIDs plus adds locking to GetUserMessageIDs. |
| app/storage/locator_test.go | Adds tests for the new locator count/IDs methods and gid isolation. |
| app/main.go | Adds CLI flag, startup validation, wires locator into detector via WithMessageCounter, and plumbs config into detector config. |
| app/main_test.go | Adds validation and detector-config wiring tests for MaxShortMsgCount. |
| app/settings.go | Maps CLI option MaxShortMsgCount into persisted settings. |
| app/settings_test.go | Verifies optToSettings mapping for MaxShortMsgCount. |
| app/config/settings.go | Adds MaxShortMsgCount to settings struct and to zeroAwarePaths. |
| app/config/settings_test.go | Tests zero-aware default merge behavior and JSON/YAML round-trip for the new setting. |
| app/webapi/config.go | Parses maxShortMsgCount from the settings form submission. |
| app/webapi/config_test.go | Tests web form parsing behavior (numeric/empty/non-numeric) for maxShortMsgCount. |
| app/webapi/assets/settings.html | Adds the new editable input and read-only display row for MaxShortMsgCount. |
| e2e-ui/e2e_test.go | Adds Playwright e2e settings round-trip test for MaxShortMsgCount. |
| app/events/listener_test.go | Integration test ensuring short-msg-flood propagates ExtraDeleteIDs and drives deletion/ban behavior. |
| README.md | Documents the new feature and adds the option row matching --help output. |
| CLAUDE.md | Adds a project rule/checklist for wiring new configurable parameters across all layers. |
| docs/plans/completed/20260429-max-short-msg-count.md | Completed implementation plan documenting design decisions, wiring, and testing strategy. |
opt-in spam check that bans an unapproved user who has accumulated too many short messages without graduating to "approved" status. Catches spammers who probe a channel with innocuous one-word messages ("hi", "hello", "yo") that individually evade content-based checks and the duplicate detector.

state is derived from the existing locator `messages` table via the `idx_messages_gid_user_id_time` index. No new in-memory counter, no cache, restart-resilient by construction.

trigger formula: `(messagesFromUser - approvedCount) >= MaxShortMsgCount` AND user is unapproved AND current message is short AND not a CheckOnly request.

configuration
- `--max-short-msg-count`, env `MAX_SHORT_MSG_COUNT`, default `0` (disabled). Co-located with `--min-msg-len` as a top-level scalar.
- requires approval tracking (`--first-messages-count > 0` or `--first-message-only`); `--paranoid` mode is incompatible and rejected at startup.
- example: `--max-short-msg-count=3 --first-messages-count=2 --min-msg-len=50`.

design notes
- implemented as an internal method on `Detector` (`isShortMsgFlood`), not a separate struct or MetaCheck. Stateless within the detector, so encapsulation in a struct would add no value.
- populates `ExtraDeleteIDs` with prior message IDs so the listener cleans up the trailing flood the same way the duplicate detector does.
- once a user is approved via `--first-messages-count`, the check skips for the rest of their lifetime.

wiring
- new `MessageCounter` interface in `lib/tgspam`, implemented by `*storage.Locator` (which gains `CountUserMessages` and `UserMessageIDs` methods). Wired in `app/main.go` after the locator is constructed.
- settings round-trip covered: CLI flag, env var, `optToSettings` mapping, `Settings` struct with `zeroAwarePaths` entry, web UI form input + read-only display row, e2e UI test, and README.
- also adds a CLAUDE.md project rule documenting the wiring-layers checklist for new config parameters in general; it codifies what was a recurring source of partially-broken features when one layer got missed.
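The wiring between the locator and the detector could be pictured along these lines. This is only a sketch: the method signatures, the `limit` parameter, the `memLocator` stand-in for `*storage.Locator`, and the `floodDeleteIDs` helper are hypothetical. The names `MessageCounter`, `CountUserMessages`, `UserMessageIDs`, `WithMessageCounter`, and `ExtraDeleteIDs` are the only parts taken from the PR.

```go
package main

import "fmt"

// MessageCounter mirrors the two locator methods named in the PR;
// the signatures are assumptions made for this sketch.
type MessageCounter interface {
	CountUserMessages(userID int64) (int, error)
	UserMessageIDs(userID int64, limit int) ([]int, error)
}

// memLocator is a hypothetical in-memory stand-in for *storage.Locator.
type memLocator struct {
	msgs map[int64][]int // message IDs per user, oldest first
}

func (l *memLocator) CountUserMessages(userID int64) (int, error) {
	return len(l.msgs[userID]), nil
}

func (l *memLocator) UserMessageIDs(userID int64, limit int) ([]int, error) {
	ids := l.msgs[userID]
	if limit > 0 && len(ids) > limit {
		ids = ids[len(ids)-limit:] // keep only the most recent IDs
	}
	return append([]int(nil), ids...), nil // copy so callers can't mutate storage
}

// Detector keeps only the field relevant to the wiring.
type Detector struct {
	counter MessageCounter
}

// WithMessageCounter injects the storage-backed dependency after construction.
func (d *Detector) WithMessageCounter(mc MessageCounter) { d.counter = mc }

// floodDeleteIDs returns the IDs a caller could place into ExtraDeleteIDs
// once the flood check has fired (hypothetical helper).
func (d *Detector) floodDeleteIDs(userID int64, limit int) ([]int, error) {
	if d.counter == nil {
		return nil, nil // not wired, nothing to clean up
	}
	return d.counter.UserMessageIDs(userID, limit)
}

func main() {
	loc := &memLocator{msgs: map[int64][]int{42: {101, 102, 103}}}
	d := &Detector{}
	d.WithMessageCounter(loc) // done after the locator is constructed
	ids, err := d.floodDeleteIDs(42, 10)
	fmt.Println(ids, err)
}
```

Keeping the dependency behind a small interface lets unit tests use a generated mock while production passes the real locator.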
Related to #399