feat: short-message flood detection for unapproved users by umputun · Pull Request #400 · umputun/tg-spam

umputun · 2026-04-29T20:50:40Z

opt-in spam check that bans an unapproved user who has accumulated too many short messages without graduating to "approved" status. Catches spammers who probe a channel with innocuous one-word messages ("hi", "hello", "yo") that individually evade content-based checks and the duplicate detector.

state is derived from the existing locator messages table via the idx_messages_gid_user_id_time index. No new in-memory counter, no cache, restart-resilient by construction.

trigger formula: (messagesFromUser - approvedCount) >= MaxShortMsgCount AND user is unapproved AND current message is short AND not a CheckOnly request.
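The trigger formula above can be read as a single predicate. The sketch below is illustrative only: `floodInput`, `isFlood`, and all field names are hypothetical, and "short" is assumed to mean fewer runes than `--min-msg-len`, which the description implies but does not spell out. It is not the code from `isShortMsgFlood`.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// floodInput bundles the values the trigger formula reads.
// All names are hypothetical and exist only for this illustration.
type floodInput struct {
	MessagesFromUser int    // messages stored for this user in the locator
	ApprovedCount    int    // the approvedCount term of the formula
	MaxShortMsgCount int    // threshold; 0 disables the check
	MinMsgLen        int    // assumed definition of "short": fewer runes than this
	UserApproved     bool   // approved users are never flagged
	CheckOnly        bool   // CheckOnly requests are never flagged
	Msg              string // the current message text
}

// isFlood evaluates the conjunction from the trigger formula.
func isFlood(in floodInput) bool {
	if in.MaxShortMsgCount <= 0 || in.CheckOnly || in.UserApproved {
		return false // disabled, check-only request, or approved user
	}
	if utf8.RuneCountInString(in.Msg) >= in.MinMsgLen {
		return false // current message is not short
	}
	return in.MessagesFromUser-in.ApprovedCount >= in.MaxShortMsgCount
}

func main() {
	in := floodInput{MessagesFromUser: 3, MaxShortMsgCount: 3, MinMsgLen: 50, Msg: "hi"}
	fmt.Println("third short message:", isFlood(in))

	in.MessagesFromUser = 2
	fmt.Println("second short message:", isFlood(in))
}
```

The cheap guards run first so that a disabled check (`MaxShortMsgCount` of 0) does no further work, which matches the "outer guard" described in the commit message.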

configuration

--max-short-msg-count env MAX_SHORT_MSG_COUNT default 0 (disabled). Co-located with --min-msg-len as a top-level scalar.
requires the first-message evaluation path (--first-messages-count > 0 or --first-message-only); --paranoid mode is incompatible and rejected at startup.
recommended baseline: --max-short-msg-count=3 --first-messages-count=2 --min-msg-len=50.
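The startup constraints listed above could be checked roughly as follows. This is a minimal sketch under stated assumptions: the `options` struct, its field names, `validateShortMsgFlood`, and the error texts are all hypothetical and are not taken from `app/main.go`.

```go
package main

import (
	"errors"
	"fmt"
)

// options holds only the flags relevant to this check.
// Field names are hypothetical, chosen to mirror the CLI flags.
type options struct {
	MaxShortMsgCount   int  // --max-short-msg-count, 0 means disabled
	FirstMessagesCount int  // --first-messages-count
	FirstMessageOnly   bool // --first-message-only
	ParanoidMode       bool // --paranoid
}

// validateShortMsgFlood rejects flag combinations the description calls invalid.
func validateShortMsgFlood(o options) error {
	if o.MaxShortMsgCount <= 0 {
		return nil // check disabled, nothing to validate
	}
	if o.ParanoidMode {
		return errors.New("max-short-msg-count is incompatible with paranoid mode")
	}
	if o.FirstMessagesCount <= 0 && !o.FirstMessageOnly {
		return errors.New("max-short-msg-count requires first-messages-count > 0 or first-message-only")
	}
	return nil
}

func main() {
	// the recommended baseline from the description
	baseline := options{MaxShortMsgCount: 3, FirstMessagesCount: 2}
	fmt.Println("baseline:", validateShortMsgFlood(baseline))

	paranoid := options{MaxShortMsgCount: 3, FirstMessagesCount: 2, ParanoidMode: true}
	fmt.Println("paranoid:", validateShortMsgFlood(paranoid))
}
```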

design notes

wired as an internal method on Detector (isShortMsgFlood), not a separate struct or MetaCheck. Stateless within the detector, so encapsulation in a struct would add no value.
bypasses LLM consensus by design: the signal is behavioral, not content-based, and an LLM looking at one short message has no information that would justify overriding the count.
when triggered, populates ExtraDeleteIDs with prior message IDs so the listener cleans up the trailing flood the same way the duplicate detector does.
false-positive risk against naturally terse legitimate users is bounded to the evaluation window only; once the user crosses --first-messages-count, the check skips for the rest of their lifetime.

wiring

new MessageCounter interface in lib/tgspam consumed by *storage.Locator (which gains CountUserMessages and UserMessageIDs methods). Wired in app/main.go after the locator is constructed.

settings round-trip covered: CLI flag, env var, optToSettings mapping, Settings struct with zeroAwarePaths entry, web UI form input + read-only display row, e2e UI test, and README.

also adds a CLAUDE.md project rule documenting the wiring-layers checklist for new config parameters in general. A missed layer was a recurring source of partially-broken features, and the rule codifies how to avoid that.

Related to #399

Codifies the requirement that any new configurable parameter (CLI flag, env var, or tunable knob) must be wired through CLI/env definition, DB-backed config, web settings UI, e2e UI test, and README. Recurring source of partially-broken features when one layer is missed.

Adds an opt-in spam check that bans an unapproved user when their locator message count minus approved count reaches --max-short-msg-count and the current message is short. Closes the gap left by content-based checks for spammers probing the channel with innocuous one-word messages ("hi", "hello", "yo") that individually evade similarity, classifier, LLM, and duplicate detectors.

State is derived from the existing messages table via the idx_messages_gid_user_id_time index: no new in-memory counter, no cache, restart-resilient by construction. The check is implemented as an internal method on Detector (isShortMsgFlood), gated by an outer guard so disabled mode does no work, and bypasses LLM consensus by design (a behavioral signal is not subject to content-based override).

Wired through all required layers per CLAUDE.md: CLI flag and env var (--max-short-msg-count / MAX_SHORT_MSG_COUNT), DB-backed Settings with zero-aware semantics, web settings UI form field plus read-only display, e2e UI round-trip test, and README documentation.

Respects training, dry, and soft-ban modes via the existing pipeline. Disabled by default (MaxShortMsgCount=0) and incompatible with paranoid mode (validated at startup).

Related to #399

Captures the design decisions, wiring layers, edge cases, and task breakdown for the short-message flood detection feature. Filed under completed/ since the implementation has landed. Related to #399

Two refinements from post-implementation review:

- README.md: drop the trigger-formula code block, locator-derivation paragraph, separate "When triggered" paragraph, dedicated "Rollout note", and split-out "Naturally terse legitimate users" paragraph. The section now matches the brevity of neighboring checker sections (single intro, one Important callout, Configure with). Implementation detail belongs in CLAUDE.md (already there).
- lib/tgspam/detector.go: add a comment on the approved-user fast-bail in isShortMsgFlood explaining why it exists. In production main.go forces FirstMessageOnly=true whenever FirstMessagesCount > 0, so approved users short-circuit at the pre-approved branch in Check and never reach this method. The guard is defensive against library consumers that construct Detector with FirstMessageOnly=false and FirstMessagesCount > 0.

Related to #399

Copilot

Pull request overview

Adds an opt-in “short-message flood” behavioral check to the spam detector to catch unapproved users who send many very short messages that individually evade content-based and duplicate detection. The implementation is stateless (derives counts from the locator DB) and is fully wired through CLI/settings, web UI, and tests.

Changes:

Introduce MaxShortMsgCount config/setting and a MessageCounter dependency to support short-message flood detection in Detector.Check.
Extend *storage.Locator with per-user count + message ID retrieval to back the detector’s new check.
Wire the new setting end-to-end (CLI/env → settings → web UI → e2e) and add unit + integration coverage.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated no comments.

Show a summary per file

| File | Description |
| --- | --- |
| lib/tgspam/detector.go | Adds `MaxShortMsgCount`, `MessageCounter`, wiring setter, and the `isShortMsgFlood` check integrated into `Detector.Check`. |
| lib/tgspam/detector_test.go | Comprehensive unit tests covering enable/disable guards, threshold behavior, ID filtering, and error paths. |
| lib/tgspam/mocks/message_counter.go | Generated mock for the new `MessageCounter` interface used by tests. |
| app/storage/locator.go | Implements `CountUserMessages` and `UserMessageIDs` plus adds locking to `GetUserMessageIDs`. |
| app/storage/locator_test.go | Adds tests for the new locator count/IDs methods and gid isolation. |
| app/main.go | Adds CLI flag, startup validation, wires locator into detector via `WithMessageCounter`, and plumbs config into detector config. |
| app/main_test.go | Adds validation and detector-config wiring tests for `MaxShortMsgCount`. |
| app/settings.go | Maps CLI option `MaxShortMsgCount` into persisted settings. |
| app/settings_test.go | Verifies `optToSettings` mapping for `MaxShortMsgCount`. |
| app/config/settings.go | Adds `MaxShortMsgCount` to settings struct and to `zeroAwarePaths`. |
| app/config/settings_test.go | Tests zero-aware default merge behavior and JSON/YAML round-trip for the new setting. |
| app/webapi/config.go | Parses `maxShortMsgCount` from the settings form submission. |
| app/webapi/config_test.go | Tests web form parsing behavior (numeric/empty/non-numeric) for `maxShortMsgCount`. |
| app/webapi/assets/settings.html | Adds the new editable input and read-only display row for `MaxShortMsgCount`. |
| e2e-ui/e2e_test.go | Adds Playwright e2e settings round-trip test for `MaxShortMsgCount`. |
| app/events/listener_test.go | Integration test ensuring `short-msg-flood` propagates `ExtraDeleteIDs` and drives deletion/ban behavior. |
| README.md | Documents the new feature and adds the option row matching `--help` output. |
| CLAUDE.md | Adds a project rule/checklist for wiring new configurable parameters across all layers. |
| docs/plans/completed/20260429-max-short-msg-count.md | Completed implementation plan documenting design decisions, wiring, and testing strategy. |

umputun added 4 commits April 29, 2026 15:12

docs(plans): max-short-msg-count implementation plan

90cf51d

Copilot AI review requested due to automatic review settings April 29, 2026 20:50

Copilot started reviewing on behalf of umputun April 29, 2026 20:51

Copilot AI reviewed Apr 29, 2026

umputun merged commit 116d28f into master Apr 29, 2026
10 checks passed

umputun deleted the feature/max-short-msg-count-399 branch April 29, 2026 21:24

umputun mentioned this pull request Apr 29, 2026

short-message flood from unapproved users slips past existing checks #399

Closed
