feat: short-message flood detection for unapproved users #400

Merged: umputun merged 4 commits into master from feature/max-short-msg-count-399 on Apr 29, 2026

umputun (Owner) commented Apr 29, 2026

opt-in spam check that bans an unapproved user who has accumulated too many short messages without graduating to "approved" status. Catches spammers who probe a channel with innocuous one-word messages ("hi", "hello", "yo") that individually evade content-based checks and the duplicate detector.

state is derived from the existing locator messages table via the idx_messages_gid_user_id_time index. No new in-memory counter, no cache, restart-resilient by construction.

trigger formula: (messagesFromUser - approvedCount) >= MaxShortMsgCount AND user is unapproved AND current message is short AND not a CheckOnly request.
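
The trigger formula can be read as a pure predicate. This is a minimal illustration, not the detector's code: the floodInput type and its field names are hypothetical, "short" is assumed to mean shorter than --min-msg-len, and approvedCount is assumed to be the number of the user's messages that already counted toward approval, so the difference approximates the short messages that skipped content checks. Whether messagesFromUser includes the current message depends on when the listener writes to the locator, which the PR does not state.

```go
package main

// floodInput bundles the inputs named in the trigger formula.
// All field names are hypothetical; the real detector reads these values
// from its config, its approved-users state, and the locator.
type floodInput struct {
	MessagesFromUser int  // locator count of the user's messages in this chat
	ApprovedCount    int  // messages that counted toward approval (assumed meaning)
	MaxShortMsgCount int  // --max-short-msg-count; 0 disables the check
	UserApproved     bool // user has already graduated to "approved"
	CurrentIsShort   bool // current message is under --min-msg-len (assumed)
	CheckOnly        bool // CheckOnly requests never trigger
}

// shortMsgFloodTriggered evaluates the trigger formula as a pure predicate.
func shortMsgFloodTriggered(in floodInput) bool {
	if in.MaxShortMsgCount <= 0 { // disabled by default
		return false
	}
	if in.UserApproved || !in.CurrentIsShort || in.CheckOnly {
		return false
	}
	return in.MessagesFromUser-in.ApprovedCount >= in.MaxShortMsgCount
}
```

The explicit disabled guard matters: with MaxShortMsgCount=0 the inequality alone would hold for every message.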

configuration

  • --max-short-msg-count env MAX_SHORT_MSG_COUNT default 0 (disabled). Co-located with --min-msg-len as a top-level scalar.
  • requires the first-message evaluation path (--first-messages-count > 0 or --first-message-only); --paranoid mode is incompatible and rejected at startup.
  • recommended baseline: --max-short-msg-count=3 --first-messages-count=2 --min-msg-len=50.
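
The startup rules above could be enforced with a check along these lines. This is a hedged sketch: the opts struct and validateShortMsgFlood are hypothetical stand-ins for the real options handling in app/main.go. The PR states that paranoid mode is rejected at startup, but it does not say whether a missing first-message path is a hard error or only a warning; this sketch treats both as errors.

```go
package main

import "errors"

// opts is a hypothetical subset of the CLI options named above.
type opts struct {
	MaxShortMsgCount   int  // --max-short-msg-count
	FirstMessagesCount int  // --first-messages-count
	FirstMessageOnly   bool // --first-message-only
	Paranoid           bool // --paranoid
}

// validateShortMsgFlood sketches the startup checks for --max-short-msg-count.
func validateShortMsgFlood(o opts) error {
	if o.MaxShortMsgCount == 0 {
		return nil // feature disabled, nothing else to validate
	}
	if o.Paranoid {
		return errors.New("--max-short-msg-count is incompatible with --paranoid")
	}
	// the check only runs on the first-message evaluation path
	if o.FirstMessagesCount <= 0 && !o.FirstMessageOnly {
		return errors.New("--max-short-msg-count requires --first-messages-count > 0 or --first-message-only")
	}
	return nil
}
```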

design notes

  • wired as an internal method on Detector (isShortMsgFlood), not a separate struct or MetaCheck. Stateless within the detector, so encapsulation in a struct would add no value.
  • bypasses LLM consensus by design: the signal is behavioral, not content-based, and an LLM looking at one short message has no information that would justify overriding the count.
  • when triggered, populates ExtraDeleteIDs with prior message IDs so the listener cleans up the trailing flood the same way the duplicate detector does.
  • false-positive risk against naturally terse legitimate users is bounded to the evaluation window only; once the user crosses --first-messages-count, the check skips for the rest of their lifetime.
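
A minimal sketch of how an isShortMsgFlood-style method could combine the guards, the count, and the ExtraDeleteIDs cleanup. Everything here is an assumption for illustration: the MessageCounter method signatures, the lowercase detector and floodResult types, approvedCounts as the source of approvedCount, rune length as the "short" test, and dropping the current message ID from ExtraDeleteIDs on the premise that the regular spam path already deletes it. mapCounter is an in-memory test double, not the locator.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// MessageCounter is named in the PR; these method signatures are assumed.
type MessageCounter interface {
	CountUserMessages(gid string, userID int64) (int, error)
	UserMessageIDs(gid string, userID int64) ([]int, error)
}

// floodResult is a hypothetical slice of the detector's check response.
type floodResult struct {
	Spam           bool
	Details        string
	ExtraDeleteIDs []int // earlier flood messages for the listener to remove
}

// detector holds only the fields this sketch needs; names are hypothetical.
type detector struct {
	maxShortMsgCount   int
	firstMessagesCount int
	minMsgLen          int
	approvedCounts     map[int64]int // per-user messages counted toward approval
	counter            MessageCounter
}

func (d *detector) isShortMsgFlood(gid string, userID int64, msgID int, text string, checkOnly bool) (floodResult, error) {
	// outer guard: disabled mode, no counter wired, or CheckOnly does no work
	if d.maxShortMsgCount <= 0 || d.counter == nil || checkOnly {
		return floodResult{}, nil
	}
	approvedCount := d.approvedCounts[userID]
	// defensive fast-bail: users past the evaluation window are never flagged
	if d.firstMessagesCount > 0 && approvedCount >= d.firstMessagesCount {
		return floodResult{}, nil
	}
	// only short messages feed the flood signal
	if utf8.RuneCountInString(text) >= d.minMsgLen {
		return floodResult{}, nil
	}
	total, err := d.counter.CountUserMessages(gid, userID)
	if err != nil {
		return floodResult{}, fmt.Errorf("count user messages: %w", err)
	}
	if total-approvedCount < d.maxShortMsgCount {
		return floodResult{}, nil
	}
	ids, err := d.counter.UserMessageIDs(gid, userID)
	if err != nil {
		return floodResult{}, fmt.Errorf("user message ids: %w", err)
	}
	extra := make([]int, 0, len(ids))
	for _, id := range ids {
		if id != msgID { // the current message goes through the normal delete path
			extra = append(extra, id)
		}
	}
	return floodResult{
		Spam:           true,
		Details:        fmt.Sprintf("%d short messages from unapproved user", total-approvedCount),
		ExtraDeleteIDs: extra,
	}, nil
}

// mapCounter is an in-memory test double keyed by user ID only.
type mapCounter map[int64][]int

func (m mapCounter) CountUserMessages(_ string, userID int64) (int, error) { return len(m[userID]), nil }
func (m mapCounter) UserMessageIDs(_ string, userID int64) ([]int, error)  { return m[userID], nil }
```

Keeping the counter behind an interface is what lets the check stay stateless: the history lives in the messages table, so a restart loses nothing.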

wiring

new MessageCounter interface in lib/tgspam, implemented by *storage.Locator (which gains CountUserMessages and UserMessageIDs methods) and consumed by Detector via WithMessageCounter. Wired in app/main.go after the locator is constructed.

settings round-trip covered: CLI flag, env var, optToSettings mapping, Settings struct with zeroAwarePaths entry, web UI form input + read-only display row, e2e UI test, and README.

also adds a CLAUDE.md project rule documenting the wiring-layers checklist for new config parameters in general. It codifies what was a recurring source of partially-broken features: one layer getting missed.

Related to #399

umputun added 4 commits April 29, 2026 15:12
Codifies the requirement that any new configurable parameter (CLI flag, env
var, or tunable knob) must be wired through CLI/env definition, DB-backed
config, web settings UI, e2e UI test, and README. Recurring source of
partially-broken features when one layer is missed.
Adds an opt-in spam check that bans an unapproved user when their locator
message count minus approved count crosses --max-short-msg-count, and the
current message is short. Closes the gap left by content-based checks for
spammers probing the channel with innocuous one-word messages ("hi",
"hello", "yo") that individually evade similarity, classifier, LLM, and
duplicate detectors.

State is derived from the existing messages table via the
idx_messages_gid_user_id_time index — no new in-memory counter, no cache,
restart-resilient by construction. The check is implemented as an internal
method on Detector (isShortMsgFlood), gated by an outer guard so disabled
mode does no work, and bypasses LLM consensus by design (behavioral signal
not subject to content-based override).

Wired through all required layers per CLAUDE.md: CLI flag and env var
(--max-short-msg-count / MAX_SHORT_MSG_COUNT), DB-backed Settings with
zero-aware semantics, web settings UI form field plus read-only display,
e2e UI round-trip test, and README documentation. Respects training, dry,
and soft-ban modes via the existing pipeline. Disabled by default
(MaxShortMsgCount=0) and incompatible with paranoid mode (validated at
startup).

Related to #399
Captures the design decisions, wiring layers, edge cases, and task
breakdown for the short-message flood detection feature. Filed under
completed/ since the implementation has landed.

Related to #399
Two refinements from post-implementation review:

- README.md: drop the trigger-formula code block, locator-derivation
  paragraph, separate "When triggered" paragraph, dedicated "Rollout
  note", and split-out "Naturally terse legitimate users" paragraph.
  The section now matches the brevity of neighboring checker sections
  (single intro, one Important callout, Configure with). Implementation
  detail belongs in CLAUDE.md (already there).

- lib/tgspam/detector.go: add a comment on the approved-user fast-bail
  in isShortMsgFlood explaining why it exists. In production, main.go
  forces FirstMessageOnly=true whenever FirstMessagesCount > 0, so
  approved users short-circuit at the pre-approved branch in Check and
  never reach this method. The guard is defensive against library
  consumers that construct Detector with FirstMessageOnly=false and
  FirstMessagesCount > 0.

Related to #399
Copilot AI review requested due to automatic review settings April 29, 2026 20:50

Copilot AI (Contributor) left a comment

Pull request overview

Adds an opt-in “short-message flood” behavioral check to the spam detector to catch unapproved users who send many very short messages that individually evade content-based and duplicate detection. The implementation is stateless (derives counts from the locator DB) and is fully wired through CLI/settings, web UI, and tests.

Changes:

  • Introduce MaxShortMsgCount config/setting and a MessageCounter dependency to support short-message flood detection in Detector.Check.
  • Extend *storage.Locator with per-user count + message ID retrieval to back the detector’s new check.
  • Wire the new setting end-to-end (CLI/env → settings → web UI → e2e) and add unit + integration coverage.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated no comments.

Summary per file:

  • lib/tgspam/detector.go: Adds MaxShortMsgCount, MessageCounter, wiring setter, and the isShortMsgFlood check integrated into Detector.Check.
  • lib/tgspam/detector_test.go: Comprehensive unit tests covering enable/disable guards, threshold behavior, ID filtering, and error paths.
  • lib/tgspam/mocks/message_counter.go: Generated mock for the new MessageCounter interface used by tests.
  • app/storage/locator.go: Implements CountUserMessages and UserMessageIDs plus adds locking to GetUserMessageIDs.
  • app/storage/locator_test.go: Adds tests for the new locator count/IDs methods and gid isolation.
  • app/main.go: Adds CLI flag, startup validation, wires locator into detector via WithMessageCounter, and plumbs config into detector config.
  • app/main_test.go: Adds validation and detector-config wiring tests for MaxShortMsgCount.
  • app/settings.go: Maps CLI option MaxShortMsgCount into persisted settings.
  • app/settings_test.go: Verifies optToSettings mapping for MaxShortMsgCount.
  • app/config/settings.go: Adds MaxShortMsgCount to settings struct and to zeroAwarePaths.
  • app/config/settings_test.go: Tests zero-aware default merge behavior and JSON/YAML round-trip for the new setting.
  • app/webapi/config.go: Parses maxShortMsgCount from the settings form submission.
  • app/webapi/config_test.go: Tests web form parsing behavior (numeric/empty/non-numeric) for maxShortMsgCount.
  • app/webapi/assets/settings.html: Adds the new editable input and read-only display row for MaxShortMsgCount.
  • e2e-ui/e2e_test.go: Adds Playwright e2e settings round-trip test for MaxShortMsgCount.
  • app/events/listener_test.go: Integration test ensuring short-msg-flood propagates ExtraDeleteIDs and drives deletion/ban behavior.
  • README.md: Documents the new feature and adds the option row matching --help output.
  • CLAUDE.md: Adds a project rule/checklist for wiring new configurable parameters across all layers.
  • docs/plans/completed/20260429-max-short-msg-count.md: Completed implementation plan documenting design decisions, wiring, and testing strategy.

umputun merged commit 116d28f into master on Apr 29, 2026 (10 checks passed)
umputun deleted the feature/max-short-msg-count-399 branch on April 29, 2026 at 21:24