feat(security): implement URL lexical analysis and anti-bot fingerprinting by RutujaSant · Pull Request #81 · Hardhat-Enterprises/smishing-backend

RutujaSant · 2026-04-04T08:55:08Z

Scope & Summary
This PR introduces two critical security enhancements to the
smishing detection pipeline: a lexical analysis engine for
URLs and an infrastructure-level fingerprinting system for bot
defense.

URL Lexical Entropy & Path Analysis:
- Shannon Entropy: Calculates randomness in URL paths to
  identify algorithmically generated strings, such as those
  produced by domain generation algorithms (DGAs).
- Path Depth & Extensions: Analyzes subdirectory counts
  and flags "unnatural" extensions (e.g., .php, .exe).
- TLD Risk Table: Assigns risk scores to suspicious
  Top-Level Domains (e.g., .top, .link, .xyz).
- Risk Scoring: Consolidates these features into a 0-100
  riskScore returned in the /api/scan response.
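
The features listed above could be combined roughly as follows. This is a minimal sketch, not the contents of src/utils/entropy.js: the function names (`shannonEntropy`, `analyzeUrl`), the TLD weights, the entropy threshold of 3.0, and the per-feature points are all assumptions chosen for illustration.

```javascript
// Illustrative sketch only. All names, weights, and thresholds below are
// assumptions, not values taken from this PR.

// Shannon entropy in bits per character: H = -sum(p * log2(p)).
function shannonEntropy(text) {
  const chars = Array.from(text);
  if (chars.length === 0) return 0;
  const counts = new Map();
  for (const ch of chars) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  let entropy = 0;
  for (const count of counts.values()) {
    const p = count / chars.length;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}

// Hypothetical risk tables.
const TLD_RISK = new Map([["top", 30], ["link", 20], ["xyz", 20]]);
const RISKY_EXTENSIONS = new Set(["php", "exe"]);

function analyzeUrl(rawUrl) {
  // Prepend a scheme when missing so the URL parser accepts "bit.ly/abc".
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(rawUrl);
  const url = new URL(hasScheme ? rawUrl : `http://${rawUrl}`);

  const segments = url.pathname.split("/").filter(Boolean);
  const lastSegment = segments[segments.length - 1] ?? "";
  const dot = lastSegment.lastIndexOf(".");
  const extension = dot > 0 ? lastSegment.slice(dot + 1).toLowerCase() : "";
  const tld = url.hostname.split(".").pop().toLowerCase();
  const pathEntropy = shannonEntropy(segments.join(""));

  // Add up the assumed per-feature points, then cap at 100.
  let riskScore = 0;
  if (pathEntropy > 3.0) riskScore += 40;
  if (segments.length > 3) riskScore += 10;
  if (RISKY_EXTENSIONS.has(extension)) riskScore += 30;
  riskScore += TLD_RISK.get(tld) ?? 0;

  return {
    pathEntropy,
    pathDepth: segments.length,
    extension,
    tld,
    riskScore: Math.min(100, riskScore),
  };
}
```

Calling `analyzeUrl("bit.ly/x72K9aLpq1")` returns an object whose `riskScore` can be attached to the /api/scan response. Capping with `Math.min` keeps the score in the 0-100 range even when several features fire at once.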

Anti-Bot Fingerprinting:
- Device Hashing: Generates a fingerprint by hashing
  stable HTTP headers (User-Agent, Accept-Language,
  Sec-Ch-Ua, Accept-Encoding).
- Sliding Window Tracker: Monitors "malicious scans"
  (smishing detections) per fingerprint within a
  15-minute window.
- Shadow Banning: Automatically responds with 429 Too Many
  Requests once a client exceeds 10 malicious scans in that
  window, making it harder for attackers to probe the API.
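
A minimal sketch of the fingerprinting flow described above, assuming an Express-style `(req, res, next)` middleware and an in-memory store. The names (`computeFingerprint`, `recordMaliciousScan`, `isBlocked`, `fingerprintGuard`), the choice of SHA-256, and the decision to block once 10 scans are recorded are assumptions for illustration; the actual src/middlewares/fingerprint.middleware.js may differ.

```javascript
const crypto = require("node:crypto");

// Window and limit as stated in the PR description; everything else here
// is a hypothetical illustration.
const WINDOW_MS = 15 * 60 * 1000;
const MAX_MALICIOUS_SCANS = 10;
const FINGERPRINT_HEADERS = [
  "user-agent",
  "accept-language",
  "sec-ch-ua",
  "accept-encoding",
];

// Hash the selected headers into a hex fingerprint (SHA-256 is an assumption).
function computeFingerprint(headers) {
  const material = FINGERPRINT_HEADERS
    .map((name) => `${name}=${headers[name] ?? ""}`)
    .join("\n");
  return crypto.createHash("sha256").update(material).digest("hex");
}

// fingerprint -> timestamps (ms) of malicious scans inside the window
const maliciousScans = new Map();

// Drop timestamps older than the window and return the ones that remain.
function recentScans(fingerprint, now) {
  const recent = (maliciousScans.get(fingerprint) ?? [])
    .filter((t) => now - t < WINDOW_MS);
  if (recent.length > 0) maliciousScans.set(fingerprint, recent);
  else maliciousScans.delete(fingerprint);
  return recent;
}

function recordMaliciousScan(fingerprint, now = Date.now()) {
  const recent = recentScans(fingerprint, now);
  recent.push(now);
  maliciousScans.set(fingerprint, recent);
}

// Assumption: the ban applies once 10 malicious scans are on record.
function isBlocked(fingerprint, now = Date.now()) {
  return recentScans(fingerprint, now).length >= MAX_MALICIOUS_SCANS;
}

// Express-style middleware: attach the fingerprint, reject banned clients.
function fingerprintGuard(req, res, next) {
  req.fingerprint = computeFingerprint(req.headers);
  if (isBlocked(req.fingerprint)) {
    return res.status(429).json({ error: "Too Many Requests" });
  }
  return next();
}
```

In this sketch the scan controller would call `recordMaliciousScan(req.fingerprint)` whenever a scan is classified as smishing. An in-memory `Map` is reset on restart and is not shared across processes, so a multi-instance deployment would need a shared store instead.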

Deliverables

- src/utils/entropy.js: Core lexical analysis logic.
- src/middlewares/fingerprint.middleware.js: Client
  identification and rate-limiting logic.
- Updated src/services/detections.service.js: Integrated URL
  analysis.
- Updated src/controllers/scan.controller.js: Enhanced scan
  response and tracking.
- Updated src/routes/scan.route.js: Applied fingerprinting
  protection.

Testing & Verification

- Verified that google.com (Low Entropy) passes while
  bit.ly/x72K9aLpq1 (High Entropy) is flagged.
- Confirmed that .exe extensions and risky TLDs significantly
  increase the riskScore.
- Verified that the same browser headers maintain a
  consistent fingerprint.
- Confirmed that a "Shadow Ban" (Status 429) triggers
  successfully after 10 simulated malicious scans from the
  same fingerprint.
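
As a worked check of the first test case, assuming the entropy is computed per character over the URL path only (the cutoff that separates "low" from "high" is not stated in this PR): the path of bit.ly/x72K9aLpq1 has 10 characters, all distinct, while google.com has an empty path.

```latex
H = -\sum_{i} p_i \log_2 p_i
\qquad \text{where } p_i \text{ is the relative frequency of character } i

\text{Path } \texttt{x72K9aLpq1}: \quad p_i = \tfrac{1}{10} \text{ for each of 10 characters}
\;\Rightarrow\; H = -10 \cdot \tfrac{1}{10} \log_2 \tfrac{1}{10} = \log_2 10 \approx 3.32 \text{ bits}

\text{Empty path (google.com)}: \quad H = 0 \text{ by convention}
```

A string of n characters reaches its maximum entropy, log2(n), only when every character is different, which is why short random tokens score high relative to their length.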

- feat(utils): add URL lexical entropy and path analysis engine
  Implemented Shannon entropy calculation and path depth/extension
  analysis for URL risk detection.
- feat(services): integrate URL analysis into detection service
  Added analyzeMessageUrls to consolidate risk scores from lexical
  URL features.
- feat(middleware): add infrastructure-level client fingerprinting
  Implemented header-based fingerprinting and sliding-window
  malicious scan tracking for anti-bot defense.
- feat(scan): enhance scan API with URL analysis and anti-bot protection
  Integrated URL risk reporting and fingerprint
  tracking/shadow-banning into the scan controller.

RutujaSant wants to merge 1 commit into Hardhat-Enterprises:dev from RutujaSant:dev
