feat(security): implement URL lexical analysis and anti-bot fingerprinting #81

Open
RutujaSant wants to merge 1 commit into Hardhat-Enterprises:dev from RutujaSant:dev

Scope & Summary
This PR adds two security features to the smishing detection
pipeline: a lexical analysis engine for URLs and a header-based
client fingerprinting system for bot defense.

  1. URL Lexical Entropy & Path Analysis:

    • Shannon Entropy: Measures the randomness of URL paths to
      flag algorithmically generated strings (similar to the
      output of a domain generation algorithm, DGA).
    • Path Depth & Extensions: Counts subdirectories in the
      path and flags "unnatural" extensions (e.g., .php, .exe).
    • TLD Risk Table: Assigns risk scores to suspicious
      Top-Level Domains (e.g., .top, .link, .xyz).
    • Risk Scoring: Combines these features into a single
      0-100 riskScore returned in the /api/scan response.
  2. Anti-Bot Fingerprinting:

    • Device Hashing: Derives a client fingerprint by hashing
      stable HTTP headers (User-Agent, Accept-Language,
      Sec-Ch-Ua, Accept-Encoding).
    • Sliding Window Tracker: Counts "malicious scans"
      (smishing detections) per fingerprint within a
      15-minute window.
    • Shadow Banning: Automatically responds with 429 Too Many
      Requests once a client exceeds 10 malicious scans in
      that window, making it harder for attackers to probe
      the API.
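
The lexical features listed under item 1 can be sketched roughly as follows. This is a minimal illustration, not the code in src/utils/entropy.js: the function names (shannonEntropy, analyzeUrl), the TLD scores, and every weight below are assumptions chosen only to show how entropy, path depth, extension, and TLD signals might be folded into a 0-100 riskScore.

```javascript
// Hypothetical sketch: names, weights, and TLD scores are illustrative assumptions.

// Shannon entropy in bits per character: H = -sum(p * log2(p)).
function shannonEntropy(text) {
  const chars = Array.from(text);
  if (chars.length === 0) return 0;
  const counts = new Map();
  for (const ch of chars) counts.set(ch, (counts.get(ch) || 0) + 1);
  let entropy = 0;
  for (const n of counts.values()) {
    const p = n / chars.length;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}

// Assumed risk tables; the PR names these TLDs and extensions but not their scores.
const RISKY_TLDS = new Map([['top', 30], ['link', 20], ['xyz', 25]]);
const RISKY_EXTENSIONS = new Set(['.php', '.exe']);

function analyzeUrl(rawUrl) {
  // URLs in SMS text often lack a scheme, so add one before parsing.
  const url = new URL(rawUrl.includes('://') ? rawUrl : `http://${rawUrl}`);
  const segments = url.pathname.split('/').filter(Boolean);
  const pathEntropy = shannonEntropy(segments.join(''));
  const tld = url.hostname.split('.').pop();

  const lastSegment = segments[segments.length - 1] || '';
  const dot = lastSegment.lastIndexOf('.');
  const extension = dot >= 0 ? lastSegment.slice(dot).toLowerCase() : '';

  let score = 0;
  score += Math.min(40, pathEntropy * 10);      // random-looking paths
  score += Math.min(15, segments.length * 3);   // deep paths
  if (RISKY_EXTENSIONS.has(extension)) score += 25;
  score += RISKY_TLDS.get(tld) || 0;

  return {
    pathEntropy,
    pathDepth: segments.length,
    extension,
    tld,
    riskScore: Math.min(100, Math.round(score)),
  };
}
```

With these assumed weights, a bare domain such as google.com has an empty path and scores 0, while a short-link path made of ten distinct characters scores noticeably higher. A real implementation would need tuned thresholds to avoid flagging legitimate link shorteners.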

Deliverables

  • src/utils/entropy.js: Core lexical analysis logic.
  • src/middlewares/fingerprint.middleware.js: Client
    identification and rate-limiting logic.
  • Updated src/services/detections.service.js: Integrated URL
    analysis.
  • Updated src/controllers/scan.controller.js: Enhanced scan
    response and tracking.
  • Updated src/routes/scan.route.js: Applied fingerprinting
    protection.

Testing & Verification

  • Verified that google.com (Low Entropy) passes while
    bit.ly/x72K9aLpq1 (High Entropy) is flagged.
  • Confirmed that .exe extensions and risky TLDs significantly
    increase the riskScore.
  • Verified that the same browser headers maintain a
    consistent fingerprint.
  • Confirmed that a "Shadow Ban" (Status 429) triggers
    successfully after 10 simulated malicious scans from the
    same fingerprint.
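
The fingerprint and shadow-ban checks above could be reproduced with a sketch like the one below. It is not the contents of src/middlewares/fingerprint.middleware.js: the names (fingerprintFromHeaders, MaliciousScanTracker, fingerprintGuard), the choice of SHA-256, and the in-memory Map store are all assumptions. The description says a client is blocked once it "exceeds 10" scans while the test triggers "after 10"; this sketch assumes the block starts once 10 scans are recorded.

```javascript
// Hypothetical sketch: names, hash choice, and in-memory storage are assumptions.
const crypto = require('node:crypto');

const FINGERPRINT_HEADERS = ['user-agent', 'accept-language', 'sec-ch-ua', 'accept-encoding'];
const WINDOW_MS = 15 * 60 * 1000; // 15-minute sliding window
const MAX_MALICIOUS_SCANS = 10;   // assumed to block at the 10th recorded scan

// Hash the selected headers (Node lowercases incoming header names).
function fingerprintFromHeaders(headers) {
  const material = FINGERPRINT_HEADERS
    .map((name) => `${name}=${headers[name] || ''}`)
    .join('\n');
  return crypto.createHash('sha256').update(material).digest('hex');
}

class MaliciousScanTracker {
  constructor(windowMs = WINDOW_MS, limit = MAX_MALICIOUS_SCANS) {
    this.windowMs = windowMs;
    this.limit = limit;
    this.hits = new Map(); // fingerprint -> array of timestamps (ms)
  }

  // Drop timestamps that fell out of the window and return the rest.
  recent(fingerprint, now) {
    const cutoff = now - this.windowMs;
    const kept = (this.hits.get(fingerprint) || []).filter((t) => t > cutoff);
    if (kept.length > 0) this.hits.set(fingerprint, kept);
    else this.hits.delete(fingerprint);
    return kept;
  }

  recordMalicious(fingerprint, now = Date.now()) {
    const kept = this.recent(fingerprint, now);
    kept.push(now);
    this.hits.set(fingerprint, kept);
  }

  isBlocked(fingerprint, now = Date.now()) {
    return this.recent(fingerprint, now).length >= this.limit;
  }
}

// Express-style middleware: attach the fingerprint, reject blocked clients with 429.
function fingerprintGuard(tracker) {
  return (req, res, next) => {
    req.fingerprint = fingerprintFromHeaders(req.headers);
    if (tracker.isBlocked(req.fingerprint)) {
      return res.status(429).json({ error: 'Too Many Requests' });
    }
    return next();
  };
}
```

In this sketch the scan controller would call tracker.recordMalicious(req.fingerprint) whenever a scan is classified as smishing. Header-based fingerprints are shared by any clients that send identical headers, and an in-memory Map is lost on restart and not shared across processes, so both are simplifications.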

- feat(utils): add URL lexical entropy and path analysis engine
Implemented Shannon entropy calculation and path depth/extension analysis for URL risk detection.

- feat(services): integrate URL analysis into detection service
Added analyzeMessageUrls to consolidate risk scores from lexical URL features.

- feat(middleware): add infrastructure-level client fingerprinting
Implemented header-based fingerprinting and sliding-window malicious scan tracking for anti-bot defense.

- feat(scan): enhance scan API with URL analysis and anti-bot protection
Integrated URL risk reporting and fingerprint tracking/shadow-banning into the scan controller.