Skip to content

Conversation

@stmh
Copy link
Member

@stmh stmh commented Sep 11, 2025

No description provided.

stmh added 30 commits August 16, 2025 18:36
Add comprehensive OAuth2/OIDC authentication support using oauth2-proxy
and GitLab as identity provider, alongside existing bearer token auth.

Core Changes:
- Implement hybrid authentication system supporting dev/oauth/bearer modes
- Add OAuth2-proxy integration with Traefik reverse proxy setup
- Update frontend to detect and adapt to different authentication modes
- Extend API middleware to handle OAuth headers and session management

New Features:
- Development mode: No authentication for local development
- OAuth mode: GitLab OIDC via oauth2-proxy with cookie-based sessions
- Bearer mode: Traditional API token authentication (existing behavior)
- Complete Docker Compose development environment with profiles
- Frontend auth mode detection with appropriate UI changes

Configuration:
- Add AuthMode enum with dev/oauth/bearer variants in scotty-core
- Extend ApiServer settings with OAuth redirect URL and dev user config
- Update CurrentUser struct to include email, name, and optional access token
- Add oauth2-proxy configuration with GitLab OIDC integration

Development Setup:
- Docker Compose profiles for different authentication modes
- Helper scripts for easy local development and testing
- Complete OAuth development example in examples/oauth2-proxy/
- 1Password integration for secure secret management

Documentation:
- Comprehensive CLAUDE.md with architecture and development guidance
- OAuth testing documentation in LOCAL_OAUTH_TESTING.md
- Configuration examples for all authentication modes

Testing:
- Full OAuth flow validation with Playwright browser testing
- Traefik service discovery and routing verification
- End-to-end authentication testing across all modes

This implementation provides a production-ready OAuth authentication
system while maintaining backward compatibility with existing bearer
token authentication.
BREAKING CHANGE: API endpoints restructured with protected resources moved to /api/v1/authenticated/* namespace

Enable secure GitLab OIDC authentication for Scotty's SPA interface while maintaining CLI bearer token support. This production-ready implementation replaces manual token entry with seamless OAuth flow using oauth2-proxy v7.6.0 and Traefik ForwardAuth middleware.

Key improvements:
- Hybrid authentication modes: dev, oauth, and bearer for different use cases
- GitLab OIDC integration supporting both gitlab.com and self-hosted instances
- Redis-backed session storage eliminating 4KB cookie size limitations
- Clean API separation: public endpoints at /api/v1/ and authenticated at /api/v1/authenticated/*
- Reusable ForwardAuth middleware for protecting additional Docker applications
- Production-ready Docker Compose setup with Traefik reverse proxy

Technical implementation:
- ForwardAuth pattern validates sessions and injects user headers into backend requests
- Breaking change: moved protected API endpoints to dedicated authenticated namespace
- Frontend auto-detects authentication mode and handles OAuth redirects seamlessly
- CLI tools updated to use new authenticated API namespace
- Comprehensive example setups for both development and production OAuth scenarios

This enables self-service user authentication without manual token management while preserving existing CLI workflows and adding enterprise-grade session management.
- Add #[derive(Default)] to AuthMode enum as suggested by clippy
- Add #[allow(dead_code)] to access_token field for future OAuth implementations
- Format all code with cargo fmt

All tests passing and clippy warnings resolved.
- Replace 'any' type with proper LoginResponse interface in auth flow
- Add proper type annotations for API response handling
- Format code to match project style guidelines
The monitorTask function was calling getTask() which only checked the local
store instead of actually fetching task status from the API. Changed to use
requestTaskDetails() to properly fetch task status updates, enabling the
loading spinner animation when running/stopping apps.
- Remove automatic OAuth redirects from login page, require user button click
- Add auth_mode to /api/v1/info endpoint for cleaner auth mode detection
- Use validate-token endpoint for proper authentication checking
- Remove Traefik auto-redirect middlewares that bypass login page
- Consolidate OAuth/bearer redirect handling to both use /login route

Improves user experience by showing login page with explicit OAuth button
instead of automatic redirects that users might miss.
- Add comprehensive OAuth authentication guide with oauth2-proxy and Traefik
- Remove outdated LOCAL_OAUTH_TESTING.md file
- Update configuration.md with new auth_mode options and environment variables
- Add oauth-authentication.md to docs navigation

The new documentation covers the current OAuth implementation including:
- Three authentication modes (dev, oauth, bearer)
- Route-based protection (/api/v1/authenticated/*)
- Working examples in examples/oauth2-proxy-oauth/
- Complete setup instructions and troubleshooting
- Reduce healthcheck interval from 30s to 10s for faster feedback
- Add 15s start period to prevent failures during startup
- Reduce timeout from 3s to 2s (health endpoint responds in ~20ms)
- Add explicit retries=3 for clarity

This improves Docker Compose startup time and provides more responsive
health status feedback during development and deployment.
…tion

- Implement hybrid OAuth approach (Option C) with temporary session exchange
- Add OAuth session storage with 5-minute expiry for security
- Create `/oauth/exchange` API endpoint for frontend token retrieval
- Update OAuth callback to redirect to frontend with session ID
- Separate API and frontend OAuth callback paths to prevent conflicts
- Modify frontend to handle session exchange flow instead of direct tokens
- Add comprehensive OAuth state management and validation
- Ensure no sensitive tokens appear in URLs or browser history
- Support configurable frontend callback URLs via redirect_uri parameter

This approach provides secure OAuth authentication while maintaining
flexibility for different deployment configurations and prevents
common security issues like token exposure in URLs.
…Gravatar support

This comprehensive refactoring transforms the OAuth implementation from GitLab-specific
to fully OIDC-compliant and provider-agnostic, while adding modern UI enhancements.

**OIDC Standards Compliance:**
- Replace GitLab-specific `gitlab_url` with generic `oidc_issuer_url` configuration
- Update `GitLabUser` to `OidcUser` with OIDC standard fields (`sub`, `preferred_username`)
- Use OIDC `/oauth/userinfo` endpoint instead of provider-specific endpoints
- Support optional OIDC claims with intelligent fallbacks

**Provider Interchangeability:**
- Support GitLab, Auth0, Keycloak, Google, and other OIDC providers
- Provider-agnostic configuration and documentation
- Remove all GitLab-specific references from codebase

**Enhanced Authentication UX:**
- Add reactive user store for immediate UI updates after OAuth login
- Implement session exchange flow to avoid exposing tokens in URLs
- Fix user info display appearing only after page reload
- Add proper error handling and debugging for OAuth flows

**Modern Avatar System:**
- Add Gravatar support with MD5 hashing for user avatars
- Implement smart fallback system: Gravatar → initials → generic avatar
- Create reusable UserAvatar component with multiple sizes and shapes
- Enhanced user dropdown with professional styling and logout icon

**Updated Documentation:**
- Comprehensive OIDC authentication guide with provider examples
- Updated CLI documentation with new command structure
- Provider setup instructions for GitLab, Auth0, Keycloak, and Google
- Migration guide from oauth2-proxy to native OIDC implementation

**Frontend Improvements:**
- Reactive authentication state management with Svelte stores
- DaisyUI integration for consistent avatar styling
- Enhanced user interface with Gravatar images and rich user dropdowns
- Improved error handling and user feedback

**Backend Enhancements:**
- Better OIDC user field handling with meaningful fallbacks
- Improved session management and token validation
- Enhanced OAuth error handling and debugging capabilities

The system now works seamlessly with any OIDC-compliant provider while providing
a modern, professional user experience with Gravatar support and reactive UI updates.
Complete end-to-end OAuth device flow implementation enabling CLI authentication
with OIDC providers like GitLab. This adds native device flow support alongside
the existing web-based OAuth flow.

Key improvements:
- Implement full device flow token polling and exchange in Scotty server
- Add proper OIDC provider integration with device authorization grant
- Fix server info endpoint to use OIDC-compliant field names
- Resolve server URL mismatch between localhost and 127.0.0.1 in token storage
- Update scottyctl to handle device flow authentication properly
- Add comprehensive error handling for OAuth flow states

Technical changes:
- Server: Add exchange_device_code_for_token() method for GitLab token polling
- Server: Store device flow session interval for proper polling cadence
- Server: Update info handler to return oidc_issuer_url instead of gitlab_url
- scottyctl: Fix token storage to use user-provided server URL
- scottyctl: Remove placeholder user info and use actual token response data
- scottyctl: Update OAuth structures to be fully OIDC-compliant

The device flow now supports the complete OAuth 2.0 Device Authorization Grant
flow (RFC 8628) with proper error handling for authorization_pending,
access_denied, and expired_token scenarios.

Tested with GitLab OIDC provider - full authentication and API access working.
- Remove unused variables in frontend components
  - Remove imageLoaded variable from user-avatar component
  - Remove oauthRedirectUrl variable from login page
- Apply Prettier formatting across all frontend files
- Fix arrow function formatting in stores and components
- Improve code readability with consistent indentation

These changes resolve ESLint warnings and ensure consistent code style
across the frontend codebase.
Update all documentation to use correct CLI command syntax:
- Change apps subcommand to app:subcommand format
- Change blueprints list to blueprint:list
- Change notifications add/remove to notify:add/notify:remove

This aligns documentation with actual CLI implementation.
Implement complete test suite for bearer token and OAuth authentication flows:

**Bearer Authentication Tests (14 tests)**
- Valid/invalid/missing token authentication scenarios
- Malformed headers and public endpoint access validation
- Configuration-dependent behavior testing
- Cross-authentication mode validation

**OAuth Authentication Tests (8 tests)**
- Complete OAuth device flow: authorization → token → protected endpoint access
- OAuth web flow: authorization URL generation, callback handling, session exchange
- Mock OAuth provider integration with exact API format matching
- OAuth provider error handling and authorization pending states

**Key Technical Achievements**
- Tests actual Scotty application router with complete middleware stack
- Uses AppState access for OAuth session store manipulation to test complete flows
- Mock OAuth provider with wiremock exactly matches implementation request formats
- Validates end-to-end authentication: auth flow → token → protected API access
- All 22 tests validate real implementation behavior, not isolated components

The AppState approach solves OAuth web flow testing complexity by directly
populating session stores, enabling complete flow validation without complex
callback coordination between HTTP requests.

Dependencies: Add axum-test, tokio-test, wiremock for testing infrastructure.
Removed assert!(true) statements that would be optimized out by the compiler,
as flagged by clippy's assertions_on_constants lint. These assertions provided
no actual test value and were replaced with comments explaining the test flow.
- Add preflight version check using semver to ensure major/minor compatibility
- Create shared ServerInfo and OAuthConfig types in scotty_core for consistency
- Add --bypass-version-check flag for emergency situations
- Update AuthMode enum to support serialization with proper defaults
- Version check runs before commands requiring server connection
- Show clear error messages when versions are incompatible
- Skip version check for auth commands (login/logout) and completion

This prevents backwards compatibility issues by ensuring scottyctl and
scotty server versions are compatible before executing operations.
- Replace direct println! calls with app_context.ui() methods
- Use proper success/failed status methods for better terminal integration
- Reduce excessive emoji usage for cleaner, more professional output
- Maintain colored text for important information (URLs, usernames, servers)
- Ensure consistent UI behavior with status line management
This commit centralizes shared functionality in scotty-core and significantly
improves OAuth error handling with type-safe enums.

## New shared modules in scotty-core:
- Add HTTP client with retry logic and exponential backoff
- Add unified OAuth types with type-safe error enums
- Add version management utilities with compatibility checking
- Move retry logic from scottyctl to shared location

## OAuth improvements:
- Replace string literals with OAuthErrorCode enum for type safety
- Add built-in error descriptions to OAuth error codes
- Update all OAuth handlers to use type-safe error responses
- Maintain backward compatibility with legacy error formats

## HTTP client consolidation:
- Create shared HttpClient with builder pattern and timeout support
- Replace scattered reqwest::Client usage with shared implementation
- Add proper error handling and retry policies across the workspace
- Update OAuth flows to use shared HTTP client

## Version management:
- Add comprehensive version comparison and compatibility utilities
- Update preflight checker to use shared version management
- Add user-friendly version formatting and update recommendations
- Include extensive test coverage for version handling

## Code cleanup:
- Remove unused dependencies (utoipa-axum)
- Fix clippy warnings and improve code consistency
- Update imports to use shared types across workspace
- Maintain full backward compatibility

All tests pass, ensuring no regressions were introduced.
This commit consolidates the OAuth error handling across the Scotty project by:

- Unifying OAuthError and OAuthErrorCode into a single comprehensive error type in scotty-core
- Implementing smart IntoResponse for AppError that returns OAuth-compliant ErrorResponse format for OAuth errors
- Adding proper HTTP status code mappings for all OAuth error types (400, 401, 403, 404, 429)
- Fixing device flow "Server error" issue by improving scottyctl error handling to process all OAuth status codes
- Adding SlowDown variant for handling OAuth2 "slow_down" error during device flow polling
- Maintaining OAuth2 RFC 6749 compliance while simplifying the error architecture
- Updating all components (scotty, scottyctl, scotty-core) to use the unified system with proper error conversions

The device flow now properly handles polling rate limiting and provides specific error messages instead of generic "Server error" responses.
Add role-based access control (RBAC) system using Casbin for granular
permission management across apps and groups.

Key features:
- Group-based app organization (development, staging, production, default)
- Role-based permissions (viewer, operator, developer, admin)
- Per-app permission checks (view, manage, shell, logs, create, destroy)
- Universal default group access for all users via wildcard assignment
- Authorization middleware for API endpoints
- Groups list endpoint (/api/v1/authenticated/groups/list)
- Seamless fallback when authorization config unavailable

This enables secure multi-tenant app management while maintaining
backward compatibility.
- Fix token bounds checking in basic_auth.rs to prevent panics with short tokens
- Implement proper get_user_permissions method in authorization service
- Update authorization middleware to extract app names from API v1 paths
- Fix middleware ordering by using State extractor instead of request extensions
- Update router to use from_fn_with_state for require_permission middleware

These changes resolve authentication failures and "App state not found" errors
in the RBAC authorization system.
- Remove references to legacy api.access_token fallback
- Add migration instructions for existing bearer token installations
- Update PRD to reflect completed Phase 4 enforcement
- Document middleware architecture improvements
- Add warnings about breaking changes for bearer token authentication

The authorization system now requires explicit RBAC assignments for all
bearer tokens, removing the legacy fallback behavior.
Updates scottyctl to check server auth mode before using stored OAuth tokens.
When server is in bearer mode, prioritizes --access-token parameter and
SCOTTY_ACCESS_TOKEN environment variable over stored OAuth credentials.

Resolves authentication failures where scottyctl would attempt to use
invalid OAuth tokens against servers configured for bearer authentication.
- Remove fallback authorization service - RBAC is now required
- Update tests to use actual RBAC configuration instead of fallback
- Add test bearer token to policy configuration
- Remove obsolete test for no-token configuration
- Improve log format with timestamp, level, target, and message
- Clean up telemetry configuration and reduce verbose span output
stmh added 2 commits October 27, 2025 13:45
…-token-comparison

fix(security): use constant-time comparison for bearer token validation
…se URL

The OAuth callback handler used a hardcoded localhost URL for redirecting
users back to the frontend after authentication. This prevented OAuth from
working in production environments accessed via different hostnames.

Changes:
- Added 'frontend_base_url' field to OAuthSettings configuration
- Defaults to 'http://localhost:21342' for backward compatibility
- Updated OAuth callback handler to use configured URL instead of hardcoded value
- Maintains support for per-flow frontend_callback_url override

The frontend_base_url can now be configured in the settings file:
  [api.oauth]
  frontend_base_url = "https://scotty.example.com"

Fixes: #517
Relates to: scotty-36
@claude
Copy link

claude bot commented Oct 27, 2025

Code Review - PR #467 (v0.2.0)

Thank you for this comprehensive PR introducing authentication and authorization to Scotty! This is a significant enhancement that adds OAuth 2.0, bearer token auth, and RBAC via Casbin. The implementation shows solid engineering practices, but there are critical security issues that must be addressed before merging.

🔴 Critical Issues (Must Fix)

1. Plaintext Secrets in Version Control

Files: config/default.yaml, config/local.yaml

Bearer tokens and OAuth client secrets are stored as plaintext in configuration files:

bearer_tokens:
    admin: "gFW5k1fdvYw8iB2xCxw5qZXj5pkP9dga"
    client-a: "j3Xq973L67JVAsQpU4PfZAMKdMsfdXsu"

Risk: Anyone with repository access can see these secrets.

Recommendation:

  • Remove all plaintext secrets from config files
  • Require environment variables for production (e.g., SCOTTY__API__BEARER_TOKENS__ADMIN)
  • Add startup validation that fails if secrets aren't provided via env vars
  • Consider storing only hashed tokens in configs

2. No Token Expiration Mechanism

Files: Throughout auth implementation

Bearer tokens have no expiration - once issued, they're valid indefinitely.

Risk: Stolen tokens can be used perpetually.

Recommendation:

  • Implement JWT tokens with expiration claims
  • Add token refresh mechanism
  • Implement token revocation API

3. Insecure Authorization Fallback

File: scotty/src/app_state.rs:97-109

If Casbin config fails to load, the system falls back to a permissive default:

Err(e) => {
    warn!("Failed to load authorization config...");
    FallbackService::create_fallback_service(settings.api.access_token.clone()).await
}

Risk: Configuration errors could grant unintended access.

Recommendation: Fail closed - refuse to start if authorization can't be loaded.

4. Container Running as Root

File: Dockerfile:78-79

User switching is commented out:

# RUN chown -R $APP_USER:$APP_USER ${APP}
# USER $APP_USER

Risk: Container breakout vulnerabilities.

Recommendation: Uncomment and fix permission issues properly, never run as root.

🟡 High Priority Issues (Should Fix)

5. Missing Input Validation on Admin Endpoints

Files: scotty/src/api/rest/handlers/admin/*.rs

Limited validation on user IDs, role names, scope names could allow injection attacks or invalid data.

Recommendation: Add comprehensive input validation:

fn validate_identifier(s: &str) -> Result<(), &'static str> {
    if s.is_empty() || s.len() > 128 {
        return Err("Invalid length");
    }
    if !s.chars().all(|c| c.is_alphanumeric() || c == '-' || c == '_') {
        return Err("Invalid characters");
    }
    Ok(())
}

6. In-Memory Session Storage Without Cleanup

File: scotty/src/oauth/handlers.rs

OAuth sessions stored in HashMap with no cleanup mechanism.

Risk: Memory leak over time.

Recommendation:

  • Implement background cleanup task for expired sessions
  • Consider Redis for production deployments
  • Use tokio::sync::RwLock instead of std::sync::Mutex

7. Limited Security Test Coverage

Files: Test files throughout

Found basic tests but missing critical security scenarios:

  • No tests for token timing attacks
  • No tests for session expiration
  • No tests for CSRF bypass attempts
  • No tests for authorization bypass attempts

Recommendation: Add comprehensive security test suite.

8. Missing Security Scanning in CI

File: .github/workflows/ci.yml

CI runs tests, fmt, clippy but no security scans.

Recommendation: Add:

- name: Security audit
  run: cargo audit
- name: Dependency checks
  run: cargo deny check

🟢 Positive Observations

Excellent work on several security aspects:

  1. Constant-time comparison: Properly uses subtle::ConstantTimeEq for token comparison (auth_core.rs:181)
  2. Secret masking: Excellent MaskedSecret implementation with partial display
  3. Type safety: Good use of Rust's type system for permissions and errors
  4. OAuth 2.0 compliance: Proper implementation of device flow and web flow
  5. RBAC structure: Well-designed authorization with Casbin

📊 Additional Recommendations

Medium Priority:

  • Implement permission caching for performance (check_permission called on every request)
  • Extract duplicate token validation logic into shared function
  • Add rate limiting on OAuth endpoints
  • Standardize error handling (mix of anyhow::Result and Result<T, AppError>)

Low Priority:

  • Add audit logging for all admin actions
  • Implement security headers middleware
  • Document the Casbin permission model thoroughly
  • Add permission visualization tooling

🎯 Overall Assessment

Risk Level: HIGH

This PR implements valuable functionality with generally good code quality, but critical security issues prevent merging in current state. The secrets management and token expiration issues are particularly concerning for production use.

Recommendation: Address all critical issues before merge. The authentication system should be secure by default, with production-ready secret management.

📝 Testing Recommendations

Before merging, ensure:

  • All secrets removed from config files
  • Token expiration implemented and tested
  • Authorization fails closed when config missing
  • Container runs as non-root user
  • Input validation added with tests
  • Security test suite expanded
  • CI includes cargo audit
  • Documentation updated with security best practices

Overall: Great foundation! The architecture is solid and the OAuth implementation is well-structured. Once the security issues are addressed, this will be a strong addition to Scotty.

Let me know if you'd like me to elaborate on any of these points or provide code examples for the fixes.

stmh added 4 commits October 27, 2025 13:52
Replaces standard string equality with constant-time comparison using
the subtle crate to prevent timing attacks that could reveal valid
bearer tokens through response time measurements.

The vulnerable code used simple == comparison which short-circuits on
the first character mismatch, allowing attackers to determine correct
characters through timing analysis. This fix ensures all token
comparisons take the same amount of time regardless of match position.

Fixes: #520
Relates to: scotty-39
The login handler was still using standard == comparison for bearer
tokens, vulnerable to the same timing attack fixed in auth_core.rs.

This completes the timing attack mitigation across all token
comparison points in the authentication system.

Addresses PR #524 review feedback
Add documentation for the new frontend_base_url configuration option
that allows OAuth to work with production domains instead of being
hardcoded to localhost.

Changes:
- Add frontend_base_url to OAuth configuration examples
- Add SCOTTY__API__OAUTH__FRONTEND_BASE_URL environment variable
- Add "Understanding OAuth URLs" section explaining the difference
  between redirect_url (backend callback) and frontend_base_url
  (frontend application base)
- Include production configuration example
- Update environment variables reference table

This documents the fix for scotty-36 where OAuth callback URLs were
hardcoded to localhost, preventing OAuth from working in production
environments.
@claude
Copy link

claude bot commented Oct 27, 2025

Code Review: Version 0.2.0

This is a major release introducing OAuth authentication, RBAC authorization, unified output system, and observability. This is a comprehensive review with specific findings.


🎯 Executive Summary

Overall Assessment:APPROVE with Important Recommendations

This is a well-architected and substantial release that significantly enhances Scotty's security and operational capabilities. The implementation demonstrates solid engineering practices, good separation of concerns, and comprehensive testing (777 test occurrences across 57 files). However, there are critical security concerns that should be addressed before production deployment.

Changed: 245 files, +30,318 additions, -2,253 deletions


🔒 Critical Security Issues

1. 🔴 CRITICAL: Hardcoded Bearer Tokens in Version Control

Location: config/default.yaml:7-11

bearer_tokens:
    admin: "gFW5k1fdvYw8iB2xCxw5qZXj5pkP9dga"
    client-a: "j3Xq973L67JVAsQpU4PfZAMKdMsfdXsu"
    client-b: "mPFMijuZeEAaus94hWApEHzD8JMhrcRk"
    hello-world: "mv7UuZddtVpAM5c6tWCFejCp1vizuSlX"
    test-bearer-token-123: "9oWZsLsUzFvC01nPqB6nzZpDK6Zlv3KR"

Issue: While these appear to be example tokens, committing ANY tokens to version control is a critical security vulnerability. These could be accidentally deployed to production or leaked via forks.

Recommendation:

  • Replace all token values with placeholder strings: REPLACE_VIA_ENV_VAR
  • Add validation to reject placeholder tokens on startup
  • Document clearly in comments that real tokens MUST be set via environment variables
  • Consider scanning repository history and rotating any tokens that were previously committed

Impact: HIGH - If these are real tokens or get reused, they provide unauthorized access


2. 🟡 MEDIUM: Wildcard User Authorization

Location: config/casbin/policy.yaml:70-73

'*':
  - role: viewer
    scopes:
    - default

Issue: The wildcard '*' assignment grants viewer access to the default scope for ANY user identifier. This could grant unintended access if the authentication layer has a bypass.

Recommendation:

  • Document this behavior clearly in authorization documentation
  • Ensure this is intentional for your use case (e.g., public read-only access)
  • Consider removing this in production environments or restricting to specific authenticated users
  • Add explicit warnings in the policy file comments

Impact: MEDIUM - Potential unauthorized read access if auth is misconfigured


3. 🟡 MEDIUM: Token Storage in sessionStorage

Location: frontend/src/lib/sessionManager.ts

Positive: Unlike previous reviews suggested, this implementation uses sessionStorage (not localStorage), which is significantly more secure:

  • ✅ Tokens are cleared when the browser tab closes
  • ✅ Not shared across browser tabs
  • ✅ Limited lifetime exposure

Remaining Concerns:

  • Still vulnerable to XSS attacks (can't avoid with client-side storage)
  • No explicit token expiration handling in the storage layer

Recommendations:

  • Add Content Security Policy (CSP) headers to mitigate XSS
  • Implement token expiration validation when retrieving from storage
  • Consider adding integrity checks (e.g., HMAC) if feasible
  • Document the XSS risk in security documentation

Impact: MEDIUM - Acceptable for many use cases, but requires XSS prevention


🐛 Code Quality Issues

4. 🟡 MEDIUM: Panic-Prone Code with Multiple unwrap() Calls

Found: 20 files with .unwrap() calls (searched specifically for production code)

Example: scotty/src/settings/config.rs:179

let settings: Settings = builder.build().unwrap().try_deserialize().unwrap();

Issue: Using unwrap() in configuration loading can cause application panics instead of graceful error handling.

Recommendation:

let settings: Settings = builder.build()
    .context("Failed to build configuration")?
    .try_deserialize()
    .context("Failed to deserialize settings")?;

Files to Review:

  • scottyctl/src/commands/apps/mod.rs
  • scottyctl/src/utils/*.rs
  • scotty/src/oauth/device_flow.rs
  • scotty/src/oauth/handlers.rs

Impact: MEDIUM - Can cause crashes instead of error messages


5. 📝 LOW: Outstanding TODO Items

Found 4 TODO items in production code:

  1. scottyctl/src/api.rs:133 - Token expiration check and refresh logic
  2. scottyctl/src/commands/auth.rs:173 - Token refresh implementation
  3. scotty/src/api/rest/handlers/admin/assignments.rs:211 - Remove assignment method ⚠️
  4. scotty/src/oauth/handlers.rs:556 - JWT token generation instead of OAuth token

Recommendation:

Impact: LOW-MEDIUM - Some features incomplete but non-blocking


✅ Excellent Implementation Aspects

Architecture & Design

  • Clean separation: Authentication vs authorization properly layered
  • Flexible auth modes: Dev, OAuth, and Bearer with proper mode detection
  • RBAC with Casbin: Industry-standard authorization engine
  • Type safety: Excellent use of Rust's type system and TypeScript generation

Security Strengths

  • OAuth with PKCE: Proper authorization code flow implementation
  • Proper secret handling: Uses secrecy crate with zeroize
  • Error handling: Comprehensive error types without leaking sensitive data
  • Permission middleware: Deny-by-default authorization
  • OIDC validation: Native token validation without proxy dependencies
  • sessionStorage usage: Better than localStorage for tokens

Code Organization

  • Modular crates: scotty-core, scotty-types, ts-generator
  • Comprehensive testing: 777 test occurrences across 57 files
  • Integration tests: OAuth flows, bearer auth, WebSocket messaging
  • Documentation: PRDs, technical docs, and inline documentation

Features

  • Unified output system: Proper time synchronization for logs/output
  • Observability stack: OpenTelemetry, Grafana, 40+ metrics
  • WebSocket streaming: Real-time logs and task output
  • Multi-tenant scopes: Proper app isolation

📊 Test Coverage

Strong test coverage found:

  • 57 files with test code
  • 777 test occurrences throughout codebase
  • Dedicated test files:
    • scotty/src/api/bearer_auth_tests.rs
    • scotty/src/api/oauth_flow_tests.rs
    • scotty/src/api/rest/handlers/login_test.rs
    • scotty/src/services/authorization/tests.rs
    • Multiple integration tests for Docker services

Recommendations:

  • ✅ Backend testing is excellent
  • Consider adding frontend component tests for Svelte components
  • Add E2E tests for complete auth flows with real browsers

🚀 Performance Considerations

Good Practices Found

  • TimedBuffer batching: Reduces lock contention 20-100x (per docs)
  • Async/await: Proper use of Tokio for concurrent operations
  • WebSocket streaming: Efficient real-time data delivery

Optimization Opportunities

  • Permission caching: Consider caching user permissions per session rather than fetching on every request
  • RBAC policy loading: Cache Casbin policies in memory after initial load
  • Connection pooling: Ensure database/storage connections are pooled

Impact: LOW - Current implementation should perform well, optimizations are nice-to-haves


📋 Action Items Summary

Before Production Deployment (CRITICAL)

  1. 🔴 Replace hardcoded bearer tokens in config/default.yaml with placeholders
  2. 🔴 Review and document wildcard authorization policy
  3. 🟡 Add CSP headers to mitigate XSS risks
  4. 🟡 Replace unwrap() calls in configuration and error-prone code

Follow-Up Work (Important but Non-Blocking)

  1. 📝 Create GitHub issues for all TODO items
  2. 📝 Implement assignment removal functionality
  3. 📝 Add token refresh logic for CLI
  4. 📝 Consider permission caching for performance

Nice-to-Haves

  1. Add frontend component tests
  2. Implement E2E authentication tests
  3. Add permission caching optimization
  4. Document security threat model

🎓 Documentation Quality

Excellent documentation found:

  • docs/content/oauth-authentication.md - Comprehensive OAuth guide
  • docs/content/authorization.md - RBAC documentation
  • docs/content/observability.md - Observability setup
  • docs/prds/authorization-system.md - Design rationale
  • docs/prds/unified-output-system.md - Technical decisions
  • README.md - Updated with new features and security notes

Recommendation: Add a SECURITY.md file documenting:

  • Token management best practices
  • Environment variable requirements
  • Security considerations for different deployment scenarios

🏁 Final Recommendation

Verdict:APPROVE with conditions

This is an excellent release that represents a major milestone for Scotty. The architecture is sound, the code quality is high, and the test coverage is comprehensive.

Before merging to production:

  1. Address the critical security issue (hardcoded tokens)
  2. Review and document the wildcard authorization policy
  3. Add CSP headers for XSS protection

The identified issues are typical for a major authentication overhaul and are straightforward to fix. The benefits of this release far outweigh the issues.

Estimated remediation time: 2-4 hours for critical items

Confidence: High - This is production-ready after addressing the critical security items.


Great work on this release! The authentication and authorization system is well-designed and the observability features are impressive. 🎉

stmh added 3 commits October 27, 2025 14:30
The login handler was still using standard == comparison for bearer
tokens, vulnerable to the same timing attack fixed in auth_core.rs.

This completes the timing attack mitigation across all token
comparison points in the authentication system.

Addresses PR #524 review feedback
…se URL

The OAuth callback handler used a hardcoded localhost URL for redirecting
users back to the frontend after authentication. This prevented OAuth from
working in production environments accessed via different hostnames.

Changes:
- Added 'frontend_base_url' field to OAuthSettings configuration
- Defaults to 'http://localhost:21342' for backward compatibility
- Updated OAuth callback handler to use configured URL instead of hardcoded value
- Maintains support for per-flow frontend_callback_url override

The frontend_base_url can now be configured in the settings file:
  [api.oauth]
  frontend_base_url = "https://scotty.example.com"

Fixes: #517
Relates to: scotty-36
…k-url

fix(oauth): replace hardcoded localhost with configurable frontend base URL
@claude
Copy link

claude bot commented Oct 27, 2025

Code Review: PR #467 - Version 0.2.0

Overview

This is a substantial release introducing major new features across authentication, authorization, observability, and frontend improvements. The PR includes:

  • 246 files changed
  • 30,355 additions / 2,253 deletions
  • 100 commits
  • Three major feature areas: OAuth/OIDC authentication, Casbin-based RBAC authorization, and comprehensive observability stack

Overall Assessment

Status: ✅ APPROVE WITH RECOMMENDATIONS

This PR demonstrates high-quality engineering with well-architected systems. The authorization implementation is particularly impressive, and the observability stack is production-ready. However, there are critical security issues that should be addressed before production deployment.


🎯 Major Features Review

1. Authorization System (Casbin RBAC)

Strengths:

  • ✅ Excellent architecture using Casbin for scope-based access control
  • ✅ Clean separation of concerns with roles, scopes, and permissions
  • ✅ Deny-by-default security model
  • ✅ Comprehensive documentation (docs/content/authorization.md)
  • ✅ Proper constant-time comparison for bearer token validation (prevents timing attacks)
  • ✅ Flexible permission model: view, manage, shell, logs, create, destroy

Code Quality:

  • Well-structured modules (scotty/src/authorization/)
  • Good error handling with clear user-facing messages
  • Type-safe permission system using enums

2. OAuth/OIDC Authentication

Strengths:

  • ✅ PKCE implementation for enhanced security (SHA256)
  • ✅ Support for both device flow and web flow
  • ✅ Multi-mode authentication (dev, bearer, oauth)
  • ✅ Clean integration with frontend
  • ✅ Comprehensive documentation (docs/content/oauth-authentication.md)

⚠️ CRITICAL Security Issues:

  1. JWT Signature Validation Missing (scotty/src/oauth/device_flow.rs)

    • Currently only validates by calling userinfo endpoint
    • Does NOT verify JWT signature, expiration, audience, or issuer
    • Risk: Tokens could be forged or expired
    • Recommendation: Implement proper OIDC token validation using JWKS
    // NEEDS:
    // - JWT signature verification using provider's JWKS
    // - exp (expiration) claim validation
    // - aud (audience) claim validation
    // - iss (issuer) claim validation
  2. Token Validation Endpoint Doesn't Validate (scotty/src/rest/handlers/login.rs)

    pub async fn validate_token_handler() -> impl IntoResponse {
        // Simply returns success without actual validation!
        let json_response = serde_json::json!({"status": "success"});
        Json(json_response)
    }
    • Risk: Frontend cannot verify token validity
    • Recommendation: Implement actual token validation logic
  3. PKCE Verifier Storage (scotty/src/oauth/mod.rs)

    • Comment indicates PKCE verifier not stored securely
    • Currently in-memory HashMap without persistence
    • Risk: Lost on restart, potential race conditions
    • Recommendation: Use encrypted session storage with short TTL (15-30s)
  4. Session Storage Vulnerability (frontend/src/lib/sessionManager.ts)

    • Tokens stored in sessionStorage (vulnerable to XSS)
    • Risk: XSS attacks can steal tokens
    • Recommendation: Use httpOnly, Secure cookies with SameSite=Strict
  5. No Rate Limiting on Device Code Polling

    • Unlimited polling attempts on device flow
    • Risk: Brute force attacks on device codes
    • Recommendation: Implement exponential backoff and max retries

3. Observability Stack

Strengths:

  • Excellent implementation of OpenTelemetry with metrics and traces
  • ✅ Pre-configured Grafana dashboards for monitoring
  • ✅ Prometheus-compatible metrics (40+ metrics covering all subsystems)
  • ✅ Comprehensive documentation (docs/content/observability.md)
  • ✅ Clean architecture: Scotty → OTEL Collector → VictoriaMetrics/Jaeger → Grafana
  • ✅ Low resource usage (180-250MB total)
  • ✅ Flexible backend support (can swap to Prometheus, Thanos, etc.)

Metrics Coverage:

  • Log streaming (active streams, throughput, errors)
  • Shell sessions (active connections, timeouts)
  • WebSocket connections and message rates
  • Task execution and output streaming
  • HTTP server performance by endpoint
  • Memory usage (RSS and virtual)
  • Application fleet metrics
  • Tokio async runtime health

This is production-ready and well-documented.

4. Frontend Improvements

Strengths:

  • ✅ New unified output component for logs and task output
  • ✅ WebSocket store with automatic reconnection
  • ✅ Permission-based UI rendering
  • ✅ User avatar with Gravatar support
  • ✅ Clean session management abstraction
  • ✅ OAuth callback handling

Code Quality:

  • TypeScript types are well-defined
  • Good separation of concerns (stores, services, components)
  • Error handling implemented

⚠️ Security Concerns:

  • Token storage in sessionStorage (see OAuth section above)
  • Consider implementing Content Security Policy (CSP) headers

🔒 Security Analysis

Critical Issues (Must Fix Before Production)

  1. JWT Validation - Implement proper OIDC token signature and claims validation
  2. Token Storage - Migrate from sessionStorage to httpOnly cookies
  3. Token Validation Handler - Implement actual validation logic
  4. PKCE Storage - Secure encrypted storage for PKCE verifier

High Priority

  1. Rate Limiting - Add rate limiting for device code polling and login attempts
  2. CSRF Protection - Implement CSRF tokens for POST endpoints (admin operations)
  3. Token Refresh - Implement token rotation with refresh tokens

Medium Priority

  1. App Name Validation - Add whitelist validation (alphanumeric, dash, underscore)
  2. WebSocket Message Limits - Add size limits to prevent DoS
  3. Persistent Session Storage - Replace in-memory HashMap with Redis for production

Positive Security Findings ✅

  • Excellent secret handling: MaskedSecret with zero-on-drop is production-ready
  • Constant-time comparisons: Prevents timing attacks on bearer tokens
  • Comprehensive logging: Good audit trail for security events
  • Type safety: Rust prevents many vulnerability classes
  • No SQL injection risk: Uses typed APIs, no raw SQL
  • No command injection risk: Uses Bollard SDK properly

📊 Code Quality Assessment

Rust Backend

Strengths:

  • Clean module organization
  • Good error handling with context
  • Type-safe APIs throughout
  • Comprehensive test coverage appears present
  • Clear separation of concerns

Minor Issues:

  • Some TODO comments should be addressed (scotty/src/oauth/mod.rs)
  • Consider adding more inline documentation for complex authorization logic

Frontend (TypeScript/Svelte)

Strengths:

  • Good use of Svelte stores for state management
  • Clean component structure
  • Type definitions are comprehensive

Suggestions:

  • Consider adding unit tests for auth logic
  • Add JSDoc comments for complex functions
  • Implement stricter TypeScript config (strict: true)

🧪 Testing & Quality

Observations:

  • Test files not visible in PR (external to diff)
  • Should verify test coverage for:
    • Authorization permission checks
    • OAuth flow edge cases (expired codes, invalid state)
    • Bearer token validation
    • Scope membership and wildcards

Recommendations:

  • Add integration tests for complete OAuth flows
  • Add tests for authorization denial scenarios
  • Test concurrent WebSocket connections
  • Test PKCE challenge/verifier pairs

📝 Documentation

Strengths:

  • ✅ Excellent documentation for all major features
  • ✅ Clear examples in authorization.md and oauth-authentication.md
  • ✅ Comprehensive observability setup guide
  • ✅ Migration guides for existing installations
  • ✅ Security best practices documented

Suggestions:

  • Add security considerations section to each doc
  • Document the critical security issues noted above
  • Add troubleshooting section for common OAuth errors
  • Include example curl commands for testing authorization

🚀 Performance Considerations

Positive:

  • Casbin enforcer with caching for permission checks
  • Async/await throughout (non-blocking)
  • WebSocket for real-time updates (efficient)
  • VictoriaMetrics for low-resource metrics storage

Potential Concerns:

  • Bearer token lookup is O(n) - consider HashMap for O(1)
  • In-memory session storage doesn't scale horizontally
  • WebSocket store could accumulate memory without cleanup

Recommendations:

  • Add metrics for authorization check latency
  • Implement session cleanup scheduler
  • Consider connection limits for WebSockets

🔄 Breaking Changes

⚠️ Important for Existing Users:

  1. Bearer Token Authentication Changed

    • Old: api.access_token in config
    • New: Must add tokens to authorization assignments
    • Action Required: Migration documented in authorization.md
  2. Authorization Required

    • All apps assigned to scopes (default scope if not specified)
    • Users/tokens need explicit role assignments
    • Action Required: Create initial policy.yaml
  3. OAuth Mode is New

    • Separate from previous oauth2-proxy setup
    • Action Required: Update redirect URLs if migrating

These are well-documented in the migration sections.


📋 Recommendations Summary

Before Merging

  • Add TODO tracking for critical security issues
  • Update README with security warnings for OAuth
  • Add SECURITY.md documenting known issues

Before Production Deployment

  • Fix JWT validation (CRITICAL)
  • Fix token validation handler (CRITICAL)
  • Implement httpOnly cookie storage (HIGH)
  • Add CSRF protection (HIGH)
  • Secure PKCE storage (CRITICAL)

Next Release (0.2.1)

  • Add rate limiting
  • Implement token refresh
  • Add input validation
  • Switch to persistent session storage
  • Add integration tests

🎉 Highlights

What's Really Well Done:

  1. Secret Handling - The MaskedSecret implementation is excellent and shows security awareness
  2. Authorization Architecture - Casbin integration is clean and powerful
  3. Observability Stack - Production-ready and comprehensive
  4. Documentation - Some of the best documentation I've seen in a PR
  5. Code Organization - Clean module structure and separation of concerns

Final Recommendation

APPROVE with conditions:

This is high-quality work that significantly enhances Scotty's capabilities. The authorization system is well-architected, and the observability stack is production-ready. However, the OAuth/OIDC implementation has critical security gaps that must be addressed.

Recommendation:

  • ✅ Merge to main if this is a development branch
  • ⚠️ DO NOT deploy to production until critical security issues are resolved
  • 📝 Create issues for all security items listed above
  • 🔒 Consider a security audit before production OAuth deployment

Confidence Level: High - The foundations are solid, but security hardening is essential.


Detailed Issue Tracker

Would you like me to create separate GitHub issues for each security concern? I can draft them with:

  • Detailed descriptions
  • Code references
  • Proposed solutions
  • Priority labels

Great work on this release! The overall direction is excellent. 🚀


Review generated by Claude Code based on PR diff, documentation, and codebase analysis.

stmh added 3 commits October 27, 2025 18:17
Add automated session cleanup to prevent memory leaks and implement
full observability for OAuth authentication flows.

Session Cleanup (scotty-37):
- Add periodic cleanup task running every 5 minutes via clokwerk scheduler
- Create generic ExpirableSession trait for all session types
- Clean DeviceFlowStore, WebFlowStore, and OAuthSessionStore
- Prevent unbounded memory growth and DoS attacks
- Include comprehensive unit tests for cleanup functionality

Metrics Implementation (scotty-41):
- Add 10 new OAuth metrics to instruments.rs:
  * Session gauges: device_flow, web_flow, oauth sessions (sampled every 30s)
  * Flow counters: device flows, web flows, flow failures
  * Token validation: total, duration histogram, failures
  * Cleanup counter: expired sessions cleaned
- Create dedicated oauth/metrics.rs module with clean API
- Instrument device_flow.rs and handlers.rs with minimal code changes
- Record token validation timing and success/failure rates

Grafana Dashboard:
- Add "OAuth Authentication" section with 9 panels
- Session monitoring: 4 gauge panels for active sessions
- Flow rates: device/web flow rates with failure tracking
- Performance: token validation duration (p50/p95/p99)
- Error rates: validation and flow error percentages
- Color-coded thresholds and stacked time series

Benefits:
- Fixes memory leak vulnerability (sessions never expired)
- Prevents DoS attacks via session exhaustion
- Full observability for authentication health and performance
- Enables proactive monitoring and alerting
- Minimal impact on business logic (single-line instrumentation)

Relates to: scotty-37, scotty-41
…and-metrics

feat(oauth): implement session cleanup and comprehensive monitoring
@claude
Copy link

claude bot commented Oct 27, 2025

PR Review: Version 0.2.0

Overview

This is a substantial release adding three major features: OAuth authentication, RBAC authorization with Casbin, and observability with OpenTelemetry. The PR includes 31,475 additions across 100+ files.

Overall Assessment: The implementation is well-structured with good separation of concerns. However, the PR size makes thorough review challenging. Future releases should consider breaking features into smaller, reviewable chunks.


🎯 Positive Highlights

Architecture & Design

  • Excellent separation of concerns in the frontend authentication layer (authService.ts, sessionManager.ts, sessionStore.ts)
  • Strong type safety with comprehensive TypeScript types and Rust utoipa schemas
  • Well-documented authorization system with clear documentation in docs/content/authorization.md
  • Good error handling in OAuth flow with RFC 6749-compliant error codes

Security

  • Proper token validation with server-side validation endpoints
  • Secure session management using sessionStorage (appropriate for web apps)
  • OAuth implementation follows best practices with state validation
  • Authorization model using Casbin is a solid choice for RBAC

Code Quality

  • Consistent error handling with custom error types (OAuthError)
  • Good use of Rust features: proper derive macros, type safety with secrecy crate
  • Frontend store architecture is clean and maintainable

⚠️ Issues & Concerns

Critical Issues

1. Security: Hardcoded Credentials in Policy File

Location: config/casbin/policy.yaml:63-68

identifier:test-bearer-token-123:
  - role: admin
    scopes:
      - client-a
      - client-b
      - qa
      - default
  • ⚠️ Do not commit test bearer tokens to the repository, even if they're examples
  • This should be removed or moved to a .example file
  • Document how to properly configure bearer tokens without exposing them in version control

2. Potential Token Leakage in Console Logs

Location: frontend/src/stores/webSocketStore.ts:85-86

console.log(
    'Authenticating WebSocket with token (first 8 chars):',
    token.substring(0, 8) + '...'
);
  • Even partial token logging could aid attackers
  • Consider removing in production or using proper debug flags
  • Similar issue in multiple files with debug logging

3. Missing CSRF Protection

Location: frontend/src/lib/authService.ts:49-56

The login endpoint doesn't appear to use CSRF tokens. While credentials: 'include' is set, there's no CSRF token validation visible in the frontend code. Ensure the backend has CSRF protection for state-changing operations.

High Priority Issues

4. Inconsistent Error Handling in OAuth Callback

Location: frontend/src/lib/authService.ts:104-108

const errorData: OAuthErrorResponse = await response.json();
return {
    success: false,
    error: `Authentication failed: ${errorData.error_description || errorData.error}`
};
  • Error responses might not always be JSON (network errors, 5xx errors)
  • Need try-catch around JSON parsing
  • Should handle unexpected response formats gracefully

5. WebSocket Authentication Race Condition

Location: frontend/src/stores/webSocketStore.ts:242-254

The WebSocket connects and immediately calls authenticateWebSocket() in the onopen handler. However, there's a potential race condition:

  • authenticateWebSocket() calls get(isAuthenticated)
  • The session might not be fully initialized yet
  • Consider adding explicit initialization checks or promise-based readiness

6. Docker Image Security: Running as Root

Location: Dockerfile:78-79

# RUN chown -R $APP_USER:$APP_USER ${APP}
# USER $APP_USER
  • These lines are commented out, meaning the container runs as root
  • This is a security risk - containers should run as non-root users
  • Uncomment these lines or document why root access is required

Medium Priority Issues

7. Missing Input Validation

Location: scotty-core/src/admin/requests.rs

While the request structs are well-defined, there's no visible validation for:

  • Scope name format/length (line 9)
  • Role name format/length (line 20)
  • Permission string validation (line 25)
  • User ID format validation (line 34)

Consider adding validation:

#[cfg_attr(feature = "utoipa", schema(min_length = 1, max_length = 64, pattern = "^[a-z0-9-]+$"))]
pub name: String,

8. Gravatar URL Not Using HTTPS Exclusively

Location: frontend/src/lib/gravatar.ts:26

The Gravatar URLs use https:// which is good, but consider adding a comment about why HTTP is never used (security, mixed content).

9. OAuth Token Refresh Not Implemented

Location: scotty-core/src/auth/oauth_types.rs:29

/// Optional refresh token
pub refresh_token: Option<String>,

The refresh_token field exists but there's no visible implementation for token refresh. Users will be logged out when tokens expire. Consider implementing refresh token flow for better UX.

10. Session Storage Fallbacks

Location: frontend/src/lib/sessionManager.ts:26-28

private isAvailable(): boolean {
    return browser && typeof sessionStorage !== 'undefined';
}

When sessionStorage is unavailable, operations silently fail. Consider:

  • Warning users when sessionStorage is disabled
  • Providing a fallback mechanism or clear error messaging

11. Casbin Policy Missing Audit Trail

Location: config/casbin/policy.yaml

The policy file has created_at timestamps but no:

  • updated_at fields
  • created_by / updated_by fields
  • Version tracking

For production authorization systems, audit trails are critical for compliance and debugging.

Low Priority / Code Quality

12. TypeScript ts-expect-error Usage

Location: frontend/src/lib/authService.ts:139

// @ts-expect-error - Type-safe routing doesn't support dynamic redirect paths
await goto(resolve(redirectTo));

Consider fixing the type issue properly rather than suppressing it, or document why this is the only approach.

13. Duplicate Debug Logging

Multiple files have excessive console logging that should use a proper logging framework with levels:

  • frontend/src/stores/webSocketStore.ts (lines 77-96)
  • frontend/src/lib/authService.ts (line 75)

Consider using a logging library with configurable log levels.

14. Magic Numbers

Location: frontend/src/stores/webSocketStore.ts

const MAX_RECONNECT_ATTEMPTS = 5;
const RECONNECT_DELAY_BASE = 1000;
const PING_INTERVAL = 30000;

These are well-named constants, but consider making them configurable via environment variables for different deployment scenarios.

15. Missing Tests

No test files are visible in the PR. For a security-critical release adding authentication and authorization, comprehensive tests are essential:

  • Unit tests for auth flows
  • Integration tests for OAuth callback handling
  • Tests for authorization rules
  • WebSocket authentication tests

🔍 Performance Considerations

1. WebSocket Reconnection Exponential Backoff

Location: frontend/src/stores/webSocketStore.ts:220-222

function getReconnectDelay(attempts: number): number {
    return Math.min(RECONNECT_DELAY_BASE * Math.pow(2, attempts), 30000);
}

✅ Good implementation with capped maximum delay.

2. Session Validation on Every Auth Check

Location: frontend/src/lib/authService.ts:165-167

const isValid = await sessionStore.validateCurrentToken();

Making a network request for every auth check could be expensive. Consider:

  • Caching validation results with TTL
  • Using JWT with client-side expiry checking
  • Only validating on critical operations

3. Casbin Performance

The PR enables Casbin's cached feature (good!), but ensure:

  • Cache invalidation works correctly when policies change
  • Performance testing with large policy sets
  • Consider adding metrics for authorization check latency

📊 Test Coverage Gaps

Missing test coverage for:

  1. OAuth device flow (scotty-core/src/auth/oauth_types.rs:5-17)
  2. Session store state transitions
  3. WebSocket authentication failure scenarios
  4. Authorization policy enforcement
  5. Token expiration handling
  6. Concurrent session handling
  7. RBAC permission evaluation edge cases

🏗️ Architectural Concerns

1. PR Size

This PR is too large for effective review (31k+ additions). Consider for future:

  • Breaking into feature PRs: OAuth, Authorization, Observability
  • Each feature should be independently reviewable
  • Easier to bisect issues, revert if needed

2. Migration Path

For existing deployments upgrading to 0.2.0:

  • Is there a migration guide for adding authorization?
  • What happens to existing sessions on upgrade?
  • Are there breaking API changes?
  • Database/config migration required?

3. Observability Integration

The observability features (OpenTelemetry) are added but:

  • Are there metrics for auth failures?
  • Are authorization decisions logged?
  • What's the performance impact of tracing?

📝 Documentation

Well documented:

  • ✅ Authorization system (docs/content/authorization.md)
  • ✅ OAuth authentication (docs/content/oauth-authentication.md)
  • ✅ Observability setup (docs/content/observability.md)

Needs documentation:

  • ❌ Migration guide for existing deployments
  • ❌ Security best practices for production deployments
  • ❌ Bearer token generation and rotation
  • ❌ WebSocket authentication troubleshooting
  • ❌ OAuth provider configuration examples (beyond the example)

🎯 Recommendations

Before Merge

  1. MUST FIX: Remove identifier:test-bearer-token-123 from config/casbin/policy.yaml
  2. MUST FIX: Uncomment the USER directive in Dockerfile or document why root is needed
  3. MUST FIX: Add try-catch around JSON parsing in OAuth error handling
  4. SHOULD FIX: Reduce console.log of token information
  5. SHOULD ADD: Basic integration tests for auth flows

Post-Merge (Next Release)

  1. Implement token refresh mechanism
  2. Add comprehensive test suite
  3. Add authorization metrics and alerting
  4. Implement CSRF protection verification
  5. Add input validation for authorization resources
  6. Create migration guide
  7. Performance testing with large policy sets

Process Improvements

  1. Break down large features into reviewable PRs (<5000 lines)
  2. Add pre-merge checklist for security features
  3. Require tests for authentication/authorization code
  4. Security review process for auth-related changes

Summary

This is a significant and well-architected release that adds critical features. The code quality is generally high with good separation of concerns and type safety.

Key concerns:

  • Security issues with hardcoded test credentials and logging
  • Docker security (running as root)
  • Missing tests for security-critical features
  • PR size makes thorough review difficult

Recommendation: Address the critical security issues, add basic integration tests, and document the migration path before merging. The other issues can be addressed in follow-up PRs.

Approval: ⚠️ Approve with required changes - Fix critical security issues before merge.

Great work on this release! 🚀

stmh added 3 commits October 28, 2025 21:58
Wrap sensitive cryptographic material in MaskedSecret to prevent exposure
in memory dumps, debug logs, and crash reports. Secrets are now zeroized
on drop and require explicit .expose_secret() calls for access.

Changes:
- Update WebFlowSession to use MaskedSecret for csrf_token and pkce_verifier
- Update handlers.rs to wrap/unwrap secrets during storage and retrieval
- Update all test code to use MaskedSecret
- Maintain full serialization/deserialization support

Security improvements:
- PKCE verifier protected from memory exposure
- CSRF token protected from memory exposure
- Automatic zeroization on drop
- Masked output in debug logs

All OAuth tests passing (21/21).

Closes scotty-38
Remove base64 encoding/decoding of PKCE verifier as it provides no benefit.
The PKCE verifier (RFC 7636) is already URL-safe and contains only ASCII
alphanumeric characters and safe symbols (-._~).

Changes:
- Store PKCE verifier directly as string without base64 encoding
- Simplify retrieval logic (no decode/UTF-8 conversion needed)
- Remove unused base64 import
- Update struct comment to remove encoding reference
- Add test to verify debug output masks secrets

Benefits:
- Simpler code (removed 25 lines of error handling)
- No unnecessary encode/decode overhead
- Fewer potential failure points
- Easier debugging (see actual verifier format)

All 22 OAuth tests passing including new masking test.
…rotection

fix(oauth): protect PKCE verifier and CSRF token with MaskedSecret
@claude
Copy link

claude bot commented Oct 28, 2025

PR #467 - Version 0.2.0 Code Review

Overview

This is a major release with 31,497 additions across 103 files, introducing comprehensive authentication, authorization, and observability systems. Overall assessment: Strong foundation with some critical issues requiring attention before merge.


🎯 Summary Rating

Category Rating Notes
Code Quality ✅ Good Well-structured, proper error handling
Security ⚠️ Moderate Critical memory safety issue identified
Testing ✅ Good Core flows tested, some gaps remain
Documentation ✅ Excellent Comprehensive guides added
Performance ✅ Good Async architecture, proper instrumentation

🔒 Security Analysis

✅ Strengths

  1. Constant-time token comparison (scotty/src/api/rest/handlers/login.rs:53-60)

    • Uses subtle::ConstantTimeEq to prevent timing attacks
    • Proper implementation ✅
  2. OAuth/OIDC with PKCE support

    • Modern OAuth 2.0 device flow + authorization code flow
    • PKCE prevents authorization code interception
    • Multiple provider support (GitLab, Auth0, Keycloak, Google)
  3. RBAC with Casbin

    • Granular permission model (view, manage, shell, logs, create, destroy)
    • Scope-based resource isolation
    • Role inheritance properly configured
  4. Secret masking in API responses (scotty/src/api/secure_response.rs)

    • SecureJson wrapper masks sensitive data
    • Pattern-based detection (10 sensitive patterns)
    • URI credential masking
  5. Session storage best practices

    • Frontend uses sessionStorage (not localStorage)
    • Tokens cleared on browser close
    • Proper validation on initialization

🚨 Critical Issues

1. Secrets as Plain String in Memory (CRITICAL)

Location: Throughout codebase, documented in docs/SECRET_HANDLING_ANALYSIS.md

Problem:

  • All secrets (passwords, tokens, API keys) stored as plain String
  • No memory zeroization on drop
  • Vulnerable to memory dumps, debuggers, panic messages
  • No type safety prevents accidental logging

Example Flow:

OnePassword APIStringHashMap<String, String> → Docker login → Task logs
                    ↑ Plain text in memory for minutes/hours

Impact: HIGH - Secrets can leak via:

  • Debug output ({:?})
  • Error messages
  • Task output logs
  • Memory dumps

Recommendation: Implement migration to secrecy crate as outlined in SECRET_HANDLING_ANALYSIS.md (4-phase plan, 4-5 weeks estimated).

2. Test Credentials in Production Config (HIGH)

Locations:

  • config/default.yaml:11 - test-bearer-token-123: "test-bearer-123-token"
  • config/casbin/policy.yaml:63-68 - identifier:test-bearer-token-123 with admin role
  • Also appears in test files: scotty/tests/test_bearer_auth.yaml, scotty/src/api/bearer_auth_tests.rs

Problem: Test bearer token with admin privileges shipped in default config

Impact: MEDIUM - Could be exploited if default config used in production

Recommendation:

  • Remove from config/default.yaml and config/casbin/policy.yaml
  • Keep only in test-specific configs under scotty/tests/
  • Add comment explaining these are test-only credentials

3. Missing JWT Generation for OAuth (MEDIUM)

Location: scotty/src/oauth/handlers.rs:533

// For now, return the OIDC token directly
// TODO: Generate a Scotty JWT token instead
Ok(axum::response::Json(TokenResponse {
    access_token: session.oidc_token,

Problem:

  • OAuth flow returns OIDC provider token directly
  • No Scotty-specific JWT generated
  • Ties auth system to external provider tokens

Impact: MEDIUM - External tokens control session lifecycle

Recommendation: Implement JWT generation with:

  • Scotty-controlled expiration
  • Custom claims for RBAC
  • Token refresh support

⚠️ Medium Issues

4. Bearer Token Reverse Lookup (scotty/src/api/auth_core.rs:94-101)

Problem: Linear search through all bearer tokens to find identifier

Impact: LOW - Timing could leak token count; acceptable for small sets

Recommendation: Consider HashMap<String, String> for O(1) lookup if token count grows

5. Typo in Default Config (config/default.yaml:9)

client-b: "client-b-bearet-token"  # Missing 'r' in 'bearer'

6. Docker Registry Passwords in Plain Config

Location: config/default.yaml:28-32

registries:
  factorial_legacy:
    password: "deploybot"  # Should use env var override

Impact: LOW - Comments indicate env var override expected

Recommendation: Use placeholder value like "OVERRIDE_WITH_ENV_VAR" to make it obvious

🔍 Missing Validations

  1. Bearer token format validation - No minimum length/character set requirements
  2. OAuth redirect URL validation - Should validate production URLs
  3. Rate limiting - OAuth/login endpoints not rate-limited (visible)
  4. CORS configuration - Not visible in config files
  5. Token expiration - No automatic refresh mechanism

🧪 Test Coverage Analysis

✅ Well-Tested Areas

  1. OAuth flow (scotty/src/api/oauth_flow_tests.rs)

    • Device flow with mocked OIDC provider ✅
    • Authorization code exchange ✅
    • Token validation ✅
  2. Bearer auth + RBAC (scotty/src/api/bearer_auth_tests.rs)

    • Valid/invalid token scenarios ✅
    • Permission checking ✅
    • Scope-based access ✅
  3. Secret masking (scotty-core/src/utils/sensitive_data.rs)

    • Pattern detection (5 tests) ✅
    • Value masking ✅
    • URI credential masking ✅

❌ Test Gaps

  1. Task output masking - No tests for secrets in docker command output
  2. OnePassword error handling - Error messages could leak partial secrets
  3. OAuth CSRF validation - State parameter validation not explicitly tested
  4. Docker login security - No test verifying password not in process list
  5. WebSocket authentication - No end-to-end WS auth tests visible
  6. Permission caching - No performance tests for repeated permission checks

Recommendation: Add integration tests for secret leakage scenarios:

#[test]
fn test_docker_login_password_not_in_task_output() {
    // Verify docker login errors don't expose password
}

🏗️ Architecture & Code Quality

✅ Strengths

  1. Unified TypeScript generation (scotty-types crate)

    • Single source of truth via ts-rs
    • Faster builds (6s vs 27s)
    • Type safety across stack
  2. Comprehensive error handling

    • AppError enum with 20+ variants
    • Automatic HTTP status mapping
    • OAuth errors follow RFC 6749
  3. Observability stack

    • OpenTelemetry integration
    • 15+ metrics instrumented
    • Grafana dashboard included
    • VictoriaMetrics backend
  4. Proper async architecture

    • Tokio runtime metrics
    • WebSocket with reconnect/backoff
    • Background task schedulers

🔧 Code Quality Issues

  1. Incomplete TODOs

    • scotty/src/oauth/handlers.rs:533 - JWT generation (mentioned above)
    • scotty/src/api/rest/handlers/admin/assignments.rs - Missing remove_assignment method
  2. Dead code markers

    #[allow(dead_code)] on CurrentUser.picture, CurrentUser.access_token
    #[allow(unused)] on OAuthSettings readonly fields

    Recommendation: Either use these fields or document why they're kept

  3. Error message verbosity

    • Some errors logged with full debug format {:?}
    • Could expose internals in production logs
    • Example: OIDC validation errors logged verbosely

📚 Documentation

✅ Excellent Additions

  1. AGENTS.md - Claude Code development guide
  2. oauth-authentication.md - Complete OAuth/OIDC setup guide
  3. authorization.md - RBAC configuration with examples
  4. observability.md - Monitoring stack setup
  5. SECRET_HANDLING_ANALYSIS.md - Critical security analysis (very thorough!)

📝 Minor Gaps

  • No migration guide from v0.1.x to v0.2.0 for existing deployments
  • No rollback procedure documented
  • Production deployment checklist missing (security hardening steps)

🚀 Performance Considerations

✅ Good Practices

  1. Async/await throughout - Non-blocking I/O
  2. Connection pooling - Bollard Docker client
  3. WebSocket for real-time - Avoids polling
  4. Metrics instrumentation - Can identify bottlenecks

🤔 Potential Concerns

  1. Bearer token lookup - O(n) search (mentioned above)
  2. No visible permission caching - RBAC checked on every request
  3. Casbin policy loading - Not clear if policies cached in memory

Recommendation: Add caching layer for permission checks if load testing shows issues.


🎯 Recommendations by Priority

P0 - Block Merge (Must Fix)

  • Remove test-bearer-token-123 from production configs
  • Document secret handling roadmap (secrecy crate migration timeline)
  • Fix typo: client-b-bearet-tokenclient-b-bearer-token

P1 - Before Release (Should Fix)

  • Implement JWT generation for OAuth (complete TODO at line 533)
  • Add task output masking for docker login passwords
  • Add rate limiting to OAuth/login endpoints
  • Add integration tests for secret leakage scenarios
  • Create production deployment checklist
  • Document migration path from v0.1.x

P2 - Post-Release (Nice to Have)

  • Migrate to secrecy crate (4-5 week effort per analysis doc)
  • Implement token refresh mechanism
  • Add permission result caching
  • Optimize bearer token lookup to O(1)
  • Remove or use #[allow(dead_code)] fields
  • Add distributed tracing integration

🎬 Conclusion

This is a high-quality, ambitious release that significantly enhances Scotty's security and observability. The architecture is sound, testing is comprehensive for core flows, and documentation is excellent.

Recommendation: Approve with required changes (fix P0 issues)

The critical issues are:

  1. Test credentials in prod config (easy fix)
  2. Secret memory handling (documented, roadmap needed)
  3. JWT generation TODO (functional issue)

Once P0 items are addressed, this PR represents a major step forward for the project. The SECRET_HANDLING_ANALYSIS.md shows excellent security awareness - implementing that roadmap should be the next major security initiative.

Great work on this release! The OAuth/RBAC implementation is solid, the observability stack is well-integrated, and the unified type system is a smart architectural choice.


📋 Checklist for Merge

  • Remove test-bearer-token-123 from default configs
  • Fix config typo (bearet → bearer)
  • Complete JWT generation or document why OIDC tokens are acceptable
  • Add comment in config explaining env var overrides for docker passwords
  • Create GitHub issue for secrecy crate migration with timeline
  • Create GitHub issue for task output masking
  • Update CHANGELOG.md with breaking changes and migration notes

stmh added 9 commits October 28, 2025 22:47
Implemented tiered rate limiting for all vulnerable API endpoints
to prevent brute force attacks, DoS, and abuse. Uses token bucket
algorithm via tower-governor with three distinct tiers:

- Public auth endpoints (login) - IP-based rate limiting
- OAuth endpoints - IP-based rate limiting
- Authenticated API endpoints - Token-based rate limiting

Rate limiting is disabled by default and can be configured per-tier
with requests_per_minute and burst_size settings. Global switch
allows enabling/disabling all rate limiting.

Key changes:
- Added tower_governor 0.8 and governor 0.10 dependencies
- Created rate limiting module with tier-specific limiters
- Added custom bearer token extractor for authenticated tier
- Split router into separate login, OAuth, and authenticated routers
- Added conditional layer application based on configuration
- Added rate limiting settings to api_server config

All 117 existing tests pass. Addresses scotty-40.
Add comprehensive metrics instrumentation for rate limiting:
- Total rate limit hits counter
- Rate limit hits by tier (authenticated, public_auth, oauth)
- Middleware to automatically record 429 responses

Add Grafana dashboard panels:
- Total rate limit hits stat
- Rate limit hit rate over time
- Stacked chart showing hits by tier

Add integration tests:
- Test rate limiting disabled
- Test authenticated per-token rate limiting
- Test independent tier limits
Document rate limiting configuration, tiers, and monitoring:
- Configuration options and examples
- Three-tier rate limiting system (public_auth, oauth, authenticated)
- Recommended limits per tier
- Monitoring metrics and dashboard
- Behavior when limits are exceeded
The rate limiting integration tests were failing because they use
IP-based rate limiting via SmartIpKeyExtractor, but axum_test::TestServer
doesn't provide real IP addresses. Added X-Forwarded-For headers to all
test requests to allow proper IP extraction and rate limit validation.

Also fixed login endpoint JSON format to use only 'password' field
as required by the FormData struct.

Tests now properly verify:
- Public auth rate limiting (IP-based)
- OAuth rate limiting (IP-based)
- Rate limit response format
- Independent limits across endpoint tiers
- Replace token truncation with SHA256 hashing to prevent collision attacks
- Fix rate calculation precision loss by using per_nanosecond instead of per_second
- Add configuration validation for TierConfig with comprehensive tests
- Validate rate limiting config on application startup

This addresses the 3 MUST FIX items from PR #529 review:
1. Token truncation created collision risk for DoS attacks
2. Integer division precision loss could double rate limits (30 req/min → 1 req/sec)
3. No validation for invalid config (burst_size=0, burst exceeds rate)

All rate limiting tests still pass with these security improvements.
- Document multi-instance deployment limitations and solutions
- Document IPv6 support and considerations
- Verify and document rate limit response headers (Retry-After, X-RateLimit-After)
- Add test assertions for rate limit headers
- Document missing RFC 6585 headers as future enhancement
- Remove emojis from documentation per user preference

This addresses the SHOULD FIX items from PR #529 review:
- IPv6 handling documented with prefix-based attack mitigation strategies
- Multi-instance limitations clearly documented with recommended solutions
- Rate limit headers verified (Retry-After provided by tower-governor)
- Missing headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset) documented as future enhancement
Add detailed observability for rate limiting:
- rate_limit_requests_total{tier, status} - tracks allowed/denied requests
- rate_limit_extractor_errors - tracks bearer token extraction failures

Changes:
- Updated ScottyMetrics to include new rate limiting counters
- Modified middleware to record both allowed and denied requests
- Added extraction error tracking in BearerTokenExtractor
- Added unit test for metrics recording functions

All tests passing (126 total).
feat(security): implement comprehensive API rate limiting
@claude
Copy link

claude bot commented Oct 28, 2025

Comprehensive Code Review - Version 0.2.0

This is a major release introducing authentication, authorization, unified output system, and observability. I've conducted a thorough review across all major components.


Executive Summary

Overall Assessment: APPROVE WITH CRITICAL FIXES REQUIRED⚠️

This is an excellent architectural upgrade with strong security foundations. However, several critical issues must be addressed before production deployment:

  1. Panic-prone Mutex unwrap pattern (HIGH)
  2. In-memory session storage (HIGH)
  3. Hardcoded credentials in config (HIGH)
  4. Frontend token logging to console (CRITICAL)
  5. Missing CSRF protection (HIGH)

Major Features Review

1. OAuth Authentication & Authorization System

Strengths ✅

  • Excellent PKCE implementation with SHA256 code challenge prevents authorization code interception
  • Proper CSRF protection using state parameter with constant-time comparison
  • Secret masking for sensitive data in debug output (last 4 chars only)
  • Automatic session cleanup (10 min for auth flows, 5 min for OAuth sessions)
  • Casbin RBAC integration with flexible scope-based permissions
  • Timing attack mitigation using subtle::ConstantTimeEq for bearer token comparison
  • Comprehensive test coverage with mock OAuth providers

Critical Issues 🔴

1. Mutex Unwrap Pattern (Panic-Prone)

  • Severity: HIGH
  • Locations:
    • scotty/src/oauth/handlers.rs:211, 336, 404, 423, 494
    • scotty/src/oauth/device_flow.rs:52, 73
    • scotty/src/oauth/cleanup.rs:39, 94-96
// VULNERABLE CODE
let mut sessions = oauth_state.web_flow_store.lock().unwrap();

Issue: Using .unwrap() on Mutex locks will panic if another thread panics while holding the lock (poisoned mutex), causing server crashes and DoS vulnerability.

Recommendation:

let mut sessions = oauth_state.web_flow_store.lock()
    .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;

2. In-Memory Session Storage (No Persistence)

  • Severity: HIGH
  • Location: scotty/src/oauth/mod.rs:39-65
pub type DeviceFlowStore = Arc<Mutex<HashMap<String, DeviceFlowSession>>>;
pub type WebFlowStore = Arc<Mutex<HashMap<String, WebFlowSession>>>;

Issues:

  • Sessions lost on server restart
  • Incompatible with horizontal scaling
  • Code comment acknowledges: "In production, this should use Redis or database"

Impact: Unreliable OAuth flows, load balancer incompatibility

Recommendation: Implement Redis or database-backed session storage before production


3. Hardcoded Bearer Tokens

  • Severity: HIGH
  • Location: config/default.yaml:6-11
bearer_tokens:
    admin: "admin-bearer-token"
    client-a: "client-a-bearer-token"
    client-b: "client-b-bearet-token"  # TYPO: "bearet"

Issues:

  • Plain-text tokens in version-controlled config file
  • Typo in client-b token name
  • Easy to accidentally deploy with test credentials
  • No token rotation guidance

Recommendation:

  • Use environment variables exclusively: SCOTTY__API__BEARER_TOKENS__ADMIN
  • Document token generation best practices
  • Add security warnings in default.yaml

4. OIDC Tokens Returned as Bearer Tokens

  • Severity: MEDIUM
  • Location: scotty/src/oauth/handlers.rs:117-126, 534-547
// TODO: Generate a Scotty JWT token instead
Ok(Json(TokenResponse {
    access_token: oidc_token,  // OIDC token returned directly

Impact:

  • No token revocation capability
  • Token expiration controlled by external provider
  • No fine-grained permission embedding

Recommendation: Implement Scotty JWT generation (TODO at line 556)


5. Casbin Policy Sync Not Atomic

  • Severity: MEDIUM
  • Location: scotty/src/services/authorization/casbin.rs:19
let _ = enforcer.clear_policy().await;  // Errors silently ignored\!
// Then add new policies...

Issues:

  • Policies cleared before new ones added (race condition)
  • Error handling with let _ silently ignores failures
  • Partial policy state possible if sync fails midway

Recommendation: Use load-swap pattern or implement atomicity


2. Frontend Authentication Security

Strengths ✅

  • Uses sessionStorage (not localStorage) - clears on browser close
  • No hardcoded credentials in frontend code
  • Proper session validation before trusting stored tokens
  • Type-safe storage key management
  • Comprehensive error handling with try-catch blocks
  • Clean Svelte store architecture for reactive auth state

Critical Issues 🔴

1. Token Logging to Console

  • Severity: CRITICAL
  • Locations:
    • frontend/src/lib/ws.ts:32
    • frontend/src/stores/webSocketStore.ts:86
console.log(
    'Authenticating WebSocket with token (first 8 chars):',
    token.substring(0, 8) + '...'
);

Risk: Token prefixes visible in browser console (accessible to XSS attacks, developer tools)

Recommendation: Remove ALL token logging entirely


2. Missing CSRF Protection

  • Severity: HIGH
  • Locations:
    • frontend/src/lib/authService.ts:54 - /api/v1/login POST
    • frontend/src/lib/authService.ts:94 - /oauth/exchange POST
    • frontend/src/stores/sessionStore.ts:145 - /validate-token POST

Issue: No CSRF tokens on state-changing POST requests

Recommendation: Implement CSRF tokens or enforce SameSite cookies


3. Dev Mode Bypass Risk

  • Severity: HIGH
  • Location: frontend/src/stores/sessionStore.ts:52-53
if (authMode === 'dev') {
    isAuthenticated = true; // Always authenticated in dev mode
}

Risk: If authMode === 'dev' set in production, authentication fully bypassed

Recommendation: Ensure dev mode is compile-time only, not runtime configurable


4. No Token Expiration Tracking

  • Severity: MEDIUM
  • Token validity only checked via API endpoint
  • No JWT expiration parsing
  • No automatic token refresh mechanism

Recommendation: Implement token metadata tracking and refresh


5. WebSocket Auto-Connect Before Auth

  • Severity: MEDIUM
  • Location: frontend/src/stores/webSocketStore.ts:435-437
if (browser) {
    initialize(); // Connects immediately on module load
}

Issue: WebSocket connection attempted before authentication complete

Recommendation: Only connect after successful authentication


3. Unified Output System

Strengths ✅

  • TimedBuffer batching reduces lock contention 20-100x
  • Excellent lock strategy in TaskManager cleanup (read → process → write)
  • Proper session lifecycle management with cleanup tasks
  • Separate stdout/stderr with time synchronization
  • Configurable memory limits (10,000 lines, 4KB per line)

Issues Found 🟡

1. Broadcast Channel Overflow

  • Severity: MEDIUM
  • Location: scotty/src/api/websocket/client.rs:23
let (sender, _) = broadcast::channel(1000); // 1000 message capacity

Issue: High-frequency log streams can overflow, causing lagged clients to lose messages

Recommendation: Reduce capacity or implement backpressure


2. O(n) Output Line Eviction

  • Severity: MEDIUM
  • Location: scotty-types/src/lib.rs:201-204
while self.lines.len() > self.limits.max_lines {
    self.lines.remove(0);  // Shifts 9,999 lines on each eviction\!
}

Issue: Vec::remove(0) is O(n) - expensive at 10,000 lines

Recommendation: Replace with VecDeque for O(1) pop_front


3. Double Task Output Polling

  • Severity: LOW
  • Location: scotty/src/tasks/output_streaming.rs:227, 230

Calls task_manager.get_task_output() twice in same iteration

Recommendation: Cache Arc from first call


4. Docker Timestamp Parsing Heuristic

  • Severity: LOW
  • Location: scotty/src/docker/services/logs.rs:529-544
if content.starts_with('2') { /* assume timestamp */ }

Risk: Can fail on log lines starting with "2" that aren't timestamps

Recommendation: Use regex or strict format validation


4. Configuration Security

Issues Found 🔴

1. Registry Credentials in Default Config

  • Location: config/default.yaml:25-32
registries:
    factorial_legacy:
        username: "deploybot"
        password: "deploybot"  # Override with env var

Issue: While comment mentions env vars, having example credentials in committed config is risky

Recommendation: Use OVERRIDE_VIA_ENV_VAR placeholder


2. Wildcard User Assignment

  • Location: config/casbin/policy.yaml:38-41
assignments:
  '*':  # All users get viewer role
  - role: viewer
    scopes:
    - default

Risk: Every authenticated user gets viewer access to default scope

Recommendation: Add comment explaining behavior, ensure this is intentional


3. Test Credentials in Default Policy

  • Location: config/casbin/policy.yaml:42-69

Multiple test identifiers (admin, hello-world, test-bearer-token-123) in default config

Recommendation: Separate example configs from production defaults


5. Observability Stack

Strengths ✅

  • OpenTelemetry integration with Grafana, Jaeger, VictoriaMetrics
  • 40+ metrics covering all major subsystems
  • Pre-built dashboards with comprehensive monitoring
  • Low resource usage (180-250 MB)
  • Flexible architecture with interchangeable components

No significant issues found - excellent implementation!


Test Coverage Assessment

Good Coverage ✅

  • 43 test modules identified
  • 96 individual test functions
  • Tests for: OAuth flow, bearer auth, authorization, WebSocket, logs, shell
  • Mock OAuth providers for isolated testing

Gaps to Address 📝

  1. End-to-end OAuth testing with real OIDC provider
  2. WebSocket load testing (concurrent connections, message rates)
  3. Frontend component tests (Svelte tests not found in PR)
  4. Authorization edge cases (scope wildcards, permission inheritance)
  5. Memory leak testing under sustained load

Performance Considerations

Optimizations Needed 🟡

  1. Permission checking overhead - No caching, looks up permissions on every request

    • Recommendation: Implement Redis or in-memory permission cache
  2. Lock contention in broadcast_to_all() - iterates while holding lock

    • Recommendation: Collect client IDs first, release lock, then send
  3. RBAC policy loading from YAML - no caching mechanism

    • Recommendation: Cache loaded policies in production
  4. String cloning in broadcasts - Creates copies for each WebSocket message

    • Recommendation: Use Arc<String> for shared message data

Breaking Changes

User Impact ⚠️

  1. Bearer token authentication changes - api.access_token fallback removed
  2. TaskDetails fields removed - stdout/stderr replaced with unified output
  3. Authorization required - Apps must declare scope membership in .scotty.yml

Recommendation: Provide migration guide in CHANGELOG and release notes


Documentation Quality

Excellent Documentation ✅

  • docs/prds/authorization-system.md - Comprehensive PRD with use cases
  • docs/prds/unified-output-system.md - Detailed technical design
  • docs/content/oauth-authentication.md - Complete setup guide
  • docs/content/observability.md - Thorough monitoring guide
  • observability/README.md - Quick-start instructions

No documentation gaps found!


Priority Action Items

CRITICAL (Fix Before Merge) 🔴

  1. Remove token logging from frontend console (ws.ts, webSocketStore.ts)
  2. Replace Mutex unwrap pattern throughout OAuth handlers
  3. Remove hardcoded credentials from config files
  4. Add CSRF protection to login/OAuth endpoints

HIGH (Fix Before Production) 🟡

  1. Implement persistent session storage (Redis/database)
  2. Make Casbin policy sync atomic (load-swap pattern)
  3. Fix configuration typo (client-b-bearet-token)
  4. Implement Scotty JWT generation (replace OIDC token passthrough)
  5. Add permission caching for performance

MEDIUM (Post-Release Improvements) 🔵

  1. Replace Vec with VecDeque for output line eviction
  2. Add WebSocket backpressure or reduce broadcast capacity
  3. Implement token refresh mechanism in frontend
  4. Add E2E OAuth tests with real providers
  5. Separate test configs from production defaults

Final Verdict

✅ APPROVE WITH RECOMMENDATIONS

This is an outstanding release that significantly enhances Scotty's capabilities. The architecture is sound, security practices are mostly excellent, and documentation is comprehensive.

However, the critical issues listed above MUST be addressed before production deployment. These are typical for a major authentication overhaul and can be fixed with focused effort.

Estimated Effort:

  • Critical fixes: 1-2 days
  • High priority: 3-5 days
  • Medium priority: 1-2 weeks

The benefits of this release far outweigh the identified issues. Once the critical and high-priority items are resolved, this will be a rock-solid foundation for Scotty's future.


Detailed Review Reports

The following detailed analyses were conducted:

  1. Authentication & Authorization - 753 lines, OAuth flows, Casbin RBAC, bearer tokens
  2. Frontend Security - Token storage, WebSocket auth, session management, CSRF gaps
  3. Unified Output System - WebSocket streaming, TimedBuffer, log/task output, memory management

All findings include specific file paths and line numbers for targeted fixes.


Reviewer: Claude Code Review Agent
Review Date: 2025-10-28
PR: #467 - Version 0.2.0
Files Changed: 255 files, +33,003 / -2,259 lines

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants