feat(lens): Add SaFE SSO authentication support for Lens API #286

Open
haishuok0525 wants to merge 30 commits into main from feature/lens/ldap
Conversation

@haishuok0525
Collaborator

Summary

This PR implements SaFE SSO cookie authentication support for Lens API, enabling seamless single sign-on between SaFE and Lens systems.

Changes

Authentication Middleware (modules/api/pkg/api/auth/middleware.go)

  • Add support for SaFE SSO Token cookie (capital T)
  • Implement token extraction priority: Lens cookie → SaFE cookie → Bearer header
  • Add 2-second retry logic to handle token sync delay after SSO login

Session Management (modules/core/pkg/controlplane/auth/session/)

  • Add DisplayName field to SessionInfo struct
  • Populate DisplayName from user record in Validate method

API Response (modules/api/pkg/api/auth/login_handler.go)

  • Return display_name in /auth/me API response
  • Use standard rest.SuccessResp format for consistent API responses

Route Protection (modules/api/pkg/api/router.go)

  • Apply SessionAuthMiddleware to all business API routes
  • Keep /detection-status/log-report public for internal telemetry-processor use

Token/User Sync (modules/adapter/primus-safe-adapter/)

  • Reduce token sync interval: 30s → 1s (near real-time)
  • Reduce user sync interval: 60s → 5s
  • Sync email and display_name from SaFE User CRD annotations

Testing

  • Verified SaFE SSO cookie authentication works correctly
  • Tested all node API endpoints with authenticated requests
  • Confirmed retry logic handles token sync delay gracefully

Related

This enables Lens to seamlessly authenticate users who are already logged into SaFE via SSO.

…se 1-6)

- Phase 1: Control Plane database architecture
  - Add ClusterManager extensions for Control Plane support
  - Implement DAL layer with GORM Gen
  - Create database migrations for auth tables

- Phase 2: System initialization and admin APIs
  - Add system initializer with SaFE detection
  - Implement root user creation flow
  - Add admin APIs for auth mode and password management

- Phase 3: Auth provider management APIs
  - Implement CRUD APIs for auth providers
  - Add provider configuration types (LDAP, OIDC)
  - Add system config management APIs

- Phase 4: LDAP provider implementation
  - Add LDAP connection pool with TLS support
  - Implement user search and authentication
  - Add attribute mapping and group membership

- Phase 5: Token sync adapter
  - Add TokenSyncService to sync SaFE tokens
  - Add TokenCleanupService for session cleanup
  - Extend bootstrap to initialize sync tasks

- Phase 6: Session management
  - Implement DB-based Session Manager
  - Add token generation, validation, and refresh
  - Add login audit service for security logging
- Add login/logout API handlers (POST /auth/login, POST /auth/logout)
- Add session refresh API (POST /auth/refresh)
- Add current user API (GET /auth/me)
- Implement SessionAuthMiddleware for session-based authentication
- Implement AdminAuthMiddleware for admin privilege checks
- Implement OptionalAuthMiddleware for optional auth
- Register auth routes in main router.go
- Add auth mode management (GetCurrentAuthMode, SetCurrentAuthMode)
- Add authentication error definitions
- Add CreateFromLDAP method to UserFacade

New routes (independent from existing APIs):
- POST /auth/login - Public, supports Local/LDAP auth
- POST /auth/logout - Public
- POST /auth/refresh - Public
- GET /auth/me - Requires session auth
- GET/PUT /auth/mode - Requires admin auth
- CRUD /auth/providers/* - Requires admin auth
- GET/POST /init/status, /init/setup - Public
- CRUD /configs/* - Requires admin auth
- POST /root/change-password - Requires admin auth
…s Secret

- Initialize auth system during API startup
- Auto-create root user if not exists
- Store generated password in Kubernetes Secret instead of logging
- Support multi-pod concurrent startup with database unique constraint
- Password priority: env var > existing secret > auto-generate
- Add InitializeAuthHandlers call to enable auth API endpoints
- Move auth initialization to preInit callback of InitServerWithPreInitFunc
- This ensures ClusterManager is initialized before accessing K8s client
- Fixes panic: cluster manager not initialized
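
A minimal sketch of the password priority described above (env var, then existing secret, then auto-generate). The function name, its parameters, and the 16-byte length are assumptions; reading the environment variable and the Kubernetes Secret is left to the caller.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// resolveRootPassword (hypothetical) picks the root password by priority:
// an explicit env value, then a value already stored in the secret, then a
// freshly generated random one. It also reports which source was used.
func resolveRootPassword(envValue, secretValue string) (password, source string, err error) {
	if envValue != "" {
		return envValue, "env", nil
	}
	if secretValue != "" {
		return secretValue, "secret", nil
	}
	buf := make([]byte, 16) // assumed length; yields 32 hex characters
	if _, err := rand.Read(buf); err != nil {
		return "", "", fmt.Errorf("generate root password: %w", err)
	}
	return hex.EncodeToString(buf), "generated", nil
}

func main() {
	_, source, err := resolveRootPassword("", "value-from-secret")
	fmt.Println(source, err)
}
```

Returning the source lets the caller write the Secret only when the password was newly generated.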
…ation

- Add controlPlane.enabled config option in config.go
- Add NewControlPlaneConfigFromEnv() to read DB config from env vars
- Modify server.go to initialize Control Plane when enabled
- Update preInitAuthSystem to check Control Plane availability
- Skip auth initialization gracefully when Control Plane is disabled

Environment variables for Control Plane DB:
- CONTROL_PLANE_DB_HOST
- CONTROL_PLANE_DB_PORT (default: 5432)
- CONTROL_PLANE_DB_NAME
- CONTROL_PLANE_DB_USER
- CONTROL_PLANE_DB_PASSWORD
- CONTROL_PLANE_DB_SSL_MODE (default: require)
- Remove environment variable based config reading
- Add NewControlPlaneConfigFromSecret() to read DB config from K8s Secret
- Add ClusterManager.InitControlPlane() for delayed initialization
- server.go now: 1) init K8s client, 2) read secret, 3) init Control Plane
- Config only needs controlPlane.enabled flag, DB info auto-read from secret
- Default secret: primus-lens-control-plane-pguser-primus-lens-control-plane
- Default namespace: POD_NAMESPACE env or primus-lens
GORM error callback converts ErrRecordNotFound to nil, causing
GetByUsername to return an empty user struct with nil error.
Added check for non-empty user.ID to correctly detect existing users.
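
The pitfall can be illustrated without GORM. In the sketch below, `lookup` stands in for a facade whose error callback swallows "record not found"; all names are hypothetical.

```go
package main

import "fmt"

// User is a stand-in for the real user model.
type User struct {
	ID       string
	Username string
}

// lookup mimics a facade that swallows "record not found": a miss yields a
// zero-value User and a nil error.
func lookup(db map[string]User, username string) (User, error) {
	return db[username], nil
}

// userExists treats a nil error as inconclusive and checks the ID instead.
func userExists(db map[string]User, username string) (bool, error) {
	u, err := lookup(db, username)
	if err != nil {
		return false, err
	}
	return u.ID != "", nil
}

func main() {
	db := map[string]User{"root": {ID: "u-1", Username: "root"}}
	found, _ := userExists(db, "root")
	missing, _ := userExists(db, "ghost")
	fmt.Println(found, missing)
}
```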
- AuthModeNone now supports local authentication
- Root user can always login using local credentials regardless of auth mode
- LDAP/SaFE/SSO modes now allow root user fallback to local auth
- Unknown auth modes fall back to local authentication with warning
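
One way to read the fallback rules above is as a small decision function. This is a sketch under assumptions: the mode string values and the name `loginPath` are hypothetical, and how non-root SaFE/SSO logins are actually handled is not stated in this PR.

```go
package main

import "fmt"

// AuthMode values below are assumed spellings, not taken from the code.
type AuthMode string

const (
	ModeNone  AuthMode = "none"
	ModeLocal AuthMode = "local"
	ModeLDAP  AuthMode = "ldap"
	ModeSafe  AuthMode = "safe"
	ModeSSO   AuthMode = "sso"
)

// loginPath (hypothetical) reports which credential check a login attempt
// should use, and whether the configured mode was unrecognized so the
// caller can log a warning.
func loginPath(mode AuthMode, isRoot bool) (path AuthMode, unknown bool) {
	if isRoot {
		return ModeLocal, false // root always uses local credentials
	}
	switch mode {
	case ModeNone, ModeLocal:
		return ModeLocal, false
	case ModeLDAP, ModeSafe, ModeSSO:
		return mode, false
	default:
		return ModeLocal, true // unknown mode: fall back to local
	}
}

func main() {
	fmt.Println(loginPath(ModeLDAP, true))
	fmt.Println(loginPath("kerberos", false))
}
```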
…to nil

- provider_handler.go: Add ID check for GetByID and GetByName calls
- login_handler.go: Add ID check for GetByUsername calls
- This fixes the Auth Provider creation bug where empty struct was
  incorrectly detected as existing provider
- Also improves authenticateLocal to return proper ErrUserNotFound
  when user doesn't exist
- New UserSyncService syncs users from SaFE User CRD to lens_users table
- User admin status determined by SaFE roles (system-admin, system-admin-readonly)
- User restricted status maps to disabled status in Lens
- Sync runs every 60 seconds
- Also fixed GORM callback issue in token_sync_service.go
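
The mapping from a SaFE user to a Lens user described above might look roughly like this. The struct shapes and the name `toLensUser` are assumptions; only the two role names and the restricted-to-disabled rule come from the commit notes.

```go
package main

import "fmt"

// SafeUser and LensUser are simplified, hypothetical shapes.
type SafeUser struct {
	Name       string
	Roles      []string
	Restricted bool
}

type LensUser struct {
	Username string
	IsAdmin  bool
	Disabled bool
}

// toLensUser marks a user as admin when it holds either SaFE admin role and
// maps the restricted flag to the Lens disabled status.
func toLensUser(u SafeUser) LensUser {
	out := LensUser{Username: u.Name, Disabled: u.Restricted}
	for _, r := range u.Roles {
		if r == "system-admin" || r == "system-admin-readonly" {
			out.IsAdmin = true
			break
		}
	}
	return out
}

func main() {
	fmt.Printf("%+v\n", toLensUser(SafeUser{Name: "alice", Roles: []string{"system-admin-readonly"}}))
}
```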
Added corev1.AddToScheme in controller manager's init function to ensure
core Kubernetes types (Secret, ConfigMap, etc.) are available when
K8s client is created, before any preInit callbacks are called.

This fixes the 'no kind is registered for the type v1.Secret in scheme'
error when primus-safe-adapter tries to read Control Plane config from Secret.
corev1 is now registered by default in controller/manager.go init(),
so app-specific schemes only need to register their custom types.
Added GetByNodeNameAndNamespaceNameIncludingDeleted and Recover methods
to NodeNamespaceMappingFacade.

Modified namespace_sync_service to check for soft-deleted mappings before
creating new ones. If a soft-deleted mapping exists, it recovers the record
instead of attempting to create a duplicate that would violate the unique
constraint.
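
The recover-before-create logic can be sketched with an in-memory store. The types and the `ensure` method are hypothetical stand-ins for the facade methods named above.

```go
package main

import "fmt"

// mapping is a stand-in for a node/namespace mapping row with soft delete.
type mapping struct {
	deleted bool
}

// store keys rows by node and namespace, mimicking the unique constraint.
type store struct {
	rows map[string]*mapping
}

// ensure recovers a soft-deleted row if one exists, and creates a new row
// only when no row (live or deleted) holds the key.
func (s *store) ensure(node, ns string) string {
	k := node + "/" + ns
	if m, ok := s.rows[k]; ok {
		if m.deleted {
			m.deleted = false
			return "recovered"
		}
		return "unchanged"
	}
	s.rows[k] = &mapping{}
	return "created"
}

func main() {
	s := &store{rows: map[string]*mapping{"node-1/team-a": {deleted: true}}}
	fmt.Println(s.ensure("node-1", "team-a"), s.ensure("node-2", "team-a"))
}
```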
…notations

- Add annotation key constants for primus-safe.user.email and primus-safe.user.name
- Extract email and display name from User CRD annotations during sync
- Update existing users if email or display name changed
- Include email and display name when creating new users
- Add SafeTokenCookieName constant for SaFE 'token' cookie
- Add getTokenFromRequest helper to extract token from multiple sources
- Token lookup priority: lens_session cookie -> SaFE token cookie -> Bearer header
- This enables users authenticated via SaFE SSO to access Lens APIs
- Works with primus-safe-adapter which syncs SaFE sessions to Lens DB
… format

- Add DisplayName field to SessionInfo struct
- Populate DisplayName from user record in Validate method
- Add CurrentUserResponse struct for /auth/me endpoint
- Use rest.SuccessResp for standard response format
- Return display_name alongside username for better UX
…llout

- Apply OptionalAuthMiddleware to /nodes/* routes
- Allows authenticated and unauthenticated access during migration
- Session info available in context when user is authenticated
- Add SessionAuthMiddleware to all business API route groups
- Protected routes: nodes, pods, clusters, workloads, storage, alerts,
  gpu-aggregation, job-history, ai-metadata, weekly-reports, detection-configs,
  profiler, tracelens, perfetto, registry, system-config, realtime,
  pyspy, github-workflow-metrics, github-runners
- Keep /detection-status/log-report public for telemetry-processor internal use
- All routes now require valid SaFE SSO session cookie for access
…ience

- Token sync: 30s -> 1s (near real-time token availability)
- User sync: 60s -> 5s (quick user info propagation)
- Users can now access Lens within seconds after SaFE login
…delay

- Retry session validation once after 2s delay if first attempt fails
- Handles race condition when user just logged in via SaFE
  but token hasn't synced to Lens yet
- Respects request context cancellation during retry wait
- Log retry attempts for debugging
SaFE uses 'Token' (capital T) as the cookie name, not 'token'.
Cookie names are case-sensitive.
Copilot AI review requested due to automatic review settings January 10, 2026 05:36
Contributor

Copilot AI left a comment

Pull request overview

This PR implements SaFE SSO cookie authentication support for the Lens API, enabling seamless single sign-on integration between SaFE and Lens systems. The implementation includes a comprehensive authentication system with database-backed user management, session handling, and LDAP support.

Changes:

  • Added Control Plane database infrastructure with PostgreSQL support for user/session management
  • Implemented SaFE SSO cookie authentication with 2-second retry logic for token sync delays
  • Protected all business API routes with session authentication middleware while keeping telemetry endpoints public

Reviewed changes

Copilot reviewed 63 out of 65 changed files in this pull request and generated 7 comments.

File | Description
modules/core/pkg/server/server.go | Reads DB config from K8s Secret after client initialization
modules/core/pkg/controlplane/database/* | Implements Control Plane database facades for users, sessions, auth providers
modules/core/pkg/controlplane/auth/* | Core authentication logic including session management, LDAP, and SafeDetector
modules/api/pkg/api/auth/* | Authentication API handlers and middleware with SaFE SSO support
modules/api/pkg/api/router.go | Applies session auth middleware to business routes
modules/core/pkg/clientsets/* | Control Plane database connection management
Comments suppressed due to low confidence (1)

Lens/modules/core/pkg/controlplane/auth/ldap/provider.go:1

  • The error return from crypto/rand.Read is ignored. If random number generation fails, the bytes slice will contain zeros, leading to weak or predictable IDs. Check and handle the error: if _, err := rand.Read(bytes); err != nil { return "", err }
// Copyright (C) 2025-2026, Advanced Micro Devices, Inc. All rights reserved.


func (e ExtType) Value() (driver.Value, error) {
b, err := json.Marshal(e)
return *(*string)(unsafe.Pointer(&b)), err
Copilot AI Jan 10, 2026

Using unsafe.Pointer to convert []byte to string bypasses Go's type safety and could lead to memory safety issues. The byte slice 'b' could be garbage collected while the string is still in use, causing undefined behavior. Use the safe conversion: return string(b), err
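
A sketch of the safe conversion the comment recommends. The underlying type of `ExtType` is not shown in this PR, so the map type below is an assumption made only to keep the example self-contained.

```go
package main

import (
	"database/sql/driver"
	"encoding/json"
	"fmt"
)

// ExtType is assumed here to be a JSON object; the real definition may differ.
type ExtType map[string]interface{}

// Value implements driver.Valuer using a plain string conversion, which
// copies the bytes instead of aliasing the slice's memory.
func (e ExtType) Value() (driver.Value, error) {
	b, err := json.Marshal(e)
	if err != nil {
		return nil, err
	}
	return string(b), nil
}

func main() {
	v, err := ExtType{"k": "v"}.Value()
	fmt.Println(v, err)
}
```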

func generateAuditID() string {
bytes := make([]byte, 16)
rand.Read(bytes)
Copilot AI Jan 10, 2026

The error return from crypto/rand.Read is ignored. If random number generation fails, audit IDs could collide, compromising audit integrity. Check and handle the error: if _, err := rand.Read(bytes); err != nil { return "" }

Suggested change
- rand.Read(bytes)
+ if _, err := rand.Read(bytes); err != nil {
+ 	log.Errorf("failed to generate audit ID: %v", err)
+ 	return ""
+ }

// randomHex generates a random hex string of the given length
func randomHex(n int) string {
bytes := make([]byte, n/2)
rand.Read(bytes)
Copilot AI Jan 10, 2026

The error return from crypto/rand.Read is ignored. If random number generation fails, user IDs could collide, creating serious security issues. Check and handle the error: if _, err := rand.Read(bytes); err != nil { return "" }

Suggested change
- rand.Read(bytes)
+ if _, err := rand.Read(bytes); err != nil {
+ 	return ""
+ }

func generateSessionID() string {
bytes := make([]byte, 16)
rand.Read(bytes)
Copilot AI Jan 10, 2026

The error return from crypto/rand.Read is ignored. If random number generation fails, session IDs could collide, allowing session hijacking. Check and handle the error: if _, err := rand.Read(bytes); err != nil { panic(err) }

Suggested change
- rand.Read(bytes)
+ if _, err := rand.Read(bytes); err != nil {
+ 	panic(fmt.Errorf("failed to generate session ID: %w", err))
+ }

log.Debugf("Session validation failed (attempt 1): %v, retrying in %v", err, sessionValidationRetryDelay)

select {
case <-time.After(sessionValidationRetryDelay):
Copilot AI Jan 10, 2026

The retry logic blocks the request for 2 seconds on every authentication failure, which could be exploited for DoS attacks by sending many invalid tokens. Consider implementing exponential backoff or limiting retries to specific error types (e.g., only retry on 'token not found' errors that might indicate sync lag, not on 'token expired' or 'invalid format' errors).

Comment on lines +405 to +407
func contains(s, substr string) bool {
return len(s) >= len(substr) && (s == substr || len(s) > 0 && containsImpl(s, substr))
}
Copilot AI Jan 10, 2026

Use strings.Contains from the standard library instead of implementing a custom contains function. The standard library function is well-tested and more efficient.

// Write custom type file
customFilePath := fmt.Sprintf("%s/ext_type.go", outPath)
err = os.WriteFile(customFilePath, []byte(customTypeFileContent), 0644)
Copilot AI Jan 10, 2026

File permissions 0644 allow group and world read access to generated code files. Use 0600 or 0640 to restrict access since these files may contain sensitive database schema information.

Suggested change
- err = os.WriteFile(customFilePath, []byte(customTypeFileContent), 0644)
+ err = os.WriteFile(customFilePath, []byte(customTypeFileContent), 0600)

- Add auto_register service to automatically enable safe mode from DB config
- Add session validator for direct SaFE DB validation
- Remove token sync (replaced by direct DB validation)
- Add SafeSetupConfig to init API for adapter_url and sso_url configuration
- Add new config keys: safe.adapter_url, safe.sso_url
- Add oidc package with types and safe_validator
- Add auth provider interfaces (local, ldap, oidc)
- Update constants with new config keys
- Removed oidc/router.go which referenced undefined NewHandlers()
- Removed import of oidc package from auth/router.go
- Authentication routes will use existing login handler
- Created middleware.go with SessionAuthMiddleware() and AdminAuthMiddleware()
- SessionAuthMiddleware validates session using global HandleAuth context
- AdminAuthMiddleware checks for admin/root user type
- Created initializer.go with Initializer type, NewInitializer, NewInitializerWithK8s
- Added SafeSetupConfig, InitializeOptions, InitializeResult types
- Added InitializeAuthHandlers function to init_handler.go
- Remove NewInitializerWithK8s, keep only NewInitializer with K8s client
- Remove NewSafeDetectorWithoutK8s - system always runs in K8s
- Simplify bootstrap.go initialization logic
SaFE stores user info in SSO/LDAP, not in database.
Only UserToken table exists for session validation.
Backend changes for Safe authentication mode:

1. AuthConfigService (config_service.go):
   - Read auth mode from database with TTL cache
   - Support Safe/LDAP/Local/SSO modes
   - Get Safe config (adapter_url, login_url, callback_path)

2. Safe Login Handler (safe_login_handler.go):
   - GET /api/v1/auth/config - return auth configuration
   - GET /api/v1/auth/login - redirect to SaFE login (302)
   - Sanitize redirect URLs to prevent open redirect

3. Dynamic Auth Middleware (auth.go):
   - HandleDynamicAuth() reads config from database
   - Safe mode: validate Token cookie via primus-safe-adapter
   - Local/LDAP mode: validate lens_session via session manager

4. Router updates:
   - Register new auth endpoints
   - Use HandleDynamicAuth with configurable exclude paths
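
For the redirect sanitization mentioned in item 2, one common approach is to accept only same-site relative paths. The sketch below takes that approach; it is not the handler's actual logic, which may instead allow-list hosts, and `sanitizeRedirect` is a hypothetical name.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// sanitizeRedirect returns target only when it is a same-site relative
// path; anything that could point at another host yields fallback.
func sanitizeRedirect(target, fallback string) string {
	if !strings.HasPrefix(target, "/") ||
		strings.HasPrefix(target, "//") || // scheme-relative URL
		strings.Contains(target, "\\") { // some browsers treat "\" like "/"
		return fallback
	}
	u, err := url.Parse(target)
	if err != nil || u.IsAbs() || u.Host != "" {
		return fallback
	}
	return target
}

func main() {
	fmt.Println(sanitizeRedirect("/dashboard?tab=nodes", "/"))
	fmt.Println(sanitizeRedirect("https://evil.example/login", "/"))
}
```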

Related: safe-auth-design.md