fix: patch 4 critical security vulnerabilities #497
gn00295120 wants to merge 2 commits into openai:main
Conversation
- Fix SQL injection in postgres_datastore.py delete_by_filters by replacing f-string interpolation with parameterized queries
- Fix Zip Slip path traversal in process_zip.py by validating member paths before extraction
- Fix race condition with hardcoded temp file path in file.py by using tempfile.NamedTemporaryFile
- Fix timing side-channel in bearer token comparison in main.py by using secrets.compare_digest
Force-pushed from abb2299 to 7ad2c87
Pull request overview
This PR addresses multiple security vulnerabilities across the Postgres datastore, zip processing script, upload temp-file handling, and bearer token validation.
Changes:
- Parameterizes `delete_by_filters` SQL predicates to prevent SQL injection.
- Adds a safe zip extraction helper to block Zip Slip path traversal.
- Replaces a fixed `/tmp` upload path with `NamedTemporaryFile` for safer temp storage.
- Uses `secrets.compare_digest` to reduce timing side-channel leakage in token checks.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| datastore/providers/postgres_datastore.py | Switches delete_by_filters to parameterized predicates via %s + params list. |
| scripts/process_zip/process_zip.py | Introduces _safe_extract() to validate zip member paths before extraction. |
| services/file.py | Uses tempfile.NamedTemporaryFile instead of a hardcoded temp filepath for uploads. |
| server/main.py | Replaces direct token inequality with constant-time comparison (compare_digest). |
```diff
         filters = "WHERE"
+        params = []
         if filter.document_id:
-            filters += f" document_id = '{filter.document_id}' AND"
+            filters += " document_id = %s AND"
+            params.append(filter.document_id)
         if filter.source:
-            filters += f" source = '{filter.source}' AND"
+            filters += " source = %s AND"
+            params.append(filter.source)
         if filter.source_id:
-            filters += f" source_id = '{filter.source_id}' AND"
+            filters += " source_id = %s AND"
+            params.append(filter.source_id)
         if filter.author:
-            filters += f" author = '{filter.author}' AND"
+            filters += " author = %s AND"
+            params.append(filter.author)
         if filter.start_date:
-            filters += f" created_at >= '{filter.start_date}' AND"
+            filters += " created_at >= %s AND"
+            params.append(filter.start_date)
         if filter.end_date:
-            filters += f" created_at <= '{filter.end_date}' AND"
+            filters += " created_at <= %s AND"
+            params.append(filter.end_date)
         filters = filters[:-4]

         with self.client.cursor() as cur:
-            cur.execute(f"DELETE FROM {table} {filters}")
+            cur.execute(f"DELETE FROM {table} {filters}", params)
```
filters = filters[:-4] assumes at least one predicate was appended (ending in " AND"). If the caller passes a DocumentMetadataFilter with all fields unset, filters becomes "W" and the resulting SQL (DELETE FROM <table> W) will fail. Consider explicitly handling the empty-filter case (e.g., raise a clear exception / return early) rather than issuing an invalid query (and definitely avoid turning it into an unbounded DELETE).
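One way to address this is to build the clause from a list of conditions instead of slicing a string, refusing to run when no filter is set. The helper below is an illustrative sketch under that idea; the function name and its dict argument are assumptions, not the PR's actual code:

```python
def build_delete_filters(filter_fields: dict):
    """Build a parameterized WHERE clause from optional filter fields.

    Returns (where_sql, params). Raises ValueError when no field is set,
    so the caller never issues an invalid or unbounded DELETE.
    """
    predicates = {
        "document_id": "document_id = %s",
        "source": "source = %s",
        "source_id": "source_id = %s",
        "author": "author = %s",
        "start_date": "created_at >= %s",
        "end_date": "created_at <= %s",
    }
    conditions, params = [], []
    for field, predicate in predicates.items():
        value = filter_fields.get(field)
        if value is not None:
            conditions.append(predicate)
            params.append(value)
    if not conditions:
        raise ValueError("refusing to DELETE without any filter conditions")
    return "WHERE " + " AND ".join(conditions), params
```

Joining with `" AND "` removes the need for the `[:-4]` slice entirely, and the empty-filter case fails loudly instead of producing `DELETE FROM <table> W`.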
```diff
     async def delete_by_filters(self, table: str, filter: DocumentMetadataFilter):
         """
         Deletes rows in the table that match the filter.
         """
         filters = "WHERE"
+        params = []
         if filter.document_id:
-            filters += f" document_id = '{filter.document_id}' AND"
+            filters += " document_id = %s AND"
+            params.append(filter.document_id)
         if filter.source:
-            filters += f" source = '{filter.source}' AND"
+            filters += " source = %s AND"
+            params.append(filter.source)
         if filter.source_id:
-            filters += f" source_id = '{filter.source_id}' AND"
+            filters += " source_id = %s AND"
+            params.append(filter.source_id)
         if filter.author:
-            filters += f" author = '{filter.author}' AND"
+            filters += " author = %s AND"
+            params.append(filter.author)
         if filter.start_date:
-            filters += f" created_at >= '{filter.start_date}' AND"
+            filters += " created_at >= %s AND"
+            params.append(filter.start_date)
         if filter.end_date:
-            filters += f" created_at <= '{filter.end_date}' AND"
+            filters += " created_at <= %s AND"
+            params.append(filter.end_date)
         filters = filters[:-4]

         with self.client.cursor() as cur:
-            cur.execute(f"DELETE FROM {table} {filters}")
+            cur.execute(f"DELETE FROM {table} {filters}", params)
         self.client.commit()
```
This change materially alters delete behavior and is security-sensitive, but there’s no test exercising delete_by_filters (especially for combinations of filters and the empty-filter case). Adding a targeted unit/integration test in the existing Postgres datastore test suite would help prevent regressions and confirm correct parameter binding.
```diff
 def validate_token(credentials: HTTPAuthorizationCredentials = Depends(bearer_scheme)):
-    if credentials.scheme != "Bearer" or credentials.credentials != BEARER_TOKEN:
+    if credentials.scheme != "Bearer" or not secrets.compare_digest(credentials.credentials, BEARER_TOKEN):
         raise HTTPException(status_code=401, detail="Invalid or missing token")
```
secrets.compare_digest will raise a TypeError if BEARER_TOKEN is None. Today that’s prevented via assert BEARER_TOKEN is not None, but asserts can be disabled (e.g., running Python with -O), which would turn a missing env var into a 500 instead of a deterministic startup failure or a 401. Prefer an explicit runtime check (raise a clear exception during startup / module import, or guard in validate_token) rather than relying on assert for required configuration.
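An explicit startup check along these lines would survive `python -O`; the helper name and signature below are illustrative, not the PR's code:

```python
import os


def load_bearer_token(env=os.environ) -> str:
    """Read BEARER_TOKEN at startup, failing fast with a clear error.

    Unlike `assert BEARER_TOKEN is not None`, this check is not stripped
    when Python runs with -O, so a missing env var cannot surface later
    as a TypeError inside compare_digest.
    """
    token = env.get("BEARER_TOKEN")
    if not token:
        raise ValueError("BEARER_TOKEN environment variable is not set")
    return token
```

Calling this once at module import turns a missing token into a deterministic startup failure rather than a 500 on the first request.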
- Refactor delete_by_filters to use a conditions list instead of string slicing, preventing the empty-filter edge case
- Replace assert with explicit ValueError for the BEARER_TOKEN check, ensuring it works with python -O
Summary
This PR fixes 4 security vulnerabilities identified in the codebase:

- `datastore/providers/postgres_datastore.py` (`delete_by_filters`)
- `scripts/process_zip/process_zip.py`
- `services/file.py`
- `server/main.py`

Changes
1. SQL injection in `delete_by_filters` (`datastore/providers/postgres_datastore.py`)

Before (vulnerable): Filter values interpolated directly into SQL via f-strings.

After (fixed): All filter values use parameterized `%s` placeholders passed via a `params` list. The table name still comes from config (not user input). All six user-controllable filter fields (`document_id`, `source`, `source_id`, `author`, `start_date`, `end_date`) are now parameterized.
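To illustrate why the change matters, here is a standalone sketch (not the datastore code) contrasting the two patterns with a classic injection payload:

```python
def unsafe_predicate(document_id: str) -> str:
    # Vulnerable pattern from before the fix: the value becomes part of
    # the SQL text, so quotes in the input rewrite the query.
    return f"WHERE document_id = '{document_id}'"


def safe_predicate(document_id: str):
    # Fixed pattern: the SQL text is a constant; the driver binds the
    # value separately, so the payload stays inert data.
    return "WHERE document_id = %s", [document_id]


payload = "x' OR '1'='1"
```

With the f-string version, the payload turns the predicate into `document_id = 'x' OR '1'='1'`, matching every row; with placeholders it can only ever match a literal `document_id`.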
2. Zip Slip path traversal (`scripts/process_zip/process_zip.py`)

Before (vulnerable): `zip_file.extractall("dump")` with no member path validation.

After (fixed): Added `_safe_extract()` that resolves each member's real path and rejects any that escape the target directory:
3. Hardcoded temp file path (`services/file.py`)

Before (vulnerable): All uploads wrote to the same `/tmp/temp_file`, creating a race condition (TOCTOU) and a predictable path.

After (fixed): Uses `tempfile.NamedTemporaryFile` for unique, unpredictable temp files with the correct extension:
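A minimal sketch of the pattern; the function name and signature are illustrative, not the service's actual code:

```python
import os
import tempfile


def save_upload(data: bytes, original_filename: str) -> str:
    """Write an upload to a unique, unpredictable temp file.

    Keeping the original extension via `suffix` preserves downstream
    type detection; `delete=False` leaves the file for later processing,
    so the caller is responsible for removing it.
    """
    suffix = os.path.splitext(original_filename)[1]
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(data)
        return tmp.name
```

Because `NamedTemporaryFile` creates the file atomically with a random name, concurrent uploads can no longer collide on a shared `/tmp/temp_file` path.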
4. Timing side-channel in token comparison (`server/main.py`)

Before (vulnerable): Direct `!=` comparison leaks token length/content via timing.

After (fixed): Constant-time comparison via `secrets.compare_digest`:
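A standalone sketch of the pattern; the PR applies `compare_digest` inline in `validate_token`, so the helper below is only illustrative:

```python
import secrets


def token_matches(presented: str, expected: str) -> bool:
    # compare_digest runs in time independent of where the inputs differ.
    # Encoding to bytes avoids the TypeError it raises for non-ASCII str inputs.
    return secrets.compare_digest(presented.encode(), expected.encode())
```

Unlike `!=`, which returns as soon as the first differing byte is found, this comparison gives an attacker no timing signal about how much of a guessed token is correct.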
Test plan

- Verified `delete_by_filters` works with various filter combinations against a Postgres instance.
- Verified zip extraction rejects `../../etc/passwd`-style member paths.