Summary
For very long streaming responses, the collected_chunks list can grow without bound, potentially causing memory issues.
Location
src/gateway/main.py - chat_completions() function, streaming path
Current Behavior
In the generate() function, every streamed chunk is added to collected_chunks with no upper limit.
Expected Behavior
Cap the number of stored chunks, or extract token data incrementally as chunks stream through so the full response never has to be held in memory.
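One possible shape for the fix, offered as a minimal sketch and not as the gateway's actual code: keep running totals for the whole stream while retaining only the most recent chunks. The names `BoundedChunkBuffer`, `relay`, and `max_chunks` are hypothetical, and the sketch assumes chunks are plain strings; the real generate() function may carry a different chunk type.

```python
from collections import deque
from typing import Deque, Iterable, Iterator


class BoundedChunkBuffer:
    """Hypothetical helper: tracks stream totals but retains only the newest chunks."""

    def __init__(self, max_chunks: int = 1000) -> None:
        if max_chunks < 1:
            raise ValueError("max_chunks must be at least 1")
        # A deque with maxlen discards its oldest item once it is full,
        # so memory use is capped at max_chunks entries.
        self.recent: Deque[str] = deque(maxlen=max_chunks)
        self.total_chunks = 0
        self.total_chars = 0

    def add(self, chunk: str) -> None:
        # Update running totals first; they cover chunks that are later evicted.
        self.total_chunks += 1
        self.total_chars += len(chunk)
        self.recent.append(chunk)


def relay(chunks: Iterable[str], buffer: BoundedChunkBuffer) -> Iterator[str]:
    """Pass each chunk through to the caller while recording it in the buffer."""
    for chunk in chunks:
        buffer.add(chunk)
        yield chunk
```

If only aggregate figures (such as a token or character count) are needed after the stream ends, the `recent` deque could be dropped entirely and just the counters kept, which corresponds to the second option above.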
Priority
Low
References
- CWE-400: Uncontrolled Resource Consumption