[Security] Fix Critical RCE Vulnerabilities in Benchmark Evaluation (Issue #1942) #1943
Open
paipeline wants to merge 1 commit into FoundationAgents:main from
This commit addresses multiple Critical Remote Code Execution (RCE) vulnerabilities in MetaGPT's aflow extension benchmark evaluation modules (fixes issue FoundationAgents#1942).

Vulnerabilities Fixed:
- HumanEval benchmark: unsafe exec() in check_solution() (lines 77, 82)
- MBPP benchmark: unsafe exec() in check_solution() (lines 58, 63)
- Operator script: unsafe exec() in exec_code() (line 228)
- run_code function: unsafe exec() with insufficient filtering

Security Improvements:
1. Added SecureCodeExecutor class with comprehensive sandboxing:
   - AST-based validation blocks dangerous imports/functions
   - Restricted built-ins environment (no exec, eval, open, etc.)
   - Safe module allowlist (math, re, typing, etc.)
   - Timeout protection against infinite loops
   - Proper error handling and logging
2. Replaced all unsafe exec() calls with secure_execute_solution():
   - Validates both solution and test code before execution
   - Prevents RCE via prompt injection or malicious datasets
   - Maintains full functionality for legitimate code
3. Comprehensive test suite validates:
   - Legitimate code continues to work (HumanEval/MBPP style)
   - All RCE attack vectors are blocked
   - Original PoC from issue FoundationAgents#1942 is prevented

Impact:
- CRITICAL security vulnerabilities eliminated
- Zero functional regressions for valid use cases
- Backward compatible API (drop-in replacement)

Tested attack vectors blocked:
✓ import os; os.system() calls
✓ import subprocess; subprocess.run() calls
✓ exec() and eval() function calls
✓ __globals__ and __builtins__ attribute access
✓ File system access via open()
✓ All dangerous built-in functions

Files Changed:
- metagpt/utils/secure_exec.py (new): Secure execution engine
- metagpt/ext/aflow/benchmark/humaneval.py: Use secure execution
- metagpt/ext/aflow/benchmark/mbpp.py: Use secure execution
- metagpt/ext/aflow/scripts/operator.py: Use secure execution
- tests/metagpt/utils/test_secure_exec.py (new): Security test suite
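The commit message lists timeout protection against infinite loops without describing the mechanism. As one possible approach (an assumption, not necessarily what secure_exec.py does), a hard, killable timeout can be obtained by running the code in a child interpreter via the standard library's `subprocess.run(..., timeout=...)`. The helper name `run_with_timeout` is hypothetical.

```python
import subprocess
import sys
from typing import Tuple


def run_with_timeout(code: str, timeout: float = 2.0) -> Tuple[bool, str]:
    """Run `code` in a separate Python interpreter and kill it after `timeout` seconds.

    Hypothetical helper for illustration only. Returns (finished, stdout);
    `finished` is False when the timeout fired.
    """
    try:
        # -I runs the child in isolated mode (ignores PYTHON* env vars and user site-packages).
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child process before re-raising TimeoutExpired.
        return False, ""
    return True, proc.stdout


if __name__ == "__main__":
    print(run_with_timeout("print(sum(range(10)))"))  # (True, '45\n')
    print(run_with_timeout("while True: pass", timeout=1.0))  # (False, '')
```

A child process bounds run time but is not a sandbox by itself: the child still has full access to `os` and the file system, so this would complement, not replace, the AST validation and restricted built-ins.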
Summary
This PR fixes Critical Remote Code Execution (RCE) vulnerabilities in MetaGPT's aflow extension benchmark evaluation modules, addressing issue #1942.
Vulnerabilities Fixed
Critical RCE Issues:
- HumanEval benchmark: exec() calls in check_solution() method (lines 77, 82)
- MBPP benchmark: exec() calls in check_solution() method (lines 58, 63)
- Operator script: exec() calls in exec_code() method (line 228)

Attack Vector:
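To make the attack vector concrete, here is a minimal illustration. It assumes that check_solution() and exec_code() pass a model- or dataset-supplied string straight to `exec()` with an ordinary globals dictionary; the variable names are hypothetical and the payload is deliberately harmless.

```python
# Illustrative only: a bare exec() runs its input with the full set of built-ins,
# so "solution" code can import os, subprocess, and so on. The payload below is
# deliberately harmless (it only reads the working directory).
untrusted_solution = """
import os

def add(a, b):
    return a + b

leaked = os.getcwd()  # a real payload could call os.system(...) here instead
"""

namespace = {}
# Assumed to mirror the vulnerable pattern: exec() on an LLM- or dataset-supplied string.
exec(untrusted_solution, namespace)

print(namespace["add"](2, 3))  # 5
print(isinstance(namespace["leaked"], str))  # True: the code reached the os module
```

The function under test works exactly as expected, which is why a malicious solution can pass evaluation while its side effects go unnoticed.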
Security Solution
New Security Architecture:
1. SecureCodeExecutor Class (metagpt/utils/secure_exec.py)
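According to the commit message, the first layer of SecureCodeExecutor is AST-based validation that blocks dangerous imports and functions. A minimal sketch of such a check, assuming a module allowlist plus a name blocklist (`find_violations`, `SAFE_MODULES`, and `BLOCKED_NAMES` are hypothetical, not the PR's API):

```python
import ast
from typing import List

# Hypothetical lists for illustration; the PR's actual SecureCodeExecutor lists may differ.
SAFE_MODULES = {"math", "re", "typing", "collections", "itertools", "functools", "heapq", "string"}
BLOCKED_NAMES = {
    "exec", "eval", "compile", "open", "__import__", "globals", "locals",
    "vars", "getattr", "setattr", "delattr", "input", "breakpoint",
}


def find_violations(source: str) -> List[str]:
    """Statically inspect `source` and return reasons to reject it (empty list means accepted)."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    problems: List[str] = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] not in SAFE_MODULES:
                    problems.append(f"import of disallowed module {alias.name!r}")
        elif isinstance(node, ast.ImportFrom):
            # Relative imports (level > 0) are rejected outright.
            if node.level or (node.module or "").split(".")[0] not in SAFE_MODULES:
                problems.append(f"import from disallowed module {node.module!r}")
        elif isinstance(node, ast.Name) and node.id in BLOCKED_NAMES:
            # Catches both direct calls (eval(...)) and aliasing (e = eval).
            problems.append(f"use of blocked name {node.id!r}")
        elif isinstance(node, ast.Name) and node.id.startswith("__"):
            problems.append(f"use of dunder name {node.id!r}")
        elif isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            # Blocks introspection chains such as f.__globals__ or ().__class__.__bases__.
            problems.append(f"access to dunder attribute {node.attr!r}")
    return problems


if __name__ == "__main__":
    print(find_violations("import math\nprint(math.sqrt(2))"))  # []
    print(find_violations("import os\nos.system('id')"))  # ["import of disallowed module 'os'"]
```

Static checks like this are generally treated as one layer of defense rather than a complete sandbox, because Python offers many indirect routes to dangerous objects. The dunder rule closes the common `__class__`/`__subclasses__()` escape, at the cost of also rejecting the occasional legitimate `if __name__ == "__main__":` block.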
2. secure_execute_solution() Function
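secure_execute_solution() replaces the raw `exec()` calls. Assuming a design in which the solution and its tests run in a shared namespace whose `__builtins__` is a small allowlist (the commit message mentions a restricted built-ins environment with no exec, eval, or open), a minimal sketch looks like this; `run_solution_with_tests` and `SAFE_BUILTIN_NAMES` are hypothetical names:

```python
import builtins

# Hypothetical allowlist of built-ins visible to executed code (illustrative, not the PR's list).
# Deliberately absent: __import__, __build_class__, exec, eval, compile, open, getattr, globals.
SAFE_BUILTIN_NAMES = (
    "abs", "all", "any", "bool", "dict", "enumerate", "filter", "float", "int",
    "isinstance", "len", "list", "map", "max", "min", "range", "reversed", "round",
    "set", "sorted", "str", "sum", "tuple", "zip",
    "Exception", "AssertionError", "ValueError", "TypeError",
)
SAFE_BUILTINS = {name: getattr(builtins, name) for name in SAFE_BUILTIN_NAMES}


def run_solution_with_tests(solution: str, test_code: str) -> bool:
    """Execute `solution`, then `test_code`, in one namespace with restricted built-ins.

    Returns True if both run without raising (i.e. all asserts pass).
    """
    namespace = {"__builtins__": dict(SAFE_BUILTINS)}  # fresh copy per run
    try:
        exec(solution, namespace)
        exec(test_code, namespace)
    except Exception:
        # Covers failed asserts, NameError for removed built-ins, and ImportError for imports.
        return False
    return True


if __name__ == "__main__":
    solution = "def add(a, b):\n    return a + b\n"
    print(run_solution_with_tests(solution, "assert add(2, 3) == 5"))  # True
    print(run_solution_with_tests(solution, "assert add(2, 2) == 5"))  # False
    print(run_solution_with_tests("import os", ""))  # False: no __import__ available
```

Because the namespace has no `__import__`, even `import os` fails with ImportError, and `open`, `eval`, and `exec` are simply undefined names; omitting `__build_class__` also rejects class definitions. Restricting built-ins alone is not considered a security boundary in CPython (object introspection can recover the real built-ins), which is why it would run only after an AST check of the kind described for SecureCodeExecutor.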
Security Validations:
BLOCKED: All these attack vectors are now prevented:
ALLOWED: Legitimate code continues to work:
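The commit message describes a safe module allowlist (math, re, typing, etc.). One way legitimate solutions could keep importing those modules under restricted built-ins is to expose a wrapped `__import__`. In this sketch, `allowlisted_import` and `run_legitimate` are hypothetical names, and the allowlist is illustrative:

```python
import builtins

# Hypothetical module allowlist modeled on the one the commit message describes (math, re, typing, etc.).
ALLOWED_MODULES = frozenset({"math", "re", "typing", "collections", "itertools", "functools"})


def allowlisted_import(name, globals_=None, locals_=None, fromlist=(), level=0):
    """Stand-in for __import__ that only admits absolute imports of allowlisted packages."""
    if level != 0 or name.split(".")[0] not in ALLOWED_MODULES:
        raise ImportError(f"import of {name!r} is not allowed")
    return builtins.__import__(name, globals_, locals_, fromlist, level)


def run_legitimate(code: str) -> dict:
    """Execute `code` with a tiny built-ins table plus the allowlisted importer; return its namespace."""
    restricted = {"__import__": allowlisted_import, "len": len, "range": range, "sum": sum, "sorted": sorted}
    namespace = {"__builtins__": restricted}
    exec(code, namespace)
    return namespace


if __name__ == "__main__":
    ns = run_legitimate("import math\nfrom collections import Counter\nroot = math.isqrt(17)\ncounts = Counter('abca')")
    print(ns["root"], ns["counts"]["a"])  # 4 2
    try:
        run_legitimate("import os")
    except ImportError as exc:
        print(exc)  # import of 'os' is not allowed
```

Checking only the top-level package name keeps `import collections.abc` working while still rejecting `os`, `subprocess`, and relative imports.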
Testing
Security Tests (22 test cases):
- exec() and eval() calls blocked
- Original PoC from issue #1942 prevented

Regression Tests:
Impact Assessment
Security Impact:
Functional Impact:
Performance Impact:
Files Changed
Risk Analysis
Before This Fix:
After This Fix:
Checklist
This fix eliminates the critical RCE vulnerabilities reported in issue #1942 while maintaining full backward compatibility. Ready for immediate deployment to production.