Commit 262dff0

Fix Agent prompt and infra (#804)
Fixing some issues revealed by full agent experiments earlier:
1. [x] LLM-generated build scripts do not save the fuzz target binary into the correct path.
2. [x] Use the default build script in the code-fixing prompt in this scenario:
   1. The default build script builds successfully but fails other checks (i.e., reference), and
   2. The LLM-generated build script does not work.
3. [x] Selectively use the default build script or the LLM-generated build script, depending on which is better.
4. [x] Use different code-fixing prompts based on which build script was used and which result it produced:
   * Default or LLM build script
   * No reference, no binary, or compilation failure
5. [x] Back up the human-written `/src/build.sh` to `/src/build.bk.sh` in the agent's containers in case the LLM wants to reuse it in the new build script.
   * Create the same copy for fuzzing execution.
6. [x] Hide the compile command to prevent the LLM from reusing it in the inspection tool and being distracted by irrelevant errors. E.g.:
   * The inspection container always runs compile before LLM analysis. Rerunning it may fail in some projects due to an existing /src/<project>/build directory.
7. [x] Make the prompt use an example fuzz target in the same language as the generated fuzz target (not the project).
   * Also dynamically adjust instructions in priming. Do not leave the LLM to judge which language the fuzz target is in.
8. [x] Remove the agent log when receiving fuzz targets.
9. [x] Do not restrict the LLM to sending one bash command per query.

Also need to:
1. [ ] Use SemanticAnalyzer in the agent workflow, at least to ensure the last Result is an Analysis Result.
2. [ ] Add an Enhancer in the agent workflow.
3. [ ] Use a service account in GKE; hopefully this will solve the [`Service Unavailable` problem](google/oss-fuzz#13042).
1 parent 16bed89 commit 262dff0

13 files changed: 652 additions & 135 deletions

agent/prototyper.py

Lines changed: 232 additions & 103 deletions
Large diffs are not rendered by default.

experiment/builder_runner.py

Lines changed: 9 additions & 11 deletions
@@ -922,17 +922,15 @@ def build_and_run_cloud(
         f'--real_project={project_name}',
     ]
 
-    # Temporarily comment out due to error in cached images.
-    # TODO(dongge): Add this back when the cached image works again.
-    # if oss_fuzz_checkout.ENABLE_CACHING and (
-    #     oss_fuzz_checkout.is_image_cached(project_name, 'address') and
-    #     oss_fuzz_checkout.is_image_cached(project_name, 'coverage')):
-    #   logger.info('Using cached image for %s', project_name)
-    #   command.append('--use_cached_image')
-
-    #   # Overwrite the Dockerfile to be caching friendly
-    #   oss_fuzz_checkout.rewrite_project_to_cached_project_chronos(
-    #       generated_project)
+    if oss_fuzz_checkout.ENABLE_CACHING and (
+        oss_fuzz_checkout.is_image_cached(project_name, 'address') and
+        oss_fuzz_checkout.is_image_cached(project_name, 'coverage')):
+      logger.info('Using cached image for %s', project_name)
+      command.append('--use_cached_image')
+
+      # Overwrite the Dockerfile to be caching friendly
+      oss_fuzz_checkout.rewrite_project_to_cached_project_chronos(
+          generated_project)
 
     if cloud_build_tags:
       command += ['--tags'] + cloud_build_tags
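
The re-enabled check only appends `--use_cached_image` when both the `address` and the `coverage` images are cached. A minimal sketch of that gating logic in isolation follows; `cached_image_flags`, `lookup`, and the sample cache contents are hypothetical names and data for illustration, and `is_cached` merely stands in for `oss_fuzz_checkout.is_image_cached`.

```python
from typing import Callable, Iterable


def cached_image_flags(
    project_name: str,
    enable_caching: bool,
    is_cached: Callable[[str, str], bool],
    sanitizers: Iterable[str] = ('address', 'coverage'),
) -> list[str]:
  """Returns the extra CLI flags to add when every required image is cached."""
  # A single missing image disables the cached path entirely.
  if enable_caching and all(is_cached(project_name, s) for s in sanitizers):
    return ['--use_cached_image']
  return []


# Illustrative cache contents only; not real project state.
_CACHE = {('libpng', 'address'), ('libpng', 'coverage'), ('zlib', 'address')}


def lookup(project: str, sanitizer: str) -> bool:
  """Hypothetical stand-in for the real cache query."""
  return (project, sanitizer) in _CACHE


print(cached_image_flags('libpng', True, lookup))
print(cached_image_flags('zlib', True, lookup))
```

Requiring both images avoids a run in which the build uses a cached image but the coverage step falls back to a full rebuild.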

experiment/evaluator.py

Lines changed: 2 additions & 0 deletions
@@ -306,6 +306,8 @@ def create_ossfuzz_project(self,
                      os.path.basename('agent-build.sh')))
 
     # Add additional statement in dockerfile to overwrite with generated fuzzer
+    with open(os.path.join(generated_project_path, 'Dockerfile'), 'a') as f:
+      f.write('\nRUN cp /src/build.sh /src/build.bk.sh\n')
     with open(os.path.join(generated_project_path, 'Dockerfile'), 'a') as f:
       f.write('\nCOPY agent-build.sh /src/build.sh\n')
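
The order of the two appended statements matters: the `RUN cp` backup has to come before the `COPY` that overwrites `/src/build.sh`, otherwise `/src/build.bk.sh` would contain the generated script rather than the human-written one. The sketch below reproduces that ordering against a throwaway Dockerfile; `append_build_script_overrides` and `demo` are hypothetical helpers, and the `FROM` line is only a placeholder base image.

```python
import os
import tempfile


def append_build_script_overrides(dockerfile_path: str) -> None:
  """Appends the backup step, then the overwrite step, to a Dockerfile."""
  with open(dockerfile_path, 'a') as f:
    # Keep a copy of the human-written script before it is replaced.
    f.write('\nRUN cp /src/build.sh /src/build.bk.sh\n')
    # Replace the original build script with the generated one.
    f.write('\nCOPY agent-build.sh /src/build.sh\n')


def demo() -> str:
  """Builds a placeholder Dockerfile, applies the overrides, returns its text."""
  with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'Dockerfile')
    with open(path, 'w') as f:
      f.write('FROM gcr.io/oss-fuzz-base/base-builder\n')
    append_build_script_overrides(path)
    with open(path) as f:
      return f.read()


dockerfile_text = demo()
print(dockerfile_text)
```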

experiment/oss_fuzz_checkout.py

Lines changed: 1 addition & 1 deletion
@@ -70,7 +70,7 @@ def _clone_oss_fuzz_repo():
   """Clones OSS-Fuzz to |OSS_FUZZ_DIR|."""
   clone_command = [
       'git', 'clone', 'https://github.com/google/oss-fuzz', '--depth', '1',
-      '--branch', 'target-exp-log-account', OSS_FUZZ_DIR
+      OSS_FUZZ_DIR
   ]
   proc = sp.Popen(clone_command,
                   stdout=sp.PIPE,

llm_toolkit/prompt_builder.py

Lines changed: 65 additions & 7 deletions
@@ -28,6 +28,7 @@
 from experiment.benchmark import Benchmark, FileType
 from experiment.fuzz_target_error import SemanticCheckResult
 from llm_toolkit import models, prompts
+from results import BuildResult
 
 logger = logging.getLogger(__name__)
 
@@ -546,15 +547,22 @@ class PrototyperTemplateBuilder(DefaultTemplateBuilder):
   def __init__(self,
                model: models.LLM,
                benchmark: Benchmark,
-               template_dir: str = DEFAULT_TEMPLATE_DIR):
-    super().__init__(model)
-    self._template_dir = template_dir
+               template_dir: str = DEFAULT_TEMPLATE_DIR,
+               initial: Any = None):
+    super().__init__(model, benchmark, template_dir, initial)
     self.agent_templare_dir = AGENT_TEMPLATE_DIR
-    self.benchmark = benchmark
 
     # Load templates.
-    self.priming_template_file = self._find_template(self.agent_templare_dir,
-                                                     'prototyper-priming.txt')
+    if benchmark.is_c_target:
+      self.priming_template_file = self._find_template(
+          self.agent_templare_dir, 'prototyper-priming.c.txt')
+    elif benchmark.is_cpp_target:
+      self.priming_template_file = self._find_template(
+          self.agent_templare_dir, 'prototyper-priming.cpp.txt')
+    else:
+      self.priming_template_file = self._find_template(
+          self.agent_templare_dir, 'prototyper-priming.txt')
+
     self.cpp_priming_filler_file = self._find_template(
         template_dir, 'cpp-specific-priming-filler.txt')
     self.problem_template_file = self._find_template(template_dir,
@@ -568,11 +576,13 @@ def build(self,
             example_pair: list[list[str]],
             project_example_content: Optional[list[list[str]]] = None,
             project_context_content: Optional[dict] = None,
-            tool_guides: str = '') -> prompts.Prompt:
+            tool_guides: str = '',
+            project_dir: str = '') -> prompts.Prompt:
     """Constructs a prompt using the templates in |self| and saves it."""
     if not self.benchmark:
       return self._prompt
     priming = self._format_priming(self.benchmark)
+    priming = priming.replace('{PROJECT_DIR}', project_dir)
     final_problem = self.format_problem(self.benchmark.function_signature)
     final_problem += (f'You MUST call <code>\n'
                       f'{self.benchmark.function_signature}\n'
@@ -585,6 +595,54 @@ def build(self,
     return self._prompt
 
 
+class PrototyperFixerTemplateBuilder(PrototyperTemplateBuilder):
+  """Builder for prompts that fix fuzz targets that failed to build."""
+
+  def __init__(self,
+               model: models.LLM,
+               benchmark: Benchmark,
+               build_result: BuildResult,
+               compile_log: str,
+               template_dir: str = DEFAULT_TEMPLATE_DIR,
+               initial: Any = None):
+    super().__init__(model, benchmark, template_dir, initial)
+    # Load templates.
+    self.priming_template_file = self._find_template(self.agent_templare_dir,
+                                                     'prototyper-fixing.txt')
+    self.build_result = build_result
+    self.compile_log = compile_log
+
+  def build(self,
+            example_pair: list[list[str]],
+            project_example_content: Optional[list[list[str]]] = None,
+            project_context_content: Optional[dict] = None,
+            tool_guides: str = '',
+            project_dir: str = '') -> prompts.Prompt:
+    """Constructs a prompt using the templates in |self| and saves it."""
+    del (example_pair, project_example_content, project_context_content,
+         tool_guides)
+    if not self.benchmark:
+      return self._prompt
+
+    if self.build_result.build_script_source:
+      build_text = (f'<build script>\n{self.build_result.build_script_source}\n'
+                    '</build script>')
+    else:
+      build_text = 'Build script reuses `/src/build.bk.sh`.'
+
+    prompt = self._get_template(self.priming_template_file)
+    prompt = prompt.replace('{FUZZ_TARGET_SOURCE}',
+                            self.build_result.fuzz_target_source)
+    prompt = prompt.replace('{BUILD_TEXT}', build_text)
+    prompt = prompt.replace('{COMPILE_LOG}', self.compile_log)
+    prompt = prompt.replace('{FUNCTION_SIGNATURE}',
+                            self.benchmark.function_signature)
+    prompt = prompt.replace('{PROJECT_DIR}', project_dir)
+    self._prompt.append(prompt)
+
+    return self._prompt
+
+
 class DefaultJvmTemplateBuilder(PromptBuilder):
   """Default builder for JVM projects."""
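
The fixing builder fills its template with plain `str.replace` calls instead of `str.format`. That is a sensible choice here because C/C++ fuzz target source is full of literal braces that `str.format` would try to interpret. A standalone sketch of the same substitution follows; `FIXING_TEMPLATE` is a shortened, hypothetical stand-in for `prototyper-fixing.txt`, and `fill_fixing_prompt` is an illustrative helper, not part of the repository.

```python
# Shortened stand-in for the real template file (assumption for this sketch).
FIXING_TEMPLATE = (
    '<fuzz target>\n{FUZZ_TARGET_SOURCE}\n</fuzz target>\n'
    '{BUILD_TEXT}\n'
    '<compilation log>\n{COMPILE_LOG}\n</compilation log>\n'
    'Call {FUNCTION_SIGNATURE}; sources are in `{PROJECT_DIR}/`.\n')


def fill_fixing_prompt(template: str, fuzz_target_source: str,
                       build_script_source: str, compile_log: str,
                       function_signature: str, project_dir: str) -> str:
  """Substitutes each placeholder in turn, mirroring the builder's logic."""
  if build_script_source:
    build_text = f'<build script>\n{build_script_source}\n</build script>'
  else:
    # No generated script: point the LLM at the backed-up default instead.
    build_text = 'Build script reuses `/src/build.bk.sh`.'
  replacements = {
      '{FUZZ_TARGET_SOURCE}': fuzz_target_source,
      '{BUILD_TEXT}': build_text,
      '{COMPILE_LOG}': compile_log,
      '{FUNCTION_SIGNATURE}': function_signature,
      '{PROJECT_DIR}': project_dir,
  }
  prompt = template
  for placeholder, value in replacements.items():
    prompt = prompt.replace(placeholder, value)
  return prompt


example_prompt = fill_fixing_prompt(
    FIXING_TEMPLATE,
    fuzz_target_source='int LLVMFuzzerTestOneInput(...) { return 0; }',
    build_script_source='',
    compile_log='error: unknown type name',
    function_signature='int parse(const char *)',
    project_dir='/src/example')
print(example_prompt)
```

One caveat of sequential replacement, shared with the original code: if an earlier value (for example the fuzz target source) happens to contain a later placeholder such as `{COMPILE_LOG}`, that text is substituted as well.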

prompts/agent/prototyper-fixing.txt

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+Failed to build fuzz target. Here is the fuzz target, build script, compilation command, and compilation output:
+<fuzz target>\n{FUZZ_TARGET_SOURCE}\n</fuzz target>
+{BUILD_TEXT}
+<compilation log>\n{COMPILE_LOG}\n</compilation log>
+YOU MUST first analyze the error messages with the fuzz target and the build script carefully to identify the root cause.
+YOU MUST NOT make any assumptions of the source code or build environment. Always confirm assumptions with source code evidence, obtain them via Bash commands.
+Once you are absolutely certain of the error root cause, output the FULL SOURCE CODE of the fuzz target (and FULL SOURCE CODE of build script, if /src/build.bk.sh is insufficient).
+TIPS:
+1. If necessary, #include necessary headers and #define required macros or constants in the fuzz target.
+2. Adjust compiler flags to link required libraries in the build script.
+3. After collecting information, analyzing and understanding the error root cause. YOU MUST take at least one step to validate your theory with source code evidence.
+4. Always use the source code from project source code directory `{PROJECT_DIR}/` to understand errors and how to fix them. For example, search for the key words (e.g., function name, type name, constant name) in the source code to learn how they are used. Similarly, learn from the other fuzz targets and the build script to understand how to include the correct headers.
+5. Once you have verified the error root cause, output the FULL SOURCE CODE of the fuzz target (and FULL SOURCE CODE of build script, if /src/build.bk.sh is insufficient).
+6. Focus on writing a compilable fuzz target that calls the function-under-test {FUNCTION_SIGNATURE}, don't worry about coverage or finding bugs. We can improve that later, but first try to ensure it calls the function-under-test {FUNCTION_SIGNATURE} and can compile successfully.
+7. If an error happens repeatedly and cannot be fixed, try to mitigate it. For example, replace or remove the line.
+
Lines changed: 141 additions & 0 deletions
@@ -0,0 +1,141 @@
+<system>
+As a security testing engineer, you must write an `int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)` fuzz target in {LANGUAGE}.
+Objective: Your goal is to modify an existing fuzz target `{FUZZ_TARGET_PATH}` to write a minimum fuzz target of a given function-under-test that can build successfully.
+</system>
+
+<steps>
+Follow these steps to write a minimum fuzz target:
+
+Step 1. Determine the information you need to write an effective fuzz target.
+This includes:
+  * **Source code** of the function under test.
+  * **Custom Types and Dependencies** definitions and implementations.
+  * **Initialization and setup** requirements and steps.
+  * **Build details** and integration steps.
+  * Valid and edge-case input values.
+  * Environmental and runtime dependencies.
+
+Step 2. Collect information using the Bash tool.
+Use the bash tool (see <tool> section) and follow its rules to gather the necessary information. You can collect information from:
+  * The existing human written fuzz target at `{FUZZ_TARGET_PATH}`.
+  * The existing human written build script `/src/build.bk.sh`.
+  * The project source code directory `{PROJECT_DIR}/` cloned from the project repository.
+  * Documentation about the project, the function, and the variables/constants involved.
+  * Environment variables.
+  * Knowledge about OSS-Fuzz's build infrastructure: It will compile your fuzz target in the same way as the existing human written fuzz target with the build script.
+
+Step 3. Analyze the function and its parameters.
+Understand the function under test by analyzing its source code and documentation:
+  * **Purpose and functionality** of the function.
+  * **Input processing** and internal logic.
+  * **Dependencies** on other functions or global variables.
+  * **Error handling** and edge cases.
+
+Step 4. Understand initialization requirements.
+Identify what is needed to properly initialize the function:
+  * **Header files** and their relative paths used by include statements in the fuzz target.
+  * **Complex input parameters or objects** initialization.
+  * **Constructor functions** or initialization routines.
+  * **Global state** or configuration needs to be set up.
+  * **Mocking** external dependencies if necessary.
+
+Step 5. Understand Constraints and edge cases.
+For each input parameter, understand:
+  * Valid ranges and data types.
+  * Invalid or edge-case values (e.g., zero, NULL, predefined constants, maximum values).
+  * Special values that trigger different code paths.
+
+Step 6: Plan Fuzz Target Implementation.
+Decide how to implement the fuzz target:
+  * **Extract parameters** from the `data` and `size` variable of `LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)`.
+  * Handle fixed-size versus variable-size data.
+  * **Initialize function's parameters** by appropriately mapping the raw input bytes.
+  * Ensure that the fuzz target remains deterministic and avoids side effects.
+  * Avoid `goto` statements.
+
+Step 7: **Write** the fuzz target code.
+Implement the `LLVMFuzzerTestOneInput` function:
+  * Header files:
+    * Investigate how existing fuzz targets include headers.
+    * Investigate where they are located in the project
+    * Collect all headers required by your fuzz target and their locations.
+    * Include their relative path in the same way as the existing fuzz targets.
+  * Macros or Constants:
+    * Include or define necessary macros or constants.
+  * Input Handling:
+    * Check that the input size is sufficient.
+    * Extract parameters from the input data.
+    * Handle any necessary conversions or validations.
+  * Function Invocation:
+    * Initialize required objects or state.
+    * Modify the existing fuzz target at `{FUZZ_TARGET_PATH}` to fuzz the function under test with the fuzzed parameters.
+    * Ensure proper error handling.
+    *
+  * Cleanup:
+    * Free any allocated resources.
+    * Reset any global state if necessary.
+
+Step 8 (Optional): **Modify** the Build Script.
+Write a new build script only if the existing one (`/src/build.bk.sh`) is insufficient:
+  * Decide if you need to modify the build script at `/src/build.bk.sh` to successfully build the new fuzz target.
+  * Include compilation steps for the project under test.
+  * Include compilation steps for the new fuzz target.
+  * Specify necessary compiler and linker flags.
+  * Ensure all dependencies are correctly linked.
+
+Step 9: Providing Your Conclusion:
+  * Provide your conclusion on the FULL new fuzz target and build script **ONLY AFTER** you have gathered all necessary information.
+  * **DO NOT SEND** any other content (e.g., bash tool commands) in the conclusion message. ALWAYS send other commands individually and ONLY SEND conclusion after collecting all information.
+  * Conclusion Format:
+    * Overall Description:
+      * Summarize your findings and describe your fuzz target design.
+      * Wrap this summary within <conclusion> and </conclusion> tags.
+    * Modified Fuzz Target:
+      * Provide the full code of the modified fuzz target.
+      * Wrap the code within <fuzz target> and </fuzz target> tags.
+    * Modified Build Script (if applicable):
+      * If you need to modify the build script, provide the full code.
+      * Wrap it within <build script> and </build script> tags.
+  * Format Example:
+    <conclusion>
+    I determined that the fuzz target needs to include specific header files and adjust the `LLVMFuzzerTestOneInput` function to call the new function-under-test. Additionally, the build script requires modification to link against the necessary libraries.
+    </conclusion>
+    <fuzz target>
+    [Your FULL fuzz target code here.]
+    </fuzz target>
+    <build script>
+    [Your FULL build script code here, if applicable.]
+    </build script>
+
+</steps>
112+
{TYPE_SPECIFIC_PRIMING}
113+
114+
<instructions>
115+
3. Methodical Approach:
116+
* Be systematic to cover all necessary aspects, such as:
117+
* Understanding the function's parameters and dependencies.
118+
* Identifying required header files and libraries.
119+
* Recognizing any special initialization or environmental requirements.
120+
1. Utilizing Existing Examples:
121+
* Use the existing fuzz target at `{FUZZ_TARGET_PATH}` and other fuzz targets with `LLVMFuzzerTestOneInput` in its parent directory as references.
122+
* Pay special attention to:
123+
* How header files are included.
124+
* The structure and content of the `LLVMFuzzerTestOneInput` function.
125+
* Typically, you only need to modify the content of `LLVMFuzzerTestOneInput`.
126+
2. Investigating Header Inclusions:
127+
* Use bash tool to find required headers and libraries.
128+
* Examine library files built by `/src/build.bk.sh` to understand available functions and symbols.
129+
3. Modifying the Build Script (if necessary):
130+
* Modifying `/src/build.bk.sh` to build the necessary components or include required libraries if function-under-test is not included.
131+
* The project's directory may contain a `README.md` with build instructions (e.g., at `/src/<project-name>/README.md`
132+
4. Do Not Compile:
133+
* **Do not compile** the fuzz target during your investigation.
134+
* Provide your conclusions based on the information gathered after you have a solution.
135+
5. Formatting Code Snippets:
136+
* Do not wrap code snippets with triple backticks (```).
137+
* Use the specified XML-style tags for wrapping code and other content.
138+
6. DO NOT send the <conclusion> early: Provide conclusions **only after** gathering all necessary information.
139+
7. Focus on Final Goals:
140+
* Ensure that your fuzz target and build script aim to successfully build the fuzz target and fuzz the function-under-test.
141+
</instructions>
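
The priming prompt asks the LLM to wrap its answer in `<conclusion>`, `<fuzz target>`, and `<build script>` tags, with the build script being optional. The sketch below shows one way such a response could be split back into its parts. It is not the parser used by the agent: `extract_tag` and the sample `response` are hypothetical and exist only to illustrate the expected format.

```python
import re
from typing import Optional


def extract_tag(response: str, tag: str) -> Optional[str]:
  """Returns the stripped text between <tag> and </tag>, or None if absent."""
  pattern = re.compile(rf'<{re.escape(tag)}>(.*?)</{re.escape(tag)}>',
                       re.DOTALL)
  match = pattern.search(response)
  return match.group(1).strip() if match else None


# Illustrative LLM reply that follows the requested format (made-up content).
response = ('<conclusion>\nThe target needs one extra header.\n</conclusion>\n'
            '<fuzz target>\n'
            'int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {\n'
            '  return 0;\n'
            '}\n'
            '</fuzz target>\n')

conclusion = extract_tag(response, 'conclusion')
fuzz_target = extract_tag(response, 'fuzz target')
build_script = extract_tag(response, 'build script')

# A missing build script means the default /src/build.bk.sh is reused.
print(conclusion)
print(build_script is None)
```

Treating an absent `<build script>` block as "reuse the default" lines up with the fallback text `Build script reuses /src/build.bk.sh.` that the fixing prompt builder emits.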
