Improved Generalized Planning with LLMs through Strategy Refinement and Reflection

Code and data for the paper Improved Generalized Planning with LLMs through Strategy Refinement and Reflection, accepted at ICAPS 2026 (the ICAPS version will be uploaded to arXiv soon).

A previous version of the paper was presented at the Workshop on Planning in the Era of LLMs @ ICAPS25.
The code for the previous version of the paper is available as release v1.0.

See also the project website for an overview.

Prerequisites

The current version of the code is set up for:

OpenAI models (tested with OpenAI non-reasoning models)
DeepSeek reasoning models
Local reasoning and non-reasoning models that are run as a server using sglang

Requires

the Python packages specified in requirements.txt
CUDA
sglang

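Create a file 'set_env.py' in the utils folder with the following content, inserting your own API keys:

import os

def set_env_vars():
    # Replace the placeholder strings with your own API keys.
    os.environ['OPENAI_API_KEY'] = '<your OpenAI API key>'
    os.environ['DEEPSEEK_API_KEY'] = '<your DeepSeek API key>'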

Code tested with Python 3.10 on Linux

Requires compiled versions of the plan validator VAL and the Fast Downward planner.
Set the FASTDOWNWARD and VAL path variables in the ./utils/paths.py script accordingly (lines 7-8).
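
Before running the pipeline it can help to check that both tools are actually located where the configured paths point. The following is a minimal sketch, assuming FASTDOWNWARD and VAL are plain path strings; the helper missing_tools and the example paths are hypothetical and not part of the repository.

```python
from pathlib import Path


def missing_tools(tool_paths):
    """Return the names of tools whose configured path does not exist."""
    return [name for name, path in tool_paths.items() if not Path(path).exists()]


if __name__ == "__main__":
    # Hypothetical locations; replace with the values set in ./utils/paths.py.
    configured = {
        "FASTDOWNWARD": "/opt/downward/fast-downward.py",
        "VAL": "/opt/VAL/validate",
    }
    for name in missing_tools(configured):
        print(f"{name} path does not exist: {configured[name]}")
```

An empty result means every configured path exists; it does not check that the binaries were compiled correctly.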

Data Availability

All data will be made available in a form that prevents it from being easily crawled for training new LLMs. The exact form has not been decided yet. To get access to the data in the meantime, contact [email protected]

How to run

Generating Generalized Plans

python run_pipeline.py --env [dataset]-[domain] --config [config_path]

env: dataset name (i.e. subfolder of the ./data directory) and name of the domain, separated by '-', e.g. 'silver-ferry'
config_path: path to the configuration .json file
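
To illustrate the [dataset]-[domain] format, the sketch below splits an env string into its two parts. It assumes the split happens at the first '-'; how run_pipeline.py parses the argument internally is not documented here, and the function name split_env is hypothetical.

```python
def split_env(env):
    """Split '[dataset]-[domain]' at the first '-' into (dataset, domain)."""
    dataset, sep, domain = env.partition("-")
    if not sep or not dataset or not domain:
        raise ValueError(f"expected '[dataset]-[domain]', got {env!r}")
    return dataset, domain


print(split_env("silver-ferry"))  # ('silver', 'ferry')
```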

See also scripts in the sh_scripts_generation folder.
To generate additional sh_scripts run: python create_sh_scripts.py -d [dataset]-[domain] --conf [config_dir] --sh [sh_dir]
This will generate a file in the sh_dir directory with the commands for running the pipeline with all configuration files in the config_dir.
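
The idea behind such a generated script can be sketched as follows: one run_pipeline.py command per configuration file in a directory. This is only an illustration and not the implementation of create_sh_scripts.py; the function name pipeline_commands and the assumption that every .json file in the directory is a configuration file are hypothetical.

```python
from pathlib import Path


def pipeline_commands(env, config_dir):
    """Build one run_pipeline.py command per .json file in config_dir."""
    configs = sorted(Path(config_dir).glob("*.json"))
    return [
        f"python run_pipeline.py --env {env} --config {cfg.as_posix()}"
        for cfg in configs
    ]


if __name__ == "__main__":
    # './configs' is an assumed location for the configuration files.
    for command in pipeline_commands("silver-ferry", "./configs"):
        print(command)
```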

Configuration Files

Baseline

baseline.json: our baseline

Full framework

full_3_6.json: F3-6
full_5_3.json: F5-3

Ablations

full_no_code_sr.json: no self-reflection during code debugging
ful_no_multicode.json: only one initial program version
full_no_strat_debug.json: no debugging on strategy

Evaluating Generalized Plans

python run_evaluation.py --env [dataset]-[domain] --eval_env [dataset_ev]-[domain_ev] --config [config_path] --out [output file]

env: dataset and domain name in the form [dataset]-[domain]; needs to match the env used when generating the generalized plans
eval_env: dataset and domain name, in the same form, of the data to use for evaluation
config_path: path to the evaluation configuration .json file
output file: the name of the file where the results are saved; output directory is defined in the config file

Note on env vs. eval_env

env determines the name of the subfolders in the output directory from which the generalized plans are read
eval_env determines in which subfolders of the ./data directory the evaluation data can be found
the domains should be the same but can come from different datasets, e.g. --env silver-ferry --eval_env additional-ferry
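
The constraint that env and eval_env share a domain can be checked before starting an evaluation. A minimal sketch, assuming the domain is the text after the first '-'; the helper names and the domain name 'gripper' are hypothetical and used only for illustration.

```python
def domain_of(env):
    """Return the domain part of '[dataset]-[domain]' (text after the first '-')."""
    return env.partition("-")[2]


def same_domain(env, eval_env):
    """True if both names refer to the same non-empty domain."""
    domain = domain_of(env)
    return bool(domain) and domain == domain_of(eval_env)


print(same_domain("silver-ferry", "additional-ferry"))  # True
print(same_domain("silver-ferry", "silver-gripper"))    # False
```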

See also the scripts in the sh_scripts_evaluation folder

Configuration Files

See ./eval_configs/eval_config_all.json

experiments_output_folder: folder with the output files of the program generation, i.e. with the programs to test
experiments_results_folder: folder with the corresponding result files from running the pipeline
experiments_names: names of the approaches, i.e. sub-folders in the outputs and results folder that should be considered
split_file: name of the .json file in the data directory that specifies which tasks to run the evaluation on
eval_split_name: the key to extract from that .json file
val_split_file: name of the .json file that specifies the data splits used for generating the programs
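
As a rough orientation, an evaluation configuration with these keys could be built and checked as sketched below. Only the key names are taken from the list above; every value is a placeholder, and the assumption that experiments_names is a list of strings is not confirmed here. Consult ./eval_configs/eval_config_all.json for the actual format.

```python
import json

# All values below are placeholders; only the key names come from the list above.
eval_config = {
    "experiments_output_folder": "./outputs",
    "experiments_results_folder": "./results",
    "experiments_names": ["baseline", "full_3_6"],
    "split_file": "splits.json",
    "eval_split_name": "test",
    "val_split_file": "splits.json",
}

DOCUMENTED_KEYS = set(eval_config)


def missing_keys(config):
    """Return the documented keys that are absent from a config dict, sorted."""
    return sorted(DOCUMENTED_KEYS - set(config))


print(json.dumps(eval_config, indent=2))
```

Calling missing_keys on a dict loaded with json.load gives a quick check that no documented key was forgotten.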

Using a generalized plan to create a plan for a specific instance

python generate_plan_for_instance.py [-t timeout] [-p] domain_file instance_file code_file output_path log_path

timeout: Timeout in seconds, set with -t (default: 45)
-p, --print: Set this flag to print the log file contents to the console
domain_file: Path to the domain file for the relevant domain
instance_file: Path to the problem instance file
code_file: Path to the file with the generated python program
output_path: Filepath to store the plan at
log_path: Filepath to store the log notes at
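
When the script is called from other Python code, the argument order above can be captured in a small helper. This is a sketch under the assumption that the positional arguments are passed exactly in the order of the usage line; the function build_command and all file names are hypothetical placeholders.

```python
import shlex


def build_command(domain_file, instance_file, code_file, output_path, log_path,
                  timeout=None, print_log=False):
    """Assemble the argument list for generate_plan_for_instance.py."""
    command = ["python", "generate_plan_for_instance.py"]
    if timeout is not None:
        command += ["-t", str(timeout)]  # omit to use the default timeout
    if print_log:
        command.append("-p")
    command += [domain_file, instance_file, code_file, output_path, log_path]
    return command


if __name__ == "__main__":
    # All file paths here are hypothetical placeholders.
    cmd = build_command("domain.pddl", "p01.pddl", "program.py",
                        "plan.txt", "log.txt", timeout=60, print_log=True)
    print(shlex.join(cmd))
```

The resulting list can be passed directly to subprocess.run, which avoids shell quoting problems with paths that contain spaces.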
