This is the official codebase for Context-Informed Grounding Supervision (CINGS). It includes the training and inference code for reproducing our main experiments.
Install dependencies using:

```bash
pip install -r requirements.txt
```

We use a filtered version of the Self-RAG dataset for training.
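If you want to start from the raw data, the original Self-RAG training set is available on the Hugging Face Hub. A minimal loading sketch, assuming the `datasets` library and the public `selfrag/selfrag_train_data` release (our filtered version is derived from it):

```python
# Sketch: fetch the original Self-RAG training data from the Hugging Face Hub.
# Assumes the public `selfrag/selfrag_train_data` release; the filtered
# version used in our experiments is derived from this data.
from datasets import load_dataset

selfrag = load_dataset("selfrag/selfrag_train_data", split="train")
print(len(selfrag), "examples; fields:", list(selfrag[0].keys()))
```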
Model checkpoints will be released soon. Stay tuned!
To train using LLaMA 3.1 8B as the base model:
```bash
torchrun --nnodes 1 --master_port=29100 --nproc_per_node 8 train.py \
  --enable_fsdp --low_cpu_fsdp \
  --training_argument configs/training_configs/llama3_train.json \
  --model_name meta-llama/Llama-3.1-8B \
  --token_name meta-llama/Llama-3.1-8B-Instruct \
  --num_epochs 3 \
  --dataset llava_llama3_selfrag_single_dataset \
  --dist_checkpoint_folder llama3_basemodel \
  --batch_size_training 128 \
  --micro_batch_size 16 \
  --loss_mask_context context \
  --model_use_peft
```

Argument descriptions:
- `--training_argument`: Config matching the base model name; see the files under `configs/training_configs`.
- `--model_name`: Base model to fine-tune.
- `--token_name`: Tokenizer name (the Instruct version, for chat-template compatibility).
- `--dataset`: Training dataset, aligned with your base model; see the list of datasets in `configs/datasets_dpr.py`.
- `--dist_checkpoint_folder`: Folder in which to save checkpoints.
- `--loss_mask_context`: Loss masking over the prepended context (see the sketch after this list). Choose from:
  - `no_context`: standard instruction tuning
  - `context`: CINGS (ours)
  - `no_mask`: CINGS without context masking
- `--model_use_peft`: Use LoRA for parameter-efficient fine-tuning (remove this flag to train all parameters).
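For intuition, here is a minimal sketch of what the three `--loss_mask_context` settings amount to, assuming the standard Hugging Face convention that label positions set to -100 are ignored by the cross-entropy loss. `build_labels` is a hypothetical helper for illustration, not the repo's actual code:

```python
# Hypothetical sketch of the three --loss_mask_context modes (not the
# repo's actual implementation). Positions labeled -100 are ignored by
# PyTorch's cross-entropy loss, so they are seen as input but never scored.
IGNORE_INDEX = -100

def build_labels(context_ids, prompt_ids, response_ids, mode="context"):
    """Return (input_ids, labels) for one training example."""
    input_ids = context_ids + prompt_ids + response_ids
    if mode == "no_context":   # standard instruction tuning: context dropped
        input_ids = prompt_ids + response_ids
        labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    elif mode == "context":    # CINGS: context conditions the model, unscored
        labels = [IGNORE_INDEX] * (len(context_ids) + len(prompt_ids)) + response_ids
    elif mode == "no_mask":    # ablation: context tokens also receive loss
        labels = context_ids + [IGNORE_INDEX] * len(prompt_ids) + response_ids
    else:
        raise ValueError(f"unknown mode: {mode}")
    return input_ids, labels
```

In all three modes the loss is computed only where the labels differ from -100, so under `context` the model conditions on the document without being trained to reproduce it.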
After training the language model, use the official LLaVA repo for vision-language alignment.
Update the following scripts:
- `scripts/pretrain.sh`
- `scripts/finetune.sh`

In both scripts, replace `model_name_or_path` with the checkpoint folder from the text-only training step (the value you passed as `--dist_checkpoint_folder`, e.g. `llama3_basemodel` above).
```bash
CUDA_VISIBLE_DEVICES=0 accelerate launch inference.py \
  --training_argument {training_argument}.json \
  --dataset {dataset} \
  --dist_checkpoint_folder {dist_checkpoint_folder} \
  --val_batch_size 1 \
  --add_docs \
  --model_use_peft
```

Use the same arguments as in training; only `--dataset` should be updated to point to your evaluation dataset (see `configs/datasets.py` for available options).
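With `--add_docs`, the retrieved documents are prepended to each prompt so that inference mirrors the training-time format. The exact template is defined per dataset in the configs; the following is only an illustrative sketch of the context-first layout:

```python
# Illustrative only: the real prompt template comes from the dataset config.
# This just shows the context-first layout implied by --add_docs.
def grounded_prompt(documents: list[str], instruction: str) -> str:
    context = "\n\n".join(documents)      # retrieved evidence, concatenated
    return f"{context}\n\n{instruction}"  # context precedes the instruction

print(grounded_prompt(
    ["Paris is the capital of France."],
    "What is the capital of France?",
))
```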
We follow the evaluation process from the official LLaVA repo. See the evaluation guide for details.
If you use this work, please cite:
```bibtex
@misc{lee2025contextinformedgroundingsupervision,
  title={Context-Informed Grounding Supervision},
  author={Hyunji Lee and Seunghyun Yoon and Yunjae Won and Hanseok Oh and Geewook Kim and Trung Bui and Franck Dernoncourt and Elias Stengel-Eskin and Mohit Bansal and Minjoon Seo},
  year={2025},
  eprint={2506.15480},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2506.15480}
}
```

This repository builds on Meta's LLaMA Recipes. We are grateful to the community and all contributors.