Finetune and Evaluate Libra on Custom Datasets

Training/Validation Dataset Format

Convert your training and validation data to a JSON file containing a single list of samples. Each sample should contain id (a unique identifier), image (a list of one or more image paths), and conversations (the conversation turns between the human and the AI).

Here's a sample JSON for finetuning Libra for Radiology Report Generation:

[
    {
        "id": 12345678,
        "image": [
            "files/p19/p19586697/s50637770/efb2c222-0fe78b2f-2bd67556-d10e01d8-72e87669.jpg",
            "files/p19/p19586697/s50637770/efb2c222-0fe78b2f-2bd67556-d10e01d8-72e87669.jpg"
        ],
        "conversations": [
            {
                "from": "human",
                "value": "<image>\nProvide a detailed description of the findings in the radiology image. Following clinical context: History: Chest tightness."
            },
            {
                "from": "gpt",
                "value": "The heart and mediastinum are normal. The lung fields are clear. The costophrenic angles are sharp. No infiltrates are present. There is no evidence of a pneumothorax."
            }
        ]
    },
    ...
]
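A file in this layout can be assembled with Python's standard json module. The following is a minimal sketch, not part of Libra: make_sample and write_training_file are hypothetical helper names, and the shortened report text is only an example value.

```python
import json


def make_sample(sample_id, image_paths, prompt, report):
    """Build one training sample in the layout shown above (hypothetical helper)."""
    return {
        "id": sample_id,
        "image": list(image_paths),
        "conversations": [
            # The human turn begins with the <image> placeholder, as in the sample.
            {"from": "human", "value": "<image>\n" + prompt},
            {"from": "gpt", "value": report},
        ],
    }


def write_training_file(samples, path):
    """Write all samples to one JSON file as a single list."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(samples, f, indent=4, ensure_ascii=False)


# Example values taken from the sample above.
image_path = "files/p19/p19586697/s50637770/efb2c222-0fe78b2f-2bd67556-d10e01d8-72e87669.jpg"
sample = make_sample(
    12345678,
    [image_path, image_path],
    "Provide a detailed description of the findings in the radiology image. "
    "Following clinical context: History: Chest tightness.",
    "The heart and mediastinum are normal. The lung fields are clear.",
)
```

Calling write_training_file([sample], "train.json") would then produce a file with the structure shown above; the file name is a placeholder.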

If a sample has only one image, pass a single path in the "image" list:

[   
    {
        ...
        "image": [
            "files/p19/p19586697/s50637770/efb2c222-0fe78b2f-2bd67556-d10e01d8-72e87669.jpg"
        ]
        ...
    } 
]
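Before training, it can help to check that every sample carries the three fields described above and that "image" holds at least one path. The sketch below is one possible check. The rules it enforces are an assumption based on the samples in this document, not the checks performed by Libra's own data loader, and validate_training_sample is a hypothetical name.

```python
def validate_training_sample(sample):
    """Return a list of problems found in one training sample (empty if none)."""
    problems = []

    # All three top-level fields from the format description must be present.
    for key in ("id", "image", "conversations"):
        if key not in sample:
            problems.append(f"missing key: {key}")

    # "image" is a list with one or more string paths.
    if "image" in sample:
        images = sample["image"]
        if not isinstance(images, list) or not images:
            problems.append("image must be a non-empty list of paths")
        elif not all(isinstance(p, str) for p in images):
            problems.append("every image entry must be a string path")

    # Assumption: a sample starts with a human turn followed by a gpt turn.
    if "conversations" in sample:
        turns = sample["conversations"]
        roles = []
        if isinstance(turns, list):
            roles = [t.get("from") for t in turns if isinstance(t, dict)]
        if roles[:2] != ["human", "gpt"]:
            problems.append("conversations must start with a human turn, then a gpt turn")

    return problems
```

Running this over every entry of the loaded JSON list and printing any non-empty result is usually enough to catch missing keys or empty image lists early.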

Evaluation Dataset Format

Convert your evaluation data to a JSON Lines (JSONL) file with one sample per line. Each sample should contain question_id (a unique identifier), image (a list of one or more image paths), and text (the prompt or question to be answered by the model). The reference field (the ground-truth answer) is optional and can be used for scoring evaluations.

Here's a sample JSONL for evaluating Libra for Radiology Report Generation:

{"question_id": 12345678, "image": ["files/p19/p19586697/s50637770/efb2c222-0fe78b2f-2bd67556-d10e01d8-72e87669.jpg", "files/p19/p19586697/s50637770/efb2c222-0fe78b2f-2bd67556-d10e01d8-72e87669.jpg"], "text": "Provide a detailed description of the findings in the radiology image. Following clinical context: History: Chest tightness.", "reference": "The heart and mediastinum are normal. The lung fields are clear. The costophrenic angles are sharp. No infiltrates are present. There is no evidence of a pneumothorax."}
...
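If your data is already in the training format, the evaluation file can be derived from it. The sketch below assumes, based only on the two samples in this document, that the evaluation text is the human prompt without the leading "<image>\n" placeholder and that the gpt turn serves as the reference. to_eval_record and write_jsonl are hypothetical helpers, not part of Libra.

```python
import json

IMAGE_PLACEHOLDER = "<image>\n"


def to_eval_record(sample):
    """Map one training-format sample to one evaluation-format record."""
    prompt = sample["conversations"][0]["value"]
    answer = sample["conversations"][1]["value"]
    # Assumption: the evaluation prompt omits the leading <image> placeholder.
    if prompt.startswith(IMAGE_PLACEHOLDER):
        prompt = prompt[len(IMAGE_PLACEHOLDER):]
    return {
        "question_id": sample["id"],
        "image": sample["image"],
        "text": prompt,
        "reference": answer,  # optional field, used only for scoring
    }


def write_jsonl(records, path):
    """Write one JSON object per line, as the JSONL format requires."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Each line of the resulting file is an independent JSON object, so the file can be read back line by line with json.loads.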

Tips

Finetuning with Limited Data

If you have limited task-specific data, we recommend finetuning from Libra checkpoints using LoRA. Follow the finetune_lora.sh script.

Finetuning with Sufficient Data

If you have sufficient task-specific data, you can perform full-model finetuning from Libra checkpoints. Follow the finetune.sh script.

Hyperparameter Adjustment

Adjust the hyperparameters (for example batch size, learning rate, and number of training epochs) to fit your dataset size and hardware constraints.
