Skip to content

Latest commit

 

History

History
72 lines (54 loc) · 3.12 KB

File metadata and controls

72 lines (54 loc) · 3.12 KB

Finetune and Evaluation Libra on Custom Datasets

Training/Validation Dataset Format

Convert your train/valid data to a JSON file of a List of all samples. Sample metadata should contain id (a unique identifier), image (the path to the image or images), and conversations (the conversation data between human and AI).

Here's a sample JSON for finetuning Libra for Radiology Report Generation:

[
    {
        "id": 12345678,
        "image": [
            "files/p19/p19586697/s50637770/efb2c222-0fe78b2f-2bd67556-d10e01d8-72e87669.jpg",
            "files/p19/p19586697/s50637770/efb2c222-0fe78b2f-2bd67556-d10e01d8-72e87669.jpg"
        ],
        "conversations": [
            {
                "from": "human",
                "value": "<image>\nProvide a detailed description of the findings in the radiology image. Following clinical context: History: Chest tightness."
            },
            {
                "from": "gpt",
                "value": "The heart and mediastinum are normal. The lung fields are clear. The costophrenic angles are sharp. No infiltrates are present. There is no evidence of a pneumothorax."
            }
        ]
    },
    ...
]
If there is only one image, you only need to pass a single path to the "image".
[   
    {
        ...
        "image": [
            "files/p19/p19586697/s50637770/efb2c222-0fe78b2f-2bd67556-d10e01d8-72e87669.jpg"
        ]
        ...
    } 
]

Evaluation Dataset Format

Convert your eval data to a JSON Line file of a List of all samples. Sample metadata should contain question_id (a unique identifier), image (the path to the image or images), and text (the prompt or question to be answered by the model). The reference (the ground truth answer) is optional and can be used for scoring evaluations.

Here's a sample JSONL for evaluating Libra for Radiology Report Generation:

{"question_id": 12345678, "image": ["files/p19/p19586697/s50637770/efb2c222-0fe78b2f-2bd67556-d10e01d8-72e87669.jpg", "files/p19/p19586697/s50637770/efb2c222-0fe78b2f-2bd67556-d10e01d8-72e87669.jpg"], "text": "Provide a detailed description of the findings in the radiology image. Following clinical context: History: Chest tightness.", "reference": "The heart and mediastinum are normal. The lung fields are clear. The costophrenic angles are sharp. No infiltrates are present. There is no evidence of a pneumothorax."}
...

Tips

Finetuning with Limited Data

If you have limited task-specific data, we recommend finetuning from Libra checkpoints using LoRA. Follow this finetune_lora.sh script.

Finetuning with Sufficient Data

If you have sufficient task-specific data, you can perform full-model finetuning from Libra checkpoints. Follow this finetune.sh script.

Hyperparameter Adjustment

Adjust the hyperparameters to fit your specific dataset and hardware constraints.