StarFlow: Generating Structured Workflow Outputs From Sketch Images

StarFlow converts sketches and diagrams into structured workflows using fine-tuned vision–language models and a purpose-built dataset.

Setup

  1. Clone the repository
git clone https://github.com/ServiceNow/StarFlow.git
cd StarFlow
  2. Edit ~/.secret (create a new file if it does not exist)
export HF_TOKEN=<HF_TOKEN>
export WANDB_API_KEY=<WANDB_API_KEY>
export OPENROUTER_API_KEY=<OPENROUTER_API_KEY>
export OUTPUT_DIR=<OUTPUT_DIR>
...
  3. Edit ~/.bashrc (create a new file if it does not exist)
source ~/.secret
...
  4. Install packages
# for Llama, Qwen, Pixtral, and API models
bash installer/default/install.sh
# for Phi-3.5 model
bash installer/phi35/install.sh
# for Phi-4 model
bash installer/phi4/install.sh
# for DeepSeek models
bash installer/deepseek/install.sh

Training and Evaluation Guide

Commands

  1. Training
torchrun \
    --nproc-per-node 2 \
    starflow/pipeline/train.py \
    dataset_config_file=starflow/config/dataset/bigdocs_sketch2flow.yaml \
    model_config_file=starflow/config/model/llama_32_11b.yaml \
    pipeline_config_file=starflow/config/pipeline/train.yaml
  2. Evaluation
torchrun \
    --nproc-per-node 2 \
    starflow/pipeline/evaluate.py \
    dataset_config_file=starflow/config/dataset/bigdocs_sketch2flow.yaml \
    model_config_file=starflow/config/model/llama_32_11b.yaml \
    pipeline_config_file=starflow/config/pipeline/evaluate.yaml
  3. Evaluation for very large models (e.g. Llama-3.2-90B-Vision-Instruct)
python \
    starflow/pipeline/evaluate.py \
    dataset_config_file=starflow/config/dataset/bigdocs_sketch2flow.yaml \
    model_config_file=starflow/config/model/llama_32_90b.yaml \
    pipeline_config_file=starflow/config/pipeline/evaluate.yaml
  4. Evaluation for API models (e.g. GPT-4o)
python \
    starflow/pipeline/evaluate_api.py \
    dataset_config_file=starflow/config/dataset/bigdocs_sketch2flow.yaml \
    model_config_file=starflow/config/model/gpt_4o.yaml \
    pipeline_config_file=starflow/config/pipeline/evaluate_api.yaml

Notes

  1. Other models can be trained and evaluated by pointing model_config_file at the corresponding config file under starflow/config/model/.

  2. Set the values in the relevant config files appropriately before running training or evaluation.

Concept Introduction

StarFlow consists of four types of components: datasets, metrics, models, and pipelines.

Datasets

Datasets provide vision-language data for training and evaluation. They are encapsulated as sub-classes of VLDataset. For example, the BigDocs datasets are encapsulated as BigDocsDataset.

When instantiating a dataset, its data examples are first loaded from either Hugging Face or local storage, and then encapsulated as VLExample.

Each dataset comes with a config file, which specifies the settings for instantiating and using the dataset. For example, the config file for ServiceNow/BigDocs-Sketch2Flow is starflow/config/dataset/bigdocs_sketch2flow.yaml.
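
The snippet below is a minimal illustration of this pattern, not the repository's actual implementation: the class and field names mirror the description above, but the real VLDataset and VLExample interfaces, as well as the dataset's column names, may differ.

# Illustrative sketch only; the real VLDataset / VLExample classes may look different.
from dataclasses import dataclass
from datasets import load_dataset  # Hugging Face `datasets` library

@dataclass
class SketchVLExample:
    image: object        # sketch image (e.g. a PIL.Image)
    target_text: str     # ground-truth structured workflow

class SketchBigDocsDataset:
    def __init__(self, hf_name: str, split: str = "train"):
        # Load raw rows from Hugging Face (or local storage), then wrap them as examples.
        self.rows = load_dataset(hf_name, split=split)

    def __len__(self) -> int:
        return len(self.rows)

    def __getitem__(self, index: int) -> SketchVLExample:
        row = self.rows[index]
        # Column names here are placeholders; check the dataset card for the real ones.
        return SketchVLExample(image=row["image"], target_text=row["annotation"])

# dataset = SketchBigDocsDataset("ServiceNow/BigDocs-Sketch2Flow")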

Metrics

Metrics compute performance numbers for models on datasets. They are encapsulated as sub-classes of VLMetric. For example, the Flow Similarity metric is encapsulated as FlowSimilarityMetric.

When using a metric to evaluate a model on a dataset, the metric compares the outputs of the model with the corresponding ground truths in the dataset and thereby obtains the performance numbers.

Each metric is applied to one or more datasets, and the settings for instantiating and using the metric are specified in the config files of the target datasets. For example, the settings for FlowSimilarityMetric are specified in the config file of ServiceNow/BigDocs-Sketch2Flow (starflow/config/dataset/bigdocs_sketch2flow.yaml).
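
As a hedged illustration of that comparison step (not the actual FlowSimilarityMetric, whose scoring of workflow structure is more involved), a toy metric following the same interface could look like this:

# Toy stand-in metric; the real FlowSimilarityMetric compares workflow structures in detail.
import json

class SketchExactFlowMatch:
    def compute(self, predictions: list[str], references: list[str]) -> float:
        # Fraction of predictions whose parsed JSON exactly equals the ground truth.
        correct = 0
        for prediction, reference in zip(predictions, references):
            try:
                correct += json.loads(prediction) == json.loads(reference)
            except json.JSONDecodeError:
                pass  # an unparseable prediction counts as wrong
        return correct / max(len(references), 1)

# score = SketchExactFlowMatch().compute(model_outputs, ground_truths)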

Models

Models generate textual outputs given vision-language inputs from datasets. They are encapsulated as sub-classes of VLModel, and their inputs are encapsulated as sub-classes of VLInput. For example, the Llama-3.2-Vision-Instruct models are encapsulated as LlamaModel, and their inputs are encapsulated as LlamaInput.

When training a model, a cross-entropy loss is obtained from the forward pass of the model and then optimized in the backward pass through gradient descent. When evaluating a model, the textual outputs of the model are processed by the applied metrics to compute performance numbers.
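
To make that loop concrete, here is a generic training-step sketch under the assumption of a Hugging Face-style vision-language model that returns a cross-entropy loss when labels are provided; the repository's VLModel wrappers additionally handle prompting and image preprocessing.

# Generic training-step sketch; not the repository's actual train.py.
import torch

def training_step(model: torch.nn.Module, batch: dict, optimizer: torch.optim.Optimizer) -> float:
    # Forward pass: Hugging Face vision-language models return a cross-entropy
    # loss over the target tokens when `labels` are included in the batch.
    outputs = model(**batch)  # batch holds pixel_values, input_ids, labels, ...
    loss = outputs.loss
    # Backward pass and one gradient-descent update.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()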

Each model comes with a config file, which specifies the settings for instantiating and using the model. For example, the config file for Llama-3.2-11B-Vision-Instruct is starflow/config/model/llama_32_11b.yaml.

A special category of models is API models, which can only be used through API calls. They are encapsulated as sub-classes of VLAPIModel, and each of them comes with a config file. For example, the OpenRouter-routed GPT-4o model is encapsulated as OpenRouterAPIModel, and its config file is starflow/config/model/gpt_4o.yaml. API models cannot be trained, but can still be evaluated.

Pipelines

Pipelines are Python scripts that execute complete processes with datasets, metrics, and models. There are three pipelines: training (starflow/pipeline/train.py), evaluation (starflow/pipeline/evaluate.py), and API evaluation (starflow/pipeline/evaluate_api.py). Each of them comes with a config file under starflow/config/pipeline/.
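
Conceptually, an evaluation pipeline ties the other three component types together roughly as sketched below; the function and attribute names are illustrative, not the repository's actual API.

# High-level sketch of what an evaluation pipeline does; not the actual evaluate.py.
def run_evaluation(dataset, model, metric) -> float:
    predictions, references = [], []
    for example in dataset:
        predictions.append(model.generate(example))   # model interface assumed for illustration
        references.append(example.target_text)        # attribute name assumed for illustration
    return metric.compute(predictions, references)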

Citation

@article{bechard2025starflow,
  title={StarFlow: Generating Structured Workflow Outputs From Sketch Images},
  author={Bechard, Patrice and Wang, Chao and Abaskohi, Amirhossein and Rodriguez, Juan and Pal, Christopher and Vazquez, David and Gella, Spandana and Rajeswar, Sai and Taslakian, Perouz},
  journal={arXiv preprint arXiv:2503.21889},
  year={2025}
}
