- Clone the repository

  ```bash
  git clone https://github.com/ServiceNow/StarFlow.git
  cd StarFlow
  ```
- Edit `~/.secret` (create a new file if it does not exist)

  ```bash
  export HF_TOKEN=<HF_TOKEN>
  export WANDB_API_KEY=<WANDB_API_KEY>
  export OPENROUTER_API_KEY=<OPENROUTER_API_KEY>
  export OUTPUT_DIR=<OUTPUT_DIR>
  ...
  ```
- Edit `~/.bashrc` (create a new file if it does not exist)

  ```bash
  source ~/.secret
  ...
  ```
- Install packages

  ```bash
  # for Llama, Qwen, Pixtral, and API models
  bash installer/default/install.sh
  # for the Phi-3.5 model
  bash installer/phi35/install.sh
  # for the Phi-4 model
  bash installer/phi4/install.sh
  # for DeepSeek models
  bash installer/deepseek/install.sh
  ```
- Training

  ```bash
  torchrun \
      --nproc-per-node 2 \
      starflow/pipeline/train.py \
      dataset_config_file=starflow/config/dataset/bigdocs_sketch2flow.yaml \
      model_config_file=starflow/config/model/llama_32_11b.yaml \
      pipeline_config_file=starflow/config/pipeline/train.yaml
  ```
- Evaluation

  ```bash
  torchrun \
      --nproc-per-node 2 \
      starflow/pipeline/evaluate.py \
      dataset_config_file=starflow/config/dataset/bigdocs_sketch2flow.yaml \
      model_config_file=starflow/config/model/llama_32_11b.yaml \
      pipeline_config_file=starflow/config/pipeline/evaluate.yaml
  ```
- Evaluation for very large models (e.g. Llama-3.2-90B-Vision-Instruct)

  ```bash
  python \
      starflow/pipeline/evaluate.py \
      dataset_config_file=starflow/config/dataset/bigdocs_sketch2flow.yaml \
      model_config_file=starflow/config/model/llama_32_90b.yaml \
      pipeline_config_file=starflow/config/pipeline/evaluate.yaml
  ```
- Evaluation for API models (e.g. GPT-4o)

  ```bash
  python \
      starflow/pipeline/evaluate_api.py \
      dataset_config_file=starflow/config/dataset/bigdocs_sketch2flow.yaml \
      model_config_file=starflow/config/model/gpt_4o.yaml \
      pipeline_config_file=starflow/config/pipeline/evaluate_api.yaml
  ```
- Other models can be trained and evaluated by setting their config file path as the value of `model_config_file`.
- The values in the involved config files should be set properly before running training and evaluation (see the preflight sketch after this list).
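As a quick preflight before launching a run, the environment variables from `~/.secret` and a config file passed on the command line can be checked. This standalone sketch is illustrative and not part of StarFlow; it only prints what the YAML contains rather than assuming a particular schema:

```python
import os
import yaml  # pip install pyyaml

# Names mirror ~/.secret above; the check itself is ours, not StarFlow's.
required = ["HF_TOKEN", "WANDB_API_KEY", "OPENROUTER_API_KEY", "OUTPUT_DIR"]
missing = [name for name in required if not os.environ.get(name)]
if missing:
    raise SystemExit(f"missing environment variables: {', '.join(missing)}")

# Print a config file referenced on the command line so misconfigured values
# surface before a long run starts; no assumption is made about its keys.
with open("starflow/config/dataset/bigdocs_sketch2flow.yaml") as f:
    for key, value in yaml.safe_load(f).items():
        print(f"{key}: {value}")
```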
StarFlow consists of four types of components: datasets, metrics, models, and pipelines.
Datasets provide vision-language data for training and evaluation. They are encapsulated as sub-classes of `VLDataset`. For example, the BigDocs datasets are encapsulated as `BigDocsDataset`.

When instantiating a dataset, its data examples are first loaded from either Hugging Face or local storage, and then encapsulated as `VLExample`.

Each dataset comes with a config file, which specifies the settings for instantiating and using the dataset. For example, the config file for `ServiceNow/BigDocs-Sketch2Flow` is `starflow/config/dataset/bigdocs_sketch2flow.yaml`.
Metrics compute performance numbers of models on datasets. They are encapsulated as sub-classes of `VLMetric`. For example, the Flow Similarity metric is encapsulated as `FlowSimilarityMetric`.

When using a metric to evaluate a model on a dataset, the metric compares the outputs of the model with the corresponding ground truths in the dataset and thereby obtains the performance numbers.

Each metric is applied to one or more datasets, and the settings for instantiating and using the metric are specified in the config files of the target datasets. For example, the settings for `FlowSimilarityMetric` are specified in the config file of `ServiceNow/BigDocs-Sketch2Flow` (`starflow/config/dataset/bigdocs_sketch2flow.yaml`).
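A hedged sketch of that metric shape follows; the real `FlowSimilarityMetric` scores structural similarity between flows, so the exact-match scoring below is only a placeholder for the output-vs-ground-truth comparison:

```python
class VLMetric:
    """Base class: compares model outputs with ground truths (illustrative)."""
    def compute(self, outputs: list[str], targets: list[str]) -> dict[str, float]:
        raise NotImplementedError

class FlowSimilarityMetric(VLMetric):
    def compute(self, outputs, targets):
        # Placeholder: a real implementation would parse each output and
        # target into a flow graph and score their structural similarity.
        scores = [1.0 if out == tgt else 0.0 for out, tgt in zip(outputs, targets)]
        return {"flow_similarity": sum(scores) / max(len(scores), 1)}
```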
Models generate textual outputs given vision-language inputs from datasets. They are encapsulated as sub-classes of `VLModel`, and their inputs are encapsulated as sub-classes of `VLInput`. For example, the Llama-3.2-Vision-Instruct models are encapsulated as `LlamaModel`, and their inputs are encapsulated as `LlamaInput`.

When training a model, a cross-entropy loss is obtained from the forward pass of the model, which is then optimized in the backward pass through gradient descent. When evaluating a model, the textual outputs of the model are processed by the applied metrics to compute performance numbers.

Each model comes with a config file, which specifies the settings for instantiating and using the model. For example, the config file for Llama-3.2-11B-Vision-Instruct is `starflow/config/model/llama_32_11b.yaml`.
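The training step described above amounts to the usual forward/backward loop. A minimal PyTorch sketch, assuming `model` returns an object with a `.loss` field (as Hugging Face models do) and `batch` is a dict of tensors built from the `VLInput`:

```python
import torch

def train_step(model, batch, optimizer):
    optimizer.zero_grad()
    outputs = model(**batch)  # forward pass yields a cross-entropy loss
    loss = outputs.loss
    loss.backward()           # backward pass computes gradients
    optimizer.step()          # gradient descent update
    return loss.item()
```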
A special category of models is API models, which can only be used through API calls. They are encapsulated as sub-classes of `VLAPIModel`, and each of them comes with a config file. For example, the OpenRouter-routed GPT-4o model is encapsulated as `OpenRouterAPIModel`, and its config file is `starflow/config/model/gpt_4o.yaml`. API models cannot be trained, but can still be evaluated.
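For illustration, querying an OpenRouter-routed model looks like the following; this mirrors what a `VLAPIModel` sub-class might do internally, though the wrapper details here are assumptions. The `OPENROUTER_API_KEY` comes from `~/.secret` above:

```python
import os
from openai import OpenAI  # pip install openai

# OpenRouter exposes an OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Describe this workflow sketch."}],
)
print(response.choices[0].message.content)
```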
Pipelines are Python scripts that execute complete processes with datasets, metrics, and models. There are three pipelines, each of which comes with a config file:

- Training pipeline: the pipeline for training a model on a dataset. It is implemented as `starflow/pipeline/train.py`, and its config file is `starflow/config/pipeline/train.yaml`.
- Evaluation pipeline: the pipeline for evaluating a model on a dataset with the applied metrics. It is implemented as `starflow/pipeline/evaluate.py`, and its config file is `starflow/config/pipeline/evaluate.yaml`.
- API model evaluation pipeline: the pipeline for evaluating an API model on a dataset with the applied metrics. It is implemented as `starflow/pipeline/evaluate_api.py`, and its config file is `starflow/config/pipeline/evaluate_api.yaml`.
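Note that the pipeline commands above pass configuration as bare `key=value` arguments rather than `--flags`. A minimal sketch of how such arguments can be parsed (our illustration, not necessarily StarFlow's implementation):

```python
import sys

def parse_overrides(argv: list[str]) -> dict[str, str]:
    """Turn ['a=b', 'c=d'] into {'a': 'b', 'c': 'd'}."""
    overrides = {}
    for arg in argv:
        key, _, value = arg.partition("=")
        overrides[key] = value
    return overrides

if __name__ == "__main__":
    args = parse_overrides(sys.argv[1:])
    print(args.get("dataset_config_file"))
    print(args.get("model_config_file"))
    print(args.get("pipeline_config_file"))
```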
```bibtex
@article{bechard2025starflow,
  title={StarFlow: Generating Structured Workflow Outputs From Sketch Images},
  author={Bechard, Patrice and Wang, Chao and Abaskohi, Amirhossein and Rodriguez, Juan and Pal, Christopher and Vazquez, David and Gella, Spandana and Rajeswar, Sai and Taslakian, Perouz},
  journal={arXiv preprint arXiv:2503.21889},
  year={2025}
}
```