Im-Promptu: Visual Analogy Suite

This folder contains the visual analogy tasks used in the paper Im-Promptu: In-Context Composition from Image Prompts. Each benchmark is described separately below. Depending on the underlying relation, every visual analogy suite is divided into two broad kinds of analogies: Primitive and Composite tasks.

(1) Primitive: A single image attribute is modified at a time. For example, the color of the object is changed from red to blue.

(2) Composite: Multiple image attributes are modified at a time. For example, the color of the object is changed from red to blue and the scene orientation is changed from -15 degrees to +15 degrees.
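To make the two task kinds concrete, here is a minimal sketch of how an analogy task could be represented in code. The field names and class below are illustrative assumptions, not the dataset's actual schema (see the per-benchmark notebooks for the real structure).

```python
# Illustrative sketch only: field names are assumptions, not the
# dataset's actual schema.
from dataclasses import dataclass, field


@dataclass
class AnalogyTask:
    """A visual analogy A : B :: C : D over image attributes."""

    # Attributes the transformation modifies; one entry means a
    # primitive task, two or more means a composite task.
    modified_attributes: list
    # Demonstration pairs (A, B) showing the transformation
    # (each benchmark provides 3 examples per task).
    example_pairs: list = field(default_factory=list)
    query_image: object = None   # C: image to transform
    target_image: object = None  # D: ground-truth answer

    @property
    def is_primitive(self) -> bool:
        return len(self.modified_attributes) == 1


primitive = AnalogyTask(modified_attributes=["object color"])
composite = AnalogyTask(
    modified_attributes=["object color", "scene orientation"]
)
```

Under this sketch, `primitive.is_primitive` is `True` and `composite.is_primitive` is `False`, mirroring the k = 1 versus k > 1 distinction above.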

Table of Contents

- 3D Shapes Visual Analogy Benchmark
- BitMoji Visual Analogy Benchmark
- CLEVr Visual Analogy
- Cite this work
- License

3D Shapes Visual Analogy Benchmark

The link to the benchmark is here

Description

This dataset consists of static scenes of various objects lying on a colored floor in front of a colored wall, viewed from different orientations. To set up primitive tasks, we consider four composable properties: {object color, wall color, floor color, scene orientation}.

Source: 3D Shapes Dataset

| Primitive Task Property | Domain Size |
| --- | --- |
| Object Color | 10 |
| Wall Color | 10 |
| Floor Color | 10 |
| Scene Orientation | 15 |

- Examples per task: 3
- Number of Training Tasks: 80,000
- Number of Primitive Extrapolation Tasks: 1,000
- Number of Composite Extrapolation Tasks: 1,000 for each $k \in \{2, 3, 4\}$
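A composite task with k modified attributes draws from one of the $\binom{4}{k}$ subsets of the four properties above. The short sketch below just enumerates those subsets; the property names come from the table, but the enumeration is an illustration, not the dataset's actual task sampler.

```python
# Illustrative sketch: enumerate the attribute subsets that composite
# tasks over the 3D Shapes properties can modify. This is not the
# dataset's sampling code.
from itertools import combinations

domain_sizes = {
    "object color": 10,
    "wall color": 10,
    "floor color": 10,
    "scene orientation": 15,
}

# For k = 2, 3, 4 there are C(4, k) = 6, 4, 1 possible subsets.
for k in (2, 3, 4):
    subsets = list(combinations(domain_sizes, k))
    print(f"k={k}: {len(subsets)} attribute subsets")
```

The benchmark provides 1,000 composite extrapolation tasks for each value of k, drawn across these subsets.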

Examples

object color

Primitive analogy that modifies the object color

wall color

Primitive analogy that modifies the wall color

floor color

Primitive analogy that modifies the floor color

orientation

Primitive analogy that modifies the scene orientation

Jupyter Notebook

This notebook explains the dataset structure.

BitMoji Visual Analogy Benchmark

The link to the benchmark is here

Description

BitMoji is an avatar creator service for social media users that allows them to create intricate cartoon faces. We create visual analogies using four underlying dynamic elements of the avatar: {skin tone, hair type, facial hair type, eyewear type}.

Source: BitMoji API
| Primitive Task Property | Domain Size |
| --- | --- |
| Skin Tone | 3 |
| Hair Style | 10 |
| Facial Hair | 5 |
| Eyewear | 5 |

- Examples per task: 3
- Number of Training Tasks: 80,000
- Number of Primitive Extrapolation Tasks: 1,000
- Number of Composite Extrapolation Tasks: 1,000 for each $k \in \{2, 3\}$

Examples

skin tone

Primitive analogy that modifies the skin tone

hair style

Primitive analogy that modifies the hair style

facial hair

Primitive analogy that modifies the facial hair style

eyewear

Primitive analogy that modifies the eyewear type

Jupyter Notebook

This notebook explains the dataset structure.

CLEVr Visual Analogy

Files can be downloaded from here

Description

CLEVr is a popular visual question-answering dataset whose visual component consists of multiple objects arranged in a scene. We use the CLEVr rendering engine to set up primitive tasks that add or delete the same object across various scenes.

Source: CLEVr Dataset
| Primitive Task Property | Domain Size |
| --- | --- |
| Add Object | 1000 |
| Delete Object | 1000 |

- Examples per task: 3
- Number of Training Tasks: 55,000
- Number of Primitive Extrapolation Tasks: 1,000
- Number of Composite Extrapolation Tasks: 200 for each $k \in \{2, 3\}$

Examples

clevr add

Primitive analogy that adds an object

clevr delete

Primitive analogy that deletes an object

Cite this work

Cite our work using the following BibTeX entry:

@misc{dedhia2023impromptu,
      title={Im-Promptu: In-Context Composition from Image Prompts}, 
      author={Bhishma Dedhia and Michael Chang and Jake C. Snell and Thomas L. Griffiths and Niraj K. Jha},
      year={2023},
      eprint={2305.17262},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Jupyter Notebook

This notebook explains the dataset structure.

License

The Clear BSD License. Copyright (c) 2023, Bhishma Dedhia and Jha Lab. All rights reserved.

See the License file for more details.