This folder contains the visual analogy tasks used in the paper Im-Promptu: In-Context Composition from Image Prompts. Each benchmark is described separately below. Each visual analogy suite is divided into two broad kinds of tasks, depending on the underlying relation: Primitive and Composite.
(1) Primitive: A single image attribute is modified at a time. For example, the color of the object is changed from red to blue.
(2) Composite: Multiple image attributes are modified at a time. For example, the color of the object is changed from red to blue and the scene orientation is changed from -15 degrees to +15 degrees.
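The distinction between the two kinds of tasks can be sketched in a few lines of Python. This is only an illustration: the attribute dictionaries and the function names `changed_attributes` and `relation_kind` are hypothetical and are not part of the released benchmark files.

```python
from typing import Dict, List

# Hypothetical representation: an image is described by a dict of attribute values.
Attributes = Dict[str, object]


def changed_attributes(source: Attributes, target: Attributes) -> List[str]:
    """Return the sorted names of attributes whose values differ between two images."""
    if source.keys() != target.keys():
        raise ValueError("source and target must describe the same attributes")
    return sorted(name for name in source if source[name] != target[name])


def relation_kind(source: Attributes, target: Attributes) -> str:
    """Classify a source -> target change as 'identity', 'primitive' or 'composite'."""
    num_changed = len(changed_attributes(source, target))
    if num_changed == 0:
        return "identity"
    return "primitive" if num_changed == 1 else "composite"


base = {"object_color": "red", "scene_orientation": -15}
recolored = {"object_color": "blue", "scene_orientation": -15}
recolored_rotated = {"object_color": "blue", "scene_orientation": 15}

print(relation_kind(base, recolored))          # primitive
print(relation_kind(base, recolored_rotated))  # composite
```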
The link to the benchmark is here
This dataset consists of static scenes of various objects lying on a colored floor in front of a colored wall, viewed from different orientations. To set up primitive tasks, we consider four composable properties: {object color, wall color, floor color, scene orientation}.
Source: 3D Shapes Dataset
Primitive Task: {Property} | {Domain Size}
Object Color -- 10
Wall Color -- 10
Floor Color -- 10
Scene Orientation -- 15
Examples per task: 3
Number of Training Tasks: 80000
Number of Primitive Extrapolation Tasks: 1000
Number of Composite Extrapolation Tasks: 1000
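To get a feel for the size of the task space implied by the domain sizes above, the following sketch does the arithmetic. It assumes, purely for illustration, that a primitive relation is an ordered pair of distinct values of one property, and it counts only the four listed properties, ignoring any other factors of variation in the source dataset. The names `DOMAIN_SIZES`, `num_scenes` and `num_value_changes` are hypothetical.

```python
from math import prod
from typing import Dict

# Domain sizes taken from the list above.
DOMAIN_SIZES: Dict[str, int] = {
    "object_color": 10,
    "wall_color": 10,
    "floor_color": 10,
    "scene_orientation": 15,
}


def num_scenes(domain_sizes: Dict[str, int]) -> int:
    """Number of distinct combinations of the listed properties."""
    return prod(domain_sizes.values())


def num_value_changes(size: int) -> int:
    """Ordered (source value, target value) pairs with source != target."""
    return size * (size - 1)


total_scenes = num_scenes(DOMAIN_SIZES)
changes = {name: num_value_changes(size) for name, size in DOMAIN_SIZES.items()}

print(total_scenes)                  # 15000
print(changes["object_color"])       # 90
print(changes["scene_orientation"])  # 210
```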
Primitive analogy that modifies the object color
Primitive analogy that modifies the wall color
Primitive analogy that modifies the floor color
Primitive analogy that modifies the scene orientation
This notebook explains the dataset structure.
The link to the benchmark is here
BitMoji is an avatar creation service that lets social media users build intricate cartoon faces. We create visual analogies using four underlying dynamic elements of the avatar: {skin tone, hair style, facial hair, eyewear}.
Source: BitMoji API
Primitive Task: {Property} | {Domain Size}
Skin Tone -- 3
Hair Style -- 10
Facial Hair -- 5
Eyewear -- 5
Examples per task: 3
Number of Training Tasks: 80000
Number of Primitive Extrapolation Tasks: 1000
Number of Composite Extrapolation Tasks: 1000
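One way to picture how a primitive task could be assembled from these properties is sketched below. It is not the generation procedure used for the benchmark: it assumes that the three examples per task are three source/target pairs sharing a single relation, and it encodes each attribute value as an integer index. The function `sample_primitive_task` and the `Avatar` type are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Domain sizes taken from the list above.
DOMAIN_SIZES: Dict[str, int] = {
    "skin_tone": 3,
    "hair_style": 10,
    "facial_hair": 5,
    "eyewear": 5,
}

Avatar = Dict[str, int]  # attribute name -> value index (hypothetical encoding)


def sample_primitive_task(
    rng: random.Random, num_examples: int = 3
) -> Tuple[str, List[Tuple[Avatar, Avatar]]]:
    """Sample one property and one value change, then apply it to several avatars."""
    prop = rng.choice(sorted(DOMAIN_SIZES))
    # Two distinct values of the chosen property define the shared relation.
    src_value, tgt_value = rng.sample(range(DOMAIN_SIZES[prop]), 2)
    pairs = []
    for _ in range(num_examples):
        # The remaining attributes vary freely from pair to pair.
        source = {name: rng.randrange(size) for name, size in DOMAIN_SIZES.items()}
        source[prop] = src_value
        target = dict(source)
        target[prop] = tgt_value
        pairs.append((source, target))
    return prop, pairs


prop, pairs = sample_primitive_task(random.Random(0))
for source, target in pairs:
    print(prop, source[prop], "->", target[prop])
```

Every pair differs only in the sampled property, so a model has to infer the shared relation from the pairs rather than from any single image.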
Primitive analogy that modifies the skin tone
Primitive analogy that modifies the hair style
Primitive analogy that modifies the facial hair style
Primitive analogy that modifies the eyewear type
This notebook explains the dataset structure.
Files can be downloaded from here
CLEVr is a popular visual question-answering dataset whose images show multiple objects lying in a scene. We use the CLEVr rendering engine to set up primitive tasks that add or delete the same object across various scenes.
Source: CLEVr Dataset
Primitive Task: {Property} | {Domain Size}
Add Object -- 1000
Delete Object -- 1000
Examples per task: 3
Number of Training Tasks: 55000
Number of Primitive Extrapolation Tasks: 1000
Number of Composite Extrapolation Tasks: 200
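The add and delete relations can be illustrated with a toy model in which a scene is a set of object descriptors. This is a sketch under stated assumptions only: the real tasks are rendered images, and the `(shape, color, size)` descriptor and the functions `add_object` and `delete_object` are hypothetical.

```python
from typing import FrozenSet, Tuple

SceneObject = Tuple[str, str, str]  # (shape, color, size): hypothetical descriptor
Scene = FrozenSet[SceneObject]


def add_object(scene: Scene, obj: SceneObject) -> Scene:
    """Return a new scene with obj added; the input scene is left unchanged."""
    if obj in scene:
        raise ValueError("object is already in the scene")
    return scene | {obj}


def delete_object(scene: Scene, obj: SceneObject) -> Scene:
    """Return a new scene with obj removed; the input scene is left unchanged."""
    if obj not in scene:
        raise ValueError("object is not in the scene")
    return scene - {obj}


scenes = [
    frozenset({("cube", "red", "large")}),
    frozenset({("sphere", "blue", "small"), ("cylinder", "green", "large")}),
]
new_object = ("cube", "yellow", "small")

# The same object is added to every scene, mirroring one primitive "add" task.
scenes_after = [add_object(scene, new_object) for scene in scenes]
# Deleting that object again recovers the original scenes.
restored = [delete_object(scene, new_object) for scene in scenes_after]
print(restored == scenes)  # True
```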
Primitive analogy that adds an object
Primitive analogy that deletes an object
This notebook explains the dataset structure.
Cite our work using the following BibTeX entry:
@misc{dedhia2023impromptu,
  title={Im-Promptu: In-Context Composition from Image Prompts},
  author={Bhishma Dedhia and Michael Chang and Jake C. Snell and Thomas L. Griffiths and Niraj K. Jha},
  year={2023},
  eprint={2305.17262},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
The Clear BSD License
Copyright (c) 2023, Bhishma Dedhia and Jha Lab. All rights reserved.
See the License file for more details.