@paulbauriegel paulbauriegel commented Oct 16, 2025

Image Annotation for Argilla

This PR adds image annotation capabilities to Argilla, enabling users to annotate images with rectangles and polygons (optionally with holes) for object detection, segmentation, and other computer vision tasks. The implementation uses the image field and Konva.js to provide interactive, canvas-based annotation. Annotations are stored in the labelme format, with an optional extension for holes, which should allow straightforward integration with popular CV frameworks.


Key Features

🎨 Two Interactive Drawing Tools

  • Rectangle Tool: Click and drag to create bounding boxes for object detection
  • Polygon Tool: Click to add vertices, double-click or press Enter to complete, Escape to cancel
  • Advanced Polygon Features: Support for polygons with holes (cutouts) for complex segmentation tasks; additional points can be added to a polygon while editing
  • Visual feedback with dashed preview lines, point markers, and real-time shape rendering during drawing

🖱️ UI Interactions

  • List of annotations in the question field: Edit or delete annotations from the list in the question field, or do it directly on the image field
  • Right-Click Context Menu: Delete/Edit annotations via context menu on the image field
  • Bidirectional Sync: Hovering in the annotation list highlights the corresponding annotation on the canvas
  • Keyboard Shortcuts: Fast annotation workflow with keyboard support (Enter to complete, Escape to cancel, Delete to remove)
  • Responsive Canvas: Automatically adapts to window resizing while maintaining annotation accuracy
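
One common way to keep annotations accurate across resizes is to store points in image pixel space and convert to canvas space only for display. The sketch below illustrates that idea in Python; the function names and the uniform "fit inside the canvas" scale are assumptions for illustration and are not taken from the PR's coordinates.ts.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def fit_scale(image_size: Tuple[int, int], canvas_size: Tuple[int, int]) -> float:
    """Uniform scale that fits the whole image inside the canvas (assumed behavior)."""
    image_w, image_h = image_size
    canvas_w, canvas_h = canvas_size
    return min(canvas_w / image_w, canvas_h / image_h)


def image_to_canvas(points: List[Point], scale: float) -> List[Point]:
    """Convert stored image-space points to canvas-space points for drawing."""
    return [(x * scale, y * scale) for x, y in points]


def canvas_to_image(points: List[Point], scale: float) -> List[Point]:
    """Convert points drawn on the canvas back to image-space points for storage."""
    return [(x / scale, y / scale) for x, y in points]


# A rectangle stored in image pixels is unchanged by a canvas resize:
# only the display scale differs between the two canvas sizes.
stored = [(100.0, 50.0), (300.0, 250.0)]
for canvas_size in [(800, 600), (400, 300)]:
    scale = fit_scale((1600, 1200), canvas_size)
    drawn = image_to_canvas(stored, scale)
    assert canvas_to_image(drawn, scale) == stored
```

Because only the scale changes on resize, the stored annotation never needs to be rewritten when the window size changes.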

🔧 Technical Implementation

Frontend Architecture:

  • Built on Konva.js for canvas rendering and interaction
  • Modular tool system with factory pattern (AnnotationToolFactory, BaseAnnotationTool, IAnnotationTool)
  • Separate tools for different shape types: RectangleTool, PolygonTool
  • Composables for reusable functionality:
    • useImageLoader: Handles image loading and error states
    • useContextMenu: Manages right-click context menu positioning
    • useKeyboardShortcuts: Centralized keyboard event handling
    • useKonvaStage: Stage initialization and management
    • useResize: Responsive canvas resizing
  • Utility modules:
    • coordinates.ts: Coordinate transformation between canvas and image space
    • geometry.ts: Geometric calculations (point-in-polygon, distance, etc.)
    • holeCreationUtils.ts: Complex polygon hole creation logic
    • konvaShapes.ts: Konva shape rendering utilities
  • AnnotationRenderer.ts: Handles rendering of annotations on the Konva canvas
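
To illustrate the kind of calculation a geometry helper performs for polygons with holes, here is a minimal Python sketch of a ray-casting point-in-polygon test. It is not the code from geometry.ts or holeCreationUtils.ts; the function names are hypothetical, and points lying exactly on an edge are not treated specially.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_ring(point: Point, ring: Sequence[Point]) -> bool:
    """Ray casting: toggle 'inside' for every edge crossed by a ray going right."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_in_polygon(
    point: Point,
    outer: Sequence[Point],
    holes: Sequence[Sequence[Point]] = (),
) -> bool:
    """A point belongs to the shape if it is in the outer ring and in no hole."""
    if not point_in_ring(point, outer):
        return False
    return not any(point_in_ring(point, hole) for hole in holes)


outer: List[Point] = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole: List[Point] = [(4, 4), (6, 4), (6, 6), (4, 6)]
assert point_in_polygon((2, 2), outer, [hole])       # inside the shape
assert not point_in_polygon((5, 5), outer, [hole])   # inside the cutout
assert not point_in_polygon((12, 5), outer, [hole])  # outside the outer ring
```

A test like this could decide, for example, which annotation a click or hover belongs to when a polygon has cutouts.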

Backend Architecture:

  • Question Type: ImageAnnotationQuestion with configurable settings
  • Schema Validation: Comprehensive validation for annotation data structure
    • Label validation against configured options
    • Shape type validation (rectangle, polygon, circle, line, point)
    • Points format validation based on shape type
    • Field existence validation
  • Labelme Format: Standard annotation format with fields:
    • label: Annotation class label
    • points: Coordinates in [[x1,y1], [x2,y2], ...] format
    • shape_type: Shape identifier (rectangle, polygon, circle, line, point)
    • group_id: Optional grouping identifier
    • flags: Additional metadata dictionary
    • holes: Optional list of hole polygons for complex shapes
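
As a rough illustration of the format and the checks listed above, the following Python sketch builds one annotation and validates it. It is not the backend validator from this PR: the function name is hypothetical, and the point-count rules (two corner points for a rectangle, at least three for a polygon) are assumptions based on common labelme usage.

```python
from typing import Any, Dict, List, Sequence

REQUIRED_FIELDS = ("label", "points", "shape_type")


def validate_annotation(
    annotation: Dict[str, Any],
    labels: Sequence[str],
    shape_types: Sequence[str],
) -> List[str]:
    """Return a list of error messages; an empty list means the annotation passed."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in annotation]
    if errors:
        return errors

    if annotation["label"] not in labels:
        errors.append(f"unknown label: {annotation['label']}")

    shape_type = annotation["shape_type"]
    if shape_type not in shape_types:
        errors.append(f"shape type not allowed: {shape_type}")

    points = annotation["points"]
    is_pair_list = isinstance(points, list) and all(
        isinstance(p, (list, tuple)) and len(p) == 2 for p in points
    )
    if not is_pair_list:
        errors.append("points must be a list of [x, y] pairs")
    elif shape_type == "rectangle" and len(points) != 2:
        errors.append("rectangle needs exactly 2 points")  # assumed rule
    elif shape_type == "polygon" and len(points) < 3:
        errors.append("polygon needs at least 3 points")  # assumed rule
    return errors


example = {
    "label": "person",
    "points": [[10, 20], [110, 220]],  # two opposite corners
    "shape_type": "rectangle",
    "group_id": None,
    "flags": {},
}
assert validate_annotation(example, ["person", "face"], ["rectangle", "polygon"]) == []
```

Returning a list of messages instead of raising on the first problem makes it easier to report every issue in a response at once.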

Data Flow:

  1. User creates ImageAnnotationQuestion linked to an ImageField
  2. Frontend renders interactive canvas with drawing tools
  3. Annotations stored in labelme format in responses
  4. Backend validates annotations against question settings
  5. Annotations can be exported for training CV models
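
For the export step, a downstream script might convert stored annotations into the form a training pipeline expects. The Python sketch below turns a labelme rectangle into a COCO-style [x, y, width, height] box and computes a polygon's area with holes subtracted. The helper names are hypothetical, and the PR description does not specify an export API.

```python
from typing import List, Sequence

Ring = Sequence[Sequence[float]]


def rectangle_to_coco_bbox(points: Ring) -> List[float]:
    """Two opposite corners (in any order) -> [x_min, y_min, width, height]."""
    (x1, y1), (x2, y2) = points
    return [min(x1, x2), min(y1, y2), abs(x2 - x1), abs(y2 - y1)]


def ring_area(ring: Ring) -> float:
    """Shoelace formula for the area of a simple closed ring."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(points: Ring, holes: Sequence[Ring] = ()) -> float:
    """Area of the outer ring minus the area of every hole."""
    return ring_area(points) - sum(ring_area(hole) for hole in holes)


assert rectangle_to_coco_bbox([[110, 220], [10, 20]]) == [10, 20, 100, 200]
outer = [[0, 0], [10, 0], [10, 10], [0, 10]]
hole = [[4, 4], [6, 4], [6, 6], [4, 6]]
assert polygon_area(outer, [hole]) == 96.0
```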

API Usage Examples

Creating a Dataset

"""
Simple script to create image annotation dataset
Run this first before logging records
"""

import argilla as rg

# Initialize Argilla client
client = rg.Argilla(
    api_url="http://localhost:3000",
    api_key="argilla.apikey"
)

# Define the dataset settings
settings = rg.Settings(
    fields=[
        rg.ImageField(name="image", title="Image"),
        rg.TextField(name="persona", title="Prompt"),
    ],
    questions=[
        rg.ImageAnnotationQuestion(
            name="objects",
            field="image",
            labels=["person", "face", "object", "background", "other"],
            title="Annotate objects in the image",
            required=True,
            allow_multiple=True,
            shape_types=["rectangle", "polygon"],
        ),
    ],
)

# Create the dataset
dataset = rg.Dataset(
    name="image-annotation-test-ds",
    settings=settings,
    workspace="argilla",
    client=client,
)

dataset.create()

# Add Records
from datasets import load_dataset
ds = load_dataset("dvilasuero/finepersonas-v0.1-tiny-flux-schnell", name="default", split="train")
dataset.records.log(ds, mapping={})

PS: You should also be able to create a dataset with labelme pre-annotations.
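
As a starting point for pre-annotations, the Python sketch below reads the shapes out of a labelme JSON document and keeps only the shape types the question allows. The function name is hypothetical, and how the resulting list is attached to Argilla records (for example as a suggestion) is not described in this PR, so that step is left out.

```python
import json
from typing import Any, Dict, List, Sequence


def shapes_from_labelme(
    labelme_json: str,
    shape_types: Sequence[str] = ("rectangle", "polygon"),
) -> List[Dict[str, Any]]:
    """Extract annotations from a labelme JSON string, filtered by shape type."""
    document = json.loads(labelme_json)
    annotations = []
    for shape in document.get("shapes", []):
        if shape.get("shape_type") not in shape_types:
            continue  # skip shapes the question is not configured for
        annotations.append(
            {
                "label": shape["label"],
                "points": shape["points"],
                "shape_type": shape["shape_type"],
                "group_id": shape.get("group_id"),
                "flags": shape.get("flags", {}),
            }
        )
    return annotations


sample = json.dumps(
    {
        "shapes": [
            {"label": "person", "points": [[10, 20], [110, 220]], "shape_type": "rectangle"},
            {"label": "other", "points": [[5, 5]], "shape_type": "point"},
        ],
        "imagePath": "example.jpg",
    }
)
preannotations = shapes_from_labelme(sample)
assert [a["label"] for a in preannotations] == ["person"]
```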

Testing

Manual Testing:

docker run -d --name redis docker.io/library/redis
docker run -d \
  --name elasticsearch \
  -e "ES_JAVA_OPTS=-Xms512m -Xmx512m -XX:UseSVE=0" \
  -e "CLI_JAVA_OPTS=-XX:UseSVE=0" \
  -e "node.name=elasticsearch" \
  -e "cluster.name=es-argilla-local" \
  -e "discovery.type=single-node" \
  -e "cluster.routing.allocation.disk.threshold_enabled=false" \
  -e "xpack.security.enabled=false" \
  --ulimit memlock=-1:-1 \
  -v elasticdata:/usr/share/elasticsearch/data/ \
  docker.elastic.co/elasticsearch/elasticsearch:8.17.0
pdm server-dev
npm run dev

Then open the UI in Firefox and test the annotation workflow by hand.

Unit Tests:

  • PolygonTool.test.ts - Polygon tool functionality
  • RectangleTool.test.ts - Rectangle tool functionality

Future Testing Needs:

  • E2E tests for complete annotation workflow
  • Integration tests for backend validators
  • Performance tests with large numbers of annotations
  • Cross-browser compatibility tests

Known Limitations & Future Enhancements

Current Limitations:

  • No undo/redo functionality (planned)
  • No zoom/pan on canvas (planned)
  • No brush/eraser tool for mask annotations (planned)
  • No drawing tools for the linestring/point shape types

Planned Enhancements (Not in this PR):

  • Undo/redo interface on image edit
  • Zoom into image capability

Breaking Changes

None. This is a new feature that doesn't affect existing functionality.

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist

  • I added relevant documentation
  • I followed the style guidelines of this project
  • I did a self-review of my code
  • I made corresponding changes to the documentation
  • I confirm my changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works (unit tests for tools)
  • I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

PS: I'm not a front-end designer, so please let me know if something should be changed :-)
