The OpenRefine AI Extension bridges the power of modern language models with OpenRefine's robust data transformation capabilities. This extension enables users to leverage any LLM provider that supports a chat completion API endpoint, bringing AI-powered data wrangling, enhancement, and analysis directly into your OpenRefine workflows. For more information, read the AI Column Extraction and Provider Setup guides in this repo.
The extension serves multiple purposes in the data processing pipeline:
- Intelligent Data Cleaning: Use LLMs to suggest and implement context-aware data cleaning operations that go beyond rule-based approaches.
- Semantic Enrichment: Enhance datasets by generating additional attributes or metadata based on existing content.
- Natural Language Transformations: Express complex data transformations in plain English.
- Anomaly Detection: Identify unusual patterns or outliers in your data through AI-powered analysis.
- Content Generation: Fill gaps in your datasets with contextually appropriate synthetic data.