Commit cb413f8

feat(io): add import/export functionality for JSON and CSV
- Implemented methods for exporting and importing data in JSON Lines and CSV formats.
- Added support for filtered exports using metadata filters.
- Included batch processing for efficient handling of large datasets.
- Created comprehensive tests for import/export functionality.
- Updated documentation and examples to reflect new features.
1 parent 23ae35e commit cb413f8

7 files changed: +729 -2 lines changed

CHANGELOG.md

Lines changed: 18 additions & 1 deletion
@@ -18,14 +18,31 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Comprehensive test coverage for metadata filtering (unit, integration, and security tests)
 - New example `advanced_metadata_queries.py` demonstrating nested paths and complex queries
 - Updated `metadata_filtering.py` example with new filtering methods
+- **Export/import functionality**: New methods for data portability
+  - `export_to_json()` - Export records to JSON Lines format
+  - `import_from_json()` - Import records from JSON Lines format
+  - `export_to_csv()` - Export records to CSV format
+  - `import_from_csv()` - Import records from CSV format
+  - Support for filtered exports using metadata filters
+  - Batch processing for memory-efficient import/export of large datasets
+  - Optional embedding inclusion in exports
+- New `io` module with import/export utilities
+- Comprehensive tests for import/export functionality
+- New example `export_import_example.py` demonstrating backup/restore workflows

 ### Security
 - SQL injection prevention in metadata filter keys
 - Validation of JSON paths to prevent malicious queries
 - Parameterized queries for all metadata filtering operations

+### Improved
+- Data migration workflows are now much easier
+- Backup and restore capabilities for production use
+- Interoperability with external systems via CSV/JSON
+
 ### Documentation
 - Added "Metadata Filtering" section to README with examples
+- Added "Export/Import" section to README with examples
 - Updated examples list in README
 - Added comprehensive docstrings for new methods

@@ -190,7 +207,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## Version History

-- **2.2.0** - Added metadata filtering with JSON_EXTRACT support
+- **2.2.0** - Added metadata filtering with JSON_EXTRACT support and export/import functionality
 - **2.1.1** - Moved table name validation to create_table()
 - **2.1.0** - Added connection pooling support
 - **2.0.0** - Major refactor: simplified API, removed niche methods, cleaner naming
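
The changelog entries above mention filtered exports, batch processing, and optional embeddings. The sketch below strings those options together using the method signatures added in `sqlite_vec_client/base.py` later in this commit; the file and database names are illustrative only.

```python
from sqlite_vec_client import SQLiteVecClient

client = SQLiteVecClient(table="documents", db_path="demo.db")
client.create_table(dim=128, distance="cosine")
client.add(
    texts=["alpha", "beta"],
    embeddings=[[0.1] * 128, [0.2] * 128],
    metadata=[{"priority": "high"}, {"priority": "low"}],
)

# Filtered export: only records whose metadata matches, streamed in
# batches of 500 to keep memory use bounded.
exported = client.export_to_json(
    "high_priority.jsonl",
    include_embeddings=True,  # default for JSON exports per the new docstrings
    filters={"priority": "high"},
    batch_size=500,
)

# CSV export without embeddings for a human-readable report.
client.export_to_csv("report.csv", include_embeddings=False)

# Re-import, skipping records whose rowids already exist (per the docstring).
imported = client.import_from_json("high_priority.jsonl", skip_duplicates=True)
print(f"exported {exported}, imported {imported}")

client.close()
```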

README.md

Lines changed: 32 additions & 0 deletions
@@ -66,6 +66,37 @@ rows = client.get_many(rowids)
 client.close()
 ```

+## Export/Import
+
+Export and import data in JSON or CSV formats for backups, migrations, and data sharing:
+
+```python
+# Export to JSON (includes embeddings)
+count = client.export_to_json("backup.jsonl")
+
+# Export to CSV (human-readable, optional embeddings)
+count = client.export_to_csv("data.csv", include_embeddings=False)
+
+# Export filtered data
+count = client.export_to_json(
+    "important.jsonl",
+    filters={"priority": "high"}
+)
+
+# Import from JSON
+count = client.import_from_json("backup.jsonl")
+
+# Import from CSV
+count = client.import_from_csv("data.csv")
+
+# Backup and restore workflow
+client.export_to_json("backup.jsonl")
+# ... data loss ...
+client.import_from_json("backup.jsonl")
+```
+
+See [examples/export_import_example.py](examples/export_import_example.py) for more examples.
+
 ## Metadata Filtering

 Efficiently filter records by metadata fields using SQLite's JSON functions:

@@ -255,6 +286,7 @@ Edit [benchmarks/config.yaml](benchmarks/config.yaml) to customize:
 - [basic_usage.py](examples/basic_usage.py) - Basic CRUD operations
 - [metadata_filtering.py](examples/metadata_filtering.py) - Metadata filtering and queries
 - [advanced_metadata_queries.py](examples/advanced_metadata_queries.py) - Advanced metadata filtering with nested paths
+- [export_import_example.py](examples/export_import_example.py) - Export/import data in JSON and CSV formats
 - [transaction_example.py](examples/transaction_example.py) - Transaction management with all CRUD operations
 - [batch_operations.py](examples/batch_operations.py) - Bulk operations
 - [logging_example.py](examples/logging_example.py) - Logging configuration

TODO

Lines changed: 1 addition & 1 deletion
@@ -72,8 +72,8 @@
 - [x] Partial search on JSON metadata (JSON_EXTRACT)
 - [x] Metadata field filtering (key-value based)
 - [x] Transaction context manager
+- [x] Export/import functions (JSON, CSV)
 - [ ] Async/await support (aiosqlite)
-- [ ] Export/import functions (JSON, CSV)
 - [ ] Table migration utilities
 - [ ] Backup/restore functions

examples/export_import_example.py

Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
+"""Export/import example for sqlite-vec-client.
+
+Demonstrates:
+- Exporting data to JSON and CSV
+- Importing data from JSON and CSV
+- Filtered exports
+- Backup and restore workflows
+"""
+
+from sqlite_vec_client import SQLiteVecClient
+
+
+def main():
+    # Create and populate database
+    client = SQLiteVecClient(table="documents", db_path=":memory:")
+    client.create_table(dim=128, distance="cosine")
+
+    texts = [
+        "Introduction to Python",
+        "Advanced JavaScript",
+        "Python for Data Science",
+        "Java Programming Guide",
+        "Machine Learning with Python",
+    ]
+
+    embeddings = [[0.1 * i] * 128 for i in range(len(texts))]
+
+    metadata = [
+        {"category": "python", "level": "beginner"},
+        {"category": "javascript", "level": "advanced"},
+        {"category": "python", "level": "intermediate"},
+        {"category": "java", "level": "beginner"},
+        {"category": "python", "level": "advanced"},
+    ]
+
+    client.add(texts=texts, embeddings=embeddings, metadata=metadata)
+    print(f"Added {client.count()} documents\n")
+
+    # Example 1: Export all data to JSON
+    print("=== Export to JSON ===")
+    count = client.export_to_json("backup.jsonl")
+    print(f"Exported {count} records to backup.jsonl\n")
+
+    # Example 2: Export filtered data to JSON
+    print("=== Export Filtered Data ===")
+    count = client.export_to_json("python_docs.jsonl", filters={"category": "python"})
+    print(f"Exported {count} Python documents to python_docs.jsonl\n")
+
+    # Example 3: Export to CSV (without embeddings for readability)
+    print("=== Export to CSV ===")
+    count = client.export_to_csv("documents.csv", include_embeddings=False)
+    print(f"Exported {count} records to documents.csv\n")
+
+    # Example 4: Export to CSV with embeddings
+    print("=== Export to CSV with Embeddings ===")
+    count = client.export_to_csv("documents_full.csv", include_embeddings=True)
+    print(f"Exported {count} records with embeddings to documents_full.csv\n")
+
+    # Example 5: Backup and restore workflow
+    print("=== Backup and Restore Workflow ===")
+
+    # Backup
+    print("Creating backup...")
+    client.export_to_json("backup_full.jsonl")
+
+    # Simulate data loss
+    print("Simulating data loss...")
+    original_count = client.count()
+    for rowid in range(1, original_count + 1):
+        client.delete(rowid)
+    print(f"Records after deletion: {client.count()}")
+
+    # Restore
+    print("Restoring from backup...")
+    count = client.import_from_json("backup_full.jsonl")
+    print(f"Restored {count} records")
+    print(f"Records after restore: {client.count()}\n")
+
+    # Example 6: Data migration scenario
+    print("=== Data Migration Scenario ===")
+
+    # Export from source
+    print("Exporting from source database...")
+    client.export_to_json("migration.jsonl")
+
+    # Import to new database (simulated with same client)
+    print("Importing to destination database...")
+    # In a real scenario, you would create a new client with a different db_path:
+    # new_client = SQLiteVecClient(table="documents", db_path="new.db")
+    # new_client.create_table(dim=128)
+    # new_client.import_from_json("migration.jsonl")
+    print("Migration complete!\n")
+
+    # Example 7: Filtered export for data sharing
+    print("=== Export Subset for Sharing ===")
+    count = client.export_to_csv(
+        "beginner_docs.csv", include_embeddings=False, filters={"level": "beginner"}
+    )
+    print(f"Exported {count} beginner-level documents for sharing\n")
+
+    # Cleanup
+    print("=== Cleanup ===")
+    import os
+
+    for file in [
+        "backup.jsonl",
+        "python_docs.jsonl",
+        "documents.csv",
+        "documents_full.csv",
+        "backup_full.jsonl",
+        "migration.jsonl",
+        "beginner_docs.csv",
+    ]:
+        if os.path.exists(file):
+            os.remove(file)
+            print(f"Removed {file}")
+
+    client.close()
+    print("\nExample complete!")
+
+
+if __name__ == "__main__":
+    main()
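
Example 6 above only sketches the two-database migration in comments. Spelled out with a second client, using only the public API added in this commit (database paths and sample data are illustrative), it would look roughly like this:

```python
from sqlite_vec_client import SQLiteVecClient

# Populate a small source database (dim kept at 128 to match the example above).
source = SQLiteVecClient(table="documents", db_path="source.db")
source.create_table(dim=128, distance="cosine")
source.add(
    texts=["doc one", "doc two"],
    embeddings=[[0.1] * 128, [0.2] * 128],
    metadata=[{"category": "python"}, {"category": "java"}],
)
source.export_to_json("migration.jsonl")
source.close()

# Destination database: create a table with the same dimensionality, then import.
destination = SQLiteVecClient(table="documents", db_path="new.db")
destination.create_table(dim=128, distance="cosine")
restored = destination.import_from_json("migration.jsonl")
print(f"Migrated {restored} records, destination now holds {destination.count()}")
destination.close()
```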

sqlite_vec_client/base.py

Lines changed: 75 additions & 0 deletions
@@ -16,6 +16,7 @@

 import sqlite_vec

+from . import io as io_module
 from .exceptions import ConnectionError as VecConnectionError
 from .exceptions import TableNotFoundError
 from .logger import get_logger
@@ -635,6 +636,80 @@ def similarity_search_with_filter(
                 ) from e
             raise

+    def export_to_json(
+        self,
+        filepath: str,
+        include_embeddings: bool = True,
+        filters: dict[str, Any] | None = None,
+        batch_size: int = 1000,
+    ) -> int:
+        """Export records to JSON Lines format.
+
+        Args:
+            filepath: Path to output file
+            include_embeddings: Whether to include embeddings in export
+            filters: Optional metadata filters to apply
+            batch_size: Number of records to process at once
+
+        Returns:
+            Number of records exported
+        """
+        return io_module.export_to_json(
+            self, filepath, include_embeddings, filters, batch_size
+        )
+
+    def import_from_json(
+        self, filepath: str, skip_duplicates: bool = False, batch_size: int = 1000
+    ) -> int:
+        """Import records from JSON Lines format.
+
+        Args:
+            filepath: Path to input file
+            skip_duplicates: Whether to skip records with existing rowids
+            batch_size: Number of records to import at once
+
+        Returns:
+            Number of records imported
+        """
+        return io_module.import_from_json(self, filepath, skip_duplicates, batch_size)
+
+    def export_to_csv(
+        self,
+        filepath: str,
+        include_embeddings: bool = False,
+        filters: dict[str, Any] | None = None,
+        batch_size: int = 1000,
+    ) -> int:
+        """Export records to CSV format.
+
+        Args:
+            filepath: Path to output file
+            include_embeddings: Whether to include embeddings (as JSON string)
+            filters: Optional metadata filters to apply
+            batch_size: Number of records to process at once
+
+        Returns:
+            Number of records exported
+        """
+        return io_module.export_to_csv(
+            self, filepath, include_embeddings, filters, batch_size
+        )
+
+    def import_from_csv(
+        self, filepath: str, skip_duplicates: bool = False, batch_size: int = 1000
+    ) -> int:
+        """Import records from CSV format.
+
+        Args:
+            filepath: Path to input file
+            skip_duplicates: Whether to skip records with existing rowids
+            batch_size: Number of records to import at once
+
+        Returns:
+            Number of records imported
+        """
+        return io_module.import_from_csv(self, filepath, skip_duplicates, batch_size)
+
     @contextmanager
     def transaction(self) -> Generator[None, None, None]:
         """Context manager for atomic transactions.
