Commit c605cd6

Merge pull request #5 from atasoglu/v2.3.0
feat: add backup and restore functionality
2 parents c2cdd92 + e421652 commit c605cd6

15 files changed

+512
-9
lines changed

.github/workflows/test.yml

Lines changed: 7 additions & 1 deletion
@@ -37,4 +37,10 @@ jobs:
         run: mypy sqlite_vec_client/

       - name: Test with pytest
-        run: pytest --cov=sqlite_vec_client --cov-report=term
+        run: pytest --cov=sqlite_vec_client --cov-report=term-missing --cov-report=xml
+
+      - name: Upload coverage report
+        uses: actions/upload-artifact@v4
+        with:
+          name: coverage-${{ matrix.python-version }}
+          path: coverage.xml

CHANGELOG.md

Lines changed: 19 additions & 0 deletions
@@ -5,6 +5,25 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [2.3.0] - 2025-02-15
+
+### Added
+- High-level `backup()` and `restore()` helpers wrapping JSONL/CSV workflows
+- MkDocs documentation scaffold with API reference, operations playbook, and migration guide
+- Backup/restore coverage in the integration test suite
+
+### Fixed
+- Enforced embedding dimension validation across add/update/search operations
+- `import_from_json()` and `import_from_csv()` now respect `skip_duplicates` and emit clear errors when embeddings are missing
+
+### Documentation
+- New migration guide outlining v2.3.0 changes
+- Expanded README with backup helper examples and coverage instructions
+- Requirements updated with MkDocs to build the documentation locally
+
+### CI
+- Pytest coverage step now generates XML output and uploads `coverage.xml` as a GitHub Actions artifact
+
 ## [2.2.0] - 2025-02-01

 ### Added

README.md

Lines changed: 19 additions & 2 deletions
@@ -16,6 +16,7 @@ A lightweight Python client around [sqlite-vec](https://github.com/asg017/sqlite
 - **Filtering helpers**: Fetch by `rowid`, `text`, or `metadata`.
 - **Pagination & sorting**: List records with `limit`, `offset`, and order.
 - **Bulk operations**: Efficient `update_many()`, `get_all()` generator, and transaction support.
+- **Backup tooling**: High-level `backup()` and `restore()` helpers for disaster recovery workflows.

 ## Requirements
 - Python 3.9+
@@ -97,6 +98,20 @@ client.import_from_json("backup.jsonl")

 See [examples/export_import_example.py](examples/export_import_example.py) for more examples.

+### Quick backup & restore helpers
+
+```python
+# Create a JSONL backup
+client.backup("backup.jsonl")
+
+# Restore later (optionally skip duplicates)
+client.restore("backup.jsonl", skip_duplicates=True)
+
+# Work with CSV
+client.backup("backup.csv", format="csv", include_embeddings=True)
+client.restore("backup.csv", format="csv", skip_duplicates=True)
+```
+
 ## Metadata Filtering

 Efficiently filter records by metadata fields using SQLite's JSON functions:
@@ -227,10 +242,11 @@ pytest -m unit # Unit tests only
 pytest -m integration # Integration tests only
 ```

-**Run with coverage report:**
+**Coverage (terminal + XML for CI):**
 ```bash
-pytest --cov=sqlite_vec_client --cov-report=html
+pytest --cov=sqlite_vec_client --cov-report=term-missing --cov-report=xml
 ```
+The CI workflow uploads the generated `coverage.xml` as an artifact for downstream dashboards.

 **Run specific test file:**
 ```bash
@@ -282,6 +298,7 @@ Edit [benchmarks/config.yaml](benchmarks/config.yaml) to customize:
 - [CONTRIBUTING.md](CONTRIBUTING.md) - Contribution guidelines
 - [CHANGELOG.md](CHANGELOG.md) - Version history
 - [TESTING.md](TESTING.md) - Testing documentation
+- [Docs site (MkDocs)](docs/index.md) - Serve locally with `mkdocs serve`
 - [Examples](examples/) - Usage examples
   - [basic_usage.py](examples/basic_usage.py) - Basic CRUD operations
   - [metadata_filtering.py](examples/metadata_filtering.py) - Metadata filtering and queries

TODO

Lines changed: 4 additions & 4 deletions
@@ -35,8 +35,8 @@
 ### Documentation
 - [x] Create CONTRIBUTING.md
 - [x] Start CHANGELOG.md
-- [ ] API reference documentation (Sphinx or MkDocs)
-- [ ] Migration guide (for version updates)
+- [x] API reference documentation (Sphinx or MkDocs)
+- [x] Migration guide (for version updates)

 ## 🟢 Medium Priority (Development & Tooling)

@@ -75,7 +75,7 @@
 - [x] Export/import functions (JSON, CSV)
 - [ ] Async/await support (aiosqlite)
 - [ ] Table migration utilities
-- [ ] Backup/restore functions
+- [x] Backup/restore functions

 ### API Improvements
 - [x] Optimized methods for bulk operations
@@ -102,7 +102,7 @@

 ## 📊 Metrics & Monitoring

-- [ ] Code coverage tracking
+- [x] Code coverage tracking
 - [ ] Performance metrics
 - [ ] Download statistics (PyPI)
 - [ ] Issue response time tracking

docs/api.md

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
+++ docs/api.md
+# API Reference
+
+The `sqlite_vec_client` package exposes a single high-level class and a few helpers.
+This page captures the behaviour most consumers rely on.
+
+## sqlite_vec_client.SQLiteVecClient
+
+```python
+from sqlite_vec_client import SQLiteVecClient
+```
+
+### Constructor
+
+`SQLiteVecClient(table: str, db_path: str | None = None, pool: ConnectionPool | None = None)`
+
+- Validates table name and establishes a connection (or borrows from the supplied pool).
+- Loads the `sqlite-vec` extension and configures pragmas for performance.
+
+### create_table
+
+`create_table(dim: int, distance: Literal["L1", "L2", "cosine"] = "cosine") -> None`
+
+Creates the base table, vector index, and triggers that keep embeddings in sync.
+
+### add
+
+`add(texts: list[str], embeddings: list[list[float]], metadata: list[dict] | None = None) -> list[int]`
+
+- Validates that all embeddings match the configured dimension.
+- Serialises metadata and embeddings and returns the new rowids.
+
+### similarity_search / similarity_search_with_filter
+
+- Both methods require embeddings that match the table dimension.
+- Filtering variant accepts the same metadata constraints as `filter_by_metadata`.
+
+### backup / restore
+
+High-level helpers that wrap JSONL/CSV export/import:
+
+```python
+client.backup("backup.jsonl")
+client.restore("backup.jsonl")
+
+client.backup("backup.csv", format="csv", include_embeddings=True)
+client.restore("backup.csv", format="csv", skip_duplicates=True)
+```
+
+### Transactions
+
+`with client.transaction(): ...` wraps operations in a BEGIN/COMMIT pair and rolls back on error.
+
+### Connection Management
+
+- `client.close()` returns the connection to the pool (if configured) or closes it outright.
+- Connections emit debug logs to help trace lifecycle events.
+
+## Exceptions
+
+- `VecClientError` — base class for client-specific errors.
+- `ValidationError` — invalid user input.
+- `TableNotFoundError` — operations attempted before `create_table`.
+- `DimensionMismatchError` — embeddings do not match the table dimension.
+
+## Utilities
+
+- `serialize_f32` / `deserialize_f32` convert embeddings to/from blobs.
+- Metadata helpers build safe JSON filter clauses.
+
+Refer to `sqlite_vec_client/utils.py` for implementation details.
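
Note on the Transactions entry in `docs/api.md`: the page describes `client.transaction()` as a BEGIN/COMMIT pair with rollback on error, but the implementation is not part of this diff. The sketch below shows the same pattern with only the standard-library `sqlite3` module. The `transaction` helper, the `docs` table, and the autocommit connection setup are illustrative assumptions, not the library's actual code.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(conn: sqlite3.Connection):
    """Hypothetical helper: run the body between BEGIN and COMMIT, roll back on error."""
    conn.execute("BEGIN")
    try:
        yield conn
    except Exception:
        # Any exception inside the block discards the pending writes.
        conn.execute("ROLLBACK")
        raise
    else:
        conn.execute("COMMIT")


# isolation_level=None puts sqlite3 in autocommit mode, so the explicit
# BEGIN/COMMIT/ROLLBACK statements above are the only transaction control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE docs (text TEXT)")

with transaction(conn):
    conn.execute("INSERT INTO docs VALUES ('kept')")

try:
    with transaction(conn):
        conn.execute("INSERT INTO docs VALUES ('discarded')")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

rows = [row[0] for row in conn.execute("SELECT text FROM docs")]
```

After the second block fails, only the row written by the first block remains, which is the behaviour the API page promises for `with client.transaction(): ...`.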

docs/guides/migration.md

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+++ docs/guides/migration.md
+# Migration Guide
+
+## Upgrading to v2.3.0
+
+### Embedding Dimension Validation
+
+- All write and search operations now validate embedding length against the table
+  dimension. Existing databases created with `create_table` are supported automatically,
+  but manual schemas must follow the `float[dim]` declaration used by `sqlite-vec`.
+- Action: ensure any custom tooling or fixtures produce embeddings with the expected
+  dimension before calling client methods.
+
+### Import Behaviour
+
+- `import_from_json` and `import_from_csv` honour `skip_duplicates`, skipping records
+  whose rowids already exist.
+- Importers now require embeddings to be present; CSV sources exported without the
+  `embedding` column raise a descriptive error.
+- Action: export backups with `include_embeddings=True` if you intend to re-import them.
+
+### Backup & Restore Helpers
+
+- New `backup()` and `restore()` helpers wrap JSONL/CSV workflows and log the format
+  being used. Prefer these helpers for consistent backup scripts.
+
+### Continuous Coverage
+
+- The CI pipeline now uploads `coverage.xml` as an artifact. Configure downstream
+  tooling (Codecov, Sonar, etc.) to consume the artifact if you need external reporting.
+
+## General Advice
+
+- Always run `pytest --cov=sqlite_vec_client --cov-report=xml` before publishing.
+- Keep `requirements-dev.txt` up-to-date locally to build the documentation site with
+  `mkdocs serve`.
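
Note on the "Embedding Dimension Validation" section of the migration guide: custom tooling can apply the same check before calling the client. The sketch below is one way to do that in plain Python. The `validate_embedding_dims` function and the local `DimensionMismatchError` class are hypothetical stand-ins (the real exception's base class and message format are not shown in this diff).

```python
from typing import Sequence


class DimensionMismatchError(ValueError):
    """Local stand-in for the client's exception of the same name (hierarchy assumed)."""


def validate_embedding_dims(embeddings: Sequence[Sequence[float]], dim: int) -> None:
    """Raise if any embedding's length differs from the table dimension."""
    for index, embedding in enumerate(embeddings):
        if len(embedding) != dim:
            raise DimensionMismatchError(
                f"embedding {index} has length {len(embedding)}, expected {dim}"
            )


# Passes silently: every vector has three components.
validate_embedding_dims([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], dim=3)

# Fails: the second vector is one component short.
try:
    validate_embedding_dims([[0.1, 0.2, 0.3], [0.4, 0.5]], dim=3)
except DimensionMismatchError as exc:
    message = str(exc)
```

Running a check like this in fixtures surfaces a mismatched vector with its position in the batch, before any write or search call reaches the database.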

docs/index.md

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+++ docs/index.md
+# sqlite-vec-client Documentation
+
+Welcome to the project documentation. This site complements the information in `README.md`
+and focuses on how to operate the client in real-world scenarios.
+
+## Highlights
+
+- Lightweight CRUD and similarity search API powered by `sqlite-vec`
+- Typed results for safer integrations
+- Bulk operations, metadata filters, and transaction helpers
+- New backup/restore helpers to streamline disaster recovery
+
+## Quick Links
+
+- [API Reference](api.md) — method-by-method contract details
+- [Migration Guide](guides/migration.md) — upgrade notes for the latest releases
+- [Operational Playbook](operations.md) — checklists for testing, backups, and restore
+
+## Building the Docs
+
+```bash
+pip install -r requirements-dev.txt
+mkdocs serve
+```
+
+The site is served at `http://127.0.0.1:8000` by default.

docs/operations.md

Lines changed: 37 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,37 @@
1+
++ docs/operations.md
2+
# Operational Playbook
3+
4+
## Testing
5+
6+
- Run `pytest --cov=sqlite_vec_client --cov-report=term-missing --cov-report=xml`.
7+
- Upload the generated `coverage.xml` as part of your CI artifacts (handled automatically
8+
in the GitHub Actions workflow).
9+
- For environments without the native `sqlite-vec` extension, rely on the mocked tests
10+
planned in the roadmap or disable integration markers temporarily.
11+
12+
## Backups
13+
14+
```python
15+
client.backup("backup.jsonl")
16+
client.backup("backup.csv", format="csv", include_embeddings=True)
17+
```
18+
19+
- JSONL is recommended for long-term storage (embeddings stay in human-readable lists).
20+
- CSV is convenient for spreadsheets but still requires embeddings for restore.
21+
22+
## Restore & Disaster Recovery
23+
24+
```python
25+
client.restore("backup.jsonl")
26+
client.restore("backup.csv", format="csv", skip_duplicates=True)
27+
```
28+
29+
- Use `skip_duplicates=True` when replaying backups into a database that may contain
30+
partial data (e.g., after a failed migration).
31+
32+
## Observability
33+
34+
- Set `SQLITE_VEC_CLIENT_LOG_LEVEL=DEBUG` in the environment to trace connection
35+
lifecycle and queries during incident response.
36+
- Logs include connection open/close events and count of rows processed during imports
37+
and exports.
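
Note on the Backups and Restore sections of the playbook: the semantics of a JSONL round trip with `skip_duplicates` can be illustrated without the client. The sketch below uses an in-memory dict in place of the database. The `backup_jsonl` and `restore_jsonl` functions, the record field names, and the choice to raise on a duplicate when `skip_duplicates` is false are all assumptions made for illustration; the library's actual file layout and error behaviour are not shown in this diff.

```python
import json
import os
import tempfile


def backup_jsonl(records, path):
    """Write one JSON object per line; returns the number of records written."""
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")
    return len(records)


def restore_jsonl(path, store, skip_duplicates=False):
    """Load records into `store` (rowid -> record); returns how many were restored."""
    restored = 0
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            record = json.loads(line)
            if "embedding" not in record:
                raise ValueError("record is missing its embedding")
            if record["rowid"] in store:
                if skip_duplicates:
                    continue  # keep the row that is already present
                raise ValueError(f"rowid {record['rowid']} already exists")
            store[record["rowid"]] = record
            restored += 1
    return restored


records = [
    {"rowid": 1, "text": "alpha", "embedding": [0.5, 0.25]},
    {"rowid": 2, "text": "beta", "embedding": [1.0, 2.0]},
]
path = os.path.join(tempfile.mkdtemp(), "backup.jsonl")
written = backup_jsonl(records, path)

# Simulate a database left with partial data after a failed migration.
store = {1: {"rowid": 1, "text": "alpha", "embedding": [0.5, 0.25]}}
restored = restore_jsonl(path, store, skip_duplicates=True)
```

With `skip_duplicates=True`, the replay fills in only the missing row, which matches the playbook's advice for restoring into a partially populated database.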

mkdocs.yml

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+site_name: sqlite-vec-client
+site_url: https://atasoglu.github.io/sqlite-vec-client/
+repo_url: https://github.com/atasoglu/sqlite-vec-client
+theme:
+  name: mkdocs
+nav:
+  - Overview: index.md
+  - API Reference:
+      - SQLiteVecClient: api.md
+  - Guides:
+      - Migration Guide: guides/migration.md
+  - Operations: operations.md
+markdown_extensions:
+  - admonition

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "sqlite-vec-client"
-version = "2.2.0"
+version = "2.3.0"
 description = "A lightweight Python client around sqlite-vec for CRUD and similarity search."
 readme = "README.md"
 requires-python = ">=3.9"
