Optimize sqlite database size and speed#168

Merged
bgyori merged 3 commits into master from sqlite_optim
May 29, 2026

bgyori commented May 29, 2026

This PR optimizes the sqlite-based back-end for Gilda. The previous approach stored each norm_text as one row, with the list of corresponding Terms JSON-serialized into a single value in a second column. This is convenient and maps intuitively onto the Dict[str, List[Term]] data structure that the sqlite adapter stands in for. However, the JSON keys are repeated for every serialized Term, which takes up a large amount of space.

In this PR, the sqlite schema is changed so that there is one row per Term and each Term attribute has its own column. This cuts the db size roughly in half (old: 655MB, new: 331MB), and the net speed of retrieval (i.e., norm_text -> list of deserialized Term objects) is also slightly improved.
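
To illustrate the difference between the two layouts, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the code from this PR: the `Term` namedtuple, its reduced set of fields, the table and index names, and the helper functions are all assumptions made for illustration.

```python
import json
import sqlite3
from collections import namedtuple

# Hypothetical, simplified stand-in for Gilda's Term class (assumed fields).
Term = namedtuple(
    "Term", ["norm_text", "text", "db", "id", "entry_name", "status", "source"]
)


def build_json_db(terms):
    """Old-style layout: one row per norm_text, Terms stored as a JSON list."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE terms (norm_text TEXT PRIMARY KEY, terms TEXT)")
    grouped = {}
    for term in terms:
        # Every serialized Term repeats all of the attribute names as JSON keys.
        grouped.setdefault(term.norm_text, []).append(term._asdict())
    conn.executemany(
        "INSERT INTO terms VALUES (?, ?)",
        [(key, json.dumps(value)) for key, value in grouped.items()],
    )
    conn.commit()
    return conn


def build_column_db(terms):
    """New-style layout: one row per Term, one column per Term attribute."""
    conn = sqlite3.connect(":memory:")
    columns = ", ".join(f"{field} TEXT" for field in Term._fields)
    conn.execute(f"CREATE TABLE terms ({columns})")
    # Index on norm_text so that lookups do not scan the whole table.
    conn.execute("CREATE INDEX norm_idx ON terms (norm_text)")
    placeholders = ", ".join("?" for _ in Term._fields)
    conn.executemany(f"INSERT INTO terms VALUES ({placeholders})", terms)
    conn.commit()
    return conn


def lookup_json(conn, norm_text):
    """Fetch the single row for norm_text and deserialize its JSON list."""
    row = conn.execute(
        "SELECT terms FROM terms WHERE norm_text = ?", (norm_text,)
    ).fetchone()
    return [Term(**entry) for entry in json.loads(row[0])] if row else []


def lookup_columns(conn, norm_text):
    """Fetch all rows for norm_text and build one Term per row."""
    rows = conn.execute(
        "SELECT * FROM terms WHERE norm_text = ?", (norm_text,)
    ).fetchall()
    return [Term(*row) for row in rows]
```

In this sketch both lookups return the same Terms for a given norm_text; the column layout simply avoids writing the attribute names into every stored value and skips the JSON parsing step on retrieval.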

bgyori merged commit aec566e into master on May 29, 2026
4 checks passed