Optimize sqlite database size and speed by bgyori · Pull Request #168 · gyorilab/gilda

bgyori · 2026-05-29T19:48:01Z

This PR optimizes the sqlite-based back-end for Gilda. Previously, each norm_text was stored as one row, with the list of corresponding Terms JSON-serialized into a single value in a second column. This is convenient and maps intuitively onto the Dict[str, List[Term]] data structure the sqlite adapter stands in for, but the JSON keys are stored redundantly for every Term, which takes up a large amount of space.

In this PR, the sqlite schema is changed so that there is one row per Term and each Term attribute has its own column. This cuts the db size roughly in half (old: 655MB, new: 331MB), and the net speed of retrieval (i.e., norm_text -> list of deserialized Term objects) is also slightly improved.
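To illustrate the difference between the two layouts, here is a minimal sketch using Python's built-in sqlite3 module. It is not the code from this PR: the `Term` dataclass is a simplified stand-in with a hypothetical subset of attributes, the table and index names (`terms_json`, `terms`, `norm_text_idx`) and the example values are made up, and the helper functions `lookup_json` and `lookup_columns` exist only for illustration.

```python
import json
import sqlite3
from dataclasses import asdict, astuple, dataclass
from typing import List


@dataclass
class Term:
    # Simplified, hypothetical subset of attributes; not Gilda's actual Term class
    norm_text: str
    text: str
    db: str
    id: str
    entry_name: str
    status: str
    source: str


# Placeholder example data, not real grounding entries
terms = [
    Term("mek", "MEK", "DB1", "ID1", "Entry one", "curated", "source1"),
    Term("mek", "MEK", "DB2", "ID2", "Entry two", "synonym", "source2"),
]

conn = sqlite3.connect(":memory:")

# Old layout: one row per norm_text, all of its Terms serialized into one JSON value.
# Every serialized Term repeats all of the attribute names as JSON keys.
conn.execute("CREATE TABLE terms_json (norm_text TEXT PRIMARY KEY, terms TEXT)")
conn.execute(
    "INSERT INTO terms_json VALUES (?, ?)",
    ("mek", json.dumps([asdict(t) for t in terms])),
)

# New layout: one row per Term, one column per attribute, indexed by norm_text.
conn.execute(
    "CREATE TABLE terms (norm_text TEXT, text TEXT, db TEXT, id TEXT, "
    "entry_name TEXT, status TEXT, source TEXT)"
)
conn.execute("CREATE INDEX norm_text_idx ON terms (norm_text)")
conn.executemany(
    "INSERT INTO terms VALUES (?, ?, ?, ?, ?, ?, ?)",
    [astuple(t) for t in terms],
)


def lookup_json(norm_text: str) -> List[Term]:
    """Retrieve Terms from the old layout by parsing the JSON value."""
    row = conn.execute(
        "SELECT terms FROM terms_json WHERE norm_text = ?", (norm_text,)
    ).fetchone()
    return [Term(**d) for d in json.loads(row[0])] if row else []


def lookup_columns(norm_text: str) -> List[Term]:
    """Retrieve Terms from the new layout by building one Term per row."""
    rows = conn.execute(
        "SELECT * FROM terms WHERE norm_text = ? ORDER BY rowid", (norm_text,)
    ).fetchall()
    return [Term(*row) for row in rows]
```

Both helpers return the same list of Term objects for a given norm_text. In the column layout the attribute names live only in the schema rather than in every stored value, which is the redundancy the PR description refers to.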

bgyori added 3 commits May 29, 2026 15:21

Change sqlite schema to save space/time

77374b8

Bump version

2558301

Simplify and optimize some more

9951fd6

bgyori merged commit aec566e into master May 29, 2026
4 checks passed

bgyori merged 3 commits into master from sqlite_optim