Aggregate Table Compaction #7

Open
ayuhito opened this issue Apr 26, 2024 · 0 comments
Labels
core (Related to core) · performance (Performance-related)

Comments


ayuhito commented Apr 26, 2024

Data is usually not modified after being appended, especially once it is more than 24 hours old. It might make sense to run background jobs that compact all similar rows into an aggregate table partitioned by hour.

While we might need minute-by-minute granularity for our timestamps for the first 24 hours, I think it is acceptable to reduce the granularity we store past 24 hours to hourly chunks instead. However, we will have to limit the API for any queries requesting minute-granularity data past the 24-hour mark. The benefit is that this would significantly improve performance and storage usage for larger sites, as we would no longer need to store as much duplicate data or process as many rows.
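For illustration, here is a minimal sketch of the hourly rollup, assuming a hypothetical `events(ts, hostname, pathname, visitors)` schema (the actual table and column names will differ):

```sql
-- Hypothetical schema: events(ts TIMESTAMP, hostname TEXT, pathname TEXT, visitors INTEGER).
-- Collapse minute-level rows older than 24 hours into one row per hour.
SELECT
    date_trunc('hour', ts) AS hour,
    hostname,
    pathname,
    SUM(visitors) AS visitors
FROM events
WHERE ts < now() - INTERVAL 24 HOURS
GROUP BY hour, hostname, pathname;
```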

DuckDB does not support aggregate tables, so we'll have to run our own compaction jobs, which are not a particularly heavy operation. For reference, here is the documentation for StarRocks' and ClickHouse's natively supported aggregate tables.
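A compaction pass could then be a single transaction that folds the old rows into the aggregate table and deletes them from the raw table. This is only a sketch under the same hypothetical schema, with `events_hourly` as a placeholder name for the aggregate table; since DuckDB's `now()` is fixed at the start of the transaction, both statements see the same cutoff:

```sql
-- Hypothetical compaction job, run periodically in the background.
BEGIN TRANSACTION;

-- Fold rows older than 24 hours into the hourly aggregate table.
INSERT INTO events_hourly (hour, hostname, pathname, visitors)
SELECT date_trunc('hour', ts), hostname, pathname, SUM(visitors)
FROM events
WHERE ts < now() - INTERVAL 24 HOURS
GROUP BY 1, 2, 3;

-- Remove the now-compacted rows from the raw table.
DELETE FROM events
WHERE ts < now() - INTERVAL 24 HOURS;

COMMIT;
```

If the job could ever process overlapping windows, the insert would need to merge into existing hourly rows (e.g. via an upsert on the hour key) rather than blindly appending; with a strict cutoff-and-delete scheme as above, each raw row is compacted exactly once.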
