Aggregate Table Compaction #7

Open
ayuhito opened this issue Apr 26, 2024 · 0 comments
Labels
core (Related to core) · performance (Performance-related)

Comments


ayuhito commented Apr 26, 2024

Data is usually not modified after being appended, especially once it is more than 24 hours old. It might make sense to run background jobs that compact all similar rows into an aggregate table partitioned by hour.

While we might need minute-by-minute granularity for our timestamps for the first 24 hours, I think it is acceptable to reduce the granularity we store past 24 hours to hourly chunks instead. However, we will have to limit the API for any queries requesting minute-granularity data past the 24-hour mark. The benefit is that this would significantly improve performance and storage usage for larger sites, as we would no longer need to store as much duplicate data or process as many rows.
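For illustration, here is a minimal sketch of the hourly rollup, assuming a hypothetical `events(ts, hostname, pathname, visitors)` schema (the actual table and column names will differ):

```sql
-- Hypothetical schema: events(ts TIMESTAMP, hostname TEXT, pathname TEXT, visitors INTEGER).
-- Collapse minute-level rows older than 24 hours into one row per hour.
SELECT
    date_trunc('hour', ts) AS hour,
    hostname,
    pathname,
    SUM(visitors) AS visitors
FROM events
WHERE ts < now() - INTERVAL 24 HOURS
GROUP BY hour, hostname, pathname;
```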

DuckDB does not support aggregate tables, so we'll have to run our own compaction jobs, which are not a particularly heavy operation. For reference, here is the documentation for StarRocks' and ClickHouse's natively supported aggregate tables.
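A compaction pass could then be a single transaction that folds the old rows into the aggregate table and deletes them from the raw table. This is only a sketch under the same hypothetical schema, with `events_hourly` as a placeholder name for the aggregate table; since DuckDB's `now()` is fixed at the start of the transaction, both statements see the same cutoff:

```sql
-- Hypothetical compaction job, run periodically in the background.
BEGIN TRANSACTION;

-- Fold rows older than 24 hours into the hourly aggregate table.
INSERT INTO events_hourly (hour, hostname, pathname, visitors)
SELECT date_trunc('hour', ts), hostname, pathname, SUM(visitors)
FROM events
WHERE ts < now() - INTERVAL 24 HOURS
GROUP BY 1, 2, 3;

-- Remove the now-compacted rows from the raw table.
DELETE FROM events
WHERE ts < now() - INTERVAL 24 HOURS;

COMMIT;
```

If the job could ever process overlapping windows, the insert would need to merge into existing hourly rows (e.g. via an upsert on the hour key) rather than blindly appending; with a strict cutoff-and-delete scheme as above, each raw row is compacted exactly once.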
