Skip to content

Latest commit

 

History

History
35 lines (22 loc) · 2.17 KB

relational-table-schema.md

File metadata and controls

35 lines (22 loc) · 2.17 KB

Relational Table Schema

We introduce the rough row-based encoding format in relational states

In this doc, we will take HashAgg with extreme state (max, min) or value state (sum, count) for example, and introduce a more detailed design for the internal table schema.

Code

Table id

table_id is a globally unique id allocated in meta for each relational table object. Meta is responsible for traversing the Plan Tree and calculating the total number of Relational Tables needed. For example, the Hash Join Operator needs 2, one for the left table and one for the right table. The number of tables needed for Agg depends on the number of agg calls.

Value State (Sum, Count)

Query example:

select sum(v2), count(v3) from t group by v1 

This query will need to initiate 2 Relational Tables. The schema is table_id/group_key.

Extreme State (Max, Min)

Query example:

select max(v2), min(v3) from t group by v1 

This query will need to initiate 2 Relational Tables. If the upstream is not append-only, the schema becomes table_id/group_key/sort_key/upstrea_pk.

The order of sort_key depends on the agg call kind. For example, if it's max(), sort_key will order with Ascending. if it's min(), sort_key will order with Descending. The upstream_pk is also appended to ensure the uniqueness of the key. This design allows the streaming executor not to read all the data from the storage when the cache fails, but only a part of it. The streaming executor will try to write all streaming data to storage, because there may be update or delete operations in the stream, it's impossible to always guarantee correct results without storing all data.

If t is created with append-only flag, the schema becomes table_id/group_key, which is the same for Value State. This is because in the append-only mode, there is no update or delete operation, so the cache will never miss. Therefore, we only need to write one value to the storage.