-
Notifications
You must be signed in to change notification settings - Fork 12
Description
Issue
With the following schema, unexpected load happens to the tables which don't have a parent-child relationship with a target table.
CREATE TABLE l1_P (
Id INT64,
) PRIMARY KEY(Id);
CREATE TABLE l1_P_l2_A (
Id INT64,
) PRIMARY KEY(Id),
INTERLEAVE IN PARENT l1_P ON DELETE NO ACTION;
CREATE TABLE l1_P_l2_B (
Id INT64,
) PRIMARY KEY(Id),
INTERLEAVE IN PARENT l1_P ON DELETE NO ACTION;
CREATE TABLE l1_P_l2_B_l3_X (
Id INT64,
) PRIMARY KEY(Id),
INTERLEAVE IN PARENT l1_P_l2_B ON DELETE NO ACTION;
CREATE TABLE l1_P_l2_B_l3_Y (
Id INT64,
) PRIMARY KEY(Id),
INTERLEAVE IN PARENT l1_P_l2_B ON DELETE NO ACTION;
CREATE TABLE l1_P_l2_C (
Id INT64,
) PRIMARY KEY(Id),
INTERLEAVE IN PARENT l1_P ON DELETE NO ACTION;
# Case 1)
Expected:
- l1_P
- l1_P_l2_A
$ ./gcsb load -p MY_PROJECT -i gcsbtest -d db1 -t l1_P_l2_A -o 1
:
2023/01/10 21:29:07 Executing load phase
2023/01/10 21:29:07 +-----------+------------+------+-------+---------+
2023/01/10 21:29:07 | TABLE | OPERATIONS | READ | WRITE | CONTEXT |
2023/01/10 21:29:07 +-----------+------------+------+-------+---------+
2023/01/10 21:29:07 | l1_P_l2_A | 5 | N/A | N/A | LOAD |
2023/01/10 21:29:07 | l1_P | 1 | N/A | N/A | LOAD |
2023/01/10 21:29:07 | l1_P_l2_C | 5 | N/A | N/A | LOAD |
2023/01/10 21:29:07 +-----------+------------+------+-------+---------+
2023/01/10 21:29:07 +-----------------------+-------+--------------+--------------+--------------+-----------+--------------+--------------+--------------+
2023/01/10 21:29:07 | METRIC | COUNT | MIN | MAX | MEAN | STDDEV | MEDIAN | 95% | 99% |
2023/01/10 21:29:07 +-----------------------+-------+--------------+--------------+--------------+-----------+--------------+--------------+--------------+
2023/01/10 21:29:07 | schema.inference | 1 | 6.955609345s | 6.955609345s | 6.955609345s | 0s | 6.955609345s | 6.955609345s | 6.955609345s |
2023/01/10 21:29:07 | run | 1 | 43.32597ms | 43.32597ms | 43.32597ms | 0s | 43.32597ms | 43.32597ms | 43.32597ms |
2023/01/10 21:29:07 | operations.read.data | 0 | 0s | 0s | 0s | 0s | 0s | 0s | 0s |
2023/01/10 21:29:07 | operations.read.time | 0 | 0s | 0s | 0s | 0s | 0s | 0s | 0s |
2023/01/10 21:29:07 | operations.write.data | 11 | 4.421µs | 57.797µs | 21.242µs | 17.532µs | 10.785µs | 57.797µs | 57.797µs |
2023/01/10 21:29:07 | operations.write.time | 11 | 16.706932ms | 36.485169ms | 25.84112ms | 5.87515ms | 25.223941ms | 36.485169ms | 36.485169ms |
2023/01/10 21:29:07 +-----------------------+-------+--------------+--------------+--------------+-----------+--------------+--------------+--------------+
# Case 2)
Expected
- l1_P
- l1_P_l2_B
- l1_P_l2_B_l3_X
$ ./gcsb load -p MY_PROJECT -i gcsbtest -d db1 -t l1_P_l2_B_l3_X -o 1
:
2023/01/10 21:29:42 Executing load phase
2023/01/10 21:29:42 +----------------+------------+------+-------+---------+
2023/01/10 21:29:42 | TABLE | OPERATIONS | READ | WRITE | CONTEXT |
2023/01/10 21:29:42 +----------------+------------+------+-------+---------+
2023/01/10 21:29:42 | l1_P_l2_B_l3_X | 5 | N/A | N/A | LOAD |
2023/01/10 21:29:42 | l1_P | 1 | N/A | N/A | LOAD |
2023/01/10 21:29:42 | l1_P_l2_C | 5 | N/A | N/A | LOAD |
2023/01/10 21:29:42 +----------------+------------+------+-------+---------+
2023/01/10 21:29:42 +-----------------------+-------+--------------+--------------+--------------+------------+--------------+--------------+--------------+
2023/01/10 21:29:42 | METRIC | COUNT | MIN | MAX | MEAN | STDDEV | MEDIAN | 95% | 99% |
2023/01/10 21:29:42 +-----------------------+-------+--------------+--------------+--------------+------------+--------------+--------------+--------------+
2023/01/10 21:29:42 | schema.inference | 1 | 6.292371079s | 6.292371079s | 6.292371079s | 0s | 6.292371079s | 6.292371079s | 6.292371079s |
2023/01/10 21:29:42 | run | 1 | 30.534362ms | 30.534362ms | 30.534362ms | 0s | 30.534362ms | 30.534362ms | 30.534362ms |
2023/01/10 21:29:42 | operations.read.data | 0 | 0s | 0s | 0s | 0s | 0s | 0s | 0s |
2023/01/10 21:29:42 | operations.read.time | 0 | 0s | 0s | 0s | 0s | 0s | 0s | 0s |
2023/01/10 21:29:42 | operations.write.data | 11 | 4.999µs | 44.35µs | 22.819µs | 15.622µs | 29.294µs | 44.35µs | 44.35µs |
2023/01/10 21:29:42 | operations.write.time | 11 | 15.929295ms | 23.794854ms | 20.231022ms | 2.644308ms | 20.875363ms | 23.794854ms | 23.794854ms |
2023/01/10 21:29:42 +-----------------------+-------+--------------+--------------+--------------+------------+--------------+--------------+--------------+
# Case 3)
Expected:
- l1_P
- l1_P_l2_B
$ ./gcsb load -p MY_PROJECT -i gcsbtest -d db1 -t l1_P_l2_B -o 1
:
2023/01/10 21:33:11 Executing load phase
2023/01/10 21:33:11 +-----------+------------+------+-------+---------+
2023/01/10 21:33:11 | TABLE | OPERATIONS | READ | WRITE | CONTEXT |
2023/01/10 21:33:11 +-----------+------------+------+-------+---------+
2023/01/10 21:33:11 | l1_P_l2_B | 5 | N/A | N/A | LOAD |
2023/01/10 21:33:11 | l1_P | 1 | N/A | N/A | LOAD |
2023/01/10 21:33:11 | l1_P_l2_C | 5 | N/A | N/A | LOAD |
2023/01/10 21:33:11 +-----------+------------+------+-------+---------+
2023/01/10 21:33:11 +-----------------------+-------+--------------+--------------+--------------+------------+--------------+--------------+--------------+
2023/01/10 21:33:11 | METRIC | COUNT | MIN | MAX | MEAN | STDDEV | MEDIAN | 95% | 99% |
2023/01/10 21:33:11 +-----------------------+-------+--------------+--------------+--------------+------------+--------------+--------------+--------------+
2023/01/10 21:33:11 | schema.inference | 1 | 6.686091806s | 6.686091806s | 6.686091806s | 0s | 6.686091806s | 6.686091806s | 6.686091806s |
2023/01/10 21:33:11 | run | 1 | 34.79919ms | 34.79919ms | 34.79919ms | 0s | 34.79919ms | 34.79919ms | 34.79919ms |
2023/01/10 21:33:11 | operations.read.data | 0 | 0s | 0s | 0s | 0s | 0s | 0s | 0s |
2023/01/10 21:33:11 | operations.read.time | 0 | 0s | 0s | 0s | 0s | 0s | 0s | 0s |
2023/01/10 21:33:11 | operations.write.data | 11 | 30.599µs | 41.917µs | 35.147µs | 3.826µs | 34.844µs | 41.917µs | 41.917µs |
2023/01/10 21:33:11 | operations.write.time | 11 | 15.64244ms | 25.150921ms | 22.768607ms | 2.641282ms | 23.397655ms | 25.150921ms | 25.150921ms |
2023/01/10 21:33:11 +-----------------------+-------+--------------+--------------+--------------+------------+--------------+--------------+--------------+
Cause
(t *table) GetAllRelationNames() relies on child table information of an apex table.
Lines 359 to 369 in 0a71c86
| func (t *table) GetAllRelationNames() []string { | |
| apex := t.GetApex() | |
| ret := []string{apex.Name()} | |
| child := apex.Child() | |
| for ok := true; ok; ok = (child != nil) { | |
| ret = append(ret, child.Name()) | |
| child = child.Child() | |
| } | |
| return ret |
The child table gets overwritten whenever a different interleaved table is found in creating parental relationships.
Lines 87 to 104 in 0a71c86
| func (t *tables) Traverse() error { | |
| // Iterate over tables setting parental relationships | |
| for _, child := range t.tables { | |
| if child.ParentName() != "" { | |
| // fetch the parent table | |
| parent := t.GetTable(child.ParentName()) | |
| if parent == nil { | |
| return fmt.Errorf("table '%s' references a parent table '%s' that is not in information schema", child.Name(), child.ParentName()) | |
| } | |
| // Set parent as this tables parent | |
| child.SetParent(parent) | |
| // Set parents child | |
| parent.SetChildName(child.Name()) | |
| parent.SetChild(child) | |
| } | |
| } |
Therefore,
l1_P_l2_C eventually becomes the child of the table l1_P. This info was always used on the cases above where loading l1_P_l2_C was not needed in the interleaving chain for a target table.
Moreover, if the unrelated l1_P_l2_C has some other child tables, load operations happen to the descendant tables as well.
The issue is troublesome to in terms of the workload and execution time. And this is risky when the unexpected load happen to an existing production DB.
Solution
It doesn't look appropriate to define and rely on a 1:1 parent-child relationship because the actual relationship is 1:many.
So the needed work is:
- Use parent info instead of child info in GetAllRelationNames.
- Change the table struct so that it can contain multiple children or just count the number of child tables. It seems the child info is stored to detemine if a table is interleaved or not.
Line 76 in 0a71c86
child Table
Lines 323 to 325 in 0a71c86
// IsInterleaved will return true if the table has a parent or child func (t *table) IsInterleaved() bool { return t.HasChild() || t.HasParent()
If this is the only purpose to store the child info, just store the number of child tables should be enough instead of having thesecandchild.
Lines 75 to 76 in 0a71c86
c string // child name child Table
Remarks
- This issue can also be the cause of an implied bug and the workaround at https://github.com/cloudspannerecosystem/gcsb/blob/master/pkg/workload/core.go#L165-L167
if n == t { // Avoid inserting t twice for some reason... i dont have time to figure out why this is happenign
continue
}
- Another weird thing is why Case 2) succeeds even though there was no load operation to l1_P_l2_B according to the stats.