Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[2023 Case Failed]: aggregation-func & group-by & complex-sub-query #38

Closed
luooofan opened this issue Oct 27, 2023 · 2 comments · Fixed by #42
Closed

[2023 Case Failed]: aggregation-func & group-by & complex-sub-query #38

luooofan opened this issue Oct 27, 2023 · 2 comments · Fixed by #42

Comments

@luooofan
Copy link
Owner

luooofan commented Oct 27, 2023

base on:

description:
先做的 aggr func,没有 group by,这个时候对空数据行的聚集函数运算结果应该是 0 或者 null

select count(id) from exp_table where 6/1*6 < 7+col3*col3/2;
- 0
-- below are some requests executed before(partial) --
-- init data
create table exp_table(id int, col1 int, col2 int, col3 float, col4 float);
create table exp_table2(id int, col1 int);
insert into exp_table VALUES (6, 9, 1, 4.93, 8.98);
insert into exp_table VALUES (3, 2, 6, 2.05, 1.24);
insert into exp_table VALUES (7, 4, 9, 3.32, 8.98);
...

后来有一次提测 group by 挂了:

select id, sum(score) from t_group_by where id>9 group by id;
+ | NULL
-- below are some requests executed before(partial) --
-- init data
create table t_group_by (id int not null, score float not null, name char(1) null);
create table t_group_by_2 (id int not null, age int not null);
insert into t_group_by VALUES (4, 3.40, 'U');
insert into t_group_by VALUES (3, 4.06, 'V');
insert into t_group_by VALUES (4, 3.11, 'F');
...

整理后

create table t_group_by (id int not null, score float not null, name char(1) null);
create table t_group_by_2 (id int not null, age int not null);
insert into t_group_by VALUES (4, 3.40, 'U');
insert into t_group_by VALUES (3, 4.06, 'V');
select id, sum(score) from t_group_by where id>9 group by id;
create table t1(c1 int primary key, c2 int);
select count(c2) from t1 where c1 > 2 group by c1;
select max(c1) from t1 where c1 > 2 group by c1;
select c1, max(c1) from t1 where c1 > 2 group by c1;
select c1, max(c2) from t1 where c1 > 2 group by c1;
insert into t1 values(1,2),(2,3),(3,4),(0,1),(4,NULL),(5,NULL),(6,NULL);
select count(c2) from t1 where c1 > 2 group by c1;
select max(c1) from t1 where c1 > 2 group by c1;
select c1, max(c1) from t1 where c1 > 2 group by c1;
select c1, max(c2) from t1 where c1 > 2 group by c1;
select count(c2) from t1 where c1 > 20 group by c1;
select max(c1) from t1 where c1 > 20 group by c1;
select c1, max(c1) from t1 where c1 > 20 group by c1;
select c1, max(c2) from t1 where c1 > 20 group by c1;

与 mysql 对比测试发现,带 group by 的情况下,空数据行应该不返回结果

但我们直接把它改成了空数据行的情况下返回 RECORD_EOF

然后出现了新的问题:

select * from csq_1 where feat1 <> (select min(csq_2.feat2) from csq_2 where csq_2.feat2 > csq_1.feat1);
- 61 | 35 | 8.64
- 68 | 37 | 63.64
- 85 | 79 | 61.71
- 98 | 25 | 58.94
-- below are some requests executed before(partial) --
-- init data
CREATE TABLE csq_1(id int, col1 int, feat1 float);
CREATE TABLE csq_2(id int, col2 int, feat2 float);
CREATE TABLE csq_3(id int, col3 int, feat3 float);
CREATE TABLE csq_4(id int, col4 int, feat4 float);
INSERT INTO csq_1 VALUES (96, 94, 86.30);
...

由于子查询中聚集函数对空数据行进行计算直接返回 RECORD_EOF,父查询处理谓词的时候直接向上返回 RECORD_EOF,所以少处理了一部分数据

image

@luooofan
Copy link
Owner Author

luooofan commented Oct 27, 2023

总结:

  • 没有 group by 子句时,聚集函数处理空数据行应该返回 0 或者 null
  • 有 group by 子句时,聚集函数处理空数据行应该什么都不返回
  • 父子查询中,普通子查询(非 IN EXISTS 子查询)返回空时应该设置 null

@luooofan luooofan changed the title [2023 Case Failed]: aggregation-func & group-by [2023 Case Failed]: aggregation-func & group-by & complex-sub-query Oct 27, 2023
@luooofan
Copy link
Owner Author

RC GroupByPhysicalOperator::next()
{
if (is_record_eof_) {
return RC::RECORD_EOF;
}
RC rc = RC::SUCCESS;
if (is_first_) {
rc = children_[0]->next();
// maybe empty. count(x) -> 0
if (RC::SUCCESS != rc) {
if (RC::RECORD_EOF == rc) {
is_record_eof_ = true;
if (groupby_fields_.empty()) {
tuple_.do_aggregate_done();
return RC::SUCCESS;
}
}
return rc;
}

auto get_value = [&tuple](const std::unique_ptr<Expression>& expr, Value& value) {
RC rc = expr->get_value(tuple, value);
if (expr->type() == ExprType::SUBQUERY && RC::RECORD_EOF == rc) {
value.set_null();
rc = RC::SUCCESS;
}
return rc;
};
rc = get_value(left_, left_value);
if (rc != RC::SUCCESS) {
LOG_WARN("failed to get value of left expression. rc=%s", strrc(rc));
return rc;
}
if (left_subquery_expr && left_subquery_expr->has_more_row(tuple)) {
return RC::INVALID_ARGUMENT;
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant