[2023 Others]: rethink select stmt resolver #40

Open
luooofan opened this issue Oct 27, 2023 · 1 comment

luooofan (Owner) commented Oct 27, 2023

description:

The current implementation passes the test cases, but it has problems, so let me work through it again.

Ideal resolution order:

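  1. FROM clause: check whether each table exists, whether aliases are duplicated, etc., and build a table set -> process_from_clause
  2. WHERE clause: -> create filter stmt
    • Check the condition expression: check fields against the table map, check system functions, resolve subqueries, etc.
  3. GROUP BY clause: -> create group by stmt
    • Check the group by expressions: check fields against the table map, check system functions, etc.; no subqueries here
  4. HAVING clause: -> create filter stmt
    • Check the having expression: the basic checks, plus the group by expressions must be taken into account so that every expression has a single value within each group
  5. ORDER BY clause: -> create order by stmt
    • Check the order by expressions: the basic checks, plus the same group by constraint (every expression must have a single value within each group)
  6. Project clause: -> process project clause
    • Check the project expressions: the basic checks, plus the same group by constraint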
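
As an illustration of step 1, here is a minimal C++ sketch of a FROM-clause check. It is not the project's code: the `Catalog` set of table names, the `FromItem` struct, and the `RC` values are hypothetical stand-ins, and the table map here simply maps each visible name (the alias, or the table name when there is no alias) to the underlying table.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical catalog: just the names of existing tables.
using Catalog = std::set<std::string>;

// One FROM-clause entry; an empty alias means "no alias".
struct FromItem {
  std::string table;
  std::string alias;
};

enum class RC { SUCCESS, TABLE_NOT_EXIST, DUPLICATE_ALIAS };

// Checks every table in the FROM clause and builds:
//   tables    - the underlying tables, in FROM order
//   table_map - visible name (alias, or table name if unaliased) -> table
RC process_from_clause(const Catalog &catalog, const std::vector<FromItem> &from,
                       std::vector<std::string> &tables,
                       std::map<std::string, std::string> &table_map) {
  for (const FromItem &item : from) {
    if (catalog.count(item.table) == 0) {
      return RC::TABLE_NOT_EXIST;  // unknown table
    }
    const std::string &visible = item.alias.empty() ? item.table : item.alias;
    // emplace fails when the visible name is already taken (duplicate alias).
    if (!table_map.emplace(visible, item.table).second) {
      return RC::DUPLICATE_ALIAS;
    }
    tables.push_back(item.table);
  }
  return RC::SUCCESS;
}
```

Later clauses would then resolve a field such as `a.id` by looking up `a` in `table_map`.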

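About extracting expressions: extraction exists so that upper-level operators can get the data they need, and what must be extracted depends on the concrete implementation: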

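  • order by only copies the data that the operators above it need, so it must extract:
    • order by under project: extract every aggr func expr in project and every field expr that is not inside an aggregate function
    • order by under group by: extract every field expr in having, project, and group by; aggr func exprs do not need to be extracted
  • group by must extract the following from having, order by, and project:
    • every aggregate function expression, so the group by operator can compute its result for each group
    • every field expression not inside an aggregate function, so the group by operator can store its single value for each group
    • every field expression (obtainable from the previous two items), for the order by operator inside the group by operator (depending on how order by is implemented)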
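
The extraction rules above can be sketched over a toy expression tree. The `Expr` type and the helpers `field`, `aggr`, `arith`, `extract`, and `extract_all_fields` are hypothetical. `extract` stops at aggregates, because their argument fields are consumed by the aggregate itself, while `extract_all_fields` also descends into aggregates, which is what an order by inside group by may need (per the list, that depends on the order by implementation).

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy expression node; not the project's real expression classes.
struct Expr {
  enum class Kind { Field, Aggr, Arith };
  Kind kind;
  std::string name;  // column name, aggregate function, or operator
  std::vector<std::shared_ptr<Expr>> children;
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr field(std::string col) {
  return std::make_shared<Expr>(Expr{Expr::Kind::Field, std::move(col), {}});
}
ExprPtr aggr(std::string func, ExprPtr arg) {
  return std::make_shared<Expr>(Expr{Expr::Kind::Aggr, std::move(func), {std::move(arg)}});
}
ExprPtr arith(std::string op, ExprPtr lhs, ExprPtr rhs) {
  return std::make_shared<Expr>(
      Expr{Expr::Kind::Arith, std::move(op), {std::move(lhs), std::move(rhs)}});
}

// Collects aggregate expressions (computed once per group) and field
// expressions outside any aggregate (one stored value per group).
// Does not descend into aggregates.
void extract(const ExprPtr &e, std::vector<ExprPtr> &aggr_exprs,
             std::vector<ExprPtr> &field_exprs_not_in_aggr) {
  switch (e->kind) {
    case Expr::Kind::Aggr:
      aggr_exprs.push_back(e);
      break;
    case Expr::Kind::Field:
      field_exprs_not_in_aggr.push_back(e);
      break;
    case Expr::Kind::Arith:
      for (const ExprPtr &child : e->children) {
        extract(child, aggr_exprs, field_exprs_not_in_aggr);
      }
      break;
  }
}

// Collects every field expression, including those inside aggregates.
void extract_all_fields(const ExprPtr &e, std::vector<ExprPtr> &fields) {
  if (e->kind == Expr::Kind::Field) {
    fields.push_back(e);
    return;
  }
  for (const ExprPtr &child : e->children) {
    extract_all_fields(child, fields);
  }
}
```

For `id + COUNT(score)`, `extract` yields `COUNT(score)` and `id`, while `extract_all_fields` yields `id` and `score`.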

PS:

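  • If there is an aggr func expr or a group by field, a group by stmt must be generated
  • A group by stmt can contain an order by stmt (instead of attaching that operator later, when the logical plan tree is built); it is created when there are group by fields
  • When I first implemented join tables, I took a shortcut and did pushdown here in the resolver, which made this part more complex (PS: I later learned that join tables can pass this year without pushdown because the timeout was increased, so the pushdown logic was removed in a later refactor)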
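
The first two PS rules amount to a small decision, sketched here with a hypothetical `decide_groupby` helper and `GroupByDecision` struct. Reading the nested order by as a sort on the group by keys (so that each group arrives contiguously) is an assumption, based on the follow-up pseudocode passing `groupbys` to `orderby_stmt`.

```cpp
#include <cassert>
#include <cstddef>

struct GroupByDecision {
  bool create_groupby_stmt;    // any aggregate or any GROUP BY field
  bool create_nested_orderby;  // only when there are GROUP BY fields
};

// Hypothetical helper encoding the two rules above.
GroupByDecision decide_groupby(bool has_aggr_expr, std::size_t groupby_field_count) {
  const bool has_groupby_fields = groupby_field_count > 0;
  return {has_aggr_expr || has_groupby_fields, has_groupby_fields};
}
```

For example, `SELECT COUNT(*) FROM t` needs a group by stmt (the whole table is one group) but no nested order by.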

Compared with the current implementation, the differences are:

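  • Processing order
  • Each clause does its checks inside its own create function, and two check functions are enough; currently, all expressions are extracted and checked together while processing group by
  • Expressions are extracted while parsing and handed to group by or order by along the way; currently, they are extracted all at once while processing group by and order by, which means extracting multiple times overall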

luooofan (Owner, Author) commented:

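Based on this analysis, here is rough pseudocode.
Each part performs its own semantic analysis inside its own function; where the check logic is the same, the check function can be reused.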

auto tables, table_map
auto normal_check_func = [tables, table_map] (expr) { ... }

rc = process_from_clause(from_relations, tables, table_map, normal_check_func) -> join_tables

rc = create_filter_stmt(conditions, normal_check_func) -> filter_stmt

rc = create_groupby_stmt(groupbys, normal_check_func) -> groupby_stmt, groupby_exprs

auto normal_check_with_groupby = [tables, table_map, groupby_exprs] (expr) { ... }

rc = create_filter_stmt(havings, normal_check_with_groupby) -> having_stmt, aggr_exprs, field_exprs_not_in_aggr
if (groupby_stmt) { groupby_stmt->add_exprs(aggr_exprs, field_exprs_not_in_aggr) }

rc = create_orderby_stmt(orderbys, normal_check_with_groupby) -> orderby_stmt, aggr_exprs, field_exprs_not_in_aggr
if (groupby_stmt) { groupby_stmt->add_exprs(aggr_exprs, field_exprs_not_in_aggr) }

rc = process_project_clause(projects, normal_check_with_groupby) -> project_exprs, aggr_exprs, field_exprs_not_in_aggr
if (groupby_stmt) { groupby_stmt->add_exprs(aggr_exprs, field_exprs_not_in_aggr) }
if (orderby_stmt) { orderby_stmt->add_exprs(aggr_exprs, field_exprs_not_in_aggr) }

return select_stmt(join_tables, filter_stmt, groupby_stmt, having_stmt, orderby_stmt, project_exprs)
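
The two check functions in this flow could look roughly like the C++ sketch below. Everything in it is hypothetical: `Expr`, `TableMap`, `FieldCheck`, `make_normal_check`, `make_check_with_groupby`, and `check_expr`. It simplifies the group by rule to bare fields: a field outside any aggregate is accepted only if it is one of the group by fields, which guarantees a single value per group; a fuller version would also accept whole expressions that match a group by expression. It also assumes the query is grouped; an ungrouped query would use only the normal check.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct Expr {
  enum class Kind { Field, Aggr, Arith };
  Kind kind;
  std::string table;  // Field: visible table name
  std::string name;   // Field: column; Aggr: function; Arith: operator
  std::vector<std::shared_ptr<Expr>> children;
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr field(std::string table, std::string col) {
  return std::make_shared<Expr>(Expr{Expr::Kind::Field, std::move(table), std::move(col), {}});
}
ExprPtr aggr(std::string func, ExprPtr arg) {
  return std::make_shared<Expr>(Expr{Expr::Kind::Aggr, "", std::move(func), {std::move(arg)}});
}

// Hypothetical table map: visible table name -> its columns.
using TableMap = std::map<std::string, std::set<std::string>>;
// A field check receives the field and whether it sits inside an aggregate.
using FieldCheck = std::function<bool(const Expr &, bool)>;

// normal_check_func: the field must resolve against the table map.
FieldCheck make_normal_check(TableMap table_map) {
  return [table_map](const Expr &f, bool /*in_aggr*/) {
    auto it = table_map.find(f.table);
    return it != table_map.end() && it->second.count(f.name) > 0;
  };
}

// normal_check_with_groupby: the normal check, plus a field outside any
// aggregate must be a GROUP BY field so that it has one value per group.
FieldCheck make_check_with_groupby(TableMap table_map, std::vector<ExprPtr> groupby_exprs) {
  FieldCheck normal = make_normal_check(std::move(table_map));
  return [normal, groupby_exprs](const Expr &f, bool in_aggr) {
    if (!normal(f, in_aggr)) return false;
    if (in_aggr) return true;  // may vary within a group; the aggregate folds it
    return std::any_of(groupby_exprs.begin(), groupby_exprs.end(), [&f](const ExprPtr &g) {
      return g->kind == Expr::Kind::Field && g->table == f.table && g->name == f.name;
    });
  };
}

// Walks an expression and applies `check` to every field it contains.
bool check_expr(const ExprPtr &e, const FieldCheck &check, bool in_aggr = false) {
  if (e->kind == Expr::Kind::Field) return check(*e, in_aggr);
  const bool child_in_aggr = in_aggr || e->kind == Expr::Kind::Aggr;
  for (const ExprPtr &child : e->children) {
    if (!check_expr(child, check, child_in_aggr)) return false;
  }
  return true;
}
```

Capturing the table map and group by expressions by value mirrors the pseudocode's `[tables, table_map]` and `[tables, table_map, groupby_exprs]` captures, so the same `check_expr` walker can be reused for every clause.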

create_groupby_stmt(groupbys, normal_check_func) {
  rc = groupbys.for_each(traverse_check(normal_check_func))
  return groupby_stmt(groupbys, orderby_stmt(groupbys))
}

groupby_stmt::add_exprs(aggr_exprs, field_exprs_not_in_aggr) {
  this->add_aggr_exprs(aggr_exprs)
  this->add_field_exprs(field_exprs_not_in_aggr)
  this->orderby_stmt->add_field_exprs(field_exprs_not_in_aggr)
  this->orderby_stmt->add_field_exprs(extract_field_exprs(aggr_exprs))
}
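
A possible shape for `groupby_stmt::add_exprs`, as a hypothetical C++ sketch (`Expr`, `same_expr`, `add_unique`, `OrderByStmt`, and `GroupByStmt` are all illustrative). Because HAVING, ORDER BY, and the project list may each hand over the same expression, this version deduplicates with a structural comparison; that is an assumption, not something the issue specifies. It forwards fields to the nested order by only when one exists.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Expr {
  enum class Kind { Field, Aggr, Arith };
  Kind kind;
  std::string table;  // Field only
  std::string name;   // column, aggregate function, or operator
  std::vector<std::shared_ptr<Expr>> children;
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr field(std::string table, std::string col) {
  return std::make_shared<Expr>(Expr{Expr::Kind::Field, std::move(table), std::move(col), {}});
}
ExprPtr aggr(std::string func, ExprPtr arg) {
  return std::make_shared<Expr>(Expr{Expr::Kind::Aggr, "", std::move(func), {std::move(arg)}});
}

// Structural equality, so the same expression coming from several clauses is stored once.
bool same_expr(const Expr &a, const Expr &b) {
  if (a.kind != b.kind || a.table != b.table || a.name != b.name ||
      a.children.size() != b.children.size()) {
    return false;
  }
  for (std::size_t i = 0; i < a.children.size(); ++i) {
    if (!same_expr(*a.children[i], *b.children[i])) return false;
  }
  return true;
}

void add_unique(std::vector<ExprPtr> &dst, const ExprPtr &e) {
  bool present = std::any_of(dst.begin(), dst.end(),
                             [&e](const ExprPtr &x) { return same_expr(*x, *e); });
  if (!present) dst.push_back(e);
}

// extract_field_exprs: every field, including those inside aggregates.
void extract_field_exprs(const ExprPtr &e, std::vector<ExprPtr> &out) {
  if (e->kind == Expr::Kind::Field) {
    add_unique(out, e);
    return;
  }
  for (const ExprPtr &child : e->children) extract_field_exprs(child, out);
}

struct OrderByStmt {
  std::vector<ExprPtr> field_exprs;  // fields the sort must carry along
  void add_field_exprs(const std::vector<ExprPtr> &exprs) {
    for (const ExprPtr &e : exprs) add_unique(field_exprs, e);
  }
};

struct GroupByStmt {
  std::vector<ExprPtr> aggr_exprs;            // computed per group
  std::vector<ExprPtr> field_exprs;           // one stored value per group
  std::unique_ptr<OrderByStmt> orderby_stmt;  // null without GROUP BY fields

  void add_exprs(const std::vector<ExprPtr> &aggrs,
                 const std::vector<ExprPtr> &fields_not_in_aggr) {
    for (const ExprPtr &a : aggrs) add_unique(aggr_exprs, a);
    for (const ExprPtr &f : fields_not_in_aggr) add_unique(field_exprs, f);
    if (!orderby_stmt) return;
    // The nested order by must carry both kinds of fields through the sort.
    orderby_stmt->add_field_exprs(fields_not_in_aggr);
    std::vector<ExprPtr> aggr_fields;
    for (const ExprPtr &a : aggrs) extract_field_exprs(a, aggr_fields);
    orderby_stmt->add_field_exprs(aggr_fields);
  }
};
```

Calling `add_exprs` twice with `SUM(t.score)` and `t.id` leaves one aggregate and one field in the group by stmt, and `t.id` plus `t.score` in the nested order by.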

Time is limited and the idea may still be incomplete, so I won't refactor this part.
