[2023 Topic]: Sort #30

Jingluo-nan · 2023-10-24T02:35:53Z

topic: Sort

description:
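Last year's implementation copied every record into a cache when the sort operator was opened, forming a record vector. It then sorted pairs of (sort columns, record index) and returned records upward by reading the cached record vector in the sorted index order.
This year, to handle big order, the cached records should store only what the select clause needs to output. We therefore need to extract every FieldExpr in the select clause and pass them to the order by operator.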
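
To make the index-based sort described above concrete, here is a minimal sketch. It is not the project's code: `CachedRow`, `sorted_order`, and `emit_sorted` are hypothetical names, the sort key is simplified to a single `int` per row, and the cached row is assumed to hold only the projected values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical stand-in for a cached record: only the columns the select clause outputs.
struct CachedRow {
  std::vector<std::string> projected;
};

// Returns row indices in ascending key order; equal keys keep their input order.
// Only the small index vector is permuted, never the cached rows themselves.
std::vector<std::size_t> sorted_order(const std::vector<int> &keys) {
  std::vector<std::size_t> order(keys.size());
  std::iota(order.begin(), order.end(), 0);
  std::stable_sort(order.begin(), order.end(),
                   [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
  return order;
}

// Reads the cached rows in sorted index order, as the operator would when returning rows upward.
std::vector<CachedRow> emit_sorted(const std::vector<CachedRow> &rows, const std::vector<int> &keys) {
  assert(rows.size() == keys.size());
  std::vector<CachedRow> out;
  out.reserve(rows.size());
  for (std::size_t idx : sorted_order(keys)) {
    out.push_back(rows[idx]);
  }
  return out;
}
```

In a real operator the key would be the tuple of order by column values and the comparison would respect ASC/DESC per column; the sketch only shows the shape of sorting indices and fetching from the cache afterwards.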

Jingluo-nan · 2023-10-26T03:21:51Z

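For big order, we found that after Orderby finishes sorting, ProjectTuple takes too long to fetch the data, which causes the timeout.

The reason is that every call to FieldExpr's get_value(tuple, value) constructs a TupleCellSpec and looks up the data in the tuple by table name and column name, which causes a large number of string comparisons.

The optimization is to go through the original path only on the first call to get_value, which also returns an index pointing to that value. Subsequent calls to get_value then fetch the value through cell_at(index, value), avoiding the string comparisons.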

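// Resolve the column by name only on the first call, then reuse the cached index.
RC FieldExpr::get_value(const Tuple &tuple, Value &value) const
{
  if (is_first_) {
    // First call: name-based lookup; find_cell also writes the cell's position into index_.
    // The const_casts are needed because get_value is const; declaring is_first_ and
    // index_ as mutable would avoid them.
    const_cast<bool &>(is_first_) = false;
    return tuple.find_cell(TupleCellSpec(table_name(), field_name()), value, const_cast<int &>(index_));
  }
  // Later calls: positional access, no string comparisons.
  return tuple.cell_at(index_, value);
}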
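
For readers without the code base at hand, the same caching idea can be sketched in a self-contained form. Everything here is a simplified stand-in and not the project's API: `ToyTuple` and `ToyFieldExpr` are hypothetical, values are plain `int`s, and the sketch assumes every tuple passed to one expression has the same column layout (otherwise the cached index would be wrong). It uses `mutable` members instead of `const_cast`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified tuple: column names plus one int cell per column.
struct ToyTuple {
  std::vector<std::string> names;
  std::vector<int> cells;

  // Slow path: linear scan with string comparisons; reports the position found.
  bool find_cell(const std::string &name, int &value, int &index) const {
    for (std::size_t i = 0; i < names.size(); ++i) {
      if (names[i] == name) {
        value = cells[i];
        index = static_cast<int>(i);
        return true;
      }
    }
    return false;
  }

  // Fast path: positional access with a bounds check.
  bool cell_at(int index, int &value) const {
    if (index < 0 || static_cast<std::size_t>(index) >= cells.size()) return false;
    value = cells[index];
    return true;
  }
};

class ToyFieldExpr {
public:
  explicit ToyFieldExpr(std::string name) : name_(std::move(name)) {}

  // Looks the column up by name until a lookup succeeds, then reuses the cached index.
  bool get_value(const ToyTuple &tuple, int &value) const {
    if (index_ < 0) {
      ++slow_lookups_;
      return tuple.find_cell(name_, value, index_);
    }
    return tuple.cell_at(index_, value);
  }

  int slow_lookups() const { return slow_lookups_; }

private:
  std::string name_;
  mutable int index_ = -1;        // cached cell position, -1 until resolved
  mutable int slow_lookups_ = 0;  // counts name-based lookups, for illustration only
};
```

Evaluating one `ToyFieldExpr` over many rows performs the name-based lookup once and positional access for every later row, which is the effect the optimization above is after.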

luooofan · 2023-10-27T18:03:16Z

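PS1: The bottleneck is mainly time, not memory. Copying only the necessary data saves no memory in the select * case. On the contrary, because there is no deduplication (the necessary data and the order by fields overlap), it increases both memory usage and the number of memory copies.

PS2: Because the test is randomized, our initial implementation already had some chance of passing the big order by submission test.

We ran a quick flame graph, shown here:

We applied the optimization above based on the graph. Afterwards big order by does pass reliably, and in the new flame graph the bottleneck is no longer in TupleCellSpec construction and equality comparison.

PS3: When we implemented the order by operator, the prefetching and copying of data for sorting happened in open. Taking the later subquery work into account, this step should be moved to the first call to next (given our specific implementation this is the right place, although the test cases never have order by or group by inside a subquery). See #39.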
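
As a rough illustration of PS3, deferring the fetch-and-sort step from open to the first next could look like the sketch below. The class name, the `std::function` child, and the `int` rows are all hypothetical simplifications, not the project's operator interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sort operator: `child` stands in for draining the child operator.
class LazySortOperator {
public:
  explicit LazySortOperator(std::function<std::vector<int>()> child) : child_(std::move(child)) {}

  // open() only resets state; no rows are pulled from the child here.
  void open() {
    fetched_ = false;
    pos_ = 0;
    rows_.clear();
  }

  // The first call fetches and sorts; later calls walk the sorted buffer.
  bool next(int &out) {
    if (!fetched_) {
      rows_ = child_();
      std::sort(rows_.begin(), rows_.end());
      fetched_ = true;
    }
    if (pos_ >= rows_.size()) return false;
    out = rows_[pos_++];
    return true;
  }

private:
  std::function<std::vector<int>()> child_;
  std::vector<int> rows_;
  std::size_t pos_ = 0;
  bool fetched_ = false;
};
```

With this shape, opening the operator costs nothing, and a later open() discards the buffer so the next call to next fetches and sorts again.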

PS4: Installing perf on WSL2: https://gist.github.com/abel0b/b1881e41b9e1c4b16d84e5e083c38a13?permalink_comment_id=4532886#gistcomment-4532886

# windows
wsl --update 
# wsl 2
sudo apt update
sudo apt install flex bison 
sudo apt install libdwarf-dev libelf-dev libnuma-dev libunwind-dev \
libnewt-dev libdwarf++0 libelf++0 libdw-dev libbfb0-dev \
systemtap-sdt-dev libssl-dev libperl-dev python-dev-is-python3 \
binutils-dev libiberty-dev libzstd-dev libcap-dev libbabeltrace-dev
git clone https://github.com/microsoft/WSL2-Linux-Kernel --depth 1
cd WSL2-Linux-Kernel/tools/perf
make -j8 # parallel build
sudo cp perf /usr/local/bin

Jingluo-nan · 2023-10-29T03:53:56Z

The flame graph after the optimization:

At this point most of the time is spent in fetch_and_sort_tables.

Jingluo-nan added the topic label Oct 24, 2023

luooofan closed this as completed Oct 27, 2023

luooofan mentioned this issue Oct 27, 2023

[2023 PR]: pass order-by & big-order-by & group-by & complex-sub-query #42

Merged
