TopK dynamic filter pushdown attempt 2 #15770
```diff
@@ -382,7 +383,7 @@ impl PhysicalOptimizerRule for PushdownFilter {
         context
             .transform_up(|node| {
-                if node.plan.as_any().downcast_ref::<FilterExec>().is_some() {
+                if node.plan.as_any().downcast_ref::<FilterExec>().is_some() || node.plan.as_any().downcast_ref::<SortExec>().is_some() {
```
@berkaysynnada I didn't notice this in the original PR. This seems problematic. IMO doing downcast matching here is a smell that the API needs changing. It limits implementations to a hardcoded list of plans, which defeats the purpose of making DataFusion pluggable / having a `dyn ExecutionPlan`. The original implementation didn't require this. I think this goes hand in hand with the `revisit` parameter. It seems that you were able to get from 3 methods down to 2 by replacing one of them with this downcast matching and the other with the extra recursion via the `revisit` parameter. It would be great to iterate on this and find a way to avoid the downcast matching.
Yes, you're right. We could actually run this pushdown logic on every operator, but then it would always run with worst-case complexity. I've shared a solution for removing the `revisit` parameter; let me open an issue for that. I strongly believe it will be picked up and implemented shortly.
To remove these downcasts, I think we can either introduce a new method to the API that just returns a boolean saying whether the operator might introduce a filter, or try to infer that from the existing APIs, maybe with some refactoring. Do you have an idea for the latter?
I propose an API something like this:
```rust
trait ExecutionPlan {
    /// Collect, per child, the filters this node wants to push down itself
    /// plus its verdict on each filter handed down from the parent.
    fn gather_filters_for_pushdown(
        &self,
        parent_filters: &[Arc<dyn PhysicalExpr>],
    ) -> Result<FilterPushdownPlan> {
        // Default: no filter pushdown support at all.
        let unsupported = vec![FilterPushdownSupport::Unsupported; parent_filters.len()];
        Ok(FilterPushdownPlan {
            parent_filters_for_children: vec![unsupported; self.children().len()],
            self_filters_for_children: vec![vec![]; self.children().len()],
        })
    }

    /// Called on the way back up the tree with the children's results.
    fn propagate_filter_pushdown(
        &self,
        parent_pushdown_result: Vec<FilterPushdownChildResult>,
        _self_filter_pushdown_result: Vec<FilterPushdownChildResult>,
    ) -> Result<FilterPushdownPropagation> {
        Ok(FilterPushdownPropagation {
            parent_filter_result: parent_pushdown_result,
            new_node: None,
        })
    }
}

pub struct FilterPushdownPropagation {
    parent_filter_result: Vec<FilterPushdownChildResult>,
    new_node: Option<Arc<dyn ExecutionPlan>>,
}

#[derive(Debug, Clone, Copy)]
pub enum FilterPushdownChildResult {
    Supported,
    Unsupported,
}

#[derive(Debug, Clone)]
pub enum FilterPushdownSupport {
    Supported(Arc<dyn PhysicalExpr>),
    Unsupported,
}

#[derive(Debug, Clone)]
pub struct FilterPushdownPlan {
    parent_filters_for_children: Vec<Vec<FilterPushdownSupport>>,
    self_filters_for_children: Vec<Vec<FilterPushdownSupport>>,
}
```
The optimizer rule will have to do a bit of bookkeeping and slicing correctly, but this should avoid the need for any downcast matching or `retry` and minimize clones of plans. And it should do one walk down and up regardless of what ends up happening with the filters.
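As a rough illustration of that single down-and-up walk (toy `Plan`/`Support` types invented here, not the DataFusion API): each filter is handed to every child on the way down, and on the way back up it is reported as supported only if every child absorbed it.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Support {
    Supported,
    Unsupported,
}

// A toy plan tree: leaves (scans) either absorb filters or not,
// inner nodes just forward filters to their children.
enum Plan {
    Scan { accepts: bool },
    Forward(Vec<Plan>),
}

// One recursive walk: hand `n_filters` parent filters down, collect one
// verdict per filter on the way back up. A filter counts as Supported
// only if every child absorbed it.
fn pushdown(plan: &Plan, n_filters: usize) -> Vec<Support> {
    match plan {
        Plan::Scan { accepts } => {
            let s = if *accepts { Support::Supported } else { Support::Unsupported };
            vec![s; n_filters]
        }
        Plan::Forward(children) => {
            let mut result = vec![Support::Supported; n_filters];
            for child in children {
                for (slot, verdict) in result.iter_mut().zip(pushdown(child, n_filters)) {
                    if verdict == Support::Unsupported {
                        *slot = Support::Unsupported;
                    }
                }
            }
            result
        }
    }
}

fn main() {
    let plan = Plan::Forward(vec![
        Plan::Scan { accepts: true },
        Plan::Scan { accepts: false },
    ]);
    // One filter pushed from the root: the second scan rejects it,
    // so the root must keep applying it itself.
    assert_eq!(pushdown(&plan, 1), vec![Support::Unsupported]);
    println!("ok");
}
```

The real rule carries expressions and per-child slices rather than booleans, but the cost structure is the same: one traversal, no retries.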
Needs fixing of some failing tests, cleanup of the plethora of helper methods I added, and a lot of docs, but here's the idea: #15801. The key points:
- No downcast matching / hardcoding of implementations
- Only recurses once / no retrying
- Does no cloning / copying for branches that have no changes
- Doesn't insert new operators

Pausing this until #15769 is done
I was able to unblock by wiring up to TestDataSource
```rust
let mut new_sort = SortExec::new(self.expr.clone(), Arc::clone(&children[0]))
    .with_fetch(self.fetch)
    .with_preserve_partitioning(self.preserve_partitioning);
new_sort.filter = Arc::clone(&self.filter);
```
I missed this for a while and spent an hour trying to figure out why my test was failing. IMO we should have a test that enforces the invariant that `ExecutionPlan::with_new_children(Arc::clone(&node), node.children()) == node`.
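A toy illustration of that invariant (hypothetical `Node` type, not DataFusion's `ExecutionPlan`): rebuilding a node around its own children must reproduce an equivalent node, so any field the rebuild forgets to carry over, like the `filter` in the snippet above, shows up as a failed roundtrip.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Node {
    name: String,
    fetch: Option<usize>,
    filter: String, // stands in for the dynamic filter state
    children: Vec<Node>,
}

impl Node {
    // Rebuild this node around new children, preserving all other state.
    fn with_new_children(&self, children: Vec<Node>) -> Node {
        Node {
            name: self.name.clone(),
            fetch: self.fetch,
            // Forgetting this line is exactly the kind of bug the
            // invariant test would catch.
            filter: self.filter.clone(),
            children,
        }
    }
}

// The invariant a test could enforce for every operator:
// rebuilding from a node's own children is the identity.
fn roundtrip_is_identity(node: &Node) -> bool {
    node.with_new_children(node.children.clone()) == *node
}

fn main() {
    let leaf = Node {
        name: "scan".into(),
        fetch: None,
        filter: String::new(),
        children: vec![],
    };
    let sort = Node {
        name: "sort".into(),
        fetch: Some(10),
        filter: "lit(true)".into(),
        children: vec![leaf],
    };
    assert!(roundtrip_is_identity(&sort));
    println!("ok");
}
```

In DataFusion itself the comparison would likely go through the plan's display output rather than `PartialEq`, since `ExecutionPlan` trait objects aren't directly comparable.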
```diff
@@ -22,7 +22,7 @@ mod binary;
 mod case;
 mod cast;
 mod column;
-mod dynamic_filters;
+pub mod dynamic_filters;
```
This bit has me tripped up. I'm not sure where the right place to put `dynamic_filters` is such that it's public for our internal use in operators but private from the outside world 🤔
@Dandandan I believe with this setup we should be able to achieve this with a couple LOC:

```rust
// Apply the filter to the batch before processing
let filter = Arc::clone(&self.filter) as Arc<dyn PhysicalExpr>;
let batch = filter_and_project(&batch, &filter, None, batch.schema_ref())?;
if batch.num_rows() == 0 {
    return Ok(());
}
```
I think we probably want to avoid filtering the entire batch, but indeed, if the filter expression is available it will be only a couple LOC!
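A toy sketch of that idea (a plain `Vec<i64>` standing in for an Arrow record batch, names invented here): filter the batch up front and skip it entirely when nothing survives, mirroring the `num_rows() == 0` early return in the snippet above.

```rust
// Returns None when the whole batch is filtered out, so the caller can
// skip processing it entirely (the `return Ok(())` in the snippet above).
fn filter_batch(batch: &[i64], predicate: impl Fn(i64) -> bool) -> Option<Vec<i64>> {
    let kept: Vec<i64> = batch.iter().copied().filter(|&v| predicate(v)).collect();
    if kept.is_empty() {
        None
    } else {
        Some(kept)
    }
}

fn main() {
    // A dynamic TopK filter: only values under the current k-th
    // threshold can still make it into the heap.
    let threshold = 10;
    assert_eq!(filter_batch(&[3, 15, 7], |v| v < threshold), Some(vec![3, 7]));
    // A batch with nothing under the threshold is skipped outright.
    assert_eq!(filter_batch(&[20, 30], |v| v < threshold), None);
    println!("ok");
}
```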
```rust
// Actually apply the optimization to the plan
```
I recognize these diverge a bit from other tests; happy to move them somewhere better.
Marking as ready for review despite not having any numbers to substantiate the performance improvement (because we need #15769). Given that, algorithmically and from experience with the previous PR, we know this is a big win, it might be okay to merge without interlocking the PRs.
@adriangb I'll complete reviewing this after merging the other open PRs.
Thanks for all of the reviews @berkaysynnada. This one is now ready again.
I think some tweaks will be needed based on https://github.com/apache/datafusion/pull/15769/files#r2074207291
I queued up some benchmarks. Looking at naming now.
Looks great to me -- I have some small comment suggestions, etc. But I also think we can merge this PR as is and do the suggestions as a follow-on too.
Really nice 🦾
```diff
@@ -614,6 +614,13 @@ config_namespace! {
     /// during aggregations, if possible
     pub enable_topk_aggregation: bool, default = true

+    /// When set to true attempts to push down dynamic filters generated by operators into the file scan phase.
```
Would the idea be to prune hash table state, for example, if we knew some of the groups were no longer needed?
I do think implementing more "late materialization" (aka turning on `filter_pushdown`) will help too.
```rust
fn new(phase: FilterPushdownPhase) -> Self {
    Self {
        phase,
        name: format!("FilterPushdown({phase})"),
```
I like `FilterPushdown` and `FilterPushdown(Dynamic)`
```diff
@@ -131,6 +131,8 @@ impl PhysicalOptimizer {
         // replacing operators with fetching variants, or adding limits
         // past operators that support limit pushdown.
         Arc::new(LimitPushdown::new()),
+        // This FilterPushdown handles dynamic filters that may have references to the source ExecutionPlan
+        Arc::new(FilterPushdown::new_post_optimization()),
```
this is so much nicer than adding it to `EnforceSorting` ❤️
But shouldn't this be the final pass? (maybe right before `SanityCheckPlan`?) As I understand it, this filter pushdown pass has to be run after any pass that modifies the structure of the plan, and `ProjectionPushdown` may actually do that 🤔
I also think it would be good to add a comment here explaining that `FilterPushdown::new_post_optimization()` must be run after all passes that change the structure of the plan, as it can generate pointers from one plan to another.
I will move it lower and add a comment, with a reference to the enum with larger docs.
```rust
/// but subsequent optimizations may also rewrite the plan tree drastically, thus it is *not guaranteed* that a [`PhysicalExpr`] can hold on to a reference to the plan tree.
/// During this phase static filters (such as `col = 1`) are pushed down.
/// - [`FilterPushdownPhase::Post`]: Filters get pushed down after most other optimizations are applied.
///   At this stage the plan tree is expected to be stable and not change drastically, and operators that do filter pushdown during this phase should also not change the plan tree.
```
I think the requirement is that the plan nodes don't change. Given that DynamicFilters effectively can have pointers to existing `ExecutionPlan` instances, if a pass changes / removes / rewrites an `ExecutionPlan` that added a DynamicFilter, I am not sure what will happen 🤔
Sort of. I am mincing my words in the comments because the reality is that to push down filters into `DataSourceExec` a new `DataSourceExec` has to be created, and the whole tree has to be replaced "in place" to reference the new children. But the structure of the plan does not change, and it's pretty much guaranteed that `ExecutionPlan::new_with_children` does the right thing in terms of preserving internal state that might be pointed to (unlike `EnforceSorting`).
I'm not sure how to detail that in a comment; it's somewhat confusing.
```diff
@@ -548,10 +563,22 @@ pub trait ExecutionPlan: Debug + DisplayAs + Send + Sync {
     /// This can be used alongside [`FilterPushdownPropagation::with_filters`] and [`FilterPushdownPropagation::with_updated_node`]
     /// to dynamically build a result with a mix of supported and unsupported filters.
     ///
+    /// There are two different phases in filter pushdown, which some operators may handle the same and some differently.
```
While I love documentation, I would personally suggest not duplicating the docs here, as duplicates can get out of sync, and instead leave a link to `FilterPushdownPhase` and focus on getting that documentation to be as clear as possible.
```rust
    Pre,
    /// Pushdown that happens after most other optimizations.
    /// This pushdown allows filters that reference an [`ExecutionPlan`] to be pushed down.
    /// It is guaranteed that subsequent optimizations will not make large changes to the plan tree,
```
As above, I think it would be good to make it more precise what "large changes to the plan tree" means (basically I think it means don't remove existing ExecutionPlans? 🤔)
Thank you Andrew! I will do the renames, docs edits, etc., push those tonight, and we can merge this tomorrow evening if there is no more feedback.
🤖: Benchmark completed
🤖: Benchmark completed
could you maybe confirm the topk benchmark results @alamb?
will do
🤖: Benchmark completed
Very nice improvement even without filter pushdown! I'm going to merge this in the next couple of hours if there is no more feedback 😄

This is super nice.

And maybe #16424 will speed up the wide partitions case by stopping those scans early!

I'll also run some profiling on those topk benchmarks to see if there is any further low hanging fruit.
Hm @adriangb, another thing I wondered: perhaps we can compare against the current filter and only update the expression if it is greater / more selective?
Yeah I think that would be good.
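A sketch of that monotonic-update idea (hypothetical `TopKFilter` type, not the PR's actual code): for an ascending ORDER BY ... LIMIT k the dynamic filter is `col < threshold`, and the threshold only ever tightens, so candidate updates that would loosen the bound can be ignored and the shared expression only re-published when it becomes more selective.

```rust
struct TopKFilter {
    // None until the heap first fills with k rows.
    threshold: Option<i64>,
}

impl TopKFilter {
    // Returns true only when the filter actually became more selective,
    // i.e. when the shared expression would need to be re-published.
    fn try_update(&mut self, candidate: i64) -> bool {
        match self.threshold {
            // Would loosen (or merely repeat) the current bound: skip.
            Some(current) if candidate >= current => false,
            _ => {
                self.threshold = Some(candidate);
                true
            }
        }
    }
}

fn main() {
    let mut f = TopKFilter { threshold: None };
    assert!(f.try_update(100)); // first threshold: publish
    assert!(f.try_update(42)); // tighter: publish
    assert!(!f.try_update(60)); // looser: skip
    assert_eq!(f.threshold, Some(42));
    println!("ok");
}
```

For a descending sort the comparison flips, but the principle is the same: the filter's selectivity is monotone, so updates are cheap to gate.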
woohoo!
ORDER BY LIMIT queries) #15037