Skip to content

Update extending-operators.md #15832

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 33 commits into
base: main
Choose a base branch
from
Open

Update extending-operators.md #15832

wants to merge 33 commits into from

Conversation

Adez017
Copy link
Contributor

@Adez017 Adez017 commented Apr 23, 2025

Which issue does this PR close?

Rationale for this change

updated the extending-operators.md file

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Apr 23, 2025
@Adez017 Adez017 marked this pull request as draft April 23, 2025 17:04
@Adez017
Copy link
Contributor Author

Adez017 commented Apr 23, 2025

hi @xudong963 , i want to ask that did we had to rewrite the part of code https://github.com/apache/datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_plan.rs#L18-L24 afterwards or just had to migrate the whole code into extending-operators .
Also please review the changes that I made and provide feedback based on it

@xudong963
Copy link
Member

i want to ask that did we had to rewrite the part of code https://github.com/apache/datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_plan.rs#L18-L24 afterwards or just had to migrate the whole code into extending-operators .

I think after migrating them, we don't need to retain the code

@Adez017
Copy link
Contributor Author

Adez017 commented Apr 24, 2025

i want to ask that did we had to rewrite the part of code https://github.com/apache/datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_plan.rs#L18-L24 afterwards or just had to migrate the whole code into extending-operators .

I think after migrating them, we don't need to retain the code

does that mean that I need to migrate all the code from user_defined_plan.rs to extending-operators with the code also ?

@xudong963
Copy link
Member

i want to ask that did we had to rewrite the part of code https://github.com/apache/datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_plan.rs#L18-L24 afterwards or just had to migrate the whole code into extending-operators .

I think after migrating them, we don't need to retain the code

does that mean that I need to migrate all the code from user_defined_plan.rs to extending-operators with the code also ?

Yes, except tests.

@Adez017
Copy link
Contributor Author

Adez017 commented Apr 24, 2025

i want to ask that did we had to rewrite the part of code https://github.com/apache/datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_plan.rs#L18-L24 afterwards or just had to migrate the whole code into extending-operators .

I think after migrating them, we don't need to retain the code

does that mean that I need to migrate all the code from user_defined_plan.rs to extending-operators with the code also ?

Yes, except tests.

that means above it :

@Adez017 Adez017 marked this pull request as ready for review April 25, 2025 04:55
@Adez017
Copy link
Contributor Author

Adez017 commented Apr 25, 2025

Hi @xudong963 , i think it is ready , give it a check

@xudong963
Copy link
Member

xudong963 commented Apr 25, 2025

You can refer to the doc: https://datafusion.apache.org/library-user-guide/custom-table-providers.html.

It should contain the real code https://github.com/apache/datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_plan.rs#L458-L916 to describe the process of defining an extending operator, not only an SQL example.

@Adez017
Copy link
Contributor Author

Adez017 commented Apr 25, 2025

hey @xudong963 , check it out now .
also could you help in the failing check in workflow?

@xudong963
Copy link
Member

You can rebase with main

@xudong963
Copy link
Member

Would anyone happen to know how to preview the HTML format for the PR changes?

@Adez017
Copy link
Contributor Author

Adez017 commented Apr 27, 2025

You can rebase with main

doe this solve the issue ?

@xudong963
Copy link
Member

You can rebase with main

doe this solve the issue ?

You can open the failed CI and see what's wrong:

error[E0599]: no method named `unwrap` found for struct `CoalescePartitionsExec` in the current scope
   --> datafusion/physical-optimizer/src/enforce_sorting/replace_with_order_preserving_variants.rs:210:14
    |
208 |           let coalesce = CoalescePartitionsExec::new(child)
    |  ________________________-
209 | |             .with_fetch(plan.fetch())
210 | |             .unwrap();
    | |             -^^^^^^ method not found in `CoalescePartitionsExec`
    | |_____________|
    |

warning: unused import: `ExecutionPlan`
  --> datafusion/physical-optimizer/src/enforce_sorting/replace_with_order_preserving_variants.rs:37:32
   |
37 | use datafusion_physical_plan::{ExecutionPlan, ExecutionPlanProperties};
   |                                ^^^^^^^^^^^^^
   |
   = note: `#[warn(unused_imports)]` on by default

The error is fixed in main, so rebasing your branch with main will fix the error

@Adez017
Copy link
Contributor Author

Adez017 commented May 1, 2025

You can rebase with main

doe this solve the issue ?

You can open the failed CI and see what's wrong:

error[E0599]: no method named `unwrap` found for struct `CoalescePartitionsExec` in the current scope
   --> datafusion/physical-optimizer/src/enforce_sorting/replace_with_order_preserving_variants.rs:210:14
    |
208 |           let coalesce = CoalescePartitionsExec::new(child)
    |  ________________________-
209 | |             .with_fetch(plan.fetch())
210 | |             .unwrap();
    | |             -^^^^^^ method not found in `CoalescePartitionsExec`
    | |_____________|
    |

warning: unused import: `ExecutionPlan`
  --> datafusion/physical-optimizer/src/enforce_sorting/replace_with_order_preserving_variants.rs:37:32
   |
37 | use datafusion_physical_plan::{ExecutionPlan, ExecutionPlanProperties};
   |                                ^^^^^^^^^^^^^
   |
   = note: `#[warn(unused_imports)]` on by default

The error is fixed in main, so rebasing your branch with main will fix the error

Thank for your help @xudong963 but I think it didn't work for the failing workflow

@xudong963
Copy link
Member

You can rebase with main

doe this solve the issue ?

You can open the failed CI and see what's wrong:

error[E0599]: no method named `unwrap` found for struct `CoalescePartitionsExec` in the current scope
   --> datafusion/physical-optimizer/src/enforce_sorting/replace_with_order_preserving_variants.rs:210:14
    |
208 |           let coalesce = CoalescePartitionsExec::new(child)
    |  ________________________-
209 | |             .with_fetch(plan.fetch())
210 | |             .unwrap();
    | |             -^^^^^^ method not found in `CoalescePartitionsExec`
    | |_____________|
    |

warning: unused import: `ExecutionPlan`
  --> datafusion/physical-optimizer/src/enforce_sorting/replace_with_order_preserving_variants.rs:37:32
   |
37 | use datafusion_physical_plan::{ExecutionPlan, ExecutionPlanProperties};
   |                                ^^^^^^^^^^^^^
   |
   = note: `#[warn(unused_imports)]` on by default

The error is fixed in main, so rebasing your branch with main will fix the error

Thank for your help @xudong963 but I think it didn't work for the failing workflow

Sorry I am on vacation , you can try to fix by the error hints in ci

@Adez017
Copy link
Contributor Author

Adez017 commented May 2, 2025

i Think we need @alamb help now . could you help ?

@Adez017
Copy link
Contributor Author

Adez017 commented May 3, 2025

@alamb , please take a look

@Adez017
Copy link
Contributor Author

Adez017 commented May 3, 2025

could anyone please help here ?

@xudong963
Copy link
Member

Could you please refer to the error lints in CI? Such as

error[E0433]: failed to resolve: use of undeclared type `Statistics`
  --> datafusion/core/src/lib.rs:1365:12
   |
98 |         Ok(Statistics::new_unknown(&self.schema()))
   |            ^^^^^^^^^^ use of undeclared type `Statistics`
   |
help: consider importing one of these items
   |
2  + use datafusion::physical_plan::Statistics;
   |
2  + use datafusion_common::Statistics;
   |
2  + use datafusion_physical_plan::Statistics;
   |
2  + use parquet::file::statistics::Statistics;

It's saying, you need to import Statistics for the example in the doc, such as
image

@Adez017
Copy link
Contributor Author

Adez017 commented May 6, 2025

hey @xudong963 , i think there might be something that I am missing I had done imports but it cause failing again and again , could you please help out ?

Here is a writeup that can probably help:

// The following commands test the examples from the user guide as part of
// `cargo test --doc`
//
// # Adding new tests:
//
// Simply add code like this to your .md file and ensure your md file is
// included in the lists below.
//
// ```rust
// <code here will be tested>
// ```
//
// Note that sometimes it helps to author the doctest as a standalone program
// first, and then copy it into the user guide.
//
// # Debugging Test Failures
//
// Unfortunately, the line numbers reported by doctest do not correspond to the
// line numbers of in the .md files. Thus, if a doctest fails, use the name of
// the test to find the relevant file in the list below, and then find the
// example in that file to fix.
//
// For example, if `user_guide_expressions(line 123)` fails,
// go to `docs/source/user-guide/expressions.md` to find the relevant problem.
//

In this case I think you will need to make each example self-contained (so add use, etc statements). You can test them locally with a command cargo test --doc -p datafusion

Hi @alamb thanks for the help , I tried doing so and eventually I understand the problem but couldn't find a complete solution .
i was think isn't there any dependency of the doc example on others . also could you help out to finish it up ?

@Adez017
Copy link
Contributor Author

Adez017 commented May 8, 2025

hey @xudong963 , i think there might be something that I am missing I had done imports but it cause failing again and again , could you please help out ?

Here is a writeup that can probably help:

// The following commands test the examples from the user guide as part of
// `cargo test --doc`
//
// # Adding new tests:
//
// Simply add code like this to your .md file and ensure your md file is
// included in the lists below.
//
// ```rust
// <code here will be tested>
// ```
//
// Note that sometimes it helps to author the doctest as a standalone program
// first, and then copy it into the user guide.
//
// # Debugging Test Failures
//
// Unfortunately, the line numbers reported by doctest do not correspond to the
// line numbers of in the .md files. Thus, if a doctest fails, use the name of
// the test to find the relevant file in the list below, and then find the
// example in that file to fix.
//
// For example, if `user_guide_expressions(line 123)` fails,
// go to `docs/source/user-guide/expressions.md` to find the relevant problem.
//

In this case I think you will need to make each example self-contained (so add use, etc statements). You can test them locally with a command cargo test --doc -p datafusion

Hi @alamb thanks for the help , I tried doing so and eventually I understand the problem but couldn't find a complete solution . i was think isn't there any dependency of the doc example on others . also could you help out to finish it up ?

any updates ?? @alamb @xudong963

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Adez017 -- I took a brief look at this and I pushed some small changes.

I think the code will need some more work

@Adez017 Adez017 requested a review from alamb May 14, 2025 17:30
@Adez017
Copy link
Contributor Author

Adez017 commented May 15, 2025

@alamb i think it perfect now and working . please take a look

@Adez017 Adez017 requested a review from xudong963 May 15, 2025 03:37
@Adez017
Copy link
Contributor Author

Adez017 commented May 16, 2025

@alamb @xudong963 gentleman's tis need your attention now

Copy link
Member

@xudong963 xudong963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you.

It's very close to the finish line. The last thing that we need is to preview the HTML. cc @alamb , could you please show me how to do the preview?


Note: DataFusion contains a highly optimized version of the `TopK` operator, but we present a simplified version in this section for explanatory purposes. For more information, see the full implementation in the [DataFusion repository].

[DataFusion repository]: https://docs.rs/datafusion/latest/datafusion/physical_plan/struct.TopK.html
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[DataFusion Doc]?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe it should be a link to the source code - I was trying to avoid some link that would get out of date but maybe that in inevitable

@alamb
Copy link
Contributor

alamb commented May 16, 2025

LGTM, thank you.

It's very close to the finish line. The last thing that we need is to preview the HTML. cc @alamb , could you please show me how to do the preview?

I believe you can follow the steps here: https://github.com/apache/datafusion/tree/main/docs#build--preview

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Adez017 and @xudong963 -- I agree this is very close.

This current PR also adds a copy of adding a user defined plan. Maybe we can also remove the old copy?


Note: DataFusion contains a highly optimized version of the `TopK` operator, but we present a simplified version in this section for explanatory purposes. For more information, see the full implementation in the [DataFusion repository].

[DataFusion repository]: https://docs.rs/datafusion/latest/datafusion/physical_plan/struct.TopK.html
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe it should be a link to the source code - I was trying to avoid some link that would get out of date but maybe that in inevitable

}
}
```

DataFusion supports extension of operators by transforming logical plan and execution plan through customized [optimizer rules](https://docs.rs/datafusion/latest/datafusion/optimizer/trait.OptimizerRule.html). This section will use the µWheel project to illustrate such capabilities.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section now seems somewhat to duplicate the new sections

Copy link
Contributor Author

@Adez017 Adez017 May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we can rationalize it

Copy link
Contributor Author

@Adez017 Adez017 May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about this ? we can write a separate guide or something for the µWheel , removing the content from this PR and make it as a separate . if you are agree @alamb , I would love to do the job if possible

@Adez017 Adez017 requested review from alamb and xudong963 May 16, 2025 13:51
@Adez017
Copy link
Contributor Author

Adez017 commented May 17, 2025

@alamb cc: @xudong963 check it now

@Adez017
Copy link
Contributor Author

Adez017 commented May 19, 2025

hey @alamb @xudong963 since your last visit I had made some changes . please take a look

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move code in user_defined_plan.rs to the extending-operators doc
3 participants