Skip to content

Conversation

@Huy1Ng
Copy link
Contributor

@Huy1Ng Huy1Ng commented Jun 29, 2025

Which issue does this PR close?

Part of #1128.

Rationale for this change

In the issue

What changes are included in this PR?

Improve the rust workflows by following the practices in parent datafusion repo (with couple inspiration from popular rust project like uv or polars). Changes are:

  • Parallelize all jobs. I don't see any reason to wait for linux-build-lib to complete first before running other jobs
  • Move prep steps out into actions
  • Use cargo check instead of cargo build
  • Remove docker workflows. Is this needed?
  • Remove caches altogether. While caches help a lot for a particular job, the compiler couldn't reuse other jobs' artifacts, so dependent jobs will always re-download and recompile their own => take up lots of cache storage space. Also, cache is not commonly used in other projects that I researched.

Are there any user-facing changes?

No

@Huy1Ng Huy1Ng force-pushed the improve-github-actions/rust-workflows branch from d848873 to f352906 Compare June 29, 2025 00:08
@Huy1Ng
Copy link
Contributor Author

Huy1Ng commented Jul 17, 2025

@milenkovicm can you rerun the CI? I pushed 2 commits too close together so some jobs failed because they didn't get an available machine. The CI run on my fork succeeded: https://github.com/Huy1Ng/datafusion-ballista/tree/improve-github-actions/rust-workflows

@milenkovicm
Copy link
Contributor

Have no access to computer at the moment. Can you please push change, it should trigger new job

Two questions

  • What's the reason to remove cancel job? It was helpful killing old runs
  • We do push docker, that job is used, would it be possible to revert back that change

@Huy1Ng
Copy link
Contributor Author

Huy1Ng commented Jul 17, 2025

@milenkovicm
Copy link
Contributor

You don't have to if you think this is better solution. I don't know much about this topic. What do you think ?

I guess we need to publish docker, cancel job is your call

@Huy1Ng Huy1Ng force-pushed the improve-github-actions/rust-workflows branch from 0689d70 to 54080a7 Compare July 17, 2025 23:21
Copy link
Contributor

@milenkovicm milenkovicm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To best of my knowledge this make sense, few minor comments

shell: bash
run: |
echo "Installing ${{ inputs.rust-version }}"
rustup toolchain install ${{ inputs.rust-version }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder should we use GitHub action such as dtolnay/rust-toolchain@stable to setup rather than use bash script for it?

Copy link
Contributor Author

@Huy1Ng Huy1Ng Jul 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried but the current repo settings doesn't allow dtolnay/rust-toolchain@stable. datafusion repo seems to have the same limitations: apache/datafusion#15315.

P/s: my bad. I was using different toolchain. Still, I think we should mirror datafusion CI workflows as much as possible (offload the thinking and decision to the bigger community).


concurrency:
group: ${{ github.repository }}-${{ github.workflow }}
cancel-in-progress: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻

path: |
/github/home/target/release/ballista-scheduler
/github/home/target/release/ballista-executor
# Adding `--locked` here to assert that the `Cargo.lock` file is up to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻

path: /github/home/.cargo
# this key equals the ones on `linux-build-lib` for re-use
key: cargo-cache-
- name: Cache Rust dependencies
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine to remove cache is it helps, there are another two locations where cache is used cargo-toml-formatting-checks and Clippy. Would it make sense to remove it there as well or we keep it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove them. Those are fast to run so cache wouldn't do much anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wonder if benchmark test is where we should leave the caching. This is the bottleneck of rust workflows, so any optimization would help.

@Huy1Ng
Copy link
Contributor Author

Huy1Ng commented Jul 20, 2025

this should be ready for rust. I will tackle Python workflow next. This is going to be spitted into 3 small PRs.

@milenkovicm
Copy link
Contributor

Thanks @Huy1Ng!

As python binding has its problems I wonder if we should keep just basic python CI active

@milenkovicm milenkovicm merged commit d1295f7 into apache:main Jul 27, 2025
23 checks passed
@milenkovicm
Copy link
Contributor

thanks againn @Huy1Ng

@Huy1Ng Huy1Ng deleted the improve-github-actions/rust-workflows branch September 29, 2025 06:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants