Adjust the README file in cater with users #2440

Open
wants to merge 8 commits into
base: master
27 changes: 22 additions & 5 deletions README.md
@@ -11,6 +11,9 @@

## Overview

> [!NOTE]
> The Kubeflow Trainer APIs are still evolving, and we are preparing to release a stable version soon. If you want to use the stable release of Kubeflow Training Operator V1, please check [this section](#kubeflow-training-operator-v1).

Kubeflow Trainer is a Kubernetes-native project designed for large language models (LLMs)
fine-tuning and enabling scalable, distributed training of machine learning (ML) models across
various frameworks, including PyTorch, JAX, TensorFlow, and others.
@@ -35,8 +38,18 @@ The following KubeCon + CloudNativeCon 2024 talk provides an overview of Kubeflo

## Getting Started

Please check [the official Kubeflow documentation](https://www.kubeflow.org/docs/components/trainer/getting-started)
to install and get started with Kubeflow Trainer.
You can simply run these commands to install the latest Kubeflow Trainer if your Kubernetes cluster is ready:

```bash
kubectl apply --server-side -k "https://github.com/kubeflow/trainer.git/manifests/overlays/manager?ref=master"
kubectl apply --server-side -k "https://github.com/kubeflow/trainer.git/manifests/overlays/runtimes?ref=master"
```

Comment on lines +44 to +45

Member

@Electronic-Waste Previously, we discussed with @rimolive, @varodrig, and @kubeflow/wg-training-leads that we want a single source of truth for the Kubeflow Trainer docs, since it is hard to keep all of these docs up to date.

Thus, we just redirect users to the Kubeflow website for the installation steps: https://www.kubeflow.org/docs/components/trainer/operator-guides/installation/#installing-the-kubeflow-trainer-controller-manager

Member Author

From the user's side, they want to install Kubeflow Trainer quickly. So we'd better put the installation guide in the README so that users can get started quickly, as many other OSS projects do, including Katib:

They also have detailed guidance on the website. So I think it is probably a better choice to put some installation commands in the README.

Member Author

Currently, they need to click "the official Kubeflow documentation" -> "the installation guide" -> scroll down to see the installation commands. I think the guide is buried too deep for users. They may prefer a straightforward way, and fall back to the official documentation if the straightforward way does not work.

Member

@andreyvelich andreyvelich Feb 17, 2025

Not really, since we directly link the installation guide from the README: https://github.com/kubeflow/trainer?tab=readme-ov-file#getting-started.
Should we just provide another link to the operator guide from the README: https://www.kubeflow.org/docs/components/trainer/operator-guides/installation/#installing-the-kubeflow-trainer-controller-manager?

Member Author

@Electronic-Waste Electronic-Waste Feb 17, 2025

> Should we just provide another link to the operator guide from the README: https://www.kubeflow.org/docs/components/trainer/operator-guides/installation/#installing-the-kubeflow-trainer-controller-manager?

That is a better choice than the current one. But from the user's perspective, they prefer to install Trainer directly with a few simple commands listed in the README, which is more straightforward, and to look up the details only if they want a customized installation (such as installing only the manager).

Member

Can we talk about it at the next Training WG call, please?
I want to see what the right solution is for us moving forward.
/hold

Member Author

> Can we talk about it at the next Training WG call, please?
> I want to see what the right solution is for us moving forward.

Yeah, of course.

Please check [the installation guide](https://www.kubeflow.org/docs/components/trainer/operator-guides/installation/) for more information.
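
As a minimal sketch of the quick-install commands in this diff, the script below builds the two kustomize targets for a chosen Git ref and prints the `kubectl apply` commands instead of running them. `TRAINER_REF`, `overlay_url`, and `print_install_commands` are hypothetical names introduced for illustration, and whether a given ref actually contains both overlays is an assumption to check against the repository.

```bash
#!/usr/bin/env bash
# Sketch only: print the install commands for a chosen Git ref; nothing is applied.
set -euo pipefail

# TRAINER_REF is a hypothetical variable for this example.
# "master" matches the ref used in the README commands.
TRAINER_REF="${TRAINER_REF:-master}"
REPO="https://github.com/kubeflow/trainer.git"

# Compose the kustomize remote target for one overlay (manager or runtimes).
overlay_url() {
  printf '%s/manifests/overlays/%s?ref=%s' "$REPO" "$1" "$TRAINER_REF"
}

# Print one kubectl command per overlay, manager first, then runtimes.
print_install_commands() {
  local overlay
  for overlay in manager runtimes; do
    printf 'kubectl apply --server-side -k "%s"\n' "$(overlay_url "$overlay")"
  done
}

print_install_commands
```

With `TRAINER_REF` unset, the output is the same two `master` commands listed in the README change. Printing rather than applying keeps the sketch safe to run without a cluster; the printed lines can be reviewed and then executed by hand.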

To get started with Kubeflow Trainer, see our [Getting Started Tutorial](https://www.kubeflow.org/docs/components/trainer/getting-started).

If you are using Kubeflow Training Operator V1, please refer to this [migration document](/docs/components/trainer/operator-guides/migration).

## Community

@@ -56,12 +69,16 @@ Please refer to the [CHANGELOG](CHANGELOG.md).

## Kubeflow Training Operator V1

Kubeflow Trainer project is currently in <strong>alpha</strong> status, and APIs may change.
If you are using Kubeflow Training Operator V1, please refer [to this migration document](https://www.kubeflow.org/docs/components/trainer/operator-guides/migration/).
Kubeflow Trainer project is currently in <strong>alpha</strong> status, and APIs may change. You can install the stable release of the Kubeflow Training Operator V1 with:

```bash
kubectl apply --server-side -k "github.com/kubeflow/training-operator.git/manifests/overlays/standalone?ref=v1.9.0"
```

For more details, please check [this guide](https://www.kubeflow.org/docs/components/trainer/legacy-v1/installation/) to install and get started with Kubeflow Training Operator V1.

Kubeflow Community will maintain the Training Operator V1 source code at
[the `release-1.9` branch](https://github.com/kubeflow/training-operator/tree/release-1.9).

You can find the documentation for Kubeflow Training Operator V1 in [these guides](https://www.kubeflow.org/docs/components/trainer/legacy-v1).

## Acknowledgement