Updated Documentation for contributing and previewing #1391

Open
wants to merge 7 commits into
base: main
57 changes: 37 additions & 20 deletions README.md
@@ -1,30 +1,47 @@
# hub-docs
# Hub Documentation

This repository regroups documentation and information that is hosted on the Hugging Face website.
Welcome to the documentation repository for the Hugging Face Hub. This repository contains the documentation and information hosted on the Hugging Face website.

You can access the Hugging Face Hub documentation in the `docs` folder at [hf.co/docs/hub](https://hf.co/docs/hub).
## Accessing Documentation

For some related components, check out the [Hugging Face Hub JS repository](https://github.com/huggingface/huggingface.js)
- Utilities to interact with the Hub: [huggingface/huggingface.js/packages/hub](https://github.com/huggingface/huggingface.js/tree/main/packages/hub)
- Hub Widgets: [huggingface/huggingface.js/packages/widgets](https://github.com/huggingface/huggingface.js/tree/main/packages/widgets)
- Hub Tasks (as visible on the page [hf.co/tasks](https://hf.co/tasks)): [huggingface/huggingface.js/packages/tasks](https://github.com/huggingface/huggingface.js/tree/main/packages/tasks)
You can find the Hugging Face Hub documentation in the `docs` folder. For direct access, visit: [hf.co/docs/hub](https://huggingface.co/docs/hub/index).

### How to contribute to the docs
## Related Components

Just add/edit the Markdown files, commit them, and create a PR.
Then the CI bot will build the preview page and provide a url for you to look at the result!
Explore these related components for more utilities and features:

For simple edits, you don't need a local build environment.
- **Hugging Face Hub JS Repository:**
  - [Utilities to Interact with the Hub](https://github.com/huggingface/huggingface.js/tree/main/packages/hub): Contains utilities to interact with the Hugging Face Hub.
  - [Hub Widgets](https://github.com/huggingface/huggingface.js/tree/main/packages/widgets): Includes widgets for the Hub.
  - [Hub Tasks](https://github.com/huggingface/huggingface.js/tree/main/packages/tasks): Provides information on tasks visible on [hf.co/tasks](https://huggingface.co/tasks).

### Previewing locally
## Contributing to the Documentation

```bash
# install doc-builder (if not done already)
pip install hf-doc-builder
To contribute:

# you may also need to install some extra dependencies
pip install black watchdog
1. **Edit/Add Markdown Files:** Make changes directly to the Markdown files in this repository.
2. **Commit Changes:** Commit your changes and create a Pull Request (PR).
3. **CI Bot Preview:** After creating a PR, the CI bot will build a preview of your changes. You will receive a URL to review the result.

# run `doc-builder preview` cmd
doc-builder preview hub {YOUR_PATH}/hub-docs/docs/hub/ --not_python_module
```
For straightforward edits, you do not need a local build environment.

## Previewing Documentation Locally

To preview the documentation changes on your local machine, follow these steps:

1. **Install Doc-Builder:**
   ```bash
   pip install hf-doc-builder
   ```

2. **Install Additional Dependencies (if needed):**
   ```bash
   pip install black watchdog
   ```

3. **Run the Preview Command:**
   ```bash
   doc-builder preview hub {YOUR_PATH}/hub-docs/docs/hub/ --not_python_module
   ```

Replace `{YOUR_PATH}` with the path to the cloned repository on your local machine.
55 changes: 35 additions & 20 deletions docs/hub/datasets-dask.md
@@ -1,47 +1,62 @@
# Dask
# Dask Integration with Hugging Face

[Dask](https://github.com/dask/dask) is a parallel and distributed computing library that scales the existing Python and PyData ecosystem.
Since it uses [fsspec](https://filesystem-spec.readthedocs.io) to read and write remote data, you can use the Hugging Face paths ([`hf://`](/docs/huggingface_hub/guides/hf_file_system#integrations)) to read and write data on the Hub:
[Dask](https://github.com/dask/dask) is a parallel and distributed computing library that scales the existing Python and PyData ecosystem. Because Dask uses [fsspec](https://filesystem-spec.readthedocs.io/en/latest/) to read and write remote data, it can read datasets from the Hugging Face Hub and write them back using Hugging Face paths (`hf://`).

First you need to [Login with your Hugging Face account](/docs/huggingface_hub/quick-start#login), for example using:
## Prerequisites

```
huggingface-cli login
```
Before you can use Hugging Face paths with Dask, you need to:

Then you can [Create a dataset repository](/docs/huggingface_hub/quick-start#create-a-repository), for example using:
1. **Log in to your Hugging Face account:**
Authenticate your session by logging in using the Hugging Face CLI:
```bash
huggingface-cli login
```

```python
from huggingface_hub import HfApi
2. **Create a dataset repository:**
You can create a new dataset repository on the Hugging Face Hub using the `HfApi` class:
```python
from huggingface_hub import HfApi

HfApi().create_repo(repo_id="username/my_dataset", repo_type="dataset")
```
HfApi().create_repo(repo_id="username/my_dataset", repo_type="dataset")
```

## Writing Data to the Hub

Finally, you can use [Hugging Face paths](/docs/huggingface_hub/guides/hf_file_system#integrations) in Dask:
Once your environment is set up, you can easily write Dask DataFrames to the Hugging Face Hub. For instance, to store your dataset in Parquet format:

```python
import dask.dataframe as dd

# Writing the entire dataset to a single location
df.to_parquet("hf://datasets/username/my_dataset")

# or write in separate directories if the dataset has train/validation/test splits
# Writing data to separate directories for train/validation/test splits
df_train.to_parquet("hf://datasets/username/my_dataset/train")
df_valid.to_parquet("hf://datasets/username/my_dataset/validation")
df_test .to_parquet("hf://datasets/username/my_dataset/test")
df_test.to_parquet("hf://datasets/username/my_dataset/test")
```

This creates a dataset repository `username/my_dataset` containing your Dask dataset in Parquet format.
You can reload it later:
This will create a dataset repository `username/my_dataset` containing your data in Parquet format, which can be accessed later.

## Reading Data from the Hub

You can reload your dataset from the Hugging Face Hub just as easily:

```python
import dask.dataframe as dd

# Reading the entire dataset
df = dd.read_parquet("hf://datasets/username/my_dataset")

# or read from separate directories if the dataset has train/validation/test splits
# Reading data from separate directories for train/validation/test splits
df_train = dd.read_parquet("hf://datasets/username/my_dataset/train")
df_valid = dd.read_parquet("hf://datasets/username/my_dataset/validation")
df_test = dd.read_parquet("hf://datasets/username/my_dataset/test")
df_test = dd.read_parquet("hf://datasets/username/my_dataset/test")
```

For more information on the Hugging Face paths and how they are implemented, please refer to the [the client library's documentation on the HfFileSystem](/docs/huggingface_hub/guides/hf_file_system).
This allows you to seamlessly integrate your Dask workflows with datasets stored on the Hugging Face Hub.
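
The write and read examples above use the same directory layout for splits, so the paths can be built in one place. The following is a minimal sketch under stated assumptions: `split_paths` is a hypothetical helper name (it is not part of Dask or `huggingface_hub`), and the `hf://datasets/<repo_id>/<split>` layout is taken from the examples in this page rather than from any required convention.

```python
# Hypothetical helper: builds the hf:// directory for each split of a dataset repo.
HF_DATASETS_PREFIX = "hf://datasets"  # prefix used by the examples in this page


def split_paths(repo_id, splits=("train", "validation", "test")):
    """Return a dict mapping each split name to its hf:// directory."""
    base = f"{HF_DATASETS_PREFIX}/{repo_id.strip('/')}"
    return {split: f"{base}/{split}" for split in splits}


paths = split_paths("username/my_dataset")
print(paths["train"])
```

Each value can then be passed to `to_parquet` or `dd.read_parquet` in place of a hard-coded string, which keeps the write and read sides pointing at the same directories.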

## Further Information

For more detailed information on using Hugging Face paths and their implementation, refer to the [Hugging Face File System documentation](https://huggingface.co/docs/huggingface_hub/en/guides/hf_file_system).