From abbac9bdb2e7497b220caff6a4deb769ecc94bb5 Mon Sep 17 00:00:00 2001
From: _swaleh <110807476+swalehmwadime@users.noreply.github.com>
Date: Mon, 26 Aug 2024 14:30:22 +0300
Subject: [PATCH 1/4] Update README.md

---
 README.md | 57 ++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 37 insertions(+), 20 deletions(-)

diff --git a/README.md b/README.md
index 64da36e65..574e35cd4 100644
--- a/README.md
+++ b/README.md
@@ -1,30 +1,47 @@
-# hub-docs
+# Hub Documentation
 
-This repository regroups documentation and information that is hosted on the Hugging Face website.
+Welcome to the documentation repository for Hugging Face Hub. This repository contains the documentation and information hosted on the Hugging Face website.
 
-You can access the Hugging Face Hub documentation in the `docs` folder at [hf.co/docs/hub](https://hf.co/docs/hub).
+## Accessing Documentation
 
-For some related components, check out the [Hugging Face Hub JS repository](https://github.com/huggingface/huggingface.js)
-- Utilities to interact with the Hub: [huggingface/huggingface.js/packages/hub](https://github.com/huggingface/huggingface.js/tree/main/packages/hub)
-- Hub Widgets: [huggingface/huggingface.js/packages/widgets](https://github.com/huggingface/huggingface.js/tree/main/packages/widgets)
-- Hub Tasks (as visible on the page [hf.co/tasks](https://hf.co/tasks)): [huggingface/huggingface.js/packages/tasks](https://github.com/huggingface/huggingface.js/tree/main/packages/tasks)
+You can find the Hugging Face Hub documentation in the `docs` folder. For direct access, visit: [hf.co/docs/hub](https://hf.co/docs/hub).
 
-### How to contribute to the docs
+## Related Components
 
-Just add/edit the Markdown files, commit them, and create a PR.
-Then the CI bot will build the preview page and provide a url for you to look at the result!
+Explore these related components for more utilities and features:
 
-For simple edits, you don't need a local build environment.
+- **Hugging Face Hub JS Repository:**
+  - [Utilities to Interact with the Hub](https://github.com/huggingface/huggingface.js/packages/hub): Contains utilities to interact with the Hugging Face Hub.
+  - [Hub Widgets](https://github.com/huggingface/huggingface.js/packages/widgets): Includes widgets for the Hub.
+  - [Hub Tasks](https://github.com/huggingface/huggingface.js/packages/tasks): Provides information on tasks visible on [hf.co/tasks](https://hf.co/tasks).
 
-### Previewing locally
+## Contributing to the Documentation
 
-```bash
-# install doc-builder (if not done already)
-pip install hf-doc-builder
+To contribute:
 
-# you may also need to install some extra dependencies
-pip install black watchdog
+1. **Edit/Add Markdown Files:** Make changes directly to the Markdown files in this repository.
+2. **Commit Changes:** Commit your changes and create a Pull Request (PR).
+3. **CI Bot Preview:** After creating a PR, the CI bot will build a preview of your changes. You will receive a URL to review the result.
 
-# run `doc-builder preview` cmd
-doc-builder preview hub {YOUR_PATH}/hub-docs/docs/hub/ --not_python_module
-```
+For straightforward edits, you do not need a local build environment.
+
+## Previewing Documentation Locally
+
+To preview the documentation changes on your local machine, follow these steps:
+
+1. **Install Doc-Builder:**
+   ```bash
+   pip install hf-doc-builder
+   ```
+
+2. **Install Additional Dependencies (if needed):**
+   ```bash
+   pip install black watchdog
+   ```
+
+3. **Run the Preview Command:**
+   ```bash
+   doc-builder preview hub {YOUR_PATH}/hub-docs/docs/hub/ --not_python_module
+   ```
+
+Replace `{YOUR_PATH}` with the path to the cloned repository on your local machine.
From 6e49533350d50adf5e4757514e0991835fc30ff3 Mon Sep 17 00:00:00 2001
From: _swaleh <110807476+swalehmwadime@users.noreply.github.com>
Date: Mon, 26 Aug 2024 14:37:24 +0300
Subject: [PATCH 2/4] Update README.md

Insert Links to relevant sites
---
 README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 574e35cd4..55b5f8155 100644
--- a/README.md
+++ b/README.md
@@ -4,16 +4,16 @@ Welcome to the documentation repository for Hugging Face Hub. This repository co
 
 ## Accessing Documentation
 
-You can find the Hugging Face Hub documentation in the `docs` folder. For direct access, visit: [hf.co/docs/hub](https://hf.co/docs/hub).
+You can find the Hugging Face Hub documentation in the `docs` folder. For direct access, visit: [hf.co/docs/hub](https://huggingface.co/docs/hub/index).
 
 ## Related Components
 
 Explore these related components for more utilities and features:
 
 - **Hugging Face Hub JS Repository:**
-  - [Utilities to Interact with the Hub](https://github.com/huggingface/huggingface.js/packages/hub): Contains utilities to interact with the Hugging Face Hub.
-  - [Hub Widgets](https://github.com/huggingface/huggingface.js/packages/widgets): Includes widgets for the Hub.
-  - [Hub Tasks](https://github.com/huggingface/huggingface.js/packages/tasks): Provides information on tasks visible on [hf.co/tasks](https://hf.co/tasks).
+  - [Utilities to Interact with the Hub](https://github.com/huggingface/huggingface.js/tree/main/packages/hub): Contains utilities to interact with the Hugging Face Hub.
+  - [Hub Widgets](https://github.com/huggingface/huggingface.js/tree/main/packages/widgets): Includes widgets for the Hub.
+  - [Hub Tasks](https://github.com/huggingface/huggingface.js/tree/main/packages/tasks): Provides information on tasks visible on [hf.co/tasks](https://huggingface.co/tasks).
 
 ## Contributing to the Documentation
 
From bbf469e16b1d47b8d730081baadc1906ad0d894b Mon Sep 17 00:00:00 2001
From: _swaleh <110807476+swalehmwadime@users.noreply.github.com>
Date: Mon, 26 Aug 2024 15:10:37 +0300
Subject: [PATCH 3/4] Update datasets-dask.md

Improve Dask integration documentation for Hugging Face Hub

- Added prerequisites and authentication steps for using Hugging Face paths with Dask.
- Provided examples for writing and reading Dask DataFrames in Parquet format to/from the Hugging Face Hub.
- Enhanced clarity on how to create and manage dataset repositories using the HfApi class.
- Updated references to the Hugging Face File System documentation for further reading.
---
 docs/hub/datasets-dask.md | 55 +++++++++++++++++++++++++--------------
 1 file changed, 35 insertions(+), 20 deletions(-)

diff --git a/docs/hub/datasets-dask.md b/docs/hub/datasets-dask.md
index 7c97214a3..9148c667a 100644
--- a/docs/hub/datasets-dask.md
+++ b/docs/hub/datasets-dask.md
@@ -1,47 +1,62 @@
-# Dask
+# Dask Integration with Hugging Face
 
-[Dask](https://github.com/dask/dask) is a parallel and distributed computing library that scales the existing Python and PyData ecosystem.
-Since it uses [fsspec](https://filesystem-spec.readthedocs.io) to read and write remote data, you can use the Hugging Face paths ([`hf://`](/docs/huggingface_hub/guides/hf_file_system#integrations)) to read and write data on the Hub:
+[Dask](https://github.com/dask/dask) is a powerful parallel and distributed computing library that scales the existing Python and PyData ecosystem. By leveraging [fsspec](https://filesystem-spec.readthedocs.io/en/latest/), Dask can seamlessly interact with remote data sources, including the Hugging Face Hub. This allows you to read and write datasets directly from the Hub using Hugging Face paths (`hf://`).
 
-First you need to [Login with your Hugging Face account](/docs/huggingface_hub/quick-start#login), for example using:
+## Prerequisites
 
-```
-huggingface-cli login
-```
+Before you can use Hugging Face paths with Dask, you need to:
 
-Then you can [Create a dataset repository](/docs/huggingface_hub/quick-start#create-a-repository), for example using:
+1. **Login to your Hugging Face account:**
+   Authenticate your session by logging in using the Hugging Face CLI:
+   ```bash
+   huggingface-cli login
+   ```
 
-```python
-from huggingface_hub import HfApi
+2. **Create a dataset repository:**
+   You can create a new dataset repository on the Hugging Face Hub using the `HfApi` class:
+   ```python
+   from huggingface_hub import HfApi
 
-HfApi().create_repo(repo_id="username/my_dataset", repo_type="dataset")
-```
+   HfApi().create_repo(repo_id="username/my_dataset", repo_type="dataset")
+   ```
+
+## Writing Data to the Hub
 
-Finally, you can use [Hugging Face paths](/docs/huggingface_hub/guides/hf_file_system#integrations) in Dask:
+Once your environment is set up, you can easily write Dask DataFrames to the Hugging Face Hub. For instance, to store your dataset in Parquet format:
 
 ```python
 import dask.dataframe as dd
 
+# Writing the entire dataset to a single location
 df.to_parquet("hf://datasets/username/my_dataset")
 
-# or write in separate directories if the dataset has train/validation/test splits
+# Writing data to separate directories for train/validation/test splits
 df_train.to_parquet("hf://datasets/username/my_dataset/train")
 df_valid.to_parquet("hf://datasets/username/my_dataset/validation")
-df_test .to_parquet("hf://datasets/username/my_dataset/test")
+df_test.to_parquet("hf://datasets/username/my_dataset/test")
 ```
 
-This creates a dataset repository `username/my_dataset` containing your Dask dataset in Parquet format.
-You can reload it later:
+This will create a dataset repository `username/my_dataset` containing your data in Parquet format, which can be accessed later.
+
+## Reading Data from the Hub
+
+You can reload your dataset from the Hugging Face Hub just as easily:
 
 ```python
 import dask.dataframe as dd
 
+# Reading the entire dataset
 df = dd.read_parquet("hf://datasets/username/my_dataset")
 
-# or read from separate directories if the dataset has train/validation/test splits
+# Reading data from separate directories for train/validation/test splits
 df_train = dd.read_parquet("hf://datasets/username/my_dataset/train")
 df_valid = dd.read_parquet("hf://datasets/username/my_dataset/validation")
-df_test  = dd.read_parquet("hf://datasets/username/my_dataset/test")
+df_test = dd.read_parquet("hf://datasets/username/my_dataset/test")
 ```
 
-For more information on the Hugging Face paths and how they are implemented, please refer to the [the client library's documentation on the HfFileSystem](/docs/huggingface_hub/guides/hf_file_system).
+This allows you to seamlessly integrate your Dask workflows with datasets stored on the Hugging Face Hub.
+
+## Further Information
+
+For more detailed information on using Hugging Face paths and their implementation, refer to the [Hugging Face File System documentation](/docs/huggingface_hub/guides/hf_file_system).
+
From d334ea8469622de01eb489d5328e891d1a75104c Mon Sep 17 00:00:00 2001
From: _swaleh <110807476+swalehmwadime@users.noreply.github.com>
Date: Mon, 26 Aug 2024 15:16:02 +0300
Subject: [PATCH 4/4] Update datasets-dask.md

Insert link to Hugging Face filesystem API
---
 docs/hub/datasets-dask.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/hub/datasets-dask.md b/docs/hub/datasets-dask.md
index 9148c667a..00f284c03 100644
--- a/docs/hub/datasets-dask.md
+++ b/docs/hub/datasets-dask.md
@@ -58,5 +58,5 @@ This allows you to seamlessly integrate your Dask workflows with datasets stored
 
 ## Further Information
 
-For more detailed information on using Hugging Face paths and their implementation, refer to the [Hugging Face File System documentation](/docs/huggingface_hub/guides/hf_file_system).
+For more detailed information on using Hugging Face paths and their implementation, refer to the [Hugging Face File System documentation](https://huggingface.co/docs/huggingface_hub/en/guides/hf_file_system).
 
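
The Dask snippets in patch 3/4 all write to and read from paths of one shape: `hf://datasets/<repo_id>` for the whole dataset, with an optional `/<split>` directory appended. A minimal sketch of that layout follows. The helper name `hf_dataset_path` and the `SPLITS` tuple are hypothetical and exist only for illustration; they are not part of `huggingface_hub` or Dask.

```python
from typing import Optional

# Split directory names as they appear in the patched datasets-dask.md examples.
SPLITS = ("train", "validation", "test")


def hf_dataset_path(repo_id: str, split: Optional[str] = None) -> str:
    """Build an hf:// path for a dataset repository.

    Hypothetical helper (an assumption for illustration, not a library API).
    With no split it returns the repository root; with a split it appends
    the split name as a directory.
    """
    base = f"hf://datasets/{repo_id}"
    return base if split is None else f"{base}/{split}"


# One path per split, matching the directories used in the documentation.
split_paths = {name: hf_dataset_path("username/my_dataset", name) for name in SPLITS}
```

The resulting strings are the kind of argument the documented `df.to_parquet(...)` and `dd.read_parquet(...)` calls take, so a sketch like this would mainly help keep the write and read sides pointing at the same directories.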