Commit 16dcf04

docs
1 parent 7374dea commit 16dcf04

19 files changed: +1438 -4 lines changed

PROJECT_OVERVIEW.md

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
# Project Overview

**Pulp Hugging Face Plugin** - A Pulp plugin for managing Hugging Face Hub content with pull-through caching support.

## Key Features
- Pull-through caching for Hugging Face content (models, datasets, spaces)
- Authentication support via HF tokens for private repositories
- API proxying to forward requests to Hugging Face Hub
- File download caching and serving

## Technology Stack
- **Framework**: Django-based Pulp plugin
- **Python**: 3.9-3.12
- **Core Dependencies**: pulpcore (3.100.0-3.115), httpx
- **Version**: 0.4.0.dev

## Project Structure

```
pulp_hugging_face/
├── app/
│   ├── models.py      - Core models (Content, Remote, Repository, Distribution)
│   ├── viewsets.py    - REST API viewsets
│   ├── serializers.py - API serializers
│   ├── handler.py     - Custom content handler
│   ├── tasks/         - Async tasks (sync, publish)
│   └── migrations/    - Database migrations
└── tests/             - Test suite structure (functional, unit, performance)
```

## Main Components

1. **HuggingFaceContent** - Represents cached files from HF Hub
2. **HuggingFaceRemote** - Configuration for fetching from HF Hub
3. **HuggingFaceDistribution** - Serves content with pull-through caching
4. **HuggingFaceRepository** - Groups cached content

## Current Status
- REST API fully functional
- Pull-through caching implemented
- CLI support planned but not yet implemented
- Active development (version 0.4.0.dev)

## Workflow
1. Create a remote pointing to huggingface.co
2. Create a distribution with the remote for pull-through caching
3. Access HF content through Pulp (automatically cached on first request)
4. Subsequent requests served from cache
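
The first two workflow steps can be sketched in Python as follows. This is a minimal sketch, not the plugin's own client: the host `pulp.example.com` and the endpoint paths are the ones shown in the Quick Start of `docs/index.md` in this commit, while `build_post`, `remote_request`, and `distribution_request` are hypothetical helper names. The requests are only built here, not sent.

```python
import json
from urllib.request import Request

# Assumed base URL; the endpoint paths below come from the Quick Start in docs/index.md.
PULP_API = "https://pulp.example.com/pulp/api/v3"


def build_post(path: str, payload: dict) -> Request:
    """Build (but do not send) a JSON POST request to the Pulp REST API."""
    return Request(
        url=f"{PULP_API}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def remote_request(name: str) -> Request:
    # Step 1: a remote pointing to huggingface.co, fetched on demand.
    return build_post(
        "/remotes/hugging_face/hugging-face/",
        {"name": name, "url": "https://huggingface.co", "policy": "on_demand"},
    )


def distribution_request(name: str, base_path: str, remote_href: str) -> Request:
    # Step 2: a distribution that uses the remote for pull-through caching.
    return build_post(
        "/distributions/hugging_face/hugging-face/",
        {"name": name, "base_path": base_path, "remote": remote_href},
    )
```

To actually send a request, pass it to `urllib.request.urlopen` after adding credentials; the `remote_href` argument is the href of the remote created in step 1.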

## Implementation Details

The plugin uses custom handlers to inject HF-compatible headers and manage the caching lifecycle. The `HuggingFaceDistribution` model includes a custom `content_headers_for()` method that adds Hugging Face Hub compatible headers such as `X-Repo-Commit`, `X-Linked-ETag`, and a proper `Content-Disposition` header to ensure compatibility with HF CLI tools.

docs/admin/guides/.gitkeep

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@

docs/admin/index.md

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
# Welcome to Pulp Hugging Face for Admins!

Here you'll find information about Hugging Face-specific admin workflows.

If you just got here, consider following the [Admin Manual](site:pulpcore/#admin) links at the top, as they provide the common ground for setting up and configuring your Pulp deployment.

## Configuration

The pulp_hugging_face plugin currently uses standard pulpcore configuration. Key settings to consider:

- **CONTENT_ORIGIN**: The base URL for serving content
- **REMOTE_USER_ENVIRON_NAME**: For external authentication setups

See the [pulpcore settings documentation](site:pulpcore/docs/admin/reference/settings/) for details on these and other core settings.
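
To make the role of `CONTENT_ORIGIN` concrete, the sketch below composes the base URL under which a distribution would be served. It assumes pulpcore's `CONTENT_PATH_PREFIX` setting is left at its default of `/pulp/content/`; the helper name `content_base_url` and the example host are illustrative only.

```python
def content_base_url(
    content_origin: str,
    base_path: str,
    content_path_prefix: str = "/pulp/content/",  # assumed pulpcore default
) -> str:
    """Join CONTENT_ORIGIN, the content path prefix, and a distribution's base_path."""
    origin = content_origin.rstrip("/")
    prefix = content_path_prefix.strip("/")
    path = base_path.strip("/")
    return f"{origin}/{prefix}/{path}"


# Example: a distribution created with base_path "huggingface".
endpoint = content_base_url("https://pulp.example.com", "huggingface")
```

With these example values, `endpoint` is `https://pulp.example.com/pulp/content/huggingface`, the kind of URL a Hugging Face client would be pointed at.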

## Access Control

The plugin uses pulpcore's standard Role Based Access Control (RBAC) system. Users with appropriate permissions can:

- Create and manage Hugging Face remotes
- Create and manage distributions for pull-through caching
- View cached content

Refer to the [pulpcore RBAC documentation](site:pulpcore/docs/admin/guides/rbac/) for information on managing permissions.

docs/admin/learn/.gitkeep

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@

docs/admin/reference/.gitkeep

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@

docs/dev/guides/.gitkeep

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@

docs/dev/index.md

Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
# Welcome to Pulp Hugging Face for Developers!

Here you'll find information useful for Hugging Face plugin developers.

If you just got here, consider exploring Pulpcore's [Developer Manual](site:pulpcore/docs/dev/), which provides the common ground for contributing to docs and code and gives basic background on plugin development.

## Development Setup

### Prerequisites

- Python 3.9+
- A running Pulp development environment
- Git

### Clone and Install

```bash
git clone https://github.com/pulp/pulp_hugging_face.git
cd pulp_hugging_face
pip install -e .
```

### Running Tests

```bash
# Run unit tests
pytest pulp_hugging_face/tests/unit/

# Run functional tests (requires running Pulp)
pytest pulp_hugging_face/tests/functional/
```

## Architecture Overview

The pulp_hugging_face plugin implements a pull-through caching system for Hugging Face Hub content.

### Key Components

| Component | File | Description |
|-----------|------|-------------|
| **HuggingFaceContent** | `models.py` | Represents cached files from HF Hub |
| **HuggingFaceRemote** | `models.py` | Configuration for fetching from HF Hub |
| **HuggingFaceRepository** | `models.py` | Groups cached content |
| **HuggingFaceDistribution** | `models.py` | Serves content with pull-through caching |
| **Handler** | `handler.py` | Custom content handler for HF-compatible responses |

### Pull-through Caching Flow

1. Request arrives at the content handler
2. Handler checks if content exists locally
3. If not, fetches from Hugging Face Hub via the configured remote
4. Content is saved and streamed to the client
5. Future requests served from cache
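
The five steps above can be sketched with an in-memory stand-in. This is an illustration of the pull-through pattern under simplifying assumptions, not the code in `handler.py`: the class name `PullThroughCache`, the `fetch_upstream` callable, and the dictionary used as storage are all hypothetical, and real content would be streamed rather than held in memory.

```python
from typing import Callable, Dict


class PullThroughCache:
    """In-memory illustration of the flow above; not the plugin's handler."""

    def __init__(self, fetch_upstream: Callable[[str], bytes]) -> None:
        self._fetch_upstream = fetch_upstream  # stands in for the configured remote
        self._store: Dict[str, bytes] = {}     # stands in for Pulp's local storage
        self.upstream_fetches = 0              # counts cache misses

    def get(self, path: str) -> bytes:
        # Steps 1-2: a request arrives and the local store is checked.
        if path in self._store:
            # Step 5: later requests are served from the cache.
            return self._store[path]
        # Step 3: cache miss, so fetch from upstream.
        data = self._fetch_upstream(path)
        self.upstream_fetches += 1
        # Step 4: save the content, then hand it to the client.
        self._store[path] = data
        return data


# A fake upstream that fabricates bytes for any requested path.
cache = PullThroughCache(lambda path: f"payload for {path}".encode("utf-8"))
first = cache.get("bert-base-uncased/resolve/main/config.json")
second = cache.get("bert-base-uncased/resolve/main/config.json")
```

After the two calls, `first` and `second` hold the same bytes and `cache.upstream_fetches` is 1, because only the first request reached the fake upstream.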

### HF-Compatible Headers

The `HuggingFaceDistribution.content_headers_for()` method injects headers required by Hugging Face CLI tools:

- `X-Repo-Commit` - Git commit hash
- `X-Linked-ETag` - ETag for the content
- `Content-Disposition` - Filename header
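
As a rough illustration of the headers listed above, the sketch below builds them from plain values. It is not the body of `content_headers_for()`: the function name `hf_compatible_headers`, its parameters, and the exact value formats (a quoted ETag, an `inline` disposition) are assumptions made for this example.

```python
from typing import Dict


def hf_compatible_headers(commit_hash: str, etag: str, filename: str) -> Dict[str, str]:
    """Build the three headers named above (hypothetical helper, assumed value formats)."""
    return {
        # Git commit hash of the repository revision being served.
        "X-Repo-Commit": commit_hash,
        # ETag for the content; quoting the value is an assumption of this sketch.
        "X-Linked-ETag": f'"{etag}"',
        # Filename header; the "inline" disposition type is an assumption.
        "Content-Disposition": f'inline; filename="{filename}"',
    }


headers = hf_compatible_headers("abc123", "deadbeef", "config.json")
```

A client that reads these headers can record which commit a cached file belongs to and compare ETags before downloading a file again.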

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Run tests and linting
5. Submit a pull request

See [CONTRIBUTING.md](https://github.com/pulp/pulp_hugging_face/blob/main/CONTRIBUTING.md) for detailed guidelines.

docs/dev/learn/.gitkeep

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@

docs/dev/reference/.gitkeep

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@

docs/index.md

Lines changed: 58 additions & 4 deletions
@@ -1,12 +1,66 @@
-# Welcome to Pulp
+# Welcome to Pulp Hugging Face
 
-The `` plugin extends pulpcore to support hosting packages.
+The `pulp_hugging_face` plugin extends pulpcore to support hosting and caching Hugging Face Hub content.
 This plugin is a part of the Pulp Project, and assumes some familiarity with the
 [pulpcore documentation](site:pulpcore/).
 
 If you are just getting started, we recommend:
 
-- [Getting Started with ](site:pulp_hugging_face/docs/tutorials/getting-started.md),
-for a starting out with a common use case.
+- [Getting Started with Hugging Face](site:pulp_hugging_face/docs/user/tutorials/getting_started/),
+for setting up your first pull-through cache for Hugging Face models.
+- [Core Concepts](site:pulp_hugging_face/docs/user/learn/concepts/),
+to understand the plugin architecture and key terminology.
 
+## Features
+
+- **Pull-through Caching**: Automatically fetch and cache models, datasets, and spaces from Hugging Face Hub on first access
+- **Authentication Support**: Use Hugging Face tokens to access private repositories
+- **Full Compatibility**: Works seamlessly with `huggingface-cli`, transformers library, and other HF tools
+- **All Content Types**: Support for models, datasets, and spaces
+- **On-demand Downloads**: Reduce storage by only downloading content when requested
+- **Versioned Repositories**: Every operation creates a restorable snapshot
+
+## Documentation Sections
+
+### For Users
+
+- [User Guide](site:pulp_hugging_face/docs/user/) - Getting started and feature documentation
+- [Tutorials](site:pulp_hugging_face/docs/user/tutorials/getting_started/) - Step-by-step guides
+- [Guides](site:pulp_hugging_face/docs/user/guides/configuration/) - Detailed workflow documentation
+
+### For Administrators
+
+- [Admin Guide](site:pulp_hugging_face/docs/admin/) - Installation and configuration
+
+### For Developers
+
+- [Developer Guide](site:pulp_hugging_face/docs/dev/) - Contributing to the plugin
+
+## Quick Start
+
+1. Create a remote pointing to Hugging Face Hub:
+
+```bash
+curl -X POST https://pulp.example.com/pulp/api/v3/remotes/hugging_face/hugging-face/ \
+  -H "Content-Type: application/json" \
+  -u admin:password \
+  -d '{"name": "hf-remote", "url": "https://huggingface.co", "policy": "on_demand"}'
+```
+
+2. Create a distribution for pull-through caching:
+
+```bash
+curl -X POST https://pulp.example.com/pulp/api/v3/distributions/hugging_face/hugging-face/ \
+  -H "Content-Type: application/json" \
+  -u admin:password \
+  -d '{"name": "hf-cache", "base_path": "huggingface", "remote": "<remote_href>"}'
+```
+
+3. Access Hugging Face content through Pulp:
+
+```bash
+# Use with huggingface-cli
+export HF_ENDPOINT="https://pulp.example.com/pulp/content/huggingface"
+huggingface-cli download bert-base-uncased
+```
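
To see what step 3 of the Quick Start implies on the wire, the sketch below composes the file URL a Hugging Face client would request once `HF_ENDPOINT` points at Pulp. It assumes the client uses the Hub's usual `resolve` URL layout, with `datasets/` and `spaces/` prefixes for those repository types; the helper name `resolve_url` is hypothetical, and how the plugin maps such paths internally is not shown in this commit.

```python
from urllib.parse import quote

# Path prefixes used for each repository type (models have none).
_REPO_TYPE_PREFIX = {"model": "", "dataset": "datasets/", "space": "spaces/"}


def resolve_url(
    endpoint: str,
    repo_id: str,
    filename: str,
    revision: str = "main",
    repo_type: str = "model",
) -> str:
    """Compose '<endpoint>/<prefix><repo_id>/resolve/<revision>/<filename>'."""
    prefix = _REPO_TYPE_PREFIX[repo_type]
    return (
        f"{endpoint.rstrip('/')}/{prefix}{repo_id}"
        f"/resolve/{quote(revision, safe='')}/{quote(filename)}"
    )


url = resolve_url(
    "https://pulp.example.com/pulp/content/huggingface",
    "bert-base-uncased",
    "config.json",
)
```

With the Quick Start values, `url` is `https://pulp.example.com/pulp/content/huggingface/bert-base-uncased/resolve/main/config.json`, so the first request for that path would be fetched from the Hub and later ones served from the cache.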