
Commit 91d0ca2

merge: update todo.md
2 parents: 0c14117 + 5e88e29


60 files changed: +2084 -1537 lines

README.md (+84 -62)
@@ -1,7 +1,7 @@
 This repo is research code and not 100% stable. Please use github issues or contact me via email (niels dot warncke at gmail dot com) or slack when you encounter issues.

 # OpenWeights
-An openai-like sdk for finetuning and batch inference. Manages runpod instances for you, or you can run a [worker](openweights/worker) on your own GPU.
+An openai-like sdk with the flexibility of working on a local GPU: finetuning, inference, API deployments and custom workloads on managed runpod instances.

 # Installation
 Clone the repo and run `pip install -e .`.
@@ -10,104 +10,126 @@ Then add your `$OPENWEIGHTS_API_KEY` to the `.env`. You can create one via the [
 # Quickstart
 ```python
 from openweights import OpenWeights
-client = OpenWeights()
+import openweights.jobs.unsloth  # This import makes ow.fine_tuning available
+ow = OpenWeights()

 with open('tests/preference_dataset.jsonl', 'rb') as file:
-    file = client.files.create(file, purpose="preference")
+    file = ow.files.create(file, purpose="preference")

-job = client.fine_tuning.create(
+job = ow.fine_tuning.create(
     model='unsloth/llama-3-8b-Instruct',
     training_file=file['id'],
     loss='dpo'
 )
 ```
+Currently supported are sft, dpo and orpo on models up to 32B in bf16 or 70B in 4bit. More info: [Fine-tuning Options](docs/finetuning.md)

-# Client-side usage:
+# Overview

-## Create a finetuning job
+A bunch of things work out of the box: for example LoRA finetuning, API deployments, batch inference jobs, or running MMLU-pro and inspect-ai evals. The most useful feature, however, is that you can very easily [create your own jobs](example/custom_job/) or modify existing ones: all built-in jobs can just as well live outside of this repo. For example, you can copy and modify [the finetuning code](openweights/jobs/unsloth): when a job is created, the necessary source code is uploaded as part of the job and therefore does not need to be part of this repo.

+## Inference
 ```python
 from openweights import OpenWeights
-from dotenv import load_dotenv
+import openweights.jobs.inference  # This import makes ow.inference available
+ow = OpenWeights()

-load_dotenv()
-client = OpenWeights()
-
-with open('tests/sft_dataset.jsonl', 'rb') as file:
-    file = client.files.create(file, purpose="conversations")
-
-job = client.fine_tuning.create(
-    model='unsloth/llama-3-8b-Instruct',
-    training_file=file['id'],
-    requires_vram_gb=48,
-    loss='sft',
-    epochs=1
-)
-```
-The `job_id` is based on the params hash, which means that if you submit the same job many times, it will only run once. If you resubmit a failed or canceled job, it will reset the job status to `pending`.
-
-More infos: [Fine-tuning Options](docs/finetuning.md)
-
-## Do batch inference
-```python
-
-file = client.files.create(
+file = ow.files.create(
     file=open("mydata.jsonl", "rb"),
     purpose="conversations"
 )

-job = client.inference.create(
+job = ow.inference.create(
     model=model,
     input_file_id=file['id'],
     max_tokens=1000,
     temperature=1,
     min_tokens=600,
 )
-print(job)

-job = client.jobs.retrieve(job['id'])
+# Wait or poll until the job is done, then:
+if job.status == 'completed':
+    output_file_id = job['outputs']['file']
+    output = ow.files.content(output_file_id).decode('utf-8')
+    print(output)
 ```
-Wait until job is finished, then get the output:
+Code: [`openweights/jobs/inference`](openweights/jobs/inference)

+## OpenAI-like vllm API
 ```py
-output_file_id = job['outputs']['file']
-output = client.files.content(output_file_id).decode('utf-8')
-print(output)
-```
+from openweights import OpenWeights
+import openweights.jobs.vllm  # this makes ow.api available

-## Custom jobs
-Maybe you'd like to use autoscaling with queues for workloads that are not currently supported. You can start a pod that is set up like a worker but doesn't start `openweights/worker/main.py` by running:
-```sh
-python openweights/cluster/start_runpod.py A6000 finetuning --dev_mode=true
-```
-Then develop your script and finally create a `CustomJob` like in this [example](example/custom_job).
+ow = OpenWeights()

-## Deploy a model as a temporary Openai-like API
+model = 'unsloth/llama-3-8b-Instruct'
+
+with ow.api.deploy(model):  # async with ow.api.deploy(model) also works
+    # entering the context manager is equivalent to temp_api = ow.api.deploy(model); temp_api.up()
+    completion = ow.chat.completions.create(
+        model=model,
+        messages=[{"role": "user", "content": "is 9.11 > 9.9?"}]
+    )
+    print(completion.choices[0].message)  # when the context manager exits, it calls api.down()
+```
+Code: [`openweights/jobs/vllm`](openweights/jobs/vllm)

-You can deploy models as openai-like APIs in one of the following ways (sorted from highest to lowest level of abstraction)
-- create chat completions via `ow.chat.completions.sync_create` or `.async_create` - this will deploy models when needed. This queues to-be-deployed models for 5 seconds and then deploys them via `ow.multi_deploy`. This client is optimized to not overload the vllm server it is talking to and caches requests on disk when a `seed` parameter is given.
-- pass a list of models to deploy to `ow.multi_deploy` - this takes a list of models or lora adapters, groups them by `base_model`, and deploys all lora adapters of the same base model on one API to save runpod resources. Calls `ow.deploy` for each single deployment job. [Example](example/multi_lora_deploy.py)
-- `ow.deploy` - takes a single model and optionally a list of lora adapters, then creates a job of type `api`. Returns a `openweights.client.temporary_api.TemporaryAPI` object. [Example](example/gradio_ui_with_temporary_api.py)

 API jobs never complete; they stop only when they are canceled or fail. API jobs are created with a timeout 15 minutes in the future, and while a `TemporaryAPI` is alive (after `api.up()` and before `api.down()` has been called), it resets the timeout every minute. This ensures that an API stays alive while the process that created it is running, and that it will automatically shut down later - but not immediately, so that during debugging you don't always have to wait for a new deployment.

+## `ow.chat.completions`
+We implement an efficient chat client that handles concurrency management and backpressure, and caches responses on disk when a `seed` is provided. It also deploys models automatically when they are not openai models and not already deployed. Automatic deployment makes guesses that may be suboptimal for your use case - in that case, you should explicitly use `ow.api.deploy`.
+
+## Inspect-AI
+```python

-## Using `client.deploy(model)`
-```py
 from openweights import OpenWeights
+import openweights.jobs.inspect_ai  # this makes ow.inspect_ai available
+ow = OpenWeights()

-client = OpenWeights()
+job = ow.inspect_ai.create(
+    model='meta-llama/Llama-3.3-70B-Instruct',
+    eval_name='inspect_evals/gpqa_diamond',
+    options='--top-p 0.9',  # Can be any options that `inspect eval` accepts - we simply pass them on without validation
+)

-model = 'unsloth/llama-3-8b-Instruct'
-with client.deploy(model) as openai:
-    completion = openai.chat.completions.create(
-        model=model,
-        messages=[{"role": "user", "content": "is 9.11 > 9.9?"}]
-    )
-    print(completion.choices[0].message)
+if job.status == 'completed':
+    job.download(f"{args.local_save_dir}")
+```
+
+
+## MMLU-pro
+```python
+from openweights import OpenWeights
+import openweights.jobs.mmlu_pro  # this makes ow.mmlu_pro available
+ow = OpenWeights()
+
+job = ow.mmlu_pro.create(
+    model=args.model,
+    ntrain=args.ntrain,
+    selected_subjects=args.selected_subjects,
+    save_dir=args.save_dir,
+    global_record_file=args.global_record_file,
+    gpu_util=args.gpu_util
+)
+
+if job.status == 'completed':
+    job.download(f"{args.local_save_dir}")
 ```

-More examples:
-- do a [hyperparameter sweep](example/hparams_sweep.py) and [visualize the results](example/analyze_hparam_sweep.ipynb)
-- [download artifacts](example/download.py) from a job and plot training
-- and [more](example/)
+# General notes
+
+## Job and file IDs are content hashes
+The `job_id` is based on the params hash, which means that if you submit the same job many times, it will only run once. If you resubmit a failed or canceled job, it will reset the job status to `pending`.
+
+## More docs
+- [Fine-tuning Options](docs/finetuning.md)
+- [APIs](docs/api.md)
+- [Custom jobs](example/custom_job/)
+
+## Development
+Start a pod in dev mode - this lets you ssh into it without automatically starting a worker, which is useful for debugging the worker.
+```sh
+python openweights/cluster/start_runpod.py A6000 finetuning --dev_mode=true
+```
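
To illustrate the "Job and file IDs are content hashes" note above: because the job id is a hash of the job parameters, resubmitting identical parameters should return the same job instead of scheduling a second run. A minimal sketch, reusing the Quickstart call and assuming both calls return a job dict with an `id` field:

```python
from openweights import OpenWeights
import openweights.jobs.unsloth  # This import makes ow.fine_tuning available

ow = OpenWeights()

with open('tests/preference_dataset.jsonl', 'rb') as f:
    file = ow.files.create(f, purpose="preference")

params = dict(model='unsloth/llama-3-8b-Instruct', training_file=file['id'], loss='dpo')
job_a = ow.fine_tuning.create(**params)
job_b = ow.fine_tuning.create(**params)  # identical params -> identical content hash

# Assumption: both submissions resolve to the same job id, so only one run is scheduled.
assert job_a['id'] == job_b['id']
```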

docs/api.md (+8)

@@ -0,0 +1,8 @@
+# Deploy a model as a temporary Openai-like API
+
+You can deploy models as openai-like APIs in one of the following ways (sorted from highest to lowest level of abstraction):
+- create chat completions via `ow.chat.completions.sync_create` or `.async_create` - this deploys models when needed. It queues to-be-deployed models for 5 seconds and then deploys them via `ow.multi_deploy`. This client is optimized to not overload the vllm server it is talking to, and it caches requests on disk when a `seed` parameter is given.
+- pass a list of models to `ow.multi_deploy` - this takes a list of models or lora adapters, groups them by `base_model`, and deploys all lora adapters of the same base model on one API to save runpod resources. Calls `ow.deploy` for each single deployment job. [Example](example/multi_lora_deploy.py)
+- `ow.api.deploy` - takes a single model and optionally a list of lora adapters, then creates a job of type `api`. Returns an `openweights.client.temporary_api.TemporaryAPI` object. [Example](../example/gradio_ui_with_temporary_api.py)
+
+API jobs never complete; they stop only when they are canceled or fail. API jobs are created with a timeout 15 minutes in the future, and while a `TemporaryAPI` is alive (after `api.up()` and before `api.down()` has been called), it resets the timeout every minute. This ensures that an API stays alive while the process that created it is running, and that it will automatically shut down later - but not immediately, so that during debugging you don't always have to wait for a new deployment.
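
For illustration, a minimal sketch of the lowest-level option using the `TemporaryAPI` lifecycle described above; `ow.api.deploy`, `api.up()` and `api.down()` are taken from this page, while the completion call is assumed to match the README example:

```python
from openweights import OpenWeights
import openweights.jobs.vllm  # this makes ow.api available

ow = OpenWeights()
model = 'unsloth/llama-3-8b-Instruct'

api = ow.api.deploy(model)  # creates a job of type `api` and returns a TemporaryAPI
api.up()                    # while alive, the 15-minute timeout is reset every minute
try:
    completion = ow.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "is 9.11 > 9.9?"}]
    )
    print(completion.choices[0].message)
finally:
    api.down()              # stop resetting the timeout; the job shuts down once it expires
```

The highest-level option skips explicit deployment entirely: assuming `sync_create` mirrors the signature above, `ow.chat.completions.sync_create(model=model, messages=..., seed=0)` would deploy the model if needed and cache the response on disk because a `seed` is given.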

docs/finetuning.md (+1 -1)

@@ -102,7 +102,7 @@ All training methods support the following parameters:

 All training methods use LoRA (Low-Rank Adaptation) by default with these configurable parameters:

-- `r`: LoRA attention dimension (int, default=512)
+- `r`: LoRA attention dimension (int, default=16)
 - `lora_alpha`: LoRA alpha parameter (int, default=16)
 - `lora_dropout`: LoRA dropout rate (float, default=0.0)
 - `target_modules`: List of modules to apply LoRA to (list of strings)
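
For illustration, a hedged sketch of overriding these defaults, assuming the LoRA options are passed as keyword arguments (named exactly as listed above) to `ow.fine_tuning.create`; the target module names are placeholders, not taken from the docs:

```python
from openweights import OpenWeights
import openweights.jobs.unsloth  # This import makes ow.fine_tuning available

ow = OpenWeights()

with open('tests/preference_dataset.jsonl', 'rb') as f:
    file = ow.files.create(f, purpose="preference")

# Assumption: LoRA settings are plain kwargs named as in the docs above.
job = ow.fine_tuning.create(
    model='unsloth/llama-3-8b-Instruct',
    training_file=file['id'],
    loss='dpo',
    r=16,              # LoRA attention dimension (the new default)
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # placeholder module names
)
```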
