This repository contains a Monk.io template to deploy the OpenLLM platform.
- Install Monk
- Register and log in to Monk
- Add a cloud provider
- Add an instance with a GPU
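
These steps are covered in the Monk documentation. As a rough, non-authoritative sketch of the flow (command names are taken from the Monk CLI, but exact subcommands, flags, and prompts vary between versions, so treat this as an assumption and follow the official docs):

```bash
# Assumed flow, for illustration only - consult the Monk docs for your version.
monk register               # create a Monk account (interactive)
monk login                  # authenticate the local Monk daemon
monk cluster new            # create a new Monk cluster (interactive)
monk cluster provider add   # attach your cloud provider credentials (interactive)
monk cluster grow           # provision an instance; choose a GPU machine type when prompted
```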

```bash
git clone https://github.com/monk-io/monk-openllm
cd monk-openllm
```

```bash
monk load MANIFEST
```

```bash
$ monk run bentoml/openllm
✔ Starting the run job: local/bentoml/openllm... DONE
✔ Preparing nodes DONE
✔ Checking/pulling images...
✔ [================================================] 100% ghcr.io/bentoml/openllm:latest test-instance
✔ Checking/pulling images DONE
✔ Starting containers DONE
✔ Runnable templates/local/bentoml/openllm connections graph updating DONE
✔ Runnable templates/local/bentoml/openllm connections graph has been updated DONE
✔ Runnable templates/local/bentoml/openllm services initialization DONE
✔ Runnable templates/local/bentoml/openllm services have been initialized DONE
✔ Host ports have been added to container 9fedc250bc0aa64e75200be50e32cd66--bentoml-openllm-server DONE
✔ New container 9fedc250bc0aa64e75200be50e32cd66--bentoml-openllm-server created DONE
✔ Container 9fedc250bc0aa64e75200be50e32cd66--bentoml-openllm-server network has been configured DONE
✔ Container 9fedc250bc0aa64e75200be50e32cd66--bentoml-openllm-server has been started DONE
✔ Started local/bentoml/openllm
🔩 templates/local/bentoml/openllm
 └─🧊 Peer test-instance
    └─🔩 templates/local/bentoml/openllm
       └─📦 9fedc250bc0aa64e75200be50e32cd66--bentoml-openllm-server running
          ├─🧩 ghcr.io/bentoml/openllm:latest
          └─🔌 open (public) TCP 34.32.245.208:3000 -> 3000
💡 You can inspect and manage your above stack with these commands:
monk logs (-f) local/bentoml/openllm - Inspect logs
monk shell local/bentoml/openllm - Connect to the container's shell
monk do local/bentoml/openllm/action_name - Run defined action (if exists)
💡 Check monk help for more!
```

Check that the service is healthy:

```bash
$ curl http://34.32.245.208:3000/healthz
```

The service is exposed publicly at http://34.32.245.208:3000/.

Send a test generation request:

```bash
$ curl -X 'POST' \
  'http://34.32.245.208:3000/v1/generate' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "What are Large Language Models?"}'
```

The configurable variables are in the `openllm.yaml` file. You can quickly set up the deployment by editing the values there:
- `model` - the LLM to be used for the deployment.
- `backend` - the runtime used for both serialisation and inference.
To use the GPU you need the `vllm` backend; PyTorch (`pt`) is also available.
To run some models you may need to change the entrypoint and pass additional parameters, for example `--max-model-len`.
For large models it is also recommended to use SSD drives to speed up loading the model data onto disk.
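
For orientation, here is a minimal sketch of how the `model` and `backend` variables might look in a Monk manifest. It is illustrative only: the model name is a placeholder, and the actual `openllm.yaml` in this repository is the source of truth for key names, structure, and defaults. Entrypoint changes such as adding `--max-model-len` would go in the container definition of the same file.

```yaml
# Illustrative sketch only - see openllm.yaml in this repository for the real structure.
namespace: bentoml

openllm:
  defines: runnable
  variables:
    model:
      type: string
      value: facebook/opt-1.3b   # placeholder: the LLM to deploy
    backend:
      type: string
      value: vllm                # "vllm" for GPU inference, "pt" for PyTorch
```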

To remove the deployment:

```bash
monk purge bentoml/openllm
```