> **Warning**
> This project is incomplete and under active development. The infrastructure and documentation are subject to significant changes.
Infrastructure-as-code for deploying and managing GPU servers for machine learning research support.
This repository contains the configuration, deployment scripts, and documentation for running:
- LLM inference services (vLLM + LiteLLM proxy)
- Monitoring stack (Prometheus + Grafana + DCGM exporter)
- Workshop environments (JupyterHub for training sessions)
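The LiteLLM proxy presents an OpenAI-compatible HTTP API in front of vLLM, so any OpenAI-style client can talk to the service. As a hedged sketch (the base URL, port, and model name below are illustrative assumptions about a deployment, not values fixed by this repo), the request body for a chat completion looks like:

```python
import json

# Assumed LiteLLM proxy address; substitute your deployment's host and port.
BASE_URL = "http://localhost:4000/v1"


def build_chat_request(model: str, prompt: str, max_tokens: int = 128) -> dict:
    """Build the JSON body for an OpenAI-style /chat/completions request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


# "llama-3.1-8b" is a placeholder model alias, not necessarily one this repo serves.
payload = build_chat_request("llama-3.1-8b", "Hello!")
print(json.dumps(payload))
```

Sending this payload as a POST to `{BASE_URL}/chat/completions` with an `Authorization: Bearer <key>` header is essentially all an OpenAI-compatible client does under the hood.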
Our research group found ourselves with a server and a dream: serve large language model endpoints to our community for free, so they could experiment with LLMs. But the path from zero to a robust, scalable language model service did not look easy. We faced questions like: Which inference engine do we use? How do we manage access? How do we monitor usage? Which models can we supply, and how many users can we feasibly serve? How do we assess the quality of our service?
We quickly noticed that this information is scattered across blog posts, subreddits, tutorials, technical documentation, and tribal knowledge. In trying to answer these questions, we realised that other people must have run into the same problems: no doubt there are pockets of researchers, small businesses, and even homelab enthusiasts with their own hardware grappling with the same questions.
In some ways, this repo serves as a call to all those doing something similar: here is what we tried; how about you? To those in the first stages of this process, we hope it serves as a useful starting point. Within this repo, we aim to provide not only the software infrastructure to serve LLMs, but also documentation that doubles as a set of tutorials. We also offer our Architectural Decision Records (ADRs), so that readers can understand why we made the decisions we did.
We offer this with one caveat: many areas may be... suboptimal. If so, we welcome any well-intentioned feedback or advice in our issues.
- Clone this repository on your GPU server.
- Run the setup script to install base dependencies, Docker, and NVIDIA drivers:

  ```bash
  sudo ./scripts/setup.sh
  ```

- Deploy the monitoring stack:

  ```bash
  ./scripts/monitoring.sh
  ```

- Deploy the LLM service:

  ```bash
  ./scripts/monitoring.sh
  ```

Repository layout:

```
ansible/   # Ansible playbooks for server configuration
docs/      # Documentation and Architecture Decision Records
scripts/   # Operational scripts (mode switching, maintenance)
stacks/    # Docker Compose definitions for each service
```
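For a sense of what lives under `stacks/`, here is a hypothetical sketch of a Docker Compose definition for a vLLM service; the image tag, model name, and port are illustrative assumptions, not the repository's actual configuration:

```yaml
# Hypothetical sketch -- not the repository's actual stack definition.
services:
  vllm:
    image: vllm/vllm-openai:latest          # vLLM's OpenAI-compatible server image
    command: --model meta-llama/Llama-3.1-8B-Instruct
    ports:
      - "8000:8000"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia                # requires the NVIDIA Container Toolkit
              count: 1
              capabilities: [gpu]
```

Running `docker compose up -d` in the stack directory would bring such a service up, assuming the NVIDIA Container Toolkit is installed alongside the drivers.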
- Getting Started - Detailed setup instructions
- System Architecture - How the components fit together
- ADRs - Architecture Decision Records explaining key choices
- Ubuntu 22.04 LTS (server)
- NVIDIA GPU with recent drivers
- Docker and Docker Compose
GNU GPLv3 - See LICENSE for details.