
acceleratescience/splinter



Warning

This project is incomplete and under active development. The infrastructure and documentation are subject to significant changes.



Accelerate Science GPU Server Infrastructure

Infrastructure-as-code for deploying and managing GPU servers for machine learning research support.

Report Bug · Request Feature

Table of Contents
  1. Introduction
  2. Quick Start
  3. Repository Structure
  4. Documentation
  5. Requirements
  6. License

This repository contains the configuration, deployment scripts, and documentation for running:

  • LLM inference services (vLLM + LiteLLM proxy)
  • Monitoring stack (Prometheus + Grafana + DCGM exporter)
  • Workshop environments (JupyterHub for training sessions)
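Both vLLM and the LiteLLM proxy expose an OpenAI-compatible HTTP API, so clients talk to the service with standard chat-completion requests. As a minimal sketch of the request shape (the base URL, port, and model name below are placeholders; adjust them to your deployment):

```python
import json

# Hypothetical endpoint -- LiteLLM's proxy listens on port 4000 by default,
# but your deployment may differ.
LITELLM_BASE = "http://localhost:4000/v1"

def chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build a chat-completion request body in the OpenAI-compatible
    format accepted by both vLLM and the LiteLLM proxy."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# The body would be POSTed to f"{LITELLM_BASE}/chat/completions".
body = chat_request("llama-3.1-8b-instruct", "Hello!")
print(json.dumps(body, indent=2))
```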

Introduction

Our research group found ourselves with a server and a dream: serve large language model endpoints to our community for free, so they could experiment with LLMs. But the path from zero to scalable, robust language model service did not seem to us to be an easy one. We were faced with questions like: What inference engine do we use? How do we manage access? How do we monitor usage? What kind of models can we supply, and how many users can we feasibly serve? How do we assess the quality of our service?

We quickly noticed that this information is scattered across blog posts, subreddits, tutorials, technical documentation, and tribal knowledge. And in trying to answer these questions, we realised that surely other people must have run into the same problems. No doubt there are pockets of researchers, small businesses, and even homelab enthusiasts with their own hardware who are grappling with the same questions.

In some ways, this repo serves as a call to all those who are doing something similar: here is what we tried; how about you? To those in the first stages of this process, we hope this will serve as a useful starting point. Within this repo, we aim not only to provide the software infrastructure to serve LLMs, but also documentation that doubles as a set of tutorials. We also offer our Architectural Decision Records (ADRs), so that people can understand why we made the decisions we did.

We offer this with the only caveat that many areas may be... suboptimal. If that is the case, then we are open to any well-intentioned feedback or advice in our issues.

Quick Start

  1. Clone this repository on your GPU server.
  2. Run the setup script to install base dependencies, Docker, and NVIDIA drivers:
sudo ./scripts/setup.sh
  3. Deploy the monitoring stack:
./scripts/monitoring.sh
  4. Deploy the LLM service:
./scripts/monitoring.sh
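Once the monitoring stack is up, Prometheus needs a scrape job pointing at the DCGM exporter to collect GPU metrics. A minimal sketch of such a job, assuming the exporter runs on its default port 9400 (the job name and target are placeholders):

```yaml
# Hypothetical Prometheus scrape job for the DCGM exporter.
scrape_configs:
  - job_name: dcgm
    static_configs:
      - targets: ["localhost:9400"]
```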

Repository Structure

ansible/          # Ansible playbooks for server configuration
docs/             # Documentation and Architecture Decision Records
scripts/          # Operational scripts (mode switching, maintenance)
stacks/           # Docker Compose definitions for each service

Documentation

Requirements

  • Ubuntu 22.04 LTS (server)
  • NVIDIA GPU with recent drivers
  • Docker and Docker Compose

License

GNU GPLv3 - See LICENSE for details.
