Finetune LLMs on your laptop’s GPU—no code, no PhD, no hassle.
- GPU-Powered Finetuning: Optimized for NVIDIA GPUs (even 4GB VRAM).
- One-Click Workflow: Upload data → Pick task → Train → Test.
- Hardware-Aware: Auto-detects your GPU/CPU and recommends models.
- React UI: No CLI or notebooks—just a friendly interface.
- Text-Generation: Generates answers in the form of text based on prior and fine-tuned knowledge. Ideal for use cases like customer support chatbots, story generators, social media script writers, code generators, and general-purpose chatbots.
- Summarization: Generates summaries for long articles and texts. Ideal for use cases like news article summarization, law document summarization, and medical article summarization.
- Extractive Question Answering: Finds the answer relevant to a query within a given context. Best for use cases like Retrieval Augmented Generation (RAG) and enterprise document search (for example, searching for information in internal documentation).
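To make the three task types concrete, here is a sketch of what a training record for each might look like, assuming the simple `{"input": ..., "output": ...}` JSON schema used in the example datasets later in this README (the exact schema ModelForge expects per task may differ):

```python
import json

# Hypothetical training records for each supported task, assuming a
# simple {"input": ..., "output": ...} schema for every task type.
records = {
    "text-generation": {
        "input": "Write a short poem about autumn.",
        "output": "Leaves drift down in amber light...",
    },
    "summarization": {
        "input": "A long news article goes here...",
        "output": "A one-sentence summary.",
    },
    "extractive-qa": {
        # For extractive QA, the input bundles the question with its context;
        # the output is a span copied verbatim from that context.
        "input": "Question: Who wrote the memo? Context: The memo was written by Dana.",
        "output": "Dana",
    },
}

for task, record in records.items():
    print(task, "->", json.dumps(record))
```

Note that the extractive-QA answer is a literal substring of its context, which is what distinguishes it from free-form text generation.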
- Python 3.8+: Ensure you have Python installed.
- NVIDIA GPU: Recommended VRAM >= 6GB.
- CUDA: Ensure CUDA is installed and configured for your GPU.
- Docker Desktop: Install Docker Desktop for your OS.
- NVIDIA Container Toolkit: Follow NVIDIA's installation instructions to install the NVIDIA Container Toolkit.
- HuggingFace Account: Create an account on Hugging Face and generate a fine-grained access token.
- Clone the Repository:
  git clone https://github.com/RETR0-OS/ModelForge.git
  cd ModelForge
- Set the HuggingFace API key in your environment variables:
  Linux:
  export HUGGINGFACE_TOKEN=your_huggingface_token
  Windows PowerShell:
  $env:HUGGINGFACE_TOKEN="your_huggingface_token"
  Windows CMD:
  set HUGGINGFACE_TOKEN=your_huggingface_token
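To confirm the token is actually visible to child processes before starting the containers, a quick standard-library check is:

```python
import os

# Read the HuggingFace token from the environment; None means the
# variable was not exported in the current shell session.
token = os.environ.get("HUGGINGFACE_TOKEN")

if token is None:
    print("HUGGINGFACE_TOKEN is not set -- export it before starting the containers.")
else:
    # Avoid printing the secret itself; show only a masked prefix.
    print(f"Token found (starts with {token[:4]}...).")
```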
- Build and Start the Docker Images:
docker-compose up --build
NOTE: This may take a while, especially the first time you run it. The images are quite large.
- Done! Navigate to http://localhost:3000 in your browser and get started!
- Start the Docker Containers:
docker-compose up
- Navigate to the UI:
Open your browser and go to http://localhost:3000.
To stop the application and free up resources, open a new terminal and run:
docker-compose down
{"input": "Enter a really long article here...", "output": "Short summary."}
{"input": "Enter the poem topic here...", "output": "Roses are red..."}
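A small validation pass over a dataset file can catch malformed records before a training run wastes GPU time. This sketch assumes one JSON object per line (JSONL) with `input`/`output` string fields; adjust it if ModelForge expects a single JSON array instead:

```python
import json
from pathlib import Path

def validate_dataset(path):
    """Check that every non-empty line is a JSON object with non-empty
    'input' and 'output' string fields; return the record count."""
    count = 0
    for lineno, line in enumerate(Path(path).read_text().splitlines(), 1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        for key in ("input", "output"):
            value = record.get(key)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"line {lineno}: missing or empty {key!r}")
        count += 1
    return count

# Example: write the two records shown above and validate them.
sample = Path("dataset.jsonl")
sample.write_text(
    '{"input": "Enter a really long article here...", "output": "Short summary."}\n'
    '{"input": "Enter the poem topic here...", "output": "Roses are red..."}\n'
)
print(validate_dataset(sample))  # prints 2
```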
- transformers + peft (LoRA finetuning)
- bitsandbytes (4-bit quantization)
- React (UI)
- FastAPI (Backend)
- Docker (Containerization)
- NVIDIA Container Toolkit (GPU support)
- NGINX (Reverse proxy)
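For reference, exposing the GPU to a container through the NVIDIA Container Toolkit is done in the compose file with a device reservation. The fragment below is illustrative only — the service and image names are hypothetical, not ModelForge's actual ones:

```yaml
services:
  backend:                        # illustrative service name
    image: modelforge-backend     # hypothetical image tag
    environment:
      - HUGGINGFACE_TOKEN=${HUGGINGFACE_TOKEN}
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```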