xTuring makes it simple, fast, and cost‑efficient to fine‑tune open‑source LLMs (e.g., GPT‑OSS, LLaMA/LLaMA 2, Falcon, GPT‑J, GPT‑2, OPT, Bloom, Cerebras, Galactica) on your own data — locally or in your private cloud.
Why xTuring:
- Simple API for data prep, training, and inference
- Private by default: run locally or in your VPC
- Efficient: LoRA and low‑precision (INT8/INT4) to cut costs
- Scales from CPU/laptop to multi‑GPU easily
- Evaluate models with built‑in metrics (e.g., perplexity)
```bash
pip install xturing
```

Run a small, CPU‑friendly example first:

```python
from xturing.datasets import InstructionDataset
from xturing.models import BaseModel
# Load a toy instruction dataset (Alpaca format)
dataset = InstructionDataset("./examples/models/llama/alpaca_data")
# Start small for quick iterations (works on CPU)
model = BaseModel.create("distilgpt2_lora")
# Fine‑tune and then generate
model.finetune(dataset=dataset)
output = model.generate(texts=["Explain quantum computing for beginners."])
print(f"Model output: {output}")Want bigger models and reasoning controls? Try GPT‑OSS variants (requires significant resources):
from xturing.models import BaseModel
# 120B or 20B variants; also support LoRA/INT8/INT4 configs
model = BaseModel.create("gpt_oss_20b_lora")You can find the data folder here.
Highlights from recent updates:
- GPT‑OSS integration – Use and fine‑tune `gpt_oss_120b` and `gpt_oss_20b` with off‑the‑shelf, INT8, LoRA, LoRA+INT8, and LoRA+INT4 options. Includes configurable reasoning levels and harmony response format support.

```python
from xturing.models import BaseModel
# Use the production-ready 120B model
model = BaseModel.create('gpt_oss_120b_lora')
# Or use the efficient 20B model for faster inference
model = BaseModel.create('gpt_oss_20b_lora')
# Both models support reasoning levels via system prompts
```
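
A rough sketch of the reasoning-level control mentioned above, assuming the level is passed as a system-style preamble in the prompt text; the exact mechanism and prompt wording are assumptions, so check the xTuring documentation for your release:

```python
from xturing.models import BaseModel

model = BaseModel.create("gpt_oss_20b_lora")

# Assumption: the reasoning level is conveyed via a system-style line in the prompt
# (harmony format); the precise wording/API may differ between xTuring releases.
prompt = "Reasoning: high\n\nExplain the Monty Hall problem step by step."
output = model.generate(texts=[prompt])
print(output)
```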
- LLaMA 2 integration – Off‑the‑shelf, INT8, LoRA, LoRA+INT8, and LoRA+INT4 via `GenericModel` or `Llama2`.

```python
from xturing.models import Llama2
model = Llama2()
## or
from xturing.models import BaseModel
model = BaseModel.create('llama2')
```

- Evaluation – Evaluate any causal LM on any dataset. Currently supports `perplexity`.

```python
# Make the necessary imports
from xturing.datasets import InstructionDataset
from xturing.models import BaseModel
# Load the desired dataset
dataset = InstructionDataset('../llama/alpaca_data')
# Load the desired model (try GPT-OSS for advanced reasoning)
model = BaseModel.create('gpt_oss_20b')
# Run the Evaluation of the model on the dataset
result = model.evaluate(dataset)
# Print the result
print(f"Perplexity of the evalution: {result}")- INT4 precision – Fine‑tune many LLMs with INT4 using
GenericLoraKbitModel.
# Make the necessary imports
from xturing.datasets import InstructionDataset
from xturing.models import GenericLoraKbitModel
# Load the desired dataset
dataset = InstructionDataset('../llama/alpaca_data')
# Load the desired model for INT4 bit fine-tuning
model = GenericLoraKbitModel('tiiuae/falcon-7b')
# Run the fine-tuning
model.finetune(dataset)
```

- CPU inference – Run inference on CPUs (including laptops) via Intel® Extension for Transformers, using weight‑only quantization and optimized kernels on Intel platforms.

```python
# Make the necessary imports
from xturing.models import BaseModel
# Initialize the model: quantize weights with weight-only algorithms and
# replace linear layers with ITREX's qbits_linear kernel
model = BaseModel.create("llama2_int8")
# Once the model has been quantized, do inferences directly
output = model.generate(texts=["Why LLM models are becoming so important?"])
print(output)
```

- Batching – Set `batch_size` in `.generate()` and `.evaluate()` to speed up processing.

```python
# Make the necessary imports
from xturing.datasets import InstructionDataset
from xturing.models import GenericLoraKbitModel
# Load the desired dataset
dataset = InstructionDataset('../llama/alpaca_data')
# Load the desired model (INT4 + LoRA via GenericLoraKbitModel)
model = GenericLoraKbitModel('tiiuae/falcon-7b')
# Generate outputs on desired prompts
outputs = model.generate(dataset=dataset, batch_size=10)
```
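
The same `batch_size` argument also applies to `.evaluate()`; a minimal sketch, where the dataset path, model key, and batch size are placeholders:

```python
from xturing.datasets import InstructionDataset
from xturing.models import BaseModel

# Load a dataset and a model (placeholder path and key)
dataset = InstructionDataset('../llama/alpaca_data')
model = BaseModel.create('llama2_lora_int8')

# Evaluate perplexity in batches; 10 is a placeholder, tune it to your hardware
result = model.evaluate(dataset, batch_size=10)
print(result)
```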
To see this applied end to end, the LLaMA LoRA INT4 working example is recommended; the GenericModel working example in the repository is also worth exploring.
```bash
# Chat with a saved model from the command line (CLI playground)
$ xturing chat -m "<path-to-model-folder>"
```

```python
# Fine-tune a model, save it, and explore it in the local UI playground
from xturing.datasets import InstructionDataset
from xturing.models import BaseModel
from xturing.ui import Playground
dataset = InstructionDataset("./alpaca_data")
model = BaseModel.create("<model_name>")
model.finetune(dataset=dataset)
model.save("llama_lora_finetuned")
Playground().launch() ## launches localhost UI
```

- Preparing your dataset
- Cerebras-GPT fine-tuning with LoRA and INT8
- Cerebras-GPT fine-tuning with LoRA
- LLaMA fine-tuning with LoRA and INT8
- LLaMA fine-tuning with LoRA
- LLaMA fine-tuning
- GPT-J fine-tuning with LoRA and INT8
- GPT-J fine-tuning with LoRA
- GPT-2 fine-tuning with LoRA
Here is a comparison of the performance of different fine-tuning techniques on the LLaMA 7B model. We use the Alpaca dataset for fine-tuning; it contains 52K instructions.
Hardware:
4xA100 40GB GPU, 335GB CPU RAM
Fine-tuning parameters:
```
{
  'maximum sequence length': 512,
  'batch size': 1,
}
```

(A sketch of how to set these parameters in xTuring follows the table below.)

| LLaMA-7B | DeepSpeed + CPU Offloading | LoRA + DeepSpeed | LoRA + DeepSpeed + CPU Offloading |
|---|---|---|---|
| GPU | 33.5 GB | 23.7 GB | 21.9 GB |
| CPU | 190 GB | 10.2 GB | 14.9 GB |
| Time/epoch | 21 hours | 20 mins | 20 mins |
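
Below is a rough sketch of how these parameters map onto xTuring's fine-tuning configuration. The `finetuning_config()` accessor and attribute names are assumptions that may vary between versions, so inspect the config object in your install:

```python
from xturing.datasets import InstructionDataset
from xturing.models import BaseModel

dataset = InstructionDataset('../llama/alpaca_data')
model = BaseModel.create('llama_lora')

# Assumed accessor and attribute names; print the config to confirm them for your version
finetuning_config = model.finetuning_config()
finetuning_config.max_length = 512   # maximum sequence length
finetuning_config.batch_size = 1

model.finetune(dataset=dataset)
```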
You can contribute by submitting your performance results on other GPUs: create an issue with your hardware specifications, memory consumption, and time per epoch.
We have already fine-tuned some models that you can use as your base or start playing with. Here is how you would load them:
```python
from xturing.models import BaseModel
model = BaseModel.load("x/distilgpt2_lora_finetuned_alpaca")
```

| Model | Dataset | Path |
|---|---|---|
| DistilGPT-2 LoRA | alpaca | x/distilgpt2_lora_finetuned_alpaca |
| LLaMA LoRA | alpaca | x/llama_lora_finetuned_alpaca |
Below is a list of all the models supported via the `BaseModel` class of xTuring and their corresponding keys to load them.
| Model | Key |
|---|---|
| Bloom | bloom |
| Cerebras | cerebras |
| DistilGPT-2 | distilgpt2 |
| Falcon-7B | falcon |
| Galactica | galactica |
| GPT-OSS (20B/120B) | gpt_oss_20b, gpt_oss_120b |
| GPT-J | gptj |
| GPT-2 | gpt2 |
| LLaMA | llama |
| LLaMA2 | llama2 |
| OPT-1.3B | opt |
The above are the base variants. Use these templates for LoRA, INT8, and INT8 + LoRA versions:
| Version | Template |
|---|---|
| LoRA | <model_key>_lora |
| INT8 | <model_key>_int8 |
| INT8 + LoRA | <model_key>_lora_int8 |
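
For example, combining a model key from the first table with a version template from the second (these follow the documented patterns; not every combination is available for every model):

```python
from xturing.models import BaseModel

# LoRA version of LLaMA 2: <model_key>_lora
model = BaseModel.create("llama2_lora")

# INT8 + LoRA version of Falcon-7B: <model_key>_lora_int8
model = BaseModel.create("falcon_lora_int8")
```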
To load a model's INT4 + LoRA version, use the `GenericLoraKbitModel` class:

```python
from xturing.models import GenericLoraKbitModel

model = GenericLoraKbitModel('<model_path>')
```

Replace `<model_path>` with a local directory or a Hugging Face model like `facebook/opt-1.3b`.
- Support for `LLaMA`, `GPT-J`, `GPT-2`, `OPT`, `Cerebras-GPT`, `Galactica` and `Bloom` models
- Dataset generation using self-instruction
- Low-precision LoRA fine-tuning and unsupervised fine-tuning
- INT8 low-precision fine-tuning support
- OpenAI, Cohere and AI21 Studio model APIs for dataset generation
- Added fine-tuned checkpoints for some models to the hub
- INT4 LLaMA LoRA fine-tuning demo
- INT4 LLaMA LoRA fine-tuning with INT4 generation
- Support for a `Generic model` wrapper
- Support for the `Falcon-7B` model
- INT4 low-precision fine-tuning support
- Evaluation of LLM models
- INT3, INT2, INT1 low-precision fine-tuning support
- Support for Stable Diffusion
If you have any questions, you can create an issue on this repository.
You can also join our Discord server and start a discussion in the #xturing channel.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
As an open source project in a rapidly evolving field, we welcome contributions of all kinds, including new features and better documentation. Please read our contributing guide to learn how you can get involved.

