Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .flake8
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ exclude =
outputs,
.venv,
venv,
llm_tts/evaluation/latex2sympy
thinkbooster/evaluation/latex2sympy
per-file-ignores =
__init__.py:F401
tests/deepconf/test_deepconf_accurate.py:E402
Expand Down
6 changes: 3 additions & 3 deletions .github/workflows/lint.yml
Original file line number Diff line number Diff line change
Expand Up @@ -23,10 +23,10 @@ jobs:
pip install black isort flake8

- name: Check formatting with black
run: black --check llm_tts scripts service_app
run: black --check thinkbooster scripts service_app

- name: Check import sorting with isort
run: isort --check-only --profile black llm_tts scripts service_app
run: isort --check-only --profile black thinkbooster scripts service_app

- name: Lint with flake8
run: flake8 llm_tts scripts service_app
run: flake8 thinkbooster scripts service_app
22 changes: 10 additions & 12 deletions .github/workflows/test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -26,17 +26,16 @@ jobs:

steps:
- uses: actions/checkout@v4
with:
submodules: recursive
- name: Free disk space
run: |
sudo rm -rf /usr/local/lib/android /usr/share/dotnet /opt/ghc
df -h /
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.11'
cache: 'pip'
- name: Run setup.sh
run: |
./setup.sh --verbose
- name: Install dev dependencies
- name: Install package and dev dependencies
run: |
pip install -e ".[dev]"
- name: Validate strategy registry
Expand All @@ -54,17 +53,16 @@ jobs:

steps:
- uses: actions/checkout@v4
with:
submodules: recursive
- name: Free disk space
run: |
sudo rm -rf /usr/local/lib/android /usr/share/dotnet /opt/ghc
df -h /
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.11'
cache: 'pip'
- name: Run setup.sh
run: |
./setup.sh --verbose
- name: Install dev dependencies
- name: Install package and dev dependencies
run: |
pip install -e ".[dev]"
- name: Run integration tests
Expand Down
2 changes: 1 addition & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -187,7 +187,7 @@ workdir/
# deepconf/
# tree-of-thought-llm/
lm-polygraph/
llm_tts/datasets/KernelAct/
thinkbooster/datasets/KernelAct/

# External Qwen repositories
# Qwen2.5-Math/
Expand Down
6 changes: 3 additions & 3 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -24,13 +24,13 @@ hooks:

lint:
@echo "Running flake8..."
@flake8 llm_tts scripts service_app
@flake8 thinkbooster scripts service_app

format:
@echo "Formatting with black..."
@black llm_tts scripts service_app
@black thinkbooster scripts service_app
@echo "Sorting imports with isort..."
@isort llm_tts scripts service_app
@isort thinkbooster scripts service_app
@echo "✓ Code formatted"

fix:
Expand Down
50 changes: 40 additions & 10 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,8 @@
</div>

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.11](https://img.shields.io/badge/python-3.11-blue.svg)](https://www.python.org/downloads/release/python-3110/)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![PyPI](https://img.shields.io/pypi/v/thinkbooster)](https://pypi.org/project/thinkbooster/)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://thinkbooster.s3.us-east-1.amazonaws.com/thinkbooster.pdf)

[Quick Start](#quick-start) | [Key Features](#key-features) | [Strategies](#supported-strategies) | [Visual Debugger](#visual-debugger) | [Documentation](#documentation)
Expand All @@ -28,25 +29,54 @@ ThinkBooster is an open-source framework for **test-time compute scaling** of la
### Installation

```bash
# Clone the repository
pip install thinkbooster
```

Or install from source for development:

```bash
git clone https://github.com/IINemo/thinkbooster.git
cd thinkbooster
pip install -e ".[dev]"
```

# Create conda environment
conda create -n thinkbooster python=3.11 -y
conda activate thinkbooster
<details>
<summary>Optional: additional scorers (UHead, KernelAct)</summary>

Some advanced scorers require GitHub-only dependencies. Run `setup.sh` after pip install:

# Install dependencies
```bash
./setup.sh
```

This installs `llm-uncertainty-head`, `vllm-speculators`, and `KernelAct`. Core functionality (all strategies, PRM/entropy/probability scorers, evaluation) works without these.

# Configure API keys
</details>

```bash
# Configure API keys (optional, for LLM judge and OpenRouter)
cp .env.example .env
# Edit .env and add your OPENROUTER_API_KEY
```

### Python API

```python
# Strategies
from thinkbooster.strategies.strategy_baseline import StrategyBaseline
from thinkbooster.strategies.strategy_self_consistency import StrategySelfConsistency
from thinkbooster.strategies.strategy_beam_search import StrategyBeamSearch
from thinkbooster.strategies.strategy_offline_best_of_n import StrategyOfflineBestOfN

# Evaluation utilities
from thinkbooster.evaluation.grader import math_equal
from thinkbooster.evaluation.parser import extract_answer
```

### REST API

```bash
git clone https://github.com/IINemo/thinkbooster.git
cd thinkbooster
pip install -e ".[service]"
python service_app/main.py # starts on http://localhost:8001
```
Expand Down Expand Up @@ -140,7 +170,7 @@ See [service_app/README.md](service_app/README.md) for details on cached example

```
thinkbooster/
├── llm_tts/ # Core library
├── thinkbooster/ # Core library (pip install thinkbooster)
│ ├── strategies/ # TTS strategy implementations
│ ├── models/ # Model wrappers (vLLM, HuggingFace, API)
│ ├── scorers/ # Step scoring (PRM, uncertainty, voting)
Expand All @@ -151,7 +181,7 @@ thinkbooster/
├── service_app/ # REST API service + visual debugger
├── tests/ # Test suite with strategy registry
├── docs/ # Documentation
└── lm-polygraph/ # Submodule: uncertainty estimation
└── setup.sh # Optional: install GitHub-only deps (UHead, KernelAct)
```

See [Project Structure](docs/getting_started/project_structure.md) for a detailed architecture overview.
Expand Down
2 changes: 1 addition & 1 deletion config/dataset/human_eval_plus.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ question_field: "question" # EvalPlus loader uses "question" field
answer_field: "answer" # EvalPlus loader uses "answer" field
data_name: "human_eval_plus" # Used for evaluation routing and EvalPlus API loading

# Fields from EvalPlus API loader (llm_tts/datasets/human_eval_plus.py):
# Fields from EvalPlus API loader (thinkbooster/datasets/human_eval_plus.py):
# - question: Problem prompt with docstring and example (correct format!)
# - answer: Canonical solution code
# - task_id: Unique identifier (e.g., "HumanEval/0")
Expand Down
4 changes: 2 additions & 2 deletions config/dataset/kernelbench.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -17,12 +17,12 @@ question_field: "question" # KernelAct loader uses "question" field with genera
answer_field: "answer" # KernelAct loader uses "answer" field (reference code)
data_name: "kernelbench" # Used for evaluation routing and KernelAct loader

# KernelBench specific settings (used by llm_tts/datasets/kernelbench.py)
# KernelBench specific settings (used by thinkbooster/datasets/kernelbench.py)
level: 1 # Dataset level (1, 2, or 3)
prompt_type: "improve" # Prompt type: "improve", "kernelbench", "normal"
trial: 1 # Trial number (affects prompt generation for TTS iterations)

# Fields from KernelAct loader (llm_tts/datasets/kernelbench.py):
# Fields from KernelAct loader (thinkbooster/datasets/kernelbench.py):
# - question: Generated prompt using KernelAct's choose_prompt()
# - answer: Reference PyTorch implementation
# - problem_id: Unique identifier (e.g., 1, 2, 3, ...)
Expand Down
2 changes: 1 addition & 1 deletion config/dataset/mbpp_plus.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ question_field: "question" # EvalPlus loader uses "question" field
answer_field: "answer" # EvalPlus loader uses "answer" field
data_name: "mbpp_plus" # Used for evaluation routing and EvalPlus API loading

# Fields from EvalPlus API loader (llm_tts/datasets/mbpp_plus.py):
# Fields from EvalPlus API loader (thinkbooster/datasets/mbpp_plus.py):
# - question: Problem prompt with docstring and example assertion (correct format!)
# - answer: Canonical solution code
# - task_id: Unique identifier (e.g., "Mbpp/2")
Expand Down
2 changes: 1 addition & 1 deletion config/scorer/uncertainty_entropy.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@
from lm_polygraph.utils.causal_lm_with_uncertainty import CausalLMWithUncertainty
from transformers import AutoModelForCausalLM, AutoTokenizer

from llm_tts.utils import get_torch_dtype
from thinkbooster.utils import get_torch_dtype

# ===============================================

Expand Down
4 changes: 2 additions & 2 deletions config/scorer/uncertainty_pd.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,8 +2,8 @@
from lm_polygraph.utils.causal_lm_with_uncertainty import CausalLMWithUncertainty
from transformers import AutoModelForCausalLM, AutoTokenizer

from llm_tts.scorers.estimator_uncertainty_pd import PDGap
from llm_tts.utils import get_torch_dtype
from thinkbooster.scorers.estimator_uncertainty_pd import PDGap
from thinkbooster.utils import get_torch_dtype


def create_uncertainty_model(config):
Expand Down
2 changes: 1 addition & 1 deletion config/scorer/uncertainty_perplexity.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@
from lm_polygraph.utils.causal_lm_with_uncertainty import CausalLMWithUncertainty
from transformers import AutoModelForCausalLM, AutoTokenizer

from llm_tts.utils import get_torch_dtype
from thinkbooster.utils import get_torch_dtype

# ===============================================

Expand Down
22 changes: 11 additions & 11 deletions docs/core/architecture.md
Original file line number Diff line number Diff line change
Expand Up @@ -296,18 +296,18 @@ if selected_candidate.is_trajectory_complete:
## File References

### Offline Strategies
- Self-Consistency: `llm_tts/strategies/strategy_self_consistency.py`
- DeepConf: `llm_tts/strategies/deepconf/strategy.py`
- Chain of Thought: `llm_tts/strategies/strategy_chain_of_thought.py`
- Self-Consistency: `thinkbooster/strategies/strategy_self_consistency.py`
- DeepConf: `thinkbooster/strategies/deepconf/strategy.py`
- Chain of Thought: `thinkbooster/strategies/strategy_chain_of_thought.py`

### Online Strategies
- Strategy base: `llm_tts/strategies/strategy_base.py`
- Online Best-of-N: `llm_tts/strategies/strategy_online_best_of_n.py`
- Phi Decoding: `llm_tts/strategies/phi.py`
- Adaptive Scaling: `llm_tts/strategies/adaptive_scaling_best_of_n.py`
- Beam Search: `llm_tts/strategies/strategy_beam_search.py`
- Strategy base: `thinkbooster/strategies/strategy_base.py`
- Online Best-of-N: `thinkbooster/strategies/strategy_online_best_of_n.py`
- Phi Decoding: `thinkbooster/strategies/phi.py`
- Adaptive Scaling: `thinkbooster/strategies/adaptive_scaling_best_of_n.py`
- Beam Search: `thinkbooster/strategies/strategy_beam_search.py`

### Shared Components
- Step generators: `llm_tts/generators/`
- Step boundary detectors: `llm_tts/step_boundary_detectors/`
- Scorers: `llm_tts/scorers/`
- Step generators: `thinkbooster/generators/`
- Step boundary detectors: `thinkbooster/step_boundary_detectors/`
- Scorers: `thinkbooster/scorers/`
4 changes: 2 additions & 2 deletions docs/core/step_boundary_detectors.md
Original file line number Diff line number Diff line change
Expand Up @@ -189,7 +189,7 @@ Comparing where detectors place step boundaries:
- Zero API cost

```python
from llm_tts.step_boundary_detectors import ThinkingMarkerDetector
from thinkbooster.step_boundary_detectors import ThinkingMarkerDetector

detector = ThinkingMarkerDetector(
use_sequence=True,
Expand Down Expand Up @@ -217,7 +217,7 @@ steps = detector.detect_steps(thinking_content)

## Files

- **Detectors implementation**: [`llm_tts/step_boundary_detectors/`](../llm_tts/step_boundary_detectors/)
- **Detectors implementation**: [`thinkbooster/step_boundary_detectors/`](../thinkbooster/step_boundary_detectors/)
- `base.py` - Abstract base class (`StepBoundaryDetectorBase`)
- `non_thinking/` - Detectors for non-thinking mode (structured responses with explicit markers)
- `structured.py` - `StructuredStepDetector` for "- Step 1:", "- Step 2:" formats
Expand Down
Loading
Loading