Commit d97cec8

add raptor (#11527)
1 parent 4c43e68 commit d97cec8

19 files changed: +1263 -0 lines changed

.github/workflows/publish_sub_package.yml

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@ on:
   push:
     branches:
       - main
+
 env:
   POETRY_VERSION: "1.6.1"
   PYTHON_VERSION: "3.10"
Lines changed: 153 additions & 0 deletions
@@ -0,0 +1,153 @@
llama_index/_static
.DS_Store
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
bin/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
etc/
include/
lib/
lib64/
parts/
sdist/
share/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
.ruff_cache

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints
notebooks/

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
pyvenv.cfg

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# Jetbrains
.idea
modules/
*.swp

# VsCode
.vscode

# pipenv
Pipfile
Pipfile.lock

# pyright
pyrightconfig.json
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
poetry_requirements(
    name="poetry",
    module_mapping={"umap-learn": ["umap"], "scikit-learn": ["sklearn"]}
)
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
GIT_ROOT ?= $(shell git rev-parse --show-toplevel)

help: ## Show all Makefile targets.
	@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[33m%-30s\033[0m %s\n", $$1, $$2}'

format: ## Run code autoformatters (black).
	pre-commit install
	git ls-files | xargs pre-commit run black --files

lint: ## Run linters: pre-commit (black, ruff, codespell) and mypy
	pre-commit install && git ls-files | xargs pre-commit run --show-diff-on-failure --files

test: ## Run tests via pytest.
	pytest tests

watch-docs: ## Build and watch documentation.
	sphinx-autobuild docs/ docs/_build/html --open-browser --watch $(GIT_ROOT)/llama_index/
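
Note on the `help` recipe in the Makefile above: it scans the Makefile for lines of the form `target: ## description` and prints them as an aligned, colored list. A rough Python equivalent is sketched below. It is only an illustration: the function name `help_lines` and the `sample` text are hypothetical, and regex edge cases may differ slightly from the grep/awk pipeline.

```python
import re

# Same shape as the grep pattern in the Makefile: a target name, a colon,
# then a "## " marker followed by the description.
TARGET_RE = re.compile(r"^([a-zA-Z_-]+):.*?## (.*)$")


def help_lines(makefile_text: str) -> list[str]:
    """Return one formatted help line per documented target."""
    lines = []
    for line in makefile_text.splitlines():
        match = TARGET_RE.match(line)
        if match:
            target, description = match.groups()
            # \033[33m ... \033[0m colors the target yellow; it is padded to 30 columns.
            lines.append(f"\033[33m{target:<30}\033[0m {description}")
    return lines


# Hypothetical excerpt; recipe lines start with a tab and are skipped.
sample = """\
help: ## Show all Makefile targets.
format: ## Run code autoformatters (black).
\tpre-commit install
test: ## Run tests via pytest.
\tpytest tests
"""

for entry in help_lines(sample):
    print(entry)
```

Recipe lines never match because they begin with a tab, so only targets that carry a `##` comment show up in the listing.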
Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
# Raptor Retriever LlamaPack

This LlamaPack shows how to use RAPTOR with llama-index through the RAPTOR pack.

RAPTOR works by recursively clustering text chunks and summarizing each cluster, building up layers of summaries for retrieval.
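
As a rough illustration of that layering (not the pack's actual code), the sketch below builds a tree bottom-up from a list of text chunks. `Node`, `toy_cluster`, `toy_summarize`, and `build_tree` are hypothetical names: a real implementation would cluster on embeddings and ask an LLM for each summary, and may stop growing the tree under different conditions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    level: int
    children: list["Node"] = field(default_factory=list)


def toy_cluster(nodes, size):
    """Stand-in for real clustering: group neighbouring nodes into fixed-size chunks."""
    return [nodes[i:i + size] for i in range(0, len(nodes), size)]


def toy_summarize(nodes):
    """Stand-in for an LLM summary: join the first word of each child."""
    return " / ".join(n.text.split()[0] for n in nodes)


def build_tree(chunks, cluster_size=2):
    """Return the layers of the tree, leaves first."""
    if cluster_size < 2:
        raise ValueError("cluster_size must be at least 2 so each layer shrinks")
    layers = [[Node(text=c, level=0) for c in chunks]]
    # Keep clustering and summarizing until one node (or nothing) is left on top.
    while len(layers[-1]) > 1:
        parents = [
            Node(text=toy_summarize(group), level=len(layers), children=group)
            for group in toy_cluster(layers[-1], cluster_size)
        ]
        layers.append(parents)
    return layers


layers = build_tree(["alpha one", "beta two", "gamma three", "delta four", "epsilon five"])
for depth, layer in enumerate(layers):
    print(depth, [n.text for n in layer])
```

Each pass clusters the current top layer and replaces every cluster with a single summary node, so the layers get smaller as the tree grows upward.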

There are two retrieval modes:

- tree_traversal -- traverse the tree of clusters, performing top-k at each level in the tree.
- collapsed -- treat the entire tree as a giant pile of nodes and perform a simple top-k.
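
The two modes above can be sketched on a toy tree. Everything here is an assumption made for illustration: the `Node` class, the hand-written 2-D vectors standing in for embeddings, and the helper names are not part of the pack, and the pack's own traversal may select or return nodes differently.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    vec: tuple[float, float]  # toy 2-D "embedding"
    children: list["Node"] = field(default_factory=list)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def top_k(nodes, query_vec, k):
    return sorted(nodes, key=lambda n: cosine(n.vec, query_vec), reverse=True)[:k]


def flatten(nodes):
    """Collect every node in the tree, summaries and leaves alike."""
    flat = []
    for node in nodes:
        flat.append(node)
        flat.extend(flatten(node.children))
    return flat


def collapsed(roots, query_vec, k):
    # One top-k over the whole tree, ignoring its structure.
    return top_k(flatten(roots), query_vec, k)


def tree_traversal(roots, query_vec, k):
    # Top-k at each level, descending only into the children of the selected nodes.
    selected, frontier = [], list(roots)
    while frontier:
        best = top_k(frontier, query_vec, k)
        selected.extend(best)
        frontier = [child for node in best for child in node.children]
    return selected


pets = Node("pets summary", (1.0, 0.2), [Node("cats", (1.0, 0.1)), Node("dogs", (0.9, 0.3))])
finance = Node("finance summary", (0.15, 1.0), [Node("stocks", (0.1, 1.0)), Node("bonds", (0.2, 0.9))])
roots = [pets, finance]
query = (1.0, 0.0)

print([n.text for n in collapsed(roots, query, k=2)])
print([n.text for n in tree_traversal(roots, query, k=1)])
```

In this toy setup, `collapsed` can return a leaf and a summary side by side, while `tree_traversal` only ever looks at children of nodes it has already selected.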

See [the paper](https://arxiv.org/abs/2401.18059) for full algorithm details.

## CLI Usage

You can download LlamaPacks directly using `llamaindex-cli`, which comes installed with the `llama-index` Python package:

```bash
llamaindex-cli download-llamapack RaptorPack --download-dir ./raptor_pack
```

You can then inspect/modify the files at `./raptor_pack` and use them as a template for your own project.

## Code Usage

You can alternatively install the package:

`pip install llama-index-packs-raptor`

Then, you can import and initialize the pack! This will perform clustering and summarization over your data.

```python
from llama_index.packs.raptor import RaptorPack

# documents, llm, and embed_model are assumed to be defined already
pack = RaptorPack(documents, llm=llm, embed_model=embed_model)
```

The `run()` function is a light wrapper around `retriever.retrieve()`.

```python
nodes = pack.run(
    "query",
    mode="collapsed",  # or tree_traversal
)
```

You can also use modules individually.

```python
# get the retriever
retriever = pack.retriever
```

## Persistence

The `RaptorPack` comes with the `RaptorRetriever`, which offers ways of saving and reloading!

If you are using a remote vector DB, just pass it in:

```python
# Pack usage
pack = RaptorPack(..., vector_store=vector_store)

# RaptorRetriever usage
retriever = RaptorRetriever(..., vector_store=vector_store)
```

Then, to reconnect, just pass in the vector store again along with an empty list of documents:

```python
# Pack usage
pack = RaptorPack([], ..., vector_store=vector_store)

# RaptorRetriever usage
retriever = RaptorRetriever([], ..., vector_store=vector_store)
```

Check out the [notebook here](https://github.com/run-llama/llama_index/blob/main/llama-index-packs/llama-index-packs-raptor/examples/raptor.ipynb) for complete details!
