# Raptor Retriever LlamaPack

This LlamaPack shows how to use an implementation of RAPTOR with llama-index, leveraging the RAPTOR pack.

RAPTOR works by recursively clustering chunks and summarizing each cluster, building up layers of summaries for retrieval.

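As a rough illustration of the layered build described above, the sketch below repeatedly groups the current layer and replaces each group with a summary until one node remains. It is a toy under stated assumptions, not the pack's implementation: `build_layers`, `pair_up`, and `join_summary` are hypothetical names, pairing neighbors stands in for clustering embeddings, string joining stands in for LLM summarization, and stopping at a single root is a simplification.

```python
from typing import Callable, List


def build_layers(
    chunks: List[str],
    cluster: Callable[[List[str]], List[List[str]]],
    summarize: Callable[[List[str]], str],
) -> List[List[str]]:
    """Return all layers, from the leaf chunks (layer 0) up to the top layer."""
    layers = [list(chunks)]
    while len(layers[-1]) > 1:
        groups = cluster(layers[-1])
        # Stop if clustering no longer shrinks the layer (avoids looping forever).
        if len(groups) >= len(layers[-1]):
            break
        layers.append([summarize(group) for group in groups])
    return layers


# Hypothetical stand-in for clustering: group neighbors in pairs.
def pair_up(texts: List[str]) -> List[List[str]]:
    return [texts[i:i + 2] for i in range(0, len(texts), 2)]


# Hypothetical stand-in for LLM summarization: join the group's texts.
def join_summary(texts: List[str]) -> str:
    return "(" + " + ".join(texts) + ")"


layers = build_layers(["a", "b", "c", "d", "e"], pair_up, join_summary)
for depth, layer in enumerate(layers):
    print(depth, layer)
```

Every layer, including the leaves, stays available for retrieval, which is what the retrieval modes operate on.
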
There are two retrieval modes:

- tree_traversal -- traverse the tree of clusters, performing top-k at each level in the tree.
- collapsed -- treat the entire tree as a giant pile of nodes and perform simple top-k.

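The difference between the two modes can be sketched on a tiny hand-built tree. Everything here is an assumption made for illustration: `Node`, `collapsed`, and `tree_traversal` are hypothetical helpers, `score` is a made-up similarity to the query, and the traversal collects the selected nodes from every level, which may differ from what the pack's retriever actually returns.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    text: str
    score: float  # pretend similarity between this node and the query
    children: List["Node"] = field(default_factory=list)


def flatten(nodes: List[Node]) -> List[Node]:
    """Return every node in the tree, summaries and leaves alike."""
    out: List[Node] = []
    for node in nodes:
        out.append(node)
        out.extend(flatten(node.children))
    return out


def top_k(nodes: List[Node], k: int) -> List[Node]:
    return sorted(nodes, key=lambda n: n.score, reverse=True)[:k]


def collapsed(roots: List[Node], k: int) -> List[Node]:
    # One top-k over the whole tree, ignoring its structure.
    return top_k(flatten(roots), k)


def tree_traversal(roots: List[Node], k: int) -> List[Node]:
    # Top-k at each level; only children of selected nodes are considered next.
    selected: List[Node] = []
    frontier = roots
    while frontier:
        best = top_k(frontier, k)
        selected.extend(best)
        frontier = [child for node in best for child in node.children]
    return selected


roots = [
    Node("A", 0.5, [Node("a1", 0.9), Node("a2", 0.2)]),
    Node("B", 0.6, [Node("b1", 0.3), Node("b2", 0.4)]),
]

print([n.text for n in collapsed(roots, k=2)])       # ['a1', 'B']
print([n.text for n in tree_traversal(roots, k=1)])  # ['B', 'b2']
```

In this toy, `tree_traversal` never reaches `a1` because its parent `A` loses at the top level, while `collapsed` finds it directly.
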
See [the paper](https://arxiv.org/abs/2401.18059) for full algorithm details.

## CLI Usage

You can download LlamaPacks directly using `llamaindex-cli`, which comes installed with the `llama-index` Python package:

```bash
llamaindex-cli download-llamapack RaptorPack --download-dir ./raptor_pack
```

You can then inspect/modify the files at `./raptor_pack` and use them as a template for your own project.

## Code Usage

You can alternatively install the package:

`pip install llama-index-packs-raptor`

Then, you can import and initialize the pack. This will perform clustering and summarization over your data.

```python
from llama_index.packs.raptor import RaptorPack

# `documents`, `llm`, and `embed_model` must already be defined:
# your loaded documents, an LLM, and an embedding model.
pack = RaptorPack(documents, llm=llm, embed_model=embed_model)
```

The `run()` function is a light wrapper around `retriever.retrieve()`.

```python
nodes = pack.run(
    "query",
    mode="collapsed",  # or tree_traversal
)
```

You can also use modules individually.

```python
# get the retriever
retriever = pack.retriever
```

## Persistence

The `RaptorPack` comes with the `RaptorRetriever`, which supports saving and reloading.

If you are using a remote vector database, just pass it in as `vector_store`:

```python
from llama_index.packs.raptor import RaptorPack, RaptorRetriever

# Pack usage
pack = RaptorPack(..., vector_store=vector_store)

# RaptorRetriever usage
retriever = RaptorRetriever(..., vector_store=vector_store)
```

Then, to reconnect, pass in the same vector store again along with an empty list of documents:

```python
# Pack usage
pack = RaptorPack([], ..., vector_store=vector_store)

# RaptorRetriever usage
retriever = RaptorRetriever([], ..., vector_store=vector_store)
```

Check out the [notebook](https://github.com/run-llama/llama_index/blob/main/llama-index-packs/llama-index-packs-raptor/examples/raptor.ipynb) for complete details.