1 | | -# Example langchain retriever |
2 | | - |
3 | | -This project demonstrates one approach for implementing a |
4 | | -[langchain retriever](https://python.langchain.com/docs/modules/data_connection/) |
5 | | -that allows for |
6 | | -[Retrieval Augmented Generation (RAG)](https://python.langchain.com/docs/use_cases/question_answering/) |
7 | | -to be supported via MarkLogic and the MarkLogic Python Client. This example uses the same data as in |
8 | | -[the langchain RAG quickstart guide](https://python.langchain.com/docs/use_cases/question_answering/quickstart), |
9 | | -but with the data having first been loaded into MarkLogic. |
10 | | - |
11 | | -**This is only intended as an example** of how easily a langchain retriever can be developed |
12 | | -using the MarkLogic Python Client. The queries in this example are simple and naturally |
13 | | -do not have any knowledge of how your data is modeled in MarkLogic. You are encouraged to use |
14 | | -this as an example for developing your own retriever, where you can build a query based on a |
15 | | -question submitted to langchain that fully leverages the indexes and data models in your MarkLogic |
16 | | -application. Additionally, please see the |
17 | | -[langchain documentation on splitting text](https://python.langchain.com/docs/modules/data_connection/document_transformers/). You may need to restructure your data so that you have a larger number of |
18 | | -smaller documents in your database so that you do not exceed the limit that langchain imposes on how |
19 | | -much data a retriever can return. |
20 | | - |
21 | | -# Setup |
22 | | - |
23 | | -To try out this project, use [docker-compose](https://docs.docker.com/compose/) to instantiate a new MarkLogic |
24 | | -instance with port 8003 available (you can use your own MarkLogic instance too, just be sure that port 8003 |
25 | | -is available): |
26 | | - |
27 | | - docker-compose up -d --build |
28 | | - |
29 | | -Then deploy a small REST API application to MarkLogic, which includes a basic non-admin MarkLogic user |
30 | | -named `langchain-user`: |
31 | | - |
32 | | - ./gradlew -i mlDeploy |
33 | | - |
34 | | -Next, create a new Python virtual environment - [pyenv](https://github.com/pyenv/pyenv) is recommended for this - |
35 | | -and install the |
36 | | -[langchain example dependencies](https://python.langchain.com/docs/use_cases/question_answering/quickstart#dependencies), |
37 | | -along with the MarkLogic Python Client: |
38 | | - |
39 | | - pip install -U langchain langchain_openai langchain-community langchainhub openai chromadb bs4 marklogic_python_client |
40 | | - |
41 | | -Then run the following Python program to load text data from the langchain quickstart guide |
42 | | -into two different collections in the `langchain-test-content` database: |
43 | | - |
44 | | - python load_data.py |
45 | | - |
46 | | -Create a ".env" file to hold your OpenAI API key: |
47 | | - |
48 | | - echo "OPENAI_API_KEY=<your key here>" > .env |
49 | | - |
50 | | -# Testing the retriever |
51 | | - |
52 | | -You are now ready to test the example retriever. Run the following to ask a question with the |
53 | | -results augmented via the `marklogic_retriever.py` module in this project; you will be |
54 | | -prompted for an OpenAI API key when you run this, which you can type or paste in: |
55 | | - |
56 | | - python ask.py "What is task decomposition?" posts |
57 | | - |
58 | | -The retriever uses a [cts.similarQuery](https://docs.marklogic.com/cts.similarQuery) to select from the documents |
59 | | -loaded via `load_data.py`. It defaults to a page length of 10. You can change this by providing a command line |
60 | | -argument - e.g.: |
61 | | - |
62 | | - python ask.py "What is task decomposition?" posts 15 |
63 | | - |
64 | | -Example of a question for the "sotu" (State of the Union speech) collection: |
65 | | - |
66 | | - python ask.py "What are economic sanctions?" sotu 20 |
67 | | - |
68 | | -To use a word query instead of a similar query, along with a set of drop words, specify "word" as the 4th argument: |
69 | | - |
70 | | - python ask.py "What are economic sanctions?" sotu 20 word |
| 1 | +This example project has been moved to the [MarkLogic AI examples repository](https://github.com/marklogic/marklogic-ai-examples). |