
Commit 671939d

Authored Mar 3, 2025
Lfrqa tutorial adjustments (#897)
1 parent 27ef4e8 commit 671939d

File tree: 2 files changed, +543 −33 lines changed

 

‎docs/tutorials/running_on_lfrqa.ipynb

+464
@@ -0,0 +1,464 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Measuring PaperQA2 with LFRQA\n",
"> This tutorial is available as a Jupyter notebook [here](https://github.com/Future-House/paper-qa/blob/main/docs/tutorials/running_on_lfrqa.md)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Overview\n",
"\n",
"The **LFRQA dataset** was introduced in the paper [_RAG-QA Arena: Evaluating Domain Robustness for Long-Form Retrieval-Augmented Question Answering_](https://arxiv.org/pdf/2407.13998). It features **1,404 science questions** (along with other categories) that have been human-annotated with answers. This tutorial walks through setting up the dataset and running the benchmark.\n",
"\n",
"## Download the Annotations\n",
"\n",
"First, we need to obtain the annotated dataset from the official repository:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create a new directory for the dataset\n",
"!mkdir -p data/rag-qa-benchmarking\n",
"\n",
"# Get the annotated questions\n",
"!curl https://raw.githubusercontent.com/awslabs/rag-qa-arena/refs/heads/main/data/\\\n",
"annotations_science_with_citation.jsonl \\\n",
"-o data/rag-qa-benchmarking/annotations_science_with_citation.jsonl"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"## Download the Robust-QA Documents\n",
"\n",
"LFRQA is built upon **Robust-QA**, so we must download the relevant documents:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Download the Lotte dataset, which includes the required documents\n",
"!curl https://downloads.cs.stanford.edu/nlp/data/colbert/colbertv2/lotte.tar.gz --output lotte.tar.gz\n",
"\n",
"# Extract the dataset\n",
"!tar -xvzf lotte.tar.gz\n",
"\n",
"# Move the science test collection to our dataset folder\n",
"!cp lotte/science/test/collection.tsv ./data/rag-qa-benchmarking/science_test_collection.tsv\n",
"\n",
"# Clean up unnecessary files\n",
"!rm lotte.tar.gz\n",
"!rm -rf lotte"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For more details, refer to the original paper: [_RAG-QA Arena: Evaluating Domain Robustness for Long-Form Retrieval-Augmented Question Answering_](https://arxiv.org/pdf/2407.13998)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"## Load the Data\n",
"\n",
"We now load the documents into a pandas dataframe:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"import pandas as pd\n",
"\n",
"# Load questions and answers dataset\n",
"rag_qa_benchmarking_dir = os.path.join(\"data\", \"rag-qa-benchmarking\")\n",
"\n",
"# Load documents dataset\n",
"lfrqa_docs_df = pd.read_csv(\n",
"    os.path.join(rag_qa_benchmarking_dir, \"science_test_collection.tsv\"),\n",
"    sep=\"\\t\",\n",
"    names=[\"doc_id\", \"doc_text\"],\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Select the Documents to Use\n",
"RobustQA consists of 1.7M documents, so building the whole index takes around 3 hours.\n",
"\n",
"To run a test, we can use 1% of the dataset by selecting the first 1% of the documents and only the questions that refer to those documents."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"proportion_to_use = 1 / 100\n",
"amount_of_docs_to_use = int(len(lfrqa_docs_df) * proportion_to_use)\n",
"print(f\"Using {amount_of_docs_to_use} out of {len(lfrqa_docs_df)} documents\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prepare the Document Files\n",
"We now create the document directory and store each document as a separate text file, so that paperqa can build the index."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"partial_docs = lfrqa_docs_df.head(amount_of_docs_to_use)\n",
"lfrqa_directory = os.path.join(rag_qa_benchmarking_dir, \"lfrqa\")\n",
"os.makedirs(\n",
"    os.path.join(lfrqa_directory, \"science_docs_for_paperqa\", \"files\"), exist_ok=True\n",
")\n",
"\n",
"for i, row in partial_docs.iterrows():\n",
"    doc_id = row[\"doc_id\"]\n",
"    doc_text = row[\"doc_text\"]\n",
"\n",
"    with open(\n",
"        os.path.join(\n",
"            lfrqa_directory, \"science_docs_for_paperqa\", \"files\", f\"{doc_id}.txt\"\n",
"        ),\n",
"        \"w\",\n",
"        encoding=\"utf-8\",\n",
"    ) as f:\n",
"        f.write(doc_text)\n",
"\n",
"    if i % int(len(partial_docs) * 0.05) == 0:\n",
"        progress = (i + 1) / len(partial_docs)\n",
"        print(f\"Progress: {progress:.2%}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create the Manifest File\n",
"The **manifest file** keeps track of document metadata for the dataset. We need to fill in some fields so that paperqa doesn't try to get metadata using LLM calls; this makes the indexing process faster."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"manifest = partial_docs.copy()\n",
"manifest[\"file_location\"] = manifest[\"doc_id\"].apply(lambda x: f\"files/{x}.txt\")\n",
"manifest[\"doi\"] = \"\"\n",
"manifest[\"title\"] = manifest[\"doc_id\"]\n",
"manifest[\"key\"] = manifest[\"doc_id\"]\n",
"manifest[\"docname\"] = manifest[\"doc_id\"]\n",
"manifest[\"citation\"] = \"_\"\n",
"manifest = manifest.drop(columns=[\"doc_id\", \"doc_text\"])\n",
"manifest.to_csv(\n",
"    os.path.join(lfrqa_directory, \"science_docs_for_paperqa\", \"manifest.csv\"),\n",
"    index=False,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Filter and Save Questions\n",
"Finally, we load the questions and filter them to ensure we only include questions that reference the selected documents:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"questions_df = pd.read_json(\n",
"    os.path.join(rag_qa_benchmarking_dir, \"annotations_science_with_citation.jsonl\"),\n",
"    lines=True,\n",
")\n",
"partial_questions = questions_df[\n",
"    questions_df.gold_doc_ids.apply(\n",
"        lambda ids: all(_id < amount_of_docs_to_use for _id in ids)\n",
"    )\n",
"]\n",
"partial_questions.to_csv(\n",
"    os.path.join(lfrqa_directory, \"questions.csv\"),\n",
"    index=False,\n",
")\n",
"\n",
"print(\"Using\", len(partial_questions), \"questions\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install paperqa\n",
"From now on, we will be using the paperqa library, so we need to install it:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install paper-qa"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Index the Documents\n",
"\n",
"Now we will build an index for the LFRQA documents. The index is a **Tantivy index**, which is a fast, full-text search engine library written in Rust. Tantivy is designed to handle large datasets efficiently, making it ideal for searching through a vast collection of papers or documents.\n",
"\n",
"Feel free to adjust the concurrency settings as you like. Because we defined a manifest, we don't need any API keys to build this index (we don't discern any citation metadata), but you do need LLM API keys to answer questions.\n",
"\n",
"Remember that this process is quick for small portions of the dataset, but can take around 3 hours for the whole dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import nest_asyncio\n",
"\n",
"nest_asyncio.apply()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We add the line above to handle async code within a notebook.\n",
"\n",
"However, to improve compatibility and speed up the indexing process, we strongly recommend running the following code in a separate `.py` file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"from paperqa import Settings\n",
"from paperqa.agents import build_index\n",
"from paperqa.settings import AgentSettings, IndexSettings, ParsingSettings\n",
"\n",
"settings = Settings(\n",
"    agent=AgentSettings(\n",
"        index=IndexSettings(\n",
"            name=\"lfrqa_science_index\",\n",
"            paper_directory=os.path.join(\n",
"                \"data\", \"rag-qa-benchmarking\", \"lfrqa\", \"science_docs_for_paperqa\"\n",
"            ),\n",
"            index_directory=os.path.join(\n",
"                \"data\", \"rag-qa-benchmarking\", \"lfrqa\", \"science_docs_for_paperqa_index\"\n",
"            ),\n",
"            manifest_file=\"manifest.csv\",\n",
"            concurrency=10_000,\n",
"            batch_size=10_000,\n",
"        )\n",
"    ),\n",
"    parsing=ParsingSettings(\n",
"        use_doc_details=False,\n",
"        defer_embedding=True,\n",
"    ),\n",
")\n",
"\n",
"build_index(settings=settings)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After this runs, you will have an index ready to use!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Benchmark!\n",
"After you have built the index, you are ready to run the benchmark. We advise running this in a separate `.py` file.\n",
"\n",
"To run this, you will need to have the [`ldp`](https://github.com/Future-House/ldp) and [`fhaviary[lfrqa]`](https://github.com/Future-House/aviary/blob/main/packages/lfrqa/README.md#installation) packages installed.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install ldp \"fhaviary[lfrqa]\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import asyncio\n",
"import json\n",
"import logging\n",
"import os\n",
"\n",
"import pandas as pd\n",
"from aviary.envs.lfrqa import LFRQAQuestion, LFRQATaskDataset\n",
"from ldp.agent import SimpleAgent\n",
"from ldp.alg.runners import Evaluator, EvaluatorConfig\n",
"\n",
"from paperqa import Settings\n",
"from paperqa.settings import AgentSettings, IndexSettings\n",
"\n",
"logging.basicConfig(level=logging.ERROR)\n",
"\n",
"log_results_dir = os.path.join(\"data\", \"rag-qa-benchmarking\", \"results\")\n",
"os.makedirs(log_results_dir, exist_ok=True)\n",
"\n",
"\n",
"async def log_evaluation_to_json(lfrqa_question_evaluation: dict) -> None:  # noqa: RUF029\n",
"    json_path = os.path.join(\n",
"        log_results_dir, f\"{lfrqa_question_evaluation['qid']}.json\"\n",
"    )\n",
"    with open(json_path, \"w\") as f:  # noqa: ASYNC230\n",
"        json.dump(lfrqa_question_evaluation, f, indent=2)\n",
"\n",
"\n",
"async def evaluate() -> None:\n",
"    settings = Settings(\n",
"        agent=AgentSettings(\n",
"            index=IndexSettings(\n",
"                name=\"lfrqa_science_index\",\n",
"                paper_directory=os.path.join(\n",
"                    \"data\", \"rag-qa-benchmarking\", \"lfrqa\", \"science_docs_for_paperqa\"\n",
"                ),\n",
"                index_directory=os.path.join(\n",
"                    \"data\",\n",
"                    \"rag-qa-benchmarking\",\n",
"                    \"lfrqa\",\n",
"                    \"science_docs_for_paperqa_index\",\n",
"                ),\n",
"            )\n",
"        )\n",
"    )\n",
"\n",
"    data: list[LFRQAQuestion] = [\n",
"        LFRQAQuestion(**row)\n",
"        for row in pd.read_csv(\n",
"            os.path.join(\"data\", \"rag-qa-benchmarking\", \"lfrqa\", \"questions.csv\")\n",
"        )[[\"qid\", \"question\", \"answer\", \"gold_doc_ids\"]].to_dict(orient=\"records\")\n",
"    ]\n",
"\n",
"    dataset = LFRQATaskDataset(\n",
"        data=data,\n",
"        settings=settings,\n",
"        evaluation_callback=log_evaluation_to_json,\n",
"    )\n",
"\n",
"    evaluator = Evaluator(\n",
"        config=EvaluatorConfig(batch_size=3),\n",
"        agent=SimpleAgent(),\n",
"        dataset=dataset,\n",
"    )\n",
"    await evaluator.evaluate()\n",
"\n",
"\n",
"if __name__ == \"__main__\":\n",
"    asyncio.run(evaluate())\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After running this, you can find the results in the `data/rag-qa-benchmarking/results` folder. Here is an example of how to read them:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import glob\n",
"\n",
"json_files = glob.glob(os.path.join(rag_qa_benchmarking_dir, \"results\", \"*.json\"))\n",
"\n",
"data = []\n",
"for file in json_files:\n",
"    with open(file) as f:\n",
"        json_data = json.load(f)\n",
"        json_data[\"qid\"] = file.split(\"/\")[-1].replace(\".json\", \"\")\n",
"        data.append(json_data)\n",
"\n",
"results_df = pd.DataFrame(data).set_index(\"qid\")\n",
"results_df[\"winner\"].value_counts(normalize=True)\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
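
Before building the index, a quick sanity check of the prepared layout can catch path mistakes early. This is a minimal sketch, assuming only the `data/rag-qa-benchmarking/lfrqa` layout created by the cells above:

```python
import glob
import os

import pandas as pd

# Layout produced by the preparation steps above
lfrqa_directory = os.path.join("data", "rag-qa-benchmarking", "lfrqa")
paper_directory = os.path.join(lfrqa_directory, "science_docs_for_paperqa")

# One .txt file per selected document
n_files = len(glob.glob(os.path.join(paper_directory, "files", "*.txt")))

# The manifest should have one row per document file
manifest = pd.read_csv(os.path.join(paper_directory, "manifest.csv"))

# The filtered questions that will be benchmarked
questions = pd.read_csv(os.path.join(lfrqa_directory, "questions.csv"))

print(f"{n_files} document files, {len(manifest)} manifest rows, {len(questions)} questions")
```

If the file count and manifest row count disagree, revisit the preparation cells before spending time on indexing.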

‎docs/tutorials/running_on_lfrqa.md

+79 −33
@@ -1,4 +1,20 @@
+---
+jupyter:
+  jupytext:
+    text_representation:
+      extension: .md
+      format_name: markdown
+      format_version: '1.3'
+      jupytext_version: 1.16.7
+  kernelspec:
+    display_name: .venv
+    language: python
+    name: python3
+---
+
 # Measuring PaperQA2 with LFRQA
+> This tutorial is available as a Jupyter notebook [here](https://github.com/Future-House/paper-qa/blob/main/docs/tutorials/running_on_lfrqa.md)
+

 ## Overview

@@ -8,41 +24,55 @@ The **LFRQA dataset** was introduced in the paper [_RAG-QA Arena: Evaluating Dom

 First, we need to obtain the annotated dataset from the official repository:

-```bash
+
+```python
 # Create a new directory for the dataset
-mkdir -p data/rag-qa-benchmarking
+!mkdir -p data/rag-qa-benchmarking

 # Get the annotated questions
-curl https://raw.githubusercontent.com/awslabs/rag-qa-arena/refs/heads/main/data/annotations_science_with_citation.jsonl -o data/rag-qa-benchmarking/annotations_science_with_citation.jsonl
+!curl https://raw.githubusercontent.com/awslabs/rag-qa-arena/refs/heads/main/data/\
+annotations_science_with_citation.jsonl \
+-o data/rag-qa-benchmarking/annotations_science_with_citation.jsonl
 ```

+<!-- #region -->
+
+
 ## Download the Robust-QA Documents

 LFRQA is built upon **Robust-QA**, so we must download the relevant documents:

-```bash
+<!-- #endregion -->
+
+```python
 # Download the Lotte dataset, which includes the required documents
-curl https://downloads.cs.stanford.edu/nlp/data/colbert/colbertv2/lotte.tar.gz --output lotte.tar.gz
+!curl https://downloads.cs.stanford.edu/nlp/data/colbert/colbertv2/lotte.tar.gz --output lotte.tar.gz

 # Extract the dataset
-tar -xvzf lotte.tar.gz
+!tar -xvzf lotte.tar.gz

 # Move the science test collection to our dataset folder
-cp lotte/science/test/collection.tsv ./data/rag-qa-benchmarking/science_test_collection.tsv
+!cp lotte/science/test/collection.tsv ./data/rag-qa-benchmarking/science_test_collection.tsv

 # Clean up unnecessary files
-rm lotte.tar.gz
-rm -rf lotte
+!rm lotte.tar.gz
+!rm -rf lotte
 ```

 For more details, refer to the original paper: [_RAG-QA Arena: Evaluating Domain Robustness for Long-Form Retrieval-Augmented Question Answering_](https://arxiv.org/pdf/2407.13998).

+<!-- #region -->
+
+
 ## Load the Data

 We now load the documents into a pandas dataframe:

+<!-- #endregion -->
+
 ```python
 import os
+
 import pandas as pd

 # Load questions and answers dataset
@@ -57,10 +87,9 @@ lfrqa_docs_df = pd.read_csv(
 ```

 ## Select the Documents to Use
+RobustQA consists of 1.7M documents, so building the whole index takes around 3 hours.

-RobustQA consists on 1.7M documents, so building the whole index will take around 3 hours.
-
-If you want to run a test, you can use a portion of the dataset and the questions that can be answered only on those documents.
+To run a test, we can use 1% of the dataset by selecting the first 1% of the documents and only the questions that refer to those documents.

 ```python
 proportion_to_use = 1 / 100
@@ -69,7 +98,6 @@ print(f"Using {amount_of_docs_to_use} out of {len(lfrqa_docs_df)} documents")
 ```

 ## Prepare the Document Files
-
 We now create the document directory and store each document as a separate text file, so that paperqa can build the index.

 ```python
@@ -98,7 +126,6 @@ for i, row in partial_docs.iterrows():
 ```

 ## Create the Manifest File
-
 The **manifest file** keeps track of document metadata for the dataset. We need to fill in some fields so that paperqa doesn't try to get metadata using LLM calls; this makes the indexing process faster.

 ```python
@@ -109,15 +136,14 @@ manifest["title"] = manifest["doc_id"]
 manifest["key"] = manifest["doc_id"]
 manifest["docname"] = manifest["doc_id"]
 manifest["citation"] = "_"
-manifest.drop(columns=["doc_id", "doc_text"], inplace=True)
+manifest = manifest.drop(columns=["doc_id", "doc_text"])
 manifest.to_csv(
     os.path.join(lfrqa_directory, "science_docs_for_paperqa", "manifest.csv"),
     index=False,
 )
 ```

 ## Filter and Save Questions
-
 Finally, we load the questions and filter them to ensure we only include questions that reference the selected documents:

 ```python
@@ -127,42 +153,53 @@ questions_df = pd.read_json(
 )
 partial_questions = questions_df[
     questions_df.gold_doc_ids.apply(
-        lambda ids: all(id < amount_of_docs_to_use for id in ids)
+        lambda ids: all(_id < amount_of_docs_to_use for _id in ids)
     )
 ]
 partial_questions.to_csv(
     os.path.join(lfrqa_directory, "questions.csv"),
     index=False,
 )
+
+print("Using", len(partial_questions), "questions")
 ```

 ## Install paperqa
-
 From now on, we will be using the paperqa library, so we need to install it:

-```bash
-pip install paper-qa
+```python
+!pip install paper-qa
 ```

-## Index the documents
+## Index the Documents

-Copy the following to a file and run it. Feel free to adjust the concurrency as you like.
+Now we will build an index for the LFRQA documents. The index is a **Tantivy index**, which is a fast, full-text search engine library written in Rust. Tantivy is designed to handle large datasets efficiently, making it ideal for searching through a vast collection of papers or documents.

-You don’t need any api keys for building this index because we don't discern any citation metadata, but you do need LLM api keys to answer questions.
+Feel free to adjust the concurrency settings as you like. Because we defined a manifest, we don't need any API keys to build this index (we don't discern any citation metadata), but you do need LLM API keys to answer questions.

 Remember that this process is quick for small portions of the dataset, but can take around 3 hours for the whole dataset.

+```python
+import nest_asyncio
+
+nest_asyncio.apply()
+```
+
+We add the line above to handle async code within a notebook.
+
+However, to improve compatibility and speed up the indexing process, we strongly recommend running the following code in a separate `.py` file.
+
 ```python
 import os

-from paperqa import Settings, ask
+from paperqa import Settings
 from paperqa.agents import build_index
 from paperqa.settings import AgentSettings, IndexSettings, ParsingSettings

 settings = Settings(
     agent=AgentSettings(
         index=IndexSettings(
-            name="lfrqa_science_index0.1",
+            name="lfrqa_science_index",
             paper_directory=os.path.join(
                 "data", "rag-qa-benchmarking", "lfrqa", "science_docs_for_paperqa"
             ),
@@ -183,17 +220,23 @@ settings = Settings(
 build_index(settings=settings)
 ```

-After this runs, you will get an answer!
+After this runs, you will have an index ready to use!
+

 ## Benchmark!
+After you have built the index, you are ready to run the benchmark. We advise running this in a separate `.py` file.
+
+To run this, you will need to have the [`ldp`](https://github.com/Future-House/ldp) and [`fhaviary[lfrqa]`](https://github.com/Future-House/aviary/blob/main/packages/lfrqa/README.md#installation) packages installed.

-After you have built the index, you are ready to run the benchmark.

-Copy the following into a file and run it. To run this, you will need to have the [`ldp`](https://github.com/Future-House/ldp) and [`fhaviary[lfrqa]`](https://github.com/Future-House/aviary/blob/main/packages/lfrqa/README.md#installation) packages installed.
+```python
+!pip install ldp "fhaviary[lfrqa]"
+```

 ```python
 import asyncio
 import json
+import logging
 import os

 import pandas as pd
@@ -204,16 +247,19 @@ from ldp.alg.runners import Evaluator, EvaluatorConfig
 from paperqa import Settings
 from paperqa.settings import AgentSettings, IndexSettings

+logging.basicConfig(level=logging.ERROR)

 log_results_dir = os.path.join("data", "rag-qa-benchmarking", "results")
 os.makedirs(log_results_dir, exist_ok=True)


-async def log_evaluation_to_json(lfrqa_question_evaluation: dict) -> None:
+async def log_evaluation_to_json(
+    lfrqa_question_evaluation: dict,
+) -> None:  # noqa: RUF029
     json_path = os.path.join(
         log_results_dir, f"{lfrqa_question_evaluation['qid']}.json"
     )
-    with open(json_path, "w") as f:
+    with open(json_path, "w") as f:  # noqa: ASYNC230
         json.dump(lfrqa_question_evaluation, f, indent=2)


@@ -260,11 +306,11 @@ if __name__ == "__main__":
     asyncio.run(evaluate())
 ```

+
 After running this, you can find the results in the `data/rag-qa-benchmarking/results` folder. Here is an example of how to read them:

 ```python
 import glob
-import json

 json_files = glob.glob(os.path.join(rag_qa_benchmarking_dir, "results", "*.json"))

@@ -275,6 +321,6 @@ for file in json_files:
     json_data["qid"] = file.split("/")[-1].replace(".json", "")
     data.append(json_data)

-df = pd.DataFrame(data).set_index("qid")
-df["winner"].value_counts(normalize=True)
+results_df = pd.DataFrame(data).set_index("qid")
+results_df["winner"].value_counts(normalize=True)
 ```