feat: adding workflow file to run notebooks #230


Open · wants to merge 62 commits into base: main

Commits (62)
6369865
adding workflow file
davidsbatista Jul 8, 2025
ab161a7
adding on event
davidsbatista Jul 8, 2025
ae03627
updating skip notebooks path
davidsbatista Jul 9, 2025
4d7fe71
updating skip notebooks path
davidsbatista Jul 9, 2025
985630b
updating skip notebooks file
davidsbatista Jul 9, 2025
a8a46ac
adding kernel and python3
davidsbatista Jul 9, 2025
209023b
adding more notebooks to test
davidsbatista Jul 9, 2025
48c4821
adding env keys
davidsbatista Jul 9, 2025
56ffab2
skipping all notebooks not fixed yet
davidsbatista Jul 9, 2025
749923e
adding HF token key as env variable
davidsbatista Jul 9, 2025
3e6f61c
skipping llama3 rag - needs gpu
davidsbatista Jul 9, 2025
2e1a75b
passing env vars to papermill
davidsbatista Jul 9, 2025
711f397
converting notebooks to python scripts instead of using papermill
davidsbatista Jul 9, 2025
ce6e5f0
updating workflow
davidsbatista Jul 9, 2025
86d6dae
updating workflow
davidsbatista Jul 9, 2025
5b4e07f
updating workflow: adding ipython
davidsbatista Jul 9, 2025
f5f887b
updating workflow
davidsbatista Jul 9, 2025
a35a93e
updating workflow
davidsbatista Jul 9, 2025
4f55b08
updating workflow
davidsbatista Jul 9, 2025
e2ecd77
updating workflow
davidsbatista Jul 9, 2025
19b679b
updating workflow
davidsbatista Jul 9, 2025
e1f5a28
testing env vars
davidsbatista Jul 9, 2025
a199e0b
mv env into job
davidsbatista Jul 9, 2025
81cabc5
wip: debugging
davidsbatista Jul 9, 2025
0c9ecf5
wip: debugging
davidsbatista Jul 9, 2025
d147607
adapting CI from tutorials
davidsbatista Jul 10, 2025
e500be6
running always for testing
davidsbatista Jul 10, 2025
d70d48b
removing version
davidsbatista Jul 10, 2025
4b5348b
applying to all files, not only changed ones
davidsbatista Jul 10, 2025
a74fe68
wip: applying to all files
davidsbatista Jul 10, 2025
0767ba4
wip: aligning
davidsbatista Jul 10, 2025
275efcc
wip: aligning
davidsbatista Jul 10, 2025
bd3feb7
wip fixing JSON dump output
davidsbatista Jul 10, 2025
7957df4
changing matrix not to use filter, it was removed
davidsbatista Jul 10, 2025
d36b2e2
updating path for notebooks
davidsbatista Jul 10, 2025
0d9becf
adding filter for allowed notebooks
davidsbatista Jul 10, 2025
56f0222
debugging: running only one notebook
davidsbatista Jul 10, 2025
34baa1a
debugging: running only one notebook
davidsbatista Jul 10, 2025
60dee01
debugging: running only one notebook
davidsbatista Jul 10, 2025
cb937e1
debugging: running on latest stable haystack release
davidsbatista Jul 10, 2025
f461f2c
debugging: running on latest stable haystack release
davidsbatista Jul 10, 2025
d60f8d5
updating notebook to have bash instead of exclamation point
davidsbatista Jul 10, 2025
19012a4
updating notebook to have bash instead of exclamation point
davidsbatista Jul 10, 2025
648b00f
updating notebook to have bash instead of exclamation point
davidsbatista Jul 10, 2025
f5c7194
replacing wget with python code to download the file
davidsbatista Jul 10, 2025
e18126b
adding one more notebook
davidsbatista Jul 10, 2025
b60f2e2
adding one more notebook
davidsbatista Jul 10, 2025
6ed4f5a
replacing exclamation point by bash
davidsbatista Jul 10, 2025
d0518ef
updating sentence transformers version
davidsbatista Jul 10, 2025
b8b8a30
adding dependencies to a LLM metadata extractor notebook
davidsbatista Jul 10, 2025
2bedd18
adding dependencies to a LLM metadata extractor notebook
davidsbatista Jul 10, 2025
e899d58
removing HF token
davidsbatista Jul 10, 2025
78dd754
reverting to main
davidsbatista Jul 10, 2025
c2166e4
moving to bash
davidsbatista Jul 10, 2025
68c83a7
wip: adding chat_with_SQL_3_ways
davidsbatista Jul 10, 2025
10a456b
adding notebooks/chat_with_SQL_3_ways.ipynb
davidsbatista Jul 11, 2025
dfaf84a
fixing generate matrix
davidsbatista Jul 11, 2025
f744997
reverting chat sql
davidsbatista Jul 11, 2025
bd7bde7
fixing chat sql
davidsbatista Jul 11, 2025
9b04fcb
fixing chat sql
davidsbatista Jul 11, 2025
d2106a5
dealing with Pipeline.draw
davidsbatista Jul 11, 2025
422b837
removing chat - needs lots of fixes
davidsbatista Jul 11, 2025
76 changes: 76 additions & 0 deletions .github/workflows/run_cookbooks.yml
@@ -0,0 +1,76 @@
name: Run Haystack Cookbooks

on:
  pull_request:
  schedule:
    - cron: '0 0 */14 * *' # Every 14 days at midnight UTC
  workflow_dispatch:


jobs:
  generate-matrix:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.generator.outputs.matrix }}
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - id: generator
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          # Get cookbooks
          pip install requests
          NOTEBOOKS=$(python ./scripts/generate_matrix.py --include-main)
          echo "matrix={\"include\":$NOTEBOOKS}" >> "$GITHUB_OUTPUT"

  run-notebooks:
    runs-on: ubuntu-latest
    needs: generate-matrix
    container: deepset/haystack:${{ matrix.haystack_version }}

    strategy:
      fail-fast: false
      max-parallel: 3
      matrix: ${{ fromJSON(needs.generate-matrix.outputs.matrix) }}

    env:
      HAYSTACK_TELEMETRY_ENABLED: "False"
      # HF_API_TOKEN: ${{ secrets.HF_API_KEY }}
      # Note: HF_API_TOKEN needs to be unset for notebooks using sentence-transformers
      OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
      SERPERDEV_API_KEY: ${{ secrets.SERPERDEV_API_KEY }}

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Dump matrix content
        run: echo '${{ toJSON(matrix) }}'

      - name: Install common dependencies
        run: |
          apt-get update && apt-get install -y \
            build-essential \
            gcc \
            libsndfile1 \
            ffmpeg

          pip install nbconvert ipython

      - name: Install dependencies
        if: toJSON(matrix.dependencies) != '[]'
        run: |
          pip install "${{ join(matrix.dependencies, '" "')}}"

      - name: Convert notebook to Python
        run: |
          jupyter nbconvert --to python --RegexRemovePreprocessor.patterns '%%bash' ./notebooks/${{ matrix.notebook }}.ipynb

      - name: Run the converted notebook
        run: |
          python ./notebooks/${{ matrix.notebook }}.py
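
The `scripts/generate_matrix.py` helper that feeds the job matrix is referenced here but not included in this diff. Below is a minimal sketch of what it plausibly does, inferred from how the workflow consumes its output: the field names (`notebook`, `dependencies`, `haystack_version`) follow the matrix usage above, while the container tags and the `--include-main` handling are assumptions (the real script also appears to use `requests` and `GH_TOKEN`, which this sketch omits).

```python
# Hypothetical sketch of scripts/generate_matrix.py -- the real script is not
# part of this diff. Assumes index.toml lists [[cookbook]] entries with an
# optional `dependencies` array, and prints a JSON list of matrix entries.
import argparse
import json
import tomllib  # stdlib since Python 3.11, matching the setup-python version above

parser = argparse.ArgumentParser()
parser.add_argument("--include-main", action="store_true",
                    help="also run each notebook against the Haystack main image")
args = parser.parse_args()

with open("index.toml", "rb") as f:
    index = tomllib.load(f)

entries = []
for cookbook in index["cookbook"]:
    notebook = cookbook["notebook"].removesuffix(".ipynb")
    # "latest" / "main" are placeholder tags for the deepset/haystack container
    versions = ["latest"] + (["main"] if args.include_main else [])
    for version in versions:
        entries.append({
            "notebook": notebook,
            "dependencies": cookbook.get("dependencies", []),
            "haystack_version": version,
        })

# The workflow wraps this list as {"include": ...} before writing $GITHUB_OUTPUT.
print(json.dumps(entries))
```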
2 changes: 2 additions & 0 deletions index.toml
@@ -283,6 +283,7 @@ topics = ["Function Calling", "Agents"]
 title = "Extracting Metadata with an LLM"
 notebook = "metadata_extraction_with_llm_metadata_extractor.ipynb"
 topics = ["Metadata"]
+dependencies = ["sentence-transformers>=4.1.0"]
 
 [[cookbook]]
 title = "Building an Interactive Feedback Review Agent with Azure AI Search and Haystack"
@@ -309,6 +310,7 @@ title = "DevOps Support Agent with Human in the Loop"
 notebook = "agent_with_human_in_the_loop.ipynb"
 new = true
 topics = ["Function Calling", "Agents"]
+dependencies = ["requests"]
 
 [[cookbook]]
 title = "Introduction to Multimodal Text Generation"
53 changes: 26 additions & 27 deletions notebooks/auto_merging_retriever.ipynb
@@ -24,17 +24,28 @@
    },
    {
     "cell_type": "code",
-    "execution_count": null,
+    "execution_count": 3,
     "metadata": {
      "colab": {
       "base_uri": "https://localhost:8080/"
      },
      "id": "LaJsFx4P1o_l",
      "outputId": "a5b29fa2-6d74-4ccf-e732-77c8a4f68491"
     },
-    "outputs": [],
+    "outputs": [
+     {
+      "name": "stderr",
+      "output_type": "stream",
+      "text": [
+       "\u001b[2mUsing Python 3.12.6 environment at: /Users/dsbatista/haystack-cookbook/.venv\u001b[0m\n",
+       "\u001b[2mAudited \u001b[1m1 package\u001b[0m \u001b[2min 23ms\u001b[0m\u001b[0m\n"
+      ]
+     }
+    ],
     "source": [
-     "!pip install haystack-ai"
+     "%%bash\n",
+     "\n",
+     "pip install haystack-ai"
    ]
   },
   {
@@ -52,35 +63,23 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "colab": {
-     "base_uri": "https://localhost:8080/"
-    },
-    "id": "cpMYVx1VY7Z7",
-    "outputId": "521dbe20-c6dc-4897-c4d7-764b6b82cea1"
-   },
+   "execution_count": 2,
+   "metadata": {},
    "outputs": [
     {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "--2024-09-06 09:41:04--  https://raw.githubusercontent.com/amankharwal/Website-data/master/bbc-news-data.csv\n",
-      "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.111.133, ...\n",
-      "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.\n",
-      "HTTP request sent, awaiting response... 200 OK\n",
-      "Length: 5080260 (4.8M) [text/plain]\n",
-      "Saving to: ‘bbc-news-data.csv’\n",
-      "\n",
-      "bbc-news-data.csv   100%[===================>]   4.84M  --.-KB/s    in 0.09s   \n",
-      "\n",
-      "2024-09-06 09:41:05 (56.4 MB/s) - ‘bbc-news-data.csv’ saved [5080260/5080260]\n",
-      "\n"
-     ]
+     "data": {
+      "text/plain": [
+       "('bbc-news-data.csv', <http.client.HTTPMessage at 0x103bd8260>)"
+      ]
+     },
+     "execution_count": 2,
+     "metadata": {},
+     "output_type": "execute_result"
    }
   ],
   "source": [
-   "!wget https://raw.githubusercontent.com/amankharwal/Website-data/master/bbc-news-data.csv"
+   "import urllib.request\n",
+   "urllib.request.urlretrieve('https://raw.githubusercontent.com/amankharwal/Website-data/master/bbc-news-data.csv', 'bbc-news-data.csv')"
   ]
  },
  {
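
The notebook changes above line up with the workflow's conversion step: nbconvert's Python exporter turns `!`-style shell escapes into `get_ipython()` calls, which fail when the exported script is executed with plain `python`, whereas `%%bash` cells match the `--RegexRemovePreprocessor.patterns '%%bash'` flag and are dropped from the export entirely. Roughly (the exported forms below are illustrative, not taken from this PR):

```python
# Illustrative only -- approximate nbconvert output, not part of this PR.
#
# A "!wget ..." line exports to something like:
#     get_ipython().system('wget https://example.com/data.csv')
# which raises NameError under plain `python`, since get_ipython() only
# exists inside an IPython session.
#
# A %%bash cell is removed entirely by the RegexRemovePreprocessor pattern,
# while a pure-Python download survives the export unchanged:
import urllib.request

urllib.request.urlretrieve(
    "https://raw.githubusercontent.com/amankharwal/Website-data/master/bbc-news-data.csv",
    "bbc-news-data.csv",
)
```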
1,747 changes: 875 additions & 872 deletions notebooks/chat_with_SQL_3_ways.ipynb

Large diffs are not rendered by default.

53 changes: 14 additions & 39 deletions notebooks/llama3_rag.ipynb
@@ -309,26 +309,21 @@
   },
   "outputs": [],
   "source": [
-   "from haystack.components.builders import PromptBuilder\n",
-   "\n",
-   "prompt_template = \"\"\"\n",
-   "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n",
-   "\n",
+   "from haystack.components.builders import ChatPromptBuilder\n",
+   "from haystack.dataclasses import ChatMessage\n",
+   "\n",
+   "template = [ChatMessage.from_user(\"\"\"\n",
    "Using the information contained in the context, give a comprehensive answer to the question.\n",
    "If the answer cannot be deduced from the context, do not give an answer.\n",
    "\n",
    "Context:\n",
    "  {% for doc in documents %}\n",
    "  {{ doc.content }} URL:{{ doc.meta['url'] }}\n",
    "  {% endfor %};\n",
-   "  Question: {{query}}<|eot_id|>\n",
-   "\n",
-   "<|start_header_id|>assistant<|end_header_id|>\n",
+   "  Question: {{query}}\n",
    "\n",
    "\n",
-   "\"\"\"\n",
-   "prompt_builder = PromptBuilder(template=prompt_template)"
+   "\"\"\")]\n",
+   "prompt_builder = ChatPromptBuilder(template=template)"
   ]
  },
  {
@@ -337,7 +332,7 @@
   "id": "pbvNtRzxPSOe"
  },
  "source": [
-  "Here, we use the [`HuggingFaceLocalGenerator`](https://docs.haystack.deepset.ai/docs/huggingfacelocalgenerator), loading the model in Colab with 4-bit quantization."
+  "Here, we use the [`HuggingFaceLocalChatGenerator`](https://docs.haystack.deepset.ai/docs/huggingfacelocalchatgenerator), loading the model in Colab with 4-bit quantization."
  ]
 },
 {
@@ -664,9 +659,9 @@
  ],
  "source": [
   "import torch\n",
-  "from haystack.components.generators import HuggingFaceLocalGenerator\n",
+  "from haystack.components.generators.chat import HuggingFaceLocalChatGenerator\n",
   "\n",
-  "generator = HuggingFaceLocalGenerator(\n",
+  "generator = HuggingFaceLocalChatGenerator(\n",
   "    model=\"meta-llama/Meta-Llama-3.1-8B-Instruct\",\n",
   "    huggingface_pipeline_kwargs={\"device_map\":\"auto\",\n",
   "                                 \"model_kwargs\":{\"load_in_4bit\":True,\n",
@@ -688,27 +683,7 @@
   "id": "lx6PNcm-I1zF",
   "outputId": "363cf752-2b84-48d2-b8bc-d9f83542ff96"
  },
- "outputs": [
-  {
-   "data": {
-    "text/plain": [
-     "<haystack.core.pipeline.pipeline.Pipeline object at 0x7fcda58f5300>\n",
-     "🚅 Components\n",
-     "  - text_embedder: SentenceTransformersTextEmbedder\n",
-     "  - retriever: InMemoryEmbeddingRetriever\n",
-     "  - prompt_builder: PromptBuilder\n",
-     "  - generator: HuggingFaceLocalGenerator\n",
-     "🛤️ Connections\n",
-     "  - text_embedder.embedding -> retriever.query_embedding (List[float])\n",
-     "  - retriever.documents -> prompt_builder.documents (List[Document])\n",
-     "  - prompt_builder.prompt -> generator.prompt (str)"
-    ]
-   },
-   "execution_count": 10,
-   "metadata": {},
-   "output_type": "execute_result"
-  }
- ],
+ "outputs": [],
  "source": [
   "from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever\n",
   "\n",
@@ -722,7 +697,7 @@
   "    prefix=\"Represent this sentence for searching relevant passages: \",  # as explained in the model card (https://huggingface.co/Snowflake/snowflake-arctic-embed-l#using-huggingface-transformers), queries should be prefixed\n",
   "    ))\n",
   "query_pipeline.add_component(\"retriever\", InMemoryEmbeddingRetriever(document_store=document_store, top_k=5))\n",
-  "query_pipeline.add_component(\"prompt_builder\", PromptBuilder(template=prompt_template))\n",
+  "query_pipeline.add_component(\"prompt_builder\", ChatPromptBuilder(template=template))\n",
   "query_pipeline.add_component(\"generator\", generator)\n",
   "\n",
   "# connect the components\n",
@@ -756,7 +731,7 @@
   "        }\n",
   "    )\n",
   "\n",
-  "    answer = results[\"generator\"][\"replies\"][0]\n",
+  "    answer = results[\"generator\"][\"replies\"][0].text\n",
   "    rich.print(answer)"
  ]
 },
@@ -1148,8 +1123,8 @@
   "\n",
   "To use Llama 3 models in Haystack, you also have **other options**:\n",
   "- [LlamaCppGenerator](https://docs.haystack.deepset.ai/docs/llamacppgenerator) and [OllamaGenerator](https://docs.haystack.deepset.ai/docs/ollamagenerator): using the GGUF quantized format, these solutions are ideal to run LLMs on standard machines (even without GPUs).\n",
-  "- [HuggingFaceAPIGenerator](https://docs.haystack.deepset.ai/docs/huggingfaceapigenerator), which allows you to query a local TGI container or a (paid) HF Inference Endpoint. TGI is a toolkit for efficiently deploying and serving LLMs in production.\n",
-  "- [vLLM via OpenAIGenerator](https://haystack.deepset.ai/integrations/vllm): high-throughput and memory-efficient inference and serving engine for LLMs.\n",
+  "- [HuggingFaceAPIChatGenerator](https://docs.haystack.deepset.ai/docs/huggingfaceapichatgenerator), which allows you to query the Hugging Face API, a local TGI container or a (paid) HF Inference Endpoint. TGI is a toolkit for efficiently deploying and serving LLMs in production.\n",
+  "- [vLLM via OpenAIChatGenerator](https://haystack.deepset.ai/integrations/vllm): high-throughput and memory-efficient inference and serving engine for LLMs.\n",
   "\n"
  ]
 },
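
Taken together, the llama3_rag changes migrate the notebook from the string-based `PromptBuilder`/`HuggingFaceLocalGenerator` pair to their chat counterparts. A condensed sketch of the migrated wiring, assuming the chat generator's input socket is `messages` and that replies come back as `ChatMessage` objects (hence `.text`):

```python
# Condensed sketch of the chat-API wiring after this PR -- names follow the
# notebook, but this standalone version (no retriever/embedder) is
# illustrative, and the socket names are assumptions about the Haystack 2.x
# chat components rather than taken from the diff.
from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import HuggingFaceLocalChatGenerator
from haystack.dataclasses import ChatMessage

template = [ChatMessage.from_user(
    "Context:\n{% for doc in documents %}{{ doc.content }}\n{% endfor %}"
    "Question: {{query}}"
)]

pipeline = Pipeline()
pipeline.add_component("prompt_builder", ChatPromptBuilder(template=template))
pipeline.add_component("generator", HuggingFaceLocalChatGenerator(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct"))

# ChatPromptBuilder outputs List[ChatMessage] on "prompt"; chat generators
# consume it on "messages" (the old pair connected prompt -> prompt as str).
pipeline.connect("prompt_builder.prompt", "generator.messages")

results = pipeline.run({"prompt_builder": {"query": "What is Haystack?", "documents": []}})
answer = results["generator"]["replies"][0].text  # replies are ChatMessage objects, not str
```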