Apify Web Crawler & Instagram Vibe Checker: Update notebook name and generators #236

Open
wants to merge 2 commits into
base: main
2 changes: 1 addition & 1 deletion index.toml
@@ -14,7 +14,7 @@ notebook = "amazon_sagemaker_and_chroma_for_qa.ipynb"
topics = ["RAG"]

[[cookbook]]
title = "RAG: Extract and use website content for question answering with Apify-Haystack integration"
title = "Crawl Website Content for Question Answering with Apify"
notebook = "apify_haystack_rag.ipynb"
topics = ["RAG", "Web-QA"]

84 changes: 45 additions & 39 deletions notebooks/apify_haystack_instagram_comments_analysis.ipynb
@@ -30,16 +30,12 @@
"cell_type": "code",
@anakin87 (Member), Jul 9, 2025

I would keep the output


@anakin87 (Member), Jul 9, 2025

Line #5.    res.get("llm", {}).get("replies", ["No response"])[0]

let's use .text
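
For context on the suggestion above, here is a minimal sketch of reading the reply text safely. It assumes, as the comment implies, that chat-generator replies are objects exposing a `.text` attribute rather than plain strings. `StubMessage` and `first_reply_text` are hypothetical names used only for illustration; they are not part of Haystack or of this PR, and the stub lets the snippet run without any API call.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class StubMessage:
    """Hypothetical stand-in for a chat reply object that exposes `.text`."""
    text: Optional[str]


def first_reply_text(
    result: Mapping[str, Any],
    component: str = "llm",
    default: str = "No response",
) -> str:
    """Return the text of the first reply, or `default` if none is available."""
    replies = result.get(component, {}).get("replies") or []
    if not replies:
        return default
    first = replies[0]
    # Chat-style replies carry the string in `.text`; plain strings pass through unchanged.
    text = getattr(first, "text", first)
    return text if isinstance(text, str) else default


# Assumed result shape, mirroring the `res` dictionary used in the notebook.
res = {"llm": {"replies": [StubMessage(text="Positive vibes overall.")]}}
print(first_reply_text(res))
print(first_reply_text({}))
```

One design note: a string fallback such as `["No response"]` has no `.text` attribute, so the helper handles the empty case before any attribute access instead of chaining `.text` onto the default.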


"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"collapsed": true,
"id": "r5AJeMOE1Cou",
"outputId": "63663073-ccc5-4306-ae18-e2720d937407"
"id": "r5AJeMOE1Cou"
},
"outputs": [],
"source": [
"!pip install apify-haystack==0.1.4 haystack-ai"
"!pip install -q apify-haystack"
]
},
{
@@ -63,7 +59,7 @@
"base_uri": "https://localhost:8080/"
},
"id": "yiUTwYzP36Yr",
"outputId": "d79acadc-bd18-44d3-c812-9b40c51d5124"
"outputId": "4a34577d-f3f1-4508-80f5-d367683ad017"
},
"outputs": [
{
@@ -189,34 +185,35 @@
"base_uri": "https://localhost:8080/"
},
"id": "gdN7baGrA_lR",
"outputId": "b73b1217-3082-4da7-c824-b8671eeef78d"
"outputId": "fdbcd795-8bbe-4652-922b-f74d47fdb115"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<haystack.core.pipeline.pipeline.Pipeline object at 0x7b45ef117be0>\n",
"<haystack.core.pipeline.pipeline.Pipeline object at 0x7ff4771fb5d0>\n",
"🚅 Components\n",
" - loader: ApifyDatasetFromActorCall\n",
" - cleaner: DocumentCleaner\n",
" - prompt_builder: PromptBuilder\n",
" - llm: OpenAIGenerator\n",
" - prompt_builder: ChatPromptBuilder\n",
" - llm: OpenAIChatGenerator\n",
"🛤️ Connections\n",
" - loader.documents -> cleaner.documents (list[Document])\n",
" - cleaner.documents -> prompt_builder.documents (List[Document])\n",
" - prompt_builder.prompt -> llm.prompt (str)"
" - prompt_builder.prompt -> llm.messages (List[ChatMessage])"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
"execution_count": 5
}
],
"source": [
"from haystack import Pipeline\n",
"from haystack.components.builders import PromptBuilder\n",
"from haystack.components.generators import OpenAIGenerator\n",
"from haystack.components.builders import ChatPromptBuilder\n",
"from haystack.components.generators.chat import OpenAIChatGenerator\n",
"from haystack.components.preprocessors import DocumentCleaner\n",
"from haystack.dataclasses import ChatMessage\n",
"\n",
"prompt = \"\"\"\n",
"Analyze these Instagram comments to determine if the post is generating positive energy, excitement,\n",
@@ -232,8 +229,8 @@
"\"\"\"\n",
"\n",
"cleaner = DocumentCleaner(remove_empty_lines=True, remove_extra_whitespaces=True, remove_repeated_substrings=True)\n",
"prompt_builder = PromptBuilder(template=prompt)\n",
"generator = OpenAIGenerator(model=\"gpt-4o-mini\")\n",
"prompt_builder = ChatPromptBuilder(template=[ChatMessage.from_user(prompt)], required_variables=\"*\")\n",
"generator = OpenAIChatGenerator(model=\"gpt-4o-mini\")\n",
"\n",
"\n",
"pipe = Pipeline()\n",
@@ -257,37 +254,46 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": null,
"metadata": {
"id": "qfaWI6BaAko9"
},
"outputs": [],
"source": [
"# \\@tiffintech on How to easily keep up with tech?\n",
"url = \"https://www.instagram.com/p/C_a9jcRuJZZ/\"\n",
"\n",
"res = pipe.run({\"loader\": {\"run_input\": {\"directUrls\": [url]}}})"
]
},
{
"cell_type": "code",
"source": [
"res.get(\"llm\", {}).get(\"replies\", [\"No response\"])[0].text"
],
"metadata": {
"id": "SZYtTd7TImGM",
"outputId": "07312231-505c-400f-e58e-a3fc2ba5f6b3",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 72
},
"id": "qfaWI6BaAko9",
"outputId": "25e33c1b-f8b9-4b6d-a3d9-0eb54365b820"
"height": 140
}
},
"execution_count": 8,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"'The Instagram comments exhibit a strongly positive sentiment and emotional tone, indicating that the post is indeed generating high energy and excitement. The use of enthusiastic emojis (e.g., 🫶, 🔥, 😁) and expressions of gratitude (e.g., \"Thank you!\" and \"great resource!\") suggest that the audience is actively engaging with the content in an uplifting manner.\\n\\nKey engagement patterns include:\\n\\n- **Expressions of Appreciation**: Multiple comments acknowledge the resource as beneficial, with users expressing gratitude and sharing their positive experiences.\\n- **Community Interaction**: Users mention and tag others (@tiffintech, @dailydotdev), indicating a vibrant community dynamic and a willingness to share the content with friends or followers.\\n- **Curiosity and Requests**: Questions about additional content (e.g., \"Link please?\" and \"would love a breakdown\") show that the audience is highly engaged and eager for more information.\\n\\nOverall, the comments reflect a post that is \"vibrating\" with high energy, as evidenced by the positive feedback, community connections, and engaged responses.'"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "string"
},
"text/plain": [
"'Overall, the Instagram comments on the post reflect positive energy, excitement, and high engagement. The use of emojis such as 😂, 😍, 🙌, ❤️, and 🔥 indicate enthusiasm and excitement. Many comments express gratitude, appreciation, and eagerness to explore the resources mentioned in the post. There are also interactions between users tagging each other and discussing their interest in the topic, further increasing engagement. Overall, the post seems to be generating high energy and positive vibes from the audience.'"
]
}
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
"execution_count": 8
}
],
"source": [
"# \\@tiffintech on How to easily keep up with tech?\n",
"url = \"https://www.instagram.com/p/C_a9jcRuJZZ/\"\n",
"\n",
"res = pipe.run({\"loader\": {\"run_input\": {\"directUrls\": [url]}}})\n",
"res.get(\"llm\", {}).get(\"replies\", [\"No response\"])[0]\n",
"\n"
]
},
{
@@ -301,7 +307,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
@@ -365,4 +371,4 @@
},
"nbformat": 4,
"nbformat_minor": 0
}
}