patchy631 · nikhilrado · Jun 11, 2026 · coderabbitai · Jun 11, 2026 · coderabbitai
diff --git a/README.md b/README.md
@@ -92,6 +92,7 @@ Perfect for getting started with AI engineering. These projects focus on single
 - [**Video RAG with Gemini**](./video-rag-gemini) - Chat with videos using Gemini AI
 
 #### Other Tools
+- [**Webpage to Markdown & JSON with context.dev**](./context-dev-website-to-md-and-json) - Notebook tutorial for clean Markdown and structured JSON extraction
 - [**Website to API with FireCrawl**](./Website-to-API-with-FireCrawl) - Convert websites to APIs
 - [**AI News Generator**](./ai_news_generator) - News generation with CrewAI and Cohere
 - [**Siamese Network**](./siamese-network) - Digit similarity detection on MNIST

diff --git a/context-dev-website-to-md-and-json/README.md b/context-dev-website-to-md-and-json/README.md
@@ -0,0 +1,45 @@
+# Convert any webpage to Clean Markdown or Structured JSON with context.dev
+
+This project lets you convert any webpage into **LLM-ready Markdown** or **structured JSON** using [context.dev](https://context.dev).
+
+- [context.dev](https://context.dev) is used to scrape, crawl, and extract structured data from websites.
+- A Jupyter notebook walks through each API step by step.
+
+---
+
+## Setup and Installation
+
+**Get a context.dev API key**:
+
+- Go to [context.dev](https://context.dev) and sign up for an account.
+- Paste your API key into the first code cell of `notebook.ipynb`.
+
+**Install dependencies**:
+
+Ensure you have Python 3.11 or later installed.
+
+```bash
+pip install context.dev jupyter
+```
+
+---
+
+## Run the project
+
+```bash
+jupyter notebook notebook.ipynb
+```
+
+---
+
+## 📬 Stay Updated with Our Newsletter!
+
+**Get a FREE Data Science eBook** 📖 with 150+ essential lessons in Data Science when you subscribe to our newsletter! Stay in the loop with the latest tutorials, insights, and exclusive resources. [Subscribe now!](https://join.dailydoseofds.com)
+
+[![Daily Dose of Data Science Newsletter](https://github.com/patchy631/ai-engineering/blob/main/resources/join_ddods.png)](https://join.dailydoseofds.com)
+
+---
+
+## Contribution
+
+Contributions are welcome! Please fork the repository and submit a pull request with your improvements.
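
Review note on the API key step in the README above: pasting the key into a notebook cell makes it easy to commit by accident. A minimal sketch of one alternative, assuming the key is exported as an environment variable named `CONTEXT_DEV_API_KEY`. The variable name and the `load_api_key` helper are illustrative and are not part of the context.dev SDK.

```python
import os
from getpass import getpass


def load_api_key(env_var: str = "CONTEXT_DEV_API_KEY", prompt=getpass) -> str:
    """Return the API key from the environment, prompting only if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fall back to a hidden interactive prompt so the key is never
        # written into the notebook file itself.
        key = prompt(f"Enter {env_var}: ").strip()
    if not key:
        raise ValueError(f"{env_var} is empty")
    return key
```

In the first code cell this could replace the hardcoded string with `CONTEXT_DEV_API_KEY = load_api_key()`, leaving the rest of the notebook unchanged.
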
diff --git a/context-dev-website-to-md-and-json/notebook.ipynb b/context-dev-website-to-md-and-json/notebook.ipynb
@@ -0,0 +1,185 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "a1b2c3d4",
+   "metadata": {},
+   "source": [
+    "# Convert any webpage to Clean Markdown or Structured JSON with context.dev\n",
+    "\n",
+    "This notebook walks through the three core context.dev web APIs step by step.\n",
+    "\n",
+    "**Prerequisites:** Get an API key at [context.dev](https://context.dev) and set it in the cell below."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b2c3d4e5",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install -U context.dev\n",
+    "\n",
+    "CONTEXT_DEV_API_KEY = \"your_api_key_here\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c3d4e5f6",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import json\n",
+    "\n",
+    "from context.dev import ContextDev\n",
+    "\n",
+    "client = ContextDev(api_key=CONTEXT_DEV_API_KEY)\n",
+    "URL = \"https://stripe.com/pricing\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d4e5f6a7",
+   "metadata": {},
+   "source": [
+    "## 1. Single page → Clean Markdown\n",
+    "\n",
+    "`web.web_scrape_md` converts a URL into GitHub Flavored Markdown and can optionally keep only the page's main content."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e5f6a7b8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "scrape = client.web.web_scrape_md(\n",
+    "    url=URL,\n",
+    "    use_main_content_only=True,\n",
+    "    include_links=True,\n",
+    ")\n",
+    "\n",
+    "print(f\"URL: {scrape.url}\")\n",
+    "print(f\"Credits used: {scrape.key_metadata.credits_consumed}\")\n",
+    "print(\"\\n--- Preview ---\\n\")\n",
+    "print(scrape.markdown[:2000])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f6a7b8c9",
+   "metadata": {},
+   "source": [
+    "## 2. Crawl a site → Markdown per page\n",
+    "\n",
+    "`web.web_crawl_md` follows internal links and returns Markdown for each page."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a7b8c9d0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "crawl = client.web.web_crawl_md(\n",
+    "    url=\"https://docs.stripe.com\",\n",
+    "    max_pages=3,\n",
+    "    max_depth=1,\n",
+    "    use_main_content_only=True,\n",
+    ")\n",
+    "\n",
+    "print(crawl.metadata)\n",
+    "for page in crawl.results:\n",
+    "    status = \"ok\" if page.metadata.success else \"failed\"\n",
+    "    print(f\"\\n[{status}] {page.metadata.title} — {page.metadata.url}\")\n",
+    "    print(page.markdown[:500])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b8c9d0e1",
+   "metadata": {},
+   "source": [
+    "## 3. Extract structured JSON with a schema\n",
+    "\n",
+    "`web.extract` crawls relevant pages and returns typed JSON matching your JSON Schema."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c9d0e1f2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pricing_schema = {\n",
+    "    \"type\": \"object\",\n",
+    "    \"properties\": {\n",
+    "        \"currency\": {\"type\": \"string\"},\n",
+    "        \"plans\": {\n",
+    "            \"type\": \"array\",\n",
+    "            \"items\": {\n",
+    "                \"type\": \"object\",\n",
+    "                \"properties\": {\n",
+    "                    \"name\": {\"type\": \"string\"},\n",
+    "                    \"price\": {\"type\": \"string\"},\n",
+    "                    \"billing_period\": {\"type\": \"string\"},\n",
+    "                    \"features\": {\"type\": \"array\", \"items\": {\"type\": \"string\"}},\n",
+    "                },\n",
+    "                \"required\": [\"name\"],\n",
+    "                \"additionalProperties\": False,\n",
+    "            },\n",
+    "        },\n",
+    "    },\n",
+    "    \"required\": [\"plans\"],\n",
+    "    \"additionalProperties\": False,\n",
+    "}\n",
+    "\n",
+    "extracted = client.web.extract(\n",
+    "    url=URL,\n",
+    "    schema=pricing_schema,\n",
+    "    instructions=\"Prioritize the pricing page. Include currency when visible.\",\n",
+    "    max_pages=5,\n",
+    ")\n",
+    "\n",
+    "print(\"Pages analyzed:\", extracted.urls_analyzed)\n",
+    "print(json.dumps(extracted.data, indent=2))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d0e1f2a3",
+   "metadata": {},
+   "source": [
+    "## Customize the schema\n",
+    "\n",
+    "Change `pricing_schema` above to extract whatever you need, such as company info, job postings, or article metadata. The response from context.dev is JSON matching your schema."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.0"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
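
Review note on the "Customize the schema" cell: a sketch of what a swapped-in schema might look like for job postings, following the same shape as `pricing_schema`. The field names and the `missing_required` helper are hypothetical. The helper is a small stdlib-only check of top-level `required` keys, not a full JSON Schema validator and not something the SDK provides.

```python
# Hypothetical schema for job postings, mirroring the structure of pricing_schema.
job_postings_schema = {
    "type": "object",
    "properties": {
        "company": {"type": "string"},
        "jobs": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "location": {"type": "string"},
                    "url": {"type": "string"},
                },
                "required": ["title"],
                "additionalProperties": False,
            },
        },
    },
    "required": ["jobs"],
    "additionalProperties": False,
}


def missing_required(data: dict, schema: dict) -> list[str]:
    """List top-level keys the schema requires that are absent from data."""
    return [key for key in schema.get("required", []) if key not in data]


# Hand-written sample standing in for an extraction result (not real API output).
sample = {"company": "Example Co", "jobs": [{"title": "Data Engineer"}]}
print(missing_required(sample, job_postings_schema))  # prints []
```

The schema would be passed as `schema=job_postings_schema` in the same `client.web.extract` call the notebook already makes, with `url` and `instructions` adjusted to match the target site.
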