
added context.dev web tools #244

Open
nikhilrado wants to merge 1 commit into patchy631:main from nikhilrado:main

Conversation

@nikhilrado

@nikhilrado nikhilrado commented Jun 11, 2026


This change adds Context.dev support. Users can try the web extraction endpoints in a Jupyter notebook to get markdown or crawl websites for information.

Summary by CodeRabbit

  • New Features

    • Added a new project for converting webpages into LLM-ready Markdown or structured JSON format.
  • Documentation

    • Added comprehensive setup instructions and usage guide for the new webpage conversion project.
    • Included an interactive example demonstrating single webpage conversion, site crawling with pagination, and structured data extraction capabilities.

@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Contributor

Walkthrough

A new tutorial project is added to the repository that demonstrates converting webpages to markdown and structured JSON using the context.dev Python SDK. The main README is updated with a registry entry, a project-level README provides setup and usage instructions, and a complete Jupyter notebook contains three runnable examples.

Changes

context.dev Web Scraping & JSON Extraction Tutorial

| Layer | File(s) | Summary |
| --- | --- | --- |
| Registry entry in main README | `README.md` | Main README updated with a new "Webpage to Markdown & JSON with context.dev" entry under Beginner Projects → Other Tools. |
| Project documentation and setup instructions | `context-dev-website-to-md-and-json/README.md` | Project README describes converting webpages to LLM-ready markdown or JSON, includes Python version requirements, API key setup in the notebook, dependency installation, and instructions to run the notebook. |
| Notebook implementation with three API examples | `context-dev-website-to-md-and-json/notebook.ipynb` | Complete Jupyter notebook with package installation, API key initialization, and three examples: single-page markdown scraping with `web.web_scrape_md`, site crawling with `web.web_crawl_md` including pagination/depth controls, and structured JSON extraction using a pricing schema with `web.extract`, plus guidance for schema customization. |
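
For readers unfamiliar with schema-driven extraction, the sketch below shows what a pricing schema of the kind mentioned above might look like as a plain JSON Schema dictionary. The field names (`plans`, `price_per_month`, and so on) are illustrative assumptions and are not taken from the notebook; the `web.extract` call itself is left out because its signature is not shown in this thread.

```python
import json

# Hypothetical pricing schema. Every field name here is an illustrative
# assumption, not the schema used in the PR's notebook.
pricing_schema = {
    "type": "object",
    "properties": {
        "plans": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price_per_month": {"type": "number"},
                    "currency": {"type": "string"},
                    "features": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["name", "price_per_month"],
            },
        }
    },
    "required": ["plans"],
}

# Serialize the schema so it can be inspected or passed to an extraction API.
schema_json = json.dumps(pricing_schema, indent=2)
print(schema_json)
```

Customizing the extraction would then mostly mean editing the `properties` and `required` entries to match the data of interest.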

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • patchy631/ai-engineering-hub#201: Both PRs modify the root README.md's "Projects by Difficulty/Other Tools" section by adding project entries to the tutorial taxonomy.

Poem

🐰 A new tutorial hops into view,
With context.dev doing what it does best to do—
Scraping webpages clean and neat,
JSON schemas making extraction complete,
Three examples to show the way, hop hop! 🌐

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'added context.dev web tools' accurately reflects the main change: adding a new Context.dev project with web extraction tools (webpage-to-markdown and structured JSON conversion) to the repository. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Ruff (0.15.15)
context-dev-website-to-md-and-json/notebook.ipynb

Unexpected end of JSON input
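
The Ruff error above suggests the tool could not parse the notebook as JSON; whether the committed file is actually truncated is not confirmed in this thread. Since `.ipynb` files are JSON documents with a top-level `cells` list, a minimal sketch of a local sanity check (the helper name `check_notebook_json` is hypothetical) could look like this:

```python
import json

def check_notebook_json(raw_text):
    """Return (ok, message) describing whether raw_text looks like a notebook."""
    try:
        nb = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        # Truncated or malformed files end up here.
        return False, f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
    # A notebook is a JSON object with a top-level list of cells.
    if not isinstance(nb, dict) or not isinstance(nb.get("cells"), list):
        return False, "valid JSON, but no top-level 'cells' list"
    return True, f"{len(nb['cells'])} cells"
```

To run it against the file in this PR, read `context-dev-website-to-md-and-json/notebook.ipynb` as UTF-8 text and pass the contents to the function.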



Warning

⚠️ This pull request might be slop. It has been flagged by CodeRabbit slop detection and should be reviewed carefully.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@context-dev-website-to-md-and-json/README.md`:
- Line 15: Replace the insecure “paste API key into notebook” workflow with
environment-variable-based auth. Update the README.md text to instruct users to
export their context.dev API key into an env var (e.g., CONTEXT_DEV_API_KEY)
before running the notebook, and modify notebook.ipynb to read the key via
os.environ.get('CONTEXT_DEV_API_KEY') (or via getpass for an interactive
prompt), raising a clear error if the variable is missing. Remove any
hardcoded key cells or replace them with the env-var retrieval, and make sure
every example references the same CONTEXT_DEV_API_KEY variable name so the docs
and notebook stay aligned.

In `@README.md`:
- Line 95: Update the README metadata that lists beginner project counts to
reflect the newly added entry: locate the "Beginner Projects (22)" header/TOC
entry and any other occurrences of that count text in README.md and increment it
to "Beginner Projects (23)" (or adjust to the correct total if there are
multiple new/removed entries), ensuring all instances match the new list
including the newly added "[Webpage to Markdown & JSON with context.dev]" link.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 44fe7d8b-3bdd-4ff0-ae95-a3fb3c1a6351

📥 Commits

Reviewing files that changed from the base of the PR and between cfd74dc and 0457e34.

📒 Files selected for processing (3)
  • README.md
  • context-dev-website-to-md-and-json/README.md
  • context-dev-website-to-md-and-json/notebook.ipynb

**Get a context.dev API key**:

- Go to [context.dev](https://context.dev) and sign up for an account.
- Paste your API key into the first code cell of `notebook.ipynb`.


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Root cause: the tutorial currently standardizes an insecure API-key workflow.

Both docs and notebook teach storing the key directly in notebook content. Switch the tutorial contract to environment-variable-based auth and keep both files aligned to that single secure pattern.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@context-dev-website-to-md-and-json/README.md` at line 15, replace the
insecure “paste API key into notebook” workflow with environment-variable-based
auth. Update the README.md text to instruct users to export their context.dev
API key into an env var (e.g., CONTEXT_DEV_API_KEY) before running the
notebook, and modify notebook.ipynb to read the key via
os.environ.get('CONTEXT_DEV_API_KEY') (or via getpass for an interactive
prompt), raising a clear error if the variable is missing. Remove any hardcoded
key cells or replace them with the env-var retrieval, and make sure every
example references the same CONTEXT_DEV_API_KEY variable name so the docs and
notebook stay aligned.
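
A minimal sketch of the suggested pattern, using only the standard library. The variable name `CONTEXT_DEV_API_KEY` and the helper `load_api_key` are assumptions made for illustration; the name the context.dev SDK actually expects, if any, is not stated in this thread.

```python
import os
from getpass import getpass

def load_api_key(env_var="CONTEXT_DEV_API_KEY", interactive=False):
    """Read the API key from the environment, optionally prompting for it.

    The default env_var name is a hypothetical choice for this tutorial.
    """
    key = os.environ.get(env_var, "").strip()
    if not key and interactive:
        # getpass hides the input, so the key never lands in notebook output.
        key = getpass(f"Enter value for {env_var}: ").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it before starting Jupyter."
        )
    return key
```

In the notebook's first code cell this would replace the hardcoded key, for example `api_key = load_api_key(interactive=True)`, so the secret is never saved with the notebook.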

Comment thread README.md
- [**Video RAG with Gemini**](./video-rag-gemini) - Chat with videos using Gemini AI

#### Other Tools
- [**Webpage to Markdown & JSON with context.dev**](./context-dev-website-to-md-and-json) - Notebook tutorial for clean Markdown and structured JSON extraction


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update beginner project count metadata to match the new entry.

Line 95 adds a beginner project, but the displayed beginner count (Line 34: Beginner Projects (22)) now appears stale. Please update the count(s) in the README headers/TOC so registry metadata stays accurate.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@README.md` at line 95, Update the README metadata that lists beginner project
counts to reflect the newly added entry: locate the "Beginner Projects (22)"
header/TOC entry and any other occurrences of that count text in README.md and
increment it to "Beginner Projects (23)" (or adjust to the correct total if
there are multiple new/removed entries), ensuring all instances match the new
list including the newly added "[Webpage to Markdown & JSON with context.dev]"
link.
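
One way to keep such counts from drifting is to compare the declared number with the entries actually listed. The sketch below assumes a README layout that is not confirmed by this thread: a heading containing the section title, entries written as `- [**Name**](./path)` bullets, and the section ending at the next heading of the same or higher level. Both helper names are hypothetical.

```python
import re

# Assumed entry style: "- [**Name**](./path) - description"
ENTRY = re.compile(r"^\s*[-*] \[\*\*.+?\*\*\]\(.+?\)")
HEADING = re.compile(r"^(#+)\s+(.*)$")

def count_section_entries(readme_text, section_title):
    """Count bullet entries under the heading that contains section_title."""
    level = None
    count = 0
    for line in readme_text.splitlines():
        m = HEADING.match(line)
        if m:
            depth = len(m.group(1))
            if level is None:
                if section_title in m.group(2):
                    level = depth  # entered the target section
            elif depth <= level:
                break  # next sibling or parent heading ends the section
            continue
        if level is not None and ENTRY.match(line):
            count += 1
    return count

def declared_counts(readme_text, section_title):
    """Return every 'Section Title (N)' number found, including TOC entries."""
    pattern = re.compile(re.escape(section_title) + r"\s*\((\d+)\)")
    return [int(n) for n in pattern.findall(readme_text)]

sample = """# Hub
- [Beginner Projects (2)](#beginner)
## Beginner Projects (2)
#### RAG
- [**A**](./a) - x
#### Other Tools
- [**B**](./b) - y
- [**C**](./c) - z
## Intermediate Projects (1)
- [**D**](./d) - w
"""

actual = count_section_entries(sample, "Beginner Projects")
declared = declared_counts(sample, "Beginner Projects")
stale = [n for n in declared if n != actual]
print(actual, declared, stale)
```

In the sample, three entries are listed while both the TOC line and the heading still declare two, which is the kind of mismatch flagged above.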


❤️


2 participants