Skip to content

Conversation

@VinciGit00
Copy link
Contributor

@VinciGit00 VinciGit00 commented Sep 29, 2025

Note

Refactors the Scrapegraph tool into a multi-method ScrapeGraph scraper with enum-driven methods, richer schemas/validation, async crawl handling, expanded docs, tests, and dependency bump.

  • ScrapeGraph Tool:
    • Multi-method support: Adds ScrapeMethod enum with smartscraper, searchscraper, agenticscraper, crawl, scrape, markdownify.
    • Schemas & validation: Introduces FixedScrapegraphScrapeToolSchema and extends ScrapegraphScrapeToolSchema with fields/validators (num_results, depth, timeout).
    • Execution logic: Routes calls per method; supports headers, JS rendering, steps/session/AI extraction, crawl depth/paging/domain/cache, schemas; per-method response parsing; async crawl polling; improved error handling.
    • Metadata: Renames tool to "ScrapeGraph AI Multi-Method Scraper" with updated description.
  • Packaging:
    • Exports new symbols in tools/__init__.py and scrapegraph_scrape_tool/__init__.py.
  • Documentation:
    • Overhauls README.md with features, setup, advanced examples, error handling, and API reference.
  • Dependencies:
    • Bumps scrapegraph-py to >=1.31.0.
  • Tests:
    • Adds comprehensive tests for schema validation, each method, error paths, and async crawl behavior in tests/tools/test_scrapegraph_scrape_tool.py.

Written by Cursor Bugbot for commit e96d426. This will update automatically on new commits. Configure here.

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the final PR Bugbot will review for you during this billing cycle

Your free Bugbot reviews will reset on October 28

Details

Your team is on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle for each member of your team.

To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

self._validate_url(website_url)
self.website_url = website_url
self.description = f"A tool that uses Scrapegraph AI to intelligently scrape {website_url}'s content."
self.description = f"A tool that uses ScrapeGraph AI to scrape {website_url}'s content using {method.value} method."
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Constructor Parameter Ignored in Tool Method

The method parameter in the ScrapegraphScrapeTool constructor is used to set the tool's description but isn't stored as an instance variable. The _run method then determines the actual scraping method from its kwargs or defaults to SMARTSCRAPER, ignoring the method specified during initialization. This can lead to the tool's description misrepresenting the method actually used.

Fix in Cursor Fix in Web

status = result.get("status")

if status == "success" and result.get("result"):
return result["result"].get("llm_result", result["result"])
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Async Crawl Fails on Non-Dictionary Result

The _handle_async_crawl method attempts to call .get("llm_result") on result["result"], assuming it's always a dictionary. If the API returns result["result"] as a non-dictionary type (like a string or list), this will cause an AttributeError.

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant