Description
We have code that looks like this:
```python
scrapfly = ScrapflyClient(key=self.__scrapfly_api_key, max_concurrency=15)
targets = [
    ScrapeConfig(
        url=url,
        render_js=True,
        raise_on_upstream_error=False,
        country="us",
        asp=True,
    )
    for url in urls
]
async for result in scrapfly.concurrent_scrape(scrape_configs=targets):
    self.__logger.info(f"Got result: {result}")  # when this code explodes, no log appears
    if isinstance(result, ScrapflyError):  # error from scrapfly itself
        ...
    elif result.error:  # error from upstream
        ...
    else:  # success
        ...
```
However, this code sometimes explodes on the async iterator itself: instead of returning a result at all, the iterator throws an error that looks like this:

```
<-- 422 | ERR::PROXY::TIMEOUT - Proxy connection or website was too slow and timeout - Proxy or website do not respond after 15s - Check if the website is online or geoblocking, if you are using session, rotate it..Checkout the related doc: https://scrapfly.io/docs/scrape-api/error/ERR::PROXY::TIMEOUT
```
It seems there's some kind of bug where the async iterator can itself throw rather than yield the exception as a result, which means the entire process blows up. Any ideas on how we might go about fixing this?
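For what it's worth, here is the workaround we're considering in the meantime: drive the iterator manually with `__anext__` so an exception raised by the iterator is caught per-item and surfaced as a value, matching the existing `isinstance` check. This is only a sketch; `drain_safely`, `flaky_scrapes`, and the stub error class are hypothetical stand-ins, not part of the scrapfly SDK:

```python
import asyncio

class StubScrapeError(Exception):
    """Stand-in for ScrapflyError so this sketch is self-contained."""

async def drain_safely(aiterable, error_types):
    # Pull items manually so an exception raised by __anext__ is caught
    # per item instead of propagating out of the `async for` loop.
    it = aiterable.__aiter__()
    while True:
        try:
            item = await it.__anext__()
        except StopAsyncIteration:
            return
        except error_types as exc:
            # An async generator is closed once it raises, so any results
            # it had not yet produced are lost: yield the error and stop.
            yield exc
            return
        yield item

async def flaky_scrapes():
    # Simulates concurrent_scrape: yields one result, then raises.
    yield "ok result"
    raise StubScrapeError("ERR::PROXY::TIMEOUT")

async def main():
    results = []
    async for result in drain_safely(flaky_scrapes(), StubScrapeError):
        results.append(result)
    return results

results = asyncio.run(main())
# results[0] is the good result; results[1] is the caught StubScrapeError
```

The obvious drawback is noted in the comments: once the underlying generator raises, it is finished, so any not-yet-yielded results are lost.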
As an aside, the inconsistent use of typing throughout the library makes it hard to debug what's actually going on and to reason about which errors can happen and when.
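Another option we may fall back to is skipping the shared iterator entirely: run one scrape task per URL and collect with `asyncio.gather(..., return_exceptions=True)`, so a failure in any one task comes back as a value in the result list rather than being raised. Sketch only; `scrape_one` is a stub standing in for whatever single-request call the SDK provides (the real method name would need checking):

```python
import asyncio

class StubScrapeError(Exception):
    """Stand-in for ScrapflyError so this sketch is self-contained."""

async def scrape_one(url):
    # Stub for a per-URL scrape call; the real SDK method would go here.
    if "bad" in url:
        raise StubScrapeError(f"ERR::PROXY::TIMEOUT for {url}")
    return f"<html for {url}>"

async def scrape_all(urls):
    # return_exceptions=True makes gather place raised exceptions into
    # the result list instead of propagating the first one.
    return await asyncio.gather(
        *(scrape_one(u) for u in urls), return_exceptions=True
    )

results = asyncio.run(scrape_all(["https://ok.example", "https://bad.example"]))
# results preserves input order: a page string, then the caught error
```

Unlike the iterator workaround, this keeps every other result even when one request fails, at the cost of managing concurrency limits ourselves (e.g. with a semaphore).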