processing urls in batches #113

Open
nikomatsakis opened this issue Mar 31, 2021 · 6 comments
Labels
good first issue Good for newcomers help wanted Extra attention is needed status-quo-story-ideas "Status quo" user story ideas

Comments

@nikomatsakis

Basic summary

As a beginner in Rust, I would like to add our real-life experience to this thread. We are currently facing issues that make me relate to this story (and that are preventing us from switching to Rust):

We are trying to rewrite some of our services from Python to Rust and are looking to achieve the following:

  1. Read a bunch of URLs (size varies, but about 1000 per batch)
  2. Do an HTTP GET request for each URL asynchronously
  3. Log the failures and process the results

What we have not managed to do so far:

  1. Send the requests in batches. If we send all 1000 requests at the same time,
    our server closes the connection and the process panics. Ideally we could
    buffer them and send at most 50 at a time. We could split the batches manually,
    but we hoped the HTTP client or the FuturesUnordered container would handle
    that for us.
  2. Handle errors. Failures should be logged and should not crash the
    process. We plan on using tracing-rs for the
    logging as it is part of the tokio stack.
  3. Implement a retry mechanism with Fibonacci or exponential backoff on failure.
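
To make the third point concrete, here is a minimal sketch of a retry loop with exponential or Fibonacci backoff, using only the standard library. The names `exponential_delay`, `fibonacci_delay`, and `retry` are hypothetical helpers made up for this illustration, not the API of any crate mentioned in this thread. The loop also blocks the thread with `std::thread::sleep`; in async code the runtime's own timer would be the natural replacement.

```rust
use std::time::Duration;

/// Exponential backoff: base * 2^attempt (attempt is 0-based), capped at `max`.
fn exponential_delay(base: Duration, attempt: u32, max: Duration) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.saturating_mul(factor).min(max)
}

/// Fibonacci backoff: base * fib(attempt) with fib = 1, 1, 2, 3, 5, 8, ..., capped at `max`.
fn fibonacci_delay(base: Duration, attempt: u32, max: Duration) -> Duration {
    let (mut a, mut b) = (1u32, 1u32);
    for _ in 0..attempt {
        (a, b) = (b, a.saturating_add(b));
    }
    base.saturating_mul(a).min(max)
}

/// Calls `op` until it succeeds or `max_retries` retries are used up.
/// Every failure that will be retried is logged to stderr; the last error is returned.
fn retry<T, E: std::fmt::Display>(
    max_retries: u32,
    delay: impl Fn(u32) -> Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if attempt < max_retries => {
                eprintln!("attempt {} failed: {err}; retrying", attempt + 1);
                std::thread::sleep(delay(attempt));
                attempt += 1;
            }
            Err(err) => return Err(err),
        }
    }
}

fn main() {
    let base = Duration::from_millis(1);
    let max = Duration::from_millis(50);
    let mut calls = 0;
    // Simulated operation (an assumption for the demo): fails twice, then succeeds.
    let result: Result<&str, String> = retry(5, |n| fibonacci_delay(base, n, max), || {
        calls += 1;
        if calls < 3 {
            Err(format!("simulated failure {calls}"))
        } else {
            Ok("done")
        }
    });
    println!("{result:?} after {calls} calls");
}
```

Passing the delay schedule as a closure keeps the retry loop independent of the backoff policy, so switching between the two schedules is a one-line change at the call site.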

For reference, see the Stack Overflow question where I was looking for help.

Originally posted by @rgreinho in #95 (comment)

@nikomatsakis nikomatsakis added help wanted Extra attention is needed status-quo-story-ideas "Status quo" user story ideas good first issue Good for newcomers labels Mar 31, 2021
@nikomatsakis

Can you say a bit more, @rgreinho? What caused you not to succeed, for example?

@rgreinho

rgreinho commented Mar 31, 2021

We're still investigating, so hopefully we will find solutions soon :)

But here are our two big blockers so far:

  1. We are not sure how to limit the number of asynchronous requests in flight at the same time. In Python, the
    aiohttp web client can be configured with a smaller connection pool size and handles this for us. We hoped that reqwest would do the same. Note that we are not specifically attached to reqwest and could use surf or hreq, but unfortunately we found that they behave the same way.

    Our second hope was that the FuturesUnordered container would allow us to manage this.

    A comment in the SO question pointed us to this question, where they create a Stream and apply the buffer_unordered() method on it. That will be our next attempt.

    Coming from a Python background, we hoped we could simply use the same pattern, where the asyncio.gather() function from the stdlib executes the coroutines and collects the results. That is why we went with the FuturesUnordered container first.

  2. We do not know how to retry failed requests. We did find the backoff crate and the tokio-retry one, but they don't seem to work well with FuturesUnordered. Or at least we did not manage to get them to work together.

    In Python we use tenacity to decorate our functions; if an exception is raised, it reruns them for us.
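
As an illustration of the first blocker, the sketch below caps the number of requests in flight at a fixed limit. It deliberately swaps async for OS threads so that it needs only the standard library; it is not the reqwest or FuturesUnordered solution. In async code, the `buffer_unordered()` stream adaptor from the `futures` crate mentioned above expresses the same bound. `fetch` is a hypothetical stand-in for an HTTP GET, and `fetch_all` is a name invented for this example.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// One URL paired with the outcome of its request.
type Outcome = (String, Result<String, String>);

/// Hypothetical stand-in for an HTTP GET; fails for URLs containing "bad".
fn fetch(url: &str) -> Result<String, String> {
    if url.contains("bad") {
        Err(format!("request to {url} failed"))
    } else {
        Ok(format!("body of {url}"))
    }
}

/// Runs `fetch` over all URLs with at most `limit` calls in flight.
/// Returns the outcomes in completion order plus the peak number of in-flight calls.
fn fetch_all(urls: &[String], limit: usize) -> (Vec<Outcome>, usize) {
    let queue: Mutex<VecDeque<&String>> = Mutex::new(urls.iter().collect());
    let results: Mutex<Vec<Outcome>> = Mutex::new(Vec::with_capacity(urls.len()));
    let in_flight = AtomicUsize::new(0);
    let peak = AtomicUsize::new(0);

    thread::scope(|s| {
        // Each worker is one "slot": the worker count is the concurrency bound.
        for _ in 0..limit.max(1).min(urls.len()) {
            s.spawn(|| loop {
                // The queue lock is released at the end of this statement.
                let next = queue.lock().unwrap().pop_front();
                let Some(url) = next else { break };
                let now = in_flight.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                let outcome = fetch(url);
                in_flight.fetch_sub(1, Ordering::SeqCst);
                results.lock().unwrap().push((url.clone(), outcome));
            });
        }
    });

    (results.into_inner().unwrap(), peak.into_inner())
}

fn main() {
    // 1000 made-up URLs; every hundredth one is set up to fail.
    let urls: Vec<String> = (0..1000)
        .map(|i| {
            if i % 100 == 0 {
                format!("https://example.com/bad/{i}")
            } else {
                format!("https://example.com/{i}")
            }
        })
        .collect();
    let (results, peak) = fetch_all(&urls, 50);
    // Failures are collected as values, so a failed request never panics the process.
    let failures = results.iter().filter(|(_, r)| r.is_err()).count();
    println!("{} results, {} failures, peak in flight: {}", results.len(), failures, peak);
}
```

Unlike fixed batches of 50, a worker picks up the next URL as soon as it finishes the previous one, so a single slow request does not hold back the other slots.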

We are also having problems with error handling. We could not get the map_err() and or_else() methods to work as expected, but this is probably simply due to the fact that we are new to this and did not use them properly. I'm sure we will figure it out soon. The same goes for logging; the tracing-rs library looks fantastic.
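
For readers hitting the same wall, here is a small sketch of how `map_err()` and `or_else()` behave on the standard library's `Result`. It assumes the `Result` methods are the ones meant; future combinators with the same names may differ in detail. `FetchError`, `fetch`, `fetch_number`, and `fetch_number_or_default` are hypothetical names invented for the example, and the failure is logged with `eprintln!` rather than tracing to keep the sketch dependency-free.

```rust
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical error type covering the two failure modes in this sketch.
#[derive(Debug)]
enum FetchError {
    Status(u16),
    Parse(ParseIntError),
}

impl fmt::Display for FetchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FetchError::Status(code) => write!(f, "server answered with status {code}"),
            FetchError::Parse(err) => write!(f, "could not parse body: {err}"),
        }
    }
}

/// Hypothetical stand-in for a request that returns the response body as text.
fn fetch(url: &str) -> Result<String, FetchError> {
    match url {
        "https://example.com/ok" => Ok("42".to_string()),
        "https://example.com/garbled" => Ok("not a number".to_string()),
        _ => Err(FetchError::Status(503)),
    }
}

/// `map_err` converts the error type and leaves `Ok` values untouched.
fn fetch_number(url: &str) -> Result<i64, FetchError> {
    let body = fetch(url)?;
    body.trim().parse::<i64>().map_err(FetchError::Parse)
}

/// `or_else` runs only on `Err`; it may recover with `Ok` or pass the error on.
fn fetch_number_or_default(url: &str) -> Result<i64, FetchError> {
    fetch_number(url).or_else(|err| match err {
        FetchError::Status(503) => {
            eprintln!("{url}: {err}; falling back to 0");
            Ok(0)
        }
        other => Err(other),
    })
}

fn main() {
    for url in [
        "https://example.com/ok",
        "https://example.com/garbled",
        "https://example.com/down",
    ] {
        match fetch_number_or_default(url) {
            Ok(n) => println!("{url}: {n}"),
            Err(err) => eprintln!("{url}: {err}"),
        }
    }
}
```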

@nikomatsakis

@rgreinho any chance you want to join a "vision doc writing session" and talk about this? What time zone are you in? :)

@nikomatsakis

This week's writing sessions -- I expect we'll schedule more for next week.

@rgreinho

rgreinho commented Apr 1, 2021

Sure thing! Not sure how that works, or what exactly is expected from participants, but I'll be glad to help.

I am in the CDT time zone.

@nikomatsakis

The basic format is that the host asks you (and others) a bunch of questions about your experiences and then we try to collectively write a story about one of the characters. It's fun. :)
