I've moved from running mall locally with ollama to using models provided by GitHub Copilot, but I've found that any request I make gets rate limited almost instantly, making it unusable.
Is there something I need to consider or configure? I'm only trying to summarize 24 rows, yet I hit an 8-second rate limit almost immediately, followed by 50-second rate limits, before the call returns nothing.
# Route mall's requests through GitHub Copilot via ellmer
options(.mall_chat = ellmer::chat_github(model = "mistral-ai/mistral-medium-2505"))

# Summarize each distinct topic in at most 7 words
resumen <- datos |>
  dplyr::distinct(variable, tema) |>
  mall::llm_summarize(tema, max_words = 7, pred_name = "tema_resumen")
It also seems the problem is due to requests being done in parallel. Is there any way to change this?
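In the meantime, a possible workaround is to bypass the parallel dispatch and send one request per row with ellmer directly, pausing between calls. This is only a hedged sketch: the prompt wording and the `Sys.sleep()` pause length are my own assumptions, not anything mall documents.

```r
library(dplyr)

# One chat object, cloned per call so conversation history doesn't accumulate
chat <- ellmer::chat_github(model = "mistral-ai/mistral-medium-2505")

resumen <- datos |>
  distinct(variable, tema) |>
  mutate(
    tema_resumen = vapply(
      tema,
      function(t) {
        out <- chat$clone()$chat(
          paste("Summarize the following text in at most 7 words:", t)
        )
        Sys.sleep(3)  # rough pacing to stay under a 24-requests-per-60s limit
        out
      },
      character(1)
    )
  )
```

This sends requests strictly sequentially, so it is slow, but it should avoid tripping the per-minute limit that the parallel requests hit all at once.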
Error in `mutate()`:
ℹ In argument: `tema_resumen = llm_vec_summarize(x = tema, max_words = max_words,
additional_prompt = additional_prompt)`.
Caused by error in `req_perform_parallel()`:
! HTTP 429 Too Many Requests.
ℹ Rate limit of 24 per 60s exceeded for UserByMinute. Please wait 0 seconds before retrying.