ClientError 429 RESOURCE_EXHAUSTED on first use of gemini-2.0-flash-exp with Google Search tool after 2-day idle period #17

Closed
YoussefElsafi opened this issue Dec 14, 2024 · 20 comments
Labels
api: gemini-api priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@YoussefElsafi

I am encountering a ClientError: 429 RESOURCE_EXHAUSTED error when attempting to use the gemini-2.0-flash-exp model with the Google Search tool. This error occurs immediately upon sending the first message in a chat session, even after a period of approximately 48 hours of inactivity, indicating that this is not a result of exceeding typical usage quotas.

Environment Details:

  • Programming Language: Python
  • Operating System: Windows 11
  • Python Version: 3.12.5
  • Package Versions: google-genai 0.2.2

Expected Behavior:

The gemini-2.0-flash-exp model should be able to generate responses using the Google Search tool without raising a RESOURCE_EXHAUSTED error, especially when the API has not been used for an extended period.

Actual Behavior:

A ClientError: 429 RESOURCE_EXHAUSTED error is raised immediately after sending the first message to the model.

Steps to Reproduce:

  1. Environment Setup:
    • Install the required packages as listed in the "Environment Details" section above.
    • Set the environment variable API_KEY to a valid Google AI API key.
  2. Code:
    from google import genai
    from google.genai import types
    import os
    
    MODEL_ID = "gemini-2.0-flash-exp"
    
    search_tool = {'google_search': {}}
    
    client = genai.Client(http_options={'api_version': 'v1alpha'},
                          api_key=os.environ["API_KEY"])
    chat = client.chats.create(
        model=MODEL_ID,
        config=types.GenerateContentConfig(
            tools=[search_tool],
        ),
    )
    
    def handle_response(response):
        """Handles the response from the model, including grounding metadata."""
        for candidate in response.candidates:
            for part in candidate.content.parts:
                if part.text:
                    print("Model:", part.text)
    
            if candidate.grounding_metadata and candidate.grounding_metadata.search_entry_point:
                print("Search Results:")
                print(candidate.grounding_metadata.search_entry_point.rendered_content)
    
    def main():
        """Main function to run the chat loop."""
        print("Start chatting with the model (type 'exit' to quit)")
        while True:
            user_input = input("You: ")
            if user_input.lower() == 'exit':
                print("Exiting chat.")
                break
    
            response = chat.send_message(user_input)
            handle_response(response)
    
    if __name__ == "__main__":
        main()
  3. Run the Script: Execute the Python script.
  4. Send Message: Send any message (e.g., "hi") to the model.

Error Output:

CMD> python genai_google_search.py
Start chatting with the model (type 'exit' to quit)
You: hi
Traceback (most recent call last):
  File "C:\Users\Youssef\Desktop\Coding\AI Development\BatchBot\genai_google_search.py", line 45, in <module>
    main()
  File "C:\Users\Youssef\Desktop\Coding\AI Development\BatchBot\genai_google_search.py", line 41, in main
    response = chat.send_message(user_input)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Youssef\AppData\Local\Programs\Python\Python312\Lib\site-packages\google\genai\chats.py", line 63, in send_message
    response = self._modules.generate_content(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Youssef\AppData\Local\Programs\Python\Python312\Lib\site-packages\google\genai\models.py", line 4393, in generate_content
    response = self._generate_content(
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Youssef\AppData\Local\Programs\Python\Python312\Lib\site-packages\google\genai\models.py", line 3655, in _generate_content
    response_dict = self.api_client.request(
                    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Youssef\AppData\Local\Programs\Python\Python312\Lib\site-packages\google\genai\_api_client.py", line 321, in request
    response = self._request(http_request, stream=False)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Youssef\AppData\Local\Programs\Python\Python312\Lib\site-packages\google\genai\_api_client.py", line 261, in _request
    return self._request_unauthorized(http_request, stream)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Youssef\AppData\Local\Programs\Python\Python312\Lib\site-packages\google\genai\_api_client.py", line 283, in _request_unauthorized
    errors.APIError.raise_for_response(response)
  File "C:\Users\Youssef\AppData\Local\Programs\Python\Python312\Lib\site-packages\google\genai\errors.py", line 100, in raise_for_response
    raise ClientError(status_code, response)
google.genai.errors.ClientError: 429 RESOURCE_EXHAUSTED. {'error': {'code': 429, 'message': 'Resource has been exhausted (e.g. check quota).', 'status': 'RESOURCE_EXHAUSTED'}}
CMD>
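
For anyone triaging a log like the one above, here is a minimal sketch of pulling the code, status, and message out of the printed error line. It assumes the line has the shape shown in the traceback (`<code> <status>. {<dict repr>}`); `parse_error_line` is a hypothetical helper name and is not part of google-genai.

```python
import ast


def parse_error_line(line: str) -> dict:
    """Split '429 RESOURCE_EXHAUSTED. {...}' into code, status, and message.

    Assumes the payload after the first '. ' is a Python dict literal,
    as in the traceback above. This is not an official google-genai API.
    """
    # Drop the exception class prefix if the full traceback line was pasted.
    prefix = "google.genai.errors.ClientError: "
    if line.startswith(prefix):
        line = line[len(prefix):]
    head, _, payload = line.partition(". ")
    code, _, status = head.partition(" ")
    error = ast.literal_eval(payload)["error"]
    return {"code": int(code), "status": status, "message": error["message"]}


line = (
    "google.genai.errors.ClientError: 429 RESOURCE_EXHAUSTED. "
    "{'error': {'code': 429, 'message': 'Resource has been exhausted "
    "(e.g. check quota).', 'status': 'RESOURCE_EXHAUSTED'}}"
)
info = parse_error_line(line)
print(info["code"], info["status"], info["message"])
```

The extracted message is the useful part when comparing reports, since different backends may word their quota errors differently.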

Additional Information:

  • I have verified that my API key is valid.
  • I am using the v1alpha API version.
  • The gemini-2.0-flash-exp model is specified.
  • The Google Search tool is correctly configured in the GenerateContentConfig.
  • Crucially, I have not used the Google Search tool (or any other Gemini API) for approximately 48 hours prior to encountering this error. This strongly suggests that the error is not due to exceeding normal API usage quotas.
  • Could this be related to the gemini-2.0-flash-exp model being in an experimental stage and possibly having stricter quota limitations, unexpected quota reset behavior, or issues with tool integration?
  • Suggestion: It would be helpful if the documentation could clarify the quota limits and reset mechanisms for experimental models, especially when used with tools. It would also be helpful to confirm that quotas are properly reset after periods of inactivity.
  • Have others experienced this RESOURCE_EXHAUSTED error specifically when using tools with gemini-2.0-flash-exp, even on the first request after a period of inactivity?
  • Is there any account-specific information or quota dashboard that I can access to verify my API usage and quota status for experimental models?

Request:

Please investigate this issue and provide guidance on how to resolve the RESOURCE_EXHAUSTED error when using the gemini-2.0-flash-exp model with the Google Search tool, particularly in light of the fact that this is occurring on the very first request after an extended idle period.

@YoussefElsafi YoussefElsafi added priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Dec 14, 2024
@Atlas3DSS

Thank you for posting this, because it does not look at all like any of the code examples Google gives in their documentation for how to use grounding search, and for me this actually worked. After 3 hours of banging my head against their out-of-date docs, I was about to give up, came here to the Issues tab, and found your code, which finally got me grounding results.

It looks nothing like what they show in the documentation.

@YoussefElsafi
Author

@Atlas3DSS

Hey, thanks for chiming in! I'm glad my code example (even though it's highlighting an error for me) was helpful for you. It's definitely concerning that the official docs aren't reflecting the correct way to implement the search tool. Hopefully, this issue will help get that addressed. I appreciate you sharing your experience - it really validates that there's something not quite right on the Google side.

@sasha-gitg sasha-gitg added priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. and removed priority: p2 Moderately-important priority. Fix may not be included in next release. labels Dec 17, 2024
@ShauryaKesarwani

After a lot of head-banging with the docs, I went through the entire downloaded package code base to figure out what is going on and what I actually need to implement, since I gathered that the docs are wrong.

I hit the exhausted-usage error many times and was quite confused. It seems it might be an issue on their side, since I can use it in AI Studio.

from google import genai
from google.genai.types import GenerateContentConfig

model_id = "gemini-2.0-flash-exp"
client = genai.Client(api_key="YOUR_API_KEY", http_options={'api_version': 'v1alpha'})

# Google Search tool, passed as a plain dict
search_tool = {'google_search': {}}

response = client.models.generate_content(
    model=model_id,
    contents="time in india",
    config=GenerateContentConfig(
        tools=[search_tool],
    ),
)

print(response.text)

@tarekbadrsh

I tried so many things, but it's still not working. Have you gotten it to work?

from google import genai
from google.genai.types import (
    GenerateContentConfig,
    Tool,
    GoogleSearch,
    GoogleSearchRetrieval,
)

client = genai.Client(
    api_key="YOUR_API_KEY",
    http_options={"api_version": "v1alpha"},
)

mytools = [
    Tool(
        google_search=GoogleSearch(),
        google_search_retrieval=GoogleSearchRetrieval(),
    ),
]

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="time now in UK",
    config=GenerateContentConfig(
        tools=mytools,
    ),
)

print(response.text)

@ShauryaKesarwani

AI Studio is still working with search grounding. I even tried with cURL and got the same 429 error.

I'm not sure what to do at this point, since it's not even working with a simple POST request.
In that case it's a server fault on Google's end, since I can use the other tools and features of the API fine.

@jamg7
Collaborator

jamg7 commented Jan 9, 2025

Hi, I just tried @YoussefElsafi's code and it works for me without issue.

Do folks here still observe the 429 when trying gemini-2.0-flash-exp with the Google Search tool?

I think the 429 should be related to a quota issue. Is the API key shared with anyone?

Thanks,
James

@andre-motorway

Yes, I am observing the same issue here. API KEY not shared.

@Giom-V

Giom-V commented Jan 13, 2025

The free tier should offer about 1000 grounding searches per day. Are you trying to use it more than that or is it blocking from the first try?

@sasha-gitg sasha-gitg added api: vertex-ai Issues related to the Vertex AI API. api: gemini-api and removed api: vertex-ai Issues related to the Vertex AI API. labels Jan 13, 2025
@andre-motorway

andre-motorway commented Jan 15, 2025

I have been getting a similar error for a similar circumstance (first try even after a day of inactivity) for every experimental model available:
429 RESOURCE_EXHAUSTED. {'error': {'code': 429, 'message': 'Online prediction request quota exceeded for gemini-experimental. Please try again later with backoff.', 'status': 'RESOURCE_EXHAUSTED'}}
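
Since the message above asks callers to "try again later with backoff", here is a minimal sketch of exponential backoff with jitter. `QuotaError` and `call_with_backoff` are hypothetical names for illustration; in real code you would catch whatever exception your client raises for a 429.

```python
import random
import time


class QuotaError(Exception):
    """Hypothetical stand-in for a 429 RESOURCE_EXHAUSTED error."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on QuotaError with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except QuotaError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error.
            # Delay doubles each attempt (base, 2*base, 4*base, ...) plus jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)


# Usage with a fake call that fails twice, then succeeds.
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise QuotaError("429 RESOURCE_EXHAUSTED")
    return "ok"


delays = []  # Record the delays instead of actually sleeping.
result = call_with_backoff(flaky, sleep=delays.append)
print(result, len(delays))
```

Backoff only helps with transient throttling. It would not be expected to fix a failure that occurs on the very first request after a long idle period, which is what this issue reports.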

@jamg7
Collaborator

jamg7 commented Jan 30, 2025

@andre-motorway, the error message you shared seems to be returned by the Vertex AI backend. Can you double-check which backend you are using?

If possible, can you please share the script that you used for testing?

@andre-motorway

Yes, it is indeed the Vertex AI backend.

import vertexai
from google import genai
from google.genai.types import (
    GenerateContentConfig,
    Tool,
    GoogleSearch,
    GoogleSearchRetrieval,
    SafetySetting,
)
from google.api_core.exceptions import GoogleAPIError

from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)

from .models import FollowUpQuestions
from config.config import Config  # Use absolute import

# Initialize Vertex AI
vertexai.init(project=Config.PROJECT_ID, location=Config.LOCATION)

client = genai.Client(
    vertexai=True, project=Config.PROJECT_ID, location=Config.LOCATION
)

google_search_tool = Tool(google_search=GoogleSearch())
google_search_retrieval_tool = Tool(google_search_retrieval=GoogleSearchRetrieval())

.
.
.

try:
        .
        .
        .
        response = client.models.generate_content(
            model=model,
            contents=prompt,
            config=GenerateContentConfig(
                system_instruction=system_instructions,
                temperature=0,
                tools=tools,
                response_modalities=["TEXT"],
                safety_settings=[
                    SafetySetting(
                        category="HARM_CATEGORY_HATE_SPEECH",
                        threshold="BLOCK_MEDIUM_AND_ABOVE",
                    ),
                    SafetySetting(
                        category="HARM_CATEGORY_DANGEROUS_CONTENT",
                        threshold="BLOCK_MEDIUM_AND_ABOVE",
                    ),
                    SafetySetting(
                        category="HARM_CATEGORY_HARASSMENT",
                        threshold="BLOCK_LOW_AND_ABOVE",
                    ),
                    SafetySetting(
                        category="HARM_CATEGORY_SEXUALLY_EXPLICIT",
                        threshold="BLOCK_ONLY_HIGH",
                    ),
                ],
            ),
        )

        return response
    except Exception as e:  # Catch a broader range of exceptions
        print(f"A ground_data error occurred: {str(e)}")
        raise  # Re-raise for tenacity to handle

@sasha-gitg
Member

For Vertex AI, you need to contact Vertex AI support when encountering prediction request quota issues for Gemini: https://cloud.google.com/vertex-ai/docs/general/troubleshooting?authuser=8&component=any#error_code_429

@jamg7
Collaborator

jamg7 commented Feb 3, 2025

Thanks, @sasha-gitg. Since the question should be asked in vertex-ai, I'm closing the issue now.

@jamg7 jamg7 closed this as completed Feb 4, 2025
@tarekbadrsh

After the latest update today, the issue has been resolved, and it is now functioning as expected.
Here is an example for reference: (Happy coding 🥳)

import os
from typing import List
from google.genai import Client
from google.genai.types import GenerateContentConfig, Content, Tool, GoogleSearch
from dotenv import load_dotenv

# Load environment variables from .env file

load_dotenv()
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")
client = Client(api_key=GEMINI_API_KEY)

messages: List[Content] = [
    Content(
        role="user",
        parts=[{"text": "do search online about: Salford City vs. Bromley and find the match report"}],
    ),
]

config = GenerateContentConfig(
    system_instruction="""You are a helpful assistant.
    You have to do deep research and provide a detailed answer
    """,
    tools=[Tool(google_search=GoogleSearch())],
)

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=messages,
    config=config,
)

print(response)
print(response.text)

@andre-motorway

The issue has not been resolved. The code above works only because it uses Gemini 2.0 Flash, which is generally available rather than experimental. If you try experimental models, you will notice the issue remains.

@andre-motorway

I have worked around the issue by instantiating the GenAI SDK client with a Gemini API key instead of through Vertex AI.
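
A minimal sketch of that switch, for reference. It only builds the keyword arguments for `genai.Client`, using the two call shapes already posted in this thread (`api_key=...` versus `vertexai=True, project=..., location=...`). The helper name `client_kwargs` and the environment variable names for the Vertex AI project and location are assumptions for this sketch.

```python
import os


def client_kwargs(use_vertex: bool, env=os.environ) -> dict:
    """Build keyword arguments for genai.Client for either backend.

    The environment variable names are assumptions for this sketch.
    """
    if use_vertex:
        return {
            "vertexai": True,
            "project": env["GOOGLE_CLOUD_PROJECT"],
            "location": env["GOOGLE_CLOUD_LOCATION"],
        }
    return {"api_key": env["GEMINI_API_KEY"]}


# Usage (requires the google-genai package):
#   from google import genai
#   client = genai.Client(**client_kwargs(use_vertex=False))
fake_env = {"GEMINI_API_KEY": "test-key"}
print(client_kwargs(use_vertex=False, env=fake_env))
```

Keeping the backend choice in one place makes it easy to check whether a 429 follows the backend or the model.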

@homjay

homjay commented Feb 16, 2025

This is a Vertex AI issue; Gemini AI Studio is not subject to this quota limit.

Why does Google impose limitations on production users instead of free users?

@logankilpatrick
Contributor

@andre-motorway Hey! You should migrate off the experimental models and use the production-ready / GA version. The experimental models typically get shut down soon after the GA model comes out.

@logankilpatrick
Contributor

Hey @homjay, I'm not sure I follow "Why does Google impose limitations on production users instead of free users?" You can take the Google AI Studio version to production today; please let me know if you run into any issues doing so.

@OtwakO

OtwakO commented Mar 7, 2025

Is this issue resolved? I am still getting a lot of 429 errors after a dozen calls, even when calling with a delay of 20+ seconds. This is not on experimental models but on models like gemini-2.0-flash or flash-lite, and the quota usage shown on the Google console dashboard is also very low. I am not using Vertex AI.
