Skip to content

Fix grpc timeouts when async_req=True #510

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jun 12, 2025
Merged

Conversation

jhamon
Copy link
Collaborator

@jhamon jhamon commented Jun 11, 2025

Problem

When users pass async_req=True and a timeout value to GRPC methods, the timeout value was not being honored; instead, a default timeout of 5 seconds was being imposed by the PineconeGrpcFuture.

Solution

The future object returned from the grpc package do not conform to the Future interface of python's concurrent.futures package, so we wrap those futures in our own class PineconeGrpcFuture to adapt the interface and try to make it more ergonomic to work with. We forgot to pass optional args to the PineconeGrpcFuture constructor, so in many cases (upsert, query, update, delete, fetch all affected) we were not respecting timeout values being passed by the caller.

The fix itself is pretty straightforward; pass the timeout value into the PineconeGrpcFuture. Majority of the work in this PR relates to testing. I've extracted the tests of the async grpc functions into a separate folder so we don't need to mess around with so many @pytest.mark.skip annotations to label grpc-specific tests in what are otherwise shared tests between rest and grpc located in the tests/integration/data folder.

Besides moving existing tests, I have added some new tests for query. And a completely new test file, test_timeouts.py that uses a mock grpc server implementation to simulate the interaction between timeouts in the client and long response times.

To summarize:

  • Refactor async_req=True tests into a different folder
  • Add detailed tests of timeout behavior using a mock server that allows us to control request latency in the test
  • Tweaked instantiation of the IndexGRPC class to respect the ssl_verify param, if any has been set on the parent PineconeGRPC object. This was needed to get the test interactions with the mock server running on localhost passing.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)

if self.config.ssl_verify is None:
default_grpc_config = GRPCClientConfig()
else:
default_grpc_config = GRPCClientConfig(secure=self.config.ssl_verify)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change allows me to test against a local grpc server implementation with PineconeGrpc(ssl_veriify=False) setting getting carried through all the way to the index class.

Comment on lines +73 to +78
def parse_update_response(response: Union[dict, Message], _check_type: bool = False):
return {}


def parse_delete_response(response: Union[dict, Message], _check_type: bool = False):
return {}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two requests don't actually return anything, so there's nothing to do here.

Comment on lines -9 to -10
if os.environ.get("USE_GRPC") == "true":
from pinecone.grpc import GRPCDeleteResponse
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I moved these tests into a new folder data_grpc_futures/ that is only going to be run with grpc, I don't need as much complexity in the test setup.

@jhamon jhamon marked this pull request as ready for review June 11, 2025 18:12

@pytest.mark.usefixtures("grpc_server")
class TestGrpcAsyncTimeouts_QueryByVector:
def test_query_by_vector_with_timeout(self, local_idx: GRPCIndex):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: test_query_by_vector_with_timeout -> test_query_by_vector_with_timeout_exceeded to match rest of the tests

assert result is not None
assert result == {}


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to leave out test_update_with_default_timeout test?

self.grpc_client_config = grpc_config or GRPCClientConfig()
# If grpc_config is passed, use it. Otherwise, build a new one with
# default values and passing in the ssl_verify value from the config.
if self.config.ssl_verify is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little unclear on why we are checking on ssl_verify property for setting default_grpc_config = GRPCClientConfig()

Copy link
Collaborator Author

@jhamon jhamon Jun 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The config object here has an optional boolean field called ssl_verify that can have three values: None (by default, if unset by the user), True, or False. The user sets these by passing a value to the Pinecone or PineconeGRPC constructor like this: pc = PineconeGRPC(ssl_verify=False)

Separately, GRPC configurations can be set at the index level with GRPCClientConfig by passing pc.Index(name='my-index', grpc_config=grpc_config). GRPCClientConfig is implemented as a NamedTuple, an immutable data structure. This object controls the SSL behavior any many other things within the grpc code. For SSL verification, the relevant property in this object is called secure. I discovered in testing that unlike ssl_verify, secure is a true boolean (True/False only) and if you set None you will get errors.

For configuring SSL verify behavior in GRPC, there are three scenarios we need to consider:

  • The user never set anything in particular regarding SSL verification; they want the default behavior everywhere. By the time this constructor is executing, config.ssl_verify will be None so we use the GRPCClientConfig() with no modifications to configure the GRPC channel. This has SSL verification on by default.
  • The user passed PineconeGRPC(ssl_verify=False) but did not specify a grpc_config object for the index client; in that scenario, we want to thread that setting through even to the index level and override the default grpc behavior for ssl verification.
  • The user explicitly passed grpc_config to the index constructor with pc.Index(name='foo', grpc_config=grppc_config). In this scenario, we use the configuration the user has provided.

The lesson here is that we should endeavor to have only one way to configure things if at all possible. Merging multiple config sources, resolution order, etc gets complicated in a hurry.

Copy link
Contributor

@rohanshah18 rohanshah18 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice work on adding the tests for timeouts! Left a few small comments and an additional comment that the test_update_future.py file is empty, so not sure if you want to delete the file or add new tests. LGTM!!!

# Vector(id='4', values=embedding_values(2), metadata={'genre': 'action', 'runtime': 120 }),
# Vector(id='5', values=embedding_values(2), metadata={'genre': 'comedy', 'runtime': 90 }),
# Vector(id='6', values=embedding_values(2), metadata={'genre': 'romance', 'runtime': 240 })
query_result = idx.query(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is being skipped so no need to update it right now but if we do decide to remove the skip decorator, we need to add async_req=true.

@jhamon jhamon merged commit 86c381c into main Jun 12, 2025
23 checks passed
@jhamon jhamon deleted the jhamon/grpc-timeouts-fix branch June 12, 2025 18:28
rohanshah18 pushed a commit that referenced this pull request Jun 16, 2025
## Problem

I noticed that tests were failing for the python 3.13 build of
`tests/integration/data_grpc_futures` tests after merging PR #510 to
main.

## Solution

Context:
- When tests are run on a PR branch, they only execute on python 3.9. 
- When tests they run on main, tests run for both python 3.9 and 3.13. 

What I observed:
- In this case, the python 3.13 build of newly added tests
`tests/integration/data_grpc_futures` was consistently failing with
"Unauthorized" errors
- Despite the confusing error message from the API, this "Unauthorized"
response was being returned when making calls for an index that no
longer exists.
- Why did the index no longer exist? Because the python 3.9 build of the
same tests was completing first and nuking the index before the 3.13
python build completes.
- Why would the 3.9 tests delete an index being used by the 3.13 tests?
- A test helper was assigning a deterministic name for the index that
was the same for both builds.

Solution:
- Revert back to a randomized name strategy for indexes. Need to rethink
deterministic names later before we can use something like VCR to
playback recorded API calls in testing.
- Reenable 3.13 tests on PR branches for now so we can see that this
change takes care of the problem observed on merge

## Type of Change

- [x] Bug fix (non-breaking change which fixes an issue)
- [x] Infrastructure change (CI configs, etc)

## Test Plan

Should see all tests passing even when 3.9 and 3.13 tests are run
together on PR push.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants