Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to specify Postgres connection credentials #564

Merged
merged 4 commits into from
Jan 22, 2025

Conversation

michaelbautin
Copy link
Contributor

@michaelbautin michaelbautin commented Dec 27, 2024

  • Allow specifying PostgreSQL credentials when running against an external instance of PostgreSQL not managed by the benchmark.
  • Allow avoiding an attempt to start PostgreSQL service with systemctl, because it might be managed in a different way.
  • Create the vector PostgreSQL extension automatically if it does not exist.

Fixes: #563

@maumueller
Copy link
Collaborator

Thanks for your contribution, @michaelbautin. I'm a bit skeptical about the proposed solution, because it makes this very pronounced from an API point of view, while it only applies to a very small subset of the algorithms.

Can you achieve the same functionality by specifying these credentials in the config file https://github.com/erikbern/ann-benchmarks/blob/main/ann_benchmarks/algorithms/pgvector/config.yml? If I remember correctly, you can provide it in the base_args entry which is propagated to the constructor of the class.

@michaelbautin michaelbautin changed the title Add ability to specify database connection credentials on the command line Add ability to specify Postgres connection credentials Jan 18, 2025
@michaelbautin
Copy link
Contributor Author

@maumueller Thank you for your feedback. I've reworked the PR to avoid introducing generic command-line parameters, and instead made it controllable by Postgres-specific environment variables. Also I've added automatic pgvector extension creation and a way to avoid attempting to start PostgreSQL service if it is managed in a different way.

Cc @erikbern

@maumueller
Copy link
Collaborator

I think that looks good, @michaelbautin. I suppose your use-case is that the implementation is run locally outside of the Docker environment? Otherwise it seems a bit difficult to pass these variables into the container with the current setup.

@michaelbautin
Copy link
Contributor Author

@maumueller yes, my use case is that the Postgres server is not managed by ann-benchmarks scripts. In my experiments I frequently need a Postgres server set up in a very specific way in a cloud environment.

@maumueller
Copy link
Collaborator

Alright, I can merge this if you agree that it provides the necessary functionality. One last thing: Do you think a short comment in the module.py would help others find out about this feature? (And if yes, could you please add one? Thanks!)

@michaelbautin
Copy link
Contributor Author

@maumueller : sounds great, thank you! I will add the comment shortly.

@michaelbautin
Copy link
Contributor Author

@maumueller I've added the requested comment. Thank you!

@maumueller maumueller merged commit 4d734fb into erikbern:main Jan 22, 2025
31 of 47 checks passed
@maumueller
Copy link
Collaborator

Thank you for the contribution, @michaelbautin!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow specifying database server connection credentials
2 participants