cloudsql-postgres-operator is a Kubernetes operator designed to manage Cloud SQL for PostgreSQL instances (referred to as CSQLP instances from now on) from within Kubernetes, automating their creation and administration.
Management of CSQLP instances is performed using a custom resource defined as part of the cloudsql-postgres-operator API.
The sections below outline the goals and non-goals set for cloudsql-postgres-operator, describe its API in detail, and give insight into some aspects that govern the actual implementation.
ℹ️ This document is intended as a living document, and will be updated as required during the lifecycle of cloudsql-postgres-operator in order to reflect the latest design decisions and their implementation.
The following features represent goals for cloudsql-postgres-operator:
- Create, update and delete CSQLP instances from within Kubernetes.
- Manage instances within the context of a single Google Cloud Platform project.
- Allow for configuring the desired instance name, instance type, and disk size and type.
- Allow for configuring the location (region and zone) and availability type.
- Allow for configuring the schedule for the weekly maintenance and for the daily backups.
- Allow for configuring the networking settings.
- Prevent accidental deletion of a given instance.
-
- Automatically inject the Cloud SQL proxy and the required connection details in pods requesting access to a CSQLP instance managed by cloudsql-postgres-operator, regardless of whether the instance is publicly accessible or not.
The following features represent non-goals for cloudsql-postgres-operator:
- Manage CSQLP instances across multiple Google Cloud Platform projects.
- Back up or restore a given CSQLP instance, given the limitations described later in this document.
- Create, update, delete and otherwise manage CSQLP read replicas.
- Perform operations such as failover/failback and TLS rotation.
- Create, update and delete databases inside a given CSQLP instance.
- Manage Cloud SQL for MySQL instances.
ℹ️ These operations can still be performed using the Google Cloud Console, the gcloud CLI or the Cloud SQL Admin API.
ℹ️ Some of these non-goals may (or may not) become goals in the future.
cloudsql-postgres-operator introduces the following custom resource as part of the cloudsql.travelaudience.com/v1alpha1 API:
The PostgresqlInstance custom resource represents the desired state for a single CSQLP instance.
It is a cluster-scoped resource, meaning that it does not exist inside a specific namespace.
Creating a PostgresqlInstance resource causes cloudsql-postgres-operator to create a CSQLP instance following the provided specification.
If a CSQLP instance with the specified name already exists, and cloudsql-postgres-operator is able to detect this, creation of the PostgresqlInstance resource is rejected upfront by an admission webhook.
When a CSQLP instance is created, cloudsql-postgres-operator generates a random password for the postgres PostgreSQL user and creates a secret in the cloudsql-postgres-operator namespace containing it (as well as additional connection details).
This secret is intended to be used exclusively by cloudsql-postgres-operator, and will later be replicated as required into namespaces where pods requiring access to the CSQLP instance are created.
❗
|
Changing the password for the |
Updating a PostgresqlInstance resource causes cloudsql-postgres-operator to update the CSQLP instance according to the provided specification.
It should be noted that not all supported fields can be updated after the resource has been created.
Invalid updates are rejected upfront by the aforementioned admission webhook.
Deleting a PostgresqlInstance resource causes cloudsql-postgres-operator to delete the CSQLP instance targeted by said resource, as well as all secrets (across all namespaces) containing connection details for the instance.
However, in order to prevent accidental deletion, the PostgresqlInstance resource must first be annotated with the following annotation:
cloudsql.travelaudience.com/allow-deletion: "true"
If this annotation is not present, or if its value differs from true, deletion of the PostgresqlInstance resource (and hence of the associated Cloud SQL for PostgreSQL instance) is rejected upfront by the aforementioned admission webhook.
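The deletion guard described above boils down to an exact string comparison on a single annotation. The following is a minimal Python sketch of that check, not the operator's actual code; the function name `deletion_allowed` and the dictionary representation of the resource are assumptions made for illustration.

```python
ALLOW_DELETION_ANNOTATION = "cloudsql.travelaudience.com/allow-deletion"


def deletion_allowed(resource: dict) -> bool:
    """Return True only when the annotation is present with the exact value "true".

    Hypothetical helper: it mirrors the rule stated in the text, nothing more.
    """
    annotations = (resource.get("metadata") or {}).get("annotations") or {}
    return annotations.get(ALLOW_DELETION_ANNOTATION) == "true"


protected = {"metadata": {"name": "postgresql-instance-0"}}
unlocked = {
    "metadata": {
        "name": "postgresql-instance-0",
        "annotations": {ALLOW_DELETION_ANNOTATION: "true"},
    }
}

print(deletion_allowed(protected))  # False
print(deletion_allowed(unlocked))   # True
```

Any value other than the literal string true (including "True" or "yes") keeps the resource protected, which matches the wording "if its value differs from true".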
cloudsql-postgres-operator periodically checks for differences between the specification provided by a given PostgresqlInstance resource and the status of the CSQLP instance.
The amount of time between successive checks can be tweaked in order to avoid quota exhaustion.
If differences are detected (either because the PostgresqlInstance resource has been modified, or because the CSQLP instance has been modified manually out-of-band), cloudsql-postgres-operator updates the instance based on the specification provided by the most recent version of the PostgresqlInstance resource.
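The periodic check described above is essentially a comparison between a desired and an observed state. A minimal Python sketch of such a comparison follows; it assumes both states are available as nested dictionaries with the same shape, and the function name `find_drift` is hypothetical. How the operator actually maps Cloud SQL Admin API responses onto the specification is not covered here.

```python
from typing import List


def find_drift(desired: dict, actual: dict, prefix: str = "") -> List[str]:
    """Return the dotted paths of desired fields whose observed value differs."""
    drift = []
    for key, want in desired.items():
        path = f"{prefix}.{key}" if prefix else key
        have = actual.get(key)
        if isinstance(want, dict):
            # Recurse into nested sections, treating a missing section as empty.
            drift.extend(find_drift(want, have if isinstance(have, dict) else {}, path))
        elif want != have:
            drift.append(path)
    return drift


desired = {
    "location": {"region": "europe-west4", "zone": "europe-west4-b"},
    "resources": {"disk": {"type": "SSD"}, "instanceType": "db-custom-2-7680"},
}
# Observed state after a manual, out-of-band change of the instance type.
actual = {
    "location": {"region": "europe-west4", "zone": "europe-west4-b"},
    "resources": {"disk": {"type": "SSD"}, "instanceType": "db-custom-1-3840"},
}

print(find_drift(desired, actual))  # ['resources.instanceType']
```

A non-empty result would be the trigger for updating the instance towards the desired specification.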
The PostgresqlInstance resource supports the following fields under .spec:
| Field | Description | Type | Observations |
|---|---|---|---|
| Availability | | | |
| | The availability type of the instance. | | |
| Daily Backups | | | |
| | Whether daily backups are enabled for the instance. | | |
| | The start time (in UTC) for the daily backups of the instance, in 24-hour format. | | |
| Database flags | | | |
| | A list of flags passed to the instance. | | |
| User-defined labels | | | |
| | A map of user-defined labels to be set on the instance. | | |
| Geographical location | | | |
| | The region where the instance is located. | | |
| | The zone where the instance is located. | | |
| Maintenance | | | |
| | The preferred day of the week for periodic maintenance of the instance. | | |
| | The preferred hour of the day (in UTC) for periodic maintenance of the instance, in 24-hour format. | | |
| Naming | | | |
| | The name of the instance. | | |
| Networking | | | |
| | Whether the instance is accessible via a private IP address. | | |
| | The resource link of the VPC network from which the instance is accessible via a private IP address. | | |
| | The CIDR to be authorized by the current rule. | | |
| | The name of the current rule. | | |
| | Whether the instance is accessible via a public IP address. | | |
| Resources | | | |
| | The maximum size (in GB) to which the storage capacity of the instance can be automatically increased. | | |
| | The minimum size (in GB) requested for the storage capacity of the instance. | | |
| | The type of disk used for storage by the instance. | | |
| | The instance type to use for the instance. | | |
| Version | | | |
| | The database engine type and version. | | |
An example of a valid PostgresqlInstance resource is provided below:

    apiVersion: cloudsql.travelaudience.com/v1alpha1
    kind: PostgresqlInstance
    metadata:
      name: postgresql-instance-0
    spec:
      availability:
        type: Regional
      backups:
        daily:
          enabled: true
          startTime: "22:00"
      flags:
      - autovacuum=on
      labels:
        owner: cloudsql-postgres-operator
      location:
        region: europe-west4
        zone: europe-west4-b
      maintenance:
        day: Saturday
        hour: "16:00"
      name: cloudsql-psql-123456
      networking:
        privateIp:
          enabled: true
          network: projects/cloudsql-postgres-operator-123456/global/networks/default
        publicIp:
          authorizedNetworks:
          - cidr: 30.60.90.120/32
            name: alice
          - cidr: 120.90.60.0/24
            name: bob
          enabled: true
      resources:
        disk:
          size:
            maximumGb: 40
            minimumGb: 20
          type: SSD
        instanceType: db-custom-2-7680
      version: "9.6"

Creating said resource causes cloudsql-postgres-operator to provision a CSQLP instance named cloudsql-psql-123456 with the following configuration:
- 2 vCPUs, 7.5GB RAM, and a 20GB SSD disk which may be automatically resized up to 40GB.
- Located in the europe-west4-b zone, with high availability enabled.
- Accessible via a private IP on the default VPC of the cloudsql-postgres-operator-123456 project.
- Accessible via a public IP from 30.60.90.120 and from every IP in the 120.90.60.0/24 network.
- May undergo weekly maintenance on Saturdays, starting at 16:00 UTC.
- Has daily backups enabled, performed every day starting at 22:00 UTC.
- Runs PostgreSQL 9.6.
The instance may be referenced from within Kubernetes as postgresql-instance-0 (i.e. the value of .metadata.name).
In a typical scenario, connecting to a CSQLP instance from within a Kubernetes cluster is done using the Cloud SQL proxy, as well as a PostgreSQL username and the corresponding password. [1] This requires the following items to be manually configured by the cluster administrator:
- A secret containing the credentials for an IAM service account with the roles/cloudsql.client role;
  - This secret is to be mounted into a sidecar container that runs the Cloud SQL proxy.
- A secret containing the PostgreSQL username and password for the CSQLP instance;
  - This secret is to be mounted into every container requiring access to PostgreSQL.
In order to reduce the burden of manually configuring secrets and pods, cloudsql-postgres-operator supports automatic injection of the Cloud SQL proxy and of the required PostgreSQL credentials in pods requiring access to a given CSQLP instance.
Injection is performed using an admission webhook that acts upon Pod resources being created (but not updated) by modifying their PodSpec as required.
In order to request automatic injection of the Cloud SQL proxy sidecar and of the connection details, pods requiring access to a given CSQLP instance managed by cloudsql-postgres-operator need only specify the following annotation:
cloudsql.travelaudience.com/postgresqlinstance-name: "<postgresqlinstance-name>"
<postgresqlinstance-name> represents the value of .metadata.name (and not .spec.name) of the target PostgresqlInstance resource.
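To make the annotation concrete, the sketch below shows a pod manifest (expressed as a Python dictionary) that requests access to the postgresql-instance-0 resource from the earlier example, together with a small lookup of the kind a webhook might perform. The pod name, container name, image and the helper `requested_instance` are all hypothetical.

```python
from typing import Optional

INSTANCE_NAME_ANNOTATION = "cloudsql.travelaudience.com/postgresqlinstance-name"

# Hypothetical pod; only the annotation key and its value come from the text.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "my-app",
        "annotations": {INSTANCE_NAME_ANNOTATION: "postgresql-instance-0"},
    },
    "spec": {"containers": [{"name": "app", "image": "example/app:latest"}]},
}


def requested_instance(pod: dict) -> Optional[str]:
    """Return the .metadata.name of the requested PostgresqlInstance, if any."""
    annotations = (pod.get("metadata") or {}).get("annotations") or {}
    return annotations.get(INSTANCE_NAME_ANNOTATION)


print(requested_instance(pod))  # postgresql-instance-0
```

Pods without the annotation yield `None` and would be left untouched.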
Pods specifying the aforementioned annotation will be modified at creation time in the following way:
- The cloudsql.travelaudience.com/proxy-injected annotation will be added to the pod with the fixed value of true.
- A container running Cloud SQL proxy and properly configured in order to expose the referenced CSQLP instance at localhost:<port> (where <port> denotes a random port) is added to .spec.containers.
- The following environment variables are added to the .env field of every existing container:
  - PGHOST, containing the fixed value localhost;
  - PGPORT, containing the aforementioned value of <port>;
  - PGUSER, containing the fixed value postgres;
  - PGPASSFILE, containing the path to a PostgreSQL password file containing the password for PGUSER.
The names of the environment variables are chosen so that libpq-compatible applications (such as psql itself) are able to connect to the CSQLP instance without further configuration.
Non-libpq-compatible applications can still inspect the values of these environment variables and the PostgreSQL password file in order to connect to the CSQLP instance.
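The mutation steps listed above can be sketched as a pure function over a pod dictionary. This is an illustrative approximation rather than the webhook's real implementation: the proxy container is a placeholder (its name is hypothetical and its image and arguments are omitted), the password file path is an assumption, and secret mounting is left out entirely.

```python
import copy

PROXY_INJECTED_ANNOTATION = "cloudsql.travelaudience.com/proxy-injected"


def inject_proxy(pod: dict, port: int, passfile: str = "/var/run/secrets/pgpass") -> dict:
    """Return a mutated copy of the pod following the three steps in the text.

    The default passfile path and the proxy container below are assumptions.
    """
    mutated = copy.deepcopy(pod)
    annotations = mutated.setdefault("metadata", {}).setdefault("annotations", {})
    annotations[PROXY_INJECTED_ANNOTATION] = "true"

    env = [
        {"name": "PGHOST", "value": "localhost"},
        {"name": "PGPORT", "value": str(port)},
        {"name": "PGUSER", "value": "postgres"},
        {"name": "PGPASSFILE", "value": passfile},
    ]
    containers = mutated.setdefault("spec", {}).setdefault("containers", [])
    # Only containers that already exist receive the connection details.
    for container in containers:
        container.setdefault("env", []).extend(copy.deepcopy(env))
    # Placeholder sidecar; a real one would carry an image and proxy arguments.
    containers.append({"name": "cloud-sql-proxy"})
    return mutated


original = {
    "metadata": {"name": "my-app"},
    "spec": {"containers": [{"name": "app", "image": "example/app:latest"}]},
}
injected = inject_proxy(original, port=15432)
print([c["name"] for c in injected["spec"]["containers"]])  # ['app', 'cloud-sql-proxy']
print([e["name"] for e in injected["spec"]["containers"][0]["env"]])
```

Returning a copy instead of mutating in place keeps the original object available for computing a patch, which is how mutating admission webhooks usually report their changes.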
cloudsql-postgres-operator is intended to be deployed to a Kubernetes cluster.
It interacts with the Kubernetes API in order to watch for and process changes to PostgresqlInstance resources, as well as with the Cloud SQL Admin API in order to manage CSQLP instances and related resources.
It also interacts with Pod resources at creation time in order to inject the Cloud SQL proxy sidecar and the connection details for CSQLP instances as described above.
Finally, it creates Secret resources whenever a PostgresqlInstance is created, as well as whenever Pod resources requesting access to a CSQLP instance are created (and no secret for the instance exists in the target namespace yet).
To be able to access the Cloud SQL Admin API, and to establish connection between pods and the CSQLP instance, cloudsql-postgres-operator must be provided with the following information:
- The ID of the Google Cloud Platform project within which to manage Cloud SQL instances.
  - Management of CSQLP instances happens within the context of this project (only).
- The private key of an IAM service account with the roles/cloudsql.admin role on the aforementioned project.
  - This IAM service account is used directly by cloudsql-postgres-operator in order to access the Cloud SQL Admin API.
- The private key of an IAM service account with the roles/cloudsql.client role on the aforementioned project.
  - This IAM service account is used by pods in order to access the CSQLP instances.
ℹ️
|
A single IAM service account with |
These, as well as all other required configuration options, are to be provided to cloudsql-postgres-operator via a configuration file in TOML format, whose path is specified as a command-line flag.
cloudsql-postgres-operator is composed of two main components: an admission webhook, responsible for mutating both Pod resources and resources belonging to the API, and a controller responsible for watching changes to PostgresqlInstance resources and triggering a reconciliation function in response.
This reconciliation function uses the Cloud SQL Admin API in order to drive the current state of CSQLP instances in line with the desired state (as specified by the PostgresqlInstance resource).
The admission webhook is called whenever a Pod resource is created, as well as whenever a PostgresqlInstance resource is created, updated or deleted.
The reconciliation function is called whenever a given resource of the cloudsql.travelaudience.com API is created, updated or deleted, as well as periodically whenever the controller’s resync period elapses.
As mentioned above, the amount of time between successive iterations of the reconciliation function can be tweaked in order to prevent quota exhaustion.
The Cloud SQL Admin API reserves the names of deleted CSQLP instances for up to a week after they have been deleted.
Hence, cloudsql-postgres-operator may sometimes be unable to determine whether the name requested for a given PostgresqlInstance resource is available until it actually tries to create the instance.
In practice, this means that cloudsql-postgres-operator may be unable to reject upfront (i.e. using the admission webhook) the creation of a PostgresqlInstance resource requesting a reserved name, being only able to report the error later (i.e. during an iteration of the PostgresqlInstance controller).
This particular scenario is handled according to what is described in Quotas, limits and error handling.
When a PostgresqlInstance resource is deleted from the Kubernetes API, cloudsql-postgres-operator deletes the associated CSQLP instance from the Cloud SQL Admin API.
It might happen, though, that cloudsql-postgres-operator doesn’t have the chance to properly react to this event (for example, due to a crash or networking error).
In this situation, the CSQLP instance would be left orphaned.
In order to prevent such situations, cloudsql-postgres-operator makes use of finalizers in order to guarantee proper cleanup of CSQLP instances and associated resources, hence greatly reducing the chances of ending up with orphaned resources.
Due to the usage of finalizers, a resource from the cloudsql.travelaudience.com API is only permanently deleted from the Kubernetes API when the associated Cloud SQL Admin API resource has been cleaned up.
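The finalizer pattern described above can be illustrated with a small Python sketch operating on a resource dictionary. The finalizer name, the function `reconcile_deletion` and the `delete_instance` callback are hypothetical; the point is only that the finalizer is removed after cleanup succeeds, so a failed or interrupted cleanup is retried on a later reconciliation instead of leaving an orphan behind.

```python
FINALIZER = "cloudsql.travelaudience.com/finalizer"  # hypothetical name


def reconcile_deletion(resource: dict, delete_instance) -> str:
    """Ensure the finalizer while the resource is live; clean up once deletion is requested."""
    meta = resource["metadata"]
    finalizers = meta.setdefault("finalizers", [])
    if "deletionTimestamp" not in meta:
        if FINALIZER not in finalizers:
            finalizers.append(FINALIZER)
        return "active"
    if FINALIZER in finalizers:
        # If this call raises, the finalizer stays in place and cleanup is retried later.
        delete_instance(resource["spec"]["name"])
        finalizers.remove(FINALIZER)
    return "released"


deleted = []
resource = {"metadata": {"name": "postgresql-instance-0"}, "spec": {"name": "cloudsql-psql-123456"}}

print(reconcile_deletion(resource, deleted.append))  # active
resource["metadata"]["deletionTimestamp"] = "2019-01-01T00:00:00Z"
print(reconcile_deletion(resource, deleted.append))  # released
print(deleted)  # ['cloudsql-psql-123456']
```

Kubernetes itself only removes the object from its API once the list of finalizers is empty, which is what ties the lifetime of the custom resource to that of the CSQLP instance.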
Each iteration of each controller’s reconciliation function makes a number of requests to the Cloud SQL Admin API in order to understand what the current status of CSQLP instances and related resources is, and in order to drive said current status towards the desired status.
Hence, it is important to keep in mind that all quotas and limits mentioned in the Quotas and Limits page apply to cloudsql-postgres-operator.
In order to avoid hitting a quota limit in the first place, a sane default value is used for the PostgresqlInstance controller’s resync period (i.e. the maximum amount of time between successive reconciliations of every PostgresqlInstance resource).
This value is further made configurable by the cluster operator so that it can be tweaked according to the particular quota limits of the Google Cloud Platform project targeted by cloudsql-postgres-operator.
In the unlikely event of a quota limit being reached, or whenever an error occurs, the Cloud SQL Admin API responds with an error.
Depending on the severity and context of said error, the PostgresqlInstance controller may or may not be able to recover.
In cases where the controller cannot recover, the current iteration of the reconciliation function is marked as failed, a Kubernetes event associated with the resource being processed is emitted, and reconciliation is attempted again after the controller’s resync period elapses (or when the resource is modified, whichever comes first).
It should also be noted that the Cloud SQL proxy itself consumes quota from the Cloud SQL Admin API, at a rate of two requests per Cloud SQL proxy instance per hour (plus an additional few requests when starting).
As an example, a Deployment with three replicas requesting access to a CSQLP instance and running 24/7 consumes approximately 144 requests per day.
In the unlikely event of a quota limit being reached, the Cloud SQL proxy will cease to function until quota is replenished.
This limitation can only be worked around by requesting a quota increase.
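The figure of 144 requests per day follows directly from the stated rate. The short Python sketch below reproduces the arithmetic; the function name is hypothetical and the few extra requests made at proxy startup are deliberately ignored.

```python
def proxy_requests_per_day(proxy_instances: int, requests_per_hour: int = 2) -> int:
    """Steady-state Cloud SQL Admin API requests made by running proxies in 24 hours."""
    return proxy_instances * requests_per_hour * 24


# A Deployment with three replicas, each with its own injected proxy sidecar.
print(proxy_requests_per_day(3))  # 144
```

The same function can be used to estimate how many proxy-enabled pods a project can sustain under a given daily quota.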
ℹ️ The usually high quota limits of the Cloud SQL Admin API, combined with the low expected number of requests made by cloudsql-postgres-operator and by instances of the Cloud SQL proxy (which can be reduced further by tweaking the cloudsql-postgres-operator configuration), make it highly unlikely that quota is exhausted for a given Google Cloud Platform project.
Because the postgres user does not have the SUPERUSER attribute [2], cloudsql-postgres-operator cannot provide complete and reliable on-demand backups of CSQLP instances to external storage.
For a related reason [3], restore functionality cannot be implemented reliably.
Hence, backup and restore functionality in cloudsql-postgres-operator is limited to enabling daily backups and customizing their schedule.
On-demand backups and restores can still be performed using the Google Cloud Console, the gcloud CLI or the Cloud SQL Admin API.