
Revisiting Data and Sample repository deployments #277

@bmotevalli

Background

Since migrating to EASI V2 (AWS EKS Auto Mode), we have experienced frequent container restarts for the CKAN applications. To improve stability under this new environment, we’ve explored two primary strategies:

1. Dedicated Node Pool (via EASI team):

  • Low-Spec Nodes: Ensures CKAN apps are scheduled on lightweight nodes suitable for web apps (not heavy computations), making them less likely to be terminated by EKS Auto Mode optimization.
  • Scheduled PodDisruptionBudgets (PDBs): Applied during business hours (6:00 AM–6:00 PM) to reduce disruptions. While helpful, this approach does not guarantee uninterrupted availability.
2. High Availability Setup with PDB + Anti-Affinity:

  • Run 2+ replicas of each app, with pod anti-affinity rules to spread them across different nodes.
  • Apply a PDB to ensure at least one replica remains live at all times.
  • This approach works well for stateless services, but it is not ideal for CKAN because of its stateful dependencies, specifically the SOLR service.
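
To make the second strategy concrete, here is a minimal sketch of the two pieces it combines, built as plain Python dicts that can be dumped to JSON for `kubectl apply -f`. The app name `ckan`, the label key `app`, and the `-pdb` name suffix are assumptions for illustration and are not taken from our actual manifests.

```python
import json


def build_pdb(app: str, min_available: int = 1) -> dict:
    """PodDisruptionBudget that keeps at least `min_available` pods of `app` up
    during voluntary disruptions (for example node consolidation)."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": f"{app}-pdb"},  # hypothetical naming scheme
        "spec": {
            "minAvailable": min_available,
            "selector": {"matchLabels": {"app": app}},  # assumed pod label
        },
    }


def build_anti_affinity(app: str) -> dict:
    """Affinity stanza for a pod template: no two pods labelled `app`
    may be scheduled onto the same node."""
    return {
        "podAntiAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    "labelSelector": {"matchLabels": {"app": app}},
                    # One pod per node: the topology domain is the hostname.
                    "topologyKey": "kubernetes.io/hostname",
                }
            ]
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_pdb("ckan"), indent=2))
    print(json.dumps(build_anti_affinity("ckan"), indent=2))
```

The affinity dict would go under `spec.template.spec.affinity` of the workload. With 2 replicas and `minAvailable: 1`, only one pod can be evicted at a time, which is what keeps one replica live. It does nothing for the SOLR state issue described above.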

Problem

Although we are already using the first approach, it does not fully prevent container restarts.
The second strategy is more robust but assumes statelessness, making it less suitable for CKAN, which relies on stateful components like SOLR.

Possible Solutions

  1. Explore setting up ReplicaSets for both SOLR and CKAN, with proper anti-affinity and PDB configurations, and evaluate how that affects the apps' performance and consistency.

  2. Use a Kubernetes CronJob to apply PDBs dynamically:

    • Enable PDBs on weekdays to protect CKAN during work hours.

    • Disable them on weekends to allow EKS Auto Mode to optimize the cluster.

  3. Explore solutions outside EASI, such as deploying CKAN and SOLR on dedicated EC2 instances for full control and stability.
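
For option 2, a rough sketch of the scheduling rule the CronJobs would encode. It assumes the 6:00 AM to 6:00 PM window mentioned in the background also applies here, and that one CronJob creates the PDB and another deletes it. The constant names and the function are hypothetical.

```python
from datetime import datetime, time

# Assumed protection window, taken from the business hours in the background.
BUSINESS_START = time(6, 0)
BUSINESS_END = time(18, 0)

# Standard five-field cron expressions (minute hour day-of-month month day-of-week).
ENABLE_SCHEDULE = "0 6 * * 1-5"    # Mon-Fri 06:00: create the PDB
DISABLE_SCHEDULE = "0 18 * * 1-5"  # Mon-Fri 18:00: delete the PDB


def pdb_should_be_active(now: datetime) -> bool:
    """True when `now` falls on a weekday inside the protection window."""
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and BUSINESS_START <= now.time() < BUSINESS_END
```

Because the PDB is removed on Friday evening and not recreated until Monday morning, weekends stay open for EKS Auto Mode to optimize the cluster. The schedules are evaluated in whatever time zone the CronJob is configured with, so that would need to be checked against local business hours.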
