Releases: finos/htc-grid
v0.4.3
What's changed
This release fixes all the relevant security issues in the current code base, as detected by cfn_lint, trivy, checkov, and ScoutSuite.
Terraform State:
- Encrypt and secure the init_grid state and Lambda buckets.
- Limit the scope of the KMS key policy for the state buckets.
- Remove AccessControls and use a BucketPolicy to keep the bucket private.
- Configure all Makefiles to use encrypted S3 buckets for TF state, use non-root Dockerfiles, fix `HTCGRID_ECR_REPO`, name the CloudFormation stack outputs, and support updating an existing init_grid stack.
- Improve the init_grid Makefile to better handle the initial-creation and deletion cases.
- Add support for cleaning up S3 object versions and standardize bucket variable naming.
HTC Grid Containers:
- Configure all Dockerfiles to run non-root containers and fix builds.
- Configure all HTC K8S resources to run with `runAsNonRoot`, a default `seccompProfile`, and `allowPrivilegeEscalation` disabled.
- Rename components, add `readOnlyFileSystem` and a seccomp profile to the HTC Agent, and fix and clean up code.
- Remove file system write dependencies for the agent.
- Harden the K8S manifests and enforce further checkov rules.
- Configure the Grafana Ingress to drop invalid HTTP header fields.
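Taken together, the container hardening above maps to a small set of `securityContext` fields. A minimal sketch via the Terraform kubernetes provider, with illustrative resource and image names (the actual charts and manifests in the repo may differ):

```hcl
# Sketch only: names and image are illustrative, not the repo's actual values.
resource "kubernetes_deployment_v1" "htc_agent" {
  metadata {
    name = "htc-agent"
  }

  spec {
    selector {
      match_labels = { app = "htc-agent" }
    }

    template {
      metadata {
        labels = { app = "htc-agent" }
      }

      spec {
        container {
          name  = "agent"
          image = "example.local/htc-agent:latest"

          security_context {
            run_as_non_root            = true  # refuse to start as UID 0
            allow_privilege_escalation = false # no setuid-style escalation
            read_only_root_filesystem  = true  # relies on removing FS writes, as above
            seccomp_profile {
              type = "RuntimeDefault" # runtime's default syscall filter
            }
          }
        }
      }
    }
  }
}
```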
HTC Grid Control Plane:
- Configure CMK KMS key encryption for VPC Flow Logs, ECR repositories, SQS, DynamoDB, S3, the EKS cluster, EKS MNG EBS volumes, and all CloudWatch Logs.
- Add encrypted CloudWatch logging for API Gateway.
- Create S3 via a TF module, add encryption support for the S3 data plane in the agent, and fix AWS partition and DNS suffix usage.
- Simplify code and move all Lambdas and auth to the control_plane.
- Configure and consolidate least-privilege permissions on the KMS, Lambda, and Agent IAM policies.
- Add KMS Decrypt and GenerateDataKey permissions to the Lambda and Agent policies.
- Move the installation of jq onto the Lambda images and fix the bootstrap script.
- Convert ElastiCache Redis to a single-replica cluster mode and add encryption.
- Add AUTH for the ElastiCache Redis cluster.
- Enable X-Ray tracing for the Lambda functions and adjust the Redis config.
- Add an explicit ASG service-linked role declaration to enable KMS support for ASG EBS volumes.
- Handle cases where AWSServiceRoleForAutoScaling already exists.
- Add S3 and SQS resource policies to enforce HTTPS, and create a separate CMK KMS key for the DLQ of each SQS queue.
- Configure the DLQs to be used with their respective SQS queues and fix naming/references.
- Add security group and ACL controls where possible.
- Configure the securityContext for OpenAPI.
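The HTTPS enforcement mentioned above is typically a Deny statement keyed on `aws:SecureTransport`. A sketch of such a bucket policy in Terraform, assuming a hypothetical `aws_s3_bucket.data` resource:

```hcl
# Sketch only: the bucket reference is hypothetical.
resource "aws_s3_bucket_policy" "https_only" {
  bucket = aws_s3_bucket.data.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "DenyInsecureTransport"
      Effect    = "Deny"
      Principal = "*"
      Action    = "s3:*"
      Resource = [
        aws_s3_bucket.data.arn,        # the bucket itself
        "${aws_s3_bucket.data.arn}/*", # all objects in it
      ]
      Condition = {
        Bool = { "aws:SecureTransport" = "false" } # any non-TLS request
      }
    }]
  })
}
```

The analogous SQS policy uses the same condition key in an `aws_sqs_queue_policy`.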
General:
- Add GitHub workflows for cfn_lint, trivy, and checkov.
- Standardize, fix, and simplify the tests.
- Standardize the naming of TF resources.
- Fix the docs and `random_password` to align with the pipelines.
- Add auto deploy & destroy stages for images.
- Update all copyright notices to the current year (2024).
Cloud9:
- Fix the Cloud9 deployment script to target the correct instances.
- Fix the Cloud9 bootstrap race condition and adjust it for WS.
- Force a reinstall at bootstrap time to fix virtualenv issues.
- Add support for specifying a Git repo/branch for HTCGridSource.
- Remove the Admin role from the KMS admins, as it doesn't exist in WS.
Full Changelog: v0.4.2...v0.4.3
v0.4.2
- Remove CDK as IaC for deploying HTC Grid.
- Remove any hardcoded dependency on `urllib3`.
- Migrate the Lambda function runtime from Python 3.7 to Python 3.11.
v0.4.1
- Move the deployment of the Helm charts outside of the EKS Blueprints Addons module to native TF resource(s), to better handle the resource dependencies on those addons and simplify code.
- Switch the Grafana ingress to use the new `ingressClassName` spec field instead of the deprecated `kubernetes.io/ingress.class` annotation.
- Switch to using the `kubernetes_annotations` TF resource to manage the Cognito annotations for the Grafana Ingress.
- Adjust the workshop notes on creating a Cognito user for a user pool with sign-up disabled.
- Add the ability to always use the latest released tag in the Cloud9 instance deployment.
- Fix the Private API Gateway and resource policy race condition/dependency.
- Fix `image_repository` destroy issues introduced since adding explicit region flags to the ECR commands.
- Fix a missing comma in `state_table_dynamodb.py`.
- Add an explicit region flag when listing ECR repos in the workshop.
- Clean up and adjust workshop notes, code, comments, and other docs (i.e. the FSI Whitepaper link).
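For reference, the `ingressClassName` switch replaces an annotation with a first-class spec field. A minimal sketch using the Terraform kubernetes provider, where the class, service name, and port are assumptions rather than the repo's actual values:

```hcl
# Sketch only: class, host, service, and port are illustrative.
resource "kubernetes_ingress_v1" "grafana" {
  metadata {
    name = "grafana"
    # previously: annotations = { "kubernetes.io/ingress.class" = "alb" }
  }

  spec {
    ingress_class_name = "alb" # first-class field, replaces the annotation

    rule {
      http {
        path {
          path      = "/"
          path_type = "Prefix"
          backend {
            service {
              name = "grafana"
              port {
                number = 80
              }
            }
          }
        }
      }
    }
  }
}
```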
v0.4.0
EKS Cluster & Nodes:
- Change to using terraform-aws-modules/eks for managing and deploying the EKS cluster as well as related resources, such as: node IAM roles & policies, node defaults (incl. instance types), security groups, and the AWS Auth ConfigMap.
- Change to using EKS Managed Node Groups for all of the Core and Worker node groups.
- Configure the Cluster Autoscaler to manage the scaling and lifecycle of the EKS Managed Node Groups.
- Disable the AWS Node Termination Handler, as it shouldn't be used in conjunction with EKS Managed Node Groups.
- Simplify and standardize VPC Endpoint creation. Add an EKS private VPC endpoint to allow internal communication from the private subnet with the EKS control plane.
- Change node taints from `grid/type: Operator` to `htc/node-type: core` and `htc/node-type: worker`. Add those as labels and tags as well, to simplify operations and cluster visibility via kubectl and other monitoring solutions.
- Adjust the default instance types for the Core and Worker node groups to allow for better diversification and deployment, both for OnDemand and Spot workloads.
- Change to using `cluster_name` instead of `eks_cluster_id` everywhere, in line with the new module changes.
- Add the ability to specify the EBS volume type and size for the EKS nodes.
EKS AddOns:
- Change to eks-blueprints-addons for managing and deploying all of the EKS Blueprints AddOns and OSS Helm releases, such as: CoreDNS, kube-proxy, VPC CNI, FluentBit, Cluster Autoscaler, AWS Load Balancer Controller, CloudWatch Metrics, KEDA, InfluxDB, Prometheus & Grafana, as well as all the relevant configuration.
- Add implicit and explicit dependencies to fix the race conditions where the AWS Load Balancer Controller may get deleted before being able to clean up the AWS resources that it manages. The new dependency order guarantees a proper cleanup of those resources before the AWS Load Balancer Controller is destroyed during unprovisioning.
- Fix the explicit and implicit dependencies between the Kubernetes data sources and the underlying resources created by the EKS Blueprints Addons module.
- Move ingress and dashboard creation for Grafana to be handled via the Helm chart, and clean up the unneeded additional Terraform resources. Add the Grafana Ingress URL as a Terraform output for the module.
- Adjust the image and repo configuration to pull the correct version of the Cluster Autoscaler and other components.
- Adjust the node selectors for the FluentBit and CloudWatch agent DaemonSets to deploy to all nodes.
- Switch to using the new Go-based high-performance FluentBit logger for CloudWatch.
- Disable the Grafana Live server (as it requires WebSockets).
- Add cookie-based session stickiness to the Grafana ingress to allow the ALB Controller and the Grafana HA deployment to handle auth properly.
- Fix the FluentBit-based Container Insights logs.
- Extend the CoreDNS creation timeout to 25 minutes to allow the control plane to self-heal in case of issues.
HTC-Grid:
- Change to using eks-blueprints-addon for deploying the HTC-Grid Helm chart as well as creating the respective IRSA role.
- Adjust IAM policies & permissions (ensuring CloudWatch Log Group lifecycle handling is done via Terraform), as well as formatting and naming, to ensure consistency for all the Lambdas.
- Split the control plane Lambda definitions into individual TF files, simplifying configuration, visibility, and grouping for the resources created.
Terraform & Helm:
- Adjust all of the Terraform Registry modules to use `~>` version pinning, allowing any new non-major versions to be used (any minor and patch updates are allowed), simplifying dependency version updates and ensuring consistency.
- Upgrade all of the Terraform modules from the Terraform Registry to the current latest versions.
- Upgrade all of the Terraform providers to the latest available versions, with major version pinning using the `~>` operator.
- Upgrade all of the Helm charts and container images to the current latest versions for all of the components.
- Remove image-level pinning of Helm AddOn components, pinning only via the Helm release versions.
- Remove unneeded explicit `depends_on` statements, which cause slowness and cyclic dependencies or failures on plan (by not allowing data sources to be computed before an apply).
- Fix a cyclic dependency and remove the need for running targeted applies for the IAM policies for the EKS Pull Through Cache and Agent permissions in the `apply`/`auto-apply` stages.
- Move to using `aws_api_gateway_rest_api_policy` instead of a direct policy attachment of a generic policy for `OpenAPI Private`, which showed changes on every `terraform apply` due to the wildcard allow policy.
- Configure the AWS CloudWatch Metrics and AWS for FluentBit deployments to run on the Core nodes.
- Configure Grafana to start two replicas and spread them across different nodes for high availability.
- Clean up the Helm chart `values.yaml` files, removing any unneeded and unrequired config and simplifying the deployments. Consolidate the Helm chart versions into a single variable for ease of change and visibility.
- Remove unneeded data sources and use module outputs as required, to also enforce consistent implicit dependencies in Terraform.
- Simplify and consolidate the variable definitions, usage, and functions across all of the resources and modules.
- Adjust output and variable descriptions, types, and values to reflect the required information and ensure consistency.
- Adjust provider configurations to ensure correct credential retrieval and handling.
- Use `aws_htc_ecr` consistently across all of the Helm charts as the ECR source repository for pulling internal and pull-through images.
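The `~>` (pessimistic) constraint mentioned above permits upgrades of the rightmost version component only. A small sketch with hypothetical pins:

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # any 5.x release, never 6.0 (hypothetical pin)
    }
  }
}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.1" # >= 5.1.0 and < 6.0.0 (hypothetical pin)
}
```

A three-part pin such as `~> 5.1.0` would instead allow only patch releases (`>= 5.1.0, < 5.2.0`).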
New Features:
- Upgrade ElastiCache to version 7 and start using AWS Graviton3 based `cache.r7g.large` instance(s) for the Redis cluster.
- Add the ability to do in-place upgrades of the ElastiCache clusters by versioning the Parameter Groups created/used.
- Add a `watch_htc.sh` script, which can be used to monitor the status of a Kubernetes job running tasks on HTC-Grid, as well as the status of the overall compute plane, including the HPA, Deployment, Nodes, and Job Completion statuses as well as durations. The script takes two arguments: the namespace to be watched and the name of the Kubernetes job.
- Add support for correct handling of the AWS Partition as well as the AWS Partition DNS Suffix.
- Add the ability to automatically manage the lifecycle of the self-signed ALB certificates via the deployment process (any certs about to expire get automatically updated and rolled out without any downtime).
- Migrate to using AWS Certificate Manager instead of IAM Server Certificates for the ALB certs.
- Increase the self-signed ALB cert validity to 1 year, with auto-renew if run within 6 months of the expiration time.
- Add the ability to automatically create, update, and destroy an `admin` Cognito user via the deployment, to be used for Grafana authentication, reducing the need for manual steps during the setup as well as the workshop.
- Add user cleanup on `destroy` for the `admin` Cognito user (created for use with Grafana), as well as the relevant Cognito config for the Grafana Ingress.
- Switch to creating the Cognito user for Grafana using TF-native resources.
- Switch the `grafana_admin_password` variable to be sensitive everywhere.
- Add a template file and generation for submitting a batch of multi-session tasks, instead of copying/replacing at runtime during the workshop. Adjust the docs/workshop accordingly.
Lambda Runtimes:
- Unify all of the `lambda_runtimes` into a single Dockerfile, driving behavior via build-time arguments.
- Add package updates at build time (incl. cache clearing post-update), to ensure the latest updates are always included in the runtime images.
- Migrate all build runtimes to use the ECR Pull Through Cache for the build images.
- Simplify and consolidate the Lambda runtime build-and-push Terraform resources into a single map of resources.
- Fix the Lambda runtimes Dockerfile to handle a different entrypoint source script for the provided runtime.
ECR & Image Builds:
- Change all container images to use the ECR Pull Through Cache where possible.
- Add a new pull-through-cache config for `registry.k8s.io`, to allow pulling any cluster components automatically, e.g. the `cluster-autoscaler`.
- Add a flag (`REBUILD_RUNTIMES`) which allows re-creating the local images for all the runtimes (without using the cache) and pushing them to ECR.
- Clean up `image_repository`, keeping the minimum number of required external dependencies (those not available via an ECR Pull Through Cache) to be manually copied over to the local ECR repositories.
- Add the ability to clean up the ECR Pull Through Cache repositories upon running `destroy-images`.
- Add image scanning on push/upload for all of the ECR repositories.
- Move to using `for_each` instead of `count` for the ECR repositories, ensuring they don't get destroyed by a simple order change in the JSON config.
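The `for_each` change above matters because `count` addresses resources by list index, so reordering the config re-indexes and therefore destroys/recreates repositories, whereas `for_each` keys them by name. A sketch with illustrative repository names:

```hcl
variable "repositories" {
  type    = set(string)
  default = ["lambda-runtime", "htc-agent"] # illustrative names only
}

resource "aws_ecr_repository" "this" {
  # Addressed as aws_ecr_repository.this["lambda-runtime"], etc.,
  # which is stable under reordering of the input.
  for_each = var.repositories
  name     = each.key

  image_scanning_configuration {
    scan_on_push = true # scan every pushed image
  }
}
```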
Cloud9:
- Fix all of the Cloud9 bootstrap errors, handle different packages, correctly install and upgrade all the components, and improve the bootstrap logging to increase visibility into the success or issues of the Cloud9 deployment.
- Update default versions for all pre-requisites for the Clo...
v0.3.6
- Adding support for Java based Lambda Workers #64
- Adding automated Bandit security checks for pull requests #55
- DynamoDB degrading state refactoring #52
- Fixing instance profile association in the context of Config rule #51
- Fix: automatically added timestamp upon task completion into DDB #43
- Fixing Cloud9 deployment outside of EventEngine #46
- Adding CDK as a deployment tool for the HTC Grid #39
- demo update 2215871
- feat: migration tentative to EKS blueprint d65abca
- Adding Java runtime for Worker Lambdas + QuantLib example 9444a17
v0.3.5
Merge pull request #38 from ruecarlo/main: fixed issue in Cloud9 environment.