mlnrt/aws-monster-chaos-game

The AWS Monster Chaos Game

This is a chaos engineering game for AWS, built with the AWS CDK and an Adafruit PyPortal microcontroller. It includes:

  • A simple 4-tier web application stack spanning 3 Availability Zones
  • A chaos engineering stack to inject failures into the web application stack
  • An IoT stack that registers an Adafruit PyPortal microcontroller with AWS IoT Core and triggers chaos experiments
  • A CircuitPython game for the Adafruit PyPortal microcontroller

The Architecture

Here is a video explaining the architecture:

AWS Chaos Game - The Architecture

The Demo

Here is a demo of the game in action:

AWS Monster Chaos Game - The Demo

Want to Play the Game Yourself?

The Prerequisites

The list below is for a Windows environment.

If you want the full experience, you will need an Adafruit PyPortal, which you can purchase here. If you don't have one, you can still deploy the stacks and manually trigger the AWS Step Functions state machine to inject failures into the web application stack.

The Setup

  1. Set up your Adafruit PyPortal. Please refer to this README
  2. (Optional) Edit the application settings in the webapp-config.json file. These parameters configure the AWS Fargate Tasks and the Amazon CloudWatch Alarms for the web application.
    • app: Configuration of the web application AWS Fargate Task
      • namespace: The internal domain name of the web application AWS Fargate Task configured on AWS CloudMap and Amazon Route 53
      • name: Name of the web application. The web application Tasks will be internally available at http://<app.name>.<app.namespace>:<app.port> to the Nginx reverse proxy.
      • path: Path of the web application. The web application Tasks will be available externally at the URL http://<LoadBalancer DNS Name>/<app.path>
      • port: Port of the web application.
      • healthCheckPath: Path of the health check of the web application. This is used by CloudMap to check the health of the AWS Fargate Tasks.
    • nginx: Configuration of the Nginx reverse proxy AWS Fargate Task
      • port: Port of the Nginx reverse proxy.
      • healthCheckPath: Path of the health check of the Nginx reverse proxy. This is used by the ALB to monitor the health of the Nginx reverse proxy AWS Fargate Tasks.
    • fis: Configuration of the alarm level at which an AWS FIS experiment is stopped.
      • numberOfEvaluationPeriods: The number of 1-minute periods during which the metric must be above the threshold to raise an alarm.
      • alarmErrorThresholdPerPeriod: The number of acceptable errors during a 1-minute period. If the metric is above this threshold for numberOfEvaluationPeriods periods, an alarm is raised and the AWS FIS experiment is stopped.
{
    "app": {
        "namespace": "aws-chaos.game",
        "name": "app",
        "path": "/game",
        "healthCheckPath": "/health",
        "port": 3000
    },
    "nginx": {
        "port": 8080,
        "healthCheckPath": "/health"
    },
    "fis": {
        "numberOfEvaluationPeriods": 2,
        "alarmErrorThresholdPerPeriod": 50
    }
}
  3. Deploy the stacks on AWS. Please refer to this README
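
To make the configuration parameters from step 2 more concrete, here is a minimal Python sketch of how the values in webapp-config.json combine into the internal and external URLs, and how the two fis settings could translate into an alarm decision. The helper names (internal_url, external_url, alarm_raised) and the load balancer host name are hypothetical and not part of this repository. The alarm logic is a simplification that assumes every one of the most recent evaluation periods must be strictly above the threshold.

```python
import json

# Same values as the default webapp-config.json shown above.
CONFIG_TEXT = """
{
    "app": {"namespace": "aws-chaos.game", "name": "app", "path": "/game",
            "healthCheckPath": "/health", "port": 3000},
    "nginx": {"port": 8080, "healthCheckPath": "/health"},
    "fis": {"numberOfEvaluationPeriods": 2, "alarmErrorThresholdPerPeriod": 50}
}
"""


def internal_url(config):
    """URL the Nginx reverse proxy uses to reach the web application Tasks."""
    app = config["app"]
    return f"http://{app['name']}.{app['namespace']}:{app['port']}"


def external_url(config, load_balancer_dns):
    """URL users reach through the load balancer (the DNS name is an input)."""
    path = config["app"]["path"].lstrip("/")  # default path already starts with "/"
    return f"http://{load_balancer_dns}/{path}"


def alarm_raised(errors_per_minute, config):
    """Simplified check: True if each of the last N 1-minute periods is above the threshold."""
    periods = config["fis"]["numberOfEvaluationPeriods"]
    threshold = config["fis"]["alarmErrorThresholdPerPeriod"]
    recent = errors_per_minute[-periods:]
    return len(recent) == periods and all(count > threshold for count in recent)


config = json.loads(CONFIG_TEXT)
print(internal_url(config))                        # http://app.aws-chaos.game:3000
print(external_url(config, "my-alb.example.com"))  # http://my-alb.example.com/game
print(alarm_raised([10, 60, 75], config))          # True: the last two minutes exceed 50 errors
```

With the defaults, a single bad minute does not stop an experiment; two consecutive minutes with more than 50 errors do.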

How does it work?

Important: As per the application configuration displayed above, the default URL of the web application is http://<LoadBalancer DNS Name>/game

The Adafruit PyPortal microcontroller connects to AWS IoT Core using certificates that are generated by AWS IoT Core and uploaded to the device.

  1. When you play the game on the Adafruit PyPortal microcontroller and lose, it sends an MQTT message to AWS IoT Core.
  2. An IoT Rule then starts an AWS Step Functions state machine for the Fault Injection experiment.
  3. The state machine invokes an AWS Lambda function, which randomly picks one of the AWS Fault Injection Service experiments and starts it. Every 5 seconds, a second AWS Lambda function checks whether the experiment is still running. If it is, a third AWS Lambda function generates traffic on the web application ALB. This simulates user traffic and produces errors if the application cannot handle the chaos generated by the monsters.
  4. If an alarm is raised in Amazon CloudWatch, the experiment will stop and the state machine will update the lost score in the Amazon DynamoDB table.
  5. If an alarm is not raised in Amazon CloudWatch, the experiment will continue to the end and the state machine will update the won score in the Amazon DynamoDB table.
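
The control flow of steps 3 to 5 can be sketched in plain Python as follows. This is an illustration only, not the repository's state machine definition: the function run_chaos_round, its callback parameters, and the status strings "running", "stopped" and "completed" are simplified stand-ins chosen for this example. The real implementation runs as separate AWS Lambda functions orchestrated by AWS Step Functions.

```python
import random


def run_chaos_round(experiments, start, get_status, generate_traffic, wait,
                    poll_seconds=5):
    """Pick a random experiment, poll it until it ends, and return the outcome."""
    experiment = random.choice(experiments)  # step 3: random pick
    run_id = start(experiment)
    while True:
        status = get_status(run_id)          # checked every poll_seconds
        if status != "running":
            break
        generate_traffic()                   # simulate user traffic on the ALB
        wait(poll_seconds)
    # Step 4: stopped early (alarm raised) counts as lost.
    # Step 5: ran to the end counts as won.
    return "lost" if status == "stopped" else "won"


# Usage with in-memory fakes instead of real AWS calls.
statuses = iter(["running", "running", "stopped"])
traffic_calls = []
outcome = run_chaos_round(
    experiments=["experiment-a", "experiment-b"],
    start=lambda name: f"run-of-{name}",
    get_status=lambda run_id: next(statuses),
    generate_traffic=lambda: traffic_calls.append(1),
    wait=lambda seconds: None,
)
print(outcome, len(traffic_calls))  # lost 2
```

Passing the AWS interactions in as callbacks keeps the sketch runnable without an AWS account; in a real deployment each callback would correspond to one of the Lambda functions described above.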

Fixing the Nginx Reverse Proxy

As mentioned in the demo, the Nginx reverse proxy is misconfigured on purpose. This simulates an application that cannot handle the chaos generated by the monsters, and shows how Fault Injection experiments help reveal such issues. If you want to test how the application handles the chaos when the Nginx reverse proxy is properly configured, you can edit the Nginx reverse proxy configuration.

In the /resources/services/nginx/nginx.conf file, uncomment the following lines:

      # resolver 169.254.169.253;
      # set $upstream_endpoint "http://<APP-NAME>.<APP-NAMESPACE>:<APP-PORT>";
      # proxy_pass $upstream_endpoint$request_uri;

And comment out the line below:

      proxy_pass http://<APP-NAME>.<APP-NAMESPACE>:<APP-PORT>;
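
If you prefer to script this edit, the following Python sketch applies both changes to the text of a configuration file. It is an assumption-laden helper, not part of this repository: apply_nginx_fix is a hypothetical name, the surrounding location block in the sample is invented for illustration, and the matching relies on the lines looking exactly like the ones quoted above. The likely reason the fix helps is that Nginx resolves a literal host name in proxy_pass once at startup, whereas a variable combined with a resolver directive makes it look the name up again at request time.

```python
import re

# Commented-out lines that should become active.
TO_UNCOMMENT = re.compile(
    r"#\s*(resolver\s|set \$upstream_endpoint\s|proxy_pass \$upstream_endpoint)"
)


def apply_nginx_fix(conf_text):
    """Uncomment the resolver-based lines and comment out the static proxy_pass."""
    fixed = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        indent = line[: len(line) - len(line.lstrip())]
        if TO_UNCOMMENT.match(stripped):
            fixed.append(indent + stripped.lstrip("#").lstrip())
        elif stripped.startswith("proxy_pass http://"):
            fixed.append(indent + "# " + stripped)
        else:
            fixed.append(line)
    return "\n".join(fixed)


# Sample input; the "location / { ... }" wrapper is an assumption for this example.
SAMPLE = """\
location / {
      # resolver 169.254.169.253;
      # set $upstream_endpoint "http://<APP-NAME>.<APP-NAMESPACE>:<APP-PORT>";
      # proxy_pass $upstream_endpoint$request_uri;
      proxy_pass http://<APP-NAME>.<APP-NAMESPACE>:<APP-PORT>;
}"""

fixed = apply_nginx_fix(SAMPLE)
print(fixed)
```

Running the function a second time on its own output leaves the text unchanged, so it is safe to apply more than once.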

Code References, Credits and Licenses
