This assignment imitates the Animoto web service on AWS: it generates videos from pictures uploaded by clients. boto3, boto (boto2) and Python 3 must be installed on the machine used to run and test the program.
+---------+    +-----------------------------------+        +----------+
| client1 |    |                                   |        | service1 |
| client2 |--->|... message3, message2, message1   |------->| service2 |
|   ...   |    |                                   |        |   ...    |
+---------+    +-----------------------------------+        +----------+
                           Message Queue                    service pool
-
minimoto_setup.py
Creates configure.ini to store information about all AWS resources generated through the boto3 API:
[INIT]
key_file =
key_file_name =
region_name = ap-southeast-2
aws_access_key =
aws_secret_access_key =
service_ec2_type = t2.xlarge
[SQS]
queue_name =
visible_time =
queue_url =
[S3]
bucket_input =
bucket_output =
[EC2]
client_identity =
service_identity =
watchdog_identity =
client_id =
watchdog_id =
service_id =
[SECURITY_GROUP]
sg_name =
security_group_id =
[AMI]
image_name =
service_ami =
[OUTPUT]
s3_bucket_input =
s3_bucket_output =
client_user =
client_addr =
watchdog_user =
watchdog_addr =
service_user =
service_addr =
service_ami =
-
This key-value file is shared by the client, service, watchdog, setup and cleanup programs. Because S3 bucket names must be globally unique, uuid.uuid4() is used to generate a 32-character unique identifier. The values of client_identity, service_identity, watchdog_identity and the other resource names are used as the value of the Name tag, which marks the resources created by this project and makes rollback after a failed setup, cleanup and filtering straightforward. The remaining values are either returned by the AWS API or supplied by the user running the tests.
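As an illustration of how the shared file and the unique bucket names fit together, here is a minimal sketch using only the standard library. The prefixes "minimoto-input" and "minimoto-output" and the helper names are assumptions made for this example, not the names used by minimoto_setup.py.

```python
import configparser
import uuid


def unique_bucket_name(prefix):
    """Append a 32-character hex UUID so the bucket name is globally unique."""
    return f"{prefix}-{uuid.uuid4().hex}"


def write_config(path, region="ap-southeast-2"):
    """Write a reduced configure.ini with freshly generated bucket names."""
    config = configparser.ConfigParser()
    config["INIT"] = {"region_name": region}
    config["S3"] = {
        "bucket_input": unique_bucket_name("minimoto-input"),    # hypothetical prefix
        "bucket_output": unique_bucket_name("minimoto-output"),  # hypothetical prefix
    }
    with open(path, "w") as handle:
        config.write(handle)


def read_buckets(path):
    """Return (bucket_input, bucket_output) as stored in configure.ini."""
    config = configparser.ConfigParser()
    config.read(path)
    return config["S3"]["bucket_input"], config["S3"]["bucket_output"]
```

Every program that receives a copy of configure.ini can call read_buckets("configure.ini") and will see the same bucket names.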
-
In total, the cloud deployment consists of two S3 buckets, one SQS queue, three EC2 instances and one image (AMI). A new security group is created to define the traffic rules.
-
This part of the code first builds a session for calling the EC2, S3, SQS and other resource APIs; second, it creates one SQS queue and two S3 buckets; third, it creates a security group; fourth, it launches all EC2 instances and verifies that they are available and reachable; finally, it configures all instances, makes an image of the service instance and copies the shared file (configure.ini) to all instances with scp. Specifically, the VisibilityTimeout of the SQS queue is set to 630 s, which means that once a message has been received it is hidden from other consumers until 630 seconds have passed. The client and watchdog instances are of type 't2.micro' and the service instance uses 't2.large'. The Ubuntu 16 image ami-96666ff5 is used as the template for all instances. Configuring an instance means installing all necessary software packages on the cloud machines and configuring the service, e.g. python3, python2, boto3 and boto.
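The queue creation step could look roughly like the sketch below. It assumes sqs_client behaves like boto3.client("sqs"); the function name and the constant are made up for this example and are not taken from minimoto_setup.py.

```python
VISIBILITY_TIMEOUT_SECONDS = 630  # value stated in the text above


def create_request_queue(sqs_client, queue_name):
    """Create the request queue and return its URL.

    sqs_client is assumed to behave like boto3.client("sqs").
    """
    response = sqs_client.create_queue(
        QueueName=queue_name,
        # SQS expects attribute values as strings.
        Attributes={"VisibilityTimeout": str(VISIBILITY_TIMEOUT_SECONDS)},
    )
    return response["QueueUrl"]
```

With boto3 installed, a call such as create_request_queue(boto3.client("sqs", region_name="ap-southeast-2"), "some-queue-name") would return the URL that is then stored under queue_url in configure.ini.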
-
If setup is interrupted or fails, the exception or error is caught and cleanup is run to restore the initial state.
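One way to structure such a rollback is to record an undo action for every step that succeeds. The sketch below is only an illustration of that pattern; run_setup and the (create, undo) pairs are hypothetical and do not describe the actual layout of minimoto_setup.py.

```python
def run_setup(steps):
    """Run (create, undo) pairs in order; undo completed steps if one fails.

    Returns True if every step succeeded, False if setup was rolled back.
    """
    undo_stack = []
    try:
        for create, undo in steps:
            create()
            undo_stack.append(undo)
    except (Exception, KeyboardInterrupt) as error:
        print(f"setup failed: {error!r}; rolling back")
        # Undo in reverse order so dependent resources are removed first.
        for undo in reversed(undo_stack):
            try:
                undo()
            except Exception as cleanup_error:
                print(f"cleanup step failed: {cleanup_error!r}")
        return False
    return True
```

Catching KeyboardInterrupt as well covers the case where the user interrupts setup with Ctrl-C.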
-
Every configuration step is checked to make sure that all software and files have been installed on the instances.
-
minimoto_install
This file is invoked during setup. Shell commands are passed over ssh to the remote EC2 instances to install the necessary software and to copy the executable code and configuration. All instances use these commands to install software:
sudo apt-get update
sudo DEBIAN_FRONTEND=noninteractive apt-get install -y python3-pip
sudo DEBIAN_FRONTEND=noninteractive apt-get install -y python-pip
sudo pip3 install boto
sudo pip3 install boto3
- Client needs minimoto_client.py and configure.ini
- Service needs minimoto_service.py, configure.ini, minimoto_i2v, run_service and service.cron
- Watchdog needs minimoto_watchdog.py and configure.ini
Note: cron launches a new service process every 5 seconds.
-
minimoto_cleanup.py
This part uses the API of each resource type used in setup to find everything that setup created (identified by the Name tag) and delete it. Every deletion is preceded by a search and filter, so running cleanup more than once causes no problems.
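For EC2 instances, the search-then-delete step might be sketched as follows. It assumes ec2_client behaves like boto3.client("ec2"); the helper names are hypothetical, and pagination of describe_instances is left out for brevity.

```python
def find_instance_ids_by_name(ec2_client, name):
    """Return ids of non-terminated instances whose Name tag equals name."""
    response = ec2_client.describe_instances(
        Filters=[
            {"Name": "tag:Name", "Values": [name]},
            {
                "Name": "instance-state-name",
                "Values": ["pending", "running", "stopping", "stopped"],
            },
        ]
    )
    return [
        instance["InstanceId"]
        for reservation in response["Reservations"]
        for instance in reservation["Instances"]
    ]


def terminate_by_name(ec2_client, name):
    """Terminate matching instances; do nothing if none are found."""
    instance_ids = find_instance_ids_by_name(ec2_client, name)
    if instance_ids:
        ec2_client.terminate_instances(InstanceIds=instance_ids)
    return instance_ids
```

Because the search runs first and already terminated instances are filtered out, a second cleanup run finds nothing and makes no delete call.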
-
minimoto_client.py
The client-side program does the following:
- read the names of the input and output buckets from configure.ini
- check whether the given input and output buckets exist; if not, create them
- generate a unique identifier for each request with uuid and combine it with the name of the picture folder, e.g. folder_name_uuid, which is used as both the upload folder and the output name
- send a request message in the following format:
  request = "transform" + "?" + "bucket_input_name=" + bucket_input_name + ":" +
            "bucket_output_name=" + bucket_output_name + ":" +
            "folder_name=" + folder_name
- if '--wait' is given, loop and check whether the video exists in the output bucket
-
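The request format used by minimoto_client.py can be built and parsed with a few lines of Python. This is a sketch under the assumption that bucket and folder names contain neither ':' nor '?'; build_request and parse_request are illustrative names, not functions from the project.

```python
def build_request(bucket_input_name, bucket_output_name, folder_name):
    """Build the request string in the format described for the client."""
    return (
        "transform" + "?"
        + "bucket_input_name=" + bucket_input_name + ":"
        + "bucket_output_name=" + bucket_output_name + ":"
        + "folder_name=" + folder_name
    )


def parse_request(message):
    """Split a request string into (action, fields).

    Assumes values contain no ':' or '?' characters.
    """
    action, _, query = message.partition("?")
    fields = dict(pair.split("=", 1) for pair in query.split(":"))
    return action, fields
```

The service side can apply parse_request to the message body it receives from the queue to recover the two bucket names and the folder name.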
minimoto_service.py
The service-side program does the following:
- check whether the file request.lock exists; if it does, exit
- read the names of the input and output buckets from configure.ini
- receive exactly one request message from the queue and create a lock file, which acts as a mutex so that two processes never run at the same time
- parse the request message and download all files from the input bucket
- invoke minimoto_i2v to transform the pictures into a video
- upload the video to the output bucket, then delete the request message and the lock file
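The lock file can be sketched as below. Note that this example swaps the separate "check, then create" steps for a single atomic create with O_EXCL, which is a variation on the steps listed above and not necessarily what minimoto_service.py does; the function names are hypothetical.

```python
import os

LOCK_FILE = "request.lock"  # file name taken from the text


def acquire_lock(path=LOCK_FILE):
    """Try to create the lock file atomically; return True on success."""
    try:
        # O_EXCL makes creation fail if the file already exists, so the
        # existence check and the creation happen in one step.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))
    return True


def release_lock(path=LOCK_FILE):
    """Remove the lock file; ignore the case where it is already gone."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass
```

A service process launched by cron would exit immediately when acquire_lock() returns False, and call release_lock() after the request message has been deleted.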
-
minimoto_watchdog.py
- read the ids or names of the SQS and EC2 resources
- read the arguments: if "--status" is provided, only print the status message in the format given in the spec and exit; otherwise, analyse the current state and decide whether to scale in or out and whether fault recovery is needed. In this step CloudWatch is used to get CPU utilisation as the workload reference
- when "--status" is not provided, get the number of visible messages and in-flight messages, then:
  (1) check whether there is at least one service node; if not, scale out
  (2) check whether any service nodes have crashed; if so, terminate and replace one of them
  (3) if the number of visible messages > 0, at least one message has not been received yet -> scale out
  (4) if the number of invisible messages > 0, at least one message is being processed -> nothing to do
  (5) if neither of the above holds, the queue should be empty -> if any service has more than 5% CPU workload, exit; otherwise, scale in and leave only one service
- check whether there is at least one active service instance; if not, launch a new service instance from the backup AMI
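The decision steps (1) to (5) can be written as one pure function, which makes them easy to test without touching AWS. The sketch below assumes the checks are evaluated strictly in the listed order and that the 5% threshold is compared against the busiest service instance; the function name and the returned labels are made up for this example.

```python
def decide(service_count, crashed_count, visible, in_flight, max_cpu_percent):
    """Return 'scale_out', 'replace', 'scale_in' or 'none'."""
    if service_count == 0:
        return "scale_out"   # (1) no service node at all
    if crashed_count > 0:
        return "replace"     # (2) terminate and replace a crashed node
    if visible > 0:
        return "scale_out"   # (3) messages waiting to be received
    if in_flight > 0:
        return "none"        # (4) messages are being processed
    # (5) the queue looks empty
    if max_cpu_percent > 5.0:
        return "none"        # some service still appears to be working
    if service_count > 1:
        return "scale_in"    # keep only one service instance
    return "none"
```

For example, decide(2, 0, 0, 0, 1.5) returns "scale_in", while decide(2, 0, 3, 1, 80.0) returns "scale_out".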
Note:
- Scaling in and out depends on the numbers of visible and invisible messages
- Fault tolerance is built on message visibility. Specifically, if a service fails while processing a message and cannot execute the last step (deleting the message), the request message shows up in the queue again after 630 seconds; when the watchdog observes that there is a message in the queue again, a new instance is launched from the backup service AMI.
This is already described under minimoto_watchdog.py, steps (2) to (4), so it is not repeated here.
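The two message counts the watchdog relies on can be read from SQS queue attributes. This is a sketch assuming sqs_client behaves like boto3.client("sqs"); the function name is hypothetical. SQS reports both values as approximate counts.

```python
def queue_message_counts(sqs_client, queue_url):
    """Return (visible, in_flight) approximate message counts for the queue."""
    response = sqs_client.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=[
            "ApproximateNumberOfMessages",
            "ApproximateNumberOfMessagesNotVisible",
        ],
    )
    attributes = response["Attributes"]
    # Attribute values are returned as strings.
    visible = int(attributes["ApproximateNumberOfMessages"])
    in_flight = int(attributes["ApproximateNumberOfMessagesNotVisible"])
    return visible, in_flight
```

A message whose processing failed counts as in flight until its visibility timeout expires, after which it is counted as visible again.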
- Following the spec, there are four size classes of tests plus a mixed workload:
  - small-size test (5 pictures of 1 MB each), over 10 requests
  - medium-size test (15 pictures of 2 MB each), over 10 requests
  - large-size test (55 pictures of 2 MB each), over 10 requests
  - xlarge-size test (125 pictures of 2 MB each), over 5 requests
  - mix (4 small, 3 medium, 2 large, 1 xlarge), over 10 requests
- In addition, a fault test stops some or all service instances to observe the watchdog's behaviour
Testing showed that the data and metrics from SQS and CloudWatch may be inaccurate because of reporting delay, so the watchdog cannot adjust in true real time. In addition, a high aggregate CPU utilisation (over 50%) is no cause for concern as long as a request is being processed. Conversely, because of the delayed metrics it is better to prioritise user experience: for example, if the number of visible messages is above 0 and the aggregate CPU workload is below 40%, a new instance should be launched. (At that moment some CPUs may be idle and could be left to process the request, but a 40% figure cannot prove that.)