Commit 2b3a10e: Initial commit (0 parents)

22 files changed: +2011 -0 lines

.travis.yml

Lines changed: 23 additions & 0 deletions
```yaml
language: python
matrix:
  include:
    - python: 2.7
    - python: 3.6
  allow_failures:
    - python: 3.6
install:
  - pip install --upgrade pip
  - pip install -r requirements.txt
  - pip install flake8
before_script:
  # stop the build if there are Python syntax errors or undefined names
  - flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics
  # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
  - flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
script: true # pytest
notifications:
  email: false
  slack:
    secure: kDWVy90sDY+o3g0/ZTGX2D+PTbzhtd74Whe1AJHhcUDobTUzkch8GtY9eZxybZk4nga9lQxL6YeJ72SfBBEPaLzXcUMe0YcNaBydkQHcipKZn+Vcb8kf2FiZC6YwsUYfTvvH9MPLbkZOZvsNyd0h85z+hYMB8jHsq6Yn5gf79BA=
    on_failure: always
    on_success: change
```

LICENSE

Lines changed: 39 additions & 0 deletions
```text
Distributed-CellProfiler is distributed under the following BSD-style license:

Copyright © 2020 Broad Institute, Inc. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.

3. Neither the name of the Broad Institute, Inc. nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED “AS IS.” BROAD MAKES NO EXPRESS OR IMPLIED
REPRESENTATIONS OR WARRANTIES OF ANY KIND REGARDING THE SOFTWARE AND
COPYRIGHT, INCLUDING, BUT NOT LIMITED TO, WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, CONFORMITY WITH ANY
DOCUMENTATION, NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER
DEFECTS, WHETHER OR NOT DISCOVERABLE. IN NO EVENT SHALL BROAD, THE
COPYRIGHT HOLDERS, OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF, HAVE REASON TO KNOW, OR IN
FACT SHALL KNOW OF THE POSSIBILITY OF SUCH DAMAGE.

If, by operation of law or otherwise, any of the aforementioned
warranty disclaimers are determined inapplicable, your sole remedy,
regardless of the form of action, including, but not limited to,
negligence and strict liability, shall be replacement of the software
with an updated version if one exists.
```

README.md

Lines changed: 67 additions & 0 deletions
```markdown
# Distributed-CellProfiler
Run encapsulated docker containers with CellProfiler in the Amazon Web Services infrastructure.

This code is an example of how to use AWS distributed infrastructure for running CellProfiler.
The configuration of the AWS resources is done using fabric. The worker is written in Python
and is encapsulated in a docker container. Four AWS components are needed to run
distributed jobs:

1. An SQS queue
2. An ECS cluster
3. An S3 bucket
4. A spot fleet of EC2 instances

All of them can be managed through the AWS Management Console. However, this code helps you get
started quickly and run a job autonomously if all the configuration is correct. The code includes
a fabric script that links all these components and prepares the infrastructure to run a distributed
job. When the job is completed, the code can also stop resources and clean up components.

## Running the code

### Step 1
Edit the config.py file with all the relevant information for your job. Then create
the basic AWS resources by running the following script:

    $ python run.py setup

This script initializes the resources in AWS. Note that the docker registry is built separately,
and you can modify the worker code to build your own. Any time you modify the worker code, you need
to update the docker registry using the Makefile script inside the worker directory.

### Step 2
After the first script runs successfully, the job can be submitted to AWS using EITHER of the
following commands:

    $ python run.py submitJob files/exampleJob.json

OR

    $ python run_batch_general.py

Running either script uploads the tasks that are configured in the JSON file. This assumes that your
data is stored in S3 and that the JSON file has the paths to find input and output directories. You have to
customize the exampleJob.json file or the run_batch_general file with paths that make sense for your project.
The tasks that compose your job are CP groups, and each one will be run in parallel. You need to define each
task in your input file to guide the parallelization.

### Step 3
After submitting the job to the queue, we can add computing power to process all tasks in AWS. This
code starts a fleet of spot EC2 instances which will run the worker code. The worker code is encapsulated
in docker containers, and the code uses ECS services to inject them into EC2. All of this is automated
with the following command:

    $ python run.py startCluster files/exampleFleet.json

After the cluster is ready, the code informs you that everything is set up, and saves the spot fleet identifier
in a file for further reference.

### Step 4
When the cluster is up and running, you can monitor progress using the following command:

    $ python run.py monitor files/APP_NAMESpotFleetRequestId.json

The file APP_NAMESpotFleetRequestId.json is created after the cluster is set up in step 3. It is
important to keep this monitor running if you want to automatically shut down computing resources
when there are no more tasks in the queue (recommended).

See the wiki for more information about each step of the process.
```
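The submission flow in Step 2 can be sketched in a few lines: each entry under `groups` in the job JSON becomes one queue message that pairs the group's metadata with the shared job settings. This is a minimal illustration of the idea, assuming a job file shaped like files/exampleJob.json; the authoritative submitJob logic lives in run.py.

```python
import json

def build_task_messages(job):
    """Pair each metadata group with the shared job settings.

    Each returned string describes one CellProfiler group, which a
    worker can process independently of the others.
    """
    shared = {k: v for k, v in job.items()
              if k != "groups" and not k.startswith("_comment")}
    return [json.dumps(dict(shared, **group)) for group in job["groups"]]

job = {
    "pipeline": "projects/analysis.cppipe",
    "output": "projects/output/",
    "groups": [
        {"Metadata": "Metadata_Plate=Plate1,Metadata_Well=A01,Metadata_Site=1"},
        {"Metadata": "Metadata_Plate=Plate1,Metadata_Well=A01,Metadata_Site=2"},
    ],
}
for message in build_task_messages(job):
    print(message)
```

Because every message carries the full job context, any idle worker can pick up any task, which is what makes the queue-based parallelization in Step 3 possible.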

config.py

Lines changed: 44 additions & 0 deletions
```python
# Constants (User configurable)

APP_NAME = 'DistributedCP'  # Used to generate derivative names unique to the application.

# DOCKER REGISTRY INFORMATION:
DOCKERHUB_TAG = 'cellprofiler/distributed-cellprofiler:2.0.0_4.0.6'

# AWS GENERAL SETTINGS:
AWS_REGION = 'us-east-1'
AWS_PROFILE = 'default'  # The same profile used by your AWS CLI installation
SSH_KEY_NAME = 'your-key-file.pem'  # Expected to be in ~/.ssh
AWS_BUCKET = 'your-bucket-name'

# EC2 AND ECS INFORMATION:
ECS_CLUSTER = 'default'
CLUSTER_MACHINES = 3
TASKS_PER_MACHINE = 1
MACHINE_TYPE = ['m4.xlarge']
MACHINE_PRICE = 0.10
EBS_VOL_SIZE = 30  # In GB. Minimum allowed is 22.
DOWNLOAD_FILES = 'False'

# DOCKER INSTANCE RUNNING ENVIRONMENT:
DOCKER_CORES = 1  # Number of CellProfiler processes to run inside a docker container
CPU_SHARES = DOCKER_CORES * 1024  # ECS computing units assigned to each docker container (1024 units = 1 core)
MEMORY = 4096  # Memory assigned to the docker container in MB
SECONDS_TO_START = 0*60  # Wait before the next CP process is initiated to avoid memory collisions

# SQS QUEUE INFORMATION:
SQS_QUEUE_NAME = APP_NAME + 'Queue'
SQS_MESSAGE_VISIBILITY = 1*60  # Timeout (secs) for messages in flight (average time to be processed)
SQS_DEAD_LETTER_QUEUE = 'arn:aws:sqs:some-region:111111100000:DeadMessages'

# LOG GROUP INFORMATION:
LOG_GROUP_NAME = APP_NAME

# REDUNDANCY CHECKS
CHECK_IF_DONE_BOOL = 'True'  # True or False: should it check if there are a certain number of non-empty files and delete the job if yes?
EXPECTED_NUMBER_FILES = 7  # What is the number of files that trigger skipping a job?
MIN_FILE_SIZE_BYTES = 1  # What is the minimal number of bytes an object should be to "count"?
NECESSARY_STRING = ''  # Is there any string that should be in the file name to "count"?

# PLUGINS
USE_PLUGINS = 'True'
```
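Several of these settings are derived from one another, so it can be worth sanity-checking them before running `python run.py setup`. The snippet below is a standalone sketch (not part of the codebase) that restates the constraints documented in the comments above:

```python
# Hypothetical pre-flight check for config.py values, based only on the
# constraints stated in the file's own comments.
APP_NAME = 'DistributedCP'
DOCKER_CORES = 1
CPU_SHARES = DOCKER_CORES * 1024   # ECS allocates 1024 CPU units per core
MEMORY = 4096                      # MB per container
EBS_VOL_SIZE = 30                  # GB

# Derivative names are all built from APP_NAME, which is why it must be
# unique per application: two apps sharing a name would share a queue.
SQS_QUEUE_NAME = APP_NAME + 'Queue'
LOG_GROUP_NAME = APP_NAME

assert EBS_VOL_SIZE >= 22, "config.py notes 22 GB is the minimum allowed"
assert CPU_SHARES % 1024 == 0, "CPU_SHARES should be a whole number of cores"
print(SQS_QUEUE_NAME, LOG_GROUP_NAME, CPU_SHARES)
```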

files/ManualMetadata.py

Lines changed: 29 additions & 0 deletions
```python
'''A script to create a list of all the metadata combinations present in a given CSV.

This is designed to be called from the command line with
$ python ManualMetadata.py pathtocsv/csvfile.csv "['Metadata_Metadata1','Metadata_Metadata2']"
'''
from __future__ import print_function

import ast
import sys

import pandas as pd

csv = sys.argv[1]
metadatalist = ast.literal_eval(sys.argv[2])


def manualmetadata():
    incsv = pd.read_csv(csv)
    manmet = open(csv[:-4] + 'batch.txt', 'w')
    print(incsv.shape)
    done = []
    for i in range(incsv.shape[0]):
        metadatatext = '{"Metadata": "'
        for j in metadatalist:
            metadatatext += j + '=' + str(incsv[j][i]) + ','
        metadatatext = metadatatext[:-1] + '"}, \n'
        if metadatatext not in done:
            manmet.write(metadatatext)
            done.append(metadatatext)
    manmet.close()
    print(str(len(done)), 'batches found')


manualmetadata()
```
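To see concretely what this script emits, its inner formatting loop can be reproduced on a small in-memory table. This is a sketch with made-up rows (the real script reads the table from a CSV with pandas): duplicate metadata combinations collapse into a single batch line.

```python
# Hypothetical input rows standing in for a parsed CSV.
rows = [
    {"Metadata_Plate": "Plate1", "Metadata_Well": "A01"},
    {"Metadata_Plate": "Plate1", "Metadata_Well": "A01"},  # duplicate, skipped
    {"Metadata_Plate": "Plate1", "Metadata_Well": "A02"},
]
metadatalist = ["Metadata_Plate", "Metadata_Well"]

done = []
for row in rows:
    # Same line format the script writes to <csvname>batch.txt.
    text = '{"Metadata": "'
    text += ",".join("%s=%s" % (key, row[key]) for key in metadatalist)
    text += '"}, \n'
    if text not in done:
        done.append(text)

print(len(done), "batches found")
```

Each surviving line is exactly one entry for the `groups` array of a job file, which is how the CSV's metadata combinations become parallel tasks.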

files/batches.sh

Lines changed: 7 additions & 0 deletions
```shell
# Command to generate batches for two plates.
# It generates 384*9 tasks per plate, corresponding to 384 wells with 9 images each.
# An image is the unit of parallelization in this example.
#
# You need to install parallel to run this command.

parallel echo '{\"Metadata\": \"Metadata_Plate={1},Metadata_Well={2}{3},Metadata_Site={4}\"},' ::: Plate1 Plate2 ::: `echo {A..P}` ::: `seq -w 24` ::: `seq -w 9` | sort > batches.txt
```
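If GNU parallel is not available, the same Cartesian product of plates, well rows, well columns, and sites can be generated with a few lines of Python. This sketch mirrors the shell command above (2 plates x 16 rows x 24 zero-padded columns x 9 sites, sorted):

```python
from itertools import product

plates = ["Plate1", "Plate2"]
rows = [chr(c) for c in range(ord("A"), ord("P") + 1)]  # well rows A..P
cols = ["%02d" % c for c in range(1, 25)]               # zero-padded like `seq -w 24`
sites = [str(s) for s in range(1, 10)]

lines = sorted(
    '{"Metadata": "Metadata_Plate=%s,Metadata_Well=%s%s,Metadata_Site=%s"},'
    % (p, r, c, s)
    for p, r, c, s in product(plates, rows, cols, sites)
)

with open("batches.txt", "w") as f:
    f.write("\n".join(lines) + "\n")

print(len(lines))  # 6912 task lines: 2 plates x 384 wells x 9 sites
```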

files/exampleFleet_us-east-1.json

Lines changed: 45 additions & 0 deletions
```json
{
    "IamFleetRole": "arn:aws:iam::XXXXXXXXXXXXX:role/aws-ec2-spot-fleet-role",
    "AllocationStrategy": "lowestPrice",
    "TerminateInstancesWithExpiration": true,
    "LaunchSpecifications": [
        {
            "ImageId": "ami-fad25980",
            "KeyName": "your_key_file_name",
            "IamInstanceProfile": {
                "Arn": "arn:aws:iam::XXXXXXXXXXXX:instance-profile/ecsInstanceRole"
            },
            "BlockDeviceMappings": [
                {
                    "DeviceName": "/dev/xvda",
                    "Ebs": {
                        "DeleteOnTermination": true,
                        "VolumeType": "gp2",
                        "VolumeSize": 8,
                        "SnapshotId": "snap-04007a196c0f3f398"
                    }
                },
                {
                    "DeviceName": "/dev/xvdcz",
                    "Ebs": {
                        "DeleteOnTermination": true,
                        "VolumeType": "gp2"
                    }
                }
            ],
            "NetworkInterfaces": [
                {
                    "DeviceIndex": 0,
                    "SubnetId": "subnet-WWWWWWWW",
                    "DeleteOnTermination": true,
                    "AssociatePublicIpAddress": true,
                    "Groups": [
                        "sg-ZZZZZZZZZ"
                    ]
                }
            ]
        }
    ],
    "Type": "maintain"
}
```
files/exampleFleet_us-west-2.json

Lines changed: 45 additions & 0 deletions
```json
{
    "IamFleetRole": "arn:aws:iam::XXXXXXXXXXXXX:role/aws-ec2-spot-fleet-role",
    "AllocationStrategy": "lowestPrice",
    "TerminateInstancesWithExpiration": true,
    "LaunchSpecifications": [
        {
            "ImageId": "ami-c9c87cb1",
            "KeyName": "your_key_file_name",
            "IamInstanceProfile": {
                "Arn": "arn:aws:iam::XXXXXXXXXXXX:instance-profile/ecsInstanceRole"
            },
            "BlockDeviceMappings": [
                {
                    "DeviceName": "/dev/xvda",
                    "Ebs": {
                        "DeleteOnTermination": true,
                        "VolumeType": "gp2",
                        "VolumeSize": 8,
                        "SnapshotId": "snap-0b52be5bdbda1ac5f"
                    }
                },
                {
                    "DeviceName": "/dev/xvdcz",
                    "Ebs": {
                        "DeleteOnTermination": true,
                        "VolumeType": "gp2"
                    }
                }
            ],
            "NetworkInterfaces": [
                {
                    "DeviceIndex": 0,
                    "SubnetId": "subnet-WWWWWWWW",
                    "DeleteOnTermination": true,
                    "AssociatePublicIpAddress": true,
                    "Groups": [
                        "sg-ZZZZZZZZZ"
                    ]
                }
            ]
        }
    ],
    "Type": "maintain"
}
```
files/exampleJob.json

Lines changed: 14 additions & 0 deletions
```json
{
    "_comment1": "Paths in this file are relative to the root of your S3 bucket",
    "pipeline": "projects/analysis.cppipe",
    "data_file": "projects/list_of_images.csv",
    "input": "projects/input/",
    "output": "projects/output/",
    "output_structure": "Metadata_Plate-Metadata_Well-Metadata_Site",
    "_comment2": "The following groups are tasks, and each will be run in parallel",
    "groups": [
        {"Metadata": "Metadata_Plate=Plate1,Metadata_Well=A01,Metadata_Site=1"},
        {"Metadata": "Metadata_Plate=Plate1,Metadata_Well=A01,Metadata_Site=2"}
    ]
}
```
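The `output_structure` field appears to template each task's output location from the metadata keys in its group. The sketch below shows one plausible reading of that mapping (an assumption for illustration; the authoritative substitution logic lives in the worker code):

```python
def output_folder(output_structure, group_metadata):
    """Hypothetical expansion of output_structure for one task.

    Parses "key1=value1,key2=value2" group metadata into a dict, then
    substitutes each metadata key named in the template with its value.
    """
    values = dict(pair.split("=") for pair in group_metadata.split(","))
    return "-".join(values[key] for key in output_structure.split("-"))

print(output_folder(
    "Metadata_Plate-Metadata_Well-Metadata_Site",
    "Metadata_Plate=Plate1,Metadata_Well=A01,Metadata_Site=1",
))
```

Under this reading, the first group in the example job would write its results under a folder named `Plate1-A01-1` inside the `output` prefix.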
files/requirements.txt

Lines changed: 1 addition & 0 deletions
```text
boto3>=1.0.0
```

python2worker/Dockerfile

Lines changed: 70 additions & 0 deletions
```dockerfile
#
#  - [ BROAD'16 ] -
#
# A docker instance for accessing AWS resources
# This wraps the cellprofiler docker registry
#

FROM cellprofiler/cellprofiler:3.1.9

# Install S3FS

RUN apt-get -y update && \
    apt-get -y upgrade && \
    apt-get -y install \
        automake \
        autotools-dev \
        g++ \
        git \
        libcurl4-gnutls-dev \
        libfuse-dev \
        libssl-dev \
        libxml2-dev \
        make pkg-config \
        sysstat \
        curl

WORKDIR /usr/local/src
RUN git clone https://github.com/s3fs-fuse/s3fs-fuse.git
WORKDIR /usr/local/src/s3fs-fuse
RUN ./autogen.sh
RUN ./configure
RUN make
RUN make install

# Install AWS CLI

RUN pip install awscli

# Install boto3

RUN pip install -U boto3

# Install watchtower for logging

RUN pip install watchtower==0.8.0

# Install pandas for optional file downloading

RUN pip install pandas==0.24.2

# SETUP NEW ENTRYPOINT

RUN mkdir -p /home/ubuntu/
WORKDIR /home/ubuntu
COPY cp-worker.py .
COPY instance-monitor.py .
COPY run-worker.sh .
RUN chmod 755 run-worker.sh

RUN git clone https://github.com/CellProfiler/CellProfiler-plugins.git
WORKDIR /home/ubuntu/CellProfiler-plugins
#RUN pip install -r requirements.txt

WORKDIR /home/ubuntu
ENTRYPOINT ["./run-worker.sh"]
CMD [""]
```
