AGILE Workflows

Installation Instructions on a Docker Cluster

Alternative to Docker Desktop

brew install hyperkit
brew install minikube
brew install docker
brew install docker-compose
minikube start --driver=hyperkit --container-runtime=docker --mount --mount-string="$(pwd):/agile-workflows" -n 2
eval $(minikube docker-env)
echo "`minikube ip` docker.local" | sudo tee -a /etc/hosts > /dev/null
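The last command appends a docker.local alias for the minikube VM to /etc/hosts; sudo tee -a is used instead of a plain >> redirect because the redirect would be performed by the unprivileged shell. A safe way to sanity-check the pattern, shown here against a temporary file with a stand-in IP:

```shell
# Sketch of the /etc/hosts append, run against a temp file so it is
# safe to try anywhere; the real command targets /etc/hosts, uses
# `minikube ip` for the address, and needs sudo.
hosts=$(mktemp)
ip="192.168.64.2"                      # stand-in for `minikube ip`
echo "$ip docker.local" | tee -a "$hosts" > /dev/null
grep "docker.local" "$hosts"           # confirm the alias landed
```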

Create the docker development cluster

docker compose -f docker/docker-compose.yml build
docker compose -f docker/docker-compose.yml up -d --scale node=2
bash docker/gen_machines_list.sh

Build the software from the head node

ssh -p 2022 -i docker/ssh/id_rsa.mpi agile@localhost
cd agile-workflows
source scripts/docker_modules.sh
source scripts/docker_setup.sh
conan install --install-folder build . --build missing
conan build --build-folder build .

Installation Instructions on the Puma Cluster

Load the modules and prepare for the build

source scripts/puma_modules.sh

First time setup and dependency installation

source scripts/puma_setup.sh

Build the agile-workflow package

The conan install command is only needed the first time or after updating dependencies.

conan install --install-folder build . --build missing
conan build --build-folder build .
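Since only conan build is needed after the first install, a small wrapper can encode that rule. This is a hypothetical helper (the rebuild function is not part of the repo), sketched for convenience:

```shell
# Hypothetical helper, not part of the repo: run the dependency step
# only when the build folder is absent (first build, or after a
# clean), then always compile. `rebuild --clean` forces a reinstall.
rebuild() {
    if [ "$1" = "--clean" ]; then
        rm -rf build
    fi
    if [ ! -d build ]; then
        conan install --install-folder build . --build missing
    fi
    conan build --build-folder build .
}
```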

Run the AGILE workflows

Download the AGILE WMD Dataset:

git clone [email protected]:iarpa-agile-gfi/pnnl/wmd-dataset.git

Workflow 1

cd agile-workflows
./build/bin/do-workflow1 ../wmd-dataset/data.001.csv pytorch-models/workflow1-GNN.pt

To scale the workflow on a machine with MPI, you can execute the program by running:

cd agile-workflows
mpiexec -n 4 --npernode 2 --bind-to socket ./build/bin/do-workflow1 ../wmd-dataset/data.001.csv pytorch-models/workflow1-GNN.pt
cd agile-workflows
mpiexec -n 16 --npernode 2 --bind-to socket ./build/bin/do-wk_multihop $PATH_TO_DATA/train_data.csv $PATH_TO_DATA/entity_features.csv $PATH_TO_DATA/relation_features.csv --gmt_num_workers 7 --gmt_num_helpers 2
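In the mpiexec invocations above, -n gives the total number of MPI ranks and --npernode caps the ranks placed on each node, so the job spans n / npernode nodes; --bind-to socket pins each rank to one CPU socket. The arithmetic for the two runs above:

```shell
# Node count implied by the mpiexec flags: total ranks / ranks per node.
for n in 4 16; do
    npernode=2
    echo "-n $n --npernode $npernode -> $(( n / npernode )) nodes"
done
```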

Workflow 2

Exact Matching

cd agile-workflows
./build/bin/do-wk2_exact ../wmd-dataset/data.001.csv

To scale the workflow on a machine with MPI, you can execute the program by running:

cd agile-workflows
mpiexec -n 4 --npernode 2 --bind-to socket ./build/bin/do-wk2_exact ../wmd-dataset/data.001.csv 

Approximate Matching

cd agile-workflows
./build/bin/do-wk2_approx ../wmd-dataset/pattern1.csv ../wmd-dataset/data.001.csv <number of matches>

To scale the workflow on a machine with MPI, you can execute the program by running:

cd agile-workflows
mpiexec -n 4 --npernode 2 --bind-to socket ./build/bin/do-wk2_approx ../wmd-dataset/pattern1.csv ../wmd-dataset/data.001.csv <number of matches>

Partial Matching

cd agile-workflows
./build/bin/do-wk2_partial ../wmd-dataset/data.001.csv

To scale the workflow on a machine with MPI, you can execute the program by running:

cd agile-workflows
mpiexec -n 4 --npernode 2 --bind-to socket ./build/bin/do-wk2_partial ../wmd-dataset/data.001.csv

Workflow 4 installation and run on PNNL clusters

This workflow has an external dependency on the 'ripples' library, which can be built (in a sequential or a parallel version) by running one of the following scripts:

cd agile-workflows
source scripts/ripples_setup_puma_seq.sh
OR
source scripts/ripples_setup_marianas_par.sh

Note: ripples requires GCC 10 or newer.
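A small guard sketch (hypothetical, not part of the repo scripts) that checks the requirement before building; gcc -dumpversion prints the compiler version, and cut keeps the major number:

```shell
# Hedged guard: verify the gcc on PATH is at least GCC 10 before
# building ripples; returns nonzero (with a message) if it is older.
require_gcc10() {
    major=$(gcc -dumpversion | cut -d. -f1)
    if [ "$major" -lt 10 ]; then
        echo "ripples needs GCC >= 10, found GCC $major" >&2
        return 1
    fi
}
```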

This workflow is run in THREE stages. The first stage executes the Workflow 4 binary with 5 parameters: the social data file, cyber data file, uses data file, commercial data file, and coffee edge weight data file. This stage builds the data graph, selects the coffee subgraph, computes the weights of the subgraph's sale edges, and prints out the weights. An example of the run command is

cd agile-workflows
./build/bin/do-workflow4 /lustre/scratch/feo/wk4/social.csv /lustre/scratch/feo/wk4/cyber.csv /lustre/scratch/feo/wk4/uses.csv /lustre/scratch/feo/wk4/commercial.csv /lustre/scratch/feo/wk4/coffeeEdgeWeights.csv

The second stage runs the ripples code with the weighted sale edges as input. It reads the coffee edge weight data file written by the first stage and writes out a json file of the key influencers. An example of the run command is

cd agile-workflows
$HOME/ripples/build/release/tools/imm -i /lustre/scratch/feo/wk4/coffeeEdgeWeights.csv -w -p -k 100 -d IC -e 0.25 -o /lustre/scratch/feo/wk4/influencers.json
OR 
mpiexec -np 2 --npernode 1 $HOME/ripples/build/release/tools/mpi-imm -i /lustre/scratch/feo/wk4/coffeeEdgeWeights.csv -w -p -k 100 -d IC -e 0.25 -o /lustre/scratch/feo/wk4/influencers.json

Note that the influence maximization algorithm is not deterministic, so different runs reading the same input file may return different key influencers. Run Stages 1 and 2 ONLY when the data files read by Stage 1 change. The purpose of Stages 1 and 2 is to identify key coffee market influencers.
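Because of this nondeterminism, it can be useful to intersect the influencer sets from two runs before acting on them. A hedged sketch using stand-in files; the one-ID-per-line inputs are an assumption, so adapt the extraction step to the actual layout of ripples' JSON output:

```shell
# Compare seed sets from two randomized IMM runs; only the IDs present
# in both runs survive the intersection. Stand-in data, one ID per line.
runA=$(mktemp); runB=$(mktemp)
printf '12\n47\n93\n' > "$runA"    # seeds reported by run 1
printf '12\n47\n88\n' > "$runB"    # seeds reported by run 2
comm -12 <(sort "$runA") <(sort "$runB")   # IDs common to both runs
```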

The third stage executes the Workflow 4 binary with 6 parameters: the social data file, cyber data file, uses data file, commercial data file, coffee edge weight data file, and key influencer data file. This stage runs the complete workflow: it builds the data graph, selects the coffee subgraph, computes the weights of the subgraph's sale edges, prints out the weights, deletes the key influencers and their sale/purchase edges, and signals their customers to find new suppliers for the coffee purchases they have lost. An example of the run command is

cd agile-workflows
./build/bin/do-workflow4 /lustre/scratch/feo/wk4/social.csv /lustre/scratch/feo/wk4/cyber.csv /lustre/scratch/feo/wk4/uses.csv /lustre/scratch/feo/wk4/commercial.csv /lustre/scratch/feo/wk4/coffeeEdgeWeights.csv /lustre/scratch/feo/wk4/influencers.json
