wip: l2cache instructions #10

Open · wants to merge 5 commits into `main`
2 changes: 1 addition & 1 deletion 1.chunkedgraph.md
@@ -41,7 +41,7 @@ Run the `terraform apply` command to create resources.

```shell
$ cd terraform/
- $ terraform init // only needed first time
+ $ terraform init
$ terraform apply
```
This will output some variables useful for next steps:
15 changes: 13 additions & 2 deletions 3.l2cache.md
@@ -1,6 +1,17 @@

# L2Cache

The L2Cache stores attributes of individual L2 IDs in the PyChunkedGraph. An L2 ID represents a connected component of the supervoxel graph within a single chunk. The most anticipated use of the L2Cache is to query the stored information for all L2 IDs making up a single neuron; for instance, the volume of a neuron can be computed by summing the volumes of all its L2 IDs.

The main reason to store information at the L2 level is to make computing and retrieving neuron-level information fast and easy after an edit. Edits typically affect only a few chunks among the hundreds or thousands spanned by a neuron, so information needs to be recomputed for only those few chunks instead of all of them.
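As an illustration, cached per-L2 attributes can be aggregated into neuron-level values. The sketch below uses mocked data and an assumed attribute name (`size_nm3`); in a live setup the dictionary could come from an L2Cache query (e.g. CAVEclient's `l2cache.get_l2data`), but the aggregation step is the same:

```python
# Sum a per-L2 attribute into a neuron-level value.
# In a live deployment, l2_data would come from the L2Cache, e.g.
# client.l2cache.get_l2data(l2_ids, attributes=["size_nm3"]) via CAVEclient
# (attribute name "size_nm3" is an assumption for this sketch).
def neuron_volume(l2_data, attr="size_nm3"):
    """Total neuron volume as the sum of its L2 components' volumes."""
    return sum(entry.get(attr, 0) for entry in l2_data.values())

# mocked L2Cache response: {l2_id: {attribute: value}}
l2_data = {
    160032475113848939: {"size_nm3": 1_200_000},
    160032475113848940: {"size_nm3": 850_000},
    160032475113848941: {},  # attribute missing, e.g. not yet computed
}

print(neuron_volume(l2_data))  # -> 2050000
```

After an edit, only the entries for L2 IDs in affected chunks change, so the sum can be refreshed cheaply.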

## Infrastructure
The [same infrastructure](1.chunkedgraph.md#infrastructure) used for creating a PyChunkedGraph can be used to create an L2Cache, but it is simpler: workers are needed for only one layer, L2.

One other requirement for creating an L2Cache is a PyChunkedGraph server, specifically a graphene protocol path that can be used to create a [CloudVolume](https://github.com/seung-lab/cloud-volume/) instance. The path looks something like this: `graphene://https://<server-host>/segmentation/table/<chunkedgraph>`. This assumes [CAVEdeployment](https://github.com/seung-lab/CAVEdeployment) has already been set up.

You will also need a [cave-secret](https://github.com/seung-lab/cloud-volume/#cave-secretjson) for authentication. Refer to `cave-secret.json` in [example_values.yaml](helm/l2cache/example_values.yaml).
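For local testing, the token file can be placed where cloud-volume looks for secrets (the token value below is a placeholder; in the cluster this file is mounted from the `cloud-volume-secrets` secret instead):

```shell
# create the secrets directory cloud-volume reads from
mkdir -p ~/.cloudvolume/secrets

# write the CAVE token (placeholder value) to cave-secret.json
cat > ~/.cloudvolume/secrets/cave-secret.json <<'EOF'
{
  "token": "<cave_token>"
}
EOF
```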

## Ingest

Once the necessary infrastructure is set up, a `helm` [chart](helm/l2cache/) can be used to create the master and worker deployments. The master is used to create ingest jobs, using Redis as the job queue.
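A sketch of that step (the release name, values filename, and `deployment/master` target are assumptions; the actual names come from the chart and your filled-in values):

```shell
# install or upgrade the chart with values based on example_values.yaml
helm upgrade --install l2cache ./helm/l2cache -f my_values.yaml

# open a shell in the master pod to run the ingest commands
# (the .bashrc in the chart provides the ingest/rqx aliases there)
kubectl exec -it deployment/master -- /bin/bash
```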
92 changes: 92 additions & 0 deletions helm/l2cache/example_values.yaml
@@ -0,0 +1,92 @@
env:
- name: &commonEnvVars "pychunkedgraph"
vars:
REDIS_HOST: "<redis_host>" # refer to output of terraform apply
REDIS_PORT: 6379
REDIS_PASSWORD: ""
BIGTABLE_PROJECT: &bt_project "<google_project>"
BIGTABLE_INSTANCE: &bt_instance "<bigtable_instance>"
GOOGLE_APPLICATION_CREDENTIALS: /root/.cloudvolume/secrets/google-secret.json
SHELL: /bin/bash
FLASK_APP: run_dev.py
APP_SETTINGS: pychunkedgraph.app.config.DeploymentWithRedisConfig


configfiles:
- name: &bashrc "bashrc"
files:
".bashrc": |-
alias watch='watch '
alias ingest='flask ingest'
alias rqx='flask rq'

configyamls: []

secrets:
- name: &cloudVolumeSecrets cloud-volume-secrets
files:
# these are used by python bigtable client and cloud-files
# must have the following permissions:
# * read gcs objects if edges/component files are stored in google cloud buckets
# if they're stored elsewhere use the secrets with appropriate permissions accordingly
# * bigtable - create and read tables
google-secret.json: |-
{
<contents_of_service_account_secret>
}
cave-secret.json: |-
{
"token": "<cave_token>"
}

deployments:
- enabled: true
name: &name master
nodeSelector:
cloud.google.com/gke-nodepool: master
hpa:
enabled: false
volumes: &commonVolumes
- name: *cloudVolumeSecrets
secret:
secretName: *cloudVolumeSecrets
- name: &bashrcVolume bashrc-volume
configMap:
name: *bashrc
containers:
- name: *name
image: &image
repository: &imageRep <image_repo>
tag: &tag "<image_tag>"
volumeMounts: &commonVolumeMounts
- name: *cloudVolumeSecrets
mountPath: /root/.cloudvolume/secrets
readOnly: true
- name: *bashrcVolume
mountPath: /root/
env:
- name: *commonEnvVars
resources:
requests:
memory: 500M


workerDeployments:
- enabled: true
name: &name l2
nodeSelector:
cloud.google.com/gke-nodepool: low
hpa:
enabled: true
minReplicas: 10
volumes: *commonVolumes
containers:
- name: *name
command: [rq, worker, *name]
image: *image
volumeMounts: *commonVolumeMounts
env:
- name: *commonEnvVars
resources:
requests:
memory: 1G
9 changes: 9 additions & 0 deletions l2cache.cmds
@@ -0,0 +1,9 @@
```shell
# get the current project and build the ingest image with Cloud Build
export PROJECT_ID=$(gcloud config get-value project)
git clone https://github.com/seung-lab/PCGL2Cache.git -b pcgv2-ingest
cd PCGL2Cache/
gcloud builds submit --config=cloudbuild.v2.ingest.yaml .

# queue L2Cache ingest jobs for the given chunkedgraph
# (run on the master; uses the ingest alias from example_values.yaml)
ingest v2 l2cache_aibs_v1dd aibs_v1dd \
    graphene://https://api.em.brain.allentech.org/segmentation/table/aibs_v1dd "2022-09-16 12:00:00" --create
```