Launching hailctl dataproc error in pipeline-runner: components [beta] not installed #2013
Replies: 6 comments 2 replies
-
@bw2 @mike-w-wilson have you ever seen something like this before?
-
This is in the pipeline-runner container, which is up to date. Could it be something about the gcloud install inside the container, or about running under Windows or the Ubuntu subsystem (WSL)? Using dataproc should be really basic, eh?
I know this is odd behavior - sorry.
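For what it's worth, a quick way to see what the gcloud install inside the container looks like (assuming the stock docker-compose setup from the install docs) would be something like:
# run from the directory containing docker-compose.yml
docker-compose exec pipeline-runner gcloud --version
docker-compose exec pipeline-runner gcloud components list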
Daniel
-
Unfortunately, I have not seen this error before. I'm curious to see if @bw2 has any insight though.
-
I'm not able to reproduce this currently. From googling your error message, I came across https://ismailyenigul.wordpress.com/2018/07/13/gcloud-install-error-on-linux-error-gcloud-components-list-failed-to-fetch-component-listing-from-server/ so one thing you might want to try is to edit the docker-compose.yml file to disable IPv6 via net.ipv6.conf.all.disable_ipv6 - the full reproduction log and docker-compose snippet are quoted in the reply below.
-
That's encouraging! Thanks for looking - must be something weird on our end then...
d
…On Tue, Aug 3, 2021 at 6:59 AM bw2 ***@***.***> wrote:
I'm not able to reproduce this currently.
wm103-772:/tmp $ docker-compose exec pipeline-runner /bin/bash
This shell is in the PIPELINE-RUNNER container.
2ea769730120:/]$ hailctl dataproc start --pkgs luigi,google-api-python-client --zone us-west1-c --vep GRCh37 --max-idle 30m --num-workers 2 --num-preemptible-workers 0 seqr-loading-cluster
Pulling VEP data from bucket in us.
gcloud beta dataproc clusters create \
seqr-loading-cluster \
--image-version=1.4-debian9 \
--properties=^|||^spark:spark.task.maxFailures=20|||spark:spark.driver.extraJavaOptions=-Xss4M|||spark:spark.executor.extraJavaOptions=-Xss4M|||spark:spark.speculation=true|||hdfs:dfs.replication=1|||dataproc:dataproc.logging.stackdriver.enable=false|||dataproc:dataproc.monitoring.stackdriver.enable=false|||spark:spark.driver.memory=41g \
--initialization-actions=gs://hail-common/hailctl/dataproc/0.2.61/init_notebook.py,gs://hail-common/hailctl/dataproc/0.2.61/vep-GRCh37.sh \
--metadata=^|||^VEP_REPLICATE=us|||VEP_CONFIG_PATH=/vep_data/vep-gcloud.json|||VEP_CONFIG_URI=file:///vep_data/vep-gcloud.json|||WHEEL=gs://hail-common/hailctl/dataproc/0.2.61/hail-0.2.61-py3-none-any.whl|||PKGS=google-api-python-client|luigi|aiohttp>=3.6,<3.7|aiohttp_session>=2.7,<2.8|asyncinit>=0.2.4,<0.3|bokeh>1.1,<1.3|decorator<5|Deprecated>=1.2.10,<1.3|dill>=0.3.1.1,<0.4|gcsfs==0.2.2|humanize==1.0.0|hurry.filesize==0.9|nest_asyncio|numpy<2|pandas>0.24,<0.26|parsimonious<0.9|PyJWT|python-json-logger==0.1.11|requests==2.22.0|scipy>1.2,<1.4|tabulate==0.8.3|tqdm==4.42.1|google-cloud-storage==1.25.* \
--master-machine-type=n1-highmem-8 \
--master-boot-disk-size=100GB \
--num-master-local-ssds=0 \
--num-secondary-workers=0 \
--num-worker-local-ssds=0 \
--num-workers=2 \
--secondary-worker-boot-disk-size=200GB \
--worker-boot-disk-size=200GB \
--worker-machine-type=n1-highmem-8 \
--zone=us-west1-c \
--initialization-action-timeout=20m \
--labels=creator=weisburd_broadinstitute_org \
--max-idle=30m
Starting cluster 'seqr-loading-cluster'...
You do not currently have this command group installed. Using it
requires the installation of components: [beta]
Your current Cloud SDK version is: 286.0.0
Installing components from version: 286.0.0
┌─────────────────────────────────────────────┐
│ These components will be installed. │
├──────────────────────┬────────────┬─────────┤
│ Name │ Version │ Size │
├──────────────────────┼────────────┼─────────┤
│ gcloud Beta Commands │ 2019.05.17 │ < 1 MiB │
└──────────────────────┴────────────┴─────────┘
For the latest full release notes, please visit:
https://cloud.google.com/sdk/release_notes
Do you want to continue (Y/n)?
╔════════════════════════════════════════════════════════════╗
╠═ Creating update staging area ═╣
╠════════════════════════════════════════════════════════════╣
╠═ Installing: gcloud Beta Commands ═╣
╠════════════════════════════════════════════════════════════╣
╠═ Creating backup and activating new installation ═╣
╚════════════════════════════════════════════════════════════╝
Performing post processing steps...done.
Update done!
Restarting command:
$ gcloud beta dataproc clusters create seqr-loading-cluster --image-version=1.4-debian9 --properties=^|||^spark:spark.task.maxFailures=20|||spark:spark.driver.extraJavaOptions=-Xss4M|||spark:spark.executor.extraJavaOptions=-Xss4M|||spark:spark.speculation=true|||hdfs:dfs.replication=1|||dataproc:dataproc.logging.stackdriver.enable=false|||dataproc:dataproc.monitoring.stackdriver.enable=false|||spark:spark.driver.memory=41g --initialization-actions=gs://hail-common/hailctl/dataproc/0.2.61/init_notebook.py,gs://hail-common/hailctl/dataproc/0.2.61/vep-GRCh37.sh --metadata=^|||^VEP_REPLICATE=us|||VEP_CONFIG_PATH=/vep_data/vep-gcloud.json|||VEP_CONFIG_URI=file:///vep_data/vep-gcloud.json|||WHEEL=gs://hail-common/hailctl/dataproc/0.2.61/hail-0.2.61-py3-none-any.whl|||PKGS=google-api-python-client|luigi|aiohttp>=3.6,<3.7|aiohttp_session>=2.7,<2.8|asyncinit>=0.2.4,<0.3|bokeh>1.1,<1.3|decorator<5|Deprecated>=1.2.10,<1.3|dill>=0.3.1.1,<0.4|gcsfs==0.2.2|humanize==1.0.0|hurry.filesize==0.9|nest_asyncio|numpy<2|pandas>0.24,<0.26|parsimonious<0.9|PyJWT|python-json-logger==0.1.11|requests==2.22.0|scipy>1.2,<1.4|tabulate==0.8.3|tqdm==4.42.1|google-cloud-storage==1.25.* --master-machine-type=n1-highmem-8 --master-boot-disk-size=100GB --num-master-local-ssds=0 --num-secondary-workers=0 --num-worker-local-ssds=0 --num-workers=2 --secondary-worker-boot-disk-size=200GB --worker-boot-disk-size=200GB --worker-machine-type=n1-highmem-8 --zone=us-west1-c --initialization-action-timeout=20m --labels=creator=weisburd_broadinstitute_org --max-idle=30m
ERROR: (gcloud.beta.dataproc.clusters.create) INVALID_ARGUMENT: Zone 'seqr-project/us-west1-c' resides in unsupported region 'https://www.googleapis.com/compute/v1/projects/seqr-project/regions/us-west1'. Supported regions: [us-central1]
Traceback (most recent call last):
File "/usr/local/bin/hailctl", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.7/site-packages/hailtop/hailctl/__main__.py", line 100, in main
cli.main(args)
File "/usr/local/lib/python3.7/site-packages/hailtop/hailctl/dataproc/cli.py", line 122, in main
jmp[args.module].main(args, pass_through_args)
File "/usr/local/lib/python3.7/site-packages/hailtop/hailctl/dataproc/start.py", line 369, in main
gcloud.run(cmd[1:])
File "/usr/local/lib/python3.7/site-packages/hailtop/hailctl/dataproc/gcloud.py", line 9, in run
return subprocess.check_call(["gcloud"] + command)
File "/usr/local/lib/python3.7/subprocess.py", line 341, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['gcloud', 'beta', 'dataproc', 'clusters', 'create', 'seqr-loading-cluster', '--image-version=1.4-debian9', '--properties=^|||^spark:spark.task.maxFailures=20|||spark:spark.driver.extraJavaOptions=-Xss4M|||spark:spark.executor.extraJavaOptions=-Xss4M|||spark:spark.speculation=true|||hdfs:dfs.replication=1|||dataproc:dataproc.logging.stackdriver.enable=false|||dataproc:dataproc.monitoring.stackdriver.enable=false|||spark:spark.driver.memory=41g', '--initialization-actions=gs://hail-common/hailctl/dataproc/0.2.61/init_notebook.py,gs://hail-common/hailctl/dataproc/0.2.61/vep-GRCh37.sh', '--metadata=^|||^VEP_REPLICATE=us|||VEP_CONFIG_PATH=/vep_data/vep-gcloud.json|||VEP_CONFIG_URI=file:///vep_data/vep-gcloud.json|||WHEEL=gs://hail-common/hailctl/dataproc/0.2.61/hail-0.2.61-py3-none-any.whl|||PKGS=google-api-python-client|luigi|aiohttp>=3.6,<3.7|aiohttp_session>=2.7,<2.8|asyncinit>=0.2.4,<0.3|bokeh>1.1,<1.3|decorator<5|Deprecated>=1.2.10,<1.3|dill>=0.3.1.1,<0.4|gcsfs==0.2.2|humanize==1.0.0|hurry.filesize==0.9|nest_asyncio|numpy<2|pandas>0.24,<0.26|parsimonious<0.9|PyJWT|python-json-logger==0.1.11|requests==2.22.0|scipy>1.2,<1.4|tabulate==0.8.3|tqdm==4.42.1|google-cloud-storage==1.25.*', '--master-machine-type=n1-highmem-8', '--master-boot-disk-size=100GB', '--num-master-local-ssds=0', '--num-secondary-workers=0', '--num-worker-local-ssds=0', '--num-workers=2', '--secondary-worker-boot-disk-size=200GB', '--worker-boot-disk-size=200GB', '--worker-machine-type=n1-highmem-8', '--zone=us-west1-c', '--initialization-action-timeout=20m', '--labels=creator=weisburd_broadinstitute_org', '--max-idle=30m']' returned non-zero exit status 1.
2ea769730120:/]$
2ea769730120:/]$ hailctl dataproc start --pkgs luigi,google-api-python-client --zone us-central1-a --vep GRCh37 --max-idle 30m --num-workers 2 --num-preemptible-workers 0 seqr-loading-cluster
Pulling VEP data from bucket in us.
gcloud beta dataproc clusters create \
seqr-loading-cluster \
--image-version=1.4-debian9 \
--properties=^|||^spark:spark.task.maxFailures=20|||spark:spark.driver.extraJavaOptions=-Xss4M|||spark:spark.executor.extraJavaOptions=-Xss4M|||spark:spark.speculation=true|||hdfs:dfs.replication=1|||dataproc:dataproc.logging.stackdriver.enable=false|||dataproc:dataproc.monitoring.stackdriver.enable=false|||spark:spark.driver.memory=41g \
--initialization-actions=gs://hail-common/hailctl/dataproc/0.2.61/init_notebook.py,gs://hail-common/hailctl/dataproc/0.2.61/vep-GRCh37.sh \
--metadata=^|||^VEP_REPLICATE=us|||VEP_CONFIG_PATH=/vep_data/vep-gcloud.json|||VEP_CONFIG_URI=file:///vep_data/vep-gcloud.json|||WHEEL=gs://hail-common/hailctl/dataproc/0.2.61/hail-0.2.61-py3-none-any.whl|||PKGS=luigi|aiohttp>=3.6,<3.7|aiohttp_session>=2.7,<2.8|asyncinit>=0.2.4,<0.3|bokeh>1.1,<1.3|decorator<5|Deprecated>=1.2.10,<1.3|dill>=0.3.1.1,<0.4|gcsfs==0.2.2|humanize==1.0.0|hurry.filesize==0.9|nest_asyncio|numpy<2|pandas>0.24,<0.26|parsimonious<0.9|PyJWT|python-json-logger==0.1.11|requests==2.22.0|scipy>1.2,<1.4|tabulate==0.8.3|tqdm==4.42.1|google-cloud-storage==1.25.*|google-api-python-client \
--master-machine-type=n1-highmem-8 \
--master-boot-disk-size=100GB \
--num-master-local-ssds=0 \
--num-secondary-workers=0 \
--num-worker-local-ssds=0 \
--num-workers=2 \
--secondary-worker-boot-disk-size=200GB \
--worker-boot-disk-size=200GB \
--worker-machine-type=n1-highmem-8 \
--zone=us-central1-a \
--initialization-action-timeout=20m \
--labels=creator=weisburd_broadinstitute_org \
--max-idle=30m
Starting cluster 'seqr-loading-cluster'...
Waiting on operation [projects/seqr-project/regions/us-central1/operations/3c0dbb10-1840-364f-bfee-7191e9a7bfdf].
Waiting for cluster creation operation...
WARNING: For PD-Standard without local SSDs, we strongly recommend provisioning 1TB or larger to ensure consistently high I/O performance. See https://cloud.google.com/compute/docs/disks/performance for information on disk I/O performance.
Waiting for cluster creation operation...⠧
From googling your error message, I came across https://ismailyenigul.wordpress.com/2018/07/13/gcloud-install-error-on-linux-error-gcloud-components-list-failed-to-fetch-component-listing-from-server/ so one thing you might want to try is to edit the docker-compose.yml file to disable net.ipv6.conf.all.disable_ipv6 as follows:
pipeline-runner:
  image: gcr.io/seqr-project/pipeline-runner:gcloud-prod
  volumes:
    - ./data/seqr-reference-data:/seqr-reference-data
    - ./data/vep_data:/vep_data
    - ./data/input_vcfs:/input_vcfs
    - ~/.config:/root/.config
  sysctls:
    net.ipv6.conf.all.disable_ipv6: 1
  depends_on:
    elasticsearch:
      condition: service_healthy
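If you do try that, one way to apply the change and confirm it took effect (assuming the same service names; reading /proc is just a generic way to inspect the sysctl) might be roughly:
docker-compose up -d pipeline-runner   # recreate the container so the new sysctls entry is applied
docker-compose exec pipeline-runner cat /proc/sys/net/ipv6/conf/all/disable_ipv6   # expect "1"
docker-compose exec pipeline-runner gcloud components list   # if IPv6 was the culprit, this should now reach the component server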
-
Is it also possible that the
Which gave that same location issue, but when I used
-
Describe the bug
I cannot successfully run the hailctl dataproc step of the seqr data upload procedure on GCP. It fails with:
"You do not currently have this command group installed. Using it
requires the installation of components: [beta]"
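One thing I may try is pre-installing the beta component inside the container before running hailctl, so the interactive component install is skipped (a sketch, assuming gcloud components can be installed in this image):
docker-compose exec pipeline-runner gcloud components install beta --quiet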
Link to page(s) where bug is occurring
Following step 6 of https://github.com/broadinstitute/seqr/blob/master/deploy/LOCAL_INSTALL.md
Scope of the bug
all projects
Screenshots
I had to change the zone and region also.
cd2c9b99d40b:/hail-elasticsearch-pipelines/luigi_pipeline]$ hailctl dataproc start --pkgs luigi,google-api-python-client --zone us-west1-c --vep GRCh37 --max-idle 30m --num-workers 2 --num-preemptible-workers 12 seqr-loading-cluster
Traceback (most recent call last):
File "/usr/local/bin/hailctl", line 8, in
sys.exit(main())
File "/usr/local/lib/python3.7/site-packages/hailtop/hailctl/main.py", line 100, in main
cli.main(args)
File "/usr/local/lib/python3.7/site-packages/hailtop/hailctl/dataproc/cli.py", line 122, in main
jmp[args.module].main(args, pass_through_args)
File "/usr/local/lib/python3.7/site-packages/hailtop/hailctl/dataproc/start.py", line 274, in main
raise RuntimeError("Could not determine dataproc region. Use --region argument to hailctl, or use
gcloud config set dataproc/region <my-region>
to set a default.")RuntimeError: Could not determine dataproc region. Use --region argument to hailctl, or use
gcloud config set dataproc/region <my-region>
to set a default.cd2c9b99d40b:/hail-elasticsearch-pipelines/luigi_pipeline]$ hailctl dataproc start --pkgs luigi,google-api-python-client --region us-west1 --zone us-west1-c --vep GRCh37 --max-idle 30m --num-workers 2 --num-preem
ptible-workers 12 seqr-loading-cluster
Pulling VEP data from bucket in us.
gcloud beta dataproc clusters create
seqr-loading-cluster
--image-version=1.4-debian9
--properties=^|||^spark:spark.task.maxFailures=20|||spark:spark.driver.extraJavaOptions=-Xss4M|||spark:spark.executor.extraJavaOptions=-Xss4M|||spark:spark.speculation=true|||hdfs:dfs.replication=1|||dataproc:dataproc.logging.stackdriver.enable=false|||dataproc:dataproc.monitoring.stackdriver.enable=false|||spark:spark.driver.memory=41g
--initialization-actions=gs://hail-common/hailctl/dataproc/0.2.61/init_notebook.py,gs://hail-common/hailctl/dataproc/0.2.61/vep-GRCh37.sh
--metadata=^|||^VEP_REPLICATE=us|||VEP_CONFIG_PATH=/vep_data/vep-gcloud.json|||VEP_CONFIG_URI=file:///vep_data/vep-gcloud.json|||WHEEL=gs://hail-common/hailctl/dataproc/0.2.61/hail-0.2.61-py3-none-any.whl|||PKGS=google-api-python-client|luigi|aiohttp>=3.6,<3.7|aiohttp_session>=2.7,<2.8|asyncinit>=0.2.4,<0.3|bokeh>1.1,<1.3|decorator<5|Deprecated>=1.2.10,<1.3|dill>=0.3.1.1,<0.4|gcsfs==0.2.2|humanize==1.0.0|hurry.filesize==0.9|nest_asyncio|numpy<2|pandas>0.24,<0.26|parsimonious<0.9|PyJWT|python-json-logger==0.1.11|requests==2.22.0|scipy>1.2,<1.4|tabulate==0.8.3|tqdm==4.42.1|google-cloud-storage==1.25.*
--master-machine-type=n1-highmem-8
--master-boot-disk-size=100GB
--num-master-local-ssds=0
--num-secondary-workers=12
--num-worker-local-ssds=0
--num-workers=2
--secondary-worker-boot-disk-size=200GB
--worker-boot-disk-size=200GB
--worker-machine-type=n1-highmem-8
--region=us-west1
--zone=us-west1-c
--initialization-action-timeout=20m
--max-idle=30m
Starting cluster 'seqr-loading-cluster'...
You do not currently have this command group installed. Using it
requires the installation of components: [beta]
ERROR: The component listing for Cloud SDK version [286.0.0] could not be found. Make sure this is a valid archived Cloud SDK version.
ERROR: (gcloud) Failed to fetch component listing from server. Check your network settings and try again.
Traceback (most recent call last):
File "/usr/local/bin/hailctl", line 8, in
sys.exit(main())
File "/usr/local/lib/python3.7/site-packages/hailtop/hailctl/main.py", line 100, in main
cli.main(args)
File "/usr/local/lib/python3.7/site-packages/hailtop/hailctl/dataproc/cli.py", line 122, in main
jmp[args.module].main(args, pass_through_args)
File "/usr/local/lib/python3.7/site-packages/hailtop/hailctl/dataproc/start.py", line 369, in main
gcloud.run(cmd[1:])
File "/usr/local/lib/python3.7/site-packages/hailtop/hailctl/dataproc/gcloud.py", line 9, in run
return subprocess.check_call(["gcloud"] + command)
File "/usr/local/lib/python3.7/subprocess.py", line 341, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['gcloud', 'beta', 'dataproc', 'clusters', 'create', 'seqr-loading-cluster', '--image-version=1.4-debian9', '--properties=^|||^spark:spark.task.maxFailures=20|||spark:spark.driver.extraJavaOptions=-Xss4M|||spark:spark.executor.extraJavaOptions=-Xss4M|||spark:spark.speculation=true|||hdfs:dfs.replication=1|||dataproc:dataproc.logging.stackdriver.enable=false|||dataproc:dataproc.monitoring.stackdriver.enable=false|||spark:spark.driver.memory=41g', '--initialization-actions=gs://hail-common/hailctl/dataproc/0.2.61/init_notebook.py,gs://hail-common/hailctl/dataproc/0.2.61/vep-GRCh37.sh', '--metadata=^|||^VEP_REPLICATE=us|||VEP_CONFIG_PATH=/vep_data/vep-gcloud.json|||VEP_CONFIG_URI=file:///vep_data/vep-gcloud.json|||WHEEL=gs://hail-common/hailctl/dataproc/0.2.61/hail-0.2.61-py3-none-any.whl|||PKGS=google-api-python-client|luigi|aiohttp>=3.6,<3.7|aiohttp_session>=2.7,<2.8|asyncinit>=0.2.4,<0.3|bokeh>1.1,<1.3|decorator<5|Deprecated>=1.2.10,<1.3|dill>=0.3.1.1,<0.4|gcsfs==0.2.2|humanize==1.0.0|hurry.filesize==0.9|nest_asyncio|numpy<2|pandas>0.24,<0.26|parsimonious<0.9|PyJWT|python-json-logger==0.1.11|requests==2.22.0|scipy>1.2,<1.4|tabulate==0.8.3|tqdm==4.42.1|google-cloud-storage==1.25.*', '--master-machine-type=n1-highmem-8', '--master-boot-disk-size=100GB', '--num-master-local-ssds=0', '--num-secondary-workers=12', '--num-worker-local-ssds=0', '--num-workers=2', '--secondary-worker-boot-disk-size=200GB', '--worker-boot-disk-size=200GB', '--worker-machine-type=n1-highmem-8', '--region=us-west1', '--zone=us-west1-c', '--initialization-action-timeout=20m', '--max-idle=30m']' returned non-zero exit status 1.
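For reference, the first traceback above suggests setting a default dataproc region; that could be done once inside the container so later hailctl runs don't need the --region/--zone flags (the values here just mirror the ones used above):
gcloud config set dataproc/region us-west1
gcloud config set compute/zone us-west1-c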