Launching hailctl dataproc error in pipeline-runner: components [beta] not installed #2013
Replies: 6 comments 2 replies
-
@bw2 @mike-w-wilson have you ever seen something like this before?
-
This is in the pipeline-runner container, which is up to date. Could it be something about the gcloud install inside the container, or about running under Windows or the Ubuntu subsystem (WSL)? Using dataproc should be really basic, eh?
I know this is odd behavior - sorry.
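For what it's worth, a quick way to see what the gcloud install inside the container looks like (assuming the stock docker-compose setup from the install docs) would be something like:
# run from the directory containing docker-compose.yml
docker-compose exec pipeline-runner gcloud --version
docker-compose exec pipeline-runner gcloud components list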
Daniel
-
Unfortunately, I have not seen this error before. I'm curious to see if @bw2 has any insight though.
-
I'm not able to reproduce this currently. From googling your error message, I came across https://ismailyenigul.wordpress.com/2018/07/13/gcloud-install-error-on-linux-error-gcloud-components-list-failed-to-fetch-component-listing-from-server/ so one thing you might want to try is to edit the docker-compose.yml file to disable IPv6 via net.ipv6.conf.all.disable_ipv6 - the full reproduction log and docker-compose snippet are quoted in the reply below.
-
That's encouraging! Thanks for looking - must be something weird on our end then...
d
…On Tue, Aug 3, 2021 at 6:59 AM bw2 ***@***.***> wrote:
I'm not able to reproduce this currently.
wm103-772:/tmp $ docker-compose exec pipeline-runner /bin/bash
This shell is in the PIPELINE-RUNNER container.
2ea769730120:/]$ hailctl dataproc start --pkgs luigi,google-api-python-client --zone us-west1-c --vep GRCh37 --max-idle 30m --num-workers 2 --num-preemptible-workers 0 seqr-loading-cluster
Pulling VEP data from bucket in us.
gcloud beta dataproc clusters create \
seqr-loading-cluster \
--image-version=1.4-debian9 \
--properties=^|||^spark:spark.task.maxFailures=20|||spark:spark.driver.extraJavaOptions=-Xss4M|||spark:spark.executor.extraJavaOptions=-Xss4M|||spark:spark.speculation=true|||hdfs:dfs.replication=1|||dataproc:dataproc.logging.stackdriver.enable=false|||dataproc:dataproc.monitoring.stackdriver.enable=false|||spark:spark.driver.memory=41g \
--initialization-actions=gs://hail-common/hailctl/dataproc/0.2.61/init_notebook.py,gs://hail-common/hailctl/dataproc/0.2.61/vep-GRCh37.sh \
--metadata=^|||^VEP_REPLICATE=us|||VEP_CONFIG_PATH=/vep_data/vep-gcloud.json|||VEP_CONFIG_URI=file:///vep_data/vep-gcloud.json|||WHEEL=gs://hail-common/hailctl/dataproc/0.2.61/hail-0.2.61-py3-none-any.whl|||PKGS=google-api-python-client|luigi|aiohttp>=3.6,<3.7|aiohttp_session>=2.7,<2.8|asyncinit>=0.2.4,<0.3|bokeh>1.1,<1.3|decorator<5|Deprecated>=1.2.10,<1.3|dill>=0.3.1.1,<0.4|gcsfs==0.2.2|humanize==1.0.0|hurry.filesize==0.9|nest_asyncio|numpy<2|pandas>0.24,<0.26|parsimonious<0.9|PyJWT|python-json-logger==0.1.11|requests==2.22.0|scipy>1.2,<1.4|tabulate==0.8.3|tqdm==4.42.1|google-cloud-storage==1.25.* \
--master-machine-type=n1-highmem-8 \
--master-boot-disk-size=100GB \
--num-master-local-ssds=0 \
--num-secondary-workers=0 \
--num-worker-local-ssds=0 \
--num-workers=2 \
--secondary-worker-boot-disk-size=200GB \
--worker-boot-disk-size=200GB \
--worker-machine-type=n1-highmem-8 \
--zone=us-west1-c \
--initialization-action-timeout=20m \
--labels=creator=weisburd_broadinstitute_org \
--max-idle=30m
Starting cluster 'seqr-loading-cluster'...
You do not currently have this command group installed. Using it
requires the installation of components: [beta]
Your current Cloud SDK version is: 286.0.0
Installing components from version: 286.0.0
┌─────────────────────────────────────────────┐
│ These components will be installed. │
├──────────────────────┬────────────┬─────────┤
│ Name │ Version │ Size │
├──────────────────────┼────────────┼─────────┤
│ gcloud Beta Commands │ 2019.05.17 │ < 1 MiB │
└──────────────────────┴────────────┴─────────┘
For the latest full release notes, please visit:
https://cloud.google.com/sdk/release_notes
Do you want to continue (Y/n)?
╔════════════════════════════════════════════════════════════╗
╠═ Creating update staging area ═╣
╠════════════════════════════════════════════════════════════╣
╠═ Installing: gcloud Beta Commands ═╣
╠════════════════════════════════════════════════════════════╣
╠═ Creating backup and activating new installation ═╣
╚════════════════════════════════════════════════════════════╝
Performing post processing steps...done.
Update done!
Restarting command:
$ gcloud beta dataproc clusters create seqr-loading-cluster --image-version=1.4-debian9 --properties=^|||^spark:spark.task.maxFailures=20|||spark:spark.driver.extraJavaOptions=-Xss4M|||spark:spark.executor.extraJavaOptions=-Xss4M|||spark:spark.speculation=true|||hdfs:dfs.replication=1|||dataproc:dataproc.logging.stackdriver.enable=false|||dataproc:dataproc.monitoring.stackdriver.enable=false|||spark:spark.driver.memory=41g --initialization-actions=gs://hail-common/hailctl/dataproc/0.2.61/init_notebook.py,gs://hail-common/hailctl/dataproc/0.2.61/vep-GRCh37.sh --metadata=^|||^VEP_REPLICATE=us|||VEP_CONFIG_PATH=/vep_data/vep-gcloud.json|||VEP_CONFIG_URI=file:///vep_data/vep-gcloud.json|||WHEEL=gs://hail-common/hailctl/dataproc/0.2.61/hail-0.2.61-py3-none-any.whl|||PKGS=google-api-python-client|luigi|aiohttp>=3.6,<3.7|aiohttp_session>=2.7,<2.8|asyncinit>=0.2.4,<0.3|bokeh>1.1,<1.3|decorator<5|Deprecated>=1.2.10,<1.3|dill>=0.3.1.1,<0.4|gcsfs==0.2.2|humanize==1.0.0|hurry.filesize==0.9|nest_asyncio|numpy<2|pandas>0.24,<0.26|parsimonious<0.9|PyJWT|python-json-logger==0.1.11|requests==2.22.0|scipy>1.2,<1.4|tabulate==0.8.3|tqdm==4.42.1|google-cloud-storage==1.25.* --master-machine-type=n1-highmem-8 --master-boot-disk-size=100GB --num-master-local-ssds=0 --num-secondary-workers=0 --num-worker-local-ssds=0 --num-workers=2 --secondary-worker-boot-disk-size=200GB --worker-boot-disk-size=200GB --worker-machine-type=n1-highmem-8 --zone=us-west1-c --initialization-action-timeout=20m --labels=creator=weisburd_broadinstitute_org --max-idle=30m
ERROR: (gcloud.beta.dataproc.clusters.create) INVALID_ARGUMENT: Zone 'seqr-project/us-west1-c' resides in unsupported region 'https://www.googleapis.com/compute/v1/projects/seqr-project/regions/us-west1'. Supported regions: [us-central1]
Traceback (most recent call last):
File "/usr/local/bin/hailctl", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.7/site-packages/hailtop/hailctl/__main__.py", line 100, in main
cli.main(args)
File "/usr/local/lib/python3.7/site-packages/hailtop/hailctl/dataproc/cli.py", line 122, in main
jmp[args.module].main(args, pass_through_args)
File "/usr/local/lib/python3.7/site-packages/hailtop/hailctl/dataproc/start.py", line 369, in main
gcloud.run(cmd[1:])
File "/usr/local/lib/python3.7/site-packages/hailtop/hailctl/dataproc/gcloud.py", line 9, in run
return subprocess.check_call(["gcloud"] + command)
File "/usr/local/lib/python3.7/subprocess.py", line 341, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['gcloud', 'beta', 'dataproc', 'clusters', 'create', 'seqr-loading-cluster', '--image-version=1.4-debian9', '--properties=^|||^spark:spark.task.maxFailures=20|||spark:spark.driver.extraJavaOptions=-Xss4M|||spark:spark.executor.extraJavaOptions=-Xss4M|||spark:spark.speculation=true|||hdfs:dfs.replication=1|||dataproc:dataproc.logging.stackdriver.enable=false|||dataproc:dataproc.monitoring.stackdriver.enable=false|||spark:spark.driver.memory=41g', '--initialization-actions=gs://hail-common/hailctl/dataproc/0.2.61/init_notebook.py,gs://hail-common/hailctl/dataproc/0.2.61/vep-GRCh37.sh', '--metadata=^|||^VEP_REPLICATE=us|||VEP_CONFIG_PATH=/vep_data/vep-gcloud.json|||VEP_CONFIG_URI=file:///vep_data/vep-gcloud.json|||WHEEL=gs://hail-common/hailctl/dataproc/0.2.61/hail-0.2.61-py3-none-any.whl|||PKGS=google-api-python-client|luigi|aiohttp>=3.6,<3.7|aiohttp_session>=2.7,<2.8|asyncinit>=0.2.4,<0.3|bokeh>1.1,<1.3|decorator<5|Deprecated>=1.2.10,<1.3|dill>=0.3.1.1,<0.4|gcsfs==0.2.2|humanize==1.0.0|hurry.filesize==0.9|nest_asyncio|numpy<2|pandas>0.24,<0.26|parsimonious<0.9|PyJWT|python-json-logger==0.1.11|requests==2.22.0|scipy>1.2,<1.4|tabulate==0.8.3|tqdm==4.42.1|google-cloud-storage==1.25.*', '--master-machine-type=n1-highmem-8', '--master-boot-disk-size=100GB', '--num-master-local-ssds=0', '--num-secondary-workers=0', '--num-worker-local-ssds=0', '--num-workers=2', '--secondary-worker-boot-disk-size=200GB', '--worker-boot-disk-size=200GB', '--worker-machine-type=n1-highmem-8', '--zone=us-west1-c', '--initialization-action-timeout=20m', '--labels=creator=weisburd_broadinstitute_org', '--max-idle=30m']' returned non-zero exit status 1.
2ea769730120:/]$
2ea769730120:/]$ hailctl dataproc start --pkgs luigi,google-api-python-client --zone us-central1-a --vep GRCh37 --max-idle 30m --num-workers 2 --num-preemptible-workers 0 seqr-loading-cluster
Pulling VEP data from bucket in us.
gcloud beta dataproc clusters create \
seqr-loading-cluster \
--image-version=1.4-debian9 \
--properties=^|||^spark:spark.task.maxFailures=20|||spark:spark.driver.extraJavaOptions=-Xss4M|||spark:spark.executor.extraJavaOptions=-Xss4M|||spark:spark.speculation=true|||hdfs:dfs.replication=1|||dataproc:dataproc.logging.stackdriver.enable=false|||dataproc:dataproc.monitoring.stackdriver.enable=false|||spark:spark.driver.memory=41g \
--initialization-actions=gs://hail-common/hailctl/dataproc/0.2.61/init_notebook.py,gs://hail-common/hailctl/dataproc/0.2.61/vep-GRCh37.sh \
--metadata=^|||^VEP_REPLICATE=us|||VEP_CONFIG_PATH=/vep_data/vep-gcloud.json|||VEP_CONFIG_URI=file:///vep_data/vep-gcloud.json|||WHEEL=gs://hail-common/hailctl/dataproc/0.2.61/hail-0.2.61-py3-none-any.whl|||PKGS=luigi|aiohttp>=3.6,<3.7|aiohttp_session>=2.7,<2.8|asyncinit>=0.2.4,<0.3|bokeh>1.1,<1.3|decorator<5|Deprecated>=1.2.10,<1.3|dill>=0.3.1.1,<0.4|gcsfs==0.2.2|humanize==1.0.0|hurry.filesize==0.9|nest_asyncio|numpy<2|pandas>0.24,<0.26|parsimonious<0.9|PyJWT|python-json-logger==0.1.11|requests==2.22.0|scipy>1.2,<1.4|tabulate==0.8.3|tqdm==4.42.1|google-cloud-storage==1.25.*|google-api-python-client \
--master-machine-type=n1-highmem-8 \
--master-boot-disk-size=100GB \
--num-master-local-ssds=0 \
--num-secondary-workers=0 \
--num-worker-local-ssds=0 \
--num-workers=2 \
--secondary-worker-boot-disk-size=200GB \
--worker-boot-disk-size=200GB \
--worker-machine-type=n1-highmem-8 \
--zone=us-central1-a \
--initialization-action-timeout=20m \
--labels=creator=weisburd_broadinstitute_org \
--max-idle=30m
Starting cluster 'seqr-loading-cluster'...
Waiting on operation [projects/seqr-project/regions/us-central1/operations/3c0dbb10-1840-364f-bfee-7191e9a7bfdf].
Waiting for cluster creation operation...
WARNING: For PD-Standard without local SSDs, we strongly recommend provisioning 1TB or larger to ensure consistently high I/O performance. See https://cloud.google.com/compute/docs/disks/performance for information on disk I/O performance.
Waiting for cluster creation operation...⠧
From googling your error message, I came across https://ismailyenigul.wordpress.com/2018/07/13/gcloud-install-error-on-linux-error-gcloud-components-list-failed-to-fetch-component-listing-from-server/ so one thing you might want to try is to edit the docker-compose.yml file to disable net.ipv6.conf.all.disable_ipv6 as follows:
pipeline-runner:
  image: gcr.io/seqr-project/pipeline-runner:gcloud-prod
  volumes:
    - ./data/seqr-reference-data:/seqr-reference-data
    - ./data/vep_data:/vep_data
    - ./data/input_vcfs:/input_vcfs
    - ~/.config:/root/.config
  sysctls:
    net.ipv6.conf.all.disable_ipv6: 1
  depends_on:
    elasticsearch:
      condition: service_healthy
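If you do try that, one way to apply the change and confirm it took effect (assuming the same service names; reading /proc is just a generic way to inspect the sysctl) might be roughly:
docker-compose up -d pipeline-runner   # recreate the container so the new sysctls entry is applied
docker-compose exec pipeline-runner cat /proc/sys/net/ipv6/conf/all/disable_ipv6   # expect "1"
docker-compose exec pipeline-runner gcloud components list   # if IPv6 was the culprit, this should now reach the component server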
-
Is it also possible that the
Which gave that same location issue, but when I used
-
Describe the bug
I cannot successfully run the hailctl dataproc step of the seqr data upload procedure on GCP. It fails with:
"You do not currently have this command group installed. Using it
requires the installation of components: [beta]"
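One thing I may try is pre-installing the beta component inside the container before running hailctl, so the interactive component install is skipped (a sketch, assuming gcloud components can be installed in this image):
docker-compose exec pipeline-runner gcloud components install beta --quiet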
Link to page(s) where bug is occurring
Following step 6 of https://github.com/broadinstitute/seqr/blob/master/deploy/LOCAL_INSTALL.md
Scope of the bug
all projects
Screenshots
I had to change the zone and region also.
cd2c9b99d40b:/hail-elasticsearch-pipelines/luigi_pipeline]$ hailctl dataproc start --pkgs luigi,google-api-python-client --zone us-west1-c --vep GRCh37 --max-idle 30m --num-workers 2 --num-preemptible-workers 12 seqr-loading-cluster
Traceback (most recent call last):
File "/usr/local/bin/hailctl", line 8, in
sys.exit(main())
File "/usr/local/lib/python3.7/site-packages/hailtop/hailctl/main.py", line 100, in main
cli.main(args)
File "/usr/local/lib/python3.7/site-packages/hailtop/hailctl/dataproc/cli.py", line 122, in main
jmp[args.module].main(args, pass_through_args)
File "/usr/local/lib/python3.7/site-packages/hailtop/hailctl/dataproc/start.py", line 274, in main
raise RuntimeError("Could not determine dataproc region. Use --region argument to hailctl, or use
gcloud config set dataproc/region <my-region>
to set a default.")RuntimeError: Could not determine dataproc region. Use --region argument to hailctl, or use
gcloud config set dataproc/region <my-region>
to set a default.cd2c9b99d40b:/hail-elasticsearch-pipelines/luigi_pipeline]$ hailctl dataproc start --pkgs luigi,google-api-python-client --region us-west1 --zone us-west1-c --vep GRCh37 --max-idle 30m --num-workers 2 --num-preem
ptible-workers 12 seqr-loading-cluster
Pulling VEP data from bucket in us.
gcloud beta dataproc clusters create
seqr-loading-cluster
--image-version=1.4-debian9
--properties=^|||^spark:spark.task.maxFailures=20|||spark:spark.driver.extraJavaOptions=-Xss4M|||spark:spark.executor.extraJavaOptions=-Xss4M|||spark:spark.speculation=true|||hdfs:dfs.replication=1|||dataproc:dataproc.logging.stackdriver.enable=false|||dataproc:dataproc.monitoring.stackdriver.enable=false|||spark:spark.driver.memory=41g
--initialization-actions=gs://hail-common/hailctl/dataproc/0.2.61/init_notebook.py,gs://hail-common/hailctl/dataproc/0.2.61/vep-GRCh37.sh
--metadata=^|||^VEP_REPLICATE=us|||VEP_CONFIG_PATH=/vep_data/vep-gcloud.json|||VEP_CONFIG_URI=file:///vep_data/vep-gcloud.json|||WHEEL=gs://hail-common/hailctl/dataproc/0.2.61/hail-0.2.61-py3-none-any.whl|||PKGS=google-api-python-client|luigi|aiohttp>=3.6,<3.7|aiohttp_session>=2.7,<2.8|asyncinit>=0.2.4,<0.3|bokeh>1.1,<1.3|decorator<5|Deprecated>=1.2.10,<1.3|dill>=0.3.1.1,<0.4|gcsfs==0.2.2|humanize==1.0.0|hurry.filesize==0.9|nest_asyncio|numpy<2|pandas>0.24,<0.26|parsimonious<0.9|PyJWT|python-json-logger==0.1.11|requests==2.22.0|scipy>1.2,<1.4|tabulate==0.8.3|tqdm==4.42.1|google-cloud-storage==1.25.*
--master-machine-type=n1-highmem-8
--master-boot-disk-size=100GB
--num-master-local-ssds=0
--num-secondary-workers=12
--num-worker-local-ssds=0
--num-workers=2
--secondary-worker-boot-disk-size=200GB
--worker-boot-disk-size=200GB
--worker-machine-type=n1-highmem-8
--region=us-west1
--zone=us-west1-c
--initialization-action-timeout=20m
--max-idle=30m
Starting cluster 'seqr-loading-cluster'...
You do not currently have this command group installed. Using it
requires the installation of components: [beta]
ERROR: The component listing for Cloud SDK version [286.0.0] could not be found. Make sure this is a valid archived Cloud SDK version.
ERROR: (gcloud) Failed to fetch component listing from server. Check your network settings and try again.
Traceback (most recent call last):
File "/usr/local/bin/hailctl", line 8, in
sys.exit(main())
File "/usr/local/lib/python3.7/site-packages/hailtop/hailctl/main.py", line 100, in main
cli.main(args)
File "/usr/local/lib/python3.7/site-packages/hailtop/hailctl/dataproc/cli.py", line 122, in main
jmp[args.module].main(args, pass_through_args)
File "/usr/local/lib/python3.7/site-packages/hailtop/hailctl/dataproc/start.py", line 369, in main
gcloud.run(cmd[1:])
File "/usr/local/lib/python3.7/site-packages/hailtop/hailctl/dataproc/gcloud.py", line 9, in run
return subprocess.check_call(["gcloud"] + command)
File "/usr/local/lib/python3.7/subprocess.py", line 341, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['gcloud', 'beta', 'dataproc', 'clusters', 'create', 'seqr-loading-cluster', '--image-version=1.4-debian9', '--properties=^|||^spark:spark.task.maxFailures=20|||spark:spark.driver.extraJavaOptions=-Xss4M|||spark:spark.executor.extraJavaOptions=-Xss4M|||spark:spark.speculation=true|||hdfs:dfs.replication=1|||dataproc:dataproc.logging.stackdriver.enable=false|||dataproc:dataproc.monitoring.stackdriver.enable=false|||spark:spark.driver.memory=41g', '--initialization-actions=gs://hail-common/hailctl/dataproc/0.2.61/init_notebook.py,gs://hail-common/hailctl/dataproc/0.2.61/vep-GRCh37.sh', '--metadata=^|||^VEP_REPLICATE=us|||VEP_CONFIG_PATH=/vep_data/vep-gcloud.json|||VEP_CONFIG_URI=file:///vep_data/vep-gcloud.json|||WHEEL=gs://hail-common/hailctl/dataproc/0.2.61/hail-0.2.61-py3-none-any.whl|||PKGS=google-api-python-client|luigi|aiohttp>=3.6,<3.7|aiohttp_session>=2.7,<2.8|asyncinit>=0.2.4,<0.3|bokeh>1.1,<1.3|decorator<5|Deprecated>=1.2.10,<1.3|dill>=0.3.1.1,<0.4|gcsfs==0.2.2|humanize==1.0.0|hurry.filesize==0.9|nest_asyncio|numpy<2|pandas>0.24,<0.26|parsimonious<0.9|PyJWT|python-json-logger==0.1.11|requests==2.22.0|scipy>1.2,<1.4|tabulate==0.8.3|tqdm==4.42.1|google-cloud-storage==1.25.*', '--master-machine-type=n1-highmem-8', '--master-boot-disk-size=100GB', '--num-master-local-ssds=0', '--num-secondary-workers=12', '--num-worker-local-ssds=0', '--num-workers=2', '--secondary-worker-boot-disk-size=200GB', '--worker-boot-disk-size=200GB', '--worker-machine-type=n1-highmem-8', '--region=us-west1', '--zone=us-west1-c', '--initialization-action-timeout=20m', '--max-idle=30m']' returned non-zero exit status 1.
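For reference, the first traceback above suggests setting a default dataproc region; that could be done once inside the container so later hailctl runs don't need the --region/--zone flags (the values here just mirror the ones used above):
gcloud config set dataproc/region us-west1
gcloud config set compute/zone us-west1-c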