This module performs the following tasks:
- create an instance template from which execute points will be created
- create a managed instance group (MIG) for execute points
- create a Toolkit runner to configure the autoscaler to scale the MIG
It is expected to be used with the htcondor-install and htcondor-configure modules.
This module may be used exactly 1 or 2 times in a blueprint to create sets of
execute points in an HTCondor pool. If using 1 set, it may use either Spot or
On-demand pricing. If using 2 sets, one must use Spot and the other must
use On-demand pricing. Future development is planned to support more than 2
sets of VM configurations, including all pricing options. If you do not follow
the constraint above, you will likely receive an error similar to that shown
below while running terraform apply.
│ │ var.runners is list of map of string with 7 elements
│
│ All startup-script runners must have a unique destination.
│
│ This was checked by the validation rule at modules/startup-script/variables.tf:72,3-13.
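When only one set of execute points is needed, the pricing model is chosen with the spot input listed in this module's inputs. The following is a minimal sketch of a single Spot set, not a definitive configuration. It assumes that modules with the ids network1, htcondor_configure and htcondor_configure_execute_point are defined elsewhere in the blueprint, and the id htcondor_execute_point_spot is only an illustrative name.

```yaml
# Sketch only: a single set of execute points using Spot pricing.
# The module id below is an illustrative name, not a required one.
- id: htcondor_execute_point_spot
  source: community/modules/compute/htcondor-execute-point
  use:
  - network1
  - htcondor_configure_execute_point
  settings:
    spot: true  # default is false (On-demand pricing)
    service_account:
      email: $(htcondor_configure.execute_point_service_account)
      scopes:
      - cloud-platform
```

Omitting spot, or setting it to false, would give the On-demand equivalent of the same set.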
HTCondor access points provisioned by the Toolkit are specially configured to
add an attribute named RequireSpot to each Job ClassAd. When this value is true,
a job's requirements are automatically updated to require that it run on a Spot
VM. When this value is false, the requirements are similarly updated to run only
on On-Demand VMs. The default value of this attribute is false. A job submit
file may override this value as shown below.
universe = vanilla
executable = /bin/echo
arguments = "Hello, World!"
output = out.$(ClusterId).$(ProcId)
error = err.$(ClusterId).$(ProcId)
log = log.$(ClusterId).$(ProcId)
request_cpus = 1
request_memory = 100MB
+RequireSpot = true
queue
A full example can be found in the examples README.
The following code snippet creates a pool with 2 sets of HTCondor execute points, one using On-demand pricing and the other using Spot pricing. They use a startup script and network created in previous steps.
- id: htcondor_execute_point
  source: community/modules/compute/htcondor-execute-point
  use:
  - network1
  - htcondor_configure_execute_point
  settings:
    service_account:
      email: $(htcondor_configure.execute_point_service_account)
      scopes:
      - cloud-platform

- id: htcondor_execute_point_spot
  source: community/modules/compute/htcondor-execute-point
  use:
  - network1
  - htcondor_configure_execute_point
  settings:
    spot: true  # this set uses Spot pricing; the set above uses On-demand
    service_account:
      email: $(htcondor_configure.execute_point_service_account)
      scopes:
      - cloud-platform

- id: htcondor_startup_access_point
  source: modules/scripts/startup-script
  settings:
    runners:
    - $(htcondor_install.install_htcondor_runner)
    - $(htcondor_install.install_autoscaler_deps_runner)
    - $(htcondor_install.install_autoscaler_runner)
    - $(htcondor_configure.access_point_runner)
    - $(htcondor_execute_point.configure_autoscaler_runner)
    - $(htcondor_execute_point_spot.configure_autoscaler_runner)

- id: htcondor_access
  source: modules/compute/vm-instance
  use:
  - network1
  - htcondor_startup_access_point
  settings:
    name_prefix: access-point
    machine_type: c2-standard-4
    service_account:
      email: $(htcondor_configure.access_point_service_account)
      scopes:
      - cloud-platform
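Each set of execute points can also be sized independently through the machine_type, disk_size_gb, max_size and min_idle inputs described in this module's inputs. The sketch below shows how those settings might be combined. The values are illustrative assumptions rather than recommendations, and the ids network1, htcondor_configure and htcondor_configure_execute_point are assumed to be defined elsewhere in the blueprint.

```yaml
# Sketch only: values below are illustrative assumptions.
- id: htcondor_execute_point
  source: community/modules/compute/htcondor-execute-point
  use:
  - network1
  - htcondor_configure_execute_point
  settings:
    machine_type: c2-standard-16  # default is n2-standard-4
    disk_size_gb: 200             # boot disk size in GB; default is 100
    max_size: 50                  # maximum size of this set; default is 100
    min_idle: 2                   # idle VMs to keep ready; default is 0
    service_account:
      email: $(htcondor_configure.execute_point_service_account)
      scopes:
      - cloud-platform
```

As the inputs table notes, min_idle is not guaranteed once the pool reaches max_size, so the two values are best chosen together.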
HTCondor is maintained by the Center for High Throughput Computing at the University of Wisconsin-Madison. Support for HTCondor is available via:
When OS Login is used with "external users" (users outside of the Google Cloud organization), Docker universe jobs will fail and cause the Docker daemon to crash. This stems from the use of POSIX user ids (uid) outside the range supported by Docker. Please consider disabling OS Login if this atypical situation applies.
vars:
  # add setting below to existing deployment variables
  enable_oslogin: DISABLE
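This module also exposes enable_oslogin as one of its own inputs, accepting "ENABLE", "DISABLE" or "INHERIT". A possible alternative is therefore to set it on the execute points alone. This is a sketch under that assumption. Whether a module-level setting is sufficient for a given deployment should be verified, and the ids network1 and htcondor_configure_execute_point are assumed to be defined elsewhere in the blueprint.

```yaml
# Sketch only: disables OS Login for this set of execute points,
# leaving the deployment-wide variable untouched.
- id: htcondor_execute_point
  source: community/modules/compute/htcondor-execute-point
  use:
  - network1
  - htcondor_configure_execute_point
  settings:
    enable_oslogin: DISABLE  # module default is "ENABLE"
```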
Copyright 2022 Google LLC
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Name | Version |
---|---|
terraform | >= 0.13.0 |
No providers.
Name | Source | Version |
---|---|---|
execute_point_instance_template | terraform-google-modules/vm/google//modules/instance_template | ~> 8.0 |
mig | terraform-google-modules/vm/google//modules/mig | ~> 8.0 |
No resources.
Name | Description | Type | Default | Required |
---|---|---|---|---|
deployment_name | HPC Toolkit deployment name. HTCondor cloud resource names will include this value. | string | n/a | yes |
disk_size_gb | Boot disk size in GB | number | 100 | no |
enable_oslogin | Enable or Disable OS Login with "ENABLE" or "DISABLE". Set to "INHERIT" to inherit project OS Login setting. | string | "ENABLE" | no |
image | HTCondor execute point VM image | object({ | { | no |
labels | Labels to add to HTCondor execute points | map(string) | n/a | yes |
machine_type | Machine type to use for HTCondor execute points | string | "n2-standard-4" | no |
max_size | Maximum size of the HTCondor execute point pool. | number | 100 | no |
metadata | Metadata to add to HTCondor execute points | map(string) | {} | no |
min_idle | Minimum number of idle VMs in the HTCondor pool (if pool reaches var.max_size, this minimum is not guaranteed); set to ensure jobs begin running more quickly. | number | 0 | no |
network_self_link | The self link of the network HTCondor execute points will join | string | "default" | no |
network_storage | An array of network attached storage mounts to be configured | list(object({ | [] | no |
project_id | Project in which the HTCondor execute points will be created | string | n/a | yes |
region | The region in which HTCondor execute points will be created | string | n/a | yes |
service_account | Service account to attach to HTCondor execute points | object({ | { | no |
spot | Provision VMs using discounted Spot pricing, allowing for preemption | bool | false | no |
startup_script | Startup script to run at boot-time for HTCondor execute points | string | null | no |
subnetwork_self_link | The self link of the subnetwork HTCondor execute points will join | string | null | no |
target_size | Initial size of the HTCondor execute point pool; set to null (default) to avoid Terraform management of size. | number | null | no |
zone | The default zone in which resources will be created | string | n/a | yes |
Name | Description |
---|---|
configure_autoscaler_runner | Toolkit runner to configure the HTCondor autoscaler |