Description

This module performs the following tasks:

  • store an HTCondor Pool password in Google Cloud Secret Manager
    • will generate a new password if one is not supplied
  • create service accounts for an HTCondor Access Point and Central Manager
  • create a Toolkit runner for an Access Point
  • create a Toolkit runner for a Central Manager

It is expected to be used with the htcondor-install and htcondor-execute-point modules.

Example

The following code snippet uses this module to create a startup script that installs HTCondor software and configures an HTCondor Central Manager. A full example can be found in the examples README.

- id: network1
  source: modules/network/pre-existing-vpc

- id: htcondor_install
  source: community/modules/scripts/htcondor-install

- id: htcondor_configure
  source: community/modules/scheduler/htcondor-configure
  use:
  - network1

- id: htcondor_central_manager_startup
  source: modules/scripts/startup-script
  settings:
    runners:
    - $(htcondor_install.install_htcondor_runner)
    - $(htcondor_configure.central_manager_runner)

- id: htcondor_cm
  source: modules/compute/vm-instance
  use:
  - network1
  - htcondor_central_manager_startup
  settings:
    name_prefix: cm0
    machine_type: c2-standard-4
    disable_public_ips: true
    service_account:
      email: $(htcondor_configure.central_manager_service_account)
      scopes:
      - cloud-platform
    network_interfaces:
    - network: null
      subnetwork: $(network1.subnetwork_self_link)
      subnetwork_project: $(vars.project_id)
      network_ip: $(htcondor_configure.central_manager_internal_ip)
      stack_type: null
      access_config: []
      ipv6_access_config: []
      alias_ip_range: []
      nic_type: VIRTIO_NET
      queue_count: null
  outputs:
  - internal_ip
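
The same pattern should carry over to an Access Point. The following is a minimal sketch rather than a tested configuration: it assumes the access_point_runner and access_point_service_account outputs (listed under Outputs) can be wired in the same way as their Central Manager counterparts above. The module IDs htcondor_access_point_startup and htcondor_access and the name prefix ap0 are hypothetical.

```yaml
# Sketch only: module IDs and name_prefix are illustrative assumptions.
- id: htcondor_access_point_startup
  source: modules/scripts/startup-script
  settings:
    runners:
    # Install HTCondor first, then apply the Access Point configuration.
    - $(htcondor_install.install_htcondor_runner)
    - $(htcondor_configure.access_point_runner)

- id: htcondor_access
  source: modules/compute/vm-instance
  use:
  - network1
  - htcondor_access_point_startup
  settings:
    name_prefix: ap0
    machine_type: c2-standard-4
    service_account:
      # Service account created by this module for Access Points.
      email: $(htcondor_configure.access_point_service_account)
      scopes:
      - cloud-platform
```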

High Availability

This module supports high availability modes of the HTCondor Central Manager and of the Access Points. In these modes, the services are distributed across two zones so that they are resilient against zonal failures. To enable high availability for the Central Manager, modify the above example by setting central_manager_high_availability to true and adding a new deployment variable, zone_secondary, set to another zone in the same region. The two VMs can use the same startup script but should differ in the following settings:

  • primary and secondary zones defined in deployment variables
  • primary and secondary IP addresses created by this module
  • name prefixes

vars:
  # add typical settings (deployment_name, project_id, etc.)
  # select a region and 2 different zones within the region
  region: us-central1
  zone: us-central1-c
  zone_secondary: us-central1-f

- id: htcondor_configure
  source: community/modules/scheduler/htcondor-configure
  use:
  - network1
  settings:
    central_manager_high_availability: true

- id: htcondor_cm_primary
  source: modules/compute/vm-instance
  use:
  - network1
  - htcondor_central_manager_startup
  settings:
    name_prefix: cm0
    machine_type: c2-standard-4
    disable_public_ips: true
    service_account:
      email: $(htcondor_configure.central_manager_service_account)
      scopes:
      - cloud-platform
    network_interfaces:
    - network: null
      subnetwork: $(network1.subnetwork_self_link)
      subnetwork_project: $(vars.project_id)
      network_ip: $(htcondor_configure.central_manager_internal_ip)
      stack_type: null
      access_config: []
      ipv6_access_config: []
      alias_ip_range: []
      nic_type: VIRTIO_NET
      queue_count: null
  outputs:
  - internal_ip

- id: htcondor_cm_secondary
  source: modules/compute/vm-instance
  use:
  - network1
  - htcondor_central_manager_startup
  settings:
    name_prefix: cm1
    machine_type: c2-standard-4
    zone: $(vars.zone_secondary)
    disable_public_ips: true
    service_account:
      email: $(htcondor_configure.central_manager_service_account)
      scopes:
      - cloud-platform
    network_interfaces:
    - network: null
      subnetwork: $(network1.subnetwork_self_link)
      subnetwork_project: $(vars.project_id)
      network_ip: $(htcondor_configure.central_manager_secondary_internal_ip)
      stack_type: null
      access_config: []
      ipv6_access_config: []
      alias_ip_range: []
      nic_type: VIRTIO_NET
      queue_count: null
  outputs:
  - internal_ip

Access Point high availability is impacted by known issues HTCONDOR-1590 and HTCONDOR-1594. These are anticipated to be resolved in LTS release 10.0.3 and above or feature release 10.4 and above. Please see HTCondor version numbering and release notes for details.
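
For Access Points, the Inputs table lists job_queue_high_availability as the corresponding (experimental) switch. A minimal sketch of enabling it, assuming it is set on this module in the same way as central_manager_high_availability and that an HTCondor release containing the fixes noted above is installed:

```yaml
# Sketch only: experimental setting, subject to the known issues noted above.
- id: htcondor_configure
  source: community/modules/scheduler/htcondor-configure
  use:
  - network1
  settings:
    job_queue_high_availability: true
```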

Support

HTCondor is maintained by the Center for High Throughput Computing at the University of Wisconsin-Madison. Support for HTCondor is available via:

License

Copyright 2022 Google LLC

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Requirements

| Name | Version |
|------|---------|
| terraform | >= 0.13.0 |
| google | >= 3.83 |
| random | >= 3.0 |

Providers

| Name | Version |
|------|---------|
| google | >= 3.83 |
| random | >= 3.0 |

Modules

| Name | Source | Version |
|------|--------|---------|
| access_point_service_account | terraform-google-modules/service-accounts/google | ~> 4.1 |
| address | terraform-google-modules/address/google | ~> 3.0 |
| central_manager_service_account | terraform-google-modules/service-accounts/google | ~> 4.1 |
| execute_point_service_account | terraform-google-modules/service-accounts/google | ~> 4.1 |
| health_check_firewall_rule | terraform-google-modules/network/google//modules/firewall-rules | ~> 6.0 |

Resources

| Name | Type |
|------|------|
| google_secret_manager_secret.pool_password | resource |
| google_secret_manager_secret_iam_member.access_point | resource |
| google_secret_manager_secret_iam_member.central_manager | resource |
| google_secret_manager_secret_iam_member.execute_point | resource |
| google_secret_manager_secret_version.pool_password | resource |
| random_password.pool | resource |
| google_compute_subnetwork.htcondor | data source |

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| access_point_roles | Project-wide roles for HTCondor Access Point service account | list(string) | ["roles/compute.instanceAdmin", "roles/monitoring.metricWriter", "roles/logging.logWriter", "roles/storage.objectViewer"] | no |
| central_manager_high_availability | Provision HTCondor central manager in high availability mode | bool | false | no |
| central_manager_roles | Project-wide roles for HTCondor Central Manager service account | list(string) | ["roles/monitoring.metricWriter", "roles/logging.logWriter", "roles/storage.objectViewer"] | no |
| deployment_name | HPC Toolkit deployment name. HTCondor cloud resource names will include this value. | string | n/a | yes |
| execute_point_roles | Project-wide roles for HTCondor Execute Point service account | list(string) | ["roles/monitoring.metricWriter", "roles/logging.logWriter", "roles/storage.objectViewer"] | no |
| job_queue_high_availability | Provision HTCondor access points in high availability mode (experimental: see README) | bool | false | no |
| labels | Labels to add to resources. List key, value pairs. | map(string) | n/a | yes |
| pool_password | HTCondor Pool Password | string | null | no |
| project_id | Project in which HTCondor pool will be created | string | n/a | yes |
| region | Default region for creating resources | string | n/a | yes |
| spool_parent_dir | HTCondor access point configuration SPOOL will be set to subdirectory named "spool" | string | "/var/lib/condor" | no |
| subnetwork_self_link | The self link of the subnetwork in which Central Managers will be placed. | string | n/a | yes |
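
As an illustration of overriding optional inputs, the sketch below sets access_point_roles and spool_parent_dir. Supplying a list-valued input replaces its default rather than extending it, so the default roles that should be kept are repeated. The choice of roles/storage.objectAdmin in place of roles/storage.objectViewer and the path /shared/htcondor are hypothetical values chosen only for illustration.

```yaml
# Sketch only: the role substitution and the path are illustrative assumptions.
- id: htcondor_configure
  source: community/modules/scheduler/htcondor-configure
  use:
  - network1
  settings:
    # This list replaces the default, so the defaults to keep are repeated.
    access_point_roles:
    - roles/compute.instanceAdmin
    - roles/monitoring.metricWriter
    - roles/logging.logWriter
    - roles/storage.objectAdmin   # hypothetical: swapped in for objectViewer
    # Per the input description, SPOOL becomes /shared/htcondor/spool.
    spool_parent_dir: /shared/htcondor
```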

Outputs

| Name | Description |
|------|-------------|
| access_point_runner | Toolkit Runner to configure an HTCondor Access Point |
| access_point_service_account | HTCondor Access Point Service Account (e-mail format) |
| central_manager_internal_ip | Reserved internal IP address for use by Central Manager |
| central_manager_runner | Toolkit Runner to configure an HTCondor Central Manager |
| central_manager_secondary_internal_ip | Reserved internal IP address for use by failover Central Manager |
| central_manager_service_account | HTCondor Central Manager Service Account (e-mail format) |
| execute_point_runner | Toolkit Runner to configure an HTCondor Execute Point |
| execute_point_service_account | HTCondor Execute Point Service Account (e-mail format) |
| pool_password_secret_id | Google Cloud Secret Manager ID containing HTCondor Pool Password |
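
Besides being referenced with the $(htcondor_configure.<output>) syntax, these values can presumably be surfaced at the deployment level. The sketch below assumes that the blueprint outputs key, shown on the vm-instance module in the Example section, behaves the same way for this module.

```yaml
# Sketch only: assumes the blueprint "outputs" key works here as it does
# for the vm-instance module in the Example section.
- id: htcondor_configure
  source: community/modules/scheduler/htcondor-configure
  use:
  - network1
  outputs:
  - pool_password_secret_id
  - central_manager_internal_ip
```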