Commit 94ec82b

EH: CS-903: auto (un)installation not covered by the installation guide
1 parent 419b272 commit 94ec82b

6 files changed

+588
-81
lines changed

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
1+
# Download Product Packages
2+
3+
For clusters intended for production environments, it is highly recommended to use pre-built packages by xxQS_COMPANY_NAMExx. xxQS_COMPANY_NAMExx ensures that all source code components used to build the packages are compatible with each other. The packages are built and carefully tested.
4+
5+
xxQS_COMPANY_NAMExx offers patch releases for pre-built packages, along with support services, to ensure that production clusters receive the latest fixes and security enhancements. Professional engineers are available to assist with any questions.
6+
7+
Additionally, the packages from xxQS_COMPANY_NAMExx contain product enhancements that would not be available in packages that you built yourself.
8+
9+
To receive a quote, please contact us at [xxQS_COMPANY_MAILxx](mailto:xxQS_COMPANY_MAILxx) or fill in and send the following [questionnaire](https://www.hpc-gridware.com/quote/).
10+
11+
The core xxQS_NAMExx code is available on GitHub. You can clone the required repositories and build the core product yourself, or use the nightly build. Please note that we do not provide support for these packages. It is not recommended to use the nightly build for production systems as it contains untested code that is still in development.
12+
13+
The download of the pre-built packages is available at [xxQS_COMPANY_NAMExx Downloads](https://www.hpc-gridware.com/download-main).
14+
15+
For a product installation you need a set of *tar.gz* files. Required are:
16+
17+
* the common package containing architecture-independent files (file names *gcs-`<version>`-common.\**, e.g. *gcs-9.0.0-common.tar.gz*)
18+
19+
* one architecture-specific package for each supported compute platform (file names *gcs-`<version>`-bin-`<os>`-`<platform>`.\**, e.g. *gcs-9.0.0-bin-lx-amd64.tar.gz*)
20+
21+
* the gcs-`<version>`-md5sum.txt file
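
Before continuing, it can help to confirm that all three kinds of files are present. The following is a minimal sketch and not part of the product: the function name `check_download` is hypothetical, the file names follow the patterns listed above, and the demo uses empty stand-in files in a scratch directory instead of real packages.

```shell
#!/bin/sh
# check_download <dir> <version>: report required files that are missing.
# Hypothetical helper; returns 0 only if nothing is missing.
check_download() {
    dir=$1
    version=$2
    missing=0
    [ -f "$dir/gcs-$version-common.tar.gz" ] || { echo "missing: common package"; missing=1; }
    # At least one architecture-specific package must exist.
    set -- "$dir"/gcs-"$version"-bin-*.tar.gz
    [ -f "$1" ] || { echo "missing: architecture-specific package"; missing=1; }
    [ -f "$dir/gcs-$version-md5sum.txt" ] || { echo "missing: md5sum file"; missing=1; }
    return "$missing"
}

# Demo with empty stand-in files; point it at your download directory instead.
demo_dir=$(mktemp -d)
touch "$demo_dir/gcs-9.0.0-common.tar.gz" "$demo_dir/gcs-9.0.0-md5sum.txt"
if report=$(check_download "$demo_dir" 9.0.0); then
    echo "download complete"
else
    echo "$report"
fi
```

In the demo the architecture-specific package is deliberately left out, so the helper reports it as missing.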
22+
23+
Additionally, you will also find product documentation, release notes and other packages for product extensions on the download page.
24+
25+
Once you have downloaded all packages, you can test and install them at the designated installation location. In the instructions below, the placeholder `<install-dir>` refers to the absolute path of the installation directory, while `<download-dir>` refers to the directory containing the downloaded files.
26+
27+
1. Copy the packages from your download location into the installation directory
28+
29+
```
30+
% cp <download-dir>/gcs-* <install-dir>
31+
```
32+
33+
2. Check whether the files were downloaded correctly by calculating their MD5 checksums.
34+
35+
```
36+
% cd <install-dir>
37+
% md5 gcs-*
38+
...
39+
% cat gcs-9.0.0-md5sum.txt
40+
...
41+
```
42+
43+
Compare the output of the md5 command with that of the cat command. If one or more checksums do not match, re-download the affected files and repeat the previous steps; otherwise continue.
44+
45+
3. As root, unpack the packages, set the SGE_ROOT variable manually, and execute the script *util/setfileperm.sh* to verify and adapt the ownership and file permissions of the unpacked files.
46+
47+
```
48+
% su
49+
# cd <install-dir>
50+
# for f in gcs-*.tar.gz; do tar xfz "$f"; done
51+
# SGE_ROOT=<install-dir>
52+
# util/setfileperm.sh $SGE_ROOT
53+
```
54+
55+
4. If your `<install-dir>` is located on a shared filesystem available on all hosts in the cluster then you can start the installation process.
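
The manual comparison in step 2 can also be automated. The sketch below rests on two assumptions that the guide does not confirm: that the GNU `md5sum` tool is available (the guide itself uses `md5`), and that the checksum file uses the `<md5>  <filename>` layout written by `md5sum`. The function name `verify_packages` is hypothetical, and the demo works on a stand-in file in a scratch directory.

```shell
#!/bin/sh
# verify_packages <dir> <list>: succeed only if every file named in <list>
# matches its recorded checksum.
# ASSUMPTION: <list> uses the "<md5>  <filename>" layout written by md5sum.
verify_packages() {
    dir=$1
    list=$2
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$dir" && md5sum -c "$list" >/dev/null 2>&1 )
}

# Self-contained demo with a stand-in package file.
demo=$(mktemp -d)
printf 'payload\n' > "$demo/gcs-demo-common.tar.gz"
( cd "$demo" && md5sum gcs-demo-common.tar.gz > gcs-demo-md5sum.txt )

if verify_packages "$demo" gcs-demo-md5sum.txt; then
    status=ok
else
    status=mismatch
fi
echo "checksum status: $status"
```

If the layout of the real checksum file differs, compare the values by eye as described in step 2.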
56+
57+
[//]: # (Each file has to end with two empty lines)
58+

doc/markdown/manual/installation-guide/02_installation.md renamed to doc/markdown/manual/installation-guide/03_installation.md

Lines changed: 236 additions & 77 deletions
@@ -3,79 +3,9 @@
33
Once you have gathered the necessary information as outlined in previous chapters, you may proceed with the
44
installation process for xxQS_NAMExx.
55

6-
## Download Product Packages
7-
8-
For clusters intended for production environments, it is highly recommended to use pre-built packages by xxQS_COMPANY_NAMExx.
9-
xxQS_COMPANY_NAMExx ensures that all source code components used to build the packages are compatible with each other.
10-
The packages are built and carefully tested.
11-
12-
xxQS_COMPANY_NAMExx offers patch releases for pre-built packages, along with support services to ensure that productive
13-
clusters receive the latest fixes and security enhancements. Professional engineers are available to provide
14-
assistance in case of any questions.
15-
16-
Additionally, the packages from xxQS_COMPANY_NAMExx contain product enhancements that would not be available in packages
17-
that you built yourself.
18-
19-
To receive a quote, please contact us at [xxQS_COMPANY_MAILxx](mailto:xxQS_COMPANY_MAILxx) or fill and send following
20-
[Questionnaire](https://www.hpc-gridware.com/quote/).
21-
22-
The core xxQS_NAMExx code is available on GitHub. You can clone the required repositories and build the core product
23-
yourself, or use the nightly build. Please note that we do not provide support for these packages. It is not
24-
recommended to use the nightly build for production systems as it contains untested code that is still in development.
25-
26-
The download of the pre-built packages is available at [xxQS_COMPANY_NAMExx Downloads](https://www.hpc-gridware.com/download-main).
27-
28-
For a product installation you need a set of *tar.gz* files. Required are:
29-
30-
* the common package containing architecture independent files (the file names *gcs-`<version>`-common.\** e.g. *gcs-9.0.0-common.tar.gz*)
31-
32-
* one architecture specific package for each supported compute platform (files with the names *gcs-`<version>`-bin-`<os>`-`<platform>`.\** e.g. *gcs-9.0.0-bin-lx-amd64.tar.gz*)
33-
34-
* the gcs-`<version>`-md5sum.txt file
35-
36-
Additionally, you will also find product documentation, release notes and other packages for product extensions on the
37-
download page.
38-
39-
Once you have downloaded all packages, you can test and install them at the designated installation location.
40-
Please note in the instructions below the placeholder `<install-dir>` refers to the absolute path of the
41-
installation directory, while `<download-dir>` refers to the directory containing the downloaded files.
42-
43-
1. Copy the packages from your download location into the installation directory
44-
45-
```
46-
% cp <download-dir>/gcs-* <install-dir>
47-
```
48-
49-
2. Check if the downloaded files where downloaded correctly by calculating the MD5 checksum.
50-
51-
```
52-
% cd <install-dir>
53-
% md5 gcs-*
54-
...
55-
% cat gcs-9.0.0-md5sum.txt
56-
...
57-
```
58-
59-
Compare the output of the md5 command with that of the cat command. If one or more checksums are not correct then
60-
re-download the faulty files and repeat the previous steps, otherwise continue.
61-
62-
3. Unpack the packages as root and set the SGE_ROOT variable manually and execute the script *util/setfileperm.sh* to verify and adapt ownership and file permissions of the unpacked files.
63-
64-
```
65-
% su
66-
# cd <install-dir>
67-
# tar xfz gcs-*.tar.gz
68-
# SGE_ROOT=<install-dir>
69-
# util/setfileperm.sh $SGE_ROOT
70-
```
71-
72-
4. If your `<install-dir>` is located on a shared filesystem available on all hosts in the cluster then you can start the installation process.
73-
746
## Manual Installation
757

76-
This section covers the manual installation process on the command line on Linux hosts. Note the prerequisites are
77-
required as outlined in previous chapters. If the hostname setup, usernames and service configuration are correct
78-
for all hosts that you intend to include in you cluster, then you can continue with the installation the master service.
8+
This section covers the manual installation process on the command line. The prerequisites outlined in the previous chapters must be met. If the hostname setup, usernames and service configuration are correct for all hosts that you intend to include in your cluster, you can continue with the installation of the master service.
799

8010
### Installation of the Master Service
8111

@@ -613,19 +543,248 @@ Here are the steps required to complete the installation.
613543
614544
You are reaching the end of the manual installation.
615545
546+
### Installation of the Execution Service
547+
548+
During the execution host installation procedure, the following steps are performed:
549+
550+
* The installer tests that the master service is running and that the execution host is able to communicate with it.
551+
552+
* An appropriate directory hierarchy is created as required by the `sge_execd` service.
553+
554+
* The `sge_execd` service is started and basic tests of its functionality are executed.
555+
556+
* The host is added to a default queue (optional)
557+
558+
Here are the steps required to complete the installation.
559+
560+
1. Log in as user root on an execution host.
561+
562+
2. Source the settings file that was created during the master service installation or set the SGE_ROOT environment variable manually. This Installation Guide assumes that the installation directory is available on all hosts in the same location.
563+
564+
```
565+
# . <install_dir>/<cell_name>/common/settings.sh
566+
# cd $SGE_ROOT
567+
```
568+
569+
4. Verify that the execution host has been declared as an administrative host. Do this by executing the following `qconf` command on the master machine. The host list should contain the hostname of the new execution host. If it does not, add the hostname to the list of administrative hosts by executing `qconf -ah <hostname>` on the master machine.
570+
571+
```
572+
# qconf -sh
573+
...
574+
```
575+
5. Start the installation process by executing the `install_execd` script and read and follow the given instructions.
576+
577+
```
578+
# ./install_execd
579+
Welcome to the Cluster Scheduler execution host installation
580+
------------------------------------------------------------
581+
582+
If you haven't installed the Cluster Scheduler qmaster host yet, you must execute
583+
this step (with >install_qmaster<) prior the execution host installation.
584+
585+
For a successful installation you need a running Cluster Scheduler qmaster. It is
586+
also necessary that this host is an administrative host.
587+
588+
You can verify your current list of administrative hosts with
589+
the command:
590+
591+
# qconf -sh
592+
593+
You can add an administrative host with the command:
594+
595+
# qconf -ah <hostname>
596+
597+
The execution host installation will take approximately 5 minutes.
598+
599+
Hit <RETURN> to continue >>
600+
```
601+
602+
6. Confirm the installation directory. The suggested default is the directory you set in the master service installation.
603+
604+
```
605+
Checking $SGE_ROOT directory
606+
----------------------------
607+
608+
The Cluster Scheduler root directory is:
609+
610+
$SGE_ROOT = <installation_directory>
611+
612+
If this directory is not correct (e.g. it may contain an automounter
613+
prefix) enter the correct path to this directory or hit <RETURN>
614+
to use default [<installation_directory>] >>
615+
```
616+
617+
7. Confirm the cell directory. The suggested default is the directory you set in the master service installation. You can enter a different cell name if you intend to start the execution service in a different cell.
618+
619+
```
620+
Cluster Scheduler cells
621+
-----------------------
622+
623+
Please enter cell name which you used for the qmaster
624+
installation or press <RETURN> to use [default] >>
625+
```
626+
627+
8. Confirm the detected execution daemon TCP/IP port number.
628+
629+
```
630+
Cluster Scheduler TCP/IP communication service
631+
----------------------------------------------
632+
633+
The port for sge_execd is set as service.
634+
635+
sge_execd service set to port 6445
636+
637+
Hit <RETURN> to continue >>
638+
```
639+
640+
9. The installer verifies the local hostname resolution and checks whether the current host is an administrative host.
641+
642+
```
643+
Checking hostname resolving
644+
---------------------------
645+
646+
This hostname is known at qmaster as an administrative host.
647+
648+
Hit <RETURN> to continue >>
```
649+
650+
10. Specify the spooling directory for the execution host.
651+
652+
```
653+
Execd spool directory configuration
654+
-----------------------------------
655+
656+
You defined a global spool directory when you installed the master host.
657+
You can use that directory for spooling jobs from this execution host
658+
or you can define a different spool directory for this execution host.
659+
660+
ATTENTION: For most operating systems, the spool directory does not have to
661+
be located on a local disk. The spool directory can be located on a
662+
network-accessible drive. However, using a local spool directory provides
663+
better performance.
664+
665+
The spool directory is currently set to:
666+
<<<installation_directory>/default/spool/<hostname>>>
667+
668+
Do you want to configure a different spool directory
669+
for this host (y/n) [n] >>
670+
```
671+
672+
11. The installer will create a local configuration for the execution host.
673+
674+
```
675+
Creating local configuration
676+
----------------------------
677+
<admin_user>@<hostname> added "<hostname>" to configuration list
678+
Local configuration for host ><hostname>< created.
679+
680+
Hit <RETURN> to continue >>
681+
```
682+
683+
12. Now specify if you want to start the execution service automatically.
684+
685+
```
686+
execd startup script
687+
--------------------
688+
689+
We can install the startup script that will
690+
start execd at machine boot (y/n) [y] >>
691+
```
692+
693+
13. The execution service is started.
694+
695+
```
696+
Cluster Scheduler execution daemon startup
697+
------------------------------------------
698+
699+
Starting execution daemon. Please wait ...
700+
starting sge_execd
701+
702+
Hit <RETURN> to continue >>
703+
```
704+
705+
14. Specify a queue for the new host.
706+
707+
```
708+
Adding a queue for this host
709+
----------------------------
710+
711+
We can now add a queue instance for this host:
712+
713+
- it is added to the >allhosts< host group
714+
- the queue provides 32 slot(s) for jobs in all queues
715+
referencing the >allhosts< host group
716+
717+
You do not need to add this host now, but before running jobs on this host
718+
it must be added to at least one queue.
719+
720+
Do you want to add a default queue instance for this host (y/n) [y] >>
721+
```
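
The administrative host check from the steps above can be scripted before the installer is started. This is a minimal sketch: the function name `host_in_list` and the sample host names are hypothetical, and it assumes that `qconf -sh` prints one host name per line in the same form (short or fully qualified) that you pass to the function.

```shell
#!/bin/sh
# host_in_list <hostname>: succeed if <hostname> appears as a full line on stdin.
host_in_list() {
    grep -Fxq -- "$1"
}

# On a real cluster (not run here) this could be used as:
#   qconf -sh | host_in_list "$(hostname)" || qconf -ah "$(hostname)"

# Demo with made-up text standing in for the qconf -sh output.
sample_hosts='master01
exec01'
if printf '%s\n' "$sample_hosts" | host_in_list exec01; then
    result_known=yes
else
    result_known=no
fi
if printf '%s\n' "$sample_hosts" | host_in_list exec02; then
    result_new=yes
else
    result_new=no
fi
echo "exec01 listed: $result_known, exec02 listed: $result_new"
```

Matching whole lines avoids false positives when one host name is a prefix of another.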
722+
616723
## Automatic Installation
617724

618-
## Backup and Restore
725+
The automatic installation is based on the manual installation process: the installer reads a configuration file with predefined answers to the questions that would normally be asked during an interactive installation. Prepare this configuration file and pass it to the installation script as the argument of the `-auto` option.
726+
727+
The auto installation is also able to install services on remote hosts if either passwordless `ssh` or `rsh` access is configured for the root user on the master machine.
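
Whether passwordless access is in place can be checked up front. The sketch below uses the OpenSSH options `BatchMode` and `ConnectTimeout`; with `BatchMode=yes`, `ssh` fails instead of prompting for a password. The function names and the `SSH_CMD` override variable are hypothetical and exist only for illustration; `rsh` is not covered.

```shell
#!/bin/sh
# can_login_without_password <host>: succeed if a non-interactive login works.
# SSH_CMD is a hypothetical override, useful for trying the script without
# a network; by default the real ssh client is used.
can_login_without_password() {
    ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$1" true >/dev/null 2>&1
}

# report_hosts <host>...: print one status line per host.
report_hosts() {
    for host in "$@"; do
        if can_login_without_password "$host"; then
            echo "$host: passwordless login ok"
        else
            echo "$host: log in and install manually"
        fi
    done
}

# Example call as root on the master machine (host names are made up):
#   report_hosts exec01 exec02
```

Hosts reported as not reachable have to be installed by logging in to each of them individually.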
728+
729+
1. Log in as root on the system where you intend to install a service.
730+
731+
2. Make a copy of the configuration template file and fill in the answers to the questions that are usually asked during the manual installation process. If the root user has no write permissions in $SGE_ROOT, choose a different path, but make sure that you preserve the file for the uninstallation process.
732+
733+
```
734+
$ cp $SGE_ROOT/util/install_modules/inst_template.conf $SGE_ROOT/my_template.conf
735+
$ vi $SGE_ROOT/my_template.conf
736+
...
737+
```
738+
739+
3. On the master machine, start the master installation.
740+
741+
```
742+
cd $SGE_ROOT
743+
./inst_sge -m -auto $SGE_ROOT/my_template.conf
744+
```
745+
746+
4. If you have specified a list of hosts in the EXEC_HOST_LIST parameter of the configuration file and you have passwordless `ssh` or `rsh` access to those hosts, you can install the execution service on them remotely from the master machine.
747+
748+
```
749+
cd $SGE_ROOT
750+
./inst_sge -x -auto $SGE_ROOT/my_template.conf
751+
```
752+
753+
If you have no passwordless `ssh` or `rsh` access to those hosts then you have to log in to each host and start the installation process manually for each host individually.
754+
755+
5. On shadow hosts install the shadow service
619756

620-
## Upgrading Open Cluster Scheduler
757+
```
758+
cd $SGE_ROOT
759+
./inst_sge -sm -auto $SGE_ROOT/my_template.conf
760+
```
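
Instead of editing the copied template by hand, individual answers can be set from a script. The following sketch assumes that the template consists of shell-style `KEY="value"` lines, which the guide does not state. The function name `set_conf_value` is hypothetical, `EXEC_HOST_LIST` is taken from the step above, and `CELL_NAME` and the host names are used only as stand-in content for the demo file.

```shell
#!/bin/sh
# set_conf_value <file> <key> <value>: rewrite the line KEY=... as KEY="value".
# ASSUMPTION: the file holds shell-style KEY="value" lines.
# The value must not contain the characters '|' or '&'.
set_conf_value() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp)
    # Replace the existing assignment; all other lines are copied unchanged.
    sed "s|^$key=.*|$key=\"$value\"|" "$file" > "$tmp" && mv "$tmp" "$file"
}

# Demo on a stand-in file; use your copy of the template instead.
conf=$(mktemp)
printf 'CELL_NAME="default"\nEXEC_HOST_LIST=""\n' > "$conf"
set_conf_value "$conf" EXEC_HOST_LIST "exec01 exec02"
grep '^EXEC_HOST_LIST=' "$conf"
```

Keeping the generated file matters because the same file is needed again for the automatic uninstallation.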
621761

622-
### Patch installation
762+
## Uninstallation
623763

624-
### Side by Side Upgrade
764+
The uninstallation of the xxQS_NAMExx software can be done manually or automatically using the configuration template created during the auto installation. If you uninstall an execution host then make sure that there are no running jobs on that host. If you uninstall manually then make sure that all execution hosts are uninstalled first before you uninstall the master host or other services.
625765

626-
## Testing the Installation/Upgrade
766+
1. Log in as root on the system where you installed a service.
627767

628-
## Troubleshooting
768+
2. Automatically uninstall the execution service on execution hosts.
769+
770+
```
771+
cd $SGE_ROOT
772+
./inst_sge -ux -auto $SGE_ROOT/my_template.conf
773+
```
774+
775+
3. Manual uninstallation of the execution component.
776+
777+
```
778+
cd $SGE_ROOT
779+
./inst_sge -ux
780+
```
781+
782+
4. Qmaster, shadow master and other services can be uninstalled the same way. To uninstall the qmaster service use the `-um` switch, for the shadow master service use the `-usm` switch. For the automatic uninstallation use the `-auto` switch with the configuration template.
783+
784+
```
785+
cd $SGE_ROOT
786+
./inst_sge ...
787+
```
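
The order of an automatic uninstallation can be sketched as a dry run that only prints the commands. The switches `-ux`, `-usm`, `-um` and `-auto` are the ones named above. Running the shadow master uninstallation before the qmaster uninstallation is an assumption; the guide only requires that execution hosts come first. The function name `print_uninstall_plan` is hypothetical.

```shell
#!/bin/sh
# print_uninstall_plan <conf>: print the uninstall commands in order.
# Nothing is executed; execution hosts come first, the qmaster last.
print_uninstall_plan() {
    conf=$1
    for switch in -ux -usm -um; do
        echo "./inst_sge $switch -auto $conf"
    done
}

# The path is single-quoted so that the printed commands keep $SGE_ROOT literal.
plan=$(print_uninstall_plan '$SGE_ROOT/my_template.conf')
printf '%s\n' "$plan"
```

Each printed command has to be run from `$SGE_ROOT` on the host that runs the corresponding service.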
629788

630789
[//]: # (Each file has to end with two empty lines)
631790
