|
3 | 3 | Once you have gathered the necessary information as outlined in previous chapters, you may proceed with the
|
4 | 4 | installation process for xxQS_NAMExx.
|
5 | 5 |
|
6 |
| -## Download Product Packages |
7 |
| - |
8 |
| -For clusters intended for production environments, it is highly recommended to use pre-built packages by xxQS_COMPANY_NAMExx. |
9 |
| -xxQS_COMPANY_NAMExx ensures that all source code components used to build the packages are compatible with each other. |
10 |
| -The packages are built and carefully tested. |
11 |
| - |
12 |
| -xxQS_COMPANY_NAMExx offers patch releases for pre-built packages, along with support services to ensure that productive |
13 |
| -clusters receive the latest fixes and security enhancements. Professional engineers are available to provide |
14 |
| -assistance in case of any questions. |
15 |
| - |
16 |
| -Additionally, the packages from xxQS_COMPANY_NAMExx contain product enhancements that would not be available in packages |
17 |
| -that you built yourself. |
18 |
| - |
19 |
| -To receive a quote, please contact us at [xxQS_COMPANY_MAILxx](mailto:xxQS_COMPANY_MAILxx) or fill out and send the following |
20 |
| -[Questionnaire](https://www.hpc-gridware.com/quote/). |
21 |
| - |
22 |
| -The core xxQS_NAMExx code is available on GitHub. You can clone the required repositories and build the core product |
23 |
| -yourself, or use the nightly build. Please note that we do not provide support for these packages. It is not |
24 |
| -recommended to use the nightly build for production systems as it contains untested code that is still in development. |
25 |
| - |
26 |
| -The download of the pre-built packages is available at [xxQS_COMPANY_NAMExx Downloads](https://www.hpc-gridware.com/download-main). |
27 |
| - |
28 |
| -For a product installation you need a set of *tar.gz* files. Required are: |
29 |
| - |
30 |
| -* the common package containing architecture independent files (the file names *gcs-`<version>`-common.\** e.g. *gcs-9.0.0-common.tar.gz*) |
31 |
| - |
32 |
| -* one architecture specific package for each supported compute platform (files with the names *gcs-`<version>`-bin-`<os>`-`<platform>`.\** e.g. *gcs-9.0.0-bin-lx-amd64.tar.gz*) |
33 |
| - |
34 |
| -* the gcs-`<version>`-md5sum.txt file |
35 |
| - |
36 |
| -Additionally, you will also find product documentation, release notes and other packages for product extensions on the |
37 |
| -download page. |
38 |
| - |
39 |
| -Once you have downloaded all packages, you can test and install them at the designated installation location. |
40 |
| -Please note in the instructions below the placeholder `<install-dir>` refers to the absolute path of the |
41 |
| -installation directory, while `<download-dir>` refers to the directory containing the downloaded files. |
42 |
| - |
43 |
| -1. Copy the packages from your download location into the installation directory |
44 |
| - |
45 |
| - ``` |
46 |
| - % cp <download-dir>/gcs-* <install-dir> |
47 |
| - ``` |
48 |
| -
|
49 |
| -2. Check whether the files were downloaded correctly by calculating the MD5 checksum. |
50 |
| -
|
51 |
| - ``` |
52 |
| - % cd <install-dir> |
53 |
| - % md5 gcs-* |
54 |
| - ... |
55 |
| - % cat gcs-9.0.0-md5sum.txt |
56 |
| - ... |
57 |
| - ``` |
58 |
| - |
59 |
| - Compare the output of the md5 command with that of the cat command. If one or more checksums are not correct then |
60 |
| - re-download the faulty files and repeat the previous steps, otherwise continue. |
61 |
| -
|
62 |
| -3. Unpack the packages as root, set the SGE_ROOT variable manually, and execute the script *util/setfileperm.sh* to verify and adapt ownership and file permissions of the unpacked files. |
63 |
| -
|
64 |
| - ``` |
65 |
| - % su |
66 |
| - # cd <install-dir> |
67 |
| - # tar xfz gcs-*.tar.gz |
68 |
| - # SGE_ROOT=<install-dir> |
69 |
| - # util/setfileperm.sh $SGE_ROOT |
70 |
| - ``` |
71 |
| - |
72 |
| -4. If your `<install-dir>` is located on a shared filesystem available on all hosts in the cluster then you can start the installation process. |
73 |
| -
|
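The checksum comparison in step 2 of the removed download section can be scripted rather than compared by eye. The following is a minimal sketch, not part of the product: `verify_md5` is a hypothetical helper name, it assumes GNU `md5sum` is available (the guide itself uses `md5`), and it assumes that each line of the checksum file has the form `<md5-hash>  <file-name>`.

```shell
#!/bin/sh
# Hypothetical helper: check every file listed in a checksum file.
# Assumption: each line reads "<md5-hash>  <file-name>" (md5sum format).
verify_md5() {
    checksum_file=$1
    status=0
    while read -r expected name; do
        [ -n "$name" ] || continue          # skip empty or malformed lines
        name=${name#\*}                     # drop md5sum's binary-mode marker
        if [ ! -f "$name" ]; then
            echo "MISSING $name"
            status=1
            continue
        fi
        actual=$(md5sum "$name" | cut -d' ' -f1)
        if [ "$actual" = "$expected" ]; then
            echo "OK $name"
        else
            echo "FAILED $name"
            status=1
        fi
    done < "$checksum_file"
    return $status
}
```

Run from `<install-dir>`, a call such as `verify_md5 gcs-9.0.0-md5sum.txt` returns a non-zero status if any file is missing or differs, which signals that the file should be downloaded again.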
74 | 6 | ## Manual Installation
|
75 | 7 |
|
76 |
| -This section covers the manual installation process on the command line on Linux hosts. Note the prerequisites are |
77 |
| -required as outlined in previous chapters. If the hostname setup, usernames and service configuration are correct |
78 |
| -for all hosts that you intend to include in your cluster, then you can continue with the installation of the master service. |
| 8 | +This section covers the manual installation process on the command line. Note the prerequisites are required as outlined in previous chapters. If the hostname setup, usernames and service configuration are correct for all hosts that you intend to include in your cluster, then you can continue with the installation of the master service. |
79 | 9 |
|
80 | 10 | ### Installation of the Master Service
|
81 | 11 |
|
@@ -613,19 +543,248 @@ Here are the steps required to complete the installation.
|
613 | 543 |
|
614 | 544 | You are reaching the end of the manual installation.
|
615 | 545 |
|
| 546 | +### Installation of the Execution Service |
| 547 | +
|
| 548 | +During the execution host installation, the following steps are performed: |
| 549 | +
|
| 550 | +* The installer tests that the master service is running and that the execution host is able to communicate with it. |
| 551 | +
|
| 552 | +* An appropriate directory hierarchy is created as required by the `sge_execd` service. |
| 553 | +
|
| 554 | +* The `sge_execd` service is started and basic tests of its functionality are executed. |
| 555 | +
|
| 556 | +* The host is added to a default queue (optional) |
| 557 | +
|
| 558 | +Here are the steps required to complete the installation. |
| 559 | +
|
| 560 | +1. Log in as user root on an execution host. |
| 561 | +
|
| 562 | +2. Source the settings file that was created during the master service installation or set the SGE_ROOT environment variable manually. This Installation Guide assumes that the installation directory is available on all hosts in the same location. |
| 563 | +
|
| 564 | + ``` |
| 565 | + # . <install_dir>/<cell_name>/common/settings.sh |
| 566 | + # cd $SGE_ROOT |
| 567 | + ``` |
| 568 | + |
| 569 | +4. Verify that the execution host has been declared as an administrative host. Do this by executing the following `qconf` command on the master machine. The host list should contain the hostname of the new execution host. If it does not exist, then add the hostname to the list of administrative hosts by executing `qconf -ah <hostname>` on the master machine. |
| 570 | +
|
| 571 | + ``` |
| 572 | + # qconf -sh |
| 573 | + ... |
| 574 | + ``` |
| 575 | +5. Start the installation process by executing the `install_execd` script and read and follow the given instructions. |
| 576 | +
|
| 577 | + ``` |
| 578 | + # ./install_execd |
| 579 | + Welcome to the Cluster Scheduler execution host installation |
| 580 | + ------------------------------------------------------------ |
| 581 | + |
| 582 | + If you haven't installed the Cluster Scheduler qmaster host yet, you must execute |
| 583 | + this step (with >install_qmaster<) prior the execution host installation. |
| 584 | + |
| 585 | + For a successful installation you need a running Cluster Scheduler qmaster. It is |
| 586 | + also necessary that this host is an administrative host. |
| 587 | + |
| 588 | + You can verify your current list of administrative hosts with |
| 589 | + the command: |
| 590 | + |
| 591 | + # qconf -sh |
| 592 | + |
| 593 | + You can add an administrative host with the command: |
| 594 | + |
| 595 | + # qconf -ah <hostname> |
| 596 | + |
| 597 | + The execution host installation will take approximately 5 minutes. |
| 598 | + |
| 599 | + Hit <RETURN> to continue >> |
| 600 | + ``` |
| 601 | + |
| 602 | +6. Confirm the installation directory. The suggested default is the directory you set in the master service installation. |
| 603 | +
|
| 604 | + ``` |
| 605 | + Checking $SGE_ROOT directory |
| 606 | + ---------------------------- |
| 607 | + |
| 608 | + The Cluster Scheduler root directory is: |
| 609 | + |
| 610 | + $SGE_ROOT = <installation_directory> |
| 611 | + |
| 612 | + If this directory is not correct (e.g. it may contain an automounter |
| 613 | + prefix) enter the correct path to this directory or hit <RETURN> |
| 614 | + to use default [<installation_directory>] >> |
| 615 | + ``` |
| 616 | + |
| 617 | +7. Confirm the cell directory. The suggested default is the directory you set in the master service installation. You can enter a different cell name if you intend to start the execution service in a different cell. |
| 618 | +
|
| 619 | + ``` |
| 620 | + Cluster Scheduler cells |
| 621 | + ----------------------- |
| 622 | + |
| 623 | + Please enter cell name which you used for the qmaster |
| 624 | + installation or press <RETURN> to use [default] >> |
| 625 | + ``` |
| 626 | + |
| 627 | +8. Confirm the detected execution daemon TCP/IP port number. |
| 628 | +
|
| 629 | + ``` |
| 630 | + Cluster Scheduler TCP/IP communication service |
| 631 | + ---------------------------------------------- |
| 632 | + |
| 633 | + The port for sge_execd is set as service. |
| 634 | + |
| 635 | + sge_execd service set to port 6445 |
| 636 | + |
| 637 | + Hit <RETURN> to continue >> |
| 638 | + ``` |
| 639 | + |
| 640 | +9. The installer verifies the local hostname resolution and checks whether the current host is an administrative host. |
| 641 | +
|
| 642 | + ``` |
| 643 | + Checking hostname resolving |
| 644 | + --------------------------- |
| 645 | + |
| 646 | + This hostname is known at qmaster as an administrative host. |
| 647 | + |
| 648 | + Hit <RETURN> to continue >> |
| 649 | + ``` |
| 650 | +10. Specify the spooling directory for execution hosts |
| 651 | + |
| 652 | + ``` |
| 653 | + Execd spool directory configuration |
| 654 | + ----------------------------------- |
| 655 | +
|
| 656 | + You defined a global spool directory when you installed the master host. |
| 657 | + You can use that directory for spooling jobs from this execution host |
| 658 | + or you can define a different spool directory for this execution host. |
| 659 | + |
| 660 | + ATTENTION: For most operating systems, the spool directory does not have to |
| 661 | + be located on a local disk. The spool directory can be located on a |
| 662 | + network-accessible drive. However, using a local spool directory provides |
| 663 | + better performance. |
| 664 | + |
| 665 | + The spool directory is currently set to: |
| 666 | + <<<installation_directory>/default/spool/<hostname>>> |
| 667 | +
|
| 668 | + Do you want to configure a different spool directory |
| 669 | + for this host (y/n) [n] >> |
| 670 | + ``` |
| 671 | + |
| 672 | +11. The installer will create a local configuration for the execution host. |
| 673 | + |
| 674 | + ``` |
| 675 | + Creating local configuration |
| 676 | + ---------------------------- |
| 677 | + <admin_user>@<hostname> added "<hostname>" to configuration list |
| 678 | + Local configuration for host ><hostname>< created. |
| 679 | +
|
| 680 | + Hit <RETURN> to continue >> |
| 681 | + ``` |
| 682 | + |
| 683 | +12. Now specify if you want to start the execution service automatically. |
| 684 | + |
| 685 | + ``` |
| 686 | + execd startup script |
| 687 | + -------------------- |
| 688 | + |
| 689 | + We can install the startup script that will |
| 690 | + start execd at machine boot (y/n) [y] >> |
| 691 | + ``` |
| 692 | + |
| 693 | +13. The execution service is started. |
| 694 | + |
| 695 | + ``` |
| 696 | + Cluster Scheduler execution daemon startup |
| 697 | + ------------------------------------------ |
| 698 | +
|
| 699 | + Starting execution daemon. Please wait ... |
| 700 | + starting sge_execd |
| 701 | +
|
| 702 | + Hit <RETURN> to continue >> |
| 703 | + ``` |
| 704 | + |
| 705 | +14. Specify a queue for the new host. |
| 706 | + |
| 707 | + ``` |
| 708 | + Adding a queue for this host |
| 709 | + ---------------------------- |
| 710 | +
|
| 711 | + We can now add a queue instance for this host: |
| 712 | +
|
| 713 | + - it is added to the >allhosts< host group |
| 714 | + - the queue provides 32 slot(s) for jobs in all queues |
| 715 | + referencing the >allhosts< host group |
| 716 | +
|
| 717 | + You do not need to add this host now, but before running jobs on this host |
| 718 | + it must be added to at least one queue. |
| 719 | +
|
| 720 | + Do you want to add a default queue instance for this host (y/n) [y] >> |
| 721 | + ``` |
| 722 | + |
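The administrative host check before running `install_execd` can be scripted. This is a minimal sketch under assumptions: `is_admin_host` is a hypothetical helper name, and it assumes that `qconf -sh` prints one hostname per line and that the names match the form the host reports for itself (short name versus fully qualified name).

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given hostname appears as a whole line
# in the host list read from standard input.
# Assumption: the list has one hostname per line, as produced by `qconf -sh`.
is_admin_host() {
    # -F: fixed string, -x: match the whole line, -q: quiet, status only
    grep -Fxq -- "$1"
}
```

On the master machine this could be used as `qconf -sh | is_admin_host <hostname> || qconf -ah <hostname>`, which adds the host only when it is missing from the list.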
616 | 723 | ## Automatic Installation
|
617 | 724 |
|
618 |
| -## Backup and Restore |
| 725 | +The automatic installation process is based on the manual installation process. Instead of asking questions interactively, the installer reads a configuration file with predefined answers. For an automatic installation the configuration file has to be prepared and passed to the installation script as the argument of the `-auto` option. |
| 726 | + |
| 727 | +The auto installation is also able to install services on remote hosts if either passwordless `ssh` or `rsh` access is configured for the root user on the master machine. |
| 728 | + |
| 729 | +1. Log in as root on the system where you intend to install a service. |
| 730 | + |
| 731 | +2. Make a copy of a configuration template file and prepare it with the answers to the questions that are usually asked during the manual installation process. If the root user has no write permissions in $SGE_ROOT then choose a different path but make sure that you preserve the file for the uninstallation process. |
| 732 | + |
| 733 | + ``` |
| 734 | + $ cp $SGE_ROOT/util/install_modules/inst_template.conf $SGE_ROOT/my_template.conf |
| 735 | + $ vi $SGE_ROOT/my_template.conf |
| 736 | + ... |
| 737 | + ``` |
| 738 | + |
| 739 | +3. On the master machine start the master installation |
| 740 | + |
| 741 | + ``` |
| 742 | + cd $SGE_ROOT |
| 743 | + ./inst_sge -m -auto $SGE_ROOT/my_template.conf |
| 744 | + ``` |
| 745 | + |
| 746 | +4. If you have specified a list of hosts as the EXEC_HOST_LIST parameter in the configuration file and you have passwordless `ssh` or `rsh` access to those hosts, then you can install the execution service on those hosts remotely from the master machine. |
| 747 | + |
| 748 | + ``` |
| 749 | + cd $SGE_ROOT |
| 750 | + ./inst_sge -x -auto $SGE_ROOT/my_template.conf |
| 751 | + ``` |
| 752 | + |
| 753 | + If you have no passwordless `ssh` or `rsh` access to those hosts then you have to log in to each host and start the installation process manually for each host individually. |
| 754 | + |
| 755 | +5. On shadow hosts install the shadow service |
619 | 756 |
|
620 |
| -## Upgrading Open Cluster Scheduler |
| 757 | + ``` |
| 758 | + cd $SGE_ROOT |
| 759 | + ./inst_sge -sm -auto $SGE_ROOT/my_template.conf |
| 760 | + ``` |
621 | 761 |
|
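Before starting a remote execution host installation, it can help to check which hosts from EXEC_HOST_LIST are reachable without a password. The sketch below is illustrative only: the function names are hypothetical, and it assumes the configuration file stores the list as a shell-style assignment such as `EXEC_HOST_LIST="host1 host2"`. Only the `ssh` case is covered.

```shell
#!/bin/sh
# Print the hosts from EXEC_HOST_LIST, one per line.
# Assumption: the file contains a line like EXEC_HOST_LIST="host1 host2".
exec_hosts() {
    sed -n 's/^EXEC_HOST_LIST=//p' "$1" | tr -d '"' | tr -s ' ' '\n' | sed '/^$/d'
}

# Succeed if the host accepts an ssh login without prompting for a password.
# BatchMode=yes makes ssh fail instead of asking interactively.
can_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true </dev/null 2>/dev/null
}

# List the hosts that would need a manual, per-host installation.
report_unreachable() {
    for host in $(exec_hosts "$1"); do
        can_ssh "$host" || echo "manual installation needed on $host"
    done
}
```

Calling `report_unreachable $SGE_ROOT/my_template.conf` as root on the master machine prints nothing if every listed host is reachable; any host it prints has to be installed by logging in to it directly.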
622 |
| -### Patch installation |
| 762 | +## Uninstallation |
623 | 763 |
|
624 |
| -### Side by Side Upgrade |
| 764 | +The uninstallation of the xxQS_NAMExx software can be done manually or automatically using the configuration template created during the auto installation. If you uninstall an execution host then make sure that there are no running jobs on that host. If you uninstall manually then make sure that all execution hosts are uninstalled first before you uninstall the master host or other services. |
625 | 765 |
|
626 |
| -## Testing the Installation/Upgrade |
| 766 | +1. Log in as root on the system where you installed a service. |
627 | 767 |
|
628 |
| -## Troubleshooting |
| 768 | +2. Automatic uninstallation of the execution service on execution hosts. |
| 769 | + |
| 770 | + ``` |
| 771 | + cd $SGE_ROOT |
| 772 | + ./inst_sge -ux -auto $SGE_ROOT/my_template.conf |
| 773 | + ``` |
| 774 | + |
| 775 | +3. Manual uninstallation of the execution component. |
| 776 | + |
| 777 | + ``` |
| 778 | + cd $SGE_ROOT |
| 779 | + ./inst_sge -ux |
| 780 | + ``` |
| 781 | + |
| 782 | +4. Qmaster, shadow master and other services can be uninstalled the same way. To uninstall the qmaster service use the `-um` switch, for the shadow master service use the `-usm` switch. For the automatic uninstallation use the `-auto` switch with the configuration template. |
| 783 | + |
| 784 | + ``` |
| 785 | + cd $SGE_ROOT |
| 786 | + ./inst_sge ... |
| 787 | + ``` |
629 | 788 |
|
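The uninstall switches named in the steps above can be collected in a small wrapper that prints the command for a given service. This is a sketch, not a product script: the function names and the service labels `execd`, `qmaster` and `shadowd` are hypothetical, and the command is only printed, not executed.

```shell
#!/bin/sh
# Map a service label (hypothetical names) to the inst_sge uninstall switch
# mentioned in this guide.
uninstall_switch() {
    case "$1" in
        execd)   echo "-ux" ;;
        qmaster) echo "-um" ;;
        shadowd) echo "-usm" ;;
        *)       echo "unknown service: $1" >&2; return 1 ;;
    esac
}

# Print the uninstall command. An optional second argument is the path of
# the configuration template and selects the automatic (-auto) variant.
uninstall_cmd() {
    switch=$(uninstall_switch "$1") || return 1
    if [ -n "${2:-}" ]; then
        echo "./inst_sge $switch -auto $2"
    else
        echo "./inst_sge $switch"
    fi
}
```

For example, `uninstall_cmd execd` prints the manual variant for an execution host, and passing the template path as a second argument prints the automatic variant. Execution hosts should still be handled before the master host, as described above.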
630 | 789 | [//]: # (Each file has to end with two empty lines)
|
631 | 790 |
|