Clarify how fence_kdump works #415

Merged 3 commits on Sep 24, 2024
65 changes: 36 additions & 29 deletions xml/ha_fencing.xml
@@ -446,56 +446,62 @@ hostlist</screen>
<para>Kdump belongs to the <xref linkend="sec-ha-fencing-special"
xrefstyle="select:title"/> and is in fact the opposite of a fencing device.
The plug-in checks if a Kernel dump is in progress on a node. If so, it
returns true, and acts <emphasis>as if</emphasis> the node has been fenced.
returns true and acts <emphasis>as if</emphasis> the node has been fenced,
because the node will reboot after the Kdump is complete.
If not, it returns a failure and the next fencing device is triggered.
</para>
<para>
The Kdump plug-in must be used in concert with another, real &stonith;
device, for example, <literal>external/ipmi</literal>. For the fencing
mechanism to work properly, you must specify that Kdump is checked before
a real &stonith; device is triggered. Use <command>crm configure
fencing_topology</command> to specify the order of the fencing devices as
The Kdump plug-in must be used together with another, real &stonith;
device, for example, <literal>external/ipmi</literal>. It does
<emphasis>not</emphasis> work with SBD as the &stonith; device. For the fencing
mechanism to work properly, you must specify the order of the fencing devices
so that Kdump is checked before a real &stonith; device is triggered, as
shown in the following procedure.
</para>
<procedure>
<step>
<para>
Use the <literal>stonith:fence_kdump</literal> resource agent (provided
by the package <package>fence-agents</package>)
to monitor all nodes with the Kdump function enabled. Find a
configuration example for the resource below:
Use the <literal>stonith:fence_kdump</literal> fence agent.
A configuration example is shown below. For more information,
see <command>crm ra info stonith:fence_kdump</command>.
</para>
<screen>&prompt.root;<command>crm configure</command>
&prompt.crm.conf;<command>primitive st-kdump stonith:fence_kdump \
params nodename="&node1; "\ </command><co xml:id="co-ha-fenc-kdump-nodename"/>
<command>pcmk_host_check="static-list" \
params nodename="&node1; "\ </command><co xml:id="co-ha-fence-kdump-nodename"/>
<command>pcmk_host_list="&node1;" \
pcmk_host_check="static-list" \
pcmk_reboot_action="off" \
pcmk_monitor_action="metadata" \
pcmk_reboot_retries="1" \
timeout="60"</command>
timeout="60"</command><co xml:id="co-ha-fence-kdump-timeout"/>
&prompt.crm.conf;<command>commit</command></screen>
<calloutlist>
<callout arearefs="co-ha-fenc-kdump-nodename">
<callout arearefs="co-ha-fence-kdump-nodename">
<para>
Name of the node to be monitored. If you need to monitor more than one
node, configure more &stonith; resources. To prevent a specific node
from using a fencing device, add location constraints.
Name of the node to listen for a message from <literal>fence_kdump_send</literal>.
Configure more &stonith; resources for other nodes if needed.
</para>
</callout>
<callout arearefs="co-ha-fence-kdump-timeout">
<para>
Defines how long to wait for a message from <literal>fence_kdump_send</literal>.
If a message is received, then a Kdump is in progress and the fencing mechanism
considers the node to be fenced. If no message is received, <literal>fence_kdump</literal>
times out, which indicates that the fence operation failed. The next &stonith; device
in the <literal>fencing_topology</literal> eventually fences the node.
</para>
</callout>
</calloutlist>
<para>
The fencing action starts after the timeout of the resource.
</para>
</step>
<step>
<para>
In <filename>/etc/sysconfig/kdump</filename> on each node, configure
<literal>KDUMP_POSTSCRIPT</literal> to send a notification to all nodes
when the Kdump process is finished. For example:
On each node, configure <literal>fence_kdump_send</literal> to send a message to
all nodes when the Kdump process is finished. In <filename>/etc/sysconfig/kdump</filename>,
edit the <literal>KDUMP_POSTSCRIPT</literal> line. For example:
</para>
<screen>KDUMP_POSTSCRIPT="/usr/lib/fence_kdump_send -i <replaceable>INTERVAL</replaceable> -p <replaceable>PORT</replaceable> -c 1 &node1; &node2; &node3;"</screen>
<screen>KDUMP_POSTSCRIPT="/usr/lib/fence_kdump_send -i 10 -p 7410 -c 1 <replaceable>NODELIST</replaceable>"</screen>
<para>
The node that does a Kdump restarts automatically after Kdump is
finished.
Replace <replaceable>NODELIST</replaceable> with the host names of all the cluster nodes.
</para>
</step>
<step>
@@ -514,15 +520,16 @@ hostlist</screen>
</step>
<step>
<para>
To achieve that Kdump is checked before triggering a real fencing
To have Kdump checked before triggering a real fencing
mechanism (like <literal>external/ipmi</literal>),
use a configuration similar to the following:</para>
<screen>&prompt.crm.conf;<command>fencing_topology \
&node1;: kdump-node1 ipmi-node1 \
&node2;: kdump-node2 ipmi-node2</command></screen>
&node2;: kdump-node2 ipmi-node2</command>
&prompt.crm.conf;<command>commit</command></screen>
<para>For more details on <option>fencing_topology</option>:
</para>
<screen>&prompt.root;<command>crm configure help fencing_topology</command></screen>
<screen>&prompt.crm.conf;<command>help fencing_topology</command></screen>
</step>
</procedure>
</example>
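
The behavior this change documents (wait for a Kdump message until a timeout, and fall through to the next device in the fencing topology when none arrives) can be sketched in a few lines of Python. This is only an illustration of the control flow, not how `fence_kdump` or Pacemaker are implemented: the function names `wait_for_kdump_message` and `fence_with_topology` are hypothetical, and the listener accepts any UDP datagram instead of validating the real `fence_kdump_send` message format or its sender.

```python
import socket
from typing import Callable, Optional, Sequence, Tuple


def wait_for_kdump_message(port: int, timeout: float,
                           host: str = "0.0.0.0") -> bool:
    """Hypothetical stand-in for the fence_kdump wait.

    Returns True if any UDP datagram arrives on `port` within `timeout`
    seconds. Assumption: the real agent also checks the message content
    and the sending node; this sketch does not.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        sock.settimeout(timeout)
        try:
            sock.recvfrom(1024)
            return True   # message seen: treat the node as fenced
        except socket.timeout:
            return False  # timed out: this fence attempt failed


def fence_with_topology(
    devices: Sequence[Tuple[str, Callable[[], bool]]],
) -> Optional[str]:
    """Try each device in order, as a fencing topology does.

    Returns the name of the first device that reports success,
    or None if every device fails.
    """
    for name, attempt in devices:
        if attempt():
            return name
    return None
```

For example, `fence_with_topology([("kdump-node1", lambda: wait_for_kdump_message(7410, 60)), ("ipmi-node1", ipmi_off)])` would wait up to 60 seconds on port 7410 and only then call the (hypothetical) `ipmi_off` callable, mirroring the `kdump-node1 ipmi-node1` ordering in the topology above.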