failovermon(1M)              DG/UX 5.4 Rel. 2.01             failovermon(1M)


NAME
       failovermon - manage failover monitors

SYNOPSIS
       failovermon -o add [ -i interval ] [ -r retries ] [ -l lost-pulse ]
                     [ -g regain-pulse ] [ -b ] [ -s ] hostname

       failovermon -o delete hostname

       failovermon -o modify [ -i interval ] [ -r retries ] [ -l lost-pulse ]
                     [ -g regain-pulse ] [ -b | -n ] [ -s ] hostname

       failovermon -o list [ -qv ] [ hostname ...  ]

       failovermon -o start [ hostname ...  ]

       failovermon -o stop [ hostname ...  ]

DESCRIPTION
       failovermon provides operations for manipulating entries in the
       failover monitors database (see failover(4M)), as well as operations
       for starting and stopping failovermon monitors. Failover monitors and
       their action scripts (lost-pulse and regain-pulse) are set up on, and
       execute on, the system that is serving in the backup role. This
       system should already have been set up for failover using the
       operator-initiated failover operations in sysadm(1M).

       The failovermon process monitors the specified system with a
       heartbeat message. This message is sent from the failovermon process
       to the failoverd(1M) process on the host being monitored. The
       heartbeat is sent over all communication paths that have been set up
       for the host being monitored using the admfailoveraltcommpath(1M)
       command. As long as the monitor receives at least one response, the
       heartbeat is successful. The monitor then sleeps for the number of
       seconds specified in its interval value.
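
       The heartbeat rule described above can be modeled in a few lines.
       This is a minimal Python sketch, not the failovermon implementation:
       send_heartbeat is a hypothetical callable standing in for one
       handshake with failoverd(1M) over a single communication path, and
       the sleep parameter exists only so the sketch can be exercised
       without waiting.

```python
import time
from typing import Callable, Iterable


def heartbeat_round(paths: Iterable[str],
                    send_heartbeat: Callable[[str], bool]) -> bool:
    """Send one heartbeat over every configured path.

    The round succeeds if at least one path gets a response.
    """
    # Build the full list first so every path is tried, not just the
    # paths up to the first success.
    results = [send_heartbeat(path) for path in paths]
    return any(results)


def monitor_once(paths: Iterable[str],
                 send_heartbeat: Callable[[str], bool],
                 interval: float,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """One monitoring cycle: after a successful round, sleep `interval` seconds."""
    ok = heartbeat_round(paths, send_heartbeat)
    if ok:
        sleep(interval)
    return ok
```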

       If no response is received on any of the communication paths, the
       retries value is examined to determine whether to declare the host
       failed. If the retries value is zero, the monitor immediately
       executes the lost-pulse action script. If the retries value is not
       zero, the monitor continues trying to communicate with the host
       until the retries value is exceeded, and then executes the
       lost-pulse action script.

       The monitor continues to attempt to communicate with the failed host.
       When communications are re-established, the regain-pulse action
       script is executed.
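
       The retry and regain rules can be sketched as a small state machine
       that replays a recorded sequence of heartbeat rounds. This Python
       sketch assumes that "until the retries value is exceeded" means the
       host is declared failed after 1 + retries consecutive unsuccessful
       rounds, so a retries value of zero fails on the first miss; the
       function name and event strings are hypothetical.

```python
from typing import Iterable, List


def run_monitor(outcomes: Iterable[bool], retries: int) -> List[str]:
    """Replay heartbeat round outcomes; return the action scripts that would run.

    Assumption: the host is declared failed after (1 + retries)
    consecutive unsuccessful rounds.
    """
    events: List[str] = []
    misses = 0
    failed = False
    for ok in outcomes:
        if ok:
            if failed:
                # Communication re-established with a host declared failed.
                events.append("regain-pulse")
                failed = False
            misses = 0
        else:
            misses += 1
            if not failed and misses > retries:
                # The initial attempt and every retry have failed.
                events.append("lost-pulse")
                failed = True
    return events
```

       With -r 3, as in the EXAMPLE section, four consecutive misses
       produce a single lost-pulse event, and the next successful round
       produces a regain-pulse event.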

       When a failovermon monitor is started, a child process is created
       with fork(2) and placed in the background. The start operation
       reports the monitor as started if this succeeds. The monitor can
       still fail afterward if the host it is supposed to monitor is not
       accepting communications. When this happens, check whether the
       listen(1M) port monitor is running on the remote system; if it is
       not running, the "connection refused" error message appears on the
       system console.

Licensed material--property of copyright holder(s)
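
       The start behavior can be illustrated with a POSIX-only Python
       sketch. The parent reports success as soon as fork(2) succeeds; only
       the background child discovers that the remote host refuses the
       connection. start_monitor and the plain TCP connection are
       hypothetical stand-ins: the real monitor reaches failoverd(1M)
       through the listen(1M) port monitor, and its address is not
       documented here.

```python
import os
import socket


def start_monitor(host: str, port: int) -> int:
    """Fork a background child and return its pid to the parent at once."""
    pid = os.fork()
    if pid == 0:
        # Child: the monitor proper. A refused connection is discovered
        # here, after the parent has already reported "started".
        code = 2
        try:
            with socket.create_connection((host, port), timeout=5):
                code = 0
        except ConnectionRefusedError:
            code = 1
        except OSError:
            code = 2
        finally:
            os._exit(code)  # never return into the parent's code path
    return pid
```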

       The failovermon monitor can also be configured to monitor the host it
       is running on. This type of monitoring is used to detect a system
       hang. The monitor determines whether the system it is invoked on has,
       and can use, the wdt() driver. This driver is available on AV4600
       and higher systems. The wdt() driver internally resets a timer
       register every second; if the timer is not reset within one second,
       a warm reset of the system is triggered. The failovermon monitor
       communicates with the wdt() driver to provide a higher level of
       monitoring: every 30 seconds, the failovermon process attempts to
       open and close a file, and if this succeeds, it sends a message to
       the wdt() driver indicating that the system is alive.

       If the wdt() driver does not receive a message from the failovermon
       process within 30 seconds of the last message, the wdt() driver
       initiates a system panic to recover from the hang.

       When the failovermon process is stopped or terminates abnormally, the
       wdt() driver ceases the high-level monitoring. The wdt() driver
       continues to perform its lower-level monitoring until the driver is
       deconfigured from the system.
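
       The two timeouts above can be modeled with an explicit clock. This
       Python sketch is a simulation, not an interface to the wdt() driver:
       WatchdogModel, failovermon_tick, and probe_filesystem are
       hypothetical names, the one-second hardware timer is not modeled, and
       a message arriving exactly 30 seconds after the previous one is
       assumed to be in time.

```python
from typing import Optional

HIGH_LEVEL_TIMEOUT = 30  # seconds, from the description above


def probe_filesystem(path: str) -> bool:
    """failovermon side: open and close a file; True if both succeed.

    Mode "a" creates the file if it is missing (an assumption of this sketch).
    """
    try:
        with open(path, "a"):
            pass
        return True
    except OSError:
        return False


class WatchdogModel:
    """Simulated driver-side view of the high-level monitoring only."""

    def __init__(self) -> None:
        # Time of the last "alive" message; None means not armed.
        self.last_message: Optional[float] = None

    def alive(self, now: float) -> None:
        """Record an 'alive' message from failovermon."""
        self.last_message = now

    def stop(self) -> None:
        """failovermon stopped or died: high-level monitoring ceases."""
        self.last_message = None

    def should_panic(self, now: float) -> bool:
        """True if armed and more than the timeout has passed since the last message."""
        return (self.last_message is not None
                and now - self.last_message > HIGH_LEVEL_TIMEOUT)


def failovermon_tick(wd: WatchdogModel, now: float, path: str) -> None:
    """One 30-second cycle: send 'alive' only if the file probe succeeds."""
    if probe_filesystem(path):
        wd.alive(now)
```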

       The failovermon monitor can be configured to start automatically when
       the system is rebooted (see the -b option).

   Operations
       add       Add a failovermon monitor entry for hostname to the
                 failover monitors database. The monitor can optionally be
                 started at the same time (see -s).

       delete    Delete a failovermon monitor entry for hostname from the
                 failover monitors database. This operation will also
                 terminate an existing monitor if one is running.

       modify    Modify a failovermon monitor entry for hostname. The
                 current monitor (if one is running) can optionally be
                 restarted, or a new one started, using the new
                 information (see -s).

       list      List failover monitors database entries. The list operation
                 reports the following monitor information to stdout:

                     the name of the host that is being monitored
                     a flag indicating whether a monitor is running
                     a flag indicating whether the monitor is brought up
                         at system reboot time
                     the interval value
                     the retries value
                     the lost pulse action script name
                     the regain pulse action script name

                 With the `verbose' format (-v), information is printed in
                 aligned columns with headers. With the `quiet' format (-q),
                 headers are suppressed and each host entry is printed on a
                 separate line. If both -q and -v are specified, the output
                 is in `quiet' format.

       start     Start a failovermon monitor for the specified host(s).

       stop      Stop a failovermon monitor for the specified host(s).
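
       For scripting, a quiet (-q) listing can be split into the fields
       listed above. This Python sketch assumes the fields appear in the
       order listed, separated by single spaces, that pathnames contain no
       spaces, and that each flag is printed as a single token whose
       spelling is not documented here; parse_quiet_line and MonitorEntry
       are hypothetical names.

```python
from typing import NamedTuple


class MonitorEntry(NamedTuple):
    host: str
    running: str       # flag token as printed (spelling not documented)
    on_reboot: str     # flag token as printed (spelling not documented)
    interval: int
    retries: int
    lost_pulse: str
    regain_pulse: str


def parse_quiet_line(line: str) -> MonitorEntry:
    """Split one line of `failovermon -o list -q` output into seven fields."""
    fields = line.strip().split(" ")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    host, running, on_reboot, interval, retries, lost, regain = fields
    return MonitorEntry(host, running, on_reboot,
                        int(interval), int(retries), lost, regain)
```

       For example, a hypothetical line such as
       "hostA y n 60 3 /hostA_has_failed /hostA_is_back" parses to an entry
       with interval 60 and retries 3.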

   Options
       The following options can be used with the add or modify operations:

       -b        Start on reboot. This option specifies that this monitor is
                 to be brought up when the system is rebooted.


       -i interval
                 The time in seconds that the failovermon monitor waits
                 after receiving a reply to a handshake before initiating
                 the next handshake. The default is zero for an add
                 operation or the current interval value for a modify
                 operation.


       -r retries
                 The number of times the failovermon monitor should continue
                 trying to communicate with the failoverd daemon of the
                 specified system before declaring the system failed. The
                 default is zero for an add operation or the current retries
                 value for a modify operation.


       -l lost-pulse
                 The full pathname of the user-created script to be executed
                 when the monitor declares a system to be failed. This
                 script should contain an admfailoverdisk(1M) command line
                 to transfer the physical disks from the failed host to the
                 backup host. It should also contain any system setup
                 required for the application or its users. The default
                 is /etc/failover/failovermon_lost_pulse for an add
                 operation or the current lost-pulse value for a modify
                 operation.


       -g regain-pulse
                 The full pathname of the user-created script to be executed
                 when the monitor regains the pulse of the system it is
                 monitoring. This script should contain any actions to be
                 performed when the heartbeat is regained (for example, the
                 administrator may want to shut down the application and
                 move the disks back to the original host). The default is
                 /etc/failover/failovermon_regain_pulse for an add operation
                 or the current regain-pulse value for a modify operation.

       -s        If specified on an add operation this option indicates that
                 the monitor should be started. If specified on a modify
                 operation this option indicates that the currently running
                 monitor should be stopped and restarted with the new
                 values. If no monitor is running, then one will be started.


       The following option can be used with the modify operation:

       -n        Do not start on reboot. This option specifies that this
                 monitor is not to be brought up when the system is
                 rebooted.

       The following options can be used with the list operation:

       -q        Quiet. Produce an unformatted listing with no headers,
                 fields delimited by a single space.

       -v        Verbose. Produce a formatted listing with headers and
                 aligned columns. This option is the default.
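
       The defaulting rules for add and modify can be summarized in a short
       Python sketch: add fills every unspecified value from the documented
       defaults, while modify keeps the entry's current values. apply_options
       and its dictionary keys are hypothetical, and treating "not started
       on reboot" as the add default when -b is omitted is an assumption.

```python
from typing import Optional

ADD_DEFAULTS = {
    "interval": 0,
    "retries": 0,
    "lost_pulse": "/etc/failover/failovermon_lost_pulse",
    "regain_pulse": "/etc/failover/failovermon_regain_pulse",
    "on_reboot": False,  # assumption: only -b turns this on
}


def apply_options(operation: str, given: dict,
                  current: Optional[dict] = None) -> dict:
    """Return the entry that results from an add or modify operation.

    `given` holds only the options present on the command line.
    """
    if operation == "add":
        base = dict(ADD_DEFAULTS)
    elif operation == "modify":
        if current is None:
            # Mirrors the "entry that did not exist" error for modify.
            raise ValueError("modify requires an existing entry")
        base = dict(current)  # copy so the stored entry is not mutated
    else:
        raise ValueError(f"unsupported operation: {operation}")
    base.update(given)
    return base
```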

EXAMPLE
       The following command line adds a failovermon monitor for a system
       named hostA. The monitor waits 60 seconds after each successful
       handshake before sending the next one, and retries a failed
       handshake 3 times before executing the /hostA_has_failed script.
       Should hostA return, the /hostA_is_back script is executed:

         failovermon -o add -i 60 -r 3 -l /hostA_has_failed -g /hostA_is_back hostA


       The monitor can then be started with the following command line:

         failovermon -o start hostA


       To modify this monitor for off-peak monitoring and restart it with
       an interval of 1200 seconds (20 minutes), use the following command
       line:

         failovermon -o modify -i 1200 -s hostA


       To stop this monitor, use the following command line:

         failovermon -o stop hostA
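
       When monitors are managed from a script, the command lines above can
       be assembled programmatically. This Python sketch only builds the
       argument vector from options shown in the SYNOPSIS; build_add_argv
       is a hypothetical helper and nothing is executed.

```python
from typing import List, Optional


def build_add_argv(host: str, interval: Optional[int] = None,
                   retries: Optional[int] = None,
                   lost_pulse: Optional[str] = None,
                   regain_pulse: Optional[str] = None,
                   on_reboot: bool = False, start: bool = False) -> List[str]:
    """Assemble `failovermon -o add ...`; omitted options keep their defaults."""
    argv = ["failovermon", "-o", "add"]
    if interval is not None:
        argv += ["-i", str(interval)]
    if retries is not None:
        argv += ["-r", str(retries)]
    if lost_pulse is not None:
        argv += ["-l", lost_pulse]
    if regain_pulse is not None:
        argv += ["-g", regain_pulse]
    if on_reboot:
        argv.append("-b")
    if start:
        argv.append("-s")
    argv.append(host)  # the hostname operand comes last
    return argv
```

       Passing start=True appends -s, which folds the separate start step
       of the example into the add command.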


FILES
       /etc/failover/monitors
                           failover monitors database

DIAGNOSTICS
   Warnings
       -      Cannot initiate connection with host <hostname>, retrying.

       -      A monitor for <hostname> is already running.

       -      An attempt was made to delete a monitors database entry that
              did not exist.

   Errors
       -      failovermon connection refused.

       -      Monitor for <hostname> not running.

       -      Monitor for <hostname> is already running.

       -      An attempt was made to add, delete, modify, or list a monitor
              for an invalid host.

       -      An attempt was made to modify or list a monitors database
              entry that did not exist.

       -      An attempt was made to add a monitors database entry that
              already existed.

       -      The wdt() driver is not supported on this system.

   Exit Codes
        0     The operation was successful.

        1     The operation was unsuccessful.

        2     The operation failed due to access restrictions.

        3     There was an error in the command line.
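
       A calling script can translate these exit codes into messages. In
       this Python sketch, run_failovermon is a hypothetical wrapper around
       subprocess.run that assumes failovermon is on the PATH; the message
       strings paraphrase the list above.

```python
import subprocess
from typing import List, Tuple

EXIT_MEANINGS = {
    0: "operation successful",
    1: "operation unsuccessful",
    2: "failed due to access restrictions",
    3: "command line error",
}


def describe_exit(code: int) -> str:
    """Map a failovermon exit status to its documented meaning."""
    return EXIT_MEANINGS.get(code, f"undocumented exit status {code}")


def run_failovermon(args: List[str]) -> Tuple[int, str]:
    """Run failovermon with `args`; return (exit status, meaning)."""
    result = subprocess.run(["failovermon", *args])
    return result.returncode, describe_exit(result.returncode)
```

       If failovermon cannot be found, subprocess.run raises
       FileNotFoundError rather than returning an exit code.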

SEE ALSO
       sysadm(1M), admfailoverdisk(1M), failoverd(1M), failover(4M).

NOTES
       Super-user privilege is required for all operations except list.

       It is possible for a system to be in a state where users get no
       response but the monitor continues to detect a heartbeat. If this
       happens, reset or `hot-key' the hung system. This allows the monitor
       to detect the failure and execute its lost-pulse action script, so
       that the applications can be restarted while the failed system is
       rebooted.

       If you add additional communications paths to the failover
       altcommpath database after a monitor has been started, you will need
       to stop and start the monitor in order for those additional paths to
       be used.

       If you intend to shut down a system that is being monitored and do
       not want the monitor to detect the system being down and execute its
       lost-pulse action script, stop the monitor before shutting down the
       system.

Typewritten Software • bear@typewritten.org • Edmonds, WA 98026