⇒ wdx(8) — Watchdog Autopilot 2.1

wdx(8)  —  Maintenance

 

Name

wdx - DECwatchdog/Autopilot

Syntax

wdx [mission] [option...]

Description

DECwatchdog/Autopilot software runs on distributed configurations running the following operating systems:

•OSF/1,

•ULTRIX/RISC,

•OpenVMS VAX,

•OpenVMS AXP and

•Windows NT. 

DECwatchdog/Autopilot runs on a system with one Ethernet controller. 

However, two independent TCP/IP Ethernet networks are required to benefit from the full functionality of DECwatchdog/Autopilot. 

DECwatchdog/Autopilot is used with groups of computers. 
 A group is composed of 2 to 16 computers communicating across a local area network and collaborating to perform a critical application. A copy of DECwatchdog/Autopilot software must be installed on each computer in a group.

 A critical application and the description of the group of computers on which this application runs form a DECwatchdog/Autopilot mission. 

DECwatchdog/Autopilot provides three main features for a mission (defined below):

•watchdog and failover recovery,

•software and hardware redundancy management (distributed lock management),

•mission maintenance facilities (stop and start).
 

Mission

This is an optional argument. If a mission argument is given on the command line, wdx operates on that mission, which is defined by:

•the configuration files: mission.cfg, mission.loc,

•the log file,

•the UDP service used for communication: mission/udp

The mission name must not exceed 16 characters. 

If no mission argument is given on the command line, the wdx command refers to the mission defined by:

•the configuration files: wdx.cfg, wdx.loc,

•the log file,

•the UDP service used for communication: wdx/udp
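The naming rule above can be sketched as a small helper. This is only an illustration: the function name mission_artifacts and the dictionary keys are hypothetical, and the sketch derives nothing beyond what the text states (the two configuration file names and the UDP service name).

```python
from typing import Dict, Optional

MAX_MISSION_NAME = 16  # limit stated in the Mission section


def mission_artifacts(mission: Optional[str] = None) -> Dict[str, str]:
    """Return the names that wdx associates with a mission.

    Hypothetical helper: with no mission argument, the base
    name "wdx" is used, as described above.
    """
    name = mission if mission is not None else "wdx"
    if not name or len(name) > MAX_MISSION_NAME:
        raise ValueError("mission name must be 1 to 16 characters")
    return {
        "mission_file": f"{name}.cfg",   # mission configuration file
        "local_file": f"{name}.loc",     # local configuration file
        "udp_service": f"{name}/udp",    # UDP service used for communication
    }
```

For example, mission_artifacts("pump") names pump.cfg, pump.loc and the pump/udp service, while mission_artifacts() falls back to the wdx names.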
 

Options

The options of wdx must be used singly. If no option is entered, wdx displays a list of the available options. 

On success, each wdx command returns a success status as defined by the operating system. Otherwise an error status is returned. 

Status can be:

-1 : syntax or system error

0 : success

1 : command failed
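A script that wraps wdx may want to translate these values into messages. The following is a minimal sketch, not part of DECwatchdog/Autopilot: the function name describe_status is hypothetical, and the mapping of 255 to -1 is an assumption that applies to POSIX-style hosts, where a process exit status is reported modulo 256. Other operating systems define their own status conventions.

```python
STATUS_MEANINGS = {
    -1: "syntax or system error",
    0: "success",
    1: "command failed",
}


def describe_status(code: int) -> str:
    """Translate a wdx status value into the text listed above."""
    # Assumption: on POSIX-style systems an exit status of -1 is
    # seen by the parent process as 255 (only the low 8 bits survive).
    if code == 255:
        code = -1
    return STATUS_MEANINGS.get(code, "unknown status")
```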

Mission Management Commands

The Mission Management commands are distributed commands. When one of these commands is entered on a system, it is processed by all the watchdog processes of the mission. 

-daemon [member_name]

Start the DECwatchdog/Autopilot process. 

If no [member_name] is specified, the default member name is the host name. The [member_name] argument corresponds to the member name specified in the mission file. 

Only one watchdog process at any time can be present per node for a mission. 
This option requires superuser privileges.

-run

Set all watchdog processes to the RUNNING state. Each member will become either SLAVE or SATELLITE according to its role. If there is no MASTER in the mission, the slave which has the highest priority becomes the master for the mission. 
This option requires superuser privileges.
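The selection rule described for the run command can be illustrated as follows. This is a sketch under stated assumptions, not the actual DECwatchdog/Autopilot algorithm: the function elect_master and its input format are hypothetical, a larger number is assumed to mean a higher priority, and ties are resolved arbitrarily by input order.

```python
from typing import Dict, Optional, Tuple


def elect_master(members: Dict[str, Tuple[str, int]]) -> Optional[str]:
    """Pick the master among members given as name -> (state, priority).

    Assumption: a larger priority number means a higher priority.
    """
    # An existing MASTER keeps its role.
    for name, (state, _priority) in members.items():
        if state == "MASTER":
            return name
    # Otherwise only SLAVE members compete; SATELLITE members never do.
    slaves = {n: prio for n, (state, prio) in members.items() if state == "SLAVE"}
    if not slaves:
        return None
    return max(slaves, key=lambda n: slaves[n])
```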

-kill

Kill all watchdog processes for the corresponding mission when the mission name is specified. It is useful for software maintenance of DECwatchdog/Autopilot for a running mission. 
This option requires superuser privileges.

-stop [snapshot]

Set all watchdog processes to IDLE state. Member activity is stopped after registration of a snapshot for the mission. 

When used with the start command, the stop command is useful to initiate a Warm Startup procedure. 

The name of the snapshot is an optional argument. If no snapshot argument is passed, the mission name is used by default. 

The snapshot name length is limited to 8 characters. The snapshot file is located in the wdx directory, and has a .hot extension. 

This command also creates an ASCII file, snapshot.txt, which contains a text copy of the output of the list and constant commands. This file includes a date and time stamp. 

This option requires superuser privileges. 
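The snapshot naming rules can be summarized in a short sketch. The helper name snapshot_file is hypothetical, and /var/wdx is assumed to be the wdx directory because it is the location of the default files listed under Files. The text does not say what wdx does when a mission name longer than 8 characters is used as the default snapshot name; this sketch simply rejects it.

```python
import posixpath
from typing import Optional

MAX_SNAPSHOT_NAME = 8   # limit stated for the snapshot name
WDX_DIR = "/var/wdx"    # assumption: the wdx directory from the Files section


def snapshot_file(mission: str = "wdx", snapshot: Optional[str] = None) -> str:
    """Return the path of the .hot file that a stop command would use."""
    # With no snapshot argument, the mission name is used by default.
    name = snapshot if snapshot is not None else mission
    if not name or len(name) > MAX_SNAPSHOT_NAME:
        raise ValueError("snapshot name must be 1 to 8 characters")
    return posixpath.join(WDX_DIR, name + ".hot")
```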
 

-start [snapshot]

Force all watchdog processes to restart from the registered snapshot. This warm startup requires that all watchdog processes of the mission be in IDLE state. 

The name of the snapshot is passed as an optional argument to the command line. If no snapshot argument is specified then the mission name is taken as snapshot. 

The snapshot name length is limited to 8 characters. 

When a start command is initiated, the DECwatchdog software restarts the different nodes according to the state and the locks on the resources that they had before the stop command. The information on the states and the locks are taken from the snapshot file. 

During a start procedure, the mission file is scanned; if a member no longer belongs to a mission, its resource locks are released and the wdx process of the member corresponding to this mission is cancelled. 

This feature is useful when modifying the DECwatchdog configuration (adding or removing a member) without disturbing the whole configuration. 
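The effect of this scan can be pictured with a small sketch. The data layout is purely hypothetical (the format of the snapshot file is not described here): the snapshot is modelled as a set of member names plus a mapping from resource name to owning member.

```python
from typing import Dict, Set, Tuple


def reconcile(
    snapshot_members: Set[str],
    snapshot_locks: Dict[str, str],
    mission_members: Set[str],
) -> Tuple[Dict[str, str], Set[str]]:
    """Return (locks kept after start, members whose process is cancelled).

    Hypothetical model: snapshot_locks maps resource name -> owner.
    """
    # Members recorded in the snapshot but absent from the mission file.
    cancelled = snapshot_members - mission_members
    # Locks held by a removed member are released; the others are restored.
    kept = {res: owner for res, owner in snapshot_locks.items()
            if owner in mission_members}
    return kept, cancelled
```

With members {"a", "b"} in the snapshot and only "a" left in the mission file, any lock owned by "b" is dropped and "b" is reported as cancelled.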

This option requires superuser privileges. 

-shutdown

Initiate the triggering of the SHUTDOWNSCRIPT on all the members of the corresponding mission, without changing the member state. 

The OPERATOR argument is passed to the script in order to determine if the SHUTDOWNSCRIPT activation is due to a shutdown command or to a double master detection. 

This option requires superuser privileges. 

Process Management Commands

The Process Management commands are used to locally control the different watchdog missions that can be started on a member. 

-setup

Set the local watchdog process to the RUNNING state. The member will become either SLAVE or SATELLITE according to its role. 

You do not need to issue a setup command after a successful request for mastership. 

This option requires superuser privileges. 

-prockill

For a particular mission on a member, kill the corresponding watchdog process. 

This option requires superuser privileges. 

Master/Slave Management

-master

The master command is used to declare the MASTER member of the mission. When the member becomes MASTER, the SLAVESCRIPT is triggered. 

This command can be initiated only by the nodes that support the MASTER-SLAVE feature. 

You can have only one MASTER for a mission (except when the network is partitioned), so your request is rejected if a MASTER member already exists for your mission or if the same request is currently being processed on another member. 

This option requires superuser privileges. 

-nomaster

This option is used to release the lock on the mastertoken. The MASTER member returns to the SLAVE state with the lowest priority. The other SLAVEs in the mission compete for the mastertoken to become MASTER. 

This option requires superuser privileges. 

-getmaster

When used by a SLAVE member, the getmaster command forces a SLAVE to become MASTER. The getmaster command initiates a doublemaster state. The previous MASTER returns to the SLAVE state, after triggering the SHUTDOWNSCRIPT locally. In this case, the argument passed to the SHUTDOWNSCRIPT is DOUBLEMASTER. 

This option requires superuser privileges. 

Resource and Directory Management Commands

The following commands are used to manage the DECwatchdog/Autopilot resources lock mechanism. 

-declare res_name

This option is used to declare a virtual resource or mount point directory for the corresponding mission. 

The variable res_name is mandatory, and represents either a resource name or a mount point directory. The variable res_name must not exceed 20 characters. 

When you declare a resource or a mount point, its state becomes FREE. 

The declare command is rejected when the resource is in one of the following states:

•RESERVED

•MOUNTED

•FAULT

 This option requires superuser privileges. 

-erase res_name

This option is used to release the declaration of a resource or a mount point directory. 

The variable res_name is mandatory, and represents either a resource name or a mount point directory. The variable res_name must not exceed 20 characters. 

When you erase a resource or a mount point, its state becomes "UNKNOWN". The name of the resource or the name of the mount point directory is removed from the DECwatchdog/Autopilot tables. 

The erase command is rejected when the resource is in one of the following states:

•RESERVED

•MOUNTED

•FAULT

This option requires superuser privileges. 

-reserve res_name

This option is used to put an exclusive lock on a virtual resource. 

The variable res_name is mandatory, and represents in this case a resource. The variable res_name must not exceed 20 characters. 

After a successful reserve command, the state of the resource becomes RESERVED. 

The reserve command is accepted only if the resource is in the FREE state. 

This option requires superuser privileges. 

-mount directory

This option is used to put an exclusive lock on a mount point directory. The name of the directory is mandatory, and must not exceed 20 characters. 

The watchdog process locks a mount point directory only if this directory has been locally mounted with the mount command. 

After a successful mount command, the new state for the directory is MOUNTED.

If the directory is not locally mounted, an error is returned:

directory not locally MOUNTED.

When the mount point is unmounted (umount command), the lock on this directory is automatically released. The mount point returns to the FREE state. 

This option is useful to inform all the nodes in the mission that a file system is now available for NFS operations. 

The mount command is accepted only if the mount point is in the FREE state. 

This option requires superuser privileges. 

-free res_name

This option is used to release the exclusive lock on a virtual resource or on the mount point directory. 

The variable res_name is mandatory, and represents either a resource name or a mount point directory. The variable res_name must not exceed 20 characters. 

The free command is reserved to the owner of the resource. 

You become the owner when you do the following on the resource or on the mount point directory:

•reserve the resource : reserve command

•mount the directory : mount command

•declare the resource or the mount point as FAULT: fault command. 

This option requires superuser privileges. 

-fault res_name

This option is used to declare a resource or a mount point directory to be in the FAULT state. 

The variable res_name is mandatory, and represents either a resource name or a mount point directory. The variable res_name must not exceed 20 characters. 

This option is useful to indicate that the resource or a mount point directory is unavailable. 

You can declare a resource or a mount point directory as FAULT when it is in one of the following states:

•UNKNOWN

•FREE

•RESERVED

•MOUNTED

When you enter the fault command for an unknown resource, the resource is automatically declared in the DECwatchdog table with the "FAULT" state. 

This option requires superuser privileges. 

-getresource res_name

This option is used to force the lock on a resource, even if another member holds a lock on this resource. 

The variable res_name is mandatory, and represents a resource name. The variable res_name must not exceed 20 characters. 

This command is not supported when it is issued for a mount point directory. 

When a member in the mission initiates this command, the resource owner releases the lock on the resource, and the initiator keeps the lock of this resource. 

This command is valid only when the resource is in one of the following states:

•FREE

•RESERVED

•FAULT

This option requires superuser privileges. 
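The acceptance rules given for declare, erase, reserve, mount, free, fault and getresource can be gathered into one transition function. This is a sketch for reading purposes only, not the DECwatchdog/Autopilot implementation. The names Resource, Rejected and apply are hypothetical. Two points are assumptions because the text does not state them: a successful free leaves the entry in the FREE state, and a successful getresource leaves the resource RESERVED by the initiator. The local mount check and the rule that getresource is not supported for mount point directories are omitted.

```python
from dataclasses import dataclass
from typing import Optional

LOCKED = {"RESERVED", "MOUNTED", "FAULT"}


class Rejected(Exception):
    """Raised when a command is not accepted in the current state."""


@dataclass(frozen=True)
class Resource:
    state: str = "UNKNOWN"
    owner: Optional[str] = None


def apply(res: Resource, command: str, member: str) -> Resource:
    """Return the resource after `member` issues `command` on it."""
    s = res.state
    if command == "declare" and s not in LOCKED:
        return Resource("FREE")
    if command == "erase" and s not in LOCKED:
        return Resource("UNKNOWN")
    if command == "reserve" and s == "FREE":
        return Resource("RESERVED", member)
    if command == "mount" and s == "FREE":
        return Resource("MOUNTED", member)
    if command == "fault" and s in {"UNKNOWN", "FREE", "RESERVED", "MOUNTED"}:
        return Resource("FAULT", member)
    if command == "free" and s in LOCKED and res.owner == member:
        # Assumption: releasing the lock returns the entry to FREE.
        return Resource("FREE")
    if command == "getresource" and s in {"FREE", "RESERVED", "FAULT"}:
        # Assumption: the initiator ends up holding a RESERVED lock.
        return Resource("RESERVED", member)
    raise Rejected(f"{command} rejected in state {s}")
```

For instance, after member a declares and reserves a resource, a free issued by member b raises Rejected, while a getresource issued by b moves the lock to b.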

-nfs directory

When you initiate this command, you are blocked until a member in the mission initiates a mount command for the mount point directory that you specified in your nfs command. 

When the command completes, a string is returned with the following format:

membername:directory

 where membername is the name of the member which has declared as MOUNTED the mount point directory, and directory is the mount point directory. 

If the mount point directory is already in the MOUNTED state before you initiate the nfs command, the string is returned immediately. 

This option requires superuser privileges. 

-wait resource

When you initiate this command, you are blocked until a member in the mission initiates a DECwatchdog reserve command for the resource that you specified in your wait command. 

When the command completes, a string is returned with the following format:

 membername:resource

 where membername is the name of the member which has declared the resource as RESERVED, and resource is the name of the resource. 

If the resource is in the RESERVED state before you initiate the wait command, the string is returned immediately. 
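A caller that captures the string returned by the nfs or wait command has to split it into its two parts. A minimal sketch follows; the function name parse_reply is hypothetical, and the sketch assumes that member names never contain a colon, so the first colon is the separator.

```python
from typing import Tuple


def parse_reply(line: str) -> Tuple[str, str]:
    """Split "membername:directory" or "membername:resource"."""
    # Assumption: the member name contains no colon, so the first
    # colon separates it from the directory or resource name.
    member, sep, name = line.strip().partition(":")
    if not sep or not member or not name:
        raise ValueError(f"unexpected reply: {line!r}")
    return member, name
```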

Information Commands

The following commands are used to obtain local status information of your missions running on a member. 

-constant

Reports the values of important global variables of wdx. Some of the variables are adjustable values read from the configuration file; others are compiled-in constants. The option is useful to see the limits of the current DECwatchdog/Autopilot implementation. 

-list

List the current status of a mission. 

-log

This option renames the current log file. 

This option is useful when you want to analyze the contents of the log file, or when the log file size increases too quickly. 

This option requires superuser privileges. 

-namemaster

Prints the membername of the current master node.  If there is no master, the message:

master.is.unknown

 is printed. 
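A script that reads the output of the namemaster command needs to tell a member name from the "no master" message. A minimal sketch, with a hypothetical function name; treating empty output as "no master" is an assumption.

```python
from typing import Optional

NO_MASTER = "master.is.unknown"  # message printed when there is no master


def current_master(output: str) -> Optional[str]:
    """Return the master's member name, or None if there is no master."""
    name = output.strip()
    # Assumption: empty output is also treated as "no master".
    if not name or name == NO_MASTER:
        return None
    return name
```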

-show

Prints the characteristics of all missions currently processing on the local computer. 

-status

Prints the status of the local member. 

The Status can be:

•MASTER

•SLAVE

•SATELLITE

•IDLE
 

Restrictions

The following restrictions apply to DECwatchdog/Autopilot:

•all the members in the mission must run the same DECwatchdog/Autopilot version,

•all the members in the mission must have the same mission file,

•the maximum number of members in the DECwatchdog/Autopilot mission is limited to 16,

•the maximum number of user resources that can be managed in a mission is limited to 24,

•the string length of a resource or a "mount point directory" name must not exceed 20 bytes,

•broadcast and validity time parameters do not have default and maximum values,

•the latency time for transition detections of DECwatchdog/Autopilot depends on the validity of each member. 
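The numeric limits in this list can be checked before a mission is deployed. The following sketch is illustrative only: check_limits is a hypothetical helper, it takes plain lists of names rather than reading a real mission file, and it counts bytes assuming UTF-8 encoding. The lower bound of 2 members comes from the Description section.

```python
from typing import List, Sequence

MAX_MEMBERS = 16
MIN_MEMBERS = 2       # a group is composed of 2 to 16 computers
MAX_RESOURCES = 24
MAX_NAME_BYTES = 20


def check_limits(members: Sequence[str], resources: Sequence[str]) -> List[str]:
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if not MIN_MEMBERS <= len(members) <= MAX_MEMBERS:
        problems.append(f"{len(members)} members (allowed: 2 to 16)")
    if len(resources) > MAX_RESOURCES:
        problems.append(f"{len(resources)} resources (allowed: at most 24)")
    for name in resources:
        # Assumption: names are measured in bytes of their UTF-8 form.
        if len(name.encode("utf-8")) > MAX_NAME_BYTES:
            problems.append(f"name too long: {name}")
    return problems
```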
 

Recommendations

System Clock Synchronization

To perform the mission, the computers have to be synchronized. 

For Windows NT configurations refer to Chapter 5 of the DECwatchdog/Autopilot Software User’s Guide. 

Diagnostics

DECwatchdog/Autopilot uses two log files for startup and stop messages:

•the mission log file mission.log, for dynamic events which occurred in a mission.

This file is created at start-up time by the DECwatchdog/Autopilot main process, and the previous log file is renamed,
 

•fatal errors are reported to the operating system log files. 

Files

Default local file for wdx:

/var/wdx/wdx.loc

Default mission file for wdx:

/var/wdx/wdx.cfg

See Also

wdx.loc(5)

wdx.cfg(5)

mount(8)

DECwatchdog/Autopilot Software User’s Guide

Typewritten Software • bear@typewritten.org • Edmonds, WA 98026