⇒ cpqw(7) — UnixWare 2.01


       cpqw(7)                                                      cpqw(7)


       NAME
             cpqw - Compaq Wellness Driver

       DESCRIPTION
             The Compaq Wellness Driver collects and monitors important
             operational data on your server to ensure that the system is
             ``healthy.''  Any abnormal conditions are logged into the EISA
             Non Volatile RAM (Health log) and the information is
             optionally sent to the Compaq Insight Manager via SNMP traps.
             The health log information can be displayed using inspect(1M).

             Compaq Servers are equipped with hardware and firmware to
             monitor certain abnormal conditions such as abnormal
             temperature readings, fan failures, ECC memory errors, etc.
             The cpqw driver monitors these conditions and reports the
             status to the administrator by printing a message on the
             console, and also logging the condition into the EISA NVRAM.
             A tightly coupled SNMP agent (a daemon process) opens the
             driver and waits for abnormal conditions.  If an abnormal
             condition is detected the SNMP agent issues a trap to the
             Compaq Insight Manager (CIM).

             cpqw acts as a multiplexer for three drivers: asr, csm, and
             ecc_nmi.

          asr Driver
             This driver allows a user-level process to periodically update
             the Automatic Server Recovery (ASR) timer, a hardware
             ``heartbeat'' counter that is loaded with an initial value and
             then counts down.  If the timer is not updated and counts down
             to zero, the ASR hardware assumes that the operating system is
             locked up, and the firmware attempts to reboot the system.
             Before rebooting, asr displays a message on the console
             stating the problem, and makes an entry in the system health
             log.  When ASR reboots the system, all the boards are reset.
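
             The countdown behavior described above can be sketched as a
             toy model.  This is an illustration only, not the driver's
             interface: the class name AsrTimer, its methods, and the
             reload value are hypothetical.

```python
class AsrTimer:
    """Toy model of a countdown watchdog such as the ASR timer."""

    def __init__(self, initial_ticks):
        # initial_ticks is a hypothetical reload value, not a documented one.
        self.initial_ticks = initial_ticks
        self.remaining = initial_ticks
        self.expired = False

    def heartbeat(self):
        """Reload the counter, as the user-level process would do periodically."""
        if not self.expired:
            self.remaining = self.initial_ticks

    def tick(self):
        """Advance the hardware countdown by one tick."""
        if self.expired:
            return
        self.remaining -= 1
        if self.remaining <= 0:
            # At this point the real firmware would log the event and reboot.
            self.expired = True


timer = AsrTimer(initial_ticks=3)
timer.tick()
timer.tick()
timer.heartbeat()      # a healthy system reloads the counter in time
for _ in range(3):     # a locked-up system stops sending heartbeats
    timer.tick()
```

             As long as heartbeat() is called more often than the reload
             value elapses, the model never expires; once the heartbeats
             stop, the counter reaches zero and the reboot path is taken.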

          csm Driver
             The Compaq Server Management (csm) driver monitors the aspects
             of the system in the following list, using IRQ 13 (which is
             shared with the FPU).  It writes its status to EISA NVRAM.

             Temperature sensors
                            If the normal operating temperature is
                            exceeded, or a cooling fan fails, the csm
                            driver displays a message to the console


                           Copyright 1994 Novell, Inc.               Page 1
                           stating the problem, makes an entry in the
                           system health log, and shuts the system down
                           (optionally) to avoid hardware damage.  Use the
                           EISA configuration utility to control this
                           option.

            Fan sensors    If a cooling fan fails, csm displays a message
                           to the console stating the problem, makes an
                           entry in the system health log, and shuts the
                           system down (optionally) to avoid hardware
                           damage.  Use the EISA configuration utility to
                           control the option.

            Hobbs meter updates
                           Total system uptime is stored in non-volatile
                           memory and displayed by using the inspect
                           utility.  This is a 4-byte counter (the
                           environment variable, CPQHOB) on the ProLiant
                           system board that acts as a total ``uptime''
                           meter (the amount of time the system has been
                           running under the OS).  Once a driver for the OS
                           is installed, the driver causes this timer to
                           tick.  It is not resettable.

            EISA bus utilization
                           The driver maintains the EISA bus utilization
                           statistics.  EISA bus utilization is obtained
                           by periodically polling a hardware counter.
                           The driver stores the number of idle BCLKs and
                           the poll frequency in a variable that can be
                           retrieved and displayed via the performance
                           monitor utility, rtpm.
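
            As a rough illustration of how a utilization figure could be
            derived from the idle count and the poll frequency, consider the
            sketch below.  The function name, its parameters, and the
            nominal 8.33 MHz bus clock are assumptions made for the example;
            this page does not state how rtpm computes the value.

```python
BCLK_HZ = 8_330_000  # assumed nominal EISA bus clock; not stated in this page


def bus_utilization(idle_bclks, poll_hz, bclk_hz=BCLK_HZ):
    """Return the busy fraction (0.0 to 1.0) of one polling interval.

    idle_bclks: idle bus clocks counted during the interval
    poll_hz:    how many times per second the counter is polled
    """
    if poll_hz <= 0 or bclk_hz <= 0:
        raise ValueError("poll_hz and bclk_hz must be positive")
    total_bclks = bclk_hz / poll_hz              # clocks in one interval
    idle = min(max(idle_bclks, 0), total_bclks)  # clamp to a sane range
    return 1.0 - idle / total_bclks


half_busy = bus_utilization(idle_bclks=4_165_000, poll_hz=1)
```

            Under these assumptions, a one-second interval in which half of
            the bus clocks were idle yields a utilization of one half.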

         ecc_nmi Driver
            If an ECC memory error occurs, the ecc_nmi driver logs the
            error in the health log, including the address that caused the
            error.  If too many errors occur at the same memory location,
            the driver disables the ECC error interrupts to prevent
            flooding the console with warnings (the hardware automatically
            corrects the ECC error).
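
            The throttling policy can be pictured with the following sketch.
            It is a model of the described behavior only: the class name,
            the method name, and the threshold of three errors are
            hypothetical, since this page does not say how many errors count
            as ``too many.''

```python
from collections import Counter


class EccMonitor:
    """Toy model of per-address throttling of corrected ECC errors."""

    def __init__(self, max_errors_per_address=3):
        # The threshold is a hypothetical value chosen for the example.
        self.max_errors_per_address = max_errors_per_address
        self.counts = Counter()
        self.interrupts_enabled = True
        self.health_log = []

    def on_corrected_error(self, address):
        """Handle one ECC error interrupt for the given address."""
        if not self.interrupts_enabled:
            return  # interrupts are off, so the driver never sees this error
        self.health_log.append(address)
        self.counts[address] += 1
        if self.counts[address] >= self.max_errors_per_address:
            # Too many errors at one location: stop further interrupts.
            self.interrupts_enabled = False


monitor = EccMonitor()
for _ in range(5):
    monitor.on_corrected_error(0x00A0_1000)
```

            In this model only the first three errors reach the log; the
            remaining two arrive after interrupts have been disabled.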

            There are two types of error logs maintained in the EISA
            NVRAM, the Correctable Error Log and the Critical Error Log.
            Each error log consists of 16 entries in a circular queue.
            Each entry is 8 bytes long.  Each record has a time stamp with
            a date and time.


             The Error Information field provides specific information for
             each error type.  Diagnostics uses this information to resolve
             the source of the error.
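
             A 16-entry circular log of 8-byte records can be sketched as
             follows.  Only the entry count and entry size come from this
             page; the field layout (a 4-byte time stamp, a 1-byte error
             type, and 3 bytes of error information) and all names are
             hypothetical and do not describe the real NVRAM format.

```python
import struct

# Hypothetical layout: 4-byte time stamp, 1-byte error type, 3 bytes of info.
ENTRY_FORMAT = "<IB3s"
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 8 bytes, as stated in the text
LOG_ENTRIES = 16


class ErrorLog:
    """Fixed-size circular log; the oldest record is overwritten when full."""

    def __init__(self):
        self.nvram = bytearray(ENTRY_SIZE * LOG_ENTRIES)
        self.next_slot = 0
        self.count = 0

    def append(self, timestamp, error_type, info=b""):
        """Pack one record into the next slot, wrapping around at the end."""
        record = struct.pack(ENTRY_FORMAT, timestamp, error_type, info)
        offset = self.next_slot * ENTRY_SIZE
        self.nvram[offset:offset + ENTRY_SIZE] = record
        self.next_slot = (self.next_slot + 1) % LOG_ENTRIES
        self.count = min(self.count + 1, LOG_ENTRIES)

    def entries(self):
        """Return the stored records as tuples, oldest first."""
        start = (self.next_slot - self.count) % LOG_ENTRIES
        records = []
        for i in range(self.count):
            offset = ((start + i) % LOG_ENTRIES) * ENTRY_SIZE
            records.append(struct.unpack_from(ENTRY_FORMAT, self.nvram, offset))
        return records


log = ErrorLog()
for stamp in range(18):      # two more records than the log can hold
    log.append(stamp, error_type=1)
```

             After 18 appends the log still holds 16 records, and the two
             oldest have been overwritten.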

             Some examples of the NMI and ECC errors are:

                         Uncorrectable memory error
                         Expansion bus master timeout
                         Expansion bus slave timeout
                         Expansion board timeout
                         Expansion bus arbitration error
                         Processor cache parity timeout
                         Processor parity error
                         System concurrency error
                         Processor failure
                         Processor exception
                         Processor internal error
                         Critical Temperature
                         Failsafe timer expiration
                         Abnormal program termination
                         System fan failure
                         UPS - battery depletion detected

       REFERENCES
             inspect(1M), rtpm(1M), crom(7)

       NOTICES
             This driver is supported only on applicable Compaq systems.
