ecclogger(1M) — Series 300, Model 350 only
NAME
ecclogger, eccscrub − check for or scrub out ECC memory errors
SYNOPSIS
ecclogger [logsize]
eccscrub
DESCRIPTION
ecclogger checks ECC (error-checking and correcting) memory installed in an HP 9000 Model 350 computer system for errors which have been corrected. When ecclogger finds an error, it writes a record of the error to /etc/ecclog and invokes eccscrub to ensure that all errors have been cleared (only one error per ECC memory card is logged).
The optional parameter logsize specifies the maximum number of entries allowed in the log. Hewlett-Packard recommends using the default value of 100 for logsize. If 100 errors occur on a system, notify the local HP Sales and Support Office so that /etc/ecclog can be evaluated (a system with fewer than 100 ecclog entries is usually not a concern). When the log reaches the maximum number of entries, ecclogger prints a message indicating that /etc/ecclog is full (the message is mailed to root when ecclogger is invoked from root's crontab).
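As a rough illustration of the logsize threshold, the following Python sketch counts the records in a copy of the log and reports whether the limit has been reached. It is not part of ecclogger: the names count_entries and log_is_full are hypothetical, and the sketch assumes one error record per non-blank line. It only reads its input, in keeping with the rule that /etc/ecclog must not be written to.

```python
DEFAULT_LOGSIZE = 100  # default value recommended in this manual entry


def count_entries(lines):
    """Count log records, assuming one record per non-blank line."""
    return sum(1 for line in lines if line.strip())


def log_is_full(lines, logsize=DEFAULT_LOGSIZE):
    """Return True once the number of records has reached logsize."""
    return count_entries(lines) >= logsize


if __name__ == "__main__":
    # Read-only access; the path is the log file named in this entry.
    with open("/etc/ecclog") as log:
        records = log.readlines()
    print(count_entries(records), "entries; full:", log_is_full(records))
```

A site that passes a different logsize to ecclogger would pass the same value as the logsize argument here.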
ecclogger should be run from cron by including an entry for it in the crontab file for user root. The recommended frequency for running ecclogger is once per hour (ecclogger automatically invokes eccscrub whenever it finds a corrected error).
eccscrub performs a read-modify-write operation on memory to correct any single-bit soft errors (errors not caused by a "stuck" bit) that may exist in a memory cell. It may be desirable to invoke eccscrub from cron once per day.
To achieve the recommended frequencies, add the following two entries to the crontab file for root:
0 * * * * exec /etc/ecclogger
0 0 * * * exec /etc/eccscrub
Here is a typical entry for a failure as recorded in /etc/ecclog:
870911084132 0xFF55CF40 0x70
Note: Do not write to /etc/ecclog. Doing so prevents ecclogger from writing to /etc/ecclog.
The first field in the error entry is a date/time stamp [yymmddhhmmss] that indicates when the error was logged. The second field is the memory location in hexadecimal. The final field is the error syndrome byte. The syndrome byte is decoded below (useful to service personnel when troubleshooting ECC cards):
| Syndrome | Error Location | Syndrome | Error Location |
| 0x01 | check bit 0 | 0x31 | data bit 14 |
| 0x02 | check bit 1 | 0x34 | data bit 15 |
| 0x04 | check bit 2 | 0x40 | check bit 6 |
| 0x08 | check bit 3 | 0x4A | data bit 1 |
| 0x0B | data bit 17 | 0x4F | data bit 0 |
| 0x0E | data bit 16 | 0x52 | data bit 2 |
| 0x10 | check bit 4 | 0x54 | data bit 3 |
| 0x13 | data bit 18 | 0x57 | data bit 4 |
| 0x15 | data bit 19 | 0x58 | data bit 5 |
| 0x16 | data bit 20 | 0x5B | data bit 6 |
| 0x19 | data bit 21 | 0x5D | data bit 7 |
| 0x1A | data bit 22 | 0x62 | data bit 24 |
| 0x1C | data bit 23 | 0x64 | data bit 25 |
| 0x20 | check bit 5 | 0x67 | data bit 26 |
| 0x23 | data bit 8 | 0x68 | data bit 27 |
| 0x25 | data bit 9 | 0x6B | data bit 28 |
| 0x26 | data bit 10 | 0x6D | data bit 29 |
| 0x29 | data bit 11 | 0x70 | data bit 30 |
| 0x2A | data bit 12 | 0x75 | data bit 31 |
| 0x2C | data bit 13 | | |
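The field layout and the syndrome table can be combined into a small decoder. The Python sketch below is illustrative only and is not an HP-supplied tool: the names parse_ecclog_entry and decode_syndrome are hypothetical, the data-bit mapping is transcribed from the table above, and the check-bit rule (check bit n has syndrome 1 << n) is simply the pattern visible in that table. It also assumes that two-digit years follow Python's %y convention, under which 87 is read as 1987.

```python
from datetime import datetime

# Syndrome byte -> data bit number, transcribed from the table above.
DATA_BIT_SYNDROMES = {
    0x0B: 17, 0x0E: 16, 0x13: 18, 0x15: 19, 0x16: 20, 0x19: 21,
    0x1A: 22, 0x1C: 23, 0x23: 8, 0x25: 9, 0x26: 10, 0x29: 11,
    0x2A: 12, 0x2C: 13, 0x31: 14, 0x34: 15, 0x4A: 1, 0x4F: 0,
    0x52: 2, 0x54: 3, 0x57: 4, 0x58: 5, 0x5B: 6, 0x5D: 7,
    0x62: 24, 0x64: 25, 0x67: 26, 0x68: 27, 0x6B: 28, 0x6D: 29,
    0x70: 30, 0x75: 31,
}


def decode_syndrome(syndrome):
    """Map a syndrome byte to the error location listed in the table."""
    # In the table, check bit n (n = 0..6) always has syndrome 1 << n.
    for n in range(7):
        if syndrome == 1 << n:
            return f"check bit {n}"
    if syndrome in DATA_BIT_SYNDROMES:
        return f"data bit {DATA_BIT_SYNDROMES[syndrome]}"
    return "unknown"  # value not listed in the table


def parse_ecclog_entry(line):
    """Split one log line into time stamp, address, and syndrome."""
    stamp, address, syndrome = line.split()
    syndrome_value = int(syndrome, 16)  # base 16 accepts the 0x prefix
    return {
        # Assumption: %y interprets 69-99 as 1969-1999.
        "logged_at": datetime.strptime(stamp, "%y%m%d%H%M%S"),
        "address": int(address, 16),
        "syndrome": syndrome_value,
        "location": decode_syndrome(syndrome_value),
    }


if __name__ == "__main__":
    # The example entry shown earlier in this manual entry.
    print(parse_ecclog_entry("870911084132 0xFF55CF40 0x70"))
```

For the example entry, the decoder reports an error logged on 11 September 1987 at 08:41:32, at address 0xFF55CF40, in data bit 30.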
FILES
/etc/ecclog
SEE ALSO
Hewlett-Packard Company — HP-UX Release 9.0: August 1992