crash(8v) — VAX
Name
crash − what happens when the system crashes
Description
This section explains what happens when the system crashes and shows how to analyze crash dumps.
When the system crashes voluntarily, it prints a message on the console in the form:
panic: explanation
The system takes a dump on a mass storage peripheral device, and then invokes an automatic reboot procedure as described in reboot(.). Unless there is some unexpected inconsistency in the state of the file systems due to hardware or software failure, the system then resumes multi-user operations. If auto-reboot is disabled on the front panel of the machine, the system halts at this point.
The system has a large number of internal consistency checks; if one of these fails, it prints a short message indicating which one failed.
The most common cause of system failures is hardware failure. In all cases there is the possibility that hardware or software error produced the message in some unexpected way. These messages are the ones you are likely to encounter:
IO err in push
hard IO err in swap
The system encountered an error when trying to write to the paging device or an error in reading critical information from a disk drive. Fix your disk if it is broken or unreliable.
timeout table overflow
Because of the way the timeout table is currently implemented, running out of entries causes a crash. If this happens, make the timeout table bigger.
KSP not valid
SBI fault
CHM? in kernel
These indicate either a problem in the system or failing hardware. If SBI faults recur, check out the hardware or call field service. Run the processor microdiagnostics to determine if the problem is caused by an unreliable processor.
machine check %x: description
machine dependent machine-check information
Call field service.
trap type %d, code=%d, pc=%x
An unexpected trap has occurred within the system; the trap types are:
0    reserved addressing fault
1    privileged instruction fault
2    reserved operand fault
3    bpt instruction fault
4    xfc instruction fault
5    system call trap
6    arithmetic trap
7    ast delivery trap
8    segmentation fault
9    protection fault
10   trace trap
11   compatibility mode fault
12   page fault
13   page table fault
The most common traps in system crashes are trap types 8 and 9, indicating a wild reference. The code is the referenced address, and the pc at the time of the fault is printed. These problems tend to be easy to track down when they are caused by kernel bugs, because the processor stops at the point of the fault. The same traps also occur sporadically, however, with no predictable cause.
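As an illustration of reading the trap message, the following is a minimal sketch, not part of the system, that decodes a console line in the format "trap type %d, code=%d, pc=%x" using the trap type table above. The function name decode_trap and the sample line are hypothetical. The sketch assumes the code field is printed as a signed decimal and the pc as hexadecimal, as the format string suggests.

```python
import re

# Trap type numbers and names, copied from the table in this section.
TRAP_TYPES = {
    0: "reserved addressing fault",
    1: "privileged instruction fault",
    2: "reserved operand fault",
    3: "bpt instruction fault",
    4: "xfc instruction fault",
    5: "system call trap",
    6: "arithmetic trap",
    7: "ast delivery trap",
    8: "segmentation fault",
    9: "protection fault",
    10: "trace trap",
    11: "compatibility mode fault",
    12: "page fault",
    13: "page table fault",
}

# Matches "trap type %d, code=%d, pc=%x".
# Assumption: code may be negative because it is printed with %d.
TRAP_RE = re.compile(r"trap type (\d+), code=(-?\d+), pc=([0-9a-fA-F]+)")


def decode_trap(line):
    """Return (trap name, code, pc) for a trap message, or None if no match."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    trap_type = int(m.group(1))
    name = TRAP_TYPES.get(trap_type, "unknown trap type")
    code = int(m.group(2))       # decimal field
    pc = int(m.group(3), 16)     # hexadecimal field
    return name, code, pc


if __name__ == "__main__":
    # Made-up sample line, for illustration only.
    print(decode_trap("trap type 9, code=24, pc=80001a2c"))
```

For trap types 8 and 9, the decoded code is the referenced address and the pc identifies where the wild reference was made.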
init died
The system initialization process has exited. The only solution is the automatic reboot procedure described in reboot(.). Until this is done, no new users can log in.
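The messages and recommended actions above can be collected into a small lookup. This is a sketch for illustration only: the ADVICE table paraphrases this section, the function name advise is hypothetical, and the matching assumes that a console line starts with the message text, optionally preceded by "panic:".

```python
# Message prefixes paired with the follow-up this section suggests.
# The wording of each action is a paraphrase, not system output.
ADVICE = [
    ("IO err in push", "check or repair the disk"),
    ("hard IO err in swap", "check or repair the disk"),
    ("timeout table overflow", "make the timeout table bigger"),
    ("KSP not valid", "suspect the system or failing hardware"),
    ("SBI fault", "check the hardware or call field service if it recurs"),
    ("CHM? in kernel", "suspect the system or failing hardware"),
    ("machine check", "call field service"),
    ("trap type", "look up the trap type; 8 and 9 indicate a wild reference"),
    ("init died", "reboot the system"),
]


def advise(console_line):
    """Return the suggested action for a console message, or None."""
    text = console_line.strip()
    # Strip the "panic:" prefix when it is present.
    if text.startswith("panic:"):
        text = text[len("panic:"):].strip()
    for prefix, action in ADVICE:
        if text.startswith(prefix):
            return action
    return None


if __name__ == "__main__":
    print(advise("panic: timeout table overflow"))
```

A message that matches none of the prefixes returns None, which reflects the caveat above that a hardware or software error can produce a message in some unexpected way.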
When the system crashes, it attempts to write an image of memory into the back end of the primary swap area. After the system is rebooted, the program savecore() runs and preserves a copy of this core image and the current system in a specified directory for later access. See savecore() for details.
To analyze a dump, you should begin by running adb() with the −k flag on the core dump. Normally, the command “*(intstack-4)$c” provides a stack trace from the point of the crash and this should provide a clue as to what went wrong.