CRASH(8S) — MAINTENANCE COMMANDS

NAME

crash − what happens when the system crashes

DESCRIPTION

This section explains what happens when the system crashes and how you can analyze crash dumps.

When the system crashes voluntarily it displays a message of the form

panic: why i gave up the ghost

on the console, takes a dump on a mass storage peripheral, and then invokes an automatic reboot procedure as described in reboot(8). Unless some unexpected inconsistency is encountered in the state of the file systems due to hardware or software failure, the system will then resume multi-user operations.

The system has a large number of internal consistency checks; if one of these fails, it will panic with a very short message indicating which one failed.

The most common cause of system failures is hardware failure, which can manifest itself in different ways. Here are the messages that you are likely to encounter, with some hints as to causes. Left unstated in all cases is the possibility that a hardware or software error produced the message in some unexpected way.

IO err in push
hard IO err in swap
The system encountered an error trying to write to the paging device or an error in reading critical information from a disk drive. You should fix your disk if it is broken or unreliable.

timeout table overflow
This really shouldn’t be a panic, but until we fix up the data structure involved, running out of entries causes a crash. If this happens, you should make the timeout table bigger by changing the value of ncallout in the param.c file, and then rebuild your system.
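
To picture why running out of entries is fatal, here is a minimal C sketch of a fixed-size timeout table. It is an illustration only, not the kernel's source: NCALLOUT, struct callout, callout_init and timeout_sketch are hypothetical names, the table size of 4 is arbitrary, and the function returns -1 at the point where the real system would panic.

```c
#include <stddef.h>

/* Hypothetical table size; stands in for the ncallout value in param.c. */
#define NCALLOUT 4

struct callout {
    int   c_time;             /* ticks until the entry fires */
    void (*c_func)(void *);   /* function to call when it fires */
    void *c_arg;              /* argument passed to c_func */
    struct callout *c_next;   /* next entry on the free or pending list */
};

static struct callout callout_pool[NCALLOUT]; /* the fixed-size table */
static struct callout *callfree;              /* unused entries */
static struct callout *calltodo;              /* pending timeouts */

/* Link every table entry onto the free list. */
void callout_init(void)
{
    int i;

    for (i = 0; i < NCALLOUT - 1; i++)
        callout_pool[i].c_next = &callout_pool[i + 1];
    callout_pool[NCALLOUT - 1].c_next = NULL;
    callfree = &callout_pool[0];
    calltodo = NULL;
}

/*
 * Schedule func(arg) after the given number of ticks.
 * Returns 0 on success, or -1 when the table is full; the latter is
 * the condition reported as "timeout table overflow".
 */
int timeout_sketch(void (*func)(void *), void *arg, int ticks)
{
    struct callout *c = callfree;

    if (c == NULL)
        return -1;            /* no free entry: table overflow */
    callfree = c->c_next;
    c->c_func = func;
    c->c_arg = arg;
    c->c_time = ticks;
    c->c_next = calltodo;     /* unsorted push; enough for the sketch */
    calltodo = c;
    return 0;
}
```

In this sketch the table has no way to grow at run time, so the only remedy for repeated overflow is a larger compile-time size, which corresponds to raising ncallout and rebuilding.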

trap type type, pid process-id, pc = program-counter, sr = status-register, context context-number
An unexpected trap has occurred within the system; typical trap types are:

• Bus error
• Address error
• Illegal instruction
• Divide by zero
• Chk instruction
• Trapv instruction
• Privilege violation
• Trace
• 1010 emulator trap
• 1111 emulator trap
• Stack format error
• Uninitialized interrupt
• Spurious interrupt

The favorite trap types in system crashes are ‘Bus error’ and ‘Address error’, each indicating a wild reference. The process-id is the ID of the process running at the time of the fault, program-counter is the hexadecimal value of the program counter, status-register is the hexadecimal value of the status register, and context-number is the context in which the process was running. These problems tend to be easy to track down if they are kernel bugs, since the processor stops cold, but random hardware flakiness sometimes causes the same traps.
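
When collecting these messages from console logs, it can help to split a trap line into its fields. The following C sketch does so with sscanf. It assumes the type, process ID and context are printed as decimal numbers and that the program counter and status register are printed in hexadecimal, which matches the description above but is not confirmed for the type field; struct trap_info and parse_trap_line are hypothetical names.

```c
#include <stdio.h>

/* Fields of a "trap type ..." panic line (hypothetical structure). */
struct trap_info {
    int           type;     /* trap type, assumed to be printed in decimal */
    int           pid;      /* process running at the time of the fault */
    unsigned long pc;       /* program counter, printed in hexadecimal */
    unsigned int  sr;       /* status register, printed in hexadecimal */
    int           context;  /* context the process was running in */
};

/*
 * Parse one console line of the form
 *   trap type T, pid P, pc = X, sr = Y, context C
 * Returns 0 if all five fields were read, -1 otherwise.
 */
int parse_trap_line(const char *line, struct trap_info *t)
{
    int n = sscanf(line,
                   "trap type %d, pid %d, pc = %lx, sr = %x, context %d",
                   &t->type, &t->pid, &t->pc, &t->sr, &t->context);

    return n == 5 ? 0 : -1;
}
```

For an invented sample line such as "trap type 2, pid 137, pc = f0c1a4, sr = 2000, context 3", the function fills in pid 137 and pc 0xf0c1a4; a line that is not a trap message, such as "init died", is rejected with -1.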

init died
The system initialization process has exited. This is bad news, as no new users will then be able to log in. Rebooting is the only fix, so the system just does it right away.

That completes the list of panic types you are likely to see.

When the system crashes it writes (or at least attempts to write) an image of memory into the back end of the primary swap area. After the system is rebooted, the program savecore(8) runs and preserves a copy of this core image and the current system in a specified directory for later perusal. See savecore(8) for details.

To analyze a dump, you should begin by running adb(1S) with the −k flag on the core dump. A more complete discussion of system debugging is beyond the scope of this page. See, however, ‘Using ADB to Debug the UNIX Kernel’.
