crash(8) crash(8)
NAME
crash - what to do when the system crashes
DESCRIPTION
This entry gives a few clues about what to do if the system
crashes. It is not complete.
When restarting after a crash, always bring up the system
single-user, as specified in boot(8), modified as needed for
your installation. Perform an fsck(1M) on all file systems that
could have been in use when the system crashed. If you find
any serious file system problems, you should repair them.
When you are satisfied with the health of your disks, check
and set the date, then come up multi-user.
To boot Oreo, certain files (and the directories leading to
them) must be intact. First, the initialization program
/etc/init must be present and executable. For init to work
correctly, /dev/console, /bin/sh, and /bin/env must be
present. If any of these does not exist, the symptom is
best described as thrashing: init will go into a fork/exec
loop trying to create a shell with proper standard input and
output. The file /etc/rc should also be executable; the
system will come up but will not be fully initialized
without it.
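The checks above can be sketched as a short script, run from a
backup system against the damaged root. This is only a sketch:
the function name check_boot_files and the idea of passing a
mount point as the root argument are assumptions made for
illustration, not part of the system. It tests the mode bits
directly instead of asking whether the current user may execute
the file.

```python
import os
import stat

# Files named in the text above. Paths are relative so they can be joined
# onto any root, for example a damaged root file system mounted elsewhere.
REQUIRED = [
    ("etc/init", "executable"),
    ("dev/console", "present"),
    ("bin/sh", "present"),
    ("bin/env", "present"),
    ("etc/rc", "executable"),
]


def check_boot_files(root="/"):
    """Return a list of (path, problem) pairs; an empty list means no problem was found."""
    problems = []
    for rel, need in REQUIRED:
        path = os.path.join(root, rel)
        try:
            mode = os.stat(path).st_mode
        except OSError:
            problems.append((path, "missing"))
            continue
        # "executable" here means a regular file with at least one execute bit set.
        if need == "executable" and not (stat.S_ISREG(mode) and mode & 0o111):
            problems.append((path, "not executable"))
    return problems
```

Calling check_boot_files("/mnt") would examine a root file system
mounted at the hypothetical mount point /mnt.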
If you cannot get the system to boot, you must obtain a
runnable system from a backup medium. You may then doctor
the root file system as a mounted file system, as described
below. If there are any problems with the root file system,
you should go to a backup system to avoid working on a
mounted file system.
Repairing disks
You should treat an addled disk gently; you shouldn't mount
it unless necessary, and if it is very valuable but in quite
bad shape, perhaps you should copy it before trying surgery
on it. This is an area where experience and informed
courage count for much.
fsck(1M) is adept at diagnosing and repairing file system
problems. It first identifies all the files that contain
bad (out of range) blocks or blocks that appear in more than
one file. Any such files are identified by name and fsck
requests permission to remove them from the file system.
You should remove files with bad blocks. When there are
duplicate blocks, you should remove all the files except the
most recently modified. You should check the contents of
the survivor after repairing the file system to ensure that
it contains the proper data. (Note that running fsck -n
reports all problems without attempting any repair.)
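The rule for duplicate blocks (keep only the most recently
modified file) can be written down as a small helper. This is an
illustrative sketch, not something fsck provides: the name
pick_survivor is hypothetical, and it assumes you have already
noted each file's name and modification time by some other means.

```python
def pick_survivor(files):
    """files is a list of (name, mtime) pairs for files sharing a duplicate block.

    Returns (survivor, to_remove): the name of the most recently modified
    file, and the names of all the others.
    """
    if not files:
        raise ValueError("need at least one file")
    survivor = max(files, key=lambda f: f[1])[0]
    to_remove = [name for name, _ in files if name != survivor]
    return survivor, to_remove
```

The survivor's contents should still be checked by hand after the
repair, as noted above.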
Page 1 (last mod. 1/15/87)
fsck also reports on incorrect link counts and requests
permission to adjust any that are erroneous. In addition, it
reconnects any files or directories allocated without any
file system references to a ``lost+found'' directory.
Finally, if the free list is bad (out of range, missing, or
duplicate blocks) fsck will, with the operator's concurrence,
construct a new one.
Why did it crash?
Oreo types a message on the console typewriter when it
voluntarily crashes. Here is the current list of such
messages, with enough information to provide a hope of the
remedy. The message has the form ``panic: ...'', possibly
accompanied by other information. Left unstated in all
cases is the possibility that hardware or software error
produced the message in some unexpected way. Not all
systems produce all of these panics.
bflush: bad free list.
A buffer management error occurred during a sync or
umount system call.
blkdev
The getblk routine was called with a nonexistent major
device as argument. Definitely hardware or software
error.
devtab
Null device table entry for the major device used as
argument to getblk. Definitely hardware or software
error.
dpfrelse
The list of processes currently mapped into the memory
management unit has been lost (68451 only).
iinit
An I/O error reading the super-block for the root file
system during initialization.
interrupt stack overflow
The kernel ran out of stack space on an interrupt.
Subroutine nesting is too deep or there are too many local
variables.
kernel memory management error
Bus error or address error in supervisor mode. Can be
a software or hardware problem.
kernel parity error
A memory parity error occurred while the cpu was in
supervisor mode.
lost vmap segment
An error occurred opening a shared memory segment.
no data pages
A memory management error occurred allocating mmu
registers for a process's data segment.
no fs
A device has disappeared from the mounted-device table.
Definitely hardware or software error.
no imt
Like no fs, but produced elsewhere.
no clock
During initialization, neither the line clock nor the
programmable clock was found.
no procs
Process table has been destroyed.
no text page
A memory management error occurred allocating mmu
registers for a process's text segment.
I/O error in swap
An unrecoverable I/O error during a swap. Really
shouldn't be a panic, but it is hard to fix.
oops!!! syscall
Missing the interrupt vector for system calls.
out of swap space
A program needs to be swapped out, and there is no more
swap space. The swap area must be enlarged. This really
shouldn't be a panic, but there is no easy fix.
timeout table overflow
The timeout table overflowed. The timeout table is not
large enough or some routine is starting up too many
timeouts.
trap
An unexpected trap occurred within the system. This is
accompanied by the following information:
trap type
2 bus error
3 address error
4 illegal instruction
5 divide by zero
6 CHK instruction
7 TRAPV instruction
8 privilege violation
9 trace
10 1010 emulator
11 1111 emulator
12-255 unexpected interrupt
virtual address (for bus/address errors only)
physical address
instruction register
function code
mmu dump
program counter
status register
program id
registers
unexpected kernel trap
A buserr or similar unexpected exception occurred while
the cpu was in supervisor mode.
In some of these cases, the reported trap type may have hex
1000 added to it; this indicates that the processor was in
user mode when the trap occurred.
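As an aid to reading the trap type, the table and the hex 1000
convention can be combined in a small decoder. This is a sketch
under assumptions: the name decode_trap_type is hypothetical, and
it takes the trap type as an integer, since the exact format
printed on the console is not specified here.

```python
USER_MODE_FLAG = 0x1000  # added to the trap type when the processor was in user mode

TRAP_NAMES = {
    2: "bus error",
    3: "address error",
    4: "illegal instruction",
    5: "divide by zero",
    6: "CHK instruction",
    7: "TRAPV instruction",
    8: "privilege violation",
    9: "trace",
    10: "1010 emulator",
    11: "1111 emulator",
}


def decode_trap_type(value):
    """Return (description, user_mode) for a trap type taken from a panic report."""
    user_mode = value >= USER_MODE_FLAG
    code = value - USER_MODE_FLAG if user_mode else value
    if code in TRAP_NAMES:
        name = TRAP_NAMES[code]
    elif 12 <= code <= 255:
        name = "unexpected interrupt"
    else:
        name = "unknown"  # not listed in the table above
    return name, user_mode
```

For example, a trap type of hex 1003 decodes as an address error
taken in user mode.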
SEE ALSO
fsck(1M), boot(8).