ND(4) — System Manager’s Manual — Special Files
NAME
nd - net disk driver
SYNOPSIS
pseudo-device nd
DESCRIPTION
The network disk device, /dev/nd*, allows a client workstation to perform disk IO operations on a server system over the network. To the client system, this device looks like any normal disk driver: it allows read/write operations at a given block number and byte count. Note that this provides a network disk block access service rather than a network file access service. (The Sun / 4.2bsd network distributed file server is still under development.)
Typically the client system will contain no disks at all. In this case /dev/nd0 contains the client’s root file system (including /usr files), and nd1 is used as a paging area. Client access to these devices is converted to net disk protocol requests and sent to the server system over the network. The server receives the request, performs the actual disk IO, and sends a response back to the client.
The server contains a table which lists the net address of each of his clients and the server disk partition which corresponds to each client unit number (nd0,1,...). This table resides in the server kernel in a structure owned by the nd device. The table is initialized by running the program /etc/nd with text file /etc/nd.local as its input. /etc/nd then issues ioctl(2) functions to load the table into the kernel.
In addition to the read/write units /dev/nd*, there are public read-only units which are named /dev/ndp*. The correspondence to server partitions is specified by the /etc/nd.local text file, in a similar manner to the private partitions. The public units can be used to provide shared access to binaries or libraries (/bin, /usr/bin, /usr/ucb, /usr/lib) so that each diskless client does not have to waste space in his private partitions for these files. This is done by providing a public file system at the server (/dev/ndp0) which is mounted on ‘/pub’ of each diskless client. The clients then use symbolic links to read the public files: /bin -> /pub/bin, /usr/ucb -> /pub/usr/ucb. One requirement in this case is that the server (who has read/write access to this file system) should not perform much write activity on any public filesystem. This is because each client caches blocks locally, so server writes do not reach a client until its cached copies are flushed.
One last type of unit is provided. These are called local units and are named /dev/ndl*. The SUN physical disk sector 0 label only provides a limited number of partitions per physical disk (eight). Since this number is small and these partitions have somewhat fixed meanings, the nd driver itself has a subpartitioning capability built in. This allows the large server physical disk partition (e.g. /dev/ip0g) to be broken up into any number of diskless client partitions. Of course on the client side these would be referenced as /dev/nd0,1,...; but the server needs to reference these client partitions from time to time, to do mkfs(8) and fsck(8) for example. The /dev/ndl* entries allow the server ‘local’ access to his subpartitions without causing any net activity. The actual local unit number to client unit number correspondence is again recorded in the /etc/nd.local text file.
The nd device driver is the same on both the client and server sides. There are no user level processes associated with either side, thus the latency and transfer rates are close to maximal.
MINOR DEVICE NUMBERS
The minor device and ioctl encoding used is given in file /usr/include/sun/ndio.h. The low six bits are the unit number. The 0x40 bit indicates a public unit; the 0x80 bit indicates a local unit.
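The bit layout described above can be illustrated with a short sketch. The macro and function names here are hypothetical and are not taken from /usr/include/sun/ndio.h; only the bit positions (low six bits, 0x40, 0x80) come from the text.

```c
/* Bit layout as described in the text. The names are illustrative
 * assumptions, not the identifiers used in ndio.h. */
#define ND_UNIT_MASK   0x3f   /* low six bits: unit number */
#define ND_PUBLIC_BIT  0x40   /* set for public units (/dev/ndp*) */
#define ND_LOCAL_BIT   0x80   /* set for local units (/dev/ndl*) */

/* Extract the unit number from a minor device number. */
int nd_minor_unit(unsigned minor)
{
    return (int)(minor & ND_UNIT_MASK);
}

/* Nonzero if the minor number names a public read-only unit. */
int nd_minor_is_public(unsigned minor)
{
    return (minor & ND_PUBLIC_BIT) != 0;
}

/* Nonzero if the minor number names a server-local unit. */
int nd_minor_is_local(unsigned minor)
{
    return (minor & ND_LOCAL_BIT) != 0;
}
```

Under this reading, minor number 0x41 would be public unit 1 (/dev/ndp1) and 0x83 would be local unit 3 (/dev/ndl3).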
INITIALIZATION
No special initialization is required on the client side; he finds the server by broadcasting the initial request. Upon getting a response, he locks onto that server address.
At the server, the /etc/rc.local file contains the line ‘/etc/nd - </etc/nd.local’. This causes the initialization text file to be read and ioctl’s issued to load this information into the kernel. /etc/nd.local contains comments explaining the format of the commands contained therein. Below is reproduced a sample file:
# nd.local - net disk local initialization file
#
# Each of the commands accepted can be given on the command line
# as arguments or on standard input.
# See also manual page nd(4). Syntax of each command:
#
# son
# enables this host as a net file server.
#
# soff
# turns off server status.
#
# user [ipaddr] [hisunit] [mydev] [myoff] [mysize] [mylunit]
# For the client of the file server at [ipaddr], transform
# incoming requests for [hisunit] into server device [mydev]
# at offset [myoff] and size [mysize] sectors. /dev/ndl[mylunit]
# provides a local name for this disk ‘subpartition’.
#
# If [mysize] is ‘-1’, then this user/unit is equivalent
# to the entire filesystem partition [mydev] (no ‘subpartitioning’
# is done.) If [mysize] is positive, but [myoff] = ‘-1’, then
# this user/unit begins at an offset following the offset and
# size of the previous command line. If [mylunit] is ‘-1’,
# then no local name is needed for this user/unit (this is usually
# the case with a swap unit, or a unit represented by an entire
# filesystem).
#
# If [ipaddr] is zero, [hisunit] refers to a public unit.
#
# clear
# Clear the kernel tables built by any previous commands.
#
# flush
# Executed by a client, this clears the buffer cache of
# any read-only public filesystem blocks (this also happens
# automatically every hour and whenever the file server
# broadcasts a ‘public flush’ message.) This allows public
# filesystem changes to eventually reach the clients.
#
# version [versionnumber]
# Occasionally the need arises to reorganize or
# reload the diskless partitions. Since the clients will rewrite
# locally cached blocks, they must be kept from writing their
# filesystems until they reboot.
#
# Before such a reorganization occurs, the system manager should
# warn diskless users to save files and halt their machines.
# Modification of the partitions should occur with the disk server
# off. After modification is complete, [versionnumber] should be
# incremented to force users to reboot.
#
# -
# ‘/etc/nd -’ tells the program to read commands
# from standard input instead of parsing the command line.
#
clear
version 1
user 0 0 /dev/ip0d -1 -1 -1
user dummy1 0 /dev/ip0e -1 -1 -1
user dummy1 1 /dev/ip0f -1 -1 -1
user dummy2 0 /dev/ip0g -1 -1 -1
user dummy2 1 /dev/ip0h -1 -1 -1
son
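The sample file uses whole partitions only, so the ‘-1’ offset rule in the ‘user’ command description is not exercised. The following is a minimal sketch of how that rule might be applied, assuming the entries are given in file order and all refer to the same [mydev]. The structure and function names are hypothetical; this is not the code used by /etc/nd or the kernel.

```c
#include <stddef.h>

/* Hypothetical in-memory form of one 'user' line. Field names follow
 * the bracketed names in nd.local, not the kernel's own table. */
struct nd_user {
    long myoff;    /* sector offset, or -1 to follow the previous entry */
    long mysize;   /* size in sectors, or -1 for the whole partition */
};

/*
 * Resolve '-1' offsets in place. Only entries with a positive size and
 * an offset of -1 are changed; each is placed directly after the
 * previous entry. Returns 0 on success, or -1 if such an entry has no
 * usable previous entry (an assumption; the manual does not say what
 * happens in that case).
 */
int nd_resolve_offsets(struct nd_user *u, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (u[i].mysize <= 0 || u[i].myoff != -1)
            continue;   /* whole partition, or explicit offset */
        if (i == 0 || u[i - 1].mysize <= 0 || u[i - 1].myoff < 0)
            return -1;  /* nothing usable to follow */
        u[i].myoff = u[i - 1].myoff + u[i - 1].mysize;
    }
    return 0;
}
```

For example, three entries of 8000, 4000 and 2000 sectors, the first at offset 0 and the others at offset ‘-1’, would resolve to offsets 0, 8000 and 12000.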
ERRORS
Generally, physical disk IO errors detected at the server are returned to the client for action. If the server is down or inaccessible, the client will see the console message ‘file server not responding: still trying’. The client continues (forever) making his request until he gets positive acknowledgement from the server. This means the server can crash or power down and come back up without any special action required of the user at the client machine. It also means the process performing the IO to nd will block, insensitive to signals, since the process is sleeping inside the kernel at PRIBIO.
PROTOCOL AND DRIVER INTERNALS
The protocol packet is defined in /usr/include/sys/nd.h and also included below:
/*
 * ‘nd’ protocol packet format.
 */
struct ndpack {
        struct ip np_ip;        /* ip header, proto IPPROTO_ND */
        u_char  np_op;          /* operation code, see below */
        u_char  np_min;         /* minor device */
        char    np_error;       /* b_error */
        char    np_ver;         /* version number */
        long    np_seq;         /* sequence number */
        long    np_blkno;       /* b_blkno, disk block number */
        long    np_bcount;      /* b_bcount, byte count */
        long    np_resid;       /* b_resid, residual byte count */
        long    np_caddr;       /* current byte offset of this packet */
        long    np_ccount;      /* current byte count of this packet */
};                              /* data follows */

/* np_op operation codes. */
#define NDOPREAD        1       /* read */
#define NDOPWRITE       2       /* write */
#define NDOPERROR       3       /* error */
#define NDOPCODE        7       /* op code mask */
#define NDOPWAIT        010     /* waiting for DONE or next request */
#define NDOPDONE        020     /* operation done */

/* misc protocol defines. */
#define NDMAXDATA       1024    /* max data per packet (if 1370, would
                                   allow 4K disk block to fit in 3 ether
                                   packets; but would mess up clusters) */
#define NDMAXPACKS      6       /* max packets before acknowledgement */
#define NDMAXIO         32*1024 /* max np_bcount */
#define NDXTIMER        4       /* seconds between rexmits */
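The np_op field packs an operation code and two flag bits into one byte. A minimal sketch of how the field might be decoded follows; the constants are copied from the listing above (note that 010 and 020 are octal, i.e. 8 and 16), while the helper function names are hypothetical and do not appear in the driver.

```c
/* Values copied from the listing; 010 and 020 are octal constants. */
#define NDOPREAD   1
#define NDOPWRITE  2
#define NDOPERROR  3
#define NDOPCODE   7      /* mask selecting the operation code */
#define NDOPWAIT   010    /* sender waits for DONE or next request */
#define NDOPDONE   020    /* operation done */

/* Return the operation code (read, write, or error) held in np_op. */
int nd_opcode(unsigned char op)
{
    return op & NDOPCODE;
}

/* Nonzero if the sender will send no more until acknowledged. */
int nd_is_waiting(unsigned char op)
{
    return (op & NDOPWAIT) != 0;
}

/* Nonzero if the server reports the requested operation complete. */
int nd_is_done(unsigned char op)
{
    return (op & NDOPDONE) != 0;
}
```

With these helpers, an np_op value of NDOPREAD | NDOPDONE decodes as a read with the done flag set and the wait flag clear.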
IP datagrams were chosen instead of UDP datagrams because only the IP header is checksummed, not the entire packet as in UDP. Also the kernel level interface to the IP layer is simpler. The min, blkno, and bcount fields are copied directly from the client’s strategy request. The sequence number field seq is incremented on each new client request and is matched with incoming server responses. The server essentially echoes the request header in his responses, altering certain fields. The caddr and ccount fields show the current byte address and count of the data in this packet, or the data expected to be sent by the other side.
The protocol is very simple and driven entirely from the client side. As soon as the client ndstrategy routine is called, the request is sent to the server; this allows disk sorting to occur at the server as soon as possible. Transactions which send data (client writes on the client side, client reads on the server side) can only send NDMAXPACKS packets of NDMAXDATA bytes each before waiting for an acknowledgement. The defines are currently set at 6 packets of 1K bytes each. This allows the normal 4K byte case to occur with just one ‘transaction’. The NDOPWAIT bit is set in the op field by the sender to indicate he will send no more until acknowledged (or requested) by the other side. The NDOPDONE bit is set by the server side to indicate the request operation has completed; for both the read and write cases this means the requested disk IO has actually occurred.
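The packet and acknowledgement arithmetic above can be checked with a short sketch. It assumes every packet except possibly the last carries a full NDMAXDATA bytes; the function names are hypothetical, and the constants are the values quoted in the text.

```c
#define NDMAXDATA   1024          /* max data bytes per packet */
#define NDMAXPACKS  6             /* max packets before acknowledgement */
#define NDMAXIO     (32 * 1024)   /* max np_bcount */

/* Number of data packets needed to carry bcount bytes. */
long nd_packets(long bcount)
{
    return (bcount + NDMAXDATA - 1) / NDMAXDATA;
}

/* Number of bursts, each of which ends in a wait for acknowledgement. */
long nd_bursts(long bcount)
{
    long burst = (long)NDMAXPACKS * NDMAXDATA;   /* 6144 bytes */
    return (bcount + burst - 1) / burst;
}
```

Under these assumptions a 4K request takes 4 packets in a single burst, while a maximum-size NDMAXIO request takes 32 packets in 6 bursts.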
Requests received by the server are entered on an active list which is timed out and discarded if not completed within NDXTIMER seconds. Requests received by the server allocate a bcount size buffer directly in Multibus memory to minimize buffer copying. Contiguous DMA disk IO thus occurs in the same size chunks it would if requested from a local physical disk.
BOOTSTRAP
The SUN workstation has PROM code to perform a net boot using this driver. All of the boot files are obtained from public device 0 (/dev/ndp0) on the server with which the client is registered; this allows multiple servers to exist on the same net (even running different releases of kernel and boot software). If the station you are booting is not registered on any of the servers, you will have to specify the hex local net address (low three bytes) in the boot command string, e.g. ‘bnd(814)stand/diag’.
This boot performs exactly the same steps as a real disk boot: (1) user types ‘b’ to PROM monitor. (2) PROM loads blocks 1 through 15 of /dev/ndp0 (bootnd). (3) bootnd loads ‘/boot’. (4) /boot loads ‘/vmunix’.
Although this is more involved than it needs to be (the PROM could do all the work if we were running a file server), it uses the same protocol and driver as the rest of the net disk system.
SEE ALSO
BUGS
The size of the Multibus memory pool allocated by function mbbufall is critical. If this pool is too small, the server will not be able to allocate a disk buffer and retransmissions will occur. The nd driver could manage its own pool in main memory and try there if Multibus memory is unavailable. Disk servers should have 256K of Multibus memory; some early disk systems were shipped with only 128K and this is not enough.
If the client machine contains a disk controller, the workstation cannot be brought up in a completely diskless mode by the default PROM boot. This is because the current ‘swapgeneric’ code always defaults to disk if one is present. In this case the user must type the ‘-a’ or ‘-as’ switches to request ‘ask’ mode or ‘ask plus single-user’ mode. The boot command would then be ‘bnd vmunix -as’ (possibly followed by ‘nd(0,0,0)vmunix -as’.) When the kernel is finally loaded, the user specifies ‘nd0’ as the root device.
Sun System Release 0.3 — 18 April 1983