MX(4) — Pixar Programmer’s Manual
NAME
mx - Fast Disk device interface
SYNOPSIS
controller mxc0 at vme16d16 ? csr 0xa000 priority 2 vector mxnintr 0xd0 mxpintr 0xd1
disk mx0 at mxc0 drive 0
disk mx1 at mxc0 drive 1
disk mx2 at mxc0 drive 2
disk mx3 at mxc0 drive 3
controller mxc1 at vme16d16 ? csr 0xa100 priority 2 vector mxnintr 0xd2 mxpintr 0xd3
disk mx4 at mxc1 drive 0
disk mx5 at mxc1 drive 1
disk mx6 at mxc1 drive 2
disk mx7 at mxc1 drive 3
DESCRIPTION
In its simplest form, the mx device consists of a File Access Controller (FAC) and a set of four industry-standard disk drives. The set of four drives is known as a volume. From the host computer’s perspective, the volume appears as one large, fast disk drive; its constituent drives are not individually addressable.
The principal difference between the mx and conventional disk drive/controller arrangements is the high throughput of the mx (actual throughput rates achieved by the mx are governed by the speeds of the component disk drives and other factors). The mx achieves its high performance by disk striping, a technique that distributes each input/output operation over the four drives within a volume. One host CPU can be equipped with multiple FACs. Each FAC can be equipped with from 1 to 4 volumes (4, 8, 12, or 16 drives).
Access to mx disks is available only through the UNIX character-special (raw) device interface. All I/O must be performed in multiples of 4096 bytes, beginning at offsets that are multiples of 4096 bytes. Traditional lseek/read/write access is supported; however, the FAC has the ability to support filesystems that are larger than 4 gigabytes, and such filesystems would not be completely addressable via the 32-bit offset supplied to lseek. To accommodate such filesystems, alternate read and write mechanisms are available via ioctl, and these are the preferred method for performing I/O.
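The alignment rule and the 32-bit offset limit described above can be illustrated with a short sketch. The names MX_BLOCK_SIZE, mx_io_aligned, and mx_byte_to_block are hypothetical helpers and are not part of mxioctl.h; the sketch assumes only the 4096-byte granularity stated above.

```c
#include <stdint.h>

/* Granularity required for all mx I/O (from the text above). */
#define MX_BLOCK_SIZE 4096u

/* Hypothetical helper: nonzero if both the starting offset and the
 * length are multiples of 4096 bytes. */
int mx_io_aligned(uint64_t offset, uint64_t length)
{
    return (offset % MX_BLOCK_SIZE) == 0 && (length % MX_BLOCK_SIZE) == 0;
}

/* Hypothetical helper: convert an aligned byte offset to a 4096-byte
 * block number. A 64-bit type is used so that offsets at or beyond
 * 4 gigabytes, which a 32-bit offset cannot express, still map to a
 * valid block number. */
uint64_t mx_byte_to_block(uint64_t offset)
{
    return offset / MX_BLOCK_SIZE;
}
```

For example, a byte offset of exactly 4 gigabytes corresponds to block 1048576, a value that fits comfortably in a block-number field even though the byte offset itself no longer fits in 32 bits.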
Arbitration of access to the FAC
One mx File Access Controller may be connected to up to six host CPUs. The FAC firmware permits only one of those CPUs to be in control of the FAC at any given moment. The CPU that has control of the FAC is said to have requested the FAC. All user operations (e.g., I/O) that require communication with the FAC must be bracketed with MXIOREQUEST and MXIORELINQUISH ioctl calls (the only exception to this rule is the read and write syscalls, which implicitly request and relinquish the FAC); failure to comply will elicit an EPERM error from the driver. It is legal to request the FAC, perform numerous operations, and then relinquish it; such usage must be tempered, however, by the restriction that while one host has the FAC requested, all other hosts must wait their turn. Proper etiquette would dictate, therefore, that a user process not keep the FAC requested any longer than necessary to ensure consistency of the data on disk. In a single-host environment, request and relinquish operations are still necessary, but they will always be honored immediately.
Volume partitioning
Each volume is identified by a number. Volumes 0 through 3 reside on the first FAC, volumes 4 through 7 on the second, and so on. Each volume is divided into two partitions, a and b. The a partition is 1 megabyte in size and is reserved for use by Pixar diagnostic utilities. The b partition comprises the remainder of the volume and contains a user filesystem. A third partition, c, encompasses the entire volume (i.e., the union of partitions a and b).
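The volume numbering above, together with the /dev/rmxn{abc} naming pattern listed under FILES, can be sketched as follows. The function names are hypothetical and exist only for illustration; the mapping of four volumes per FAC is taken from the text, and the resulting slot number agrees with the drive numbers shown in SYNOPSIS (for instance, mx4 is drive 0 on mxc1).

```c
#include <stdio.h>

#define MX_VOLUMES_PER_FAC 4   /* volumes 0-3 on the first FAC, 4-7 on the second */

/* Index of the FAC (0 = first) that holds the given volume. */
int mx_volume_to_fac(int volume)
{
    return volume / MX_VOLUMES_PER_FAC;
}

/* Position of the volume within its FAC (0 through 3). */
int mx_volume_slot(int volume)
{
    return volume % MX_VOLUMES_PER_FAC;
}

/* Build the raw device name for a volume and partition, following the
 * /dev/rmxn{abc} pattern. Returns 0 on success, or -1 if the arguments
 * are out of range or the buffer is too small. */
int mx_device_path(char *buf, size_t size, int volume, char part)
{
    int n;

    if (volume < 0 || part < 'a' || part > 'c')
        return -1;
    n = snprintf(buf, size, "/dev/rmx%d%c", volume, part);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With these helpers, volume 5 maps to the second FAC (index 1), slot 1, and its user partition would be named /dev/rmx5b.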
Ioctl functions
Programs employing these functions should include the files:
#include <sys/ioctl.h>
#include <pixardev/mxioctl.h>
MXIOREQUEST
Request the FAC. If the FAC is available, return is immediate. Otherwise, the argument is interpreted as an int containing the number of seconds to wait for the FAC to become available. If the FAC becomes available before the timeout period elapses, the call returns successfully; otherwise it fails with errno set to EBUSY. Note that, when reporting a "busy" condition, this call makes no distinction between another user on the same host possessing the FAC and a user on a different host possessing it; both are considered "busy". Upon successful return, the calling process has exclusive access to the FAC until it explicitly relinquishes it. MXIOREQUEST calls may be nested.
MXIORELINQUISH
Undo the effects of a prior MXIOREQUEST call. If multiple MXIOREQUEST calls have been nested, then a corresponding number of MXIORELINQUISH calls must be issued to relinquish the FAC. The last close call to an open mx device will undo all unbalanced MXIOREQUEST calls and relinquish the FAC.
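The nesting rules for MXIOREQUEST and MXIORELINQUISH can be pictured with a toy model. This is a sketch of the documented bookkeeping only, not the driver's implementation; every name in it is hypothetical, and the result of relinquishing a FAC that was never requested is an assumption, since the text does not specify it.

```c
/* Toy model of the nesting rules; NOT the driver. */
struct mx_fac_model {
    int depth;   /* number of unbalanced MXIOREQUEST calls */
};

/* Model of MXIOREQUEST: each successful call nests one level deeper. */
void mx_model_request(struct mx_fac_model *m)
{
    m->depth++;
}

/* Model of MXIORELINQUISH: undoes one prior request. Returns 0 on
 * success, or -1 if nothing was requested (assumed behavior). */
int mx_model_relinquish(struct mx_fac_model *m)
{
    if (m->depth == 0)
        return -1;
    m->depth--;
    return 0;
}

/* Model of the last close: all unbalanced requests are undone. */
void mx_model_last_close(struct mx_fac_model *m)
{
    m->depth = 0;
}

/* Nonzero while the FAC is still held by this process. */
int mx_model_held(const struct mx_fac_model *m)
{
    return m->depth > 0;
}
```

In this model, two requests followed by one relinquish leave the FAC held; only the second relinquish, or the last close, releases it to other hosts.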
MXIOGETCONFIG
Read configuration tables from the FAC. The argument is an mxconfigioctl structure (see mxioctl.h), of which the calling program must fill in the fields setnumber, tabnumber, startpos, and nbyts. Configuration information is returned in the first nbyts elements of the configbytes array. Refer to the STRATEGY 1 operation manual for more information about configuration information.
MXIOSETCONFIG
Write configuration tables in the FAC. The argument is an mxconfigioctl structure, of which the caller must fill in the fields setnumber, tabnumber, startpos, and nbyts. Also, the caller fills in the first nbyts elements of the configbytes array. This call should be used sparingly, as it writes to an EEPROM that deteriorates slightly upon each write access.
MXIOXREAD
Read from mx disk into user program memory. This call offers the combined features of lseek and read, along with full access to volumes that are larger than 4 gigabytes. The argument is an mxxio structure. The mxx_blkno field contains the 4096-byte block number of the first block to read. The mxx_buf field contains the address of the buffer in user memory. The mxx_nblocks field contains the number of 4096-byte blocks to read (the total number of bytes read is mxx_nblocks ∗ 4096). The mxx_flags field contains bit flags that modify the effects of the read operation. MXXF_HOLDADDR holds the memory address constant, causing all data to be delivered to the same 4-byte location in memory. This is typically used to transfer data into the Pixar HSI data window, which is mapped into user program memory via mmap(2). Large transfers that do not use MXXF_HOLDADDR are normally broken into 256 kilobyte segments by the driver, with no noticeable side effects except perhaps slightly slower performance. This segmentation is required by the UNIX kernel. MXXF_NOSEG inhibits the segmentation, possibly giving an increase in performance; however, its use circumvents the kernel’s protection mechanisms and can cause the operating system to crash, and is therefore not encouraged.
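The arithmetic behind the transfer size and the 256 kilobyte segmentation can be sketched as below. The helper names are hypothetical, and the sketch assumes that segments are contiguous runs of 64 blocks with a shorter final segment; the driver's actual segmentation may differ in detail.

```c
#include <stdint.h>

#define MX_BLOCK_SIZE  4096u
#define MX_SEG_BYTES   (256u * 1024u)                   /* 256 kilobytes */
#define MX_SEG_BLOCKS  (MX_SEG_BYTES / MX_BLOCK_SIZE)   /* 64 blocks */

/* Total bytes moved by a transfer of nblocks blocks (mxx_nblocks * 4096). */
uint64_t mx_transfer_bytes(uint32_t nblocks)
{
    return (uint64_t)nblocks * MX_BLOCK_SIZE;
}

/* Number of segments needed for the transfer, rounding up. */
uint32_t mx_segment_count(uint32_t nblocks)
{
    return nblocks / MX_SEG_BLOCKS + (nblocks % MX_SEG_BLOCKS != 0);
}

/* Number of blocks in segment i (0-based), or 0 if i is out of range. */
uint32_t mx_segment_blocks(uint32_t nblocks, uint32_t i)
{
    uint32_t left;

    if (i >= mx_segment_count(nblocks))
        return 0;
    left = nblocks - i * MX_SEG_BLOCKS;
    return left < MX_SEG_BLOCKS ? left : MX_SEG_BLOCKS;
}
```

Under these assumptions a 130-block transfer would be carried out as three segments of 64, 64, and 2 blocks.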
MXIOXWRITE
Write to disk from user program memory. The argument is an mxxio structure; its contents are the same as in MXIOXREAD.
MXIOFORMAT
Format one or more disk drives of a volume. The argument is an int that is furnished to the FAC as the dsel parameter of the Format command. See the STRATEGY 1 operation manual for more details.
MXIOSPECFLAW
Specify raw flaw information to the FAC. The argument is an mxflaw structure (see mxioctl.h), containing information regarding the position of the flaw. This call is used primarily when recovering from a hardware problem that erased the vendor-supplied flaw data recorded on the disk drives.
DKIOCINFO
Return controller information. See dkio(4S).
DKIOCGPART
Return partition starting block number (dkl_cylno) and size in 4096-byte blocks (dkl_nblk). See dkio(4S).
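A program that has obtained a partition's size in blocks might use it to check a transfer before issuing it. The following is a minimal sketch with hypothetical helper names; it assumes that the block count fits in 32 bits and that block numbers are relative to the start of the partition, neither of which is stated above.

```c
#include <stdint.h>

/* Size in bytes of a partition of nblk 4096-byte blocks (as reported
 * in dkl_nblk). The product is computed in 64 bits because it can
 * exceed 4 gigabytes. */
uint64_t mx_partition_bytes(uint32_t nblk)
{
    return (uint64_t)nblk * 4096u;
}

/* Nonzero if a transfer of nblocks blocks starting at block blkno lies
 * entirely inside a partition of part_nblk blocks. The comparison is
 * arranged so that blkno + nblocks is never computed and cannot
 * overflow. */
int mx_range_in_partition(uint32_t part_nblk, uint32_t blkno, uint32_t nblocks)
{
    return nblocks > 0 && blkno < part_nblk && nblocks <= part_nblk - blkno;
}
```

As a check on the arithmetic, a partition of 256 blocks is 1048576 bytes, which matches the 1 megabyte size given for the a partition if a megabyte is taken to be 2^20 bytes.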
DKIOCSPART
Set starting block number and size of partition. See dkio(4S).
Ioctl functions for reading asynchronously into HSI
These functions implement separate control over the two phases of reading data from mx disks, namely reading from the disk media into the FAC RAM, and DMA transfer of data from the FAC RAM to a VMEbus slave device (the Pixar HSI). Additionally, the FAC operations initiated by these functions occur in parallel with user program activity, hence the term “asynchronous.”
The functions described here are also illustrated in the file /usr/pixar/hsi/demo/mxasynch.c .
MXIODCREAD(n)
Initiate n operations to read blocks from disk into the RAM buffers in the FAC. The third argument to ioctl is the address of the first element of an array of struct mxiodcread. The array contains n elements (between 1 and 15). Each element specifies a starting partition block number and a count of 4K-byte blocks to read. These requests are submitted in order to the FAC, and control is returned immediately to the calling program, so that processing can continue in parallel with reading the disk media. As each read operation completes, the kernel delivers a signal to the calling process (see MXIOASIG). Note that this function causes no DMA activity on the VMEbus.
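Before submitting a batch of requests, a caller may wish to validate the element count and add up the blocks the batch will queue, so that the total can later be compared with the counts reported by the FAC. The sketch below uses a hypothetical stand-in for struct mxiodcread; the real field names and types are defined in mxioctl.h and may differ. The rejection of zero-block elements is likewise an assumption, not a documented driver rule.

```c
#include <stdint.h>

#define MX_DCREAD_MAX 15   /* the array holds between 1 and 15 elements */

/* Hypothetical stand-in for struct mxiodcread. */
struct example_dcread_req {
    uint32_t blkno;     /* starting partition block number */
    uint32_t nblocks;   /* count of 4K-byte blocks to read */
};

/* Validate a batch and sum the blocks it would queue in FAC RAM.
 * Returns 0 and stores the sum in *total, or -1 if the element count
 * is outside 1..15 or an element asks for zero blocks. */
int mx_dcread_batch_total(const struct example_dcread_req *reqs, int n,
                          uint64_t *total)
{
    uint64_t sum = 0;
    int i;

    if (n < 1 || n > MX_DCREAD_MAX)
        return -1;
    for (i = 0; i < n; i++) {
        if (reqs[i].nblocks == 0)
            return -1;
        sum += reqs[i].nblocks;
    }
    *total = sum;
    return 0;
}
```

A caller with more than 15 requests would need to split them across several MXIODCREAD calls, validating each batch in turn.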
MXIOCHREAD
Initiate DMA from FAC RAM to an HSI data window. This function is issued subsequent to one or more MXIODCREAD calls to read the data that was queued by those calls. The third argument to ioctl is the address of a union mxiochread specifying the virtual memory address of the HSI data window and the number of 4K-byte blocks to transfer. The user program must have previously mapped the HSI data window into user virtual space using mmap(2). The MXIOCHREAD function returns the number of blocks remaining queued in the FAC RAM (after deducting the number of blocks requested in this call) and a status code: ECH_SUCCESS (success, DMA was started), ECH_DMAINPROG (prior DMA still in progress, no action taken), or ECH_NOTENOUGH (number of requested blocks exceeds number of available blocks, no action taken). All data is read into the same VMEbus address, so only one page of the HSI data window needs to be mapped into the user program’s address space regardless of the size of the DMA transfer. No notification is given to the user program when the DMA is complete; however, the user program may query the status of the DMA with the MXIOCHQUERY function.
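The three outcomes of MXIOCHREAD can be illustrated with a toy model of the FAC RAM queue. This is a sketch for exposition, not the driver's logic: the EX_ names are hypothetical stand-ins for the ECH_ codes, the order in which the two failure conditions are tested is an assumption, and so is the choice to report the unchanged block count when no action is taken.

```c
/* Hypothetical stand-ins for ECH_SUCCESS, ECH_DMAINPROG, ECH_NOTENOUGH. */
enum ex_ch_status { EX_CH_SUCCESS, EX_CH_DMAINPROG, EX_CH_NOTENOUGH };

/* Toy model of the FAC RAM queue; NOT the driver. */
struct ex_fac_queue {
    unsigned ready_blocks;   /* blocks read from disk, awaiting DMA */
    int dma_in_progress;     /* nonzero while a DMA is running */
};

/* Model of MXIOCHREAD: try to start a DMA of nblocks blocks. On
 * return, *remaining holds the number of blocks still queued. */
enum ex_ch_status ex_chread(struct ex_fac_queue *q, unsigned nblocks,
                            unsigned *remaining)
{
    enum ex_ch_status st;

    if (q->dma_in_progress) {
        st = EX_CH_DMAINPROG;            /* no action taken */
    } else if (nblocks > q->ready_blocks) {
        st = EX_CH_NOTENOUGH;            /* no action taken */
    } else {
        q->ready_blocks -= nblocks;      /* deduct the requested blocks */
        q->dma_in_progress = 1;          /* DMA started */
        st = EX_CH_SUCCESS;
    }
    *remaining = q->ready_blocks;
    return st;
}

/* Model of DMA completion, which the caller would detect by polling. */
void ex_dma_done(struct ex_fac_queue *q)
{
    q->dma_in_progress = 0;
}
```

Starting from 10 queued blocks, a request for 4 succeeds and leaves 6; a second request made before the DMA finishes is refused, and a later request for 7 is refused because only 6 blocks remain.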
MXIOCHQUERY
Query the status of asynchronous queued transfers. The third argument to ioctl is the address of a union mxiochread wherein the kernel returns the number of 4K-byte blocks that are ready for DMA as a result of completed MXIODCREAD requests, and a status code: ECH_NODMAINPROG (no DMA in progress) or ECH_DMAINPROG (DMA in progress). This call initiates no FAC activity.
MXIOASIG
Request a UNIX signal upon the completion of MXIODCREAD and MXIOCHREAD events. The third argument to ioctl is the address of a struct mxioasig containing two fields. mxas_dcsig is a valid UNIX signal number between 1 and NSIG-1, specifying a signal to be sent when each part of an MXIODCREAD operation completes. Similarly, mxas_chsig specifies a signal to be sent when an MXIOCHREAD operation completes. A value of 0 in either field specifies that no signal is to be delivered. The effects of this ioctl function endure until the mx file descriptor is closed.
Upon any relinquish of the FAC or the initiation of a synchronous I/O request (e.g., MXIOXREAD), the driver will first pause to wait for any pending MXIODCREAD requests to complete, and then discard all unread data waiting in the FAC RAM.
FILES
/dev/rmxn{abc}    device special files
SEE ALSO
STRATEGY 1 File Access Controller Operation Manual
Maximum Strategy, Inc. San Jose, CA
DIAGNOSTICS
These diagnostic messages are typically indicative of failing hardware. If they persist, report them to Field Service.
mxc%d: %s error status %d/%d
A utility (not data transfer) command failed. The two numbers are Status_1 and Status_0, respectively.
mxc%d: %s timeout
The polled-status register handshake timed out.
mxc%d: spurious DMA interrupt
An unexpected DMA-done interrupt arrived. It is ignored.
mxc%d: DMA timeout DMACSR=%x
A DMA operation did not complete within the allotted time (5 seconds minimum).
mxc%d: VME error DMACSR=%x
The mx VMEbus interface reported a bus-related error.
mxc%d: spurious SP interrupt: %x ...
An unexpected status-pending interrupt arrived. Status bytes are displayed and the interrupt is ignored.
mx%d: {read,write} status %d/%d [explanation]
A Read or Write command completed with a non-zero status. The two numbers are Status_1 and Status_0, respectively. A negative Status_1 value implies failure. Positive Status_1 values indicate a qualified success, such as recovery from a correctable ECC error. A complete list of error codes appears in Appendix B of the Strategy 1 hardware manual.
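The sign convention for Status_1 can be summarized in a few lines. The names below are hypothetical, and treating a Status_1 of zero as a clean result is an assumption; the text above describes only the negative and positive cases.

```c
/* Hypothetical classification of a Status_1 value. */
enum ex_status_class {
    EX_STATUS_CLEAN,       /* zero: assumed to mean nothing to report */
    EX_STATUS_QUALIFIED,   /* positive: qualified success, e.g. a
                              corrected ECC error */
    EX_STATUS_FAILED       /* negative: the command failed */
};

/* Classify Status_1 according to its sign. */
enum ex_status_class ex_classify_status1(int status_1)
{
    if (status_1 < 0)
        return EX_STATUS_FAILED;
    if (status_1 > 0)
        return EX_STATUS_QUALIFIED;
    return EX_STATUS_CLEAN;
}
```

The specific meaning of each code still has to be looked up in the manual's error list; the sign tells only whether the transfer can be trusted.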
mxc%d: status not available
The polled-status register handshake timed out after completion of a Read or Write command.
mx%d: format timed out
A Format operation did not complete within the allotted time.
mx%d: format aborted by signal (%x)
A Format operation was aborted by a signal sent to the user process; the number in parentheses indicates the signal number (see /usr/include/signal.h). Among other things, typing CONTROL-C can cause this.
BUGS
If a FAC is present at autoconfigure time, UNIX will report that all volumes are present, regardless of whether this is indeed true.
No block-special access is implemented; thus this device is unsuitable for use as a mountable UNIX filesystem.
No attention is paid to cylinder boundaries. The number of blocks per cylinder is 1 as far as the driver is concerned. Due to the flaw-mapping technique employed by the FAC, it is nontrivial to determine where cylinder boundaries lie.
In certain rare error cases, the FAC firmware and the UNIX driver can lose synchronization. Pressing the RESET button on the front panel of the FAC often relieves this condition.
The driver maintains no iostat information.
Release β — Last change: 2/23/89