bpf(4) DG/UX R4.11MU05 bpf(4)
NAME
bpf - Berkeley Packet Filter
SYNOPSIS
bpf0
DESCRIPTION
The Berkeley Packet Filter provides a raw interface to data link
layers in a protocol-independent fashion. All packets on the
network, even those destined for other hosts, are accessible through
this interface.
The packet filter appears as a character special clonable device,
/dev/bpf0. After opening this file, the file descriptor must be
bound to a specific interface with the BIOCSETIF or BIOCSETIF2
ioctls. An interface can be bound to more than one file
descriptor, and the filter underlying each descriptor will see an
identical packet stream. If /dev/bpf0 does not exist, build a new
kernel with the bpf() entry in the system file and reboot the
system.
A user-settable packet filter is associated with each open instance
of bpf0. Whenever an interface receives a packet, all file
descriptors listening on that interface apply their filter. Each
descriptor that accepts the packet receives its own copy.
Reads from these file descriptors return the next group of packets
that have matched the filter. To improve performance, the buffer
passed to read must be the same size as the buffers used internally
by bpf. A user application can get/set the size of this buffer with
the BIOCGBLEN/BIOCSBLEN ioctl.
The packet filter supports the following link level protocols:
Ethernet, SLIP, FDDI, and Token Ring. The packet filter also
supports attaching at the bottom and top of IP; that is, ipbottom
and iptop, respectively.
Since packet data is in network byte order, applications should use
the byteorder(3N) macros to extract multi-byte values.
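As a minimal sketch of the byte-order advice above, the helper below
reads the 16-bit type field of a raw Ethernet frame. The function name
ether_type_of and the fixed offset of 12 bytes are illustrative
assumptions (the offset applies to Ethernet framing only); ntohs() is
the standard byteorder routine.

```c
#include <stddef.h>
#include <string.h>
#include <arpa/inet.h>   /* ntohs(); some systems use <netinet/in.h> */

/*
 * Hypothetical helper: return the Ethernet type field of a frame in
 * host byte order, or 0 if the frame is too short to contain one.
 * The type field occupies bytes 12 and 13 of an Ethernet header.
 */
static unsigned short
ether_type_of(const unsigned char *frame, size_t len)
{
    unsigned short net_type;

    if (len < 14)
        return 0;
    /* memcpy avoids an unaligned 16-bit load from packet data. */
    memcpy(&net_type, frame + 12, sizeof(net_type));
    return ntohs(net_type);
}
```

Copying the bytes out before converting keeps the helper safe on
alignment-sensitive machines, where a direct cast of the packet pointer
could fault.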
Ioctls
The ioctl command codes below are defined in <net/bpf.h>. All
commands require these includes:
#include <sys/types.h>
#include <sys/time.h>
#include <sys/ioctl.h>
#include <net/bpf.h>
Additionally, BIOCGETIF, BIOCSETIF, and BIOCSETIF2 require
<net/if.h>.
In addition to FIONREAD and SIOCGIFADDR, the following commands may
be applied to any open instance of bpf0. The third argument to the
ioctl should be a pointer to the type indicated.
BIOCGBLEN (uint)
Returns the required buffer length for reads.
BIOCSBLEN (uint)
Sets the buffer length (in bytes) for reads. If the
requested buffer size cannot be accommodated, the closest
allowable size will be set and returned in the argument. A
read call will result in EIO if it is passed a buffer that
is not this size. Note that an individual packet larger
than this size is necessarily truncated.
BIOCGDLT (uint)
Returns the type of the data link layer underlying the
attached interface. EINVAL is returned if no interface has
been specified. The device types are defined in
<net/bpf.h>.
BIOCPROMISC
Forces the interface into promiscuous mode. All packets,
not just those destined for the local host, are processed.
Since more than one file can be listening on a given
interface, a listener that opened its interface non-
promiscuously may receive packets promiscuously. This
problem can be remedied with an appropriate filter.
The interface remains in promiscuous mode until all file
instances listening promiscuously are closed.
If the interface does not have a promiscuous mode, this
ioctl has no effect.
You must attach to an interface via the BIOCSETIF or
BIOCSETIF2 ioctl before issuing the BIOCPROMISC ioctl.
BIOCFLUSH
Flushes the buffer of incoming packets and resets the
statistics that are returned by BIOCGSTATS.
BIOCGETIFLIST (struct bpfiflist)
Returns a list of the interfaces which can be attached to
via the BIOCSETIF or BIOCSETIF2 ioctl. Upon entry, the
bifllen field equals the size (in bytes) of the buffer
pointed to by the biflbuf field. Upon return, the
bifllen field equals the size (in bytes) of a buffer
required to fully accommodate the interface list; if the
interface list is larger than the buffer pointed to by the
biflbuf field, only the number of elements which can fully
fit into the buffer are returned. If the biflversion
field equals BPF_IF_VERSION1, each element of the interface
list is defined by struct bpfif.
BIOCGETIF (struct ifreq)
Returns the name of the interface that was attached to by
the BIOCSETIF or BIOCSETIF2 ioctl. The name is returned in
the ifr_name field of ifreq. All other fields are
undefined.
BIOCSETIF (struct ifreq)
BIOCSETIF2 (dev_t)
Sets the interface associated with the file descriptor and
performs the actions of BIOCFLUSH. One of these ioctls
must be performed before any packets can be read. With
BIOCSETIF, indicate the device name in the ifr_name field of
ifreq; the device name is a simple file name, not the
complete path (e.g., cien0). With BIOCSETIF2, use the device
number to indicate the device; the device number of a
device is returned by the stat system call in the st_rdev
field of struct stat.
BIOCGRTIMEOUT (struct timeval)
BIOCSRTIMEOUT (struct timeval)
Gets or sets the read timeout parameter. The value of
timeval specifies the maximum length of time the kernel
will wait before delivering any buffered packets to a process
that is blocked in a read of a bpf file descriptor. This
parameter is initialized to zero by open(2), indicating no
timeout.
BIOCGSTATS (struct bpf_stat)
Returns the following structure of packet statistics:
struct bpf_stat {
    uint bs_recv;
    uint bs_drop;
};
The fields are:
bs_recv  the number of packets received by the
         descriptor since opened or reset (including
         any buffered since the last read call); this
         includes packets which are rejected as well
         as those which are accepted by the filter
         program.
bs_drop  the number of packets accepted by the filter
         program but dropped by the kernel because of
         buffer overflows (i.e., the application's
         reads aren't keeping up with the packet
         traffic).
BIOCIMMEDIATE (uint)
Enables or disables "immediate mode," based on the truth
value of the argument. When immediate mode is enabled,
reads return immediately upon packet reception. This is
useful for programs that must respond to messages in real
time. Initially, an open instance of bpf0 has immediate
mode disabled, which means that reads block until either
the kernel buffer becomes full or a timeout occurs and data
must be read.
BIOCGMAXMEM (long)
BIOCSMAXMEM (long)
Gets or sets the maximum number of scratch memory locations
available for use by the filter program. Each location is
4 bytes.
See the BIOCSETF ioctl for how to set the filter program.
BIOCGHOSTTBL (bpfhosttablet)
Returns a copy of the host table, which is maintained from
packets seen on the associated interface. The host table
can currently be maintained only if the interface is
Ethernet; the filter program must also contain an
instruction that causes the host table statistics to be
kept (see BPF_MISC+BPF_ROUTINES).
The hdr.numtableeles field must be set to the number of
table elements in the buffer pointed to by the tableptr
field. Upon return, the hdr.numtableeles field is set to
the number of host table elements. (The number actually
returned is the smaller of the current number of elements
and the number of elements in the buffer.) Also upon
return, the hdr.maxtableeles field is set to the maximum
number of elements that can be in the host table. A user
application can get/set this value with the
BIOCGHOSTTBLSIZE/BIOCSHOSTTBLSIZE ioctls.
BIOCGMATRIXTBL (bpfmatrixtablet)
Returns a copy of the matrix table, which is maintained
from packets seen on the associated interface. The matrix
table can currently be maintained only if the interface is
Ethernet; the filter program must also contain an
instruction that causes the matrix table statistics to be
kept (see BPF_MISC+BPF_ROUTINES).
The hdr.numtableeles field must be set to the number of
table elements in the buffer pointed to by the tableptr
field. Upon return, the hdr.numtableeles field is set to
the number of matrix table elements. (The number actually
returned is the smaller of the current number of elements
and the number of elements in the buffer.) Also upon
return, the hdr.maxtableeles field is set to the maximum
number of elements that can be in the matrix table. A user
application can get/set this value with the
BIOCGMATRIXTBLSIZE/BIOCSMATRIXTBLSIZE ioctls.
BIOCGHOSTTBLSIZE (unsigned int)
BIOCSHOSTTBLSIZE (unsigned int)
Gets or sets the maximum number of elements that the kernel
will store in this host table before it begins to drop
elements.
BIOCGMATRIXTBLSIZE (unsigned int)
BIOCSMATRIXTBLSIZE (unsigned int)
Gets or sets the maximum number of elements that the kernel
will store in this matrix table before it begins to drop
elements.
BIOCSETF (struct bpf_program)
Sets the filter program and performs the actions of
BIOCFLUSH. An array of instructions and its length are
passed in using the following structure:
struct bpf_program {
    int bf_len;
    struct bpf_insn *bf_insns;
};
The fields are:
bf_insns  points to the filter program
bf_len    is the length of the filter program, in units of
          struct bpf_insn
The FILTER MACHINE section explains the filter language.
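Several of the commands above take a structure by address. As one small
illustration, the read timeout commands expect a struct timeval. The
sketch below converts a millisecond count into that form; the helper
name ms_to_timeval is hypothetical and is not part of the bpf interface.

```c
#include <sys/time.h>

/*
 * Hypothetical helper: express a timeout given in milliseconds as a
 * struct timeval. A result of {0, 0} corresponds to the default
 * described above (no timeout).
 */
static struct timeval
ms_to_timeval(long ms)
{
    struct timeval tv;

    tv.tv_sec = ms / 1000;            /* whole seconds */
    tv.tv_usec = (ms % 1000) * 1000;  /* remainder, in microseconds */
    return tv;
}
```

The result would be passed by address as the third argument of the
BIOCSRTIMEOUT ioctl on an open bpf descriptor.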
BPF Header
The following structure is prepended to each packet returned by
read(2):
struct bpf_hdr {
    struct timeval bh_tstamp;
    ulong bh_caplen;
    ulong bh_datalen;
    ushort bh_hdrlen;
};
The fields, whose values are stored in host order, are:
bh_tstamp   The time the packet was processed by the packet
            filter.
bh_caplen   The length of the captured portion of the packet.
            This is the minimum of the truncation amount specified
            by the filter and the length of the packet.
bh_datalen  The length of the packet off the wire. This value is
            independent of the truncation amount specified by the
            filter.
bh_hdrlen   The length of the BPF header, which may not be equal
            to sizeof(struct bpf_hdr).
The bh_hdrlen field accounts for padding between the bpf_hdr
structure and the lowest level protocol header. This provides proper
alignment of the packet data structures, which is required on
alignment-sensitive architectures and improves performance on many
other architectures.
Additionally, individual packets are padded so that each starts on a
word boundary. This requires an application to know how to get from
packet to packet. The macro BPF_WORDALIGN, defined in <net/bpf.h>,
rounds up its argument to the nearest word-aligned value (where a
word is BPF_ALIGNMENT bytes wide).
For example, if p is a struct bpf_hdr pointer to the start of a
packet, this expression advances it to the next packet:
p = (struct bpf_hdr *)((char *)p + BPF_WORDALIGN(p->bh_hdrlen + p->bh_caplen));
For the alignment mechanisms to work properly, the buffer passed to
read(2) must itself be word aligned. malloc(3) always returns an
aligned buffer.
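The stepping rule above can be sketched as a loop over one read
buffer. So that the sketch compiles without <net/bpf.h>, it declares
stand-ins: struct ex_hdr mirrors the header structure shown above, and
EX_ALIGNMENT and EX_WORDALIGN are assumed equivalents of the alignment
macros (the real word size may differ). Real code should use the
definitions from <net/bpf.h>.

```c
#include <stddef.h>
#include <string.h>
#include <sys/time.h>

/* Stand-ins for the definitions in <net/bpf.h> (assumptions). */
#define EX_ALIGNMENT    sizeof(long)
#define EX_WORDALIGN(x) (((x) + (EX_ALIGNMENT - 1)) & ~(EX_ALIGNMENT - 1))

struct ex_hdr {
    struct timeval tstamp;   /* time the packet was processed */
    unsigned long  caplen;   /* captured length */
    unsigned long  datalen;  /* length off the wire */
    unsigned short hdrlen;   /* header length including padding */
};

/*
 * Count the packets in buf[0..nread), where nread is the value
 * returned by read(2). Stops early if a record looks malformed.
 */
static int
count_packets(const char *buf, size_t nread)
{
    const char *p = buf;
    const char *end = buf + nread;
    int n = 0;

    while ((size_t)(end - p) >= sizeof(struct ex_hdr)) {
        struct ex_hdr h;
        size_t left = (size_t)(end - p);
        size_t step;

        memcpy(&h, p, sizeof(h));
        if (h.hdrlen < sizeof(h) || h.hdrlen + h.caplen > left)
            break;              /* inconsistent record */
        n++;
        /* Packet data starts at p + h.hdrlen and is h.caplen bytes. */
        step = EX_WORDALIGN(h.hdrlen + h.caplen);
        if (step >= left)
            break;              /* that was the last packet */
        p += step;
    }
    return n;
}
```

A caller would replace the n++ line with its own per-packet
processing. The bounds checks are a defensive choice of this sketch,
not something the interface requires.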
Filter Machine
A filter program is an array of instructions, with all branches
forwardly directed, terminated by a return instruction. Each
instruction performs some action on the pseudo-machine state, which
consists of an accumulator, index register, scratch memory, and
implicit program counter.
The following structure defines the instruction format:
struct bpf_insn {
    ushort code;
    uchar jt;
    uchar jf;
    long k;
};
The k field is used in different ways by different instructions, and
the jt and jf fields are used as offsets by the branch instructions.
The opcodes are encoded in a semi-hierarchical fashion. There are
eight classes of instructions: BPF_LD, BPF_LDX, BPF_ST, BPF_STX,
BPF_ALU, BPF_JMP, BPF_RET, and BPF_MISC. Various other mode and
operator bits are or'd into the class to give the actual
instructions. The classes and modes are defined in <net/bpf.h>.
The semantics for each defined BPF instruction are given below. A is
the accumulator, X is the index register, P[] packet data, and M[]
scratch memory. P[i:n] gives the data at byte offset i in the
packet, interpreted as a word (n=4), unsigned halfword (n=2), or
unsigned byte (n=1). M[i] gives the i'th word in scratch memory,
which is addressed only in word units. Scratch memory is indexed
from 0 to one less than the number of scratch memory locations (see
BIOCGMAXMEM).
k, jt, and jf are the corresponding fields in the instruction
definition. len refers to the length of the packet.
BPF_LD These instructions copy a value into the accumulator. The
type of the source operand is specified by an "addressing
mode" and can be a constant (BPF_IMM), packet data at a
fixed offset (BPF_ABS), packet data at a variable offset
(BPF_IND), the packet length (BPF_LEN), or a word in
scratch memory (BPF_MEM). For BPF_IND and BPF_ABS, the
data size must be specified as a word (BPF_W), halfword
(BPF_H), or byte (BPF_B). The semantics of all the
recognized BPF_LD instructions follow.
BPF_LD+BPF_W+BPF_ABS    A <- P[k:4]
BPF_LD+BPF_H+BPF_ABS    A <- P[k:2]
BPF_LD+BPF_B+BPF_ABS    A <- P[k:1]
BPF_LD+BPF_W+BPF_IND    A <- P[X+k:4]
BPF_LD+BPF_H+BPF_IND    A <- P[X+k:2]
BPF_LD+BPF_B+BPF_IND    A <- P[X+k:1]
BPF_LD+BPF_W+BPF_LEN    A <- len
BPF_LD+BPF_IMM          A <- k
BPF_LD+BPF_MEM          A <- M[k]
BPF_LDX These instructions load a value into the index register.
The addressing modes are more restricted than those of the
accumulator loads, but they include BPF_MSH, which
efficiently loads the IP header length.
BPF_LDX+BPF_W+BPF_IMM   X <- k
BPF_LDX+BPF_W+BPF_MEM   X <- M[k]
BPF_LDX+BPF_W+BPF_LEN   X <- len
BPF_LDX+BPF_B+BPF_MSH   X <- 4*(P[k:1]&0xf)
BPF_ST This instruction stores the accumulator into the scratch
memory. No addressing mode is needed since there is
only one possibility for the destination.
BPF_ST                  M[k] <- A
BPF_STX This instruction stores the index register into the scratch
memory.
BPF_STX                 M[k] <- X
BPF_ALU The ALU instructions perform operations between the
accumulator and index register or constant, and store the
result back in the accumulator. For binary operations, a
source mode is required (BPF_K or BPF_X).
BPF_ALU+BPF_ADD+BPF_K   A <- A + k
BPF_ALU+BPF_SUB+BPF_K   A <- A - k
BPF_ALU+BPF_MUL+BPF_K   A <- A * k
BPF_ALU+BPF_DIV+BPF_K   A <- A / k
BPF_ALU+BPF_AND+BPF_K   A <- A & k
BPF_ALU+BPF_OR+BPF_K    A <- A | k
BPF_ALU+BPF_LSH+BPF_K   A <- A << k
BPF_ALU+BPF_RSH+BPF_K   A <- A >> k
BPF_ALU+BPF_ADD+BPF_X   A <- A + X
BPF_ALU+BPF_SUB+BPF_X   A <- A - X
BPF_ALU+BPF_MUL+BPF_X   A <- A * X
BPF_ALU+BPF_DIV+BPF_X   A <- A / X
BPF_ALU+BPF_AND+BPF_X   A <- A & X
BPF_ALU+BPF_OR+BPF_X    A <- A | X
BPF_ALU+BPF_LSH+BPF_X   A <- A << X
BPF_ALU+BPF_RSH+BPF_X   A <- A >> X
BPF_ALU+BPF_NEG         A <- -A
BPF_JMP The jump instructions alter flow of control. Conditional
jumps compare the accumulator against a constant (BPF_K) or
the index register (BPF_X). If the result is true (non-zero), the
true branch is taken, otherwise the false branch is taken.
Jump offsets are encoded in 8 bits, so the longest jump is
256 instructions. However, the jump always (BPF_JA) opcode
uses the 32-bit k field as the offset, allowing arbitrarily
distant destinations. All conditionals use unsigned
comparison conventions.
BPF_JMP+BPF_JA           pc += k
BPF_JMP+BPF_JGT+BPF_K    pc += (A > k) ? jt : jf
BPF_JMP+BPF_JGE+BPF_K    pc += (A >= k) ? jt : jf
BPF_JMP+BPF_JEQ+BPF_K    pc += (A == k) ? jt : jf
BPF_JMP+BPF_JSET+BPF_K   pc += (A & k) ? jt : jf
BPF_JMP+BPF_JGT+BPF_X    pc += (A > X) ? jt : jf
BPF_JMP+BPF_JGE+BPF_X    pc += (A >= X) ? jt : jf
BPF_JMP+BPF_JEQ+BPF_X    pc += (A == X) ? jt : jf
BPF_JMP+BPF_JSET+BPF_X   pc += (A & X) ? jt : jf
BPF_RET The return instructions terminate the filter program and
specify the amount of packet to accept (i.e., they return
the truncation amount). A return value of zero indicates
that the packet should be ignored. The return value is
either a constant (BPF_K) or the accumulator (BPF_A).
BPF_RET+BPF_A           accept A bytes
BPF_RET+BPF_K           accept k bytes
BPF_MISC The miscellaneous category includes instructions that don't
fit into the above classes and new instructions that need
to be added. Currently, these are the register transfer
instructions, which copy the index register to the
accumulator and vice versa, and an instruction that
contains an index into a table of routines to perform
specific kernel processing on each packet.
BPF_MISC+BPF_TAX        X <- A
BPF_MISC+BPF_TXA        A <- X
BPF_MISC+BPF_ROUTINES   Call the k'th routine. k can have two
                        values: BPF_HOST_TABLE_ROUTINE, which
                        calls the routine to keep host table
                        statistics; BPF_MATRIX_TABLE_ROUTINE,
                        which calls the routine to keep matrix
                        table statistics.
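To make the semantics above concrete, here is a minimal user-space
sketch of a few instructions: absolute loads, three conditional jumps,
and the two returns. It is not the kernel's implementation. The
EX_ opcode names and values, struct ex_insn, and the 32-bit register
width are assumptions made so that the sketch is self-contained; real
filters use the encodings from <net/bpf.h>.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins, not the values from <net/bpf.h>. */
enum {
    EX_LD_W_ABS, EX_LD_H_ABS, EX_LD_B_ABS,
    EX_JEQ_K, EX_JGT_K, EX_JSET_K,
    EX_RET_K, EX_RET_A
};

struct ex_insn {
    unsigned short code;
    unsigned char  jt;
    unsigned char  jf;
    uint32_t       k;
};

/*
 * Run prog[0..n) over pkt[0..len) and return the number of bytes to
 * accept (0 means ignore the packet). In this sketch a load outside
 * the packet, an unknown opcode, or running off the end returns 0.
 */
static uint32_t
ex_filter(const struct ex_insn *prog, size_t n,
    const unsigned char *pkt, size_t len)
{
    uint32_t A = 0;     /* accumulator; comparisons are unsigned */
    size_t pc = 0;

    while (pc < n) {
        /* After the fetch, pc names the next instruction, so jump
         * offsets are relative to the instruction that follows. */
        const struct ex_insn *in = &prog[pc++];
        size_t k = in->k, w;

        switch (in->code) {
        case EX_LD_W_ABS:
        case EX_LD_H_ABS:
        case EX_LD_B_ABS:
            w = (in->code == EX_LD_W_ABS) ? 4 :
                (in->code == EX_LD_H_ABS) ? 2 : 1;
            if (k > len || len - k < w)
                return 0;
            /* Assemble P[k:w] from network-order bytes. */
            for (A = 0; w > 0; w--)
                A = (A << 8) | pkt[k++];
            break;
        case EX_JEQ_K:  pc += (A == in->k) ? in->jt : in->jf; break;
        case EX_JGT_K:  pc += (A > in->k) ? in->jt : in->jf; break;
        case EX_JSET_K: pc += (A & in->k) ? in->jt : in->jf; break;
        case EX_RET_K:  return in->k;
        case EX_RET_A:  return A;
        default:        return 0;
        }
    }
    return 0;
}
```

A four-instruction program (load the halfword at offset 12, compare it
with a constant, return a length, return 0) is enough to exercise both
branches of a conditional jump.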
The BPF interface provides the following macros to facilitate array
initializers:
BPF_STMT(opcode, operand)
BPF_JUMP(opcode, operand, true_offset, false_offset)
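The sketch below shows how initializer macros of this shape can fill an
instruction array. The expansions shown are an assumption about how
such macros are typically written, and the EX_ names, struct ex_insn,
and opcode values are placeholders chosen only so that the sketch
compiles on its own. Real programs use the macros and opcodes from
<net/bpf.h>.

```c
#include <stddef.h>

/* Stand-in for the instruction structure shown earlier. */
struct ex_insn {
    unsigned short code;
    unsigned char  jt;
    unsigned char  jf;
    long           k;
};

/* Assumed expansions: a statement has no branch offsets. */
#define EX_STMT(code, k)         { (unsigned short)(code), 0, 0, (k) }
#define EX_JUMP(code, k, jt, jf) { (unsigned short)(code), (jt), (jf), (k) }

/* Placeholder opcode values (not those of <net/bpf.h>). */
#define EX_LD_H_ABS 1
#define EX_JEQ_K    2
#define EX_RET_K    3

/* Accept 96 bytes if the halfword at offset 12 is 0x0800. */
static const struct ex_insn ex_prog[] = {
    EX_STMT(EX_LD_H_ABS, 12),
    EX_JUMP(EX_JEQ_K, 0x0800, 0, 1),
    EX_STMT(EX_RET_K, 96),
    EX_STMT(EX_RET_K, 0),
};

/* Instruction count: the value that belongs in the length field
 * of the program structure passed to BIOCSETF. */
static const size_t ex_prog_len = sizeof(ex_prog) / sizeof(ex_prog[0]);
```

Computing the length with sizeof keeps it in step with the array when
instructions are added or removed.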
EXAMPLES
This filter accepts only Reverse ARP requests.
struct bpf_insn insns[] = {
    BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),    /* Ethernet type */
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
    BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),    /* ARP operation */
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
    BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
        sizeof(struct ether_header)),      /* accept */
    BPF_STMT(BPF_RET+BPF_K, 0),            /* reject */
};
This filter accepts only IP packets between host 128.3.112.15 and
128.3.112.35.
struct bpf_insn insns[] = {
    BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),    /* Ethernet type */
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 8),
    BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 26),    /* IP source (32-bit word) */
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 2),
    BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),    /* IP destination */
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 3, 4),
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 0, 3),
    BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),    /* IP destination */
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 1),
    BPF_STMT(BPF_RET+BPF_K, (uint)-1),     /* accept whole packet */
    BPF_STMT(BPF_RET+BPF_K, 0),            /* reject */
};
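The constants 0x8003700f and 0x80037023 in the filter above are the two
host addresses written as 32-bit values, most significant octet first.
Offsets 26 and 30 are the IP source and destination address fields when
a 14-byte Ethernet header precedes the IP header. The helper below
(ipv4_const, a hypothetical name) shows the arithmetic.

```c
#include <stdint.h>

/*
 * Hypothetical helper: build the 32-bit constant to compare against an
 * IPv4 address field, from the four dotted-decimal octets a.b.c.d.
 */
static uint32_t
ipv4_const(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return ((uint32_t)(a & 0xff) << 24) | ((uint32_t)(b & 0xff) << 16) |
        ((uint32_t)(c & 0xff) << 8) | (uint32_t)(d & 0xff);
}
```

For example, 128.3.112.15 gives 0x80, 0x03, 0x70, 0x0f, which
concatenate to 0x8003700f. Because the constant is 32 bits wide, the
address must be loaded as a word for the comparison to be able to
succeed.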
This filter returns only TCP finger packets. We must parse the IP
header to reach the TCP header. The BPF_JSET instruction checks that
the IP fragment offset is 0 so we are sure that we have a TCP header.
struct bpf_insn insns[] = {
    BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),    /* Ethernet type */
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10),
    BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),    /* IP protocol */
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8),
    BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),    /* flags and fragment offset */
    BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0),
    BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),   /* X <- IP header length */
    BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14),    /* TCP source port */
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0),
    BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),    /* TCP destination port */
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1),
    BPF_STMT(BPF_RET+BPF_K, (uint)-1),     /* accept whole packet */
    BPF_STMT(BPF_RET+BPF_K, 0),            /* reject */
};
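The index-register load at offset 14 in the filter above computes the
IP header length, so that the indexed loads at X+14 and X+16 reach the
TCP source and destination ports whatever the size of the IP options.
The function below is a small model of that one load, following the
formula X <- 4*(P[k:1]&0xf). The name msh_load and the -1 return for an
out-of-range offset are choices of this sketch, not part of the
interface.

```c
#include <stddef.h>

/*
 * Hypothetical model of the header-length load.
 * The low nibble of the first IP header byte is the header length in
 * 32-bit words, so multiplying by 4 gives the length in bytes.
 * Returns -1 if offset k lies outside the packet.
 */
static long
msh_load(const unsigned char *pkt, size_t len, size_t k)
{
    if (k >= len)
        return -1;
    return 4L * (pkt[k] & 0xf);
}
```

A first header byte of 0x45 (IP version 4, five words) yields 20, which
places the TCP source port at byte 14 + 20 = 34 of the frame.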
FILES
/dev/bpf0
SEE ALSO
tcpdump(1).
Licensed material--property of copyright holder(s)