bpf(4) — 386BSD 1.0



BPF(4)                         1991                        BPF(4)


NAME
       bpf - Berkeley Packet Filter

SYNOPSIS
       pseudo-device bpfilter 16

DESCRIPTION
       The  Berkeley  Packet  Filter  provides a raw interface to
       data link layers in a protocol independent  fashion.   All
       packets  on  the  network,  even  those destined for other
       hosts, are accessible through this mechanism.

       The packet filter appears as a character  special  device,
       /dev/bpf0,  /dev/bpf1, etc.  After opening the device, the
       file descriptor  must  be  bound  to  a  specific  network
       interface with the BIOCSETIF ioctl.  A given interface can
       be shared by multiple listeners, and the filter underlying
       each  descriptor will see an identical packet stream.  The
       total number of open files is limited to the  value  given
       in  the  kernel  configuration;  the  example given in the
       SYNOPSIS above sets the limit to 16.

       A separate device file is required for each minor  device.
       If  a file is in use, the open will fail and errno will be
       set to EBUSY.
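
       A program that does not care which minor device it gets can
       simply try each one in turn.  This is a minimal sketch, not
       part of the bpf interface.  The helper name
       bpf_open_first_free, its path-prefix argument, and the policy
       of stopping at the first error other than EBUSY (typically
       ENOENT once the device files run out) are all assumptions.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

/* Try prefix0, prefix1, ... prefix(max-1) in turn and return the first
 * descriptor that opens.  A device already in use fails with EBUSY, so
 * it is skipped; any other error (e.g. ENOENT: no such device file)
 * ends the scan.  Returns -1 with errno set if nothing could be opened.
 * O_RDWR allows writes as well as reads on the descriptor. */
int bpf_open_first_free(const char *prefix, int max)
{
    char path[64];

    for (int i = 0; i < max; i++) {
        snprintf(path, sizeof path, "%s%d", prefix, i);
        int fd = open(path, O_RDWR);
        if (fd >= 0)
            return fd;
        if (errno != EBUSY)
            return -1;
    }
    errno = EBUSY;   /* every device that was tried is busy */
    return -1;
}
```

       For example, bpf_open_first_free("/dev/bpf", 16) would try
       /dev/bpf0 through /dev/bpf15, matching the limit set by the
       configuration line in SYNOPSIS.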

       Associated with each open instance of  a  bpf  file  is  a
       user-settable   packet   filter.   Whenever  a  packet  is
       received by an interface, all file  descriptors  listening
       on  that  interface  apply  their filter.  Each descriptor
       that accepts the packet receives its own copy.

       Reads from these files return the next  group  of  packets
       that have matched the filter.  To improve performance, the
       buffer passed to read must be the same size as the buffers
       used  internally  by  bpf.   This  size is returned by the
       BIOCGBLEN ioctl (see below), and under  BSD,  can  be  set
       with  BIOCSBLEN.   Note  that  an individual packet larger
       than this size is necessarily truncated.

       The packet filter will support  any  link  level  protocol
       that  has fixed length headers.  Currently, only Ethernet,
       SLIP and PPP drivers have been modified to  interact  with
       bpf.

       Since  packet  data is in network byte order, applications
       should use the byteorder(3n) macros to extract  multi-byte
       values.
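
       As a minimal sketch, assuming the ntohs and ntohl functions of
       the byteorder(3n) family (declared in <arpa/inet.h> on current
       systems), the helpers below extract 16- and 32-bit fields from
       packet data.  The names pkt_get16 and pkt_get32 are
       hypothetical.  Copying through memcpy avoids depending on the
       field being suitably aligned.

```c
#include <arpa/inet.h>   /* ntohs, ntohl */
#include <stdint.h>
#include <string.h>

/* Fetch the 16-bit field at byte offset off and convert it from
 * network to host byte order. */
uint16_t pkt_get16(const unsigned char *pkt, size_t off)
{
    uint16_t v;

    memcpy(&v, pkt + off, sizeof v);
    return ntohs(v);
}

/* Same for a 32-bit field. */
uint32_t pkt_get32(const unsigned char *pkt, size_t off)
{
    uint32_t v;

    memcpy(&v, pkt + off, sizeof v);
    return ntohl(v);
}
```

       For a frame whose Ethernet header starts at buf, pkt_get16(buf,
       12) yields the Ethernet type field in host order.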

       A  packet  can  be sent out on the network by writing to a
       bpf file descriptor.  The writes are  unbuffered,  meaning
       only  one  packet  can be processed per write.  Currently,
       only writes to Ethernets and SLIP links are supported.




May                             23                              1

IOCTLS
       The ioctl command codes below are defined in  <net/bpf.h>.
       All commands require these includes:

            #include <sys/types.h>
            #include <sys/time.h>
            #include <sys/ioctl.h>
            #include <net/bpf.h>

       Additionally,  BIOCGETIF and BIOCSETIF require <net/if.h>.

       In addition to FIONREAD  and  SIOCGIFADDR,  the  following
       commands may be applied to any open bpf file.  The (third)
       argument to the ioctl should be  a  pointer  to  the  type
       indicated.

       BIOCGBLEN (u_int)
                 Returns  the required buffer length for reads on
                 bpf files.

       BIOCSBLEN (u_int)
                 Sets the buffer length for reads on  bpf  files.
                 The  buffer  must  be  set  before  the  file is
                 attached to an interface with BIOCSETIF.  If the
                 requested buffer size cannot be accommodated, the
                 closest allowable size will be set and  returned
                 in the argument.  A read call will result in EIO
                 if it is passed a buffer that is not this  size.

       BIOCGDLT (u_int)
                 Returns   the   type  of  the  data  link  layer
                 underlying the attached  interface.   EINVAL  is
                 returned  if  no  interface  has been specified.
                 The device types, prefixed  with  ``DLT_'',  are
                 defined in <net/bpf.h>.

       BIOCPROMISC
                 Forces the interface into promiscuous mode.  All
                 packets, not just those destined for  the  local
                 host,  are  processed.  Since more than one file
                 can  be  listening  on  a  given  interface,   a
                 listener   that   opened   its   interface  non-
                 promiscuously may receive packets promiscuously.
                 This problem can be remedied with an appropriate
                 filter.

                 The interface remains in promiscuous mode  until
                 all files listening promiscuously are closed.

       BIOCFLUSH Flushes  the  buffer  of  incoming  packets, and
                 resets  the  statistics  that  are  returned  by
                 BIOCGSTATS.

       BIOCGETIF (struct ifreq)
                 Returns  the name of the hardware interface that
                 the file is listening on.  The name is  returned
                 in  the  ifr_name field of ifr.  All other fields
                 are undefined.

       BIOCSETIF (struct ifreq)
                 Sets the hardware interface associated with the
                 file.  This command must be performed before any
                 packets can be read.  The device is indicated by
                 name  using  the  ifr_name  field  of the ifreq.
                 Additionally, performs the actions of BIOCFLUSH.

       BIOCSRTIMEOUT, BIOCGRTIMEOUT (struct timeval)
                 Set  or  get  the  read  timeout parameter.  The
                 timeval specifies the length  of  time  to  wait
                 before  timing  out  on  a  read  request.  This
                 parameter is initialized  to  zero  by  open(2),
                 indicating no timeout.

       BIOCGSTATS (struct bpf_stat)
                 Returns   the   following  structure  of  packet
                 statistics:

                 struct bpf_stat {
                      u_int bs_recv;
                      u_int bs_drop;
                 };

                 The fields are:

                 bs_recv        the number of packets received by
                                the  descriptor  since  opened or
                                reset  (including  any   buffered
                                since the last read call); and

                 bs_drop        the  number of packets which were
                                accepted  by   the   filter   but
                                dropped  by the kernel because of
                                buffer   overflows   (i.e.,   the
                                application's     reads    aren't
                                keeping  up   with   the   packet
                                traffic).

       BIOCIMMEDIATE (u_int)
                 Enable  or  disable ``immediate mode'', based on
                 the truth value of the argument.  When immediate
                 mode  is  enabled, reads return immediately upon
                 packet reception.  Otherwise, a read will  block
                 until either the kernel buffer becomes full or a
                 timeout occurs.  This  is  useful  for  programs
                 like  rarpd(8c),  which must respond to messages
                 in real time.  The default for  a  new  file  is
                 off.

       BIOCSETF (struct bpf_program)
                 Sets  the  filter  program used by the kernel to
                 discard  uninteresting  packets.   An  array  of
                 instructions and its length are passed in using
                 the following structure:

                 struct bpf_program {
                      int bf_len;
                      struct bpf_insn *bf_insns;
                 };

                 The filter program is pointed to by the bf_insns
                 field  while  its  length  in  units  of `struct
                 bpf_insn' is given by the bf_len  field.   Also,
                 the actions of BIOCFLUSH are performed.

                 See section FILTER MACHINE for an explanation of
                 the filter language.
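
                 Because bf_len counts instructions rather than
                 bytes, it is safest to derive it from the array
                 itself.  The sketch below uses local stand-ins
                 for the <net/bpf.h> structures (real code would
                 include the header), and make_program is a
                 hypothetical helper.

```c
#include <stddef.h>

/* Local stand-ins mirroring the structures shown above; real code would
 * take them from <net/bpf.h>. */
struct bpf_insn { unsigned short code; unsigned char jt, jf; long k; };
struct bpf_program { int bf_len; struct bpf_insn *bf_insns; };

/* Describe the n-instruction array insns for BIOCSETF.  bf_len is an
 * instruction count, so callers pass sizeof insns / sizeof insns[0],
 * not sizeof insns. */
struct bpf_program make_program(struct bpf_insn *insns, size_t n)
{
    struct bpf_program prog;

    prog.bf_len = (int)n;
    prog.bf_insns = insns;
    return prog;
}
```

                 The result is then passed by address as the third
                 argument, as in ioctl(fd, BIOCSETF, &prog).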

       BIOCVERSION (struct bpf_version)
                 Returns the major and minor version  numbers  of
                 the filter language currently recognized by the
                 kernel.     Before    installing    a    filter,
                 applications must check that the current version
                 is compatible with the running kernel.   Version
                 numbers  are  compatible  if  the  major numbers
                 match and the application minor is less than  or
                 equal  to  the kernel minor.  The kernel version
                 number is returned in the following structure:

                 struct bpf_version {
                      u_short bv_major;
                      u_short bv_minor;
                 };

                 The  current  version  numbers  are   given   by
                 BPF_MAJOR_VERSION  and  BPF_MINOR_VERSION  from
                 <net/bpf.h>.  An incompatible filter may  result
                 in  undefined  behavior  (most  likely, an error
                 returned  by   ioctl()   or   haphazard   packet
                 matching).
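
                 The compatibility rule above reduces to a one-line
                 test.  In this sketch struct bpf_version is a local
                 stand-in for the header's definition and
                 bpf_version_ok is a hypothetical name.  The
                 application's own numbers would normally be
                 BPF_MAJOR_VERSION and BPF_MINOR_VERSION, and the
                 kernel's would come from BIOCVERSION.

```c
/* Local stand-in for struct bpf_version from <net/bpf.h>. */
struct bpf_version { unsigned short bv_major, bv_minor; };

/* Nonzero if a filter written for version app_major.app_minor may be
 * installed on a kernel that reported *kv: the major numbers must match
 * and the application's minor must not exceed the kernel's. */
int bpf_version_ok(const struct bpf_version *kv,
                   unsigned app_major, unsigned app_minor)
{
    return kv->bv_major == app_major && app_minor <= kv->bv_minor;
}
```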

BPF HEADER
       The  following  structure  is  prepended  to  each  packet
       returned by read(2):

               struct bpf_hdr {
                    struct timeval bh_tstamp;
                    u_long bh_caplen;
                    u_long bh_datalen;
                    u_short bh_hdrlen;
               };

       The fields, whose values are stored in host order, are:

       bh_tstamp      The  time at which the packet was processed
                      by the packet filter.

       bh_caplen      The length of the captured portion  of  the
                      packet.    This   is  the  minimum  of  the
                      truncation amount specified by  the  filter
                      and the length of the packet.

       bh_datalen     The  length  of  the  packet  off the wire.
                      This value is independent of the truncation
                      amount specified by the filter.

       bh_hdrlen      The length of the BPF header, which may not
                      be equal to sizeof(struct bpf_hdr).

       The bh_hdrlen field exists to account for padding  between
       the  header and the link level protocol.  The purpose here
       is to  guarantee  proper  alignment  of  the  packet  data
       structures,  which  is  required  on  alignment  sensitive
       architectures and improves performance on  many  other
       architectures.  The packet filter ensures that the bpf_hdr
       and  the  network  layer  header  will  be  word  aligned.
       Suitable precautions must be taken when accessing the link
       layer protocol fields on  alignment  restricted  machines.
       (This isn't a problem on an Ethernet, since the type field
       is a short falling on an even offset,  and  the  addresses
       are probably accessed in a bytewise fashion).

       Additionally,  individual  packets are padded so that each
       starts  on  a  word  boundary.   This  requires  that   an
       application  has  some knowledge of how to get from packet
       to  packet.   The  macro  BPF_WORDALIGN  is   defined   in
       <net/bpf.h>  to facilitate this process.  It rounds up its
       argument to the nearest word aligned value (where  a  word
       is BPF_ALIGNMENT bytes wide).

       For example, if `p' is a struct bpf_hdr pointer to the start
       of a packet, this expression will advance it to the next
       packet:

              p = (struct bpf_hdr *)((char *)p +
                  BPF_WORDALIGN(p->bh_hdrlen + p->bh_caplen));

       For the alignment mechanisms to work properly, the  buffer
       passed  to read(2) must itself be word aligned.  malloc(3)
       will always return an aligned buffer.
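
       Putting these pieces together, a loop over the packets returned
       by one read(2) might look like the following.  This is a sketch
       under stated assumptions: struct bpf_hdr is a local stand-in
       for the header's definition, BPF_ALIGNMENT is taken to be
       sizeof(long), and bpf_alloc_buffer and bpf_walk are
       hypothetical helpers.  Real code takes the structure and both
       macros from <net/bpf.h>.

```c
#include <stddef.h>
#include <stdlib.h>
#include <sys/time.h>

/* Local stand-ins; real code takes these from <net/bpf.h>.  Using
 * sizeof(long) for BPF_ALIGNMENT is an assumption of this sketch. */
struct bpf_hdr {
    struct timeval bh_tstamp;
    unsigned long  bh_caplen;
    unsigned long  bh_datalen;
    unsigned short bh_hdrlen;
};
#define BPF_ALIGNMENT    sizeof(long)
#define BPF_WORDALIGN(x) (((x) + (BPF_ALIGNMENT - 1)) & ~(BPF_ALIGNMENT - 1))

/* A zeroed read buffer of blen bytes (the BIOCGBLEN value); calloc,
 * like malloc(3), returns storage aligned for any type. */
char *bpf_alloc_buffer(size_t blen)
{
    return calloc(1, blen);
}

/* Visit each packet in buf[0..n), where n is the count read(2)
 * returned.  fn may be NULL.  Returns the number of packets seen. */
int bpf_walk(const char *buf, size_t n,
             void (*fn)(const struct bpf_hdr *h, const unsigned char *data))
{
    size_t off = 0;
    int count = 0;

    while (off < n) {
        const struct bpf_hdr *h = (const struct bpf_hdr *)(buf + off);
        size_t step = BPF_WORDALIGN(h->bh_hdrlen + h->bh_caplen);

        if (fn != NULL)
            /* Packet data begins bh_hdrlen bytes into the record. */
            fn(h, (const unsigned char *)h + h->bh_hdrlen);
        count++;
        if (step == 0)        /* corrupt record: avoid looping forever */
            break;
        off += step;
    }
    return count;
}
```

       Working with offsets rather than pointers keeps the loop from
       forming a pointer past the end of the data when the final
       record is not padded out to a word boundary.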

FILTER MACHINE
       A filter program is an array  of  instructions,  with  all
       branches   forwardly  directed,  terminated  by  a  return
       instruction.  Each instruction performs some action on the
       pseudo-machine  state,  which  consists of an accumulator,
       index register, scratch memory store, and implicit program
       counter.

       The following structure defines the instruction format:

              struct bpf_insn {
                   u_short  code;
                   u_char   jt;
                   u_char   jf;
                   long     k;
              };

       The  k  field  is  used  in  different  ways  by different
       instructions, and the jt and jf fields are used as
       offsets by the branch instructions.  The opcodes are
       encoded in a semi-hierarchical fashion.  There  are  eight
       classes of instructions: BPF_LD, BPF_LDX, BPF_ST, BPF_STX,
       BPF_ALU, BPF_JMP, BPF_RET, and  BPF_MISC.   Various  other
       mode and operator bits are or'd into the class to give the
       actual instructions.  The classes and modes are defined in
       <net/bpf.h>.

       Below  are the semantics for each defined BPF instruction.
       We use the convention that A is the accumulator, X is  the
       index  register,  P[]  packet data, and M[] scratch memory
       store.  P[i:n] gives the data at byte offset ``i'' in  the
       packet,  interpreted  as  a  word (n=4), unsigned halfword
       (n=2), or unsigned byte (n=1).  M[i] gives the  i'th  word
       in  the  scratch  memory store, which is only addressed in
       word units.   The  memory  store  is  indexed  from  0  to
       BPF_MEMWORDS-1.   k,  jt,  and  jf  are  the corresponding
       fields in the instruction definition.  ``len''  refers  to
       the length of the packet.


       BPF_LD    These   instructions   copy  a  value  into  the
                 accumulator.  The type of the source operand  is
                 specified by an ``addressing mode'' and can be a
                 constant  (BPF_IMM),  packet  data  at  a  fixed
                 offset  (BPF_ABS),  packet  data  at  a variable
                 offset (BPF_IND), the packet  length  (BPF_LEN),
                 or a word in the scratch memory store (BPF_MEM).
                 For BPF_IND and BPF_ABS, the data size  must  be
                 specified  as  a word (BPF_W), halfword (BPF_H),
                 or byte  (BPF_B).   The  semantics  of  all  the
                 recognized BPF_LD instructions follow.


                 BPF_LD+BPF_W+BPF_ABS       A <- P[k:4]

                 BPF_LD+BPF_H+BPF_ABS       A <- P[k:2]

                 BPF_LD+BPF_B+BPF_ABS       A <- P[k:1]

                 BPF_LD+BPF_W+BPF_IND       A <- P[X+k:4]

                 BPF_LD+BPF_H+BPF_IND       A <- P[X+k:2]

                 BPF_LD+BPF_B+BPF_IND       A <- P[X+k:1]

                 BPF_LD+BPF_W+BPF_LEN       A <- len

                 BPF_LD+BPF_IMM             A <- k

                 BPF_LD+BPF_MEM             A <- M[k]


       BPF_LDX   These  instructions  load a value into the index
                 register.  Note that the  addressing  modes  are
                 more restricted than those of the accumulator
                 loads, but they  include  BPF_MSH,  a  hack  for
                 efficiently loading the IP header length.

                 BPF_LDX+BPF_W+BPF_IMM      X <- k

                 BPF_LDX+BPF_W+BPF_MEM      X <- M[k]

                 BPF_LDX+BPF_W+BPF_LEN      X <- len

                 BPF_LDX+BPF_B+BPF_MSH      X <- 4*(P[k:1]&0xf)
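
                 To see what BPF_MSH computes, take an IPv4 header
                 that begins right after a 14-byte Ethernet header,
                 so k = 14.  Its first byte holds the IP version in
                 the high nibble and the header length, counted in
                 32-bit words, in the low nibble, so the common
                 value 0x45 gives X = 4*5 = 20.  The plain-C
                 equivalent below is a hypothetical helper; returning
                 0 for an out-of-range offset is a choice made for
                 this sketch.

```c
#include <stddef.h>

/* Plain-C equivalent of BPF_LDX+BPF_B+BPF_MSH with operand k: four
 * times the low nibble of the byte at offset k, i.e. the IPv4 header
 * length in bytes.  Returns 0 if k lies outside pkt[0..len). */
unsigned ldx_msh(const unsigned char *pkt, size_t len, size_t k)
{
    if (k >= len)
        return 0;
    return 4u * (pkt[k] & 0xfu);
}
```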


       BPF_ST    This instruction stores the accumulator into the
                 scratch  memory.   We  do not need an addressing
                 mode since there is only one possibility for the
                 destination.

                 BPF_ST                     M[k] <- A


       BPF_STX   This  instruction  stores  the index register in
                 the scratch memory store.

                 BPF_STX                    M[k] <- X


       BPF_ALU   The ALU instructions perform operations  between
                 the  accumulator and index register or constant,
                 and store the result back  in  the  accumulator.
                 For binary operations, a source mode is required
                 (BPF_K or BPF_X).

                 BPF_ALU+BPF_ADD+BPF_K      A <- A + k

                 BPF_ALU+BPF_SUB+BPF_K      A <- A - k

                 BPF_ALU+BPF_MUL+BPF_K      A <- A * k

                 BPF_ALU+BPF_DIV+BPF_K      A <- A / k

                 BPF_ALU+BPF_AND+BPF_K      A <- A & k

                 BPF_ALU+BPF_OR+BPF_K       A <- A | k

                 BPF_ALU+BPF_LSH+BPF_K      A <- A << k

                 BPF_ALU+BPF_RSH+BPF_K      A <- A >> k

                 BPF_ALU+BPF_ADD+BPF_X      A <- A + X

                 BPF_ALU+BPF_SUB+BPF_X      A <- A - X

                 BPF_ALU+BPF_MUL+BPF_X      A <- A * X

                 BPF_ALU+BPF_DIV+BPF_X      A <- A / X

                 BPF_ALU+BPF_AND+BPF_X      A <- A & X

                 BPF_ALU+BPF_OR+BPF_X       A <- A | X

                 BPF_ALU+BPF_LSH+BPF_X      A <- A << X

                 BPF_ALU+BPF_RSH+BPF_X      A <- A >> X

                 BPF_ALU+BPF_NEG            A <- -A


       BPF_JMP   The jump instructions  alter  flow  of  control.
                 Conditional   jumps   compare   the  accumulator
                 against a constant (BPF_K) or the index register
                 (BPF_X).  If the result is true (or non-zero),
                 the true branch is taken,  otherwise  the  false
                 branch  is taken.  Jump offsets are encoded in 8
                 bits so the longest jump  is  256  instructions.
                 However,  the  jump  always (BPF_JA) opcode uses
                 the 32 bit  k  field  as  the  offset,  allowing
                 arbitrarily     distant    destinations.     All
                 conditionals     use     unsigned     comparison
                 conventions.  Offsets are relative to the
                 instruction that follows the jump, so an offset
                 of 0 continues with the next instruction.

                 BPF_JMP+BPF_JA             pc += k

                 BPF_JMP+BPF_JGT+BPF_K      pc += (A > k) ? jt : jf

                 BPF_JMP+BPF_JGE+BPF_K      pc += (A >= k) ? jt : jf

                 BPF_JMP+BPF_JEQ+BPF_K      pc += (A == k) ? jt : jf

                 BPF_JMP+BPF_JSET+BPF_K     pc += (A & k) ? jt : jf

                 BPF_JMP+BPF_JGT+BPF_X      pc += (A > X) ? jt : jf

                 BPF_JMP+BPF_JGE+BPF_X      pc += (A >= X) ? jt : jf

                 BPF_JMP+BPF_JEQ+BPF_X      pc += (A == X) ? jt : jf

                 BPF_JMP+BPF_JSET+BPF_X     pc += (A & X) ? jt : jf
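
                 Since jt and jf are unsigned offsets from the
                 instruction after the jump, every branch is
                 forward by construction; what can still go wrong
                 is a target past the end of the program.  The
                 hypothetical checker below, using a local stand-in
                 for struct bpf_insn and assumed values for the
                 class and opcode bits, verifies that each jump
                 lands inside the program and that the last
                 instruction is a BPF_RET.  It is only a partial
                 sanity check, not a description of what the
                 kernel itself verifies.

```c
#include <stddef.h>

/* Local stand-in for struct bpf_insn; the class and opcode values are
 * assumed to match the classic <net/bpf.h> encoding. */
struct bpf_insn { unsigned short code; unsigned char jt, jf; long k; };
enum { BPF_JMP = 0x05, BPF_RET = 0x06, BPF_JA = 0x00 };
#define BPF_CLASS(code) ((code) & 0x07)
#define BPF_OP(code)    ((code) & 0xf0)

/* Return 1 if prog[0..n) ends in a return and every jump target lies
 * inside the program, else 0.  A jump at index i with offset off lands
 * on index i + 1 + off, which must be less than n. */
int bpf_check_jumps(const struct bpf_insn *prog, size_t n)
{
    if (n == 0 || BPF_CLASS(prog[n - 1].code) != BPF_RET)
        return 0;
    for (size_t i = 0; i + 1 < n; i++) {
        const struct bpf_insn *p = &prog[i];
        size_t room = n - i - 1;    /* instructions after this one */

        if (BPF_CLASS(p->code) != BPF_JMP)
            continue;
        if (BPF_OP(p->code) == BPF_JA) {
            if (p->k < 0 || (unsigned long)p->k >= room)
                return 0;
        } else if ((size_t)p->jt >= room || (size_t)p->jf >= room) {
            return 0;
        }
    }
    return 1;
}
```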

       BPF_RET   The  return  instructions  terminate  the filter
                 program and specify  the  amount  of  packet  to
                 accept   (i.e.,   they   return  the  truncation
                 amount).  A return value of zero indicates  that
                 the  packet should be ignored.  The return value
                 is either a constant (BPF_K) or the  accumulator
                 (BPF_A).

                 BPF_RET+BPF_A              accept A bytes

                 BPF_RET+BPF_K              accept k bytes

       BPF_MISC  The   miscellaneous  category  was  created  for
                 anything  that  doesn't  fit  into   the   above
                 classes, and for any new instructions that might
                 need to be  added.   Currently,  these  are  the
                 register transfer instructions that copy the
                 index register to the accumulator or vice versa.

                 BPF_MISC+BPF_TAX           X <- A

                 BPF_MISC+BPF_TXA           A <- X

       The   BPF  interface  provides  the  following  macros  to
       facilitate array initializers:

              BPF_STMT(opcode, operand)

       and

              BPF_JUMP(opcode, operand, true_offset, false_offset)
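
       The semantics above can be made concrete with a small
       user-space interpreter, which is also handy for trying out a
       filter before installing it with BIOCSETF.  This is a sketch,
       not the kernel's implementation: it covers only the BPF_LD,
       BPF_LDX, BPF_JMP and BPF_RET classes, treats every other
       instruction and any out-of-bounds load as a rejection, assumes
       the program's jumps are already known to be in range, and
       defines the opcode bits and the two macros locally with values
       assumed to match the classic <net/bpf.h>.  Multi-byte loads are
       assembled most significant byte first, consistent with the
       hexadecimal address constants used in EXAMPLES.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for struct bpf_insn plus the opcode bits this sketch
 * understands.  The values are assumed to match the classic BSD
 * <net/bpf.h>; real code should use the header's definitions. */
struct bpf_insn { unsigned short code; unsigned char jt, jf; long k; };

enum {
    BPF_LD = 0x00, BPF_LDX = 0x01, BPF_JMP = 0x05, BPF_RET = 0x06,
    BPF_W = 0x00, BPF_H = 0x08, BPF_B = 0x10,
    BPF_IMM = 0x00, BPF_ABS = 0x20, BPF_IND = 0x40,
    BPF_LEN = 0x80, BPF_MSH = 0xa0,
    BPF_JA = 0x00, BPF_JEQ = 0x10, BPF_JGT = 0x20,
    BPF_JGE = 0x30, BPF_JSET = 0x40,
    BPF_K = 0x00, BPF_X = 0x08, BPF_A = 0x10
};

#define BPF_STMT(code, k)         { (unsigned short)(code), 0, 0, k }
#define BPF_JUMP(code, k, jt, jf) { (unsigned short)(code), jt, jf, k }

/* Assemble n bytes at p[off], most significant first, into *out.
 * Returns 0 (leaving *out alone) if the bytes are not all inside p. */
static int load(const unsigned char *p, size_t len, uint32_t off, int n,
                uint32_t *out)
{
    uint32_t v = 0;

    if (off > len || (size_t)n > len - off)
        return 0;
    for (int i = 0; i < n; i++)
        v = (v << 8) | p[off + i];
    *out = v;
    return 1;
}

/* Run the filter starting at pc over p[0..len); returns the number of
 * bytes to accept, 0 meaning reject. */
uint32_t bpf_run(const struct bpf_insn *pc, const unsigned char *p,
                 size_t len)
{
    static const int width[] = { 4, 2, 1 };   /* BPF_W, BPF_H, BPF_B */
    uint32_t A = 0, X = 0, v;

    for (;; pc++) {
        unsigned code = pc->code, mode = code & 0xe0;
        unsigned size = (code & 0x18) >> 3;
        uint32_t k = (uint32_t)pc->k;
        uint32_t src = (code & BPF_X) ? X : k;   /* jump operand */
        int taken;

        switch (code & 0x07) {
        case BPF_LD:
            if (mode == BPF_IMM) {
                A = k;
            } else if (mode == BPF_LEN) {
                A = (uint32_t)len;
            } else if ((mode == BPF_ABS || mode == BPF_IND) && size < 3) {
                uint32_t off = (mode == BPF_IND ? X : 0) + k;
                if (!load(p, len, off, width[size], &A))
                    return 0;             /* out of bounds: reject */
            } else {
                return 0;                 /* BPF_MEM etc. not handled */
            }
            break;
        case BPF_LDX:
            if (mode == BPF_IMM)
                X = k;
            else if (mode == BPF_LEN)
                X = (uint32_t)len;
            else if (mode == BPF_MSH && load(p, len, k, 1, &v))
                X = 4 * (v & 0xf);
            else
                return 0;
            break;
        case BPF_JMP:
            switch (code & 0xf0) {
            case BPF_JA:   pc += k; continue;   /* loop adds the 1 */
            case BPF_JEQ:  taken = A == src; break;
            case BPF_JGT:  taken = A > src; break;
            case BPF_JGE:  taken = A >= src; break;
            case BPF_JSET: taken = (A & src) != 0; break;
            default:       return 0;
            }
            pc += taken ? pc->jt : pc->jf;    /* loop adds the 1 */
            break;
        case BPF_RET:
            return (code & BPF_A) ? A : k;
        default:
            return 0;                     /* ST, STX, ALU, MISC */
        }
    }
}
```

       Run over the Reverse ARP filter in EXAMPLES, for instance,
       bpf_run returns the accept length for a frame whose type field
       is 0x8035 (ETHERTYPE_REVARP) and whose ARP opcode is 3
       (REVARP_REQUEST), and 0 for anything else.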


EXAMPLES
       The following filter is taken from the Reverse ARP Daemon.
       It accepts only Reverse ARP requests.

              struct bpf_insn insns[] = {
                   BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
                   BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
                   BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
                         sizeof(struct ether_header)),
                   BPF_STMT(BPF_RET+BPF_K, 0),
              };

       This  filter  accepts  only  IP   packets   between   host
       128.3.112.15 and 128.3.112.35.

              struct bpf_insn insns[] = {
                   BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 8),
                   BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 26),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 2),
                   BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 3, 4),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 0, 3),
                   BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 1),
                   BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
                   BPF_STMT(BPF_RET+BPF_K, 0),
              };
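
       The constants in this filter are the two addresses packed into
       32 bits, first octet in the high byte, which is what a word
       load of an address field leaves in A.  Offsets 26 and 30 are
       the IP source and destination address fields: the 14-byte
       Ethernet header plus 12 and 16.  A hypothetical helper makes
       the conversion explicit:

```c
#include <stdint.h>

/* Pack the dotted quad a.b.c.d into 32 bits, first octet in the high
 * byte, giving the constant to compare against after a word load. */
uint32_t ipv4_const(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return ((uint32_t)(a & 0xff) << 24) | ((uint32_t)(b & 0xff) << 16) |
           ((uint32_t)(c & 0xff) << 8) | (uint32_t)(d & 0xff);
}
```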

       Finally,  this filter returns only TCP finger packets.  We
       must parse the IP header to reach  the  TCP  header.   The
       BPF_JSET instruction checks that the IP fragment offset is
       0 so we are sure that we have a TCP header.

              struct bpf_insn insns[] = {
                   BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10),
                   BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8),
                   BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
                   BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0),
                   BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),
                   BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0),
                   BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),
                   BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1),
                   BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
                   BPF_STMT(BPF_RET+BPF_K, 0),
              };
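
       When checking or explaining a filter like this, it can help to
       restate it in plain C.  The function below is a hypothetical
       equivalent of the finger filter, written directly against the
       Ethernet frame, with 0x0800 standing for ETHERTYPE_IP and 6 for
       IPPROTO_TCP.  Like the filter, it reads the source port at
       X+14 and the destination port at X+16, where X is the IP header
       length.

```c
#include <stddef.h>

/* Big-endian 16-bit value at p[off]; the caller checks bounds. */
static unsigned get16(const unsigned char *p, size_t off)
{
    return (unsigned)p[off] << 8 | p[off + 1];
}

/* Return 1 for an IPv4/TCP packet on Ethernet whose fragment offset is
 * 0 and whose source or destination port is 79 (finger), else 0. */
int is_finger(const unsigned char *p, size_t len)
{
    size_t x;

    if (len < 24 || get16(p, 12) != 0x0800)   /* ETHERTYPE_IP */
        return 0;
    if (p[23] != 6)                            /* IPPROTO_TCP */
        return 0;
    if (get16(p, 20) & 0x1fff)                 /* fragment offset != 0 */
        return 0;
    x = 4 * (p[14] & 0xf);                     /* IP header length */
    if (len < x + 18)
        return 0;
    return get16(p, x + 14) == 79 || get16(p, x + 16) == 79;
}
```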

SEE ALSO
       tcpdump(1)

       McCanne, S., and Jacobson, V., `An efficient, extensible, and
       portable network monitor'

FILES
       /dev/bpf0, /dev/bpf1, ...

BUGS
       The  read  buffer must be of a fixed size (returned by the
       BIOCGBLEN ioctl).

       A file that does not request promiscuous mode may  receive
       promiscuously received packets as a side effect of another
       file requesting this mode on the same hardware  interface.
       This   could  be  fixed  in  the  kernel  with  additional
       processing overhead.  However, we favor  the  model  where
       all  files  must assume that the interface is promiscuous,
       and if so desired, must utilize a filter to reject foreign
       packets.

       Data  link  protocols with variable length headers are not
       currently supported.

       Under SunOS, if a BPF application  reads  more  than  2^31
       bytes  of  data, read will fail with EINVAL.  You can either
       fix the bug in SunOS, or lseek to 0 when  read  fails  for
       this reason.

HISTORY
       The Enet packet filter was created in 1980 by Mike Accetta
       and Rick Rashid at  Carnegie-Mellon  University.   Jeffrey
       Mogul,  at  Stanford, ported the code to BSD and continued
       its development from 1983 on.  Since then, it has  evolved
       into the Ultrix Packet Filter at DEC, a STREAMS NIT module
       under SunOS 4.1, and BPF.

AUTHORS
       Steven   McCanne,   of   Lawrence   Berkeley   Laboratory,
       implemented BPF in Summer 1990.  Much of the design is due
       to Van Jacobson.