Sat Mar 13 20:39:54 2010 UTC
Document BIOC{G,S}FEEDBACK; I forgot who sent me the patch, so whoever created
it, thanks!


(christos)
diff -r1.42 -r1.43 src/share/man/man4/bpf.4

cvs diff -r1.42 -r1.43 src/share/man/man4/bpf.4

--- src/share/man/man4/bpf.4 2010/01/16 18:47:50 1.42
+++ src/share/man/man4/bpf.4 2010/03/13 20:39:54 1.43
@@ -1,755 +1,770 @@ @@ -1,755 +1,770 @@
1.\" -*- nroff -*- 1.\" -*- nroff -*-
2.\" 2.\"
3.\" $NetBSD: bpf.4,v 1.42 2010/01/16 18:47:50 pooka Exp $ 3.\" $NetBSD: bpf.4,v 1.43 2010/03/13 20:39:54 christos Exp $
4.\" 4.\"
5.\" Copyright (c) 1990, 1991, 1992, 1993, 1994 5.\" Copyright (c) 1990, 1991, 1992, 1993, 1994
6.\" The Regents of the University of California. All rights reserved. 6.\" The Regents of the University of California. All rights reserved.
7.\" 7.\"
8.\" Redistribution and use in source and binary forms, with or without 8.\" Redistribution and use in source and binary forms, with or without
9.\" modification, are permitted provided that: (1) source code distributions 9.\" modification, are permitted provided that: (1) source code distributions
10.\" retain the above copyright notice and this paragraph in its entirety, (2) 10.\" retain the above copyright notice and this paragraph in its entirety, (2)
11.\" distributions including binary code include the above copyright notice and 11.\" distributions including binary code include the above copyright notice and
12.\" this paragraph in its entirety in the documentation or other materials 12.\" this paragraph in its entirety in the documentation or other materials
13.\" provided with the distribution, and (3) all advertising materials mentioning 13.\" provided with the distribution, and (3) all advertising materials mentioning
14.\" features or use of this software display the following acknowledgement: 14.\" features or use of this software display the following acknowledgement:
15.\" ``This product includes software developed by the University of California, 15.\" ``This product includes software developed by the University of California,
16.\" Lawrence Berkeley Laboratory and its contributors.'' Neither the name of 16.\" Lawrence Berkeley Laboratory and its contributors.'' Neither the name of
17.\" the University nor the names of its contributors may be used to endorse 17.\" the University nor the names of its contributors may be used to endorse
18.\" or promote products derived from this software without specific prior 18.\" or promote products derived from this software without specific prior
19.\" written permission. 19.\" written permission.
20.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED 20.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
21.\" WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF 21.\" WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
22.\" MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. 22.\" MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
23.\" 23.\"
24.\" This document is derived in part from the enet man page (enet.4) 24.\" This document is derived in part from the enet man page (enet.4)
25.\" distributed with 4.3BSD Unix. 25.\" distributed with 4.3BSD Unix.
26.\" 26.\"
27.Dd January 16, 2010 27.Dd March 13, 2010
28.Dt BPF 4 28.Dt BPF 4
29.Os 29.Os
30.Sh NAME 30.Sh NAME
31.Nm bpf 31.Nm bpf
32.Nd Berkeley Packet Filter raw network interface 32.Nd Berkeley Packet Filter raw network interface
33.Sh SYNOPSIS 33.Sh SYNOPSIS
34.Cd "pseudo-device bpfilter" 34.Cd "pseudo-device bpfilter"
35.Sh DESCRIPTION 35.Sh DESCRIPTION
36The Berkeley Packet Filter 36The Berkeley Packet Filter
37provides a raw interface to data link layers in a protocol 37provides a raw interface to data link layers in a protocol
38independent fashion. 38independent fashion.
39All packets on the network, even those destined for other hosts, 39All packets on the network, even those destined for other hosts,
40are accessible through this mechanism. 40are accessible through this mechanism.
41.Pp 41.Pp
42The packet filter appears as a character special device, 42The packet filter appears as a character special device,
43.Pa /dev/bpf . 43.Pa /dev/bpf .
44After opening the device, the file descriptor must be bound to a 44After opening the device, the file descriptor must be bound to a
45specific network interface with the 45specific network interface with the
46.Dv BIOCSETIF 46.Dv BIOCSETIF
47ioctl. 47ioctl.
48A given interface can be shared by multiple listeners, and the filter 48A given interface can be shared by multiple listeners, and the filter
49underlying each descriptor will see an identical packet stream. 49underlying each descriptor will see an identical packet stream.
50.Pp 50.Pp
51Associated with each open instance of a 51Associated with each open instance of a
52.Nm 52.Nm
53file is a user-settable packet filter. 53file is a user-settable packet filter.
54Whenever a packet is received by an interface, 54Whenever a packet is received by an interface,
55all file descriptors listening on that interface apply their filter. 55all file descriptors listening on that interface apply their filter.
56Each descriptor that accepts the packet receives its own copy. 56Each descriptor that accepts the packet receives its own copy.
57.Pp 57.Pp
58Reads from these files return the next group of packets 58Reads from these files return the next group of packets
59that have matched the filter. 59that have matched the filter.
60To improve performance, the buffer passed to read must be 60To improve performance, the buffer passed to read must be
61the same size as the buffers used internally by 61the same size as the buffers used internally by
62.Nm . 62.Nm .
63This size is returned by the 63This size is returned by the
64.Dv BIOCGBLEN 64.Dv BIOCGBLEN
65ioctl (see below), and under 65ioctl (see below), and under
66BSD, can be set with 66BSD, can be set with
67.Dv BIOCSBLEN . 67.Dv BIOCSBLEN .
68Note that an individual packet larger than this size is necessarily 68Note that an individual packet larger than this size is necessarily
69truncated. 69truncated.
70.Pp 70.Pp
71The packet filter will support any link level protocol that has fixed length 71The packet filter will support any link level protocol that has fixed length
72headers. 72headers.
73Currently, only Ethernet, SLIP and PPP drivers have been 73Currently, only Ethernet, SLIP and PPP drivers have been
74modified to interact with 74modified to interact with
75.Nm . 75.Nm .
76.Pp 76.Pp
77Since packet data is in network byte order, applications should use the 77Since packet data is in network byte order, applications should use the
78.Xr byteorder 3 78.Xr byteorder 3
79macros to extract multi-byte values. 79macros to extract multi-byte values.
80.Pp 80.Pp
81A packet can be sent out on the network by writing to a 81A packet can be sent out on the network by writing to a
82.Nm 82.Nm
83file descriptor. 83file descriptor.
84The writes are unbuffered, meaning only one packet can be processed per write. 84The writes are unbuffered, meaning only one packet can be processed per write.
85Currently, only writes to Ethernets and SLIP links are supported. 85Currently, only writes to Ethernets and SLIP links are supported.
86.Sh IOCTLS 86.Sh IOCTLS
87The 87The
88.Xr ioctl 2 88.Xr ioctl 2
89command codes below are defined in 89command codes below are defined in
90.Aq Pa net/bpf.h . 90.Aq Pa net/bpf.h .
91All commands require these includes: 91All commands require these includes:
92.Bd -literal -offset indent 92.Bd -literal -offset indent
93#include \*[Lt]sys/types.h\*[Gt] 93#include \*[Lt]sys/types.h\*[Gt]
94#include \*[Lt]sys/time.h\*[Gt] 94#include \*[Lt]sys/time.h\*[Gt]
95#include \*[Lt]sys/ioctl.h\*[Gt] 95#include \*[Lt]sys/ioctl.h\*[Gt]
96#include \*[Lt]net/bpf.h\*[Gt] 96#include \*[Lt]net/bpf.h\*[Gt]
97.Ed 97.Ed
98.Pp 98.Pp
99Additionally, 99Additionally,
100.Dv BIOCGETIF 100.Dv BIOCGETIF
101and 101and
102.Dv BIOCSETIF 102.Dv BIOCSETIF
103require 103require
104.Pa \*[Lt]net/if.h\*[Gt] . 104.Pa \*[Lt]net/if.h\*[Gt] .
105.Pp 105.Pp
106The (third) argument to the 106The (third) argument to the
107.Xr ioctl 2 107.Xr ioctl 2
108should be a pointer to the type indicated. 108should be a pointer to the type indicated.
109.Bl -tag -width indent -offset indent 109.Bl -tag -width indent -offset indent
110.It Dv "BIOCGBLEN (u_int)" 110.It Dv "BIOCGBLEN (u_int)"
111Returns the required buffer length for reads on 111Returns the required buffer length for reads on
112.Nm 112.Nm
113files. 113files.
114.It Dv "BIOCSBLEN (u_int)" 114.It Dv "BIOCSBLEN (u_int)"
115Sets the buffer length for reads on 115Sets the buffer length for reads on
116.Nm 116.Nm
117files. 117files.
118The buffer must be set before the file is attached to an interface with 118The buffer must be set before the file is attached to an interface with
119.Dv BIOCSETIF . 119.Dv BIOCSETIF .
120If the requested buffer size cannot be accommodated, the closest 120If the requested buffer size cannot be accommodated, the closest
121allowable size will be set and returned in the argument. 121allowable size will be set and returned in the argument.
122A read call will result in 122A read call will result in
123.Er EINVAL 123.Er EINVAL
124if it is passed a buffer that is not this size. 124if it is passed a buffer that is not this size.
125.It Dv BIOCGDLT (u_int) 125.It Dv BIOCGDLT (u_int)
126Returns the type of the data link layer underlying the attached interface. 126Returns the type of the data link layer underlying the attached interface.
127.Er EINVAL 127.Er EINVAL
128is returned if no interface has been specified. 128is returned if no interface has been specified.
129The device types, prefixed with 129The device types, prefixed with
130.Dq DLT_ , 130.Dq DLT_ ,
131are defined in 131are defined in
132.Aq Pa net/bpf.h . 132.Aq Pa net/bpf.h .
133.It Dv BIOCGDLTLIST (struct bpf_dltlist) 133.It Dv BIOCGDLTLIST (struct bpf_dltlist)
134Returns an array of the available data link layer types 134Returns an array of the available data link layer types
135underlying the attached interface: 135underlying the attached interface:
136.Bd -literal -offset indent 136.Bd -literal -offset indent
137struct bpf_dltlist { 137struct bpf_dltlist {
138 u_int bfl_len; 138 u_int bfl_len;
139 u_int *bfl_list; 139 u_int *bfl_list;
140}; 140};
141.Ed 141.Ed
142.Pp 142.Pp
143The available types are returned in the array pointed to by the 143The available types are returned in the array pointed to by the
144.Va bfl_list 144.Va bfl_list
145field while its length in u_int is supplied to the 145field while its length in u_int is supplied to the
146.Va bfl_len 146.Va bfl_len
147field. 147field.
148.Er ENOMEM 148.Er ENOMEM
149is returned if there is not enough buffer. 149is returned if there is not enough buffer.
150The 150The
151.Va bfl_len 151.Va bfl_len
152field is modified on return to indicate the actual length in u_int 152field is modified on return to indicate the actual length in u_int
153of the array returned. 153of the array returned.
154If 154If
155.Va bfl_list 155.Va bfl_list
156is 156is
157.Dv NULL , 157.Dv NULL ,
158the 158the
159.Va bfl_len 159.Va bfl_len
160field is returned to indicate the required length of an array in u_int. 160field is returned to indicate the required length of an array in u_int.
161.It Dv BIOCSDLT (u_int) 161.It Dv BIOCSDLT (u_int)
162Change the type of the data link layer underlying the attached interface. 162Change the type of the data link layer underlying the attached interface.
163.Er EINVAL 163.Er EINVAL
164is returned if no interface has been specified or the specified 164is returned if no interface has been specified or the specified
165type is not available for the interface. 165type is not available for the interface.
166.It Dv BIOCPROMISC 166.It Dv BIOCPROMISC
167Forces the interface into promiscuous mode. 167Forces the interface into promiscuous mode.
168All packets, not just those destined for the local host, are processed. 168All packets, not just those destined for the local host, are processed.
169Since more than one file can be listening on a given interface, 169Since more than one file can be listening on a given interface,
170a listener that opened its interface non-promiscuously may receive 170a listener that opened its interface non-promiscuously may receive
171packets promiscuously. 171packets promiscuously.
172This problem can be remedied with an appropriate filter. 172This problem can be remedied with an appropriate filter.
173.Pp 173.Pp
174The interface remains in promiscuous mode until all files listening 174The interface remains in promiscuous mode until all files listening
175promiscuously are closed. 175promiscuously are closed.
176.It Dv BIOCFLUSH 176.It Dv BIOCFLUSH
177Flushes the buffer of incoming packets, 177Flushes the buffer of incoming packets,
178and resets the statistics that are returned by 178and resets the statistics that are returned by
179.Dv BIOCGSTATS . 179.Dv BIOCGSTATS .
180.It Dv BIOCGETIF (struct ifreq) 180.It Dv BIOCGETIF (struct ifreq)
181Returns the name of the hardware interface that the file is listening on. 181Returns the name of the hardware interface that the file is listening on.
182The name is returned in the ifr_name field of 182The name is returned in the ifr_name field of
183.Fa ifr . 183.Fa ifr .
184All other fields are undefined. 184All other fields are undefined.
185.It Dv BIOCSETIF (struct ifreq) 185.It Dv BIOCSETIF (struct ifreq)
186Sets the hardware interface associated with the file. 186Sets the hardware interface associated with the file.
187This command must be performed before any packets can be read. 187This command must be performed before any packets can be read.
188The device is indicated by name using the 188The device is indicated by name using the
189.Dv ifr_name 189.Dv ifr_name
190field of the 190field of the
191.Fa ifreq . 191.Fa ifreq .
192Additionally, performs the actions of 192Additionally, performs the actions of
193.Dv BIOCFLUSH . 193.Dv BIOCFLUSH .
194.It Dv BIOCSRTIMEOUT, BIOCGRTIMEOUT (struct timeval) 194.It Dv BIOCSRTIMEOUT, BIOCGRTIMEOUT (struct timeval)
195Set or get the read timeout parameter. 195Set or get the read timeout parameter.
196The 196The
197.Fa timeval 197.Fa timeval
198specifies the length of time to wait before timing 198specifies the length of time to wait before timing
199out on a read request. 199out on a read request.
200This parameter is initialized to zero by 200This parameter is initialized to zero by
201.Xr open 2 , 201.Xr open 2 ,
202indicating no timeout. 202indicating no timeout.
203.It Dv BIOCGSTATS (struct bpf_stat) 203.It Dv BIOCGSTATS (struct bpf_stat)
204Returns the following structure of packet statistics: 204Returns the following structure of packet statistics:
205.Bd -literal -offset indent 205.Bd -literal -offset indent
206struct bpf_stat { 206struct bpf_stat {
207 uint64_t bs_recv; 207 uint64_t bs_recv;
208 uint64_t bs_drop; 208 uint64_t bs_drop;
209 uint64_t bs_capt; 209 uint64_t bs_capt;
210 uint64_t bs_padding[13]; 210 uint64_t bs_padding[13];
211}; 211};
212.Ed 212.Ed
213.Pp 213.Pp
214The fields are: 214The fields are:
215.Bl -tag -width bs_recv -offset indent 215.Bl -tag -width bs_recv -offset indent
216.It Va bs_recv 216.It Va bs_recv
217the number of packets received by the descriptor since opened or reset 217the number of packets received by the descriptor since opened or reset
218(including any buffered since the last read call); 218(including any buffered since the last read call);
219.It Va bs_drop 219.It Va bs_drop
220the number of packets which were accepted by the filter but dropped by the 220the number of packets which were accepted by the filter but dropped by the
221kernel because of buffer overflows 221kernel because of buffer overflows
222(i.e., the application's reads aren't keeping up with the packet 222(i.e., the application's reads aren't keeping up with the packet
223traffic); and 223traffic); and
224.It Va bs_capt 224.It Va bs_capt
225the number of packets accepted by the filter. 225the number of packets accepted by the filter.
226.El 226.El
227.It Dv BIOCIMMEDIATE (u_int) 227.It Dv BIOCIMMEDIATE (u_int)
228Enable or disable 228Enable or disable
229.Dq immediate mode , 229.Dq immediate mode ,
230based on the truth value of the argument. 230based on the truth value of the argument.
231When immediate mode is enabled, reads return immediately upon packet 231When immediate mode is enabled, reads return immediately upon packet
232reception. 232reception.
233Otherwise, a read will block until either the kernel buffer 233Otherwise, a read will block until either the kernel buffer
234becomes full or a timeout occurs. 234becomes full or a timeout occurs.
235This is useful for programs like 235This is useful for programs like
236.Xr rarpd 8 , 236.Xr rarpd 8 ,
237which must respond to messages in real time. 237which must respond to messages in real time.
238The default for a new file is off. 238The default for a new file is off.
239.It Dv BIOCSETF (struct bpf_program) 239.It Dv BIOCSETF (struct bpf_program)
240Sets the filter program used by the kernel to discard uninteresting 240Sets the filter program used by the kernel to discard uninteresting
241packets. 241packets.
242An array of instructions and its length is passed in using the following structure: 242An array of instructions and its length is passed in using the following structure:
243.Bd -literal -offset indent 243.Bd -literal -offset indent
244struct bpf_program { 244struct bpf_program {
245 u_int bf_len; 245 u_int bf_len;
246 struct bpf_insn *bf_insns; 246 struct bpf_insn *bf_insns;
247}; 247};
248.Ed 248.Ed
249.Pp 249.Pp
250The filter program is pointed to by the 250The filter program is pointed to by the
251.Va bf_insns 251.Va bf_insns
252field while its length in units of 252field while its length in units of
253.Sq struct bpf_insn 253.Sq struct bpf_insn
254is given by the 254is given by the
255.Va bf_len 255.Va bf_len
256field. 256field.
257Also, the actions of 257Also, the actions of
258.Dv BIOCFLUSH 258.Dv BIOCFLUSH
259are performed. 259are performed.
260.Pp 260.Pp
261See section 261See section
262.Sy FILTER MACHINE 262.Sy FILTER MACHINE
263for an explanation of the filter language. 263for an explanation of the filter language.
264.It Dv BIOCVERSION (struct bpf_version) 264.It Dv BIOCVERSION (struct bpf_version)
265Returns the major and minor version numbers of the filter language currently 265Returns the major and minor version numbers of the filter language currently
266recognized by the kernel. 266recognized by the kernel.
267Before installing a filter, applications must check 267Before installing a filter, applications must check
268that the current version is compatible with the running kernel. 268that the current version is compatible with the running kernel.
269Version numbers are compatible if the major numbers match and the 269Version numbers are compatible if the major numbers match and the
270application minor is less than or equal to the kernel minor. 270application minor is less than or equal to the kernel minor.
271The kernel version number is returned in the following structure: 271The kernel version number is returned in the following structure:
272.Bd -literal -offset indent 272.Bd -literal -offset indent
273struct bpf_version { 273struct bpf_version {
274 u_short bv_major; 274 u_short bv_major;
275 u_short bv_minor; 275 u_short bv_minor;
276}; 276};
277.Ed 277.Ed
278.Pp 278.Pp
279The current version numbers are given by 279The current version numbers are given by
280.Dv BPF_MAJOR_VERSION 280.Dv BPF_MAJOR_VERSION
281and 281and
282.Dv BPF_MINOR_VERSION 282.Dv BPF_MINOR_VERSION
283from 283from
284.Aq Pa net/bpf.h . 284.Aq Pa net/bpf.h .
285An incompatible filter 285An incompatible filter
286may result in undefined behavior (most likely, an error returned by 286may result in undefined behavior (most likely, an error returned by
287.Xr ioctl 2 287.Xr ioctl 2
288or haphazard packet matching). 288or haphazard packet matching).
289.It Dv BIOCGHDRCMPLT BIOCSHDRCMPLT (u_int) 289.It Dv BIOCGHDRCMPLT BIOCSHDRCMPLT (u_int)
290Enable/disable or get the 290Enable/disable or get the
291.Dq header complete 291.Dq header complete
292flag status. 292flag status.
293If enabled, packets written to the bpf file descriptor will not have 293If enabled, packets written to the bpf file descriptor will not have
294network layer headers rewritten in the interface output routine. 294network layer headers rewritten in the interface output routine.
295By default, the flag is disabled (value is 0). 295By default, the flag is disabled (value is 0).
296.It Dv BIOCGSEESENT BIOCSSEESENT (u_int) 296.It Dv BIOCGSEESENT BIOCSSEESENT (u_int)
297Enable/disable or get the 297Enable/disable or get the
298.Dq see sent 298.Dq see sent
299flag status. 299flag status.
300If enabled, packets sent will be passed to the filter. 300If enabled, packets sent by the host (not from
 301.Nm )
 302will be passed to the filter.
301By default, the flag is enabled (value is 1). 303By default, the flag is enabled (value is 1).
 304.It Dv BIOCFEEDBACK BIOCSFEEDBACK BIOCGFEEDBACK (u_int)
 305Set (or get)
 306.Dq packet feedback mode .
 307This allows injected packets to be fed back as input to the interface when
 308output via the interface is successful.
 309The first name is meant for FreeBSD compatibility; the two others follow
 310the Get/Set convention.
 311.\"When
 312.\".Dv BPF_D_INOUT
 313.\"direction is set, injected
 314Injected
 315outgoing packets are not returned by BPF to avoid
 316duplication. This flag is initialized to zero by default.
302.El 317.El
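The feedback ioctls documented by this change take a u_int like the other flag ioctls in the list. The following is a minimal sketch of how a caller might toggle and query the flag. The helper names demo_set_feedback and demo_get_feedback are hypothetical, and the __has_include guard is an assumption added so that the sketch still builds (as a stub that fails with ENOTSUP) on systems that lack net/bpf.h or BIOCSFEEDBACK.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/ioctl.h>
#if defined(__has_include)
# if __has_include(<net/bpf.h>)
#  include <net/bpf.h>
# endif
#endif

/*
 * Enable (on != 0) or disable (on == 0) packet feedback mode on an
 * open bpf descriptor. Hypothetical helper, not part of any library.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
demo_set_feedback(int fd, unsigned int on)
{
#ifdef BIOCSFEEDBACK
	return ioctl(fd, BIOCSFEEDBACK, &on);
#else
	/* Stub for systems without the ioctl (assumption of this sketch). */
	(void)fd;
	(void)on;
	errno = ENOTSUP;
	return -1;
#endif
}

/*
 * Read the current feedback flag into *on.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
demo_get_feedback(int fd, unsigned int *on)
{
#ifdef BIOCGFEEDBACK
	return ioctl(fd, BIOCGFEEDBACK, on);
#else
	(void)fd;
	(void)on;
	errno = ENOTSUP;
	return -1;
#endif
}
```

A caller would normally invoke demo_set_feedback(fd, 1) on a descriptor that has already been bound with BIOCSETIF, then write packets to it as usual.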
303.Sh STANDARD IOCTLS 318.Sh STANDARD IOCTLS
304.Nm 319.Nm
305now supports several standard 320now supports several standard
306.Xr ioctl 2 Ns 's 321.Xr ioctl 2 Ns 's
307which allow the user to do async and/or non-blocking I/O to an open 322which allow the user to do async and/or non-blocking I/O to an open
308.Nm bpf 323.Nm bpf
309file descriptor. 324file descriptor.
310.Bl -tag -width indent -offset indent 325.Bl -tag -width indent -offset indent
311.It Dv FIONREAD (int) 326.It Dv FIONREAD (int)
312Returns the number of bytes that are immediately available for reading. 327Returns the number of bytes that are immediately available for reading.
313.It Dv SIOCGIFADDR (struct ifreq) 328.It Dv SIOCGIFADDR (struct ifreq)
314Returns the address associated with the interface. 329Returns the address associated with the interface.
315.It Dv FIONBIO (int) 330.It Dv FIONBIO (int)
316Set or clear non-blocking I/O. 331Set or clear non-blocking I/O.
317If arg is non-zero, then doing a 332If arg is non-zero, then doing a
318.Xr read 2 333.Xr read 2
319when no data is available will return -1 and 334when no data is available will return -1 and
320.Va errno 335.Va errno
321will be set to 336will be set to
322.Er EAGAIN . 337.Er EAGAIN .
323If arg is zero, non-blocking I/O is disabled. 338If arg is zero, non-blocking I/O is disabled.
324Note: setting this 339Note: setting this
325overrides the timeout set by 340overrides the timeout set by
326.Dv BIOCSRTIMEOUT . 341.Dv BIOCSRTIMEOUT .
327.It Dv FIOASYNC (int) 342.It Dv FIOASYNC (int)
328Enable or disable async I/O. 343Enable or disable async I/O.
329When enabled (arg is non-zero), the process or process group specified by 344When enabled (arg is non-zero), the process or process group specified by
330.Dv FIOSETOWN 345.Dv FIOSETOWN
331will start receiving SIGIO's when packets 346will start receiving SIGIO's when packets
332arrive. 347arrive.
333Note that you must do an 348Note that you must do an
334.Dv FIOSETOWN 349.Dv FIOSETOWN
335in order for this to take effect, as 350in order for this to take effect, as
336the system will not default this for you. 351the system will not default this for you.
337The signal may be changed via 352The signal may be changed via
338.Dv BIOCSRSIG . 353.Dv BIOCSRSIG .
339.It Dv FIOSETOWN FIOGETOWN (int) 354.It Dv FIOSETOWN FIOGETOWN (int)
340Set or get the process or process group (if negative) that should receive SIGIO 355Set or get the process or process group (if negative) that should receive SIGIO
341when packets are available. 356when packets are available.
342The signal may be changed using 357The signal may be changed using
343.Dv BIOCSRSIG 358.Dv BIOCSRSIG
344(see above). 359(see above).
345.El 360.El
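The FIONBIO behaviour described above (read returns -1 with errno set to EAGAIN when no data is available) can be sketched as follows. This is an illustration only: the helper names are hypothetical, and because FIONBIO is a generic file ioctl the sketch can be exercised on any descriptor, such as a pipe, rather than requiring an open bpf device.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Turn non-blocking I/O on (on != 0) or off (on == 0) for fd. */
static int
demo_set_nonblocking(int fd, int on)
{
	return ioctl(fd, FIONBIO, &on);
}

/*
 * Returns 1 if a one-byte read(2) fails with EAGAIN (no data yet),
 * 0 if it returned data or end of file, and -1 on any other error.
 */
static int
demo_read_would_block(int fd)
{
	char c;
	ssize_t n = read(fd, &c, 1);

	if (n >= 0)
		return 0;
	return (errno == EAGAIN) ? 1 : -1;
}
```

On a real bpf descriptor the read buffer must still be the size reported by BIOCGBLEN; the one-byte read here is only suitable for the pipe used to try the sketch out.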
346.Sh BPF HEADER 361.Sh BPF HEADER
347The following structure is prepended to each packet returned by 362The following structure is prepended to each packet returned by
348.Xr read 2 : 363.Xr read 2 :
349.Bd -literal -offset indent 364.Bd -literal -offset indent
350struct bpf_hdr { 365struct bpf_hdr {
351 struct bpf_timeval bh_tstamp; 366 struct bpf_timeval bh_tstamp;
352 uint32_t bh_caplen; 367 uint32_t bh_caplen;
353 uint32_t bh_datalen; 368 uint32_t bh_datalen;
354 uint16_t bh_hdrlen; 369 uint16_t bh_hdrlen;
355}; 370};
356.Ed 371.Ed
357.Pp 372.Pp
358The fields, whose values are stored in host order, are: 373The fields, whose values are stored in host order, are:
359.Bl -tag -width bh_datalen -offset indent 374.Bl -tag -width bh_datalen -offset indent
360.It Va bh_tstamp 375.It Va bh_tstamp
361The time at which the packet was processed by the packet filter. 376The time at which the packet was processed by the packet filter.
362This structure differs from the standard 377This structure differs from the standard
363.Vt struct timeval 378.Vt struct timeval
364in that both members are of type 379in that both members are of type
365.Vt long . 380.Vt long .
366.It Va bh_caplen 381.It Va bh_caplen
367The length of the captured portion of the packet. 382The length of the captured portion of the packet.
368This is the minimum of 383This is the minimum of
369the truncation amount specified by the filter and the length of the packet. 384the truncation amount specified by the filter and the length of the packet.
370.It Va bh_datalen 385.It Va bh_datalen
371The length of the packet off the wire. 386The length of the packet off the wire.
372This value is independent of the truncation amount specified by the filter. 387This value is independent of the truncation amount specified by the filter.
373.It Va bh_hdrlen 388.It Va bh_hdrlen
374The length of the BPF header, which may not be equal to 389The length of the BPF header, which may not be equal to
375.Em sizeof(struct bpf_hdr) . 390.Em sizeof(struct bpf_hdr) .
376.El 391.El
377.Pp 392.Pp
378The 393The
379.Va bh_hdrlen 394.Va bh_hdrlen
380field exists to account for 395field exists to account for
381padding between the header and the link level protocol. 396padding between the header and the link level protocol.
382The purpose here is to guarantee proper alignment of the packet 397The purpose here is to guarantee proper alignment of the packet
383data structures, which is required on alignment sensitive 398data structures, which is required on alignment sensitive
384architectures and improves performance on many other architectures. 399architectures and improves performance on many other architectures.
385The packet filter ensures that the 400The packet filter ensures that the
386.Va bpf_hdr 401.Va bpf_hdr
387and the 402and the
388.Em network layer 403.Em network layer
389header will be word aligned. 404header will be word aligned.
390Suitable precautions must be taken when accessing the link layer 405Suitable precautions must be taken when accessing the link layer
391protocol fields on alignment restricted machines. 406protocol fields on alignment restricted machines.
392(This isn't a problem on an Ethernet, since 407(This isn't a problem on an Ethernet, since
393the type field is a short falling on an even offset, 408the type field is a short falling on an even offset,
394and the addresses are probably accessed in a bytewise fashion). 409and the addresses are probably accessed in a bytewise fashion).
395.Pp 410.Pp
396Additionally, individual packets are padded so that each starts 411Additionally, individual packets are padded so that each starts
397on a word boundary. 412on a word boundary.
398This requires that an application 413This requires that an application
399has some knowledge of how to get from packet to packet. 414has some knowledge of how to get from packet to packet.
400The macro 415The macro
401.Dv BPF_WORDALIGN 416.Dv BPF_WORDALIGN
402is defined in 417is defined in
403.Aq Pa net/bpf.h 418.Aq Pa net/bpf.h
404to facilitate this process. 419to facilitate this process.
405It rounds up its argument 420It rounds up its argument
406to the nearest word aligned value (where a word is 421to the nearest word aligned value (where a word is
407.Dv BPF_ALIGNMENT 422.Dv BPF_ALIGNMENT
408bytes wide). 423bytes wide).
409.Pp 424.Pp
410For example, if 425For example, if
411.Sq Va p 426.Sq Va p
412points to the start of a packet, this expression 427points to the start of a packet, this expression
413will advance it to the next packet: 428will advance it to the next packet:
414.Pp 429.Pp
415.Dl p = (char *)p + BPF_WORDALIGN(p-\*[Gt]bh_hdrlen + p-\*[Gt]bh_caplen) 430.Dl p = (char *)p + BPF_WORDALIGN(p-\*[Gt]bh_hdrlen + p-\*[Gt]bh_caplen)
416.Pp 431.Pp
417For the alignment mechanisms to work properly, the 432For the alignment mechanisms to work properly, the
418buffer passed to 433buffer passed to
419.Xr read 2 434.Xr read 2
420must itself be word aligned. 435must itself be word aligned.
421.Xr malloc 3 436.Xr malloc 3
422will always return an aligned buffer. 437will always return an aligned buffer.
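The packet-to-packet stepping described in this section can be sketched as a loop over the buffer returned by read(2). To keep the sketch buildable without net/bpf.h, it uses stand-ins that are assumptions of this example: struct demo_bpf_hdr mirrors the layout shown above, DEMO_WORDALIGN mirrors BPF_WORDALIGN with an assumed word size of sizeof(long), and demo_put_record exists only to fabricate input. Real code should use struct bpf_hdr and BPF_WORDALIGN from the header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for <net/bpf.h> definitions (assumed word size: sizeof(long)). */
#define DEMO_ALIGNMENT sizeof(long)
#define DEMO_WORDALIGN(x) \
	(((x) + (DEMO_ALIGNMENT - 1)) & ~(DEMO_ALIGNMENT - 1))

struct demo_bpf_hdr {
	struct { long tv_sec; long tv_usec; } bh_tstamp;
	uint32_t bh_caplen;
	uint32_t bh_datalen;
	uint16_t bh_hdrlen;
};

/*
 * Append one record (header, padding, payload) at offset off and return
 * the offset of the next record. Used only to build sample input.
 */
static size_t
demo_put_record(unsigned char *buf, size_t off, const void *data, uint32_t len)
{
	struct demo_bpf_hdr h;

	memset(&h, 0, sizeof(h));
	h.bh_caplen = len;
	h.bh_datalen = len;
	h.bh_hdrlen = (uint16_t)DEMO_WORDALIGN(sizeof(h));
	memcpy(buf + off, &h, sizeof(h));
	memcpy(buf + off + h.bh_hdrlen, data, len);
	return off + DEMO_WORDALIGN((size_t)h.bh_hdrlen + len);
}

/*
 * Walk nbytes of read(2) output. Returns the number of records seen and
 * stores the sum of their captured lengths in *total.
 */
static size_t
demo_walk(const unsigned char *buf, size_t nbytes, size_t *total)
{
	size_t off = 0, n = 0;

	*total = 0;
	while (off + sizeof(struct demo_bpf_hdr) <= nbytes) {
		struct demo_bpf_hdr h;

		/* Copy the header out so the access is alignment-safe. */
		memcpy(&h, buf + off, sizeof(h));
		if (h.bh_hdrlen == 0 ||
		    off + h.bh_hdrlen + h.bh_caplen > nbytes)
			break;	/* truncated or malformed record */
		/* The packet data starts at buf + off + h.bh_hdrlen. */
		*total += h.bh_caplen;
		n++;
		off += DEMO_WORDALIGN((size_t)h.bh_hdrlen + h.bh_caplen);
	}
	return n;
}
```

The loop advances by the word-aligned sum of bh_hdrlen and bh_caplen, never by sizeof(struct bpf_hdr), which is the point of the bh_hdrlen field.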
423.Sh FILTER MACHINE 438.Sh FILTER MACHINE
424A filter program is an array of instructions, with all branches forwardly 439A filter program is an array of instructions, with all branches forwardly
425directed, terminated by a 440directed, terminated by a
426.Sy return 441.Sy return
427instruction. 442instruction.
428Each instruction performs some action on the pseudo-machine state, 443Each instruction performs some action on the pseudo-machine state,
429which consists of an accumulator, index register, scratch memory store, 444which consists of an accumulator, index register, scratch memory store,
430and implicit program counter. 445and implicit program counter.
431.Pp 446.Pp
432The following structure defines the instruction format: 447The following structure defines the instruction format:
433.Bd -literal -offset indent 448.Bd -literal -offset indent
434struct bpf_insn { 449struct bpf_insn {
435 uint16_t code; 450 uint16_t code;
436 u_char jt; 451 u_char jt;
437 u_char jf; 452 u_char jf;
438 int32_t k; 453 int32_t k;
439}; 454};
440.Ed 455.Ed
441.Pp 456.Pp
442The 457The
443.Va k 458.Va k
444field is used in different ways by different instructions, 459field is used in different ways by different instructions,
445and the 460and the
446.Va jt 461.Va jt
447and 462and
448.Va jf 463.Va jf
449fields are used as offsets 464fields are used as offsets
450by the branch instructions. 465by the branch instructions.
451The opcodes are encoded in a semi-hierarchical fashion. 466The opcodes are encoded in a semi-hierarchical fashion.
452There are eight classes of instructions: BPF_LD, BPF_LDX, BPF_ST, BPF_STX, 467There are eight classes of instructions: BPF_LD, BPF_LDX, BPF_ST, BPF_STX,
453BPF_ALU, BPF_JMP, BPF_RET, and BPF_MISC. 468BPF_ALU, BPF_JMP, BPF_RET, and BPF_MISC.
454Various other mode and 469Various other mode and
455operator bits are or'd into the class to give the actual instructions. 470operator bits are or'd into the class to give the actual instructions.
456The classes and modes are defined in 471The classes and modes are defined in
457.Aq Pa net/bpf.h . 472.Aq Pa net/bpf.h .
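As an illustration of how a class is combined with mode and operator bits, the sketch below encodes a four-instruction program that accepts IPv4 packets on an Ethernet and rejects everything else. The DEMO_ constants and struct demo_bpf_insn are local stand-ins (an assumption of this example) carrying the classic BPF opcode values; real code should take the definitions from net/bpf.h. The demo_check helper is hypothetical and only verifies the two structural properties stated above: conditional branch targets stay inside the program, and the program ends with a return.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the opcode bits defined in <net/bpf.h>. */
#define DEMO_BPF_LD	0x00	/* class: load into accumulator */
#define DEMO_BPF_JMP	0x05	/* class: branch */
#define DEMO_BPF_RET	0x06	/* class: return */
#define DEMO_BPF_H	0x08	/* size: halfword */
#define DEMO_BPF_ABS	0x20	/* mode: packet data at fixed offset */
#define DEMO_BPF_JEQ	0x10	/* operator: jump if equal */
#define DEMO_BPF_K	0x00	/* source: constant k */
#define DEMO_BPF_CLASS(code)	((code) & 0x07)

struct demo_bpf_insn {
	uint16_t code;
	unsigned char jt;
	unsigned char jf;
	int32_t k;
};

static const struct demo_bpf_insn demo_ip_filter[] = {
	/* A = P[12:2], the Ethernet type field */
	{ DEMO_BPF_LD | DEMO_BPF_H | DEMO_BPF_ABS, 0, 0, 12 },
	/* if A == 0x0800 (IPv4) fall through, else skip one instruction */
	{ DEMO_BPF_JMP | DEMO_BPF_JEQ | DEMO_BPF_K, 0, 1, 0x0800 },
	/* accept the whole packet */
	{ DEMO_BPF_RET | DEMO_BPF_K, 0, 0, -1 },
	/* reject the packet */
	{ DEMO_BPF_RET | DEMO_BPF_K, 0, 0, 0 },
};

#define DEMO_NINSNS (sizeof(demo_ip_filter) / sizeof(demo_ip_filter[0]))

/*
 * Returns 1 if the program ends in a return instruction and every
 * jt/jf offset lands inside the program, 0 otherwise. Offsets are
 * relative to the instruction after the branch and are unsigned,
 * so branches can only go forward.
 */
static int
demo_check(const struct demo_bpf_insn *p, size_t n)
{
	size_t i;

	if (n == 0 || DEMO_BPF_CLASS(p[n - 1].code) != DEMO_BPF_RET)
		return 0;
	for (i = 0; i < n; i++) {
		if (DEMO_BPF_CLASS(p[i].code) != DEMO_BPF_JMP)
			continue;
		if (i + 1 + p[i].jt >= n || i + 1 + p[i].jf >= n)
			return 0;
	}
	return 1;
}
```

Such an array would be handed to the kernel through the bf_insns and bf_len fields of struct bpf_program with BIOCSETF.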
458.Pp 473.Pp
459Below are the semantics for each defined BPF instruction. 474Below are the semantics for each defined BPF instruction.
460We use the convention that A is the accumulator, X is the index register, 475We use the convention that A is the accumulator, X is the index register,
461P[] packet data, and M[] scratch memory store. 476P[] packet data, and M[] scratch memory store.
462P[i:n] gives the data at byte offset 477P[i:n] gives the data at byte offset
463.Dq i 478.Dq i
464in the packet, 479in the packet,
465interpreted as a word (n=4), 480interpreted as a word (n=4),
466unsigned halfword (n=2), or unsigned byte (n=1). 481unsigned halfword (n=2), or unsigned byte (n=1).
467M[i] gives the i'th word in the scratch memory store, which is only 482M[i] gives the i'th word in the scratch memory store, which is only
468addressed in word units. 483addressed in word units.
469The memory store is indexed from 0 to BPF_MEMWORDS-1. 484The memory store is indexed from 0 to BPF_MEMWORDS-1.
470.Va k , 485.Va k ,
471.Va jt , 486.Va jt ,
472and 487and
473.Va jf 488.Va jf
474are the corresponding fields in the 489are the corresponding fields in the
475instruction definition. 490instruction definition.
476.Dq len 491.Dq len
477refers to the length of the packet. 492refers to the length of the packet.
478.Bl -tag -width indent -offset indent 493.Bl -tag -width indent -offset indent
479.It Sy BPF_LD 494.It Sy BPF_LD
480These instructions copy a value into the accumulator. 495These instructions copy a value into the accumulator.
481The type of the source operand is specified by an 496The type of the source operand is specified by an
482.Dq addressing mode 497.Dq addressing mode
483and can be a constant 498and can be a constant
484.Sy ( BPF_IMM ) , 499.Sy ( BPF_IMM ) ,
485packet data at a fixed offset 500packet data at a fixed offset
486.Sy ( BPF_ABS ) , 501.Sy ( BPF_ABS ) ,
487packet data at a variable offset 502packet data at a variable offset
488.Sy ( BPF_IND ) , 503.Sy ( BPF_IND ) ,
489the packet length 504the packet length
490.Sy ( BPF_LEN ) , 505.Sy ( BPF_LEN ) ,
491or a word in the scratch memory store 506or a word in the scratch memory store
492.Sy ( BPF_MEM ) . 507.Sy ( BPF_MEM ) .
493For 508For
494.Sy BPF_IND 509.Sy BPF_IND
495and 510and
496.Sy BPF_ABS , 511.Sy BPF_ABS ,
497the data size must be specified as a word 512the data size must be specified as a word
498.Sy ( BPF_W ) , 513.Sy ( BPF_W ) ,
499halfword 514halfword
500.Sy ( BPF_H ) , 515.Sy ( BPF_H ) ,
501or byte 516or byte
502.Sy ( BPF_B ) . 517.Sy ( BPF_B ) .
503The semantics of all the recognized BPF_LD instructions follow. 518The semantics of all the recognized BPF_LD instructions follow.
504.Bl -column "BPF_LD_BPF_W_BPF_ABS" "A \*[Lt]- P[k:4]" -offset indent 519.Bl -column "BPF_LD_BPF_W_BPF_ABS" "A \*[Lt]- P[k:4]" -offset indent
505.It Sy BPF_LD+BPF_W+BPF_ABS Ta A \*[Lt]- P[k:4] 520.It Sy BPF_LD+BPF_W+BPF_ABS Ta A \*[Lt]- P[k:4]
506.It Sy BPF_LD+BPF_H+BPF_ABS Ta A \*[Lt]- P[k:2] 521.It Sy BPF_LD+BPF_H+BPF_ABS Ta A \*[Lt]- P[k:2]
507.It Sy BPF_LD+BPF_B+BPF_ABS Ta A \*[Lt]- P[k:1] 522.It Sy BPF_LD+BPF_B+BPF_ABS Ta A \*[Lt]- P[k:1]
508.It Sy BPF_LD+BPF_W+BPF_IND Ta A \*[Lt]- P[X+k:4] 523.It Sy BPF_LD+BPF_W+BPF_IND Ta A \*[Lt]- P[X+k:4]
509.It Sy BPF_LD+BPF_H+BPF_IND Ta A \*[Lt]- P[X+k:2] 524.It Sy BPF_LD+BPF_H+BPF_IND Ta A \*[Lt]- P[X+k:2]
510.It Sy BPF_LD+BPF_B+BPF_IND Ta A \*[Lt]- P[X+k:1] 525.It Sy BPF_LD+BPF_B+BPF_IND Ta A \*[Lt]- P[X+k:1]
511.It Sy BPF_LD+BPF_W+BPF_LEN Ta A \*[Lt]- len 526.It Sy BPF_LD+BPF_W+BPF_LEN Ta A \*[Lt]- len
512.It Sy BPF_LD+BPF_IMM Ta A \*[Lt]- k 527.It Sy BPF_LD+BPF_IMM Ta A \*[Lt]- k
513.It Sy BPF_LD+BPF_MEM Ta A \*[Lt]- M[k] 528.It Sy BPF_LD+BPF_MEM Ta A \*[Lt]- M[k]
514.El 529.El
515.It Sy BPF_LDX 530.It Sy BPF_LDX
516These instructions load a value into the index register. 531These instructions load a value into the index register.
517Note that the addressing modes are more restricted than those of 532Note that the addressing modes are more restricted than those of
518the accumulator loads, but they include 533the accumulator loads, but they include
519.Sy BPF_MSH , 534.Sy BPF_MSH ,
520a hack for efficiently loading the IP header length. 535a hack for efficiently loading the IP header length.
521.Bl -column "BPF_LDX_BPF_W_BPF_IMM" "X \*[Lt]- k" -offset indent 536.Bl -column "BPF_LDX_BPF_W_BPF_IMM" "X \*[Lt]- k" -offset indent
522.It Sy BPF_LDX+BPF_W+BPF_IMM Ta X \*[Lt]- k 537.It Sy BPF_LDX+BPF_W+BPF_IMM Ta X \*[Lt]- k
523.It Sy BPF_LDX+BPF_W+BPF_MEM Ta X \*[Lt]- M[k] 538.It Sy BPF_LDX+BPF_W+BPF_MEM Ta X \*[Lt]- M[k]
524.It Sy BPF_LDX+BPF_W+BPF_LEN Ta X \*[Lt]- len 539.It Sy BPF_LDX+BPF_W+BPF_LEN Ta X \*[Lt]- len
525.It Sy BPF_LDX+BPF_B+BPF_MSH Ta X \*[Lt]- 4*(P[k:1]\*[Am]0xf) 540.It Sy BPF_LDX+BPF_B+BPF_MSH Ta X \*[Lt]- 4*(P[k:1]\*[Am]0xf)
526.El 541.El
527.It Sy BPF_ST 542.It Sy BPF_ST
528This instruction stores the accumulator into the scratch memory. 543This instruction stores the accumulator into the scratch memory.
529We do not need an addressing mode since there is only one possibility 544We do not need an addressing mode since there is only one possibility
530for the destination. 545for the destination.
531.Bl -column "BPF_ST" "M[k] \*[Lt]- A" -offset indent 546.Bl -column "BPF_ST" "M[k] \*[Lt]- A" -offset indent
532.It Sy BPF_ST Ta M[k] \*[Lt]- A 547.It Sy BPF_ST Ta M[k] \*[Lt]- A
533.El 548.El
534.It Sy BPF_STX 549.It Sy BPF_STX
535This instruction stores the index register in the scratch memory store. 550This instruction stores the index register in the scratch memory store.
536.Bl -column "BPF_STX" "M[k] \*[Lt]- X" -offset indent 551.Bl -column "BPF_STX" "M[k] \*[Lt]- X" -offset indent
537.It Sy BPF_STX Ta M[k] \*[Lt]- X 552.It Sy BPF_STX Ta M[k] \*[Lt]- X
538.El 553.El
539.It Sy BPF_ALU 554.It Sy BPF_ALU
540The ALU instructions perform operations between the accumulator and 555The ALU instructions perform operations between the accumulator and
541index register or constant, and store the result back in the accumulator. 556index register or constant, and store the result back in the accumulator.
542For binary operations, a source mode is required 557For binary operations, a source mode is required
543.Sy ( BPF_K 558.Sy ( BPF_K
544or 559or
545.Sy BPF_X ) . 560.Sy BPF_X ) .
546.Bl -column "BPF_ALU_BPF_ADD_BPF_K" "A \*[Lt]- A + k" -offset indent 561.Bl -column "BPF_ALU_BPF_ADD_BPF_K" "A \*[Lt]- A + k" -offset indent
547.It Sy BPF_ALU+BPF_ADD+BPF_K Ta A \*[Lt]- A + k 562.It Sy BPF_ALU+BPF_ADD+BPF_K Ta A \*[Lt]- A + k
548.It Sy BPF_ALU+BPF_SUB+BPF_K Ta A \*[Lt]- A - k 563.It Sy BPF_ALU+BPF_SUB+BPF_K Ta A \*[Lt]- A - k
549.It Sy BPF_ALU+BPF_MUL+BPF_K Ta A \*[Lt]- A * k 564.It Sy BPF_ALU+BPF_MUL+BPF_K Ta A \*[Lt]- A * k
550.It Sy BPF_ALU+BPF_DIV+BPF_K Ta A \*[Lt]- A / k 565.It Sy BPF_ALU+BPF_DIV+BPF_K Ta A \*[Lt]- A / k
551.It Sy BPF_ALU+BPF_AND+BPF_K Ta A \*[Lt]- A \*[Am] k 566.It Sy BPF_ALU+BPF_AND+BPF_K Ta A \*[Lt]- A \*[Am] k
552.It Sy BPF_ALU+BPF_OR+BPF_K Ta A \*[Lt]- A | k 567.It Sy BPF_ALU+BPF_OR+BPF_K Ta A \*[Lt]- A | k
553.It Sy BPF_ALU+BPF_LSH+BPF_K Ta A \*[Lt]- A \*[Lt]\*[Lt] k 568.It Sy BPF_ALU+BPF_LSH+BPF_K Ta A \*[Lt]- A \*[Lt]\*[Lt] k
554.It Sy BPF_ALU+BPF_RSH+BPF_K Ta A \*[Lt]- A \*[Gt]\*[Gt] k 569.It Sy BPF_ALU+BPF_RSH+BPF_K Ta A \*[Lt]- A \*[Gt]\*[Gt] k
555.It Sy BPF_ALU+BPF_ADD+BPF_X Ta A \*[Lt]- A + X 570.It Sy BPF_ALU+BPF_ADD+BPF_X Ta A \*[Lt]- A + X
556.It Sy BPF_ALU+BPF_SUB+BPF_X Ta A \*[Lt]- A - X 571.It Sy BPF_ALU+BPF_SUB+BPF_X Ta A \*[Lt]- A - X
557.It Sy BPF_ALU+BPF_MUL+BPF_X Ta A \*[Lt]- A * X 572.It Sy BPF_ALU+BPF_MUL+BPF_X Ta A \*[Lt]- A * X
558.It Sy BPF_ALU+BPF_DIV+BPF_X Ta A \*[Lt]- A / X 573.It Sy BPF_ALU+BPF_DIV+BPF_X Ta A \*[Lt]- A / X
559.It Sy BPF_ALU+BPF_AND+BPF_X Ta A \*[Lt]- A \*[Am] X 574.It Sy BPF_ALU+BPF_AND+BPF_X Ta A \*[Lt]- A \*[Am] X
560.It Sy BPF_ALU+BPF_OR+BPF_X Ta A \*[Lt]- A | X 575.It Sy BPF_ALU+BPF_OR+BPF_X Ta A \*[Lt]- A | X
561.It Sy BPF_ALU+BPF_LSH+BPF_X Ta A \*[Lt]- A \*[Lt]\*[Lt] X 576.It Sy BPF_ALU+BPF_LSH+BPF_X Ta A \*[Lt]- A \*[Lt]\*[Lt] X
562.It Sy BPF_ALU+BPF_RSH+BPF_X Ta A \*[Lt]- A \*[Gt]\*[Gt] X 577.It Sy BPF_ALU+BPF_RSH+BPF_X Ta A \*[Lt]- A \*[Gt]\*[Gt] X
563.It Sy BPF_ALU+BPF_NEG Ta A \*[Lt]- -A 578.It Sy BPF_ALU+BPF_NEG Ta A \*[Lt]- -A
564.El 579.El
565.It Sy BPF_JMP 580.It Sy BPF_JMP
566The jump instructions alter flow of control. 581The jump instructions alter flow of control.
567Conditional jumps compare the accumulator against a constant 582Conditional jumps compare the accumulator against a constant
568.Sy ( BPF_K ) 583.Sy ( BPF_K )
569or the index register 584or the index register
570.Sy ( BPF_X ) . 585.Sy ( BPF_X ) .
571If the result is true (or non-zero), 586If the result is true (or non-zero),
572the true branch is taken, otherwise the false branch is taken. 587the true branch is taken, otherwise the false branch is taken.
573Jump offsets are encoded in 8 bits so the longest jump is 256 instructions. 588Jump offsets are encoded in 8 bits so the longest jump is 256 instructions.
574However, the jump always 589However, the jump always
575.Sy ( BPF_JA ) 590.Sy ( BPF_JA )
576opcode uses the 32-bit 591opcode uses the 32-bit
577.Va k 592.Va k
578field as the offset, allowing arbitrarily distant destinations. 593field as the offset, allowing arbitrarily distant destinations.
579All conditionals use unsigned comparison conventions. 594All conditionals use unsigned comparison conventions.
580.Bl -column "BPF_JMP+BPF_JGE+BPF_K" "pc += (A \*[Ge] k) ? jt : jf" -offset indent 595.Bl -column "BPF_JMP+BPF_JGE+BPF_K" "pc += (A \*[Ge] k) ? jt : jf" -offset indent
581.It Sy BPF_JMP+BPF_JA Ta pc += k 596.It Sy BPF_JMP+BPF_JA Ta pc += k
582.It Sy BPF_JMP+BPF_JGT+BPF_K Ta "pc += (A \*[Gt] k) ? jt : jf" 597.It Sy BPF_JMP+BPF_JGT+BPF_K Ta "pc += (A \*[Gt] k) ? jt : jf"
583.It Sy BPF_JMP+BPF_JGE+BPF_K Ta "pc += (A \*[Ge] k) ? jt : jf" 598.It Sy BPF_JMP+BPF_JGE+BPF_K Ta "pc += (A \*[Ge] k) ? jt : jf"
584.It Sy BPF_JMP+BPF_JEQ+BPF_K Ta "pc += (A == k) ? jt : jf" 599.It Sy BPF_JMP+BPF_JEQ+BPF_K Ta "pc += (A == k) ? jt : jf"
585.It Sy BPF_JMP+BPF_JSET+BPF_K Ta "pc += (A \*[Am] k) ? jt : jf" 600.It Sy BPF_JMP+BPF_JSET+BPF_K Ta "pc += (A \*[Am] k) ? jt : jf"
586.It Sy BPF_JMP+BPF_JGT+BPF_X Ta "pc += (A \*[Gt] X) ? jt : jf" 601.It Sy BPF_JMP+BPF_JGT+BPF_X Ta "pc += (A \*[Gt] X) ? jt : jf"
587.It Sy BPF_JMP+BPF_JGE+BPF_X Ta "pc += (A \*[Ge] X) ? jt : jf" 602.It Sy BPF_JMP+BPF_JGE+BPF_X Ta "pc += (A \*[Ge] X) ? jt : jf"
588.It Sy BPF_JMP+BPF_JEQ+BPF_X Ta "pc += (A == X) ? jt : jf" 603.It Sy BPF_JMP+BPF_JEQ+BPF_X Ta "pc += (A == X) ? jt : jf"
589.It Sy BPF_JMP+BPF_JSET+BPF_X Ta "pc += (A \*[Am] X) ? jt : jf" 604.It Sy BPF_JMP+BPF_JSET+BPF_X Ta "pc += (A \*[Am] X) ? jt : jf"
590.El 605.El
591.It Sy BPF_RET 606.It Sy BPF_RET
592The return instructions terminate the filter program and specify the amount 607The return instructions terminate the filter program and specify the amount
593of packet to accept (i.e., they return the truncation amount). 608of packet to accept (i.e., they return the truncation amount).
594A return value of zero indicates that the packet should be ignored. 609A return value of zero indicates that the packet should be ignored.
595The return value is either a constant 610The return value is either a constant
596.Sy ( BPF_K ) 611.Sy ( BPF_K )
597or the accumulator 612or the accumulator
598.Sy ( BPF_A ) . 613.Sy ( BPF_A ) .
599.Bl -column "BPF_RET+BPF_A" "accept A bytes" -offset indent 614.Bl -column "BPF_RET+BPF_A" "accept A bytes" -offset indent
600.It Sy BPF_RET+BPF_A Ta accept A bytes 615.It Sy BPF_RET+BPF_A Ta accept A bytes
601.It Sy BPF_RET+BPF_K Ta accept k bytes 616.It Sy BPF_RET+BPF_K Ta accept k bytes
602.El 617.El
603.It Sy BPF_MISC 618.It Sy BPF_MISC
604The miscellaneous category was created for anything that doesn't 619The miscellaneous category was created for anything that doesn't
605fit into the above classes, and for any new instructions that might need to 620fit into the above classes, and for any new instructions that might need to
606be added. 621be added.
607Currently, these are the register transfer instructions 622Currently, these are the register transfer instructions
608that copy the index register to the accumulator or vice versa. 623that copy the index register to the accumulator or vice versa.
609.Bl -column "BPF_MISC+BPF_TAX" "X \*[Lt]- A" -offset indent 624.Bl -column "BPF_MISC+BPF_TAX" "X \*[Lt]- A" -offset indent
610.It Sy BPF_MISC+BPF_TAX Ta X \*[Lt]- A 625.It Sy BPF_MISC+BPF_TAX Ta X \*[Lt]- A
611.It Sy BPF_MISC+BPF_TXA Ta A \*[Lt]- X 626.It Sy BPF_MISC+BPF_TXA Ta A \*[Lt]- X
612.El 627.El
613.El 628.El
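The semantics listed above can be sketched as a small user-space interpreter. This is not the kernel's implementation: it handles only a subset of the opcodes, the names insn, load, and run_filter are hypothetical, and the opcode values are local stand-ins for net/bpf.h. It assumes multi-byte packet loads are read in network byte order and that any out-of-range load rejects the packet.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for <net/bpf.h>; only a subset of opcodes is handled. */
enum {
	BPF_LD = 0x00, BPF_LDX = 0x01, BPF_ALU = 0x04, BPF_JMP = 0x05,
	BPF_RET = 0x06, BPF_MISC = 0x07,
	BPF_W = 0x00, BPF_H = 0x08, BPF_B = 0x10,
	BPF_IMM = 0x00, BPF_ABS = 0x20, BPF_IND = 0x40, BPF_MSH = 0xa0,
	BPF_AND = 0x50, BPF_JA = 0x00, BPF_JEQ = 0x10, BPF_JGT = 0x20,
	BPF_JSET = 0x40, BPF_K = 0x00, BPF_A = 0x10, BPF_TAX = 0x00
};

/* Hypothetical mirror of struct bpf_insn. */
struct insn {
	uint16_t code;
	uint8_t jt;
	uint8_t jf;
	int32_t k;
};

/* P[i:n]: read n bytes (1, 2 or 4) at offset i, most significant first.
 * Clears *ok and returns 0 when the read would run past the packet. */
static uint32_t
load(const uint8_t *p, size_t len, uint32_t i, unsigned n, int *ok)
{
	uint32_t v = 0;

	if (i > len || n > len - i) {
		*ok = 0;
		return 0;
	}
	while (n-- > 0)
		v = (v << 8) | p[i++];
	return v;
}

/* Run prog (n instructions) over packet p; returns the number of bytes
 * to accept, or 0 to reject the packet. */
static uint32_t
run_filter(const struct insn *prog, size_t n, const uint8_t *p, size_t len)
{
	uint32_t A = 0, X = 0, k;
	size_t pc = 0;
	int ok = 1;

	while (pc < n) {
		const struct insn *in = &prog[pc++];	/* pc now names the next insn */

		k = (uint32_t)in->k;
		switch (in->code) {
		case BPF_LD|BPF_IMM:		A = k; break;
		case BPF_LD|BPF_W|BPF_ABS:	A = load(p, len, k, 4, &ok); break;
		case BPF_LD|BPF_H|BPF_ABS:	A = load(p, len, k, 2, &ok); break;
		case BPF_LD|BPF_B|BPF_ABS:	A = load(p, len, k, 1, &ok); break;
		case BPF_LD|BPF_H|BPF_IND:	A = load(p, len, X + k, 2, &ok); break;
		case BPF_LDX|BPF_B|BPF_MSH:	X = 4 * (load(p, len, k, 1, &ok) & 0xf); break;
		case BPF_ALU|BPF_AND|BPF_K:	A &= k; break;
		case BPF_MISC|BPF_TAX:		X = A; break;
		case BPF_JMP|BPF_JA:		pc += k; break;
		case BPF_JMP|BPF_JEQ|BPF_K:	pc += (A == k) ? in->jt : in->jf; break;
		case BPF_JMP|BPF_JGT|BPF_K:	pc += (A > k) ? in->jt : in->jf; break;
		case BPF_JMP|BPF_JSET|BPF_K:	pc += (A & k) ? in->jt : in->jf; break;
		case BPF_RET|BPF_K:		return k;
		case BPF_RET|BPF_A:		return A;
		default:			return 0;	/* opcode not in this sketch */
		}
		if (!ok)
			return 0;	/* load outside the packet: reject */
	}
	return 0;	/* ran off the end without a BPF_RET */
}
```

Because the program counter is advanced before the instruction executes, a jump offset of 0 simply falls through to the next instruction.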
614.Pp 629.Pp
615The BPF interface provides the following macros to facilitate 630The BPF interface provides the following macros to facilitate
616array initializers: 631array initializers:
617.Bd -unfilled -offset indent 632.Bd -unfilled -offset indent
618.Sy BPF_STMT No (opcode, operand) 633.Sy BPF_STMT No (opcode, operand)
619.Sy BPF_JUMP No (opcode, operand, true_offset, false_offset) 634.Sy BPF_JUMP No (opcode, operand, true_offset, false_offset)
620.Ed 635.Ed
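A minimal sketch of how such initializer macros can work is given below. The STMT and JUMP macros and the insn structure are hypothetical stand-ins, named differently from the real ones so they cannot be mistaken for the definitions in net/bpf.h; the expansions shown are an assumption about the general shape, not a copy of the header.

```c
#include <stdint.h>

/* Hypothetical mirror of struct bpf_insn, so the sketch compiles anywhere. */
struct insn {
	uint16_t code;
	uint8_t jt;
	uint8_t jf;
	int32_t k;
};

/* Assumed shape: each macro expands to one brace-enclosed initializer.
 * Statements have no branch targets, so jt and jf are zero. */
#define STMT(code, k)		{ (uint16_t)(code), 0, 0, (k) }
#define JUMP(code, k, jt, jf)	{ (uint16_t)(code), (jt), (jf), (k) }

/* "If A == 0x0800 accept 64 bytes, otherwise reject." */
static const struct insn demo[] = {
	JUMP(0x15, 0x0800, 0, 1),	/* BPF_JMP+BPF_JEQ+BPF_K */
	STMT(0x06, 64),			/* BPF_RET+BPF_K */
	STMT(0x06, 0),			/* BPF_RET+BPF_K */
};
```

Note the argument order: the operand comes second in the macro call but is stored last in the structure.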
621.Sh SYSCTLS 636.Sh SYSCTLS
622The following sysctls are available when 637The following sysctls are available when
623.Nm 638.Nm
624is enabled: 639is enabled:
625.Pp 640.Pp
626.Bl -tag -width "XnetXbpfXmaxbufsizeXX" 641.Bl -tag -width "XnetXbpfXmaxbufsizeXX"
627.It Li net.bpf.maxbufsize 642.It Li net.bpf.maxbufsize
628Sets the maximum buffer size available for 643Sets the maximum buffer size available for
629.Nm 644.Nm
630peers. 645peers.
631.It Li net.bpf.stats 646.It Li net.bpf.stats
632Shows 647Shows
633.Nm 648.Nm
634statistics. 649statistics.
635They can be retrieved with the 650They can be retrieved with the
636.Xr netstat 1 651.Xr netstat 1
637utility. 652utility.
638.It Li net.bpf.peers 653.It Li net.bpf.peers
639Shows the current 654Shows the current
640.Nm 655.Nm
641peers. 656peers.
642This is only available to the superuser and can also be retrieved with the 657This is only available to the superuser and can also be retrieved with the
643.Xr netstat 1 658.Xr netstat 1
644utility. 659utility.
645.El 660.El
646.Sh FILES 661.Sh FILES
647.Pa /dev/bpf 662.Pa /dev/bpf
648.Sh EXAMPLES 663.Sh EXAMPLES
649The following filter is taken from the Reverse ARP Daemon. 664The following filter is taken from the Reverse ARP Daemon.
650It accepts only Reverse ARP requests. 665It accepts only Reverse ARP requests.
651.Bd -literal -offset indent 666.Bd -literal -offset indent
652struct bpf_insn insns[] = { 667struct bpf_insn insns[] = {
653 BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12), 668 BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
654 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3), 669 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
655 BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20), 670 BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
656 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1), 671 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
657 BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) + 672 BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
658 sizeof(struct ether_header)), 673 sizeof(struct ether_header)),
659 BPF_STMT(BPF_RET+BPF_K, 0), 674 BPF_STMT(BPF_RET+BPF_K, 0),
660}; 675};
661.Ed 676.Ed
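When reading the jump offsets in this filter, it helps to remember that they are counted from the instruction following the jump. The helper below (branch_target is a hypothetical name) makes the arithmetic explicit, with instructions numbered from 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Index of the instruction reached when the jump at index pc is taken
 * with offset off.  Offsets are relative to the next instruction. */
static size_t
branch_target(size_t pc, uint8_t off)
{
	return pc + 1 + off;
}
```

In the Reverse ARP filter, the first BPF_JEQ is instruction 1 with a false offset of 3, so a mismatch lands on instruction 5, the final BPF_RET that rejects the packet. The second BPF_JEQ is instruction 3 with a false offset of 1, which reaches the same instruction.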
662.Pp 677.Pp
663This filter accepts only IP packets between host 128.3.112.15 and 678This filter accepts only IP packets between host 128.3.112.15 and
664128.3.112.35. 679128.3.112.35.
665.Bd -literal -offset indent 680.Bd -literal -offset indent
666struct bpf_insn insns[] = { 681struct bpf_insn insns[] = {
667 BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12), 682 BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
668 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 8), 683 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 8),
669 BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 26), 684 BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 26),
670 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 2), 685 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 2),
671 BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30), 686 BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
672 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 3, 4), 687 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 3, 4),
673 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 0, 3), 688 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 0, 3),
674 BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30), 689 BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
675 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 1), 690 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 1),
676 BPF_STMT(BPF_RET+BPF_K, (u_int)-1), 691 BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
677 BPF_STMT(BPF_RET+BPF_K, 0), 692 BPF_STMT(BPF_RET+BPF_K, 0),
678}; 693};
679.Ed 694.Ed
680.Pp 695.Pp
681Finally, this filter returns only TCP finger packets. 696Finally, this filter returns only TCP finger packets.
682We must parse the IP header to reach the TCP header. 697We must parse the IP header to reach the TCP header.
683The 698The
684.Sy BPF_JSET 699.Sy BPF_JSET
685instruction checks that the IP fragment offset is 0 so we are sure 700instruction checks that the IP fragment offset is 0 so we are sure
686that we have a TCP header. 701that we have a TCP header.
687.Bd -literal -offset indent 702.Bd -literal -offset indent
688struct bpf_insn insns[] = { 703struct bpf_insn insns[] = {
689 BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12), 704 BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
690 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10), 705 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10),
691 BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23), 706 BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),
692 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8), 707 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8),
693 BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20), 708 BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
694 BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0), 709 BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0),
695 BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14), 710 BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),
696 BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14), 711 BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14),
697 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0), 712 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0),
698 BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16), 713 BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),
699 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1), 714 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1),
700 BPF_STMT(BPF_RET+BPF_K, (u_int)-1), 715 BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
701 BPF_STMT(BPF_RET+BPF_K, 0), 716 BPF_STMT(BPF_RET+BPF_K, 0),
702}; 717};
703.Ed 718.Ed
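The header arithmetic in the finger filter can be spelled out in plain C. The sketch below assumes an Ethernet frame carrying IPv4 with a 14-byte link header, as the filter's offsets imply; ETH_HDR and the three function names are hypothetical, and the caller is assumed to pass a buffer long enough for every access.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR 14	/* assumed Ethernet header length, as in the filter */

/* BPF_LDX+BPF_B+BPF_MSH with k = 14: X <- 4*(P[14:1] & 0xf).
 * The low nibble of the first IP byte is the header length in words. */
static uint32_t
ip_header_len(const uint8_t *pkt)
{
	return 4u * (pkt[ETH_HDR] & 0x0f);
}

/* BPF_JSET against 0x1fff on P[20:2]: true when the 13-bit fragment
 * offset is zero, so the TCP header is present in this packet. */
static int
is_first_fragment(const uint8_t *pkt)
{
	uint32_t v = ((uint32_t)pkt[ETH_HDR + 6] << 8) | pkt[ETH_HDR + 7];

	return (v & 0x1fff) == 0;
}

/* BPF_LD+BPF_H+BPF_IND with k = 16: A <- P[X+16:2], the TCP
 * destination port (k = 14 would give the source port). */
static uint32_t
tcp_dst_port(const uint8_t *pkt)
{
	size_t tcp = ETH_HDR + ip_header_len(pkt);

	return ((uint32_t)pkt[tcp + 2] << 8) | pkt[tcp + 3];
}
```

The variable offset is what makes BPF_IND necessary here: IP options change where the TCP header starts, so the port cannot be read at a fixed position.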
704.Sh SEE ALSO 719.Sh SEE ALSO
705.Xr ioctl 2 , 720.Xr ioctl 2 ,
706.Xr read 2 , 721.Xr read 2 ,
707.Xr select 2 , 722.Xr select 2 ,
708.Xr signal 3 , 723.Xr signal 3 ,
709.Xr tcpdump 8 724.Xr tcpdump 8
710.Rs 725.Rs
711.%T "The BSD Packet Filter: A New Architecture for User-level Packet Capture" 726.%T "The BSD Packet Filter: A New Architecture for User-level Packet Capture"
712.%A S. McCanne 727.%A S. McCanne
713.%A V. Jacobson 728.%A V. Jacobson
714.%J Proceedings of the 1993 Winter USENIX 729.%J Proceedings of the 1993 Winter USENIX
715.%C Technical Conference, San Diego, CA 730.%C Technical Conference, San Diego, CA
716.Re 731.Re
717.Sh HISTORY 732.Sh HISTORY
718The Enet packet filter was created in 1980 by Mike Accetta and 733The Enet packet filter was created in 1980 by Mike Accetta and
719Rick Rashid at Carnegie-Mellon University. 734Rick Rashid at Carnegie-Mellon University.
720Jeffrey Mogul, at Stanford, ported the code to BSD and continued 735Jeffrey Mogul, at Stanford, ported the code to BSD and continued
721its development from 1983 on. 736its development from 1983 on.
722Since then, it has evolved into the ULTRIX Packet Filter 737Since then, it has evolved into the ULTRIX Packet Filter
723at DEC, a STREAMS NIT module under SunOS 4.1, and BPF. 738at DEC, a STREAMS NIT module under SunOS 4.1, and BPF.
724.Sh AUTHORS 739.Sh AUTHORS
725Steven McCanne, of Lawrence Berkeley Laboratory, implemented BPF in 740Steven McCanne, of Lawrence Berkeley Laboratory, implemented BPF in
726Summer 1990. 741Summer 1990.
727The design was in collaboration with Van Jacobson, 742The design was in collaboration with Van Jacobson,
728also of Lawrence Berkeley Laboratory. 743also of Lawrence Berkeley Laboratory.
729.Sh BUGS 744.Sh BUGS
730The read buffer must be of a fixed size (returned by the 745The read buffer must be of a fixed size (returned by the
731.Dv BIOCGBLEN 746.Dv BIOCGBLEN
732ioctl). 747ioctl).
733.Pp 748.Pp
734A file that does not request promiscuous mode may receive promiscuously 749A file that does not request promiscuous mode may receive promiscuously
735received packets as a side effect of another file requesting this 750received packets as a side effect of another file requesting this
736mode on the same hardware interface. 751mode on the same hardware interface.
737This could be fixed in the kernel with additional processing overhead. 752This could be fixed in the kernel with additional processing overhead.
738However, we favor the model where 753However, we favor the model where
739all files must assume that the interface is promiscuous, and if 754all files must assume that the interface is promiscuous, and if
740so desired, must use a filter to reject foreign packets. 755so desired, must use a filter to reject foreign packets.
741.Pp 756.Pp
742Data link protocols with variable length headers are not currently supported. 757Data link protocols with variable length headers are not currently supported.
743.Pp 758.Pp
744Under SunOS, if a BPF application reads more than 2^31 bytes of 759Under SunOS, if a BPF application reads more than 2^31 bytes of
745data, read will fail with 760data, read will fail with
746.Er EINVAL . 761.Er EINVAL .
747You can either fix the bug in SunOS, 762You can either fix the bug in SunOS,
748or lseek to 0 when read fails for this reason. 763or lseek to 0 when read fails for this reason.
749.Pp 764.Pp
750.Dq Immediate mode 765.Dq Immediate mode
751and the 766and the
752.Dq read timeout 767.Dq read timeout
753are misguided features. 768are misguided features.
754This functionality can be emulated with non-blocking mode and 769This functionality can be emulated with non-blocking mode and
755.Xr select 2 . 770.Xr select 2 .