Thu Mar 11 16:36:41 2021 UTC
improve wording.


(christos)
diff -r1.29 -r1.30 src/lib/libc/regex/regex.3

--- src/lib/libc/regex/regex.3 2021/03/11 15:12:51 1.29
+++ src/lib/libc/regex/regex.3 2021/03/11 16:36:41 1.30
@@ -1,859 +1,861 @@ @@ -1,859 +1,861 @@
1.\" $NetBSD: regex.3,v 1.29 2021/03/11 15:12:51 christos Exp $ 1.\" $NetBSD: regex.3,v 1.30 2021/03/11 16:36:41 christos Exp $
2.\" 2.\"
3.\" Copyright (c) 1992, 1993, 1994 Henry Spencer. 3.\" Copyright (c) 1992, 1993, 1994 Henry Spencer.
4.\" Copyright (c) 1992, 1993, 1994 4.\" Copyright (c) 1992, 1993, 1994
5.\" The Regents of the University of California. All rights reserved. 5.\" The Regents of the University of California. All rights reserved.
6.\" 6.\"
7.\" This code is derived from software contributed to Berkeley by 7.\" This code is derived from software contributed to Berkeley by
8.\" Henry Spencer. 8.\" Henry Spencer.
9.\" 9.\"
10.\" Redistribution and use in source and binary forms, with or without 10.\" Redistribution and use in source and binary forms, with or without
11.\" modification, are permitted provided that the following conditions 11.\" modification, are permitted provided that the following conditions
12.\" are met: 12.\" are met:
13.\" 1. Redistributions of source code must retain the above copyright 13.\" 1. Redistributions of source code must retain the above copyright
14.\" notice, this list of conditions and the following disclaimer. 14.\" notice, this list of conditions and the following disclaimer.
15.\" 2. Redistributions in binary form must reproduce the above copyright 15.\" 2. Redistributions in binary form must reproduce the above copyright
16.\" notice, this list of conditions and the following disclaimer in the 16.\" notice, this list of conditions and the following disclaimer in the
17.\" documentation and/or other materials provided with the distribution. 17.\" documentation and/or other materials provided with the distribution.
18.\" 3. Neither the name of the University nor the names of its contributors 18.\" 3. Neither the name of the University nor the names of its contributors
19.\" may be used to endorse or promote products derived from this software 19.\" may be used to endorse or promote products derived from this software
20.\" without specific prior written permission. 20.\" without specific prior written permission.
21.\" 21.\"
22.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND 22.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
23.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 23.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
24.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE 24.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
25.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE 25.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
26.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 26.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
27.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS 27.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
28.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) 28.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
29.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT 29.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
30.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY 30.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
31.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF 31.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
32.\" SUCH DAMAGE. 32.\" SUCH DAMAGE.
33.\" 33.\"
34.\" @(#)regex.3 8.4 (Berkeley) 3/20/94 34.\" @(#)regex.3 8.4 (Berkeley) 3/20/94
35.\" $FreeBSD: head/lib/libc/regex/regex.3 363817 2020-08-04 02:06:49Z kevans $ 35.\" $FreeBSD: head/lib/libc/regex/regex.3 363817 2020-08-04 02:06:49Z kevans $
36.\" 36.\"
37.Dd March 11, 2021 37.Dd March 11, 2021
38.Dt REGEX 3 38.Dt REGEX 3
39.Os 39.Os
40.Sh NAME 40.Sh NAME
41.Nm regcomp , 41.Nm regcomp ,
42.Nm regexec , 42.Nm regexec ,
43.Nm regerror , 43.Nm regerror ,
44.Nm regfree , 44.Nm regfree ,
45.Nm regasub , 45.Nm regasub ,
46.Nm regnsub 46.Nm regnsub
47.Nd regular-expression library 47.Nd regular-expression library
48.Sh LIBRARY 48.Sh LIBRARY
49.Lb libc 49.Lb libc
50.Sh SYNOPSIS 50.Sh SYNOPSIS
51.In regex.h 51.In regex.h
52.Ft int 52.Ft int
53.Fo regcomp 53.Fo regcomp
54.Fa "regex_t * restrict preg" "const char * restrict pattern" "int cflags" 54.Fa "regex_t * restrict preg" "const char * restrict pattern" "int cflags"
55.Fc 55.Fc
56.Ft int 56.Ft int
57.Fo regexec 57.Fo regexec
58.Fa "const regex_t * restrict preg" "const char * restrict string" 58.Fa "const regex_t * restrict preg" "const char * restrict string"
59.Fa "size_t nmatch" "regmatch_t pmatch[restrict]" "int eflags" 59.Fa "size_t nmatch" "regmatch_t pmatch[restrict]" "int eflags"
60.Fc 60.Fc
61.Ft size_t 61.Ft size_t
62.Fo regerror 62.Fo regerror
63.Fa "int errcode" "const regex_t * restrict preg" 63.Fa "int errcode" "const regex_t * restrict preg"
64.Fa "char * restrict errbuf" "size_t errbuf_size" 64.Fa "char * restrict errbuf" "size_t errbuf_size"
65.Fc 65.Fc
66.Ft void 66.Ft void
67.Fn regfree "regex_t *preg" 67.Fn regfree "regex_t *preg"
68.Ft ssize_t 68.Ft ssize_t
69.Fn regnsub "char *buf" "size_t bufsiz" "const char *sub" "const regmatch_t *rm" "const char *str" 69.Fn regnsub "char *buf" "size_t bufsiz" "const char *sub" "const regmatch_t *rm" "const char *str"
70.Ft ssize_t 70.Ft ssize_t
71.Fn regasub "char **buf" "const char *sub" "const regmatch_t *rm" "const char *sstr" 71.Fn regasub "char **buf" "const char *sub" "const regmatch_t *rm" "const char *sstr"
72.Sh DESCRIPTION 72.Sh DESCRIPTION
73These routines implement 73These routines implement
74.St -p1003.2 74.St -p1003.2
75regular expressions 75regular expressions
76.Pq Do RE Dc Ns s ; 76.Pq Do RE Dc Ns s ;
77see 77see
78.Xr re_format 7 . 78.Xr re_format 7 .
79The 79The
80.Fn regcomp 80.Fn regcomp
81function 81function
82compiles an RE written as a string into an internal form, 82compiles an RE written as a string into an internal form,
83.Fn regexec 83.Fn regexec
84matches that internal form against a string and reports results, 84matches that internal form against a string and reports results,
85.Fn regerror 85.Fn regerror
86transforms error codes from either into human-readable messages, 86transforms error codes from either into human-readable messages,
87and 87and
88.Fn regfree 88.Fn regfree
89frees any dynamically-allocated storage used by the internal form 89frees any dynamically-allocated storage used by the internal form
90of an RE. 90of an RE.
91.Pp 91.Pp
92The header 92The header
93.In regex.h 93.In regex.h
94declares two structure types, 94declares two structure types,
95.Ft regex_t 95.Ft regex_t
96and 96and
97.Ft regmatch_t , 97.Ft regmatch_t ,
98the former for compiled internal forms and the latter for match reporting. 98the former for compiled internal forms and the latter for match reporting.
99It also declares the four functions, 99It also declares the four functions,
100a type 100a type
101.Ft regoff_t , 101.Ft regoff_t ,
102and a number of constants with names starting with 102and a number of constants with names starting with
103.Dq Dv REG_ . 103.Dq Dv REG_ .
104.Pp 104.Pp
105The 105The
106.Fn regcomp 106.Fn regcomp
107function 107function
108compiles the regular expression contained in the 108compiles the regular expression contained in the
109.Fa pattern 109.Fa pattern
110string, 110string,
111subject to the flags in 111subject to the flags in
112.Fa cflags , 112.Fa cflags ,
113and places the results in the 113and places the results in the
114.Ft regex_t 114.Ft regex_t
115structure pointed to by 115structure pointed to by
116.Fa preg . 116.Fa preg .
117The 117The
118.Fa cflags 118.Fa cflags
119argument 119argument
120is the bitwise OR of zero or more of the following flags: 120is the bitwise OR of zero or more of the following flags:
121.Bl -tag -width REG_EXTENDED 121.Bl -tag -width REG_EXTENDED
122.It Dv REG_EXTENDED 122.It Dv REG_EXTENDED
123Compile modern 123Compile modern
124.Pq Dq extended 124.Pq Dq extended
125REs, 125REs,
126rather than the obsolete 126rather than the obsolete
127.Pq Dq basic 127.Pq Dq basic
128REs that 128REs that
129are the default. 129are the default.
130.It Dv REG_BASIC 130.It Dv REG_BASIC
131This is a synonym for 0, 131This is a synonym for 0,
132provided as a counterpart to 132provided as a counterpart to
133.Dv REG_EXTENDED 133.Dv REG_EXTENDED
134to improve readability. 134to improve readability.
135.It Dv REG_NOSPEC 135.It Dv REG_NOSPEC
136Compile with recognition of all special characters turned off. 136Compile with recognition of all special characters turned off.
137All characters are thus considered ordinary, 137All characters are thus considered ordinary,
138so the 138so the
139.Dq RE 139.Dq RE
140is a literal string. 140is a literal string.
141This is an extension, 141This is an extension,
142compatible with but not specified by 142compatible with but not specified by
143.St -p1003.2 , 143.St -p1003.2 ,
144and should be used with 144and should be used with
145caution in software intended to be portable to other systems. 145caution in software intended to be portable to other systems.
146.Dv REG_EXTENDED 146.Dv REG_EXTENDED
147and 147and
148.Dv REG_NOSPEC 148.Dv REG_NOSPEC
149may not be used 149may not be used
150in the same call to 150in the same call to
151.Fn regcomp . 151.Fn regcomp .
152.It Dv REG_ICASE 152.It Dv REG_ICASE
153Compile for matching that ignores upper/lower case distinctions. 153Compile for matching that ignores upper/lower case distinctions.
154See 154See
155.Xr re_format 7 . 155.Xr re_format 7 .
156.It Dv REG_NOSUB 156.It Dv REG_NOSUB
157Compile for matching that need only report success or failure, 157Compile for matching that need only report success or failure,
158not what was matched. 158not what was matched.
159.It Dv REG_NEWLINE 159.It Dv REG_NEWLINE
160Compile for newline-sensitive matching. 160Compile for newline-sensitive matching.
161By default, newline is a completely ordinary character with no special 161By default, newline is a completely ordinary character with no special
162meaning in either REs or strings. 162meaning in either REs or strings.
163With this flag, 163With this flag,
164.Ql [^ 164.Ql [^
165bracket expressions and 165bracket expressions and
166.Ql .\& 166.Ql .\&
167never match newline, 167never match newline,
168a 168a
169.Ql ^\& 169.Ql ^\&
170anchor matches the null string after any newline in the string 170anchor matches the null string after any newline in the string
171in addition to its normal function, 171in addition to its normal function,
172and the 172and the
173.Ql $\& 173.Ql $\&
174anchor matches the null string before any newline in the 174anchor matches the null string before any newline in the
175string in addition to its normal function. 175string in addition to its normal function.
176.It Dv REG_PEND 176.It Dv REG_PEND
177The regular expression ends, 177The regular expression ends,
178not at the first NUL, 178not at the first NUL,
179but just before the character pointed to by the 179but just before the character pointed to by the
180.Va re_endp 180.Va re_endp
181member of the structure pointed to by 181member of the structure pointed to by
182.Fa preg . 182.Fa preg .
183The 183The
184.Va re_endp 184.Va re_endp
185member is of type 185member is of type
186.Ft "const char *" . 186.Ft "const char *" .
187This flag permits inclusion of NULs in the RE; 187This flag permits inclusion of NULs in the RE;
188they are considered ordinary characters. 188they are considered ordinary characters.
189This is an extension, 189This is an extension,
190compatible with but not specified by 190compatible with but not specified by
191.St -p1003.2 , 191.St -p1003.2 ,
192and should be used with 192and should be used with
193caution in software intended to be portable to other systems. 193caution in software intended to be portable to other systems.
194.It Dv REG_GNU 194.It Dv REG_GNU
195Include GNU-inspired extensions: 195Include GNU-inspired extensions:
196.Pp 196.Pp
197.Bl -tag -offset indent -width XX -compact  197.Bl -tag -offset indent -width XX -compact
198.It \eN 198.It \eN
199Use backreference 199Use backreference
200.Dv N 200.Dv N
201where 201where
202.Dv N 202.Dv N
203is between 203is a single digit number between
204.Dv [1-9] . 204.Dv 1
 205and
 206.Dv 9 .
205.It \ea 207.It \ea
206Visual Bell 208Visual Bell
207.It \eb 209.It \eb
208Match a position that is a word boundary. 210Match a position that is a word boundary.
209.It \eB 211.It \eB
210Match a position that is not a word boundary. 212Match a position that is not a word boundary.
211.It \ef 213.It \ef
212Form Feed 214Form Feed
213.It \en 215.It \en
214Line Feed 216Line Feed
215.It \er 217.It \er
216Carriage return 218Carriage return
217.It \es 219.It \es
218Alias for [[:space:]] 220Alias for [[:space:]]
219.It \eS 221.It \eS
220Alias for [^[:space:]] 222Alias for [^[:space:]]
221.It \et 223.It \et
222Horizontal Tab 224Horizontal Tab
223.It \ev 225.It \ev
224Vertical Tab 226Vertical Tab
225.It \ew 227.It \ew
226Alias for [[:alnum:]] 228Alias for [[:alnum:]]
227.It \eW 229.It \eW
228Alias for [^[:alnum:]] 230Alias for [^[:alnum:]]
229.It \e' 231.It \e'
230Matches the end of the subject. 232Matches the end of the subject string (the string to be matched).
231.It \e` 233.It \e`
232Matches the beginning of the subject. 234Matches the beginning of the subject string.
233.El 235.El
234.Pp 236.Pp
235This is an extension, 237This is an extension,
236compatible with but not specified by 238compatible with but not specified by
237.St -p1003.2 , 239.St -p1003.2 ,
238and should be used with 240and should be used with
239caution in software intended to be portable to other systems. 241caution in software intended to be portable to other systems.
240.El 242.El
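As an illustration of the cflags described above, here is a minimal sketch in C. The helper name `re_match_flags` is hypothetical and not part of the library; it only uses the standard `regcomp`, `regexec`, and `regfree` calls and the portable flags `REG_EXTENDED`, `REG_ICASE`, `REG_NEWLINE`, and `REG_NOSUB`.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of regex.h): compile `pattern` with the
 * caller's cflags and report whether `text` matches.
 * Returns 1 on a match, 0 on no match, -1 if the pattern does not compile.
 */
static int
re_match_flags(const char *pattern, const char *text, int cflags)
{
	regex_t re;
	int rc;

	assert(pattern != NULL && text != NULL);

	/* REG_NOSUB: only success or failure is needed here. */
	if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return rc == 0 ? 1 : 0;
}
```

With this sketch, `re_match_flags("^b$", "a\nb\nc", REG_EXTENDED)` should report no match, because newline is an ordinary character by default, while adding `REG_NEWLINE` lets the anchors match around the embedded newlines.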
241.Pp 243.Pp
242When successful, 244When successful,
243.Fn regcomp 245.Fn regcomp
244returns 0 and fills in the structure pointed to by 246returns 0 and fills in the structure pointed to by
245.Fa preg . 247.Fa preg .
246One member of that structure 248One member of that structure
247(other than 249(other than
248.Va re_endp ) 250.Va re_endp )
249is publicized: 251is publicized:
250.Va re_nsub , 252.Va re_nsub ,
251of type 253of type
252.Ft size_t , 254.Ft size_t ,
253contains the number of parenthesized subexpressions within the RE 255contains the number of parenthesized subexpressions within the RE
254(except that the value of this member is undefined if the 256(except that the value of this member is undefined if the
255.Dv REG_NOSUB 257.Dv REG_NOSUB
256flag was used). 258flag was used).
257If 259If
258.Fn regcomp 260.Fn regcomp
259fails, it returns a non-zero error code; 261fails, it returns a non-zero error code;
260see 262see
261.Sx DIAGNOSTICS . 263.Sx DIAGNOSTICS .
262.Pp 264.Pp
263The 265The
264.Fn regexec 266.Fn regexec
265function 267function
266matches the compiled RE pointed to by 268matches the compiled RE pointed to by
267.Fa preg 269.Fa preg
268against the 270against the
269.Fa string , 271.Fa string ,
270subject to the flags in 272subject to the flags in
271.Fa eflags , 273.Fa eflags ,
272and reports results using 274and reports results using
273.Fa nmatch , 275.Fa nmatch ,
274.Fa pmatch , 276.Fa pmatch ,
275and the returned value. 277and the returned value.
276The RE must have been compiled by a previous invocation of 278The RE must have been compiled by a previous invocation of
277.Fn regcomp . 279.Fn regcomp .
278The compiled form is not altered during execution of 280The compiled form is not altered during execution of
279.Fn regexec , 281.Fn regexec ,
280so a single compiled RE can be used simultaneously by multiple threads. 282so a single compiled RE can be used simultaneously by multiple threads.
281.Pp 283.Pp
282By default, 284By default,
283the NUL-terminated string pointed to by 285the NUL-terminated string pointed to by
284.Fa string 286.Fa string
285is considered to be the text of an entire line, minus any terminating 287is considered to be the text of an entire line, minus any terminating
286newline. 288newline.
287The 289The
288.Fa eflags 290.Fa eflags
289argument is the bitwise OR of zero or more of the following flags: 291argument is the bitwise OR of zero or more of the following flags:
290.Bl -tag -width REG_STARTEND 292.Bl -tag -width REG_STARTEND
291.It Dv REG_NOTBOL 293.It Dv REG_NOTBOL
292The first character of the string is treated as the continuation 294The first character of the string is treated as the continuation
293of a line. 295of a line.
294This means that the anchors 296This means that the anchors
295.Ql ^\& , 297.Ql ^\& ,
296.Ql [[:<:]] , 298.Ql [[:<:]] ,
297and 299and
298.Ql \e< 300.Ql \e<
299do not match before it; but see 301do not match before it; but see
300.Dv REG_STARTEND 302.Dv REG_STARTEND
301below. 303below.
302This does not affect the behavior of newlines under 304This does not affect the behavior of newlines under
303.Dv REG_NEWLINE . 305.Dv REG_NEWLINE .
304.It Dv REG_NOTEOL 306.It Dv REG_NOTEOL
305The NUL terminating 307The NUL terminating
306the string 308the string
307does not end a line, so the 309does not end a line, so the
308.Ql $\& 310.Ql $\&
309anchor does not match before it. 311anchor does not match before it.
310This does not affect the behavior of newlines under 312This does not affect the behavior of newlines under
311.Dv REG_NEWLINE . 313.Dv REG_NEWLINE .
312.It Dv REG_STARTEND 314.It Dv REG_STARTEND
313The string is considered to start at 315The string is considered to start at
314.Fa string No + 316.Fa string No +
315.Fa pmatch Ns [0]. Ns Fa rm_so 317.Fa pmatch Ns [0]. Ns Fa rm_so
316and to end before the byte located at 318and to end before the byte located at
317.Fa string No + 319.Fa string No +
318.Fa pmatch Ns [0]. Ns Fa rm_eo , 320.Fa pmatch Ns [0]. Ns Fa rm_eo ,
319regardless of the value of 321regardless of the value of
320.Fa nmatch . 322.Fa nmatch .
321See below for the definition of 323See below for the definition of
322.Fa pmatch 324.Fa pmatch
323and 325and
324.Fa nmatch . 326.Fa nmatch .
325This is an extension, 327This is an extension,
326compatible with but not specified by 328compatible with but not specified by
327.St -p1003.2 , 329.St -p1003.2 ,
328and should be used with 330and should be used with
329caution in software intended to be portable to other systems. 331caution in software intended to be portable to other systems.
330.Pp 332.Pp
331Without 333Without
332.Dv REG_NOTBOL , 334.Dv REG_NOTBOL ,
333the position 335the position
334.Fa rm_so 336.Fa rm_so
335is considered the beginning of a line, such that 337is considered the beginning of a line, such that
336.Ql ^ 338.Ql ^
337matches before it, and the beginning of a word if there is a word 339matches before it, and the beginning of a word if there is a word
338character at this position, such that 340character at this position, such that
339.Ql [[:<:]] 341.Ql [[:<:]]
340and 342and
341.Ql \e< 343.Ql \e<
342match before it. 344match before it.
343.Pp 345.Pp
344With 346With
345.Dv REG_NOTBOL , 347.Dv REG_NOTBOL ,
346the character at position 348the character at position
347.Fa rm_so 349.Fa rm_so
348is treated as the continuation of a line, and if 350is treated as the continuation of a line, and if
349.Fa rm_so 351.Fa rm_so
350is greater than 0, the preceding character is taken into consideration. 352is greater than 0, the preceding character is taken into consideration.
351If the preceding character is a newline and the regular expression was compiled 353If the preceding character is a newline and the regular expression was compiled
352with 354with
353.Dv REG_NEWLINE , 355.Dv REG_NEWLINE ,
354.Ql ^ 356.Ql ^
355matches before the string; if the preceding character is not a word character 357matches before the string; if the preceding character is not a word character
356but the string starts with a word character, 358but the string starts with a word character,
357.Ql [[:<:]] 359.Ql [[:<:]]
358and 360and
359.Ql \e< 361.Ql \e<
360match before the string. 362match before the string.
361.El 363.El
362.Pp 364.Pp
363See 365See
364.Xr re_format 7 366.Xr re_format 7
365for a discussion of what is matched in situations where an RE or a 367for a discussion of what is matched in situations where an RE or a
366portion thereof could match any of several substrings of 368portion thereof could match any of several substrings of
367.Fa string . 369.Fa string .
368.Pp 370.Pp
369Normally, 371Normally,
370.Fn regexec 372.Fn regexec
371returns 0 for success and the non-zero code 373returns 0 for success and the non-zero code
372.Dv REG_NOMATCH 374.Dv REG_NOMATCH
373for failure. 375for failure.
374Other non-zero error codes may be returned in exceptional situations; 376Other non-zero error codes may be returned in exceptional situations;
375see 377see
376.Sx DIAGNOSTICS . 378.Sx DIAGNOSTICS .
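The return-value conventions above can be sketched as follows. This is an illustrative example, not code from the library; the helper name `re_check` and the 128-byte message buffer are assumptions made for the sketch.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/*
 * Hypothetical helper: compile `pattern` as an extended RE, match it
 * against `text`, and return the raw status code:
 *   0            the text matched
 *   REG_NOMATCH  the text did not match
 *   other        a regcomp() or regexec() error, also reported on stderr
 */
static int
re_check(const char *pattern, const char *text)
{
	regex_t re;
	char msg[128];	/* assumed large enough for typical messages */
	int rc;

	assert(pattern != NULL && text != NULL);

	rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
	if (rc != 0) {
		/* Compilation failed: translate the code into a message. */
		regerror(rc, &re, msg, sizeof(msg));
		fprintf(stderr, "regcomp: %s\n", msg);
		return rc;
	}

	rc = regexec(&re, text, 0, NULL, 0);
	if (rc != 0 && rc != REG_NOMATCH) {
		/* Exceptional failure, distinct from a plain non-match. */
		regerror(rc, &re, msg, sizeof(msg));
		fprintf(stderr, "regexec: %s\n", msg);
	}
	regfree(&re);
	return rc;
}
```

A caller would typically treat 0 and `REG_NOMATCH` as ordinary outcomes and anything else as an error worth logging.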
377.Pp 379.Pp
378If 380If
379.Dv REG_NOSUB 381.Dv REG_NOSUB
380was specified in the compilation of the RE, 382was specified in the compilation of the RE,
381or if 383or if
382.Fa nmatch 384.Fa nmatch
383is 0, 385is 0,
384.Fn regexec 386.Fn regexec
385ignores the 387ignores the
386.Fa pmatch 388.Fa pmatch
387argument (but see below for the case where 389argument (but see below for the case where
388.Dv REG_STARTEND 390.Dv REG_STARTEND
389is specified). 391is specified).
390Otherwise, 392Otherwise,
391.Fa pmatch 393.Fa pmatch
392points to an array of 394points to an array of
393.Fa nmatch 395.Fa nmatch
394structures of type 396structures of type
395.Ft regmatch_t . 397.Ft regmatch_t .
396Such a structure has at least the members 398Such a structure has at least the members
397.Va rm_so 399.Va rm_so
398and 400and
399.Va rm_eo , 401.Va rm_eo ,
400both of type 402both of type
401.Ft regoff_t 403.Ft regoff_t
402(a signed arithmetic type at least as large as an 404(a signed arithmetic type at least as large as an
403.Ft off_t 405.Ft off_t
404and a 406and a
405.Ft ssize_t ) , 407.Ft ssize_t ) ,
406containing respectively the offset of the first character of a substring 408containing respectively the offset of the first character of a substring
407and the offset of the first character after the end of the substring. 409and the offset of the first character after the end of the substring.
408Offsets are measured from the beginning of the 410Offsets are measured from the beginning of the
409.Fa string 411.Fa string
410argument given to 412argument given to
411.Fn regexec . 413.Fn regexec .
412An empty substring is denoted by equal offsets, 414An empty substring is denoted by equal offsets,
413both indicating the character following the empty substring. 415both indicating the character following the empty substring.
414.Pp 416.Pp
415The 0th member of the 417The 0th member of the
416.Fa pmatch 418.Fa pmatch
417array is filled in to indicate what substring of 419array is filled in to indicate what substring of
418.Fa string 420.Fa string
419was matched by the entire RE. 421was matched by the entire RE.
420Remaining members report what substring was matched by parenthesized 422Remaining members report what substring was matched by parenthesized
421subexpressions within the RE; 423subexpressions within the RE;
422member 424member
423.Va i 425.Va i
424reports subexpression 426reports subexpression
425.Va i , 427.Va i ,
426with subexpressions counted (starting at 1) by the order of their opening 428with subexpressions counted (starting at 1) by the order of their opening
427parentheses in the RE, left to right. 429parentheses in the RE, left to right.
428Unused entries in the array (corresponding either to subexpressions that 430Unused entries in the array (corresponding either to subexpressions that
429did not participate in the match at all, or to subexpressions that do not 431did not participate in the match at all, or to subexpressions that do not
430exist in the RE (that is, 432exist in the RE (that is,
431.Va i 433.Va i
432> 434>
433.Fa preg Ns -> Ns Va re_nsub ) ) 435.Fa preg Ns -> Ns Va re_nsub ) )
434have both 436have both
435.Va rm_so 437.Va rm_so
436and 438and
437.Va rm_eo 439.Va rm_eo
438set to -1. 440set to -1.
439If a subexpression participated in the match several times, 441If a subexpression participated in the match several times,
440the reported substring is the last one it matched. 442the reported substring is the last one it matched.
441(Note, as an example in particular, that when the RE 443(Note, as an example in particular, that when the RE
442.Ql "(b*)+" 444.Ql "(b*)+"
443matches 445matches
444.Ql bbb , 446.Ql bbb ,
445the parenthesized subexpression matches each of the three 447the parenthesized subexpression matches each of the three
446.So Li b Sc Ns s 448.So Li b Sc Ns s
447and then 449and then
448an infinite number of empty strings following the last 450an infinite number of empty strings following the last
449.Ql b , 451.Ql b ,
450so the reported substring is one of the empties.) 452so the reported substring is one of the empties.)
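A short sketch of how the pmatch array is filled in, assuming a hypothetical helper `re_submatches` that is not part of the library. It relies only on the documented behavior: entry 0 covers the whole match, entry i covers subexpression i, and unused entries have both offsets set to -1.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: match `text` against the extended RE `pattern`
 * and let regexec() fill in up to `n` entries of `out`.
 * Returns 0 on a match, otherwise the regcomp() or regexec() code.
 */
static int
re_submatches(const char *pattern, const char *text,
    regmatch_t *out, size_t n)
{
	regex_t re;
	int rc;

	assert(out != NULL && n > 0);

	rc = regcomp(&re, pattern, REG_EXTENDED);
	if (rc != 0)
		return rc;
	/* out[0] is the whole match; out[i] is the i-th parenthesis. */
	rc = regexec(&re, text, n, out, 0);
	regfree(&re);
	return rc;
}
```

For the pattern `(a+)(x)?(b+)` matched against `zzaabbb` with five entries, entry 0 should span offsets 2 to 7, entry 2 should be -1/-1 because the optional group did not participate, and entry 4 should be -1/-1 because no fourth subexpression exists.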
451.Pp 453.Pp
452If 454If
453.Dv REG_STARTEND 455.Dv REG_STARTEND
454is specified, 456is specified,
455.Fa pmatch 457.Fa pmatch
456must point to at least one 458must point to at least one
457.Ft regmatch_t 459.Ft regmatch_t
458(even if 460(even if
459.Fa nmatch 461.Fa nmatch
460is 0 or 462is 0 or
461.Dv REG_NOSUB 463.Dv REG_NOSUB
462was specified), 464was specified),
463to hold the input offsets for 465to hold the input offsets for
464.Dv REG_STARTEND . 466.Dv REG_STARTEND .
465Use for output is still entirely controlled by 467Use for output is still entirely controlled by
466.Fa nmatch ; 468.Fa nmatch ;
467if 469if
468.Fa nmatch 470.Fa nmatch
469is 0 or 471is 0 or
470.Dv REG_NOSUB 472.Dv REG_NOSUB
471was specified, 473was specified,
472the value of 474the value of
473.Fa pmatch Ns [0] 475.Fa pmatch Ns [0]
474will not be changed by a successful 476will not be changed by a successful
475.Fn regexec . 477.Fn regexec .
476.Pp 478.Pp
477The 479The
478.Fn regerror 480.Fn regerror
479function 481function
480maps a non-zero 482maps a non-zero
481.Fa errcode 483.Fa errcode
482from either 484from either
483.Fn regcomp 485.Fn regcomp
484or 486or
485.Fn regexec 487.Fn regexec
486to a human-readable, printable message. 488to a human-readable, printable message.
487If 489If
488.Fa preg 490.Fa preg
489is 491is
490.No non\- Ns Dv NULL , 492.No non\- Ns Dv NULL ,
491the error code should have arisen from use of 493the error code should have arisen from use of
492the 494the
493.Ft regex_t 495.Ft regex_t
494pointed to by 496pointed to by
495.Fa preg , 497.Fa preg ,
496and if the error code came from 498and if the error code came from
497.Fn regcomp , 499.Fn regcomp ,
498it should have been the result from the most recent 500it should have been the result from the most recent
499.Fn regcomp 501.Fn regcomp
500using that 502using that
501.Ft regex_t . 503.Ft regex_t .
502The 504The
503.Po 505.Po
504.Fn regerror 506.Fn regerror
505may be able to supply a more detailed message using information 507may be able to supply a more detailed message using information
506from the 508from the
507.Ft regex_t . 509.Ft regex_t .
508.Pc 510.Pc
509The 511The
510.Fn regerror 512.Fn regerror
511function 513function
512places the NUL-terminated message into the buffer pointed to by 514places the NUL-terminated message into the buffer pointed to by
513.Fa errbuf , 515.Fa errbuf ,
514limiting the length (including the NUL) to at most 516limiting the length (including the NUL) to at most
515.Fa errbuf_size 517.Fa errbuf_size
516bytes. 518bytes.
517If the whole message will not fit, 519If the whole message will not fit,
518as much of it as will fit before the terminating NUL is supplied. 520as much of it as will fit before the terminating NUL is supplied.
519In any case, 521In any case,
520the returned value is the size of buffer needed to hold the whole 522the returned value is the size of buffer needed to hold the whole
521message (including terminating NUL). 523message (including terminating NUL).
522If 524If
523.Fa errbuf_size 525.Fa errbuf_size
524is 0, 526is 0,
525.Fa errbuf 527.Fa errbuf
526is ignored but the return value is still correct. 528is ignored but the return value is still correct.
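The sizing rule above allows a two-call pattern: ask for the required size first, then allocate and fetch the message. The following is a minimal sketch under that reading; the helper name `re_strerror` is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper: return a malloc()ed copy of the message for
 * `errcode`, or NULL if allocation fails. The caller frees the result.
 * `preg` may be NULL when no regex_t is associated with the error.
 */
static char *
re_strerror(int errcode, const regex_t *preg)
{
	size_t need;
	char *buf;

	assert(errcode != 0);

	/* With errbuf_size 0 the buffer is ignored; only the size returns. */
	need = regerror(errcode, preg, NULL, 0);
	buf = malloc(need);
	if (buf != NULL)
		(void)regerror(errcode, preg, buf, need);
	return buf;
}
```

Because the returned size includes the terminating NUL, the buffer is exactly large enough and the message is never truncated.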
527.Pp 529.Pp
528If the 530If the
529.Fa errcode 531.Fa errcode
530given to 532given to
531.Fn regerror 533.Fn regerror
532is first ORed with 534is first ORed with
533.Dv REG_ITOA , 535.Dv REG_ITOA ,
534the 536the
535.Dq message 537.Dq message
536that results is the printable name of the error code, 538that results is the printable name of the error code,
537e.g.\& 539e.g.\&
538.Dq Dv REG_NOMATCH , 540.Dq Dv REG_NOMATCH ,
539rather than an explanation thereof. 541rather than an explanation thereof.
540If 542If
541.Fa errcode 543.Fa errcode
542is 544is
543.Dv REG_ATOI , 545.Dv REG_ATOI ,
544then 546then
545.Fa preg 547.Fa preg
546shall be 548shall be
547.No non\- Ns Dv NULL 549.No non\- Ns Dv NULL
548and the 550and the
549.Va re_endp 551.Va re_endp
550member of the structure it points to 552member of the structure it points to
551must point to the printable name of an error code; 553must point to the printable name of an error code;
552in this case, the result in 554in this case, the result in
553.Fa errbuf 555.Fa errbuf
554is the decimal digits of 556is the decimal digits of
555the numeric value of the error code 557the numeric value of the error code
556(0 if the name is not recognized). 558(0 if the name is not recognized).
557.Dv REG_ITOA 559.Dv REG_ITOA
558and 560and
559.Dv REG_ATOI 561.Dv REG_ATOI
560are intended primarily as debugging facilities; 562are intended primarily as debugging facilities;
561they are extensions, 563they are extensions,
562compatible with but not specified by 564compatible with but not specified by
563.St -p1003.2 , 565.St -p1003.2 ,
564and should be used with 566and should be used with
565caution in software intended to be portable to other systems. 567caution in software intended to be portable to other systems.
566Be warned also that they are considered experimental and changes are possible. 568Be warned also that they are considered experimental and changes are possible.
567.Pp 569.Pp
568The 570The
569.Fn regfree 571.Fn regfree
570function 572function
571frees any dynamically-allocated storage associated with the compiled RE 573frees any dynamically-allocated storage associated with the compiled RE
572pointed to by 574pointed to by
573.Fa preg . 575.Fa preg .
574The remaining 576The remaining
575.Ft regex_t 577.Ft regex_t
576is no longer a valid compiled RE 578is no longer a valid compiled RE
577and the effect of supplying it to 579and the effect of supplying it to
578.Fn regexec 580.Fn regexec
579or 581or
580.Fn regerror 582.Fn regerror
581is undefined. 583is undefined.
582.Pp 584.Pp
583None of these functions references global variables except for tables 585None of these functions references global variables except for tables
584of constants; 586of constants;
585all are safe for use from multiple threads if the arguments are safe. 587all are safe for use from multiple threads if the arguments are safe.
586.Pp 588.Pp
587The 589The
588.Fn regnsub 590.Fn regnsub
589and 591and
590.Fn regasub 592.Fn regasub
591functions perform substitutions using 593functions perform substitutions using
592.Xr sed 1 594.Xr sed 1
593like syntax. 595like syntax.
594They return the length of the string that would have been created 596They return the length of the string that would have been created
595if there were enough space, or 597if there were enough space, or
596.Dv \-1 598.Dv \-1
597on error, setting 599on error, setting
598.Va errno . 600.Va errno .
599The result 601The result
600is placed in 602is placed in
601.Fa buf 603.Fa buf
602which is user-supplied in 604which is user-supplied in
603.Fn regnsub 605.Fn regnsub
604and dynamically allocated in 606and dynamically allocated in
605.Fn regasub . 607.Fn regasub .
606The 608The
607.Fa sub 609.Fa sub
608argument contains a substitution string, which may refer to the first 610argument contains a substitution string, which may refer to the first
6099 parenthesized subexpression matches using 6119 parenthesized subexpression matches using
610.Dq \e<n> 612.Dq \e<n>
611to refer to the nth matched 613to refer to the nth matched
612item, or 614item, or
613.Dq & 615.Dq &
614(which is equivalent to 616(which is equivalent to
615.Dq \e0 ) 617.Dq \e0 )
616to refer to the full match. 618to refer to the full match.
617The 619The
618.Fa rm 620.Fa rm
619array must be at least 10 elements long, and should contain the result 621array must be at least 10 elements long, and should contain the result
620of the matches from a previous 622of the matches from a previous
621.Fn regexec 623.Fn regexec
622call. 624call.
623Only 10 elements of the 625Only 10 elements of the
624.Fa rm 626.Fa rm
625array can be used. 627array can be used.
626The 628The
627.Fa str 629.Fa str
628argument contains the source string to apply the transformation to. 630argument contains the source string to apply the transformation to.
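
To make the substitution rules concrete, here is a portable sketch of the expansion step written against plain `regmatch_t` results. It is not the library's `regnsub` and does not call it; the helper names `emit` and `expand_sub`, the `long` return type, and the rule that a backslash before a non-digit quotes that character are all assumptions made for this illustration.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Append n bytes of src to buf if room remains; always advance *len. */
static void
emit(char *buf, size_t bufsiz, size_t *len, const char *src, size_t n)
{
	for (size_t i = 0; i < n; i++, (*len)++)
		if (*len + 1 < bufsiz)
			buf[*len] = src[i];
}

/*
 * Hypothetical helper, NOT the library's regnsub(): expands `sub` into
 * `buf`, replacing "&" and "\0".."\9" with the pieces of `str` described
 * by rm[0..9].  Returns the length the complete result would have.
 */
long
expand_sub(char *buf, size_t bufsiz, const char *sub,
    const regmatch_t rm[10], const char *str)
{
	size_t len = 0;

	for (const char *p = sub; *p != '\0'; p++) {
		int idx = -1;

		if (*p == '&')
			idx = 0;
		else if (*p == '\\' && p[1] >= '0' && p[1] <= '9')
			idx = *++p - '0';
		else if (*p == '\\' && p[1] != '\0')
			p++;	/* assumed rule: backslash quotes next char */

		if (idx < 0)
			emit(buf, bufsiz, &len, p, 1);
		else if (rm[idx].rm_so >= 0)	/* unset groups expand to nothing */
			emit(buf, bufsiz, &len, str + rm[idx].rm_so,
			    (size_t)(rm[idx].rm_eo - rm[idx].rm_so));
	}
	if (bufsiz > 0)
		buf[len < bufsiz ? len : bufsiz - 1] = '\0';
	return (long)len;
}
```

A caller would run `regexec` with `nmatch` set to 10, pass the resulting array as `rm`, and compare the return value with the buffer size to detect truncation.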
629.Sh IMPLEMENTATION CHOICES 631.Sh IMPLEMENTATION CHOICES
630There are a number of decisions that 632There are a number of decisions that
631.St -p1003.2 633.St -p1003.2
632leaves up to the implementor, 634leaves up to the implementor,
633either by explicitly saying 635either by explicitly saying
634.Dq undefined 636.Dq undefined
635or by virtue of them being 637or by virtue of them being
636forbidden by the RE grammar. 638forbidden by the RE grammar.
637This implementation treats them as follows. 639This implementation treats them as follows.
638.Pp 640.Pp
639See 641See
640.Xr re_format 7 642.Xr re_format 7
641for a discussion of the definition of case-independent matching. 643for a discussion of the definition of case-independent matching.
642.Pp 644.Pp
643There is no particular limit on the length of REs, 645There is no particular limit on the length of REs,
644except insofar as memory is limited. 646except insofar as memory is limited.
645Memory usage is approximately linear in RE size, and largely insensitive 647Memory usage is approximately linear in RE size, and largely insensitive
646to RE complexity, except for bounded repetitions. 648to RE complexity, except for bounded repetitions.
647See 649See
648.Sx BUGS 650.Sx BUGS
649for one short RE using them 651for one short RE using them
650that will run almost any system out of memory. 652that will run almost any system out of memory.
651.Pp 653.Pp
652A backslashed character other than one specifically given a magic meaning 654A backslashed character other than one specifically given a magic meaning
653by 655by
654.St -p1003.2 656.St -p1003.2
655(such magic meanings occur only in obsolete 657(such magic meanings occur only in obsolete
656.Bq Dq basic 658.Bq Dq basic
657REs) 659REs)
658is taken as an ordinary character. 660is taken as an ordinary character.
659.Pp 661.Pp
660Any unmatched 662Any unmatched
661.Ql [\& 663.Ql [\&
662is a 664is a
663.Dv REG_EBRACK 665.Dv REG_EBRACK
664error. 666error.
665.Pp 667.Pp
666Equivalence classes cannot begin or end bracket-expression ranges. 668Equivalence classes cannot begin or end bracket-expression ranges.
667The endpoint of one range cannot begin another. 669The endpoint of one range cannot begin another.
668.Pp 670.Pp
669.Dv RE_DUP_MAX , 671.Dv RE_DUP_MAX ,
670the limit on repetition counts in bounded repetitions, is 255. 672the limit on repetition counts in bounded repetitions, is 255.
671.Pp 673.Pp
672A repetition operator 674A repetition operator
673.Ql ( ?\& , 675.Ql ( ?\& ,
674.Ql *\& , 676.Ql *\& ,
675.Ql +\& , 677.Ql +\& ,
676or bounds) 678or bounds)
677cannot follow another 679cannot follow another
678repetition operator. 680repetition operator.
679A repetition operator cannot begin an expression or subexpression 681A repetition operator cannot begin an expression or subexpression
680or follow 682or follow
681.Ql ^\& 683.Ql ^\&
682or 684or
683.Ql |\& . 685.Ql |\& .
684.Pp 686.Pp
685.Ql |\& 687.Ql |\&
686cannot appear first or last in a (sub)expression or after another 688cannot appear first or last in a (sub)expression or after another
687.Ql |\& , 689.Ql |\& ,
688i.e., an operand of 690i.e., an operand of
689.Ql |\& 691.Ql |\&
690cannot be an empty subexpression. 692cannot be an empty subexpression.
691An empty parenthesized subexpression, 693An empty parenthesized subexpression,
692.Ql "()" , 694.Ql "()" ,
693is legal and matches an 695is legal and matches an
694empty (sub)string. 696empty (sub)string.
695An empty string is not a legal RE. 697An empty string is not a legal RE.
696.Pp 698.Pp
697A 699A
698.Ql {\& 700.Ql {\&
699followed by a digit is considered the beginning of bounds for a 701followed by a digit is considered the beginning of bounds for a
700bounded repetition, which must then follow the syntax for bounds. 702bounded repetition, which must then follow the syntax for bounds.
701A 703A
702.Ql {\& 704.Ql {\&
703.Em not 705.Em not
704followed by a digit is considered an ordinary character. 706followed by a digit is considered an ordinary character.
705.Pp 707.Pp
706.Ql ^\& 708.Ql ^\&
707and 709and
708.Ql $\& 710.Ql $\&
709beginning and ending subexpressions in obsolete 711beginning and ending subexpressions in obsolete
710.Pq Dq basic 712.Pq Dq basic
711REs are anchors, not ordinary characters. 713REs are anchors, not ordinary characters.
712.Sh DIAGNOSTICS 714.Sh DIAGNOSTICS
713Non-zero error codes from 715Non-zero error codes from
714.Fn regcomp 716.Fn regcomp
715and 717and
716.Fn regexec 718.Fn regexec
717include the following: 719include the following:
718.Pp 720.Pp
719.Bl -tag -width REG_ECOLLATE -compact 721.Bl -tag -width REG_ECOLLATE -compact
720.It Dv REG_NOMATCH 722.It Dv REG_NOMATCH
721The 723The
722.Fn regexec 724.Fn regexec
723function 725function
724failed to match 726failed to match
725.It Dv REG_BADPAT 727.It Dv REG_BADPAT
726invalid regular expression 728invalid regular expression
727.It Dv REG_ECOLLATE 729.It Dv REG_ECOLLATE
728invalid collating element 730invalid collating element
729.It Dv REG_ECTYPE 731.It Dv REG_ECTYPE
730invalid character class 732invalid character class
731.It Dv REG_EESCAPE 733.It Dv REG_EESCAPE
732.Ql \e 734.Ql \e
733applied to unescapable character 735applied to unescapable character
734.It Dv REG_ESUBREG 736.It Dv REG_ESUBREG
735invalid backreference number 737invalid backreference number
736.It Dv REG_EBRACK 738.It Dv REG_EBRACK
737brackets 739brackets
738.Ql "[ ]" 740.Ql "[ ]"
739not balanced 741not balanced
740.It Dv REG_EPAREN 742.It Dv REG_EPAREN
741parentheses 743parentheses
742.Ql "( )" 744.Ql "( )"
743not balanced 745not balanced
744.It Dv REG_EBRACE 746.It Dv REG_EBRACE
745braces 747braces
746.Ql "{ }" 748.Ql "{ }"
747not balanced 749not balanced
748.It Dv REG_BADBR 750.It Dv REG_BADBR
749invalid repetition count(s) in 751invalid repetition count(s) in
750.Ql "{ }" 752.Ql "{ }"
751.It Dv REG_ERANGE 753.It Dv REG_ERANGE
752invalid character range in 754invalid character range in
753.Ql "[ ]" 755.Ql "[ ]"
754.It Dv REG_ESPACE 756.It Dv REG_ESPACE
755ran out of memory 757ran out of memory
756.It Dv REG_BADRPT 758.It Dv REG_BADRPT
757.Ql ?\& , 759.Ql ?\& ,
758.Ql *\& , 760.Ql *\& ,
759or 761or
760.Ql +\& 762.Ql +\&
761operand invalid 763operand invalid
762.It Dv REG_EMPTY 764.It Dv REG_EMPTY
763empty (sub)expression 765empty (sub)expression
764.It Dv REG_ASSERT 766.It Dv REG_ASSERT
765cannot happen - you found a bug 767cannot happen - you found a bug
766.It Dv REG_INVARG 768.It Dv REG_INVARG
767invalid argument, e.g.\& negative-length string 769invalid argument, e.g.\& negative-length string
768.It Dv REG_ILLSEQ 770.It Dv REG_ILLSEQ
769illegal byte sequence (bad multibyte character) 771illegal byte sequence (bad multibyte character)
770.El 772.El
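
The codes listed above are usually turned into readable text with `regerror`. The following is a small sketch of that pattern; the helper name `compile_or_explain` and its interface are assumptions for this example, and the exact message text depends on the implementation.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: compiles `pattern` as an extended RE.  On failure
 * it writes the regerror() message into msg and returns the regcomp()
 * error code; on success it clears msg and returns 0.
 */
int
compile_or_explain(const char *pattern, char *msg, size_t msgsiz)
{
	regex_t re;
	int rc = regcomp(&re, pattern, REG_EXTENDED);

	if (rc != 0) {
		/* Pass the same regex_t that produced the error code. */
		regerror(rc, &re, msg, msgsiz);
		return rc;
	}
	if (msgsiz > 0)
		msg[0] = '\0';
	regfree(&re);
	return 0;
}
```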
771.Sh SEE ALSO 773.Sh SEE ALSO
772.Xr grep 1 , 774.Xr grep 1 ,
773.Xr re_format 7 775.Xr re_format 7
774.Pp 776.Pp
775.St -p1003.2 , 777.St -p1003.2 ,
776sections 2.8 (Regular Expression Notation) 778sections 2.8 (Regular Expression Notation)
777and 779and
778B.5 (C Binding for Regular Expression Matching). 780B.5 (C Binding for Regular Expression Matching).
779.Sh HISTORY 781.Sh HISTORY
780Originally written by 782Originally written by
781.An Henry Spencer . 783.An Henry Spencer .
782Altered for inclusion in the 784Altered for inclusion in the
783.Bx 4.4 785.Bx 4.4
784distribution. 786distribution.
785.Pp 787.Pp
786The 788The
787.Fn regnsub 789.Fn regnsub
788and 790and
789.Fn regasub 791.Fn regasub
790functions appeared in 792functions appeared in
791.Nx 8 . 793.Nx 8 .
792.Sh BUGS 794.Sh BUGS
793This is an alpha release with known defects. 795This is an alpha release with known defects.
794Please report problems. 796Please report problems.
795.Pp 797.Pp
796The back-reference code is subtle and doubts linger about its correctness 798The back-reference code is subtle and doubts linger about its correctness
797in complex cases. 799in complex cases.
798.Pp 800.Pp
799The 801The
800.Fn regexec 802.Fn regexec
801function 803function
802performance is poor. 804performance is poor.
803This will improve with later releases. 805This will improve with later releases.
804The 806The
805.Fa nmatch 807.Fa nmatch
806argument 808argument
807exceeding 0 is expensive; 809exceeding 0 is expensive;
808.Fa nmatch 810.Fa nmatch
809exceeding 1 is worse. 811exceeding 1 is worse.
810The 812The
811.Fn regexec 813.Fn regexec
812function 814function
813is largely insensitive to RE complexity 815is largely insensitive to RE complexity
814.Em except 816.Em except
815that back 817that back
816references are massively expensive. 818references are massively expensive.
817RE length does matter; in particular, there is a strong speed bonus 819RE length does matter; in particular, there is a strong speed bonus
818for keeping RE length under about 30 characters, 820for keeping RE length under about 30 characters,
819with most special characters counting roughly double. 821with most special characters counting roughly double.
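
When only a yes or no answer is needed, the cost of a non-zero `nmatch` noted above can be avoided by compiling with `REG_NOSUB` and passing `nmatch` as 0. The sketch below shows that pattern; the helper name `count_matches` is an assumption for this example.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: counts how many of the n strings in `lines` match
 * the extended RE `pattern`; returns -1 if the pattern does not compile.
 */
int
count_matches(const char *pattern, const char *const *lines, size_t n)
{
	regex_t re;
	int count = 0;

	/* REG_NOSUB: only success or failure is reported, no offsets. */
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;

	for (size_t i = 0; i < n; i++)
		if (regexec(&re, lines[i], 0, NULL, 0) == 0)	/* nmatch is 0 */
			count++;

	regfree(&re);
	return count;
}
```

The RE is compiled once and reused for every line, so the compilation cost is not paid per string.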
820.Pp 822.Pp
821The 823The
822.Fn regcomp 824.Fn regcomp
823function 825function
824implements bounded repetitions by macro expansion, 826implements bounded repetitions by macro expansion,
825which is costly in time and space if counts are large 827which is costly in time and space if counts are large
826or bounded repetitions are nested. 828or bounded repetitions are nested.
827An RE like, say, 829An RE like, say,
828.Ql "((((a{1,100}){1,100}){1,100}){1,100}){1,100}" 830.Ql "((((a{1,100}){1,100}){1,100}){1,100}){1,100}"
829will (eventually) run almost any existing machine out of swap space. 831will (eventually) run almost any existing machine out of swap space.
830.Pp 832.Pp
831There are suspected problems with response to obscure error conditions. 833There are suspected problems with response to obscure error conditions.
832Notably, 834Notably,
833certain kinds of internal overflow, 835certain kinds of internal overflow,
834produced only by truly enormous REs or by multiply nested bounded repetitions, 836produced only by truly enormous REs or by multiply nested bounded repetitions,
835are probably not handled well. 837are probably not handled well.
836.Pp 838.Pp
837Due to a mistake in 839Due to a mistake in
838.St -p1003.2 , 840.St -p1003.2 ,
839things like 841things like
840.Ql "a)b" 842.Ql "a)b"
841are legal REs because 843are legal REs because
842.Ql )\& 844.Ql )\&
843is 845is
844a special character only in the presence of a previous unmatched 846a special character only in the presence of a previous unmatched
845.Ql (\& . 847.Ql (\& .
846This cannot be fixed until the spec is fixed. 848This cannot be fixed until the spec is fixed.
847.Pp 849.Pp
848The standard's definition of back references is vague. 850The standard's definition of back references is vague.
849For example, does 851For example, does
850.Ql "a\e(\e(b\e)*\e2\e)*d" 852.Ql "a\e(\e(b\e)*\e2\e)*d"
851match 853match
852.Ql "abbbd" ? 854.Ql "abbbd" ?
853Until the standard is clarified, 855Until the standard is clarified,
854behavior in such cases should not be relied on. 856behavior in such cases should not be relied on.
855.Pp 857.Pp
856The implementation of word-boundary matching is a bit of a kludge, 858The implementation of word-boundary matching is a bit of a kludge,
857and bugs may lurk in combinations of word-boundary matching and anchoring. 859and bugs may lurk in combinations of word-boundary matching and anchoring.
858.Pp 860.Pp
859Word-boundary matching does not work properly in multibyte locales. 861Word-boundary matching does not work properly in multibyte locales.