Thu Mar 11 15:12:51 2021 UTC
Document the "C" language escapes supported in GNU mode.


(christos)
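
A minimal sketch of what the newly documented escapes allow, not taken from the commit itself. The helper name has_tab_separated_pair is hypothetical. The sketch assumes REG_GNU is visible as a preprocessor macro in regex.h and may be combined with REG_EXTENDED; on systems that lack REG_GNU it falls back to letting the C compiler place a literal tab in the pattern.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if `line` consists of two alphanumeric
 * fields separated by exactly one tab, 0 if it does not, and -1 if the
 * pattern fails to compile.
 */
static int
has_tab_separated_pair(const char *line)
{
	regex_t re;
	int rc;

#ifdef REG_GNU
	/* Assumed GNU mode: the regex engine itself interprets "\t". */
	const char *pattern = "^[[:alnum:]]+\\t[[:alnum:]]+$";
	int cflags = REG_EXTENDED | REG_NOSUB | REG_GNU;
#else
	/* Portable fallback: the C compiler turns "\t" into a literal tab. */
	const char *pattern = "^[[:alnum:]]+\t[[:alnum:]]+$";
	int cflags = REG_EXTENDED | REG_NOSUB;
#endif

	rc = regcomp(&re, pattern, cflags);
	if (rc != 0)
		return -1;

	/* REG_NOSUB was given, so nmatch is 0 and pmatch is ignored. */
	rc = regexec(&re, line, 0, NULL, 0);
	regfree(&re);

	return rc == 0 ? 1 : 0;
}
```

Calling has_tab_separated_pair("foo\tbar") should report a match and has_tab_separated_pair("foo bar") should not. The escape is kept outside bracket expressions because the manual text in this diff does not say how escapes behave inside them.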
diff -r1.28 -r1.29 src/lib/libc/regex/regex.3


--- src/lib/libc/regex/regex.3 2021/02/24 09:10:12 1.28
+++ src/lib/libc/regex/regex.3 2021/03/11 15:12:51 1.29
@@ -1,847 +1,859 @@ @@ -1,847 +1,859 @@
1.\" $NetBSD: regex.3,v 1.28 2021/02/24 09:10:12 wiz Exp $ 1.\" $NetBSD: regex.3,v 1.29 2021/03/11 15:12:51 christos Exp $
2.\" 2.\"
3.\" Copyright (c) 1992, 1993, 1994 Henry Spencer. 3.\" Copyright (c) 1992, 1993, 1994 Henry Spencer.
4.\" Copyright (c) 1992, 1993, 1994 4.\" Copyright (c) 1992, 1993, 1994
5.\" The Regents of the University of California. All rights reserved. 5.\" The Regents of the University of California. All rights reserved.
6.\" 6.\"
7.\" This code is derived from software contributed to Berkeley by 7.\" This code is derived from software contributed to Berkeley by
8.\" Henry Spencer. 8.\" Henry Spencer.
9.\" 9.\"
10.\" Redistribution and use in source and binary forms, with or without 10.\" Redistribution and use in source and binary forms, with or without
11.\" modification, are permitted provided that the following conditions 11.\" modification, are permitted provided that the following conditions
12.\" are met: 12.\" are met:
13.\" 1. Redistributions of source code must retain the above copyright 13.\" 1. Redistributions of source code must retain the above copyright
14.\" notice, this list of conditions and the following disclaimer. 14.\" notice, this list of conditions and the following disclaimer.
15.\" 2. Redistributions in binary form must reproduce the above copyright 15.\" 2. Redistributions in binary form must reproduce the above copyright
16.\" notice, this list of conditions and the following disclaimer in the 16.\" notice, this list of conditions and the following disclaimer in the
17.\" documentation and/or other materials provided with the distribution. 17.\" documentation and/or other materials provided with the distribution.
18.\" 3. Neither the name of the University nor the names of its contributors 18.\" 3. Neither the name of the University nor the names of its contributors
19.\" may be used to endorse or promote products derived from this software 19.\" may be used to endorse or promote products derived from this software
20.\" without specific prior written permission. 20.\" without specific prior written permission.
21.\" 21.\"
22.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND 22.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
23.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 23.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
24.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE 24.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
25.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE 25.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
26.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 26.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
27.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS 27.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
28.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) 28.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
29.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT 29.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
30.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY 30.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
31.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF 31.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
32.\" SUCH DAMAGE. 32.\" SUCH DAMAGE.
33.\" 33.\"
34.\" @(#)regex.3 8.4 (Berkeley) 3/20/94 34.\" @(#)regex.3 8.4 (Berkeley) 3/20/94
35.\" $FreeBSD: head/lib/libc/regex/regex.3 363817 2020-08-04 02:06:49Z kevans $ 35.\" $FreeBSD: head/lib/libc/regex/regex.3 363817 2020-08-04 02:06:49Z kevans $
36.\" 36.\"
37.Dd February 22, 2021 37.Dd March 11, 2021
38.Dt REGEX 3 38.Dt REGEX 3
39.Os 39.Os
40.Sh NAME 40.Sh NAME
41.Nm regcomp , 41.Nm regcomp ,
42.Nm regexec , 42.Nm regexec ,
43.Nm regerror , 43.Nm regerror ,
44.Nm regfree , 44.Nm regfree ,
45.Nm regasub , 45.Nm regasub ,
46.Nm regnsub 46.Nm regnsub
47.Nd regular-expression library 47.Nd regular-expression library
48.Sh LIBRARY 48.Sh LIBRARY
49.Lb libc 49.Lb libc
50.Sh SYNOPSIS 50.Sh SYNOPSIS
51.In regex.h 51.In regex.h
52.Ft int 52.Ft int
53.Fo regcomp 53.Fo regcomp
54.Fa "regex_t * restrict preg" "const char * restrict pattern" "int cflags" 54.Fa "regex_t * restrict preg" "const char * restrict pattern" "int cflags"
55.Fc 55.Fc
56.Ft int 56.Ft int
57.Fo regexec 57.Fo regexec
58.Fa "const regex_t * restrict preg" "const char * restrict string" 58.Fa "const regex_t * restrict preg" "const char * restrict string"
59.Fa "size_t nmatch" "regmatch_t pmatch[restrict]" "int eflags" 59.Fa "size_t nmatch" "regmatch_t pmatch[restrict]" "int eflags"
60.Fc 60.Fc
61.Ft size_t 61.Ft size_t
62.Fo regerror 62.Fo regerror
63.Fa "int errcode" "const regex_t * restrict preg" 63.Fa "int errcode" "const regex_t * restrict preg"
64.Fa "char * restrict errbuf" "size_t errbuf_size" 64.Fa "char * restrict errbuf" "size_t errbuf_size"
65.Fc 65.Fc
66.Ft void 66.Ft void
67.Fn regfree "regex_t *preg" 67.Fn regfree "regex_t *preg"
68.Ft ssize_t 68.Ft ssize_t
69.Fn regnsub "char *buf" "size_t bufsiz" "const char *sub" "const regmatch_t *rm" "const char *str" 69.Fn regnsub "char *buf" "size_t bufsiz" "const char *sub" "const regmatch_t *rm" "const char *str"
70.Ft ssize_t 70.Ft ssize_t
71.Fn regasub "char **buf" "const char *sub" "const regmatch_t *rm" "const char *sstr" 71.Fn regasub "char **buf" "const char *sub" "const regmatch_t *rm" "const char *sstr"
72.Sh DESCRIPTION 72.Sh DESCRIPTION
73These routines implement 73These routines implement
74.St -p1003.2 74.St -p1003.2
75regular expressions 75regular expressions
76.Pq Do RE Dc Ns s ; 76.Pq Do RE Dc Ns s ;
77see 77see
78.Xr re_format 7 . 78.Xr re_format 7 .
79The 79The
80.Fn regcomp 80.Fn regcomp
81function 81function
82compiles an RE written as a string into an internal form, 82compiles an RE written as a string into an internal form,
83.Fn regexec 83.Fn regexec
84matches that internal form against a string and reports results, 84matches that internal form against a string and reports results,
85.Fn regerror 85.Fn regerror
86transforms error codes from either into human-readable messages, 86transforms error codes from either into human-readable messages,
87and 87and
88.Fn regfree 88.Fn regfree
89frees any dynamically-allocated storage used by the internal form 89frees any dynamically-allocated storage used by the internal form
90of an RE. 90of an RE.
91.Pp 91.Pp
92The header 92The header
93.In regex.h 93.In regex.h
94declares two structure types, 94declares two structure types,
95.Ft regex_t 95.Ft regex_t
96and 96and
97.Ft regmatch_t , 97.Ft regmatch_t ,
98the former for compiled internal forms and the latter for match reporting. 98the former for compiled internal forms and the latter for match reporting.
99It also declares the four functions, 99It also declares the four functions,
100a type 100a type
101.Ft regoff_t , 101.Ft regoff_t ,
102and a number of constants with names starting with 102and a number of constants with names starting with
103.Dq Dv REG_ . 103.Dq Dv REG_ .
104.Pp 104.Pp
105The 105The
106.Fn regcomp 106.Fn regcomp
107function 107function
108compiles the regular expression contained in the 108compiles the regular expression contained in the
109.Fa pattern 109.Fa pattern
110string, 110string,
111subject to the flags in 111subject to the flags in
112.Fa cflags , 112.Fa cflags ,
113and places the results in the 113and places the results in the
114.Ft regex_t 114.Ft regex_t
115structure pointed to by 115structure pointed to by
116.Fa preg . 116.Fa preg .
117The 117The
118.Fa cflags 118.Fa cflags
119argument 119argument
120is the bitwise OR of zero or more of the following flags: 120is the bitwise OR of zero or more of the following flags:
121.Bl -tag -width REG_EXTENDED 121.Bl -tag -width REG_EXTENDED
122.It Dv REG_EXTENDED 122.It Dv REG_EXTENDED
123Compile modern 123Compile modern
124.Pq Dq extended 124.Pq Dq extended
125REs, 125REs,
126rather than the obsolete 126rather than the obsolete
127.Pq Dq basic 127.Pq Dq basic
128REs that 128REs that
129are the default. 129are the default.
130.It Dv REG_BASIC 130.It Dv REG_BASIC
131This is a synonym for 0, 131This is a synonym for 0,
132provided as a counterpart to 132provided as a counterpart to
133.Dv REG_EXTENDED 133.Dv REG_EXTENDED
134to improve readability. 134to improve readability.
135.It Dv REG_NOSPEC 135.It Dv REG_NOSPEC
136Compile with recognition of all special characters turned off. 136Compile with recognition of all special characters turned off.
137All characters are thus considered ordinary, 137All characters are thus considered ordinary,
138so the 138so the
139.Dq RE 139.Dq RE
140is a literal string. 140is a literal string.
141This is an extension, 141This is an extension,
142compatible with but not specified by 142compatible with but not specified by
143.St -p1003.2 , 143.St -p1003.2 ,
144and should be used with 144and should be used with
145caution in software intended to be portable to other systems. 145caution in software intended to be portable to other systems.
146.Dv REG_EXTENDED 146.Dv REG_EXTENDED
147and 147and
148.Dv REG_NOSPEC 148.Dv REG_NOSPEC
149may not be used 149may not be used
150in the same call to 150in the same call to
151.Fn regcomp . 151.Fn regcomp .
152.It Dv REG_ICASE 152.It Dv REG_ICASE
153Compile for matching that ignores upper/lower case distinctions. 153Compile for matching that ignores upper/lower case distinctions.
154See 154See
155.Xr re_format 7 . 155.Xr re_format 7 .
156.It Dv REG_NOSUB 156.It Dv REG_NOSUB
157Compile for matching that need only report success or failure, 157Compile for matching that need only report success or failure,
158not what was matched. 158not what was matched.
159.It Dv REG_NEWLINE 159.It Dv REG_NEWLINE
160Compile for newline-sensitive matching. 160Compile for newline-sensitive matching.
161By default, newline is a completely ordinary character with no special 161By default, newline is a completely ordinary character with no special
162meaning in either REs or strings. 162meaning in either REs or strings.
163With this flag, 163With this flag,
164.Ql [^ 164.Ql [^
165bracket expressions and 165bracket expressions and
166.Ql .\& 166.Ql .\&
167never match newline, 167never match newline,
168a 168a
169.Ql ^\& 169.Ql ^\&
170anchor matches the null string after any newline in the string 170anchor matches the null string after any newline in the string
171in addition to its normal function, 171in addition to its normal function,
172and the 172and the
173.Ql $\& 173.Ql $\&
174anchor matches the null string before any newline in the 174anchor matches the null string before any newline in the
175string in addition to its normal function. 175string in addition to its normal function.
176.It Dv REG_PEND 176.It Dv REG_PEND
177The regular expression ends, 177The regular expression ends,
178not at the first NUL, 178not at the first NUL,
179but just before the character pointed to by the 179but just before the character pointed to by the
180.Va re_endp 180.Va re_endp
181member of the structure pointed to by 181member of the structure pointed to by
182.Fa preg . 182.Fa preg .
183The 183The
184.Va re_endp 184.Va re_endp
185member is of type 185member is of type
186.Ft "const char *" . 186.Ft "const char *" .
187This flag permits inclusion of NULs in the RE; 187This flag permits inclusion of NULs in the RE;
188they are considered ordinary characters. 188they are considered ordinary characters.
189This is an extension, 189This is an extension,
190compatible with but not specified by 190compatible with but not specified by
191.St -p1003.2 , 191.St -p1003.2 ,
192and should be used with 192and should be used with
193caution in software intended to be portable to other systems. 193caution in software intended to be portable to other systems.
194.It Dv REG_GNU 194.It Dv REG_GNU
195Include GNU-inspired extensions: 195Include GNU-inspired extensions:
196.Pp 196.Pp
197.Bl -tag -offset indent -width XX -compact  197.Bl -tag -offset indent -width XX -compact
198.It \eN 198.It \eN
199Use backreference 199Use backreference
200.Dv N 200.Dv N
201where 201where
202.Dv N 202.Dv N
203is between 203is between
204.Dv [1-9] . 204.Dv [1-9] .
 205.It \ea
 206Visual Bell
205.It \eb 207.It \eb
206Match a position that is a word boundary. 208Match a position that is a word boundary.
207.It \eB 209.It \eB
208Match a position that is not a word boundary. 210Match a position that is not a word boundary.
 211.It \ef
 212Form Feed
 213.It \en
 214Line Feed
 215.It \er
 216Carriage return
209.It \es 217.It \es
210Alias for [[:space:]] 218Alias for [[:space:]]
211.It \eS 219.It \eS
212Alias for [^[:space:]] 220Alias for [^[:space:]]
 221.It \et
 222Horizontal Tab
 223.It \ev
 224Vertical Tab
213.It \ew 225.It \ew
214Alias for [[:alnum:]] 226Alias for [[:alnum:]]
215.It \eW 227.It \eW
216Alias for [^[:alnum:]] 228Alias for [^[:alnum:]]
217.It \e' 229.It \e'
218Matches the end of the subject. 230Matches the end of the subject.
219.It \e` 231.It \e`
220Matches the beginning of the subject. 232Matches the beginning of the subject.
221.El 233.El
222.Pp 234.Pp
223This is an extension, 235This is an extension,
224compatible with but not specified by 236compatible with but not specified by
225.St -p1003.2 , 237.St -p1003.2 ,
226and should be used with 238and should be used with
227caution in software intended to be portable to other systems. 239caution in software intended to be portable to other systems.
228.El 240.El
229.Pp 241.Pp
230When successful, 242When successful,
231.Fn regcomp 243.Fn regcomp
232returns 0 and fills in the structure pointed to by 244returns 0 and fills in the structure pointed to by
233.Fa preg . 245.Fa preg .
234One member of that structure 246One member of that structure
235(other than 247(other than
236.Va re_endp ) 248.Va re_endp )
237is publicized: 249is publicized:
238.Va re_nsub , 250.Va re_nsub ,
239of type 251of type
240.Ft size_t , 252.Ft size_t ,
241contains the number of parenthesized subexpressions within the RE 253contains the number of parenthesized subexpressions within the RE
242(except that the value of this member is undefined if the 254(except that the value of this member is undefined if the
243.Dv REG_NOSUB 255.Dv REG_NOSUB
244flag was used). 256flag was used).
245If 257If
246.Fn regcomp 258.Fn regcomp
247fails, it returns a non-zero error code; 259fails, it returns a non-zero error code;
248see 260see
249.Sx DIAGNOSTICS . 261.Sx DIAGNOSTICS .
250.Pp 262.Pp
251The 263The
252.Fn regexec 264.Fn regexec
253function 265function
254matches the compiled RE pointed to by 266matches the compiled RE pointed to by
255.Fa preg 267.Fa preg
256against the 268against the
257.Fa string , 269.Fa string ,
258subject to the flags in 270subject to the flags in
259.Fa eflags , 271.Fa eflags ,
260and reports results using 272and reports results using
261.Fa nmatch , 273.Fa nmatch ,
262.Fa pmatch , 274.Fa pmatch ,
263and the returned value. 275and the returned value.
264The RE must have been compiled by a previous invocation of 276The RE must have been compiled by a previous invocation of
265.Fn regcomp . 277.Fn regcomp .
266The compiled form is not altered during execution of 278The compiled form is not altered during execution of
267.Fn regexec , 279.Fn regexec ,
268so a single compiled RE can be used simultaneously by multiple threads. 280so a single compiled RE can be used simultaneously by multiple threads.
269.Pp 281.Pp
270By default, 282By default,
271the NUL-terminated string pointed to by 283the NUL-terminated string pointed to by
272.Fa string 284.Fa string
273is considered to be the text of an entire line, minus any terminating 285is considered to be the text of an entire line, minus any terminating
274newline. 286newline.
275The 287The
276.Fa eflags 288.Fa eflags
277argument is the bitwise OR of zero or more of the following flags: 289argument is the bitwise OR of zero or more of the following flags:
278.Bl -tag -width REG_STARTEND 290.Bl -tag -width REG_STARTEND
279.It Dv REG_NOTBOL 291.It Dv REG_NOTBOL
280The first character of the string is treated as the continuation 292The first character of the string is treated as the continuation
281of a line. 293of a line.
282This means that the anchors 294This means that the anchors
283.Ql ^\& , 295.Ql ^\& ,
284.Ql [[:<:]] , 296.Ql [[:<:]] ,
285and 297and
286.Ql \e< 298.Ql \e<
287do not match before it; but see 299do not match before it; but see
288.Dv REG_STARTEND 300.Dv REG_STARTEND
289below. 301below.
290This does not affect the behavior of newlines under 302This does not affect the behavior of newlines under
291.Dv REG_NEWLINE . 303.Dv REG_NEWLINE .
292.It Dv REG_NOTEOL 304.It Dv REG_NOTEOL
293The NUL terminating 305The NUL terminating
294the string 306the string
295does not end a line, so the 307does not end a line, so the
296.Ql $\& 308.Ql $\&
297anchor does not match before it. 309anchor does not match before it.
298This does not affect the behavior of newlines under 310This does not affect the behavior of newlines under
299.Dv REG_NEWLINE . 311.Dv REG_NEWLINE .
300.It Dv REG_STARTEND 312.It Dv REG_STARTEND
301The string is considered to start at 313The string is considered to start at
302.Fa string No + 314.Fa string No +
303.Fa pmatch Ns [0]. Ns Fa rm_so 315.Fa pmatch Ns [0]. Ns Fa rm_so
304and to end before the byte located at 316and to end before the byte located at
305.Fa string No + 317.Fa string No +
306.Fa pmatch Ns [0]. Ns Fa rm_eo , 318.Fa pmatch Ns [0]. Ns Fa rm_eo ,
307regardless of the value of 319regardless of the value of
308.Fa nmatch . 320.Fa nmatch .
309See below for the definition of 321See below for the definition of
310.Fa pmatch 322.Fa pmatch
311and 323and
312.Fa nmatch . 324.Fa nmatch .
313This is an extension, 325This is an extension,
314compatible with but not specified by 326compatible with but not specified by
315.St -p1003.2 , 327.St -p1003.2 ,
316and should be used with 328and should be used with
317caution in software intended to be portable to other systems. 329caution in software intended to be portable to other systems.
318.Pp 330.Pp
319Without 331Without
320.Dv REG_NOTBOL , 332.Dv REG_NOTBOL ,
321the position 333the position
322.Fa rm_so 334.Fa rm_so
323is considered the beginning of a line, such that 335is considered the beginning of a line, such that
324.Ql ^ 336.Ql ^
325matches before it, and the beginning of a word if there is a word 337matches before it, and the beginning of a word if there is a word
326character at this position, such that 338character at this position, such that
327.Ql [[:<:]] 339.Ql [[:<:]]
328and 340and
329.Ql \e< 341.Ql \e<
330match before it. 342match before it.
331.Pp 343.Pp
332With 344With
333.Dv REG_NOTBOL , 345.Dv REG_NOTBOL ,
334the character at position 346the character at position
335.Fa rm_so 347.Fa rm_so
336is treated as the continuation of a line, and if 348is treated as the continuation of a line, and if
337.Fa rm_so 349.Fa rm_so
338is greater than 0, the preceding character is taken into consideration. 350is greater than 0, the preceding character is taken into consideration.
339If the preceding character is a newline and the regular expression was compiled 351If the preceding character is a newline and the regular expression was compiled
340with 352with
341.Dv REG_NEWLINE , 353.Dv REG_NEWLINE ,
342.Ql ^ 354.Ql ^
343matches before the string; if the preceding character is not a word character 355matches before the string; if the preceding character is not a word character
344but the string starts with a word character, 356but the string starts with a word character,
345.Ql [[:<:]] 357.Ql [[:<:]]
346and 358and
347.Ql \e< 359.Ql \e<
348match before the string. 360match before the string.
349.El 361.El
350.Pp 362.Pp
351See 363See
352.Xr re_format 7 364.Xr re_format 7
353for a discussion of what is matched in situations where an RE or a 365for a discussion of what is matched in situations where an RE or a
354portion thereof could match any of several substrings of 366portion thereof could match any of several substrings of
355.Fa string . 367.Fa string .
356.Pp 368.Pp
357Normally, 369Normally,
358.Fn regexec 370.Fn regexec
359returns 0 for success and the non-zero code 371returns 0 for success and the non-zero code
360.Dv REG_NOMATCH 372.Dv REG_NOMATCH
361for failure. 373for failure.
362Other non-zero error codes may be returned in exceptional situations; 374Other non-zero error codes may be returned in exceptional situations;
363see 375see
364.Sx DIAGNOSTICS . 376.Sx DIAGNOSTICS .
365.Pp 377.Pp
366If 378If
367.Dv REG_NOSUB 379.Dv REG_NOSUB
368was specified in the compilation of the RE, 380was specified in the compilation of the RE,
369or if 381or if
370.Fa nmatch 382.Fa nmatch
371is 0, 383is 0,
372.Fn regexec 384.Fn regexec
373ignores the 385ignores the
374.Fa pmatch 386.Fa pmatch
375argument (but see below for the case where 387argument (but see below for the case where
376.Dv REG_STARTEND 388.Dv REG_STARTEND
377is specified). 389is specified).
378Otherwise, 390Otherwise,
379.Fa pmatch 391.Fa pmatch
380points to an array of 392points to an array of
381.Fa nmatch 393.Fa nmatch
382structures of type 394structures of type
383.Ft regmatch_t . 395.Ft regmatch_t .
384Such a structure has at least the members 396Such a structure has at least the members
385.Va rm_so 397.Va rm_so
386and 398and
387.Va rm_eo , 399.Va rm_eo ,
388both of type 400both of type
389.Ft regoff_t 401.Ft regoff_t
390(a signed arithmetic type at least as large as an 402(a signed arithmetic type at least as large as an
391.Ft off_t 403.Ft off_t
392and a 404and a
393.Ft ssize_t ) , 405.Ft ssize_t ) ,
394containing respectively the offset of the first character of a substring 406containing respectively the offset of the first character of a substring
395and the offset of the first character after the end of the substring. 407and the offset of the first character after the end of the substring.
396Offsets are measured from the beginning of the 408Offsets are measured from the beginning of the
397.Fa string 409.Fa string
398argument given to 410argument given to
399.Fn regexec . 411.Fn regexec .
400An empty substring is denoted by equal offsets, 412An empty substring is denoted by equal offsets,
401both indicating the character following the empty substring. 413both indicating the character following the empty substring.
402.Pp 414.Pp
403The 0th member of the 415The 0th member of the
404.Fa pmatch 416.Fa pmatch
405array is filled in to indicate what substring of 417array is filled in to indicate what substring of
406.Fa string 418.Fa string
407was matched by the entire RE. 419was matched by the entire RE.
408Remaining members report what substring was matched by parenthesized 420Remaining members report what substring was matched by parenthesized
409subexpressions within the RE; 421subexpressions within the RE;
410member 422member
411.Va i 423.Va i
412reports subexpression 424reports subexpression
413.Va i , 425.Va i ,
414with subexpressions counted (starting at 1) by the order of their opening 426with subexpressions counted (starting at 1) by the order of their opening
415parentheses in the RE, left to right. 427parentheses in the RE, left to right.
416Unused entries in the array (corresponding either to subexpressions that 428Unused entries in the array (corresponding either to subexpressions that
417did not participate in the match at all, or to subexpressions that do not 429did not participate in the match at all, or to subexpressions that do not
418exist in the RE (that is, 430exist in the RE (that is,
419.Va i 431.Va i
420> 432>
421.Fa preg Ns -> Ns Va re_nsub ) ) 433.Fa preg Ns -> Ns Va re_nsub ) )
422have both 434have both
423.Va rm_so 435.Va rm_so
424and 436and
425.Va rm_eo 437.Va rm_eo
426set to -1. 438set to -1.
427If a subexpression participated in the match several times, 439If a subexpression participated in the match several times,
428the reported substring is the last one it matched. 440the reported substring is the last one it matched.
429(Note, as an example in particular, that when the RE 441(Note, as an example in particular, that when the RE
430.Ql "(b*)+" 442.Ql "(b*)+"
431matches 443matches
432.Ql bbb , 444.Ql bbb ,
433the parenthesized subexpression matches each of the three 445the parenthesized subexpression matches each of the three
434.So Li b Sc Ns s 446.So Li b Sc Ns s
435and then 447and then
436an infinite number of empty strings following the last 448an infinite number of empty strings following the last
437.Ql b , 449.Ql b ,
438so the reported substring is one of the empties.) 450so the reported substring is one of the empties.)
439.Pp 451.Pp
440If 452If
441.Dv REG_STARTEND 453.Dv REG_STARTEND
442is specified, 454is specified,
443.Fa pmatch 455.Fa pmatch
444must point to at least one 456must point to at least one
445.Ft regmatch_t 457.Ft regmatch_t
446(even if 458(even if
447.Fa nmatch 459.Fa nmatch
448is 0 or 460is 0 or
449.Dv REG_NOSUB 461.Dv REG_NOSUB
450was specified), 462was specified),
451to hold the input offsets for 463to hold the input offsets for
452.Dv REG_STARTEND . 464.Dv REG_STARTEND .
453Use for output is still entirely controlled by 465Use for output is still entirely controlled by
454.Fa nmatch ; 466.Fa nmatch ;
455if 467if
456.Fa nmatch 468.Fa nmatch
457is 0 or 469is 0 or
458.Dv REG_NOSUB 470.Dv REG_NOSUB
459was specified, 471was specified,
460the value of 472the value of
461.Fa pmatch Ns [0] 473.Fa pmatch Ns [0]
462will not be changed by a successful 474will not be changed by a successful
463.Fn regexec . 475.Fn regexec .
464.Pp 476.Pp
465The 477The
466.Fn regerror 478.Fn regerror
467function 479function
468maps a non-zero 480maps a non-zero
469.Fa errcode 481.Fa errcode
470from either 482from either
471.Fn regcomp 483.Fn regcomp
472or 484or
473.Fn regexec 485.Fn regexec
474to a human-readable, printable message. 486to a human-readable, printable message.
475If 487If
476.Fa preg 488.Fa preg
477is 489is
478.No non\- Ns Dv NULL , 490.No non\- Ns Dv NULL ,
479the error code should have arisen from use of 491the error code should have arisen from use of
480the 492the
481.Ft regex_t 493.Ft regex_t
482pointed to by 494pointed to by
483.Fa preg , 495.Fa preg ,
484and if the error code came from 496and if the error code came from
485.Fn regcomp , 497.Fn regcomp ,
486it should have been the result from the most recent 498it should have been the result from the most recent
487.Fn regcomp 499.Fn regcomp
488using that 500using that
489.Ft regex_t . 501.Ft regex_t .
490The 502The
491.Po 503.Po
492.Fn regerror 504.Fn regerror
493may be able to supply a more detailed message using information 505may be able to supply a more detailed message using information
494from the 506from the
495.Ft regex_t . 507.Ft regex_t .
496.Pc 508.Pc
497The 509The
498.Fn regerror 510.Fn regerror
499function 511function
500places the NUL-terminated message into the buffer pointed to by 512places the NUL-terminated message into the buffer pointed to by
501.Fa errbuf , 513.Fa errbuf ,
502limiting the length (including the NUL) to at most 514limiting the length (including the NUL) to at most
503.Fa errbuf_size 515.Fa errbuf_size
504bytes. 516bytes.
505If the whole message will not fit, 517If the whole message will not fit,
506as much of it as will fit before the terminating NUL is supplied. 518as much of it as will fit before the terminating NUL is supplied.
507In any case, 519In any case,
508the returned value is the size of buffer needed to hold the whole 520the returned value is the size of buffer needed to hold the whole
509message (including terminating NUL). 521message (including terminating NUL).
510If 522If
511.Fa errbuf_size 523.Fa errbuf_size
512is 0, 524is 0,
513.Fa errbuf 525.Fa errbuf
514is ignored but the return value is still correct. 526is ignored but the return value is still correct.
515.Pp 527.Pp
516If the 528If the
517.Fa errcode 529.Fa errcode
518given to 530given to
519.Fn regerror 531.Fn regerror
520is first ORed with 532is first ORed with
521.Dv REG_ITOA , 533.Dv REG_ITOA ,
522the 534the
523.Dq message 535.Dq message
524that results is the printable name of the error code, 536that results is the printable name of the error code,
525e.g.\& 537e.g.\&
526.Dq Dv REG_NOMATCH , 538.Dq Dv REG_NOMATCH ,
527rather than an explanation thereof. 539rather than an explanation thereof.
528If 540If
529.Fa errcode 541.Fa errcode
530is 542is
531.Dv REG_ATOI , 543.Dv REG_ATOI ,
532then 544then
533.Fa preg 545.Fa preg
534shall be 546shall be
535.No non\- Ns Dv NULL 547.No non\- Ns Dv NULL
536and the 548and the
537.Va re_endp 549.Va re_endp
538member of the structure it points to 550member of the structure it points to
539must point to the printable name of an error code; 551must point to the printable name of an error code;
540in this case, the result in 552in this case, the result in
541.Fa errbuf 553.Fa errbuf
542is the decimal digits of 554is the decimal digits of
543the numeric value of the error code 555the numeric value of the error code
544(0 if the name is not recognized). 556(0 if the name is not recognized).
545.Dv REG_ITOA 557.Dv REG_ITOA
546and 558and
547.Dv REG_ATOI 559.Dv REG_ATOI
548are intended primarily as debugging facilities; 560are intended primarily as debugging facilities;
549they are extensions, 561they are extensions,
550compatible with but not specified by 562compatible with but not specified by
551.St -p1003.2 , 563.St -p1003.2 ,
552and should be used with 564and should be used with
553caution in software intended to be portable to other systems. 565caution in software intended to be portable to other systems.
554Be warned also that they are considered experimental and changes are possible. 566Be warned also that they are considered experimental and changes are possible.
555.Pp 567.Pp
556The 568The
557.Fn regfree 569.Fn regfree
558function 570function
559frees any dynamically-allocated storage associated with the compiled RE 571frees any dynamically-allocated storage associated with the compiled RE
560pointed to by 572pointed to by
561.Fa preg . 573.Fa preg .
562The remaining 574The remaining
563.Ft regex_t 575.Ft regex_t
564is no longer a valid compiled RE 576is no longer a valid compiled RE
565and the effect of supplying it to 577and the effect of supplying it to
566.Fn regexec 578.Fn regexec
567or 579or
568.Fn regerror 580.Fn regerror
569is undefined. 581is undefined.
570.Pp 582.Pp
571None of these functions references global variables except for tables 583None of these functions references global variables except for tables
572of constants; 584of constants;
573all are safe for use from multiple threads if the arguments are safe. 585all are safe for use from multiple threads if the arguments are safe.
574.Pp 586.Pp
575The 587The
576.Fn regnsub 588.Fn regnsub
577and 589and
578.Fn regasub 590.Fn regasub
579functions perform substitutions using 591functions perform substitutions using
580.Xr sed 1 592.Xr sed 1
581like syntax. 593like syntax.
582They return the length of the string that would have been created 594They return the length of the string that would have been created
583if there was enough space or 595if there was enough space or
584.Dv \-1 596.Dv \-1
585on error, setting 597on error, setting
586.Dv errno . 598.Dv errno .
587The result 599The result
588is placed in 600is placed in
589.Fa buf 601.Fa buf
590which is user-supplied in 602which is user-supplied in
591.Fn regnsub 603.Fn regnsub
592and dynamically allocated in 604and dynamically allocated in
593.Fn regasub . 605.Fn regasub .
594The 606The
595.Fa sub 607.Fa sub
596argument contains a substitution string that may refer to the first 608argument contains a substitution string that may refer to the first
5979 matched subexpressions using 6099 matched subexpressions using
598.Dq \e<n> 610.Dq \e<n>
599to refer to the nth matched 611to refer to the nth matched
600item, or 612item, or
601.Dq & 613.Dq &
602(which is equivalent to 614(which is equivalent to
603.Dq \e0 ) 615.Dq \e0 )
604to refer to the full match. 616to refer to the full match.
605The 617The
606.Fa rm 618.Fa rm
607array must be at least 10 elements long, and should contain the result 619array must be at least 10 elements long, and should contain the result
608of the matches from a previous 620of the matches from a previous
609.Fn regexec 621.Fn regexec
610call. 622call.
611Only 10 elements of the 623Only 10 elements of the
612.Fa rm 624.Fa rm
613array can be used. 625array can be used.
614The 626The
615.Fa str 627.Fa str
616argument contains the source string to apply the transformation to. 628argument contains the source string to apply the transformation to.
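The expansion rules described above can be illustrated with a small portable sketch. It does not call regnsub() or regasub(); expand_sub() and sub_first_match() are hypothetical helpers written for this illustration, and their handling of a backslash that is not followed by a digit (only "\\" and "\&" are unescaped) is an assumption rather than documented behavior. Like the functions described above, expand_sub() returns the length the result would have had given enough space.

```c
#include <assert.h>
#include <ctype.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Append one character, counting it even when it no longer fits. */
static void
put(char *buf, size_t bufsiz, size_t *len, char c)
{
	if (*len + 1 < bufsiz)		/* keep room for the terminating NUL */
		buf[*len] = c;
	(*len)++;
}

/*
 * Hypothetical helper (not part of libc): expand "sub" into "buf",
 * replacing \0..\9 and & with the text matched by rm[0..9] in "str".
 * Returns the untruncated length of the expansion.
 */
ssize_t
expand_sub(char *buf, size_t bufsiz, const char *sub,
    const regmatch_t *rm, const char *str)
{
	size_t len = 0;

	assert(sub != NULL && rm != NULL && str != NULL);
	for (char c; (c = *sub++) != '\0';) {
		int i = -1;		/* index into rm, or -1 for a literal */

		if (c == '&')
			i = 0;
		else if (c == '\\' && isdigit((unsigned char)*sub))
			i = *sub++ - '0';
		else if (c == '\\' && (*sub == '\\' || *sub == '&'))
			c = *sub++;	/* assumed escape for a literal \ or & */

		if (i == -1) {
			put(buf, bufsiz, &len, c);
		} else if (rm[i].rm_so != -1 && rm[i].rm_eo != -1) {
			for (regoff_t j = rm[i].rm_so; j < rm[i].rm_eo; j++)
				put(buf, bufsiz, &len, str[j]);
		}
	}
	if (bufsiz > 0)
		buf[len < bufsiz ? len : bufsiz - 1] = '\0';
	return (ssize_t)len;
}

/*
 * Hypothetical driver: compile "pattern" as an extended RE, match it
 * against "str" with a 10-element rm array, and expand "sub".
 * Returns -1 if the pattern does not compile or does not match.
 */
ssize_t
sub_first_match(const char *pattern, const char *str, const char *sub,
    char *buf, size_t bufsiz)
{
	regex_t re;
	regmatch_t rm[10];
	ssize_t n = -1;

	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return -1;
	if (regexec(&re, str, 10, rm, 0) == 0)
		n = expand_sub(buf, bufsiz, sub, rm, str);
	regfree(&re);
	return n;
}
```

Note that only the substitution string is expanded; text in the source string outside the match is not copied to the result.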
617.Sh IMPLEMENTATION CHOICES 629.Sh IMPLEMENTATION CHOICES
618There are a number of decisions that 630There are a number of decisions that
619.St -p1003.2 631.St -p1003.2
620leaves up to the implementor, 632leaves up to the implementor,
621either by explicitly saying 633either by explicitly saying
622.Dq undefined 634.Dq undefined
623or by virtue of them being 635or by virtue of them being
624forbidden by the RE grammar. 636forbidden by the RE grammar.
625This implementation treats them as follows. 637This implementation treats them as follows.
626.Pp 638.Pp
627See 639See
628.Xr re_format 7 640.Xr re_format 7
629for a discussion of the definition of case-independent matching. 641for a discussion of the definition of case-independent matching.
630.Pp 642.Pp
631There is no particular limit on the length of REs, 643There is no particular limit on the length of REs,
632except insofar as memory is limited. 644except insofar as memory is limited.
633Memory usage is approximately linear in RE size, and largely insensitive 645Memory usage is approximately linear in RE size, and largely insensitive
634to RE complexity, except for bounded repetitions. 646to RE complexity, except for bounded repetitions.
635See 647See
636.Sx BUGS 648.Sx BUGS
637for one short RE using them 649for one short RE using them
638that will run almost any system out of memory. 650that will run almost any system out of memory.
639.Pp 651.Pp
640A backslashed character other than one specifically given a magic meaning 652A backslashed character other than one specifically given a magic meaning
641by 653by
642.St -p1003.2 654.St -p1003.2
643(such magic meanings occur only in obsolete 655(such magic meanings occur only in obsolete
644.Bq Dq basic 656.Bq Dq basic
645REs) 657REs)
646is taken as an ordinary character. 658is taken as an ordinary character.
647.Pp 659.Pp
648Any unmatched 660Any unmatched
649.Ql [\& 661.Ql [\&
650is a 662is a
651.Dv REG_EBRACK 663.Dv REG_EBRACK
652error. 664error.
653.Pp 665.Pp
654Equivalence classes cannot begin or end bracket-expression ranges. 666Equivalence classes cannot begin or end bracket-expression ranges.
655The endpoint of one range cannot begin another. 667The endpoint of one range cannot begin another.
656.Pp 668.Pp
657.Dv RE_DUP_MAX , 669.Dv RE_DUP_MAX ,
658the limit on repetition counts in bounded repetitions, is 255. 670the limit on repetition counts in bounded repetitions, is 255.
659.Pp 671.Pp
660A repetition operator 672A repetition operator
661.Ql ( ?\& , 673.Ql ( ?\& ,
662.Ql *\& , 674.Ql *\& ,
663.Ql +\& , 675.Ql +\& ,
664or bounds) 676or bounds)
665cannot follow another 677cannot follow another
666repetition operator. 678repetition operator.
667A repetition operator cannot begin an expression or subexpression 679A repetition operator cannot begin an expression or subexpression
668or follow 680or follow
669.Ql ^\& 681.Ql ^\&
670or 682or
671.Ql |\& . 683.Ql |\& .
672.Pp 684.Pp
673.Ql |\& 685.Ql |\&
674cannot appear first or last in a (sub)expression or after another 686cannot appear first or last in a (sub)expression or after another
675.Ql |\& , 687.Ql |\& ,
676i.e., an operand of 688i.e., an operand of
677.Ql |\& 689.Ql |\&
678cannot be an empty subexpression. 690cannot be an empty subexpression.
679An empty parenthesized subexpression, 691An empty parenthesized subexpression,
680.Ql "()" , 692.Ql "()" ,
681is legal and matches an 693is legal and matches an
682empty (sub)string. 694empty (sub)string.
683An empty string is not a legal RE. 695An empty string is not a legal RE.
684.Pp 696.Pp
685A 697A
686.Ql {\& 698.Ql {\&
687followed by a digit is considered the beginning of bounds for a 699followed by a digit is considered the beginning of bounds for a
688bounded repetition, which must then follow the syntax for bounds. 700bounded repetition, which must then follow the syntax for bounds.
689A 701A
690.Ql {\& 702.Ql {\&
691.Em not 703.Em not
692followed by a digit is considered an ordinary character. 704followed by a digit is considered an ordinary character.
693.Pp 705.Pp
694.Ql ^\& 706.Ql ^\&
695and 707and
696.Ql $\& 708.Ql $\&
697beginning and ending subexpressions in obsolete 709beginning and ending subexpressions in obsolete
698.Pq Dq basic 710.Pq Dq basic
699REs are anchors, not ordinary characters. 711REs are anchors, not ordinary characters.
700.Sh DIAGNOSTICS 712.Sh DIAGNOSTICS
701Non-zero error codes from 713Non-zero error codes from
702.Fn regcomp 714.Fn regcomp
703and 715and
704.Fn regexec 716.Fn regexec
705include the following: 717include the following:
706.Pp 718.Pp
707.Bl -tag -width REG_ECOLLATE -compact 719.Bl -tag -width REG_ECOLLATE -compact
708.It Dv REG_NOMATCH 720.It Dv REG_NOMATCH
709The 721The
710.Fn regexec 722.Fn regexec
711function 723function
712failed to match 724failed to match
713.It Dv REG_BADPAT 725.It Dv REG_BADPAT
714invalid regular expression 726invalid regular expression
715.It Dv REG_ECOLLATE 727.It Dv REG_ECOLLATE
716invalid collating element 728invalid collating element
717.It Dv REG_ECTYPE 729.It Dv REG_ECTYPE
718invalid character class 730invalid character class
719.It Dv REG_EESCAPE 731.It Dv REG_EESCAPE
720.Ql \e 732.Ql \e
721applied to unescapable character 733applied to unescapable character
722.It Dv REG_ESUBREG 734.It Dv REG_ESUBREG
723invalid backreference number 735invalid backreference number
724.It Dv REG_EBRACK 736.It Dv REG_EBRACK
725brackets 737brackets
726.Ql "[ ]" 738.Ql "[ ]"
727not balanced 739not balanced
728.It Dv REG_EPAREN 740.It Dv REG_EPAREN
729parentheses 741parentheses
730.Ql "( )" 742.Ql "( )"
731not balanced 743not balanced
732.It Dv REG_EBRACE 744.It Dv REG_EBRACE
733braces 745braces
734.Ql "{ }" 746.Ql "{ }"
735not balanced 747not balanced
736.It Dv REG_BADBR 748.It Dv REG_BADBR
737invalid repetition count(s) in 749invalid repetition count(s) in
738.Ql "{ }" 750.Ql "{ }"
739.It Dv REG_ERANGE 751.It Dv REG_ERANGE
740invalid character range in 752invalid character range in
741.Ql "[ ]" 753.Ql "[ ]"
742.It Dv REG_ESPACE 754.It Dv REG_ESPACE
743ran out of memory 755ran out of memory
744.It Dv REG_BADRPT 756.It Dv REG_BADRPT
745.Ql ?\& , 757.Ql ?\& ,
746.Ql *\& , 758.Ql *\& ,
747or 759or
748.Ql +\& 760.Ql +\&
749operand invalid 761operand invalid
750.It Dv REG_EMPTY 762.It Dv REG_EMPTY
751empty (sub)expression 763empty (sub)expression
752.It Dv REG_ASSERT 764.It Dv REG_ASSERT
753cannot happen - you found a bug 765cannot happen - you found a bug
754.It Dv REG_INVARG 766.It Dv REG_INVARG
755invalid argument, e.g.\& negative-length string 767invalid argument, e.g.\& negative-length string
756.It Dv REG_ILLSEQ 768.It Dv REG_ILLSEQ
757illegal byte sequence (bad multibyte character) 769illegal byte sequence (bad multibyte character)
758.El 770.El
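As a minimal sketch of how these codes are usually consumed, the hypothetical helper below (describe_regcomp_error() is not part of the library) compiles a pattern and, on failure, asks regerror() for a message. The exact code and message text for a given malformed pattern can differ between implementations, so callers should test only for a non-zero result unless they target one library.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: try to compile "pattern" as an extended RE.
 * Returns 0 on success (and frees the compiled RE), otherwise the
 * regcomp() error code, with a human-readable message in "msg".
 */
int
describe_regcomp_error(const char *pattern, char *msg, size_t msgsz)
{
	regex_t re;
	int rc;

	assert(pattern != NULL && msg != NULL && msgsz > 0);
	msg[0] = '\0';
	rc = regcomp(&re, pattern, REG_EXTENDED);
	if (rc != 0) {
		/* regerror() truncates the message to fit msgsz. */
		(void)regerror(rc, &re, msg, msgsz);
		return rc;
	}
	regfree(&re);
	return 0;
}
```

A caller might pass a pattern such as "a[" and print the message on a non-zero return.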
759.Sh SEE ALSO 771.Sh SEE ALSO
760.Xr grep 1 , 772.Xr grep 1 ,
761.Xr re_format 7 773.Xr re_format 7
762.Pp 774.Pp
763.St -p1003.2 , 775.St -p1003.2 ,
764sections 2.8 (Regular Expression Notation) 776sections 2.8 (Regular Expression Notation)
765and 777and
766B.5 (C Binding for Regular Expression Matching). 778B.5 (C Binding for Regular Expression Matching).
767.Sh HISTORY 779.Sh HISTORY
768Originally written by 780Originally written by
769.An Henry Spencer . 781.An Henry Spencer .
770Altered for inclusion in the 782Altered for inclusion in the
771.Bx 4.4 783.Bx 4.4
772distribution. 784distribution.
773.Pp 785.Pp
774The 786The
775.Fn regnsub 787.Fn regnsub
776and 788and
777.Fn regasub 789.Fn regasub
778functions appeared in 790functions appeared in
779.Nx 8 . 791.Nx 8 .
780.Sh BUGS 792.Sh BUGS
781This is an alpha release with known defects. 793This is an alpha release with known defects.
782Please report problems. 794Please report problems.
783.Pp 795.Pp
784The back-reference code is subtle and doubts linger about its correctness 796The back-reference code is subtle and doubts linger about its correctness
785in complex cases. 797in complex cases.
786.Pp 798.Pp
787The performance of the 799The performance of the
788.Fn regexec 800.Fn regexec
789function 801function
790is poor. 802is poor.
791This will improve with later releases. 803This will improve with later releases.
792The 804The
793.Fa nmatch 805.Fa nmatch
794argument 806argument
795exceeding 0 is expensive; 807exceeding 0 is expensive;
796.Fa nmatch 808.Fa nmatch
797exceeding 1 is worse. 809exceeding 1 is worse.
798The 810The
799.Fn regexec 811.Fn regexec
800function 812function
801is largely insensitive to RE complexity 813is largely insensitive to RE complexity
802.Em except 814.Em except
803that back 815that back
804references are massively expensive. 816references are massively expensive.
805RE length does matter; in particular, there is a strong speed bonus 817RE length does matter; in particular, there is a strong speed bonus
806for keeping RE length under about 30 characters, 818for keeping RE length under about 30 characters,
807with most special characters counting roughly double. 819with most special characters counting roughly double.
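When only a yes or no answer is needed, the cost of match reporting noted above can be avoided by not asking for match offsets at all. The sketch below assumes a hypothetical helper, re_matches(), which compiles with REG_NOSUB and calls regexec() with nmatch set to 0; it is an illustration, not a measurement of how much time this saves.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if "str" matches the extended RE
 * "pattern", 0 if it does not, and -1 if the pattern fails to compile.
 * No match offsets are requested, so nmatch is 0 and pmatch is NULL.
 */
int
re_matches(const char *pattern, const char *str)
{
	regex_t re;
	int rc;

	assert(pattern != NULL && str != NULL);
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, str, 0, NULL, 0);
	regfree(&re);		/* the regex_t must not be reused after this */
	return rc == 0 ? 1 : 0;
}
```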
808.Pp 820.Pp
809The 821The
810.Fn regcomp 822.Fn regcomp
811function 823function
812implements bounded repetitions by macro expansion, 824implements bounded repetitions by macro expansion,
813which is costly in time and space if counts are large 825which is costly in time and space if counts are large
814or bounded repetitions are nested. 826or bounded repetitions are nested.
815An RE like, say, 827An RE like, say,
816.Ql "((((a{1,100}){1,100}){1,100}){1,100}){1,100}" 828.Ql "((((a{1,100}){1,100}){1,100}){1,100}){1,100}"
817will (eventually) run almost any existing machine out of swap space. 829will (eventually) run almost any existing machine out of swap space.
818.Pp 830.Pp
819There are suspected problems with response to obscure error conditions. 831There are suspected problems with response to obscure error conditions.
820Notably, 832Notably,
821certain kinds of internal overflow, 833certain kinds of internal overflow,
822produced only by truly enormous REs or by multiply nested bounded repetitions, 834produced only by truly enormous REs or by multiply nested bounded repetitions,
823are probably not handled well. 835are probably not handled well.
824.Pp 836.Pp
825Due to a mistake in 837Due to a mistake in
826.St -p1003.2 , 838.St -p1003.2 ,
827things like 839things like
828.Ql "a)b" 840.Ql "a)b"
829are legal REs because 841are legal REs because
830.Ql )\& 842.Ql )\&
831is 843is
832a special character only in the presence of a previous unmatched 844a special character only in the presence of a previous unmatched
833.Ql (\& . 845.Ql (\& .
834This cannot be fixed until the spec is fixed. 846This cannot be fixed until the spec is fixed.
835.Pp 847.Pp
836The standard's definition of back references is vague. 848The standard's definition of back references is vague.
837For example, does 849For example, does
838.Ql "a\e(\e(b\e)*\e2\e)*d" 850.Ql "a\e(\e(b\e)*\e2\e)*d"
839match 851match
840.Ql "abbbd" ? 852.Ql "abbbd" ?
841Until the standard is clarified, 853Until the standard is clarified,
842behavior in such cases should not be relied on. 854behavior in such cases should not be relied on.
843.Pp 855.Pp
844The implementation of word-boundary matching is a bit of a kludge, 856The implementation of word-boundary matching is a bit of a kludge,
845and bugs may lurk in combinations of word-boundary matching and anchoring. 857and bugs may lurk in combinations of word-boundary matching and anchoring.
846.Pp 858.Pp
847Word-boundary matching does not work properly in multibyte locales. 859Word-boundary matching does not work properly in multibyte locales.