| @@ -1,315 +1,315 @@ | | | @@ -1,315 +1,315 @@ |
1 | .\" $NetBSD: re_format.7,v 1.9 2009/04/21 14:46:02 joerg Exp $ | | 1 | .\" $NetBSD: re_format.7,v 1.10 2013/01/25 11:51:42 wiz Exp $ |
2 | .\" | | 2 | .\" |
3 | .\" Copyright (c) 1992, 1993, 1994 | | 3 | .\" Copyright (c) 1992, 1993, 1994 |
4 | .\" The Regents of the University of California. All rights reserved. | | 4 | .\" The Regents of the University of California. All rights reserved. |
5 | .\" | | 5 | .\" |
6 | .\" This code is derived from software contributed to Berkeley by | | 6 | .\" This code is derived from software contributed to Berkeley by |
7 | .\" Henry Spencer. | | 7 | .\" Henry Spencer. |
8 | .\" | | 8 | .\" |
9 | .\" Redistribution and use in source and binary forms, with or without | | 9 | .\" Redistribution and use in source and binary forms, with or without |
10 | .\" modification, are permitted provided that the following conditions | | 10 | .\" modification, are permitted provided that the following conditions |
11 | .\" are met: | | 11 | .\" are met: |
12 | .\" 1. Redistributions of source code must retain the above copyright | | 12 | .\" 1. Redistributions of source code must retain the above copyright |
13 | .\" notice, this list of conditions and the following disclaimer. | | 13 | .\" notice, this list of conditions and the following disclaimer. |
14 | .\" 2. Redistributions in binary form must reproduce the above copyright | | 14 | .\" 2. Redistributions in binary form must reproduce the above copyright |
15 | .\" notice, this list of conditions and the following disclaimer in the | | 15 | .\" notice, this list of conditions and the following disclaimer in the |
16 | .\" documentation and/or other materials provided with the distribution. | | 16 | .\" documentation and/or other materials provided with the distribution. |
17 | .\" 3. Neither the name of the University nor the names of its contributors | | 17 | .\" 3. Neither the name of the University nor the names of its contributors |
18 | .\" may be used to endorse or promote products derived from this software | | 18 | .\" may be used to endorse or promote products derived from this software |
19 | .\" without specific prior written permission. | | 19 | .\" without specific prior written permission. |
20 | .\" | | 20 | .\" |
21 | .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND | | 21 | .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND |
22 | .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE | | 22 | .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE |
23 | .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | | 23 | .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE |
24 | .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE | | 24 | .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE |
25 | .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | | 25 | .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL |
26 | .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS | | 26 | .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS |
27 | .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) | | 27 | .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) |
28 | .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT | | 28 | .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT |
29 | .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY | | 29 | .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY |
30 | .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF | | 30 | .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF |
31 | .\" SUCH DAMAGE. | | 31 | .\" SUCH DAMAGE. |
32 | .\" | | 32 | .\" |
33 | .\" Copyright (c) 1992, 1993, 1994 Henry Spencer. | | 33 | .\" Copyright (c) 1992, 1993, 1994 Henry Spencer. |
34 | .\" | | 34 | .\" |
35 | .\" This code is derived from software contributed to Berkeley by | | 35 | .\" This code is derived from software contributed to Berkeley by |
36 | .\" Henry Spencer. | | 36 | .\" Henry Spencer. |
37 | .\" | | 37 | .\" |
38 | .\" Redistribution and use in source and binary forms, with or without | | 38 | .\" Redistribution and use in source and binary forms, with or without |
39 | .\" modification, are permitted provided that the following conditions | | 39 | .\" modification, are permitted provided that the following conditions |
40 | .\" are met: | | 40 | .\" are met: |
41 | .\" 1. Redistributions of source code must retain the above copyright | | 41 | .\" 1. Redistributions of source code must retain the above copyright |
42 | .\" notice, this list of conditions and the following disclaimer. | | 42 | .\" notice, this list of conditions and the following disclaimer. |
43 | .\" 2. Redistributions in binary form must reproduce the above copyright | | 43 | .\" 2. Redistributions in binary form must reproduce the above copyright |
44 | .\" notice, this list of conditions and the following disclaimer in the | | 44 | .\" notice, this list of conditions and the following disclaimer in the |
45 | .\" documentation and/or other materials provided with the distribution. | | 45 | .\" documentation and/or other materials provided with the distribution. |
46 | .\" 3. All advertising materials mentioning features or use of this software | | 46 | .\" 3. All advertising materials mentioning features or use of this software |
47 | .\" must display the following acknowledgement: | | 47 | .\" must display the following acknowledgement: |
48 | .\" This product includes software developed by the University of | | 48 | .\" This product includes software developed by the University of |
49 | .\" California, Berkeley and its contributors. | | 49 | .\" California, Berkeley and its contributors. |
50 | .\" 4. Neither the name of the University nor the names of its contributors | | 50 | .\" 4. Neither the name of the University nor the names of its contributors |
51 | .\" may be used to endorse or promote products derived from this software | | 51 | .\" may be used to endorse or promote products derived from this software |
52 | .\" without specific prior written permission. | | 52 | .\" without specific prior written permission. |
53 | .\" | | 53 | .\" |
54 | .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND | | 54 | .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND |
55 | .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE | | 55 | .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE |
56 | .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | | 56 | .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE |
57 | .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE | | 57 | .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE |
58 | .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | | 58 | .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL |
59 | .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS | | 59 | .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS |
60 | .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) | | 60 | .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) |
61 | .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT | | 61 | .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT |
62 | .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY | | 62 | .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY |
63 | .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF | | 63 | .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF |
64 | .\" SUCH DAMAGE. | | 64 | .\" SUCH DAMAGE. |
65 | .\" | | 65 | .\" |
66 | .\" @(#)re_format.7 8.3 (Berkeley) 3/20/94 | | 66 | .\" @(#)re_format.7 8.3 (Berkeley) 3/20/94 |
67 | .\" | | 67 | .\" |
68 | .Dd March 20, 1994 | | 68 | .Dd March 20, 1994 |
69 | .Dt RE_FORMAT 7 | | 69 | .Dt RE_FORMAT 7 |
70 | .Os | | 70 | .Os |
71 | .Sh NAME | | 71 | .Sh NAME |
72 | .Nm re_format | | 72 | .Nm re_format |
73 | .Nd POSIX 1003.2 regular expressions | | 73 | .Nd POSIX 1003.2 regular expressions |
74 | .Sh DESCRIPTION | | 74 | .Sh DESCRIPTION |
75 | Regular expressions (``RE''s), | | 75 | Regular expressions (``RE''s), |
76 | as defined in POSIX 1003.2, come in two forms: | | 76 | as defined in POSIX 1003.2, come in two forms: |
77 | modern REs (roughly those of | | 77 | modern REs (roughly those of |
78 | .Xr egrep 1 ; | | 78 | .Xr egrep 1 ; |
79 | 1003.2 calls these ``extended'' REs) | | 79 | 1003.2 calls these ``extended'' REs) |
80 | and obsolete REs (roughly those of | | 80 | and obsolete REs (roughly those of |
81 | .Xr ed 1 ; | | 81 | .Xr ed 1 ; |
82 | 1003.2 ``basic'' REs). | | 82 | 1003.2 ``basic'' REs). |
83 | Obsolete REs mostly exist for backward compatibility in some old programs; | | 83 | Obsolete REs mostly exist for backward compatibility in some old programs; |
84 | they will be discussed at the end. | | 84 | they will be discussed at the end. |
85 | 1003.2 leaves some aspects of RE syntax and semantics open; | | 85 | 1003.2 leaves some aspects of RE syntax and semantics open; |
86 | `\(dg' marks decisions on these aspects that | | 86 | `#' marks decisions on these aspects that |
87 | may not be fully portable to other 1003.2 implementations. | | 87 | may not be fully portable to other 1003.2 implementations. |
88 | .Pp | | 88 | .Pp |
89 | A (modern) RE is one\(dg or more non-empty\(dg | | 89 | A (modern) RE is one# or more non-empty# |
90 | .Em branches , | | 90 | .Em branches , |
91 | separated by `|'. | | 91 | separated by `|'. |
92 | It matches anything that matches one of the branches. | | 92 | It matches anything that matches one of the branches. |
93 | .Pp | | 93 | .Pp |
94 | A branch is one\(dg or more | | 94 | A branch is one# or more |
95 | .Em pieces , | | 95 | .Em pieces , |
96 | concatenated. | | 96 | concatenated. |
97 | It matches a match for the first, followed by a match for the second, etc. | | 97 | It matches a match for the first, followed by a match for the second, etc. |
98 | .Pp | | 98 | .Pp |
99 | A piece is an | | 99 | A piece is an |
100 | .Em atom | | 100 | .Em atom |
101 | possibly followed | | 101 | possibly followed |
102 | by a single\(dg `*', `+', `?', or | | 102 | by a single# `*', `+', `?', or |
103 | .Em bound . | | 103 | .Em bound . |
104 | An atom followed by `*' matches a sequence of 0 or more matches of the atom. | | 104 | An atom followed by `*' matches a sequence of 0 or more matches of the atom. |
105 | An atom followed by `+' matches a sequence of 1 or more matches of the atom. | | 105 | An atom followed by `+' matches a sequence of 1 or more matches of the atom. |
106 | An atom followed by `?' matches a sequence of 0 or 1 matches of the atom. | | 106 | An atom followed by `?' matches a sequence of 0 or 1 matches of the atom. |
107 | .Pp | | 107 | .Pp |
108 | A | | 108 | A |
109 | .Em bound | | 109 | .Em bound |
110 | is `{' followed by an unsigned decimal integer, possibly followed by `,' | | 110 | is `{' followed by an unsigned decimal integer, possibly followed by `,' |
111 | possibly followed by another unsigned decimal integer, | | 111 | possibly followed by another unsigned decimal integer, |
112 | always followed by `}'. | | 112 | always followed by `}'. |
113 | The integers must lie between 0 and RE_DUP_MAX (255\(dg) inclusive, | | 113 | The integers must lie between 0 and RE_DUP_MAX (255#) inclusive, |
114 | and if there are two of them, the first may not exceed the second. | | 114 | and if there are two of them, the first may not exceed the second. |
115 | An atom followed by a bound containing one integer | | 115 | An atom followed by a bound containing one integer |
116 | .Em i | | 116 | .Em i |
117 | and no comma matches a sequence of exactly | | 117 | and no comma matches a sequence of exactly |
118 | .Em i | | 118 | .Em i |
119 | matches of the atom. | | 119 | matches of the atom. |
120 | An atom followed by a bound containing one integer | | 120 | An atom followed by a bound containing one integer |
121 | .Em i | | 121 | .Em i |
122 | and a comma matches a sequence of | | 122 | and a comma matches a sequence of |
123 | .Em i | | 123 | .Em i |
124 | or more matches of the atom. | | 124 | or more matches of the atom. |
125 | An atom followed by a bound containing two integers | | 125 | An atom followed by a bound containing two integers |
126 | .Em i | | 126 | .Em i |
127 | and | | 127 | and |
128 | .Em j | | 128 | .Em j |
129 | matches a sequence of | | 129 | matches a sequence of |
130 | .Em i | | 130 | .Em i |
131 | through | | 131 | through |
132 | .Em j | | 132 | .Em j |
133 | (inclusive) matches of the atom. | | 133 | (inclusive) matches of the atom. |
134 | .Pp | | 134 | .Pp |
135 | An atom is a regular expression enclosed in `()' (matching a match for the | | 135 | An atom is a regular expression enclosed in `()' (matching a match for the |
136 | regular expression), an empty set of `()' (matching the null string)\(dg, a | | 136 | regular expression), an empty set of `()' (matching the null string)#, a |
137 | .Em bracket expression | | 137 | .Em bracket expression |
138 | (see below), `.' (matching any single character), | | 138 | (see below), `.' (matching any single character), |
139 | `^' (matching the null string at the beginning of a line), | | 139 | `^' (matching the null string at the beginning of a line), |
140 | `$' (matching the null string at the end of a line), | | 140 | `$' (matching the null string at the end of a line), |
141 | a `\e' followed by one of the characters `^.[$()|*+?{\e' | | 141 | a `\e' followed by one of the characters `^.[$()|*+?{\e' |
142 | (matching that character taken as an ordinary character), | | 142 | (matching that character taken as an ordinary character), |
143 | a `\e' followed by any other character\(dg | | 143 | a `\e' followed by any other character# |
144 | (matching that character taken as an ordinary character, | | 144 | (matching that character taken as an ordinary character, |
145 | as if the `\e' had not been present\(dg), | | 145 | as if the `\e' had not been present#), |
146 | or a single character with no other significance (matching that character). | | 146 | or a single character with no other significance (matching that character). |
147 | A `{' followed by a character other than a digit is an ordinary | | 147 | A `{' followed by a character other than a digit is an ordinary |
148 | character, not the beginning of a bound\(dg. | | 148 | character, not the beginning of a bound#. |
149 | It is illegal to end an RE with `\e'. | | 149 | It is illegal to end an RE with `\e'. |
150 | .Pp | | 150 | .Pp |
151 | A | | 151 | A |
152 | .Em bracket expression | | 152 | .Em bracket expression |
153 | is a list of characters enclosed in `[]'. | | 153 | is a list of characters enclosed in `[]'. |
154 | It normally matches any single character from the list (but see below). | | 154 | It normally matches any single character from the list (but see below). |
155 | If the list begins with `^', | | 155 | If the list begins with `^', |
156 | it matches any single character (but see below) | | 156 | it matches any single character (but see below) |
157 | .Em not | | 157 | .Em not |
158 | from the rest of the list. | | 158 | from the rest of the list. |
159 | If two characters in the list are separated by `\-', this is shorthand | | 159 | If two characters in the list are separated by `\-', this is shorthand |
160 | for the full | | 160 | for the full |
161 | .Em range | | 161 | .Em range |
162 | of characters between those two (inclusive) in the collating sequence, | | 162 | of characters between those two (inclusive) in the collating sequence, |
163 | e.g. `[0-9]' in ASCII matches any decimal digit. | | 163 | e.g. `[0-9]' in ASCII matches any decimal digit. |
164 | It is illegal\(dg for two ranges to share an endpoint, e.g. `a-c-e'. | | 164 | It is illegal# for two ranges to share an endpoint, e.g. `a-c-e'. |
165 | Ranges are very collating-sequence-dependent, | | 165 | Ranges are very collating-sequence-dependent, |
166 | and portable programs should avoid relying on them. | | 166 | and portable programs should avoid relying on them. |
167 | .Pp | | 167 | .Pp |
168 | To include a literal `]' in the list, make it the first character | | 168 | To include a literal `]' in the list, make it the first character |
169 | (following a possible `^'). | | 169 | (following a possible `^'). |
170 | To include a literal `\-', make it the first or last character, | | 170 | To include a literal `\-', make it the first or last character, |
171 | or the second endpoint of a range. | | 171 | or the second endpoint of a range. |
172 | To use a literal `\-' as the first endpoint of a range, | | 172 | To use a literal `\-' as the first endpoint of a range, |
173 | enclose it in `[.' and `.]' to make it a collating element (see below). | | 173 | enclose it in `[.' and `.]' to make it a collating element (see below). |
174 | With the exception of these and some combinations using `[' (see next | | 174 | With the exception of these and some combinations using `[' (see next |
175 | paragraphs), all other special characters, including `\e', lose their | | 175 | paragraphs), all other special characters, including `\e', lose their |
176 | special significance within a bracket expression. | | 176 | special significance within a bracket expression. |
177 | .Pp | | 177 | .Pp |
178 | Within a bracket expression, a collating element (a character, | | 178 | Within a bracket expression, a collating element (a character, |
179 | a multi-character sequence that collates as if it were a single character, | | 179 | a multi-character sequence that collates as if it were a single character, |
180 | or a collating-sequence name for either) | | 180 | or a collating-sequence name for either) |
181 | enclosed in `[.' and `.]' stands for the | | 181 | enclosed in `[.' and `.]' stands for the |
182 | sequence of characters of that collating element. | | 182 | sequence of characters of that collating element. |
183 | The sequence is a single element of the bracket expression's list. | | 183 | The sequence is a single element of the bracket expression's list. |
184 | A bracket expression containing a multi-character collating element | | 184 | A bracket expression containing a multi-character collating element |
185 | can thus match more than one character, | | 185 | can thus match more than one character, |
186 | e.g. if the collating sequence includes a `ch' collating element, | | 186 | e.g. if the collating sequence includes a `ch' collating element, |
187 | then the RE `[[.ch.]]*c' matches the first five characters | | 187 | then the RE `[[.ch.]]*c' matches the first five characters |
188 | of `chchcc'. | | 188 | of `chchcc'. |
189 | .Pp | | 189 | .Pp |
190 | Within a bracket expression, a collating element enclosed in `[=' and | | 190 | Within a bracket expression, a collating element enclosed in `[=' and |
191 | `=]' is an equivalence class, standing for the sequences of characters | | 191 | `=]' is an equivalence class, standing for the sequences of characters |
192 | of all collating elements equivalent to that one, including itself. | | 192 | of all collating elements equivalent to that one, including itself. |
193 | (If there are no other equivalent collating elements, | | 193 | (If there are no other equivalent collating elements, |
194 | the treatment is as if the enclosing delimiters were `[.' and `.]'.) | | 194 | the treatment is as if the enclosing delimiters were `[.' and `.]'.) |
195 | For example, if o and '\(^o' are the members of an equivalence class, | | 195 | For example, if o and '\(^o' are the members of an equivalence class, |
196 | then `[[=o=]]', `[[=\(^o'=]]', and `[o\(^o']' are all synonymous. | | 196 | then `[[=o=]]', `[[=\(^o'=]]', and `[o\(^o']' are all synonymous. |
197 | An equivalence class may not\(dg be an endpoint | | 197 | An equivalence class may not# be an endpoint |
198 | of a range. | | 198 | of a range. |
199 | .Pp | | 199 | .Pp |
200 | Within a bracket expression, the name of a | | 200 | Within a bracket expression, the name of a |
201 | .Em character class | | 201 | .Em character class |
202 | enclosed in `[:' and `:]' stands for the list of all characters | | 202 | enclosed in `[:' and `:]' stands for the list of all characters |
203 | belonging to that class. | | 203 | belonging to that class. |
204 | Standard character class names are: | | 204 | Standard character class names are: |
205 | .Bl -column "alnum" "digit" "xdigit" | | 205 | .Bl -column "alnum" "digit" "xdigit" |
206 | .It alnum digit punct | | 206 | .It alnum digit punct |
207 | .It alpha graph space | | 207 | .It alpha graph space |
208 | .It blank lower upper | | 208 | .It blank lower upper |
209 | .It cntrl print xdigit | | 209 | .It cntrl print xdigit |
210 | .El | | 210 | .El |
211 | .Pp | | 211 | .Pp |
212 | These stand for the character classes defined in | | 212 | These stand for the character classes defined in |
213 | .Xr ctype 3 . | | 213 | .Xr ctype 3 . |
214 | A locale may provide others. | | 214 | A locale may provide others. |
215 | A character class may not be used as an endpoint of a range. | | 215 | A character class may not be used as an endpoint of a range. |
216 | .Pp | | 216 | .Pp |
217 | There are two special cases\(dg of bracket expressions: | | 217 | There are two special cases# of bracket expressions: |
218 | the bracket expressions `[[:\*[Lt]:]]' and `[[:\*[Gt]:]]' match | | 218 | the bracket expressions `[[:\*[Lt]:]]' and `[[:\*[Gt]:]]' match |
219 | the null string at the beginning and end of a word respectively. | | 219 | the null string at the beginning and end of a word respectively. |
220 | A word is defined as a sequence of word characters | | 220 | A word is defined as a sequence of word characters |
221 | which is neither preceded nor followed by word characters. | | 221 | which is neither preceded nor followed by word characters. |
222 | A word character is an | | 222 | A word character is an |
223 | .Em alnum | | 223 | .Em alnum |
224 | character (as defined by | | 224 | character (as defined by |
225 | .Xr ctype 3 ) | | 225 | .Xr ctype 3 ) |
226 | or an underscore. | | 226 | or an underscore. |
227 | This is an extension, compatible with but not specified by POSIX 1003.2, | | 227 | This is an extension, compatible with but not specified by POSIX 1003.2, |
228 | and should be used with caution in software intended to be portable | | 228 | and should be used with caution in software intended to be portable |
229 | to other systems. | | 229 | to other systems. |
230 | .Pp | | 230 | .Pp |
231 | In the event that an RE could match more than one substring of a given | | 231 | In the event that an RE could match more than one substring of a given |
232 | string, the RE matches the one starting earliest in the string. | | 232 | string, the RE matches the one starting earliest in the string. |
233 | If the RE could match more than one substring starting at that point, | | 233 | If the RE could match more than one substring starting at that point, |
234 | it matches the longest. | | 234 | it matches the longest. |
235 | Subexpressions also match the longest possible substrings, subject to | | 235 | Subexpressions also match the longest possible substrings, subject to |
236 | the constraint that the whole match be as long as possible, | | 236 | the constraint that the whole match be as long as possible, |
237 | with subexpressions starting earlier in the RE taking priority over | | 237 | with subexpressions starting earlier in the RE taking priority over |
238 | ones starting later. | | 238 | ones starting later. |
239 | Note that higher-level subexpressions thus take priority over | | 239 | Note that higher-level subexpressions thus take priority over |
240 | their lower-level component subexpressions. | | 240 | their lower-level component subexpressions. |
241 | .Pp | | 241 | .Pp |
242 | Match lengths are measured in characters, not collating elements. | | 242 | Match lengths are measured in characters, not collating elements. |
243 | A null string is considered longer than no match at all. | | 243 | A null string is considered longer than no match at all. |
244 | For example, | | 244 | For example, |
245 | `bb*' matches the three middle characters of `abbbc', | | 245 | `bb*' matches the three middle characters of `abbbc', |
246 | `(wee|week)(knights|nights)' matches all ten characters of `weeknights', | | 246 | `(wee|week)(knights|nights)' matches all ten characters of `weeknights', |
247 | when `(.*).*' is matched against `abc' the parenthesized subexpression | | 247 | when `(.*).*' is matched against `abc' the parenthesized subexpression |
248 | matches all three characters, and | | 248 | matches all three characters, and |
249 | when `(a*)*' is matched against `bc' both the whole RE and the parenthesized | | 249 | when `(a*)*' is matched against `bc' both the whole RE and the parenthesized |
250 | subexpression match the null string. | | 250 | subexpression match the null string. |
251 | .Pp | | 251 | .Pp |
252 | If case-independent matching is specified, | | 252 | If case-independent matching is specified, |
253 | the effect is much as if all case distinctions had vanished from the | | 253 | the effect is much as if all case distinctions had vanished from the |
254 | alphabet. | | 254 | alphabet. |
255 | When an alphabetic that exists in multiple cases appears as an | | 255 | When an alphabetic that exists in multiple cases appears as an |
256 | ordinary character outside a bracket expression, it is effectively | | 256 | ordinary character outside a bracket expression, it is effectively |
257 | transformed into a bracket expression containing both cases, | | 257 | transformed into a bracket expression containing both cases, |
258 | e.g. `x' becomes `[xX]'. | | 258 | e.g. `x' becomes `[xX]'. |
259 | When it appears inside a bracket expression, all case counterparts | | 259 | When it appears inside a bracket expression, all case counterparts |
260 | of it are added to the bracket expression, so that (e.g.) `[x]' | | 260 | of it are added to the bracket expression, so that (e.g.) `[x]' |
261 | becomes `[xX]' and `[^x]' becomes `[^xX]'. | | 261 | becomes `[xX]' and `[^x]' becomes `[^xX]'. |
262 | .Pp | | 262 | .Pp |
263 | No particular limit is imposed on the length of REs\(dg. | | 263 | No particular limit is imposed on the length of REs#. |
264 | Programs intended to be portable should not employ REs longer | | 264 | Programs intended to be portable should not employ REs longer |
265 | than 256 bytes, | | 265 | than 256 bytes, |
266 | as an implementation can refuse to accept such REs and remain | | 266 | as an implementation can refuse to accept such REs and remain |
267 | POSIX-compliant. | | 267 | POSIX-compliant. |
268 | .Pp | | 268 | .Pp |
269 | Obsolete (``basic'') regular expressions differ in several respects. | | 269 | Obsolete (``basic'') regular expressions differ in several respects. |
270 | `|', `+', and `?' are ordinary characters and there is no equivalent | | 270 | `|', `+', and `?' are ordinary characters and there is no equivalent |
271 | for their functionality. | | 271 | for their functionality. |
272 | The delimiters for bounds are `\e{' and `\e}', | | 272 | The delimiters for bounds are `\e{' and `\e}', |
273 | with `{' and `}' by themselves ordinary characters. | | 273 | with `{' and `}' by themselves ordinary characters. |
274 | The parentheses for nested subexpressions are `\e(' and `\e)', | | 274 | The parentheses for nested subexpressions are `\e(' and `\e)', |
275 | with `(' and `)' by themselves ordinary characters. | | 275 | with `(' and `)' by themselves ordinary characters. |
276 | `^' is an ordinary character except at the beginning of the | | 276 | `^' is an ordinary character except at the beginning of the |
277 | RE or\(dg the beginning of a parenthesized subexpression, | | 277 | RE or# the beginning of a parenthesized subexpression, |
278 | `$' is an ordinary character except at the end of the | | 278 | `$' is an ordinary character except at the end of the |
279 | RE or\(dg the end of a parenthesized subexpression, | | 279 | RE or# the end of a parenthesized subexpression, |
280 | and `*' is an ordinary character if it appears at the beginning of the | | 280 | and `*' is an ordinary character if it appears at the beginning of the |
281 | RE or the beginning of a parenthesized subexpression | | 281 | RE or the beginning of a parenthesized subexpression |
282 | (after a possible leading `^'). | | 282 | (after a possible leading `^'). |
283 | Finally, there is one new type of atom, a | | 283 | Finally, there is one new type of atom, a |
284 | .Em back reference : | | 284 | .Em back reference : |
285 | `\e' followed by a non-zero decimal digit | | 285 | `\e' followed by a non-zero decimal digit |
286 | .Em d | | 286 | .Em d |
287 | matches the same sequence of characters | | 287 | matches the same sequence of characters |
288 | matched by the | | 288 | matched by the |
289 | .Em d Ns th parenthesized subexpression | | 289 | .Em d Ns th parenthesized subexpression |
290 | (numbering subexpressions by the positions of their opening parentheses, | | 290 | (numbering subexpressions by the positions of their opening parentheses, |
291 | left to right), | | 291 | left to right), |
292 | so that (e.g.) `\e([bc]\e)\e1' matches `bb' or `cc' but not `bc'. | | 292 | so that (e.g.) `\e([bc]\e)\e1' matches `bb' or `cc' but not `bc'. |
293 | .Sh SEE ALSO | | 293 | .Sh SEE ALSO |
294 | .Xr regex 3 | | 294 | .Xr regex 3 |
295 | .Pp | | 295 | .Pp |
296 | POSIX 1003.2, section 2.8 (Regular Expression Notation). | | 296 | POSIX 1003.2, section 2.8 (Regular Expression Notation). |
297 | .Sh BUGS | | 297 | .Sh BUGS |
298 | Having two kinds of REs is a botch. | | 298 | Having two kinds of REs is a botch. |
299 | .Pp | | 299 | .Pp |
300 | The current 1003.2 spec says that `)' is an ordinary character in | | 300 | The current 1003.2 spec says that `)' is an ordinary character in |
301 | the absence of an unmatched `('; | | 301 | the absence of an unmatched `('; |
302 | this was an unintentional result of a wording error, and change is likely. | | 302 | this was an unintentional result of a wording error, and change is likely. |
303 | Avoid relying on it. | | 303 | Avoid relying on it. |
304 | .Pp | | 304 | .Pp |
305 | Back references are a dreadful botch, | | 305 | Back references are a dreadful botch, |
306 | posing major problems for efficient implementations. | | 306 | posing major problems for efficient implementations. |
307 | They are also somewhat vaguely defined | | 307 | They are also somewhat vaguely defined |
308 | (does `a\e(\e(b\e)*\e2\e)*d' match `abbbd'?). | | 308 | (does `a\e(\e(b\e)*\e2\e)*d' match `abbbd'?). |
309 | Avoid using them. | | 309 | Avoid using them. |
310 | .Pp | | 310 | .Pp |
311 | 1003.2's specification of case-independent matching is vague. | | 311 | 1003.2's specification of case-independent matching is vague. |
312 | The ``one case implies all cases'' definition given above | | 312 | The ``one case implies all cases'' definition given above |
313 | is current consensus among implementors as to the right interpretation. | | 313 | is current consensus among implementors as to the right interpretation. |
314 | .Pp | | 314 | .Pp |
315 | The syntax for word boundaries is incredibly ugly. | | 315 | The syntax for word boundaries is incredibly ugly. |
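The matching rules this page describes can be exercised through regex(3), which its SEE ALSO section points to. The sketch below is a minimal C illustration and assumes only a POSIX regex.h. The names find_span() and struct span_result are hypothetical helpers introduced here and are not part of regex(3). It passes REG_EXTENDED for modern REs and 0 for obsolete (basic) REs, and it sticks to patterns quoted in the page plus a few simple cases. The NetBSD-specific word-boundary classes [[:<:]] and [[:>:]] are left out because other implementations may reject them.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

#define SPAN_MAX_GROUPS 10  /* whole match plus up to 9 subexpressions */

/* Outcome of one match attempt; offsets are in bytes, -1 when unset. */
struct span_result {
    int status;             /* 1 = matched, 0 = no match, -1 = RE rejected */
    long so, eo;            /* start and end (exclusive) of the chosen span */
};

/*
 * Hypothetical helper, not part of regex(3): compile `pattern' with
 * `cflags' (REG_EXTENDED for modern REs, 0 for obsolete/basic REs,
 * optionally OR'd with REG_ICASE), run it once against `subject', and
 * report where subexpression `group' matched (0 = the whole match).
 */
static struct span_result find_span(const char *pattern, int cflags,
                                    const char *subject, size_t group)
{
    struct span_result r = { 0, -1, -1 };
    regmatch_t m[SPAN_MAX_GROUPS];
    regex_t re;

    assert(pattern != NULL && subject != NULL && group < SPAN_MAX_GROUPS);
    if (regcomp(&re, pattern, cflags) != 0) {
        r.status = -1;      /* e.g. an RE ending in a lone backslash */
        return r;
    }
    if (regexec(&re, subject, SPAN_MAX_GROUPS, m, 0) == 0) {
        r.status = 1;
        r.so = (long)m[group].rm_so;  /* -1 if the group took no part */
        r.eo = (long)m[group].rm_eo;
    }
    regfree(&re);
    return r;
}
```

For example, find_span("bb*", REG_EXTENDED, "abbbc", 0) should report the span 1 to 4, which is the three middle characters. The basic RE \([bc]\)\1 (passed as "\\([bc]\\)\\1" with cflags 0) should match "cc" but not "bc". Returning a small struct by value keeps each call self-contained, at the cost of recompiling the RE on every call. Implementations most often disagree on subexpression offsets, as the back-reference question in BUGS shows. Treat group spans reported by this sketch as the local implementation's answer, not as a reference.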