Sun Sep 17 10:32:06 2023 UTC
lang/nawk: downgrade to 20230909.

Partially revert the previous commit by downgrading the package to the
most recent release tag that supports ASCII-encoded input files (and
processes strings as sequences of bytes).
This is needed by security/mozilla-rootcerts and likely other packages;
see https://mail-index.netbsd.org/tech-pkg/2023/09/17/msg028190.html.

This version incorporates all the changes described in the FIXES file up
to 2023-09-09, minus support for UTF-8 and comma-separated values (CSV)
input.
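
To make the byte-versus-code-point distinction concrete, here is a minimal
sketch in C. It is not code from nawk: `byte_length` and `codepoint_length`
are hypothetical helper names, and the code point counter assumes well-formed
UTF-8 input. It mirrors the upstream FIXES note that a string of 3 emojis has
length 3 when code points are counted and 12 when bytes are counted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length in bytes, i.e. the byte-oriented view
 * that this downgrade restores. */
size_t byte_length(const char *s)
{
	assert(s != NULL);
	return strlen(s);
}

/* Hypothetical helper: length in UTF-8 code points.
 * Assumes well-formed UTF-8; every byte that is not a continuation
 * byte (bit pattern 10xxxxxx) starts a new code point. */
size_t codepoint_length(const char *s)
{
	size_t n = 0;

	assert(s != NULL);
	for (; *s != '\0'; s++)
		if (((unsigned char)*s & 0xC0) != 0x80)
			n++;
	return n;
}
```

A script that relies on byte offsets (for example, one that slices
fixed-width binary or ASCII fields with `substr`) sees different results
under the two interpretations as soon as the input contains a multi-byte
sequence.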


(vins)
diff -r1.3 -r1.4 pkgsrc/lang/nawk/DESCR
diff -r1.45 -r1.46 pkgsrc/lang/nawk/Makefile
diff -r1.4 -r1.5 pkgsrc/lang/nawk/files/FIXES
diff -r1.4 -r1.5 pkgsrc/lang/nawk/files/README
diff -r1.4 -r1.5 pkgsrc/lang/nawk/files/lex.c
diff -r1.4 -r1.5 pkgsrc/lang/nawk/files/main.c
diff -r1.5 -r1.6 pkgsrc/lang/nawk/files/awk.h
diff -r1.5 -r1.6 pkgsrc/lang/nawk/files/b.c
diff -r1.5 -r1.6 pkgsrc/lang/nawk/files/run.c
diff -r1.6 -r1.7 pkgsrc/lang/nawk/files/lib.c
diff -r1.6 -r1.7 pkgsrc/lang/nawk/files/proto.h
diff -r1.6 -r1.7 pkgsrc/lang/nawk/files/tran.c
diff -r1.3 -r1.4 pkgsrc/lang/nawk/files/nawk.1

cvs diff -r1.3 -r1.4 pkgsrc/lang/nawk/DESCR

--- pkgsrc/lang/nawk/DESCR 2023/09/12 19:16:52 1.3
+++ pkgsrc/lang/nawk/DESCR 2023/09/17 10:32:05 1.4
@@ -1,6 +1,5 @@ @@ -1,6 +1,5 @@
1The one, true implementation of the AWK pattern-directed scanning and 1The one, true implementation of the AWK pattern-directed scanning and
2processing language, by one of the language's creators, Brian Kernighan. 2processing language, by one of the language's creators, Brian Kernighan.
3This is the version of awk described in The AWK Programming Language, 3This is the version of awk described in "The AWK Programming Language",
4Second Edition, by Al Aho, Brian Kernighan, and Peter Weinberger 4by Al Aho, Brian Kernighan, and Peter Weinberger (Addison-Wesley,
5(Addison-Wesley, 2024, ISBN-13 978-0138269722, ISBN-10 0138269726). 51988, ISBN 0-201-07981-X). It is also known as new awk, or nawk.
6It is also known as new awk, or nawk. 

cvs diff -r1.45 -r1.46 pkgsrc/lang/nawk/Makefile

--- pkgsrc/lang/nawk/Makefile 2023/09/12 19:16:52 1.45
+++ pkgsrc/lang/nawk/Makefile 2023/09/17 10:32:05 1.46
@@ -1,16 +1,16 @@ @@ -1,16 +1,16 @@
1# $NetBSD: Makefile,v 1.45 2023/09/12 19:16:52 vins Exp $ 1# $NetBSD: Makefile,v 1.46 2023/09/17 10:32:05 vins Exp $
2 2
3DISTNAME= nawk-20230911 3DISTNAME= nawk-20230909
4CATEGORIES= lang 4CATEGORIES= lang
5MASTER_SITES= # empty 5MASTER_SITES= # empty
6DISTFILES= # empty 6DISTFILES= # empty
7 7
8MAINTAINER= pkgsrc-users@NetBSD.org 8MAINTAINER= pkgsrc-users@NetBSD.org
9HOMEPAGE= https://www.cs.princeton.edu/~bwk/btl.mirror/ 9HOMEPAGE= https://www.cs.princeton.edu/~bwk/btl.mirror/
10COMMENT= Brian Kernighan's pattern-directed scanning and processing language 10COMMENT= Brian Kernighan's pattern-directed scanning and processing language
11LICENSE= mit 11LICENSE= mit
12 12
13BOOTSTRAP_PKG= yes 13BOOTSTRAP_PKG= yes
14 14
15CFLAGS+= ${CPPFLAGS} -DYYMAXDEPTH=300 15CFLAGS+= ${CPPFLAGS} -DYYMAXDEPTH=300
16MAKE_FLAGS+= CC=${CC:Q} CFLAGS=${CFLAGS:M*:Q} 16MAKE_FLAGS+= CC=${CC:Q} CFLAGS=${CFLAGS:M*:Q}

cvs diff -r1.4 -r1.5 pkgsrc/lang/nawk/files/FIXES

--- pkgsrc/lang/nawk/files/FIXES 2023/09/12 19:16:52 1.4
+++ pkgsrc/lang/nawk/files/FIXES 2023/09/17 10:32:05 1.5
@@ -15,47 +15,26 @@ permission. @@ -15,47 +15,26 @@ permission.
15LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, 15LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
16INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. 16INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
17IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY 17IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
18SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES 18SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
19WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER 19WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
20IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, 20IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
21ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF 21ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
22THIS SOFTWARE. 22THIS SOFTWARE.
23****************************************************************/ 23****************************************************************/
24 24
25This file lists all bug fixes, changes, etc., made since the AWK book 25This file lists all bug fixes, changes, etc., made since the AWK book
26was sent to the printers in August 1987. 26was sent to the printers in August 1987.
27 27
28Sep 11, 2023: 
29 Added --csv option to enable processing of comma-separated 
30 values inputs. When --csv is enabled, fields are separated 
31 by commas, fields may be quoted with " double quotes, fields 
32 may contain embedded newlines. 
33 
34 If no explicit separator argument is provided, split() uses 
35 the setting of --csv to determine how fields are split. 
36 
37 Strings may now contain UTF-8 code points (not necessarily 
38 characters). Functions that operate on characters, like 
39 length, substr, index, match, etc., use UTF-8, so the length 
40 of a string of 3 emojis is 3, not 12 as it would be if bytes 
41 were counted. 
42 
43 Regular expressions are processes as UTF-8. 
44 
45 Unicode literals can be written as \u followed by one 
46 to eight hexadecimal digits. These may appear in strings and 
47 regular expressions. 
48 
49Sep 06, 2023: 28Sep 06, 2023:
50 Fix edge case where FS is changed on commandline. Thanks to  29 Fix edge case where FS is changed on commandline. Thanks to
51 Gordon Shephard and Miguel Pineiro Jr. 30 Gordon Shephard and Miguel Pineiro Jr.
52 31
53 Fix regular expression clobbering in the lexer, where lexer does 32 Fix regular expression clobbering in the lexer, where lexer does
54 not make a copy of regexp literals. also makedfa memory leaks have 33 not make a copy of regexp literals. also makedfa memory leaks have
55 been plugged. Thanks to Miguel Pineiro Jr. 34 been plugged. Thanks to Miguel Pineiro Jr.
56  35
57Dec 15, 2022: 36Dec 15, 2022:
58 Force hex escapes in strings to be no more than two characters, 37 Force hex escapes in strings to be no more than two characters,
59 as they already are in regular expressions. This brings internal 38 as they already are in regular expressions. This brings internal
60 consistency, as well as consistency with gawk. Thanks to 39 consistency, as well as consistency with gawk. Thanks to
61 Arnold Robbins. 40 Arnold Robbins.

cvs diff -r1.4 -r1.5 pkgsrc/lang/nawk/files/README

--- pkgsrc/lang/nawk/files/README 2023/09/12 19:16:52 1.4
+++ pkgsrc/lang/nawk/files/README 2023/09/17 10:32:05 1.5
@@ -1,48 +1,18 @@ @@ -1,48 +1,18 @@
1# The One True Awk 1# The One True Awk
2 2
3This is the version of `awk` described in _The AWK Programming Language_, 3This is the version of `awk` described in _The AWK Programming Language_,
4Second Edition, by Al Aho, Brian Kernighan, and Peter Weinberger 4by Al Aho, Brian Kernighan, and Peter Weinberger
5(Addison-Wesley, 2024, ISBN-13 978-0138269722, ISBN-10 0138269726). 5(Addison-Wesley, 1988, ISBN 0-201-07981-X).
6 
7## What's New? ## 
8 
9This version of Awk handles UTF-8 and comma-separated values (CSV) input. 
10 
11### Strings ### 
12 
13Functions that process strings now count Unicode code points, not bytes; 
14this affects `length`, `substr`, `index`, `match`, `split`, 
15`sub`, `gsub`, and others. Note that code 
16points are not necessarily characters. 
17 
18UTF-8 sequences may appear in literal strings and regular expressions. 
19Aribtrary characters may be included with `\u` followed by 1 to 8 hexadecimal digits. 
20 
21### Regular expressions ### 
22 
23Regular expressions may include UTF-8 code points, including `\u`. 
24Character classes are likely to be limited to about 256 characters 
25when expanded. 
26 
27### CSV ### 
28 
29The option `--csv` turns on CSV processing of input: 
30fields are separated by commas, fields may be quoted with 
31double-quote (`"`) characters, fields may contain embedded newlines. 
32In CSV mode, `FS` is ignored. 
33 
34If no explicit separator argument is provided, 
35field-splitting in `split` is determined by CSV mode. 
36 6
37## Copyright 7## Copyright
38 8
39Copyright (C) Lucent Technologies 1997<br/> 9Copyright (C) Lucent Technologies 1997<br/>
40All Rights Reserved 10All Rights Reserved
41 11
42Permission to use, copy, modify, and distribute this software and 12Permission to use, copy, modify, and distribute this software and
43its documentation for any purpose and without fee is hereby 13its documentation for any purpose and without fee is hereby
44granted, provided that the above copyright notice appear in all 14granted, provided that the above copyright notice appear in all
45copies and that both that the copyright notice and this 15copies and that both that the copyright notice and this
46permission notice and warranty disclaimer appear in supporting 16permission notice and warranty disclaimer appear in supporting
47documentation, and that the name Lucent Technologies or any of 17documentation, and that the name Lucent Technologies or any of
48its entities not be used in advertising or publicity pertaining 18its entities not be used in advertising or publicity pertaining
@@ -86,61 +56,69 @@ posting the pull request. To do so: @@ -86,61 +56,69 @@ posting the pull request. To do so:
86 56
87* Please create the pull request with a request 57* Please create the pull request with a request
88to merge into the `staging` branch instead of into the `master` branch. 58to merge into the `staging` branch instead of into the `master` branch.
89This allows us to do testing, and to make any additional edits or changes 59This allows us to do testing, and to make any additional edits or changes
90after the merge but before merging to `master`. 60after the merge but before merging to `master`.
91 61
92## Building 62## Building
93 63
94The program itself is created by 64The program itself is created by
95 65
96 make 66 make
97 67
98which should produce a sequence of messages roughly like this: 68which should produce a sequence of messages roughly like this:
99  69
100 yacc -d -b awkgram awkgram.y 70 yacc -d awkgram.y
101 yacc: 44 shift/reduce conflicts, 85 reduce/reduce conflicts. 71 conflicts: 43 shift/reduce, 85 reduce/reduce
102 cc -g -Wall -pedantic -Wcast-qual -O2 -c -o awkgram.tab.o awkgram.tab.c 72 mv y.tab.c ytab.c
103 cc -g -Wall -pedantic -Wcast-qual -O2 -c -o b.o b.c 73 mv y.tab.h ytab.h
104 cc -g -Wall -pedantic -Wcast-qual -O2 -c -o main.o main.c 74 cc -c ytab.c
105 cc -g -Wall -pedantic -Wcast-qual -O2 -c -o parse.o parse.c 75 cc -c b.c
106 cc -g -Wall -pedantic -Wcast-qual -O2 maketab.c -o maketab 76 cc -c main.c
107 ./maketab awkgram.tab.h >proctab.c 77 cc -c parse.c
108 cc -g -Wall -pedantic -Wcast-qual -O2 -c -o proctab.o proctab.c 78 cc maketab.c -o maketab
109 cc -g -Wall -pedantic -Wcast-qual -O2 -c -o tran.o tran.c 79 ./maketab >proctab.c
110 cc -g -Wall -pedantic -Wcast-qual -O2 -c -o lib.o lib.c 80 cc -c proctab.c
111 cc -g -Wall -pedantic -Wcast-qual -O2 -c -o run.o run.c 81 cc -c tran.c
112 cc -g -Wall -pedantic -Wcast-qual -O2 -c -o lex.o lex.c 82 cc -c lib.c
113 cc -g -Wall -pedantic -Wcast-qual -O2 awkgram.tab.o b.o main.o parse.o proctab.o tran.o lib.o run.o lex.o -lm 83 cc -c run.c
 84 cc -c lex.c
 85 cc ytab.o b.o main.o parse.o proctab.o tran.o lib.o run.o lex.o -lm
114 86
115This produces an executable `a.out`; you will eventually want to 87This produces an executable `a.out`; you will eventually want to
116move this to some place like `/usr/bin/awk`. 88move this to some place like `/usr/bin/awk`.
117 89
118If your system does not have `yacc` or `bison` (the GNU 90If your system does not have `yacc` or `bison` (the GNU
119equivalent), you need to install one of them first. 91equivalent), you need to install one of them first.
120 92
121NOTE: This version uses ISO/IEC C99, as you should also. We have 93NOTE: This version uses ISO/IEC C99, as you should also. We have
122compiled this without any changes using `gcc -Wall` and/or local C 94compiled this without any changes using `gcc -Wall` and/or local C
123compilers on a variety of systems, but new systems or compilers 95compilers on a variety of systems, but new systems or compilers
124may raise some new complaint; reports of difficulties are 96may raise some new complaint; reports of difficulties are
125welcome. 97welcome.
126 98
127This compiles without change on Macintosh OS X using `gcc` and 99This compiles without change on Macintosh OS X using `gcc` and
128the standard developer tools. 100the standard developer tools.
129 101
130You can also use `make CC=g++` to build with the GNU C++ compiler, 102You can also use `make CC=g++` to build with the GNU C++ compiler,
131should you choose to do so. 103should you choose to do so.
132 104
 105The version of `malloc` that comes with some systems is sometimes
 106astonishly slow. If `awk` seems slow, you might try fixing that.
 107More generally, turning on optimization can significantly improve
 108`awk`'s speed, perhaps by 1/3 for highest levels.
 109
133## A Note About Releases 110## A Note About Releases
134 111
135We don't usually do releases. 112We don't usually do releases.
136 113
137## A Note About Maintenance 114## A Note About Maintenance
138 115
139NOTICE! Maintenance of this program is on a ''best effort'' 116NOTICE! Maintenance of this program is on a ''best effort''
140basis. We try to get to issues and pull requests as quickly 117basis. We try to get to issues and pull requests as quickly
141as we can. Unfortunately, however, keeping this program going 118as we can. Unfortunately, however, keeping this program going
142is not at the top of our priority list. 119is not at the top of our priority list.
143 120
144#### Last Updated 121#### Last Updated
145 122
146Sun Sep 3 09:26:43 EDT 2023 123Sun 23 Jan 2022 03:48:01 PM EST
 124

cvs diff -r1.4 -r1.5 pkgsrc/lang/nawk/files/lex.c

--- pkgsrc/lang/nawk/files/lex.c 2023/09/12 19:16:52 1.4
+++ pkgsrc/lang/nawk/files/lex.c 2023/09/17 10:32:06 1.5
@@ -358,28 +358,26 @@ int yylex(void) @@ -358,28 +358,26 @@ int yylex(void)
358 case '(': 358 case '(':
359 parencnt++; 359 parencnt++;
360 RET('('); 360 RET('(');
361 361
362 case '"': 362 case '"':
363 return string(); /* BUG: should be like tran.c ? */ 363 return string(); /* BUG: should be like tran.c ? */
364 364
365 default: 365 default:
366 RET(c); 366 RET(c);
367 } 367 }
368 } 368 }
369} 369}
370 370
371extern int runetochar(char *str, int c); 
372 
373int string(void) 371int string(void)
374{ 372{
375 int c, n; 373 int c, n;
376 char *s, *bp; 374 char *s, *bp;
377 static char *buf = NULL; 375 static char *buf = NULL;
378 static int bufsz = 500; 376 static int bufsz = 500;
379 377
380 if (buf == NULL && (buf = (char *) malloc(bufsz)) == NULL) 378 if (buf == NULL && (buf = (char *) malloc(bufsz)) == NULL)
381 FATAL("out of space for strings"); 379 FATAL("out of space for strings");
382 for (bp = buf; (c = input()) != '"'; ) { 380 for (bp = buf; (c = input()) != '"'; ) {
383 if (!adjbuf(&buf, &bufsz, bp-buf+2, 500, &bp, "string")) 381 if (!adjbuf(&buf, &bufsz, bp-buf+2, 500, &bp, "string"))
384 FATAL("out of space for string %.10s...", buf); 382 FATAL("out of space for string %.10s...", buf);
385 switch (c) { 383 switch (c) {
@@ -407,73 +405,52 @@ int string(void) @@ -407,73 +405,52 @@ int string(void)
407 case '\\': *bp++ = '\\'; break; 405 case '\\': *bp++ = '\\'; break;
408 406
409 case '0': case '1': case '2': /* octal: \d \dd \ddd */ 407 case '0': case '1': case '2': /* octal: \d \dd \ddd */
410 case '3': case '4': case '5': case '6': case '7': 408 case '3': case '4': case '5': case '6': case '7':
411 n = c - '0'; 409 n = c - '0';
412 if ((c = peek()) >= '0' && c < '8') { 410 if ((c = peek()) >= '0' && c < '8') {
413 n = 8 * n + input() - '0'; 411 n = 8 * n + input() - '0';
414 if ((c = peek()) >= '0' && c < '8') 412 if ((c = peek()) >= '0' && c < '8')
415 n = 8 * n + input() - '0'; 413 n = 8 * n + input() - '0';
416 } 414 }
417 *bp++ = n; 415 *bp++ = n;
418 break; 416 break;
419 417
420 case 'x': /* hex \x0-9a-fA-F (exactly two) */ 418 case 'x': /* hex \x0-9a-fA-F + */
421 { 419 {
422 int i; 420 int i;
423 421
424 n = 0; 422 n = 0;
425 for (i = 1; i <= 2; i++) { 423 for (i = 1; i <= 2; i++) {
426 c = input(); 424 c = input();
427 if (c == 0) 425 if (c == 0)
428 break; 426 break;
429 if (isxdigit(c)) { 427 if (isxdigit(c)) {
430 c = tolower(c); 428 c = tolower(c);
431 n *= 16; 429 n *= 16;
432 if (isdigit(c)) 430 if (isdigit(c))
433 n += (c - '0'); 431 n += (c - '0');
434 else 432 else
435 n += 10 + (c - 'a'); 433 n += 10 + (c - 'a');
436 } else 434 } else
437 break; 435 break;
438 } 436 }
439 if (n) 437 if (n)
440 *bp++ = n; 438 *bp++ = n;
441 else 439 else
442 unput(c); 440 unput(c);
443 break; 441 break;
444 } 442 }
445 443
446 case 'u': /* utf \u0-9a-fA-F (1..8) */ 
447 { 
448 int i; 
449 
450 n = 0; 
451 for (i = 0; i < 8; i++) { 
452 c = input(); 
453 if (!isxdigit(c) || c == 0) 
454 break; 
455 c = tolower(c); 
456 n *= 16; 
457 if (isdigit(c)) 
458 n += (c - '0'); 
459 else 
460 n += 10 + (c - 'a'); 
461 } 
462 unput(c); 
463 bp += runetochar(bp, n); 
464 break; 
465 } 
466 
467 default: 444 default:
468 *bp++ = c; 445 *bp++ = c;
469 break; 446 break;
470 } 447 }
471 break; 448 break;
472 default: 449 default:
473 *bp++ = c; 450 *bp++ = c;
474 break; 451 break;
475 } 452 }
476 } 453 }
477 *bp = 0; 454 *bp = 0;
478 s = tostring(buf); 455 s = tostring(buf);
479 *bp++ = ' '; *bp++ = '\0'; 456 *bp++ = ' '; *bp++ = '\0';
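
The `\x` branch kept in this hunk reads at most two hexadecimal digits, which
matches the Dec 15, 2022 FIXES entry. The following is a standalone sketch of
that rule, not the lexer's code: `parse_hex2` is a hypothetical name, and it
works on an in-memory string instead of the lexer's `input()`/`unput()`
stream.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: parse up to two hex digits that follow "\x".
 * Returns the value and stores the number of digits consumed in *used.
 * A third hex digit, if present, is left for the caller as ordinary text. */
int parse_hex2(const char *s, size_t *used)
{
	int n = 0;
	size_t i;

	assert(s != NULL && used != NULL);
	for (i = 0; i < 2 && isxdigit((unsigned char)s[i]); i++) {
		int c = tolower((unsigned char)s[i]);

		n = n * 16 + (isdigit(c) ? c - '0' : 10 + (c - 'a'));
	}
	*used = i;
	return n;
}
```

With this rule, the text `41B` after `\x` yields the byte 0x41 followed by a
literal `B`, and does not become a single larger value.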

cvs diff -r1.4 -r1.5 pkgsrc/lang/nawk/files/main.c

--- pkgsrc/lang/nawk/files/main.c 2023/09/12 19:16:52 1.4
+++ pkgsrc/lang/nawk/files/main.c 2023/09/17 10:32:06 1.5
@@ -12,55 +12,53 @@ its entities not be used in advertising  @@ -12,55 +12,53 @@ its entities not be used in advertising
12to distribution of the software without specific, written prior 12to distribution of the software without specific, written prior
13permission. 13permission.
14 14
15LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, 15LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
16INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. 16INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
17IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY 17IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
18SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES 18SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
19WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER 19WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
20IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, 20IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
21ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF 21ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
22THIS SOFTWARE. 22THIS SOFTWARE.
23****************************************************************/ 23****************************************************************/
24 24
25const char *version = "version 20230911"; 25const char *version = "version 20230909";
26 26
27#define DEBUG 27#define DEBUG
28#include <stdio.h> 28#include <stdio.h>
29#include <ctype.h> 29#include <ctype.h>
30#include <locale.h> 30#include <locale.h>
31#include <stdlib.h> 31#include <stdlib.h>
32#include <string.h> 32#include <string.h>
33#include <signal.h> 33#include <signal.h>
34#include "awk.h" 34#include "awk.h"
35 35
36extern char **environ; 36extern char **environ;
37extern int nfields; 37extern int nfields;
38 38
39int dbg = 0; 39int dbg = 0;
40Awkfloat srand_seed = 1; 40Awkfloat srand_seed = 1;
41char *cmdname; /* gets argv[0] for error messages */ 41char *cmdname; /* gets argv[0] for error messages */
42extern FILE *yyin; /* lex input file */ 42extern FILE *yyin; /* lex input file */
43char *lexprog; /* points to program argument if it exists */ 43char *lexprog; /* points to program argument if it exists */
44extern int errorflag; /* non-zero if any syntax errors; set by yyerror */ 44extern int errorflag; /* non-zero if any syntax errors; set by yyerror */
45enum compile_states compile_time = ERROR_PRINTING; 45enum compile_states compile_time = ERROR_PRINTING;
46 46
47static char **pfile; /* program filenames from -f's */ 47static char **pfile; /* program filenames from -f's */
48static size_t maxpfile; /* max program filename */ 48static size_t maxpfile; /* max program filename */
49static size_t npfile; /* number of filenames */ 49static size_t npfile; /* number of filenames */
50static size_t curpfile; /* current filename */ 50static size_t curpfile; /* current filename */
51 51
52bool CSV = false; /* true for csv input */ 
53 
54bool safe = false; /* true => "safe" mode */ 52bool safe = false; /* true => "safe" mode */
55 53
56static noreturn void fpecatch(int n 54static noreturn void fpecatch(int n
57#ifdef SA_SIGINFO 55#ifdef SA_SIGINFO
58 , siginfo_t *si, void *uc 56 , siginfo_t *si, void *uc
59#endif 57#endif
60) 58)
61{ 59{
62#ifdef SA_SIGINFO 60#ifdef SA_SIGINFO
63 static const char *emsg[] = { 61 static const char *emsg[] = {
64 [0] = "Unknown error", 62 [0] = "Unknown error",
65 [FPE_INTDIV] = "Integer divide by zero", 63 [FPE_INTDIV] = "Integer divide by zero",
66 [FPE_INTOVF] = "Integer overflow", 64 [FPE_INTOVF] = "Integer overflow",
@@ -142,32 +140,26 @@ int main(int argc, char *argv[]) @@ -142,32 +140,26 @@ int main(int argc, char *argv[])
142 140
143 yyin = NULL; 141 yyin = NULL;
144 symtab = makesymtab(NSYMTAB/NSYMTAB); 142 symtab = makesymtab(NSYMTAB/NSYMTAB);
145 while (argc > 1 && argv[1][0] == '-' && argv[1][1] != '\0') { 143 while (argc > 1 && argv[1][0] == '-' && argv[1][1] != '\0') {
146 if (strcmp(argv[1], "-version") == 0 || strcmp(argv[1], "--version") == 0) { 144 if (strcmp(argv[1], "-version") == 0 || strcmp(argv[1], "--version") == 0) {
147 printf("awk %s\n", version); 145 printf("awk %s\n", version);
148 return 0; 146 return 0;
149 } 147 }
150 if (strcmp(argv[1], "--") == 0) { /* explicit end of args */ 148 if (strcmp(argv[1], "--") == 0) { /* explicit end of args */
151 argc--; 149 argc--;
152 argv++; 150 argv++;
153 break; 151 break;
154 } 152 }
155 if (strcmp(argv[1], "--csv") == 0) { /* turn on csv input processing */ 
156 CSV = true; 
157 argc--; 
158 argv++; 
159 continue; 
160 } 
161 switch (argv[1][1]) { 153 switch (argv[1][1]) {
162 case 's': 154 case 's':
163 if (strcmp(argv[1], "-safe") == 0) 155 if (strcmp(argv[1], "-safe") == 0)
164 safe = true; 156 safe = true;
165 break; 157 break;
166 case 'f': /* next argument is program filename */ 158 case 'f': /* next argument is program filename */
167 fn = getarg(&argc, &argv, "no program filename"); 159 fn = getarg(&argc, &argv, "no program filename");
168 if (npfile >= maxpfile) { 160 if (npfile >= maxpfile) {
169 maxpfile += 20; 161 maxpfile += 20;
170 pfile = (char **) realloc(pfile, maxpfile * sizeof(*pfile)); 162 pfile = (char **) realloc(pfile, maxpfile * sizeof(*pfile));
171 if (pfile == NULL) 163 if (pfile == NULL)
172 FATAL("error allocating space for -f options"); 164 FATAL("error allocating space for -f options");
173 } 165 }

cvs diff -r1.5 -r1.6 pkgsrc/lang/nawk/files/awk.h

--- pkgsrc/lang/nawk/files/awk.h 2023/09/12 19:16:52 1.5
+++ pkgsrc/lang/nawk/files/awk.h 2023/09/17 10:32:05 1.6
@@ -68,28 +68,26 @@ extern char EMPTY[]; /* this avoid -Wwri @@ -68,28 +68,26 @@ extern char EMPTY[]; /* this avoid -Wwri
68extern char **FS; 68extern char **FS;
69extern char **RS; 69extern char **RS;
70extern char **ORS; 70extern char **ORS;
71extern char **OFS; 71extern char **OFS;
72extern char **OFMT; 72extern char **OFMT;
73extern Awkfloat *NR; 73extern Awkfloat *NR;
74extern Awkfloat *FNR; 74extern Awkfloat *FNR;
75extern Awkfloat *NF; 75extern Awkfloat *NF;
76extern char **FILENAME; 76extern char **FILENAME;
77extern char **SUBSEP; 77extern char **SUBSEP;
78extern Awkfloat *RSTART; 78extern Awkfloat *RSTART;
79extern Awkfloat *RLENGTH; 79extern Awkfloat *RLENGTH;
80 80
81extern bool CSV; /* true for csv input */ 
82 
83extern char *record; /* points to $0 */ 81extern char *record; /* points to $0 */
84extern int lineno; /* line number in awk program */ 82extern int lineno; /* line number in awk program */
85extern int errorflag; /* 1 if error has occurred */ 83extern int errorflag; /* 1 if error has occurred */
86extern bool donefld; /* true if record broken into fields */ 84extern bool donefld; /* true if record broken into fields */
87extern bool donerec; /* true if record is valid (no fld has changed */ 85extern bool donerec; /* true if record is valid (no fld has changed */
88extern int dbg; 86extern int dbg;
89 87
90extern const char *patbeg; /* beginning of pattern matched */ 88extern const char *patbeg; /* beginning of pattern matched */
91extern int patlen; /* length of pattern matched. set in b.c */ 89extern int patlen; /* length of pattern matched. set in b.c */
92 90
93/* Cell: all information about a variable or constant */ 91/* Cell: all information about a variable or constant */
94 92
95typedef struct Cell { 93typedef struct Cell {
@@ -217,51 +215,43 @@ extern int pairstack[], paircnt; @@ -217,51 +215,43 @@ extern int pairstack[], paircnt;
217#define isfld(n) ((n)->tval & FLD) 215#define isfld(n) ((n)->tval & FLD)
218#define isstr(n) ((n)->tval & STR) 216#define isstr(n) ((n)->tval & STR)
219#define isnum(n) ((n)->tval & NUM) 217#define isnum(n) ((n)->tval & NUM)
220#define isarr(n) ((n)->tval & ARR) 218#define isarr(n) ((n)->tval & ARR)
221#define isfcn(n) ((n)->tval & FCN) 219#define isfcn(n) ((n)->tval & FCN)
222#define istrue(n) ((n)->csub == BTRUE) 220#define istrue(n) ((n)->csub == BTRUE)
223#define istemp(n) ((n)->csub == CTEMP) 221#define istemp(n) ((n)->csub == CTEMP)
224#define isargument(n) ((n)->nobj == ARG) 222#define isargument(n) ((n)->nobj == ARG)
225/* #define freeable(p) (!((p)->tval & DONTFREE)) */ 223/* #define freeable(p) (!((p)->tval & DONTFREE)) */
226#define freeable(p) ( ((p)->tval & (STR|DONTFREE)) == STR ) 224#define freeable(p) ( ((p)->tval & (STR|DONTFREE)) == STR )
227 225
228/* structures used by regular expression matching machinery, mostly b.c: */ 226/* structures used by regular expression matching machinery, mostly b.c: */
229 227
230#define NCHARS (1256+3) /* 256 handles 8-bit chars; 128 does 7-bit */ 228#define NCHARS (256+3) /* 256 handles 8-bit chars; 128 does 7-bit */
231 /* BUG: some overflows (caught) if we use 256 */ 
232 /* watch out in match(), etc. */ 229 /* watch out in match(), etc. */
233#define HAT (NCHARS+2) /* matches ^ in regular expr */ 230#define HAT (NCHARS+2) /* matches ^ in regular expr */
234#define NSTATES 32 231#define NSTATES 32
235 232
236typedef struct rrow { 233typedef struct rrow {
237 long ltype; /* long avoids pointer warnings on 64-bit */ 234 long ltype; /* long avoids pointer warnings on 64-bit */
238 union { 235 union {
239 int i; 236 int i;
240 Node *np; 237 Node *np;
241 uschar *up; 238 uschar *up;
242 int *rp; /* rune representation of char class */ 
243 } lval; /* because Al stores a pointer in it! */ 239 } lval; /* because Al stores a pointer in it! */
244 int *lfollow; 240 int *lfollow;
245} rrow; 241} rrow;
246 242
247typedef struct gtt { /* gototab entry */ 
248 unsigned int ch; 
249 unsigned int state; 
250} gtt; 
251 
252typedef struct fa { 243typedef struct fa {
253 gtt **gototab; 244 unsigned int **gototab;
254 int gototab_len; 
255 uschar *out; 245 uschar *out;
256 uschar *restr; 246 uschar *restr;
257 int **posns; 247 int **posns;
258 int state_count; 248 int state_count;
259 bool anchor; 249 bool anchor;
260 int use; 250 int use;
261 int initstat; 251 int initstat;
262 int curstat; 252 int curstat;
263 int accept; 253 int accept;
264 struct rrow re[1]; /* variable: actual size set by calling malloc */ 254 struct rrow re[1]; /* variable: actual size set by calling malloc */
265} fa; 255} fa;
266 256
267 257

cvs diff -r1.5 -r1.6 pkgsrc/lang/nawk/files/b.c

--- pkgsrc/lang/nawk/files/b.c 2023/09/12 19:16:52 1.5
+++ pkgsrc/lang/nawk/files/b.c 2023/09/17 10:32:05 1.6
@@ -70,119 +70,83 @@ static const uschar *basestr; /* starts @@ -70,119 +70,83 @@ static const uschar *basestr; /* starts
70 repetition processing */ 70 repetition processing */
71static const uschar *firstbasestr; 71static const uschar *firstbasestr;
72 72
73static int setcnt; 73static int setcnt;
74static int poscnt; 74static int poscnt;
75 75
76const char *patbeg; 76const char *patbeg;
77int patlen; 77int patlen;
78 78
79#define NFA 128 /* cache this many dynamic fa's */ 79#define NFA 128 /* cache this many dynamic fa's */
80fa *fatab[NFA]; 80fa *fatab[NFA];
81int nfatab = 0; /* entries in fatab */ 81int nfatab = 0; /* entries in fatab */
82 82
83 
84/* utf-8 mechanism: 
85 
86 For most of Awk, utf-8 strings just "work", since they look like 
87 null-terminated sequences of 8-bit bytes. 
88 
89 Functions like length(), index(), and substr() have to operate 
90 in units of utf-8 characters. The u8_* functions in run.c 
91 handle this. 
92 
93 Regular expressions are more complicated, since the basic 
94 mechanism of the goto table used 8-bit byte indices into the 
95 gototab entries to compute the next state. Unicode is a lot 
96 bigger, so the gototab entries are now structs with a character 
97 and a next state, and there is a linear search of the characters 
98 to find the state. (Yes, this is slower, by a significant 
99 amount. Tough.) 
100 
101 Throughout the RE mechanism in b.c, utf-8 characters are 
102 converted to their utf-32 value. This mostly shows up in 
103 cclenter, which expands character class ranges like a-z and now 
104 alpha-omega. The size of a gototab array is still about 256. 
105 This should be dynamic, but for now things work ok for a single 
106 code page of Unicode, which is the most likely case. 
107 
108 The code changes are localized in run.c and b.c. I have added a 
109 handful of functions to somewhat better hide the implementation, 
110 but a lot more could be done. 
111 
112 */ 
113 
114static int get_gototab(fa*, int, int); 
115static int set_gototab(fa*, int, int, int); 
116extern int u8_rune(int *, const uschar *); 
117 
118static int * 83static int *
119intalloc(size_t n, const char *f) 84intalloc(size_t n, const char *f)
120{ 85{
121 int *p = (int *) calloc(n, sizeof(int)); 86 int *p = (int *) calloc(n, sizeof(int));
122 if (p == NULL) 87 if (p == NULL)
123 overflo(f); 88 overflo(f);
124 return p; 89 return p;
125} 90}
126 91
127static void 92static void
128resizesetvec(const char *f) 93resizesetvec(const char *f)
129{ 94{
130 if (maxsetvec == 0) 95 if (maxsetvec == 0)
131 maxsetvec = MAXLIN; 96 maxsetvec = MAXLIN;
132 else 97 else
133 maxsetvec *= 4; 98 maxsetvec *= 4;
134 setvec = (int *) realloc(setvec, maxsetvec * sizeof(*setvec)); 99 setvec = (int *) realloc(setvec, maxsetvec * sizeof(*setvec));
135 tmpset = (int *) realloc(tmpset, maxsetvec * sizeof(*tmpset)); 100 tmpset = (int *) realloc(tmpset, maxsetvec * sizeof(*tmpset));
136 if (setvec == NULL || tmpset == NULL) 101 if (setvec == NULL || tmpset == NULL)
137 overflo(f); 102 overflo(f);
138} 103}
139 104
140static void 105static void
141resize_state(fa *f, int state) 106resize_state(fa *f, int state)
142{ 107{
143 gtt **p; 108 unsigned int **p;
144 uschar *p2; 109 uschar *p2;
145 int **p3; 110 int **p3;
146 int i, new_count; 111 int i, new_count;
147 112
148 if (++state < f->state_count) 113 if (++state < f->state_count)
149 return; 114 return;
150 115
151 new_count = state + 10; /* needs to be tuned */ 116 new_count = state + 10; /* needs to be tuned */
152 117
153 p = (gtt **) realloc(f->gototab, new_count * sizeof(f->gototab[0])); 118 p = (unsigned int **) realloc(f->gototab, new_count * sizeof(f->gototab[0]));
154 if (p == NULL) 119 if (p == NULL)
155 goto out; 120 goto out;
156 f->gototab = p; 121 f->gototab = p;
157 122
158 p2 = (uschar *) realloc(f->out, new_count * sizeof(f->out[0])); 123 p2 = (uschar *) realloc(f->out, new_count * sizeof(f->out[0]));
159 if (p2 == NULL) 124 if (p2 == NULL)
160 goto out; 125 goto out;
161 f->out = p2; 126 f->out = p2;
162 127
163 p3 = (int **) realloc(f->posns, new_count * sizeof(f->posns[0])); 128 p3 = (int **) realloc(f->posns, new_count * sizeof(f->posns[0]));
164 if (p3 == NULL) 129 if (p3 == NULL)
165 goto out; 130 goto out;
166 f->posns = p3; 131 f->posns = p3;
167 132
168 for (i = f->state_count; i < new_count; ++i) { 133 for (i = f->state_count; i < new_count; ++i) {
169 f->gototab[i] = (gtt *) calloc(NCHARS, sizeof(**f->gototab)); 134 f->gototab[i] = (unsigned int *) calloc(NCHARS, sizeof(**f->gototab));
170 if (f->gototab[i] == NULL) 135 if (f->gototab[i] == NULL)
171 goto out; 136 goto out;
172 f->out[i] = 0; 137 f->out[i] = 0;
173 f->posns[i] = NULL; 138 f->posns[i] = NULL;
174 } 139 }
175 f->gototab_len = NCHARS; /* should be variable, growable */ 
176 f->state_count = new_count; 140 f->state_count = new_count;
177 return; 141 return;
178out: 142out:
179 overflo(__func__); 143 overflo(__func__);
180} 144}
181 145
182fa *makedfa(const char *s, bool anchor) /* returns dfa for reg expr s */ 146fa *makedfa(const char *s, bool anchor) /* returns dfa for reg expr s */
183{ 147{
184 int i, use, nuse; 148 int i, use, nuse;
185 fa *pfa; 149 fa *pfa;
186 static int now = 1; 150 static int now = 1;
187 151
188 if (setvec == NULL) { /* first time through any RE */ 152 if (setvec == NULL) { /* first time through any RE */
@@ -257,27 +221,27 @@ int makeinit(fa *f, bool anchor) @@ -257,27 +221,27 @@ int makeinit(fa *f, bool anchor)
257 int i, k; 221 int i, k;
258 222
259 f->curstat = 2; 223 f->curstat = 2;
260 f->out[2] = 0; 224 f->out[2] = 0;
261 k = *(f->re[0].lfollow); 225 k = *(f->re[0].lfollow);
262 xfree(f->posns[2]); 226 xfree(f->posns[2]);
263 f->posns[2] = intalloc(k + 1, __func__); 227 f->posns[2] = intalloc(k + 1, __func__);
264 for (i = 0; i <= k; i++) { 228 for (i = 0; i <= k; i++) {
265 (f->posns[2])[i] = (f->re[0].lfollow)[i]; 229 (f->posns[2])[i] = (f->re[0].lfollow)[i];
266 } 230 }
267 if ((f->posns[2])[1] == f->accept) 231 if ((f->posns[2])[1] == f->accept)
268 f->out[2] = 1; 232 f->out[2] = 1;
269 for (i = 0; i < NCHARS; i++) 233 for (i = 0; i < NCHARS; i++)
270 set_gototab(f, 2, 0, 0); /* f->gototab[2][i] = 0; */ 234 f->gototab[2][i] = 0;
271 f->curstat = cgoto(f, 2, HAT); 235 f->curstat = cgoto(f, 2, HAT);
272 if (anchor) { 236 if (anchor) {
273 *f->posns[2] = k-1; /* leave out position 0 */ 237 *f->posns[2] = k-1; /* leave out position 0 */
274 for (i = 0; i < k; i++) { 238 for (i = 0; i < k; i++) {
275 (f->posns[0])[i] = (f->posns[2])[i]; 239 (f->posns[0])[i] = (f->posns[2])[i];
276 } 240 }
277 241
278 f->out[0] = f->out[2]; 242 f->out[0] = f->out[2];
279 if (f->curstat != 2) 243 if (f->curstat != 2)
280 --(*f->posns[f->curstat]); 244 --(*f->posns[f->curstat]);
281 } 245 }
282 return f->curstat; 246 return f->curstat;
283} 247}
@@ -326,151 +290,128 @@ void freetr(Node *p) /* free parse tree  @@ -326,151 +290,128 @@ void freetr(Node *p) /* free parse tree
326 freetr(left(p)); 290 freetr(left(p));
327 freetr(right(p)); 291 freetr(right(p));
328 xfree(p); 292 xfree(p);
329 break; 293 break;
330 default: /* can't happen */ 294 default: /* can't happen */
331 FATAL("can't happen: unknown type %d in freetr", type(p)); 295 FATAL("can't happen: unknown type %d in freetr", type(p));
332 break; 296 break;
333 } 297 }
334} 298}
335 299
336/* in the parsing of regular expressions, metacharacters like . have */ 300/* in the parsing of regular expressions, metacharacters like . have */
337/* to be seen literally; \056 is not a metacharacter. */ 301/* to be seen literally; \056 is not a metacharacter. */
338 302
339int hexstr(const uschar **pp, int max) /* find and eval hex string at pp, return new p */ 303int hexstr(const uschar **pp) /* find and eval hex string at pp, return new p */
340{ /* only pick up one 8-bit byte (2 chars) */ 304{ /* only pick up one 8-bit byte (2 chars) */
341 const uschar *p; 305 const uschar *p;
342 int n = 0; 306 int n = 0;
343 int i; 307 int i;
344 308
345 for (i = 0, p = *pp; i < max && isxdigit(*p); i++, p++) { 309 for (i = 0, p = *pp; i < 2 && isxdigit(*p); i++, p++) {
346 if (isdigit(*p)) 310 if (isdigit(*p))
347 n = 16 * n + *p - '0'; 311 n = 16 * n + *p - '0';
348 else if (*p >= 'a' && *p <= 'f') 312 else if (*p >= 'a' && *p <= 'f')
349 n = 16 * n + *p - 'a' + 10; 313 n = 16 * n + *p - 'a' + 10;
350 else if (*p >= 'A' && *p <= 'F') 314 else if (*p >= 'A' && *p <= 'F')
351 n = 16 * n + *p - 'A' + 10; 315 n = 16 * n + *p - 'A' + 10;
352 } 316 }
353 *pp = p; 317 *pp = p;
354 return n; 318 return n;
355} 319}
356 320
357 
358 
359#define isoctdigit(c) ((c) >= '0' && (c) <= '7') /* multiple use of arg */ 321#define isoctdigit(c) ((c) >= '0' && (c) <= '7') /* multiple use of arg */
360 322
361int quoted(const uschar **pp) /* pick up next thing after a \\ */ 323int quoted(const uschar **pp) /* pick up next thing after a \\ */
362 /* and increment *pp */ 324 /* and increment *pp */
363{ 325{
364 const uschar *p = *pp; 326 const uschar *p = *pp;
365 int c; 327 int c;
366 328
367/* BUG: should advance by utf-8 char even if makes no sense */ 329 if ((c = *p++) == 't')
368 
369 if ((c = *p++) == 't') { 
370 c = '\t'; 330 c = '\t';
371 } else if (c == 'n') { 331 else if (c == 'n')
372 c = '\n'; 332 c = '\n';
373 } else if (c == 'f') { 333 else if (c == 'f')
374 c = '\f'; 334 c = '\f';
375 } else if (c == 'r') { 335 else if (c == 'r')
376 c = '\r'; 336 c = '\r';
377 } else if (c == 'b') { 337 else if (c == 'b')
378 c = '\b'; 338 c = '\b';
379 } else if (c == 'v') { 339 else if (c == 'v')
380 c = '\v'; 340 c = '\v';
381 } else if (c == 'a') { 341 else if (c == 'a')
382 c = '\a'; 342 c = '\a';
383 } else if (c == '\\') { 343 else if (c == '\\')
384 c = '\\'; 344 c = '\\';
385 } else if (c == 'x') { /* 2 hex digits follow */ 345 else if (c == 'x') { /* hexadecimal goo follows */
386 c = hexstr(&p, 2); /* this adds a null if number is invalid */ 346 c = hexstr(&p); /* this adds a null if number is invalid */
387 } else if (c == 'u') { /* unicode char number up to 8 hex digits */ 
388 c = hexstr(&p, 8); 
389 } else if (isoctdigit(c)) { /* \d \dd \ddd */ 347 } else if (isoctdigit(c)) { /* \d \dd \ddd */
390 int n = c - '0'; 348 int n = c - '0';
391 if (isoctdigit(*p)) { 349 if (isoctdigit(*p)) {
392 n = 8 * n + *p++ - '0'; 350 n = 8 * n + *p++ - '0';
393 if (isoctdigit(*p)) 351 if (isoctdigit(*p))
394 n = 8 * n + *p++ - '0'; 352 n = 8 * n + *p++ - '0';
395 } 353 }
396 c = n; 354 c = n;
397 } /* else */ 355 } /* else */
398 /* c = c; */ 356 /* c = c; */
399 *pp = p; 357 *pp = p;
400 return c; 358 return c;
401} 359}
402 360
403int *cclenter(const char *argp) /* add a character class */ 361char *cclenter(const char *argp) /* add a character class */
404{ 362{
405 int i, c, c2; 363 int i, c, c2;
406 int n; 364 const uschar *op, *p = (const uschar *) argp;
407 const uschar *p = (const uschar *) argp; 365 uschar *bp;
408 int *bp, *retp; 366 static uschar *buf = NULL;
409 static int *buf = NULL; 
410 static int bufsz = 100; 367 static int bufsz = 100;
411 368
412 if (buf == NULL && (buf = (int *) calloc(bufsz, sizeof(int))) == NULL) 369 op = p;
 370 if (buf == NULL && (buf = (uschar *) malloc(bufsz)) == NULL)
413 FATAL("out of space for character class [%.10s...] 1", p); 371 FATAL("out of space for character class [%.10s...] 1", p);
414 bp = buf; 372 bp = buf;
415 for (i = 0; *p != 0; ) { 373 for (i = 0; (c = *p++) != 0; ) {
416 n = u8_rune(&c, p); 
417 p += n; 
418 if (c == '\\') { 374 if (c == '\\') {
419 c = quoted(&p); 375 c = quoted(&p);
420 } else if (c == '-' && i > 0 && bp[-1] != 0) { 376 } else if (c == '-' && i > 0 && bp[-1] != 0) {
421 if (*p != 0) { 377 if (*p != 0) {
422 c = bp[-1]; 378 c = bp[-1];
423 /* c2 = *p++; */ 379 c2 = *p++;
424 n = u8_rune(&c2, p); 
425 p += n; 
426 if (c2 == '\\') 380 if (c2 == '\\')
427 c2 = quoted(&p); /* BUG: sets p, has to be u8 size */ 381 c2 = quoted(&p);
428 if (c > c2) { /* empty; ignore */ 382 if (c > c2) { /* empty; ignore */
429 bp--; 383 bp--;
430 i--; 384 i--;
431 continue; 385 continue;
432 } 386 }
433 while (c < c2) { 387 while (c < c2) {
434 if (i >= bufsz) { 388 if (!adjbuf((char **) &buf, &bufsz, bp-buf+2, 100, (char **) &bp, "cclenter1"))
435 bufsz *= 2; 389 FATAL("out of space for character class [%.10s...] 2", p);
436 buf = (int *) realloc(buf, bufsz * sizeof(int)); 
437 if (buf == NULL) 
438 FATAL("out of space for character class [%.10s...] 2", p); 
439 bp = buf + i; 
440 } 
441 *bp++ = ++c; 390 *bp++ = ++c;
442 i++; 391 i++;
443 } 392 }
444 continue; 393 continue;
445 } 394 }
446 } 395 }
447 if (i >= bufsz) { 396 if (!adjbuf((char **) &buf, &bufsz, bp-buf+2, 100, (char **) &bp, "cclenter2"))
448 bufsz *= 2; 397 FATAL("out of space for character class [%.10s...] 3", p);
449 buf = (int *) realloc(buf, bufsz * sizeof(int)); 
450 if (buf == NULL) 
451 FATAL("out of space for character class [%.10s...] 2", p); 
452 bp = buf + i; 
453 } 
454 *bp++ = c; 398 *bp++ = c;
455 i++; 399 i++;
456 } 400 }
457 *bp = 0; 401 *bp = 0;
458 /* DPRINTF("cclenter: in = |%s|, out = |%s|\n", op, buf); BUG: can't print array of int */ 402 DPRINTF("cclenter: in = |%s|, out = |%s|\n", op, buf);
459 /* xfree(op); BUG: what are we freeing here? */ 403 xfree(op);
460 retp = (int *) calloc(bp-buf+1, sizeof(int)); 404 return (char *) tostring((char *) buf);
461 for (i = 0; i < bp-buf+1; i++) 
462 retp[i] = buf[i]; 
463 return retp; 
464} 405}
465 406
466void overflo(const char *s) 407void overflo(const char *s)
467{ 408{
468 FATAL("regular expression too big: out of space in %.30s...", s); 409 FATAL("regular expression too big: out of space in %.30s...", s);
469} 410}
470 411
471void cfoll(fa *f, Node *v) /* enter follow set of each leaf of vertex v into lfollow[leaf] */ 412void cfoll(fa *f, Node *v) /* enter follow set of each leaf of vertex v into lfollow[leaf] */
472{ 413{
473 int i; 414 int i;
474 int *p; 415 int *p;
475 416
476 switch (type(v)) { 417 switch (type(v)) {
@@ -573,284 +514,195 @@ void follow(Node *v) /* collects leaves  @@ -573,284 +514,195 @@ void follow(Node *v) /* collects leaves
573 514
574 case CAT: 515 case CAT:
575 if (v == left(p)) { /* v is left child of p */ 516 if (v == left(p)) { /* v is left child of p */
576 if (first(right(p)) == 0) { 517 if (first(right(p)) == 0) {
577 follow(p); 518 follow(p);
578 return; 519 return;
579 } 520 }
580 } else /* v is right child */ 521 } else /* v is right child */
581 follow(p); 522 follow(p);
582 return; 523 return;
583 } 524 }
584} 525}
585 526
586int member(int c, int *sarg) /* is c in s? */ 527int member(int c, const char *sarg) /* is c in s? */
587{ 528{
588 int *s = (int *) sarg; 529 const uschar *s = (const uschar *) sarg;
589 530
590 while (*s) 531 while (*s)
591 if (c == *s++) 532 if (c == *s++)
592 return(1); 533 return(1);
593 return(0); 534 return(0);
594} 535}
595 536
596static int get_gototab(fa *f, int state, int ch) /* hide gototab inplementation */ 
597{ 
598 int i; 
599 for (i = 0; i < f->gototab_len; i++) { 
600 if (f->gototab[state][i].ch == 0) 
601 break; 
602 if (f->gototab[state][i].ch == ch) 
603 return f->gototab[state][i].state; 
604 } 
605 return 0; 
606} 
607 
608static int set_gototab(fa *f, int state, int ch, int val) /* hide gototab inplementation */ 
609{ 
610 int i; 
611 for (i = 0; i < f->gototab_len; i++) { 
612 if (f->gototab[state][i].ch == 0 || f->gototab[state][i].ch == ch) { 
613 f->gototab[state][i].ch = ch; 
614 f->gototab[state][i].state = val; 
615 return val; 
616 } 
617 } 
618 overflo(__func__); 
619 return val; /* not used anywhere at the moment */ 
620} 
621 
622int match(fa *f, const char *p0) /* shortest match ? */ 537int match(fa *f, const char *p0) /* shortest match ? */
623{ 538{
624 int s, ns; 539 int s, ns;
625 int n; 
626 int rune; 
627 const uschar *p = (const uschar *) p0; 540 const uschar *p = (const uschar *) p0;
628 541
629 /* return pmatch(f, p0); does it matter whether longest or shortest? */ 
630 
631 s = f->initstat; 542 s = f->initstat;
632 assert (s < f->state_count); 543 assert (s < f->state_count);
633 544
634 if (f->out[s]) 545 if (f->out[s])
635 return(1); 546 return(1);
636 do { 547 do {
637 /* assert(*p < NCHARS); */ 548 /* assert(*p < NCHARS); */
638 n = u8_rune(&rune, p); 549 if ((ns = f->gototab[s][*p]) != 0)
639 if ((ns = get_gototab(f, s, rune)) != 0) 
640 s = ns; 550 s = ns;
641 else 551 else
642 s = cgoto(f, s, rune); 552 s = cgoto(f, s, *p);
643 if (f->out[s]) 553 if (f->out[s])
644 return(1); 554 return(1);
645 if (*p == 0) 555 } while (*p++ != 0);
646 break; 
647 p += n; 
648 } while (1); /* was *p++ != 0 */ 
649 return(0); 556 return(0);
650} 557}
651 558
652int pmatch(fa *f, const char *p0) /* longest match, for sub */ 559int pmatch(fa *f, const char *p0) /* longest match, for sub */
653{ 560{
654 int s, ns; 561 int s, ns;
655 int n; 
656 int rune; 
657 const uschar *p = (const uschar *) p0; 562 const uschar *p = (const uschar *) p0;
658 const uschar *q; 563 const uschar *q;
659 564
660 s = f->initstat; 565 s = f->initstat;
661 assert(s < f->state_count); 566 assert(s < f->state_count);
662 567
663 patbeg = (const char *)p; 568 patbeg = (const char *)p;
664 patlen = -1; 569 patlen = -1;
665 do { 570 do {
666 q = p; 571 q = p;
667 do { 572 do {
668 if (f->out[s]) /* final state */ 573 if (f->out[s]) /* final state */
669 patlen = q-p; 574 patlen = q-p;
670 /* assert(*q < NCHARS); */ 575 /* assert(*q < NCHARS); */
671 n = u8_rune(&rune, q); 576 if ((ns = f->gototab[s][*q]) != 0)
672 if ((ns = get_gototab(f, s, rune)) != 0) 
673 s = ns; 577 s = ns;
674 else 578 else
675 s = cgoto(f, s, rune); 579 s = cgoto(f, s, *q);
676 580
677 assert(s < f->state_count); 581 assert(s < f->state_count);
678 582
679 if (s == 1) { /* no transition */ 583 if (s == 1) { /* no transition */
680 if (patlen >= 0) { 584 if (patlen >= 0) {
681 patbeg = (const char *) p; 585 patbeg = (const char *) p;
682 return(1); 586 return(1);
683 } 587 }
684 else 588 else
685 goto nextin; /* no match */ 589 goto nextin; /* no match */
686 } 590 }
687 if (*q == 0) 591 } while (*q++ != 0);
688 break; 
689 q += n; 
690 } while (1); 
691 q++; /* was *q++ */ 
692 if (f->out[s]) 592 if (f->out[s])
693 patlen = q-p-1; /* don't count $ */ 593 patlen = q-p-1; /* don't count $ */
694 if (patlen >= 0) { 594 if (patlen >= 0) {
695 patbeg = (const char *) p; 595 patbeg = (const char *) p;
696 return(1); 596 return(1);
697 } 597 }
698 nextin: 598 nextin:
699 s = 2; 599 s = 2;
700 if (*p == 0) 600 } while (*p++);
701 break; 
702 n = u8_rune(&rune, p); 
703 p += n; 
704 } while (1); /* was *p++ */ 
705 return (0); 601 return (0);
706} 602}
707 603
708int nematch(fa *f, const char *p0) /* non-empty match, for sub */ 604int nematch(fa *f, const char *p0) /* non-empty match, for sub */
709{ 605{
710 int s, ns; 606 int s, ns;
711 int n; 
712 int rune; 
713 const uschar *p = (const uschar *) p0; 607 const uschar *p = (const uschar *) p0;
714 const uschar *q; 608 const uschar *q;
715 609
716 s = f->initstat; 610 s = f->initstat;
717 assert(s < f->state_count); 611 assert(s < f->state_count);
718 612
719 patbeg = (const char *)p; 613 patbeg = (const char *)p;
720 patlen = -1; 614 patlen = -1;
721 while (*p) { 615 while (*p) {
722 q = p; 616 q = p;
723 do { 617 do {
724 if (f->out[s]) /* final state */ 618 if (f->out[s]) /* final state */
725 patlen = q-p; 619 patlen = q-p;
726 /* assert(*q < NCHARS); */ 620 /* assert(*q < NCHARS); */
727 n = u8_rune(&rune, q); 621 if ((ns = f->gototab[s][*q]) != 0)
728 if ((ns = get_gototab(f, s, rune)) != 0) 
729 s = ns; 622 s = ns;
730 else 623 else
731 s = cgoto(f, s, rune); 624 s = cgoto(f, s, *q);
732 if (s == 1) { /* no transition */ 625 if (s == 1) { /* no transition */
733 if (patlen > 0) { 626 if (patlen > 0) {
734 patbeg = (const char *) p; 627 patbeg = (const char *) p;
735 return(1); 628 return(1);
736 } else 629 } else
737 goto nnextin; /* no nonempty match */ 630 goto nnextin; /* no nonempty match */
738 } 631 }
739 if (*q == 0) 632 } while (*q++ != 0);
740 break; 
741 q += n; 
742 } while (1); 
743 q++; 
744 if (f->out[s]) 633 if (f->out[s])
745 patlen = q-p-1; /* don't count $ */ 634 patlen = q-p-1; /* don't count $ */
746 if (patlen > 0 ) { 635 if (patlen > 0 ) {
747 patbeg = (const char *) p; 636 patbeg = (const char *) p;
748 return(1); 637 return(1);
749 } 638 }
750 nnextin: 639 nnextin:
751 s = 2; 640 s = 2;
752 p++; 641 p++;
753 } 642 }
754 return (0); 643 return (0);
755} 644}
756 645
757static int getrune(FILE *fp, char **pbuf, int *pbufsize, int quantum, 
758 int *curpos, int *lastpos) 
759{ 
760 int c = 0; 
761 char *buf = *pbuf; 
762 static const int max_bytes = 4; // max multiple bytes in UTF-8 is 4 
763 int i, rune; 
764 uschar private_buf[max_bytes + 1]; 
765 
766 for (i = 0; i <= max_bytes; i++) { 
767 if (++*curpos == *lastpos) { 
768 if (*lastpos == *pbufsize) 
769 if (!adjbuf((char **) pbuf, pbufsize, *pbufsize+1, quantum, 0, "getrune")) 
770 FATAL("stream '%.30s...' too long", buf); 
771 buf[(*lastpos)++] = (c = getc(fp)) != EOF ? c : 0; 
772 private_buf[i] = c; 
773 } 
774 if (c == 0 || c < 128 || (c >> 6) == 4) { // 10xxxxxx starts a new character 
775 ungetc(c, fp); 
776 private_buf[i] = 0; 
777 break; 
778 } 
779 } 
780 
781 u8_rune(& rune, private_buf); 
782 
783 return rune; 
784} 
785 
786 646
787/* 647/*
788 * NAME 648 * NAME
789 * fnematch 649 * fnematch
790 * 650 *
791 * DESCRIPTION 651 * DESCRIPTION
792 * A stream-fed version of nematch which transfers characters to a 652 * A stream-fed version of nematch which transfers characters to a
793 * null-terminated buffer. All characters up to and including the last 653 * null-terminated buffer. All characters up to and including the last
794 * character of the matching text or EOF are placed in the buffer. If 654 * character of the matching text or EOF are placed in the buffer. If
795 * a match is found, patbeg and patlen are set appropriately. 655 * a match is found, patbeg and patlen are set appropriately.
796 * 656 *
797 * RETURN VALUES 657 * RETURN VALUES
798 * false No match found. 658 * false No match found.
799 * true Match found. 659 * true Match found.
800 */ 660 */
801 661
802bool fnematch(fa *pfa, FILE *f, char **pbuf, int *pbufsize, int quantum) 662bool fnematch(fa *pfa, FILE *f, char **pbuf, int *pbufsize, int quantum)
803{ 663{
804 char *buf = *pbuf; 664 char *buf = *pbuf;
805 int bufsize = *pbufsize; 665 int bufsize = *pbufsize;
806 int c, i, j, k, ns, s; 666 int c, i, j, k, ns, s;
807 int rune; 
808 667
809 s = pfa->initstat; 668 s = pfa->initstat;
810 patlen = 0; 669 patlen = 0;
811 670
812 /* 671 /*
813 * All indices relative to buf. 672 * All indices relative to buf.
814 * i <= j <= k <= bufsize 673 * i <= j <= k <= bufsize
815 * 674 *
816 * i: origin of active substring 675 * i: origin of active substring
817 * j: current character 676 * j: current character
818 * k: destination of next getc() 677 * k: destination of next getc()
819 */ 678 */
820 i = -1, k = 0; 679 i = -1, k = 0;
821 do { 680 do {
822 j = i++; 681 j = i++;
823 do { 682 do {
824 if (++j == k) { 683 if (++j == k) {
825 if (k == bufsize) 684 if (k == bufsize)
826 if (!adjbuf((char **) &buf, &bufsize, bufsize+1, quantum, 0, "fnematch")) 685 if (!adjbuf((char **) &buf, &bufsize, bufsize+1, quantum, 0, "fnematch"))
827 FATAL("stream '%.30s...' too long", buf); 686 FATAL("stream '%.30s...' too long", buf);
828 buf[k++] = (c = getc(f)) != EOF ? c : 0; 687 buf[k++] = (c = getc(f)) != EOF ? c : 0;
829 } 688 }
830 c = (uschar)buf[j]; 689 c = (uschar)buf[j];
831 if (c < 128) 690 /* assert(c < NCHARS); */
832 rune = c; 
833 else { 
834 j--; 
835 k--; 
836 ungetc(c, f); 
837 rune = getrune(f, &buf, &bufsize, quantum, &j, &k); 
838 } 
839 691
840 if ((ns = get_gototab(pfa, s, rune)) != 0) 692 if ((ns = pfa->gototab[s][c]) != 0)
841 s = ns; 693 s = ns;
842 else 694 else
843 s = cgoto(pfa, s, rune); 695 s = cgoto(pfa, s, c);
844 696
845 if (pfa->out[s]) { /* final state */ 697 if (pfa->out[s]) { /* final state */
846 patlen = j - i + 1; 698 patlen = j - i + 1;
847 if (c == 0) /* don't count $ */ 699 if (c == 0) /* don't count $ */
848 patlen--; 700 patlen--;
849 } 701 }
850 } while (buf[j] && s != 1); 702 } while (buf[j] && s != 1);
851 s = 2; 703 s = 2;
852 } while (buf[i] && !patlen); 704 } while (buf[i] && !patlen);
853 705
854 /* adjbuf() may have relocated a resized buffer. Inform the world. */ 706 /* adjbuf() may have relocated a resized buffer. Inform the world. */
855 *pbuf = buf; 707 *pbuf = buf;
856 *pbufsize = bufsize; 708 *pbufsize = bufsize;
@@ -1157,51 +1009,43 @@ static int repeat(const uschar *reptok,  @@ -1157,51 +1009,43 @@ static int repeat(const uschar *reptok,
1157 return replace_repeat(reptok, reptoklen, atom, atomlen, 1009 return replace_repeat(reptok, reptoklen, atom, atomlen,
1158 firstnum, secondnum, REPEAT_SIMPLE); 1010 firstnum, secondnum, REPEAT_SIMPLE);
1159 } 1011 }
1160 } else if (firstnum < secondnum) { /* {n,m} -> repeat n-1 times then alternate */ 1012 } else if (firstnum < secondnum) { /* {n,m} -> repeat n-1 times then alternate */
1161 /* x{n,m} => xx...x{1, m-n+1} => xx...x?x?x?..x? */ 1013 /* x{n,m} => xx...x{1, m-n+1} => xx...x?x?x?..x? */
1162 return replace_repeat(reptok, reptoklen, atom, atomlen, 1014 return replace_repeat(reptok, reptoklen, atom, atomlen,
1163 firstnum, secondnum, REPEAT_WITH_Q); 1015 firstnum, secondnum, REPEAT_WITH_Q);
1164 } else { /* Error - shouldn't be here (n>m) */ 1016 } else { /* Error - shouldn't be here (n>m) */
1165 FATAL("internal error"); 1017 FATAL("internal error");
1166 } 1018 }
1167 return 0; 1019 return 0;
1168} 1020}
1169 1021
1170extern int u8_rune(int *, const uschar *); /* run.c; should be in header file */ 
1171 
1172int relex(void) /* lexical analyzer for reparse */ 1022int relex(void) /* lexical analyzer for reparse */
1173{ 1023{
1174 int c, n; 1024 int c, n;
1175 int cflag; 1025 int cflag;
1176 static uschar *buf = NULL; 1026 static uschar *buf = NULL;
1177 static int bufsz = 100; 1027 static int bufsz = 100;
1178 uschar *bp; 1028 uschar *bp;
1179 const struct charclass *cc; 1029 const struct charclass *cc;
1180 int i; 1030 int i;
1181 int num, m; 1031 int num, m;
1182 bool commafound, digitfound; 1032 bool commafound, digitfound;
1183 const uschar *startreptok; 1033 const uschar *startreptok;
1184 static int parens = 0; 1034 static int parens = 0;
1185 1035
1186rescan: 1036rescan:
1187 starttok = prestr; 1037 starttok = prestr;
1188 1038
1189 if ((n = u8_rune(&rlxval, prestr)) > 1) { 
1190 prestr += n; 
1191 starttok = prestr; 
1192 return CHAR; 
1193 } 
1194 
1195 switch (c = *prestr++) { 1039 switch (c = *prestr++) {
1196 case '|': return OR; 1040 case '|': return OR;
1197 case '*': return STAR; 1041 case '*': return STAR;
1198 case '+': return PLUS; 1042 case '+': return PLUS;
1199 case '?': return QUEST; 1043 case '?': return QUEST;
1200 case '.': return DOT; 1044 case '.': return DOT;
1201 case '\0': prestr--; return '\0'; 1045 case '\0': prestr--; return '\0';
1202 case '^': 1046 case '^':
1203 case '$': 1047 case '$':
1204 return c; 1048 return c;
1205 case '(': 1049 case '(':
1206 parens++; 1050 parens++;
1207 return c; 1051 return c;
@@ -1219,35 +1063,30 @@ rescan: @@ -1219,35 +1063,30 @@ rescan:
1219 default: 1063 default:
1220 rlxval = c; 1064 rlxval = c;
1221 return CHAR; 1065 return CHAR;
1222 case '[': 1066 case '[':
1223 if (buf == NULL && (buf = (uschar *) malloc(bufsz)) == NULL) 1067 if (buf == NULL && (buf = (uschar *) malloc(bufsz)) == NULL)
1224 FATAL("out of space in reg expr %.10s..", lastre); 1068 FATAL("out of space in reg expr %.10s..", lastre);
1225 bp = buf; 1069 bp = buf;
1226 if (*prestr == '^') { 1070 if (*prestr == '^') {
1227 cflag = 1; 1071 cflag = 1;
1228 prestr++; 1072 prestr++;
1229 } 1073 }
1230 else 1074 else
1231 cflag = 0; 1075 cflag = 0;
1232 n = 5 * strlen((const char *) prestr)+1; /* BUG: was 2. what value? */ 1076 n = 2 * strlen((const char *) prestr)+1;
1233 if (!adjbuf((char **) &buf, &bufsz, n, n, (char **) &bp, "relex1")) 1077 if (!adjbuf((char **) &buf, &bufsz, n, n, (char **) &bp, "relex1"))
1234 FATAL("out of space for reg expr %.10s...", lastre); 1078 FATAL("out of space for reg expr %.10s...", lastre);
1235 for (; ; ) { 1079 for (; ; ) {
1236 if ((n = u8_rune(&rlxval, prestr)) > 1) { 
1237 for (i = 0; i < n; i++) 
1238 *bp++ = *prestr++; 
1239 continue; 
1240 } 
1241 if ((c = *prestr++) == '\\') { 1080 if ((c = *prestr++) == '\\') {
1242 *bp++ = '\\'; 1081 *bp++ = '\\';
1243 if ((c = *prestr++) == '\0') 1082 if ((c = *prestr++) == '\0')
1244 FATAL("nonterminated character class %.20s...", lastre); 1083 FATAL("nonterminated character class %.20s...", lastre);
1245 *bp++ = c; 1084 *bp++ = c;
1246 /* } else if (c == '\n') { */ 1085 /* } else if (c == '\n') { */
1247 /* FATAL("newline in character class %.20s...", lastre); */ 1086 /* FATAL("newline in character class %.20s...", lastre); */
1248 } else if (c == '[' && *prestr == ':') { 1087 } else if (c == '[' && *prestr == ':') {
1249 /* POSIX char class names, Dag-Erling Smorgrav, des@ofug.org */ 1088 /* POSIX char class names, Dag-Erling Smorgrav, des@ofug.org */
1250 for (cc = charclasses; cc->cc_name; cc++) 1089 for (cc = charclasses; cc->cc_name; cc++)
1251 if (strncmp((const char *) prestr + 1, (const char *) cc->cc_name, cc->cc_namelen) == 0) 1090 if (strncmp((const char *) prestr + 1, (const char *) cc->cc_name, cc->cc_namelen) == 0)
1252 break; 1091 break;
1253 if (cc->cc_name != NULL && prestr[1 + cc->cc_namelen] == ':' && 1092 if (cc->cc_name != NULL && prestr[1 + cc->cc_namelen] == ':' &&
@@ -1394,44 +1233,44 @@ rescan: @@ -1394,44 +1233,44 @@ rescan:
1394 FATAL("illegal repetition expression: class %.20s", 1233 FATAL("illegal repetition expression: class %.20s",
1395 lastre); 1234 lastre);
1396 } 1235 }
1397 } 1236 }
1398 break; 1237 break;
1399 } 1238 }
1400} 1239}
1401 1240
1402int cgoto(fa *f, int s, int c) 1241int cgoto(fa *f, int s, int c)
1403{ 1242{
1404 int *p, *q; 1243 int *p, *q;
1405 int i, j, k; 1244 int i, j, k;
1406 1245
1407 /* assert(c == HAT || c < NCHARS); BUG: seg fault if disable test */ 1246 assert(c == HAT || c < NCHARS);
1408 while (f->accept >= maxsetvec) { /* guessing here! */ 1247 while (f->accept >= maxsetvec) { /* guessing here! */
1409 resizesetvec(__func__); 1248 resizesetvec(__func__);
1410 } 1249 }
1411 for (i = 0; i <= f->accept; i++) 1250 for (i = 0; i <= f->accept; i++)
1412 setvec[i] = 0; 1251 setvec[i] = 0;
1413 setcnt = 0; 1252 setcnt = 0;
1414 resize_state(f, s); 1253 resize_state(f, s);
1415 /* compute positions of gototab[s,c] into setvec */ 1254 /* compute positions of gototab[s,c] into setvec */
1416 p = f->posns[s]; 1255 p = f->posns[s];
1417 for (i = 1; i <= *p; i++) { 1256 for (i = 1; i <= *p; i++) {
1418 if ((k = f->re[p[i]].ltype) != FINAL) { 1257 if ((k = f->re[p[i]].ltype) != FINAL) {
1419 if ((k == CHAR && c == ptoi(f->re[p[i]].lval.np)) 1258 if ((k == CHAR && c == ptoi(f->re[p[i]].lval.np))
1420 || (k == DOT && c != 0 && c != HAT) 1259 || (k == DOT && c != 0 && c != HAT)
1421 || (k == ALL && c != 0) 1260 || (k == ALL && c != 0)
1422 || (k == EMPTYRE && c != 0) 1261 || (k == EMPTYRE && c != 0)
1423 || (k == CCL && member(c, (int *) f->re[p[i]].lval.rp)) 1262 || (k == CCL && member(c, (char *) f->re[p[i]].lval.up))
1424 || (k == NCCL && !member(c, (int *) f->re[p[i]].lval.rp) && c != 0 && c != HAT)) { 1263 || (k == NCCL && !member(c, (char *) f->re[p[i]].lval.up) && c != 0 && c != HAT)) {
1425 q = f->re[p[i]].lfollow; 1264 q = f->re[p[i]].lfollow;
1426 for (j = 1; j <= *q; j++) { 1265 for (j = 1; j <= *q; j++) {
1427 if (q[j] >= maxsetvec) { 1266 if (q[j] >= maxsetvec) {
1428 resizesetvec(__func__); 1267 resizesetvec(__func__);
1429 } 1268 }
1430 if (setvec[q[j]] == 0) { 1269 if (setvec[q[j]] == 0) {
1431 setcnt++; 1270 setcnt++;
1432 setvec[q[j]] = 1; 1271 setvec[q[j]] = 1;
1433 } 1272 }
1434 } 1273 }
1435 } 1274 }
1436 } 1275 }
1437 } 1276 }
@@ -1443,42 +1282,42 @@ int cgoto(fa *f, int s, int c) @@ -1443,42 +1282,42 @@ int cgoto(fa *f, int s, int c)
1443 tmpset[j++] = i; 1282 tmpset[j++] = i;
1444 } 1283 }
1445 resize_state(f, f->curstat > s ? f->curstat : s); 1284 resize_state(f, f->curstat > s ? f->curstat : s);
1446 /* tmpset == previous state? */ 1285 /* tmpset == previous state? */
1447 for (i = 1; i <= f->curstat; i++) { 1286 for (i = 1; i <= f->curstat; i++) {
1448 p = f->posns[i]; 1287 p = f->posns[i];
1449 if ((k = tmpset[0]) != p[0]) 1288 if ((k = tmpset[0]) != p[0])
1450 goto different; 1289 goto different;
1451 for (j = 1; j <= k; j++) 1290 for (j = 1; j <= k; j++)
1452 if (tmpset[j] != p[j]) 1291 if (tmpset[j] != p[j])
1453 goto different; 1292 goto different;
1454 /* setvec is state i */ 1293 /* setvec is state i */
1455 if (c != HAT) 1294 if (c != HAT)
1456 set_gototab(f, s, c, i); 1295 f->gototab[s][c] = i;
1457 return i; 1296 return i;
1458 different:; 1297 different:;
1459 } 1298 }
1460 1299
1461 /* add tmpset to current set of states */ 1300 /* add tmpset to current set of states */
1462 ++(f->curstat); 1301 ++(f->curstat);
1463 resize_state(f, f->curstat); 1302 resize_state(f, f->curstat);
1464 for (i = 0; i < NCHARS; i++) 1303 for (i = 0; i < NCHARS; i++)
1465 set_gototab(f, f->curstat, 0, 0); 1304 f->gototab[f->curstat][i] = 0;
1466 xfree(f->posns[f->curstat]); 1305 xfree(f->posns[f->curstat]);
1467 p = intalloc(setcnt + 1, __func__); 1306 p = intalloc(setcnt + 1, __func__);
1468 1307
1469 f->posns[f->curstat] = p; 1308 f->posns[f->curstat] = p;
1470 if (c != HAT) 1309 if (c != HAT)
1471 set_gototab(f, s, c, f->curstat); 1310 f->gototab[s][c] = f->curstat;
1472 for (i = 0; i <= setcnt; i++) 1311 for (i = 0; i <= setcnt; i++)
1473 p[i] = tmpset[i]; 1312 p[i] = tmpset[i];
1474 if (setvec[f->accept]) 1313 if (setvec[f->accept])
1475 f->out[f->curstat] = 1; 1314 f->out[f->curstat] = 1;
1476 else 1315 else
1477 f->out[f->curstat] = 0; 1316 f->out[f->curstat] = 0;
1478 return f->curstat; 1317 return f->curstat;
1479} 1318}
1480 1319
1481 1320
1482void freefa(fa *f) /* free a finite automaton */ 1321void freefa(fa *f) /* free a finite automaton */
1483{ 1322{
1484 int i; 1323 int i;
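
Note on the b.c changes above: the downgraded revision indexes each gototab row directly by byte value (f->gototab[s][*p]), while the reverted UTF-8 revision searched a row of (ch, state) pairs through get_gototab() and set_gototab(). What follows is a minimal standalone sketch of those two lookup strategies, not code taken from nawk. The type and function names (dense_row, sparse_row, dense_get, dense_set, sparse_get, sparse_set) and the value 256 for NCHARS are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdlib.h>

#define NCHARS 256	/* assumed row width: one slot per byte value */

/* Dense row: the next state is stored at the index of the input byte. */
typedef struct {
	unsigned int next[NCHARS];
} dense_row;

/* Sparse row: (ch, state) pairs, searched linearly; ch == 0 marks a free slot. */
typedef struct {
	int ch;
	int state;
} pair;

typedef struct {
	pair entries[NCHARS];
	int len;		/* number of usable slots in entries[] */
} sparse_row;

/* Constant-time lookup; 0 means "no transition cached yet". */
static int dense_get(const dense_row *r, unsigned char c)
{
	return (int) r->next[c];
}

static void dense_set(dense_row *r, unsigned char c, int state)
{
	r->next[c] = (unsigned int) state;
}

/* Linear scan; stops at the first free slot. Returns 0 if ch is absent. */
static int sparse_get(const sparse_row *r, int ch)
{
	int i;

	for (i = 0; i < r->len; i++) {
		if (r->entries[i].ch == 0)
			break;
		if (r->entries[i].ch == ch)
			return r->entries[i].state;
	}
	return 0;
}

/* Reuse the slot for ch if present, else take the first free one. */
/* Returns state on success, -1 if the row is full. */
static int sparse_set(sparse_row *r, int ch, int state)
{
	int i;

	for (i = 0; i < r->len; i++) {
		if (r->entries[i].ch == 0 || r->entries[i].ch == ch) {
			r->entries[i].ch = ch;
			r->entries[i].state = state;
			return state;
		}
	}
	return -1;
}
```

In this sketch the dense form spends a full row per state but gives constant-time lookup for byte input. The pair form can hold code points above 255 at the cost of a linear scan, and it cannot store code point 0 because that value marks a free slot.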

cvs diff -r1.5 -r1.6 pkgsrc/lang/nawk/files/run.c

--- pkgsrc/lang/nawk/files/run.c 2023/09/12 19:16:52 1.5
+++ pkgsrc/lang/nawk/files/run.c 2023/09/17 10:32:06 1.6
@@ -16,43 +16,42 @@ LUCENT DISCLAIMS ALL WARRANTIES WITH REG @@ -16,43 +16,42 @@ LUCENT DISCLAIMS ALL WARRANTIES WITH REG
16INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. 16INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
17IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY 17IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
18SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES 18SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
19WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER 19WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
20IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, 20IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
21ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF 21ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
22THIS SOFTWARE. 22THIS SOFTWARE.
23****************************************************************/ 23****************************************************************/
24 24
25#define DEBUG 25#define DEBUG
26#include <stdio.h> 26#include <stdio.h>
27#include <ctype.h> 27#include <ctype.h>
28#include <errno.h> 28#include <errno.h>
 29#include <wchar.h>
29#include <wctype.h> 30#include <wctype.h>
30#include <fcntl.h> 31#include <fcntl.h>
31#include <setjmp.h> 32#include <setjmp.h>
32#include <limits.h> 33#include <limits.h>
33#include <math.h> 34#include <math.h>
34#include <string.h> 35#include <string.h>
35#include <stdlib.h> 36#include <stdlib.h>
36#include <time.h> 37#include <time.h>
37#include <sys/types.h> 38#include <sys/types.h>
38#include <sys/wait.h> 39#include <sys/wait.h>
39#include "awk.h" 40#include "awk.h"
40#include "awkgram.tab.h" 41#include "awkgram.tab.h"
41 42
42 
43static void stdinit(void); 43static void stdinit(void);
44static void flush_all(void); 44static void flush_all(void);
45static char *wide_char_to_byte_str(int rune, size_t *outlen); 
46 45
47#if 1 46#if 1
48#define tempfree(x) do { if (istemp(x)) tfree(x); } while (/*CONSTCOND*/0) 47#define tempfree(x) do { if (istemp(x)) tfree(x); } while (/*CONSTCOND*/0)
49#else 48#else
50void tempfree(Cell *p) { 49void tempfree(Cell *p) {
51 if (p->ctype == OCELL && (p->csub < CUNK || p->csub > CFREE)) { 50 if (p->ctype == OCELL && (p->csub < CUNK || p->csub > CFREE)) {
52 WARNING("bad csub %d in Cell %d %s", 51 WARNING("bad csub %d in Cell %d %s",
53 p->csub, p->ctype, p->sval); 52 p->csub, p->ctype, p->sval);
54 } 53 }
55 if (istemp(p)) 54 if (istemp(p))
56 tfree(p); 55 tfree(p);
57} 56}
58#endif 57#endif
@@ -570,280 +569,54 @@ Cell *intest(Node **a, int n) /* a[0] is
570 ap->sval = (char *) makesymtab(NSYMTAB); 569 ap->sval = (char *) makesymtab(NSYMTAB);
571 } 570 }
572 buf = makearraystring(a[0], __func__); 571 buf = makearraystring(a[0], __func__);
573 k = lookup(buf, (Array *) ap->sval); 572 k = lookup(buf, (Array *) ap->sval);
574 tempfree(ap); 573 tempfree(ap);
575 free(buf); 574 free(buf);
576 if (k == NULL) 575 if (k == NULL)
577 return(False); 576 return(False);
578 else 577 else
579 return(True); 578 return(True);
580} 579}
581 580
582 581
583/* ======== utf-8 code ========== */ 
584 
585/* 
586 * Awk strings can contain ascii, random 8-bit items (eg Latin-1), 
587 * or utf-8. u8_isutf tests whether a string starts with a valid 
588 * utf-8 sequence, and returns 0 if not (e.g., high bit set). 
589 * u8_nextlen returns length of next valid sequence, which is 
590 * 1 for ascii, 2..4 for utf-8, or 1 for high bit non-utf. 
591 * u8_strlen returns length of string in valid utf-8 sequences 
592 * and/or high-bit bytes. Conversion functions go between byte 
593 * number and character number. 
594 * 
595 * In theory, this behaves the same as before for non-utf8 bytes. 
596 * 
597 * Limited checking! This is a potential security hole. 
598 */ 
599 
600/* is s the beginning of a valid utf-8 string? */ 
601/* return length 1..4 if yes, 0 if no */ 
602int u8_isutf(const char *s) 
603{ 
604 int n, ret; 
605 unsigned char c; 
606 
607 c = s[0]; 
608 if (c < 128) 
609 return 1; /* what if it's 0? */ 
610 
611 n = strlen(s); 
612 if (n >= 2 && ((c>>5) & 0x7) == 0x6 && (s[1] & 0xC0) == 0x80) { 
613 ret = 2; /* 110xxxxx 10xxxxxx */ 
614 } else if (n >= 3 && ((c>>4) & 0xF) == 0xE && (s[1] & 0xC0) == 0x80 
615 && (s[2] & 0xC0) == 0x80) { 
616 ret = 3; /* 1110xxxx 10xxxxxx 10xxxxxx */ 
617 } else if (n >= 4 && ((c>>3) & 0x1F) == 0x1E && (s[1] & 0xC0) == 0x80 
618 && (s[2] & 0xC0) == 0x80 && (s[3] & 0xC0) == 0x80) { 
619 ret = 4; /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */ 
620 } else { 
621 ret = 0; 
622 } 
623 return ret; 
624} 
625 
626/* Convert (prefix of) utf8 string to utf-32 rune. */ 
627/* Sets *rune to the value, returns the length. */ 
628/* No error checking: watch out. */ 
629int u8_rune(int *rune, const char *s) 
630{ 
631 int n, ret; 
632 unsigned char c; 
633 
634 c = s[0]; 
635 if (c < 128) { 
636 *rune = c; 
637 return 1; 
638 } 
639 
640 n = strlen(s); 
641 if (n >= 2 && ((c>>5) & 0x7) == 0x6 && (s[1] & 0xC0) == 0x80) { 
642 *rune = ((c & 0x1F) << 6) | (s[1] & 0x3F); /* 110xxxxx 10xxxxxx */ 
643 ret = 2; 
644 } else if (n >= 3 && ((c>>4) & 0xF) == 0xE && (s[1] & 0xC0) == 0x80 
645 && (s[2] & 0xC0) == 0x80) { 
646 *rune = ((c & 0xF) << 12) | ((s[1] & 0x3F) << 6) | (s[2] & 0x3F); 
647 /* 1110xxxx 10xxxxxx 10xxxxxx */ 
648 ret = 3; 
649 } else if (n >= 4 && ((c>>3) & 0x1F) == 0x1E && (s[1] & 0xC0) == 0x80 
650 && (s[2] & 0xC0) == 0x80 && (s[3] & 0xC0) == 0x80) { 
651 *rune = ((c & 0x7) << 18) | ((s[1] & 0x3F) << 12) | ((s[2] & 0x3F) << 6) | (s[3] & 0x3F); 
652 /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */ 
653 ret = 4; 
654 } else { 
655 *rune = c; 
656 ret = 1; 
657 } 
658 return ret; /* returns one byte if sequence doesn't look like utf */ 
659} 
660 
661/* return length of next sequence: 1 for ascii or random, 2..4 for valid utf8 */ 
662int u8_nextlen(const char *s) 
663{ 
664 int len; 
665 
666 len = u8_isutf(s); 
667 if (len == 0) 
668 len = 1; 
669 return len; 
670} 
671 
672/* return number of utf characters or single non-utf bytes */ 
673int u8_strlen(const char *s) 
674{ 
675 int i, len, n, totlen; 
676 unsigned char c; 
677 
678 n = strlen(s); 
679 totlen = 0; 
680 for (i = 0; i < n; i += len) { 
681 c = s[i]; 
682 if (c < 128) { 
683 len = 1; 
684 } else { 
685 len = u8_nextlen(&s[i]); 
686 } 
687 totlen++; 
688 if (i > n) 
689 FATAL("bad utf count [%s] n=%d i=%d\n", s, n, i); 
690 } 
691 return totlen; 
692} 
693 
694/* convert utf-8 char number in a string to its byte offset */ 
695int u8_char2byte(const char *s, int charnum) 
696{ 
697 int n; 
698 int bytenum = 0; 
699 
700 while (charnum > 0) { 
701 n = u8_nextlen(s); 
702 s += n; 
703 bytenum += n; 
704 charnum--; 
705 } 
706 return bytenum; 
707} 
708 
709/* convert byte offset in s to utf-8 char number that starts there */ 
710int u8_byte2char(const char *s, int bytenum) 
711{ 
712 int i, len, b; 
713 int charnum = 0; /* BUG: what origin? */ 
714 /* should be 0 to match start==0 which means no match */  
715 
716 b = strlen(s); 
717 if (bytenum >= b) { 
718 return -1; /* ??? */ 
719 } 
720 for (i = 0; i <= bytenum; i += len) { 
721 len = u8_nextlen(s+i); 
722 charnum++; 
723 } 
724 return charnum; 
725} 
726 
727/* runetochar() adapted from rune.c in the Plan 9 distributione */ 
728 
729enum 
730{ 
731 Runeerror = 128, /* from somewhere else */ 
732 Runemax = 0x10FFFF, 
733 
734 Bit1 = 7, 
735 Bitx = 6, 
736 Bit2 = 5, 
737 Bit3 = 4, 
738 Bit4 = 3, 
739 Bit5 = 2, 
740 
741 T1 = ((1<<(Bit1+1))-1) ^ 0xFF, /* 0000 0000 */ 
742 Tx = ((1<<(Bitx+1))-1) ^ 0xFF, /* 1000 0000 */ 
743 T2 = ((1<<(Bit2+1))-1) ^ 0xFF, /* 1100 0000 */ 
744 T3 = ((1<<(Bit3+1))-1) ^ 0xFF, /* 1110 0000 */ 
745 T4 = ((1<<(Bit4+1))-1) ^ 0xFF, /* 1111 0000 */ 
746 T5 = ((1<<(Bit5+1))-1) ^ 0xFF, /* 1111 1000 */ 
747 
748 Rune1 = (1<<(Bit1+0*Bitx))-1, /* 0000 0000 0000 0000 0111 1111 */ 
749 Rune2 = (1<<(Bit2+1*Bitx))-1, /* 0000 0000 0000 0111 1111 1111 */ 
750 Rune3 = (1<<(Bit3+2*Bitx))-1, /* 0000 0000 1111 1111 1111 1111 */ 
751 Rune4 = (1<<(Bit4+3*Bitx))-1, /* 0011 1111 1111 1111 1111 1111 */ 
752 
753 Maskx = (1<<Bitx)-1, /* 0011 1111 */ 
754 Testx = Maskx ^ 0xFF, /* 1100 0000 */ 
755 
756}; 
757 
758int runetochar(char *str, int c) 
759{  
760 /* one character sequence 00000-0007F => 00-7F */  
761 if (c <= Rune1) { 
762 str[0] = c; 
763 return 1; 
764 } 
765  
766 /* two character sequence 00080-007FF => T2 Tx */ 
767 if (c <= Rune2) { 
768 str[0] = T2 | (c >> 1*Bitx); 
769 str[1] = Tx | (c & Maskx); 
770 return 2; 
771 } 
772 
773 /* three character sequence 00800-0FFFF => T3 Tx Tx */ 
774 if (c > Runemax) 
775 c = Runeerror; 
776 if (c <= Rune3) { 
777 str[0] = T3 | (c >> 2*Bitx); 
778 str[1] = Tx | ((c >> 1*Bitx) & Maskx); 
779 str[2] = Tx | (c & Maskx); 
780 return 3; 
781 } 
782  
783 /* four character sequence 010000-1FFFFF => T4 Tx Tx Tx */ 
784 str[0] = T4 | (c >> 3*Bitx); 
785 str[1] = Tx | ((c >> 2*Bitx) & Maskx); 
786 str[2] = Tx | ((c >> 1*Bitx) & Maskx); 
787 str[3] = Tx | (c & Maskx); 
788 return 4; 
789}  
790 
791 
792/* ========== end of utf8 code =========== */ 
793 
794 
795 
796Cell *matchop(Node **a, int n) /* ~ and match() */ 582Cell *matchop(Node **a, int n) /* ~ and match() */
797{ 583{
798 Cell *x, *y; 584 Cell *x, *y;
799 char *s, *t; 585 char *s, *t;
800 int i; 586 int i;
801 int cstart, cpatlen, len; 
802 fa *pfa; 587 fa *pfa;
803 int (*mf)(fa *, const char *) = match, mode = 0; 588 int (*mf)(fa *, const char *) = match, mode = 0;
804 589
805 if (n == MATCHFCN) { 590 if (n == MATCHFCN) {
806 mf = pmatch; 591 mf = pmatch;
807 mode = 1; 592 mode = 1;
808 } 593 }
809 x = execute(a[1]); /* a[1] = target text */ 594 x = execute(a[1]); /* a[1] = target text */
810 s = getsval(x); 595 s = getsval(x);
811 if (a[0] == NULL) /* a[1] == 0: already-compiled reg expr */ 596 if (a[0] == NULL) /* a[1] == 0: already-compiled reg expr */
812 i = (*mf)((fa *) a[2], s); 597 i = (*mf)((fa *) a[2], s);
813 else { 598 else {
814 y = execute(a[2]); /* a[2] = regular expr */ 599 y = execute(a[2]); /* a[2] = regular expr */
815 t = getsval(y); 600 t = getsval(y);
816 pfa = makedfa(t, mode); 601 pfa = makedfa(t, mode);
817 i = (*mf)(pfa, s); 602 i = (*mf)(pfa, s);
818 tempfree(y); 603 tempfree(y);
819 } 604 }
820 tempfree(x); 605 tempfree(x);
821 if (n == MATCHFCN) { 606 if (n == MATCHFCN) {
822 int start = patbeg - s + 1; /* origin 1 */ 607 int start = patbeg - s + 1;
823 if (patlen < 0) { 608 if (patlen < 0)
824 start = 0; /* not found */ 609 start = 0;
825 } else { 
826 cstart = u8_byte2char(s, start-1); 
827 cpatlen = 0; 
828 for (i = 0; i < patlen; i += len) { 
829 len = u8_nextlen(patbeg+i); 
830 cpatlen++; 
831 } 
832 
833 start = cstart; 
834 patlen = cpatlen; 
835 } 
836 
837 setfval(rstartloc, (Awkfloat) start); 610 setfval(rstartloc, (Awkfloat) start);
838 setfval(rlengthloc, (Awkfloat) patlen); 611 setfval(rlengthloc, (Awkfloat) patlen);
839 x = gettemp(); 612 x = gettemp();
840 x->tval = NUM; 613 x->tval = NUM;
841 x->fval = start; 614 x->fval = start;
842 return x; 615 return x;
843 } else if ((n == MATCH && i == 1) || (n == NOTMATCH && i == 0)) 616 } else if ((n == MATCH && i == 1) || (n == NOTMATCH && i == 0))
844 return(True); 617 return(True);
845 else 618 else
846 return(False); 619 return(False);
847} 620}
848 621
849 622
@@ -874,49 +647,43 @@ Cell *boolop(Node **a, int n) /* a[0] ||
874 if (i) return(False); 647 if (i) return(False);
875 else return(True); 648 else return(True);
876 default: /* can't happen */ 649 default: /* can't happen */
877 FATAL("unknown boolean operator %d", n); 650 FATAL("unknown boolean operator %d", n);
878 } 651 }
879 return 0; /*NOTREACHED*/ 652 return 0; /*NOTREACHED*/
880} 653}
881 654
882Cell *relop(Node **a, int n) /* a[0 < a[1], etc. */ 655Cell *relop(Node **a, int n) /* a[0 < a[1], etc. */
883{ 656{
884 int i; 657 int i;
885 Cell *x, *y; 658 Cell *x, *y;
886 Awkfloat j; 659 Awkfloat j;
887 bool x_is_nan, y_is_nan; 
888 660
889 x = execute(a[0]); 661 x = execute(a[0]);
890 y = execute(a[1]); 662 y = execute(a[1]);
891 x_is_nan = isnan(x->fval); 
892 y_is_nan = isnan(y->fval); 
893 if (x->tval&NUM && y->tval&NUM) { 663 if (x->tval&NUM && y->tval&NUM) {
894 if ((x_is_nan || y_is_nan) && n != NE) 
895 return(False); 
896 j = x->fval - y->fval; 664 j = x->fval - y->fval;
897 i = j<0? -1: (j>0? 1: 0); 665 i = j<0? -1: (j>0? 1: 0);
898 } else { 666 } else {
899 i = strcmp(getsval(x), getsval(y)); 667 i = strcmp(getsval(x), getsval(y));
900 } 668 }
901 tempfree(x); 669 tempfree(x);
902 tempfree(y); 670 tempfree(y);
903 switch (n) { 671 switch (n) {
904 case LT: if (i<0) return(True); 672 case LT: if (i<0) return(True);
905 else return(False); 673 else return(False);
906 case LE: if (i<=0) return(True); 674 case LE: if (i<=0) return(True);
907 else return(False); 675 else return(False);
908 case NE: if (x_is_nan && y_is_nan) return(True); 676 case NE: if (i!=0) return(True);
909 else if (i!=0) return(True); 
910 else return(False); 677 else return(False);
911 case EQ: if (i == 0) return(True); 678 case EQ: if (i == 0) return(True);
912 else return(False); 679 else return(False);
913 case GE: if (i>=0) return(True); 680 case GE: if (i>=0) return(True);
914 else return(False); 681 else return(False);
915 case GT: if (i>0) return(True); 682 case GT: if (i>0) return(True);
916 else return(False); 683 else return(False);
917 default: /* can't happen */ 684 default: /* can't happen */
918 FATAL("unknown relational operator %d", n); 685 FATAL("unknown relational operator %d", n);
919 } 686 }
920 return 0; /*NOTREACHED*/ 687 return 0; /*NOTREACHED*/
921} 688}
922 689
@@ -965,27 +732,26 @@ Cell *indirect(Node **a, int n) /* $( a[
965 if (m == 0 && !is_number(s = getsval(x), NULL)) /* suspicion! */ 732 if (m == 0 && !is_number(s = getsval(x), NULL)) /* suspicion! */
966 FATAL("illegal field $(%s), name \"%s\"", s, x->nval); 733 FATAL("illegal field $(%s), name \"%s\"", s, x->nval);
967 /* BUG: can x->nval ever be null??? */ 734 /* BUG: can x->nval ever be null??? */
968 tempfree(x); 735 tempfree(x);
969 x = fieldadr(m); 736 x = fieldadr(m);
970 x->ctype = OCELL; /* BUG? why are these needed? */ 737 x->ctype = OCELL; /* BUG? why are these needed? */
971 x->csub = CFLD; 738 x->csub = CFLD;
972 return(x); 739 return(x);
973} 740}
974 741
975Cell *substr(Node **a, int nnn) /* substr(a[0], a[1], a[2]) */ 742Cell *substr(Node **a, int nnn) /* substr(a[0], a[1], a[2]) */
976{ 743{
977 int k, m, n; 744 int k, m, n;
978 int mb, nb; 
979 char *s; 745 char *s;
980 int temp; 746 int temp;
981 Cell *x, *y, *z = NULL; 747 Cell *x, *y, *z = NULL;
982 748
983 x = execute(a[0]); 749 x = execute(a[0]);
984 y = execute(a[1]); 750 y = execute(a[1]);
985 if (a[2] != NULL) 751 if (a[2] != NULL)
986 z = execute(a[2]); 752 z = execute(a[2]);
987 s = getsval(x); 753 s = getsval(x);
988 k = strlen(s) + 1; 754 k = strlen(s) + 1;
989 if (k <= 1) { 755 if (k <= 1) {
990 tempfree(x); 756 tempfree(x);
991 tempfree(y); 757 tempfree(y);
@@ -1001,86 +767,62 @@ Cell *substr(Node **a, int nnn) /* subs
1001 m = 1; 767 m = 1;
1002 else if (m > k) 768 else if (m > k)
1003 m = k; 769 m = k;
1004 tempfree(y); 770 tempfree(y);
1005 if (a[2] != NULL) { 771 if (a[2] != NULL) {
1006 n = (int) getfval(z); 772 n = (int) getfval(z);
1007 tempfree(z); 773 tempfree(z);
1008 } else 774 } else
1009 n = k - 1; 775 n = k - 1;
1010 if (n < 0) 776 if (n < 0)
1011 n = 0; 777 n = 0;
1012 else if (n > k - m) 778 else if (n > k - m)
1013 n = k - m; 779 n = k - m;
1014 /* m is start, n is length from there */ 
1015 DPRINTF("substr: m=%d, n=%d, s=%s\n", m, n, s); 780 DPRINTF("substr: m=%d, n=%d, s=%s\n", m, n, s);
1016 y = gettemp(); 781 y = gettemp();
1017 mb = u8_char2byte(s, m-1); /* byte offset of start char in s */ 782 temp = s[n+m-1]; /* with thanks to John Linderman */
1018 nb = u8_char2byte(s, m-1+n); /* byte offset of end+1 char in s */ 783 s[n+m-1] = '\0';
1019 784 setsval(y, s + m - 1);
1020 temp = s[nb]; /* with thanks to John Linderman */ 785 s[n+m-1] = temp;
1021 s[nb] = '\0'; 
1022 setsval(y, s + mb); 
1023 s[nb] = temp; 
1024 tempfree(x); 786 tempfree(x);
1025 return(y); 787 return(y);
1026} 788}
1027 789
1028Cell *sindex(Node **a, int nnn) /* index(a[0], a[1]) */ 790Cell *sindex(Node **a, int nnn) /* index(a[0], a[1]) */
1029{ 791{
1030 Cell *x, *y, *z; 792 Cell *x, *y, *z;
1031 char *s1, *s2, *p1, *p2, *q; 793 char *s1, *s2, *p1, *p2, *q;
1032 Awkfloat v = 0.0; 794 Awkfloat v = 0.0;
1033 795
1034 x = execute(a[0]); 796 x = execute(a[0]);
1035 s1 = getsval(x); 797 s1 = getsval(x);
1036 y = execute(a[1]); 798 y = execute(a[1]);
1037 s2 = getsval(y); 799 s2 = getsval(y);
1038 800
1039 z = gettemp(); 801 z = gettemp();
1040 for (p1 = s1; *p1 != '\0'; p1++) { 802 for (p1 = s1; *p1 != '\0'; p1++) {
1041 for (q = p1, p2 = s2; *p2 != '\0' && *q == *p2; q++, p2++) 803 for (q = p1, p2 = s2; *p2 != '\0' && *q == *p2; q++, p2++)
1042 continue; 804 continue;
1043 if (*p2 == '\0') { 805 if (*p2 == '\0') {
1044 /* v = (Awkfloat) (p1 - s1 + 1); origin 1 */ 806 v = (Awkfloat) (p1 - s1 + 1); /* origin 1 */
1045 
1046 /* should be a function: used in match() as well */ 
1047 int i, len; 
1048 v = 0; 
1049 for (i = 0; i < p1-s1+1; i += len) { 
1050 len = u8_nextlen(s1+i); 
1051 v++; 
1052 } 
1053 break; 807 break;
1054 } 808 }
1055 } 809 }
1056 tempfree(x); 810 tempfree(x);
1057 tempfree(y); 811 tempfree(y);
1058 setfval(z, v); 812 setfval(z, v);
1059 return(z); 813 return(z);
1060} 814}
1061 815
1062int has_utf8(char *s) /* return 1 if s contains any utf-8 (2 bytes or more) character */ 
1063{ 
1064 int n; 
1065 
1066 for (n = 0; *s != 0; s += n) { 
1067 n = u8_nextlen(s); 
1068 if (n > 1) 
1069 return 1; 
1070 } 
1071 return 0; 
1072} 
1073 
1074#define MAXNUMSIZE 50 816#define MAXNUMSIZE 50
1075 817
1076int format(char **pbuf, int *pbufsize, const char *s, Node *a) /* printf-like conversions */ 818int format(char **pbuf, int *pbufsize, const char *s, Node *a) /* printf-like conversions */
1077{ 819{
1078 char *fmt; 820 char *fmt;
1079 char *p, *t; 821 char *p, *t;
1080 const char *os; 822 const char *os;
1081 Cell *x; 823 Cell *x;
1082 int flag = 0, n; 824 int flag = 0, n;
1083 int fmtwd; /* format width */ 825 int fmtwd; /* format width */
1084 int fmtsz = recsize; 826 int fmtsz = recsize;
1085 char *buf = *pbuf; 827 char *buf = *pbuf;
1086 int bufsize = *pbufsize; 828 int bufsize = *pbufsize;
@@ -1103,26 +845,27 @@ int format(char **pbuf, int *pbufsize, c
1103 if ((fmt = (char *) malloc(fmtsz)) == NULL) 845 if ((fmt = (char *) malloc(fmtsz)) == NULL)
1104 FATAL("out of memory in format()"); 846 FATAL("out of memory in format()");
1105 while (*s) { 847 while (*s) {
1106 adjbuf(&buf, &bufsize, MAXNUMSIZE+1+p-buf, recsize, &p, "format1"); 848 adjbuf(&buf, &bufsize, MAXNUMSIZE+1+p-buf, recsize, &p, "format1");
1107 if (*s != '%') { 849 if (*s != '%') {
1108 *p++ = *s++; 850 *p++ = *s++;
1109 continue; 851 continue;
1110 } 852 }
1111 if (*(s+1) == '%') { 853 if (*(s+1) == '%') {
1112 *p++ = '%'; 854 *p++ = '%';
1113 s += 2; 855 s += 2;
1114 continue; 856 continue;
1115 } 857 }
 858 /* have to be real careful in case this is a huge number, eg, %100000d */
1116 fmtwd = atoi(s+1); 859 fmtwd = atoi(s+1);
1117 if (fmtwd < 0) 860 if (fmtwd < 0)
1118 fmtwd = -fmtwd; 861 fmtwd = -fmtwd;
1119 adjbuf(&buf, &bufsize, fmtwd+1+p-buf, recsize, &p, "format2"); 862 adjbuf(&buf, &bufsize, fmtwd+1+p-buf, recsize, &p, "format2");
1120 for (t = fmt; (*t++ = *s) != '\0'; s++) { 863 for (t = fmt; (*t++ = *s) != '\0'; s++) {
1121 if (!adjbuf(&fmt, &fmtsz, MAXNUMSIZE+1+t-fmt, recsize, &t, "format3")) 864 if (!adjbuf(&fmt, &fmtsz, MAXNUMSIZE+1+t-fmt, recsize, &t, "format3"))
1122 FATAL("format item %.30s... ran format() out of memory", os); 865 FATAL("format item %.30s... ran format() out of memory", os);
1123 /* Ignore size specifiers */ 866 /* Ignore size specifiers */
1124 if (strchr("hjLlqtz", *s) != NULL) { /* the ansi panoply */ 867 if (strchr("hjLlqtz", *s) != NULL) { /* the ansi panoply */
1125 t--; 868 t--;
1126 continue; 869 continue;
1127 } 870 }
1128 if (isalpha((uschar)*s)) 871 if (isalpha((uschar)*s))
@@ -1175,211 +918,63 @@ int format(char **pbuf, int *pbufsize, c
1175 WARNING("weird printf conversion %s", fmt); 918 WARNING("weird printf conversion %s", fmt);
1176 flag = '?'; 919 flag = '?';
1177 break; 920 break;
1178 } 921 }
1179 if (a == NULL) 922 if (a == NULL)
1180 FATAL("not enough args in printf(%s)", os); 923 FATAL("not enough args in printf(%s)", os);
1181 x = execute(a); 924 x = execute(a);
1182 a = a->nnext; 925 a = a->nnext;
1183 n = MAXNUMSIZE; 926 n = MAXNUMSIZE;
1184 if (fmtwd > n) 927 if (fmtwd > n)
1185 n = fmtwd; 928 n = fmtwd;
1186 adjbuf(&buf, &bufsize, 1+n+p-buf, recsize, &p, "format5"); 929 adjbuf(&buf, &bufsize, 1+n+p-buf, recsize, &p, "format5");
1187 switch (flag) { 930 switch (flag) {
1188 case '?': 931 case '?': snprintf(p, BUFSZ(p), "%s", fmt); /* unknown, so dump it too */
1189 snprintf(p, BUFSZ(p), "%s", fmt); /* unknown, so dump it too */ 
1190 t = getsval(x); 932 t = getsval(x);
1191 n = strlen(t); 933 n = strlen(t);
1192 if (fmtwd > n) 934 if (fmtwd > n)
1193 n = fmtwd; 935 n = fmtwd;
1194 adjbuf(&buf, &bufsize, 1+strlen(p)+n+p-buf, recsize, &p, "format6"); 936 adjbuf(&buf, &bufsize, 1+strlen(p)+n+p-buf, recsize, &p, "format6");
1195 p += strlen(p); 937 p += strlen(p);
1196 snprintf(p, BUFSZ(p), "%s", t); 938 snprintf(p, BUFSZ(p), "%s", t);
1197 break; 939 break;
1198 case 'a': 940 case 'a':
1199 case 'A': 941 case 'A':
1200 case 'f': snprintf(p, BUFSZ(p), fmt, getfval(x)); break; 942 case 'f': snprintf(p, BUFSZ(p), fmt, getfval(x)); break;
1201 case 'd': snprintf(p, BUFSZ(p), fmt, (intmax_t) getfval(x)); break; 943 case 'd': snprintf(p, BUFSZ(p), fmt, (intmax_t) getfval(x)); break;
1202 case 'u': snprintf(p, BUFSZ(p), fmt, (uintmax_t) getfval(x)); break; 944 case 'u': snprintf(p, BUFSZ(p), fmt, (uintmax_t) getfval(x)); break;
1203 945 case 's':
1204 case 's': { 
1205 t = getsval(x); 946 t = getsval(x);
1206 n = strlen(t); 947 n = strlen(t);
1207 /* if simple format or no utf-8 in the string, sprintf works */ 948 if (fmtwd > n)
1208 if (!has_utf8(t) || strcmp(fmt,"%s") == 0) { 949 n = fmtwd;
1209 if (fmtwd > n) 950 if (!adjbuf(&buf, &bufsize, 1+n+p-buf, recsize, &p, "format7"))
1210 n = fmtwd; 951 FATAL("huge string/format (%d chars) in printf %.30s... ran format() out of memory", n, t);
1211 if (!adjbuf(&buf, &bufsize, 1+n+p-buf, recsize, &p, "format7")) 952 snprintf(p, BUFSZ(p), fmt, t);
1212 FATAL("huge string/format (%d chars) in printf %.30s..." \ 
1213 " ran format() out of memory", n, t); 
1214 snprintf(p, BUFSZ(p), fmt, t); 
1215 break; 
1216 } 
1217 
1218 /* get here if string has utf-8 chars and fmt is not plain %s */ 
1219 /* "%-w.ps", where -, w and .p are all optional */ 
1220 /* '0' before the w is a flag character */ 
1221 /* fmt points at % */ 
1222 int ljust = 0, wid = 0, prec = n, pad = 0; 
1223 char *f = fmt+1; 
1224 if (f[0] == '-') { 
1225 ljust = 1; 
1226 f++; 
1227 } 
1228 // flags '0' and '+' are recognized but skipped 
1229 if (f[0] == '0') { 
1230 f++; 
1231 if (f[0] == '+') 
1232 f++; 
1233 } 
1234 if (f[0] == '+') { 
1235 f++; 
1236 if (f[0] == '0') 
1237 f++; 
1238 } 
1239 if (isdigit(f[0])) { /* there is a wid */ 
1240 wid = strtol(f, &f, 10); 
1241 } 
1242 if (f[0] == '.') { /* there is a .prec */ 
1243 prec = strtol(++f, &f, 10); 
1244 } 
1245 if (prec > u8_strlen(t)) 
1246 prec = u8_strlen(t); 
1247 pad = wid>prec ? wid - prec : 0; // has to be >= 0 
1248 int i, k, n; 
1249  
1250 if (ljust) { // print prec chars from t, then pad blanks 
1251 n = u8_char2byte(t, prec); 
1252 for (k = 0; k < n; k++) { 
1253 //putchar(t[k]); 
1254 *p++ = t[k]; 
1255 } 
1256 for (i = 0; i < pad; i++) { 
1257 //printf(" "); 
1258 *p++ = ' '; 
1259 } 
1260 } else { // print pad blanks, then prec chars from t 
1261 for (i = 0; i < pad; i++) { 
1262 //printf(" "); 
1263 *p++ = ' '; 
1264 } 
1265 n = u8_char2byte(t, prec); 
1266 for (k = 0; k < n; k++) { 
1267 //putchar(t[k]); 
1268 *p++ = t[k]; 
1269 } 
1270 } 
1271 *p = 0; 
1272 break; 953 break;
1273 } 954 case 'c':
1274 
1275 case 'c': { 
1276 /* 
1277 * If a numeric value is given, awk should just turn 
1278 * it into a character and print it: 
1279 * BEGIN { printf("%c\n", 65) } 
1280 * prints "A". 
1281 * 
1282 * But what if the numeric value is > 128 and 
1283 * represents a valid Unicode code point?!? We do 
1284 * our best to convert it back into UTF-8. If we 
1285 * can't, we output the encoding of the Unicode 
1286 * "invalid character", 0xFFFD. 
1287 */ 
1288 if (isnum(x)) { 955 if (isnum(x)) {
1289 int charval = (int) getfval(x); 956 if ((int)getfval(x))
1290 957 snprintf(p, BUFSZ(p), fmt, (int) getfval(x));
1291 if (charval != 0) { 958 else {
1292 if (charval < 128) 
1293 snprintf(p, BUFSZ(p), fmt, charval); 
1294 else { 
1295 // possible unicode character 
1296 size_t count; 
1297 char *bs = wide_char_to_byte_str(charval, &count); 
1298 
1299 if (bs == NULL) { // invalid character 
1300 // use unicode invalid character, 0xFFFD 
1301 bs = "\357\277\275"; 
1302 count = 3; 
1303 } 
1304 t = bs; 
1305 n = count; 
1306 goto format_percent_c; 
1307 } 
1308 } else { 
1309 *p++ = '\0'; /* explicit null byte */ 959 *p++ = '\0'; /* explicit null byte */
1310 *p = '\0'; /* next output will start here */ 960 *p = '\0'; /* next output will start here */
1311 } 961 }
1312 break; 962 } else
1313 } 
1314 t = getsval(x); 
1315 n = u8_nextlen(t); 
1316 format_percent_c: 
1317 if (n < 2) { /* not utf8 */ 
1318 snprintf(p, BUFSZ(p), fmt, getsval(x)[0]); 963 snprintf(p, BUFSZ(p), fmt, getsval(x)[0]);
1319 break; 
1320 } 
1321 
1322 // utf8 character, almost same song and dance as for %s 
1323 int ljust = 0, wid = 0, prec = n, pad = 0; 
1324 char *f = fmt+1; 
1325 if (f[0] == '-') { 
1326 ljust = 1; 
1327 f++; 
1328 } 
1329 // flags '0' and '+' are recognized but skipped 
1330 if (f[0] == '0') { 
1331 f++; 
1332 if (f[0] == '+') 
1333 f++; 
1334 } 
1335 if (f[0] == '+') { 
1336 f++; 
1337 if (f[0] == '0') 
1338 f++; 
1339 } 
1340 if (isdigit(f[0])) { /* there is a wid */ 
1341 wid = strtol(f, &f, 10); 
1342 } 
1343 if (f[0] == '.') { /* there is a .prec */ 
1344 prec = strtol(++f, &f, 10); 
1345 } 
1346 if (prec > 1) // %c --> only one character 
1347 prec = 1; 
1348 pad = wid>prec ? wid - prec : 0; // has to be >= 0 
1349 int i; 
1350 
1351 if (ljust) { // print one char from t, then pad blanks 
1352 for (int i = 0; i < n; i++) 
1353 *p++ = t[i]; 
1354 for (i = 0; i < pad; i++) { 
1355 //printf(" "); 
1356 *p++ = ' '; 
1357 } 
1358 } else { // print pad blanks, then prec chars from t 
1359 for (i = 0; i < pad; i++) { 
1360 //printf(" "); 
1361 *p++ = ' '; 
1362 } 
1363 for (int i = 0; i < n; i++) 
1364 *p++ = t[i]; 
1365 } 
1366 *p = 0; 
1367 break; 964 break;
1368 } 
1369 default: 965 default:
1370 FATAL("can't happen: bad conversion %c in format()", flag); 966 FATAL("can't happen: bad conversion %c in format()", flag);
1371 } 967 }
1372 
1373 tempfree(x); 968 tempfree(x);
1374 p += strlen(p); 969 p += strlen(p);
1375 s++; 970 s++;
1376 } 971 }
1377 *p = '\0'; 972 *p = '\0';
1378 free(fmt); 973 free(fmt);
1379 for ( ; a; a = a->nnext) { /* evaluate any remaining args */ 974 for ( ; a; a = a->nnext) { /* evaluate any remaining args */
1380 x = execute(a); 975 x = execute(a);
1381 tempfree(x); 976 tempfree(x);
1382 } 977 }
1383 *pbuf = buf; 978 *pbuf = buf;
1384 *pbufsize = bufsize; 979 *pbufsize = bufsize;
1385 return p - buf; 980 return p - buf;
@@ -1660,47 +1255,44 @@ Cell *dopa2(Node **a, int n) /* a[0], a[
1660 } 1255 }
1661 return(False); 1256 return(False);
1662} 1257}
1663 1258
1664Cell *split(Node **a, int nnn) /* split(a[0], a[1], a[2]); a[3] is type */ 1259Cell *split(Node **a, int nnn) /* split(a[0], a[1], a[2]); a[3] is type */
1665{ 1260{
1666 Cell *x = NULL, *y, *ap; 1261 Cell *x = NULL, *y, *ap;
1667 const char *s, *origs, *t; 1262 const char *s, *origs, *t;
1668 const char *fs = NULL; 1263 const char *fs = NULL;
1669 char *origfs = NULL; 1264 char *origfs = NULL;
1670 int sep; 1265 int sep;
1671 char temp, num[50]; 1266 char temp, num[50];
1672 int n, tempstat, arg3type; 1267 int n, tempstat, arg3type;
1673 int j; 
1674 double result; 1268 double result;
1675 1269
1676 y = execute(a[0]); /* source string */ 1270 y = execute(a[0]); /* source string */
1677 origs = s = strdup(getsval(y)); 1271 origs = s = strdup(getsval(y));
1678 tempfree(y); 1272 tempfree(y);
1679 arg3type = ptoi(a[3]); 1273 arg3type = ptoi(a[3]);
1680 if (a[2] == NULL) { /* BUG: CSV should override implicit fs but not explicit */ 1274 if (a[2] == NULL) /* fs string */
1681 fs = getsval(fsloc); 1275 fs = getsval(fsloc);
1682 } else if (arg3type == STRING) { /* split(str,arr,"string") */ 1276 else if (arg3type == STRING) { /* split(str,arr,"string") */
1683 x = execute(a[2]); 1277 x = execute(a[2]);
1684 fs = origfs = strdup(getsval(x)); 1278 fs = origfs = strdup(getsval(x));
1685 tempfree(x); 1279 tempfree(x);
1686 } else if (arg3type == REGEXPR) { 1280 } else if (arg3type == REGEXPR)
1687 fs = "(regexpr)"; /* split(str,arr,/regexpr/) */ 1281 fs = "(regexpr)"; /* split(str,arr,/regexpr/) */
1688 } else { 1282 else
1689 FATAL("illegal type of split"); 1283 FATAL("illegal type of split");
1690 } 
1691 sep = *fs; 1284 sep = *fs;
1692 ap = execute(a[1]); /* array name */ 1285 ap = execute(a[1]); /* array name */
1693/* BUG 7/26/22: this appears not to reset array: see C1/asplit */ 
1694 freesymtab(ap); 1286 freesymtab(ap);
1695 DPRINTF("split: s=|%s|, a=%s, sep=|%s|\n", s, NN(ap->nval), fs); 1287 DPRINTF("split: s=|%s|, a=%s, sep=|%s|\n", s, NN(ap->nval), fs);
1696 ap->tval &= ~STR; 1288 ap->tval &= ~STR;
1697 ap->tval |= ARR; 1289 ap->tval |= ARR;
1698 ap->sval = (char *) makesymtab(NSYMTAB); 1290 ap->sval = (char *) makesymtab(NSYMTAB);
1699 1291
1700 n = 0; 1292 n = 0;
1701 if (arg3type == REGEXPR && strlen((char*)((fa*)a[2])->restr) == 0) { 1293 if (arg3type == REGEXPR && strlen((char*)((fa*)a[2])->restr) == 0) {
1702 /* split(s, a, //); have to arrange that it looks like empty sep */ 1294 /* split(s, a, //); have to arrange that it looks like empty sep */
1703 arg3type = 0; 1295 arg3type = 0;
1704 fs = ""; 1296 fs = "";
1705 sep = 0; 1297 sep = 0;
1706 } 1298 }
@@ -1734,102 +1326,62 @@ Cell *split(Node **a, int nnn) /* split(
1734 } 1326 }
1735 } while (nematch(pfa,s)); 1327 } while (nematch(pfa,s));
1736 pfa->initstat = tempstat; /* bwk: has to be here to reset */ 1328 pfa->initstat = tempstat; /* bwk: has to be here to reset */
1737 /* cf gsub and refldbld */ 1329 /* cf gsub and refldbld */
1738 } 1330 }
1739 n++; 1331 n++;
1740 snprintf(num, sizeof(num), "%d", n); 1332 snprintf(num, sizeof(num), "%d", n);
1741 if (is_number(s, & result)) 1333 if (is_number(s, & result))
1742 setsymtab(num, s, result, STR|NUM, (Array *) ap->sval); 1334 setsymtab(num, s, result, STR|NUM, (Array *) ap->sval);
1743 else 1335 else
1744 setsymtab(num, s, 0.0, STR, (Array *) ap->sval); 1336 setsymtab(num, s, 0.0, STR, (Array *) ap->sval);
1745 spdone: 1337 spdone:
1746 pfa = NULL; 1338 pfa = NULL;
1747 1339 } else if (sep == ' ') {
1748 } else if (a[2] == NULL && CSV) { /* CSV only if no explicit separator */ 
1749 char *newt = (char *) malloc(strlen(s)); /* for building new string; reuse for each field */ 
1750 for (;;) { 
1751 char *fr = newt; 
1752 n++; 
1753 if (*s == '"' ) { /* start of "..." */ 
1754 for (s++ ; *s != '\0'; ) { 
1755 if (*s == '"' && s[1] != '\0' && s[1] == '"') { 
1756 s += 2; /* doubled quote */ 
1757 *fr++ = '"'; 
1758 } else if (*s == '"' && (s[1] == '\0' || s[1] == ',')) { 
1759 s++; /* skip over closing quote */ 
1760 break; 
1761 } else { 
1762 *fr++ = *s++; 
1763 } 
1764 } 
1765 *fr++ = 0; 
1766 } else { /* unquoted field */ 
1767 while (*s != ',' && *s != '\0') 
1768 *fr++ = *s++; 
1769 *fr++ = 0; 
1770 } 
1771 snprintf(num, sizeof(num), "%d", n); 
1772 if (is_number(newt, &result)) 
1773 setsymtab(num, newt, result, STR|NUM, (Array *) ap->sval); 
1774 else 
1775 setsymtab(num, newt, 0.0, STR, (Array *) ap->sval); 
1776 if (*s++ == '\0') 
1777 break; 
1778 } 
1779 free(newt); 
1780 
1781 } else if (!CSV && sep == ' ') { /* usual case: split on white space */ 
1782 for (n = 0; ; ) { 1340 for (n = 0; ; ) {
1783#define ISWS(c) ((c) == ' ' || (c) == '\t' || (c) == '\n') 1341#define ISWS(c) ((c) == ' ' || (c) == '\t' || (c) == '\n')
1784 while (ISWS(*s)) 1342 while (ISWS(*s))
1785 s++; 1343 s++;
1786 if (*s == '\0') 1344 if (*s == '\0')
1787 break; 1345 break;
1788 n++; 1346 n++;
1789 t = s; 1347 t = s;
1790 do 1348 do
1791 s++; 1349 s++;
1792 while (*s != '\0' && !ISWS(*s)); 1350 while (*s != '\0' && !ISWS(*s));
1793 temp = *s; 1351 temp = *s;
1794 setptr(s, '\0'); 1352 setptr(s, '\0');
1795 snprintf(num, sizeof(num), "%d", n); 1353 snprintf(num, sizeof(num), "%d", n);
1796 if (is_number(t, & result)) 1354 if (is_number(t, & result))
1797 setsymtab(num, t, result, STR|NUM, (Array *) ap->sval); 1355 setsymtab(num, t, result, STR|NUM, (Array *) ap->sval);
1798 else 1356 else
1799 setsymtab(num, t, 0.0, STR, (Array *) ap->sval); 1357 setsymtab(num, t, 0.0, STR, (Array *) ap->sval);
1800 setptr(s, temp); 1358 setptr(s, temp);
1801 if (*s != '\0') 1359 if (*s != '\0')
1802 s++; 1360 s++;
1803 } 1361 }
1804 
1805 } else if (sep == 0) { /* new: split(s, a, "") => 1 char/elem */ 1362 } else if (sep == 0) { /* new: split(s, a, "") => 1 char/elem */
1806 for (n = 0; *s != '\0'; s += u8_nextlen(s)) { 1363 for (n = 0; *s != '\0'; s++) {
1807 char buf[10]; 1364 char buf[2];
1808 n++; 1365 n++;
1809 snprintf(num, sizeof(num), "%d", n); 1366 snprintf(num, sizeof(num), "%d", n);
1810 1367 buf[0] = *s;
1811 for (j = 0; j < u8_nextlen(s); j++) { 1368 buf[1] = '\0';
1812 buf[j] = s[j]; 
1813 } 
1814 buf[j] = '\0'; 
1815 
1816 if (isdigit((uschar)buf[0])) 1369 if (isdigit((uschar)buf[0]))
1817 setsymtab(num, buf, atof(buf), STR|NUM, (Array *) ap->sval); 1370 setsymtab(num, buf, atof(buf), STR|NUM, (Array *) ap->sval);
1818 else 1371 else
1819 setsymtab(num, buf, 0.0, STR, (Array *) ap->sval); 1372 setsymtab(num, buf, 0.0, STR, (Array *) ap->sval);
1820 } 1373 }
1821 1374 } else if (*s != '\0') {
1822 } else if (*s != '\0') { /* some random single character */ 
1823 for (;;) { 1375 for (;;) {
1824 n++; 1376 n++;
1825 t = s; 1377 t = s;
1826 while (*s != sep && *s != '\n' && *s != '\0') 1378 while (*s != sep && *s != '\n' && *s != '\0')
1827 s++; 1379 s++;
1828 temp = *s; 1380 temp = *s;
1829 setptr(s, '\0'); 1381 setptr(s, '\0');
1830 snprintf(num, sizeof(num), "%d", n); 1382 snprintf(num, sizeof(num), "%d", n);
1831 if (is_number(t, & result)) 1383 if (is_number(t, & result))
1832 setsymtab(num, t, result, STR|NUM, (Array *) ap->sval); 1384 setsymtab(num, t, result, STR|NUM, (Array *) ap->sval);
1833 else 1385 else
1834 setsymtab(num, t, 0.0, STR, (Array *) ap->sval); 1386 setsymtab(num, t, 0.0, STR, (Array *) ap->sval);
1835 setptr(s, temp); 1387 setptr(s, temp);
@@ -1968,47 +1520,46 @@ Cell *instat(Node **a, int n) /* for (a[ @@ -1968,47 +1520,46 @@ Cell *instat(Node **a, int n) /* for (a[
1968 } 1520 }
1969 return True; 1521 return True;
1970} 1522}
1971 1523
1972static char *nawk_convert(const char *s, int (*fun_c)(int), 1524static char *nawk_convert(const char *s, int (*fun_c)(int),
1973 wint_t (*fun_wc)(wint_t)) 1525 wint_t (*fun_wc)(wint_t))
1974{ 1526{
1975 char *buf = NULL; 1527 char *buf = NULL;
1976 char *pbuf = NULL; 1528 char *pbuf = NULL;
1977 const char *ps = NULL; 1529 const char *ps = NULL;
1978 size_t n = 0; 1530 size_t n = 0;
1979 wchar_t wc; 1531 wchar_t wc;
1980 size_t sz = MB_CUR_MAX; 1532 size_t sz = MB_CUR_MAX;
1981 int unused; 
1982 1533
1983 if (sz == 1) { 1534 if (sz == 1) {
1984 buf = tostring(s); 1535 buf = tostring(s);
1985 1536
1986 for (pbuf = buf; *pbuf; pbuf++) 1537 for (pbuf = buf; *pbuf; pbuf++)
1987 *pbuf = fun_c((uschar)*pbuf); 1538 *pbuf = fun_c((uschar)*pbuf);
1988 1539
1989 return buf; 1540 return buf;
1990 } else { 1541 } else {
1991 /* upper/lower character may be shorter/longer */ 1542 /* upper/lower character may be shorter/longer */
1992 buf = tostringN(s, strlen(s) * sz + 1); 1543 buf = tostringN(s, strlen(s) * sz + 1);
1993 1544
1994 (void) mbtowc(NULL, NULL, 0); /* reset internal state */ 1545 (void) mbtowc(NULL, NULL, 0); /* reset internal state */
1995 /* 1546 /*
1996 * Reset internal state here too. 1547 * Reset internal state here too.
1997 * Assign result to avoid a compiler warning. (Casting to void 1548 * Assign result to avoid a compiler warning. (Casting to void
1998 * doesn't work.) 1549 * doesn't work.)
1999 * Increment said variable to avoid a different warning. 1550 * Increment said variable to avoid a different warning.
2000 */ 1551 */
2001 unused = wctomb(NULL, L'\0'); 1552 int unused = wctomb(NULL, L'\0');
2002 unused++; 1553 unused++;
2003 1554
2004 ps = s; 1555 ps = s;
2005 pbuf = buf; 1556 pbuf = buf;
2006 while (n = mbtowc(&wc, ps, sz), 1557 while (n = mbtowc(&wc, ps, sz),
2007 n > 0 && n != (size_t)-1 && n != (size_t)-2) 1558 n > 0 && n != (size_t)-1 && n != (size_t)-2)
2008 { 1559 {
2009 ps += n; 1560 ps += n;
2010 1561
2011 n = wctomb(pbuf, fun_wc(wc)); 1562 n = wctomb(pbuf, fun_wc(wc));
2012 if (n == (size_t)-1) 1563 if (n == (size_t)-1)
2013 FATAL("illegal wide character %s", s); 1564 FATAL("illegal wide character %s", s);
2014 1565
@@ -2042,48 +1593,46 @@ static wint_t towlower(wint_t wc) @@ -2042,48 +1593,46 @@ static wint_t towlower(wint_t wc)
2042} 1593}
2043#endif 1594#endif
2044 1595
2045static char *nawk_toupper(const char *s) 1596static char *nawk_toupper(const char *s)
2046{ 1597{
2047 return nawk_convert(s, toupper, towupper); 1598 return nawk_convert(s, toupper, towupper);
2048} 1599}
2049 1600
2050static char *nawk_tolower(const char *s) 1601static char *nawk_tolower(const char *s)
2051{ 1602{
2052 return nawk_convert(s, tolower, towlower); 1603 return nawk_convert(s, tolower, towlower);
2053} 1604}
2054 1605
2055 
2056 
2057Cell *bltin(Node **a, int n) /* builtin functions. a[0] is type, a[1] is arg list */ 1606Cell *bltin(Node **a, int n) /* builtin functions. a[0] is type, a[1] is arg list */
2058{ 1607{
2059 Cell *x, *y; 1608 Cell *x, *y;
2060 Awkfloat u; 1609 Awkfloat u;
2061 int t; 1610 int t;
2062 Awkfloat tmp; 1611 Awkfloat tmp;
2063 char *buf; 1612 char *buf;
2064 Node *nextarg; 1613 Node *nextarg;
2065 FILE *fp; 1614 FILE *fp;
2066 int status = 0; 1615 int status = 0;
2067 1616
2068 t = ptoi(a[0]); 1617 t = ptoi(a[0]);
2069 x = execute(a[1]); 1618 x = execute(a[1]);
2070 nextarg = a[1]->nnext; 1619 nextarg = a[1]->nnext;
2071 switch (t) { 1620 switch (t) {
2072 case FLENGTH: 1621 case FLENGTH:
2073 if (isarr(x)) 1622 if (isarr(x))
2074 u = ((Array *) x->sval)->nelem; /* GROT. should be function*/ 1623 u = ((Array *) x->sval)->nelem; /* GROT. should be function*/
2075 else 1624 else
2076 u = u8_strlen(getsval(x)); 1625 u = strlen(getsval(x));
2077 break; 1626 break;
2078 case FLOG: 1627 case FLOG:
2079 errno = 0; 1628 errno = 0;
2080 u = errcheck(log(getfval(x)), "log"); 1629 u = errcheck(log(getfval(x)), "log");
2081 break; 1630 break;
2082 case FINT: 1631 case FINT:
2083 modf(getfval(x), &u); break; 1632 modf(getfval(x), &u); break;
2084 case FEXP: 1633 case FEXP:
2085 errno = 0; 1634 errno = 0;
2086 u = errcheck(exp(getfval(x)), "exp"); 1635 u = errcheck(exp(getfval(x)), "exp");
2087 break; 1636 break;
2088 case FSQRT: 1637 case FSQRT:
2089 errno = 0; 1638 errno = 0;
@@ -2585,51 +2134,13 @@ void backsub(char **pb_ptr, const char * @@ -2585,51 +2134,13 @@ void backsub(char **pb_ptr, const char *
2585 } else { /* \\x -> \\x */ 2134 } else { /* \\x -> \\x */
2586 *pb++ = *sptr++; 2135 *pb++ = *sptr++;
2587 *pb++ = *sptr++; 2136 *pb++ = *sptr++;
2588 } 2137 }
2589 } else if (sptr[1] == '&') { /* literal & */ 2138 } else if (sptr[1] == '&') { /* literal & */
2590 sptr++; 2139 sptr++;
2591 *pb++ = *sptr++; 2140 *pb++ = *sptr++;
2592 } else /* literal \ */ 2141 } else /* literal \ */
2593 *pb++ = *sptr++; 2142 *pb++ = *sptr++;
2594 2143
2595 *pb_ptr = pb; 2144 *pb_ptr = pb;
2596 *sptr_ptr = sptr; 2145 *sptr_ptr = sptr;
2597} 2146}
2598 
2599static char *wide_char_to_byte_str(int rune, size_t *outlen) 
2600{ 
2601 static char buf[5]; 
2602 int len; 
2603 
2604 if (rune < 0 || rune > 0x10FFFF) 
2605 return NULL; 
2606 
2607 memset(buf, 0, sizeof(buf)); 
2608 
2609 len = 0; 
2610 if (rune <= 0x0000007F) { 
2611 buf[len++] = rune; 
2612 } else if (rune <= 0x000007FF) { 
2613 // 110xxxxx 10xxxxxx 
2614 buf[len++] = 0xC0 | (rune >> 6); 
2615 buf[len++] = 0x80 | (rune & 0x3F); 
2616 } else if (rune <= 0x0000FFFF) { 
2617 // 1110xxxx 10xxxxxx 10xxxxxx 
2618 buf[len++] = 0xE0 | (rune >> 12); 
2619 buf[len++] = 0x80 | ((rune >> 6) & 0x3F); 
2620 buf[len++] = 0x80 | (rune & 0x3F); 
2621 
2622 } else { 
2623 // 0x00010000 - 0x10FFFF 
2624 // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx 
2625 buf[len++] = 0xF0 | (rune >> 18); 
2626 buf[len++] = 0x80 | ((rune >> 12) & 0x3F); 
2627 buf[len++] = 0x80 | ((rune >> 6) & 0x3F); 
2628 buf[len++] = 0x80 | (rune & 0x3F); 
2629 } 
2630 
2631 *outlen = len; 
2632 buf[len++] = '\0'; 
2633 
2634 return buf; 
2635} 
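
Note on the run.c changes above: length() goes back from u8_strlen() to strlen(), and split(s, a, "") goes back to one byte per element, so multibyte UTF-8 input is counted and split by bytes again. The sketch below only illustrates that difference. count_bytes(), count_codepoints() and utf8_seq_len() are hypothetical names made up for this note; they are not the removed u8_* functions and are not part of nawk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not nawk's u8_nextlen): byte length of the UTF-8
 * sequence starting at s, judged from the lead byte only.
 */
size_t utf8_seq_len(const char *s)
{
	unsigned char c = (unsigned char)*s;

	if (c < 0x80)
		return 1;		/* plain ASCII */
	if ((c & 0xE0) == 0xC0)
		return 2;		/* 110xxxxx */
	if ((c & 0xF0) == 0xE0)
		return 3;		/* 1110xxxx */
	if ((c & 0xF8) == 0xF0)
		return 4;		/* 11110xxx */
	return 1;			/* invalid lead byte: count it as one byte */
}

/* Byte-oriented length, as in the downgraded version (plain strlen). */
size_t count_bytes(const char *s)
{
	assert(s != NULL);
	return strlen(s);
}

/* Code-point-oriented length, roughly what a UTF-8 aware length() reports. */
size_t count_codepoints(const char *s)
{
	size_t n = 0;

	assert(s != NULL);
	while (*s != '\0') {
		size_t len = utf8_seq_len(s);
		size_t i;

		/* never step past the terminator on a truncated sequence */
		for (i = 1; i < len && s[i] != '\0'; i++)
			;
		s += i;
		n++;
	}
	return n;
}
```

For the UTF-8 string "caf\xc3\xa9" (the word cafe with an accented final e), count_bytes() gives 5 and count_codepoints() gives 4. Scripts that rely on byte counts, which the commit message says is the case for security/mozilla-rootcerts, see the first number again after this downgrade.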

cvs diff -r1.6 -r1.7 pkgsrc/lang/nawk/files/lib.c

--- pkgsrc/lang/nawk/files/lib.c 2023/09/12 19:16:52 1.6
+++ pkgsrc/lang/nawk/files/lib.c 2023/09/17 10:32:06 1.7
@@ -24,28 +24,26 @@ THIS SOFTWARE. @@ -24,28 +24,26 @@ THIS SOFTWARE.
24 24
25#define DEBUG 25#define DEBUG
26#include <stdio.h> 26#include <stdio.h>
27#include <string.h> 27#include <string.h>
28#include <strings.h> 28#include <strings.h>
29#include <ctype.h> 29#include <ctype.h>
30#include <errno.h> 30#include <errno.h>
31#include <stdlib.h> 31#include <stdlib.h>
32#include <stdarg.h> 32#include <stdarg.h>
33#include <limits.h> 33#include <limits.h>
34#include <math.h> 34#include <math.h>
35#include "awk.h" 35#include "awk.h"
36 36
37extern int u8_nextlen(const char *s); 
38 
39char EMPTY[] = { '\0' }; 37char EMPTY[] = { '\0' };
40FILE *infile = NULL; 38FILE *infile = NULL;
41bool innew; /* true = infile has not been read by readrec */ 39bool innew; /* true = infile has not been read by readrec */
42char *file = EMPTY; 40char *file = EMPTY;
43char *record; 41char *record;
44int recsize = RECSIZE; 42int recsize = RECSIZE;
45char *fields; 43char *fields;
46int fieldssize = RECSIZE; 44int fieldssize = RECSIZE;
47 45
48Cell **fldtab; /* pointers to Cells */ 46Cell **fldtab; /* pointers to Cells */
49static size_t len_inputFS = 0; 47static size_t len_inputFS = 0;
50static char *inputFS = NULL; /* FS at time of input, for field splitting */ 48static char *inputFS = NULL; /* FS at time of input, for field splitting */
51 49
@@ -211,54 +209,48 @@ int getrec(char **pbuf, int *pbufsize, b @@ -211,54 +209,48 @@ int getrec(char **pbuf, int *pbufsize, b
211 *pbuf = buf; 209 *pbuf = buf;
212 *pbufsize = savebufsize; 210 *pbufsize = savebufsize;
213 return 0; /* true end of file */ 211 return 0; /* true end of file */
214} 212}
215 213
216void nextfile(void) 214void nextfile(void)
217{ 215{
218 if (infile != NULL && infile != stdin) 216 if (infile != NULL && infile != stdin)
219 fclose(infile); 217 fclose(infile);
220 infile = NULL; 218 infile = NULL;
221 argno++; 219 argno++;
222} 220}
223 221
224extern int readcsvrec(char **pbuf, int *pbufsize, FILE *inf, bool newflag); 
225 
226int readrec(char **pbuf, int *pbufsize, FILE *inf, bool newflag) /* read one record into buf */ 222int readrec(char **pbuf, int *pbufsize, FILE *inf, bool newflag) /* read one record into buf */
227{ 223{
228 int sep, c, isrec; // POTENTIAL BUG? isrec is a macro in awk.h 224 int sep, c, isrec;
229 char *rr = *pbuf, *buf = *pbuf; 225 char *rr, *buf = *pbuf;
230 int bufsize = *pbufsize; 226 int bufsize = *pbufsize;
231 char *rs = getsval(rsloc); 227 char *rs = getsval(rsloc);
232 228
233 if (CSV) { 229 if (*rs && rs[1]) {
234 c = readcsvrec(pbuf, pbufsize, inf, newflag); 
235 isrec = (c == EOF && rr == buf) ? false : true; 
236 } else if (*rs && rs[1]) { 
237 bool found; 230 bool found;
238 231
239 fa *pfa = makedfa(rs, 1); 232 fa *pfa = makedfa(rs, 1);
240 if (newflag) 233 if (newflag)
241 found = fnematch(pfa, inf, &buf, &bufsize, recsize); 234 found = fnematch(pfa, inf, &buf, &bufsize, recsize);
242 else { 235 else {
243 int tempstat = pfa->initstat; 236 int tempstat = pfa->initstat;
244 pfa->initstat = 2; 237 pfa->initstat = 2;
245 found = fnematch(pfa, inf, &buf, &bufsize, recsize); 238 found = fnematch(pfa, inf, &buf, &bufsize, recsize);
246 pfa->initstat = tempstat; 239 pfa->initstat = tempstat;
247 } 240 }
248 if (found) 241 if (found)
249 setptr(patbeg, '\0'); 242 setptr(patbeg, '\0');
250 isrec = (found == 0 && *buf == '\0') ? false : true; 243 isrec = (found == 0 && *buf == '\0') ? false : true;
251 
252 } else { 244 } else {
253 if ((sep = *rs) == 0) { 245 if ((sep = *rs) == 0) {
254 sep = '\n'; 246 sep = '\n';
255 while ((c=getc(inf)) == '\n' && c != EOF) /* skip leading \n's */ 247 while ((c=getc(inf)) == '\n' && c != EOF) /* skip leading \n's */
256 ; 248 ;
257 if (c != EOF) 249 if (c != EOF)
258 ungetc(c, inf); 250 ungetc(c, inf);
259 } 251 }
260 for (rr = buf; ; ) { 252 for (rr = buf; ; ) {
261 for (; (c=getc(inf)) != sep && c != EOF; ) { 253 for (; (c=getc(inf)) != sep && c != EOF; ) {
262 if (rr-buf+1 > bufsize) 254 if (rr-buf+1 > bufsize)
263 if (!adjbuf(&buf, &bufsize, 1+rr-buf, 255 if (!adjbuf(&buf, &bufsize, 1+rr-buf,
264 recsize, &rr, "readrec 1")) 256 recsize, &rr, "readrec 1"))
@@ -276,96 +268,47 @@ int readrec(char **pbuf, int *pbufsize,  @@ -276,96 +268,47 @@ int readrec(char **pbuf, int *pbufsize,
276 *rr++ = c; 268 *rr++ = c;
277 } 269 }
278 if (!adjbuf(&buf, &bufsize, 1+rr-buf, recsize, &rr, "readrec 3")) 270 if (!adjbuf(&buf, &bufsize, 1+rr-buf, recsize, &rr, "readrec 3"))
279 FATAL("input record `%.30s...' too long", buf); 271 FATAL("input record `%.30s...' too long", buf);
280 *rr = 0; 272 *rr = 0;
281 isrec = (c == EOF && rr == buf) ? false : true; 273 isrec = (c == EOF && rr == buf) ? false : true;
282 } 274 }
283 *pbuf = buf; 275 *pbuf = buf;
284 *pbufsize = bufsize; 276 *pbufsize = bufsize;
285 DPRINTF("readrec saw <%s>, returns %d\n", buf, isrec); 277 DPRINTF("readrec saw <%s>, returns %d\n", buf, isrec);
286 return isrec; 278 return isrec;
287} 279}
288 280
289 
290/******************* 
291 * loose ends here: 
292 * \r\n should become \n 
293 * what about bare \r? Excel uses that for embedded newlines 
294 * can't have "" in unquoted fields, according to RFC 4180 
295*/ 
296 
297 
298int readcsvrec(char **pbuf, int *pbufsize, FILE *inf, bool newflag) /* csv can have \n's */ 
299{ /* so read a complete record that might be multiple lines */ 
300 int sep, c; 
301 char *rr = *pbuf, *buf = *pbuf; 
302 int bufsize = *pbufsize; 
303 bool in_quote = false; 
304 
305 sep = '\n'; /* the only separator; have to skip over \n embedded in "..." */ 
306 rr = buf; 
307 while ((c = getc(inf)) != EOF) { 
308 if (c == sep) { 
309 if (! in_quote) 
310 break; 
311 if (rr > buf && rr[-1] == '\r') // remove \r if was \r\n 
312 rr--; 
313 } 
314 
315 if (rr-buf+1 > bufsize) 
316 if (!adjbuf(&buf, &bufsize, 1+rr-buf, 
317 recsize, &rr, "readcsvrec 1")) 
318 FATAL("input record `%.30s...' too long", buf); 
319 *rr++ = c; 
320 if (c == '"') 
321 in_quote = ! in_quote; 
322 } 
323 if (c == '\n' && rr > buf && rr[-1] == '\r') // remove \r if was \r\n 
324 rr--; 
325 
326 if (!adjbuf(&buf, &bufsize, 1+rr-buf, recsize, &rr, "readcsvrec 4")) 
327 FATAL("input record `%.30s...' too long", buf); 
328 *rr = 0; 
329 *pbuf = buf; 
330 *pbufsize = bufsize; 
331 DPRINTF("readcsvrec saw <%s>, returns %d\n", buf, c); 
332 return c; 
333} 
334 
335char *getargv(int n) /* get ARGV[n] */ 281char *getargv(int n) /* get ARGV[n] */
336{ 282{
337 Cell *x; 283 Cell *x;
338 char *s, temp[50]; 284 char *s, temp[50];
339 extern Array *ARGVtab; 285 extern Array *ARGVtab;
340 286
341 snprintf(temp, sizeof(temp), "%d", n); 287 snprintf(temp, sizeof(temp), "%d", n);
342 if (lookup(temp, ARGVtab) == NULL) 288 if (lookup(temp, ARGVtab) == NULL)
343 return NULL; 289 return NULL;
344 x = setsymtab(temp, "", 0.0, STR, ARGVtab); 290 x = setsymtab(temp, "", 0.0, STR, ARGVtab);
345 s = getsval(x); 291 s = getsval(x);
346 DPRINTF("getargv(%d) returns |%s|\n", n, s); 292 DPRINTF("getargv(%d) returns |%s|\n", n, s);
347 return s; 293 return s;
348} 294}
349 295
350void setclvar(char *s) /* set var=value from s */ 296void setclvar(char *s) /* set var=value from s */
351{ 297{
352 char *e, *p; 298 char *e, *p;
353 Cell *q; 299 Cell *q;
354 double result; 300 double result;
355 301
356/* commit f3d9187d4e0f02294fb1b0e31152070506314e67 broke T.argv test */ 
357/* I don't understand why it was changed. */ 
358 
359 for (p=s; *p != '='; p++) 302 for (p=s; *p != '='; p++)
360 ; 303 ;
361 e = p; 304 e = p;
362 *p++ = 0; 305 *p++ = 0;
363 p = qstring(p, '\0'); 306 p = qstring(p, '\0');
364 q = setsymtab(s, p, 0.0, STR, symtab); 307 q = setsymtab(s, p, 0.0, STR, symtab);
365 setsval(q, p); 308 setsval(q, p);
366 if (is_number(q->sval, & result)) { 309 if (is_number(q->sval, & result)) {
367 q->fval = result; 310 q->fval = result;
368 q->tval |= NUM; 311 q->tval |= NUM;
369 } 312 }
370 DPRINTF("command line set %s to |%s|\n", s, p); 313 DPRINTF("command line set %s to |%s|\n", s, p);
371 free(p); 314 free(p);
@@ -390,97 +333,65 @@ void fldbld(void) /* create fields from  @@ -390,97 +333,65 @@ void fldbld(void) /* create fields from
390 n = strlen(r); 333 n = strlen(r);
391 if (n > fieldssize) { 334 if (n > fieldssize) {
392 xfree(fields); 335 xfree(fields);
393 if ((fields = (char *) malloc(n+2)) == NULL) /* possibly 2 final \0s */ 336 if ((fields = (char *) malloc(n+2)) == NULL) /* possibly 2 final \0s */
394 FATAL("out of space for fields in fldbld %d", n); 337 FATAL("out of space for fields in fldbld %d", n);
395 fieldssize = n; 338 fieldssize = n;
396 } 339 }
397 fr = fields; 340 fr = fields;
398 i = 0; /* number of fields accumulated here */ 341 i = 0; /* number of fields accumulated here */
399 if (inputFS == NULL) /* make sure we have a copy of FS */ 342 if (inputFS == NULL) /* make sure we have a copy of FS */
400 savefs(); 343 savefs();
401 if (strlen(inputFS) > 1) { /* it's a regular expression */ 344 if (strlen(inputFS) > 1) { /* it's a regular expression */
402 i = refldbld(r, inputFS); 345 i = refldbld(r, inputFS);
403 } else if (!CSV && (sep = *inputFS) == ' ') { /* default whitespace */ 346 } else if ((sep = *inputFS) == ' ') { /* default whitespace */
404 for (i = 0; ; ) { 347 for (i = 0; ; ) {
405 while (*r == ' ' || *r == '\t' || *r == '\n') 348 while (*r == ' ' || *r == '\t' || *r == '\n')
406 r++; 349 r++;
407 if (*r == 0) 350 if (*r == 0)
408 break; 351 break;
409 i++; 352 i++;
410 if (i > nfields) 353 if (i > nfields)
411 growfldtab(i); 354 growfldtab(i);
412 if (freeable(fldtab[i])) 355 if (freeable(fldtab[i]))
413 xfree(fldtab[i]->sval); 356 xfree(fldtab[i]->sval);
414 fldtab[i]->sval = fr; 357 fldtab[i]->sval = fr;
415 fldtab[i]->tval = FLD | STR | DONTFREE; 358 fldtab[i]->tval = FLD | STR | DONTFREE;
416 do 359 do
417 *fr++ = *r++; 360 *fr++ = *r++;
418 while (*r != ' ' && *r != '\t' && *r != '\n' && *r != '\0'); 361 while (*r != ' ' && *r != '\t' && *r != '\n' && *r != '\0');
419 *fr++ = 0; 362 *fr++ = 0;
420 } 363 }
421 *fr = 0; 364 *fr = 0;
422 } else if (CSV) { /* CSV processing. no error handling */ 365 } else if ((sep = *inputFS) == 0) { /* new: FS="" => 1 char/field */
423 if (*r != 0) { 366 for (i = 0; *r != '\0'; r += n) {
424 for (;;) { 367 char buf[MB_LEN_MAX + 1];
425 i++; 368
426 if (i > nfields) 
427 growfldtab(i); 
428 if (freeable(fldtab[i])) 
429 xfree(fldtab[i]->sval); 
430 fldtab[i]->sval = fr; 
431 fldtab[i]->tval = FLD | STR | DONTFREE; 
432 if (*r == '"' ) { /* start of "..." */ 
433 for (r++ ; *r != '\0'; ) { 
434 if (*r == '"' && r[1] != '\0' && r[1] == '"') { 
435 r += 2; /* doubled quote */ 
436 *fr++ = '"'; 
437 } else if (*r == '"' && (r[1] == '\0' || r[1] == ',')) { 
438 r++; /* skip over closing quote */ 
439 break; 
440 } else { 
441 *fr++ = *r++; 
442 } 
443 } 
444 *fr++ = 0; 
445 } else { /* unquoted field */ 
446 while (*r != ',' && *r != '\0') 
447 *fr++ = *r++; 
448 *fr++ = 0; 
449 } 
450 if (*r++ == 0) 
451 break; 
452  
453 } 
454 } 
455 *fr = 0; 
456 } else if ((sep = *inputFS) == 0) { /* new: FS="" => 1 char/field */ 
457 for (i = 0; *r != '\0'; ) { 
458 char buf[10]; 
459 i++; 369 i++;
460 if (i > nfields) 370 if (i > nfields)
461 growfldtab(i); 371 growfldtab(i);
462 if (freeable(fldtab[i])) 372 if (freeable(fldtab[i]))
463 xfree(fldtab[i]->sval); 373 xfree(fldtab[i]->sval);
464 n = u8_nextlen(r); 374 n = mblen(r, MB_LEN_MAX);
465 for (j = 0; j < n; j++) 375 if (n < 0)
466 buf[j] = *r++; 376 n = 1;
467 buf[j] = '\0'; 377 memcpy(buf, r, n);
 378 buf[n] = '\0';
468 fldtab[i]->sval = tostring(buf); 379 fldtab[i]->sval = tostring(buf);
469 fldtab[i]->tval = FLD | STR; 380 fldtab[i]->tval = FLD | STR;
470 } 381 }
471 *fr = 0; 382 *fr = 0;
472 } else if (*r != 0) { /* if 0, it's a null field */ 383 } else if (*r != 0) { /* if 0, it's a null field */
473 /* subtle case: if length(FS) == 1 && length(RS > 0) 384 /* subtlecase : if length(FS) == 1 && length(RS > 0)
474 * \n is NOT a field separator (cf awk book 61,84). 385 * \n is NOT a field separator (cf awk book 61,84).
475 * this variable is tested in the inner while loop. 386 * this variable is tested in the inner while loop.
476 */ 387 */
477 int rtest = '\n'; /* normal case */ 388 int rtest = '\n'; /* normal case */
478 if (strlen(*RS) > 0) 389 if (strlen(*RS) > 0)
479 rtest = '\0'; 390 rtest = '\0';
480 for (;;) { 391 for (;;) {
481 i++; 392 i++;
482 if (i > nfields) 393 if (i > nfields)
483 growfldtab(i); 394 growfldtab(i);
484 if (freeable(fldtab[i])) 395 if (freeable(fldtab[i]))
485 xfree(fldtab[i]->sval); 396 xfree(fldtab[i]->sval);
486 fldtab[i]->sval = fr; 397 fldtab[i]->sval = fr;
@@ -875,31 +786,31 @@ bool is_valid_number(const char *s, bool @@ -875,31 +786,31 @@ bool is_valid_number(const char *s, bool
875{ 786{
876 double r; 787 double r;
877 char *ep; 788 char *ep;
878 bool retval = false; 789 bool retval = false;
879 bool is_nan = false; 790 bool is_nan = false;
880 bool is_inf = false; 791 bool is_inf = false;
881 792
882 if (no_trailing) 793 if (no_trailing)
883 *no_trailing = false; 794 *no_trailing = false;
884 795
885 while (isspace(*s)) 796 while (isspace(*s))
886 s++; 797 s++;
887 798
888 /* no hex floating point, sorry */ 799 // no hex floating point, sorry
889 if (s[0] == '0' && tolower(s[1]) == 'x') 800 if (s[0] == '0' && tolower(s[1]) == 'x')
890 return false; 801 return false;
891 802
892 /* allow +nan, -nan, +inf, -inf, any other letter, no */ 803 // allow +nan, -nan, +inf, -inf, any other letter, no
893 if (s[0] == '+' || s[0] == '-') { 804 if (s[0] == '+' || s[0] == '-') {
894 is_nan = (strncasecmp(s+1, "nan", 3) == 0); 805 is_nan = (strncasecmp(s+1, "nan", 3) == 0);
895 is_inf = (strncasecmp(s+1, "inf", 3) == 0); 806 is_inf = (strncasecmp(s+1, "inf", 3) == 0);
896 if ((is_nan || is_inf) 807 if ((is_nan || is_inf)
897 && (isspace(s[4]) || s[4] == '\0')) 808 && (isspace(s[4]) || s[4] == '\0'))
898 goto convert; 809 goto convert;
899 else if (! isdigit(s[1]) && s[1] != '.') 810 else if (! isdigit(s[1]) && s[1] != '.')
900 return false; 811 return false;
901 } 812 }
902 else if (! isdigit(s[0]) && s[0] != '.') 813 else if (! isdigit(s[0]) && s[0] != '.')
903 return false; 814 return false;
904 815
905convert: 816convert:
@@ -913,18 +824,18 @@ convert: @@ -913,18 +824,18 @@ convert:
913 824
914 if (result != NULL) 825 if (result != NULL)
915 *result = r; 826 *result = r;
916 827
917 /* 828 /*
918 * check for trailing stuff 829 * check for trailing stuff
919 */ 830 */
920 while (isspace(*ep)) 831 while (isspace(*ep))
921 ep++; 832 ep++;
922 833
923 if (no_trailing != NULL) 834 if (no_trailing != NULL)
924 *no_trailing = (*ep == '\0'); 835 *no_trailing = (*ep == '\0');
925 836
926 /* return true if found the end, or trailing stuff is allowed */ 837 // return true if found the end, or trailing stuff is allowed
927 retval = *ep == '\0' || trailing_stuff_ok; 838 retval = *ep == '\0' || trailing_stuff_ok;
928 839
929 return retval; 840 return retval;
930} 841}

cvs diff -r1.6 -r1.7 pkgsrc/lang/nawk/files/proto.h

--- pkgsrc/lang/nawk/files/proto.h 2023/09/12 19:16:52 1.6
+++ pkgsrc/lang/nawk/files/proto.h 2023/09/17 10:32:06 1.7
@@ -33,33 +33,34 @@ extern int yylex(void); @@ -33,33 +33,34 @@ extern int yylex(void);
33extern void startreg(void); 33extern void startreg(void);
34extern int input(void); 34extern int input(void);
35extern void unput(int); 35extern void unput(int);
36extern void unputstr(const char *); 36extern void unputstr(const char *);
37extern int yylook(void); 37extern int yylook(void);
38extern int yyback(int *, int); 38extern int yyback(int *, int);
39extern int yyinput(void); 39extern int yyinput(void);
40 40
41extern fa *makedfa(const char *, bool); 41extern fa *makedfa(const char *, bool);
42extern fa *mkdfa(const char *, bool); 42extern fa *mkdfa(const char *, bool);
43extern int makeinit(fa *, bool); 43extern int makeinit(fa *, bool);
44extern void penter(Node *); 44extern void penter(Node *);
45extern void freetr(Node *); 45extern void freetr(Node *);
 46extern int hexstr(const uschar **);
46extern int quoted(const uschar **); 47extern int quoted(const uschar **);
47extern int *cclenter(const char *); 48extern char *cclenter(const char *);
48extern noreturn void overflo(const char *); 49extern noreturn void overflo(const char *);
49extern void cfoll(fa *, Node *); 50extern void cfoll(fa *, Node *);
50extern int first(Node *); 51extern int first(Node *);
51extern void follow(Node *); 52extern void follow(Node *);
52extern int member(int, int *); 53extern int member(int, const char *);
53extern int match(fa *, const char *); 54extern int match(fa *, const char *);
54extern int pmatch(fa *, const char *); 55extern int pmatch(fa *, const char *);
55extern int nematch(fa *, const char *); 56extern int nematch(fa *, const char *);
56extern bool fnematch(fa *, FILE *, char **, int *, int); 57extern bool fnematch(fa *, FILE *, char **, int *, int);
57extern Node *reparse(const char *); 58extern Node *reparse(const char *);
58extern Node *regexp(void); 59extern Node *regexp(void);
59extern Node *primary(void); 60extern Node *primary(void);
60extern Node *concat(Node *); 61extern Node *concat(Node *);
61extern Node *alt(Node *); 62extern Node *alt(Node *);
62extern Node *unary(Node *); 63extern Node *unary(Node *);
63extern int relex(void); 64extern int relex(void);
64extern int cgoto(fa *, int, int); 65extern int cgoto(fa *, int, int);
65extern void freefa(fa *); 66extern void freefa(fa *);

cvs diff -r1.6 -r1.7 pkgsrc/lang/nawk/files/tran.c

--- pkgsrc/lang/nawk/files/tran.c 2023/09/12 19:16:52 1.6
+++ pkgsrc/lang/nawk/files/tran.c 2023/09/17 10:32:06 1.7
@@ -298,27 +298,27 @@ Awkfloat setfval(Cell *vp, Awkfloat f) / @@ -298,27 +298,27 @@ Awkfloat setfval(Cell *vp, Awkfloat f) /
298 298
299 f += 0.0; /* normalise negative zero to positive zero */ 299 f += 0.0; /* normalise negative zero to positive zero */
300 if ((vp->tval & (NUM | STR)) == 0) 300 if ((vp->tval & (NUM | STR)) == 0)
301 funnyvar(vp, "assign to"); 301 funnyvar(vp, "assign to");
302 if (isfld(vp)) { 302 if (isfld(vp)) {
303 donerec = false; /* mark $0 invalid */ 303 donerec = false; /* mark $0 invalid */
304 fldno = atoi(vp->nval); 304 fldno = atoi(vp->nval);
305 if (fldno > *NF) 305 if (fldno > *NF)
306 newfld(fldno); 306 newfld(fldno);
307 DPRINTF("setting field %d to %g\n", fldno, f); 307 DPRINTF("setting field %d to %g\n", fldno, f);
308 } else if (&vp->fval == NF) { 308 } else if (&vp->fval == NF) {
309 donerec = false; /* mark $0 invalid */ 309 donerec = false; /* mark $0 invalid */
310 setlastfld(f); 310 setlastfld(f);
311 DPRINTF("setfval: setting NF to %g\n", f); 311 DPRINTF("setting NF to %g\n", f);
312 } else if (isrec(vp)) { 312 } else if (isrec(vp)) {
313 donefld = false; /* mark $1... invalid */ 313 donefld = false; /* mark $1... invalid */
314 donerec = true; 314 donerec = true;
315 savefs(); 315 savefs();
316 } else if (vp == ofsloc) { 316 } else if (vp == ofsloc) {
317 if (!donerec) 317 if (!donerec)
318 recbld(); 318 recbld();
319 } 319 }
320 if (freeable(vp)) 320 if (freeable(vp))
321 xfree(vp->sval); /* free any previous string */ 321 xfree(vp->sval); /* free any previous string */
322 vp->tval &= ~(STR|CONVC|CONVO); /* mark string invalid */ 322 vp->tval &= ~(STR|CONVC|CONVO); /* mark string invalid */
323 vp->fmt = NULL; 323 vp->fmt = NULL;
324 vp->tval |= NUM; /* mark number ok */ 324 vp->tval |= NUM; /* mark number ok */
@@ -338,30 +338,26 @@ void funnyvar(Cell *vp, const char *rw) @@ -338,30 +338,26 @@ void funnyvar(Cell *vp, const char *rw)
338 (void *)vp, vp->nval, vp->sval, vp->fval, vp->tval); 338 (void *)vp, vp->nval, vp->sval, vp->fval, vp->tval);
339} 339}
340 340
341char *setsval(Cell *vp, const char *s) /* set string val of a Cell */ 341char *setsval(Cell *vp, const char *s) /* set string val of a Cell */
342{ 342{
343 char *t; 343 char *t;
344 int fldno; 344 int fldno;
345 Awkfloat f; 345 Awkfloat f;
346 346
347 DPRINTF("starting setsval %p: %s = \"%s\", t=%o, r,f=%d,%d\n", 347 DPRINTF("starting setsval %p: %s = \"%s\", t=%o, r,f=%d,%d\n",
348 (void*)vp, NN(vp->nval), s, vp->tval, donerec, donefld); 348 (void*)vp, NN(vp->nval), s, vp->tval, donerec, donefld);
349 if ((vp->tval & (NUM | STR)) == 0) 349 if ((vp->tval & (NUM | STR)) == 0)
350 funnyvar(vp, "assign to"); 350 funnyvar(vp, "assign to");
351 if (CSV && (vp == rsloc)) 
352 WARNING("danger: don't set RS when --csv is in effect"); 
353 if (CSV && (vp == fsloc)) 
354 WARNING("danger: don't set FS when --csv is in effect"); 
355 if (isfld(vp)) { 351 if (isfld(vp)) {
356 donerec = false; /* mark $0 invalid */ 352 donerec = false; /* mark $0 invalid */
357 fldno = atoi(vp->nval); 353 fldno = atoi(vp->nval);
358 if (fldno > *NF) 354 if (fldno > *NF)
359 newfld(fldno); 355 newfld(fldno);
360 DPRINTF("setting field %d to %s (%p)\n", fldno, s, (const void*)s); 356 DPRINTF("setting field %d to %s (%p)\n", fldno, s, (const void*)s);
361 } else if (isrec(vp)) { 357 } else if (isrec(vp)) {
362 donefld = false; /* mark $1... invalid */ 358 donefld = false; /* mark $1... invalid */
363 donerec = true; 359 donerec = true;
364 savefs(); 360 savefs();
365 } else if (vp == ofsloc) { 361 } else if (vp == ofsloc) {
366 if (!donerec) 362 if (!donerec)
367 recbld(); 363 recbld();
@@ -369,27 +365,27 @@ char *setsval(Cell *vp, const char *s) / @@ -369,27 +365,27 @@ char *setsval(Cell *vp, const char *s) /
369 t = s ? tostring(s) : tostring(""); /* in case it's self-assign */ 365 t = s ? tostring(s) : tostring(""); /* in case it's self-assign */
370 if (freeable(vp)) 366 if (freeable(vp))
371 xfree(vp->sval); 367 xfree(vp->sval);
372 vp->tval &= ~(NUM|DONTFREE|CONVC|CONVO); 368 vp->tval &= ~(NUM|DONTFREE|CONVC|CONVO);
373 vp->tval |= STR; 369 vp->tval |= STR;
374 vp->fmt = NULL; 370 vp->fmt = NULL;
375 DPRINTF("setsval %p: %s = \"%s (%p) \", t=%o r,f=%d,%d\n", 371 DPRINTF("setsval %p: %s = \"%s (%p) \", t=%o r,f=%d,%d\n",
376 (void*)vp, NN(vp->nval), t, (void*)t, vp->tval, donerec, donefld); 372 (void*)vp, NN(vp->nval), t, (void*)t, vp->tval, donerec, donefld);
377 vp->sval = t; 373 vp->sval = t;
378 if (&vp->fval == NF) { 374 if (&vp->fval == NF) {
379 donerec = false; /* mark $0 invalid */ 375 donerec = false; /* mark $0 invalid */
380 f = getfval(vp); 376 f = getfval(vp);
381 setlastfld(f); 377 setlastfld(f);
382 DPRINTF("setsval: setting NF to %g\n", f); 378 DPRINTF("setting NF to %g\n", f);
383 } 379 }
384 380
385 return(vp->sval); 381 return(vp->sval);
386} 382}
387 383
388Awkfloat getfval(Cell *vp) /* get float val of a Cell */ 384Awkfloat getfval(Cell *vp) /* get float val of a Cell */
389{ 385{
390 if ((vp->tval & (NUM | STR)) == 0) 386 if ((vp->tval & (NUM | STR)) == 0)
391 funnyvar(vp, "read value of"); 387 funnyvar(vp, "read value of");
392 if (isfld(vp) && !donefld) 388 if (isfld(vp) && !donefld)
393 fldbld(); 389 fldbld();
394 else if (isrec(vp) && !donerec) 390 else if (isrec(vp) && !donerec)
395 recbld(); 391 recbld();

cvs diff -r1.3 -r1.4 pkgsrc/lang/nawk/files/nawk.1

--- pkgsrc/lang/nawk/files/nawk.1 2023/09/12 19:16:52 1.3
+++ pkgsrc/lang/nawk/files/nawk.1 2023/09/17 10:32:06 1.4
@@ -1,46 +1,44 @@ @@ -1,46 +1,44 @@
1.\" $NetBSD: nawk.1,v 1.3 2023/09/12 19:16:52 vins Exp $ 1.\" $NetBSD: nawk.1,v 1.4 2023/09/17 10:32:06 vins Exp $
2.\" 2.\"
3.\" This file is copied from awk.1 but with the following modifications 3.\" This file is copied from nawk.1 but with the following modifications
4.\" for pkgsrc: 4.\" for pkgsrc:
5.\" 5.\"
6.\" * awk is changed to nawk. 6.\" * nawk is changed to nnawk.
7.\" * Awk is changed to Nawk. 7.\" * Nawk is changed to Nnawk.
8.\" * AWK is changed to NAWK. 8.\" * NAWK is changed to NNAWK.
9.\" 9.\"
10.de EX 10.de EX
11.nf 11.nf
12.ft CW 12.ft CW
13.. 13..
14.de EE 14.de EE
15.br 15.br
16.fi 16.fi
17.ft 1 17.ft 1
18.. 18..
19.de TF 19.de TF
20.IP "" "\w'\fB\\$1\ \ \fP'u" 20.IP "" "\w'\fB\\$1\ \ \fP'u"
21.PD 0 21.PD 0
22.. 22..
23.TH NAWK 1 23.TH NAWK 1
24.CT 1 files prog_other 24.CT 1 files prog_other
25.SH NAME 25.SH NAME
26nawk \- pattern-directed scanning and processing language 26nawk \- pattern-directed scanning and processing language
27.SH SYNOPSIS 27.SH SYNOPSIS
28.B nawk 28.B nawk
29[ 29[
30.BI \-F 30.BI \-F
31.I fs 31.I fs
32| 
33.B \-\^\-csv 
34] 32]
35[ 33[
36.BI \-v 34.BI \-v
37.I var=value 35.I var=value
38] 36]
39[ 37[
40.I 'prog' 38.I 'prog'
41| 39|
42.BI \-f 40.BI \-f
43.I progfile 41.I progfile
44] 42]
45[ 43[
46.I file ... 44.I file ...
@@ -77,32 +75,26 @@ The option @@ -77,32 +75,26 @@ The option
77followed by 75followed by
78.I var=value 76.I var=value
79is an assignment to be done before 77is an assignment to be done before
80.I prog 78.I prog
81is executed; 79is executed;
82any number of 80any number of
83.B \-v 81.B \-v
84options may be present. 82options may be present.
85The 83The
86.B \-F 84.B \-F
87.I fs 85.I fs
88option defines the input field separator to be the regular expression 86option defines the input field separator to be the regular expression
89.IR fs . 87.IR fs .
90The 
91.B \-\^\-csv 
92option causes 
93.I nawk 
94to process records using (more or less) standard comma-separated values 
95(CSV) format. 
96.PP 88.PP
97An input line is normally made up of fields separated by white space, 89An input line is normally made up of fields separated by white space,
98or by the regular expression 90or by the regular expression
99.BR FS . 91.BR FS .
100The fields are denoted 92The fields are denoted
101.BR $1 , 93.BR $1 ,
102.BR $2 , 94.BR $2 ,
103\&..., while 95\&..., while
104.B $0 96.B $0
105refers to the entire line. 97refers to the entire line.
106If 98If
107.BR FS 99.BR FS
108is null, the input line is split into one field per character. 100is null, the input line is split into one field per character.
@@ -209,45 +201,45 @@ The built-in function @@ -209,45 +201,45 @@ The built-in function
209flushes any buffered output for the file or pipe 201flushes any buffered output for the file or pipe
210.IR expr . 202.IR expr .
211.PP 203.PP
212The mathematical functions 204The mathematical functions
213.BR atan2 , 205.BR atan2 ,
214.BR cos , 206.BR cos ,
215.BR exp , 207.BR exp ,
216.BR log , 208.BR log ,
217.BR sin , 209.BR sin ,
218and 210and
219.B sqrt 211.B sqrt
220are built in. 212are built in.
221Other built-in functions: 213Other built-in functions:
222.TF "\fBlength(\fR[\fIv\^\fR]\fB)\fR" 214.TF length
223.TP 215.TP
224\fBlength(\fR[\fIv\^\fR]\fB)\fR 216.B length
225the length of its argument 217the length of its argument
226taken as a string, 218taken as a string,
227number of elements in an array for an array argument, 219number of elements in an array for an array argument,
228or length of 220or length of
229.B $0 221.B $0
230if no argument. 222if no argument.
231.TP 223.TP
232.B rand() 224.B rand
233random number on [0,1). 225random number on [0,1).
234.TP 226.TP
235\fBsrand(\fR[\fIs\^\fR]\fB)\fR 227.B srand
236sets seed for 228sets seed for
237.B rand 229.B rand
238and returns the previous seed. 230and returns the previous seed.
239.TP 231.TP
240.BI int( x\^ ) 232.B int
241truncates to an integer value. 233truncates to an integer value.
242.TP 234.TP
243\fBsubstr(\fIs\fB, \fIm\fR [\fB, \fIn\^\fR]\fB)\fR 235\fBsubstr(\fIs\fB, \fIm\fR [\fB, \fIn\^\fR]\fB)\fR
244the 236the
245.IR n -character 237.IR n -character
246substring of 238substring of
247.I s 239.I s
248that begins at position 240that begins at position
249.I m 241.I m
250counted from 1. 242counted from 1.
251If no 243If no
252.IR n , 244.IR n ,
253use the rest of the string. 245use the rest of the string.
@@ -396,37 +388,37 @@ Regular expressions may also occur in @@ -396,37 +388,37 @@ Regular expressions may also occur in
396relational expressions, using the operators 388relational expressions, using the operators
397.B ~ 389.B ~
398and 390and
399.BR !~ . 391.BR !~ .
400.BI / re / 392.BI / re /
401is a constant regular expression; 393is a constant regular expression;
402any string (constant or variable) may be used 394any string (constant or variable) may be used
403as a regular expression, except in the position of an isolated regular expression 395as a regular expression, except in the position of an isolated regular expression
404in a pattern. 396in a pattern.
405.PP 397.PP
406A pattern may consist of two patterns separated by a comma; 398A pattern may consist of two patterns separated by a comma;
407in this case, the action is performed for all lines 399in this case, the action is performed for all lines
408from an occurrence of the first pattern 400from an occurrence of the first pattern
409through an occurrence of the second, inclusive. 401though an occurrence of the second.
410.PP 402.PP
411A relational expression is one of the following: 403A relational expression is one of the following:
412.IP 404.IP
413.I expression matchop regular-expression 405.I expression matchop regular-expression
414.br 406.br
415.I expression relop expression 407.I expression relop expression
416.br 408.br
417.IB expression " in " array-name 409.IB expression " in " array-name
418.br 410.br
419.BI ( expr ,\| expr ,\| ... ") in " array-name 411.BI ( expr , expr,... ") in " array-name
420.PP 412.PP
421where a 413where a
422.I relop 414.I relop
423is any of the six relational operators in C, 415is any of the six relational operators in C,
424and a 416and a
425.I matchop 417.I matchop
426is either 418is either
427.B ~ 419.B ~
428(matches) 420(matches)
429or 421or
430.B !~ 422.B !~
431(does not match). 423(does not match).
432A conditional is an arithmetic expression, 424A conditional is an arithmetic expression,
@@ -506,27 +498,27 @@ is treated as a regular expression, and  @@ -506,27 +498,27 @@ is treated as a regular expression, and
506separated by text matching the expression. 498separated by text matching the expression.
507.TP 499.TP
508.B RSTART 500.B RSTART
509the start position of a string matched by 501the start position of a string matched by
510.BR match . 502.BR match .
511.TP 503.TP
512.B SUBSEP 504.B SUBSEP
513separates multiple subscripts (default 034). 505separates multiple subscripts (default 034).
514.PD 506.PD
515.PP 507.PP
516Functions may be defined (at the position of a pattern-action statement) thus: 508Functions may be defined (at the position of a pattern-action statement) thus:
517.IP 509.IP
518.B 510.B
519function foo(a, b, c) { ... } 511function foo(a, b, c) { ...; return x }
520.PP 512.PP
521Parameters are passed by value if scalar and by reference if array name; 513Parameters are passed by value if scalar and by reference if array name;
522functions may be called recursively. 514functions may be called recursively.
523Parameters are local to the function; all other variables are global. 515Parameters are local to the function; all other variables are global.
524Thus local variables may be created by providing excess parameters in 516Thus local variables may be created by providing excess parameters in
525the function definition. 517the function definition.
526.SH ENVIRONMENT VARIABLES 518.SH ENVIRONMENT VARIABLES
527If 519If
528.B POSIXLY_CORRECT 520.B POSIXLY_CORRECT
529is set in the environment, then 521is set in the environment, then
530.I nawk 522.I nawk
531follows the POSIX rules for 523follows the POSIX rules for
532.B sub 524.B sub
@@ -572,39 +564,38 @@ Print all lines between start/stop pairs @@ -572,39 +564,38 @@ Print all lines between start/stop pairs
572.nf 564.nf
573BEGIN { # Simulate echo(1) 565BEGIN { # Simulate echo(1)
574 for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i] 566 for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
575 printf "\en" 567 printf "\en"
576 exit } 568 exit }
577.fi 569.fi
578.EE 570.EE
579.SH SEE ALSO 571.SH SEE ALSO
580.IR grep (1), 572.IR grep (1),
581.IR lex (1), 573.IR lex (1),
582.IR sed (1) 574.IR sed (1)
583.br 575.br
584A. V. Aho, B. W. Kernighan, P. J. Weinberger, 576A. V. Aho, B. W. Kernighan, P. J. Weinberger,
585.IR "The NAWK Programming Language, Second Edition" , 577.IR "The NAWK Programming Language" ,
586Addison-Wesley, 2024. ISBN 978-0-13-826972-2, 0-13-826972-6. 578Addison-Wesley, 1988. ISBN 0-201-07981-X.
587.SH BUGS 579.SH BUGS
588There are no explicit conversions between numbers and strings. 580There are no explicit conversions between numbers and strings.
589To force an expression to be treated as a number add 0 to it; 581To force an expression to be treated as a number add 0 to it;
590to force it to be treated as a string concatenate 582to force it to be treated as a string concatenate
591\&\f(CW""\fP to it. 583\&\f(CW""\fP to it.
592.PP 584.PP
593The scope rules for variables in functions are a botch; 585The scope rules for variables in functions are a botch;
594the syntax is worse. 586the syntax is worse.
595.PP 587.PP
596Input is expected to be UTF-8 encoded. Other multibyte 588Only eight-bit characters sets are handled correctly.
597character sets are not handled. 
598.SH UNUSUAL FLOATING-POINT VALUES 589.SH UNUSUAL FLOATING-POINT VALUES
599.I Nawk 590.I Nawk
600was designed before IEEE 754 arithmetic defined Not-A-Number (NaN) 591was designed before IEEE 754 arithmetic defined Not-A-Number (NaN)
601and Infinity values, which are supported by all modern floating-point 592and Infinity values, which are supported by all modern floating-point
602hardware. 593hardware.
603.PP 594.PP
604Because 595Because
605.I nawk 596.I nawk
606uses 597uses
607.IR strtod (3) 598.IR strtod (3)
608and 599and
609.IR atof (3) 600.IR atof (3)
610to convert string values to double-precision floating-point values, 601to convert string values to double-precision floating-point values,