---
- branch: MAIN
  date: Wed Aug 16 23:38:35 UTC 2017
  files:
    - new: '1.1'
      old: '0'
      path: othersrc/external/bsd/agcre/dist/internal.h
      pathrev: othersrc/external/bsd/agcre/dist/internal.h@1.1
      type: added
  id: 20170816T233835Z.44b1b12119c9f2960f4cb4da5f18d0f35eba78cd
  log: |
    Just what this world needs - another regexp library. However, for
    something I was doing, I needed a regexp library in C, BSD-licensed,
    and able to be exposed to a wide range of expressions, some better
    controlled than others.

    The resulting library is libagcre, which implements regular expression
    compilation and execution. It uses the Pike Virtual Machine approach,
    and features:

    + standard POSIX features, where sane
    + some/most Perl escapes
    + lazy matching via '?'
    + non-capturing parentheses (?:...)
    + in-expression case-insensitive directives (?i)...(?-i) are supported
    + all case-insensitivity is applied at expression execution time.
      Case-insensitivity can be specified at expression compile time
      and, if so, it will be remembered. But the expression itself, once
      compiled, can be used to match in both a case-sensitive and a
      case-insensitive manner.
    + UTF-8 is supported both for expressions and for input text when
      matching
    + Unicode escapes (in the Java format of \uABCD) are supported
    + exact and bounded repetition specifiers, {N} and {N,M}, are supported
    + backreferences are supported
    + UTF-16 (LE and BE) and UTF-32 (LE and BE) are supported, both for
      the expression and for the input being searched
    + at the most basic level, individual 32-bit Unicode characters are
      matched
    + an egrep/grep implementation for matching Unicode regexps is
      included

    A simple implementation of sets is used to provide inclusion and
    exclusion information for Unicode characters, with the data taken
    directly from unicode.org. No bitmasks are used: each range is
    specified by a lower and an upper bound on the codepoints. Callbacks
    can also be added to these sets, to provide functionality similar to
    the ctype macros across the whole Unicode character set.

    The standard regular expression basic3 torture test is passed, with
    4 known (and, I'd argue, incorrect) results flagged. As expected,
    the expression '(a?){9999}aaaaaaaaaaaaaaaaaaaaaaaaaaaaa' matches
    in linear time, as does the expression
    '((((((((((((((((((((((((((((((x))))))))))))))))))))))))))))))'

        % time agcre '(a?){9999}aaaaaaaaaaaaaaaaaaaaaaaaaaaaa' dist/tests/2.in
        aaaaaaaaaaaaaaaaaaaaaaaaaaaaa
        0.063u 0.000s 0:00.06 100.0% 0+0k 0+0io 0pf+0w
        % time egrep '(a?){9999}aaaaaaaaaaaaaaaaaaaaaaaaaaaaa' dist/tests/2.in
        ^C88.462u 0.730s 1:29.21 99.9% 0+0k 0+0io 0pf+0w
        %

    The library and the agcre utility have been run through valgrind to
    confirm that there are no memory leaks.

    In general, the emphasis is on a modern, predictable, VM-style,
    well-featured regexp library, in C, with a BSD license. In
    particular, sljit has not been used to speed up matching on specific
    platforms; most Perl regexp features are supported, as are
    backreferences, UTF-8, UTF-16 and UTF-32.

    Once again, I wouldn't expect anyone to use this as the main engine
    in egrep. But I am always amazed at the uses for some of the things
    that I write.

    For more information about the Pike VM, and a comparison with other
    regexp implementations, please see:

        https://swtch.com/~rsc/regexp/regexp2.html

    Alistair Crooks
    Tue Aug 15 07:43:34 PDT 2017
  module: othersrc
  subject: 'CVS commit: othersrc/external/bsd/agcre/dist'
  unixtime: '1502926715'
  user: agc
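The linear-time behaviour of '(a?){9999}...' described in the commit message comes from the way a Pike-style VM runs: all threads advance over the input in lockstep, and each program counter is added to the thread list at most once per input position, so the work per input byte is bounded by the program size instead of by the number of ways the a? choices can combine. The C sketch below is a minimal illustration under stated assumptions, not agcre's code. The three-instruction bytecode (CHAR, SPLIT, MATCH) and the names addthread, vmmatch and build are hypothetical. The program for (a?){N}a{M} is hand-assembled rather than compiled from a pattern, and matching is anchored at both ends. Capture tracking, which is what distinguishes a full Pike VM from this plain lockstep NFA simulation, is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bytecode for illustration only; agcre's instruction set differs. */
enum op { CHAR, SPLIT, MATCH };

struct inst {
	enum op	op;
	int	c;	/* CHAR: byte value to match */
	int	x, y;	/* SPLIT: successor pcs, x preferred */
};

struct list {
	int	*pc;	/* pcs of the threads alive at one input position */
	int	 n;
};

/*
 * Add pc to l, following SPLITs. mark[pc] == step means pc was already
 * visited at this input position, so each pc enters a list at most once
 * per step: this is what bounds the work per input byte.
 */
static void
addthread(const struct inst *prog, struct list *l, int *mark, int step, int pc)
{
	if (mark[pc] == step)
		return;
	mark[pc] = step;
	if (prog[pc].op == SPLIT) {
		addthread(prog, l, mark, step, prog[pc].x);
		addthread(prog, l, mark, step, prog[pc].y);
	} else {
		l->pc[l->n++] = pc;
	}
}

/* Anchored match of the whole of s: returns 1 on a match, else 0. */
static int
vmmatch(const struct inst *prog, int len, const char *s)
{
	struct list	 cur, next, tmp;
	size_t		 i, slen = strlen(s);
	int		*mark, step = 1, matched = 0, k;

	cur.pc = malloc(len * sizeof(int));
	next.pc = malloc(len * sizeof(int));
	mark = calloc(len, sizeof(int));	/* 0 = never visited; steps start at 1 */
	cur.n = 0;
	addthread(prog, &cur, mark, step, 0);
	for (i = 0; ; i++) {
		next.n = 0;
		step++;
		/* advance every live thread over s[i], all in lockstep */
		for (k = 0; k < cur.n; k++) {
			const struct inst *ip = &prog[cur.pc[k]];

			if (ip->op == MATCH && i == slen)
				matched = 1;
			else if (ip->op == CHAR && i < slen &&
			    (unsigned char)s[i] == ip->c)
				addthread(prog, &next, mark, step, cur.pc[k] + 1);
		}
		if (i == slen || next.n == 0)
			break;	/* end of input, or every thread has died */
		tmp = cur; cur = next; next = tmp;
	}
	free(cur.pc); free(next.pc); free(mark);
	return matched;
}

/* Build (a?){nopt}a{nreq}: nopt x (SPLIT, CHAR 'a'), nreq x CHAR 'a', MATCH. */
static struct inst *
build(int nopt, int nreq, int *lenp)
{
	int		 len = 2 * nopt + nreq + 1, pc = 0, i;
	struct inst	*prog;

	assert(nopt >= 0 && nreq >= 0);
	prog = calloc(len, sizeof(*prog));
	for (i = 0; i < nopt; i++, pc += 2) {
		prog[pc].op = SPLIT;		/* either take an 'a' or skip it */
		prog[pc].x = pc + 1;
		prog[pc].y = pc + 2;
		prog[pc + 1].op = CHAR;
		prog[pc + 1].c = 'a';
	}
	for (i = 0; i < nreq; i++, pc++) {
		prog[pc].op = CHAR;
		prog[pc].c = 'a';
	}
	prog[pc].op = MATCH;
	*lenp = len;
	return prog;
}
```

For example, build(9999, 29, &len) followed by vmmatch(prog, len, s) on a string of 29 a's has the same shape as the benchmark in the commit message: each input byte costs at most one visit per instruction, whereas a backtracking matcher can, in the worst case, try exponentially many combinations of the a? choices.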