Received: by mail.netbsd.org (Postfix, from userid 605) id F3F9684D14; Thu, 18 Feb 2021 10:26:58 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1]) by mail.netbsd.org (Postfix) with ESMTP id 3974A84CE7 for ; Thu, 18 Feb 2021 10:26:58 +0000 (UTC)
X-Virus-Scanned: amavisd-new at netbsd.org
Received: from mail.netbsd.org ([127.0.0.1]) by localhost (mail.netbsd.org [127.0.0.1]) (amavisd-new, port 10025) with ESMTP id 8xHpGWl6y06X for ; Thu, 18 Feb 2021 10:26:57 +0000 (UTC)
Received: from cvs.NetBSD.org (ivanova.netbsd.org [199.233.217.197]) by mail.netbsd.org (Postfix) with ESMTP id 0687784CE8 for ; Thu, 18 Feb 2021 10:26:57 +0000 (UTC)
Received: by cvs.NetBSD.org (Postfix, from userid 500) id 00347FA95; Thu, 18 Feb 2021 10:26:56 +0000 (UTC)
Content-Transfer-Encoding: 7bit
Content-Type: multipart/mixed; boundary="_----------=_1613644016297820"
MIME-Version: 1.0
Date: Thu, 18 Feb 2021 10:26:56 +0000
From: "Thomas Klausner"
Subject: CVS commit: pkgsrc/textproc/libstemmer
To: pkgsrc-changes@NetBSD.org
Reply-To: wiz@netbsd.org
X-Mailer: log_accum
Message-Id: <20210218102657.00347FA95@cvs.NetBSD.org>
Sender: pkgsrc-changes-owner@NetBSD.org
List-Id:
Precedence: bulk
List-Unsubscribe:

This is a multi-part message in MIME format.

--_----------=_1613644016297820
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset="UTF-8"

Module Name: pkgsrc
Committed By: wiz
Date: Thu Feb 18 10:26:56 UTC 2021

Modified Files:
  pkgsrc/textproc/libstemmer: Makefile distinfo
  pkgsrc/textproc/libstemmer/patches: patch-GNUmakefile

Log Message:
libstemmer: update to 2.1.0.

Snowball 2.1.0 (2021-01-21)
===========================

C/C++
-----

* Fix decoding of 4-byte UTF-8 sequences in `grouping` checks.  This bug
  affected Unicode codepoints U+40000 to U+7FFFF and U+C0000 to U+FFFFF and
  doesn't affect any of the stemming algorithms we currently ship (#138,
  reported by Stephane Carrez).

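
For illustration only, and not taken from the libstemmer C runtime: a minimal
Python sketch of how a 4-byte UTF-8 sequence maps to a code point.  The helper
name `decode_utf8_4byte` is hypothetical.  Within the 4-byte range, the two
ranges named above (U+40000 to U+7FFFF and U+C0000 to U+FFFFF) are exactly the
code points with bit 18 set, which is the lowest payload bit of the lead byte.

```python
def decode_utf8_4byte(data: bytes) -> int:
    """Decode exactly one 4-byte UTF-8 sequence into a code point.

    Illustrative sketch only (hypothetical helper): it checks the byte
    patterns but does not reject overlong forms or values above U+10FFFF.
    """
    if len(data) != 4:
        raise ValueError("expected exactly 4 bytes")
    b0, b1, b2, b3 = data
    # A 4-byte lead byte has the bit pattern 11110xxx.
    if b0 & 0xF8 != 0xF0:
        raise ValueError("not a 4-byte lead byte")
    # Each continuation byte has the bit pattern 10xxxxxx.
    for b in (b1, b2, b3):
        if b & 0xC0 != 0x80:
            raise ValueError("bad continuation byte")
    # The lead byte carries bits 18-20; each continuation byte carries 6 bits.
    return (
        ((b0 & 0x07) << 18)
        | ((b1 & 0x3F) << 12)
        | ((b2 & 0x3F) << 6)
        | (b3 & 0x3F)
    )


# Round-trip the boundaries of the ranges mentioned in the release note.
for cp in (0x40000, 0x7FFFF, 0xC0000, 0xFFFFF):
    assert decode_utf8_4byte(chr(cp).encode("utf-8")) == cp
```

Python's built-in codec is used here only as a reference to check the sketch.
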
Python
------

* Fix snowballstemmer.algorithms() method (#132, reported by kkaiser).

* Update code to generate trove language classifiers for PyPI.  All the
  natural languages we previously had stemmers for have now been added to
  PyPI's list, but Armenian and Yiddish aren't on it.  Patch from Dmitry
  Shachnev.

Java
----

Code Quality Improvements
-------------------------

* Suppress GCC warning in compiler code.

* Use `const` pointers more in C runtime.

* Only use spaces for indentation in JavaScript code.  Change proposed by
  Emily Marigold Klassen in #123, and seems to be the modern JavaScript norm.

New Code Generators
-------------------

* Add Ada generator from Stephane Carrez (#135).

New Snowball Language Features
------------------------------

* `lenof` and `sizeof` can now be applied to a literal string, which can be
  useful if you want to do calculations on cursor values.

  This change actually simplifies the language a little, since you can now
  use a literal string in any read-only context which accepts a string
  variable.

Code generation improvements
----------------------------

* General:

  + Fix bugs in the code generated to handle failure of `goto`, `gopast` or
    `try` inside `setlimit` or string-`$`.  This affected all languages
    (though the issue with `try` wasn't present for C).  These bugs don't
    affect any of the stemming algorithms we currently ship.  Reported by
    Stefan Petkovic on snowball-discuss.

  + Change `hop` with a negative argument to work as documented.  The manual
    says a negative argument to hop will raise signal f, but the
    implementation for all languages was actually to move the cursor in the
    opposite direction to `hop` with a positive argument.  The implemented
    behaviour is problematic as it allows invalidating implicitly saved
    cursor values by modifying the string outside the current region, so
    we've decided it's best to fix the implementation to match the
    documentation.

    The only Snowball code we're aware of which relies on this was the
    original version of the new Yiddish stemming algorithm, which has been
    updated not to rely on this.

    The compiler now issues a warning for `hop` with a constant negative
    argument (internally now converted to `false`), and for `hop` with a
    constant zero argument (internally now converted to `true`).

  + Canonicalise `among` actions equivalent to `()` such as `(true)` which
    previously resulted in an extra case in the among, and for Python we'd
    generate invalid Python code (`if` or `elif` with an empty body).  Bug
    revealed by Assaf Urieli's Yiddish stemmer in #137.

  + Eliminate variables whose values are never used - they no longer have
    corresponding member variables, etc., and no code is generated for any
    assignments to them.

  + Don't generate anything for an unused `grouping`.

  + Stop warning "grouping X defined but not used" for a `grouping` which is
    only used to define another `grouping`.

* C/C++:

  + Store booleans in same array as integers.  This means each boolean is
    stored as an int instead of an unsigned char which means 4 bytes instead
    of 1, but we save a pointer (4 or 8 bytes) in struct SN_env which is a
    win for all the current stemmers.  For an algorithm which uses both
    integers and booleans, we also save the overhead of allocating a block on
    the heap, and potentially improve data locality.

  + Eliminate duplicate generated C comment for sliceto.

* Pascal:

  + Avoid generating unused variables.  The Pascal code generated for the
    stemmers we ship is now warning-free (tested with fpc 3.2.0).

* Python:

  + End `if`-chain with `else` where possible, avoiding a redundant test of
    the variable being switched on.  This optimisation kicks in for an
    `among` where all cases have commands.  This change seems to speed up
    `make check_python_arabic` by a few percent.

New stemming algorithms
-----------------------

* Add Serbian stemmer from stef4np (#113).

* Add Yiddish stemmer from Assaf Urieli (#137).

* Add Armenian stemmer from Astghik Mkrtchyan.  It's been on the website for
  over a decade, and included in Xapian for over 9 years without any negative
  feedback.

Behavioural changes to existing algorithms
------------------------------------------

Optimisations to existing algorithms
------------------------------------

* kraaij_pohlmann: Use `$v = limit` instead of `do (tolimit setmark v)` since
  this generates simpler code, and also matches the code other algorithm
  implementations use.

  Probably for languages like C with optimising compilers the compiler will
  generate equivalent code anyway, but e.g. for Python this should be an
  improvement.

Code clarity improvements to existing algorithms
------------------------------------------------

* hindi.sbl: Fix comment typo.

Compiler
--------

* Don't count `$x = x + 1` as initialising or using `x`, so it's now handled
  like `$x += 1` already is.

* Comments are now only included in the generated code if command-line option
  `-comments` is specified.

  The comments in the generated code are useful if you're trying to debug the
  compiler, and perhaps also if you are trying to debug your Snowball code,
  but for everyone else they just bloat the code which as the number of
  languages we support grows becomes more of an issue.

* `-parentclassname` is not only for java and csharp so don't disable it if
  those backends are disabled.

* `-syntax` now reports the value for each numeric literal.

* Report location for excessive get nesting error.

* Internally the compiler now represents negated literal numbers as a simple
  `c_number` rather than `c_neg` applied to a `c_number` with a positive
  value.  This simplifies optimisations that want to check for a constant
  numeric expression.

Build system
------------

* Link binaries with LDFLAGS if it's set, which is needed for some platforms
  (e.g. OpenEmbedded).  Patch from Andreas Müller (#120).

* Add missing dependencies of algorithms.go rule.

Testsuite
---------

* C: Add stemtest for low-level regression tests.

Documentation
-------------

* Document a C99 compiler as a requirement for building the snowball compiler
  (but the C code it generates should still work with any ISO C compiler.)

  A few declarations mixed with code crept in some time ago (which nobody's
  complained about), so this is really just formally documenting a
  requirement which already existed.

* README: Explain what Snowball is and what Stemming is (#131, reported by
  Sean Kelly).

* CONTRIBUTING.rst: Expand section on adding a new generator.

* For Python snowballstemmer module include global NEWS instead of
  Python-specific CHANGES.rst and use README.rst as the long description.
  Patch from Dmitry Shachnev (#119).

* COPYING: Update and incorporate Python backend licensing information which
  was previously in a separate file.


To generate a diff of this commit:
cvs rdiff -u -r1.2 -r1.3 pkgsrc/textproc/libstemmer/Makefile
cvs rdiff -u -r1.1 -r1.2 pkgsrc/textproc/libstemmer/distinfo
cvs rdiff -u -r1.1 -r1.2 pkgsrc/textproc/libstemmer/patches/patch-GNUmakefile

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

--_----------=_1613644016297820
Content-Disposition: inline
Content-Length: 3985
Content-Transfer-Encoding: binary
Content-Type: text/x-diff; charset=us-ascii

Modified files:

Index: pkgsrc/textproc/libstemmer/Makefile
diff -u pkgsrc/textproc/libstemmer/Makefile:1.2 pkgsrc/textproc/libstemmer/Makefile:1.3
--- pkgsrc/textproc/libstemmer/Makefile:1.2 Mon Aug 31 18:11:43 2020
+++ pkgsrc/textproc/libstemmer/Makefile Thu Feb 18 10:26:56 2021
@@ -1,8 +1,7 @@
-# $NetBSD: Makefile,v 1.2 2020/08/31 18:11:43 wiz Exp $
+# $NetBSD: Makefile,v 1.3 2021/02/18 10:26:56 wiz Exp $
 
-DISTNAME= snowball-2.0.0
-PKGNAME= libstemmer-2.0.0
-PKGREVISION= 1
+DISTNAME= snowball-2.1.0
+PKGNAME= ${DISTNAME:S/snowball/libstemmer/}
 CATEGORIES= textproc
 MASTER_SITES= ${MASTER_SITE_GITHUB:=snowballstem/}
 GITHUB_PROJECT= snowball

Index: pkgsrc/textproc/libstemmer/distinfo
diff -u pkgsrc/textproc/libstemmer/distinfo:1.1 pkgsrc/textproc/libstemmer/distinfo:1.2
--- pkgsrc/textproc/libstemmer/distinfo:1.1 Tue Apr 14 14:07:50 2020
+++ pkgsrc/textproc/libstemmer/distinfo Thu Feb 18 10:26:56 2021
@@ -1,8 +1,8 @@
-$NetBSD: distinfo,v 1.1 2020/04/14 14:07:50 ryoon Exp $
+$NetBSD: distinfo,v 1.2 2021/02/18 10:26:56 wiz Exp $
 
-SHA1 (snowball-2.0.0.tar.gz) = b152bbebca34505d963f3cfb6b726859d5b83b66
-RMD160 (snowball-2.0.0.tar.gz) = f5dc4e6caeb65120eeb36d9f45dd758a8024c881
-SHA512 (snowball-2.0.0.tar.gz) = 7da7c653d41bf03f3fb2f0b4a8963572fc97319fe44e82c1fc7882ba440e60e5947ed7fb722f7e78592d5ea862e3d733880f9f656236e40c1d5306e70a80a1b1
-Size (snowball-2.0.0.tar.gz) = 179986 bytes
-SHA1 (patch-GNUmakefile) = dc58eaec3de72fb93cf2393631b1bdc7d31be7cf
+SHA1 (snowball-2.1.0.tar.gz) = 4a4c82c1619052442bd2049f7d12c4afa752e524
+RMD160 (snowball-2.1.0.tar.gz) = ecdc9606e494447e1f85ff89076f45cec9f0a3dd
+SHA512 (snowball-2.1.0.tar.gz) = 1efd7d8ab58852987e83247048244882c517e32237c8cb3c0558b66ecfb075733ce8805ebb76041e6e7d6664c236054effe66838e7c524ee529ce869aa8134f0
+Size (snowball-2.1.0.tar.gz) = 220324 bytes
+SHA1 (patch-GNUmakefile) = 0a0c0a1760338fc55374e88b4ab853b47dc24ea0
 SHA1 (patch-libstemmer_symbol.map) = 0122f03d0ac54dae908ffd873f1ae4a6e502a56f

Index: pkgsrc/textproc/libstemmer/patches/patch-GNUmakefile
diff -u pkgsrc/textproc/libstemmer/patches/patch-GNUmakefile:1.1 pkgsrc/textproc/libstemmer/patches/patch-GNUmakefile:1.2
--- pkgsrc/textproc/libstemmer/patches/patch-GNUmakefile:1.1 Tue Apr 14 14:07:50 2020
+++ pkgsrc/textproc/libstemmer/patches/patch-GNUmakefile Thu Feb 18 10:26:56 2021
@@ -1,10 +1,10 @@
-$NetBSD: patch-GNUmakefile,v 1.1 2020/04/14 14:07:50 ryoon Exp $
+$NetBSD: patch-GNUmakefile,v 1.2 2021/02/18 10:26:56 wiz Exp $
 
 * Build dynamic library, from archlinux.
 
---- GNUmakefile.orig 2019-10-02 03:27:17.000000000 +0000
+--- GNUmakefile.orig 2021-01-21 04:50:09.000000000 +0000
 +++ GNUmakefile
-@@ -151,10 +151,10 @@ C_OTHER_OBJECTS = $(C_OTHER_SOURCES:.c=.
+@@ -162,10 +162,10 @@ C_OTHER_OBJECTS = $(C_OTHER_SOURCES:.c=.
  JAVA_CLASSES = $(JAVA_SOURCES:.java=.class)
  JAVA_RUNTIME_CLASSES=$(JAVARUNTIME_SOURCES:.java=.class)
  
@@ -18,16 +18,7 @@ $NetBSD: patch-GNUmakefile,v 1.1 2020/04
  
  clean:
  	rm -f $(COMPILER_OBJECTS) $(RUNTIME_OBJECTS) \
-@@ -179,7 +179,7 @@ clean:
- 	-rmdir $(js_output_dir)
- 
- snowball: $(COMPILER_OBJECTS)
--	$(CC) $(CFLAGS) -o $@ $^
-+	$(CC) $(CFLAGS) ${LDFLAGS} -o $@ $^
- 
- $(COMPILER_OBJECTS): $(COMPILER_HEADERS)
- 
-@@ -200,8 +200,11 @@ libstemmer/libstemmer.o: libstemmer/modu
+@@ -212,6 +212,9 @@ libstemmer/libstemmer.o: libstemmer/modu
  libstemmer.o: libstemmer/libstemmer.o $(RUNTIME_OBJECTS) $(C_LIB_OBJECTS)
  	$(AR) -cru $@ $^
  
@@ -35,8 +26,5 @@ $NetBSD: patch-GNUmakefile,v 1.1 2020/04
 +	$(CC) $(CFLAGS) -shared $(LDFLAGS) -Wl,-soname,libstemmer.so.0,-version-script,libstemmer/symbol.map -o $@.0.0.0 $^
 +
  stemwords: $(STEMWORDS_OBJECTS) libstemmer.o
--	$(CC) $(CFLAGS) -o $@ $^
-+	$(CC) $(CFLAGS) ${LDFLAGS} -o $@ $^
+ 	$(CC) $(CFLAGS) $(LDFLAGS) -o $@ $^
  
- csharp_stemwords: $(CSHARP_STEMWORDS_SOURCES) $(CSHARP_RUNTIME_SOURCES) $(CSHARP_SOURCES)
- 	$(MCS) -unsafe -target:exe -out:$@ $(CSHARP_STEMWORDS_SOURCES) $(CSHARP_RUNTIME_SOURCES) $(CSHARP_SOURCES)

--_----------=_1613644016297820--