Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id DF2711A9239
	for ; Wed, 24 Nov 2021 15:56:20 +0000 (UTC)
Received: by mail.netbsd.org (Postfix, from userid 605)
	id 1F04084EFB; Wed, 24 Nov 2021 15:56:20 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1])
	by mail.netbsd.org (Postfix) with ESMTP id 581CC84D66
	for ; Wed, 24 Nov 2021 15:56:19 +0000 (UTC)
X-Virus-Scanned: amavisd-new at netbsd.org
Received: from mail.netbsd.org ([127.0.0.1])
	by localhost (mail.netbsd.org [127.0.0.1]) (amavisd-new, port 10025)
	with ESMTP id 9sgqi6lCCE7Y for ;
	Wed, 24 Nov 2021 15:56:18 +0000 (UTC)
Received: from cvs.NetBSD.org (ivanova.NetBSD.org [IPv6:2001:470:a085:999:28c:faff:fe03:5984])
	by mail.netbsd.org (Postfix) with ESMTP id B074684CD9
	for ; Wed, 24 Nov 2021 15:56:18 +0000 (UTC)
Received: by cvs.NetBSD.org (Postfix, from userid 500)
	id A9C20FAEC; Wed, 24 Nov 2021 15:56:18 +0000 (UTC)
Content-Transfer-Encoding: 7bit
Content-Type: multipart/mixed; boundary="_----------=_163776937872090"
MIME-Version: 1.0
Date: Wed, 24 Nov 2021 15:56:18 +0000
From: "Thomas Klausner"
Subject: CVS commit: pkgsrc/meta-pkgs/nltk_data
To: pkgsrc-changes@NetBSD.org
Reply-To: wiz@netbsd.org
X-Mailer: log_accum
Message-Id: <20211124155618.A9C20FAEC@cvs.NetBSD.org>
Sender: pkgsrc-changes-owner@NetBSD.org
List-Id:
Precedence: bulk
List-Unsubscribe:

This is a multi-part message in MIME format.

--_----------=_163776937872090
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset="US-ASCII"

Module Name:	pkgsrc
Committed By:	wiz
Date:		Wed Nov 24 15:56:18 UTC 2021

Added Files:
	pkgsrc/meta-pkgs/nltk_data: common.mk howto.md split.py

Log Message:
nltk_data: add shared files for nltk_data packages

This also includes a tool to create these packages.

To generate a diff of this commit:
cvs rdiff -u -r0 -r1.1 pkgsrc/meta-pkgs/nltk_data/common.mk \
    pkgsrc/meta-pkgs/nltk_data/howto.md pkgsrc/meta-pkgs/nltk_data/split.py

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

--_----------=_163776937872090
Content-Disposition: inline
Content-Length: 3367
Content-Transfer-Encoding: binary
Content-Type: text/x-diff; charset=us-ascii

Added files:

Index: pkgsrc/meta-pkgs/nltk_data/common.mk
diff -u /dev/null pkgsrc/meta-pkgs/nltk_data/common.mk:1.1
--- /dev/null	Wed Nov 24 15:56:18 2021
+++ pkgsrc/meta-pkgs/nltk_data/common.mk	Wed Nov 24 15:56:18 2021
@@ -0,0 +1,24 @@
+# $NetBSD: common.mk,v 1.1 2021/11/24 15:56:18 wiz Exp $
+
+MASTER_SITES=	https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/${TYPE}/
+EXTRACT_SUFX?=	.zip
+
+MAINTAINER?=	pkgsrc-users@NetBSD.org
+HOMEPAGE?=	https://www.nltk.org/data.html
+COMMENT?=	Natural Language Toolkit (NLTK) Data
+
+INSTALLATION_DIRS+=	share/nltk_data/${TYPE}
+
+UNPACK?=	no
+
+do-build:
+
+.if ${UNPACK} == "no"
+do-install:
+	${INSTALL_DATA} ${_DISTDIR}/${DISTNAME}${EXTRACT_SUFX} ${DESTDIR}${PREFIX}/share/nltk_data/${TYPE}
+.else
+USE_TOOLS+=	pax
+
+do-install:
+	cd ${WRKDIR} && ${PAX} -pp -rw ${DISTNAME} ${DESTDIR}${PREFIX}/share/nltk_data/${TYPE}/
+.endif

Index: pkgsrc/meta-pkgs/nltk_data/howto.md
diff -u /dev/null pkgsrc/meta-pkgs/nltk_data/howto.md:1.1
--- /dev/null	Wed Nov 24 15:56:18 2021
+++ pkgsrc/meta-pkgs/nltk_data/howto.md	Wed Nov 24 15:56:18 2021
@@ -0,0 +1,21 @@
+# Sources
+
+Fetch https://www.nltk.org/nltk_data/ which is an XML file with an XSL
+stylesheet
+
+    wget -O nltk_data.xml https://www.nltk.org/nltk_data/
+
+should work.
+This file contains one line per data, as of 2021-11-24 there are 108 entries,
+and some meta package information.
+
+# Generating the packages
+
+Update the date in `split.py` and run it:
+
+    split.py
+
+It will generate one package for each entry in the list in textproc/nltk_data-${id}
+You'll then need to run 'make mdi' in each directory. If the package existed
+before, make sure that the data really changed (distinfo checksums/size differ)
+before committing.

Index: pkgsrc/meta-pkgs/nltk_data/split.py
diff -u /dev/null pkgsrc/meta-pkgs/nltk_data/split.py:1.1
--- /dev/null	Wed Nov 24 15:56:18 2021
+++ pkgsrc/meta-pkgs/nltk_data/split.py	Wed Nov 24 15:56:18 2021
@@ -0,0 +1,49 @@
+#!/usr/bin/env python3
+
+import os
+import xml.etree.ElementTree as ET
+
+tree = ET.parse('nltk_data.xml')
+
+root = tree.getroot()
+
+for child in root[0]:
+    id = child.attrib["id"]
+    path = f"/usr/pkgsrc/textproc/nltk_data-{id}"
+    try:
+        os.mkdir(path)
+    except Exception:
+        pass
+    name = child.attrib["name"]
+    if "webpage" in child.attrib:
+        webpage = "HOMEPAGE=\t" + child.attrib["webpage"]
+    else:
+        webpage = ""
+    if "license" in child.attrib:
+        license = child.attrib["license"]
+    subdir = child.attrib["subdir"]
+    url = child.attrib["url"]
+    with open(path + "/Makefile", "w") as f:
+        print(f"""# $NetBSD: split.py,v 1.1 2021/11/24 15:56:18 wiz Exp $
+
+DISTNAME=	{id}
+PKGNAME=	nltk_data-{id}-20211124
+CATEGORIES=	textproc
+DIST_SUBDIR=	${{PKGNAME_NOREV}}
+
+{webpage}
+COMMENT=	NLTK Data - {name}
+#LICENSE=	{license}
+
+TYPE=		{subdir}
+
+.include "../../meta-pkgs/nltk_data/common.mk"
+.include "../../mk/bsd.pkg.mk"
+""", file=f, end='')
+    with open(path + "/DESCR", "w") as f:
+        print(f"""This package contains data for NLTK, the Natural Language Toolkit.
+
+This package contains data from/for {name}.""", file=f)
+    with open(path + "/PLIST", "w") as f:
+        print(f"""@comment $NetBSD: split.py,v 1.1 2021/11/24 15:56:18 wiz Exp $
+share/nltk/{subdir}/{id}.zip""", file=f)

--_----------=_163776937872090--
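
Two details in the committed split.py may be worth a second look. `license` is only assigned when the attribute is present, so an entry without one either raises NameError or reuses the previous entry's value. The generated PLIST also lists share/nltk/, while common.mk installs into share/nltk_data/${TYPE}. The following is a minimal sketch of how the per-entry parsing could guard against both. It is not part of the commit. The inline XML is a made-up two-entry sample whose layout is only assumed from the script's `root[0]` loop, and `plist_entry` and `describe` are hypothetical helper names. The sketch covers only the UNPACK=no case.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real index is fetched as described in howto.md.
# Layout assumed from split.py: the first child of the root holds the entries.
SAMPLE_INDEX = """<nltk_data>
  <packages>
    <package id="punkt" name="Punkt Tokenizer Models" subdir="tokenizers"
             url="https://example.invalid/punkt.zip" license="Apache-2.0"/>
    <package id="demo" name="Demo Corpus" subdir="corpora"
             url="https://example.invalid/demo.zip"/>
  </packages>
</nltk_data>"""


def plist_entry(subdir, pkg_id):
    """PLIST line matching the install directory used in common.mk."""
    return f"share/nltk_data/{subdir}/{pkg_id}.zip"


def describe(entry):
    """Collect the attributes split.py reads, with explicit defaults."""
    attrib = entry.attrib
    return {
        "id": attrib["id"],
        "name": attrib["name"],
        "subdir": attrib["subdir"],
        # .get() avoids a NameError or a stale value when the attribute is absent.
        "license": attrib.get("license", ""),
        "homepage": attrib.get("webpage", ""),
        "plist": plist_entry(attrib["subdir"], attrib["id"]),
    }


root = ET.fromstring(SAMPLE_INDEX)
packages = [describe(child) for child in root[0]]
for pkg in packages:
    print(pkg["id"], pkg["license"] or "(no license attribute)", pkg["plist"])
```

Resetting optional fields per entry keeps each generated Makefile independent of the order of entries in the index.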