Content-Transfer-Encoding: 7bit
Content-Type: multipart/mixed; boundary="_----------=_1622135502165970"
MIME-Version: 1.0
Date: Thu, 27 May 2021 17:11:42 +0000
From: "Brook Milligan" <brook@netbsd.org>
Subject: CVS commit: pkgsrc/biology/filter-fastq
To: pkgsrc-changes@NetBSD.org
Reply-To: brook@netbsd.org
Message-Id: <20210527171142.33156FA95@cvs.NetBSD.org>
Sender: pkgsrc-changes-owner@NetBSD.org
Precedence: bulk

This is a multi-part message in MIME format.

--_----------=_1622135502165970
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset="US-ASCII"

Module Name:	pkgsrc
Committed By:	brook
Date:		Thu May 27 17:11:42 UTC 2021

Added Files:
	pkgsrc/biology/filter-fastq: DESCR Makefile PLIST distinfo

Log Message:
biology/filter-fastq: add filter-fastq version 0.0.0.20210527

Filter reads from a FASTQ file using a list of identifiers.

Each entry in the input FASTQ file (or files) is checked against all
entries in the identifier list. Matches are included by default, or
excluded if the --invert flag is supplied. Paired-end files are kept
consistent (in order).

This is almost certainly not the most efficient way to implement this
filtering procedure, but of the few strategies the upstream author
tested, this one seemed the fastest. With 16 processes, filtering
currently takes about 10 minutes per 1M paired reads with gzip'd input
and output, depending on the length of the identifier list.

usage: filter_fastq.py [-h] [-i INPUT] [-1 READ1] [-2 READ2] [-p NUM_THREADS]
                       [-o OUTPUT] [-f FILTER_FILE] [-v] [--gzip]
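
For readers unfamiliar with the technique, the include/exclude logic
described above can be sketched in a few lines of Python 3. This is an
illustration only and is not the code in filter_fastq.py: the names
read_fastq and filter_fastq are hypothetical, records are assumed to be
exactly four lines long, and the identifier is assumed to be the first
whitespace-delimited token after the leading '@'. It also uses a set
lookup rather than checking each read against every list entry, and it
omits gzip handling, paired-end files, and multiprocessing.

```python
import io


def read_fastq(handle):
    """Yield (identifier, lines) for each four-line FASTQ record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        if not header.strip():
            continue  # tolerate stray blank lines between records
        lines = [header] + [handle.readline() for _ in range(3)]
        # Assumption: the identifier is the first token after '@'.
        identifier = header[1:].split()[0]
        yield identifier, lines


def filter_fastq(handle, identifiers, invert=False):
    """Yield records that match the identifier list.

    With invert=True, yield the records that do NOT match instead.
    """
    wanted = set(identifiers)  # set membership, not a scan of the list
    for identifier, lines in read_fastq(handle):
        if (identifier in wanted) != invert:
            yield "".join(lines)


if __name__ == "__main__":
    sample = io.StringIO(
        "@read1 extra\nACGT\n+\nIIII\n"
        "@read2\nTTTT\n+\nIIII\n"
    )
    for record in filter_fastq(sample, ["read2"]):
        print(record, end="")
```

Because both functions are generators, records are streamed in input
order, which is one simple way to keep output order consistent with the
input.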


To generate a diff of this commit:
cvs rdiff -u -r0 -r1.1 pkgsrc/biology/filter-fastq/DESCR \
    pkgsrc/biology/filter-fastq/Makefile pkgsrc/biology/filter-fastq/PLIST \
    pkgsrc/biology/filter-fastq/distinfo

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.


--_----------=_1622135502165970
Content-Disposition: inline
Content-Length: 3192
Content-Transfer-Encoding: binary
Content-Type: text/x-diff; charset=us-ascii

Added files:

Index: pkgsrc/biology/filter-fastq/DESCR
diff -u /dev/null pkgsrc/biology/filter-fastq/DESCR:1.1
--- /dev/null	Thu May 27 17:11:42 2021
+++ pkgsrc/biology/filter-fastq/DESCR	Thu May 27 17:11:42 2021
@@ -0,0 +1,15 @@
+Filter reads from a FASTQ file using a list of identifiers.
+
+Each entry in the input FASTQ file (or files) is checked against all
+entries in the identifier list. Matches are included by default, or
+excluded if the --invert flag is supplied. Paired-end files are kept
+consistent (in order).
+
+This is almost certainly not the most efficient way to implement this
+filtering procedure. I tested a few different strategies and this one
+seemed the fastest. Current timing with 16 processes is about 10
+minutes per 1M paired reads with gzip'd input and output, depending on
+the length of the identifier list to filter by.
+
+usage: filter_fastq.py [-h] [-i INPUT] [-1 READ1] [-2 READ2] [-p NUM_THREADS]
+                       [-o OUTPUT] [-f FILTER_FILE] [-v] [--gzip]
Index: pkgsrc/biology/filter-fastq/Makefile
diff -u /dev/null pkgsrc/biology/filter-fastq/Makefile:1.1
--- /dev/null	Thu May 27 17:11:42 2021
+++ pkgsrc/biology/filter-fastq/Makefile	Thu May 27 17:11:42 2021
@@ -0,0 +1,32 @@
+# $NetBSD: Makefile,v 1.1 2021/05/27 17:11:42 brook Exp $
+
+PKGNAME=	filter-fastq-0.0.0.20210527
+GITHUB_PROJECT=	filter-fastq
+GITHUB_TAG=	d2c9218
+DISTNAME=	filter-fastq
+CATEGORIES=	biology
+MASTER_SITES=	${MASTER_SITE_GITHUB:=stephenfloor/}
+EXTRACT_SUFX=	.zip
+DIST_SUBDIR=	${GITHUB_PROJECT}
+
+MAINTAINER=	pkgsrc-users@NetBSD.org
+HOMEPAGE=	https://github.com/stephenfloor/filter-fastq/
+COMMENT=	Filter reads from a FASTQ file
+LICENSE=	mit
+
+WRKSRC=		${WRKDIR}/filter-fastq-d2c92182674a6d5aa257fb63eb60ac24ddb8b4a0
+USE_LANGUAGES=	# none
+NO_BUILD=	yes
+
+PYTHON_VERSIONS_ACCEPTED=	27
+
+REPLACE_PYTHON+=	filter_fastq.py
+
+INSTALLATION_DIRS+=	bin share/doc/filter_fastq
+
+do-install:
+	${INSTALL_SCRIPT} ${WRKSRC}/filter_fastq.py ${DESTDIR}${PREFIX}/bin
+	${INSTALL_DATA} ${WRKSRC}/README.md ${DESTDIR}${PREFIX}/share/doc/filter_fastq
+
+.include "../../lang/python/application.mk"
+.include "../../mk/bsd.pkg.mk"
Index: pkgsrc/biology/filter-fastq/PLIST
diff -u /dev/null pkgsrc/biology/filter-fastq/PLIST:1.1
--- /dev/null	Thu May 27 17:11:42 2021
+++ pkgsrc/biology/filter-fastq/PLIST	Thu May 27 17:11:42 2021
@@ -0,0 +1,3 @@
+@comment $NetBSD: PLIST,v 1.1 2021/05/27 17:11:42 brook Exp $
+bin/filter_fastq.py
+share/doc/filter_fastq/README.md
Index: pkgsrc/biology/filter-fastq/distinfo
diff -u /dev/null pkgsrc/biology/filter-fastq/distinfo:1.1
--- /dev/null	Thu May 27 17:11:42 2021
+++ pkgsrc/biology/filter-fastq/distinfo	Thu May 27 17:11:42 2021
@@ -0,0 +1,6 @@
+$NetBSD: distinfo,v 1.1 2021/05/27 17:11:42 brook Exp $
+
+SHA1 (filter-fastq/filter-fastq-d2c9218.zip) = 44b8bbef2690b598a2f06930396fbbf5828e364c
+RMD160 (filter-fastq/filter-fastq-d2c9218.zip) = 715b0e52b5714cea1fa4a64bfe8cbef919cee2ce
+SHA512 (filter-fastq/filter-fastq-d2c9218.zip) = c5ab23b86ac8690f58bf05bd0a16f3b315bd7a71f67bce267fe9f36b5e528ac228c57c2521cad8c547159915cf77433848be58d463100f407693927493ad8f5f
+Size (filter-fastq/filter-fastq-d2c9218.zip) = 4249 bytes


--_----------=_1622135502165970--