Date: Thu, 27 May 2021 17:11:42 +0000
From: "Brook Milligan"
Subject: CVS commit: pkgsrc/biology/filter-fastq
To: pkgsrc-changes@NetBSD.org
Reply-To: brook@netbsd.org
Message-Id: <20210527171142.33156FA95@cvs.NetBSD.org>

Module Name:	pkgsrc
Committed By:	brook
Date:		Thu May 27 17:11:42 UTC 2021

Added Files:
	pkgsrc/biology/filter-fastq: DESCR Makefile PLIST distinfo

Log Message:
biology/filter-fastq: add filter-fastq version 0.0.0.20210527

Filter reads from a FASTQ file using a list of identifiers.

Each entry in the input FASTQ file (or files) is checked against all
entries in the identifier list. Matches are included by default, or
excluded if the --invert flag is supplied. Paired-end files are kept
consistent (in order).

This is almost certainly not the most efficient way to implement this
filtering procedure.
I tested a few different strategies and this one seemed the fastest.
Current timing with 16 processes is about 10 minutes per 1M paired
reads with gzip'd input and output, depending on the length of the
identifier list to filter by.

usage: filter_fastq.py [-h] [-i INPUT] [-1 READ1] [-2 READ2] [-p NUM_THREADS]
                       [-o OUTPUT] [-f FILTER_FILE] [-v] [--gzip]

To generate a diff of this commit:
cvs rdiff -u -r0 -r1.1 pkgsrc/biology/filter-fastq/DESCR \
    pkgsrc/biology/filter-fastq/Makefile pkgsrc/biology/filter-fastq/PLIST \
    pkgsrc/biology/filter-fastq/distinfo

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

Added files:

Index: pkgsrc/biology/filter-fastq/DESCR
diff -u /dev/null pkgsrc/biology/filter-fastq/DESCR:1.1
--- /dev/null	Thu May 27 17:11:42 2021
+++ pkgsrc/biology/filter-fastq/DESCR	Thu May 27 17:11:42 2021
@@ -0,0 +1,15 @@
+Filter reads from a FASTQ file using a list of identifiers.
+
+Each entry in the input FASTQ file (or files) is checked against all
+entries in the identifier list. Matches are included by default, or
+excluded if the --invert flag is supplied. Paired-end files are kept
+consistent (in order).
+
+This is almost certainly not the most efficient way to implement this
+filtering procedure. I tested a few different strategies and this one
+seemed the fastest. Current timing with 16 processes is about 10
+minutes per 1M paired reads with gzip'd input and output, depending on
+the length of the identifier list to filter by.
+
+usage: filter_fastq.py [-h] [-i INPUT] [-1 READ1] [-2 READ2] [-p NUM_THREADS]
+                       [-o OUTPUT] [-f FILTER_FILE] [-v] [--gzip]

Index: pkgsrc/biology/filter-fastq/Makefile
diff -u /dev/null pkgsrc/biology/filter-fastq/Makefile:1.1
--- /dev/null	Thu May 27 17:11:42 2021
+++ pkgsrc/biology/filter-fastq/Makefile	Thu May 27 17:11:42 2021
@@ -0,0 +1,32 @@
+# $NetBSD: Makefile,v 1.1 2021/05/27 17:11:42 brook Exp $
+
+PKGNAME=	filter-fastq-0.0.0.20210527
+GITHUB_PROJECT=	filter-fastq
+GITHUB_TAG=	d2c9218
+DISTNAME=	filter-fastq
+CATEGORIES=	biology
+MASTER_SITES=	${MASTER_SITE_GITHUB:=stephenfloor/}
+EXTRACT_SUFX=	.zip
+DIST_SUBDIR=	${GITHUB_PROJECT}
+
+MAINTAINER=	pkgsrc-users@NetBSD.org
+HOMEPAGE=	https://github.com/stephenfloor/filter-fastq/
+COMMENT=	Filter reads from a FASTQ file
+LICENSE=	mit
+
+WRKSRC=		${WRKDIR}/filter-fastq-d2c92182674a6d5aa257fb63eb60ac24ddb8b4a0
+USE_LANGUAGES=	# none
+NO_BUILD=	yes
+
+PYTHON_VERSIONS_ACCEPTED=	27
+
+REPLACE_PYTHON+=	filter_fastq.py
+
+INSTALLATION_DIRS+=	bin share/doc/filter_fastq
+
+do-install:
+	${INSTALL_SCRIPT} ${WRKSRC}/filter_fastq.py ${DESTDIR}${PREFIX}/bin
+	${INSTALL_DATA} ${WRKSRC}/README.md ${DESTDIR}${PREFIX}/share/doc/filter_fastq
+
+.include "../../lang/python/application.mk"
+.include "../../mk/bsd.pkg.mk"

Index: pkgsrc/biology/filter-fastq/PLIST
diff -u /dev/null pkgsrc/biology/filter-fastq/PLIST:1.1
--- /dev/null	Thu May 27 17:11:42 2021
+++ pkgsrc/biology/filter-fastq/PLIST	Thu May 27 17:11:42 2021
@@ -0,0 +1,3 @@
+@comment $NetBSD: PLIST,v 1.1 2021/05/27 17:11:42 brook Exp $
+bin/filter_fastq.py
+share/doc/filter_fastq/README.md

Index: pkgsrc/biology/filter-fastq/distinfo
diff -u /dev/null pkgsrc/biology/filter-fastq/distinfo:1.1
--- /dev/null	Thu May 27 17:11:42 2021
+++ pkgsrc/biology/filter-fastq/distinfo	Thu May 27 17:11:42 2021
@@ -0,0 +1,6 @@
+$NetBSD: distinfo,v 1.1 2021/05/27 17:11:42 brook Exp $
+
+SHA1 (filter-fastq/filter-fastq-d2c9218.zip) = 44b8bbef2690b598a2f06930396fbbf5828e364c
+RMD160 (filter-fastq/filter-fastq-d2c9218.zip) = 715b0e52b5714cea1fa4a64bfe8cbef919cee2ce
+SHA512 (filter-fastq/filter-fastq-d2c9218.zip) = c5ab23b86ac8690f58bf05bd0a16f3b315bd7a71f67bce267fe9f36b5e528ac228c57c2521cad8c547159915cf77433848be58d463100f407693927493ad8f5f
+Size (filter-fastq/filter-fastq-d2c9218.zip) = 4249 bytes
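Each distinfo entry pairs a distfile path (relative to DISTDIR, here under the filter-fastq DIST_SUBDIR) with either a digest or the file size. pkgsrc checks these itself when fetching (for example via `make checksum`); the Python sketch below only illustrates what is being compared. The function names parse_distinfo and verify are hypothetical, the line format is inferred from the distinfo shown above, and RMD160 is skipped because hashlib may not provide RIPEMD-160 in every Python build.

```python
"""Sketch: parse distinfo-style lines and check a local file against them.

Hypothetical helpers for illustration; pkgsrc performs this check itself.
"""
import hashlib
import os
import re

# Assumed line shapes, inferred from the distinfo file above:
#   ALGO (path) = hexdigest
#   Size (path) = N bytes
LINE_RE = re.compile(r"^(\w+) \((.+)\) = (\S+)(?: bytes)?$")


def parse_distinfo(text):
    """Return {path: {algorithm_or_Size: value}}, ignoring other lines."""
    entries = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            algo, path, value = match.groups()
            entries.setdefault(path, {})[algo] = value
    return entries


def verify(path, expected):
    """Return the names of the checks that fail (empty list if all pass).

    Only Size, SHA1 and SHA512 are checked; RMD160 is skipped because
    hashlib may not offer RIPEMD-160 in every Python build.
    """
    failed = []
    if "Size" in expected and os.path.getsize(path) != int(expected["Size"]):
        failed.append("Size")
    for algo in ("SHA1", "SHA512"):
        if algo not in expected:
            continue
        digest = hashlib.new(algo.lower())
        with open(path, "rb") as fh:
            # Hash in chunks so large distfiles need not fit in memory.
            for chunk in iter(lambda: fh.read(65536), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected[algo]:
            failed.append(algo)
    return failed
```

Run against the committed distinfo file, `parse_distinfo(text)["filter-fastq/filter-fastq-d2c9218.zip"]["Size"]` gives `"4249"`, and passing that dictionary to `verify` together with the downloaded zip should return an empty list if the distfile is intact.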
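The filtering behaviour described in DESCR (matching reads are kept, --invert drops them instead, and paired-end files stay in step) can be illustrated with a small Python sketch. It is not the upstream filter_fastq.py code: the helper names are hypothetical, it uses a single-process exact-match set lookup rather than checking each read against every list entry, and it assumes unwrapped four-line FASTQ records whose identifier is the first header token with any /1 or /2 mate suffix removed.

```python
"""Sketch: keep or drop FASTQ read pairs by identifier.

Hypothetical helpers; this is not the filter_fastq.py source. It uses a
single-process set lookup instead of comparing each read against every
identifier, and assumes unwrapped 4-line FASTQ records.
"""
import gzip
import io
from itertools import islice, zip_longest


def open_text(path, mode="rt"):
    """Open a plain or gzip-compressed text file, chosen by a .gz suffix."""
    if path.endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)


def read_ids(path):
    """Load one identifier per line, skipping blank lines."""
    with open_text(path) as fh:
        return {line.strip() for line in fh if line.strip()}


def records(fh):
    """Yield FASTQ records as lists of four lines."""
    while True:
        rec = list(islice(fh, 4))
        if not rec:
            return
        if len(rec) != 4:
            raise ValueError("truncated FASTQ record")
        yield rec


def read_id(header):
    """Assumed ID rule: first token after '@', minus a /1 or /2 mate suffix."""
    token = header[1:].split()[0]
    if token.endswith(("/1", "/2")):
        token = token[:-2]
    return token


def filter_pairs(r1_in, r2_in, r1_out, r2_out, ids, invert=False):
    """Write pairs whose ID is in ids (or not in ids, if invert is True).

    Mates are read in lockstep, so kept pairs stay in the same order in both
    outputs. Returns the number of pairs written.
    """
    kept = 0
    for rec1, rec2 in zip_longest(records(r1_in), records(r2_in)):
        if rec1 is None or rec2 is None:
            raise ValueError("paired files have different record counts")
        rid = read_id(rec1[0])
        if rid != read_id(rec2[0]):
            raise ValueError("paired files are out of sync at " + rid)
        # Keep on a match normally; keep on a non-match when inverted.
        if (rid in ids) != invert:
            r1_out.writelines(rec1)
            r2_out.writelines(rec2)
            kept += 1
    return kept


if __name__ == "__main__":
    # In-memory demo: keep only the pair whose ID is "b".
    r1 = io.StringIO("@a/1\nACGT\n+\nIIII\n@b/1\nTTTT\n+\nIIII\n")
    r2 = io.StringIO("@a/2\nCCCC\n+\nIIII\n@b/2\nGGGG\n+\nIIII\n")
    out1, out2 = io.StringIO(), io.StringIO()
    print(filter_pairs(r1, r2, out1, out2, {"b"}))  # prints 1
    print(out1.getvalue(), end="")
```

For gzip'd files, each input can be opened with `open_text(path)` and each output with `open_text(path, "wt")`, then passed to `filter_pairs` along with `read_ids(...)` on the identifier list (all paths here are placeholders). The set lookup makes each read's check independent of the list length; whether the real tool matches exact identifiers or substrings is not stated in DESCR, so exact matching is an assumption of this sketch.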