Sat Aug 20 13:32:06 2022 UTC ()
pkgtools/distlint: add initial draft document, no package yet
(rillig)
diff -r0 -r1.1 pkgsrc/pkgtools/distlint/files/README.md
$NetBSD: README.md,v 1.1 2022/08/20 13:32:06 rillig Exp $
# Introduction
Distlint ensures that the distfiles on the TNF servers conform to the
license requirements.
Distfiles distributed under the GPL must be kept available for as long
as a binary package based on this distfile is distributed, plus 3
years.<sup>[citation needed]</sup>
Distfiles from packages with `NO_SRC_ON_FTP` must not be available at all.
Edge case: Imagine a package having `NO_SRC_ON_FTP` and multiple distfiles.
Some of them must not be available, the others have license GPL.
# Configuration
Distlint is configured by the `distlint.conf` file, which contains one
or more distdir sections. Each such section configures how a single
distdir is related to the directories for pkgsrc installations and
binary package directories:
~~~text
# Each distdir can be populated by several pkgsrc versions, such as
# pkgsrc-current and the quarterly branches.
# Each distdir can be the source for multiple distributions of binary
# packages, for example for different platforms.
distdir /usr/pkgsrc/distfiles
database /var/db/distlint/main
pkgsrc /usr/pkgsrc-current
pkgsrc /usr/pkgsrc-2022Q2
pkgsrc /usr/pkgsrc-2022Q1
packages /usr/pkgsrc/packages
packages /usr/pkgsrc/current-packages
distdir /pub/pkgsrc-archive/distfiles
database /var/db/distlint/archive
pkgsrc /pub/pkgsrc-archive/pkgsrc
packages /pub/pkgsrc-archive/packages
~~~
# Infrastructure overview
* https://cdn.netbsd.org/pub/pkgsrc/distfiles/
* https://cdn.netbsd.org/pub/pkgsrc/packages/
* https://archive.netbsd.org/pub/pkgsrc-archive/distfiles/
* https://archive.netbsd.org/pub/pkgsrc-archive/packages/
# Approach
Distlint maintains a database of distfile requirements.
The requirements are collected from all pkgsrc branches that are either
current or in the archive.
## Examples of database entries
_$distfile_ must not be in distfiles, because on _$updated_at_,
it belonged to package _$pkgname_ in _$pkgpath_,
which was marked as `NO_SRC_ON_FTP` because _$no_src_on_ftp_.
_$distfile_ must be kept in distfiles until _$keep_until_,
because on _$updated_at_, it belonged to package _$pkgname_,
which is published at _$publish_url_ and licensed under _$license_.
# Implementation details
## NO_SRC_ON_FTP
To find out whether a binary package has `NO_SRC_ON_FTP`, look at its
`+BUILD_INFO`.
## Find out the distfiles of a binary package
For most binary packages, the file `+BUILD_VERSION` contains the CVS
revision information of the `distinfo` file.
Some packages use `DISTINFO_FILE` to refer to a `distinfo` file outside
their PKGPATH. The CVS revision information for these `distinfo` files is
not recorded anywhere.
Some packages have no `distinfo` file at all because they are self-contained.
Example: pkgtools/lintpkgsrc.
Whether a binary package had a `distinfo` file or not is not visible from
looking at the binary package alone.
Using the CVS revision information of the `distinfo` file,
its file list can be retrieved from CVS.
# Quick hacks
## Find distfiles with NO_SRC_ON_FTP
This program finds most distfiles with NO_SRC_ON_FTP that are referenced
from the current pkgsrc tree.
Shortcomings:
* It does not find distfiles from stable pkgsrc branches.
* It does not find distfiles from previous versions of the packages.
* It does not find distfiles from packages with `DISTINFO_FILE`.
~~~shell
ssh ftp.netbsd.org
cd /pub/pkgsrc/current/pkgsrc
for pkgpath in $(grep -r NO_SRC_ON_FTP . 2>/dev/null | cut -d/ -f2-3); do
if [ -f "$pkgpath/distinfo" ] &&
! grep -r MASTER_SITE_LOCAL "$pkgpath" >/dev/null 2>&1; then
sed -n 's,^Size (\(.*\)) =.*$,\1,p' "$pkgpath/distinfo" |
while read distfile; do
if [ -f "/pub/pkgsrc/distfiles/$distfile" ]; then
echo "$distfile"
fi
done
fi
done | sort
~~~