---
- branch: MAIN
  date: Mon Mar 13 14:18:27 UTC 2023
  files:
  - new: '1.1'
    old: '0'
    path: pkgsrc/textproc/py-sentencepiece/DESCR
    pathrev: pkgsrc/textproc/py-sentencepiece/DESCR@1.1
    type: added
  - new: '1.1'
    old: '0'
    path: pkgsrc/textproc/py-sentencepiece/Makefile
    pathrev: pkgsrc/textproc/py-sentencepiece/Makefile@1.1
    type: added
  - new: '1.1'
    old: '0'
    path: pkgsrc/textproc/py-sentencepiece/PLIST
    pathrev: pkgsrc/textproc/py-sentencepiece/PLIST@1.1
    type: added
  - new: '1.1'
    old: '0'
    path: pkgsrc/textproc/py-sentencepiece/distinfo
    pathrev: pkgsrc/textproc/py-sentencepiece/distinfo@1.1
    type: added
  id: 20230313T141827Z.77c2ec77f4579121f1b77c27f33cac93674f3416
  log: |
    textproc/py-sentencepiece: import py-sentencepiece-0.1.97

    SentencePiece is an unsupervised text tokenizer and detokenizer mainly
    for Neural Network-based text generation systems where the vocabulary
    size is predetermined prior to the neural model training.
    SentencePiece implements subword units (e.g., byte-pair-encoding
    (BPE)) and unigram language model with the extension of direct
    training from raw sentences. SentencePiece allows us to make a purely
    end-to-end system that does not depend on language-specific
    pre/postprocessing.

    This package contains the Python module.
  module: pkgsrc
  subject: 'CVS commit: pkgsrc/textproc/py-sentencepiece'
  unixtime: '1678717107'
  user: wiz