---
- branch: MAIN
  date: Mon Mar 13 14:17:12 UTC 2023
  files:
  - new: '1.1'
    old: '0'
    path: pkgsrc/textproc/sentencepiece/DESCR
    pathrev: pkgsrc/textproc/sentencepiece/DESCR@1.1
    type: added
  - new: '1.1'
    old: '0'
    path: pkgsrc/textproc/sentencepiece/Makefile
    pathrev: pkgsrc/textproc/sentencepiece/Makefile@1.1
    type: added
  - new: '1.1'
    old: '0'
    path: pkgsrc/textproc/sentencepiece/Makefile.common
    pathrev: pkgsrc/textproc/sentencepiece/Makefile.common@1.1
    type: added
  - new: '1.1'
    old: '0'
    path: pkgsrc/textproc/sentencepiece/PLIST
    pathrev: pkgsrc/textproc/sentencepiece/PLIST@1.1
    type: added
  - new: '1.1'
    old: '0'
    path: pkgsrc/textproc/sentencepiece/buildlink3.mk
    pathrev: pkgsrc/textproc/sentencepiece/buildlink3.mk@1.1
    type: added
  - new: '1.1'
    old: '0'
    path: pkgsrc/textproc/sentencepiece/distinfo
    pathrev: pkgsrc/textproc/sentencepiece/distinfo@1.1
    type: added
  id: 20230313T141712Z.06ae483cc9155486fd0d8505e7000514e133c506
  log: |
    textproc/sentencepiece: import sentencepiece-0.1.97

    SentencePiece is an unsupervised text tokenizer and detokenizer mainly for
    Neural Network-based text generation systems where the vocabulary size is
    predetermined prior to the neural model training. SentencePiece implements
    subword units (e.g., byte-pair-encoding (BPE)) and unigram language model
    with the extension of direct training from raw sentences. SentencePiece
    allows us to make a purely end-to-end system that does not depend on
    language-specific pre/postprocessing.
  module: pkgsrc
  subject: 'CVS commit: pkgsrc/textproc/sentencepiece'
  unixtime: '1678717032'
  user: wiz