Wed Jan 16 00:07:49 2019 UTC
graphics/tesseract: update DESCR

The DESCR was about a decade out of date, revise to reflect 4.0.


(gutteridge)
diff -r1.1.1.1 -r1.2 pkgsrc/graphics/tesseract/DESCR

cvs diff -r1.1.1.1 -r1.2 pkgsrc/graphics/tesseract/DESCR

--- pkgsrc/graphics/tesseract/DESCR 2007/05/18 06:39:27 1.1.1.1
+++ pkgsrc/graphics/tesseract/DESCR 2019/01/16 00:07:49 1.2
@@ -1,9 +1,8 @@
-This code is a raw OCR engine. It has NO PAGE LAYOUT ANALYSIS, NO
-OUTPUT FORMATTING, and NO UI. It can only process an image of a
-single column and create text from it. It can detect fixed pitch
-vs proportional text. Having said that, in 1995, this engine was
-in the top 3 in terms of character accuracy, and it compiles and
-runs on both Linux and Windows. Another current limitation is that
-it only recognizes English and its character set is only US-ASCII.
-Training code IS included in the open source release however, and
-will be included in a future release.
+Tesseract provides an OCR engine and a command line program. It
+includes a new neural net (LSTM) based OCR engine which is focused on
+line recognition, but also still provides a legacy OCR engine which
+works by recognizing character patterns. Tesseract has Unicode (UTF-8)
+support, and can recognize more than 100 languages "out of the box".
+Tesseract can be trained to recognize other languages. It supports
+various output formats: plain text, hOCR (HTML), PDF,
+invisible-text-only PDF, and TSV.