Wed Jan 16 00:07:49 2019 UTC

graphics/tesseract: update DESCR

The DESCR was about a decade out of date; revise it to reflect 4.0.


(gutteridge)

diff -r1.1.1.1 -r1.2 pkgsrc/graphics/tesseract/DESCR

--- pkgsrc/graphics/tesseract/DESCR 2007/05/18 06:39:27 1.1.1.1
+++ pkgsrc/graphics/tesseract/DESCR 2019/01/16 00:07:49 1.2

@@ -1,9 +1,8 @@
-This code is a raw OCR engine. It has NO PAGE LAYOUT ANALYSIS, NO
-OUTPUT FORMATTING, and NO UI. It can only process an image of a
-single column and create text from it. It can detect fixed pitch
-vs proportional text.  Having said that, in 1995, this engine was
-in the top 3 in terms of character accuracy, and it compiles and
-runs on both Linux and Windows. Another current limitation is that
-it only recognizes English and its character set is only US-ASCII.
-Training code IS included in the open source release however, and
-will be included in a future release.
+Tesseract provides an OCR engine and a command line program. It
+includes a new neural net (LSTM) based OCR engine which is focused on
+line recognition, but also still provides a legacy OCR engine which
+works by recognizing character patterns. Tesseract has Unicode (UTF-8)
+support, and can recognize more than 100 languages "out of the box".
+Tesseract can be trained to recognize other languages. It supports
+various output formats: plain text, hOCR (HTML), PDF,
+invisible-text-only PDF, and TSV.