

In 2006, Tesseract was considered one of the most accurate open-source OCR engines then available.

Tesseract was in the top three OCR engines in terms of character accuracy in 1995. It is available for Linux, Windows and Mac OS X. However, due to limited resources, it is only rigorously tested by developers under Windows and Ubuntu.

Tesseract up to and including version 2 could only accept TIFF images of simple one-column text as inputs. The initial versions of Tesseract could only recognize English-language text. Tesseract v2 added six additional Western languages (French, Italian, German, Spanish, Brazilian Portuguese, Dutch). Version 3 extended language support significantly to include ideographic (Chinese and Japanese) and right-to-left (e.g. Arabic, Hebrew) languages, as well as many more scripts. New languages included Arabic, Bulgarian, Catalan, Chinese (Simplified and Traditional), Croatian, Czech, Danish, German (Fraktur script), Greek, Finnish, Hebrew, Hindi, Hungarian, Indonesian, Japanese, Korean, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak (standard and Fraktur script), Slovenian, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian and Vietnamese.

V3.04, released in July 2015, added 39 more language/script combinations, bringing the total number of supported languages to over 100. New language codes included: amh (Amharic), asm (Assamese), aze_cyrl (Azerbaijani in Cyrillic script), bod (Tibetan), bos (Bosnian), ceb (Cebuano), cym (Welsh), dzo (Dzongkha), fas (Persian), gle (Irish), guj (Gujarati), hat (Haitian and Haitian Creole), iku (Inuktitut), jav (Javanese), kat (Georgian), kat_old (Old Georgian), kaz (Kazakh), khm (Central Khmer), kir (Kyrgyz), kur (Kurdish), lao (Lao), lat (Latin), mar (Marathi), mya (Burmese), nep (Nepali), ori (Oriya), pan (Punjabi), pus (Pashto), san (Sanskrit), sin (Sinhala), srp_latn (Serbian in Latin script), syr (Syriac), tgk (Tajik), tir (Tigrinya), uig (Uyghur), urd (Urdu), uzb (Uzbek), uzb_cyrl (Uzbek in Cyrillic script), yid (Yiddish).
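The language codes listed above follow a visible pattern: a three-letter base code, optionally followed by an underscore and a script or variant suffix (as in `aze_cyrl`, `srp_latn` or `kat_old`). The minimal Python sketch below assumes that pattern holds, inferring it only from the codes in this list. It splits a code into its parts and builds an argument list in the `tesseract imagename outputbase -l LANG` form, joining several languages with `+`. The helper names `parse_lang_code` and `tesseract_command` and the file name `page.png` are hypothetical; check the manual of your installed Tesseract version for the exact options it accepts.

```python
"""Hypothetical helpers for Tesseract-style language codes (illustrative sketch)."""

from typing import List, NamedTuple, Optional


class LangCode(NamedTuple):
    base: str              # three-letter base code, e.g. "aze"
    suffix: Optional[str]  # optional script/variant suffix, e.g. "cyrl" or "old"


def parse_lang_code(code: str) -> LangCode:
    """Split a code such as 'aze_cyrl' into its base and optional suffix.

    Assumption: every code is a three-letter alphabetic base, optionally
    followed by '_' and a suffix, as in the codes listed in the text.
    """
    base, sep, suffix = code.partition("_")
    if len(base) != 3 or not base.isalpha() or (sep and not suffix):
        raise ValueError(f"unexpected language code: {code!r}")
    return LangCode(base, suffix if sep else None)


def tesseract_command(image: str, output_base: str, langs: List[str]) -> List[str]:
    """Build (but do not run) a tesseract CLI argument list.

    Multiple languages are joined with '+'. The corresponding language
    data must be installed for tesseract to actually use them.
    """
    if not langs:
        raise ValueError("at least one language code is required")
    for code in langs:
        parse_lang_code(code)  # validate each code before building the command
    return ["tesseract", image, output_base, "-l", "+".join(langs)]


if __name__ == "__main__":
    print(parse_lang_code("aze_cyrl"))  # LangCode(base='aze', suffix='cyrl')
    print(tesseract_command("page.png", "page", ["eng", "srp_latn"]))
    # ['tesseract', 'page.png', 'page', '-l', 'eng+srp_latn']
```

The function only builds the argument list, so it can be checked without Tesseract installed; on a machine that has it, the list could be passed to `subprocess.run`.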
