ABBYY has just launched ABBYY FineReader Engine 7.0, the new version of its OCR software for developers. With this software, developers can build Windows applications for ICR, OMR and barcode recognition.
PRESS RELEASE FROM ABBYY USA
FREMONT, Calif.–Sept. 17, 2003– ABBYY, developer of document recognition and linguistic technologies, today announced ABBYY FineReader Engine 7.0, the next generation of the company’s Software Development Kit (SDK) for integrating ABBYY OCR, ICR, OMR and barcode recognition technologies into Windows applications.
With the announcement of FineReader Engine 7.0, ABBYY dramatically expands the reach of its technology beyond traditional vertical markets (including finance, government and healthcare) by offering extended functionality that addresses niche projects such as library archiving and Chinese and Japanese recognition. In addition to core platform enhancements to overall accuracy, document analysis, and export functions, FineReader Engine 7.0 adds sophisticated new modules for recognition of historical texts, PDF files, invoices, barcodes, and Asian characters.
«ABBYY’s goal is to deliver recognition technologies that help organizations transform documents into manageable data that can be processed, searched, indexed, edited, sent, or tabulated. As recognition technologies become more advanced, the true technological challenge in achieving this lies in the ability to address specialized texts and document formats,» explained Vadim Tereshchenko, FineReader division vice president. «With FineReader 7.0, we offer new add-on modules offering technology breakthroughs that expand the ability of our software to perform in key vertical markets, as well as niche markets with specific conversion needs.»
With the release of ABBYY FineReader Engine 7.0, developers will gain access to the powerful functionality of a high-level OCR system which is already being used by many leading companies worldwide such as Cardiff, Kofax, Lexmark, Panasonic, Toshiba, and ZyLab. ABBYY FineReader, ABBYY’s flagship OCR application based on the FineReader Engine, has won more than 100 awards worldwide since 1998.
Platform Enhancements in 7.0
FineReader Engine 7.0 is based on an entirely new recognition platform that offers the following enhancements:
Enhancements to ABBYY’s proprietary IPA Technology and other tools for fine-tuning recognition significantly increase FineReader’s accuracy over previous versions. A major contributor to overall letter, word, and line recognition accuracy is the addition of new structural character models. In addition, new image preprocessing algorithms improve the technology’s ability to read documents with text printed over an image, low-contrast documents, and poorly scanned pages.
These accuracy gains come from further enhancements to two image preprocessing technologies that help recognize this kind of text: Adaptive Binarization and Intelligent Background Removal. Adaptive Binarization uses a «dynamic» or «intelligent threshold» technique, which tunes the image contrast line by line and word by word, optimizing character quality to achieve the most accurate recognition results. Intelligent Background Removal removes textures and background «noise» that could interfere with proper text recognition, even on complex or degraded documents.
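ABBYY does not describe how Adaptive Binarization computes its «dynamic threshold», so the following is only a minimal Python sketch of the general idea, using plain local-mean thresholding (a common textbook technique, not ABBYY's algorithm). The function name `adaptive_binarize` and its `window` and `offset` parameters are illustrative assumptions. The demo page has uneven lighting, which is exactly where a single global threshold fails.

```python
def adaptive_binarize(image, window=15, offset=10):
    """Local-mean thresholding of a grayscale image (a generic sketch).

    image: list of rows of ints, 0 = black .. 255 = white.
    A pixel becomes text (0) when it is darker than the mean of the
    window x window neighborhood around it minus `offset`; otherwise it
    becomes background (255). Windows are clipped at the image border.
    """
    h, w = len(image), len(image[0])
    # Summed-area table: integral[y][x] = sum of image[0..y-1][0..x-1],
    # so every window sum costs four lookups.
    integral = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += image[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum
    r = window // 2
    out = []
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        row = []
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            total = (integral[y1][x1] - integral[y0][x1]
                     - integral[y1][x0] + integral[y0][x0])
            local_mean = total / ((y1 - y0) * (x1 - x0))
            row.append(0 if image[y][x] < local_mean - offset else 255)
        out.append(row)
    return out


# Demo: background brightening from 60 (left) to 190 (right), with two
# strokes 50 levels darker than their local background at columns 3 and 10.
row = [60 + 10 * x for x in range(14)]
row[3] -= 50
row[10] -= 50
page = [row[:] for _ in range(3)]

adaptive = adaptive_binarize(page, window=3, offset=10)
fixed = [[0 if p < 128 else 255 for p in r] for r in page]  # one global threshold
print(adaptive[0])  # only columns 3 and 10 are 0 (text)
print(fixed[0])     # columns 0-6 and 10 are 0: the dark left side turns black
```

The summed-area table keeps the cost per pixel constant regardless of window size. Text printed over images and heavy background noise, which the release attributes to Intelligent Background Removal, are not handled by this sketch.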
Improved Document and Image Analysis
FineReader Engine 7.0 offers a new algorithm, Multilevel Document Analysis (MDA), which examines the document at several levels: characters, words, lines, and paragraphs. Ultimately, FineReader Engine reconstructs the entire document. With this sophisticated document and image analysis, FineReader Engine «understands» each formatting element of a document. As a result, applications built on FineReader Engine can retain complex layouts, such as the placement of images and columns on the page, table formatting, and font sizes. Other key benefits include more accurate recognition of complex tables, multi-column documents with images, HTML formatting, and bullet points.
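Multilevel Document Analysis itself is proprietary; as a rough illustration of what analyzing a page «from characters to words, lines, and paragraphs» means, here is a toy bottom-up grouping in Python. The `Char` record, the pixel thresholds (`line_tol`, `gap`, `para_gap`), and the `layout` helper are all hypothetical. Real layout analysis must also cope with columns, tables, images and skew, which this sketch does not.

```python
from dataclasses import dataclass


@dataclass
class Char:
    text: str
    x: int  # left edge (pixels)
    y: int  # top edge (pixels)
    w: int
    h: int


def group_lines(chars, line_tol=5):
    """Characters whose top edges lie within line_tol px of a line's first
    character join that line; each line is returned sorted left to right."""
    lines = []
    for c in sorted(chars, key=lambda c: (c.y, c.x)):
        if lines and abs(lines[-1][0].y - c.y) <= line_tol:
            lines[-1].append(c)
        else:
            lines.append([c])
    return [sorted(line, key=lambda c: c.x) for line in lines]


def group_words(line, gap=6):
    """Start a new word wherever the horizontal gap exceeds `gap` px."""
    words, current = [], [line[0]]
    for prev, c in zip(line, line[1:]):
        if c.x - (prev.x + prev.w) > gap:
            words.append(current)
            current = [c]
        else:
            current.append(c)
    words.append(current)
    return ["".join(c.text for c in word) for word in words]


def group_paragraphs(lines, para_gap=15):
    """Start a new paragraph wherever the vertical gap between lines exceeds para_gap px."""
    paras, current = [], [lines[0]]
    for prev, line in zip(lines, lines[1:]):
        prev_bottom = max(c.y + c.h for c in prev)
        if min(c.y for c in line) - prev_bottom > para_gap:
            paras.append(current)
            current = [line]
        else:
            current.append(line)
    paras.append(current)
    return paras


def analyze(chars):
    """Characters -> lines -> paragraphs, with each line split into words."""
    lines = group_lines(chars)
    return [[group_words(line) for line in para] for para in group_paragraphs(lines)]


def layout(text, x0, y, pitch=8, h=12):
    """Hypothetical test helper: monospaced 6 px glyphs on an 8 px pitch."""
    return [Char(ch, x0 + i * pitch, y, pitch - 2, h)
            for i, ch in enumerate(text) if ch != " "]


page = layout("Hello world", 0, 10) + layout("from ABBYY", 0, 26) + layout("New para", 0, 60)
print(analyze(page))
# [[['Hello', 'world'], ['from', 'ABBYY']], [['New', 'para']]]
```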
New Export and Synthesis Capabilities
ABBYY FineReader Engine 7.0 also delivers significant improvements in export and synthesis, including:
— Improved PDF Export. FineReader Engine now creates linearized PDF files that are optimized for publishing on the Web.
— Improved WYSIWYG HTML Output
The retention of complex formatting elements (like text flowing around non-rectangular images) has been improved in HTML. The resulting HTML files are now smaller in size, which is particularly important for documents published on the Internet.
— Output to Microsoft PowerPoint
— Smaller file sizes when exporting results to Microsoft Word
New Image Input Formats:
FineReader Engine now accepts JPEG 2000 files as input images.
Extended Functionality with New Add-On Modules
With the development of FineReader Engine 7.0, ABBYY focuses on fine-tuning its technology to deliver special features and functions that address niche applications. FineReader Engine’s add-on modules offer specialized functionality to support software developers, system integrators and VARs working with specific types of projects, documents or files. FineReader Engine 7.0 add-on modules include:
1. PDF Opening
ABBYY FineReader Engine 7.0 uses an intelligent opening scheme for PDF documents. It now processes PDFs as follows: it first extracts the text layer from the PDF file, then performs standard recognition on the original image from the same PDF, and finally compares the recognition results against the extracted text. This approach ensures higher recognition accuracy, particularly for PDF documents whose underlying text is unusually encoded.
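The release does not say how the text layer and the OCR result are reconciled. One plausible, hypothetical decision rule is sketched below with Python's standard `difflib`: if the extracted text agrees closely with OCR of the page image, keep the text layer; otherwise treat it as unusually encoded and fall back to OCR. The function `choose_text` and the 0.8 similarity threshold are assumptions for illustration only.

```python
from difflib import SequenceMatcher


def choose_text(text_layer, ocr_text, min_ratio=0.8):
    """Cross-check a PDF text layer against OCR of the same page image.

    Returns (source, text). When the two agree closely, the text layer is
    kept (it is exact if its encoding is sound); when they disagree, the
    text layer is assumed to be unusually encoded and the OCR text is used.
    """
    ratio = SequenceMatcher(None, text_layer, ocr_text).ratio()
    if ratio >= min_ratio:
        return "text_layer", text_layer
    return "ocr", ocr_text


ocr = "Invoice tota1: 1,250.00"          # OCR misread "l" as "1"
good_layer = "Invoice total: 1,250.00"   # correctly encoded text layer
# Simulate a font with a custom encoding: every code point shifted by one.
bad_layer = "".join(chr(ord(c) + 1) for c in good_layer)

print(choose_text(good_layer, ocr))  # ('text_layer', 'Invoice total: 1,250.00')
print(choose_text(bad_layer, ocr))   # ('ocr', 'Invoice tota1: 1,250.00')
```

A finer-grained variant could reconcile word by word instead of choosing one source per page; the page-level rule simply keeps the sketch short.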
2. FineReader XIX: Fraktur/Black Letter Script Recognition
FineReader 7.0 offers the industry’s first omnifont OCR solution for «Fraktur» or «Black Letter» print used in historical texts from the 19th and 20th centuries. FineReader recognizes elaborate, calligraphic-style typefaces as well as old-style roman characters, such as the elongated «s» used in early English and French texts. This feature, developed together with the European METAe archiving project, has been tested by leading universities. Well suited to archiving a variety of old books and documents, the FineReader XIX module includes dictionaries for German, English, French, Italian, and Spanish.
3. Extended XML Output Module
The Extended XML Output module exports recognition results tagged with document structure information, including the location of graphics, tables, paragraphs and even characters, as well as the full formatting information about characters, paragraphs and tables. Post-recognition processing makes it easy to export this information to external applications, such as document management and content management systems and databases (MS SQL Server, Oracle or MS SharePoint). XML output is offered in the following formats:
— Native XML (includes all information of the recognized document)
— Microsoft Word XML. Recognized documents can be exported as native XML files using the schema defined by Microsoft Word 2003.
— ASCII XML Output. A special ASCII XML Output module has been designed for DMS and archiving applications. The resulting files contain information about character positions and character confidence levels and can be easily indexed. The module automatically eliminates parts of the text that have a low confidence level.
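Because the XML output carries character coordinates and confidence levels, downstream indexing can be quite simple. The sketch below parses a small, made-up XML document with Python's `xml.etree.ElementTree` and drops low-confidence characters, roughly as the ASCII XML description suggests. The element and attribute names (`char`, `l`, `t`, `r`, `b`, `conf`) and the cut-off of 60 are hypothetical and do not reproduce ABBYY's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified layout; ABBYY's actual XML schema differs.
SAMPLE = """<document>
  <page number="1">
    <line>
      <char l="10" t="5" r="18" b="17" conf="97">O</char>
      <char l="19" t="5" r="27" b="17" conf="95">C</char>
      <char l="28" t="5" r="36" b="17" conf="41">R</char>
    </line>
  </page>
</document>"""


def load_chars(xml_text):
    """Collect each <char> with its bounding box (l, t, r, b) and confidence."""
    root = ET.fromstring(xml_text)
    return [
        {
            "char": el.text,
            "box": tuple(int(el.get(k)) for k in ("l", "t", "r", "b")),
            "conf": int(el.get("conf")),
        }
        for el in root.iter("char")
    ]


def drop_low_confidence(chars, min_conf=60):
    """Keep only characters whose confidence is at least min_conf."""
    return [c for c in chars if c["conf"] >= min_conf]


chars = load_chars(SAMPLE)
kept = drop_low_confidence(chars)
print(len(chars), "".join(c["char"] for c in kept))  # 3 OC
print(kept[0]["box"])                                # (10, 5, 18, 17)
```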
4. Chinese and Japanese Recognition
ABBYY FineReader Engine 7.0 now has an add-on module for Chinese (Traditional and Simplified) and Japanese (Hiragana, Katakana and Kanji) OCR. Seamlessly integrated with the core engine, this module allows developers to use FineReader Engine’s existing API (Application Programming Interface) to recognize Chinese and Japanese texts. Functions include: recognition of multi-language documents (Chinese-English and Japanese-English texts), automatic recognition of vertical and horizontal texts, automatic detection of text blocks, tables, columns and pictures on a document, manual drawing of recognition zones, detailed information about recognized characters, and export of recognized text into RTF, XML, HTML, TXT, CSV, and DBF file formats. Companies with Pan-Pacific conversion projects can benefit from this module.
5. Document Analysis for Invoices
A special OCR module developed for the financial and banking market segments, Document Analysis for Invoices can be used as a pre-processing engine for the conversion of semi-structured documents such as invoices, payment drafts, checks and transfers. In this pre-processing role, the module is designed to find as much text on these documents as possible, including characters and numbers — even if this information is located within stamps, logos or small text areas.
In contrast to standard OCR, this specialized OCR module assumes all printed information on documents is text and ensures that important text information is not incorrectly identified as graphic elements and that words or numerical values are not separated into multiple characters. As a result, a maximum of textual information is obtained from the document, including the coordinates, and is available for analysis, field-by-field processing and parsing, which are performed at subsequent processing stages by other systems.
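Field-by-field parsing is left to later systems, but a small example shows why keeping words intact and keeping their coordinates matters. This hypothetical Python sketch takes word boxes of the kind the invoice module is said to produce and returns the amount printed to the right of a label such as «Total». The `Word` record, the `AMOUNT` pattern, and the same-line test (vertical overlap of boxes) are assumptions, not part of FineReader Engine.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    left: int
    top: int
    right: int
    bottom: int


# Digits with optional thousands separators and optional 2-digit decimals.
AMOUNT = re.compile(r"^\d+(?:,\d{3})*(?:\.\d{2})?$")


def value_right_of(words, label):
    """Nearest amount-like word to the right of `label` on the same line.

    "Same line" means the two bounding boxes overlap vertically.
    """
    for anchor in (w for w in words if w.text.rstrip(":").lower() == label.lower()):
        candidates = [
            w for w in words
            if w.left > anchor.right
            and min(w.bottom, anchor.bottom) > max(w.top, anchor.top)
            and AMOUNT.match(w.text)
        ]
        if candidates:
            return min(candidates, key=lambda w: w.left - anchor.right).text
    return None


words = [
    Word("Invoice", 10, 10, 70, 22), Word("No:", 80, 10, 105, 22),
    Word("4711", 110, 10, 145, 22),
    Word("Total:", 10, 200, 55, 212), Word("1,250.00", 300, 201, 370, 213),
    Word("EUR", 380, 201, 410, 213),
    Word("Date:", 10, 230, 50, 242), Word("17.09.2003", 300, 230, 380, 242),
]
print(value_right_of(words, "Total"))  # 1,250.00
print(value_right_of(words, "No"))     # 4711
print(value_right_of(words, "Date"))   # None (a date is not an amount)
```

If «1,250.00» had been split into separate characters or classified as a graphic, this kind of lookup would fail, which is the problem the module is described as avoiding.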
6. OMR (Optical Mark Recognition) Module
The Optical Mark Recognition (OMR) module recognizes simple check marks, grouped check marks, model check marks and check marks with «corrections» made by hand.
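The release does not explain how marks are detected. A common approach, shown here as a minimal Python sketch, measures the fraction of dark pixels inside each check box. The function names, the thresholds, and the rule that a nearly solid box means a hand «correction» are assumptions made for illustration, not FineReader's documented behavior.

```python
def fill_ratio(image, box, threshold=128):
    """Fraction of pixels darker than `threshold` inside box = (left, top,
    right, bottom), with right/bottom exclusive."""
    left, top, right, bottom = box
    dark = total = 0
    for y in range(top, bottom):
        for x in range(left, right):
            total += 1
            if image[y][x] < threshold:
                dark += 1
    return dark / total if total else 0.0


def classify_box(image, box, mark=0.15, cancel=0.85):
    """Hypothetical rule: a nearly solid box counts as a hand correction
    (cancelled), a partly inked box as checked, anything else as empty."""
    ratio = fill_ratio(image, box)
    if ratio >= cancel:
        return "cancelled"
    if ratio >= mark:
        return "checked"
    return "empty"


# Demo: three 10x10 boxes on white paper: empty, an "X" mark, blacked out.
img = [[255] * 30 for _ in range(10)]
for i in range(10):
    img[i][10 + i] = 0   # "\" stroke of the X in the middle box
    img[i][19 - i] = 0   # "/" stroke of the X
for y in range(10):
    for x in range(20, 30):
        img[y][x] = 0    # right box filled in completely
boxes = [(0, 0, 10, 10), (10, 0, 20, 10), (20, 0, 30, 10)]
print([classify_box(img, b) for b in boxes])  # ['empty', 'checked', 'cancelled']
```

Grouped check marks could be read by applying `classify_box` to every box in a group and keeping those classified as checked.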
7. 2D Barcode Recognition Module (PDF417)
The 2D barcode module recognizes PDF417, the industry standard for 2D barcodes. It is ideal for recognizing and categorizing product labels and packages. PDF417 encodes up to 1.1 kilobytes of data, including text and graphics information.
The FineReader Engine SDK consists of a set of DLLs (Dynamic Link Libraries) and an API that conforms to the COM (Component Object Model) standard and is easily accessed with Visual Studio .NET, C/C++, Visual Basic or any other development tool supporting COM components. FineReader Engine offers complete access to low-level OCR/ICR/OMR/barcode functionality and does not require a graphical user interface. FineReader Engine 7.0 is backward compatible with version 6.0.
ABBYY also offers a version of its OCR development toolkit for the Linux platform. The FineReader Engine SDK for Linux supports a Linux-based programming and operating environment and provides access to ABBYY OCR functionality through an application programming interface (API) and a command-line interface.
ABBYY offers a free, fully functional 60-day trial version of ABBYY FineReader Engine 7.0 so that prospective customers can test Engine 7.0 under real working conditions without any functional limitations. To obtain an evaluation copy, please contact an ABBYY salesperson at www.abbyyusa.com.
Pricing and Availability
ABBYY FineReader Engine 7.0 will be available towards the end of 2003. ABBYY offers flexible pricing options that allow developers to select the type of licensing model that is best suited to their product and sales strategy. For additional product information, visit ABBYY’s website at http://www.abbyy.com