Automatic Classification and OCR

 

How can smart document management software automatically classify documents into their destination folders without manual intervention?

 

To understand how this process works, we must first know what "OCR" (optical character recognition) is. Briefly, it can be described as "electronic eyes" capable of interpreting characters in order to extract the text of a document.

 

There are different types of "OCR" and text extraction: multipage TIF images; single-page images (JPG, BMP, PNG, GIF, ...); image PDFs (generated by a scanner); text extraction from text PDFs (generated by applications); text extraction from Office files (Word, Excel, PowerPoint, WordPerfect, ...); text extraction from OpenOffice files (ODT, ODS, ODP, ...); text extraction from ASCII files (TXT, DAT, LOG, HTML, ASP, JSP, PHP, RTF, XML, ...); text extraction from e-mail and attachments (EML, MSG, PST); registration in the database of files without text content (any extension); and so on.
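As a rough illustration, the file types above could be routed to an extraction method by file extension. This is only a sketch: the function name `extraction_method`, the method labels, and the exact extension sets are assumptions made for the example and do not describe any particular product.

```python
from pathlib import Path

# Hypothetical mapping from extraction method to file extensions,
# following the categories listed in the text above.
EXTRACTION_METHODS = {
    "image_ocr": {".tif", ".tiff", ".jpg", ".jpeg", ".bmp", ".png", ".gif"},
    "office_text": {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx", ".wpd"},
    "open_office_text": {".odt", ".ods", ".odp"},
    "plain_text": {".txt", ".dat", ".log", ".html", ".asp", ".jsp", ".php", ".rtf", ".xml"},
    "email_text": {".eml", ".msg", ".pst"},
}


def extraction_method(filename: str) -> str:
    """Return the (hypothetical) extraction method label for a file name."""
    ext = Path(filename).suffix.lower()
    if ext == ".pdf":
        # A PDF may be a scanned image or application-generated text;
        # the extension alone cannot tell, so the content must be inspected.
        return "pdf_ocr_or_text"
    for method, extensions in EXTRACTION_METHODS.items():
        if ext in extensions:
            return method
    # Any other extension: the file is registered without text content.
    return "no_text"
```

Note that a PDF cannot be routed by extension alone, which is why the sketch gives it a separate label instead of guessing between OCR and direct text extraction.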

 

After the text has been extracted from a document, the system stores a plain-text copy of that content attached to the document. This copy can contain any word that could be used as a "sort key." Each destination folder is assigned a series of "sort keys," so that the folder can claim any document whose text, previously extracted by "OCR," contains those codes or words.
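The matching step described above can be sketched in a few lines. This is a minimal example under stated assumptions: the function `classify`, the folder names, and the sort keys are all hypothetical, and matching is done as a simple case-insensitive substring search, which a real system may well refine.

```python
from typing import Dict, List


def classify(text: str, folder_keys: Dict[str, List[str]]) -> List[str]:
    """Return every folder whose sort keys appear in the extracted text.

    Matching is a case-insensitive substring search (an assumption made
    for this sketch, not a documented behavior of any product).
    """
    lowered = text.lower()
    return [
        folder
        for folder, keys in folder_keys.items()
        if any(key.lower() in lowered for key in keys)
    ]


# Hypothetical folders and sort keys, for illustration only.
FOLDER_KEYS = {
    "Invoices": ["invoice", "VAT"],
    "Contracts": ["contract", "agreement"],
}

matches = classify("Invoice no. 1234, VAT included", FOLDER_KEYS)
```

Here `matches` contains only "Invoices", because neither "contract" nor "agreement" occurs in the sample text. A document whose text contains keys from several folders would be returned for each of them.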

 

Despite how complex and striking it may seem, the process is very simple and is a great help in any company.

 
