
    Evaluation of pre-processing techniques for the analysis and recognition of invoice documents

    File
    Van_Zyl_PA_2015.pdf (7.012Mb)
    Date
    2015
    Author
    Van Zyl, Petrus Andries
    Abstract
The automatic extraction and handling of information contained in invoice documents holds major benefits for many businesses, as it can save resources that would otherwise be spent on manual extraction. Document Analysis and Recognition (DAR) is a process that uses Optical Character Recognition (OCR) for the recognition and analysis of the contents of physical documents, so that the information can be extracted and processed digitally. It consists of four steps: pre-processing, layout analysis, text recognition, and post-processing. Pre-processing improves the overall quality of a document image to prepare it for the steps that follow. The techniques used for pre-processing directly influence the resulting OCR accuracy, because any small deficiencies that pass through this stage are carried through the rest of the OCR process and ultimately lead to incorrect recognition. Revealing which pre-processing techniques are the most effective for the analysis and recognition of invoice documents can therefore make a significant contribution to the relevant research areas and business communities.
To approach this problem, an exploratory study was first conducted in the form of case studies, in which the owners and CEOs of five DAR-related companies were interviewed. Transcription and content analysis of these semi-structured interviews allowed prevalent themes to emerge from the data. The second study was an experimental investigation: a number of invoice document images were subjected to various pre-processing techniques, and the effect of each technique on the recognition rate was measured. Acquiring the recognition rates of the different techniques made it possible to compare the techniques with one another quantitatively.
The case studies revealed that many businesses in the DAR industry follow the same business process. Much was learnt about the DAR-related software used in the industry, how Intelligent Character Recognition (ICR) should be approached, and what the best scanning practices are. It was also found that the use of paper-based information, and the need to process it electronically, is increasing, which secures the future of the industry. Regarding the efficiency of pre-processing techniques, the experiments showed that some techniques yield higher recognition rates than others. In addition, a number of findings were made about how some of the techniques used in the experiments function.
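The abstract compares pre-processing techniques by their recognition rates but does not define the metric on this page. The sketch below is a minimal illustration only, assuming a character-level recognition rate of 1 - (edit distance / ground-truth length) averaged over a set of images per technique; the function names, technique labels, and sample strings are hypothetical and are not taken from the thesis.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def recognition_rate(ocr_text: str, ground_truth: str) -> float:
    """Assumed metric: 1 - edit distance / ground-truth length, clipped at 0."""
    if not ground_truth:
        return 1.0 if not ocr_text else 0.0
    errors = levenshtein(ocr_text, ground_truth)
    return max(0.0, 1.0 - errors / len(ground_truth))


def rank_techniques(results: dict[str, list[tuple[str, str]]]) -> list[tuple[str, float]]:
    """Hypothetical helper: results maps a technique name to (ocr_text, ground_truth) pairs.

    Returns (technique, mean recognition rate) sorted from best to worst.
    """
    averages = {
        name: sum(recognition_rate(o, g) for o, g in pairs) / len(pairs)
        for name, pairs in results.items()
    }
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical OCR outputs for one invoice line under two made-up techniques.
    truth = "Invoice No: 10452 Total: R 1,250.00"
    results = {
        "no pre-processing": [("lnvoice N0: 1O452 Tota1: R 1,250.00", truth)],
        "hypothetical technique A": [("Invoice No: 10452 Total: R 1,250.00", truth)],
    }
    for name, rate in rank_techniques(results):
        print(f"{name}: {rate:.3f}")
```

Scoring every technique against the same ground-truth transcriptions of the same invoice images keeps the comparison quantitative and like-for-like; word-level or field-level accuracy would be equally reasonable choices under different assumptions.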
    URI
    http://hdl.handle.net/10394/26820
    Collections
    • Natural and Agricultural Sciences

    Copyright © North-West University