DSpace Repository

A Pre-processing Method for Printed Tamil Documents: Skew Correction and Textual Classification


dc.contributor.author Ramanan, M.
dc.contributor.author Ramanan, A.
dc.contributor.author Charles Eugene, Y.
dc.date.accessioned 2016-01-04T07:25:53Z
dc.date.accessioned 2022-06-28T04:51:44Z
dc.date.available 2016-01-04T07:25:53Z
dc.date.available 2022-06-28T04:51:44Z
dc.date.issued 2015-12-12
dc.identifier.uri http://repo.lib.jfn.ac.lk/ujrr/handle/123456789/816
dc.description.abstract An optical character recognition (OCR) system consists of the following phases: pre-processing and segmentation, feature extraction, classification and post-processing. This paper focuses on the pre-processing and segmentation tasks, which play a major role in the subsequent processes of an OCR system. The objective of pre-processing and segmentation is to improve the quality of the input image. In addition, this phase removes unnecessary portions of the input image that would otherwise complicate the subsequent steps of OCR and reduce the overall recognition rate. The pre-processing and segmentation step consists of many sub-processes, namely image binarisation, noise removal, skew detection and correction, page segmentation, text or non-text classification, line segmentation, word segmentation and character segmentation. This paper proposes a new approach to calculate the skew angle and to segment and classify blocks as text or non-text. The skew angle of the scanned document is calculated using a Wiener filter, a smearing technique and the Radon transform. The document image is segmented into blocks using the run length smearing algorithm (RLSA) and connected component analysis. Basic, density and histogram of oriented gradients (HOG) features are extracted from each block for text and non-text classification. The proposed methods are tested on 54 documents. The results show a recognition rate of 96.30% for skew detection and correction and a recognition rate of 99.18% for text or non-text classification with binary support vector machines (SVMs) using a radial basis function (RBF) kernel. en_US
dc.language.iso en en_US
dc.publisher IEEE en_US
dc.subject Printed Tamil documents; Skew correction; Textual classification en_US
dc.title A Pre-processing Method for Printed Tamil Documents: Skew Correction and Textual Classification en_US
dc.type Article en_US
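
The abstract lists a Wiener filter as the first step of skew estimation but does not say which variant is used. As an illustration only, the sketch below assumes a pixelwise adaptive Wiener filter of the form used by MATLAB's `wiener2`: each pixel is pulled towards its local mean by a gain that depends on the local variance and an estimated noise power. The function name `adaptive_wiener`, the 3×3 window (`radius=1`) and the noise estimate (the mean of all local variances) are assumptions, not details taken from the paper. Images are plain lists of lists of grey values, so no third-party packages are needed.

```python
def adaptive_wiener(img, radius=1):
    """Pixelwise adaptive Wiener filter on a 2-D list of floats.

    Hypothetical sketch: window size and noise estimate are assumptions.
    """
    h, w = len(img), len(img[0])
    means = [[0.0] * w for _ in range(h)]
    variances = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Collect the local window, clipped at the image border.
            vals = [img[j][i]
                    for j in range(max(0, y - radius), min(h, y + radius + 1))
                    for i in range(max(0, x - radius), min(w, x + radius + 1))]
            m = sum(vals) / len(vals)
            means[y][x] = m
            variances[y][x] = sum((v - m) ** 2 for v in vals) / len(vals)
    # Assumed noise power: the average of all local variances.
    noise = sum(map(sum, variances)) / (h * w)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            m, var = means[y][x], variances[y][x]
            denom = max(var, noise)
            # Low-variance (flat or noise-only) regions get a gain near 0,
            # so they are replaced by the local mean; edges keep more detail.
            gain = max(var - noise, 0.0) / denom if denom > 0 else 0.0
            out[y][x] = m + gain * (img[y][x] - m)
    return out
```

Flat regions keep their value, while an isolated bright speck is pulled towards the mean of its neighbourhood.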

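The skew angle itself is found with the Radon transform. A common way to use it for skew detection, assumed here rather than taken from the paper, is to project the foreground pixels along lines at a range of candidate angles and keep the angle whose projection profile is sharpest: when the projection direction matches the text lines, each line collapses into a few bins. The sketch below evaluates this discrete projection (a Radon transform restricted to the foreground pixels of a binary image, 1 = ink) over ±15° in 0.5° steps and scores each profile by the sum of squared bin counts. The search range, step, scoring function and the name `estimate_skew` are illustrative assumptions, and the Wiener-filter and smearing steps the paper applies first are omitted.

```python
import math


def estimate_skew(binary, max_angle=15.0, step=0.5):
    """Return the candidate angle (degrees) with the sharpest projection profile.

    Hypothetical sketch: search range, step and score are assumptions.
    """
    pts = [(x, y) for y, row in enumerate(binary)
           for x, v in enumerate(row) if v]
    best_angle, best_score = 0.0, -1.0
    n_steps = int(round(2 * max_angle / step))
    for k in range(n_steps + 1):
        angle = -max_angle + k * step
        t = math.radians(angle)
        # Radon-style line integral: count foreground pixels on each line
        # rho = y*cos(t) - x*sin(t), i.e. along a text line tilted by t.
        profile = {}
        for x, y in pts:
            rho = int(round(y * math.cos(t) - x * math.sin(t)))
            profile[rho] = profile.get(rho, 0) + 1
        # Concentrated peaks (aligned text lines) give a large sum of squares.
        score = sum(c * c for c in profile.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

With image coordinates (y grows downwards), a positive result means the text lines descend to the right; correcting the page means rotating it by the same angle in the opposite direction.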

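For page segmentation the abstract combines the run length smearing algorithm (RLSA) with connected component analysis. The sketch below shows only a horizontal smearing pass: any run of background pixels no longer than `threshold` that lies between two foreground pixels on the same row is filled, so nearby characters merge into one block, and the 8-connected components of the smeared image are then reported as bounding boxes `(x0, y0, x1, y1)`. The classic RLSA also smears vertically and ANDs the two results; that step, the threshold value and the function names `rlsa_horizontal` and `connected_components` are assumptions for illustration, not the paper's exact procedure.

```python
from collections import deque


def rlsa_horizontal(binary, threshold):
    """Fill short background gaps between foreground pixels on each row."""
    out = [row[:] for row in binary]
    for y, row in enumerate(binary):
        last_fg = None
        for x, v in enumerate(row):
            if v:
                gap = x - last_fg - 1 if last_fg is not None else 0
                if 0 < gap <= threshold:
                    for i in range(last_fg + 1, x):
                        out[y][i] = 1
                last_fg = x
    return out


def connected_components(binary):
    """Return bounding boxes (x0, y0, x1, y1) of 8-connected foreground regions."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cy, cx = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return boxes
```

Without smearing, three closely spaced marks on a row form three separate components; after smearing with a small threshold they merge into one block, while a mark further away stays separate.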