A Pre-processing Method for Printed Tamil Documents: Skew Correction and Textual Classification

Ramanan, M.; Ramanan, A.; Charles Eugene, Y.

dc.contributor.author	Ramanan, M.
dc.contributor.author	Ramanan, A.
dc.contributor.author	Charles Eugene, Y.
dc.date.accessioned	2016-01-04T07:25:53Z
dc.date.accessioned	2022-06-28T04:51:44Z
dc.date.available	2016-01-04T07:25:53Z
dc.date.available	2022-06-28T04:51:44Z
dc.date.issued	2015-12-12
dc.identifier.uri	http://repo.lib.jfn.ac.lk/ujrr/handle/123456789/816
dc.description.abstract	An optical character recognition (OCR) system consists of the following phases: pre-processing and segmentation, feature extraction, classification, and post-processing. This paper focuses on the pre-processing and segmentation tasks, which play a major role in the subsequent processes of an OCR system. The objective of pre-processing and segmentation is to improve the quality of the input image. In addition, this phase removes unnecessary portions of the input image that would otherwise complicate the subsequent steps of OCR and reduce the overall recognition rate. The pre-processing and segmentation step consists of many sub-processes, namely image binarisation, noise removal, skew detection and correction, page segmentation, text or non-text classification, line segmentation, word segmentation, and character segmentation. This paper proposes a new approach to calculate the skew angle, segment the document into blocks, and classify the blocks as text or non-text. The skew angle of the scanned document is calculated using a Wiener filter, a smearing technique, and the Radon transform. The document image is segmented into blocks using the run length smearing algorithm and connected component analysis. Basic, density, and histogram of oriented gradients (HOG) features are extracted from each block for text and non-text classification. The proposed methods are tested on 54 documents. The test results show a recognition rate of 96.30% for skew detection and correction, whereas the recognition rate is 99.18% for text or non-text classification with binary SVMs using an RBF kernel.	en_US
dc.language.iso	en	en_US
dc.publisher	IEEE	en_US
dc.subject	Printed Tamil documents; Skew correction; Textual classification	en_US
dc.title	A Pre-processing Method for Printed Tamil Documents: Skew Correction and Textual Classification	en_US
dc.type	Article	en_US
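
The abstract names the run length smearing algorithm as the block segmentation step but gives no detail. The following is a minimal sketch of one common formulation, not the authors' implementation. It assumes a binary image stored as a list of rows with 1 for ink and 0 for background. The function names, the thresholds, and the choice to combine the horizontal and vertical passes with a logical AND are all assumptions made for illustration.

```python
def rlsa_horizontal(image, threshold):
    """Fill short background runs that lie between two foreground pixels in a row.

    image: list of rows, each a list of 0 (background) or 1 (foreground).
    threshold: assumed parameter; gaps of at most this many pixels are filled.
    """
    smeared = [row[:] for row in image]  # copy so the input stays untouched
    for row in smeared:
        last_fg = None  # column of the most recent foreground pixel
        for col, pixel in enumerate(row):
            if pixel == 1:
                if last_fg is not None and 0 < col - last_fg - 1 <= threshold:
                    for k in range(last_fg + 1, col):
                        row[k] = 1
                last_fg = col
    return smeared


def rlsa_vertical(image, threshold):
    """Apply the same smearing along columns by transposing the image."""
    transposed = [list(column) for column in zip(*image)]
    smeared = rlsa_horizontal(transposed, threshold)
    return [list(row) for row in zip(*smeared)]


def rlsa(image, h_threshold, v_threshold):
    """Combine horizontal and vertical smearing with a pixel-wise AND."""
    horizontal = rlsa_horizontal(image, h_threshold)
    vertical = rlsa_vertical(image, v_threshold)
    return [
        [a & b for a, b in zip(h_row, v_row)]
        for h_row, v_row in zip(horizontal, vertical)
    ]


if __name__ == "__main__":
    row = [[1, 0, 0, 1, 0, 0, 0, 0, 1]]
    # The gap of 2 is filled; the gap of 4 exceeds the threshold and is kept.
    print(rlsa_horizontal(row, 2))
```

In a pipeline like the one the abstract describes, the smeared image would then go to connected component analysis, so that each merged region becomes a candidate block. Suitable thresholds depend on scan resolution and font size and are not stated in this record.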