IJFANS International Journal of Food and Nutritional Sciences

ISSN Print 2319-1775, Online 2320-7876

OPTICAL CHARACTER RECOGNITION FOR TELUGU LANGUAGE USING TESSERACT


S. Sagar Imambi1, Harsha Mynedi2
doi: 10.48047/IJFANS/11/S6/037

Abstract

Optical Character Recognition (OCR) is a technique for converting scanned images into text. The technology has improved significantly, enabling its application to "read" computer files. OCR involves the mechanical or electronic conversion of typed, handwritten, or printed text from various sources into machine-readable text. It can work with scanned documents, photographs, or scene photos, transforming them into documents with searchable text that can be copied and edited as needed.

Handwriting recognition in OCR software uses "intelligent character recognition" technology, which enables the conversion engine to identify different shapes and patterns as letters. Although handwriting recognition has made significant progress, it is not flawless: it excels at highly structured text, where each letter is neatly separated in boxes, but faces challenges in other contexts. OCR technology has long been used by organizations like the US mail to read addresses on mail. Ongoing research is dedicated to improving the accuracy and capabilities of handwriting recognition.

The main objective of this research work is to extract Telugu text from images for subsequent editing, formatting, indexing, and translation, and to accelerate character recognition in document processing. Telugu is a Dravidian language spoken by over 80 million people worldwide, and OCR for Telugu script has a wide range of applications, including education, healthcare, and administration. The unique and intricate nature of the Telugu script distinguishes it from the scripts used for languages like English and German. Deep learning models are a promising approach to Telugu OCR. Tesseract achieves the highest accuracy on Telugu OCR, followed by Blark, LSTM, and CNN-ECOC online handwriting recognition.
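As an illustration of the stated objective, extracting Telugu text from an image with Tesseract, the following is a minimal sketch and not the authors' pipeline, which the abstract does not describe. It assumes the `tesseract` command-line tool is installed together with its Telugu language data (language code `tel`). The function names `build_tesseract_command` and `extract_telugu_text`, and the choice of page segmentation mode 6, are hypothetical choices made for this example.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="tel", psm=6):
    """Return the argument list for a Tesseract CLI call that prints text to stdout."""
    return [
        "tesseract",
        str(image_path),
        "stdout",            # write recognized text to standard output
        "-l", lang,          # "tel" is Tesseract's language code for Telugu
        "--psm", str(psm),   # page segmentation mode; 6 = single uniform block of text
    ]


def extract_telugu_text(image_path):
    """Run Tesseract on one image and return the recognized text as a string."""
    # Assumption: the tesseract binary and the Telugu traineddata are installed.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path),
        capture_output=True,
        check=True,  # raise CalledProcessError if Tesseract reports a failure
    )
    return result.stdout.decode("utf-8")
```

Calling `extract_telugu_text("page.png")` on a scanned Telugu page would return the recognized Unicode text, which could then be edited, indexed, or passed to a translation step. A different page segmentation mode may suit other layouts, such as single lines or sparse text.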
