
High Level API for OCR tasks #2753

@harsh2ai


🚀 Feature

The PyTorch vision library provides many high-level APIs that perform tasks seamlessly under the hood. If there were a similar high-level API for OCR tasks, users could avoid downloading lots of third-party libraries.

Motivation

When I was building one such OCR system, I found that there is no generalized way to recognize and extract text from various image/document formats. Even a slight change in the structure of a document or image tends to make these systems fail, and at times we have to resort to traditional machine learning techniques, which are highly time-consuming to tune for the desired results.

Pitch

OCR is widely applied in industry: many companies use OCR software daily and spend a lot of money maintaining it. It is used in medical, finance, delivery, and many other domains for verification and for data entry and storage. Yet despite use cases across so many industries, there is no single solution that everyone can use and keep improving over the years; instead, organizations rely on the old approach of building their own.

If such an API were integrated into PyTorch, many businesses could shift to digital platforms and increase their productivity. Apart from this, it would also be useful to school and university students for a wide range of tasks.

Lastly, since OCR software is very expensive, integrating this into PyTorch would make it freely available to everyone, and it could be improved over the years to come.

Alternatives

At present, the main option we have is Tesseract, which is capable of leveraging deep learning for OCR. However, Tesseract has its own set of problems, cannot be used everywhere, and requires a lot of preprocessing with other libraries beforehand in order to use it.
