
OCR - Expose document writer with OCR capabilities #128

Closed
@yveszoundi


Rationale

In one of my apps, I started replacing a few libraries with MuPDF and noticed that MuPDF has Tesseract support. (I noticed because of duplicate symbol errors with Leptonica in my app, which only occurred on Windows.)

Goal

I believe that most people don't need tesseract-rs or its related sys crates, which expose more advanced features. With mupdf-rs, I can drop 5 dependencies from one of my apps (poppler-rs, cairo-rs, lopdf, tesseract-sys, leptonica-sys).

Proposal

I can send a pull request for the following:

  • Expose a new method on DocumentWriter in mupdf-rs that enables OCR via fz_new_pdfocr_writer.
  • This keeps branching trivial for applications that optionally perform OCR, because the OCR path doesn't require a different data structure.
  • I suppose this would need to be guarded by the tesseract feature in the code.
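
A minimal sketch of the shape this could take, to make the proposal concrete. It does not use the real mupdf-rs bindings: `DocumentWriter` here is a stand-in struct, and `with_ocr`, `Backend`, and `open_writer` are hypothetical names chosen for illustration. The only things taken from the proposal are the `tesseract` feature guard and the idea that both paths return the same type.

```rust
// Hypothetical stand-in types; this is NOT the real mupdf-rs binding.

/// Which MuPDF C constructor the wrapper would call.
#[derive(Debug, PartialEq)]
enum Backend {
    /// Regular writer (fz_new_document_writer in MuPDF).
    Plain,
    /// OCR writer (fz_new_pdfocr_writer in MuPDF).
    Ocr,
}

struct DocumentWriter {
    backend: Backend,
    path: String,
    options: String,
}

impl DocumentWriter {
    /// Regular constructor, always available.
    fn new(path: &str, options: &str) -> Self {
        DocumentWriter {
            backend: Backend::Plain,
            path: path.to_string(),
            options: options.to_string(),
        }
    }

    /// Hypothetical OCR constructor, compiled only when the crate is
    /// built with the `tesseract` feature.
    #[cfg(feature = "tesseract")]
    fn with_ocr(path: &str, options: &str) -> Self {
        DocumentWriter {
            backend: Backend::Ocr,
            path: path.to_string(),
            options: options.to_string(),
        }
    }
}

/// Application-side branch: both arms yield the same type, so the rest
/// of the pipeline does not care whether OCR is enabled.
#[cfg(feature = "tesseract")]
fn open_writer(path: &str, want_ocr: bool) -> DocumentWriter {
    if want_ocr {
        DocumentWriter::with_ocr(path, "")
    } else {
        DocumentWriter::new(path, "")
    }
}

/// Without the feature, the OCR request falls back to the plain writer.
#[cfg(not(feature = "tesseract"))]
fn open_writer(path: &str, _want_ocr: bool) -> DocumentWriter {
    DocumentWriter::new(path, "")
}
```

An application would call `open_writer("out.pdf", user_wants_ocr)` and then use the returned writer the same way in both cases. Whether a build without the feature should silently fall back or return an error is a design choice the sketch does not settle.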

Other notes

I did some quick tests under Windows 11 and it works fine. I'll test under Linux and macOS soon, before submitting a pull request.
