Add initial OCR support in document_writer #134
- Remove 'tesseract' from default features as it wasn't properly enabled
- Update build.rs in 'mupdf-sys' to ensure that tesseract support is built properly
- Introduce a new C wrapper method for error handling when the build has OCR disabled
- Add the method 'new_pdfocr_writer' to document_writer.rs

Fix #128
I don't have a solution yet for "argument list too long" (Windows MSYS2 tests on GitHub Actions).
For the "argument list too long" issue, this comment should help: #121 (comment). Also, you might want to rebase over main (or merge main into this branch), as it may (hopefully) help with your CI issues.
Thanks, this PR is based on the latest main branch changes. I do have another repo forked from your previous 1.25 branch though. It looks like
- Replace "&str" usages in document_writer with FilePath
- Re-enable neon CPU flags in build.rs

Fix #128
Could you rebase onto main?
There's a hardcoded 60s timeout in cargo test infrastructure
let mut cpu_flags = DEFAULT_CPU_FLAGS.to_vec();
let target_arch = env::var("CARGO_CFG_TARGET_ARCH").unwrap_or_default();

if target_arch == "arm" {
- if target_arch == "arm" {
+ if target_arch == "arm" || target_arch == "aarch64" {
Also need to consider aarch64?
Previously, I only had aarch64 and the build was failing on GitHub Actions (OK locally on Mac OS x86_64). I will give it a try though.
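The suggested condition can be sketched as a small helper that treats both ARM values of CARGO_CFG_TARGET_ARCH the same way. This is only an illustration under assumptions: `is_arm_family`, `cpu_flags_for`, and the flag string `HYPOTHETICAL_NEON_FLAG` are placeholder names that do not come from the PR, and the real contents of the `if` block in build.rs are not shown in this thread.

```rust
/// Returns true for the ARM-family values Cargo may report in
/// CARGO_CFG_TARGET_ARCH ("arm" for 32-bit, "aarch64" for 64-bit).
fn is_arm_family(target_arch: &str) -> bool {
    matches!(target_arch, "arm" | "aarch64")
}

/// Builds the flag list for a given target architecture.
/// `defaults` stands in for DEFAULT_CPU_FLAGS from the diff; the extra
/// flag pushed below is a hypothetical placeholder, not a real flag.
fn cpu_flags_for(target_arch: &str, defaults: &[&'static str]) -> Vec<&'static str> {
    let mut flags = defaults.to_vec();
    if is_arm_family(target_arch) {
        flags.push("HYPOTHETICAL_NEON_FLAG");
    }
    flags
}
```

In a build script, the `target_arch` argument would come from `env::var("CARGO_CFG_TARGET_ARCH")`, as in the diff above. Keeping the check in a function that takes a plain string makes it easy to exercise without a cross-compilation setup.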
@@ -513,6 +528,20 @@ fn main() {
    println!("cargo:rerun-if-changed=wrapper.h");
    println!("cargo:rerun-if-changed=wrapper.c");

    if let Ok(ref target_os) = env::var("CARGO_CFG_TARGET_OS") {
        if target_os == "windows" {
            #[cfg(target_env = "msvc")]
For cross compiling support, target_env should also be obtained from env vars.
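To illustrate the point: a build script is compiled for the host, so `#[cfg(target_env = "msvc")]` inside build.rs reflects the host toolchain, while Cargo passes the actual target's values through the CARGO_CFG_TARGET_OS and CARGO_CFG_TARGET_ENV environment variables. A minimal sketch of the runtime check follows; the function names are hypothetical and not taken from the PR.

```rust
use std::env;

/// Pure check on the values Cargo reports for the *target*.
fn is_windows_msvc(target_os: &str, target_env: &str) -> bool {
    target_os == "windows" && target_env == "msvc"
}

/// Reads the target description from the environment variables that
/// Cargo sets when it runs a build script. Outside of a build script
/// the variables are absent, so this falls back to empty strings and
/// returns false.
fn target_is_windows_msvc() -> bool {
    let target_os = env::var("CARGO_CFG_TARGET_OS").unwrap_or_default();
    let target_env = env::var("CARGO_CFG_TARGET_ENV").unwrap_or_default();
    is_windows_msvc(&target_os, &target_env)
}
```

With this shape, an `if target_is_windows_msvc() { ... }` block could replace the `#[cfg(target_env = "msvc")]` attribute, so that cross compiling from a non-MSVC host to an MSVC target would still take the MSVC branch.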
I'll make changes for the other msvc references in build.rs as well.
I'll see if I can submit a new pull request at some point. For now, I'll use my own fork. It seems that the
I'm sorry for changing so much. I will take care of the rebase so you can keep focusing on tesseract.
It is not a problem at all and thank you. I appreciate your work.
Initial OCR support in document_writer to create searchable PDF files with selectable text.
- new_pdfocr_writer in document_writer.rs, as well as a wrapper in wrapper.c for error handling (otherwise, if OCR is disabled, applications will just crash silently).
- build.rs for tesseract support, as it was not enabled by default
- tesseract, on non-Windows platforms

Fix #128