get_text("rawdict") always returns same values for image xres and yres #4433
Comments
As always:
I cannot share the exact PDF I was using when I discovered this bug, for privacy reasons. However, for this washing machine manual I found on my laptop (pdf link) I get the same results for page 0.
Thanks for the example.
MuPDF issue link: https://bugs.ghostscript.com/show_bug.cgi?id=708433
As a workaround, use this code snippet:

import pymupdf

doc = pymupdf.open("WGG246Z5NL.pdf")
for page in doc:
    # image blocks have type 1 in the "dict" output
    for img in [b for b in page.get_text("dict")["blocks"] if b["type"] == 1]:
        print(f"{img['xres']=}, {img['yres']=}")
        # build a Pixmap from the raw image bytes to read the embedded resolution
        pix = pymupdf.Pixmap(img["image"])
        print(f"{pix.xres=}, {pix.yres=}")  # <== this is correct!
        print()
Thanks for the help! For the first picture on the page (the washing machine), your code outputs: Meanwhile, Adobe shows that this picture has:
I don't understand how Adobe computes this. Extract the images via XPDF or via
Our MuPDF team has looked at this. The recipe I gave you (making an intermediate Pixmap) is the correct way to find the values embedded in the image. I also finally learned that Adobe does the trivial computation image-width / bbox-width to come up with the values you showed me, so it also ignores any DPI embedded in the image. When convenient, we will update the documentation accordingly and mention that xres / yres are useless values.
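For anyone who wants numbers comparable to what Adobe shows, here is a minimal sketch of the image-width / bbox-width computation described above. It assumes that the bbox is given in PDF points (1/72 inch), so the ratio is scaled by 72 to get DPI, and that the image is not rotated or skewed on the page. The helper names `effective_dpi` and `report` are made up for this example and are not part of PyMuPDF.

```python
def effective_dpi(width_px, height_px, bbox):
    """Return (x_dpi, y_dpi) for an image as it is displayed on the page.

    bbox is (x0, y0, x1, y1) in PDF points; 72 points = 1 inch (assumption
    used to turn the pixel/point ratio into dots per inch).
    """
    x0, y0, x1, y1 = bbox
    width_in = (x1 - x0) / 72
    height_in = (y1 - y0) / 72
    if width_in <= 0 or height_in <= 0:
        raise ValueError("bbox must have positive width and height")
    return width_px / width_in, height_px / height_in


def report(path):
    """Print the displayed resolution of every image block in a PDF."""
    import pymupdf  # imported here so effective_dpi stays usable on its own

    doc = pymupdf.open(path)
    for page in doc:
        for block in page.get_text("dict")["blocks"]:
            if block["type"] != 1:  # 1 = image block
                continue
            x_dpi, y_dpi = effective_dpi(
                block["width"], block["height"], block["bbox"]
            )
            print(f"page {page.number}: {x_dpi=:.0f}, {y_dpi=:.0f}")
```

Calling `report("WGG246Z5NL.pdf")` prints one line per image. These values describe how densely the image is rendered on the page, not what is stored inside the image file; for the embedded values, the Pixmap recipe above remains the way to go.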
Description of the bug
I am using this code to find the DPI of the images contained on the page, but no matter which PDF or image I check, the function always returns the same xres == yres == 96, which I suppose is the library's default.
How to reproduce the bug
Output:
Does this run as intended?
Is there a different way to do what I want with the library?
Thank you for the great work!
PyMuPDF version
1.25.4
Operating system
Windows
Python version
3.10