How do I detect if a pdf contains text that has been rotated? #916

Answered by JorjMcKie
cabal-chan asked this question in Q&A

Before I take a look at the PDF, some general comments:

  • Text extraction (via "dict", etc.) returns only the bbox (= rectangle) of the text, not the quadrilateral. For what that means, see the picture in the documentation.
  • I am returning (x0, y0, x1, y1), corresponding to the blue rectangle, not (ul, ur, ll, lr), corresponding to the corners of the quad. I am considering introducing an option to request a different output in some future version.
  • The "dict" output does contain information that at least lets you check whether there is a rotation: the dir key of the line sub-dictionary.
  • In cases of 90 degree rotations (= exactly one component of dir is zero), the quad i…
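
To illustrate the dir check described above, here is a minimal sketch, not an official PyMuPDF recipe. It assumes the "dict" output has the layout blocks -> lines -> spans, with each line carrying a dir unit vector and a bbox. The helper names (is_rotated, is_axis_aligned, rotated_lines) and the tolerance are made up for this example. The functions only walk a plain Python dict, so they can be tried without a PDF at hand.

```python
import math


def is_rotated(direction, tol=1e-6):
    """Return True if the line's direction differs from (1, 0).

    (1, 0) is assumed to mean ordinary left-to-right horizontal text.
    """
    dx, dy = direction
    horizontal = (math.isclose(dx, 1.0, abs_tol=tol)
                  and math.isclose(dy, 0.0, abs_tol=tol))
    return not horizontal


def is_axis_aligned(direction, tol=1e-6):
    """Return True if one component of the direction is (nearly) zero,
    i.e. the text runs along the x- or y-axis (a multiple of 90 degrees)."""
    dx, dy = direction
    return (math.isclose(dx, 0.0, abs_tol=tol)
            or math.isclose(dy, 0.0, abs_tol=tol))


def rotated_lines(page_dict, tol=1e-6):
    """Collect every text line whose direction is not (1, 0).

    Returns a list of dicts with the line text, its dir, its bbox and
    a flag telling whether the rotation is a multiple of 90 degrees.
    """
    found = []
    for block in page_dict.get("blocks", []):
        # Image blocks have no "lines" key, so they are skipped here.
        for line in block.get("lines", []):
            direction = tuple(line.get("dir", (1.0, 0.0)))
            if not is_rotated(direction, tol):
                continue
            text = "".join(span.get("text", "") for span in line.get("spans", []))
            found.append({
                "text": text,
                "dir": direction,
                "bbox": line.get("bbox"),
                "axis_aligned": is_axis_aligned(direction, tol),
            })
    return found


# Possible usage with PyMuPDF (the file name is a placeholder):
#
#   import fitz
#   doc = fitz.open("input.pdf")
#   for page in doc:
#       for hit in rotated_lines(page.get_text("dict")):
#           print(page.number, hit["dir"], hit["text"])
```

Older PyMuPDF versions spell the extraction call page.getText("dict"). Note that the bbox reported for a rotated line is still the enclosing rectangle, not the quad, as explained above.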

Answer selected by cabal-chan

This discussion was converted from issue #915 on February 22, 2021 10:02.