Closed
Description
Describe the bug
In the line
with tempfile.NamedTemporaryFile() as tmp_file:
a temporary file is created and its name is passed down the call chain
process_file_with_model -> DocumentLayout.from_file -> load_pdf -> extract_pages (pdfminer).
extract_pages then tries to open the same file a second time, while the original handle is still open, with
open_filename(pdf_file, "rb") as fp:
On Windows this results in
PermissionError: [Errno 13] Permission denied: 'C:\\Users\\...\\AppData\\Local\\Temp\\tmpf9flca30'
Same error here:
https://github.com/Unstructured-IO/unstructured/blob/d3a404cfb541dae8e16956f096bac99fc05c985b/unstructured/partition/pdf_image/ocr.py#L79
To Reproduce
import os
import tempfile

# Print the operating system name ("nt" on Windows)
print(os.name)

# Create a temporary file
with tempfile.NamedTemporaryFile() as tmp_file:
    # Write some data to the file
    tmp_file.write(b'Hello, world!')
    tmp_file.flush()  # Flush the buffer to make sure data is written
    # Get the name of the file
    file_name = tmp_file.name
    # Reopen the file by name while tmp_file is still open;
    # on Windows this raises PermissionError
    with open(file_name, 'r') as file:
        # Read the data from the file
        content = file.read()
        print("Content of the temp file:", content)
Expected behavior
I expect it not to crash :)
Additional context
Possible solution taken from here: https://stackoverflow.com/questions/39983886/python-writing-and-reading-from-a-temporary-file
def process_data_with_model(
    data: BinaryIO,
    model_name: Optional[str],
    **kwargs,
) -> DocumentLayout:
    """Processes pdf file in the form of a file handler (supporting a read method) into a
    DocumentLayout by using a model identified by model_name."""
    with tempfile.TemporaryDirectory() as td:
        f_name = os.path.join(td, "tmp_file")
        # Open in binary mode: data.read() returns bytes
        with open(f_name, "wb") as tmp_file:
            tmp_file.write(data.read())
        # The file is closed here, so it can be reopened by name on Windows
        layout = process_file_with_model(
            f_name,
            model_name,
            **kwargs,
        )
    return layout
Or another solution, suggested by GPT:
import os
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
    tmp_file.write(b'Hello, world!')
    # Get the name of the file before closing
    file_name = tmp_file.name

# Now the file is closed, you can open it again
with open(file_name, 'r') as file:
    content = file.read()
    print("Content of the temp file:", content)

# Optionally, delete the file if you don't need it anymore
os.remove(file_name)
Not sure which is better.
The latter one probably needs a try/finally around the processing step, so that the file is removed even when an error occurs and the error still propagates to the caller.
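For illustration, the cleanup around the delete=False variant could be wrapped in a small context manager. This is only a sketch: closed_temp_copy is a hypothetical helper name, not something that exists in unstructured or unstructured-inference, and the BytesIO input stands in for the real PDF data.

```python
import io
import os
import tempfile
from contextlib import contextmanager
from typing import BinaryIO, Iterator


@contextmanager
def closed_temp_copy(data: BinaryIO) -> Iterator[str]:
    """Hypothetical helper: copy `data` to a temp file, close it, yield its path."""
    # delete=False keeps the file on disk after the handle is closed.
    with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
        tmp_file.write(data.read())
        file_name = tmp_file.name
    try:
        # The handle is closed at this point, so the path can be reopened
        # by other code, including on Windows.
        yield file_name
    finally:
        # Runs on success and on error; an exception raised by the caller
        # still propagates after the file has been removed.
        os.remove(file_name)


# Usage: reopen the file by name while inside the block.
with closed_temp_copy(io.BytesIO(b"Hello, world!")) as path:
    with open(path, "rb") as fp:
        content = fp.read()
```

With a helper like this, process_data_with_model would only need to call process_file_with_model(path, model_name, **kwargs) inside the with block, and no explicit re-raise is needed because try/finally does not swallow the exception.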