UnicodeDecodeError in Langflow with Unstructured.io Node: 'utf-8' codec can't decode byte 0xc4 #7714
Comments
Hello @shubhsrathore! I'm Dosu, your friendly bot here to help you with bugs, answer your questions, and assist you in becoming a contributor. While you wait for a human maintainer, feel free to ask me anything you need help with!

To resolve the 'utf-8' codec can't decode byte 0xc4 error, you can modify the file reading logic to handle the decoding error by opening the file in binary mode and then decoding it with an error handler. Here's a code snippet that demonstrates this approach:

    from pathlib import Path

    def read_file_content(self, file_path):
        """Read and return the content of a file."""
        file_path_ = Path(file_path)
        if not file_path_.is_file():
            return None
        try:
            # First attempt: strict UTF-8 in text mode.
            with file_path_.open(encoding="utf-8") as file:
                return file.read()
        except UnicodeDecodeError:
            # Fallback: read raw bytes and replace undecodable bytes with U+FFFD.
            # Decoding strictly as UTF-8 a second time would raise the same error.
            with file_path_.open("rb") as f:
                return f.read().decode("utf-8", errors="replace")

This code attempts to read the file with UTF-8 encoding, and if a UnicodeDecodeError is raised, it falls back to reading the file in binary mode and decoding it with invalid bytes replaced.
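A fallback only helps if it uses a different codec or an error handler, since decoding the same bytes strictly as UTF-8 a second time fails in the same way. The sketch below is one possible variant, not Langflow's actual code: the helper name `decode_with_fallback` and the default codec order are assumptions chosen for illustration.

```python
from typing import Iterable


def decode_with_fallback(
    data: bytes,
    encodings: Iterable[str] = ("utf-8", "cp1252", "latin-1"),
) -> str:
    """Try each codec in order and return the first successful decode.

    Hypothetical helper: the codec list is an assumption, not something
    taken from Langflow or Unstructured.io.
    """
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Reached only if every codec in the list failed (latin-1 never fails,
    # so this needs a custom list): keep going and mark the bad bytes.
    return data.decode("utf-8", errors="replace")


print(decode_with_fallback(b"\xc4rger"))  # byte 0xc4 decoded via cp1252
```

Trying legacy codecs before resorting to `errors="replace"` preserves characters such as `Ä` instead of turning them into replacement characters, at the cost of guessing the encoding.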
@dosu, that is helpful only if you're developing Langflow locally or maintaining a custom deployment of Langflow (e.g., self-hosted on your own infrastructure with access to the source code). It won’t apply to the Datastax online version, where you don’t have access to the codebase or the Path(file_path) logic.
In the Datastax online version of Langflow, you might not have direct access to modify the codebase. However, you can try the following workaround to address the encoding issue:
These steps should help mitigate the encoding problem without needing direct access to the Langflow codebase.
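One check that needs no access to the Langflow codebase is to confirm locally whether the file being uploaded is valid UTF-8 at all. The following is a minimal sketch under that assumption; `first_invalid_utf8_offset` is a hypothetical helper name, and it says nothing about how the hosted service reads the upload.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that breaks UTF-8 decoding.

    Returns None when the whole input is valid UTF-8.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


print(first_invalid_utf8_offset(b"ab\xc4cd"))
```

To run it against a real file, pass in the result of `pathlib.Path(...).read_bytes()` for the file you plan to upload. If the function returns None and the API still reports the error, the failing decode is probably happening on something other than the file's text, such as the raw request body.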
Bug Description
I’m using Langflow with a flow that starts with an Unstructured.io node to convert files to text. The flow works fine inside Langflow. When I test the flow’s API endpoint using Postman, however, I get a 500 Internal Server Error with the following message:
{
"message": "'utf-8' codec can't decode byte 0xc4 in position 171: invalid continuation byte"
}
This error persists even after trying several fixes, like converting the file to UTF-8 and specifying different encodings in Postman. I’m a non-technical user, so I need help understanding what’s causing this and how to resolve it.
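As for what the message means: byte 0xc4 is the first byte of a two-byte UTF-8 sequence, so a UTF-8 decoder expects a continuation byte right after it. When an ordinary ASCII byte follows instead, Python reports "invalid continuation byte". This typically happens when data written in a single-byte encoding such as Latin-1 (where 0xc4 is `Ä`), or binary data, is decoded as UTF-8. The snippet below reproduces the same message with synthetic bytes; the payload is made up for illustration and is not the reporter's file.

```python
# Synthetic payload: 171 ASCII bytes, then 0xc4 followed by plain ASCII.
data = b"x" * 171 + b"\xc4" + b"abc"

try:
    data.decode("utf-8")
    message = ""
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)

# The same bytes decode cleanly as Latin-1, where 0xc4 maps to "Ä".
print(data.decode("latin-1")[171])
```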
Reproduction
To Reproduce
Create a Langflow flow with an Unstructured.io node as the first node to process a file (e.g., a CSV or text file) and convert it to text.
Use Postman to send a POST request to the flow’s API endpoint (e.g., http:///api/v1/run/).
In Postman, set up the request:
Method: POST
Headers:
Content-Type: multipart/form-data
Accept: application/json
Body (form-data):
Key: files, Value: Upload a file (e.g., a CSV or text file)
(Optional) Key: encoding, Value: utf-8 or latin1
Send the request. The response is a 500 Internal Server Error with this body:
{
"message": "'utf-8' codec can't decode byte 0xc4 in position 171: invalid continuation byte"
}
Expected behavior
The Langflow API should process the file successfully and return the converted text without a 500 error.
Who can help?
No response
Operating System
Mac OS 15.4.1
Langflow Version
Datastax Langflow online version
Python Version
None
Screenshot
Flow File
No response