UnicodeDecodeError in Langflow with Unstructured.io Node: 'utf-8' codec can't decode byte 0xc4 #7714

shubhsrathore · 2025-04-19T13:16:07Z

Bug Description

I’m using Langflow with a flow that starts with an Unstructured.io node to convert files to text. The flow works fine when I run it inside Langflow, but when I call the flow’s API endpoint from Postman, I get a 500 Internal Server Error with the following message:

{
"message": "'utf-8' codec can't decode byte 0xc4 in position 171: invalid continuation byte"
}

This error persists even after trying several fixes, like converting the file to UTF-8 and specifying different encodings in Postman. I’m a non-technical user, so I need help understanding what’s causing this and how to resolve it.
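Because the file was reportedly already converted to UTF-8, it can help to confirm locally that it really decodes as UTF-8 before uploading it. The sketch below is a hypothetical helper (find_invalid_utf8, check_file and the script name are placeholders, not part of Langflow or Unstructured) that reports the first byte where strict UTF-8 decoding fails. This is the same kind of failure behind the "byte 0xc4 in position 171" message. The server's position refers to whatever buffer it was decoding, so it may not line up exactly with an offset in your file.

```python
import sys
from pathlib import Path
from typing import Optional, Tuple


def find_invalid_utf8(data: bytes) -> Optional[Tuple[int, int, str]]:
    """Return (offset, byte value, reason) for the first invalid UTF-8 byte, or None."""
    try:
        data.decode("utf-8")  # strict decoding raises at the first invalid byte
    except UnicodeDecodeError as err:
        return err.start, data[err.start], err.reason
    return None


def check_file(path: str) -> None:
    """Print whether the file at `path` decodes as UTF-8 and, if not, where it fails."""
    problem = find_invalid_utf8(Path(path).read_bytes())
    if problem is None:
        print(f"{path}: valid UTF-8")
    else:
        offset, value, reason = problem
        print(f"{path}: byte 0x{value:02x} in position {offset}: {reason}")


if __name__ == "__main__":
    # Usage (script and file names are placeholders): python check_utf8.py data.csv
    for name in sys.argv[1:]:
        check_file(name)
```

A byte such as 0xc4 starts a two-byte UTF-8 sequence, so if the next byte is not a continuation byte (0x80 to 0xbf), decoding stops with "invalid continuation byte". If the check reports valid UTF-8, the file's contents are probably not the cause.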

Reproduction

To Reproduce

1. Create a Langflow flow with an Unstructured.io node as the first node to process a file (e.g., a CSV or text file) and convert it to text.
2. Use Postman to send a POST request to the flow’s API endpoint (e.g., http:///api/v1/run/).
3. In Postman, set up the request:
   - Method: POST
   - Headers:
     - Content-Type: multipart/form-data
     - Accept: application/json
   - Body (form-data):
     - Key: files, Value: upload a file (e.g., a CSV or text file)
     - (Optional) Key: encoding, Value: utf-8 or latin1
4. Send the request and observe the error:

{
"message": "'utf-8' codec can't decode byte 0xc4 in position 171: invalid continuation byte"
}

Expected behavior

The Langflow API should process the file successfully and return the converted text without a 500 error.

Who can help?

No response

Operating System

Mac OS 15.4.1

Langflow Version

DataStax Langflow online version

Python Version

None

Screenshot

Flow File

No response

dosubot · 2025-04-19T13:17:13Z

Hello @shubhsrathore! I'm Dosu, your friendly bot here to help you with bugs, answer your questions, and assist you in becoming a contributor. While you wait for a human maintainer, feel free to ask me anything you need help with!

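To resolve the 'utf-8' codec can't decode byte 0xc4 error, you can modify the file reading logic to catch the decoding error and fall back to an encoding that can decode any byte. Here's a code snippet that demonstrates this approach:

from pathlib import Path


def read_file_content(self, file_path):
    """Read and return the text content of a file, or None if it is not a file."""
    file_path_ = Path(file_path)
    if not file_path_.is_file():
        return None
    try:
        # First try strict UTF-8.
        with file_path_.open(encoding="utf-8") as file:
            return file.read()
    except UnicodeDecodeError:
        # Decoding the same bytes as UTF-8 again would fail the same way, so fall
        # back to Latin-1, which maps every byte (0x00-0xFF) to a character.
        return file_path_.read_bytes().decode("latin-1")

This code first tries to read the file as UTF-8. If a UnicodeDecodeError occurs, it reads the raw bytes and decodes them as Latin-1 (ISO-8859-1) instead. The fallback has to use a different codec: re-decoding the same bytes as UTF-8 would raise the same error. Latin-1 never fails, although text that was really in another encoding (such as Windows-1252) may come out with a few wrong characters. This approach should help resolve the error you're encountering ^[1].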
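If you control the code, a slightly more general variant is to try a short list of candidate encodings in order and record which one worked. This is a minimal sketch, not Langflow's actual implementation; the function name decode_with_fallback and the candidate list are assumptions. UTF-8 is tried first, then Windows-1252 (cp1252), which can still fail on a handful of unassigned bytes, and finally Latin-1, which accepts every byte and so guarantees a result.

```python
from typing import Sequence, Tuple

# Assumed candidate order: strict UTF-8 first, then Windows-1252 (cp1252),
# then Latin-1, which can decode any byte sequence and so acts as a last resort.
DEFAULT_ENCODINGS: Tuple[str, ...] = ("utf-8", "cp1252", "latin-1")


def decode_with_fallback(
    data: bytes, encodings: Sequence[str] = DEFAULT_ENCODINGS
) -> Tuple[str, str]:
    """Decode `data` with the first encoding that works; return (text, encoding used)."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # this candidate failed; try the next one
    raise ValueError(f"none of {list(encodings)} could decode the data")
```

For example, bytes containing 0xc4 that are not valid UTF-8 decode under cp1252, where 0xc4 is "Ä". Returning the encoding that was used makes it easy to log when a file was not UTF-8, instead of silently changing characters.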

To continue talking to Dosu, mention @dosu.

shubhsrathore · 2025-04-19T13:27:23Z

@dosu, this is only helpful if you're developing Langflow locally or maintaining a custom deployment of Langflow (e.g., self-hosted on your own infrastructure with access to the source code). It won’t apply to the DataStax online version, where you don’t have access to the codebase or the Path(file_path) logic.

dosubot · 2025-04-19T13:28:56Z

In the DataStax online version of Langflow, you might not have direct access to modify the codebase. However, you can try the following workarounds to address the encoding issue:

1. File Encoding Conversion: Before uploading the file to Langflow, make sure it is encoded in UTF-8. You can use a text editor or a command-line tool to convert the file encoding, for example the iconv command in a terminal:
   ```
   iconv -f original_encoding -t utf-8 input_file > output_file
   ```
   Replace original_encoding with the current encoding of your file (e.g., latin1; byte 0xc4 is "Ä" in Latin-1 and Windows-1252), input_file with the path to your file, and output_file with the path where you want to save the converted file. Redirecting with > works with any iconv, whereas the -o option may not be supported by every version (including the one bundled with macOS). If you're unsure of the current encoding, running file -I input_file on macOS prints a best guess.
2. Postman Encoding Parameter: If you send an encoding field with the request, make sure it matches the file's actual encoding. If utf-8 doesn't work, try latin1 or another encoding that matches your file's original encoding.
3. Contact Support: If the issue persists, consider reaching out to DataStax support for the online version, as they may have specific solutions or configurations for encoding issues like this one.
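For anyone without iconv (for example on Windows), the iconv conversion described above can also be done with a few lines of Python. This is a sketch that assumes you know the source encoding; latin-1 is only the default here because it is a common suspect when a stray 0xc4 byte appears. The names reencode_to_utf8, convert_to_utf8 and the script name are placeholders.

```python
import sys
from pathlib import Path


def reencode_to_utf8(raw: bytes, source_encoding: str = "latin-1") -> bytes:
    """Decode `raw` from `source_encoding` and return the same text encoded as UTF-8."""
    # Working on bytes (not text-mode files) leaves line endings untouched.
    return raw.decode(source_encoding).encode("utf-8")


def convert_to_utf8(src: str, dst: str, source_encoding: str = "latin-1") -> None:
    """Python stand-in for: iconv -f <source_encoding> -t utf-8 src > dst."""
    Path(dst).write_bytes(reencode_to_utf8(Path(src).read_bytes(), source_encoding))


if __name__ == "__main__":
    # Usage (placeholder names): python to_utf8.py input_file output_file [encoding]
    if len(sys.argv) >= 3:
        convert_to_utf8(*sys.argv[1:4])
```

Only run this on files that are not already UTF-8: re-encoding a UTF-8 file as if it were Latin-1 turns characters like "Ä" into garbled text (mojibake).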

These steps should help mitigate the encoding problem without needing direct access to the Langflow codebase.


shubhsrathore added the "bug" (Something isn't working) label on Apr 19, 2025
