
UnicodeDecodeError in Langflow with Unstructured.io Node: 'utf-8' codec can't decode byte 0xc4 #7714


Open
shubhsrathore opened this issue Apr 19, 2025 · 3 comments
Labels
bug Something isn't working

Comments

@shubhsrathore

Bug Description

I’m using Langflow with a flow that starts with an Unstructured.io node to convert files to text. The flow works fine when run inside Langflow itself. When I test the flow’s API endpoint using Postman, however, I get a 500 Internal Server Error with the following message:

{
"message": "'utf-8' codec can't decode byte 0xc4 in position 171: invalid continuation byte"
}

This error persists even after trying several fixes, like converting the file to UTF-8 and specifying different encodings in Postman. I’m a non-technical user, so I need help understanding what’s causing this and how to resolve it.
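
For context, this message generally means the bytes being decoded are not valid UTF-8: 0xC4 can start a two-byte UTF-8 sequence, but here the byte that follows it is not a continuation byte. In Latin-1 and Windows-1252, 0xC4 on its own is the letter "Ä". The following is a minimal sketch that reproduces the same kind of message; the sample text is made up and is not taken from the file in this report.

```python
# Sample text is invented for illustration; it is not the reporter's file.
data = "Ärger".encode("latin-1")  # 'Ä' becomes the single byte 0xC4

try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xC4 announces a two-byte sequence, but the next byte is plain ASCII.
    message = str(exc)
    print(message)

# The same bytes decode cleanly once the matching encoding is named.
text = data.decode("latin-1")
print(text)
```

If the uploaded file contains a byte like this, the decoder on the receiving side would need to be told the real encoding, or the file would need to be converted to UTF-8 first.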

Reproduction

To Reproduce

  1. Create a Langflow flow with an Unstructured.io node as the first node to process a file (e.g., a CSV or text file) and convert it to text.

  2. Use Postman to send a POST request to the flow’s API endpoint (e.g., http:///api/v1/run/).

  3. In Postman, set up the request:

     Method: POST

     Headers:
       Content-Type: multipart/form-data
       Accept: application/json

     Body (form-data):
       Key: files, Value: Upload a file (e.g., a CSV or text file)
       (Optional) Key: encoding, Value: utf-8 or latin1

  4. Send the request and observe the error:

{
"message": "'utf-8' codec can't decode byte 0xc4 in position 171: invalid continuation byte"
}
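
To make the request in the steps above easier to reason about, here is a minimal sketch of how a single-file multipart/form-data body is put together. The field name `files` comes from the steps; `build_multipart`, the sample filename, and the sample bytes are hypothetical. Note that the Content-Type header of such a request carries a `boundary` parameter, which HTTP clients normally generate themselves for form-data bodies. Whether the run endpoint accepts uploads in this form is not confirmed here.

```python
import uuid


def build_multipart(field_name, filename, file_bytes):
    """Return (content_type, body) for a single-file multipart/form-data upload.

    Hypothetical helper for illustration; not part of Langflow or Postman.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    # The file bytes are inserted untouched; nothing here decodes them as text.
    body = head + file_bytes + tail
    content_type = f"multipart/form-data; boundary={boundary}"
    return content_type, body


# Sample bytes are made up; 0xC4 is not valid UTF-8 in this position.
content_type, body = build_multipart("files", "sample.csv", b"name\n\xc4rger\n")
print(content_type)
print(len(body))
```

The body is raw bytes, so if anything on the server side decodes the whole request as UTF-8 text, a non-UTF-8 byte in the file part could produce the error shown above. That is one possibility worth ruling out, not a confirmed cause.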

Expected behavior

The Langflow API should process the file successfully and return the converted text without a 500 error.

Who can help?

No response

Operating System

Mac OS 15.4.1

Langflow Version

Datastax Langflow online version

Python Version

None

Screenshot

Image

Image

Flow File

No response

@shubhsrathore shubhsrathore added the bug Something isn't working label Apr 19, 2025

dosubot bot commented Apr 19, 2025

Hello @shubhsrathore! I'm Dosu, your friendly bot here to help you with bugs, answer your questions, and assist you in becoming a contributor. While you wait for a human maintainer, feel free to ask me anything you need help with!

To resolve the 'utf-8' codec can't decode byte 0xc4 error, you can modify the file reading logic to fall back to a more permissive encoding when UTF-8 decoding fails. Here's a code snippet that demonstrates this approach:

from pathlib import Path


def read_file_content(self, file_path):
    """Read and return the content of a file as text."""
    file_path_ = Path(file_path)
    if not file_path_.is_file():
        return None
    try:
        # Try strict UTF-8 first.
        with file_path_.open(encoding="utf-8") as file:
            return file.read()
    except UnicodeDecodeError:
        # Decoding as UTF-8 again would fail the same way. Latin-1 maps
        # every byte to a character, so this fallback cannot raise.
        with file_path_.open("rb") as f:
            return f.read().decode("latin-1")

This code attempts to read the file as UTF-8 and, if a UnicodeDecodeError occurs, reads the raw bytes and decodes them as Latin-1, which accepts every byte value. The fallback avoids the exception, but some characters may come out wrong if the file actually uses a different encoding [1].


@shubhsrathore
Author

@dosu, that is helpful only if you’re developing Langflow locally or maintaining a custom deployment (e.g., self-hosted on your own infrastructure with access to the source code). It doesn’t apply to the Datastax online version, where you don’t have access to the codebase or the Path(file_path) logic.


dosubot bot commented Apr 19, 2025

In the Datastax online version of Langflow, you might not have direct access to modify the codebase. However, you can try the following workaround to address the encoding issue:

  1. File Encoding Conversion: Before uploading the file to Langflow, ensure that it is encoded in UTF-8. You can use a text editor or a command-line tool to convert the file encoding. For example, using the iconv command in a terminal:

    iconv -f original_encoding -t utf-8 input_file > output_file

    Replace original_encoding with the current encoding of your file (e.g., latin1), input_file with the path to your file, and output_file with the path where you want to save the converted file.

  2. Postman Encoding Parameter: When sending the request via Postman, ensure that the encoding parameter is set correctly. If UTF-8 doesn't work, try using latin1 or another encoding that matches your file's original encoding.

  3. Contact Support: If the issue persists, consider reaching out to the support team of the Datastax online version for further assistance, as they might have specific solutions or configurations to handle such encoding issues.

These steps should help mitigate the encoding problem without needing direct access to the Langflow codebase.
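
If a command-line tool is not convenient, the conversion in step 1 can also be sketched in Python. This is a minimal example under stated assumptions: `convert_to_utf8` is a hypothetical helper, the file names are placeholders, and the candidate encodings (Windows-1252, then Latin-1) are guesses that happen to be common for Western European text. If the file uses some other encoding, the output will contain wrong characters even though no error is raised.

```python
import tempfile
from pathlib import Path


def convert_to_utf8(src, dst, fallback_encodings=("cp1252", "latin-1")):
    """Write a UTF-8 copy of src to dst and return the encoding that decoded it.

    Hypothetical helper; the fallback encodings are assumptions, not detection.
    """
    raw = Path(src).read_bytes()
    for encoding in ("utf-8", *fallback_encodings):
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # Try the next candidate encoding.
        Path(dst).write_bytes(text.encode("utf-8"))
        return encoding
    raise ValueError(f"could not decode {src} with any candidate encoding")


# Demonstration with a made-up file written in Windows-1252.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "input.csv"
    dst = Path(tmp) / "output.csv"
    src.write_bytes("name\nÄrger\n".encode("cp1252"))
    detected = convert_to_utf8(src, dst)
    converted = dst.read_bytes()

print(detected)
```

A file that is already valid UTF-8 is copied unchanged and reported as `utf-8`, which is a quick way to check whether the file itself is the source of the bad byte.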

