blob.upload_from_string get error Caused by SSLError(SSLEOFError #1242


Closed
sky-silence opened this issue Mar 21, 2024 · 3 comments
Labels
api: storage Issues related to the googleapis/python-storage API.

Comments

sky-silence commented Mar 21, 2024

Hi, I've been getting this error since mid-March; before that, everything was working well.

Environment details

  • OS type and version: deployed on Google Cloud Functions
  • Python version: 3.11
  • pip version: -
  • google-cloud-storage 2.16.0

Code example

# example
import logging
import os
import random
import requests
import string
from bs4 import BeautifulSoup
from google.cloud import storage


def html_to_attachment(request_json):
    storage_client = storage.Client()
    bucket_name = os.environ.get("CLOUD_STORAGE_BUCKET")
    bucket = storage_client.bucket(bucket_name)
    blob_name = generate_random_string(12) + '.txt'
    blob = bucket.blob(blob_name)

    text_content = html_to_text(request_json['html'])
    logging.warning("html to attachment text content is: " + text_content)
    blob.upload_from_string(text_content, content_type='text/text')

    blob_url = f'https://storage.cloud.google.com/{bucket_name}/{blob_name}'
    request_json['attachments'] = [{'url': blob_url}]
    return blob_url


def beyond_to_attachment(request_json):
    """This function is Beyond specific."""
    storage_client = storage.Client()
    bucket_name = os.environ.get("CLOUD_STORAGE_BUCKET")
    bucket = storage_client.bucket(bucket_name)
    blob_name = generate_random_string(12) + '.txt'
    blob = bucket.blob(blob_name)

    # Extract all the <a> tag texts
    soup = BeautifulSoup(request_json['html'], 'html.parser')
    a_tags = soup.find_all('a')
    a_tag_texts = [a_tag.get_text(strip=True) for a_tag in a_tags]
    # Find the order URL then download the order
    a_tag_texts = [text for text in a_tag_texts if text and text.startswith('http')]
    url = a_tag_texts[0]
    response = requests.get(url)
    html_content = response.text

    text_content = html_to_text(html_content)
    blob.upload_from_string(text_content, content_type='text/text')

    blob_url = f'https://storage.cloud.google.com/{bucket_name}/{blob_name}'
    request_json['attachments'] = [{'url': blob_url}]
    return blob_url


def html_to_text(html_content):
    # Parse the HTML content
    soup = BeautifulSoup(html_content, 'html.parser')

    # Process list items
    for li in soup.find_all('li'):
        li.string = f'* {li.string}'
        li.insert_after('\n')

    # Process tables
    for row in soup.find_all('table'):
        row.insert_before('{====\n')
        row.insert_after('====}\n')

    for row in soup.find_all('tr'):
        row.insert_before('[--\n')
        row.insert_after('--]\n')

    for cell in soup.find_all(['td', 'th']):
        cell.insert_after('\n')

    processed_text = soup.get_text(separator='\n', strip=True)
    return processed_text


def generate_random_string(length):
    """Generates a random string of the specified length.

    Args:
      length: The desired length of the random string.

    Returns:
      A random string consisting of letters and digits.
    """
    characters = string.ascii_letters + string.digits
    random_string = ''.join(random.choice(characters) for _ in range(length))
    return random_string

Stack trace

`requests.exceptions.SSLError: HTTPSConnectionPool(host='storage.googleapis.com', port=443): Max retries exceeded with url: /upload/storage/v1/b/peblla-order-test/o?uploadType=multipart&ifGenerationMatch=0 (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1006)')))`

Uploading the file with upload_from_string works in the html_to_attachment function, but the same call in beyond_to_attachment fails to upload the string with what appears to be an SSL-related error.

Thanks!

@product-auto-label product-auto-label bot added the api: storage Issues related to the googleapis/python-storage API. label Mar 21, 2024

cdeln commented Jun 7, 2024

Hi @sky-silence. Several others and I are having similar problems. Could you please put together a minimal reproducing script? (Your example file is incomplete.) It would be very helpful for many fellow storage users! Thanks

Contributor

cojenco commented Jun 7, 2024

Thanks for reporting. We're looking into this, and it seems to be caused by issues in the underlying CPython and urllib3 packages. The urllib3 open issue is not resolved yet, so I'd suggest trying a few workarounds in the meantime:

  • Switch to Python 3.9 (the CPython issue does not affect Python versions < 3.10)
  • Ensure that retries are enabled for the uploads: either (1) apply if_generation_match preconditions with the upload method, as demonstrated here, or (2) always retry by using DEFAULT_RETRY, as shown below
from google.cloud.storage.retry import DEFAULT_RETRY

# `blob` and `text_content` are the names used in the reporter's code above
blob.upload_from_string(text_content, retry=DEFAULT_RETRY)
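For illustration, the two retry options above could be wrapped in one small helper. This is only a sketch: `upload_text` and its `only_if_absent` flag are hypothetical names, not part of google-cloud-storage, and the `text/plain` content type is an assumption. The keyword arguments `if_generation_match` and `retry` are the ones `Blob.upload_from_string` accepts.

```python
def upload_text(blob, text, *, only_if_absent=False, retry=None):
    """Call blob.upload_from_string with one of the two retry options.

    `blob` is expected to be a google.cloud.storage.Blob (or any object
    with a compatible upload_from_string method).
    """
    kwargs = {"content_type": "text/plain"}  # assumed content type
    if only_if_absent:
        # Option (1): generation 0 means "succeed only if the object does
        # not exist yet", which makes the upload safe to retry.
        kwargs["if_generation_match"] = 0
    if retry is not None:
        # Option (2): pass a retry policy explicitly, e.g. DEFAULT_RETRY.
        kwargs["retry"] = retry
    blob.upload_from_string(text, **kwargs)


# Usage with the real client (not executed here):
#   from google.cloud.storage.retry import DEFAULT_RETRY
#   upload_text(blob, text_content, only_if_absent=True)   # option (1)
#   upload_text(blob, text_content, retry=DEFAULT_RETRY)   # option (2)
```

Note that option (2) retries unconditionally, so a retried request may overwrite an object written in the meantime; the precondition in option (1) avoids that.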

[cpython]

  • The CPython bug affects Python 3.10+. Errors such as broken pipe and connection reset by peer, which occur when the connection is interrupted during SSL communication, are now all raised as SSLEOFError.
  • The bug is fixed in the latest Python version, but the fix has not been backported to Python 3.10 and 3.11. Code running on Python 3.9 or earlier needs no action.
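A quick way to see whether a given runtime falls in the affected range is a version check like the sketch below. The function name is hypothetical, and because the thread does not say which release contains the fix, the check deliberately has no upper bound: it only separates "3.9 or earlier" from "3.10 or later".

```python
import sys


def may_report_ssl_eof(version_info=sys.version_info):
    """Return True if this interpreter is in the range described above.

    Python 3.10+ may surface interrupted connections as SSLEOFError.
    The fixed release is not named in this thread, so newer versions are
    conservatively reported as possibly affected too.
    """
    return tuple(version_info[:2]) >= (3, 10)


if __name__ == "__main__":
    print("possibly affected:", may_report_ssl_eof())
```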

[urllib3]

  • The open issue is not yet resolved. The main impact is that connection errors are misinterpreted as SSLEOFError, and the urllib3 library raises the SSLEOFError as a result, whereas urllib3 gracefully handles broken pipe and connection reset by peer errors.

urllib3.exceptions.SSLError is considered a retryable error for upload operations in the Python storage client as long as retry is enabled. However, without further resolution in the underlying urllib3 library, it's possible that the SSL errors will exhaust retries if they are exacerbated by overall network instability.
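If the client's own retries are exhausted, one possible extra safeguard is an application-level retry with exponential backoff around the upload call. The sketch below is not part of the storage client: `call_with_backoff` and all of its parameters are hypothetical, and it uses only the standard library. The default `retry_on` covers the stdlib `ssl.SSLError` family (including `SSLEOFError`) and `ConnectionError`; the `requests.exceptions.SSLError` seen in the stack trace is a different class and would have to be passed in explicitly.

```python
import ssl
import time


def call_with_backoff(fn, *, attempts=4, base_delay=0.5,
                      retry_on=(ssl.SSLError, ConnectionError),
                      sleep=time.sleep):
    """Call fn(), retrying on the given exceptions with exponential backoff.

    Delays are base_delay, 2*base_delay, 4*base_delay, ... seconds.
    The last failure is re-raised unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * (2 ** attempt))


# Usage with the upload from this issue (not executed here):
#   import requests
#   call_with_backoff(
#       lambda: blob.upload_from_string(text_content, if_generation_match=0),
#       retry_on=(requests.exceptions.SSLError,),
#   )
```

As with the client-side retries, repeating an upload is only safe when the request is idempotent, which is why the usage comment keeps the `if_generation_match=0` precondition.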

@cojenco cojenco self-assigned this Jun 12, 2024
Contributor

cojenco commented Aug 27, 2024

The team has still had no success reproducing this. Since there is currently a workaround, I am going to close this issue. If you feel that this did not resolve your issue, please feel free to reopen.

@cojenco cojenco closed this as completed Aug 27, 2024