
Streaming contents to S3 fails in latest typescript s3 sdk #7048

@CWMark

Description

Checkboxes for prior research

Describe the bug

Use case: We have a folder on disk whose contents we want to stream into a tar stream and then stream to S3. For performance reasons we don't want to create the tar file on disk. We want to upgrade our AWS S3 SDK from 3.282.0 to 3.782.0 to take advantage of a new S3 feature where the SDK calculates checksums for us, which is a requirement for enabling object lock by default. Currently we do not have object lock enabled by default.

We now get "Unable to calculate hash for flowing readable stream" errors, which seem to come from https://github.com/smithy-lang/smithy-typescript/blob/2d5a06af634c243bf2469566176dd17afedb1058/packages/hash-stream-node/src/readableStreamHasher.ts#L11C22-L11C38
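
For reference, the check in that file behaves roughly like the sketch below (a paraphrase of the linked source, not a verbatim copy; assertHashable is a name used here only for illustration):

import { Readable } from 'node:stream';

// Paraphrase of the readableStreamHasher guard (illustrative only): once anything
// has started consuming the stream (readableFlowing !== null), the hasher can no
// longer read it from the beginning to compute a checksum, so it throws instead.
function assertHashable(readableStream: Readable): void {
  if (readableStream.readableFlowing !== null) {
    throw new Error('Unable to calculate hash for flowing readable stream');
  }
}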

Logic:

  1. Create a tar stream from the data on disk.
  2. Create a buffered transform (needed because with the latest AWS SDK version you cannot push chunks smaller than 8 KB).
  3. Use pipeline to push the data to S3.

Old logic that works with older versions of the SDK:

  1. Create a tar stream from the data on disk.
  2. Use a PassThrough and pipeline to push the data to S3 (sketched below).
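
A minimal sketch of this older approach, assuming the tar Pack and its known byte size come from the surrounding code (uploadTar is a name used here only for illustration; the author's actual helper appears later in the thread):

import { PassThrough, type Readable } from 'node:stream';
import { pipeline } from 'node:stream/promises';
import { PutObjectCommand, S3Client } from '@aws-sdk/client-s3';

const client = new S3Client({});

// tarStream is the Pack produced from the folder contents; size is its known
// byte length (both come from the surrounding code, as in the author's helper)
async function uploadTar(tarStream: Readable, size: number, bucket: string, key: string): Promise<void> {
  const passThrough = new PassThrough();
  await Promise.all([
    pipeline(tarStream, passThrough),
    client.send(new PutObjectCommand({
      Bucket: bucket,
      Key: key,
      Body: passThrough,
      ContentLength: size,
    })),
  ]);
}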

This old logic does not work with the newer version of the SDK:

  • It generates "Only the last chunk is allowed to have a size less than 8192 bytes" when a plain PassThrough is used instead of the custom transform.

Regression Issue

  • Select this option if this issue appears to be a regression.

SDK version number

@aws-sdk/client-s3": "3.782.0"

Which JavaScript Runtime is this issue in?

Node.js

Details of the browser/Node.js/ReactNative version

node version: v20.17.0

Reproduction Steps

import { Transform, TransformCallback } from 'node:stream';

// the minimum amount of data to send to S3 at once (8 KiB)
const CHUNK_SIZE = 8 * 1024;

export class BufferedTransform extends Transform {
  private _bufferChunks: Buffer[] = [];
  private _bufferLength = 0;

  constructor(
    startPaused = true
  ) {
    super();
    if (startPaused) {
      this.pause(); // Start paused to avoid immediate flowing
    }
  }

  _transform(chunk: Buffer, _encoding: BufferEncoding, callback: TransformCallback): void {
    // Defensive copy of the incoming chunk; don't assume the caller won't modify or destroy it before we push it
    const safeChunk = Buffer.from(chunk);
    this._bufferChunks.push(safeChunk);
    this._bufferLength += safeChunk.length;

    if (this._bufferLength >= CHUNK_SIZE) {
      this.pushDataAndClearBuffer();
    }

    callback();
  }

  override resume(): this {
   // logging_line1
    return super.resume();
  }

  _flush(callback: TransformCallback): void {
    this.pushDataAndClearBuffer();
    callback();
  }

  // a helper function that pushes and resets the buffer
  pushDataAndClearBuffer(): void {
    if (this._bufferLength > 0) {
      const combined = Buffer.concat(this._bufferChunks, this._bufferLength);
      // logging_line2
      if (!this.push(combined)) {
        // Pause the stream if the downstream is not ready
        this.once('drain', () => this.resume());
        this.pause();
      }

      // Reset internal buffer state
      this._bufferChunks.length = 0;
      this._bufferLength = 0;
    }
  }
}



// Method of the uploader class (this.client is an S3Client instance). Assumed imports:
//   import { pipeline } from 'node:stream/promises';
//   import { PutObjectCommand, PutObjectCommandInput } from '@aws-sdk/client-s3';
//   import { Pack } from 'tar-stream'; // assumed source of the Pack type
private async uploadSingleFile(
    stream: Pack,
    size: number,
    params: Omit<PutObjectCommandInput, 'Body'>
  ): Promise<void> {
    const bufferedStream = new BufferedTransform(true);
    const sendCommandPromise = this.client.send(
      new PutObjectCommand({ ...params, Body: bufferedStream, ContentLength: size })
    );

/*
   if (stream.readableFlowing === true) {
      logging_line3
    } else {
      logging_line4
    }
    if (bufferedStream.readableFlowing === true) {
      logging_line5
    } else {
      logging_line6
    }
*/
    
    const pipelinePromise = pipeline(stream, bufferedStream);

/*
   if (stream.readableFlowing === true) {
      logging_line7
    } else {
      logging_line8
    }
    if (bufferedStream.readableFlowing === true) {
      logging_line9
    } else {
      logging_line10
    }
*/
    await Promise.all([pipelinePromise, sendCommandPromise]);
  }

Observed Behavior

Error "Unable to calculate hash for flowing readable stream" is generated with the exception

Expected Behavior

Be able to upload to S3 using streams and pipeline.

Possible Solution

No response

Additional Information/Context

I added logging in many places; these are the results:

  • The logging line added to "override resume()", i.e. logging_line1, is never logged
  • The logging line added to "pushDataAndClearBuffer", i.e. logging_line2, is never logged
  • The logging added to uploadSingleFile indicates "readableFlowing" is never true, neither before nor after the pipeline

aBurmeseDev (Contributor) commented on May 7, 2025

Hey @CWMark - thanks for reaching out.

To resolve the "Only the last chunk is allowed to have a size less than 8192 bytes" error, add the client config shown below:

import { S3Client } from "@aws-sdk/client-s3";

new S3Client({
  requestStreamBufferSize: 32 * 1024,
});

Refer to this comment for the culprit: #6859 (comment)

Hope that helps!

CWMark (Author) commented on May 7, 2025

Hi @aBurmeseDev, thanks for the response.

I am aware of that setting to allow chunks smaller than 8192 bytes. However, if I use it, revert to my original code (no use of my buffering class BufferedTransform), and simply have this as the function that handles small files:

private async uploadSingleFile(
    stream: Pack,
    size: number,
    params: Omit<PutObjectCommandInput, 'Body'>
  ): Promise<void> {
    const passThrough = new PassThrough();
    const pipelinePromise = pipeline(stream, passThrough);
    await Promise.all([
      pipelinePromise,
      this.client.send(new PutObjectCommand({ ...params, Body: passThrough, ContentLength: size }))
    ]);
  }

Then I get a lot more errors, all of which so far seem to be "An error was encountered in a non-retryable streaming request". This seems to be the same thing reported in #6770.

Another thing I tried was to keep my buffering class BufferedTransform as-is but change the function that handles small files to defer the pipeline, in the hope of giving AWS time to set up the checksum:

private async uploadSingleFile(
    stream: Pack,
    size: number,
    params: Omit<PutObjectCommandInput, 'Body'>
  ): Promise<void> {
    try {
      const bufferedStream = new BufferedTransform();
      const sendCommandPromise = this.client.send(
        new PutObjectCommand({ ...params, Body: bufferedStream, ContentLength: size })
      );

      // Defer pipeline to next tick to avoid stream getting touched before AWS hooks
      const pipelinePromise = new Promise<void>((resolve, reject) => {
        process.nextTick(() => {
          pipeline(stream, bufferedStream).then(resolve).catch(reject);
        });
      });

      await Promise.all([pipelinePromise, sendCommandPromise]);
    } catch (e) {
      // istanbul ignore next
      this.logger.error(`New Error uploading single file: ${e}`);
      throw e;
    }
  }

This gives fewer errors; however, I have still seen some "An error was encountered in a non-retryable streaming request" errors and one "Unable to calculate hash for flowing readable stream".

Any suggestions/ideas? It really seems like the latest AWS S3 SDK is having issues with streaming data.

DMCTowns commented on Jul 23, 2025

I'm also getting the "Unable to calculate hash for flowing readable stream" error when trying to stream the response from a fetch request to S3:

// Example URL: https://docs.rs-online.com/4892/0900766b811eacb3.pdf
// (s3Client, S3Config, url and type are defined elsewhere in the surrounding code)
fetch(url).then(response => {
    if (response.ok && response.body) {
      const key = `queue/${url.replace(/^https?:\/\//i, '')}`
      const input = {
        Body: response.body,
        Bucket: S3Config.bucket.upload,
        Key: key,
        ContentType: 'application/pdf'
      }
      const command = new PutObjectCommand(input)
      return s3Client.send(command).then((result: PutObjectCommandOutput) => ({
        uri: `s3://${S3Config.bucket.upload}/${key}`,
        type,
        eTag: result.ETag,
        version: result.VersionId
      }))
    }
  })

marcesengel commented on Aug 4, 2025

@DMCTowns what I've found is that the typings are a little off, since they are shared between the web and Node.js. Looking at https://github.com/smithy-lang/smithy-typescript/blob/b15137dba0091b26b3dd5d6efaac58545cf1c18a/packages/hash-stream-node/src/readableStreamHasher.ts#L11-L13, the field readableFlowing is checked to be null. This works for Node.js readable streams but not for the web streams returned by fetch. It can be fixed with the following conversion:

import type { ReadableStream as ReadableWebStream } from 'node:stream/web'
import { Readable } from 'node:stream'

const res = await fetch(...)
const nodeStream = Readable.fromWeb(res.body as ReadableWebStream)
// send nodeStream to s3

However, at that point uploading still fails for me, now with the warning "An error was encountered in a non-retryable streaming request." (see #6770), so for now I'm collecting the stream into a buffer and then sending that...
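
A minimal sketch of that buffering workaround, assuming res is the fetch response, s3Client and PutObjectCommand are set up as in the earlier snippets, and the bucket/key are placeholders:

// Buffer the whole response body, then hand the Buffer to PutObject so the SDK
// can hash it without re-reading a stream (bucket and key are placeholders)
const body = Buffer.from(await res.arrayBuffer())
await s3Client.send(new PutObjectCommand({
  Bucket: 'my-bucket',
  Key: 'queue/example.pdf',
  Body: body,
  ContentLength: body.length
}))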

Edit: uploading works using @aws-sdk/lib-storage.
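
For reference, a minimal sketch of the @aws-sdk/lib-storage approach, assuming nodeStream is the converted stream from the snippet above and the bucket/key are placeholders:

import { S3Client } from '@aws-sdk/client-s3'
import { Upload } from '@aws-sdk/lib-storage'

// lib-storage's Upload consumes the body itself and handles chunking/multipart
// for larger payloads, which is presumably why it works here
const upload = new Upload({
  client: new S3Client({}),
  params: {
    Bucket: 'my-bucket',      // placeholder
    Key: 'queue/example.pdf', // placeholder
    Body: nodeStream          // the Readable.fromWeb(...) stream from above
  }
})
await upload.done()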

Metadata

Labels: bug (This issue is a bug), p2 (This is a standard priority issue), potential-regression (Marking this issue as a potential regression to be checked by team member)

Participants: @marcesengel, @DMCTowns, @CWMark, @aBurmeseDev