Skip to content

stream: socket.setEncoding(null) to receive binary Buffers rather than strings has no effect #6038

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
inversion opened this issue Apr 4, 2016 · 26 comments
Labels
doc Issues and PRs related to the documentations. stream Issues and PRs related to the stream subsystem.

Comments

@inversion
Copy link
Contributor

  • Version: v5.8.0
  • Platform: Darwin 15.3.0 Darwin Kernel Version 15.3.0: Thu Dec 10 18:40:58 PST 2015; root:xnu-3248.30.4~1/RELEASE_X86_64 x86_64
  • Subsystem: stream

I've tried node 5.8.0, 5.10.0 and 4.4.2. I always receive string objects in my data handler unless I manually remove the decoder from the socket readableState:

const server = net.createServer( ( c ) => {

    // See https://nodejs.org/api/stream.html#stream_readable_setencoding_encoding
    c.setEncoding( null );

    // Hack that must be added to make this work as expected
    delete c._readableState.decoder;

    c.on( 'data', ( d ) => {

        // Expect 'object' but get 'string' without deleting the decoder manually
        console.log( typeof d );
    } );

} );
@ChALkeR ChALkeR added the net Issues and PRs related to the net subsystem. label Apr 4, 2016
@inversion
Copy link
Contributor Author

Perhaps this might be a bug in ReadableStream based on a cursory examination of https://github.com/nodejs/node/blob/master/lib/_stream_readable.js#L193

I haven't contributed to Node before but if it seems like a correct approach to null out the encoder if enc === null in that function then I would be more than happy to submit a PR/tests.

@mscdex mscdex added the stream Issues and PRs related to the stream subsystem. label Apr 4, 2016
@mscdex
Copy link
Contributor

mscdex commented Apr 4, 2016

Doc change happened in #5155.

@TriAnMan @silverwind @jasnell From what I can tell passing null has never disabled an existing StringDecoder instance?

@mscdex mscdex removed the net Issues and PRs related to the net subsystem. label Apr 4, 2016
@inversion inversion changed the title net: socket.setEncoding(null) to receive binary Buffers rather than strings has no effect stream: socket.setEncoding(null) to receive binary Buffers rather than strings has no effect Apr 4, 2016
@silverwind
Copy link
Contributor

Hmm, that doc change looks wrong, actually. string_decoder does fallback to 'utf8' when encoding is falsy.

@TriAnMan
Copy link

I've using this workaround for v4.3.1 and for v0.10.36
Here is a code snippet:

var processServerResponse = function (data) {
    data.setEncoding(null); // Fix the bug with braked utf8 symbols, converted into two unicode REPLACEMENT CHARACTER

    var str = '';
    data.on('data', function (chunk) {
        str += chunk;
    });
    data.on('end', function () {
        // do something with str
    });
};

http.request({/* some valid http options */}, processServerResponse).end();

I don't know what is a typeof chunk but can test.

@TriAnMan
Copy link

If I use data.setEncoding(null) then typeof chunk is a 'string'
If I comment out setEncoding then typeof chunk becomes an 'object'

This is for v4.3.1, can't test v0.10.36 right now.

@jasnell
Copy link
Member

jasnell commented Apr 13, 2016

@mscdex ... as far as I know, utf8 is the default when setEncoding is null. If calling data.setEncoding('buffer') does not make it return a buffer, then that's a change we should make. We use encoding = 'buffer' in other places to explicitly ask for returning buffers (e.g. in 'fs')

@mscdex
Copy link
Contributor

mscdex commented Apr 13, 2016

@jasnell IIRC it does not return buffers using 'buffer' since the value is passed to StringDecoder() which would treat it as a single-byte/binary string encoding.

@timelesshaze
Copy link

It doesn't work with 'buffer' because it's not a valid encoding.

Buffer.isEncoding('buffer')
> false

The problem is caused by the falsy check in string_decoder.js as @silverwind already said.
Another quick fix is to use non-flowing mode.

'use strict';
const cp = require('child_process');
const child = cp.spawn('ls', { stdio:['ignore','pipe','ignore'] });

// This works
child.stdout.on('readable', () => {
  let chunk;
  while (null !== (chunk = child.stdout.read())) {
    console.log('read %s', typeof chunk);
  }
});

// Type will always be a string
child.stdout.setEncoding(null);
child.stdout.on('data', (chunk) => {
  console.log('onData %s', typeof chunk);
});

@mhart
Copy link
Contributor

mhart commented May 31, 2016

This workaround also works – pretty hacky but does the trick until this is fixed (and backported?):

// Simulate socket.setEncoding(null):
if (socket._readableState) {
  delete socket._readableState.decoder
  delete socket._readableState.encoding
}

EDIT: Sorry, just noticed this is basically in the issue description – my bad for not reading fully!

addaleax added a commit to addaleax/node that referenced this issue Apr 30, 2017
Make `.setEncoding()` reset streams to their default state of
not decoding data.

Fixes: nodejs#6038
@addaleax
Copy link
Member

PR to fix: #12766

@Trott Trott added the doc Issues and PRs related to the documentations. label Aug 6, 2017
@Trott
Copy link
Member

Trott commented Aug 6, 2017

The PR to fix appears stalled. A doc-only update as at least an interim would be welcome. Labeling doc....

@toficofi
Copy link

A year on, and I'm still having to apply the hack to read binary :(

@Trott
Copy link
Member

Trott commented Jun 14, 2018

@nodejs/streams

@jasnell
Copy link
Member

jasnell commented Oct 19, 2018

Ping @nodejs/streams ... @mafintosh @mcollina ... any thoughts here?

@mcollina
Copy link
Member

This needs to be done. It would have to wait until December :/

@thejhh
Copy link

thejhh commented Dec 27, 2018

This seems to be an issue also with the http.IncomingMessage.

I had to use the hack to read binary body to Buffer for HTTP request. I was using Node v10.13.0.

@mscdex
Copy link
Contributor

mscdex commented Dec 28, 2018

@jheusala As long as you don't call setEncoding(), you should get Buffer chunks on the incoming message stream by default.

@thejhh
Copy link

thejhh commented Dec 28, 2018

@mscdex That's what I thought, too. However I have nowhere in my code another .setEncoding(), nor "encoding" or "utf" strings found. (I tried to do a smaller example but it works as expected; so something else in my code must be triggering it as string stream.)

@thejhh
Copy link

thejhh commented Dec 28, 2018

Strange. It started to work also on my complete app. I could swear it was receiving strings, not Buffers, last evening.

@jasnell
Copy link
Member

jasnell commented Jun 25, 2020

@nodejs/streams ... what's the status on this one? Should this remain open?

@mcollina
Copy link
Member

The work of @addaleax to fix this (#12766) never landed.

I'm closing due to lack of activity/interest in this problem.

@mhart
Copy link
Contributor

mhart commented Jun 28, 2020

Should this behaviour be documented at least?

@addaleax
Copy link
Member

@dchest I don’t know if this can be fixed, though. There’s a specific reason why my PR wasn’t merged – the decoding step happens at the time the data is pushed into the readable stream but before it is actually read, that means, the buffered data inside the stream is potentially already decoded. We could re-encode it, but the problem is that the result is potentially different from the original input data in that case.

So I think this is something that would need to be documented, yes.

@mhart
Copy link
Contributor

mhart commented Jun 28, 2020

@addaleax isn't that true for any setEncoding call though? Why would it only be setEncoding(null) that would be affected by that?

@addaleax
Copy link
Member

@mhart Right, for other encodings this is problematic as well.

I think the way to fix this would be to move decoding to the point where the chunks are emitted, rather than the point where they are pushed into the stream. That would enable solving this for both null/'buffer' and other encodings.

@ronag
Copy link
Member

ronag commented Jun 28, 2020

think the way to fix this would be to move decoding to the point where the chunks are emitted, rather than the point where they are pushed into the stream.

That sounds like a way forward... though probably a bit difficult to implement.

@ronag ronag mentioned this issue Jun 28, 2020
25 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc Issues and PRs related to the documentations. stream Issues and PRs related to the stream subsystem.
Projects
None yet
Development

Successfully merging a pull request may close this issue.