Description
Typically, the advice for handling an untrusted io.Reader is to avoid reading excessively large values into memory by wrapping the reader in an io.LimitedReader. For encoding/json.NewDecoder this is not necessarily a reasonable approach, since the Decoder may be intended to read from a long-lived stream (e.g. a net.Conn) where the user does not want to limit the total number of bytes read from the Reader across multiple reads, but does want to limit the number of bytes read during a single call to Decoder.Decode (e.g. reading 100 messages of 500 bytes each over the lifetime of the Decoder may be perfectly acceptable, but a single 50,000 byte read is not).
A solution to this problem would be to add a method to Decoder that limits the amount read from the Reader into the internal Decoder buffer on each call to Decoder.Decode, returning an error if the number of bytes required to decode the value exceeds the set limit. Something along the lines of:
// LimitValueSize limits the number of bytes read from the wrapped io.Reader
// during a single call to dec.Decode. If decoding the value requires reading
// more than limit bytes, dec.Decode will return an error.
func (dec *Decoder) LimitValueSize(limit int)
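For illustration, a rough sketch of how the proposed method might be used on a long-lived connection. LimitValueSize does not exist in encoding/json today, and the Message type, handle function, and chosen limit are assumptions made up for this example:

// Hypothetical usage of the proposed API.
dec := json.NewDecoder(conn) // conn is a long-lived net.Conn
dec.LimitValueSize(50000)    // cap the bytes read during each Decode call

for {
	var msg Message // assumed application-defined message type
	if err := dec.Decode(&msg); err != nil {
		// An oversized value would surface as an error here instead of
		// exhausting memory; io.EOF and other errors also land here.
		break
	}
	handle(msg)
}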
cc @dsnet
rittneje commented on Nov 15, 2022
Should the bytes that are already in the decoder's buffer count against this limit on subsequent calls to Decode? I would think so, as otherwise it would lead to inconsistent behavior, but the way it is phrased in your proposal is not entirely clear.

dsnet commented on Nov 15, 2022
The feature seems reasonable in general, but it's unclear whether the limit should be value-based or token-based. Value-based limits make sense when calling Decode, while token-based limits make sense when calling Token.

As a work-around, you could reset the limit on an io.LimitedReader after every Decode call.
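A minimal sketch of that work-around, assuming imports of encoding/json, io, and net; the Message type and the maxValueSize budget are made up for illustration:

const maxValueSize = 1 << 16 // hypothetical per-message byte budget

func readMessages(conn net.Conn, handle func(Message) error) error {
	lr := &io.LimitedReader{R: conn, N: maxValueSize}
	dec := json.NewDecoder(lr)
	for {
		lr.N = maxValueSize // reset the budget before decoding each value
		var msg Message
		if err := dec.Decode(&msg); err != nil {
			if err == io.EOF {
				return nil
			}
			return err // includes the case where the budget was exhausted mid-value
		}
		if err := handle(msg); err != nil {
			return err
		}
	}
}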
Jorropo commented on Nov 15, 2022
@dsnet The annoying thing with LimitedReader is that, as you pointed out, you need to reset it every time.
Jorropo commented on Nov 15, 2022
The goal is to limit how much memory a single message is allowed to keep alive, to help prevent memory exhaustion attacks. So I would say this should limit how big the buffer is allowed to grow, not counting buffering over-reads, since those are an implementation detail. All of this to say: I think the limit should be on how big one message is allowed to be.
Jorropo commented on Nov 15, 2022
I think when the limit is reached, Decode should continue to read (but discard) the data and fully read over the problematic message. This can be implemented with a "simple" state machine that counts how many (), {}, "", ... we have seen (like how the initial buffering already works, except discarding data that has already been read). This means that if you have 2 messages, where the first one is too big but the second one is not, Decode would error the first time, but if you call it again it would successfully read the second message.
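As a rough illustration of the kind of state machine described here (not part of the proposal or of encoding/json; bufio is the assumed input wrapper), a routine that skips one complete object, array, or string value by tracking nesting and string state:

// skipValue discards bytes until one complete JSON object, array, or string
// has been consumed. It assumes r is already positioned at the opening
// '{', '[', or '"' of the value and performs no validation beyond bracket
// and string tracking.
func skipValue(r *bufio.Reader) error {
	depth := 0
	inString := false
	escaped := false
	for {
		b, err := r.ReadByte()
		if err != nil {
			return err
		}
		switch {
		case escaped:
			escaped = false
		case inString:
			if b == '\\' {
				escaped = true
			} else if b == '"' {
				inString = false
			}
		case b == '"':
			inString = true
		case b == '{' || b == '[':
			depth++
		case b == '}' || b == ']':
			depth--
		}
		if depth == 0 && !inString {
			return nil // the value has been fully consumed (and discarded)
		}
	}
}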
rsc commented on Nov 16, 2022
This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group
rsc commented on Nov 30, 2022
If you call SetValueLimit and then only call Token, is there just no limit?
Using an io.LimitedReader where you reset the limit is also a possibility, of course,
and it seems clearer what it means.
Maybe part of the problem is when the reset happens. What if there was just a SetInputLimit that applied to all future reads, and you have to reset it yourself when you want to allow more data?
rsc commented on Dec 7, 2022
@rolandshoemaker
Any thoughts on SetInputLimit (with no automatic reset)?
Or about using an io.LimitedReader directly?
rolandshoemaker commented on Dec 8, 2022
I think in cases where you are directly calling the Decoder, the usage of an io.LimitedReader that you reset yourself is reasonable, but it doesn't address the case where you pass a Decoder to something else which calls Token or Decode, where you may not be able to determine when you need to reset your reader's limit.
I've paged out the context about how the parser consumes bytes, but it seems like the limit could generally apply to both values and tokens equally, such that it is essentially only limiting the number of bytes being read from the reader for a particular parsing step. I may be misunderstanding though. @dsnet, were you saying that it may make more sense to have different limits for values vs. tokens, because tokens are inherently smaller and as such you may want a more constrained limit?
dsnet commented on Dec 12, 2022
I'm not sure I understand this use case. Why would the called function need to check whether a limit exists or be responsible for resetting the readers? That seems to be the responsibility of the caller, not the callee: if the callee just reads from the json.Decoder, then great, the limiter is working as intended.

Possibly yes. There are applications that parse a stream of JSON where the stream is massive. In such an application, you would want to ensure no token exceeds a certain size. Technically, this can be achieved by constantly resetting the per-value limit right before reading every single token, but that's expensive and probably a frustrating API to use. It would be preferable to specify something like dec.LimitTokenSize(1<<20) once and not worry about it again.

Also, with value-specific limiting, it's unclear whether limits can be nested. For example:
1. Calling dec.LimitValueSize right before the top-level value seems to imply that the top-level value cannot exceed N bytes.
2. Suppose we then call dec.Token to read a leading '[' token, which transitions the decoder state to now parse values within a JSON array.
3. What happens if we call dec.LimitValueSize again? Is that going to change the top-level value limit? Or is it setting a nested limit for the immediate next JSON value within the JSON array? Does it panic?

Also, what's the behavior of this functionality with regard to JSON whitespace? Does whitespace count against the limit? Usually, the purpose of a limit is to ensure some reasonable bound on memory usage. While whitespace needs to be parsed, the implementation could be done in a way that only takes O(1) memory to skip N bytes of whitespace.
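As a sketch of the nesting question, using the hypothetical LimitValueSize from this proposal (not an existing encoding/json method):

dec.LimitValueSize(1 << 20)            // presumably limits the next top-level value
if _, err := dec.Token(); err != nil { // consume the leading '[' of a JSON array
	return err
}
dec.LimitValueSize(1 << 10) // replace the outer limit? set a nested limit for the next array element? panic?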
rsc commented on Feb 9, 2023
No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group
[Issue retitled from "proposal: encoding/json: add (*Decoder).SetLimit" to "encoding/json: add (*Decoder).SetLimit"]

gopherbot commented on Feb 28, 2023
Change https://go.dev/cl/471875 mentions this issue:
encoding/json: implement SetLimit
dsnet commented on Mar 3, 2023
I propose renaming this as SetByteLimit, which limits the maximum byte size of a JSON value. It is conceivable that we could also add SetDepthLimit, which limits the maximum nesting depth of a JSON value.
In #58786, @bigpigeon mentioned needing a maximum depth limit, which is a reasonable feature to potentially add.
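A hypothetical shape for the pair of limits described here; neither method exists in encoding/json, and the numbers are only illustrative:

dec.SetByteLimit(1 << 20) // error if a single JSON value exceeds 1 MiB
dec.SetDepthLimit(128)    // error if a value nests more than 128 levels deep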
dsnet commented on Feb 8, 2025
With the formal filing of #71497, we should adjust this issue to track support for the following:
along with the following sentinel errors:
[Issue retitled from "encoding/json: add (*Decoder).SetLimit" to "encoding/json/jsontext: add WithByteLimit and WithDepthLimit", then to "proposal: encoding/json/jsontext: add WithByteLimit and WithDepthLimit"]