
proposal: encoding/json/jsontext: add WithByteLimit and WithDepthLimit #56733

Description

@rolandshoemaker (Member, Author)

Typically, the advice for avoiding reading excessively large values into memory from an untrusted io.Reader is to wrap the reader in an io.LimitedReader. For encoding/json.NewDecoder this is not necessarily a reasonable approach, since the Decoder may be intended to read from a long-lived stream (e.g. a net.Conn) where the user may not want to limit the total number of bytes read from the Reader across multiple reads, but does want to limit the number of bytes read during a single call to Decoder.Decode (e.g. reading 100 messages of 500 bytes each across the lifetime of the Decoder may be perfectly acceptable, but a single 50,000-byte read is not).

A solution to this problem would be to add a method to Decoder that limits the amount read from the Reader into the internal Decoder buffer on each call to Decoder.Decode, returning an error if the number of bytes required to decode the value exceeds the set limit. Something along the lines of:

// LimitValueSize limits the number of bytes read from the wrapped io.Reader
// during a single call to dec.Decode. If decoding the value requires reading
// more than limit bytes, dec.Decode will return an error.
func (dec *Decoder) LimitValueSize(limit int)
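
For illustration, a hypothetical call site using the proposed method (conn, Message, and handle are placeholders; LimitValueSize does not exist yet):

dec := json.NewDecoder(conn) // conn is a long-lived net.Conn
dec.LimitValueSize(50_000)   // hypothetical: cap each decoded value at 50,000 bytes

for {
	var msg Message
	if err := dec.Decode(&msg); err != nil {
		// An oversized value would surface as an error here, instead of
		// the Decoder buffering the entire value into memory first.
		break
	}
	handle(msg)
}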

cc @dsnet

Activity

added this to the Proposal milestone on Nov 14, 2022
moved this to Incoming in Proposals on Nov 14, 2022
@rittneje (Contributor) commented on Nov 15, 2022

Should the bytes that are already in the decoder's buffer count against this limit on subsequent calls to Decode? I would think so, as otherwise it would lead to inconsistent behavior, but the way it is phrased in your proposal is not entirely clear.

@dsnet (Member) commented on Nov 15, 2022

The feature seems reasonable in general, but it's unclear whether the limit should be value-based or token-based. Value-based limits make sense when calling Decode, while token-based limits make sense when calling Token.

As a work-around, you could reset the limit on an io.LimitedReader after every Decode call.
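
For example, a minimal sketch of that work-around (the 1 MiB cap is an arbitrary choice for illustration):

package main

import (
	"encoding/json"
	"io"
	"log"
)

const perValueLimit = 1 << 20 // arbitrary cap per decoded value

func decodeAll(r io.Reader) error {
	lr := &io.LimitedReader{R: r, N: perValueLimit}
	dec := json.NewDecoder(lr)
	for {
		var msg map[string]any
		if err := dec.Decode(&msg); err != nil {
			if err == io.EOF {
				return nil // clean end of stream
			}
			return err // an oversized value surfaces as an unexpected EOF
		}
		lr.N = perValueLimit // re-arm the budget before the next value
		log.Println(msg)
	}
}

One subtlety, per the buffering question above: bytes the Decoder has already buffered were charged to the previous budget, so a value that is partially buffered can effectively exceed the limit.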

@Jorropo (Member) commented on Nov 15, 2022

@dsnet The annoying thing with LimitedReader is that, as you pointed out, you need to reset it every time.

@Jorropo (Member) commented on Nov 15, 2022

> Should the bytes that are already in the decoder's buffer count against this limit on subsequent calls to Decode? I would think so, as otherwise it would lead to inconsistent behavior, but the way it is phrased in your proposal is not entirely clear.

The goal is to limit how much memory a single message is allowed to keep alive, to help prevent memory-exhaustion attacks.
So I would say this should limit how big the buffer is allowed to grow, not counting buffering over-reads, as those are implementation details.
All of this to say: I think this should limit how big one message is allowed to be.

@Jorropo (Member) commented on Nov 15, 2022

I think when the limit is reached, Decode should continue to read (but discard) the data and fully consume the problematic message.
This can be implemented with a "simple" state machine that counts how many [], {}, "", ... we have seen (like how the initial buffering already works, but discarding already-read data).

This means that if you have 2 messages, and the first one is too big but not the second one, Decode would error the first time; however, if you call it again it would successfully read the second message.
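
A rough sketch of that discard pass (hand-rolled for illustration, not how Decode is implemented; it only handles values that start with { or [):

package main

import "io"

// skipValue consumes a single JSON object or array from r without
// retaining it, tracking only nesting depth and string state, so it
// uses O(1) memory no matter how large the value is.
func skipValue(r io.ByteReader) error {
	depth := 0
	inString, escaped := false, false
	for {
		b, err := r.ReadByte()
		if err != nil {
			return err
		}
		switch {
		case escaped:
			escaped = false // the byte after a backslash is literal
		case inString:
			switch b {
			case '\\':
				escaped = true
			case '"':
				inString = false
			}
		default:
			switch b {
			case '"':
				inString = true
			case '{', '[':
				depth++
			case '}', ']':
				depth--
				if depth == 0 {
					return nil // the whole value has been consumed
				}
			}
		}
	}
}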

@rsc (Contributor) commented on Nov 16, 2022

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

moved this from Incoming to Active in Proposals on Nov 16, 2022
@rsc (Contributor) commented on Nov 30, 2022

If you call SetValueLimit and then only call Token, is there just no limit?

Using an io.LimitedReader where you reset the limit is also a possibility, of course,
and it seems clearer what it means.

Maybe part of the problem is when the reset happens. What if there was just a SetInputLimit that applied to all future reads, and you have to reset it yourself when you want to allow more data?
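
For comparison, a hypothetical call site under that semantics (SetInputLimit is not a real method; conn and Message are placeholders):

dec := json.NewDecoder(conn)
dec.SetInputLimit(1 << 20) // hypothetical: a budget covering all future reads

for {
	var msg Message
	if err := dec.Decode(&msg); err != nil {
		return err
	}
	dec.SetInputLimit(1 << 20) // the caller explicitly re-arms the budget
}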

@rsc (Contributor) commented on Dec 7, 2022

@rolandshoemaker
Any thoughts on SetInputLimit (with no automatic reset)?
Or about using an io.LimitedReader directly?

@rolandshoemaker (Member, Author) commented on Dec 8, 2022

I think in cases where you are directly calling the Decoder, using an io.LimitedReader that you reset yourself is reasonable, but it doesn't address the case where you pass a Decoder to something else which calls Token or Decode, where you may not be able to determine when you need to reset your reader's limit.

I've paged out the context about how the parser consumes bytes, but it seems like the limit could generally apply to both values and tokens equally, such that it is essentially only limiting the number of bytes being read from the reader for a particular parsing step. I may be misunderstanding, though. @dsnet, were you saying that it may make more sense to have different limits for values vs. tokens, because tokens are inherently smaller and as such you may want a more constrained limit?

@dsnet (Member) commented on Dec 12, 2022

> it doesn't address the case where you pass a Decoder to something else which calls Token or Decode, where you may not be able to determine when you need to reset your reader's limit.

I'm not sure I understand this use case. Why would the called function need to check whether a limit exists or be responsible for resetting the reader's limit? That seems to be the responsibility of the caller, not the callee (see the sketch after the list below):

  1. If the callee experiences an error as it uses the json.Decoder, then great, the limiter is working as intended.
  2. If the callee function returns successfully, then the caller knows the limit was not hit and resets the limit.
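
In code, that division of responsibility might look like the following sketch (using the io.LimitedReader work-around; processOne stands in for the callee):

lr := &io.LimitedReader{R: conn, N: limit}
dec := json.NewDecoder(lr)

for {
	// The callee just uses the Decoder; it neither knows about
	// nor resets the limiter.
	if err := processOne(dec); err != nil {
		return err // case 1: the limiter (or anything else) tripped
	}
	lr.N = limit // case 2: success, so the caller re-arms the limit
}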

> @dsnet, were you saying that it may make more sense to have different limits for values vs. tokens, because tokens are inherently smaller and as such you may want a more constrained limit?

Possibly, yes. There are applications that parse a massive stream of JSON. In such an application, you would want to ensure no single token exceeds a certain size. Technically, this can be achieved by constantly resetting the per-value limit right before reading every single token, but that's expensive and probably a frustrating API to use. It would be preferable to specify something like dec.LimitTokenSize(1<<20) once and not worry about it again.
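
With only a per-value knob, per-token limiting would look something like this (hypothetical API, shown to illustrate why it would be frustrating):

for {
	dec.LimitValueSize(1 << 20) // hypothetical: re-arm before every Token call
	tok, err := dec.Token()
	if err != nil {
		break // io.EOF at the end of the stream
	}
	process(tok) // process is a placeholder for per-token handling
}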

Also, with value-specific limiting, it's unclear whether limits can be nested. For example:

  • Calling dec.LimitValueSize right before the top-level value seems to imply that the top-level value cannot exceed N bytes.
  • Suppose I then call dec.Token to read a leading '[' token, which transitions the decoder state to now parse values within a JSON array.
  • What's the behavior of now calling dec.LimitValueSize again? Is that going to change the top-level value limit? Or does it set a nested limit for the immediate next JSON value within the JSON array? Does it panic?

Also, what's the behavior of this functionality with regard to JSON whitespace? Does whitespace count against the limit? Usually, the purpose of a limit is to ensure some reasonable bound on memory usage. While whitespace needs to be parsed, the implementation could be done in a way that takes only O(1) memory to skip N bytes of whitespace.

(13 remaining items collapsed)

moved this from Likely Accept to Accepted in Proposals on Feb 9, 2023
@rsc (Contributor) commented on Feb 9, 2023

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

changed the title from "proposal: encoding/json: add (*Decoder).SetLimit" to "encoding/json: add (*Decoder).SetLimit" on Feb 9, 2023
modified the milestones: Proposal, Backlog on Feb 9, 2023
@gopherbot (Contributor) commented on Feb 28, 2023

Change https://go.dev/cl/471875 mentions this issue: encoding/json: implement SetLimit

@dsnet (Member) commented on Mar 3, 2023

I propose renaming this to SetByteLimit, which limits the maximum byte size of a JSON value. It is conceivable that we could also add SetDepthLimit, which limits the maximum nesting depth of a JSON value.

In #58786, @bigpigeon mentioned needing a maximum depth limit, which is a reasonable feature to potentially add.

@dsnet (Member) commented on Feb 8, 2025

With the formal filing of #71497, we should adjust this issue to support the following:

package jsontext

// WithByteLimit sets a limit on the number of input or output bytes
// that may be consumed or produced for each top-level JSON value.
// If a [Decoder] or [Encoder] method call would need to consume/produce
// more than a total of n bytes to make progress on the top-level JSON value,
// then the call will report an error.
// Whitespace before and within the top-level value is counted against the limit.
// Whitespace after a top-level value is counted against the limit
// for the next top-level value.
//
// A non-positive limit is equivalent to no limit at all.
// If unspecified, the default limit is no limit at all.
// This option affects both encoding and decoding.
func WithByteLimit(n int64) Options

// WithDepthLimit sets a limit on the maximum depth of JSON nesting
// that may be consumed or produced for each top-level JSON value.
// If a [Decoder] or [Encoder] method call would need to consume or produce
// a depth greater than n to make progress on the top-level JSON value,
// then the call will report an error.
//
// A non-positive limit is equivalent to no limit at all.
// If unspecified, the default limit is 10000.
// This option affects both encoding and decoding.
func WithDepthLimit(n int) Options

along with the following sentinel errors:

// ErrMaxBytes indicates that the [Encoder] or [Decoder]
// could not make progress since the current top-level JSON value
// has a size that would exceed the specified maximum byte limit.
// This error is directly wrapped within a [SyntacticError] when produced.
var ErrMaxBytes = errors.New("exceeded maximum number of bytes")

// ErrMaxDepth indicates that the [Encoder] or [Decoder]
// could not make progress since the current top-level JSON value
// has a nesting depth that would exceed the specified maximum depth limit.
// This error is directly wrapped within a [SyntacticError] when produced.
var ErrMaxDepth = errors.New("exceeded maximum nested depth")
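
Assuming the options and errors land as specified above, a usage sketch might look like this (the import path is part of the json/v2 effort tracked in #71497, so treat it as illustrative):

package main

import (
	"encoding/json/jsontext" // per #71497; behind the jsonv2 experiment
	"errors"
	"fmt"
	"strings"
)

func main() {
	src := strings.NewReader(`{"name":"gopher"} [1,[2,[3]]]`)
	dec := jsontext.NewDecoder(src,
		jsontext.WithByteLimit(1<<16), // cap each top-level value at 64 KiB
		jsontext.WithDepthLimit(100),  // cap nesting at 100 levels
	)
	for {
		val, err := dec.ReadValue()
		if err != nil {
			if errors.Is(err, jsontext.ErrMaxBytes) || errors.Is(err, jsontext.ErrMaxDepth) {
				fmt.Println("input exceeded configured limits:", err)
			}
			return // io.EOF on a clean end of stream
		}
		fmt.Println(string(val))
	}
}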
changed the title from "encoding/json: add (*Decoder).SetLimit" to "encoding/json/jsontext: add WithByteLimit and WithDepthLimit" on Feb 8, 2025
changed the title from "encoding/json/jsontext: add WithByteLimit and WithDepthLimit" to "proposal: encoding/json/jsontext: add WithByteLimit and WithDepthLimit" on Feb 8, 2025
added the LibraryProposal label on Feb 9, 2025