Skip to content

[dev.fuzz] internal/fuzz: mutator should generate valid UTF-8 for strings #46874

Closed
@jayconrod

Description

@jayconrod

Currently, we use the same mutation engine for string and []byte. This tends to generate a lot of invalid UTF-8 strings that aren't usable for many use cases. While invalid UTF-8 is likely to turn up many shallow parser bugs, it may make the mutator less effective at finding more subtle, deeper bugs.

We should have an option to make the mutator only generate UTF-8. Some ideas:

  • Create a UTF8String defined type. A fuzz function that accepts that as a parameter would only get valid UTF-8 strings.
  • Only provide valid UTF-8 strings for string parameters. A function could request []byte for random bytes, and that can still be converted to string.

cc @golang/fuzzing @findleyr

Activity

added
NeedsDecisionFeedback is required from experts, contributors, and/or the community before a change can be made.
fuzzIssues related to native fuzzing support
on Jun 22, 2021
added this to the Unreleased milestone on Jun 22, 2021
dsnet

dsnet commented on Jun 22, 2021

@dsnet
Member

Some ideas ...

IIUC, one of the eventual goals is to be able to fuzz structs. If so, I'm not sure either idea works as well for higher-order types, where the field type is already a string and can't be easily changed.

Perhaps there can be a method hung off of testing.F that configures parameters about the mutator?

findleyr

findleyr commented on Jun 22, 2021

@findleyr
Member

Create a UTF8String defined type. A fuzz function that accepts that as a parameter would only get valid UTF-8 strings.

I like this idea. We could also have ASCIIString. I think it would be convenient to have such a mechanism, for exactly the reasons you list, but I'm wary of making it the default. Despite my comment about the readability of non-utf8 string examples, consuming arbitrary bytes DID turn up a bug that wouldn't have been found with valid characters alone (#46855).

IIUC, one of the eventual goals is to be able to fuzz structs. If so, I'm not sure either idea works as well for higher-order types, where the field type is already a string and can't be easily changed.

I wonder if fields could be explicitly ignored by the mutator, perhaps with a struct tag, so that they may be substituted by other means. I.e.

type S struct {
Count int
Input string fuzz:"ignore"
}

func f(t *testing.T, s S, input fuzz.UTF8String) {
s.Input = string(input)
...
}

Edit: I already dislike this last suggestion, at least as I manifested it. Fuzzing, being testing, should not add struct tags to non-test objects.

CAFxX

CAFxX commented on Jun 22, 2021

@CAFxX
Contributor

Perhaps there can be a method hung off of testing.F that configures parameters about the mutator?

Or, we could start with trying to have the mutator automatically balance exploration and exploitation based on which kind of string (e.g. all valid utf-8 runes, mix of valid and invalid utf-8 runes, binary data) yield the best results, in terms of defects discovered, on that specific target.

So e.g. if for that target the fuzzer detects that all-valid utf-8 runes strings unearth more defects, it will progressively bias the mutator to generate more all-valid utf-8 runes strings, and less of the others types of strings. The benefit of this approach being that it requires no changes to the code being fuzzed and no additional knobs (although it may be slightly slower).

Later on, if needed, we could easily add a way to disable this auto-tuning by passing an explicit configuration as suggested by @dsnet.

rolandshoemaker

rolandshoemaker commented on Jun 22, 2021

@rolandshoemaker
Member

Or, we could start with trying to have the mutator automatically balance exploration and exploitation based on which kind of string (e.g. all valid utf-8 runes, mix of valid and invalid utf-8 runes, binary data) yield the best results, in terms of defects discovered, on that specific target.

Essentially this is the concept behind some of the advanced fuzzing strategies discussed in #46507. Implementing input prioritization methods which focus on inputs which produce more coverage (among other characteristics) naturally biases towards mutations that produce inputs that the program understands, without actually having to alter the mutator at all.

That said there have been some discussions previously about biasing certain mutators, i.e. in order to prioritize mutators which reduce the size of inputs over those that increase the size of inputs etc. In a similar vein we may want to bias the mutators for strings, to produce inputs with valid UTF-8, over invalid UTF-8. I'd like to do some evaluation with a string based target to see how much of an effect of coverage this produces.

katiehockman

katiehockman commented on Jun 23, 2021

@katiehockman
Contributor

Only provide valid UTF-8 strings for string parameters

I don't think we should do this. There is no property in the language which states that strings must be UTF-8, and we shouldn't special case this for fuzzing.

Rob made an important point:

Despite my comment about the readability of non-utf8 string examples, consuming arbitrary bytes DID turn up a bug that wouldn't have been found with valid characters alone (#46855).

This stuck out to me because it demonstrates one of the biggest benefits of fuzzing: the mutator is able to generate inputs to f.Fuzz that are vastly different from what you may have assumed is "valid input". Limiting to UTF-8 by default would take away from this.

A few general thoughts:

  • https://go.googlesource.com/proposal/+/master/design/draft-fuzzing.md#custom-generators talks a bit about what a custom generator might look like with the current design (once structs are supported). We could provide our own set of custom generators in the testing package in the future, including one for UTF-8-only strings
  • I also wonder if testing/quick would make more sense for inputs like this which have very specific generator needs.
katiehockman

katiehockman commented on Jun 29, 2021

@katiehockman
Contributor

Is there anything else to discuss on this issue, or are we okay with closing it?

jayconrod

jayconrod commented on Jun 29, 2021

@jayconrod
ContributorAuthor

Let's close. I think we're in agreement that UTF-8 by default is not a good idea. Custom mutators or defined types are appealing, but let's open an issue for those when we have a design.

locked and limited conversation to collaborators on Jun 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    FrozenDueToAgeNeedsDecisionFeedback is required from experts, contributors, and/or the community before a change can be made.fuzzIssues related to native fuzzing support

    Type

    No type

    Projects

    Status

    No status

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @jayconrod@CAFxX@rolandshoemaker@dsnet@gopherbot

        Issue actions

          [dev.fuzz] internal/fuzz: mutator should generate valid UTF-8 for strings · Issue #46874 · golang/go