-
Notifications
You must be signed in to change notification settings - Fork 18k
cmd/go: provide cache support for code rewriting tools #29430
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
This also does not play nicely with modules/vendor/internal/cgo/some_other_new_feature/etc. The most reasonable interface from a tools perspective would be that it's invoked by go command and given a single package to transform (either in-place or to a new given location). But this also needs to work with cgo and allow parsing the package with go/types, including importing other packages. But reflecting on go-fuzz experience, it's still an open question if s2s is the right approach. It may be a wrong level for such tools. The current go coverage instrumentation is simpler. But for go-fuzz we have an infinite long tail of corner cases where it miscompiles, crashes, produces invalid code, etc. These corner cases are related to specific constant values used in specific contexts, specific nested expression constructs, specific uses of types, imports, shadowing, etc. Here are just few examples: |
I think this could fall under tooling, so I'll CC @ianthehat for his thoughts.
Reminds me of Not saying that that package should solve this problem, but the source-to-source usecase seems quite similar to what |
Another spot-fix idea: Just as we use
I was pondering that myself yesterday, for obvious reasons. But cmd/compile doesn't have a stable IR like llvm, and I don't think we are anywhere near ready to introduce one. Source is the only lingua franca, which makes it almost the only game in town. Here's another (half-baked) idea. Write a single s2s preprocessor (compiler) that rewrites source code into a regular, minimal SSA form, making it simpler to deal with in all other tools. All vars would have explicit types when declared and systematic names (vNNN or phiNNN). Extra information (original var name?) could perhaps be conveyed in structured comments. All dot imports would be rewritten to be regular imports, all imports would be given explicit unique names for easier syntactic resolution. And so on. This is a bit vague, but I think it'd move us a lot closer from N*M to N+M. |
Well, there is a bunch of SSA passes in the compiler that somehow exist ;) |
Right, but the set of ops, their semantics, etc. change from release to release. There is no serialization format, much less a stable one. And ssa.Values refer (via Aux) to other compiler structures like gc.Node, obj.LSym, which opens a giant can of worms. |
Yet these passes achieve their goal. I mean it may not be a generic solution for allowing users to write arbitrary s2s transformations, but it still may allow to solve the problem at hand (fuzzing coverage). Just like race detector is an SSA pass. |
Yes, we could declare fuzzing special enough to build into the toolchain. That's #19109. But we obviously can't do that for all tools. Over in #19109, Russ asked repeatedly for go-fuzz to be prototyped out of tree to match how it would ultimately be integrated. I was experimenting with that. (Before anyone gets excited, it is enough work that I'm unlikely to follow through on it unless I find sponsorship, and I'm not planning to search for that at the moment.) Many of the stumbling blocks struck me as addressable (using go/packages, fixing corner cases, etc.) or merely requiring extra design (corpus location and migration), but this caching problem did not, which means that it is of independent interest for similar tools, and thus worth discussing independently. Incidentally, cc @alandonovan re: my s2s SSA form pipe dream idea above, since that is very much his wheelhouse. |
All of these problems (go/packages, corner cases, caching) simply don't exist if we go with compiler instrumentation. I think "go-fuzz prototype" includes only the interface user-facing part. So I would start with the interface part, the rest is implementation details. |
But as someone who works on the compiler, I don't want more special things in it, I want less. They increase the testing surface area, make other work harder, etc. And as someone who writes tools, I want the toolchain to be more flexible and welcoming to tools, not to wonder "how does test coverage do it?" and discover the answer is that it is special. I want go-fuzz to be built-in to the toolchain. (Although I have to say, writing a good fuzz function is often surprisingly subtle.) But I also think it is worth trying to face some of the challenges it raises head on, in a general way. If it turns out there are no good answers available, sure, let's make it special. But let's at least try first. |
Fair point. |
Last time we discussed fuzzing with @rsc, I think he argued it should not get special compiler support, but just use s2s transformations and live outside of the go tool, at least initially. As I understood it it was a sustainability argument, since as the project grow we can’t add and maintain support to the compiler for everything, but we can add reliable support for s2s. I agree with that view, and I think it’s consistent with the direction set by go generate. We should fix this. (I even have a pipe dream of extracting cgo and maybe coverage into external, forkable tools, where they can be adapted to more niche use cases, instead of having to live with the tradeoffs we selected for the go tool.) |
@FiloSottile what types of tools/transformations have you considered as a use case?
Well, if the current source coverage is not trivially expressible in this framework, then it's not working and we failed. I would convert source coverage just as a proof of concept because any such framework badly needs realistic use cases, so it would happen just as a side effect. |
Some ideas are a C FFI that trades safety for performance, coverage for main(), and prototyping FuzzX functions in test files. I don’t think there’s any major blockers, just UX and documentation bits like this issue. |
Does this potentially include things like inlining, alias analysis, etc (e.g.
Does this require s2s transformations? Or you mean fuzzing coverage? |
Not the ones I considered. I suppose there's a tradeoff between power and exposed surface: as you say if we push this too far we reach a point where we are exposing too much, so some things do belong in the compiler and not outside. But some things can easily live outside of it with s2s, so we should fix issues like this one to enable that.
IIRC testing uses s2s to call |
This issue started out with a very actionable description: providing the ability to use the build cache with code rewriting tools. While this discussion sounds valuable, I don't think the desire to create such tools will go away anytime soon, or ever. Therefore it would be nice to treat the s2s versus not discussion separately from how we might enable build caching for these use cases. As @josharian notes, main hurdle to that is that the cache is keyed by the full path to the build directory, which is usually a temporary directory. I'm happy to take a stab at a fix, but it would be good to reach consensus on what said change should look like. Any suggestions, @josharian et al? |
@eandre thanks for refocusing the conversation on the specific, actionable issue. I had one suggest in #29430 (comment) (look for “spot-fix”). The design decisions here ultimately rest with @bcmills I believe. |
I expect that we will not get to this for 1.13. (We can revisit after that.) @jayconrod and/or @ianthehat may have some insight into how much impact this might have for the broader tooling ecosystem. |
There's a reasonably good chance there will be changes to caching in the 1.13-1.14 time frame. There are some optimizations I'd like to look at, and I think we'll want to do some streamlining when GOPATH is eventually removed (hopefully that doesn't affect caching much, but a lot of things are surprisingly coupled). I don't think we should expose a public interface in 1.13 because it would be hard to avoid breaking changes. For now, I'd rather focus on fixing cases where something is not cached unnecessarily. For any action, if the inputs and outputs are identical, we should be able to use the same cache key. For example, we might not need to include absolute paths in the cache key if those paths don't affect debug or stack info in the compiled archive. |
To my knowledge the absolute paths are included in the compiled archive as a debugging aid. It seems to me that that fact will continue to be desirable. It also seems to me that the need to do source-to-source rewriting in a temporary directory will continue to be how it's done. So even if we want to make changes to caching, if we're thoughtful about what interface we provide to handle this case it doesn't seem to me that these goals conflict with your work necessarily. |
Indeed. Thus my suggestion to provide a way to tell both cmd/compile and cmd/go that a file’s true path is elsewhere. |
Uses of this feature directly affect its priority. Will it enable new capabilities? Which ones? Are they more important than other capabilities? While there are lots of generally useful things, there are few things that we can actually do. |
The priority is mostly important when it comes to asking somebody to do work. Like I said, I'm happy to work on this if we can reach consensus on how it should be implemented. Based on the current fuzzing implementation it seems like it would be worthy supporting it. Between fuzzing, |
If a source code transformation tool wants to play nicely with the build cache, can't it just generate a stable path? |
It's been a bit since I wrote an S2S tool (possibly even pre-Go1!). I imagine the benefits of an ideal integration with the build cache would be
A stable directory would need to look something like If you do all that, I think it gets most of the points, albeit with effort from the tool writer. It seems like the major losses feature-wise are:
Perhaps a library could take care of all or at least most of that heavy lifting? Even so, in an ideal world, having the |
I agree with Ian's comment. If a source file is moved to a different path (or regenerated there), the output should not be cached in cases where the output depends on the path. If we are judicious with flags like |
The downside of using a stable working directory is that it puts the burden on tool authors to worry about concurrent runs of the tool and gracefully handling that without blowing up. That's both annoying, and hard to get right due to the difficulty of file locking in a cross-platform way. From the perspective of a tool author it would be nice to outsource that complexity to the build cache. |
In #16860 (comment), @josharian said there is some chance a fix for #16860 could also address this issue here. In #16860 (comment), Bryan said he thought Russ is most likely planning on addressing #16860 for 1.13. If that happens for 1.13, would that also effectively address this issue here, or would follow-up work also be required to close out this issue? |
FYI, in #16860, it looks like Russ sent two |
If the new When it comes to avoiding always needing to repeatedly do the s2s transform itself, is the thinking that if the piece of code that wants to do s2s lives within cmd/go (such as a possible future edit: #16860 is now marked as closed based on those CLs. To use the new |
Duplicate of #41145 |
This issue could easily have been a comment on #19109, but opening a new issue seemed better for everyone.
Some tools such as go-fuzz do a source-to-source transformation followed by a build. This doesn't play nicely with cmd/go's cache. The problem is where to the translated files. The natural option is to write them all to a tmp dir, but that busts cmd/go's cache, making 'go fuzz' compilation slow. You could instead write them to the user's GOPATH somewhere, but that's mildly hostile, and you end up needing to fight import path rewrites. You could write them into the vendor directory, except that that doesn't work when a package has already been vendored, plus it is definitely hostile (since many people check in their vendor directory).
Note that the one s2s transformation supported in the main Go toolchain, test coverage, gets explicit support from cmd/go, including caching instrumented source code in the pkg dir. (At least if I read the
-x
output correctly.)I don't know what the right interface looks like here. It could be to tell cmd/go "use this transformation tool", but it's tricky to get cmd/go to call the transformation tool with the right set of information (see e.g. #15677). It could be to ask cmd/go for a work dir and then ask it to treat those work dir entries as replacements for the GOPATH/module packages. Or...?
cc @bcmills @dvyukov
The text was updated successfully, but these errors were encountered: