Description
This issue is derived from dominikh/go-tools#924.
What version of Go are you using (go version)?
1.16rc1
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (go env)?
Linux, ARM64.
What did you do?
When running analyzers built on the analysis package, I hit a case where exported and imported facts were matched up with the wrong methods. (This issue is specific to methods, and the proposed fix is method-specific as well.)
For example, after instrumenting Encode and Decode in the facts package (which serialize and deserialize facts), the logs show:
2021/02/09 12:37:21 export: exported func (*pkg/sentry/vfs/vfs.DynamicBytesFileDescriptionImpl).StateFields() []string as DynamicBytesFileDescriptionImpl.M11
2021/02/09 12:18:54 import: imported DynamicBytesFileDescriptionImpl.M11 as func (*pkg/sentry/vfs/vfs.DynamicBytesFileDescriptionImpl).Read(ctx pkg/context/context.Context, dst pkg/usermem/usermem.IOSequence, opts google3/third_party/gvir/pkg/sentry/vfs/vfs.ReadOptions) (int64, error)
It turns out that the objectpath package produces the name above, and this name is used as the key in the exported fact data. During analysis, type information for imported packages is sourced from the compiled artifact via gcexportdata, but the current package's types are synthesized directly from the source files, and facts are derived from those types. This means facts may be constructed using a different method ordering than what appears in the compiler artifact, so fact serialization may key methods incorrectly (and whoever imports such a fact is in for a bad time).
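A minimal sketch of this failure mode, assuming a simplified key format modeled on the log lines above ("TypeName.M<index>"); the real objectpath encoding differs in detail, and encodeMethodKey/decodeMethodKey are hypothetical helpers. The exporter computes the index against its source-ordered method list, the importer resolves it against the lexicographically ordered export-data list, and the same key ends up naming a different method, just as StateFields was imported as Read above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// encodeMethodKey builds a hypothetical index-based key such as "T.M1"
// for the method called name, using its position in methods.
func encodeMethodKey(typeName string, methods []string, name string) (string, error) {
	for i, m := range methods {
		if m == name {
			return typeName + ".M" + strconv.Itoa(i), nil
		}
	}
	return "", fmt.Errorf("method %s not found on %s", name, typeName)
}

// decodeMethodKey resolves a key from encodeMethodKey against a
// (possibly differently ordered) method list.
func decodeMethodKey(key string, methods []string) (string, error) {
	i := strings.LastIndex(key, ".M")
	if i < 0 {
		return "", fmt.Errorf("malformed key %q", key)
	}
	idx, err := strconv.Atoi(key[i+2:])
	if err != nil || idx < 0 || idx >= len(methods) {
		return "", fmt.Errorf("bad method index in %q", key)
	}
	return methods[idx], nil
}

func main() {
	// Hypothetical source order seen by the package under analysis.
	fromSource := []string{"StateFields", "Read", "Write"}
	// Lexicographic order seen by importers via export data.
	fromExport := []string{"Read", "StateFields", "Write"}

	key, err := encodeMethodKey("T", fromSource, "StateFields")
	if err != nil {
		panic(err)
	}
	got, err := decodeMethodKey(key, fromExport)
	if err != nil {
		panic(err)
	}
	// Prints: exported StateFields as T.M0; imported T.M0 as Read
	fmt.Printf("exported StateFields as %s; imported %s as %s\n", key, key, got)
}
```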
Since this logic is effectively built into the compiler (which is often a binary package), this has the potential to cause issues for any analysis package that links against a different go/types package or perturbs the ordering of NamedType.methods.
Possible Fix
While I realize that this API is not stable, the issue can be effectively mitigated by making the Method ordering of NamedTypes stable: sort the methods on the first call to Method (and ensure that no calls to AddMethod happen after that point). This should eliminate the ordering sensitivity (but won't fix breakages in cases of genuine binary incompatibility, which is fine).
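A minimal sketch of the proposed mitigation on a stand-in type, not the actual go/types change in the CL: methodList is hypothetical, methods are plain names, and sorting by name is an assumption about the chosen key. The first call to Method sorts and freezes the list, and any later AddMethod panics.

```go
package main

import (
	"fmt"
	"sort"
)

// methodList is a hypothetical stand-in for a named type's method list.
type methodList struct {
	methods []string
	frozen  bool // set by the first call to Method
}

// AddMethod appends a method; it must not be called once the list is frozen.
func (l *methodList) AddMethod(name string) {
	if l.frozen {
		panic("AddMethod called after Method: ordering is already fixed")
	}
	l.methods = append(l.methods, name)
}

// Method returns the i'th method, sorting the list on first use so that
// the index does not depend on the order in which methods were added.
func (l *methodList) Method(i int) string {
	if !l.frozen {
		sort.Strings(l.methods)
		l.frozen = true
	}
	return l.methods[i]
}

func main() {
	var a, b methodList
	for _, m := range []string{"StateFields", "Read"} {
		a.AddMethod(m)
	}
	for _, m := range []string{"Read", "StateFields"} {
		b.AddMethod(m)
	}
	fmt.Println(a.Method(0), b.Method(0)) // prints "Read Read"
}
```

A real implementation would also need to be safe for concurrent readers, which this sketch ignores.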
A draft of this fix is posted here: https://go-review.googlesource.com/c/go/+/290750
(But of course, I could be way off in my diagnosis, or there could be a better solution.)
Activity
prattmic commented on Feb 9, 2021
cc @griesemer @matloob
Title changed from "Encoding used by objectpath is inconsistent for use by compiler & tools/analysis" to "go/types, x/tools/go/types, x/tools/analysis: encoding used by objectpath is inconsistent for use by compiler & tools/analysis"
prattmic commented on Feb 10, 2021
@amscanne do the types constructed from gcexportdata always contain all of the methods that are present when analyzing from source (such as unexported methods)? If not, it seems that simply sorting the methods wouldn't be sufficient. Instead it seems like objectpath might need the actual method name rather than just an index?
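A sketch of the scenario raised here, under the unconfirmed assumption that one view of a type includes an unexported method that the other lacks (method names are hypothetical): sorting both lists makes the order canonical, but indices after the missing method still shift, so an index-only key can still point at the wrong method.

```go
package main

import (
	"fmt"
	"sort"
)

// indexOf returns the position of name in methods, or -1 if absent.
func indexOf(methods []string, name string) int {
	for i, m := range methods {
		if m == name {
			return i
		}
	}
	return -1
}

func main() {
	// Hypothetical views of one type: only the source view has "close".
	fromSource := []string{"flush", "Read", "close"}
	fromExport := []string{"Read", "flush"}
	sort.Strings(fromSource) // [Read close flush]: uppercase sorts first in byte order
	sort.Strings(fromExport) // [Read flush]
	// "flush" is index 2 on the source side but index 1 on the import side.
	fmt.Println(indexOf(fromSource, "flush"), indexOf(fromExport, "flush"))
}
```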
amscanne commented on Feb 12, 2021
Great question. I have no idea; I'll do a bit more digging. However, sorting does solve the issue in this case, so I suspect the answer might be yes here (although that does not mean it is yes everywhere).
prattmic commented on Mar 3, 2021
cc @timothy-king @guodongli-google as well, since y'all have been looking at analysis recently and may have thoughts here.
amscanne commented on Jun 29, 2021
I think I have a better understanding of this issue.
The object file itself seems to contain rich type information (along with names). Based on my reading of the gcexportdata code, it seems like it is sorted in lexicographical order (presumably by the toolchain when emitting the object). Because of this, gcexportdata will always decode in the same lexicographical order. This is good, because it establishes a consistent ordering.
The problem is fundamentally with objectpath, which relies on this ordering. The objectpath package encodes methods by their index in the method list, as opposed to by name (which almost everything else uses). Therefore, the objectpath encoding is sensitive to the order of methods on *types.Named objects. Note that *types.Interface methods are explicitly sorted during construction, so this indexing scheme works fine for interfaces; it breaks only for methods of named types.
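A small check of the interface half of that claim using only the standard library: go/types documents that (*types.Interface).Method returns methods ordered by their unique Id, so the declaration order passed to NewInterfaceType does not matter. The package path and method names are hypothetical.

```go
package main

import (
	"fmt"
	"go/token"
	"go/types"
)

// interfaceMethodNames builds an interface whose methods are supplied in the
// given order and returns their names in the order reported by Method.
func interfaceMethodNames(names ...string) []string {
	pkg := types.NewPackage("example.com/p", "p") // hypothetical package
	var methods []*types.Func
	for _, n := range names {
		// A func() signature with no receiver; NewInterfaceType may set one.
		sig := types.NewSignatureType(nil, nil, nil, nil, nil, false)
		methods = append(methods, types.NewFunc(token.NoPos, pkg, n, sig))
	}
	iface := types.NewInterfaceType(methods, nil).Complete()
	out := make([]string, iface.NumMethods())
	for i := range out {
		out[i] = iface.Method(i).Name()
	}
	return out
}

func main() {
	fmt.Println(interfaceMethodNames("Zeta", "Alpha"))
	fmt.Println(interfaceMethodNames("Alpha", "Zeta"))
}
```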
The facts framework relies on objectpath to generate consistent keys for objects. If two *types.Package objects are identical except for the method ordering on relevant *types.Named types, objectpath will generate different paths for many objects (functions themselves, parameters, etc.).
For analysis, there are two different sources for the *types.Package: the current package comes from the source code (parsing and type checking), while the *types.Package for dependencies comes from gcexportdata. So the facts are saved using the source method ordering, while they are loaded using the gcexportdata method ordering (lexicographical).
It was suggested in [1] that type checking sorts methods, but I don't believe that is true. I think that parsing and type checking are sensitive to the order in which files are parsed. This must be the case, because the gcexportdata import does have a consistent lexicographical ordering imposed, as noted above.
As some evidence of this, my current workaround is to sort all input files [2], which seems to perturb the generated types enough to match in my case (though this is extremely fragile and works only for now). I've added sanity checking for the binary vs. AST-derived types, and this check fails simply by removing the sort.
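A stdlib-only sketch of that file-order sensitivity. It type-checks the same two-file package (hypothetical files a.go and b.go, each declaring one method of T) in both file orders and prints the order that (*types.Named).Method reports. It assumes the type checker records methods in the order it encounters their declarations; that is an implementation detail of go/types rather than a documented guarantee.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// Two hypothetical files of package p; each declares one method of T.
var sources = map[string]string{
	"a.go": "package p\n\ntype T struct{}\n\nfunc (T) Zeta() {}\n",
	"b.go": "package p\n\nfunc (T) Alpha() {}\n",
}

// methodOrder parses and type-checks the files in the given order and
// returns the method names of p.T in the order reported by Method.
func methodOrder(order []string) ([]string, error) {
	fset := token.NewFileSet()
	var files []*ast.File
	for _, name := range order {
		f, err := parser.ParseFile(fset, name, sources[name], 0)
		if err != nil {
			return nil, err
		}
		files = append(files, f)
	}
	var conf types.Config // the files have no imports, so no Importer is needed
	pkg, err := conf.Check("p", fset, files, nil)
	if err != nil {
		return nil, err
	}
	named := pkg.Scope().Lookup("T").Type().(*types.Named)
	names := make([]string, named.NumMethods())
	for i := range names {
		names[i] = named.Method(i).Name()
	}
	return names, nil
}

func main() {
	for _, order := range [][]string{{"a.go", "b.go"}, {"b.go", "a.go"}} {
		names, err := methodOrder(order)
		if err != nil {
			panic(err)
		}
		fmt.Println(order, names)
	}
}
```

Sorting the file list before parsing, as in the workaround above, pins one of these two outcomes but does not make the numbering match export data in general.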
So there are two reasonable solutions,
Since (1) is unlikely to break anything (both uses of objectpath will get the exact same encoding, since gcexportdata will already be sorted and objectpath will index into that list), I think this makes more sense.
[1] https://go-review.googlesource.com/c/go/+/290750
[2] https://github.com/google/gvisor/blob/dc1b3884f30d96053dae550d3c40d035c8893d4b/tools/nogo/nogo.go#L465
amscanne commented on Jun 29, 2021
I've sent a change to use a "canonical" ordering for objectpath:
https://go-review.googlesource.com/c/tools/+/331789
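A sketch of the canonical-ordering idea: copy a named type's methods and sort them by (*types.Func).Id, the key also favored later in this thread, so that an index refers to the same method regardless of the order in which the methods were added. canonicalMethods and newNamed are hypothetical helpers; this is not the code in the CL.

```go
package main

import (
	"fmt"
	"go/token"
	"go/types"
	"sort"
)

// canonicalMethods returns the methods declared on named, sorted by Id,
// without modifying the type itself.
func canonicalMethods(named *types.Named) []*types.Func {
	ms := make([]*types.Func, named.NumMethods())
	for i := range ms {
		ms[i] = named.Method(i)
	}
	sort.Slice(ms, func(i, j int) bool { return ms[i].Id() < ms[j].Id() })
	return ms
}

// newNamed builds a hypothetical struct type p.T with methods added in order.
func newNamed(methodNames ...string) *types.Named {
	pkg := types.NewPackage("example.com/p", "p")
	obj := types.NewTypeName(token.NoPos, pkg, "T", nil)
	named := types.NewNamed(obj, types.NewStruct(nil, nil), nil)
	for _, n := range methodNames {
		recv := types.NewVar(token.NoPos, pkg, "", named)
		sig := types.NewSignatureType(recv, nil, nil, nil, nil, false)
		named.AddMethod(types.NewFunc(token.NoPos, pkg, n, sig))
	}
	return named
}

func main() {
	for _, order := range [][]string{{"Zeta", "Alpha", "close"}, {"close", "Alpha", "Zeta"}} {
		var names []string
		for _, m := range canonicalMethods(newNamed(order...)) {
			names = append(names, m.Name())
		}
		fmt.Println(order, "->", names)
	}
}
```

Exported names use the bare name as their Id, while unexported ones are prefixed with the package path, so unexported methods typically sort after exported ones.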
amscanne commented on Jun 30, 2021
As expected, the workaround is too brittle to work correctly. While everything works in one configuration, another configuration (tsan/race) yields type conflicts. I think the proper objectpath fix is the way forward.
mdempsky commented on Jul 12, 2021
Package-scope declarations are ordered lexicographically, but methods aren't declared at package scope. They're attached to their receiver type.
How robust is objectpath expected to be about different build configurations? In general, changing build tags can arbitrarily change a type's definition, which I would think would necessarily invalidate objectpath strings.
timothy-king commented on Jul 13, 2021
Here is a description of the reproducers for this issue that I am aware of:
We can make method ids in objectpath agnostic to the GoFile order of these two paths by sorting them.
If "foo_generated.go" and "foo.go" contained init() functions, we would be changing the expected order in which these are executed. This can potentially be resolved separately.
findleyr commented on Jul 15, 2021
Hi, just catching up. As I commented on https://golang.org/cl/331789, I'm a little concerned about changing objectpath serialization. I think we should do it, but want to go over the compatibility implications.
Here's my analysis:
Does anyone else have observations or concerns with respect to backwards compatibility of objectpath encoding? I have very little context on this package, but based on my analysis I think it's a gray area. Since this is fixing a bug, I think we should proceed. We should probably also make it explicit that objectpath encoding may change.
mdempsky commented on Jul 15, 2021
FWIW, the current encoding depends on implementation details of the export data format that I at least consider to be unstable. E.g., I've already landed CLs on the dev.typeparams branch that sort methods before exporting them, so that's going to change objectpath's method numbering anyway. (And the x/tools exporter already sorts them too, but using a different sort order.)
So to the extent that objectpath needs to be stable, it needs to be decoupled from the export data ordering of methods anyway.
If it's important to maintain the historical sort order, it might be possible to sort methods based on Pos. But I think sorting on Id is simpler and more robust.
timothy-king commented on Jul 15, 2021
I had fairly similar concerns and went through roughly the same checklist. Most of my observations were left as comments on https://golang.org/cl/331789. I think we are okay.
At the moment, objectpath is determined by the order in which files are parsed. This means that if two tools disagree about the file order, the method IDs are unstable anyway. We have some evidence that this is happening. (See my previous comment for details.) My understanding of the objectpath documentation is that this was not the intention, so plausibly this is a bug in the implementation.
This would probably need to be similar to other interfaces like gcexportdata. This needs to be consistent while stored in the cache by the same tool while analyzing a different project.
Personally, I think we may just want to stop using numbers to identify methods and switch to method names in the encoding. That gives quicker (asymptotically, at least) writing and lookup times (after https://golang.org/cl/331789 goes in), removes the file-ordering concerns, and seems conceptually stable. It may also be robust enough that clearly documenting the conditions under which objectpath is stable is not worth it? The main cost I can think of is additional memory/storage space.
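A sketch of the name-based alternative, using hypothetical helpers (methodsByName, newNamed) rather than objectpath's real API: build a name-to-method map once per type, after which each lookup is a map access that does not depend on method order. This relies on method names being unique among the methods declared on a single named type.

```go
package main

import (
	"fmt"
	"go/token"
	"go/types"
)

// methodsByName indexes the methods declared directly on named by name.
func methodsByName(named *types.Named) map[string]*types.Func {
	m := make(map[string]*types.Func, named.NumMethods())
	for i := 0; i < named.NumMethods(); i++ {
		f := named.Method(i)
		m[f.Name()] = f
	}
	return m
}

// newNamed builds a hypothetical struct type p.T with methods added in order.
func newNamed(methodNames ...string) *types.Named {
	pkg := types.NewPackage("example.com/p", "p")
	obj := types.NewTypeName(token.NoPos, pkg, "T", nil)
	named := types.NewNamed(obj, types.NewStruct(nil, nil), nil)
	for _, n := range methodNames {
		recv := types.NewVar(token.NoPos, pkg, "", named)
		sig := types.NewSignatureType(recv, nil, nil, nil, nil, false)
		named.AddMethod(types.NewFunc(token.NoPos, pkg, n, sig))
	}
	return named
}

func main() {
	a := newNamed("StateFields", "Read")
	b := newNamed("Read", "StateFields")
	// Index 0 names different methods in a and b...
	fmt.Println(a.Method(0).Name(), b.Method(0).Name())
	// ...but a lookup by name finds the same method in both.
	fmt.Println(methodsByName(a)["StateFields"].Name(), methodsByName(b)["StateFields"].Name())
}
```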
I have not looked into the details of how token.FileSets are created, but Pos order might have the same set of problems: a different order of files passed to the tool creates a different order of Files, which creates different Pos orders.
Positions might be doable, but I am not sure if we can always rely on these.
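A small stdlib-only illustration of the Pos concern: token.FileSet gives each newly added file the next free base, so whether a position in a.go precedes one in b.go depends on which file was added first. The file names and sizes are hypothetical.

```go
package main

import (
	"fmt"
	"go/token"
)

// positions adds files to a fresh FileSet in the given order and returns
// the Pos of offset 0 in each file, keyed by file name.
func positions(order ...string) map[string]token.Pos {
	fset := token.NewFileSet()
	pos := make(map[string]token.Pos)
	for _, name := range order {
		f := fset.AddFile(name, -1, 100) // base -1: use the next free base
		pos[name] = f.Pos(0)
	}
	return pos
}

func main() {
	p1 := positions("a.go", "b.go")
	p2 := positions("b.go", "a.go")
	// Prints "true false": the relative Pos order flips with the add order.
	fmt.Println(p1["a.go"] < p1["b.go"], p2["a.go"] < p2["b.go"])
}
```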