Closed
Recent work in gopls resulted in the creation of an internal package for computing text differences in the manner of the UNIX diff command, for applying those differences to a file in the manner of the patch command, and for presenting line-oriented diffs using +/- prefix notation, aka GNU "unified" diff format (diff -u). Diff functionality is invaluable for developer tools that transform source files, and for tests that compare expected and actual outputs. We propose to publish our diff package with the public API shown below.
// Package diff computes differences between text files or strings.
package diff // import "golang.org/x/tools/diff"

// -- diff --

// Strings computes the differences between two strings.
// The resulting edits respect rune boundaries.
func Strings(before, after string) []Edit

// Bytes computes the differences between two byte slices.
// The resulting edits respect rune boundaries.
func Bytes(before, after []byte) []Edit

// An Edit describes the replacement of a portion of a text file.
type Edit struct {
	Start, End int    // byte offsets of the region to replace
	New        string // the replacement
}

func (e Edit) String() string

// -- apply --

// Apply applies a sequence of edits to the src buffer and returns the
// result. Edits are applied in order of start offset; edits with the
// same start offset are applied in the order they were provided.
//
// Apply returns an error if any edit is out of bounds,
// or if any pair of edits is overlapping.
func Apply(src string, edits []Edit) (string, error)

// ApplyBytes is like Apply, but it accepts a byte slice.
// The result is always a new array.
func ApplyBytes(src []byte, edits []Edit) ([]byte, error)

// SortEdits orders a slice of Edits by (start, end) offset.
// This ordering puts insertions (end = start) before deletions
// (end > start) at the same point, but uses a stable sort to preserve
// the order of multiple insertions at the same point.
// (Apply detects multiple deletions at the same point as an error.)
func SortEdits(edits []Edit)

// -- unified --

// Unified returns a unified diff of the old and new strings.
// If the strings are equal, it returns the empty string.
// The old and new labels are the names of the old and new files.
func Unified(oldLabel, newLabel, old, new string) string

// ToUnified applies the edits to content and returns a unified diff.
// It returns an error if the edits are inconsistent; see [Apply].
// The old and new labels are the names of the content and result files.
func ToUnified(oldLabel, newLabel, content string, edits []Edit) (string, error)
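To make the documented Apply contract concrete, here is a minimal sketch of how such a function might behave. It is an illustrative reimplementation written from the doc comments above, not the gopls code; the local Edit type and the error messages are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// Edit mirrors the struct in the proposal: replace src[Start:End] with New.
type Edit struct {
	Start, End int
	New        string
}

// Apply is a sketch of the documented behavior, not the real implementation.
func Apply(src string, edits []Edit) (string, error) {
	// Work on a copy so the caller's slice is not reordered.
	sorted := append([]Edit(nil), edits...)
	// A stable sort by (Start, End) keeps same-point insertions in input order.
	sort.SliceStable(sorted, func(i, j int) bool {
		if sorted[i].Start != sorted[j].Start {
			return sorted[i].Start < sorted[j].Start
		}
		return sorted[i].End < sorted[j].End
	})

	var out strings.Builder
	last := 0 // end offset of the previously applied edit
	for _, e := range sorted {
		if e.Start < 0 || e.End < e.Start || e.End > len(src) {
			return "", errors.New("edit out of bounds")
		}
		if e.Start < last {
			return "", errors.New("overlapping edits")
		}
		out.WriteString(src[last:e.Start]) // unchanged text before the edit
		out.WriteString(e.New)             // the replacement
		last = e.End
	}
	out.WriteString(src[last:]) // unchanged tail
	return out.String(), nil
}

func main() {
	got, err := Apply("hello world", []Edit{
		{Start: 6, End: 11, New: "gopher"},
		{Start: 0, End: 0, New: ">> "},
	})
	fmt.Println(got, err) // prints: >> hello gopher <nil>
}
```

In this sketch, two deletions starting at the same offset are reported as overlapping, while an insertion followed by a deletion at the same offset is accepted, which matches the SortEdits comment.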
pjweinb commented on Mar 6, 2023
The idea of doing this seems fine, but perhaps the documentation should include a few caveats to cover the following points. (There's lots of ways to write diff, and sadly no best one. This one is for inputs that are fairly similar, or fairly short.)
bcmills commented on Mar 6, 2023
See previously #23113, #41980.
ianlancetaylor commented on Mar 6, 2023
See also #45200.
DeedleFake commented on Mar 6, 2023
Instead of SortEdits(), maybe func (Edit) Less(Edit) bool would be more general. It's pretty straightforward to plug in via slices.SortFunc(edits, diff.Edit.Less).
adonovan commented on Mar 6, 2023
I agree that we should guarantee only that the composition of diff.Strings and diff.Apply is the identity, and nothing about the specific edits that it returns. We should probably also define the Unified text form in more detail.
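The round-trip guarantee can be illustrated with a deliberately crude differ. The sketch below is not the proposed algorithm: naiveStrings and applySorted are hypothetical helpers, and naiveStrings works byte-wise, so unlike the proposed Strings it may split a multi-byte rune. It still satisfies the identity, which shows why the guarantee says nothing about which edits are returned.

```go
package main

import "fmt"

// Edit mirrors the struct in the proposal.
type Edit struct {
	Start, End int
	New        string
}

// naiveStrings (hypothetical) trims the common prefix and suffix and
// returns at most one edit that replaces the differing middle.
func naiveStrings(before, after string) []Edit {
	if before == after {
		return nil
	}
	// Longest common prefix, in bytes.
	p := 0
	for p < len(before) && p < len(after) && before[p] == after[p] {
		p++
	}
	// Longest common suffix that does not overlap the prefix.
	s := 0
	for s < len(before)-p && s < len(after)-p &&
		before[len(before)-1-s] == after[len(after)-1-s] {
		s++
	}
	return []Edit{{Start: p, End: len(before) - s, New: after[p : len(after)-s]}}
}

// applySorted (hypothetical) applies edits that are already sorted
// and non-overlapping; it performs no validation.
func applySorted(src string, edits []Edit) string {
	out, last := "", 0
	for _, e := range edits {
		out += src[last:e.Start] + e.New
		last = e.End
	}
	return out + src[last:]
}

func main() {
	before, after := "the quick brown fox", "the slow brown dog"
	edits := naiveStrings(before, after)
	// The only property relied on: applying the edits reproduces after.
	fmt.Println(applySorted(before, edits) == after) // prints: true
}
```

A test that checks only this property keeps passing if the diff algorithm later changes the edits it produces.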
Thanks for the links to related proposals. There's a fair bit of interest in both the narrow concept of text diff as proposed here, and in richer kinds of structured value diff for use in testing. If some form of the latter is accepted into the standard library, then perhaps simple text diff, on which it would depend, would also need to be in the standard library, though not necessarily exposed. I'm going to resist the temptation to argue that this should be a standard package. We can always do that later.
adonovan commented on Mar 6, 2023
I tried that initially, but it turns out to be incorrect: it's imperative that you use sort.Stable for edits since insertions at the same point must preserve their relative order.
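A small sketch of why stability matters, assuming the (start, end) ordering described in the SortEdits comment. The sortEdits helper and local Edit type are hypothetical stand-ins, and sort.SliceStable is used here only as one way to get a stable sort.

```go
package main

import (
	"fmt"
	"sort"
)

// Edit mirrors the struct in the proposal.
type Edit struct {
	Start, End int
	New        string
}

// sortEdits (hypothetical) orders edits by (Start, End) with a stable
// sort, so equal elements keep their input order.
func sortEdits(edits []Edit) {
	sort.SliceStable(edits, func(i, j int) bool {
		if edits[i].Start != edits[j].Start {
			return edits[i].Start < edits[j].Start
		}
		return edits[i].End < edits[j].End
	})
}

func main() {
	edits := []Edit{
		{Start: 5, End: 7, New: ""},  // deletion at offset 5
		{Start: 5, End: 5, New: "A"}, // first insertion at offset 5
		{Start: 0, End: 0, New: "#"}, // insertion at offset 0
		{Start: 5, End: 5, New: "B"}, // second insertion at offset 5
	}
	sortEdits(edits)
	for _, e := range edits {
		fmt.Printf("%d-%d %q\n", e.Start, e.End, e.New)
	}
	// prints:
	// 0-0 "#"
	// 5-5 "A"
	// 5-5 "B"
	// 5-7 ""
}
```

The two insertions at offset 5 compare as equal, so an unstable sort would be free to emit "B" before "A" and change the resulting text.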
DeedleFake commented on Mar 6, 2023
There is also a slices.SortStableFunc().
Alternatively, you could add a mechanism to Edit that could help them maintain their relative ordering outside of external context, such as a numbered priority. Otherwise, if you need a function to sort a []Edit, the implicit assumption is that an unsorted slice is likely to be obtained from somewhere, but if the relative ordering of the elements of that slice is important then there's also an assumption that that slice will always already be partially ordered correctly. That seems kind of error-prone to me.
adonovan commented on Mar 6, 2023
The definition of Apply makes clear that the slice of edits is a list, not a set: the relative ordering of insertions is important. But Apply can call SortEdits internally. Within gopls, we use SortEdits after merging lists of edits to the same file, but simple concatenation should suffice. It's also used to ensure a deterministic order, which some clients have mistakenly assumed.
Perhaps we should remove SortEdits from the API and let gopls implement its own copy of that function.
earthboundkid commented on Mar 6, 2023
Can the Strings and Bytes functions be unified behind [byteseq string|[]byte]? Perhaps diff.Of?
adonovan commented on Mar 6, 2023
They could, but it seems like a lot of trouble just to achieve name overloading.
earthboundkid commented on Mar 6, 2023
The other way around, I feel like it's a lot of work to have duplicate Strings and Bytes functions that work the same way instead of having callers cast their []byte to string or having a single generic function.
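For readers weighing the two options, here is a rough sketch of what a single generic entry point could look like. The names Of and byteseq come from the comment above; everything else is an assumption, and diffStrings is only a placeholder that replaces the whole text rather than a real diff algorithm.

```go
package main

import "fmt"

// Edit mirrors the struct in the proposal.
type Edit struct {
	Start, End int
	New        string
}

// byteseq is the constraint suggested in the thread.
type byteseq interface{ ~string | ~[]byte }

// Of (hypothetical) accepts either strings or byte slices and
// delegates to a single string-based implementation.
func Of[S byteseq](before, after S) []Edit {
	return diffStrings(string(before), string(after))
}

// diffStrings is a placeholder: it emits one edit replacing the
// entire input when the texts differ, and no edits otherwise.
func diffStrings(before, after string) []Edit {
	if before == after {
		return nil
	}
	return []Edit{{Start: 0, End: len(before), New: after}}
}

func main() {
	fmt.Println(Of("abc", "abd"))                 // S inferred as string
	fmt.Println(Of([]byte("abc"), []byte("abd"))) // S inferred as []byte
}
```

Note that in this sketch the []byte path copies both inputs when converting to string, which is one cost of the generic form that separate Strings and Bytes functions could avoid.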