Skip to content

archive/zip: Writer.Copy references entire Reader data #65499

Closed
@jba

Description

@jba

Go version

go1.22-20240109-RC01 linux/amd64

Output of go env in your module/workspace:

GO111MODULE='on'
GOARCH='amd64'
GOBIN=''
GOCACHE='/usr/local/google/home/jba/.cache/go-build'
GOENV='/usr/local/google/home/jba/.config/go/env'
GOEXE=''
GOEXPERIMENT='fieldtrack,boringcrypto'
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/usr/local/google/home/jba/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/usr/local/google/home/jba/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org'
GOROOT='/usr/lib/google-golang'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/lib/google-golang/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.22-20240109-RC01 cl/597041403 +dcbe772469 X:fieldtrack,boringcrypto'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/usr/local/google/home/jba/repos/work/go.mod'
GOWORK='/usr/local/google/home/jba/repos/work/go.work'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1055306189=/tmp/go-build -gno-record-gcc-switches'

What did you do?

The following program is zipmem.go:

package main

import (
	"archive/zip"
	"bytes"
	"io"
	"log"
	"os"
	"path/filepath"
	"runtime"
	"runtime/pprof"
)

func main() {
	file := filepath.Join(runtime.GOROOT(), "lib/time/zoneinfo.zip")
	data, err := os.ReadFile(file)
	if err != nil {
		log.Fatal(err)
	}
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		log.Fatal(err)
	}
	zw := zip.NewWriter(io.Discard)
	if err := zw.Copy(zr.File[0]); err != nil {
		log.Fatal(err)
	}
	writeHeapProfile()
	if err := zw.Close(); err != nil {
		log.Fatal(err)
	}
}

func writeHeapProfile() {
	const filename = "heap.prof"
	f, err := os.Create(filename)
	if err != nil {
		log.Fatal(err)
	}
	runtime.GC()
	if err := pprof.WriteHeapProfile(f); err != nil {
		log.Fatal(err)
	}
	if err := f.Close(); err != nil {
		log.Fatal(err)
	}
}

Shell command:

> go build zipmem.go && ./zipmem && go tool pprof zipmem heap.prof

Inside pprof:

(pprof) top

What did you see happen?

Showing nodes accounting for 1761.95kB, 100% of 1761.95kB total
      flat  flat%   sum%        cum   cum%
 1024.17kB 58.13% 58.13%  1024.17kB 58.13%  archive/zip.(*Reader).init
  737.78kB 41.87%   100%   737.78kB 41.87%  os.ReadFile
         0     0%   100%  1024.17kB 58.13%  archive/zip.NewReader
         0     0%   100%  1761.95kB   100%  main.main
         0     0%   100%  1761.95kB   100%  runtime.main

What did you expect to see?

The memory profile is written after a GC should have freed up the zip.Reader. So it's surprising to see the memory allocated by os.ReadFile is still there.

I believe the reason is this line in Writer.Copy:

fw, err := w.CreateRaw(&f.FileHeader)

CreateRaw stores the FileHeader pointer in the Writer w. The code that populates the FileHeader takes care to ensure that it shares no memory with the zip.Reader from which it came. Unfortunately, FileHeader is embedded in File, which does refer to the underlying reader:

type File struct {
    FileHeader
    ...
    zipr         io.ReaderAt // the argument to NewReader
    ...
}

I believe the pointer to the embedded FileHeader causes the entire File struct to be retained, including the bytes used for the reader.


The problem can be worked around by copying the embedded FileHeader. Luckily, Writer.Copy is a convenience method whose functionality can be written outside the standard library:

func zipWriterCopy(w *zip.Writer, f *zip.File) error {
	r, err := f.OpenRaw()
	if err != nil {
		return err
	}
	fh := f.FileHeader
	fw, err := w.CreateRaw(&fh)
	if err != nil {
		return err
	}
	_, err = io.Copy(fw, r)
	return err
}

I believe Writer.Copy should be fixed in this way. I don't think the fix should be in Writer.CreateRaw because users may depend on the identity of its argument. But its documentation should be updated to alert users to the problem.

Activity

gopherbot

gopherbot commented on Feb 3, 2024

@gopherbot
Contributor

Change https://go.dev/cl/560238 mentions this issue: archive/zip: reduce memory held by Writer.Copy

added
NeedsFixThe path to resolution is known, but the work has not been done.
on Feb 3, 2024
added this to the Unplanned milestone on Jul 13, 2024
andrewmbenton

andrewmbenton commented on Aug 18, 2024

@andrewmbenton
Contributor

It looks like this was merged.

ianlancetaylor

ianlancetaylor commented on Aug 19, 2024

@ianlancetaylor
Contributor

Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    NeedsFixThe path to resolution is known, but the work has not been done.Performance

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @andrewmbenton@ianlancetaylor@gopherbot@seankhliao@jba

        Issue actions

          archive/zip: Writer.Copy references entire Reader data · Issue #65499 · golang/go