-
Notifications
You must be signed in to change notification settings - Fork 75
Source package files lost their timestamps #450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Here is the reference to the bug in debian: |
Thanks for the report. Indeed the sdist is created without specifying a mtime, which defaults to 0. It is not difficult to fix it, but I don't know which other mtime it makes sense to use. We could use the same mtime set by However, meson-python picks up uncommitted changes to files that would anyhow be present in the dist tarball, despite To confuse things a bit more, meson-python uses the value set in I think we should drop the support for picking up uncommitted changes, use the mtime set by @rgommers comments? |
The rationale for the current behavior is in #53 |
The rationale is that some packages require modifications to the source tree in the sdist. But I don't see why these cannot be applied with a meson dist script https://mesonbuild.com/Reference-manual_builtin_meson.html#mesonadd_dist_script I find the current behavior of including modifications to committed files, but not new files, quite problematic. Requiring to use a dist script to modify the source tree before archival should take care of this issue. |
Thanks for the report @kif and the diagnosis @dnicolodi!
This would be pretty problematic, this feature is quite important. And I don't see why we'd have to couple this to the timestamp problem.
Keeping the
That seems to be a reproducibility feature, and at first sight also unrelated to the current bug report? It was touched in gh-58 (which fixed gh-53), but already present and not introduced to deal with uncommitted changes. The original Debian bug report does not mention |
Why the same cannot be achieved with a meson dist script? This couple to the mtime issue because the mtime we set should IMO represent the latest modification time of the content of the sdist, and if we pick file content from uncommitted changes we cannot guarantee that the meson dist mtime is the correct one.
Unless we extract the files archived by meson dist and compare their content with the one in the source tree, we cannot know if they have uncommitted changes. I don't think doing this for all the files in the sdist is something we want to do. Thus we would need to settle for the current time or the meson mdist time independently of the fact that the files have been modified.
It is a reproducibility work-around for tools that otherwise produce non-reproducible archives. It is not a feature in itself. If we can produce the same archive independently of when the command to produce the archive is run, there is no use for it.
We use |
That is something that is probably not a reasonable thing to do, right? It's the end of my day here, so I can only have a closer look tomorrow, but setting the time to 0 is something that I'm not sure about (also not that this was implemented intentionally).
This breaks the regular workflow for applying patches for distro packagers. Distros can take sources from either PyPI or directly from a tag on GitHub or another VCS system. The workflow is to download, then apply a set of patches with |
Uh? sdist are source distributions. Distributions build binary distributions from sources, obtained from either sdists or from git checkouts. The Debian bug in this issue is indeed that the source distribution for some packages is invalid. Why would a distribution take a sdist, modify it, and then build another sdist? They would apply changes to the source tree and then build a binary package, not another source package. Is this related to pypa/build building an sdist first and a wheel from that? This behavior can be turned off, can't it? |
This is the source of the bug reported here. I'm just trying to understand what the 0 should be replaced with. |
@picca or @kif I now checked the linked bug reports, but it does not say what exactly breaks. It's a bit surprising, because we've had no complaints about this so far from any other packages. Could you elaborate, either here or on one of the linked
This seems to be the time that the last commit was made for all files, and not the time at which a given file was last changed. It's documented to work like that: https://git-scm.com/docs/git-archive#Documentation/git-archive.txt---mtimelttimegt. So that doesn't match the output of the conventional meaning of mtime (or
I guess that's probably less useful than either using the time of last commit or zero, because it breaks reproducibility. I think a clear requirement is that generating an sdist twice in a row, in the same environment, should give the same result. On the other hand, it's not yet clear to me what the additional requirement is that Debian seems to want here.
I think so. One can directly build a wheel with I now remember another reason, which is that both myself and other SciPy devs were confused multiple times that uncommitted changes were not being installed. This is confusing, since it's a "why aren't my changes showing" up kind of thing (especially when Here is an example of where we amend a license file with platform-specific info: https://github.com/scipy/scipy/blob/main/tools/wheels/cibw_before_build_win.sh#L8. That's now a one-liner. Having to do that via a dist script would be very clumsy, and committing the change would be in itself a source of non-reproducibility (since the git hash changes, and that's part of the sdist and wheel artifacts). tl;dr it's not yet clear that there is actually a problem here that we should consider a bug. And discussing removing a feature that I really needed is certainly premature. Let's focus on the requirement on time stamps first. |
The only requirement Debian adds is that the mtime should not be zero (ie 1970-01-01 00:00:00). If we include uncommitted changes, using the timestamp of the last commit is wrong. I don't think we should do that.
But this is precisely what you are asking. You want the sdist to be different from the content of the git repository.
Are you saying that including uncommitted changes in the sdist is a work-around for the choice made by pypa/build to build wheels via an intermediate sdist by default? I don't think it is a particularly strong reason. Also, I expect most developers to install with
This scripts seems to be used to build wheels, not sdists, or are you saying that SciPy distributes different source packages on different architectures?
I'm just not sure that you need the feature, and it is absolutely confusing to have a big warning that says that uncommitted changes are not included in the sdist, but then include (some of) them. From what I understand, you want to have uncommitted changes included in the sdist, and the members of the sdist archive to be assigned timestamps of the last git commit. We can certainly do that, but it does not make much sense. |
Where is this script used? I haven't found it used where I expected it to be used ie in the wheel build CI jobs. Edit: found, it is used in here https://github.com/scipy/scipy/blob/fe05d2e6c36c46d0afddba9f10f715979a5b7fd8/.github/workflows/windows.yml |
Can we please focus on this first Daniele? I have not seen any evidence of this yet, and I'm really busy - Python 3.12.0rc1 came out yesterday, and everyone wants numpy wheels for it asap. So I am quite reluctant to go dig up old stuff right this minute and discuss whether or not the feature that I needed last year can be ripped out. |
Hi, I tried to set the SOURCE_DATE_EPOCH environment variable but it is not used:
So apparently, meson-python does not "properly" use SOURCE_DATE_EPOCH... |
No, it is not. As I was explaining before, and as it is shown by your experiment, it is used only for the |
Set the mtime of all archive members to the mtime set by meson dist, also for files that may have been locally modified. Use the current time for the PKG-INFO file. Drop support for the $SOURCE_DATE_EPOCH environment variable when creating the sdist. This environment variable is for meant to be used as a work-around for build tools that produce non-reproducible binary packages. It does not have a standardized meaning for tools producing source distributions. See https://reproducible-builds.org/docs/source-date-epoch/ Fixes mesonbuild#450.
Set the mtime of all archive members to the mtime set by meson dist, also for files that may have been locally modified. Use the current time for the PKG-INFO file. Drop support for the $SOURCE_DATE_EPOCH environment variable when creating the sdist. This environment variable is for meant to be used as a work-around for build tools that produce non-reproducible binary packages. It does not have a standardized meaning for tools producing source distributions. See https://reproducible-builds.org/docs/source-date-epoch/ Fixes mesonbuild#450.
Set the mtime of all archive members to the mtime set by meson dist, also for files that may have been locally modified. Use the current time for the PKG-INFO file. Drop support for the $SOURCE_DATE_EPOCH environment variable when creating the sdist. This environment variable is for meant to be used as a work-around for build tools that produce non-reproducible binary packages. It does not have a standardized meaning for tools producing source distributions. See https://reproducible-builds.org/docs/source-date-epoch/ Fixes mesonbuild#450.
Set the mtime of all archive members to the mtime set by meson dist, also for files that may have been locally modified. Use the current time for the PKG-INFO file. Drop support for the $SOURCE_DATE_EPOCH environment variable when creating the sdist. This environment variable is for meant to be used as a work-around for build tools that produce non-reproducible binary packages. It does not have a standardized meaning for tools producing source distributions. See https://reproducible-builds.org/docs/source-date-epoch/ Fixes mesonbuild#450.
Set the mtime of all archive members to the mtime set by meson dist, also for files that may have been locally modified. Use the current time for the PKG-INFO file. Drop support for the $SOURCE_DATE_EPOCH environment variable when creating the sdist. This environment variable is for meant to be used as a work-around for build tools that produce non-reproducible binary packages. It does not have a standardized meaning for tools producing source distributions. See https://reproducible-builds.org/docs/source-date-epoch/ Fixes mesonbuild#450.
Sorry for the delay @dnicolodi, and thank you for putting up a fix! I'll start reviewing now, and will try to get through the backlog of reviews this week. |
Set the mtime of all archive members to the mtime set by meson dist, also for files that may have been locally modified. Use the current time for the PKG-INFO file. Drop support for the $SOURCE_DATE_EPOCH environment variable when creating the sdist. This environment variable is for meant to be used as a work-around for build tools that produce non-reproducible binary packages. It does not have a standardized meaning for tools producing source distributions. See https://reproducible-builds.org/docs/source-date-epoch/ Fixes mesonbuild#450.
Even with all of the discussion here, it seems that a solution is still in the works. However, I'd like to note that the decision of using a timestamp of 0 should not be persisted at least. If the last-commit-date (e.g., what to do if a source repo is nowhere to be found?) or |
@mathstuf The fix is waiting for review in #452. The timestamp of 0 is a bug and it has never been a conscious decision to use it for anything. Where did you get the idea that 0 has been considered as a valid timestamp to use? |
Sorry, I guess I was misinterpreting the initial "decision" to use 0 in the first place. If a VCS is always available, then that is fine. I just wanted to add that 0 has other problems than the reported Debian issue. |
While using meson-python in (a couple of) python packages, I noticed the files stored in the archive have no valid timestamps (they are all 01/01/1970). The archive is obtained with a
python -m build --sdist
.silx-kit/fabio#540
silx-kit/pyFAI#1917
@picca, who packages the software for debian, reported this feature breaks the automatic packaging.
I quickly investigated, and both
git archive
andmeson dist
produce archive with valid timestamps for packaged files.The text was updated successfully, but these errors were encountered: