Skip to content

Source package files lost their timestamps #450

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
kif opened this issue Aug 7, 2023 · 20 comments
Closed

Source package files lost their timestamps #450

kif opened this issue Aug 7, 2023 · 20 comments
Labels
bug Something isn't working
Milestone

Comments

@kif
Copy link

kif commented Aug 7, 2023

While using meson-python in (a couple of) python packages, I noticed the files stored in the archive have no valid timestamps (they are all 01/01/1970). The archive is obtained with a python -m build --sdist.

silx-kit/fabio#540
silx-kit/pyFAI#1917

@picca, who packages the software for debian, reported this feature breaks the automatic packaging.

I quickly investigated, and both git archive and meson dist produce archive with valid timestamps for packaged files.

@kif
Copy link
Author

kif commented Aug 7, 2023

Here is the reference to the bug in debian:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1041443

@dnicolodi
Copy link
Member

dnicolodi commented Aug 8, 2023

Thanks for the report. Indeed the sdist is created without specifying a mtime, which defaults to 0. It is not difficult to fix it, but I don't know which other mtime it makes sense to use. We could use the same mtime set by meson dist, which in turn is the same used by the underlying git archive.

However, meson-python picks up uncommitted changes to files that would anyhow be present in the dist tarball, despite python -m build -s displaying a "WARNING: Repository has uncommitted changes that will not be included in the dist tarball" warning. Therefore the current time would be a better choice. However, new files are not picked up.

To confuse things a bit more, meson-python uses the value set in SOURCE_DATE_EPOCH env var as mtime for the PKG-INFO archive file, if set.

I think we should drop the support for picking up uncommitted changes, use the mtime set by meson dist for all archive members, and drop support for SOURCE_DATE_EPOCH.

@rgommers comments?

@dnicolodi
Copy link
Member

The rationale for the current behavior is in #53

@dnicolodi
Copy link
Member

dnicolodi commented Aug 8, 2023

The rationale is that some packages require modifications to the source tree in the sdist. But I don't see why these cannot be applied with a meson dist script https://mesonbuild.com/Reference-manual_builtin_meson.html#mesonadd_dist_script

I find the current behavior of including modifications to committed files, but not new files, quite problematic. Requiring to use a dist script to modify the source tree before archival should take care of this issue.

@rgommers
Copy link
Contributor

rgommers commented Aug 8, 2023

Thanks for the report @kif and the diagnosis @dnicolodi!

I think we should drop the support for picking up uncommitted changes

This would be pretty problematic, this feature is quite important. And I don't see why we'd have to couple this to the timestamp problem.

use the mtime set by meson dist for all archive members,

Keeping the meson dist mtime seems like a good idea to me. In case of uncommitted changes, I think we can still keep the meson dist time, or the current time.

and drop support for SOURCE_DATE_EPOCH.

That seems to be a reproducibility feature, and at first sight also unrelated to the current bug report? It was touched in gh-58 (which fixed gh-53), but already present and not introduced to deal with uncommitted changes.

The original Debian bug report does not mention SOURCE_DATE_EPOCH, but probably it was set when those packages were built? See https://reproducible-builds.org/docs/source-date-epoch/ for what it's meant to do. tl;dr maybe this is a feature and not a bug?

@dnicolodi
Copy link
Member

I think we should drop the support for picking up uncommitted changes

This would be pretty problematic, this feature is quite important. And I don't see why we'd have to couple this to the timestamp problem.

Why the same cannot be achieved with a meson dist script? This couple to the mtime issue because the mtime we set should IMO represent the latest modification time of the content of the sdist, and if we pick file content from uncommitted changes we cannot guarantee that the meson dist mtime is the correct one.

use the mtime set by meson dist for all archive members,

Keeping the meson dist mtime seems like a good idea to me. In case of uncommitted changes, I think we can still keep the meson dist time, or the current time.

Unless we extract the files archived by meson dist and compare their content with the one in the source tree, we cannot know if they have uncommitted changes. I don't think doing this for all the files in the sdist is something we want to do. Thus we would need to settle for the current time or the meson mdist time independently of the fact that the files have been modified.

and drop support for SOURCE_DATE_EPOCH.

That seems to be a reproducibility feature, and at first sight also unrelated to the current bug report?

It is a reproducibility work-around for tools that otherwise produce non-reproducible archives. It is not a feature in itself. If we can produce the same archive independently of when the command to produce the archive is run, there is no use for it.

The original Debian bug report does not mention SOURCE_DATE_EPOCH, but probably it was set when those packages were built? See https://reproducible-builds.org/docs/source-date-epoch/ for what it's meant to do. tl;dr maybe this is a feature and not a bug?

We use SOURCE_DATE_EPOCH only for the modification time of the PKG-INFO file in the archive. All other files currently get an mtime of 0.

@rgommers
Copy link
Contributor

rgommers commented Aug 8, 2023

All other files currently get an mtime of 0.

That is something that is probably not a reasonable thing to do, right? It's the end of my day here, so I can only have a closer look tomorrow, but setting the time to 0 is something that I'm not sure about (also not that this was implemented intentionally).

Why the same cannot be achieved with a meson dist script?

This breaks the regular workflow for applying patches for distro packagers. Distros can take sources from either PyPI or directly from a tag on GitHub or another VCS system. The workflow is to download, then apply a set of patches with patch or another utility, then start building. Requiring packagers to not use patch et al. but a dist script is a very packager-unfriendly idea. I can dig up concrete places where this is used tomorrow. It would be a significant regression at this point imho.

@dnicolodi
Copy link
Member

Why the same cannot be achieved with a meson dist script?

This breaks the regular workflow for applying patches for distro packagers. Distros can take sources from either PyPI or directly from a tag on GitHub or another VCS system. The workflow is to download, then apply a set of patches with patch or another utility, then start building. Requiring packagers to not use patch et al. but a dist script is a very packager-unfriendly idea. I can dig up concrete places where this is used tomorrow. It would be a significant regression at this point imho.

Uh? sdist are source distributions. Distributions build binary distributions from sources, obtained from either sdists or from git checkouts. The Debian bug in this issue is indeed that the source distribution for some packages is invalid. Why would a distribution take a sdist, modify it, and then build another sdist? They would apply changes to the source tree and then build a binary package, not another source package. Is this related to pypa/build building an sdist first and a wheel from that? This behavior can be turned off, can't it?

@dnicolodi
Copy link
Member

All other files currently get an mtime of 0.

That is something that is probably not a reasonable thing to do, right? It's the end of my day here, so I can only have a closer look tomorrow, but setting the time to 0 is something that I'm not sure about (also not that this was implemented intentionally).

This is the source of the bug reported here. I'm just trying to understand what the 0 should be replaced with.

@rgommers
Copy link
Contributor

rgommers commented Aug 9, 2023

@picca, who packages the software for debian, reported this feature breaks the automatic packaging.

@picca or @kif I now checked the linked bug reports, but it does not say what exactly breaks. It's a bit surprising, because we've had no complaints about this so far from any other packages. Could you elaborate, either here or on one of the linked silx-kit issues, what exactly is going wrong. That'll be helpful not only now, but in the future.

We could use the same mtime set by meson dist, which in turn is the same used by the underlying git archive.

This seems to be the time that the last commit was made for all files, and not the time at which a given file was last changed. It's documented to work like that: https://git-scm.com/docs/git-archive#Documentation/git-archive.txt---mtimelttimegt. So that doesn't match the output of the conventional meaning of mtime (or ls -l) for files on disk either.

Therefore the current time would be a better choice.

I guess that's probably less useful than either using the time of last commit or zero, because it breaks reproducibility. I think a clear requirement is that generating an sdist twice in a row, in the same environment, should give the same result. On the other hand, it's not yet clear to me what the additional requirement is that Debian seems to want here.

Is this related to pypa/build building an sdist first and a wheel from that? This behavior can be turned off, can't it?

I think so. One can directly build a wheel with python -m build --wheel. However, having the wheel contents depend on whether or not it was generated directly from a git repo or via an sdist is not good.

I now remember another reason, which is that both myself and other SciPy devs were confused multiple times that uncommitted changes were not being installed. This is confusing, since it's a "why aren't my changes showing" up kind of thing (especially when setup.py builds would include them) and having to commit each intermediate step can be annoying.

Here is an example of where we amend a license file with platform-specific info: https://github.com/scipy/scipy/blob/main/tools/wheels/cibw_before_build_win.sh#L8. That's now a one-liner. Having to do that via a dist script would be very clumsy, and committing the change would be in itself a source of non-reproducibility (since the git hash changes, and that's part of the sdist and wheel artifacts).

tl;dr it's not yet clear that there is actually a problem here that we should consider a bug. And discussing removing a feature that I really needed is certainly premature. Let's focus on the requirement on time stamps first.

@dnicolodi
Copy link
Member

dnicolodi commented Aug 9, 2023

We could use the same mtime set by meson dist, which in turn is the same used by the underlying git archive.

This seems to be the time that the last commit was made for all files, and not the time at which a given file was last changed. It's documented to work like that: https://git-scm.com/docs/git-archive#Documentation/git-archive.txt---mtimelttimegt. So that doesn't match the output of the conventional meaning of mtime (or ls -l) for files on disk either.

Therefore the current time would be a better choice.

I guess that's probably less useful than either using the time of last commit or zero, because it breaks reproducibility. I think a clear requirement is that generating an sdist twice in a row, in the same environment, should give the same result. On the other hand, it's not yet clear to me what the additional requirement is that Debian seems to want here.

The only requirement Debian adds is that the mtime should not be zero (ie 1970-01-01 00:00:00).

If we include uncommitted changes, using the timestamp of the last commit is wrong. I don't think we should do that.

Is this related to pypa/build building an sdist first and a wheel from that? This behavior can be turned off, can't it?

I think so. One can directly build a wheel with python -m build --wheel. However, having the wheel contents depend on whether or not it was generated directly from a git repo or via an sdist is not good.

But this is precisely what you are asking. You want the sdist to be different from the content of the git repository.

I now remember another reason, which is that both myself and other SciPy devs were confused multiple times that uncommitted changes were not being installed. This is confusing, since it's a "why aren't my changes showing" up kind of thing (especially when setup.py builds would include them) and having to commit each intermediate step can be annoying.

Are you saying that including uncommitted changes in the sdist is a work-around for the choice made by pypa/build to build wheels via an intermediate sdist by default? I don't think it is a particularly strong reason. Also, I expect most developers to install with python -m pip install . rather than python -m build . && python -m pip install dist/*.whl. If they do the second, I don't see requiring an extra -w switch passed to build as problematic (it also spares them an extra meson setup step).

Here is an example of where we amend a license file with platform-specific info: https://github.com/scipy/scipy/blob/main/tools/wheels/cibw_before_build_win.sh#L8. That's now a one-liner. Having to do that via a dist script would be very clumsy, and committing the change would be in itself a source of non-reproducibility (since the git hash changes, and that's part of the sdist and wheel artifacts).

This scripts seems to be used to build wheels, not sdists, or are you saying that SciPy distributes different source packages on different architectures?

tl;dr it's not yet clear that there is actually a problem here that we should consider a bug. And discussing removing a feature that I really needed is certainly premature. Let's focus on the requirement on time stamps first.

I'm just not sure that you need the feature, and it is absolutely confusing to have a big warning that says that uncommitted changes are not included in the sdist, but then include (some of) them.

From what I understand, you want to have uncommitted changes included in the sdist, and the members of the sdist archive to be assigned timestamps of the last git commit. We can certainly do that, but it does not make much sense.

@dnicolodi
Copy link
Member

dnicolodi commented Aug 9, 2023

Here is an example of where we amend a license file with platform-specific info: https://github.com/scipy/scipy/blob/main/tools/wheels/cibw_before_build_win.sh#L8.

Where is this script used? I haven't found it used where I expected it to be used ie in the wheel build CI jobs.

Edit: found, it is used in here https://github.com/scipy/scipy/blob/fe05d2e6c36c46d0afddba9f10f715979a5b7fd8/.github/workflows/windows.yml

@rgommers
Copy link
Contributor

rgommers commented Aug 9, 2023

The only requirement Debian adds is that the mtime should not be zero (ie 1970-01-01 00:00:00).

Can we please focus on this first Daniele? I have not seen any evidence of this yet, and I'm really busy - Python 3.12.0rc1 came out yesterday, and everyone wants numpy wheels for it asap. So I am quite reluctant to go dig up old stuff right this minute and discuss whether or not the feature that I needed last year can be ripped out.

@kif
Copy link
Author

kif commented Aug 9, 2023

Hi, I tried to set the SOURCE_DATE_EPOCH environment variable but it is not used:

export SOURCE_DATE_EPOCH=$(git log -1 --pretty=%ct)
echo $SOURCE_DATE_EPOCH
python3 -m build -s
tar -tvzf dist/pyfai-2023.6.0a0.tar.gz
[...]
-rw-r--r-- 0/0            3194 1970-01-01 01:00 pyfai-2023.6.0a0/pyproject.toml
-rw-r--r-- 0/0             146 1970-01-01 01:00 pyfai-2023.6.0a0/requirements.txt
-rwxr-xr-x 0/0           16959 1970-01-01 01:00 pyfai-2023.6.0a0/run_tests.py
-rw-r--r-- 0/0             281 1970-01-01 01:00 pyfai-2023.6.0a0/setup.cfg
-rwxr-xr-x 0/0            5402 1970-01-01 01:00 pyfai-2023.6.0a0/version.py
-rw-r--r-- 0/0           13697 2023-08-08 11:08 pyfai-2023.6.0a0/PKG-INFO

So apparently, meson-python does not "properly" use SOURCE_DATE_EPOCH...

@dnicolodi
Copy link
Member

dnicolodi commented Aug 9, 2023

Hi, I tried to set the SOURCE_DATE_EPOCH environment variable but it is not used:

No, it is not. As I was explaining before, and as it is shown by your experiment, it is used only for the PKG-INFO archive member. I don't know what is the reason why it is used only for that. It may simply be a bug.

dnicolodi added a commit to dnicolodi/meson-python that referenced this issue Aug 15, 2023
Set the mtime of all archive members to the mtime set by meson dist,
also for files that may have been locally modified. Use the current
time for the PKG-INFO file.

Drop support for the $SOURCE_DATE_EPOCH environment variable when
creating the sdist. This environment variable is for meant to be used
as a work-around for build tools that produce non-reproducible binary
packages. It does not have a standardized meaning for tools producing
source distributions.

See https://reproducible-builds.org/docs/source-date-epoch/

Fixes mesonbuild#450.
dnicolodi added a commit to dnicolodi/meson-python that referenced this issue Aug 15, 2023
Set the mtime of all archive members to the mtime set by meson dist,
also for files that may have been locally modified. Use the current
time for the PKG-INFO file.

Drop support for the $SOURCE_DATE_EPOCH environment variable when
creating the sdist. This environment variable is for meant to be used
as a work-around for build tools that produce non-reproducible binary
packages. It does not have a standardized meaning for tools producing
source distributions.

See https://reproducible-builds.org/docs/source-date-epoch/

Fixes mesonbuild#450.
dnicolodi added a commit to dnicolodi/meson-python that referenced this issue Aug 15, 2023
Set the mtime of all archive members to the mtime set by meson dist,
also for files that may have been locally modified. Use the current
time for the PKG-INFO file.

Drop support for the $SOURCE_DATE_EPOCH environment variable when
creating the sdist. This environment variable is for meant to be used
as a work-around for build tools that produce non-reproducible binary
packages. It does not have a standardized meaning for tools producing
source distributions.

See https://reproducible-builds.org/docs/source-date-epoch/

Fixes mesonbuild#450.
dnicolodi added a commit to dnicolodi/meson-python that referenced this issue Aug 15, 2023
Set the mtime of all archive members to the mtime set by meson dist,
also for files that may have been locally modified. Use the current
time for the PKG-INFO file.

Drop support for the $SOURCE_DATE_EPOCH environment variable when
creating the sdist. This environment variable is for meant to be used
as a work-around for build tools that produce non-reproducible binary
packages. It does not have a standardized meaning for tools producing
source distributions.

See https://reproducible-builds.org/docs/source-date-epoch/

Fixes mesonbuild#450.
dnicolodi added a commit to dnicolodi/meson-python that referenced this issue Aug 15, 2023
Set the mtime of all archive members to the mtime set by meson dist,
also for files that may have been locally modified. Use the current
time for the PKG-INFO file.

Drop support for the $SOURCE_DATE_EPOCH environment variable when
creating the sdist. This environment variable is for meant to be used
as a work-around for build tools that produce non-reproducible binary
packages. It does not have a standardized meaning for tools producing
source distributions.

See https://reproducible-builds.org/docs/source-date-epoch/

Fixes mesonbuild#450.
@dnicolodi
Copy link
Member

The fix is waiting in #452.

@rgommers I had the impression that you wanted to land a fix for this quickly. Do you foresee having time to review the pending PRs?

@rgommers
Copy link
Contributor

Sorry for the delay @dnicolodi, and thank you for putting up a fix! I'll start reviewing now, and will try to get through the backlog of reviews this week.

dnicolodi added a commit to dnicolodi/meson-python that referenced this issue Aug 26, 2023
Set the mtime of all archive members to the mtime set by meson dist,
also for files that may have been locally modified. Use the current
time for the PKG-INFO file.

Drop support for the $SOURCE_DATE_EPOCH environment variable when
creating the sdist. This environment variable is for meant to be used
as a work-around for build tools that produce non-reproducible binary
packages. It does not have a standardized meaning for tools producing
source distributions.

See https://reproducible-builds.org/docs/source-date-epoch/

Fixes mesonbuild#450.
@mathstuf
Copy link

Even with all of the discussion here, it seems that a solution is still in the works. However, I'd like to note that the decision of using a timestamp of 0 should not be persisted at least. If the last-commit-date (e.g., what to do if a source repo is nowhere to be found?) or SOURCE_DATE_EPOCH (I mean, who sets this normally?) are not available, can the timestamp-of-last-resort be 1 instead of 0? The issue I ran into (scipy/scipy#19158) is that making a patch against such a timestamped file make git think the file doesn't exist when it has a 0 timestamp (see this function because GNU diff uses it to signal a creation/deletion event (other tools use /dev/null, but this is a separate, but equally strong, signal of such in git's code).

@dnicolodi
Copy link
Member

dnicolodi commented Aug 30, 2023

@mathstuf The fix is waiting for review in #452. The timestamp of 0 is a bug and it has never been a conscious decision to use it for anything. Where did you get the idea that 0 has been considered as a valid timestamp to use? meson-python creates source tarballs starting with the content of a version control repository (git or mercurial) thus the time of the last commit is always available.

@mathstuf
Copy link

Sorry, I guess I was misinterpreting the initial "decision" to use 0 in the first place. If a VCS is always available, then that is fine. I just wanted to add that 0 has other problems than the reported Debian issue.

@rgommers rgommers added the bug Something isn't working label Sep 1, 2023
@rgommers rgommers added this to the v0.14.0 milestone Sep 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants