Skip to content

Fail on package data include package data combination #1887

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

vanschelven
Copy link

@vanschelven vanschelven commented Oct 24, 2019

Disallow the combination of include_package_data=True and explicit package_data

As per @pganssle 's suggestion number 2 on the issue.

Closes #1461

Pull Request Checklist

  • Changes have tests
  • News fragment added in changelog.d. See documentation for details

@pganssle
Copy link
Member

Thanks for this @vanschelven! Can you please add a test for this?

I'm also wondering if maybe this should be a warning rather than an error, at least at first? This isn't really something where the build is necessarily broken if you do it, and it may cause unnecessary headaches for people to find that their builds are suddenly broken when they used to "work" just fine.

Of course, most people don't ever see warnings, especially not in setuptools, so it might make no material difference. @jaraco @benoit-pierre What do you think - warn out of politeness, or raise an exception out of practicality?

@jaraco
Copy link
Member

jaraco commented Oct 30, 2019

Reading through the diff, it seems it has always been the intention that the union of the two options is honored (pganssle's preferred option in the issue). I don't think we want to adopt this proposed behavior, as I suspect it will be likely to cause problems for legitimate use-cases.

@pganssle
Copy link
Member

pganssle commented Oct 31, 2019

@jaraco I think I decided in that ticket that my preferred option was infeasible because it would subtly change what gets included in your packages, and if we were to change it we might get packages including different things based on what version of setuptools it's built with (which is a bad scene).

That said, it is almost certainly true that no one is deliberately counting on package_data being ignored, and probably in most cases people who have specified both simply haven't noticed that package_data is being ignored. I also can't think of too many situations where suddenly including things specified in package_data would break anything (in fact quite the opposite - people will likely complain that sometimes their packages don't include the data if they sometimes build with an old version of setuptools), so maybe I'm being too cautious about how "backwards incompatible" this change would be.

I still like the idea of phasing out include_package_data entirely and replacing it with a find_package_data function, or adding a new option like include_manifest_specified_files that has the new "union" behavior. It seems much less magical to me, and more closely mirrors how we handle python packages.

@vanschelven
Copy link
Author

Regarding warning-before-error:

I initially imagined that the error would almost exclusively be raised on the machine of original package-developers, and hence would be easy to solve. That's why I skipped on Warnings altogether. I now realise that there are quite a few scenarios in which this is not so. Namely: all scenarios in which packages are not installed as wheels, but built locally.

On the other hand: I seriously doubt it will make a difference in practice: it's not like this warning will be displayed on its own; it's somewhere in there with the whole output of setup.py.

Regarding the backwards incompatability: I would agree that people would not generally deliberately depend on package_data being ignored. However: I'd say that in practice it is, and I would assume that the results of packaging, though not always as intended, would tend to converge over time to something that is functioning. That is: functioning from a usage perspective, even though package_data is being ignored.

Anyways... if you guys are willing to go ahead with the present solution, I'll gladly add some tests.

@jaraco
Copy link
Member

jaraco commented Nov 1, 2019

I still like the idea of phasing out include_package_data entirely and replacing it with a find_package_data function, or adding a new option like include_manifest_specified_files that has the new "union" behavior. It seems much less magical to me, and more closely mirrors how we handle python packages.

An important aspect of include_package_data is that it's not specific to manifest-specified files. In most of the packages I maintain, I rely on include_package_data=True to ensure that data files in the packages are included in the distribution... and there's no manifest specified - the manifest is derived from scm-metadata. I'd hate for that parameter to become deprecated in favor of an option whose namesake actually doesn't map to the purpose for which I'm invoking it (including package data).

Most importantly, whatever solution is devised, it must be possible to do what is possible now - to configure a project such that files added in the SCM are automatically included in the package and to do that in a generic way that needs no details about the particular project.

I should note too that this change is likely to break projects that rely on file-finders and happen to define package_data (even though it's being ignored). This is the type of change that needs to have a deprecation period (and possibly a very long one, because setuptools aims to be able to build previous-published projects).

@pganssle
Copy link
Member

pganssle commented Nov 1, 2019

An important aspect of include_package_data is that it's not specific to manifest-specified files. In most of the packages I maintain, I rely on include_package_data=True to ensure that data files in the packages are included in the distribution... and there's no manifest specified - the manifest is derived from scm-metadata. I'd hate for that parameter to become deprecated in favor of an option whose namesake actually doesn't map to the purpose for which I'm invoking it (including package data).

Yeah, this is a major problem with include_package_data - it's super hard to understand what it means. The documentation says it includes "any data files" (data file does not seem well-defined here, but I assume it means anything not .py) in the directories specified by Manifest.in, but then elsewhere it's mentioned that it also looks for "anything under source control".

It seems to me that we should either use some sort of function that generates an input to package_data following certain search patterns or we can replace include_package_data with one or more enums, so you can do something like package_data=setuptools.MANIFEST_DATA | setuptools.SOURCE_CONTROL_DATA or something of that nature.

To me the important thing is that right now include_package_data sounds like it would be an option that is necessary for package_data to be used, but in fact it's mutually exclusive with package_data in some ways. I think they either need to be a single option (so that you can't accidentally specify both), or we need the flags to be more explicit about how they are modifying what gets included in the package. I am not a huge fan of flags that change the behavior of other parameters, which is why I'm more in favor of solutions that allow us to get the include_package_data behavior by passing something to package_data.

@vanschelven
Copy link
Author

Anything I could do to get this out of analysis-paralysis? I was thinking of reducing the scope of the solution somewhat, by changing the error into a warning. That would avoid breakage, and the documentation-changes already present in the PR would help avoid future confusion.

If this is something we can all agree on, I'll gladly proceed accordingly and will add those tests that were requested.

@vanschelven
Copy link
Author

By the way, to think that the combination of parameters has some useful behaviour appears to be a common error:

https://github.com/search?q=%22from+setuptools+import+setup%22+%22package_data%3D%22+%22include_package_data%3DTrue%22++filename%3Asetup.py&type=Code

@pganssle
Copy link
Member

pganssle commented Nov 9, 2019

I think a warning is the least we can do here and seems uncontroversial, so let's start with that.

@vanschelven vanschelven force-pushed the fail-on-package_data-include_package_data-combination branch from 8112c2f to 44f8fac Compare November 15, 2019 22:31
@vanschelven
Copy link
Author

I think a warning is the least we can do here and seems uncontroversial, so let's start with that.

I updated my implementation accordingly approximately one month ago. Anything I can do to push this forward?

@jaraco jaraco self-assigned this Dec 16, 2019
@jaraco
Copy link
Member

jaraco commented Dec 16, 2019

I'll try to get to merging this within a week. Don't hesitate to ping me if I haven't addressed it by the coming Monday.

@@ -602,3 +605,27 @@ def test_prune(self):
file_list.sort()
assert file_list.files == ['a.py', ml('f/f.py')]
self.assertWarnings()

def test_warning_on_meaningless_parameter_combination(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want a test for this behavior. Tests are mainly for behavior we wish to enforce. This particular behavior, that a warning is issued, is incidental to the intention. I'm not sure it quite captures what we need.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying we shouldn't add a test for the warning at all, or that this particular test doesn't correctly cover the use case?

I personally think we should be testing that a warning is raised in the right conditions (and, preferably, asserting that no warning is raised under the wrong conditions as well, though I think #1823 should cover us in that situation, since any uncaptured warning would be considered a failure).

In general I don't love the idea of exact matches on the warning message, though. To me the user-facing behavior we want to enforce is, "A warning will be issued in this situation that describes the problem," so the precise wording of 'Distutils 'package_data'.... Normally I would avoid false positives by checking the warning type, but I do think it's common to use the self.warn mechanism here instead of the warnings module.

I think maybe we can do something a bit looser, use a assert_called_once and then check that the message includes package_data and include_package_data, possibly with a comment like, "This is a heuristic to detect whether the correct warning was raised; the exact warning message is not guaranteed to be stable."

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. This test does provide some value. In fact, it enabled me to readily confirm my suspicions that the behavior wasn't quite what was intended.

My main concern with this test (and its phrasing) is that it seems to indicate that supplying these two parameters is intended to be incompatible, which contradicts the previously-documented intention that these parameters should be compatible. Assuming we find an acceptable implementation, I'll tweak the phrasing and maybe add a docstring to the test to explain how this test is describing behavior and not prescribing behavior.

if not self.distribution.include_package_data:

if self.distribution.include_package_data:
self.warn(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks to me like this warning may be issued unconditionally, even if package_data was not indicated. That's going to cause a lot of spurious warnings for use-cases that don't use package_data but use include_package_data. I'll test to confirm.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed. Making this change to the tests, the tests still pass:

diff --git a/setuptools/tests/test_manifest.py b/setuptools/tests/test_manifest.py
index 25e65ce5..a1d90aad 100644
--- a/setuptools/tests/test_manifest.py
+++ b/setuptools/tests/test_manifest.py
@@ -611,7 +611,6 @@ class TestFileListTest(TempDirTestCase):
         self.make_files([ml('app/__init__.py'), ml('app/e.py')])
 
         setup_attrs = {
-            "package_data": {"some": ["data"]},
             "include_package_data": True,
             'packages': ['app'],
         }

Copy link
Member

@jaraco jaraco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was really intending to merge this change tonight, but unfortunately, it's not suitable as drafted, as it adds the warning even if package_data is not specified.

@jaraco jaraco force-pushed the fail-on-package_data-include_package_data-combination branch from c08d07b to edaf1b5 Compare December 22, 2019 03:28
@jaraco
Copy link
Member

jaraco commented Dec 22, 2019

I've pushed 3910bbb to setuptools master... and rebased the above commits on that as the pr-1887-rebase branch. I suggest force-pushing those commits to your branch.

@vanschelven
Copy link
Author

Thanks for the serious review, and sorry for introducing the mistake. I don't have time looking at this in the coming weeks, so feel free to take it from here on my branch.

@jaraco
Copy link
Member

jaraco commented Apr 5, 2020

I'm not sure I'll have time to work on this either. I'm going to close this PR, but welcome others to pick up #1461 either from 3910bbb or from scratch.

@jaraco jaraco closed this Apr 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ignoring package_data when include_package_data is True
3 participants