Pip filter phony packages #4184

Closed
wants to merge 4 commits into from

sschuberth
Member

Please have a look at the individual commit messages for the details.

@sschuberth sschuberth requested a review from a team as a code owner June 17, 2021 11:20
@sschuberth
Member Author

FYI @mmurto.

@mmurto
Contributor

mmurto commented Jun 17, 2021

> FYI @mmurto.

I'll test this asap!

fviernau previously approved these changes Jun 17, 2021

@@ -679,7 +679,7 @@ class Pip(
 }

 val declaredLicenses = sortedSetOf<String>()
-getLicenseFromLicenseField(map["License"]?.single())?.let { declaredLicenses += it }
+map["License"]?.mapNotNullTo(declaredLicenses) { getLicenseFromLicenseField(it) }
Member

commit: Do you have a reference to the case (package) where this problem appeared?

Member Author

No. According to @mmurto's report in the chat, it should be one of dill, Django, psycopg2, pymongo, python-dateutil, pytz, PyYAML, or sqlparse (the version is unclear), but I cannot spend the time to find out exactly which one right now.

Member

@fviernau fviernau Jun 17, 2021

I was just wondering whether the issue could be a bug in lines 660-678. Maybe we should just wait for @mmurto's test results.

Member Author

@sschuberth sschuberth Jun 17, 2021

Here's @mmurto's feedback from the chat:

> Found the probable culprit: one of the packages has the whole license text of MIT in setup.py's setup() functions license field 🙂

So given it's a clear misuse of the "License" field, should I drop this change again?

Member Author

Or maybe we should not fail hard but only warn if the "License" field contains multiple entries?

Member Author

I've added a commit. Please have a look @fviernau. And @mmurto you're welcome to test once more 😉

Contributor

@mmurto mmurto Jun 17, 2021

Analyzer succeeds, but declared licenses contain it all:

        declared_licenses:
        - "AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER"
        - <redacted>
        - "FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL\
          \ THE"
        - "IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,"
        - "LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING\
          \ FROM,"
        - "MIT License"
        - "OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS\
          \ IN"
        - "Permission is hereby granted, free of charge, to any person obtaining a\
          \ copy"
        - "THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS\
          \ OR"
        - "THE SOFTWARE."
        - "The MIT License"
        - "The above copyright notice and this permission notice shall be included\
          \ in"
        - "all copies or substantial portions of the Software."
        - "copies of the Software, and to permit persons to whom the Software is"
        - "furnished to do so, subject to the following conditions:"
        - "in the Software without restriction, including without limitation the rights"
        - "of this software and associated documentation files (the \"Software\"),\
          \ to deal"
        - "to use, copy, modify, merge, publish, distribute, sublicense, and/or sell"
        declared_licenses_processed:
          spdx_expression: "MIT"
          mapped:
            MIT License: "MIT"
            The MIT License: "MIT"
          unmapped:
          - "AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER"
          - <redacted>
          - "FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL\
            \ THE"
          - "IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,"
          - "LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING\
            \ FROM,"
          - "OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS\
            \ IN"
          - "Permission is hereby granted, free of charge, to any person obtaining\
            \ a copy"
          - "THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS\
            \ OR"
          - "THE SOFTWARE."
          - "The above copyright notice and this permission notice shall be included\
            \ in"
          - "all copies or substantial portions of the Software."
          - "copies of the Software, and to permit persons to whom the Software is"
          - "furnished to do so, subject to the following conditions:"
          - "in the Software without restriction, including without limitation the\
            \ rights"
          - "of this software and associated documentation files (the \"Software\"\
            ), to deal"
          - "to use, copy, modify, merge, publish, distribute, sublicense, and/or\
            \ sell"
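
The shuffled order of the license text above is consistent with each line having been added as a separate entry to the sorted set from the diff (`sortedSetOf<String>()`), which uses natural string ordering, so upper-case lines sort before lower-case ones. A minimal sketch of that effect follows. It assumes, without confirmation from this thread, that the metadata parser yields one "License" value per line; `licenseFieldLines` and `collectDeclaredLicenses` are made-up names for illustration.

```kotlin
// Hypothetical sample: a few lines of a license text as they might arrive as
// separate "License" values. The real parser output is not shown in the thread.
val licenseFieldLines = listOf(
    "The MIT License",
    "Permission is hereby granted, free of charge, to any person obtaining a copy",
    "of this software and associated documentation files",
    "THE SOFTWARE."
)

// Collects all values into a sorted set, as the changed line in the diff does.
fun collectDeclaredLicenses(values: List<String>): Set<String> {
    val declaredLicenses = sortedSetOf<String>()
    declaredLicenses += values
    return declaredLicenses
}

fun main() {
    // Prints the lines in sorted order, not in their original order.
    collectDeclaredLicenses(licenseFieldLines).forEach(::println)
}
```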

Member Author

The analyzer run succeeded with the following result:

    - id: "PIP::<redacted>:5743024ffd8a42b806a13060104d861146d4e5ba"
      definition_file_path: "<redacted>/requirements.txt"
      authors:
      - "<redacted>"
      declared_licenses:
      - "\nThe MIT License \nCopyright (c) 2021- <redacted>\

@mmurto, is the "<redacted>/requirements.txt" part in here really correct? Shouldn't it be "<redacted>/setup.py"?

Because above you wrote

> the packages has the whole license text of MIT in setup.py

and requirements.txt does not actually support declaring a license AFAIK.

Contributor

The part with requirements.txt is correct: the project's requirements.txt contains only ".", which leads to pip getting the dependencies from setup.py; some info here.

Member Author

> The part with requirements.txt is correct: the project's requirements.txt contains only ".", which leads to pip getting the dependencies from setup.py; some info here.

Wow, subtle!

fviernau previously approved these changes Jun 17, 2021

So far phony packages have only been filtered out from the dependency
tree, but they should also be removed from the list of installed
packages.

Signed-off-by: Sebastian Schuberth <[email protected]>
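
A minimal sketch of the idea in the commit message above: apply the same phony-package predicate to both the dependency tree and the flat list of installed packages. All names here (`PHONY_PACKAGES`, `PackageInfo`, `DependencyNode`, the example package names and versions) are hypothetical and are not taken from the code in this pull request.

```kotlin
// Hypothetical types for illustration only.
data class PackageInfo(val name: String, val version: String)
data class DependencyNode(val pkg: PackageInfo, val children: List<DependencyNode> = emptyList())

// Assumed set of tooling packages that should not be reported.
val PHONY_PACKAGES = setOf("pip", "setuptools", "wheel")

fun isPhony(pkg: PackageInfo): Boolean = pkg.name.lowercase() in PHONY_PACKAGES

// Recursively drop phony nodes (and their subtrees) from the dependency tree.
fun filterTree(nodes: List<DependencyNode>): List<DependencyNode> =
    nodes.filterNot { isPhony(it.pkg) }
        .map { it.copy(children = filterTree(it.children)) }

// Apply the same predicate to the flat list of installed packages.
fun filterInstalled(packages: List<PackageInfo>): List<PackageInfo> =
    packages.filterNot(::isPhony)

fun main() {
    val installed = listOf(PackageInfo("requests", "2.25.1"), PackageInfo("wheel", "0.36.2"))
    println(filterInstalled(installed))
}
```

Sharing one predicate between both filters keeps the tree and the installed-package list consistent with each other.
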
There was a case reported where the "single()" failed due to multiple
licenses. While that might have been an issue with "pip", simply support
that case as it is easy to do so, like we already do it for the
classifiers one line below.

Signed-off-by: Sebastian Schuberth <[email protected]>
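
The difference described in the commit message above can be sketched as follows. `single()` throws unless the list has exactly one element, whereas `mapNotNullTo()` accepts any number of values and skips those that map to null. The `getLicenseFromLicenseField` body below is a hypothetical stand-in (the real helper is not shown in this thread), and the `Map<String, List<String>>` type for the metadata is an assumption.

```kotlin
// Hypothetical stand-in for the real helper, whose implementation is not shown here.
fun getLicenseFromLicenseField(value: String?): String? =
    value?.trim()?.takeUnless { it.isEmpty() || it == "UNKNOWN" }

// Old behavior: single() throws if "License" does not have exactly one value.
fun declaredLicensesOld(map: Map<String, List<String>>): Set<String> {
    val declaredLicenses = sortedSetOf<String>()
    getLicenseFromLicenseField(map["License"]?.single())?.let { declaredLicenses += it }
    return declaredLicenses
}

// New behavior: every value is considered and null results are skipped.
fun declaredLicensesNew(map: Map<String, List<String>>): Set<String> {
    val declaredLicenses = sortedSetOf<String>()
    map["License"]?.mapNotNullTo(declaredLicenses) { getLicenseFromLicenseField(it) }
    return declaredLicenses
}

fun main() {
    val metadata = mapOf("License" to listOf("MIT", "Apache-2.0"))
    println(runCatching { declaredLicensesOld(metadata) }.isFailure)
    println(declaredLicensesNew(metadata))
}
```
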
There are projects which put the full license text instead of just the
license name into a license field. Omit such texts from the list of
declared licenses by only accepting licenses that do not contain a
newline character.

Signed-off-by: Sebastian Schuberth <[email protected]>
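
A minimal sketch of the newline check described in the commit message above. The helper names `isSingleLineLicense` and `filterDeclaredLicenses` are hypothetical and the sample values are made up; the actual change in this pull request may look different.

```kotlin
// Hypothetical helper: accept a license value only if it contains no newline character.
fun isSingleLineLicense(value: String): Boolean = '\n' !in value

// Keep only single-line values and collect them into a sorted set.
fun filterDeclaredLicenses(values: List<String>): Set<String> =
    values.filter(::isSingleLineLicense).toSortedSet()

fun main() {
    val values = listOf("MIT", "The MIT License\nPermission is hereby granted, free of charge")
    println(filterDeclaredLicenses(values))
}
```
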
@sschuberth sschuberth force-pushed the pip-filter-phony-packages branch from 996479a to ee75aac Compare July 9, 2021 10:29
@sschuberth sschuberth requested review from fviernau and a team July 9, 2021 11:00
@sschuberth sschuberth enabled auto-merge (rebase) July 9, 2021 11:00
The field is specified to be a "short string" which is "a single line of
text, not more than 200 characters" [1]. Respect that limit, which also
filters out cases where people add full license texts to the field.

[1] https://docs.python.org/3/distutils/setupscript.html#additional-meta-data

Signed-off-by: Sebastian Schuberth <[email protected]>
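
A minimal sketch of the "short string" rule quoted in the commit message above (a single line of text, not more than 200 characters). The names `MAX_SHORT_STRING_LENGTH` and `isShortString` are hypothetical, and checking for line breaks in addition to the length is an assumption; the commit itself may only check the length.

```kotlin
// Limit quoted in the commit message: "not more than 200 characters".
const val MAX_SHORT_STRING_LENGTH = 200

// Hypothetical helper: true if the value is a single line within the length limit.
fun isShortString(value: String): Boolean =
    value.length <= MAX_SHORT_STRING_LENGTH && value.none { it == '\n' || it == '\r' }

fun main() {
    println(isShortString("MIT License"))
    println(isShortString("x".repeat(201)))
}
```
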
@sschuberth sschuberth force-pushed the pip-filter-phony-packages branch from 8940a6f to 8c7b86e Compare July 9, 2021 11:06
@sschuberth
Member Author

Superseded by #5319.

@sschuberth sschuberth closed this May 4, 2022
auto-merge was automatically disabled May 4, 2022 13:02

Pull request was closed

@sschuberth sschuberth deleted the pip-filter-phony-packages branch May 4, 2022 13:02