Pip filter phony packages #4184

Closed
wants to merge 4 commits into from

26 changes: 16 additions & 10 deletions analyzer/src/main/kotlin/managers/Pip.kt
@@ -170,6 +170,8 @@ class Pip(
     }

     companion object {
+        private const val SHORT_STRING_MAX_CHARS = 200
+
         private val INSTALL_OPTIONS = arrayOf(
             "--no-warn-conflicts",
             "--prefer-binary"
@@ -436,13 +438,15 @@ class Pip(
         return declaredLicenses
     }

-    private fun getLicenseFromLicenseField(value: String?): String? =
-        value?.let {
-            // Work-around for projects that declare licenses in classifier-style syntax.
-            getLicenseFromClassifier(it) ?: it
-        }?.takeUnless {
-            it.isBlank() || it == "UNKNOWN"
-        }
+    private fun getLicenseFromLicenseField(value: String?): String? {
+        if (value.isNullOrBlank() || value == "UNKNOWN") return null
+
+        val isShortString = value.length <= SHORT_STRING_MAX_CHARS && "\n" !in value
+        if (!isShortString) return null
+
+        // Apply a work-around for projects that declare licenses in classifier-syntax in the license field.
+        return getLicenseFromClassifier(value) ?: value
+    }

     private fun getLicenseFromClassifier(classifier: String): String? =
         // Example license classifier:
@@ -643,8 +647,10 @@ class Pip(

         val rootNode = jsonMapper.readTree(json) as ArrayNode

-        return rootNode.elements().asSequence().mapTo(mutableSetOf()) {
-            Identifier("PyPI", "", it["name"].textValue(), it["version"].textValue())
+        return rootNode.elements().asSequence().mapNotNullTo(mutableSetOf()) {
+            val name = it["name"].textValue()
+            val version = it["version"].textValue()
+            Identifier("PyPI", "", name, version).takeUnless { isPhonyDependency(name, version) }
         }
     }

@@ -677,7 +683,7 @@ class Pip(
         }

         val declaredLicenses = sortedSetOf<String>()
-        getLicenseFromLicenseField(map["License"]?.single())?.let { declaredLicenses += it }
+        map["License"]?.mapNotNullTo(declaredLicenses) { getLicenseFromLicenseField(it) }

Member

commit: Do you have a reference to the case (package) where this problem appeared?

Member Author

No. According to @mmurto's report in the chat, it should be one of dill, Django, psycopg2, pymongo, python-dateutil, pytz, PyYAML, or sqlparse (the version is unclear), but I cannot spend the time right now to find out exactly which one.

Member

@fviernau, Jun 17, 2021

I was just wondering whether the issue could be a bug in lines 660-678. Maybe we should just wait for @mmurto's test results.

Member Author

@sschuberth, Jun 17, 2021


Here's @mmurto's feedback from the chat:

Found the probable culprit: one of the packages has the whole license text of MIT in the license field of setup.py's setup() function 🙂

So given it's a clear misuse of the "License" field, should I drop this change again?

Member Author

Or maybe we should not fail hard but only warn if the "License" contains multiple fields?
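
To make the "fail hard" case concrete, here is a minimal sketch. It assumes the parsed metadata is a map from field name to the list of values found for that field; the `metadata` map, its sample values, and the two helper functions are made up for illustration. Only `single()`, `mapNotNullTo()`, and `sortedSetOf()` are real Kotlin standard-library functions. `single()` throws as soon as a field has more than one value, whereas `mapNotNullTo()` collects every usable value:

```kotlin
// Hypothetical stand-in for the parsed package metadata: field name -> all values found.
val metadata: Map<String, List<String>> = mapOf(
    "License" to listOf("MIT License", "Permission is hereby granted, free of charge, ...")
)

// Strict variant: single() throws IllegalArgumentException if the list has more than one element.
fun licenseStrict(map: Map<String, List<String>>): String? = map["License"]?.single()

// Lenient variant: collect all non-blank values instead of failing hard.
fun licensesLenient(map: Map<String, List<String>>): Set<String> {
    val declaredLicenses = sortedSetOf<String>()
    map["License"]?.mapNotNullTo(declaredLicenses) { value -> value.takeUnless { it.isBlank() } }
    return declaredLicenses
}

fun main() {
    val strictResult = runCatching { licenseStrict(metadata) }
    println("single() failed: ${strictResult.isFailure}")
    println("lenient result: ${licensesLenient(metadata)}")
}
```

The lenient variant trades the hard failure for possibly noisy results, so it would likely need an additional filter on the individual values.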

Member Author

I've added a commit. Please have a look @fviernau. And @mmurto you're welcome to test once more 😉

Contributor

@mmurto, Jun 17, 2021

Analyzer succeeds, but declared licenses contain it all:

        declared_licenses:
        - "AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER"
        - <redacted>
        - "FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL\
          \ THE"
        - "IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,"
        - "LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING\
          \ FROM,"
        - "MIT License"
        - "OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS\
          \ IN"
        - "Permission is hereby granted, free of charge, to any person obtaining a\
          \ copy"
        - "THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS\
          \ OR"
        - "THE SOFTWARE."
        - "The MIT License"
        - "The above copyright notice and this permission notice shall be included\
          \ in"
        - "all copies or substantial portions of the Software."
        - "copies of the Software, and to permit persons to whom the Software is"
        - "furnished to do so, subject to the following conditions:"
        - "in the Software without restriction, including without limitation the rights"
        - "of this software and associated documentation files (the \"Software\"),\
          \ to deal"
        - "to use, copy, modify, merge, publish, distribute, sublicense, and/or sell"
        declared_licenses_processed:
          spdx_expression: "MIT"
          mapped:
            MIT License: "MIT"
            The MIT License: "MIT"
          unmapped:
          - "AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER"
          - <redacted>
          - "FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL\
            \ THE"
          - "IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,"
          - "LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING\
            \ FROM,"
          - "OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS\
            \ IN"
          - "Permission is hereby granted, free of charge, to any person obtaining\
            \ a copy"
          - "THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS\
            \ OR"
          - "THE SOFTWARE."
          - "The above copyright notice and this permission notice shall be included\
            \ in"
          - "all copies or substantial portions of the Software."
          - "copies of the Software, and to permit persons to whom the Software is"
          - "furnished to do so, subject to the following conditions:"
          - "in the Software without restriction, including without limitation the\
            \ rights"
          - "of this software and associated documentation files (the \"Software\"\
            ), to deal"
          - "to use, copy, modify, merge, publish, distribute, sublicense, and/or\
            \ sell"

Member Author

The analyzer run succeeded with the following result:

    - id: "PIP::<redacted>:5743024ffd8a42b806a13060104d861146d4e5ba"
      definition_file_path: "<redacted>/requirements.txt"
      authors:
      - "<redacted>"
      declared_licenses:
      - "\nThe MIT License \nCopyright (c) 2021- <redacted>\

@mmurto, is the "<redacted>/requirements.txt" part in here really correct? Shouldn't it be "<redacted>/setup.py"?

Because above you wrote

the packages has the whole license text of MIT in setup.py

and requirements.txt does not actually support declaring a license AFAIK.
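
The declared license quoted above starts with a newline and continues over several lines, which is the kind of value the `getLicenseFromLicenseField` check in this PR's diff rejects. Below is a minimal standalone sketch of that check, assuming the same 200-character limit. The classifier work-around is left out, and the function name `shortLicenseOrNull` is made up for illustration:

```kotlin
private const val SHORT_STRING_MAX_CHARS = 200

// Returns the value only if it looks like a license name rather than a full license text.
fun shortLicenseOrNull(value: String?): String? {
    // Missing, blank, and placeholder values carry no license information.
    if (value.isNullOrBlank() || value == "UNKNOWN") return null

    // Heuristic: a license name is short and fits on a single line.
    val isShortString = value.length <= SHORT_STRING_MAX_CHARS && "\n" !in value
    return value.takeIf { isShortString }
}

fun main() {
    println(shortLicenseOrNull("MIT"))
    println(shortLicenseOrNull("UNKNOWN"))
    println(shortLicenseOrNull("\nThe MIT License \nCopyright (c) ..."))
    println(shortLicenseOrNull("x".repeat(201)))
}
```

Only the first call returns a value; the placeholder, the multi-line text, and the over-long string all yield `null`.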

Contributor

The part with requirements.txt is correct: the project's requirements.txt contains only ".", which leads to pip getting the dependencies from setup.py (some info here).

Member Author

Wow, subtle!

         map["Classifiers"]?.mapNotNullTo(declaredLicenses) { getLicenseFromClassifier(it) }

         val authors = parseAuthorString(map["Author"]?.singleOrNull())