# [f8a] It's possible to ingest the same package multiple times, under different names #3339
This is where we try to normalize package names now: https://github.com/fabric8-analytics/fabric8-analytics-worker/blob/6b9dd98004b1fcc63ae3257cc182dc2b146b05d2/f8a_worker/workers/init_analysis_flow.py#L32. But it clearly doesn't work as expected.
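For reference, a minimal sketch of PEP 503 normalization, which is the rule pip and PyPI themselves use to compare names; whether the linked `init_analysis_flow.py` code applies exactly this rule (rather than, say, plain lowercasing) is not confirmed here:

```python
# Hedged sketch: PEP 503 canonical name normalization, as used by pip/PyPI.
import re

def canonicalize_name(name: str) -> str:
    # PEP 503: runs of "-", "_" and "." are equivalent to a single "-",
    # and comparison is case insensitive.
    return re.sub(r"[-_.]+", "-", name).lower()

assert canonicalize_name("PyJWT") == canonicalize_name("pyjwt") == "pyjwt"
```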
PyPI package distribution is case insensitive; see https://stackoverflow.com/questions/26503509/is-pypi-case-sensitive. However, it is possible that in subsequent steps the package name that is used comes from metadata generated by Mercator. I will investigate where that might happen.
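A quick check, not from the original thread: the public PyPI JSON API accepts either spelling and reports the canonical project name, which demonstrates the case insensitivity:

```python
# Hedged sketch: both spellings resolve to the same PyPI project.
import requests

for name in ("PyJWT", "pyjwt"):
    info = requests.get(f"https://pypi.org/pypi/{name}/json").json()["info"]
    print(name, "->", info["name"])  # both print the same canonical name
```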
Encountered this error
Looks like the HTML structure of pypi.org has changed, so the parsing fails to find the repository description.
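As an aside (my suggestion, not from the thread): the PyPI JSON API exposes the same metadata without HTML parsing, so it is less sensitive to pypi.org layout changes. `summary` and `project_urls` are real fields of that API; treating `project_urls` as the place to find the repository link is my assumption:

```python
# Hedged sketch: read project metadata from the PyPI JSON API instead of
# scraping pypi.org HTML, whose structure changes from time to time.
import requests

info = requests.get("https://pypi.org/pypi/pyjwt/json").json()["info"]
print(info["summary"])                   # short project description
print(info.get("project_urls") or {})    # may contain the repository URL
```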
I analyzed both `PyJWT` and `pyjwt`.
After a while, when both ingestions completed, I saw only one package and one version entry in the graph (one way to check this is sketched below). I haven't been able to reproduce the issue with these steps. Let me try something more.
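A minimal sketch of how one could check for duplicate entries, assuming a local Gremlin Server HTTP endpoint on port 8182 and that package vertices carry `ecosystem` and `name` properties (both are assumptions about the local graph setup):

```python
# Hedged sketch: count graph vertices for each name variant via the
# Gremlin Server HTTP endpoint. Adjust host/port/properties as needed.
import requests

GREMLIN = "http://localhost:8182"  # assumed local Gremlin Server address

for name in ("pyjwt", "PyJWT"):
    query = f"g.V().has('ecosystem', 'pypi').has('name', '{name}').count()"
    resp = requests.post(GREMLIN, json={"gremlin": query})
    print(name, resp.json()["result"]["data"])  # duplicate if both are nonzero
```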
I was testing this via the jobs service.
Local setup is giving the following errors with the latest Docker images:
Due to the above error I am not able to run ingestion on my system. Apparently, many others are facing the same issue with the latest pip3 recently.
This time I used the Jobs API to schedule the analyses, and I can see the entries in Postgres, Minio, and the graph.
The issue is with the Jobs API sending parameters directly to the flow engine without performing any case transformation. Here is how to inspect this issue.

### Step 0

Run component analysis for some package to ensure initialization of the S3 buckets for package and version data (this is only required if you are starting afresh).
Wait for the analyses to complete.

### Step 1

Now invoke component analysis for `pyjwt`; a hedged request sketch follows below.
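Not part of the original steps, just a sketch of what the invocation could look like; the route `/api/v1/component-analyses/{ecosystem}/{package}/{version}` and the local port are assumptions, so adjust them to your deployment:

```python
# Hedged sketch: trigger component analysis through the server API.
import requests

SERVER = "http://localhost:32000"  # assumed local server API address

def run_component_analysis(ecosystem: str, name: str, version: str) -> int:
    url = f"{SERVER}/api/v1/component-analyses/{ecosystem}/{name}/{version}"
    return requests.get(url).status_code

# Step 1 uses the normalized name; Step 2 repeats this with "PyJWT".
print(run_component_analysis("pypi", "pyjwt", "1.6.1"))
```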
Wait for the analyses to complete. Go to the Jobs UI and check the bookkeeping data for this package; you will see all the workers that were executed for `pyjwt`.

### Step 2

Finally, run component analysis for `PyJWT`.
Wait for the analyses to complete. Go to the Jobs UI and check the bookkeeping data for this package; you will see no workers executed for `PyJWT`:

```json
{
  "error": "No result found."
}
```

### Step 3

Now run the analyses using the Jobs UI (http://localhost:34000/api/v1/jobs/selective-flow-scheduling?state=running) with the following payload:

```json
{
"flow_arguments": [
{
"ecosystem": "pypi",
"force": true,
"force_graph_sync": true,
"name": "pyjwt",
"recursive_limit": 0,
"version": "1.6.1"
},
{
"ecosystem": "pypi",
"force": true,
"force_graph_sync": true,
"name": "PyJWT",
"recursive_limit": 0,
"version": "1.6.1"
}
],
"flow_name": "bayesianFlow",
"run_subsequent": false,
"task_names": []
}
```

Wait for the analyses to complete. This time, if you check the bookkeeping data for `PyJWT`, you will see:

```json
{
"summary": {
"analysed_versions": [
"1.6.1"
],
"ecosystem": "pypi",
"package": "PyJWT",
"package_level_workers": [],
"package_version_count": 1
}
}
```

However, if you check the bookkeeping data for `pyjwt`, the workers that ran for it are still recorded there (note the empty `package_level_workers` list for `PyJWT` above). This clearly means that the case normalization is happening properly at the workers level.

### Summary

The same package ended up being ingested under two names, `pyjwt` and `PyJWT`, because the Jobs API passes the requested name to the flow engine without case transformation.
### Solution

Perform case normalization before submitting jobs via the Jobs API, or perform case normalization at the topmost node of the flow; a sketch of the first option follows below.
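A minimal sketch of the client-side option, normalizing names before the payload reaches the Jobs API. The lowercase-for-PyPI rule stands in for whatever normalization the linked `init_analysis_flow.py` applies, and the endpoint is the one used in Step 3 above; whether it accepts a POST with this exact body is an assumption based on that step:

```python
# Hedged sketch: normalize package names before scheduling flows via the
# Jobs API, so "PyJWT" and "pyjwt" end up as a single graph entry.
import requests

JOBS_API = ("http://localhost:34000/api/v1/jobs/"
            "selective-flow-scheduling?state=running")

def normalize_name(ecosystem: str, name: str) -> str:
    # PyPI names are case insensitive; other ecosystems may need other rules.
    return name.lower() if ecosystem == "pypi" else name

flow_arguments = [
    {"ecosystem": "pypi", "name": "PyJWT", "version": "1.6.1",
     "force": True, "force_graph_sync": True, "recursive_limit": 0},
]
for args in flow_arguments:
    args["name"] = normalize_name(args["ecosystem"], args["name"])

payload = {"flow_arguments": flow_arguments, "flow_name": "bayesianFlow",
           "run_subsequent": False, "task_names": []}
resp = requests.post(JOBS_API, json=payload)
print(resp.status_code, resp.text)
```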
@sivaavkd there is a fix for it here: fabric8-analytics/fabric8-analytics-jobs#287. But this issue will be present in any future code that schedules flows directly, without going through the jobs or server API.
Planner ID: #2099
## Description

Package names in PyPI are case insensitive, i.e. `PyJWT` and `pyjwt` are the same package in the PyPI world. We normalize Python package names when an analysis request comes in, but later we seem to work with the original package name that was given to us by the requester. This means that we can analyze the same package multiple times and thus probably end up with multiple entries in the graph database.

It's possible that this issue also affects other ecosystems, not just PyPI. We need to check and either fix the issue for other ecosystems as well (if easy), or create separate issue(s) so we can tackle them later.