Skip to content

Remove De-normalization Of Name and Version #4974

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 5, 2018

Conversation

dstufft
Copy link
Member

@dstufft dstufft commented Oct 29, 2018

When we were using natural primary keys, a lot of tables had name and version keys on them, since that was how the FKs worked. With the move to surrogate keys for all of our primary keys, these existing name and version fields are just leftover bits of denormalization that needs to be handled.

In #4958 (comment) I laid out a few options for solving this. This PR represents choice (3).

There may be performance implications here, I've tried to keep any index that still seemed like it made sense, but this is sort of drastically altering the performance characteristics of some of these queries, so it's possible that there will be a performance regression. However, if there is we can, at that time, either solve it with additional indexes or perhaps choose to denormalize our data then.

@dstufft dstufft requested a review from ewdurbin October 29, 2018 02:54
@ewdurbin
Copy link
Member

ewdurbin commented Nov 3, 2018

I think the only concern here is changing the name of a project per #4781 would destroy the original name for a given release.

@dstufft
Copy link
Member Author

dstufft commented Nov 3, 2018

I'm not sure what you mean by destroy the original name for the given release?

@ewdurbin
Copy link
Member

ewdurbin commented Nov 3, 2018

If a release for a project Pip==0.8.0 comes through and later the author chooses to rename to pip==1.0.0 utilizing the feature proposed in #4781 , Project.name would transition from Pip to pip and thus Release.project.name for Pip==0.8.0 would simultaneously transition from Pip to pip. I don't know if that's a huge issue, or really care. Just wanted to flag.

@dstufft
Copy link
Member Author

dstufft commented Nov 3, 2018

Ah I see what you mean.

I don't think that matters, as long as we don't allow renames that change the normalized name all tools should treat them equivalent anyways. The current state is that the name in the dist file and the name on the Release metadata aren't always going to line up, because Release.name uses whatever Project.name was at the time of registration. However people can make changes in their dist file, and as long as the name normalizes to the same, we'll look up the correct Project instance, and use Project.name from that, even if that differers from what is in the dist file.

@dstufft dstufft merged commit 7fc001b into pypi:surrogate_keys Nov 5, 2018
@dstufft dstufft deleted the surrogate-keys-no-denorm branch November 5, 2018 18:53
ewdurbin added a commit that referenced this pull request Nov 10, 2018
* Migrate to UUID Primary Key for Project and Release models

* fixup factories for Project and Release, sync Role model to migration

* make reformat

* fix at least one test

* Fix fallout of Role.package_name removal

* Remove Denormalized name and version fields from all models (#4974)

* Update for #5001

* Ensure model state matches db state

* Switch to using FKs to User.id isntead of User.username

* Fix test

* Fix linting

* Add Release.uploader as a real FK (#5015)

* Rename a number of tables to better fit in current scheme (#5016)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants