Skip to content

Integrate typeshed into mypy #882

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
matthiaskramm opened this issue Sep 28, 2015 · 20 comments
Closed

Integrate typeshed into mypy #882

matthiaskramm opened this issue Sep 28, 2015 · 20 comments

Comments

@matthiaskramm
Copy link
Contributor

I started working on a pull request for this, so I'm opening this issue to discuss details and track progress.

The idea is to

  1. Add typeshed (http://github.com/python/typeshed) as a submodule to mypy, e.g. as "./typeshed/" or "./stubs/", and then to tweak default_lib_path in mypy/build.py to work with the typeshed directory structure. (Which is a bit more fine-grained than the one in mypy. It's documented at
    https://github.com/python/typeshed/blob/master/README.md)
  2. Merge the stubs of typeshed and mypy. Both have things the other doesn't have - typeshed has a lot of stubs from C extensions, including obscure things like _sre, strop, syslog and cmath, whereas mypy already also has stubs for the .py part of the standard library. (But a first step might be to make a branch in typeshed that's essentially a copy of mypy/stubs/ just to make sure the submodule+loading logic works.)
@refi64
Copy link
Contributor

refi64 commented Sep 28, 2015

AAAAAH!!!! Submodules are "theoretically" good but bite where it hurts when it comes to larger repos.

@matthiaskramm
Copy link
Contributor Author

I've never had any issues with submodules. They're exactly the right solution for the problem at hand (wanting to extract out and share a piece of functionality and maintain it independently).

@o11c
Copy link
Contributor

o11c commented Sep 28, 2015

With recent versions of git, it's a lot harder to screw up submodules. If you're afraid of them because of experience with old versions of git, your knowledge is obsolete.

@o11c
Copy link
Contributor

o11c commented Sep 29, 2015

Likely add some sort of directory helper method here: o11c@bb2daf5

@JukkaL
Copy link
Collaborator

JukkaL commented Oct 1, 2015

This could be really nice. At least merging the stubs is a no-brainer and I haven't done it yet only because of lack of time.

However, I've had some bad experiences with submodules myself (using older versions of git, I admit). I'll play around with them a little and see if I have any objections any more.

An alternative would be to import the contents of typeshed periodically to the mypy repo as commits. This would have its own set of issues.

@matthiaskramm
Copy link
Contributor Author

The three alternatives typically are:

  1. manual copying
  2. submodules
  3. subtrees (https://www.kernel.org/pub/software/scm/git/docs/howto/using-merge-subtree.html)

In the past, I've tried all three and found submodules the most convenient and the least error-prone (mostly because they prevent you from changing things in the wrong repository), but your mileage might vary. I'll be perfectly happy to use whatever setup you decide to go with.

@jhance
Copy link
Collaborator

jhance commented Oct 9, 2015

Submodules are not bad per-se. The problem that comes with them is when people want to use them as a replacement for SVN partial checkouts.

I think in this scenario a submodule fits the problem nicely.

@JukkaL
Copy link
Collaborator

JukkaL commented Oct 12, 2015

I added some comments to #884. Here is another workflow which is more work to support but may give a better experience in the most common workflows for new contributors. I don't really care that much about the workflows for core contributors such as me as we can figure it out or script it, but for new contributors I'd like to give an experience that is as close as possible to a vanilla git(hub) experience.

The idea goes like this:

  • Mypy repo will have copies of all stubs, exactly like now.
  • There will be scripts for pushing changes to stubs from mypy to typeshed, and vice versa. These will be mostly run by core developers instead of casual individual contributors. Dealing with local stub changes that aren't supposed to be pushed to typeshed will be a little tricky.

Now all the existing mypy workflows will work exactly like they currently do, plus there are two additional workflows, one for pushing stub changes from mypy to typeshed and the other for pulling changes from typeshed to mypy.

Benefits over submodule:

  • Simpler workflow for (non-core) mypy contributors that is exactly like a vanilla github workflow.
  • Full commit history for the entire mypy repo, including stubs.
  • Can easily have commits that target both stubs and non-stub code. Merging and rebasing works without issues.
  • No submodules!

Costs over submodule:

  • More work to set up and maintain for core developers. We need to write scripts and run them periodically.
  • It'll be more likely that mypy and typeshed will be periodically out of sync. We'll probably want multiple people who can update stubs so that there is no bottleneck. I can give somebody else push access to mypy so that they can review and merge PRs targeting stubs when I'm not available.
  • The commit histories of typeshed and mypy will be similar but not the same.

I'm not really sure whether this will be worth it. I'll ask around and try to get more opinions on git submodules.

@o11c
Copy link
Contributor

o11c commented Oct 12, 2015

I very very strongly recommend against a git subtree-style workflow. Instead, we should just add a call to git submodule update --init in runtests.py so that people who don't have the recursive clone get something working - or possibly use a separate script and just warn (nicer for cases our script hasn't thought of).

We want stubs to be committed to typeshed early, even if their PR branch is not ready. PR branches should not contain submodule updates, those should be periodically done in core (and we can have our scripts encourage people to use typeshed master rather than the commit the submodule points to - or perhaps offer tested and latest variants).

I created #903 to make life easier for me, not for anybody else.

@matthiaskramm
Copy link
Contributor Author

o11c: I wouldn't run any git commands in runtests.py, just out of principle. If anything, it should detect a missing submodule and output a nice message with instructions what to do.
Also, git by default initializes all submodules upon checkout.

JukkaL: If you allow PRs that cause mypy and typeshed to become out of sync, you (and other core contributors) will take on the extra workload to bring them back into sync. Sure, scripts can help, but it still means extra work for potentially every PR - work that normally would/should be done by the contributor.
It seems the key missing feature for the submodule approach is to have local stub changes in mypy. A possible solution might be a high-precedence "overload/" directory in mypy that contains stubs that are (temporarily) mypy specific.

@matthiaskramm
Copy link
Contributor Author

Correction: git will not automatically initialize submodules, at least not on all Linux distributions. (E.g. on Gentoo git does automatic submodule cloning, but Ubuntu it doesn't)
But you can use "git clone --recurse-submodules", or add "[clone] recurse_submodules = yes" to your .gitconfig. That feature was added in git 1.6.5.

@jhance
Copy link
Collaborator

jhance commented Oct 12, 2015

E.g. on Gentoo git does automatic submodule cloning

No, it does not.

@matthiaskramm
Copy link
Contributor Author

No, it does not.

Oh oops, you're right. (I just tried it on an older Gentoo box)
I wonder where I had seen a system with (presumably) a quirky /etc/gitconfig that caused it to always automatically initialize submodules.

@JukkaL
Copy link
Collaborator

JukkaL commented Oct 13, 2015

Okay, I went through the submodule workflow offline with @gnprice today, and git submodule support is definitely still janky but it seems like the least bad option we have.

Here are my/our additional concerns:

  • If you switch branches, git doesn't do anything with the submodule automatically. git status would just shows that stubs (or whatever) is modified, but it doesn't even mention that stubs is a submodule and you should do a submodule update to fix this. This smells like totally broken UI design. (Apparently submodules are the unloved bastard stepchild that git core contributors just tolerate out of duty, but it doesn't get any love, just a kick in the head once in a while.)
  • If you still happen to get a merge conflict (nobody really follows instructions even if they would help you avoid this) git doesn't suggest which of the revisions is the newer one, which is again bad UI.

I see the point of optimizing the workflow for the core contributors, as they are a more critical resource for moving things forward than casual contributors.

My suggestions for partially working around the issues with submodules:

  • Document the need to do git submodule update when switching branches prominently in the README and hope that people will read it. We could even recommend aliasing co to something like checkout (...) && git submodule update as suggested by @gnprice, but we have no way of making sure that everybody has this set up correctly, and it's still easy to forget to use this and do plain checkout instead.
  • Have runtests.py, scripts/mypy and setup.py check that the submodule is initialized and give a message telling what to do if it isn't (only when running in a repo checkout).
  • Have the above scripts check if the submodule is not up-to-date. If not, print out a helpful message and fail. Require something like --dirty-stubs to proceed anyway (if you are running tests against new typeshed revision, for example). This option would only be available (including showing in the usage) when running out of a repo clone.
  • Document the purpose of typeshed clearly in the README and elsewhere in documentation, and how it affects the development workflow.
  • Recommend only updating the submodule pointer in commits that don't touch anything else and that go directly to master.

Optionally, somebody could send a diff to fix the most glaring UI issues in git (any takers?).

What do you think? Would somebody be willing to implement the suggestions?

@matthiaskramm
Copy link
Contributor Author

Sounds good to me. I can implement this.

I'll add the documentation to #884.

The git logic and helper scripts should probably go into a PR of it's own. (Or do you want that as part of #884, too? I usually prefer to modularize things)

We could even recommend aliasing co to something like checkout (...) && git submodule update

An alias would work, but the canonical way is to install a post-receive hook on the client side. Maybe mypy can have a utility script that does this for you.

@jhance
Copy link
Collaborator

jhance commented Oct 13, 2015

Optionally, somebody could send a diff to fix the most glaring UI issues in git (any takers?).

I don't think that upstream git considers these UI choices to be an issue actually. (Or at least, thats a hypothesis just because I can't think of any other reason why it wouldn't be the way you describe)

@o11c
Copy link
Contributor

o11c commented Oct 13, 2015

I don't think that upstream git considers these UI choices to be an issue actually.

Given how many improvements they have made to the UI of submodules in recent versions, that appears to be untrue.

Also git status, which has always been what you should use frequently, has grown a lot more "you might want to use this command" hints.

@matthiaskramm
Copy link
Contributor Author

Ok, submitted PR #923 for verifying whether the submodules are initialized. Will do the documentation changes next.

Since it seems we might be able to merge #884 soon, if at all possible, please hold off on doing any changes to stubs (or on merging PRs that do), to prevent typeshed and mypy getting out of sync. (They're in sync right now, but I had to manually port some changes.)

@o11c
Copy link
Contributor

o11c commented Oct 13, 2015

I added stubs for platform in my fork, but they are only needed for changes my fork.

@gnprice
Copy link
Collaborator

gnprice commented Oct 13, 2015

[#884] submodules have gotten much better. People still hate them, but they hate them for what they were, not what they're now. They'll lose their bad reputation at some point.

This is the story I'd had in my head, but after going through the workflow yesterday with @JukkaL, my conclusion is that they're unfortunately just as bad as I remember from circa 2009. Still, as Jukka says, they seem like the least-bad available option.

Also git status, which has always been what you should use frequently, has grown a lot more "you might want to use this command" hints.

This is a nice illustration of how bad the UI still is. I just double-checked this on the very latest Git (v2.6.1), and the message is exactly the same as Jukka and I looked at yesterday. (It's true that git status has in general gotten a lot better in recent years! Just perhaps not for submodules.)

I made a repo foo, with a submodule bar. Then a commit that updates bar to a new version. Then with git checkout HEAD^ I checked out the old commit without the update to bar -- this is a model of what happens when switching e.g. between a recent master and an older feature branch.

Here's the message:

gregprice@gregprice:/tmp/foo$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

    modified:   bar (new commits)

no changes added to commit (use "git add" and/or "git commit -a")

There's no hint of any relevant command to run. "new commits" is sort of helpful if you can guess that it means "there's a submodule here", but it's also backward from what the user actually should do. It's left to us, in code like #923, to give the user the hint they actually need (which is git submodule update.)

In my copious free time, I might try dreaming up appropriate fixes to this UI that Git upstream will take. Meanwhile I think we're on the right track given the tools we have.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants