-
-
Notifications
You must be signed in to change notification settings - Fork 60
Prototype for bot that adds label to PRs with merge conficts #187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks for the PR, @pablogsal . I think this is a good start, I like the approach of using GraphQL to retrieve the PRs and then making REST API calls to apply the label. For using webhook instead of a daily cron: |
@Mariatta Some preferences for the implementation? Async or flask-sync? Do you want it to defer the work to celery-based workers (as opposed to the webhook doing the work in the same request)? Should I host the webhook (I am happy to)? Sorry for the bothering but I prefer not to assume anything so you can have the implementation you want. 😃 |
I think I would like this to follow bedevere's design, using @brettcannon What do you think? Can this go into bedevere, or a separate app? From @pablogsal prototype, using GraphQL we can retrieve 450 open PRs. During the first run, it will process 94 open PRs using REST APIs. After that, the number of REST API called will be much less. |
As a user, two bot names other than the CI bots seems like enough, whatever the hidden implementation. Bedevere helps get PRs merged. Islington helps create backport PRs. |
Do we know how long the querying of 450 open PRs takes? My worry is that on every merged PR we will time out on the webhook because the processing time takes longer than a single HTTP request. And if that does end up being the situation then we're going to have to set up a queuing service to queue up the requests that will take longer than an HTTP request and then an appropriate system which will fetch work from the queue, all of which ups the complexity by a lot. @pablogsal Beyond Mariatta's suggestion on how to structure things, you can also simplify your GraphQL query by using variables if there is any part of the query that isn't static. |
miss-islington uses celery tasks on Heroku, it's been working well. Long-running tasks like cloning CPython repo, and fetching from |
I have added a first working version of the bot and some tests. The strategy that I have followed is the following:
This is probably not the final version and there is probably some further discussion needed, but I think having this first working version will help to indicate more concrete changes/behaviour. Thank you very much for the time dedicated reviewing this PR. 🙇♂️ |
@Mariatta @brettcannon There is something extra I should do on my part before you can review this? Maybe I should reopen a PR in some bot's repository? |
Thanks @pablogsal. Seems like we haven't decided where to include this code. I was thinking it can be part of bedevere, but there were concerns about hitting the API limit. |
I'm also just swamped with other work and life stuff. |
Sorry for such delay in reviewing this PR. I think it looks great. I would suggest adding this to bedevere. I haven't seen rate-limit error from bedevere yet, so maybe no concern there. Also, bedevere has done most of the labeling work. |
FWIW, pypa has similar functionality on @pypa-bot (we're using it for pip). I think it'd be great if we could switch to using the same bot as CPython for this stuff. https://github.com/pypa/browntruck/blob/twisted/browntruck/hooks/merge_conflict.py |
@pradyunsg especially since the pypa bot is not seemingly actively maintained (I've had a PR opened for two years to rip out custom code to rely more on gidgethub that hasn't gone anywhere). |
I've been meaning to look but I have no access to the deployed bot myself, and without access to the deployed bot's logs, I don't feel confident merging those changes. 🙃 |
This PR is a prototype solution for #96.
On this PR
This PR is NOT the final solution or a final proposed solution, but this initial version of the code shows how is possible to deal with GitHub's
API request limit using the GraphQL API to query all the open pull requests at once and then act accordingly.
Some stats on the current state of Python/cpython:
On the WebHook method
Another possibility that one may think is using a Webhook that triggers when a pull request is updated so we can retrieve if the pull request has merge conflicts from there and update accordingly. Although this is much cleaner it presents a major problem: the
mergeable
state is usually not set in the webhook request because the GitHub job that inspects it is not triggered at the moment of the webhook being delivered. This means that we would need a REST request after some seconds to check the state manually.This is more elegant in the sense that we don't need to run cron jobs daily but a lot of REST API calls will be needed.
I can implement a final version of whatever method you prefer, but I think an extended conversation measuring the pros and cons is needed to achieve a good (and simple) solution. 😄