Skip to content

Prototype for bot that adds label to PRs with merge conficts #187

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

pablogsal
Copy link
Member

@pablogsal pablogsal commented Oct 8, 2017

This PR is a prototype solution for #96.

On this PR

This PR is NOT the final solution or a final proposed solution, but this initial version of the code shows how is possible to deal with GitHub's :octocat: API request limit using the GraphQL API to query all the open pull requests at once and then act accordingly.

Some stats on the current state of Python/cpython:

  • Only 5 requests are needed to retrieve ALL open pull request information (in this example only the number, URL, and mergeable state is requested). 450 open pull request was retrieved. 🎉
  • From this 450 open pull requests only 94 present merge conflicts. 🥇
  • As the GraphQL doesn't provide a way to add labels in one request, the REST API is needed to add the label one by one, making 94 HTTP request needed to update the labels. 😞

On the WebHook method

Another possibility that one may think is using a Webhook that triggers when a pull request is updated so we can retrieve if the pull request has merge conflicts from there and update accordingly. Although this is much cleaner it presents a major problem: the mergeable state is usually not set in the webhook request because the GitHub job that inspects it is not triggered at the moment of the webhook being delivered. This means that we would need a REST request after some seconds to check the state manually.

This is more elegant in the sense that we don't need to run cron jobs daily but a lot of REST API calls will be needed.


I can implement a final version of whatever method you prefer, but I think an extended conversation measuring the pros and cons is needed to achieve a good (and simple) solution. 😄

@Mariatta
Copy link
Member

Mariatta commented Oct 9, 2017

Thanks for the PR, @pablogsal . I think this is a good start, I like the approach of using GraphQL to retrieve the PRs and then making REST API calls to apply the label.

For using webhook instead of a daily cron:
I think it's unlikely for a PR to suddenly become unmergeable after it's been updated. More likely is it becomes unmergeable as a result that a different PR gets merged. So what we can do is to call the GraphQL query to find out which other PRs becomes unmergeable after each merged PR event.
This will happen more frequently than the daily cronjob.

@pablogsal
Copy link
Member Author

pablogsal commented Oct 9, 2017

@Mariatta Some preferences for the implementation? Async or flask-sync? Do you want it to defer the work to celery-based workers (as opposed to the webhook doing the work in the same request)? Should I host the webhook (I am happy to)?

Sorry for the bothering but I prefer not to assume anything so you can have the implementation you want. 😃

@Mariatta
Copy link
Member

I think I would like this to follow bedevere's design, using gidgethub and aiohttp, and hosted on heroku.

@brettcannon What do you think? Can this go into bedevere, or a separate app?

From @pablogsal prototype, using GraphQL we can retrieve 450 open PRs. During the first run, it will process 94 open PRs using REST APIs. After that, the number of REST API called will be much less.

@terryjreedy
Copy link
Member

As a user, two bot names other than the CI bots seems like enough, whatever the hidden implementation. Bedevere helps get PRs merged. Islington helps create backport PRs.

@brettcannon
Copy link
Member

Do we know how long the querying of 450 open PRs takes? My worry is that on every merged PR we will time out on the webhook because the processing time takes longer than a single HTTP request. And if that does end up being the situation then we're going to have to set up a queuing service to queue up the requests that will take longer than an HTTP request and then an appropriate system which will fetch work from the queue, all of which ups the complexity by a lot.

@pablogsal Beyond Mariatta's suggestion on how to structure things, you can also simplify your GraphQL query by using variables if there is any part of the query that isn't static.

@Mariatta
Copy link
Member

miss-islington uses celery tasks on Heroku, it's been working well. Long-running tasks like cloning CPython repo, and fetching from upstream, are deferred to background tasks.
We can set up the same if timing out is a concern :)

@pablogsal
Copy link
Member Author

I have added a first working version of the bot and some tests. The strategy that I have followed is the following:

  • Instead of using celery tasks I am acknowledging the request immediately and launching the task that labels the PR in the background. If you feel that celery tasks will suit better this use case, I can prepare the changes but I have benchmarked 🐎 this solution again a testing repo on GitHub and the results are very good, as all the work is I/O based.

  • As discussed, GraphQL :octocat: is used first to fetch the open pull requests and process them. I have considered the use of variables, but taking into account that a string with the variables will be as well included in the query string and I am constructing dynamically the query (not with values but the query itself), I feel the current approach is not a bad idea, though I will change it if you feel a different approach is definitely better in this case.

  • aiohttp and gidgethub are used to the rest of the queries as indicated by @Mariatta and @brettcannon.

This is probably not the final version and there is probably some further discussion needed, but I think having this first working version will help to indicate more concrete changes/behaviour.

Thank you very much for the time dedicated reviewing this PR. 🙇‍♂️

@pablogsal
Copy link
Member Author

@Mariatta @brettcannon There is something extra I should do on my part before you can review this? Maybe I should reopen a PR in some bot's repository?

@Mariatta
Copy link
Member

Mariatta commented Nov 29, 2017

Thanks @pablogsal. Seems like we haven't decided where to include this code. I was thinking it can be part of bedevere, but there were concerns about hitting the API limit.

@brettcannon
Copy link
Member

I'm also just swamped with other work and life stuff.

@Mariatta Mariatta added the status-stale The PR hasn't been updated in a while, and pending removal label Jul 21, 2018
@Mariatta
Copy link
Member

Mariatta commented Aug 1, 2019

Sorry for such delay in reviewing this PR. I think it looks great. I would suggest adding this to bedevere. I haven't seen rate-limit error from bedevere yet, so maybe no concern there. Also, bedevere has done most of the labeling work.

@pradyunsg
Copy link
Member

FWIW, pypa has similar functionality on @pypa-bot (we're using it for pip). I think it'd be great if we could switch to using the same bot as CPython for this stuff.

https://github.com/pypa/browntruck/blob/twisted/browntruck/hooks/merge_conflict.py

@brettcannon
Copy link
Member

@pradyunsg especially since the pypa bot is not seemingly actively maintained (I've had a PR opened for two years to rip out custom code to rely more on gidgethub that hasn't gone anywhere).

@pradyunsg
Copy link
Member

pradyunsg commented Aug 3, 2019

I've been meaning to look but I have no access to the deployed bot myself, and without access to the deployed bot's logs, I don't feel confident merging those changes. 🙃

@pablogsal pablogsal closed this Nov 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA signed hacktoberfest status-stale The PR hasn't been updated in a while, and pending removal
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants