Guidelines for new contributions #2194

Closed
AakashKumarNain opened this issue Oct 7, 2020 · 21 comments · Fixed by #2370

@AakashKumarNain
Member

New contributions are always welcome. However, ML is a rapidly changing field, with hundreds of papers published on arXiv every day. With so many papers coming out, we can expect people to be willing to contribute implementations of new techniques to Addons that aren't readily available in core TensorFlow.

From the perspective of an end user, it makes sense for Addons to incorporate as many new contributions as possible. From the perspective of a maintainer, however, things can become overwhelming: not all contributors stay active, which can lead to stalled issues and problems with new updates/releases (if any).

To this end, we need to decide how many citations a paper should have before its functionality can be added to the ecosystem. For core TensorFlow, the bar is around 100 citations IIRC. For Addons, it would be much lower, but we still need to settle on a number. We should document it in the README as well.

There are several ways to track citation counts, but one of the coolest tools is this one: https://github.com/dennybritz/papergraph

@bhack
Contributor

bhack commented Oct 7, 2020

We partially discussed this, regarding the upstreaming threshold, at tensorflow/community#239 (comment).

@Harsh188
Contributor

Harsh188 commented Oct 7, 2020

Yep, this is a good idea. A "feature submission" suggestion under the contribution section in README.md would be helpful.

@AakashKumarNain
Member Author

Thanks @bhack and @Harsh188 for the feedback.

@bhack should I close this one and continue the discussion in that issue, or do you want to discuss it here?

@bhack
Contributor

bhack commented Oct 7, 2020

We need to continue here because that one was closed. We don't handle the upstreaming process on our side anymore.

@bhack
Contributor

bhack commented Oct 8, 2020

Just to collect another resource:
https://paperswithcode.com/methods

@AakashKumarNain
Member Author

Yeah that's very interesting. Thanks @bhack

cc: @seanpmorgan @WindQAQ what are your thoughts?

@seanpmorgan
Member

seanpmorgan commented Oct 15, 2020

> To this end, we need to decide how many citations a paper should have before its functionality can be added to the ecosystem. For core TensorFlow, the bar is around 100 citations IIRC. For Addons, it would be much lower, but we still need to settle on a number. We should document it in the README as well.

Great suggestion. We've hinted at it, but never formalized anything as has been mentioned.

> There are several ways to track citation counts, but one of the coolest tools is this one: https://github.com/dennybritz/papergraph

Very cool, but when I tried the live version it wasn't very performant. I'd prefer a simpler way to determine this, though the tool itself is super cool and useful.

> Just to collect another resource:
> https://paperswithcode.com/methods

This would be a nice way to offload the decision, though I'm a bit concerned about highly cited work being turned down if it isn't listed there.

> Yep, this is a good idea. A "feature submission" suggestion under the contribution section in README.md would be helpful.

Good call-out; this decision should go in the central README as well as CONTRIBUTING.md.

Overall I'd lean toward a simple citation count as the metric, but I could easily be convinced otherwise if there is consensus. I'm also interested in ways to identify what that number should be; we could pull some data on our historical acceptances.

@AakashKumarNain
Member Author

> We could pull some data on our historical acceptances.

Yes, we need to do this. We can use Papers with Code and Google Scholar to track citation counts easily. The threshold, though, has to be decided based on past contributions and what we learned from them.

@bhack
Contributor

bhack commented Oct 16, 2020

This is really about defining a threshold policy. With a citation-only metric we would probably exclude proposals from top lists like http://www.arxiv-sanity.com/top and http://www.arxiv-sanity.com/toptwtr

@AakashKumarNain
Member Author

@bhack yes, but we will make a separate provision for those cases. Sometimes a very promising method takes a while to reach the limelight or accumulate enough citations. In that case, we will go through the relevant paper and reach a consensus on whether to include the new functionality. This will take much longer to review, though.

@bhack
Contributor

bhack commented Oct 17, 2020

> @bhack yes, but we will make a separate provision for those cases. Sometimes a very promising method takes a while to reach the limelight or accumulate enough citations. In that case, we will go through the relevant paper and reach a consensus on whether to include the new functionality. This will take much longer to review, though.

In that case, having an official reference implementation will carry weight in the evaluation.

@failure-to-thrive
Contributor

Why not set up a simple voting system? At the end of the day, it's all about implementation in TFA. Something being cited doesn't necessarily imply eagerness to use it [in TF/TFA/whatever]. Although that has the same threshold issue. 😄 Maybe some % of active users?

@Harsh188
Contributor

> Why not set up a simple voting system? At the end of the day, it's all about implementation in TFA. Something being cited doesn't necessarily imply eagerness to use it [in TF/TFA/whatever]. Although that has the same threshold issue. Maybe some % of active users?

@failure-to-thrive this could be a feature we include, but I don't think relying on members' opinions is an effective evaluation metric. Some users may not have an opinion on a given implementation, or they might bring personal bias into the voting. The system might also become chaotic for users who want to suggest a new feature, since they wouldn't be able to determine the requirements for their proposal.

I think the citation count could be an effective first filter, to which we could add another step in which one or more users look into the paper and evaluate it against a proposed rubric. This would combine an unbiased numerical metric with a countermeasure against issues like:

> With a citation-only metric we would probably exclude proposals from top lists like http://www.arxiv-sanity.com/top and http://www.arxiv-sanity.com/toptwtr

@AakashKumarNain
Member Author

Thanks @failure-to-thrive for your feedback.

> Why not set up a simple voting system?

A voting system here would be abused. For example, say a new optimizer came out yesterday claiming better results than every existing optimizer. Any PR related to it would gather a huge number of votes because everyone wants to try new things. The problem is that papers don't cover enough use cases to prove that what they present is actually useful. You only know something works properly once it has a good number of citations.

@relaxation82

relaxation82 commented Oct 26, 2020 via email

@AakashKumarNain
Member Author

This has come up in a lot of PRs. Let's make a decision in the next meeting and close this out.

@Harsh188
Contributor

Harsh188 commented Feb 11, 2021

I believe in the previous meeting we decided on 50 citations as a suggested requirement, with any maintainer able to override it. I'm currently updating CONTRIBUTING.md with the newly recommended guidelines for new feature requests.

I still think there's plenty of room for further suggestions to improve the quality of incoming requests.

@AakashKumarNain
Member Author

I will take a look at that. Thank you @Harsh188

@AakashKumarNain
Member Author

I really like this package. @seanpmorgan @bhack @WindQAQ thoughts?
https://github.com/scholarly-python-package/scholarly
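
For anyone who wants to see how this might look in practice, here is a minimal sketch. It assumes scholarly's `search_pubs` interface and that publication results expose a `num_citations` field; the 50-citation bar comes from the discussion above, and the paper title and helper name are just illustrative:

```python
# A minimal sketch, not an official tool: assumes scholarly's search_pubs
# interface and that results carry a "num_citations" field. The 50-citation
# bar mirrors the suggested requirement discussed in this thread.
from scholarly import scholarly

CITATION_THRESHOLD = 50  # suggested bar for new TFA contributions


def meets_citation_bar(paper_title: str) -> bool:
    """Look up a paper on Google Scholar and check it against the bar."""
    results = scholarly.search_pubs(paper_title)
    pub = next(results)  # take the top search hit
    citations = pub.get("num_citations", 0)
    print(f"{paper_title!r}: {citations} citations")
    return citations >= CITATION_THRESHOLD


if __name__ == "__main__":
    # Hypothetical example query; any paper title works here.
    meets_citation_bar("Decoupled Weight Decay Regularization")
```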

@bhack
Contributor

bhack commented Feb 17, 2021

Nice, it could be useful for an interactive GitHub bot.
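
As a rough illustration of what such a bot could post, here is a hypothetical sketch that reuses the citation lookup above and comments on an issue via the GitHub REST API; the repo slug, issue number, token handling, and example values are all placeholders:

```python
# A hypothetical sketch: posts a citation report as an issue comment via
# the GitHub REST API (POST /repos/{owner}/{repo}/issues/{number}/comments).
# GITHUB_TOKEN, the repo slug, issue number, and counts are placeholders.
import os

import requests


def post_citation_report(repo: str, issue_number: int,
                         title: str, citations: int) -> None:
    """Comment the looked-up citation count on the triage issue."""
    url = f"https://api.github.com/repos/{repo}/issues/{issue_number}/comments"
    verdict = "meets" if citations >= 50 else "is below"
    body = (f"Citation check for '{title}': {citations} citations, "
            f"which {verdict} the suggested 50-citation bar.")
    resp = requests.post(
        url,
        headers={"Authorization": f"token {os.environ['GITHUB_TOKEN']}"},
        json={"body": body},
    )
    resp.raise_for_status()


if __name__ == "__main__":
    # Illustrative values only.
    post_citation_report("tensorflow/addons", 2194,
                         "Decoupled Weight Decay Regularization", 123)
```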

@AakashKumarNain
Member Author

Yup.
