Guidelines for new contributions #2194

Closed
AakashKumarNain opened this issue Oct 7, 2020 · 21 comments · Fixed by #2370

@AakashKumarNain
Member

New contributions are always welcome. However, ML is a rapidly changing field, with hundreds of papers published on arXiv every day. With so many papers coming out, we can expect people to be willing to contribute implementations of new techniques to Addons that aren't readily available in core TensorFlow.

From the perspective of an end user, it makes sense for Addons to incorporate as many new contributions as possible. From the perspective of a maintainer, however, things can become overwhelming: not all contributors stay active, which can lead to stalled issues and problems with new updates/releases (if any).

To this end, we need to decide how many citations a paper should have before its functionality can be added to the ecosystem. For core TensorFlow, the bar is around 100 citations IIRC. For Addons, it would be much lower, but we still need to settle on a number. We should document it in the README as well.

There are several ways to track citation counts, but one of the coolest tools is this one: https://github.com/dennybritz/papergraph

@bhack
Contributor

bhack commented Oct 7, 2020

We partially discussed this, regarding the upstreaming threshold, at tensorflow/community#239 (comment).

@Harsh188
Contributor

Harsh188 commented Oct 7, 2020

Yep, this is a good idea. A "feature submission" suggestion under the contribution section in README.md would be helpful.

@AakashKumarNain
Member Author

Thanks @bhack and @Harsh188 for the feedback.

@bhack should I close this one and continue the discussion in that issue, or do you want to discuss it here?

@bhack
Contributor

bhack commented Oct 7, 2020

We need to continue here because that one was closed. We don't handle the upstreaming process on our side anymore.

@bhack
Contributor

bhack commented Oct 8, 2020

Just to collect another resource:
https://paperswithcode.com/methods

@AakashKumarNain
Member Author

Yeah that's very interesting. Thanks @bhack

cc: @seanpmorgan @WindQAQ what are your thoughts?

@seanpmorgan
Member

seanpmorgan commented Oct 15, 2020

> To this end, we need to decide how many citations a paper should have before its functionality can be added to the ecosystem. For core TensorFlow, the bar is around 100 citations IIRC. For Addons, it would be much lower, but we still need to settle on a number. We should document it in the README as well.

Great suggestion. We've hinted at it, but never formalized anything as has been mentioned.

> There are several ways to track citation counts, but one of the coolest tools is this one: https://github.com/dennybritz/papergraph

Very cool, but when I tried the live version it wasn't very performant. I'd prefer a simpler way to determine this, though the tool itself is super cool and useful.

> Just to collect another resource:
> https://paperswithcode.com/methods

This would be a nice way to offload the decision, though I'm a bit concerned about highly cited work being turned down if it isn't listed there.

> Yep, this is a good idea. A "feature submission" suggestion under the contribution section in README.md would be helpful.

Good call-out; this decision should go in the central README as well as CONTRIBUTING.md.

Overall I'd lean toward a simple citation count as the metric, but I could easily be convinced otherwise if there is consensus. I'm also interested in ways to identify what that number should be; we could pull some data on our historical acceptances.

@AakashKumarNain
Member Author

> We could pull some data on our historical acceptances.

Yes, we need to do this. We can use Papers with Code and Google Scholar to track citation counts easily. The threshold, though, has to be decided based on past contributions and what we learned from them.

@bhack
Contributor

bhack commented Oct 16, 2020

This is really about defining a threshold policy. With a citation-only metric we would probably exclude proposals from top lists like http://www.arxiv-sanity.com/top and http://www.arxiv-sanity.com/toptwtr

@AakashKumarNain
Member Author

@bhack yes, but we will make a separate provision for those cases. Sometimes a very promising method takes a while to reach the limelight or accumulate enough citations. In that case, we will go through the relevant paper and reach a consensus on whether to include the new functionality. This will take much longer to review, though.

@bhack
Contributor

bhack commented Oct 17, 2020

> @bhack yes, but we will make a separate provision for those cases. Sometimes a very promising method takes a while to reach the limelight or accumulate enough citations. In that case, we will go through the relevant paper and reach a consensus on whether to include the new functionality. This will take much longer to review, though.

In that case, having an official reference implementation will carry weight in the evaluation.

@failure-to-thrive
Contributor

Why not set up a simple voting system? At the end of the day, it's all about implementation in TFA. Something being cited doesn't necessarily imply eagerness to use it [in TF/TFA/whatever]. Although that has the same threshold issue. 😄 Maybe some % of active users?

@Harsh188
Contributor

> Why not set up a simple voting system? At the end of the day, it's all about implementation in TFA. Something being cited doesn't necessarily imply eagerness to use it [in TF/TFA/whatever]. Although that has the same threshold issue. Maybe some % of active users?

@failure-to-thrive this could be a feature we include, but I don't think relying on members' opinions is an effective evaluation metric. Some users may not have an opinion on a given implementation, or they might bring personal bias into the voting. The system might also become chaotic for users who want to suggest a new feature, since they wouldn't be able to determine the requirements for their proposal.

I think the citation count could be an effective first filter, to which we could add another step in which one or more users look into the paper and evaluate it against a proposed rubric. This would combine an unbiased numerical metric with a countermeasure against issues like:

> With a citation-only metric we would probably exclude proposals from top lists like http://www.arxiv-sanity.com/top and http://www.arxiv-sanity.com/toptwtr

@AakashKumarNain
Member Author

Thanks @failure-to-thrive for your feedback.

> Why not set up a simple voting system?

A voting system here would be abused. For example, say a new optimizer came out yesterday claiming better results than every existing optimizer. Any PR related to it would gather a huge number of votes because everyone wants to try new things. The problem is that papers don't cover enough use cases to prove that what they present is actually useful. You only know something works properly once it has a good number of citations.

@relaxation82

relaxation82 commented Oct 26, 2020 via email

@AakashKumarNain
Member Author

This has come up in a lot of PRs. Let's make a decision in the next meeting and close this out.

@Harsh188
Contributor

Harsh188 commented Feb 11, 2021

I believe in the previous meeting we decided on 50 citations as a suggested requirement, with any maintainer able to override it. I'm currently updating CONTRIBUTING.md with the newly recommended guidelines for new feature requests.

I still think there's plenty of room for further suggestions to improve the quality of incoming requests.

@AakashKumarNain
Member Author

I will take a look at that. Thank you @Harsh188

@AakashKumarNain
Member Author

I really like this package. @seanpmorgan @bhack @WindQAQ thoughts?
https://github.com/scholarly-python-package/scholarly
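
For anyone who wants to see how this might look in practice, here is a minimal sketch. It assumes scholarly's `search_pubs` interface and that publication results expose a `num_citations` field; the 50-citation bar comes from the discussion above, and the paper title and helper name are just illustrative:

```python
# A minimal sketch, not an official tool: assumes scholarly's search_pubs
# interface and that results carry a "num_citations" field. The 50-citation
# bar mirrors the suggested requirement discussed in this thread.
from scholarly import scholarly

CITATION_THRESHOLD = 50  # suggested bar for new TFA contributions


def meets_citation_bar(paper_title: str) -> bool:
    """Look up a paper on Google Scholar and check it against the bar."""
    results = scholarly.search_pubs(paper_title)
    pub = next(results)  # take the top search hit
    citations = pub.get("num_citations", 0)
    print(f"{paper_title!r}: {citations} citations")
    return citations >= CITATION_THRESHOLD


if __name__ == "__main__":
    # Hypothetical example query; any paper title works here.
    meets_citation_bar("Decoupled Weight Decay Regularization")
```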

@bhack
Contributor

bhack commented Feb 17, 2021

Nice, it could be useful for an interactive GitHub bot.
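
As a rough illustration of what such a bot could post, here is a hypothetical sketch that reuses the citation lookup above and comments on an issue via the GitHub REST API; the repo slug, issue number, token handling, and example values are all placeholders:

```python
# A hypothetical sketch: posts a citation report as an issue comment via
# the GitHub REST API (POST /repos/{owner}/{repo}/issues/{number}/comments).
# GITHUB_TOKEN, the repo slug, issue number, and counts are placeholders.
import os

import requests


def post_citation_report(repo: str, issue_number: int,
                         title: str, citations: int) -> None:
    """Comment the looked-up citation count on the triage issue."""
    url = f"https://api.github.com/repos/{repo}/issues/{issue_number}/comments"
    verdict = "meets" if citations >= 50 else "is below"
    body = (f"Citation check for '{title}': {citations} citations, "
            f"which {verdict} the suggested 50-citation bar.")
    resp = requests.post(
        url,
        headers={"Authorization": f"token {os.environ['GITHUB_TOKEN']}"},
        json={"body": body},
    )
    resp.raise_for_status()


if __name__ == "__main__":
    # Illustrative values only.
    post_citation_report("tensorflow/addons", 2194,
                         "Decoupled Weight Decay Regularization", 123)
```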

@AakashKumarNain
Member Author

Yup.
