Adding GELU activation function #12020


Closed
wants to merge 4 commits into from

Conversation

vishalshar

Summary

New activation function that performs better than ReLU and ELU on several computer vision, NLP, and speech tasks. The authors explain it in more detail in their paper (revised Nov 11, 2018): https://arxiv.org/abs/1606.08415

Related Issues

No related issues; this is an addition to the existing list of activation functions.

PR Overview

Adds the GELU activation function, which is non-convex and non-monotonic, unlike ReLU and ELU.
Reference: Gaussian Error Linear Units (GELUs), Hendrycks et al., 2018.
Link: https://arxiv.org/abs/1606.08415
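
For reference, GELU is defined as GELU(x) = x·Φ(x), where Φ is the standard normal CDF. Below is a minimal sketch of the tanh approximation from the paper, written against the Keras backend API; the function name `gelu` and where it would live (e.g. `keras/activations.py`) are illustrative, not necessarily the exact diff in this PR:

```python
import math

from keras import backend as K


def gelu(x):
    """Gaussian Error Linear Unit.

    Approximates x * Phi(x), with Phi the standard normal CDF, using the
    tanh-based approximation from Hendrycks & Gimpel (2018):
    0.5 * x * (1 + tanh(sqrt(2 / pi) * (x + 0.044715 * x^3)))
    """
    return 0.5 * x * (1.0 + K.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * K.pow(x, 3))))
```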

  • [y] This PR requires new unit tests [y/n] (make sure tests are included)
  • [n] This PR requires documentation updates [y/n] (make sure the docs are up-to-date)
  • [y] This PR is backwards compatible [y/n]
  • [n] This PR changes the current API [y/n] (all API changes need to be approved by fchollet)

@fchollet
Collaborator

fchollet commented Jan 10, 2019

Thanks for the PR. We will not add this activation function to the core API at this time (it is possible we might add it later though).

In the Keras API, every new feature has to be maintained in perpetuity, and has to be replicated in every implementation of the Keras API (which includes tf.keras, multi-backend Keras, tensorflow.js, keras-mxnet, and others).

As such, our criteria for adding a new feature to the API are the following:

  1. It should be broadly useful to our users, rather than a niche feature that is only relevant to a specific vertical of researchers. Niche features should be maintained independently by those who need them (e.g. by extending the API via subclassing), as third-party add-on packages; a sketch of this approach follows the list.

  2. It should be widely recognized as a machine learning best practice. We will not add new layers/etc. that were recently published to arXiv.org, even in the case of claims of increased accuracy/etc. We only add new objects that are already commonly used in the machine learning community. Presumably, a new technique that does result in meaningful gains would be broadly adopted after a few months anyway (like ResNet), and that’s when we would add it to the core API.

  3. It should have an owner committed to maintaining it in the long term. In particular, the code should be maintainable by multiple people on the team, not just by one technical guru.
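
For anyone who needs GELU before it lands in core, here is a minimal sketch of the add-on route suggested in point 1, assuming multi-backend Keras: pass a custom callable as the activation, or register it via `keras.utils.generic_utils.get_custom_objects()` so the string name resolves in configs. All names here are illustrative.

```python
import math

from keras import backend as K
from keras.layers import Dense
from keras.utils.generic_utils import get_custom_objects


def gelu(x):
    # tanh approximation of x * Phi(x), as in Hendrycks & Gimpel (2018).
    return 0.5 * x * (1.0 + K.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * K.pow(x, 3))))

# Option 1: pass the callable directly to a layer.
layer = Dense(64, activation=gelu)

# Option 2: register it so `activation='gelu'` resolves by name,
# e.g. when loading a saved model config.
get_custom_objects().update({'gelu': gelu})
layer = Dense(64, activation='gelu')
```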
