Adding GELU activation function #12020


Closed
wants to merge 4 commits into from

Conversation

vishalshar

Summary

New activation function that performs better than ReLU and ELU on several computer vision, NLP, and speech tasks. The authors explain it in more detail in their paper (revised Nov 11, 2018): https://arxiv.org/abs/1606.08415

Related Issues

No related issues; this is an addition to the existing list of activation functions.

PR Overview

Adds the GELU activation function, which is non-convex and non-monotonic, unlike ReLU and ELU.
Reference: Gaussian Error Linear Units (GELUs), Hendrycks et al., 2018.
Link: https://arxiv.org/abs/1606.08415
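
For reference, GELU is defined as GELU(x) = x·Φ(x), where Φ is the standard normal CDF. Below is a minimal sketch of the tanh approximation from the paper, written against the Keras backend API; the function name `gelu` and where it would live (e.g. `keras/activations.py`) are illustrative, not necessarily the exact diff in this PR:

```python
import math

from keras import backend as K


def gelu(x):
    """Gaussian Error Linear Unit.

    Approximates x * Phi(x), with Phi the standard normal CDF, using the
    tanh-based approximation from Hendrycks & Gimpel (2018):
    0.5 * x * (1 + tanh(sqrt(2 / pi) * (x + 0.044715 * x^3)))
    """
    return 0.5 * x * (1.0 + K.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * K.pow(x, 3))))
```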

  • [y] This PR requires new unit tests [y/n] (make sure tests are included)
  • [n] This PR requires documentation updates [y/n] (make sure the docs are up-to-date)
  • [y] This PR is backwards compatible [y/n]
  • [n] This PR changes the current API [y/n] (all API changes need to be approved by fchollet)

@fchollet
Collaborator

fchollet commented Jan 10, 2019

Thanks for the PR. We will not add this activation function to the core API at this time (it is possible we might add it later though).

In the Keras API, every new feature has to be maintained in perpetuity, and has to be replicated in every implementation of the Keras API (which includes tf.keras, multi-backend Keras, tensorflow.js, keras-mxnet, and others).

As such, our criteria for adding a new feature to the API are the following:

  1. It should be broadly useful to our users, rather than a niche feature that is only relevant to a specific vertical of researchers. Niche features should be maintained independently by those who need them (e.g. by extending the API via subclassing), as third-party add-on packages; a sketch of this approach follows the list.

  2. It should be widely recognized as a machine learning best practice. We will not add new layers/etc. that were recently published to arXiv.org, even in the case of claims of increased accuracy/etc. We only add new objects that are already commonly used in the machine learning community. Presumably, a new technique that does result in meaningful gains would be broadly adopted after a few months anyway (like ResNet), and that’s when we would add it to the core API.

  3. It should have an owner committed to maintaining it in the long term. In particular, the code should be maintainable by multiple people on the team, not just by one technical guru.
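
For anyone who needs GELU before it lands in core, here is a minimal sketch of the add-on route suggested in point 1, assuming multi-backend Keras: pass a custom callable as the activation, or register it via `keras.utils.generic_utils.get_custom_objects()` so the string name resolves in configs. All names here are illustrative.

```python
import math

from keras import backend as K
from keras.layers import Dense
from keras.utils.generic_utils import get_custom_objects


def gelu(x):
    # tanh approximation of x * Phi(x), as in Hendrycks & Gimpel (2018).
    return 0.5 * x * (1.0 + K.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * K.pow(x, 3))))

# Option 1: pass the callable directly to a layer.
layer = Dense(64, activation=gelu)

# Option 2: register it so `activation='gelu'` resolves by name,
# e.g. when loading a saved model config.
get_custom_objects().update({'gelu': gelu})
layer = Dense(64, activation='gelu')
```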
