RFC: Migrate gelu activation from Addons to Core #252

New file added: `rfcs/20200525-gelu-migration.md` (80 additions, 0 deletions)

# Migrate gelu activation from TensorFlow Addons to TensorFlow Core

| Status | Accepted |
| :---------- | :------------------------------------------------------------------------------------------------- |
| **RFC #** | [252](https://github.com/tensorflow/community/pull/252) |
| **Authors** | Tzu-Wei Sung (@WindQAQ) & Sean Morgan (@seanpmorgan) |
| **Sponsor** | @alextp |
| **Updated** | 2020-07-15 |
| **Sponsorship Deadline** | 2020-07-17 (45 Days after submission) |

## Rationale for Migration
* [Gaussian Error Linear Units (GELUs)](https://arxiv.org/pdf/1606.08415.pdf) cited 600+ times
* Used in BERT and other influential architectures
* Multiple approvals from TF side:
* https://github.com/tensorflow/tensorflow/pull/33945#issuecomment-617832325
* https://github.com/tensorflow/tensorflow/issues/32783#issuecomment-537284266

## Historical Information
* Have there been significant issues reported to Addons that need to be addressed?
* Only ABI incompatibilities for the custom-op (not an issue if built along with core TF)
* When was it implemented in Addons?
* C++ custom-op added **2019-08 (TFA 0.5.0)**
* Python composite op added **2020-02-26 (TFA 0.9.0)**
* We have [performed benchmarking of the GELU activation](https://colab.research.google.com/drive/1rLb4EuydbFg9PbhboXhCDqopcl6BmphG#scrollTo=0GL2x2S4zxW3),
which shows it may be beneficial to retain the custom-ops, but the maintenance burden has grown
too large for us to continue supporting them in Addons (a rough timing sketch follows this list).
* This migration is long overdue, but we have struggled to finalize the migration process.
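
The linked Colab notebook contains the actual benchmark; purely as an illustration of the kind of comparison involved, the sketch below times the published Addons composite op (`tfa.activations.gelu`) against an inline erf-based expression on CPU. This is not the notebook's methodology, just a minimal timing snippet:

```
# Rough timing sketch (illustrative only; see the linked Colab for the real benchmark).
import timeit

import tensorflow as tf
import tensorflow_addons as tfa

x = tf.random.normal([1024, 1024])

def composite_gelu():
    return tfa.activations.gelu(x, approximate=True)  # Addons composite op

def erf_gelu():
    return 0.5 * x * (1.0 + tf.math.erf(x / tf.sqrt(2.0)))  # exact form, inline

# Warm up once so one-time tracing/allocation cost is excluded from the timings.
composite_gelu(); erf_gelu()

print("composite op :", timeit.timeit(composite_gelu, number=100))
print("inline erf   :", timeit.timeit(erf_gelu, number=100))
```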

## Implementation Details
* Link to implementation in Addons:
* Python: https://github.com/tensorflow/addons/blob/r0.10/tensorflow_addons/activations/gelu.py
* C++ : https://github.com/tensorflow/addons/blob/r0.10/tensorflow_addons/custom_ops/activations/cc/kernels/gelu_op.h
* Does this include custom-op kernels?
* Yes, but we are currently proposing to migrate just the Python composite op. This may
change with discussion in the RFC.
> **Review comment:** Does this mean only the Python implementation will be migrated to TF?
> I ask because I saw the topic below, "Is it worth bringing over the custom-op kernels for CPU/GPU?"
> We really hope to add a core kernel for CPU; it would easily benefit from fusion, MKL-DNN, and other optimizations.

> **@nammbash (Jun 9, 2020):** Adding to the discussion Zantares (Teng Lu) mentions: a need for core C++
> kernels, over and above the Python implementation, was also requested in the PR by penporn here:
> tensorflow/tensorflow#33945 (comment)

* Are they CPU/GPU/TPU compatible?
* CPU/GPU compatible. No support for TPU.
* What is the pytest coverage of the addon?
* `tensorflow_addons/activations/gelu.py 89%`
## Changes to Implementation (If Needed)
```
import math

import tensorflow as tf
from tensorflow_addons.utils import types


def gelu(x: types.TensorLike, approximate: bool = True) -> tf.Tensor:
    x = tf.convert_to_tensor(x)
    if approximate:
        # tanh-based approximation (Hendrycks & Gimpel, 2016).
        pi = tf.cast(math.pi, x.dtype)
        coeff = tf.cast(0.044715, x.dtype)
        return 0.5 * x * (1.0 + tf.tanh(tf.sqrt(2.0 / pi) * (x + coeff * tf.pow(x, 3))))
    else:
        # Exact form based on the Gaussian CDF (erf).
        return 0.5 * x * (1.0 + tf.math.erf(x / tf.cast(tf.sqrt(2.0), x.dtype)))
```
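For reference, the two branches correspond to the exact, erf-based definition of GELU from the paper and its tanh approximation:

```
\mathrm{gelu}(x) = x\,\Phi(x) = \tfrac{1}{2}\,x\,\bigl(1 + \operatorname{erf}(x/\sqrt{2})\bigr)
                \approx \tfrac{1}{2}\,x\,\bigl(1 + \tanh\bigl(\sqrt{2/\pi}\,(x + 0.044715\,x^{3})\bigr)\bigr)
```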
The above implementation would only bring over the python composite op. Since there is
[no way for us to build tfxla kernels](https://github.com/tensorflow/tensorflow/pull/33945#issuecomment-617842977)
we had no support for TPUs in Addons. [There were comments](https://github.com/tensorflow/tensorflow/pull/33945#issuecomment-625380208)
about using a "selector", but we would need guidance on how to implement that.

We may also want to discuss the `approximate` bool and whether it should be included in the
upstream version; a short comparison of the two forms is sketched below.
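
As a small illustration of what the flag controls, the sketch below evaluates both forms through the existing Addons endpoint (`tfa.activations.gelu`, which already exposes `approximate`); the printed difference is small for moderate inputs, but it can matter when reproducing published results:

```
import tensorflow as tf
import tensorflow_addons as tfa

x = tf.constant([-3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 3.0])

approx = tfa.activations.gelu(x, approximate=True)   # tanh approximation
exact = tfa.activations.gelu(x, approximate=False)   # erf-based form

# Well under 1e-3 for these inputs, but not bitwise identical.
print("max abs difference:", tf.reduce_max(tf.abs(approx - exact)).numpy())
```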


## Transition Plan
* The activation would land in [nn_ops.py](https://github.com/tensorflow/tensorflow/blob/r2.2/tensorflow//python/ops/nn_ops.py), [keras activations](https://github.com/tensorflow/tensorflow/blob/r2.2/tensorflow/python/keras/activations.py),
and possibly in [keras advanced_activation layers](https://github.com/tensorflow/tensorflow/blob/r2.2/tensorflow/python/keras/layers/advanced_activations.py)
* No planned changes to the parameter signatures at this time
* Addons would deprecate its activation and forward calls to the core functionality (a sketch of the shim follows this list).
* After merging to TF Core:
* Consolidate/remove https://github.com/tensorflow/models/blob/r2.2.0/official/modeling/activations/gelu.py
* Consolidate/remove https://github.com/tensorflow/models/blob/r2.2.0/official/modeling/activations/gelu_test.py
* Consolidate/remove https://github.com/tensorflow/models/blob/r2.2.0/official/nlp/xlnet/xlnet_modeling.py#L29
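
As a rough sketch of the deprecation step mentioned above, the Addons endpoint could become a thin shim once the core op lands. This assumes the core function is exposed as `tf.nn.gelu` and keeps the `approximate` parameter; the actual endpoint and warning mechanism would follow whatever is agreed in this RFC:

```
# Hypothetical Addons-side shim after migration: keep tfa.activations.gelu
# importable, but warn and delegate to the core implementation.
import warnings

import tensorflow as tf


def gelu(x, approximate: bool = True) -> tf.Tensor:
    """Deprecated alias that forwards to the core implementation."""
    warnings.warn(
        "tfa.activations.gelu is deprecated; use the core gelu instead.",
        DeprecationWarning,
    )
    # Assumes the core op keeps the `approximate` flag from Addons.
    return tf.nn.gelu(x, approximate=approximate)
```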

## Relevant GitHub Issues
* https://github.com/tensorflow/tensorflow/pull/33945
* https://github.com/tensorflow/addons/issues/550
* https://github.com/tensorflow/tensorflow/issues/32783
> **Review comment (Contributor):** Could you add keras-team/keras#11834?


## Questions and Discussion Topics
* Who from the TF core team would sponsor this migration and take ownership of the API?
* Is it worth bringing over the custom-op kernels for CPU/GPU?

## Final Decision
TBD