RFC: Migrate gelu activation from Addons to Core #252
# Migrate gelu activation from TensorFlow Addons to TensorFlow Core

| Status      | Accepted |
| :---------- | :------- |
| **RFC #**   | [252](https://github.com/tensorflow/community/pull/252) |
| **Authors** | Tzu-Wei Sung (@WindQAQ) & Sean Morgan (@seanpmorgan) |
| **Sponsor** | @alextp |
| **Updated** | 2020-07-15 |
| **Sponsorship Deadline** | 2020-07-17 (45 days after submission) |
## Rationale for Migration
* [Gaussian Error Linear Units (GELUs)](https://arxiv.org/pdf/1606.08415.pdf) has been cited 600+ times
* Used in BERT and other influential architectures
* Multiple approvals from the TF side:
  * https://github.com/tensorflow/tensorflow/pull/33945#issuecomment-617832325
  * https://github.com/tensorflow/tensorflow/issues/32783#issuecomment-537284266
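For reference, the two forms of the activation from the paper (both appear in the implementation below) are

$$\mathrm{gelu}(x) = x\,\Phi(x) = \frac{x}{2}\left(1 + \operatorname{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right) \approx \frac{x}{2}\left(1 + \tanh\!\left(\sqrt{\frac{2}{\pi}}\left(x + 0.044715\,x^{3}\right)\right)\right),$$

where $\Phi$ is the standard normal CDF.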
## Historical Information
* Have there been significant issues reported to Addons that need to be addressed?
  * Only ABI incompatibilities for the custom-op (not an issue if built along with core TF)
* When was it implemented in Addons?
  * C++ custom-op added **2019-08 (TFA 0.5.0)**
  * Python composite op added **2020-02-26 (TFA 0.9.0)**
* We have [performed benchmarking of the GELU activation](https://colab.research.google.com/drive/1rLb4EuydbFg9PbhboXhCDqopcl6BmphG#scrollTo=0GL2x2S4zxW3)
  which shows it may be beneficial to retain the custom-ops, but the maintenance burden has grown
  too large for us to continue supporting them in Addons.
* This migration is long overdue, but we have struggled to finalize the migration process.
## Implementation Details
* Link to implementation in Addons:
  * Python: https://github.com/tensorflow/addons/blob/r0.10/tensorflow_addons/activations/gelu.py
  * C++: https://github.com/tensorflow/addons/blob/r0.10/tensorflow_addons/custom_ops/activations/cc/kernels/gelu_op.h
* Does this include custom-op kernels?
  * Yes, but we currently propose migrating only the Python composite op. This may
    change with discussion on the RFC.
* Are they CPU/GPU/TPU compatible?
  * CPU/GPU compatible. No support for TPU.
* What is the pytest coverage of the addon?
  * `tensorflow_addons/activations/gelu.py 89%`
## Changes to Implementation (If Needed)
```python
import math

import tensorflow as tf
from tensorflow_addons.utils import types


def gelu(x: types.TensorLike, approximate: bool = True) -> tf.Tensor:
    x = tf.convert_to_tensor(x)
    if approximate:
        # tanh-based approximation from the GELU paper.
        pi = tf.cast(math.pi, x.dtype)
        coeff = tf.cast(0.044715, x.dtype)
        return 0.5 * x * (1.0 + tf.tanh(tf.sqrt(2.0 / pi) * (x + coeff * tf.pow(x, 3))))
    else:
        # Exact form in terms of the Gaussian error function.
        return 0.5 * x * (1.0 + tf.math.erf(x / tf.cast(tf.sqrt(2.0), x.dtype)))
```
The above implementation would only bring over the Python composite op. Since there is
[no way for us to build tfxla kernels](https://github.com/tensorflow/tensorflow/pull/33945#issuecomment-617842977),
we had no TPU support in Addons. [There were comments](https://github.com/tensorflow/tensorflow/pull/33945#issuecomment-625380208)
about using a "selector", but we would need guidance on how to implement that.
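Purely as an illustration of what such a selector might look like (not a concrete proposal), the sketch below dispatches to a fused kernel when one is registered and otherwise falls back to the composite `gelu` defined above; `_fused_gelu` is a hypothetical stand-in:

```python
import tensorflow as tf


def _fused_gelu(x: tf.Tensor, approximate: bool = True) -> tf.Tensor:
    # Hypothetical stand-in for a registered custom kernel; raises where no
    # fused implementation exists for the current device.
    raise NotImplementedError("no fused gelu kernel on this device")


def gelu_with_selector(x: tf.Tensor, approximate: bool = True) -> tf.Tensor:
    try:
        return _fused_gelu(x, approximate=approximate)  # fast path (CPU/GPU)
    except (NotImplementedError, tf.errors.UnimplementedError):
        # The composite fallback lowers to standard TF ops, so it also
        # runs under XLA/TPU.
        return gelu(x, approximate=approximate)
```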
We may also want to discuss the `approximate` bool and whether it should be included in the
upstream version.
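As a reference point for that discussion, a minimal sketch (using the `gelu` definition above) of how the two modes compare numerically:

```python
import tensorflow as tf

x = tf.constant([-2.0, -1.0, 0.0, 1.0, 2.0])
print(gelu(x, approximate=True))   # tanh-based approximation
print(gelu(x, approximate=False))  # exact erf-based form
# Over this range the two agree to within roughly 1e-3.
```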
## Transition Plan
* The activation would land in [nn_ops.py](https://github.com/tensorflow/tensorflow/blob/r2.2/tensorflow/python/ops/nn_ops.py), [keras activations](https://github.com/tensorflow/tensorflow/blob/r2.2/tensorflow/python/keras/activations.py),
  and possibly in [keras advanced_activation layers](https://github.com/tensorflow/tensorflow/blob/r2.2/tensorflow/python/keras/layers/advanced_activations.py)
* No planned changes to the parameter signatures at this time
* Addons would deprecate our activation and make a call to the core functionality (see the sketch after this list)
* After merging to TF Core:
  * Consolidate/remove https://github.com/tensorflow/models/blob/r2.2.0/official/modeling/activations/gelu.py
  * Consolidate/remove https://github.com/tensorflow/models/blob/r2.2.0/official/modeling/activations/gelu_test.py
  * Consolidate/remove https://github.com/tensorflow/models/blob/r2.2.0/official/nlp/xlnet/xlnet_modeling.py#L29
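A minimal sketch of the Addons-side deprecation shim, assuming the core symbol lands as `tf.nn.gelu` with an unchanged signature (the final name and deprecation mechanism are still open):

```python
import warnings

import tensorflow as tf


def gelu(x: tf.Tensor, approximate: bool = True) -> tf.Tensor:
    # Thin compatibility wrapper kept in tensorflow_addons; delegates to the
    # (assumed) upstream implementation.
    warnings.warn(
        "tfa.activations.gelu is deprecated; use tf.nn.gelu instead.",
        DeprecationWarning,
    )
    return tf.nn.gelu(x, approximate=approximate)
```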
## Relevant GitHub Issues
* https://github.com/tensorflow/tensorflow/pull/33945
* https://github.com/tensorflow/addons/issues/550
* https://github.com/tensorflow/tensorflow/issues/32783

> **Review comment:** Could you add keras-team/keras#11834?
## Questions and Discussion Topics
* Who from the TF core team would sponsor this migration and take ownership of the API?
* Is it worth bringing over the custom-op kernels for CPU/GPU?

## Final Decision
TBD
> Does this mean only the Python implementation will be migrated to TF? I ask because I saw the topic below:
>
> > Is it worth bringing over the custom-op kernels for CPU/GPU?
>
> We really hope to add a core kernel for CPU; it would easily benefit from fusion, MKL-DNN, and other optimizations.
> Adding to the discussion Zantares (Teng Lu) raises above: the need for core C++ kernels, over and above the Python implementation, was also requested in the PR by penporn here:
> tensorflow/tensorflow#33945 (comment)