Update nn.py #21250
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##           master   #21250      +/-   ##
==========================================
- Coverage   82.59%   82.56%   -0.04%
==========================================
  Files         564      564
  Lines       54556    54580      +24
  Branches     8479     8486       +7
==========================================
+ Hits        45062    45065       +3
- Misses       7405     7426      +21
  Partials     2089     2089
Thanks for the PR!
The gating logic is a little confusing to me; I left some comments. Thanks!
keras/src/backend/jax/nn.py (outdated)
)
is_tpu = jax.devices()[0].platform == "tpu"

# Determine flash attention compatibility
I am very confused by the logic here.
- Why is FA disabled if inputs are sharded?
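For context, a minimal sketch of what an "inputs are sharded" check could look like on the JAX backend. The helper name `_inputs_sharded` is hypothetical (not taken from the diff), and it assumes "sharded" means the array's data spans more than one device:

```python
# Hypothetical helper (not from the PR): one way to decide whether the
# attention inputs are sharded across devices on the JAX backend.
import jax.numpy as jnp


def _inputs_sharded(*arrays):
    """Return True if any input's data spans more than one device."""
    for x in arrays:
        sharding = getattr(x, "sharding", None)
        if sharding is not None and len(sharding.device_set) > 1:
            return True
    return False


# Example: an array on a single device is not considered sharded.
q = jnp.ones((2, 128, 8, 64))
print(_inputs_sharded(q))  # False on a single-device setup
```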
keras/src/backend/jax/nn.py (outdated)
    flash_attention = (
        not inputs_sharded or is_tpu
    ) and _can_use_flash_attention(query, key, value, bias)
elif flash_attention and inputs_sharded and not is_tpu:
This condition is weird.
If FA is enabled, the inputs are sharded, and we are not running on TPU, you disable FA? Why? Can you please explain?
Following this, you check whether we are running on TPU and FA is enabled; this will never be true if the inputs are sharded, so what's the point?
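To make the question concrete, here is a self-contained paraphrase of the gating as it reads from the excerpts. Only the two conditions come from the diff; the enclosing `flash_attention is None` auto-detect branch and the wrapper function are assumptions for illustration:

```python
# Paraphrase sketch of the gating shown in the excerpts above; not the
# actual Keras code. `can_use_fa` stands in for the result of
# `_can_use_flash_attention(query, key, value, bias)`.
import jax


def _resolve_flash_attention(flash_attention, inputs_sharded, can_use_fa):
    is_tpu = jax.devices()[0].platform == "tpu"
    if flash_attention is None:
        # Auto mode: FA is only considered when inputs are unsharded or on TPU.
        flash_attention = (not inputs_sharded or is_tpu) and can_use_fa
    elif flash_attention and inputs_sharded and not is_tpu:
        # An explicit FA request is dropped for sharded, non-TPU inputs;
        # this is the branch the review comment is asking about.
        flash_attention = False
    return flash_attention
```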
keras/src/backend/jax/nn.py (outdated)

# `dot_product_attention` is only available in jax>=0.4.31
# Process mask for Splash Attention
custom_mask = None
Let's verify that numerics remain consistent with this updated mask-handling code.
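A minimal sketch of the kind of numerics check being asked for, assuming `jax.nn.dot_product_attention` (jax>=0.4.31, as the diff notes) and a GPU that supports the cuDNN flash path; the Splash Attention mask path itself is TPU-specific, so this only illustrates the comparison approach. Shapes, dtype, and tolerances are illustrative:

```python
# Sketch only: compare the flash (cuDNN) path of jax.nn.dot_product_attention
# against its reference XLA path. Requires jax>=0.4.31 and, for the cuDNN
# path, a supported GPU.
import jax
import jax.numpy as jnp
import numpy as np

rng = jax.random.PRNGKey(0)
kq, kk, kv = jax.random.split(rng, 3)
q = jax.random.normal(kq, (2, 128, 8, 64), dtype=jnp.float16)
k = jax.random.normal(kk, (2, 128, 8, 64), dtype=jnp.float16)
v = jax.random.normal(kv, (2, 128, 8, 64), dtype=jnp.float16)

ref = jax.nn.dot_product_attention(q, k, v, is_causal=True, implementation="xla")
fa = jax.nn.dot_product_attention(q, k, v, is_causal=True, implementation="cudnn")

np.testing.assert_allclose(np.asarray(ref), np.asarray(fa), rtol=2e-2, atol=2e-2)
print("flash and XLA attention outputs match within tolerance")
```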
#21254
Added support for flash attention with sharding and fixed an issue when using flash attention on TPU.
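For completeness, a hedged usage sketch of the behavior this work touches from the Keras side, assuming the `flash_attention` argument exposed by recent Keras 3 releases of `keras.ops.dot_product_attention`; whether the flash kernel is actually used depends on the hardware and on backend gating like the logic discussed above, and unsupported setups may raise an error:

```python
# Hedged usage sketch (not from the PR): request flash attention through
# the Keras op on the JAX backend. Shapes and dtype are illustrative, and
# flash_attention=True may raise on hardware where flash attention is
# unsupported.
import os

os.environ["KERAS_BACKEND"] = "jax"  # must be set before importing keras

import numpy as np
import keras

q = np.random.normal(size=(2, 128, 8, 64)).astype("float16")
k = np.random.normal(size=(2, 128, 8, 64)).astype("float16")
v = np.random.normal(size=(2, 128, 8, 64)).astype("float16")

out = keras.ops.dot_product_attention(q, k, v, is_causal=True, flash_attention=True)
print(out.shape)  # (2, 128, 8, 64)
```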