Open
Description
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Hi, I'm currently on the latest master commit bcbddcd. I traced a simple generation of a few tokens with rocprofv3 and noticed that the activations are quantized before each linear layer, with no reuse. Specifically, the activations are quantized separately before the query, key, and value projections, and again before the up and gate projections. It should be enough to quantize once for query/key/value and once for up/gate, since the projections in each group read the same input.
I've attached the trace, which shows the issue visually.
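To make the idea concrete, here is a minimal Python sketch of the difference between quantizing before every projection and quantizing once per shared input. Everything in it is hypothetical and for illustration only: `quantize`, `linear_q`, `project_naive`, and `project_reuse` are made-up names, the quantizer is a toy symmetric int8 scheme, and the small integer matrices just stand in for quantized weights. None of this is llama.cpp's actual kernel code or data layout.

```python
# Toy illustration only: a hypothetical symmetric int8 quantizer and integer
# matmul, not llama.cpp's actual quantization kernels or data layout.

quantize_calls = 0  # counts how many times activations get quantized


def quantize(x):
    """Quantize a list of floats to int8-range values plus one scale factor."""
    global quantize_calls
    quantize_calls += 1
    amax = max(abs(v) for v in x)
    scale = amax / 127.0 if amax > 0 else 1.0
    return [round(v / scale) for v in x], scale


def linear_q(qx, scale, weight):
    """Multiply already-quantized activations by an integer weight matrix."""
    return [scale * sum(w * q for w, q in zip(row, qx)) for row in weight]


def project_naive(x, weights):
    """Quantize the same input again before every projection."""
    return [linear_q(*quantize(x), w) for w in weights]


def project_reuse(x, weights):
    """Quantize the shared input once and reuse it for every projection."""
    qx, scale = quantize(x)
    return [linear_q(qx, scale, w) for w in weights]


# Made-up inputs: one shared by Q/K/V, one shared by up/gate.
x_attn = [0.5, -1.25, 2.0, 0.75]
x_ffn = [1.5, 0.25, -0.5, -2.0]

# Made-up stand-ins for the five projection weight matrices.
w_q = [[1, 0, -1, 2], [0, 3, 1, -1]]
w_k = [[2, 1, 0, -1], [1, -2, 1, 0]]
w_v = [[0, 1, 1, 1], [-1, 0, 2, 1]]
w_up = [[1, 1, 0, 0], [0, -1, 2, 1]]
w_gate = [[2, 0, 1, -1], [1, 1, -1, 0]]

quantize_calls = 0
naive = project_naive(x_attn, [w_q, w_k, w_v]) + project_naive(x_ffn, [w_up, w_gate])
naive_calls = quantize_calls

quantize_calls = 0
reused = project_reuse(x_attn, [w_q, w_k, w_v]) + project_reuse(x_ffn, [w_up, w_gate])
reuse_calls = quantize_calls

print(naive_calls, reuse_calls, naive == reused)  # prints: 5 2 True
```

In this toy setup the results are identical, and the number of quantization passes drops from five to two. Whether the same saving carries over to the real backend depends on how the graph and kernels are structured, which this sketch doesn't model.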

Motivation
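Faster performance: quantizing once per shared input would cut the five activation quantizations described above down to two.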
Possible Implementation
No response