Feature Request: Repeated Unnecessary Activation Quantization Ops #15602

@0seba

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Hi, I'm currently on the latest master commit bcbddcd. I traced a simple generation of a few tokens with rocprofv3 and noticed that the activations are quantized before each linear layer, without reuse. Specifically, the activations are quantized separately before the query, key, and value projections, and again before the up and gate projections. Since the query/key/value projections share the same input, as do the up and gate projections, it is enough to quantize once for query/key/value and once for up/gate.
I have attached the trace, which shows the issue visually.

Image
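
To make the request concrete, here is a minimal Python sketch of the idea, not llama.cpp code. It assumes a simplified block-wise absmax int8 quantization and keeps the weights in float for brevity; `quantize_q8`, `matvec_q8`, `qkv_naive`, and `qkv_reuse` are hypothetical names chosen for illustration. It only shows that quantizing the shared input once gives the same Q/K/V results with one quantization call instead of three.

```python
# Hypothetical sketch: the names and the quantization scheme are illustrative
# assumptions, not llama.cpp's actual kernels.

quantize_calls = 0  # counts how many times the activations get quantized


def quantize_q8(x, block=4):
    """Block-wise absmax int8 quantization: returns (ints, per-block scales)."""
    global quantize_calls
    quantize_calls += 1
    q, scales = [], []
    for i in range(0, len(x), block):
        chunk = x[i:i + block]
        amax = max(abs(v) for v in chunk)
        scale = amax / 127.0 if amax > 0 else 1.0
        scales.append(scale)
        q.extend(round(v / scale) for v in chunk)
    return q, scales


def matvec_q8(weight_rows, q, scales, block=4):
    """Multiply float weight rows by the quantized activations."""
    out = []
    for row in weight_rows:
        acc = 0.0
        for i, (w, qi) in enumerate(zip(row, q)):
            acc += w * qi * scales[i // block]
        out.append(acc)
    return out


def qkv_naive(x, wq, wk, wv):
    # Behaviour described in the issue: quantize before every projection.
    return tuple(matvec_q8(w, *quantize_q8(x)) for w in (wq, wk, wv))


def qkv_reuse(x, wq, wk, wv):
    # Requested behaviour: quantize once and share the result across Q/K/V.
    q, scales = quantize_q8(x)
    return tuple(matvec_q8(w, q, scales) for w in (wq, wk, wv))


# Small demo with made-up activations and weights.
x = [0.5, -1.0, 2.0, 0.25, 3.0, -0.5, 1.5, 0.0]
wq = [[1.0] * 8, [0.5] * 8]
wk = [[-1.0] * 8, [0.25] * 8]
wv = [[2.0] * 8, [0.0] * 8]

quantize_calls = 0
naive = qkv_naive(x, wq, wk, wv)
naive_calls = quantize_calls

quantize_calls = 0
reused = qkv_reuse(x, wq, wk, wv)
reuse_calls = quantize_calls

print(naive_calls, reuse_calls)  # 3 1
print(naive == reused)           # True
```

The same pattern would apply to the up and gate projections, which also share one input.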

Motivation

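Faster performance: skipping the redundant quantization ops would remove repeated work from every layer during generation.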

Possible Implementation

No response
