Unified Attention Accuracy Bugfixes #393
Merged
I've noticed two accuracy issues in unified attention, both in the `unified_execute_model` method.

The first one had a major impact: I suspect we were malforming batches as the generation process went on, since `self.input_batch.num_tokens` and `req_state.output_token_ids` were not updated correctly. In Granite GSM8K runs, fixing that yielded a +10 percentage point improvement.
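To make the bookkeeping concrete, here is a minimal sketch of the per-step update that has to happen after sampling. The `InputBatch`/`RequestState` classes and the helper below are simplified stand-ins for illustration, not the actual runner code; only the two fields named above are modeled.

```python
# Hypothetical, simplified sketch of per-step batch bookkeeping after sampling.
# InputBatch / RequestState are stand-ins; only num_tokens and output_token_ids
# (the fields mentioned in the description) are modeled here.
from dataclasses import dataclass, field


@dataclass
class RequestState:
    prompt_token_ids: list[int]
    output_token_ids: list[int] = field(default_factory=list)


@dataclass
class InputBatch:
    req_ids: list[str]
    # Total tokens (prompt + generated) currently tracked per request.
    num_tokens: dict[str, int] = field(default_factory=dict)


def update_batch_after_sampling(
    input_batch: InputBatch,
    requests: dict[str, RequestState],
    sampled_token_ids: dict[str, int],
) -> None:
    """Keep per-request state in sync with what was just generated.

    If num_tokens / output_token_ids are not advanced here, the next step's
    batch is built from stale lengths, i.e. the malformed batches described
    above.
    """
    for req_id in input_batch.req_ids:
        token_id = sampled_token_ids[req_id]
        req_state = requests[req_id]
        req_state.output_token_ids.append(token_id)
        input_batch.num_tokens[req_id] = (
            len(req_state.prompt_token_ids) + len(req_state.output_token_ids)
        )
```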
The second one had a negligible impact (I didn't notice any accuracy improvement in the tests I ran), but we should be masking anything above the context length regardless.
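As a rough illustration of that second fix, here is a generic way to mask out attention positions beyond each sequence's context length. The tensor shapes and function name are assumptions for the sketch, not the actual unified-attention kernel code.

```python
# Illustrative only: build a mask that blocks KV positions beyond each
# sequence's context length. Shapes and names are assumptions.
import torch


def mask_beyond_context(
    attn_scores: torch.Tensor,   # [batch, num_heads, q_len, kv_len]
    context_lens: torch.Tensor,  # [batch], number of valid KV positions
) -> torch.Tensor:
    kv_len = attn_scores.shape[-1]
    positions = torch.arange(kv_len, device=attn_scores.device)
    # True where the KV position is past the request's context length.
    invalid = positions[None, :] >= context_lens[:, None]   # [batch, kv_len]
    invalid = invalid[:, None, None, :]                     # broadcast to scores
    return attn_scores.masked_fill(invalid, float("-inf"))
```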
I've also added a GSM8K accuracy test to CI with this PR, which should now pass.
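For reference, a GSM8K accuracy gate of this kind is typically driven through lm-evaluation-harness; the model name, few-shot count, and pass threshold below are placeholders and do not reflect the actual CI job added in this PR.

```python
# Hypothetical sketch of a GSM8K accuracy gate using lm-evaluation-harness.
# Model, few-shot count, and threshold are placeholders, not the CI values.
from lm_eval import simple_evaluate

EXPECTED_ACCURACY = 0.60  # placeholder threshold

results = simple_evaluate(
    model="vllm",
    model_args="pretrained=ibm-granite/granite-3.0-8b-instruct",
    tasks=["gsm8k"],
    num_fewshot=5,
)
accuracy = results["results"]["gsm8k"]["exact_match,strict-match"]
assert accuracy >= EXPECTED_ACCURACY, f"GSM8K accuracy regressed: {accuracy:.3f}"
```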