Add max-prefill-length argument in distillation dataset generation script #1748
Thank you, Surbhi!
A question out of curiosity: which model are you experimenting with for this SFT, and roughly what maximum prefill/max length is supported on the chip you are using?
I'm asking because of the multimodal work: even the shortest prompt, "Describe image <start_of_image>", requires 272 tokens of prefill length. Mentioning this so the group is aware. Thanks!
I am using the base configuration values for both: https://github.com/AI-Hypercomputer/maxtext/blob/main/MaxText/configs/base.yml#L470
Thanks, Surbhi. Added a suggestion.
The branch was updated from 7a139b4 to 62e98fa, then to f7f18c7, then to f9a7698.
Description
This PR introduces a max-prefill-length argument to the script that generates the dataset for distillation. The argument is used to filter out prompt sequences longer than max-prefill-length before running inference.
Notice 1: Once all tests pass, the "pull ready" label will automatically be assigned. This label is used for administrative purposes. Please do not add it manually.
Notice 2: For external contributions, our settings currently require an approval from a MaxText maintainer to trigger CI tests.
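The filtering behavior described above can be illustrated with a minimal Python sketch. It is not the script's actual code: the helper name filter_by_prefill_length and the whitespace-based toy_tokenize stand-in are hypothetical, and a real run would use the model's tokenizer. Prompts whose token count exceeds the limit are dropped. Prompts exactly at the limit are kept, matching "longer than max-prefill-length".

```python
from typing import Callable, Iterable, List, Sequence


def filter_by_prefill_length(
    prompts: Iterable[str],
    tokenize: Callable[[str], Sequence[int]],
    max_prefill_length: int,
) -> List[str]:
    """Keep only prompts whose tokenized length fits within max_prefill_length.

    Hypothetical helper for illustration; the real script's names and
    tokenizer may differ.
    """
    kept = []
    for prompt in prompts:
        # Drop prompts that would not fit in the prefill budget, so they
        # never reach the inference step.
        if len(tokenize(prompt)) <= max_prefill_length:
            kept.append(prompt)
    return kept


def toy_tokenize(text: str) -> List[int]:
    # Stand-in tokenizer (assumption): one placeholder token id per
    # whitespace-separated word. A real setup would use the model's tokenizer.
    return list(range(len(text.split())))


if __name__ == "__main__":
    prompts = ["short prompt", "a much longer prompt that exceeds the budget"]
    print(filter_by_prefill_length(prompts, toy_tokenize, max_prefill_length=4))
```

Run as a script, this keeps only "short prompt" (2 tokens). The second prompt (8 tokens) is filtered out. Filtering before inference avoids wasting compute on prompts that would not fit in the prefill length.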
Tests
Checklist
Before submitting this PR, please make sure (put X in square brackets):