Add CodeGen model #17443
Thanks a lot for adding this model @rooa! The code is looking good already. Left some comments. Specifically:
- We should add as many `# Copied from ...` statements as possible.
- We should remove the manual parallelization logic; more details in the comment below.
Let me know if you have any other questions. I will look into tests after these changes :)
Thanks a lot for addressing the comments, the PR looks almost ready. Just left a comment about `tie_word_embeddings`.
Thanks for adding this new model!
Good for me with @patil-suraj's comments.
Looks good to me in general. Would be nice if we could give `self.bias` a better name. I think it'd make reading the code much easier.
]:

qkv = self.qkv_proj(hidden_states)
# TODO(enijkamp): factor out number of logical TPU-v4 cores or make forward pass agnostic
(out of curiosity) what does the comment mean here?
@patil-suraj why resolve here?
Thanks a lot @rooa ! This looks good for merge now, once @patrickvonplaten's comment is addressed.
Some things to clean up before merging:
- Some tests are failing.
- We should add a `truncate_before_pattern` function arg that takes a list of patterns before which we truncate. I think it's important to stay flexible here.
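To make the `truncate_before_pattern` suggestion concrete, here is a minimal standalone sketch of the idea: given a list of regex patterns, cut a generated completion just before the earliest match of any of them. The helper name `truncate_before_patterns`, its signature, and the sample patterns are assumptions for illustration only; this is not the code that was added to the tokenizer in this PR.

```python
import re
from typing import List, Optional


def truncate_before_patterns(completion: str, patterns: Optional[List[str]] = None) -> str:
    """Cut `completion` just before the earliest match of any regex in `patterns`.

    Hypothetical helper: the name and signature are illustrative assumptions.
    """
    if not patterns:
        # No patterns given: return the text unchanged.
        return completion

    cut = len(completion)
    for pattern in patterns:
        # MULTILINE lets "^" anchor at the start of every line, not only the string.
        match = re.search(pattern, completion, flags=re.MULTILINE)
        if match is not None:
            cut = min(cut, match.start())
    return completion[:cut]


sample = "def add(a, b):\n    return a + b\n\n\n\nprint(add(1, 2))\n"
truncated = truncate_before_patterns(sample, patterns=[r"\n\n\n", r"^print"])
```

Taking the minimum over all match positions keeps the behaviour independent of the order in which patterns are listed, which is one way to stay flexible as the comment asks.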
return pairs


class CodeGenTokenizer(PreTrainedTokenizer):
Think we should add some more `# Copied from ...` statements here.
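For readers unfamiliar with the convention, a `# Copied from ...` statement is a comment placed directly above a function or class that names the module path of the original it duplicates, so that repository tooling can flag copies that drift. The sketch below shows what such a marker could look like on a BPE pair helper like the one ending in `return pairs` above. The referenced path is where GPT-2's helper is assumed to live, and the function body is written from memory of the usual BPE helper, so it may not match the upstream text exactly.

```python
# Copied from transformers.models.gpt2.tokenization_gpt2.get_pairs
def get_pairs(word):
    """Return the set of adjacent symbol pairs in `word`.

    `word` is a non-empty tuple of symbols (variable-length strings).
    """
    pairs = set()
    prev_char = word[0]
    for char in word[1:]:
        pairs.add((prev_char, char))
        prev_char = char
    return pairs


example_pairs = get_pairs(("l", "o", "w"))
```

The marker only makes sense when the copy is meant to stay identical to the original, which is why a class that adds an extra method would not carry one at class level.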
@patil-suraj why resolve here if there hasn't been an answer or change?
My bad. Here we add an extra method `truncate` in the tokenizer, so I didn't add the `# Copied from ...` statement.
Hey @patil-suraj @rooa, you should fetch upstream on your fork. There were some test fixes that I think you are missing, which is causing the red Xs that no one likes to see. I would actually love to use this, but I can't because this PR is not merged yet!
Good to merge for me!
Co-authored-by: Patrick von Platen <[email protected]>
Merging now! Thanks a lot @rooa for working on this and being patient with the review and tests.
What does this PR do?
Adds CodeGen PyTorch model.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@lvwerra @patil-suraj @loubnabnl