@tianyu-l tianyu-l commented Feb 22, 2024

Stack from ghstack (oldest at bottom):

Previously, the alpaca dataset was exhausted after only ~50 iterations with 8 data-parallel ranks and a batch size of 8. This PR adds the (default) option to loop over the dataset indefinitely, so that we can unblock integrating other functionality. Note that loss-related metrics should be read with caution, as repeating the data will cause overfitting.
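
For illustration, the looping behavior could look roughly like the sketch below. This is not the PR's actual code: `LoopingTokenDataset`, its constructor arguments, and the use of plain Python lists instead of tensors are assumptions. Only the `infinite` flag and the `all_tokens` / `max_buffer_token_len` names come from the diff excerpts in this thread.

```python
from typing import Iterable, Iterator, List, Tuple


class LoopingTokenDataset:
    """Hypothetical stand-in for the dataset touched by this PR (names assumed)."""

    def __init__(
        self, samples: Iterable[List[int]], seq_len: int, infinite: bool = True
    ) -> None:
        self.samples = list(samples)
        if not self.samples:
            raise ValueError("samples must not be empty")
        self.seq_len = seq_len
        self.infinite = infinite

    def __iter__(self) -> Iterator[Tuple[List[int], List[int]]]:
        # One extra token so that input and label can be shifted by one position.
        max_buffer_token_len = 1 + self.seq_len
        all_tokens: List[int] = []
        while True:
            for sample in self.samples:
                all_tokens.extend(sample)
                while len(all_tokens) >= max_buffer_token_len:
                    x = all_tokens[:max_buffer_token_len]
                    all_tokens = all_tokens[max_buffer_token_len:]
                    yield x[:-1], x[1:]  # (input, label)
            # A single pass is enough unless looping was requested.
            if not self.infinite:
                break
```

With `infinite=False` the iterator ends after one pass over the samples. With `infinite=True` it restarts from the first sample, which is why the same sequences recur and loss numbers become optimistic.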

Update: moved to #92 because migrating to pytorch/ confused ghstack.

tianyu-l added a commit that referenced this pull request Feb 22, 2024
ghstack-source-id: e9fa7fd
Pull Request resolved: #66
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Feb 22, 2024

@XilunWu XilunWu left a comment

One question on the stop condition; otherwise LGTM.

Comment on lines +77 to +78
if not self.infinite:
    break

Should we add some mechanism to allow a stop? self.infinite is constant after initialization.

I think cmd + c should be sufficient?
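
One possible programmatic alternative to interrupting the process, sketched under assumptions: keep the dataset infinite and let the consumer bound the number of steps. `endless_batches`, `train`, and `training_steps` are hypothetical names and are not part of this PR.

```python
import itertools
from typing import Iterator, List


def endless_batches() -> Iterator[int]:
    """Hypothetical infinite data source; yields a running batch index."""
    step = 0
    while True:
        yield step
        step += 1


def train(data_iterator: Iterator[int], training_steps: int) -> List[int]:
    """Consume at most `training_steps` batches, then stop cleanly."""
    seen: List[int] = []
    # islice ends the loop after training_steps items, even though
    # the underlying iterator would never raise StopIteration itself.
    for batch in itertools.islice(data_iterator, training_steps):
        seen.append(batch)
    return seen
```

Bounding the loop on the consumer side keeps the dataset flag immutable, as it is in the diff, while still giving the run a well-defined end.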

@wanchaol wanchaol left a comment

sgtm!

yield input, label
while len(all_tokens) >= max_buffer_token_len:
    x = torch.LongTensor(all_tokens[:max_buffer_token_len])
    # batched_x = x.reshape(self.batch_size, -1)

nit: can we delete the stale comment?

@tianyu-l tianyu-l closed this Feb 27, 2024
tianyu-l added a commit that referenced this pull request Aug 16, 2024
ghstack-source-id: e9fa7fd
Pull Request resolved: #66