lorenzonoci

Adds the depth-alpha parameterization. This does not include the adjustments for Adam eps or weight decay; in any case, weight decay is not affected by the depth-alpha parameterization.

@ndey96 ndey96 deleted the depth-alpha-branch branch May 8, 2025 17:54
klei22 pushed a commit to klei22/nanoGPT that referenced this pull request Sep 23, 2025
Add time estimate for experiments

3 participants