
Conversation

@tfaod (Contributor) commented on Aug 23, 2025

New Submission

Submission Information

Please fill out the following information about your submission within the quotation marks.

submission_name: "ademamix"
submission_folder: "submissions/self_tuning/ademamix"  
submission authors:
  * authors: "Alice Yang"  # List authors separated by commas
  * affiliations: "Meta Superintelligence Labs, FAIR Team "
algorithm authors:
  * authors: "Matteo Pagliardini, Pierre Ablin, David Grangier"
  * affiliations: "EPFL, Apple"
version: "1.0"  # Optional version number of your submission
ruleset: "self-tuning"
framework: "PyTorch"
description: "AdEMAMix optimizer with optimal hparams"

Evidence for the Submission's Performance

  • See results from the AdEMAMix sweeps below, compared to the baseline sfadamw_v2 and nadamw submissions

Sweep Details

  • AdEMAMix Sweep 1 - compare runs with vs. without warmup for beta3
    • runs without beta3 warmup struggled to hit more than half of the targets, while runs with beta3 warmup consistently hit most or all targets
  • AdEMAMix Sweep 2 - sweep over lr, wd, and alpha (an illustrative sketch of the Sweep 2/3 search spaces follows this list)
    • sweep range:
      • wd: [0, 0.1]
      • lr: [1e-4, 5e-3]
      • alpha: [8, 10]
    • fixed values (AdEMAMix paper defaults):
      • beta1: 0.9
      • beta2: 0.999
      • beta3: 0.9999
      • alpha_warmup: 500000
      • beta_warmup: 500000
  • AdEMAMix Sweep 3 - sweep across betas for the top wd, lr, and alpha values
    • sweep range:
      • beta1: [0.8, 0.99]
      • beta2: [0.95, 0.999]
      • beta3: [0.99, 0.9999]
    • fixed values:
      • wd: 0.1
      • lr: {2e-3, 5e-3}
      • alpha: 8
      • alpha_warmup: 500000
      • beta_warmup: 500000
  • Final Top Values:
    • (incl criteo run on 32gb): all_ademamix_w_sched_alpha_sweep_over_betas_with_criteo_on_32gb_betas1-0.8_2-0.995_30.9995_lr0.002_wd0.1_alpha8
    • (excl criteo): all_ademamix_w_sched_alpha_sweep_over_betas_betas1-0.95_2-0.99_30.9999_lr0.002_wd0.1_alpha8
[Attached images: sweep result plots]
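For readers who want the search spaces above in a machine-readable form, here is a minimal sketch written as plain Python dicts. The dict layout, key names, and the log-scale assumption for the learning rate are illustrative guesses, not the actual tuning config used for these runs.

```python
# Illustrative only: Sweep 2 and Sweep 3 search spaces as plain Python dicts.
# Key names, the log-scale choice for lr, and whether the beta ranges were
# continuous intervals or grid endpoints are assumptions, not the real config.
sweep2 = {
    "search": {
        "weight_decay": [0.0, 0.1],
        "learning_rate": {"min": 1e-4, "max": 5e-3, "scale": "log"},
        "alpha": [8, 10],
    },
    "fixed": {
        "beta1": 0.9, "beta2": 0.999, "beta3": 0.9999,
        "alpha_warmup": 500_000, "beta_warmup": 500_000,
    },
}

sweep3 = {
    "search": {
        "beta1": [0.8, 0.99],
        "beta2": [0.95, 0.999],
        "beta3": [0.99, 0.9999],
    },
    "fixed": {
        "weight_decay": 0.1,
        "learning_rate": [2e-3, 5e-3],  # two fixed candidate values
        "alpha": 8,
        "alpha_warmup": 500_000, "beta_warmup": 500_000,
    },
}
```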

Comments

  • AdEMAMix requires more memory, due to its addition of a third momentum sequence (see the illustrative sketch after this list).
  • The optimizer runs out of memory on the criteo1tb workload with the preset batch size (~262k).
  • We tried the following three remediations:
    • Strategy 1) Sweeping over the batch size revealed that the criteo1tb workload began to hit the target once the batch size was decreased to ~60k.
      • Despite reaching the target on all workloads, the runtime increased so much that the algorithm was no longer competitive.
    • Strategy 2) We implemented and swept across four "memory-safe" variants of the AdEMAMix algorithm.
      • While these variants were able to hit the target on criteo1tb, they were significantly slower on the remaining workloads.
    • Strategy 3) We doubled the available memory by running the criteo1tb workload on 8x 32GB A100 GPUs.
      • We combined these results with those from the remaining workloads, which were run on the standard 16GB GPUs.
      • This configuration significantly outperformed the competitive self-tuning nadamw baseline.
  • We find a significant tradeoff between memory consumption and speed in the AdEMAMix algorithm. We will look into future modifications of AdEMAMix that preserve its competitive speed while reducing memory usage.
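To make the memory point concrete, below is a minimal PyTorch-style sketch of a single AdEMAMix update in the spirit of Pagliardini et al. The function name, state-dict layout, and the linear warmup ramps are simplifying assumptions, not this submission's implementation (the paper's beta3 scheduler in particular is more involved). What it illustrates is that AdEMAMix keeps three EMA state tensors per parameter (fast first moment, slow first moment, second moment) versus Adam's two, i.e. roughly a 50% increase in optimizer state.

```python
import torch

def ademamix_step(param, grad, state, step, *, lr=2e-3,
                  betas=(0.9, 0.999, 0.9999), alpha=8.0,
                  weight_decay=0.1, eps=1e-8,
                  alpha_warmup=500_000, beta_warmup=500_000):
    """Single-tensor AdEMAMix update (illustrative sketch; step starts at 1)."""
    beta1, beta2, beta3_final = betas

    # Placeholder warmup schedules: simple linear ramps for alpha and beta3.
    # (The AdEMAMix paper warms beta3 up so that the EMA half-life grows
    # roughly linearly; this sketch does not reproduce that schedule exactly.)
    alpha_t = alpha * min(1.0, step / alpha_warmup)
    beta3_t = beta1 + (beta3_final - beta1) * min(1.0, step / beta_warmup)

    # Three per-parameter state tensors (Adam keeps only the first and third),
    # hence roughly 1.5x the optimizer memory of Adam/NAdamW.
    exp_avg      = state.setdefault("exp_avg", torch.zeros_like(param))       # fast EMA, beta1
    exp_avg_slow = state.setdefault("exp_avg_slow", torch.zeros_like(param))  # slow EMA, beta3
    exp_avg_sq   = state.setdefault("exp_avg_sq", torch.zeros_like(param))    # 2nd moment, beta2

    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    exp_avg_slow.mul_(beta3_t).add_(grad, alpha=1 - beta3_t)
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

    # Bias-correct the fast EMA and the second moment only
    # (the slow EMA is not bias-corrected in the paper).
    bias1 = 1 - beta1 ** step
    bias2 = 1 - beta2 ** step
    denom = (exp_avg_sq / bias2).sqrt_().add_(eps)
    numer = exp_avg / bias1 + alpha_t * exp_avg_slow

    # Decoupled (AdamW-style) weight decay.
    param.add_(numer / denom + weight_decay * param, alpha=-lr)


# Usage sketch on a toy tensor:
p, g, opt_state = torch.zeros(4), torch.ones(4), {}
for t in range(1, 4):
    ademamix_step(p, g, opt_state, t)
```

On workloads dominated by very large embedding tables, the extra exp_avg_slow buffer is a substantial absolute cost, which is consistent with the out-of-memory behaviour on criteo1tb described above.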

@tfaod requested a review from a team as a code owner on August 23, 2025 20:00

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@tfaod (Contributor, Author) commented on Aug 23, 2025

@priyakasimbeg @fsschneider The AdEMAMix submission, as requested. The optimal hparams include ogbg.
