
Conversation


@tohskai tohskai commented Sep 21, 2025


meta-cla bot commented Sep 21, 2025

Hi @tohskai!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!


meta-cla bot commented Sep 21, 2025

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@meta-cla meta-cla bot added the CLA Signed label Sep 21, 2025
Contributor

@wwwjn wwwjn left a comment


I read the blog and the memory budget idea is cool. Have you tried out the implementation on some model (e.g., llama3) with torch.compile? I'm curious whether it works end to end and whether the performance is better.

@tohskai
Author

tohskai commented Sep 22, 2025

I read the blog and the memory budget idea is cool. Have you tried out the implementation on some model (e.g., llama3) with torch.compile? I'm curious whether it works end to end and whether the performance is better.

I haven't done runs on llama3, but in our benchmarks it showed significant improvements over regular SAC. This is why I wanted to upstream this :)
[benchmark screenshots]

But our model is quite different, so it's totally reasonable to see smaller gains.

@tianyu-l tianyu-l requested a review from soulitzer September 22, 2025 21:19
@wwwjn
Contributor

wwwjn commented Sep 22, 2025

Thanks for sharing! We would love to see more verification - e.g., correctness and loss curves, and performance analysis on titan-supported models (llama3, etc.).

cc @soulitzer for reviewing

Contributor

@tianyu-l tianyu-l left a comment


I haven't done runs on llama3, but in our benchmarks it showed significant improvements over regular SAC. This is why I wanted to upstream this :)

I agree with @wwwjn that

We would love to see more verification - e.g., correctness and loss curves, and performance analysis on titan-supported models (llama3, etc.).

Please refer to https://github.com/pytorch/torchtitan/blob/main/CONTRIBUTING.md#proof-of-value

@tohskai
Author

tohskai commented Oct 1, 2025

@wwwjn @soulitzer @tianyu-l

Should this support selection of activation_memory_budget_solver and activation_memory_budget_runtime_estimator? What about visualize_memory_budget_pareto? I found them useful and their discoverability is low, but given that this is an unstable API and potentially feature overload, I would prefer to hear your opinion.

https://github.com/pytorch/pytorch/blob/main/torch/_functorch/config.py#L147-L169
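
For context on what those knobs do, here is a minimal sketch of setting them directly on torch._functorch.config before compiling; the toy model and the particular values are illustrative only, not code from this PR:

import torch
import torch._functorch.config as functorch_config

# Example values only; see torch/_functorch/config.py for the full docs.
functorch_config.activation_memory_budget = 0.5  # 1.0 = default partitioning, 0.0 = full-AC memory target
functorch_config.activation_memory_budget_runtime_estimator = "flops"  # or "profile"
functorch_config.activation_memory_budget_solver = "dp"  # or "greedy" / "ilp" (needs scipy)
functorch_config.visualize_memory_budget_pareto = True  # dump the runtime-vs-memory pareto SVG

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024), torch.nn.GELU(), torch.nn.Linear(1024, 1024)
).cuda()
compiled = torch.compile(model)

x = torch.randn(8, 1024, device="cuda", requires_grad=True)
compiled(x).sum().backward()  # recompute decisions for the compiled fwd/bwd are made under the budget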

@tohskai
Author

tohskai commented Oct 1, 2025

I ran models/llama3/train_configs/llama3_8b.toml on 8xH100:

[run result screenshots]

Would that suffice for Proof of Value?

@tohskai
Author

tohskai commented Oct 6, 2025

I addressed your comments, rebased to avoid conflicts, and added the other parts of the API, but I am totally okay with reverting that; just waiting for your opinion.

@tohskai tohskai requested a review from tianyu-l October 7, 2025 09:12
Contributor

@tianyu-l tianyu-l left a comment


Thank you! Sounds good in general. Left some comments. Please see if they make sense.

    )
    model.layers.register_module(layer_id, transformer_block)
if ac_config.mode == "memory_budget":
    assert (model_compile_enabled is True), "Memory budget mode requires model to be compiled"

Suggested change
assert (model_compile_enabled is True), "Memory budget mode requires model to be compiled"
assert model_compile_enabled, "Memory budget mode requires model to be compiled"

Comment on lines +577 to +590
activation_memory_budget_runtime_estimator: Literal["flops", "profile"] = "flops"
"""
This controls how we estimate the runtime when deciding which operators are
the cheapest to recompute. The two options are
"flops": bases it on the flop count provided by torch.utils.flop_counter
"profile": benchmarks each operator to come up with a runtime
"""

activation_memory_budget_solver: Literal["dp", "greedy", "ilp"] = "dp"
"""
This controls the solver used for the 0-1 knapsack. By default we use a
quantized DP solution ("dp"). The other approaches are "greedy" and "ilp"
(which has a scipy dependency).
"""

It's hard to tell if users should change them. Maybe let's remove them for now and see if people have complaints.

quantized DP solution ("dp"). The other approaches are "greedy" and "ilp"
(which has a scipy dependency).
"""
visualize_memory_budget_pareto: bool = False

This one I'm not sure about. It could be useful, because otherwise people have no idea what value they should set the budget to.
Btw, the picture you linked doesn't seem to be in "increments of 0.5". It looks like 0.05.

Whether to stop recomputing early when all activations have already been
rematerialized.
"""
activation_memory_budget: float = 1.0

Maybe set 0.5 by default, o/w if a user turns on memory_budget without tuning this, nothing will happen.

This dumps out a SVG visualization of the expected runtime vs. activation
memory tradeoffs for all memory budget values from 0 to 1 in increments of
0.5. See an example here:
https://github.com/pytorch/pytorch/pull/126320#discussion_r1625104015

Please describe what folder it'll dump into.

Whether to stop recomputing early when all activations have already been
rematerialized.
"""
activation_memory_budget: float = 1.0

nit: please add a space between configs, o/w it's hard to tell whether a message is associated with the config above or below.

Whether to stop recomputing early when all activations have already been
rematerialized.
"""
activation_memory_budget: float = 1.0

activation_ is redundant, so we can just call it

Suggested change
activation_memory_budget: float = 1.0
memory_budget: float = 1.0

0.0 corresponds to the activation memory from applying
activation checkpointing to the full compiled region, and 1.0 corresponds to
the activation memory from the default runtime-optimized strategy.
"""