feat: Add min, max ranges to mark_dynamic API #119737
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/119737
Note: Links to docs will display an error until the doc builds have completed.
❌ 1 New Failure, 1 Pending, 3 Unrelated Failures as of commit 8abdc26 with merge base 8fa6340.
NEW FAILURE - The following job has failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
BROKEN TRUNK - The following jobs failed but were present on the merge base: 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Please seek CI approval before scheduling CIFlow labels
@jansel - would you please review this?
A nit, though @ezyang should take a look here.
torch/_dynamo/decorators.py (Outdated):

if isinstance(index, int):
    if not hasattr(t, "_dynamo_dynamic_indices"):
        t._dynamo_dynamic_indices = set()
    # TODO(voz): Should we bounds check?
-   t._dynamo_dynamic_indices.add(index)
+   t._dynamo_dynamic_indices.add((index, min, max))
Consider just making a little data structure for this, easier than indexing it
Sure. I added a _DimRange dataclass, so now we access it via dim_range.min etc. Please let me know if there's any other preference.
torch/_dynamo/variables/builder.py (Outdated):

if min_val == 2 and not max_val:
    constraint_dim = RelaxedUnspecConstraint(warn_only=False)
else:
    constraint_dim = StrictMinMaxConstraint(
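For illustration, a self-contained sketch of the dispatch above, using stand-in classes. The real RelaxedUnspecConstraint and StrictMinMaxConstraint live in PyTorch's symbolic shapes machinery and take different arguments:

```python
# Stand-in constraint classes, for illustration only; the real ones have
# different signatures.
class RelaxedUnspecConstraint:
    def __init__(self, warn_only: bool):
        self.warn_only = warn_only

class StrictMinMaxConstraint:
    def __init__(self, vr, warn_only: bool):
        self.vr = vr  # (min, max) value range the dim must stay inside
        self.warn_only = warn_only

def pick_constraint(min_val, max_val):
    # min_val == 2 with no max means "just marked dynamic, no user range":
    # fall back to the relaxed constraint; otherwise enforce the range.
    if min_val == 2 and not max_val:
        return RelaxedUnspecConstraint(warn_only=False)
    return StrictMinMaxConstraint(vr=(min_val, max_val), warn_only=False)

print(type(pick_constraint(2, None)).__name__)  # RelaxedUnspecConstraint
print(type(pick_constraint(4, 512)).__name__)   # StrictMinMaxConstraint
```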
I'm a little concerned about this going from Relaxed to Strict, the main question is whether or not export requires there are NO other constraints, or if it's just testing the value range. If it's just value range that should be alright.
As per my understanding (of this snippet and this file), it seems like it's just testing the value range.
This looks good, but there are test failures; do you know how to solve them?
Holler if you need help fixing CI problems.
Yes. That would be very helpful.
I approved the CI for you.
Seems like just silly CI errors right now.
It seems so. The error is as follows:
As per your suggestion, I made a dataclass.
Oh blah, this is annoying. Hmm... So, we can fix the proximal problem by introducing a _DimRange binding to CLOSURE_VARS in torch/_dynamo/guards.py. However, the failure here has made me realize that there is another annoying problem: the issubset test is no longer the right thing to do in the presence of this extra information.

To motivate this, suppose you compile some code under the assumption that dim=1 is dynamic. If you later also mark dim=2 dynamic, the guard here will force a recompilation (so that we actually generate a dynamic kernel). If you remove the dim=1 marking, though, we don't recompile, because our dynamic kernel should work for your static case. There's a comment on this in guards.py at

This is all very delicate, though, and we are pretty inconsistent (I don't think we're guarding on mark static, lol). So I feel maybe the easiest thing to do is to just store the min/max range on a separate variable and file a bug for follow-up on the guard problem. If you want to bash your way past this, though, then you not only need to do the subset test, but you also have to do a containment test on the ranges (a frame is valid to reuse if the new allowed range is a subset of the old).
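The containment test described above could be sketched like this. These are hypothetical helpers, not the actual guard code; a range is a (min, max) tuple with None meaning unbounded:

```python
# Hypothetical illustration of the reuse rule described above: a compiled
# frame may be reused only if every newly requested range is contained in
# the range the frame was compiled with. Not the actual guard code.
def range_contains(old, new):
    """True if `new` (min, max) lies inside `old`; None means unbounded."""
    old_min, old_max = old
    new_min, new_max = new
    lo_ok = old_min is None or (new_min is not None and new_min >= old_min)
    hi_ok = old_max is None or (new_max is not None and new_max <= old_max)
    return lo_ok and hi_ok

def frame_reusable(old_ranges, new_ranges):
    # old_ranges/new_ranges: dict mapping dim index -> (min, max).
    # Subset test: every dim marked dynamic now must already have been
    # dynamic, and its new range must be contained in the old one.
    return all(
        dim in old_ranges and range_contains(old_ranges[dim], rng)
        for dim, rng in new_ranges.items()
    )

print(frame_reusable({1: (2, 1024)}, {1: (4, 512)}))  # True: narrower range
print(frame_reusable({1: (2, 64)}, {1: (2, 1024)}))   # False: wider range
```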
@pytorchbot label "release notes: dynamo"
Thank you for all the help @ezyang
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 mandatory check(s) failed. The first few are: Dig deeper by viewing the failures on hud.
Signed-off-by: Edward Z. Yang <[email protected]>
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 mandatory check(s) failed. The first few are: Dig deeper by viewing the failures on hud.
@pytorchbot merge -i
Merge started. Your change will be merged while ignoring the following 3 checks: inductor / cuda12.1-py3.10-gcc9-sm86 / test (inductor_torchbench, 2, 2, linux.g5.4xlarge.nvidia.gpu), trunk / win-vs2019-cpu-py3 / test (default, 1, 3, windows.4xlarge.nonephemeral), trunk / win-vs2019-cpu-py3 / test (default, 3, 3, windows.4xlarge.nonephemeral). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 2 jobs have failed; the first few of them are: .github/workflows/trunk.yml / win-vs2019-cpu-py3 / test (default, 1, 3, windows.4xlarge.nonephemeral), .github/workflows/trunk.yml / win-vs2019-cpu-py3 / test (default, 3, 3, windows.4xlarge.nonephemeral). Details for Dev Infra team: Raised by workflow job.
@ezyang The following failures on Windows seem unrelated. Any suggestions?
@pytorchbot merge -i
Merge started. Your change will be merged while ignoring the following 4 checks: pull / linux-focal-cuda12.1-py3.10-gcc9-bazel-test / build-and-test (default, 1, 1, linux.4xlarge.nvidia.gpu), inductor / cuda12.1-py3.10-gcc9-sm86 / test (inductor_torchbench, 2, 2, linux.g5.4xlarge.nvidia.gpu), trunk / win-vs2019-cpu-py3 / test (default, 1, 3, windows.4xlarge.nonephemeral), trunk / win-vs2019-cpu-py3 / test (default, 3, 3, windows.4xlarge.nonephemeral). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Fixes #115137
This PR adds `min`, `max` values to create a bounded constraint on the dim. A `ConstraintViolationError` is triggered if `torch.compile` gets an input dimension out of bounds.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @aakhundov @ezyang @msaroufim @wconstab @bdhirsh @anijain2305 @zou3519 @avikchaudhuri @gmagogsfm @zhxchen17 @tugsbayasgalan @angelayi @suo @ydwu4
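To make the described behavior concrete, here is a torch-free sketch of the bounds check. ConstraintViolationError below is a stand-in class, and the user-facing call shown in the comment is a hypothetical reading of the PR title (torch._dynamo.mark_dynamic with min/max) that should be checked against the merged code:

```python
# Torch-free illustration of the bounded-constraint behavior described
# above. ConstraintViolationError is a stand-in for the real exception.
class ConstraintViolationError(Exception):
    pass

def check_dim(size, min_val, max_val):
    # The bounded constraint enforces min_val <= size <= max_val.
    if not (min_val <= size <= max_val):
        raise ConstraintViolationError(
            f"dim size {size} outside allowed range [{min_val}, {max_val}]"
        )
    return size

# Intended usage per this PR (hypothetical call, requires PyTorch with
# this change): torch._dynamo.mark_dynamic(x, 0, min=2, max=1024)
check_dim(16, min_val=2, max_val=1024)  # in range: ok
try:
    check_dim(4096, min_val=2, max_val=1024)
except ConstraintViolationError as e:
    print("violation:", e)
```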
cc: @narendasan