Skip to content

Properly import LooseVersion #996

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

jeffdaily
Copy link
Collaborator

No description provided.

malfet and others added 2 commits April 13, 2022 16:13
Summary:
This fixes regression introduced by pytorch#57040

Somehow importing `distutils` from `setuptool` caused import of
`distutils.versions`, which is not a documented dependency and got
change with the release of
[setuptools-59.6.0](https://github.com/pypa/setuptools/tree/v59.6.0)
We should not rely on that, as
`import distutils` never re-imports `distutils.version`, which one can
see by observing
https://github.com/python/cpython/blob/3.9/Lib/distutils/__init__.py
or by running:
```
% python3 -c "import distutils;print(distutils.__version__, dir(distutils))"
3.7.5 ['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', 'sys']
% python3 -c "from setuptools import distutils;print(distutils.__version__, dir(distutils))"
3.7.5 ['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', 'archive_util', 'ccompiler', 'cmd', 'config', 'core', 'debug', 'dep_util', 'dir_util', 'dist', 'errors', 'extension', 'fancy_getopt', 'file_util', 'filelist', 'log', 'spawn', 'sys', 'sysconfig', 'util', 'version']
```

Pull Request resolved: pytorch#69904

Reviewed By: albanD, atalman, janeyx99

Differential Revision: D33094453

Pulled By: malfet

fbshipit-source-id: aaf1adb7c6f293c4e376ccff21c64cd6ba625e97
@jeffdaily jeffdaily merged commit 0d9cbd2 into release/1.9_atomicAddNoReturn Apr 14, 2022
pruthvistony pushed a commit that referenced this pull request Apr 21, 2022
Properly import LooseVersion (pytorch#69904)

Summary:
This fixes regression introduced by pytorch#57040

Somehow importing `distutils` from `setuptool` caused import of
`distutils.versions`, which is not a documented dependency and got
change with the release of
[setuptools-59.6.0](https://github.com/pypa/setuptools/tree/v59.6.0)
We should not rely on that, as
`import distutils` never re-imports `distutils.version`, which one can
see by observing
https://github.com/python/cpython/blob/3.9/Lib/distutils/__init__.py
or by running:
```
% python3 -c "import distutils;print(distutils.__version__, dir(distutils))"
3.7.5 ['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', 'sys']
% python3 -c "from setuptools import distutils;print(distutils.__version__, dir(distutils))"
3.7.5 ['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', 'archive_util', 'ccompiler', 'cmd', 'config', 'core', 'debug', 'dep_util', 'dir_util', 'dist', 'errors', 'extension', 'fancy_getopt', 'file_util', 'filelist', 'log', 'spawn', 'sys', 'sysconfig', 'util', 'version']
```

Pull Request resolved: pytorch#69904

Reviewed By: albanD, atalman, janeyx99

Differential Revision: D33094453

Pulled By: malfet

fbshipit-source-id: aaf1adb7c6f293c4e376ccff21c64cd6ba625e97
jithunnair-amd pushed a commit to jithunnair-amd/pytorch that referenced this pull request Sep 20, 2022
Properly import LooseVersion (pytorch#69904)

Summary:
This fixes regression introduced by pytorch#57040

Somehow importing `distutils` from `setuptool` caused import of
`distutils.versions`, which is not a documented dependency and got
change with the release of
[setuptools-59.6.0](https://github.com/pypa/setuptools/tree/v59.6.0)
We should not rely on that, as
`import distutils` never re-imports `distutils.version`, which one can
see by observing
https://github.com/python/cpython/blob/3.9/Lib/distutils/__init__.py
or by running:
```
% python3 -c "import distutils;print(distutils.__version__, dir(distutils))"
3.7.5 ['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', 'sys']
% python3 -c "from setuptools import distutils;print(distutils.__version__, dir(distutils))"
3.7.5 ['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', 'archive_util', 'ccompiler', 'cmd', 'config', 'core', 'debug', 'dep_util', 'dir_util', 'dist', 'errors', 'extension', 'fancy_getopt', 'file_util', 'filelist', 'log', 'spawn', 'sys', 'sysconfig', 'util', 'version']
```

Pull Request resolved: pytorch#69904

Reviewed By: albanD, atalman, janeyx99

Differential Revision: D33094453

Pulled By: malfet

fbshipit-source-id: aaf1adb7c6f293c4e376ccff21c64cd6ba625e97
jithunnair-amd pushed a commit that referenced this pull request Sep 28, 2022
Properly import LooseVersion (pytorch#69904)

Summary:
This fixes regression introduced by pytorch#57040

Somehow importing `distutils` from `setuptool` caused import of
`distutils.versions`, which is not a documented dependency and got
change with the release of
[setuptools-59.6.0](https://github.com/pypa/setuptools/tree/v59.6.0)
We should not rely on that, as
`import distutils` never re-imports `distutils.version`, which one can
see by observing
https://github.com/python/cpython/blob/3.9/Lib/distutils/__init__.py
or by running:
```
% python3 -c "import distutils;print(distutils.__version__, dir(distutils))"
3.7.5 ['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', 'sys']
% python3 -c "from setuptools import distutils;print(distutils.__version__, dir(distutils))"
3.7.5 ['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', 'archive_util', 'ccompiler', 'cmd', 'config', 'core', 'debug', 'dep_util', 'dir_util', 'dist', 'errors', 'extension', 'fancy_getopt', 'file_util', 'filelist', 'log', 'spawn', 'sys', 'sysconfig', 'util', 'version']
```

Pull Request resolved: pytorch#69904

Reviewed By: albanD, atalman, janeyx99

Differential Revision: D33094453

Pulled By: malfet

fbshipit-source-id: aaf1adb7c6f293c4e376ccff21c64cd6ba625e97
akashveramd pushed a commit that referenced this pull request Jun 13, 2025
Noticed that in the memory snapshot when running the Llama3-8B models,
the cross entropy loss is causing huge spikes in peak memory:
<img width="474" alt="image"
src="https://github.com/user-attachments/assets/31ea0f9a-7734-4454-90e5-e3a4ed0b5db3"
/>

Replace the cross entropy loss function with a chunked version can help
reduce peak memory by 7GiB without affecting training throughput. This
is inspired by the torchtune
[implementation](https://github.com/pytorch/torchtune/blob/c3703482bde72e572b535d3f7c43c81e94164ebc/torchtune/modules/loss/ce_chunked_output_loss.py#L13).

**Baseline run**
```
...
[rank0]:[titan] 2025-03-20 08:19:11,539 - root - INFO - step: 30  loss:  7.7158  memory: 49.58GiB(52.20%)  tps: 5,966  tflops: 345.53  mfu: 34.94%
[rank0]:[titan] 2025-03-20 08:19:25,291 - root - INFO - step: 40  loss:  7.3677  memory: 49.58GiB(52.20%)  tps: 5,957  tflops: 345.01  mfu: 34.88%
[rank0]:[titan] 2025-03-20 08:19:39,040 - root - INFO - step: 50  loss:  7.0890  memory: 49.58GiB(52.20%)  tps: 5,959  tflops: 345.10  mfu: 34.89%
[rank0]:[titan] 2025-03-20 08:19:52,794 - root - INFO - step: 60  loss:  6.8336  memory: 49.58GiB(52.20%)  tps: 5,957  tflops: 344.99  mfu: 34.88%
...
```

**With the changes**
```
...
[rank0]:[titan] 2025-03-20 08:29:30,780 - root - INFO - step: 30  loss:  7.7423  memory: 42.26GiB(44.49%)  tps: 5,957  tflops: 345.01  mfu: 34.89%
[rank0]:[titan] 2025-03-20 08:29:44,554 - root - INFO - step: 40  loss:  7.3368  memory: 42.26GiB(44.49%)  tps: 5,948  tflops: 344.49  mfu: 34.83%
[rank0]:[titan] 2025-03-20 08:29:58,318 - root - INFO - step: 50  loss:  7.0574  memory: 42.26GiB(44.49%)  tps: 5,953  tflops: 344.74  mfu: 34.86%
[rank0]:[titan] 2025-03-20 08:30:12,080 - root - INFO - step: 60  loss:  6.8472  memory: 42.26GiB(44.49%)  tps: 5,953  tflops: 344.79  mfu: 34.86%
...
```

**cc**: @lessw2020 @weifengpy @wconstab @yf225
akashveramd pushed a commit that referenced this pull request Jun 13, 2025
To enable chunked loss, I refactor the `train_spec.loss_fn` to
`train_spec.build_loss_fn` as suggested in #996

 Also:
- Unified chunked loss implementation with the `cross_entropy_loss`
name, since default is a special case with chunk=1.
Identical loss and memory:

<img width="1257" alt="Screenshot 2025-03-30 at 13 03 44"
src="https://github.com/user-attachments/assets/11c07849-2340-4b42-ad5c-d96657ee53b9"
/>

- Enable compiling the loss function as it's working fine for me, and
result in reduction of memory and slightly better througput.

<img width="1275" alt="Screenshot 2025-03-30 at 16 47 00"
src="https://github.com/user-attachments/assets/09edaf16-1770-48da-87b2-e4de29e7519c"
/>

While we only have one type of loss, I create a field for `loss.name`
with default to 'cross_entropy' to be fork-friendly anyway.

Co-authored-by: tianyu-l <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants