Changes to support input sequence ID tracking #70264


Closed
wants to merge 40 commits

Conversation

alexmsettle
Contributor

This PR adds input sequence ID tracking to the NVTX markers. The feature appends additional information to the NVTX marker string, e.g. seq_ids=[101, 102, 103]. Each entry gives the sequence id of the op that produced the corresponding input tensor, indexed by the tensor's position in the array: in the example above, input tensor 0 was produced by the node with sequence id 101, input tensor 1 by node 102, and input tensor 2 by node 103. This is the same way the sizes array is organized. If you know the sequence id of a node and the sequence ids of its input edges, you have enough information to reconstruct the network graph.

Fixes #66105

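Since the marker string is plain text, the seq_ids list can be recovered with simple parsing and turned into graph edges. A minimal sketch (the marker string below is invented for illustration, based only on the format described above):

```python
import re

def parse_seq_ids(marker):
    """Extract the seq_ids list from an NVTX marker string.

    The exact marker layout is assumed from the example above,
    i.e. a substring of the form seq_ids=[101, 102, 103].
    """
    m = re.search(r"seq_ids=\[([^\]]*)\]", marker)
    if m is None:
        return []
    body = m.group(1).strip()
    return [int(s) for s in body.split(",")] if body else []

# Position i of seq_ids names the producer of input tensor i, so each
# entry yields one edge (producer -> this op) of the network graph.
marker = "aten::add, seq=104, seq_ids=[101, 102, 103]"  # hypothetical marker
edges = [(src, 104) for src in parse_seq_ids(marker)]
```

With the consumer's own sequence id attached to each marker, repeating this over a whole trace accumulates the full edge list of the network graph.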
@pytorch-probot

pytorch-probot bot commented Dec 21, 2021

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/alexmsettle/pytorch/blob/6d7124d985841878dd905065de40046d9c20bf26/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Workflows Labels (bold enabled) Status
Triggered Workflows
linux-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk ✅ triggered
linux-docs ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk ✅ triggered
linux-vulkan-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7-no-ops ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
docker-builds ciflow/all, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-full-jit ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-full-jit ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
linux-binary-conda ciflow/binaries, ciflow/binaries/conda 🚫 skipped
linux-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries/libtorch 🚫 skipped
linux-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries/libtorch 🚫 skipped
linux-binary-manywheel ciflow/binaries, ciflow/binaries/wheel 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk 🚫 skipped
linux-bionic-py3.6-clang9 ciflow/xla 🚫 skipped
linux-docs-push ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled 🚫 skipped
linux-xenial-cuda11.3-py3.7-gcc7-no-ops ciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk 🚫 skipped
macos-10-15-py3-arm64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-11-py3-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.1-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
periodic-win-vs2019-cuda11.5-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build ciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped

You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and trigger the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

@facebook-github-bot
Contributor

facebook-github-bot commented Dec 21, 2021

🔗 Helpful links

💊 CI failures summary and remediations

As of commit 1547536 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

@samdow samdow added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Dec 23, 2021
@albanD albanD requested review from robieta and removed request for albanD December 29, 2021 12:04
@alexmsettle
Contributor Author

Hi - this is my first PR for pytorch. What is the next step for getting this through the CI process?

@robieta

robieta commented Jan 5, 2022

Hi - this is my first PR for pytorch. What is the next step for getting this through the CI process?

Hey, thanks for the PR. I'm on vacation until the 7th, and then I'll review and help guide it in.

@alexmsettle
Contributor Author

alexmsettle commented Jan 5, 2022 via email

Merge remote-tracking branch 'origin/master' into alex_input_seq_ids
@alexmsettle
Contributor Author

I just resolved a merge conflict that was introduced after I started this PR. It looks like the Kineto software has been changing recently, and it is causing conflicts with my changes.

@robieta

robieta commented Jan 18, 2022

We just wrapped up performance reviews, so I'll finally have time to review this. Thanks for your patience.

@alexmsettle
Contributor Author

alexmsettle commented Jan 18, 2022 via email

@robieta

robieta commented Jan 18, 2022

Thanks! Any idea how to restart the test? I saw that one of the CI jobs was cancelled.

I went ahead and just manually restarted it.

@soulitzer soulitzer removed their request for review January 18, 2022 17:18
@alexmsettle
Contributor Author

Thanks for restarting the pipeline, looks like it worked this time.

@robieta

robieta commented Jan 19, 2022

At a high level, this seems awesome! So much so that I also want this for the Kineto profiler. (Though I'm perfectly happy to have this PR just add it for NVTX and then I can steal all the utility functions later.)

I do have some concerns about using Autograd Node as a proxy for Tensor identity. I understand why it's appealing since Tensors don't have a good way of assigning uuid, but I worry about losing information. For one, this won't work in inference mode (which, ironically, is when a lot of more intrusive optimizations like buffer reuse become available). It also (I think) will alias Tensors that are produced from multi-output ops like split unless we also record the output index. (Though [sequenceID, TensorImpl*] should be unambiguous.) And lastly, because of views and version bumps in autograd some analyses (like memory planning or fusion) won't be possible unless we also trace StorageImpl* / storage_offset / strides / version_number. WDYT?
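The multi-output aliasing concern above is easy to reproduce: both halves of a split share a single autograd Node, so a sequence id alone cannot tell them apart, while the output index can. A small illustration (checked against autograd semantics generally, not against this PR's code):

```python
import torch

x = torch.randn(6, requires_grad=True)
a, b = torch.split(x, 3)

# Both outputs of the multi-output op point at the same autograd Node,
# so a key of [sequence id] alone aliases them ...
assert a.grad_fn is b.grad_fn

# ... while the output index distinguishes which output each tensor is,
# making a [sequence id, output index] key unambiguous.
assert (a.output_nr, b.output_nr) == (0, 1)
```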

@alexmsettle
Contributor Author

Thanks for the feedback @robieta !

I think your concerns about using the autograd Node are valid. When I added this feature I was specifically working on performance analysis of training, so not working on inference wasn't really a concern at the time. Also the Autograd Node was the only data structure I could find that maintained information about the producer of a given tensor. Leveraging the JIT graph is another option I looked into, but it was difficult to convert the entire network to torch script for all the nets I was investigating. I would like for this to work for inference, I'm actually starting to analyze inference performance now, so it is an important issue for me. Do you have any suggestions for alternative data structures for maintaining the tensors' producer information?

For training the autograd Node approach works reasonably well. I have run into some issues that would be good to resolve. Would it be possible to move forward with this PR, then work on a better solution to recording the tensors' producer? Most of the code supporting the profiling would stay intact. The LOC for the autograd node approach is small, so replacing it with a more robust solution shouldn't be too much effort once there's a good solution in place.
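For context on why the autograd Node is a natural source of producer information in training: each grad_fn Node records edges to the Nodes that produced its inputs, which is the same relationship the seq_ids encode. A minimal sketch:

```python
import torch

x = torch.ones(2, requires_grad=True)
y = x * 2    # produced by a Mul node
z = y.sum()  # produced by a Sum node whose input came from the Mul node

# next_functions holds (producer Node, input index) pairs; following
# them walks the graph from consumers back to their producers.
producer = z.grad_fn.next_functions[0][0]
assert producer is y.grad_fn
```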

@facebook-github-bot
Contributor

@robieta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

alexmsettle and others added 4 commits March 11, 2022 11:50
the early exits in THPFunction_apply().  Feedback
from code review.
…put_seq_ids

Remote alex_input_seq_ids was already merged with master; this merge makes the local branch match the remote.
@alexmsettle
Contributor Author

Hi @robieta I fixed the RECORD_FUNCTION issue in python_function.cpp. CI passed, so it should be good to go now.

@facebook-github-bot
Contributor

@robieta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

std::string str("[");
int idx = 0;

for (const auto op_id_info_pair : input_op_ids) {
Can you change this to const auto& op_id_info_pair? One of the internal builds is super pedantic and refuses to build if there is a copy.

Contributor Author
I just pushed the change

@alexmsettle
Contributor Author

@robieta Any idea what this error is?

Using default tag: latest
latest: Pulling from tool/alpine
Digest: sha256:def822f9851ca422481ec6fee59a9966f12b351c62ccb9aca841526ffaa9f748
Status: Image is up to date for 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine:latest
308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine:latest
chown: ./docs/cpp/build/doctrees/environment.pickle: No such file or directory
Error: Process completed with exit code 1.

@facebook-github-bot
Contributor

@robieta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@alexmsettle
Contributor Author

Any update on the internal testing?

@robieta

robieta commented Mar 21, 2022

Any update on the internal testing?

I was out the second half of last week. It's failing some tests, but unclear if it is unrelated flakes. I fired off a new batch.

@alexmsettle
Contributor Author

Ok thanks for the update!

@alexmsettle
Contributor Author

Any updates on the testing status?

@alexmsettle
Contributor Author

ping @robieta

@robieta

robieta commented Mar 25, 2022

I'm trying to get it through CI. Because so many projects have PyTorch as a dependency it is an involved process to separate flakes from legitimate breakages. Never fear, I am actively landing this PR.

@alexmsettle
Contributor Author

I'm trying to get it through CI. Because so many projects have PyTorch as a dependency it is an involved process to separate flakes from legitimate breakages. Never fear, I am actively landing this PR.

Much appreciated!

@alexmsettle
Contributor Author

FYI @robieta here's a clip from a network graph that I generated based on this PR. This is from the network efficientnet. I combined the op_ids with the module names from the nn.module object associated with each aten:: op. It makes it really easy to understand the network architecture, you don't even really need to look at the network source code.

[screenshot: network graph clip generated from EfficientNet]
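One way to get module names like those in the graph above is with forward pre-hooks, which fire in execution order; a sketch (the helper name and approach are assumptions, not the actual code used to generate the screenshot):

```python
import torch
from torch import nn

def record_module_calls(model):
    """Append each submodule's qualified name as it is called, so the
    call order can later be lined up with op-level profiler records."""
    calls = []
    for name, module in model.named_modules():
        if name:  # skip the unnamed root module
            module.register_forward_pre_hook(
                lambda mod, inp, _name=name: calls.append(_name))
    return calls

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
calls = record_module_calls(model)
model(torch.randn(1, 4))
# calls now holds the submodule names in execution order
```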

@robieta

robieta commented Mar 31, 2022

@alexmsettle Testing is in good enough shape (I'm quite confident that all remaining failures are unrelated) that I just started to land.

facebook-github-bot pushed a commit that referenced this pull request Mar 31, 2022
Summary:
This adds input sequence ID tracking to the NVTX markers. The feature appends additional information to the NVTX marker string, e.g. seq_ids=[101, 102, 103]. Each entry gives the sequence id of the op that produced the corresponding input tensor, indexed by the tensor's position in the array: in the example above, input tensor 0 was produced by the node with sequence id 101, input tensor 1 by node 102, and input tensor 2 by node 103. This is the same way the sizes array is organized. If you know the sequence id of a node and the sequence ids of its input edges, you have enough information to reconstruct the network graph.

Fixes #66105

Pull Request resolved: #70264

Reviewed By: chaekit

Differential Revision: D34792707

Pulled By: robieta

fbshipit-source-id: 4407b853c929a737505803b0db77a8ecd966cce2
@github-actions
Contributor

Hey @alexmsettle.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

@alexmsettle
Contributor Author

@alexmsettle Testing is in good enough shape (I'm quite confident that all remaining failures are unrelated) that I just started to land.

Awesome! Thanks for driving this to completion @robieta

@robieta robieta added the release notes: profiler release notes category label Apr 1, 2022
@robieta

robieta commented Apr 1, 2022

@alexmsettle Testing is in good enough shape (I'm quite confident that all remaining failures are unrelated) that I just started to land.

Awesome! Thanks for driving this to completion @robieta

Thanks! Likewise, thanks for adding this awesome feature.

Labels
cla signed open source release notes: profiler release notes category triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add input node id tracking to autograd profiler along with input tensor dimensions
8 participants