Cache HW device context #3178
@mthrok has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Summary: TODO: add cache release

Pull Request resolved: pytorch#3178
Differential Revision: D44136275
Pulled By: mthrok
fbshipit-source-id: 202b687c246eab285b82768a8ee91a9f45d334d7

This pull request was exported from Phabricator. Differential Revision: D44136275
In pytorch#3178, a mechanism to cache the HW device context was introduced. This commit applies the reuse in StreamWriter, so that the context is shared between GPU video decoding and encoding. This gives back about 250 - 300 MB of GPU memory.
Summary: In pytorch#3178, a mechanism to cache HW device context was introduced. This commit applies the reuse in StreamWriter, so that when using GPU video decoding and encoding, they are shared. This gives back about 250 - 300 MB of GPU memory.

---

Q: What is HW device context?

From https://ffmpeg.org/doxygen/4.1/structAVHWDeviceContext.html#details

> This struct aggregates all the (hardware/vendor-specific) "high-level" state, i.e. state that is not tied to a concrete processing configuration. E.g., in an API that supports hardware-accelerated encoding and decoding, this struct will (if possible) wrap the state that is common to both encoding and decoding and from which specific instances of encoders or decoders can be derived.

Pull Request resolved: pytorch#3215
Reviewed By: nateanl
Differential Revision: D44504051
Pulled By: mthrok
fbshipit-source-id: c52b4463af9ec6eeb01da85e7a4d6a47952aae1e
This commit adds a caching mechanism for the CUDA device context used in GPU video decoding.
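The caching idea can be illustrated with a minimal sketch. This is not the code in this PR (which lives in the C++ FFmpeg bindings); `DeviceContextCache`, `make_context`, and the choice to key the cache by device index are all hypothetical and only show the pattern of creating an expensive context once and handing out the same object afterwards.

```python
import threading
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class DeviceContextCache(Generic[T]):
    """Hypothetical per-device cache; not the torchaudio implementation."""

    def __init__(self, factory: Callable[[int], T]) -> None:
        self._factory = factory  # creates an expensive context for a device index
        self._contexts: Dict[int, T] = {}
        self._lock = threading.Lock()

    def get(self, device_index: int) -> T:
        # Create the context on first use, then return the same object.
        with self._lock:
            if device_index not in self._contexts:
                self._contexts[device_index] = self._factory(device_index)
            return self._contexts[device_index]

    def clear(self) -> None:
        # Drop all cached contexts so the next get() re-creates them.
        with self._lock:
            self._contexts.clear()


created = []  # records every time the stand-in factory actually runs


def make_context(device_index: int) -> dict:
    # Stand-in for an expensive hardware device context creation call.
    created.append(device_index)
    return {"device": device_index}


cache = DeviceContextCache(make_context)
decoder_ctx = cache.get(0)
encoder_ctx = cache.get(0)  # reuses the context created for the decoder
```

In this sketch the factory runs once for device 0, and both callers receive the same object until `clear()` is called.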
The following table shows the performance improvement from the change when decoding 3 seconds of HEVC video with CUVID on an NVIDIA GeForce RTX 3080.
Note: The cache was cleared before the 5th iteration
The first time the video decoder is used, some additional initialization takes place, so that run is slower than the later ones. In the other cases, caching the device context improves the decoding speed.
With 30 seconds of HEVC video, there is an improvement of 0.13 seconds.
Note: The cache was cleared before the 5th iteration
Memory-wise, the cache holds about 200 MB of GPU memory.
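A timing comparison of this shape, with the cache cleared before a chosen iteration, could be set up roughly as follows. This is a sketch, not the benchmark script used for this PR: `time_iterations`, `fake_decode`, and `fake_clear` are hypothetical names, the iteration count of 8 is arbitrary, and the toy callables stand in for a real GPU decode and a real cache-clearing call.

```python
import time
from typing import Callable, List, Optional


def time_iterations(
    run_once: Callable[[], None],
    num_iters: int,
    clear_cache: Optional[Callable[[], None]] = None,
    clear_before: Optional[int] = None,
) -> List[float]:
    """Return wall-clock seconds per iteration (iterations are 1-indexed)."""
    elapsed = []
    for i in range(1, num_iters + 1):
        # Clear the cache outside the timed region, right before iteration i.
        if clear_cache is not None and i == clear_before:
            clear_cache()
        start = time.perf_counter()
        run_once()
        elapsed.append(time.perf_counter() - start)
    return elapsed


# Toy stand-ins (assumptions): a "decode" that does extra work when cold.
state = {"warm": False, "cold_starts": 0}


def fake_decode() -> None:
    if not state["warm"]:
        state["cold_starts"] += 1  # a real decoder would create its context here
        state["warm"] = True


def fake_clear() -> None:
    state["warm"] = False


timings = time_iterations(
    fake_decode, num_iters=8, clear_cache=fake_clear, clear_before=5
)
```

With a real decoder in place of `fake_decode`, iterations 1 and 5 would be the ones that pay the context creation cost.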
code
The data is generated with
`ffmpeg -f lavfi -i mandelbrot -t 3 -c:v libx265 -pix_fmt yuv420p10le -vtag hvc1 -y test.hevc`
and
`ffmpeg -f lavfi -i mandelbrot -t 30 -c:v libx265 -pix_fmt yuv420p10le -vtag hvc1 -y test.hevc`
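The two commands differ only in the `-t` duration, and both write to `test.hevc` with `-y`, so the second overwrites the first. A small helper for building them might look like the sketch below; `mandelbrot_hevc_command` and its `output` parameter are hypothetical additions for illustration, and the flags are copied from the commands above.

```python
import shlex
from typing import List


def mandelbrot_hevc_command(duration_sec: int, output: str = "test.hevc") -> List[str]:
    # Build the argument list for generating a Mandelbrot test clip encoded as HEVC.
    return [
        "ffmpeg", "-f", "lavfi", "-i", "mandelbrot",
        "-t", str(duration_sec),
        "-c:v", "libx265", "-pix_fmt", "yuv420p10le", "-vtag", "hvc1",
        "-y", output,
    ]


# Print the shell form of both commands used for the benchmark data.
for seconds in (3, 30):
    print(shlex.join(mandelbrot_hevc_command(seconds)))
```

To actually generate a file, the list can be passed to `subprocess.run(cmd, check=True)`, provided `ffmpeg` with `libx265` is installed.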
raw data (3sec)
Upstream main branch
This commit
raw data (30sec)
Upstream main branch
This commit