GPU VideoReader not working #5702
Comments
This seems to be a very serious bug, as much video-learning code relies on GPU decoding. Could someone please help? @mostafarohani I'm also curious: how did you manage to compile torchvision with GPU video decoding support? I installed …
@mostafarohani @LinxiFan we're currently assigning all GPU decoding issues as low-pri as there was a shift in priorities and we're a bit short on people. I'll definitely look at this but probably not before the end of the month. Sorry about that :( @prabhat00155 do you perhaps know what this could be about? If so, maybe I can take a look at it sooner if I have a decent starting point?
@LinxiFan the instructions are here; we're utilising NVDEC rather than FFmpeg for hardware-accelerated decoding.
Is our CI actually running the tests for the GPU decoder? Looking at recent CI runs, it seems like the tests are always skipped.
@NicolasHug nope: see #5147 |
The current implementation of the GPU decoder lets you seek once in the video and read frames from there. If you seek multiple times, the subsequent outputs may be frames close to the first seek point. The reason is that multiple frames may already have been decoded and queued up for return after the first seek and demuxing.
@prabhat00155 in that case, I would propose dropping all enqueued frames after a seek. Seeking multiple times in the same file is a very common operation, and in its current state the decoder is indeed not working as expected.
@fmassa That makes sense. |
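The flush-on-seek fix proposed above can be sketched with a toy model (hypothetical class, pure Python; frame indices stand in for decoded images, and the lookahead queue plays the role of the decoder's internal frame buffer):

```python
from collections import deque

class ToyDecoder:
    """Toy model of a decoder that keeps a queue of already-decoded frames."""
    def __init__(self, num_frames=100, lookahead=4):
        self.frames = list(range(num_frames))  # frame index stands in for image data
        self.lookahead = lookahead
        self.queue = deque()
        self.pos = 0

    def _fill(self):
        # Decode ahead of the reader, like a real hardware decoder pipeline.
        while len(self.queue) < self.lookahead and self.pos < len(self.frames):
            self.queue.append(self.frames[self.pos])
            self.pos += 1

    def seek(self, index, flush=True):
        if flush:
            self.queue.clear()  # the proposed fix: drop stale decoded frames
        self.pos = index

    def next_frame(self):
        self._fill()
        return self.queue.popleft()

dec = ToyDecoder()
dec.next_frame()            # decode a bit; the lookahead queue now holds frames 1..3
dec.seek(50, flush=False)   # buggy behaviour: stale frames 1..3 stay queued
stale = dec.next_frame()    # returns 1, a frame "closely related to the first seek"
dec.seek(50, flush=True)    # fixed behaviour: the queue is cleared first
fresh = dec.next_frame()    # returns 50, the frame actually sought
```

Without the flush, the second seek still returns frames left over from before it, which is exactly the corruption pattern reported in this issue.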
Thanks for the responses, everyone, and I agree with @LinxiFan that this is a very serious bug. In my use case, I can't afford to fetch all the frames (there are thousands); only a subset of 15-30 are annotated, and the annotations are not evenly spaced, so I need to seek as described above. On the other hand, the CPU version is so slow that it blocks my model training. Our current hack is to export all the frames as JPEGs during preprocessing and then train on those. The downside, of course, is that the dataset ballooned from 100GB to 5TB, which has caused a lot of issues, like not being able to store all the data on the server's local disk. A prompt fix would be much appreciated <3
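As an aside, the "sparse annotations to seek targets" step in this workflow can be sketched like so (hypothetical helper; the frame rate and timestamps are made up, not from the dataset above):

```python
import bisect

def nearest_frames(frame_pts, annotation_ts):
    """For each annotated timestamp, return the index of the closest frame.
    frame_pts must be sorted ascending (presentation timestamps in seconds)."""
    out = []
    for ts in annotation_ts:
        i = bisect.bisect_left(frame_pts, ts)
        # The nearest frame is either the one just before or just at/after ts.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(frame_pts)]
        out.append(min(candidates, key=lambda j: abs(frame_pts[j] - ts)))
    return out

# 30 fps clip: frame k is presented at k / 30 seconds (made-up numbers)
pts = [k / 30 for k in range(300)]
print(nearest_frames(pts, [0.5, 3.33, 9.96]))  # → [15, 100, 299]
```

Once the target frames are known, only those need to be sought and decoded, which is why repeated seeking (rather than sequential decoding of thousands of frames) matters here.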
@LinxiFan, to install, what I did was, in my Dockerfile:
I also added:
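For reference, a rough sketch of what such a Dockerfile might look like, following torchvision's documented build-from-source flow for the GPU decoder. The base image, versions, and SDK paths here are assumptions, not taken from this thread, and the snippet is untested:

```dockerfile
# Sketch only: versions, paths, and base image are assumptions.
FROM nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04

RUN apt-get update && apt-get install -y git python3-pip ffmpeg

# The NVIDIA Video Codec SDK must be downloaded manually (NVIDIA login
# required) and copied into the build context as video_codec_sdk/.
COPY video_codec_sdk /opt/video_codec_sdk
ENV TORCHVISION_INCLUDE=/opt/video_codec_sdk/Interface \
    TORCHVISION_LIBRARY=/opt/video_codec_sdk/Lib/linux/stubs/x86_64 \
    CUDA_HOME=/usr/local/cuda

RUN pip3 install torch && \
    git clone https://github.com/pytorch/vision.git /opt/vision && \
    cd /opt/vision && python3 setup.py install
```

The key point is that GPU decoding is only compiled in when the Video Codec SDK headers and stub libraries are visible at build time via the `TORCHVISION_INCLUDE`/`TORCHVISION_LIBRARY` environment variables.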
🐛 Describe the bug
When seeking to specific timestamps in the video and trying to extract the closest image frames, the CPU implementation of VideoReader works exactly as expected. However, the GPU implementation outputs progressively more corrupted versions of a single frame, with halo effects from other frames becoming more prevalent in the later frames.
Versions