Now that we have a flow_to_image() util #5091, we can write a sphinx-gallery example to illustrate how to use the RAFT model.
Ideally, the example would look something like this (a rough sketch of the whole pipeline follows the list):
- Briefly explain what optical flow is: predicting movement from 2 consecutive frames
- load the model with pre-trained weights
- load a video (from our assets folder), or download it. This is TBD. If we host the video on our GitHub repo we need to make sure it's free to use and super lightweight, so that's unlikely. One potential option might be to host the video on the S3 bucket where we host the pre-trained weights?
- read frames from this video with read_video(). Ideally we would use VideoReader(), but it is only available to users who build torchvision from source, so most users wouldn't be able to run the example. Note: we don't want to rely on external dependencies to read the video
- pass consecutive frames to the model, get predictions, and convert the predicted flows to images via flow_to_image()
- plot a few of the predictions for illustration
- save the predicted images to a folder and tell the user they can create a video with e.g. `ffmpeg -f image2 -framerate 30 -i frame_%d.jpg -loop -1 my_cool_gif.gif`
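To make the plan concrete, here is a rough sketch of what the body of the gallery script could look like. The video filename, the number of frames kept, the resize size, and the [-1, 1] normalization are placeholder choices, and the plotting step is left out for brevity:

```python
import torch
import torchvision.transforms.functional as F
from torchvision.io import read_video, write_jpeg
from torchvision.models.optical_flow import raft_large
from torchvision.utils import flow_to_image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Read the video into a (T, H, W, C) uint8 tensor; "basketball.mp4" is a placeholder
# for whatever asset we end up shipping or hosting. Keep only a handful of frames so
# the example stays fast on the CPU-only CI.
frames, _, _ = read_video("basketball.mp4", pts_unit="sec")
frames = frames.permute(0, 3, 1, 2)[:10]  # (T, C, H, W)

def preprocess(batch):
    # RAFT expects float inputs roughly in [-1, 1] and spatial dims divisible by 8;
    # resizing to a small fixed size keeps the runtime low.
    batch = F.resize(batch, size=[360, 640])
    return F.convert_image_dtype(batch, torch.float32) * 2 - 1

img1 = preprocess(frames[:-1]).to(device)  # frames 0 .. T-2
img2 = preprocess(frames[1:]).to(device)   # frames 1 .. T-1

model = raft_large(pretrained=True).eval().to(device)
with torch.no_grad():
    # RAFT is recurrent and returns one flow estimate per iteration; the last is the best.
    predicted_flow = model(img1, img2)[-1]

# Convert the (N, 2, H, W) flows into (N, 3, H, W) uint8 images and save them, so the
# user can stitch them into a gif with the ffmpeg command above.
flow_imgs = flow_to_image(predicted_flow)
for i, flow_img in enumerate(flow_imgs):
    write_jpeg(flow_img.cpu(), f"frame_{i}.jpg")
```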
A good starting point is #5091 (comment)

Some caveats worth noting:

- we need the example to be reasonably fast to run, which means the image resolution should not be too big. Ideally it should take < 1 min on the CI (which does not have a GPU!)
- if loading a video is impossible (or hard), we can fall back to using pre-selected pairs of images. The example won't be as fancy, but it would still be interesting, and we can expand it later.
- loading the weights from torchvision.models.optical_flow will give pre-trained weights from Chairs + Things. These may not be the most accurate weights, depending on the type of video we apply the model to. The most accurate weights would probably be e.g. Raft_Large_Weights.C_T_SKHT_V2 or C_T_SKHT_K_V2, but these are only available through the prototype API in torchvision.prototype.models.optical_flow. Whether we can already use this API in a sphinx-gallery example is TBD: perhaps @datumbox can share his thoughts here? In any case, it's safe to start writing the example with the basic pre-trained weights from torchvision.models.optical_flow (see the sketch after this list).
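A quick sketch of the two loading paths, assuming the prototype API keeps its current raft_large(weights=...) signature (which is exactly what's uncertain here):

```python
# Option 1: stable API, Chairs + Things weights -- safe to use in the example today.
from torchvision.models.optical_flow import raft_large
model = raft_large(pretrained=True).eval()

# Option 2: prototype multi-weight API with the more accurate checkpoint. Whether a
# sphinx-gallery example can rely on this yet is the open question above.
# from torchvision.prototype.models.optical_flow import raft_large, Raft_Large_Weights
# model = raft_large(weights=Raft_Large_Weights.C_T_SKHT_V2).eval()
```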
As discussed offline, @oke-aditya is interested in this, so I'll assign it to you :)
> If we host the video on our GitHub repo we need to make sure it's free to use and super lightweight

Using one of the assets included in our repo sounds like a good choice.
> The most accurate weights would probably be e.g. Raft_Large_Weights.C_T_SKHT_V2 or C_T_SKHT_K_V2, but these are only available through the prototype API in torchvision.prototype.models.optical_flow.

The multi-weight API is still in prototype because the community has been providing feedback (#5088) and making changes up until last week, so it's unlikely to move into the main TorchVision area in this release. That means the example can't depend on it: it would break on the release (unless you change it on the release branch). Another approach could be to change the default weights of the non-prototype model builder (pretrained=True) to use the ones from Raft_Large_Weights.C_T_SKHT_V2. The change is only a few lines of code, and since those weights haven't been officially released yet, it's not BC-breaking.
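To illustrate what that swap amounts to (this is not the actual torchvision builder code; the checkpoint URL and function name below are placeholders), it essentially boils down to building the architecture and loading a different state dict:

```python
import torch
from torchvision.models.optical_flow import raft_large

# Placeholder URL: the real checkpoint location/hash would come from the
# Raft_Large_Weights.C_T_SKHT_V2 entry in the prototype weight enum.
_C_T_SKHT_V2_URL = "https://download.pytorch.org/models/raft_large_C_T_SKHT_V2.pth"

def raft_large_c_t_skht_v2(progress=True, **kwargs):
    # Build the architecture without weights, then load the more accurate checkpoint
    # instead of the Chairs + Things one that pretrained=True currently resolves to.
    model = raft_large(pretrained=False, **kwargs)
    state_dict = torch.hub.load_state_dict_from_url(_C_T_SKHT_V2_URL, progress=progress)
    model.load_state_dict(state_dict)
    return model
```

If the default swap lands inside torchvision before the release, the gallery example can simply keep calling raft_large(pretrained=True) and will pick up the better weights with no change on its side.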