Sphinx-gallery example for RAFT model #5309

Closed
NicolasHug opened this issue Jan 28, 2022 · 1 comment · Fixed by #5316

Comments

@NicolasHug
Member

NicolasHug commented Jan 28, 2022

Now that we have a flow_to_image() util #5091, we can write a sphinx-gallery example to illustrate how to use the RAFT model.

Ideally, the example would look something like this:

  • Briefly explain what optical flow is: predicting movement from 2 consecutive frames
  • load the model with pre-trained weights
  • load a video (from our assets folder), or download it. This is TBD. If we host the video in our GitHub repo we need to make sure it's free to use and super lightweight, so that's unlikely. One potential option might be to host the video on the S3 bucket where we host the pre-trained weights?
  • read frames from this video with read_video(). Ideally we would be using VideoReader(), but it is only available to users who build torchvision from source, so it may not be usable by most. Note: we don't want to rely on external dependencies to read the video
  • pass consecutive frames to the model, get predictions, and convert the predicted flows to images via flow_to_image()
  • plot a few of the predictions for illustration
  • save the predicted images to a folder and tell the user they can create a video with e.g. ffmpeg -f image2 -framerate 30 -i frame_%d.jpg -loop -1 my_cool_gif.gif

A good starting point is #5091 (comment)

Some caveats worth noting:

  • we need the example to be reasonably fast to run, so that means the image resolution should not be too big. Ideally it should take < 1 min on the CI (which does not have a GPU!)
  • If loading a video is impossible (or hard), we can fall back to just using pre-selected pairs of images. The example won't be as fancy, but it will still be interesting, and we can expand it later.
  • loading the weights from torchvision.models.optical_flow will give pre-trained weights from Chairs + Things. These may not be the most accurate weights, depending on the type of video we apply the model to. The most accurate weights would probably be e.g. Raft_Large_Weights.C_T_SKHT_V2 or C_T_SKHT_K_V2, but these are only available through the prototype API in torchvision.prototype.models.optical_flow. Whether we can already use this API in a sphinx-gallery example is TBD: perhaps @datumbox can share his thoughts here? In any case it's safe to start writing the example with the basic pre-trained weights from torchvision.models.optical_flow.

As discussed offline, @oke-aditya is interested in this, so I'll assign it to you :)

@datumbox
Contributor

If we host the video on our GitHub repo we need to make sure it's free of use and super lightweight

Using one of the assets included in our repo sounds like a good choice.

The most accurate weights would probably be e.g. Raft_Large_Weights.C_T_SKHT_V2 or C_T_SKHT_K_V2, but these are only available through the prototype API in torchvision.prototype.models.optical_flow.

The multi-weight API is still in prototype because the community has been providing feedback (#5088) and changes were being made up until last week. That's why it will probably be hard to move it into the main TorchVision area in this release. This means you can't depend on it from examples, because it would cause issues in the release (unless you change it on the release branch). Another approach would be to change the default weights of your non-prototype model builder (pretrained=True) to the ones from Raft_Large_Weights.C_T_SKHT_V2. The change is only a few lines of code, and since the current defaults haven't been officially released, it's not a BC-breaking change.
