[Feature] RandomCrop for Audio #416

Closed
haideraltahan opened this issue Jan 16, 2020 · 3 comments

@haideraltahan

haideraltahan commented Jan 16, 2020

🚀 Feature

Similar to RandomCrop in torchvision but implemented for audio.

Motivation

We often have a model with a fixed input size, while the dataset contains audio of variable length. One way to handle this is to randomly crop the audio to a fixed length so that it can be fed to the model. I would have used RandomCrop from torchvision, but it only accepts a PIL Image, not a Tensor. For audio, the crop should apply along the time dimension only, leaving the channel dimension unchanged.

Pitch

The transform takes an audio tensor and a requested length, and returns a randomly cropped segment of that length with the same number of channels. If the audio is shorter than the requested length, it is padded along the time dimension instead.
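
A minimal sketch of the behaviour described in this pitch, not the code from the PR. It assumes a waveform shaped `(channels, time)` and zero-padding on the right; the function name `random_crop_or_pad` is made up for illustration, and where to put the padding is a design choice the pitch leaves open.

```python
import torch
import torch.nn.functional as F


def random_crop_or_pad(waveform: torch.Tensor, length: int) -> torch.Tensor:
    """Return a (channels, length) tensor from a (channels, time) waveform."""
    num_frames = waveform.size(-1)
    if num_frames > length:
        # Pick a random start so the crop stays fully inside the signal.
        start = int(torch.randint(0, num_frames - length + 1, (1,)))
        return waveform[..., start:start + length]
    # Zero-pad on the right when the clip is too short (a no-op if equal).
    return F.pad(waveform, (0, length - num_frames))


long_clip = torch.randn(2, 48000)   # longer than requested: cropped
short_clip = torch.randn(2, 10000)  # shorter than requested: padded
print(random_crop_or_pad(long_clip, 16000).shape)   # torch.Size([2, 16000])
print(random_crop_or_pad(short_clip, 16000).shape)  # torch.Size([2, 16000])
```

Only the last dimension is indexed or padded, so the channel count is preserved in both branches.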

Additional context

This is my first contribution, so I was not aware that I needed to create an issue before opening a PR (#403). I apologize for that 😕

@vincentqb
Contributor

Thanks for bringing this up! This may be relevant to pytorch/vision#1375 for torchvision to migrate away from PIL. Let's see if they have interest in having something like this. @fmassa

We had removed deterministic pad and trim since they already existed in pytorch, see #160. This is slightly different since it also adds randomness.

Opening an issue is usually a good idea, since this allows you to get feedback before starting to work on code that may or may not be aligned with current needs :)

@haideraltahan changed the title from "RandomCrop for Audio" to "[Feature] RandomCrop for Audio" on Jan 17, 2020
@vincentqb
Contributor

By the way, when loading an audio file, torchaudio supports reading only a specified segment. This avoids reading the whole file when only a segment is of interest.

@mthrok
Collaborator

mthrok commented Aug 3, 2021

Closing the PR as

  1. torchvision's RandomCrop now accepts torch.Tensor
  2. This needs a formal specification, since many audio training tasks have corresponding metadata, which also needs to be cropped at the same time steps.
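
To illustrate the second point, here is a rough sketch of cropping a waveform and its time-aligned metadata with one shared random offset. Everything in it is an assumption made for illustration, not a proposed torchaudio API: the function name `paired_random_crop`, a `(channels, time)` waveform, a 1-D tensor of frame-level labels, and a fixed `hop_length` that maps each label to a block of samples.

```python
import torch


def paired_random_crop(waveform, labels, num_label_frames, hop_length):
    """Crop a (channels, time) waveform and its (frames,) labels together.

    Assumes labels[i] describes samples [i * hop_length, (i + 1) * hop_length)
    and that the waveform has at least labels.size(0) * hop_length samples.
    """
    total = labels.size(0)
    if total < num_label_frames:
        raise ValueError("clip is shorter than the requested crop")
    # Draw the offset in label frames so both crops share the same boundaries.
    start = int(torch.randint(0, total - num_label_frames + 1, (1,)))
    stop = start + num_label_frames
    return waveform[..., start * hop_length:stop * hop_length], labels[start:stop]


waveform = torch.randn(1, 100 * 160)   # 100 label frames of 160 samples each
labels = torch.randint(0, 5, (100,))   # one hypothetical class id per frame
audio_crop, label_crop = paired_random_crop(waveform, labels, 25, 160)
print(audio_crop.shape, label_crop.shape)  # torch.Size([1, 4000]) torch.Size([25])
```

Drawing the offset on the coarser label grid keeps the audio crop on label boundaries; a formal specification would still need to cover padding and metadata that is not a simple per-frame tensor.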

@mthrok mthrok closed this as completed Aug 3, 2021