[Feature] RandomCrop for Audio #416

haideraltahan · 2020-01-16T21:36:22Z

🚀 Feature

Similar to RandomCrop in torchvision but implemented for audio.

Motivation

Often a model takes fixed-size input while the dataset contains audio of variable length. One way to handle this is to randomly crop the audio to a fixed length so that it can be fed to the model. I would have used RandomCrop from torchvision; however, it only accepts a PIL Image, not a Tensor. We need the crop applied along the time dimension only, leaving the channel dimension of the audio unchanged.

Pitch

The implementation operates on audio given as a tensor. It returns a randomly cropped segment of the requested length while keeping the same number of channels. If the audio is shorter than the requested length, it is padded along the time dimension instead.
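A minimal sketch of the behaviour described above, not the code from PR #403. The class name `RandomCropAudio`, the `length` parameter, and the choice to zero-pad on the right are assumptions made for illustration; the waveform is assumed to have time as its last dimension, e.g. shape `(channels, time)`.

```python
import torch
import torch.nn.functional as F


class RandomCropAudio(torch.nn.Module):
    """Hypothetical transform: random crop (or zero-pad) along the time axis."""

    def __init__(self, length: int):
        super().__init__()
        if length <= 0:
            raise ValueError("length must be a positive number of samples")
        self.length = length

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        num_samples = waveform.shape[-1]  # time is assumed to be the last dim

        if num_samples < self.length:
            # Too short: pad zeros on the right of the time axis only
            # (assumption: right padding; other schemes are possible).
            return F.pad(waveform, (0, self.length - num_samples))

        # Pick a start index uniformly from [0, num_samples - length].
        start = int(torch.randint(0, num_samples - self.length + 1, (1,)))
        # Slice only the last dimension, so the channel count is unchanged.
        return waveform[..., start:start + self.length]
```

For example, `RandomCropAudio(16000)(torch.randn(2, 48000))` would return a tensor of shape `(2, 16000)`, and a clip shorter than 16000 samples would come back zero-padded to that length.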

Additional context

This is my first contribution, so I was not aware that I needed to create an issue before opening a PR (#403). I apologize for that 😕

vincentqb · 2020-01-17T16:44:38Z

Thanks for bringing this up! This may be relevant to pytorch/vision#1375, which is about torchvision migrating away from PIL. Let's see if they have interest in having something like this. @fmassa

We had removed deterministic pad and trim since they already existed in pytorch, see #160. This is slightly different since it also adds randomness.

Opening an issue is usually a good idea, since this allows you to get feedback before starting to work on code that may or may not be aligned with current needs :)

vincentqb · 2020-03-05T20:07:51Z

By the way, when loading an audio file, torchaudio supports reading only a specified segment. This avoids having to read the whole audio file when only a segment is of interest.

mthrok · 2021-08-03T21:15:18Z

Closing the PR as

torchvision's RandomCrop now accepts torch.Tensor
This needs a formal specification, as much audio training data has corresponding metadata, which also needs to be cropped at the same time steps.
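To illustrate the metadata concern, here is a rough sketch of cropping a waveform and frame-level labels with one shared random offset. Everything in it is an assumption for illustration: the function name `random_crop_with_labels`, the idea that labels come one per `hop_length` samples, and the requirement that the waveform covers at least `labels.shape[-1] * hop_length` samples. It is not a proposed specification.

```python
import torch


def random_crop_with_labels(waveform, labels, num_frames, hop_length):
    """Hypothetical helper: crop audio and per-frame labels at the same position.

    waveform: (..., time) samples; labels: (..., frames), one label per
    `hop_length` samples (an assumed alignment, not a torchaudio convention).
    """
    total_frames = labels.shape[-1]
    if total_frames < num_frames:
        raise ValueError("not enough label frames to crop from")

    # Draw the offset once, in label frames, so both crops stay aligned.
    start_frame = int(torch.randint(0, total_frames - num_frames + 1, (1,)))
    start_sample = start_frame * hop_length

    cropped_wave = waveform[..., start_sample:start_sample + num_frames * hop_length]
    cropped_labels = labels[..., start_frame:start_frame + num_frames]
    return cropped_wave, cropped_labels
```

Drawing the offset in label frames rather than samples keeps the crop boundaries on frame edges; other alignments (for example, timestamps in seconds) would need a different rule, which is part of why a specification is needed.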

haideraltahan mentioned this issue Jan 16, 2020

Implementation of RandomCrop for audio #403

Closed

vincentqb assigned vincentqb and fmassa Jan 17, 2020

haideraltahan changed the title ~~RandomCrop for Audio~~ [Feature] RandomCrop for Audio Jan 17, 2020

mthrok closed this as completed Aug 3, 2021
