🚀 Feature
Similar to RandomCrop in torchvision but implemented for audio.
Motivation
Models often expect a fixed input length, but audio datasets usually contain clips of varying length. One way to handle this is to randomly crop each clip to a fixed length so that it can be fed to the model. I would have used RandomCrop from torchvision, but it only accepts a PIL Image rather than a Tensor. For audio, the crop must be applied along the time dimension only, leaving the channel dimension unchanged.
Pitch
The transform takes an audio tensor and a requested length, and returns a randomly cropped segment of that length with the same number of channels. If the audio is shorter than the requested length, it is padded along the time dimension instead.
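A minimal sketch of the proposed behavior, assuming a waveform laid out as (channels, time). The class name `RandomCrop1d` and its `pad_value` parameter are hypothetical illustrations, not an existing torchaudio API, and right-padding short clips with zeros is an assumed choice (the proposal only says "pad across time").

```python
import torch
import torch.nn.functional as F


class RandomCrop1d(torch.nn.Module):
    """Hypothetical transform: randomly crop (or pad) audio along time.

    Expects a waveform of shape (channels, time); the channel
    dimension is never changed.
    """

    def __init__(self, length: int, pad_value: float = 0.0) -> None:
        super().__init__()
        if length <= 0:
            raise ValueError("length must be positive")
        self.length = length
        self.pad_value = pad_value

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        num_frames = waveform.size(-1)
        if num_frames < self.length:
            # Too short: right-pad the last (time) dimension only.
            return F.pad(waveform, (0, self.length - num_frames), value=self.pad_value)
        # Pick a random start so that start + length <= num_frames
        # (torch.randint's upper bound is exclusive).
        start = int(torch.randint(0, num_frames - self.length + 1, (1,)).item())
        return waveform[..., start:start + self.length]
```

Drawing the start index with `torch.randint` keeps the transform reproducible under `torch.manual_seed`, and slicing with `...` means any leading dimensions (channels, or a batch dimension) pass through untouched.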
Additional context
This is my first contribution, so I was not aware that I needed to open an issue before submitting a PR (#403). I apologize for that 😕
Thanks for bringing this up! This may be relevant to pytorch/vision#1375, which concerns migrating torchvision away from PIL. Let's see whether they are interested in having something like this. @fmassa
We previously removed deterministic pad and trim transforms because equivalent operations already exist in PyTorch (see #160). This proposal is slightly different because it also adds randomness.
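To illustrate the point that deterministic pad and trim need no dedicated transform, here is a minimal sketch using only slicing and `torch.nn.functional.pad`, assuming a (channels, time) layout. `pad_or_trim` is a hypothetical helper name, not a PyTorch or torchaudio API.

```python
import torch
import torch.nn.functional as F


def pad_or_trim(waveform: torch.Tensor, length: int) -> torch.Tensor:
    """Deterministically fit the last (time) dimension to `length`.

    Longer inputs are trimmed from the end; shorter inputs are
    right-padded with zeros. Leading dimensions are left untouched.
    """
    num_frames = waveform.size(-1)
    if num_frames >= length:
        # Trim: plain slicing on the time axis.
        return waveform[..., :length]
    # Pad: the tuple (left, right) applies to the last dimension only.
    return F.pad(waveform, (0, length - num_frames))
```

The only difference from the proposed transform is where the crop starts: here it is always at frame 0, whereas the feature request draws the start position at random.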
Opening an issue is usually a good idea, since this allows you to get feedback before starting to work on code that may or may not be aligned with current needs :)
By the way, when loading an audio file, torchaudio supports reading only a given segment. This avoids reading the whole file when only a segment is of interest.