[Doc] Explain the effect of length in Wav2Vec2Model #1889

Closed
hihunjin opened this issue Oct 16, 2021 · 4 comments · Fixed by #1890
Comments

@hihunjin
🚀 The feature

A more specific explanation in the docs: I need a more detailed explanation of the length argument of Wav2Vec2Model. Is it a sample rate?

Motivation, pitch

It's unclear how this argument behaves.

Alternatives

No response

Additional context

No response

@mthrok
Collaborator

mthrok commented Oct 16, 2021

Hi @hihunjin

When batching multiple audios with different durations, the resulting Tensor is padded for the shorter audios.
The length parameter indicates the valid (unpadded) length of each sample in the batch.

Say I create a batch from a 1 second audio and a 0.8 second audio, both single channel and sampled at 16k Hz. The resulting batch Tensor will have shape [2, 16000].
The second audio in the batch actually has 12800 ( == 16000 * 0.8) valid samples, and the remaining 3200 samples are just padding. In this case the input length Tensor should look like torch.tensor([16000, 12800]).

By providing the length Tensor, Wav2Vec2Model will compute the appropriate mask when the input goes through the transformer layers, so that artifacts from the padded portion do not affect the computation.

The length parameter also provides the same sort of information for the output. Since Wav2Vec2Model changes the number of frames, the valid output lengths are not obvious from the shape of the output Tensor alone. When length is provided, the model computes the valid output lengths and returns them.
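The batching described above can be sketched in plain torch (a minimal illustration; Wav2Vec2Model itself is not invoked here, and the random data merely stands in for real waveforms):

```python
import torch

SAMPLE_RATE = 16000

# Two mono audios: 1.0 s and 0.8 s.
audio_a = torch.randn(SAMPLE_RATE)             # 16000 valid samples
audio_b = torch.randn(int(SAMPLE_RATE * 0.8))  # 12800 valid samples

# Pad the shorter audio with zeros so both fit in one batch Tensor.
batch = torch.nn.utils.rnn.pad_sequence([audio_a, audio_b], batch_first=True)
lengths = torch.tensor([audio_a.numel(), audio_b.numel()])

print(batch.shape)  # torch.Size([2, 16000])
print(lengths)      # tensor([16000, 12800])
```

This batch and lengths pair is exactly the input shape the length parameter describes: the batch is rectangular, and lengths records how much of each row is valid.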

@mthrok
Collaborator

mthrok commented Oct 16, 2021

Length computation in convolution layer

if length is not None:
    length = torch.div(length - self.kernel_size, self.stride, rounding_mode='floor') + 1
    # When input length is 0, the resulting length can be negative. So fix it here.
    length = torch.max(torch.zeros_like(length), length)
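Applying that formula once per convolution layer yields the valid output lengths. A small sketch using the example lengths from the comment above (the kernel sizes and strides below are assumed values typical of a wav2vec2 feature extractor, not read from the model):

```python
import torch

def conv_out_length(length, kernel_size, stride):
    # Same formula as the snippet above: floor((length - kernel) / stride) + 1,
    # clamped at zero so an empty input does not produce a negative length.
    length = torch.div(length - kernel_size, stride, rounding_mode='floor') + 1
    return torch.max(torch.zeros_like(length), length)

# (kernel_size, stride) per conv layer -- illustrative assumption.
layers = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]

lengths = torch.tensor([16000, 12800])
for kernel_size, stride in layers:
    lengths = conv_out_length(lengths, kernel_size, stride)

print(lengths)  # valid number of frames per batch sample
```

With these assumed layer configs, the 16000-sample audio maps to 49 valid frames and the 12800-sample one to 39, which is the per-sample information the transformer mask needs.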

Mask computation in Transformer layer

if lengths is not None:
    batch_size, max_len, _ = x.shape
    # create mask for padded elements and zero-out them
    mask = torch.arange(max_len, device=lengths.device).expand(batch_size, max_len) >= lengths[:, None]
    x[mask] = 0.0
    # extend the mask to attention shape and set weight
    mask = -10000.0 * mask[:, None, None, :].to(dtype=features.dtype)
    mask = mask.expand(batch_size, 1, max_len, max_len)
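A toy run of the same mask construction, with hypothetical small sizes so the result is readable (x here plays the role of the features Tensor in the snippet above):

```python
import torch

batch_size, max_len, feat_dim = 2, 5, 4
x = torch.randn(batch_size, max_len, feat_dim)
lengths = torch.tensor([5, 3])  # second sample has 2 padded frames

# Boolean mask: True where a frame position is padding.
mask = torch.arange(max_len, device=lengths.device).expand(batch_size, max_len) >= lengths[:, None]
x[mask] = 0.0  # zero out the padded frames

# Additive attention mask: 0 for valid key positions, -10000 for padded ones,
# broadcast to [batch, 1, query, key] so it can be added to attention scores.
attn_mask = -10000.0 * mask[:, None, None, :].to(dtype=x.dtype)
attn_mask = attn_mask.expand(batch_size, 1, max_len, max_len)

print(mask)
# tensor([[False, False, False, False, False],
#         [False, False, False,  True,  True]])
```

Adding -10000 to the scores of padded keys drives their softmax weights to effectively zero, which is how the padded portion is kept from influencing attention.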

@hihunjin
Author

Thanks a lot. I appreciate it.

@mthrok
Collaborator

mthrok commented Oct 16, 2021

Glad to help.
I will keep this open until we update the doc to include the above information. Thanks for your feedback.

@mthrok mthrok reopened this Oct 16, 2021
@mthrok mthrok changed the title from "What is length in Wav2Vec2Model?" to "[Doc] Explain the effect of length in Wav2Vec2Model" Oct 16, 2021
@mthrok mthrok removed the question label Oct 16, 2021