I don't think significantly overlapping computation kernels on GPUs is realistic; kernels tend to saturate the SMs by design. Data transfer and computation kernels, on the other hand, are supposed to overlap, and the prefetcher was supposed to leverage that so it could kick off the transfer of the next batch from CPU -> GPU memory while the previous batch's computation was finishing.
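To make the overlap pattern concrete, here is a minimal CPU-only analogy using just the standard library. It is not the actual prefetcher code, which relies on CUDA streams rather than threads. The names `prefetch` and `transfer` are hypothetical; `transfer` stands in for the CPU -> GPU copy.

```python
from concurrent.futures import ThreadPoolExecutor


def prefetch(batches, transfer):
    """Yield transfer(batch) in order, staging the next batch in the background.

    `transfer` is a hypothetical stand-in for the host-to-device copy.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        it = iter(batches)
        try:
            pending = pool.submit(transfer, next(it))
        except StopIteration:
            return  # empty input: nothing to yield
        for nxt in it:
            ready = pending.result()              # wait for the staged batch
            pending = pool.submit(transfer, nxt)  # start staging the next one
            yield ready                           # caller computes while the copy runs
        yield pending.result()                    # last batch has no successor


if __name__ == "__main__":
    for batch in prefetch([1, 2, 3], lambda x: x * 10):
        print(batch)
```

The point of the sketch is the ordering: the next transfer is submitted before the current batch is handed to the caller, so the copy and the compute step can run at the same time.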

The long pauses are possibly backlogs caused by excessive IO, memory swapping, or other system events that hold up the dataloader worker processes or the main process for a moment. Having too many dataloader worker processes can make that worse.
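One way to check whether the pauses come from the data side is to time how long the training loop blocks waiting for each batch. The following is a rough sketch under assumptions: `timed_batches`, `slow_steps`, and the `threshold_s` cutoff are hypothetical names and values, and `loader` can be any iterable of batches.

```python
import time


def timed_batches(loader):
    """Yield (batch, wait_seconds), where wait_seconds is the time spent blocked on the loader."""
    it = iter(loader)
    while True:
        start = time.perf_counter()
        try:
            batch = next(it)
        except StopIteration:
            return
        yield batch, time.perf_counter() - start


def slow_steps(loader, threshold_s=0.5):
    """Return the indices of steps whose batch wait exceeded threshold_s (an arbitrary cutoff)."""
    return [i for i, (_, wait) in enumerate(timed_batches(loader)) if wait > threshold_s]
```

If the slow steps line up with the observed pauses, the loader (IO, swapping, worker contention) is the likely culprit rather than the GPU work. Rerunning with fewer worker processes and comparing the results is a cheap follow-up experiment.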

Answer selected by fsolgui
Category
Q&A
2 participants