Bug Description
Currently, Torch-TRT displays the following error when compiling Llama-2-7B in FP16 on a 4090 GPU:
```
[05/08/2024-20:47:56] [TRT] [E] 1: [defaultAllocator.cpp::allocate::19] Error Code 1: Cuda Runtime (out of memory)
[05/08/2024-20:47:56] [TRT] [W] Requested amount of GPU memory (13476831488 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[05/08/2024-20:47:56] [TRT] [E] 1: [graphContext.h::~MyelinGraphContext::55] Error Code 1: Myelin (No Myelin Error exists)
```
The model should compile successfully on a 4090, since the requested allocation (~12.6 GiB) is well within the card's 24 GB of VRAM.
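As a back-of-envelope check of that claim, the sketch below compares the failed allocation from the log against the FP16 weight footprint of a 7B-parameter model and a 4090's nominal VRAM. The 7B parameter count and 24 GiB capacity are assumptions used for illustration, not values taken from the log.

```python
# Back-of-envelope memory check (assumed figures, not from the log):
PARAMS = 7_000_000_000           # assumed ~7B parameters for Llama-2-7B
BYTES_PER_PARAM_FP16 = 2         # half precision
weights_bytes = PARAMS * BYTES_PER_PARAM_FP16  # ~13.0 GiB of weights

REQUESTED = 13_476_831_488       # allocation TensorRT failed to make (from the log)
GPU_4090_BYTES = 24 * 1024**3    # assumed nominal 24 GiB of VRAM on a 4090

print(f"weights   ≈ {weights_bytes / 1024**3:.2f} GiB")   # ≈ 13.04 GiB
print(f"requested ≈ {REQUESTED / 1024**3:.2f} GiB")        # ≈ 12.55 GiB
print(REQUESTED < GPU_4090_BYTES)                          # True: it should fit
```

Even allowing for activations and TensorRT workspace on top of the weights, the single 12.55 GiB allocation alone is far below 24 GiB, which is why the OOM is unexpected.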