
🐛 [Bug] Llama-2-7b on a 4090 GPU #2836

@gs-olive

Bug Description

Currently, Torch-TRT displays the following error when compiling Llama-2-7B in FP16 on a 4090 GPU:

[05/08/2024-20:47:56] [TRT] [E] 1: [defaultAllocator.cpp::allocate::19] Error Code 1: Cuda Runtime (out of memory)
[05/08/2024-20:47:56] [TRT] [W] Requested amount of GPU memory (13476831488 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[05/08/2024-20:47:56] [TRT] [E] 1: [graphContext.h::~MyelinGraphContext::55] Error Code 1: Myelin (No Myelin Error exists)

The model should successfully compile on a 4090 GPU, given the 24 GB of memory available on that card.
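
For reference, a minimal sketch of the compilation path being described, assuming the Dynamo frontend (ir="dynamo") and the meta-llama/Llama-2-7b-hf checkpoint; the issue does not include the original reproduction script, so names and shapes here are illustrative:

```python
import torch
import torch_tensorrt
from transformers import AutoModelForCausalLM

# Hypothetical repro sketch: load the FP16 checkpoint on the 4090
# (~13.5 GB of FP16 weights on a 24 GB card).
model = (
    AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
    )
    .eval()
    .cuda()
)

# Dummy token IDs standing in for a real prompt.
input_ids = torch.randint(0, 32000, (1, 128), dtype=torch.int64, device="cuda")

# Compile with the Dynamo frontend in FP16; this is the step where the
# "Cuda Runtime (out of memory)" error above is reported by TensorRT.
trt_model = torch_tensorrt.compile(
    model,
    ir="dynamo",
    inputs=[input_ids],
    enabled_precisions={torch.float16},
)
```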
