
🐛 [Bug] Llama-2-7b on a 4090 GPU #2836

@gs-olive

Bug Description

Currently, Torch-TRT displays the following error when compiling Llama-2-7B in FP16 on a 4090 GPU:

[05/08/2024-20:47:56] [TRT] [E] 1: [defaultAllocator.cpp::allocate::19] Error Code 1: Cuda Runtime (out of memory)
[05/08/2024-20:47:56] [TRT] [W] Requested amount of GPU memory (13476831488 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[05/08/2024-20:47:56] [TRT] [E] 1: [graphContext.h::~MyelinGraphContext::55] Error Code 1: Myelin (No Myelin Error exists)

The model should successfully compile on a 4090 GPU, given the 24 GB of memory available on that card.
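
For reference, a minimal sketch of the compilation path being described, assuming the Dynamo frontend (ir="dynamo") and the meta-llama/Llama-2-7b-hf checkpoint; the issue does not include the original reproduction script, so names and shapes here are illustrative:

```python
import torch
import torch_tensorrt
from transformers import AutoModelForCausalLM

# Hypothetical repro sketch: load the FP16 checkpoint on the 4090
# (~13.5 GB of FP16 weights on a 24 GB card).
model = (
    AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
    )
    .eval()
    .cuda()
)

# Dummy token IDs standing in for a real prompt.
input_ids = torch.randint(0, 32000, (1, 128), dtype=torch.int64, device="cuda")

# Compile with the Dynamo frontend in FP16; this is the step where the
# "Cuda Runtime (out of memory)" error above is reported by TensorRT.
trt_model = torch_tensorrt.compile(
    model,
    ir="dynamo",
    inputs=[input_ids],
    enabled_precisions={torch.float16},
)
```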
