Description
System Info
Model - Alibaba-NLP/gte-multilingual-base
Image - text-embeddings-inference:turing-1.5
Azure VM - Standard_NC4as_T4_v3
GPU - Nvidia Tesla T4
AKS version - 1.28.14
OS - Ubuntu 22.04
Command -
command: ["text-embeddings-router"]
args:
  [
    "--model-id", "Alibaba-NLP/gte-multilingual-base",
    "--port", "8080",
    "--max-client-batch-size", "2000",
    "--payload-limit", "200000000",
    "--max-batch-tokens", "260000",
    "--revision", "refs/pr/7",
    "--auto-truncate"
  ]
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
When executing the following request the first time:
POST /v1/embeddings
{
  "input": "test",
  "model": "Alibaba-NLP/gte-multilingual-base"
}
The response is the following:
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [
        -0.055719655,
        0.06356562,
        -0.030253513,
        ...
      ],
      "index": 0
    }
  ],
  "model": "Alibaba-NLP/gte-multilingual-base",
  "usage": {
    "prompt_tokens": 3,
    "total_tokens": 3
  }
}
However, when repeating the same request a second time, I get:
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [
        null,
        null,
        null,
        ...
      ],
      "index": 0
    }
  ],
  "model": "Alibaba-NLP/gte-multilingual-base",
  "usage": {
    "prompt_tokens": 3,
    "total_tokens": 3
  }
}
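The difference between the two responses can be checked mechanically. A minimal sketch, assuming the OpenAI-style response shape shown above (the helper name and the truncated sample payloads are mine, not from the server):

```python
import json

def count_null_components(response: dict) -> int:
    """Count null entries across all embedding vectors in an
    OpenAI-style /v1/embeddings response body."""
    return sum(
        1
        for item in response.get("data", [])
        for component in item["embedding"]
        if component is None  # JSON null parses to Python None
    )

# Truncated versions of the two responses from the issue.
first = json.loads(
    '{"object": "list", "data": [{"object": "embedding",'
    ' "embedding": [-0.055719655, 0.06356562, -0.030253513], "index": 0}]}'
)
second = json.loads(
    '{"object": "list", "data": [{"object": "embedding",'
    ' "embedding": [null, null, null], "index": 0}]}'
)

print(count_null_components(first))   # 0 nulls on the first request
print(count_null_components(second))  # 3 nulls on the repeated request
```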
I tried setting USE_FLASH_ATTENTION=False; however, it seems that this environment variable is ignored for GTE models. I understand that Turing support is marked as experimental, but is there any way to run this on a T4, with or without Flash Attention v1?
Expected behavior
The embedding vector should be returned on every request, rather than nulls.