Support Alibaba-NLP/gte-large-en-v1.5 on CPU/MPS #375

@tmostak

Description

Feature request

We'd like to run the Alibaba-NLP/gte-large-en-v1.5 model on a CPU text-embeddings-router server, but are hitting:

Caused by:
Could not start backend: GTE is only supported on Cuda devices in fp16 with flash attention enabled

Is there any way to implement/allow this model to run on CPU?
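For reference, the failure above can be reproduced by launching the router on a CPU-only host; the exact flags below (`--model-id`, `--port`) are assumptions based on the standard text-embeddings-router CLI, not taken from our deployment:

```shell
# Launch text-embeddings-router on a CPU-only host (no CUDA device present).
# The model id is the one from this issue; the port is an example value.
text-embeddings-router \
  --model-id Alibaba-NLP/gte-large-en-v1.5 \
  --port 8080
# Startup fails with:
#   Could not start backend: GTE is only supported on Cuda devices
#   in fp16 with flash attention enabled
```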

Motivation

For some of our clients we need to support a CPU-only embedding server, and we'd like to use the Alibaba-NLP/gte-large-en-v1.5 model to take advantage of its long 8192-token context length.

Your contribution

We'd be happy to test and run performance benchmarks if needed.
