Feature request
It would be great to have a single server that can calculate both the dense and sparse embeddings for hybrid search.
Adding a sparse endpoint that calculates the sparse embeddings would make this possible.
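For illustration, a request against such an endpoint could look like the sketch below. The `/embed_sparse` route name and the response shape (a list of index/value pairs per input) are assumptions made for this example, not an existing TEI API:

```python
import requests

# Hypothetical sketch: the `/embed_sparse` route and the response shape
# (a list of {"index": ..., "value": ...} entries per input text) are
# assumptions for illustration only.
response = requests.post(
    "http://localhost:8080/embed_sparse",
    json={"inputs": ["What is hybrid search?", "Sparse and dense embeddings"]},
)
response.raise_for_status()

for sparse_embedding in response.json():
    # Each sparse embedding would only contain the non-zero dimensions.
    indices = [entry["index"] for entry in sparse_embedding]
    values = [entry["value"] for entry in sparse_embedding]
    print(indices[:5], values[:5])
```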
Motivation
I want to be able to use the same TEI server for my dense and sparse embedding calculations with hybrid search.
Your contribution
I am happy to help by working on a PR.
The following code from llama_index may be of use for processing the incoming vectors and returning them in a sparse format for databases like Qdrant:
```python
from typing import Callable, List, Tuple

# Callable alias mirroring llama_index's SparseEncoderCallable:
# takes a list of texts, returns (indices, values) for each text.
SparseEncoderCallable = Callable[[List[str]], Tuple[List[List[int]], List[List[float]]]]


def default_sparse_encoder(model_id: str) -> SparseEncoderCallable:
    try:
        import torch
        from transformers import AutoModelForMaskedLM, AutoTokenizer
    except ImportError:
        raise ImportError(
            "Could not import transformers library. "
            'Please install transformers with `pip install "transformers[torch]"`'
        )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForMaskedLM.from_pretrained(model_id)
    if torch.cuda.is_available():
        model = model.to("cuda")

    def compute_vectors(texts: List[str]) -> Tuple[List[List[int]], List[List[float]]]:
        """
        Computes vectors from logits and attention mask using ReLU, log, and max operations.
        """
        # TODO: compute sparse vectors in batches if max length is exceeded
        tokens = tokenizer(
            texts, truncation=True, padding=True, max_length=512, return_tensors="pt"
        )
        if torch.cuda.is_available():
            tokens = tokens.to("cuda")

        output = model(**tokens)
        logits, attention_mask = output.logits, tokens.attention_mask
        # SPLADE-style activation: log(1 + ReLU(logits)), masked by attention
        # and max-pooled over the sequence dimension.
        relu_log = torch.log(1 + torch.relu(logits))
        weighted_log = relu_log * attention_mask.unsqueeze(-1)
        tvecs, _ = torch.max(weighted_log, dim=1)

        # extract the vectors that are non-zero and their indices
        indices = []
        vecs = []
        for batch in tvecs:
            indices.append(batch.nonzero(as_tuple=True)[0].tolist())
            vecs.append(batch[indices[-1]].tolist())

        return indices, vecs

    return compute_vectors
```
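As a usage sketch, the returned index/value lists map directly onto Qdrant's sparse vector type. This assumes `qdrant-client` is installed, and the SPLADE-style model id below is only an example:

```python
from qdrant_client import models

# The model id is an example of a SPLADE-style masked-LM checkpoint,
# not a requirement of the encoder above.
encode = default_sparse_encoder("naver/splade-cocondenser-ensembledistil")
indices, values = encode(["hybrid search combines dense and sparse retrieval"])

# Qdrant stores sparse vectors as parallel index/value lists.
sparse_vector = models.SparseVector(indices=indices[0], values=values[0])
print(len(sparse_vector.indices), "non-zero dimensions")
```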