Description
Modern embedding models like Qwen3-Embedding support query-document asymmetry through instruction prompts. The API should allow passing optional context/instruction when generating embeddings:
```java
// Generate query embedding
EmbeddingRequest queryRequest = EmbeddingRequest.builder()
    .inputs(List.of("machine learning"))
    .instruction("query: ")
    .build();
EmbeddingResponse queryResponse = embeddingModel.call(queryRequest);

// Generate document embedding
EmbeddingRequest docRequest = EmbeddingRequest.builder()
    .inputs(List.of("machine learning"))
    .instruction("passage: ")
    .build();
EmbeddingResponse docResponse = embeddingModel.call(docRequest);

// These produce different vectors optimized for retrieval
```

API additions:
- Add an optional `instruction` field to `EmbeddingRequest`
- Add an `instruction` getter/setter to the `EmbeddingOptions` interface
- Default to `null` for backward compatibility
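A minimal sketch of what the proposed interface change could look like. The names follow this issue's proposal, not an existing Spring AI API; a `default` method keeps existing `EmbeddingOptions` implementations source-compatible.

```java
// Hypothetical sketch of the proposed addition (not current Spring AI code).
interface EmbeddingOptions {
    String getModel();
    Integer getDimensions();

    // Proposed: optional instruction; null preserves today's behavior
    // so existing implementations compile and behave unchanged.
    default String getInstruction() {
        return null;
    }
}
```

Because the method has a `null` default, providers that do not support instructions can simply ignore it.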
Current Behavior
Spring AI's EmbeddingModel and EmbeddingRequest only accept text content and basic options (model, dimensions). There's no way to pass instructions/context to the embedding model, even when the underlying model supports it.
This means the same text always generates the same embedding, regardless of whether it's being used as a query or document, losing significant retrieval accuracy improvements.
Context
How this affects usage:
When using models like Qwen3-Embedding or E5 that support instructions, I cannot leverage their query-document asymmetry feature, which significantly improves semantic search performance.
What I'm trying to accomplish:
Build a semantic search system where queries and documents are embedded differently to improve retrieval accuracy, as recommended by modern embedding model best practices.
Models affected:
- Qwen3-Embedding (0.6B, 7.6B)
- E5 series (e5-base, e5-large, e5-mistral-7b)
- BGE series with instructions
- Instructor embedding models
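To illustrate the asymmetry these families expect, here is a sketch of typical query-side instruction conventions. The exact strings vary by model version and should be taken from each model card; the map below is illustrative only.

```java
import java.util.Map;

// Illustrative query-side instruction conventions (sketch, not exhaustive;
// verify exact strings against each model's documentation).
final class QueryInstructions {
    static final Map<String, String> BY_FAMILY = Map.of(
        "e5", "query: ",            // documents use "passage: " instead
        "bge", "Represent this sentence for searching relevant passages: ",
        "qwen3-embedding", "query"  // passed as a named prompt, not a raw prefix
    );
}
```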
Current workaround:
Manually prepending the instruction to the text (e.g., `"query: " + text`), but this is less clean and may not behave identically to native instruction handling in the model's API.
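The workaround can be sketched as a small helper (the class and method names here are hypothetical, purely for illustration). Note that some models treat native instructions differently from a plain prefix, e.g. by excluding the instruction tokens from pooling, so this is only an approximation.

```java
import java.util.List;

// Minimal sketch of the manual workaround: prepend an instruction prefix
// to each text before handing it to the embedding model.
final class InstructionPrefixer {
    static List<String> withInstruction(String instruction, List<String> texts) {
        return texts.stream()
                .map(text -> instruction + text)
                .toList();
    }
}
```

Usage: `InstructionPrefixer.withInstruction("query: ", List.of("machine learning"))` yields `["query: machine learning"]`, which is then embedded as ordinary text.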
How other APIs handle this:
Qwen native API:
```python
model.encode("text", prompt="query: ")    # Different vectors
model.encode("text", prompt="passage: ")
```
OpenAI-compatible APIs (vLLM):
```json
{
  "input": "text",
  "extra_body": {"prompt_name": "query"}
}
```
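For comparison, a sketch of how an instruction option might serialize into the vLLM-style body above. The class and method names are illustrative, not part of Spring AI or vLLM's client libraries.

```java
// Hypothetical sketch: mapping an instruction/prompt name onto the
// OpenAI-compatible embedding request body shown above.
final class VllmEmbeddingBody {
    static String of(String input, String promptName) {
        return """
            {"input": "%s", "extra_body": {"prompt_name": "%s"}}"""
            .formatted(input, promptName);
    }
}
```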