
Questions about this project #191

@TentenMarchhhh

SIE unifies embeddings, reranking, and extraction under one API with on-demand loading and LRU eviction. What were the biggest technical challenges in supporting such a diverse set of architectures (dense, sparse, multi-vector, ColBERT, vision, cross-encoders, GLiNER, etc.) in a single Rust/Python backend?
How do you handle model compatibility and quality verification? You mentioned MTEB CI checks: what is your process for adding new models, and how often do you update the 85+ model catalog?
Memory management: What is the practical limit on the number of large models that can run simultaneously on a single GPU (e.g., 24GB or 80GB)? Do you use any particular optimizations for mixed precision, quantization, or batching?
In production (Kubernetes with KEDA autoscaling), what kind of latency and throughput do you typically see for common models like BGE, Stella, or BGE-reranker on GPU vs CPU?
Compared to alternatives like vLLM, TGI, Ollama, or dedicated embedding servers (e.g., TEI), what are the biggest advantages and trade-offs of SIE?
The OpenAI-compatible /v1/embeddings endpoint is great for migration. How complete is the compatibility, and do you plan to expand it (e.g., to reranking or extraction endpoints)?

If someone wants to add custom models or new task types (e.g., more advanced extraction, OCR, or multimodal), what is the best way to extend SIE while staying within the unified API?
Roadmap: Do you plan to support fine-tuned models, LoRA adapters, continuous batching, or more advanced routing strategies?
The project mixes Rust (for core performance) and Python. How do you manage the boundary between them, and what lessons have you learned from that hybrid approach?
