sxy-trans-n commented on Jun 9, 2025

Add embedding models support

🚀 Overview

This PR adds embedding model support to Swama. It generates text embeddings with MLX-powered models for semantic search, similarity analysis, and other embedding-based applications.

✨ Key Features

  • EmbeddingRunner: Core embedding inference engine with batch processing
  • HTTP API: OpenAI-compatible /v1/embeddings endpoint
  • Batch Processing: Efficient handling of multiple texts with automatic padding
  • Usage Tracking: Token count reporting for monitoring

🛠️ Implementation

Core Components

  • EmbeddingRunner.swift - Actor-based embedding generation
  • EmbeddingsHandler.swift - HTTP API endpoint
  • Integration with existing ModelManager
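The PR text does not include the source of EmbeddingRunner.swift, so the following is only a minimal sketch of what an actor-based runner can look like. The names SketchEmbeddingRunner, EmbedFunction, and processedTexts are hypothetical, and the model is replaced by a plain closure.

```swift
import Foundation

/// Hypothetical stand-in for a loaded model: maps a batch of texts
/// to one vector per text. Not an MLX or Swama type.
typealias EmbedFunction = @Sendable ([String]) -> [[Float]]

/// Hypothetical runner, not the PR's EmbeddingRunner. The actor serializes
/// access to its mutable state, so concurrent callers cannot race on it.
actor SketchEmbeddingRunner {
    private let embed: EmbedFunction
    private(set) var processedTexts = 0

    init(embed: @escaping EmbedFunction) {
        self.embed = embed
    }

    /// Embeds one batch and updates the running counter.
    func run(_ texts: [String]) -> [[Float]] {
        processedTexts += texts.count
        return embed(texts)
    }
}
```

Callers outside the actor reach `run` and `processedTexts` with `await`, which is what makes the design thread-safe without explicit locks.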

API Example

POST /v1/embeddings
{
  "input": ["Hello world", "Another text"],
  "model": "mlx-community/Qwen3-Embedding-4B-4bit-DWQ"
}
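
The request above and an OpenAI-style response can be modeled with Codable types. This is a sketch under assumptions: the type names are hypothetical, the JSON keys follow the public OpenAI embeddings format rather than anything shown in this PR, only the array form of "input" is handled, and the sample body uses made-up numbers.

```swift
import Foundation

// Hypothetical type names; only the JSON keys follow the OpenAI embeddings format.
struct EmbeddingRequest: Codable {
    let input: [String]
    let model: String
}

struct EmbeddingItem: Codable {
    let object: String      // "embedding"
    let index: Int          // position of the text in `input`
    let embedding: [Float]
}

struct EmbeddingUsage: Codable {
    let promptTokens: Int
    let totalTokens: Int

    enum CodingKeys: String, CodingKey {
        case promptTokens = "prompt_tokens"
        case totalTokens = "total_tokens"
    }
}

struct EmbeddingResponse: Codable {
    let object: String      // "list"
    let data: [EmbeddingItem]
    let model: String
    let usage: EmbeddingUsage
}

// Illustrative response body with made-up numbers, not output from Swama.
let sampleJSON = """
{
  "object": "list",
  "data": [
    {"object": "embedding", "index": 0, "embedding": [0.25, -0.5]},
    {"object": "embedding", "index": 1, "embedding": [0.75, 0.125]}
  ],
  "model": "mlx-community/Qwen3-Embedding-4B-4bit-DWQ",
  "usage": {"prompt_tokens": 4, "total_tokens": 4}
}
"""

/// Decodes a response body into the sketch types above.
func decodeEmbeddingResponse(_ json: String) throws -> EmbeddingResponse {
    try JSONDecoder().decode(EmbeddingResponse.self, from: Data(json.utf8))
}
```

The `usage` object is where the token counts mentioned under Usage Tracking would appear in an OpenAI-compatible response.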

📦 Dependencies

  • mlx.embeddings: New dependency for embedding inference
  • MLX Swift: Updated to v0.25.4

✅ Quality Assurance

  • Swift build passes
  • SwiftFormat compliance
  • Unit tests added
  • Memory-efficient implementation
  • Thread-safe actor design

🔧 Technical Details

  • Resolved conflicts between the SwiftFormat acronyms rule and identifiers from external APIs
  • Comprehensive error handling and validation
  • OpenAI-compatible response format

🎯 Performance

  • Batched inference for multiple texts
  • Attention masking for variable-length sequences
  • Efficient MLXArray operations
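
To illustrate how padding and attention masking fit together for variable-length inputs, here is a sketch in plain Swift arrays rather than MLXArray. The function names and the pad ID of 0 are assumptions, and mean pooling is only one common pooling choice; the PR does not state which pooling it uses.

```swift
/// Pads token-id sequences to the longest length in the batch and builds a
/// matching attention mask (1 = real token, 0 = padding).
/// `padID` is an assumed placeholder; real tokenizers define their own pad token.
func padBatch(_ sequences: [[Int]], padID: Int = 0) -> (ids: [[Int]], mask: [[Int]]) {
    let maxLength = sequences.map(\.count).max() ?? 0
    var ids: [[Int]] = []
    var mask: [[Int]] = []
    for sequence in sequences {
        let padCount = maxLength - sequence.count
        ids.append(sequence + Array(repeating: padID, count: padCount))
        mask.append(Array(repeating: 1, count: sequence.count)
            + Array(repeating: 0, count: padCount))
    }
    return (ids, mask)
}

/// Averages per-token vectors over real tokens only, skipping padded positions.
/// Illustrative pooling step, not necessarily what the PR implements.
func maskedMean(_ hidden: [[Float]], mask: [Int]) -> [Float] {
    guard let width = hidden.first?.count else { return [] }
    var sum = [Float](repeating: 0, count: width)
    var realTokens: Float = 0
    for (vector, keep) in zip(hidden, mask) where keep == 1 {
        for i in 0..<width { sum[i] += vector[i] }
        realTokens += 1
    }
    return realTokens > 0 ? sum.map { $0 / realTokens } : sum
}
```

Without the mask, padded positions would leak into the pooled vector, so short texts in a batch would get different embeddings than when embedded alone. Summing each mask row also gives the number of real tokens, which is one way a token count could be derived.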

Breaking Changes: None - fully backward compatible

Testing: Unit and integration tests included

sxy-trans-n requested a review from syh-trans-n on June 9, 2025 13:10

syh-trans-n left a comment

👏

sxy-trans-n merged commit 7f7bacc into main on Jun 10, 2025
2 checks passed
sxy-trans-n deleted the embedding-model branch on June 10, 2025 05:58