⭐ If Ella's RAG-based medical triage architecture gave you ideas — a star helps other health-AI builders find it. Takes 2 seconds.
Ella is a high-performance medical orchestration engine designed to bridge the gap between patient inquiries and clinical action. Operating as a sophisticated Reasoning and Extraction (RAG) agent, Ella processes unstructured natural language to perform clinical triage, intent classification, and medical knowledge retrieval.
The core of Ella operates on a multi-stage pipeline designed to ensure clinical safety, contextual grounding, and logical reasoning. Unlike standard conversational AI, Ella utilizes a "Hard-RAG" protocol, which constrains the generative capabilities of the model to a curated set of medical handbooks.
Upon receiving patient input, the system utilizes a State-Aware Router. This component analyzes the current message alongside the conversation history using Pydantic-validated schemas. The router classifies the input into four distinct streams:
- Emergency: High-priority detection of life-threatening symptoms.
- Triage: Clinical investigation of symptoms using medical data.
- Booking: Administrative coordination for appointment scheduling.
- General Info: Routine inquiries or emotional support.
Ella’s "brain" is powered by a high-density vector database containing over 90,000 clinical records. The retrieval process is two-fold:
- Semantic Search: Utilizing vector embeddings to understand the underlying meaning of a patient's concern.
- Keyword Matching: Ensuring specific medical terminology and drug names are captured with high precision.
- Reranking: A Cross-Encoder model (BGE-Reranker) filters the top results to provide only the most relevant clinical context to the LLM, minimizing hallucinations.
The final response is generated through a constrained synthesis phase. Ella is instructed to integrate the retrieved clinical data into a professional, empathetic response. This phase ensures that every piece of medical advice is grounded in established medical texts, effectively acting as an "open-book" diagnostic aid.
The fundamental intelligence and data infrastructure of Ella are now fully operational. Key milestones achieved include:
- Core Infrastructure: Establishment of a low-latency environment capable of toggling between local (Ollama) and cloud (Groq) inference.
- Data Engineering: Implementation of an atomic data curation pipeline that cleaned and structured thousands of pages of medical text into machine-readable formats.
- Vector Intelligence: Deployment of a Qdrant-based vector architecture and a hybrid retrieval system with integrated reranking for sub-second precision.
- Clinical Logic: Development of an emergency guardrail system and diagnostic reasoning logic that allows Ella to perform multi-turn triage without losing clinical context.
The next stage of development focuses on persistent data and advanced state management. This includes the integration of a PostgreSQL/Supabase backend to manage clinic schedules and patient records, alongside the implementation of WindowBufferMemory for enhanced long-term conversation retention.
To transition from a CLI to a functional receptionist, Ella will be integrated with telephony bridges (Vapi/Twilio). This phase will optimize Speech-to-Text (STT) and Text-to-Speech (TTS) pipelines (using Whisper and ElevenLabs) to achieve a "Time to First Byte" latency of under 1.5 seconds.
The final roadmap involves outbound notification systems via WhatsApp/SMS, the creation of a React-based administrative dashboard for human-in-the-loop oversight, and rigorous security hardening to ensure architectural compliance with international medical data privacy standards.