Voice-directed surgical co-pilot for the da Vinci robotic surgery platform. A real-time, hands-free AI agent that gives surgeons instant access to patient data, CT imaging, 3D anatomy, drug safety checks, and operative documentation — all through natural speech, without ever breaking scrub.
Built with: Google ADK · Gemini Live API · Vertex AI · Cloud Run · GCS
Live deployment (voice console): https://soar-main-60284943541.europe-west1.run.app
A2A endpoint (Prompt Opinion marketplace): https://soar-a2a-60284943541.europe-west1.run.app
- Disclaimer
- The Problem Statement
- The Solution
- Prompt Opinion Marketplace (A2A)
- Architecture
- Agent System
- Gemini & ADK Features Used
- Features
- Tech Stack
- GCP Backend & Logs Demo (for Hackathon)
- Local Setup
- Assets Setup
- Cloud Deployment & Automation
- Project Structure
- Data Sources
- Key Voice Commands
This project is designed to demonstrate the capabilities of the Gemini Live API and Google Agent Development Kit (ADK). It may contain clinical inaccuracies and has not been reviewed by medical domain experts.
During robotic surgery, the operating surgeon's hands are locked on instrument controls inside a sterile field for the entire procedure. They cannot type, click, tap, or interact with any computer system. Every piece of critical information — patient labs, CT imaging, drug safety checks, phase checklists — requires them to either break scrub or call out to circulating staff. Both are slow, disruptive, and potentially dangerous at the wrong moment.
Beyond the access problem, five specific, evidence-backed failures compound surgical risk:
| # | Challenge | Metric | Source |
|---|---|---|---|
| 1 | CVS documentation gap: Critical View of Safety is rarely documented before the cystic duct is divided | Only 23.1% of laparoscopic cholecystectomies have CVS documented | Terho et al. 2021 (PMID 33975802) |
| 2 | WHO Surgical Checklist compliance — life-saving but inconsistently executed under OR pressure | Implementing the checklist reduces mortality by 47% and complications by 36% | Haynes et al. 2009, NEJM (PMID 19144931) |
| 3 | Blood loss estimation error — visual EBL is unreliable, delaying transfusion decisions | Surgeons underestimate by 52–85%; 95% of clinicians are wrong by >25% | PMC7943515 |
| 4 | Operative note delays — critical documentation written days after the procedure from memory | Mean dictation delay of 15.6 days vs. 28 minutes with voice templates | Laflamme et al. (PMC1560865) |
| 5 | Intraoperative drug errors — wrong drug, dose, or timing without real-time cross-checks | 1 in 20 anesthesia administrations has an error; 80% are preventable | Nanji et al. (PMC4681677) |
| Capability | Details |
|---|---|
| Hands-Free Access | Fully voice-activated console: labs, vitals, imaging, and anatomy on screen, all accessible in under a second |
| Real-time Decision Support | Real-time drug safety cross-check · blood loss threshold alerts at 15/25/40% · complication protocol surfaced instantly |
| Protocol Enforcement | WHO Safety Timeout & phase-specific checklists run on voice command — every item timestamped |
| Situational & Context Awareness | Live surgical video + full screen capture streamed to Gemini — SOAR sees the operative field and the external console |
| Anatomical Guidance | Phase-aware danger zone alerts · 3D model with structure isolation · CT landmark navigation |
| Auto-Documentation | Every event logged with a timestamp as it happens · operative report generated instantly at case close |
| Intelligent Routing | One Orchestrator, 9 Agents, 24 Tools — the surgeon just talks, SOAR takes care of the rest |
| Hallucination Prevention | Prompt hardening · ADK before/after callbacks for argument enforcement |
SOAR is a voice-activated surgical co-pilot that listens to the surgeon continuously throughout the procedure. The surgeon speaks naturally — "SOAR, show hemoglobin", "run the timeout", "I have bleeding", "is cefazolin safe?" — and SOAR responds in under a second with the right information on the console display and a calm, brief spoken confirmation.
The system watches the live surgical video at 1 fps, giving Gemini real-time OR context. Eight specialist agents handle different domains of surgical need: pre-op briefing, safety timeout, blood loss tracking, drug safety, anatomy guidance, complication protocols, operative documentation, and SBAR handoff — all orchestrated by a root agent that routes intelligently based on intent.
SOAR does not give clinical opinions. It surfaces data, enforces protocols, and logs events. The surgeon decides. SOAR makes sure they have what they need to decide correctly.
SOAR is listed on the Prompt Opinion agent marketplace as a text-callable A2A agent. The A2A surface wraps the same SOAR orchestrator as the voice console but uses gemini-2.5-flash (via generateContent) instead of the native-audio model, making it accessible from any A2A-compatible client without a microphone.
A2A endpoint: https://soar-a2a-60284943541.europe-west1.run.app
Agent card: https://soar-a2a-60284943541.europe-west1.run.app/.well-known/agent-card.json
All requests require an X-API-Key header. Obtain your key from the Prompt Opinion marketplace after connecting the agent to your workspace.
X-API-Key: <your-api-key>

The agent card itself (/.well-known/agent-card.json) is public; no key is required to discover the agent.
| Skill ID | Trigger phrases | What it returns |
|---|---|---|
| `preop-briefing` | "brief me on this case", "patient rundown" | Demographics, key labs, allergies, held medications, first-phase checklist (under 50 words) |
| `surgical-timeout` | "run the timeout", "WHO checklist" | WHO Surgical Safety Checklist verification against the FHIR record |
| `complication-protocol` | "I have bleeding", "air leak", "convert to open" | Step-by-step crisis-management protocol for the named complication |
| `drug-safety-check` | "can I give heparin", "is cefazolin safe" | Allergy + lab cross-check with a safety verdict and an alternative if a conflict is found |
| `ebl-tracking` | "blood loss 200 mL", "update EBL", "how much have we lost" | Cumulative EBL with threshold alerts at 15 / 25 / 40 % of estimated blood volume |
| `operative-report` | "operative report", "summarize the case" | Structured report from the session event log |
| `sbar-handoff` | "prepare handoff", "sign out" | SBAR (Situation / Background / Assessment / Recommendation) for surgical sign-out |
| `anatomy-context` | "what's at risk", "danger zone" | Phase-aware danger-zone pearls: structures at risk, landmarks, clinical cautions |
SOAR uses the A2A v1 JSON-RPC protocol. Send POST / with Content-Type: application/json.
curl -X POST https://soar-a2a-60284943541.europe-west1.run.app \
-H "Content-Type: application/json" \
-H "X-API-Key: <your-api-key>" \
-d '{
"jsonrpc": "2.0",
"id": "1",
"method": "tasks/send",
"params": {
"id": "task-001",
"message": {
"role": "user",
"parts": [{"text": "I have bleeding — what is the protocol?"}]
}
}
}'

The agent will respond using the hardcoded demo patient record (case_demo_001) when no FHIR credentials are provided.
Prompt Opinion injects FHIR credentials via the metadata field when the user has connected their FHIR server in the marketplace UI. If you are calling directly, pass the credentials the same way:
curl -X POST https://soar-a2a-60284943541.europe-west1.run.app \
-H "Content-Type: application/json" \
-H "X-API-Key: <your-api-key>" \
-d '{
"jsonrpc": "2.0",
"id": "2",
"method": "tasks/send",
"params": {
"id": "task-002",
"metadata": {
"http://localhost:5139/schemas/a2a/v1/fhir-context": {
"fhirUrl": "https://your-fhir-server.org/fhir",
"fhirToken": "Bearer eyJhbGci...",
"patientId": "patient-123"
}
},
"message": {
"role": "user",
"parts": [{"text": "Brief me on this patient"}]
}
}
}'

When valid FHIR credentials are present, SOAR fetches the live patient record (Patient, Condition, MedicationRequest, AllergyIntolerance, Observation) from your FHIR R4 server instead of using demo data.
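For callers working from Python rather than curl, the two requests above can be assembled along these lines. This is a minimal sketch using only the standard library. The helper names (`build_task_request`, `build_headers`, `send_task`) are hypothetical and not part of SOAR, and the response shape is not documented in this README, so `send_task` simply returns whatever JSON comes back.

```python
import json
import urllib.request
from typing import Optional

# Metadata key copied verbatim from the curl example above.
FHIR_CONTEXT_KEY = "http://localhost:5139/schemas/a2a/v1/fhir-context"


def build_task_request(text: str, task_id: str, request_id: str = "1",
                       fhir_context: Optional[dict] = None) -> dict:
    """Build the JSON-RPC body for a tasks/send call."""
    params = {
        "id": task_id,
        "message": {"role": "user", "parts": [{"text": text}]},
    }
    # Without FHIR credentials the agent falls back to the demo patient record.
    if fhir_context is not None:
        params["metadata"] = {FHIR_CONTEXT_KEY: fhir_context}
    return {"jsonrpc": "2.0", "id": request_id,
            "method": "tasks/send", "params": params}


def build_headers(api_key: str) -> dict:
    """Headers required by the A2A endpoint."""
    return {"Content-Type": "application/json", "X-API-Key": api_key}


def send_task(url: str, api_key: str, payload: dict,
              timeout: float = 30.0) -> dict:
    """POST the payload and return the decoded JSON response (shape assumed)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=build_headers(api_key),
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Passing `fhir_context=None` reproduces the first curl call; passing a dict with `fhirUrl`, `fhirToken`, and `patientId` reproduces the second.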
| Scope | Required |
|---|---|
| `patient/Patient.rs` | Yes |
| `patient/Condition.rs` | Yes |
| `patient/MedicationRequest.rs` | Yes |
| `patient/AllergyIntolerance.rs` | Yes |
| `patient/Observation.rs` | Yes |
| `patient/Procedure.rs` | No |
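Before sending FHIR credentials, a client may want to confirm that its token carries every required scope from the table above. The following is a small sketch, not SOAR code: `missing_scopes` is a hypothetical helper, and it assumes the granted scopes arrive as a single space-separated string.

```python
# Scopes marked "Yes" in the table above.
REQUIRED_SCOPES = frozenset({
    "patient/Patient.rs",
    "patient/Condition.rs",
    "patient/MedicationRequest.rs",
    "patient/AllergyIntolerance.rs",
    "patient/Observation.rs",
})

# Scopes marked "No" in the table above.
OPTIONAL_SCOPES = frozenset({"patient/Procedure.rs"})


def missing_scopes(granted: str) -> list[str]:
    """Return the required scopes absent from a space-separated scope string."""
    granted_set = set(granted.split())
    return sorted(REQUIRED_SCOPES - granted_set)
```

The comparison is an exact string match, so a wildcard grant or a broader permission suffix would be reported as missing. That keeps the sketch simple but makes it stricter than a real authorization check.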
# Pre-op
"Brief me on this patient before we start"
"Run the WHO surgical timeout"
# Intraoperative
"I have vascular bleeding — what do I do?"
"Can I give 5000 units of heparin?"
"Blood loss 350 mL — update the EBL"
"What structures are at risk during fissure development?"
# Post-op / handoff
"Generate the operative report"
"Prepare an SBAR handoff for the ICU"
- Navigate to the Prompt Opinion marketplace and find SOAR Surgical Co-Pilot.
- Click Connect — the platform fetches the agent card and displays the required FHIR scopes.
- (Optional) Link your FHIR server: enter your FHIR base URL, paste a SMART-on-FHIR bearer token, and enter the patient ID. Prompt Opinion stores these credentials in your workspace and injects them into every request automatically.
- Type any of the example prompts above in the Prompt Opinion chat interface. SOAR responds with a plain-text answer suitable for verbal delivery.
| | Voice console | Prompt Opinion / A2A |
|---|---|---|
| Input | Live 16 kHz PCM audio | Text (JSON-RPC) |
| Output | Native TTS audio + rendered surgical UI panels | Plain text |
| FHIR | Demo data only | Live FHIR R4 server |
| Model | `gemini-live-2.5-flash-native-audio` | `gemini-2.5-flash` |
| Use case | Intraoperative hands-free | Pre-op prep, remote query, integrations |
Browser (Surgical Console)
│ 16 kHz PCM audio + 1 fps JPEG video frames
│ ◄── 24 kHz PCM audio + JSON render_commands
▼
FastAPI WebSocket Server (Cloud Run)
│ LiveRequestQueue → ADK Runner → Vertex AI Live API
│ │
│ Gemini 2.5 Flash Native Audio Dialog
│ ASR · LLM reasoning · Native TTS · Function calling
▼
SOAR_Orchestrator (root_agent)
├── 18 direct tools (IR · IV · AR · PC · DOC)
└── 8 specialist sub-agents
Briefing_Agent · Timeout_Agent · Report_Agent · Complication_Advisor
EBL_Tracker · Drug_Checker · Anatomy_Spotter · Handoff_Agent
The root orchestrator (SOAR_Orchestrator) handles all voice input, applies wake-word filtering, and either calls direct tools or routes to a specialist sub-agent via transfer_to_agent(). It owns 18 direct tools for single-action and parallel multi-action commands.
| Agent | Trigger phrases | Tools |
|---|---|---|
| Briefing_Agent | "brief me", "patient rundown", "case summary" | display_all_patient_data, get_surgical_phase |
| Timeout_Agent | "run the timeout", "WHO checklist", "safety check" | hide_all_overlays, display_all_patient_data, get_surgical_phase, log_event, show_agent_summary |
| Report_Agent | "operative report", "summarize the case", "what did we do" | hide_all_overlays, show_event_log, display_all_patient_data, show_agent_summary |
| Complication_Advisor | "I have bleeding", "air leak", "nerve injury", "we need to convert" | get_complication_protocol, get_surgical_phase, toggle_structure, log_event, capture_surgical_photo, show_agent_summary |
| EBL_Tracker | "blood loss 200 mL", "update EBL", "how much have we lost" | update_ebl, get_ebl_summary, display_patient_data |
| Drug_Checker | "can I give heparin", "is cefazolin safe", "check ketorolac" | check_drug_safety, display_patient_data |
| Anatomy_Spotter | "what's at risk", "danger zone", "anatomy check" | get_anatomy_context, get_surgical_phase, toggle_structure, jump_to_landmark, navigate_ct, rotate_model |
| Handoff_Agent | "prepare handoff", "sign out", "I'm scrubbing out" | show_event_log, display_all_patient_data, get_surgical_phase, log_event, show_agent_summary |
| Category | Tools |
|---|---|
| Patient Data (IR) | display_patient_data, display_all_patient_data, hide_patient_data |
| CT Imaging (IV) | navigate_ct, jump_to_landmark, hide_ct |
| 3D Model (AR) | rotate_model, toggle_structure, hide_3d, reset_3d_view, show_only_ar |
| Phase Checklist (PC) | get_surgical_phase, hide_surgical_checklist |
| Documentation (DOC) | log_event, capture_surgical_photo, show_event_log, hide_event_log |
| Specialist | update_ebl, get_ebl_summary, check_drug_safety, get_anatomy_context, get_complication_protocol, show_agent_summary |
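The cumulative blood-loss tracking behind `update_ebl` and `get_ebl_summary`, with alerts at 15 / 25 / 40 % of estimated blood volume, could look roughly like the sketch below. The class name, the return fields, and the way estimated blood volume is derived are all assumptions: the README does not say how SOAR computes that volume, so a flat 70 mL/kg placeholder is used purely for illustration.

```python
from dataclasses import dataclass

# Alert tiers stated in this README: 15 / 25 / 40 % of estimated blood volume.
THRESHOLDS_PCT = (15, 25, 40)

# Assumption: estimated blood volume = weight x 70 mL/kg (placeholder only).
ML_PER_KG = 70


@dataclass
class EblTracker:
    """Hypothetical running total of estimated blood loss for one case."""

    weight_kg: float
    total_ml: float = 0.0

    @property
    def estimated_blood_volume_ml(self) -> float:
        return self.weight_kg * ML_PER_KG

    def update(self, lost_ml: float) -> dict:
        """Add a reported loss and return the cumulative summary."""
        if lost_ml < 0:
            raise ValueError("blood loss must be non-negative")
        self.total_ml += lost_ml
        pct = 100 * self.total_ml / self.estimated_blood_volume_ml
        crossed = [t for t in THRESHOLDS_PCT if pct >= t]
        return {
            "total_ml": self.total_ml,
            "percent_of_ebv": round(pct, 1),
            # Highest tier reached so far, or None below the first tier.
            "alert_threshold": crossed[-1] if crossed else None,
        }
```

For an 80 kg patient under this placeholder formula, reporting 200 mL stays below every tier, while a running total of 900 mL crosses the 15 % tier.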
- Barge-in: `StreamingMode.BIDI` lets the surgeon interrupt SOAR mid-response at any time
- Native audio dialog: `response_modalities=['AUDIO']`, full speech-in / speech-out with no separate TTS step
- Custom voice persona: `PrebuiltVoiceConfig(voice_name='Charon')`, so SOAR has a consistent OR voice
- Live video streaming: surgical video frames sent at 1 fps as `image/jpeg` via `send_realtime()`, giving Gemini real-time OR context
- Simultaneous multimodal input: 16 kHz PCM audio + JPEG video on the same stream
- Input + output audio transcription: `AudioTranscriptionConfig()` on both sides; displayed live in the console
- Function calling: 22 tools declared via docstrings; the model selects and calls them autonomously
- Multi-agent hierarchy: root orchestrator + 8 specialist sub-agents using `LlmAgent` + `sub_agents`
- `transfer_to_agent()`: LLM-driven dynamic routing at runtime
- Parallel tool dispatch: multi-action commands ("show hemoglobin and open the CT") call multiple tools in one response turn
- Before-tool grounding callbacks: argument whitelists validated before every tool call
- After-tool schema validation: every tool response checked for a valid `render_command` schema
- `InMemorySessionService`: session reconnection support via `get_session()` before `create_session()`
- `LiveRequestQueue`: per-connection audio buffer prevents race conditions
- `aclosing()` generator cleanup: guaranteed cleanup of the `run_live()` async generator on cancellation
- `FIRST_EXCEPTION` task pair: allows multi-turn conversations without reconnecting
- Zombie session prevention: `live_request_queue.close()` always called in a `finally` block
- Argument whitelisting: field names, landmark names, phase names, structure names, and event types are all validated against hardcoded whitelists before any tool executes
- Hallucination prevention: the root agent is instructed never to state patient data from memory; it always calls the tool
- Error recovery: `ValueError` / `KeyError` / `TypeError` caught mid-stream; the session continues and the surgeon is notified
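Argument whitelisting in a before-tool callback can be sketched as follows. The function mirrors the general shape of an ADK `before_tool_callback` (tool, args, tool context), and it assumes the convention that returning `None` lets the call proceed while returning a dict replaces the tool result. The argument name `structure` and the rejection payload are hypothetical; the structure names are the mesh names this README requires in the GLB model.

```python
from types import SimpleNamespace
from typing import Any, Optional

# Hypothetical whitelist. The tool name comes from this README; the argument
# name "structure" is an assumption made for illustration.
ALLOWED_ARGS = {
    "toggle_structure": {
        "structure": {
            "lung_right", "lung_left", "bronchus", "tumor",
            "parenchyma", "vessels", "ribs", "pleura",
        },
    },
}


def before_tool_callback(tool: Any, args: dict, tool_context: Any) -> Optional[dict]:
    """Reject calls whose whitelisted arguments carry unknown values."""
    rules = ALLOWED_ARGS.get(tool.name, {})
    for arg_name, allowed in rules.items():
        value = args.get(arg_name)
        if value is None:
            continue
        # Non-string values can never match a whitelist of names.
        if not isinstance(value, str) or value not in allowed:
            return {
                "status": "rejected",
                "reason": f"{arg_name}={value!r} is not allowed for {tool.name}",
            }
    return None  # no objection: let the tool run


# Usage with a stand-in tool object exposing only a .name attribute.
demo_tool = SimpleNamespace(name="toggle_structure")
verdict = before_tool_callback(demo_tool, {"structure": "liver"}, None)
```

In this sketch, tools with no entry in `ALLOWED_ARGS` pass through unchecked. A stricter variant could reject any tool that lacks a rule set.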
| ✅ Feature List | ✅ Feature List |
|---|---|
| Live Agents audio interaction | Barge-in handled naturally |
| Context-aware Native audio dialog | UI Navigation: Visual UI Understanding & Interaction |
| Custom voice persona | Grounding: prompt hardening & before/after tool callback |
| Live video streaming & Screen Share (1fps send_realtime) | Error handling caught mid-stream |
| Multimodal: simultaneous input | Automated deployment |
| Transcription: Input and output audio | ADK Multi-agent & multi-tool orchestration |
| Layer | Technology |
|---|---|
| AI model | Gemini 2.5 Flash Preview Native Audio Dialog (Vertex AI) |
| Agent framework | Google ADK 1.26.0 |
| Backend | FastAPI + Uvicorn (Python 3.11) |
| Transport | WebSocket (bidirectional, binary + JSON) |
| Frontend | Vanilla HTML/CSS/JS — no framework |
| 3D rendering | Three.js r128 |
| Deployment | Google Cloud Run |
| CI/CD | Google Cloud Build (auto-deploy on push to main) |
| Image registry | Google Artifact Registry |
| Assets | Google Cloud Storage |
- Python 3.11+
- `gcloud` CLI authenticated (`gcloud auth application-default login`)
- A Google Cloud project with Vertex AI API enabled
- A GCS bucket with CT slices, 3D model, and surgical videos (see Assets Setup)
git clone https://github.com/adityashukla8/soar.git
cd soar

python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -e .

Or with conda:

conda create -n soar python=3.11
conda activate soar
pip install -e .

cp app/.env.template app/.env

Edit app/.env:
GOOGLE_GENAI_USE_VERTEXAI=1
GOOGLE_CLOUD_PROJECT=your-gcp-project-id
GOOGLE_CLOUD_LOCATION=us-central1
# Verify current model ID at:
# https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models
DEMO_AGENT_MODEL=gemini-live-2.5-flash-native-audio
GCS_BUCKET=your-gcs-bucket-name
PATIENT_ID=case_demo_001

Never commit `app/.env` to source control.
gcloud auth application-default login
gcloud config set project your-gcp-project-id

cd app
uvicorn main:app --reload --port 8080

Critical: Run `uvicorn` from inside the `app/` directory. Running from the project root causes `ModuleNotFoundError` for `soar_orchestrator`.
Navigate to http://localhost:8080/console in your browser.
- Click Connect to start the WebSocket session
- Allow microphone access when prompted
- Say "SOAR, brief me on this case" to start
The landing page is at http://localhost:8080.
SOAR requires three types of assets uploaded to a GCS bucket.
gs://your-bucket/
├── ct/case_demo_001/
│ ├── 001.png
│ ├── 002.png
│ └── ... (133 slices total)
├── models/
│ └── lung_model.glb
└── video/
├── surgical_video.mp4 # Phases: port_placement, inspection
├── mmc11.mp4 # Phases: fissure_development, vascular_dissection, bronchial_dissection
└── mmc12.mp4 # Phases: specimen_extraction, lymph_node_dissection, closure
- Download case LIDC-IDRI-0001 from The Cancer Imaging Archive
- Convert DICOM to PNG:
pip install pydicom Pillow numpy
python assets/convert_ct.py assets/dicom_raw/ assets/ct_slices/
- Upload to GCS:
gsutil -m cp assets/ct_slices/*.png gs://your-bucket/ct/case_demo_001/
Source a lung GLB model (e.g. NIH 3D Print Exchange or Sketchfab).
The model must contain meshes named: lung_right, lung_left, bronchus, tumor, parenchyma, vessels, ribs, pleura.
gsutil cp lung_model.glb gs://your-bucket/models/lung_model.glb

Source VATS lobectomy procedure videos (Pexels / Pixabay / open-access surgical archives). Rename to match the filenames above and upload:

gsutil cp surgical_video.mp4 mmc11.mp4 mmc12.mp4 gs://your-bucket/video/

gsutil cors set cors.json gs://your-bucket

The cors.json is included in the repository root.
# Build and push Docker image
docker build -t us-central1-docker.pkg.dev/YOUR_PROJECT/soar-repo/soar:latest .
docker push us-central1-docker.pkg.dev/YOUR_PROJECT/soar-repo/soar:latest
# Deploy to Cloud Run
gcloud run deploy soar \
--image=us-central1-docker.pkg.dev/YOUR_PROJECT/soar-repo/soar:latest \
--region=us-central1 \
--platform=managed \
--allow-unauthenticated \
--port=8080 \
--memory=2Gi \
--cpu=2 \
--timeout=3600 \
--set-env-vars="GOOGLE_GENAI_USE_VERTEXAI=1,GOOGLE_CLOUD_PROJECT=YOUR_PROJECT,GOOGLE_CLOUD_LOCATION=us-central1"
# Set secret env vars separately (not in cloudbuild.yaml)
gcloud run services update soar \
--region=us-central1 \
--update-env-vars="DEMO_AGENT_MODEL=gemini-live-2.5-flash-native-audio,GCS_BUCKET=your-bucket,PATIENT_ID=case_demo_001"

Every push to main automatically builds, pushes, and deploys via cloudbuild.yaml.
To set up the trigger:
# Connect your GitHub repository in Cloud Build console, then:
gcloud builds triggers create github \
--repo-name=soar \
--repo-owner=adityashukla8 \
--branch-pattern='^main$' \
--build-config=cloudbuild.yaml \
--region=us-central1

The pipeline runs three steps: Docker build → push to Artifact Registry → gcloud run deploy with the commit SHA tag.

`DEMO_AGENT_MODEL`, `GCS_BUCKET`, and `PATIENT_ID` are set directly on the Cloud Run service and intentionally omitted from `cloudbuild.yaml`, so they survive redeployments without being overwritten or exposed in source control.
soar/
├── app/
│ ├── main.py # FastAPI server + WebSocket endpoint
│ ├── .env.template # Environment variable template
│ └── soar_orchestrator/
│ ├── __init__.py # Exports root_agent (ADK requirement)
│ ├── agent.py # 9 LlmAgent definitions + grounding callbacks
│ └── tools.py # 22 tool functions + patient/drug/protocol data
│ └── static/
│ ├── index.html # Surgical console UI (4-panel layout)
│ ├── landing.html # Public landing page
│ └── js/
│ ├── app.js # WebSocket client + event dispatcher
│ ├── ct-viewer.js # CT PNG slice renderer (canvas)
│ ├── anatomy-3d.js # Three.js GLB model renderer
│ ├── clinical-panel.js # Clinical data card overlay
│ ├── checklist-panel.js # Surgical phase checklist tile
│ └── log-panel.js # Intraoperative event log tile
├── assets/
│ ├── convert_ct.py # DICOM → PNG converter
│ └── dicom_raw/ # Raw DICOM files (LIDC-IDRI-0001)
├── Dockerfile # Container definition (WORKDIR: /app/app)
├── cloudbuild.yaml # Cloud Build CI/CD pipeline
├── cors.json # GCS CORS configuration
└── pyproject.toml # Python package manifest
| Asset | Source | License |
|---|---|---|
| CT imaging | LIDC-IDRI-0001, The Cancer Imaging Archive | CC BY 3.0 |
| 3D anatomy model | NIH 3D Print Exchange / Sketchfab | Per model license |
| Surgical videos | Open-access VATS lobectomy recordings (mmc6, mmc11, mmc12) | Per source license |
| Patient record | Synthetic FHIR-compliant demo data — no real clinical information | N/A |
| Drug database | Hardcoded pharmacology rules — 10 common intraoperative drugs | N/A |
# Pre-op
"SOAR, brief me on this case"
"SOAR, run the timeout"
# Patient data
"SOAR, show hemoglobin"
"SOAR, display all patient data"
# CT imaging
"SOAR, jump to the tumor"
"SOAR, next 5 slices"
# 3D model
"SOAR, show the tumor"
"SOAR, rotate the model left"
# Blood loss
"SOAR, blood loss 200 millilitres"
"SOAR, what is the total EBL"
# Drug safety
"SOAR, can I give cefazolin"
"SOAR, is morphine safe"
# Complications
"SOAR, I have bleeding"
"SOAR, we need to convert to open"
# Documentation
"SOAR, log CVS confirmed"
"SOAR, generate the operative report"
# Handoff
"SOAR, prepare handoff"