Nomad is a voice-first collaborative travel assistant built with the Google Agent Development Kit (ADK). It demonstrates bi-directional voice streaming (low latency, interruption handling) and multi-agent orchestration using Gemini 2.5 Flash.
- Backend: Python (FastAPI) + Google ADK
  - Orchestrator: `NomadConcierge` (`gemini-live-2.5-flash-native-audio`) routes tasks.
  - Subagents: `Flight Specialist` and `Lifestyle Specialist` (both use `gemini-2.5-flash`).
- Frontend: React + Vite + WebSockets
  - Real-time audio visualization.
  - Live transcript and system activity dashboard.
cd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python app.py

Server runs on http://0.0.0.0:8000
cd frontend
npm install
npm run dev

Client runs on http://localhost:5173
Developers can customize the agent's behavior and models in backend/config.py:
- Model Selection: Change `ORCHESTRATOR_MODEL` or `SUBAGENT_MODEL` to test different Gemini versions.
- System Instructions: Modify `NOMAD_INSTRUCTION` to change the orchestrator's persona, or `FLIGHT_SPECIALIST_INSTRUCTION` / `LIFESTYLE_SPECIALIST_INSTRUCTION` to tweak subagent behavior.
- App Name: Update `APP_NAME` for session tracking.
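As a rough illustration of the settings listed above, a config module along these lines would expose all of the named constants. This is a hypothetical sketch, not the project's actual `backend/config.py`: the model identifiers come from the architecture notes, while the `APP_NAME` value and all instruction strings are placeholders invented for the example.

```python
"""Hypothetical sketch of backend/config.py; the real file may differ."""

# Model identifiers as listed in the architecture notes.
ORCHESTRATOR_MODEL = "gemini-live-2.5-flash-native-audio"
SUBAGENT_MODEL = "gemini-2.5-flash"

# Placeholder value (assumption): used to label sessions.
APP_NAME = "nomad"

# Placeholder instruction strings (assumptions, not the project's prompts).
NOMAD_INSTRUCTION = (
    "You are Nomad, a collaborative travel assistant. "
    "Delegate flight and lifestyle questions to the matching specialist."
)
FLIGHT_SPECIALIST_INSTRUCTION = "You find and compare flight options."
LIFESTYLE_SPECIALIST_INSTRUCTION = "You suggest activities, food, and lodging."
```

Keeping every tunable in one module means swapping a model or persona is a one-line change that does not touch the agent wiring.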
The application provides real-time visibility into system performance and agent reasoning:
- Turn TTFB (Time to First Byte): The time from when the user stops speaking (VAD detected) to when the first byte of the agent's audio response is received.
  - Note: This accounts for VAD silence duration to give a true "end-of-speech" to "start-of-response" measurement.
- Subagent Execution Time: The specific duration taken by a subagent tool (e.g., `consult_flight_specialist`) to process and return a result.
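The two metrics above could be computed roughly as follows. This sketch rests on one interpretation of the note about VAD silence: it assumes the VAD event fires only after a trailing silence window, so that window is subtracted from the event timestamp to recover the moment speech actually ended. The function and parameter names (`turn_ttfb_ms`, `timed`, `vad_silence_s`) are hypothetical and not taken from the project.

```python
import time
from contextlib import contextmanager


def turn_ttfb_ms(vad_event_ts: float, first_audio_ts: float, vad_silence_s: float) -> float:
    """Milliseconds from assumed end-of-speech to the first audio byte.

    Assumption: the VAD event arrives `vad_silence_s` seconds after the
    user actually stopped speaking, so that window is added back.
    """
    end_of_speech_ts = vad_event_ts - vad_silence_s
    return (first_audio_ts - end_of_speech_ts) * 1000.0


@contextmanager
def timed(metrics: dict, name: str):
    """Record the wall-clock duration of the wrapped block in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        metrics[name] = (time.perf_counter() - start) * 1000.0


# Usage: time a stand-in for a subagent tool call.
metrics = {}
with timed(metrics, "consult_flight_specialist"):
    time.sleep(0.01)  # placeholder for the real tool execution
```

Using `time.perf_counter()` rather than `time.time()` keeps the execution-time measurement immune to system clock adjustments.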
The right-hand panel in the frontend visualizes the agent's "brain":
- Active Subagent: Shows which specialist is currently working (e.g., "Flight Specialist").
- Subagent Args: Displays the parameters passed to the tool (e.g., `{"destination": "Tokyo", "date": "May 2025"}`).
- Subagent Response: Shows the raw JSON/Text output from the subagent.
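One way the backend might feed this panel is by serializing each subagent call into a small JSON message for the WebSocket. The sketch below is an assumption about the message shape, not the project's actual schema: the `SubagentActivity` class, its field names, and the `"subagent_activity"` type tag are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class SubagentActivity:
    """Hypothetical dashboard event describing one subagent call."""

    active_subagent: str
    args: Dict[str, Any]
    response: Optional[str] = None  # None while the subagent is still running

    def to_message(self) -> str:
        """Serialize the event as a JSON string ready to send to the client."""
        return json.dumps({"type": "subagent_activity", **asdict(self)})


# Usage: the event emitted when the Flight Specialist starts working.
event = SubagentActivity(
    active_subagent="Flight Specialist",
    args={"destination": "Tokyo", "date": "May 2025"},
)
message = event.to_message()
```

Sending one event when the call starts and a second with `response` filled in would let the panel show the active subagent before the result arrives.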
Low latency, handled directly by the Orchestrator.
User: "Who are you?"
Nomad: "I'm Nomad, your collaborative travel assistant! I work with a team of specialists to help you plan trips."
Higher latency, involves tool execution and multi-turn reasoning.
User: "Find me flights to Tokyo for under $1000 in May."
Nomad: [Internal Thought: User needs flights -> Call Flight Specialist]
System: [Visual: Active Subagent "Flight Specialist" appears]
Flight Specialist: [Tool executes: Checks mock DB] -> Returns "ANA, $850, May 20th"
Nomad: "I found a great option on ANA for $850 departing May 20th. Does that work for you?"
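The delegated flow above can be mimicked in plain Python to show the shape of the tool call. This is only a sketch: it does not use the ADK, the signature of `consult_flight_specialist` is assumed, and the mock database contents beyond the ANA fare from the transcript are invented for illustration. In the real app the orchestrator model, not hand-written code, decides when to call the tool and how to phrase the reply.

```python
from typing import Dict, List, Optional

# Mock flight database. Only the ANA row comes from the example above;
# the second row is made up so the price filter has something to exclude.
MOCK_FLIGHTS: List[Dict] = [
    {"airline": "ANA", "destination": "Tokyo", "price_usd": 850, "date": "May 20th"},
    {"airline": "Example Air", "destination": "Tokyo", "price_usd": 1200, "date": "May 22nd"},
]


def consult_flight_specialist(destination: str, max_price_usd: int) -> Optional[Dict]:
    """Return the cheapest mock flight under the budget, or None if none match."""
    matches = [
        flight
        for flight in MOCK_FLIGHTS
        if flight["destination"] == destination and flight["price_usd"] < max_price_usd
    ]
    return min(matches, key=lambda flight: flight["price_usd"]) if matches else None


def nomad_reply(flight: Optional[Dict]) -> str:
    """Turn the specialist's result into the orchestrator's spoken answer."""
    if flight is None:
        return "I couldn't find a flight within that budget."
    return (
        f"I found a great option on {flight['airline']} for ${flight['price_usd']} "
        f"departing {flight['date']}. Does that work for you?"
    )


reply = nomad_reply(consult_flight_specialist("Tokyo", 1000))
```

The extra round trip through the specialist is what makes this scenario slower than a query the orchestrator answers directly.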