
This project develops an autonomous hexapod robot using auditory scene analysis for navigation. It integrates sound source localization (DOA) and beamforming via ODAS with a circular microphone array for precise spatial detection. A machine learning-based Keyword Spotting (KWS) module enables voice command recognition for human-robot interaction.


Gl0dny/hexapod


Thesis: "Hexapod autonomous control system based on auditory scene analysis: real-time sound source localization and keyword spotting for voice command recognition"

Diploma project completed at Warsaw University of Science and Technology as part of a Master of Science in Engineering degree in Computer Science.

This project aims to develop an autonomous control system for a hexapod walking robot, using auditory scene analysis as the primary modality for navigation and environmental interaction. The system integrates sound source localization (Direction of Arrival estimation - DOA) and beamforming techniques via the ODAS framework, employing a circular microphone array for enhanced spatial precision. This enables the robot to accurately detect and characterize sound sources, allowing real-time responses to acoustic stimuli for dynamic, context-aware behavior.

A Keyword Spotting (KWS) module, powered by machine learning, is incorporated to recognize predefined voice commands, enabling effective human-robot interaction. The research focuses on developing the hardware and software infrastructure to seamlessly integrate acoustic processing with the robot's control system.

The project includes designing and building the robot's platform, encompassing both the mechanical structure and embedded systems. The hexapod's platform is engineered to support advanced auditory processing, ensuring optimal performance in real-world scenarios. This involves creating a robust mechanical framework for stable, agile locomotion and an embedded system architecture for real-time processing and decision-making.

The hardware is designed to accommodate the circular microphone array, ensuring precise sound capture, while the software facilitates seamless communication between auditory processing modules, the control system, and actuators. This comprehensive approach ensures the robot can perform complex tasks, such as navigating dynamic environments and responding accurately to auditory cues.

Real-Time Sound Source Localization: Hexapod Robot with ODAS Audio Processing

[Click the image below to watch the full demonstration video]


This video demonstrates an autonomous hexapod robot performing advanced auditory scene analysis in real-time. The complete ODAS (Open embeddeD Audition System) pipeline with beamforming is showcased, featuring:

  • Real-time Direction of Arrival (DOA) estimation using a 6-microphone circular array
  • Live GUI visualization showing sound source tracking and spatial mapping
  • Terminal debug output displaying active sound sources with coordinates and activity levels
  • Elevation and azimuth time charts showing temporal tracking of sound source positions
  • System monitoring panel showing CPU usage, temperature, memory usage, and IP address
  • Robot view - top-down view of the hexapod responding to acoustic stimuli
  • LED feedback system indicating detected sound sources through visual cues
  • Multi-source tracking - demonstrating the system's ability to track up to 4 simultaneous sound sources
  • Automatic audio stream separation and recording of individual source audio files

This represents a complete autonomous control system where the hexapod can navigate and interact based purely on auditory cues, enabling sophisticated human-robot interaction through voice commands and environmental sound awareness.
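
As a rough illustration of the data behind the azimuth and elevation charts, the sketch below converts one frame of tracked sources into angles. It assumes the JSON layout that ODAS commonly emits for tracked sources (a `src` list whose entries carry `id`, `x`, `y`, `z`, and `activity`, with `id` 0 marking an empty slot). The function names and the `min_activity` threshold are hypothetical and are not taken from this repository.

```python
import json
import math


def source_direction(src: dict) -> tuple:
    """Return (azimuth_deg, elevation_deg) for one tracked source.

    Azimuth is measured in the x/y plane and wrapped to [0, 360).
    Elevation is the angle above that plane.
    """
    x, y, z = src["x"], src["y"], src["z"]
    azimuth = math.degrees(math.atan2(y, x)) % 360.0
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    return azimuth, elevation


def active_sources(frame_json: str, min_activity: float = 0.5) -> list:
    """Parse one JSON frame and keep only sources that look active.

    Assumption: id == 0 means "slot unused" and activity is in [0, 1].
    """
    frame = json.loads(frame_json)
    result = []
    for src in frame.get("src", []):
        if src.get("id", 0) == 0 or src.get("activity", 0.0) < min_activity:
            continue
        azimuth, elevation = source_direction(src)
        result.append({"id": src["id"], "azimuth": azimuth, "elevation": elevation})
    return result
```

With up to 4 tracked sources per frame, a frame in this layout yields at most four entries, each of which could drive one LED indicator or one line in the time charts.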

Gamepad Control System

The hexapod supports three control modes through a connected DualSense controller, providing both manual control and automated gait control capabilities:

Control Modes

1. Body Control Mode (Default)

"Direct body positioning" - Direct control of the hexapod's body position and orientation using inverse kinematics. The left stick controls translation (forward/back/left/right), the right stick controls rotation (roll/pitch), the L2/R2 triggers control up/down movement, and L1/R1 control yaw rotation. The LEDs show a blue pulsing animation (blue base with black pulse) during body control operations.

2. Gait Control Mode

"Natural walking movement" - Uses the hexapod's gait generator for realistic walking. The left stick controls movement direction (forward/back/left/right/diagonal), the right stick controls rotation while walking, and the X button toggles marching in place. The LEDs show an indigo thinking animation pattern during gait control operations.

3. Voice Control Mode

"Voice command processing" - The system switches to voice control mode, in which manual inputs are disabled and the robot responds to voice commands. This mode can be toggled from any manual mode. The LEDs show a blue and green pulsing animation (blue base with green pulse) synchronized with the voice control system.

Gamepad Features

  • Automatic mode detection - System automatically detects connected DualSense controller
  • LED feedback integration - Controller LEDs provide visual feedback matching robot status and mode
  • Seamless mode switching - Switch between body control, gait control, and voice control modes on-the-fly
  • Voice control integration - Voice commands can interrupt and override manual control
  • Precise movement control - Analog sticks provide smooth, proportional control with adjustable sensitivity
  • Safety features - Built-in safety limits and emergency stop functionality
  • Sensitivity adjustment - Real-time sensitivity control via D-pad for fine-tuning movement
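
The proportional stick control and adjustable sensitivity listed above can be sketched as a small axis-scaling function. This is a minimal illustration, not the repository's implementation: the name `scale_axis`, the default deadzone of 0.1, and the convention that raw stick values lie in [-1, 1] are all assumptions.

```python
import math


def scale_axis(raw: float, deadzone: float = 0.1, sensitivity: float = 1.0) -> float:
    """Map a raw stick reading in [-1, 1] to a command in [-1, 1].

    Readings inside the deadzone return 0 so a resting stick does not
    make the robot drift. Outside the deadzone the output grows linearly
    from 0, is multiplied by the sensitivity, and is capped at 1.
    """
    raw = max(-1.0, min(1.0, raw))
    magnitude = abs(raw)
    if magnitude <= deadzone:
        return 0.0
    scaled = (magnitude - deadzone) / (1.0 - deadzone)
    scaled = min(1.0, scaled * sensitivity)
    return math.copysign(scaled, raw)
```

Lowering `sensitivity` (for example from the D-pad) shrinks the command for the same stick deflection, which is one simple way to get fine-grained movement.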

Voice Control System

The hexapod operates through a sophisticated voice control system that processes commands through distinct phases, each with specific functionality and visual feedback:

System Phases

1. Wake Word Detection Mode

"Listening for 'Hexapod'..." - The system continuously monitors audio input for the wake word using the Picovoice Porcupine engine. The LEDs show a pulsing animation (blue base with green pulse) during the passive listening state.

2. Intent Recognition Mode

"What would you like me to do?" - After wake word detection, the system switches to active command listening using the Picovoice Rhino engine. The LEDs show an alternating, rotating light pattern while waiting for a voice command.

3. Command Processing Mode

"Processing your request..." - The system analyzes the recognized intent, extracts parameters, and dispatches the command to the appropriate subsystem (movement, lights, audio, or system control). The LEDs show a lime green opposite-rotation pattern during processing.

4. Error Handling Mode

"Command not recognized" - The system handles unrecognized commands, invalid parameters, or execution failures. The LEDs show a pulsing animation (red base with orange pulse) for error states.

System Features

  • Multi-intent processing - Handles complex commands with multiple parameters
  • Task interruption - Wake word detection automatically interrupts current tasks (gait tasks are gracefully stopped after completing a cycle)
  • Real-time feedback - Visual and audio confirmation of system state
  • Error recovery - Graceful handling of command failures and system errors
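
The four phases above can be read as a small state machine. The sketch below is one possible way to model it, under stated assumptions: the event names are hypothetical, and the transitions back to wake word detection after processing or after an error are inferred rather than documented here.

```python
from enum import Enum, auto


class Phase(Enum):
    WAKE_WORD = auto()
    INTENT = auto()
    PROCESSING = auto()
    ERROR = auto()


# LED feedback per phase, as described in the phase list above.
PHASE_LEDS = {
    Phase.WAKE_WORD: "pulsing, blue base with green pulse",
    Phase.INTENT: "alternating rotating pattern",
    Phase.PROCESSING: "lime green opposite-rotation pattern",
    Phase.ERROR: "pulsing, red base with orange pulse",
}

# Hypothetical event names; unknown (phase, event) pairs leave the phase unchanged.
TRANSITIONS = {
    (Phase.WAKE_WORD, "wake_word_detected"): Phase.INTENT,
    (Phase.INTENT, "intent_understood"): Phase.PROCESSING,
    (Phase.INTENT, "intent_not_understood"): Phase.ERROR,
    (Phase.PROCESSING, "command_done"): Phase.WAKE_WORD,
    (Phase.PROCESSING, "command_failed"): Phase.ERROR,
    (Phase.ERROR, "error_shown"): Phase.WAKE_WORD,
}


def next_phase(phase: Phase, event: str) -> Phase:
    """Return the phase that follows `phase` when `event` occurs."""
    return TRANSITIONS.get((phase, event), phase)
```

Keeping the transitions in a single table makes it easy to check that every phase has a way back to passive listening.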

Usage Examples

Movement Commands

Walk

"Hexapod, walk/move [direction] [for X seconds/minutes/cycles]" - Omnidirectional movement in 8 directions: forward, backward, left, right, forward left, forward right, backward left, backward right. Supports time-based (seconds/minutes) or cycle-based movement.

Rotate

"Hexapod, rotate/turn [clockwise/counterclockwise] [for X seconds/minutes/cycles]" - Smooth rotation in both directions using inverse kinematics. Supports time-based (seconds/minutes) or cycle-based rotation.

March in Place

"Hexapod, march in place/step in place [for X seconds/minutes]" - In-place marching demonstration with optional duration control

Idle Stance

"Hexapod, go to idle stance/neutral position" - Return to neutral default position

Entertainment Commands

Sit Up

"Hexapod, make some sit ups/do sit ups" - Dynamic sit-up exercise routine

Say Hello

"Hexapod, say hello/wave" - Friendly greeting gesture with leg movement

Helix

"Hexapod, helix/spiral" - Helical movement pattern

Audio Commands

Sound Source Localization

"Hexapod, run sound source localization/analyze sounds" - Analyze environment for sound sources

Sound Source Following

"Hexapod, follow me/track me" - Audio-based target following using ODAS

Stream ODAS Audio

"Hexapod, stream ODAS audio" - Stream processed audio from ODAS system to remote host

Start/Stop Recording

"Hexapod, start recording/begin recording [for X seconds/minutes]" / "Hexapod, stop recording/end recording" - Begin/end audio recording with optional duration control

Light Commands

Police Lights

"Hexapod, activate police mode/police lights" - Police-style flashing lights

Rainbow Lights

"Hexapod, activate rainbow/rainbow mode" - Rainbow color sequence

Change Color

"Hexapod, change color/set color to [blue/red/green/etc.]" - Change LED color to one of 13 available colors

Turn Lights On/Off

"Hexapod, turn lights/turn on lights [on/off]" - Control LED power state

Set Brightness

"Hexapod, set brightness/adjust brightness to X%" - Adjust LED brightness from 0-100%

System Commands

Calibrate

"Hexapod, calibrate servos/calibrate" - Servo calibration and position setup

System Status

"Hexapod, what's your status?/show status" - System health and status reporting

Help

"Hexapod, show commands/help" - Display list of available commands. All voice commands are defined in intent.yml

Wake Up

"Hexapod, wake up/activate" - Activate the system from sleep mode

Sleep

"Hexapod, go to sleep/sleep" - Put system into sleep mode

Stop

"Hexapod, stop/halt" - Immediately stop current task or movement

Repeat Last Command

"Hexapod, repeat last command/do it again" - Execute the previous command again

Set Speed

"Hexapod, set speed/adjust speed to X%" - Adjust movement speed from 0-100%

Set Acceleration

"Hexapod, set acceleration/adjust acceleration to X%" - Adjust movement acceleration from 0-100%

Shut Down

"Hexapod, shut down/power off" - Safely power down the entire system with countdown timer and LED indication

Key Features

Advanced Voice Control

  • Custom wake word detection ("Hexapod")
  • Natural language command processing
  • Multi-intent handling for complex command processing
  • Context-aware command interpretation
  • Support for movement, gait control, and system commands
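
To make the command routing and the "for X seconds/minutes/cycles" parameters more concrete, here is a minimal dispatcher sketch. It assumes the recognized intent arrives as a name plus a dictionary of string slots; the intent name `walk`, the slot names `direction`, `amount`, and `unit`, and all function names are hypothetical and may differ from what intent.yml defines.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class MoveLimit:
    """How long a movement should run; both None means no limit was given."""
    seconds: Optional[float] = None
    cycles: Optional[int] = None


def parse_limit(amount: Optional[str], unit: Optional[str]) -> MoveLimit:
    """Convert spoken amount/unit slots into a time or cycle limit."""
    if amount is None or unit is None:
        return MoveLimit()
    value = float(amount)
    unit = unit.lower().rstrip("s")  # "seconds" -> "second", "cycles" -> "cycle"
    if unit == "second":
        return MoveLimit(seconds=value)
    if unit == "minute":
        return MoveLimit(seconds=value * 60.0)
    if unit == "cycle":
        return MoveLimit(cycles=int(value))
    raise ValueError(f"unsupported unit: {unit}")


HANDLERS: Dict[str, Callable[[Dict[str, str]], object]] = {}


def intent(name: str):
    """Decorator that registers a handler for one intent name."""
    def decorator(func):
        HANDLERS[name] = func
        return func
    return decorator


@intent("walk")
def handle_walk(slots: Dict[str, str]):
    limit = parse_limit(slots.get("amount"), slots.get("unit"))
    return ("walk", slots.get("direction", "forward"), limit)


def dispatch(intent_name: str, slots: Dict[str, str]):
    """Route an intent to its handler, or report it as unrecognized."""
    handler = HANDLERS.get(intent_name)
    if handler is None:
        return ("error", "command not recognized")
    return handler(slots)
```

A registry like this keeps each command's parameter handling next to its handler, so adding a new voice command only means registering one more function.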

Spatial Audio Processing

  • Real-time Direction of Arrival (DOA) estimation
  • 6-microphone circular array processing
  • Multi-source tracking (up to 4 simultaneous sources)
  • Beamforming for enhanced speech recognition
  • Optional GUI on remote host for ODAS processing
  • Real-time audio streaming and recording capabilities
  • Network communication for remote processing
  • Automatic audio source separation and tracking

Intelligent Movement

  • 18-degree-of-freedom movement (3 DOF per leg)
  • Multiple gait patterns (tripod, wave, custom)
  • IMU-based stability control
  • Circle-based targeting for direction-independent movement
  • Precise inverse kinematics for accurate positioning
  • State machine management for coordinated gait execution
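
For readers unfamiliar with inverse kinematics on a 3-DOF leg, the sketch below shows the standard geometric solution for a coxa/femur/tibia chain, together with the matching forward kinematics for checking it. The link lengths, the axis conventions (x/y horizontal in the leg's coxa frame, z up), and the function names are illustrative assumptions, not the robot's actual dimensions or code.

```python
import math

# Hypothetical link lengths in millimetres.
COXA, FEMUR, TIBIA = 30.0, 80.0, 120.0


def _clamp(value: float) -> float:
    """Keep acos arguments inside [-1, 1] despite rounding error."""
    return max(-1.0, min(1.0, value))


def leg_ik(x: float, y: float, z: float) -> tuple:
    """Return (coxa, femur, tibia) angles in degrees for a foot target.

    coxa:  rotation about the vertical axis
    femur: angle of the femur above the horizontal
    tibia: interior angle at the knee between femur and tibia
    """
    coxa = math.atan2(y, x)
    reach = math.hypot(x, y) - COXA   # horizontal distance from femur joint to foot
    dist = math.hypot(reach, z)       # straight-line distance from femur joint to foot
    if not abs(FEMUR - TIBIA) <= dist <= FEMUR + TIBIA:
        raise ValueError("target out of reach")
    # Law of cosines on the femur/tibia/dist triangle.
    femur = math.atan2(z, reach) + math.acos(
        _clamp((FEMUR**2 + dist**2 - TIBIA**2) / (2 * FEMUR * dist)))
    tibia = math.acos(_clamp((FEMUR**2 + TIBIA**2 - dist**2) / (2 * FEMUR * TIBIA)))
    return tuple(math.degrees(a) for a in (coxa, femur, tibia))


def leg_fk(coxa_deg: float, femur_deg: float, tibia_deg: float) -> tuple:
    """Forward kinematics matching leg_ik, returning the foot (x, y, z)."""
    c, f, t = (math.radians(a) for a in (coxa_deg, femur_deg, tibia_deg))
    knee_r = COXA + FEMUR * math.cos(f)
    knee_z = FEMUR * math.sin(f)
    foot_r = knee_r + TIBIA * math.cos(f - math.pi + t)
    foot_z = knee_z + TIBIA * math.sin(f - math.pi + t)
    return foot_r * math.cos(c), foot_r * math.sin(c), foot_z
```

Running a target through `leg_ik` and then `leg_fk` should return the original point, which is a convenient sanity check before sending angles to real servos.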

Visual Feedback

  • LED strip integration for status indication
  • Advanced LED animation system with multiple patterns
  • Real-time sound source localization visualization through LED patterns
  • Color-coded system status and error indication
  • Brightness control and color customization (RGB based)
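
As an illustration of how brightness and RGB color could be combined for APA102 LEDs, the sketch below builds the byte stream for one strip update. The frame layout follows the APA102 convention (a zero start frame, then one 4-byte frame per LED holding a 5-bit global brightness followed by blue, green, red, then an end frame). The function name, the 0-100% to 0-31 mapping, and the choice of end-frame length are assumptions rather than the repository's code.

```python
def apa102_frame(colors: list, brightness_percent: float) -> bytes:
    """Build the SPI byte stream for a list of (r, g, b) colors.

    brightness_percent is clamped to 0-100 and mapped to the
    5-bit global brightness field (0-31) of each LED frame.
    """
    percent = max(0.0, min(100.0, brightness_percent))
    level = round(percent / 100.0 * 31)
    frame = bytearray(4)  # start frame: 32 zero bits
    for r, g, b in colors:
        # LED frame: 0b111 + 5-bit brightness, then blue, green, red.
        frame += bytes((0b11100000 | level, b, g, r))
    # End frame: at least 4 bytes, and at least one clock bit per two LEDs.
    frame += b"\xff" * max(4, (len(colors) + 15) // 16)
    return bytes(frame)
```

For the 12 LEDs on the ReSpeaker array this yields 56 bytes per update, which could then be written to the SPI bus in a single transfer.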

Logging System & Terminal Feedback

The hexapod system provides comprehensive real-time feedback through a structured logging system that displays system status, voice command processing, and operational information directly in the terminal.


Performance Characteristics

  • Voice Recognition: >95% accuracy, <200ms latency
  • Sound Source Localization: ±5° accuracy, real-time processing
  • Movement Control: 50Hz servo update rate
  • Multi-source Tracking: Up to 4 simultaneous sound sources

Hardware

Core Components

  • Raspberry Pi 4 (2GB+ RAM recommended)
  • 18x TowerPro MG-995 Servos (3 per leg)
  • Pololu Maestro 24-Channel Servo Controller
  • ReSpeaker 6-Mic Circular Array
  • ICM-20948 IMU
  • APA102 LED Strip (integrated in ReSpeaker)
  • 5 x 1.2V 2500 mAh NiMH Battery Pack (6V total)

Optional Components

  • Remote ODAS - GUI
  • Gamepad Controller (for manual control)

Quick Start

Installation

# Clone the repository
git clone <repository-url>
cd hexapod

# Run the automated installation script
./install.sh

The installation script will:

  • Check Python version compatibility
  • Install the hexapod package and dependencies
  • Create configuration directories
  • Prompt for your Picovoice Access Key
  • Set up the configuration file automatically

Running the System

Basic Voice Control

# Run with voice control (uses default config file)
hexapod

With Gamepad Control

The system automatically detects and uses a connected DualSense controller. If a controller is detected, the system will start in manual control mode with voice control as a secondary option. If no controller is detected, the system falls back to voice control mode only.

# Run with automatic controller detection
hexapod

Custom Configuration

# Use a custom picovoice access_key configuration file
hexapod --config /path/to/your/.picovoice.env

# Override the Picovoice access key
hexapod --access-key "YOUR_PICOVOICE_KEY"

Configuration Options

For a complete list of available command-line options, run:

hexapod --help

The system uses a configuration file at ~/.config/hexapod/.picovoice.env by default, which is automatically created during installation.
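
For orientation, a file like this can be read with a few lines of Python. The sketch below parses simple KEY=VALUE lines; the variable name `PICOVOICE_ACCESS_KEY` is an assumption made for illustration, so check the file created by the installer for the actual key name.

```python
from pathlib import Path

DEFAULT_CONFIG = Path.home() / ".config" / "hexapod" / ".picovoice.env"


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_access_key(path: Path = DEFAULT_CONFIG,
                    key_name: str = "PICOVOICE_ACCESS_KEY") -> str:
    """Return the access key from the env file, or raise if it is missing."""
    values = parse_env(path.read_text(encoding="utf-8"))
    if key_name not in values:
        raise KeyError(f"{key_name} not found in {path}")
    return values[key_name]
```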

System Architecture

The hexapod system consists of several integrated components:

  • Voice Control System - Picovoice integration for wake word detection and command recognition
  • ODAS Audio Processing - Real-time spatial audio processing and sound source localization
  • Robot Movement - 18-DOF hexapod with inverse kinematics and multiple gait patterns
  • Hardware Integration - Servo control, IMU sensing, and LED feedback systems
  • Task Management - Central coordination system for complex operations

High-Level System Architecture

graph TB
    subgraph "Hexapod System Architecture"
        subgraph "Application Layer"
            MA[Main Application<br/>• Entry Point<br/>• Component Coordination<br/>• Control Mode Management]
            TI[Task Interface<br/>• Voice Command Processing<br/>• Task Orchestration<br/>• State Management]
        end
        
        subgraph "Control Systems"
            VC[Voice Control<br/>• Wake Word Detection<br/>• Intent Recognition<br/>• Audio Processing]
            MC[Manual Control<br/>• Gamepad Input<br/>• Mode Switching<br/>• LED Feedback]
        end
        
        subgraph "Robot Systems"
            RM[Robot Movement<br/>• Gait Generation<br/>• Inverse Kinematics<br/>• Movement Commands]
            HI[Hardware Integration<br/>• Servo Control<br/>• Sensor Integration<br/>• Power Management]
        end
        
        subgraph "Audio Processing"
            ODAS[ODAS Audio<br/>• Sound Source Localization<br/>• Spatial Processing<br/>• Direction Tracking]
        end
    end
    
    MA --> TI
    MA --> VC
    MA --> MC
    TI --> RM
    TI --> HI
    VC --> TI
    MC --> TI
    ODAS --> TI
    RM --> HI

Voice Control System

graph TB
    subgraph "Voice Control System"
        subgraph "Audio Input"
            MA[6-Mic Array<br/>• ReSpeaker 6<br/>• 8 Channels<br/>• 16kHz Sample Rate]
            AD[Audio Device<br/>• Auto-detection<br/>• Device Selection<br/>• ALSA Integration]
        end
        
        subgraph "Voice Processing"
            PV[Picovoice Engine<br/>• Porcupine Wake Word<br/>• Rhino Intent Recognition<br/>• Real-time Processing]
            ID[Intent Dispatcher<br/>• Command Routing<br/>• Parameter Parsing<br/>• Task Interface]
        end
        
        subgraph "Robot Control"
            TI[Task Interface<br/>• Movement Commands<br/>• Light Control<br/>• System Commands]
            AR[Audio Recording<br/>• Continuous Recording<br/>• Duration-based<br/>• File Management]
        end
    end
    
    MA --> AD
    AD --> PV
    PV --> ID
    ID --> TI
    PV --> AR

ODAS Audio Processing

graph TB
    subgraph "Audio Processing System"
        subgraph "Hardware Input"
            MA["6-Mic ReSpeaker Array<br/>• 8 Channels<br/>• 16kHz Sample Rate<br/>• 32-bit Audio"]
            AD["Audio Device<br/>• ALSA Integration<br/>• Real-time Capture<br/>• Multi-channel"]
        end
        
        subgraph "ODAS Processing"
            OD["ODAS Framework<br/>• Sound Source Localization<br/>• Direction of Arrival<br/>• Beamforming<br/>• Audio Separation"]
            CF["Configuration Files<br/>• local_odas.cfg<br/>• SSL Parameters<br/>• DOA Settings"]
        end
        
        subgraph "Data Processing"
            DS["Data Servers<br/>• Tracked Sources (Port 9000)<br/>• Potential Sources (Port 9001)<br/>• TCP Communication"]
            AP["Audio Processor<br/>• Channel Selection<br/>• Sample Rate Conversion<br/>• Picovoice Integration"]
        end
        
        subgraph "Output Streams"
            AS["Audio Streaming<br/>• Remote Playback<br/>• Real-time Transfer<br/>• WAV Conversion"]
            VS["Visualization<br/>• LED Feedback<br/>• Direction Display<br/>• Source Tracking"]
        end
    end
    
    MA --> AD
    AD --> OD
    OD --> CF
    OD --> DS
    DS --> AP
    AP --> AS
    DS --> VS
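
The data servers in the diagram receive source data over TCP, where one read may contain several JSON objects or only part of one. The sketch below shows one way to split such a stream into complete objects using the standard library. It assumes the sender writes JSON objects back to back; the function name is hypothetical.

```python
import json


def extract_json_objects(buffer: str) -> tuple:
    """Split a text buffer into complete JSON objects plus the unparsed tail.

    Returns (objects, remainder). The remainder holds an incomplete
    object (if any) and should be prepended to the next chunk received.
    """
    decoder = json.JSONDecoder()
    objects = []
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(buffer) and buffer[pos].isspace():
            pos += 1
        if pos >= len(buffer):
            break
        try:
            obj, end = decoder.raw_decode(buffer, pos)
        except json.JSONDecodeError:
            break  # incomplete object: wait for more data
        objects.append(obj)
        pos = end
    return objects, buffer[pos:]
```

In a receive loop, each decoded chunk from the socket listening on port 9000 or 9001 would be appended to the remainder before calling the function again.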

Robot Movement System

graph TB
    subgraph "Robot Movement System"
        subgraph "Core Control"
            H[Hexapod<br/>• Main Controller<br/>• 18 Servo Management<br/>• Position Tracking]
            GG[Gait Generator<br/>• Pattern Execution<br/>• State Management<br/>• Thread Coordination]
        end
        
        subgraph "Leg Control"
            L[Leg Class<br/>• Individual Leg Control<br/>• Inverse Kinematics<br/>• Joint Management]
            J[Joint Class<br/>• Servo Control<br/>• Angle Validation<br/>• Safety Limits]
        end
        
        subgraph "Movement Patterns"
            TG[Tripod Gait<br/>• 3+3 Leg Groups<br/>• High Stability<br/>• Efficient Movement]
            WG[Wave Gait<br/>• Sequential Movement<br/>• Maximum Stability<br/>• Precise Control]
        end
        
        subgraph "Hardware Integration"
            MU[Maestro UART<br/>• Servo Communication<br/>• Real-time Control<br/>• Safety Management]
            BC[Balance Compensator<br/>• IMU Integration<br/>• Stability Control<br/>• Fall Prevention]
        end
    end
    
    H --> GG
    H --> L
    L --> J
    GG --> TG
    GG --> WG
    H --> MU
    H --> BC
    J --> MU
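
The two movement patterns in the diagram differ in which legs swing together. The sketch below generates the swing/stance schedule for each, assuming legs are numbered 0 to 5 around the body so that alternating indices form the two tripods. That numbering and the function names are assumptions for illustration.

```python
from itertools import cycle

# Assumed numbering: legs 0-5 around the body, so alternating legs form a tripod.
TRIPOD_GROUPS = ((0, 2, 4), (1, 3, 5))
ALL_LEGS = (0, 1, 2, 3, 4, 5)


def tripod_phases():
    """Yield (swing_legs, stance_legs) forever, alternating the 3+3 groups."""
    for index in cycle((0, 1)):
        yield TRIPOD_GROUPS[index], TRIPOD_GROUPS[1 - index]


def wave_phases():
    """Yield (swing_legs, stance_legs) forever, lifting one leg at a time."""
    for leg in cycle(ALL_LEGS):
        stance = tuple(other for other in ALL_LEGS if other != leg)
        yield (leg,), stance
```

The tripod gait always keeps three legs on the ground and completes a cycle in two phases, while the wave gait keeps five legs down and needs six phases per cycle, which matches the stability versus speed trade-off noted in the diagram.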

Hardware Integration

graph TB
    subgraph "Hardware Integration System"
        subgraph "Control Layer"
            HC[Hexapod Controller<br/>• Joint Management<br/>• Position Control<br/>• Calibration]
            MC[Maestro Controller<br/>• UART Communication<br/>• Servo Commands<br/>• Error Handling]
        end
        
        subgraph "Hardware Components"
            SC[Servo Motors<br/>• 18 MG-995 Servos<br/>• 3 per leg<br/>• Position Control]
            IMU[IMU Sensor<br/>• ICM-20948<br/>• 9-DOF<br/>• Orientation Data]
            BTN[Button Input<br/>• GPIO Pin 26<br/>• User Interface<br/>• System Control]
            LED[ReSpeaker 6 LEDs<br/>• Integrated APA102 LEDs<br/>• 12 LEDs<br/>• Visual Feedback]
        end
        
        subgraph "Communication"
            UART[UART Interface<br/>• /dev/ttyAMA1<br/>• 9600 baud<br/>• Pololu Protocol]
            SPI[SPI Interface<br/>• LED Control<br/>• High Speed<br/>• Real-time Updates]
            GPIO[GPIO Interface<br/>• Button Input<br/>• Power Control<br/>• Digital I/O]
        end
    end
    
    HC --> MC
    MC --> UART
    UART --> SC
    HC --> IMU
    HC --> BTN
    HC --> LED
    LED --> SPI
    BTN --> GPIO
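
The UART link in the diagram uses the Pololu protocol, in which a servo position is sent as a "Set Target" command with the pulse width expressed in quarter-microseconds and split into two 7-bit bytes. The sketch below builds that command. The function name is hypothetical, and device number 12 is the Maestro's documented default, which may differ from this robot's configuration.

```python
def set_target_command(channel: int, pulse_us: float, device: int = 12) -> bytes:
    """Build a Pololu-protocol Set Target command for one servo channel.

    pulse_us is the desired pulse width in microseconds; the Maestro
    expects the target in units of 0.25 microseconds.
    """
    if not 0 <= channel <= 23:
        raise ValueError("channel must be 0-23 on a 24-channel Maestro")
    target = round(pulse_us * 4)
    if not 0 <= target < (1 << 14):
        raise ValueError("target does not fit in 14 bits")
    return bytes((
        0xAA,                  # Pololu protocol start byte
        device,                # device number
        0x04,                  # Set Target command (0x84 with the high bit cleared)
        channel,
        target & 0x7F,         # low 7 bits
        (target >> 7) & 0x7F,  # high 7 bits
    ))
```

The resulting bytes could be written to /dev/ttyAMA1 at 9600 baud with a serial library such as pyserial. For a 1500 µs pulse on channel 0 the command is AA 0C 04 00 70 2E.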

Task Management System

graph TB
    subgraph "Task Interface"
        subgraph "Core Management"
            TI[TaskInterface<br/>• Central Coordinator<br/>• Voice Command Processing<br/>• Task Lifecycle Management]
            SR[StatusReporter<br/>• System Health Monitoring<br/>• Status Information<br/>• Diagnostic Data]
        end
        
        subgraph "Hardware Integration"
            H[Hexapod<br/>• Robot Control<br/>• Movement<br/>• Calibration]
            LH[LightsHandler<br/>• Visual Feedback<br/>• Status Display<br/>• Animations]
            BH[ButtonHandler<br/>• GPIO Input<br/>• User Interaction<br/>• State Management]
        end
        
        subgraph "Task Execution"
            TQ[Task Queue<br/>• Task Management<br/>• Lifecycle Control<br/>• Callback Handling]
            TE[Task Executor<br/>• Thread Management<br/>• Resource Allocation<br/>• Error Recovery]
        end
    end
    
    TI --> H
    TI --> LH
    TI --> BH
    TI --> SR
    TI --> TQ
    TQ --> TE

Documentation

For detailed technical documentation and system architecture, please refer to the Documentation.

The documentation covers:

  • Getting Started
  • Core Systems
  • Robot Movement
  • Hardware Integration
  • Interface Systems
  • Voice Control & Audio
  • ODAS Audio Processing

Testing & Code Quality

This repository includes a comprehensive unit test suite with 93% overall code coverage across the entire codebase. The test suite was generated using AI-assisted development to ensure thorough validation of all system components.

Test Suite Overview

  • Total Test Files: 78 test files
  • Total Test Functions: 1,786 individual test cases
  • Overall Code Coverage: 93% (5,819 of 6,227 statements covered)
  • Test Status: 1,786 passed

Test Architecture

The test suite is organized to mirror the source code structure:

tests/
├── unit/
│   ├── gait_generator/     # Gait pattern and locomotion tests
│   ├── interface/          # User interface and controller tests
│   │   ├── console/        # Console input handler tests
│   │   ├── controllers/    # Manual controller tests
│   │   ├── input_mappings/ # Input mapping tests
│   │   └── logging/        # Logging system tests
│   ├── kws/                # Voice control and keyword spotting tests
│   ├── lights/             # LED control and animation tests
│   │   └── animations/     # LED animation tests
│   ├── maestro/            # Maestro servo controller tests
│   ├── odas/               # Audio processing and ODAS tests
│   ├── robot/              # Robot movement and control tests
│   │   └── sensors/        # Sensor (IMU, button) tests
│   ├── task_interface/     # Task management and execution tests
│   │   └── tasks/          # Individual task implementation tests
│   └── utils/              # Utility function tests
├── conftest.py             # Shared test fixtures and configuration
└── reports/                # Coverage reports (HTML, XML, JSON)

Running Tests

Run All Tests

# Run complete test suite with coverage
pytest --cov=hexapod --cov-report=html --cov-report=term-missing

Note: The test suite is configured to fail if overall code coverage drops below 80%. This ensures code quality standards are maintained.

Coverage Reports

Detailed coverage reports are automatically generated and available in multiple formats:

  • HTML Report: tests/reports/html/index.html - Interactive web-based coverage report
  • XML Report: tests/reports/coverage.xml - Machine-readable format for CI/CD
  • JSON Report: tests/reports/coverage.json - Structured data format

Type Checking with MyPy

The project uses MyPy for static type checking to ensure code quality and catch type-related errors early in development.

Current Project State

$ mypy hexapod/
Success: no issues found in 83 source files

License

Copyright (c) 2025 Krystian Głodek <krystian.glodek1717@gmail.com>. All rights reserved.
