A photon-to-phonon code
This repository provides the code for a public-infrastructure web app that transforms visual environments into soundscapes, empowering users to experience the visual world through synthetic audio cues in real time.
Why? We believe in enhancing humanity with open-source software in a fast, accessible and impactful way. You are invited to join us, improve the project, and make a difference!
- Synesthetic Translation: Converts visual data into stereo audio cues, mapping colors and motion to distinct sound signatures.
- Dynamic Soundscapes: Adjusts audio in real time based on object distance and motion; for example, a swing's sound shifts in volume and complexity as it moves.
- Location-Aware Audio: Enhances spatial awareness by producing sounds in the corresponding ear, such as a wall on the left sounding in the left ear (see the sketch after this list).
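The ear-specific localization can be pictured with the Web Audio API's `StereoPannerNode`. This is only a minimal sketch: the function name and the normalized `x` position are illustrative, not the project's actual API.

```js
// Minimal sketch: pan a cue toward the ear matching an object's horizontal position.
// `x` is assumed to be a normalized screen position: 0 = far left, 1 = far right.
function playPositionalCue(audioCtx, frequency, x) {
  const osc = audioCtx.createOscillator();
  const panner = audioCtx.createStereoPanner();
  osc.frequency.value = frequency;
  panner.pan.value = x * 2 - 1;          // map [0, 1] -> [-1 (left ear), +1 (right ear)]
  osc.connect(panner).connect(audioCtx.destination);
  osc.start();
  osc.stop(audioCtx.currentTime + 0.2);  // short cue
}
```

With this mapping, a wall detected at the left edge of the frame (`x ≈ 0`) is panned fully to the left ear.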
Run the version of your choice in any internet browser from 2020 onward. The design is tested on a mobile phone using its front camera. Input: mobile camera for real-time visual data capture. Audio output: stereo headphones for spatial audio effects.
Launch the app on a mobile device to translate live camera input into a dynamic stereo soundscape. For a visually impaired user in a park, a mobile phone worn as a necklace captures surrounding visuals such as a swing in motion: as the swing moves away, the app produces a softer, simpler sound; as it approaches, the sound grows louder and more complex. Similarly, a sidewalk might emit a steady, textured tone, a distant car a low hum, and a wall to the left a localized sound in the left ear. This lets users perceive and interact with their surroundings through an auditory interface, fostering greater independence and environmental awareness.
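One way to picture the distance mapping described above is a simple function from estimated distance to loudness and timbre complexity. This is a sketch only; the normalized `distance` input and the returned parameters are assumptions, not the project's actual signal names.

```js
// Sketch: closer objects sound louder and richer, distant objects softer and simpler.
// `distance` is assumed normalized: 0 = very close, 1 = far away.
function soundParamsForDistance(distance) {
  const gain = 1 - distance;                   // volume falls off with distance
  const harmonics = Math.round(1 + gain * 7);  // timbre complexity falls off with distance
  return { gain, harmonics };
}

soundParamsForDistance(0.1); // approaching swing: loud, complex tone
soundParamsForDistance(0.9); // receding swing: soft, nearly pure tone
```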
Entirely coded by xAI Grok 3 up to Milestone 4, as per @MAMware prompts. Milestone 5, which is a work in progress, is getting help from OpenAI ChatGPT 4.1 and o4-mini, and Anthropic Claude 4 via @github Copilot in Codespaces, with Grok 4 in charge of the restructuring from v0.5.12.
We welcome contributors!
The web app runs in internet browsers on mobile hardware from 2021 onward.
Check Usage for further details
Working on Milestone 5 (Current)
- Haptic feedback via the Vibration API. In progress (85%).
- On-device console log display and a mail-to feature for debugging. In progress (85%).
- New language-agnostic architecture ready to provide multilingual support for the speech synthesizer and UI (see the sketch after this list). In progress (95%).
- Mermaid diagrams to reflect the current modular Single Responsibility Principle. To do.
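The haptic and multilingual-speech items above can be sketched with standard browser APIs. The `strings` table and its keys below are purely illustrative and do not reflect the repository's actual locale files (kept under web/languages/).

```js
// Hypothetical locale table; the repository keeps locale JSON files under web/languages/.
const strings = {
  'en-US': { poweredOn: 'Audio engine started' },
  'es-ES': { poweredOn: 'Motor de audio iniciado' },
};

// Language-agnostic speech: the synthesizer simply follows the selected locale key.
function speak(key, lang = 'en-US') {
  const utterance = new SpeechSynthesisUtterance(strings[lang][key]);
  utterance.lang = lang;
  window.speechSynthesis.speak(utterance);
}

// Haptic feedback via the Vibration API (a no-op on devices without support).
function hapticPulse() {
  if ('vibrate' in navigator) navigator.vibrate([100, 50, 100]); // short double pulse
}
```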
- The current "stable" version is v0.4.7; the link above logs the history and details of past milestones achieved.
- The current "future" version in development starts from v0.5.
The software is designed to run in most modern mobile and desktop web browsers. Video processing runs locally in the browser; audio is produced in real time and routed to stereo output (headphones recommended).
Launch the app in a web browser to translate live camera input into a dynamic stereo soundscape. For example, a swinging object might map to a softer sound as it moves away and a louder, richer sound as it approaches. A distant car could render as a low hum. The goal is to enable perception of surroundings through an auditory interface, improving independence and situational awareness.
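A minimal sketch of that browser-side pipeline, assuming the standard `getUserMedia` and Web Audio APIs (the `<video>` element selector and the function name are illustrative):

```js
// Sketch: capture the camera locally and prepare a stereo audio graph.
// All processing stays in the browser; no video is uploaded.
async function start() {
  // 1. Live camera input (rear camera preferred when available).
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { facingMode: 'environment' },
    audio: false,
  });
  document.querySelector('video').srcObject = stream;

  // 2. Real-time stereo output (headphones recommended).
  const audioCtx = new AudioContext();
  const masterGain = audioCtx.createGain();
  masterGain.connect(audioCtx.destination);

  return { stream, audioCtx, masterGain };
}
```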
- Milestone 0 to 4: reached by vibecoding with xAI Grok 3
- Milestone 5: reached by vibecoding with SuperGrok 4, with some assistance from Gemini 2.5 Pro (Preview) and ChatGPT 4.1 & o4-mini agents, plus small reviews from Claude 4.
- Milestone 6: restructured with Gemini 2.5 Pro and ChatGPT 4.1 & o4-mini agents.
- Milestone 6.5: (WIP) robust architectural improvements and integration work by GPT-5 mini (Preview)
- Milestone 7 to 9: major redesign around a foundational Command pattern and Hexagonal architecture, still in plain vanilla JS; not merged into the development branch because it is effectively a complete rebase.
web/
├── audio/ # Audio synthesis/processing (notes-to-sound, HRTF, mic)
│ ├── audio-controls.js # PowerOn/AudioContext init
│ ├── audio-manager.js # AudioContext management
│ ├── audio-processor.js # Core audio (oscillators, playAudio, cleanup; integrates HRTF/ML depth)
│ ├── hrtf-processor.js # HRTF logic (PannerNode, positional filtering)
│ └── synths/ # Synth methods (extend with HRTF)
│ ├── sine-wave.js
│ ├── fm-synthesis.js
│ └── available-engines.json
├── video/ # Video capture/mapping (camera-to-notes/positions; includes ML depth)
│ ├── video-capture.js # Stream setup/cleanup
│ ├── frame-processor.js # Frame analysis (emits notes/positions; calls ML if enabled)
│ ├── ml-depth-processor.js # New: Monocular depth estimation
│ └── grids/ # Visual mappings
│ ├── hex-tonnetz.js
│ ├── circle-of-fifths.js
│ └── available-grids.json
├── core/ # Orchestration (events, state)
│ ├── dispatcher.js # Event handling
│ ├── state.js # Settings/configs
│ └── context.js # Shared refs
├── ui/ # Presentation (buttons, DOM; optional ML/HRTF toggles)
│ ├── ui-controller.js # UI setup
│ ├── ui-settings.js # Button bindings
│ ├── cleanup-manager.js # Teardown listeners
│ └── dom.js # DOM init
├── utils/ # Cross-cutting tools (TTS, haptics, logs)
│ ├── async.js # Error wrappers
│ ├── idb-logger.js # Persistent logs
│ ├── logging.js # Structured logs
│ └── utils.js # Helpers (getText, ...)
├── languages/ # Localization (add ML/HRTF strings)
│ ├── es-ES.json
│ ├── en-US.json
│ └── available-languages.json
├── test/ # Tests (grouped by category)
│ ├── audio/ # Audio/HRTF tests
│ │ ├── audio-processor.test.js
│ │ └── hrtf-processor.test.js
│ ├── video/ # Video/grid/ML tests
│ │ ├── frame-processor.test.js
│ │ └── ml-depth-processor.test.js # New: Test depth estimation
│ ├── core/ # Dispatcher/state tests (if added)
│ ├── ui/ # UI tests
│ │ ├── ui-settings.test.js
│ │ └── video-capture.test.js
│ └── utils/ # Utils tests (if added)
├── .eslintrc.json # Linting
├── index.html # HTML entry
├── main.js # Bootstrap (update imports for moves/ML init)
├── README.md # Docs (update structure/ML/HRTF)
└── styles.css # Styles
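As a rough illustration of how these modules fit together, a bootstrap could wire the video and audio sides like the sketch below. The import paths follow the tree above, but the exported function names are assumptions, not the repository's actual exports.

```js
// main.js (hypothetical wiring; exported names are assumptions)
import { startCapture } from './video/video-capture.js';
import { processFrame } from './video/frame-processor.js';
import { playAudio } from './audio/audio-processor.js';

async function bootstrap() {
  const video = await startCapture();   // camera stream setup
  setInterval(() => {
    const notes = processFrame(video);  // frame analysis -> notes/positions
    playAudio(notes);                   // notes -> oscillators / HRTF output
  }, 100);                              // ~10 analysis passes per second
}

bootstrap();
```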
We welcome contributors!
- In the document linked above, you will find our current TO DO list, now from Milestone 5 (v0.5.2).
Diagrams covering the Trunk Based Development approach (v0.2).
- Process Frame Flow
- Audio Generation Flow
- Motion Detection, such as the oscillator logic.
graph TD
A[dispatcher.js] -->|routes| B[core/handlers/]
B --> C[video-handlers.js]
B --> D[audio-handlers.js]
B --> E[ui-handlers.js]
B --> F[settings-handlers.js]
B --> G[grid-handlers.js]
B --> H[debug-handlers.js]
C -->|calls| I[video/frame-processor.js]
D -->|calls| J[audio/audio-processor.js]
E -->|updates| K[ui/ui-settings.js]
F -->|uses| L[utils/utils.js]
A -->|state| M[state.js]
A -->|logs| N[utils/logging.js]
B -->|future| O[ml-handlers.js]
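In code, the routing shown in this diagram boils down to a small dispatch table, roughly like the sketch below. The handler names and event shape are illustrative only, not the repository's actual API.

```js
// dispatcher.js, sketched: route named events to their handler module.
const handlers = {
  video: (payload) => { /* video-handlers.js -> video/frame-processor.js */ },
  audio: (payload) => { /* audio-handlers.js -> audio/audio-processor.js */ },
  ui:    (payload) => { /* ui-handlers.js -> ui/ui-settings.js */ },
};

export function dispatch(event) {
  const handler = handlers[event.domain];
  if (!handler) {
    console.warn('unhandled event', event);  // utils/logging.js in the real app
    return;
  }
  handler(event.payload);                    // hand off to the matching handler
}

// Example: dispatch({ domain: 'audio', payload: { type: 'playNote', note: 'A4' } });
```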
- The current "stable" version is v0.4.7; the link above logs the history and details of past milestones achieved.
- The current "future" version in development starts from v0.6.
- Follow the link for the list of Frequently Asked Questions.
- GPL-3.0 license details
Peace Love Union Respect