Note
This project is currently under active development by our team.
Expected Completion Date: December 2025.
AIris is a revolutionary wearable AI system that provides instant, contextual scene descriptions for visually impaired users. With a simple button press, users receive intelligent, real-time descriptions of their surroundings through advanced computer vision and natural language processing.
- Sub-2-second response time from capture to audio description (flow sketched after this list)
- Contextual intelligence with spatial awareness and safety prioritization
- Offline-first design with cloud enhancement capabilities
- Wearable form factor designed for comfort and accessibility
- Private audio delivery through integrated directional speakers
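The features above reduce to a single capture → describe → speak loop. The sketch below shows that flow end to end; the helper names (`capture_frame`, `describe_scene`, `speak`) are placeholders for illustration, not the project's final API.

```python
# Minimal sketch of the button-to-audio flow described above.
# The helper names are placeholders, not the project's final API.
import time


def capture_frame() -> bytes:
    """Placeholder: grab a JPEG frame from the spectacle camera."""
    raise NotImplementedError


def describe_scene(image: bytes) -> str:
    """Placeholder: run the vision-language model and return a short description."""
    raise NotImplementedError


def speak(text: str) -> None:
    """Placeholder: synthesize speech and play it on the directional speaker."""
    raise NotImplementedError


def on_button_press() -> None:
    start = time.monotonic()
    frame = capture_frame()
    description = describe_scene(frame)
    speak(description)
    print(f"Capture-to-audio latency: {time.monotonic() - start:.2f}s")  # target: < 2.0s
```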
```mermaid
graph TB
    A[Spectacle Camera] --> B[Raspberry Pi 5]
    B --> C[Directional Speaker]
    B --> D[Portable Battery]
    B --> E[Optional Phone Sync]
    style A fill:#4B4E9E,color:#fff
    style B fill:#C9AC78,color:#000
    style C fill:#4B4E9E,color:#fff
```
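For reference, the capture trigger on the Raspberry Pi can be wired up with the standard `gpiozero` and `picamera2` libraries. The GPIO pin number and output path below are assumptions for this sketch, not the final wiring.

```python
# Illustrative wiring of the capture button on the Raspberry Pi 5, assuming the
# standard gpiozero and picamera2 libraries that ship with Raspberry Pi OS.
from signal import pause

from gpiozero import Button
from picamera2 import Picamera2

camera = Picamera2()
camera.start()

button = Button(17)  # assumed GPIO pin for the capture button


def capture() -> None:
    # Save the current frame; downstream stages would pick it up from here.
    camera.capture_file("/tmp/airis_frame.jpg")


button.when_pressed = capture
pause()  # wait for button presses
```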
```mermaid
graph LR
    A[Camera Interface] --> B[AI Engine]
    B --> C[Audio System]
    subgraph "AI Engine"
        D[Scene Analyzer]
        E[Groq API Client]
        F[Local Models]
    end
    subgraph "Audio System"
        G[TTS Engine]
        H[Speaker Control]
    end
    style A fill:#E9E9E6
    style B fill:#4B4E9E,color:#fff
    style C fill:#E9E9E6
```
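One way to mirror this decomposition in code is to keep each subsystem behind a small interface so backends (Groq vs. local models, different TTS engines) can be swapped during testing. This is a structural sketch only; the class names are illustrative.

```python
# Structural sketch mirroring the diagram above; class names are illustrative.
from abc import ABC, abstractmethod


class CameraInterface(ABC):
    @abstractmethod
    def capture(self) -> bytes: ...


class SceneAnalyzer(ABC):
    """AI Engine: implemented by the Groq API client or a local model wrapper."""

    @abstractmethod
    def describe(self, image: bytes) -> str: ...


class AudioSystem(ABC):
    """TTS engine plus speaker control."""

    @abstractmethod
    def speak(self, text: str) -> None: ...


class AIrisPipeline:
    """Wires the three subsystems together so backends can be swapped in tests."""

    def __init__(self, camera: CameraInterface, analyzer: SceneAnalyzer, audio: AudioSystem):
        self.camera = camera
        self.analyzer = analyzer
        self.audio = audio

    def run_once(self) -> None:
        self.audio.speak(self.analyzer.describe(self.camera.capture()))
```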
| Metric | Target | Current Status |
|---|---|---|
| Response Latency | < 2.0s | ~ |
| Object Recognition | > 85% | ~ |
| Battery Life | > 8 hours | ~ |
| Memory Usage | < 7GB | ~ |
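A rough harness like the one below could be used to check the latency and memory targets during testing. It assumes `psutil` is installed; `run_pipeline_once` is a placeholder for a full capture-to-audio cycle.

```python
# Rough check of the latency and memory targets during testing.
import time

import psutil


def run_pipeline_once() -> None:
    """Placeholder for one full capture -> describe -> speak cycle."""
    raise NotImplementedError


def measure() -> dict:
    process = psutil.Process()
    start = time.monotonic()
    run_pipeline_once()
    return {
        "latency_s": round(time.monotonic() - start, 2),          # target: < 2.0s
        "memory_gb": round(process.memory_info().rss / 1e9, 2),   # target: < 7GB
    }
```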
We're currently in the prototype and testing phase, working with a web interface to evaluate and optimize different multimodal AI models before hardware integration.
Our development team is using a local web interface to rapidly prototype and test various AI models:
```
Development Web Interface
├── Image Upload & Capture Testing
├── Audio Output Testing
└── Real-time Metrics Visualization
```
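A minimal version of the upload-and-describe endpoint might look like the sketch below. The framework (FastAPI) and route name are assumptions for illustration; the actual interface may differ.

```python
# Sketch of the image-upload test endpoint; FastAPI and the route are assumptions.
from fastapi import FastAPI, File, UploadFile

app = FastAPI(title="AIris Development Interface")


def describe_scene(image: bytes) -> str:
    """Placeholder for the vision-language model currently being evaluated."""
    raise NotImplementedError


@app.post("/describe")
async def describe(file: UploadFile = File(...)) -> dict:
    image = await file.read()
    return {"description": describe_scene(image)}
```

Served with `uvicorn`, an endpoint like this lets the browser-based test page post an image and read back the generated description alongside timing data.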
We're currently testing and benchmarking multiple state-of-the-art vision-language models:
| Model | Status | Avg Response Time | Accuracy Score | Memory Usage |
|---|---|---|---|---|
| LLaVA-v1.5 | Testing | ~ | ~ | ~ |
| BLIP-2 | Testing | ~ | ~ | ~ |
| MiniGPT-4 | Testing | ~ | ~ | ~ |
| Groq API | Testing | ~ | ~ | ~ |
| Ollama Local | Testing | ~ | ~ | ~ |
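To fill in this table, each backend can be wrapped as a callable that takes image bytes and returns a description, then timed over a shared image set. The sketch below shows the idea with a Groq-backed function; the model id and prompt are assumptions, and the local backends (Ollama, LLaVA, BLIP-2) would be registered the same way.

```python
# Sketch of the benchmarking loop behind the table above. The Groq model id and
# prompt are assumptions; local backends would be added as additional callables.
import base64
import statistics
import time
from typing import Callable

from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment


def groq_describe(image: bytes) -> str:
    encoded = base64.b64encode(image).decode()
    response = client.chat.completions.create(
        model="llama-3.2-11b-vision-preview",  # assumed model id; substitute the one under test
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this scene for a blind user in one sentence."},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{encoded}"}},
            ],
        }],
    )
    return response.choices[0].message.content


def benchmark(backends: dict[str, Callable[[bytes], str]], images: list[bytes]) -> None:
    for name, describe in backends.items():
        times = []
        for image in images:
            start = time.monotonic()
            describe(image)
            times.append(time.monotonic() - start)
        print(f"{name}: mean response time {statistics.mean(times):.2f}s")
```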
Model Evaluation
- Testing multiple vision-language models
- Benchmarking performance on Raspberry Pi 5
- Optimizing for speed vs. accuracy trade-offs
Web Interface Development
- Real-time model comparison dashboard
- Performance metrics visualization
- User experience prototyping
Performance Optimization
- Model quantization experiments (see the sketch after this list)
- Memory usage optimization
- Latency reduction techniques
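As one example of the quantization experiments, PyTorch's dynamic quantization can shrink a model's linear layers to int8. Whether this ends up applied to the chosen vision-language model is still open, so the snippet below is a sketch rather than the project's settled approach.

```python
# One possible quantization experiment: dynamic int8 quantization of linear
# layers, with a before/after size comparison via the serialized state dict.
import io

import torch


def quantize_linear_layers(model: torch.nn.Module) -> torch.nn.Module:
    # Weights of nn.Linear modules become int8; activations stay in float.
    return torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)


def serialized_size_mb(model: torch.nn.Module) -> float:
    buffer = io.BytesIO()
    torch.save(model.state_dict(), buffer)
    return buffer.getbuffer().nbytes / 1e6
```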
- Custom hardware design and 3D modeling
- Wearable form factor development
- Field testing with target users
- ✅ Core software architecture
- ✅ AI model research and selection
- 🔄 Web interface development
- 🔄 Performance optimization
- ⏳ Audio system integration
- ⏳ Hardware design and 3D modeling
- ⏳ Wearable system integration
- ⏳ Field testing with users
- ⏳ Final optimization and documentation
This project is being developed by:
| Name | Institution | ID |
|---|---|---|
| Rajin Khan | North South University | 2212708042 |
| Saumik Saha Kabbya | North South University | 2211204042 |
The work is part of CSE 499A/B at North South University, building upon the foundation of TapSense to advance accessibility technology.