Self-learning cognitive architecture with LinUCB contextual bandits, quaternion semantic exploration, and anchor-based perspective detection.
ARIA is an advanced self-learning cognitive architecture that learns from every query to continuously improve retrieval quality. It combines:
- 🎯 LinUCB Contextual Bandits - Feature-aware multi-armed bandit optimizes retrieval strategies
- 🌀 Quaternion Semantic Exploration - 4D rotations through embedding space with golden ratio spiral
- 🧭 Anchor-Based Perspective Detection - 8-framework query classification aligned with philosophical anchors
- 📚 Enhanced Semantic Networks - V2 vocabularies with 121 concepts across 8 domains
- 🎓 Continuous Learning Loop - Learns from conversation feedback and quality scoring
- 📊 Hybrid Search - BM25 lexical + semantic embeddings (sentence-transformers)
- Context-Aware: Uses 10D query feature vectors (complexity, domain, length, etc.)
- Fast Convergence: Learns optimal strategies in ~50 queries (vs 100+ for Thompson Sampling)
- Feature-Based: Generalizes across similar query types
- High Performance: 22,000+ selections/second, sub-millisecond latency
- Golden Ratio Spiral: φ-based (1.618...) uniform sphere coverage with 100 sample points
- Multi-Rotation Refinement: 1-3 iterations for progressive depth
- PCA-Aligned Rotations: Follow semantic space structure
- Perspective-Aware Angles: 15°-120° rotation based on query intent and anchor alignment
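As a concrete picture of the φ-based spiral above, here is a minimal sketch of golden-angle (Fibonacci) sphere sampling plus an axis-angle quaternion, assuming numpy; the function names and the way spiral points map to rotations are illustrative assumptions, not ARIA's actual code.

```python
import numpy as np

PHI = (1 + np.sqrt(5)) / 2                     # golden ratio ≈ 1.618

def golden_spiral_axes(n: int = 100) -> np.ndarray:
    """Roughly uniform unit vectors on the sphere via golden-angle spiral sampling."""
    i = np.arange(n)
    theta = 2 * np.pi * i / PHI                # golden-angle steps around the pole
    z = 1 - 2 * (i + 0.5) / n                  # evenly spaced heights in [-1, 1]
    r = np.sqrt(1 - z ** 2)
    return np.stack([r * np.cos(theta), r * np.sin(theta), z], axis=1)

def axis_angle_quaternion(axis: np.ndarray, angle_deg: float) -> np.ndarray:
    """Unit quaternion (w, x, y, z) for a rotation of angle_deg about a unit axis."""
    half = np.radians(angle_deg) / 2
    return np.concatenate([[np.cos(half)], np.sin(half) * axis])

axes = golden_spiral_axes(100)                             # 100 candidate rotation axes
quats = [axis_angle_quaternion(a, 30.0) for a in axes]     # e.g. a 30° exploration step
```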
- 8 Philosophical Anchors: Platonic Forms, Telos, Logos, Aletheia, Nous, Physis, Techne, Praxis
- Vocabulary Alignment: 121 enhanced concepts across philosophy, engineering, law, business, creative arts, social sciences, security, data science
- Meta-Cognitive Guidance: Reasoning heuristics, common errors, learning paths
- Topology Maps: Network graphs show concept relationships and prerequisites
- Teacher ARIA: Query-driven knowledge retrieval with bandit optimization
- Student ARIA: Conversation corpus learning from LLM interactions
- Feedback Loop: Quality scoring updates bandit preferences
```bash
git clone https://github.com/dontmindme369/ARIA.git
cd ARIA
pip install -r requirements.txt
```

Create aria_config.yaml:
```yaml
# Core paths
knowledge_base: "/path/to/your/knowledge/base"
embeddings: "/path/to/embeddings"
output_dir: "./rag_runs/aria"

# LinUCB Bandit Settings
bandit:
  epsilon: 0.10      # Exploration rate
  alpha: 1.0         # UCB exploration parameter
  feature_dim: 10

# Retrieval Presets (controlled by bandit)
presets:
  fast:
    top_k: 40
    sem_limit: 64
    rotations: 1
  balanced:
    top_k: 64
    sem_limit: 128
    rotations: 2
  deep:
    top_k: 96
    sem_limit: 256
    rotations: 3
  diverse:
    top_k: 80
    sem_limit: 128
    rotations: 2
```

```python
from core.aria_core import ARIA

# Initialize ARIA
aria = ARIA(
    index_roots=["/path/to/knowledge"],
    out_root="./aria_packs"
)

# Query with automatic preset selection
result = aria.query(
    "How do I implement a binary search tree in Python?"
)

# Access results
print(f"Preset: {result['preset']}")
print(f"Run dir: {result['run_dir']}")
print(f"Pack: {result['pack']}")
```
```bash
# Single query
python aria_main.py "Explain how HTTP cookies work"

# With specific preset
python aria_main.py "Debug memory leak" --preset deep

# With anchor alignment
python aria_main.py "What is justice?" --with-anchor
```

ARIA consists of 8 integrated layers:
- Layer 1: Query Input
  - Multi-format input (text, structured queries)
  - Query preprocessing and normalization
- Layer 2: Feature Extraction
  - 10-dimensional feature vectors
  - Query complexity, domain, length, and entity counts
- Layer 3: Perspective Detection
  - 8 anchor-aligned perspectives
  - V2 semantic network vocabularies
  - ~1,440 perspective markers
  - Philosophical framework alignment
  - Template matching with exemplar scoring
- Layer 4: LinUCB Bandit
  - Contextual multi-armed bandit
  - 4 arms: fast, balanced, deep, diverse
  - Feature-aware UCB with epsilon-greedy exploration (ε = 0.10)
  - Per-arm A matrix and b vector tracking
- Layer 5: Quaternion Exploration
  - 4D semantic space rotations
  - Golden ratio spiral sampling
  - Multi-rotation refinement
- Layer 6: Hybrid Retrieval
  - BM25 + semantic embeddings
  - Reciprocal rank fusion (sketched after this list)
  - Diversity-aware deduplication
- Layer 7: Quality Scoring
  - Coverage + exemplar fit + diversity scoring
- Layer 8: Learning Loop
  - Conversation quality analysis
  - Bandit feedback with feature vectors
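The fusion step in the hybrid retrieval layer follows the standard reciprocal rank fusion idea, score(d) = Σ 1/(k + rank(d)) across the BM25 and semantic rankings. The sketch below uses the common k = 60 default; the constant, function name, and example rankings are assumptions, not taken from ARIA's code.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k: int = 60):
    """Fuse several ranked lists of document ids into one ordering (k = 60 is a common default)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc3", "doc1", "doc7"]          # lexical ranking (illustrative)
semantic_hits = ["doc1", "doc9", "doc3"]      # embedding ranking (illustrative)
fused = reciprocal_rank_fusion([bm25_hits, semantic_hits])
```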
ARIA has migrated from Thompson Sampling to LinUCB (Linear Upper Confidence Bound) contextual bandits for superior performance:
| Metric | Thompson Sampling | LinUCB | Improvement |
|---|---|---|---|
| Convergence | ~100 queries | ~50 queries | 2× faster |
| Features Used | None | 10D vectors | Context-aware |
| Selection Speed | ~1,000 ops/sec | 22,658 ops/sec | 23× faster |
| Generalization | Per-query only | Cross-query patterns | Better |
- Feature Extraction: build a 10-dimensional feature vector from the query:

  ```python
  features = [
      query_length,            # normalized 0-1
      complexity,              # simple / moderate / complex / expert
      domain_technical,        # binary domain indicators
      domain_creative,
      domain_analytical,
      domain_philosophical,
      has_question,
      entity_count,
      time_of_day,
      bias_term,               # always 1.0
  ]
  ```

- UCB Calculation: score each preset (arm) as expected reward plus an uncertainty bonus:

  ```
  UCB(arm) = θ·x + α·√(xᵀ·A⁻¹·x)
              ↑          ↑
          expected   uncertainty
           reward    (exploration)
  ```

- Selection: choose the arm with the highest UCB (with ε-greedy random exploration).
- Update: after reward feedback r:

  ```
  A ← A + x·xᵀ
  b ← b + r·x
  θ = A⁻¹·b    # ridge regression weights
  ```
See docs/LINUCB_MIGRATION_COMPLETE.md for full details.
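To make the four steps above concrete, here is a minimal numpy sketch of the LinUCB cycle (score, select, update). The class and function names (LinUCBArm, select_preset) and the example feature values are illustrative, not ARIA's internal API.

```python
import numpy as np

class LinUCBArm:
    """One retrieval preset treated as a LinUCB arm (illustrative sketch)."""
    def __init__(self, feature_dim: int, alpha: float = 1.0):
        self.alpha = alpha
        self.A = np.eye(feature_dim)       # ridge-regularized Gram matrix
        self.b = np.zeros(feature_dim)     # reward-weighted feature sum

    def ucb(self, x: np.ndarray) -> float:
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b             # ridge regression weights, θ = A⁻¹·b
        return float(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))

    def update(self, x: np.ndarray, reward: float) -> None:
        self.A += np.outer(x, x)           # A ← A + x·xᵀ
        self.b += reward * x               # b ← b + r·x

def select_preset(arms: dict, x: np.ndarray, epsilon: float = 0.10) -> str:
    """ε-greedy on top of the UCB scores, mirroring the selection step above."""
    if np.random.rand() < epsilon:
        return str(np.random.choice(list(arms)))
    return max(arms, key=lambda name: arms[name].ucb(x))

arms = {name: LinUCBArm(feature_dim=10) for name in ("fast", "balanced", "deep", "diverse")}
x = np.array([0.4, 0.6, 1, 0, 0, 0, 1, 0.2, 0.5, 1.0])     # example 10-D feature vector
chosen = select_preset(arms, x)
arms[chosen].update(x, reward=0.8)                          # quality score fed back as reward
```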
Enhanced semantic networks with anchor alignment:
- Philosophy (16 concepts) - Epistemology, metaphysics, ethics
- Engineering (15 concepts) - Systems, optimization, design patterns
- Law (15 concepts) - Justice, contracts, precedent
- Business (15 concepts) - Strategy, operations, markets
- Creative Arts (15 concepts) - Aesthetics, narrative, craft
- Social Sciences (15 concepts) - Society, culture, research
- Security (15 concepts) - Threat modeling, defense, analysis
- Data Science (15 concepts) - ML, statistics, visualization
Total: 121 enhanced concepts with:
- Semantic networks (551 edges, density 0.55-0.70)
- Reasoning heuristics
- Common errors and pitfalls
- Learning prerequisites (depth 0-4)
- Mental models
See docs/PHASE_3_COMPLETION_REPORT.md for details.
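Purely as an illustration of the fields listed above, one enhanced concept entry might look like the following; the schema and values are hypothetical and do not reflect ARIA's actual vocabulary files.

```python
# Hypothetical shape of one enhanced concept entry (illustrative only)
concept = {
    "name": "threat_modeling",
    "domain": "security",
    "anchor": "Techne",                                    # philosophical anchor alignment
    "related": ["attack_surface", "risk_assessment"],      # semantic network edges
    "heuristics": ["Enumerate assets before attackers."],  # reasoning heuristics
    "common_errors": ["Modeling only external threats."],
    "prerequisites": ["security_fundamentals"],            # learning path
    "depth": 2,                                            # prerequisite depth 0-4
    "mental_models": ["attacker mindset"],
}
```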
✅ High-Volume Processing: 1,527 queries/second
✅ Concurrent Processing: 2,148 queries/second (10 threads)
✅ Bandit Selection: 22,658 operations/second (0.044ms avg)
✅ Bandit Update: 10,347 operations/second (0.097ms avg)
✅ Memory: Stable, no leaks detected
✅ Performance Degradation: < 1% over time
Coverage Score: 0.75-0.95 (semantic space coverage)
Exemplar Fit: 0.60-0.90 (anchor template alignment)
Diversity: 0.70-0.95 (result variety)
Overall Reward: 0.68-0.92 (multi-objective)
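If the overall reward is a weighted blend of the three signals above, it can be computed along these lines; the weights below are placeholders, since ARIA's actual weighting is not documented here.

```python
def overall_reward(coverage: float, exemplar_fit: float, diversity: float,
                   weights=(0.4, 0.35, 0.25)) -> float:
    """Blend the three quality signals into a single bandit reward (weights are illustrative)."""
    w_cov, w_fit, w_div = weights
    return w_cov * coverage + w_fit * exemplar_fit + w_div * diversity

# e.g. good coverage, decent exemplar fit, solid diversity
reward = overall_reward(coverage=0.88, exemplar_fit=0.74, diversity=0.81)   # ≈ 0.81
```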
```bash
# Stress tests
python aria_systems_test_and_analysis/stress_tests/test_stress.py

# Bandit intelligence
python aria_systems_test_and_analysis/bandit_intelligence/test_bandit_intelligence.py

# Integration tests
python aria_systems_test_and_analysis/integration/test_integration.py
```

| Suite | Tests | Passed | Success Rate |
|---|---|---|---|
| Stress Tests | 6 | 6 | 100% |
| Bandit Intelligence | 6 | 6 | 100% |
| Integration Tests | 6 | 5* | 83%** |
| Total | 18 | 17 | 94.4% |
*One test requires optional watchdog dependency for file monitoring
**Core functionality: 100% tested and passing
- GETTING_STARTED.md - Quick start guide
- docs/ARCHITECTURE.md - System architecture
- docs/API_REFERENCE.md - API documentation
- docs/QUATERNIONS.md - Quaternion mathematics
- docs/USAGE.md - Usage examples
- docs/LINUCB_MIGRATION_COMPLETE.md - LinUCB migration details
- docs/PHASE_3_COMPLETION_REPORT.md - V2 vocabulary system
- docs/CONTRIBUTING.md - Contribution guidelines
Current Phase: Production Ready (Phase 3.5 Complete)
- ✅ Phase 1: Anchor framework integration
- ✅ Phase 2: V2 vocabulary development (121 concepts)
- ✅ Phase 3: Semantic network integration & topology maps
- ✅ Phase 3.5: LinUCB migration (Thompson → LinUCB)
- 🚧 Phase 4: Production integration & monitoring
- 🚧 Enhanced query expansion using semantic networks
- 🚧 Meta-cognitive reasoning heuristics
- 🚧 Real-time learning dashboard
See docs/ARIA_PROJECT_CHECKPOINT.md for roadmap.
```
python >= 3.8
numpy >= 1.21.0
sentence-transformers >= 2.0.0
rank-bm25 >= 0.2.2
pyyaml >= 5.4.1
```

Optional:

```
watchdog >= 2.1.0    # For file monitoring
```
MIT License - see LICENSE for details
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: docs/