Four Rust tools. One month. No containers — just single binaries you drop on any machine.

Octofs 0.4.3: regex search + parallel file walking. Preserves permissions.
Octobrain 0.6.1: your local LLM runtime. Smaller binary, faster cold starts.
Octolib 0.21.5: unified reasoning effort across every provider. Anthropic adaptive thinking, prompt cache keepalive, DeepSeek tool calling.
Octomind 0.29.0: intent-based MCP auto-activation, persistent vector cache, schedule persistence.

All Apache-2.0. Check it out: https://lnkd.in/ep6rGckh
Muvon
IT Services and IT Consulting
We build AI systems — agents, RAG, MCP servers, and training — so your team ships AI, not slides.
About us
We help companies ship AI that works:
• AI Agents — autonomous systems that handle real workflows
• RAG & Search — retrieval-augmented generation over your data
• Custom MCP Servers — connect AI to your internal tools
• AI Team Training — hands-on workshops on your actual codebase
• Ongoing AI Support — monitoring, tuning, scaling post-launch
• AI Consulting — $250/hr, senior engineers only

"AI Builders, Not AI Advisors" — every engagement is led by engineers who write code, not slides. No junior associates. No bloated SOWs.

We maintain open source AI tools used by developers worldwide:
→ Octomind — multi-agent runtime with 13+ providers
→ Octocode — production RAG system for code search
→ Octobrain — persistent AI memory system

Explore what we can build: muvon.io
- Website
-
https://muvon.io
- Industry
- IT Services and IT Consulting
- Company size
- 2-10 employees
- Headquarters
- Hong Kong
- Type
- Privately Held
- Founded
- 2017
- Specialties
- AI Development, AI Agents, RAG Systems, Retrieval-Augmented Generation, MCP Servers, Model Context Protocol, AI Consulting, AI Team Training, AI Integration, Custom AI Solutions, LLM Development, AI Engineering, AI Support, Machine Learning, Natural Language Processing, AI Automation, Vector Search, AI Architecture, Open Source AI, and Production AI Systems
Locations
-
Primary
50 Stanley Street
SUITE C, LEVEL 7, WORLD TRUST TOWER
Hong Kong, 999077, HK
Updates
-
AI CLI tools have a UX problem that nobody talks about.

We just shipped Octomind 0.29.0. Three things that actually matter:

1. A terminal that doesn't hurt to look at
Real-time cost tracking ($0.02 with progress bar), context window usage bar, pixel-art ANSI banner, continuous blue rail for history. Framed tool blocks. Clean markdown. Silent Ctrl+C. It sounds small until you spend 8 hours in it.

2. Skills that load themselves
Semantic capability activation: local embeddings, cosine similarity scoring, margin gating so only the best match fires. Zero network calls. Zero manual wiring per session. Claude Code, Codex, Cursor CLI, Aider — all make you wire this up by hand. Every time.

3. Project-local MCP tools, zero config
Drop executable scripts in .agents/tools/. Any language. Auto-discovered every turn. Hot-reload on save. No separate MCP servers. No JSON configs. Nobody else does this.

Developer experience isn't polish. It's removing the friction that makes engineers quit tools.

If you're building with AI in the terminal, this is the release to try.

Try it: https://lnkd.in/euNnYNeH
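The margin-gating idea in point 2 can be sketched in a few lines. This is a toy illustration, not Octomind's actual implementation: embeddings are plain vectors, and `activate` is a made-up name. Score each skill against the query with cosine similarity, then fire only when the best match clearly beats the runner-up.

```python
# Hypothetical sketch of semantic capability activation with margin gating.
# A real system would produce these vectors with a local embedding model.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def activate(query_vec, skills, margin=0.1):
    """Return the best-matching skill only if it clearly wins."""
    scored = sorted(
        ((cosine(query_vec, vec), name) for name, vec in skills.items()),
        reverse=True,
    )
    if len(scored) == 1:
        return scored[0][1]
    best, runner_up = scored[0], scored[1]
    # Margin gate: fire only when the top skill beats the runner-up
    # by a clear margin; otherwise activate nothing.
    return best[1] if best[0] - runner_up[0] >= margin else None

# Clear winner fires; an ambiguous tie activates nothing.
print(activate([1, 0, 0], {"git": [0.9, 0.1, 0.0], "sql": [0.0, 1.0, 0.0]}))
print(activate([1, 0, 0], {"a": [1.0, 0.0, 0.0], "b": [0.99, 0.1, 0.0]}))
```

The margin check is what keeps a near-tie from activating the wrong skill: no match is better than a wrong match.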
-
We built Timex because every time tracker we tried had the same problems: cloud dependency, monthly subscriptions, and interfaces that felt designed to sell enterprise licenses. So we made something different.

Timex is an automatic time tracker for Mac that samples your active app, window title, and browser tab once per second — and writes everything to a local SQLite file. No cloud. No account. No telemetry. No subscription. The file is yours. Query it, back it up, or delete it. It works offline, forever.

We also added a break timer with exercise videos that pauses when you're idle (it doesn't punish deep focus), and a clamshell mode that keeps your Mac awake with the lid closed — useful for overnight Ollama runs, renders, or agent workflows.

Pricing: $24.50 once with code TIMEX50 (through July 1, normally $49). Free trial is 100 hours of tracking, about 12 work days. No credit card required.

We're bootstrapped and AI-native. We don't have a board to answer to or a growth-at-all-costs mandate. We just want to build software people are happy to pay for once.

If that sounds like your kind of tool: https://gettimex.app
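"The file is yours. Query it" is the whole point of a local SQLite store, so here is a sketch of what that looks like. The schema and column names are invented for illustration; the post doesn't publish Timex's actual schema.

```python
# Hypothetical sketch of local-first time tracking: one row per
# one-second sample in a SQLite database you fully own.
import sqlite3

conn = sqlite3.connect(":memory:")  # a real tracker would use a file on disk
conn.execute("CREATE TABLE samples (ts INTEGER, app TEXT, window_title TEXT)")

# Pretend we sampled the active app once per second for five seconds.
rows = [
    (1, "Terminal", "octomind"),
    (2, "Terminal", "octomind"),
    (3, "Safari", "Hacker News"),
    (4, "Terminal", "octomind"),
    (5, "Safari", "Hacker News"),
]
conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", rows)

# Each row is one second, so COUNT(*) is seconds spent per app.
per_app = conn.execute(
    "SELECT app, COUNT(*) AS seconds FROM samples "
    "GROUP BY app ORDER BY seconds DESC"
).fetchall()
print(per_app)  # [('Terminal', 3), ('Safari', 2)]
```

Because it's plain SQLite, any tool that speaks SQL can aggregate, export, or delete the data with no vendor in the loop.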
-
Claude Code's $100/month Max 5 plan: drained in 1 hour. That's not a bug. It's the business model.

Anthropic admitted they're throttling paid users during peak hours. Developers are reporting rate limits after 15–30 minutes of intensive coding — not during some theoretical spike, during normal work. The official advice? Pay more. Switch to API billing ($3 in / $15 out per million tokens). An 8-hour agentic coding session can burn $50/day.

There's a better answer: stop relying on one provider. Rate limits are physics — finite GPUs, finite capacity. But single-provider architecture makes your engineering velocity hostage to one company's business decisions.

The fix is architecturally simple: session-level multi-provider routing. When Claude throttles, route to DeepSeek. When DeepSeek is slow, route to GPT. When everything's expensive, route to a local model. Same session. Same context. Same memory. Different brain.

Here's the provider hierarchy that actually works:
🔹 Tier 1 (cheap/fast): DeepSeek V4 Flash, local models → file reading, pattern matching, boilerplate, tests
🔹 Tier 2 (capable/affordable): DeepSeek V4 Pro, GPT-5.4, Claude Sonnet → multi-file refactoring, debugging, architecture decisions
🔹 Tier 3 (frontier): GPT-5.5, Claude Opus 4.6 → novel algorithms, deep security audits, expensive failures

Most sessions spend 80% of their time in Tier 1–2. The frontier model should be a special occasion, not a default.

I run local models for routine work and switch to APIs for hard problems. Monthly API bill: $15 instead of $200. Rate limit hits: zero.

Single-provider dependency is a liability. Multi-provider routing is just infrastructure. Code is cheap. Downtime is expensive. Rate limits are optional.

→ github.com/muvon/octomind
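"Same session, different brain" can be pictured with a toy fallback loop. This is a sketch under stated assumptions, not Octomind's code: the provider functions and the `RateLimited` error are placeholders, and real routing would also weigh latency and cost.

```python
# Toy sketch of multi-provider fallback routing within one session.
# Provider names and RateLimited are placeholders, not a real API.
class RateLimited(Exception):
    pass

def route(session, providers, prompt):
    """Try providers in order; the session context survives the switch."""
    for provider in providers:
        try:
            reply = provider(session["context"], prompt)
        except RateLimited:
            continue  # same session, same context, different brain
        session["context"].append((prompt, reply))
        return provider.__name__, reply
    raise RuntimeError("all providers throttled")

def claude(context, prompt):
    raise RateLimited  # simulate peak-hour throttling

def deepseek(context, prompt):
    return f"ok: {prompt}"

session = {"context": []}
used, reply = route(session, [claude, deepseek], "refactor this module")
print(used, reply)  # deepseek ok: refactor this module
```

The key design choice is that context lives in the session, not in any one provider's state, so a throttle costs one retry instead of a lost conversation.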
-
OWASP just released the first-ever Top 10 for AI Agent Security. 100 industry experts, six months of work, and the result is a list every team deploying agents should read twice.

But here's what the report doesn't say explicitly: most of these risks aren't bugs you patch. They're architectural mistakes you make on day one.

The three that show up most in production:

Excessive agency. Your agent has permissions it doesn't need. It can deploy to production, access your database, and send Slack messages — all because the framework assumed more tools = more value. That agent is a security incident waiting to happen.

Insecure output handling. The agent generates code, your system runs it, and nobody checks whether it's safe. Not because the team is careless — because the architecture treats AI output as trustworthy by default. It's not.

Insecure memory. The agent has seen your API keys, database passwords, and private tokens. They're in its context. A carefully crafted prompt can extract them. OWASP calls this "Insecure Memory and Storage" — I call it giving your secrets to a probabilistic text generator.

Most frameworks treat the AI as a trusted component with unlimited access. That architecture is backwards. The AI should be the least trusted component in the system — limited access, limited memory, limited tools, limited execution scope. Everything verified, logged, and reversible.

We've been building Octomind this way from the start. Scoped permissions per session. Sandboxed execution by default. Secrets injected at runtime, never stored in context. Sessions isolated between users. Not because we're paranoid — because it's the only architecture that makes sense when your agent can rewrite your codebase.

Full breakdown of all 10 OWASP risks and what to actually do about them → link in comments.

Has your team run into any of these already? Curious which one surprised you most.
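The "excessive agency" fix is a deny-by-default tool gate. A minimal sketch of the idea, with invented tool names and no relation to Octomind's actual permission model:

```python
# Hypothetical sketch of least-privilege tool gating for an agent session.
ALL_TOOLS = {"read_file", "run_tests", "deploy_prod", "query_db", "send_slack"}

def make_session(allowed):
    """A session only ever sees the tools it was explicitly granted."""
    granted = set(allowed) & ALL_TOOLS

    def call_tool(name, *args):
        if name not in granted:
            # Deny by default: excessive agency is an architecture bug.
            raise PermissionError(f"tool {name!r} not granted to this session")
        return f"ran {name}"

    return call_tool

session = make_session({"read_file", "run_tests"})
print(session("run_tests"))  # ran run_tests
try:
    session("deploy_prod")
except PermissionError as err:
    print(err)
```

The point is structural: the dangerous tool isn't refused at call time by a prompt, it simply doesn't exist inside the session's grant set.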
-
YouTube purged 16 channels this year. 35 million subscribers. Gone. Not for hate speech — for AI slop.

The same problem is happening in codebases everywhere, and it's harder to spot. AI-generated code doesn't have uncanny valley faces. It has perfect docstrings, clean variable names, and logical structure — and it silently fails in production. It catches every exception and returns None. It calls APIs with wrong signatures. It imports libraries that don't exist.

We call it the slop tax: the time you spend sorting through generated output to find what's actually correct.

The root cause is architectural. Agents are trained to produce plausible output, not correct output. There's no verification in that loop — only generation. And as agents get faster, the gap between what they produce and what you can review keeps widening.

At Octomind, we built around one principle: make verification automatic where possible, visible where it's not. Tools that run tests and check compilers mid-session. Deterministic skills that load the right context from the start. Persistent memory so the agent doesn't forget what it was doing and drift.

The teams that win this era won't have the fastest agents. They'll have the best verification.

Curious if you've run into this — what's your current approach to reviewing agent output?

→ https://lnkd.in/eWdYNVit
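One cheap, automatic verification step for the "imports libraries that don't exist" failure mode can be sketched like this. It is illustrative only, with made-up function names; real verification would also run the test suite.

```python
# Minimal sketch of automatic verification of generated Python:
# reject code that doesn't parse or that imports a missing module.
import ast
import importlib.util

def verify(source):
    """Return a list of problems found in generated code."""
    try:
        tree = ast.parse(source)
    except SyntaxError as e:
        return [f"syntax error: {e.msg}"]
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # find_spec returns None when no such module is installed.
                if importlib.util.find_spec(alias.name) is None:
                    problems.append(f"unknown module: {alias.name}")
    return problems

print(verify("import maths_utils_pro\n"))  # flags the made-up import
print(verify("def f(x):\n    return x + 1\n"))  # []
```

A gate like this catches a whole class of slop before any human review time is spent, which is the "automatic where possible" half of the principle above.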
-
We ran the numbers. The result was uncomfortable.

DeepSeek V4 Pro costs $1.74 per million input tokens. GPT-5.5 costs $5.00. For a mid-size SaaS processing 100M tokens daily, that's $29,500/month — before output tokens, where GPT-5.5 charges $30/M vs DeepSeek's $3.48. The models score within a few benchmark points of each other.

So why is almost every AI application still hardcoded to a single provider?

It's not laziness. It's architecture. Prompts tuned to one model's personality. Token counting calibrated to one context window. Error handling built around one failure mode. Switching feels like retesting everything — so teams don't.

But that logic made sense when there was one frontier model. Now there are half a dozen, with price spreads so wide they look like a data error.

At Octomind, we built session-level model routing — not request-level. Because when your agent is refactoring a codebase across 50 tool calls and 10 minutes of work, a gateway that treats every API call as independent can't see the arc of what's happening. The agent can.

The practical rule we've landed on: start cheap, upgrade when stuck, downgrade when unstuck. Most sessions spend 80% of their time in the cheap tier. The savings are real. The quality drop, if any, is smaller than you'd expect.

Model selection isn't an afterthought. It's a first-class engineering decision — and the teams treating it that way are going to have a serious cost advantage.

Full breakdown with pricing table and routing guide → https://lnkd.in/eqHUTZD9
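"Start cheap, upgrade when stuck, downgrade when unstuck" is essentially a tiny state machine. A toy sketch, with placeholder tier names rather than anything from Octomind's config:

```python
# Toy sketch of the start-cheap / upgrade-when-stuck routing rule.
TIERS = ["cheap", "capable", "frontier"]

def next_tier(current, stuck):
    """Move up a tier after a failure, drop back down after a success."""
    i = TIERS.index(current)
    if stuck:
        return TIERS[min(i + 1, len(TIERS) - 1)]  # upgrade, capped at frontier
    return TIERS[max(i - 1, 0)]  # downgrade, floored at cheap

# Two failures escalate to the frontier; two successes walk back down.
tier = "cheap"
for stuck in [False, True, True, False, False]:
    tier = next_tier(tier, stuck)
print(tier)  # cheap
```

Because the rule downgrades on success, the expensive tier is self-correcting: the frontier model handles the hard step, then the session drifts back to the cheap tier on its own.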
-
Everyone's celebrating cheap code. I think we're celebrating the wrong thing.

Drew Breunig published "10 Lessons for Agentic Coding" this week — it hit the front page of Hacker News, and deservedly so. His tenth lesson nails it: "Code is cheap, but maintenance, support, and security aren't." He called agentic code "free as in puppies." The bill comes later.

But here's what most agentic coding advice still misses: when code becomes free, the bottleneck shifts. It's no longer writing code. It's managing context — the design decisions you rejected, the constraints you discovered, the three failed approaches that led to the one that worked.

Most coding agents run in ephemeral sessions. You close the tab, the agent forgets everything. So you rebuild. And you relearn the same lessons. Again.

We've been thinking about this problem at Octomind for a while. Our take: cheap code is worthless without cheap memory. Persistent sessions, adaptive context compression, tool-first architecture — not because it's clever, but because serious engineering requires it.

The developers who thrive in the agentic era won't be the ones who generate the most code. They'll be the ones who generate the right code, keep track of why they made each decision, and build systems their agents can actually reason about.

We wrote up our full response to Breunig's lessons — what we agree with, what we'd push further, and what the current generation of tools still gets wrong. Worth a read if you're building seriously with AI agents → https://lnkd.in/eYZBiD6K
-
Last week, millions of people opened their laptops and discovered a 4-gigabyte file they never downloaded. It wasn't malware. It was Chrome.

Google has been silently shipping Gemini Nano — a local AI model — to user devices as part of routine browser updates. No consent dialog. No clear opt-out. Delete the files, and Chrome re-downloads them automatically.

Then it got worse. Palo Alto Networks disclosed CVE-2026-0628: a high-severity vulnerability in Chrome's new Gemini panel. A malicious extension could hijack the panel to access your camera, microphone, and local files — all through an official Google UI.

Google patched it in January. But the pattern is more troubling than the bug: new AI surface area, rushed to billions of users, with security review that missed a flaw this severe.

This isn't just a Chrome problem. It's a preview of how every major platform plans to handle on-device AI. Microsoft is weaving Copilot into Windows at the OS level. The trajectory is clear: make their AI the default, the unavoidable, the opt-out-not-opt-in.

Local processing is sold as privacy-friendly — and it can be. But only when you actually control what's running and why.

The alternative is straightforward. Open-source tools like Ollama make self-hosted models genuinely practical. A 7B parameter model fits in the same ~4GB footprint as Gemini Nano. The difference is you chose to install it, you know exactly what it is, and you can remove it with one command.

We built Octomind on this principle: an open-source agent runtime that connects to any model — local or remote — with zero hidden infrastructure, no telemetry you can't audit, and no 4GB payload arriving unannounced.

Your machine. Your models. Your choice. That shouldn't be radical. But right now, it kind of is.

If you're running local AI in production, how are you handling model provenance and update control? Curious what's working for teams who've made the switch.
-
We just shipped Octobrain 0.6.0 — and it solves a problem every AI agent workflow hits.

Most AI memory tools can retrieve a chunk. But when your agent needs the full spec, the complete PDF, or every line matching a specific error code — you're stuck copying and pasting.

Three things that changed:

1️⃣ Read command — feed it a URL or file path, get the full text back. HTML, PDF, DOCX, plain text. No chunking, no summarization.

2️⃣ Regex match — grep across your entire indexed knowledge base. "Find every line with error_code or timeout" just works.

3️⃣ Streaming queries — no more memory spikes or arbitrary row caps on large indexes.

The knowledge layer (external sources) and memory layer (accumulated context) now work together cleanly.

🔗 Full release notes: https://lnkd.in/egrSFZRc

What's your biggest pain point with AI agent memory right now?
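The regex-match feature can be pictured with a toy grep over an indexed corpus. The index here is a plain dict and `regex_match` is an invented name; the post doesn't describe Octobrain's real storage.

```python
# Toy sketch of regex matching across an indexed knowledge base:
# return every indexed line, with its source, that matches a pattern.
import re

index = {
    "deploy.md": ["set timeout to 30s", "retry on error_code 502"],
    "notes.txt": ["lunch at noon", "bump timeout for slow hosts"],
}

def regex_match(pattern):
    """Return (source, line) pairs for every indexed line that matches."""
    rx = re.compile(pattern)
    return [
        (source, line)
        for source, lines in index.items()
        for line in lines
        if rx.search(line)
    ]

hits = regex_match(r"error_code|timeout")
print(hits)  # three matching lines, with their source files
```

Unlike embedding retrieval, this is exact and exhaustive: every matching line comes back, which is what "find every line with error_code or timeout" actually requires.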