CocoIndex


Technology, Information and Internet

Continuously turns any data into clean, structured, fresh context for AI, with an ultra-performant incremental engine.

About us

𝐂𝐨𝐧𝐭𝐢𝐧𝐮𝐨𝐮𝐬𝐥𝐲 turns any data (PDFs, codebases, emails, screenshots, meeting notes, etc.) into 𝐜𝐥𝐞𝐚𝐧, 𝐬𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞𝐝, 𝐟𝐫𝐞𝐬𝐡 𝐜𝐨𝐧𝐭𝐞𝐱𝐭 𝐟𝐨𝐫 𝐀𝐈: always up to date, at any scale, and ultra-performant thanks to its incremental engine. Hot examples: https://cocoindex.io/docs/examples Open-source repo: https://github.com/cocoindex-io/cocoindex Star it and let's go 🚀🚀!

Website
https://cocoindex.io
Industry
Technology, Information and Internet
Company size
2-10 employees
Headquarters
San Francisco
Type
Privately Held

Updates

  • CocoIndex reposted this

    🎉 Check out the Python library: CocoIndex Code. 👉 10k GitHub stars 👉 897k downloads 👉 License: Apache 2.0 (permissive for commercial use) 🤗 What is it? It creates an AST (Abstract Syntax Tree) aware semantic index that AI agents use to look up parts of the code, narrowing down the problem of finding the correct context. When you use an editor like VS Code, it relies on language servers to suggest what comes after you type “<class>.”, and AST-based search is what helps there. In simple English: every programming language has a grammar and keywords. The AST is built from knowledge of the language's structure/grammar, and it is a reproducible, pre-GenAI technology. The AST-aware semantic index essentially creates a lookup index the AI can use to search your code, much as one would look up words in a dictionary/thesaurus. Since this minimizes unnecessary guesswork, it also helps reduce token usage. 🦋 My personal perspective on AI model strategy: AI companies will raise the cost of token usage (it’s a when, not an if), and the AI bill is going to be painful unless your AI strategy takes a hybrid approach: local smaller models for simple tasks, self-hosted on-demand servers for big (~1T-parameter) open-weights models (the hedge against closed proprietary models), and finally closed frontier models for very ambitious tasks. AI access should be viewed like an investment strategy: not all eggs should be placed in the same basket, and hedging is necessary for a viable outcome where no proprietary model provider gets to twist your arm. True privacy is when you thought silently, wrote nothing, said nothing, and just acted on it. LLM traces, prompts, and the product: unless all of this is accessible to you and only you, your thoughts aren’t yours. Your second brain isn’t yours. ⚡️ pip install -U cocoindex ✨ Homepage: https://cocoindex.io/ 🎁 GitHub: https://lnkd.in/gVZ9y_Fp 🍀 PyPI: https://lnkd.in/gnwzhXp2 💡 What are ASTs? How do language servers use ASTs? Check out an example and a thorough blog post in the comments section. DISCLAIMER: All the views presented here are my own or already openly available. They do not represent in any form any of my past or present employers/clients/stakeholders. #codeindexing #ai #aiagents #ast #contextengineering #python #rust #privacy #personalai #aimodelsstrategy
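Neither post shows CocoIndex Code's internals. As a rough illustration of what an "AST-aware" index buys, the sketch below uses Python's standard-library ast module as a stand-in for Tree-sitter (which covers many languages) to cut source into whole class and function chunks, then ranks those chunks with naive keyword matching as a stand-in for the embedding-based semantic search a real index would use. Symbol, index_source, and lookup are hypothetical names, not CocoIndex's API.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    name: str    # qualified name, e.g. "Cart.total"
    kind: str    # "class" or "function"
    start: int   # first line of the definition (1-based)
    end: int     # last line of the definition (inclusive)
    source: str  # exact source text of the definition


def index_source(code: str) -> list[Symbol]:
    """Split Python source into class/function chunks along AST boundaries."""
    tree = ast.parse(code)
    symbols: list[Symbol] = []

    def visit(node: ast.AST, prefix: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                kind = "class" if isinstance(child, ast.ClassDef) else "function"
                qualname = prefix + child.name
                segment = ast.get_source_segment(code, child) or ""
                symbols.append(Symbol(qualname, kind, child.lineno, child.end_lineno, segment))
                visit(child, qualname + ".")  # nested definitions (methods, inner defs)

    visit(tree, "")
    return symbols


def lookup(symbols: list[Symbol], query: str) -> list[Symbol]:
    """Rank chunks by how many query words they contain; smaller chunks win ties."""
    words = query.lower().split()
    scored = []
    for sym in symbols:
        text = (sym.name + " " + sym.source).lower()
        score = sum(word in text for word in words)
        if score:
            scored.append((score, sym))
    scored.sort(key=lambda pair: (-pair[0], pair[1].end - pair[1].start))
    return [sym for _, sym in scored]


SAMPLE = '''
class Cart:
    def __init__(self):
        self.items = []

    def total(self):
        return sum(price for _, price in self.items)

def apply_discount(amount, rate):
    return amount * (1 - rate)
'''

symbols = index_source(SAMPLE)
print([s.name for s in symbols])  # ['Cart', 'Cart.__init__', 'Cart.total', 'apply_discount']
print(lookup(symbols, "total price")[0].name)  # Cart.total
```

Because chunks follow definition boundaries, a query returns a whole method such as Cart.total rather than an arbitrary window of lines, and ties go to the smaller chunk so the agent reads less.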

    André Lindenberg

    Agents, Graphs, Ontologies

    CocoIndex Code helps coding agents look up the right code before they start opening files. It builds a local AST-aware semantic index, so Claude Code, Codex, Cursor, or OpenCode can retrieve relevant functions and classes instead of scanning raw files. Tree-sitter preserves code boundaries; local refresh keeps context current. Practical gain: fewer blind reads, less token waste, better first edits. 10k stars, Apache-2.0. #MCP #CodingAgents #ContextEngineering #TreeSitter #AIInfra
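The post only says that "local refresh keeps context current". One common way to get that behaviour, assumed here rather than taken from CocoIndex, is to fingerprint each file's contents and re-chunk only the files whose hash changed, dropping chunks for files that disappeared. LocalIndex, refresh, and split_blank_lines are hypothetical; a real setup would chunk with Tree-sitter and re-embed only the affected chunks.

```python
import hashlib
from typing import Callable


def fingerprint(text: str) -> str:
    """Content hash used to decide whether a file needs re-chunking."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class LocalIndex:
    """Keeps per-file chunks in sync with a snapshot {path: contents}."""

    def __init__(self, chunker: Callable[[str], list[str]]):
        self.chunker = chunker
        self.fingerprints: dict[str, str] = {}
        self.chunks: dict[str, list[str]] = {}

    def refresh(self, files: dict[str, str]) -> dict[str, int]:
        stats = {"added": 0, "updated": 0, "deleted": 0, "unchanged": 0}
        for path, text in files.items():
            fp = fingerprint(text)
            previous = self.fingerprints.get(path)
            if previous == fp:
                stats["unchanged"] += 1  # skip: nothing to re-chunk or re-embed
                continue
            stats["added" if previous is None else "updated"] += 1
            self.fingerprints[path] = fp
            self.chunks[path] = self.chunker(text)
        for path in set(self.fingerprints) - set(files):  # files that disappeared
            del self.fingerprints[path]
            del self.chunks[path]
            stats["deleted"] += 1
        return stats


chunked: list[str] = []  # records every text handed to the chunker


def split_blank_lines(text: str) -> list[str]:
    """Placeholder chunker; a real one would split on Tree-sitter nodes."""
    chunked.append(text)
    return [part for part in text.split("\n\n") if part.strip()]


index = LocalIndex(split_blank_lines)
first = index.refresh({"a.py": "x = 1", "b.py": "y = 2"})
second = index.refresh({"a.py": "x = 1", "b.py": "y = 3"})  # only b.py changed
third = index.refresh({"b.py": "y = 3"})                     # a.py was deleted
print(second)  # {'added': 0, 'updated': 1, 'deleted': 0, 'unchanged': 1}
```

Across three refreshes the chunker runs only three times (two initial files plus one edit), which is the "fewer blind reads" effect at file granularity.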

  • CocoIndex will be at the Databricks Data & AI Summit next week! 30,000+ data engineers, AI practitioners, and builders at Moscone Center in SF, the room where the conversation about what's next in AI happens. We'll be sharing the insights we've learned building production infrastructure for AI. Agents run ~50x faster than humans and generate massive amounts of unstructured data. You cannot fit all of it into the agent; you need to build a view over a large unstructured corpus. CocoIndex is a context engine for agents: a new computing paradigm over evolving unstructured assets and multi-stage processing. You declare the state of your target as a function of the source; the engine keeps it in sync with minimal reprocessing, across time horizons, at scale. We'll also talk about how we fit into the ecosystem, bridging the gap between dynamic unstructured data and structured data to close the loop for responsible agents in production. And we'll give a brief demo: when you have a large enterprise monorepo or corpus with 1% changing every day, how do you build a data pipeline that continuously surfaces fresh insights to your coding agents? Come find us (Linghua Jin, Jiangzhou H.) for a casual coffee and to exchange ideas! 📍 Moscone Center · June 15–18 · San Francisco #EnterpriseAI #AI #Databricks #DataAISummit #DataInfrastructure #AIAgent #AIInfrastructure
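The post describes the model only at a high level: declare the target as a function of the source and let the engine keep it in sync. A minimal sketch of that idea, assuming a two-stage pipeline (split each document into paragraphs, then run an expensive per-paragraph step such as an embedding or LLM call), is below. The expensive step is memoized by content hash, and the target is reconciled against the declared state on every sync. TwoStagePipeline, sync, and fake_embed are hypothetical and are not CocoIndex's API; a production engine would also need to persist this state and push changes to external stores.

```python
import hashlib
from typing import Callable


def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class TwoStagePipeline:
    """Declared target: {(doc_id, i): expensive(paragraph_i) for every doc}.

    Stage 1 (splitting) is cheap and reruns on every sync. Stage 2 (the
    expensive step) is memoized by paragraph content hash, so only new or
    edited paragraphs are reprocessed. The memo never evicts in this sketch.
    """

    def __init__(self, expensive: Callable[[str], str]):
        self.expensive = expensive
        self.memo: dict[str, str] = {}                # paragraph hash -> output
        self.target: dict[tuple[str, int], str] = {}  # (doc id, paragraph no.) -> output

    def sync(self, docs: dict[str, str]) -> None:
        desired: dict[tuple[str, int], str] = {}
        for doc_id, text in docs.items():
            paragraphs = [p for p in text.split("\n\n") if p.strip()]
            for i, para in enumerate(paragraphs):
                h = digest(para)
                if h not in self.memo:
                    self.memo[h] = self.expensive(para)
                desired[(doc_id, i)] = self.memo[h]
        # Reconcile the target with the declared state: delete, then upsert diffs.
        for key in set(self.target) - set(desired):
            del self.target[key]
        for key, value in desired.items():
            if self.target.get(key) != value:
                self.target[key] = value


calls: list[str] = []


def fake_embed(paragraph: str) -> str:
    """Stand-in for an LLM or embedding call; records each invocation."""
    calls.append(paragraph)
    return paragraph.upper()


pipeline = TwoStagePipeline(fake_embed)
corpus = {f"doc{i}": f"intro {i}\n\nbody {i}" for i in range(100)}
pipeline.sync(corpus)                          # 200 paragraphs -> 200 calls
corpus["doc7"] = "intro 7\n\nbody 7 (edited)"  # 1 of 100 documents changes
pipeline.sync(corpus)                          # 1 more call: only the edited paragraph
```

Editing one document triggers exactly one extra call, yet the target still equals what a full recomputation over the corpus would produce, which is the property "declare state, let the engine sync" is after.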

  • CocoIndex reposted this

    🧰 Harness Engineering is quickly becoming one of the hottest topics in Agentic AI. A few companies are just talking about Harness Engineering at conferences. However, some have actually started building/evaluating/optimising their own harnesses. We are covering the latter at this event in San Francisco 🌉 on 29th June at AWS Builder Loft. 💡 Harness Engineering: State of the Art in Agent Harnesses. At the event we'll cover Harness Engineering from a very different angle, including hidden parts of it that you probably haven't heard about before. 🦾 We're covering: 🧪 Harness evaluation by Dat Daryl Ngo (Arize AI) 🎛️ HyperParallel Experimentation of Harnesses by Arun Kumar (RapidFire AI) ⏲️ Harness Optimization by Myeongsoo Kim (Amazon Web Services (AWS)) 🥥 Fresh Data for Coding Agent by Linghua Jin (CocoIndex) ✍️ We've already crossed 75+ registrations, and seats are filling up quickly. If you're building AI agents and want to learn what production-grade Harness Engineering actually looks like, reserve your spot now. 👉 https://luma.com/rtd0f6ka Hope to see you at the AWS Builder Loft in San Francisco 🌉 #HarnessEngineering #AgentEngineering #AgenticAI

  • CocoIndex reposted this

    🔥 June is going to be HUGE for the Agentic AI Engineering community. I'm excited to be hosting two special events on two emerging disciplines that are rapidly shaping the future of Agentic AI and Agent Engineering, one in London 🇬🇧 and one in San Francisco 🌉! 🇬🇧 Agentic Engineering with Alibaba Cloud: London Agentic AI 📍 Tessl London 📅 June 25 🎙️ Speakers from Alibaba Cloud, Recombine, and Tomoro 👉 https://luma.com/hdygxmyx 🌉 Harness Engineering: State of the Art in Agent Harnesses: San Francisco 📍 AWS Builder Loft 📅 June 29 🎙️ Speakers from Arize AI, Amazon Web Services (AWS), RapidFire AI, and CocoIndex 👉 https://luma.com/rtd0f6ka 🙏 Spaces are filling fast, so don't wait until the last moment. RSVP now! #AgenticEngineering #HarnessEngineering #AIAgents #AIEngineering

  • In production, the hard part of AI is usually the data, not the model. The context an agent reads goes stale between runs. 𝐈𝐭 𝐧𝐞𝐞𝐝𝐬 𝐚 𝐟𝐫𝐞𝐬𝐡 𝐝𝐚𝐭𝐚 𝐯𝐢𝐞𝐰, 𝐜𝐨𝐧𝐭𝐢𝐧𝐮𝐨𝐮𝐬𝐥𝐲, 𝐭𝐡𝐫𝐨𝐮𝐠𝐡𝐨𝐮𝐭 𝐭𝐡𝐞 𝐫𝐮𝐧. CocoIndex's incremental engine is built for that. Since v1.0 we've shipped seven releases (1.0.1 to 1.0.7), focused on operating that engine in production:
    • 𝐏𝐞𝐫-𝐚𝐫𝐠𝐮𝐦𝐞𝐧𝐭 𝐦𝐞𝐦𝐨𝐢𝐳𝐚𝐭𝐢𝐨𝐧: keep clients, loggers, and debug flags out of the cache key, so a stable handle never busts cached LLM calls.
    • 𝐒𝐜𝐡𝐞𝐝𝐮𝐥𝐞𝐝 𝐥𝐢𝐯𝐞 𝐫𝐞𝐟𝐫𝐞𝐬𝐡: coco.auto_refresh turns "poll this source every few minutes" into a first-class live component.
    • 𝐒𝐜𝐨𝐩𝐞𝐝 𝐬𝐭𝐚𝐭𝐬: break the adds / reprocesses / deletes counts down by data slice (per tenant, project, or folder), so you can see what each slice is doing: growth, churn, or an unexpected reprocess.
    • 𝐏𝐚𝐫𝐚𝐥𝐥𝐞𝐥 𝐞𝐧𝐭𝐢𝐭𝐲 𝐫𝐞𝐬𝐨𝐥𝐮𝐭𝐢𝐨𝐧: resolve independent graph components concurrently, not in one global pass.
    • 𝐍𝐞𝐰 𝐜𝐨𝐧𝐧𝐞𝐜𝐭𝐨𝐫𝐬: Turbopuffer, Neo4j, FalkorDB, OCI Object Storage, Apache Iggy, plus eight new languages for the code splitter.
    Underneath sits a hardened core: SQL injection paths are closed, and concurrency races that only surface under real database latency are fixed. #AIEngineering #KnowledgeGraph #Python #Knowledge #Rust #EnterpriseAI #AIAgents
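The release note says per-argument memoization keeps clients, loggers, and debug flags out of the cache key, but it does not show the syntax. The decorator below is a library-agnostic sketch of the same idea, not CocoIndex's API: memoize(ignore=...), FakeLLMClient, summarize, and the model names are all hypothetical. It binds each call's arguments by name, drops the ignored ones, and uses the rest (which must be JSON-serializable in this sketch) as the cache key.

```python
import functools
import inspect
import json
from typing import Any, Callable


def memoize(ignore: tuple[str, ...] = ()):
    """Cache results, leaving the named parameters out of the cache key."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        sig = inspect.signature(fn)
        cache: dict[str, Any] = {}

        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            # Key on every argument except handles like clients, loggers, debug flags.
            key_args = {k: v for k, v in bound.arguments.items() if k not in ignore}
            key = json.dumps(key_args, sort_keys=True)  # TypeError if not JSON-serializable
            if key not in cache:
                cache[key] = fn(*args, **kwargs)
            return cache[key]

        wrapper.cache = cache  # exposed so the example can inspect it
        return wrapper
    return decorator


class FakeLLMClient:
    """Stand-in for an SDK client; a new instance is created each run."""

    def __init__(self) -> None:
        self.calls = 0

    def complete(self, prompt: str) -> str:
        self.calls += 1
        return f"summary of: {prompt}"


@memoize(ignore=("client", "debug"))
def summarize(text: str, model: str, client: FakeLLMClient, debug: bool = False) -> str:
    return client.complete(f"[{model}] {text}")


client_a, client_b = FakeLLMClient(), FakeLLMClient()
r1 = summarize("quarterly report", "small-model", client_a)
r2 = summarize("quarterly report", "small-model", client_b, debug=True)  # hit: only ignored args differ
r3 = summarize("quarterly report", "large-model", client_b)              # miss: model is in the key
```

Swapping the client and turning on the debug flag reuses the cached result, while changing the model (which does affect the output) correctly forces a new call.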

  • CocoIndex reposted this

    CocoIndex turns codebases, meeting notes, inboxes, Slack, PDFs, and videos into live, continuously fresh context for your AI agents and LLM apps to reason over effectively, with minimal incremental processing. Get your production AI agent ready in 10 minutes with reliable, continuously fresh data: no stale batches, no context gap.

  • You declare state, not messages. We joined CocoIndex contributor Srihari Thyagarajan, organizer of Confluent's Apache Kafka meetup in Chennai, for a session on CocoIndex's recent work on real-time data for AI. The combination of CocoIndex and Kafka treats the knowledge layer the same way operational data has been handled for years: as a stream of change events rather than a snapshot to be re-read. Drives, repos, design files, wikis, PDFs, and file shares (the unstructured data that has traditionally lived outside the streaming world) can be published to the same event backbone that already carries orders, clicks, and CDC traffic. The benefits show up in several places:
    • More efficient AI workloads. Embeddings, retrievals, and agent context are refreshed only when something has actually changed, which reduces redundant work and improves freshness at the same time.
    • A single change reaches every consumer. A commit, a renamed Drive document, or a Notion edit can update the vector index, notify an agent, update search, feed a Flink job, and land in a BI tile, without any of those systems needing to know about each other.
    • Easier extensibility. A new agent, a rebuilt knowledge layer, or a compliance tool can be added as another subscriber to the topic, with the log providing replay so it sees historical changes the same way it sees new ones.
    • Better auditability. Each change consumed by an agent is durably recorded with offsets and timestamps, which makes it possible to answer questions like “did the agent see the updated policy before it acted?” with concrete evidence.
    • A stable contract over time. The change-event schema on the topic provides a stable interface between sources and consumers. Detectors, sources, and models can evolve independently while the wire format stays consistent.
    Thanks to Srihari Thyagarajan, Ena Koide, and Confluent for hosting the event and sharing the project!
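The post describes the pattern rather than the connector's API. The toy, in-memory model below illustrates the log semantics it relies on: an append-only sequence of change events with offsets and timestamps, consumers that track their own position, replay for a subscriber added later, and an offset-based answer to "did the agent see the updated policy?". ChangeEvent, ChangeLog, and Consumer are hypothetical; this does not use Kafka's client API or CocoIndex's Kafka connector, and a real topic adds partitions, retention, and durable storage.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    offset: int             # position in the log, assigned on append
    key: str                # e.g. a file path or document id
    op: str                 # "upsert" or "delete"
    payload: Optional[str]  # new content for upserts, None for deletes
    timestamp: float


@dataclass
class ChangeLog:
    """Append-only, in-memory stand-in for a single-partition topic."""
    events: list[ChangeEvent] = field(default_factory=list)

    def publish(self, key: str, op: str, payload: Optional[str] = None) -> int:
        event = ChangeEvent(len(self.events), key, op, payload, time.time())
        self.events.append(event)
        return event.offset

    def read(self, from_offset: int = 0) -> list[ChangeEvent]:
        return self.events[from_offset:]


class Consumer:
    """Materializes the latest value per key and tracks its own next offset."""

    def __init__(self, log: ChangeLog) -> None:
        self.log = log
        self.next_offset = 0
        self.state: dict[str, Optional[str]] = {}

    def poll(self) -> int:
        events = self.log.read(self.next_offset)
        for event in events:
            if event.op == "delete":
                self.state.pop(event.key, None)
            else:
                self.state[event.key] = event.payload
        self.next_offset += len(events)
        return len(events)


log = ChangeLog()
search_index = Consumer(log)
log.publish("wiki/policy.md", "upsert", "v1: approvals need one reviewer")
log.publish("drive/roadmap.doc", "upsert", "Q3 plan")
search_index.poll()
log.publish("wiki/policy.md", "upsert", "v2: approvals need two reviewers")
log.publish("drive/roadmap.doc", "delete")
search_index.poll()

# A subscriber added later replays from offset 0 and converges to the same state.
compliance_tool = Consumer(log)
compliance_tool.poll()

# Audit: had search_index consumed the v2 policy event?
v2_offset = next(e.offset for e in log.read()
                 if e.key == "wiki/policy.md" and e.payload.startswith("v2"))
saw_v2 = search_index.next_offset > v2_offset
```

Because every consumer derives its state only from the log, a late subscriber needs no special backfill path, and the committed offset doubles as evidence of what each consumer had seen.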

  • CocoIndex reposted this

    Spoke yesterday at Confluent's Apache Kafka meetup in Chennai, hosted at Facilio, which I co-organized with Ena Koide and Sai K. (A few folks came up wanting to volunteer for the next one, so the organizing crew might be a bit bigger the next time I post about this!!) My talk was based on CocoIndex’s new Kafka connector. The idea was to keep it simple: changing files / knowledge sources should be able to become Kafka records without every downstream consumer rebuilding its own integration logic. I contribute to CocoIndex, so it was a good reason to bring it to a Kafka room. Also really enjoyed Mani Selvan K’s session on Apache Flink and stream processing fundamentals. Wrote a short reflection post on the event here: https://lnkd.in/gkFdPJHe Slides are here if you want to check them out: https://lnkd.in/gCRhJDJD

  • CocoIndex reposted this

    Caught a VeloDB meetup last week at the Trellis coworking space on Mission in SF. Yuankai (Kevin) Shen walked through how VeloDB handles observability use cases. If you haven't looked at this database seriously, you should. It's built on Apache Doris and it is fast. Think more people in the data community should be talking about it. Then Linghua Jin from CocoIndex did a live demo. The guy sitting next to me was SOLD — he'd been working on the same problem that she'd already solved. Always fun when a meetup leaves people actually excited. #VeloDB #ApacheDoris #DataEngineering #Observability #SanFrancisco
