📐 View Interactive Architecture Diagrams | 💻 TypeScript Implementation Guide | 🚀 Quick Start | 📚 Architecture Diagram Docs
The Sentient Framework is now fully implemented in TypeScript using LangChain.js, Google Gemini, and modern AI tools.
Quick Start:
npm install
cp env.example .env
# Add your OPENAI_API_KEY and GOOGLE_API_KEY to .env
npm run example:full
Documentation:
- 📖 Implementation Guide - Complete TypeScript implementation docs
- 🚀 Quick Start Guide - Get running in 5 minutes
- 🤝 Contributing Guide - How to contribute
- 📐 Architecture Diagrams - Interactive C4 diagrams
What's Included:
- ✅ Multi-Modal RAG (Document, Graph, Keyword)
- ✅ Hybrid Retrieval with RRF
- ✅ EXIT-inspired Context Compression
- ✅ Agentic Orchestration
- ✅ MCP & A2A Protocol Services
- ✅ Complete Examples & Tools
- ✅ Production-Ready Server
This initial section establishes the foundational philosophy and high-level structure of the Sentient framework. It defines the core principles and data flow, providing a strategic overview before delving into the specific components and their implementation. The architecture is designed to address the complex, multifaceted nature of information retrieval and reasoning required by modern AI systems.
The development of sophisticated AI applications necessitates a paradigm shift from simple prompt engineering to a more rigorous discipline: Context Engineering. This discipline involves the systematic design of systems that can dynamically discover, retrieve, filter, compress, and structure information to provide Large Language Models (LLMs) with the precise context required to perform complex tasks. The Sentient framework is conceived not as a linear Retrieval-Augmented Generation (RAG) pipeline, but as an advanced, agent-driven system for mastering this entire lifecycle of context. It is an architecture for building intelligent systems that understand the nuances of information, whether it resides in unstructured documents, structured codebases, or the collaborative dialogue between specialized agents.
The Sentient framework is built upon three core architectural pillars, each addressing a fundamental limitation of conventional RAG systems. Together, they form a robust foundation for building context-aware, reasoning-capable AI applications.
- Multi-Modal Retrieval: The initial wave of RAG systems primarily focused on semantic search over unstructured text. While powerful, this approach is fundamentally incomplete. Information possesses multiple dimensions: semantic meaning, structural relationships, and lexical specificity. A truly effective context engine must therefore operate across all these modalities. Sentient achieves this by integrating three distinct retrieval strategies: semantic search for conceptual understanding (Document RAG), structural analysis for relational context (Graph RAG), and lexical matching for precision (Keyword RAG). This multi-modal approach ensures that the framework can retrieve the most relevant information, regardless of how it is encoded.
- Agentic Orchestration: Static, predefined workflows, such as LangChain's original "chains," are too rigid for the dynamic and unpredictable nature of complex problem-solving. The future of AI lies in agentic systems—autonomous entities that can reason, plan, use tools, and collaborate to achieve goals. Sentient places an agentic core at its heart, moving beyond static retrieval to dynamic action. This core is responsible for decomposing complex user requests into manageable subtasks, invoking external tools and services to gather information or perform actions, and coordinating the efforts of multiple specialized agents to synthesize a comprehensive solution.
- Contextual Optimization: The context windows of even the most advanced LLMs are a finite and valuable resource. Flooding this window with raw, unfiltered retrieved data introduces noise, increases processing latency, drives up operational costs, and can degrade the model's ability to focus on the most salient information.1 Contextual optimization is therefore not an optional refinement but a critical necessity. Sentient incorporates intelligent context compression as a final, crucial step before generation. By employing efficient extractive techniques, the framework filters out irrelevant information and condenses the context to its most potent form, ensuring the LLM receives a signal of the highest possible quality.
The Sentient architecture is organized into three distinct layers, promoting modularity, separation of concerns, and clear data flow. This layered design facilitates independent development, testing, and scaling of each component.
- Service Layer: This is the outermost layer, providing the framework's external interfaces. It exposes all of Sentient's capabilities to the outside world through standardized protocols.
- MCP Service: A Model Context Protocol (MCP) server that allows agents and applications to discover and interact with Sentient's internal tools (e.g., the code analysis engine) in a standardized, interoperable manner.3
- A2A Service: An Agent-to-Agent (A2A) protocol server that functions as a message bus, enabling seamless communication and task delegation between Sentient's internal agents and potentially with external, third-party agents.4
- Agentic Core: This is the central processing and reasoning layer—the "brain" of the Sentient framework. It orchestrates the entire workflow from query ingestion to final response generation.
- OrchestratorAgent: The primary agent that receives user requests, decomposes them into subtasks, and delegates them to specialized agents via the A2A Service.
- ToolManager: Manages the lifecycle and execution of tools, interacting with the MCP Service to expose them and handling function calls from the LLM.
- MessageBus: An internal implementation that routes messages and tasks between agents, built upon the A2A protocol.
- Context Engine Layer: This is the foundational layer responsible for all information retrieval tasks. It abstracts the complexities of different data sources and retrieval methods, providing a unified interface to the Agentic Core.
- DocumentRAGEngine: Handles retrieval from unstructured and semi-structured sources like PDFs, text files, and web pages using semantic vector search.
- GraphRAGEngine: Specializes in retrieving context from structured sources with explicit relationships, most notably code repositories, by analyzing their dependency graphs.
- KeywordRAGEngine: Provides high-precision lexical search capabilities, essential for finding exact matches, identifiers, or acronyms.
This architecture follows the current trajectory of AI development. The explicit separation of specialized retrieval engines (Document RAG, Graph RAG) from standardized communication protocols (MCP, A2A) is a deliberate choice: as AI systems grow in complexity, the monolithic, single-agent model becomes untenable.6 The industry is converging on a "society of agents" model, where complex problems are solved by a collection of specialized agents collaborating toward a common goal.8
This trend is observable even at a micro level. The experimental "Architect/Editor" pattern within the Aider framework, for instance, demonstrates that separating the "reasoning" task (Architect agent) from the "code editing" task (Editor agent) yields superior results for a single coding problem.9 This principle of specialization is the philosophical cornerstone of Sentient's design. The framework is not merely "multi-agent"; it is an ecosystem architected from the ground up to foster and leverage interoperable specialization. Each component is designed to do one thing exceptionally well, and the service layers (MCP and A2A) provide the universal language that allows these specialized components to collaborate effectively.
To manage the flow of information and state across its distributed components, the Sentient framework relies on a central data structure and a set of core programmatic interfaces. These abstractions ensure consistency, enforce contracts between modules, and provide a clear, traceable path for every query that enters the system.
The SentientPayload is the canonical data object that is passed between the various layers and agents within the framework. It serves as a comprehensive container for all information related to a single user request, evolving as it progresses through the system. Its structure is designed to be extensible and serializable, allowing it to be transmitted over the A2A message bus.
A conceptual definition of the SentientPayload includes:
- query: The original, unmodified user query string.
- sessionId: A unique identifier for the conversation or session, enabling stateful interactions.
- history: An array of previous message turns, providing conversational context.
- state: A flexible object representing the current state of the agentic workflow, including the decomposed task list, dependencies, and intermediate results.
- contextFragments: An array of retrieved information snippets. Each fragment includes its content, source, retrieval score, and the engine that produced it (e.g., 'DocumentRAG', 'GraphRAG').
- compressedContext: The final, optimized context string generated after the compression stage.
- finalOutput: The final response generated by the LLM for the user.
- metadata: Additional metadata, including timestamps, performance metrics, and tracing information.
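The field list above can be written down as a TypeScript type. The following is a minimal sketch, not the framework's actual definition: the nested shapes (`MessageTurn`, `ContextFragment`, the `EngineName` union) and the `createPayload` helper are assumptions introduced for illustration.

```typescript
// Hypothetical shapes inferred from the field list; nested field names are assumptions.
type EngineName = "DocumentRAG" | "GraphRAG" | "KeywordRAG";

interface MessageTurn {
  role: "user" | "assistant";
  content: string;
}

interface ContextFragment {
  content: string;
  source: string;
  score: number;
  engine: EngineName;
}

interface SentientPayload {
  query: string;
  sessionId: string;
  history: MessageTurn[];
  state: Record<string, unknown>;
  contextFragments: ContextFragment[];
  compressedContext?: string; // filled in after the compression stage
  finalOutput?: string; // filled in after generation
  metadata: Record<string, unknown>;
}

// Build the initial payload for a new request (illustrative helper).
function createPayload(query: string, sessionId: string): SentientPayload {
  return {
    query,
    sessionId,
    history: [],
    state: {},
    contextFragments: [],
    metadata: { createdAt: new Date().toISOString() },
  };
}

// The payload holds only plain data, so it survives a JSON round trip.
const payload = createPayload("Analyze the RepoMapper codebase", "session-1");
payload.contextFragments.push({
  content: "class RepoMap { ... }",
  source: "repomap_class.py",
  score: 0.87,
  engine: "GraphRAG",
});
const roundTripped: SentientPayload = JSON.parse(JSON.stringify(payload));
```

Keeping every field as plain JSON-compatible data is what makes the payload serializable for transport between agents.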
To illustrate the data flow, consider a complex, multi-step query: "Analyze the RepoMapper codebase, identify performance bottlenecks in the graph ranking algorithm, and suggest optimizations in the associated README.md."
The lifecycle of this query through the Sentient framework would proceed as follows:
- Ingestion: The query is received by the Service Layer and packaged into an initial SentientPayload. This payload is passed to the OrchestratorAgent in the Agentic Core.
- Task Decomposition: The OrchestratorAgent analyzes the query and decomposes it into a series of subtasks with dependencies:
- Task 1: Retrieve the structure and key components of the RepoMapper codebase.
- Task 2: Locate the specific implementation of the graph ranking algorithm.
- Task 3: (Depends on 2) Analyze the algorithm for performance bottlenecks.
- Task 4: Retrieve the current README.md content related to performance.
- Task 5: (Depends on 3 & 4) Synthesize the findings and generate optimized code and updated documentation.
- Delegation & Context Retrieval: The OrchestratorAgent uses the A2A Service to delegate these tasks to specialized agents.
- It sends Task 1 and 2 to a CodingAgent. The CodingAgent invokes the GraphRAGEngine to get a repository map, identifying importance.py and repomap_class.py as key files.3 The retrieved code snippets are added to the contextFragments in the payload.
- It sends Task 4 to a DocumentationAgent. The DocumentationAgent invokes the DocumentRAGEngine to retrieve the relevant sections from the README.md file. This content is also added to the contextFragments.
- Tool Use: For Task 3, the OrchestratorAgent might determine that static analysis is insufficient. It could instruct an agent to use an external performance profiling tool (e.g., cProfile for Python). This tool would be exposed via the MCP Service. The agent would invoke the tool, and the profiler's output would be added to the contextFragments.
- Context Compression & Synthesis: Once all prerequisite tasks are complete, the OrchestratorAgent initiates Task 5. It gathers all the contextFragments—the repo map, the ranking algorithm code, the profiler output, and the README text. This collection of text is passed to the Context Compression module, which filters out noise and produces a dense, highly relevant compressedContext.
- Final Response Generation: The OrchestratorAgent makes a final call to the LLM, providing the compressedContext and the final prompt: "Based on the following context, suggest code optimizations for the ranking algorithm and update the README." The LLM's response is placed in the finalOutput field of the payload, which is then returned to the user through the Service Layer.
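The dependency structure in the task decomposition step determines which subtasks can run concurrently. A minimal sketch of that scheduling idea follows; the `Subtask` shape and the `planWaves` function are hypothetical names, not part of the framework.

```typescript
// Hypothetical subtask shape; `dependsOn` lists the ids that must finish first.
interface Subtask {
  id: number;
  description: string;
  dependsOn: number[];
}

// Group tasks into "waves": every task in a wave has all of its dependencies
// satisfied by earlier waves, so tasks within one wave can be delegated in parallel.
function planWaves(tasks: Subtask[]): number[][] {
  const done = new Set<number>();
  let remaining = tasks.slice();
  const waves: number[][] = [];
  while (remaining.length > 0) {
    const readyIds = remaining
      .filter((t) => t.dependsOn.every((d) => done.has(d)))
      .map((t) => t.id);
    if (readyIds.length === 0) {
      throw new Error("Cyclic or unresolved dependency among subtasks");
    }
    readyIds.forEach((id) => done.add(id));
    remaining = remaining.filter((t) => !done.has(t.id));
    waves.push(readyIds);
  }
  return waves;
}

// The five subtasks from the example query.
const tasks: Subtask[] = [
  { id: 1, description: "Retrieve codebase structure", dependsOn: [] },
  { id: 2, description: "Locate ranking algorithm", dependsOn: [] },
  { id: 3, description: "Analyze bottlenecks", dependsOn: [2] },
  { id: 4, description: "Retrieve README content", dependsOn: [] },
  { id: 5, description: "Synthesize findings", dependsOn: [3, 4] },
];
const waves = planWaves(tasks);
console.log(JSON.stringify(waves)); // [[1,2,4],[3],[5]]
```

Tasks 1, 2, and 4 have no prerequisites and can be delegated immediately, which matches the delegation step in the walkthrough.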
To ensure this complex interplay of components is manageable and type-safe, the framework is built upon a set of foundational TypeScript interfaces. These interfaces define the contract that each component must adhere to, enabling true modularity and interchangeability.
- IContextEngine: Represents any module capable of retrieving documents in response to a query. All RAG engines (Document, Graph, Keyword) and the final Hybrid Retriever will implement this interface.

  TypeScript

      import { Document } from "@langchain/core/documents";

      // Contract shared by every retrieval engine.
      export interface IContextEngine {
        // Resolves to the documents most relevant to the query.
        retrieve(query: string): Promise<Document[]>;
      }

- IAgent: Represents an autonomous agent capable of processing a SentientPayload. The Orchestrator and all specialized agents will implement this interface.

  TypeScript

      // Assuming SentientPayload is a defined type
      export interface IAgent {
        // Takes the current payload and resolves to the updated payload.
        process(payload: SentientPayload): Promise<SentientPayload>;
      }

- ITool: Represents an external capability that an agent can invoke. This interface standardizes the definition of tools for both the ToolManager and the MCP Service.

  TypeScript

      import { z } from "zod";

      // Contract for any capability an agent can invoke.
      export interface ITool {
        readonly name: string;
        readonly description: string;
        // Zod schema describing the tool's input arguments.
        readonly schema: z.ZodObject<any, any, any>;
        execute(args: z.infer<this["schema"]>): Promise<any>;
      }
These core abstractions form the backbone of the Sentient framework, providing the structure and discipline necessary to build a powerful, scalable, and maintainable context engineering system.
This section provides a detailed architectural examination of the three distinct retrieval engines that form the foundation of the Sentient framework. Each engine is designed to extract a different modality of context—semantic, structural, and lexical—from various data sources. The fusion of these engines allows Sentient to build a rich, multi-faceted understanding of the information landscape relevant to a given query.
The Document RAG Engine serves as the primary interface for processing unstructured and semi-structured data sources. Its domain includes a vast array of common enterprise documents, such as PDFs, Microsoft Word files, web pages, Markdown files, and plain text documents. The engine's architecture leverages the robust, modular components of the LangChain.js library to create a flexible and powerful pipeline for ingesting, processing, and retrieving text-based information based on semantic meaning.
The first step in any RAG pipeline is loading the raw data into a standardized format. LangChain.js provides a comprehensive set of document loaders, all of which implement a common BaseDocumentLoader interface.11 This standardized interface, which exposes a .load() method, allows the Sentient framework to handle diverse data sources in a uniform manner.12
To manage this diversity effectively, the Document RAG Engine implements a factory pattern. This design pattern allows the engine to dynamically select and instantiate the appropriate loader based on the file's extension, its MIME type, or the source's URI scheme. This approach keeps the engine extensible: new document types can be supported by adding a new loader to the factory without altering the core processing logic.13
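A minimal sketch of such a factory is shown below. It is illustrative only: `SketchLoader`, `LoaderRegistry`, and the loader kinds are hypothetical stand-ins, and a real engine would register LangChain.js loader constructors instead of the plain objects used here.

```typescript
// Hypothetical stand-in for a document loader; not a LangChain.js type.
interface SketchLoader {
  kind: string;
  source: string;
}

type LoaderFactory = (source: string) => SketchLoader;

class LoaderRegistry {
  private byExtension = new Map<string, LoaderFactory>();
  private byScheme = new Map<string, LoaderFactory>();

  registerExtension(ext: string, factory: LoaderFactory): void {
    this.byExtension.set(ext.toLowerCase(), factory);
  }

  registerScheme(scheme: string, factory: LoaderFactory): void {
    this.byScheme.set(scheme.toLowerCase(), factory);
  }

  // A registered URI scheme wins; otherwise fall back to the file extension.
  create(source: string): SketchLoader {
    const schemeMatch = /^([a-z][a-z0-9+.-]*):\/\//i.exec(source);
    if (schemeMatch) {
      const schemeFactory = this.byScheme.get(schemeMatch[1].toLowerCase());
      if (schemeFactory) return schemeFactory(source);
    }
    const dot = source.lastIndexOf(".");
    const ext = dot >= 0 ? source.slice(dot).toLowerCase() : "";
    const extFactory = this.byExtension.get(ext);
    if (!extFactory) throw new Error(`No loader registered for: ${source}`);
    return extFactory(source);
  }
}

const registry = new LoaderRegistry();
registry.registerExtension(".pdf", (source) => ({ kind: "pdf", source }));
registry.registerExtension(".md", (source) => ({ kind: "text", source }));
registry.registerScheme("https", (source) => ({ kind: "web", source }));
```

Supporting a new format is then a single `registerExtension` or `registerScheme` call, with no change to `create`.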
A key implementation within this component is the handling of PDF files, a ubiquitous format for technical documentation, reports, and academic papers. The engine utilizes the PDFLoader from the @langchain/community/document_loaders/fs/pdf package. This loader, which relies on the pdf-parse library, can be configured to create a separate LangChain Document object for each page or to treat the entire file as a single document.14 For web-based content, the CheerioWebBaseLoader is employed to scrape and parse HTML, providing a robust mechanism for ingesting online articles and documentation.16
It is crucial to be aware of the runtime environments in which these loaders operate. Some loaders have dependencies that are specific to Node.js (e.g., file system access) or that rely on browser APIs (DOM access). When deploying in constrained environments like browser extensions or serverless functions, these dependencies can cause issues. For example, the WebPDFLoader's dependency on pdf-parse can fail in a Chrome extension's service worker due to its implicit use of DOM APIs.13 In such cases, it may be necessary to create custom-packed loaders that bundle their dependencies directly, bypassing dynamic imports that fail in these environments.13
Once a document is loaded, its content must be divided into smaller, semantically meaningful chunks. This splitting process is critical for several reasons: it allows long documents to be processed within the finite context windows of LLMs, improves the precision of vector search by creating more focused chunks, and optimizes memory usage.17
A naive approach of splitting text by a fixed number of characters is often suboptimal, as it can sever sentences or break apart coherent ideas, destroying semantic context. The Sentient framework therefore defaults to the RecursiveCharacterTextSplitter provided by LangChain.js.17 This splitter operates on a hierarchical list of separators, by default ["\n\n", "\n", " ", ""]. It first attempts to split the text into paragraphs. If the resulting chunks are still too large, it recursively splits them on line breaks, then on spaces (words), and finally into individual characters, so that the largest possible semantically coherent units are preserved.17
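To make the separator hierarchy concrete, here is a simplified sketch of the recursive idea. It is not the LangChain.js implementation: it omits chunk overlap and other options, and `recursiveSplit` is a hypothetical name.

```typescript
// Simplified recursive splitting: try coarse separators first, and only fall
// back to finer ones for pieces that still exceed `chunkSize` characters.
function recursiveSplit(
  text: string,
  chunkSize: number,
  separators: string[] = ["\n\n", "\n", " ", ""],
): string[] {
  if (chunkSize < 1) throw new Error("chunkSize must be at least 1");
  if (text.length <= chunkSize) return text.length > 0 ? [text] : [];

  // Pick the first separator that occurs in the text; "" always matches.
  const index = separators.findIndex((s) => s === "" || text.includes(s));
  const separator = index === -1 ? "" : separators[index];
  const finer = index === -1 ? [] : separators.slice(index + 1);
  const pieces = text.split(separator);

  const chunks: string[] = [];
  let current = "";
  for (const piece of pieces) {
    if (piece.length === 0) continue;
    // Merge neighbouring pieces while the merged chunk still fits.
    const candidate = current === "" ? piece : current + separator + piece;
    if (candidate.length <= chunkSize) {
      current = candidate;
      continue;
    }
    if (current !== "") chunks.push(current);
    current = "";
    if (piece.length <= chunkSize) {
      current = piece;
    } else {
      // Still too large: recurse with the finer separators.
      chunks.push(...recursiveSplit(piece, chunkSize, finer));
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}

const sample = "Alpha beta.\n\nGamma delta epsilon zeta.\n\nEta.";
const pieces12 = recursiveSplit(sample, 12);
console.log(pieces12); // ["Alpha beta.", "Gamma delta", "epsilon", "zeta.", "Eta."]
```

The short paragraphs stay whole, while the long middle paragraph is broken at word boundaries rather than mid-word.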
The engine is also capable of adapting its splitting strategy based on the document's structure. For semi-structured formats like Markdown or HTML, it can employ structure-aware splitters that use headers or HTML tags as delimiters, preserving the document's logical organization.17 For applications where precise alignment with a model's tokenization is paramount, a TokenTextSplitter can be used to ensure that no chunk exceeds the model's token limit, although care must be taken with languages where a single character can map to multiple tokens.17
After splitting, the text chunks must be converted into a numerical format that enables semantic comparison. This process, known as vectorization or embedding, uses a deep learning model to map each text chunk to a high-dimensional vector. The geometric proximity of these vectors in the resulting vector space corresponds to their semantic similarity.20
The resulting vectors are then stored and indexed in a specialized database called a vector store. Vector stores are optimized for performing efficient nearest-neighbor searches, allowing the system to quickly find the document chunks whose vector representations are most similar to a given query vector.20
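The nearest-neighbor lookup can be illustrated with a brute-force cosine similarity search. This is a sketch under simplifying assumptions: the three-dimensional vectors are made up for the example, and production vector stores typically use approximate indexes rather than a linear scan.

```typescript
type Vector = number[];

// Cosine similarity: 1 for identical direction, 0 for orthogonal vectors.
function cosineSimilarity(a: Vector, b: Vector): number {
  if (a.length !== b.length) throw new Error("Dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface IndexedChunk {
  id: string;
  embedding: Vector;
}

// Brute-force k-nearest-neighbour search over all indexed chunks.
function nearest(
  query: Vector,
  chunks: IndexedChunk[],
  k: number,
): Array<{ id: string; score: number }> {
  return chunks
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy embeddings (made-up values, three dimensions for readability).
const indexed: IndexedChunk[] = [
  { id: "login", embedding: [0.8, 0.2, 0] },
  { id: "billing", embedding: [0, 1, 0] },
  { id: "deploy", embedding: [0, 0, 1] },
];
const hits = nearest([0.9, 0.1, 0], indexed, 2);
console.log(hits.map((h) => h.id)); // ["login", "billing"]
```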
The Sentient framework's design is agnostic to the specific vector store implementation, thanks to LangChain's standardized VectorStore interface. This allows for flexibility in deployment. For local development, testing, and small-scale applications, the in-memory MemoryVectorStore is an excellent choice due to its simplicity and lack of external dependencies.22 For production environments requiring persistence, scalability, and advanced filtering capabilities, the framework can be seamlessly configured to use enterprise-grade solutions such as Weaviate, Pinecone, or Chroma.20
The end-to-end pipeline for a PDF document demonstrates the synergy of these components. A file is loaded with PDFLoader, its content is split into semantically coherent chunks by RecursiveCharacterTextSplitter, each chunk is converted into a vector using an embedding model like OpenAIEmbeddings, and the resulting documents and their embeddings are indexed into a MemoryVectorStore using the convenient MemoryVectorStore.fromDocuments() static method.16
The final component of the Document RAG Engine is the retriever. The retriever is a lightweight wrapper around the vector store that exposes a simple interface for querying. It accepts a string query, converts it into a vector using the same embedding model used for ingestion, and queries the vector store to find the most relevant document chunks.26
Any LangChain VectorStore instance can be easily converted into a retriever by calling the .asRetriever() method. This method can be configured with parameters such as k, which specifies the number of documents to return.26
Furthermore, the retriever can implement more advanced retrieval strategies to enhance the quality of the results. One such strategy is Maximal Marginal Relevance (MMR). A standard similarity search can sometimes return a set of highly redundant chunks that all express the same idea. MMR mitigates this by first fetching a larger set of candidate documents and then re-ranking them to optimize for a combination of relevance to the query and diversity among the selected documents. This ensures that the final context provided to the LLM is not only relevant but also informationally rich and non-repetitive.23
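The MMR re-ranking step can be sketched as a greedy loop that trades relevance against redundancy. This is an illustrative standalone version, not LangChain's implementation; the `lambda` weighting parameter and the toy two-dimensional vectors are assumptions chosen for the example.

```typescript
interface Candidate {
  id: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Greedy MMR: at each step pick the candidate maximising
//   lambda * sim(query, d) - (1 - lambda) * max over selected s of sim(d, s).
function mmr(query: number[], candidates: Candidate[], k: number, lambda = 0.5): string[] {
  const selected: Candidate[] = [];
  const pool = candidates.slice();
  while (selected.length < k && pool.length > 0) {
    let bestIndex = 0;
    let bestScore = -Infinity;
    for (let i = 0; i < pool.length; i++) {
      const relevance = cosine(query, pool[i].embedding);
      let redundancy = 0;
      for (const s of selected) {
        redundancy = Math.max(redundancy, cosine(pool[i].embedding, s.embedding));
      }
      const score = lambda * relevance - (1 - lambda) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestIndex = i;
      }
    }
    selected.push(pool.splice(bestIndex, 1)[0]);
  }
  return selected.map((c) => c.id);
}

// "a" and "b" are near-duplicates; "c" is less relevant but different.
const mmrCandidates: Candidate[] = [
  { id: "a", embedding: [5, 1] },
  { id: "b", embedding: [5, 1.2] },
  { id: "c", embedding: [1, -1] },
];
console.log(mmr([1, 0], mmrCandidates, 2, 0.5)); // ["a", "c"]
console.log(mmr([1, 0], mmrCandidates, 2, 1)); // ["a", "b"] (pure relevance)
```

With `lambda = 1` the loop degenerates to plain similarity ranking; lowering it pushes the near-duplicate "b" out in favour of the more diverse "c".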
While the Document RAG Engine excels at understanding the semantic content of text, it is fundamentally limited when dealing with domains where structure and relationships are as important as the text itself. A prime example of such a domain is a software codebase. Treating source code as a collection of unstructured text files misses the intricate web of dependencies, function calls, and class inheritances that define its architecture and behavior. The Graph RAG Engine is specifically designed to capture and leverage this structural context, providing a far deeper understanding of software repositories.
The design of the Graph RAG Engine is heavily inspired by the innovative "Repo Map" feature pioneered by the AI coding assistant, Aider, and its standalone implementation, RepoMapper.3 This approach recognizes that a codebase is not merely a collection of files but a graph of interconnected symbols. By analyzing this graph, the engine can identify the most influential and central components of a repository, providing the LLM with a high-level architectural overview that is impossible to obtain from simple semantic search.
The generation of a repository map is a multi-stage process that transforms raw source code into a compact, token-optimized representation of the codebase's architecture.
- File Discovery & Parsing: The process begins by scanning a git repository to discover all relevant source files, respecting .gitignore and other exclusion rules. Each discovered file is then parsed using tree-sitter, a powerful and versatile parser generator.3 Unlike older tools like ctags which perform simple pattern matching, tree-sitter constructs a full Abstract Syntax Tree (AST) for each file based on a formal grammar for the specific programming language.28 This detailed structural representation allows for the precise extraction of symbol definitions (e.g., class and function declarations) and references (e.g., function calls and variable usage).
- Graph Construction: With the symbols and references extracted, the engine constructs a directed graph, typically using a library such as networkx in Python or a JavaScript equivalent like js-graph-algorithms. In this graph, each source file is represented as a node. A directed edge is created from file A to file B if code in file A contains a reference to a symbol defined in file B.3 The resulting structure is a dependency graph of the entire repository, capturing the flow of information and control between different modules.
- Relevance Ranking: Not all files in a repository are equally important. A central utility class that is imported by dozens of other modules is architecturally more significant than a peripheral script. To quantify this importance, the engine applies a graph ranking algorithm, such as PageRank, to the constructed dependency graph.3 PageRank, famously used by Google to rank web pages, assigns a score to each node (file) based on the number and quality of incoming edges. Files that are referenced by many other important files will receive a higher rank, effectively identifying the architectural backbone of the codebase.27
- Map Generation & Token Optimization: The final step is to generate the textual "repo map" that will be provided to the LLM. This map is not a complete dump of the code; it is a curated summary containing only the most critical information. The engine iterates through the files, ordered by their PageRank score, and extracts the definitions of their most important symbols (e.g., function signatures, class declarations). This process continues until a predefined token budget is reached (e.g., 1024 tokens, configurable via a parameter like --map-tokens).27 To ensure the map fits precisely within this budget, a binary search algorithm is often used to find the optimal amount of content to include.3 The final output is a human-readable text file that presents a hierarchical view of the most important files and their key symbols, as demonstrated by the RepoMapper tool's output format.10
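The ranking and budgeting steps above can be sketched in a few lines. This is a simplified illustration, not the RepoMapper or Aider implementation: the file names are invented, the character-count token estimate in the usage example is a stand-in for a real tokenizer, and `pageRank` and `fitToBudget` are hypothetical function names.

```typescript
// Each edge [from, to] means: code in `from` references a symbol defined in `to`.
type Edge = [string, string];

// Basic PageRank by power iteration. All edge endpoints are assumed to be in `nodes`.
function pageRank(
  nodes: string[],
  edges: Edge[],
  damping = 0.85,
  iterations = 50,
): Record<string, number> {
  const n = nodes.length;
  const outDegree: Record<string, number> = {};
  let rank: Record<string, number> = {};
  for (const node of nodes) {
    outDegree[node] = 0;
    rank[node] = 1 / n;
  }
  for (const [from] of edges) outDegree[from] += 1;

  for (let iter = 0; iter < iterations; iter++) {
    // Rank held by files with no outgoing references is spread evenly.
    let danglingMass = 0;
    for (const node of nodes) {
      if (outDegree[node] === 0) danglingMass += rank[node];
    }
    const next: Record<string, number> = {};
    for (const node of nodes) {
      next[node] = (1 - damping) / n + (damping * danglingMass) / n;
    }
    for (const [from, to] of edges) {
      next[to] += (damping * rank[from]) / outDegree[from];
    }
    rank = next;
  }
  return rank;
}

// Binary search for the longest prefix of ranked map lines that fits the budget.
// Assumes the token count never decreases as lines are appended.
function fitToBudget(
  lines: string[],
  budget: number,
  countTokens: (text: string) => number,
): string[] {
  let low = 0;
  let high = lines.length;
  while (low < high) {
    const mid = Math.ceil((low + high) / 2);
    if (countTokens(lines.slice(0, mid).join("\n")) <= budget) low = mid;
    else high = mid - 1;
  }
  return lines.slice(0, low);
}

const files = ["cli.ts", "server.ts", "utils.ts"];
const refs: Edge[] = [
  ["cli.ts", "utils.ts"],
  ["server.ts", "utils.ts"],
  ["server.ts", "cli.ts"],
];
const ranks = pageRank(files, refs);
const ordered = files.slice().sort((a, b) => ranks[b] - ranks[a]);
console.log(ordered); // ["utils.ts", "cli.ts", "server.ts"]
```

`utils.ts` ranks first because both other files reference it, which is the "architectural backbone" effect described above.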
Given the specialized dependencies of the Graph RAG Engine (such as tree-sitter and its various language grammars) and the computationally intensive nature of parsing and graph analysis, the optimal architectural pattern is to implement it as a standalone microservice. This service encapsulates its complexity, allowing the main Sentient application to remain lightweight. The service exposes an endpoint (ideally, an MCP-compliant one) that accepts a repository location and returns the generated repo map. This design also enables effective caching; the parsed symbol data and calculated graph ranks can be persisted to disk (e.g., in a .repomap.tags.cache.v1/ directory), dramatically speeding up subsequent requests for the same repository.3
The principles underlying the Graph RAG Engine are not confined to software engineering. The core methodology involves parsing a collection of structured documents to identify discrete entities and the relationships between them, then using graph theory to analyze this network of connections. This powerful pattern can be generalized beyond code to any domain characterized by interconnected knowledge.
For example, consider a corpus of academic research papers. The documents are the papers themselves. A specialized parser, instead of tree-sitter, could be used to extract entities (e.g., paper titles, authors) and relationships (citations). The same graph construction and PageRank algorithm could then be applied to this citation network to identify the most influential, foundational papers in a field. Similarly, in a legal domain, cases could be nodes and citations of legal precedent could be edges, allowing the engine to identify landmark cases.
This demonstrates that the Graph RAG Engine is a specific implementation of a more general concept: "Structural RAG." By designing the engine in a modular way, where the parser is a pluggable component, the Sentient framework can be extended to perform structural analysis on any domain that can be modeled as a graph. This represents a significant leap beyond simple semantic retrieval, enabling the framework to understand the deep, relational context of complex knowledge bases.
The individual retrieval engines—Document, Graph, and Keyword—are powerful but specialized. Each excels at finding a different kind of information, and each has its own blind spots. Semantic search might miss a specific function name, lexical search cannot understand conceptual queries, and structural search is unaware of the natural language explanations in code comments. A truly comprehensive and resilient retrieval system cannot rely on a single strategy. Instead, it must fuse the outputs of all three, leveraging their complementary strengths to produce a result that is superior to any single engine's output.
The necessity for a hybrid approach stems from the multi-faceted nature of user queries and the information they seek.
- Document RAG (Vector Search): This engine is best for conceptual and semantic queries. It can understand that "how to handle authentication" is related to documents discussing "user login security" and "JSON Web Tokens," even if the exact keywords don't match.31 Its weakness lies in its "averaging" nature; the vector for a chunk represents its overall meaning, potentially obscuring specific, important keywords or identifiers.
- Graph RAG (Structural Search): This engine excels at navigating explicit relationships, particularly within codebases. It can answer questions like "what functions use the database.connect method?" by traversing the dependency graph.32 Its primary limitation is its domain-specificity; it requires a formal grammar and cannot process unstructured prose.
- Keyword RAG (Lexical Search): This engine, typically implemented using algorithms like BM25 or TF-IDF, is unmatched for precision when dealing with exact terms, acronyms, or unique identifiers.31 It will reliably find every occurrence of KMS_API_KEY. However, it has no semantic understanding and will fail to connect "feline" with "cat."
A hybrid system combines these strengths. For a query like "Find the most important functions related to our JWT authentication implementation," a hybrid system would use vector search to find conceptual documents about security, graph search to identify the core authentication functions and their callers, and keyword search to pinpoint every usage of the term "JWT."
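The lexical side of this comparison can be made concrete with a compact BM25 scorer. The sketch below uses a common BM25 formulation with typical default parameters (`k1 = 1.2`, `b = 0.75`); the tokenizer and sample documents are simplifications chosen for illustration, not the framework's Keyword RAG implementation.

```typescript
// Lowercase and split on anything that is not a letter, digit, or underscore,
// so identifiers such as KMS_API_KEY stay intact as single tokens.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9_]+/)
    .filter((t) => t.length > 0);
}

// Returns one BM25 score per document, in input order.
function bm25Scores(query: string, docs: string[], k1 = 1.2, b = 0.75): number[] {
  const tokenized = docs.map(tokenize);
  const totalLength = tokenized.reduce((sum, d) => sum + d.length, 0);
  const avgLength = totalLength / Math.max(docs.length, 1);
  const queryTerms = tokenize(query);

  return tokenized.map((doc) => {
    let score = 0;
    for (const term of queryTerms) {
      const tf = doc.filter((t) => t === term).length;
      if (tf === 0) continue;
      // Number of documents containing the term at least once.
      const containing = tokenized.filter((d) => d.indexOf(term) !== -1).length;
      const idf = Math.log(1 + (docs.length - containing + 0.5) / (containing + 0.5));
      const lengthNorm = 1 - b + (b * doc.length) / avgLength;
      score += (idf * tf * (k1 + 1)) / (tf + k1 * lengthNorm);
    }
    return score;
  });
}

const corpus = [
  "Set KMS_API_KEY before starting the server",
  "Felines sleep for most of the day",
  "Deployment notes for the staging cluster",
];
const keyScores = bm25Scores("KMS_API_KEY", corpus);
const catScores = bm25Scores("cat", corpus);
```

The exact identifier matches only the first document, while the query "cat" scores zero everywhere, including the document about felines. That is the blind spot vector search covers.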
The first step in creating this hybrid system is to perform ensemble retrieval. This involves running the user's query against all three context engines in parallel. LangChain provides constructs like the EnsembleRetriever that facilitate this pattern.26 Each retriever returns its own ranked list of relevant documents. The challenge then becomes how to merge these three separate lists into a single, coherently ranked list.
Simply concatenating the results or interleaving them based on their original scores is suboptimal, as the scoring mechanisms of each retriever are not directly comparable. A score of 0.9 from a vector search does not mean the same thing as a high PageRank score from the graph analysis.
A more robust and effective method for merging the ranked lists is Reciprocal Rank Fusion (RRF). RRF is a rank-based fusion method that does not depend on the absolute scores of the documents. Its formula is simple and powerful: for each document, its RRF score is calculated by summing the reciprocal of its rank in each of the retrieval lists.
The formula is:

$$\mathrm{RRF}(d) = \sum_{i=1}^{N} \frac{1}{k + r_i(d)}$$

Where:
- $d$ is the document being scored.
- $N$ is the number of retrieval lists (in this case, 3).
- $r_i(d)$ is the rank of document $d$ in list $i$. If the document is not in the list, its rank can be considered infinite, so that list contributes nothing to the sum.
- $k$ is a constant (commonly set to 60) that dampens the advantage of top ranks, so that a single first-place result in one list cannot outweigh consistent placement across several lists.
The final, fused list is then created by sorting all unique documents from the ensemble results by their calculated RRF score in descending order. This method consistently prioritizes documents that are deemed relevant by multiple diverse retrieval strategies, leading to a more reliable and comprehensive final context.
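A minimal sketch of this fusion step follows. It assumes each retriever returns an ordered list of document identifiers with no duplicates inside a single list; the function name and the sample identifiers are illustrative.

```typescript
// Reciprocal Rank Fusion: each list contributes 1 / (k + rank) for every
// document it contains; documents missing from a list contribute nothing.
function reciprocalRankFusion(
  rankedLists: string[][],
  k = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical results from the three engines for one query.
const vectorHits = ["auth.md", "jwt.ts", "readme.md"];
const graphHits = ["jwt.ts", "session.ts"];
const keywordHits = ["jwt.ts", "auth.md"];

const fused = reciprocalRankFusion([vectorHits, graphHits, keywordHits]);
console.log(fused.map((f) => f.id)); // ["jwt.ts", "auth.md", "session.ts", "readme.md"]
```

`jwt.ts` wins because all three engines return it, even though the vector search ranked it only second.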
To provide a clear reference for developers using and extending the Sentient framework, the distinct characteristics of each RAG engine are summarized below. This table serves as a quick guide for understanding their strengths, weaknesses, and appropriate use cases, which is invaluable for debugging retrieval issues and optimizing performance for specific tasks.
| Feature | Document RAG (Vector) | Graph RAG (Structural) | Keyword RAG (Lexical) |
|---|---|---|---|
| Underlying Tech | Embeddings, Vector Stores | AST Parsing, Graph Theory | Inverted Index, TF-IDF/BM25 |
| Best For | Conceptual & Semantic Similarity | Code Navigation, Dependency Analysis | Exact Matches, Acronyms, IDs |
| Strengths | Handles synonyms, understands intent | Understands code structure, finds relevant APIs | Fast, precise for known terms |
| Weaknesses | Can miss keywords, "average" vector problem | Domain-specific (code), requires parsing | No semantic understanding, fails on synonyms |
| Sentient Impl. | LangChain.js, MemoryVectorStore | RepoMapper Service, tree-sitter | BM25 Retriever (e.g., from LangChain) |
This section details the architecture of the Agentic Core, the reasoning and orchestration center of the Sentient framework. This layer elevates the system's capabilities beyond static data retrieval, enabling it to perform dynamic actions, interact with external systems, and facilitate collaboration among specialized AI agents. This transition from a passive RAG pipeline to an active, agent-driven workflow is what unlocks the ability to solve complex, multi-step problems.
A fundamental capability of an advanced agent is the ability to use tools. Tool use, often referred to as function calling, allows an LLM to break out of the confines of text generation and interact with the outside world. It can query databases, call APIs, run code, and perform actions, transforming it from a passive knowledge source into an active problem-solver.
The Sentient framework leverages the native function calling capabilities of models like Google's Gemini.35 This feature provides a structured mechanism for the LLM to request the execution of external functions. The process is a controlled, multi-step loop that ensures safety and reliability.36
- Tool Definition and Provision: The developer defines a set of available tools in the application code. Each tool definition includes a clear name, a detailed description of its purpose and capabilities, and a formal schema (e.g., an OpenAPI schema or a Zod schema in TypeScript) that specifies its input parameters.35 The quality of the description is paramount; it is the primary information the LLM uses to determine when and how to use the tool.38
- Model Inference and Tool Selection: The user's prompt, along with the definitions of all available tools, is sent to the Gemini model. The model analyzes the prompt and determines if invoking one of the tools would help in fulfilling the user's request. If it decides to use a tool, it does not execute any code itself. Instead, its response is a structured JSON object, a functionCall, containing the name of the tool to be called and a dictionary of arguments that conform to the tool's schema.36
- Application-Side Execution: The application code receives this functionCall object. It is the application's responsibility to parse this response, identify the requested tool, and execute the corresponding function with the arguments provided by the model. This step ensures that the LLM never has direct execution privileges, maintaining a secure operational boundary.37
- Response and Synthesis: The result of the function execution (e.g., data from an API call, the output of a calculation) is then packaged into a functionResponse object and sent back to the model in a subsequent API call. The model uses this new information to synthesize a final, user-facing natural language response that incorporates the results of the tool's execution.36
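The four steps above can be sketched with a stand-in model, so that the control flow is visible without any SDK. Everything here (`fakeModel`, `toolRegistry`, the `add_numbers` tool) is a hypothetical placeholder; a real integration would send the tool definitions and the function response through the model provider's API instead.

```typescript
// Sketch of the four-step loop with a stand-in model; no real SDK is called.
type FunctionCall = { name: string; args: Record<string, unknown> };
type ModelReply =
  | { kind: "functionCall"; call: FunctionCall }
  | { kind: "text"; text: string };
type ToolFn = (args: Record<string, unknown>) => unknown;

// Step 1: tools are defined and registered by the application.
const toolRegistry: Record<string, ToolFn> = {
  add_numbers: (args) => Number(args["a"]) + Number(args["b"]),
};

// Stand-in for the LLM: it asks for a tool first, then answers once it
// has been given a function response.
function fakeModel(_prompt: string, functionResponse?: unknown): ModelReply {
  if (functionResponse === undefined) {
    return {
      kind: "functionCall",
      call: { name: "add_numbers", args: { a: 2, b: 3 } },
    };
  }
  return { kind: "text", text: `The result is ${String(functionResponse)}.` };
}

function runToolLoop(prompt: string): string {
  // Step 2: the model chooses a tool but executes nothing itself.
  const first = fakeModel(prompt);
  if (first.kind === "text") return first.text;

  // Step 3: the application looks up and executes the requested tool.
  const tool = toolRegistry[first.call.name];
  if (!tool) throw new Error(`Unknown tool: ${first.call.name}`);
  const result = tool(first.call.args);

  // Step 4: the result goes back to the model for the final answer.
  const second = fakeModel(prompt, result);
  return second.kind === "text"
    ? second.text
    : "Model requested another tool call.";
}

console.log(runToolLoop("What is 2 + 3?"));
```

The lookup in step 3 is where the security boundary sits: a tool name the application has not registered is rejected rather than executed.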
While defining tools directly in the application code is effective, it creates a tight coupling between the agent and its capabilities. As the number of tools and agents grows, a more scalable and interoperable solution is required. The Model Context Protocol (MCP) provides this solution. MCP is an open standard designed to allow AI models and applications to discover and communicate with external services, tools, and data sources in a standardized way.3
The Sentient framework implements a dedicated MCP Service, an HTTP endpoint (e.g., built with Express.js in TypeScript) that acts as a universal gateway to all of the framework's internal tools. This service exposes a well-defined endpoint, which, when queried, returns a manifest of all available tools, including their names, descriptions, and schemas, in the standard MCP format.
This architecture provides several key advantages:
- Dynamic Discovery: Agents are no longer hard-coded with a list of tools. They can dynamically query the MCP Service to discover what capabilities are available at runtime.
- Decoupling: Tools can be added, removed, or updated within the Sentient framework without requiring any changes to the agents that use them. As long as the tool is registered with the MCP Service, it becomes available to the entire system.
- Interoperability: Any external agent or application that speaks MCP can interact with Sentient's tools, fostering a broader ecosystem of interoperable AI components.
The RepoMapper tool developed for the Graph RAG Engine is a perfect candidate for exposure via this MCP Service. An agent needing to understand a codebase can query the MCP Service, discover the repo_map tool, and invoke it with the path to a project repository to receive a structured code map.3
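A rough in-memory sketch of this discover-then-invoke flow follows. The `ToolCatalog` class and its placeholder handler are assumptions made for illustration. The method names mirror MCP's `tools/list` and `tools/call` requests, but the transport and message envelope are omitted.

```typescript
// In-memory sketch of runtime tool discovery; transport is omitted.
interface ToolDescriptor {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string }>;
    required: string[];
  };
}
type ToolHandler = (args: Record<string, string>) => string;

class ToolCatalog {
  private readonly tools = new Map<
    string,
    { descriptor: ToolDescriptor; handler: ToolHandler }
  >();

  register(descriptor: ToolDescriptor, handler: ToolHandler): void {
    this.tools.set(descriptor.name, { descriptor, handler });
  }

  // Discovery: agents ask what exists instead of hard-coding a list.
  listTools(): ToolDescriptor[] {
    return Array.from(this.tools.values(), (entry) => entry.descriptor);
  }

  // Invocation: check required arguments, then run the handler.
  callTool(toolName: string, args: Record<string, string>): string {
    const entry = this.tools.get(toolName);
    if (!entry) throw new Error(`Tool not found: ${toolName}`);
    for (const key of entry.descriptor.inputSchema.required) {
      if (!(key in args)) throw new Error(`Missing argument: ${key}`);
    }
    return entry.handler(args);
  }
}

const catalog = new ToolCatalog();
catalog.register(
  {
    name: "repo_map",
    description: "Returns a structural map of the repository at project_root.",
    inputSchema: {
      type: "object",
      properties: { project_root: { type: "string" } },
      required: ["project_root"],
    },
  },
  // Placeholder handler; a real one would call the RepoMapper service.
  (args) => `repo map for ${args["project_root"]}`
);

const discovered = catalog.listTools().map((tool) => tool.name);
const mapText = catalog.callTool("repo_map", { project_root: "/path/to/project" });
console.log(discovered, mapText);
```

Because agents only see what `listTools` returns, registering or removing a tool changes their capabilities without touching agent code.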
For tasks of significant complexity, even a single, powerful, tool-using agent can be insufficient. The principle of "separation of concerns" from software engineering applies equally to AI systems. A multi-agent architecture, where a problem is distributed across a team of specialized agents, offers significant advantages in terms of specialization, scalability, and fault tolerance.6 A system composed of smaller, focused agents is more modular, easier to maintain, and can be scaled by adding more agents to handle increased load.7
The Sentient framework is designed around a core set of specialized agent roles, each with a distinct responsibility. This division of labor allows each agent to be optimized for its specific task.
- Orchestrator Agent: This is the central coordinator or "project manager." It receives the initial user request, analyzes its complexity, and decomposes it into a logical sequence of subtasks. It then delegates these subtasks to the appropriate specialized agents and synthesizes their results into a final, coherent response. This architecture is inspired by established multi-agent patterns like the orchestrator-worker and hierarchical agent patterns.8
- Retrieval Agent: This agent's sole focus is information gathering. It is an expert in querying the Context Engine Layer, capable of invoking the Document, Graph, and Keyword RAG engines to find relevant information.
- Coding Agent: This agent specializes in tasks related to software development. It heavily utilizes the Graph RAG Engine to understand code structure, and it is responsible for analyzing, refactoring, and generating source code.
- Documentation Agent: This agent is an expert in processing prose and semi-structured documents. It handles tasks such as summarizing technical papers, extracting information from PDFs, and updating Markdown documentation.
For these specialized agents to collaborate effectively, they need a common language and a reliable communication channel. The Agent-to-Agent (A2A) protocol provides this. A2A is an open standard designed to be a "universal translator" for AI agents, enabling interoperability between agents built using different frameworks or by different providers.4
The A2A protocol defines a set of core architectural components for managing agent interactions 4:
- A2A Client and Server: Agents communicate in a client-server model over HTTP.
- Agent Card: A JSON metadata file that acts as an agent's "business card," describing its identity, capabilities, endpoint URL, and authentication requirements. This enables agent discovery.
- Task: A standardized representation of a unit of work. Tasks have unique IDs and progress through a defined lifecycle (e.g., submitted, working, completed, failed), making them ideal for managing long-running, asynchronous collaborations.
- Message and Artifact: The fundamental units of communication, used to exchange information, status updates, and final deliverables (artifacts).
The Sentient framework implements an A2A Service, an HTTP endpoint that functions as the central message bus for the entire system. When the OrchestratorAgent delegates a subtask, it constructs an A2A Task object and sends it to the A2A endpoint of the appropriate specialized agent (e.g., the CodingAgent). The receiving agent processes the task and uses the A2A protocol to send back status updates and, ultimately, the results.
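To make the task lifecycle concrete, here is a simplified state-machine sketch. The `AgentTask` shape and the transition table are assumptions for illustration; they use the lifecycle states named above but do not reproduce the actual A2A schema.

```typescript
// Simplified task lifecycle; field names are illustrative, not the A2A schema.
type TaskState = "submitted" | "working" | "completed" | "failed";

interface AgentTask {
  id: string;
  targetAgent: string;
  instruction: string;
  state: TaskState;
  artifacts: string[];
}

// Allowed transitions between the lifecycle states.
const allowedTransitions: Record<TaskState, TaskState[]> = {
  submitted: ["working", "failed"],
  working: ["completed", "failed"],
  completed: [],
  failed: [],
};

function transition(current: AgentTask, next: TaskState): AgentTask {
  if (!allowedTransitions[current.state].includes(next)) {
    throw new Error(`Illegal transition: ${current.state} -> ${next}`);
  }
  return { ...current, state: next };
}

// Completing a task attaches its final deliverable as an artifact.
function complete(current: AgentTask, artifact: string): AgentTask {
  const done = transition(current, "completed");
  return { ...done, artifacts: [...done.artifacts, artifact] };
}

let delegated: AgentTask = {
  id: "task-1",
  targetAgent: "CodingAgent",
  instruction: "Refactor the caching logic",
  state: "submitted",
  artifacts: [],
};
delegated = transition(delegated, "working");
delegated = complete(delegated, "patch.diff");
console.log(delegated.state, delegated.artifacts);
```

Treating `completed` and `failed` as terminal states keeps long-running collaborations auditable: a finished task cannot silently resume.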
The relationship between MCP and A2A is a critical aspect of the Sentient architecture. They are not competing standards but complementary ones, representing two distinct modes of interaction for an agent. MCP governs how an agent interacts with a tool or a service—it is about accessing a passive capability. A2A governs how an agent interacts with another agent—it is about delegating a task and engaging in active collaboration.4
A single capability within Sentient, such as the Graph RAG Engine, can be exposed through both protocols to provide maximum flexibility. A simple request like "Give me the repo map for project X" can be treated as a tool call, handled by the MCP Service. A more complex request, such as "Analyze project X and identify the files most critical to its authentication flow," is better framed as a task to be delegated to a specialized CodeAnalysisAgent via the A2A Service. By supporting both protocols, the Sentient framework allows for both simple, direct tool use and complex, collaborative, multi-agent workflows, making it adaptable to a wide range of problem complexities.
The retrieval of relevant information by the Context Engines is only the first half of the problem. The second, equally critical half is ensuring that this information is presented to the LLM in the most effective and efficient manner possible. The finite nature of LLM context windows, coupled with the performance and cost implications of processing large amounts of text, makes context optimization a non-negotiable component of any serious RAG architecture. This section details the Sentient framework's approach to intelligently compressing and refining retrieved context before it reaches the generation model.
The "naive" RAG approach (retrieving the top-k document chunks and concatenating them directly into the prompt) suffers from significant drawbacks. Retrievers are imperfect and often return documents containing noise, irrelevant details, or redundant information alongside the valuable content. Stuffing this unfiltered context into the prompt not only consumes valuable token space and increases API costs and latency but can also actively harm the LLM's performance by distracting it with irrelevant data and by burying the useful passages mid-prompt, where models tend to use them less reliably (the "lost in the middle" problem).2
There are two primary strategies for context compression:
- Abstractive Compression: This approach uses an LLM to read all the retrieved documents and generate a new, concise summary that synthesizes the key information. While this can produce a very dense and coherent context, it comes at a significant performance cost. The summarization step is itself an autoregressive generation process, which can dramatically increase end-to-end latency, making it unsuitable for many real-time applications.2
- Extractive Compression: This approach is far more efficient. Instead of generating new text, it selects and extracts the most relevant segments (e.g., sentences, passages) directly from the original retrieved documents. This avoids the latency of a full generation step. However, traditional extractive methods, which often evaluate the relevance of each sentence independently, can fail to capture the full context and may discard sentences that are crucial for understanding the surrounding text.41
The Sentient framework adopts an advanced extractive approach, recognizing that for most interactive applications, the efficiency gains are paramount. The design is specifically engineered to overcome the limitations of traditional extractive methods by incorporating a deeper, context-aware relevance assessment.
The framework's context compression module is architecturally inspired by the EXIT (EXtractIve ContexT compression) framework, a state-of-the-art approach that frames compression as a sentence classification problem.2 This method enhances both efficiency, through its extractive nature, and effectiveness, through its dynamic, context-aware selection process.43 The implementation in TypeScript follows a three-stage pipeline:
- Step 1: Sentence Decomposition: After the Hybrid Retriever has produced a ranked list of relevant documents, the content of these documents is decomposed into a flat list of individual sentences. This is typically achieved using a reliable, rule-based sentence tokenizer.
- Step 2: Context-Aware Relevance Classification: This is the core innovation of the approach. Instead of evaluating each sentence in isolation, the module assesses its relevance within its original context. For each sentence in the list, a classification request is made to a fast and cost-effective LLM (e.g., Gemini Flash or a specialized fine-tuned model). The prompt for this classification call is carefully constructed to include three key elements:
  - The original user query.
  - The specific sentence being evaluated.
  - The full text of the document from which the sentence was extracted.

  The model is then asked to return a simple binary classification ("Yes" or "No") or a relevance score, answering the question: "Is this sentence essential for answering the user's query, given the full context of the document it came from?" By providing the full document context, the model can make a much more informed judgment. For example, a sentence like "This approach was highly effective" is meaningless in isolation, but when read within the context of the surrounding document, its relevance to the user's query becomes clear. This step is highly parallelizable, as the classification for each sentence can be performed independently, leading to significant speedups.41
- Step 3: Document Reassembly: The final compressed context is constructed by concatenating all sentences that were classified as "Yes" (or that scored above a configurable relevance threshold, tau). Crucially, the sentences are reassembled in their original order to preserve the logical flow and coherence of the text.43 The result is a new, much shorter context that contains only the most salient information, ready to be passed to the main generation LLM.
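The three stages can be sketched as follows. This is an illustration under stated assumptions, not the EXIT implementation: `compressDocument`, `splitSentences`, and `overlapClassifier` are hypothetical names, the splitter is deliberately naive, and the word-overlap classifier stands in for the LLM call described in Step 2.

```typescript
// Sketch of the three-stage pipeline with a pluggable classifier.
type RelevanceClassifier = (
  query: string,
  sentence: string,
  fullDocument: string
) => Promise<number>; // relevance score in [0, 1]

// Step 1: naive rule-based splitter (a real tokenizer handles abbreviations).
function splitSentences(text: string): string[] {
  const matches = text.match(/[^.!?]+[.!?]*/g) ?? [];
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}

async function compressDocument(
  query: string,
  document: string,
  classify: RelevanceClassifier,
  tau: number = 0.5
): Promise<string> {
  const sentences = splitSentences(document);
  // Step 2: score every sentence in parallel, passing the full document as context.
  const scores = await Promise.all(
    sentences.map((sentence) => classify(query, sentence, document))
  );
  // Step 3: keep sentences at or above tau, preserving original order.
  return sentences.filter((_, i) => (scores[i] ?? 0) >= tau).join(" ");
}

// Stand-in classifier: word overlap with the query. In the design above,
// this call would go to a fast LLM instead.
const overlapClassifier: RelevanceClassifier = async (query, sentence) => {
  const queryWords = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  const hits = sentence
    .toLowerCase()
    .split(/\W+/)
    .filter((word) => queryWords.has(word));
  return hits.length > 0 ? 1 : 0;
};
```

Because the classifier is injected, the same pipeline can be exercised in tests with a cheap stand-in and run in production with a model-backed scorer.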
The framework is also required to handle "annotations." The context-aware compression technique provides a natural point of integration for such metadata. If the source documents are pre-annotated with metadata such as keywords, topics, or semantic tags, this information can be included in the prompt during the relevance classification step (Step 2).
For example, the prompt could be augmented to say: "Given the user's query and the full document (which is annotated with the keywords 'security', 'authentication'), is the following sentence essential...". These annotations act as strong hints to the classification model, further improving its accuracy and allowing it to better align the selection process with the known semantic properties of the source documents. This creates a virtuous cycle where well-annotated knowledge bases lead to more effective context compression, which in turn leads to higher-quality final responses.
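One way to assemble such a prompt is sketched below. The `buildClassificationPrompt` helper and the `ClassificationInput` shape are hypothetical; the wording of the prompt follows the example above and would need tuning against a real classifier.

```typescript
// Builds the Step 2 classification prompt, optionally with annotations.
interface ClassificationInput {
  query: string;
  sentence: string;
  fullDocument: string;
  keywords?: string[]; // pre-computed annotations, if any
}

function buildClassificationPrompt(input: ClassificationInput): string {
  const keywords = input.keywords ?? [];
  // Only mention annotations when the document actually carries some.
  const annotationNote =
    keywords.length > 0
      ? ` (which is annotated with the keywords ${keywords
          .map((k) => `'${k}'`)
          .join(", ")})`
      : "";
  return [
    `Given the user's query and the full document${annotationNote}, ` +
      `is the following sentence essential for answering the query? ` +
      `Answer "Yes" or "No".`,
    `Query: ${input.query}`,
    `Document: ${input.fullDocument}`,
    `Sentence: ${input.sentence}`,
  ].join("\n");
}
```

Documents without annotations produce the plain prompt, so annotated and unannotated sources can share one code path.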
This section transitions from high-level architecture and theory to concrete implementation. It provides the foundational TypeScript code for the Sentient framework, including project setup, core interfaces, and implementations of the key engines and services. The code is designed to be modular, type-safe, and illustrative of the architectural principles discussed in the preceding sections.
A robust project structure and well-defined interfaces are essential for building a maintainable and scalable system. This chapter outlines the initial setup and the core TypeScript interfaces that form the contractual backbone of the Sentient framework.
The project is initialized as a standard Node.js TypeScript project.
- Initialization:

  ```bash
  mkdir sentient-framework
  cd sentient-framework
  npm init -y
  npm install typescript @types/node ts-node --save-dev
  npx tsc --init
  ```

- tsconfig.json Configuration: A baseline tsconfig.json is configured to support modern Node.js environments with ES modules.

  ```json
  {
    "compilerOptions": {
      "target": "ES2022",
      "module": "NodeNext",
      "moduleResolution": "NodeNext",
      "rootDir": "./src",
      "outDir": "./dist",
      "esModuleInterop": true,
      "forceConsistentCasingInFileNames": true,
      "strict": true,
      "skipLibCheck": true
    }
  }
  ```

- Dependency Installation: The core dependencies for the framework are installed. This includes libraries for AI integration, data validation, and web services.45 The `langchain` and `axios` packages are listed because the engine code later in this section imports them.

  ```bash
  npm install @langchain/core @langchain/community @langchain/openai @langchain/textsplitters langchain
  npm install @google/genai
  npm install zod
  npm install express axios
  npm install pdf-parse
  npm install @types/express --save-dev
  ```
The following TypeScript interfaces, introduced conceptually in Chapter 2, are now fully defined with TSDoc comments. These files should be placed in a src/core/interfaces directory.
IContextEngine.ts
This interface defines the contract for any component that can retrieve documents.
```typescript
import { Document } from "@langchain/core/documents";

/**
 * Represents a context retrieval engine.
 * Any component responsible for fetching relevant documents based on a query
 * must implement this interface.
 */
export interface IContextEngine {
  /**
   * Retrieves a list of relevant documents for a given query.
   * @param query The user's query string.
   * @returns A promise that resolves to an array of LangChain Document objects.
   */
  retrieve(query: string): Promise<Document[]>;
}
```
ITool.ts
This interface provides a standardized structure for defining tools that agents can use. It leverages Zod for robust schema validation.48
```typescript
import { z, ZodObject, ZodRawShape } from "zod";

/**
 * Represents an external tool that an agent can execute.
 * It defines the tool's identity, purpose, input schema, and execution logic.
 */
export interface ITool<T extends ZodRawShape> {
  /**
   * A unique, programmatic name for the tool.
   * Should be in snake_case.
   */
  readonly name: string;

  /**
   * A detailed description of what the tool does, its parameters, and when it should be used.
   * This is critical for the LLM to make correct decisions.
   */
  readonly description: string;

  /**
   * A Zod schema defining the input parameters for the tool.
   */
  readonly schema: ZodObject<T>;

  /**
   * The function that executes the tool's logic.
   * @param args The validated input arguments, conforming to the schema.
   * @returns A promise that resolves to the tool's output, typically a string or JSON object.
   */
  execute(args: z.infer<ZodObject<T>>): Promise<any>;
}
```
IAgent.ts
This interface defines the contract for all autonomous agents within the framework.
```typescript
// Define SentientPayload in a separate file, e.g., src/core/types.ts
import { SentientPayload } from "../types";

/**
 * Represents an autonomous agent capable of processing a task.
 * Each agent takes the central SentientPayload, performs its specialized function,
 * and returns the modified payload.
 */
export interface IAgent {
  /**
   * Processes the SentientPayload to perform a specific task.
   * @param payload The current state of the request, including query, history, and context.
   * @returns A promise that resolves to the updated SentientPayload.
   */
  process(payload: SentientPayload): Promise<SentientPayload>;
}
```
IMessageBus.ts
This interface abstracts the communication layer for multi-agent interactions, based on the A2A protocol's concepts.
```typescript
// Define A2AMessage in a separate file, e.g., src/core/types.ts
import { A2AMessage } from "../types";

/**
 * Represents the message bus for inter-agent communication,
 * abstracting the A2A protocol.
 */
export interface IMessageBus {
  /**
   * Sends a message to another agent.
   * @param message An A2A-compliant message object, containing target agent, task, and content.
   * @returns A promise that resolves when the message has been successfully dispatched.
   */
  send(message: A2AMessage): Promise<void>;
}
```
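To show how the `IMessageBus` contract might be satisfied during local development, here is an in-memory sketch. The `InMemoryMessageBus` class, its `subscribe` method, and the fields on the local `A2AMessage` stand-in are assumptions; the interface is redeclared locally only so that the example compiles on its own.

```typescript
// Local stand-ins so the sketch compiles on its own; the real types
// would come from src/core/types.ts and src/core/interfaces.
interface A2AMessage {
  targetAgentId: string;
  taskId: string;
  content: string;
}
interface IMessageBus {
  send(message: A2AMessage): Promise<void>;
}
type MessageHandler = (message: A2AMessage) => Promise<void> | void;

class InMemoryMessageBus implements IMessageBus {
  private readonly handlers = new Map<string, MessageHandler>();

  // Registers the handler that receives messages for one agent.
  subscribe(agentId: string, handler: MessageHandler): void {
    this.handlers.set(agentId, handler);
  }

  // Dispatches to the target agent's handler, or rejects if none is registered.
  async send(message: A2AMessage): Promise<void> {
    const handler = this.handlers.get(message.targetAgentId);
    if (!handler) {
      throw new Error(`No agent registered as ${message.targetAgentId}`);
    }
    await handler(message);
  }
}
```

An HTTP-backed implementation could replace this class later without changing the agents, since they depend only on `send`.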
This chapter provides concrete TypeScript classes that implement the IContextEngine interface, demonstrating how the different retrieval strategies are realized in code.
This class encapsulates the LangChain.js pipeline for semantic retrieval from documents.
```typescript
// src/engines/DocumentRAGEngine.ts
import { Document } from "@langchain/core/documents";
import { OpenAIEmbeddings } from "@langchain/openai";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";
import { IContextEngine } from "../core/interfaces/IContextEngine";

export class DocumentRAGEngine implements IContextEngine {
  private vectorStore: MemoryVectorStore | null = null;
  private embeddings: OpenAIEmbeddings;

  constructor() {
    // Ensure API keys are set in environment variables
    this.embeddings = new OpenAIEmbeddings({ model: "text-embedding-3-small" });
  }

  /**
   * Ingests and indexes a PDF file. In a real application, this would be
   * part of a separate, persistent ingestion pipeline.
   * @param filePath Path to the PDF file.
   */
  public async ingestPdf(filePath: string): Promise<void> {
    console.log(`Ingesting PDF from: ${filePath}`);
    const loader = new PDFLoader(filePath); // [14, 15]
    const rawDocs = await loader.load();

    // Split into overlapping chunks so context survives chunk boundaries.
    const textSplitter = new RecursiveCharacterTextSplitter({
      chunkSize: 1000,
      chunkOverlap: 200,
    }); // [18]
    const splitDocs = await textSplitter.splitDocuments(rawDocs);

    this.vectorStore = await MemoryVectorStore.fromDocuments(
      splitDocs,
      this.embeddings
    ); // [16, 23]
    console.log("PDF ingested and indexed successfully.");
  }

  public async retrieve(query: string): Promise<Document[]> {
    if (!this.vectorStore) {
      throw new Error("Vector store not initialized. Ingest a document first.");
    }
    // Return the 4 most similar chunks.
    const retriever = this.vectorStore.asRetriever(4); // [26]
    return retriever.invoke(query);
  }
}
```
This class acts as a client to the standalone RepoMapper microservice. It implements the IContextEngine interface by making an HTTP request to the service and formatting the response as a LangChain Document.
```typescript
// src/engines/GraphRAGEngineClient.ts
import { Document } from "@langchain/core/documents";
import { IContextEngine } from "../core/interfaces/IContextEngine";
import axios from "axios"; // used by the commented-out service call below

export class GraphRAGEngineClient implements IContextEngine {
  private readonly serviceUrl: string;

  constructor(serviceUrl: string) {
    this.serviceUrl = serviceUrl;
  }

  public async retrieve(query: string): Promise<Document[]> {
    try {
      // The query is expected to be the path to the repository
      const repoPath = query;

      // In a real implementation, this would call the MCP/HTTP endpoint of the RepoMapper service
      // For this example, we simulate the call and response.
      // const response = await axios.post(this.serviceUrl, { project_root: repoPath });
      // const repoMapContent = response.data;
      const simulatedRepoMapContent = `
repomap_class.py:
(Rank value: 10.8111)
46: class RepoMap:
49: def __init__(
512: def get_repo_map(
utils.py:
(Rank value: 0.2297)
21: def count_tokens(text: str, model_name: str = "gpt-4") -> int:
`; // [10]

      const repoMapDoc = new Document({
        pageContent: simulatedRepoMapContent,
        metadata: {
          source: `GraphRAG:${repoPath}`,
          engine: 'GraphRAGEngine',
        },
      });
      // Wrap the single repo map in an array to satisfy IContextEngine.
      return [repoMapDoc];
    } catch (error) {
      console.error("Error retrieving from GraphRAGEngine:", error);
      // Degrade gracefully: an empty list lets other engines still contribute.
      return [];
    }
  }
}
```
This class demonstrates the ensemble and re-ranking logic. It takes multiple IContextEngine instances and fuses their results using Reciprocal Rank Fusion (RRF).
```typescript
// src/engines/HybridRetriever.ts
import { Document } from "@langchain/core/documents";
import { IContextEngine } from "../core/interfaces/IContextEngine";

export class HybridRetriever implements IContextEngine {
  private readonly retrievers: IContextEngine[];
  private readonly kConstant: number;

  constructor(retrievers: IContextEngine[], kConstant: number = 60) {
    this.retrievers = retrievers;
    this.kConstant = kConstant;
  }

  public async retrieve(query: string): Promise<Document[]> {
    // Query every engine in parallel; each returns its own ranked list.
    const allResults = await Promise.all(
      this.retrievers.map(retriever => retriever.retrieve(query))
    );

    const docScores: Map<string, number> = new Map();
    const docStore: Map<string, Document> = new Map();

    allResults.forEach(rankedList => {
      rankedList.forEach((doc, index) => {
        const docKey = doc.pageContent; // Use content as a simple key
        if (!docStore.has(docKey)) {
          docStore.set(docKey, doc);
          docScores.set(docKey, 0);
        }
        const rank = index + 1;
        const currentScore = docScores.get(docKey) || 0;
        // Accumulate the RRF contribution 1 / (k + rank) from this list.
        docScores.set(docKey, currentScore + 1 / (this.kConstant + rank));
      });
    });

    // Sort by fused score (entry[1]), highest first, then map keys back to documents.
    const sortedDocs = Array.from(docScores.entries())
      .sort((a, b) => b[1] - a[1])
      .map(([key]) => docStore.get(key)!);
    return sortedDocs;
  }
}
```
This chapter provides skeleton code for the Express.js server that hosts the MCP and A2A service endpoints, forming the external interface of the Sentient framework.
```typescript
// src/server.ts
import express, { Request, Response } from 'express';
import { z } from 'zod';
import { ITool } from './core/interfaces/ITool';
// Assume other necessary imports for agents, A2A types, etc.

const app = express();
app.use(express.json());
const PORT = 3000;

// --- MCP Service Layer ---
// In a real app, tools would be dynamically registered.
const registeredTools: ITool<any>[] = [];

app.get('/mcp/tools', (_req: Request, res: Response) => {
  const toolManifest = registeredTools.map(tool => ({
    name: tool.name,
    description: tool.description,
    // Convert Zod schema to JSON Schema for MCP compatibility.
    // z.toJSONSchema assumes Zod 4; on Zod 3, the separate
    // zod-to-json-schema package fills this role.
    schema: z.toJSONSchema(tool.schema),
  }));
  res.json({ tools: toolManifest });
});

// --- A2A Service Layer ---
// This endpoint would receive tasks for different agents.
app.post('/a2a/task', async (req: Request, res: Response) => {
  const a2aTask = req.body; // Assuming body conforms to A2A Task schema [4]

  // Routing logic to determine which agent handles the task
  // const targetAgent = getAgentById(a2aTask.targetAgentId);
  // if (targetAgent) {
  //   // Asynchronously process the task
  //   targetAgent.process(a2aTask.payload);
  //   res.status(202).json({ taskId: a2aTask.id, status: 'submitted' });
  // } else {
  //   res.status(404).json({ error: 'Agent not found' });
  // }
  console.log("Received A2A Task:", a2aTask);
  res.status(202).json({ taskId: a2aTask.id, status: 'submitted' });
});

app.listen(PORT, () => {
  console.log(`Sentient Framework server running on http://localhost:${PORT}`);
});
```
This final implementation chapter provides a single, runnable TypeScript file that simulates a complex workflow, tying together all the architectural components: multi-modal RAG, agentic delegation via A2A, and final synthesis.
Scenario: "Based on the RepoMapper repository, refactor the caching logic to use Redis and update the corresponding documentation PDF."
```typescript
// src/main_workflow.ts
import { DocumentRAGEngine } from './engines/DocumentRAGEngine';
import { GraphRAGEngineClient } from './engines/GraphRAGEngineClient';
import { HybridRetriever } from './engines/HybridRetriever';

// Mock agent and payload types for demonstration
type MockAgent = { process: (task: string, context: string) => Promise<string> };
type A2ATask = { agent: string; task: string };

// --- Main Orchestration Logic ---
async function runWorkflow() {
  console.log("--- Starting Sentient Workflow ---");
  const userQuery = "Based on the RepoMapper repository, refactor the caching logic to use Redis and update the corresponding documentation PDF.";

  // 1. Initialize Context Engines
  const docEngine = new DocumentRAGEngine();
  await docEngine.ingestPdf('./mock_docs/repomapper_docs.pdf'); // Assumes a mock PDF exists
  const graphEngine = new GraphRAGEngineClient('http://localhost:3001/repomap');
  // Mock Keyword Engine: always returns an empty result list
  const keywordEngine = { retrieve: async (q: string) => [] };
  // Constructed to show the wiring; the steps below query the engines directly.
  const hybridRetriever = new HybridRetriever([docEngine, graphEngine, keywordEngine]);

  // 2. Orchestrator Agent Decomposes Task
  console.log("\n[Orchestrator] Decomposing user query...");
  // Illustrative decomposition of userQuery; the task wording is an assumption.
  const tasks: A2ATask[] = [
    { agent: 'RetrievalAgent', task: 'Retrieve the code structure of the RepoMapper repository' },
    { agent: 'RetrievalAgent', task: 'Retrieve documentation about the caching logic' },
    { agent: 'CodingAgent', task: 'Refactor the caching logic to use Redis' },
    { agent: 'DocumentationAgent', task: 'Update the documentation for the Redis caching change' },
  ];

  // 3. Delegate and Execute Tasks (Simulated A2A calls)
  let codingContext = '';
  let docContext = '';
  let refactoredCode = '';
  let updatedDocs = '';

  // Task 1 & 2: Retrieval
  console.log("\n[Orchestrator] Delegating retrieval tasks...");
  const graphContextDocs = await graphEngine.retrieve('/path/to/RepoMapper');
  codingContext = graphContextDocs[0]?.pageContent ?? '';
  const docContextDocs = await docEngine.retrieve('caching logic');
  docContext = docContextDocs.map(d => d.pageContent).join('\n\n');

  // Mock Agents processing the retrieved context
  const codingAgent: MockAgent = {
    async process(task, context) {
      console.log(`\n[CodingAgent] Received task: "${task}"`);
      // Simulates LLM call with context
      return `// Refactored code using Redis client\n// based on analysis of:\n${context.substring(0, 100)}...`;
    }
  };
  const docAgent: MockAgent = {
    async process(task, context) {
      console.log(`\n[DocumentationAgent] Received task: "${task}"`);
      return `### Using Redis for Caching\n The new implementation leverages Redis for faster...`;
    }
  };

  // Task 3 & 4: Generation
  console.log("\n[Orchestrator] Delegating generation tasks...");
  refactoredCode = await codingAgent.process(tasks[2].task, codingContext);
  updatedDocs = await docAgent.process(tasks[3].task, docContext);

  // 4. Synthesize Final Response
  console.log("\n[Orchestrator] Synthesizing final response...");
  const finalResponse = `
**Proposed Code Refactoring:**
\`\`\`python
${refactoredCode}
\`\`\`
**Updated Documentation:**
${updatedDocs}
`;
  console.log("\n--- Workflow Complete ---");
  console.log(finalResponse);
}

// Surface failures (e.g., a missing PDF or API key) instead of an unhandled rejection.
runWorkflow().catch(console.error);
```
This report has laid out a comprehensive architectural blueprint for the Sentient framework, a sophisticated system for advanced context engineering. By integrating multi-modal retrieval, an agentic core capable of tool use and collaboration, and intelligent context optimization, the framework is designed to overcome the limitations of traditional RAG pipelines. This concluding section summarizes its core capabilities and strategic value, and outlines promising directions for its future evolution.
The Sentient framework represents a strategic investment in a flexible, powerful, and future-proof AI architecture. Its core capabilities address the most pressing challenges in building complex, context-aware applications.
Summary of Capabilities:
- Holistic Context Retrieval: By fusing semantic (Document RAG), structural (Graph RAG), and lexical (Keyword RAG) search, Sentient achieves a far more complete and accurate understanding of the available information landscape than any single-modality system.
- Dynamic, Agentic Workflows: The shift from static chains to a multi-agent architecture orchestrated by a central reasoning agent enables the framework to tackle complex, multi-step problems that require planning, delegation, and tool use.
- Interoperability and Extensibility: Adherence to open standards like MCP for tool integration and A2A for inter-agent communication ensures that Sentient is not a monolithic silo. It is designed to be a collaborative component in a broader ecosystem of AI services, capable of both consuming and providing capabilities in a standardized manner.
- Efficiency and Performance: The inclusion of an advanced extractive context compression module directly addresses the critical issues of context window limits, latency, and cost. By ensuring the LLM receives only the most salient information, the framework optimizes performance and reduces operational expenses.
Strategic Recommendations for Deployment:
- Security First: API keys and other sensitive credentials must never be hard-coded in source code or exposed on the client-side. The recommended practice is to use environment variables on the server or a dedicated secrets management service. API keys should be regularly rotated and, where possible, restricted in scope to minimize the potential impact of a compromise.49
- Phased Implementation: The modular nature of the framework lends itself to a phased rollout. An initial deployment could focus on the Document RAG Engine to provide immediate value for document-based Q&A. The Graph RAG service and multi-agent capabilities can then be developed and integrated incrementally.
- Monitoring and Logging: Implement comprehensive monitoring and logging for all API interactions and agentic decisions. Tools like LangSmith are invaluable for tracing the complex execution paths of multi-agent systems, enabling effective debugging, evaluation, and continuous optimization [23].
- Scalable Architecture: For production use, the standalone services (Graph RAG Engine, MCP/A2A server) should be deployed in a scalable environment, such as containerized services managed by Kubernetes or a serverless platform. This ensures that the system can handle varying loads without compromising performance [49].
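The security recommendation above can be sketched as a small configuration loader that reads keys from the environment and fails fast when one is missing. This is an illustration under stated assumptions: the variable names `OPENAI_API_KEY` and `GOOGLE_API_KEY` come from the Quick Start, while `loadSecrets`, `requireEnv`, and `redact` are hypothetical helpers, and the key values below are placeholders standing in for a real environment.

```typescript
// Hypothetical config loader: secrets come from the environment, never from source code.
type Env = Record<string, string | undefined>;

interface Secrets {
  openaiApiKey: string;
  googleApiKey: string;
}

function requireEnv(env: Env, name: string): string {
  const value = env[name]?.trim();
  if (!value) {
    // Report only the variable name, never a value
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

function loadSecrets(env: Env): Secrets {
  return {
    openaiApiKey: requireEnv(env, "OPENAI_API_KEY"),
    googleApiKey: requireEnv(env, "GOOGLE_API_KEY"),
  };
}

// Mask a secret for log output, keeping only the last four characters.
function redact(secret: string): string {
  return secret.length <= 4 ? "****" : `****${secret.slice(-4)}`;
}

// In a Node.js server this would be called as loadSecrets(process.env).
// The object below is a placeholder environment for demonstration only.
const secrets = loadSecrets({
  OPENAI_API_KEY: "sk-example-1234",
  GOOGLE_API_KEY: "g-example-5678",
});
console.log(redact(secrets.openaiApiKey)); // prints "****1234"
```

Passing the environment in as a parameter keeps the loader easy to test and makes it straightforward to swap in a secrets management service later.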
The Sentient architecture is a foundation for continued development rather than a final state. Three areas stand out as directions for future work.
- Human-in-the-Loop (HITL) Integration: For many critical enterprise applications, full autonomy is undesirable. Future versions of the framework should incorporate HITL checkpoints. For example, before a CodingAgent commits code to a repository or a DeploymentAgent pushes changes to production, the OrchestratorAgent could be required to pause the task and request explicit confirmation from a human supervisor. This pattern combines the speed and scale of AI with the judgment and oversight of human experts [7].
- Self-Improving and Self-Healing Agents: The current architecture enables agents to use tools to solve external problems. A more advanced paradigm is to have agents use tools to improve the system itself. One possibility, demonstrated by researchers at Anthropic, is a "Tool-Testing Agent" [40]. This agent could be tasked with systematically invoking every tool registered with the MCP Service, checking each output for correctness, and recording failure modes. When it identifies a poorly performing tool, it could attempt to rewrite the tool's description so that it is clearer and more effective for other agents, creating a system that improves its own capabilities over time.
- Advanced Graph Topologies and Multi-Modal Knowledge: The current Graph RAG Engine is focused on the dependency graph of a codebase. A natural extension would be to evolve this into a richer, multi-modal knowledge graph that integrates nodes and edges from disparate sources: code functions (from tree-sitter) linked to their corresponding documentation sections (from the Document RAG engine), which are in turn linked to project management tickets (from an external API like Jira) and design documents (from Confluence). Querying this unified graph would let agents answer cross-domain questions such as "Show me all the code changes related to ticket JIRA-123, along with the technical specification that prompted them and the developers who worked on them." This would extend context engineering across the whole project record rather than the codebase alone.
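The cross-domain query in the last item can be pictured as a bounded traversal over a typed graph. The sketch below is an in-memory illustration only: the node kinds, the `KnowledgeGraph` class, and all sample identifiers other than JIRA-123 are hypothetical and do not describe the framework's actual schema or storage.

```typescript
// Hypothetical node kinds for a unified knowledge graph (not the framework's data model).
type NodeKind = "ticket" | "codeChange" | "spec" | "developer";

interface GraphNode {
  id: string;
  kind: NodeKind;
}

class KnowledgeGraph {
  private nodes = new Map<string, GraphNode>();
  private neighbors = new Map<string, Set<string>>();

  addNode(id: string, kind: NodeKind): void {
    this.nodes.set(id, { id, kind });
    this.neighbors.set(id, new Set<string>());
  }

  // Links are treated as undirected so a query can start from any node.
  link(a: string, b: string): void {
    this.neighbors.get(a)?.add(b);
    this.neighbors.get(b)?.add(a);
  }

  // Breadth-first traversal: every node within maxHops of the start, grouped by kind.
  related(startId: string, maxHops: number): Record<NodeKind, string[]> {
    const result: Record<NodeKind, string[]> = {
      ticket: [],
      codeChange: [],
      spec: [],
      developer: [],
    };
    const visited = new Set<string>([startId]);
    let frontier = [startId];
    for (let hop = 0; hop < maxHops && frontier.length > 0; hop++) {
      const next: string[] = [];
      for (const id of frontier) {
        this.neighbors.get(id)?.forEach((neighborId) => {
          if (visited.has(neighborId)) return;
          visited.add(neighborId);
          const node = this.nodes.get(neighborId);
          if (node) result[node.kind].push(neighborId);
          next.push(neighborId);
        });
      }
      frontier = next;
    }
    return result;
  }
}

// Sample data (hypothetical): one ticket of interest and one unrelated ticket.
const graph = new KnowledgeGraph();
graph.addNode("JIRA-123", "ticket");
graph.addNode("change-1", "codeChange");
graph.addNode("change-2", "codeChange");
graph.addNode("spec-auth", "spec");
graph.addNode("alice", "developer");
graph.addNode("bob", "developer");
graph.addNode("JIRA-999", "ticket");
graph.addNode("change-3", "codeChange");
graph.addNode("carol", "developer");

graph.link("JIRA-123", "change-1");
graph.link("JIRA-123", "change-2");
graph.link("JIRA-123", "spec-auth");
graph.link("change-1", "alice");
graph.link("change-2", "bob");
graph.link("JIRA-999", "change-3");
graph.link("change-3", "carol");

const context = graph.related("JIRA-123", 2);
console.log(context);
```

Two hops from the ticket reach its code changes and specification first, then the developers attached to those changes, while the unrelated ticket's subgraph is never visited. A production version would more likely sit on a graph database and use labeled, directed edges.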
1. AdaComp: Extractive Context Compression with Adaptive Predictor for Retrieval-Augmented Large Language Models - arXiv, accessed October 9, 2025, https://arxiv.org/html/2409.01579v1
2. EXIT: Context-Aware Extractive Compression for Enhancing Retrieval-Augmented Generation - arXiv:2412.12559v3 [cs.CL], 29 May 2025, accessed October 9, 2025, https://arxiv.org/pdf/2412.12559
3. Repo Map | Awesome MCP Servers, accessed October 9, 2025, https://mcpservers.org/servers/pdavis68/RepoMapper
4. What Is Agent2Agent (A2A) Protocol? | IBM, accessed October 9, 2025, https://www.ibm.com/think/topics/agent2agent-protocol
5. A2A (Agent-to-Agent) Protocol (TypeScript) - Microsoft Learn, accessed October 9, 2025, https://learn.microsoft.com/en-us/microsoftteams/platform/teams-ai-library/typescript/in-depth-guides/ai/a2a/overview
6. What is Multi-Agent Collaboration? - IBM, accessed October 9, 2025, https://www.ibm.com/think/topics/multi-agent-collaboration
7. Multi-Agent Design Pattern - Microsoft Open Source, accessed October 9, 2025, https://microsoft.github.io/ai-agents-for-beginners/08-multi-agent/
8. Designing Multi-Agent Intelligence - Microsoft for Developers, accessed October 9, 2025, https://developer.microsoft.com/blog/designing-multi-agent-intelligence
9. Separating code reasoning and editing | aider, accessed October 9, 2025, https://aider.chat/2024/09/26/architect.html
10. pdavis68/RepoMapper: A tool to produce a map of a ... - GitHub, accessed October 9, 2025, https://github.com/pdavis68/RepoMapper
11. How to write a custom document loader - LangChain.js, accessed October 9, 2025, https://js.langchain.com/docs/how_to/document_loader_custom/
12. Document loaders | 🦜️ Langchain, accessed October 9, 2025, https://js.langchain.com/docs/concepts/document_loaders/
13. A Guide to Gotchas with LangChain Document Loaders in a Chrome Extension - Medium, accessed October 9, 2025, https://medium.com/@andrewnguonly/a-guide-to-gotchas-with-langchain-document-loaders-in-a-chrome-extension-6228369f79f6
14. How to load PDF files - LangChain.js, accessed October 9, 2025, https://js.langchain.com/docs/how_to/document_loader_pdf/
15. PDFLoader - LangChain.js, accessed October 9, 2025, https://js.langchain.com/docs/integrations/document_loaders/file_loaders/pdf/
16. How to do retrieval | 🦜️ Langchain, accessed October 9, 2025, https://js.langchain.com/docs/how_to/chatbots_retrieval/
17. Text splitters | 🦜️ Langchain, accessed October 9, 2025, https://js.langchain.com/docs/concepts/text_splitters/
18. How to recursively split text by characters - LangChain.js, accessed October 9, 2025, https://js.langchain.com/docs/how_to/recursive_text_splitter/
19. Unpacking Text Splitter with LangChain | by Donato_TH | Donato Story - Medium, accessed October 9, 2025, https://medium.com/donato-story/unpacking-text-splitter-with-langchain-d14c38758986
20. Vector stores | 🦜️ Langchain, accessed October 9, 2025, https://js.langchain.com/docs/concepts/vectorstores/
21. Vector stores - LangChain, accessed October 9, 2025, https://python.langchain.com/docs/concepts/vectorstores/
22. How to create and query vector stores - LangChain.js, accessed October 9, 2025, https://js.langchain.com/docs/how_to/vectorstores/
23. MemoryVectorStore - LangChain.js, accessed October 9, 2025, https://js.langchain.com/docs/integrations/vectorstores/memory/
24. WeaviateStore - LangChain.js, accessed October 9, 2025, https://js.langchain.com/docs/integrations/vectorstores/weaviate
25. Build Your First RAG Application in JavaScript in Under 10 Minutes (With Code) | by Aparna Prasad | Medium, accessed October 9, 2025, https://medium.com/@aparna_prasad/build-your-first-rag-application-in-javascript-in-under-10-minutes-with-code-30fd35bd3a35
26. Retrievers | 🦜️ Langchain, accessed October 9, 2025, https://js.langchain.com/docs/concepts/retrievers/
27. Repository map - Aider, accessed October 9, 2025, https://aider.chat/docs/repomap.html
28. Building a better repository map with tree sitter | aider, accessed October 9, 2025, https://aider.chat/2023/10/22/repomap.html
29. Improving GPT-4's codebase understanding with ctags - Aider, accessed October 9, 2025, https://aider.chat/docs/ctags.html
30. Building and using a code graph in MotleyCoder - Motleycrew, accessed October 9, 2025, https://blog.motleycrew.ai/blog/building-and-using-a-code-graph-in-motleycoder
31. Retrieval - LangChain.js, accessed October 9, 2025, https://js.langchain.com/docs/concepts/retrieval/
32. Exploring the Power of Code Graphs in Modern Software Development - DEV Community, accessed October 9, 2025, https://dev.to/supratipb/exploring-the-power-of-code-graphs-in-modern-software-development-4k6m
33. Safely restructure your codebase with Dependency Graphs - Understand Legacy Code, accessed October 9, 2025, https://understandlegacycode.com/blog/safely-restructure-codebase-with-dependency-graphs/
34. Improving RAG Systems with Amazon Bedrock Knowledge Base: Practical Techniques from Real Implementation - DEV Community, accessed October 9, 2025, https://dev.to/aws-builders/improving-rag-systems-with-amazon-bedrock-knowledge-base-practical-techniques-from-real-9kk
35. Introduction to function calling | Generative AI on Vertex AI - Google Cloud, accessed October 9, 2025, https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/function-calling
36. Function calling with the Gemini API | Google AI for Developers, accessed October 9, 2025, https://ai.google.dev/gemini-api/docs/function-calling
37. Function calling with the Gemini API - YouTube, accessed October 9, 2025, https://www.youtube.com/watch?v=mVXrdvXplj0
38. Best practices with the Live API | Generative AI on Vertex AI - Google Cloud, accessed October 9, 2025, https://cloud.google.com/vertex-ai/generative-ai/docs/live-api/best-practices
39. Four Design Patterns for Event-Driven, Multi-Agent Systems, accessed October 9, 2025, https://www.confluent.io/blog/event-driven-multi-agent-systems/
40. How we built our multi-agent research system - Anthropic, accessed October 9, 2025, https://www.anthropic.com/engineering/built-multi-agent-research-system
41. EXIT: Context-Aware Extractive Compression for Enhancing Retrieval-Augmented Generation - arXiv, accessed October 9, 2025, https://arxiv.org/html/2412.12559v2
42. EXIT: Context-Aware Extractive Compression for Enhancing Retrieval-Augmented Generation - ACL Anthology, accessed October 9, 2025, https://aclanthology.org/2025.findings-acl.253/
43. [Literature Review] EXIT: Context-Aware Extractive Compression for Enhancing Retrieval-Augmented Generation - Moonlight, accessed October 9, 2025, https://www.themoonlight.io/en/review/exit-context-aware-extractive-compression-for-enhancing-retrieval-augmented-generation
44. Official code and resources for the paper "EXIT: Context-Aware Extractive Compression for Enhancing Retrieval-Augmented Generation." - GitHub, accessed October 9, 2025, https://github.com/ThisIsHwang/EXIT
45. LangChain.js, accessed October 9, 2025, https://v02.api.js.langchain.com/
46. How to Build RAG AI Agents with TypeScript - freeCodeCamp, accessed October 9, 2025, https://www.freecodecamp.org/news/how-to-build-rag-ai-agents-with-typescript/
47. Installation - LangChain.js, accessed October 9, 2025, https://js.langchain.com/docs/how_to/installation/
48. 567-labs/instructor-js: structured extraction for llms - GitHub, accessed October 9, 2025, https://github.com/567-labs/instructor-js
49. What are the best practices for implementing Gemini API? - SERPHouse, accessed October 9, 2025, https://www.serphouse.com/blog/best-practices-implementing-gemini-api/
50. Using Gemini API keys - Google AI for Developers, accessed October 9, 2025, https://ai.google.dev/gemini-api/docs/api-key
51. Getting started with the Gemini API and Web apps | Solutions for Developers, accessed October 9, 2025, https://developers.google.com/learn/pathways/solution-ai-gemini-getting-started-web
52. Build a Full Stack RAG App With TypeScript - YouTube, accessed October 9, 2025, https://www.youtube.com/watch?v=rQdibOsL1ps
53. Computer Use | Gemini API - Google AI for Developers, accessed October 9, 2025, https://ai.google.dev/gemini-api/docs/computer-use