Skip to content

freedom12321/chatR-GSOC

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

24 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

ChatR: An Intelligent, Local Assistant for R Programmers

The Problem

R users, especially package developers and new contributors, face a common challenge: existing large language models often provide inaccurate or incomplete R code. They struggle with third-party packages, fail to provide precise answers about R contribution workflows, and are often too resource-intensive or proprietary for many users. The result is a frustrating, time-consuming experience that limits learning and collaboration.


The ChatR Solution

ChatR is a lightweight, open, and reliable AI assistant built specifically for R users. It runs locally on your machine without needing a powerful GPU or a proprietary API key.

Think of ChatR as your personal, knowledgeable R expert who can search documentation, run code, and explain complex concepts clearly.

Key Capabilities:

  • Local & Open: Built with open models via Ollama (e.g., llama3.2:3b-instruct), it runs locally with no GPU required.
  • Context-Aware Retrieval: Uses a Retrieval-Augmented Generation (RAG) pipeline to pull package documentation, examples, and vignettes.
  • Live R Tool-Calling: Safely executes R code to verify its behavior and provide live outputs.
  • Educational Support: Teaches concepts step-by-step with tiered support, from a single function to a multi-step analysis pipeline.
  • Easy-to-Use Delivery: Ships as a command-line interface (CLI) and an R package (with an RStudio gadget).

πŸš€ Quick Start

For R Users (Current Process)

# 1. First: Install and start CLI backend (one-time setup)
git clone https://github.com/freedom12321/chatR-GSOC.git
cd chatR-GSOC
pip install -e .
chatr init
chatr serve  # run the first time and then can you can close terminal and R package still works
# 2. Then: Install and use R package (in new R session)
source("https://raw.githubusercontent.com/freedom12321/chatR-GSOC/main/install_chatr.R")
install_chatr()

# 3. Use immediately with full AI power
library(chatr)
chatr("How do I create a linear regression with diagnostics?")  # βœ… Full LLM analysis
chatr_analyze("mtcars")  # βœ… AI-powered dataset analysis  
chatr_repl()  # βœ… Interactive chat interface
chatr_code("machine learning model")  # βœ… Smart code generation

πŸ’‘ Future Update: We're working on making this seamless - soon you'll only need the R package!

For CLI Users

# Install and setup
git clone https://github.com/freedom12321/chatR-GSOC.git
cd chatR-GSOC && pip install -e . && ollama pull llama3.2:3b-instruct

# Use immediately  
chatr chat "How do I merge dataframes in R?"
chatr serve  # Start backend for R package integration

πŸ“– Full Setup Guide β†’ | πŸ“š Complete User Guide β†’


✨ What Makes ChatR Special

🏠 Completely Local & Private

  • Runs on your machine with open models via Ollama
  • No GPU required - works with llama3.2:3b-instruct (2GB RAM)
  • Zero data leaves your computer - complete privacy

🎯 R-Specific Intelligence

  • Live R Execution: Safely runs R code to verify suggestions
  • Package-Aware: Knows about 120+ indexed R functions from essential packages
  • Educational: Explains concepts step-by-step with working examples

πŸ”§ Multiple Interfaces

  • R Package: Native R functions with full documentation (?chatr)
  • Interactive REPL: Chat interface with chatr_repl() for real-time assistance
  • RStudio Integration: Addins and gadgets for seamless workflow
  • Python CLI: Cross-platform command-line tool
  • MCP Integration: Tools for agentic frameworks (Cursor, Copilot)
  • API Server: Backend for custom integrations

🧠 Advanced Features

  • Interactive REPL: Real-time chat interface with chatr_repl() for console-based assistance
  • MCP Integration: 6 tools for agentic frameworks (Cursor, Copilot) via chatr mcp --port 8002
  • Environment Awareness: Understands your current R session and data
  • Smart Code Generation: Creates complete scripts with chatr_generate_script()
  • Enhanced Documentation: 120+ indexed R functions from essential packages
  • Data Analysis Automation: Intelligent dataset exploration with chatr_analyze()

πŸŽͺ All ChatR Features

Feature R Package Function CLI Command Description
Interactive Chat chatr() chatr chat Natural language R programming assistance
Interactive REPL chatr_repl() chatr chat -i Real-time chat interface with context memory
Function Help help_explain() chatr chat Intelligent documentation with examples
Code Analysis analyze_code() chatr chat Code review and improvement suggestions
Data Exploration chatr_analyze() - Automated dataset analysis and insights
Script Generation chatr_generate_script() - Create complete R scripts from descriptions
Smart Code Generation chatr_code() - Generate and optionally execute R code
Interactive Sessions chatr_code_session() - Step-by-step programming guidance
Environment Summary chatr_list_data() - Show available datasets and variables
Data Summaries chatr_data_summary() - Quick dataset overviews
Smart Completion chatr_smart_complete() - Context-aware code recommendations
Analysis Tips chatr_analysis_tips() - Get best practices for data analysis
Interactive Assistant chatr_assistant() - Guided data analysis workflow
Selected Code Analysis chatr_analyze_selection() - RStudio addin for code analysis
MCP Tools - chatr mcp 6 tools for agentic frameworks (port 8002)

Note: CLI uses chatr chat for all queries - specific functionality depends on your question

πŸ“– Usage Examples

Basic Queries

library(chatr)

# Get help with R concepts
chatr("How do I handle missing values in linear regression?")

# Function-specific help with examples  
help_explain("dplyr::mutate")

# Code review and suggestions
analyze_code("
  data <- read.csv('file.csv')
  plot(data$x, data$y)
")

Advanced Data Analysis

# Load your dataset
data(mtcars)

# Get comprehensive analysis
chatr_analyze("mtcars")
# Returns: data summary, distributions, correlations, suggested analyses

# Generate complete analysis script
chatr_generate_script(
  task = "exploratory data analysis with visualizations",
  dataset_name = "mtcars", 
  output_file = "mtcars_eda.R"
)

# Interactive REPL interface (NEW!)
chatr_repl()  # Start real-time chat in R console
# Features: context memory, built-in commands, colored interface

# Interactive code generation with execution
chatr_code("create correlation heatmap", execute_code = TRUE)

Interactive Programming

# Step-by-step programming session
chatr_code_session()  # Interactive guidance

# Environment-aware assistance
chatr_list_data()  # See what data is available
chatr("analyze the relationship between mpg and weight")  # ChatR knows about mtcars

CLI Usage

# Ask questions directly
chatr chat "How do I create a violin plot in R?"

# Interactive mode
chatr chat --interactive
# > You: How do I merge data frames?
# > ChatR: [detailed explanation with examples]

# Start MCP server for agentic frameworks (NEW!)
chatr mcp --port 8002  # 6 tools: r_execute, r_help, r_search, r_explain, r_package_info, r_vignettes

# Start main server for R integration
chatr serve

How ChatR Stays Up-to-Date

ChatR avoids large downloads by using a dynamic, two-pronged approach. It combines live registries with a lightweight, rotating local cache.

1. Live Discovery (No Bulk Downloads)

  • Queries CRAN and r-universe registries on demand for the latest package lists and metadata.
  • Leverages the pkgsearch API to find new packages and their documentation in real-time.
  • Outcome: You can always search for the most current R packages without needing to maintain a giant local corpus.

2. Thin Local Cache

  • Maintains a small, local cache of only package names, titles, and descriptions.
  • Full documentation is downloaded only on demand for packages you actually reference, then cached for future reuse.

Technical Architecture

ChatR is powered by a fully local, open-source Retrieval-Augmented Generation (RAG) system.

Hybrid Retrieval (Fully Local)

You don't need an OpenAI key. ChatR uses a hybrid retrieval system to ensure both speed and semantic accuracy. It combines two methods:

  • Sparse Retrieval (BM25): A lightweight, keyword-based search that quickly filters a vast corpus of documentation, acting as the first pass to identify a broad set of potentially relevant documents.
  • Dense Embeddings: Using local models like nomic-embed-text, this approach performs a semantic search on the pre-filtered results from the sparse retrieval. The best matches are then re-ranked, ensuring the most semantically relevant answers rise to the top. This hybrid approach offers the speed of a keyword search with the precision of a vector-based one.

R Tool-Calling: Secure & Live Execution

A core feature of ChatR is its ability to safely execute R code. It uses sandboxed subprocesses to run Rscript with strict time and resource limits. This isolation prevents unintended side effects and ensures a secure environment.

The system captures key outputs, including:

  • Function help pages (?function).
  • Source code for packages or functions.
  • Live code outputs, which are then summarized to provide users with verified results.

RAG System & Data Sources

The RAG system is built on a foundation of meticulously indexed R documentation. It processes and indexes a range of sources, including:

  • CRAN Man Pages and Vignettes: These are processed and chunked to ensure that each retrieved snippet is contextually rich and highly relevant.
  • Task Views & "Writing R Extensions": These resources are included to provide high-level, conceptual knowledge for more complex queries.

Advanced Task Orchestration (Multi-Package & Multi-Solution)

ChatR will move beyond single-function requests to orchestrate complex, multi-step tasks across multiple packages. When a user asks a high-level question (e.g., "How do I do linear regression in R?"), ChatR will use an internal decision-making process to:

  • Identify Multiple Solutions
    Recognize that a task can be accomplished in different ways (e.g., using lm() from the stats package, or glm() from stats with specific parameters). It will also consider common tidyverse alternatives (e.g., tidymodels). This provides the user with a choice of tools.

  • Sequence Package Dependencies
    Create a logical workflow of R packages and functions. For example, a linear regression task often requires data manipulation (dplyr) and visualization (ggplot2) before the modeling step. ChatR will present this sequence, explaining why each step and package is needed.

  • Provide Tiered Suggestions
    Offer a brief overview of all possible solutions, then provide a detailed, step-by-step guide for one or two of the most common or recommended approaches. This prevents overwhelming the user while still acknowledging alternative methods.


Interactive & Fluid Workflow Integration

To create a truly interactive experience, ChatR will integrate directly into the user's coding environment. This goes beyond a simple chat box and creates a fluid, code-first workflow.

  • Cursor-Based Code Injection
    As the user is typing in their R script or console, ChatR will provide inline, ghost-text suggestions at the cursor's position. This is similar to modern code editors but tailored for R and user-specific use cases.

  • Accept and Run on Demand
    The user can accept the suggested code with a single key press (e.g., Tab). Once accepted, a follow-up prompt will appear (e.g., [Press Enter to run]), allowing the user to immediately execute the code in the R console to see the output. This provides instant feedback and reinforces the learning process.

  • Human-in-the-Loop Validation
    The user remains in full control. They can modify the suggested code before accepting it, or simply ignore the suggestion and continue typing. This ensures the assistant is a helpful tool, not a replacement for the programmer's judgment.

The system uses a thin cache for metadata and downloads full documentation on-demand, which keeps its local footprint minimal.

Delivery Options

ChatR is designed to be accessible wherever you work.

Option A: R Integration (Featured)

Complete R client with all AI capabilities:

  • Functions: chatr(), chatr_repl(), chatr_analyze(), chatr_list_data(), help_explain()
  • Smart Features: Auto-start backend, interactive REPL, data analysis planning
  • Usage: library(chatr) then use any function

Option B: Cross-Platform CLI

Command-line tool for terminal users:

chatr chat "How do I create a violin plot?"
chatr chat --interactive  # Interactive mode
chatr mcp --port 8002     # MCP server for agentic frameworks

Option C: MCP Integration (NEW!)

Tools for agentic frameworks (Cursor, Copilot, etc.):

  • MCP Server: chatr mcp --port 8002
  • 6 Tools Available: r_execute, r_help, r_search, r_explain, r_package_info, r_vignettes
  • Usage: Integrate with agentic frameworks via HTTP endpoints
  • Status: βœ… Working on port 8002 (separate from main server)

Quick Test:

# Start MCP server
chatr mcp --port 8002

# Test endpoints
curl http://localhost:8002/mcp/health
curl -X POST http://localhost:8002/mcp/execute \
  -H "Content-Type: application/json" \
  -d '{"tool": "r_execute", "parameters": {"code": "summary(mtcars)"}}'

About

An Intelligent, Local Assistant for R Programmers

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published