Skip to content

1ne/sre-agent

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SRE Agent

Autonomous SRE agent for AWS. Detects anomalies, investigates incidents, and proposes remediation with continuous learning.

Architecture

Internet → CloudFront (HTTPS)
               │
               ├── / (frontend) → S3 OAC (private bucket)
               │
               └── /api/* → VPC Origin ENI → Private ALB → API Handler Lambda
                             (no public IP)    (port 80)

CloudWatch Alarm → EventBridge → Orchestrator Lambda → Step Functions
                                                            │
                                    ┌───────────────────────┼───────────────┐
                                    │                       │               │
                              Agent Loop            Store Report      Send Notification
                         (Bedrock Converse API)   (S3 + pgvector)   (Slack/PagerDuty/SNS)
                              │
                    ┌─────────┼──────────┐
                    │         │          │
              CloudWatch   X-Ray     Logs Insights
              Metrics      Traces    Queries

Three Pillars

  1. Incident Triage: CloudWatch alarm → auto-investigate → remediate → notify
  2. Proactive Prevention: Scheduled daily/weekly analysis of trends and capacity
  3. On-Demand Queries: Natural language SRE queries via dashboard or Slack

Tech Stack

Component Technology
Agent Runtime Bedrock Converse API with native tool_use (Claude Haiku/Sonnet/Opus)
Knowledge Base Aurora Serverless v2 + pgvector (Titan Embeddings v2)
Orchestration AWS Step Functions
Frontend React + Vite + TypeScript, Apple design system
Hosting CloudFront + S3 (OAC) + ALB VPC Origins
Auth Amazon Cognito
IaC AWS CDK (primary) + Terraform (alternative)
Integrations Slack, PagerDuty, Datadog, GitHub (MCP tool registry)

Lambda Functions (15)

Function Purpose
orchestrator Event classifier + Step Functions trigger
agent-loop Core reasoning engine (Converse API + tool_use cycling)
api-handler REST API for frontend dashboard
vector-search pgvector semantic search + storage
tool-dispatcher Routes tool calls via DynamoDB registry
metrics-retrieval CloudWatch + X-Ray parallel queries
log-analysis CloudWatch Logs Insights
remediation Structured remediation via Converse API
notification Routes to Slack/PagerDuty/SNS by severity
proactive-analyzer Scheduled trend analysis
slack-handler Slack events/commands
mcp-tools/datadog Datadog API integration
mcp-tools/pagerduty PagerDuty API integration
mcp-tools/slack Slack API integration
mcp-tools/github GitHub API integration

Deploy

CDK (Primary)

npm install
cd frontend && npm install && npm run build && cd ..
npm run build
npx cdk deploy

Post-Deploy: Create User

aws cognito-idp admin-create-user \
  --user-pool-id <UserPoolId> \
  --username user@example.com \
  --user-attributes Name=email,Value=user@example.com Name=email_verified,Value=true

aws cognito-idp admin-set-user-password \
  --user-pool-id <UserPoolId> \
  --username user@example.com \
  --password "YourPassword" --permanent

Terraform (Alternative)

cd terraform
cp terraform.tfvars.example terraform.tfvars   # fill in values
terraform init && terraform apply

The chaos/fault-injection infrastructure used to generate alarms for the agent to investigate lives in a separate repository and is deployed independently.

Security

  • CloudFront is the sole entry point (VPC Origins)
  • ALB in private subnet, no public IP
  • S3 blocked from public access (OAC)
  • Cognito authentication on all API routes
  • Secrets in AWS Secrets Manager

Cost

~$100-150/mo baseline (Aurora Serverless v2 + NAT Gateway + CloudFront).

About

Autonomous SRE agent for AWS — incident triage, proactive prevention, and on-demand natural-language SRE queries on Step Functions + Bedrock Converse API.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors