# DeepEval Security Testing Starter
A comprehensive testing framework for evaluating LLM responses on API security topics using DeepEval and OpenAI.
## Overview
This project provides a complete testing suite for validating LLM-generated security advice and responses. It focuses on:
- **API Security**: Authentication, authorization, and common vulnerabilities
- **Accuracy Testing**: Ensuring responses are relevant and factually correct
- **Hallucination Detection**: Preventing fabricated or misleading security advice
- **RAG Evaluation**: Testing retrieval-augmented generation quality
- **Prompt Regression**: Comparing prompt versions and preventing regressions
- **Inference Provider**: Uses OpenAI for LLM inference
## Project Structure
```
startDeepEval/
├── .github/
│   └── workflows/
│       └── deepeval.yml            # CI/CD pipeline
├── datasets/
│   ├── golden_dataset.json         # Golden test cases for accuracy
│   └── rag_dataset.json            # RAG test cases with retrieval context
├── src/
│   ├── __init__.py
│   ├── llm_client.py               # OpenAI client for security responses
│   ├── rag_client.py               # OpenAI RAG client with knowledge base
│   └── prompt_versions.py          # Prompt version management
├── tests/
│   ├── __init__.py
│   ├── conftest.py                 # Pytest fixtures
│   ├── test_accuracy.py            # Accuracy and relevancy tests
│   ├── test_hallucination.py       # Hallucination detection tests
│   ├── test_rag.py                 # RAG retrieval and generation tests
│   └── test_prompt_regression.py   # Prompt version regression tests
├── deepeval_results/               # Test results output directory
├── .env.example                    # Environment variables template
├── requirements.txt                # Python dependencies
├── pyproject.toml                  # Project configuration
├── pytest.ini                      # Pytest configuration
└── README.md                       # This file
```
## Getting Started
### Prerequisites
- Python 3.9+
- OpenAI API key
### Installation
1. **Clone the repository**:

   ```bash
   git clone <your-repo-url>
   cd startDeepEval
   ```

2. **Install Python dependencies**:

   ```bash
   pip install -r requirements.txt
   ```

3. **Set up environment variables**:

   ```bash
   cp .env.example .env
   # Edit .env and add your OpenAI API key:
   # OPENAI_API_KEY=your_api_key_here
   ```
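Before running the suite, it can help to confirm the key is actually picked up. A minimal sanity-check sketch, assuming `python-dotenv` is installed; `check_env.py` is a hypothetical helper, not part of the starter:

```python
# check_env.py -- hypothetical helper, not shipped with the starter.
# Verifies that the variables from .env are visible to Python.
import os

from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # reads .env from the current working directory

key = os.getenv("OPENAI_API_KEY")
if not key:
    raise SystemExit("OPENAI_API_KEY is missing -- edit your .env file first")

print(f"API key loaded (...{key[-4:]}), model: {os.getenv('OPENAI_MODEL', 'gpt-4o-mini')}")
```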
## Running Tests

**Run all tests**:

```bash
pytest
```

**Run specific test categories**:

```bash
# Accuracy tests
pytest tests/test_accuracy.py

# Hallucination detection
pytest tests/test_hallucination.py

# RAG tests
pytest tests/test_rag.py

# Prompt regression
pytest tests/test_prompt_regression.py
```

**Run with markers**:

```bash
# Run only security tests
pytest -m security

# Run everything except slow tests
pytest -m "not slow"
```

## Test Metrics

The test suite uses the following DeepEval metrics (a combined usage sketch follows the list):

**Accuracy and relevancy**:

- **AnswerRelevancyMetric**: Measures how relevant the response is to the query
- **FaithfulnessMetric**: Ensures responses are grounded in provided context
- **ContextualRelevancyMetric**: Validates context relevance to the query

**Hallucination and bias**:

- **HallucinationMetric**: Detects fabricated information not supported by context
- **BiasMetric**: Identifies biased or unfair recommendations

**RAG retrieval quality**:

- **ContextualPrecisionMetric**: Measures precision of retrieved context
- **ContextualRecallMetric**: Evaluates completeness of retrieved context
- **ContextualRelevancyMetric**: Assesses overall retrieval quality

**Custom and regression checks**:

- **GEval**: Custom criteria-based evaluation for comprehensiveness and quality
- **Version Comparison**: Ensures new prompts don't regress on key metrics
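As a rough illustration of how several of these metrics can be attached to one test case (a hedged sketch: the query, output, and context strings below are invented for the example and are not taken from the project's datasets):

```python
from deepeval import assert_test
from deepeval.metrics import (
    AnswerRelevancyMetric,
    FaithfulnessMetric,
    HallucinationMetric,
)
from deepeval.test_case import LLMTestCase

# Illustrative test case; the metrics call an LLM judge, so OPENAI_API_KEY must be set.
test_case = LLMTestCase(
    input="How should I store API keys?",
    actual_output=(
        "Keep API keys in a secrets manager or environment variables, "
        "and never commit them to source control."
    ),
    # retrieval_context feeds FaithfulnessMetric; context feeds HallucinationMetric.
    retrieval_context=["API keys must never be committed to version control."],
    context=["API keys must never be committed to version control."],
)

# assert_test fails if any metric does not meet its threshold.
assert_test(
    test_case,
    [
        AnswerRelevancyMetric(threshold=0.7),
        FaithfulnessMetric(threshold=0.7),
        HallucinationMetric(threshold=0.5),
    ],
)
```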
## Configuration

`pytest.ini` controls the following (an illustrative sketch follows the list):

- Test discovery patterns
- Output formatting
- Custom markers for organizing tests
- Logging configuration
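A minimal sketch of what such a `pytest.ini` might look like; the `security` and `slow` markers come from the commands above, while the remaining values are assumptions rather than the starter's actual configuration:

```ini
[pytest]
# Test discovery patterns
testpaths = tests
python_files = test_*.py

# Output formatting
addopts = -v --tb=short

# Custom markers for organizing tests (used with `pytest -m ...`)
markers =
    security: security-focused test cases
    slow: long-running tests, excluded with -m "not slow"

# Logging configuration
log_cli = true
log_cli_level = INFO
```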
Environment variables are read from `.env`:

```bash
# Required: OpenAI API key
OPENAI_API_KEY=your_key_here
OPENAI_MODEL=gpt-4o-mini

# DeepEval configuration
DEEPEVAL_TELEMETRY_OPT_OUT=true  # Optional
CONFIDENCE_THRESHOLD=0.7         # Optional: default threshold
```
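For orientation, here is a rough sketch of how a client such as `src/llm_client.py` might consume these variables; the class name and prompt are assumptions, not the starter's actual implementation:

```python
import os

from openai import OpenAI


class SecurityLLMClient:
    """Illustrative only; the real src/llm_client.py may be organized differently."""

    def __init__(self) -> None:
        # Same variables as documented in .env.example
        self.client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
        self.model = os.getenv("OPENAI_MODEL", "gpt-4o-mini")

    def generate_security_response(self, query: str) -> str:
        # Single-turn completion with a security-focused system prompt.
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": "You are an API security expert."},
                {"role": "user", "content": query},
            ],
        )
        return response.choices[0].message.content
```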
## Writing Custom Tests

An accuracy test against the security client:

```python
from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric


def test_my_security_feature(llm_client):
    query = "How do I secure my API?"
    response = llm_client.generate_security_response(query)

    test_case = LLMTestCase(
        input=query,
        actual_output=response
    )

    metric = AnswerRelevancyMetric(threshold=0.7)
    assert_test(test_case, [metric])
```

A RAG test that also checks the retrieved context:

```python
from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import ContextualRelevancyMetric


def test_rag_retrieval(rag_client):
    query = "How do I prevent SQL injection?"
    result = rag_client.generate_rag_response(query)

    test_case = LLMTestCase(
        input=query,
        actual_output=result["response"],
        retrieval_context=result["retrieval_context"]
    )

    metric = ContextualRelevancyMetric(threshold=0.6)
    assert_test(test_case, [metric])
```
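Both examples rely on the `llm_client` and `rag_client` fixtures from `tests/conftest.py`. A sketch of how such fixtures could be wired up (the class names are assumptions; the starter's actual `conftest.py` may differ):

```python
# tests/conftest.py (illustrative sketch)
import pytest

from src.llm_client import SecurityLLMClient   # assumed class name
from src.rag_client import SecurityRAGClient   # assumed class name


@pytest.fixture(scope="session")
def llm_client():
    # Reuse one client across the session to keep API usage down.
    return SecurityLLMClient()


@pytest.fixture(scope="session")
def rag_client():
    return SecurityRAGClient()
```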
## CI/CD Integration

The project includes a GitHub Actions workflow (`.github/workflows/deepeval.yml`) that:
- Runs on push to main/develop branches
- Runs on pull requests
- Executes weekly on Sunday (for regression detection)
- Requires the OpenAI API key configured as a GitHub secret (a sketch of such a workflow follows)
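A sketch of what such a workflow could look like; the cron time, action versions, and Python version are assumptions, not the committed `deepeval.yml`:

```yaml
name: DeepEval Tests

on:
  push:
    branches: [main, develop]
  pull_request:
  schedule:
    - cron: "0 6 * * 0"  # weekly on Sunday (time is an assumption)

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: pytest
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```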
Add your OpenAI API key as a repository secret:
- Go to your repository settings
- Navigate to **Secrets and variables > Actions**
- Add a new repository secret:
  - **Name**: `OPENAI_API_KEY`
  - **Value**: Your OpenAI API key
Then push to trigger automated testing:
```bash
git push origin dev
```

## Use Cases

- **Security Chatbot Validation**: Ensure your security chatbot provides accurate advice
- **Documentation QA**: Validate generated security documentation
- **Prompt Engineering**: Test and compare different prompt versions
- **Compliance**: Verify responses align with security standards (OWASP, NIST)
- **Regression Testing**: Catch quality degradation in model updates
- **Privacy-Conscious Testing**: Run tests using OpenAI with appropriate handling of sensitive data
## Contributing

- Fork the repository
- Create a feature branch
- Add tests for new features
- Ensure all tests pass
- Submit a pull request
## License

MIT License - feel free to use this starter template for your projects.