Malware AI Agent

The Malware AI Agent is an AI-powered tool for malware analysis and threat intelligence generation. It collects malware metadata from multiple sources (MalwareBazaar, VirusTotal, VirusShare, and local samples), extracts features, classifies malware with machine learning, and generates threat intelligence reports in JSON, Markdown, and plain-text formats. The project uses YARA rules for static analysis, a Random Forest classifier for malware type prediction, and environment variables for secure handling of API keys.

Features

Multi-Source Data Collection: Fetches malware metadata from:
- MalwareBazaar (abuse.ch)
- VirusTotal (optional, with API key)
- VirusShare (optional, with API key)
- Local sample directories
Feature Extraction:
- Static analysis using YARA rules and pattern matching
- Metadata-based features (file size, type, tags, etc.)
- String analysis for suspicious patterns (URLs, IP addresses, registry keys)
- PE file analysis for Windows executables
Machine Learning Classification: Uses a Random Forest Classifier to predict malware types (e.g., ransomware, trojan, botnet).
Threat Intelligence Reports: Generates reports in JSON, Markdown, and text formats, summarizing malware families, file types, and predictions.
Secure Configuration: Stores API keys in a .env file or secure configuration file, excluded from version control.
Extensible Design: Supports additional data sources, YARA rules, and machine learning models (e.g., deep learning with transformers and torch).
Logging and Error Handling: Comprehensive logging to malware_agent.log for debugging and monitoring.
Safe Malware Handling: Optional sample downloading with containment in a quarantine directory (disabled by default).
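
As a rough illustration of the string-analysis and metadata features listed above, the following sketch computes a few of the feature names that appear in the sample report (file_size, is_pe, static_entropy, static_suspicious_string_count). The regexes, the `SUSPICIOUS_PATTERNS` table, and both function names are assumptions made for this example and are not taken from malware_agent.py. The `MZ` check is only a simplified stand-in for real PE parsing.

```python
import math
import re
from collections import Counter

# Hypothetical patterns for illustration; the project keeps its own
# definitions in patterns/malware_patterns.json.
SUSPICIOUS_PATTERNS = {
    "url": re.compile(rb"https?://[^\s\"'<>]+"),
    "ip_address": re.compile(rb"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "registry_key": re.compile(rb"HKEY_[A-Z_]+(?:\\[\w.-]+)+"),
}


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def extract_static_features(data: bytes) -> dict:
    """Build a small feature dict from raw file bytes (assumed layout)."""
    matches = {
        name: pattern.findall(data)
        for name, pattern in SUSPICIOUS_PATTERNS.items()
    }
    return {
        "file_size": len(data),
        # Simplified check: PE files begin with the "MZ" DOS header.
        "is_pe": int(data[:2] == b"MZ"),
        "static_entropy": round(shannon_entropy(data), 2),
        "static_suspicious_string_count": sum(len(m) for m in matches.values()),
        "matches": matches,
    }
```

High entropy (close to 8 bits per byte) often indicates packed or encrypted content, which is why it is a common static feature alongside string matches.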

Prerequisites

Python 3.8+
API Keys (optional, depending on data sources):
- MalwareBazaar: Sign up at https://bazaar.abuse.ch/login/
- VirusTotal: Register at https://www.virustotal.com/
- VirusShare: Request access at https://virusshare.com/
System Requirements:
- Adequate disk space for data, models, and samples (if downloading)
- Optional: GPU for deep learning with torch (if enabled)
Dependencies: Listed in requirements.txt (see Installation section)

Installation

Follow these steps to set up the Malware AI Agent:

Clone the Repository:

git clone https://github.com/ties2/malware-ai-agent
cd malware-ai-agent

Create a Virtual Environment (recommended):

python -m venv venv
source venv/bin/activate  # Linux/macOS
venv\Scripts\activate     # Windows

Install Dependencies:
```
pip install -r requirements.txt
```

Configure API Keys:

Create a .env file in the project root:

MALWAREBAZAAR_API_KEY=your_malwarebazaar_api_key
VIRUSTOTAL_API_KEY=your_virustotal_api_key
VIRUSSHARE_API_KEY=your_virusshare_api_key

Alternatively, create a ~/.malware_agent_config.json file:

{
    "MALWAREBAZAAR_API_KEY": "your_malwarebazaar_api_key",
    "VIRUSTOTAL_API_KEY": "your_virustotal_api_key",
    "VIRUSSHARE_API_KEY": "your_virusshare_api_key"
}

Set file permissions (Linux/macOS):

chmod 600 .env
chmod 600 ~/.malware_agent_config.json
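
For readers who want to see how the two configuration files could be read, here is a minimal sketch using only the standard library. The lookup order (process environment, then .env, then the JSON file) and the helper names `parse_env_file` and `load_api_keys` are assumptions for this example; the actual script may resolve keys differently or use a dotenv library.

```python
import json
import os
from pathlib import Path

KEY_NAMES = ("MALWAREBAZAAR_API_KEY", "VIRUSTOTAL_API_KEY", "VIRUSSHARE_API_KEY")


def parse_env_file(path: Path) -> dict:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    if not path.is_file():
        return values
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_api_keys(
    env_path: Path = Path(".env"),
    json_path: Path = Path.home() / ".malware_agent_config.json",
) -> dict:
    """Resolve each key; the precedence used here is an assumption."""
    env_file = parse_env_file(env_path)
    json_cfg = {}
    if json_path.is_file():
        json_cfg = json.loads(json_path.read_text(encoding="utf-8"))
    return {
        name: os.environ.get(name) or env_file.get(name) or json_cfg.get(name)
        for name in KEY_NAMES
    }
```

A key that is missing from every location resolves to `None`, which lets the caller skip the corresponding optional data source.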

Verify Setup:
- Run the script to ensure dependencies and configuration are correct:
```
python malware_agent.py
```

Usage

Run the Script:
```
python malware_agent.py
```
This will:
- Collect malware metadata from configured sources
- Extract features and classify malware
- Generate reports in output/ (JSON, Markdown, text)
Customize Configuration (optional):
- Edit malware_agent.py to modify Config settings, such as:
  - DATASOURCES: Add/remove sources (["malwarebazaar", "local", "virusshare", "virustotal"])
  - DOWNLOAD_SAMPLES: Set to True to download samples (use with caution in a secure environment)
  - REPORT_FORMATS: Choose output formats (["json", "markdown", "txt"])
View Output:
- Check output/ for generated reports (e.g., malware_report_20250428_120000.md)
- Review malware_agent.log for execution details and errors
Safety Note:
- Do not enable DOWNLOAD_SAMPLES unless running in a secure, isolated environment (e.g., a sandbox or VM).
- Ensure .env is excluded from version control (see .gitignore), and keep ~/.malware_agent_config.json outside the repository.
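
To make the settings above concrete, the sketch below shows one way a Config object with these three options could be written. Only the option names and their listed values come from this README. The dataclass layout, the default for DATASOURCES, and the validation logic are assumptions, and the real Config in malware_agent.py may look different.

```python
from dataclasses import dataclass, field
from typing import List

# Allowed values as listed in this README.
VALID_SOURCES = {"malwarebazaar", "local", "virusshare", "virustotal"}
VALID_FORMATS = {"json", "markdown", "txt"}


@dataclass
class Config:
    """Hypothetical settings container; the defaults here are assumptions."""

    DATASOURCES: List[str] = field(
        default_factory=lambda: ["malwarebazaar", "local"]
    )
    # Sample downloading stays off unless explicitly enabled.
    DOWNLOAD_SAMPLES: bool = False
    REPORT_FORMATS: List[str] = field(
        default_factory=lambda: ["json", "markdown", "txt"]
    )

    def __post_init__(self) -> None:
        unknown_sources = set(self.DATASOURCES) - VALID_SOURCES
        if unknown_sources:
            raise ValueError(f"Unknown data sources: {sorted(unknown_sources)}")
        unknown_formats = set(self.REPORT_FORMATS) - VALID_FORMATS
        if unknown_formats:
            raise ValueError(f"Unknown report formats: {sorted(unknown_formats)}")
```

Validating at construction time means a typo such as `"virustotl"` fails immediately instead of silently skipping a source.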

Sample Output

Below are excerpts from sample reports generated by the Malware AI Agent.

Markdown Report (`malware_report_20250428_120000.md`)

# Malware Analysis Report

**Generated on:** 2025-04-28 12:00:00

## Summary
- **Total Samples Analyzed:** 50
- **Detected Malware Types:**
  - ransomware: 15 samples
  - trojan: 20 samples
  - botnet: 10 samples
  - unknown: 5 samples

## Detailed Analysis

### Sample 1
- **SHA256 Hash:** a1b2c3d4e5f6...
- **File Name:** sample1.exe
- **Predicted Malware Type:** ransomware
- **Confidence:** 85.23%
- **Probability Distribution:**
  - ransomware: 85.23%
  - trojan: 10.12%
  - botnet: 4.65%
- **Key Features:**
  - file_size: 204800
  - is_pe: 1
  - tag_ransomware: 1
  - static_entropy: 7.8
  - static_suspicious_string_count: 3
- **Static Analysis:**
  - YARA Matches:
    - Rule: Ransomware_Generic (Detects potential ransomware characteristics)
  - Suspicious Strings:
    - "your files have been encrypted" (ransomware)
    - "bitcoin payment" (ransomware)
    - "http://malicious.site" (url)

### Sample 2
- **SHA256 Hash:** f6e5d4c3b2a1...
- **File Name:** trojan.dll
- **Predicted Malware Type:** trojan
- **Confidence:** 92.15%
...

## Notes
- This report was generated automatically by the Malware AI Agent.
- For detailed technical information, refer to the JSON report.
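
The Confidence and Probability Distribution lines in the report above can be derived from per-class probabilities. With scikit-learn's RandomForestClassifier, such a mapping could be built as `dict(zip(model.classes_, model.predict_proba(X)[0]))`. The sketch below assumes that mapping already exists and only shows the formatting step; `summarize_prediction` is a hypothetical helper, not a function from the project.

```python
from typing import Dict, List


def summarize_prediction(probabilities: Dict[str, float]) -> List[str]:
    """Format class probabilities as Markdown report lines, highest first."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    # The top-ranked class is the prediction; its probability is the confidence.
    label, confidence = ranked[0]
    lines = [
        f"- **Predicted Malware Type:** {label}",
        f"- **Confidence:** {confidence:.2%}",
        "- **Probability Distribution:**",
    ]
    lines += [f"  - {name}: {prob:.2%}" for name, prob in ranked]
    return lines
```

Calling it with `{"ransomware": 0.8523, "trojan": 0.1012, "botnet": 0.0465}` reproduces the values shown for Sample 1.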

Text Report (`malware_report_20250428_120000.txt`)

Malware Analysis Report
==============================
Generated on: 2025-04-28 12:00:00

Summary
------------------------------
Total Samples Analyzed: 50
Detected Malware Types:
  - ransomware: 15 samples
  - trojan: 20 samples
  - botnet: 10 samples
  - unknown: 5 samples

Detailed Analysis
------------------------------
Sample 1
  SHA256 Hash: a1b2c3d4e5f6...
  File Name: sample1.exe
  Predicted Malware Type: ransomware
  Confidence: 85.23%
  Probability Distribution:
    - ransomware: 85.23%
    - trojan: 10.12%
    - botnet: 4.65%

Sample 2
  SHA256 Hash: f6e5d4c3b2a1...
  File Name: trojan.dll
  Predicted Malware Type: trojan
  Confidence: 92.15%
...

Notes
------------------------------
- This report was generated automatically by the Malware AI Agent.

Log File (`malware_agent.log`)

2025-04-28 12:00:00,000 - malware_ai_agent - INFO - Starting Malware AI Agent
2025-04-28 12:00:00,001 - malware_ai_agent - INFO - Loaded configuration from .env
2025-04-28 12:00:00,002 - malware_ai_agent - INFO - Created directory: output
2025-04-28 12:00:01,123 - malware_ai_agent - INFO - Collecting data from sources: malwarebazaar, local, virusshare
2025-04-28 12:00:05,456 - malware_ai_agent - INFO - Collected 50 unique samples from 3 sources
2025-04-28 12:00:10,789 - malware_ai_agent - INFO - Loaded model from models/malware_model.pkl
2025-04-28 12:00:12,234 - malware_ai_agent - INFO - Report generated: output/malware_report_20250428_120000.md
2025-04-28 12:00:12,345 - malware_ai_agent - INFO - Malware AI Agent completed successfully
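
Log lines in this shape can be produced with Python's standard logging module. The sketch below is one possible setup, not necessarily the one used in malware_agent.py. The logger name matches the excerpt above, while `build_logger` and its parameter are hypothetical.

```python
import logging


def build_logger(log_path: str = "malware_agent.log") -> logging.Logger:
    """Return a logger that writes to both a file and the console."""
    logger = logging.getLogger("malware_ai_agent")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        # The default asctime format yields "YYYY-MM-DD HH:MM:SS,mmm".
        formatter = logging.Formatter(
            "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
        )
        handlers = (
            logging.FileHandler(log_path, encoding="utf-8"),
            logging.StreamHandler(),
        )
        for handler in handlers:
            handler.setFormatter(formatter)
            logger.addHandler(handler)
    return logger
```

A call such as `build_logger().info("Starting Malware AI Agent")` would then append a line in the same format as the first entry above.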

CI/CD

This project uses GitHub Actions for continuous integration and deployment:

Linting: Runs flake8 to enforce code style.
Formatting: Uses black to check code formatting.
Type Checking: Executes mypy for static type analysis.
Testing: Runs pytest for unit tests (add tests in a tests/ directory).
Build: Ensures dependencies are installed and the script runs without errors.

Testing

The project includes unit tests for key components using pytest. To run tests:

pytest tests/

Project Structure for Malware AI Agent

malware-ai-agent/
├── .env                    # Environment file for API keys (e.g., MalwareBazaar, VirusTotal, VirusShare)
├── .gitignore              # Specifies files/directories to ignore in version control (e.g., .env, logs, data)
├── requirements.txt        # Lists Python dependencies (e.g., Flask, pandas, scikit-learn, yara-python)
├── malware_agent.py        # Core Malware AI Agent script containing DataCollector, FeatureExtractor, etc.
├── Dashboard.py            # Flask web application for the black and green themed dashboard
├── templates/              # HTML templates for Flask web interface
│   ├── index.html          # Home page for initiating analysis or file uploads
│   ├── upload.html         # Page for uploading files for malware analysis
│   ├── report.html         # Page for viewing generated analysis reports
├── static/                 # Static assets for web interface
│   ├── css/
│   │   ├── style.css       # CSS file defining the black and green theme
├── data/                   # Stores collected data and extracted features
│   ├── malware_data.json   # Collected malware metadata
│   ├── extracted_features.pkl # Extracted feature data
│   ├── downloaded_samples.json # Metadata for downloaded samples
├── models/                 # Stores trained machine learning models
│   ├── malware_model.pkl   # Trained RandomForestClassifier model
│   ├── model_metadata.json # Model metadata (feature names, labels, timestamp)
├── output/                 # Stores generated analysis reports
│   ├── malware_report_*.{json,md,txt} # Reports in JSON, Markdown, and text formats
├── samples/                # Stores malware samples
│   ├── uploads/            # Subdirectory for user-uploaded files
│   ├── quarantine/         # Subdirectory for downloaded malware samples
├── patterns/               # Stores malware pattern definitions
│   ├── malware_patterns.json # JSON file with regex and hex patterns
├── yara_rules/             # Stores YARA rules for malware detection
│   ├── ransomware.yar      # YARA rule for ransomware detection
│   ├── backdoor.yar        # YARA rule for backdoor detection
│   ├── trojan.yar          # YARA rule for trojan detection
├── malware_agent.log       # Log file for Malware AI Agent operations
├── web_interface.log       # Log file for Flask web interface operations

Notes

Malware Handling: Only enable sample downloading in a secure, isolated environment.

For issues or feature requests, open an issue on the GitHub repository.
