The Malware AI Agent is an advanced, AI-powered tool designed for malware analysis and threat intelligence generation. It collects malware metadata from multiple sources (MalwareBazaar, VirusTotal, VirusShare, and local samples), extracts features, classifies malware using machine learning, and generates detailed threat intelligence reports in multiple formats (JSON, Markdown, text). The project leverages YARA rules for static analysis, Random Forest classification for malware type prediction, and secure handling of API keys using environment variables.
- Multi-Source Data Collection: Fetches malware metadata from:
  - MalwareBazaar (abuse.ch)
  - VirusTotal (optional, with API key)
  - VirusShare (optional, with API key)
  - Local sample directories
- Feature Extraction:
  - Static analysis using YARA rules and pattern matching
  - Metadata-based features (file size, type, tags, etc.)
  - String analysis for suspicious patterns (URLs, IP addresses, registry keys)
  - PE file analysis for Windows executables
- Machine Learning Classification: Uses a Random Forest Classifier to predict malware types (e.g., ransomware, trojan, botnet).
- Threat Intelligence Reports: Generates reports in JSON, Markdown, and text formats, summarizing malware families, file types, and predictions.
- Secure Configuration: Stores API keys in a `.env` file or a secure configuration file, excluded from version control.
- Extensible Design: Supports additional data sources, YARA rules, and machine learning models (e.g., deep learning with `transformers` and `torch`).
- Logging and Error Handling: Comprehensive logging to `malware_agent.log` for debugging and monitoring.
- Safe Malware Handling: Optional sample downloading with containment in a quarantine directory (disabled by default).
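The metadata and string features listed above can be illustrated with a small, standard-library-only sketch. It computes byte entropy (the report's `static_entropy`), a PE check via the `MZ` header, and regex hits for URLs, IPv4 addresses, and registry keys. The function names and regexes are hypothetical; the project itself loads its patterns from `patterns/malware_patterns.json` and adds YARA matching, neither of which is reproduced here.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical patterns for illustration only; the real project reads its
# regex/hex patterns from patterns/malware_patterns.json.
SUSPICIOUS_PATTERNS = {
    "url": re.compile(rb"https?://[^\s\"'<>]+"),
    "ip": re.compile(rb"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "registry": re.compile(rb"HKEY_[A-Z_]+\\[^\s\"']+", re.IGNORECASE),
}


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 (constant) up to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def find_suspicious_strings(data: bytes) -> List[Tuple[str, str]]:
    """Return (category, matched text) pairs for every pattern hit."""
    hits = []
    for category, pattern in SUSPICIOUS_PATTERNS.items():
        for match in pattern.findall(data):
            hits.append((category, match.decode("latin-1")))
    return hits


def extract_static_features(data: bytes) -> Dict[str, float]:
    """Flat feature dict whose keys mirror the sample report's Key Features."""
    return {
        "file_size": len(data),
        "is_pe": int(data[:2] == b"MZ"),  # DOS/PE files begin with the 'MZ' magic
        "static_entropy": round(shannon_entropy(data), 2),
        "static_suspicious_string_count": len(find_suspicious_strings(data)),
    }
```

High entropy (close to 8.0) is a common hint of packed or encrypted content, which is why it sits alongside string hits as a classifier input.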
- Python 3.8+
- API Keys (optional, depending on data sources):
  - MalwareBazaar: Sign up at https://bazaar.abuse.ch/login/
  - VirusTotal: Register at https://www.virustotal.com/
  - VirusShare: Request access at https://virusshare.com/
- System Requirements:
  - Adequate disk space for data, models, and samples (if downloading)
  - Optional: GPU for deep learning with `torch` (if enabled)
- Dependencies: Listed in `requirements.txt` (see Installation section)
Follow these steps to set up the Malware AI Agent:

1. Clone the Repository:
   ```bash
   git clone https://github.com/ties2/malware-ai-agent
   cd malware-ai-agent
   ```
2. Create a Virtual Environment (recommended):
   ```bash
   python -m venv venv
   source venv/bin/activate  # Linux/macOS
   venv\Scripts\activate     # Windows
   ```
3. Install Dependencies:
   ```bash
   pip install -r requirements.txt
   ```
4. Configure API Keys:
   - Create a `.env` file in the project root:
     ```
     MALWAREBAZAAR_API_KEY=your_malwarebazaar_api_key
     VIRUSTOTAL_API_KEY=your_virustotal_api_key
     VIRUSSHARE_API_KEY=your_virusshare_api_key
     ```
   - Alternatively, create a `~/.malware_agent_config.json` file:
     ```json
     {
       "MALWAREBAZAAR_API_KEY": "your_malwarebazaar_api_key",
       "VIRUSTOTAL_API_KEY": "your_virustotal_api_key",
       "VIRUSSHARE_API_KEY": "your_virusshare_api_key"
     }
     ```
   - Set file permissions (Linux/macOS):
     ```bash
     chmod 600 .env
     chmod 600 ~/.malware_agent_config.json
     ```
5. Verify Setup:
   - Run the script to ensure dependencies and configuration are correct:
     ```bash
     python malware_agent.py
     ```
6. Run the Script:
   ```bash
   python malware_agent.py
   ```
   This will:
   - Collect malware metadata from configured sources
   - Extract features and classify malware
   - Generate reports in `output/` (JSON, Markdown, text)
7. Customize Configuration (optional):
   - Edit `malware_agent.py` to modify `Config` settings, such as:
     - `DATASOURCES`: Add/remove sources (`["malwarebazaar", "local", "virusshare", "virustotal"]`)
     - `DOWNLOAD_SAMPLES`: Set to `True` to download samples (use with caution in a secure environment)
     - `REPORT_FORMATS`: Choose output formats (`["json", "markdown", "txt"]`)
8. View Output:
   - Check `output/` for generated reports (e.g., `malware_report_20250428_120000.md`)
   - Review `malware_agent.log` for execution details and errors
9. Safety Note:
   - Do not enable `DOWNLOAD_SAMPLES` unless running in a secure, isolated environment (e.g., a sandbox or VM).
   - Ensure `.env` or `~/.malware_agent_config.json` is excluded from version control (see `.gitignore`).
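The key-configuration step above offers two file locations but does not say how they are read or which wins. The sketch below is one plausible loader using only the standard library (a library such as python-dotenv is also common); the precedence order (process environment, then `.env`, then the JSON file) and all function names are assumptions. It also includes a check that mirrors the `chmod 600` advice.

```python
import json
import os
import stat
from pathlib import Path
from typing import Dict, Optional

API_KEY_NAMES = ("MALWAREBAZAAR_API_KEY", "VIRUSTOTAL_API_KEY", "VIRUSSHARE_API_KEY")
DEFAULT_JSON_CONFIG = Path.home() / ".malware_agent_config.json"


def parse_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=value lines, ignoring blank lines and # comments."""
    values: Dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")  # drop optional quotes
    return values


def is_private(path: Path) -> bool:
    """True if group/other have no permission bits (the effect of chmod 600 on POSIX)."""
    return (stat.S_IMODE(path.stat().st_mode) & 0o077) == 0


def load_api_keys(env_path: Path = Path(".env"),
                  json_path: Path = DEFAULT_JSON_CONFIG) -> Dict[str, Optional[str]]:
    """Resolve each key from the process environment, then .env, then the JSON file.

    A missing key resolves to None, which a caller could treat as "source disabled".
    """
    from_env_file = parse_env_file(env_path)
    from_json: Dict[str, str] = {}
    if json_path.is_file():
        from_json = json.loads(json_path.read_text(encoding="utf-8"))
    return {
        name: os.environ.get(name) or from_env_file.get(name) or from_json.get(name)
        for name in API_KEY_NAMES
    }
```

Returning `None` for absent keys fits the optional VirusTotal and VirusShare sources: a collector can simply skip any source whose key is missing.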
Below are excerpts from sample reports generated by the Malware AI Agent.

Markdown report (`output/malware_report_*.md`):

```markdown
# Malware Analysis Report
**Generated on:** 2025-04-28 12:00:00
## Summary
- **Total Samples Analyzed:** 50
- **Detected Malware Types:**
  - ransomware: 15 samples
  - trojan: 20 samples
  - botnet: 10 samples
  - unknown: 5 samples
## Detailed Analysis
### Sample 1
- **SHA256 Hash:** a1b2c3d4e5f6...
- **File Name:** sample1.exe
- **Predicted Malware Type:** ransomware
- **Confidence:** 85.23%
- **Probability Distribution:**
  - ransomware: 85.23%
  - trojan: 10.12%
  - botnet: 4.65%
- **Key Features:**
  - file_size: 204800
  - is_pe: 1
  - tag_ransomware: 1
  - static_entropy: 7.8
  - static_suspicious_string_count: 3
- **Static Analysis:**
  - YARA Matches:
    - Rule: Ransomware_Generic (Detects potential ransomware characteristics)
  - Suspicious Strings:
    - "your files have been encrypted" (ransomware)
    - "bitcoin payment" (ransomware)
    - "http://malicious.site" (url)
### Sample 2
- **SHA256 Hash:** f6e5d4c3b2a1...
- **File Name:** trojan.dll
- **Predicted Malware Type:** trojan
- **Confidence:** 92.15%
...
## Notes
- This report was generated automatically by the Malware AI Agent.
- For detailed technical information, refer to the JSON report.
```

Text report (`output/malware_report_*.txt`):

```text
Malware Analysis Report
==============================
Generated on: 2025-04-28 12:00:00
Summary
------------------------------
Total Samples Analyzed: 50
Detected Malware Types:
- ransomware: 15 samples
- trojan: 20 samples
- botnet: 10 samples
- unknown: 5 samples
Detailed Analysis
------------------------------
Sample 1
SHA256 Hash: a1b2c3d4e5f6...
File Name: sample1.exe
Predicted Malware Type: ransomware
Confidence: 85.23%
Probability Distribution:
- ransomware: 85.23%
- trojan: 10.12%
- botnet: 4.65%
Sample 2
SHA256 Hash: f6e5d4c3b2a1...
File Name: trojan.dll
Predicted Malware Type: trojan
Confidence: 92.15%
...
Notes
------------------------------
- This report was generated automatically by the Malware AI Agent.
```
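The Summary and per-sample sections in these reports can be derived from per-class probabilities such as those returned by scikit-learn's `RandomForestClassifier.predict_proba` (one column per entry in the fitted model's `classes_`). The sketch below assumes those probabilities have already been mapped to a `{label: probability}` dict per sample, takes the highest-probability label as the prediction and its probability as the confidence, and uses hypothetical record keys (`sha256`, `file_name`, `probabilities`). How the agent assigns the `unknown` label is not described, so it is not modeled here.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List, Tuple


def top_prediction(probabilities: Dict[str, float]) -> Tuple[str, float]:
    """Pick the most probable label; its probability is reported as the confidence."""
    label = max(probabilities, key=probabilities.__getitem__)
    return label, probabilities[label]


def render_markdown_report(results: List[Dict], generated: datetime) -> str:
    """Render a report shaped like the sample above.

    Each result is assumed (hypothetically) to look like
    {"sha256": str, "file_name": str, "probabilities": {label: prob}}.
    """
    predicted = [top_prediction(r["probabilities"]) for r in results]
    counts = Counter(label for label, _ in predicted)
    lines = [
        "# Malware Analysis Report",
        f"**Generated on:** {generated:%Y-%m-%d %H:%M:%S}",
        "## Summary",
        f"- **Total Samples Analyzed:** {len(results)}",
        "- **Detected Malware Types:**",
    ]
    lines += [f"  - {label}: {n} samples" for label, n in counts.most_common()]
    lines.append("## Detailed Analysis")
    for i, (result, (label, conf)) in enumerate(zip(results, predicted), start=1):
        lines += [
            f"### Sample {i}",
            f"- **SHA256 Hash:** {result['sha256']}",
            f"- **File Name:** {result['file_name']}",
            f"- **Predicted Malware Type:** {label}",
            f"- **Confidence:** {conf:.2%}",
            "- **Probability Distribution:**",
        ]
        # List classes from most to least probable, as in the sample report.
        ranked = sorted(result["probabilities"].items(), key=lambda kv: kv[1], reverse=True)
        lines += [f"  - {name}: {p:.2%}" for name, p in ranked]
    return "\n".join(lines)
```

Ordering the type counts by frequency (`most_common`) is an arbitrary choice here; the sample report does not make its ordering rule explicit.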
Log output (`malware_agent.log`):

```text
2025-04-28 12:00:00,000 - malware_ai_agent - INFO - Starting Malware AI Agent
2025-04-28 12:00:00,001 - malware_ai_agent - INFO - Loaded configuration from .env
2025-04-28 12:00:00,002 - malware_ai_agent - INFO - Created directory: output
2025-04-28 12:00:01,123 - malware_ai_agent - INFO - Collecting data from sources: malwarebazaar, local, virusshare
2025-04-28 12:00:05,456 - malware_ai_agent - INFO - Collected 50 unique samples from 3 sources
2025-04-28 12:00:10,789 - malware_ai_agent - INFO - Loaded model from models/malware_model.pkl
2025-04-28 12:00:12,234 - malware_ai_agent - INFO - Report generated: output/malware_report_20250428_120000.md
2025-04-28 12:00:12,345 - malware_ai_agent - INFO - Malware AI Agent completed successfully
```
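The log lines above match Python's `logging` format string `%(asctime)s - %(name)s - %(levelname)s - %(message)s` with the default `asctime` rendering (`YYYY-MM-DD HH:MM:SS,mmm`). A minimal sketch of an equivalent setup follows, assuming a logger named `malware_ai_agent` (taken from the log) that writes to `malware_agent.log`; also echoing to the console is an assumption, and `setup_logging` is a hypothetical name.

```python
import logging
from pathlib import Path

# Format string that reproduces the sample log lines above.
LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"


def setup_logging(log_path: Path = Path("malware_agent.log"),
                  level: int = logging.INFO) -> logging.Logger:
    """Configure the 'malware_ai_agent' logger to write to a file and the console."""
    logger = logging.getLogger("malware_ai_agent")
    logger.setLevel(level)
    if not logger.handlers:  # guard against duplicate handlers on repeated calls
        formatter = logging.Formatter(LOG_FORMAT)  # default asctime: 2025-04-28 12:00:00,000
        for handler in (logging.FileHandler(log_path, encoding="utf-8"),
                        logging.StreamHandler()):
            handler.setFormatter(formatter)
            logger.addHandler(handler)
    return logger
```

Usage: `log = setup_logging()` followed by `log.info("Starting Malware AI Agent")` produces a line in the same shape as the first entry above.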
This project uses GitHub Actions for continuous integration and deployment:
- Linting: Runs `flake8` to enforce code style.
- Formatting: Uses `black` to check code formatting.
- Type Checking: Executes `mypy` for static type analysis.
- Testing: Runs `pytest` for unit tests (add tests in a `tests/` directory).
- Build: Ensures dependencies are installed and the script runs without errors.
The project includes unit tests for key components using pytest. To run tests:

```bash
pytest tests/
```
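As an illustration of the kind of test `pytest tests/` would collect, here is a hypothetical `tests/test_dedup.py`. The sample log reports "Collected 50 unique samples from 3 sources", which implies deduplication across sources; `merge_unique` below is an assumed stand-in for that logic, keyed on a `sha256_hash` field (the field name MalwareBazaar's API uses), and is not the project's actual code. Plain `test_*` functions with bare `assert` statements are all pytest needs.

```python
# tests/test_dedup.py (hypothetical file; merge_unique is an illustrative
# stand-in, not a function known to exist in malware_agent.py)
import hashlib
from typing import Dict, Iterable, List


def merge_unique(*sources: Iterable[Dict]) -> List[Dict]:
    """Merge records from several sources, keeping the first record per SHA256."""
    seen = set()
    merged: List[Dict] = []
    for source in sources:
        for record in source:
            digest = str(record.get("sha256_hash", "")).lower()  # normalize hex case
            if digest and digest not in seen:
                seen.add(digest)
                merged.append(record)
    return merged


def _sha(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def test_duplicates_across_sources_are_dropped():
    bazaar = [{"sha256_hash": _sha(b"one"), "source": "malwarebazaar"}]
    local = [
        {"sha256_hash": _sha(b"one").upper(), "source": "local"},  # same hash, other case
        {"sha256_hash": _sha(b"two"), "source": "local"},
    ]
    merged = merge_unique(bazaar, local)
    assert [(r["source"], r["sha256_hash"].lower()) for r in merged] == [
        ("malwarebazaar", _sha(b"one")),
        ("local", _sha(b"two")),
    ]


def test_records_without_hash_are_skipped():
    assert merge_unique([{"file_name": "sample1.exe"}]) == []
```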
## Project Structure

```text
malware-ai-agent/
├── .env # Environment file for API keys (e.g., MalwareBazaar, VirusTotal, VirusShare)
├── .gitignore # Specifies files/directories to ignore in version control (e.g., .env, logs, data)
├── requirements.txt # Lists Python dependencies (e.g., Flask, pandas, scikit-learn, yara-python)
├── malware_agent.py # Core Malware AI Agent script containing DataCollector, FeatureExtractor, etc.
├── Dashboard.py # Flask web application for the black and green themed dashboard
├── templates/ # HTML templates for Flask web interface
│ ├── index.html # Home page for initiating analysis or file uploads
│ ├── upload.html # Page for uploading files for malware analysis
│ ├── report.html # Page for viewing generated analysis reports
├── static/ # Static assets for web interface
│ ├── css/
│ │ ├── style.css # CSS file defining the black and green theme
├── data/ # Stores collected data and extracted features
│ ├── malware_data.json # Collected malware metadata
│ ├── extracted_features.pkl # Extracted feature data
│ ├── downloaded_samples.json # Metadata for downloaded samples
├── models/ # Stores trained machine learning models
│ ├── malware_model.pkl # Trained RandomForestClassifier model
│ ├── model_metadata.json # Model metadata (feature names, labels, timestamp)
├── output/ # Stores generated analysis reports
│ ├── malware_report_*.{json,md,txt} # Reports in JSON, Markdown, and text formats
├── samples/ # Stores malware samples
│ ├── uploads/ # Subdirectory for user-uploaded files
│ ├── quarantine/ # Subdirectory for downloaded malware samples
├── patterns/ # Stores malware pattern definitions
│ ├── malware_patterns.json # JSON file with regex and hex patterns
├── yara_rules/ # Stores YARA rules for malware detection
│ ├── ransomware.yar # YARA rule for ransomware detection
│ ├── backdoor.yar # YARA rule for backdoor detection
│ ├── trojan.yar # YARA rule for trojan detection
├── malware_agent.log # Log file for Malware AI Agent operations
├── web_interface.log # Log file for Flask web interface operations
```
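The sample log includes "Created directory: output", which suggests the agent creates missing working directories at startup. A minimal sketch of that step follows; the directory list is taken from the tree above, but which directories the agent actually creates, and the name `ensure_directories`, are assumptions.

```python
import logging
from pathlib import Path
from typing import List

logger = logging.getLogger("malware_ai_agent")

# Working directories taken from the project tree; the exact set the agent
# creates at startup is an assumption.
WORK_DIRS = (
    "data",
    "models",
    "output",
    "samples/uploads",
    "samples/quarantine",
    "patterns",
    "yara_rules",
)


def ensure_directories(root: Path) -> List[Path]:
    """Create any missing working directories under root; return the ones created."""
    created: List[Path] = []
    for rel in WORK_DIRS:
        path = root / rel
        if not path.is_dir():
            path.mkdir(parents=True, exist_ok=True)  # also creates samples/ as needed
            logger.info("Created directory: %s", rel)
            created.append(path)
    return created
```

Because it only creates what is missing, calling it on every run is safe and returns an empty list once the layout exists.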
- Malware Handling: Only enable sample downloading in a secure, isolated environment.
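One way to keep downloaded samples contained is sketched below, assuming the agent writes raw sample bytes into `samples/quarantine/`. Each file is named by its SHA256 digest with a neutral `.bin` suffix, so an accidental double-click does not launch an `.exe`, and it is created owner read/write only with no execute bits. The function name and naming scheme are hypothetical, and POSIX permission bits have limited effect on Windows.

```python
import hashlib
import os
from pathlib import Path


def quarantine_sample(data: bytes, quarantine_dir: Path) -> Path:
    """Store sample bytes as <sha256>.bin with owner-only, non-executable permissions."""
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    target = quarantine_dir / f"{hashlib.sha256(data).hexdigest()}.bin"
    if target.exists():
        return target  # identical content is already quarantined
    # O_EXCL refuses to overwrite; mode 0o600 is applied at creation (minus umask),
    # so the file is never briefly readable by others. O_BINARY matters on Windows only.
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL | getattr(os, "O_BINARY", 0)
    fd = os.open(target, flags, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return target
```

Naming by content hash also deduplicates: downloading the same sample twice returns the existing path instead of writing a second copy.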
For issues or feature requests, open an issue on the GitHub repository.