This project provides a comprehensive toolkit for applying Machine Learning and Data-Driven approaches to digital forensics and cyber security investigations. Developed by Anand Binu Arjun as part of Cluster 2 research on AI/ML applications in digital forensics.
- Network traffic analysis:
  - ML-based classification of network traffic as benign or malicious
  - Feature extraction from PCAP files
  - Random Forest classifier with model persistence
  - Command-line interface for training and classification
- Memory forensic analysis:
  - Process listing, network connection scanning, and file scanning
  - Registry analysis with JSON output support
  - Extensible plugin architecture
- CASE-compliant forensic data handling (Cyber-investigation Analysis Standard Expression):
  - Investigation and evidence management
  - Observable collection and tracking
  - JSON data export for interoperability
- Clone the repository:

```bash
git clone https://github.com/AnandBinuArjun/ML---Data-Driven-Forensic-Automation ml-data-driven-forensic-automation
cd ml-data-driven-forensic-automation
```
- Install the required dependencies:

```bash
pip install -r requirements.txt
```

On Windows systems with permission restrictions:

```bash
pip install --user -r requirements.txt
```
The project can be used through the main entry point:

```bash
python main.py [module] [options]
```

Train a model with sample data:

```bash
python main.py network-analyzer --train sample_network_traffic.csv --save-model traffic_model.joblib
```
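Under the hood, training amounts to fitting a scikit-learn Random Forest on labeled flow features and persisting it with joblib. The following is a minimal standalone sketch, not the module's actual code; the `label` column name and feature layout are assumptions about the sample CSV:

```python
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load labeled flow features; "label" marks each row benign/malicious (assumed schema).
df = pd.read_csv("sample_network_traffic.csv")
X = df.drop(columns=["label"])
y = df["label"]

# Hold out a test split to sanity-check the model before persisting it.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print(f"Held-out accuracy: {clf.score(X_test, y_test):.3f}")

# Persist the trained model for later classification runs.
joblib.dump(clf, "traffic_model.joblib")
```

Persisting the estimator with joblib is what lets the `--model` flag reuse it later without retraining.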
Classify a PCAP file:

```bash
python main.py network-analyzer --classify sample.pcap --model traffic_model.joblib
```
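Classification first turns raw packets into numeric features. A rough sketch of PCAP feature extraction with Scapy follows; the fields shown are illustrative, and the analyzer's actual feature set may differ:

```python
from scapy.all import rdpcap, IP, TCP  # Scapy is listed in requirements.txt

# Illustrative per-packet feature extraction; the module's real features may differ.
packets = rdpcap("sample.pcap")
features = []
for pkt in packets:
    if IP in pkt:
        features.append({
            "length": len(pkt),                              # total packet length
            "proto": pkt[IP].proto,                          # IP protocol number
            "src": pkt[IP].src,                              # source address
            "dst": pkt[IP].dst,                              # destination address
            "dport": pkt[TCP].dport if TCP in pkt else 0,    # destination port
        })
print(f"Extracted {len(features)} packet records")
```

In practice, per-packet records are usually aggregated into flow-level statistics (the FlowMeter approach noted in the acknowledgments) before being fed to the classifier.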
Analyze a memory image:

```bash
python main.py volatility-int --plugin pslist --image memory.dmp
```
Available plugins:

- `pslist`: List processes
- `netscan`: Scan network connections
- `filescan`: Scan file objects
- `registry`: Analyze registry keys
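For comparison, the same plugins can be driven through Volatility 3's standalone `vol` CLI and its JSON renderer. A sketch that shells out to it and parses the result (this assumes Volatility 3 is installed on the PATH and the image is a Windows memory dump, hence the `windows.pslist` plugin name):

```python
import json
import subprocess

# Run Volatility 3's process-list plugin with the JSON renderer (-r json).
result = subprocess.run(
    ["vol", "-r", "json", "-f", "memory.dmp", "windows.pslist"],
    capture_output=True, text=True, check=True,
)

# Each entry is one process row (PID, PPID, image name, ...).
for row in json.loads(result.stdout):
    print(row)
```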
Create and manage forensic investigations:

```bash
python main.py case-pipeline --create-investigation inv_001 "Network Investigation" "Suspicious activity"
python main.py case-pipeline --add-evidence ev_001 inv_001 /path/to/evidence.pcap
python main.py case-pipeline --save case_data.json
```
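The same operations should be reachable from Python. The sketch below mirrors the CLI flags through a hypothetical `CasePipeline` class; the import path follows the project layout, but the class and method names are assumptions, not a documented API:

```python
# Hypothetical API mirroring src/data/case_pipeline.py; names are inferred from the CLI flags.
from src.data.case_pipeline import CasePipeline

pipeline = CasePipeline()

# Create an investigation, attach a piece of evidence, then export CASE-compliant JSON.
pipeline.create_investigation("inv_001", "Network Investigation", "Suspicious activity")
pipeline.add_evidence("ev_001", "inv_001", "/path/to/evidence.pcap")
pipeline.save("case_data.json")
```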
```
ml-data-driven-forensic-automation/
├── src/
│   ├── data/
│   │   └── case_pipeline.py             # CASE-compliant data handling
│   ├── tools/
│   │   └── network_traffic_analyzer.py  # Network traffic classification
│   └── utils/
│       └── volatility_integration.py    # Volatility 3 integration
├── examples/
│   └── create_sample_data.py            # Sample data generator
├── tests/
│   └── test_toolkit.py                  # Test suite
├── main.py                              # Main entry point
├── requirements.txt                     # Python dependencies
├── demo.ipynb                           # Jupyter notebook demo
├── sample_network_traffic.csv           # Sample dataset
└── traffic_model.joblib                 # Trained ML model
```
The project supports Google Colab for cloud-based experimentation. The `demo.ipynb` notebook provides examples of how to use all components in a cloud environment.
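Outside of `demo.ipynb`, a Colab session can be bootstrapped with a setup cell like this (the `!` and `%` prefixes run shell commands and notebook magics):

```python
# Colab setup cell: clone the repository and install dependencies.
!git clone https://github.com/AnandBinuArjun/ML---Data-Driven-Forensic-Automation ml-data-driven-forensic-automation
%cd ml-data-driven-forensic-automation
!pip install -r requirements.txt
```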
- Python 3.7+
- NumPy
- Pandas
- Scikit-learn
- Scapy
- Matplotlib
- Seaborn
- Joblib
- Volatility 3 (for memory forensics)
Anand Binu Arjun - Initial work - [AnandBinuArjun](https://github.com/AnandBinuArjun)
This project is licensed under the MIT License - see the LICENSE file for details.
- Based on research from Cluster 2: Machine Learning and Data-Driven Forensic Automation
- Inspired by the AI4DigitalForensics repository
- Utilizes FlowMeter concepts for network traffic classification
- Integrates with Volatility 3 for memory forensics
- Implements CASE standards for forensic data interoperability