This project implements an anomaly detection solution for identifying unusual patterns in fishing vessel movements using AWS services and Terraform.
The solution analyzes vessel movement data (positions, speeds, courses) to identify vessels exhibiting anomalous behavior patterns. It uses Random Cut Forest, an unsupervised machine learning algorithm, to detect outliers in aggregated vessel metrics.
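To give a feel for how random-cut trees separate outliers, here is a minimal pure-Python sketch. It is not the SageMaker implementation: it picks cut dimensions in proportion to their range, as Random Cut Forest does, but scores a point by its average isolation depth (an isolation-forest-style simplification) rather than by RCF's displacement-based score. The two features (mean speed, mean course change) and all values are made up for illustration.

```python
import random


def build_tree(points, rng, depth=0, max_depth=12):
    """Recursively split points with random axis-aligned cuts."""
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    ranges = [max(p[d] for p in points) - lows[d] for d in range(dims)]
    if sum(ranges) == 0:  # all points identical, nothing left to cut
        return {"size": len(points)}
    # Choose the cut dimension with probability proportional to its range.
    dim = rng.choices(range(dims), weights=ranges)[0]
    cut = rng.uniform(lows[dim], lows[dim] + ranges[dim])
    left = [p for p in points if p[dim] <= cut]
    right = [p for p in points if p[dim] > cut]
    if not left or not right:  # degenerate cut on the boundary
        return {"size": len(points)}
    return {
        "dim": dim,
        "cut": cut,
        "left": build_tree(left, rng, depth + 1, max_depth),
        "right": build_tree(right, rng, depth + 1, max_depth),
    }


def depth_of(tree, point, depth=0):
    """Number of cuts needed before the point lands in a leaf."""
    if "dim" not in tree:
        return depth
    branch = "left" if point[tree["dim"]] <= tree["cut"] else "right"
    return depth_of(tree[branch], point, depth + 1)


def mean_depth(points, point, n_trees=50, sample_size=256, seed=0):
    """Average depth over several trees; shallower means easier to isolate."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trees):
        sample = rng.sample(points, min(sample_size, len(points)))
        total += depth_of(build_tree(sample, rng), point)
    return total / n_trees


# Hypothetical per-vessel features: (mean speed, mean course change).
data_rng = random.Random(42)
vessels = [(data_rng.gauss(5.0, 1.0), data_rng.gauss(10.0, 3.0)) for _ in range(100)]
vessels.append((40.0, 170.0))  # one vessel far from the rest

typical = mean_depth(vessels, (5.0, 10.0))
unusual = mean_depth(vessels, (40.0, 170.0))
print(f"typical vessel depth: {typical:.2f}, unusual vessel depth: {unusual:.2f}")
```

The unusual vessel is cut away from the cluster after very few splits, so its average depth is much lower than that of a point in the middle of the cluster.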
- Data Storage: Raw vessel tracking data stored in S3
- Data Processing: AWS Glue ETL job aggregates data by vessel (mmsi)
- Machine Learning: Amazon SageMaker trains and hosts a Random Cut Forest model
- Deployment: All infrastructure managed with Terraform
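The per-vessel aggregation step can be sketched locally in plain Python. This is not the project's Glue script (which is not reproduced here); the column names `mmsi` and `speed`, the sample rows, and the chosen output metrics are assumptions for illustration.

```python
import csv
import io
from collections import defaultdict
from statistics import mean, pstdev


def aggregate_by_vessel(rows):
    """Group position reports by mmsi and compute simple speed metrics."""
    speeds = defaultdict(list)
    for row in rows:
        speeds[row["mmsi"]].append(float(row["speed"]))
    return {
        mmsi: {
            "points": len(values),          # number of position reports
            "mean_speed": mean(values),     # average reported speed
            "speed_std": pstdev(values),    # population standard deviation
        }
        for mmsi, values in speeds.items()
    }


# Tiny made-up sample; real input would be the raw CSV uploaded to S3.
sample = io.StringIO(
    "mmsi,speed,course\n"
    "111,2.0,90\n"
    "111,4.0,95\n"
    "222,10.0,180\n"
)
features = aggregate_by_vessel(csv.DictReader(sample))
for mmsi, metrics in features.items():
    print(mmsi, metrics)
```

Each vessel ends up as one row of numeric features, which is the shape an outlier detector such as Random Cut Forest expects.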
The project uses AIS (Automatic Identification System) fishing vessel data from Global Fishing Watch. For detailed information about the dataset, see DATA.md.
- Clone this repository
- Apply Terraform configuration

      cd terraform
      terraform init
      terraform apply

- Upload vessel data to S3

      # Upload data to the raw/ folder in the S3 bucket
      aws s3 cp drifting_longlines.csv s3://fishing-anomaly-detection-1744553260/raw/
- S3 Bucket: Stores raw data and processed results
- Glue ETL Job: Aggregates vessel data by mmsi
- SageMaker: Trains a Random Cut Forest model on aggregated data
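Once the model has produced an anomaly score per vessel, the scores still need a cutoff. A common heuristic is to flag scores more than three standard deviations above the mean. The sketch below assumes that heuristic; the threshold, the vessel identifiers, and the scores are illustrative and not taken from this project.

```python
from statistics import mean, pstdev


def flag_anomalies(scores, n_sigma=3.0):
    """Return the ids whose score exceeds mean + n_sigma * std, sorted."""
    values = list(scores.values())
    cutoff = mean(values) + n_sigma * pstdev(values)
    return sorted(vessel for vessel, score in scores.items() if score > cutoff)


# Made-up scores: twenty ordinary vessels and one with a much higher score.
scores = {f"mmsi_{i:03d}": 1.0 + 0.01 * (i % 3) for i in range(20)}
scores["mmsi_999"] = 5.0

flagged = flag_anomalies(scores)
print(flagged)
```

With very few vessels a single outlier inflates the standard deviation enough to hide itself, so a fixed-sigma cutoff like this is only meaningful on a reasonably large fleet.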
.
├── DATA.md # Dataset information
├── README.md # This file
├── glue/
│   └── longline_etl.py # Vessel data aggregation script
└── terraform/ # Infrastructure as code
    ├── glue.tf # AWS Glue resources
    ├── main.tf # Core infrastructure
    ├── outputs.tf # Output values
    ├── scripts/ # SageMaker training scripts
    └── variables.tf # Configuration variables