A containerized ETL pipeline that joins FDA drug, establishment, and inspection data to create an early-warning list for potential drug shortages.
This repository contains the code for my B.S. in Data Analytics capstone project.
The project links multiple FDA datasets to identify facilities and products that may contribute to future drug shortages.
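The kind of join described above can be illustrated with a small standard-library sketch. This is not the project's actual pipeline code. The record shapes, the use of the FEI (FDA Establishment Identifier) as the join key, and the rule that an "OAI" (Official Action Indicated) inspection flags a product are all illustrative assumptions. The real FDA files have many more fields and different column names.

```python
from dataclasses import dataclass

# Hypothetical, simplified record shapes (assumption: not the real FDA schemas).
@dataclass(frozen=True)
class Drug:
    ndc: str    # product identifier
    name: str
    fei: str    # establishment that manufactures the product

@dataclass(frozen=True)
class Establishment:
    fei: str
    firm_name: str

@dataclass(frozen=True)
class Inspection:
    fei: str
    classification: str  # e.g. "NAI", "VAI", "OAI"

def early_warning_list(drugs, establishments, inspections, risky=frozenset({"OAI"})):
    """Return sorted (drug name, firm name) pairs whose establishment has a risky inspection.

    The choice of which classifications count as "risky" is an assumption.
    """
    firms = {e.fei: e.firm_name for e in establishments}
    flagged = {i.fei for i in inspections if i.classification in risky}
    rows = []
    for d in drugs:
        # Inner join: keep drugs whose establishment is known AND was flagged.
        if d.fei in firms and d.fei in flagged:
            rows.append((d.name, firms[d.fei]))
    return sorted(rows)
```

With made-up records, a drug is flagged only when its establishment exists in the establishment data and has an OAI inspection; a drug whose establishment is missing from that data drops out of the inner join.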
A companion bot, DataResearchBot/1.0, collects publicly available FDA data. The bot:
- Respects all robots.txt guidelines
- Follows crawl delays defined by each site
- Accesses only publicly available data sources
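The crawling rules above can be checked with Python's standard-library `urllib.robotparser`. The sketch below is a hypothetical helper, not the bot's actual implementation. `plan_fetch` returns `None` when robots.txt disallows a URL for `DataResearchBot/1.0`; otherwise it returns the crawl delay in seconds to wait before the request, falling back to an assumed 1-second default when robots.txt sets none.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "DataResearchBot/1.0"

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp

def plan_fetch(rp: RobotFileParser, url: str, default_delay: float = 1.0):
    """Return None if url is disallowed, else the delay (seconds) to wait first.

    default_delay is an assumption used only when robots.txt has no Crawl-delay.
    """
    if not rp.can_fetch(USER_AGENT, url):
        return None
    delay = rp.crawl_delay(USER_AGENT)
    return float(delay) if delay is not None else default_delay
```

In a real crawl, the robots.txt text would be fetched from each site (for example with `RobotFileParser.set_url` and `read`), and the caller would `time.sleep` for the returned delay before each request.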
If you have any questions or feedback, please open an issue in this repository.
© 2025 Dan Thomson