ETL Pipeline

Overview

This project is an ETL (Extract, Transform, Load) pipeline that retrieves data from two websites, counts keyword occurrences, and loads the results into a PostgreSQL database.

Features

Data Extraction: Scrapes data from https://www.ft.com/ and https://www.theguardian.com/europe.
Data Transformation: Counts occurrences of the keywords "election", "war", and "economy" in the extracted data.
Data Loading: Inserts the processed data into a PostgreSQL database.
Containerization: Runs the services in Docker containers managed with Docker Compose.
Scheduling: Uses Apache Airflow to schedule and manage the pipeline runs.
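
The transformation step listed above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: only the three keywords come from this README, while `count_keywords`, the whole-word matching rule, and the sample text are assumptions made for the example.

```python
import re
from collections import Counter

# The three keywords come from the Features list; everything else is illustrative.
KEYWORDS = ("election", "war", "economy")


def count_keywords(text, keywords=KEYWORDS):
    """Count whole-word, case-insensitive occurrences of each keyword."""
    # Split the lowercased text into alphabetic words so "warn" is not counted as "war".
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    return {kw: counts[kw] for kw in keywords}


# Hypothetical sample text standing in for scraped page content.
sample = "War fears weigh on the economy as the election nears. Economy watchers warn."
print(count_keywords(sample))
```

Matching whole words is one possible design choice; a substring match would also count words such as "warning" or "economic", which may or may not be what the pipeline intends.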

Project Structure

app
dags
database
grafana
README.md
docker-compose.yml
requirements.txt

Installation

Install the following prerequisites:

Docker
Docker Compose

Setup

Clone the repository:

git clone https://github.com/AnetteTaivere/ETL_pipeline.git
cd ETL_pipeline

Build and start the Docker containers:
```
docker compose up -d
```
Verify that the containers are running:
```
docker compose ps -a
```
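
If you would rather check container status from a script, the output of `docker compose ps -a --format json` can be parsed. The sketch below is an assumption-laden example: `services_not_running` is a hypothetical helper, the service names in the sample are made up, and it relies on the `Service` and `State` fields that Docker Compose reports in its JSON output.

```python
import json


def services_not_running(ps_output):
    """Return names of services whose State is not 'running'.

    Accepts either a JSON array or one JSON object per line, since the
    output shape differs between Docker Compose versions.
    """
    text = ps_output.strip()
    if not text:
        return []
    if text.startswith("["):
        entries = json.loads(text)
    else:
        entries = [json.loads(line) for line in text.splitlines() if line.strip()]
    return [e.get("Service", "?") for e in entries if e.get("State") != "running"]


# Hypothetical sample output; the service names are made up for illustration.
sample = "\n".join([
    '{"Service": "webserver", "State": "running"}',
    '{"Service": "postgres", "State": "exited"}',
])
print(services_not_running(sample))  # ['postgres']
```

An empty list means every listed container reports the `running` state.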

Configuration

Generate a FERNET_KEY value:

python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"

Set the generated key as the value of AIRFLOW__CORE__FERNET_KEY.

You also need to create an Airflow user. Find the webserver's Docker container ID and open a shell in the container:

docker exec -it <CONTAINER_ID> /bin/bash

Inside the container, create a new user with the following command:

airflow users create --username admin --password admin --firstname First --lastname Last --role Admin --email admin@example.com
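
For the Fernet key step, it can help to know what a valid key looks like. A Fernet key is 32 random bytes encoded as URL-safe base64, so a key in the same format can be produced and sanity-checked with the standard library alone. The following is a sketch under that assumption; `generate_fernet_key` and `looks_like_fernet_key` are hypothetical helper names, and the `cryptography` command shown in this section remains the documented way to generate the key.

```python
import base64
import os


def generate_fernet_key():
    """Return a URL-safe base64 encoding of 32 random bytes, as a str."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode()


def looks_like_fernet_key(value):
    """Check that a string decodes to exactly 32 bytes of URL-safe base64."""
    try:
        return len(base64.urlsafe_b64decode(value.encode())) == 32
    except ValueError:
        # binascii.Error (bad padding or length) is a subclass of ValueError.
        return False


key = generate_fernet_key()
print(len(key), looks_like_fernet_key(key))
```

A quick check like this can catch a truncated or mistyped value before it is assigned to AIRFLOW__CORE__FERNET_KEY.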

Usage

Airflow - http://localhost:8080

Log in with the user created during configuration.
Check that two DAGs are running. Depending on the day, main.py is scheduled to run at the next available hour or the hour after.
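
The phrase "the next available hour or the hour after" can be made concrete with a small calculation. This only illustrates the two candidate run times for a given moment; how the DAGs choose between them is not described here, and `next_full_hour` is a hypothetical helper rather than part of the project.

```python
from datetime import datetime, timedelta


def next_full_hour(now):
    """Return the first top-of-the-hour timestamp strictly after `now`."""
    return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)


# Hypothetical "current time" used only for the example.
now = datetime(2024, 1, 1, 14, 25)
print(next_full_hour(now))                       # 2024-01-01 15:00:00
print(next_full_hour(now) + timedelta(hours=1))  # 2024-01-01 16:00:00
```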

Grafana - http://localhost:3000

Default login: admin/admin
If prompted to set a new password, skip this step.
Open Dashboards -> keywords_dashboard
View the data (available only after at least one of the DAGs has run).
