A personal analytics system for exploring and analyzing video game data — inspired by Letterboxd, but for gamers.
| Member | Role | Status |
|---|---|---|
| Zakharii Furmanets | Data Engineer | ✅ COMPLETED |
| Adam Pabianiak | Data Analyst | 🔄 READY TO START |
- RAWG API data fetching
- JSON → CSV transformation
- SQLite database schema & loading
- End-to-end testing
- Documentation
- Exploratory data analysis
- SQL queries & insights
- Data visualizations
- Final report & presentation
- Python 3.8+
- Git
# Clone repository
git clone https://github.com/yourname/gamebase.git
cd gamebase
# Create virtual environment
python -m venv venv
# Activate virtual environment
# macOS/Linux:
source venv/bin/activate
# Windows:
venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt- Go to https://rawg.io/apidocs
- Register and get your free API key
- Create
.envfile in project root:
RAWG_API_KEY=your_api_key_hereSince raw data is not stored in Git, you need to fetch it first:
# 1. Fetch raw data from API (takes ~30 seconds)
python src/fetch/fetch_games.py
# 2. Transform JSON to CSV
python src/transform/transform_to_csv.py
# 3. Create database schema
python src/database/database_schema.py
# 4. Load data into SQLite database
python src/database/load_csv_to_db.py
# 5. Test everything works
python src/test_pipeline.pyExpected output:
✅ Data fetch test PASSED: 3 files, 120 total games
✅ Data transform test PASSED
✅ Database schema test PASSED
✅ Database loading test PASSED
✅ Data integrity test PASSED
🎉 ALL TESTS PASSED - Pipeline is working correctly!
Your data is now ready! Check:
- Database:
db/games.db(120 games) - CSV files:
data/transformed/ - Analysis guide:
ANALYST_GUIDE.md
gamebase/
├── src/
│ ├── fetch/ # 🔧 Data fetching (Zakharii)
│ │ └── fetch_games.py # RAWG API data fetcher
│ ├── transform/ # 🔧 Data transformation (Zakharii)
│ │ └── transform_to_csv.py # JSON to CSV converter
│ ├── database/ # 🔧 Database setup (Zakharii)
│ │ ├── database_schema.py # SQLite schema creation
│ │ └── load_csv_to_db.py # CSV to database loader
│ ├── analysis/ # 📊 Your analysis scripts (Adam)
│ │ └── README.md # Analysis guidelines
│ └── test_pipeline.py # 🧪 End-to-end testing
├── data/ # 📁 Data files (not in Git)
│ ├── raw/ # JSON files from API
│ └── transformed/ # Clean CSV files
├── db/ # 💾 Database (not in Git)
│ └── games.db # SQLite database
├── results/ # 📈 Analysis outputs (not in Git)
│ ├── exports/ # Data exports
│ ├── charts/ # Visualizations
│ └── reports/ # Final reports
├── notebooks/ # 📓 Jupyter notebooks (Adam)
├── docs/ # 📚 Documentation
├── .env # 🔐 API keys (not in Git)
├── .gitignore # Git ignore rules
├── requirements.txt # Python dependencies
├── README.md # This file
├── ANALYST_GUIDE.md # Detailed guide for Adam
└── project_plan.md # Original project plan
games- Main game information (120 records)- Rating, release date, metacritic score, playtime
- Categorized ratings and popularity
genres- Game genres (17 unique)platforms- Gaming platforms (11 unique)stores- Digital stores (9 unique)tags- Game tags (182 unique)
game_genres- Games ↔ Genres (329 relationships)game_platforms- Games ↔ Platforms (525 relationships)game_stores- Games ↔ Stores (383 relationships)game_tags- Games ↔ Tags (1162 relationships)game_ratings_detail- Detailed rating breakdown (480 records)
All database tables are also available as CSV files in data/transformed/:
games.csv,game_genres.csv,platforms_lookup.csv, etc.
# Open database
sqlite3 db/games.db
# Example queries
.tables
SELECT name, rating, primary_genre FROM games ORDER BY rating DESC LIMIT 10;
SELECT genre_name, COUNT(*) FROM game_genres GROUP BY genre_name;
.quitimport pandas as pd
import sqlite3
# From database
conn = sqlite3.connect('db/games.db')
games = pd.read_sql('SELECT * FROM games', conn)
# From CSV
games = pd.read_csv('data/transformed/games.csv')- DB Browser for SQLite - https://sqlitebrowser.org/
- DBeaver - Universal database tool
- Any tool that supports SQLite
- What's the average game rating across all games?
- Which year had the most game releases?
- What are the most popular genres and platforms?
- Distribution of ESRB ratings?
- How do ratings correlate with metacritic scores?
- Rating trends over the years (2020-2024)
- Which platforms have the highest-rated games?
- Genre popularity by release year
- Most successful combinations (genre + platform)
- Rating distribution histogram
- Games by release year
- Genre popularity comparison
- Platform market share
- Rating vs Metacritic scatter plot
- Top 20 highest-rated games
# Modify fetching parameters if needed
vim src/fetch/fetch_games.py
# Re-run pipeline
python src/fetch/fetch_games.py
python src/transform/transform_to_csv.py
python src/database/load_csv_to_db.py
python src/test_pipeline.py# Start analysis
cd src/analysis/
python your_analysis_script.py
# Or use Jupyter
jupyter notebook
# Save results
# Charts → results/charts/
# Reports → results/reports/
# Data exports → results/exports/Based on project_plan.md, your tasks:
- Define relevant analytical questions
- Write SQL queries to answer them
- Export fixed tables to
/results/exports/
- Create pivot tables (e.g. genre vs rating)
- Generate visualizations with plotly/seaborn
- (Optional) Connect Google Data Studio or Tableau
- Write final report & documentation
- Create presentation (slides, charts, findings)
ANALYST_GUIDE.md- Detailed guide for data analysisproject_plan.md- Original project roadmapsrc/analysis/README.md- Analysis scripts guidelinesresults/reports/README.md- Report templates
pandas>=2.1.4
requests>=2.31.0
python-dotenv>=1.0.0
sqlite3 (built-in)
matplotlib>=3.8.0
seaborn>=0.12.2
plotly>=5.17.0
jupyter>=1.0.0RAWG API → JSON files → CSV files → SQLite database
↓ ↓ ↓ ↓
fetch_games.py → transform_to_csv.py → load_csv_to_db.py
- Run
python src/test_pipeline.pyto verify data integrity - Validates entire ETL pipeline end-to-end
- Checks for orphaned records and referential integrity
"No JSON files found"
# Make sure you ran the fetcher first
python src/fetch/fetch_games.py"Database not found"
# Create database schema first
python src/database/database_schema.py"API key error"
# Check your .env file exists and has correct key
cat .env"Permission denied"
# Make sure virtual environment is activated
source venv/bin/activate # macOS/Linux
venv\Scripts\activate # Windows- Data pipeline issues: Contact Zakharii
- API problems: Check RAWG API documentation
- Analysis questions: See
ANALYST_GUIDE.md
- Zakharii Furmanets - Data Engineer (Pipeline & Infrastructure)
- Adam Pabianiak - Data Analyst (Analysis & Insights)
Adam, follow these steps:
- ✅ Complete the "Quick Start" section above
- ✅ Verify data is generated with
python src/test_pipeline.py - ✅ Read
ANALYST_GUIDE.mdfor detailed analysis instructions - ✅ Start exploring the data!
Good luck with the analysis! 🎮📊