ChoopScoop 🐾: An open-source site auditor that sniffs out tags, dataLayers, and technologies to map your digital footprint.

JerushaGray/ChoopScoop

ChoopScoop: Site Auditor & Tag Detection Tool (v2.1)

ChoopScoop is a professional-grade, Playwright-powered web auditing and tag detection tool.
Developed by Jerusha Gray as part of her MarTech and Data Strategy portfolio, under IdeoPraxis Collective LLC (DBA GetFunnelCaked).


Overview

ChoopScoop automates the auditing of websites to detect analytics and marketing tags, identify underlying technologies, analyze dataLayer events, and generate structured reports.
It's designed for accuracy, transparency, and performance, and is aimed at marketing operations professionals, analysts, and engineers who want actionable insights into digital ecosystems.

Version 2.1 (MVP) focuses on stability, accuracy, and scalability, setting the foundation for future visualization and compliance modules.
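
As a rough illustration of the dataLayer analysis mentioned above, the sketch below counts named events in a captured dataLayer and separates out GA4 ecommerce events. It assumes `window.dataLayer` has already been serialized to a list of Python dicts; the function name `summarize_data_layer` and the sample entries are illustrative and are not taken from ChoopScoop's code.

```python
from collections import Counter

# A subset of GA4's recommended ecommerce event names.
GA4_ECOMMERCE_EVENTS = {
    "view_item", "view_item_list", "select_item", "add_to_cart",
    "remove_from_cart", "view_cart", "begin_checkout",
    "add_shipping_info", "add_payment_info", "purchase", "refund",
}


def summarize_data_layer(entries):
    """Count named events and separate out the ecommerce ones.

    `entries` is assumed to be window.dataLayer serialized to a list;
    pushes that are not dicts or lack a string "event" key are ignored.
    """
    events = Counter(
        e["event"] for e in entries
        if isinstance(e, dict) and isinstance(e.get("event"), str)
    )
    ecommerce = {name: n for name, n in events.items() if name in GA4_ECOMMERCE_EVENTS}
    return {"events": dict(events), "ecommerce": ecommerce}


# Hypothetical captured dataLayer for a product page.
sample = [
    {"event": "gtm.js", "gtm.start": 1700000000000},
    {"event": "view_item", "ecommerce": {"items": [{"item_id": "SKU1"}]}},
    {"event": "add_to_cart", "ecommerce": {"items": [{"item_id": "SKU1"}]}},
    {"pageType": "product"},  # no "event" key, so it is skipped
]
summary = summarize_data_layer(sample)
print(summary["ecommerce"])
```

Keeping the ecommerce event names in a plain set makes it easy to extend the list without touching the counting logic.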


Key Features

  • Modern Playwright crawler with asynchronous performance
  • Comprehensive tag detection: GA4, GTM, Facebook, LinkedIn, TikTok, Adobe, Segment, and more
  • DataLayer analysis: Automatically parses GA4 and ecommerce events
  • Performance metrics: Load time, first contentful paint, DOM timings
  • Cross-platform: Works on macOS, Windows, and Linux
  • Clean exports: JSON, CSV, and an interactive HTML dashboard
  • Resumable crawls: State management for large audits
  • Low memory footprint: Smart flush-to-disk and batch processing
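
To make the tag detection feature concrete, here is a minimal sketch of pattern-based detection over captured request URLs. The patterns target well-known loader URLs for a few of the vendors listed above, but they are illustrative only: the project's actual rules live in tag_patterns.py and may look quite different. `TAG_PATTERNS` and `detect_tags` are hypothetical names.

```python
import re

# Illustrative patterns only; the real rules in tag_patterns.py may differ.
# Each regex targets a well-known loader URL for that vendor.
TAG_PATTERNS = {
    "Google Tag Manager": re.compile(r"googletagmanager\.com/gtm\.js\?id=(GTM-[A-Z0-9]+)"),
    "GA4": re.compile(r"googletagmanager\.com/gtag/js\?id=(G-[A-Z0-9]+)"),
    "Facebook Pixel": re.compile(r"connect\.facebook\.net/[^\"'\s]*/fbevents\.js"),
    "LinkedIn Insight": re.compile(r"snap\.licdn\.com/li\.lms-analytics/insight\.min\.js"),
    "TikTok Pixel": re.compile(r"analytics\.tiktok\.com/i18n/pixel/events\.js"),
    "Segment": re.compile(r"cdn\.segment\.com/analytics\.js/v1/"),
}


def detect_tags(text):
    """Return {tag name: sorted list of IDs} for every pattern that matches.

    `text` can be page HTML or newline-joined network request URLs.
    Patterns without a capture group yield an empty ID list.
    """
    found = {}
    for name, pattern in TAG_PATTERNS.items():
        matches = list(pattern.finditer(text))
        if matches:
            ids = {m.group(1) for m in matches if pattern.groups}
            found[name] = sorted(ids)
    return found


# Hypothetical request log captured during a page load.
request_urls = "\n".join([
    "https://www.googletagmanager.com/gtm.js?id=GTM-ABC123",
    "https://www.googletagmanager.com/gtag/js?id=G-1A2B3C4D5E",
    "https://connect.facebook.net/en_US/fbevents.js",
])
detected = detect_tags(request_urls)
print(detected)
```

Matching against network requests rather than raw HTML tends to catch tags that are injected at runtime, which is one reason a browser-driven crawler is useful for this kind of audit.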

Installation

Option 1: Install from Source

git clone https://github.com/<your-handle>/choopscoop.git
cd choopscoop
pip install -r requirements_v2.txt
bash install_v2.1.sh

Option 2: Install via pip (recommended)

Once published or locally packaged:

pip install .

Post-Install Setup

After installation, run this once to install Playwright browsers (required for audits):

choopscoop setup

This step ensures the Chromium browser engine is properly configured.


Usage

Basic Example

choopscoop https://example.com

With Options

choopscoop https://example.com --max-pages 200 --max-depth 3 --format all
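
For readers curious how these flags could map onto parsed values, here is a hypothetical `argparse` sketch. Only the flag names and the `all` format value come from the command above; the defaults, the other format choices, and the help text are assumptions, not ChoopScoop's real parser.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags shown above; defaults are guesses."""
    parser = argparse.ArgumentParser(prog="choopscoop")
    parser.add_argument("url", help="Start URL for the crawl")
    parser.add_argument("--max-pages", type=int, default=100,
                        help="Stop after this many pages (default is assumed)")
    parser.add_argument("--max-depth", type=int, default=2,
                        help="Maximum link depth from the start URL (default is assumed)")
    parser.add_argument("--format", choices=["json", "csv", "html", "all"], default="all",
                        help="Which report(s) to write")
    return parser


# Parse the same arguments as the "With Options" example.
args = build_parser().parse_args(
    ["https://example.com", "--max-pages", "200", "--max-depth", "3", "--format", "all"]
)
print(args.url, args.max_pages, args.max_depth, args.format)
```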

From Config

You can also define settings in config.yaml for reusable crawl parameters.
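
One plausible way to combine config.yaml with command-line flags is a simple precedence merge, sketched below. The key names, the default values, and the precedence order (CLI over file over defaults) are all assumptions; check config.yaml in the repository for the real schema. The `file_config` dict stands in for the parsed file, such as the result of PyYAML's `yaml.safe_load`.

```python
# Assumed defaults; the real project may use different keys and values.
DEFAULTS = {"max_pages": 100, "max_depth": 2, "format": "all"}


def resolve_settings(file_config, cli_overrides):
    """Merge settings so CLI flags beat config.yaml, which beats defaults.

    `file_config` stands in for the parsed contents of config.yaml.
    CLI values of None mean "flag not given" and are ignored.
    """
    settings = dict(DEFAULTS)
    settings.update(file_config or {})
    settings.update({k: v for k, v in cli_overrides.items() if v is not None})
    return settings


# Hypothetical inputs: the file sets two values, the CLI overrides one.
file_config = {"max_pages": 500, "max_depth": 4}
cli = {"max_pages": None, "max_depth": 3, "format": None}
settings = resolve_settings(file_config, cli)
print(settings)
```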


Outputs

ChoopScoop automatically generates three export formats:

Format  File             Description
JSON    site-audit.json  Full crawl data including tags, technologies, and metrics
CSV     site-audit.csv   Summarized audit metrics
HTML    site-audit.html  Interactive dashboard for visual review
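
The split between a full JSON record and a summarized CSV can be sketched with the standard library alone. The file names match the table above, but the record fields (`url`, `tags`, `load_time_ms`) and the `write_reports` helper are hypothetical; the real export schema may differ.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_reports(pages, out_dir):
    """Write full records to JSON and a one-row-per-page summary to CSV."""
    out_dir = Path(out_dir)
    json_path = out_dir / "site-audit.json"
    csv_path = out_dir / "site-audit.csv"

    # JSON keeps every field of every page record.
    json_path.write_text(json.dumps({"pages": pages}, indent=2), encoding="utf-8")

    # CSV keeps only a few summary columns per page.
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["url", "tag_count", "load_time_ms"])
        for page in pages:
            writer.writerow([page["url"], len(page["tags"]), page["load_time_ms"]])
    return json_path, csv_path


# Hypothetical page records; the real schema may differ.
pages = [
    {"url": "https://example.com/", "tags": ["GA4", "Google Tag Manager"], "load_time_ms": 812},
    {"url": "https://example.com/about", "tags": [], "load_time_ms": 455},
]
with tempfile.TemporaryDirectory() as tmp:
    json_path, csv_path = write_reports(pages, tmp)
    csv_text = csv_path.read_text(encoding="utf-8")
    reloaded = json.loads(json_path.read_text(encoding="utf-8"))
print(csv_text)
```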

πŸ“ Project Structure

choopscoop/
β”œβ”€β”€ choopscoop_site_auditor_v2_1.py
β”œβ”€β”€ tag_patterns.py
β”œβ”€β”€ config.yaml
β”œβ”€β”€ requirements_v2.txt
β”œβ”€β”€ install_v2.1.sh
β”œβ”€β”€ LICENSE.md
β”œβ”€β”€ ROADMAP.md
β”œβ”€β”€ CONTRIBUTING.md
└── docs/
    β”œβ”€β”€ DELIVERY-SUMMARY-v2.1.md
    β”œβ”€β”€ PATCH-NOTES-v2.1.md
    β”œβ”€β”€ VERSION-COMPARISON.md
    └── QUICK-START-v2.1.md

Roadmap

ChoopScoop evolves thoughtfully; see ROADMAP.md for planned enhancements and version milestones.


Contributing

This project is maintained as a personal portfolio artifact.
However, it follows open documentation and structure standards to support long-term maintainability.
See CONTRIBUTING.md for details on project principles and conventions.


Author

Jerusha Gray
Marketing Operations, MarTech & Data Strategy
IdeoPraxis Collective LLC (DBA GetFunnelCaked)


🪪 License

Licensed under the MIT License.
© 2025 IdeoPraxis Collective LLC (DBA GetFunnelCaked)
