Map showing the files in a repository that have the most changes; full SVG image available in repo
Git-Heat-Map is a tool designed to visualize the activity within a Git repository. By analyzing commit data, it generates an interactive treemap that highlights files based on the number of changes (lines added or removed). This visualization helps in identifying hotspots in the codebase, understanding contributor activity, and tracking project evolution over time.
Follow these steps to set up and use Git-Heat-Map with your private repository:
- Clone the Repository:
  Ensure you have cloned the repository to your local machine.

      git clone /path/to/your/private/repo.git
      cd repo

- Set Up a Python Virtual Environment:
  It's recommended to use a virtual environment to manage dependencies.

      python3 -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate

- Install Required Modules:
  Install the necessary Python packages using `pip`.

      pip3 install -r requirements.txt

- Generate the Database:
  Process the Git history of your repository to generate the SQLite database.

      python3 generate_db.py /path/to/your/private/repo/

- Run the Web Server:
  Start the Flask web server to serve the heatmap visualization.

      python3 app.py

  - Alternative: You can also use Flask's CLI to run the server.

        flask run

  - To make the server reachable from other machines on your network, bind it to all interfaces:

        flask run --host=0.0.0.0

- Access the Interface:
  Open your web browser and navigate to http://127.0.0.1:5000 to access the Git-Heat-Map interface.
- Interact with the Heatmap:
  - Select Repository: Choose the repository you want to visualize from the available list.
  - Apply Filters: Add filters based on emails, commits, filenames, and date ranges to highlight specific activity.
    - Browse Buttons: Use the "Browse" buttons to view and select valid filter values.
    - Manual Input: Alternatively, input valid SQLite LIKE patterns directly.
    - Exclusion: Clicking a filter entry excludes results that match it.
  - Visualization Settings:
    - Highlighting: By default, highlight hues are determined by file extensions. This can be manually overridden as needed.
    - Performance Options: Adjust levels of text rendering and set the minimum size of boxes to optimize performance.
  - Update Visualization:
    - Submit Query: Click to apply filters and update the highlighted files.
    - Refresh: Update the highlighting hue and redraw based on the current window size.
    - Navigation: Click on directories within the heatmap to zoom in, and use the back button in the sidebar to zoom out.

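The manual filter input mentioned above accepts SQLite LIKE patterns, where `%` matches any run of characters and `_` matches exactly one. The following is a minimal sketch of how an include pattern and an exclude pattern could combine in a query. The `changes` table, its columns, and the `changed_lines` helper are hypothetical and chosen only for illustration; they are not the tool's actual schema or code.

```python
import sqlite3

# Hypothetical, simplified table: one row per (file, author email, lines changed).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE changes (file_path TEXT, author_email TEXT, lines_changed INTEGER)"
)
conn.executemany(
    "INSERT INTO changes VALUES (?, ?, ?)",
    [
        ("src/app.py", "alice@example.com", 120),
        ("src/utils.py", "bob@example.com", 40),
        ("docs/index.md", "alice@example.com", 15),
        ("tests/test_app.py", "carol@example.org", 60),
    ],
)


def changed_lines(include_file, exclude_email=None):
    """Sum lines changed per file for paths matching a LIKE pattern,
    optionally excluding authors whose email matches another pattern."""
    sql = "SELECT file_path, SUM(lines_changed) FROM changes WHERE file_path LIKE ?"
    params = [include_file]
    if exclude_email is not None:
        sql += " AND author_email NOT LIKE ?"
        params.append(exclude_email)
    sql += " GROUP BY file_path ORDER BY file_path"
    return conn.execute(sql, params).fetchall()


# Everything under src/
print(changed_lines("src/%"))
# Python files, excluding any author with an @example.com address
print(changed_lines("%.py", exclude_email="%@example.com"))
```

Passing patterns as bound parameters rather than formatting them into the SQL string keeps user-supplied filter text from being interpreted as SQL.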
The project is divided into two main components:
- Git Log → Database
  - Functionality: Processes the entire Git history of a repository using `git log` and stores the data in a structured SQLite database.
  - Database Tables:
    - Files: Tracks filenames.
    - Commits: Stores commit hashes, authors, and committers.
    - CommitFile: Associates files with commits, recording lines added and removed.
    - Author: Maintains author names and emails.
    - CommitAuthor: Links commits to authors, supporting multiple authors per commit.
  - Purpose: Enables analysis of file activity and contributor behavior within the repository.

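To make the log-to-database step concrete, here is a rough sketch that loads text shaped like `git log --numstat --format="commit %H %ae"` output into the five tables named above. The table names come from the list; every column name, the `load_log` helper, and the sample log (with shortened hashes) are assumptions for illustration and may not match what `generate_db.py` actually does.

```python
import sqlite3

# Table names follow the list above; the columns are assumed.
SCHEMA = """
CREATE TABLE Files (file_id INTEGER PRIMARY KEY, path TEXT UNIQUE);
CREATE TABLE Author (author_id INTEGER PRIMARY KEY, email TEXT UNIQUE);
CREATE TABLE Commits (hash TEXT PRIMARY KEY);
CREATE TABLE CommitAuthor (hash TEXT, author_id INTEGER);
CREATE TABLE CommitFile (hash TEXT, file_id INTEGER, added INTEGER, removed INTEGER);
"""

# Sample text: a header line per commit, then "added<TAB>removed<TAB>path" lines.
SAMPLE_LOG = (
    "commit a1b2c3 alice@example.com\n"
    "\n"
    "10\t2\tsrc/app.py\n"
    "3\t0\tREADME.md\n"
    "commit d4e5f6 bob@example.com\n"
    "\n"
    "5\t7\tsrc/app.py\n"
    "-\t-\tlogo.png\n"
)


def load_log(conn, log_text):
    """Insert commits, authors, files, and per-file line counts from log text."""
    current = None
    for line in log_text.splitlines():
        if line.startswith("commit "):
            _, current, email = line.split(" ", 2)
            conn.execute("INSERT INTO Commits VALUES (?)", (current,))
            conn.execute("INSERT OR IGNORE INTO Author (email) VALUES (?)", (email,))
            conn.execute(
                "INSERT INTO CommitAuthor SELECT ?, author_id FROM Author WHERE email = ?",
                (current, email),
            )
        elif line.strip():
            added, removed, path = line.split("\t", 2)
            if added == "-":  # numstat prints "-" for binary files
                added = removed = 0
            conn.execute("INSERT OR IGNORE INTO Files (path) VALUES (?)", (path,))
            conn.execute(
                "INSERT INTO CommitFile SELECT ?, file_id, ?, ? FROM Files WHERE path = ?",
                (current, int(added), int(removed), path),
            )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
load_log(conn, SAMPLE_LOG)

# Total lines changed per file across all commits.
totals = conn.execute(
    "SELECT path, SUM(added + removed) FROM CommitFile JOIN Files USING (file_id) "
    "GROUP BY path ORDER BY path"
).fetchall()
print(totals)
```

This sketch ignores renames, committers, and co-authors, all of which a complete importer would need to handle.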
- Database → Treemap
  - Functionality: Queries the SQLite database to generate a JSON object representing the file tree structure, then creates an interactive treemap visualization.
  - JSON Structure:

        {
          "type": "directory",
          "name": "root",
          "aggregate": 0,
          "children": [
            { "type": "file", "name": "file1.py", "data": 150 },
            {
              "type": "directory",
              "name": "subdir",
              "aggregate": 200,
              "children": [
                // Nested files or directories
              ]
            }
          ]
        }

  - Visualization: The treemap's rectangles represent files and directories, sized according to the number of line changes. Interactive features allow users to zoom in/out and apply filters to highlight specific areas of interest.

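As an illustration of the JSON structure above, the sketch below builds the nested tree from a flat mapping of file paths to line changes. It assumes that `aggregate` holds the total line changes of everything beneath a directory, which the description does not state explicitly; `build_tree` is a hypothetical helper, not a function from the project.

```python
import json


def build_tree(file_changes):
    """Turn {path: lines_changed} into a nested directory/file structure."""
    root = {"type": "directory", "name": "root", "aggregate": 0, "children": []}
    for path, changed in sorted(file_changes.items()):
        node = root
        *dirs, filename = path.split("/")
        for part in dirs:
            node["aggregate"] += changed  # each ancestor accumulates the file's changes
            child = next(
                (c for c in node["children"]
                 if c["type"] == "directory" and c["name"] == part),
                None,
            )
            if child is None:
                child = {"type": "directory", "name": part, "aggregate": 0, "children": []}
                node["children"].append(child)
            node = child
        node["aggregate"] += changed  # the file's immediate parent directory
        node["children"].append({"type": "file", "name": filename, "data": changed})
    return root


tree = build_tree({"file1.py": 150, "subdir/a.py": 120, "subdir/b.py": 80})
print(json.dumps(tree, indent=2))
```

Storing a running total on each directory means a treemap layout can size a directory's rectangle without re-walking its children.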
Performance metrics were obtained on a personal machine and may vary based on hardware and repository size.
| Repo | Number of Commits | Git Log Time | Git Log Size | Database Time | Database Size | Total Time |
|---|---|---|---|---|---|---|
| ExampleRepo | 10,000 | 2 minutes | 30MB | 25 seconds | 50MB | 2.5 minutes |
- Scaling: Time and database size scale linearly with the number of commits.
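If the linear-scaling claim holds, a rough estimate for a larger repository follows from simple proportion. The sketch below uses the ExampleRepo row as the baseline; `estimate_linear` is a hypothetical helper, and the results are extrapolations rather than measurements.

```python
def estimate_linear(baseline_commits, baseline_value, target_commits):
    """Scale a baseline measurement proportionally to the commit count."""
    return baseline_value * target_commits / baseline_commits


# Baseline from the ExampleRepo row: 10,000 commits, 2.5 minutes total, 50 MB database.
print(estimate_linear(10_000, 2.5, 40_000))  # estimated total time in minutes
print(estimate_linear(10_000, 50, 40_000))   # estimated database size in MB
```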
| Repo | Author Filter | Drawing Treemap Time | Highlighting Treemap Time |
|---|---|---|---|
| ExampleRepo | user@example.com | 1.2 seconds | 2.5 seconds |
- Note: Actual rendering times may vary based on browser performance and visualization complexity.
Note: Ensure that all paths, repository names, and other placeholders are updated to reflect your actual project details. Additionally, if you are using this tool with a private repository, handle sensitive information appropriately and restrict access as needed.