Longitudinal Data Merger is a Streamlit-based application designed to help researchers merge multiple longitudinal datasets with ease. The tool supports both wide (horizontal) and long (vertical) merge formats, enabling you to combine datasets from different waves or time points with flexible join options and comprehensive data cleaning and analysis features.
The Longitudinal Data Merger app allows you to:
- Upload Datasets: Supports CSV, Excel (.xlsx), and SPSS (.sav) files.
- Assign Waves: Easily assign a wave number to each dataset.
- Select a Primary Key: Choose a unique identifier (primary key) to ensure correct merging.
- Merge in Two Formats:
  - Wide Merge: Merge datasets horizontally with customizable join types (inner, left, right, outer).
  - Long Merge: Merge datasets vertically while retaining a 'Wave' identifier for each observation.
- Data Cleaning: Remove duplicate entries, fill missing values, and inspect missing data through summaries and visualizations.
- Preview & Download: View merged dataset previews, summary statistics, and download the final merged dataset.
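To make the two merge formats concrete, here is a minimal pandas sketch. It is not the app's actual code: the `ID` key, the `score` column, the `_w1`/`_w2` column suffixes, and the helper names `wide_merge` and `long_merge` are all assumptions chosen for illustration.

```python
from functools import reduce

import pandas as pd

# Hypothetical two-wave example; "ID" and "score" are illustrative names only.
wave1 = pd.DataFrame({"ID": [1, 2, 3], "score": [10, 20, 30]})
wave2 = pd.DataFrame({"ID": [2, 3, 4], "score": [21, 31, 41]})
waves = {1: wave1, 2: wave2}  # wave number -> dataset


def wide_merge(waves, key, how="outer"):
    """Merge horizontally: one row per key, columns suffixed with the wave number."""
    renamed = [
        # The "_w<wave>" suffix is an assumed naming scheme to avoid column clashes.
        df.rename(columns={c: f"{c}_w{wave}" for c in df.columns if c != key})
        for wave, df in sorted(waves.items())
    ]
    return reduce(lambda left, right: left.merge(right, on=key, how=how), renamed)


def long_merge(waves):
    """Merge vertically: stack all rows and record each row's wave in 'Wave'."""
    stacked = [df.assign(Wave=wave) for wave, df in sorted(waves.items())]
    return pd.concat(stacked, ignore_index=True)


wide = wide_merge(waves, key="ID", how="inner")  # only IDs present in both waves
long = long_merge(waves)                         # every observation, tagged by wave
```

With `how="inner"` the wide result keeps only respondents present in every wave, while `how="outer"` keeps everyone and leaves gaps as missing values. The long result always keeps every row.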
- Supported File Formats:
  - CSV, Excel (.xlsx), and SPSS (.sav)
- Wave Assignment:
  - Assign wave numbers to datasets via the sidebar.
- Primary Key Selection:
  - Choose a unique identifier that exists across your datasets.
- Merge Options:
  - Wide (horizontal) merge with various join types.
  - Long (vertical) merge that retains a wave identifier.
- Data Cleaning & Duplicate Removal:
  - Automatically strips extra spaces from column names and filters datasets based on specific criteria (e.g., Status and Finished columns).
- Missing Values Analysis:
  - Get both a summary table and a heatmap visualization of missing data.
- Session Persistence:
  - Work is preserved throughout your session.
- Downloadable Results:
  - Easily download the merged dataset in CSV format.
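The cleaning and missing-values features can be sketched roughly as follows. This is an illustration under stated assumptions, not the app's implementation: the filter values (`Status == 0`, `Finished == 1`) are guesses at what "specific criteria" might mean, and `clean` and `missing_summary` are hypothetical helper names.

```python
import pandas as pd

# Hypothetical export with padded column names; the Status and Finished
# values used below are assumptions, not the app's documented rules.
raw = pd.DataFrame(
    {
        " ID ": [1, 2, 2, 3],
        "Status ": [0, 0, 0, 1],
        " Finished": [1, 1, 1, 1],
        "score": [10.0, None, None, 30.0],
    }
)


def clean(df, key="ID"):
    """Strip column-name whitespace, filter rows, and drop duplicate keys."""
    out = df.copy()
    out.columns = out.columns.str.strip()
    # Assumed criteria: keep rows with Status == 0 and Finished == 1.
    if {"Status", "Finished"}.issubset(out.columns):
        out = out[(out["Status"] == 0) & (out["Finished"] == 1)]
    return out.drop_duplicates(subset=key).reset_index(drop=True)


def missing_summary(df):
    """Return the count and percentage of missing values per column."""
    counts = df.isna().sum()
    return pd.DataFrame({"missing": counts, "percent": 100 * counts / len(df)})


cleaned = clean(raw)
summary = missing_summary(cleaned)
```

The boolean table returned by `cleaned.isna()` is the kind of matrix a missing-data heatmap would visualize, with one cell per row and column.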
- Clone the Repository
    git clone https://github.com/yourusername/longitudinal-data-merger.git
    cd longitudinal-data-merger
- Create a Virtual Environment (Optional but Recommended)
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install Required Dependencies
  The dependencies are listed in the requirements.txt file. Install them using:
    pip install -r requirements.txt
- Run the Streamlit App
  Launch the application with:
    streamlit run main.py
- Follow the On-Screen Instructions
  - Step 1: Upload Datasets
    Use the sidebar to upload your datasets (CSV, Excel, or SPSS files).
  - Step 2: Assign Wave Numbers
    Assign a wave number to each dataset using the provided number inputs.
  - Step 3: Preview Uploaded Datasets
    Expand the "View Uploaded Datasets" section to see a preview of your data.
  - Step 4: Select Primary Key
    Choose a unique case identifier that is present in all datasets.
  - Step 5: Configure Merge Options
    - Wide Merge: Choose the join type (inner, left, right, outer) to merge datasets horizontally.
    - Long Merge: Merge datasets vertically while retaining wave information.
  - Step 6: Data Cleaning Options
    Optionally fill missing values across datasets.
  - Step 7: Merge Datasets
    Click the "Merge Datasets" button to perform the merge. The app will display summary statistics, a preview of the merged dataset, and a missing values analysis.
- Download the Merged Dataset
  Once the merge is complete, download the merged dataset as a CSV file directly from the app.
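The upload and download steps above can be approximated outside the app with a small sketch. The helper names `load_dataset` and `to_csv_bytes` are hypothetical, and how main.py actually reads and exports files is not documented here; the sketch only shows one plausible way to dispatch on the three supported extensions with pandas.

```python
import io
from pathlib import Path

import pandas as pd


def load_dataset(source, filename):
    """Read a CSV, Excel, or SPSS file into a DataFrame based on its extension."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(source)
    if suffix == ".xlsx":
        return pd.read_excel(source)  # needs an Excel engine such as openpyxl
    if suffix == ".sav":
        return pd.read_spss(source)  # needs pyreadstat; expects a file path
    raise ValueError(f"Unsupported file type: {suffix}")


def to_csv_bytes(df):
    """Encode a DataFrame as UTF-8 CSV bytes, e.g. for a download button."""
    return df.to_csv(index=False).encode("utf-8")


# Round trip with an in-memory CSV standing in for an uploaded file.
uploaded = io.StringIO("ID,score\n1,10\n2,20\n")
df = load_dataset(uploaded, "wave1.csv")
payload = to_csv_bytes(df)
```

In a Streamlit app, bytes like `payload` are the kind of value that can be passed as the `data` argument of `st.download_button` with `mime="text/csv"`.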
.
├── main.py # Main Streamlit application code
├── requirements.txt # List of required Python packages
└── README.md # This file
Contributions are welcome! If you have suggestions, bug fixes, or improvements:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Commit your changes.
- Submit a pull request detailing your modifications.
This project is open-source and available under the MIT License.
Gabriele Di Cicco, PhD in Social Psychology
GitHub | ORCID | LinkedIn
Happy Merging!