Repository for the Master's Thesis "Aspect-Based Sentiment Analysis of German Hospital Reviews"
Hospitals rely on patient feedback to improve healthcare services, and online reviews have become an important source of insight into patient experience. However, analyzing unstructured reviews remains challenging, especially in low-resource languages like German. This thesis presents an Aspect-Based Sentiment Analysis (ABSA) approach for German hospital reviews.
.
├── data/ # Data files
├── functions/ # Utility functions
│   ├── ate_model_train.py
│   ├── ate_model_train_OB.py
│   ├── absa_model_train.py
│   └── ate_absa_pipeline.py
├── pipeline_results/ # Pipeline output results
│
├── 1-scraping_data.ipynb # Extraction of patient reviews from klinikbewertungen.de
├── 2-data_exploration.ipynb # Exploratory Data Analysis (EDA)
├── 3-data_preparation_for_labeling.ipynb # Data preparation for labeling
├── 4-data_preprocessing.ipynb # Aspect Category Detection + preprocessing
├── 5-data_labeling.ipynb # EDA of labeled data + format conversion
├── 6-name_anonymization.ipynb # Data anonymization
├── 7-ATE_OB.ipynb # ATE training (OB-Tagging)
├── 7-ATE_OB_performance.ipynb # ATE performance analysis
├── 7-ATE.ipynb # ATE training (OBI-Tagging)
├── 8-ABSA.ipynb # ABSA training
├── 8-ABSA_performance.ipynb # ABSA performance analysis
├── 9-ATE_ABSA_pipeline.ipynb # Complete ATE+ABSA pipeline
└── requirements.txt # Python dependencies
- Data Collection & Preparation
  - 1-scraping_data.ipynb: Scrape patient reviews from klinikbewertungen.de
  - 2-data_exploration.ipynb: Initial EDA
  - 3-data_preparation_for_labeling.ipynb: Prepare data for manual labeling
- Labeling & Preprocessing
  - 4-data_preprocessing.ipynb: Aspect Category Detection + preprocessing
  - Manual labeling required between steps 4 and 5
  - 5-data_labeling.ipynb: Process labeled data
  - 6-name_anonymization.ipynb: Optional data anonymization
- Model Training
  - 7-ATE_OB.ipynb: Train Aspect Term Extraction (OB-Tagging)
  - 8-ABSA.ipynb: Train Aspect-Based Sentiment Analysis
  - Performance analysis notebooks evaluate each component
- Pipeline Implementation
  - 9-ATE_ABSA_pipeline.ipynb: Combine best models into final pipeline
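The two-stage idea behind the workflow (extract aspect terms first, then classify the sentiment towards each term) can be illustrated with a minimal sketch. It is not the code in functions/ate_absa_pipeline.py: the names `tags_to_terms`, `run_pipeline`, `toy_tagger`, and `toy_classifier` are hypothetical, the tag convention ("B" opens a term, "I" continues it, "O" is outside) is an assumption about the tagging scheme, and the two toy functions merely stand in for the fine-tuned ATE and ABSA models.

```python
from typing import Callable, List, Tuple


def tags_to_terms(tokens: List[str], tags: List[str]) -> List[str]:
    """Group token-level tags into aspect terms.

    Assumed convention (not confirmed by the repository):
    "B" opens a term, "I" continues it, "O" is outside any term.
    """
    terms: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new term starts; close the previous one if it is still open.
            if current:
                terms.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            # "O", or a stray "I" without a preceding "B".
            if current:
                terms.append(" ".join(current))
            current = []
    if current:
        terms.append(" ".join(current))
    return terms


def run_pipeline(
    tokens: List[str],
    tag_tokens: Callable[[List[str]], List[str]],
    classify_sentiment: Callable[[str, str], str],
) -> List[Tuple[str, str]]:
    """Stage 1: tag tokens and extract terms. Stage 2: classify each term."""
    tags = tag_tokens(tokens)
    sentence = " ".join(tokens)
    return [
        (term, classify_sentiment(sentence, term))
        for term in tags_to_terms(tokens, tags)
    ]


# Hypothetical stand-ins for the trained models, for illustration only.
def toy_tagger(tokens: List[str]) -> List[str]:
    lexicon = {"Personal": "B", "Essen": "B"}
    return [lexicon.get(token, "O") for token in tokens]


def toy_classifier(sentence: str, term: str) -> str:
    return "positive" if term == "Personal" else "negative"


tokens = "Das Personal war freundlich , aber das Essen war kalt .".split()
print(run_pipeline(tokens, toy_tagger, toy_classifier))
```

Passing the tagger and the classifier as plain callables keeps the sketch independent of any particular model library; in practice they would wrap the trained ATE and ABSA models.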
Python dependencies are listed in requirements.txt.
The annotated German hospital review dataset used in this work, as well as the best fine-tuned models and tokenizers, can be requested from the author.
Computations for this work (notebooks 6-9) were performed using resources of the Leipzig University Computing Center.