AI User-Feedback Analysis

Requirements

pip install -r requirements.txt
run _create_directories.py to automatically create all required folders if they do not already exist
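
A minimal sketch of what such a directory-creation step might look like, assuming only the top-level folder names listed in the directory explanations further down. The actual _create_directories.py may differ and probably also creates the sub-folders; the create_directories helper and its base parameter are hypothetical.

```python
from pathlib import Path

# Top-level folders named in this README; sub-folders are omitted in this sketch.
REQUIRED_DIRS = [
    "datasets_raw",
    "datasets_preprocessed_LDA",
    "datasets_preprocessed_LLM",
    "results_LDA",
    "results_ChatGPT",
    "results_Gemini",
    "results_Claude",
    "results_Mistral",
    "LLM_comparison",
    "time_analysis",
    "CLM_analysis",
]


def create_directories(base="."):
    """Create each required folder under `base`; return the names that were newly created."""
    created = []
    for name in REQUIRED_DIRS:
        path = Path(base) / name
        if not path.exists():
            path.mkdir(parents=True)
            created.append(name)
    return created
```

Calling create_directories() a second time returns an empty list, because existing folders are left untouched.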

Observations:

The googletrans package depends on an older version of httpx (0.19.0).
The AI packages (e.g., mistralai, openai) require a newer version of httpx (0.28.1).

To work around this conflict, the httpx version was switched manually, depending on which functionality was needed.
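
The manual switch can be made less error-prone by checking the installed version before running a script. The sketch below is an illustration only: the two version numbers come from the observations above, while the helper names (installed_httpx_version, switch_command) are hypothetical and not part of the repository.

```python
from importlib import metadata

# Versions quoted in the observations above.
HTTPX_FOR_GOOGLETRANS = "0.19.0"
HTTPX_FOR_LLM_CLIENTS = "0.28.1"


def installed_httpx_version():
    """Return the installed httpx version string, or None if httpx is not installed."""
    try:
        return metadata.version("httpx")
    except metadata.PackageNotFoundError:
        return None


def switch_command(installed, required):
    """Return the pip command that installs `required`, or None if it is already installed."""
    if installed == required:
        return None
    return f"pip install httpx=={required}"
```

For example, switch_command(installed_httpx_version(), HTTPX_FOR_LLM_CLIENTS) gives the command to run before the LLM scripts, or None when no switch is needed.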

Pipeline (already implemented)

1. Preparation for Analysis

review scraping (datasets_creation_google.py)
review preprocessing for both LLM (datasets_preprocessing_LLMs.py) and LDA (datasets_preprocessing_LDA.py)

2. LDA code

LDA implementation ~ topic modelling + classification + coherence score (topic_modelling_LDA.py)
LDA classification interpretation (interpretation_classification_LDA.ipynb)

3. ChatGPT code

ChatGPT-4o-mini issue extraction (issue_modelling_ChatGPT.py)
ChatGPT-4o-mini evaluation ~ coherence score + cosine similarity with LDA (evaluation_ChatGPT.ipynb)
ChatGPT-4o-mini classification (classification_ChatGPT.py)
ChatGPT-4o-mini classification interpretation (interpretation_classification_ChatGPT.ipynb)
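
The evaluation notebooks compare LLM-extracted issues with LDA topics through cosine similarity. As a rough illustration of the metric itself, the sketch below computes cosine similarity between two texts represented as word-count vectors. This representation is an assumption; the notebooks may well use a different vectorisation (e.g., embeddings), and the function name is hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between two texts represented as word-count vectors."""
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    # Only words present in both texts contribute to the dot product.
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)
```

A value of 1 means the two texts use the same words in the same proportions, and 0 means they share no words.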

4. Gemini code

supports two models (1.5_pro, 2.0_flash)

Gemini issue extraction (issue_modelling_Gemini.py)
Gemini evaluation ~ coherence score + cosine similarity with LDA (evaluation_Gemini.ipynb)
Gemini classification (classification_Gemini.py)
Gemini classification interpretation (interpretation_classification_Gemini.ipynb)

5. Claude code

Claude-3.5-Sonnet issue extraction (issue_modelling_Claude.py)
Claude-3.5-Sonnet evaluation ~ coherence score + cosine similarity with LDA (evaluation_Claude.ipynb)

6. Mistral code

supports two models (large_2411, small_2501)
However, due to multiple issues encountered with small_2501, this model was used only for issue extraction and evaluation

Mistral issue extraction (issue_modelling_Mistral.py)
Mistral evaluation ~ coherence score + cosine similarity with LDA (evaluation_Mistral.ipynb)
Mistral-large-2411 classification (classification_Mistral.py)
Mistral-large-2411 classification interpretation (interpretation_classification_Mistral.ipynb)

7. LLMs comparison

issue comparison ~ hierarchical graph ~ clustered graph (LLMs_comparison_issue_modelling.py)
per-review/classification comparison ~ Jensen-Shannon Divergence (on both LLM-Specific Space and Union Space) ~ review agreement ~ Cohen's Kappa ~ Krippendorff's Alpha (LLMs_comparison_classification.py)
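
Two of the agreement measures named above can be sketched with the standard library alone. The code below is a generic illustration, not the implementation in LLMs_comparison_classification.py. It assumes that distributions are given as label-to-probability dictionaries, and it reads "Union Space" as the union of the label sets of both models, which is an interpretation rather than something this README states. Function names are hypothetical.

```python
import math
from collections import Counter


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two label -> probability dicts.

    Labels missing from one dict count as probability 0, so the comparison
    runs over the union of both label sets.
    """
    labels = set(p) | set(q)
    mix = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in labels}

    def kl(dist):
        # Kullback-Leibler divergence of `dist` from the mixture; zero terms are skipped.
        return sum(
            dist[k] * math.log2(dist[k] / mix[k])
            for k in labels
            if dist.get(k, 0.0) > 0
        )

    return 0.5 * kl(p) + 0.5 * kl(q)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long, non-empty sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance from each rater's label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)
```

With base-2 logarithms the divergence lies between 0 (identical distributions) and 1 (no shared labels), which makes heatmaps across model pairs directly comparable.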

8. Time Analysis

issue impact on star ratings and issue frequency over time (time_evolution.py)
issue + frequency forecasting with Gemini 2.0 flash (forecasting_Gemini.py)
forecasting evaluation (forecasting_evaluation.py)
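
As an illustration of the frequency-over-time idea, the sketch below turns classified reviews into per-year issue shares. The input format, a list of (year, issue) pairs with one pair per classified review, and the function name are assumptions; time_evolution.py may aggregate differently.

```python
from collections import Counter, defaultdict


def issue_frequency_by_year(reviews):
    """Map each year to {issue: share of that year's reviews assigned to the issue}."""
    counts = defaultdict(Counter)
    totals = Counter()
    for year, issue in reviews:
        counts[year][issue] += 1
        totals[year] += 1
    # Years are returned in ascending order so the result can be plotted as a timeline.
    return {
        year: {issue: n / totals[year] for issue, n in issue_counts.items()}
        for year, issue_counts in sorted(counts.items())
    }
```

Using shares rather than raw counts keeps years with very different review volumes comparable.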

9. Cumulative Link Models Analysis

effects of issues on the star ratings ~ moderating effects of years on the issue-rating relationship (CLM_evaluation.py)
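
For orientation, a cumulative link model with a moderating year term can be written in its textbook form as below. This is not necessarily the exact specification used in CLM_evaluation.py. The logit link and all symbols are assumptions: Y_i is the star rating (1 to 5) of review i, x_i indicates whether the review mentions a given issue, and t_i is the review year.

```latex
\operatorname{logit} P(Y_i \le j) = \theta_j - \left( \beta_1 x_i + \beta_2 t_i + \beta_3 \, x_i t_i \right), \qquad j = 1, \dots, 4
```

Here the \theta_j are the ordered thresholds between adjacent rating levels, \beta_1 is the effect of the issue on the rating, and \beta_3 captures how that effect changes with the year, i.e. the moderating effect.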

Supported Applications

Governmental Applications: KopieID, Reisapp, MijnOverheid, DigiD

Performed Analysis According to Application

Application	Preprocessed Data	LDA modelling	LLM issue extraction	LLM classification	LLM comparison	Time Analysis	CLM Analysis
KopieID	LDA:✔️ , LLM:✔️	✔️	GPT 4o mini, Gemini 1.5, Gemini 2.0, Mistral large, Claude	GPT 4o mini, Gemini 1.5, Gemini 2.0, Mistral large	✔️	star-issue timeline, frequency timeline, Gemini Forecasting	✔️
Reisapp	LDA:✔️ , LLM:✔️	✔️	GPT 4o mini, Gemini 1.5, Gemini 2.0, Mistral large, Claude	GPT 4o mini, Gemini 1.5, Gemini 2.0, Mistral large	✔️	star-issue timeline, frequency timeline, Gemini Forecasting	✔️
MijnOverheid	LDA:✔️ , LLM:✔️	✔️	GPT 4o mini, Gemini 1.5, Gemini 2.0, Mistral large, Claude	GPT 4o mini, Gemini 1.5, Gemini 2.0, Mistral large	✔️	star-issue timeline, frequency timeline, Gemini Forecasting	✔️
DigiD	LDA:✔️ , LLM:✔️	✔️	GPT 4o mini, Gemini 1.5, Gemini 2.0, Mistral large, Claude	GPT 4o mini, Gemini 1.5, Gemini 2.0, Mistral large	✔️	star-issue timeline, frequency timeline, Gemini Forecasting	✔️

Directory explanations

datasets_raw: contains the unprocessed reviews extracted by the Google Play scraper.
datasets_preprocessed_LDA: contains the preprocessed reviews for the LDA algorithm.
datasets_preprocessed_LLM: contains the preprocessed reviews for the LLMs.
results_LDA: contains five folders, one for the per-review distributions, one for the LDA html visualizations, one for the coherence heatmaps, one for topic + words extractions, and one for the additional plots obtained in the interpretation notebook.
results_ChatGPT: contains four folders, one for the extracted issues, one for per-review distributions, one for the coherence heatmap, and one for the additional plots obtained in the evaluation and interpretation notebooks.
results_Gemini: contains four folders, one for the extracted issues, one for per-review distributions, one for the coherence heatmap, and one for the additional plots obtained in the evaluation and interpretation notebooks.
results_Claude: contains three folders, one for the extracted issues, one for the coherence heatmap and one for the additional plots obtained in the evaluation notebook.
results_Mistral: contains four folders, one for the extracted issues, one for per-review distributions, one for the coherence heatmap, and one for the additional plots obtained in the evaluation and interpretation notebooks.
LLM_comparison: contains two folders, one for the extracted issues, in which the cluster and hierarchical graphs are saved for each app, and one for the per-review topic distributions, which holds the heatmaps for both types of JS divergence, the bar plot for the review agreement, and a plot with the three agreement metrics.
time_analysis: contains four folders, one with the plots describing the impact of the issues on the star ratings over time, one with the plots showing the frequency of each issue over time, one for the forecasted issues (and suggestions), and one for the plots used in forecasting evaluation.
CLM_analysis: contains two folders, one with the bar plots reflecting the overall issue effects on the star rating, and one with the moderating effects of years on the issue-rating relationship.

Anca-Mt/THESIS
