Balaji Adithya Dhandapani Shanmugasundaram
Feb 07, 2025
This project analyzes an A/B test conducted on an advertising campaign to assess the impact of advertisements on user conversion rates. The dataset consists of users who were either exposed to advertisements or public service announcements (PSAs). The primary objective is to determine whether advertisements lead to a significant increase in conversions compared to PSAs.
Dataset Link
Additionally, the study examines the influence of key factors such as:
- The day of the week when the maximum number of ads were displayed
- The hour of the day when the most ads were viewed
- The total number of ads seen by users
- Interaction effects between these factors
Statistical techniques, including Chi-Square tests, ANOVA, and Two-Proportion Z-Tests, are applied, along with machine learning approaches such as Random Forest, to analyze the dataset.
- Unnecessary columns such as `user id` were removed.
- Column names were renamed for clarity.
- Categorical variables (`test_group`, `most_ads_day`, `most_ads_hour`) were converted to factor type.
- The `converted` column was transformed into a binary numerical format (0 for non-conversion, 1 for conversion).
- Summary statistics were generated to understand the distribution of key variables.
- Conversion rates were analyzed across the test groups (ad vs. PSA).
- Boxplots and histograms were used to visualize total ads distribution across test groups.
- Bar charts were employed to assess the trend of ads served by day and hour.
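The cleaning and summary steps above can be sketched roughly as follows. This is a minimal illustration on simulated data, not the project's actual script: the raw column names (`user id`, `test group`, and so on) and all simulated values are assumptions, while the cleaned names follow the ones used in this README.

```r
# Simulated stand-in for the raw dataset (values and raw names are assumptions).
set.seed(42)
n <- 1000
days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")
raw <- data.frame(
  `user id`       = seq_len(n),
  `test group`    = sample(c("ad", "psa"), n, replace = TRUE, prob = c(0.9, 0.1)),
  converted       = sample(c(TRUE, FALSE), n, replace = TRUE, prob = c(0.05, 0.95)),
  `total ads`     = rpois(n, 25) + 1,
  `most ads day`  = sample(days, n, replace = TRUE),
  `most ads hour` = sample(0:23, n, replace = TRUE),
  check.names = FALSE
)

# Drop the identifier column and rename the remaining columns for clarity.
ads <- raw[, setdiff(names(raw), "user id")]
names(ads) <- c("test_group", "converted", "total_ads",
                "most_ads_day", "most_ads_hour")

# Convert categorical variables to factors and the outcome to 0/1.
ads$test_group    <- factor(ads$test_group)
ads$most_ads_day  <- factor(ads$most_ads_day, levels = days)
ads$most_ads_hour <- factor(ads$most_ads_hour)
ads$converted     <- as.integer(ads$converted)

# Summary statistics and conversion rate per test group.
print(summary(ads))
conversion_by_group <- aggregate(converted ~ test_group, data = ads, FUN = mean)
print(conversion_by_group)
```

The same `aggregate()` call with `most_ads_day` or `most_ads_hour` on the right-hand side gives the per-day and per-hour rates that the bar charts summarize.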
A Chi-Square test was performed to determine whether there is a significant association between exposure to advertisements and conversion rates.
Hypotheses:
- H₀: There is no significant difference in conversion rates between users exposed to ads and those who saw PSAs.
- H₁: There is a significant difference in conversion rates between the two groups.
The results indicated a statistically significant difference between the groups, so H₀ was rejected. Together with the higher conversion rate observed in the ad group, this supports the effectiveness of advertisements in driving conversions.
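As a rough sketch of how such a test can be run in base R, the example below builds a 2x2 table of group by outcome and passes it to `chisq.test()`, then runs `prop.test()` as the two-proportion counterpart. The data are simulated and the conversion probabilities are assumptions chosen only for illustration; they are not the project's results.

```r
# Simulated data: group labels and conversion probabilities are assumptions.
set.seed(1)
n <- 5000
test_group <- factor(sample(c("ad", "psa"), n, replace = TRUE, prob = c(0.8, 0.2)))
converted  <- rbinom(n, 1, ifelse(test_group == "ad", 0.05, 0.03))

# 2x2 contingency table: rows are test groups, columns are 0/1 outcomes.
tab <- table(test_group, converted)
print(tab)

# Chi-square test of independence between exposure and conversion.
chi <- chisq.test(tab)
print(chi)

# Two-proportion test on the same counts (conversions out of group totals).
successes <- tab[, "1"]
totals    <- rowSums(tab)
prop <- prop.test(successes, totals)
print(prop)
```

A small p-value only says that the two rates differ; the direction of the effect comes from comparing the estimated proportions in `prop$estimate`.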
A factorial ANOVA was conducted to assess the impact of multiple factors on conversion rates, including:
- `test_group` (Ad vs. PSA)
- `most_ads_day` (Day of the week)
- `most_ads_hour` (Hour of the day)
- `total_ads` (Total number of ads viewed)
Interaction effects between these variables were also evaluated.
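A model of this kind might be specified with `aov()` along the following lines. The exact formula used in the project is not shown in this README, so the one below (main effects plus a day-by-hour interaction) is an assumption, and the data are simulated.

```r
# Simulated data using the column names from this README (values are assumptions).
set.seed(7)
n <- 5000
days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")
ads <- data.frame(
  test_group    = factor(sample(c("ad", "psa"), n, replace = TRUE, prob = c(0.8, 0.2))),
  most_ads_day  = factor(sample(days, n, replace = TRUE), levels = days),
  most_ads_hour = factor(sample(0:23, n, replace = TRUE)),
  total_ads     = rpois(n, 25) + 1
)
# Conversion probability rises with ads seen and with ad exposure (assumed effect sizes).
p <- plogis(-3 + 0.02 * ads$total_ads + 0.4 * (ads$test_group == "ad"))
ads$converted <- rbinom(n, 1, p)

# Factorial ANOVA: main effects plus the day x hour interaction.
fit <- aov(converted ~ test_group + most_ads_day * most_ads_hour + total_ads,
           data = ads)
anova_table <- summary(fit)[[1]]
print(anova_table)
```

With a 0/1 outcome, this amounts to a linear probability model, so the F-tests are best read as approximate.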
Findings:
- All main effects were statistically significant.
- A strong interaction effect was observed between `most_ads_day` and `most_ads_hour`, suggesting that the effectiveness of advertisements varies depending on the time of exposure.
- The number of ads viewed (`total_ads`) was identified as the strongest predictor of conversion.
A post-hoc Tukey HSD test was performed to identify specific differences between groups for the factors most_ads_day and most_ads_hour.
Results were visualized using confidence interval plots, allowing for an assessment of which days and hours showed significant variations in conversion performance. The analysis helped pinpoint the most effective time slots for ad placement.
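The post-hoc step can be sketched with base R's `TukeyHSD()`, which takes a fitted `aov` object and returns pairwise differences with confidence intervals. The data below are simulated, and the simple two-factor model is an assumption made to keep the example short.

```r
# Simulated data (values are assumptions; column names follow this README).
set.seed(11)
n <- 3000
days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")
ads <- data.frame(
  most_ads_day  = factor(sample(days, n, replace = TRUE), levels = days),
  most_ads_hour = factor(sample(0:23, n, replace = TRUE)),
  converted     = rbinom(n, 1, 0.05)
)

# Fit the ANOVA, then compare all pairs of levels within each factor.
fit   <- aov(converted ~ most_ads_day + most_ads_hour, data = ads)
tukey <- TukeyHSD(fit, which = c("most_ads_day", "most_ads_hour"))

# Each element is a matrix with columns diff, lwr, upr, and "p adj".
print(head(tukey$most_ads_day))

# Confidence interval plots, one per factor; intervals that exclude zero
# mark pairs whose conversion rates differ significantly.
plot(tukey, las = 1)
```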
A Random Forest classification model was trained to predict conversion outcomes based on ad exposure and other variables.
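A minimal sketch of such a model using the `randomForest` package is shown below. The data are simulated, and the hyperparameters (`ntree = 200`) are assumptions rather than the settings used in the project.

```r
library(randomForest)

# Simulated data (values are assumptions; column names follow this README).
set.seed(123)
n <- 2000
days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")
ads <- data.frame(
  test_group    = factor(sample(c("ad", "psa"), n, replace = TRUE, prob = c(0.8, 0.2))),
  most_ads_day  = factor(sample(days, n, replace = TRUE), levels = days),
  most_ads_hour = factor(sample(0:23, n, replace = TRUE)),
  total_ads     = rpois(n, 25) + 1
)
p <- plogis(-3 + 0.03 * ads$total_ads + 0.4 * (ads$test_group == "ad"))
# The outcome must be a factor for randomForest to fit a classifier.
ads$converted <- factor(rbinom(n, 1, p), labels = c("no", "yes"))

# Train the classifier and request variable importance measures.
rf <- randomForest(
  converted ~ test_group + most_ads_day + most_ads_hour + total_ads,
  data = ads, ntree = 200, importance = TRUE
)

# Rank predictors by mean decrease in Gini impurity.
imp <- importance(rf)
print(imp[order(imp[, "MeanDecreaseGini"], decreasing = TRUE), ])
```

Importance rankings from a random forest describe predictive usefulness, not causal effect, so they complement rather than replace the hypothesis tests.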
Findings:
- The total number of ads viewed was the most significant predictor of conversion.
- The day and hour of ad exposure also contributed to conversion probability.
- The test group (ad vs. PSA) had a relatively lower impact on prediction compared to total ads seen.
- Users exposed to advertisements had a significantly higher conversion rate than those exposed to PSAs.
- Certain days of the week (Tuesday-Thursday, Friday) exhibited higher conversion rates compared to others.
- Conversion rates varied significantly across different hours of the day, with late mornings (10 AM - 12 PM) and evenings (6 PM - 9 PM) being the most effective periods for advertisements.
- The effectiveness of advertisements is influenced by the specific combination of day and hour.
- Some `(day, hour)` combinations showed significantly higher conversions compared to others, emphasizing the need for targeted ad scheduling.
- A higher number of ads viewed led to an increase in conversions, but beyond a certain threshold, diminishing returns were observed.
- Overexposure to advertisements may not necessarily lead to additional conversions and can lead to ad fatigue.
- Prioritize advertisements on high-converting days (Tuesday-Thursday, Friday).
- Increase ad exposure during peak conversion hours (10 AM - 12 PM, 6 PM - 9 PM).
- Reduce ad spend during low-performing hours such as early mornings and late nights.
- Implement strategies to balance ad exposure and avoid oversaturation.
- Introduce retargeting strategies for users who were exposed to ads but did not convert.
- Conduct A/B tests on ad creatives to refine messaging and design.
- Explore user segmentation to personalize ad experiences based on audience demographics and behavior.
Ensure the following R packages are installed:

```r
install.packages(c("readxl", "ggplot2", "dplyr", "tidyverse", "ggpubr", "car", "DescTools", "lmtest", "lmerTest", "randomForest"))
```

- Clone the repository:

```bash
git clone https://github.com/your-repository-name.git
cd your-repository-name
```

- Open RStudio and set the working directory to the project folder.
- Run the R script `AB_testing_analysis.R` step by step.
- Review the generated Tukey HSD plots in the output directory:
  - `tukey_Day_plot.png`
  - `tukey_hour_plot.png`
  - `tukey_interaction_plot.png`
This repository provides a structured and comprehensive approach to evaluating A/B testing results for ad campaigns. By leveraging statistical and machine learning techniques, businesses can optimize their ad strategies for maximum efficiency.
For further improvements, additional segmentation analysis, personalization strategies, and advanced machine learning models can be explored.