Individual Assignment 1
Developing a Customer Segmentation Strategy for Targeted Marketing
1.0 Introduction
You work for a retail company aiming to improve its marketing strategy by segmenting customers
based on their purchasing behaviors and demographics. The goal is to develop tailored marketing
campaigns for each segment, maximizing engagement, conversion rates, and overall customer
satisfaction. Customer segmentation will allow the company to allocate resources efficiently,
focusing on segments most likely to respond to targeted marketing efforts. You’ll analyze
customer data and prepare it for modeling.
2.0 Dataset
You are tasked with designing the Analytics Base Table (ABT) from the database shown in Figure 1. This diagram is called a data schema, and it shows how the tables are connected to each other. For example, the “Purchase_Info” table holds information about individual purchases and is connected to “Customer_Info” through customer_id and to “Product_Info” through product_id.
3.0 Converting a Business Problem into an Analytical Solution
Assess Feasibility (10 pts):
• Task: Define specific business goals for segmentation (e.g., increase engagement by
15%).
• Task: Evaluate if the dataset is sufficient for meaningful segmentation. Identify additional
data that could be useful (e.g., website interactions, social media data).
Designing the Analytics Base Table (ABT) (15 pts):
• Task: Discuss how to combine these tables (Customer_Info, Product_Info,
Customer_Channel_Usage, Campaign_Info, Purchase_Info) to create the ABT. For
instance, linking customer_id across tables can provide a comprehensive view of each
customer's profile and purchasing patterns, which is essential for accurate segmentation.
• Task: Design the ABT schema, listing the attributes relevant to segmentation and justifying
each attribute’s inclusion (e.g., purchase frequency, engagement level). For each feature,
state whether it is raw or derived, how derived features are calculated, and why the feature is significant.
Figure 1. Data Schema
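One way to turn these joins into a single table, sketched with pandas: purchase rows are first enriched with product attributes, then aggregated to one row per customer, and finally left-joined onto Customer_Info so that every customer appears exactly once. The toy tables and every column other than customer_id and product_id are hypothetical stand-ins for Figure 1. Customer_Channel_Usage is assumed to hold one row per customer, and Campaign_Info is left out because its join key is not stated in this brief.

```python
import pandas as pd

# Hypothetical toy tables; real column names come from the schema in Figure 1.
customer_info = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "age": [34, 51, 27],
    "location": ["Urban", "Rural", "Suburban"],
})
product_info = pd.DataFrame({
    "product_id": [10, 11],
    "category": ["Electronics", "Clothing"],
})
purchase_info = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "product_id": [10, 11, 10],
    "amount": [120.0, 40.0, 200.0],  # hypothetical column
})
# Assumed to hold one row per customer.
channel_usage = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "preferred_channel": ["online", "in-store", "mobile app"],
})

# Attach product attributes to each purchase through product_id.
purchases = purchase_info.merge(product_info, on="product_id", how="left")

# Collapse purchase-level rows to one row per customer.
purchase_summary = (
    purchases.groupby("customer_id")
    .agg(
        total_spent=("amount", "sum"),
        n_purchases=("amount", "count"),
        n_categories=("category", "nunique"),
    )
    .reset_index()
)

# Left joins keep every customer, including those with no purchases yet.
abt = (
    customer_info
    .merge(purchase_summary, on="customer_id", how="left")
    .merge(channel_usage, on="customer_id", how="left")
)
# A customer with no purchases has zero spend, not an unknown spend.
summary_cols = ["total_spent", "n_purchases", "n_categories"]
abt[summary_cols] = abt[summary_cols].fillna(0)
print(abt)
```

The left join is the key design choice here: an inner join would silently drop customers who have never purchased, and those customers may be a segment worth targeting in their own right.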
Designing & Implementing Features (15 pts):
• Task: Identify at least ten features to improve segmentation, combining raw features
(e.g., demographics, monetary value) and derived ones (e.g., recency, frequency).
• Task: Derive these features in the dataset and explain the logic for each (e.g., method of
calculation, reason for inclusion).
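As an illustration of deriving features from raw purchase records (for example, Purchase_Info in Figure 1), the sketch below computes recency, frequency, monetary value, and average order value per customer with pandas. The purchase_date and amount columns and the snapshot date are assumptions; in practice the snapshot is usually the date on which the data was extracted.

```python
import pandas as pd

# Hypothetical purchase-level records; column names are assumptions.
purchases = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "purchase_date": pd.to_datetime([
        "2024-01-05", "2024-03-01", "2024-02-10",
        "2023-12-20", "2024-01-15", "2024-03-10",
    ]),
    "amount": [50.0, 70.0, 200.0, 30.0, 45.0, 25.0],
})

# Snapshot date: the point in time the features are measured from (assumed).
snapshot = pd.Timestamp("2024-03-31")

rfm = purchases.groupby("customer_id").agg(
    last_purchase=("purchase_date", "max"),
    frequency=("purchase_date", "count"),   # number of purchases
    monetary=("amount", "sum"),             # total spend
)
# Recency: days between the most recent purchase and the snapshot date.
rfm["recency_days"] = (snapshot - rfm["last_purchase"]).dt.days
# Average order value: total spend divided by number of purchases.
rfm["avg_order_value"] = rfm["monetary"] / rfm["frequency"]
print(rfm.drop(columns="last_purchase"))
```

Each derived column corresponds to a column already present in data.csv (days_since_last_purchase, purchase_frequency, total_purchase_amount, avg_spent_per_purchase), which is one way to document how those features could be produced from the database.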
4.0 Know your Data and Prepare it
In this part, you’ll be working with a fictitious marketing campaign dataset (attached to the
assignment as data.csv) containing 500 rows and 15 features, covering customer demographics,
purchase behavior, and campaign engagement metrics. The dataset includes:
| Feature                  | Description                                                         |
|--------------------------|---------------------------------------------------------------------|
| customer_ID              | Unique identifier for each customer                                 |
| age                      | Customer’s age                                                      |
| gender                   | Customer’s gender                                                   |
| income_level             | Customer’s income category (Low, Medium, High)                      |
| location                 | Geographic location (Urban, Suburban, Rural)                        |
| last_purchase_date       | Date of the most recent purchase                                    |
| days_since_last_purchase | Days since the last purchase                                        |
| total_purchase_amount    | Total amount spent by the customer                                  |
| avg_spent_per_purchase   | Average amount spent per purchase                                   |
| purchase_frequency       | Number of purchases made                                            |
| product_category         | Primary product category purchased                                  |
| preferred_channel        | Primary channel for purchases (e.g., in-store, online, mobile app)  |
| campaign_engagement      | Engagement level with campaigns (Low, Medium, High)                 |
| campaign_type            | Type of campaign (Email, SMS, In-App Notification)                  |
| campaign_clicks          | Number of campaign clicks by the customer                           |
| campaign_opens           | Number of campaign opens by the customer                            |
| response                 | Response to past campaigns (1 for response, 0 for no response)      |
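Some of these columns are related by definition, which allows a cross-column consistency check in addition to per-column validation. A minimal pandas sketch, assuming avg_spent_per_purchase equals total_purchase_amount divided by purchase_frequency: rows where the product of the last two differs from the total by more than 1% are flagged. The rows are made up and the 1% tolerance is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Made-up rows; the third row is deliberately inconsistent.
df = pd.DataFrame({
    "customer_ID": [101, 102, 103],
    "total_purchase_amount": [300.0, 150.0, 400.0],
    "avg_spent_per_purchase": [100.0, 50.0, 50.0],
    "purchase_frequency": [3, 3, 4],
})

# Assumption: total = average per purchase x number of purchases (within rounding).
expected_total = df["avg_spent_per_purchase"] * df["purchase_frequency"]
mismatch = ~np.isclose(df["total_purchase_amount"], expected_total, rtol=0.01)
print(df.loc[mismatch, "customer_ID"].tolist())
```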
Finding Invalid or Incorrect Data (10 pts):
• Task: Identify any incorrect or invalid values in the dataset that may lead to inaccurate
results during analysis. This could include:
▪ Non-numeric entries in numeric columns (e.g., "unknown" or "error" in age or
campaign_clicks).
▪ Out-of-range values, such as ages over 120 or negative values in campaign_clicks.
▪ Unexpected values in categorical columns (e.g., income_level containing values
outside the standard categories of "Low," "Medium," and "High").
• Suggested Method: Use data profiling methods (e.g., data type checks, range validation,
or custom value lists for categories) to locate invalid entries. Research how to perform
these checks in Python.
• Outcome: Document any findings of incorrect data, and suggest handling approaches,
such as data cleaning or conversion, based on the context of each issue.
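The three checks listed above can be sketched with pandas on a few made-up rows. The allowed ranges (0 to 120 for age, non-negative clicks) follow the examples in the task; everything else is illustrative. pd.to_numeric with errors="coerce" turns strings such as "unknown" into NaN, so those entries end up in the missing-value step rather than being silently dropped.

```python
import pandas as pd

# Made-up rows imitating the kinds of problems listed in the task.
df = pd.DataFrame({
    "age": ["34", "unknown", "150", "27"],
    "campaign_clicks": ["5", "-2", "error", "3"],
    "income_level": ["Low", "Medium", "high", "Very High"],
})

issues = {}

# 1) Data type check: coerce to numbers; strings that fail become NaN.
for col in ["age", "campaign_clicks"]:
    converted = pd.to_numeric(df[col], errors="coerce")
    issues[f"{col}_non_numeric"] = df.loc[converted.isna() & df[col].notna(), col].tolist()
    df[col] = converted

# 2) Range validation (assumed limits: 0 <= age <= 120, clicks >= 0).
issues["age_out_of_range"] = df.loc[
    df["age"].notna() & ~df["age"].between(0, 120), "age"
].tolist()
issues["negative_clicks"] = df.loc[df["campaign_clicks"] < 0, "campaign_clicks"].tolist()

# 3) Category check against the allowed value list.
allowed_income = {"Low", "Medium", "High"}
issues["bad_income_level"] = df.loc[
    ~df["income_level"].isin(allowed_income), "income_level"
].tolist()

for name, values in issues.items():
    print(f"{name}: {values}")
```

Here "high" is probably a capitalization slip rather than a new category, so normalizing with str.strip().str.title() before the check would fix it, whereas "Very High" needs a decision (map it to "High" or treat it as missing).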
Data Quality Report (DQR) (10 pts): A DQR consists of two tables of summary statistics. As
explained in class, each data type needs its own table: one for continuous features and one for categorical features.
• For continuous features fill in the following table:
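| Feature | Count | % Missing | Min | Max | Median | Average | Std. Dev. |
|---------|-------|-----------|-----|-----|--------|---------|-----------|
|         |       |           |     |     |        |         |           |
|         |       |           |     |     |        |         |           |
|         |       |           |     |     |        |         |           |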
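A possible way to fill in this table with pandas, sketched on made-up values for two data.csv columns. Count here is the total number of rows, missing ones included, with missingness reported separately; if your course defines count as the number of non-missing values, use s.count() instead. The standard deviation is pandas' default sample standard deviation (ddof=1).

```python
import numpy as np
import pandas as pd

def dqr_continuous(df, cols):
    """Build the continuous-feature DQR: one row per feature."""
    rows = []
    for col in cols:
        s = df[col]
        rows.append({
            "Feature": col,
            "Count": len(s),                     # total rows, missing included
            "% Missing": 100 * s.isna().mean(),
            "Min": s.min(),                      # pandas skips NaN by default
            "Max": s.max(),
            "Median": s.median(),
            "Average": s.mean(),
            "Std. Dev.": s.std(),                # sample std (ddof=1)
        })
    return pd.DataFrame(rows).set_index("Feature")

# Made-up values standing in for data.csv.
df = pd.DataFrame({
    "age": [30, 40, np.nan, 50],
    "total_purchase_amount": [100.0, 200.0, 300.0, 400.0],
})
report = dqr_continuous(df, ["age", "total_purchase_amount"])
print(report.round(2))
```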
• For categorical features fill in the following table:
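| Feature | Count | % Missing | Mode | Mode Freq. | 2nd Mode | 2nd Mode Freq. |
|---------|-------|-----------|------|------------|----------|----------------|
|         |       |           |      |            |          |                |
|         |       |           |      |            |          |                |
|         |       |           |      |            |          |                |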
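For the categorical table, Series.value_counts() returns category frequencies sorted in descending order, so the mode and second mode are its first two entries. A minimal sketch on made-up values; frequencies are reported as counts (divide by the number of non-missing values if percentages are wanted), and if two categories tie their order is not meaningful, so ties should be checked by hand.

```python
import pandas as pd

def dqr_categorical(df, cols):
    """Build the categorical-feature DQR: one row per feature."""
    rows = []
    for col in cols:
        s = df[col]
        counts = s.value_counts(dropna=True)  # most frequent category first
        rows.append({
            "Feature": col,
            "Count": len(s),
            "% Missing": 100 * s.isna().mean(),
            "Mode": counts.index[0] if len(counts) > 0 else None,
            "Mode Freq.": int(counts.iloc[0]) if len(counts) > 0 else 0,
            "2nd Mode": counts.index[1] if len(counts) > 1 else None,
            "2nd Mode Freq.": int(counts.iloc[1]) if len(counts) > 1 else 0,
        })
    return pd.DataFrame(rows).set_index("Feature")

# Made-up values standing in for data.csv.
df = pd.DataFrame({
    "income_level": ["Low", "Medium", "Medium", "High", "Medium", None, "Low"],
    "location": ["Urban", "Urban", "Rural", "Suburban", "Urban", "Rural", "Urban"],
})
report = dqr_categorical(df, ["income_level", "location"])
print(report)
```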
Handling Missing Values (10 pts):
• Task: Select an approach for handling missing values (e.g., imputation, deletion) and
provide a justification.
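One defensible approach, sketched with pandas: median imputation for continuous features (robust to the skew typical of spending data), mode imputation for categorical features, and a 0/1 indicator column recording which values were filled so the imputation stays visible to later analysis. Deletion is the simpler alternative when only a small share of rows is affected. The column names match data.csv; the values are made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, np.nan, 40, 35, np.nan],
    "total_purchase_amount": [100.0, 250.0, np.nan, 80.0, 5000.0],
    "income_level": ["Low", None, "High", "Low", "Medium"],
})

numeric_cols = ["age", "total_purchase_amount"]
categorical_cols = ["income_level"]

# Record which values were missing before anything is filled in.
for col in numeric_cols + categorical_cols:
    df[f"{col}_was_missing"] = df[col].isna().astype(int)

# Median for continuous features: not pulled up by extreme values.
for col in numeric_cols:
    df[col] = df[col].fillna(df[col].median())

# Most frequent category for categorical features.
for col in categorical_cols:
    df[col] = df[col].fillna(df[col].mode().iloc[0])

print(df)
```

In this example the mean of total_purchase_amount would be 1357.5 because of the single 5000 value, far above any other customer, while the median gives 175.0; this contrast is the kind of justification the task asks for.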
Handling Outliers (10 pts):
• Task: Identify potential outliers using visualizations and statistical methods. Include any
charts, plots, or calculations you use for this step.
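Two common statistical rules, sketched with pandas on a made-up spend series: Tukey's IQR fences (flag values more than 1.5 × IQR beyond the quartiles) and the z-score rule (flag |z| > 3). The thresholds are conventional defaults, not requirements of this assignment. For the visual side, a box plot (e.g., pandas DataFrame.boxplot, which requires matplotlib) draws its whiskers using the same 1.5 × IQR rule by default.

```python
import pandas as pd

def iqr_outliers(s, k=1.5):
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)

def zscore_outliers(s, threshold=3.0):
    """Boolean mask of values whose absolute z-score exceeds the threshold."""
    z = (s - s.mean()) / s.std()
    return z.abs() > threshold

# Made-up total_purchase_amount values with one extreme customer.
amount = pd.Series(
    [120, 95, 130, 110, 105, 2500, 115, 100, 125, 90],
    name="total_purchase_amount",
)
print("IQR rule:", amount[iqr_outliers(amount)].tolist())
print("z-score rule:", amount[zscore_outliers(amount)].tolist())
```

On this series the IQR rule flags 2500, while the z-score rule flags nothing: the extreme value inflates both the mean and the standard deviation, so its own z-score is only about 2.85. This masking effect is one reason to compare more than one method.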
5.0 Grading Rubric
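| Key Points                                                                                            | Grade Allocation (%) |
|-------------------------------------------------------------------------------------------------------|----------------------|
| Converting a Business Problem into an Analytical Solution                                             | 40                   |
| Know your Data and Prepare it                                                                         | 40                   |
| Overall content and format (APA style, font type, size, table, formulas), including references if required | 20                   |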
N.B. Failure to comply with the above would result in low grades.
6.0 Word Limit
MS Word document, maximum 1,000 words (excluding tables and the appendix)