Machine Learning 1 - Coded
Project
INN Hotels Group
Business Report
Table of Contents
1.1 Context
1.2 Objective
1.3 Data Description
1.3.1 Data Dictionary
1.4 Rubrics
1. Exploratory Data Analysis
2. Data Preprocessing
3. Model Building
4. Model Performance Evaluation
5. Model Performance Comparison and Final Model Selection
6. Actionable Insights & Recommendations
1.5 Actionable Insights & Recommendations
1.5.1 Impact of Booking Lead Time on Cancellations
1.5.2 How Special Requests Might Indicate Customer Satisfaction
1.5.3 Effect of Meal Plan Selection on Cancellation Likelihood
1.5.4 Suggestions for Reducing Cancellations
1.6 Conclusion
1.1 Context
A significant number of hotel bookings are called off due to cancellations or no-shows. Typical reasons include changes of
plans and scheduling conflicts. Cancelling is often made easier by the option to do so free of charge or at a low cost. This
benefits hotel guests, but it is a less desirable and possibly revenue-diminishing factor for hotels. Such losses are
particularly high for last-minute cancellations. Online booking channels have dramatically changed customers' booking
possibilities and behaviour. This adds a further dimension to the challenge of how hotels handle cancellations, which are no
longer limited to traditional booking and guest characteristics. The cancellation of bookings impacts a hotel on various fronts:
1. Loss of resources (revenue) when the hotel cannot resell the room.
2. Additional distribution costs, such as higher commissions or paid publicity to help sell these rooms.
3. Last-minute price cuts so the hotel can resell a room, which reduce the profit margin.
4. Human resources needed to make arrangements for the guests.
1.2 Objective
The increasing number of cancellations calls for a machine-learning-based solution that can help predict which bookings
are likely to be cancelled. INN Hotels Group has a chain of hotels in Portugal. They are facing a high number of booking
cancellations and have reached out to your firm for data-driven solutions. As a data scientist, you have to analyse the
data provided to find which factors have a high influence on booking cancellations, build a predictive model that can predict
in advance which bookings are going to be cancelled, and help formulate profitable policies for cancellations and refunds.
1.3 Data Description
The data contains the different attributes of customers' booking details. The detailed data dictionary is given below.
1.3.1 Data Dictionary:
• Booking_ID: the unique identifier of each booking
• no_of_adults: Number of adults
• no_of_children: Number of Children
• no_of_weekend_nights: Number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotel
• no_of_week_nights: Number of weeknights (Monday to Friday) the guest stayed or booked to stay at the hotel
• type_of_meal_plan: Type of meal plan booked by the customer:
    • Not Selected – No meal plan selected
    • Meal Plan 1 – Breakfast
    • Meal Plan 2 – Half board (breakfast and one other meal)
    • Meal Plan 3 – Full board (breakfast, lunch, and dinner)
• required_car_parking_space: Does the customer require a car parking space? (0 - No, 1 - Yes)
• room_type_reserved: Type of room reserved by the customer. The values are ciphered (encoded) by INN Hotels
Group
• lead_time: Number of days between the date of booking and the arrival date
• arrival_year: Year of arrival date
• arrival_month: Month of arrival date
• arrival_date: Date of the month
• market_segment_type: Market segment designation.
• repeated_guest: Is the customer a repeated guest? (0 - No, 1 - Yes)
• no_of_previous_cancellations: Number of previous bookings that were canceled by the customer prior to the current
booking
• no_of_previous_bookings_not_canceled: Number of previous bookings not canceled by the customer prior to the
current booking
• avg_price_per_room: Average price per day of the reservation; prices of the rooms are dynamic. (in euros)
• no_of_special_requests: Total number of special requests made by the customer (e.g. high floor, view from the room,
etc)
• booking_status: Flag indicating if the booking was canceled or not.
1.4 Rubrics
1. Exploratory Data Analysis (EDA)
• EDA is an important part of any project involving data.
• It is important to investigate and understand the data better before building a model with it.
• A few questions have been mentioned below which will help you approach the analysis in the right manner and generate
insights from the data.
• A thorough analysis of the data, in addition to the questions mentioned below, should be done.
Leading Questions Answered:
A. What are the busiest months in the hotel?
The busiest months are August, September and October, with 10.5%, 12.07% and 14.7% of the year's total bookings
respectively.
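A minimal sketch of how monthly booking shares like these could be computed with pandas. The toy `bookings` frame is hypothetical and stands in for the real data; only the `arrival_month` column name comes from the data dictionary.

```python
import pandas as pd

# Hypothetical toy data: one row per booking (the real dataset is not reproduced here).
bookings = pd.DataFrame({"arrival_month": [10, 10, 10, 10, 9, 9, 9, 8, 8, 1]})

# Percentage of all bookings that arrive in each month, largest share first.
monthly_share = (
    bookings["arrival_month"].value_counts(normalize=True).mul(100).round(2)
)

# The three busiest months by share of bookings.
busiest_months = monthly_share.head(3).index.tolist()
print(monthly_share)
print(busiest_months)
```

On the real data the same two statements would be applied to the full `arrival_month` column.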
B. Which market segment do most of the guests come from?
Online: 23214 bookings, or 64% of the total, come via the internet.
C. Hotel rates are dynamic and change according to demand and customer demographics. What are the differences in room
prices in different market segments?
Online bookings have the highest prices, despite also having the largest number of free rooms (presumably redeemed through
online retailers' points systems). Aviation, Offline and Corporate are generally slightly lower priced, with Corporate the
lowest of the three. Complimentary bookings are, of course, free.
D. What percentage of bookings are canceled?
About one third (11885) of the bookings in the sample data are canceled.
E. Repeating guests are the guests who stay in the hotel often and are important to brand equity. What percentage of
repeating guests cancel?
Repeating guests rarely cancel (1.75%).
F. Many guests have special requirements when booking a hotel room. Do these requirements affect booking cancellation?
Bookings with no special requests are the most likely to be canceled. The likelihood of cancellation starts to fall with the
first special request and keeps falling, reaching zero for bookings with three requests.
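The cancellation rates in questions D to F can be obtained with a grouped mean over a 0/1 cancellation flag. The sketch below uses hypothetical toy rows, and the label strings `"Canceled"` and `"Not_Canceled"` are an assumption, since the data dictionary only describes `booking_status` as a flag.

```python
import pandas as pd

# Hypothetical toy rows; the label strings are assumed, not taken from the report.
bookings = pd.DataFrame({
    "repeated_guest":         [0, 0, 0, 0, 1, 1, 0, 0],
    "no_of_special_requests": [0, 0, 1, 2, 0, 1, 0, 3],
    "booking_status": [
        "Canceled", "Canceled", "Not_Canceled", "Not_Canceled",
        "Not_Canceled", "Not_Canceled", "Canceled", "Not_Canceled",
    ],
})

# Turn the status flag into 1 (canceled) / 0 (kept) so that a mean is a rate.
bookings["is_canceled"] = (bookings["booking_status"] == "Canceled").astype(int)

overall_rate = bookings["is_canceled"].mean()
rate_by_repeat = bookings.groupby("repeated_guest")["is_canceled"].mean()
rate_by_requests = bookings.groupby("no_of_special_requests")["is_canceled"].mean()

print(f"overall cancellation rate: {overall_rate:.1%}")
print(rate_by_repeat)
print(rate_by_requests)
```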
2. Data Preprocessing
Missing value treatment and duplicate value check:
No missing values.
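The two checks named above can be sketched with pandas as follows. The rows and the booking identifiers are hypothetical; only the column names come from the data dictionary.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real booking data.
bookings = pd.DataFrame({
    "Booking_ID": ["B1", "B2", "B3"],
    "lead_time": [224, 5, 1],
    "avg_price_per_room": [65.0, 106.68, 60.0],
})

# Count of missing values in each column.
missing_per_column = bookings.isnull().sum()

# Fully duplicated rows, and repeated booking identifiers.
duplicate_rows = int(bookings.duplicated().sum())
duplicate_ids = int(bookings["Booking_ID"].duplicated().sum())

print(missing_per_column)
print(duplicate_rows, duplicate_ids)
```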
3. Model Building
Splitting the data into training and testing sets.
Building a Logistic Regression model
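A minimal sketch of these two steps, assuming scikit-learn (the report does not name the library it used). The data is synthetic: two made-up features stand in for `lead_time` and `no_of_special_requests`, and the split ratio and random seeds are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): longer lead times raise, and more
# special requests lower, the odds of cancellation.
rng = np.random.default_rng(0)
n = 400
lead_time = rng.integers(0, 300, size=n)
special_requests = rng.integers(0, 4, size=n)
logit = 0.015 * lead_time - 1.0 * special_requests - 1.0
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
X = np.column_stack([lead_time, special_requests])

# Hold out 30% for testing; stratify so both sets keep the same class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

train_accuracy = model.score(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"train accuracy: {train_accuracy:.3f}, test accuracy: {test_accuracy:.3f}")
```

Comparing the training and testing scores side by side is what reveals overfitting in the later tree models.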
4. Model Performance Evaluation
5. Model Performance Comparison and Final Model Selection
Building a Decision Tree model
• The tree scores very well on accuracy; it captures most of the data.
• With 11885 predicted cancellations against 11989 actual, this isn't a good model.
• Since we want to avoid cancellations, we will use recall to find the data that will help reduce that number overall.
• With the training and testing accuracy now closer together, we have successfully eliminated most of the noise from the
first model (dTree).
• Having the accuracy up to 78/79% is also an improvement.
• The recall values are also very close, making this already a much better model than the first one.
• The estimator is not much different from the pre-pruned tree; in fact, it is a little worse on the accuracy metrics,
judging by these numbers.
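Recall, the metric chosen above, is the share of actual cancellations that the model catches: TP / (TP + FN). A small sketch with hypothetical labels, assuming scikit-learn's metrics module and the convention that 1 means canceled:

```python
from sklearn.metrics import confusion_matrix, recall_score

# Hypothetical labels: 1 = canceled, 0 = not canceled.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Recall = tp / (tp + fn): the fraction of real cancellations that were flagged.
recall = recall_score(y_true, y_pred)
print(tn, fp, fn, tp, recall)
```

A missed cancellation (false negative) lowers recall, which is why it suits a setting where undetected cancellations are the costly error.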
Cost Complexity Pruning
• The restricted (pre-pruned) tree and the hyperparameter-tuned tree performed the best while reducing
overfitting. I would submit one of those models to the client.
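For reference, cost complexity pruning can be sketched with scikit-learn as below. This is an illustration on synthetic data, not the report's actual run: the features, seeds and the choice to pick the alpha with the best test recall are all assumptions, and in practice a separate validation split or cross-validation would be preferable for choosing alpha.

```python
import numpy as np
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): a noisy linear rule on two of four features.
rng = np.random.default_rng(42)
n = 600
X = rng.normal(size=(n, 4))
y = ((X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.8, size=n)) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# Candidate alphas from the pruning path of a fully grown tree.
path = DecisionTreeClassifier(random_state=1).cost_complexity_pruning_path(
    X_train, y_train
)
# Drop the last alpha (it prunes down to the root) and guard against tiny
# negative values caused by floating-point error.
ccp_alphas = np.clip(path.ccp_alphas[:-1], 0, None)

# Fit one tree per alpha and record its recall on the held-out data.
results = []
for alpha in ccp_alphas:
    tree = DecisionTreeClassifier(random_state=1, ccp_alpha=alpha)
    tree.fit(X_train, y_train)
    results.append((recall_score(y_test, tree.predict(X_test)), alpha, tree))

best_recall, best_alpha, best_tree = max(results, key=lambda r: r[0])
unpruned = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)
print(f"best alpha: {best_alpha:.5f}, recall: {best_recall:.3f}")
print(f"leaves: {best_tree.get_n_leaves()} (unpruned: {unpruned.get_n_leaves()})")
```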
6. Actionable Insights & Recommendations
- Actionable insights and recommendations
1.5 Actionable Insights & Recommendations
1.5.1 Impact of Booking Lead Time on Cancellations
• Observation: The Logistic Regression and Decision Tree models show moderate accuracy in predicting cancellations, but
overall, the Naive Bayes model performs best. A possible reason could be that certain variables, such as lead time, have a
significant impact on cancellations. Typically, longer lead times may lead to higher cancellation rates, as customers'
plans may change more often with more time between booking and arrival.
• Recommendation: The hotel can consider stricter cancellation policies for bookings made with long lead times or offer
incentives for early non-refundable bookings to reduce cancellations.
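One way to check the lead-time effect described above is to band `lead_time` and compare cancellation rates across the bands. The sketch below uses hypothetical rows, and the band edges (30, 90 and 365 days) are illustrative assumptions rather than values from the analysis.

```python
import pandas as pd

# Hypothetical toy rows: lead time in days and a 0/1 cancellation flag.
bookings = pd.DataFrame({
    "lead_time":   [2, 10, 25, 45, 60, 80, 120, 200, 300, 340],
    "is_canceled": [0, 0, 0, 0, 1, 0, 1, 1, 0, 1],
})

# Illustrative band edges (assumption): short, medium and long lead times.
bins = [0, 30, 90, 365]
labels = ["0-30", "31-90", "91-365"]
bookings["lead_time_band"] = pd.cut(
    bookings["lead_time"], bins=bins, labels=labels, include_lowest=True
)

# Mean of the 0/1 flag within each band is that band's cancellation rate.
rate_by_band = bookings.groupby("lead_time_band", observed=True)["is_canceled"].mean()
print(rate_by_band)
```

A rate that rises from band to band would support applying stricter terms to long-lead bookings.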
1.5.2 How Special Requests Might Indicate Customer Satisfaction
• Observation: The analysis showed a relationship between the number of special requests and cancellations.
Customers who make special requests may have higher expectations, and if these expectations are not met, they might
be more likely to cancel. However, this feature did not show a strong predictive power in the models, suggesting it
might not be the most significant factor in cancellation prediction.
• Recommendation: The hotel should ensure that special requests are noted and fulfilled whenever possible, as this can
reduce the likelihood of cancellations and improve customer satisfaction. Additionally, understanding common requests
could help the hotel proactively offer similar amenities to other guests.
1.5.3 Effect of Meal Plan Selection on Cancellation Likelihood
• Observation: Meal plan selection might be an indicator of customer commitment to their booking. Guests who select
a full or half-board meal plan might be less likely to cancel, as they have invested more in the stay. This was reflected in
the data, though it wasn't the strongest predictor.
• Recommendation: Encourage customers to choose meal plans at the time of booking by offering discounts or bundling
services. This could potentially reduce cancellations as customers become more committed to their reservations.
1.5.4 Suggestions for Reducing Cancellations
• Stricter Cancellation Policies: Implement stricter cancellation policies for high-risk bookings, such as those with long
lead times or made through channels with higher cancellation rates. This can include non-refundable deposits or
shorter cancellation windows.
• Incentives for Early Confirmation: Offer incentives like discounts or room upgrades for guests who confirm their
bookings early or choose non-refundable options.
• Targeted Communication: Use predictive analytics to identify high-risk bookings and reach out to these customers
with reminders or offers that encourage them to keep their reservations.
• Improve Customer Experience: Focus on improving aspects of the stay that are important to guests, such as fulfilling
special requests, to reduce the likelihood of cancellations.
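The targeted-communication suggestion can be sketched as a simple threshold on predicted cancellation probabilities. Everything here is hypothetical: the `cancel_probability` column is assumed to come from whichever model is deployed, and the 0.5 cut-off is an illustrative choice that the hotel would tune to its outreach capacity.

```python
import pandas as pd

# Hypothetical scored bookings; cancel_probability would come from the chosen model.
scored = pd.DataFrame({
    "Booking_ID": ["B1", "B2", "B3", "B4"],
    "lead_time": [210, 12, 150, 3],
    "cancel_probability": [0.82, 0.10, 0.55, 0.05],
})

# Illustrative cut-off (assumption) above which a booking counts as high risk.
RISK_THRESHOLD = 0.5

# Keep the high-risk bookings, most likely cancellations first.
high_risk = scored[scored["cancel_probability"] >= RISK_THRESHOLD].sort_values(
    "cancel_probability", ascending=False
)
outreach_ids = high_risk["Booking_ID"].tolist()
print(outreach_ids)
```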
1.6 Conclusion
The Naive Bayes model provided the best performance, suggesting that it can be used to help predict cancellations
effectively. However, accuracy rates indicate there is room for improvement, possibly through more advanced modeling
techniques or additional data features. The hotel can use the insights gained from this analysis to refine their policies
and customer service strategies to minimize cancellations and maximize revenue.