Data Mining and Warehousing: Predicting The Outcome of ODI Matches

The document discusses predicting the outcome of One Day International cricket matches using data mining techniques. It explores using k-Nearest Neighbors, Decision Trees, and Naive Bayes classifiers on a dataset of match statistics scraped from cricinfo between 2006-2011. The kNN algorithm performed best, correctly predicting the winner in over 70% of validation matches. Factors analyzed included home field advantage, toss result, batting order, match timing, opponent, and venue. The models can help teams strategize to increase chances of victory.

Uploaded by

Vandhana Rathod

IT-633 Data Mining and warehousing

Report

IT-633
Data Mining and Warehousing

Predicting the outcome of ODI matches

14 November, 2016

Autumn, 2015-16

DA-IICT, Gandhinagar

Problem Definition:

Analyzing time-oriented data and forecasting are among the most important problems
that analysts face across many fields, and forecasting is one of the core research
topics in data mining. Here we present different approaches for predicting the outcome
of a One-Day International (ODI) cricket match. The aim is to find a consistent
approach that predicts the match outcome with high accuracy. We study a prediction
system that takes historical match data, together with the current state of a match,
and predicts whether the match will end in a victory or a loss. To do so, a range of
variables that could determine the outcome of an ODI cricket match has to be explored.
We have worked with the following algorithms for predicting the match outcome:
k-Nearest Neighbors (kNN), Decision Tree, and Naive Bayes. We describe our model
and algorithms and finally present quantitative results.

Motivation:

Cricket is the second most popular sport in the world. The ICC Cricket World Cup is the
second largest single sporting event in the world, drawing a cumulative television
audience of 2-3 billion people. There is huge commercial interest in strategic planning
for ensuring victory and in game outcome prediction. This has motivated thorough and
methodical analysis of individual and team performance, as well as prediction of future
games, across all formats of the game. The board, coach and captain can use this tool to
shape their strategies and plans. For instance, if the tool predicts a WIN for the coming
match, they can take the field with confidence and a proper game plan; if it predicts a
LOSS, they can adjust their strategies accordingly and play with more alertness and
care, treating the match as a must-win game. Moreover, this study will help analysts
discover the winning patterns of the Indian team against its opponents.

Related Work:

One of the earliest and pioneering works in cricket was by Duckworth and Lewis, who
introduced the Duckworth-Lewis (D-L) method, which allows fair adjustment of
scores in proportion to the time lost due to match interruptions (often caused by adverse
weather conditions such as rain or poor visibility). This proposal has been adopted by
the International Cricket Council (ICC) as a means to reset targets in matches where
time is lost due to interruptions.

Home field advantage, winning the toss, game plan (batting first or fielding first), match
type (day or day & night), competing team, venue familiarity, and the season in which the
match is played are the key features studied in this work. Three algorithms are used
for the study: k-Nearest Neighbors, Decision Tree, and Naive Bayes.

Experiment & Outcomes:

Dataset:

To retrieve all the required statistics, the entire dataset was scraped from the
cricinfo website. The dataset includes all the matches played between 2006 and
2011. It contains the basic match details: the two competing teams, the outcome
of the toss, the team batting first, the target, day/night, the top scorer, and the
winner of each match.
We have restricted our study to only the top 2 ODI-playing teams, namely India and
England. Since the effect of an interruption on the course of a game cannot be
foreseen, a total of 22 matches that were either interrupted by rain or ended in a
draw/tie have been removed from the dataset. Finally, we divided the dataset into
two parts: the training data and the validation data. The training dataset contains
all the matches played during the years 2006 and 2007, and the validation dataset
contains all the matches played in the years 2008 and 2011. There are a total of 14
matches in the training dataset and 8 matches in the validation dataset.
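
The cleaning and year-based split described above can be sketched as follows. This is a minimal illustration, not the code used for the report: the records, the field names (year, interrupted, result) and the helper names are hypothetical, and the treatment of every non-training year as validation is an assumption.

```python
# Hypothetical match records; field names and values are illustrative only.
matches = [
    {"year": 2006, "interrupted": False, "result": "India"},
    {"year": 2007, "interrupted": True, "result": "no result"},
    {"year": 2007, "interrupted": False, "result": "England"},
    {"year": 2008, "interrupted": False, "result": "tie"},
    {"year": 2008, "interrupted": False, "result": "India"},
    {"year": 2011, "interrupted": False, "result": "England"},
]

TEAMS = {"India", "England"}
TRAIN_YEARS = {2006, 2007}


def clean(records):
    """Drop interrupted matches and matches that neither team won."""
    return [m for m in records if not m["interrupted"] and m["result"] in TEAMS]


def split_by_year(records, train_years):
    """Matches from the training years go to training, the rest to validation."""
    train = [m for m in records if m["year"] in train_years]
    valid = [m for m in records if m["year"] not in train_years]
    return train, valid


usable = clean(matches)
train, valid = split_by_year(usable, TRAIN_YEARS)
print(len(train), len(valid))
```

Splitting by season rather than at random keeps every validation match later in time than every training match, which suits a forecasting problem.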

Encoding used in the dataset: India = 1, England = 0; Yes = 1, No = 0.
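
As a small illustration of this encoding, a raw match record can be mapped to numeric features as below. The column names (toss_winner, batting_first, day_night, winner) and the function name are hypothetical and chosen only for this sketch; only the two code tables come from the report.

```python
# Code tables taken from the report: India = 1, England = 0; Yes = 1, No = 0.
TEAM_CODE = {"India": 1, "England": 0}
YES_NO_CODE = {"Yes": 1, "No": 0}


def encode_match(raw):
    """Map one raw match record (hypothetical column names) to 0/1 values."""
    return {
        "toss_winner": TEAM_CODE[raw["toss_winner"]],
        "batting_first": TEAM_CODE[raw["batting_first"]],
        "day_night": YES_NO_CODE[raw["day_night"]],
        "winner": TEAM_CODE[raw["winner"]],
    }


example = {
    "toss_winner": "India",
    "batting_first": "England",
    "day_night": "Yes",
    "winner": "India",
}
encoded = encode_match(example)
print(encoded)
```

An unknown team name or flag raises a KeyError, so mistyped values in the scraped data surface early instead of being silently encoded.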

Binary Classifiers:
Using various binary and numeric features, with the outcome of the match as the
label, we evaluated several binary classifiers: Naive Bayes, Decision Trees,
and kNN.
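
To make the kNN step concrete, here is a minimal sketch of a k-nearest-neighbors classifier on binary feature vectors. The report does not state the distance measure or the value of k, so the Hamming distance and k = 3 are assumptions, and the toy feature rows are invented for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions in which two equal-length feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest rows."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: hamming(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)


# Invented toy data: three binary features per match, label 1 = India wins.
train_X = [(1, 1, 0), (1, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)]
train_y = [1, 1, 0, 0, 1]
test_X = [(1, 1, 0), (0, 0, 1)]
test_y = [1, 0]

print(knn_predict(train_X, train_y, (1, 1, 0)))
print(accuracy(train_X, train_y, test_X, test_y))
```

With two classes, an odd k avoids tied votes.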

● kNN Training Set:


● kNN Validation Set:

Naive Bayes:
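
A Naive Bayes classifier over binary features can be sketched as below. This is an illustrative sketch only: the Laplace smoothing constant alpha, the function names and the toy data are assumptions and are not taken from the report.

```python
import math


def train_nb(X, y, alpha=1.0):
    """Estimate class priors and P(feature = 1 | class) with Laplace smoothing."""
    classes = sorted(set(y))
    n_features = len(X[0])
    prior = {c: y.count(c) / len(y) for c in classes}
    likelihood = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        likelihood[c] = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
    return prior, likelihood


def predict_nb(model, x):
    """Return the class with the highest log-posterior for feature vector x."""
    prior, likelihood = model
    scores = {}
    for c in prior:
        log_p = math.log(prior[c])
        for j, value in enumerate(x):
            p1 = likelihood[c][j]
            log_p += math.log(p1 if value == 1 else 1.0 - p1)
        scores[c] = log_p
    return max(scores, key=scores.get)


# Invented toy data: three binary features per match, label 1 = India wins.
X = [(1, 1, 0), (1, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)]
y = [1, 1, 0, 0, 1]
model = train_nb(X, y)
print(predict_nb(model, (1, 1, 0)), predict_nb(model, (0, 0, 1)))
```

Smoothing matters with a training set as small as ours, because a feature value never seen with one class would otherwise force that class's probability to zero.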

Decision Tree:
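
A decision tree chooses, at each node, the feature that best separates the classes. The report does not state the splitting criterion, so the sketch below assumes ID3-style information gain; the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(X, y, feature):
    """Entropy reduction obtained by splitting the rows on one feature."""
    remainder = 0.0
    for value in set(row[feature] for row in X):
        subset = [label for row, label in zip(X, y) if row[feature] == value]
        remainder += len(subset) / len(y) * entropy(subset)
    return entropy(y) - remainder


def best_split(X, y):
    """Index of the feature with the highest information gain."""
    return max(range(len(X[0])), key=lambda j: information_gain(X, y, j))


# Invented toy data: three binary features per match, label 1 = India wins.
X = [(1, 1, 0), (1, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)]
y = [1, 1, 0, 0, 1]
print(best_split(X, y))
```

In this toy data the first feature separates the two classes perfectly, so it would become the root of the tree; a full tree repeats the same choice recursively on each subset.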

Conclusion:
This study contributes to the literature on a relatively new time series prediction
problem: predicting the outcome of a One Day International cricket match. The
approaches adopted for dataset formation and classification model learning establish
a sound statistical approach. The whole study revolves around forming an accurate
dataset and then finding the most informative attributes in it. It can be observed
that kNN, despite being the simplest algorithm, has outperformed the other
classification algorithms (viz. Decision Tree and Naive Bayes).

