MACHINE LEARNING PROJECT ONE (30%)
1. Select a title
Define your project title
2. Problem definition
Is it:
Classification
Regression
Clustering
Define your problem as a statement
3. Data
This may involve:
Sourcing
Defining different parameters
Talking to experts about it
4. Evaluation
The evaluation metric is something you might define at the start of a project
5. Features
Features are different parts of the data. During this step, you will want to start
finding out what you can about the data.
One of the most common ways to do this is to create a data dictionary.
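As a sketch, a data dictionary can be kept as a small table alongside the notebook; the column names and descriptions below are purely hypothetical placeholders for your own dataset.

```python
import pandas as pd

# Hypothetical data dictionary: one row per column of your dataset,
# describing its type and meaning (replace with your real columns).
data_dict = pd.DataFrame({
    "feature": ["age", "salary", "target"],
    "dtype": ["int", "float", "int (0/1)"],
    "description": ["Age in years", "Annual salary", "1 = positive class"],
})
print(data_dict)
```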
6. Preparing the tools
This is where you may want to consolidate every library you have used at the top
of your notebook, especially the libraries you will likely take advantage of during
almost every structured data project.
Pandas for data analysis
NumPy for numerical operations
Matplotlib / seaborn for plotting or data visualization
Scikit-learn for machine learning modelling and evaluations
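A typical consolidated import cell for the four libraries above might look like this (the aliases are the common community conventions, not requirements):

```python
import pandas as pd              # data analysis
import numpy as np               # numerical operations
import matplotlib.pyplot as plt  # plotting
import seaborn as sns            # statistical data visualization
from sklearn.model_selection import train_test_split  # ML modelling & evaluation
```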
7. Load data
8. Data exploration ( Exploratory data analysis or EDA )
Since EDA has no real set methodology, the following is a short checklist you
might want to walk through:
What question(s) are you trying to solve?
What kind of data do you have and how do you treat different types?
What is missing from the data and how do you deal with it?
How can you compare different columns to each other, compare them to
the target variable, and check the correlation between independent variables?
How can you add, change or remove features to get more out of your
data?
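The checklist above can be sketched in a few pandas calls. The toy DataFrame here is a stand-in; you would run the same calls on your own loaded data, and filling missing values with the median is only one of several reasonable strategies.

```python
import pandas as pd
import numpy as np

# Hypothetical toy dataset with some missing values.
df = pd.DataFrame({
    "age": [29, 41, np.nan, 35],
    "salary": [40000, 52000, 61000, np.nan],
    "target": [0, 1, 1, 0],
})

print(df.dtypes)        # what kind of data do you have?
print(df.isna().sum())  # what is missing from the data?

# One simple way to deal with missing numeric values.
df = df.fillna(df.median(numeric_only=True))

print(df.corr())        # correlation between variables, incl. the target
```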
9. Modelling
features and labels
training and test split
model choices
model comparison
hyperparameter tuning and cross-validation
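The modelling steps above can be sketched end to end on synthetic data (`make_classification` stands in for your real features and labels; the models and parameter grid are just examples):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Features and labels (synthetic placeholder data).
X, y = make_classification(n_samples=200, random_state=42)

# Training and test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Model choices and comparison.
models = {"logreg": LogisticRegression(max_iter=1000),
          "rf": RandomForestClassifier(random_state=42)}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))

# Hyperparameter tuning with cross-validation.
grid = GridSearchCV(RandomForestClassifier(random_state=42),
                    {"n_estimators": [50, 100]}, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_)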
10. Evaluating your model, beyond the default score() method
For classification
ROC curve and AUC score
Confusion matrix
Classification report
Precision
Recall
F1-score
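All of the classification metrics listed above are available in `sklearn.metrics`; a minimal sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (classification_report, confusion_matrix,
                             f1_score, precision_score, recall_score,
                             roc_auc_score)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)
y_proba = clf.predict_proba(X_test)[:, 1]  # probabilities for the AUC score

print("AUC:", roc_auc_score(y_test, y_proba))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
print(precision_score(y_test, y_pred),
      recall_score(y_test, y_pred),
      f1_score(y_test, y_pred))
```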
For regression
Mean Absolute Error (MAE)
Root Mean Squared Error (RMSE)
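For regression, both metrics above can be computed from `sklearn.metrics` (RMSE is just the square root of the mean squared error); the true/predicted values here are made-up numbers:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([3.0, 5.0, 2.5, 7.0])  # hypothetical targets
y_pred = np.array([2.5, 5.0, 4.0, 8.0])  # hypothetical predictions

mae = mean_absolute_error(y_true, y_pred)          # average absolute error
rmse = np.sqrt(mean_squared_error(y_true, y_pred)) # penalizes large errors more

print("MAE:", mae)    # → 0.75
print("RMSE:", rmse)
```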
11. Feature importance
Feature importance is another way of asking, "which features contribute most
to the outcomes of the model?"
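Tree-based models in scikit-learn expose this directly via `feature_importances_`; a sketch on synthetic data with made-up feature names:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
feature_names = [f"feature_{i}" for i in range(5)]  # hypothetical names

clf = RandomForestClassifier(random_state=0).fit(X, y)

# Importances sum to 1; sort to see which features contribute most.
importances = pd.Series(clf.feature_importances_,
                        index=feature_names).sort_values(ascending=False)
print(importances)
```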
12. Experimentation
After trying a few different things, you would ask yourself: did you meet the
evaluation metric?
Remember, you defined one in step 4.
If you did not achieve it, a good next step would be to discuss with your team
the different options for going forward:
Could you collect more data?
Could you try a better model?
Could you improve the current model?
13. Save the model
If your model is good enough, how would you export it and share it with others?
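One common way to export a scikit-learn model is with `joblib`; the model and filename here are placeholders for your own:

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# A trained model to export (stand-in for your final model).
X, y = make_classification(random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

joblib.dump(model, "my_model.joblib")    # save to disk to share
loaded = joblib.load("my_model.joblib")  # reload it elsewhere
print(loaded.score(X, y))
```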
INSTRUCTIONS
Write your project in a Jupyter notebook
You have to use Google Drive to back up your project
You have to upload your project to a GitHub repository and send me its
link
Do the project individually; if you have a special case, communicate with me
It counts for 30% of your lab scores
Submission date:- 22-09-2016 E.C