In this project, we applied model selection methods to the regression models introduced in Chapter 6, "Linear Model Selection and Regularization", of "An Introduction to Statistical Learning" by James et al. The five regression models we practiced are Ordinary Least Squares, Ridge Regression, Lasso Regression, Principal Components Regression, and Partial Least Squares Regression. To find the best parameters (or coefficients) for these models, we split our data into train and test sets, performed 10-fold cross-validation on each model, and finally chose the model with the lowest cross-validation error.
The contents are separated into five main sections, each corresponding to a directory:
- Code -- this is where the regression scripts, utility functions, and unit tests are located.
- Data -- data downloaded from the internet or created by the code is stored in this directory.
- Images -- images created by the exploratory analysis and the regressions are stored in this directory.
- Report -- contains a sections sub-directory that holds each section of the final report in a separate file; the final report is dynamically generated and stored here.
- Slides -- slides are dynamically generated and stored here.
The file structure of this project is as follows:

    Stat159-Project-2/
        .gitignore
        README.md
        Makefile
        LICENSE
        report/
            sections/
                00-abstract.Rmd
                01-introduction.Rmd
                02-data.Rmd
                03-methods.Rmd
                04-analysis.Rmd
                05-results.Rmd
                06-conclusions.Rmd
            report.Rmd
            report.pdf
        images/
            ... (dynamically generated images)
        data/
            Credit.csv
            ... (dynamically generated data)
        code/
            functions/
                ... (utility functions)
            scripts/
                ... (regression scripts)
            tests/
                ... (unit tests against utility functions)
        slides/
            ...
        session-info.txt (system info)

This project can be reproduced by following the instructions below.
- Download or clone this project from GitHub (unzip it if the downloaded file is in zip format)
- Open a terminal or any shell program that supports standard Linux commands
- `cd Stat159-Project-2`
- `make clean` to remove old compiled artifacts
- `make` to generate new artifacts
- Open `report.pdf` with the PDF viewer of your choice
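
The clean-then-build steps above can be wrapped in a small shell function. This is only a sketch: `rebuild` is a hypothetical helper that does not exist in the repository, and the default directory name assumes the project was cloned as `Stat159-Project-2` in the current directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): rebuild the project
# from a clean state inside the given project directory.
rebuild() {
  project_dir="${1:-Stat159-Project-2}"   # assumed clone location
  if [ ! -f "$project_dir/Makefile" ]; then
    echo "No Makefile found in $project_dir" >&2
    return 1
  fi
  # Run in a subshell so the caller's working directory is unchanged.
  (cd "$project_dir" && make clean && make)
}
```

After sourcing the function, `rebuild` (or `rebuild path/to/Stat159-Project-2`) runs both steps and returns a non-zero status if either one fails.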
This section describes the make commands you can use to reproduce the corresponding parts of this project:
- `make all`: reproduce the entire project -- download data, run regression analysis, assemble report, etc.
- `make data`: download data from the internet, run the data cleaning script, and split the data into a train set and a test set
- `make eda`: run the exploratory data analysis script
- `make ols`: run the OLS regression script and save the result
- `make ridge`: run the Ridge regression script and save the result
- `make lasso`: run the Lasso regression script and save the result
- `make pcr`: run the PCR regression script and save the result
- `make plsr`: run the PLSR regression script and save the result
- `make regressions`: run all regression model scripts together
- `make session`: run the session info script and store system and package information in session-info.txt
- `make report`: assemble the report from the Rmd files in sections and convert it to PDF format
- `make slides`: create slides from Rmd files
- `make clean`: remove old artifacts
- `make tests`: run the unit tests in the tests directory
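
When debugging a single model, it can help to run the regression targets one at a time and stop at the first failure instead of calling `make regressions`. The sketch below assumes it is run from the project root, that the regression inputs have already been prepared, and that the five targets can run in this order; `run_models` is a hypothetical helper, not something the repository provides.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): run each regression
# target in turn and stop at the first one that fails.
run_models() {
  for target in ols ridge lasso pcr plsr; do
    echo "==> make $target"
    if ! make "$target"; then
      echo "make $target failed" >&2
      return 1
    fi
  done
}
```

Calling `run_models` prints a marker line before each target, so the last marker shown identifies the model whose script failed.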
Junyu Wang
Nichole Ann Rethmeier
All media content is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
All code is licensed under MIT license.