housing-price-prediction-ML

Housing Prices Prediction Project

This project uses machine learning techniques to predict housing prices based on various features. The dataset is fetched, prepared, and analyzed in a Jupyter Notebook, with Random Forest Regression used as the primary model. Cross-validation is implemented to evaluate the model's performance.

Project Overview

This project demonstrates:

Fetching and extracting data from a remote source.
Data cleaning, feature engineering, and exploratory data analysis.
Building and evaluating a machine learning model using Random Forest Regression.
Performing cross-validation to measure model performance.

Dataset

The dataset used for this project is the 'California Housing Prices' dataset from the StatLib repository, which is based on data from the 1990 California census. A copy has been added to this GitHub repository. The housing data is stored in a CSV file, which is automatically downloaded and extracted when the notebook runs.
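The download-and-extract step described above could look roughly like the sketch below. It uses only the standard library. The URL, the archive name `housing.tgz`, and the function name `fetch_housing_data` are assumptions for illustration; the notebook's actual values may differ.

```python
import tarfile
import urllib.request
from pathlib import Path

# Hypothetical archive location; replace with the URL the notebook actually uses.
HOUSING_URL = "https://example.com/datasets/housing/housing.tgz"
# The README states that the data lands in a "dataset" directory.
HOUSING_DIR = Path("dataset")


def fetch_housing_data(url=HOUSING_URL, dest_dir=HOUSING_DIR):
    """Download a .tgz archive from `url` and extract it into `dest_dir`.

    Returns the sorted list of CSV files found in `dest_dir` afterwards.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Assumed archive file name.
    archive_path = dest_dir / "housing.tgz"
    urllib.request.urlretrieve(url, archive_path)

    with tarfile.open(archive_path) as archive:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support safe extraction filters.
            archive.extractall(path=dest_dir, filter="data")
        else:
            archive.extractall(path=dest_dir)

    return sorted(p for p in dest_dir.iterdir() if p.suffix == ".csv")
```

Calling `fetch_housing_data()` with a real archive URL would leave the CSV in `dataset/`, where it can then be loaded with `pandas.read_csv`.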

Data Features

The dataset contains:

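Housing-related features for each California census district, such as location (longitude and latitude), median housing age, room and bedroom counts, population, households, median income, and ocean proximity.
Labels (target variable) representing the median house value of each district.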
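As a rough illustration of how features and labels can be separated before training, here is a minimal pandas sketch. The column names (including the label column `median_house_value`) and the helper `split_features_labels` are assumptions, and the three rows are made-up values rather than real records from the dataset.

```python
import io

import pandas as pd

# Made-up rows for illustration only; the column names are assumed.
SAMPLE_CSV = """longitude,latitude,total_rooms,total_bedrooms,median_income,median_house_value
-122.0,37.0,1000,100,5.0,300000
-121.0,38.0,2000,,4.0,200000
-120.0,36.0,3000,200,3.0,100000
"""

LABEL_COLUMN = "median_house_value"  # assumed name of the target column


def split_features_labels(df, label_column=LABEL_COLUMN):
    """Return (features, labels) with the label column removed from the features."""
    features = df.drop(columns=[label_column])
    labels = df[label_column].copy()
    return features, labels


housing = pd.read_csv(io.StringIO(SAMPLE_CSV))
features, labels = split_features_labels(housing)

# One simple cleaning step: fill missing numeric values with the column median.
features = features.fillna(features.median(numeric_only=True))
```

Median imputation is only one possible cleaning strategy; the notebook may handle missing values differently.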

Project Structure

The project is organized as follows:

Jupyter Notebook: Contains all the code for data loading, preprocessing, model building, and evaluation.
Dataset: Downloaded and extracted automatically into the dataset directory of your current workspace.

Dependencies

To run this project, ensure you have the following installed:

Python 3.8+
Jupyter Notebook
NumPy
Pandas
Scikit-learn
tarfile (standard library)
six (for compatibility)

Install dependencies using pip:

```
pip install notebook numpy pandas scikit-learn six
```

Running the Project

Clone the GitHub repository:
```
git clone https://github.com/kitkat1424/housing-price-prediction-ML.git
```
Navigate to the project directory:
```
cd housing-price-prediction-ML
```
Open the Jupyter Notebook:
```
jupyter notebook housing_prices.ipynb
```
Run all cells in the notebook to:
- Fetch and extract the dataset.
- Perform data preprocessing.
- Train and evaluate the model.

Results

The project tests Linear Regression, Decision Tree Regressor and Random Forest Regression on the training data set. It settles on the use of Random Forest Regression to predict housing prices. Model performance is evaluated using:

RMSE (Root Mean Squared Error): Evaluated on both training data and cross-validation folds.
Cross-validation: Gives a more robust estimate by splitting the training set into several folds, training on all but one fold and validating on the held-out fold in turn. The mean RMSE across folds is then computed.
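The comparison described above can be sketched with scikit-learn as follows. This is a minimal example under stated assumptions, not the notebook's code: the data is synthetic (generated with `make_regression`), and the helper names `train_rmse` and `cv_rmse`, the fold count, and the hyperparameters are all illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor


def train_rmse(model, X, y):
    """Fit on the full training set and return the RMSE on that same data."""
    model.fit(X, y)
    return float(np.sqrt(mean_squared_error(y, model.predict(X))))


def cv_rmse(model, X, y, folds=5):
    """Return one RMSE per cross-validation fold."""
    # scikit-learn reports negated MSE so that higher scores are better.
    scores = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=folds)
    return np.sqrt(-scores)


# Synthetic stand-in for the prepared housing features and labels (assumption).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=42)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=42),
}

results = {name: cv_rmse(model, X, y) for name, model in models.items()}

for name, rmse in results.items():
    print(f"{name}: mean CV RMSE = {rmse.mean():.2f} (std = {rmse.std():.2f})")
```

Comparing `train_rmse` with the mean of `cv_rmse` for the same model is a quick way to spot overfitting: an unconstrained decision tree usually reaches a training RMSE near zero while its cross-validated RMSE stays much higher.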

Acknowledgments

Géron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (2nd ed.). O'Reilly Media.
Libraries such as Scikit-learn, Pandas, and NumPy were instrumental in building this project.
