House Price Prediction
Table of Contents
1. Abstract
2. Chapter 1: Introduction
○ 1.1 Background
○ 1.2 Problem Statement
○ 1.3 Objectives
○ 1.4 Scope of the Project
3. Chapter 2: Literature Survey
○ 2.1 Overview of Real Estate Market Dynamics
○ 2.2 Challenges in Traditional Property Valuation
○ 2.3 Machine Learning for Price Prediction
○ 2.4 Related Work
4. Chapter 3: System Analysis
○ 3.1 Existing System
○ 3.2 Proposed System
○ 3.3 Feasibility Study
○ 3.4 Requirements Analysis
■ 3.4.1 Functional Requirements
■ 3.4.2 Non-Functional Requirements
5. Chapter 4: System Design
○ 4.1 System Architecture
○ 4.2 Use Case Diagram
○ 4.3 Data Flow Diagram
○ 4.4 UI User Authentication
○ 4.5 UI Design
6. Chapter 5: Implementation
○ 5.1 Technology Stack
○ 5.2 Dataset and Preprocessing
○ 5.3 Django Web Interface
○ 5.4 Prediction and Output Display
Chapter 1: Introduction
1.1 Background
Real estate pricing is a complex process influenced by multiple factors such as location, area,
number of rooms, amenities, and market demand. Traditionally, property valuation has relied on
expert opinion or static rule-based assessments. However, with the availability of large datasets
and computational power, predictive modeling through machine learning has become a powerful
tool to estimate property prices with greater accuracy.
This project presents a House Price Prediction System built using Django and a trained ML
model. The system allows users to input property-related features and returns a predicted price
instantly, enhancing transparency and aiding decision-making in the property buying/selling
process.
Manual estimation of house prices lacks consistency and may lead to inaccurate valuations due
to human bias or insufficient data analysis.
Problem Statement: “To develop a Django-based web system that predicts house prices using
machine learning algorithms based on various property features like location, size, and number
of bedrooms.”
1.3 Objectives
Included in Scope:
These features are now being digitized for algorithmic processing in modern pricing systems.
Hence, AI-based prediction systems offer a more scalable and consistent approach.
ML techniques like Linear Regression, Random Forest, Gradient Boosting, and XGBoost are
commonly used in regression-based tasks like house price prediction.
● Kaggle competitions have demonstrated the use of ensemble models for price
prediction.
● Zillow.com and other platforms use proprietary ML models for price estimates.
● Academic studies have reported accuracy above 90% using advanced regression models on
clean, well-labeled datasets.
This project adopts a real-world approach by combining Django's web development power with a
machine learning pipeline to deliver an intelligent and usable solution.
Frontend
The user interface is built using Django templates and styled with Tailwind CSS, a utility-first
CSS framework that allows for rapid design and consistent aesthetics across pages. The
frontend is designed to be clean, responsive, and accessible, ensuring usability on both
desktop and mobile devices.
The frontend interacts with the backend using standard form submissions and can be extended
to include AJAX or API-based data submission in future versions. Mobile responsiveness is
achieved using Tailwind's responsive breakpoints, ensuring optimal user experience on smaller
screens.
Backend
The backend is developed in Django 4.x, a high-level Python web framework known for its
scalability, security, and built-in ORM. It handles:
● Routing and Views: URL mappings are created for all major functionalities such as data
entry, prediction, and dashboard access. Views control the logic for rendering templates
or processing prediction requests.
● Authentication and Authorization: Built-in Django auth is used for login, logout, and
password protection. Users are grouped into roles using Django's Group model, enabling
role-based access control.
● Model Integration: The machine learning model is integrated into Django as a Python
module. It is invoked during form processing to generate real-time predictions.
● Data Validation and Security: The backend ensures input validation, uses CSRF
protection on forms, and handles exceptions gracefully to prevent system crashes.
The architecture follows the Model-View-Template (MVT) design pattern, which cleanly
separates data, logic, and presentation layers.
Database
The system uses PostgreSQL as its primary relational database management system.
PostgreSQL was selected due to its robustness, ACID compliance, and support for advanced
features such as JSON storage, indexing, and role management.
● User Accounts: Stores information about registered users, their roles, and credentials.
● Prediction Logs: Captures the submitted property features, predicted prices, timestamps,
and user ID for traceability.
● Training Dataset Management (Admin Only): A provision to upload and manage
datasets for retraining the ML model in the future.
● Feedback Records: Planned for future implementation, where users can submit feedback
on the prediction quality.
The Django ORM abstracts SQL queries and handles migrations seamlessly, which simplifies
database operations and schema evolution.
The core intelligence of the system lies in the Machine Learning service, implemented as a
standalone Python module integrated into the Django backend. The service uses a regression
model (Linear Regression in the current implementation, as described in Chapter 5) trained on a
structured dataset containing house prices and property features.
● Model Training Script: Written in Python using pandas, scikit-learn, and NumPy.
The model is trained offline and validated using test data before deployment.
● Model Serialization: The trained model is serialized using joblib for efficient storage
and fast loading at runtime.
● Runtime Inference: When a user submits property data, the Django view loads the
serialized model and passes the input to generate predictions in real-time.
● Result Output: The prediction result is returned to the user interface along with optional
metadata like confidence score (planned in future).
The design ensures that the model can be updated independently of the web app. Admins can
retrain the model offline and replace the serialized .pkl file without needing to redeploy the
entire system.
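One way to picture this swap-without-redeploy behaviour is a small loader that re-reads the serialized file whenever its modification time changes. The sketch below is illustrative only: `ModelStore`, `save_model`, and `predict_price` are hypothetical names, the model is a plain dictionary of made-up coefficients, and the standard-library `pickle` module stands in for joblib so that the example has no external dependencies.

```python
import os
import pickle


class ModelStore:
    """Caches a deserialized model and reloads it when the file changes."""

    def __init__(self, path):
        self.path = path
        self._model = None
        self._mtime = None

    def get(self):
        # Compare the file's modification time with the cached one.
        mtime = os.path.getmtime(self.path)
        if self._model is None or mtime != self._mtime:
            with open(self.path, "rb") as fh:
                self._model = pickle.load(fh)
            self._mtime = mtime
        return self._model


def save_model(model, path):
    """Serialize the model to disk (pickle here; the project uses joblib)."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


def predict_price(model, features):
    """Linear prediction: intercept plus the weighted sum of the features."""
    return model["intercept"] + sum(
        c * x for c, x in zip(model["coef"], features)
    )
```

Under these assumptions, an admin would call `save_model` after retraining, and the next request that calls `ModelStore.get()` would pick up the new file while the web application keeps running.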
Integration Workflow
This modular and loosely coupled architecture ensures that each part of the system is
independently testable, replaceable, and scalable.
Technical Feasibility: The system is built using widely available open-source tools (Python,
Django, Scikit-learn). No proprietary software or specialized hardware is required. The ML
model is trained offline and loaded into memory during runtime using joblib.
Operational Feasibility: The system is easy to use and deploy. Once trained, the model
can serve multiple prediction requests in real time without needing re-training unless
explicitly required.
Economic Feasibility: Since the system uses open-source tools and publicly available
datasets, there are no direct costs involved. It is feasible for individual developers and
academic institutions.
Responsibilities:
Collect data from users (location, size, number of bedrooms, etc.)
Responsibilities:
Handle HTTP requests and form submissions
3. Data Layer
Components:
Database: SQLite (db.sqlite3)
Responsibilities:
Store input and prediction results
Store and load the trained ML model used for real-time predictions
Workflow Summary
1. User Accesses the Form
User navigates to the form page and enters required information.
API Layer: REST API using Django REST Framework for external integration.
Model Retraining: Periodic retraining with new data entries to improve prediction
accuracy.
Overall Flow
The user begins by accessing the site and submitting house
features.
The system validates the input and either prompts for
correction or proceeds.
Once validated, the ML model is triggered to estimate the
house price.
The result is then displayed to the user.
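The validation step in this flow can be sketched as a plain function that separates clean values from error messages. This is a minimal illustration, not the project's actual code: the function name `validate_house_input` and the minimum values are assumptions, while the field names (`location`, `sqft`, `bhk`, `bath`) follow the input form used in this report.

```python
import math


def validate_house_input(data):
    """Return (cleaned, errors) for a dict of raw form strings."""
    cleaned, errors = {}, {}

    location = str(data.get("location", "")).strip()
    if location:
        cleaned["location"] = location
    else:
        errors["location"] = "Location is required."

    # (field name, converter, smallest accepted value) -- bounds are assumed
    numeric_fields = [("sqft", float, 1), ("bhk", int, 1), ("bath", int, 1)]
    for name, convert, minimum in numeric_fields:
        raw = data.get(name, "")
        try:
            value = convert(raw)
        except (TypeError, ValueError):
            errors[name] = f"{name} must be a number."
            continue
        if not math.isfinite(value) or value < minimum:
            errors[name] = f"{name} must be a finite number of at least {minimum}."
        else:
            cleaned[name] = value

    return cleaned, errors
```

If `errors` is non-empty, the view would re-render the form with the messages; otherwise the cleaned values can be passed on to the model.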
4.5 UI Design
[Screenshots of the input form and result page to be placed here, each with a brief
explanation.]
Chapter 5: Implementation
1. Programming Languages
Python
Used for data processing, model training (machine learning), and back-end
development (Django framework).
HTML/CSS
Used for designing the web front-end interface (forms, templates).
Joblib – For saving the trained model (as a .pkl file) for reuse in the web app.
3. Web Framework
Django (Python-based Web Framework)
o Django Admin panel is used for managing data models through a web
UI.
4. Database
SQLite
o Lightweight, serverless relational database used for storing form
submissions and managing admin data.
5. Front-End
Django Templates (HTML with template tags)
Used to render web forms, display results, and interact with users dynamically.
7. Development Tools
Jupyter Notebook / Python scripts (for model development)
3. User Interface
The view renders its response through a template
(render(request, "result.html", {...})), which indicates that:
o A frontend form (form.html) collects user inputs.
o A result page (result.html) displays the predicted price.
4. Dependencies
File: requirements.txt
Contains necessary packages:
django
pandas
numpy
scikit-learn
joblib
These are used for ML modeling, data handling, and serving the web interface.
5. Data and Storage
Model training uses house_dataset.csv.
Django uses SQLite (db.sqlite3) to manage any user-related data (e.g.,
history, form submissions).
Workflow Summary
1. Model Training:
o Run train_model.py → Generates house_model.pkl.
2. Web Application Flow:
o User opens the site.
o Enters house details (location, size, BHK, bathrooms).
o Submits the form.
o Model predicts the price.
o Result is displayed on the web page.
Admin Panel Working
The Django Admin Panel is a built-in web-based interface provided by Django that
lets administrators manage database records (add, update, delete, view) through a
user-friendly dashboard without writing raw SQL or backend code.
Admin Panel Code and Setup
To enable the admin panel for your project, Django needs:
1. A models.py file with registered database models.
2. An admin.py file to register those models with the admin interface.
3. Superuser credentials to access the panel.
from django.db import models


class HousePrediction(models.Model):
    """One stored prediction, as shown in the admin panel."""
    location = models.CharField(max_length=100)
    sqft = models.FloatField()
    bhk = models.IntegerField()
    bath = models.IntegerField()
    predicted_price = models.FloatField()

    def __str__(self):
        return f"{self.location} - ₹{self.predicted_price}"
myProject/
│
├── home/ ← Django app
│ ├── views.py ← View functions (main logic)
│ ├── urls.py ← URL routing
│ ├── models.py ← (Optional) Model for database storage
│ ├── templates/ ← HTML templates (form.html, result.html)
│ └── admin.py ← Register models for admin panel
│
├── train_model.py ← Trains and saves ML model (house_model.pkl)
├── house_dataset.csv ← Training dataset
├── manage.py ← Django command-line utility
└── db.sqlite3 ← SQLite database
<form method="post">
{% csrf_token %}
Location: <input type="text" name="location"><br>
Sqft: <input type="number" name="sqft"><br>
BHK: <input type="number" name="bhk"><br>
Bath: <input type="number" name="bath"><br>
<input type="submit" value="Predict Price">
</form>
def predict(request):
    if request.method == 'POST':
        # Read the submitted form fields; numeric fields are converted from strings.
        location = request.POST['location']
        sqft = float(request.POST['sqft'])
        bhk = int(request.POST['bhk'])
        bath = int(request.POST['bath'])
Location: Mumbai
Sqft: 1500
BHK: 3
Bath: 2
[Predict Price]
Result Page:
Estimated House Price: ₹82.6 Lakhs
ML Integration
1. ML Model Training
1. Data Loading & Preparation
The CSV dataset (house_dataset.csv) contains these columns:
o size (square feet)
o bedrooms
o bathrooms
o age (of the house)
o price (target)
2. Feature/Target Split
o Features (X): size, bedrooms, bathrooms, age
o Target (y): price
3. Train/Test Split
o 80% training, 20% testing
o Ensures we can evaluate generalization on unseen data.
4. Model Selection & Training
o A Linear Regression model was chosen for its
interpretability.
o The model learns coefficients for each feature to best predict
price.
5. Evaluation Metrics
o Mean Absolute Error (MAE) measures average prediction
error in the same units as price.
o R² Score indicates proportion of variance explained (closer
to 1 is better).
Results on Test Set:
Mean Absolute Error: 23,257.70
R² Score: 0.7667
o On average, predictions are off by about ₹23K.
o The model explains about 77% of the variance in house prices, which is a
solid baseline with room for improvement.
6. Saving the Model
o The trained model is serialized to house_model.pkl via
joblib.dump().
o This file is loaded later by the Django app for real-time
predictions.
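The 80/20 split in step 3 can be sketched with the standard library alone. In a scikit-learn pipeline this is typically done with `train_test_split`; the hand-written version below only shows the idea. The function name `train_test_split_rows`, the seed, and the sample rows are assumptions made for illustration and are not taken from house_dataset.csv.

```python
import random


def train_test_split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and split them into (train, test)."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical rows: (size, bedrooms, bathrooms, age, price)
rows = [
    (1000 + 100 * i, 2 + i % 3, 1 + i % 2, i, 200_000 + 20_000 * i)
    for i in range(10)
]
train, test = train_test_split_rows(rows)

# Feature/target split from step 2, applied to the training rows.
X_train = [r[:4] for r in train]
y_train = [r[4] for r in train]
```

Fixing the seed keeps the split reproducible, so evaluation results can be compared between training runs.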
2. Sample Predictions
Input Features                           Predicted Price   Actual Price
size=1900, 4 bath, 2 bedrooms, age=18    ₹366,760.06       ₹350,000
size=2000, 4 bath, 3 bedrooms, age=5     ₹420,244.67       ₹450,000
4. End-to-End Flow
1. User enters house details on a web form.
2. Django View receives data, loads ML model, and calls predict().
3. ML Model processes numeric inputs, returns a price estimate.
4. View renders a result page showing the predicted price.
This seamless connection lets non-technical users obtain data-driven
price estimates instantly via their browser.
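For the Linear Regression model described in the training section, the call to predict() in step 2 amounts to evaluating a weighted sum of the four input features. The β symbols below are placeholder names for the learned intercept and coefficients; their values are not given in this report.

```latex
\hat{y} = \beta_0
        + \beta_1 \,\text{size}
        + \beta_2 \,\text{bedrooms}
        + \beta_3 \,\text{bathrooms}
        + \beta_4 \,\text{age}
```

Here \hat{y} is the predicted price. Each coefficient is the change in predicted price for a one-unit change in its feature, with the other features held fixed.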
Model Accuracy
Model Evaluation Metrics (Linear Regression)
Based on the testing conducted on the dataset (house_dataset.csv), the model was
trained using the following features:
size (area of the house)
bedrooms
bathrooms
age (of the house)
The target variable is:
price
Performance Results
Metric                 Value
Mean Absolute Error    ₹23,257.70
R² Score               0.7667
Explanation of Metrics
Mean Absolute Error (MAE):
The average difference between the actual house prices and predicted values
is about ₹23,257. A lower MAE indicates more accurate predictions.
R² Score (Coefficient of Determination):
An R² value of 0.7667 means that the model explains approximately 76.67%
of the variability in house prices based on the input features.
o Closer to 1 = better model
o A value of about 0.77 is a reasonable result for a simple baseline model on
real-estate data.
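Both metrics are simple to compute by hand, which helps when checking reported figures. The sketch below implements them in plain Python and applies them to the two rows of the Sample Predictions table. The function names mirror scikit-learn's `mean_absolute_error` and `r2_score`, but these are hand-written stand-ins, not the library functions.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all samples."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r2_score(actual, predicted):
    """1 - SS_res / SS_tot, the share of variance explained."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# The two rows from the Sample Predictions table.
actual = [350_000.00, 450_000.00]
predicted = [366_760.06, 420_244.67]

print(f"MAE: {mean_absolute_error(actual, predicted):,.2f}")
print(f"R^2: {r2_score(actual, predicted):.4f}")
```

On these two rows the result is an MAE of roughly 23,258 and an R² of roughly 0.767, in line with the figures reported above.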
Interpretation
The model performs well for a simple linear regression using only four basic
features.
It's reliable enough to give realistic price estimates for end users.
However, it can still be improved by:
o Adding more features (e.g., location, amenities)
o Using non-linear models (e.g., Random Forest, XGBoost)
o Cleaning and scaling the dataset
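As an illustration of the scaling step mentioned in the last item, the sketch below standardizes one feature column to zero mean and unit standard deviation. The function name `standardize` and the sample values are assumptions. In a scikit-learn pipeline a ready-made scaler would normally be used instead.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


sizes = [1000, 1500, 2000, 2500]  # hypothetical square-foot values
scaled_sizes = standardize(sizes)
```

Scaling does not change the predictions of an ordinary least-squares fit, but it matters for regularized or gradient-based models and makes coefficients easier to compare across features.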
Advantages
1. User-Friendly Web Interface
Users can easily input house details through a simple form.
No need for technical expertise to get a price estimate.
2. Quick Predictions
The pre-trained ML model provides near-instant price predictions upon form
submission.
3. Real-Time Processing
Integrated with Django, the model runs live predictions and displays results
immediately.
4. Cost-Efficient
Helps users estimate property prices without needing to hire an appraiser or
real estate agent.
5. Scalable
The system can be extended to include new features, models, or services.
6. Educational Value
Great learning tool for understanding how web development and machine
learning can work together.
Limitations
1. Limited Features
The model only uses a few features (size, bedrooms, bathrooms, age),
missing out on key factors like:
o Location
o Market trends
o Nearby amenities
2. Basic Model
Uses Linear Regression, which assumes a linear relationship. This might
oversimplify real-world pricing, which can be non-linear.
3. Dataset Dependency
Accuracy heavily depends on the quality and size of the dataset.
Outdated or biased data can lead to poor predictions.
4. No Error Handling for Edge Cases
The system may not handle unexpected or extreme inputs gracefully.
5. Lack of Authentication
Anyone can access and use the application unless login or security features
are added.
6. No Real-Time Market Integration
It doesn't fetch current real estate prices from live APIs or platforms.
Conclusion
The House Price Prediction System is a practical and efficient application that leverages
the power of Machine Learning and Django web development to deliver real-time, user-
friendly, and accurate property price estimations. By taking basic input features such as
house size, number of bedrooms and bathrooms, and property age, the system predicts
house prices using a trained Linear Regression model. This model was developed using
historical data and evaluated with metrics such as Mean Absolute Error (₹23,257) and R²
Score (0.7667), indicating that the system is reasonably accurate and reliable for most use
cases.
The use of Django provides a robust and scalable backend to the system, allowing
seamless communication between user input, model prediction, and result display. This
integration ensures that even non-technical users can easily access the prediction tool
through a simple web form. The project also demonstrates a clean separation of concerns
between the web interface, backend logic, and machine learning components.
In addition to functioning as a prediction tool, the project has significant educational value,
serving as a foundational example of how data science models can be effectively deployed
in real-world applications. It showcases the entire workflow—from data collection and
preprocessing, through model training and evaluation, to deployment via a web application.
However, like any model-driven system, this project also has limitations. The predictions are
constrained by the simplicity of the model and the limited set of input features. Real-world
house pricing is influenced by a vast array of factors including location, economic trends,
neighborhood quality, proximity to amenities, and more—many of which are not yet
considered in this system.
Despite these constraints, the project successfully illustrates the power of
combining AI with modern web frameworks. It delivers a working solution that could be
further enhanced and scaled up for commercial or enterprise use. Possible future
improvements include integrating more complex models like Random Forest or XGBoost,
expanding feature inputs (e.g., geographic data), connecting to real estate APIs for dynamic
updates, and adding user authentication and dashboards for a more personalized
experience.
In conclusion, this project not only achieves its objective of estimating house prices
based on user input but also serves as a strong foundation for future work in AI-powered
real estate tools, making it a valuable addition to both academic and practical domains.
Future Enhancements
2. Advanced ML Models
Enhancement: Replace or supplement Linear Regression with more powerful
models like:
o Random Forest
o XGBoost
o Gradient Boosting
Benefit: Handles non-linear relationships and improves prediction accuracy.
5. Interactive Visualizations
Enhancement: Add charts to show predicted price trends, comparisons, and
feature impact.
Benefit: Improves user understanding and engagement with visual data.
6. Mobile-Friendly Interface
Enhancement: Make the Django frontend responsive or build a separate mobile
app.
Benefit: Expands accessibility for mobile users and realtors on the go.