Ids Unit-5

This document provides an overview of data visualization and prototype application development in data science, focusing on tools and techniques for creating interactive dashboards. It discusses various data visualization options, the role of data scientists, and the use of libraries like dc.js, Crossfilter, and d3.js for building dashboards. A case study example is included, demonstrating the process of creating a dashboard for a hospital pharmacy to monitor light-sensitive medicines.


INTRODUCTION TO DATA SCIENCE

UNIT-5
Data Visualization and Prototype Application Development: Data Visualization
options, Crossfilter, the JavaScript MapReduce library, Creating an interactive
dashboard with dc.js, Dashboard development tools.
Applying the Data Science process for real world problem solving scenarios as a
detailed case study.

INTRODUCTION

Data visualization to the end user

 Data visualization is the process of presenting data in a visual format,
such as charts, graphs, or maps, to make it easier for end users to
understand and interpret.
 It helps users quickly identify patterns, trends, and insights from the
data, making complex information more accessible and actionable.
 Common tools for data visualization include bar charts, line graphs, pie
charts, heatmaps, and dashboards.
 Often, data scientists must deliver their new insights to the end user.

The results can be communicated in several ways:

 A one-time presentation
 A new viewport on your data
 A real-time dashboard

Prepared by Mr.K Srikanth, Asst. Professor, IT, VNITSW, Guntur Page 1

1. A one-time presentation
 This is typically used for presenting findings from a specific analysis or
project.
 It involves creating a visual and verbal presentation, often using slides,
to explain the insights and recommendations clearly.
 Charts, graphs, and infographics are used to make the information
more understandable and impactful.

2. New Viewport on Your Data

 This involves creating a new way of viewing or interacting with existing
data.
 It might be a new report, chart, or interactive visualization that provides
fresh insights or highlights a specific aspect of the data relevant to the
end user.
 This approach allows users to explore data from different perspectives
and gain new understanding.

3. Real-time Dashboard
 Dashboards provide dynamic, real-time visualizations of data, allowing
end users to monitor key metrics and performance indicators as they
happen. Dashboards are often used for continuous monitoring, offering
an up-to-date view of trends, progress, or any issues that may need
immediate action.
 They are highly interactive and customizable, designed to meet the
specific needs of the user.


When working on data projects, you need to think about a few key factors:

1. Type of Decision:

Are you supporting a strategic decision or an operational one?

 Strategic decisions are usually made once and may not need frequent
updates.
 Operational decisions require reports that are updated regularly.

2. Size of the Organization:

Who builds the final report depends on how large the organization is.

 In small organizations, you are responsible for everything, from
collecting data to making reports.
 In larger organizations, there may be a team that creates dashboards
for you.
 Even then, delivering a prototype dashboard yourself can be beneficial
because it presents an example and often shortens delivery time.

Data visualization options

Here are some common data visualization options:

1. Charts and Graphs: Examples include bar charts, line graphs, pie charts,
and scatter plots. They help show trends, comparisons, and distributions.
2. Maps: Geographic maps display data related to locations, such as sales
per region or population density.

3. Dashboards: Dashboards combine multiple charts and graphs in one view.
They give a quick overview of key information and allow for real-time
monitoring.
4. Infographics: Visual presentations that combine text, images, and data to
tell a story or explain complex information.
5. Heatmaps: Show patterns or intensity of data values using color
gradients. They are useful for highlighting high and low points in large
datasets.
6. Tables: Display raw data in a structured format, making it easy to view
details and compare values directly.

Example

Creating a Dashboard for a Hospital Pharmacy

1. Overview
 A new government rule requires pharmacies to check if their medicines
are sensitive to light and store them in special containers.
 However, the government hasn’t provided a list of which medicines are
light-sensitive.

2. Data Scientist's Role

 As a data scientist, you can find out which medicines are
light-sensitive by examining their patient information leaflets (the
small printed sheets or booklets supplied with each medicine).
 You can use text mining to categorize each medicine as "light
sensitive" or "not light sensitive."
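
The tagging step described above can be sketched with a simple keyword search. This is only a minimal illustration, assuming the leaflet text is already available as plain strings; the patterns, the function names, and the two sample leaflets are hypothetical and do not come from the case study.

```python
import re

# Hypothetical phrases that suggest light sensitivity; a real project would
# build this list by reading a sample of actual leaflets.
LIGHT_PATTERNS = [
    r"protect(ed)?\s+from\s+(sun)?light",
    r"light[- ]sensitive",
]


def is_light_sensitive(leaflet_text):
    """Return True if any pattern occurs in the leaflet text (case-insensitive)."""
    text = leaflet_text.lower()
    return any(re.search(pattern, text) for pattern in LIGHT_PATTERNS)


def tag_medicines(leaflets):
    """Map each medicine name to 'light sensitive' or 'not light sensitive'."""
    return {
        name: "light sensitive" if is_light_sensitive(text) else "not light sensitive"
        for name, text in leaflets.items()
    }


# Made-up leaflet snippets, for illustration only.
sample_leaflets = {
    "Medicine A": "Store in the original package in order to protect from light.",
    "Medicine B": "Store below 25 C. Keep out of the reach of children.",
}
print(tag_medicines(sample_leaflets))
```

A keyword match like this cannot handle negation (for example, a leaflet stating that no protection from light is needed), so the resulting tags would still need a manual check.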

3. Database Update
 After tagging the medicines, you upload this information to a central
database.


4. Stock Analysis
 The pharmacy provides you with access to its stock data to determine how
many special containers are needed for light-sensitive medicines.

5. Data Format
 The data includes time-series information for one year, with around 10,000
entries for 29 different medicines.

6. Dashboard Options
 There are many options for creating dashboards, but this chapter will focus
on using dc.js, a JavaScript library that combines data handling
(Crossfilter) and data visualization (d3.js).

7. Why dc.js?

 User-Friendly: dc.js is easy to set up and allows you to create interactive
dashboards where clicking on one graph filters the data shown in other
graphs.
 Time-Saving: It helps you focus on your analysis instead of spending too
much time on dashboard creation.

8. Prerequisites:

 You need to use d3.js and crossfilter.js for dc.js to work.

 Although d3.js can be complex, you don’t need to be an expert in it to
use dc.js.

9. Example Dashboard: You can explore a sample dashboard on the dc.js
website to see how it works and interact with the graphs.

10. Next Steps: By the end of this chapter, you will be able to create a
dashboard yourself using the information and tools provided.


Crossfilter, the JavaScript MapReduce library

 Crossfilter is a powerful JavaScript library designed to handle large
datasets efficiently in web browsers.
 It allows users to filter and aggregate data across multiple dimensions
simultaneously, making it ideal for interactive data analysis and
visualization.
 The main purpose of Crossfilter is to enable fast, interactive exploration
and analysis of large datasets in web applications.


 JavaScript is a high-level, dynamic, interpreted programming language
primarily used for adding interactivity and functionality to web pages.
 It is one of the core technologies of the World Wide Web, alongside HTML
(HyperText Markup Language) and CSS (Cascading Style Sheets).
 JavaScript isn’t the best language for heavy data processing. However,
companies like Square created MapReduce libraries for it, and Crossfilter
is one of them.
 MapReduce is a programming model, with associated implementations, for
processing and generating large datasets in a way that can be
parallelized across a distributed cluster of computers.
 The MapReduce model processes big data efficiently by breaking a task
into smaller pieces: a map step that emits key-value pairs and a reduce
step that combines the values for each key.
 Crossfilter applies this map-and-reduce style of grouping inside the
browser, so the data does not have to go back to a server for every
filter.
 When working with data, every speed improvement is helpful.
 You want to avoid sending large amounts of data over the internet or even
your internal network for a few reasons:

1. Slow Performance: Sending big data files can slow down your
application.
2. Network Congestion: Large data transfers can clog your network,
making it less efficient.
3. Increased Costs: Transferring large volumes of data can lead to higher
data usage costs.
4. Longer Load Times: Users may have to wait longer for data to load,
which can be frustrating.
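
To make the map and reduce steps mentioned above concrete, here is a minimal single-machine sketch. It is not Crossfilter's implementation; the record fields `category` and `value` and the three function names are assumptions chosen for illustration.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit one (key, value) pair per record."""
    for record in records:
        yield record["category"], record["value"]


def shuffle_phase(pairs):
    """Shuffle: collect all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse the values of each key into one number (here, a sum)."""
    return {key: sum(values) for key, values in grouped.items()}


# Illustrative records only.
records = [
    {"category": "Antibiotics", "value": 20},
    {"category": "Vitamins", "value": 25},
    {"category": "Antibiotics", "value": 5},
]
totals = reduce_phase(shuffle_phase(map_phase(records)))
print(totals)  # {'Antibiotics': 25, 'Vitamins': 25}
```

In a distributed setting the map and reduce steps run on many machines at once; in the browser the same idea is used on one machine to keep grouped totals up to date as filters change.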

Setting up everything

It’s time to build the actual application, and the ingredients of our small
dc.js application are as follows:

 jQuery: To handle the interactivity
 Crossfilter.js: A MapReduce library and prerequisite to dc.js
 d3.js: A popular data visualization library and prerequisite to dc.js
 dc.js: The visualization library you will use to create your interactive
dashboard
 Bootstrap: A widely used layout library you’ll use to make it all look
better

You’ll write only three files:

 index.html: The HTML page that contains your application
 application.js: To hold all the JavaScript code you’ll write
 application.css: For your own CSS

In addition, you’ll need to run your code on an HTTP server. You could go
through the effort of setting up a LAMP (Linux, Apache, MySQL, PHP),
WAMP (Windows, Apache, MySQL, PHP), or XAMPP (Cross Environment,
Apache, MySQL, PHP, Perl) server. But for the sake of simplicity we won’t
set up any of those servers here. Instead you can do it with a single
Python command.

Steps to Launch a Python HTTP Server in Python 3

1. Create a folder named dashboard

Create the necessary files (index.html, application.js, application.css,
and data.json) in that folder.

2. Open Command-Line Tool:

 For Windows: Open Command Prompt (CMD) by searching for "cmd" in
the Start menu.
 For Linux or macOS: Open the Terminal application.

3. Navigate to the Folder

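 On Windows, open the folder (for example
C:\Users\SRIKANTH\Desktop\dashboard) in File Explorer, click the address
bar, type cmd, and press Enter. A Command Prompt opens in that folder.
 In a terminal that is already open, use cd followed by the folder path
instead.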

4. Check Python Installation:

python --version

5. Start the HTTP Server:

 Run the following command

python -m http.server

6. Access the Server

 Open a web browser and go to http://localhost:8000.
 If the folder contains index.html, the server returns that page directly;
otherwise you see a listing of the files in the folder.
 You can also request the page explicitly:

http://localhost:8000/index.html
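
The same server can also be started from a Python script, which is convenient when the dashboard folder is not the current directory. The sketch below uses only the standard library (Python 3.7 or later); the function name `start_static_server` is an assumption, and unlike the command above it listens on 127.0.0.1 only.

```python
import functools
import http.server
import threading


def start_static_server(directory, port=8000):
    """Serve the files in `directory` over HTTP from a background thread.

    Returns the server object; call its shutdown() method to stop it.
    Passing port=0 lets the operating system pick a free port.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Example use (left as comments so that loading this file starts nothing):
# server = start_static_server("dashboard", port=8000)
# input("Serving on http://localhost:8000 - press Enter to stop")
# server.shutdown()
```

The port actually in use is available as `server.server_address[1]`, which matters when port 0 was requested.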


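index.html

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Interactive Dashboard</title>
<link rel="stylesheet"
href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css">
<link rel="stylesheet"
href="https://cdnjs.cloudflare.com/ajax/libs/dc/4.2.7/dc.min.css">
<link rel="stylesheet" href="application.css">
</head>
<body>
<div class="container">
<h1>Medicine Dashboard</h1>
<!-- Assumed completion: ids taken from application.js and application.css -->
<div id="chart">
<div id="bar-chart"></div>
<div id="pie-chart"></div>
</div>
</div>
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/6.6.2/d3.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/crossfilter/1.3.12/crossfilter.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/dc/4.2.7/dc.min.js"></script>
<script src="application.js"></script>
</body>
</html>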


application.js

// Load the data; d3.json returns a Promise in d3 v5 and later
d3.json("data.json").then(function(data) {
  // Create a Crossfilter instance
  var ndx = crossfilter(data);

  // Define one dimension per chart. A filter set on a dimension is ignored
  // by groups built from that same dimension, so two charts sharing one
  // dimension would not filter each other.
  var barDim = ndx.dimension(function(d) { return d.category; });
  var pieDim = ndx.dimension(function(d) { return d.category; });

  // Group data: sum the "value" field per category instead of counting rows
  var barGroup = barDim.group().reduceSum(function(d) { return d.value; });
  var pieGroup = pieDim.group().reduceSum(function(d) { return d.value; });

  // Create a bar chart
  var barChart = dc.barChart("#bar-chart");
  barChart
    .width(400)
    .height(200)
    .dimension(barDim)
    .group(barGroup)
    .x(d3.scaleBand())
    .xUnits(dc.units.ordinal)
    .renderHorizontalGridLines(true)
    .renderVerticalGridLines(true)
    .elasticY(true);

  // Create a pie chart
  var pieChart = dc.pieChart("#pie-chart");
  pieChart
    .width(200)
    .height(200)
    .dimension(pieDim)
    .group(pieGroup);

  // Render the charts
  dc.renderAll();
});


application.css

body {
background-color: #f8f9fa;
font-family: Arial, sans-serif;
}

h1 {
text-align: center;
margin-top: 20px;
}

#chart {
display: flex;
justify-content: space-around;
margin-top: 30px;
}

#bar-chart, #pie-chart {
border: 1px solid #ccc;
border-radius: 5px;
background-color: #fff;
padding: 10px;
}

data.json

[
{"category": "Pain Relief", "value": 10},
{"category": "Antibiotics", "value": 20},
{"category": "Antidepressants", "value": 15},
{"category": "Antihistamines", "value": 5},
{"category": "Vitamins", "value": 25},
{"category": "Others", "value": 30}
]


Creating an interactive dashboard with dc.js

 Creating an interactive dashboard with dc.js involves combining several
libraries: dc.js itself (for charts), crossfilter.js (for handling the dataset),
and d3.js (for rendering and manipulating SVG elements).

Prerequisites

 dc.js
 crossfilter.js
 d3.js
 HTML/CSS/JavaScript skills

Components in the Dashboard

From the image, we can see the following charts and components:

Line Chart – for stock tracking over time (likely for a single medicine).

Bar Chart – displaying various medicines and their quantities.

Pie Chart – showing a categorical division (e.g., availability "Yes" or "No").

Reset Filters Button – to reset all applied filters.

Steps to Build a Dashboard like this Using dc.js

1. Setting up the HTML structure

index.html

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Medicine Stock Dashboard</title>

<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/6.6.2/d3.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/crossfilter/1.3.12/crossfilter.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/dc/4.2.7/dc.min.js"></script>
<link rel="stylesheet"
href="https://cdnjs.cloudflare.com/ajax/libs/dc/4.2.7/dc.min.css">
<style>

.dashboard-container {
display: flex;
flex-wrap: wrap;
justify-content: space-between;
}

.chart {
margin: 20px;
padding: 20px;
background-color: #ffffff;
border: 1px solid #ddd;
border-radius: 10px;
}

.chart h3 {
text-align: center;
margin-bottom: 15px;
}

#reset-filters {

margin: 20px;
padding: 10px 20px;
background-color: #28a745;
color: white;
border: none;
border-radius: 5px;
cursor: pointer;
}

#reset-filters:hover {
background-color: #218838;
}
</style>
</head>
<body>

<h1>Medicine Stock Dashboard</h1>

<div class="dashboard-container">

<div id="line-chart" class="chart" style="width: 45%;">
<h3>Stock Over Time</h3>
</div>

<div id="bar-chart" class="chart" style="width: 45%;">
<h3>Medicine Stock Levels</h3>
</div>

<div id="pie-chart" class="chart" style="width: 30%;">
<h3>Medicine Availability</h3>
</div>
</div>

<button id="reset-filters">Reset Filters</button>

<script src="script.js"></script>
</body>
</html>

2. Create Multiple Charts with dc.js in script.js

This file will hold the logic for the charts. Below is the JavaScript to create
different types of charts (line, bar, pie, etc.), based on the components in the
image.

script.js

// Load the CSV data (replace with your actual dataset)
d3.csv('data.csv').then(function(data) {
  // Parse the data: d3.csv reads every field as a string
  var parseDate = d3.timeParse('%Y-%m-%d'); // dates are expected as YYYY-MM-DD
  data.forEach(function(d) {
    d.date = parseDate(d.date);
    d.stock = +d.stock;
    // d.medicine and d.available ("Yes" or "No") stay as strings
  });

  // Initialize crossfilter
  var ndx = crossfilter(data);

  // Define dimensions
  var dateDimension = ndx.dimension(function(d) { return d.date; });
  var medicineDimension = ndx.dimension(function(d) { return d.medicine; });
  var availabilityDimension = ndx.dimension(function(d) { return d.available; });

  // Define groups
  var stockByDate = dateDimension.group().reduceSum(function(d) { return d.stock; });
  var stockByMedicine = medicineDimension.group().reduceSum(function(d) { return d.stock; });
  var availabilityGroup = availabilityDimension.group();

  // Line Chart (Stock Over Time)
  var lineChart = dc.lineChart('#line-chart');
  lineChart
    .width(450)
    .height(300)
    .dimension(dateDimension)
    .group(stockByDate)
    .x(d3.scaleTime().domain(d3.extent(data, function(d) { return d.date; })))
    .xAxisLabel("Date")
    .yAxisLabel("Stock Level");

  // Bar Chart (Stock Levels by Medicine)
  var barChart = dc.barChart('#bar-chart');
  barChart
    .width(450)
    .height(300)
    .dimension(medicineDimension)
    .group(stockByMedicine)
    .x(d3.scaleBand())
    .xUnits(dc.units.ordinal)
    .xAxisLabel("Medicine")
    .yAxisLabel("Stock Level")
    .barPadding(0.1)
    .outerPadding(0.05);

  // Pie Chart (Availability Yes/No)
  var pieChart = dc.pieChart('#pie-chart');
  pieChart
    .width(300)
    .height(300)
    .radius(150)
    .dimension(availabilityDimension)
    .group(availabilityGroup);

  // Reset Filters Button
  d3.select('#reset-filters').on('click', function() {
    dc.filterAll(); // Clear the filters on every chart
    dc.redrawAll(); // Redraw the charts with the unfiltered data
  });

  // Render all charts once, after every chart has been configured
  dc.renderAll();
});

3. Data (data.csv)

To simulate the dashboard, you can create a sample dataset with fields for
date, medicine, stock, and availability.

data.csv

date,medicine,stock,available
2023-01-01,Paracetamol,200,Yes
2023-01-02,Ibuprofen,150,No
2023-01-03,Amoxicillin,180,Yes
2023-01-04,Aspirin,100,Yes
2023-01-05,Cetirizine,90,No
2023-01-06,Metformin,250,Yes
2023-01-07,Atorvastatin,300,Yes
2023-01-08,Lisinopril,120,No
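
To check what the charts should display for these sample rows, the grouping and filtering can be reproduced outside the browser. The sketch below is a simplified imitation of the dimension, group, and filter idea, not Crossfilter itself; `load_rows` and `group_sum` are hypothetical helper names.

```python
import csv
import io
from collections import defaultdict

# The sample rows from data.csv.
CSV_TEXT = """date,medicine,stock,available
2023-01-01,Paracetamol,200,Yes
2023-01-02,Ibuprofen,150,No
2023-01-03,Amoxicillin,180,Yes
2023-01-04,Aspirin,100,Yes
2023-01-05,Cetirizine,90,No
2023-01-06,Metformin,250,Yes
2023-01-07,Atorvastatin,300,Yes
2023-01-08,Lisinopril,120,No
"""


def load_rows(text):
    """Parse CSV text into dictionaries and convert stock to an integer."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["stock"] = int(row["stock"])
    return rows


def group_sum(rows, key, value, filters=None):
    """Sum `value` per `key` over the rows that pass every filter.

    `filters` maps a column name to the set of accepted values, which
    imitates clicking a pie slice or a bar in another chart.
    """
    filters = filters or {}
    totals = defaultdict(int)
    for row in rows:
        if all(row[column] in accepted for column, accepted in filters.items()):
            totals[row[key]] += row[value]
    return dict(totals)


rows = load_rows(CSV_TEXT)
print(group_sum(rows, "medicine", "stock"))                              # bar chart, no filter
print(group_sum(rows, "medicine", "stock", {"available": {"Yes"}}))      # after clicking "Yes"
```

One difference from the real library: Crossfilter keeps filtered-out keys in a group with a value of 0, whereas this sketch simply omits them.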


Dashboard development tools
 When it comes to dashboard development tools, there are many options
available depending on your budget, technical expertise, and specific needs.

Here’s an overview of some popular tools for developing interactive
dashboards:

1. Paid Dashboard Tools

These tools are known for their powerful features, ease of use, and support.
They are often used by businesses, but usually require purchasing a license.

 Tableau: Known for its ease of use, powerful visualizations, and
interactivity. It’s widely used for business intelligence and data analysis.
 Microsoft Power BI: Microsoft’s tool for creating interactive dashboards,
with seamless integration into the Microsoft ecosystem (Excel, Azure,
etc.).
 QlikView / Qlik Sense: These are strong tools for data visualization and
business intelligence, offering drag-and-drop simplicity.
 SAP Analytics Cloud: Provides cloud-based analytics and data
visualization solutions, tightly integrated with SAP’s enterprise systems.
 MicroStrategy: Known for its powerful data analytics capabilities and
enterprise-level scalability.
 IBM Cognos Analytics: Another enterprise-level tool for building
interactive dashboards, reports, and analytics.
 TIBCO Spotfire: Offers a user-friendly interface with powerful analytics,
and supports a wide range of data sources.
 SAS Visual Analytics: Provides visual data exploration tools, advanced
analytics, and forecasting.


2. Free and Open-Source Tools

These are excellent options for developers who prefer customizable and cost-
effective solutions.

 Grafana: A popular open-source platform for monitoring and
observability, allowing integration with various data sources like time-
series databases.
 Google Data Studio (renamed Looker Studio in 2022): A free, though not
open-source, tool from Google that integrates well with Google’s
ecosystem (Google Analytics, Sheets, etc.) for building dashboards.
 Metabase: A free, open-source platform designed for easy access to
business data, great for teams to ask questions and create visual
dashboards.
 Redash: Open-source tool to create visualizations and dashboards
directly from SQL databases.
 Kibana: Part of the Elastic Stack, Kibana allows visualization of data
stored in Elasticsearch, commonly used for logs and analytics.
 Dash by Plotly: An open-source framework for building interactive web-
based dashboards using Python, R, or Julia.
 Superset (Apache): An open-source data exploration and visualization
platform originally developed by Airbnb. It allows creating charts, maps,
and dashboards.

3. JavaScript Libraries for Custom Dashboards

If you prefer to build dashboards from scratch, these JavaScript libraries are
excellent for creating custom, highly interactive dashboards.

 D3.js: A powerful library for creating complex data visualizations in the
browser. It provides full control over how data is visualized.
 dc.js: Built on top of D3.js, dc.js is designed for building fast, interactive
dashboards with crossfiltering capabilities.
 Chart.js: A simple, flexible library for charting. Good for basic charts and
lightweight dashboard solutions.
 Highcharts: A JavaScript charting library used to create interactive
charts. It’s free for personal use, but requires a license for commercial
use.
 Google Charts: Free and easy to use, Google Charts provides a variety of
chart types and works well for basic data visualizations.

4. Embedded Analytics Platforms

For developers who want to embed dashboards within applications:

 Looker: A modern data platform for embedded analytics. Now part of
Google Cloud, it provides tools to build powerful visualizations and
insights.
 Sisense: Known for embedding analytics into applications, it provides
powerful visualization and dashboard creation capabilities.
 Embedded Power BI: Microsoft also offers Power BI for embedding
dashboards into web applications or portals.

5. Cloud-Based Tools

These platforms allow you to build, share, and access dashboards entirely
online:

 Zoho Analytics: A cloud-based business intelligence tool that allows
easy drag-and-drop dashboard creation.
 ClicData: A cloud-based platform for building and sharing data
visualizations, with a focus on easy integration with other services.

 Mode Analytics: A cloud-based platform focused on SQL-based queries
with built-in visualization tools for dashboards.

Applying the Data Science process for real world problem
solving scenarios as a detailed case study.
Case Study: Predicting Housing Prices in a City

Objective

Overview:

 The housing market in urban areas is often influenced by various factors,
such as location, size, number of bedrooms, and local amenities.
 Accurately predicting housing prices can assist buyers, sellers, and real
estate agents in making informed decisions.

Goal:

 The aim of this case study is to develop a predictive model that estimates
housing prices based on several features of the properties.

1. Problem Definition

Key Questions:

 What factors most significantly influence housing prices?

 Can we accurately predict housing prices using historical data?
 How can this model help stakeholders in the real estate market?

Scope:

 The scope includes residential properties within a specific urban area over
the past five years, focusing on factors such as square footage, location (zip
code), number of bedrooms, number of bathrooms, and additional features
like a garage or garden.

2. Data Collection

Data Sources:

 Real Estate Websites: APIs from platforms like Zillow, Redfin, or web
scraping to collect historical housing data.
 Government Databases: Local government databases for demographic
information, neighborhood crime rates, school district ratings, etc.
 Survey Data: Collect data from surveys to gather information on buyer
preferences and market sentiment.

Dataset Fields:

 Price: The sale price of the house (target variable).

 Square Footage: Total area of the house.
 Bedrooms/Bathrooms: Number of bedrooms and bathrooms.
 Location: Zip code or neighborhood classification.
 Year Built: The year the house was constructed, from which its age can
be derived.
 Additional Features: Presence of a garden, garage, pool, etc.

3. Data Cleaning and Preprocessing

Data cleaning and preprocessing are crucial for ensuring the quality of the
dataset before analysis. This step includes:

 Handling Missing Values: Assess and impute or remove records with
missing values.
 Outlier Detection: Identify and manage outliers that could skew results.
 Data Transformation: Convert categorical variables into numerical
format (e.g., using one-hot encoding for neighborhoods).

 Normalization: Scale numerical features to ensure uniformity, especially
for models sensitive to feature scales.

Example Code:

import pandas as pd
from sklearn.preprocessing import StandardScaler

# Load data
data = pd.read_csv('housing_data.csv')

# Handle missing values: fill numeric columns with their column mean
numeric_means = data.mean(numeric_only=True)
data = data.fillna(numeric_means)

# Outlier handling: drop the most expensive 5% of houses
data = data[data['price'] < data['price'].quantile(0.95)]

# One-hot encoding for categorical features
data = pd.get_dummies(data, columns=['neighborhood'], drop_first=True)

# Normalize numerical features
# (in a real project, fit the scaler on the training split only to avoid leakage)
numeric_cols = ['square_footage', 'bedrooms', 'bathrooms']
scaler = StandardScaler()
data[numeric_cols] = scaler.fit_transform(data[numeric_cols])
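
To show what the scaling step computes, the following sketch standardizes one column by hand using z = (x - mean) / standard deviation. It uses the population standard deviation, which is also what StandardScaler uses; the function name `standardize` and the sample square-footage values are assumptions for illustration.

```python
import statistics


def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column has no spread, so every z-score is 0.
        return [0.0 for _ in values]
    return [(x - mean) / std for x in values]


square_footage = [800, 1200, 1600, 2000]  # hypothetical values
z_scores = standardize(square_footage)
print([round(z, 3) for z in z_scores])
```

After standardization the column has mean 0 and standard deviation 1, so features measured in very different units contribute on a comparable scale.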

4. Exploratory Data Analysis (EDA)

Exploratory Data Analysis helps uncover patterns and relationships within the
data. This includes:

 Visualizations: Use scatter plots to visualize the relationship between
square footage and price, or box plots to understand price distributions
across different neighborhoods.
 Statistical Analysis: Conduct correlation analysis to identify significant
relationships between features and the target variable.


Example Code for Visualization:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Plot from the raw file: the preprocessing step replaces the 'neighborhood'
# column with one-hot columns and rescales the numeric features
raw = pd.read_csv('housing_data.csv')

# Scatter plot for square footage vs price
sns.scatterplot(data=raw, x='square_footage', y='price')
plt.title('Square Footage vs Price')
plt.xlabel('Square Footage')
plt.ylabel('Price')
plt.show()

# Box plot for price distribution by neighborhood
plt.figure(figsize=(12, 6))
sns.boxplot(data=raw, x='neighborhood', y='price')
plt.xticks(rotation=90)
plt.title('Price Distribution by Neighborhood')
plt.show()
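
The correlation analysis mentioned under Statistical Analysis can be sketched without any library. The function below computes the Pearson correlation coefficient between one feature and the target; the name `pearson_r` and the sample numbers are assumptions, not values from a real housing dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical values for illustration only.
square_footage = [800, 1200, 1600, 2000, 2400]
price = [150000, 210000, 260000, 330000, 380000]
print(round(pearson_r(square_footage, price), 3))
```

A value close to +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship; it says nothing about non-linear patterns.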

5. Model Selection

Based on the characteristics of the data and the problem at hand, several
machine learning models can be considered:

 Linear Regression: A simple model for predicting a continuous target
variable based on one or more predictor variables.
 Decision Trees: A non-linear model that can capture interactions
between features.
 Random Forest: An ensemble method that combines multiple decision
trees for better performance.

 Gradient Boosting Machines (GBM): Another ensemble method that
builds trees in a sequential manner to improve performance.
 Deep Learning: More complex models like neural networks for larger
datasets.

Example Code for Training a Random Forest Model:

from sklearn.model_selection import train_test_split

from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Split data into features and target

X = data.drop('price', axis=1)
y = data['price']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

# Train Random Forest model

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)
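
For comparison with the simplest model in the list above, here is a minimal sketch of linear regression with a single feature, fitted by ordinary least squares. It is a baseline illustration rather than part of the Random Forest example; `fit_line`, `predict`, and the sample numbers are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    if spread_x == 0:
        raise ValueError("cannot fit a line when all x values are equal")
    co_spread = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = co_spread / spread_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(xs, slope, intercept):
    """Apply the fitted line to new x values."""
    return [slope * x + intercept for x in xs]


# Hypothetical training data: y is exactly 2 * x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
print(predict([5, 6], slope, intercept))
```

A baseline like this gives a reference error level; a more complex model is only worth its extra cost if it clearly beats the baseline on the test set.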

6. Model Evaluation

Evaluating the model’s performance is crucial to ensure its accuracy and
reliability. Metrics to consider include:

 Mean Absolute Error (MAE): Average of the absolute errors between
predicted and actual prices.
 Mean Squared Error (MSE): Average of the squares of the errors.
 R-squared: Measures how well the model explains the variability of the
target variable.

Example Code for Evaluation Metrics:

# Evaluate model performance
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = model.score(X_test, y_test)

print(f'Mean Absolute Error: {mae:.2f}')

print(f'Mean Squared Error: {mse:.2f}')
print(f'R-squared: {r2:.2f}')
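
The three metrics can also be computed by hand, which makes their definitions explicit. This is a minimal sketch with made-up numbers; the function names are assumptions and the values are not results from the housing model.

```python
def mean_absolute_error_manual(y_true, y_pred):
    """MAE: average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error_manual(y_true, y_pred):
    """MSE: average of (actual - predicted) squared."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared_manual(y_true, y_pred):
    """R-squared: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


# Made-up actual and predicted values for illustration.
actual = [3, 5, 7]
predicted = [2, 5, 9]
print(mean_absolute_error_manual(actual, predicted))  # 1.0
print(mean_squared_error_manual(actual, predicted))
print(r_squared_manual(actual, predicted))            # 0.375
```

MAE is in the same unit as the price, MSE penalizes large errors more heavily, and R-squared compares the model against always predicting the mean price.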

7. Insights

Once the model is evaluated, the following insights can be derived:

 Feature Importance: Analyze which features significantly influence
housing prices (e.g., square footage may be the most important
predictor).
 Predictions vs. Actual Prices: Visualize predicted prices against actual
prices to assess the model's performance visually.
 Recommendations: Suggest potential improvements or adjustments
based on the model's findings, such as focusing on specific
neighborhoods for investment.


Conclusion

This case study demonstrates the application of the Data Science process in
predicting housing prices. By collecting and preprocessing data, conducting
exploratory analysis, selecting an appropriate model, and evaluating its
performance, stakeholders can leverage data-driven insights to make informed
decisions in the housing market. Continuous monitoring and refinement of the
model can further enhance its accuracy and utility.

No ratings yet
Foundations of Data Science
138 pages
Ccs341-Dw-Int I Key-Set I-Ar
No ratings yet
Ccs341-Dw-Int I Key-Set I-Ar
18 pages
Unit III EBDP 2022
No ratings yet
Unit III EBDP 2022
77 pages
Computational Methods and Techniques
No ratings yet
Computational Methods and Techniques
15 pages
Unit 3
No ratings yet
Unit 3
28 pages
Ad3251 Unit 2 Notes Edu Engg
No ratings yet
Ad3251 Unit 2 Notes Edu Engg
35 pages
Data Modelling and Visualization
No ratings yet
Data Modelling and Visualization
31 pages
ResNet Deep Learning Presentation
No ratings yet
ResNet Deep Learning Presentation
8 pages
Data Warehousing Full
No ratings yet
Data Warehousing Full
41 pages
Cs8582-Object Oriented Analysisand Design Laboratory-46023968-Cs8582 - Ooad Lab
No ratings yet
Cs8582-Object Oriented Analysisand Design Laboratory-46023968-Cs8582 - Ooad Lab
132 pages
Module 4-Data Visualization To The End User
No ratings yet
Module 4-Data Visualization To The End User
9 pages
FBSW 3000Wrms 12
No ratings yet
FBSW 3000Wrms 12
2 pages
Bulk Material Density Chart: Product Type Product Type
No ratings yet
Bulk Material Density Chart: Product Type Product Type
4 pages
Domino Sensors 9-2007
No ratings yet
Domino Sensors 9-2007
4 pages
Wyse 5070 Technical Guidebook PDF
100% (1)
Wyse 5070 Technical Guidebook PDF
27 pages
Thesis HardBound
No ratings yet
Thesis HardBound
227 pages
Born-Haber Cycles: 16-18 Years
No ratings yet
Born-Haber Cycles: 16-18 Years
12 pages
Study Strategies for Students
No ratings yet
Study Strategies for Students
4 pages
Electrical Engineering Exam Guide
No ratings yet
Electrical Engineering Exam Guide
3 pages
Downloading SDR33 To Softdesk
No ratings yet
Downloading SDR33 To Softdesk
2 pages
Build123d Readthedocs Io en Latest
No ratings yet
Build123d Readthedocs Io en Latest
392 pages
Codification 160921114948
No ratings yet
Codification 160921114948
25 pages
Iso-10664 PDF
No ratings yet
Iso-10664 PDF
10 pages
Syntax Checker
No ratings yet
Syntax Checker
8 pages
Sec 1 Maths WA3 Mock Exam Guide
No ratings yet
Sec 1 Maths WA3 Mock Exam Guide
7 pages
Dental Age Estimation of 6-15 Year Old Indian Children Using Demirjian Method
No ratings yet
Dental Age Estimation of 6-15 Year Old Indian Children Using Demirjian Method
4 pages
Tamil Fish
No ratings yet
Tamil Fish
14 pages
Greenhouse Effect Experiment Guide
No ratings yet
Greenhouse Effect Experiment Guide
5 pages
Eligibility Conditions Faculty Appointment TTS
No ratings yet
Eligibility Conditions Faculty Appointment TTS
2 pages
Vinothkumar
No ratings yet
Vinothkumar
3 pages
Investigators Directory - CW - 2.0
No ratings yet
Investigators Directory - CW - 2.0
202 pages
Understanding The Self: Chapter 3 - Part 2
No ratings yet
Understanding The Self: Chapter 3 - Part 2
6 pages
HCR 9 Crawler Drill Parts List
100% (1)
HCR 9 Crawler Drill Parts List
3 pages
Centrifuge Beckman Coulter Optima L-100XP
No ratings yet
Centrifuge Beckman Coulter Optima L-100XP
110 pages
Topic-5: Competition Commission of India: Duties Powers and Functions
No ratings yet
Topic-5: Competition Commission of India: Duties Powers and Functions
32 pages
333 Story
No ratings yet
333 Story
3 pages
Medical Yoga Therapy
No ratings yet
Medical Yoga Therapy
2 pages
Pharma Investors: Anuh Ratings Update
No ratings yet
Pharma Investors: Anuh Ratings Update
7 pages
Man Space Requirements
No ratings yet
Man Space Requirements
150 pages
Screenshot 2023-03-31 at 3.17.45 PM
No ratings yet
Screenshot 2023-03-31 at 3.17.45 PM
1 page
Edited Introdution To Epidemiology
No ratings yet
Edited Introdution To Epidemiology
90 pages