
Business Intelligence and Data Analytics

Journal
(2024-2025)

VidyaVikas Education Society’s

VIKAS COLLEGE OF ARTS, SCIENCE & COMMERCE


Affiliated to University of Mumbai
RE-ACCREDITED ‘A’ GRADE BY NAAC
ISO 9001:2008 CERTIFIED

Vikas High School Marg, Kannamwar Nagar No 2, Vikhroli (E), Mumbai – 400083

Dr. R. K. Patra                                        Hon'ble Shri P. M. Raut
Principal                                              Chairman, V. V. Edu. Society

This is to certify that,


a student of T.Y.B.Sc. (Information Technology) (Semester V), College Roll No. __________ /
University Seat No. __________, has satisfactorily completed the practicals in
Business Intelligence & Data Analytics in the Information Technology program of the
UNIVERSITY OF MUMBAI for the academic year 2024-2025.

Guided By                      College Seal                      Head of Department

External Examiner

Index

Sr.No   Practical No.   Title                                                         Signature

1       1a      Import the data warehouse data in Microsoft Excel and create the pivot table and pivot chart.
2       1b      Import the cube in Microsoft Excel and create the Pivot Table and Pivot Chart to perform data analysis.
3       2       Apply the What-If Analysis for data visualization. Design and generate necessary reports based on the data warehouse data. Use Excel.
4       3       Perform data classification using a classification algorithm in R/Python.
5       4       Perform data clustering using a clustering algorithm in R/Python.
6       5       Perform linear regression on the given data warehouse data using R/Python.
7       6       Perform logistic regression on the given data warehouse data using R/Python.
8       7       Write a Python program to read data from a CSV file, perform simple data analysis, and generate basic insights (use Pandas, a Python library).
9       8a      Perform data visualization using Python on any sales data.
10      8b      Perform data visualization using Power BI on any sales data.
11      9       Create the data staging area for the selected database using SQL.
Practical: 1a

Aim: Import the data warehouse data in Microsoft Excel and create the
pivot table and pivot chart.

Click on the OK button.

Click on PivotChart and select any chart type.

Practical: 1b

Aim: Import the cube in Microsoft Excel and create the Pivot table and
Pivot Chart to perform data analysis.

Step 1: Import Data from Data Warehouse to Excel

1. Open Microsoft Excel.

2. Go to the Data tab.

3. From the drop-down menu, choose From Other Sources and select From Data Connection Wizard.

4. Select Microsoft SQL Server, or choose the relevant source for your data warehouse.

5. Connect to the database server by providing the connection details (Server Name).

6. Click Next and then Finish.

Step 2: Create a Pivot Chart

1. In the Data Connection Wizard, select the database and the table.
2. Save the data connection file and click Finish.
3. In the Import Data dialog, choose PivotChart and PivotTable Report.
4. Once the PivotTable is set up, click anywhere inside it.
5. Choose the chart type you want to visualize your data
(e.g., Column, Line, Pie, etc.).
6. Excel will generate the PivotChart based on the data in
your PivotTable.

Step 3: Create a Pivot Table

1. Once the data is imported, click anywhere inside the data range.
2. Go to the Insert tab on the Ribbon.
3. Click on PivotTable in the Tables group.
4. You can also choose to place the PivotTable in a new worksheet
or an existing one.
5. Click OK to create the PivotTable.

6. In the PivotTable Field List pane, drag fields into the Rows,
Columns, Values, and Filters areas to organize the data.
For example:

a. Rows: Place dimensions like product names or regions.

b. Columns: Place time periods (e.g., months or years).

c. Values: Put numerical values like sales or revenue.
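For comparison, the same kind of summary can be built outside Excel with pandas. The snippet below is only a sketch: the column names Region, Month, and Sales and the sample values are assumptions, not fields from the practical's warehouse data, and Matplotlib is assumed to be installed for the chart.

import pandas as pd

# Assumed sample data standing in for the imported warehouse table
df = pd.DataFrame({
    'Region': ['North', 'South', 'North', 'South'],
    'Month': ['Jan', 'Jan', 'Feb', 'Feb'],
    'Sales': [120, 90, 150, 110],
})

# Rows = Region, Columns = Month, Values = sum of Sales (mirrors the PivotTable layout above)
pivot = pd.pivot_table(df, index='Region', columns='Month', values='Sales', aggfunc='sum')
print(pivot)

# A simple bar chart from the pivot, analogous to a PivotChart
pivot.plot(kind='bar', title='Sales by Region and Month')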

Practical: 2

Aim: Apply the What-If Analysis for data visualization. Design and
generate necessary reports based on the data warehouse data. Use
Excel.

Step 1: Apply What-If Analysis in Excel

Microsoft Excel offers several tools for What-If Analysis that help simulate different scenarios
and outcomes based on changes to input data. These tools are:

1. Scenario Manager
2. Goal Seek

2.1. Scenario Manager

The Scenario Manager lets you create and compare different sets of input values to see how
changes affect the output.

● Step-by-Step:
1. Go to the Data tab on the Ribbon.
2. Click on What-If Analysis, then select Scenario Manager.
3. In the Scenario Manager dialog box, click Add to create a new scenario.
4. Name the scenario (e.g., “Best Case”, “Worst Case”, “Most Likely Case”).
5. Define the changing cells (these are the input values you want to experiment
with, such as sales volume, price, etc.).
6. Enter the values for each scenario you want to test.
7. After setting up all scenarios, click OK.
8. To view the results of each scenario, click Show in the Scenario Manager
dialog box.
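The same idea can be sketched in a few lines of Python for readers who prefer code; the scenario values and the profit formula below are made-up assumptions, not part of the Excel exercise.

# Hypothetical scenarios: each one changes the input cells (price and volume)
scenarios = {
    'Best Case': {'price': 12.0, 'volume': 1200},
    'Worst Case': {'price': 8.0, 'volume': 700},
    'Most Likely Case': {'price': 10.0, 'volume': 1000},
}

fixed_cost = 3000  # assumed constant cost

for name, s in scenarios.items():
    profit = s['price'] * s['volume'] - fixed_cost  # the "result cell"
    print(name, "profit:", profit)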

2.2. Goal Seek

Goal Seek helps you find the input value needed to reach a specific result.

● Step-by-Step:
1. Go to the Data tab on the Ribbon.
2. Click on What-If Analysis, then select Goal Seek.
3. In the Goal Seek dialog box, specify the following:
▪ Set cell: The cell that contains the formula you want to solve.
▪ To value: The result you want to achieve.
▪ By changing cell: The input value that will be adjusted to reach the
desired result.
4. Click OK. Excel will calculate the value needed to achieve the target.
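Goal Seek's behaviour can be imitated in Python with a simple bisection search; the profit formula, target value, and search range below are assumptions for illustration only.

def profit(price, volume=1000, fixed_cost=3000):
    # Formula standing in for the "Set cell"
    return price * volume - fixed_cost

target = 5000            # "To value"
low, high = 0.0, 100.0   # search range for the "By changing cell" (price)

# Bisection: narrow the price interval until the profit reaches the target
for _ in range(60):
    mid = (low + high) / 2
    if profit(mid) < target:
        low = mid
    else:
        high = mid

print("Price needed to reach the target profit:", round(mid, 2))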

Practical: 3

Aim: Perform data classification using a classification algorithm in
R/Python.

Code:

# Monthly rainfall values
rainfall <- c(799, 1174.8, 865.1, 1334.6, 635.4, 918.5, 685.5, 998.6, 784.2, 985, 882.8, 1071)

# Convert the vector into a monthly time series starting January 2012
rainfall.timeseries <- ts(rainfall, start = c(2012, 1), frequency = 12)
print(rainfall.timeseries)

# Save the plot of the series to a PNG file
png(file = "rainfall.png")
plot(rainfall.timeseries)
dev.off()

Output:

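The R listing above demonstrates building and plotting a monthly time series. As a supplement for the stated classification aim, here is a minimal Python sketch; it assumes scikit-learn is installed and uses its bundled iris data as a stand-in for a warehouse extract.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load a small labelled dataset (stand-in for the data warehouse extract)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train a decision-tree classifier and evaluate it on held-out rows
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))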
Practical: 4

Aim: Perform data clustering using a clustering algorithm in
R/Python.

Code:

# Load the party package (provides ctree() and the readingSkills dataset)
library(party)
print(head(readingSkills))

# Use the first 105 rows as input data
input.dat <- readingSkills[c(1:105), ]

# Save the fitted tree plot to a PNG file
png(file = "decision_tree.png")

# Build a conditional inference tree predicting nativeSpeaker
output.tree <- ctree(nativeSpeaker ~ age + shoeSize + score, data = input.dat)
plot(output.tree)
dev.off()

Output:

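The R listing above fits a conditional inference tree on the readingSkills data. For the clustering aim itself, here is a minimal Python sketch with k-means; scikit-learn and its bundled iris measurements are assumed as a stand-in dataset.

from sklearn.datasets import load_iris
from sklearn.cluster import KMeans

# Treat the iris measurements as unlabelled rows (stand-in for warehouse data)
X, _ = load_iris(return_X_y=True)

# Group the rows into three clusters with k-means
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print("Cluster sizes:", [int((labels == k).sum()) for k in range(3)])
print("Cluster centres:")
print(kmeans.cluster_centers_)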
Practical: 5
Aim: Perform linear regression on the given data warehouse data
using R/Python.
Code:

x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Apply the lm() function to create a linear regression model
relation <- lm(y ~ x)
# Print the model summary
print(summary(relation))
# Predict the weight of a person with height 170
a <- data.frame(x = 170)
result <- predict(relation, a)
print(result)
# Visualizing the Regression Graphically
png(file = "linearregression.png")
plot(x, y, col = "blue", main = "Height & Weight Regression",
xlab = "Height in cm", ylab = "Weight in Kg", pch = 16)
abline(relation, col = "red")
dev.off()

Output:

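Since the aim allows R or Python, the same height-and-weight fit can also be done in Python; this is a sketch assuming NumPy and Matplotlib are installed, using the same data as the R listing.

import numpy as np
import matplotlib.pyplot as plt

x = np.array([151, 174, 138, 186, 128, 136, 179, 163, 152, 131])
y = np.array([63, 81, 56, 91, 47, 57, 76, 72, 62, 48])

# Fit a straight line y = slope * x + intercept
slope, intercept = np.polyfit(x, y, 1)
print("Predicted weight for height 170:", slope * 170 + intercept)

# Scatter plot with the fitted regression line
plt.scatter(x, y, color='blue')
plt.plot(x, slope * x + intercept, color='red')
plt.title('Height & Weight Regression')
plt.xlabel('Height in cm')
plt.ylabel('Weight in Kg')
plt.savefig('linearregression.png')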
Practical: 6
Aim: Perform logistic regression on the given data warehouse data
using R/Python.
Code:

# Load necessary libraries
library(dplyr)
library(titanic)
library(pROC)

# Load Titanic dataset
data("titanic_train")

# Check dataset structure
head(titanic_train)

# Data Cleaning: Removing rows with missing values
titanic_clean <- titanic_train %>%
  filter(!is.na(Age), !is.na(Embarked), !is.na(Sex), !is.na(Pclass))

# Convert categorical variables to factors
titanic_clean$Survived <- as.factor(titanic_clean$Survived)
titanic_clean$Pclass <- as.factor(titanic_clean$Pclass)
titanic_clean$Sex <- as.factor(titanic_clean$Sex)
titanic_clean$Embarked <- as.factor(titanic_clean$Embarked)

# Build Logistic Regression Model
model <- glm(Survived ~ Pclass + Sex + Age + SibSp + Parch + Fare + Embarked,
             data = titanic_clean, family = binomial)

# Print model summary
summary(model)

# Predict probabilities
predictions <- predict(model, type = "response")

# Convert probabilities to binary classification
predictions_class <- ifelse(predictions > 0.5, 1, 0)

# Evaluate Model Accuracy
confusion_matrix <- table(Predicted = predictions_class, Actual = titanic_clean$Survived)
print(confusion_matrix)

accuracy <- mean(predictions_class == as.numeric(titanic_clean$Survived) - 1)
print(paste("Accuracy:", accuracy))

# ROC Curve
roc_curve <- roc(as.numeric(titanic_clean$Survived) - 1, predictions)
plot(roc_curve, main = "ROC Curve")

Output:
> # Load necessary libraries
> library(dplyr)
> library(titanic)
> library(pROC)
>
> # Load Titanic dataset
> data("titanic_train")
>
> # Check dataset structure
> head(titanic_train)
  PassengerId Survived Pclass                                                Name    Sex Age
1           1        0      3                             Braund, Mr. Owen Harris   male  22
2           2        1      1 Cumings, Mrs. John Bradley (Florence Briggs Thayer) female  38
3           3        1      3                              Heikkinen, Miss. Laina female  26
4           4        1      1        Futrelle, Mrs. Jacques Heath (Lily May Peel) female  35
5           5        0      3                            Allen, Mr. William Henry   male  35
6           6        0      3                                    Moran, Mr. James   male  NA
  SibSp Parch           Ticket    Fare Cabin Embarked
1     1     0        A/5 21171  7.2500              S
2     1     0         PC 17599 71.2833   C85        C
3     0     0 STON/O2. 3101282  7.9250              S
4     1     0           113803 53.1000  C123        S
5     0     0           373450  8.0500              S
6     0     0           330877  8.4583              Q
>
> # Data Cleaning: Removing rows with missing values
> titanic_clean <- titanic_train %>%
+ filter(!is.na(Age), !is.na(Embarked), !is.na(Sex), !is.na(Pclass))
>
> # Convert categorical variables to factors
> titanic_clean$Survived <- as.factor(titanic_clean$Survived)
> titanic_clean$Pclass <- as.factor(titanic_clean$Pclass)
> titanic_clean$Sex <- as.factor(titanic_clean$Sex)
> titanic_clean$Embarked <- as.factor(titanic_clean$Embarked)
>
> # Build Logistic Regression Model
> model <- glm(Survived ~ Pclass + Sex + Age + SibSp + Parch + Fare + Embarked,
+ data = titanic_clean, family = binomial)
>
> # Print model summary
> summary(model)

Call:
glm(formula = Survived ~ Pclass + Sex + Age + SibSp + Parch +
Fare + Embarked, family = binomial, data = titanic_clean)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)  16.691979 607.920015   0.027 0.978095
Pclass2      -1.189637   0.329197  -3.614 0.000302 ***
Pclass3      -2.395220   0.343356  -6.976 3.04e-12 ***
Sexmale      -2.637859   0.223006 -11.829  < 2e-16 ***
Age          -0.043308   0.008322  -5.204 1.95e-07 ***
SibSp        -0.362925   0.129290  -2.807 0.005000 **
Parch        -0.060365   0.123944  -0.487 0.626233
Fare          0.001451   0.002595   0.559 0.576143
EmbarkedC   -12.259048 607.919885  -0.020 0.983911
EmbarkedQ   -13.082427 607.920088  -0.022 0.982831
EmbarkedS   -12.661895 607.919868  -0.021 0.983383
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 964.52  on 713  degrees of freedom
Residual deviance: 632.34  on 703  degrees of freedom
AIC: 654.34

Number of Fisher Scoring iterations: 13

>
> # Predict probabilities
> predictions <- predict(model, type = "response")
>
> # Convert probabilities to binary classification
> predictions_class <- ifelse(predictions > 0.5, 1, 0)
>
> # Evaluate Model Accuracy
> confusion_matrix <- table(Predicted = predictions_class, Actual = titanic_clean$Survived)
> print(confusion_matrix)
Actual
Predicted 0 1
0 365 83
1 59 207
>
> accuracy <- mean(predictions_class == as.numeric(titanic_clean$Survived) - 1)
> print(paste("Accuracy:", accuracy))
[1] "Accuracy: 0.801120448179272"
>
> # ROC Curve
> roc_curve <- roc(as.numeric(titanic_clean$Survived) - 1, predictions)
Setting levels: control = 0, case = 1
Setting direction: controls < cases
> plot(roc_curve, main = "ROC Curve")

Practical: 7

Aim: Write a Python program to read data from a CSV file, perform
simple data analysis, and generate basic insights. (Use Pandas, a
Python library.)

Code:
import pandas as pd

file_path = 'data.csv'
data = pd.read_csv(file_path)

print("First 5 rows of the dataset:")
print(data.head())

print("\nDataset Information:")
print(data.info())

print("\nSummary Statistics:")
print(data.describe())

if 'Category' in data.columns:
    print("\nUnique values in 'Category' column:")
    print(data['Category'].value_counts())

Output:

Practical: 8a
Aim: Perform data visualization using Python on any sales data.

Code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Build a sample sales dataset
data = {
    'Date': pd.date_range(start='2023-01-01', periods=100, freq='D'),
    'Product': ['Product A', 'Product B', 'Product C', 'Product D'] * 25,
    'Category': ['Electronics', 'Furniture', 'Clothing', 'Books'] * 25,
    'Region': ['North', 'South', 'East', 'West'] * 25,
    'Sales': [x * 10 for x in range(100)],
    'Profit': [x * 2 for x in range(100)],
}
df = pd.DataFrame(data)
print(df.head())

# Sales trend over time
plt.figure(figsize=(10, 5))
sns.lineplot(data=df, x='Date', y='Sales', marker='o')
plt.title('Sales Trend Over Time')
plt.xlabel('Date')
plt.ylabel('Sales')
plt.grid(True)
plt.show()

# Total sales by category
plt.figure(figsize=(8, 5))
sns.barplot(data=df, x='Category', y='Sales', ci=None, palette='viridis')
plt.title('Sales by Category')
plt.xlabel('Category')
plt.ylabel('Total Sales')
plt.show()

# Profit distribution by region
plt.figure(figsize=(8, 5))
sns.boxplot(data=df, x='Region', y='Profit', palette='coolwarm')
plt.title('Profit Distribution by Region')
plt.xlabel('Region')
plt.ylabel('Profit')
plt.show()

# Total sales by product
plt.figure(figsize=(8, 5))
product_sales = df.groupby('Product')['Sales'].sum().reset_index()
sns.barplot(data=product_sales, x='Product', y='Sales', palette='magma')
plt.title('Total Sales by Product')
plt.xlabel('Product')
plt.ylabel('Sales')
plt.show()

Output:

Practical: 8b
Aim: Perform data visualization using PowerBI on any sales data.

Steps:-

Step 1:- Download & Install Power BI

Step 2:- Create a sample sales dataset (a small pandas sketch for generating one is shown after the steps)

Step 3:- Import the Sales Dataset

• Click "Home" > "Get Data".
• Choose your data source:
  o Excel (XLSX/CSV)
  o SQL Server
  o Online Services (Google Sheets, SharePoint, etc.)
• Browse and select the dataset.
• Click "Load" to import.

Step 4:- Clean & Transform Data (Power Query)

Step 5:- Create Data Visualizations

• Click on "Line Chart" in the "Visualizations" pane.
• Drag Order Date to the X-axis.
• Drag Sales Amount to the Y-axis.
• Customize the chart (format labels, add title, etc.).

Step 6:- Add Filters & Interactivity
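For Step 2, a small pandas sketch that writes a sample sales CSV which Power BI can then import through Home > Get Data; the column names Order Date, Product, Region, and Sales Amount are assumptions chosen to match the fields used in Step 5, and all values are made up.

import pandas as pd

# Build a tiny sample sales table (values are illustrative only)
sample = pd.DataFrame({
    'Order Date': pd.date_range('2023-01-01', periods=12, freq='M'),
    'Product': ['Product A', 'Product B', 'Product C', 'Product D'] * 3,
    'Region': ['North', 'South', 'East', 'West'] * 3,
    'Sales Amount': [1200, 950, 780, 1430, 1100, 990, 870, 1500, 1250, 1020, 940, 1600],
})

# Save as CSV so it can be selected in the Get Data dialog
sample.to_csv('sample_sales.csv', index=False)
print(sample.head())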

Output:-

Practical: 9
Aim: Create the data staging area for the selected database using SQL.

STEPS:-

Step 1:- Create a Staging Database

First, create a staging database to store raw sales data.

CREATE DATABASE Sales_Staging;
USE Sales_Staging;

Step 2:- Create Staging Tables

Create tables that match the structure of raw sales data but include
additional fields like load date and batch ID.

CREATE TABLE Staging_Sales (
    SalesID INT PRIMARY KEY,
    OrderDate DATE,
    ProductName VARCHAR(100),
    Category VARCHAR(50),
    Region VARCHAR(50),
    SalesAmount DECIMAL(10,2),
    Profit DECIMAL(10,2),
    Quantity INT,
    LoadDate DATETIME DEFAULT CURRENT_TIMESTAMP,
    BatchID INT
);
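Step 3:- Load the raw sales data into the staging tables (for example with INSERT statements, BULK INSERT, or an ETL job), supplying a BatchID for each load so that later cleansing steps can be traced; the exact loading method is assumed here, as it depends on the source system.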

51 | P a g e
Step 4:- Perform Data Cleansing & Transformation

• Remove Duplicates

DELETE FROM Staging_Sales
WHERE SalesID NOT IN (
    SELECT MIN(SalesID)
    FROM Staging_Sales
    GROUP BY OrderDate, ProductName, Region
);

• Handle Null Values
UPDATE Staging_Sales
SET Profit = 0
WHERE Profit IS NULL;

Output:

