Mini Project – Cold Storage Case Study
Project Report
Table of Contents
1 Project Objective
2 Assumptions
3 Exploratory Data Analysis – Step by step approach
  3.1 Environment Set up and Data Import
       3.1.1 Install necessary Packages and Invoke Libraries
       3.1.2 Set up working Directory
       3.1.3 Import and Read the Dataset
  3.2 Variable Identification
      3.2.1 Variable Identification – Inferences
  3.3 Univariate Analysis
  3.4 Bi-Variate Analysis
  3.5 Variable Transformation / Feature Creation
4 Conclusion
5 Appendix A – Source Code
1. Project Objective
The objective of this report is to explore the cold storage data set (“Cold
Storage Case Study”) in R and generate insights about it. The
exploration consists of the following:
      Importing the dataset in R
      Understanding the structure of dataset
      Graphical exploration
      Descriptive statistics
      Insights from the dataset
2. Assumptions
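    The freshness of the products is expected to remain good within the
     temperature range of 2 to 4 degrees Celsius.
    The design of the cold store and the choice of a suitable cooling system are
     important for effective cooling and for creating suitable storage conditions.
    The temperature is treated as normally distributed, since the probabilities
     in Appendix A are computed with pnorm().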
3. Exploratory Data Analysis – Step by step approach
 A typical data exploration activity consists of the following steps:
 1. Environment Set up and Data Import
 2. Variable Identification
 3. Univariate Analysis
 4. Bi-Variate Analysis
 5. Variable Transformation / Feature Creation
 6. Feature Exploration
3.1 Environment Set up and Data Import
3.1.1 Install necessary Packages and Invoke Libraries
Use this section to install the necessary packages and load the associated libraries.
Keeping all package loads in one place improves code readability.
List of packages to be installed and loaded:
   1. library(readr)
   2. library(ggplot2)
   3. library(readxl)
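The install-and-load step could be sketched as follows. This is a minimal sketch and not the report's own code: the helper name missing_packages is hypothetical, and the block only reports missing packages instead of installing them, so it can be run safely in any session.

```r
# Packages named in this section
required <- c("readr", "ggplot2", "readxl")

# Hypothetical helper: returns the packages that are not installed yet
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)

if (length(to_install) > 0) {
  # Nothing is installed automatically; the user decides when to install
  message("Run install.packages() for: ", paste(to_install, collapse = ", "))
} else {
  # All packages are available, so load them in one place
  invisible(lapply(required, library, character.only = TRUE))
}
```

Checking before installing avoids re-downloading packages every time the script runs.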
3.1.2 Set up working Directory
Setting a working directory at the start of the R session makes importing and
exporting data files and code files easier. The working directory is the
folder on the PC that holds the data, code, and other files related to the
project.
Please refer to Appendix A for the source code.
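As an illustration of the idea, the sketch below switches to a folder, confirms the change, and switches back. Here tempdir() stands in for the real project folder, an assumption made only so that the example runs on any machine. On Windows, paths are best written with forward slashes (or doubled backslashes), because a single backslash starts an escape sequence in an R string.

```r
# Placeholder folder (assumption): replace with the real project folder,
# for example "C:/Users/<name>/Desktop/R Files"
project_dir <- tempdir()

# setwd() returns the previous working directory, so it can be restored later
old_dir <- setwd(project_dir)

# Confirm that the working directory has changed
current_dir <- getwd()
print(current_dir)

# Restore the original working directory
setwd(old_dir)
```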
3.1.3 Import and Read the Dataset
The given dataset is in .csv format. Hence, the command ‘read.csv’ is used for
importing the file.
Please refer to Appendix A for the source code.
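The import step can be tried on a small stand-in file, as sketched below. The report only confirms the columns Season and Temperature and a total of 4 variables; the Month and Date columns and every value in this sample are invented for illustration.

```r
# Write a tiny stand-in CSV to a temporary file (all values are made up)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Season,Month,Date,Temperature",
             "Winter,Jan,1,2.4",
             "Summer,Apr,1,3.1",
             "Rainy,Jul,1,3.0"),
           csv_path)

# header = TRUE takes column names from the first line;
# stringsAsFactors = TRUE reads text columns such as Season as factors
sample_data <- read.csv(csv_path, header = TRUE, stringsAsFactors = TRUE)

# Check that the import produced the expected structure
str(sample_data)
```

Reading Season as a factor is a design choice: since R 4.0.0 read.csv() keeps text columns as character by default, and grouped summaries and box plots are more convenient with factors.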
3.2 Variable Identification
meantemp = Mean temperature for the full year
sdtemp = Standard deviation of temperature for the full year
probtemp = Probability of the temperature having fallen below 2 deg C
probtemp2 = Probability of the temperature having gone above 4 deg C
P = Combined probability (probtemp + probtemp2) of the temperature being outside the 2 to 4 deg C range, used to determine the penalty for the AMC Company
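How these variables relate to each other can be sketched on a toy temperature vector. The three readings below are invented so that the mean is exactly 3 and the standard deviation exactly 0.5; they are not taken from the cold storage data set, and the sketch assumes temperatures follow a normal distribution.

```r
# Toy readings (assumption, not the real data): mean 3, standard deviation 0.5
temperature <- c(2.5, 3.0, 3.5)

meantemp <- mean(temperature)
sdtemp <- sd(temperature)

# P(temperature < 2): lower tail of the normal distribution
probtemp <- pnorm(2, mean = meantemp, sd = sdtemp)

# P(temperature > 4): upper tail of the normal distribution
probtemp2 <- pnorm(4, mean = meantemp, sd = sdtemp, lower.tail = FALSE)

# Combined probability of being outside the 2 to 4 deg C range
P <- probtemp + probtemp2

print(c(below_2 = probtemp, above_4 = probtemp2, outside_range = P))
```

Because 2 and 4 are each two standard deviations from the toy mean, both tails come out equal here; with the real data the two tails differ.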
3.2.1 Variable Identification – Inferences
setwd() = To set the working directory
getwd() = To get the working directory
attach() = To attach the data frame so that variables can be called directly (without using $)
summary() = To summarise each variable in the data
nrow() = For the number of samples (rows)
ncol() = For the number of variables (columns)
dim() = For the dimensions of the data
str() = To show the datatype of each variable
plot() = For graphical representation of data
col = Argument that sets the colour in the box plot
mean() = To calculate the mean
sd() = To calculate the standard deviation
pnorm() = To calculate a cumulative probability under the normal distribution
aggregate() = To calculate the mean temperature season wise
list() = To pass the grouping variable (Season) to aggregate()
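The inspection functions listed above can be tried on a small data frame, as in the sketch below. The data frame toy and its values are assumptions made for illustration only.

```r
# Toy data frame (made-up values) with the two columns the report relies on
toy <- data.frame(Season = c("Winter", "Summer", "Rainy"),
                  Temperature = c(2.4, 3.1, 3.0))

n_rows <- nrow(toy)   # number of samples (rows)
n_cols <- ncol(toy)   # number of variables (columns)

dim(toy)              # rows and columns together
str(toy)              # datatype of each variable
summary(toy)          # summary statistics for each variable
```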
3.3 Univariate Analysis
The data frame ATemp tabulates the mean temperature by season.
         Season          Mean Temperature (deg C)
         Rainy           3.039344
         Summer          3.153333
         Winter          2.70813
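A season-wise mean table of this kind can be sketched with the formula interface of aggregate(), which keeps the original column names in the result. The readings below are invented for illustration and do not reproduce the figures in the table.

```r
# Toy readings (made-up values): two per season
toy <- data.frame(
  Season = c("Winter", "Winter", "Summer", "Summer", "Rainy", "Rainy"),
  Temperature = c(2.5, 2.9, 3.0, 3.4, 3.1, 2.9)
)

# Mean temperature for each season; groups are returned in alphabetical order
season_means <- aggregate(Temperature ~ Season, data = toy, FUN = mean)

print(season_means)
```

Passing data = toy avoids the need for attach(), which is a design choice rather than a requirement.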
3.4 Bi-Variate Analysis
The plot() function is used to draw a box plot of the cold storage
temperature season wise.
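One way to draw such a plot is base R's boxplot() with a formula, sketched below on invented readings. Note that a box plot marks the median of each group, not the mean, so it complements the mean table instead of repeating it.

```r
# Toy readings (made-up values): two per season
toy <- data.frame(
  Season = c("Winter", "Winter", "Summer", "Summer", "Rainy", "Rainy"),
  Temperature = c(2.5, 2.9, 3.0, 3.4, 3.1, 2.9)
)

# horizontal = TRUE puts temperature on the x axis and seasons on the y axis.
# boxplot() also returns the plotted statistics invisibly.
box_stats <- boxplot(Temperature ~ Season, data = toy,
                     horizontal = TRUE,
                     col = "blue",
                     main = "Cold storage temperature season wise",
                     xlab = "Temperature",
                     ylab = "Season")
```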
3.5 Variable Transformation / Feature Creation
No variable needed to be transformed. A few new variables were
created to make the data easier to understand and to present the results.
4. Conclusion
   1. The mean cold storage temperature by season was:
       Rainy – 3.039344
       Summer – 3.153333
       Winter – 2.70813
   2. The overall mean temperature for the full year is 2.96.
   3. The overall standard deviation for the full year is 0.50.
   4. The probability of the temperature having fallen below 2 deg C was calculated to
      be 2.91%.
   5. The probability of the temperature having gone above 4 deg C was calculated to
      be 2.07%.
   6. The combined probability of the temperature being outside the 2 to 4 deg C
      range is therefore 2.91% + 2.07% = 4.98%, and the penalty calculated for the
      AMC Company is 10%.
5. Appendix A – Source Code
#Environment setup and data import
# Set Working Directory
# (forward slashes are used because a single backslash starts an escape sequence in an R string)
setwd("C:/Users/PKG/Desktop/R Files")
# Get Working Directory
getwd()
# Importing data
mydata = read.csv("Cold_Storage_Temp_Data (1).csv", header = TRUE)
#To view your dataset in R window
mydata
#By attaching you can call variables directly (you could avoid using $)
attach(mydata)
#Analyzing/Summary data
summary(mydata)
#Dimensions of the data
nrow(mydata) # Number of samples (rows)
ncol(mydata) # Number of variables (columns)
dim(mydata)
#Total no of records : 365 and 4 variables/columns
#Datatype for each variable
str(mydata)
###########################
#Exploratory Data Analysis#
###########################
# Question 1
#Mean cold storage temperature for Summer, Winter and Rainy Season
#Creating a data frame for same
ATemp = aggregate(Temperature,
          list(Season),
          mean)
#Viewing dataframe
ATemp
#Loading the ggplot2 library
library(ggplot2)
#Plotted a box plot to compare temperature season wise
#(base R boxplot; the formula interface also works when Season is read as character)
boxplot(Temperature ~ Season, horizontal = TRUE,
   col = "blue",
   main = "Cold storage temperature season wise",
   xlab = "Temperature",
   ylab = "Season")
#Question 2
#Overall mean for the full year
#Created a variable to store the mean temperature for full year
meantemp = mean(Temperature)
#Question 3
#Standard Deviation for the full year
#Created a variable to store the standard deviation for full year
sdtemp = sd(Temperature)
#Question 4
#probability of temperature having fallen below 2 deg C
probtemp = pnorm(2,
         meantemp,
         sdtemp)
#To view probability in R window
probtemp
#Question 5
#probability of temperature having gone above 4 deg C
probtemp2 = pnorm(4,
          meantemp,
          sdtemp,
          lower.tail = FALSE)
#To view probability in R window
probtemp2
#Question 6
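#Combined probability of temperature being outside the 2 to 4 deg C range,
#used to calculate the penalty for the AMC Company
P = probtemp + probtemp2
#To view probability in R window
P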
**The End**