Unit 4-1

Uploaded by gayathrinaik12

Descriptive Analysis in R Programming

In descriptive statistics, we describe our data using representative methods such as charts, graphs, tables and spreadsheets. The goal of descriptive analysis is to present the data in a meaningful way so that it can be easily understood.
It is often performed on small data sets, and its findings can point to trends that are worth investigating further. The measures most commonly used to describe a data set are measures of central tendency and measures of variability (or dispersion).
Process of Descriptive Statistics in R
• Measures of central tendency
• Measures of variability

Measure of central tendency

A measure of central tendency represents the whole data set by a single value that gives the location of its center.
There are three main measures of central tendency:
• Mean
• Mode
• Median

Measure of variability
A measure of variability, also known as the spread of the data, describes how widely the data values are distributed. The most common measures of variability are:
• Range
• Variance
• Standard deviation
Need for Descriptive Statistics in R
Descriptive analysis helps us understand our data and is an important first step in machine learning. Machine learning is about making predictions, whereas statistics is about drawing conclusions from data, and understanding the data has to come before building predictive models. Let's carry out this descriptive analysis in R.

Descriptive Analysis in R
Descriptive analysis consists of simply describing the data with some summary statistics and graphics. Here, we'll describe how to compute summary statistics using R.

Import your data into R:

Before doing any computation, we need to prepare our data and save it in an external .txt or .csv file. It is good practice to save the file in the current working directory. Then import the data into R as follows:
R
# R program to illustrate
# Descriptive Analysis

# Import the data using read.csv()

myData <- read.csv("CardioGoodFitness.csv",
                   stringsAsFactors = FALSE)
# Print the first 6 rows
print(head(myData))

Output:
Product Age Gender Education MaritalStatus Usage Fitness Income Miles
1 TM195 18 Male 14 Single 3 4 29562 112
2 TM195 19 Male 15 Single 2 3 31836 75
3 TM195 19 Female 14 Partnered 4 3 30699 66
4 TM195 19 Male 12 Single 3 3 32973 85
5 TM195 20 Male 13 Partnered 4 2 35247 47
6 TM195 20 Female 14 Partnered 3 3 32973 66
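Once the data is loaded, base R's str() shows the type of each column and summary() gives a quick overview of every column at once (minimum, quartiles, median, mean and maximum for numeric columns). The sketch below rebuilds only the six rows printed above as a small data frame, under the hypothetical name `sample_rows`, so that it runs without the CSV file; results on the full file will differ.

```r
# Rebuild the six rows shown above (illustration only; not the full data set)
sample_rows <- data.frame(
  Product = rep("TM195", 6),
  Age = c(18, 19, 19, 19, 20, 20),
  Gender = c("Male", "Male", "Female", "Male", "Male", "Female"),
  Education = c(14, 15, 14, 12, 13, 14),
  MaritalStatus = c("Single", "Single", "Partnered", "Single", "Partnered", "Partnered"),
  Usage = c(3, 2, 4, 3, 4, 3),
  Fitness = c(4, 3, 3, 3, 2, 3),
  Income = c(29562, 31836, 30699, 32973, 35247, 32973),
  Miles = c(112, 75, 66, 85, 47, 66),
  stringsAsFactors = FALSE
)

# Inspect the column types, then summarize every column
str(sample_rows)
summary(sample_rows)

# Mean of each numeric column
numeric_cols <- sapply(sample_rows, is.numeric)
cm <- colMeans(sample_rows[, numeric_cols])
print(cm)
```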
R functions for computing descriptive analysis:

Histogram of Age Distribution

R
library(ggplot2)
ggplot(myData, aes(x = Age)) +
geom_histogram(binwidth = 2, fill = "blue", color = "red", alpha = 0.8) +
labs(title = "Age Distribution", x = "Age", y = "Frequency")

Output:

[Figure: histogram titled "Age Distribution", with Age on the x-axis and Frequency on the y-axis]

This code uses the ggplot2 library to create a histogram of the Age variable from the myData data set. The histogram bins have a width of 2, and the bars are filled blue with a red border (alpha = 0.8 makes them slightly transparent).
The resulting visualization shows the distribution of ages in the data set.
Boxplot of Miles by Gender
R
ggplot(myData, aes(x = Gender, y = Miles, fill = Gender)) +
geom_boxplot() +
labs(title = "Miles Distribution by Gender", x = "Gender", y = "Miles") +
theme_minimal()

Output:

[Figure: boxplots titled "Miles Distribution by Gender", with Gender on the x-axis and Miles on the y-axis]
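The boxplot compares Miles across genders visually; tapply() (or aggregate()) gives the corresponding numbers. This sketch again uses only the six rows printed earlier, rebuilt as a hypothetical `sample_rows` data frame, so the values are illustrative rather than results for the full data set.

```r
# Only the Gender and Miles columns from the six rows printed earlier
sample_rows <- data.frame(
  Gender = c("Male", "Male", "Female", "Male", "Male", "Female"),
  Miles = c(112, 75, 66, 85, 47, 66)
)

# Mean and median of Miles for each gender
mean_by_gender <- tapply(sample_rows$Miles, sample_rows$Gender, mean)
median_by_gender <- tapply(sample_rows$Miles, sample_rows$Gender, median)
print(mean_by_gender)
print(median_by_gender)

# The same grouped mean as a data frame, via aggregate()
aggregate(Miles ~ Gender, data = sample_rows, FUN = mean)
```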

Mean
It is the sum of the observations divided by the total number of observations. It is also called the average.

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

where n = number of terms and x_i is the i-th observation

R
# R program to illustrate
# Descriptive Analysis

# Import the data using read.csv()

myData <- read.csv("CardioGoodFitness.csv",
                   stringsAsFactors = FALSE)

# Compute the mean value
# (stored as mean_age so the built-in mean() function is not masked)
mean_age <- mean(myData$Age)
print(mean_age)

Output:
[1] 28.78889
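To connect the definition with the code, the mean can also be computed directly as the sum divided by the count. The vector `ages` below is made up for illustration. One practical detail: if a column contains missing values (NA), mean() returns NA unless you pass na.rm = TRUE.

```r
# Made-up ages for illustration
ages <- c(18, 19, 19, 20, 25, 25, 30)

# Mean from the definition: sum of observations / number of observations
manual_mean <- sum(ages) / length(ages)
print(manual_mean)
print(mean(ages))   # same value

# With a missing value, mean() returns NA unless NAs are removed
ages_with_na <- c(ages, NA)
print(mean(ages_with_na))                # NA
print(mean(ages_with_na, na.rm = TRUE))  # same as mean(ages)
```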

Median
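It is the middle value of the sorted data set, so it splits the data into two halves. If the number of elements in the data set is odd, the median is the center element; if it is even, the median is the average of the two central elements:

$$\text{Median} = \begin{cases} x_{((n+1)/2)} & \text{if } n \text{ is odd} \\ \frac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right) & \text{if } n \text{ is even} \end{cases}$$

where n = number of terms and x_{(i)} is the i-th value after sorting in ascending order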

R
# R program to illustrate
# Descriptive Analysis

# Import the data using read.csv()

myData <- read.csv("CardioGoodFitness.csv",
                   stringsAsFactors = FALSE)

# Compute the median value
# (stored as median_age so the built-in median() function is not masked)
median_age <- median(myData$Age)
print(median_age)

Output:
[1] 26
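The odd/even rule described above can be written out by sorting the data. The helper name `manual_median` is hypothetical and only for illustration; R's median() already does this (and, like mean(), accepts na.rm = TRUE).

```r
# Median from the definition: sort, then take the middle value(s)
manual_median <- function(x) {
  x <- sort(x)
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                  # odd n: the single middle element
  } else {
    (x[n / 2] + x[n / 2 + 1]) / 2   # even n: average of the two middle elements
  }
}

odd_ages <- c(30, 18, 25, 19, 20)       # sorted: 18 19 20 25 30
even_ages <- c(30, 18, 25, 19, 20, 22)  # sorted: 18 19 20 22 25 30
print(manual_median(odd_ages))   # 20
print(manual_median(even_ages))  # (20 + 22) / 2 = 21
```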

Mode
It is the value that occurs most frequently in the data set. A data set has no mode if all data points occur with the same frequency, and it can have more than one mode if two or more values share the highest frequency.
R
# R program to illustrate
# Descriptive Analysis

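# Import the library
# (install it first with install.packages("modeest") if needed)
library(modeest)

# Import the data using read.csv()
myData <- read.csv("CardioGoodFitness.csv",
                   stringsAsFactors = FALSE)
# Compute the mode value with mfv() ("most frequent value")
# (stored as mode_age because mode() is already a base R function)
mode_age <- mfv(myData$Age)
print(mode_age)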

Output:
[1] 25
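Base R has no function for the statistical mode: mode() returns the type of an object (for example "numeric"), which is also why naming a variable `mode` is confusing. If modeest is not available, a small helper built on table() is enough for simple cases. The name `stat_mode` is hypothetical; this sketch returns every tied value, so when all values occur equally often it returns all of them, and the "no mode" case has to be checked separately.

```r
# Hypothetical helper: most frequent value(s) using only base R
stat_mode <- function(x) {
  counts <- table(x)                              # frequency of each distinct value
  modes <- names(counts)[counts == max(counts)]   # values with the highest count
  if (is.numeric(x)) as.numeric(modes) else modes # table names are strings
}

print(stat_mode(c(18, 19, 19, 20, 25, 25, 25, 30)))  # 25
print(stat_mode(c(1, 1, 2, 2, 3)))                   # 1 2 (two modes)
print(stat_mode(c("Male", "Female", "Male")))        # "Male"
```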

Range
The range is the difference between the largest and the smallest data point in the data set. The bigger the range, the more spread out the data, and vice versa.
Range = Largest data value - Smallest data value
R
# R program to illustrate
# Descriptive Analysis

# Import the data using read.csv()

myData <- read.csv("CardioGoodFitness.csv",
                   stringsAsFactors = FALSE)

# Calculate the maximum
# (max_age, min_age and age_range avoid masking max(), min() and range())
max_age <- max(myData$Age)
# Calculate the minimum
min_age <- min(myData$Age)
# Calculate the range
age_range <- max_age - min_age

cat("Range is:\n")
print(age_range)

# Alternate method: range() returns the minimum and maximum together
r <- range(myData$Age)
print(r)

Output:
Range is:
[1] 32
[1] 18 50
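Because range() returns the minimum and maximum as a two-element vector rather than a single number, the range in the statistical sense is the difference of those two values, which diff() computes in one step. The vector below is made up for illustration.

```r
ages <- c(18, 50, 27, 33, 41)

min_max <- range(ages)   # c(18, 50): minimum and maximum
spread <- diff(min_max)  # 50 - 18 = 32: the range as a single number
print(min_max)
print(spread)
```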
Variance
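It is defined as the average squared deviation from the mean. It is calculated by finding the difference between every data point and the mean, squaring these differences, adding them all up, and then dividing by the number of data points in the data set:

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$

where,
N = number of terms
μ = Mean

Note that R's var() function computes the sample variance, which divides by N - 1 instead of N:

$$s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2$$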
R
# R program to illustrate
# Descriptive Analysis

# Import the data using read.csv()

myData <- read.csv("CardioGoodFitness.csv",
                   stringsAsFactors = FALSE)

# Calculating variance
variance = var(myData$Age)
print(variance)

Output:
[1] 48.21217
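R's var() uses the sample formula with N - 1 in the denominator, while the textbook definition of variance divides by N. The sketch below computes both versions by hand for a small made-up vector so the difference is visible; `pop_var` is just an illustrative name, since base R does not provide a population-variance function.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)
squared_dev <- (x - mean(x))^2            # squared deviations from the mean (mean is 5)

pop_var <- sum(squared_dev) / n           # divide by N:     32 / 8 = 4
sample_var <- sum(squared_dev) / (n - 1)  # divide by N - 1: 32 / 7

print(pop_var)
print(sample_var)
print(var(x))   # matches sample_var
```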

Standard Deviation
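It is defined as the square root of the variance. It is calculated by finding the mean, subtracting the mean from each number and squaring the result, adding all of these squared values, dividing by the number of terms, and finally taking the square root:

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$$

where,
N = number of terms
μ = Mean

As with var(), R's sd() function uses N - 1 in place of N (the sample standard deviation), so sd(x) equals sqrt(var(x)).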
R
# R program to illustrate
# Descriptive Analysis

# Import the data using read.csv()

myData <- read.csv("CardioGoodFitness.csv", stringsAsFactors = FALSE)

# Calculating Standard deviation

std = sd(myData$Age)
print(std)

Output:
[1] 6.943498
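The steps described above (mean, deviations, squares, sum, divide, square root) translate line by line into R. This sketch uses a small made-up vector and shows that sd() follows the same steps but divides by N - 1 instead of N.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

mu <- mean(x)                 # step 1: the mean (5)
deviations <- x - mu          # step 2: subtract the mean from each value
squares <- deviations^2       # step 3: square each difference
total <- sum(squares)         # step 4: add them up (32)
pop_sd <- sqrt(total / length(x))           # step 5: divide by N, take the root (2)
sample_sd <- sqrt(total / (length(x) - 1))  # same, but dividing by N - 1

print(pop_sd)
print(sample_sd)
print(sd(x))   # equals sample_sd
```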
R - Linear Regression
Regression analysis is a widely used statistical tool for building a model of the relationship between two variables. One of these variables, called the predictor variable, has values that are gathered through experiments. The other, called the response variable, has values that are derived from the predictor variable.

In linear regression, these two variables are related through an equation in which the exponent (power) of both variables is 1. Mathematically, a linear relationship is a straight line when plotted as a graph. A non-linear relationship, where the exponent of any variable is not equal to 1, creates a curve.

The general mathematical equation for a linear regression is −

y = ax + b

Following is the description of the parameters used −

• y is the response variable.
• x is the predictor variable.
• a and b are constants called the coefficients: a is the slope and b is the intercept.

Steps to Establish a Regression

A simple example of regression is predicting the weight of a person when their height is known. To do this, we need to know the relationship between height and weight.

The steps to create the relationship are −

• Carry out an experiment to gather a sample of observed heights and the corresponding weights.
• Create a relationship model using the lm() function in R.
• Find the coefficients from the model and create the mathematical equation using them.
• Get a summary of the relationship model to see the errors in prediction, also called residuals.
• To predict the weight of new persons, use the predict() function in R.

Input Data

Below is the sample data representing the observations −

# Values of height
151, 174, 138, 186, 128, 136, 179, 163, 152, 131

# Values of weight.
63, 81, 56, 91, 47, 57, 76, 72, 62, 48

lm() Function
This function creates the relationship model between the predictor and the response variable.

Syntax

The basic syntax for lm() function in linear regression is −

lm(formula,data)

Following is the description of the parameters used −

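• formula is a symbolic description of the relationship between x and y, such as y ~ x.

• data is the data frame containing the variables used in the formula. It is optional when the variables already exist in the workspace, as in the examples below.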

Create Relationship Model & get the Coefficients

x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Apply the lm() function.

relation <- lm(y~x)

print(relation)

When we execute the above code, it produces the following result −

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept) x
-38.4551 0.6746
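For a single predictor, the least-squares coefficients have a closed form: the slope is a = cov(x, y) / var(x) and the intercept is b = mean(y) - a * mean(x). The sketch below recomputes them for the same height/weight data and compares them with coef(); it is a check on the numbers, not a replacement for lm().

```r
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Closed-form least-squares estimates for y = a*x + b
a <- cov(x, y) / var(x)      # slope
b <- mean(y) - a * mean(x)   # intercept

relation <- lm(y ~ x)
print(c(intercept = b, slope = a))
print(coef(relation))        # (Intercept) and x should match b and a
```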

Get the Summary of the Relationship

x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Apply the lm() function.

relation <- lm(y~x)

print(summary(relation))

When we execute the above code, it produces the following result −

Call:
lm(formula = y ~ x)

Residuals:
Min 1Q Median 3Q Max
-6.3002 -1.6629 0.0412 1.8944 3.9775

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -38.45509 8.04901 -4.778 0.00139 **
x 0.67461 0.05191 12.997 1.16e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.253 on 8 degrees of freedom
Multiple R-squared: 0.9548, Adjusted R-squared: 0.9491
F-statistic: 168.9 on 1 and 8 DF, p-value: 1.164e-06
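Individual numbers from this summary can be extracted programmatically instead of being read off the printout. For simple linear regression, Multiple R-squared equals the squared correlation between x and y, which the sketch checks; `model_summary` is just an illustrative variable name.

```r
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y ~ x)
model_summary <- summary(relation)

r_squared <- model_summary$r.squared   # Multiple R-squared
sigma_hat <- model_summary$sigma       # Residual standard error
coef_table <- coef(model_summary)      # Estimate, Std. Error, t value, Pr(>|t|)
res <- residuals(relation)             # one residual per observation

print(r_squared)
print(cor(x, y)^2)                     # same value for a single predictor
print(sigma_hat)
print(coef_table)
print(quantile(res))                   # Min, 1Q, Median, 3Q, Max of the residuals
```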

predict() Function
Syntax

The basic syntax for predict() in linear regression is −

predict(object, newdata)

Following is the description of the parameters used −

• object is the model already created using the lm() function.
• newdata is a data frame containing the new values of the predictor variable; its column name must match the predictor in the formula (x here).

Predict the weight of new persons

# The predictor vector.

x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)

# The response vector.

y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Apply the lm() function.

relation <- lm(y~x)

# Find weight of a person with height 170.

a <- data.frame(x = 170)
result <- predict(relation,a)
print(result)

When we execute the above code, it produces the following result −

1
76.22869
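Because the fitted model is a straight line, predict() is simply evaluating b + a * x at the new height. The sketch checks this for height 170 and shows that newdata can hold several heights at once; the column must be named x to match the formula. The heights 150 and 185 are arbitrary examples.

```r
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y ~ x)

# Manual prediction: intercept + slope * height
coefs <- coef(relation)
manual <- coefs[["(Intercept)"]] + coefs[["x"]] * 170
print(manual)   # same as predict(relation, data.frame(x = 170))

# Several new heights at once; the column name must match the predictor
new_people <- data.frame(x = c(150, 170, 185))
print(predict(relation, new_people))
```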

Visualize the Regression Graphically

# Create the predictor and response variable.

x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
relation <- lm(y ~ x)

# Give the chart file a name.

png(file = "linearregression.png")

# Plot the data: height (predictor) on the x-axis, weight (response) on the y-axis.

plot(x, y, col = "blue", main = "Height & Weight Regression",
     cex = 1.3, pch = 16, xlab = "Height in cm", ylab = "Weight in Kg")

# Add the fitted regression line from the model.

abline(relation)

# Save the file.

dev.off()

When we execute the above code, the chart is saved to the file linearregression.png.

R - Normal Distribution
In a random collection of data from independent sources, the distribution of the data is often observed to be approximately normal. This means that, when we plot a graph with the value of the variable on the horizontal axis and the count of the values on the vertical axis, we get a bell-shaped curve. The center of the curve represents the mean of the data set. Fifty percent of the values lie to the left of the mean and the other fifty percent lie to the right of it. This is referred to as the normal distribution in statistics.

R has four built-in functions for working with the normal distribution. They are described below.

dnorm(x, mean, sd)

pnorm(x, mean, sd)
qnorm(p, mean, sd)
rnorm(n, mean, sd)

Following is the description of the parameters used in the above functions −

• x is a vector of numbers.
• p is a vector of probabilities.
• n is the number of observations (sample size).
• mean is the mean of the distribution. Its default value is 0.
• sd is the standard deviation. Its default value is 1.

dnorm()
This function gives the height of the probability density curve at each point for a given mean and standard deviation.

# Create a sequence of numbers between -10 and 10 incrementing by 0.1.
x <- seq(-10, 10, by = .1)

# Choose the mean as 2.5 and standard deviation as 0.5.

y <- dnorm(x, mean = 2.5, sd = 0.5)

# Give the chart file a name.

png(file = "dnorm.png")

plot(x,y)

# Save the file.

dev.off()

When we execute the above code, the plot is saved to the file dnorm.png.
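dnorm() evaluates the normal probability density f(x) = 1 / (sd * sqrt(2 * pi)) * exp(-(x - mean)^2 / (2 * sd^2)). Writing the formula out by hand (`normal_density` is a hypothetical helper name) and comparing it with dnorm() is a quick way to see exactly what the function returns.

```r
# Normal density written out from the formula
normal_density <- function(x, mean = 0, sd = 1) {
  1 / (sd * sqrt(2 * pi)) * exp(-(x - mean)^2 / (2 * sd^2))
}

x <- seq(0, 5, by = 0.5)
print(all.equal(normal_density(x, mean = 2.5, sd = 0.5),
                dnorm(x, mean = 2.5, sd = 0.5)))   # TRUE

# The density is highest at the mean: 1 / (0.5 * sqrt(2 * pi)), about 0.798
print(dnorm(2.5, mean = 2.5, sd = 0.5))
```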

pnorm()
This function gives the probability that a normally distributed random number is less than or equal to a given number. It is also called the "Cumulative Distribution Function" (CDF).

# Create a sequence of numbers between -10 and 10 incrementing by 0.2.
x <- seq(-10,10,by = .2)

# Choose the mean as 2.5 and standard deviation as 2.

y <- pnorm(x, mean = 2.5, sd = 2)

# Give the chart file a name.

png(file = "pnorm.png")

# Plot the graph.

plot(x,y)

# Save the file.

dev.off()

When we execute the above code, the plot is saved to the file pnorm.png.
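Because pnorm() returns P(X <= x), the probability of falling between two values is the difference of two pnorm() calls, and the upper tail P(X > x) is available through lower.tail = FALSE. The numbers below use the standard normal (mean 0, sd 1) and the mean 2.5, sd 2 parameters from the example above; the cut-off 6 is arbitrary.

```r
# P(-1 <= Z <= 1) for a standard normal: about 0.6827
within_one_sd <- pnorm(1) - pnorm(-1)
print(within_one_sd)

# Half of the distribution lies below the mean
print(pnorm(2.5, mean = 2.5, sd = 2))   # 0.5

# Upper tail P(X > 6), computed two ways
upper1 <- 1 - pnorm(6, mean = 2.5, sd = 2)
upper2 <- pnorm(6, mean = 2.5, sd = 2, lower.tail = FALSE)
print(c(upper1, upper2))
```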

qnorm()
This function takes a probability value and returns the number whose cumulative probability matches it; in other words, it is the inverse of pnorm().

# Create a sequence of probability values incrementing by 0.02.

x <- seq(0, 1, by = 0.02)

# Choose the mean as 2 and standard deviation as 1.

y <- qnorm(x, mean = 2, sd = 1)

# Give the chart file a name.

png(file = "qnorm.png")

# Plot the graph.

plot(x,y)

# Save the file.

dev.off()
When we execute the above code, the plot is saved to the file qnorm.png.
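Since qnorm() is the inverse of pnorm(), feeding a probability from pnorm() back into qnorm() recovers the original value. Two practical notes: qnorm(0.975) is the familiar 1.96 used for 95% intervals, and the probabilities 0 and 1 map to -Inf and Inf, which plot() cannot draw. The value 3.2 is an arbitrary example.

```r
# qnorm() undoes pnorm()
p <- pnorm(3.2, mean = 2, sd = 1)
print(qnorm(p, mean = 2, sd = 1))   # 3.2

# Standard normal quantile used for 95% intervals
print(qnorm(0.975))                 # about 1.959964

# Probabilities 0 and 1 give infinite quantiles
print(qnorm(c(0, 1)))               # -Inf Inf
```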

rnorm()
This function is used to generate random numbers whose distribution is normal. It takes the sample size as input
and generates that many random numbers. We draw a histogram to show the distribution of the generated
numbers.

# Create a sample of 50 numbers which are normally distributed.

y <- rnorm(50)

# Give the chart file a name.

png(file = "rnorm.png")

# Plot the histogram for this sample.

hist(y, main = "Normal Distribution")

# Save the file.

dev.off()

When we execute the above code, the histogram is saved to the file rnorm.png.
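rnorm(50) uses the defaults mean = 0 and sd = 1. The sketch below draws from a normal distribution with a chosen mean and standard deviation (170 and 10 are arbitrary choices) and calls set.seed() so that the same "random" numbers are produced on every run; the seed value 42 is also arbitrary.

```r
set.seed(42)
heights <- rnorm(1000, mean = 170, sd = 10)

# With 1000 draws, the sample mean and sd land close to the chosen values
print(mean(heights))
print(sd(heights))

# Re-using the same seed reproduces exactly the same draws
set.seed(42)
first <- rnorm(5)
set.seed(42)
second <- rnorm(5)
print(identical(first, second))   # TRUE
```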

Binomial Distribution in R Programming
 ++++



Binomial distribution in R is a probability distribution used in statistics. The binomial distribution is a discrete distribution in which each trial has only two possible outcomes: success or failure. The trials are independent, so the outcome of one trial does not affect the next, and the probability of success stays the same in every trial. The binomial distribution helps us find individual probabilities as well as cumulative probabilities over a certain range.
It is also used in many real-life scenarios, such as determining whether a particular lottery ticket has won or not, whether a drug is able to cure a person or not, the number of heads or tails in a finite number of coin tosses, or how often a particular face comes up when a die is rolled.
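Formula:

$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$$

where n is the number of trials, k is the number of successes and p is the probability of success in each trial.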
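The binomial probability P(X = k) = C(n, k) * p^k * (1 - p)^(n - k) can be evaluated directly with choose(), R's binomial coefficient function. The sketch compares it with dbinom() for k = 3, n = 13, p = 1/6 (three sixes in 13 rolls of a fair die), the same numbers as the dbinom() example, and checks that the probabilities over all k add up to 1.

```r
n <- 13      # number of trials (die rolls)
k <- 3       # number of successes (sixes)
p <- 1 / 6   # probability of success on each trial

# P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
manual <- choose(n, k) * p^k * (1 - p)^(n - k)
print(manual)                          # about 0.2138454
print(dbinom(k, size = n, prob = p))   # same value

# The probabilities over all possible k = 0..n sum to 1
print(sum(dbinom(0:n, size = n, prob = p)))
```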

Functions for Binomial Distribution

We have four functions for handling the binomial distribution in R, namely:
• dbinom()
dbinom(k, n, p)
• pbinom()
pbinom(k, n, p)
where n is the total number of trials, p is the probability of success, and k is the value at which the probability is to be found. In R's own argument names these are x (or q), size and prob, which the examples below use.
• qbinom()
qbinom(P, n, p)
where P is the probability, n is the total number of trials and p is the probability of success.
• rbinom()
rbinom(n, N, p)
where n is the number of observations, N is the total number of trials and p is the probability of
success.
dbinom() Function
This function is used to find the probability at a particular value for data that follows a binomial distribution, i.e. it finds:
P(X = k)
Syntax:
dbinom(k, n, p)
Example:
dbinom(3, size = 13, prob = 1 / 6)
probabilities <- dbinom(x = c(0:10), size = 10, prob = 1 / 6)
data.frame(probabilities)
plot(0:10, probabilities, type = "l")

Output :
> dbinom(3, size = 13, prob = 1/6)
[1] 0.2138454
> probabilities = dbinom(x = c(0:10), size = 10, prob = 1/6)
> data.frame(probabilities)
probabilities
1 1.615056e-01
2 3.230112e-01
3 2.907100e-01
4 1.550454e-01
5 5.426588e-02
6 1.302381e-02
7 2.170635e-03
8 2.480726e-04
9 1.860544e-05
10 8.269086e-07
11 1.653817e-08

The above code first finds the probability at k = 3 (with n = 13), then displays a data frame containing the probability distribution for k from 0 to 10 (with n = 10), i.e. every value from 0 to n. Row i of the data frame corresponds to k = i - 1.
pbinom() Function
The function pbinom() is used to find the cumulative probability of data following a binomial distribution up to and including a given value, i.e. it finds
P(X <= k)
Syntax:
pbinom(k, n, p)
Example:
pbinom(3, size = 13, prob = 1 / 6)
plot(0:10, pbinom(0:10, size = 10, prob = 1 / 6), type = "l")

Output :
> pbinom(3, size = 13, prob = 1/6)
[1] 0.8419226
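Since pbinom() is cumulative, it equals the sum of the individual dbinom() probabilities from 0 to k. For "more than k" questions, use lower.tail = FALSE (or 1 - pbinom()). The values below use the same n = 13, p = 1/6 example.

```r
# P(X <= 3) as the sum P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)
cumulative <- sum(dbinom(0:3, size = 13, prob = 1 / 6))
print(cumulative)                               # about 0.8419226
print(pbinom(3, size = 13, prob = 1 / 6))       # same value

# P(X > 3): more than three sixes in 13 rolls
print(pbinom(3, size = 13, prob = 1 / 6, lower.tail = FALSE))
print(1 - pbinom(3, size = 13, prob = 1 / 6))   # same value
```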

qbinom() Function
This function is used to find quantiles: given a probability P, it returns the smallest k such that P(X <= k) >= P.
Syntax:
qbinom(P, n, p)
Example:
qbinom(0.8419226, size = 13, prob = 1 / 6)
x <- seq(0, 1, by = 0.1)
y <- qbinom(x, size = 13, prob = 1 / 6)
plot(x, y, type = 'l')

Output :
> qbinom(0.8419226, size = 13, prob = 1/6)
[1] 3
rbinom() Function
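This function generates n random values from a binomial distribution with N trials and probability of success p. Because the values are random, the output will differ from run to run unless set.seed() is called first.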
Syntax:
rbinom(n, N, p)
Example:
rbinom(8, size = 13, prob = 1 / 6)
hist(rbinom(8, size = 13, prob = 1 / 6))

Output:
> rbinom(8, size = 13, prob = 1/6)
[1] 1 1 2 1 4 0 2 3
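A binomial variable with N trials and success probability p has mean N * p and variance N * p * (1 - p). Simulating many draws with rbinom() (after set.seed() for reproducibility) shows the sample statistics settling near these values; the 10,000 draws and the seed 1 are arbitrary choices.

```r
set.seed(1)
draws <- rbinom(10000, size = 13, prob = 1 / 6)

print(mean(draws))              # close to 13 * 1/6, about 2.17
print(13 * 1 / 6)
print(var(draws))               # close to 13 * 1/6 * 5/6, about 1.81
print(13 * (1 / 6) * (5 / 6))
```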
Poisson distribution in R
The Poisson distribution is a discrete distribution that counts the number of events in a Poisson
process. In this tutorial we will review the dpois, ppois, qpois and rpois functions to work with
the Poisson distribution in R.

Poisson distribution
dpois, ppois, qpois and rpois in R
Here are some examples of cases where you might use each of these functions.
dpois
The dpois function finds the probability that exactly a certain number of successes occurs, based on an average rate of success, using the following syntax:
dpois(x, lambda)
where:
• x: number of successes
• lambda: average rate of success
Here’s an example of when you might use this function in practice:
It is known that a certain website makes 10 sales per hour. In a given hour,
what is the probability that the site makes exactly 8 sales?
dpois(x=8, lambda=10)

#0.112599

The probability that the site makes exactly 8 sales is 0.112599.
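dpois() evaluates the Poisson probability P(X = x) = lambda^x * exp(-lambda) / x!. Writing it out for the website example (lambda = 10 sales per hour, x = 8) reproduces the value above.

```r
lambda <- 10   # average sales per hour
x <- 8         # exact number of sales

# P(X = x) = lambda^x * exp(-lambda) / x!
manual <- lambda^x * exp(-lambda) / factorial(x)
print(manual)                      # about 0.112599
print(dpois(x, lambda = lambda))   # same value
```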

ppois
The ppois function finds the probability that a certain number of successes or fewer occurs, based on an average rate of success, using the following syntax:
ppois(q, lambda)
where:
• q: number of successes
• lambda: average rate of success
Here are a couple of examples of when you might use this function in practice:
It is known that a certain website makes 10 sales per hour. In a given hour,
what is the probability that the site makes 8 sales or less?
ppois(q=8, lambda=10)

#0.3328197
The probability that the site makes 8 sales or less in a given hour is 0.3328197.
It is known that a certain website makes 10 sales per hour. In a given hour,
what is the probability that the site makes more than 8 sales?
1 - ppois(q=8, lambda=10)

#0.6671803
The probability that the site makes more than 8 sales in a given hour is 0.6671803.
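ppois() also accepts lower.tail = FALSE, which returns P(X > q) directly instead of computing 1 - ppois(). The cumulative value itself is just the sum of dpois() over 0 to q; both checks use the same lambda = 10 example.

```r
lambda <- 10

# P(X <= 8) as the sum of P(X = 0), ..., P(X = 8)
print(sum(dpois(0:8, lambda = lambda)))              # about 0.3328197
print(ppois(8, lambda = lambda))

# P(X > 8), computed two ways
print(1 - ppois(8, lambda = lambda))                 # about 0.6671803
print(ppois(8, lambda = lambda, lower.tail = FALSE)) # same value
```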
qpois
The qpois function finds the number of successes that corresponds to a certain
percentile based on an average rate of success, using the following syntax:
qpois(p, lambda)
where:
• p: percentile
• lambda: average rate of success
Here’s an example of when you might use this function in practice:
It is known that a certain website makes 10 sales per hour. How many sales
would the site need to make to be at the 90th percentile for sales in an
hour?
qpois(p=.90, lambda=10)

#14
A site would need to make 14 sales to be at the 90th percentile for number of
sales in an hour.
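For a discrete distribution, qpois() returns the smallest count whose cumulative probability reaches the requested percentile. Checking ppois() on either side of the answer makes that concrete for the lambda = 10 example.

```r
lambda <- 10
k <- qpois(0.90, lambda = lambda)   # 14

print(ppois(k - 1, lambda = lambda))   # P(X <= 13): still below 0.90
print(ppois(k, lambda = lambda))       # P(X <= 14): at least 0.90
```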
rpois
The rpois function generates a list of random variables that follow a Poisson
distribution with a certain average rate of success, using the following syntax:
rpois(n, lambda)
where:
• n: number of random variables to generate
• lambda: average rate of success
Here’s an example of when you might use this function in practice:
Generate a list of 15 random variables that follow a Poisson distribution
with a rate of success equal to 10.
rpois(n=15, lambda=10)

# [1] 13 8 8 20 8 10 8 10 13 10 12 8 10 10 6
Since these numbers are generated randomly, the rpois() function will produce
different numbers each time. If you want to create a reproducible example, be
sure to use the set.seed() command.
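A distinctive property of the Poisson distribution is that its mean and its variance both equal lambda. Simulating many values with rpois() after set.seed() (the seed 123 and the sample size are arbitrary) shows both sample statistics close to lambda = 10, and rerunning with the same seed gives identical numbers.

```r
set.seed(123)
sales <- rpois(n = 10000, lambda = 10)

print(mean(sales))   # close to 10
print(var(sales))    # also close to 10

# Same seed, same random numbers
set.seed(123)
again <- rpois(n = 10000, lambda = 10)
print(identical(sales, again))   # TRUE
```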

Source: POISSON Distribution in R [dpois, ppois, qpois and rpois functions] (r-coder.com)
