Computational Statistics
Setia Pramana
2015
Course Outline
• Introduction
• Different Statistical Software
• Data Preparation, Management, Manipulation, and Summarization with:
  ◦ SPSS
  ◦ R (R Commander)
  ◦ MS Excel
• Data Tabulation and Visualization
• Generating Different Statistical Distributions (with Rcmdr)
• Simple Linear Regression and Correlation
• Basic R Programming
• Developing a Simple Graphical User Interface in R
• Resampling Methods
• Statistical Inference (Point and Interval Estimation)
• Hypothesis testing: one- and two-sample t-tests (tests for mean difference, proportion, and variance)
• Analysis of Variance (ANOVA): one-way and two-way ANOVA
• Introduction to Design of Experiments
• Final Project
Course Workload
• 20% theory, 80% practice
• Group project (5 students)
• Presentation every week
• R code will be provided
Slides are available at:
http://www.slideshare.net/hafidztio/
Reference Books
• John Maindonald and W. John Braun. Data Analysis and Graphics Using R: An Example-Based Approach. 3rd edition. Cambridge University Press: Cambridge. 2010.
• John Fox. The R Commander: A Basic-Statistics Graphical User Interface to R. Journal of Statistical Software, Volume 14, Issue 9, September 2005.
• Chris Beeley. Web Application Development with R Using Shiny. Packt Publishing: Birmingham. 2013.
• SPSS Statistics Base User's Guide 17.0. Polar Engineering and Consulting: Chicago. 2007.
• Jurusan Komputasi Statistik (Department of Computational Statistics). Modul Mata Kuliah Komputasi Statistik (Computational Statistics Course Module). 2014.
• G. Jay Kerns. Introduction to Probability and Statistics Using R. E-book. GNU Free Documentation License. 2010.
• Geof H. Givens and Jennifer A. Hoeting. Computational Statistics. 2nd edition. John Wiley and Sons: New Jersey. 2013.
• Jochen Voss. Statistical Computing. E-book. 2011.
• Brent B. Welch, Ken Jones, and Jeffrey Hobbs. Practical Programming in Tcl and Tk. 4th edition. Prentice Hall PTR: New Jersey. 2003.
Other Materials
• https://sites.google.com/site/biostatinfocore/home/rworkshop
• https://sites.google.com/site/biostatinfocore/biostatistics-workshop
Introduction
Statistics?
What is Statistics?
• Statistics is the science that deals with the collection, classification, and tabulation of numerical facts as the basis for the explanation, description, and comparison of phenomena.
Observations on the Bills of Mortality (1662)
Recorded plague-related deaths for 100 years
What is Statistics?
• Exploring data: using graphical and numerical techniques to study patterns and departures from patterns (in order to interpret data).
• Sampling and experimentation: clarifying the question and deciding on methods of collection and analysis to produce valid information.
• Anticipating patterns: exploring random phenomena using probability and simulation. Probability is our tool for anticipating distributions.
• Statistical inference: estimating population parameters and testing hypotheses.
"Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write." (H. G. Wells)
Areas of Statistics
Two areas of statistics:
• Descriptive Statistics: collection, presentation, and description of sample data.
• Inferential Statistics: making decisions and drawing conclusions about populations.
Descriptive Statistics
What is your conclusion?
The fatality rate is:
• 40% in the group of drivers who did not wear seat belts
• 20% in the group of drivers who did wear seat belts
• Seat belts appear to save lives
Inferential Statistics
• Are the results applicable to the population of all drivers? (generalization)
• Does wearing seat belts save lives? (assess strength of evidence)
• Is the fatality rate of those not wearing seat belts higher than the fatality rate of those wearing seat belts? (comparison)
• How many lives can be saved by wearing seat belts? (prediction)
• Do other variables influence the conclusion? For example: the age of the driver, alcohol use, type of car, speed at impact (ask more questions)
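The seat-belt example can be sketched in R to show the step from description to inference. The slides give only the 40% and 20% rates, so the group sizes below (100 drivers each) are hypothetical and chosen only to reproduce those rates; `prop.test()` is one possible way to address the comparison question, not necessarily the method used in the course.

```r
# Hypothetical counts: the slides report only the rates, not the sample sizes.
deaths  <- c(no_belt = 40, belt = 20)     # fatalities in each group (assumed)
drivers <- c(no_belt = 100, belt = 100)   # drivers in each group (assumed)

# Descriptive statistics: summarise the sample.
fatality_rate <- deaths / drivers
print(fatality_rate)

# Inferential statistics: is the fatality rate without seat belts
# higher than the rate with seat belts in the population of drivers?
test <- prop.test(deaths, drivers, alternative = "greater")
print(test$p.value)
print(test$conf.int)   # one-sided interval for the difference in proportions
```

A small p-value would support the comparison claim, but it says nothing about the other variables (age, alcohol use, speed) listed above.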
Statistics and Technology
• Electronic technology has had a tremendous effect on the field of statistics.
• Many statistical techniques are repetitive in nature: computers and calculators are good at this.
• Many statistical software packages are available: R, MINITAB, SYSTAT, STATA, SAS, Statgraphics, SPSS, MS Excel, as well as calculators.
Available Statistical Packages
Proprietary
• Excel
• SPSS
• MINITAB
• SAS
• Stata
• Statistica
• Many more...
Free Software
• LibreOffice Calc
• R
• CSPro
• WinBUGS
• Epi Info
• Many more...
Microsoft Excel
Which one do you use?
Why?
Statistical Software Used
R is HOT!
http://r4stats.com/articles/popularity/
What is R?
• A language and environment for statistical computing and graphics.
• An integrated suite of software facilities for data manipulation, calculation, and graphical display.
• Created by Ross Ihaka and Robert Gentleman of the University of Auckland, NZ, who described it in a 1996 paper.
• GNU software -> free. Similar to the S language.
• Open source, maintained and developed by a community of developers.
• Runs on Windows, Unix, and macOS.
R includes
• An effective data handling and storage facility
• A suite of operators for calculations on arrays, in particular matrices
• A large, coherent, integrated collection of intermediate tools for data analysis
• Graphical facilities for data analysis and display, either on-screen or on hardcopy
• A well-developed, simple, and effective programming language that includes conditionals, loops, user-defined recursive functions, and input and output facilities
http://www.r-project.org/
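A few of the language features listed above can be seen in a short sketch. The variable names and the `factorial_rec()` function are illustrative only and do not come from the course material.

```r
# Operators work on whole arrays and matrices.
m <- matrix(1:4, nrow = 2)   # 2 x 2 matrix, filled column by column
m_squared <- m %*% m         # matrix product
m_doubled <- m * 2           # element-wise arithmetic

# A user-defined recursive function with a conditional.
factorial_rec <- function(n) {
  if (n <= 1) 1 else n * factorial_rec(n - 1)
}

# A loop that accumulates 1! + 2! + ... + 5!.
total <- 0
for (i in 1:5) {
  total <- total + factorial_rec(i)
}

# Output facility: write the result to the console.
cat("Sum of 1! to 5! =", total, "\n")
```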
Why R?
• It is not only statistical software but also a language
• 5000 add-on packages -> lots of pre-prepared packages (http://cran.r-project.org/web/packages/)
• Many applications: http://cran.r-project.org/web/views/, http://www.revolutionanalytics.com/rlanguage-features-applications-andextensions#thirdparty
• Access to powerful, cutting-edge analytics
Why R?
• Flexible (complex or standard statistical practices, Bayesian modelling, GIS map building, building interactive web applications, building interactive tests, etc.)
• We can make our own package and publish it
• Great graphics and data visualization
• Can be used on high-performance computing clusters
• Well supported by the R community (http://www.inside-r.org/rresources-web)
• And many more...
Why R?
• Can be integrated with other languages (C/C++, Java).
• R can interact with many data sources and other statistical packages (SAS, Stata, SPSS, and Minitab).
• For high-performance computing tasks -> R can use multiple cores, either on a single machine or across a network.
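As a minimal sketch of the multi-core point, the `parallel` package that ships with R can spread a repetitive task over several worker processes. The bootstrap task, the `boot_mean()` helper, and the choice of two workers are assumptions made for illustration.

```r
library(parallel)   # ships with base R

# Hypothetical task: the mean of one bootstrap resample of the data.
set.seed(1)
x <- rnorm(1000)
boot_mean <- function(i, data) mean(sample(data, replace = TRUE))

# Start two worker processes on this machine (assumed to be available).
cl <- makeCluster(2)
clusterSetRNGStream(cl, 123)   # reproducible random numbers on the workers

# Run 200 resamples, split across the workers.
boot_means <- parSapply(cl, 1:200, boot_mean, data = x)
stopCluster(cl)                # always release the workers

# Bootstrap estimate of the standard error of the mean.
print(sd(boot_means))
```

The same code runs on one core by replacing `parSapply(cl, ...)` with `sapply(...)`, which makes it easy to check that the parallel version gives comparable results.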
But...
• R has no warranty
• Command-line interface: difficult for some users.
• Users must learn a new way of thinking about data and the sequence of data analysis
• That's all... I guess
Companies using R in 2013
• The New York Times routinely uses R for interactive and print data visualization.
• Google has more than 500 R users.
• The FDA supports the use of R for clinical trials of new drugs.
• The National Weather Service uses R to predict the extent of flooding events.
• Zillow uses R to model housing prices.
• The Consumer Financial Protection Bureau uses R and other open source tools.
• Twitter uses R for data science applications on the Twitter database.
• FourSquare uses R to develop its recommendation engine.
• Facebook uses R to model all sorts of user behaviour.
Source: Revolution Analytics
R Library/packages
R Base Packages
Add-on packages: IsoGene, nlme, lme4, foreign, zoo, survival, reshape2, ggplot2
My R Packages
• IsoGene
• IsoGeneGUI
• nea
• neaGUI
• biclustGUI
• OCRME
More detail: http://setiopramono.wordpress.com/rprogramming/
R for Cutting-Edge Technologies
R Graphics and Visualization
R provides a wide range of graphics and visualizations:
• Basic plots: bar plots, basic 3D plots, heatmaps, etc.
• Geographic maps
• Projection maps
• Social network graphs
• Animated graphics and movies (animation)
• Motion charts (googleVis)
• Interactive graphics (rggobi)
• Image formats: BMP, JPEG, PDF, PNG, etc.
• And many more
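As a small illustration of a basic plot and of the image formats, the sketch below draws one bar plot and writes it to both a PNG and a PDF file. The file names and the use of the built-in `mtcars` data set are choices made for this example only.

```r
# Output files in a temporary directory (names are illustrative).
png_file <- file.path(tempdir(), "barplot.png")
pdf_file <- file.path(tempdir(), "barplot.pdf")

# Count cars by number of cylinders in the built-in mtcars data set.
counts <- table(mtcars$cyl)

# Open a PNG device, draw the plot, then close the device to write the file.
png(png_file, width = 600, height = 400)
barplot(counts, main = "Cars by number of cylinders",
        xlab = "Cylinders", ylab = "Count")
dev.off()

# The same plot as a PDF; only the device function changes.
pdf(pdf_file, width = 6, height = 4)
barplot(counts, main = "Cars by number of cylinders",
        xlab = "Cylinders", ylab = "Count")
dev.off()
```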
R Graphics
RCircos
https://gjabel.wordpress.com/
A map of worldwide email traffic
Facebook connections between city centers around the world
R Graphical User Interfaces
• R uses a command-line interface, which advanced users prefer -> it allows direct control and is more accurate, flexible, and reproducible.
• It requires good knowledge of the language -> difficult for beginners or infrequent users.
• R provides tools for building GUIs -> RGUI
R GUI Projects
• Integrated development environments (IDEs) and script editors that aim to provide feature-rich environments for editing R scripts and code: RStudio (www.rstudio.com) and Architect (www.openanalytics.eu)
• Web-based applications: Rweb (Banfield, 1999), R.Net (www.u.arizona.edu/~ryckman/Net.php), or gWidgetsWWW (Verzani, 2012).
• Python: OpenMeta-Analyst (Wallace et al., 2012)
• Java: JGR (Java GUI for R), Deducer (Fellows, 2012), and Glotaran (Snellenburg, 2012).
• PHP: R-php (http://dssm.unipa.it/R-php/)
• Other extensions connect R to graphical toolkits for developing menus and dialog boxes: Tcl/Tk, GTK.
RStudio
• Download from rstudio.com
• Powerful IDE (Integrated Development Environment) for R.
RGUI Developed using tcltk
RGUI: R Commander
• Rcommander.com
• Helpful for R beginners
• Installed from within R
RGUI using C#: Wires
• Developed by STIS students
• For spatial data analysis
• Still under development
RGUI: Web Based App
WebBUGS
• Conducting Bayesian statistical analysis online
• Combines OpenBUGS and R
www.webbugs.psychstat.org
RGUI: Shiny
• A new package from RStudio for building interactive web applications with R.
• Really easy!
• Build useful web applications with only a few lines of code; no JavaScript required.
• Self-learning: http://shiny.rstudio.com/
• http://www.showmeshiny.com/
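To give a sense of "only a few lines of code", here is a minimal sketch of a Shiny app. It assumes the shiny package is installed; the input name `n`, the output name `hist`, and the app itself are made up for this illustration and are not taken from the course material.

```r
library(shiny)   # assumes the shiny package is installed

# User interface: a title, a slider, and a placeholder for a plot.
ui <- fluidPage(
  titlePanel("Histogram of random normal values"),
  sliderInput("n", "Sample size:", min = 10, max = 1000, value = 100),
  plotOutput("hist")
)

# Server logic: redraw the histogram whenever the slider value changes.
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = NULL, xlab = "Value")
  })
}

# Combine both parts into an app object.
app <- shinyApp(ui = ui, server = server)
```

Calling `runApp(app)` in an interactive R session should start the app and open it in a browser.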
RGUI using Shiny: FAST
Figure 5. FAST main page
Dynamic Report Generation
• Sweave
• knitr
• markdown
Want to Learn R? Need Help?
• Lots of self-learning resources: http://www.rdatamining.com/resources/onlinedocs
• Blogs:

  Software   # Blogs   Blogs Source
  R          550       R-Bloggers.com
  Python     60        SciPy.org
  SAS        40        PROC-X.com, sasCommunity.org Planet
  Stata      11        Stata-Bloggers.com

• User groups: Stockholm R User Group, etc.
  Indonesia/Jakarta?
https://sites.google.com/site/biostatinfocore/introduction-to-r
Need Help?
Number of R- or SAS-related posts to Stack Overflow by week.