
Welcome to Stats 10,

Lecture 1.1
Thomas Maierhofer
Math Sciences 8935
maierhofer@stat.ucla.edu

Based on Rob Gould’s material


“If your experiment needs statistics, you ought to
have done a better experiment.”
Ernest Rutherford
Goals for the Week

• Why is it important to know how to work with data?

• Handling data

• Collecting data

• Visualizing variability
Topics for Today

• Overview of Statistics

• Class logistics

• Handling data

• Types of studies
What’s easier to predict?
• A) The date of the next major earthquake in California

• B) The time and location of the next total solar eclipse

• A) The outcome of a soccer game

• B) The outcome of a chess game


Science of Variability

• Statistical reasoning is necessary when we’re confronted with variability.

• Variability creates uncertainty.

• Statisticians have developed ways of describing trends and patterns in the face of variability, and of quantifying uncertainty.
Age of Big Data

• Data automatically collected in vast volume

• Often by machines/sensors (including “human sensors”)

• Creates opportunities

• But also dangers


You leave a "data trail"
• school records
• bank transactions
• CCTV
• Google searches
• Facebook postings
• Credit card transactions
• Grocery store cards
Talk to the person next to you and together come up with a list of 10 times in the last week you’ve had data collected about you.

Who or what collected it, and why do you think they collected it?
• When visiting web pages: “If you’re not paying
money, then *you* are the product being sold.”
“If a poor student can’t get a loan because a lending model deems him too risky (by virtue of his zip code), he’s then cut off from the kind of education that could pull him out of poverty, and a vicious spiral ensues.”
Data

• Data are produced by people, machines, sensors, computers, phones
• Datasets can be very large
• Require organization to find the message in the data
• Suppose I know three things about you: your ZIP code, your date of birth, and your gender.

• What percentage of the population of the United States can be uniquely identified by those three bits of information?
Go to kahoot.it or download the Kahoot app

A) 0%-40%
B) 40%-60%
C) 60%-80%
D) 80%-100%

Clicker
87%
from the PhD dissertation of Latanya Sweeney, who is now director of Harvard’s Data Privacy Lab

Check your own identifiability at https://aboutmyinfo.org/identity
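To make “uniquely identified” concrete, here is a minimal Python sketch on a small made-up table. The records are hypothetical, not real data, and the function name `fraction_unique` is our own. It counts the share of rows whose combination of ZIP code, date of birth, and gender appears exactly once. It does not reproduce the 87% estimate. It only shows how combining variables makes people easier to single out.

```python
from collections import Counter

# Hypothetical toy records: (ZIP code, date of birth, gender). Not real data.
records = [
    ("90024", "2003-04-17", "F"),
    ("90024", "2003-04-17", "F"),
    ("90024", "2002-11-02", "M"),
    ("90095", "2004-01-30", "F"),
    ("90095", "2003-04-17", "M"),
]

def fraction_unique(rows):
    """Share of rows whose exact combination of values appears only once in the data."""
    counts = Counter(rows)
    return sum(1 for row in rows if counts[row] == 1) / len(rows)

# ZIP code alone: every ZIP code is shared, so nobody is unique.
print(fraction_unique([r[0] for r in records]))  # 0.0

# ZIP code + date of birth + gender together: 3 of the 5 rows are unique.
print(fraction_unique(records))  # 0.6
```

With ZIP code alone nobody in this toy table stands out, but the three attributes together single out three of the five rows.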
Opportunities

• Data exist in many forms in many places, free to use if you know how to get them and how to understand them.
About this class
All assignments and lecture slides are posted at
https://bruinlearn.ucla.edu/courses/180405

This BruinLearn page will be used for both Lectures 1 & 2


Read the Syllabus!
FAQs

• Q: I want to switch discussion sections. How?
• A: Don’t. Attend the section you’ve enrolled in. Or you can try to find somebody to switch with you.
FAQs
• Q: I’m on the waitlist. Will I get in?
• A: Probably not. If you’re not in by end-of-
day today, though, you won’t get in.
• Q: I’m not on the waitlist and not enrolled.
Can I get in?
• A: No. Sorry.
FAQs

• Q: What do I need to do to pass?

• A: Get all points for completion of homework and labs and do the quizzes. Answer all easy and medium difficulty questions on the midterm and final correctly.
FAQs
• Q: How do I get bonus points?
• A: There will be a 1% bonus on your overall average in this class for
  • finding wrong content (more than just a typo) on my slides
  • getting the highest score on the in-class questions using Kahoot
Required Materials
• Essential Statistics book, 2nd Edition, by Ryan & Gould
  (You can also get Introductory Statistics, 2nd Edition, by Ryan & Gould, or the third edition of either book)
• Simple calculator
• Laptop for lab sessions
• Kahoot app (free)
Weekly Schedule
• Monday: Submit previous week's homework.
Submit lab report in odd weeks. Read book
chapter for Tuesday lecture. Attend lab
section.
• Tuesday: Attend lecture.
• Wednesday: Attend discussion section.
Read book chapter for Thursday lecture.
• Thursday: Attend lecture.
• Friday: Work on homework and lab.
Big Days
• Midterm: Week 5, in place of class on Thursday, February 8, in this room.

• Final Exam: Finals week.

• Lecture 1: Friday, March 22, from 3-6pm

• Lecture 2: Tuesday, March 19, from 8-11am

• Keep Thursday, March 14 (Week 10) open for now


Holidays

No TA sessions on Martin Luther King, Jr. holiday, Monday, January 15

No TA sessions on Presidents’ Day holiday, Monday, February 19
Rules

• Late assignments are only accepted for documented medical reasons. Late means “after the scheduled time.”

• Exams cannot be rescheduled; I will provide oral make-up exams for documented medical excuses.
Getting Help
• Visit me. My office hours are Tuesdays and Thursdays 9:30-11:30am in MS 8935.

• Visit the TAs; they also have office hours!

• Post to the BruinLearn discussion forum! If you send me an email directly, I’ll ask that you post your question to BruinLearn first.

• But email me directly if it is personal, something to do only with you, or an emergency.
Etiquette
• Please call me "Professor", last name is optional
• Please be on time.
• Please listen when others are asking questions.
• Ask questions.
• Get help as soon as you need it, or, better, before you need it.
• Read the book. Do the quizzes. Do the
homework. Do the labs. Practice!
The Basics of Data Handling
What are the variables?
price
brand
payment type
street address
rating
number of reviews
neighborhood
time since last report
user who reported

Which of these are numerical variables? Which are categorical?
Price is a
A) Categorical variable
B) Numerical variable

Clicker
Brand is a
A) Categorical variable
B) Numerical variable

Clicker
User who reported price is a
A) Categorical variable
B) Numerical variable

Clicker
5 digit ZIP code is a
A) Categorical variable
B) Numerical variable

Clicker
Star rating is a
A) Categorical variable
B) Numerical variable

Clicker

Other categorical variables are: payment type, street address, neighborhood

Other numerical variables are: time since last report, number of reviews
Caution

• Whether a variable is “numerical” or “categorical” often has more to do with how we use it than with its actual values. For example, a categorical variable could be coded with numbers.

• Examples: gender coded as 0 or 1; ZIP codes, which are 5-digit numbers.
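Here is a short Python sketch of why a numeric-looking variable can still be categorical. The ZIP codes are hypothetical examples. Stored as integers, ZIP codes lose their leading zeros and invite arithmetic that means nothing. Counting how often each category occurs remains a sensible summary.

```python
from collections import Counter

# Hypothetical sample of ZIP codes: labels (categorical), even though they look numeric.
zip_codes = ["90024", "02138", "90024", "90095"]

# Converting them to integers silently drops the leading zero of "02138".
as_integers = [int(z) for z in zip_codes]
print(as_integers)  # [90024, 2138, 90024, 90095]

# Arithmetic runs without error, but an "average ZIP code" has no real-world meaning.
print(sum(as_integers) / len(as_integers))

# A sensible summary of a categorical variable: how often each category occurs.
counts = Counter(zip_codes)
print(counts.most_common(1))  # [('90024', 2)]
```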
Organizing data
Hadley Wickham coined the term tidy data
• Data come in many forms, and you’ll see some of
these in your data project.

• But most statistical analysis software likes to see data in a “tidy” format.

• In a tidy data set, the rows are objects that were observed (called “cases”). The columns are variables.

• One row is one case, one column is one variable.


What were the objects observed?
gas stations

Brand          Address                  Price ($/gal)   Rating
Arco           10801 S.M. Blvd          2.93            4
Conserv Fuel   11699 San Vicente Blvd   3.07            4
76             10691 Pico Blvd          3.09            3
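The tidy layout of the gas-station table maps directly onto how software reads data. This sketch uses only Python's standard `csv` module. The lowercase column names are an assumption made for convenience. `csv.DictReader` turns each row into one case and each column into one named variable. Only price is converted to a number. Brand (including “76”) and rating are left as text, since how to treat a variable depends on how we use it.

```python
import csv
import io

# The gas-station table as CSV text: one row per case (station), one column per variable.
raw = """brand,address,price,rating
Arco,10801 S.M. Blvd,2.93,4
Conserv Fuel,11699 San Vicente Blvd,3.07,4
76,10691 Pico Blvd,3.09,3
"""

# Each row (case) becomes a dict keyed by column (variable) name.
stations = list(csv.DictReader(io.StringIO(raw)))

# Treat price as numerical; brand "76" stays text because it is a label, not a quantity.
for s in stations:
    s["price"] = float(s["price"])

# Because each column is one variable, summarizing a variable is one pass over one column.
mean_price = sum(s["price"] for s in stations) / len(stations)
print(len(stations), "cases")  # 3 cases
print(round(mean_price, 2))    # 3.03
```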
Collecting Data
Main methods
• observational studies

• controlled experiments

• surveys

• sensors

• census
Things to pay attention to
• How can we determine if there is a cause-and-effect relationship between variables?

• What’s a confounding factor?

• How can we minimize or eliminate the role of a confounding factor?

• What’s the difference between an observational study and a controlled experiment?
Did you know…
The amount of whole milk consumed per person per year is
inversely related to the number of people killed by falling out
of bed?

http://www.tylervigen.com/spurious-correlations
Causality questions
• If I use a cellphone, will I get brain cancer?
• If a child gets a vaccine, will the child become autistic?
• Does alcohol consumption cause cancer?
• If I drink coffee, am I more likely to get heart
disease?
• Does taking notes on a laptop lead to lower
performance on tests compared to taking
notes by hand?
• Do fake news postings on Facebook sway
elections?
What complicates these questions is...
• Variability, which sometimes leads to great uncertainty.

• For example, people use different brands of cell phones, and use them in different ways and for different amounts of time. And people’s genetics might make them more or less susceptible to certain types of cancer.

• The way you take notes and the way a friend takes notes might differ, but your performance on a test might depend just on whether you were lucky enough to study the right things.

• Sampling issues: How do you find and draw a random sample of all fake news postings on FB? How do you know somebody fell out of bed?
Our Framework
• We’ll be simplifying a bit and considering situations with only two treatment groups.

• The treatment group or intervention group receives the treatment we are interested in studying. This is the group forced to drink whole milk, for example. We will compare it to a control group that does not receive the treatment.

• The response variable is the outcome we are interested in observing. For example, whether or not the person dies falling out of bed.
Two types of studies

• Observational
  • Subjects “decide” whether or not they get the treatment, or select their own treatment.

• Experiment
  • Subjects are assigned to a treatment.
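The slide only says subjects “are assigned.” One common way researchers do this is at random. The sketch below is an illustrative assumption, not a method given on the slide. The function name `assign_treatments` and the subject IDs are hypothetical. It shuffles the IDs with Python's `random.Random` and splits them into a treatment group and a control group. The `seed` parameter only makes the split reproducible.

```python
import random

def assign_treatments(subject_ids, seed=None):
    """Randomly split subjects into a treatment group and a control group of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)            # the researcher's randomizer, not the subject, decides
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Ten hypothetical subjects, numbered 1 to 10.
groups = assign_treatments(range(1, 11), seed=42)
print(groups)
```

Because chance alone decides who gets the treatment, subjects cannot sort themselves into groups based on their own characteristics.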


How to determine whether a study is an experiment or an observational study
• First identify the treatment.

• Next identify the response variable.

• Determine whether people were assigned to the treatment groups.

• Assigned => Experimental Study

• Choice => Observational Study
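The steps above can be written as a tiny decision rule. In this Python sketch, the `Study` class and the `study_type` function are hypothetical names chosen for illustration. The example applies the rule to the whole-milk scenario from the framework slide, where the treatment group is “forced” (assigned) to drink whole milk.

```python
from dataclasses import dataclass

@dataclass
class Study:
    treatment: str                 # step 1: what is the treatment?
    response: str                  # step 2: what is the response variable?
    assigned_by_researchers: bool  # step 3: were subjects assigned to treatment groups?

def study_type(study: Study) -> str:
    """Assigned => experiment; subjects' own choice => observational study."""
    return "experiment" if study.assigned_by_researchers else "observational study"

milk_study = Study(
    treatment="drinking whole milk",
    response="dies falling out of bed (yes/no)",
    assigned_by_researchers=True,  # the treatment group is "forced" to drink whole milk
)
print(study_type(milk_study))  # experiment
```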


Studies that have found a link between cell phone use
and brain cancer are most probably

1. Observational
2. Experiments

because...
the treatment is cell phone use, and the response is
whether or not a person gets brain cancer. And
researchers couldn’t possibly have assigned people to use
or not use a cell phone for an extended period of time.

Clicker question
From US News (Sept 2009)

In a study of adults tracked over one flu season, vaccines made from inactivated, or “killed,” flu virus (the injectable form) provided better protection against the seasonal flu than vaccines made from live attenuated virus, the type of vaccine in a nasal spray.

Clicker question
Most likely, this is

1. Observational
2. Experimental

because researchers could very easily give one group a nasal spray and the other the injection.
In the 1990s, CA evaluated a new program to rehabilitate prisoners before their release; the objective was to reduce the 2-year recidivism rate. The program involved several months of “boot camp.” Admission to the program was voluntary.
Prison Spokesperson: “Those who complete boot camp are less likely to return to prison than other inmates.”

What is the treatment group? The control group?

Treatment: Boot camp
Control: Regular prison
The spokesperson’s comparison is based on

A. An observational study
B. An experiment.
Clicker question

The data prove that boot camp worked.


A. True
B. False
Why not?
• Can you think of another reason why those in boot camp might have had a different outcome than those not in boot camp?

• The two groups might have differed in their level of motivation and/or self-discipline.

• People motivated to join a boot camp might also be more highly motivated to stay out of prison.
• A confounding factor is an alternative explanation for an association.

• Motivation is a confounding factor for the boot camp study: motivation leads people to choose boot camp, and motivation leads people to stay out of prison.
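A toy simulation can show how a confounder alone produces an association. In this Python sketch every number is an assumption chosen for illustration, not data from the California program. Each hypothetical inmate has a motivation score between 0 and 1. More motivated people are more likely to volunteer for boot camp and less likely to return to prison. Boot camp itself has no effect at all. The naive comparison still makes boot camp look effective. Comparing only people with similar motivation (stratifying on the confounder) largely removes the gap.

```python
import random

def simulate(n=100_000, seed=1):
    """Toy model: boot camp has NO effect on returning to prison, by construction."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        motivation = rng.random()                          # hypothetical score in [0, 1)
        joined = rng.random() < motivation                 # more motivated -> more likely to volunteer
        returned = rng.random() < 0.6 - 0.4 * motivation   # more motivated -> less likely to return
        people.append((motivation, joined, returned))
    return people

def return_rate(group):
    """Fraction of a group who returned to prison."""
    return sum(returned for _, _, returned in group) / len(group)

people = simulate()
boot_camp = [p for p in people if p[1]]
regular = [p for p in people if not p[1]]

# Naive comparison: boot camp looks effective even though it does nothing in this model.
print("all inmates:", round(return_rate(boot_camp), 2), "vs", round(return_rate(regular), 2))

# Compare only people with similar motivation (0.5 to 0.6): the gap largely disappears.
band = [p for p in people if 0.5 <= p[0] < 0.6]
band_boot = [p for p in band if p[1]]
band_regular = [p for p in band if not p[1]]
print("similar motivation:", round(return_rate(band_boot), 2), "vs", round(return_rate(band_regular), 2))
```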
