0% found this document useful (0 votes)
277 views16 pages

Elliott 2017

This document discusses inference methods for nonprobability samples as an alternative to traditional probability sampling. It outlines two main approaches for making inferences from nonprobability samples: quasi-randomization, which estimates pseudo-inclusion probabilities based on available covariates, and superpopulation modeling, which uses a model to predict values for nonsample units. The document also provides background on the decline of probability sampling due to low response rates and the rise of easily accessible "big data".

Uploaded by

Max Sarmento
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
277 views16 pages

Elliott 2017

This document discusses inference methods for nonprobability samples as an alternative to traditional probability sampling. It outlines two main approaches for making inferences from nonprobability samples: quasi-randomization, which estimates pseudo-inclusion probabilities based on available covariates, and superpopulation modeling, which uses a model to predict values for nonsample units. The document also provides background on the decline of probability sampling due to low response rates and the rise of easily accessible "big data".

Uploaded by

Max Sarmento
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 16

Statistical Science

2017, Vol. 32, No. 2, 249–264


DOI: 10.1214/16-STS598
© Institute of Mathematical Statistics, 2017

Inference for Nonprobability Samples


Michael R. Elliott and Richard Valliant

Abstract. Although selecting a probability sample has been the standard for
decades when making inferences from a sample to a finite population, incen-
tives are increasing to use nonprobability samples. In a world of “big data”,
large amounts of data are available that are faster and easier to collect than
are probability samples. Design-based inference, in which the distribution for
inference is generated by the random mechanism used by the sampler, cannot
be used for nonprobability samples. One alternative is quasi-randomization in
which pseudo-inclusion probabilities are estimated based on covariates avail-
able for samples and nonsample units. Another is superpopulation modeling
for the analytic variables collected on the sample units in which the model is
used to predict values for the nonsample units. We discuss the pros and cons
of each approach.
Key words and phrases: Coverage error, hierarchical regression, quasi-
randomization, reference sample, selection bias, superpopulation model.

1. INTRODUCTION incorrectly predicted that Alf Landon would win by a


landslide over the incumbent, Franklin Roosevelt. In
Probability sampling became the touchstone for fact, Roosevelt won the election in a landslide, carry-
good survey practice some decades ago after Neyman ing every state except for Maine and Vermont (Squire,
(1934) presented the theory for stratified and clus- 1988). As Squire noted, the magazine’s respondents
ter sampling based on the randomization distribution. consisted mostly of automobile and telephone owners
Neyman also showed that a type of nonrandom quota plus the magazine’s own subscribers. This pool under-
sample of Italian census records drawn by Gini and represented Roosevelt’s core of lower-income support-
Galvani had failed to provide satisfactory estimates for ers. In the same election, several pollsters (Gallup,
many variables in the census. Quoting Smith (1976), Crossley and Roper) using much smaller but more
“This combined attack was overwhelming and since representative quota samples correctly predicted the
that day random sampling has reigned supreme.” An- outcome (Gosnell, 1937). However, it is worth noting
other early nail in the coffin of nonrandom sampling that in the 1948 US presidential elections, Gallup and
was the notable failure of a one enormous, but non- Roper erroneously forecasted that Dewey would win
probability, sample to correctly forecast the 1936 US using quota sampling methods similar to those from
presidential election result. In pre-election polls, the 1936. Quota samples are themselves nonprobability
Literary Digest magazine collected 2.3 million mail samples but are controlled to be distributed more like a
surveys from mostly middle-to-upper income respon- random sample from a population would be.
dents. Although this sample size was huge, the poll More recent examples of polls that failed to correctly
predict election outcomes are the 2015 British par-
Michael R. Elliott is Professor, Biostatistics Department &
liamentary election (Cowling, 2015), the 2015 Israeli
Research Professor, Institute for Social Research, Knesset election (Liebermann, 2015) and the 2014
University of Michigan, ISR Rm 4068, 426 Thompson St., governor’s race in the US state of Maryland (Enten,
Ann Arbor, Michigan 48109, USA (e-mail: 2014). The widespread failure of the British 2015 polls
mrelliott@umich.edu). Richard Valliant is Research led to an extensive evaluation by two professional soci-
Professor, Institute for Social Research, University of eties (Sturgis et al., 2016). There were various potential
Michigan & Joint Program in Survey Methodology, reasons for the misfires, including samples with low
University of Maryland, 1218 Lefrak Hall, College Park, contact and response rates, samples based on unrep-
Maryland 20742, USA (e-mail: rvallian@umd.edu). resentative volunteer panels, inability to predict which

249
250 M. R. ELLIOTT AND R. VALLIANT

respondents would actually vote, question wording and Repeatedly attempting to get nonrespondents to coop-
framing, deliberate misreporting, and volatility in vot- erate, which is standard procedure in probability sam-
ers’ opinions about candidates. The samples for the ples, can be expensive and time-consuming. Eliminat-
2015 British polls were online or telephone polls that ing nonresponse followup is also an expedient way of
could not be considered probability samples of all reg- cutting costs. In telephone-only surveys, no amount of
istered voters. Demographic population totals for char- nonresponse followup is likely to boost response to the
acteristics like age, sex, region, social grade and work- rates that were considered minimally acceptable 10 to
ing status were used to set quota sample and weighting 15 years ago. For these reasons, nonprobability sam-
targets. After evaluating eight putative explanations, pling is currently staging a kind of renascence (e.g.,
Sturgis et al. (2016) concluded that the British polls see Berzofsky, Williams and Biemer, 2009, Dever and
were wrong because of their unrepresentative samples. Valliant, 2014).
The statistical adjustment procedures that were used There are also other data sources that are currently
did not correct this basic problem. receiving attention and might be considered for fi-
On the other hand, selecting a probability sample nite population estimation (Couper, 2013). Social me-
does not guarantee that the cooperating units will pro- dia and other data that can be scraped from the web
vide a good basis for inference to a population. In might be used for gauging public opinion (Murphy
many types of surveys response rates have declined et al., 2015) or measuring changes in consumer prices
dramatically, casting doubt on how well these samples (Cavallo and Rigobon, 2016). Although the inferential
represent the population. Pew Research reported that issues raised subsequently apply to these “big data,” we
their response rates (RRs) in typical telephone surveys mainly concern ourselves with nonprobability samples
dropped from 36% in 1997 to 9% in 2012 (Kohut et al., that were directly collected for the purposes of making
finite population estimates.
2012). With such low response rates, a sample initially
selected randomly can hardly be called a probability 1.1 Types of Nonprobability Samples
sample from the desired population. Low RRs raise the
There are a number of types of nonprobability sam-
question of whether probability sampling is a viable
ples that are summarized briefly below. Regardless of
methodology for general population surveys without
type, there is quite a bit of controversy about the use
expensive face-to-face data collection methods which
of nonprobability surveys for making inferences. Sec-
usually have higher response.
tion 2 describes the potential problems with nonproba-
For some purposes, convenience samples or other
bility samples that can bias inferences. However, these
types of nonprobability samples have long been ac- concerns are not limited to finite population inference.
ceptable. For example, using convenience samples in Keiding and Louis (2016) is a recent discussion of
experimental studies is standard practice, even when problems with self-selected entry to epidemiological
the conclusions are intended to apply to some larger studies and surveys. Stuart et al. (2011) considers the
population. The inferences are model-based and come use of propensity cores to generalize results from ran-
from assuming that the experimental effects are ho- domized trials to populations. Kaizar (2015) reviews
mogeneous among all units in the relevant population. approaches that have been proposed for combining ran-
Models are also used for inference in observational domized and nonrandomized studies in the estimation
studies where, in contrast to designed experiments, as- of treatment efficacy. O’Muircheartaigh and Hedges
signments of interventions or treatments are not con- (2014) describe the use of stratified propensity scores
trolled by an experimenter. However, the lack of ran- for analyzing a nonrandomized social experiment.
domization in those studies may threaten their validity For finite population sampling, the American Asso-
(Madigan et al., 2014). Inferences from nonprobability ciation of Public Opinion Research (AAPOR) has is-
samples must also rely on models, rather than the dis- sued two task force reports on the use of nonprobabil-
tribution generated by random sampling, to project a ity samples—neither of which favored their use. Baker
sample to a larger finite population. et al. (2010) studied the use of online Internet panels;
Obtaining data without exercising much control over Baker et al. (2013a, 2013b) cover nonprobability sam-
the set of units for which it is collected is often cheaper pling generally. Baker et al. (2010) recommended on
and quicker than probability sampling where efforts are several grounds that researchers not use online pan-
made to use a frame that covers most or all of the popu- els if the objective is to accurately estimate popula-
lation, and units are randomly selected from the frame. tion values. Among other reasons, they noted that (i)
NONPROBABILITY SAMPLES 251

some comparative studies showed that nonprobability observational studies. A variation of matching in sur-
samples were less accurate than probability samples; vey sampling is to match the units in a nonprobability
(ii) the demographic composition of different panels sample with those in a probability sample. Each unit in
can affect estimates; and (iii) not all panel vendors the nonprobability sample is then assigned the weight
fully disclose their methods. Baker et al. (2013a) took of its match in the probability sample. Rivers (2007)
a more nuanced view that inferences to a population describes this type of sampling matching in the con-
from nonprobability samples can be valid but that the text of web survey panels. Other techniques developed
modeling assumptions needed are difficult to check. by Rosenbaum and Rubin (1983) and others for ana-
Nonprobability surveys capture participants through lyzing observational data have also been applied when
various methods. The AAPOR task force on nonprob- attempting to develop weights for some volunteer sam-
ability sampling (Baker et al., 2013a) characterized ples.
these samples into three broad types: In network sampling, members of some target pop-
ulation (usually a rare one like intravenous drug users
1. Convenience sampling. or men who have sex with men) are asked to identify
2. Sample matching. other members of the population with whom they are
3. Network sampling. somehow connected. Members of the population that
Baker et al. (2013a) describe these in some detail; are identified in this way are then asked to join the
we briefly summarize them here. Convenience sam- sample. This method of recruitment may proceed for
pling is a form of nonprobability sampling in which several rounds. Snowball sampling (also called chain
easily locating and recruiting participants is the pri- sampling, chain-referral sampling or referral sampling)
mary consideration. No formal sample design is used. is an example of network sampling in which existing
Some types of convenience samples are mall inter- study subjects recruit additional subjects from among
cepts, volunteer samples, river samples, observational their acquaintances. These samples typically do not
studies and snowball samples. In a mall intercept sam- represent any well-defined target population, although
they are a way to accumulate a sizeable collection of
ple, interviewers try to recruit shoppers to take part in
units from a rare population.
some study. Usually, neither the malls nor the people
Sirken (1970) is one of the earliest examples of
are probability samples.
network or multiplicity sampling in which the net-
Volunteer samples are common in social science,
work that respondents report about is clearly defined
medicine and market research. Volunteers may partici-
(e.g., members of a person’s extended family). Prop-
pate in a single study or become part of a panel whose
erly done, a multiplicity sample is a probability sample
members may be recruited for different studies over the
because a person’s network of recruits is well-defined.
course of time. A recent development is the opt-in web Heckathorn (1997) proposed an extension to this called
panel in which volunteers are recruited when they visit respondent driven sampling (RDS) in which persons
particular web sites (Schonlau and Couper, 2017). Af- would report how many people they knew in a rare
ter becoming part of a panel, the members may par- population and recruit other members of the rare popu-
ticipate in many different surveys, often for some type lation. RDS has been used in many applications. For
of incentive. River samples are a version of opt-in web example, Frost et al. (2006) used RDS to locate in-
sampling in which volunteers are recruited at a number travenous drug users; Schonlau, Weidmer and Kapteyn
of websites. Some thought may be given to the set of (2014) used it in an attempt to recruit an internet panel.
websites used for recruitment with an eye toward ob- If some restrictive assumptions on how the recruiting
taining a cross-section of demographic groups. is done are satisfied, probabilities of being included in
In sample matching, the members of a nonproba- a sample can be computed and used for inferences to
bility sample are selected to match a set of important a full rare population, but these assumptions can easily
population characteristics. For example, a sample of be violated (e.g., see Gile and Handcock, 2010). Be-
persons may be constructed so that its distribution by cause the network applications are extremely special-
age, race-ethnicity and sex closely matches the distri- ized, we will not address them further.
bution of the inference population. Quota sampling is
1.2 General Framework for Inference
an example of sample matching. The matching is in-
tended to reduce selection biases as long as the covari- Smith (1983) discusses the general problem of mak-
ates that predict survey responses can be used in match- ing inferences from nonrandom samples. His formula-
ing. Rubin (1979) presents the theory for matching in tion is to consider the joint density of the population
252 M. R. ELLIOTT AND R. VALLIANT

vector of an analysis variable, Y = (Y1 , Y2 , . . . , YN ) Since the sample values are observed, we use lower
and the population vector of 0–1 indicator variables, case y for them; upper case is used for the unobserved,
δ s = (δ1 , δ2 , . . . , δN ) for a sample s. The presentations nonsample values. In this simple case, the nonsample
of Rubin (1976) and Little (1982) on selection mecha- sum, ts̄ , is often estimated (or predicted) by aweighted
nisms and survey nonresponse are closely related. Sup- sum of the sample observations, that is, tˆs̄ = i∈s wi yi
pose that X is an N × p matrix of covariates that can where wi is a weight that may be dependent on the
be used in designing a sample or in constructing es- units in the sample. [Alternative ways of calculating
timators. The conditional density of Y given X and weights in probability samples are discussed in Haziza
a parameter vector  is f (Y|X; ). The density of and Beaumont (2017)]. 
Typically, the estimator can
δ s given Y, X, and another unknown parameter  is also be written as tˆs̄ = i∈s̄ ŷi where ŷi is a prediction
f (δ s |Y, X; ). The joint model for Y and δ s is for nonsample unit i. Thus, for totals the estimation
problem is one of prediction.
(1) f (Y, δ s |X; , ) = f (Y|X; )f (δ s |Y, X; ). Estimation of model parameters often requires solv-
Note that this allows the possibility that being in the ing a set of estimating equations for the parameter es-
sample depends on Y, that is, to be not missing at ran- timates. The estimating equations can be linear in the
dom (NMAR). In a probability sample (without nonre- parameters, as for linear regression or nonlinear, as for
sponse or other missingness that is out of control of generalized linear models. In design-based finite popu-
the sampler), f (δ s |Y, X; ) = f (δ s |X). The density lation estimation, the estimating equations include sur-
f (δ s |X) is the randomization distribution and is the vey weights and are estimators of types of finite pop-
basis for design-based inference. However, in a non- ulation totals (Binder and Roberts, 2009). If weights
probability sample, the distribution of δ s can depend are constructed for a nonprobability sample that are
on both Y and an unknown parameter . Depending appropriate for estimating totals, then those weights
on the application, inference can be based on either can also be used in the estimating equations. Conse-
f (Y|X; ) or f (δ s |Y, X; ) or on a combination of quently, weight construction for nonprobability sam-
both. ples can play the same role in estimation as in proba-
We term two general approaches to making infer- bility sampling.
ences from nonprobability samples as quasi-random- Baker et al. (2013a) discuss the methods that have
ization and superpopulation. Quasi-randomization is been proposed for weighting nonprobability samples.
described in Section 3 and requires modeling f (δ s | Such samples lack many of the features that guide
Y, X; ). Ideally, the probability of being in the sam- weighting in probability samples. A nonprobability
ple is not NMAR and a model can be found for sample is not selected randomly from an explicit sam-
f (δ s |X; ). The superpopulation approach is cov- pling frame. Consequently, selection probabilities can-
ered in Section 4 and involves modeling f (Y|X; ). not be computed, and the usual method of comput-
Both of these approaches involve models, but the ap- ing base weights (inverses of selection probabilities)
proaches are fundamentally different. In the quasi- does not apply. Weights can, however, be computed
randomization approach the probability of a unit’s be- using the quasi-randomization or superpopulation ap-
ing included in the sample is modeled. In the superpop- proaches noted above.
ulation approach, the analytic variables (y’s) collected
in the sample are modeled. Deville (1991) also covers 2. POTENTIAL PROBLEMS WITH
these approaches in the context of quota sampling. NONPROBABILITY SAMPLES
Descriptive statistics, like means and totals, and ana- Since nonprobability samples are often obtained in
lytic statistics, like model parameters, are common es- a poorly controlled or uncontrolled way, they can be
timands in finite population estimation. Detailed dis- subject to a number of biases when the goal is infer-
cussion of the latter is given in Lumley and Scott ence to a specific finite population. Several issues are
(2017). Finite population totals are the simplest target listed here in the context of voluntary Internet panels,
to discuss. A total of some quantity Y can be written but other types of nonprobability samples can suffer
as the sum of the values over the set of sample units, s, from similar problems.
and the sum over the nonsample units s̄: Selection bias occurs if the seen part of the popula-
 
tU = yi + Yi ≡ ts + ts̄ . tion (the sample) differs from the unseen (the nonsam-
i∈s i∈s̄ ple) in such a way that the sample cannot be projected
NONPROBABILITY SAMPLES 253

substantial amount of undercoverage of the full popula-


tion. The coverage varies considerably by demographic
group. Only 58.3% of households where the head is
65 or older have the Internet. Black non-Hispanic and
Hispanic households are less likely to have access than
other race-ethnicities. Households in metropolitan ar-
eas are more likely to have access. There is also a clear
dependence on income and education. As income and
education increase, so does the percentage of house-
holds with access. As illustrated in Dever, Rafferty and
Valliant (2008), these coverage errors can lead to bi-
ased estimates for many items.
F IG . 1. Illustration of potential and actual coverage of a target Selection bias occurs when some groups are also
population.
more likely to volunteer for a panel. Bethlehem (2010)
reviews this issue for web surveys. Vonk, van Ossen-
to the full population. Whether a nonprobability sam- bruggen and Willems (2006) report that ethnic minori-
ple covers the desired population is a major concern. ties and immigrant groups were systematically under-
For example, in a volunteer web panel only persons represented in Dutch panels. They also found that, rel-
with access to the Internet can join a panel. To describe ative to the general population, the Dutch online pan-
three components of coverage survey bias, Valliant and els contained disproportionately more voters, more So-
Dever (2011) defined three populations, illustrated in cialist Party supporters, more heavy Internet users and
Figure 1: (1) the target population of interest for the fewer churchgoers.
study U ; (2) the potentially covered population given Nonresponse of several kinds affects web panels.
the way that data are collected, Fpc ; and (3) the actual Many panel vendors have a “double opt-in” procedure
covered population, Fc , the portion of the target popu-
for joining for a panel. First, a person registers his/her
lation that is recruited for the study through the essen-
name, email and some demographics. Then the vendor
tial survey conditions. For example, consider an opt-in
sends the person an email that must be responded to in
web survey for a smoking cessation study. The target
order to officially join the panel. This eliminates peo-
population U may be defined as adults aged 18–29 who
ple who give bogus emails but also introduces the pos-
currently use cigarettes. The potentially covered pop-
sibility of registration nonresponse since some people
ulation Fpc would be those study-eligible individuals
do not respond to the vendor’s email. People may also
with Internet access who visit the sites where study re-
cruitment occurs; those actually covered Fc would be click on a banner ad advertising the panel but never
the subset of the potential covered population who par- complete all registration steps. Alvarez, Sherman and
ticipate in the study. Selecting a sample only from Fc Van Beselaere (2003) report that, during the recruit-
results in selection bias. The sample s are those per- ment of one panel, just over 6% of those who clicked
sons who are invited to participate in the survey and through a banner ad to the panel registration page even-
who actually do. The U − Fpc area in the figure are the tually completed all the steps required to become a
many persons who have Internet access but never visit panel member. Finally, a panel member asked to par-
the recruiting websites or who do not have Internet ac- ticipate in a survey may not respond.
cess at all. In many situations, U − Fpc is vastly larger Attrition is another problem—persons may lose in-
than either Fc or Fpc . terest and drop out of a panel. Many surveys are tar-
To illustrate a case that is rife with coverage prob- geted at specific groups, for example, young Black fe-
lems, we further consider surveys done using panels males. A panelist that is in one of these “interesting”
of persons recruited via the Internet. Table 1 lists per- groups may be peppered with survey requests and drop
centages of households in the US in 2013 estimated out for that reason. Another reason that some groups,
from the American Community Survey (ACS) that like the elderly, are over-burdened is that they may be
have some type of Internet subscription (File and Ryan, oversampled to make up for anticipated nonresponse.
2014). The ACS estimates are based on a sample of Measurement error is also a worry in nonprobabil-
about 3.5 million households. About 25% of house- ity surveys as they are in any survey. The types of error
holds had no Internet subscription, which in itself is a that have been demonstrated in some studies are effects
254 M. R. ELLIOTT AND R. VALLIANT

TABLE 1
Percentages of US households with Internet subscriptions;
2013 American Community Survey

Percent of households with


some Internet subscription

Total households 74.4


Age of householder
15–34 years 77.7
35–44 years 82.5
45–64 years 78.7
65 years and older 58.3
Race and Hispanic origin of householder
White alone, non-Hispanic 77.4
Black alone, non-Hispanic 61.3
Asian alone, non-Hispanic 86.6
Hispanic (of any race) 66.7
Limited English-speaking household
No 75.5
Yes 51.4
Metropolitan status
Metropolitan area 76.1
Nonmetropolitan area 64.8
Household income
Less than $25,000 48.4
$25,000–$49,999 69.0
$50,000–$99,999 84.9
$100,000–$149,999 92.7
$150,000 and more 94.9
Educational attainment of householder
Less than high school graduate 43.8
High school graduate 62.9
Some college or associate’s degree 79.2
Bachelor’s degree or higher 90.1

due to questionnaire design, mode and peculiarities of ers to probability samples and online to nonprobabil-
respondents. For example, the persons who participate ity samples. As they noted, “Only one of these studies
in panels tend to have higher education levels. The mo- yielded consistently equivalent findings across meth-
tivation for participating may be a sense of altruism for ods, and many found differences in the distributions
some but may be just to collect an incentive for oth- of answers to both demographic and substantive ques-
ers. Participants are often paid per survey completed. tions. Further, these differences generally were not sub-
Some respondents speed through surveys, answering stantially reduced by weighting.”
as quickly as possible to collect the incentive. This is a
Despite all of these actual and potential problems,
form of “satisficing” where respondents do just enough
online panels are now widely used. For example, the
to get the job done (Simon, 1956). On the other hand,
Washington Post newspaper and the company, Survey-
self-administered online surveys do tend to elicit more
reports of socially undesirable behaviors, like drug use, Monkey, have recently mounted a nonprobability, on-
than do face-to-face surveys. Higher reports are usu- line poll of over 75,000 registered voters that covers all
ally taken to be more nearly correct. But, it may be that 50 states in the US (Clement, 2016). Baker et al. (2010)
the people taking those surveys just behave undesirably quotes the market research newsletter, Inside Research
more often than the general population. as estimating the total spent on online research in 2009
Baker et al. (2010, page 739) list 19 studies where at about $2 billion USD, the vast majority of which is
the same questionnaire was administered by interview- supported by online panels.
NONPROBABILITY SAMPLES 255

3. QUASI-RANDOMIZATION APPROACH terms in (2). The first two probabilities—having Inter-


net access and volunteering for the panel—are more
In the quasi-randomization approach, pseudo-
difficult. Both are likely to depend on the xi covariates
inclusion probabilities are estimated and used to cor-
and, in a worse case, upon the Y ’s. For example, per-
rect for selection bias. Given estimates of the pseudo-
sons with higher socioeconomic status are more likely
probabilities, design-based formulas are used for point
to have access; younger people are more likely to join
estimates and variances. Using the earlier notation, the
a panel than older ones. In some countries, probabil-
goal is to estimate f (δ s |Y, X; ) or f (δ s |X; ). Hav-
ity samples that represent the full population may in-
ing a situation where the sample inclusion probabilities
clude questions on Internet access. The US National
do not depend on the Y ’s is ideal since the nonsample Health Interview Survey routinely includes such ques-
Y ’s are unknown, but verifying that this is the case is tions. The probability of volunteering (given Internet
impossible in most applications. There is some liter- access) is harder to estimate.
ature on estimation when nonsample data are NMAR Reference survey. One approach is to use a reference
(e.g., see Little, 2003), but the methods generally re- survey in parallel to the nonprobability survey. The ref-
quire information on nonsample units that is available erence survey can be a probability survey selected from
only in specialized applications. Thus, the practical ap- either (i) the population of persons who have Internet
proach is to estimate f (δ s |X; ). access or (ii) the full population including persons that
To illustrate how involved estimating these probabil- do not have the Internet. The reference sample might
ities may be, consider a case in which a volunteer panel also be a census that covers the entire population. The
of persons is recruited to provide a pool from which a statistical approach is to combine the reference sample
sample of persons is selected. To respond to a survey, and the sample of volunteers and fit a model to predict
a person must have Internet access, volunteer for the the probability of being in the nonprobability sample,
panel, be selected for the particular survey and then re- as described in Section 3.1.
spond. Considering all of these, the probability of per- A key requirement of the reference survey is that it
son i participating in that Web survey [using a simpler include the same covariates xi as the volunteer sur-
notation than f (δ s |X; ) above] can be decomposed vey so that a binary regression can be fitted to per-
as mit estimation of inclusion probabilities for the volun-
P (xi ) teers. One possibility for a reference survey is to use a
    publicly available dataset collected in a well designed
(2) = P i ∈ I |xi P i ∈ V |I, xi and executed probability survey (like one done by a
   
· P i ∈ sV |V , I, xi P i ∈ sV R |sV , V , I, xi , central government agency). Another possibility is for
the survey organization to conduct its own reference
where survey. In the latter case, some specialized questions,
xi = a vector of covariates for person i that are beyond the usual age/race/sex/education types of de-
predictive of participation; mographics, can be added that are felt to be predic-
I = set of persons with Internet access, that is, tive of volunteering and of the analysis variables for
Fpc in Figure 1; V = set of persons who volunteer; the volunteer survey. Schonlau, van Soest and Kapteyn
P (i ∈ I |xi ) = probability of having access to the (2007) refer to these extra covariates as webographics.
Internet; However, identifying webographics that are useful be-
P (i ∈ V |I, xi ) = probability of volunteering for yond the standard demographics (age, race-ethnicity,
an opt-in panel given that person i has access to the sex, income and education) is difficult (Lee and Val-
Internet; liant, 2009). Of course, another problem with conduct-
ing your own reference survey is that doing a high qual-
P (i ∈ sV |V , I, xi ) = probability that person i
ity survey with good coverage of the target population
was subsampled from the panel and asked to partici-
is expensive and may be beyond the means of many
pate with sV denoting the subsample from the panel;
organizations.
P (i ∈ sV R |sV , V , I, xi ) = probability that per-
Sample matching is another approach to attempting
son i responds given selection for the subsample with
to reduce selection biases in a nonprobability sample.
sV R denoting the set of survey respondents.
As noted in Baker et al. (2013a), the matching can
Standard methods (e.g., see Valliant, Dever and be done on an individual or aggregate level. If, for
Kreuter, 2013) can be used to compute the last two each case in a volunteer sample, a matching case is
256 M. R. ELLIOTT AND R. VALLIANT

found in a probability, reference sample, this would (Elliott and Davis, 2005):
be individual-level matching. The matches would be  
found based on covariates available in each dataset. P Si∗ = 1|xi = xo
This may be done based on individual covariate values P (xi = xo |Si∗ = 1)P (Si∗ = 1)
or on propensity scores as described in Rosenbaum and =
P (xi = xo )
Rubin (1983). This is an example of predictive mean (3)
P (xi = xo |Si∗ = 1)P (Si∗ = 1)P (Si = 1|xi = xo )
matching in which an imputation of an inclusion prob- =
ability is made for each nonprobability unit. P (Si = 1)P (xi = xo |Si = 1)
Matching at the aggregate level consists on making P (xi = xo |Si∗ = 1)P (Si = 1|xi = xo )
the frequency distribution of the nonprobability sample ∝ ,
P (xi = xo |Si = 1)
the same as that of the population. Quota sampling is
an example of this. For example, the age × race distri- where P (Si = 1)/P (Si∗ = 1) can be treated as a nor-
bution of the sample might be controlled to be the same malizing constant.
as that in the population. If we start with a large panel Estimating P (xi = xo |Si∗ = 1) and P (xi = xo |Si =
of volunteers, a subsample might be selected to achieve 1) can be difficult for a general joint distribution of
this kind of distributional balance. Each person would covariates x, but extensions of discriminant analysis
receive the same weight, which is the same way that (without making a normality assumption) provide a
a proportionally allocated probability sample would be way around this problem. Combine the probability and
treated. Considered in this way, quota sampling falls nonprobability samples and let Zi = 1 for nonproba-
into the quasi-randomization framework. bility cases (i.e., Si∗ = 1, Si = 0) and Zi = 0 for the
A probability sample used as a reference survey or probability cases (i.e., Si∗ = 0, Si = 1) conditional on
in sample matching ideally must not be subject to cov- being in the combined probability-nonprobability sam-
erage or other types of bias. As noted in Section 1, ple (i.e., Si∗ + Si = 1). Then
many probability samples are now subject to high non-
P (xi = xo |Zi = 1)
response rates and are tantamount to nonprobability
samples themselves. Poor quality reference or match- P (xi = xo |Zi = 0)
ing samples can lead to biased estimators of the in- P (Zi = 1|xi = xo )P (xi = xo )/P (Zi = 1)
(4) =
clusion probabilities in (2) and, consequently, biased P (Zi = 0|xi = xo )P (xi = xo )/P (Zi = 0)
estimators from the nonprobability sample. This is an P (Zi = 1|xi = xo )
argument for using large, well-controlled samples con- ∝ .
P (Zi = 0|xi = xo )
ducted by central governments for reference or match-
ing samples if at all possible. For example, in a house- As long as sampling fractions are small, P (Si =
hold survey in the US, the American Community Sur- 1, Si∗ = 0) ≈ P (Si = 1) and P (Si = 0, Si∗ = 1) ≈
vey (https://www.census.gov/programs-surveys/acs/) P (Si∗ = 1), so P (xi |Zi = 0) = P (xi |Si = 1, Si∗ = 0) ≈
would be a good choice. P (xi |Si = 1) and P (xi |Zi = 1) = P (xi |Si = 0, Si∗ =
1) ≈ P (xi |Si∗ = 1). Thus,
3.1 Estimation Using Pseudo-Weights
 
This approach assumes that the nonprobability sam- P Si∗ = 1|xi = xo
ple actually does have a probability sampling mecha- · P (Zi = 1|xi = xo )
nism, albeit one with probabilities that have to be es- ∝ P (Si = 1|xi = xo ) .
P (Zi = 0|xi = xo )
timated under identifying assumptions. The goal is to
estimate this unknown probability of selection relying The resulting “pseudo-weight” is given by
on a true probability sample or a census with common  
wi = 1/P̂ Si∗ = 1|xi = xo
variables that explain the unknown sampling mecha- (5)
nism (Elliott, 2009, Elliott et al., 2010). Let Si denote P̂ (Zi = 0|xi = xo )
the sampling indicator for the probability sample, Si∗ ∝ 1/P̂ (Si = 1|xi = xo ) .
P̂ (Zi = 1|xi = xo )
denote the indicator for the nonprobability sample, and
xi be the set of common covariates available to both If the covariates xi that are available in both the non-
samples that are assumed to fully govern the sampling probability and probability sample match those used
mechanism for both. Applying Bayes rule, we have to design the probabilities of selection/inclusion in the
NONPROBABILITY SAMPLES 257

probability sample, (5) can be written as be adapted to cases where the nonprobability sample
  represents only a portion of the population.
wi = 1/P̂ Si∗ = 1|xi = xo
(6) If analysis of the nonprobability sample only is re-
P̂ (Zi = 0|xi = xo ) quired, the pseudo-weight construction is complete. If
∝ w̃i , the nonprobability and probability samples are to be
P̂ (Zi = 1|xi = xo ) combined, the nonprobability sample pseudo-weights
where w̃i is the inverse of the probability of selection and probability sample weights are normalized so that
for the nonprobability unit in the probability sampling the weighted fraction of the nonprobability sample is
frame. Otherwise, in the more likely setting where xi equal to the unweighted fraction of the nonprobabil-
does not correspond precisely to the probability sample ity sample cases in the combined dataset, and similarly
design variables, P̂ (Si = 1|xi = xo ) can be estimated the weighted fraction of the probability sample is equal
by regressing xi on w̃i−1 via beta regression (Ferrari to the unweighted fraction of the probability sample
and Cribari-Neto, 2004) in the probability sample, and cases in the combined dataset (Korn and Graubard,
predicting P (Si = 1|xi = xo ) for the nonprobability 1999, pages 278–284). This ensures that the sum of
sample elements. the combined weights continues to approximate the
The term P̂ (Zi = z|xi = xo ) can be obtained via lo- population size, and that each sample will contribute
gistic regression, or, to reduce model misspecification in proportion to their unweighted sample size. This
if xi is of high dimensionality, via least absolute shrink- is accomplished by setting ŵi = CS ∗ 
× wi for CS ∗ =
age and regression operator (LASSO) (Tibshirani, nS ∗ /(nS + nS ∗ ) × i I (Zi = 0)w̃i / i I (Zi = 1)wi
1996, LeBlanc and Tibshirani, 1998), Bayesian addi- for the nonprobability sample cases and ŵi = CS × w̃i
tive regression trees (BART) (Chipman, George and for CS = nS /(nS + nS ∗ ).
McCulloch, 2010), or super learner algorithms that To obtain inference, the pseudo-weights or the nor-
combine estimators from numerous model fitting meth- malized pseudo-weights and probability sample
ods (Van der Laan, Polley and Hubbard, 2007). In some weights in the combined dataset can be used to ob-
settings, the nonprobability sample will represent only tain weighted point estimates. For variance estimation,
a portion of population; for example, in a setting with a bootstrap or jackknife estimator should be used to in-
a binary outcome Y (e.g., injured/uninjured) only pos- corporate both sampling variability in the estimation of
itive outcomes Y = 1 (e.g., injuries) might be repre- the pseudo weights and in the estimation of the main
sented in the nonprobability dataset; in this case (5) is quantity of interest. In the absence of true design infor-
updated as mation in the nonprobability sample, resampling at the
  subject level for the bootstrap or leave-one-out compu-
wi = 1/P̂ Si∗ = 1|xi = xo
tation of the pseudo-estimate for the jackknife can be
(7) ∝ 1/P̂ (Si = 1|xi = xo , Yi = 1) applied. However, some thought must be given to the
structure of the convenience sample. For example, the
P̂ (Zi = 0|xi = xo ) websites used to recruit a volunteer web panel might
· .
P̂ (Zi = 1|xi = xo ) properly be considered as clusters if different types of
An alternative to estimating the probability of unit persons visit the different sites (Brick, 2015). For the
i’s being in the nonprobability sample is used by some probability sample, resampling clusters within strata
panel vendors. The probability (reference) and non- and use of the Rao–Wu bootstrap (Rao and Wu, 1988,
probability samples are combined, but a logistic regres- Rao, Wu and Yue, 1992) to accommodate weights can
sion is run to estimate P (Si∗ = 1|xi = xo ), not condi- be used. For the jackknife, clusters within strata should
tioned on being in the combined probability and non- be dropped, with standard weighting up by the number
probability sample (e.g., see Valliant and Dever, 2011). of clusters divided by the number of clusters retained
This is done by assigning a weight of 1 to the non- to maintain the stratum size should be used. For each
probability cases, the probability sampling weight to bootstrap or jackknife iteration, the pseudo-weights
the probability cases, and running a weighted logistic should be recomputed as well as the point estimator
regression. The model predictions, thus, refer to the using the dropped-out or resampled data.
unconditional probability, P (Si∗ = 1|xi = xo ), not the
4. SUPERPOPULATION MODEL APPROACH
probability conditional on being in the combined sam-
ple. Whether this method is better or worse than (5) In the superpopulation modeling approach, a statis-
has not been studied, although, as noted above, (5) can tical model is fitted for a Y analysis variable from the
258 M. R. ELLIOTT AND R. VALLIANT

sample and used to project the sample to the full pop- For some common estimation methods like poststrat-
ulation. That is, inferences are based on f (Y|X; ). ification, only population totals of the covariates are
This approach could, of course, also be used with a required to construct the estimator, so that individual
probability sample. The difference here is that design- nonsample X values are unnecessary. Suppose that the
based inference, where the randomization distribution mean of a variable yi follows a linear model:
is under the control of the sampler, is not an option for
a nonprobability sample. As noted in Smith (1983), the EM (yi |xi ) = xTi β,
sample selection mechanism can be ignored for model- where the subscript M means that the expectation is
based inferences about the distribution of Y if with respect to the model, xi is a vector of p covariates
(8) f (δ s |Y, X; ) = f (δ s |X; ), for unit i and β is a parameter vector. Given a sample s,
an estimator of the slope parameter is β̂ = A−1 T
s Xs ys
which would be the formal justification for using only
where As = XTs Xs , Xs is the n × p matrix of covariates
f (Y|X; ). There are purposive, nonprobability sam-
for the sample units, and ys is the n-vector of sample
ples that satisfy (8). For example, selecting the n units
y’s. (Weighted least squares might also be used if there
with the largest x values as is done by US Energy
were evidence of nonhomogeneous model variances.)
Information Administration (2016), or sampling bal-
A prediction of the value of a unit in the set of nonsam-
anced on population moments of covariates (Royall,
1970, 1971) are ignorable, nonprobability plans. How- ple units, denoted by r, is ŷi = xTi β̂. A predictor of the
ever, in nonprobability samples where the selection of population total is
 
sample units is not well-controlled, (8) may not hold tˆ1 = yi + ŷi
and the quasi-randomization and superpopulation ap- i∈s i∈s̄
proaches could be combined. (9) 
Note that Y can be partitioned between the sam- = yi + (tU x − tsx )T β̂,
ple and nonsample units as Y = (Ys , Ys̄ ). Thus, i∈s

f (Y|X; ) = f (Ys |Ys̄ , X; )f (Ys̄ |X; ). If f (Ys | where tU x is the total of the x  s in the population and
Ys̄ , X; ) = f (Ys |X; ), then Ys and Ys̄ are inde- tsx is the sample sum of the x  s. This estimator is also
pendent conditional on the covariates, X. If model- equal to the general regression estimator (GREG) of
based inferences are desired for , these can be done Särndal, Swensson and Wretman (1992) if the inverse
based only on f (Ys |X; ). However, if descriptive in- selection probabilities in that estimator are all set to 1.
ferences are required for the full population Y, then The theory for this prediction approach is extensively
f (Ys̄ |X; ) must be estimated. If this model has the covered in Valliant, Dorfman and Royall (2000). If the
same form as f (Ys |X; ), then the model fitted from sample is a small fraction of the population, as would
the sample can be used to predict values for the non- be the case for most volunteer web surveys, the predic-
sample. If this is not the case, inference to the full pop- tion estimator is approximately the same as predicting
ulation may be difficult or impossible. the value for every unit in the population and adding
To introduce the superpopulation approach, consider the predictions:
the simple case of estimating a finite population total. 
The general idea in model-based estimation when es- (10) tˆ2 = ŷi = tTU x β̂.
timating a total is to sum the responses for the sample i∈U
cases and add to them the sum of predictions for non-
sample cases. The key to forming unbiased estimates The population mean of y can be estimated by Ȳˆ =
X̄UT β̂ where X̄ = t /N , the population vector of co-
is that the variables to be analyzed for the sample and U Ux
nonsample follow a common model and that this model variate means.
can be discovered by analyzing the sample responses. The estimators in (9) or (10) are quite flexible in
When both the sample and nonsample units follow the what covariates can be included. For example, we
same model, model parameters can be estimated from might predict the amount that people have saved for re-
the sample and used to make predictions for the non- tirement based on their occupation, years of education,
sample cases. An appropriate model usually includes marital status, age, number of children they have and
covariates, as in f (Ys |X; ) above, which are known region of the country in which they live. Constructing
for each individual sample case. The covariates may the estimator would require that census counts be avail-
or may not be known for individual nonsample cases. able for each of those covariates. Another possibility is
NONPROBABILITY SAMPLES 259

to use estimates from some other larger or more accu- where vi is a variance parameter that does not have to
rate survey (e.g., Dever and Valliant, 2010, 2016). The be specifically defined. The variance estimators below
reference surveys mentioned earlier could be a source will work regardless of the form of vi (as long as it is
of estimated control totals in which webographic co- finite).
variates might be used. For use below, define ai to be wi − 1 where wi is
Both (9) and (10) can be written so that they are either w1i or w2i . The variance estimators below then
weighted sums of y’s. If (9) is used, the weight for unit apply for either of the w1i or w2i weights. The predic-
i is w1i = 1 + tTrx A−1
s xi where trx = tU x − tsx . In (10), tion variance of an estimator of a total, tˆ, is defined
the weight is w2i = tTU x A−1
s xi . The estimated  total for as
an analysis variable can be written as tˆ = s wi yi  
(12) VM (tˆ − tU ) = ai2 vi + vi .
where wi is either w1i or w2i . Notice that these weights
i∈s i∈r
depend only on the x’s not on y. As a result, the same
set of weights could be used for all estimates. It is true The population total of y, tU , is subtracted on the
that a single set of weights will not be equally efficient left-hand side because the sum is random under the
for every y, but this situation is also true for design- model. As long as the fraction of the population that
based weights. is sampled is very small, the second term on the right-
In the superpopulation (y-model) approach, statis- hand side above is inconsequential compared to the
tical properties, like bias and variance, are computed first. The variance estimators are built from the model
conditional on the set of sample units that is ob- residuals, ri = yi − xTi β̂. An estimator of the dominant,
served. This contrasts to the quasi-randomization ap- first term is

proach where the pseudo design-based calculations av- (13) ai2 v̂i ,
erage over the random appearance in the sample of s
units that have the same configuration of covariates ob-
served in the sample. A quasi-randomization estima- where v̂i can be any of three choices: (i) ri2 , (ii) ri2 /(1 −
tor that only uses inverse estimated inclusion probabil- hii ), or (iii) [ri /(1 − hii )]2 where hii is the leverage for
ities as weights will be biased under a y-model where unit i, defined as the diagonal element of the hat matrix
EM (y|x) depends on covariates. Consequently, the y- H = XTs A−1 s Xs . As the sample size increases and if no
model approach to constructing estimators can produce x is extreme, each leverage will converge to zero.
more precise estimators than the quasi-randomization The estimators of the first term are robust in the sense
approach alone. Chen (2015) gives some numerical il- that they are approximately model-unbiased regardless
lustrations of this approach applied to a nonprobability of the form of vi (which is unknown) as long as the
sample. sampling fraction is small. The first choice, v̂i = ri2 ,
when used in (13), gives an example of a sandwich es-
4.1 Variance Estimation for Prediction Estimators timator. The second choice adjusts for the fact that ri2
For the frequentist methods, estimating the vari- is slightly biased for vi . The third choice is very simi-
ance of an estimator is the usual step toward mak- lar to the jackknife in which one sample unit at a time
ing inferences about population values. There are sev- is deleted, a new estimate of the total computed, and
eral choices for variance estimators when model-based the variance among those delete-one estimates is used.
weighting is used. These are described in Valliant, Since the second term in (12) is usually negligible com-
Dorfman and Royall (2000, Chapter 5). To fully de- pared to the first, misspecifying its form is likely to be
fine the model, we need to add a variance specification. unimportant. Valliant, Dorfman and Royall (2000) pro-
The ones we summarize here are appropriate for mod- vide some options for estimating that term.
els in which units are mutually independent. Although The bootstrap is another replication estimator that
model-based estimators have been extended to cases should be equally robust, although, to our knowledge,
where units are correlated within clusters (Valliant, finite population, model-based theory has not been
Dorfman and Royall, 2000, Chapter 9), these clustered worked-out for the bootstrap. The bootstrap should
structures are often unnecessary for the web surveys also be consistent for estimating the variance of esti-
and similar cases that we cover here. Suppose that the mated quantiles, unlike the jackknife. If the population
full model is totals for some of the covariates are estimated from an
EM (yi |xi ) = xTi β independent survey, then the variance in (12) should
(11) be modified by adding a term to reflect that additional
VM (yi |xi ) = vi , uncertainty (e.g., see Dever and Valliant, 2010, 2016).
260 M. R. ELLIOTT AND R. VALLIANT

4.2 Hierarchical Regression Modeling To deal with instabilities in the estimation of β,


a number of authors have considered adding hierar-
This approach can be explained by viewing calibra-
tion approaches such as poststratification and raking as chical models to the mean regression model. Holt and
flowing from special cases of model (11). In the case of Smith (1979) first suggested a model for unit i in com-
poststratification, this can be viewed as regression on bination h of the form:
all of the (discrete) calibration variables and their in-  
yih |μh ∼ N μh , σ 2
teractions. Assume that the calibration variable xi con- (17)  
sists of p binary indicators, xi1 , . . . , xip : μh ∼ N μ, τ 2 .
μyi = EM (yi |xi ) The mean estimator is again given by (16), where μ̂h =
2 2
p
 E(μh |y) = σ 2 /nτ +τ 2 y h + σ 2σ/n/n+τ
h 2
2 y for known σ and
= β0 + βk1 I (xik1 = 1) h h

k1 =1
τ 2 and sample 
sizes nh within the hth combination of
x’s, and n = h nh ; ȳh is the sample mean for units in
p 
 p
the hth combination and ȳ is the mean for all units. In
(14) + βk1 ,k2 I (xik1 = 1)I (xik2 = 1) + · · ·
practice, σ 2 and τ 2 are replaced, for example, with em-
k1 =1 k2 =2
pirical Bayes estimators. Simulation studies in Elliott
p
 and Little (2000) showed that exchangeable priors of
+ βk1 ,k2 I (xikp−1 = 1)I (xikp = 1) the form (17) were somewhat fragile, tending to over-
kp−1 =p−1
smooth when σ 2 and τ 2 were approximately equal. Al-
p
 ternative priors that ordered the strata or poststrata h
+ · · · + βk1 ,k2 ,...,kp I (xikl = 1), by sampling weights wh = Nh /nh for population size
l=1 Nh and included information about this structuring in
where I (·) is a binary indicator variable. Raking as- either the prior mean or the variance (e.g., having the
sumes main effects only: mean be a function of wh , or the variance an autore-
p
 gressive structure as a function of |h − h |) had much
(15) μyi = EM (yi |xi ) = β0 + βk I (xik = 1). better performance with respect to coverage and mean
k=1 square error.
Denote the 2p possible combinations of values of Wang et al. (2015) used an extension of this hier-
x1 , . . . , xp by h = 1, . . . , 2p . The resulting estimates of archical model approach, termed multilevel regression
a population mean are given by and stratification (MRP), to obtain estimates of voting
2p
behavior in the 2012 US Presidential election from a

(16) Yˆ = Ph μ̂h , highly nonrepresentative convenience sample of nearly
h=1 350,000 Xbox users, empaneled 45 days prior to the
election. This large sample, combined with highly pre-
where Ph is the proportion of the population whose
combination of binary indicator variables is equal to h. dictive covariates about voting behavior, including in-
That is, the Ph are special cases of X̄U at the beginning formation about party identification and 2008 Presi-
of this section. dential election voting behavior, allowed for a refined
The estimated mean, μ̂h , of the hth combination is prediction model that incorporated numerous interac-
found by replacing each β with an estimator, β̂, in (14) tions and used priors on the βs to stabilize parame-
for the poststratification estimator and in (15) for the ter estimates and resulting values of μh . The values
raking estimator. (Note that μ̂h is an estimator of μyi of Ph were estimated via probability sample exit polls
for each unit in combination h.) These correspond to from the 2008 US Presidential election, themselves
the weighted estimates obtained from poststratification of very large size (over 100,000). Wang et al. (2015)
or raking. Both of these models can be extended to gen- showed that, despite the fact that the raw Xbox esti-
eralized linear regression by replacing μyi with the ap- mates were severely biased in favor of Romney, re-
propriate link function g(μyi ) (logistic link for logistic flecting its largely male and white sample composi-
regression of a binary outcome, log link for a count out- tion, accurate estimates of voting behavior were ob-
come, etc.). Intermediate models between poststratifi- tained, based on comparisons with aggregated proba-
cation (14) and raking (15) can be fit by incorporating bility sampling polls as well as the final election re-
some but not all possible interaction terms. sult. This accuracy was due to the large sample size
4.2.1 Multilevel regression and poststratification via Bayesian finite population inference.

Wang et al.'s implementation of MRP ignored uncertainty in the estimation of the P_h from the probability sample. While this may have been warranted due to its large size, in general failure to account for this variance will lead to anti-conservative inference (too-narrow confidence intervals). An alternative approach would be to utilize a Bayesian finite population inference approach that treats the unsampled elements in the population as missing data, together with the variable Y that is missing in the probability sample data but available in the nonprobability sample data.

Let X be the variables available in the probability and nonprobability samples for prediction of Y, let Z be the probability sample design variables, and let (X_ns, Z_ns) and (X_p, Z_p) represent the nonsampled and probability-sampled elements of the population, respectively. Dong, Elliott and Raghunathan (2014) obtain nonparametric draws from the posterior predictive distribution of the nonsampled elements (X_ns | X_s, Z_p),

$$
p(X_{ns} \mid X_s, Z_p) \propto \int p(X_{ns}, Z_{ns} \mid X_p, Z_p)\, p(X_p, Z_p)\, dZ_{ns},
\tag{18}
$$

under the assumption of ignorable sampling (X is independent of the sampling indicator I conditional on Z), by making draws of p(X_p, Z_p) from a Bayesian bootstrap (Rubin, 1981) and draws from p(X_ns, Z_ns | X_p, Z_p) via a finite population Bayesian bootstrap (FPBB) procedure that accounts for probabilities of selection, clustering and weighting. Treating the nonprobability sample (Y_np, X_np) as a certainty sample and concatenating it with the probability sample to obtain Y_s = Y_np and X_s = (X_p, X_np), we have (Zhou, Elliott and Raghunathan, 2016c)

$$
\begin{aligned}
p(X_{ns} \mid Y_s, X_s, Z_p) &\propto \int p(X_{ns}, Y_{ns} \mid Y_s, X_s, Z_p)\, dY_{ns} \\
&\propto \int\!\!\int p(Y_{ns} \mid X, Y_s, Z_p, \theta)\, p(X_{ns} \mid Y_s, X_s, Z_p, \theta)\, p(Y_s, X_s, Z_p \mid \theta)\, p(\theta)\, d\theta\, dY_{ns}
\end{aligned}
$$

under the assumption that p(Y | X, θ) = p(Y_s | X_np, θ), that is, that the model for Y given X holds in both the probability and nonprobability samples. Draws of p(X_ns | Y_s, X_s, Z_p) can be made under (18), and imputations of Y_ns can be made by alternating between draws of p(θ | Y, X) and p(Y_ns | Y_s, X, θ). Full implementation proceeds by obtaining L Bayesian bootstrap (BB) draws of Y_s, X_s, Z_p, then M draws of X_ns via a weighted FPBB within each, and finally S draws of Y_ns via standard multiple imputation methods (including, possibly, MRP models of the form used in Wang et al., 2015). Inference about Y or, more typically, about functions Q ≡ Q(Y) can then be made via the approximate posterior distribution of Q given by t_{L−1}(Q̄_L, (1 + L^{−1})V_L), where

$$
\bar{Q}_L = \frac{1}{LMS}\sum_{l}\sum_{m}\sum_{s} q^{(lms)}, \qquad
V_L = \frac{1}{L-1}\sum_{l}\bigl(\tilde{Q}^{(l)} - \bar{Q}_L\bigr)^2, \qquad
\tilde{Q}^{(l)} = \frac{1}{MS}\sum_{m}\sum_{s} q^{(lms)},
$$

and q^{(lms)} = Q(Y^{(lms)}) with Y^{(lms)} = (Y_s, Y_ns^{(lms)}), where Y_ns^{(lms)} is obtained from the sth imputation of the mth weighted FPBB draw of the lth BB draw. Details are available in Zhou, Elliott and Raghunathan (2016c, 2016a, 2016b), where empirical results are also presented.
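The combining rule above is straightforward to compute once the draws q^{(lms)} are in hand. The sketch below is an illustration under assumed array shapes, not code from the cited papers; it returns the point estimate Q̄_L and a t_{L−1}-based interval.

```python
import numpy as np
from scipy import stats

def combine_fpbb_draws(q, alpha=0.05):
    """Combine draws q[l, m, s] = Q(Y^(lms)) over L Bayesian bootstrap draws,
    M weighted FPBB draws and S imputations, using the t-based rule above."""
    q = np.asarray(q, float)
    L = q.shape[0]
    Q_tilde = q.reshape(L, -1).mean(axis=1)            # Q~(l): mean over m and s within each l
    Q_bar = Q_tilde.mean()                              # overall point estimate
    V_L = np.sum((Q_tilde - Q_bar) ** 2) / (L - 1)      # between-draw variance
    se = np.sqrt((1.0 + 1.0 / L) * V_L)
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df=L - 1)
    return Q_bar, (Q_bar - t_crit * se, Q_bar + t_crit * se)

# Toy example: L = 50 BB draws, M = 5 FPBB draws, S = 2 imputations of a mean
rng = np.random.default_rng(2)
q = 0.52 + 0.01 * rng.standard_normal((50, 5, 2))
est, ci = combine_fpbb_draws(q)
print(est, ci)
```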
5. CONCLUSION

Although selection of probability samples has been the standard for inference in finite populations for over 60 years, there are now many other sources of data that seem useful. Data obtained from convenient sources like internal business records or the internet are plentiful and tempting to use in estimation. Another factor is that selecting and maintaining probability samples becomes more difficult all the time, particularly when surveying households and persons. Because of these considerations, methods of statistical inference other than the design-based, repeated sampling approach are required.

Two alternatives are quasi-randomization and superpopulation modeling. In the former, probabilities of being included in a sample are estimated based on covariates. Unit-level covariates must be available for both the nonprobability sample and either a census of the population or a well-controlled reference dataset that represents the nonsample units. The reference sample may or may not be a probability sample. But, in any case, the reference sample must permit inclusion probabilities to be estimated for the nonprobability units when the two covariate sources are combined. The superpopulation approach constructs models for y variables and uses them to predict finite population quantities like means or totals. The quasi-randomization and superpopulation approaches can also be combined to create estimators.
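As a concrete, if simplified, illustration of the quasi-randomization route, the sketch below estimates pseudo-inclusion probabilities by fitting a propensity model that contrasts the nonprobability sample with a weighted reference sample. The data, the weighting scheme and the choice of 1/p̂ as the pseudo-weight are assumptions made only for illustration; the estimators discussed earlier in the paper differ in these details.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pseudo_weights(X_np, X_ref, ref_weights):
    """Illustrative quasi-randomization step: estimate pseudo-inclusion
    probabilities for a nonprobability sample from covariates shared with a
    weighted reference sample that represents the population."""
    X = np.vstack([X_np, X_ref])
    z = np.concatenate([np.ones(len(X_np)), np.zeros(len(X_ref))])  # 1 = nonprobability unit
    w = np.concatenate([np.ones(len(X_np)), np.asarray(ref_weights)])
    model = LogisticRegression(max_iter=1000).fit(X, z, sample_weight=w)
    p_hat = model.predict_proba(X_np)[:, 1]     # estimated propensity of being in the sample
    return 1.0 / p_hat                          # pseudo-weights (one simple variant)

# Toy illustration with two covariates and a covariate-skewed convenience sample
rng = np.random.default_rng(3)
X_ref = rng.normal(size=(1000, 2))
X_np = rng.normal(loc=0.3, size=(400, 2))
w_pseudo = pseudo_weights(X_np, X_ref, ref_weights=np.full(1000, 50.0))
print(w_pseudo.mean())
```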
There are pros and cons to the two. In quasi-randomization, general inclusion probabilities can be estimated that are not specific to particular analytic y variables. Thus, they can apply to estimation for any y. An estimator generated for a particular y using the superpopulation approach may use a model specific to that y. Such an estimator can have a lower model variance than a quasi-randomization estimator because it accounts for the population structure of the y. On the other hand, it can be model-biased if the superpopulation model is misspecified by, say, omitting important covariates. Although a quasi-randomization estimator may be unbiased with respect to repeated "pseudo-sampling," it can also be model-biased with respect to the superpopulation y model. Which of these two approaches is the most useful and statistically efficient appears to be an open question. Comparing these two approaches in the context of the many different types of data now available should be fertile ground for research.

Finally, a broader issue is whether there are certain situations in which nonprobability samples should be avoided altogether if some sort of probability sample is available. This consideration can be viewed through the "fit for purpose" framework (Baker et al., 2013b). Though that framework is perhaps most commonly used in defense of nonprobability samples, by adding factors such as timeliness, accessibility and cost to the assessment of survey design, it suggests that, when critical estimates of descriptive quantities such as means, quantiles or cell probabilities are required, nonprobability designs should be avoided, or utilized only when it is reasonably certain that covariates related to the nonprobability selection mechanism are available in both datasets and can be used to appropriately incorporate information from the nonprobability sample. If a sufficiently large probability sample is available for estimating descriptive statistics, methods to incorporate nonprobability data are likely not warranted. (We focus on descriptive statistics because there may be a smaller impact of nonprobability samples on model estimators. The effect on model estimators results from interactions between the probability of selection and the model effects, which we might presume to be less prevalent to the degree that models are correctly specified. However, the possibility of nonignorable selection related to model residuals remains present in nonprobability samples.) Development of methods to assess the sensitivity of results to failures of the observed covariates to fully capture the selection mechanism for the nonprobability sample is, thus, yet another avenue for future research.

ACKNOWLEDGMENTS

The authors thank the Editor, Associate Editor and referees for their comments, which led to considerable improvements in the paper.

REFERENCES

ALVAREZ, R., SHERMAN, R. and VAN BESELAERE, C. (2003). Subject acquisition for web-based surveys. Polit. Anal. 11 23–43.
BAKER, R., BRICK, J., BATES, N., COUPER, M., COURTRIGHT, M., DENNIS, J., DILLMAN, D., FRANKEL, M., GARLAND, P., GROVES, R., KENNEDY, C., KROSNICK, J., LAVRAKAS, P., LEE, S., LINK, M., PIEKARSKI, L., RAO, K., THOMAS, R. and ZAHS, D. (2010). AAPOR report on online panels. Public Opin. Q. 74 711–781.
BAKER, R., BRICK, J. M., BATES, N. A., BATTAGLIA, M., COUPER, M. P., DEVER, J. A., GILE, K. and TOURANGEAU, R. (2013a). Report of the AAPOR Task Force on Non-probability Sampling. Technical report, American Association for Public Opinion Research, Deerfield, IL.
BAKER, R., BRICK, J. M., BATES, N. A., BATTAGLIA, M., COUPER, M. P., DEVER, J. A., GILE, K. and TOURANGEAU, R. (2013b). Summary report of the AAPOR task force on non-probability sampling. Journal of Survey Statistics and Methodology 1 90–143.
BERZOFSKY, M., WILLIAMS, R. and BIEMER, P. (2009). Combining probability and non-probability sampling methods: Model-aided sampling and the O*NET data collection program. Survey Practice.
BETHLEHEM, J. (2010). Selection bias in web surveys. Int. Stat. Rev. 78 161–188.
BINDER, D. and ROBERTS, G. (2009). Imputation of business survey data. In Handbook of Statistics, Sample Surveys: Inference and Analysis, Volume 29B (D. Pfeffermann and C. Rao, eds.). Elsevier, Amsterdam.
BRICK, J. (2015). Compositional model inference. In Proceedings of the Section on Survey Research Methods 299–307. Amer. Statist. Assoc., Alexandria, VA.
CAVALLO, A. and RIGOBON, R. (2016). The billion prices project: Using online prices for measurement and research. The Journal of Economic Perspectives 151–178.
CHEN, J. K.-T. (2015). Using LASSO to Calibrate Non-probability Samples using Probability Samples. Ph.D. thesis, Univ. Michigan, Ann Arbor, MI.
CHIPMAN, H. A., GEORGE, E. I. and MCCULLOCH, R. E. (2010). BART: Bayesian additive regression trees. Ann. Appl. Stat. 4 266–298. MR2758172
CLEMENT, S. (2016). How the Washington Post-SurveyMonkey 50-state poll was conducted. Available at https://www.washingtonpost.com/news/post-politics/wp/2016/09/06/how-the-washington-post-surveymonkey-50-state-poll-was-conducted/.
COUPER, M. (2013). Is the sky falling? New technology, changing media, and the future of surveys. Survey Research Methods 7 145–156.
COWLING, D. (2015). Election 2015: How the opinion polls got it wrong. Available at http://www.bbc.com/news/uk-politics-32751993. BBC News online; accessed 06-November-2016.
DEVER, J., RAFFERTY, A. and VALLIANT, R. (2008). Internet surveys: Can statistical adjustments eliminate coverage bias? Survey Research Methods 2 47–62.
DEVER, J. and VALLIANT, R. (2010). A comparison of variance estimators for poststratification to estimated control totals. Surv. Methodol. 36 45–56.
DEVER, J. and VALLIANT, R. (2014). Estimation with non-probability surveys and the question of external validity. In Proceedings of Statistics Canada Symposium 2014. Statistics Canada, Ottawa, ON.
DEVER, J. and VALLIANT, R. (2016). GREG estimation with undercoverage and estimated controls. Journal of Survey Statistics and Methodology 4 289–318.
DEVILLE, J. (1991). A theory of quota surveys. Surv. Methodol. 17 163–181.
DONG, Q., ELLIOTT, M. and RAGHUNATHAN, T. (2014). A nonparametric method to generate synthetic populations to adjust for complex sample designs. Surv. Methodol. 40 29–46.
ELLIOTT, M. (2009). Combining data from probability and non-probability samples using pseudo-weights. Survey Practice.
ELLIOTT, M. R. and DAVIS, W. W. (2005). Obtaining cancer risk factor prevalence estimates in small areas: Combining data from two surveys. J. R. Stat. Soc. Ser. C. Appl. Stat. 54 595–609. MR2137256
ELLIOTT, M. and LITTLE, R. J. A. (2000). Model averaging methods for weight trimming. J. Off. Stat. 16 191–209.
ELLIOTT, M., RESLER, A., FLANNAGAN, C. and RUPP, J. (2010). Combining data from probability and non-probability samples using pseudo-weights. Accident Analysis and Prevention 42 530–539.
ENTEN, H. (2014). Flying Blind Toward Hogan's Upset Win In Maryland. Available at http://fivethirtyeight.com/datalab/governor-maryland-surprise-brown-hogan/. FiveThirtyEight online; accessed 06-November-2016.
FERRARI, S. L. P. and CRIBARI-NETO, F. (2004). Beta regression for modelling rates and proportions. J. Appl. Stat. 31 799–815. MR2095753
FILE, T. and RYAN, C. (2014). Computer and internet use in the United States: 2013. Available at http://www.census.gov/content/dam/Census/library/publications/2014/acs/acs-28.pdf. US Census Bureau; accessed 06-November-2016.
FROST, S., BROUWER, K., FIRESTONE-CRUZ, M., RAMOS, R., RAMOS, M., LOZADA, R., MAGIS-RODRIGUEZ, C. and STRATHDEE, S. (2006). Respondent-driven sampling of injection drug users in two U.S.-Mexico border cities: Recruitment dynamics and impact on estimates of HIV and syphilis prevalence. Journal of Urban Health 83 83–97.
GILE, K. J. and HANDCOCK, M. S. (2010). Respondent-driven sampling: An assessment of current methodology. Sociol. Method. 40 285–327.
GOSNELL, H. F. (1937). How accurate were the polls? Public Opin. Q. 1 97–105.
HAZIZA, D. and BEAUMONT, J.-F. (2017). Construction of weights in surveys: A review. Statist. Sci. 32 206–226.
HECKATHORN, D. D. (1997). Respondent-driven sampling: A new approach to the study of hidden populations. Soc. Probl. 44 174–199.
HOLT, D. and SMITH, T. M. F. (1979). Poststratification. J. R. Stat. Soc., A 142 33–46.
KAIZAR, E. (2015). Incorporating both randomized and observational data into a single analysis. Annual Review of Statistics and Its Application 2 49–72.
KEIDING, N. and LOUIS, T. (2016). Perils and potentials of self-selected entry to epidemiological studies and surveys. J. R. Stat. Soc., A 179 319–376. MR3461587
KOHUT, A., KEETER, S., DOHERTY, C., DIMOCK, M. and CHRISTIAN, L. (2012). Assessing the representativeness of public opinion surveys. Available at http://www.people-press.org/2012/05/15/assessing-the-representativeness-of-public-opinion-surveys/. Pew Research Center; accessed 06-November-2016.
KORN, E. and GRAUBARD, B. (1999). Analysis of Health Surveys. Wiley, New York.
LE BLANC, M. and TIBSHIRANI, R. (1998). Monotone shrinkage of trees. J. Comput. Graph. Statist. 7 417–433.
LEE, S. and VALLIANT, R. (2009). Estimation for volunteer panel web surveys using propensity score adjustment and calibration adjustment. Sociol. Methods Res. 37 319–343.
LIEBERMANN, O. (2015). Why were the Israeli election polls so wrong? Available at http://www.cnn.com/2015/03/18/middleeast/israel-election-polls/. CNN online; accessed 06-November-2016.
LITTLE, R. J. A. (1982). Models for nonresponse in sample surveys. J. Amer. Statist. Assoc. 77 237–250.
LITTLE, R. J. A. (2003). Bayesian methods for unit and item nonresponse. In Analysis of Survey Data (R. Chambers and C. Skinner, eds.). Wiley, Chichester.
LUMLEY, T. and SCOTT, A. (2017). Fitting regression models to survey data. Statist. Sci. 32 265–278.
MADIGAN, D., STANG, P., BERLIN, J., SCHUEMIE, M., OVERHAGE, J., SUCHARD, M., DUMOUCHEL, W., HARTZEMA, W. and RYAN, P. (2014). A systematic statistical approach to evaluating evidence from observational studies. Annual Review of Statistics and Its Application 1 11–39.
MURPHY, J., LINK, M., CHILDS, J., TESFAYE, C., DEAN, E., STERN, M., PASEK, J., COHEN, J., CALLEGARO, M. and HARWOOD, P. (2015). Social media in public opinion research. Public Opin. Q. 78 788–794.
NEYMAN, J. (1934). On the two different aspects of the representative method: The method of stratified sampling and the method of purposive selection. Journal of the Royal Statistical Society 97 558–625. MR0121942
O'MUIRCHEARTAIGH, C. and HEDGES, L. V. (2014). Generalizing from unrepresentative experiments: A stratified propensity score approach. J. R. Stat. Soc. Ser. C. Appl. Stat. 63 195–210. MR3234340
RAO, J. N. K. and WU, C. F. J. (1988). Resampling inference with complex survey data. J. Amer. Statist. Assoc. 83 231–241. MR0941020
RAO, J. N. K., WU, C. F. J. and YUE, K. (1992). Some recent work on resampling methods for complex surveys. Surv. Methodol. 18 209–217.
RIVERS, D. (2007). Sampling for web surveys. Amazon Web Services. Available at https://s3.amazonaws.com/yg-public/Scientific/Sample+Matching_JSM.pdf.
ROSENBAUM, P. and RUBIN, D. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70 41–55. MR0742974
ROYALL, R. (1970). On finite population sampling theory under certain linear regression models. Biometrika 57 377–387.
ROYALL, R. (1971). Linear regression models in finite population sampling theory. In Foundations of Statistical Inference (V. Godambe and D. Sprott, eds.). Holt, Rinehart, and Winston, Toronto.
RUBIN, D. B. (1976). Inference and missing data. Biometrika 63 581–592. MR0455196
RUBIN, D. (1979). Using multivariate matched sampling and regression adjustment to control bias in observational studies. J. Amer. Statist. Assoc. 74 318–328.
RUBIN, D. B. (1981). The Bayesian bootstrap. Ann. Statist. 9 130–134. MR0600538
SÄRNDAL, C.-E., SWENSSON, B. and WRETMAN, J. (1992). Model Assisted Survey Sampling. Springer, New York. MR1140409
SCHONLAU, M. and COUPER, M. (2017). Options for conducting web surveys. Statist. Sci. 32 279–292.
SCHONLAU, M., VAN SOEST, A. and KAPTEYN, A. (2007). Are "Webographic" or attitudinal questions useful for adjusting estimates from web surveys using propensity scoring? Survey Research Methods 1 155–163.
SCHONLAU, M., WEIDMER, B. and KAPTEYN, A. (2014). Recruiting an Internet panel using respondent-driven sampling. J. Off. Stat. 30 291–310.
SIMON, H. (1956). Rational choice and the structure of the environment. Psychological Review 63 129–138.
SIRKEN, M. (1970). Household surveys with multiplicity. J. Amer. Statist. Assoc. 65 257–266.
SMITH, T. M. F. (1976). The foundations of survey sampling: A review. J. Roy. Statist. Soc. Ser. A 139 183–204. MR0445669
SMITH, T. M. F. (1983). On the validity of inferences from non-random samples. J. R. Stat. Soc., A 146 394–403. MR0769995
SQUIRE, P. (1988). Why the 1936 Literary Digest poll failed. Public Opin. Q. 52 125–133.
STUART, E. A., COLE, S. R., BRADSHAW, C. P. and LEAF, P. J. (2011). The use of propensity scores to assess the generalizability of results from randomized trials. J. R. Stat. Soc., A 174 369–386. MR2898850
STURGIS, P., BAKER, N., CALLEGARO, M., FISHER, S., GREEN, J., JENNINGS, W., KUHA, J., LAUDERDALE, B. and SMITH, P. (2016). Report of the Inquiry into the 2015 British general election opinion polls. Available at http://eprints.ncrm.ac.uk/3789/1/Report_final_revised.pdf. Accessed 06-November-2016.
TIBSHIRANI, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B. Stat. Methodol. 58 267–288. MR1379242
US ENERGY INFORMATION ADMINISTRATION (2016). Weekly petroleum status report. Available at https://www.eia.gov/petroleum/supply/weekly/pdf/appendixb.pdf. US Department of Energy online; accessed 06-November-2016.
VALLIANT, R. and DEVER, J. A. (2011). Estimating propensity adjustments for volunteer web surveys. Sociol. Methods Res. 40 105–137. MR2758301
VALLIANT, R., DEVER, J. A. and KREUTER, F. (2013). Practical Tools for Designing and Weighting Survey Samples. Springer, New York. MR3088726
VALLIANT, R., DORFMAN, A. H. and ROYALL, R. M. (2000). Finite Population Sampling and Inference: A Prediction Approach. Wiley, New York.
VAN DER LAAN, M. J., POLLEY, E. C. and HUBBARD, A. E. (2007). Super learner. Stat. Appl. Genet. Mol. Biol. 6.
VONK, T. W. E., VAN OSSENBRUGGEN, R. and WILLEMS, P. (2006). The effects of panel recruitment and management on research results. Available at https://www.esomar.org/web/research_papers/Web-Panel_1476_The-effects-of-panel-recruitment-and-management-on-research-results.php. ESOMAR; accessed 06-November-2016.
WANG, W., ROTHSCHILD, D., GOEL, S. and GELMAN, A. (2015). Forecasting elections with non-representative polls. Int. J. Forecast. 31 980–991.
ZHOU, H., ELLIOTT, M. and RAGHUNATHAN, T. (2016a). Multiple imputation in two-stage cluster samples using the weighted finite population Bayesian bootstrap. Journal of Survey Statistics and Methodology 4 139–170.
ZHOU, H., ELLIOTT, M. and RAGHUNATHAN, T. (2016b). Synthetic multiple imputation procedure for multi-stage complex samples. J. Off. Stat. 32 251–256.
ZHOU, H., ELLIOTT, M. and RAGHUNATHAN, T. (2016c). A two-step semiparametric method to accommodate sampling weights in multiple imputation. Biometrics 72 242–252. MR3500593