International Biometric Society
BIOMETRICS 47, 269-279
March 1991
A Nonparametric Approach to Size-Biased Line Transect Sampling
SUMMARY
1. Introduction
2. Notation and Assumptions
2.1 Notation
We adopt the following notation:
$L$, $w$: Length, half-width of the sampled transect strip;
$C$, $n$: Actual and detected numbers of clusters in the strip;
$\lambda = E(C)/(2Lw)$: Cluster density (mean number of clusters per unit area);
$X_j$: Perpendicular distance from the transect line to the geometrical center of cluster $j$, for $j = 1, \ldots, C$;
$S_j$: Size (i.e., number of objects) of cluster $j$, for $j = 1, \ldots, C$. Without loss of generality, assume that $S_j < M$ for all $j$;
$N = \sum_{j=1}^{C} S_j$: Total number of objects in the strip;
$D = E(N)/(2Lw)$: Population density (mean number of objects per unit area);
SD(s): Common probability density function (pdf) of the $S_j$'s;
$\mu = E(S_j)$: Common mean size of the clusters;
$g(x, s)$: Bivariate detection function, i.e., probability of detecting a cluster, given that it is at distance $x$ from the transect line, and that it has size $s$;
$f(x, s)$: Joint pdf of distance and size of a detected cluster;
$p$: Average probability of detection of a cluster.
2.2 Assumptions
The following assumptions are made:
(A1) A $2w$-by-$L$ transect strip is chosen at random in the region under study. It results that $C$, $n$, $N$ are random quantities.
(A2) Before sampling, $C, X_1, \ldots, X_C, S_1, \ldots, S_C$ are mutually independent; moreover, each $X_j$ is uniformly distributed over $[0, w]$.
(A3) For each detected cluster $i$, the distance $X_i$ and size $S_i$ can be measured accurately.
(A4) Detections of clusters are mutually independent events.
(A5) Clusters directly on the transect line are surely detected whatever their sizes, i.e., $g(0, s) = 1$ for all $s > 0$; $g(x, s)$ is moreover continuous with respect to $x$.
2.3 Remarks
3. Point Estimation Procedures
If $S_i$ is of the discrete type, define $\beta(x) = \sum_s s f(x, s)$, where the summation extends over all possible values $s$ of $S_i$. If $S_i$ is of the continuous type, and satisfies $0 < S_i < M$, say, define $\beta(x) = \int_0^M s f(x, s)\,ds$. It is proved in the Appendix that
$$D = \frac{E(n)\,\beta(0)}{2L}. \tag{1}$$
Notice that (1) is similar to the formula $D = E(n) f_X(0)/(2L)$ derived by Crain et al. (1979) for nonclustered populations, where $f_X$ is the pdf of the detected distances. A referee points out that $\beta(0) = E[S f(0 \mid S)]$ is the expected cluster size weighted by the conditional probability of detection given cluster size, at distance 0. Also in the Appendix, the theory of trigonometric Fourier series is used to estimate $\beta(0)$, from which the following estimator of the population density $D$ is derived:
$$\hat D = \frac{1}{2Lw} \sum_{i=1}^{n} S_i \left[ 1 + 2 \sum_{r=1}^{t} \cos\frac{r\pi X_i}{w} \right], \tag{2}$$
where $t$ is an integer chosen according to Rule 2 below.
A consequence of (2), using $S_i = 1$ for all $i$, is the following estimator of the cluster density $\lambda$:
$$\hat\lambda = \frac{1}{2Lw} \sum_{i=1}^{n} \left[ 1 + 2 \sum_{r=1}^{m} \cos\frac{r\pi X_i}{w} \right]. \tag{3}$$
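As an illustration, estimators (2) and (3) can be computed directly from the detected distances and cluster sizes. The following is a minimal Python sketch, not the SIZEBIAS program: the function names, the toy data, and the numbers of terms are assumptions made for illustration, and the rule for choosing the number of terms is not reproduced.

```python
import math

def fourier_weight(x, w, terms):
    """Return 1 + 2 * sum_{r=1}^{terms} cos(r*pi*x/w) for one detected distance x."""
    return 1.0 + 2.0 * sum(math.cos(r * math.pi * x / w) for r in range(1, terms + 1))

def density_estimate(distances, sizes, L, w, t):
    """Estimator (2): population density from detected distances and cluster sizes."""
    total = sum(s * fourier_weight(x, w, t) for x, s in zip(distances, sizes))
    return total / (2.0 * L * w)

def cluster_density_estimate(distances, L, w, m):
    """Estimator (3): estimator (2) with every size set to 1 and m terms."""
    return density_estimate(distances, [1] * len(distances), L, w, m)

# Hypothetical toy data (not from the paper): five detected clusters.
distances = [0.5, 1.2, 2.0, 3.1, 0.1]
sizes = [3, 5, 2, 8, 1]
L, w = 100.0, 4.0

D_hat = density_estimate(distances, sizes, L, w, t=2)
lam_hat = cluster_density_estimate(distances, L, w, m=2)
mu_hat = D_hat / lam_hat  # ratio estimate of the mean cluster size
```

With zero cosine terms the estimators reduce to the naive strip densities, sum of sizes over $2Lw$ and $n/(2Lw)$; the cosine terms are what correct for clusters missed away from the line.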
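Estimator (2) can be written as
$$\hat D = \sum_{i=1}^{n} \frac{S_i}{2Lw} \left[ 1 + 2 \sum_{r=1}^{t} \cos\frac{r\pi X_i}{w} \right].$$
If we let
$$Q_i = \frac{S_i}{2Lw} \left[ 1 + 2 \sum_{r=1}^{t} \cos\frac{r\pi X_i}{w} \right],$$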
then the $Q_i$'s are i.i.d. and $\hat D$ is their sum. This fact allows us to estimate the conditional standard deviation $\mathrm{Sd}_n(\hat D)$ either by bootstrap resampling [see Efron (1982) for a description, and Bickel and Freedman (1981) for a theoretical justification in the i.i.d. case], or by calculating the sample standard deviation of the $Q_i$'s. A confidence interval for $\lambda$ can be set up likewise. The bootstrap does not seem to give a more precise estimate of the standard deviation than the sample standard deviation approach. But $\hat\mu = \hat D/\hat\lambda$ is a ratio of two dependent variables, hence its standard deviation is difficult to estimate analytically (see Cochran, 1977, p. 33), whereas the bootstrap is handy and seems to work well. The computer program SIZEBIAS, available on request from the author, uses both approaches in estimating standard deviations.
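The two approaches to the conditional standard deviation of the density estimate can be sketched as follows. This is a hedged illustration rather than the SIZEBIAS code: the function names and toy data are hypothetical, and the sample-based version assumes that the conditional standard deviation of a sum of $n$ i.i.d. terms is taken as $\sqrt{n}$ times the sample standard deviation of the terms.

```python
import math
import random
import statistics

def q_values(distances, sizes, L, w, t):
    """Per-cluster terms S_i/(2Lw) * [1 + 2*sum_{r=1}^{t} cos(r*pi*X_i/w)]."""
    out = []
    for x, s in zip(distances, sizes):
        weight = 1.0 + 2.0 * sum(math.cos(r * math.pi * x / w) for r in range(1, t + 1))
        out.append(s * weight / (2.0 * L * w))
    return out

def sd_from_sample(q):
    """Conditional sd of the sum of n i.i.d. terms: sqrt(n) * sample sd of the terms."""
    return math.sqrt(len(q)) * statistics.stdev(q)

def sd_from_bootstrap(q, replications=100, seed=0):
    """Bootstrap sd: resample the terms with replacement and recompute their sum."""
    rng = random.Random(seed)
    n = len(q)
    sums = [sum(rng.choices(q, k=n)) for _ in range(replications)]
    return statistics.stdev(sums)

# Hypothetical toy data (not from the paper).
distances = [0.5, 1.2, 2.0, 3.1, 0.1, 0.8]
sizes = [3, 5, 2, 8, 1, 4]
q = q_values(distances, sizes, L=100.0, w=4.0, t=2)
sd_sample = sd_from_sample(q)
sd_boot = sd_from_bootstrap(q, replications=100, seed=0)
```

Resampling the per-cluster terms is equivalent to resampling the (distance, size) pairs when the number of cosine terms is held fixed. For the ratio estimate of the mean cluster size, the same resampled clusters would have to be used in both numerator and denominator.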
Two features are worth noting: (i) expression (5) has the familiar form of a "normal interval," and (ii) the standard deviation $\mathrm{Sd}_n(\hat D)$ is calculated conditional on $n$, i.e., as if $n$ were nonrandom.
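Expression (5) is not reproduced in this text. Assuming it has the usual normal-interval form, estimate plus or minus a standard normal quantile times the conditional standard deviation, it could be computed as in the sketch below. The function name and the input values are hypothetical.

```python
from statistics import NormalDist

def normal_interval(estimate, sd, alpha=0.05):
    """Two-sided interval: estimate -/+ z_{1 - alpha/2} * sd (assumed form of a normal interval)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return estimate - z * sd, estimate + z * sd

# Hypothetical inputs: a density estimate and its conditional standard deviation.
lower, upper = normal_interval(estimate=0.050, sd=0.010, alpha=0.05)
```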
In the following, the relative merits of conditional and unconditional variances are discussed. Common practice in surveys of resource abundance calls for assessing the variability of an estimator, say $\hat D$, by estimating its unconditional variance $\mathrm{var}(\hat D)$ or, equivalently, its coefficient of variation $\mathrm{Sd}(\hat D)/\hat D$. Then further assumptions often need to be made about the moments of $n$, whose distribution is not known. Quinn and Gallucci (1980) present several ways of estimating such unconditional variances. Gates (1981) discusses various methods of optimizing variance estimation, among them sampling designs where several transect strips serve as sampling units.
Conditional variances are easier to estimate, but unlike their unconditional counterparts, do not convey the full variability of the estimator. For the purpose of setting a confidence interval for $D$, however, use of a conditional variance as in (5) is permissible. Conditional variances will often produce shorter confidence intervals than the unconditional ones, since (see Cochran, 1977, p. 276) $\mathrm{var}(\hat D) = E[\mathrm{var}_n(\hat D)] + \mathrm{var}[E_n(\hat D)]$.
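The decomposition can be made concrete for an estimator that is a sum of $n$ i.i.d. terms. The sketch below is illustrative only: it assumes the distribution of the individual terms does not depend on $n$, and the distribution of $n$ and the moments used are hypothetical.

```python
def variance_components(n_probs, mean_q, var_q):
    """Split the variance of a sum of n i.i.d. terms, with n random.

    n_probs maps each possible n to its probability. Given n, the sum has
    mean n * mean_q and variance n * var_q, so
      E[var_n] = E[n] * var_q   and   var[E_n] = var(n) * mean_q**2.
    """
    e_n = sum(n * p for n, p in n_probs.items())
    var_n = sum((n - e_n) ** 2 * p for n, p in n_probs.items())
    within = e_n * var_q
    between = var_n * mean_q ** 2
    return within, between

# Hypothetical example: n is 1 or 3 with equal probability.
within, between = variance_components({1: 0.5, 3: 0.5}, mean_q=2.0, var_q=1.0)
unconditional = within + between
```

The second component is nonnegative, which is why the unconditional variance is at least as large as the average conditional variance.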
The interval (5) is not without weaknesses, however: (i) it is asymptotic ($n$ must be large); (ii) it is conditional on $t$, i.e., it is rigorously valid only if the number $t$ of terms in (2) is fixed in advance, rather than being determined by Rule 2; and (iii) it carries a coverage deficiency because it is built around $\hat\beta_t(0)$, which is a biased estimator of $\beta(0)$. Because of these weaknesses, this paper attempts to assess the performance of (5) by carrying out some Monte Carlo simulations (see Section 6). The results parallel those of the distance-only case (see Crain et al., 1979): when the bivariate detection function $g(x, s)$ is modeled by a half-normal density, then (5) seems to be satisfactory; when $g(x, s)$ is modeled by an exponential density, then coverage deficiency is serious.
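A single Monte Carlo replicate of this kind can be sketched as follows. The details are assumptions made for illustration and are not taken from Section 6: the cluster-size distribution, the way the half-normal scale depends on size (parameters `sigma0` and `a`), the strip dimensions, and the number of terms are all hypothetical.

```python
import math
import random

def simulate_detections(C, w, sigma0, a, rng):
    """Place C clusters in the strip; return the detected (x, s) pairs and the total N."""
    detected, total = [], 0
    for _ in range(C):
        x = rng.uniform(0.0, w)                     # (A2): distance uniform over [0, w]
        s = rng.randint(1, 10)                      # hypothetical cluster-size distribution
        sigma = sigma0 * (1.0 + a * s)              # hypothetical: scale grows with size
        g = math.exp(-x * x / (2.0 * sigma ** 2))   # half-normal shape, g(0, s) = 1 as in (A5)
        total += s
        if rng.random() < g:
            detected.append((x, s))
    return detected, total

def density_estimate(detected, L, w, t):
    """Estimator (2) applied to the detected (distance, size) pairs."""
    acc = 0.0
    for x, s in detected:
        acc += s * (1.0 + 2.0 * sum(math.cos(r * math.pi * x / w) for r in range(1, t + 1)))
    return acc / (2.0 * L * w)

rng = random.Random(1991)
L, w, C = 50.0, 1.0, 2000
detected, N = simulate_detections(C, w, sigma0=0.3, a=0.1, rng=rng)
D_true = N / (2.0 * L * w)                          # realized density in the simulated strip
D_hat = density_estimate(detected, L, w, t=3)
S_bar = sum(s for _, s in detected) / len(detected) # mean detected cluster size
```

Repeating the replicate many times and recording how often the interval covers the true density would give an empirical coverage rate; replacing the half-normal curve by an exponential one would give the contrasting case.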
5. Examples
The computer program SIZEBIAS is used to process the data sets described below. The results are summarized in Table 1. Output is shown in Figure 1.
Table 1
Summary of analyses of data sets. The entries are the estimates (standard error, coefficient of variation). All standard errors are obtained by a bootstrap with 100 replications.

Estimate                   Beer cans             Bobwhite                 Whales
Population $\hat N$        448 (56, 12.4%)       12,727 (1,604, 12.6%)    9,305 (3,037, 32.6%)
Clusters $\hat C$          125 (13, 10.4%)       3,632 (299, 8.2%)        1,876^a (367, 19.6%)
Cluster size $\hat\mu$     3.60 (.48, 13.33%)    3.50 (.34, 9.7%)         4.96 (1.81, 36.6%)
Obs. mean size $\bar S$    4.21                  3.745                    5.33
Area $2Lw$                 8,000 sq m            148.52 sq km             7,800 sq nm

^a Obtained with bias reduction technique.
Figure 1. Output of the computer program SIZEBIAS with Otto and Pollock's (1990) beer can data.
5.3 Whale Data
A survey of the minke whale was conducted during the 1979-1980 season by the International Whaling Commission. There were $n = 165$ detected pods of whales in a strip of dimensions $L = 975$, $2w = 8$ nautical miles. The estimated population size, $\hat N = 9{,}305$ whales, is not precise: c.v. = 32.6%, and the Fourier expansion (2) requires a large number of terms ($t = 13$). The estimated number of clusters, $\hat C = 2{,}405$ (with s.e. = 219 by the bootstrap method), is dubious because (3) requires $m \ge 20$, which is excessive. Indeed, a frequency histogram of the detected distances indicates that the shape requirement for an efficient estimate of a density (Burnham et al., 1980, p. 47) is not met. When the bias reduction technique (Quang, 1990) is applied, $\hat C = 1{,}786$. Bias-reduced values are also used in the subsequent bootstrap procedure, yielding s.e.$(\hat C) = 367$.
Results from alternative methods (Quinn, 1985; Drummer and McDonald, 1987) are reproduced in Table 2 for comparison. The reader is referred to these authors for a detailed discussion of the procedures used. Both Quinn and Drummer and McDonald report the population estimates over a region of 39,100 sq nm, while the transect strip measures 7,800 sq nm. The extrapolated population estimate, from $\hat N = 9{,}305$, is 46,644 whales.
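The extrapolation is a simple scaling by the ratio of the two areas, which can be checked with the figures quoted in the text (the variable names are illustrative):

```python
strip_area = 975 * 8          # L times 2w, in square nautical miles
region_area = 39_100          # region reported on by the other authors, sq nm
n_hat_strip = 9_305           # estimated number of whales in the strip

n_hat_region = n_hat_strip * region_area / strip_area
n_hat_region_rounded = round(n_hat_region)
```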
Table 2
Comparison of analyses of the whale data

                               Estimated mean       Estimated           Estimated
                               pod size             # of pods           # of individuals
Method                         $\hat\mu$ (s.e.)     $\hat C$ (s.e.)     $\hat N$ (s.e.)
Quinn (1985)
  Post-stratification          3.27 (0)             10,708 (3,427)      35,054 (7,940)
  Pooled data                  4.30 (.40)           10,747 (2,520)      46,212 (11,658)
Drummer and McDonald (1987)
  Negative exponential         4.155 (.502)         10,400 (1,615)      43,209 (8,242)
  Half-normal                  5.175 (.874)         4,845 (610)         25,071 (5,263)
  GEM                          4.150 (.502)         10,801              44,825
This paper
  Nonparametric                4.96 (1.81)          9,402^a (1,840)     46,644 (13,871)

^a Obtained with a bias-reduction technique.
6.1 Discussion
The half-normal model seems to provide an estimate of $D$ that is overall unbiased, while the exponential model underestimates it. The same remark applies concerning $\lambda$.
Coverage deficiency is a constant feature. Coverage of $D$ is more or less satisfactory (67.0%-94.6%) under the half-normal model, while deficiency is severe under the exponential model. The bad behavior of the exponential model under Fourier series techniques is well known. In the distance-only situation Burnham et al. (1980) showed that Fourier series techniques work well in cases where the detection curve exhibits a shoulder near the origin (e.g., the half-normal detection function), while Quang (1990) proposes a method to reduce the bias.
Overall, the estimate $\hat\mu = \hat D/\hat\lambda$ does well, while the mean detected cluster size $\bar S$ always overestimates $\mu$ (as expected), often by a large amount. Finally, more clusters are detected as the influence of size on detectability (the magnitude of ao) increases.
ACKNOWLEDGEMENTS
I wish to thank Steve Amstrup, Tom Drummer, Charles Gates, Terry Quinn II, Steve Thompson, the associate editor, and two referees for careful reviews of the manuscript, and for many criticisms that have helped improve the presentation of the paper. Terry Quinn, Fred Guthery, and Mark Otto have kindly made available the edited whale data, the bobwhite data, and the beer can data, respectively.
RESUME
Point and confidence interval estimators are proposed for the density of objects with a clustered distribution, observed by the line transect sampling method. The probability of detecting a cluster is assumed to depend both on its size and on its perpendicular distance from the transect. No particular assumption is made about the form of the bivariate function associated with the probability of detecting a cluster, except that clusters located exactly on the transect are surely detected. An estimator of the mean cluster size is also proposed. The standard errors of the estimators can be obtained either from the variances of the counts or by bootstrap resampling. Mark Otto applied this method to two data sets, one from a ship survey devoted to the observation of the minke whale, the other concerning an American lynx. The results of some Monte Carlo simulations are presented and discussed. The method is based on the theory of trigonometric Fourier series. It seems to have the same advantages and the same weaknesses as the well-known method of Burnham, Anderson, and Laake (1980, Wildlife Monograph 72, Supplement to Journal of Wildlife Management 44).
REFERENCES
Gates, C. E., Evans, W., Gober, D. R., Guthery, F. S., and Grant, W. E. (1985). Line transect estimation of animal densities from large data sets. In Game Harvest Management, S. L. Beasom and S. F. Roberson (eds). Kingsville, Texas: Caesar Kleberg Research Institute, Texas A&I University.
Kronmal, R. and Tarter, M. (1968). The estimation of probability densities and cumulatives by Fourier series methods. Journal of the American Statistical Association 63, 925-952.
Otto, M. C. and Pollock, K. H. (1990). Size bias in line transect sampling: A field test. Biometrics 46, 239-245.
Quang, P. X. (1990). Confidence intervals for densities in line transect sampling. Biometrics 46, 459-472.
Quinn, T. J. II and Gallucci, V. F. (1980). Parametric models for line transect estimators of abundance. Ecology 61, 293-302.
Quinn, T. J. II (1985). Line transect estimators for schooling populations. Fisheries Research 3, 183-199.
Ramsey, F. L., Wildman, V., and Engbring, J. (1987). Covariate adjustments to effective area in variable-area wildlife surveys. Biometrics 43, 1-11.
Siegmund, D. O. (1985). Sequential Analysis. New York: Springer-Verlag.
Tolstov, G. P. (1962). Fourier Series (translated from the Russian by R. A. Silverman). Englewood Cliffs, New Jersey: Prentice-Hall.
APPENDIX
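yielding the following unbiased estimate of $b_r$:
$$\hat b_r = \frac{2}{wn} \sum_{i=1}^{n} S_i \cos\frac{r\pi X_i}{w}.$$
Choose an integer $t$, and define the Fourier segment
$$\hat\beta_t(0) = \frac{\hat b_0}{2} + \sum_{r=1}^{t} \hat b_r = \frac{1}{wn} \sum_{i=1}^{n} S_i \left[ 1 + 2 \sum_{r=1}^{t} \cos\frac{r\pi X_i}{w} \right] = \frac{2L}{n} \sum_{i=1}^{n} Q_i.$$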
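The link between the estimated cosine coefficients and the density estimate can be checked numerically. In the sketch below (function names and toy data are hypothetical), the truncated series evaluated at zero, multiplied by $n/(2L)$, is compared with estimator (2) computed directly, which is relation (1) with $E(n)$ replaced by the observed $n$.

```python
import math

def b_hat(r, distances, sizes, w):
    """Estimated r-th cosine coefficient: (2/(w*n)) * sum_i S_i * cos(r*pi*X_i/w)."""
    n = len(distances)
    return 2.0 / (w * n) * sum(
        s * math.cos(r * math.pi * x / w) for x, s in zip(distances, sizes))

def fourier_segment_at_zero(distances, sizes, w, t):
    """b_hat_0 / 2 + sum_{r=1}^{t} b_hat_r: the truncated series evaluated at x = 0."""
    return b_hat(0, distances, sizes, w) / 2.0 + sum(
        b_hat(r, distances, sizes, w) for r in range(1, t + 1))

def density_estimate(distances, sizes, L, w, t):
    """Estimator (2), written out directly for comparison."""
    total = 0.0
    for x, s in zip(distances, sizes):
        total += s * (1.0 + 2.0 * sum(math.cos(r * math.pi * x / w) for r in range(1, t + 1)))
    return total / (2.0 * L * w)

# Hypothetical toy data (not from the paper).
distances = [0.5, 1.2, 2.0]
sizes = [3, 5, 2]
L, w, t = 100.0, 4.0, 3
n = len(distances)

segment = fourier_segment_at_zero(distances, sizes, w, t)
D_hat = density_estimate(distances, sizes, L, w, t)
identity_gap = abs(n * segment / (2.0 * L) - D_hat)  # zero up to rounding error
```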
$$Z = \frac{\hat\beta_t(0) - \beta_t(0)}{\mathrm{Sd}_n[\hat\beta_t(0)]}.$$
We deduce that the following statement holds with probability approaching $1 - \alpha$: