Basic Concepts
Statistics: Statistics is a field of study concerned with (i) the collection,
organization, summarization, and analysis of data; and (ii) the drawing of
inferences about a body of data when only a representative part of the data is
observed.
Data: The information about which we are concerned is called data, and the data
are available to us in the form of numbers. Data expressed as numbers are
considered as raw materials in statistics. The two kinds of numbers that we use in
statistics are numbers that result from the taking of a measurement, and those that
result from the process of counting. For example, when a nurse weighs a patient or
takes a patient’s temperature, a measurement, consisting of a number such as 150
pounds or 100 degrees Fahrenheit, is obtained. Quite a different type of number is
obtained when a hospital administrator counts the number of patients (perhaps 20)
discharged from the hospital on a given day. Each of the three numbers is a datum,
and the three taken together are data.
Remark: Data are numbers; numbers contain information; and the purpose of
statistics is to investigate and evaluate the nature and meaning of this information.
Sources of Data: Data are usually available from one or more of the following
sources:
1. Routinely kept records: It is difficult to imagine any type of organization that
does not keep records of day-to-day transactions of its activities. Hospital medical
records, for example, contain immense amounts of information on patients, while
hospital accounting records contain a wealth of data on the facility’s business
activities. When the need for data arises, we should look for them first among
routinely kept records.
2. Surveys: If the data needed to answer a question are not available from
routinely kept records, the logical source may be a survey. Suppose, for example,
that the administrator of a clinic wishes to obtain information regarding the mode
of transportation used by patients to visit the clinic. If admission forms do not
contain a question on mode of transportation, we may conduct a survey among
patients to obtain this information.
3. Experiments: Frequently the data needed to answer a question are available
only as the result of an experiment. A nurse may wish to know which of several
strategies is best for maximizing patient compliance. The nurse might conduct an
experiment in which the different strategies of motivating compliance are tried
with different patients. Subsequent evaluation of the responses to the different
strategies might enable the nurse to decide which is most effective.
4. External sources: The data needed to answer a question may already exist in the
form of published reports, commercially available data banks, or the research
literature. In other words, we may find that someone else has already asked the
same question, and the answer obtained may be applicable to our present situation.
Biostatistics: The tools of statistics are employed in many fields: business,
education, psychology, agriculture, and economics, to mention only a few. When
the data analyzed are derived from the biological sciences and medicine, we use
the term biostatistics to distinguish this particular application of statistical tools
and concepts.
Variable: If a characteristic takes on different values in different persons,
places, or things, it is called a variable, because it is not the same when
observed in different possessors of it. Examples of variables include diastolic
blood pressure, heart rate, the heights of adult males, the weights of preschool
children, and the ages of patients seen in a dental clinic.
Quantitative Variable: A quantitative variable is one that can be measured in the
usual sense. We can, for example, obtain measurements on the heights of adult
males, the weights of preschool children, and the ages of patients seen in a dental
clinic. These are examples of quantitative variables. Measurements made on
quantitative variables convey information regarding amount.
Qualitative Variable: Some characteristics are not capable of being measured in
the sense that height, weight, and age are measured. Many characteristics can be
categorized only, as, for example, when an ill person is given a medical diagnosis;
a person is designated as belonging to an ethnic group; or a person, place, or object
is said to possess or not to possess some characteristic of interest. In such cases,
measuring consists of categorizing. We refer to variables of this kind as qualitative
variables. Measurements made on qualitative variables convey information
regarding attribute.
Although, in the case of qualitative variables, measurement in the usual sense of
the word is not achieved, we can count the number of persons, places, or things
belonging to various categories. A hospital administrator, for example, can count
the number of patients admitted during a day under each of the various admitting
diagnoses. These counts, or frequencies as they are called, are the numbers that we
manipulate when our analysis involves qualitative variables.
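The counting described above can be sketched in a few lines of Python. This is only an illustration: the list of admitting diagnoses is made up for the example, and the name admitting_diagnoses is an assumption, not something taken from real hospital records.

```python
from collections import Counter

# Hypothetical admitting diagnoses recorded for one day (illustrative values only).
admitting_diagnoses = [
    "pneumonia", "fracture", "pneumonia", "asthma",
    "fracture", "pneumonia", "appendicitis",
]

# Each diagnosis is a category; counting how often each one occurs
# gives the frequencies that we analyze for a qualitative variable.
frequencies = Counter(admitting_diagnoses)

# Print the categories from most to least frequent.
for diagnosis, count in frequencies.most_common():
    print(f"{diagnosis}: {count}")
```

The diagnoses themselves are not numbers, but the resulting frequencies are, and they always add up to the total number of patients counted.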
Random Variable: Whenever we determine the height, weight, or age of an
individual, the result is frequently referred to as a value of the respective variable.
When the values obtained arise as a result of chance factors, so that they cannot be
exactly predicted in advance, the variable is called a random variable. Examples of
random variables are:
Number of patients coming to a hospital on a particular day
Blood glucose level of a diabetic patient
Number of phone calls received by an individual in a day
Health status of a patient in a hospital
Discrete Random Variable: A discrete variable is characterized by gaps or
interruptions in the values that it can assume. These gaps or interruptions indicate
the absence of values between particular values that the variable can assume. Some
examples illustrate the point. The number of daily admissions to a general hospital
is a discrete random variable since the number of admissions each day must be
represented by a whole number, such as 0, 1, 2, or 3. The number of admissions on
a given day cannot be a number such as 1.5, 2.997, or 3.333. The number of
decayed, missing, or filled teeth per child in an elementary school is another
example of a discrete variable.
Continuous Random Variable: A continuous random variable does not possess
the gaps or interruptions characteristic of a discrete random variable. A continuous
random variable can assume any value within a specified relevant interval of
values assumed by the variable. Examples of continuous variables include the
various measurements that can be made on individuals such as height, weight, and
skull circumference. No matter how close together the observed heights of two
people, for example, we can, theoretically, find another person whose height falls
somewhere in between.
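The contrast between the gaps of a discrete variable and the absence of gaps in a continuous one can be illustrated with a small Python sketch. The function value_between is a hypothetical helper written for this illustration, and it assumes that discrete values are whole-number counts.

```python
def value_between(low, high, discrete):
    """Return a value strictly between low and high, or None if none exists.

    Hypothetical helper: for a discrete variable, low and high are assumed
    to be whole-number counts.
    """
    if discrete:
        # A count can only be a whole number, so look for an integer in the gap.
        candidate = low + 1
        return candidate if candidate < high else None
    # A continuous measurement can, in theory, take the midpoint of any interval.
    return (low + high) / 2


# Admissions of 2 and 3 patients: no possible value lies in between.
print(value_between(2, 3, discrete=True))

# Heights of 170.2 cm and 170.3 cm: a height in between is always possible.
print(value_between(170.2, 170.3, discrete=False))
```

In practice every measuring instrument has limited precision, so the "any value" property of a continuous variable is a theoretical one, as the text notes.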
Population: A population of entities is the largest collection of entities for which we
have an interest at a particular time. If we take a measurement of a variable on each
of the entities in a population, we generate a population of values of that variable.
We may, therefore, define a population of values as the largest collection of values
of a random variable for which we have an interest at a particular time. For
example,
If we are interested in the weights of all the children enrolled in a certain
county elementary school system, our population consists of all these
weights.
If our interest lies only in the weights of first-grade students in the system,
we have a different population—weights of first-grade students enrolled in
the school system.
Hence, populations are determined or defined by our sphere of interest.
Populations may be finite or infinite. If a population of values consists of a fixed
number of these values, the population is said to be finite. If, on the other hand, a
population consists of an endless succession of values, the population is an infinite
one.
If we examine all of the population units, we say that we are conducting a census
of the population. Constraints on time, money, and other scarce resources usually
make a census impractical or infeasible. In such a situation, we select and analyze
a subset (or portion) of the population.
Sample: A sample may be defined simply as a representative part of a population.
Suppose our population consists of the weights of all the elementary school
children enrolled in a certain county school system. If we collect for analysis the
weights of only a representative fraction of these children, we have only a part of
our population of weights, that is, we have a sample.
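The difference between a census and a sample can be sketched in Python as follows. The population of weights is simulated and purely hypothetical, and simple random sampling is used here as one common way of aiming for a representative part; the text itself does not prescribe a particular sampling method.

```python
import random
import statistics

random.seed(0)  # fixed seed so that the sketch is repeatable

# Hypothetical finite population: weights (in kg) of 500 schoolchildren.
# The range 18.0 to 45.0 is an assumption made only for this illustration.
population = [round(random.uniform(18.0, 45.0), 1) for _ in range(500)]

# Census: every unit in the population is examined.
census_mean = statistics.mean(population)

# Sample: only a randomly chosen subset of 50 units is examined.
sample = random.sample(population, k=50)
sample_mean = statistics.mean(sample)

print(f"census mean: {census_mean:.1f} kg")
print(f"sample mean: {sample_mean:.1f} kg")
```

The sample mean is computed from one tenth of the values, which is the practical appeal of sampling when a census is too costly; how close it comes to the census mean varies from one sample to another.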