Biostatistics - Prelim Transes


Biostatistics

Unit 1: Introduction to Biostatistics || Lesson Reviewer || 2nd Year, 1st Semester

BIOSTATISTICS
Þ The branch of statistics that deals with data relating to living organisms.
Þ The development and application of statistical methods to a wide range of topics in biology.

STATISTICS AS A TOOL IN DECISION-MAKING
Þ A science that studies data to be able to make a decision.
Þ To provide answers or solutions to an inquiry.

VARIABLE
Þ Anything that has a quantity or quality that varies.
Þ In research, it simply refers to a person, place, thing, or phenomenon that you are trying to measure in some way.
Þ An attribute that describes a person, place, thing, or idea.
Þ Quantitative Var. – a variable that contains quantitative data.
Þ Categorical Var. – a variable that contains categorical data.

IMPORTANCE OF STATISTICS
It enables us to:
Þ Characterize persons, objects, situations, and phenomena
Þ Explain relationships among variables
Þ Formulate objective assessments and comparisons
Þ Make evidence-based decisions and predictions

WHAT ARE THE INDEPENDENT AND DEPENDENT VARIABLES?
INDEPENDENT VAR.
Þ Manipulated Variable
Þ Presumed to have caused or influenced the dependent variable
Þ What is controlled or manipulated
Þ Treatment = main
DEPENDENT VAR.
Þ Response Variable
Þ Response or outcome assumed to be caused by the effect of the independent variable

STATISTICS
Þ Is all about converting data into useful information
Þ Is therefore a process where we are:
- Collecting data
- Summarizing data
- Interpreting data

DATA
Þ A specific measurement of a variable
- It is the value you record in your data sheet.
Þ Data is generally divided into two categories:
1. Quantitative Data represents amounts.
2. Categorical Data represents groupings.

TYPES OF DATA:

categorical
Þ Data that can be sorted into categories according to their characteristics.
Þ Describes categories or groups.
Þ Can take on only a few possible values, for example, race or cancer stage.
Þ Includes answers to yes-or-no questions.
Þ Examples: Satisfaction Rating (1-5), Diabetes (Y/N), Gender, Race/Ethnicity, Highest Degree Earned, Final Letter Grade, Car Brands

numerical
Þ Are those variables that are measured in terms of numbers.
DISCRETE
Þ When the variable takes on a countable number of values.
Þ Most often these variables represent some kind of count, such as the number of prescriptions an individual takes daily.
Þ Discrete data can usually be counted in a finite manner.
Þ Examples: number of children, grades, money, number of objects.
CONTINUOUS
Þ When the variable can take on any value in some range of values.
Þ Our precision in measuring these variables is often limited by our instruments.
Þ Units should be provided.
Þ Continuous data can take infinitely many values within a range, so the values cannot be counted one by one.
Þ Examples: height, weight, serum cholesterol levels, speed or distance, area

CLASSIFICATION OF DATA
TYPE
• Categorical
• Quantitative (Numerical)
o Discrete
o Continuous
MEASUREMENT
• Qualitative
o Nominal
o Ordinal
• Quantitative
o Interval
o Ratio

LEVELS OF MEASUREMENT
qualitative
NOMINAL
Þ Unordered list
Þ A variable that has two or more categories, without any implied ordering.
Þ Categorical variables with no ordered distinction among the possible responses are classified as nominal.
Þ Examples: gender, four seasons, marital status, employment status
ORDINAL
Þ Ordered list
Þ A variable that has two or more categories, with clear ordering.
Þ When the possible responses correspond to a hierarchical scale.
Þ Examples: scale, rating, the severity of chronic kidney disease
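
The distinctions above can be illustrated with a minimal Python sketch. All values are made up for illustration, and the `SEVERITY_ORDER` scale is a hypothetical ordinal scale, not one taken from these notes. The sketch shows that nominal data supports counting only, ordinal data also supports ranking, discrete data holds whole-number counts, and continuous data holds measurements.

```python
from collections import Counter

# Nominal: categories have no order, so counting is the meaningful operation.
marital_status = ["single", "married", "single", "widowed", "married", "single"]
status_counts = Counter(marital_status)

# Ordinal: categories have a clear order, stored here as an explicit ranking.
SEVERITY_ORDER = ["mild", "moderate", "severe"]  # hypothetical scale
severity = ["severe", "mild", "moderate", "mild"]
ranked = sorted(severity, key=SEVERITY_ORDER.index)

# Discrete: counts, so only whole numbers make sense.
number_of_children = [0, 2, 1, 3]

# Continuous: measurements that can take any value in a range (units: cm).
height_cm = [158.4, 171.25, 165.0]

print(status_counts)  # frequency of each nominal category
print(ranked)         # ordinal values sorted from lowest to highest rank
```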

_______________________________________________________________________________ O_Ongkay
BINARY VARIABLES / DICHOTOMOUS
Þ Can take on only two possible values (example: biological sex)
Þ For example: Aspirin use may be represented by an indicator
variable that will be equal to 1 if a study participant is using
aspirin and 0 if they are not.
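
As a small sketch of the aspirin example, the snippet below converts yes/no answers into a 0/1 indicator variable. The responses are invented for illustration. One convenient side effect of 0/1 coding is that the mean of the indicator equals the proportion of participants using aspirin.

```python
# Made-up yes/no responses from five study participants.
aspirin_use = ["yes", "no", "yes", "yes", "no"]

# Indicator variable: 1 if the participant uses aspirin, 0 if not.
aspirin_indicator = [1 if answer == "yes" else 0 for answer in aspirin_use]

# The mean of a 0/1 indicator is the proportion of 1s.
proportion_using = sum(aspirin_indicator) / len(aspirin_indicator)

print(aspirin_indicator)
print(proportion_using)
```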

quantitative
INTERVAL
Þ Also known as integer or scaled data
Þ Does not have a true zero
Þ An interval variable is similar to an ordinal variable, except that the intervals between the values of the interval variable are equally spaced.
Þ Is defined as a data type which is measured along a scale that has order and equal intervals.
Þ Examples: temperature in Celsius
RATIO
Þ Has a true zero
Þ It is interval data with a natural zero point. When the variable equals 0.0, there is none of that variable.
Þ Examples: number of objects, distance, time, height, weight, temperature in kelvin (a ratio variable, because 0 K is absolute zero, a true zero point)

INTERVAL SCALE
Þ A scale with values in which there is the same numerical distance between each value.
Þ This scale has an arbitrary zero point (no true, meaningful zero point).
Þ Examples: current temperature, many behavioral science questionnaires, IQ
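
The practical difference between interval and ratio scales can be sketched with temperature. The two readings below are made up, and `celsius_to_kelvin` is a helper written for this illustration. Differences agree on both scales, but a statement like "twice as hot" only makes sense on the kelvin scale, which has a true zero.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to kelvin."""
    return celsius + 273.15

cold_c, warm_c = 10.0, 20.0  # made-up readings in degrees Celsius

# Differences are meaningful on both scales and have the same size.
diff_c = warm_c - cold_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cold_c)

# Ratios are only meaningful on the ratio scale (kelvin).
ratio_c = warm_c / cold_c  # 2.0, but 20 °C is not "twice as hot" as 10 °C
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)  # about 1.035

print(diff_c, diff_k)
print(ratio_c, round(ratio_k, 3))
```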

CATEGORICAL (nominal) | CATEGORICAL (ordinal) | QUANTITATIVE (discrete) | QUANTITATIVE (continuous)
Gender | Final Letter Grade | Income ($) | Age
Diabetes (Y/N) | Satisfaction Rating (1-5) | Number of ER Visits | Blood Pressure (mmHg)
Race/Ethnicity | Highest Degree Earned | Score on 5-question T/F Quiz | Body Mass Index (kg/m²)

RATIONALE:
Þ Understanding the different scales of measurement allows you
to see the different types of data you can gather.
Þ These differences help you determine the kind of statistical
analysis required for your research.

Unit 2: Data Presentation || Lesson Reviewer || 2nd Year, 1st Semester

DATA PRESENTATION
Þ Three methods of data presentation
1. Textual
2. Tabular
3. Graphical

TEXTUAL
Þ The main method of conveying information.
Þ Used to explain results and trends and provide contextual information.
Þ Data are fundamentally presented in paragraphs or sentences.
Þ This is commonly used when the data is not very large.
Þ Detailed information is given in textual presentation.
Þ Narrative report is a way to present data.

TABULAR
Þ Tables are most often appropriate for presenting individual information and can present both quantitative and qualitative information.
Þ The strength of tables is that they can accurately present information that cannot be presented with a graph.
Þ Numerical values are presented using tables.
Þ Information is lost in tabular presentation of data.
Þ Frequency distribution table is also applicable for qualitative variables.

PARTS OF THE TABLE
Title Þ Includes the number and a short description of what is found inside the table.
Column Header Þ Provides the label of what is being presented in a column.
Row Header Þ Provides the label of what is being presented in a row.
Body Þ The information in the cell intersecting the row and the column.

APA TABLE FORMAT
Þ Table number in bold above the table.
Þ Brief title, in italics and title case, below the table number.
Þ No vertical lines.
Þ Horizontal lines only where necessary for clarity.
Þ Clear, concise labels for column and row headings.
Þ Numbers consistently formatted (e.g., with the same number of decimal places).
Þ Any relevant notes below the table.

ADVANTAGES OF A TABULAR PRESENTATION
• Ease of representation:
Þ A large amount of data can be easily confined in a data table. It is the simplest form of data presentation.
• Ease of analysis:
Þ Data tables are frequently used for statistical analysis like calculation of central tendency, dispersion, etc.
• Helps in comparison:
Þ In a data table, the rows and columns which are required to be compared can be placed next to each other. This facilitates comparison, as it becomes easy to compare each value.
• Economical:
Þ Construction of a data table is fairly easy and presents the data in a manner which is really easy on the eyes of a reader. Moreover, it saves time as well as space.
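
As a minimal sketch of a frequency distribution table for a qualitative variable, the snippet below tallies made-up final letter grades and prints them with a column header, row headers, and a body. The data and the column labels are assumptions chosen for illustration.

```python
from collections import Counter

# Made-up final letter grades for ten students (a qualitative variable).
grades = ["A", "B", "B", "C", "A", "B", "D", "C", "B", "A"]

counts = Counter(grades)
n_students = len(grades)

# One row per category: (row header, frequency, percent of total).
rows = [(g, counts[g], 100 * counts[g] / n_students) for g in sorted(counts)]

# Column headers, then the body of the table, then a total row.
print(f"{'Grade':<8}{'Frequency':>10}{'Percent':>10}")
for grade, freq, pct in rows:
    print(f"{grade:<8}{freq:>10}{pct:>10.1f}")
print(f"{'Total':<8}{n_students:>10}{100.0:>10.1f}")
```

Percentages are printed with one decimal place throughout, in line with the guideline that numbers in a table should be consistently formatted.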

GUIDELINES
• Title: should be in accordance with the objectives of the study and should provide a quick insight into the table.
• Comparison: if there is a need to compare any two rows or columns, these should be kept close to each other.
• Alternative location of stubs: if the rows in a data table are lengthy, then the stubs can be placed on the right-hand side of the table.
• Headings: should be written in a singular form. For example, ‘good’ must be used instead of ‘goods.’
• Footnote: should be given only if needed.
• Size of columns: must be uniform and symmetrical.
• Use of abbreviations: headings and sub-headings should be free of abbreviations.
• Units: there should be a clear specification of units above the columns.

APA FIGURE FORMAT
Þ Any images used within your text are called figures. Figures include visualization graphics (e.g., graphs, diagrams, flowcharts) as well as things like photographs and artworks.
Þ Figure number in bold above the figure.
Þ Brief title, in italics and title case, under the figure number.
Þ If necessary, clear labels and legends integrated into the image.
Þ Any relevant notes below the figure.

GRAPHICAL
Þ Trends are easily seen in graphs compared to tables.
Þ It is good to present data using pictures or figures like the pictograph.
Þ It is better to present data using graphs than tables as they are much better to look at.

Bar Graph
Þ Used to display categories of data; it compares the data using solid bars to represent the quantities.
Pie Graph
Þ Shows the relationship of the parts to the whole.
Þ Used to present data as part of one whole.
Line Graph
Þ Also called a linear graph.
Þ For time-series data.
Þ Used to display continuous data and it is useful for predicting future events over time...

GENERAL RULES FOR GRAPHICAL REPRESENTATION OF DATA
• Suitable Title: make sure that an appropriate title is given to the graph which indicates the subject of the presentation.
• Measurement Unit: mention the measurement unit in the graph.
• Proper Scale: to represent the data in an accurate manner, choose a proper scale.
• Index: index the appropriate colors, shades, lines, and designs in the graphs for better understanding.
• Data Sources: include the source of information wherever it is necessary at the bottom of the graph.
• Keep it Simple: construct a graph in an easy way that everyone can understand.
• Neat: choose the correct size, fonts, colors, etc. in such a way that the graph serves as a visual aid for the presentation of information.
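
The ideas behind the bar graph and the pie graph can be sketched in plain text, without a plotting library. The ER visit counts below are invented for illustration. Each bar's length represents a quantity (bar graph), and each percentage is that category's share of the whole (pie graph).

```python
# Made-up number of ER visits per day (unit: visits).
er_visits = {"Mon": 12, "Tue": 7, "Wed": 9}

total_visits = sum(er_visits.values())

# Pie-graph idea: each category as a percentage of the whole.
shares = {day: 100 * n / total_visits for day, n in er_visits.items()}

# Bar-graph idea: one '#' per visit, so bar length represents the quantity.
print("ER visits per day (one # = one visit)")
for day, n in er_visits.items():
    bar = "#" * n
    print(f"{day:<4}{bar:<14}{n:>3}  {shares[day]:5.1f}%")
```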

Unit 3: Central Tendency || Lesson Reviewer || 2nd Year, 1st Semester
CENTRAL TENDENCY
Þ The statistical measure that represents the single value of the entire distribution or dataset.
Þ Is a single value that represents the center point of a dataset. This value can also be referred to as “the central location” of a dataset.
Þ Is the descriptive summary of a dataset.

PURPOSE OF CENTRAL TENDENCY
Þ It aims to provide an accurate description of the entire data in the distribution.

MEASURES OF CENTRAL TENDENCY
Þ Each of these measures finds the central location of a dataset using different methods.
Þ Depending on the type of data you’re analyzing, one of these three measures may be better to use than the other two.

mean
Þ The most commonly used measure of central tendency
Þ Represents the average value of the dataset
Þ It can be calculated as the sum of all the values in the dataset divided by the number of values.
Þ Mean = (sum of all values) ÷ (total # of values)
Þ Can only be used on interval and ratio levels of measurement because it requires equal spacing between adjacent values or scores in the scale.
Þ Best to use when the distribution of the data is fairly symmetrical and there are no outliers

median
Þ Is the middle value in a dataset
Þ Found by arranging all the individual values in a dataset from smallest to largest and finding the middle value.
Þ If there are an odd number of values, the median is the middle value.
Þ If there are an even number of values, the median is the average of the two middle values.
Þ Can only be used on data that can be ordered, that is, data from ordinal, interval, and ratio levels of measurement
Þ Best to use when the distribution of the data is either skewed or there are outliers present.
Þ For example, with skewed salary data, the median does a better job of capturing the “typical” salary of an individual than the mean. This is because the large values on the tail end of a distribution tend to pull the mean away from the center and towards the long tail.
Þ Does a better job capturing the central location of a distribution when there are outliers present in the data.

mode
Þ Is the value that occurs most often in a dataset.
Þ A dataset can have no mode (if no value repeats), one mode, or multiple modes
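
The three measures can be computed with Python's standard `statistics` module, as in the sketch below. The salary figures are made up, with one deliberately extreme value to show how an outlier pulls the mean toward the long tail while the median stays near the typical salary.

```python
import statistics

# Made-up salaries in thousands; 250 is a deliberately extreme value.
salaries = [30, 32, 35, 35, 38, 40, 250]

mean_salary = statistics.mean(salaries)      # sum of values / number of values
median_salary = statistics.median(salaries)  # middle value of the sorted data
modes = statistics.multimode(salaries)       # most frequent value(s)

print(round(mean_salary, 1))  # pulled upward by the outlier
print(median_salary)          # stays near the typical salary
print(modes)

# With an even number of values, the median is the average of the two
# middle values.
even_median = statistics.median([1, 2, 3, 4])
print(even_median)
```

Note that `statistics.multimode` returns every value when none repeats, so the "no mode" case described in these notes has to be recognized by checking whether the most frequent value occurs only once.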


Þ Can be used for any level of measurement, but it is most meaningful for nominal and ordinal levels.
Þ Best to use when working with categorical data.

WHEN TO USE
levels of measurement | examples | measure of central tendency
Nominal | Ethnicity, Political ideology | Mode
Ordinal | Anxiety level, Income bracket | Mode, Median
Interval and Ratio | Reaction time, Test score, Temperature | Mode, Median, Mean

OUTLIERS
Þ An extremely high or extremely low value in the data
Þ We can identify an outlier if it is:
- Greater than Q3 + 1.5 × IQR
- Lower than Q1 – 1.5 × IQR
- where IQR (interquartile range) = Q3 – Q1

MEASURES OF DISPERSION
Þ Refers to how closely the data cluster around the measure of central tendency

MEASURES OF VARIABILITY
Range
Þ The difference between the highest and the lowest scores/values
Þ If the ordinal data are numeric, such as the rank order of students within a graduating class, we can use the range as a measure of dispersion.
Percentiles
Þ Divide data into 100 equal portions
Quartiles
Þ Divide distribution into 4 equal parts
Standard Deviation
Þ Determination of variability of scores (difference) from the mean
Þ Represents how likely a data point is to vary a certain amount from the average in a dataset.
Þ The wider the range of values, the bigger the standard deviation.
Þ Indicates how spread out the data is.

KURTOSIS
Þ “Peakedness”
Þ Normal – bell-shaped curve
- Mean, median, and mode are the same
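
The measures of variability and the outlier rule can be sketched with the standard `statistics` module. The test scores are made up. Two choices here are assumptions, since these notes do not specify them: quartiles use the "inclusive" method of `statistics.quantiles` (other quartile conventions give slightly different cut points), and the standard deviation is the sample version (`statistics.stdev`).

```python
import statistics

# Made-up test scores; 110 is a deliberately extreme value.
scores = [52, 55, 58, 60, 61, 63, 65, 67, 70, 110]

# Range: highest value minus lowest value.
data_range = max(scores) - min(scores)

# Quartiles: three cut points dividing the distribution into 4 equal parts.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

# Percentiles: 99 cut points dividing the data into 100 equal portions.
percentile_cuts = statistics.quantiles(scores, n=100, method="inclusive")

# Outlier rule: below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in scores if x < lower_fence or x > upper_fence]

# Sample standard deviation: how spread out the scores are around the mean.
sd = statistics.stdev(scores)

print(data_range, q1, q2, q3, iqr)
print(lower_fence, upper_fence, outliers)
print(round(sd, 2))
```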


SKEWNESS
Þ Asymmetry in the shape of a distribution

LINKS
Þ https://www.youtube.com/watch?v=STSP8gTSdT8
Þ https://youtu.be/09Cx7xuIXig
Þ https://www.statology.org/measures-central-tendency/
Þ https://byjus.com/maths/central-tendency/
Þ https://www.scribbr.com/statistics/central-tendency/
