Plan for Data Processing and Analysis
KNGMANY CHALEUNVONG
GFMER - WHO - UNFPA - LAO PDR
TRAINING COURSE IN REPRODUCTIVE HEALTH RESEARCH
VIENTIANE, 25 SEPTEMBER 2009
Why plan for data processing & analysis?
2
To ensure that:
All the information needed has really been collected, and in a standardized way.
We have not collected unnecessary data that will never be analyzed.
What Is Data Processing?
3
Process of data processing
Data from your study
4
Data come from questionnaires, case record forms (CRF), patients’ records, patients’ charts, etc.
They are keyed into database or data-entry software such as Excel, FileMaker Pro, Microsoft Access, or EpiData.
Data processing & analysis
Data quality audit and control
Coding
Data order
Data processing
Data analysis
Audit and control of data quality
6
Training of data collectors
Validity of data recording
Consistency checks
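For teams that script their checks, a minimal illustrative sketch in Python is shown below; the file name, column names and plausible ranges are assumptions for illustration, not part of the course material.

    import pandas as pd

    # Hypothetical file and column names, used only to illustrate the idea.
    df = pd.read_csv("survey_data.csv")

    # Range check: flag implausible ages for a reproductive health survey.
    out_of_range = df[(df["age"] < 15) | (df["age"] > 49)]

    # Cross-field consistency check: number of pregnancies should not
    # exceed the number of years since age 15 (a deliberately simple rule).
    inconsistent = df[df["pregnancies"] > (df["age"] - 15)]

    print(len(out_of_range), "records with out-of-range age")
    print(len(inconsistent), "records failing the pregnancy/age check")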
Coding
7
We can run statistical models.
Our computer programs will understand the variables.
Accountability: we can run models “blind”, i.e. without knowing what the variables stand for, to reduce programming / author bias.
Be consistent in your coding.
Know what you are coding!
When in doubt, have someone else code a sample of your data and check the level of consistency.
Keep track of what you do! Use a codebook!
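A codebook can be as simple as a list that maps each numeric code to its meaning. Below is a minimal sketch in Python; the variable names and codes are invented for illustration.

    # Tiny illustrative codebook: variable -> {numeric code: meaning}.
    codebook = {
        "sex":   {1: "male", 2: "female"},
        "smoke": {0: "no", 1: "yes", 9: "missing"},
        "educ":  {1: "primary", 2: "secondary", 3: "tertiary"},
    }

    def decode(variable, code):
        """Translate a stored numeric code back into its label."""
        return codebook[variable].get(code, "unknown code")

    print(decode("smoke", 1))  # -> yes
    print(decode("educ", 9))   # -> unknown code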
Data processing
8
Data entry
Double data entry
Validation
Exploratory data analysis
Cross tabulation
Transformation
Transfer data
EpiData Entry
9
Use EpiData when you have collected data on paper and you want to do statistical analyses or tabulation of the data. The data may come from questionnaires or any other kind of paper-based information. EpiData Entry is not made for analysis.
http://www.epidata.dk/downloads/epitour.pdf
EpiData Entry…
10
Controlled data entry:
EpiData will only allow the user to enter data that meet certain criteria.
Double Entry of Data
Enter data separately in two different files and compare them
afterwards.
Data validation
Compare the two files and then check the discordances
against the original paper copy and correct the errors.
Data cleaning and verification
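EpiData performs this comparison itself; purely to illustrate the idea of double entry and validation, the sketch below compares two independently keyed files in Python and lists the discordant cells (file and column names are assumptions).

    import pandas as pd

    # Two independently keyed versions of the same forms (hypothetical files).
    first = pd.read_csv("entry1.csv").set_index("record_id").sort_index()
    second = pd.read_csv("entry2.csv").set_index("record_id").sort_index()

    # compare() keeps only the cells where the two entries disagree;
    # each discordance is then checked against the original paper form.
    # (Both files must contain the same records and columns.)
    discordant = first.compare(second)
    print(discordant)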
EpiData Entry…
11
Have three files:
.qes (the questionnaire / data-entry form definition)
.rec (the data records)
.chk (the checks applied during data entry)
From EpiData, export the data to statistical programs for analysis (SPSS, Stata, Epi Info, etc.).
Make your own codes for each variable and each category, and keep a separate sheet explaining each code (very important!!!)
Data analysis
12
Please consult a statistician or a statistics textbook if you are not confident in your own data analysis capacity.
Any statistical program can be used: SPSS, Stata, Epi Info, R, etc.
Digest your study objectives before doing the analysis.
Use simple statistics first!
Data analysis…
13
Before applying any tests, check whether the data are normally distributed (e.g. using a histogram).
[Figure: histograms of a normal distribution (bell shape) and a not normal distribution (skewed shape)]
CV = (SD / mean) × 100; a CV of about 25% or less suggests an approximately normal distribution.
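For those working in a scripting package, a rough Python sketch of this check (histogram plus the coefficient-of-variation rule above) could look like this; the file and variable names are assumptions.

    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("survey_data.csv")    # hypothetical file
    values = df["haemoglobin"].dropna()    # hypothetical continuous variable

    # Visual check: bell-shaped vs. skewed histogram.
    values.hist(bins=20)
    plt.xlabel("haemoglobin")
    plt.ylabel("frequency")
    plt.show()

    # Coefficient of variation as on the slide: CV = SD / mean * 100.
    cv = values.std() / values.mean() * 100
    print(f"CV = {cv:.1f}% (about 25% or less suggests a roughly normal distribution)")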
Data analysis…
14
Descriptive statistics:
+ Proportion/frequency/percentage
+ Mean (SD) or Mean (95%CI)
+ Median (range)
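As an illustration only, these summaries could be computed in Python as follows; the file and column names (smoke, age) are invented.

    import pandas as pd

    df = pd.read_csv("survey_data.csv")    # hypothetical file

    # Proportion / frequency / percentage of a categorical variable.
    print(df["smoke"].value_counts())
    print(df["smoke"].value_counts(normalize=True) * 100)

    # Mean (SD) and an approximate 95% CI of a continuous variable.
    age = df["age"].dropna()
    mean, sd, n = age.mean(), age.std(), len(age)
    se = sd / n ** 0.5
    print(f"Mean {mean:.1f} (SD {sd:.1f}), 95% CI {mean - 1.96 * se:.1f} to {mean + 1.96 * se:.1f}")

    # Median (range).
    print(f"Median {age.median():.1f} (range {age.min():.0f}-{age.max():.0f})")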
Data analysis…
15
Comparisons or inferential statistics:
+ Z-test (compare proportions in one or two groups)
+ Chi-square test (association between categorical variables)
+ Student's t-test (compare means in one or two groups)
+ Mann-Whitney U test (non-parametric test)
+ ANOVA (compare means across more than two groups)
+ ANCOVA (compare means while adjusting for covariates)
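Purely as an illustration, two of these tests are run below on invented counts in Python (statsmodels and scipy); none of the numbers come from a real study.

    from scipy.stats import chi2_contingency
    from statsmodels.stats.proportion import proportions_ztest

    # Invented example: diseased out of total in two exposure groups.
    diseased = [30, 18]
    totals = [100, 110]

    # Z-test comparing the two proportions.
    z, p_z = proportions_ztest(count=diseased, nobs=totals)
    print(f"z = {z:.2f}, p = {p_z:.3f}")

    # Chi-square test on the corresponding 2x2 table.
    table = [[30, 70], [18, 92]]
    chi2, p_chi, dof, expected = chi2_contingency(table)
    print(f"chi-square = {chi2:.2f}, p = {p_chi:.3f}")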
Data analysis…
16
+ Correlation (two continuous variables)
+ Linear regression (continuous outcome variable)
+ Logistic regression (categorical outcome variable)
+ Log-linear model (counts in contingency tables)
+ Poisson regression (count outcome variable)
+ GEE (generalized estimating equations)
+ GLM (generalized linear models)
+ Survival analysis (Cox proportional hazards model)
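A minimal sketch of fitting two of these models with statsmodels in Python is shown below; the file, variable names and model formulas are assumptions for illustration.

    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("survey_data.csv")    # hypothetical file

    # Logistic regression: binary outcome (disease yes/no coded 1/0).
    logit_model = smf.logit("disease ~ smoke + age", data=df).fit()
    print(logit_model.summary())

    # Poisson regression: count outcome (e.g. number of clinic visits).
    poisson_model = smf.poisson("visits ~ smoke + age", data=df).fit()
    print(poisson_model.summary())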
Data analysis…
17
Normally distributed data:
+ Mean (SD) or mean (95% CI)
+ Comparison:
- Paired or unpaired t-test
- Chi-square test
- ANOVA
- MANOVA
- ANCOVA
- Correlation
- Regression
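As a sketch only, these parametric comparisons can be run in Python with scipy; the numbers below are invented.

    from scipy.stats import ttest_rel, ttest_ind, f_oneway

    before = [11.2, 12.1, 10.8, 13.0, 12.5]
    after = [12.0, 12.8, 11.5, 13.4, 13.1]
    group_a = [10.9, 11.4, 12.2, 11.8]
    group_b = [12.6, 13.1, 12.9, 13.5]
    group_c = [11.0, 11.7, 12.4, 12.0]

    print(ttest_rel(before, after))             # paired t-test (same subjects twice)
    print(ttest_ind(group_a, group_b))          # unpaired (independent samples) t-test
    print(f_oneway(group_a, group_b, group_c))  # one-way ANOVA, more than two groups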
Data analysis…
18
Non-normally distributed data:
+ Median (range)
+ Comparison:
- Wilcoxon signed-rank test (matched pairs)
- Wilcoxon rank-sum test (Mann-Whitney U)
- Fisher's exact test
- Kruskal-Wallis test
- Spearman's and Kendall's correlation
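A comparable sketch for the non-parametric tests, again in Python with scipy and invented numbers:

    from scipy.stats import wilcoxon, mannwhitneyu, kruskal, spearmanr, fisher_exact

    before = [3.1, 5.4, 4.2, 6.3, 2.8, 5.0]
    after = [4.0, 6.9, 4.9, 7.7, 3.6, 6.2]
    group_a = [2.1, 3.4, 5.0, 4.2]
    group_b = [6.3, 7.1, 5.8, 8.0]
    group_c = [3.3, 4.4, 4.1, 5.2]

    print(wilcoxon(before, after))             # Wilcoxon signed-rank test (paired)
    print(mannwhitneyu(group_a, group_b))      # Wilcoxon rank-sum / Mann-Whitney U
    print(kruskal(group_a, group_b, group_c))  # Kruskal-Wallis test
    print(spearmanr(before, after))            # Spearman's rank correlation
    print(fisher_exact([[8, 2], [1, 5]]))      # Fisher's exact test on a 2x2 table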
Technique
19
Outline of research writing and plan for data analysis
Coding manual
Data collection manual
Research Outline
20
1. Title page
2. Summary
3. INTRODUCTION
3.1 Background
3.2 Research problem
3.3 Objectives
3.4 Literature review
4. METHODOLOGY
4.1 Study design
4.2 Population and sample
4.3 Sample size
4.4 Variables
4.5 Data processing and analysis
4.6 Ethics
4.7 Pre-test
5. PROJECT MANAGEMENT
6. BUDGET
7. REFERENCES
8. APPENDICES
8.1 Questionnaire form
8.2 Plan for data analysis
Dummy table for research question
24
Factor          n    % disease   Crude Odds Ratio   Adjusted Odds Ratio   p-value
Smoke
  Yes
  No
  Total
Alcohol use
  Yes
  No
  Total
Age
  < 20 yrs
  20-29 yrs
  >= 30 yrs
  Total
Overweight
  Yes
  No
  Total
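One way to fill a table like this is with logistic regression: the crude odds ratio comes from a model containing the single factor, and the adjusted odds ratio from a model containing all the factors together. Below is a hedged Python sketch with statsmodels; the file and variable names are invented, and the exact formula would follow your own dummy table.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("survey_data.csv")    # hypothetical file with 0/1 coded factors

    # Crude odds ratio for smoking: logistic model with smoking only.
    crude = smf.logit("disease ~ smoke", data=df).fit()

    # Adjusted odds ratio: logistic model with all factors from the dummy table.
    adjusted = smf.logit("disease ~ smoke + alcohol + age_group + overweight", data=df).fit()

    print("Crude OR (smoke):   ", np.exp(crude.params["smoke"]))
    print("Adjusted OR (smoke):", np.exp(adjusted.params["smoke"]))
    print("p-value (adjusted): ", adjusted.pvalues["smoke"])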