Research Methodology: Data Collection, Analysis and Interpretation
MODULE 3: PART 1
Data Collection, Analysis and Interpretation
• Preliminary Understanding
• Data Collection Methods
• Sampling designs
• Connecting RD & SD to RO and RH
• Measurement Scales
• Parametric & Non-Parametric Testing
• Univariate, Bivariate & Multivariate
• Adoption of Appropriate Statistical Methods
• Analysis & Modelling and Interpretation of results
Research Defined
Reduces
Uncertainty
Three Types of Research
[Figure: flowchart of the research process]
Problem definition (statement of research objectives)
Selection of exploratory research technique: secondary (historical) data, experience survey, pilot study, case study
Selection of basic research method: experiment (laboratory, field), survey (interview, questionnaire), observation, secondary data study
Sampling: probability, nonprobability
Collection of data (fieldwork)
Editing and coding data
Data processing and analysis
Interpretation of findings
Conclusions and report
Types of Data and Measurement Scales
Data:
Non-metric or Qualitative
Metric or Quantitative
Types of Variables
Dichotomous/Nominal: Male/Female; Engineering/non-engineering
Discrete/Ranking: Engineering background; Educational level
Continuous/Ratio: Production units; Costs
Description of HBAT Primary Database Variables
MODULE 3: PART 2
Data Collection, Analysis and Interpretation
Master plan
Framework for action
Specifies methods and procedures
Basic Research Design- Data Collection
Surveys
Experiments
Secondary data (e.g. COVID-19 statistics, chemical and biomolecular statistics)
Observation
The Major Decisions in Questionnaire Design
Questionnaire relevance
Questionnaire accuracy
Phrasing Questions
Open-ended questions
Fixed-alternative questions
Sampling Design
POPULATION
Sampling
Who is to be sampled?
How large a sample?
How will sample units be selected?
Two Major Categories of Sampling
Probability sampling
• Known, nonzero probability of selection for every element
Non-probability sampling
• Probability of selecting any particular member is unknown
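The contrast between the two categories can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed procedure: the sampling frame of 200 units, the sample size of 20 and the function names are all hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Probability sampling: draw n distinct units without replacement."""
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    return rng.sample(frame, n)

def inclusion_probability(frame_size, n):
    """Under simple random sampling every unit has the same known, nonzero chance."""
    return n / frame_size

# Hypothetical sampling frame of 200 units (an assumption for illustration).
frame = [f"unit_{i:03d}" for i in range(1, 201)]

srs = simple_random_sample(frame, n=20, seed=1)
p = inclusion_probability(len(frame), 20)

# Convenience sampling: take whichever units are easiest to reach (here, the
# first 20 listed). The selection probability of any given member is unknown.
convenience = frame[:20]

print(len(srs), p)
```

With a probability design the inclusion probability can be stated before fieldwork begins; with the convenience sample no such statement is possible.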
Nonprobability Sampling
Convenience
Judgment
Quota
Snowball
Probability Sampling
Plan procedure for selecting sampling units
Conduct fieldwork
Rationalising the Research Work!
RQs → ROs → RH(s)
Univariate: trend, dispersion, fluctuation within the variable
Bivariate: difference, relationship, association, causal
Multivariate: relationship, association, causal, modelling
Illustration
RQ1: Has the O&G industry fluctuated highly over the past 5 years? RO? RH?
RO: To examine whether the O&G industry has fluctuated highly over the past 5 years.
RH: H0: The O&G industry has not fluctuated significantly over the past 5 years.
Scenario 2:
RQ2 (#1): Is there any difference between the energy sector and the finance sector with regard to productivity? RO? RH?
RO: This study investigates whether there is a difference between the energy sector and the finance sector with regard to productivity.
RH: H0: There is no significant difference between the energy and finance sectors with regard to productivity level.
H1: There is a significant difference between the energy and finance sectors with regard to productivity level.
Scenario 2-additional:
RQ2 (#2): Are there any differences among the business sectors (more than 2 sectors) with regard to performance level? RO? RH?
RO: To examine whether there are differences among the business sectors with regard to performance level.
H0: There are no significant differences among the business sectors with regard to performance level.
H1: There are significant differences among the business sectors with regard to performance level.
(Alternative RO: This study investigates whether there is a relationship between the business sectors and performance level.)
H0: There is no significant relationship between the business sectors and performance level.
H1: There is a significant relationship between the business sectors and performance level.
Scenario 3:
RQ3: What are the factors that affect the performance of the energy sector? RO? RH?
RO: This study investigates the factors that affect the performance of the energy sector in Malaysia.
H0: The factors do NOT affect the performance of the energy sector in Malaysia.
H1: The factors do affect the performance of the energy sector in Malaysia.
Significance level (0.10? 0.05? 0.01?): the margin for error accepted in statistical testing, i.e. the risk of rejecting H0 when it is in fact true.
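The effect of the chosen significance level on the decision can be sketched as below. The helper name `decide` is hypothetical; the p-value is the one reported in the Excel ANOVA output of this module, and the rule "reject H0 when p-value ≤ alpha" is the usual convention.

```python
def decide(p_value, alpha):
    """Reject H0 when the p-value is at or below the chosen significance level."""
    return "reject H0" if p_value <= alpha else "do not reject H0"

p_value = 0.081071382  # P-value from the Excel ANOVA output in this module

# Same evidence, three different margins for error.
decisions = {alpha: decide(p_value, alpha) for alpha in (0.10, 0.05, 0.01)}
for alpha, decision in decisions.items():
    print(f"alpha = {alpha:.2f}: {decision}")
```

The same result is significant at the 0.10 level but not at 0.05 or 0.01, which is why the level should be fixed before the test is run.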
RESEARCH METHODOLOGY
MODULE 3: PART 3
Data Collection, Analysis and Interpretation
Coding required
Data Entry
Data Cleaning
Results with One Variable
Correlational study
Relationship
Causal
Statistical Techniques
Open
Hypothesis Testing: Wording of Final Conclusion
Small p-value: H0 is rejected; Ha is accepted, and I have a p-value chance of being wrong
Large p-value: H0 is not rejected; Ha is not supported
Discussions on Statistical Output:
T-Test Results:
Examples
ATTACH1-SIMPLE T TEST.docx
ATTACH2-PAIRED-IND-ANOVA TEST.docx
Simple T-Test: 2296 obs
One-Sample Statistics
One-Sample Test (Test Value = 60)
Variable  t         df    Sig. (2-tailed)  Mean Difference  95% CI of the Difference (Lower)  95% CI of the Difference (Upper)
crudeoil  -119.569  2295  .000             -30.82723        -31.3328                          -30.3216
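How the figures in a one-sample t-test table hang together can be sketched as follows. The function `one_sample_t` and the six toy readings are hypothetical; the reported values are copied from the One-Sample Test output above. The confidence interval uses a normal approximation to the t distribution, which is an assumption that is reasonable only because df = 2295 is large.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def one_sample_t(data, test_value):
    """Return (t, df) for H0: population mean equals test_value."""
    n = len(data)
    standard_error = stdev(data) / sqrt(n)
    return (mean(data) - test_value) / standard_error, n - 1

# Hypothetical readings, used only to exercise the function.
t_toy, df_toy = one_sample_t([58, 61, 59, 63, 60, 57], test_value=60)

# Consistency check on the reported output (values copied from the table).
t_reported, mean_diff = -119.569, -30.82723
standard_error = mean_diff / t_reported   # because t = mean difference / SE
sample_mean = 60 + mean_diff              # test value + mean difference
z = NormalDist().inv_cdf(0.975)           # normal approximation for df = 2295
ci = (mean_diff - z * standard_error, mean_diff + z * standard_error)

print(round(standard_error, 4), round(sample_mean, 5), ci)
```

The recovered interval agrees with the reported bounds to about three decimal places; the small gap comes from using the normal quantile in place of the exact t quantile.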
PAIRED T-TEST
Independent T-Test
1-way ANOVA
t-Test-excel
Excel output
t-Test: Two-Sample Assuming Unequal Variances
SUMMARY
Groups    Count  Sum  Average      Variance
Alone     10     637  63.7         87.56666667
WithPass  12     683  56.91666667  63.53787879
ANOVA
Source of Variation  SS           df  MS           F           P-value      F crit
Between Groups       250.9833333  1   250.9833333  3.37566268  0.081071382  4.351250027
Within Groups        1487.016667  20  74.35083333
Total                1738         21
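The ANOVA entries above can be rebuilt from the SUMMARY block alone. The sketch below is a minimal illustration under the standard one-way ANOVA decomposition; the function name `one_way_anova` and the dictionary keys are hypothetical, and the counts, sums and variances are copied from the Excel output.

```python
def one_way_anova(groups):
    """groups: list of (count, mean, variance) tuples, one per group."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    # Between groups: spread of the group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within groups: pooled spread inside each group.
    ss_within = sum((n - 1) * v for n, _, v in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "df_between": df_between,
        "ss_within": ss_within, "df_within": df_within,
        "ms_within": ms_within, "F": ms_between / ms_within,
    }

# (count, mean = sum / count, variance) taken from the SUMMARY block.
summary = [(10, 637 / 10, 87.56666667), (12, 683 / 12, 63.53787879)]
table = one_way_anova(summary)
print(table)
```

Because F (about 3.38) is below F crit (about 4.35), and the P-value is above 0.05, H0 of equal group means is not rejected at the 0.05 level.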
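[Figure: choosing between parametric and non-parametric tests]
Using MEAN as the parameter of hypothesis: 1-sample t, 2-sample t
Using MEDIAN as the parameter of hypothesis: Sign test, Mann-Whitney, Kruskal-Wallis
Indications of non-normality, which favour the median-based tests:
• Normality test p-value ≤ 0.05
• (|Mean - Median| / Mean) × 100 > 10%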
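The selection logic can be sketched as a small helper. Several points here are assumptions rather than part of the slide: the function names, the toy data, treating either indication on its own as enough to switch to the median-based test, and pairing Kruskal-Wallis with 1-way ANOVA for three or more groups. The normality p-value is taken as given from whatever normality test was run.

```python
from statistics import mean, median

def mean_median_gap_pct(data):
    """(|Mean - Median| / Mean) * 100, the rule-of-thumb skewness check."""
    m = mean(data)
    return abs(m - median(data)) / abs(m) * 100

def choose_test(n_groups, normality_p, gap_pct):
    """Pick a mean-based or median-based test for 1, 2 or more groups."""
    # Assumption: either indication alone triggers the non-parametric route.
    use_median = normality_p <= 0.05 or gap_pct > 10
    parametric = {1: "1-sample t", 2: "2-sample t"}
    nonparametric = {1: "Sign test", 2: "Mann-Whitney"}
    if use_median:
        return nonparametric.get(n_groups, "Kruskal-Wallis")
    return parametric.get(n_groups, "1-way ANOVA")

# Hypothetical data with one extreme value that pulls the mean upward.
gap = mean_median_gap_pct([10, 12, 11, 13, 50])

skewed_choice = choose_test(2, normality_p=0.20, gap_pct=gap)
normal_choice = choose_test(3, normality_p=0.40, gap_pct=2.0)
print(round(gap, 1), skewed_choice, normal_choice)
```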
Correlation & Regression
• Measure the degree of linear association between two variables using graphs and statistics
• Measure the degree of linear association between two or more variables using correlation
• Model the relationship between a continuous response variable and one or more predictor variables
• Determine the strength of the relationship between a continuous response variable and one or more predictor variables
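As a minimal sketch of the second objective, the Pearson correlation coefficient can be computed directly from its definition. The function name `pearson_r` and the five paired observations are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired observations (not taken from the course data).
hours = [1, 2, 3, 4, 5]
productivity = [2, 4, 5, 4, 5]

r = pearson_r(hours, productivity)
r_perfect = pearson_r(hours, [2 * h + 1 for h in hours])  # exact straight line
print(round(r, 4), round(r_perfect, 4))
```

A value near +1 or -1 indicates a strong linear association, and a value near 0 indicates little linear association; the coefficient says nothing on its own about cause and effect.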
What is Correlation?
Correlation is…
[Figure: scatter plots of Output illustrating different degrees of correlation]
Correlation
Correlations
PRODUCTIVITY HOURS
PRODUCTIVITY Pearson Correlation 1 .623**
N 30 30
N 30 30
V1 V2 V3 V4 V5 V6 V7 V8 V9
V1 Price Level 1.00
V2 Store Personnel .427 1.00
V3 Return Policy .302 .771 1.00
V4 Product Availability .470 .497 .427 1.00
V5 Product Quality .765 .406 .307 .472 1.00
V6 Assortment Depth .281 .445 .423 .713 .325 1.00
V7 Assortment Width .354 .490 .471 .719 .378 .724 1.00
V8 In-Store Service .242 .719 .733 .428 .240 .311 .435 1.00
V9 Store Atmosphere .372 .737 .774 .479 .326 .429 .466 .710 1.00
What is simple regression?
ATTACH3-BIVARIATE CORRELATION-REGRESSION.docx
Simple regression…
Examines the relationship between a continuous response variable (Y) and a predictor variable (X)
Direct cause-and-effect relationship: a higher propane % increases fuel gas density.
Reverse cause-and-effect relationship: bad grades may be caused by absences, BUT one should not fail to consider that bad grades may cause absences.
Due to chance or coincidence: suppose that a high correlation is observed between the weekly sales of hot chocolate and the number of skiing accidents.
Coincidence: the coincidence may be caused by a 3rd variable, which is WINTER MONTHS!
Results with More than two Variables: The Elaboration Model
Variables Entered/Removed (a)
Model  Variables Entered                  Variables Removed  Method
1      environ, method, man, machine (b)  .                  Enter
a. Dependent Variable: prod
b. All requested variables entered.
Model Summary (b)
ANOVA (a)
Model          Sum of Squares  df  Mean Square  F        Sig.
1  Regression  584669.492      4   146167.373   528.979  .000 (b)
   Residual    11881.758       43  276.320
   Total       596551.250      47
a. Dependent Variable: prod
b. Predictors: (Constant), environ, method, man, machine
Coefficients (a)
Model          B (Unstandardized)  Std. Error (Unstandardized)  Beta (Standardized)  t       Sig.
1  (Constant)  -70.981             11.821                                            -6.005  .000
   machine     1.177               .469                         .153                 2.512   .016
   man         .630                .157                         .122                 4.004   .000
   method      5.147               2.121                        .118                 2.426   .020
   environ     .707                .076                         .709                 9.315   .000
a. Dependent Variable: prod
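The ANOVA table carries enough information to check the F statistic and to derive the usual goodness-of-fit measures. The sketch below uses only the sums of squares and degrees of freedom printed above; the R-squared and adjusted R-squared values are computed here from the standard formulas and are not copied from the source, whose Model Summary table is not reproduced. Variable names are hypothetical.

```python
# Values copied from the ANOVA table (dependent variable: prod).
ss_regression, df_regression = 584669.492, 4
ss_residual, df_residual = 11881.758, 43
ss_total = ss_regression + ss_residual

# F is the ratio of the two mean squares.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual

# Share of the variation in prod explained by the four predictors.
r_squared = ss_regression / ss_total

# Number of observations implied by the degrees of freedom.
n = df_regression + df_residual + 1

# Adjusted R-squared penalises the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_residual

print(n, round(f_stat, 3), round(r_squared, 4), round(adj_r_squared, 4))
```

A large F with Sig. below the chosen significance level indicates that the predictors, taken together, explain a significant share of the variation; the individual t tests in the Coefficients table then show which predictors contribute.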
ANOVA
            df  SS           MS          F         Significance F
Regression  1   129173.1279  129173.128  15.43583  0.00772299
Residual    6   50210.37209  8368.39535
Total       7   179383.5
           Coefficients  Standard Error  t Stat      P-value     Lower 95%    Upper 95%
Intercept  44.3139535    108.5086985     0.40839079  0.69716178  -221.197461  309.825368
Years      38.755814     9.864427133     3.92884589  0.00772299  14.6184126   62.8932153
Note that: (1) both t and F have the same p-value, and (2) t² = F.
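The two notes can be checked numerically from the printed output. This is a small verification sketch with hypothetical variable names; every number is copied from the ANOVA and coefficient tables for the Years predictor.

```python
# Slope estimate and its standard error for the predictor Years.
coefficient, standard_error = 38.755814, 9.864427133

# t statistic for the slope: estimate divided by its standard error.
t_stat = coefficient / standard_error

# F from the ANOVA table: regression mean square over residual mean square.
f_from_ms = 129173.128 / 8368.39535

# With a single predictor, squaring t reproduces F.
f_from_t = t_stat ** 2

print(round(t_stat, 5), round(f_from_t, 5), round(f_from_ms, 5))
```

This identity holds only in simple regression, where the F test and the slope's t test examine the same hypothesis; with several predictors the overall F and the individual t tests answer different questions.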