
Research Methodology: Data Collection, Analysis and Interpretation

Uploaded by

Anh Nhat Nguyen

RESEARCH METHODOLOGY

MODULE 3: PART 1
Data Collection, Analysis and Interpretation

Assoc Prof Dr. Maran Marimuthu


8-10 May 2020
FOCUS OF MODULE 3

• Preliminary Understanding
• Data Collection Methods
• Sampling designs
• Connecting RD & SD to RO and RH
• Measurement Scales
• Parametric & Non-Parametric Testing
• Univariate, Bivariate & Multivariate
• Adoption of Appropriate Statistical Methods
• Analysis & Modelling and Interpretation of results
Research Defined

Research is defined as the systematic and objective process of generating information to aid in making decisions.
• Basic or Applied?

Information Reduces Uncertainty
"I don't know if we should buy a new product."
Three Types of Research

• Exploratory research is done via case studies and learning the names of variables.
• Descriptive research is done via surveys and learning how variables are distributed.
• Causal research is done via experiments and learning how variables can be explained.
Determining When to Conduct Research

1. Time Constraints: Is sufficient time available before a managerial decision must be made?
2. Availability of Data: Is the information already on hand inadequate for making the decision?
3. Nature of the Decision: Is the decision of considerable strategic or tactical importance?
4. Benefits vs. Costs: Does the value of the research information exceed the cost of conducting research?

Yes to all four questions: Conduct Research
No to any question: Do Not Conduct Research
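The decision chain can be sketched in a few lines of Python. This is only an illustration and not part of the original slides; the function name and parameter names are hypothetical.

```python
def should_conduct_research(time_available: bool,
                            info_on_hand_inadequate: bool,
                            decision_is_important: bool,
                            value_exceeds_cost: bool) -> bool:
    """Return True only when all four questions are answered Yes."""
    answers = [time_available, info_on_hand_inadequate,
               decision_is_important, value_exceeds_cost]
    return all(answers)


# All four answers are Yes, so research is conducted.
print(should_conduct_research(True, True, True, True))
# The information already on hand is adequate, so research is not conducted.
print(should_conduct_research(True, False, True, True))
```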


Stages of the research process (flowchart):

1. Problem Discovery and Definition
   • Problem discovery
   • Selection of exploratory research technique: secondary (historical) data, experience survey, pilot study, case study
   • Problem definition (statement of research objectives)
2. Research Design
   • Selection of basic research method: survey (interview, questionnaire), experiment (laboratory, field), secondary data study, observation
3. Sampling
   • Probability or nonprobability
4. Data Gathering
   • Collection of data (fieldwork)
5. Data Processing and Analysis
   • Editing and coding data
   • Data processing
6. Conclusions and Report
   • Interpretation of findings
   • Report
Types of Data and Measurement Scales

Data:
• Non-metric or Qualitative: Nominal Scale, Ordinal Scale
• Metric or Quantitative: Interval Scale, Ratio Scale
Types of Variables

• Dichotomous/Nominal: Male/Female; Engineering/non-engineering
• Discrete/Ranking: Engineering background; Educational level
• Continuous/Ratio: Production units; Costs
Description of HBAT Primary Database Variables

Variable Description Variable Type


Data Warehouse Classification Variables
X1 Customer Type nonmetric
X2 Industry Type nonmetric
X3 Firm Size nonmetric
X4 Region nonmetric
X5 Distribution System nonmetric
Performance Perceptions Variables
X6 Product Quality metric
X7 E-Commerce Activities/Website metric
X8 Technical Support metric
X9 Complaint Resolution metric
X10 Advertising metric
X11 Product Line metric
X12 Salesforce Image metric
X13 Competitive Pricing metric
X14 Warranty & Claims metric
X15 New Products metric
X16 Ordering & Billing metric
X17 Price Flexibility metric
X18 Delivery Speed metric
Outcome/Relationship Measures
X19 Satisfaction metric
X20 Likelihood of Recommendation metric
X21 Likelihood of Future Purchase metric
X22 Current Purchase/Usage Level metric
X23 Consider Strategic Alliance/Partnership in Future nonmetric
RESEARCH METHODOLOGY

MODULE 3: PART 2
Data Collection, Analysis and Interpretation

Assoc Prof Dr. Maran Marimuthu


8-10 May 2020
Research Design

• Master plan
• Framework for action
• Specifies methods and procedures
Basic Research Design - Data Collection

• Surveys
• Experiments
• Secondary data (e.g., COVID-19 statistics, chemical and biomolecular statistics)
• Observation
The Major Decisions in Questionnaire Design

1. What should be asked?
2. How should each question be phrased?
3. Sequence?
4. Meeting the ROs?
5. Pretested?
What Should Be Asked?

• Questionnaire relevance
• Questionnaire accuracy
Phrasing Questions

• Open-ended questions
• Fixed-alternative questions
Sampling Design

Sample: a subset of a larger population.

[Diagram: SAMPLE shown as a subset within POPULATION]
Sampling

• Who is to be sampled?
• How large a sample?
• How will sample units be selected?
Two Major Categories of Sampling

• Probability sampling: known, nonzero probability of selection for every element
• Non-probability sampling: probability of selecting any particular member is unknown
Nonprobability Sampling

• Convenience
• Judgment
• Quota
• Snowball
Probability Sampling

• Simple random sample
• Systematic sample
• Stratified sample
• Cluster sample
Stages in the Selection of a Sample

1. Define the target population
2. Select a sampling frame
3. Determine if a probability or nonprobability sampling method will be chosen
4. Plan procedure for selecting sampling units
5. Determine sample size
6. Select actual sampling units
7. Conduct fieldwork
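As a rough illustration of the step "select actual sampling units", the four probability sampling designs can be sketched with Python's standard library. The sampling frame, the strata, the clusters and all function names below are hypothetical; real studies usually need more care (for example, frame sizes that do not divide evenly).

```python
import random


def simple_random_sample(frame, n, rng):
    """Every unit in the frame has the same chance of selection."""
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    """Pick a random start, then take every k-th unit."""
    k = len(frame) // n              # sampling interval
    start = rng.randrange(k)         # random start within the first interval
    return [frame[start + i * k] for i in range(n)]


def stratified_sample(strata, fraction, rng):
    """Draw the same fraction at random from every stratum."""
    sample = []
    for units in strata.values():
        size = max(1, round(len(units) * fraction))
        sample.extend(rng.sample(units, size))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """Select whole clusters at random and keep every unit in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]


rng = random.Random(42)                    # fixed seed for repeatability
frame = list(range(1, 101))                # hypothetical frame of 100 unit IDs
strata = {"energy": frame[:60], "finance": frame[60:]}
clusters = {f"region_{i}": frame[i * 20:(i + 1) * 20] for i in range(5)}

srs = simple_random_sample(frame, 10, rng)
sys_sample = systematic_sample(frame, 10, rng)
strat = stratified_sample(strata, 0.10, rng)
clus = cluster_sample(clusters, 2, rng)
print(srs, sys_sample, strat, len(clus), sep="\n")
```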
Rationalising the Research Work!

• RQs (research questions) → ROs (research objectives) → RH(s) (research hypotheses)
• Univariate: trend, dispersion, fluctuation within the variable
• Bivariate: difference, relationship, association, causal
• Multivariate: relationship, association, causal, modeling
Illustration

• RQ1: Has the O&G industry fluctuated highly over the past 5 years? RO? RH?
• RQ2: Is there any difference between the energy sector and the finance sector with regard to productivity? RO? RH?
• RQ3: What are the factors that affect the performance of the energy sector? RO? RH?
• Significance level 0.05?
Scenario 1:

RQ1: Has the O&G industry fluctuated highly over the past 5 years? RO? RH?

RO: To examine whether the O&G industry has fluctuated highly over the past 5 years.

RH: H0: The O&G industry has not fluctuated significantly over the past 5 years.

H1: The O&G industry has fluctuated significantly over the past 5 years.

Scenario 2:

RQ2 (#1): Is there any difference between the energy sector and the finance sector with regard to productivity? RO? RH?

RO: This study attempts to investigate whether there is a difference between the energy sector and the finance sector with regard to productivity.

RH: H0: There is no significant difference between the energy and finance sectors with regard to productivity level.
H1: There is a significant difference between the energy and finance sectors with regard to productivity level.
Scenario 2-additional:

RQ2 (#2): Are there any differences among the business sectors (more than 2 sectors) with regard to performance level? RO? RH?
RO: To examine whether there are differences among the business sectors with regard to performance level.
H0: There are no significant differences among the business sectors with regard to performance level.
H1: There are significant differences among the business sectors with regard to performance level.
(Alternative RO: This study attempts to investigate whether there is a relationship between the business sectors and performance level.)

H0: There is no significant relationship between the business sectors and performance level.

H1: There is a significant relationship between the business sectors and performance level.
Scenario 3:

RQ3: What are the factors that affect the performance of the energy sector? RO? RH?

RO: This study investigates the factors that affect the performance of the energy sector in Malaysia.

H0: The factors do NOT affect the performance of the energy sector in Malaysia.

H1: The factors do affect the performance of the energy sector in Malaysia.

Significance level (0.10? 0.05? 0.01?): the margin of error allowed in statistical testing, i.e., the accepted risk of wrongly rejecting H0.
RESEARCH METHODOLOGY

MODULE 3: PART 3
Data Collection, Analysis and Interpretation

Assoc Prof Dr. Maran Marimuthu


8-10 May 2020
ANALYSES & INTERPRETATIONS
Analysis of Quantitative Data - Dealing with Data

• Dealing with Data: Coding, Entering, and Cleaning
• Results with One Variable: UNIVARIATE
• Results with Two Variables: BIVARIATE
• Results with More than Two: MULTIVARIATE
• Relevant for Inferential Statistics
Dealing with Data

• Coding required
• Data Entry
• Data Cleaning
Results with One Variable

• Descriptive Statistics describe numerical data one variable at a time (univariate), two variables at a time (bivariate), or more than two (multivariate).
• Frequency Distributions summarize information including counts and percentages, and cumulative counts and percentages for nominal, ordinal, interval, or ratio measurements.
• Graphic representations include the histogram, bar chart, and pie chart.
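A frequency distribution of this kind can be sketched as follows. The coded education levels are made-up data and the function name is hypothetical; cumulative columns are mainly meaningful when the categories have an order (ordinal or higher).

```python
from collections import Counter


def frequency_table(values):
    """Return rows of (category, count, percent, cumulative count, cumulative percent)."""
    counts = Counter(values)
    total = len(values)
    rows = []
    cumulative = 0
    for category in sorted(counts):
        count = counts[category]
        cumulative += count
        rows.append((category, count, 100 * count / total,
                     cumulative, 100 * cumulative / total))
    return rows


# Hypothetical ordinal codes: 1 = Diploma, 2 = Degree, 3 = Masters, 4 = PhD
education = [1, 2, 2, 3, 2, 1, 4, 3, 2, 2]
rows = frequency_table(education)
for row in rows:
    print(row)
```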
Measures of Central Tendency

• Mode is the most common or frequently occurring number.
• Median is the middle point or 50th percentile used with ordinal, interval or ratio data.
• Mean is the arithmetic average used with interval or ratio level data (but it is very sensitive to extreme values).
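The sensitivity of the mean to extreme values can be seen in a small sketch using Python's `statistics` module. The wastage figures are invented for illustration only.

```python
import statistics

# Hypothetical wastage readings (units)
wastage = [5, 7, 7, 8, 9, 10, 12]
print(statistics.mode(wastage), statistics.median(wastage), statistics.mean(wastage))

# Add one extreme value: the mean shifts far more than the median does.
with_outlier = wastage + [50]
mean_with_outlier = statistics.mean(with_outlier)
median_with_outlier = statistics.median(with_outlier)
print(median_with_outlier, mean_with_outlier)
```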
Measures of Variation

• In general, variation is defined as the spread, dispersion, or variability around the center of the distribution.
• Range is the distance between the smallest and largest scores; e.g., wastage might range from 5 to 50 units.
• Percentiles are scores at a specific place within the distribution: a 25th percentile of 5 units would indicate that 25% of wastage values fall below 5 units.
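A short sketch of these measures, again with invented wastage data. Note that percentile values depend on the interpolation method; the `method="inclusive"` option of `statistics.quantiles` is one of several conventions and is chosen here only as an assumption.

```python
import statistics

# Hypothetical wastage readings (units), sorted for readability
wastage = [5, 8, 12, 15, 20, 25, 30, 38, 44, 50]

data_range = max(wastage) - min(wastage)            # largest minus smallest
q1, q2, q3 = statistics.quantiles(wastage, n=4, method="inclusive")
spread = statistics.stdev(wastage)                  # sample standard deviation

print(f"range = {data_range}")
print(f"25th, 50th, 75th percentiles = {q1}, {q2}, {q3}")
print(f"sample standard deviation = {spread:.2f}")
```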
Measures of Association

• Correlational study
• Relationship
• Causal
Statistical Techniques

• Evaluate the difference between one sample and a test value (difference in one variable): one-sample t-test
• Evaluate the difference between two paired samples: paired-sample t-test
• Evaluate the difference between two independent samples: independent t-test
• Evaluate the difference among groups: one-way ANOVA
• Evaluate the correlation between variables
• Evaluate causal effect using regression analysis
Hypothesis Testing: Wording of Final Conclusion

• p-value: the probability of obtaining results at least as extreme as those observed if Ho is true (i.e., by chance alone)

Small p-value:
• Ho is rejected
• Ha is accepted, with the risk of wrongly rejecting a true Ho limited to the chosen significance level

Large p-value:
• Ho is not rejected
• Ha is not supported

If p ≤ 0.05, we reject the Null Hypothesis and accept the Alternative Hypothesis.
"If p is low, Ho must GO!"
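The rule can be written as a tiny helper. The function name is hypothetical, and the two p-values used in the example are simply the ones printed in the Excel outputs later in this module.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject H0 when p <= alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    return "reject H0" if p_value <= alpha else "fail to reject H0"


print(decide(0.190453))     # two-tail p-value from the Excel two-sample t-test
print(decide(0.00772299))   # p-value from the Excel regression output
```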

Discussions on Statistical Output

T-Test Results: Examples
Simple T-Test: 2296 obs

One-Sample Statistics
          N     Mean     Std. Deviation  Std. Error Mean
crudeoil  2296  29.1728  12.35384        .25782

One-Sample Test (Test Value = 60)
          t         df    Sig. (2-tailed)  Mean Difference  95% CI of the Difference (Lower, Upper)
crudeoil  -119.569  2295  .000             -30.82723        -31.3328, -30.3216
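The t value in this output can be reproduced from the summary statistics alone. The sketch below uses only the figures printed in the table; the function name is hypothetical.

```python
import math


def one_sample_t(mean, sd, n, test_value):
    """Return (t, df, standard error) for a one-sample t-test from summary statistics."""
    se = sd / math.sqrt(n)          # standard error of the mean
    t = (mean - test_value) / se    # distance from the test value in standard errors
    return t, n - 1, se


t, df, se = one_sample_t(mean=29.1728, sd=12.35384, n=2296, test_value=60)
print(f"t = {t:.3f}, df = {df}, std. error = {se:.5f}")
```

The very large negative t agrees with the reported Sig. of .000: the mean crude oil value is far below the test value of 60.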
PAIRED T -TEST
Independent T-Test
1-way ANOVA
t-Test-excel
Excel output
t-Test: Two-Sample Assuming Unequal Variances

                              Japan     United Kingdom
Mean                          1953      1783
Variance                      378300.7  142561.9
Observations                  31        34
Hypothesized Mean Difference  0
df                            49
t Stat                        1.327629  (the test statistic)
P(T<=t) one-tail              0.095226  (the p-value, one tail)
t Critical one-tail           1.676551  (the critical bound, one tail)
P(T<=t) two-tail              0.190453
t Critical two-tail           2.009574

Note that the degrees of freedom for this t-test is 49, not 63.

© 2008 Thomson South-Western
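The reason the degrees of freedom are 49 rather than 63 is that the unequal-variances test uses the Welch-Satterthwaite approximation instead of n1 + n2 - 2. A minimal sketch from the summary figures in the Excel output (the function name is hypothetical):

```python
import math


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t, df) for a two-sample t-test assuming unequal variances."""
    a = var1 / n1
    b = var2 / n2
    t = (mean1 - mean2) / math.sqrt(a + b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


t, df = welch_t(1953, 378300.7, 31, 1783, 142561.9, 34)
t_critical_two_tail = 2.009574      # taken from the Excel output
print(f"t = {t:.4f}, df = {df:.2f} (Excel reports 49)")
print("reject H0" if abs(t) > t_critical_two_tail else "fail to reject H0")
```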


One-Way ANOVA - excel
Compare calculated values to those in the Excel output:
Anova: Single Factor

SUMMARY
Groups Count Sum Average Variance
Alone 10 637 63.7 87.56666667
WithPass 12 683 56.91666667 63.53787879

ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 250.9833333 1 250.9833333 3.37566268 0.081071382 4.351250027
Within Groups 1487.016667 20 74.35083333

Total 1738 21

In the ANOVA table, F is the test statistic, P-value is the p-value, and F crit is the critical bound.
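The sums of squares and F in this table can be recomputed from the SUMMARY block (count, average and variance per group). This is a sketch with a hypothetical function name; it uses only the values printed above.

```python
def one_way_anova(groups):
    """groups: list of (count, mean, variance). Return (ssb, ssw, df_between, df_within, F)."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    ssb = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)   # between groups
    ssw = sum((n - 1) * v for n, _, v in groups)                 # within groups
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ssb / df_between) / (ssw / df_within)
    return ssb, ssw, df_between, df_within, f_stat


groups = [(10, 63.7, 87.56666667),          # Alone
          (12, 56.91666667, 63.53787879)]   # WithPass
ssb, ssw, df_between, df_within, f_stat = one_way_anova(groups)
f_crit = 4.351250027                        # taken from the Excel output
print(f"SSB = {ssb:.4f}, SSW = {ssw:.4f}, F = {f_stat:.5f}")
print("reject H0" if f_stat > f_crit else "fail to reject H0")
```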



Parametric/non-Parametric

Parametric: using MEAN as the parameter of hypothesis
• Normality p-value > 0.05
• (|Mean - Median| / Mean) * 100 ≤ 10%
• Tests: 1 sample T, 2 Sample T, ANOVA

Non-parametric: using MEDIAN as the parameter of hypothesis
• Normality p-value ≤ 0.05
• (|Mean - Median| / Mean) * 100 > 10%
• Tests: Sign Test, Mann Whitney, Kruskal Wallis
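The screening rule can be sketched as below. Two assumptions are made that the slide does not state explicitly: both conditions must hold before the parametric route is taken, and the tests are paired in the order listed (1 sample T with Sign Test, 2 Sample T with Mann Whitney, ANOVA with Kruskal Wallis). All names are hypothetical.

```python
TESTS = {
    "parametric": {"one sample": "1 sample T",
                   "two samples": "2 Sample T",
                   "more than two": "ANOVA"},
    "non-parametric": {"one sample": "Sign Test",
                       "two samples": "Mann Whitney",
                       "more than two": "Kruskal Wallis"},
}


def choose_family(mean, median, normality_p, alpha=0.05, max_gap_pct=10.0):
    """Return 'parametric' only if both screening conditions are met (assumed AND)."""
    gap_pct = abs(mean - median) / abs(mean) * 100
    if normality_p > alpha and gap_pct <= max_gap_pct:
        return "parametric"
    return "non-parametric"


family = choose_family(mean=100, median=95, normality_p=0.20)
print(family, "->", TESTS[family]["two samples"])
```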

Correlation & Regression

• Measure the degree of linear association between two variables using graphs and statistics
• Measure the degree of linear association between two or more variables using correlation
• Model the relationship between a continuous response variable and one or more predictor variables
• Determine the strength of the relationship between a continuous response variable and one or more predictor variables
What is Correlation?

Correlation is...
The sample correlation coefficient, r, measures the degree of linear association between two variables.
• +ve correlation: both variables tend to increase or decrease together.
• -ve correlation: as one variable increases, the other tends to decrease.

When to use correlation?
Use it to determine whether there is a linear relationship between two variables. Only use it when the variables are continuous.

Why use correlation?
Correlation helps to answer questions such as:
• Are two variables related in a linear fashion?
  In practice: Is there a relationship between temperature and the viscosity of cooking oil?
• What is the strength of the relationship?
  In practice: How strong is the relationship between ultraviolet exposure and reduced strength in nylon tent material?
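A minimal sketch of the sample correlation coefficient r, written out from its definition. The temperature and viscosity readings are invented to mimic the cooking oil example, and the function name is hypothetical.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


temperature = [20, 30, 40, 50, 60]      # hypothetical readings
viscosity = [80, 62, 50, 41, 35]        # hypothetical readings
r = pearson_r(temperature, viscosity)
print(f"r = {r:.3f}")                   # negative: viscosity falls as temperature rises
```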
Strength & Direction of Correlation

[Figure: five scatter plots of Output against Input, ordered from r approaching -1.0, through r ~ 0, to r approaching 1.0: strong negative correlation, weak negative correlation, no correlation, weak positive correlation, strong positive correlation]
Correlation

Correlations
                                   PRODUCTIVITY  HOURS
PRODUCTIVITY  Pearson Correlation  1             .623**
              Sig. (2-tailed)                    .000
              N                    30            30
HOURS         Pearson Correlation  .623**        1
              Sig. (2-tailed)      .000
              N                    30            30

**. Correlation is significant at the 0.01 level (2-tailed).


Correlation Matrix for Store Image Elements

V1 V2 V3 V4 V5 V6 V7 V8 V9
V1 Price Level 1.00
V2 Store Personnel .427 1.00
V3 Return Policy .302 .771 1.00
V4 Product Availability .470 .497 .427 1.00
V5 Product Quality .765 .406 .307 .472 1.00
V6 Assortment Depth .281 .445 .423 .713 .325 1.00
V7 Assortment Width .354 .490 .471 .719 .378 .724 1.00
V8 In-Store Service .242 .719 .733 .428 .240 .311 .435 1.00
V9 Store Atmosphere .372 .737 .774 .479 .326 .429 .466 .710 1.00

What is simple regression?

Simple regression...
Examines the relationship between a continuous response variable (Y) and a predictor variable (X).

General Equation: Y = b0 + b1X + e

When to use simple regression?
Use when you have a continuous Y and one predictor, X (1 to 1).

Why use simple regression?
It will help answer questions such as:
• How important is X in predicting Y?
  In practice: How does processing temperature relate to the mean hardness of steel?
• What value can you expect for Y when X is 20?
  In practice: What is the predicted mean hardness of steel if processed at a particular temperature?
• How much is the mean of Y expected to change if you increase X by one unit?
  In practice: What is the expected mean increase in hardness if the processing [...]
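The questions above map directly onto the two estimated coefficients of Y = b0 + b1X + e. The sketch below fits them by ordinary least squares on invented data; the function name and the numbers are hypothetical.

```python
def fit_simple_regression(x, y):
    """Return (b0, b1) of the least-squares line Y = b0 + b1 * X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx                  # expected change in mean Y per one-unit rise in X
    b0 = mean_y - b1 * mean_x       # expected Y when X is 0
    return b0, b1


x = [10, 20, 30, 40, 50]            # hypothetical predictor values
y = [25, 44, 68, 85, 103]           # hypothetical responses
b0, b1 = fit_simple_regression(x, y)
predicted_at_20 = b0 + b1 * 20
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, predicted Y at X = 20: {predicted_at_20:.2f}")
```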
Bivariate Regression
Correlation & Causation

I. "Correlation does not imply causation" is a phrase used in science and statistics to emphasize that correlation between two variables does not automatically imply that one causes the other.
II. It is important to understand the nature of the relationship between the Xs and Ys.
III. The possible relationships between Xs and Ys to be considered:

Type of possible relationship, with example:
• Direct cause-and-effect relationship: a higher propane % increases fuel gas density.
• Reverse cause-and-effect relationship: bad grades may be caused by absences, BUT one should not fail to consider the fact that bad grades may cause absences.
• Due to chance or coincidence: suppose that a high correlation is observed between the weekly sales of hot chocolate and the number of skiing accidents. The coincidence may be caused by a third variable, which is WINTER MONTHS!
Results with More than Two Variables: The Elaboration Model

• To test whether an alternative explanation accounts for a relationship found in bivariate analysis, social researchers sometimes attempt to rule out another variable.
• A trivariate table is built from a bivariate table on the independent and dependent variable for each category of a third, or control, variable.
Multiple Regression Analysis

• Multiple regression is a statistical technique for interval or ratio level analysis that accounts for multiple independent variables and their combined influence on one dependent variable.
• Multiple regression can also be used to test for the effects of one or more control variables.
Multiple Regression

Variables Entered/Removed (a)
Model  Variables Entered                  Variables Removed  Method
1      environ, method, man, machine (b)  .                  Enter
a. Dependent Variable: prod
b. All requested variables entered.

Model Summary (b)
Model  R         R Square  Adjusted R Square  Std. Error of the Estimate
1      .990 (a)  .980      .978               16.623
a. Predictors: (Constant), environ, method, man, machine
b. Dependent Variable: prod


ANOVA (a)
Model          Sum of Squares  df  Mean Square  F        Sig.
1  Regression  584669.492      4   146167.373   528.979  .000 (b)
   Residual    11881.758       43  276.320
   Total       596551.250      47
a. Dependent Variable: prod
b. Predictors: (Constant), environ, method, man, machine

Coefficients (a)
               Unstandardized Coefficients  Standardized Coefficients
Model          B        Std. Error          Beta                       t       Sig.
1  (Constant)  -70.981  11.821                                         -6.005  .000
   machine     1.177    .469                .153                       2.512   .016
   man         .630     .157                .122                       4.004   .000
   method      5.147    2.121               .118                       2.426   .020
   environ     .707     .076                .709                       9.315   .000
a. Dependent Variable: prod

PROD = α + β1 mac + β2 man + β3 met + β4 env + ε

PROD = -70.981 + 1.177 mac + 0.630 man + 5.147 met + 0.707 env
R squared = 0.98 (98%); the F-test is significant at the 0.01 level.
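The summary figures can be cross-checked from the ANOVA table, and the fitted equation can be used for prediction. This sketch uses only numbers printed in the SPSS output; the function name and the example input values are hypothetical.

```python
ss_regression = 584669.492
ss_residual = 11881.758
k = 4                      # predictors: machine, man, method, environ
n = 47 + 1                 # total df + 1 observations

ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
f_stat = (ss_regression / k) / (ss_residual / (n - k - 1))


def predict_prod(mac, man, met, env):
    """Predicted prod from the estimated (unstandardized) coefficients."""
    return -70.981 + 1.177 * mac + 0.630 * man + 5.147 * met + 0.707 * env


print(f"R squared = {r_squared:.3f}, adjusted = {adj_r_squared:.3f}, F = {f_stat:.3f}")
print(predict_prod(mac=10, man=20, met=5, env=100))   # made-up input values
```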
Regression-excel

SUMMARY OUTPUT

Regression Statistics
Multiple R         0.848584    (coefficient of correlation)
R Square           0.72009481  (coefficient of determination)
Adjusted R Square  0.67344395
Standard Error     91.4789339
Observations       8

ANOVA
            df  SS           MS          F         Significance F
Regression  1   129173.1279  129173.128  15.43583  0.00772299
Residual    6   50210.37209  8368.39535
Total       7   179383.5

           Coefficients  Standard Error  t Stat      P-value     Lower 95%    Upper 95%
Intercept  44.3139535    108.5086985     0.40839079  0.69716178  -221.197461  309.825368
Years      38.755814     9.864427133     3.92884589  0.00772299  14.6184126   62.8932153

F (15.43583) is the global F test statistic for the test of H0: b1 = 0.
The t Stat for Years (3.92884589) is the calculated t for the test of H0: b1 = 0.
Note that: (1) both t and F have the same p-value, and (2) t^2 = F.
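The relationships noted in this output (t squared equals F, and how R Square and Adjusted R Square follow from the sums of squares) can be verified numerically. The sketch uses only values from the Excel output.

```python
import math

ss_regression = 129173.1279
ss_total = 179383.5
n = 8                      # observations
k = 1                      # one predictor (Years)
t_years = 3.92884589
f_reported = 15.43583

r_squared = ss_regression / ss_total
multiple_r = math.sqrt(r_squared)
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(f"t^2 = {t_years ** 2:.5f} vs F = {f_reported}")
print(f"R Square = {r_squared:.8f}, Multiple R = {multiple_r:.6f}")
print(f"Adjusted R Square = {adj_r_squared:.8f}")
```

With a single predictor, the global F test and the t test on the slope test the same hypothesis, which is why their p-values coincide.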


Additional Modeling

• Logit: binomial, multinomial
• Tobit
• Probit
• Multiple Discriminant Analysis
• Panel Data Regression
• Pooled Regression
• MANOVA
• 2-Way ANOVA
• Factor Analysis
• Cluster Analysis
Q&A
Thank you!
