
UNIT 3

Research Methodology
(Shaheed Rajguru College of Applied Sciences for Women)
Syllabus (Unit-3): Data Collection and Analysis: Observation and collection of data-
Methods of data collection. Modeling: Mathematical models for research, Sampling
Method, Data processing, and analysis strategies. Data analysis with statistical packages-
Hypothesis testing, Sampling, Sampling Error, Statistical methods/Tools -Measure of
central tendency and Variation, Test of Hypothesis- z test, t-test, F test, ANOVA, Chi-
square, correlation, and regression analysis, Error estimation.
Chapter Topic Page number

1 Data Collection and Analysis 3


1.1 Data Classification 3
1.1.1 Qualitative Data and Quantitative Data 3
1.1.2 Primary Data and Secondary Data 4
1.1.3 Considerations while Collecting the Secondary Data 6
1.2 Methods of data collection 7
1.2.1 Primary Data Collection Methods 7
1.2.1.a Interview 7
1.2.1.b Questionnaire 7
1.2.1.c Observation 8
1.2.2 Secondary Data Collection Methods 10
1.3 Possible Questions 11
2 Modelling 12
2.1 Definition of Mathematical Model 12

2.2 How Mathematical Models Are Used in Research 12


2.3 Types of mathematical models 13
2.4 Benefits of Using Mathematical Models 14
3 Sampling 15
3.1 What is Sampling 15
3.2 Type of Sampling 15
3.3 Probability Sampling 16
3.4 Difference between stratified sampling and cluster sampling 18
3.5 Non-Probability Sampling 19
3.6 Sampling Error 20
4 Data Processing and Data Analysis 22
4.1 Data Collection 22
4.2 Data Cleaning 22
4.3 Data Analysis 22
4.4 Data Interpretation 23
4.5 Data Visualisation 23

1
5 Measure of Central Tendency and Variation 24
5.1 Measures of Central Tendency 24
5.1.1 Mean 24
5.1.2 Median 24
5.1.3 Mode 24
5.2 Why is Central Tendency important in research 25
5.3 Measures of variation 25
5.3.1 Range 25
5.3.2 Variance 25
5.3.3 Standard Deviation 25
5.4 Why is Variation important in research 26
6 Hypothesis 27
6.1 What is Hypothesis 27
6.2 Independent variable and Dependent variable 28
6.3 Characteristics of Good Hypothesis 28
6.4 Types of Hypothesis 28
6.4.1 Simple hypothesis 28
6.4.2 Complex hypothesis 28
6.4.3 Null hypothesis 28
6.4.4 Alternative hypothesis 29
6.4.5 Directional hypothesis 29
6.4.6 Non-directional hypothesis 29
6.5 More Facts about the Null Hypothesis and Alternative Hypothesis 30
7 Hypothesis Testing 31
7.1 T-Test 32
7.2 Z-Test 33
7.3 Comparison between T-Test and Z-Test 34
7.4 F-Test 35
7.5 ANOVA Test 36
7.6 Chi-square Test 37
7.7 When to use ANOVA 38
7.8 When to use Chi-square 38
8 Correlation and Regression 39
8.1 Correlation Analysis 39
8.2 Regression Analysis 39
8.3 Key Differences 40
9 Error Estimation 41
9.1 Types of Errors 41
9.2 Error Estimation Techniques 41
9.3 Importance of Error Estimation 41
Summary 42

2
Chapter 1
Data Collection and Analysis

1.1 Data Classification:


Data can be classified as:
1.1.1 Qualitative Data and Quantitative Data
1.1.2 Primary Data and Secondary Data

1.1.1 Qualitative Data and Quantitative Data

1.1.1.a Qualitative Data:


 Qualitative data is the data type that describes the quality/characteristics of an item.
 It cannot be expressed in the form of numbers. It is in the form of text, images, audio,
video, etc.
 Qualitative data is collected through questionnaires, interviews, observations, etc.
 Example: Satisfaction, motivation, honesty

 Types of Qualitative Data

 Nominal Data:
Nominal data is used to categorize things into mutually exclusive groups without any rank or
order.
Example: marital status (single, divorced, married), gender (male, female), mode of
transportation (car, bus, plane, train).

3
 Ordinal Data:
Ordinal data categorizes and ranks the data in a certain order.
Example: Grades in an exam (A, B, C, D), position in a competition (first, second, third),
financial status (upper class, middle class, lower class), level of education (university level,
college level, school level).

1.1.1.b Quantitative Data


Data that can be quantified, measured, and assigned a numerical value comes under the
category of quantitative data.
Example: Age, Income, Distance, Temperature, Price etc.

 Types of Quantitative Data

 Discrete Data:
Data that can only take specific/distinct values. It is in the form of whole numbers or integers.
Example: Number of students in a college, Number of family members, Number of states.
 Continuous Data:
Data that can take any numerical value within a range, including fractional/decimal values.
Example: Income, temperature data, price of any product.

1.1.2 Primary Data and Secondary Data:


1.1.2.a Primary Data
Primary data is original data collected for a specific purpose, directly from the main source.
Primary data collection methods include surveys, interviews, observations, experiments,
focus groups, and case studies. These methods are used to gather original data directly from
the source.
 Methods/Sources of Primary Data Collection
 Primary data collection involves gathering first-hand information directly from the
source for specific research purposes.
 This process includes various methods, allowing researchers to obtain relevant and
accurate data related to their study's objectives.
The primary data collection methods are shown in Figure 1.1

4
Figure 1.1 Methods of Primary Data Collection

1.1.2.b Secondary Data


Secondary data is not original data; it was collected earlier by someone else and is being
reused in your research.
Secondary data is information that has already been collected by someone else and is used for
a different purpose than the current research. It can include Statistics, Reports, and other
documents.

 Methods/Sources of Secondary Data Collection

The methods of Secondary Data Collection are shown in Figure 1.2

Figure 1.2 Methods of Secondary Data Collection

5
1.1.3 Considerations (Special care) while collecting the Secondary Data for Research.
While collecting the secondary data the following points must be considered
 General considerations:
(i) Relevance:
The researcher must ensure that the data is relevant and fit for the research purpose. The data
should address/answer the specific questions/problems and fulfill the objectives of the
research.

(ii) Adequacy:
Evaluate if the data is sufficient to answer the research question.

(iii) Credibility of the source:


Evaluate the credibility/authenticity and reputation of the source of the data.
(iv) Accuracy and Completeness:

Check for potential errors, inconsistencies, or biases in the data.

(v) Consider the methodology:


Understand how the data was collected, what is the sample size, and the sampling method
used.

(vi) Time and Conditions of Data Collection:


Evaluate/Assess the period/timeframe and conditions of data collection

 Ethical Considerations:

(vii) Obtain necessary permissions: If the data is not publicly available, ensure that you have
the necessary permissions to use it.

(viii) Protect data confidentiality: If the data contains sensitive information, ensure that it is
protected and handled ethically.

(ix) Acknowledge sources: Properly cite/reference the source of the secondary data in your
research.

6
1.2 Methods and Tools of data collection
1.2.1. Primary Data Collection Methods

1.2.1.a Interview
 The interview is a primary data collection method.
 An interview is a communication process between two persons.
 One person asks the questions and the other responds. The one who asks the questions
is called the interviewer; the one who responds is called the interviewee. The response
of the interviewee is recorded by the interviewer (researcher).
The interview can be classified as
 Structured Interview
 Unstructured Interview

 Structured Interview:
 A structured interview has preplanned, pre-determined questions.
 The interviewer asks only that set number and type of questions.

 Unstructured Interview:

 An unstructured interview is the opposite of a structured interview.
 The researcher changes the questions according to the intelligence level of the
interviewee. There are no preplanned questions.

Tool for Interview data collection: Interview schedule

1.2.1.b Questioning/ Questionnaire:


The questionnaire was first developed by the Statistical Society of London in 1838; Sir Francis
Galton later pioneered its use in research. Quite often the questionnaire is considered the heart
of a survey operation, hence it should be carefully constructed.
 The questionnaire is a primary data collection tool.
 A Questionnaire is a pre-formulated written set (list) of questions
 This list of questions or items is used to gather data (useful information) from
respondents about their attitudes, experiences, or opinions.
(e.g. feedback survey by a company about their product)
 Questionnaires can be used to collect quantitative and/or qualitative information.
 Questionnaires are commonly used in market research as well as in the social and
health sciences. In open-ended questionnaires, respondents provide their answers in
their own words rather than selecting from a predefined list of choices.

7
The questionnaire can be classified as
 Structured Questionnaire
 Unstructured Questionnaire

 Structured Questionnaire

 The structured questionnaire has a specific pre-determined structure.


 Quantitative data is collected through a Structured questionnaire.
 A structured questionnaire is a document used to collect data from respondents
and consists of a set of standard questions with a pre-determined framework that
sets the precise language and sequence of questions.
 The questions are mostly closed-ended.
 The researcher restricts the respondent to a fixed set of options.
 Respondents must choose from those options, such as Yes/No, Agree/Disagree,
True/False, or multiple-choice items.
For example: Can the current government handle corruption/terrorism? Yes or No.
 Unstructured Questionnaire

 The unstructured questionnaire does not have any specific structure (just has the
basic structure)
 Qualitative data is collected through an unstructured questionnaire.
 It uses open-ended questions, allowing respondents to provide detailed, free-form
answers (Descriptive type) without predefined choices.
 This type of method provides a lot of flexibility and rich insights
For Example: What was your experience in a certain place/restaurant? (In description)
Tools for questionnaire data collection: Google Forms, SurveyMonkey, Typeform, and
Jotform.
1.2.1.c Observation:
The researcher observes the subject of interest for a particular purpose.
 Observation is a method of collecting primary data.
 Observation is defined as the systematic, selective, and purposeful way of watching,
examining, or listening to what is happening in a natural setting and documenting it.
 In observation, the researcher does not interfere in the event.
For example: A cricket team coach observes his/her team while they are performing.

1.2.1.c.1 When to use the observation method?


 Observation is an appropriate method of data collection when you want to study
culture, an ongoing process, the behavior of individuals, etc.

8
 It is also useful when full and accurate information cannot be obtained through a
questionnaire, because the respondent either cannot answer the questions or is not
cooperative.
Types of observation:

 Naturalistic observation:
 As the name suggests in naturalistic observation researcher observes how the
participants respond to their environment in real life or a natural setting.
 The researcher does not influence their behavior.
Example: Observing animals in national parks
 Participant observation:
In participant observation, the researcher participates in the activities of the group being
observed as a member of the group, with or without the group knowing that it is being
observed.
Example:
Spending a few months in jail with prisoners to learn their perception of the judicial system in
the country.
 Non-Participant Observation:
The researcher observes from a distance, without actively participating in the activity
 Structured Observation:
 Structured observation involves systematically recording observations of specific,
pre-defined behaviors or events in a setting, often using checklists or coding
systems.
 Researchers using structured observation aim to gather quantitative data by
quantifying the frequency, duration, or intensity of specific behaviors.

For example: A researcher is interested in studying classroom behavior patterns in elementary


school students. They develop a structured observation protocol that outlines specific behaviors
to be observed, such as raising hands, participating in group discussions, and following
instructions.

 Unstructured Observation:
 Unstructured observation is a qualitative research method where researchers observe
behaviors or events without a predetermined checklist or structured recording
system.
 It allows for a more flexible, open-ended approach to data collection
For example, a researcher spending time in a public park, taking detailed notes on social
interactions and behaviors without a pre-defined checklist or categories.

9
 Overt Observation:
The participants know they are being observed.
 Covert Observation
The participants are unaware they are being observed
Tools for Observation data collection: Field notes, Audio and Video recordings, Rating
scales, Checklists, and Anecdotal Records.

1.2.2 Methods of Secondary Data Collection


 Secondary data means data collected by someone else earlier.
 Access to secondary data is fast, easy, and cost-effective.
 Secondary data may be Published or Unpublished.

 Sources of secondary data collection


Internal Sources:
For example, if the researcher works in a certain company/organization/institution, he/she
may get the data internally (inside the company), such as:
 Sales Report
 Financial/Accounting Statements
 Customer details
 Company information
 Reports and feedback from dealer/retailer/distributor
 Management information system etc
External Sources:
 Government censuses (Population census, agriculture census)
 Historical documents
 Data published by different government departments or private institutions
(such as Banks, Trade Industries (Business data such as share market), Social
Security, Tax, Transportation, etc)
 Journals, Books, Magazines, Internet, Libraries etc
 Commercial sources: TV, Newspapers etc

10
1.3 Possible Questions
Explain the classification of data with suitable examples.
Q 1.1 Evaluate the different methods of Collecting Secondary data.
Q 1.2 Distinguish between Primary Data and Secondary Data
Q 1.3. Explain the meaning of primary data and secondary data. Discuss the special care
that is to be taken while collecting the secondary data for research.
Q 1.4. Discuss different methods of collecting data, their merits and demerits, and briefly
discuss the ethical issues in collecting data.
Q1.5 Explain the merits and demerits of the interview data collection method.
Answer:
Merits:
 More in-depth information obtained
 Personal Information
 Greater Flexibility
 Adaptation as per the respondent
Demerits:
 Bias of Interviewer
 Expensive/Time Consuming
 Need expertise
Q1.6 What are the characteristics of a good questionnaire?
The following are the characteristics of a good questionnaire.

 Questions should be related, specific, and concise (Uniformity).


 Questions should be in a sequence or logical manner.
 The number of questions should be limited.
 Discriminatory/embarrassing questions should not be used in any case.
 The language used in the questionnaire should be simple; do not use technical
terminology or abbreviations.
Q1.7 Describe the Ethical considerations while collecting secondary data.

11
Chapter 2
Mathematical Modelling

2.1 Definition of Mathematical Model:

The mathematical model can be defined in the following ways

 A mathematical model is an abstraction of a real-world system, focusing on key


variables and their relationships.
 It uses mathematical equations, inequalities, and other mathematical tools to
describe the system's behavior.
 It helps researchers understand how different parts of a system interact, predict
future outcomes, and test hypotheses.
 A mathematical model in research is a tool used to represent/mimic/replicate real-
world phenomena or systems/processes using mathematical equations and other
mathematical concepts. (in other words, describing any process using mathematical
expression)
 It allows researchers to Study, Analyze, and Make Predictions about the behavior of
a system or process without directly manipulating the real-world system.
 Mathematical models are widely used in various fields like science, engineering, and
economics.

2.2 How Mathematical Models Are Used in Research:


2.2.1 Problem definition:

Researchers start by clearly defining the problem or phenomenon they want to model.

2.2.2 Model development:

They identify key variables, relationships, and assumptions, and then translate them into
mathematical equations or other models.

2.2.3 Analysis and Simulation:

The model is then used to analyze the system's behavior, simulate different scenarios, and
make predictions.

2.2.4 Validation and Refinement:

The model's accuracy and usefulness are evaluated against real-world data (using test data),
and it may be refined based on the results.

12
2.3 Types of mathematical models:

2.3.1 Deterministic models:

These models assume that the system's behavior/system output /Process behavior is
completely determined by its initial conditions and input variables.

2.3.2 Stochastic models:

 These models incorporate the Stochastic effect (randomness and uncertainty).


 These models are the mathematical tools used to predict outcomes where randomness
and uncertainty play a significant role.
 The system's behavior is not only determined by its initial conditions and input
variables but also by the stochastic inputs (such as Uncertainty, External
disturbances, Noise, etc).
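
To make the distinction concrete, here is a minimal Python sketch (with an assumed growth rate and noise level, purely for illustration) contrasting a deterministic growth model with a stochastic version of the same model.

```python
# Deterministic vs. stochastic model of simple population growth (illustrative).
import random

def deterministic_growth(p0, rate, steps):
    """Output is fully determined by the initial value and the growth rate."""
    population = [p0]
    for _ in range(steps):
        population.append(population[-1] * (1 + rate))
    return population

def stochastic_growth(p0, rate, steps, noise=0.05):
    """Same model, but each step adds a random disturbance (uncertainty)."""
    population = [p0]
    for _ in range(steps):
        shock = random.gauss(0, noise)        # random disturbance / noise term
        population.append(population[-1] * (1 + rate + shock))
    return population

print(deterministic_growth(100, 0.02, 5))     # identical on every run
print(stochastic_growth(100, 0.02, 5))        # differs from run to run
```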

2.3.3 Differential equation models:

Differential equation models are often used to describe the dynamic behavior of systems over
time using differential equations, such as the transient behavior of second-order systems
(mechanical: spring-mass-damper system; electrical: RC circuits).
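
As an illustration, the sketch below (with assumed values for mass m, damping c, and spring constant k) solves the spring-mass-damper equation m·x'' + c·x' + k·x = 0 numerically with SciPy's odeint.

```python
# Differential equation model of a spring-mass-damper system (assumed parameters).
import numpy as np
from scipy.integrate import odeint

m, c, k = 1.0, 0.5, 4.0            # assumed mass, damping, and spring constants

def dynamics(state, t):
    x, v = state                   # displacement and velocity
    return [v, -(c * v + k * x) / m]

t = np.linspace(0, 10, 200)        # time grid (seconds)
trajectory = odeint(dynamics, [1.0, 0.0], t)   # initial displacement 1, velocity 0
print(trajectory[:5, 0])           # first few displacement values (transient behavior)
```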

2.3.4 Statistical models:

A statistical model is a mathematical model that uses statistical assumptions for the
generation of sample data.

It involves the relationship between random and non-random variables.

Examples: logistic regression, time series models, clustering, and decision trees.

Used to analyze data, identify patterns, and make predictions.

13
2.4 Benefits of Using Mathematical Models:

2.4.1 Improved understanding:

They provide a deeper understanding of complex systems and processes.

2.4.2 Prediction and Forecasting:

They allow researchers to make predictions about future behavior.

2.4.3 Optimization and decision-making:

They can be used to identify optimal strategies and make informed decisions.

2.4.4 Testing hypotheses:

They can be used to test the validity of different hypotheses and theories.

14
Chapter 3
Sampling Methods

3.1 What is sampling? Why is sampling needed?

 Sampling means how we select samples carefully from the population for
research/study.

 Practically it is not possible to study the entire population. To study the entire
population, we need more resources. It takes more time and cost and it is practically
impossible to handle, manage, and analyze large amounts of data.

 To draw valid conclusions from research, samples should be carefully selected so that
they represent the whole population.

 The researcher collects samples from the population, analyzes the data using different
statistical tools, and concludes the results. He/she then generalizes the results to the
entire population.

3.1.1 What is an important consideration while choosing a sample?


A sample must be as representative of the population as possible; this allows you to generalize
your findings to the population.

3.2 Type of Sampling /Classification


Sampling can be classified into two broad categories and further into sub-categories.

 Probability-Sampling
 Non- Probability-Sampling

Probability Sampling:
 Simple Random Sampling
 Stratified Sampling
 Systematic Sampling
 Cluster Sampling

Non-Probability Sampling:
 Convenience Sampling
 Judgement Sampling
 Quota Sampling
 Snowball Sampling

15
3.3 Probability Sampling
Probability sampling refers to methods in which all subjects in the target population have an
equal chance of being selected in the sample.
3.3.1 Simple Random Sampling:
In this technique, every member of the population has an equal chance of being selected,
purely on a random basis.
For example: Picking chits from a bowl and a lottery system are methods of random
sampling.

Figure 3.1 Random Sampling

First, define the population. Suppose the population size is 1,000 and the required sample size
is 100; the 100 respondents must then be selected randomly.
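
A minimal Python sketch of this selection, using the standard library's random.sample (the population IDs are hypothetical):

```python
# Simple random sampling: 100 respondents from a population of 1,000, equal chance.
import random

population = list(range(1, 1001))        # IDs of the 1,000 population members
sample = random.sample(population, 100)  # 100 members drawn at random, without replacement
print(len(sample), sample[:5])
```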
3.3.2 Stratified Sampling
In this, the population is divided into mutually exclusive groups and every member of the group
has an equal chance of being selected for research.

Figure 3.2 Stratified Sampling

16
Grouping may be based on age, gender, employment, etc. From every group, the researcher
selects the respondent purely on a random basis.
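
A minimal Python sketch of stratified selection (the strata names and member lists are hypothetical), drawing the same fraction at random from every stratum:

```python
# Stratified sampling: random selection within each homogeneous group (stratum).
import random

strata = {
    "age_18_30": list(range(0, 400)),      # assumed stratum members
    "age_31_50": list(range(400, 800)),
    "age_51_plus": list(range(800, 1000)),
}

def stratified_sample(strata, fraction=0.1):
    """Select the same fraction from every stratum, purely at random."""
    sample = []
    for name, members in strata.items():
        k = max(1, int(len(members) * fraction))
        sample.extend(random.sample(members, k))
    return sample

print(len(stratified_sample(strata)))      # roughly 10% of the population
```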
3.3.3 Systematic Sampling
Systematic sampling is a probability sampling method where researchers select members
of the population at regular intervals.
For example, selecting every 3rd person on a list of the population, provided the list is in
random order.
Sampling interval = total population / sample size
If the total population is p and the sample size is n, then the sampling interval k = p/n.
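
A minimal Python sketch of systematic selection using this interval (the population and sample sizes are the illustrative values used above):

```python
# Systematic sampling: every k-th member from a randomly ordered list, k = p / n.
import random

p, n = 1000, 100
k = p // n                            # sampling interval
population = list(range(1, p + 1))
random.shuffle(population)            # the list should be in random order
start = random.randrange(k)           # random starting point within the first interval
sample = population[start::k]         # every k-th member after the start
print(k, len(sample))
```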

Figure 3.3 Systematic Sampling

3.3.4 Cluster sampling

 This technique is used when a large, geographically dispersed population is under
study. The population is divided into small groups called clusters.

 Cluster sampling is a survey sampling method wherein the population is divided


into clusters, from which researchers randomly select some to form the sample. This
approach falls under the broader category of probability sampling, making it a valuable
tool for examining extensive/large populations.

For example,

 A company wants to study the performance of the product in the country. The country
is divided into clusters (cities, towns, metropolitan, etc)

 In government schemes, cluster sampling can be used to evaluate the effectiveness of


a program by studying a representative sample of the population. For example, a
government agency might use cluster sampling to assess the impact of a food
assistance program by randomly selecting several states and then sampling
households within those states.

17
Figure 3.4 Cluster Sampling

3.4 Difference between stratified sampling and cluster sampling?

 The key difference between stratified and cluster sampling lies in their approach to
grouping and sampling.
 In stratified sampling, samples are chosen randomly within each distinct, homogeneous
subgroup (stratum).
 Stratified sampling aims for homogeneous subgroups (strata), while cluster sampling
involves heterogeneous groups (clusters).
 In contrast, cluster sampling involves randomly selecting clusters from the population
and then sampling all members within those chosen clusters. This method is
particularly efficient for populations spread across various geographical locations.
Advantages of Stratified Sampling

 Increases precision and reduces sampling error, especially when there's high
variation within subgroups.
 Allows for separate analysis of subgroups.

Disadvantages:

 Can be more complex and time-consuming to implement than cluster sampling.


 Requires clear information about the population structure (strata).

18
Advantages of Cluster Sampling

 Cost-effective and efficient, especially for large, geographically dispersed


populations.
 Relatively easy to implement.

Disadvantages:

 This may lead to less precision and higher sampling error compared to stratified
sampling, especially if clusters are not representative of the population.
 Requires clear information about cluster boundaries.

3.5 Non-Probability Sampling

Non-probability sampling is a method in which not all population members have an equal
chance of participating in the study, unlike probability sampling.

3.5.1 Convenience Sampling

 Convenience sampling is also called grab sampling, availability sampling,


accidental sampling, etc.
 It is a type of sampling in which data is collected from the conveniently available
respondents.

3.5.2 Judgement Sampling


 It is also called selective sampling, subjective sampling, and authoritative sampling.
 It is a sampling technique in which the researcher selects the respondents based on
knowledge and judgment.
The researcher sets a criterion in his/her mind, e.g., collecting data only from respondents who
belong to a certain religion, state, age group, or qualification.
It is an easy and cost-effective sampling method, but it is vulnerable to sampling bias as it is
entirely dependent on the researcher's judgment.
3.5.3 Quota Sampling
 It is a sampling technique in which the entire population is divided into groups and then
a quota (number of items to be selected for research) is assigned to each group.
 Examples of groups: males, females, employed or unemployed people, age groups,
location, etc.
 Once the quota is assigned to each group, the sample is selected on convenience or
personal judgment.

19
3.5.4 Snowball sampling

 Just as a snowball rolling from top to bottom gets bigger and bigger, the sample grows
as the study proceeds.
 It is a sampling technique in which the researcher selects one or two respondents
first. These respondents refer to or identify other respondents.
 The researcher continuously selects respondents based on referral until the required
sample size is achieved.
 Snowball sampling is also called referral sampling, chain sampling, network
sampling, and friend-to-friend sampling.

3.6 Sampling Error

Assume you are a market researcher of a company looking to introduce a new product to the
market. You must collect data from a sample of potential customers as part of your research to
determine their preferences and purchasing behavior. But how can you be sure that the
information you get from your sample is accurate for all the people who might buy your
product? The idea of sampling error comes into play here.

3.6.1 Definition:

It is the difference between a value observed in the sample and the true value for the entire
population. It can significantly affect how accurate and reliable market research data is.

Or

A sampling error occurs when the sample used in the study does not represent the entire
population.

3.6.2 Steps to reduce Sampling errors

20
 Increase sample size

A larger sample size is more accurate because the sample comes closer to representing the
actual population.

 Divide the population into groups.

Test groups according to their size in the population instead of a random sample.

 Know your population

Study your population and understand its demographic mix (various characteristics of
a population, such as age, gender, ethnicity, income, education level, and other
socioeconomic factors).

21
Chapter 4
Data Processing and Data Analysis

Data Processing
Data processing refers to the steps we follow while performing data analysis. Data analysis is
a critical task and needs full attention. Data processing includes the following steps:
4.1 Data Collection
4.2 Data Cleaning
4.3 Data Analysis
4.4 Data Interpretation
4.5 Data Visualisation

4.1 Data Collection


Two types of data are used in research: qualitative data and quantitative data.
Some common sources of data collection are -
Primary sources: Case studies, Surveys, Experiments, Interviews, Questionnaires,
Observations, Focus groups
Secondary sources: Books, Magazines, Datasets, Journals etc

4.2 Data Cleaning


Data cleaning means removing duplicate records, anomalies, incorrectly formatted entries,
inconsistencies, and missing values. Data cleaning is mandatory before data analysis;
otherwise, the results will be inaccurate.
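
A minimal Python sketch of such cleaning with pandas (the small dataset is hypothetical): duplicates are dropped, badly formatted values are coerced, and rows with missing values are removed.

```python
# Data cleaning with pandas on a tiny hypothetical dataset.
import pandas as pd

raw = pd.DataFrame({
    "name": ["Asha", "Asha", "Ravi", "Meena"],
    "age": ["25", "25", None, "thirty"],     # inconsistent formats and a missing value
})

clean = raw.drop_duplicates().copy()                          # remove duplicate records
clean["age"] = pd.to_numeric(clean["age"], errors="coerce")   # bad formats become NaN
clean = clean.dropna(subset=["age"])                          # drop rows with missing age
print(clean)
```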

4.3 Data Analysis


Different statistical techniques, e.g., correlation and regression, along with qualitative
techniques such as narrative analysis, are used to study the relationships between variables.
Different data analysis software, e.g., Excel, Python, SPSS, MATLAB, etc., is used for data
analysis.

22
4.4 Data Interpretation:
After data analysis, the next step is data interpretation. Data interpretation is drawing
conclusions and inferences based on data analysis and generalizing research findings.
4.5 Data visualization:
Graphically represent the research findings using bar charts, graphs, line charts, plots, tables,
heat maps, etc. Visualization helps to understand key research findings and observe
relationships.

23
Chapter 5
Measure of Central Tendency and Variation

5.1 Measures of Central Tendency

 One of the most useful Statistics for researchers is the Center Point of the data.
Knowing the center point answers such questions as, “What is the average value?”

 The central tendency in statistics describes the central or typical value of a dataset.
 It's a measure used to Summarize a dataset with a single value that represents the
middle or center of the data distribution. The three main measures of central tendency
are the

 Mean
 Median
 Mode

All three provide insights into “the center” of a distribution of data points. These measures
of central tendency are defined differently because they each describe the data differently
and will often reflect a different number. Each of these statistics can be a good measure
of central tendency in certain situations.

Important point:
The choice of central tendency measure (Mean, Median, or Mode) depends on the shape of
the data distribution.
 The Mean is appropriate for normally distributed data (a normal distribution is a
statistical distribution where data points are spread symmetrically around the mean).

 The Median is better for skewed data (a skewed distribution is a statistical distribution
where data points are not spread symmetrically around the mean).

 The Mode is useful for categorical data.

5.1.1 Mean: The mean is the average of all the values in a dataset, calculated by summing
the values and dividing by the total number of values.

5.1.2 Median: The median is the middle value in a sorted (in a certain order) dataset. In
other words, half of the values are above the median, and half are below.

5.1.3. Mode: The mode is the value that appears most frequently in a dataset.
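
A minimal Python sketch computing all three measures with the standard library's statistics module (the scores are hypothetical):

```python
# Measures of central tendency for a small hypothetical set of exam scores.
import statistics

scores = [45, 50, 55, 55, 60, 65, 90]
print(statistics.mean(scores))    # average of all values (60)
print(statistics.median(scores))  # middle value of the sorted data (55)
print(statistics.mode(scores))    # most frequent value (55)
```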

24
5.2 Why is Central Tendency important in research?

5.2.1 Summarizing Data:

Central tendency measures provide a concise way to describe the typical value of a dataset,
making it easier to interpret and communicate research findings.

5.2.2 Comparing Groups:

Researchers can use these measures to compare the central tendencies of different groups in
their study, for example, comparing the average income of two different populations.

5.2.3 Statistical Analysis:

These measures are often used as inputs in more complex statistical analyses, such as t-tests
and ANOVA.

5.2.4 Understanding Data Distribution:

The choice of central tendency measure (mean, median, or mode) depends on the shape of the
data distribution. The mean is appropriate for normally distributed data, while the median is
better for skewed distributions, and the mode is useful for categorical data.

5.3 Measures of variation:

Variation describes the spread or dispersion of data points around that central value.

Common measures of variation include the

 Range
 Variance
 Standard deviation.

5.3.1 Range:
The difference between the highest and lowest values in a dataset.
5.3.2 Variance:
A measure of how much data values deviate from the mean.

5.3.3 Standard Deviation:

The square root of the variance; it also measures the spread of data points around the mean.
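
A minimal Python sketch computing these three measures of variation for the same hypothetical scores:

```python
# Measures of variation for a small hypothetical set of exam scores.
import statistics

scores = [45, 50, 55, 55, 60, 65, 90]
data_range = max(scores) - min(scores)       # highest minus lowest value
variance = statistics.variance(scores)       # sample variance (deviation from the mean)
std_dev = statistics.stdev(scores)           # square root of the variance
print(data_range, variance, round(std_dev, 2))
```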

25
5.4 Why is Variation important in research?

5.4.1 Understanding Data Spread:

 Variance provides a numerical measure of how much the data points deviate from
the average.
 A higher variance indicates greater variability, meaning the data is more dispersed,
while a lower variance suggests data points are clustered closer to the mean.

 This understanding of spread is essential for interpreting data correctly and


avoiding misinterpretations based solely on the mean. (imp)

5.4.2 Statistical Hypothesis Testing:

 Variance is the foundation of many statistical tests, including the Analysis of Variance
(ANOVA).

5.4.3 Model Evaluation and Prediction:

 Variance is used to evaluate how well a model fits the data.

26
Chapter 6
Hypothesis

6.1 What is Hypothesis


A hypothesis is a statement that we suppose, guess, or imagine in our research; during the
research we verify or test this statement.
Or
A hypothesis is a claim, expectation, or prediction of the researcher regarding the relationship
between the independent and dependent variables.

6.2 Independent variable and Dependent variable


The relationship between Independent and Dependent variables is fundamentally a cause-
and-effect relationship.
6.2.1 Independent variable: It is the factor that is responsible for making changes.
Or
The independent variable is the factor that is manipulated or changed by the researcher to
observe its effect on another variable.
First example: Smoking causes lung cancer (hypothesis).
Smoking is the cause that makes the changes and is responsible for lung cancer, which is the
effect. Smoking is therefore the independent variable (the cause).

6.2.2 The dependent variable: (the effect) is the outcome that is measured and expected to
be influenced by the independent variable.
In the above example: The dependent variable is Lung Cancer.
Generally Dependent variable is the Research problem under study. (In this case Lung
Cancer)
If our research supports or favours this hypothesis then we accept it otherwise reject it
(First example)
Second Example:
Regular exercise(independent variable) boosts our immunity and reduces the risk of heart-
related disease(dependent variable)

27
6.3 Characteristics of Good Hypothesis
 It should be testable which can be verified with less difficulty
 It should be logical (sensible, reasonable, and practical)
 It should be specific and clear
 It should be simple and understandable.
 It should be economical.
 It should be relevant (should be related to our Research problem)

6.4 Types of Hypothesis

6.4.1 Simple hypothesis

A simple Hypothesis predicts a linear relationship between the single dependent variable
and the single independent variable.
For Example: If you do meditation (Cause), you feel happy (Effect)

6.4.2 Complex hypothesis


A complex hypothesis predicts the relationship between two or more dependent variables and
two or more independent variables.
For example: If you do regular exercise/yoga/meditation, then you will feel
healthy/energetic/happy.
Smoking/drinking leads to lung cancer/liver disease.

6.4.3 Null hypothesis:


Shows/Predicts no relationship between the variables under study.
For Example, There is no significant change in health when I use green tea.
There is no relationship between good health and green tea.

28
6.4.4 Alternative hypothesis
 It is just the opposite of the null hypothesis. It shows a significant relationship between
the two variables.
 This hypothesis disproves the null hypothesis.
For example: When I asked my friends and relatives (while testing the hypothesis), they told
me that their health improved after taking green tea.

6.4.5 Directional hypothesis


 The directional hypothesis predicts the direction of the relationship between two
variables.
 It predicts the effect in a particular direction
 This direction can be positive or negative etc

For example: There is a positive relationship between advertisements and the sale of
products. This hypothesis predicts the relationship in the positive direction.

6.4.6 Non-Directional hypothesis


It Predicts the relationship between two variables but the direction is not specified.
For Example, There is a relationship between advertisement and the sale of the product.
This hypothesis predicts the relationship but the direction is not specified.

29
6.5 More Facts about the Null Hypothesis and Alternative Hypothesis
Null Hypothesis
 It states that things are as they normally are (no change, no effect).
 It is the case of 'equal to':
 Two things are assumed equal until we prove otherwise.
Examples of the 'equal to' case:
 The means of the two groups are equal.
 The intelligence levels of the two people are equal.
*Important point: The researcher always (in most cases) tests the null hypothesis.

Alternative Hypothesis
The alternative hypothesis is whatever is not the null; it goes against the null.
For example, a researcher wants to know whether the intelligence of two people is equal or
not.
 The null says they are the same; the alternative says they are not the same.
 Most researchers want to disprove or reject the null hypothesis, to check whether the
outcome is significant and effective.
 The hypotheses you see in research papers are alternative hypotheses. This is the
hypothesis the researcher actually wants to claim.

30
Chapter 7
Hypothesis Testing

Hypothesis testing can be classified into parametric tests (T-test, Z-test, F-test, ANOVA) and
non-parametric tests (Chi-square test), as discussed in the following sections.

31
7.1 T-Test

 It is a parametric test of hypothesis testing based on Student's t-distribution.
 It was developed by William Sealy Gosset.
 It essentially tests the significance of the difference of mean values when the sample
size is small (i.e., less than 30) and the population standard deviation is not available.
 It assumes:
 Population distribution is normal
 Samples are random & independent
 The sample size is small
 Population standard deviation is not known (important point)

A T-Test can be a

One-sample t-test: used to compare the sample mean with the population mean.

t = (x̄ − μ) / (s / √n)

where
x̄ is the sample mean
μ is the population mean
s is the sample standard deviation
n is the sample size

Two-sample t-test: used to compare the means of two different samples.

t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂)

where
x̄₁ and x̄₂ are the sample means of the first and second groups
s₁ and s₂ are the standard deviations of groups 1 and 2
n₁ and n₂ are the sample sizes of groups 1 and 2

If the value of the test statistic is greater than the table (critical) value: reject the null
hypothesis.
If the value of the test statistic is less than the table value: do not reject the null hypothesis.
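
A minimal Python sketch (hypothetical small samples) of both forms of the t-test using SciPy, which applies the formulas above and returns the p-value directly:

```python
# One-sample and two-sample t-tests with SciPy on hypothetical small samples (n < 30).
from scipy import stats

group1 = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3]
group2 = [11.2, 11.5, 11.0, 11.4, 11.3, 11.6]

t1, p1 = stats.ttest_1samp(group1, popmean=12.0)             # sample mean vs. population mean
t2, p2 = stats.ttest_ind(group1, group2, equal_var=False)    # unpooled (matches the formula above)
print(round(t1, 3), round(p1, 3))
print(round(t2, 3), round(p2, 3))
# If p < 0.05 (test statistic beyond the critical value), reject the null hypothesis.
```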

32
7.2 Z-Test

 It is a parametric test of hypothesis testing.
 It is used to determine whether the means are different when the population variance
is known and the sample size is large (greater than 30).
 It assumes:
 Population distribution is normal
 Samples are random & independent

A Z-Test can be a

One-sample Z-test: used to compare the sample mean with the population mean.

z = (x̄ − μ) / (σ / √n)

where
x̄ is the sample mean
μ is the population mean
σ is the population standard deviation
n is the sample size

Two-sample Z-test: used to compare the means of two different samples.

z = (x̄₁ − x̄₂) / √(σ₁²/n₁ + σ₂²/n₂)

where
x̄₁ and x̄₂ are the sample means of the first and second groups
σ₁ and σ₂ are the standard deviations of populations/groups 1 and 2
n₁ and n₂ are the sample sizes of groups 1 and 2

If the value of the test statistic is greater than the table (critical) value: reject the null
hypothesis.
If the value of the test statistic is less than the table value: do not reject the null hypothesis.
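
A minimal Python sketch (hypothetical values) of a one-sample Z-test computed directly from the formula above, with the two-tailed p-value taken from the standard normal distribution in SciPy:

```python
# One-sample Z-test from the formula; population standard deviation assumed known, n > 30.
import math
from scipy.stats import norm

sample_mean, pop_mean = 102.5, 100.0
pop_std, n = 15.0, 50

z = (sample_mean - pop_mean) / (pop_std / math.sqrt(n))
p_value = 2 * norm.sf(abs(z))             # two-tailed p-value
print(round(z, 3), round(p_value, 3))
# Reject the null hypothesis if |z| exceeds the table value (1.96 at the 5% level).
```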

33
7.3 Comparison between Z-test and T-test:

 The T-test is used when the sample size is small and the population variance is not
known.
 The Z-test is used when the sample size is large and the population variance is known.

However, in the following conditions the Z-test may also be used:

 The sample size is large and the population variance is not known.
 The sample size is small and the population variance is known.

34
7.4 F-Test

 It is a parametric test of hypothesis testing based on Snedecor's F-distribution.

 The F-test is named after its test statistic, F, which was named in honor of Sir Ronald
Fisher.
 It is the test for the null hypothesis that two normal populations have the same
variance.
 An F-test is regarded as a comparison of the equality of sample variances.
 The F-statistic is simply the ratio of two variances.
 The F-test is a very flexible test.

 It is calculated as:

F = s₁² / s₂²

where

sᵢ² = Σ(x − x̄)² / (nᵢ − 1)   (for i = 1 and 2)
 It can be used:
 Test the overall significance of the regression model
 To compare the fits of different models
 To test the equality of means
 It assumes:
 Population distribution is normal
 Samples are drawn randomly and independently

35
7.5 ANOVA

 ANOVA (Analysis of Variance) is a parametric test of hypothesis.

 ANOVA was developed by Sir Ronald Fisher and is also referred to as Fisher's ANOVA.
 It is an extension of the T-test and Z-test.
 It is used to test the significance of the differences in the mean values among more
than two sample groups.
 It uses the F-test to statistically test the equality of means & the relative variance
between them.
 It assumes
 Population distribution is normal
 Samples are drawn randomly and independently
 Homogeneity of sample variance
It can be classified as
One-way ANOVA and Two way ANOVA

F-statistic (ANOVA) = variance between the sample means / variance within the samples
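
A minimal Python sketch (hypothetical test scores for three teaching methods) of a one-way ANOVA using SciPy's f_oneway:

```python
# One-way ANOVA: do the mean scores of three teaching methods differ?
from scipy.stats import f_oneway

method_a = [78, 82, 85, 88, 80]
method_b = [72, 75, 70, 74, 73]
method_c = [90, 92, 88, 91, 89]

F, p = f_oneway(method_a, method_b, method_c)   # F = between-group / within-group variance
print(round(F, 2), round(p, 4))
# A small p-value means at least one group mean differs significantly from the others.
```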

36
7.6 Chi-Square Test

 The chi-square test is a non-parametric test of hypothesis.


 As a non-parametric test, Chi-Square can be used (i) as a test of goodness of fit
and
(ii) as a test of independence of two variables.
 It helps in assessing the goodness of fit between a set of observed values & those
expected theoretically.
 It makes a comparison between the expected frequencies and the observed
frequencies.
 The greater the difference, the greater is the value of the Chi-Square.
 If there is no difference (or the difference is not significant), then the value of Chi-
Square is zero (or nearly zero).
 It is also called the goodness-of-fit test. It determines whether a particular
distribution fits the observed data or not.
 Chi-square is also used to test the independence of two variables.
 It is calculated as

χ² (Chi-square statistic) = Σ (Oᵢ − Eᵢ)² / Eᵢ

where
Oᵢ is the observed frequency
Eᵢ is the expected frequency
 Conditions of Chi-square test
 Observations are collected and recorded on a random basis.
 All the items in the sample must be independent.
 No group should contain very few items (i.e less than 10)
 The overall number of items should be considerably large(at least 50). The
number of groups may be small.
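
A minimal Python sketch (hypothetical counts) showing both uses of the Chi-square test with SciPy: goodness of fit and independence of two categorical variables:

```python
# Chi-square test: (i) goodness of fit, (ii) independence of two variables.
from scipy.stats import chisquare, chi2_contingency

# (i) Goodness of fit: is a die fair? Observed frequencies over 60 rolls.
observed = [8, 12, 9, 11, 10, 10]
chi2, p = chisquare(observed)                 # expected defaults to equal frequencies
print(round(chi2, 2), round(p, 3))

# (ii) Independence: smoking habit vs. lung-cancer diagnosis (assumed counts).
table = [[30, 70],      # smokers: diagnosed, not diagnosed
         [10, 90]]      # non-smokers: diagnosed, not diagnosed
chi2, p, dof, expected = chi2_contingency(table)
print(round(chi2, 2), round(p, 4), dof)
```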

37
7.7 When to use ANOVA?

You have a continuous dependent variable (e.g., height, weight, test scores) and one or more
categorical independent variables (e.g., treatment groups, age groups, education levels).

Use ANOVA when

 You want to compare means across two or more groups.


 You are interested in assessing the statistical significance of group
differences.
 You want to determine if the variation within groups is greater than the
variation between groups.

For example, you might use ANOVA to analyze the effects of different teaching methods
(independent variable) on student test scores (dependent variable) across multiple classes.

7.8 When to use Chi-Square?

Use Chi-Square when:

1. You have categorical variables (e.g., gender, occupation, preferences) and want to
test for associations or independence between them.
2. You want to determine if the observed frequencies of categories differ significantly
from the expected frequencies.
3. You are interested in analyzing the relationship between two or more categorical
variables.

For Example, You might use Chi-Square to examine whether there is an association between
smoking habits (variable 1: smoker, non-smoker) and the incidence of lung cancer (variable 2:
diagnosed with lung cancer, not diagnosed) in a population.

38
Chapter 8
Correlation and Regression

Correlation and regression analysis are statistical techniques used in research to examine the
relationships among variables. Correlation measures the strength and direction of a
relationship between variables, while regression analyzes how one variable changes with
respect to another in order to make predictions. (Regression is a kind of curve fitting/modeling.)

8.1 Correlation Analysis:

 Determines the degree of association between two or more variables.


 A correlation coefficient (e.g., Pearson's r) quantifies the strength and direction of the
linear relationship, ranging from -1 to +1.
 A positive correlation indicates that as one variable increases, the other tends to increase
as well. A negative correlation indicates that as one variable increases, the other tends
to decrease.

For Example: Measuring the correlation between hours studied and exam scores.
There is a correlation between height and weight.

Correlation Types:

8.1.1 Positive Correlation: When one variable increases, the other tends to increase as well,
or vice versa.

For Example: The size of a child's clothing changes as they grow.

8.1.2 Negative Correlation: When one variable increases, the other tends to decrease, and vice
versa.

For Example: If a car decreases its speed, the time it takes to reach a destination increases.

8.1.3 No Correlation: There is no significant relationship between the variables.

For Example: The number of people with SAMSUNG mobile and global warming.
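
A minimal Python sketch (hypothetical data) of correlation analysis, computing Pearson's r for hours studied versus exam scores with SciPy:

```python
# Correlation analysis: Pearson's r for hours studied vs. exam scores.
from scipy.stats import pearsonr

hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 64, 70, 74]

r, p = pearsonr(hours, scores)
print(round(r, 3), round(p, 4))   # r close to +1 indicates a strong positive correlation
```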

8.2 Regression Analysis:

 Models the relationship between variables to make predictions and understand


how one variable influences another.
 Regression analysis produces an equation that describes how the dependent
variable changes based on the independent variable(s).

39
Regression Types:

8.2.1 Simple Linear Regression: Uses one independent variable to predict a dependent
variable.

8.2.2 Multiple Regression: Uses two or more independent variables to predict a dependent
variable.

For Example:
Predicting sales revenue based on advertising expenditure. (Simple Linear Regression)
Predicting Rain using humidity and temperature. (Multiple Regression)
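
A minimal Python sketch (hypothetical data) of simple linear regression with SciPy's linregress, predicting sales revenue from advertising expenditure:

```python
# Simple linear regression: sales revenue predicted from advertising expenditure.
from scipy.stats import linregress

advertising = [10, 20, 30, 40, 50]     # expenditure (assumed units)
sales = [25, 41, 62, 78, 101]          # revenue (assumed units)

result = linregress(advertising, sales)
print(round(result.slope, 2), round(result.intercept, 2), round(result.rvalue, 3))

predicted = result.intercept + result.slope * 60   # predict sales for expenditure = 60
print(round(predicted, 1))
```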

8.3 Key Differences:

Feature | Correlation | Regression
Primary Goal | Measures association | Predicts and models relationships
Relationship Type | Quantifies the strength of a linear relationship | Provides a mathematical
equation to describe the relationship
Interchangeability | X (first variable) and Y (second variable) are interchangeable | X and Y
are not interchangeable

In Summary:

 Correlation is a measure of how strongly two variables are related


 Regression is a tool for modeling and predicting how one variable affects another.

40
Chapter 9
Error Estimation

 Error estimation in research methodology involves assessing the uncertainty and


inaccuracies in research results.
 This process helps researchers understand the reliability and validity of their
findings, ensuring that conclusions are based on accurate and reliable data.

9.1 Types of Errors:

Random Error: Occurs due to chance fluctuations and can be minimized by increasing
sample size and improving measurement techniques.

Systematic Error: Results from a consistent bias in the measurement process and can be
addressed by calibrating instruments and refining experimental techniques.

Measurement Error: Includes all errors associated with the process of obtaining data,
such as errors in instrument calibration, human oversight, and data recording.

Sampling Error: This occurs when a sample is used to estimate a population parameter, and
the sample may not perfectly represent the population.

Non-sampling Error: Includes errors that occur outside of the sampling process, such as
response bias, non-response bias, and data processing errors.

9.2 Error Estimation Techniques:


Standard Error: A measure of the variability of a sample statistic, such as the sample mean,
used to estimate the accuracy of the estimate.

Confidence Intervals: A range of values within which the true population parameter is likely
to fall, based on a specific level of confidence.
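
A minimal Python sketch (hypothetical sample) computing the standard error of the mean and a 95% confidence interval based on the t-distribution:

```python
# Error estimation: standard error of the mean and a 95% confidence interval.
import math
import statistics
from scipy.stats import t

sample = [9.8, 10.2, 10.1, 9.9, 10.4, 10.0, 10.3, 9.7]
n = len(sample)
mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(n)       # standard error of the mean

t_crit = t.ppf(0.975, df=n - 1)                    # two-tailed 95% critical value
ci = (mean - t_crit * se, mean + t_crit * se)      # range likely to contain the true mean
print(round(mean, 3), round(se, 3), tuple(round(x, 3) for x in ci))
```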

Error Analysis: A systematic approach to identifying and quantifying errors in a


measurement or calculation.

Statistical Tests: Methods like t-tests and ANOVA can be used to assess the significance of
differences between groups or to test hypotheses.

9.3 Importance of Error Estimation:


Validity of Results: Ensures that research findings are based on accurate and reliable data,
reducing the risk of drawing incorrect conclusions.

Decision-Making: Helps researchers make informed decisions about the reliability and
validity of their findings, allowing for more confident interpretations.

Reproducibility: Allows other researchers to assess the reliability of the findings and replicate
the study with greater confidence.

41
In summary:

Error estimation is a crucial part of research methodology that helps researchers understand the
potential limitations of their findings and make more informed decisions about the
interpretation and validity of their results.

42
