
Block 3

The document outlines data processing steps including editing, coding, classification, and tabulation, emphasizing the importance of organizing data for analysis. It also discusses statistical tests, differentiating between parametric and nonparametric tests, and introduces one-sample and two-sample tests for comparing data. Additionally, it covers multivariate analysis, including regression analysis and its applications in understanding relationships between variables.

Uploaded by

POOJA DHAKANE
Copyright
© All Rights Reserved

UNIT 8 DATA PROCESSING

Short and Simple Version:

After collecting data, it needs to be processed and analyzed as planned.

Processing includes:

• Editing (checking for errors)

• Coding (giving numbers or symbols)

• Classifying (grouping similar data)

• Tabulating (putting data into tables)

Data can be shown in tables or charts for better understanding.

Short and Simple Version:

Editing of data means checking the collected data for mistakes or missing info, and fixing them to
make the data complete and correct.

There are two types of editing:

1. Field Editing – Done soon after the interview. The investigator checks and completes any
unclear or short notes they made during the interview.

2. Central Editing – Done after all forms are collected. A trained editor (or team) checks all
forms for:

o Wrong entries

o Missing or unclear answers

o Consistency and correctness

If needed, the editor may contact the respondent again. All changes should be marked clearly and
signed.

Short and Simple Version:

Coding of data means giving numbers or symbols to answers so they can be grouped and analyzed
easily.

• Codes must be clear, cover all possible answers (exhaustive), and not overlap (mutually
exclusive).

• Each group should represent only one idea.

• Coding helps in quick computer analysis.

• It's best to plan coding while making the questionnaire.

• Mistakes in coding should be avoided or kept very low.
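As a minimal sketch, coding can be done with a lookup table that maps each verbal answer to a numeric code; the codebook and responses below are illustrative assumptions, not taken from the text:

```python
# Minimal coding sketch: map verbal answers to numeric codes.
# The codebook is a hypothetical example; its categories are
# exhaustive and mutually exclusive, as the guidelines require.
codebook = {"yes": 1, "no": 2, "don't know": 9}

responses = ["yes", "no", "yes", "don't know", "no"]
coded = [codebook[r] for r in responses]
print(coded)  # [1, 2, 1, 9, 2]
```

Planning such a codebook while designing the questionnaire avoids recoding work later.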

Short and Simple Version:

Classification of data means grouping data into categories for easy analysis.
• It reduces large data into meaningful groups.

• Two main types:

1. By attributes (qualitative): like gender, education, caste, etc.

▪ Simple: One trait (e.g., male/female).

▪ Manifold: Multiple traits (e.g., industry type → size → profit/loss).

2. By numerical values (quantitative): like income, height, marks.

▪ Grouped into class intervals (e.g., income ₹1001–₹1500).

▪ Each group has a frequency, upper & lower limits, and class size.

This helps compare data and draw conclusions.
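The quantitative grouping described above can be sketched in a few lines of Python; the incomes and class limits are illustrative assumptions:

```python
# Sketch: classify incomes (in rupees) into class intervals and count
# the frequency in each class. Both class limits are inclusive here.
incomes = [1100, 1450, 1800, 1200, 2100, 1650, 1300]
intervals = [(1001, 1500), (1501, 2000), (2001, 2500)]  # (lower, upper) limits

frequency = {}
for lo, hi in intervals:
    frequency[(lo, hi)] = sum(1 for x in incomes if lo <= x <= hi)

for (lo, hi), f in frequency.items():
    print(f"₹{lo}–₹{hi}: {f}")
```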

Short and Simple Notes:

Statistical Series

• A series is an ordered arrangement of data (ascending or descending).

• It helps organize and understand data better.

Types of Series:

1. Time Series – Data over time (e.g., sales by year).

2. Spatial (Geographical) Series – Data by region or place.

3. Condition Series – Based on physical traits (e.g., height, age).

Forms of Series:

1. Individual Series – Single values listed one by one.

2. Discrete Series – Data in separate groups with specific values (e.g., marks with number of
students).

3. Continuous Series – Data in ranges without breaks (e.g., income ₹1000–2000, ₹2001–3000,
etc.).

These series help in analysis and comparison of data.

Short and Simple Notes:

Tables as Data Presentation Tools

• Tables help to organize and summarize data clearly.

• They make it easier to compare, analyze, and understand the information.

Types of Tables:

• Simple Table – Shows one characteristic.

• Complex Table – Shows two or more related characteristics (two-way, three-way, etc.).

Key Features of a Good Table:


1. Clear title above the table.

2. Unique table number for easy reference.

3. Clear column headings (captions) and row headings (stubs).

4. Mention units of measurement.

5. Show source of data.

6. Add footnotes if needed for explanation.

7. Number the columns if useful.

8. Use abbreviations only when necessary.

9. Keep the table simple, clear, and accurate.

10. Arrange data logically (by time, region, alphabet, or size).

11. Make sure the table matches the study’s purpose.

Short and Simple Notes:

Graphical Presentation of Data

Graphs help us to show data visually for better understanding and comparison.

Common Types of Graphs:

1. Bar Chart – Uses bars to compare quantities.

2. Two-Dimensional Diagrams –

o Rectangular Diagrams: Use rectangles; height shows value.

o Squares: Used when data values differ a lot. Size of square is based on square root of
value.

3. Pictograms – Use pictures or symbols to show data (e.g., a bike to show bike production).

4. Pie Chart – Circle divided into parts. Each part shows a percentage of the total.

5. Line Chart (Arithmetic Chart) – A line connects points over time to show trends or changes.

Each type of graph helps make data clear, simple, and easy to compare.
UNIT 9 STATISTICAL ANALYSIS AND INTERPRETATION OF DATA: NONPARAMETRIC TESTS


Parametric vs. Nonparametric Tests

Parametric Tests

• Use population parameters (mean, standard deviation).

• Assume the population follows a known distribution (e.g., normal).

• Example: t-test (compares means of two samples).

• More powerful if assumptions are valid.

Nonparametric Tests (Distribution-Free Tests)

• Do not assume a specific population distribution.

• Useful when:

o Data is ranked, scaled, or rated.

o Nominal data is involved (e.g., "North", "South").

o Small samples or missing data exist.

• Easier, with fewer arithmetic steps.

• Less powerful than parametric tests, with a higher chance of a Type II error (failing to reject a false null hypothesis).

When to Use Nonparametric Tests

1. Consumer surveys with extreme responses (e.g., like/dislike, not normal).

2. Nominal data, such as preferred job location.

3. Partially filled questionnaires – still usable.

4. Small sample sizes – gives useful results.

Key Point:

Use parametric tests when assumptions are valid.


Use nonparametric tests when data is limited, non-normal, or qualitative.

One Sample Tests

These tests are used when we want to compare one sample to a known or expected
value/distribution.

Used to Answer Questions Like:

1. Is there a significant difference between observed and expected values?

2. Was the sample likely drawn from a specific population?

3. Is the sample a random draw from a known population?

These questions are answered using Goodness of Fit tests.

Kolmogorov-Smirnov (K-S) One Sample Test

• A nonparametric test.

• Compares the distribution of sample data with a theoretical distribution (like normal,
uniform, etc.).

• Tests how well the sample matches the expected distribution.

• Works best with ordinal data.

Key Point:

If the sample data closely follows the theoretical distribution, we fail to reject the hypothesis that it
comes from that population.
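A minimal sketch of the K-S statistic D = max |Fn(x) − F(x)|, here comparing an assumed sample against a uniform(0, 1) theoretical distribution (the data are invented for demonstration):

```python
# Sketch of the K-S one-sample statistic D = max |Fn(x) - F(x)|.
# Theoretical distribution: uniform on (0, 1); sample data are assumed.
sample = sorted([0.10, 0.25, 0.40, 0.55, 0.70, 0.85])
n = len(sample)

def uniform_cdf(x):
    return min(max(x, 0.0), 1.0)  # F(x) for uniform(0, 1)

# The empirical CDF jumps at each data point, so check both sides of the jump.
d = max(
    max(abs((i + 1) / n - uniform_cdf(x)), abs(i / n - uniform_cdf(x)))
    for i, x in enumerate(sample)
)
print(round(d, 2))  # a large D means the sample deviates from the theoretical CDF
```

If D is below the tabulated critical value, we fail to reject the hypothesis that the sample comes from that distribution.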



Chi-Square (χ²) Test of Goodness of Fit

Used When:

• Responses fall into two or more categories, like:

o Like / Dislike

o Favor / Not Favor

o Yes / No

o Male / Female

• You want to check if the observed frequencies differ significantly from the expected
frequencies.
Purpose:

To test if there is a significant difference between:

• What we actually observe (from survey or experiment), and

• What we expected (according to some hypothesis).

Formula for χ² Statistic:

χ² = Σ (O − E)² / E

Where:

• O = Observed frequency

• E = Expected frequency

• Σ = Summation over all categories

Key Point:

A large value of χ² suggests a large difference between observed and expected frequencies → may
reject the null hypothesis.
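The χ² statistic can be computed directly from the formula; the like/dislike counts below are assumed for illustration:

```python
# Sketch: chi-square goodness-of-fit for two categories (counts assumed).
observed = [44, 56]   # e.g., Like / Dislike responses from a survey
expected = [50, 50]   # equal split expected under the null hypothesis

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_sq, 2))  # 1.44
```

The computed value is then compared with the χ² critical value at the appropriate degrees of freedom.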



TWO-SAMPLE TESTS

Used When:

You want to compare two groups or samples to determine if there is a significant difference
between them.

Common Questions:

• Are the means of two populations different?

• Are the proportions of two groups different?

Types of Two-Sample Tests:

1. Parametric Test (Assumes normal distribution)

• t-test for two independent samples


o Used when comparing means of two unrelated groups.

o Example: Compare average scores of students from two different colleges.

• t-test for two related samples (paired t-test)

o Used when the same group is tested before and after a treatment.

o Example: Blood pressure before and after taking a drug.

2. Nonparametric Test (No assumption of normal distribution)

• Mann-Whitney U Test (for independent samples)

• Wilcoxon Signed-Rank Test (for paired or related samples)

Assumptions for Parametric Two-Sample t-test:

• Data is approximately normally distributed.

• The two samples are independent.

• Variances are equal (for equal variance t-test).

Formula for Independent Two-Sample t-Test:

If variances are equal:

t = (X̄₁ − X̄₂) / √[Sp² (1/n₁ + 1/n₂)]

Where:

• X̄₁, X̄₂ = sample means

• Sp² = pooled variance

• n₁, n₂ = sample sizes
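A small Python sketch of the pooled-variance t statistic, using the formula above with assumed sample data:

```python
# Sketch of the pooled-variance two-sample t statistic (data assumed).
from math import sqrt
from statistics import mean, variance

x1 = [72, 75, 78, 80, 74]   # e.g., test scores, college A
x2 = [68, 70, 73, 71, 69]   # e.g., test scores, college B
n1, n2 = len(x1), len(x2)

# Pooled variance Sp^2 weights each sample variance by its degrees of freedom.
sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
t = (mean(x1) - mean(x2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
print(round(t, 3))  # compare with the t critical value at n1 + n2 - 2 df
```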



K-SAMPLE TESTS

These tests are used when comparing more than two samples to determine whether they come
from the same population.
When to Use:

• Comparing performance across multiple machines, branches, products, etc.

• Example: Testing if 3 fertilizers give the same crop yield.

Median Test (Extension of 2-sample test)

Steps:

1. Pool all sample data.

2. Find the combined median.

3. Create a 2 × k table (above vs. below median for each sample).

4. Use the Chi-square test (χ²) with (k − 1) degrees of freedom.

Decision Rule:

• If p-value < α (significance level), reject the null hypothesis (i.e., not all samples are from the same population).
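The median-test steps above can be sketched end-to-end in Python; all sample values are assumed for illustration:

```python
# Sketch of the k-sample median test (k = 3; all values assumed).
from statistics import median

samples = {
    "A": [12, 15, 11, 18, 14],
    "B": [22, 19, 25, 17, 21],
    "C": [16, 13, 20, 24, 10],
}
pooled = [v for vals in samples.values() for v in vals]
m = median(pooled)                                   # step 2: combined median

# Step 3: 2 x k table (above vs. at-or-below the median, per sample).
above = [sum(v > m for v in vals) for vals in samples.values()]
below = [sum(v <= m for v in vals) for vals in samples.values()]

# Step 4: chi-square statistic over the table, (k - 1) degrees of freedom.
n = len(pooled)
chi_sq = 0.0
for j, vals in enumerate(samples.values()):
    for row, row_total in ((above, sum(above)), (below, sum(below))):
        e = row_total * len(vals) / n                # expected cell count
        chi_sq += (row[j] - e) ** 2 / e
print(round(chi_sq, 2))  # compare with the chi-square critical value, 2 df
```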

Kruskal-Wallis Test (Non-parametric equivalent of ANOVA)

Steps:

1. Pool and rank all values (lowest gets rank 1).

2. Compute the sum of ranks rᵢ for each group.

3. Use the formula:

H = [12 / (n(n + 1))] Σ (rᵢ² / nᵢ) − 3(n + 1)

Where:

• n = total number of observations

• nᵢ = number of observations in sample i

• rᵢ = sum of ranks in sample i

4. H follows a Chi-square distribution with (k − 1) degrees of freedom.

Decision Rule:

• If H (calculated) > χ² (critical), reject the null hypothesis.

• Otherwise, accept that all samples come from the same population.

Advantages:
• Useful when data is ordinal or not normally distributed.

• Works well even with small samples.
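The H statistic can be computed directly from the formula; the data below are assumed and tie-free (real data with ties would need average ranks, which this sketch omits):

```python
# Sketch of the Kruskal-Wallis H statistic (assumed, tie-free data).
samples = [[68, 72, 77, 42, 53],
           [60, 67, 73, 81, 75],
           [50, 58, 74, 61, 82]]

pooled = sorted(v for s in samples for v in s)
rank = {v: i + 1 for i, v in enumerate(pooled)}      # lowest value -> rank 1

n = len(pooled)
h = 12 / (n * (n + 1)) * sum(
    sum(rank[v] for v in s) ** 2 / len(s) for s in samples
) - 3 * (n + 1)
print(round(h, 2))  # compare with the chi-square critical value at k - 1 df
```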


UNIT 10 MULTIVARIATE ANALYSIS OF DATA

Regression Analysis

Purpose

• Understand relationship between a dependent variable (Y) and independent variable(s) (X).

• Predict future outcomes.

• Determine which variables significantly explain variation in Y.

Simple Linear Regression

• Model:

Y = β₀ + β₁X + ε

o Y = dependent variable

o X = independent variable

o ε = random error

• Assumptions:

1. Linear relationship.

2. Y is normally distributed.

3. X is fixed (non-random).

4. Errors (ε) are independent with mean 0 and constant variance.

• Estimated Equation:

Ŷ = a + bX

o Coefficients a and b found via least squares method.

• Validation:

o Use F-test or t-test for hypothesis testing:

▪ H₀: β = 0 → No linear relationship

▪ H₁: β ≠ 0 → Significant linear relationship

• Strength of Association: Measured by R² (Coefficient of Determination)

o R² = proportion of variation in Y explained by X.

o Example: R² = 0.723 → 72.3% variation explained → strong relationship.
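The least-squares estimates and R² can be sketched with the standard library; the x, y data here are illustrative assumptions:

```python
# Sketch: least-squares fit of Y-hat = a + bX and R^2 (data assumed).
from statistics import mean

x = [1, 2, 3, 4, 5]
y = [2.1, 4.3, 5.9, 8.2, 9.8]
xm, ym = mean(x), mean(y)

# Least-squares slope and intercept.
b = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / sum(
    (xi - xm) ** 2 for xi in x
)
a = ym - b * xm

# R^2 = 1 - SSE/SST: share of Y's variation explained by X.
y_hat = [a + b * xi for xi in x]
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
sst = sum((yi - ym) ** 2 for yi in y)
r2 = 1 - sse / sst
print(round(a, 2), round(b, 2), round(r2, 3))
```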

Simple Linear Trend (for Time Series)

• Model sales or data over time:

Y = a + bt
• Can also be modeled in semi-log or double log form depending on nature of data.

Multiple Linear Regression

Model:

Y = a + bX₁ + cX₂ + ... + kXₖ

• Each partial regression coefficient (e.g., b or c) shows the effect of its variable while holding
others constant.

Strength of Association:

• R² (Multiple Coefficient of Determination): % of Y's variation explained by all predictors combined.

• R (Multiple Correlation Coefficient): √R²; measures the overall strength of the association.

Hypothesis Testing (F-test via ANOVA):

• Test overall significance of the model.

• H₀: All β’s = 0 (no relationship)

• H₁: At least one β ≠ 0 (significant predictors)

Points to Remember in Multiple Regression:

1. Validate statistically using R² and ANOVA.

2. For forecasting, independent variables must be forecasted first.

3. Too many variables? → Use Stepwise Regression:

o Add/remove variables one by one.

o Stop when R² increases only marginally.

4. Multicollinearity:

o When X’s are highly correlated with each other.

o Consequences: High standard errors, unreliable coefficients.

o Remedies:

▪ Drop some variables.

▪ Use Principal Component Analysis or transformation.

Example Summary

• Regression model:
Ŷ = 0.247 + 0.493X₁ + 0.484X₂

• R² = 0.78 → 78% of Y’s variation explained.

• F-test validates model at 95% confidence.

• Model suitable for prediction and inference.
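One way to carry out the overall F-test from R² alone uses F = (R²/k) / ((1 − R²)/(n − k − 1)), where k is the number of predictors. R² = 0.78 and k = 2 come from the example; the sample size n = 20 below is an assumption for illustration, since the source does not state n:

```python
# Sketch: overall F statistic for a multiple regression computed from R^2.
# r2 and k follow the example; n = 20 observations is an assumed value.
r2, k, n = 0.78, 2, 20
f = (r2 / k) / ((1 - r2) / (n - k - 1))
print(round(f, 2))  # compare with the F critical value at (k, n - k - 1) df
```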



Discriminant Analysis: Summary

Definition:
Discriminant Analysis is a classification technique used to assign observations into predefined groups
based on predictor variables. These groups must be:

• Mutually exclusive (no overlap),

• Collectively exhaustive (every observation must belong to one group).

It is commonly used when:

• You want to classify entities (e.g., customers or employees) into categories like successful vs.
unsuccessful, or owner vs. non-owner based on measured characteristics.

Objectives of Two-Group Discriminant Analysis:

1. Group Separation:

o Construct linear combinations (discriminant functions) of the predictor variables to maximize the separation between groups.

o This is done by maximizing between-group variance relative to within-group variance.

2. Classification of New Cases:

o Use the derived discriminant function to assign new individuals to one of the known
groups based on their predictor values.

3. Significance Testing:
o Test if there are statistically significant differences in the mean profiles (average
predictor values) between the groups.

4. Variable Importance:

o Identify which predictor variables contribute most to distinguishing between the groups.



Factor Analysis: Summary

Definition:
Factor analysis is a data reduction and summarization technique used to condense a large set of
interrelated variables into a smaller number of factors, while retaining as much original information
as possible.

Key Characteristics:

1. Purpose:

o Identify a smaller set of underlying factors that explain the observed correlations
among the variables.

o Focus is on describing data rather than predicting or inferring.

2. Whole-variable approach:

o Unlike regression or discriminant analysis, factor analysis does not split variables into
dependent and independent sets.

o It treats all variables simultaneously.

3. Linear Modeling:

o Factors are extracted as linear combinations of the original variables.

4. Exploratory Nature:

o It's a search technique with no strict prior assumptions about the number of factors.

o Stopping rules (like eigenvalues >1, scree plot analysis) are often ad hoc or
subjective.

5. Computational Requirement:

o Due to its complexity, factor analysis is typically performed using statistical software
(e.g., SPSS, SAS, BMD).
Process Overview:

1. Initial Extraction (e.g., Principal Component Analysis):

o Extracts factors by maximizing explained variance.

o Ensures the factors are uncorrelated (orthogonal).

2. Factor Loadings:

o Coefficients that indicate how strongly each variable correlates with each factor.

o Zero loading implies the variable doesn’t contribute to that factor.

3. Variance Explained:

o Each factor accounts for a portion of the total variance.

o In the example, three factors explain about 95% of the total variance.

4. Communality:

o Indicates how much of a variable’s variance is explained by all the factors combined.

o High communality (e.g., >85%) means factors are good representations of the
original variables.

5. Factor Rotation (e.g., Varimax):

o Optional step to simplify interpretation by making loadings more distinct.

o Not covered in detail, but available in tools like SAS.

Illustrative Example:

Objective: Identify factors that define a successful salesperson.


Data: 7 variables for 14 salespeople (e.g., age, education, height, IQ, etc.)

Outcome:

• 3 factors emerged:

1. Maturity (age, children, household size)

2. Physical size (height, weight)

3. Intelligence/training (education, IQ)

Subjective Decisions:

Before conducting factor analysis, two critical subjective questions must be addressed:

1. How many factors to extract? (Using cut-offs like eigenvalue >1 or scree plot)

2. How to label the extracted factors? (Relies on intuition and domain knowledge)

UNIT 11 ETHICS IN RESEARCH


Introduction to Research Ethics

Definition:
Ethics are moral standards that guide how individuals behave and make decisions—particularly in
relation to others. In research, ethics aim to ensure that no individual is harmed during the process
or as a result of the findings.

Purpose of Research Ethics:

• Protect participants from harm, exploitation, or misrepresentation.

• Ensure integrity, honesty, and accountability throughout the research process.

• Promote responsible conduct by researchers, project managers, and sponsors.

Common Ethical Violations in Research:

• Breach of confidentiality or non-disclosure agreements

• Misrepresentation or manipulation of results

• Deceptive practices toward participants

• Falsified invoicing or financial misconduct

• Plagiarism or unoriginal work

Key Elements of Ethical Research:

1. Anticipation of ethical issues:


Ethical concerns should be considered during the planning stage, not after problems arise.

2. Institutional Oversight:

o Research is often reviewed by a Research Ethics Committee or an Institutional Review Board (IRB).

o These bodies ensure that the study is ethically sound, respects human dignity, and is
free from plagiarism.
3. Responsibility Across Roles:
All parties—researchers, managers, and sponsors—must commit to ethical principles.

4. Human Subject Protection:


Studies involving human participants raise complex ethical, legal, and social issues that
require thoughtful guidelines and protections.



Principles of Research Ethics

1. Honesty

o Be truthful with participants, stakeholders, and in presenting methods and results.

2. Integrity

o Maintain sincerity; keep promises and avoid giving false hope.

3. Objectivity

o Avoid bias in all parts of the research process: design, analysis, and reporting.

4. Informed Consent

o Ensure participants voluntarily agree to participate, fully understanding their role, risks, and benefits.

5. Beneficence

o Maximize benefits and minimize harm for participants.

6. Protection of Subjects

o Ensure privacy, autonomy, and safety of all participants.

7. Responsible Publication

o Publish ethically: avoid duplication, fabrication, or plagiarism.

8. Confidentiality

o Safeguard private information including:

▪ Purpose and goals of the research

▪ Participation details

▪ Risks and benefits

▪ Rights to refuse or withdraw

▪ Anonymity methods
9. Non-discrimination

o Treat all participants equally, without bias based on age, gender, caste, religion,
ethnicity, etc.

10. Openness

o Be receptive to feedback, critiques, and alternate views.

11. Carefulness & Respect for Intellectual Property

o Avoid errors, give proper credit, and never plagiarize.

12. Justice

o Ensure fair treatment for all participants; no favoritism.



Advantages of Research Ethics

1. Promotes Research Goals

o Encourages clarity, truth, and knowledge advancement.

2. Builds Trust

o Enhances credibility between researcher and participant.

3. Protects Participants

o Safeguards their dignity, rights, and overall well-being.

4. Ensures Accountability

o Makes researchers responsible for their actions and decisions.

5. Fosters Moral Values

o Strengthens social and ethical norms.

6. Supports Study Objectives

o Encourages understanding, integrity, and the reduction of errors.

7. Encourages Collaboration

o Upholds values like trust, responsibility, respect, and objectivity in teamwork.

Limitations of Research Ethics

1. Psychological Risks
o Surveys or questions may unintentionally alter a participant’s behavior or emotional
state.

2. Social, Legal, and Economic Risks

o Accidental data leaks could lead to stigma, discrimination, or legal consequences for
respondents.

3. Discrimination Risk to Groups

o Certain ethnic or indigenous populations might face bias if research reveals sensitive
or high-risk information about them.



Steps Involved in Ethical Decision-Making in Research

1. Collect Facts & Acknowledge Intellectual Property

o Gather all relevant information and ensure transparency about the use of others’
ideas or work.

2. Outline the Moral Considerations

o Identify ethical principles or dilemmas involved (e.g., honesty, beneficence, justice).

3. Identify the Affected Stakeholders

o Determine who might be impacted (participants, researchers, society, institutions, etc.).

4. Establish the Forfeitures (Potential Losses)

o Assess possible risks, harms, or trade-offs involved in the research process.

5. Recognize Your Obligations

o Understand your ethical responsibilities to all involved parties.

6. Consider Your Style and Honesty

o Reflect on your personal integrity, approach, and ability to remain unbiased.

7. Innovate Ethically

o Think creatively about alternative courses of action that uphold ethical standards.

8. Respect Confidentiality and Privacy

o Ensure all personal data is protected and anonymity is maintained when necessary.

9. Determine the Most Ethical Action


o Choose the action that best aligns with ethical norms and be ready to justify it in the
face of criticism or differing opinions.



Types of Research Misconduct

1. Fabrication

o Definition: Making up data or results that were never actually collected or observed.

o Example: Inventing survey responses or lab results to support a hypothesis.

o Why it’s wrong: It misleads readers and invalidates the integrity of the research.

2. Falsification

o Definition: Manipulating research materials, equipment, or processes, or changing data to misrepresent findings.

o Example: Altering images, selectively omitting data, or modifying results to fit expectations.

o Impact: It distorts the research outcome and can harm public trust and scientific
advancement.

3. Plagiarism

o Definition: Using someone else’s work (text, images, ideas) without proper citation
or permission.

o How to avoid it: Always paraphrase in your own words and give clear credit to the
original source.

o Consequence: Legal action, academic penalties, and damage to reputation.

