QA IA-I Question Bank
2 Marks
1. Define “Statistics”. Explain Uses and Limitations of Statistics
ANS :
Definition of Statistics:
Statistics is the science of collecting, analyzing, interpreting, and presenting numerical data. It
helps in estimating probabilities and measuring social phenomena.
Uses of Statistics:
1. Planning & Economics: Essential for economic planning, demand analysis, price trends,
and economic development.
2. Business & Industry: Aids in production control, consumer analysis, and decision-
making, ensuring efficient resource utilization.
3. Mathematics & Research: Integral to mathematical models, leading to fields like
Mathematical Statistics and Econometrics.
4. Biology & Medicine: Supports medical research, disease analysis, and drug testing using
statistical significance tests.
5. Astronomy & Psychology: Helps in planetary motion studies and psychological analysis
(e.g., intelligence measurement).
6. Warfare: Assists military strategies through decision-making models.
Limitations of Statistics:
1. Not for Individual Data: Focuses on aggregates, not single instances.
2. Only Quantitative Analysis: Cannot directly analyze qualitative factors like honesty or
intelligence.
3. Not Universally True: Results hold under specific conditions, unlike natural sciences.
4. One of Many Methods: Must be supplemented by other studies like cultural or
philosophical analysis.
5. Prone to Misuse: Can be manipulated to support misleading or predetermined conclusions.
2. Meaning and importance of Tabulation
ANS:
Meaning of Tabulation:
Tabulation is the systematic arrangement of data in rows and columns based on specific
characteristics. It helps in organizing raw data into a structured format for better understanding
and analysis.
Definitions:
A.M. Tuttle: "A statistical table is a logical listing of related quantitative data in vertical
columns and horizontal rows, with explanatory titles and headings for clarity."
Professor Bowley: "Tabulation is the intermediate process between collecting data and
deriving meaningful statistical conclusions."
Importance of Tabulation:
1. Simplifies Complex Data: Large datasets are presented in a concise and structured
manner.
2. Enhances Clarity & Comparability: Allows easy comparison of different data points.
3. Saves Time & Space: Presents maximum information in minimal space efficiently.
4. Facilitates Analysis: Helps in drawing statistical conclusions and making informed
decisions.
5. Ensures Accuracy: Reduces errors by systematically arranging data.
6. Aids in Graphical Representation: Serves as a foundation for charts and graphs.
3. Distinguish between primary data and secondary.
ANS:
Primary Data | Secondary Data
Collected first-hand by the researcher for a specific purpose. | Already collected by someone else and used for a different purpose.
Data collection involves defining terms, selecting units, and determining accuracy. | Compilation of existing data without direct collection.
More reliable and accurate since it is collected with a specific objective. | May have transcription errors or biases.
Provides detailed information on terminology, statistical units, and data collection methods. | Lacks detailed explanations of methodology.
Used when no relevant secondary data is available. | Used when relevant primary data is difficult to obtain.
Examples: Surveys, experiments, interviews. | Examples: Government reports, books, articles.
4. What do you mean by a questionnaire?
ANS:
A questionnaire is a printed set of questions, which may be open-ended or closed-ended.
Respondents are expected to answer the questions based on their knowledge of, and
experience with, the issues concerned.
A questionnaire is often one part of a survey, but it can also be used on its own, so its
end goal may or may not be a survey.
5. What is regression analysis?
ANS: Regression Analysis is a statistical method used to examine the relationship
between two or more variables. It helps determine how a dependent variable (response)
changes based on the values of one or more independent variables (predictors). The
primary goal of regression analysis is to model this relationship mathematically, allowing
for prediction and optimization.
Regression analysis is used for two distinct purposes:
First, regression analysis is widely used for prediction and forecasting, where its use has
substantial overlap with the field of machine learning.
Second, in some situations, regression analysis can be used to infer causal relationships
between the independent and dependent variables.
Types:
1. Linear Regression
2. Multiple Linear Regression
3. Non-linear Regression
6. Justify or contradict: bxy and byx must be either both positive or both negative.
ANS:
The statement is justified: bxy and byx must be either both positive or both negative.
In regression analysis, bxy (regression coefficient of X on Y) and byx (regression
coefficient of Y on X) represent the slopes of the regression lines. Their relationship is
given by:
$b_{xy} \cdot b_{yx} = r^2$,
where r is the correlation coefficient between X and Y.
Since r^2 is always non-negative (i.e., r^2 ≥ 0), the product bxy * byx is also non-negative,
so the two coefficients cannot have opposite signs. More precisely, bxy = r(σx/σy) and
byx = r(σy/σx), and standard deviations are positive, so both coefficients take the sign of r.
If r > 0, meaning X and Y have a positive correlation, then both bxy and byx will be
positive.
If r < 0, meaning X and Y have a negative correlation, then both bxy and byx will be
negative.
If r = 0, both coefficients are zero.
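The sign rule can be checked numerically. The following is a minimal sketch, assuming two small made-up data sets (the values and the function name `regression_coefficients` are illustrative only, not part of the question bank). Both coefficients share the same numerator, the sum of products of deviations, which is why their signs always agree.

```python
from statistics import mean

def regression_coefficients(x, y):
    """Return (byx, bxy, byx * bxy) for paired samples x and y."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    byx = sxy / sxx  # regression coefficient of Y on X
    bxy = sxy / syy  # regression coefficient of X on Y
    return byx, bxy, byx * bxy

# Hypothetical data: one rising and one falling relationship.
rising = regression_coefficients([1, 2, 3, 4, 5], [2, 4, 5, 4, 6])
falling = regression_coefficients([1, 2, 3, 4, 5], [6, 4, 5, 4, 2])

print("rising :", rising)   # both coefficients positive
print("falling:", falling)  # both coefficients negative
```

In both cases the third value (the product) equals r^2 and is non-negative.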
7. From the following data
X 40 34 28 30 44 38 31
Y 32 39 26 30 38 34 28
Find: 1. The coefficients of regression
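One way to check a hand calculation for this question is the short Python sketch below. It assumes the usual deviation-from-mean formulas, byx = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)^2 and bxy = Σ(x − x̄)(y − ȳ) / Σ(y − ȳ)^2; the variable names are illustrative.

```python
from statistics import mean

# Data from the question.
X = [40, 34, 28, 30, 44, 38, 31]
Y = [32, 39, 26, 30, 38, 34, 28]

x_bar, y_bar = mean(X), mean(Y)

# Sums of squares and products of deviations from the means.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
sxx = sum((x - x_bar) ** 2 for x in X)
syy = sum((y - y_bar) ** 2 for y in Y)

byx = sxy / sxx  # regression coefficient of Y on X
bxy = sxy / syy  # regression coefficient of X on Y

print(f"byx = {byx:.4f}, bxy = {bxy:.4f}")
```

With these data the sketch gives byx of about 0.587 and bxy of about 0.842.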
8. The following data are given regarding expenditure on advertising and sales of a
particular firm
Advertising Expenditure X Sales (in lakhs) Y
Mean 10 90
Standard Deviation 3 12
Correlation coefficient r=0.8
i) Calculate the regression equation of Y on X
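A minimal sketch of the calculation, assuming the standard result byx = r(σy/σx) and the fact that the regression line of Y on X passes through the two means. The function name `regression_line_y_on_x` is illustrative.

```python
def regression_line_y_on_x(mean_x, mean_y, sd_x, sd_y, r):
    """Return (slope, intercept) of the regression line of Y on X."""
    slope = r * sd_y / sd_x              # byx = r * (sigma_y / sigma_x)
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    return slope, intercept

# Values given in the question.
slope, intercept = regression_line_y_on_x(mean_x=10, mean_y=90, sd_x=3, sd_y=12, r=0.8)
print(f"Y = {slope:.2f} X + {intercept:.2f}")
```

This corresponds to Y − 90 = 3.2 (X − 10), i.e. Y = 3.2X + 58.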
5 Marks
1. A survey of 370 students from Commerce Faculty and 130 students from Science
Faculty revealed that 180 students were studying for only C.A. Examinations, 140
for only Costing Examinations and 80 for both C.A. and Costing Examinations. The
rest had offered part-time Management Courses. Of those studying for Costing
only, 13 were girls and 90 boys belonged to Commerce Faculty. Out of 80 studying
for both C.A. and Costing, 72 were from Commerce Faculty amongst which 70 were
boys. Amongst those who offered part-time Management Courses, 50 boys were
from Science Faculty and 30 boys and 10 girls from Commerce Faculty. In all there
were 110 boys in Science Faculty. Present the above information in a tabular form.
Find the number of students from Science Faculty studying for part-time
Management Courses.
ANS:
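The last part of the question follows from simple bookkeeping on the totals. A minimal sketch of that arithmetic is given below (variable names are illustrative; the full table still has to be built from the remaining figures in the question).

```python
# Totals given in the question.
total_students = 370 + 130            # Commerce + Science
ca_only, costing_only, both = 180, 140, 80

# "The rest" offered part-time Management Courses.
management_total = total_students - (ca_only + costing_only + both)

# Commerce students in Management: 30 boys and 10 girls.
management_commerce = 30 + 10

# The remainder of the Management students belong to Science Faculty.
management_science = management_total - management_commerce

print("Part-time Management, total  :", management_total)
print("Part-time Management, Science:", management_science)
```

Since 50 of these Science students are boys, the remaining ones are girls.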
2. The following table gives the frequency distribution of the weekly wages (in Rs. '00) of 100
workers in a factory. Draw the histogram and frequency polygon of the distribution.
Weekly wages (Rs. '00) 20-24 25-29 30-34 35-39 40-44 45-49 50-54 55-59 60-64 Total
No. of workers 4 5 12 23 31 10 8 5 2 100
ANS:
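3. What is stratified sampling? Explain the merits and limitations of stratified sampling.
ANS:
Stratified sampling is a probability sampling method in which the population is divided into
non-overlapping, relatively homogeneous subgroups (strata), and a random sample is drawn
from each stratum.
Merits of Stratified Sampling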
1. Higher Accuracy: Since each subgroup is properly represented, the results are more
precise than simple random sampling.
2. Ensures Representation: Guarantees that all groups (strata) are included in the sample,
preventing underrepresentation of any subgroup.
3. Comparison Between Groups: Allows researchers to compare different groups (e.g.,
males vs. females, urban vs. rural students).
4. More Efficient Than Simple Random Sampling: Reduces sampling error, especially
when there is a large variation among different strata.
5. Applicable for Heterogeneous Populations: Works well when the population is diverse
and contains subgroups with different characteristics.
Limitations of Stratified Sampling
1. Difficult to Identify Strata: Requires detailed knowledge of the population to correctly
classify individuals into appropriate strata.
2. Time-Consuming and Expensive: More complex than simple random sampling,
requiring additional time and resources.
3. Risk of Improper Stratification: If the strata are not well-defined, the sample may be
biased and the results inaccurate.
4. Not Suitable for Small Populations: If the population is small, stratified sampling may
not provide significant benefits over simple random sampling.
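As an illustration of how strata guarantee representation, here is a minimal sketch of proportional stratified sampling. The population of urban and rural students is made up, and the function name `stratified_sample` and its parameters are illustrative assumptions.

```python
import random

def stratified_sample(population, stratum_of, fraction, seed=0):
    """Draw a simple random sample of the given fraction from every stratum."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one unit per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 60 urban and 40 rural students.
students = [("urban", i) for i in range(60)] + [("rural", i) for i in range(40)]
sample = stratified_sample(students, stratum_of=lambda s: s[0], fraction=0.1)

urban_count = sum(1 for s in sample if s[0] == "urban")
rural_count = sum(1 for s in sample if s[0] == "rural")
print(urban_count, rural_count)
```

Each stratum contributes in proportion to its size, so neither group can be left out of the sample by chance.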
4. What do you mean by questionnaire? What is the difference between a questionnaire
and a schedule? State the essential points to be remembered in drafting a questionnaire.
ANS:
A questionnaire is a structured set of questions designed to collect quantitative or qualitative
data from respondents. It is commonly used in surveys, research, and statistical studies to gather
information efficiently. The questions can be open-ended (descriptive answers) or close-ended
(yes/no, multiple choice, Likert scale, etc.).
Difference between a questionnaire and a schedule:
Questionnaire | Schedule
A written set of questions given to respondents to fill out on their own. | A set of questions asked by an enumerator who records the answers.
Self-administered by the respondent. | Enumerator fills out responses based on interaction.
Less expensive since no interviewer is needed. | More expensive as enumerators are hired.
Takes more time as responses depend on when respondents fill it. | Faster since enumerators directly collect responses.
May be affected by misunderstanding of questions by respondents. | More accurate as enumerators clarify doubts.
Best for literate respondents who can understand and fill the form. | Suitable for both literate and illiterate respondents.
Essential points in drafting a questionnaire:
1. Define the Purpose Clearly: Ensure the questionnaire aligns with the research objective.
2. Use Simple and Clear Language: Avoid technical jargon, complex sentences, or
ambiguous words.
3. Keep It Short and Precise: A long questionnaire may discourage responses.
4. Use Close-Ended Questions Whenever Possible: Helps in easy analysis and quantitative
evaluation.
5. Maintain Logical Order: Start with easy and general questions before moving to
specific ones.
6. Avoid Leading or Biased Questions: Example of a biased question: “Don’t you think
online learning is better?”
7. Ensure Confidentiality: Mention that responses will remain anonymous if necessary.
8. Pre-test the Questionnaire: Conduct a pilot study to check for errors and clarity before
finalizing.
9. Use Proper Formatting: Number the questions, provide clear instructions, and use a
readable font.
10. Provide Response Options Thoughtfully: Ensure options cover all possible answers
without overlap or gaps.
5. Equations of the two lines of regression are x+6y=6 and 3x+2y=10. Find
i) Mean of x and Mean of y
ii) regression coefficient byx and bxy
iii) correlation coefficient between x and y
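A minimal sketch of one way to work this problem, assuming the standard facts that both regression lines pass through (x̄, ȳ) and that byx * bxy = r^2 cannot exceed 1 (which decides which line is Y on X). Function and variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

def solve_means(line1, line2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule."""
    (a1, b1, c1), (a2, b2, c2) = line1, line2
    det = a1 * b2 - a2 * b1
    return Fraction(c1 * b2 - c2 * b1, det), Fraction(a1 * c2 - a2 * c1, det)

def coefficients(line_y_on_x, line_x_on_y):
    """Read byx and bxy off the two lines written as a*x + b*y = c."""
    a1, b1, _ = line_y_on_x  # y = -(a1/b1) x + constant
    a2, b2, _ = line_x_on_y  # x = -(b2/a2) y + constant
    return Fraction(-a1, b1), Fraction(-b2, a2)

line_a = (1, 6, 6)    # x + 6y = 6
line_b = (3, 2, 10)   # 3x + 2y = 10

mean_x, mean_y = solve_means(line_a, line_b)

# Try both assignments; keep the one with byx * bxy <= 1.
for y_on_x, x_on_y in ((line_a, line_b), (line_b, line_a)):
    byx, bxy = coefficients(y_on_x, x_on_y)
    if byx * bxy <= 1:
        break

# r takes the common sign of the two regression coefficients.
r = sqrt(byx * bxy) * (1 if byx > 0 else -1)

print("means:", mean_x, mean_y)
print("byx =", byx, " bxy =", bxy, " r =", round(r, 4))
```

Here x + 6y = 6 turns out to be the line of Y on X, because the opposite assignment would give a product of 9, which is impossible for r^2.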
6. From the data given below find:
a) The Two regression coefficients
b) The Two regression equations
c) The coefficient of correlation between the marks in Economics and statistics.
d) The most likely marks in Statistics if marks in Economics are 30
Marks in Economics 25 28 35 32 31 36 29 38 34 32
Marks in Statistics 43 46 49 41 36 32 31 30 33 39
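The four parts can be checked with the sketch below. It assumes Economics is X and Statistics is Y, and uses the deviation-from-mean formulas; all names are illustrative.

```python
from math import sqrt
from statistics import mean

economics = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]         # X
statistics_marks = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]  # Y

x_bar, y_bar = mean(economics), mean(statistics_marks)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(economics, statistics_marks))
sxx = sum((x - x_bar) ** 2 for x in economics)
syy = sum((y - y_bar) ** 2 for y in statistics_marks)

# a) The two regression coefficients.
byx = sxy / sxx
bxy = sxy / syy

# b) The two regression equations, written as prediction functions.
def predict_statistics(econ):
    """Regression of Y on X: Y - y_bar = byx * (X - x_bar)."""
    return y_bar + byx * (econ - x_bar)

def predict_economics(stat):
    """Regression of X on Y: X - x_bar = bxy * (Y - y_bar)."""
    return x_bar + bxy * (stat - y_bar)

# c) Coefficient of correlation.
r = sxy / sqrt(sxx * syy)

# d) Most likely marks in Statistics when marks in Economics are 30.
likely_statistics = predict_statistics(30)

print(f"byx = {byx:.4f}, bxy = {bxy:.4f}, r = {r:.4f}")
print(f"Statistics marks for Economics = 30: {likely_statistics:.2f}")
```

With these data the means are 32 and 38, byx is about -0.664, bxy is about -0.234, r is about -0.394, and the predicted Statistics mark is about 39.33.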
7. Explain the following methods to check the performance of Regression Model
i) MAE ii) MAPE
i) Mean Absolute Error (MAE)
Definition:
MAE measures the average magnitude of errors between predicted values and actual values. It
gives an idea of how much error the model makes, on average, without considering direction
(whether the error is positive or negative).
Formula: $MAE = \frac{1}{n} \sum_{i=1}^{n} |Y_i - \hat{Y}_i|$, where
$Y_i$ = Actual value
$\hat{Y}_i$ = Predicted value
$n$ = Number of observations
Characteristics:
MAE is always non-negative (it is zero only when every prediction is exact).
Lower MAE indicates a better model.
It is measured in the same unit as the target variable.
Example: If actual sales data is [100, 150, 200] and predicted values are [110, 140, 195],
$MAE = \frac{|100 - 110| + |150 - 140| + |200 - 195|}{3} = \frac{25}{3} \approx 8.33$
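A minimal Python sketch of the MAE calculation for the sales example above (the function name `mean_absolute_error` is illustrative, not taken from any particular library):

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual_sales = [100, 150, 200]
predicted_sales = [110, 140, 195]
mae = mean_absolute_error(actual_sales, predicted_sales)
print(f"MAE = {mae:.2f}")
```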
ii) Mean Absolute Percentage Error (MAPE)
Definition:
MAPE measures the error as a percentage of actual values, making it useful for comparing
models across different datasets.
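Formula: $MAPE = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{Y_i - \hat{Y}_i}{Y_i} \right|$,
where $Y_i$ is the actual value, $\hat{Y}_i$ is the predicted value, and $n$ is the number of observations.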
Characteristics:
Expressed as a percentage (%), making it easier to interpret.
Helps compare models across different datasets and scales.
Sensitive to very small actual values (can give extremely high percentages when $Y_i$
is close to zero).
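Example: Using the same data as before (actual values [100, 150, 200], predicted values [110, 140, 195]):
$MAPE = \frac{100}{3} \left( \frac{|100 - 110|}{100} + \frac{|150 - 140|}{150} + \frac{|200 - 195|}{200} \right)$
$= \frac{100}{3} (0.10 + 0.0667 + 0.025) \approx 6.39\%$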
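A minimal Python sketch of the MAPE calculation for the same example. The function name is illustrative, and raising an error when an actual value is zero is an assumption made here because the percentage is undefined in that case.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Average absolute error expressed as a percentage of the actual values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100 * total / len(actual)

actual_sales = [100, 150, 200]
predicted_sales = [110, 140, 195]
mape = mean_absolute_percentage_error(actual_sales, predicted_sales)
print(f"MAPE = {mape:.2f} %")
```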
8. What do you understand about data collection? Classify different types of data based on
source of data
Data collection is the process of gathering and measuring information on variables of interest in
a systematic way. It helps in making informed decisions, analyzing trends, and drawing
meaningful conclusions. The accuracy of any statistical analysis depends on the quality of data
collected.
Types of Data Based on Source
Data can be classified into two main categories based on its source:
1. Primary Data
Definition:
Primary data is original data collected firsthand by a researcher for a specific purpose. It is fresh
and directly obtained from the source.
Methods of Collecting Primary Data:
Surveys & Questionnaires – Collecting responses from individuals.
Interviews – Direct interaction with respondents.
Observations – Monitoring behaviors or events.
Experiments – Conducting tests under controlled conditions.
Focus Groups – Discussing topics with a small group.
Advantages:
✔ More reliable and relevant to the study.
✔ Up-to-date and specific to the research needs.
Disadvantages:
❌ Time-consuming and expensive.
❌ Requires more effort to collect and process.
2. Secondary Data
Definition:
Secondary data refers to data that has already been collected and processed by someone else for a
different purpose. Researchers use it for their own analysis.
Sources of Secondary Data:
Government Reports – Census data, economic surveys.
Research Papers & Journals – Published studies and findings.
Company Records – Annual reports, financial statements.
Websites & Databases – Online statistics, articles.
Advantages:
✔ Readily available and cost-effective.
✔ Saves time and effort in data collection.
Disadvantages:
❌ May not be completely relevant to the study.
❌ Can be outdated or inaccurate if not verified properly.