CHAPTER SIX
PROCESSING AND ANALYSIS OF DATA
The data, after collection, have to be processed and analysed in accordance with the outline laid down for
the purpose at the time of developing the research plan.
Processing:- implies editing, coding, classification and tabulation of collected data so that they are
amenable to analysis.
Analysis:- refers to the computation of certain measures along with searching for patterns of relationship
that exist among data-groups. Thus, “in the process of analysis, relationships or differences supporting or
conflicting with original or new hypotheses should be subjected to statistical tests of significance to
determine with what validity data can be said to indicate any conclusions”.
PROCESSING OPERATIONS
With this brief introduction concerning the concepts of processing and analysis, we can now proceed with
the explanation of all the processing operations.
1. Editing: Editing of data is a process of examining the collected raw data (especially in surveys) to
detect errors and omissions and to correct these when possible. As a matter of fact, editing involves a
careful scrutiny of the completed questionnaires and/or schedules. Editing is done to assure that the data
are accurate, consistent with other facts gathered, uniformly entered, as complete as possible and have
been well arranged to facilitate coding and tabulation.
With regard to points or stages at which editing should be done, one can talk of field editing and central
editing.
Field editing:- consists of the review of the reporting forms by the investigator in order to complete (translate
or rewrite) what he or she has written in abbreviated and/or illegible form at the time of recording the
respondents’ responses. This type of editing is necessary because individual writing styles
often can be difficult for others to decipher. This sort of editing should be done as soon as possible after
the interview, preferably on the very day or on the next day. While doing field editing, the investigator
must restrain himself and must not correct errors of omission by simply guessing what the informant
would have said if the question had been asked.
Central editing:- should take place when all forms or schedules have been completed and returned to the
office. This type of editing implies that all forms should get a thorough editing by a single editor in a
small study and by a team of editors in case of a large inquiry. Editor(s) may correct the obvious errors
such as an entry in the wrong place, entry recorded in months when it should have been recorded in
weeks, and the like.
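As a rough illustration, part of a central-editing pass can be mechanised. The sketch below (in Python) flags omissions and obviously implausible entries for the editor to follow up; the field names ("age", "monthly_income_birr") and the validity ranges are hypothetical, and a real study would use its own codebook.

# A sketch of a central-editing pass. Field names and validity ranges are
# hypothetical; a real study would use its own codebook.

records = [
    {"id": 1, "age": 34, "monthly_income_birr": 2500},
    {"id": 2, "age": None, "monthly_income_birr": 1800},  # omission
    {"id": 3, "age": 250, "monthly_income_birr": 3100},   # obvious error
]

def edit_record(rec):
    """Return a list of problems found in one completed schedule."""
    problems = []
    if rec["age"] is None:
        problems.append("age missing")
    elif not 0 <= rec["age"] <= 120:
        problems.append("age outside plausible range")
    if rec["monthly_income_birr"] is None or rec["monthly_income_birr"] < 0:
        problems.append("income missing or negative")
    return problems

for rec in records:
    issues = edit_record(rec)
    if issues:
        print(f"Record {rec['id']}: " + "; ".join(issues))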
2. Coding: Coding refers to the process of assigning numerals or other symbols to answers so that
responses can be put into a limited number of categories or classes. Such classes should be appropriate to
the research problem under consideration. They must also possess the characteristic of exhaustiveness
(i.e., there must be a class for every data item) and that of mutual exclusivity, which means that a
specific answer can be placed in one and only one cell in a given category set.
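As a small illustration, the sketch below assigns numeral codes to answers to a hypothetical education question. The code list and the catch-all code 9 ("other/unspecified") are assumptions; the catch-all keeps the scheme exhaustive, and each answer maps to exactly one code, so the classes remain mutually exclusive.

# Coding sketch: each answer is mapped to exactly one numeral. The question
# and the code list are hypothetical; code 9 keeps the scheme exhaustive.

EDUCATION_CODES = {
    "no formal education": 1,
    "primary": 2,
    "secondary": 3,
    "tertiary": 4,
}
OTHER_CODE = 9  # "other/unspecified", so every answer falls in one class

def code_response(answer):
    return EDUCATION_CODES.get(answer.strip().lower(), OTHER_CODE)

responses = ["Primary", "tertiary", "religious school", "Secondary"]
print([code_response(r) for r in responses])  # [2, 4, 9, 3]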
3. Classification: Most research studies result in a large volume of raw data which must be reduced into
homogeneous groups if we are to get meaningful relationships. This fact necessitates classification of data
which happens to be the process of arranging data in groups or classes on the basis of common
characteristics. Data having a common characteristic are placed in one class and in this way the entire
data get divided into a number of groups or classes. Classification can be one of the following two types,
depending upon the nature of the phenomenon involved:
(a) Classification according to attributes: As stated above, data are classified on the basis of common
characteristics which can either be descriptive (such as literacy, sex, honesty, etc.) or numerical (such as
weight, height, income, etc.). Descriptive characteristics refer to qualitative phenomena which cannot be
measured quantitatively; only their presence or absence in an individual item can be noticed. Data
obtained this way on the basis of certain attributes are known as statistics of attributes and their
classification is said to be classification according to attributes.
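A minimal sketch of this idea: items are sorted into classes by the presence or absence of a descriptive characteristic (here a hypothetical literacy flag), and the resulting counts constitute statistics of attributes.

# Classification according to attributes: each item goes into a class by the
# presence or absence of a descriptive characteristic (hypothetical data).

people = [
    {"name": "A", "literate": True},
    {"name": "B", "literate": False},
    {"name": "C", "literate": True},
]

classes = {"literate": [], "illiterate": []}
for p in people:
    key = "literate" if p["literate"] else "illiterate"
    classes[key].append(p["name"])

print({k: len(v) for k, v in classes.items()})  # {'literate': 2, 'illiterate': 1}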
(b) Classification according to class-intervals: Unlike descriptive characteristics, the numerical
characteristics refer to quantitative phenomena which can be measured through some statistical units.
Data relating to income, production, age, weight, etc. come under this category. Such data are known as
statistics of variables and are classified on the basis of class intervals. For instance, persons whose
incomes, say, are within Birr 201 to Birr 400 can form one group, those whose incomes are within Birr
401 to Birr 600 can form another group and so on.
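The grouping described above can be illustrated with a short sketch that assigns hypothetical income figures to class intervals of width Birr 200 (201 to 400, 401 to 600, and so on).

# Classification by class intervals: hypothetical incomes are grouped into
# intervals of width Birr 200.

incomes = [250, 380, 420, 590, 610, 275, 455]

def interval_label(income, width=200, start=1):
    lower = ((income - start) // width) * width + start
    return f"{lower}-{lower + width - 1}"

groups = {}
for inc in incomes:
    groups.setdefault(interval_label(inc), []).append(inc)

for label in sorted(groups):
    print(f"Birr {label}: {len(groups[label])} person(s)")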
4. Tabulation: When a mass of data has been assembled, it becomes necessary for the researcher to
arrange the same in some kind of concise and logical order. This procedure is referred to as tabulation.
Thus, tabulation is the process of summarising raw data and displaying the same in compact form (i.e., in
the form of statistical tables) for further analysis. In a broader sense, tabulation is an orderly arrangement
of data in columns and rows.
Tabulation is essential because of the following reasons.
1. It conserves space and reduces explanatory and descriptive statements to a minimum.
2. It facilitates the process of comparison.
3. It facilitates the summation of items and the detection of errors and omissions.
4. It provides a basis for various statistical computations.
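As an illustration, the sketch below summarises a handful of hypothetical observations into a two-way frequency table with row and column totals, in the spirit of the principles listed in the next section.

# Tabulation sketch: hypothetical observations summarised into a two-way
# frequency table with row and column totals.

from collections import Counter

observations = [
    ("male", "literate"), ("female", "literate"),
    ("male", "illiterate"), ("female", "literate"),
    ("male", "literate"),
]

counts = Counter(observations)
rows, cols = ["male", "female"], ["literate", "illiterate"]

print(f"{'Sex':<8}" + "".join(f"{c:>12}" for c in cols) + f"{'Total':>8}")
for r in rows:
    row_counts = [counts[(r, c)] for c in cols]
    print(f"{r:<8}" + "".join(f"{n:>12}" for n in row_counts) + f"{sum(row_counts):>8}")
col_totals = [sum(counts[(r, c)] for r in rows) for c in cols]
print(f"{'Total':<8}" + "".join(f"{n:>12}" for n in col_totals) + f"{sum(col_totals):>8}")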
Generally accepted principles of tabulation: Such principles of tabulation, particularly of constructing
statistical tables, can be briefly stated as follows:
1. Every table should have a clear, concise and adequate title so as to make the table intelligible without
reference to the text and this title should always be placed just above the body of the table.
2. Every table should be given a distinct number to facilitate easy reference.
3. The column headings (captions) and the row headings (stubs) of the table should be clear and brief.
4. The units of measurement under each heading or sub-heading must always be indicated.
5. Explanatory footnotes, if any, concerning the table should be placed directly beneath the table, along
with the reference symbols used in the table.
6. Source or sources from where the data in the table have been obtained must be indicated just below the
table.
7. Usually the columns are separated from one another by lines which make the table more readable and
attractive. Lines are always drawn at the top and bottom of the table and below the captions.
8. There should be thick lines to separate the data under one class from the data under another class and
the lines separating the sub-divisions of the classes should be comparatively thin lines.
9. The columns may be numbered to facilitate reference.
10. Those columns whose data are to be compared should be kept side by side. Similarly, percentages
and/or averages must also be kept close to the data.
11. It is generally considered better to round off (approximate) figures before tabulation, as this reduces
unnecessary details in the table itself.
12. In order to emphasise the relative significance of certain categories, different kinds of type, spacing
and indentations may be used.
13. It is important that all column figures be properly aligned. Decimal points and (+) or (–) signs should
be in perfect alignment.
14. Abbreviations should be avoided to the extent possible and ditto marks should not be used in the
table.
15. Miscellaneous and exceptional items, if any, should be usually placed in the last row of the table.
16. A table should be made as logical, clear, accurate and simple as possible. If the data happen to be very
large, they should not be crowded in a single table for that would make the table unwieldy and
inconvenient.
17. Total of rows should normally be placed in the extreme right column and that of columns should be
placed at the bottom.
18. The arrangement of the categories in a table may be chronological, geographical, alphabetical or
according to magnitude to facilitate comparison. Above all, the table must suit the needs and requirements
of an investigation.
ELEMENTS/TYPES OF ANALYSIS
Analysis means the computation of certain indices or measures along with searching for patterns of
relationship that exist among the data groups. Analysis, particularly in case of survey or experimental
data, involves estimating the values of unknown parameters of the population and testing of hypotheses
for drawing inferences. Analysis may, therefore, be categorised as descriptive analysis and inferential
analysis (Inferential analysis is often known as statistical analysis).
“Descriptive analysis is largely the study of distributions of one variable. This study provides us with
profiles of companies, work groups, persons and other subjects on any of a multitude of characteristics
such as size, composition, efficiency, preferences, etc.” This sort of analysis may be in respect of one
variable (described as unidimensional analysis), or in respect of two variables (described as bivariate
analysis) or in respect of more than two variables (described as multivariate analysis). In this context we
work out various measures that show the size and shape of a distribution(s), along with measures of the
relationships between two or more variables. We may as well talk of correlation analysis and
causal analysis.
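As an illustration of unidimensional descriptive analysis, the sketch below computes a few measures of the size and shape of a single distribution; the income figures are invented.

# Descriptive analysis of one variable: a few summary measures of its
# distribution (hypothetical income data).

import statistics

monthly_income = [250, 380, 420, 590, 610, 275, 455, 500]

print("mean    :", statistics.mean(monthly_income))
print("median  :", statistics.median(monthly_income))
print("std dev :", round(statistics.stdev(monthly_income), 1))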
Correlation analysis studies the joint variation of two or more variables for determining the amount of
correlation between them.
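A minimal correlation sketch: the Pearson coefficient below measures the joint variation of two hypothetical variables; a value near +1 or -1 indicates a strong linear relationship.

# Correlation sketch: the Pearson coefficient for two hypothetical variables.

import statistics

x = [2, 4, 6, 8, 10]        # e.g., years of schooling
y = [15, 25, 30, 45, 50]    # e.g., income in hundreds of Birr

mean_x, mean_y = statistics.mean(x), statistics.mean(y)
num = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
den = (sum((xi - mean_x) ** 2 for xi in x) ** 0.5
       * sum((yi - mean_y) ** 2 for yi in y) ** 0.5)
print(round(num / den, 3))  # close to +1, i.e. a strong positive correlation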
Causal analysis is concerned with the study of how one or more variables affect changes in another
variable. It is thus a study of functional relationships existing between two or more variables. This
analysis can be termed as regression analysis. Causal analysis is considered relatively more important in
experimental researches, whereas in most social and business researches our interest lies in understanding
and controlling relationships between variables rather than with determining causes per se, and as such we
consider correlation analysis as relatively more important.
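A minimal regression sketch follows, fitting the line y = a + b*x by ordinary least squares to the same hypothetical pairs used in the correlation sketch above; the slope b shows how y changes with x.

# Simple regression sketch: y = a + b*x fitted by ordinary least squares.

import statistics

x = [2, 4, 6, 8, 10]
y = [15, 25, 30, 45, 50]

mean_x, mean_y = statistics.mean(x), statistics.mean(y)
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))   # slope
a = mean_y - b * mean_x                       # intercept
print(f"estimated line: y = {a:.2f} + {b:.2f} * x")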
Multivariate analysis may be defined as “all statistical methods which simultaneously analyse
more than two variables on a sample of observations”. Usually the following analyses are involved when
we refer to multivariate analysis:
(a) Multiple regression analysis: This analysis is adopted when the researcher has one dependent variable
which is presumed to be a function of two or more independent variables. The objective of this analysis is
to make a prediction about the dependent variable based on its covariance with all the concerned
independent variables (a brief sketch follows this list).
(b) Multiple discriminant analysis: This analysis is appropriate when the researcher has a single
dependent variable that cannot be measured, but can be classified into two or more groups on the basis of
some attribute. The object of this analysis is to predict an entity’s likelihood of belonging to a
particular group based on several predictor variables.
(c) Multivariate analysis of variance (or multi-ANOVA): This analysis is an extension of two-way
ANOVA, wherein the ratio of among-group variance to within-group variance is worked out on
a set of variables.
(d) Canonical analysis: This analysis can be used in case of both measurable and non-measurable
variables for the purpose of simultaneously predicting a set of dependent variables from their joint
covariance with a set of independent variables.
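For item (a) above, a brief sketch of multiple regression: one dependent variable is predicted from two independent variables by ordinary least squares. All figures are invented, and the numpy library is assumed to be available.

# Multiple regression sketch for item (a): one dependent variable predicted
# from two independent variables by least squares (invented figures).

import numpy as np

x1 = np.array([2.0, 3.0, 5.0, 7.0, 9.0])      # e.g., advertising spend
x2 = np.array([1.0, 4.0, 3.0, 6.0, 8.0])      # e.g., number of outlets
y = np.array([12.0, 20.0, 23.0, 34.0, 42.0])  # dependent variable, e.g., sales

# design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones_like(x1), x1, x2])
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
print("intercept, b1, b2 =", np.round(coeffs, 2))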
Inferential analysis is concerned with the various tests of significance for testing hypotheses in order to
determine with what validity data can be said to indicate some conclusion or conclusions. It is also
concerned with the estimation of population values. It is mainly on the basis of inferential analysis that
the task of interpretation (i.e., the task of drawing inferences and conclusions) is performed.
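As an illustration of a test of significance, the sketch below computes a one-sample t statistic for a hypothesised population mean. The sample values and the hypothesised mean (48) are invented; the computed statistic would be compared with the tabled t value for n-1 degrees of freedom at the chosen level of significance.

# Inferential sketch: a one-sample t statistic for a hypothesised mean.

import statistics

sample = [52, 49, 51, 48, 50, 53, 47, 52, 50, 49]
mu0 = 48                                   # hypothesised population mean

mean = statistics.mean(sample)
se = statistics.stdev(sample) / len(sample) ** 0.5
t = (mean - mu0) / se
print(f"sample mean = {mean:.1f}, t = {t:.2f}, df = {len(sample) - 1}")
# here t is about 3.5, above the tabled 5% value (about 2.26 for 9 df),
# so the hypothesis that the population mean is 48 would be rejected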