A students’ companion for writing an empirical paper in economics
Hiro Ishise
Takuma Kamda
Shuhei Kitamura
Masa Kudamatsu
Tetsuya Matsubayashi
Takeshi Murooka
Hidenori Takahashi
OSIPP, Osaka University
December 7, 2020
1 Introduction
A good paper has a good structure and follows writing rules. This document briefly describes a structure and
minimum rules for writing an empirical paper in economics. Its main target is graduate students, especially
master’s thesis writers. Although the document is written in English, most of the content should also apply
to writers in Japanese and other languages.
2 Structure and content
A typical empirical paper in economics has the following structure. (A section marked with * must
appear as a separate section.1 The literature review can be merged with the introduction if you like.
Where given, the number in parentheses is a typical length for that section.)
1. Abstract* (max 250 words): You write what you ask, what you do, and what you find in the paper,
and nothing else. You may also write a very brief summary of the contribution of your research, but
this is not necessary.
- Do not write the details of your empirical analysis (e.g., detailed variable descriptions).
2. Introduction* (2-3 pages): Start by motivating readers so that they see why your research
is important and interesting. After that, you write what you ask (a research question and concrete
hypotheses), what you do (empirical methods), and what you find (results) in the study in more detail.
That is, in this part, you write a longer (but not too long) version of the Abstract.
- In contrast to the Abstract, you have to explain the contribution of your research (i.e., what is new
about the study compared with previous studies).
3. Literature review (max 1 page): Briefly review only the papers most closely related to your study. A few
such papers are enough (say, at most five). Do not write a laundry list of papers in the field
of your study.
- The main purpose of this section is to stress the contribution of your research compared with previous
studies, not to show off how much of the literature you have searched and read. Readers do not want to
know all the papers in the field of your study, and indeed, it is painful to pay attention to all of them.
1 You can change the section title as you see fit. The content is more important than the title.
- For every study you cite, summarize the relevant part and explain how your research differs from it.
Alternatively, you may organize the papers into a few (say, two or three) groups and, after
summarizing the papers in each group, explain how your research differs from theirs.
- A typical bad example is that you write only a summary of the findings of previous studies without
explaining how they are related to your research. Do not ask readers to infer or guess it. Instead, you
do it for them.
4. Empirical framework*: Clearly state what you test and how you test it. You need to include the following
four sub-sections: (i) hypotheses, (ii) empirical methods, (iii) identification assumptions, and
(iv) data. The explanation for each sub-section is the following:
- (i) The number of your hypotheses should be minimal. A bad example is writing a hypothesis
for each variable when the regression has many variables (hence, many hypotheses!). How
could each of those variables satisfy the required assumptions? (For example, ask
yourself, “are these variables all random?”) A better formulation, though it depends on the research
question, is to pick only one main variable and put all your effort into convincing readers that the
identification assumptions are likely to be satisfied for that variable. The rest of the variables are used as controls.
- (i) Write the logic behind each hypothesis (e.g., “All else constant, demand for a good should decrease
as the price increases if it is not a Giffen good.”). Cite literature to motivate your hypotheses if
any exists. Do not make up a hypothesis without any reason behind it.
- (ii) Write regression equations. See below for how to write a regression equation and the explanation
that follows.
- (ii) Write what you anticipate for regression coefficients. (e.g., “I anticipate that the estimate for β1
in the regression equation should be negative and significant, while the estimate for β2 should not be
significant, if the hypothesis is correct.”)
- (ii) If you use a fancy estimation method (FE, DID, PSM, IV, RDD, etc.) instead of OLS, discuss
why your choice of method is better than OLS for your question. Naturally, you may need to discuss the
endogeneity issue here. This does not mean that these methods are better than OLS in general. (If you
have such a perception, you should reconsider it!) OLS can be just fine as long as you can add
appropriate control variables to alleviate endogeneity issues (in that case, you should explain why the
control variables alleviate each endogeneity issue).
- (iii) Clearly state the assumptions of your empirical method (parallel trends, instrument relevance
(no weak instruments), exclusion restriction, no sample selection, etc.). Do check the assumptions using
data if you can (e.g., by showing parallel pre-treatment trends). If you cannot check an assumption, write
the reason and its implication for the interpretation of your results in the Discussion section below.
At the very least, clarify the assumptions you need before starting the analysis.
- (iv) Write the sources of your data for all the variables and, where applicable, how you construct
them (taking the natural logarithm, making a share variable, etc.).
- (iv) Make a summary statistics table. The purpose of making a summary statistics table is to present
the distribution of each variable, which helps readers interpret regression results. You need at least to
report the number of observations, the mean, standard deviation, and min and max values for each
variable.
5. Results*: Present the results as clearly as you can. If possible, start by showing a figure, instead of a table,
to present the main result. (What kind of figure do you think would best explain the main result of
your study? Think about it.) In addition to the figure, you always need to make regression tables. See
below for how to make a regression table and interpret your results.
- If you insert a figure or a table, you always need to explain it somewhere in the text. Do not just put
it without explaining it. Again, do not ask readers to infer or guess it. Instead, you do it for them.
- If you take a figure or a table from a reference, make sure to cite it. Otherwise, it is considered a
type of plagiarism.
- Start with the OLS regression and show the OLS estimate. Then, if you use any method other than
OLS (DID, PSM, IV, RDD, etc.), compare the OLS estimate with the estimate from that method.
If you get different estimates, explain why.
6. Discussion*: You can discuss anything related to your results in this section, such as differences between
your results and results from previous studies; your interpretation of any counter-intuitive result; the
potential endogeneity bias that cannot be solved by your empirical method, etc.
- Make a sub-section in which you discuss the validity of the assumptions. Are all the assumptions
required for implementing your method (e.g., the i.i.d. assumption for the error term, no correlation
between the independent variable and the error term after adding control variables (the conditional
independence assumption, CIA), and the assumptions for each fancy method (DID, PSM, IV, RDD, etc.))
likely to be satisfied? If not, how does it affect the interpretation of your results? Also, is there any
remaining endogeneity that your method cannot fully solve? If so, discuss the direction of the bias
(overestimated? underestimated?) and write how you could alleviate it in the future (by finding a good
instrument, adding more data, etc.).2
- Do not be afraid to mention the limitations of your study. It is better to say that your study has some
limitations (due to, e.g., endogeneity) than to pretend that it is perfect (unless it is indeed perfect).
Imagine that researchers or policy makers read your paper. Suppose that there is a Type I error (that
is, you claim that there is an effect, although there is not). Don’t you think that your results would give
them wrong information?
- Similarly, do not be afraid to show insignificant results. Even if your main estimation result is
insignificant, you can at least discuss why. It could be because of a small sample size, endogeneity, etc.
Or it could just be that the true effect is null.
7. Conclusion*: You briefly write what you ask, do, and find in the paper. Most importantly, you need to
write (a) the contribution of your paper (what is new), (b) the limitations of the analysis, if any (for
example, endogeneity bias that you could not solve), and (c) possible directions for
future research.
- Writing policy implications is not necessary (unless you want to or your supervisor asks you to do so).
At the very least, if you are not sure whether your estimates are biased, you should not make any firm
assertions. Moreover, you should not provide any implication that is not directly derived from your
empirical results.
8. References*: List all the references you cite in the paper. Any citation style is acceptable, but economists
often use the Chicago style.
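The summary statistics table described under the empirical framework (item 4, (iv)) can be produced with a
few lines of code. The following is a minimal Python sketch using only the standard library. The function
names and the values in `data` are made up for illustration and do not come from any real dataset.

```python
import statistics


def summary_row(values):
    """Return N, mean, SD, min, and max for one variable, skipping missing values (None)."""
    obs = [v for v in values if v is not None]
    return {
        "N": len(obs),
        "Mean": statistics.mean(obs),
        "SD": statistics.stdev(obs),  # sample standard deviation (n - 1 denominator)
        "Min": min(obs),
        "Max": max(obs),
    }


def summary_table(data):
    """Format one plain-text line per variable with the five statistics."""
    header = f"{'Variable':<12}{'N':>6}{'Mean':>10}{'SD':>10}{'Min':>10}{'Max':>10}"
    lines = [header]
    for name, values in data.items():
        s = summary_row(values)
        lines.append(
            f"{name:<12}{s['N']:>6d}{s['Mean']:>10.2f}{s['SD']:>10.2f}"
            f"{s['Min']:>10.2f}{s['Max']:>10.2f}"
        )
    return "\n".join(lines)


# Hypothetical data, for illustration only. None marks a missing observation.
data = {
    "ln wage": [2.0, 3.0, 4.0, None, 3.0],
    "employed": [1, 0, 1, 1, 0],
}
print(summary_table(data))
```

Reporting N per variable, rather than once for the whole table, makes missing observations visible to readers.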
3 Writing regression equations
• Clearly define each variable. Do not forget to specify the unit if necessary (kilometers? meters?
tons? kilograms? etc.). For dummies, explain the meaning of each value (e.g., what is the meaning of
sex = 1?).
• Avoid using the same parameter symbols in different regression equations (e.g., if you are using α in two
regression equations, use another symbol in one of them).
• Avoid writing all control variables in the equation. Use a vector representation, if possible.
Example: A regression equation for individual i and year t is written as
$$y_{it} = \alpha + \beta D_{it} + X_{it}\gamma + \phi_i + \tau_t + \varepsilon_{it}, \qquad (1)$$
where $y_{it}$ is the wage in 1990 US dollars (logged), $\alpha$ is a constant, $D_{it}$ is a treatment dummy,
which takes the value one if treated and zero otherwise, $X_{it}$ is a vector of time-variant control variables,
$\phi_i$ and $\tau_t$ are individual and year fixed effects, respectively, and $\varepsilon_{it}$ is the error term.
Control variables include employment status (1=employed, 0=unemployed) and marriage
status (1=married, 0=unmarried).
2 Do not pretend that you are free from endogeneity bias unless the variation is econometrically random. Be humble.
Even if you use a fancy method (DID, PSM, IV, RDD, etc.), the estimate is likely to be biased.
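To see what the two sets of fixed effects in equation (1) do, it can help to estimate the coefficient on the
treatment dummy by hand. The following is a minimal Python sketch under strong simplifying assumptions: the
panel is balanced, the control variables are omitted, and the data are simulated without an error term. All
numbers and function names are hypothetical. For real work, use a statistical package.

```python
def two_way_within(panel):
    """Demean a balanced panel {(i, t): value} by unit and by period.

    Returns {(i, t): x_it - unit mean - period mean + grand mean}.
    """
    units = sorted({i for i, _ in panel})
    periods = sorted({t for _, t in panel})
    unit_mean = {i: sum(panel[(i, t)] for t in periods) / len(periods) for i in units}
    period_mean = {t: sum(panel[(i, t)] for i in units) / len(units) for t in periods}
    grand_mean = sum(panel.values()) / len(panel)
    return {
        (i, t): panel[(i, t)] - unit_mean[i] - period_mean[t] + grand_mean
        for (i, t) in panel
    }


def fe_beta(y, d):
    """Slope from regressing two-way demeaned y on two-way demeaned d."""
    y_w = two_way_within(y)
    d_w = two_way_within(d)
    numerator = sum(d_w[k] * y_w[k] for k in d_w)
    denominator = sum(d_w[k] ** 2 for k in d_w)
    return numerator / denominator


# Hypothetical balanced panel: 3 individuals observed in 3 years.
alpha, beta = 1.0, -0.5
phi = {1: 0.3, 2: -0.2, 3: 1.1}            # individual fixed effects
tau = {2000: 0.0, 2001: 0.4, 2002: -0.1}   # year fixed effects
d = {(i, t): 0 for i in phi for t in tau}
d[(1, 2001)] = d[(1, 2002)] = 1            # individual 1 treated from 2001
d[(2, 2002)] = 1                           # individual 2 treated in 2002
y = {(i, t): alpha + beta * d[(i, t)] + phi[i] + tau[t] for (i, t) in d}

print(round(fe_beta(y, d), 6))
```

Because the simulated outcome contains no error term, the estimate equals the chosen `beta` up to floating-point
error. The demeaning removes the constant and both sets of fixed effects, so only variation in treatment within
individuals and within years identifies the coefficient.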
4 Making regression tables
• Report the number of observations and relevant basic statistics for your estimation method (R-squared
for OLS, first-stage F-statistic for IV, log-likelihood for MLE, etc.).
• Explain the meaning of each variable name. Without explanation, it is unclear what names such as lnwage
(the natural logarithm of wage?) or faminc (family income?) mean. A simple fix is to use labels
(ln wage, family income, etc.) instead.
• State what type of standard errors is used (e.g., robust standard errors? cluster-robust standard
errors?).
• Do not report estimates of fixed effects unless necessary. Instead, use, e.g., a Yes/No indicator or a
check mark at the bottom of the regression table.
Example:
                    Dependent variable: log wage
                     (1)        (2)        (3)
Treatment          −11.10     −10.51      −5.21
                   (2.22)     (2.21)     (2.10)
Fixed effects        No         No         Yes
Other controls       No         Yes        Yes
R-squared           0.02       0.02       0.10
Observations         400        400        400
Notes: Standard errors are in parentheses.
Table 1: The treatment effect on log wage.
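A table like Table 1 is easier to keep consistent when it is generated from stored estimates rather than typed
by hand. The following is a minimal Python sketch that rebuilds the layout of Table 1 as plain text. The
function name `format_table` and the dictionary keys are assumptions made for illustration; the numbers are
those shown in Table 1.

```python
def format_table(dep_var, columns, label_width=16, col_width=10):
    """Build a plain-text regression table for one main regressor across several columns."""

    def row(label, cells):
        return f"{label:<{label_width}}" + "".join(f"{c:>{col_width}}" for c in cells)

    def yes_no(flag):
        return "Yes" if flag else "No"

    lines = [
        f"Dependent variable: {dep_var}",
        row("", [f"({k})" for k in range(1, len(columns) + 1)]),
        row("Treatment", [f"{c['coef']:.2f}" for c in columns]),
        row("", [f"({c['se']:.2f})" for c in columns]),  # standard errors below estimates
        row("Fixed effects", [yes_no(c["fe"]) for c in columns]),
        row("Other controls", [yes_no(c["controls"]) for c in columns]),
        row("R-squared", [f"{c['r2']:.2f}" for c in columns]),
        row("Observations", [str(c["n"]) for c in columns]),
        "Notes: Standard errors are in parentheses.",
    ]
    return "\n".join(lines)


# One dictionary per column of Table 1.
columns = [
    {"coef": -11.10, "se": 2.22, "fe": False, "controls": False, "r2": 0.02, "n": 400},
    {"coef": -10.51, "se": 2.21, "fe": False, "controls": True, "r2": 0.02, "n": 400},
    {"coef": -5.21, "se": 2.10, "fe": True, "controls": True, "r2": 0.10, "n": 400},
]
print(format_table("log wage", columns))
```

The fixed effects and controls appear only as Yes/No rows, which follows the advice above not to report
fixed-effect estimates unless necessary.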
5 Interpreting results
• Do not only interpret significant results. Insignificant results can be interesting too, depending on the
context.
• Show results as they are. For example, do not omit a variable because it is insignificant. Showing only
significant results is a form of p-hacking, a misuse of data analysis (so you must not do it).
• Not only significance, but also the magnitude and the sign are important. Make sure to interpret the
size of the main effect.
• Do not forget to specify the unit when interpreting results (e.g., percent? percentage point?).
• You do not need to report the significance level (e.g., at the 5% level).
• Do not interpret all results. Report only results that are related to your argument. Readers can read
tables by themselves if they want.
• Compare your results with those from previous studies if possible.
• The reporting order should be wisely organized. You do not need to report from the top to the bottom
of a regression table, for example.
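Two unit issues from the list above come up often: the difference between percent and percentage point, and
the size of an effect when the outcome is in logs. The following is a short Python sketch with hypothetical
numbers. The function names are made up for illustration. The last function uses the standard result that a
coefficient β on a dummy variable, with ln(Y) as the outcome, implies an exact change in Y of 100(exp(β) − 1)
percent.

```python
import math


def percentage_point_change(old_rate, new_rate):
    """Difference between two rates that are already expressed in percent."""
    return new_rate - old_rate


def percent_change(old_value, new_value):
    """Relative change, in percent of the old value."""
    return 100.0 * (new_value - old_value) / old_value


def log_coef_to_percent(beta):
    """Exact percent change in Y implied by a dummy coefficient beta when the outcome is ln(Y)."""
    return 100.0 * (math.exp(beta) - 1.0)


# Hypothetical example: an employment rate rises from 10% to 12%.
print(percentage_point_change(10.0, 12.0))  # 2.0 (percentage points)
print(percent_change(10.0, 12.0))           # 20.0 (percent)

# Hypothetical example: a coefficient of -0.05 on a treatment dummy, outcome in logs.
print(round(log_coef_to_percent(-0.05), 2))  # -4.88 (percent)
```

For small coefficients, 100β is a close approximation to the exact percent change, but the gap grows with
the size of the coefficient.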
Examples of sentences for interpreting significant results, assuming that you find a negative estimate
when regressing Y on X:
1. Correlation
• An additional X is associated with a reduction/decrease of Y by Z.
• A one percentage point increase in X corresponds to/is associated with a Z percentage point decrease
in Y.
• X and Y are negatively related/correlated.
• Y and X have a negative relationship/association.
• We find a negative relationship/association between X and Y.
• The negative relationship between X and Y is statistically different from zero.
2. Causation (You should avoid using these sentences unless your estimate captures a causal effect.)
• A one percentage point increase in X decreases Y by Z percentage points.
• The negative effect of X on Y is statistically different from zero.
6 Citing references
• Plagiarism is academic theft. Do not copy someone else’s words, sentences, thoughts, ideas, or
findings without referring to the original. If plagiarism is detected, it affects not only you, but also
your supervisors.
Examples of citation style:
1. At the end/middle of a sentence:
• Single author: (Smith, 2015)
• Two authors: (Smith and Ricardo, 2017)
• More than two authors: (Smith et al., 2015) or (Smith, Ricardo, and Marshall, 2015)
• More than one reference: (Smith and Ricardo, 2017; Smith et al., 2015)
2. In the text:
• Single author: “Smith (2015) shows that” or “Adam Smith (2015) shows that”
• Two authors: “Smith and Ricardo (2015) show that” or “Adam Smith and David Ricardo (2015)
show that”
• More than two authors: “Smith et al. (2015) show that” or “Smith, Ricardo, and Marshall (2015)
show that” or “Adam Smith, David Ricardo, and Alfred Marshall (2015) show that”
Although not shown in the examples, you may also include middle names.
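The citation patterns above are mechanical enough to write down as a small program. The following is a minimal
Python sketch of the short forms only (surnames, with "et al." for more than two authors). The function names
are hypothetical, and in practice a reference manager usually handles this for you.

```python
def author_label(surnames):
    """Join surnames following the examples: one name, two names, or 'et al.' for more than two."""
    if len(surnames) == 1:
        return surnames[0]
    if len(surnames) == 2:
        return f"{surnames[0]} and {surnames[1]}"
    return f"{surnames[0]} et al."


def parenthetical(refs):
    """Citation at the end or middle of a sentence; refs is a list of (surnames, year) pairs."""
    return "(" + "; ".join(f"{author_label(names)}, {year}" for names, year in refs) + ")"


def in_text(surnames, year):
    """Citation used as part of the sentence, e.g. as the subject of 'show that'."""
    return f"{author_label(surnames)} ({year})"


print(parenthetical([(["Smith"], 2015)]))
print(parenthetical([(["Smith", "Ricardo"], 2017), (["Smith", "Ricardo", "Marshall"], 2015)]))
print(in_text(["Smith", "Ricardo", "Marshall"], 2015))
```

Remember that the verb after an in-text citation agrees with the number of authors ("Smith (2015) shows" but
"Smith and Ricardo (2015) show").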
7 Useful sources
• This website (https://sites.google.com/site/mkudamatsu/tips4economists) contains a list of links to
useful sources on writing and other topics.
• Learn from already published papers!