The Gender Earnings Gap: Measurement and Analysis
Esfandiar Maasoumi
Emory University
Le Wang
University of New Hampshire, Harvard University, & IZA
Abstract
When summary measures of latent concepts such as the gender gap fail to be adequately representative, one must seek better definitions and measures. This paper presents a set of complementary concepts and measurements of the gender gap that move beyond the traditional summary comparisons of the earnings distributions. In particular, we propose a new concept of the gender gap based on the distance between entire distributions with compelling properties: it is free of outlier effects, is capable of representing populations with heterogeneous gaps at different parts of the outcome distributions, and is invariant to increasing transformations. When the gender gap is different, or even of different sign, at different quantiles, subjective judgments become inevitable in any summary, cardinal comparison. In response, we introduce tests based on stochastic dominance to allow for uniform rankings of the earnings distributions between men and women. Using the Current Population Survey data, we first construct a new series on the gender gap from 1976 to 2011 in the United States. We find that traditional representative or moment-based measures underestimate a declining trend in the gender gap during this period. More importantly, these traditional measures do not necessarily reflect the cyclicality of the gender differentials in earnings distributions, and may even lead to false conclusions about how labor market conditions are related to the gender gap at the aggregate level. Second, while we find first-order stochastic dominance in most cases, even for the recent recession where men were hit harder, we also find a few instances where definite conclusions regarding the gender gap cannot be drawn at all or only under more restrictive social evaluation functions. Finally, we conduct full distribution counterfactual analysis which suggests that, in many cases, altering the earnings structure would be more effective in improving women's welfare (reducing discrimination) than would changing human capital characteristics.
The authors would like to thank seminar participants at Bentley University, Union College, Emory University, and University of New Hampshire for their helpful comments.
1. Introduction
Studying the gender gap, generally understood as the earnings differences between men and women, is an important undertaking. Understanding inequality and inequity in a society, as well as labor market outcomes, is at the core of the social sciences, and it helps shed light on potential policy directions. Policy makers and economists are interested in several questions. One is, How large is the gender gap? Another is, How do women generally fare compared to men in the labor market? We emphasize the question, What is the gender gap, quantitatively speaking? The answers to these questions are more complex than is implicitly assumed in many of the current responses. They require a careful analysis of the earnings differentials between groups. When summary notions of the gap, such as average/mean or median earnings differentials, fail to be representative of diverse magnitudes and signs at different parts of the earnings distribution, the very notion of the gap is itself in need of reflection and innovation. This paper offers some proposals in this regard, and develops the statistical means for implementing them. We offer alternative summary measures to define the gap, as well as rigorous means of defining and testing rankings between entire distributions.
Conventional wisdom about the gender gap is that women do not fare as well as men do in the labor market. Often cited to support this view is the examination of the earnings differentials between average or median men and women (e.g. Polachek, 2006; Blau and Kahn, 2006). The sign of the difference tells us which group fares better in the labor market, and the magnitude is a measure of the severity of the situation. Although useful, the gender differences reported by these summary measures may not represent the gender earnings gap in all parts of the whole distribution of earnings. This is especially so when the sign and/or magnitude of the gap is different at different quantiles of the earnings distribution. Researchers are increasingly aware of these issues, and differences at other parts of the earnings distributions (e.g. quintiles and percentiles) have also been reported in recent years. These different summary statistics greatly improve our understanding of the extent and location of the gender gap, but may still present a bewildering view of the distribution of the gap, both at a given point in time and in its evolution over time. Better and more comprehensive summary measures can help.
All summary measures are aggregation devices which assign subjective, explicit and implicit weights to different groups and parts of distributions (high and low earners, for example). This calls for an examination of potential uniform rankings that are robust to the subjective weight distributions/welfare functions that underlie summary measures.
To make some of our points more concrete, consider the following numerical examples for a society with only two men (MA and MB) and two women (FA and FB).
Example 1 The difference in earnings between FA and MA (who have similar characteristics other than gender) is -$200, and that between FB and MB is -$500 (also with similar characteristics other than gender). A typical measure, the average difference, suggests that the gender gap is -$350. However, the average person here is not representative. And reporting either -$200 or -$500 would also ignore the other half of the society and consequently would not summarize the situation well. Note that in this example, at least the summary measure as well as the quantile differences are all negative, implying that men fare better than do women in the labor market.
Example 2 The difference in earnings between FA and MA is -$200, but that between FB and MB is now $200. The average gender gap is $0! Again, this fails to be representative, since it suggests there is no gender gap at all. Both the -$200 and the $200 strongly misrepresent the rest of the society. Compared to Example 1, only extremely subjective weights would support any ranking. Since these differences are of opposite signs, they suggest different rankings of the earnings distributions between men and women. Any conclusions would be based on an arbitrary weighting scheme.
Example 3 Consider another example similar to the second one but with more information on actual earnings for each individual. MA earns $55,000 and FA $54,800. MB earns $1,200 and FB $1,000. The difference at both parts of the distribution is -$200, and in one direction. The average gap of -$200 would give the same weight whatever the level of earnings. Given the additional information on each person's earnings, greater aversion to inequality may give greater weight to the difference in the lower tail, concluding existence of a greater gender gap in favor of men. However, these types of subjective preferences are usually not explicitly stated alongside summary measures. The typical utility function in economics is a von Neumann-Morgenstern type function, being increasing and concave. Can we empirically identify situations in which the gender gap will be ranked uniformly by all observers subscribing to any member of a family of utility functions?
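
The arithmetic behind the three examples can be reproduced in a few lines. The sketch below is only illustrative: it assumes gaps are measured as female minus male earnings (so a negative gap favors men), and the names `examples`, `summarize` and `relative_gaps` are introduced here and are not part of the paper.

```python
# Pairwise earnings gaps (female minus male; an assumed sign convention)
# for the two matched pairs in each of the three numerical examples.
examples = {
    "Example 1": [-200, -500],
    "Example 2": [-200, 200],
    "Example 3": [54_800 - 55_000, 1_000 - 1_200],
}


def summarize(gaps):
    """Return the mean gap and whether all pairwise gaps share one sign."""
    mean_gap = sum(gaps) / len(gaps)
    same_sign = all(g < 0 for g in gaps) or all(g > 0 for g in gaps)
    return mean_gap, same_sign


summaries = {name: summarize(gaps) for name, gaps in examples.items()}

# Example 3: the same $200 shortfall is a very different share of male
# earnings at the top (MA) and at the bottom (MB) of the distribution.
relative_gaps = [200 / 55_000, 200 / 1_200]

for name, (mean_gap, same_sign) in summaries.items():
    print(f"{name}: mean gap = {mean_gap:+.0f}, one sign throughout = {same_sign}")
print("Example 3 relative gaps:", [f"{r:.1%}" for r in relative_gaps])
```

The mean alone cannot distinguish a uniform gap from offsetting gaps of opposite sign, which is the motivation for comparing entire distributions.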
The measurement problem is even more acute when examining the time trend of the gender gap. The timing of temporal deviations from the long-run trend could vary across different measures, which could in turn lead to, for instance, a confused sense of the impact of business cycles on the gender gap. We first propose a distributional measure of the gender gap based on the normalized Bhattacharya-Matusita-Hellinger entropy measure proposed by Granger et al. (2004). One important feature, among others, of this measure is its ability to summarize the distance between two entire distributions, instead of binary differences at different parts of the distributions. Another advantage of our measure is its invariance to increasing transformations in general, and to the log transformation of earnings in particular. Second, we employ stochastic dominance (SD) tests to rank the earnings distributions. SD tests have been widely used to analyze poverty issues and financial outcomes, but not to analyze the gender earnings gap. The advantages of our SD approach are its explicit welfare underpinning, its utilization of the entire earnings distributions, and its ability to yield uniform rankings of distributions that are robust across a wide class of welfare functions, as well as underlying (and unknown) earnings distributions. Inferring a high dominance relation implies that comparisons based on multiple measures, while supported, are often unnecessary. Moreover, the inability to infer a dominance relation is equally informative, indicating that any ranking must be based on a particular weighting scheme or a specific welfare function. In the latter instances, conclusions are revealed to be highly subjective and may not be conducive to consensus policy-making.
Our methodology here is broadly applicable, including to counterfactual analysis of policy experiments, in which one compares the potential outcome distribution for individuals impacted by a policy with their actual outcomes. Policymakers and economists are often interested in certain policies to bridge the gender gap and improve the well-being of women. These policies can be loosely classified into two groups: (1) policies aimed at changing women's pay structure and (2) policies aimed at changing the observable characteristics that affect their earnings. These types of policies are related to two major reasons that many believe explain the differences in earnings between women and men: differences in wage structure and differences in human capital characteristics. The former is often identified with discrimination. It is useful to provide an assessment of the changes in the potential earnings among women resulting from any implemented policy. To evaluate a policy, we need to compare the original earnings distribution with the potential earnings distribution resulting from the policy. We employ new developments for identifying counterfactual earnings distributions based on estimated inverse probability weighting methods; see Fortin et al. (2011).
To illustrate our proposals, we utilize Current Population Survey (CPS) data for 1976-2011 in the U.S. for our empirical analysis. We reach several conclusions. First, we find that traditional summary measures severely underestimate the declining trend of the gender gap in the U.S. In particular, our entropy gender gap measure implies the gap narrowed at an average annual rate of about 11% during the period 1976-2011, while the largest annual rate recorded by conventional measures (based on the median) is only 5.2% during the same period. It may be helpful to note that entropy measures are functions of all the moments of a distribution, much like moment generating functions. As such, they gauge the convergence between the entire distributions for men and women accordingly. Our measure may thus be seen as a broad measure. This broad measure of the gender gap dropped precipitously before the 1990s, but the convergence has drastically slowed down since. Moreover, even though all measures show a consistently declining trend of the gender gap, the timing of temporal deviations from the long-run trend varies across different measures, which in turn can lead to a false reading of the impact of business cycles on the gender gap. Our results indicate that our broad measure of the gender gap is relatively insensitive to changes in economic conditions, except around 2001.
Second, comparing the actual earnings distributions, we generally observe first-order stochastic dominance to a degree of statistical confidence throughout the sample period, implying that men have generally performed better in the labor market. To our surprise, we find dominance even for the beginning of the recent recessionary period when men were viewed as having been hit harder than women. This conclusion is robust to a wide class of increasing social welfare functions. However, we do find several cases where the relation is only second-order, and one case where no statistically meaningful dominance exists. In these cases, the inference that men fare better than do women is only supported by a narrower class of social welfare functions that are both increasing and increasingly averse to inequality (concave). Moreover, it is less likely to find statistically significant first-order dominance relations during the pre-1994 period. These results altogether suggest that women's labor market situation has over time improved relative to men's. Nevertheless, strong evidence of dominance relations in most cases indicates that the improvement is still far from satisfactory; this result casts doubt on the broad effectiveness of the reforms intended to improve women's relative labor market outcomes.
Finally, combining the methods proposed here and the recent development in identification of counterfactual analysis, we compare the actual female earnings distribution with two female counterfactual distributions: (a) women's earnings distribution under the earnings structure of men (discrimination) and (b) women's earnings distribution should they have men's characteristics. The former captures structural effects and the latter composition effects. We find that structural effects are generally more important than composition effects. However, the importance of structural effects has declined over time, while that of composition effects has increased. Our SD results indicate that policies aimed at changing women's pay structure are generally effective in improving women's earnings prospects, while policies aimed at changing women's human capital characteristics are not. However, for policies aimed at changing pay structure, we fail to find first-order dominance, or even second-order dominance, to a statistical degree of confidence in a few cases. This implies that there are winners and losers, and broad policy conclusions cannot be drawn without imposing more subjective weights on subgroups through narrower well-being functions. We want to emphasize that the two thought experiments conducted here are by no means precisely identified with specific policies. However, they do represent the traits of the policies generally considered for relative improvement in women's labor market outcomes. Working through such exercises using our proposed tools illustrates the importance of these tools in a broader analysis of the gender gap.
While in this paper we focus on measurement and analysis of the broad gender earnings gap, we believe that our research could be further extended along several dimensions. First, our approach could be readily adapted to measure and analyze other types of earnings distances, such as those between advantaged and disadvantaged groups (e.g. white vs. black, the racial gap). Second, in this paper we focus on earnings as the only attribute of welfare. However, researchers have long recognized that welfare involves not only earnings but other attributes such as health, and, as a result, a growing literature has developed investigating multi-dimensional welfare measures that take into account earnings and other factors jointly (e.g. Wu et al., 2008). Our approach is constructed over the space of distributions and can be seamlessly applied to univariate and multi-outcome contexts. Finally, aggregate time series of our distributional measure of the gender gap, once obtained, can be used for further empirical analysis. For example, Biddle and Hamermesh (2011) have noted that little is known about how wage differentials vary with labor market conditions. Our measure can directly be used for this purpose (in the spirit of Ashenfelter (1970)) to examine the aggregate relationship between the gender gap and the aggregate unemployment rate.
The rest of the paper is organized as follows. Section 2 presents the empirical methods employed; Section 3 describes the data; Section 4 discusses the results; and Section 5 concludes.
2. Empirical Methodology
2.1. Basic Notations
To begin, let \(\ln(w^f)\) and \(\ln(w^m)\) denote the log of earnings for females and males, respectively. We observe a random sample of \(N = N_0 + N_1\) individuals. \(\{\ln(w_i^f)\}_{i=1}^{N_1}\) is a vector of \(N_1\) observations of \(\ln(w^f)\) (denoted by \(D_i = 1\)); similarly, \(\{\ln(w_i^m)\}_{i=1}^{N_0}\) is a vector of \(N_0\) observations of \(\ln(w^m)\) (denoted by \(D_i = 0\)). Let \(F_1(y) \equiv \Pr[\ln(w^f) \le y]\) represent the cumulative distribution function (CDF) of \(\ln(w^f)\) (i.e. the log of earnings for females) and \(f_1(y)\) the corresponding probability density function (PDF); \(F_0(y)\) and \(f_0(y)\) are similarly defined for \(\ln(w^m)\) (i.e. the log of earnings for males). Individual earnings are determined by both observable characteristics \(X_i\) and unobservable characteristics \(\epsilon_i\) via unknown wage structure functions,
\[ \ln(w_i^d) = g_d(X_i^d, \epsilon_i^d), \qquad d = m, f \]
This specification implies that the gender gap arises from three sources: (1) differences in the distributions of observable human capital characteristics \(X_i^d\) (e.g. years of schooling); (2) differences in the distributions of unobservable human capital characteristics \(\epsilon_i^d\) (e.g. innate ability); (3) differences in the wage structures, \(g_d(\cdot)\). Note that these wage functions are not restrictive and allow for complicated interactions among \(X_i^d\) and \(\epsilon_i^d\).
2.2. A Distributional Measure of the Gender Earnings Gap
Usually, the gender gap is defined as the difference in certain parts (or functionals) of the earnings distributions between males and females. For example, the average gender gap is the difference in the means of the earnings distributions between men and women, \(E[\ln(w_i^m)] - E[\ln(w_i^f)]\) (where the mean is the first moment of the earnings distribution). The gender gap at the \(p\)th quantile is \(\ln(w_i^m)^p - \ln(w_i^f)^p\), where the \(p\)th quantile of \(F_1\) (the CDF for women's wage distribution) is given by the smallest value \(\ln(w_i^f)^p\) such that \(F_1(\ln(w_i^f)^p) = p\); \(\ln(w_i^m)^p\) is similarly defined for \(F_0\) (the CDF for men's wage distribution). Even though these measures are all functionals of the wage distributions, none of them is able to summarize the information in the whole distribution. This problem is particularly acute when the gaps differ in magnitude and sign across the different measures used. Hence, what is needed is a distributional measure of the gender gap, or a measure of the distance between the earnings distributions of females and males.
Several commonly used information-based entropy measures, such as Shannon-Kullback-Leibler, are available to measure the information at the distributional level. However, Shannon's entropy measure, as well as almost all other entropy measures, is not a metric; these measures violate the triangle inequality and hence cannot be used as a measure of distance. To this end, we use a metric entropy measure
\[ S_\rho = \frac{1}{2} \int_{-\infty}^{\infty} \left( f_1^{1/2} - f_0^{1/2} \right)^2 dy \tag{1} \]
This measure satisfies several desirable properties as a distance metric between entire distributions: (1) it is well defined for both continuous and discrete variables;[1] (2) it is normalized to zero if \(Y_1\) and \(Y_0\) are equal, and lies between 0 and 1; (3) it is a metric and hence a true measure of distance; (4) it is invariant under continuous and strictly increasing transformations \(h(\cdot)\) of the underlying variables.[2] Recall that, following the literature, we utilize the log of earnings as the variable of main interest. Since the log is a strictly increasing function, our measure of the gender gap is the same whether we use raw wages or their log. Moreover, entropies are defined over the space of distributions and are consequently dimension-less, applying equally to univariate and multivariate contexts. Economists have been increasingly aware of the fact that evaluation of individual well-being is inevitably a multi-attribute exercise (Lugo and Maasoumi, 2008). This feature may become very useful when we consider a multidimensional gender gap measure that incorporates attributes other than wages.
[1] Although (1) presumes that the variables are continuous, one can easily adapt this measure to the case of discrete variables, \(S_\rho = \frac{1}{2} \sum \left( p_1^{1/2} - p_0^{1/2} \right)^2\), where \(p_1\) (\(p_0\)) is the marginal probability of the random variable \(Y_1\) (\(Y_0\)). This generalization allows us to measure the differences in a broader set of outcomes between groups at the distributional level.
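
The invariance of the measure to strictly increasing transformations can be checked with a change of variables. The following is a sketch under the added assumption that \(h\) is differentiable with \(h' > 0\); the symbol \(\tilde f_d\) for the density of the transformed variable \(h(Y_d)\) is introduced here for illustration only.

\[ \tilde f_d(z) = \frac{f_d\left(h^{-1}(z)\right)}{h'\left(h^{-1}(z)\right)}, \qquad d = 0, 1, \]
\[ \frac{1}{2} \int \left( \tilde f_1^{1/2}(z) - \tilde f_0^{1/2}(z) \right)^2 dz = \frac{1}{2} \int \frac{\left( f_1^{1/2}(y) - f_0^{1/2}(y) \right)^2}{h'(y)} \, h'(y) \, dy = S_\rho , \]

where the middle step substitutes \(z = h(y)\), \(dz = h'(y)\,dy\). The Jacobian term cancels, so the distance computed on \(h(Y_1)\) and \(h(Y_0)\) equals the distance computed on \(Y_1\) and \(Y_0\).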
Following Granger et al. (2004) and Maasoumi and Racine (2002), we consider a robust nonparametric kernel-based implementation of (1) (the computer code -srho-, written by the authors in Stata, is available upon request). In our illustrative example below, we use Gaussian kernels and a more robust version of the normal reference rule-of-thumb bandwidth (\(= 1.06 \min(\sigma_d, \mathrm{IQR}_d / 1.349) \, n^{-1/5}\), where \(\sigma_d\), \(d = m, f\), is the sample standard deviation of \(\{\ln(w_i^d)\}_{i=1}^{N_d}\), and \(\mathrm{IQR}_d\) is the interquartile range of the sample \(d\)). Interested readers are referred to Li and Racine (2007) for more sophisticated bandwidth selection procedures. Integrals are numerically approximated by the integrals of the fitted cubic splines of the data, which give superior results for most smooth functions (StataCorp, 2009). The asymptotic distribution of the feasible measure has been derived by Skaug and Tjostheim (1996) and Granger et al. (2004). However, these asymptotic approximations are well known to perform very poorly in almost every case examined. As a result, in the analysis below, we instead employ a bootstrap resampling procedure based on 299 replications to obtain critical values for testing the hypothesis \(H_0 : S_\rho = 0\).
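
A kernel-based estimate of this kind can be sketched in plain Python. This is not the authors' Stata routine: the function names are hypothetical, the data are simulated, a trapezoidal rule is swapped in for the cubic-spline integration, and the pooled-sample bootstrap used to approximate the null \(S_\rho = 0\) is one possible resampling scheme assumed here for illustration.

```python
import math
import random
import statistics


def robust_bandwidth(sample):
    """Rule of thumb: 1.06 * min(sd, IQR / 1.349) * n^(-1/5)."""
    sd = statistics.stdev(sample)
    q1, _, q3 = statistics.quantiles(sample, n=4)
    return 1.06 * min(sd, (q3 - q1) / 1.349) * len(sample) ** (-0.2)


def gaussian_kde(sample, bandwidth):
    """Return a function that evaluates the Gaussian kernel density estimate."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(y):
        return norm * sum(math.exp(-0.5 * ((y - x) / bandwidth) ** 2) for x in sample)

    return density


def s_rho(sample1, sample0, grid_size=200):
    """Estimate 0.5 * integral of (sqrt(f1) - sqrt(f0))^2 on an even grid."""
    h1, h0 = robust_bandwidth(sample1), robust_bandwidth(sample0)
    f1, f0 = gaussian_kde(sample1, h1), gaussian_kde(sample0, h0)
    lo = min(min(sample1), min(sample0)) - 4.0 * max(h1, h0)
    hi = max(max(sample1), max(sample0)) + 4.0 * max(h1, h0)
    step = (hi - lo) / grid_size
    vals = []
    for i in range(grid_size + 1):
        y = lo + i * step
        vals.append((math.sqrt(f1(y)) - math.sqrt(f0(y))) ** 2)
    # Trapezoidal rule (an assumption; the text uses cubic-spline integration).
    return 0.5 * step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


def bootstrap_pvalue(sample1, sample0, n_boot=299, seed=0):
    """Pooled bootstrap: share of resampled statistics >= the observed one."""
    rng = random.Random(seed)
    observed = s_rho(sample1, sample0)
    pooled = list(sample1) + list(sample0)
    exceed = 0
    for _ in range(n_boot):
        b1 = rng.choices(pooled, k=len(sample1))
        b0 = rng.choices(pooled, k=len(sample0))
        if s_rho(b1, b0) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_boot)


# Simulated log earnings (hypothetical numbers, not CPS data).
rng = random.Random(42)
women = [rng.gauss(10.2, 0.6) for _ in range(60)]
men = [rng.gauss(10.5, 0.6) for _ in range(60)]

# 99 replications keep the demo fast; the paper uses 299.
observed, p_value = bootstrap_pvalue(women, men, n_boot=99)
print(f"S_rho estimate: {observed:.4f}, bootstrap p-value: {p_value:.3f}")
```

Because the two densities are smoothed with sample-specific bandwidths, the estimate is exactly zero only when both samples coincide; in small samples it is positive even under the null, which is why resampled critical values are needed.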
Our entropy measure of the gender gap gives us information on the strong ranking of two wage distributions. However, it does not directly tell us which distribution is (weakly) uniformly better relative to large classes of welfare functions, and under what conditions. Below, we explicitly introduce these concepts to rank/compare two distributions.
[2] The integrated squared norm (L2) also shares many of these properties, but it is not normalized and is not invariant to transformations. It is also thought to be more sensitive to inliers and outliers (Hart, 1997).
2.3. Stochastic Dominance
We employ recent tests for Stochastic Dominance (SD) to enable uniform welfare comparisons of the earnings distributions between females and males (\(\ln(w^f)\) and \(\ln(w^m)\)). The SD approach identifies for which class of social welfare functions rankings of the earnings distributions are possible. In this paper, we consider two classes of social welfare functions that are commonly used in economics and finance. Let \(U_1\) denote the class of all increasing von Neumann-Morgenstern type social welfare functions \(u\) such that welfare is increasing in wages (i.e. \(u' \ge 0\)), and \(U_2\) the class of social welfare functions in \(U_1\) such that \(u'' \le 0\) (i.e. concave). Concavity implies an aversion to higher dispersion (or inequality, or risk) of wages across individuals. We are interested in the following scenarios:
Case 1 (First Order Dominance):
Male Earnings (\(\ln(w^m)\)) First Order Stochastically Dominates Female Earnings (\(\ln(w^f)\)) (denoted \(\ln(w^m)\) FSD \(\ln(w^f)\)) if and only if
1. \(E[u(\ln(w^m))] \ge E[u(\ln(w^f))]\) for all \(u \in U_1\) with strict inequality for some \(u\);
2. Or, \(F_0(y) \le F_1(y)\) for all \(y\) with strict inequality for some \(y\).
Case 2 (Second Order Dominance):
Male Earnings (\(\ln(w^m)\)) Second Order Stochastically Dominates Female Earnings (\(\ln(w^f)\)) (denoted \(\ln(w^m)\) SSD \(\ln(w^f)\)) if and only if
1. \(E[u(\ln(w^m))] \ge E[u(\ln(w^f))]\) for all \(u \in U_2\) with strict inequality for some \(u\);
2. Or, \(\int_{-\infty}^{y} F_0(t)\,dt \le \int_{-\infty}^{y} F_1(t)\,dt\) for all \(y\) with strict inequality for some \(y\).
These two cases imply rankings of the earnings distributions under different conditions. Specifically, if Case 1 holds (\(\ln(w^m)\) FSD \(\ln(w^f)\)), then the earnings distribution among men is better than that among women for all policymakers with increasing utility functions in the class \(U_1\) (with strict inequality holding for some welfare function(s) in the class), since the expected social welfare from \(\ln(w^m)\) is at least as large as that from \(\ln(w^f)\). Note that \(\ln(w^m)\) FSD \(\ln(w^f)\) implies that average male wages are greater than average female wages. However, a ranking of the average wages does not imply that one distribution first-order dominates the other; rather, the entire distribution matters (Mas-Colell et al., 1995, p.196). Similarly, if \(\ln(w^m)\) SSD \(\ln(w^f)\), then the earnings distribution of males is better than that of females for all those with any increasing and concave welfare functions in the class \(U_2\) (with strict inequality holding for some utility function(s) in the class). Note that FSD implies SSD. One immediate advantage of this approach is that our conclusions do not depend on any specific wage distributions and/or weights assigned to subgroups within the population. This approach is thus able to yield uniform rankings of distributions that are robust across a wide class of welfare functions, rendering comparisons based on specific indices unnecessary, but possible and more broadly supported. Higher order SD rankings are based on narrower classes of welfare functions. For instance, Third Order dominance is associated with welfare functions with increasing aversion to inequality which place greater weight on welfare improving transfers at the lower tails of the earnings distribution.
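
The link between first-order dominance and the ranking of means can be made explicit with a standard identity, stated here as a sketch under the added assumption that both means are finite:

\[ E[\ln(w^m)] - E[\ln(w^f)] = \int_{-\infty}^{\infty} \left[ F_1(y) - F_0(y) \right] dy . \]

Under FSD the integrand is non-negative everywhere and strictly positive on some interval, so the mean difference is positive. The converse fails because the integral can be positive while \(F_1(y) - F_0(y)\) changes sign, which is exactly the situation in which the CDFs cross and a ranking of means hides disagreement across quantiles.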
In this paper, we employ stochastic dominance tests based on a generalized Kolmogorov-Smirnov test discussed in Linton et al. (2005) and Maasoumi and Heshmati (2000). The Kolmogorov-Smirnov test statistics for FSD and SSD are based on the following functionals:
\[ d = \sqrt{\frac{N_0 N_1}{N_0 + N_1}} \, \min \sup_{y} \left[ F_1(y) - F_0(y) \right] \tag{2} \]
\[ s = \sqrt{\frac{N_0 N_1}{N_0 + N_1}} \, \min \sup_{y} \int_{-\infty}^{y} \left[ F_1(t) - F_0(t) \right] dt \tag{3} \]
The test statistics are based on the sample counterparts of \(d\) and \(s\), obtained by replacing the CDFs with empirical ones; the empirical CDFs are given by \(\hat F_1(y) = \frac{1}{N_1} \sum_{i=1}^{N_1} I(\ln(w_i^f) \le y)\), where \(I(\cdot)\) is an indicator function; \(\hat F_0(y)\) is similarly defined. The underlying distributions of the test statistics are generally unknown and depend on the data. Following the literature (e.g. Maasoumi and Heshmati, 2000; Millimet and Wang, 2006), we use bootstrap techniques for iid samples based on 299 replications to obtain the actual sampling distributions of the test statistics. This approach estimates the probability of the statistics falling in any desired interval, and indicates where the sample value of the test statistic lies. For instance, if the probability of \(d\) lying in the non-positive interval (i.e. \(\Pr[d \le 0]\)) is large, say .90 or higher, and \(\hat d \le 0\), we can infer FSD to a high degree of statistical confidence. We can infer SSD based on \(s\) and \(\Pr[s \le 0]\) in a similar fashion. All technical details are presented in Appendix 1.
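
The sample statistics and the bootstrap probabilities can be sketched as follows. This is an illustration only, not the procedure of Appendix 1: it assumes the min in (2) and (3) is taken over the two possible orderings of the distributions, it resamples with replacement within each group, and all function names and simulated numbers are hypothetical.

```python
import bisect
import math
import random


def ecdf(sample):
    """Return the empirical CDF of `sample` as a function of y."""
    xs = sorted(sample)
    n = len(xs)
    return lambda y: bisect.bisect_right(xs, y) / n


def sd_statistics(female, male):
    """Sample analogues of d and s, evaluated exactly at the pooled data points."""
    f1, f0 = ecdf(female), ecdf(male)
    points = sorted(set(female) | set(male))
    diff = [f1(y) - f0(y) for y in points]  # F1 - F0 at each pooled point
    # Integral of F1 - F0 from -infinity up to each pooled point; the
    # integrand is a step function, so the running sum below is exact.
    integral = [0.0]
    for k in range(1, len(points)):
        integral.append(integral[-1] + diff[k - 1] * (points[k] - points[k - 1]))
    n1, n0 = len(female), len(male)
    scale = math.sqrt(n0 * n1 / (n0 + n1))
    # Assumed reading of "min sup": the smaller of the two directional suprema.
    d = scale * min(max(diff), max(-v for v in diff))
    s = scale * min(max(integral), max(-v for v in integral))
    return d, s


def bootstrap_prob_nonpositive(female, male, n_boot=299, seed=0):
    """Bootstrap estimates of Pr[d <= 0] and Pr[s <= 0]."""
    rng = random.Random(seed)
    d_hits = s_hits = 0
    for _ in range(n_boot):
        fb = rng.choices(female, k=len(female))
        mb = rng.choices(male, k=len(male))
        d_b, s_b = sd_statistics(fb, mb)
        d_hits += d_b <= 0
        s_hits += s_b <= 0
    return d_hits / n_boot, s_hits / n_boot


# Toy checks: a pure location shift gives FSD; equal means with less
# dispersion for men give SSD but not FSD.
d_shift, s_shift = sd_statistics([1, 2, 3, 4], [2, 3, 4, 5])
d_spread, s_spread = sd_statistics([0, 9], [4, 5])

# Simulated log earnings (hypothetical numbers, not CPS data).
rng = random.Random(1)
women = [rng.gauss(10.2, 0.6) for _ in range(150)]
men = [rng.gauss(10.5, 0.6) for _ in range(150)]
d_hat, s_hat = sd_statistics(women, men)
pr_d, pr_s = bootstrap_prob_nonpositive(women, men)
print(f"d = {d_hat:.3f}, s = {s_hat:.3f}, Pr[d<=0] = {pr_d:.2f}, Pr[s<=0] = {pr_s:.2f}")
```

With this reading of the min, a non-positive statistic signals dominance in some direction; which group dominates can be read from which of the two directional suprema is non-positive.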
2.4.   Counterfactual   Distributions
We  are  often  interested  in  assessing  two  types  of  counterfactual   situations:   First,   what
if  we  interchange  the  wage  structure  of  women  with  the  wage  structure  of  men,  holding  the
distribution  of  womens  human  capital  characteristics  constant?  Second,  what  if  we  change
the distribution of womens human capital characteristics to that of mens, holding the wage
structure  unchanged?   Will  these  counterfactual  distributions  be  dierent  from  the  original
one?  Will these dierences necessarily cover any distance between the earnings distribution-
s?  Our proposed approaches can be readily applied to answer these counterfactual questions
by measuring the distances between the female earnings distribution and the counterfactual
distribution,  and  by  ranking  them.   An  important  step  is  to  identify  the  counterfactual  dis-
tributions  of  interest.   Specically,  we  want  to  identify  the  following  counterfactual  outcome
distributions:
$$\ln(w_i^{c1}) = g_0(X_{i1}, \varepsilon_{i1}) \qquad \text{(Counterfactual Outcome \#1)} \tag{4}$$
$$\ln(w_i^{c2}) = g_1(X_{i0}, \varepsilon_{i0}) \qquad \text{(Counterfactual Outcome \#2)} \tag{5}$$
$F_{c1}$ ($f_{c1}$) represents the CDF (PDF) of the counterfactual outcome $\ln(w_i^{c1})$, and
$F_{c2}$ ($f_{c2}$) represents the CDF (PDF) of the counterfactual outcome $\ln(w_i^{c2})$.
Notice that the differences between the distributions $F_{c1}$ and $F_1$ ($\ln(w_i^{c1})$ vs. $\ln(w_i^{f})$) come
from differences in wage structures; comparing these two distributions thus provides
insight into potential discrimination. On the other hand, the differences between the distributions
$F_{c2}$ and $F_1$ ($\ln(w_i^{c2})$ vs. $\ln(w_i^{f})$) come solely from differences in the distribution of human
capital characteristics; this comparison thus provides some insight into the gender gap due
to productivity differences across gender.
As shown in Firpo (2007, Lemma 1), the counterfactual distributions are identified under
the following assumptions:
[A1.] Unconfoundedness/Ignorability: Let $(D, X, \varepsilon)$ have a joint distribution. For all
$x$, $\varepsilon$ is independent of $D$ conditional on $X = x$, where, as defined above, $D = 1$ for
females and $D = 0$ for males; $X$ and $\varepsilon$ are observable and unobservable human capital
characteristics, respectively.
[A2.] Common Support: For all $x$, $0 < p(x) = \Pr[D = 1 \mid X = x] < 1$.
The counterfactual outcome CDF of $\ln(w_i^{c1})$ is identified as
$F_{c1}(y) = E[\omega_{c1}(D, X) \cdot I(\ln(w_i) \le y)]$, where
$\omega_{c1}(D, X) = \left(\frac{p(x)}{1 - p(x)}\right) \cdot \left(\frac{1 - D}{p}\right)$.
The counterfactual outcome CDF of $\ln(w_i^{c2})$, $F_{c2}$,
is similarly identified. In practice, the score $p(x)$ is estimated by probit or logit. Here we
employed probit. $p$ is the corresponding unconditional probability, $\Pr[D = 1]$, obtained by
averaging over the characteristics $X$. Both assumptions (A1) and (A2) are commonly used in the
literature. Assumption (A1) implies here that, given the values of observable human capital
characteristics $X$, the distribution of unobservable human capital characteristics such as ability
is independent of gender. Assumption (A2) rules out the possibility that a particular value $x$ is
observed only among men or only among women, and that the set of wage determinants $(X, \varepsilon)$
differs across gender. Interested readers are referred to, e.g., Fortin et al. (2011) for detailed
explanations of these two assumptions. $p(x)$ is the selection probability (score) for each individual.
It is estimated by a Logit model of a set of commonly employed characteristics $X$; see below for
descriptions. Once we identify the counterfactual distributions of interest, we can then perform our
counterfactual analysis using the approaches discussed above.
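The reweighting formula for $F_{c1}$ can be illustrated with a short sketch. It is a sample analog written under stated assumptions rather than the paper's code: the propensity scores `pscore` are taken as already estimated (by probit or logit), $p$ is replaced by the sample share with $D = 1$, and the function and argument names are hypothetical.

```python
def counterfactual_cdf(y, d, ln_wage, pscore):
    """Sample analog of F_c1(y) = E[omega_c1(D, X) * 1{ln(w) <= y}].

    d       : list of 0/1 indicators (1 = female, 0 = male)
    ln_wage : list of observed log wages
    pscore  : list of estimated p(x) = Pr[D = 1 | X = x], assumed given
    """
    n = len(d)
    p = sum(d) / n  # sample share with D = 1, standing in for p
    total = 0.0
    for d_i, w_i, px_i in zip(d, ln_wage, pscore):
        # The weight is zero for women (1 - D = 0) and reweights men so that
        # their characteristics resemble the female distribution.
        omega = (px_i / (1.0 - px_i)) * ((1 - d_i) / p)
        if w_i <= y:
            total += omega
    return total / n


# Toy data: two cells of X, with p(x) = 0.25 in the first and 0.75 in the second.
d = [1, 0, 0, 0, 1, 1, 1, 0]
ln_wage = [1.8, 2.0, 2.0, 2.0, 2.7, 2.8, 2.9, 3.0]
pscore = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]
cdf_at_2_5 = counterfactual_cdf(2.5, d, ln_wage, pscore)
```

In the toy data, one quarter of the women fall in the first cell, so the counterfactual CDF evaluated between the two male wage levels equals that share: men's wages are kept, but they are weighted by women's characteristics.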
3.   Data
To perform our analysis, we use data from the 1976-2011 March Current Population
Survey (CPS) (available at http://cps.ipums.org, King et al., 2010). The March CPS is
a large, nationally representative household survey that contains detailed information on labor
market outcomes such as earnings and other characteristics needed for our counterfactual
analysis. It has thus been widely used in the literature to study the gender gap (e.g., Waldfogel
and Mayer, 2000). We begin at 1976 since it was the first year in which information on
weeks worked and hours worked is available in the March CPS. We restrict our sample to
individuals aged between 18 and 64 who work only for wages and salary. To ensure that our
sample includes only those workers with stronger attachment to the labor market, we include
only those who worked for 20 weeks or more in the previous year. Moreover,
we exclude part-time workers who worked less than 35 hours per week in the previous year.
Following the literature (e.g., Blau and Kahn, 1997), we use the log of hourly wages,
measured by an individual's wage and salary income for the previous year divided by the
product of the number of weeks worked and hours worked per week. The differences in the distributions
of log hourly wages between men and women are our measures of the gender gap. The
differences in a specific part of the distribution can be interpreted as percentage differences.
Note, however, that our distributional measure of the gender gap and SD tests are invariant
to increasing monotonic transformations, while conventional measures of the gender gap are not.
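The sample restrictions and the wage variable described above can be written compactly. The sketch below is only illustrative: the record keys (`age`, `wage_salary_only`, `weeks_worked`, `hours_per_week`, `wage_income`) are hypothetical names, not CPS variable names, and the positive-income check is an added assumption needed for the logarithm to be defined.

```python
import math


def log_hourly_wage(rec):
    """Return ln(hourly wage) if the record passes the sample restrictions,
    otherwise None. All dictionary keys are hypothetical."""
    in_sample = (
        18 <= rec["age"] <= 64
        and rec["wage_salary_only"]        # works only for wages and salary
        and rec["weeks_worked"] >= 20      # 20 weeks or more last year
        and rec["hours_per_week"] >= 35    # drops part-time workers
        and rec["wage_income"] > 0         # assumption: needed for the log
    )
    if not in_sample:
        return None
    hours_last_year = rec["weeks_worked"] * rec["hours_per_week"]
    return math.log(rec["wage_income"] / hours_last_year)


full_time = {"age": 30, "wage_salary_only": True, "weeks_worked": 50,
             "hours_per_week": 40, "wage_income": 40000.0}
part_time = dict(full_time, hours_per_week=30)
```

Here `full_time` earns 40000 over 2000 hours, so its value is the log of an hourly wage of 20, while `part_time` is excluded.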
In our counterfactual analysis, we include age, age squared, education (four education
groups: Below High School, High School, 1-3 Years of College, and College and Above),
current marital status (1 if non-married and zero otherwise), race (1 if non-white and zero
otherwise), and region (northeast, midwest, south, and west). We also include occupations,
which are divided into three categories: high-skill (managerial and professional specialty
occupations); medium-skill (technical, sales, and administrative support occupations); and
low-skill (other occupations such as helpers, construction, and extractive occupations).
4.   Results
4.1.   Baseline  Analysis
4.1.1.   Trend  of  the  Gender  Gap  1976  -  2011
Table (1) reports a number of popular measures of the gender gap. Column (1) displays
our distributional measure of the gender gap, $S_\rho$. Recall that $S_\rho$ is statistically significantly
different from zero (it is larger than the critical values calculated at the 99th percentiles of the
bootstrapped distribution of $S_\rho$ in all cases).
It is important to note a crucial difference between our broad entropy measure and others
in terms of standardization. Our measure is invariant to the logarithmic transformation of
the earnings series since, as was stated earlier, it is invariant to all monotonic (linear or
nonlinear) transformations. There is no need to reinterpret it according to data transformations.
This is not so for all the other metrics in this table, since they will change depending on
whether one uses the actual earnings series, their logarithm, or some other transformation.
We note that our entropy measure, being a function of many moments of the earnings
distributions, is able to account for increasing earnings that are accompanied by greater
dispersion (inequality increasing). Indeed, our entropy measure is based on a generalized
version of Theil's inequality measures.
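The invariance claim can be checked with a change of variables. The sketch below assumes that the entropy measure takes the normalized Hellinger-type form commonly used for $S_\rho$, with $f_1$ and $f_2$ denoting the two earnings densities, and that $h$ is any strictly increasing, differentiable transformation (for example $h = \ln$); the symbols $h$ and $g_j$ are introduced here only for illustration.

```latex
S_\rho = \frac{1}{2}\int \left( f_1^{1/2}(x) - f_2^{1/2}(x) \right)^2 dx .
% Let y = h(x) with h'(x) > 0. The densities of the transformed variable are
g_j(y) = \frac{f_j(x)}{h'(x)}, \qquad x = h^{-1}(y), \quad j = 1, 2 .
% Substituting, with dy = h'(x)\,dx:
\frac{1}{2}\int \left( g_1^{1/2}(y) - g_2^{1/2}(y) \right)^2 dy
  = \frac{1}{2}\int \frac{\left( f_1^{1/2}(x) - f_2^{1/2}(x) \right)^2}{h'(x)} \, h'(x)\, dx
  = S_\rho .
```

By contrast, a percentile difference $Q_1(\tau) - Q_2(\tau)$ becomes $h(Q_1(\tau)) - h(Q_2(\tau))$ after the transformation, which in general changes with $h$.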
The differences in log earnings between men and women at the selected percentiles of the
earnings distribution are consistently positive, suggesting that men earn more than
women do. However, the implied size of the gender earnings differential in the economy
varies with the conventional measure used. For example, in 1976, the gender gap measure
at the 10th percentile indicates the gap is about 37 percentage points, while the measure
at the 90th percentile implies that it is more than 50 percentage points. The difference is
as large as 13 percentage points. The differences at other parts of the earnings distribution
indicate the gender gap is between 45 and 47 percentage points. Even though they consistently
suggest the existence of the gender gap, none of the conventional measures at a specific
part of the distribution seems to represent the gender gap in the rest of the distribution.
What is left to be determined is how to aggregate the different levels of the gap at different
quantiles, that is, what welfare function weights to use. The mean gap uses equal weights
at all earnings levels; see our example/case 3 above.
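The conventional measures discussed above are simple to compute, which makes their disagreement easy to see. The following sketch uses a nearest-rank percentile and made-up log wages; the function names, the default set of percentiles, and the toy data are illustrative assumptions, not the paper's data or code.

```python
import math


def percentile(sample, q):
    """Nearest-rank percentile for an integer q in (0, 100]."""
    s = sorted(sample)
    rank = max(1, math.ceil(q * len(s) / 100))
    return s[rank - 1]


def percentile_gaps(ln_wage_men, ln_wage_women, qs=(10, 25, 50, 75, 90)):
    """Male minus female log wage at each selected percentile."""
    return {q: percentile(ln_wage_men, q) - percentile(ln_wage_women, q)
            for q in qs}


def mean_gap(ln_wage_men, ln_wage_women):
    """Difference in mean log wages (equal weight on every observation)."""
    return (sum(ln_wage_men) / len(ln_wage_men)
            - sum(ln_wage_women) / len(ln_wage_women))


# Toy data in which the gap widens toward the top of the distribution.
women = [float(k) for k in range(1, 11)]
men = [1.3, 2.3, 3.3, 4.4, 5.4, 6.4, 7.4, 8.5, 9.5, 10.5]
gaps = percentile_gaps(men, women)
```

In this toy example the gap is smaller at the 10th percentile than at the 90th, and the mean gap lies in between, so no single number describes the whole distribution.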
There is a further difficulty in assessing the conventional measures when one
examines the long-run trend in the gender gap. Looking at the trend from 1976 to 2011,
we see a decrease in the difference between the earnings distributions of men and women
over the past four decades, regardless of which measure is used. However, the decrease is
not monotonic over time, and the timing of temporal deviations from the long-run trend
varies dramatically across the measures used. To ease the presentation, we report the
patterns of changes in different measures in Table (??). The cells with "I" highlighted in
green are the years when the measure increased, while the cells with "D" highlighted in light
grey are the years when the measure decreased. As we can clearly see, the conventional
measures of the gender gap generally do not move in the same direction together, except
in a few years (1980, 1984, 1988, 1990, 1997, and 2004). For example, the gender gap at
the median increased in 1977, while the gender gap at other selected parts of the earnings
distributions between men and women decreased. As a result, it is not clear why any of
the conventional measures is representative of the rest of the earnings distribution and
informative of the general trend of the gender gap in society.
On the other hand, our entropy measure of the gender gap, $S_\rho$, indicates
the broad gender gap decreased in 1977, which is consistent with the decrease at all parts but
the 90th percentile; $S_\rho$ and other measures reported in Table (1) are not directly comparable.
To further ease the comparisons of the patterns of the time trend implied by different
measures, we normalize these measures. In particular, we first set the value of all measures
in 1976 to 100 and generate normalized values based on the original growth rates. These
normalized values are shown in Figure (??). As we can see, while both the measures at the
10th and 25th percentiles traced out the path of $S_\rho$
       Female vs.           Female vs.           Female vs.
       Male                 Counterfactual #1    Counterfactual #2
Year   90th   95th   99th   90th   95th   99th   90th   95th   99th
       (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)    (9)
1976   0.06   0.07   0.08   0.07   0.08   0.10   0.07   0.08   0.10
1977   0.04   0.05   0.05   0.06   0.06   0.07   0.05   0.06   0.06
1978   0.04   0.05   0.05   0.07   0.07   0.08   0.06   0.07   0.08
1979   0.05   0.05   0.06   0.06   0.06   0.07   0.05   0.05   0.07
1980   0.05   0.05   0.06   0.06   0.06   0.07   0.05   0.05   0.07
1981   0.04   0.04   0.04   0.05   0.05   0.06   0.04   0.04   0.05
1982   0.05   0.05   0.06   0.06   0.07   0.08   0.05   0.06   0.06
1983   0.04   0.05   0.05   0.05   0.06   0.07   0.05   0.05   0.06
1984   0.04   0.05   0.05   0.05   0.06   0.07   0.05   0.05   0.06
1985   0.04   0.05   0.05   0.05   0.05   0.06   0.04   0.05   0.06
1986   0.04   0.05   0.05   0.06   0.07   0.08   0.06   0.06   0.07
1987   0.04   0.05   0.06   0.06   0.06   0.07   0.05   0.05   0.06
1988   0.04   0.04   0.05   0.05   0.05   0.06   0.04   0.04   0.05
1989   0.04   0.05   0.05   0.05   0.06   0.07   0.05   0.05   0.06
1990   0.03   0.04   0.05   0.04   0.05   0.06   0.04   0.04   0.05
1991   0.03   0.04   0.04   0.04   0.04   0.05   0.04   0.04   0.05
1992   0.04   0.04   0.05   0.05   0.05   0.06   0.04   0.04   0.05
1993   0.04   0.04   0.05   0.04   0.05   0.06   0.04   0.04   0.05
1994   0.04   0.05   0.05   0.05   0.06   0.07   0.05   0.06   0.07
1995   0.04   0.05   0.05   0.05   0.05   0.05   0.04   0.05   0.05
1996   0.05   0.05   0.06   0.06   0.06   0.07   0.05   0.05   0.06
1997   0.06   0.06   0.07   0.06   0.06   0.07   0.05   0.06   0.07
1998   0.04   0.05   0.06   0.05   0.06   0.07   0.05   0.05   0.07
1999   0.04   0.05   0.06   0.05   0.05   0.06   0.05   0.05   0.06
2000   0.05   0.06   0.07   0.06   0.06   0.07   0.04   0.05   0.06
2001   0.04   0.04   0.04   0.04   0.05   0.05   0.03   0.04   0.04
2002   0.03   0.03   0.03   0.04   0.04   0.04   0.03   0.04   0.04
2003   0.03   0.04   0.04   0.04   0.04   0.05   0.03   0.04   0.04
2004   0.04   0.04   0.04   0.04   0.04   0.05   0.04   0.04   0.05
2005   0.04   0.04   0.04   0.04   0.05   0.05   0.04   0.04   0.05
2006   0.04   0.04   0.04   0.04   0.05   0.05   0.04   0.04   0.05
2007   0.04   0.04   0.04   0.04   0.05   0.05   0.04   0.04   0.05
2008   0.03   0.04   0.04   0.04   0.04   0.05   0.03   0.04   0.04
2009   0.04   0.04   0.05   0.05   0.05   0.06   0.04   0.04   0.05
2010   0.04   0.04   0.05   0.05   0.05   0.06   0.04   0.04   0.05
2011   0.04   0.04   0.04   0.04   0.05   0.05   0.03   0.04   0.04
1
Data Source: IPUMS CPS (http://cps.ipums.org/cps/). Columns (1)-(3) report the 90th, 95th, and
99th percentiles obtained under the null of no difference between male and female wage distributions
($S_\rho \times 100$). Columns (4)-(6) report the 90th, 95th, and 99th percentiles obtained under the null of no
difference between female and counterfactual wage #1 distributions ($S_\rho \times 100$). Columns (7)-(9)
report the 90th, 95th, and 99th percentiles obtained under the null of no difference between female and
counterfactual wage #2 distributions ($S_\rho \times 100$).