Consumer-Optimal Segmentation in Multi-Product Markets: Dirk Bergemann Tibor Heumann Michael C. Wang December 24, 2024


Consumer-Optimal Segmentation in Multi-Product Markets∗
Dirk Bergemann† Tibor Heumann‡ Michael C. Wang§
arXiv:2401.12366v2 [econ.TH] 23 Dec 2024

December 24, 2024

Abstract

We analyze how market segmentation affects consumer welfare when a monopolist
can engage in both second-degree price discrimination (through product differentiation)
and third-degree price discrimination (through market segmentation). We characterize
the consumer-optimal market segmentation and show that it has several striking
properties: (1) the market segmentation displays monotonicity, in that higher-value customers
always receive a higher-quality product than lower-value customers, both within and across
segments; and (2) when aggregate demand elasticity exceeds a threshold determined by
marginal costs, no segmentation maximizes consumer surplus. Our results
demonstrate that strategic market segmentation can benefit consumers even when it
enables price discrimination, but these benefits depend critically on demand elasticities
and cost structures. The findings have implications for regulatory policy regarding
price discrimination and market segmentation practices.

JEL Classification: D42, D83, L12


Keywords: Price Discrimination, Nonlinear Pricing, Private Information, Second
Degree Price Discrimination, Third Degree Price Discrimination, Pareto Distribution,
Generalized Pareto Distribution, Bayesian Persuasion


∗ An early version of this paper, in a more limited setting, appeared under the title “A Unified
Approach to Second and Third Degree Price Discrimination.” We acknowledge financial support from NSF
grants SES-2001208 and SES-2049744. We have benefitted from many conversations and related joint work
with Ben Brooks and Stephen Morris.
† Department of Economics, Yale University, dirk.bergemann@yale.edu
‡ Pontificia Universidad Católica de Chile, tibor.heumann@uc.cl
§ Department of Economics, Yale University, michael.wang.mcw75@yale.edu

Contents
1 Introduction
1.1 Motivation and Results
1.2 Related Literature

2 Setup

3 The Binary Value Case

4 Consumer-Optimal Segmentations
4.1 Segmentation in Regular Markets
4.2 Value of Segmentation
4.3 Two Forms of Persuasion
4.4 Properties of the Consumer-Optimal Segmentation

5 Computing the Value of Segmentation
5.1 When No Segmentation is Optimal
5.2 Conditions on Cost Function and Aggregate Market

6 Isoelastic Cost

7 Surplus-Sharing Segmentations
7.1 Pareto-Efficient Segmentations
7.2 Surplus-Sharing Frontier

8 Concavification and Extreme Points
8.1 Extreme Points
8.2 Local Segmentations

9 Conclusion

A Proof Details

B Additional Results: Discrete Goods
B.1 Consumer-Optimal Segmentation
B.2 From Concavification to Extreme Points
B.3 Unconstrained Consumer Maximization
1 Introduction
1.1 Motivation and Results
In the digital economy, firms increasingly segment their markets with unprecedented preci-
sion. Digital platforms adjust prices based on customer browsing history, streaming services
offer differentiated subscription tiers, and airlines practice sophisticated yield management.
While such practices traditionally raise concerns about consumer exploitation through price
discrimination, their welfare implications remain ambiguous, particularly when firms can
simultaneously adjust both prices and product qualities across different market segments.
The welfare implications of market segmentation have long interested economists. While
third-degree price discrimination—charging different prices to distinct market segments—
may either benefit or harm consumers, a similar ambiguity exists for second-degree price
discrimination, where sellers screen consumers through quality-differentiated product menus.
However, most research has analyzed these practices in isolation, leaving open the question
of how they interact when deployed simultaneously, as is increasingly common in practice.
This paper provides a comprehensive analysis of consumer-optimal market segmentation
when a monopolist can employ both forms of price discrimination. We characterize the
segmentation strategies that maximize consumer welfare and show that they exhibit several
surprising properties. First, despite the monopolist’s ability to offer different qualities to
identical consumers across segments, the optimal segmentation maintains nearly uniform quality
provision: consumers with the same value receive similar, and frequently identical, quality
levels regardless of their segment, though they may pay different prices. Second, quality
monotonicity is preserved both within and across segments—higher-value consumers always
receive higher quality products. Third, when aggregate demand is sufficiently elastic relative
to the seller’s cost structure, no segmentation maximizes consumer surplus.
These findings significantly extend the work of Bergemann et al. (2015), who showed
that an appropriate market segmentation can generate efficient allocation and maximize
consumer surplus in unit demand settings. We demonstrate that with multiple products and
quality differentiation, perfect efficiency is generally unattainable, but strategic segmentation
can still substantially benefit consumers. Our analysis also complements recent work by
Haghpanah and Siegel (2022, 2023) by fully characterizing the
consumer-optimal segmentation and identifying precise conditions under which segmentation
improves consumer welfare.
The results have important implications for competition policy and regulation of price
discrimination practices. They suggest that blanket restrictions on market segmentation
may harm consumers by preventing welfare-enhancing price discrimination, while highlighting
specific market conditions where segmentation is more likely to be beneficial. The
findings also inform ongoing debates about big data and personalized pricing by showing
how consumer heterogeneity and cost structures interact to determine optimal segmentation
strategies.
We consider a seller who can engage simultaneously in second- and third-degree price
discrimination. Our model consists of a monopolist that offers goods of varying quality to a
continuum of buyers. Each buyer’s willingness to pay, their value, is private information,
and the seller knows only the distribution of values, which we refer to as the aggregate
market. The seller may segment the market into submarkets, each with its own distribution
of values, subject only to the condition that the distributions of values across all submarkets
must aggregate to the aggregate market. The monopolist offers an optimal pricing scheme in
each submarket.
Before describing our results, it is convenient to briefly discuss how
second- and third-degree price discrimination interact in our model. In a recent contribution,
Haghpanah and Siegel (2022) showed that it is impossible to attain the socially
efficient surplus while allowing the buyers to appropriate the gains from segmentation. Hence,
there is an inevitable trade-off between consumer surplus and social surplus. The
intuition is that, in the presence of second-degree price discrimination, the monopolist
supplies an inefficient quality unless she can perfectly distinguish between buyers. When
there is a single indivisible good for sale, this trade-off does not appear: it suffices
for the segmentation to induce the seller not to exclude any buyers, and there is no room for
an inefficient quality supply.
We provide two types of results. First, we describe the consumer surplus that can be
attained by the consumer-optimal segmentation. Second, we provide properties of the
markets that constitute this consumer-optimal segmentation. While our analysis focuses on
the consumer-optimal segmentation, the analysis extends in a straightforward manner to
situations in which the objective of interest is a linear combination of consumer surplus and
profits, and hence to the Pareto frontier that can be attained by any segmentation.
Our first main result characterizes the consumer surplus attained by the consumer-optimal
segmentation (Theorem 1). In the adverse selection problem, the consumer surplus
corresponds to the buyers’ information rents, and we can identify the contribution of
each value to the consumer surplus as the product of the inverse hazard rate and the marginal
allocation, and then take the sum (integral) over all values. We provide a convenient
representation of this contribution as a function that depends only on the inverse hazard rate;
we refer to this function as the local informational rent. Theorem 1 shows that the consumer
surplus attained by the consumer-optimal segmentation is computed by modifying this
formula in two ways. First, we take the concavification of the local informational rents.
Second, the local informational rents are not evaluated at the inverse hazard rate of the
aggregate market; instead, a distribution over inverse hazard rates is found by solving a
maximization problem over (quasi-)distributions that are majorized by the aggregate market.
Interestingly, while the original problem consists of a maximization over segmentations, which
are distributions over distributions of values, Theorem 1 yields a maximization over a single
distribution of values. Theorem 1 thus provides a much more tractable problem for finding the
consumer surplus generated by the consumer-optimal segmentation than the original problem.
Theorem 1 allows us to find necessary and sufficient conditions for the optimal
segmentation to be no segmentation (see Proposition 4). We find that in many markets the
allocation is inefficient, but nonetheless any segmentation would (weakly) decrease consumer
surplus even further. We also find that in many situations the distribution that solves the
maximization problem is the aggregate market itself. This is the case, for example, when the
marginal cost is not too concave and the aggregate market satisfies the monotone hazard rate
condition (Proposition 5). In these cases, the value is computed by taking the expectation of
the concavification of the local information rents, evaluated at the inverse hazard rates of the
aggregate market as in the case without segmentation. Because of the concavification, however,
the aggregate market itself is not the consumer-optimal segmentation.
The second set of results relates to the properties of the markets that constitute the
consumer-optimal segmentation. The consumer-optimal segmentation is shown to sort
consumers monotonically across all segments (see Proposition 3): (i) buyers with a given value
may be offered different qualities across different segments, but (ii) these qualities all fall
within a narrow bracket, and the brackets are ordered monotonically and without overlap. Thus,
a buyer with a higher willingness to pay always receives a higher quality than a buyer with
a lower willingness to pay, regardless of the market segments in which either buyer finds
themselves. Note that consumption must be monotonically increasing in the value
within a market due to the incentive compatibility constraint; monotonicity across markets
arises only as part of the optimization over segments.
While there might be dispersion in the quality consumed by a given value across different
segments, this dispersion must be small, and it becomes negligible as the number of different
values becomes large (Corollary 1). Hence, while uniform quality provision does not hold
exactly, the consumer-optimal segmentation introduces minimal dispersion in the quality
consumed by any given value across segments. We can also find a lower bound on the quality
consumed by any value, independent of the distribution of values in the aggregate market
(Corollary 2).
After providing our general results, we focus on environments in which the seller has a
constant-elasticity cost function. In this case, we find that the consumer-optimal segmentation
has a particularly tractable form. There is a cutoff value that determines the demand
elasticity in the segments. At values above this cutoff, the demand of every market segment
has the same elasticity as the aggregate market; at values below this cutoff, the demand of
every segment has a constant elasticity determined by the cost elasticity. As the
cost becomes more inelastic, the demand becomes more elastic. As the cost elasticity goes to
infinity, we obtain the special case in which the seller offers an indivisible good and the demand
elasticity is unitary below the cutoff, thus recovering the consumer-optimal segmentation in
Bergemann et al. (2015). Away from the limit, the optimal segmentation generates more
inelastic demands, which increases the supply of the seller.
Finally, we analyze the effects of second- and third-degree price discrimination
more broadly, not just in the consumer-optimal segmentation. We extend the earlier analysis
of the value of segmentation and establish that a weighted Bayesian persuasion problem can
attain every point on the constrained efficient Pareto frontier of consumer and producer
surplus (Theorem 2).
From a methodological perspective, our work provides novel insights into the study of
third-degree price discrimination. While it has been acknowledged that the problem of third-
degree price discrimination can be seen as a problem of Bayesian persuasion, previous work
on third-degree price discrimination has largely not relied on concavification techniques.
The reason for this is that the state space is the space of all demands, which typically
has a large dimensionality (infinite dimensional when values are continuous), which in turn
makes the concavification technique more difficult to apply. We show that one can in fact
apply the concavification pointwise, value by value, which allows us to use classic intuitions
from one-dimensional persuasion problems. This allows us to characterize the value of the
consumer-optimal segmentation and provide properties of all consumer-optimal
segmentations. In Section 8, we then link the concavification argument to the arguments
used in the literature and discuss our results in more detail.

1.2 Related Literature


Our results are related to a large literature on price discrimination, beginning with Pigou
(1920) and now encompassing a wide range of research on the output and welfare implications
of price discrimination, such as Robinson (1933); Schmalensee (1981); Varian (1985); Aguirre
et al. (2010); Cowan (2012) and Bergemann et al. (2015). We use the classic model of second-
degree price discrimination via quality and quantity differentiated products first presented by
Mussa and Rosen (1978) and Maskin and Riley (1984). Johnson and Myatt (2003) consider
a model of second degree price discrimination under monopoly and duopoly and provide
conditions under which the monopolist will offer a single good.
The problem of extending the results of Bergemann et al. (2015) to a multi-product
setting was analyzed earlier by Haghpanah and Siegel (2022) and Haghpanah and Siegel
(2023). Haghpanah and Siegel (2022) show that when the optimal menu in the aggregate
market consists of a menu of more than one item, thus a screening menu, then the consumer
surplus maximizing allocation cannot attain the Pareto frontier and hence the full surplus
triangle cannot be obtained (Theorem 1). Based on this insight, they provide a sufficient
condition for the full surplus triangle to be attained, namely, that every distribution over
the values leads to the efficient single-item menu (Theorem 2). By contrast, we provide
necessary and sufficient conditions for there to be a segmentation that improves consumer
surplus and provide the value of the consumer surplus maximizing allocation, whether it is
efficient or not. Haghpanah and Siegel (2023) show that in generic markets there is always
a segmentation relative to the single aggregate market that improves consumer surplus,
however, they do not provide properties of the consumer-optimal segmentation. In contrast
to Haghpanah and Siegel (2022, 2023), who work with a finite number of products,
we allow for a continuum of qualities. Therefore, the results in Haghpanah and Siegel (2023)
do not apply to our setting, and we find a large class of markets in which no segmentation is
optimal for consumers. We completely solve for the consumer-optimal segmentation under
continuous quality. Additionally, we derive some features of all Pareto efficient segmentations
under both continuous and discrete quality. Our results also allow us to explicitly describe
what the generic markets of Haghpanah and Siegel (2023) are, and to provide a sufficient
condition under which their result fails to go through in the continuous quality limit.
Our characterization of extremal markets in the discrete quality case yields a family of
distributions which solve the multi-unit generalization of the consumer surplus maximization
problem considered by Condorelli and Szentes (2020). This problem and the family of
distributions which underpin it are also related to the work of Roesler and Szentes (2017).
The organization of this paper is as follows. Section 2 introduces our model of nonlinear
pricing with market segmentation, thus integrating second- and third-degree price
discrimination. Next, in Section 3, we characterize the consumer-optimal segmentation in
a simple binary value environment, which nonetheless illustrates the main concepts we will
use throughout the paper. Section 4 extends the solution to completely general conditions.
In Section 5, we discuss what the consumer-optimal segmentation looks like in more detail
for selected environments of economic interest. Section 6 considers the environment with a
constant elasticity cost function and obtains an explicit solution for the optimal segmentation
in terms of the demand elasticity in the segments. Section 7 extends our analysis to all Pareto
efficient segmentations, and we provide a partial solution for the frontier of achievable surplus
divisions. Section 8 discusses the relationship between the concavification approach pursued
here and the analysis of extreme points in Bergemann et al. (2015), and Section 9 concludes.
Appendix A contains omitted proof details. Appendix B links our analysis, which is based
on concavification, with the previous literature on single-unit demand.

2 Setup
Payoffs and Pricing There is a monopolist and a continuum of consumers. The monopolist
can produce a vertically differentiated good with quality q ∈ R+.

The cost of producing a good of quality q is given by an increasing and convex function
c : R+ → R+ . The monopolist posts a menu of prices p(q) : R+ → R+ which specifies a price
p (q) for each offered quality q.
The consumer’s gross surplus is their value v multiplied by the quality of the product q.
The consumer purchases the quality which maximizes their net utility:
$$U(v, p) \triangleq \max_{q} \{ vq - p(q) \}, \tag{1}$$

and the corresponding quality choice is denoted by:
$$q(v) \triangleq \arg\max_{q} \{ vq - p(q) \}.$$

If multiple qualities maximize the utility of the buyer, ties are broken in favor of the seller.
The profit of the seller from a buyer with value v and menu p (q) is:

Π(v, p) ≜ p(q(v)) − c(q(v)).

The seller does not know the value of any given buyer, but knows the distribution of values
in a given market, as we explain next.
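To make (1), the choice rule q(v), and the seller-favorable tie-breaking concrete, the following Python sketch evaluates a finite menu. It is a minimal illustration under stated assumptions: the menu is stored as a dictionary from quality to price, the buyer can always opt out by taking quality 0 at price 0, and the name `buyer_choice` is hypothetical rather than anything defined in the paper. The quadratic cost in the usage example is an arbitrary choice.

```python
from typing import Callable, Dict, Tuple

# Hypothetical representation of a finite menu: quality -> price.
Menu = Dict[float, float]


def buyer_choice(v: float, menu: Menu, cost: Callable[[float], float],
                 tol: float = 1e-12) -> Tuple[float, float, float]:
    """Return (q(v), U(v, p), Pi(v, p)) for a buyer with value v."""
    # Assumption: the outside option (quality 0 at price 0) is always available;
    # setdefault adds it unless the menu already prices quality 0.
    options = dict(menu)
    options.setdefault(0.0, 0.0)
    # U(v, p) = max_q { v q - p(q) }, as in (1).
    utility = max(v * q - p for q, p in options.items())
    # Every quality that attains the buyer's maximum (up to rounding error).
    maximizers = [q for q, p in options.items() if v * q - p >= utility - tol]
    # Ties are broken in favor of the seller: choose the most profitable maximizer.
    q_v = max(maximizers, key=lambda q: options[q] - cost(q))
    profit = options[q_v] - cost(q_v)
    return q_v, utility, profit


if __name__ == "__main__":
    quadratic_cost = lambda q: q ** 2 / 2
    menu = {1.0: 0.5, 2.0: 2.5}
    print(buyer_choice(1.0, menu, quadratic_cost))  # (1.0, 0.5, 0.0)
    print(buyer_choice(2.0, menu, quadratic_cost))  # (2.0, 1.5, 0.5)
```

In the example, the buyer with value 2 is indifferent between the two qualities, and the tie is resolved toward the quality that gives the seller the higher profit.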

Markets A buyer’s value is their private information (type). A market x ∈ ∆V is a
probability distribution over values V that assigns probability x(v) to v ∈ V. The values are
drawn from a finite set:¹
$$V = \{v_1, \dots, v_k, \dots, v_K\} \subset \mathbb{R}_+. \tag{2}$$

We denote by Dx(v) the demand function associated with market x:
$$D^x(v) \triangleq \sum_{w \ge v} x(w). \tag{3}$$

As there is a bijection between a market x and its demand Dx, we frequently identify a
market with its demand function. We denote by x∗ ∈ ∆(V) the aggregate market, which is
the distribution of values of all buyers present in the economy.
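The bijection between a market and its demand can be checked directly. The Python sketch below assumes a market is stored as a dictionary from values to probabilities (a hypothetical representation chosen for illustration); it computes Dx(v) as in (3) and recovers x from its demand by differencing adjacent values.

```python
from typing import Callable, Dict, Iterable

# Hypothetical representation of a market x: value -> probability x(v).
Market = Dict[float, float]


def demand(x: Market, v: float) -> float:
    """D^x(v): probability that a buyer's value is at least v, as in (3)."""
    return sum(prob for w, prob in x.items() if w >= v)


def market_from_demand(values: Iterable[float],
                       D: Callable[[float], float]) -> Market:
    """Invert (3): x(v_k) = D(v_k) - D(v_{k+1}), with D(v_{K+1}) = 0."""
    vs = sorted(values)
    x: Market = {}
    for k, v in enumerate(vs):
        next_demand = D(vs[k + 1]) if k + 1 < len(vs) else 0.0
        x[v] = D(v) - next_demand
    return x


if __name__ == "__main__":
    x = {1.0: 0.25, 2.0: 0.25, 3.0: 0.5}
    print(demand(x, 2.0))  # 0.75
    print(market_from_demand(x.keys(), lambda v: demand(x, v)) == x)  # True
```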
In a given market x, the profit-maximizing menu px solves:
$$p^x \in \arg\max_{p(q)} \sum_{k=1}^{K} x(v_k)\, \Pi(v_k, p). \tag{4}$$

If there are multiple optimal price menus, we select the one which results in the highest
consumer surplus. The aggregate consumer surplus in market x is given by:
$$U(x) \triangleq \sum_{k=1}^{K} x(v_k)\, U(v_k, p^x). \tag{5}$$

Segmentations The goal of this paper is to understand how profit and consumer surplus
vary when consumers are divided into different submarkets, and the seller prices optimally
within each one. That is, the seller engages in both second- and third-degree price
discrimination simultaneously. Segments may be arbitrarily constructed, subject to the constraint
that they aggregate together to the original market.
A segmentation σ is a finite distribution over markets σ ∈ ∆(∆V) such that
$$\sum_{x \in \mathrm{supp}(\sigma)} \sigma(x)\, x = x^*,$$

where σ(x) is the probability of market x and supp(σ) is the support of σ, that is, the set
of markets that have positive probability in this segmentation. We focus in particular on
the consumer surplus maximization problem, which consists of finding the segmentation of
x∗ that generates the highest consumer surplus. Formally, we wish to solve:
$$\max_{\sigma \in \Delta(\Delta V)} \left[ \sum_{x \in \Delta V} \sigma(x)\, U(x) \right] \quad \text{subject to} \quad \sum_{x \in \mathrm{supp}(\sigma)} \sigma(x)\, x = x^*. \tag{6}$$
¹ The restriction to finite V is made only to ease the technical exposition; all results generalize naturally in the limit where V approaches a continuum.
We are interested in the value of this problem, as well as the segmentation that attains the
maximum. We are first going to study the consumer-optimal segmentation and then study
other segmentations that induce different surplus sharing between consumers and monopolist.
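The feasibility constraint in (6) is a linear condition that is easy to check. A minimal Python sketch, assuming a segmentation is stored as a list of (σ(x), x) pairs with each market a dictionary from values to probabilities (hypothetical representations chosen for illustration):

```python
import math
from typing import Dict, List, Tuple

Market = Dict[float, float]                 # hypothetical: value -> probability
Segmentation = List[Tuple[float, Market]]   # hypothetical: list of (sigma(x), x)


def aggregate(sigma: Segmentation) -> Market:
    """Return sum_x sigma(x) x, the left-hand side of the constraint in (6)."""
    total: Market = {}
    for weight, x in sigma:
        for v, prob in x.items():
            total[v] = total.get(v, 0.0) + weight * prob
    return total


def is_segmentation_of(sigma: Segmentation, x_star: Market,
                       tol: float = 1e-9) -> bool:
    """Check that sigma is a distribution over markets that aggregates to x_star."""
    weights_ok = (all(w >= 0 for w, _ in sigma)
                  and math.isclose(sum(w for w, _ in sigma), 1.0, abs_tol=tol))
    markets_ok = all(all(p >= 0 for p in x.values())
                     and math.isclose(sum(x.values()), 1.0, abs_tol=tol)
                     for _, x in sigma)
    agg = aggregate(sigma)
    matches = all(math.isclose(agg.get(v, 0.0), x_star.get(v, 0.0), abs_tol=tol)
                  for v in set(agg) | set(x_star))
    return weights_ok and markets_ok and matches


if __name__ == "__main__":
    x_star = {1.0: 0.5, 2.0: 0.5}
    # Perfect segmentation: each value in its own segment.
    print(is_segmentation_of([(0.5, {1.0: 1.0}), (0.5, {2.0: 1.0})], x_star))  # True
    # Not feasible: the high value never appears.
    print(is_segmentation_of([(0.5, {1.0: 1.0}), (0.5, {1.0: 1.0})], x_star))  # False
```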

Notation Before we begin our analysis, we make some simplifications to our notation, and
we explain how the notation is structured. Throughout, all subscripts refer to values, while
superscripts refer to markets (for example, xk ≜ x(vk ) is the probability of value vk , while
px is the optimal menu in market x and Dx is the demand of market x). To make notation
more compact, a superscript “∗” refers to the aggregate market x∗ (so that p∗ is the optimal
menu in the aggregate market x∗). Finally, all distributions over markets σ ∈ ∆(∆V) are
assumed to satisfy the constraint in (6), so in later parts of the paper we write the problem
of the consumer-optimal segmentation without this constraint.

3 The Binary Value Case


In this section, we characterize the consumer-optimal segmentation in a simple environment
with binary values and constant elasticity cost functions. The problem in this environment
is already sufficiently rich and allows us to introduce the central concepts and arguments
that will lead to the solution of the general environment described next in Section 4.
Throughout this section, we suppose that there are only two values 0 < vL < vH < ∞,
which occur in the aggregate market with probability x∗L and complementary probability
x∗H = 1 − x∗L. The cost function is given by
$$c(q) = \frac{q^{\gamma}}{\gamma},$$
and is thus isoelastic with cost elasticity γ > 1. We denote the inverse of the marginal cost by
Q, provided that its argument is nonnegative:
$$Q(v) \triangleq \mathbb{I}[v \ge 0]\, (c')^{-1}(v) = \mathbb{I}[v \ge 0]\, v^{\frac{1}{\gamma - 1}}. \tag{7}$$

We refer to Q as the supply function, since Q(v) is the quality that the monopolist would sell
to a buyer of value v if the monopolist were to offer an efficient pricing scheme. The supply
function takes the value 0 if the value is negative, which explains the indicator function
I[v ≥ 0] in the definition.
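The supply function in (7) inverts the marginal cost of the isoelastic cost function. A short Python check of this relationship, with function names chosen here for illustration:

```python
import math


def cost(q: float, gamma: float) -> float:
    """Isoelastic cost c(q) = q**gamma / gamma with cost elasticity gamma > 1."""
    return q ** gamma / gamma


def marginal_cost(q: float, gamma: float) -> float:
    """c'(q) = q**(gamma - 1)."""
    return q ** (gamma - 1)


def supply(v: float, gamma: float) -> float:
    """Supply function (7): Q(v) = v**(1/(gamma - 1)) for v >= 0, and 0 otherwise."""
    return v ** (1.0 / (gamma - 1)) if v >= 0 else 0.0


if __name__ == "__main__":
    gamma = 3.0
    for v in (0.5, 1.0, 4.0):
        q = supply(v, gamma)
        # Q inverts the marginal cost: c'(Q(v)) = v for nonnegative v.
        print(v, q, math.isclose(marginal_cost(q, gamma), v))  # each line ends with True
    print(supply(-1.0, gamma))  # 0.0
```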

Roadmap The analysis proceeds as follows. We first provide the profit-maximizing pricing
in a given market and derive the corresponding consumer surplus; this corresponds to the
analysis found in Mussa and Rosen (1978) (focused on our binary-value setting). We then
introduce a discrete version of the inverse hazard rate and show how to compute its distribution
in any segmentation. We show that the consumer surplus can be written entirely in terms of the
inverse hazard rate of the low value and is captured by the local
informational rent. Next, we show that a bound on the consumer surplus attained by the
consumer-optimal segmentation can be obtained by a persuasion problem where the planner
chooses distributions over inverse hazard rates; it turns out that there are segmentations
that can attain the same value as the persuasion problem. We conclude by analyzing how
the consumer-optimal segmentation changes with the cost elasticity.

Optimal Screening and Consumer Surplus We consider the optimal menu offered by
a profit-maximizing seller. This is the classic problem analyzed by Mussa and Rosen (1978).
The optimal allocation for a buyer with value v is determined by the first-order condition
that balances virtual utility and cost. In the special case of binary values, and for any given
market x ∈ ∆{vL, vH}, the quality offered to the low value buyer satisfies:
$$v_L - \frac{x_H}{x_L}(v_H - v_L) = c'(q_L). \tag{8}$$
As there is no distortion at the top, the high value buyer receives the efficient level:
$$v_H = c'(q_H).$$
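As a sanity check on (8), the following Python sketch computes qL and qH for the isoelastic cost and compares qL with a brute-force grid search over the seller's expected profit. It assumes, as in the standard two-type screening problem, that the low value buyer's participation constraint and the high value buyer's incentive constraint bind, so that pL = vL qL and pH = vH qH − (vH − vL) qL; the parameter values and function names are illustrative.

```python
import math


def supply(v: float, gamma: float) -> float:
    """Q(v) = (c')^{-1}(v) for c(q) = q**gamma / gamma, set to 0 for v < 0 (see (7))."""
    return v ** (1.0 / (gamma - 1)) if v >= 0 else 0.0


def optimal_qualities(v_low, v_high, x_low, gamma):
    """Solve (8) for q_L and the no-distortion-at-the-top condition for q_H."""
    x_high = 1.0 - x_low
    virtual_value = v_low - (x_high / x_low) * (v_high - v_low)
    return supply(virtual_value, gamma), supply(v_high, gamma)


def expected_profit(q_low, q_high, v_low, v_high, x_low, gamma):
    """Expected profit when p_L = v_L q_L and p_H = v_H q_H - (v_H - v_L) q_L."""
    cost = lambda q: q ** gamma / gamma
    p_low = v_low * q_low
    p_high = v_high * q_high - (v_high - v_low) * q_low
    return x_low * (p_low - cost(q_low)) + (1 - x_low) * (p_high - cost(q_high))


if __name__ == "__main__":
    v_low, v_high, x_low, gamma = 1.0, 2.0, 0.75, 3.0
    q_low, q_high = optimal_qualities(v_low, v_high, x_low, gamma)
    print(math.isclose(q_low, math.sqrt(2.0 / 3.0)))  # True
    # Profit is separable in (q_L, q_H), so a one-dimensional search over q_L suffices.
    grid = [i / 1000 for i in range(1501)]
    best = max(grid, key=lambda q: expected_profit(q, q_high, v_low, v_high, x_low, gamma))
    print(q_low, best)  # closed form vs. grid maximizer; they agree up to the grid step
```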

The optimality condition (8) for the low value buyer, which states that the marginal virtual
utility is equal to the marginal cost, is the representative condition as we move to the
many-value environment. In the setting with finitely many values, we refer to the product term
$$h^x \triangleq \frac{x_H}{x_L}(v_H - v_L) = \frac{1 - x_L}{x_L}(v_H - v_L) \tag{9}$$
as the inverse hazard rate. It is the product of the increment between two adjacent values
and the ratio of the upper cumulative probability to the point probability. With binary values,
the inverse hazard rate hx of vL is determined uniquely by the market x, that is, by the
probability xL (and xH = 1 − xL). We denote the inverse hazard rate in the aggregate market by h∗:
$$h^* \triangleq \frac{x^*_H}{x^*_L}(v_H - v_L).$$

The profit-maximizing allocation for the low value buyer, determined by the first-order
condition that balances virtual utility and cost, can be written in terms of the supply function as:
$$q_L = Q\left(v_L - (v_H - v_L)\frac{D^x_H}{x_L}\right). \tag{10}$$
If the term inside the function Q is negative, then the quality offered to the low value is zero
(see (7)). We recall that in our binary value environment DxH = xH; our notation choices
are meant to make the analogies to the general case of multiple values more salient. Now,
the consumer surplus in market x is earned by the high value vH and is equal to:

U (x) = xH (vH − vL )qL .

This is the informational rent arising from the fact that a buyer with value vH could purchase
quality qL at the price vL qL designed for the low value buyer, which
generates a surplus (for a buyer of value vH) equal to (vH − vL)qL.
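The informational-rent expression for U(x) can be cross-checked against the buyers' utilities under the optimal two-item menu. The Python sketch below assumes the same binding constraints as the standard two-type problem (pL = vL qL and pH = vH qH − (vH − vL) qL); the function names and parameter values are illustrative.

```python
import math


def supply(v, gamma):
    """Q from (7) for the isoelastic cost c(q) = q**gamma / gamma."""
    return v ** (1.0 / (gamma - 1)) if v >= 0 else 0.0


def consumer_surplus(v_low, v_high, x_low, gamma):
    """U(x) = x_H (v_H - v_L) q_L, with q_L given by (10)."""
    x_high = 1.0 - x_low
    q_low = supply(v_low - (v_high - v_low) * x_high / x_low, gamma)
    return x_high * (v_high - v_low) * q_low


def consumer_surplus_from_menu(v_low, v_high, x_low, gamma):
    """The same quantity, computed from each value's best choice in the two-item menu."""
    x_high = 1.0 - x_low
    q_low = supply(v_low - (v_high - v_low) * x_high / x_low, gamma)
    q_high = supply(v_high, gamma)
    # Assumed standard two-type prices: the low value's participation constraint
    # and the high value's incentive constraint bind.
    p_low = v_low * q_low
    p_high = v_high * q_high - (v_high - v_low) * q_low
    # Each value picks its best option, including opting out (utility 0).
    u_low = max(0.0, v_low * q_low - p_low, v_low * q_high - p_high)
    u_high = max(0.0, v_high * q_low - p_low, v_high * q_high - p_high)
    return x_low * u_low + x_high * u_high


if __name__ == "__main__":
    for x_low in (0.6, 0.75, 0.9):
        a = consumer_surplus(1.0, 2.0, x_low, 3.0)
        b = consumer_surplus_from_menu(1.0, 2.0, x_low, 3.0)
        print(x_low, a, b, math.isclose(a, b))  # last entry: True
```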

Distribution Over Inverse Hazard Rates The inverse hazard rate hx also represents
the information rent that the buyer receives from any marginal unit of quality sold to the
buyer. From the point of view of the seller, it therefore represents the virtual cost of selling to
the low value buyer. Hence, the optimality condition (10) states that the marginal
virtual utility (vL − hx) is equal to the marginal cost c′(qL). We now show that for any
segmentation σ, we can construct the corresponding distribution over hazard rates.
Consider any segmentation σ of the aggregate market x∗ and denote its (finite) support
by supp(σ). With only two values, the feasibility constraint of the segmentation can be
written simply in terms of the probability of the low value:
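$$\sum_{x \in \mathrm{supp}(\sigma)} \sigma(x)\, x_L = x^*_L.$$
We define for every x the weight that the market segment x has in the segmentation σ as:
$$\lambda^x \triangleq \frac{\sigma(x)\, x_L}{x^*_L} \in [0, 1]. \tag{11}$$
Note that:
$$\sum_{x \in \mathrm{supp}(\sigma)} \lambda^x = 1 \quad \text{and} \quad \sum_{x \in \mathrm{supp}(\sigma)} \lambda^x h^x = h^*, \tag{12}$$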

where the equalities follow from the definitions of these objects. Hence, any segmentation
σ induces a set of weights λx that can be interpreted as a distribution over inverse hazard rates.
What is remarkable is that the average inverse hazard rate must equal the inverse hazard rate in
the aggregate market (second equation in (12)). While this last property does not hold in general
when there are many values, we will find an appropriate way to compute the average inverse hazard
rate across all markets in a consumer-optimal segmentation.
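The identities in (12) can be verified numerically for any particular segmentation. In the Python sketch below, a binary market is summarized by its low-value probability xL and a segmentation by a list of (σ(x), xL) pairs; these representations and the three-segment example are illustrative assumptions. Markets with xL = 0 receive weight λx = 0 and are skipped, so the sketch checks the second identity only for segmentations in which every segment contains some low value buyers.

```python
import math


def inverse_hazard(x_low: float, v_low: float, v_high: float) -> float:
    """h^x = (1 - x_L) / x_L * (v_H - v_L), as in (9)."""
    return (1.0 - x_low) / x_low * (v_high - v_low)


def weights_and_rates(sigma, x_low_star, v_low, v_high):
    """Return (lambda^x, h^x) for each segment (sigma(x), x_L) with x_L > 0, using (11)."""
    return [(s * x_low / x_low_star, inverse_hazard(x_low, v_low, v_high))
            for s, x_low in sigma if x_low > 0]


if __name__ == "__main__":
    v_low, v_high, x_low_star = 1.0, 2.0, 0.5
    # Hypothetical segmentation: sum of sigma(x) x_L = 0.24 + 0.20 + 0.06 = 0.5 = x_L^*.
    sigma = [(0.3, 0.8), (0.5, 0.4), (0.2, 0.3)]
    pairs = weights_and_rates(sigma, x_low_star, v_low, v_high)
    h_star = inverse_hazard(x_low_star, v_low, v_high)
    print(math.isclose(sum(lam for lam, _ in pairs), 1.0))         # True
    print(math.isclose(sum(lam * h for lam, h in pairs), h_star))  # True
```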

Consumer Surplus and Local Informational Rents We will now show that we can
rewrite the information rent entirely in terms of the inverse hazard rate. We define a local
information rent:
$$u_L(h) \triangleq h \cdot Q(v_L - h), \tag{13}$$
where the argument is now the inverse hazard rate h rather than the probability x.
With this, we can write the consumer surplus generated by any segmentation σ as:
$$\sum_{x \in \mathrm{supp}(\sigma)} \sigma(x)\, U(x) = x^*_L \sum_{x \in \mathrm{supp}(\sigma)} \lambda^x\, u_L(h^x). \tag{14}$$
We thus have that the consumer surplus generated by any segmentation is x∗L times the expected
value of uL(h) under the weights λx.
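Identity (14) can likewise be checked numerically. The Python sketch below computes the consumer surplus segment by segment from (10) and compares it with x∗L times the λ-weighted sum of uL(hx); the factor x∗L matches the one that appears in the persuasion problem (16). The segmentation is an illustrative example and the function names are hypothetical.

```python
import math


def supply(v, gamma):
    """Q from (7) for the isoelastic cost c(q) = q**gamma / gamma."""
    return v ** (1.0 / (gamma - 1)) if v >= 0 else 0.0


def local_rent(h, v_low, gamma):
    """Local information rent (13): u_L(h) = h * Q(v_L - h)."""
    return h * supply(v_low - h, gamma)


def surplus_direct(sigma, v_low, v_high, gamma):
    """Sum of sigma(x) U(x) with U(x) = x_H (v_H - v_L) q_L and q_L from (10)."""
    total = 0.0
    for s, x_low in sigma:
        if x_low > 0:  # a segment without low values generates no information rent
            x_high = 1.0 - x_low
            q_low = supply(v_low - (v_high - v_low) * x_high / x_low, gamma)
            total += s * x_high * (v_high - v_low) * q_low
    return total


def surplus_via_rents(sigma, x_low_star, v_low, v_high, gamma):
    """x_L^* times the lambda-weighted sum of u_L(h^x), the right-hand side of (14)."""
    total = 0.0
    for s, x_low in sigma:
        if x_low > 0:
            lam = s * x_low / x_low_star                  # weight (11)
            h = (1.0 - x_low) / x_low * (v_high - v_low)  # inverse hazard rate (9)
            total += lam * local_rent(h, v_low, gamma)
    return x_low_star * total


if __name__ == "__main__":
    sigma = [(0.3, 0.8), (0.5, 0.4), (0.2, 0.3)]  # hypothetical segmentation, x_L^* = 0.5
    direct = surplus_direct(sigma, 1.0, 2.0, 3.0)
    via_rents = surplus_via_rents(sigma, 0.5, 1.0, 2.0, 3.0)
    print(direct, via_rents, math.isclose(direct, via_rents))  # last entry: True
```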
We refer to uL(h) as the local information rent, as it is the rent that all values above
the local type vL receive due to the allocation to the low value buyer vL. More generally, we
will later define a local information rent for all intermediate values below the highest value.
The shape of the local informational rents indicates what kind of markets
maximize consumer surplus. Given the isoelastic costs, uL(h) is concave for h ≤ 2(γ − 1)vL/γ
(hence on all of [0, vL] when γ ≥ 2) and attains a unique maximum at
$$h = \frac{\gamma - 1}{\gamma} v_L. \tag{15}$$
Figure 1 illustrates the behavior of the local information rent uL (h) associated with changes
in the inverse hazard rate h. A high h tends to lower the information rent because the
seller reduces the supply to the low value consumers, eventually excluding them altogether.
On the other hand, when h becomes too small, then there are too few high value buyers
to benefit from the informational rents generated by the low values. Hence, the pointwise
surplus uL (h) is maximized at some interior value of h.
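A numerical sanity check of (15), assuming the isoelastic cost $c(q) = q^\gamma/\gamma$, so that $Q(\phi) = \phi^{1/(\gamma-1)}$ for $\phi \ge 0$: a brute-force grid search over $h \in [0, v_L]$ should locate the maximizer of $u_L(h)$ near $\frac{\gamma-1}{\gamma}v_L$. The grid size is an arbitrary choice.

```python
def supply_isoelastic(phi, gamma):
    """Q(phi) = (c')^{-1}(phi) for c(q) = q**gamma / gamma, and 0 when phi < 0."""
    return phi ** (1.0 / (gamma - 1.0)) if phi >= 0 else 0.0

def local_rent(h, v_L, gamma):
    """Local information rent u_L(h) = h * Q(v_L - h) from (13)."""
    return h * supply_isoelastic(v_L - h, gamma)

def grid_argmax(v_L, gamma, n=20_000):
    """Brute-force maximizer of u_L on a uniform grid over [0, v_L]."""
    grid = [v_L * i / n for i in range(n + 1)]
    return max(grid, key=lambda h: local_rent(h, v_L, gamma))
```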

Figure 1: Local information rent $u_L(h)$ as a function of the inverse hazard rate h (horizontal axis marked at $\frac{\gamma-1}{\gamma}v_L$ and $v_L$).

The Segmentation Problem as Bayesian Persuasion We now transform the problem of determining an optimal segmentation into a Bayesian persuasion problem based on the local information rent as expressed by (14) and the Bayes-plausible distribution over inverse hazard rates h (see (12)). That is, we consider the Bayesian persuasion problem where we directly choose a distribution µ over inverse hazard rates h:
$$\max_{\mu\in\Delta\mathbb{R}_+} \; x^*_L \sum_{h\in\mathrm{supp}(\mu)} \mu(h)\,u_L(h) \quad\text{subject to}\quad \sum_{h\in\mathrm{supp}(\mu)} \mu(h)\,h = h^*. \tag{16}$$

Our analysis implies that every segmentation σ corresponds to a particular choice of µ, where
the weight placed on each hazard rate is µ(hx ) = λx .

Lemma 1 (Segmentation as One-Dimensional Bayesian Persuasion)

The consumer-optimal segmentation generates at most the value of (16). Furthermore, a segmentation σ maximizes consumer surplus if, for every $x \in \mathrm{supp}(\sigma)$,
$$\lambda_x = \mu(h_x)$$
for a distribution $\mu \in \Delta\mathbb{R}_+$ that solves (16).

We appeal to the standard Bayesian persuasion analysis to find the solution of (16). In particular, denote by $\bar{u}_L$ the concavification of $u_L$, that is, the smallest concave function that lies weakly above $u_L$ pointwise. Using the analysis of the information rent following (13), the concavification of $u_L$ is:
$$\bar{u}_L(h) \triangleq \begin{cases} u_L(h) & \text{if } h < \frac{\gamma-1}{\gamma}v_L;\\[2pt] u_L\!\left(\frac{\gamma-1}{\gamma}v_L\right) & \text{if } h \ge \frac{\gamma-1}{\gamma}v_L. \end{cases} \tag{17}$$
In Figure 1, $\bar{u}_L(h)$ is plotted as the solid red curve.


Following Kamenica and Gentzkow (2011), the value of the Bayesian persuasion problem (16) is $x^*_L\,\bar{u}_L(h^*)$, namely the concave envelope $\bar{u}_L$ evaluated at the inverse hazard rate of the aggregate market $h^*$, scaled by $x^*_L$. As the definition of $\bar{u}_L(h)$ suggests, if
$$h^* \le \frac{\gamma-1}{\gamma}\, v_L, \tag{18}$$
the optimal value is attained at $\mu(h^*) = 1$. By contrast, if (18) is not satisfied, then the maximum is achieved with a binary distribution µ supported on $\hat{h}, \tilde{h}$:
$$\hat{h} = \frac{\gamma-1}{\gamma}\, v_L, \qquad \tilde{h} = \infty. \tag{19}$$
Here $\tilde{h} = \infty$ means that the value $v_L$ is not present in the second market.

From Persuasion to Segmentations We know that every segmentation corresponds to a particular choice of distribution µ over inverse hazard rates in (16). The key question is whether the converse holds: given the solution µ to (16), does there exist a segmentation σ that achieves it? The answer is yes, and we can construct this segmentation explicitly.

Proposition 1 (Optimal Segmentation—Binary Values)


The consumer-optimal segmentation σ attains the concavification bound of (16) with equality:
$$\sum_{x\in\mathrm{supp}(\sigma)} \sigma(x)U(x) = x^*_L\,\bar{u}_L(h^*).$$
We can explicitly describe the (unique) consumer-optimal segmentation. The aggregate market $x^*$ itself is the consumer-optimal segmentation if and only if (18) is satisfied. Otherwise, the consumer-optimal segmentation is a binary segmentation supported on markets $\hat{x}, \tilde{x}$:
$$\hat{x}_L = \frac{v_H - v_L}{v_H - v_L + \hat{h}}, \qquad \tilde{x}_L = 0.$$
Naturally, the complements are given by $x_H = 1 - x_L$.

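Notice that $\hat{x}$ is constructed so that $h_{\hat{x}} = \hat{h}$, while $\tilde{x}$ has the value $v_L$ completely removed from its support, which we can think of as achieving $h_{\tilde{x}} = \tilde{h} = \infty$; the weights $\sigma(\hat{x}) = x^*_L/\hat{x}_L$ and $\sigma(\tilde{x}) = 1 - \sigma(\hat{x})$ ensure that the segments average back to $x^*$. Thus, $h_x$ exactly matches the values of h in the support of µ. Since both σ and µ are binary, and every segmentation σ induces a feasible µ, it follows that the $\lambda_x$ induced by this segmentation match $\mu(h_x)$.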
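The construction in Proposition 1 can be sketched in a few lines, again assuming $c(q) = q^\gamma/\gamma$. The weight on $\hat{x}$ comes from requiring the segments to average back to $x^*$, and a segment containing only $v_H$ yields zero consumer surplus because the seller extracts the full surplus there. Function names and the numerical example are illustrative.

```python
def supply(phi, gamma):
    """Q(phi) for c(q) = q**gamma / gamma: (c')^{-1}(phi) = phi**(1/(gamma-1)), zero if phi < 0."""
    return phi ** (1.0 / (gamma - 1.0)) if phi >= 0 else 0.0

def local_rent(h, v_L, gamma):
    """u_L(h) = h * Q(v_L - h), eq. (13)."""
    return h * supply(v_L - h, gamma)

def optimal_binary_segmentation(x_star_L, v_L, v_H, gamma):
    """Segmentation of Proposition 1 as a list of (sigma(x), x_L) pairs."""
    delta = v_H - v_L
    h_star = delta * (1 - x_star_L) / x_star_L
    h_hat = (gamma - 1) / gamma * v_L              # peak of u_L, eq. (19)
    if h_star <= h_hat:                            # condition (18): keep the aggregate market
        return [(1.0, x_star_L)]
    x_hat_L = delta / (delta + h_hat)              # segment whose inverse hazard rate is h_hat
    w_hat = x_star_L / x_hat_L                     # all of v_L's mass goes to this segment
    return [(w_hat, x_hat_L), (1.0 - w_hat, 0.0)]  # second segment contains v_H only

def consumer_surplus(segmentation, v_L, v_H, gamma):
    """Sum of sigma(x) * U(x) with U(x) = x_L * u_L(h_x); a v_H-only segment contributes zero."""
    total = 0.0
    for w, x_L in segmentation:
        if x_L > 0:
            h = (v_H - v_L) * (1 - x_L) / x_L
            total += w * x_L * local_rent(h, v_L, gamma)
    return total

# Usage: x*_L = 1/2, v_L = 1, v_H = 2, gamma = 2, so h* = 1 exceeds h_hat = 1/2.
seg = optimal_binary_segmentation(0.5, 1.0, 2.0, 2.0)
cs = consumer_surplus(seg, 1.0, 2.0, 2.0)
```

Here the segmentation's surplus equals $x^*_L\bar{u}_L(h^*) = 0.125$, whereas the unsegmented market excludes $v_L$ (since $h^* = v_L$) and leaves the high values no rent.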

Optimal Segmentation and Cost Elasticity We can also consider how the consumer-optimal segmentation varies with γ. As γ → ∞, the model converges to a seller supplying an indivisible good at cost 0: in this limit, supplying any quantity up to one unit is costless, and anything above one unit is infinitely costly. That is, in the limit we recover the same unit demand cost structure as Bergemann et al. (2015). The consumer-optimal segmentation given by Proposition 1 is then as follows. If, in the aggregate market, the good is supplied to both values, then the aggregate market is efficient and it is the consumer-optimal segmentation. Instead, if the low type is excluded, the consumer-optimal segmentation generates two segments. In one segment the seller is indifferent between supplying and not supplying the good to the low value. In the other segment the low value is not present, so the seller extracts the full surplus from the high values.
Figure 2 illustrates this by plotting $u_L(h)$ for increasing values of γ. In the limit, $u_L(h)$ is maximized at $h = v_L$, so the optimal inverse hazard rate equals the low value. Hence, if $h^* > v_L$ in the aggregate market, the market is segmented to create a segment in which the seller is exactly indifferent between supplying and not supplying the good.
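The limit can be checked directly. With $\hat{h} = \frac{\gamma-1}{\gamma}v_L$, the share $\hat{x}_L$ from Proposition 1 falls toward $(v_H - v_L)/v_H$ as γ grows, and at that share a zero-cost, unit-demand seller earns the same profit from pricing at $v_L$ as from pricing at $v_H$, which is the indifference described above. The values $v_L = 1$ and $v_H = 2$ are hypothetical.

```python
from fractions import Fraction as F

def x_hat_L(v_L, v_H, gamma):
    """Low-value share of the segment x_hat in Proposition 1, with h_hat = (gamma-1)/gamma * v_L."""
    h_hat = (gamma - 1) / gamma * v_L
    return (v_H - v_L) / (v_H - v_L + h_hat)

def unit_demand_profits(x_L, v_L, v_H):
    """Zero-cost unit-demand profits from posting price v_L (all buy) or v_H (only v_H buys)."""
    return v_L, (1 - x_L) * v_H

# The share shrinks as gamma grows ...
shares = [x_hat_L(1.0, 2.0, g) for g in (2, 4, 100, 10_000)]
# ... toward the limit (v_H - v_L) / v_H, where the unit-demand seller is indifferent.
limit_share = F(2 - 1, 2)
p_low, p_high = unit_demand_profits(limit_share, 1, 2)
```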

Figure 2: The local information rent $u_L(h)$ as a function of the cost elasticity γ (curves for γ = 2, γ = 4, and γ = ∞; horizontal axis h, marked at $v_L$).

When we study the model away from the limit, with γ < ∞, the problem becomes more subtle. As we saw in Proposition 1, the consumer-optimal segmentation does not lead to a socially efficient allocation for γ < ∞. Consequently, the consumer-optimal segmentation balances allocative inefficiencies against informational rents. In particular, the frequency of low values has to be high enough to induce the seller to sell to low values, generating informational rents, but low enough that there are enough high values to benefit from these rents. The maximum of $u_L(h)$ gives the inverse hazard rate at which this trade-off is exactly balanced. Since this maximum is interior, the low value receives $Q(v_L - \hat{h}) < Q(v_L)$, so the resulting allocation is inefficient. Note that the shape of $u_L(h)$ does not depend on the aggregate market.

4 Consumer-Optimal Segmentations
In the previous section we considered a binary value model and showed that the consumer-optimal segmentation problem is equivalent to a one-dimensional Bayesian persuasion problem. With two values, this is not too surprising, as the market composition is described by a one-dimensional parameter. In this section, we show that, surprisingly, a similar logic carries over to the environment with many values and general costs.
Specifically, we show that the problem of finding a consumer-optimal segmentation can
be transformed into a series of K − 1 one-dimensional Bayesian persuasion problems linked
together by a single aggregate feasibility constraint. This is a significant simplification of
the original maximization problem (6), which is a Bayesian persuasion problem over the
(K − 1)-dimensional simplex, and hence an infinite-dimensional optimization problem. The
feasibility constraint captures which distributions of (average) hazard rates are feasible under
some segmentation.
An additional hurdle relative to the previous section is the possibility of support gaps,
meaning some market segments may not include every element of V in their support. These
support gaps not only complicate the calculation of consumer surplus, but also allow the
average hazard rate at any given value vk to be higher or lower than it is in the aggregate
market. This is in contrast to the binary value case, where the average hazard rate had
to equal the aggregate hazard rate exactly. Our main challenge is then to write down an
aggregate constraint which exactly pins down what sequences of hazard rates, and hence
payoffs, are feasible.

4.1 Segmentation in Regular Markets


For any market x and any value $v_k$ present in that market, $x_k > 0$, we define the gap at $v_k$, denoted by $\Delta^x_k$, as the distance between $v_k$ and the next higher value that is present in this market:
$$\Delta^x_k \triangleq \min\{v \in \mathrm{supp}(x) \mid v > v_k\} - v_k. \tag{20}$$
For completeness, if $v_k$ is the maximum value present in market x, we define $\Delta^x_k = 0$. For any value $v_k$ that is present in market x, we define the discrete virtual value
$$\phi^x_k \triangleq v_k - \Delta^x_k\,\frac{D^x_{k+1}}{x_k}. \tag{21}$$
A market is regular if $\phi^x_k$ is nondecreasing in k over all $v_k \in \mathrm{supp}(x)$.
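Definitions (20) and (21) translate directly into code. The sketch below stores a market as a list of probabilities aligned with sorted values, uses exact fractions, and reports `None` for values outside the support. The test markets are taken from Example 2 below, plus a hypothetical irregular market.

```python
from fractions import Fraction as F

def gaps(values, x):
    """Delta^x_k from (20) for present values; 0 at the top present value, None if x_k = 0."""
    present = [k for k, p in enumerate(x) if p > 0]
    delta = [None] * len(x)
    for a, b in zip(present, present[1:]):
        delta[a] = values[b] - values[a]
    delta[present[-1]] = 0
    return delta

def virtual_values(values, x):
    """Discrete virtual values phi^x_k from (21) for present values, None otherwise."""
    delta = gaps(values, x)
    phi = [None] * len(x)
    for k, p in enumerate(x):
        if p > 0:
            D_next = sum(x[k + 1:])           # D^x_{k+1}: mass of values above v_k
            phi[k] = values[k] - delta[k] * D_next / p
    return phi

def is_regular(values, x):
    """A market is regular if phi^x_k is nondecreasing over the values present in x."""
    phi = [f for f in virtual_values(values, x) if f is not None]
    return all(a <= b for a, b in zip(phi, phi[1:]))
```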


As in Section 3, we denote by Q the inverse of the marginal cost function:
$$Q(\phi) \triangleq \mathbb{I}[\phi \ge 0]\,(c')^{-1}(\phi). \tag{22}$$
In regular markets, the quality supplied to every value is the supply function evaluated at the virtual value ϕ. If the virtual value is negative, then the good is not supplied, which explains the indicator function $\mathbb{I}[\phi \ge 0]$. We write $q^x_k$ for the quality that the buyer with value $v_k$ consumes in market x under the profit-maximizing menu.

Lemma 2 (Supply in Regular Markets)

In a regular market x, the profit-maximizing menu supplies
$$q^x_k = Q(\phi^x_k) \quad \text{for every value } v_k \in \mathrm{supp}(x).$$

We can express the consumer surplus in terms of the inverse hazard rates as in the previous section. For this, we define the inverse hazard rate at every value $v_k$ in a given market x:
$$h^x_k \triangleq \Delta^x_k\,\frac{D^x_{k+1}}{x_k}, \tag{23}$$
and the local information rent for every value $v_k$:
$$u_k(h) \triangleq h\cdot Q(v_k - h). \tag{24}$$

This extends the notions of inverse hazard rate and local information rent that we encountered in the previous section in (9) and (13) from the binary case to an arbitrary finite set of values. Note that the local information rent depends only on the local value $v_k$ and the inverse hazard rate $h = h^x_k$, not on the entire demand $D^x(\cdot)$. We can now express the consumer surplus compactly.

Lemma 3 (Consumer Surplus in Regular Markets)

In every regular market x, the consumer surplus is given by:
$$U(x) = \sum_{k=1}^{K-1} x_k\, u_k(h^x_k). \tag{25}$$

We thus represent the consumer surplus in a given regular market x as the weighted sum of the local information rents $u_k(h^x_k)$. That is, by appropriately extending the definition of the local information rent, we can write the consumer surplus in regular markets the same way as when the set of values had only two elements, as in (14). The expression of consumer surplus (25) holds only for regular markets (possibly with gaps in the support). This restriction turns out to be without loss.
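Lemma 3 can be checked numerically for quadratic costs $c(q) = q^2/2$, where $Q(\phi) = \max\{\phi, 0\}$; the choice of quadratic costs is an assumption made for concreteness. The sketch computes consumer surplus twice: from (25), and directly from the menu $q_k = Q(v_k - h^x_k)$ by letting each present value add the rent $(v_k - v_{k'})q_{k'}$ from mimicking the next-lower present value $v_{k'}$. The test uses the regular markets of Example 2 below.

```python
from fractions import Fraction as F

def supply_quadratic(phi):
    """Q(phi) for c(q) = q**2 / 2: (c')^{-1}(phi) = phi, truncated at zero."""
    return max(phi, 0)

def menu(values, x):
    """Present values, inverse hazard rates (23), and qualities Q(v_k - h_k) (Lemma 2)."""
    present = [k for k, p in enumerate(x) if p > 0]
    h, q = {}, {}
    for i, k in enumerate(present):
        gap = values[present[i + 1]] - values[k] if i + 1 < len(present) else 0
        h[k] = gap * sum(x[k + 1:]) / x[k]
        q[k] = supply_quadratic(values[k] - h[k])
    return present, h, q

def surplus_lemma3(values, x):
    """Right-hand side of (25): sum_k x_k u_k(h_k) with u_k(h) = h * Q(v_k - h)."""
    present, h, q = menu(values, x)
    return sum(x[k] * h[k] * q[k] for k in present)

def surplus_via_rents(values, x):
    """Surplus from binding local downward IC: each value's rent adds (v_k - v_prev) * q_prev."""
    present, _, q = menu(values, x)
    total, rent = 0, 0
    for prev, k in zip(present, present[1:]):
        rent += (values[k] - values[prev]) * q[prev]
        total += x[k] * rent
    return total
```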

Lemma 4 (Consumer Surplus in Irregular Markets)


Every market x (possibly irregular) can be segmented into regular markets such that the seller
sets the same prices for every good as in the original market x.

Hence, it is without loss of generality to restrict attention to segmentations that are supported only on regular markets.

4.2 Value of Segmentation


One of the main challenges in finding a consumer-optimal segmentation is that the choice variable of the optimization problem is a high-dimensional object: one needs to optimize over distributions over markets, but markets are themselves distributions over values. We will show that the main properties of the consumer-optimal segmentation can be identified by solving a maximization problem over a single “quasi-market.”
For any vector $D \in \mathbb{R}^K_+$, we write $D \prec D^*$ (and say that D is majorized by $D^*$) if:
$$\sum_{i=k}^{K-1} (v_{i+1} - v_i)\,D_{i+1} \le \sum_{i=k}^{K-1} (v_{i+1} - v_i)\,D^*_{i+1} \quad \text{for all } 1 \le k \le K-1. \tag{26}$$

We do not require D to be decreasing in $v_i$; in this sense we refer to D as a “quasi-market.” If D were restricted to be decreasing, ≺ would be exactly the weak majorization constraint studied in, for example, Kleiner et al. (2021). That is, (26) would be equivalent to D being a mean-preserving spread of $D^*$. In the expressions we will frequently use a normalized version of the quasi-market, so to make the notation more compact, we define:
$$h^D_k \triangleq (v_{k+1} - v_k)\,\frac{D_{k+1}}{x^*_k}. \tag{27}$$
It is useful to optimize over D because it is closely related to a classic order on distributions, with $h^D$ playing the role of the inverse hazard rate associated with a particular market. Note, however, that $h^D$ is not computed as a standard inverse hazard rate, because its denominator is the probability of $v_k$ in the aggregate market and not the one implied by D (which would be $D_k - D_{k+1}$).
Of course, when $D = D^*$, $h^D$ is indeed the inverse hazard rate of the aggregate market:
$$h^{D^*}_k = h^*_k.$$

As in Section 3, we denote the concavification of the local information rent $u_k(h)$ by $\bar{u}_k(h)$. We now present an upper bound on the consumer surplus that can be attained by any segmentation σ, expressed as a concavification bound for the aggregate market represented by $D^*$. The upper bound is formed by a maximization problem over the concavified local information rents $\bar{u}_k(h)$:
$$\max_{D \prec D^*} \sum_{k=1}^{K-1} x^*_k\, \bar{u}_k\!\left(h^D_k\right). \tag{28}$$
We then show that this upper bound is attained by the consumer-optimal segmentation σ, and that the concavified maximization problem allows us to construct the optimal segmentation σ. For a single regular market, say $x^*$, it is clear that the above expression is an upper bound. After all, by Lemma 3 we have that
$$U(x^*) = \sum_{k=1}^{K-1} x^*_k\, u_k(h^*_k) \le \sum_{k=1}^{K-1} x^*_k\, \bar{u}_k(h^*_k),$$
since the concavified local information rent $\bar{u}_k(h)$ is weakly higher than the local information rent $u_k(h)$ everywhere. But the relaxation offered in (28) goes further by allowing a maximization over all majorized vectors D rather than an evaluation just at $D^*$. Nonetheless, we will show that the optimal segmentation attains this twice-relaxed upper bound.

Theorem 1 (Value of Segmentation)

The consumer-optimal segmentation σ attains the concavification bound:
$$\max_{\sigma \in \Delta(\Delta V)} \sum_{x\in\mathrm{supp}(\sigma)} \sigma(x)U(x) = \max_{D \prec D^*} \sum_{k=1}^{K-1} x^*_k\, \bar{u}_k\!\left(h^D_k\right). \tag{29}$$

The theorem provides an expression for the consumer surplus in an optimal segmentation in terms of a maximization problem over a quasi-market D. Before stating the theorem, we explained in what sense the concavification bound is a relaxation of the problem for a single market. The intriguing part of the bound is that it is stated in terms of a single quasi-market D, while the consumer surplus on the LHS is obtained by a distribution σ(x) over many markets.
The optimization problem on the RHS consists of a concave objective function, as it is the weighted sum of concavified functions, and a set of linear constraints. Thus, it can be solved using standard convex optimization techniques. The maximization problem states that the value of the consumer-optimal segmentation is a weighted sum of K − 1 separate concavification problems, one for each $\bar{u}_k(h)$, $k = 1, \dots, K-1$. The solution can be decomposed into K − 1 local information rent problems because, in regular markets, the allocation problem for each value $v_k$ in each market segment x can be solved independently of all the other values. This requires the earlier result of Lemma 4, which allows us to focus on regular markets.
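For intuition, (28) can be solved by brute force when K = 3. The sketch assumes quadratic costs, for which $\bar{u}_k(h) = h(v_k - h)$ for $h \le v_k/2$ and $v_k^2/4$ beyond, and searches a grid of $(h^D_1, h^D_2)$ subject to (26) rewritten through (27). The grid size and tolerance are arbitrary, and a proper convex solver would be preferable in practice. Run on the market of Example 2 below, it returns inverse hazard rates near 0.475 and 0.975 and a value strictly above the one obtained at $D^*$.

```python
def u_bar_quadratic(h, v):
    """Concavified local rent for c(q) = q**2 / 2: h (v - h) up to the peak at v/2, then flat."""
    return h * (v - h) if h <= v / 2 else v * v / 4

def solve_bound_three_values(values, x_star, n=400, h_max=2.0, tol=1e-12):
    """Brute-force grid search for (28) with K = 3, in the variables h_k = h^D_k."""
    v1, v2, v3 = values
    x1, x2, x3 = x_star
    h1_star = (v2 - v1) * (x2 + x3) / x1
    h2_star = (v3 - v2) * x3 / x2
    # With (27), (v_{k+1} - v_k) D_{k+1} = x*_k h^D_k, so (26) reads:
    #   k = 2: x2*h2 <= x2*h2_star;   k = 1: x1*h1 + x2*h2 <= x1*h1_star + x2*h2_star.
    best_val, best_h = float("-inf"), None
    for i in range(n + 1):
        h1 = h_max * i / n
        for j in range(n + 1):
            h2 = h_max * j / n
            if x2 * h2 > x2 * h2_star + tol:
                continue
            if x1 * h1 + x2 * h2 > x1 * h1_star + x2 * h2_star + tol:
                continue
            val = x1 * u_bar_quadratic(h1, v1) + x2 * u_bar_quadratic(h2, v2)
            if val > best_val:
                best_val, best_h = val, (h1, h2)
    return best_val, best_h
```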
When $D^*$ solves (29), the value of the consumer-optimal segmentation is obtained by evaluating $\bar{u}_k$ at the aggregate inverse hazard rates, that is, $\sum_k x^*_k \bar{u}_k(h^*_k)$. This is precisely the result we obtained for two values in Section 3, except that we now apply it value by value. In Section 5 we provide conditions for $D^*$ to be a solution of (29) (in fact, we show there is a large class of models for which this is the case). In general, however, we obtain the value of the consumer-optimal segmentation not from the inverse hazard rates of the aggregate market but by computing optimal inverse hazard rates $h^D$. We next discuss the proof of this theorem, and in the following section we provide more intuition about the value of the consumer-optimal segmentation when $D^*$ is not a solution to (29).
The proof of Theorem 1 proceeds in two major steps. First, we establish that (29) is
an upper bound. Second, we show that the bound is tight by explicitly constructing a
segmentation which achieves the bound. We provide the proof of the first step next as it
illustrates how the concavification and the majorization constraints appear in the analysis.
The second step of the proof is relegated to the Appendix. Instead, in the next subsection
we provide some intuition for the structure of the optimal segmentation.

Proof (Upper Bound). We first show that the right-hand side of (29) is an upper bound for the consumer surplus attained by the optimal segmentation. Before we proceed with the proof, we redefine the inverse hazard rate as follows:
$$h^x_k = \begin{cases} \Delta^x_k\,\dfrac{D^x_{k+1}}{x_k} & \text{if } x_k > 0;\\[4pt] v_k & \text{if } x_k = 0. \end{cases}$$
This prevents inverse hazard rates from being indeterminate, and since the supply is 0 regardless of whether the inverse hazard rate is infinite or equal to $v_k$, both definitions lead to the same analysis. Following (25), we get that
$$\sum_{x\in\mathrm{supp}(\sigma)} \sigma(x)U(x) = \sum_{x\in\mathrm{supp}(\sigma)} \sigma(x)\left[\sum_{k=1}^{K-1} x_k\, u_k(h^x_k)\right] = \sum_{k=1}^{K-1}\left[\sum_{x\in\mathrm{supp}(\sigma)} \sigma(x)\,x_k\, u_k(h^x_k)\right].$$

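We now define:
$$\lambda^x_k \triangleq \frac{\sigma(x)\,x_k}{x^*_k},$$
so we can rewrite the above as
$$\sum_{x\in\mathrm{supp}(\sigma)} \sigma(x)U(x) = \sum_{k=1}^{K-1} x^*_k \left[\sum_{x\in\mathrm{supp}(\sigma)} \lambda^x_k\, u_k(h^x_k)\right].$$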

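Note that:
$$\sum_{x\in\mathrm{supp}(\sigma)} \lambda^x_k = \frac{1}{x^*_k}\sum_{x\in\mathrm{supp}(\sigma)} \sigma(x)\,x_k = 1,$$
so the weights $\lambda^x_k$ together form a distribution over inverse hazard rates. We correspondingly define the average inverse hazard rate over σ at value $v_k$ by
$$h^\sigma_k \triangleq \sum_{x\in\mathrm{supp}(\sigma)} \lambda^x_k\, h^x_k.$$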

By Jensen's inequality applied to the concave function $\bar{u}_k$, and the fact that $u_k(h) \le \bar{u}_k(h)$ for all h, we have:
$$\sum_{x\in\mathrm{supp}(\sigma)} \sigma(x)U(x) \le \sum_{k=1}^{K-1} x^*_k\, \bar{u}_k(h^\sigma_k). \tag{30}$$
We can thus provide an upper bound on consumer surplus using the concavification of $u_k$ and the average inverse hazard rate.
The presence of support gaps introduces the possibility that the average inverse hazard rates satisfy $h^\sigma_k \ne h^*_k$, unlike in Section 3. We therefore need to characterize the set of feasible sequences $h^\sigma_k$. We construct a “quasi-market” $D^\sigma$ that would be consistent with $h^\sigma_k$ for a full-support distribution:
$$D^\sigma_{k+1} \triangleq \frac{x^*_k\, h^\sigma_k}{v_{k+1} - v_k}.$$
Again, $D^\sigma$ is not a true demand function because it is not necessarily monotone. An important property of $D^\sigma$ is that:
$$D^\sigma \prec D^*. \tag{31}$$
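We now prove that this inequality is satisfied. We first note that:
$$\sum_{i=k}^{K-1} (v_{i+1} - v_i)\,D^\sigma_{i+1} = \sum_{i=k}^{K-1} \sum_{x\in\mathrm{supp}(\sigma)} \sigma(x)\,x_i\, h^x_i = \sum_{i=k}^{K-1} \sum_{x\in\mathrm{supp}(\sigma)} \sigma(x)\,\Delta^x_i\, D^x_{i+1}\,\mathbb{1}[x_i > 0].$$
Here once again we use that $x_i h^x_i = 0$ when $x_i = 0$. Moreover, since $D^x$ is constant between consecutive values present in x, if value $v_k$ is present in market x ($x_k > 0$), then
$$\sum_{i=k}^{K-1} \Delta^x_i\, D^x_{i+1}\,\mathbb{1}[x_i > 0] = \sum_{i=k}^{K-1} (v_{i+1} - v_i)\,D^x_{i+1}. \tag{32}$$
If instead $x_k = 0$, the left-hand sum only starts at the lowest value above $v_k$ that is present in x, so it omits nonnegative terms of the right-hand sum. We conclude that:
$$\sum_{i=k}^{K-1} (v_{i+1} - v_i)\,D^\sigma_{i+1} \le \sum_{x\in\mathrm{supp}(\sigma)} \sigma(x) \sum_{i=k}^{K-1} (v_{i+1} - v_i)\,D^x_{i+1} = \sum_{i=k}^{K-1} (v_{i+1} - v_i)\,D^*_{i+1}, \tag{33}$$
where the last equality uses $\sum_{x} \sigma(x) D^x = D^*$. This implies that (31) is satisfied.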


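Finally, to conclude the proof, note that $h^{D^\sigma}_k = h^\sigma_k$ by construction, so by (31) the quasi-market $D^\sigma$ is feasible in (29), and (30) then implies that the right-hand side of (29) is an upper bound for the consumer surplus attained by the optimal segmentation.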
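A randomized sanity check of (31) in exact arithmetic: draw hypothetical segmentations whose markets have random support gaps, compute $(v_{i+1} - v_i)D^\sigma_{i+1} = \sum_x \sigma(x)\,x_i h^x_i$ with the redefined $h^x_k$, and compare the tail sums with those of $D^*$ as in (26). The random generator and its parameters are illustrative choices.

```python
import random
from fractions import Fraction as F

def random_market(K, rng):
    """Random market on K values; each value is dropped with probability 1/3 (support gaps)."""
    while True:
        w = [F(rng.randint(1, 9)) if rng.random() > 1 / 3 else F(0) for _ in range(K)]
        if sum(w) > 0:
            return [wi / sum(w) for wi in w]

def inverse_hazard(values, x, k):
    """h^x_k as redefined in the proof: Delta^x_k D^x_{k+1} / x_k if x_k > 0, and v_k otherwise."""
    if x[k] == 0:
        return values[k]
    above = [j for j in range(k + 1, len(x)) if x[j] > 0]
    gap = values[above[0]] - values[k] if above else 0
    return gap * sum(x[k + 1:]) / x[k]

def is_majorized(values, segmentation):
    """Check (31): every tail sum of (v_{i+1} - v_i) D^sigma_{i+1} is at most that of D*."""
    K = len(values)
    agg = [sum(s * x[k] for s, x in segmentation) for k in range(K)]   # aggregate market x*
    for k in range(K - 1):
        # (v_{i+1} - v_i) D^sigma_{i+1} = x*_i h^sigma_i = sum_x sigma(x) x_i h^x_i
        lhs = sum(s * x[i] * inverse_hazard(values, x, i)
                  for i in range(k, K - 1) for s, x in segmentation)
        rhs = sum((values[i + 1] - values[i]) * sum(agg[i + 1:]) for i in range(k, K - 1))
        if lhs > rhs:
            return False
    return True
```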

We thus proved that (29) is an upper bound. Proving that this bound is tight, i.e. that
there exists a segmentation over regular markets which achieves it, is more complicated,
and we relegate the details to the Appendix. Instead, in the next subsection, we provide
the basic elements for the construction in the proof and explain the different ways in which
segmentations can improve consumer surplus.

4.3 Two Forms of Persuasion


The proof that the concavification bound can be attained by an optimal segmentation σ has two parts. In the first part, we show that it is possible to construct a segmentation σ such that, in every market $x \in \mathrm{supp}(\sigma)$, the inverse hazard rate at every value $v_k$ present in that market is
$$h^x_k = h^D_k,$$
where D solves (29). Since, in general, D will differ from $D^*$, this requires constructing segmentations where the average inverse hazard rates differ from those of the aggregate market. To change the inverse hazard rates, we need to introduce gaps in the segments: that is, there will be markets where $x_k > 0$ and $x_{k+1} = 0$. This allows us to raise the inverse hazard rates at low values at the expense of those at high values.
The second part of the proof consists of segmenting the markets at every value $v_k$ where the local information rent $u_k(h^D_k)$ lies below its concavification $\bar{u}_k(h^D_k)$. In this second step, we do not introduce gaps, but instead introduce variation across markets in the inverse hazard rates at a given value. However, the average inverse hazard rate stays the same as in the first step.
We refer to the segmentation we produce in the first part of the proof as between-value persuasion: it consists of changing the distribution of average hazard rates across values. By contrast, we refer to the segmentation in the second part of the proof as within-value persuasion: it consists of changing the distribution of the hazard rates across market segments while keeping the average hazard rate of a value constant.
When $D^*$ is a solution to (29), there is only within-value persuasion; when the solution to (29) is $D \ne D^*$ and
$$u_k(h^D_k) = \bar{u}_k(h^D_k) \quad \text{for all } v_k,$$
there is only between-value persuasion. We illustrate this with two different examples.² We begin with an example in which the aggregate market $D^*$ solves (29), so there is only within-value persuasion.
Example 1 (Within-Value Persuasion). Suppose there are two values V = {1, 2} and the cost function is:
$$c(q) = \begin{cases} 0 & \text{if } q \in [0,1];\\ \tfrac{3}{4}(q-1) & \text{if } q \in [1,2];\\ \infty & \text{if } q > 2. \end{cases}$$
This model can be interpreted as a seller that can supply a unit of the good at 0 cost and can supply a second unit at cost 3/4.³ We can solve this model the same way as in Section 3, but with the appropriately modified $u_1$ function, which we illustrate in Figure 3. If $h^*_1 > 1$, we get the same optimal segmentation as when $c(q) = q^\gamma/\gamma$ and we take the limit γ → ∞ (in this case, for the purpose of the consumer-optimal segmentation, it is irrelevant that the seller can supply a second unit).

² In Section 3, we studied markets with binary support. We found that the aggregate market was segmented into two markets; in one market, type $v_L$ consumes a positive quality, while in the other market, $v_L$ is not present. With two values, $D^*$ always solves (29), so there is never between-value persuasion. Additionally, the quality supplied to a given value does not vary across segments. In this sense, the example is too simple to illustrate the construction of the consumer-optimal segmentation effectively.

³ We interpret qualities q ∈ (0, 1) as a seller that offers 1 unit with probability q and 0 units with probability 1 − q. Qualities q ∈ (1, 2) can be interpreted in an analogous way.
If $h^*_1 \in (1/4, 1)$, then $D^*$ is the unique solution to (29). In this case, the consumer-optimal segmentation splits the aggregate market into two segments: in one segment the inverse hazard rate is 1/4 (and thus the low value is supplied both units of the good), and in the other segment the inverse hazard rate is 1 (and thus the low value is supplied only one unit of the good). This clearly provides higher consumer surplus than the aggregate market, since the low value is sometimes supplied both units of the good, which increases the consumer surplus of the high values. This shows how variation in the inverse hazard rate of a given value across markets can increase consumer surplus.
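The numbers in Example 1 can be reproduced exactly. The sketch takes a hypothetical aggregate market $x^* = (2/3, 1/3)$, so $h^*_1 = 1/2$, splits it into segments with inverse hazard rates 1/4 and 1 (using $h = x_2/x_1$ because $v_2 - v_1 = 1$), and compares consumer surplus with and without segmentation. Breaking the seller's ties toward the larger quantity is an assumption that matches the kinks of $u_1$.

```python
from fractions import Fraction as F

def supply_example1(phi):
    """Q(phi) under the step cost of Example 1 (first unit free, second unit costs 3/4).
    Ties are broken toward the larger quantity (an assumption)."""
    if phi >= F(3, 4):
        return 2
    if phi >= 0:
        return 1
    return 0

def u1(h):
    """Local information rent u_1(h) = h * Q(v_1 - h) with v_1 = 1."""
    return h * supply_example1(1 - h)

def u1_bar(h):
    """Concavification of u_1: chords through (0, 0), (1/4, 1/2), (1, 1), then flat."""
    if h <= F(1, 4):
        return 2 * h
    if h <= 1:
        return F(1, 2) + F(2, 3) * (h - F(1, 4))
    return F(1)

def optimal_segmentation(x1_star):
    """Within-value persuasion for h*_1 in (1/4, 1): segments with h = 1/4 and h = 1."""
    h_star = (1 - x1_star) / x1_star              # inverse hazard rate, since v_2 - v_1 = 1
    mu_low = (1 - h_star) / (1 - F(1, 4))         # Bayes plausibility: mean of h equals h*
    segments = []
    for h, mu in ((F(1, 4), mu_low), (F(1), 1 - mu_low)):
        x1 = 1 / (1 + h)                          # market (x1, 1 - x1) with x2 / x1 = h
        segments.append((mu * x1_star / x1, x1))  # sigma(x) chosen so that lambda_x = mu(h)
    return segments

def consumer_surplus(segments):
    """sum_x sigma(x) U(x) with U(x) = x_1 u_1(h_x)."""
    return sum(s * x1 * u1((1 - x1) / x1) for s, x1 in segments)

seg = optimal_segmentation(F(2, 3))
```

For this market, segmentation raises consumer surplus from 1/3 to 4/9, which equals $x^*_1\bar{u}_1(h^*_1)$.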

Figure 3: $u_1$ and $\bar{u}_1$ for Example 1 (horizontal axis h, marked at 1/4 and 1; vertical axis marked at 1/2 and 1).

Next, we provide an example where there is only between-value persuasion, and show
how it can improve consumer surplus.
Example 2 (Between-Value Persuasion). Suppose the cost function is $c(q) = q^2/2$, there are three values V = {1, 2, 3}, and the aggregate market is:
$$x^*_1 = \frac{17}{24}; \quad x^*_2 = \frac{1}{8}; \quad x^*_3 = \frac{1}{6}.$$

The inverse hazard rates are:
$$h^*_1 = \frac{x^*_2 + x^*_3}{x^*_1} = \frac{7}{17}; \quad h^*_2 = \frac{x^*_3}{x^*_2} = \frac{4}{3}; \quad h^*_3 = 0.$$

Note that the inverse hazard rate is increasing between values 1 and 2, but the distribution
is regular.

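The solution to (29) is:
$$h^D_1 = \frac{19}{40} \quad\text{and}\quad h^D_2 = \frac{39}{40}.$$
Indeed, the majorization constraint (26) binds at k = 1, and the first-order conditions equate the slopes $\bar{u}'_1(h^D_1) = 1 - 2h^D_1$ and $\bar{u}'_2(h^D_2) = 2 - 2h^D_2$. These are the average inverse hazard rates across the segments of the consumer-optimal segmentation. Since we have that:
$$u_1\!\left(\tfrac{19}{40}\right) = \bar{u}_1\!\left(\tfrac{19}{40}\right) \quad\text{and}\quad u_2\!\left(\tfrac{39}{40}\right) = \bar{u}_2\!\left(\tfrac{39}{40}\right),$$
there will be no within-value segmentation. In particular, in this example, a consumer-optimal segmentation consists of the following two markets:
$$\hat{x}_1 = \frac{40}{59}; \quad \hat{x}_2 = \frac{760}{4661}; \quad \hat{x}_3 = \frac{741}{4661};$$
$$\tilde{x}_1 = \frac{80}{99}; \quad \tilde{x}_2 = 0; \quad \tilde{x}_3 = \frac{19}{99}.$$
The weights on the markets are $\sigma(\hat{x}) = 4661/6080$ and $\sigma(\tilde{x}) = 1419/6080$.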
We plot the u functions, their concavifications, and the inverse hazard rates in Figure
4. Observe that in this example, the solid dots (the hazard rates at the consumer-optimal
segmentation) are shifted away from the white dots (the hazard rate in the aggregate market).
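The listed segmentation can be checked in exact arithmetic. The sketch confirms that the two markets average back to $x^*$, computes the inverse hazard rate at each present value of each segment from (23), checks that the first constraint of (26) binds at those rates, and compares the slopes $\bar{u}'_k(h) = v_k - 2h$ of the concavified rents, which are valid below the peak $v_k/2$ under quadratic costs.

```python
from fractions import Fraction as F

values = [1, 2, 3]
x_star = [F(17, 24), F(1, 8), F(1, 6)]
x_hat = [F(40, 59), F(760, 4661), F(741, 4661)]
x_tilde = [F(80, 99), F(0), F(19, 99)]
sigma = [(F(4661, 6080), x_hat), (F(1419, 6080), x_tilde)]

def inverse_hazard(x, k):
    """h^x_k = Delta^x_k D^x_{k+1} / x_k from (23), for a value v_k present in x."""
    above = [j for j in range(k + 1, len(x)) if x[j] > 0]
    gap = values[above[0]] - values[k] if above else 0
    return gap * sum(x[k + 1:]) / x[k]

# 1. The two segments average back to the aggregate market.
aggregate = [sum(s * x[k] for s, x in sigma) for k in range(3)]

# 2. Inverse hazard rates at every present value (below the top) of every segment.
rates = {name: {k: inverse_hazard(x, k) for k in range(2) if x[k] > 0}
         for name, x in (("x_hat", x_hat), ("x_tilde", x_tilde))}

# 3. The k = 1 majorization constraint in (26), evaluated at these rates and at D*.
h1, h2 = rates["x_hat"][0], rates["x_hat"][1]
h1_star, h2_star = inverse_hazard(x_star, 0), inverse_hazard(x_star, 1)
lhs = x_star[0] * h1 + x_star[1] * h2
rhs = x_star[0] * h1_star + x_star[1] * h2_star

# 4. Slopes of the concavified rents, u_bar_k'(h) = v_k - 2h below the peak v_k / 2.
slope1, slope2 = values[0] - 2 * h1, values[1] - 2 * h2
```

Both segments have inverse hazard rate 19/40 at $v_1$, and $v_2$ appears only in $\hat{x}$, with rate 39/40; both rates lie where $u_k = \bar{u}_k$.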

Figure 4: $u_k$ and $\bar{u}_k$ for Example 2 (k = 1, 2; horizontal axis h, marked at 1 and 2; vertical axis marked at 1/4 and 1).

4.4 Properties of the Consumer-Optimal Segmentation


Although we explicitly construct a market segmentation that attains the upper bound of (29),
the consumer-optimal segmentation that we obtain by construction is not easy to express
in closed form. However, we can find a sharp description of the qualities that buyers will
consume in this segmentation. Furthermore, the properties we provide next hold across all
consumer-optimal segmentations, not just the one we obtain from following the procedure
described in the proof of Theorem 1.

We first characterize the inverse hazard rates of the demands in the support of a consumer-optimal segmentation in terms of a solution to (29). We denote by $\mathrm{supp}(\bar{u}_k(h))$ the support of the concavification of $u_k$ at h:
$$\mathrm{supp}(\bar{u}_k(h)) \triangleq \{h' \in \mathbb{R}_+ : u_k(h') = \bar{u}_k(h') \text{ and } \bar{u}_k(\omega h + (1-\omega)h') \text{ is linear in } \omega \in [0,1]\}.$$
In other words, the support consists of the inverse hazard rates h′ where $u_k$ and $\bar{u}_k$ coincide and $\bar{u}_k$ is linear on the interval between h and h′.
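Proposition 2 (Properties of Consumption)
A segmentation σ solves (6) if and only if there exists some D solving (29) such that, for every $x \in \mathrm{supp}(\sigma)$ and $v_k \in \mathrm{supp}(x)$,
$$h^x_k \in \mathrm{supp}\big(\bar{u}_k(h^D_k)\big), \tag{34}$$
and, for all k,
$$\sum_{x\in\mathrm{supp}(\sigma):\, x_k > 0} \lambda^x_k\, h^x_k = h^D_k, \quad\text{where } \lambda^x_k = \frac{\sigma(x)\,x_k}{x^*_k}. \tag{35}$$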

This proposition provides general properties of the inverse hazard rates of the demand at any given value $v_k$ in a consumer-optimal segmentation in terms of the solution to (29). The expected inverse hazard rate is $h^D_k$ (see (35)), and the inverse hazard rates lie in the support of $\bar{u}_k$ (see (34)). Recall that in any regular market, buyer $v_k$ consumes quality:
$$q = Q(v_k - h^x_k).$$
So, we can now translate these properties of inverse hazard rates into properties of the qualities consumed by buyers in a consumer-optimal segmentation. Let $q^\sigma_k$ denote the set of qualities consumed by value $v_k$ in some market of segmentation σ:
$$q^\sigma_k \triangleq \{q \in \mathbb{R}_+ \mid q^x_k = q \text{ for some } x \in \mathrm{supp}(\sigma)\}.$$

Proposition 3 (Monotonicity of Quality)

Let σ be a consumer-optimal segmentation. Then, for all k,
$$\max\{q : q \in q^\sigma_k\} \le \min\{q : q \in q^\sigma_{k+1}\}.$$

That is, the qualities consumed by different values are ordered across segments: any quality consumed by $v_k$ in any segment is weakly below any quality consumed by $v_{k+1}$ in any segment. This proposition follows from the proof of Theorem 1 and the result that every $h^x_k$ must be in the support of $\bar{u}_k(h^\sigma_k)$. That consumption is monotone within segments is immediate from incentive compatibility. However, the fact that monotonicity also holds across segments is a special property of the consumer-optimal segmentation.
We can then bound how much dispersion there is in consumption across markets for any given value. We denote by
$$d^\sigma_k \triangleq \max\{q : q \in q^\sigma_k\} - \min\{q : q \in q^\sigma_k\}$$
the dispersion of consumption in segmentation σ for value $v_k$. This is the difference between the maximum and minimum quality purchased by value $v_k$ across all markets in segmentation σ. We denote by $\bar{Q}$ the maximum efficient quality supplied to any value:
$$\bar{Q} \triangleq \arg\max_{q \in \mathbb{R}_+} \{v_K q - c(q)\} < \infty.$$
Since the seller never supplies an inefficiently high quality, the quality supplied to every value is at most $\bar{Q}$. We now use these definitions to bound the consumption dispersion.

Corollary 1 (Quality Dispersion)

For any consumer-optimal segmentation and ϵ > 0, there are at most $\bar{Q}/\epsilon$ different values $v_k$ such that
$$d^\sigma_k > \epsilon.$$

Indeed, by Proposition 3 the ranges of qualities consumed by different values are ordered and all lie in $[0, \bar{Q}]$, so the dispersions sum to at most $\bar{Q}$. Hence, for any fixed level of dispersion ϵ, there is a bound on the number of values that have a dispersion larger than ϵ. Importantly, the bound does not depend on the aggregate market. An immediate implication is that, if we approximate an absolutely continuous distribution by a sequence of increasingly finer discrete distributions, then in the limit almost every type consumes a unique quality. In the following section, we provide conditions under which the spirit of this result holds exactly, that is, each buyer consumes only one quality across all markets in a consumer-optimal segmentation.
Finally, Proposition 2 allows us to obtain a lower bound on the quality that each value consumes in any consumer-optimal segmentation. For this, first define:
$$\bar{h}_k \triangleq \arg\max_h u_k(h). \tag{36}$$
Following (34), value $v_k$ always buys a quality weakly larger than:
$$\underline{q}_k \triangleq Q(v_k - \bar{h}_k).$$
We formalize this in the following corollary.

Corollary 2 (Minimum Quality)
In any consumer-optimal segmentation σ, any buyer of type $v_k$ present in market $x \in \mathrm{supp}(\sigma)$ consumes at least quality $\underline{q}_k$. Furthermore, the bound is tight: there exists an aggregate market under which no segmentation is optimal and every value $v_k$ consumes $\underline{q}_k$.

We can thus bound the size of the inefficiencies in any consumer-optimal segmentation.
The surprising aspect of the result is that the bound can be derived using only the value of
a buyer and the cost function.
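For isoelastic costs $c(q) = q^\gamma/\gamma$, the bound in Corollary 2 has a closed form. Reading $\bar{h}_k$ in (36) as the peak of $u_k$, the calculation behind (15) gives $\bar{h}_k = \frac{\gamma-1}{\gamma}v_k$, hence $\underline{q}_k = Q(v_k/\gamma) = (v_k/\gamma)^{1/(\gamma-1)}$, a fraction $\gamma^{-1/(\gamma-1)}$ of the efficient quality $Q(v_k)$. The sketch evaluates this bound and checks it against the qualities $v_k - h^x_k$ implied by the markets of Example 2 (γ = 2, where the inverse hazard rates are 19/40 at $v_1$ and 39/40 at $v_2$).

```python
from fractions import Fraction as F

def min_quality_isoelastic(v, gamma):
    """Lower bound Q(v - h_bar) for c(q) = q**gamma / gamma, with h_bar = (gamma - 1) / gamma * v.

    Then v - h_bar = v / gamma and Q(v / gamma) = (v / gamma)**(1 / (gamma - 1)).
    """
    return (v / gamma) ** (1.0 / (gamma - 1.0))

def efficient_quality_isoelastic(v, gamma):
    """Efficient quality Q(v) = v**(1 / (gamma - 1)), where marginal cost equals value."""
    return v ** (1.0 / (gamma - 1.0))

# Example 2 (quadratic costs, gamma = 2): qualities Q(v_k - h^x_k) = v_k - h^x_k.
example2_quality = {1: 1 - F(19, 40), 2: 2 - F(39, 40)}
```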

5 Computing the Value of Segmentation


The results in the previous section provided a characterization of the value of the consumer-
optimal segmentation and provided properties of the qualities consumed by different buyers
in the optimal segmentation. However, in this characterization, the value of the consumer-
optimal segmentation is expressed as a solution to a maximization problem. This makes it
difficult to gauge when the gains can be large or small, or even when no segmentation is indeed
the consumer-optimal segmentation. We thus now provide conditions under which the value
of segmentation can be characterized more sharply. First, we provide sufficient conditions
for D∗ to be the solution of (29). These conditions simplify the calculation of the value of
the consumer-optimal segmentation. However, even when D∗ solves (29), segmentation may
improve the consumer surplus through within-value persuasion (for example, as in Section
3). Hence, we also characterize when no segmentation is optimal.
We begin by providing conditions in terms of the local information rents $u_k$ and the inverse hazard rates in the aggregate market $h^*_k$. These conditions are easy to verify and help characterize the value of segmentation. We then provide conditions on the cost function and the distribution of values, which are less general but easier to interpret.

5.1 When No Segmentation is Optimal


First, we provide a condition which guarantees D∗ is a solution to (29).

Lemma 5 (No Between-Value Segmentation)

The solution to (29) is $D^*$ if and only if $\bar{u}'_k(h^*_k)$ is nondecreasing in k. In particular, in this case, the consumer-optimal segmentation generates:
$$\max_{\sigma\in\Delta(\Delta V)} \sum_{x\in\mathrm{supp}(\sigma)} \sigma(x)U(x) = \sum_{k=1}^{K-1} x^*_k\, \bar{u}_k(h^*_k). \tag{37}$$
Proof. (⇐) Consider any segmentation σ and recall that $D^\sigma \prec D^*$. The difference between the RHS of (29) at $D^\sigma$ and at $D^*$ can be bounded as
$$\sum_{k=1}^{K-1} x^*_k\big(\bar{u}_k(h^\sigma_k) - \bar{u}_k(h^*_k)\big) \le \sum_{k=1}^{K-1} x^*_k\, \bar{u}'_k(h^*_k)\,(h^\sigma_k - h^*_k) = \sum_{k=1}^{K-1} \bar{u}'_k(h^*_k)\,(v_{k+1} - v_k)\,(D^\sigma_{k+1} - D^*_{k+1}),$$
where the inequality uses the fact that $\bar{u}_k$ is concave. Following Theorem 2 of Fan and Lorentz (1954), we have that:
$$\sum_{k=1}^{K-1} \bar{u}'_k(h^*_k)\,(v_{k+1} - v_k)\,(D^\sigma_{k+1} - D^*_{k+1}) \le 0,$$
which proves the result. To be precise in the use of Theorem 2 of Fan and Lorentz (1954), we express our terms in their notation. We have the function
$$\Phi(z, k) = -c_k\, z,$$
which is linear in z and submodular in (z, k). The constants $c_k = \bar{u}'_k(h^*_k)$ are nondecreasing and $D^\sigma \prec D^*$, and hence
$$\sum_{j=1}^{K-1} \Phi\big((v_{j+1} - v_j)D^*_{j+1}, j\big) \le \sum_{j=1}^{K-1} \Phi\big((v_{j+1} - v_j)D^\sigma_{j+1}, j\big).$$
Multiplying this inequality by −1 yields
$$\sum_{k=1}^{K-1} \bar{u}'_k(h^*_k)\,(v_{k+1} - v_k)\,(D^\sigma_{k+1} - D^*_{k+1}) \le 0,$$
proving the lemma.


(⇒) In the proof of Theorem 1, we show that for any solution D of (29), $\bar{u}'_k(h^D_k)$ is nondecreasing in k; see the discussion surrounding (A.9).

The proof essentially uses the concavity of $\bar{u}_k$ and a classic inequality for majorization constraints. Inequalities based on majorization have also been used recently by Kleiner et al. (2021). Unlike that work, we do not have a monotonicity constraint ($D^\sigma$ need not be monotone), but we can still use Theorem 2 of Fan and Lorentz (1954). Hence, the essence of the result is the same.
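In the linear case used in this proof, the inequality can also be seen by summation by parts, an elementary route offered here as an illustration rather than as the authors' argument. Writing $a_k = (v_{k+1} - v_k)D^\sigma_{k+1}$, $a^*_k$ for the same object at $D^*$, and $S_k = \sum_{i \ge k}(a_i - a^*_i)$, which is at most 0 by (26), one has $\sum_k c_k(a_k - a^*_k) = c_1S_1 + \sum_{k\ge 2}(c_k - c_{k-1})S_k$. Every term is nonpositive when $c_k = \bar{u}'_k(h^*_k)$ is nondecreasing and nonnegative; nonnegativity holds because each $\bar{u}_k$ is concave and nonnegative on $\mathbb{R}_+$, hence nondecreasing. The randomized check below uses hypothetical data.

```python
import random
from fractions import Fraction as F

def weighted_gap(c, a, a_star):
    """sum_k c_k (a_k - a*_k), where a_k stands for (v_{k+1} - v_k) D_{k+1}."""
    return sum(ck * (ak - bk) for ck, ak, bk in zip(c, a, a_star))

def abel_form(c, a, a_star):
    """The same sum by parts: c_1 S_1 + sum_{k>=2} (c_k - c_{k-1}) S_k,
    with suffix sums S_k = sum_{i>=k} (a_i - a*_i)."""
    n = len(c)
    S = [sum(a[i] - a_star[i] for i in range(k, n)) for k in range(n)]
    return c[0] * S[0] + sum((c[k] - c[k - 1]) * S[k] for k in range(1, n))

def random_instance(n, rng):
    """Hypothetical data: a*, an a whose suffix sums never exceed those of a*, and sorted c >= 0."""
    a_star = [F(rng.randint(1, 10)) for _ in range(n)]
    S = [F(-rng.randint(0, 5)) for _ in range(n)] + [F(0)]   # S_1..S_n <= 0 and S_{n+1} = 0
    a = [a_star[k] + S[k] - S[k + 1] for k in range(n)]
    c = sorted(F(rng.randint(0, 10)) for _ in range(n))
    return c, a, a_star
```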

We now characterize when no segmentation is optimal. For this, we define:
$$O = \left\{x \in \Delta V : u_k(h^x_k) = \bar{u}_k(h^x_k) \text{ for all } k, \text{ and } \bar{u}'_k(h^x_k) \text{ is nondecreasing in } k\right\}.$$
This exactly characterizes the set of markets where no segmentation is optimal.

Proposition 4 (No Segmentation)


The consumer-optimal segmentation is no segmentation if and only if $x^* \in O$.

Proof. (⇐) Suppose $x^* \in O$. Then, following Lemma 5, $D^*$ solves (29). Furthermore, since $\bar{u}_k(h^*_k) = u_k(h^*_k)$ for all $k$,

$$\max_{D \prec D^*} \sum_{k=1}^{K-1} x^*_k\, \bar{u}_k(h^D_k) = \sum_{k=1}^{K-1} x^*_k\, \bar{u}_k(h^*_k) = \sum_{k=1}^{K-1} x^*_k\, u_k(h^*_k).$$

The second equality follows from the definition of $O$. Thus, no segmentation is optimal.

(⇒) If the first condition is not satisfied, then

$$\sum_{k=1}^{K-1} x^*_k\, u_k(h^*_k) < \sum_{k=1}^{K-1} x^*_k\, \bar{u}_k(h^*_k) \le \max_{D \prec D^*} \sum_{k=1}^{K-1} x^*_k\, \bar{u}_k(h^D_k),$$

contradicting that no segmentation is optimal. If the second condition is not satisfied, then by Lemma 5,

$$\sum_{k=1}^{K-1} x^*_k\, u_k(h^*_k) \le \sum_{k=1}^{K-1} x^*_k\, \bar{u}_k(h^*_k) < \max_{D \prec D^*} \sum_{k=1}^{K-1} x^*_k\, \bar{u}_k(h^D_k),$$

which again contradicts that no segmentation is optimal.
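For quadratic cost $c(q) = q^2/2$ (the case plotted in Figure 5), membership in $O$ reduces to two checks on the hazard rates of the market. The sketch assumes $u_k(h) = h\,Q(v_k - h)$ with $Q(z) = \max(z, 0)$, which is the consumer-surplus term we take the earlier sections to define; under that assumption $\bar{u}_k = u_k$ exactly for $h \le v_k/2$, where $\bar{u}'_k(h) = v_k - 2h$. Function names and the example markets are hypothetical.

```python
def hazard_rates(x, v):
    """h_k = (v_{k+1} - v_k) * D_{k+1} / x_k for k = 1..K-1, where D_{k+1} = x_{k+1} + ... + x_K.
    Assumes x_k > 0 for k < K."""
    return [(v[k + 1] - v[k]) * sum(x[k + 1:]) / x[k] for k in range(len(v) - 1)]


def in_O_quadratic(x, v, tol=1e-12):
    """Check x in O for c(q) = q^2/2, assuming u_k(h) = h * max(v_k - h, 0).

    Under that assumption ubar_k = u_k exactly on h <= v_k / 2, where ubar'_k(h) = v_k - 2h.
    """
    h = hazard_rates(x, v)
    # First condition of O: ubar_k(h_k) = u_k(h_k) for every k.
    if any(hk > v[k] / 2 + tol for k, hk in enumerate(h)):
        return False
    # Second condition: ubar'_k(h_k) nondecreasing in k.
    slopes = [v[k] - 2 * hk for k, hk in enumerate(h)]
    return all(s0 <= s1 + tol for s0, s1 in zip(slopes, slopes[1:]))


V = [3, 4, 5]  # the value grid used in Figure 5
print(in_O_quadratic([0.6, 0.2, 0.2], V))    # True
print(in_O_quadratic([1/3, 1/3, 1/3], V))    # False: h_1 = 2 > v_1 / 2 = 1.5
print(in_O_quadratic([0.7, 0.12, 0.18], V))  # False: slopes 2.14... then 1.0 decrease
```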

We illustrate the set $O$ in Figure 5 for $c(q) = q^2/2$ and $V = \{3, 4, 5\}$. Since there are three values, the simplex of markets can be drawn as a triangle in the plane. The three vertices of the triangle represent the three markets in which only one of the values is present, and any point in the triangle represents a market whose composition of values is given by its position relative to the vertices. In this graph, we also illustrate the set of regular distributions that have positive virtual values (that is, $\phi^x_k$ is increasing in $k$ and non-negative for all $k \in \{1, \ldots, K\}$). For reference, these are the distributions under which no value is excluded and there is full separation. We can see that there is a large class of markets in which no segmentation is optimal, even though the allocation is inefficient in all of these markets. However, in some markets segmentation can improve consumer surplus even when the distribution is regular and no value is excluded.

Figure 5: Illustration of O and regular markets with quadratic costs (simplex with vertices δ3, δ4, δ5; the set O and the set of regular markets are marked).

5.2 Conditions on Cost Function and Aggregate Market


We now provide sufficient conditions for $\bar{u}'_k(h^*_k)$ to be nondecreasing in terms of the primitives: the cost function and the distribution of values in the aggregate market. Assume that $c(q)$ is thrice differentiable and that

$$\inf_q \frac{c'''(q)\, q}{c''(q)} \ge -1. \qquad (38)$$

This condition holds whenever $c'(q)$ is convex or not too concave. This assumption allows us to characterize the shape of $u_k$.

Lemma 6 (Concave $u_k$)
Under (38), the concavification of $u_k$ is given by

$$\bar{u}_k(h) = \begin{cases} u_k(h) & \text{if } h \le \bar{h}_k; \\ u_k(\bar{h}_k) & \text{if } h > \bar{h}_k, \end{cases}$$

where $\bar{h}_k$ is defined in (36). Furthermore, $u_k(h)$ is strictly concave for all $h < \bar{h}_k$.

In other words, the concavification is defined by a cutoff $\bar{h}_k$: it is constant for hazard rates above the cutoff and equal to $u_k$ below the cutoff. An immediate corollary is that in the consumer-optimal segmentation, the quality consumed by each value is unique.
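Lemma 6 can be checked numerically in the quadratic case $c(q) = q^2/2$, which satisfies (38) because $c''' = 0$. The sketch assumes $u_k(h) = h\,Q(v_k - h)$ with $Q(z) = \max(z, 0)$, for which the cutoff is $\bar{h}_k = v_k/2$. It computes the least concave majorant of $u_k$ on a grid (a monotone-chain upper hull with linear interpolation) and compares it with the closed form in Lemma 6. Because the concavification is taken over the whole half-line $h \ge 0$, the grid is padded with one distant point; the grid, $v_k = 4$, and all names are illustrative choices.

```python
def u_quad(h, v_k):
    """Assumed consumer-surplus term for c(q) = q^2/2: u_k(h) = h * Q(v_k - h), Q(z) = max(z, 0)."""
    return h * max(v_k - h, 0.0)


def upper_concave_envelope(xs, ys):
    """Least concave majorant of the points (xs[i], ys[i]), xs strictly increasing,
    evaluated at every xs[i] (monotone-chain upper hull plus linear interpolation)."""
    hull = []
    for i in range(len(xs)):
        # Pop the last hull point while it lies on or below the chord to the new point.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (xs[b] - xs[a]) * (ys[i] - ys[a]) - (ys[b] - ys[a]) * (xs[i] - xs[a])
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    env, j = [], 0
    for x in xs:
        # Advance to the hull segment that contains x.
        while j < len(hull) - 2 and x > xs[hull[j + 1]]:
            j += 1
        a, b = hull[j], hull[j + 1]
        t = (x - xs[a]) / (xs[b] - xs[a])
        env.append(ys[a] + t * (ys[b] - ys[a]))
    return env


v_k = 4.0
h_bar = v_k / 2                              # cutoff for quadratic cost
grid = [i / 100 for i in range(801)]         # h in [0, 8]
far = 1e6                                    # stand-in for the unbounded half-line h >= 0
xs = grid + [far]
ys = [u_quad(h, v_k) for h in xs]
env = upper_concave_envelope(xs, ys)[:-1]
closed_form = [u_quad(min(h, h_bar), v_k) for h in grid]  # Lemma 6 with cutoff h_bar
print(max(abs(e - c) for e, c in zip(env, closed_form)))  # small; shrinks as `far` grows
```

The residual discrepancy comes only from the finite padding point: on a bounded grid the envelope must slope down to the last point, whereas on the half-line it stays flat at $u_k(\bar{h}_k)$.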

Corollary 3 (Unique Consumption)


Under (38), in any consumer-optimal segmentation $\sigma$, $q^\sigma_k$ is a singleton for every $k$.

Hence, each buyer consumes only one quality. This refines Corollary 1, which states that with many different buyer values, almost all of them consume essentially the same quality.
For the aggregate market, we assume that

$$\frac{D^*_k}{x^*_k}(v_{k+1} - v_k) \text{ is decreasing in } k. \qquad (39)$$

If $v_{k+1} - v_k$ is constant, i.e. the values form a uniform grid, we recover the well-known monotone hazard rate (MHR) condition. For continuous random variables, this is equivalent to requiring the survival function $1 - F$ to be log-concave (see Bagnoli and Bergstrom (2005) for common distributions that satisfy this condition and other properties of these distributions).

Lemma 7 (No Between-Value Segmentation with Convex Marginal Cost)


If (38)-(39) are satisfied, then $\bar{u}'_k(h^*_k)$ is nondecreasing, and hence the consumer surplus generated by the consumer-optimal segmentation is (37).

Following Lemmas 5-7, we thus have that under (38)-(39), the solution to (29) is $D^*$. Note that the aggregate market in Example 2 did not satisfy the monotone hazard rate condition, which explains why $D^*$ did not solve (29).
We can now provide a sharp characterization of the consumer-optimal segmentation.

Proposition 5 (Gains From Segmentation)


Under (38)-(39), the gains from the consumer-optimal segmentation are

$$\max_{\sigma} \left[\sum_{x \in \mathrm{supp}(\sigma)} \sigma(x) U(x)\right] - U(x^*) = \sum_{k=1}^{\hat{k}-1} x^*_k \left(\bar{u}_k(h^*_k) - u_k(h^*_k)\right),$$

where $\hat{k} = \min\{k \mid h^*_k \le \bar{h}_k\}$.

The proposition states that the gains from the consumer-optimal segmentation are given by the expected difference between $\bar{u}_k$ and $u_k$ at the aggregate hazard rates. Furthermore, this difference can be non-zero only for values below the cutoff $v_{\hat{k}}$. Hence, the only gains for consumers come from changes in the consumption of relatively low values.
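The gains formula can be evaluated directly for quadratic cost. The sketch below again assumes $u_k(h) = h\max(v_k - h, 0)$, so that $\bar{h}_k = v_k/2$ and $\bar{u}_k(h) = u_k(\min(h, \bar{h}_k))$ by Lemma 6. It does not check (39); the caller is assumed to pass a market that satisfies it (both example markets below do). Names and markets are illustrative.

```python
def hazard_rates(x, v):
    """Aggregate hazard rates h*_k = (v_{k+1} - v_k) D*_{k+1} / x*_k, k = 1..K-1."""
    return [(v[k + 1] - v[k]) * sum(x[k + 1:]) / x[k] for k in range(len(v) - 1)]


def u_quad(h, v_k):
    """Assumed u_k(h) = h * max(v_k - h, 0) for c(q) = q^2/2."""
    return h * max(v_k - h, 0.0)


def ubar_quad(h, v_k):
    """Concavification from Lemma 6 with cutoff hbar_k = v_k / 2."""
    return u_quad(min(h, v_k / 2), v_k)


def segmentation_gain_quadratic(x, v):
    """Return (k_hat, gain) as in Proposition 5; k_hat is 1-based as in the text.
    Presumes x satisfies (39); (38) holds automatically for quadratic cost."""
    h = hazard_rates(x, v)
    # k_hat = min{k : h*_k <= hbar_k}; type K always qualifies since h*_K = 0.
    k_hat = next((k + 1 for k, hk in enumerate(h) if hk <= v[k] / 2), len(v))
    gain = sum(x[k] * (ubar_quad(h[k], v[k]) - u_quad(h[k], v[k])) for k in range(k_hat - 1))
    return k_hat, gain


print(segmentation_gain_quadratic([1/3, 1/3, 1/3], [3, 4, 5]))  # k_hat = 2, gain = 1/12
print(segmentation_gain_quadratic([0.5, 0.3, 0.2], [3, 4, 5]))  # (1, 0): no gains
```

The second market has $h^*_1 = 1 \le \bar{h}_1 = 1.5$, so the sum is empty, which is the situation of Lemma 8.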
The proof of Proposition 5 explicitly constructs one consumer-optimal segmentation, which is helpful for understanding how the segmentation affects consumption. In this segmentation $\sigma$, each market $x \in \mathrm{supp}(\sigma)$ has support $\{v_j, v_{j+1}, \ldots, v_K\}$ for some $j \le \hat{k}$. Within every market $x$, for every $k \ge \hat{k}$,

$$h^x_k = h^*_k,$$

meaning high-value buyers consume the same quality as in the aggregate market. On the other hand, for all $k < \hat{k}$,

$$h^x_k = \bar{h}_k < h^*_k,$$

so low-value buyers consume a higher quality than they would in the absence of segmentation. We know there is no within-value persuasion, so all buyers of a given type consume the same quality across segments. Whether the consumer-optimal segmentation raises the consumption of type $v_k$ depends on how $h^*_k$ compares to $\bar{h}_k$.
Proposition 5 also allows us to identify conditions under which the cutoff is trivial, so that there are no benefits from segmentation at all.

Lemma 8 (No Segmentation—Convex MC)


Under (38)-(39), if $h^*_1 \le \bar{h}_1$, then no segmentation is the consumer-optimal segmentation.

This result is in contrast to Haghpanah and Siegel (2023), where with finite goods,
segmentation is generically (in the space of possible markets) beneficial to consumers. Here,
with a continuum of qualities, we obtain a large class of aggregate markets in which no
segmentation is optimal.

6 Isoelastic Cost
A particular functional form satisfying (38) that yields interesting comparative statics is the isoelastic cost:

$$c(q) = \frac{q^\gamma}{\gamma}, \quad \gamma > 1. \qquad (40)$$

The parameter $\gamma = c'(q)q/c(q)$ is the cost elasticity. This parametric form simplifies the expressions and permits easy comparative statics with respect to $\gamma$. We continue to assume (39), the monotone hazard rate condition on the aggregate market.4

4 With isoelastic costs, this assumption can be mildly relaxed to the weaker condition that, for all $k$,

$$\frac{\phi^*_{k+1} - \phi^*_k}{v_{k+1} - v_k} \ge \frac{1}{\gamma}.$$
We can describe the consumer-optimal segmentation using a discrete version of the familiar notion of demand elasticity $\eta^x_k$:

$$\eta^x_k \triangleq \frac{x_k}{D_{k+1}} \cdot \frac{v_k}{v_{k+1} - v_k} = \frac{v_k}{h^x_k}.$$

That is, in market $x$ at price $v_k$, if the price increases by a fraction $(v_{k+1} - v_k)/v_k$ of the original price, then demand decreases by a fraction $x_k/D_{k+1}$; the ratio of the second fraction to the first is $\eta^x_k$. Note that to measure the price increase we use the pre-increase price as the base, while to measure the demand decrease we use the post-increase demand. With continuous demands this choice is irrelevant, while for discrete demands there are many natural ways to extend the definition. We adopt this one because it is the most convenient for the notation and algebra; the choice becomes irrelevant as the grid of possible values becomes fine enough. We now describe the demand elasticity in the segments of a consumer-optimal segmentation.
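A short sketch confirms that the two expressions for $\eta^x_k$ agree and makes the choice of bases explicit (pre-increase price $v_k$, post-increase demand $D_{k+1}$). The market and function names are illustrative.

```python
def elasticities(x, v):
    """Discrete demand elasticity eta_k = (x_k / D_{k+1}) * v_k / (v_{k+1} - v_k), k = 1..K-1.
    Assumes x_k > 0 and D_{k+1} = x_{k+1} + ... + x_K > 0."""
    out = []
    for k in range(len(v) - 1):
        D_next = sum(x[k + 1:])
        demand_drop = x[k] / D_next              # fractional fall in demand, base D_{k+1}
        price_rise = (v[k + 1] - v[k]) / v[k]    # fractional price increase, base v_k
        out.append(demand_drop / price_rise)
    return out


def v_over_h(x, v):
    """The same elasticity written as v_k / h_k."""
    return [v[k] / ((v[k + 1] - v[k]) * sum(x[k + 1:]) / x[k]) for k in range(len(v) - 1)]


x, v = [1/3, 1/3, 1/3], [3, 4, 5]
print(elasticities(x, v))  # approximately [1.5, 4.0]
print(v_over_h(x, v))      # approximately [1.5, 4.0]
```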

Proposition 6 (Optimal Segmentation—Isoelastic Cost)


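Under (39)-(40), in any consumer-optimal segmentation $\sigma$, for every $x \in \mathrm{supp}(\sigma)$ and every $k$ such that $x_k > 0$,

$$\eta^x_k = \begin{cases} \dfrac{\gamma}{\gamma - 1} & \text{if } k < \hat{k}; \\[4pt] \eta^*_k & \text{if } k \ge \hat{k}, \end{cases}$$

where $\hat{k} = \min\left\{k \mid h^*_k \le \frac{\gamma - 1}{\gamma} v_k\right\}$.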

Proof of Proposition 6. The isoelastic cost function satisfies (38). Following Lemma 7, we know that $\bar{u}'_k(h^*_k)$ is nondecreasing, and following Lemma 5 we know that this implies that the solution to (29) is $D^*$. Finally, since $\bar{u}'_k(h^*_k)$ is nondecreasing, we know that $h^*_k \le \bar{h}_k$ if and only if $k \ge \hat{k}$, where $\hat{k}$ is defined in Proposition 5 and $\bar{h}_k$ is defined in (36).
Following Lemma 6, we know that $\bar{u}_k(h) = u_k(h)$ for all $h \le \bar{h}_k$. Following Proposition 2, in any consumer-optimal segmentation the hazard rates satisfy

$$h^x_k = \begin{cases} \bar{h}_k & \text{if } k < \hat{k}; \\ h^*_k & \text{if } k \ge \hat{k}. \end{cases}$$

Finally, under the isoelastic cost (40),

$$\bar{h}_k = \frac{v_k(\gamma - 1)}{\gamma}.$$

Rearranging terms, $\eta^x_k = v_k/h^x_k$ equals $v_k/\bar{h}_k = \gamma/(\gamma - 1)$ for $k < \hat{k}$ and $v_k/h^*_k = \eta^*_k$ for $k \ge \hat{k}$, which is the result.
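Proposition 6 translates into a few lines of code. The sketch below uses the cutoff $\bar{h}_k = v_k(\gamma - 1)/\gamma$ from the proof, computes $\hat{k}$, and returns the elasticities the proposition assigns. It does not construct the segmentation itself and does not verify (39), which the caller is assumed to have checked; the market and function name are illustrative. With $x^* = (1/3, 1/3, 1/3)$ on $V = \{3, 4, 5\}$, raising $\gamma$ from 2 to 10 lowers $\hat{k}$ from 2 to 1, in line with the shrinking range of values that gain from segmentation.

```python
def optimal_elasticities(x, v, gamma):
    """Sketch of Proposition 6 for c(q) = q**gamma / gamma with gamma > 1.

    Presumes the aggregate market x satisfies (39). Returns (k_hat, etas), with k_hat
    1-based and etas[k - 1] the demand elasticity of type v_k (k = 1..K-1) in every
    segment of a consumer-optimal segmentation that contains v_k.
    """
    K = len(v)
    h_star = [(v[k + 1] - v[k]) * sum(x[k + 1:]) / x[k] for k in range(K - 1)]
    h_bar = [v[k] * (gamma - 1) / gamma for k in range(K - 1)]   # cutoff under (40)
    k_hat = next((k + 1 for k in range(K - 1) if h_star[k] <= h_bar[k]), K)
    etas = [gamma / (gamma - 1) if k + 1 < k_hat else v[k] / h_star[k]
            for k in range(K - 1)]
    return k_hat, etas


x, v = [1/3, 1/3, 1/3], [3, 4, 5]
print(optimal_elasticities(x, v, 2))    # (2, [2.0, 4.0])
print(optimal_elasticities(x, v, 10))   # (1, [1.5, 4.0])
```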
We thus have that in the consumer-optimal segmentations the demand elasticity in every segment takes one of two forms: either the demand has the same elasticity as the aggregate market, or it has a constant elasticity determined by the cost function. Note that the support of values might still differ across markets, so the price of goods might change across markets in the consumer-optimal segmentation. As $\gamma$ increases, the demand elasticity $\gamma/(\gamma - 1)$ falls, and the range of consumer values consuming more than they would under no segmentation shrinks.
The demand elasticity in the consumer-optimal segmentation is weakly decreasing in the cost elasticity. The intuition is that with a more elastic cost, the demand needs to be more inelastic for the seller to provide a relatively higher quality. Hence, a more inelastic demand is necessary to reduce the inefficiencies.
We can also recover the unit demand case of Bergemann et al. (2015) by taking the limit as $\gamma \to \infty$. Then, demand is unit-elastic in every segment, and the set of values that consume more than under no segmentation is

$$\left\{ v_k \mid \eta^*_k \le 1 \right\},$$

which, under the MHR assumption, is the set of values excluded in the aggregate market. Additionally, observe that

$$Q(v_k - \bar{h}_k) = \left(\frac{v_k}{\gamma}\right)^{\frac{1}{\gamma - 1}} \xrightarrow{\gamma \to \infty} 1.$$

Thus, in the consumer-optimal segmentation all consumers are allocated the good, and the segmentation is socially efficient.
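The limit can be tabulated directly. The sketch assumes, as the formula above does, that $Q$ inverts marginal cost, so that under (40) $Q(z) = z^{1/(\gamma-1)}$ and $v_k - \bar{h}_k = v_k/\gamma$; the value $v_k = 3$ and the grid of $\gamma$ values are arbitrary. The convergence to 1 is slow.

```python
def quality_at_cutoff(v_k, gamma):
    """Q(v_k - hbar_k) for c(q) = q**gamma / gamma and hbar_k = v_k * (gamma - 1) / gamma.

    Since c'(q) = q**(gamma - 1), Q(z) = z**(1 / (gamma - 1)), and v_k - hbar_k = v_k / gamma.
    """
    return (v_k / gamma) ** (1.0 / (gamma - 1.0))


for gamma in (2, 10, 100, 1000, 10000):
    print(gamma, round(quality_at_cutoff(3.0, gamma), 4))
```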
Finally, we describe how the potential gains from segmentation change with $\gamma$. For this, we denote by $O_\gamma$ the set of markets in which no segmentation is optimal when the cost elasticity is $\gamma$.
Proposition 7 (Comparative Statics)
For any $\gamma' < \gamma$, $O_{\gamma'} \subset O_\gamma$.
Hence, as $\gamma$ decreases, the set of markets in which no segmentation is optimal shrinks, and the potential gains from segmentation increase. This is despite the fact that no segmentation can be optimal even in some inefficient markets. In contrast, in the limit $\gamma \to \infty$, no segmentation is optimal only if the allocation in the aggregate market is efficient. One might have thought that this makes it relatively unlikely to find a market in which no segmentation is optimal. However, the conclusion is the opposite: in the limit $\gamma \to \infty$ the cost is very inelastic, which reduces the potential benefits from segmentation.
7 Surplus-Sharing Segmentations
So far, this paper has focused on the consumer-optimal segmentation, the segmentation that maximizes consumer surplus. One might naturally wonder whether our approach can be used to characterize the entire surplus-sharing frontier, i.e. the set of all possible divisions of surplus under some segmentation. This would produce a generalization of the "surplus triangle" of Bergemann et al. (2015). Of course, the segmentation will also change how the seller prices the different goods, and so it will also change the inefficiencies.
It turns out that by appropriately modifying the objective function, the same approach can be used to produce the entire Pareto frontier, representing all Pareto-efficient outcomes. For the remainder of the surplus frontier, the (modified) concavification problem remains a valid outer bound, but this bound may not be tight. However, with binary values ($K = 2$), or under isoelastic costs and MHR, the bound again becomes tight and characterizes the entire set of achievable outcomes.

7.1 Pareto-Efficient Segmentations


Consider the problem of finding, for some $\lambda \in [0, 1]$, a segmentation that solves

$$\max_{\sigma \in \Delta(\Delta V)} \sum_{x \in \mathrm{supp}(\sigma)} \sigma(x) \left[\lambda \Pi(x) + (1 - \lambda) U(x)\right].$$

It will be convenient to rewrite this as

$$\max_{\sigma \in \Delta(\Delta V)} \sum_{x \in \mathrm{supp}(\sigma)} \sigma(x) \left[\lambda W(x) + (1 - 2\lambda) U(x)\right],$$

where $W(x) = \Pi(x) + U(x)$ is the total welfare in a regular market $x$. By Lemma 4, it continues to be without loss to restrict attention to regular markets, so we can write $W$ as

$$W(x) \triangleq \sum_{k=1}^{K} x_k\, w_k(h^x_k) = \sum_{k=1}^{K} x_k \left[v_k Q(v_k - h^x_k) - c\big(Q(v_k - h^x_k)\big)\right].$$

We then define the local objective function $\omega_{\lambda,k}$ in a manner similar to the consumer surplus:

$$\omega_{\lambda,k}(h) \triangleq \lambda w_k(h) + (1 - 2\lambda) u_k(h).$$

Then, the problem can be written as

$$\max_{\sigma \in \Delta(\Delta V)} \sum_{x \in \mathrm{supp}(\sigma)} \sigma(x) \left[\sum_{k=1}^{K-1} x_k\, \omega_{\lambda,k}(h^x_k)\right] + \lambda x^*_K w_K(0),$$

where the last term comes from the fact that $h^x_K = 0$ always. Since Section 4 uses no assumptions on the shape of $u_k$, it is no surprise that the same result goes through by simply replacing $u_k$ with $\omega_{\lambda,k}$.
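The rewriting above uses only $W = \Pi + U$. The sketch below computes all three objects in a regular market with quadratic cost and confirms the identity numerically. It assumes the standard screening decomposition, under which profit equals expected virtual surplus $\sum_k x_k((v_k - h^x_k)q_k - c(q_k))$ and consumer surplus is $\sum_k x_k h^x_k q_k$ with $q_k = Q(v_k - h^x_k)$; these are the forms we take the earlier sections to use. The market, $\lambda$, and names are illustrative.

```python
def surplus_split_quadratic(x, v):
    """(Pi, U, W) for a regular market x under c(q) = q^2/2.

    Assumes type v_k receives q_k = Q(v_k - h_k) = max(v_k - h_k, 0) with h_K = 0,
    that U = sum_k x_k h_k q_k (so u_k(h) = h Q(v_k - h)), and that profit equals
    expected virtual surplus sum_k x_k ((v_k - h_k) q_k - c(q_k)).
    """
    K = len(v)
    h = [(v[k + 1] - v[k]) * sum(x[k + 1:]) / x[k] for k in range(K - 1)] + [0.0]
    q = [max(v[k] - h[k], 0.0) for k in range(K)]
    cost = [qk ** 2 / 2 for qk in q]
    W = sum(x[k] * (v[k] * q[k] - cost[k]) for k in range(K))            # sum_k x_k w_k(h_k)
    U = sum(x[k] * h[k] * q[k] for k in range(K))                         # sum_k x_k u_k(h_k)
    Pi = sum(x[k] * ((v[k] - h[k]) * q[k] - cost[k]) for k in range(K))   # virtual surplus
    return Pi, U, W


Pi, U, W = surplus_split_quadratic([1/3, 1/3, 1/3], [3, 4, 5])
lam = 0.3
print(Pi, U, W)  # approximately 5.8333, 1.6667, 7.5
print(lam * Pi + (1 - lam) * U, lam * W + (1 - 2 * lam) * U)  # equal up to rounding
```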

Theorem 2 (Pareto-Efficient Segmentations)


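Every Pareto-efficient segmentation $\sigma$ satisfies, for some $\lambda \in [0, 1]$,

$$\sum_{x \in \mathrm{supp}(\sigma)} \sigma(x) \left[\lambda \Pi(x) + (1 - \lambda) U(x)\right] = \max_{D \prec D^*} \sum_{k=1}^{K-1} x^*_k\, \bar{\omega}_{\lambda,k}(h^D_k) + \lambda x^*_K w_K(0), \qquad (41)$$

where $\bar{\omega}_{\lambda,k}$ denotes the concavification of $\omega_{\lambda,k}$.

The proof of this theorem completely mirrors that of Theorem 1. Note that for $\lambda \ge \frac{1}{2}$, the solution is first-degree (perfect) price discrimination, which maximizes social surplus and produces zero consumer surplus.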

7.2 Surplus-Sharing Frontier


In general, the entire frontier can be found by finding segmentations that solve

$$\max_{\sigma \in \Delta(\Delta V)} \sum_{x \in \mathrm{supp}(\sigma)} \sigma(x) \left[e_1 \lambda \Pi(x) + e_2 (1 - \lambda) U(x)\right]$$

for $e_1, e_2 \in \{-1, +1\}$ and $\lambda \in [0, 1]$. As before, it will be convenient to rewrite this, using $\Pi = W - U$, as

$$\max_{\sigma \in \Delta(\Delta V)} \sum_{x \in \mathrm{supp}(\sigma)} \sigma(x) \left[e_1 \lambda W(x) + \big(e_2 (1 - \lambda) - e_1 \lambda\big) U(x)\right].$$

We thus define our local objective function

$$\omega_{\lambda,e,k}(h) = e_1 \lambda w_k(h) + \big(e_2 (1 - \lambda) - e_1 \lambda\big) u_k(h).$$

The first step of the proof of Theorem 1, showing that the concavification bound is an upper bound, still applies, giving us an "outer bound" on the surplus frontier.

Lemma 9 (Outer Bound on Surplus-Sharing Frontier)

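For any $e \in \{-1, +1\}^2$ and $\lambda \in [0, 1]$,

$$\max_{\sigma \in \Delta(\Delta V)} \sum_{x \in \mathrm{supp}(\sigma)} \sigma(x) \left[e_1 \lambda \Pi(x) + e_2 (1 - \lambda) U(x)\right] \le \max_{D \prec D^*} \sum_{k=1}^{K-1} x^*_k\, \bar{\omega}_{\lambda,e,k}(h^D_k) + x^*_K\, \omega_{\lambda,e,K}(0), \qquad (42)$$

where $\bar{\omega}_{\lambda,e,k}$ is the concavification of $\omega_{\lambda,e,k}$.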

Theorem 2 tells us that for $e = (+1, +1)$, the boundary of the surplus frontier and the outcome corresponding to the solution of (42) coincide. In general, however, this need not hold. The reason is that in the proofs of Theorems 1 and 2, we show that the upper bound is implementable with segmentations over regular markets. A crucial step of the proof is noticing that the first-order condition governing the optimal $D$ implies a submodularity property (Lemma 5) which guarantees regularity. For general $(e, \lambda)$, we lose this submodularity, and hence the regularity guarantee.
The concavification bound of (42) is tight whenever $K = 2$, since all binary distributions are regular. We can also recover the tightness of the bound by imposing the isoelastic functional form along with the monotone hazard rate condition. In particular, suppose that the cost function is

$$c(q) = \frac{q^\gamma}{\gamma}, \quad \gamma \ge 2, \qquad (43)$$

and again impose the MHR condition (39). With these two assumptions, the entire surplus boundary can be recovered from the concavification bound (42). In fact, the bound is attained at $D = D^*$, as it was in Proposition 6.

Proposition 8 (Surplus-Sharing Frontier)


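Under (39) and (43), the surplus-sharing frontier attains the concavification bound of (42) with equality at $D = D^*$:

$$\max_{\sigma \in \Delta(\Delta V)} \sum_{x \in \mathrm{supp}(\sigma)} \sigma(x) \left[e_1 \lambda \Pi(x) + e_2 (1 - \lambda) U(x)\right] = \sum_{k=1}^{K-1} x^*_k\, \bar{\omega}_{\lambda,e,k}(h^*_k) + x^*_K\, \omega_{\lambda,e,K}(0)$$

for all $e \in \{-1, +1\}^2$ and $\lambda \in [0, 1]$.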

The functional form assumption (43) is only used to prove tightness of the concavification
bound when e = (−1, +1), corresponding to the “lower right” section of the boundary, which
is the most difficult case to deal with. The weaker requirement that c′′′ (q) ≥ 0 is sufficient
for the other cases.
Figure 6 shows the surplus-sharing frontier for quadratic costs, $c(q) = q^2/2$, when $V = \{1, 2\}$ and $x^* = (\frac{3}{5}, \frac{2}{5})$. Note that the set of possible surplus divisions is a convex subset of the "surplus triangle" of Bergemann et al. (2015), which is the triangle formed by the constraints that (1) $U \ge 0$, (2) $\Pi \ge \Pi^*$, the profit in the aggregate market, and (3) $U + \Pi$ does not exceed the total available surplus in the market.

Figure 6: Surplus-sharing frontier with quadratic cost.

8 Concavification and Extreme Points


Our results rest on concavification arguments, in contrast to the existing literature on price discrimination, which has taken different routes to understanding how market segmentation can benefit consumers. In this section, we discuss some of the main results in the literature and contrast them with ours.

8.1 Extreme Points


Given a menu $p$, we can consider the (possibly empty) set of markets in which $p$ is optimal:

$$X_p \triangleq \{x \in \Delta V \mid \Pi(x, p) \ge \Pi(x, p') \ \forall p'\}. \qquad (44)$$

The set $X_p$ is compact and convex. An extreme point of $X_p$ is a point that cannot be represented as a convex combination of other points in $X_p$. We will also refer to markets in the topological interior of $X_p$, that is, markets that are neither extreme points nor on the boundary of $X_p$; we call such an $x$ an interior market. By the Minkowski-Carathéodory Theorem (Simon (2011), Theorem 8.11), $X_p$ is equal to the convex hull of its extreme points.
We denote by $p^*$ the optimal price in the aggregate market, which we assume in this section to be uniquely defined. We also assume that the allocation in the aggregate market is inefficient, to avoid analyzing trivial cases. The set $X_{p^*}$ is the set of all markets $x'$ in which the seller's optimal price remains the same as in the aggregate market. In particular, the extreme points of this set can be useful for identifying the consumer-optimal segmentation in some situations.
To illustrate this, consider the case in which the seller has a single indivisible good for sale. We obtain this by assuming that

$$c(q) = \begin{cases} 0 & \text{if } q \le 1; \\ \infty & \text{if } q > 1. \end{cases} \qquad (45)$$

This is the case studied by Bergemann et al. (2015). In this case, the extreme points are sufficient to understand the consumer-optimal segmentation (and, in fact, they are sufficient to understand the welfare implications of third-degree price discrimination).
Proposition 9 (Optimal Segmentation—Single Good)
If the cost function is (45), then there exists a consumer-optimal segmentation that places
weight only on the extreme points of Xp∗ .
Bergemann et al. (2015) also show that using the extreme points of $X_{p^*}$ allows implementing an efficient allocation without increasing the seller's profits (relative to the aggregate market), and hence it must be the consumer-optimal segmentation. A remarkable aspect of this result is that the extreme points also have a very simple structure.
With more than one good, one can continue to follow this approach. In particular, consider a cost function $c(q)$ that is piecewise linear with kinks at integer values of $q$, i.e. at $q \in \{1, 2, \ldots\}$. We can interpret this as the cost function of a seller that has many (but discrete) goods for sale, where a non-integer $q$ is a randomization between the two neighboring integer values.5 We can then try to apply the same logic as in Bergemann et al. (2015), but we obtain a much weaker result.
Proposition 10 (Pareto Improvements)
If x∗ is in the interior of Xp∗ and the allocation is inefficient in the aggregate market, then
there is a segmentation that places weight only on the extreme points of Xp∗ that increases
consumer surplus and keeps profits constant.
5
That is, the buyer gets quality ⌈q⌉ with probability (q − ⌊q⌋) and ⌊q⌋ with probability (⌈q⌉ − q), where
⌊·⌋ is the floor function and ⌈·⌉ is the ceiling function.
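The randomization in footnote 5 is what makes a piecewise-linear cost a model of discrete goods: the lottery delivers expected quality exactly $q$, at an expected cost equal to the linear interpolation of the two neighboring integer costs. The sketch checks both facts; the integer cost schedule is hypothetical.

```python
import math


def lottery(q):
    """Footnote 5: quality ceil(q) with prob. q - floor(q), floor(q) with prob. ceil(q) - q."""
    lo, hi = math.floor(q), math.ceil(q)
    if lo == hi:                          # integer q: no randomization needed
        return [(lo, 1.0)]
    return [(hi, q - lo), (lo, hi - q)]


def expected(q, f):
    """Expectation of f(quality) under the lottery for q."""
    return sum(p * f(level) for level, p in lottery(q))


def interpolate(q, c_int):
    """Piecewise-linear cost with kinks at the integers, built from integer costs c_int[n]."""
    lo = math.floor(q)
    if lo == q:
        return c_int[lo]
    return c_int[lo] + (q - lo) * (c_int[lo + 1] - c_int[lo])


c_int = {0: 0.0, 1: 1.0, 2: 3.0, 3: 6.0}    # hypothetical costs of 0, 1, 2, 3 units
for q in (0.5, 1.25, 2.0, 2.9):
    print(q, expected(q, lambda n: n), expected(q, c_int.__getitem__), interpolate(q, c_int))
```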

An immediate corollary is that for generic aggregate markets, there is a segmentation that Pareto improves on the welfare generated by the aggregate market. This was also proved in much greater generality by Haghpanah and Siegel (2023).
There are two drawbacks to this approach. First, segmentations that place weight only on the extreme points of $X_{p^*}$ are not sufficient to generate the consumer-optimal segmentation. Hence, it is necessary to study segmentations that place weight on the extreme points of all sets of the form $X_p$, not just $X_{p^*}$. Second, the number of extreme points grows quickly with the number of goods. To illustrate this, we consider what happens as the number of goods grows to infinity.

Proposition 11 (Extreme Markets with a Continuum of Goods)


If $c'$ is strictly increasing for all $q$, then for every strictly regular market $x$ that induces no exclusion (that is, $\phi^x_k$ is strictly increasing in $k$ and positive for all $k$), $X_{p^x}$ is a singleton. In particular, $x$ is an extreme market of $X_{p^x}$.

We thus have that as the number of goods grows, essentially every relevant market is an extreme market (recall from Lemma 4 that we can focus on regular markets). Since we do not know whether the consumer-optimal segmentation will place weight only on the extreme points of $X_{p^*}$, this approach does not narrow the class of markets that need to be considered when there are many goods. Furthermore, in general the gains from such segmentations become negligible as the number of goods increases, unless we allow for non-local segmentations. Thus, while Proposition 10 remains valid for any finite number of goods, its usefulness vanishes as the number of goods grows.
Appendix B contains additional discussion and results for the discrete good case. We
also show how concavification can be used to derive the extreme points in closed form.

8.2 Local Segmentations


We can use the set $O$ to characterize the set of markets that are in the support of a consumer-optimal segmentation. We first observe that in any consumer-optimal segmentation $\sigma$, if $x \in \mathrm{supp}(\sigma)$, then no segmentation must be the consumer-optimal segmentation when the aggregate market is $x$. After all, if segmenting $x$ further improved consumer surplus, these gains could also have been realized when the aggregate market is $x^*$.

Corollary 4 (Support of Consumer-Optimal Segmentations)


Any consumer-optimal segmentation σ has support supp(σ) ⊂ O.

While the set $O$ restricts the set of possible markets, it can still be a large set. However, one can build a consumer-optimal segmentation that restricts attention to the markets closest to $x^*$. To formalize this, for any subset of markets $M \subset \Delta V$, we denote by $\mathrm{co}(M)$ the convex hull of $M$, and by $\underline{\mathrm{co}}(M)$ the convex hull of $M$ without the points of $M$ themselves:

$$\underline{\mathrm{co}}(M) \triangleq \mathrm{co}(M) \setminus M.$$

We say $M$ are local extreme points of $x^*$ if $M \subset O$, $x^* \in \mathrm{co}(M)$, and $\underline{\mathrm{co}}(M) \cap O = \emptyset$.

In other words, the markets $M$ are local extreme points of $x^*$ if the markets in $M$ belong to $O$, they allow segmenting $x^*$, and there is no market in $O$ that can nontrivially be segmented by markets in $M$.

Proposition 12 (Constructing Consumer-Optimal Segmentations)


For all $x^* \notin O$, there exists a consumer-optimal segmentation that has support only on local
extreme markets of x∗ .

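Proof. Let $\preceq$ be the Blackwell order on $\Delta(\Delta V)$; that is, for any $\sigma, \sigma' \in \Delta(\Delta V)$, $\sigma \preceq \sigma'$ if $\sigma'$ is a mean-preserving spread of $\sigma$. Let $S^*$ be the set of all consumer-optimal segmentations, and let $\sigma \in S^*$ be such that $\sigma \preceq \sigma'$ for all $\sigma' \in S^*$. Following Proposition 4, we know that $\mathrm{supp}(\sigma) \subset O$. Now, suppose $\mathrm{supp}(\sigma)$ is not a set of local extreme markets of $x^*$. Then there exists a market $\hat{x} \in O$ that lies in the convex hull of $\mathrm{supp}(\sigma)$ but is not itself in $\mathrm{supp}(\sigma)$. Let $\hat{\sigma}$ be a non-trivial segmentation of $\hat{x}$ with support $\mathrm{supp}(\hat{\sigma}) \subset \mathrm{supp}(\sigma)$. We now consider the following segmentation:

$$\tilde{\sigma}(x) = \begin{cases} \sigma(x) - \epsilon \hat{\sigma}(x) & \text{if } x \ne \hat{x}; \\ \sigma(\hat{x}) + \epsilon & \text{if } x = \hat{x}, \end{cases}$$

where $\epsilon > 0$ is small enough that $\tilde{\sigma}(x) \ge 0$ for all $x$. The consumer surplus generated by $\tilde{\sigma}$ can be written as

$$\sum_{x} \tilde{\sigma}(x) U(x) = \sum_{x \in \mathrm{supp}(\sigma)} \sigma(x) U(x) + \epsilon \left(U(\hat{x}) - \sum_{x \in \mathrm{supp}(\sigma)} \hat{\sigma}(x) U(x)\right).$$

Since $\hat{x} \in O$, Proposition 4 gives

$$U(\hat{x}) - \sum_{x \in \mathrm{supp}(\sigma)} \hat{\sigma}(x) U(x) \ge 0.$$

Hence, $\tilde{\sigma}$ is also a consumer-optimal segmentation. However, we also have $\tilde{\sigma} \preceq \sigma$ with $\tilde{\sigma} \ne \sigma$, so we reach a contradiction.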
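The perturbation in the proof removes weight $\epsilon\hat{\sigma}(x)$ from each $x$ and places weight $\epsilon$ on $\hat{x}$, whose mean is $\sum_x \hat{\sigma}(x)\,x$; hence total weight and the aggregate market are unchanged, while the segmentation becomes less informative. The sketch checks this bookkeeping on a two-value example. It does not check consumer optimality or membership in $O$, and the markets, weights, and names are illustrative.

```python
def perturb(sigma, sigma_hat, x_hat, eps):
    """sigma_tilde from the proof: sigma(x) - eps * sigma_hat(x) off x_hat, sigma(x_hat) + eps at x_hat.
    Segmentations are dicts mapping markets (tuples of probabilities) to weights."""
    out = {x: w - eps * sigma_hat.get(x, 0.0) for x, w in sigma.items() if x != x_hat}
    out[x_hat] = sigma.get(x_hat, 0.0) + eps
    return out


def mean_market(sigma):
    """Aggregate market implied by a segmentation: sum_x sigma(x) * x."""
    K = len(next(iter(sigma)))
    return tuple(sum(w * x[k] for x, w in sigma.items()) for k in range(K))


sigma = {(1.0, 0.0): 0.5, (0.0, 1.0): 0.5}        # segments x* = (0.5, 0.5)
x_hat = (0.25, 0.75)                              # a market in the convex hull of supp(sigma)
sigma_hat = {(1.0, 0.0): 0.25, (0.0, 1.0): 0.75}  # segments x_hat using supp(sigma)
sigma_tilde = perturb(sigma, sigma_hat, x_hat, eps=0.4)
print(sigma_tilde)                # weights approximately 0.4, 0.2 and 0.4
print(mean_market(sigma_tilde))   # approximately (0.5, 0.5)
```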

The proposition identifies a set of candidate markets that can be part of a consumer-optimal segmentation and then states that one can restrict the search to the closest markets among the candidates (that is, closest to the aggregate market). To illustrate how much Proposition 12 narrows the search for a consumer-optimal segmentation, we note that it is possible to restrict attention to consumer-optimal segmentations that place positive weight on the boundary of $O$. The boundary of $O$ is highlighted in Figure 5. In this sense, the result shares the spirit of Bergemann et al. (2015) (Proposition 9): we identify potential markets that can be in the support of an optimal segmentation and then show that we need only search among the closest ones.

9 Conclusion
This paper has characterized how market segmentation affects consumer welfare when monopolists can engage in both second- and third-degree price discrimination. Our analysis yields several key insights. First, consumer-optimal segmentation maintains consistent quality provision across segments while allowing price variation. Second, the benefits of segmentation depend critically on demand elasticities and cost structures, with no segmentation being optimal when aggregate demand is sufficiently elastic.
These theoretical results have direct practical implications. For competition authorities, they suggest that market segmentation should be evaluated based on observable market characteristics rather than prohibited categorically. The finding that quality provision remains consistent across segments provides a potential metric for identifying harmful segmentation practices. For firms, our characterization of optimal segmentation strategies offers guidance for designing market segmentation policies that balance profit maximization with consumer welfare.
Our analysis also connects to broader debates about big data and personalized pricing in digital markets. While an enhanced ability to segment markets could enable more sophisticated price discrimination, our results suggest that this may benefit consumers when properly structured. The conditions we identify for beneficial segmentation, particularly those regarding demand elasticities and cost structures, may help guide regulatory policy.
Several important directions remain for future research. First, extending the analysis to competitive markets, as in the recent analysis of Bergemann et al. (2023) for single-unit demand, would provide insight into how market structure affects optimal segmentation. Second, empirical work testing our theoretical predictions about the relationship between demand elasticities and optimal segmentation would be valuable.

References
Aguirre, I., S. Cowan, and J. Vickers (2010): “Monopoly Price Discrimination and Demand Curvature,” American Economic Review, 100, 1601–1615.

Bagnoli, M. and T. Bergstrom (2005): “Log-concave Probability and its Applications,” Economic Theory, 26 (2), 445–469.

Bergemann, D., B. Brooks, and S. Morris (2015): “The Limits of Price Discrimination,” American Economic Review, 105, 921–957.

Bergemann, D., B. Brooks, and S. Morris (2023): “On the Alignment of Consumer Surplus and Total Surplus under Competitive Price Discrimination,” Tech. Rep. CFDP 2373, Cowles Foundation for Research in Economics.

Condorelli, D. and B. Szentes (2020): “Information Design in the Hold-Up Problem,” Journal of Political Economy, 128, 681–709.

Cowan, S. (2012): “Third-Degree Price Discrimination and Consumer Surplus,” Journal of Industrial Economics, 60, 333–345.

Fan, K. and G. Lorentz (1954): “An Integral Inequality,” American Mathematical Monthly.

Haghpanah, N. and J. Hartline (2021): “When Is Pure Bundling Optimal?” Review of Economic Studies, 88, 1127–1156.

Haghpanah, N. and R. Siegel (2022): “The Limits of Multi-Product Discrimination,” American Economic Review: Insights, 4, 443–458.

Haghpanah, N. and R. Siegel (2023): “Pareto Improving Segmentation of Multi-Product Markets,” Journal of Political Economy, 131 (6), 1546–1575.

Johnson, J. and D. Myatt (2003): “Multiproduct Quality Competition: Fighting Brands and Product Line Pruning,” American Economic Review, 93, 748–774.

Kamenica, E. and M. Gentzkow (2011): “Bayesian Persuasion,” American Economic Review, 101, 2590–2615.

Kleiner, A., B. Moldovanu, and P. Strack (2021): “Extreme Points and Majorization: Economic Applications,” Econometrica, 89, 1557–1593.

Maskin, E. and J. Riley (1984): “Monopoly with Incomplete Information,” RAND Journal of Economics, 15, 171–196.

Mussa, M. and S. Rosen (1978): “Monopoly and Product Quality,” Journal of Economic Theory, 18, 301–317.

Pigou, A. (1920): The Economics of Welfare, London: Macmillan.

Robinson, J. (1933): The Economics of Imperfect Competition, London: Macmillan.

Roesler, A. and B. Szentes (2017): “Buyer-Optimal Learning and Monopoly Pricing,” American Economic Review, 107, 2072–2080.

Schmalensee, R. (1981): “Output and Welfare Implications of Monopolistic Third-Degree Price Discrimination,” American Economic Review, 71, 242–247.

Simon, B. (2011): Convexity: An Analytic Viewpoint, Cambridge University Press.

Varian, H. (1985): “Price Discrimination and Social Welfare,” American Economic Review, 75, 870–875.
A Proof Details
In these proofs, we will at times refer to an indexed list of markets $x^\ell$. In these cases, a superscript $\ell$ is shorthand for market $x^\ell$; e.g., $h^1$ means $h^{x^1}$.

Proof of Lemma 4. We fix a market $x$ and let $p$ be a seller-optimal price menu. Let $X_p$ be the set of markets where $p$ is optimal:

$$X_p \triangleq \left\{x' \in \Delta V : p \in \arg\max_{p'} \sum_k x'_k\, \Pi(v_k, p')\right\}.$$

It is clear that $X_p$ is convex and compact. Following the Krein-Milman theorem, we can write $x$ as a convex combination of extreme points of $X_p$. Furthermore, by Carathéodory's theorem, we can write it as a convex combination of at most $K$ extreme points.
It thus suffices to show that every extreme point of $X_p$ is regular. Suppose that $y$ is an extreme point of $X_p$ and is not regular. Then there exists $v_k$ such that

$$\phi^y_{k+1} < \phi^y_k.$$

Consider the binary segmentation supported on the following markets:

$$y^+_i = \begin{cases} y_i & \text{if } i \notin \{k, k+1\}; \\ y_i + \epsilon & \text{if } i = k; \\ y_i - \epsilon & \text{if } i = k+1, \end{cases} \qquad y^-_i = \begin{cases} y_i & \text{if } i \notin \{k, k+1\}; \\ y_i - \epsilon & \text{if } i = k; \\ y_i + \epsilon & \text{if } i = k+1, \end{cases}$$

where $\epsilon > 0$ is small enough that $y^+_{k+1}, y^-_k \ge 0$. It is easy to verify that $y^+, y^- \in X_p$ and that

$$y = \frac{y^+ + y^-}{2}.$$

Thus, $y$ is not an extreme point of $X_p$, a contradiction.

Proof of Theorem 1. In the main text, we showed that (29) is an upper bound. Next, we show that there exists a segmentation that exactly achieves the average hazard rate at every point, i.e., for every $x \in \mathrm{supp}(\sigma)$ and $v_k \in \mathrm{supp}(x)$, $h^x_k = h^D_k$.

Lemma A.1 (Majorized D are Feasible)

For any $D$ such that $D \prec D^*$, there exists a segmentation $\sigma$ such that for every $x \in \mathrm{supp}(\sigma)$ and $v_k \in \mathrm{supp}(x)$, $h^x_k = h^D_k$.
Proof. For ease of notation, in this proof we work with unscaled markets $z : V \to \mathbb{R}_+$, which we treat like distributions without imposing $\sum_k z_k = 1$. We only require that for all $k$,

$$\sum_z \sigma(z)\, z_k = x^*_k. \qquad (A.1)$$

At the end, we can convert these unscaled markets into actual markets by rescaling:

$$x_k = \frac{z_k}{\sum_i z_i}, \qquad \hat{\sigma}(x) = \sigma(z) \cdot \sum_i z_i.$$

This rescaling does not affect the hazard rates (and hence the implied allocations) in any way. Additionally,

$$\sum_k \left[\sum_z \sigma(z)\, z_k\right] = 1 \implies \sum_k \left[\sum_x \hat{\sigma}(x)\, x_k\right] = 1 \implies \sum_x \hat{\sigma}(x) = 1,$$

so $\hat{\sigma}$ is in fact a valid segmentation.
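The two properties of the rescaling used here, that hazard rates are scale-invariant and that each value keeps its mass, $\sigma(z) z_k = \hat{\sigma}(x) x_k$, can be checked on a single market. The market, weight, and names are arbitrary and purely illustrative.

```python
def hazard_rates(z, v):
    """h_k = (v_{k+1} - v_k) * (z_{k+1} + ... + z_K) / z_k for k = 1..K-1.
    Works for unscaled z; assumes z_k > 0 for every k evaluated."""
    return [(v[k + 1] - v[k]) * sum(z[k + 1:]) / z[k] for k in range(len(v) - 1)]


def rescale(z, weight):
    """Map an unscaled market z with weight sigma(z) to (x, sigma_hat(x)) as in the text."""
    total = sum(z)
    return [zk / total for zk in z], weight * total


v = [3, 4, 5]
z, weight = [1.0, 2.0, 3.0], 0.1
x, weight_hat = rescale(z, weight)
print(hazard_rates(z, v), hazard_rates(x, v))                    # both approximately [5.0, 1.5]
print([weight * zk for zk in z], [weight_hat * xk for xk in x])  # same mass on each value
```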


Let $D$ be the function we want to implement. The proof works by induction, proceeding downward through the support. Start with some segmentation $\sigma$ over $\{z^1, \ldots, z^L\}$ with the property that, for some value $j$, for all $k \le j$ and all $\ell$,

$$z^\ell_k = x^*_k. \qquad (A.2)$$

Furthermore, for all $k > j$ and $z^\ell$ such that $v_k \in \mathrm{supp}(z^\ell)$,

$$h^\ell_k = \frac{(v_{k+1} - v_k) D^\ell_{k+1}}{z^\ell_k} = h^D_k = \frac{(v_{k+1} - v_k) D_{k+1}}{x^*_k}. \qquad (A.3)$$

(A.1)-(A.3) are clearly satisfied by the trivial segmentation for $j = K - 1$. We produce a segmentation $\sigma'$, supported on at most one extra market, which preserves (A.1)-(A.3) at $j - 1$.
The construction is as follows. Suppose that we keep the same markets $z^\ell$ and weights $\sigma(z^\ell)$, but modify them only at $v_j$ so that either (A.3) holds or $\hat{z}^\ell_j = 0$. That is,

$$\hat{z}^\ell_j = \begin{cases} \dfrac{h^\ell_j}{h^D_j} \cdot x^*_j & \text{if } v_j \in \mathrm{supp}(\hat{z}^\ell); \\[4pt] 0 & \text{otherwise.} \end{cases}$$

Since $\hat z^\ell_k = z^\ell_k$ for all $k \ne j$, (A.1)-(A.2) continue to hold except at $k = j$. Clearly, if we remove $v_j$ from every market, then $\sum_\ell \sigma(z^\ell)\, \hat z^\ell_j = 0 < x^*_j$. We claim that if $v_j$ is included in the support of every market, then

$$
\sum_\ell \sigma(z^\ell)\, \hat z^\ell_j \ge x^*_j.
$$

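To prove this, we first note that

$$
\sum_\ell \sigma(z^\ell)\, \hat z^\ell_j = \sum_\ell \frac{\sigma(z^\ell)\, h^\ell_j}{h^D_j} \cdot x^*_j = \frac{h^\sigma_j}{h^D_j} \cdot x^*_j = \frac{D^\sigma_{j+1}}{D_{j+1}} \cdot x^*_j.
$$

We now show that

$$
\frac{D^\sigma_{j+1}}{D_{j+1}} \ge 1.
$$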
Observe that $v_j$ is in the support of all $z^\ell$. Thus, by (32) and (A.1),

$$
\sum_{i=j}^{K-1} (v_{i+1} - v_i)\, D^\sigma_{i+1} = \sum_{i=j}^{K-1} (v_{i+1} - v_i)\, D^*_{i+1}. \tag{A.4}
$$

The majorization constraint implies that

$$
\sum_{i=j}^{K-1} (v_{i+1} - v_i)\, D^*_{i+1} \ge \sum_{i=j}^{K-1} (v_{i+1} - v_i)\, D_{i+1}. \tag{A.5}
$$

Combining (A.4) and (A.5), we have

$$
(v_{j+1} - v_j)\, D^\sigma_{j+1} + \sum_{i=j+1}^{K-1} (v_{i+1} - v_i)\, D^\sigma_{i+1} \ge \sum_{i=j}^{K-1} (v_{i+1} - v_i)\, D_{i+1}. \tag{A.6}
$$

Furthermore, for every $k > j$, by (A.3),

$$
D^\sigma_{k+1} = \frac{x^*_k\, h^\sigma_k}{v_{k+1} - v_k} = \frac{x^*_k\, h^D_k}{v_{k+1} - v_k} = D_{k+1},
$$

and hence (A.6) implies that

$$
D^\sigma_{j+1} \ge D_{j+1}.
$$

Clearly, the value of $\sum_\ell \sigma(z^\ell)\, \hat z^\ell_j$ is strictly decreasing as we remove $v_j$ from the support of markets one by one. The analysis above shows that at some point we cross over $x^*_j$. At the crossing point we split the market into two segments, with weights chosen so that (A.1) is satisfied with equality. This completes the construction of $\hat\sigma$.

Two features of this proof are worth pointing out. First, the total number of segments is at most $K$. Second, the proof is agnostic about the order in which we remove $v_j$ from the support of the markets in each step. In particular, we can order them in a way that produces markets with more natural supports. For example, if we always order the markets by whether they include $v_{j+1}$, then at every $v_j$, either every market with a gap at $v_{j+1}$ also has a gap at $v_j$, or every market with $v_{j+1}$ in the support also has $v_j$ in the support. Together, these imply that the construction has a well-defined limit as $V$ approaches a continuum of values.

In the end, we are left with a segmentation such that in every market,

$$
U(x) = \sum_{k=1}^{K-1} x^*_k\, u_k(h^\sigma_k).
$$

However, we want to achieve the concavified value $\overline{u}_k$, not $u_k$. Take any segment $x$ and $v_k$ such that

$$
\overline{u}_k(h^\sigma_k) = \lambda\, u_k(h_1) + (1 - \lambda)\, u_k(h_2), \qquad \lambda h_1 + (1 - \lambda) h_2 = h^\sigma_k,
$$

where $h_1 < h^\sigma_k < h_2$.


Let us segment $x$ into two markets $x^1, x^2$ with the same support as $x$ such that (1) for all $v_j \ne v_k$, $h^1_j = h^2_j = h^x_j$, and (2) $h^1_k = h_1$, $h^2_k = h_2$. Since the demands are equal up to $v_k$, $x^1_k > x_k > x^2_k$. Hence, there exists some $\mu \in [0, 1]$ such that $\mu x^1_k + (1 - \mu) x^2_k = x_k$. The payoff across this segmentation is

$$
\mu\, x^1_k\, u_k(h_1) + (1 - \mu)\, x^2_k\, u_k(h_2) = \overline{u}_k(h^\sigma_k),
$$

as desired.
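The concavified value and the two support points $h_1 < h^\sigma_k < h_2$ can be computed mechanically when $u_k$ is sampled on a grid of hazard rates. The sketch below is our own illustration (the helper names and sample values are hypothetical): the least concave majorant is the upper hull of the sampled points, and the segment of the hull containing $h^\sigma_k$ gives $h_1$, $h_2$, and the weight $\lambda$.

```python
def upper_concave_envelope(xs, ys):
    """Vertices of the least concave majorant of the points (xs[i], ys[i]),
    with xs strictly increasing (monotone-chain upper hull)."""
    hull = []
    for p in zip(xs, ys):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # Drop the middle vertex if it lies on or below the chord from hull[-2] to p.
            if (y2 - y1) * (p[0] - x1) <= (p[1] - y1) * (x2 - x1):
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def envelope_at(hull, h):
    """Value of the envelope at h, plus the support points (h1, h2) and the weight
    lam such that lam * h1 + (1 - lam) * h2 = h."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= h <= x2:
            lam = (x2 - h) / (x2 - x1)
            return lam * y1 + (1 - lam) * y2, (x1, x2, lam)
    raise ValueError("h lies outside the sampled range")

# Hypothetical samples of a non-concave u_k at five hazard rates.
hs = [0, 1, 2, 3, 4]
us = [0.0, 3.0, 2.0, 2.5, 0.0]
hull = upper_concave_envelope(hs, us)
value, (h1, h2, lam) = envelope_at(hull, 2)
print(hull, value, h1, h2, lam)
```

In this example the envelope at $h = 2$ is $2.75$, above the sampled value $2$, and is supported by $h_1 = 1$ and $h_2 = 3$ with $\lambda = 1/2$, which is the kind of split used to build $x^1$ and $x^2$.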
The last step of the proof is to check that this constructed segmentation respects the
regularity assumption necessary for (25) to hold. To do so, we rely on the following lemma.

Lemma A.2 (Submodularity Implies Regularity)

If $\overline{u}'_k(h_k) = u'_k(h_k) \le \overline{u}'_{k+1}(h_{k+1}) = u'_{k+1}(h_{k+1})$, then $v_{k+1} - h_{k+1} \ge v_k - h_k$.

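Proof. Decompose the difference

$$
0 \le \overline{u}'_{k+1}(h_{k+1}) - \overline{u}'_k(h_k) = \Big[\overline{u}'_{k+1}(h_{k+1}) - \overline{u}'_{k+1}\big(h_k + (v_{k+1} - v_k)\big)\Big] + \Big[\overline{u}'_{k+1}\big(h_k + (v_{k+1} - v_k)\big) - \overline{u}'_k(h_k)\Big].
$$

We claim that the second bracketed difference is negative. If so, then the first difference must be positive, which, by concavity of $\overline{u}_{k+1}$, is true if and only if

$$
h_{k+1} \le h_k + (v_{k+1} - v_k) \iff v_{k+1} - h_{k+1} \ge v_k - h_k.
$$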

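To prove the claim, consider the analytical extension of $u_k$ onto all $v$:

$$
u_v(h) \triangleq h\, Q(v - h) \implies u'_v(h) = Q(v - h) - h\, Q'(v - h).
$$

Let $h(v) = h_k + (v - v_k)$, so that

$$
\overline{u}'_{k+1}\big(h_k + (v_{k+1} - v_k)\big) - \overline{u}'_k(h_k) = \int_{v_k}^{v_{k+1}} \frac{d}{dv}\, \overline{u}'_v(h(v))\, dv. \tag{A.7}
$$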
vk dv

There are two distinct cases: where $\overline{u}_v(h) = u_v(h)$, and where $u_v(h) < \overline{u}_v(h)$. First, on any open region where $\overline{u}_v(h(v)) = u_v(h(v))$,

$$
\begin{aligned}
\frac{d}{dv}\, \overline{u}'_v(h(v)) = \frac{d}{dv}\, u'_v(h(v)) &= (1 - 2h'(v))\, Q'(v - h) - h(v)(1 - h'(v))\, Q''(v - h) \\
&= -Q'(v - h(v)) - (1 - h'(v))\, u''_v(h) = -Q'(v - h(v)) \le 0,
\end{aligned}
$$

where the second line uses that

$$
u''_v(h) = h\, Q''(v - h) - 2Q'(v - h)
$$

and that $h'(v) = 1$.
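The identity in this display can be spot-checked numerically. The sketch below is an illustration, not part of the proof: it assumes isoelastic costs with $\gamma = 3$ (so $Q(\varphi) = \sqrt{\varphi}$, a functional form the text uses later for Proposition 7) and compares a finite-difference derivative of $v \mapsto u'_v(h(v))$ along $h(v) = h_k + (v - v_k)$ with $-Q'(v - h(v))$. The values of $v_k$ and $h_k$ are hypothetical.

```python
GAMMA = 3.0  # assumed isoelastic cost c(q) = q**GAMMA / GAMMA, so Q(phi) = phi**(1/(GAMMA-1))
A = 1.0 / (GAMMA - 1.0)

def Q(phi):
    return phi ** A

def dQ(phi):
    return A * phi ** (A - 1.0)

def u_prime(v, h):
    """u'_v(h) = Q(v - h) - h Q'(v - h) for the extension u_v(h) = h Q(v - h)."""
    return Q(v - h) - h * dQ(v - h)

def derivative_along_shift(v, h_k, v_k, eps=1e-6):
    """Central difference of v -> u'_v(h(v)) along h(v) = h_k + (v - v_k)."""
    h = lambda w: h_k + (w - v_k)
    return (u_prime(v + eps, h(v + eps)) - u_prime(v - eps, h(v - eps))) / (2.0 * eps)

v_k, h_k = 1.0, 0.3
results = []
for v in [1.0, 1.2, 1.5]:
    numeric = derivative_along_shift(v, h_k, v_k)
    predicted = -dQ(v - (h_k + (v - v_k)))  # -Q'(v - h(v)), which is <= 0
    results.append((numeric, predicted))
    print(v, numeric, predicted)
```

Along this particular path $v - h(v)$ is constant, so $u'_v(h(v))$ is linear in $v$ and the finite difference matches $-Q'$ up to rounding error.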

Next, we claim that for any differentiable $h(v)$,

$$
u_v(h(v)) < \overline{u}_v(h(v)) \implies \frac{d}{dv}\, \overline{u}'_v(h(v)) \le 0.
$$

To see this, let $T(v)$ and $B(v)$ be the top and bottom support points of $\overline{u}_v(h(v))$, respectively, so that

$$
\overline{u}'_v(h(v)) = u'_v(T(v)) = u'_v(B(v)) = \frac{u_v(T(v)) - u_v(B(v))}{T(v) - B(v)}. \tag{A.8}
$$

We can compute

$$
\frac{d}{dv}\, \overline{u}'_v(h(v)) = \frac{u'_v(T(v))\, T'(v) + \frac{\partial}{\partial v} u_v(T(v)) - u'_v(B(v))\, B'(v) - \frac{\partial}{\partial v} u_v(B(v))}{T(v) - B(v)} - \frac{(T'(v) - B'(v))\, \big(u_v(T(v)) - u_v(B(v))\big)}{(T(v) - B(v))^2}.
$$

Substituting in (A.8), the second term cancels with part of the first, resulting in

$$
\frac{d}{dv}\, \overline{u}'_v(h(v)) = \frac{1}{T(v) - B(v)} \left[\frac{\partial}{\partial v} u_v(T(v)) - \frac{\partial}{\partial v} u_v(B(v))\right].
$$

We can simplify this by using

$$
\frac{\partial}{\partial v} u_v(h) = h\, Q'(v - h).
$$

But note that

$$
u'_v(T(v)) = Q(v - T(v)) - T(v)\, Q'(v - T(v)) = Q(v - B(v)) - B(v)\, Q'(v - B(v)) = u'_v(B(v)).
$$

This, in turn, means that

$$
\frac{d}{dv}\, \overline{u}'_v(h(v)) = \frac{1}{T(v) - B(v)} \Big[ Q(v - T(v)) - Q(v - B(v)) \Big] \le 0,
$$

since $T(v) > B(v)$ and $Q$ is increasing. Hence, the integrand of (A.7) is weakly negative on any open interval, completing the proof.

To see how we can use this lemma, note that the KKT conditions for (29) are

$$
\overline{u}'_k(h^\sigma_k) = \sum_{i=1}^{k-1} \mu_i, \tag{A.9}
$$

where $\mu_i$ is the multiplier associated with the majorization constraint starting at $i$. All $\mu_i \ge 0$, so at the optimal $D^\sigma$, $\overline{u}'_k(h^\sigma_k) \le \overline{u}'_{k+1}(h^\sigma_{k+1})$. But, by construction, for all $x \in \operatorname{supp}(\sigma)$, $h^x_k \in \operatorname{supp}(\overline{u}_k(h^\sigma_k))$. In particular, this means that $u_k$ is concave at $h^x_k$ and $u'_k(h^x_k) = \overline{u}'_k(h^\sigma_k)$. Thus, we can apply Lemma A.2, implying that the segmentation we constructed in Lemma A.1 is regular, completing the proof of Theorem 1.

Proof of Proposition 3. Recall that for any $x \in \operatorname{supp}(\sigma)$ and $v_k \in \operatorname{supp}(x)$,

$$
u'_k(h^x_k) = \overline{u}'_k(h^\sigma_k),
$$

which is constant across $x$. By Lemma A.2, this means that for all $k$ and $x, x' \in \operatorname{supp}(\sigma)$,

$$
v_{k+1} - h^{x'}_{k+1} \ge v_k - h^x_k.
$$

This means

$$
q^{x'}_{k+1} = Q(v_{k+1} - h^{x'}_{k+1}) \ge Q(v_k - h^x_k) = q^x_k
$$

for all $x, x'$ and instances of $v_k, v_{k+1}$.
Proof of Lemma 6. Recall that

$$
u'_k(h) = Q(v_k - h) - h\, Q'(v_k - h), \qquad u''_k(h) = h\, Q''(v_k - h) - 2Q'(v_k - h).
$$

That $u_k$ is increasing at $h$ means that

$$
u'_k(h) \ge 0 \implies h \le \frac{Q(v_k - h)}{Q'(v_k - h)}.
$$

We would like this to be a sufficient condition for $u_k$ to be concave at $h$, that is, either $Q''(v_k - h) \le 0$, or

$$
u''_k(h) \le 0 \implies h \le \frac{2Q'(v_k - h)}{Q''(v_k - h)}.
$$

A sufficient condition is that

$$
\frac{Q(v_k - h)}{Q'(v_k - h)} \le \frac{2Q'(v_k - h)}{Q''(v_k - h)} \iff \frac{Q''(v_k - h)\, Q(v_k - h)}{Q'(v_k - h)^2} \le 2.
$$

Defining $q = Q(v_k - h)$, recalling that $Q(\varphi) = (c')^{-1}(\varphi)$, and applying the inverse function theorem yields

$$
\frac{c'''(q)\, q}{c''(q)} \ge -2,
$$

which is obviously implied by (38).
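A quick numerical illustration of Lemma 6, under an assumption we make only for the example: isoelastic costs $c(q) = q^\gamma/\gamma$. Then $Q(\varphi) = \varphi^{1/(\gamma-1)}$ and $c'''(q)q/c''(q) = \gamma - 2$, so the final inequality holds for every $\gamma > 1$. The sketch scans a grid of $h$ and confirms that wherever $u'_k(h) \ge 0$, also $u''_k(h) \le 0$. The function name and the grid are hypothetical.

```python
def lemma6_holds_on_grid(gamma, v=1.0, n=999):
    """Check on a grid of h in (0, v) that u'_k(h) >= 0 implies u''_k(h) <= 0,
    assuming c(q) = q**gamma / gamma, so that Q(phi) = phi**(1/(gamma-1))."""
    a = 1.0 / (gamma - 1.0)
    Q = lambda p: p ** a
    dQ = lambda p: a * p ** (a - 1.0)
    d2Q = lambda p: a * (a - 1.0) * p ** (a - 2.0)
    for i in range(1, n):
        h = v * i / n
        phi = v - h
        u1 = Q(phi) - h * dQ(phi)          # u'_k(h)
        u2 = h * d2Q(phi) - 2.0 * dQ(phi)  # u''_k(h)
        if u1 >= 0.0 and u2 > 1e-12:
            return False
    return True

# With c(q) = q**gamma / gamma: c''(q) = (gamma-1) q**(gamma-2) and
# c'''(q) = (gamma-1)(gamma-2) q**(gamma-3), so c'''(q) q / c''(q) = gamma - 2 >= -2.
gammas = [1.5, 2.0, 3.0, 5.0]
checks = {g: (g - 2.0 >= -2.0, lemma6_holds_on_grid(g)) for g in gammas}
print(checks)
```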

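Proof of Lemma 7. This proof consists of two parts. First, we show that $\overline{h}_k$ is increasing while $h^*_k$ is decreasing, meaning that for some threshold $\hat k$, $k < \hat k \implies \overline{u}'_k(h^*_k) = 0$, and for $k \ge \hat k$, $\overline{u}'_k(h^*_k) = u'_k(h^*_k)$. Here $\overline{h}_k$ is the discretization of

$$
\overline{h}(v) = \arg\max_h \big\{ h\, Q(v - h) \big\}.
$$

The FOC characterizing $\overline{h}(v)$ is

$$
Q(v - \overline{h}(v)) - \overline{h}(v)\, Q'(v - \overline{h}(v)) = 0.
$$

Applying the implicit function theorem, we get

$$
Q'(v - \overline{h})(1 - \overline{h}') - \overline{h}'\, Q'(v - \overline{h}) - \overline{h}\, Q''(v - \overline{h})(1 - \overline{h}') = 0
\implies \overline{h}'(v) = 1 - \frac{Q'(v - \overline{h}(v))}{2Q'(v - \overline{h}(v)) - \overline{h}(v)\, Q''(v - \overline{h}(v))}.
$$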
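Under (38),

$$
\overline{h}(v) \le \frac{Q'(v - \overline{h}(v))}{|Q''(v - \overline{h}(v))|} \implies 2 - \frac{\overline{h}(v)\, Q''(v - \overline{h}(v))}{Q'(v - \overline{h}(v))} \ge 1 \implies \overline{h}'(v) \in [0, 1].
$$

Hence, $\overline{h}_k$ is increasing, and so is $v_k - \overline{h}_k$ (a fact we need for the proof of Proposition 5).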
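For a concrete illustration (the cost function is our assumption, not part of the lemma), take $c(q) = q^3/3$, so that $Q(\varphi) = \sqrt{\varphi}$ and $\overline{h}(v) = \tfrac{2}{3} v$. The sketch below computes $\overline{h}(v)$ by golden-section search and compares the implicit-function slope above with a finite difference; both come out at $2/3 \in [0, 1]$. The helper names are hypothetical.

```python
import math

GAMMA = 3.0  # assumed isoelastic cost c(q) = q**GAMMA / GAMMA
a = 1.0 / (GAMMA - 1.0)
Q = lambda p: p ** a
dQ = lambda p: a * p ** (a - 1.0)
d2Q = lambda p: a * (a - 1.0) * p ** (a - 2.0)

def argmax_golden(f, lo, hi, tol=1e-10):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - invphi * (hi - lo), lo + invphi * (hi - lo)
    while hi - lo > tol:
        if f(c) > f(d):
            hi, d = d, c
            c = hi - invphi * (hi - lo)
        else:
            lo, c = c, d
            d = lo + invphi * (hi - lo)
    return 0.5 * (lo + hi)

def h_bar(v):
    """h_bar(v) = argmax_h h Q(v - h)."""
    return argmax_golden(lambda h: h * Q(v - h), 0.0, v)

v = 1.0
hb = h_bar(v)
slope_ift = 1.0 - dQ(v - hb) / (2.0 * dQ(v - hb) - hb * d2Q(v - hb))  # implicit-function formula
slope_fd = (h_bar(v + 1e-3) - h_bar(v - 1e-3)) / 2e-3                 # finite difference
print(hb, (GAMMA - 1.0) / GAMMA * v, slope_ift, slope_fd)
```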
Next, we show that $u'_k(h^*_k)$ is increasing for all $k \ge \hat k$. We can decompose

$$
u'_{k+1}(h^*_{k+1}) - u'_k(h^*_k) = \Big[u'_{k+1}(h^*_k) - u'_k(h^*_k)\Big] + \Big[u'_{k+1}(h^*_{k+1}) - u'_{k+1}(h^*_k)\Big].
$$

The first term is equal to

$$
\int_{v_k}^{v_{k+1}} \frac{\partial}{\partial v}\big[u'_v(h^*_k)\big]\, dv = \int_{v_k}^{v_{k+1}} \Big[Q'(v - h^*_k) - h^*_k\, Q''(v - h^*_k)\Big]\, dv.
$$

Under (38), for any $h$ such that $h \le \overline{h}_k$,

$$
h\, Q''(v - h) \le Q'(v - h) \implies \int_{v_k}^{v_{k+1}} \frac{\partial}{\partial v}\big[u'_v(h^*_k)\big]\, dv \ge 0.
$$

This also means, in particular, that $u'_{k+1}(h) \ge 0$ for all $h \in (h^*_{k+1}, h^*_k)$. The second term is

$$
\int_{h^*_k}^{h^*_{k+1}} u''_{k+1}(h)\, dh \ge 0,
$$

using the fact that $h^*_{k+1} \le h^*_k$ by (39) and that, by Lemma 6, $u''_{k+1}(h) \le 0$ on this interval for $k \ge \hat k$. Thus, $u'_k(h^*_k)$ is increasing in $k$.

Proof of Proposition 5. What remains is to show that there exists a solution in which every segment $x$ is supported on $\{v_j, \dots, v_K\}$ for some $j \le \hat k$. Consider the following demand $\hat D$ supported on $V$, with hazard rates

$$
\hat h_k = \min\{\overline{h}_k, h^*_k\}.
$$

Since $\overline{h}_k$ is increasing and $h^*_k$ is decreasing, $\hat h$ switches from the first argument to the second exactly once, and $\overline{h}_k - h^*_k = (v_k - h^*_k) - (v_k - \overline{h}_k)$ satisfies increasing differences. Furthermore, from the proof of Lemma 7, we know that $v_k - \overline{h}_k$ is increasing, so $\hat D$ is regular.
We are now ready to construct the segmentation. Define the family of distributions

$$
\hat D^j(v) = \begin{cases} 1 & \text{if } v < v_j, \\ \hat D(v) & \text{if } v \ge v_j. \end{cases}
$$

Since $\hat D$ is regular, so is each $\hat D^j$. What we now need is to find weights $\hat\sigma^j$ such that $\sum_j \hat\sigma^j \hat D^j = D^*$ and $\sum_j \hat\sigma^j = 1$. We construct these weights recursively.
Begin by making $\hat\sigma^1$ as large as possible, until the constraint $\hat\sigma^1 \hat D^1(v) \le D^*(v)$ binds for some $v$. Our previous argument shows that $\varphi^*_k - (v_k - \overline{h}_k)$ is decreasing. This is the amount of “over-weighting” in $\hat D$ relative to $D^*$, meaning that

$$
\frac{\hat x_k / \hat D_k}{x^*_k / D^*_k}
$$

is decreasing, so the binding $v$ for $\hat D^1$ is, in fact, $v_1$.
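Now, consider the remainder distribution $\tilde D = D^* - \hat\sigma^1 \hat D^1$. Notice that

$$
\frac{\hat x_k / \hat D_k}{\tilde x_k / \tilde D_k} = \frac{\hat x_k / \hat D_k}{(x^*_k - \hat\sigma^1 \hat x^1_k) / (D^*_k - \hat\sigma^1 \hat D^1_k)}.
$$

Compute

$$
\frac{x^*_k - \hat\sigma^1 \hat x^1_k}{D^*_k - \hat\sigma^1 \hat D^1_k} = \left[\frac{1 - \hat\sigma^1 \hat x^1_k / x^*_k}{1 - \hat\sigma^1 \hat D^1_k / D^*_k}\right] \frac{x^*_k}{D^*_k}.
$$

Combining these two together, we get that

$$
\frac{\hat x_k / \hat D_k}{\tilde x_k / \tilde D_k} = \left[\frac{1 - \hat\sigma^1 \hat x^1_k / x^*_k}{1 - \hat\sigma^1 \hat D^1_k / D^*_k}\right]^{-1} \cdot \frac{\hat x_k / \hat D_k}{x^*_k / D^*_k}.
$$

Now, $\frac{\hat x_k / \hat D_k}{x^*_k / D^*_k}$ is decreasing in $k$, which means that the bracketed term is increasing in $k$, and hence its inverse is decreasing. Thus, $\frac{\hat x_k / \hat D_k}{\tilde x_k / \tilde D_k}$ is decreasing in $k$ overall.
We now repeat the same construction on $\tilde D$, and so on, until we reach $v_{\hat k}$, the crossing point of $\overline{h}_k$ and $h^*_k$, at which point we set $\hat\sigma^{\hat k}$ to be all remaining weight. This completes the construction of $\hat\sigma^j$, proving the proposition.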

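Proof of Theorem 2. The proof mirrors that of Theorem 1. We first apply Lemma A.1, so all we need to do is to show that the $D$ solving the maximization problem can be implemented by a segmentation over regular markets. The KKT conditions on $h^D_k$ imply that

$$
\frac{d}{dh}\big[\overline{\omega}_{k,\lambda}(h^\sigma_k)\big] = \frac{d}{dh}\big[\omega_{k,\lambda}(h^\sigma_k)\big] = \lambda\, w'_k(h^\sigma_k) + (1 - 2\lambda)\, u'_k(h^\sigma_k)
$$

is increasing in $k$. As before, decompose

$$
\overline{\omega}'_{k+1,\lambda}(h_{k+1}) - \overline{\omega}'_{k,\lambda}(h_k) = \Big[\overline{\omega}'_{k+1,\lambda}(h_{k+1}) - \overline{\omega}'_{k+1,\lambda}\big(h_k + (v_{k+1} - v_k)\big)\Big] + \Big[\overline{\omega}'_{k+1,\lambda}\big(h_k + (v_{k+1} - v_k)\big) - \overline{\omega}'_{k,\lambda}(h_k)\Big].
$$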


We wish to show that the second bracketed term is negative. Again, take $h(v) = h_k + (v - v_k)$, so that

$$
\overline{\omega}'_{k+1,\lambda}\big(h_k + (v_{k+1} - v_k)\big) - \overline{\omega}'_{k,\lambda}(h_k) = \int_{v_k}^{v_{k+1}} \frac{d}{dv}\, \overline{\omega}'_{v,\lambda}(h(v))\, dv,
$$

where $\omega_{v,\lambda}$ is the continuous extension of $\omega_{k,\lambda}$ onto all $v$, and similarly for $w_v(h)$. Note that

$$
w'_v(h) = -v\, Q'(v - h) + c'(Q(v - h))\, Q'(v - h) = -h\, Q'(v - h).
$$

Taking the total differential with respect to $v$,

$$
\begin{aligned}
\frac{d}{dv}\, w'_v(h(v)) &= -h'(v)\, Q'(v - h(v)) - h(v)(1 - h'(v))\, Q''(v - h(v)) \\
&= -Q'(v - h(v)) - (1 - h'(v))\, w''_v(h(v)) = -Q'(v - h(v)) \le 0.
\end{aligned}
$$

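Thus, on any open interval where $\overline{\omega}_{v,\lambda}(h(v)) = \omega_{v,\lambda}(h(v))$,

$$
\frac{d}{dv}\, \overline{\omega}'_{v,\lambda}(h(v)) = \lambda \frac{d}{dv} w'_v(h(v)) + (1 - 2\lambda) \frac{d}{dv} u'_v(h(v)) = -(1 - \lambda)\, Q'(v - h(v)) \le 0,
$$

using that both total derivatives equal $-Q'(v - h(v))$. On the other hand, whenever $\omega_{v,\lambda}(h(v)) < \overline{\omega}_{v,\lambda}(h(v))$, we have that

$$
\overline{\omega}'_{v,\lambda}(h(v)) = \frac{\omega_{v,\lambda}(T(v)) - \omega_{v,\lambda}(B(v))}{T(v) - B(v)} = \lambda \cdot \frac{w_v(T(v)) - w_v(B(v))}{T(v) - B(v)} + (1 - 2\lambda) \cdot \frac{u_v(T(v)) - u_v(B(v))}{T(v) - B(v)}.
$$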

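We then follow the proof of Lemma A.2, which goes through with some slight modifications. For the case where $\omega_{v,\lambda}(h(v)) < \overline{\omega}_{v,\lambda}(h(v))$, we are left with

$$
\frac{d}{dv}\, \overline{\omega}'_{v,\lambda}(h(v)) = \frac{1}{T(v) - B(v)} \left[\frac{\partial}{\partial v} \omega_{v,\lambda}(T(v)) - \frac{\partial}{\partial v} \omega_{v,\lambda}(B(v))\right], \tag{A.10}
$$

where

$$
\frac{\partial}{\partial v} \omega_{v,\lambda}(h) = \lambda \frac{\partial}{\partial v} w_v(h) + (1 - 2\lambda) \frac{\partial}{\partial v} u_v(h) = \lambda\, Q(v - h) + (1 - \lambda)\, h\, Q'(v - h).
$$

By construction, $\omega'_{v,\lambda}(T(v)) = \omega'_{v,\lambda}(B(v))$, which means

$$
(1 - 2\lambda)\, Q(v - T(v)) - (1 - \lambda)\, T(v)\, Q'(v - T(v)) = (1 - 2\lambda)\, Q(v - B(v)) - (1 - \lambda)\, B(v)\, Q'(v - B(v)).
$$

Substituting this equation into the previous one, (A.10) becomes

$$
\frac{d}{dv}\, \overline{\omega}'_{v,\lambda}(h(v)) = \frac{1 - \lambda}{T(v) - B(v)} \Big[ Q(v - T(v)) - Q(v - B(v)) \Big] \le 0,
$$

completing the proof.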

Proof of Proposition 7. We prove that for any market $x \notin O_\gamma$, $x \notin O_{\gamma'}$. Denote by $u_v(h; \gamma)$ the local information rent function with isoelastic costs:

$$
u_v(h; \gamma) = h\, (v - h)^{\frac{1}{\gamma - 1}}.
$$

By Lemma 6, with isoelastic costs, $\overline{u}_v(h; \gamma) = u_v(h; \gamma)$ if and only if

$$
h \le \frac{\gamma - 1}{\gamma}\, v.
$$

This condition obviously becomes weaker as $\gamma$ increases; hence, if it is not satisfied at $\gamma$, then it is also not satisfied at $\gamma' < \gamma$.
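Now, take $v_k \in \operatorname{supp}(x)$ such that $u'_{v_{k+1}}(h^x_{k+1}; \gamma) < u'_{v_k}(h^x_k; \gamma)$. Consider the function $h(v, \gamma)$ defined on $[v_k, v_{k+1}]$ which maintains $u'_v(h(v, \gamma); \gamma) = u'_k(h^x_k; \gamma)$. To characterize $h$, first compute the scalar derivative:

$$
u'_v(h; \gamma) = (v - h)^{\frac{1}{\gamma - 1} - 1} \left(v - \frac{\gamma}{\gamma - 1}\, h\right).
$$

Then, take the total derivative of this expression with respect to $v$ when $h = h(v)$:

$$
\frac{d}{dv}\big[u'_v(h(v); \gamma)\big] = \left(\frac{1}{\gamma - 1} - 1\right)(v - h(v))^{\frac{1}{\gamma - 1} - 2}\,(1 - h'(v))\left(v - \frac{\gamma}{\gamma - 1}\, h(v)\right) + (v - h(v))^{\frac{1}{\gamma - 1} - 1}\left(1 - \frac{\gamma}{\gamma - 1}\, h'(v)\right).
$$

The differential equation characterizing $h(v, \gamma)$ is

$$
\frac{d}{dv}\big[u'_v(h(v, \gamma); \gamma)\big] = 0 \iff h'(v, \gamma) = \frac{(v - h(v, \gamma)) - \frac{2 - \gamma}{\gamma - 1}\, h(v, \gamma)}{2(v - h(v, \gamma)) - \frac{2 - \gamma}{\gamma - 1}\, h(v, \gamma)}. \tag{A.11}
$$

Note that

$$
h(v) \le \frac{\gamma - 1}{\gamma}\, v \implies h'(v) \ge 0.
$$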

Furthermore, observe that the right-hand side of (A.11) is increasing in $\gamma$. Since $x \notin O_\gamma$, for any $\gamma' < \gamma$,

$$
h^x_{k+1} > h(v_{k+1}; \gamma) \implies h^x_{k+1} - h^x_k > \int_{v_k}^{v_{k+1}} h'(v, \gamma)\, dv > \int_{v_k}^{v_{k+1}} h'(v, \gamma')\, dv. \tag{A.12}
$$

Hence, $x \notin O_{\gamma'}$. The strict inclusion follows from observing that we can replace the first inequality of (A.12) with a weak inequality, so if $x$ is on the border of $O_\gamma$, then $x \notin O_{\gamma'}$.
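Both properties of (A.11) used here, that it keeps $u'_v(h(v,\gamma);\gamma)$ constant and that it is increasing in $\gamma$, can be spot-checked numerically. The sketch below is our own check (the evaluation point $v = 1$, $h = 0.3$ is hypothetical and satisfies $h \le \tfrac{\gamma-1}{\gamma} v$ for every $\gamma$ tried): it moves $(v, h)$ in the direction $(1, h'(v, \gamma))$, confirms that $u'_v$ does not change to first order, and compares the slopes across several values of $\gamma$.

```python
def u_prime(v, h, gamma):
    """u'_v(h; gamma) = (v - h)**(1/(gamma-1) - 1) * (v - gamma/(gamma-1) * h)."""
    a = 1.0 / (gamma - 1.0)
    return (v - h) ** (a - 1.0) * (v - gamma / (gamma - 1.0) * h)

def h_slope(v, h, gamma):
    """Right-hand side of (A.11)."""
    b = (2.0 - gamma) / (gamma - 1.0)
    return ((v - h) - b * h) / (2.0 * (v - h) - b * h)

v, h, eps = 1.0, 0.3, 1e-6
slopes, drifts = [], []
for gamma in [1.5, 2.0, 3.0, 4.0]:
    s = h_slope(v, h, gamma)
    # Moving (v, h) in the direction (1, s) should leave u'_v(h; gamma) unchanged to first order.
    drift = (u_prime(v + eps, h + s * eps, gamma) - u_prime(v - eps, h - s * eps, gamma)) / (2 * eps)
    slopes.append(s)
    drifts.append(drift)
    print(gamma, s, drift)
```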

Proof of Proposition 8. If $e = (+1, -1)$, then the problem is to maximize profits and minimize consumer surplus, which is achieved by first-degree (perfect) price discrimination. So, it suffices to consider the case where $e_1 = -1$. Rewrite the objective function as

$$
-\lambda W + (e_2(1 - \lambda) + \lambda)\, U = U + \mu W,
$$

where

$$
\mu \triangleq \frac{-\lambda}{e_2(1 - \lambda) + \lambda} \in (-\infty, \infty).
$$

Correspondingly, define the local objective

$$
\omega_{\mu,k}(h) = u_k(h) + \mu\, w_k(h).
$$

We can compute

$$
\omega'_{\mu,k}(h) = Q(v_k - h) - (1 + \mu)\, h\, Q'(v_k - h)
$$

and

$$
\omega''_{\mu,k}(h) = (1 + \mu)\, h\, Q''(v_k - h) - (2 + \mu)\, Q'(v_k - h).
$$

We divide the analysis into three cases. First, if $\mu \le -2$, then $\omega''_{\mu,k}(h) \ge 0$, meaning that the objective is convex everywhere. The concavification is then given by the line segment between $0$ and $v_k$:

$$
\overline{\omega}_{\mu,k}(h) = \mu \left(1 - \frac{h}{v_k}\right) w_k(0).
$$

This has derivative

$$
\overline{\omega}'_{\mu,k}(h) = -\frac{\mu}{v_k}\, w_k(0) = -\frac{\mu}{v_k} \Big(v_k\, Q(v_k) - c(Q(v_k))\Big).
$$

This expression is increasing in $k$, so under (39) we have submodularity, and the solution to the concavification bound is $D = D^*$, by an argument identical to that of Lemma 5. The solution is implementable by putting as much weight as possible on the distribution where $h^x_k = v_k$, and then dividing the remaining mass into degenerate distributions with all weight on a single value.
Next, suppose $\mu \ge -1$. We then have $\omega''_{\mu,k} \le 0$, and $\omega_{\mu,k}$ is concave everywhere. The rest of the analysis is identical to that of Proposition 5; $\omega'_{\mu,k}(h)$ is increasing in $v_k$, and hence we have both submodularity and implementability in segmentations with gapless support.
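The final case is when $\mu \in (-2, -1)$. Now, $\omega_{\mu,k}$ is increasing everywhere, but is concave at $h$ if and only if

$$
h \le \frac{2 + \mu}{1 + \mu} \cdot \frac{Q'(v_k - h)}{Q''(v_k - h)}.
$$

We claim that $\omega_{\mu,k}$ switches from concave to convex exactly once, then remains convex. A sufficient condition for this is that

$$
\frac{Q'(v_k - h)}{Q''(v_k - h)} \ \text{is decreasing in } h \iff \sup_\phi \frac{Q'(\phi)\, Q'''(\phi)}{(Q''(\phi))^2} \le 1.
$$

This is satisfied by the isoelastic cost function with $\gamma \ge 2$. Thus,

$$
\overline{\omega}_{\mu,k}(h) = \begin{cases} \omega_{\mu,k}(h) & \text{if } h \le \hat h_k, \\ \omega_{\mu,k}(\hat h_k)\, \dfrac{v_k - h}{v_k - \hat h_k} & \text{otherwise,} \end{cases}
$$

where

$$
\omega'_{\mu,k}(\hat h_k) = -\frac{\omega_{\mu,k}(\hat h_k)}{v_k - \hat h_k}.
$$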
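We now claim that $\hat h_k$ is increasing in $k$, and hence we have the single-crossing property used in the proof of Proposition 5. Here, we work directly with the functional form:

$$
(v_k - \hat h_k)^{\frac{1}{\gamma - 1}} - \frac{1 + \mu}{\gamma - 1}\, \hat h_k\, (v_k - \hat h_k)^{\frac{1}{\gamma - 1} - 1} = -\frac{(\hat h_k + \mu v_k)(v_k - \hat h_k)^{\frac{1}{\gamma - 1}} - \frac{\mu}{\gamma}(v_k - \hat h_k)^{\frac{\gamma}{\gamma - 1}}}{v_k - \hat h_k}.
$$

This simplifies to

$$
\hat h_k = \frac{(1 + \mu) - \frac{\mu}{\gamma}}{\frac{1 + \mu}{\gamma - 1} - \frac{\mu}{\gamma}}\, v_k.
$$

Importantly, $\hat h_k$ only matters when it is positive and less than $v_k$, and when it is positive it must be an increasing function of $v_k$.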
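The closed form for $\hat h_k$ can be cross-checked against a direct numerical solution of the tangency condition $\omega'_{\mu,k}(\hat h_k) = -\omega_{\mu,k}(\hat h_k)/(v_k - \hat h_k)$. The sketch assumes isoelastic costs $c(q) = q^\gamma/\gamma$ (so $Q(\varphi) = \varphi^{1/(\gamma-1)}$) and the illustrative values $\gamma = 3$, $\mu = -1.2$, for which the formula gives $\hat h_k = \tfrac{2}{3} v_k$; the parameter values and helper names are ours.

```python
GAMMA, MU = 3.0, -1.2   # assumed parameters: gamma >= 2, mu in (-2, -1)

def omega(h, v):
    """omega_{mu,k}(h) = u_k(h) + mu * w_k(h) with isoelastic costs c(q) = q**GAMMA / GAMMA."""
    phi = v - h
    return (h + MU * v) * phi ** (1 / (GAMMA - 1)) - (MU / GAMMA) * phi ** (GAMMA / (GAMMA - 1))

def omega_prime(h, v):
    phi = v - h
    a = 1 / (GAMMA - 1)
    return phi ** a - (1 + MU) * a * h * phi ** (a - 1)

def tangency_gap(h, v):
    """Zero at the tangency point: omega'(h) + omega(h) / (v - h)."""
    return omega_prime(h, v) + omega(h, v) / (v - h)

def bisect(f, lo, hi, iters=200):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    flo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        fm = f(mid)
        if (fm > 0) == (flo > 0):
            lo, flo = mid, fm
        else:
            hi = mid
    return 0.5 * (lo + hi)

def h_hat_closed_form(v):
    return ((1 + MU) - MU / GAMMA) / ((1 + MU) / (GAMMA - 1) - MU / GAMMA) * v

pairs = []
for v in [0.5, 1.0, 2.0]:
    numeric = bisect(lambda h: tangency_gap(h, v), 1e-9 * v, v * (1.0 - 1e-9))
    pairs.append((numeric, h_hat_closed_form(v)))
    print(v, numeric, h_hat_closed_form(v))
```

For these parameters the tangency gap is a positive multiple of $0.2 v - 0.3 h$, so bisection on $(0, v)$ has a single sign change to find.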
Lastly, we need to establish that submodularity holds. It is easy to verify that $\omega'_{\mu,k}$ is increasing in $v_k$ whenever $\omega''_{\mu,k} \le 0$, and hence when $h^*_k \le \hat h_k$ we have submodularity. Otherwise, we need to compute the derivatives at $\hat h_k$:

$$
\overline{\omega}'_{\mu,k}(\hat h_k) = (v_k - \hat h_k)^{\frac{1}{\gamma - 1} - 1} \left(v_k - \hat h_k - \frac{1 + \mu}{\gamma - 1}\, \hat h_k\right).
$$

Since $\hat h_k$ is a linear function of $v_k$, and $v_k - \hat h_k \ge 0$ in the relevant parameter region, this is increasing in $v_k$. This completes the proof.

B Additional Results: Discrete Goods


B.1 Consumer-Optimal Segmentation
We model the discrete good environment by taking $c(q)$ to be convex and piecewise linear between integer values of $q$; equivalently, we take $c'(q)$ to be a step function. For notational ease, assume that $Q \in \mathbb{N}$, and let $\{\kappa_i\}$, $i \in \{1, \dots, Q\}$, denote the different values of $c'(q)$.
Such a $c(q)$ is a piecewise linear approximation of some smooth function $\hat c(q)$. In particular, we can think of $u_k$ as equal to the associated $\hat u_k$ when $v_k - h \in \{\kappa_i\}$, and as the linear interpolation between these points otherwise. That makes $u_k$ itself a piecewise linear approximation of $\operatorname{cav}[\hat u_k]$.
This observation by itself is not particularly useful. However, combined with an assumption that replicates the conditions of Section 5, it allows us to recover many of the earlier results. In particular, if

$$
\kappa_{i+1} - \kappa_i \ \text{is increasing,} \tag{B.1}
$$

then $c$ is a linear approximation of some $\hat c$ satisfying (38). Thus, the concavification is equal to a piecewise linear approximation of $\operatorname{cav}[\hat u_k]$ up to the maximum, which is attained at

$$
i^*_k = \arg\max_{i \in \mathbb{N}} \big\{ i \cdot (v_k - \kappa_i) \big\}.
$$

We can invert this condition by noting that

$$
i^*_k = \max_{i \in \mathbb{N}} \big\{ i \mid i\,(v_k - \kappa_i) \ge (i - 1)(v_k - \kappa_{i-1}) \big\} = \max_{i \in \mathbb{N}} \big\{ i \mid v_k - \kappa_i \ge (i - 1)(\kappa_i - \kappa_{i-1}) \big\}.
$$

This gives a discrete counterpart to Corollary 2.
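The equivalence between the arg max and the threshold form of $i^*_k$ (which relies on (B.1)) can be checked by brute force. The marginal costs $\kappa = (1, 2, 4, 7)$ below are a hypothetical example whose increments $1, 2, 3$ are increasing, as (B.1) requires; ties are broken toward the larger quality in both functions, and quality $0$ (no purchase) is allowed.

```python
from fractions import Fraction

def i_star_argmax(v, kappa):
    """Largest maximizer of i * (v - kappa_i) over i in {0, 1, ..., Q} (i = 0 yields 0)."""
    best_i, best_val = 0, Fraction(0)
    for i, k in enumerate(kappa, start=1):
        val = i * (v - k)
        if val >= best_val:
            best_i, best_val = i, val
    return best_i

def i_star_threshold(v, kappa):
    """Largest i with v - kappa_i >= (i - 1) * (kappa_i - kappa_{i-1}); 0 if none."""
    best = 0
    for i in range(1, len(kappa) + 1):
        prev = kappa[i - 2] if i >= 2 else kappa[0]  # multiplied by (i - 1) = 0 when i = 1
        if v - kappa[i - 1] >= (i - 1) * (kappa[i - 1] - prev):
            best = i
    return best

kappa = [Fraction(1), Fraction(2), Fraction(4), Fraction(7)]  # increments 1, 2, 3 satisfy (B.1)
for v in [Fraction(n, 2) for n in range(0, 41)]:
    assert i_star_argmax(v, kappa) == i_star_threshold(v, kappa)
print([i_star_threshold(Fraction(v), kappa) for v in [0, 1, 3, 6, 12, 20]])
```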

Corollary 5 (Minimum Quality with Convex Discrete Marginal Cost)

Under (B.1), every $v_k$ such that

$$
v_k - \kappa_i \ge (i - 1)(\kappa_i - \kappa_{i-1})
$$

consumes quality at least $i$.

By imposing the MHR condition (39), we can also extend Proposition 5.

Proposition B.1 (Optimal Segmentation with Convex Discrete Marginal Cost)
Under (39) and (B.1), in any consumer-optimal segmentation, for every market $x \in \operatorname{supp}(\sigma)$ and every consumer of type $v_k$,

$$
q^x_k \in \begin{cases} \{i^*_k\} & \text{if } h^*_k \ge v_k - \kappa_{i^*_k}, \\ \{q^*_k,\, q^*_k + 1\} & \text{if } h^*_k < v_k - \kappa_{i^*_k}. \end{cases}
$$

Proposition 4 does not go through; this can be seen immediately from Proposition B.1, as the quality provided to the same type may differ by 1 across segments. As a result, generically, some segmentation is always beneficial. This is also the main result of Haghpanah and Siegel (2023). However, if we consider a discrete approximation of a continuous cost function that satisfies Proposition 4, the benefits of segmentation are small: they equal the gain from concavifying the hazard rate around the two nearby points of $v_k - \kappa_i$. As the number of goods grows, i.e., as $c(q)$ approaches a smooth function, this gain goes to 0.

B.2 From Concavification to Extreme Points


In this section, we show that concavification can be used to explicitly derive the extreme points in a discrete good setting. We make some slight adaptations to our notation. With discrete quality, the optimal price menu $p(q)$ is constant between integer values of $q$. Thus, we take prices to be an increasing vector $p^x \in \mathbb{R}^Q_+$, where element $p^x_i = p^x(i)$, $i \in \{1, \dots, Q\}$. Rather than working with $p$, it is easier to work with the incremental price $\rho^x_i = p^x_i - p^x_{i-1}$. This is because we know $\rho^x_i = \min\{v \mid \phi^x(v) \ge \kappa_i\}$. Note that $\rho^x_i$ is the lowest consumer type who purchases quality $i$, and thus is also increasing in $i$.
This model is equivalent to one of a multi-good monopolist, where the seller sells $Q$ goods which are identical to the consumer but have increasing marginal cost of production $\kappa_i$. Then, $\rho_i$ corresponds to the price charged for each good, which is a “quality increment” in our original model. We will sometimes refer to $\rho_i$ as the price of increment or unit $i$.
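The characterization $\rho^x_i = \min\{v \mid \phi^x(v) \ge \kappa_i\}$ can be illustrated on a small discrete market. The sketch below (hypothetical helper names and example market) computes discrete virtual values $\phi_k = v_k - h_k$, using the hazard rate $h_k = (v_{k+1} - v_k) D_{k+1} / x_k$ from earlier, reads off each $\rho_i$, and compares it with a brute-force maximization of the increment profit $D^x(\rho)(\rho - \kappa_i)$. It assumes the market is regular, so that $\phi$ is increasing, and that every $\kappa_i$ is at most the top value.

```python
from fractions import Fraction as F

def demand(x):
    """D_k = mass of types at or above v_k (so D_1 = 1)."""
    return [sum(x[k:]) for k in range(len(x))]

def virtual_values(values, x):
    """phi_k = v_k - h_k with h_k = (v_{k+1} - v_k) D_{k+1} / x_k and D_{K+1} = 0."""
    D = demand(x) + [F(0)]
    phis = []
    for k in range(len(values)):
        step = values[k + 1] - values[k] if k + 1 < len(values) else 0
        phis.append(values[k] - step * D[k + 1] / x[k])
    return phis

def incremental_prices(values, x, kappa):
    """rho_i = min{v_k : phi_k >= kappa_i}, assuming phi is increasing (regular market)."""
    phis = virtual_values(values, x)
    return [min(v for v, p in zip(values, phis) if p >= c) for c in kappa]

def lowest_optimal_price(values, x, c):
    """Brute force: lowest v_k maximizing D(v_k) * (v_k - c)."""
    D = demand(x)
    profits = [D[k] * (values[k] - c) for k in range(len(values))]
    return values[profits.index(max(profits))]

values = [F(1), F(2), F(3), F(4)]
x = [F(1, 4)] * 4          # hypothetical regular market (uniform over four types)
kappa = [F(0), F(1)]        # hypothetical marginal costs of the two units
print(virtual_values(values, x), incremental_prices(values, x, kappa))
```

Here the virtual values are $(-2, 0, 2, 4)$, so $\rho = (2, 3)$; for the first unit the seller is exactly indifferent between prices $2$ and $3$, and both functions pick the lower one.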
Consider the analysis of Section 7 in this setting. Given any increasing sequence $\kappa_i$, $w_k(h)$ is constant except at the jump points of $Q(v_k - h)$. Hence, to achieve the Pareto efficient frontier, it is without loss to restrict to distributions such that $v_k - h^x_k \in \{\kappa_i\}$; these are the only hazard rates necessary to obtain the concavification of $\omega_{k,\lambda}$, no matter what $\lambda$ is.

Lemma B.3 (Sufficient Distributions)

When $c$ is discrete, the Pareto frontier can be achieved with segmentations supported only on distributions $x$ such that $x$ is regular and for every $k$, $v_k - h^x_k \in \{\kappa_i\}$.

These distributions have a very particular structure: for every $k$, $h^x_k = v_k - \kappa_i$ for some $i \in \{1, \dots, Q\}$, where $\kappa_i$ increases with $k$. Equivalently, there is some partition of $V$ into $Q$ different (possibly empty) sections $R_i = [r_i, r_{i+1})$ such that for any $v_k \in R_i$,

$$
v_k - h^x_k = \kappa_i \implies D^x_{k+1} = \frac{v_k - \kappa_i}{v_{k+1} - \kappa_i} \cdot D^x_k.
$$

This distribution can be written explicitly as

$$
D^x_k = \frac{A_i}{v_k - c_i} \quad \forall\, v_k \in (r_i, r_{i+1}],
$$

where $A_1 \triangleq r_1 - c_1$ and

$$
A_{i+1} \triangleq A_i\, \frac{r_{i+1} - c_{i+1}}{r_{i+1} - c_i}.
$$
Within the sections $R_i$, this distribution is what is known as the generalized Pareto distribution, defined by

$$
F(x) = 1 - \left(1 + \xi\, \frac{x - \mu}{\sigma}\right)^{-1/\xi},
$$

where the parameters $\mu$, $\sigma$, $\xi$ are called the location, scale, and shape, respectively.
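The recursion for $A_i$ and the constant-profit property that pins down $\rho^x_i = r_i$ can be checked directly on the continuous value set $V = [0, 1]$. In the sketch below the cutoffs $r = (0.4, 0.7)$ and costs $\kappa = (0, 0.3)$ are hypothetical, and the helper names are ours. It verifies that $D(v)(v - \kappa_i) = A_i$ throughout each section, that demand is continuous at the cutoff, and that on the first section the demand coincides with a generalized Pareto survival function with shape $\xi = 1$, location $r_1$, and scale $r_1 - \kappa_1$ (our own matching of parameters).

```python
def pareto_coefficients(r, kappa):
    """A_1 = r_1 - kappa_1 and A_{i+1} = A_i (r_{i+1} - kappa_{i+1}) / (r_{i+1} - kappa_i)."""
    A = [r[0] - kappa[0]]
    for i in range(1, len(r)):
        A.append(A[-1] * (r[i] - kappa[i]) / (r[i] - kappa[i - 1]))
    return A

def piecewise_pareto_demand(v, r, kappa):
    """D(v) = 1 up to r_1, and A_i / (v - kappa_i) for v in (r_i, r_{i+1}]."""
    if v <= r[0]:
        return 1.0
    A = pareto_coefficients(r, kappa)
    i = max(j for j in range(len(r)) if r[j] < v)  # section containing v
    return A[i] / (v - kappa[i])

def gpd_survival(x, mu, sigma, xi):
    """Survival function 1 - F(x) of the generalized Pareto distribution."""
    return (1 + xi * (x - mu) / sigma) ** (-1 / xi)

r, kappa = [0.4, 0.7], [0.0, 0.3]   # hypothetical cutoffs and marginal costs
A = pareto_coefficients(r, kappa)
for n in range(1, 601):
    v = 0.4 + 0.6 * n / 600           # values in (r_1, 1]
    i = 0 if v <= r[1] else 1
    # Profit from selling increment i at price v is constant on its section.
    assert abs(piecewise_pareto_demand(v, r, kappa) * (v - kappa[i]) - A[i]) < 1e-12

left = piecewise_pareto_demand(r[1], r, kappa)
right = piecewise_pareto_demand(r[1] + 1e-12, r, kappa)
print(A, left, right, piecewise_pareto_demand(0.55, r, kappa), gpd_survival(0.55, r[0], r[0] - kappa[0], 1.0))
```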
Let $D^r_Q$ denote the piecewise generalized Pareto distribution with support $Q$, where $r$ is the vector of cutoffs $r_i \ge \kappa_i$ which define the regions $R_i$. Figure B.1 illustrates a few examples of piecewise Pareto demand functions on the continuous value set $V = [0, 1]$. By construction, within each segment, the marginal revenue of selling an additional unit of quality to type $v_k$ is $\kappa_i$ when $v_k \in R_i$, meaning that $\rho^x_i = r_i$.
Intuitively, these distributions place the greatest amount of mass possible to the right of $r_i$ without violating the requirement that $\rho^x_i = r_i$. This intuition is formalized below.

Proposition B.2 (Stochastic Dominance)

Take any $x \in \Delta S$ and $r$ such that $r_i \ge \rho^x_i$ for all $i$. Then

$$
D^x(v_k) \le D^r_Q(v_k), \quad \forall\, v_k \in Q.
$$

That is, $x$ is first-order stochastically dominated by $D^r_Q$.

Proof. For our discussion of discrete qualities, we need some additional notation. Let

$$
\Pi_i(x, \rho_i) \triangleq D^x(\rho_i)\,(\rho_i - \kappa_i)
$$

be the profit attributable to selling the $i$th quality increment at the incremental price $\rho_i$. We also denote the total profit by

$$
\Pi(x, \rho) \triangleq \sum_i \Pi_i(x, \rho_i).
$$

Figure B.1: Examples of piecewise Pareto demands.

We claim that given any two vectors $\rho, \hat\rho$ with $\rho_i \le \hat\rho_i$ for all $i$, $D^{\hat\rho}$ first-order stochastically dominates any $x \in X_\rho$. That is, for all $v_k \in V$, $D^x_k \le D^{\hat\rho}(v_k)$. Suppose, by contradiction, that there exists $\hat v$ such that $D^x(\hat v) > D^{\hat\rho}(\hat v)$, and without loss assume $\hat v$ is the smallest such value. Let $\rho_i$ be such that $\hat v \in (\rho_i, \rho_{i+1}]$. Note that $\hat v > \hat\rho_1 \ge \rho_1$, so such $\rho_i$ exists. We then have that

$$
\Pi_i(x, \hat v) > \Pi_i(D^{\hat\rho}, \hat v) \ge \Pi_i(D^{\hat\rho}, \rho_i) \ge \Pi_i(x, \rho_i).
$$

The first inequality follows from $D^x(\hat v) > D^{\hat\rho}(\hat v)$. The second inequality follows from Lemma B.4, using that $\hat v \in [\rho_i, \rho_{i+1}]$ so $\hat v < \hat\rho_{i+1}$. The third inequality follows from the fact that $D^x(v) \le D^{\hat\rho}(v)$ for $v < \hat v$. But this implies that $\rho_i$ is not an optimal price for increment $i$, a contradiction. Hence, for any $x \in X_\rho$, $D^{\hat\rho}$ first-order stochastically dominates $D^x$. Finally, it is immediate that $D^r$ first-order stochastically dominates any $D^r_Q$ where $Q \subset V$.

This property of the piecewise generalized Pareto also means that it solves the multi-unit
version of the consumer maximization problem studied in Condorelli and Szentes (2020), a
feature we discuss in Section B.3.
Next, we define the extreme point problem under consideration. Given $\rho$, we can define the (possibly empty) set of markets where $\rho$ is optimal:

$$
X_\rho = \{x \mid \rho^x = \rho\} = \Big\{x \;\Big|\; p^x_i = \sum_{j \le i} \rho_j \Big\}.
$$

As mentioned previously, the set $X_\rho$ is a compact, convex subset of $\mathbb{R}^K_+$, and equal to the convex hull of its extreme points.
Lemma B.3 states that the set of all such $D^r_Q$, where we vary over all $Q \subseteq V$ and increasing sequences $r \in Q^Q$, is sufficient to achieve all Pareto efficient segmentations. This is because they are the only distributions needed to support the concavification of $\omega_{k,\lambda}$ for any $\lambda$. It turns out that these distributions are also exactly the extreme points of $X_\rho$.
Proposition B.3 (Extreme Points of $X_\rho$)
With discrete qualities, $x$ is an extreme point of $X_\rho$ if and only if $D^x = D^r_Q$ for some $Q \subseteq V$ and $r_i \le \rho_i \le r_{i+1}$ for all $i \in \{1, 2, \dots, Q\}$.
Proof. We begin with two lemmas. The first tells us how profits co-move with prices and costs. The second establishes that $X_\rho$ is nonempty if and only if $D^\rho$ exists.
Lemma B.4 (Co-monotonicity of Prices and Costs)
For any $i, j$ such that $c_i > c_j$ and $\rho_i, \hat\rho_i$ such that $\rho_i > \hat\rho_i$ (or $c_i < c_j$ and $\rho_i < \hat\rho_i$),

$$
\Pi_i(x, \rho_i) \le \Pi_i(x, \hat\rho_i) \implies \Pi_j(x, \rho_i) < \Pi_j(x, \hat\rho_i).
$$

Proof. We have

$$
\Pi_i(x, \rho_i) \le \Pi_i(x, \hat\rho_i) \implies \frac{D^x(\rho_i)}{D^x(\hat\rho_i)} \le \frac{\hat\rho_i - c_i}{\rho_i - c_i} < \frac{\hat\rho_i - c_j}{\rho_i - c_j} \implies \Pi_j(x, \rho_i) < \Pi_j(x, \hat\rho_i),
$$

as desired. The case of $c_i < c_j$ can be proven similarly by inverting the ratios.
Lemma B.5 (Existence of $X_\rho$)
$X_\rho$ is nonempty if and only if for all $i \in \{1, \dots, Q\}$: (i) $\rho_i \le \rho_{i+1}$ and (ii) $c_i < \rho_i$.
Proof. For prices satisfying these conditions, $D^\rho \in X_\rho$, and so $X_\rho$ is nonempty. For necessity, $\rho_i < c_i$ is obviously never optimal, and $\rho_i > \rho_{i+1}$ cannot be optimal by Lemma B.4.

We are now ready to prove the proposition.

(⇒). First, we show that if $x \in X_\rho$ is an extreme point, then every $v_k \in \operatorname{supp}(x)$ is an optimal price for some unit $i$. To see this, suppose there is a $v_k$ such that $x_k > 0$, but, for every $i$,

$$
v_k \notin \arg\max_{\rho_i} \Pi_i(x, \rho_i).
$$

Consider the following market segmentation with uniform distribution and binary support on markets $\{x^-, x^+\}$, defined as follows:

$$
D^{x^-}(v) = \begin{cases} D(v) & \text{if } v \ne v_k, \\ D(v_k) - \epsilon & \text{if } v = v_k, \end{cases}
\qquad
D^{x^+}(v) = \begin{cases} D(v) & \text{if } v \ne v_k, \\ D(v_k) + \epsilon & \text{if } v = v_k. \end{cases}
$$

This segmentation clearly conforms to the aggregate market. Furthermore, if $\epsilon$ is small enough, then we continue to have that for all $i$,

$$
v_k \notin \arg\max_{\rho_i} \Pi_i(x^-, \rho_i) \quad \text{and} \quad v_k \notin \arg\max_{\rho_i} \Pi_i(x^+, \rho_i).
$$

Hence, $x^+, x^- \in X_\rho$, contradicting that $x$ is an extreme point. Thus, every extreme point satisfies

$$
D^x_k = \frac{\Pi_i(x, \rho_i)}{v_k - c_i}
$$
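for some $i$. To finish the proof, we need to show that the constants $\Pi_i(x, \rho_i)$ are consistent with the piecewise Pareto form. This is equivalent to showing that $\overline{\rho}^x_i = \underline{\rho}^x_{i+1}$ whenever $c_i < c_{i+1}$, where

$$
\overline{\rho}^x_i = \max\Big\{\arg\max_{\rho_i} \Pi_i(x, \rho_i)\Big\}
$$

is the largest optimal price for increment $i$, and similarly $\underline{\rho}^x_i$ is the smallest. Suppose $\overline{\rho}^x_i < \underline{\rho}^x_{i+1}$, and let $i$ be the smallest index where this fails. Consider the following segmentation, again with uniform distribution over binary support:

$$
D^{x^-}(v) = \begin{cases} D^x(v) & \text{if } v < \underline{\rho}^x_{i+1}, \\ \kappa\, D^x(v) & \text{if } v \ge \underline{\rho}^x_{i+1}, \end{cases}
\qquad
D^{x^+}(v) = \begin{cases} D^x(v) & \text{if } v < \underline{\rho}^x_{i+1}, \\ (2 - \kappa)\, D^x(v) & \text{if } v \ge \underline{\rho}^x_{i+1}, \end{cases}
$$

where $\kappa < 1$. We claim that if $\kappa$ is sufficiently close to 1, then $x^-, x^+ \in X_\rho$.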


Take $x^-$ (a similar argument works for $x^+$). If any prices changed, they are prices of some unit in $\{i + 1, \dots, n\}$. But, for any unit $j \ge i + 1$ and $v_k < \underline{\rho}^x_{i+1}$,

$$
\Pi_j(x, \rho_j) \ge \Pi_j(x, \underline{\rho}^x_{i+1}) > \Pi_j(x, \overline{\rho}^x_i) \ge \Pi_j(x, v_k),
$$

where the first inequality comes from optimality of price $\rho_j$, and the next two follow from Lemma B.4. Thus, for $\kappa$ sufficiently close to 1, we have $x^-, x^+ \in X_\rho$, a contradiction.
(⇐). By Proposition B.2, $D^\rho$ first-order stochastically dominates every element of $X_\rho$. Hence, it cannot be written as a convex combination of two other elements of $X_\rho$. Additionally, by Lemma B.5, $D^\rho$ exists if and only if $X_\rho$ is nonempty, and hence has extreme points.
We can visualize this result using the simplex. Figure B.2 represents the simplex $\Delta V$ when there are three values and two products. Each region $X_\rho$ is labeled by the vector of optimal (incremental) prices in that region.

Figure B.2: Division of $\Delta V$ into $X_\rho$ with $V = \{1, 2, 3\}$ and $c = (0, \tfrac{1}{3})$. (The simplex has vertices $\delta_1$, $\delta_2$, $\delta_3$; its regions are labeled by the optimal price vectors $(1, 1)$, $(1, 2)$, $(1, 3)$, $(2, 2)$, $(2, 3)$, $(3, 3)$.)

The extreme points of $X_\rho$ are the vertices of each price region. In Figure B.2 there are three interior extremal markets, illustrated with white, red, and blue dots. The white dot is a market in which the seller is indifferent between the three prices $\rho \in \{1, 2, 3\}$ for the first unit of quality, and the optimal price for the second unit is $\rho = 3$. The blue dot is a market in which the seller is indifferent between the three prices $\rho \in \{1, 2, 3\}$ for the second unit of quality, and the optimal price for the first unit is $\rho = 1$. The red dot is a market in which the seller is indifferent between the prices $\rho \in \{1, 2\}$ for the first unit and between the prices $\rho \in \{2, 3\}$ for the second unit. There are other extremal markets in which the support of the distribution is smaller. The extremal markets marked by gray dots include a point at which all buyers have value 3, and another at which the buyer never has value 2 but the seller is indifferent between the prices $\rho \in \{2, 3\}$ for the second unit.
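To make the three interior dots concrete, the sketch below builds candidate markets that match the verbal description above (the probability vectors are our own computation from the stated indifferences, not values read off the figure) and lists, for each unit, the set of optimal incremental prices under $c = (0, \tfrac{1}{3})$. Exact rational arithmetic makes indifference show up as exact ties; the helper name is hypothetical.

```python
from fractions import Fraction as F

VALUES = [1, 2, 3]
COSTS = [F(0), F(1, 3)]   # c = (0, 1/3), as in Figure B.2

def optimal_price_sets(x):
    """For each unit i, the set of prices v maximizing D(v) * (v - c_i)."""
    D = [sum(x[k:]) for k in range(len(x))]
    sets = []
    for c in COSTS:
        profits = [D[k] * (VALUES[k] - c) for k in range(len(VALUES))]
        best = max(profits)
        sets.append({VALUES[k] for k, p in enumerate(profits) if p == best})
    return sets

# Candidate markets reconstructed from the description of the dots.
white = [F(1, 2), F(1, 6), F(1, 3)]   # D = (1, 1/2, 1/3): revenue 1 at every price for unit 1
blue = [F(3, 5), F(3, 20), F(1, 4)]   # D = (1, 2/5, 1/4): profit 2/3 at every price for unit 2
red = [F(1, 2), F(3, 16), F(5, 16)]   # D = (1, 1/2, 5/16)
for name, x in [("white", white), ("blue", blue), ("red", red)]:
    print(name, optimal_price_sets(x))
```

Each of these markets is pinned down by its indifference conditions together with $D(1) = 1$, which is why all three are vertices of their price regions.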
Looking at Figure B.2, it is clear that the intersections of the lines are the only points which are not a convex combination of two other points in the same price region. The proof makes this observation rigorous by explicitly checking that every point apart from these intersections can be written as a convex combination of two other points within the same price region. For points where some value $v_k \in \operatorname{supp}(x)$ is not an optimal price for any unit, we can easily perturb $x$ in opposite directions without changing the optimal price vector. This leaves us with just those markets $x$ which are along the boundary. For these markets, we verify that there always exist two “neighboring” extremal markets such that $x$ lies in the convex hull of the two.

B.3 Unconstrained Consumer Maximization


The unconstrained consumer maximization problem asks which distribution $x$ leads to the highest consumer surplus $U(x)$. Formally, it solves

$$
\max_{x \in \Delta V} \Big[ U(x, \rho^x) \ \text{ such that } \ \rho^x \in \arg\max_{\rho \in V^Q} \Pi(x, \rho) \Big]. \tag{B.2}
$$

We call this problem unconstrained because it is essentially the same as (6) without the majorization constraint.
It is easy to see that increasing the distribution of values will increase profit. In fact, $0$ and $v_K$ are the profits when the distribution of values is degenerate at $0$ or $v_K$, respectively, and everything in between is achievable with an intermediate distribution. When we examine how the distribution of values impacts consumer surplus, however, the effect is more subtle. In particular, if the distribution is degenerate, the seller extracts the full surplus. Hence, the problem is not trivial.
The solution to (B.2) with a continuum of values when $Q = 1$ and $c = 0$ is given by Condorelli and Szentes (2020). The optimal distribution is a Pareto distribution with shape 1 and scale parameter $1/e$, truncated on the interval $[1/e, 1]$. With $Q = 1$, the assumption that $c = 0$ is without loss, but for our purposes it is convenient to “undo” this normalization.

Proposition B.4 (Consumer-Optimal Distribution—Single Good)


The solution to (B.2) when $Q = 1$ is

$$
D^x(v) = \begin{cases} 1 & \text{if } v < \rho, \\ \dfrac{\rho - c}{v - c} & \text{if } v \ge \rho, \end{cases} \tag{B.3}
$$

for some $\rho > c$.

The distribution is constructed so as to keep profit constant for every price in $[\rho, v_K]$, and (given this indifference) the seller sells the product at the lowest optimal price $\rho$. The proof of this result then shows that, given this complete indifference, any other distribution which induces price $\rho$ is first-order stochastically dominated by (B.3), and hence gives lower consumer surplus.
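Under (B.3) with values normalized to $[0, 1]$, the consumer surplus at price $\rho$ is $\int_\rho^1 D^x(v)\, dv = (\rho - c) \log\frac{1 - c}{\rho - c}$, which is maximized at $\rho = c + (1 - c)/e$; for $c = 0$ this recovers the $1/e$ cutoff of Condorelli and Szentes (2020) quoted above. That closed form is our own calculation rather than a statement in the text, and the short grid search below is a sketch that checks it; $v_K = 1$ and the cost values are assumptions of the example.

```python
import math

def consumer_surplus(rho, c, v_max=1.0):
    """CS at price rho under demand (B.3): the integral of D(v) over [rho, v_max],
    which equals (rho - c) * log((v_max - c) / (rho - c))."""
    return (rho - c) * math.log((v_max - c) / (rho - c))

def best_rho(c, n=100_000, v_max=1.0):
    """Grid search over rho in (c, v_max]."""
    grid = [c + (v_max - c) * k / n for k in range(1, n + 1)]
    return max(grid, key=lambda rho: consumer_surplus(rho, c, v_max))

results = {c: best_rho(c) for c in [0.0, 0.2, 0.5]}
for c, rho in results.items():
    print(c, rho, c + (1.0 - c) / math.e)
```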

We know that the piecewise generalized Pareto distributions also maintain constant profits across the relevant regions, and by Proposition B.2 they also first-order stochastically dominate any other distribution with the same or lower prices. Thus, we can apply the same proof to the multi-unit case using these piecewise Pareto distributions.

Proposition B.5 (Consumer-Optimal Distribution)


If (x, ρ) is a solution to the consumer surplus maximization problem (B.2), then the optimal
demand function is given by Dρ for some ρ.

For a given cost vector $c$, the solution $(x, \rho)$ includes a particular price vector $\rho \in V^Q$ that maximizes the consumer surplus in the market $x$. Figure B.3 plots the optimal demand function for two products, $Q = 2$, and the cost vector $c = (0, 0.5)$. The resulting solution is compared with the corresponding single-product solutions for $c = 0$ and $c = 0.5$. In general, the optimal values of $\rho_i$ are non-elementary expressions of $c$.

Figure B.3: Consumer-Optimal demands.
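To see why the optimal $\rho_i$ generally lack an elementary closed form, consider the two-good example of Figure B.3, $c = (0, 0.5)$, with values normalized to $[0, 1]$. A minimal sketch, assuming the optimum is interior ($\mu = 0$ in the KKT conditions (FOC) of the proof of Proposition B.6): setting the right-hand side of (FOC) to zero gives $\log\frac{1 - c}{\rho_2 - c} = \frac{\rho_2}{2c}$, a transcendental equation in $\rho_2$ that we solve by bisection, and setting the left-hand side to zero then gives $\rho_1 = \rho_2\, e^{2L - 2}$ with $L = \log\frac{1 - c}{\rho_2 - c}$. This rearrangement is our own algebra, and `interior_prices` is a hypothetical helper name.

```python
import math


def interior_prices(c: float, tol: float = 1e-12) -> tuple[float, float]:
    """Solve (FOC) with mu = 0 for Q = 2, c1 = 0, c2 = c > 0, values on [0, 1]."""

    def f(r2: float) -> float:
        # Right-hand side of (FOC) = 0  <=>  log((1 - c) / (r2 - c)) = r2 / (2c)
        return math.log((1 - c) / (r2 - c)) - r2 / (2 * c)

    lo, hi = c + 1e-12, 1.0  # f(lo) > 0 > f(hi), and f is strictly decreasing
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    rho2 = 0.5 * (lo + hi)
    L = math.log((1 - c) / (rho2 - c))
    rho1 = rho2 * math.exp(2 * L - 2)  # left-hand side of (FOC) = 0
    return rho1, rho2


rho1, rho2 = interior_prices(0.5)
print(f"rho1 = {rho1:.4f}, rho2 = {rho2:.4f}")
```

For $c = 0.5$ this gives $\rho_1 \approx 0.44 < \rho_2 \approx 0.74$, so the solution respects $\rho_1 \le \rho_2$ and the interior assumption is self-consistent.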

We can provide a sharper characterization of the optimal solution than Proposition B.2 offers in the case of two products, $Q = 2$. When only two products are for sale, it is without loss of generality to normalize $c_1 = 0$, so that the model is parametrized only by $c_2$, which we refer to as $c$ in the remainder of this subsection. It is useful to consider the limit as the set of values becomes finer. For this purpose, we define

$$\Delta \triangleq \max_k \{v_{k+1} - v_k\}$$

and consider the limit $\Delta \to 0$ as the number of values becomes arbitrarily large, $K \to \infty$.

Proposition B.6 (Prices with Two Goods)
When $Q = 2$, there exists $\bar{c}$ such that in the consumer-optimal distribution of values, the seller bundles the products ($\rho_1 = \rho_2$) if and only if $c \le \bar{c}$. Furthermore,

$$\lim_{\Delta \to 0} \bar{c} = \frac{v_K}{1 + e}.$$

Proof. We first compute $U(x)$ for extremal markets $x = D_p$ in the limit where $V = [0, 1]$. The consumer surplus from selling unit $i$ to consumers with valuations $v \in [\rho_j, \rho_{j+1})$ is

$$\int_{\rho_j}^{\rho_{j+1}} A_j \frac{v - \rho_i}{(v - c_j)^2}\,dv = A_j\left[\log\frac{\rho_{j+1} - c_j}{\rho_j - c_j} - \frac{(\rho_i - c_j)(\rho_{j+1} - \rho_j)}{(\rho_{j+1} - c_j)(\rho_j - c_j)}\right].$$
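The closed form follows from writing the integrand as $A_j\left[\frac{1}{v - c_j} - \frac{\rho_i - c_j}{(v - c_j)^2}\right]$. As a sanity check, the sketch below compares both sides of the identity using composite Simpson's rule; the parameter values are arbitrary illustrations (assumed, with $c_j < \rho_i \le \rho_j < \rho_{j+1}$), and the function names are hypothetical.

```python
import math


def segment_cs_closed_form(A, rho_i, rho_j, rho_next, c_j):
    """Right-hand side of the integral identity (log term minus rational term)."""
    log_term = math.log((rho_next - c_j) / (rho_j - c_j))
    ratio = (rho_i - c_j) * (rho_next - rho_j) / ((rho_next - c_j) * (rho_j - c_j))
    return A * (log_term - ratio)


def segment_cs_simpson(A, rho_i, rho_j, rho_next, c_j, n=10_000):
    """Composite Simpson's rule (n must be even) for the left-hand side integral."""

    def g(v):
        return A * (v - rho_i) / (v - c_j) ** 2

    h = (rho_next - rho_j) / n
    total = g(rho_j) + g(rho_next)
    total += 4 * sum(g(rho_j + k * h) for k in range(1, n, 2))
    total += 2 * sum(g(rho_j + k * h) for k in range(2, n, 2))
    return total * h / 3


# Illustrative parameters (assumed, not from the paper).
params = dict(A=0.3, rho_i=0.4, rho_j=0.5, rho_next=0.8, c_j=0.2)
closed = segment_cs_closed_form(**params)
numeric = segment_cs_simpson(**params)
print(closed, numeric)
```

The two printed numbers should agree to many decimal places.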

The total consumer surplus is the above expression summed over all $i$ and $j \ge i$, plus the consumer surplus of the mass point at the top, $\frac{A_Q}{1 - c_Q}\sum_i (1 - \rho_i)$. Summing everything together, with the convention $\rho_{Q+1} = 1$, we get

$$U(x, p) = \sum_{i=1}^{Q}\left[i A_i \log\frac{\rho_{i+1} - c_i}{\rho_i - c_i} + \frac{A_Q}{1 - c_Q}(1 - \rho_i) - \sum_{j \ge i} A_j \frac{(\rho_i - c_j)(\rho_{j+1} - \rho_j)}{(\rho_{j+1} - c_j)(\rho_j - c_j)}\right].$$

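We claim that the non-logarithmic terms cancel out. To see this, fix $i$ and take the difference between the middle term and the $j = Q$ summand of the last term:

$$\frac{A_Q}{1 - c_Q}\left[1 - \rho_i - \frac{(\rho_i - c_Q)(1 - \rho_Q)}{\rho_Q - c_Q}\right] = \frac{A_Q}{\rho_Q - c_Q}(\rho_Q - \rho_i) = \frac{A_{Q-1}}{\rho_Q - c_{Q-1}}(\rho_Q - \rho_i),$$

where the last equality uses continuity of demand at $\rho_Q$.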

Repeat this recursively for the remaining $j$; at the last step ($j = i$) the difference is proportional to $\rho_i - \rho_i = 0$, giving

$$U(x, p) = \sum_{i=1}^{Q} i A_i \log\frac{\rho_{i+1} - c_i}{\rho_i - c_i}. \tag{B.4}$$
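The cancellation argument can be checked numerically. The sketch below assumes $V = [0, 1]$ with $\rho_{Q+1} = 1$, normalizes demand to 1 at $\rho_1$ (so $A_1 = \rho_1 - c_1$), and pins down the remaining $A_j$ by continuity of the demand $A_j/(v - c_j)$ at each $\rho_{j+1}$, the relation used in the first step of the recursion. It then evaluates the full expression for $U(x, p)$ before cancellation and compares it with (B.4). The three-good prices and costs are illustrative, not taken from the paper, and `check_cancellation` is a hypothetical name.

```python
import math


def check_cancellation(rho, c):
    """Return (full expression before cancellation, right-hand side of (B.4))."""
    Q = len(rho)
    r = list(rho) + [1.0]  # convention rho_{Q+1} = 1, the top of V = [0, 1]
    # Assumed normalization: demand equals 1 at rho_1, so A_1 = rho_1 - c_1;
    # continuity of A_j / (v - c_j) at each rho_{j+1} then pins down A_{j+1}.
    A = [r[0] - c[0]]
    for j in range(Q - 1):
        A.append(A[j] * (r[j + 1] - c[j + 1]) / (r[j + 1] - c[j]))

    full = 0.0
    for i in range(Q):
        # Unit i bought by the mass point at v = 1.
        full += A[-1] / (1 - c[-1]) * (1 - r[i])
        # Unit i bought by consumers in segments j >= i.
        for j in range(i, Q):
            log_term = math.log((r[j + 1] - c[j]) / (r[j] - c[j]))
            ratio = (r[i] - c[j]) * (r[j + 1] - r[j]) / ((r[j + 1] - c[j]) * (r[j] - c[j]))
            full += A[j] * (log_term - ratio)

    b4 = sum((i + 1) * A[i] * math.log((r[i + 1] - c[i]) / (r[i] - c[i])) for i in range(Q))
    return full, b4


full, b4 = check_cancellation(rho=[0.45, 0.55, 0.7], c=[0.0, 0.2, 0.4])
print(full, b4)
```

The two returned values should coincide up to floating-point rounding; choosing the $A_j$ without the continuity step generally breaks the equality, which is why the recursion relies on it.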

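(B.4) reduces finding the optimal $p$ to a constrained optimization problem, subject to $\rho_1 \le \rho_2$. The KKT conditions are

$$\mu = \log\frac{\rho_2}{\rho_1} + 2\left(1 - \frac{c}{\rho_2}\right)\log\frac{1 - c}{\rho_2 - c} - 1 = \frac{\rho_1}{\rho_2} - 2\frac{\rho_1 c}{\rho_2^2}\log\frac{1 - c}{\rho_2 - c}, \tag{FOC}$$

$$\mu(\rho_2 - \rho_1) = 0. \tag{CS}$$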

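We can check when $\rho_1 = \rho_2$ is a solution. Plugging in $\rho_1 = \rho_2 = \rho$ and equating the two expressions in (FOC) gives $\log\frac{1 - c}{\rho - c} = 1$, that is, $\rho = \frac{1}{e}(1 - c) + c$.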

Dual feasibility then requires that

$$\mu = \frac{2(1 - c)}{1 + (e - 1)c} - 1 \ge 0 \iff c \le \frac{1}{1 + e},$$

as desired.
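The bundling check can be reproduced numerically. The sketch below evaluates both expressions for $\mu$ in (FOC) at the candidate bundled price $\rho_1 = \rho_2 = \frac{1}{e}(1 - c) + c$, compares them with the closed form $\frac{2(1 - c)}{1 + (e - 1)c} - 1$, and checks that the sign of $\mu$ flips at $1/(1 + e) \approx 0.269$. It assumes $Q = 2$, $c_1 = 0$, $c_2 = c$, and $V = [0, 1]$ as in the proof; the helper names are hypothetical.

```python
import math


def foc_sides(rho1: float, rho2: float, c: float) -> tuple[float, float]:
    """Left and right expressions for mu in (FOC); Q = 2, c1 = 0, c2 = c, V = [0, 1]."""
    L = math.log((1 - c) / (rho2 - c))
    lhs = math.log(rho2 / rho1) + 2 * (1 - c / rho2) * L - 1
    rhs = rho1 / rho2 - 2 * rho1 * c / rho2**2 * L
    return lhs, rhs


def mu_bundled(c: float) -> float:
    """Closed-form multiplier at the bundled price, from the dual feasibility step."""
    return 2 * (1 - c) / (1 + (math.e - 1) * c) - 1


c_bar = 1 / (1 + math.e)
for c in (0.1, 0.2, 0.4, 0.6):
    rho = (1 - c) / math.e + c  # candidate bundled price
    lhs, rhs = foc_sides(rho, rho, c)
    print(f"c = {c:.1f}: lhs = {lhs:.6f}, rhs = {rhs:.6f}, "
          f"mu = {mu_bundled(c):.6f}, c <= c_bar: {c <= c_bar}")
```

Both sides of (FOC) agree with the closed form for every $c$, and $\mu$ is nonnegative exactly for the costs at or below $1/(1 + e)$.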

This result can be compared with Haghpanah and Hartline (2021), which discusses conditions under which bundling is optimal for the seller. Figure B.4 plots the prices the seller charges when facing the consumer-optimal distribution, as a function of $c$, in the limit where $V = [0, 1]$. When the units have similar costs, it is optimal to force the seller to bundle the units together. But as $c$ increases above $\bar{c} = 1/(1 + e)$, the price of unit 1 falls while the price of unit 2 rises.

Figure B.4: Prices under consumer-optimal demand when Q = 2.
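A brute-force version of the comparison in Figure B.4, as a minimal sketch: it maximizes (B.4) for $Q = 2$ over a grid with $0 < \rho_1 \le \rho_2 \le 1$ and $\rho_2 > c$. Here we assume $A_1 = \rho_1$ and $A_2 = \rho_1(\rho_2 - c)/\rho_2$, which follow from normalizing demand to 1 at $\rho_1$ and requiring continuity at $\rho_2$; with these, differentiating (B.4) reproduces the two expressions in (FOC). The grid size `n` and the helper names are hypothetical choices, and the grid resolution limits accuracy to about $1/n$.

```python
import math


def consumer_surplus(r1: float, r2: float, c: float) -> float:
    """(B.4) for Q = 2, c1 = 0, c2 = c, with A1 = r1 and A2 = r1 * (r2 - c) / r2 (assumed)."""
    return r1 * math.log(r2 / r1) + 2 * r1 * (1 - c / r2) * math.log((1 - c) / (r2 - c))


def optimal_prices(c: float, n: int = 500) -> tuple[float, float]:
    """Brute-force grid search over 0 < r1 <= r2 <= 1 with r2 > c."""
    grid = [k / n for k in range(1, n + 1)]
    best_u, best = -math.inf, (math.nan, math.nan)
    for idx, r2 in enumerate(grid):
        if r2 <= c:
            continue  # (B.4) requires rho_2 > c_2
        for r1 in grid[: idx + 1]:  # enforce r1 <= r2, diagonal included
            u = consumer_surplus(r1, r2, c)
            if u > best_u:
                best_u, best = u, (r1, r2)
    return best


c_bar = 1 / (1 + math.e)  # threshold from Proposition B.6 with v_K = 1
for c in (0.1, 0.2, 0.35, 0.5):
    r1, r2 = optimal_prices(c)
    print(f"c = {c:.2f} (c <= c_bar: {c <= c_bar}): rho1 = {r1:.3f}, rho2 = {r2:.3f}")
```

For $c = 0.1$ and $c = 0.2$ (below $1/(1 + e)$) the search should land on the diagonal $\rho_1 = \rho_2$, near $\frac{1}{e}(1 - c) + c$; for $c = 0.35$ and $c = 0.5$ it should return $\rho_1 < \rho_2$.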
