Dynamic Programming Part I
1 Introduction
Dynamic Programming (DP) is a formulation and solution methodology that
helps solve complex decision-making problems. It achieves this by breaking
a decision problem into a sequence of smaller, more manageable decisions,
all tied together by the key concept known as the Principle of Optimality
(PoO).
A layman's explanation of the PoO states that, given an optimal decision
sequence, any sub-part of that sequence is also optimal. That is, the PoO
implies that if a solution is the best one overall, then any part of the
solution must also be the best for the corresponding part of the problem.
For example, consider finding the shortest route from point A to point
G via intermediate points B, C, and E. According to PoO, the route from
A to C must also be optimal. Otherwise, a shorter route from A to C could
improve the overall journey to G, which contradicts the assumption that the
original route was the shortest.
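The PoO can be checked mechanically on a toy shortest-route instance. The Python sketch below uses a small hypothetical graph (the edge weights are invented for illustration and are not taken from the text): it enumerates all simple routes from A to G, takes the cheapest, and verifies that its A-to-C prefix is itself a cheapest A-to-C route.

```python
# Illustrating the Principle of Optimality on a small directed graph.
# Edge weights are made up for this sketch.
edges = {
    ("A", "B"): 2, ("A", "C"): 7, ("B", "C"): 3,
    ("C", "E"): 4, ("C", "G"): 6, ("E", "G"): 1,
}

def all_paths(src, dst, path=None):
    """Enumerate all simple paths from src to dst."""
    path = (path or []) + [src]
    if src == dst:
        yield path
        return
    for (u, v) in edges:
        if u == src and v not in path:
            yield from all_paths(v, dst, path)

def cost(path):
    return sum(edges[(u, v)] for u, v in zip(path, path[1:]))

best_AG = min(all_paths("A", "G"), key=cost)   # cheapest route A -> G
best_AC = min(all_paths("A", "C"), key=cost)   # cheapest route A -> C

# The A->C prefix of the optimal A->G route is itself an optimal A->C route.
prefix_AC = best_AG[: best_AG.index("C") + 1]
assert prefix_AC == best_AC
```

If the prefix were not optimal, splicing in a cheaper A-to-C route would improve the whole journey, contradicting the optimality of the A-to-G route, which is exactly the argument made above.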
2 Dynamic Programming Formulation
The main components of a Dynamic Programming formulation are as follows:
1. Stages: These represent the sub-parts of the problem, often linked to
different points in time or steps in the process.
2. State: The state of the system or problem at any given stage. This is
the information available to the decision-maker at that point.
3. Decision: The action or control that the planner chooses at a particular
stage.
4. Cost-to-go function: A function representing the remaining cost
from a current state to the end of the problem.
5. Optimal value: The goal is to determine the best possible value of
the objective function by optimizing the cost-to-go function.
6. Recursive equation: The problem is typically solved using a recursive
formula, breaking the larger problem into smaller, more manageable
sub-problems. The recursion can be built to be forward or backward
according to how stages are explored.
7. Boundary conditions: These are the simplest sub-parts where the
decision-making is trivial, often used as a starting point for the recursive
process.
2.1 Example I: The Stagecoach Problem
As a first example, we consider the Stagecoach Problem, which is closely
related to the Shortest Path Problem.
In the 1800s, a salesman has to travel from city 1 to city 10, and the roads
between these cities are dangerous. For each leg of the journey, the salesman
must purchase travel insurance for his merchandise.
The cost of insurance for traveling between cities i and j is denoted by
aij , as shown in the figure below.
The journey is taken using a stagecoach, which departs each city in the
morning and arrives at another city by evening. The salesman spends the
night at a traveler’s inn before continuing the next leg of the journey the
following day.
The objective is to find the least expensive route (in terms of insurance
costs) that the salesman can take from city 1 to city 10.
The main elements of the DP formulation for this problem are:
1. Stages: Each day of the journey can be viewed as a stage, so the
problem has four stages corresponding to four days of travel. Formally,
t = 1, . . . , 4.
2. State: The state at any given stage is the city from which the salesman
starts the day’s journey, denoted by i where i belongs to the set of cities
Ct at stage t.
Figure 1: The Stagecoach Problem: A map illustrating the cities and the
insurance costs between them.
3. Decision: The decision is to choose the next city, j, that the salesman
will travel to, where j is in the set of cities Ct+1 at the next stage.
4. Cost-to-go function: Let ft (i) represent the minimum insurance cost
for a route that starts at city i on day t and follows an optimal path
onward to the destination.
5. Optimal cost: The objective is to find f1 (1), the minimum insurance
cost starting at city 1.
6. Recursive equation: The recursive equation for stage t is:
ft(i) = min_{j ∈ Ct+1} {aij + ft+1(j)},   ∀i ∈ Ct
7. Boundary conditions: If we are at Stage 4, i.e., at the dawn of day
4, the salesman is either in city 8 or in city 9. The cost of traveling
from these cities to city 10 is already known: f4(i) = ai,10 for i = 8, 9.
We shall solve the problem backwards.
At Stage 3, the salesman could be in city 5, 6, or 7. The minimum
cost-to-go from these cities is calculated as:
Focusing on the first equality, f3(5) is the minimum cost for reaching city
10 starting from city 5 on the third day of travel. The first term is the cost
when the route passes through city 8, paying the insurance cost a58 = 1. The
second term describes the cost if the route goes through city 9.
Now, stage 2:
Finally,
Note that this problem could also have been solved using Dijkstra's Algorithm,
as it is a shortest path problem.
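As a sketch of the backward recursion, the following Python snippet solves a stagecoach instance with the stage structure described above (C1 = {1}, ..., C5 = {10}). Only a58 = 1 is given in the text; every other insurance cost below is an illustrative placeholder, so the resulting value refers to this made-up data, not to Figure 1.

```python
# Backward recursion for a stagecoach instance. Only a[5, 8] = 1 comes from
# the text; all other costs are invented placeholders for this sketch.
INF = float("inf")
stages = [[1], [2, 3, 4], [5, 6, 7], [8, 9], [10]]   # C1 .. C5
a = {  # a[i, j] = insurance cost for the leg from city i to city j
    (1, 2): 4, (1, 3): 3, (1, 4): 5,
    (2, 5): 6, (2, 6): 4, (3, 5): 3, (3, 6): 2, (3, 7): 4,
    (4, 6): 5, (4, 7): 3,
    (5, 8): 1, (5, 9): 4, (6, 8): 3, (6, 9): 2, (7, 8): 4, (7, 9): 3,
    (8, 10): 5, (9, 10): 2,
}

f = {10: 0}        # boundary: zero cost-to-go at the destination
policy = {}        # best next city from each city
for t in range(len(stages) - 2, -1, -1):        # stages, backwards
    for i in stages[t]:
        best = min(((a.get((i, j), INF) + f[j], j) for j in stages[t + 1]),
                   key=lambda pair: pair[0])
        f[i], policy[i] = best
print("minimum insurance cost f1(1) =", f[1])
```

Missing arcs are given infinite cost so the recursion simply skips them; the `policy` dictionary allows the cheapest route to be traced forward from city 1.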
2.2 Example II: The Integer Knapsack Problem
Now we will see how to use dynamic programming to solve the Integer Knapsack
Problem. Recall that we have encountered both the binary version
and the integer version of this problem (the Cutting Stock Problem). Here,
we revisit the integer version.
In the integer knapsack problem, the variables xj for each item j =
1, . . . , n are non-negative integers that represent the quantity of each item.
Of each item, we can take as many copies as the knapsack's capacity allows.
The problem can be formulated as follows:
max Σ_{j=1}^{n} pj xj

Subject to:

Σ_{j=1}^{n} aj xj ≤ b
where pj is the profit from item j, aj is the space or weight item j occupies,
and b is the total capacity (maximum weight or space) of the knapsack.
We now proceed to the Dynamic Programming formulation:
1. Stages: Each item j represents a stage in the decision-making process.
2. State: The state at any given stage is the remaining space (or capacity)
still available in the knapsack, denoted by y at stage j.
3. Decision: The decision at each stage is how many units of item j to
add to the knapsack, represented by xj .
4. Cost-to-go function: Let fj(y) represent the maximum profit attainable
starting from stage j with a remaining capacity of y, while
following an optimal policy in stages j, j + 1, . . . .
5. Optimal value sought: The goal is to find f1 (b), which gives the
maximum possible profit starting at the first item with a knapsack of
capacity of b.
6. Recursive equation: The recursive equation for this problem is:
fj(y) = max_{0 ≤ xj ≤ ⌊y/aj⌋, xj ∈ Z+} {pj xj + fj+1(y − aj xj)}
This equation holds for all stages j and for all remaining capacities
y = 0, 1, . . . , b. The interpretation is: if a number xj of copies of item j
is inserted in the knapsack, a profit pj xj is earned, plus the maximum profit
achievable from items j + 1, . . . , n in the remaining capacity y − aj xj.
7. Boundary Condition: If we reach the last item, the decision becomes
straightforward because no further decisions are needed. The cost-to-go
function at this point is:
fn(y) = pn ⌊y/an⌋   ∀y = 0, 1, . . . , b
This tells us how many copies of the last item we can take, given the
remaining capacity.
The recursion works in a backward fashion.
Consider the following integer knapsack problem:
max 11x1 + 7x2 + 5x3
Subject to:
6x1 + 4x2 + 3x3 ≤ 15
where x1, x2, x3 are non-negative integers.
We will now solve the problem starting from the last stage (item 3). At
stage 3, the boundary condition applies, and we can directly calculate the
maximum profit based on the remaining capacity. For capacities of 3 or more,
we begin adding item 3:
Continuing for higher capacities:
At stage 2, we calculate the maximum profit by adding item 2.
The two arguments in the function refer, respectively, to the choice of not
inserting item 2, thus using the 4 units of capacity to pick item 3, and to the
choice of inserting item 2, gaining a profit of 7.
Finally for Stage 1, it is sufficient to consider only y = 15 as our knapsack
instance has a capacity of 15.
The optimal policy is then traced back:
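The whole computation can be reproduced with a short Python sketch of the backward recursion for this instance (profits 11, 7, 5; weights 6, 4, 3; capacity 15). For convenience, an auxiliary stage with f4(y) = 0 stands in for the explicit boundary condition at the last item.

```python
# Backward recursion for the integer knapsack instance above:
# max 11x1 + 7x2 + 5x3  s.t.  6x1 + 4x2 + 3x3 <= 15, xj nonnegative integers.
p = [11, 7, 5]   # profits
a = [6, 4, 3]    # weights
b = 15           # capacity
n = len(p)

# f[j][y] = max profit using items j, ..., n-1 with remaining capacity y;
# the extra row f[n][y] = 0 plays the role of the boundary condition.
f = [[0] * (b + 1) for _ in range(n + 1)]
best_x = [[0] * (b + 1) for _ in range(n)]
for j in range(n - 1, -1, -1):                 # stages, backwards
    for y in range(b + 1):
        for xj in range(y // a[j] + 1):        # 0 <= xj <= floor(y / a[j])
            value = p[j] * xj + f[j + 1][y - a[j] * xj]
            if value > f[j][y]:
                f[j][y], best_x[j][y] = value, xj

# Trace the optimal policy forward, starting from f1(b).
y, plan = b, []
for j in range(n):
    plan.append(best_x[j][y])
    y -= a[j] * plan[-1]
print("optimal profit:", f[0][b], " plan (x1, x2, x3):", plan)
```

The `best_x` table records, for each state, the maximizing decision, which is exactly what the trace-back step of the method uses.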
2.3 Example III: Non-linear (Discrete) Resource Allocation
In this section, we explore a non-linear version of a knapsack-type problem:
the Resource Allocation problem, which can be solved using Dynamic
Programming.
A corporation has $5 million to allocate among its three plants for possible
expansion. Each plant has submitted a series of proposals detailing the cost
of expansion and the expected revenue generated by the expansion. Each
plant will only be permitted to enact one of its proposals.
Proposal c1 r1 c2 r2 c3 r3
1 0 0 0 0 0 0
2 1 5 2 8 1 4
3 2 6 3 9 - -
4 - - 4 12 - -
The columns represent the different plants and their corresponding proposals.
For each proposal, c indicates the cost of expansion (in millions of
dollars), and r represents the expected revenue (in millions of dollars). The
goal is to maximize the total revenue using the $5 million available. We will
assume that any of the $5 million we don’t spend is lost.
1. Stages: Each plant j = 1, 2, 3 represents a stage in the decision-making
process.
2. State: The state at any stage j is the remaining budget, denoted yj ,
in millions.
3. Decision: The decision at each stage is which proposal xj to choose
for plant j. The chosen proposal will have a corresponding cost cj and
revenue rj .
4. Cost-to-go function: Let fj(y) denote the maximum revenue attainable
starting from stage j (plant j) with budget y remaining and following
an optimal policy for plants j, j + 1, . . ..
5. Optimal value sought: The goal is to find the optimal value at the
first stage, f1 (5).
6. Recursive Equation: The recursive equation for this DP problem is:
fj(y) = max_{xj : cj(xj) ≤ y} {rj(xj) + fj+1(y − cj(xj))}
This equation tells us that at each stage j, given a remaining budget
y, we must choose the proposal xj that maximizes the sum of the
revenue from that proposal and the maximum revenue from the
remaining stages (plants) with budget y − cj(xj). Note the similarity to
the knapsack dynamic programming formulation.
7. Boundary Condition: For the final stage (plant j = 3), the decision
is straightforward since there are no further stages:

f3(y) = max_{x3 : c3(x3) ≤ y} {r3(x3)}   for y = 0, 1, . . . , 5
We will now solve the problem backwards, starting from the last stage
and working towards the first stage.
For plant 3, we directly compute the maximum revenue for each budget
level:
Plant 3 can only choose from proposal 1 (which yields no revenue) or
proposal 2 (which costs 1 million and generates 4 million in revenue).
Next, we compute the maximum revenue for plant 2:
For larger budgets, we compute:
Consider f2(5). The first argument assumes that proposal 1 is selected
for plant 2, that is, the "no investment" option for plant 2, while proposal 2
for plant 3 is chosen by investing $1 million. The second argument
considers investing $2 million in plant 2 for its proposal 2, and $1 million
in plant 3 for its proposal 2. The third option is to invest $3 million in
proposal 3 presented by plant 2, while $1 million is invested in proposal 2
of plant 3. Finally, the last argument represents the choice of investing $4
million in plant 2's proposal 4 and the remaining amount in proposal 2 of
plant 3.
Finally, for plant 1, we calculate the maximum revenue:
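A compact Python sketch of the recursion, with the proposals from the table encoded as (cost, revenue) pairs, computes f1(5) directly.

```python
# Backward recursion for the plant-expansion example: $5M across three plants.
# Proposals are (cost, revenue) pairs, in millions, from the table above
# (the "-" entries are simply omitted).
proposals = [
    [(0, 0), (1, 5), (2, 6)],           # plant 1
    [(0, 0), (2, 8), (3, 9), (4, 12)],  # plant 2
    [(0, 0), (1, 4)],                   # plant 3
]
B = 5
n = len(proposals)

# f[j][y] = max revenue from plants j, ..., n-1 with budget y remaining;
# the extra row f[n][y] = 0 means no revenue once all plants are decided.
f = [[0] * (B + 1) for _ in range(n + 1)]
for j in range(n - 1, -1, -1):             # stages (plants), backwards
    for y in range(B + 1):
        f[j][y] = max(r + f[j + 1][y - c]
                      for (c, r) in proposals[j] if c <= y)
print("maximum revenue f1(5) =", f[0][B])
```

Because every plant includes a zero-cost "do nothing" proposal, the inner `max` is never taken over an empty set, so no special-casing is needed.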
2.4 Example IV: Equipment Replacement
Suppose a shop needs a specific machine continuously for the next five years.
Each new machine costs b = $1000. The maintenance costs for the machine
depend on its age: during the 1st year $60, during the 2nd year $80, during
the 3rd year $120. Let mi be the maintenance cost for a machine during its
i-th year of operation. A machine can be kept for a maximum of three years
before being traded in. The trade-in values depending on the machine’s age
are: after 1 year $800, after 2 years $600, after 3 years $500. We denote by
si the trade-in value of a machine after i years of operation.
Our goal is to minimize costs over this period by determining an optimal
replacement/maintenance strategy.
Recall that we can solve this problem using Dijkstra's shortest path
algorithm on a suitable graph: one node per year, and an arc (i, j) whenever
a new machine is purchased at year i (at cost b) and sold at year j with
salvage value sj−i; the arc cost is cij = b + Σ_{k=1}^{j−i} mk − sj−i.
The elements of the dynamic programming formulation are:
1. Stages: Each stage corresponds to the beginning of each year, t =
1, . . . , 5.
2. State: The state, xt , is the age of the machine at the beginning of year
t.
3. Decision: At each stage, the decision is whether to keep (K) the machine
or replace (R) it.
4. Cost-to-go function: Define ft(x) as the minimum cost starting in
year t with a machine that is x years old and following an optimal
policy through years t, t + 1, . . . , 5.
5. Optimal value sought: We aim to find f1 (0), as we start with a
brand-new machine.
6. Recursive equation: Consider each possible age for the machine at
time t = 1, 2, 3, 4.
• If the machine is three years old at time t, it must be traded in:
ft (3) = −s3 + b + m1 + ft+1 (1) = −500 + 1000 + 60 + ft+1 (1).
While s3 is related to the trade-in of the old machine, b and m1 are
the costs for buying the new machine and to maintain it during
its first year of operation, respectively. Then, ft+1 (1) comprises
the costs for the subsequent year having a 1 year old machine.
• If the machine is one or two years old at time t, we have two
options, replace (R) or keep (K):

ft(2) = min{ −s2 + b + m1 + ft+1(1) [R],  m3 + ft+1(3) [K] }
      = min{ −600 + 1000 + 60 + ft+1(1),  120 + ft+1(3) }

ft(1) = min{ −s1 + b + m1 + ft+1(1) [R],  m2 + ft+1(2) [K] }
      = min{ −800 + 1000 + 60 + ft+1(1),  80 + ft+1(2) }
7. Boundary conditions: At the end of the 5-year period (beginning of
year 6), we assume the machine is no longer needed:
f6 (x) = −sx , x = 1, 2, 3
We now compute the cost-to-go values starting from the end of the period
and moving backwards.
For stage 5 (beginning of year 5)
For stage 4
At stage 3
In stage 2 the machine is only one year old:
In year 1, we start with a brand-new machine:
There are multiple optimal policies that result in the minimum cost of
1280. Let BN denote the "buy new" option. The policies are:
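The backward computation can be condensed into a few lines of Python using the data above; it recovers the minimum total cost of 1280.

```python
# Backward recursion for the equipment-replacement example (data from the text).
b = 1000                        # price of a new machine
m = {1: 60, 2: 80, 3: 120}      # maintenance cost during the i-th year of use
s = {1: 800, 2: 600, 3: 500}    # trade-in value after i years

# Boundary: at the start of year 6 the machine is sold, f6(x) = -s_x.
f = {(6, x): -s[x] for x in (1, 2, 3)}
for t in range(5, 1, -1):                         # stages t = 5, 4, 3, 2
    f[t, 3] = -s[3] + b + m[1] + f[t + 1, 1]      # age 3: must replace
    for x in (1, 2):
        replace = -s[x] + b + m[1] + f[t + 1, 1]  # trade in, buy, maintain
        keep = m[x + 1] + f[t + 1, x + 1]         # keep it one more year
        f[t, x] = min(replace, keep)

# Year 1: we start with no machine, so we buy and maintain the first one.
f1 = b + m[1] + f[2, 1]
print("minimum total cost f1(0) =", f1)
```

Negative intermediate values are expected: they occur when the remaining trade-in revenue exceeds the remaining purchase and maintenance costs.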
2.5 Example V: Non-additive recursion
Here is an example where we multiply terms to get the recursion.
A student is currently taking three courses. It is important that he not
fail all of them. If the probability of failing French is p1 , the probability of
failing English is p2 , and the probability of failing Statistics is p3 , then the
probability of failing all of them is p1 × p2 × p3 . He has left himself with
four hours to study. How should he minimize his probability of failing all his
courses?
Denote the entries in the table below as pt(k), the probability of failing
course t given that k hours are spent on it.
Table 1: Student failure probabilities.
Hours French English Statistics
0 0.8 0.75 0.9
1 0.7 0.7 0.7
2 0.65 0.67 0.6
3 0.62 0.65 0.55
4 0.6 0.62 0.5
1. Stages: Each stage corresponds to a course, t = 1, 2, 3.
2. State: The number of hours x still available for studying for the course
of the current stage and all subsequent stages.
3. Decision: At each stage, decide how many hours k to spend studying
the corresponding subject.
4. Cost-to-go function: Define ft(x) as the probability of failing course t
and all following courses, assuming x hours are available and an
optimal policy is employed for all the following courses.
5. Optimal value sought: We aim to find f1(4), as we have 4 hours
available for studying in total.
6. Recursive equation: The recursion is given as follows for each stage
t:
ft(x) = min_{0 ≤ k ≤ x} {pt(k) · ft+1(x − k)}.
Given x remaining hours of study, the function computes the minimum
failure probability depending on the number of hours k invested in
studying for course t.
7. Boundary conditions: For the final course, we invest all the remaining
time:

f3(x) = p3(x)   ∀x ≤ 4
The solution is computed by backward recursion.
Starting the dynamic programming, boundary conditions are
At stage 2, the recursion becomes
By iterating, one obtains the following:
Table 2: f2 (x) results.
Finally, in stage 1 it is sufficient to compute the value for x = 4, that is
the full study time available:
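This computation can be carried out with a short Python sketch of the backward recursion, using the probabilities in Table 1.

```python
# Backward recursion for the study-time example, with the failure
# probabilities of Table 1 indexed by hours k = 0..4.
p = {
    1: [0.80, 0.70, 0.65, 0.62, 0.60],  # French
    2: [0.75, 0.70, 0.67, 0.65, 0.62],  # English
    3: [0.90, 0.70, 0.60, 0.55, 0.50],  # Statistics
}
H = 4  # total hours available

# Boundary: all remaining time goes to the last course, f3(x) = p3(x).
f = {(3, x): p[3][x] for x in range(H + 1)}
for t in (2, 1):                        # backwards over the courses
    for x in range(H + 1):
        f[t, x] = min(p[t][k] * f[t + 1, x - k] for k in range(x + 1))
print("minimum probability of failing all courses:", f[1, H])
```

Note that the recursion multiplies rather than adds the stage terms, which is exactly what makes this a non-additive example; the structure of the code is otherwise identical to the previous ones.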
2.6 Example VI: The Traveling Salesperson Problem
Recall that the Traveling Salesperson Problem (TSP) asks for a tour visiting
all n cities at minimum total distance. For instance, consider a politician
who begins his tour in New York and has to visit Miami, Dallas, and Chicago
before returning to New York.
Table 3: Distances dij among cities.
New York Miami Dallas Chicago
1. New York - 1334 1559 809
2. Miami 1334 - 1343 1397
3. Dallas 1559 1343 - 921
4. Chicago 809 1397 921 -
1. Stages: Each stage t represents having visited t cities.
2. State: The city i in which the politician currently is, together with the
set S of t cities previously visited, i.e., the pair (i, S). This information
is required to decide where to go next and to compose a feasible tour
(no city, except the starting one, is visited twice).
3. Decision: The next city j to visit along the tour.
4. Cost-to-go function: Define ft(i, S) as the minimum total distance
of a route that starts from city i after visiting the t cities in S and
follows an optimal route back to the destination.
5. Optimal value sought: We aim to find f0(1, ∅), the tour value
starting from New York with no city visited yet.
6. Recursive equation: The recursion is given as follows for each stage
t:
ft(i, S) = min_{j ∉ S, j ≠ i} {dij + ft+1(j, S ∪ {j})}.
Given that the politician is in city i and has already visited the cities
in S, the recursion evaluates all possible next cities j and picks the one
leading to the minimum total distance over stages t, t + 1, . . . .
7. Boundary conditions: For the final stage, the conditions are set for
every possible last city i in which the politician can be located:
fn−1 (i, S = {2, . . . , n}) = di1 ∀i = 2, . . . , n
We proceed backward.
First, we set boundary conditions
At stage 2, let S = {2, 3} :
Let S = {2, 4}
Let S = {3, 4}
At stage 1, given S = {2}
Let S = {3}
When S = {4}
Finally, the optimal value is
Although a dynamic programming approach can be devised to solve the TSP,
the state space here is extremely large due to the dependence on the set S.
Indeed, there exist |S| states for each possible S ⊆ {2, . . . , n}, and there
are 2^{n−1} such subsets. To get an intuition, consider a TSP instance with
20 cities: the number of states in the 10th stage alone is roughly a million.
Therefore, the computational burden makes this method impractical for large
instances, no matter the machine available nowadays (or in the future).
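For completeness, the recursion above (known in the literature as the Held-Karp algorithm) can be sketched in Python for the four-city instance of Table 3, with New York indexed as city 0.

```python
# Held-Karp recursion for the four-city tour (distances from Table 3).
from itertools import combinations

d = [[0,    1334, 1559, 809],
     [1334, 0,    1343, 1397],
     [1559, 1343, 0,    921],
     [809,  1397, 921,  0]]
n = 4

# f[(i, S)] = shortest completion of the tour from city i, where S is the
# frozenset of already-visited cities (excluding the start, city 0).
f = {}
cities = range(1, n)                 # city 0 = New York is the start
full = frozenset(cities)
for i in cities:                     # boundary: all cities visited, go home
    f[i, full] = d[i][0]
for size in range(n - 2, 0, -1):     # |S| = n-2, ..., 1, backwards
    for S in map(frozenset, combinations(cities, size)):
        for i in S:
            f[i, S] = min(d[i][j] + f[j, S | {j}]
                          for j in cities if j not in S)
optimal = min(d[0][j] + f[j, frozenset({j})] for j in cities)
print("optimal tour length:", optimal)
```

The triple loop makes the state-space blow-up discussed above concrete: the middle loop runs over all subsets of the intermediate cities, so the running time grows exponentially with n even though each single state is cheap to evaluate.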