In this post I intend to explain what a Linear Program (LP) is, and how to solve an LP problem using Karmarkar's Algorithm implemented in Python. As always, if you see an improvement, feel free to submit a pull request.
A Linear Program is a problem of the form
Maximize:
\[z = \vec c^T \vec x\]
Subject to:
\[A \vec x \leq \vec b\]
Where the inequality is element-wise. The relation between \(A \vec x\) and \(\vec b\) could be any kind of equality constraint (\(\leq\), \(\geq\), or \(=\)), but "less-than-or-equal-to" is the typical relation we will use before we try to extend our basic understanding to larger problems.
Consider the following problem
Maximize:
\[ z = 3 x_1 + 2 x_2\]
Subject to:
\[\begin{aligned}2x_1 + x_2 &\leq 100\\x_1 + x_2 &\leq 80\\x_1 &\leq 40\\x_1, x_2 &\geq 0\end{aligned}\]
We wish to maximize the value of the objective function
\[z = 3x_1 + 2x_2\]
Subject to the constraints
\[\begin{aligned} 2x_1 + x_2 &\leq 100\\ x_1 + x_2 &\leq 80\\ x_1 &\leq 40\\ x_1, x_2 &\geq 0\\ \end{aligned}\]
The last constraint, stating that \(x_1, x_2 \geq 0\) is called the non-negativity constraint and is typically not stated for basic problems. We call the components \(x_1, x_2\), of the vector \(\vec x\), the LP decision variables, and the coefficients of the objective function \(3\) and \(2\) components of the cost vector \(\vec c\).
We can represent the constraints in one fell swoop using the matrix inequality
\[\begin{pmatrix}2 & 1\\1 & 1\\1 & 0\\\end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix} \leq \begin{pmatrix}100\\80\\40\end{pmatrix}\]
We call the points \((x_1, x_2)\) that satisfy all of the given constraints feasible points and the set of all feasible points the feasible set.
The feasible set for this example problem is illustrated below. The constraint equations are shown, as is the region that satisfies all of the constraints. Note that these problems don't have to be 2-dimensional, and in fact, most are not. The challenge of solving these problems comes from their size.
Now that we can see what points are possible, we need to find the point in the feasible set that maximizes our objective function. Currently, everything we've graphed relate to the LP constraints, and not the objective function.
Recall the concept of Level Curves from Calculus III. Level curves are curves of fixed \(z\) values. The following is a graph of several level curves of \(z\) superimposed on the graph of the feasible set.
See how we can immediately pick out the point that maximizes the value of the objective function? Apparently the optimal solution - the point that maximizes the value of the objective function is directly related to the slope of the objective function.
Consider for example, what if the level curves of the objective function had a steeper slope? Then the optimal solution might be different, and so might the optimal value of the objective function as pictured below.
In short, our method for solving these LP problems will be to pick a random point on the interior of the feasible set, and move in the direction that maximizes the change in the objective function. In our 2-dimensional example above this means moving in the direction perpendicular to the level curves illustrated below.
We can immediately see that we'll run into a problem however. If we naively run in the perpendicular direction, we won't always hit the point on the edge of the feasible set that maximizes our objective function. We're going to have to do something different.
True to form, linear algebra will come to our rescue.
First however, we must introduce the idea of LP problems in standard form. Problems in standard form look like
Maximize:
\[ z = \vec c^T \vec x\]
Subject to:
\[ A \vec x = \vec b\]
This is subtly different than the first form of the LP problem I introduced. This one requires that all of our constraints are that of equality. If not for an algebra trick, this would put a dampener on our plans of world domination solving LP problems efficiently.
We will solve our problems by introducing more variables to our problem. Our goal is to reduce problems of multiple forms into the single standard form of an LP. That way we can develop just a single strategy that will solve all of our problems.
Here's a list of different possibilities.
\[4x_1 + 5x_2 \leq 42\]
We introduce the slack variable \(s_1\) to turn the above constraint into
\[4x_1 + 5x_2 + s_1 = 42\]
We also append \(s_1\) to the end of \(\vec x\) and append a \(0\) to the end of the cost vector \(\vec c\).
\[4x_1 + 5x_2 \geq 42\]
We introduce the excess variable \(e_1\) to turn the above constraint into
\[4x_1 + 5x_2 - e_1 = 42\]
Note the coefficient of \(-1\). We also append \(e_1\) to the end of \(\vec x\) and a \(0\) to the end of \(\vec c\).
\[4x_1 + 5x_2 \leq -42\]
We multiply by negative one
\[-4x_1 - 5x_2 \geq 42\]
Then add an excess variable as dictated above
\[-4x_1 - 5x_2 - e_1 = 42\]
\[z = 2x_1 + 3x_2\]
This is equivalent to maximizing the function
\[z' = -z = -2x_1 - 3x_2\]
The following cases are for constraints on decision variables. Normally we just have \(\vec x \geq 0\) elementwise. This is analogous to restricting the feasible set to the first quadrant in 2-dimensions, and its equivalent in \(n\)-dimensions.
We replace \(x_2\) by \(x_2' = x_2 - 10\), or rather \(x_2 = x_2' + 10\).
\[z = 30x_1 - 4(x_2' + 10)\]
We now have the objective function
\[z = 30x_1 - 4x_2' + 40\]
which is not in standard form due to the additive constant. We move the \(40\) over to the left hand side and rename the quantity \(z - 40\) to \(z'\).
\[z' = z - 40 = 30x_1 - 4x_2\]
We define \(x_j = x_j' - x_j''\) where \(x_j', x_j'' \geq 0\) and replace all occurrences of \(x_j\) in our problem with \(x_j' - x_j''\) and simplify.
Suppose we have the problem
Minimize:
\[ z = 2x_1 + 3x_2\]
Subject to:
\[\begin{aligned}x_1 - x_2 &\geq -1\\x_1 + 3x_2 &\geq 20\\x_1 + x_2 &= 10\\x_1, x_2 &\geq 0\end{aligned}\]
that we wish to reduce to standard form. First we note this is a minimization problem, so we negate the objective function to get
\[-z = -2x_1 -3x_2\]
Then, taking one constraint at a time, we multiply the first by \(-1\) to get
\[-1x_1 + x_2 \leq 1\]
to which we add a slack variable \(s_1\)
\[-1x_1 + x_2 + s_1 = 1\]
Then we take the second constraint and add an excess variable \(e_1\) to get
\[x_1 + 3x_2 - e_1 = 20\]
We get the last constraint for free, as it's already in standard form.
Putting this all together we get the LP in standard form
Maximize:
\[ -z = -2x_1 - 3x_2 + 0s_1 + 0e_1\]
Subject to:
\[\begin{aligned}-x_1 + x_2 + s_1 + 0e_1&= 1\\x_1 + 3x_2 +0s_1 -e_1 &= 20\\x_1 + x_2 + 0s_1 + 0e_1 &= 10\\x_1, x_2 &\geq 0\end{aligned}\]
which we can represent concisely with
Maximize:
\[ z = \vec c^T\vec x\]
Subject to:
\[\begin{aligned}A\vec x &= \vec b\\\vec x &\geq 0\end{aligned}\]
where \(\vec c = \begin{pmatrix}-2\\-3\\0\\0\end{pmatrix}\), \(\vec x = \begin{pmatrix}x_1\\x_2\\s_1\\e_1\end{pmatrix}\), \(A = \begin{pmatrix}-1 & 1 & 1 & 0\\1 & 3 & 0 & 1\\1 & 1 & 0 & 0\end{pmatrix}\), and \(\vec b = \begin{pmatrix}1\\20\\10\end{pmatrix}\).
How do we describe the direction of greatest change in an \(n\)-dimensional setting. Recall the concept of the gradient vector \[ \vec \nabla f = \left\langle \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n} \right\rangle\] which is an \(n\)-dimensional vector of the partial derivatives of \(f\) that points in the direction of greatest change of the value of \(f\).
For the function \(f(\vec x) = \vec c^T \vec x\), the gradient vector \(\vec \nabla f = \vec c\).
A naive algorithm might therefore be: pick a point in the feasible set and travel in the direction of the gradient vector. However, it is possible (and even likely) that the gradient vector does not live inside the feasible set. Therefore, we work with the projection of \(\vec c\) onto the feasible set.
Suppose we have a vector \(\vec c = (3, 2, 4)\) that we wish to project onto the \(x_1, x_2\) plane, illustrated below.
This is equivalent to the linear algebra problem of projecting onto the solution set (null space) of the system \(A \vec x = \vec 0\). In this particular example, we wish to project from three dimensions onto two dimensions. This means we need the matrix \(A = \begin{pmatrix}0 & 0 & 1\end{pmatrix}\) so that in the above system \(x_3\) gets mapped to \(0\).
We ultimately wish to find a projection matrix \(P\) such that the projected vector \(\vec c_p = P \vec c\). We do this by the nightmarish formula \(P = I - A^T(A A^T)^{-1}A\).
Consider the problem
Maximize:
\[ z = x_1 + 2x_2\]
Subject to:
\[x_1 + x_2 \leq 8\]
Converting to standard form we get the problem
Maximize:
\[ z = x_1 + 2x_2 + 0s_1\]
Subject to:
\[x_1 + x_2 + s_1 = 8\]
However, for geometric intuition's sake, I'm going to call \(s_1\), \(x_3\) instead:
Maximize:
\[ z = x_1 + 2x_2 + 0x_3\]
Subject to:
\[x_1 + x_2 + x_3 = 8\]
Or, rather
Maximize:
\[ z = \begin{pmatrix}1 & 2 & 0\end{pmatrix}\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix}\]
Subject to:
\[\begin{pmatrix}1 & 1 & 1\end{pmatrix}\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} = 8\]
The feasible set and optimal solution are shown below.
Now we calculate \(\vec \nabla z\), which, as a consequence of their linearity, for every problem will be \(\vec c\). Notice how \(\vec c\) is not within the feasible set.
This means we need to calculate the projection of \(\vec c\) onto the feasible set \(A\) in order to move a point \(\vec x_\text{old}\) to its new home \(\vec x_\text{new}\) shown below.
This is an iterative algorithm. It will take in a starting point \(\vec x\) and output our next starting point \(\vec x_\text{new}\). After some unknown number of iterations, it will converge to the optimal solution. This algorithm will work for \(n\) dimensions, but for sake of notation I will only use 3. Note that all vectors are column vectors.
Define a scaling matrix \(D = \begin{pmatrix}x_1 & 0 & 0\\0 & x_2 & 0\\0 & 0 & x_3\end{pmatrix}\) and its inverse \(D^{-1} = \begin{pmatrix}\frac{1}{x_1} & 0 & 0\\0 & \frac{1}{x_2} & 0\\0 & 0 & \frac{1}{x_3}\end{pmatrix}\).
\[\begin{aligned}\tilde A &= AD\\ \tilde c &= D\vec c\end{aligned}\]
Compute the projection matrix \(P = I - \tilde A^T\left(\tilde A \tilde A^T\right)^{-1}\tilde A\).
Compute \(\tilde x_\text{new} = \begin{pmatrix}1\\1\\1\end{pmatrix} + k \tilde c_p\) where \(k > 0\) is chosen to take us halfway between where we are and the edge of the feasible set.
Convert \(\tilde x_\text{new}\) back into original (unscaled) coordinate system by \(\vec x_\text{new} = D \tilde x_\text{new}\)
Here is a graph of first few iterations of the example above.
Karmarkar's Algorithm is still not the standard LP solver algorithm, that title lies firmly in the hands of the Simplex Algorithm. There are some edge cases that give us very poor performance with Karmarkar's Algorithm. However, for very large problems, it offers good performance most of the time. See here for a description of various LP solver algorithms.
One of the benefits of Karmarkar's Algorithm, is that it is easy to implement. Here's a first crack at a Python implementation.
Note all vectors must be row vectors for the matrix multiplication dimensions to work out. This is different than the description above, but was necessary for simpler programming.
import numpy as np
class LPSolution(object):
def __init__(self):
self.iterations = None
self.tolerance = None
self.intermediates = []
self.solution = None
self.solution_string = None
def __str__(self):
self.solution_string = 'Solution: ' + str(self.solution)
self.solution_string += '\n\tTolerance: ' + str(self.tolerance)
self.solution_string += '\n\tIterations: ' + str(self.iterations)
return self.solution_string
class LinearProgram(object):
"""A class that implements Karmarkar's Algorithm for the solution of
Linear Programs in standard form."""
def __init__(self, A, b, c):
"""Constructs an n-variable m-constraint Linear Program.
A -- An n x m numpy matrix of constraint coefficients
b -- A 1 x m numpy row vector of constraint RHS values
c -- A 1 x n numpy row vector of objective function coefficients
"""
self.A = A
self.n, self.m = self.A.shape
self.b = b
self.c = c
self.solution = None
def karmarkar(self, start_point):
"""Runs one iteration of Karmarkar's Algorithm.
start_point -- A 1 x n numpy row vector of decision variable values
"""
D = np.diagflat(start_point)
c_tilde = np.matmul(self.c, D)
A_tilde = np.matmul(self.A, D)
A_tildeT = A_tilde.transpose()
AAT_inverse = np.linalg.inv(np.matmul(A_tilde, A_tildeT))
# matrix multiplication is associative
P = np.identity(self.m) - np.matmul(np.matmul(A_tildeT, AAT_inverse), A_tilde)
cp_tilde = np.matmul(c_tilde, P)
k = -0.5 / np.amin(cp_tilde)
x_tilde_new = np.ones((1, self.m), order='F') + k * cp_tilde
return np.matmul(x_tilde_new, D)
def solve(self, start_point, tolerance=1e-5, max_iterations=50, verbose=False):
"""Uses Karmarkar's Algorithm to solve a Linear Program.
start_point -- A starting point for Karmarkar's Algorithm. Must be a row vector.
tolerance -- The stopping tolerance of Karmarkar's Algorithm.
max_iterations -- The maximum number of iterations to run Karmarkar's Algorithm.
verbose -- List all intermediate values.
"""
x = start_point
solution = LPSolution()
for i in range(max_iterations):
x_new = self.karmarkar(x)
if verbose:
print(x_new)
dist = np.linalg.norm(x - x_new)
x = x_new
solution.intermediates.append(x)
if dist < tolerance:
break
solution.solution = x
solution.iterations = i
solution.tolerance = dist
self.solution = solution
return solution
We use the solver as follows
#!/usr/bin/python3
import numpy as np
from linear_program import LinearProgram, LPSolution
def generate_tikz_plot(solution):
"""Generates a 3D Tikz LaTeX coordinate strings for the intermediate solutions."""
for i, soln in enumerate(solution.intermediates):
coordinate = r'\coordinate (xnew{}) at '.format(i)
coordinate += r'({}, {}, {});'.format(*[coord for coord in soln.flat])
print(coordinate)
draw = r'\draw[red] (x) node[circle, fill, inner sep=1pt]{{}} '
for i in range(len(solution.intermediates)):
draw += r'-- (xnew{}) node[circle, fill, inner sep=1pt]{{}} '.format(i)
draw += r';'
print(draw)
def main():
A = np.matrix([[1, 1, 1], ])
b = np.array([8, ])
c = np.array([1, 2, 0])
LP = LinearProgram(A, b, c)
LP.solve(start_point=np.array([1, 1, 6]))
print(LP.solution)
if __name__ == '__main__':
main()
Running the above script, we get the following output
Solution: [[ 2.97874012e-06 7.99999553e+00 1.48937006e-06]]
Tolerance: 5.57390111055e-06
Iterations: 21
Which we can see in the following image.