Karmarkar's Algorithm — A geometric understanding of linear programming

In this post I intend to explain what a Linear Program (LP) is, and how to solve an LP problem using Karmarkar's Algorithm implemented in Python. As always, if you see an improvement, feel free to submit a pull request.

Introduction

A Linear Program is a problem of the form

Linear Program

Maximize:

\[z = \vec c^T \vec x\]

Subject to:

\[A \vec x \leq \vec b\]

Where the inequality is element-wise. The relation between \(A \vec x\) and \(\vec b\) could be any kind of equality constraint (\(\leq\), \(\geq\), or \(=\)), but "less-than-or-equal-to" is the typical relation we will use before we try to extend our basic understanding to larger problems.

Example

Consider the following problem

Linear Program

Maximize:

\[ z = 3 x_1 + 2 x_2\]

Subject to:

\[\begin{aligned}2x_1 + x_2 &\leq 100\\x_1 + x_2 &\leq 80\\x_1 &\leq 40\\x_1, x_2 &\geq 0\end{aligned}\]

We wish to maximize the value of the objective function

\[z = 3x_1 + 2x_2\]

Subject to the constraints

\[\begin{aligned} 2x_1 + x_2 &\leq 100\\ x_1 + x_2 &\leq 80\\ x_1 &\leq 40\\ x_1, x_2 &\geq 0\\ \end{aligned}\]

The last constraint, stating that \(x_1, x_2 \geq 0\) is called the non-negativity constraint and is typically not stated for basic problems. We call the components \(x_1, x_2\), of the vector \(\vec x\), the LP decision variables, and the coefficients of the objective function \(3\) and \(2\) components of the cost vector \(\vec c\).

We can represent the constraints in one fell swoop using the matrix inequality

\[\begin{pmatrix}2 & 1\\1 & 1\\1 & 0\\\end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix} \leq \begin{pmatrix}100\\80\\40\end{pmatrix}\]

We call the points \((x_1, x_2)\) that satisfy all of the given constraints feasible points and the set of all feasible points the feasible set.

The feasible set for this example problem is illustrated below. The constraint equations are shown, as is the region that satisfies all of the constraints. Note that these problems don't have to be 2-dimensional, and in fact, most are not. The challenge of solving these problems comes from their size.

A Basic Linear Program
The linear constraints forming the feasible region of a basic linear program

Now that we can see what points are possible, we need to find the point in the feasible set that maximizes our objective function. Currently, everything we've graphed relate to the LP constraints, and not the objective function.

Recall the concept of Level Curves from Calculus III. Level curves are curves of fixed \(z\) values. The following is a graph of several level curves of \(z\) superimposed on the graph of the feasible set.

A Basic Linear Program
Selected level curves for fixed \(z\) values

See how we can immediately pick out the point that maximizes the value of the objective function? Apparently the optimal solution - the point that maximizes the value of the objective function is directly related to the slope of the objective function.

Consider for example, what if the level curves of the objective function had a steeper slope? Then the optimal solution might be different, and so might the optimal value of the objective function as pictured below.

A Basic Linear Program
Selected level curves for an objective function with a steeper slope constrained to the same feasible region

In short, our method for solving these LP problems will be to pick a random point on the interior of the feasible set, and move in the direction that maximizes the change in the objective function. In our 2-dimensional example above this means moving in the direction perpendicular to the level curves illustrated below.

A Basic Linear Program
Following the gradient

We can immediately see that we'll run into a problem however. If we naively run in the perpendicular direction, we won't always hit the point on the edge of the feasible set that maximizes our objective function. We're going to have to do something different.

True to form, linear algebra will come to our rescue.

Standard Form

First however, we must introduce the idea of LP problems in standard form. Problems in standard form look like

Linear Program

Maximize:

\[ z = \vec c^T \vec x\]

Subject to:

\[ A \vec x = \vec b\]

This is subtly different than the first form of the LP problem I introduced. This one requires that all of our constraints are that of equality. If not for an algebra trick, this would put a dampener on our plans of world domination solving LP problems efficiently.

We will solve our problems by introducing more variables to our problem. Our goal is to reduce problems of multiple forms into the single standard form of an LP. That way we can develop just a single strategy that will solve all of our problems.

Here's a list of different possibilities.

A \(\leq\) constraint.
  • Introduce a slack variable \(s_i\) that can take on any positive value in order to take up the slack required to turn a \(\leq\) constraint into an \(=\) constraint.
  • Suppose we have the constraint

    \[4x_1 + 5x_2 \leq 42\]

    We introduce the slack variable \(s_1\) to turn the above constraint into

    \[4x_1 + 5x_2 + s_1 = 42\]

    We also append \(s_1\) to the end of \(\vec x\) and append a \(0\) to the end of the cost vector \(\vec c\).

A \(\geq\) constraint.
  • Introduce an excess variable \(e_i\) that can take on any positive value in order to soak up the excess required to turn a \(\geq\) constraint into an \(=\) constraint.
  • Suppose we have the constraint

    \[4x_1 + 5x_2 \geq 42\]

    We introduce the excess variable \(e_1\) to turn the above constraint into

    \[4x_1 + 5x_2 - e_1 = 42\]

    Note the coefficient of \(-1\). We also append \(e_1\) to the end of \(\vec x\) and a \(0\) to the end of \(\vec c\).

A constraint with a negative right hand side.
  • Multiply both sides by \(-1\) and flip inequality to turn into one of the above cases.
  • Suppose we have the constraint

    \[4x_1 + 5x_2 \leq -42\]

    We multiply by negative one

    \[-4x_1 - 5x_2 \geq 42\]

    Then add an excess variable as dictated above

    \[-4x_1 - 5x_2 - e_1 = 42\]

The problem states to minimize the objective function.
  • We maximize the negative of the objective function.
  • Suppose we wish to minimize the function

    \[z = 2x_1 + 3x_2\]

    This is equivalent to maximizing the function

    \[z' = -z = -2x_1 - 3x_2\]

The following cases are for constraints on decision variables. Normally we just have \(\vec x \geq 0\) elementwise. This is analogous to restricting the feasible set to the first quadrant in 2-dimensions, and its equivalent in \(n\)-dimensions.

One of the decision variables is constrained to be greater than or equal to some constant \(c\).
  • Suppose that, rather than all variables being greater than or equal to zero, we have some decision variable \(x_j\) with the constraint \(x_j \geq c\). Note that \(x_j - c \geq 0\). So we replace all instances of \(x_j - c\) with \(x_j'\).
  • Consider the objective function \(z = 30x_1 - 4x_2\) with the constraint that \(x_2 \geq 10\).

    We replace \(x_2\) by \(x_2' = x_2 - 10\), or rather \(x_2 = x_2' + 10\).

    \[z = 30x_1 - 4(x_2' + 10)\]

    We now have the objective function

    \[z = 30x_1 - 4x_2' + 40\]

    which is not in standard form due to the additive constant. We move the \(40\) over to the left hand side and rename the quantity \(z - 40\) to \(z'\).

    \[z' = z - 40 = 30x_1 - 4x_2\]

  • We then proceed as normal, maximizing or minimizing \(z'\), from which we can easily recover the value of \(z\).
One of the decision variables is unrestricted in sign, denoted urs.
  • Suppose that the decision variable \(x_j\) has no sign restriction -- it can take on any value.

We define \(x_j = x_j' - x_j''\) where \(x_j', x_j'' \geq 0\) and replace all occurrences of \(x_j\) in our problem with \(x_j' - x_j''\) and simplify.

Example

Suppose we have the problem

Linear Program

Minimize:

\[ z = 2x_1 + 3x_2\]

Subject to:

\[\begin{aligned}x_1 - x_2 &\geq -1\\x_1 + 3x_2 &\geq 20\\x_1 + x_2 &= 10\\x_1, x_2 &\geq 0\end{aligned}\]

that we wish to reduce to standard form. First we note this is a minimization problem, so we negate the objective function to get

\[-z = -2x_1 -3x_2\]

Then, taking one constraint at a time, we multiply the first by \(-1\) to get

\[-1x_1 + x_2 \leq 1\]

to which we add a slack variable \(s_1\)

\[-1x_1 + x_2 + s_1 = 1\]

Then we take the second constraint and add an excess variable \(e_1\) to get

\[x_1 + 3x_2 - e_1 = 20\]

We get the last constraint for free, as it's already in standard form.

Putting this all together we get the LP in standard form

Linear Program

Maximize:

\[ -z = -2x_1 - 3x_2 + 0s_1 + 0e_1\]

Subject to:

\[\begin{aligned}-x_1 + x_2 + s_1 + 0e_1&= 1\\x_1 + 3x_2 +0s_1 -e_1 &= 20\\x_1 + x_2 + 0s_1 + 0e_1 &= 10\\x_1, x_2 &\geq 0\end{aligned}\]

which we can represent concisely with

Linear Program

Maximize:

\[ z = \vec c^T\vec x\]

Subject to:

\[\begin{aligned}A\vec x &= \vec b\\\vec x &\geq 0\end{aligned}\]

where \(\vec c = \begin{pmatrix}-2\\-3\\0\\0\end{pmatrix}\), \(\vec x = \begin{pmatrix}x_1\\x_2\\s_1\\e_1\end{pmatrix}\), \(A = \begin{pmatrix}-1 & 1 & 1 & 0\\1 & 3 & 0 & 1\\1 & 1 & 0 & 0\end{pmatrix}\), and \(\vec b = \begin{pmatrix}1\\20\\10\end{pmatrix}\).

Karmarkar's Algorithm

How do we describe the direction of greatest change in an \(n\)-dimensional setting. Recall the concept of the gradient vector \[ \vec \nabla f = \left\langle \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n} \right\rangle\] which is an \(n\)-dimensional vector of the partial derivatives of \(f\) that points in the direction of greatest change of the value of \(f\).

For the function \(f(\vec x) = \vec c^T \vec x\), the gradient vector \(\vec \nabla f = \vec c\).

A naive algorithm might therefore be: pick a point in the feasible set and travel in the direction of the gradient vector. However, it is possible (and even likely) that the gradient vector does not live inside the feasible set. Therefore, we work with the projection of \(\vec c\) onto the feasible set.

Projections

Suppose we have a vector \(\vec c = (3, 2, 4)\) that we wish to project onto the \(x_1, x_2\) plane, illustrated below.

A Basic Linear Program
The vector \(\vec c\) projected onto the \(x_1,x_2\) plane

This is equivalent to the linear algebra problem of projecting onto the solution set (null space) of the system \(A \vec x = \vec 0\). In this particular example, we wish to project from three dimensions onto two dimensions. This means we need the matrix \(A = \begin{pmatrix}0 & 0 & 1\end{pmatrix}\) so that in the above system \(x_3\) gets mapped to \(0\).

We ultimately wish to find a projection matrix \(P\) such that the projected vector \(\vec c_p = P \vec c\). We do this by the nightmarish formula \(P = I - A^T(A A^T)^{-1}A\).

Example

Consider the problem

Linear Program

Maximize:

\[ z = x_1 + 2x_2\]

Subject to:

\[x_1 + x_2 \leq 8\]

Converting to standard form we get the problem

Linear Program

Maximize:

\[ z = x_1 + 2x_2 + 0s_1\]

Subject to:

\[x_1 + x_2 + s_1 = 8\]

However, for geometric intuition's sake, I'm going to call \(s_1\), \(x_3\) instead:

Linear Program

Maximize:

\[ z = x_1 + 2x_2 + 0x_3\]

Subject to:

\[x_1 + x_2 + x_3 = 8\]

Or, rather

Linear Program

Maximize:

\[ z = \begin{pmatrix}1 & 2 & 0\end{pmatrix}\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix}\]

Subject to:

\[\begin{pmatrix}1 & 1 & 1\end{pmatrix}\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} = 8\]

The feasible set and optimal solution are shown below.

A Basic Linear Program
The feasible set for the given linear program

Now we calculate \(\vec \nabla z\), which, as a consequence of their linearity, for every problem will be \(\vec c\). Notice how \(\vec c\) is not within the feasible set.

A Basic Linear Program
The objective function's gradient vector

This means we need to calculate the projection of \(\vec c\) onto the feasible set \(A\) in order to move a point \(\vec x_\text{old}\) to its new home \(\vec x_\text{new}\) shown below.

A Basic Linear Program
Stepping in the feasible region in the direction of the gradient

The Algorithm

This is an iterative algorithm. It will take in a starting point \(\vec x\) and output our next starting point \(\vec x_\text{new}\). After some unknown number of iterations, it will converge to the optimal solution. This algorithm will work for \(n\) dimensions, but for sake of notation I will only use 3. Note that all vectors are column vectors.

Algorithm
  1. Define a scaling matrix \(D = \begin{pmatrix}x_1 & 0 & 0\\0 & x_2 & 0\\0 & 0 & x_3\end{pmatrix}\) and its inverse \(D^{-1} = \begin{pmatrix}\frac{1}{x_1} & 0 & 0\\0 & \frac{1}{x_2} & 0\\0 & 0 & \frac{1}{x_3}\end{pmatrix}\).

    • Use \(D^{-1}\) to scale \(\vec x\) to \(\begin{pmatrix}1\\1\\1\end{pmatrix}\). This calculation can effectively be ignored, just redefine \(\vec x = \mathbb 1\).
    • Compute scaled versions of \(A\) and \(\vec c\), call them \(\tilde A\) and \(\tilde c\) by

      \[\begin{aligned}\tilde A &= AD\\ \tilde c &= D\vec c\end{aligned}\]

  2. Compute the projection matrix \(P = I - \tilde A^T\left(\tilde A \tilde A^T\right)^{-1}\tilde A\).

    • Compute \(\tilde c_p = P\tilde c\)
  3. Compute \(\tilde x_\text{new} = \begin{pmatrix}1\\1\\1\end{pmatrix} + k \tilde c_p\) where \(k > 0\) is chosen to take us halfway between where we are and the edge of the feasible set.

    • That is, let \(\displaystyle k = \frac{-\frac{1}{2}}{\min\left\{\tilde c_p\right\}}\)
    • \(k\) should always be positive -- and \(\min\{\tilde c_p\}\) should always be negative, and could be read "most negative".
  4. Convert \(\tilde x_\text{new}\) back into original (unscaled) coordinate system by \(\vec x_\text{new} = D \tilde x_\text{new}\)

    • If \(\vec x_\text{new}\) has changed by less than some defined tolerance, we're done and \(\vec x_\text{new}\) is within some epsilon of the optimal solution.
    • Otherwise, go back to step 1.

Here is a graph of first few iterations of the example above.

A Basic Linear Program
A few iterations of stepping in the direction of the gradient

Karmarkar's Algorithm is still not the standard LP solver algorithm, that title lies firmly in the hands of the Simplex Algorithm. There are some edge cases that give us very poor performance with Karmarkar's Algorithm. However, for very large problems, it offers good performance most of the time. See here for a description of various LP solver algorithms.

Python Implementation

One of the benefits of Karmarkar's Algorithm, is that it is easy to implement. Here's a first crack at a Python implementation.

Note all vectors must be row vectors for the matrix multiplication dimensions to work out. This is different than the description above, but was necessary for simpler programming.

import numpy as np

class LPSolution(object):
    def __init__(self):
        self.iterations = None
        self.tolerance = None
        self.intermediates = []
        self.solution = None
        self.solution_string = None

    def __str__(self):
        self.solution_string = 'Solution: ' + str(self.solution)
        self.solution_string += '\n\tTolerance: ' + str(self.tolerance)
        self.solution_string += '\n\tIterations: ' + str(self.iterations)
        return self.solution_string

class LinearProgram(object):
    """A class that implements Karmarkar's Algorithm for the solution of
    Linear Programs in standard form."""
    def __init__(self, A, b, c):
        """Constructs an n-variable m-constraint Linear Program.

        A -- An n x m numpy matrix of constraint coefficients
        b -- A 1 x m numpy row vector of constraint RHS values
        c -- A 1 x n numpy row vector of objective function coefficients
        """
        self.A = A
        self.n, self.m = self.A.shape
        self.b = b
        self.c = c
        self.solution = None

    def karmarkar(self, start_point):
        """Runs one iteration of Karmarkar's Algorithm.

        start_point -- A 1 x n numpy row vector of decision variable values
        """
        D = np.diagflat(start_point)
        c_tilde = np.matmul(self.c, D)
        A_tilde = np.matmul(self.A, D)
        A_tildeT = A_tilde.transpose()
        AAT_inverse = np.linalg.inv(np.matmul(A_tilde, A_tildeT))
        # matrix multiplication is associative
        P = np.identity(self.m) - np.matmul(np.matmul(A_tildeT, AAT_inverse), A_tilde)
        cp_tilde = np.matmul(c_tilde, P)
        k = -0.5 / np.amin(cp_tilde)
        x_tilde_new = np.ones((1, self.m), order='F') + k * cp_tilde
        return np.matmul(x_tilde_new, D)

    def solve(self, start_point, tolerance=1e-5, max_iterations=50, verbose=False):
        """Uses Karmarkar's Algorithm to solve a Linear Program.

        start_point     -- A starting point for Karmarkar's Algorithm. Must be a row vector.
        tolerance       -- The stopping tolerance of Karmarkar's Algorithm.
        max_iterations  -- The maximum number of iterations to run Karmarkar's Algorithm.
        verbose         -- List all intermediate values.
        """
        x = start_point
        solution = LPSolution()
        for i in range(max_iterations):
            x_new = self.karmarkar(x)
            if verbose:
                print(x_new)

            dist = np.linalg.norm(x - x_new)
            x = x_new
            solution.intermediates.append(x)
            if dist < tolerance:
                break

        solution.solution = x
        solution.iterations = i
        solution.tolerance = dist
        self.solution = solution

        return solution

We use the solver as follows

#!/usr/bin/python3
import numpy as np
from linear_program import LinearProgram, LPSolution

def generate_tikz_plot(solution):
    """Generates a 3D Tikz LaTeX coordinate strings for the intermediate solutions."""
    for i, soln in enumerate(solution.intermediates):
        coordinate = r'\coordinate (xnew{}) at '.format(i)
        coordinate += r'({}, {}, {});'.format(*[coord for coord in soln.flat])
        print(coordinate)

    draw = r'\draw[red] (x) node[circle, fill, inner sep=1pt]{{}} '
    for i in range(len(solution.intermediates)):
        draw += r'-- (xnew{}) node[circle, fill, inner sep=1pt]{{}} '.format(i)
    draw += r';'
    print(draw)

def main():
    A = np.matrix([[1, 1, 1], ])
    b = np.array([8, ])
    c = np.array([1, 2, 0])

    LP = LinearProgram(A, b, c)
    LP.solve(start_point=np.array([1, 1, 6]))
    print(LP.solution)

if __name__ == '__main__':
    main()

Running the above script, we get the following output

Solution: [[  2.97874012e-06   7.99999553e+00   1.48937006e-06]]
    Tolerance: 5.57390111055e-06
    Iterations: 21

Which we can see in the following image.

A Basic Linear Program
The figure generated by the above script