# Primal and Dual Simplex Methods

My article on linear programming is a prerequisite. For the second half (dual simplex and degeneracy), my article on duality in linear programming is a prerequisite.

Linear programs have an amazing structure. The simplex methods exploit this amazing structure to quickly find optima. Thanks to that, problems with millions of variables can now be solved, hence solving extremely complex industrial issues. That’s why the simplex methods have become crucial in our world today.

Why are you using the plural when talking about simplex methods?

You’re right to ask. There really is only one simplex method, introduced by the American mathematician George Dantzig right after the second world war. However, variants have been introduced, mainly the dual simplex method, which I’ll present later in this article. Let’s have a look at the simplex method first.

We’ll consider the smart robber problem I described in my article on linear programming, that is we can steal bills or gold, but the total stolen volume is limited, as well as the total weight stolen. Here is a figure of the feasible region. ## The simplex method

Recall that, in a linear program, there is necessarily an extreme point that is the optimum.

So we could just list all the extreme points and choose the best one…

There are two problems with listing all the extreme points. First, there can be a lot of them, especially when considering problems with millions of variables and hundreds of thousands constraints. Second, finding an extreme point can be quite difficult as it involves solving a system with all the constraints that may not lead to a feasible solution. Shortly said, listing all the extreme points is a bad idea.

What the simplex method does is moving from extreme points to strictly better extreme points until finding an optimal extreme points. This simple idea is considered as one of the main breakthroughs of the 20th century in computing, at least according to the journal Computing in Science and Engineering!

Visually it doesn’t seem too complicated but how are extreme points computed?

Any extreme point is the intersection of enough faces of the polyhedron. How many faces? As many as the dimension of the feasible set. Let’s call n this dimension. Be careful if you read other articles on linear programming, n may refer to something else like for instance the number of variables.

Now let’s notice that the faces of the polyhedron are the constraints. Thus, an extreme point is the intersection of n constraints. In our case, the feasible set is of dimension 2, so any extreme point is the intersection of at least 2 constraints.

Oh I get it. We can describe every extreme point by the set of n constraints it is at the intersection of, right?

Almost. Let’s define a constraint base as a set of n constraints.

A constraint base is a concept I have come up with and is therefore not usual. Linear programmers usually rather talk about the “base”, which is the set of variables that don’t correspond to the n constraints. We can obviously talk equivalently about the base or the constraint base, but I prefer to use the constraint base which is more visual.

It might seem like constraint bases can describe extreme points equivalently. Unfortunately, the relation between constraint bases and extreme points is not that simple for two reasons. First, a constraint base doesn’t necessarily describe an extreme point. For one thing, two constraints could totally match or be parallel and the intersection of n constraints could be empty or infinite. This is rare and could be dealt with. But, more importantly and more frequently, the intersection could be a point outside of the feasible set (like for instance the intersection of the red and yellow constraints of our example). Such constraint bases are called infeasible constraint base.

Another thing that can happen is that an extreme point could be described by several constraint bases. As a matter of fact, it’s possible that more than n constraints intersect at an extreme point. In this case, any constraint base describing this degenerated point is called a degenerated constraint base.

Still, despite these defaults, we use constraint bases to describe extreme points. Now, I’ll repeat what I said earlier about the simplex method but I’ll use constraint bases right now. At any extreme point, the simplex method seeks for a next feasible constraint base to visit, that is strictly better than the current constraint base. In order to do that, it looks at the possible direction where it can go to reach a next extreme point via an edge of the polyhedron. This direction is described by the constraint it will get out of the constraint base. The constraint base without one constraint gives a line to follow, delimited by our current extreme point. As we walk along the line, we eventually reach a new constraint that will enter the constraint base. This defines a new feasible extreme point.

Once again, people usually use bases to describe the simplex. Equivalently, what we’re doing here is taking a variable into the base, as we eventually find a variable to get out of the base. I find this description harder to understand, that’s why I didn’t use it.

There’s something else that’s particularly interesting with that method. Although computing an extreme point given its constraint base can be difficult for high dimensions (it requires the inversion of a matrix n x n), moving from one extreme point to a neighbor extreme point can be done very quickly. That’s one of the reasons why the simplex method is very efficient.

But how does it know if a direction will improve the objective function?

It knows by looking at the reduced cost associated with the direction. The reduced cost is how much the objective function will improve with one step in the given direction. It can be computed by making simple operations on the constant parameters of the linear program, or by looking at the dual variables. I’m not going to get into the computation of the reduced costs, as it’d be much more interesting if I had talked about those dual variables. Learn more by reading my article on duality in linear programming.

The following figure shows what the simplex method would do in our case: Here’s a way to visualize the simplex method. Imagine that you hold the polyhedron in your hands. That’s your feasible set. Add pipes on every edge. Those pipes link extreme points of the polyhedron. Now, choose the inclination of the polyhedron. This inclination represents the objective function: the horizontal plane should be a level set, while the vertical should be the gradient (don’t mind about this word if you don’t know what it means). Now, what the simplex does is choosing a first extreme point, and inject a drop of water in it. Then the simplex will choose the path of the drop of water (providing it never divides) until reaching the lowest point of the polyhedron, which will be the optimal point.

You talked about a starting extreme point. How can we choose it?

As you can imagine, choosing a good starting extreme point is crucial for the speed of the algorithm. It’s better to have one close to the optimum. However, this is a difficult problem. In fact, even finding one extreme point is hard. We usually introduce another program to solve this problem: minimizing the non-feasibility of solutions. In this program called Phase I, every solution is feasible, and what’s particularly great, the program is a linear program… which therefore can be solved with a simplex method!

The simplex method has become famous and has been used a lot as it enabled the resolution of problems with millions of variables and hundreds of thousands of constraints in reasonable time. However, it faces problems in cases of degeneracy: it’s possible that the direction of the reduced cost points out of the polyhedron (and that actually happens very often when using column generation). It’s hard to anticipate whether a direction will point out of the polyhedron or not and testing all the directions can take quite a while.

Moreover, the performance of the simplex method is being questioned by the rise of interior point methods. Shortly said, interior point methods penalize points in the border of the feasible set and slowly decrease these penalties to converge to the optimal point. Interior point methods deserve their own article. If you know them, please write about them!

Why have you written on the simplex method if a better method exists?

Because the research on simplex methods is still very productive, and many of its variants are state-of-the-art for specific problems. Let’s mention the column generation for shift scheduling, vehicle routing or clustering. This method is particularly developed in companies and universities in Montreal. It is used with the Dantzig-Wolfe decomposition. Shortly said, the idea is to consider only the vectors that may improve the objective function.

Stochastic programming enables to solve programs with probabilities very efficiently thanks to the L-Shaped method, derived from the Bender’s decomposition. The L-Shaped method can be understood as a method that only adds the constraints that are relevant, by detecting those that are not satisfied when not deleted.

Last but definitely no least, integer linear programming is very efficient using the simplex method! Integer linear programming enables modeling of a very very large range of fields that include binary variables for instance. Examples of these fields are assignment, supply chain and location problems. Find out more with my article on the topic!

All those methods correspond to exact methods (which means that they find the actual optimum). They may be too long. That’s why plenty of approximate methods, called heuristics, have been developed to find a point close to the optimum. These heuristics can be variant from the simplex.

Obviously, there is a lot to say on those variants and their applications. If you can explain, please write an article. Please, make it more complete and/or more understandable than the wikipedia pages! Also, I might have forgotten some key variants of the simplex methods. Please mention them.
Why is the simplex method better in those cases?

Mainly because when we slightly modify a linear program, re-optimizing using the simplex method can be done very quickly, as we can start with an extreme point that will be close to one of the new extreme optimal points. In comparison, the interior point method cannot take information from the previous linear program resolution to solve a slightly modified linear program. This re-optimization will probably require the use of the dual simplex, which I’ll now introduce.

I’m sure that if you work on interior point methods you have something to say! Please do!

## The dual simplex

My article on duality in linear programming is a prerequisite for the following..

I’m now going to explain what’s happening in the dual as we apply the simplex method to the primal. This well help us define another way of solving linear programs, known as the dual simplex. In order to have interesting things to talk about, we’ll assume that the optimal solution is to steal bills only. This gives us the following primal-dual spaces. As you can see on the graphs, primal and dual bases can be associated. The optimal primal-dual solutions are the cyan dots. Let’s have a look of a possible path taken by the simplex algorithm in the primal and dual spaces. Can you explain what happens in the dual space?

The black dot is the initial base solution. Recall that the simplex method will finish when we reach a dual feasible base. At the primal black dot, the primal reduced costs are positive both when leaving the yellow and green primal constraints. This corresponds to having the two violated dual yellow and green constraints. We need to choose one of the primal base variable with a positive reduced cost, which corresponds to choosing one of the two violated dual constraints. In our case, we have chosen the green constraint.

Now that we have chosen the primal green constraint to leave (and, equivalently, the dual green constraint to reach), we need to decide where to stop. In the primal space, we have to follow the yellow constraint until reaching a base. Equivalently we choose another feasible base with the yellow constraint. This corresponds to the primal orange dot, rather than the white dot. What does that correspond to in the dual? By getting the green constraint into the dual constraint base, we can choose to either lose the dual red constraint or the dual blue constraint (one of the constraints that was already in the dual constraint base). The one we have to choose, in order to have it corresponding to the primal base, is the one with the worst better dual value (since we have a dual minimization problem, it’s the smallest higher dual value). As a result, we need to choose the orange dot.

How should this choice of the worst better dual value be interpreted?

We must choose the worst better dual value because we need to keep the dual reduced costs positive (if the dual program is a minimization program), so that when we reach a feasible dual base, we’ll be at the optimum for sure.

Each pivot in the dual space should be done as the one we just did. From the dual orange dot, only the dual yellow constraint isn’t valid, so we have to get it into the dual constraint base, which leads us to choose the worst dual value between the yellow and pink dots. The pink dot has a worse dual value, so we pivot towards it.

Then, from the dual pink dot, only the blue constraint isn’t satisfied, so we’ll get it into the dual constraint base. We then have to choose between the cyan and the white dots, but, as the white dot has a worse dual value, we cannot go there. So we’ll go to the cyan dot. The cyan dot is feasible, thus it is the dual optimum.

That’s a great way to understand the simplex. But is it useful?

Well, it is useful for our understanding of the primal simplex. But that’s not all. The dual simplex is very useful if our initial point is non-feasible, in which case the primal simplex is useless (and we actually need to solve a subproblem called phase I, in order to find a feasible primal point). Yet, having an initial non-feasible point happens very very often in case of integer programming or stochastic programming using Bender’s decomposition. In both cases, we actually add new constraints to the primal linear program that make our current base infeasible. We can therefore apply the dual simplex to re-optimize. What’s more the dual simplex will perform very quickly as the initial base, which is usually the optimum without the new constraints, is very likely to be close to the new optimum.

The dual simplex is actually the reason why the variants of the simplex method work better than the interior point method on several important classes of problems.

I don’t really know what’s been done with interior point methods to do the re-optimization. If you do know, please tell us.

## Degeneracy

Just like for the primal simplex, the dual simplex performs well in case of non-degeneracy. However, if the primal simplex reaches a degenerated point, it may have a lot of trouble to get out of it because of the many bases that can represent the degenerated point.

What does it correspond to from a dual viewpoint?

Let’s draw this out to find the answer. Just like in my article on duality, we moved the primal blue constraint to create a primal degenerated solution. As a result, the primal green dot is actually the merging of the primal orange, white and pink dots. Yet, moving a primal constraint does not affect dual constraints. It does, however, modify the dual objective function, thus the level curves.

Since the orange, white and pink primal base correspond to the same point, their have the same objective value, which implies that dual orange, white and pink dots are on the same dual level curve. The degenerated primal point could actually be associated with the dual polyhedron defined by the convex hull of the three dual dots. This concept of dual polyhedron is very important for column generation.

In our case, things remain simple because there is no much degeneracy and the dimension is very low. Thus, the path we used earlier is still valid and is actually the one taken. What’s more, the dual polyhedron associated with the degenerated point is very simple since it’s a line with three bases. However, in case of more degeneracy and higher dimension, the dual polyhedron could be enormous, and the dual simplex would be travelling all over the dual polyhedron. It may take a very long time for us to leave this dual polyhedron.

It’s important to notice that a primal degenerated pivot (that is, a modification of the base that does not modify the associated solution) corresponds to an actual dual pivot that does not improve the objective value. In cases of higher dimension and higher degeneracy the number of such pivots could be exponential. That’s where comes the idea of dual variable stabilization.

## Let’s sum up

The simplex methods are amazing methods that exploit the structure of linear programs. In order to solve even larger problems, a better management of the bases must be done. That’s where technics like column generation and Bender’s decomposition are used and very efficient. However, they may be slow in cases of degeneracy which actually are very common in many classical modelings. Other technics are being developed to deal with degeneracy including the dual variable stabilization (read this article of mine) or the improved primal simplex.

Overall, the main application of linear programs concerns integer linear programming, which can model a very large range of problems. Many additional technics need to be added though, including the branch and bound, the cutting plane method, or, more recently, the integer simplex. Much research still needs to be done to improve integer linear programming.

## More on Science4All

Optimization by Integer Programming By Lê Nguyên Hoang | Updated:2016-01 | Prerequisites: Optimization by Linear Programming | Views: 8514
Integer programming is arguably the greatest achievement of applied mathematics. Half of the time, it's what's used to solve real-world problems!

Santa Routing and Heuristics in Operations Research By Lê Nguyên Hoang | Updated:2016-01 | Views: 7827
Designing routes to visit customers has become one of applied mathematicians' favorite optimization problems, as companies offer millions to solve them! This article discusses the clever technics they have come up with, and use them to help Santa deliver toys to kids all over the world!

Column Generation and Dantzig-Wolfe Decomposition By Lê Nguyên Hoang | Updated:2016-02 | Prerequisites: Linear Algebra and Higher Dimensions, Optimization by Linear Programming | Views: 12257
Column generation and the Dantzig-Wolfe decomposition are powerful tricks which have revolutionized optimization addressed to industrial problems, and generated millions and millions of dollars. My PhD supervisor effectively took great advantage of these tricks and founded companies with it. This article explains the tricks.