# The Secretary/Toilet Problem and Online Optimization

I’ve recently been appointed post-doc at MIT under the supervision of Patrick Jaillet. Patrick’s world-known specialty is online optimization, and I’m extremely thrilled to start to work with this topic… even though it means that I’ve had (and still have) a lot to learn! And what better way to learn than to take online optimization as a game? Thanks to the wonderful work of Pavan Mirla, I’m very excited to present to you the (first?) online optimization game ever. Your goal here is to pick the best toilet, given that you cannot go back on earlier decisions of acceptation/rejection of previously visited toilets:

To make it in line with a classical secretary problem, as you’ll notice, we’ve added just enough correlations between the number of likes of each toilet, so that you’re not going to be tempted to accept the first toilet, just because it has a good number of likes. More precisely, here’s the distribution we used: For the $k^{th}$ toilet, we first pick uniformly randomly its ranking $r$ among the $k$ first toilets, and then we draw uniformly randomly his number of likes between the number of likes of the $r-1$ and the $r+1$ best toilets out of the $k$ first (we add a bit of margin at the extremes so that we will be able to insert future toilets). This is not that silly a distribution. When you’re visiting toilets in a new context, you may have no idea of the quality of the toilets that you’ll encounter. If the first toilet you visit is good, then, quite probably, all toilets will be good as well.

This game has been solved only a few decades ago, after it has been made popular by the great Martin Gardner in 1960. It quickly became known as the secretary problem, although it is also known as the game of googol, the fiancée problem or the sultan’s dowry problem. In our case, it’ll be the toilet problem. And what’s amazing about this game is what happens as the number of toilets goes to infinity (which, fortunately for you, I won’t have you playing!).

What happens when there is a huge number of toilets? I guess that the probability of finding the best toilets vanishes to zero…

Quite surprisingly, no! As it turns out, the best strategy will guarantee that you’ll always find the best toilet more than 1/3 of the times, no matter how many toilets there are! This is what’s brilliantly explained by Dr Ria Symonds in the great Numberphile video below:

The secretary problem has become a landmark in the History of mathematics as a problem of secretary hiring. It’s a bit sad that mathematicians have happily endorsed this misogynistic illustration. Back then, most mathematicians were indeed male, but, fortunately, this has and is evolving! Following the Numberphile video, we shall illustrate this problem with the toilet example.

More generally, online optimization is all about making clever irrevocable decisions as information pours in. And, these days, that’s a hot, very hot topic!

Why is that?

## The Secretary Toilet Problem

Let’s get back to serious business. I mean… to our toilet problem.

Can you recall what this problem is about?

Sure thing! In the toilet problem, we have, say, 100 toilets available. We visit them one at the time, but once we enter a toilet, we need to make an irrevocable decision: Use it or leave it. But evidently, most toilets are very dirty, and we’d rather use the absolute best toilet. So, here’s the problem: Is there a reasonable chance for me to find the absolute best toilet? And if yes, how can I? What if there were 1,000 toilets? Or a billion?

Hummm… It sounds quite unlikely that you do find the best toilet!

I know! But amazingly, there’s a way to have a very reasonable chance to finding the best toilet, even if there were as many toilets as the number of stars in the universe! No matter how many toilets there are, I can devise a strategy to find the best of them over 36% of the time!

For 2 or 3 toilets, the best probability is 1/2. Between 4 and 10 toilets, this probability decreases from 46% to 40%. For 26 toilets, it drops to 38%. For 150 toilets, it drops to 37%, but then never drops below 36.78%!
Really? That sounds so unlikely! Why 36.78%?

Hehe… To determine this probability, let’s first reflect on the strategy I can devise. A strategy is a rule to accept or reject the proposed toilet, based on how it compares to other observed toilets and to the number of toilets we’ve already rejected. Now, let’s say we’re searching for the absolute best toilet. Evidently, we’ll reject all toilets that are not better than the best toilet we’ve visited so far.

OK… But what about if the proposed toilet is better than all previously observed toilets?

That’s when we’re faced with a dilemma. Intuitively, if we’ve only visited a small number of toilets, then it’s unlikely that the next toilet is the absolute best one, even if it is better than all visited toilets. But if we keep rejecting toilets, the probability that the absolute best toilet is one that we’ve already rejected increases. Therefore, there’s a tradeoff between exploring/learning more about what toilets are out there, and missing out on an opportunity.

So I guess there is some sort of optimal time to stop exploring and start going for it?

Exactly. And if we can determine this optimal stopping time, we can solve the problem.

Unfortunately, the next bit will be a bit technical. You’re welcome to skip it, but I’d encourage you to learn about the probabilistic reasoning, as I find it very interesting. Eventually, there’ll be a bit of calculus required as well, which I guess is less interesting…

Suppose we have $n$ toilets to visit, and that our strategy consists in rejecting the first $k$ toilets, and then accepting the first toilet that’s better than all previously visited toilets. Now, consider a toilet $t > k$. What is the probability $p_t$ that we end up accepting toilet $t$ and that it is the best toilet?

Hummm… Well, the probability that it is the best toilet is $^1/_n$, right?

Yes. So far so good. But now, given that toilet $t$ is the best toilet, what’s the probability that we do accept it?

If we get to it, we’re obviously going to accept it, right?

Yes! That’s because it will obviously be better than all previously visited toilets.

So I guess that the question is: What is the probability that we reject all toilets before toilet $t$?

Very good!

So, what are the conditions under which we do reject all toilets between $k+1$ and $t-1$?

I get it! It must be that the best of the $t-1$ first toilets is within the $k$ first toilets!

Exactly! If the best of the $t-1$ first toilets is within the $k$ first toilets, then any toilet between $k+1$ and $t-1$ will be worse than the best toilet of the $k$ first toilets. Thus, all the toilets between $k+1$ and $t-1$ will be rejected. Conversely, if the best of the $t-1$ first toilets is not within the $k$ first toilets, then it is a toilet between $k+1$ and $t-1$. And since this toilet is better than all previous toilet, there is not a chance that we reject it.

Cool! And the probability that the best of the $t-1$ first toilets is within the $k$ first toilets is $^k/_{t-1}$, right?

Very good. So, overall, the probability that we accept toilet $t$ and that it is the best toilet is $p_t = \, ^k/_{n(t-1)}$. Naturally, the greater the stopping time $k$, the more likely that we will not wrongly choose a toilet that was not the best one. The greater $t$, the more toilets need to be rejected to get to toilet $t$, and thus the lower the probability $p_t$. And finally, obviously, the greater $n$, the smaller the chance that $t$ is the best toilet.

OK, that’s good. But what we want is the probability to choose the best toilet, regardless of its rank, isn’t it?

Yes. And to compute this probability $p$, all we need to add up the $p_t$’s for $t \geq k+1$. And here, we have an opposite phenomenon. Namely, the greater $k$ is, the fewer terms $p_t$ we can add up. That’s where the tradeoff is. For large $k$’s, we add up few large terms. For small $k$’s, we add up many small terms.

Cool! So now, what’s the optimal $k$?

At this point, we need a bit of calculus. Using the approximation of the harmonic series $1+\,^1/_2+\,^1/_3+\,^1/_4+\ldots +\,^1/_n \approx \ln n$ proved by Euler, we have $p = p_k + p_{k+1} + \ldots + p_n = \,^k/_{nk} + \,^k/_{n(k+1)} + \ldots + \,^k/_{n \times n} = \,^k/_n (\,^1/_k + \,^1/_{k+1} \ldots +\, ^1/_n) = \,^k/_n ( (1+ \,^1/_2 + \ldots + \,^1/_n) – (1+ \,^1/_2 + \ldots + \,^1/_{k-1})) \approx (\,^k/_n) \ln (\,^n/_k)$. This is maximized for $k/n \approx 1/e$, where $e =\, ^1/_1 + \,^1/_{1 \times 2} + \,^1/_{1 \times 2 \times 3} + \ldots \approx 2.718$ is Euler’s constant. Choosing $k \approx n/e$ then yields a probability $p \approx 1/e \approx 36.78\%$ of accepting the best toilet.

The validity of these approximations, which can be made rigorous with more care, is beyond the scope of this article. If you want to find out more, check my article on logarithms.
Wow! I guess we’ve proved it! Can you recapitulate this proof though?

I’ll do one better. I’ll let you listen to Dr Ria Symonds’ explanations:

Now, you may be pleasantly surprised that we’ve found a strategy that guarantees a good chance of finding the best toilet. But I’d advise you not to apply it to finding your soulmate (yes, you can replace toilets by dates, and the same reasoning holds). This stopping strategy is very risky!

Why is that?

Here’s the problem. If the best toilet is within the first $k$ toilets of the learning phase, then you’ll just reject all the toilets! Eventually, you will be forced to settle with the last toilet you’ve visited, or, if you’re applying this to dating, you will probably end up dying sad and lonely… And sadly, the probability that this worst case occurs is exactly the same as the probability of finding the best toilet: 36.78%.

That’s not good…

Indeed it is not. In particular, this toilet solution is not a good idea in practice, and I hope that no online web company uses it. To give relevant practical solution to online decision-making, we need to make our problem more realistic.

In practice, it seems that a good stopping time is rather something of the order of $\sqrt{n}$. At least, that’s what Matt Parker says in this Slate article, even though he backs this with psychology research… Not that I don’t believe in psychology, but, well, you know, it’s not mathematics…

## Online Optimization

There are a few details about the classical version of the secretary problem that make it unrealistic. The most crucial of all is about the goal we want to achieve. In practice, it’s much better to have a strategy that always guarantee accepting a near-best toilet than one that finds the absolute best toilet with high probability. After all, the absolute best toilet might be better than the next best ones just because it has four rolls of toilet papers instead of three. Well guess what… I hereby dare to declare that I am personally quite fine with (clean) toilets with only three rolls of toilet papers!

Really?

Yes I am!

So, you’re saying that the problem should rather be about finding very good toilets in average, right?

Exactly! The problem should be about what happens in average. To model that, let’s say that toilets can be scored between 0 to 100. We’ll say that a strategy that always finds a toilet of score 90 is better than one that finds a toilet of score 100 with probability 37% and a toilet of score 80 the rest of the time. Why? Because the average score of the latter strategy is $10 0 \times ^{37}/_{100} + 80 \times ^{63}/_{100} \approx 87$, which is worse than the average score of the former strategy, $90$.

In probability theory, we rather talk about expectations than averages. But, aside from terminology details, the idea is the same. We want to maximize the expected toilet score.

To make the problem more interesting, we will as well assume that we’re accompanying 5 kids, and we’re searching for appropriate toilets for them to use. This means that instead of accepting 1 toilet, we now need to accept 5 toilets. Moreover, let’s imagine that we have to pay to use the toilets, but that we only have one dollar on us at the moment. Since not all toilets cost the same (classier ones are probably more expensive), we’ll need to manage our limited budget.

And this is what online optimization really is about: Budget management. And it’s not necessarily a financial budget. Imagine you’re going shopping for food. You have both a financial budget and a stomach budget to manage: You don’t want to be full when you’re stumbling upon a delicious ice cream shop…

I see… That’s why it’s such a general framework!

Yes. The setting I’ve described here is an online packing/covering problem. Namely we need to “pack” our dollar consumptions, while “covering” the requirement of 5 toilets. Such problems are so ubiquitous in applied mathematics that we call them mathematical programs, and we write them as follows:

On the right, the mathematical program features variables ${\bf 1}_{accept \, t}$ which are binary decision variables. This means that we can choose to have them equalling 1 (which means that we accept toilet $t$) or 0 (which means that we reject toilet $t$). If you’re familiar with optimization, you’ve probably recognized an integer linear program on the right. More generally though, a lot of recent research (including mine!) has focused on convex programming. Find out more with my articles on linear programming and integer programming.

The mathematical program above is what we want to solve. Unfortunately, we can’t really solve it, because we don’t yet know the scores and the costs of the toilets we haven’t visited yet. So we need to do the next best thing: Design an online strategy that guarantees a total score in average that is almost the score we’d get if we knew about all the toilets.

Does such a strategy even exist?

This question has been addressed by no less than 10 papers within the last 5 years (and most have come out last year!). I cannot discuss the specifics here (I’ll do that in a future article), but here’s roughly their answer: If I visit the toilets in a random order, I can design an online strategy whose average score is very close to the best possible score. More precisely, the competitive ratio defined as the ratio of the online strategy’s average score by the best possible score is $1-\epsilon$, where $\epsilon$ is very small when the budgets are sufficiently high.

So what are their online strategies like?

Roughly, there are two extreme strategies, and plenty of others in-between. While these other in-between strategies are quite hard to explain, I find the two extreme strategies very intuitive and natural.

## The Economist’s Strategy

Historically (and by that, I mean, in 2009), the first proposed online strategy is sort of an economist’s strategy. Namely, I’m going to give a value to the budgets I have. More precisely, I’ll estimate how many toilet scores I get by paying 1 dollar, for instance. Or, in the case of food shopping, how much happier I’ll be if I eat the equivalent of one big meal. And then, when I see a next toilet, I’ll see if it’s worth its price.

Hummm… Can you give an example?

For instance, in our toilet problem, I may expect a toilet to gain 10 point out of 100 if its price increases by 3 cents. So, the value of 3 cents would be 10 points out of 100. Then, if I’m faced with a toilet with score 50/100 and cost 20 cents, I’ll know it’s not worth it, because each of the 10 points out of 100 is valued 3 cents, and thus, the toilet should only cost 15 cents.

But that’s only my estimation of the value of 5 points out of 10? How can it be relevant?

The key to this economist’s strategy is to learn the values of the toilets by first observing the first few toilets. In this strategy, like in the original secretary toilet problem, the first few toilets are all rejected. They just serve as a benchmark to judge future toilets.

If you’re familiar with optimization, you may have guessed that the economist’s strategy is technically better known as the dual algorithm. More precisely, the dual online algorithm observes the first $k$ toilets and does nothing. Then, it solves the dual linear program for a budget scaled appropriately to determine an optimal price vector. Evidently, this price vector is only optimal with respect to the sample observed, but, if $k$ is not too small (typically, of the order $\sqrt{b}$), it’s also a good estimation of the globally optimal price vector.
It’s funny… and it really makes sense!

I know. And crucially, if the first few toilets are somehow representative of all the toilets, then I’m likely to have a great estimation of the value of 10 points of 100. And since, with high probability, I will have a good estimation of that, my future decisions will be near-optimal in average!

## The Entrepreneur’s Strategy

While the economist’s strategy is indeed near optimal, last year, Kesselheim, Radke, Tönnis and Vöcking proved that one can do better, with a more “entrepreneurial” strategy. Roughly, in this strategy, an investor allocates a fraction $k/n$ of the total budget to the first $k$ toilets. Then, given this allocated budget, the entrepreneur makes the decision that would be optimal if the problem it had to solve was restricted to these first $k$ toilets.

Wait, the entrepreneur can’t go back on previous decisions, can he?

No he can’t, and he won’t. But, crucially, his decision for the $k^{th}$ toilet is the one he should have made if he could go back on previous decisions. In some sense, the entrepreneur is constantly questioning himself and adapting himself to new data, even though he cannot change the past decisions. In particular, if the $k^{th}$ toilet is really worth it, the entrepreneur might spend all the budget allocated by the investors for the $k$ first toilets on this $k^{th}$ toilet, even if he had already spent some of this budget on the $k-1$ previous toilets. Thus, because of poor past decisions, it’s common for the entrepreneur to spend more budget than is allocated by the investor. But the entrepreneur can’t afford to regret the past.

Can you illustrate the entrepreneur’s strategy on the toilet problem?

Sure! Imagine there are 100 toilets and that you’re visiting the 10th toilet. Since you had 1 dollar to start with, and since you’ve only visited 10% of the toilets, the investors only allocate you 10 cents for the first 10 toilets. Now, imagine that the 10th toilet has the best ratio score over cost, so that you wish you had spent all of your first 10% budget on this 10th toilet. Then, here’s what the entrepreneur’s strategy would have you doing: Spend all of the investor’s allocated budget on this 10th toilet (10 cents), even though you may have already spent some of these 10 cents on previous toilets!

So the allocated budget isn’t really… an allocated budget!

You’re right! It’s more of an indicator of how much should be spent on the first toilets. But still, you cannot spend more than the allocated budget for the first 10 toilets on the 10th toilet.

What if the 10th toilet costs more than the budget allocated for the first 10 toilets?

That’s a very good question! Let’s say it costs 20 cents, which is twice the budget allocated for the 10 first toilets. You cannot afford it. However, the entrepreneur’s strategy does not want you to give up on this toilet! It really wants you to spend the investor’s allocated budget on this lovely 10th toilet. So, ideally, it would want you to pay 10 cents to use half of the toilet!

But that can’t be done, can it?

Well, in practice… I’ve never asked to use half of a toilet, but I doubt they’d let you do so!

So what can be done?

You can flip a coin! If it’s heads, you’ll spend the 20 cents required to use the toilet. If it’s tails, you’ll give up on this lovely toilet (what a shame!). Crucially though, in average, you’ll have spent 10 cents on the 10th toilet.

Hummm… This strategy doesn’t seem nearly as effective as the economist’s strategy…

I know! When I started to read about it, I knew it was the better strategy, and I was deeply troubled… I thought it couldn’t be the better strategy! But amazingly, after five or six pages of proof, its authors did come to an astounding conclusion: Not only is this strategy better than the economist’s strategy, it is in fact the best online strategy!

As you’ve probably guessed, the details are quite hard to explain. But roughly, a previous paper by Agrawal, Wang and Ye gave an upper-bound on how good an online strategy is. This upper-bound is asymptotically met by the entrepreneur’s strategy: This strategy guarantees that, in average, you’ll obtain a score that is $1-O\left( \sqrt{\frac{1+\ln m}{b}} \right)$ times the optimum. In more technical terms, the entrepreneur’s strategy can be qualified as “primal”, as opposed to the dual algorithm. Amusingly, their inventor titled their paper Primal Beats Dual, which is easily the coolest research paper title I’ve ever had to read (on a similar note, the title PRIMES is in P was pretty cool as well!).
Does that mean that online optimization is a closed subject?

Hehe… Sadly (at least for me), online packing with a random order of the visits is indeed a closed subject, in the sense that we know what the best online strategy is, and we know how well it performs. But there are plenty of variants of the problem that remain unanswered. For one thing, mixtures of the economist’s and the entrepreneur’s strategies yield an interesting mixture of quality competitive ratio and fast computation, as opposed to the optimal entrepreneur’s strategy which requires a lot of computation.

Namely, the entrepreneur’s strategy requires solving a linear program at each iteration. It’s not so bad, but it’s more computationally costly than some “primal-dual” algorithms which typically relies on multiplicative weights update methods whose computations boil down to a few multiplications.

But there’s more…

Like what?

The happy news (at least for me) is that I’ve stumbled upon several interesting open questions whose answers I do not have (and sometimes do not even have a guess of what the answers might be!). But, I’ll keep those for myself… for now.

## Let’s Conclude

There’s a lot, lot more to say about online optimization. Evidently, my presentation of this rapidly expanding area of research is deeply skewed by my own personal interests. Let me briefly mention the particular case of online matching, which is the online version of the marriage assignment problem I wrote about years ago. This problem has been given major importance due to applications to online markets and, more surprisingly/amusingly, to kidney transplants. At any point of time, we have a set of donors and receivers. The difficulty lies in the fact that not all donors and receivers are compatible, and that some are more compatible than others. Thus the question: When someone needs a kidney, should we give him one that’s already available, or should we wait for more compatible donors?

Another important variant of online optimization is sometimes known as online learning, where the focus is given on the regret rather than on the competitive ratio. This regret is the difference (as opposed to a ratio) between the best that could have been done as opposed to what the online algorithm has done. I’d love to tell you more about it, but unfortunately, I’m still hugely ignorant of that field of research…