Intuition around NP-Hard and NP-Complete

kqr

, published 2018-04-20

Tags:

I finally have an intuition for the (in)famous complexity classes of problems P, NP, NP-Hard, and NP-Complete. I’ve never had any problem with P and NP, but the latter two I have always found confusing.

Key to my understanding was thinking about them in terms of generality rather than difficulty. I think you already have an intuition for the concept of generality when it comes to problem-solving, so I’m not going to dwell on this too long. However, I will say that the two concepts generality and difficulty are intimately related¹¹ Under the principle that a more general problem must be equally hard or harder to solve. You can easily convince yourself of this: any problem can be plugged into an algorithm that solves a more general problem, so if the more general problem had an easier solution, we would just use that solution also for the specific problem., so we don’t lose anything in terms of reasoning power by switching to that perspective.

Decision problems

A decision problem is a question that can be answered with yes or no, such as “Do I have spare change to cover this expense in order not to have to break a bill?” or “Does any of the car models currently on the market fulfill my requirements?”

Note that the answer is not constructive in the sense that a “yes” does not tell you anything about how to use your spare change, or which cars fit your requirements. So why care about decision problems? Because the steps we take to arrive at the answer to a decision problem tends, in some way, encode a solution too. So if we tilt our heads the right way, squint, and look at the steps taken to answer the question, we also get the what, how, and which.

P

The most specific class of problems we’ll talk about is P. The technical definition is that all problems in P can be solved in polynomial time or faster.

An example is that of linear search: if we have a box full of various mechanical parts and can we build a bicycle²² Assuming “a bicycle” is very strictly defined as containing a number of specific mechanical parts.? To figure that out, we can simply pick up each part in the box, check if it belongs on a bicycle, and then toss it in a separate pile of bicycle parts. If that separate pile then contains all parts necessary for a bicycle, the answer is yes. Otherwise, it is no.

NP

This class of problems is assumed to be slightly more general than P. The technical definition is that any solution to a problem in NP can be verified in polynomial time, but our best solutions to date run slower than polynomial time. We can reuse the previous example for some intuition. Last time, we had a specific definition of a bicycle and which parts were needed to build it. Pretend now we want to know whether we can build any means of human transportation. It could be a bike, it could be a car, it could be a submarine, it could be a hovercraft. Anything goes, including any vehicles that haven’t been invented yet.

If we find the parts for a bicycle, or a car, or a submarine, this task is easy. But in this general case, we cannot assume we will find the parts for a specific vehicle. We can rephrase the problem to reflect this. Instead of asking, “Is it possible to build any means of human transportation?” we can ask, “Can you rule out even the remotest possibility of building any means of human transportation?”

In other words, we have to be able to say for sure that it is absolutely impossible to find a subset of mechanical parts in that box that will actually allow you to transport humans. This is much harder – or at least, so we think.

However, it is quite easy to verify a solution. If I hand you a bunch of parts from the box, and say, “You can use these to build a means of human transportation”, then you can quite easily tell me whether or not that is possible, by simply trying to build it³³ All along here, we are assuming that there is a specific way to build everything and that this way is universally known..

Since this problem is pretty clearly a generalisation of the previous one, note also that if we had an efficient method to solve this problem, we could use that method to solve also the bicycle problem by simply restricting our definition of “vehicle” to only cover bicycles.

Relative Generality

It is important to know that problems in the complexity classes are not equal in their generality. Even within classes, there are gradations of generality. Some problems in P are specialisations of other problems in P, which in turn may be specialisations of problems in NP.

Similarly, some problems in NP are generalisations of other problems in NP.

NP-Complete

There exists a set of problems which are regarded as the most general problems in NP. This set is called NP-Complete. The NP-Complete problems are in fact so general that all problems in NP are special cases of the problems in NP-Complete.

I’ll repeat that because it’s so important.

If you take any problem in NP, you can trivially convert it to any problem in NP-Complete. Since NP-Complete is a part of NP, that means you can also take any problem in NP-Complete and trivially convert it to any other problem in NP-Complete.

So if you have an algorithm that solves any NP-Complete problem, you can – with just a trivial adaptor on that algorithm – make it solve any other problem in NP.

More importantly, if you have a fast algorithm that solves any NP-Complete problem, you automatically have a fast solver for any NP problem.

So in summary, when you see the word “NP-Complete”, mentally replace it with “The most general problems in NP, to which all other problems in NP can be trivially converted.”

NP-Hard

But it doesn’t end there. The class NP-Hard is the set of problems that are at least as general as the NP-Complete problems. This implies that the NP-Hard class contains only the most general problems in NP, and it does not contain the “easy” problems in P and NP.

However, the NP-Hard class also contains any other problems that are not in NP, but are even more general than NP-Complete. These problems may not even be verifiable in polynomial time.

It follows as a logical consequence of this that if you had a fast solution to any NP-Hard problem, then you automatically have a fast solution for an NP-Complete problem.

But since you have a fast solution for an NP-Complete problem, then you have a fast solution for all NP-Complete problems. And since you have a fast solution for all NP-Complete problems, then you have a fast solution for all NP problems.

So that would be nice. But it’s a little like saying “if I had a teleporter, I could send information faster than the speed of light.” There are two ways for that sentence to be true: either you have a teleporter and you can send information faster than the speed of light – or you simply don’t have a teleporter. I think most people would agree it’s more likely you simply don’t have a teleporter.

So, in summary, think of NP-Hard as “the most general problems in NP and all of their generalisations”.

Sometimes we define the set of NP problems which are not NP-Complete as NP-Easy because they’re easier than the most general problems in NP. With this definition, we can also think of NP-Hard as “all generalisations of the NP-Easy problems.”

Nomenclature

I won’t suggest renaming these classes. I’ll just say that if I were the one to pick the names from start, I would probably have used names similar to these:

P-Solve: Solvable in polynomial time
P-Verify: Verifiable, but not necessarily solvable, in polynomial time
Universal-P-Verify: The set of universal P-Verify problems to which all other P-Verify problems can be converted.
Universal-General-Verify: The set of problems which are generalisations of the Universal-P-Verify set, which are no longer in P-verify.

Sure, those names are a little longer, but at least to me, they convey the actual relation between the classes better.

~~I think key to this is that the name “NP” is very devoid of meaning. It simply means “not P” which doesn’t say a whole lot. Especially not since some some parts of NP are also P …~~

Edited 2018-04-23: That was embarrassing. NP does not stand for “non-polynomial”. As you can see in the following excerpt from an email I received, it stands for “non-deterministic polynomial”, and I should have known better.

I just finished reading your article on the complexity classes around P and NP and enjoyed it! I am still developing intuition for these things myself and it was a nice perspective on that. I have just one remark: If I recall correctly, NP technically stands for “non-deterministic polynomial” so your last paragraph might increase rather than decrease confusion. As you correctly point out, calling NP “not P” does not make sense since it contains P (but is often done anyways).

Reader email.

Very grateful for the correction! Intuitively, “non-deterministic polynomial” means that if we had a machine that could brute force all options simultaneously, it would complete in polynomial time. You may see why already – if we try all possible inputs at once, the only time actually taken is that of verifying the input – which we already said was polynomial for the NP class.