Two Wrongs

Relevance Under Uncertainty

A few months ago, when I still worked with search and relevance in e-commerce, I was given the opportunity to contribute an article to The Informer, the newsletter of the British Computer Society's Information Retrieval Specialist Group (BCS IRSG).

If you want to read it in full, you can go directly to Relevance under uncertainty – the commercial realities of information retrieval development.

If you just want a brief summary, here you go.

What is relevance, and what is Loop54?

Loop54 runs under the hood of many e-commerce stores, where it receives information about how users browse the site, and takes that into account to figure out what the user is looking for. Put differently,

[Loop54] attempts to perform the function a really good salesperson would if you step into a brick-and-mortar store: figure out as quickly as possible exactly what you are interested in and guide you directly to that.

Critically, it does this without being creepy! Users don’t even notice that something cool is happening behind the scenes. They just go to an online store, search for what they want, and it pops up as one of the first hits, even when any other search solution would bury the correct result far down the fourth page.

Loop54 was initially created as a search engine, answering “How likely is this product to be relevant given this search phrase?” Over the past ten years it has evolved to support more types of relevance. On a high level, every user-facing feature is a question of conditional probability:

How likely is this product to be relevant, given that the visitor …

  • … looks for alternatives to an electric kettle?
  • … looks for things that go well together with a guitar?
  • … looked at a smartphone minutes ago and is now searching for “Samsung”?
  • … has a history of purchasing blue clothing, and is now looking in the category of dresses?

This is something a lot of people don’t seem to grasp intuitively. How the user chooses to browse an online store tells you a lot about what they are looking for. To get great results, you have to go well beyond text matching.

I believe this type of question of conditional probability is at the heart of information retrieval. By looking at it this way, we avoid common traps like suggesting five pairs of headphones to a user who has just bought a pair of headphones, or five pairs of sneakers to a user because they looked at sneakers once two months ago.

This is common sense, yet so many providers get it wrong.
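To make the conditional-probability framing concrete, here is a minimal sketch of estimating “how likely is this product to be relevant, given this context” from raw counts. It is not how Loop54 does it; the event log, the context strings, the product names and the smoothing constant are all made up for illustration.

```python
# Hypothetical interaction log: (context, product, purchased).
# The context strings and product names are invented for this sketch.
EVENTS = [
    ("viewed:smartphone|search:samsung", "samsung-phone", True),
    ("viewed:smartphone|search:samsung", "samsung-phone", True),
    ("viewed:smartphone|search:samsung", "samsung-phone", False),
    ("viewed:smartphone|search:samsung", "samsung-tv", False),
    ("viewed:tv|search:samsung", "samsung-tv", True),
]


def relevance_probability(events, context, product, alpha=1.0):
    """Estimate P(purchase | context, product) from raw counts.

    alpha is an additive-smoothing constant (an assumption of this
    sketch), so a pair that was never observed gets 0.5 rather than
    a division by zero.
    """
    shown = 0
    bought = 0
    for ctx, prod, purchased in events:
        if ctx == context and prod == product:
            shown += 1
            bought += int(purchased)
    return (bought + alpha) / (shown + 2 * alpha)


phone_ctx = "viewed:smartphone|search:samsung"
p_phone = relevance_probability(EVENTS, phone_ctx, "samsung-phone")
p_tv = relevance_probability(EVENTS, phone_ctx, "samsung-tv")
```

The same search phrase ranks the phone above the TV only because the context says the visitor was just looking at a smartphone. Text matching alone cannot tell the two apart.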

E-commerce is data-limited

[…] e-commerce is both blessed and cursed. Blessed, because the end user provides the software with a very strong signal of relevance: the purchase.

I don’t know how many people even at Loop54 understood how insanely rich of a signal this is. It’s the gold standard for psychological experiments: want to know what people really feel? Ask them to put their money where their mouth is. And users at online stores volunteer this information because it’s in their own interest!

On the other hand cursed, because purchase data for most small-to-mid size e-commerce businesses is very limited; there is not enough purchase information on individual products to determine their relevance in the wide variety of contexts in which they are potentially relevant. New products take a while to gather purchase data (“cold start problem”), and in some verticals (particularly fashion and technology), new products make up a significant chunk of the active product catalogue.

And this is probably why people underestimate the quality of the signal: we get so little of it that it appears useless, despite how good it could be. Online stores have this super high-quality data source to determine relevance, but it’s way too sparse to supply enough information to the dreaded long tail of queries.

To get around this problem, Loop54 is based on the hypothesis that similar products can be treated as one unit when it comes to relevance. Loop54 very rarely deals with individual products, instead operating on clusters of similar products, structured hierarchically based on domain-specific similarity measures.

This is, of course, the secret sauce and why I can’t be very specific about what it really means to operate on “clusters of similar products, structured hierarchically based on domain-specific similarity measures”. But you get the point. Instead of looking at a long tail, look at related chunks of the tail and it starts to fatten up a little.
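As a rough illustration of what “fattening up the tail” could mean, the sketch below pools purchases per cluster instead of per product. The cluster assignment, the purchase log and the scoring function are hypothetical stand-ins, not Loop54’s actual similarity measures or hierarchy.

```python
# Hypothetical product -> cluster assignment (flat, for simplicity).
CLUSTER_OF = {
    "kettle-red": "kettles",
    "kettle-blue": "kettles",
    "kettle-new": "kettles",  # just added to the catalogue, no purchases yet
    "toaster-a": "toasters",
}

# Hypothetical purchase log: (search phrase, purchased product).
PURCHASES = [
    ("kettle", "kettle-red"),
    ("kettle", "kettle-blue"),
    ("kettle", "kettle-red"),
    ("kettle", "toaster-a"),
]


def product_count(purchases, query, product):
    """Number of purchases of exactly this product for this query."""
    return sum(1 for q, p in purchases if q == query and p == product)


def cluster_share(purchases, cluster_of, query, product):
    """Share of purchases for `query` that landed in `product`'s cluster."""
    target = cluster_of[product]
    total = 0
    in_cluster = 0
    for q, p in purchases:
        if q != query:
            continue
        total += 1
        if cluster_of[p] == target:
            in_cluster += 1
    return in_cluster / total if total else 0.0


new_product_count = product_count(PURCHASES, "kettle", "kettle-new")
new_product_share = cluster_share(PURCHASES, CLUSTER_OF, "kettle", "kettle-new")
```

The brand new kettle has no purchases of its own, so product-level data says nothing about it. Its cluster, on the other hand, accounts for three out of four purchases for that query, which is a usable signal on day one.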

To give an example, […] If a few visitors of another shop have bought steak thermometers together with barbeque grills, then maybe it is relevant to suggest steak thermometers as complements to barbeque grills more generally, even those pairs of products for which there is not yet any individual purchase data. (Complementary products like these are especially difficult for traditional techniques, because if the data for n products is sparse, you can guess what the data for n² pairs of products is like.)

The essential problem is combinatorial complexity (if you want truly good relevance, you need to account for everything that happened up to the point where X is a potentially relevant product for the user), and pairs of products that go well together are a very simple – yet difficult – example of this.
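The pair problem can be sketched in a few lines. Below, co-purchases are counted first per pair of products and then per pair of clusters. The baskets and the cluster assignment are invented for the example, and real clustering is of course far less tidy than a lookup table.

```python
from collections import Counter
from itertools import combinations

# Hypothetical product -> cluster assignment.
CLUSTER_OF = {
    "grill-small": "grills",
    "grill-large": "grills",
    "thermo-basic": "thermometers",
    "thermo-pro": "thermometers",
}

# Hypothetical baskets of products bought together.
BASKETS = [
    {"grill-small", "thermo-basic"},
    {"grill-small", "thermo-basic"},
    {"grill-large"},
]


def pair_counts(baskets, key=lambda product: product):
    """Count unordered co-purchase pairs, after mapping products through `key`."""
    counts = Counter()
    for basket in baskets:
        # Map each product to its key and drop duplicates, so two
        # products from the same cluster do not form a pair.
        keys = sorted({key(product) for product in basket})
        for a, b in combinations(keys, 2):
            counts[(a, b)] += 1
    return counts


product_pairs = pair_counts(BASKETS)
cluster_pairs = pair_counts(BASKETS, key=CLUSTER_OF.get)

# A pair nobody has bought together yet.
unseen_pair = product_pairs[("grill-large", "thermo-pro")]
# The same pair, seen through its clusters.
pooled_pair = cluster_pairs[("grills", "thermometers")]
```

Four products already give six possible unordered pairs, and only one of them has any data. At the cluster level there is a single pair, and it has all the data, so the large grill and the pro thermometer get a complement signal without ever having been bought together.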

Software engineering to optimise for innovation

The rest of the article deals more with the culture and practices used at Loop54 to enable a high rate of innovation. For example, there’s a focus on fast prototyping, because it’s hard to know when you get relevance right.

Since we don’t have concrete proof that the things we do are right or wrong, we need to remain flexible in the face of diffuse evidence over a long period of time. This means it is critical to make it easy to experiment, because cheap experiments lead to flexibility in technical direction.

One of the things that accommodates this is the design of the software itself.

Loop54 has from the start had a modular architecture, with a configuration system that is capable of rewiring most of the program logic without a single line of code. […]
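The article does not describe the configuration system in detail, so the following is only a generic sketch of the idea: small, well-tested building blocks written by programmers, wired together by plain data that a non-programmer can edit. The step names, the registry and the config format are all hypothetical and not taken from Loop54.

```python
# Hypothetical building blocks, written and reviewed by engineers.
def lowercase(items):
    return [item.lower() for item in items]


def dedupe(items):
    # dict.fromkeys keeps the first occurrence and preserves order.
    return list(dict.fromkeys(items))


def top_n(items, n=3):
    return items[:n]


# Hypothetical registry mapping config names to building blocks.
STEPS = {"lowercase": lowercase, "dedupe": dedupe, "top_n": top_n}


def build_pipeline(config):
    """Turn a list of {"step": name, "args": {...}} entries into a callable."""
    steps = [(STEPS[entry["step"]], entry.get("args", {})) for entry in config]

    def run(items):
        for step, args in steps:
            items = step(items, **args)
        return items

    return run


# The "rewiring" happens here, in data rather than in code.
config = [
    {"step": "lowercase"},
    {"step": "dedupe"},
    {"step": "top_n", "args": {"n": 2}},
]

pipeline = build_pipeline(config)
result = pipeline(["Kettle", "kettle", "Toaster", "Grill"])
```

Reordering, removing or re-parameterising steps only touches `config`, which is what makes a prototype cheap. The building blocks themselves stay untouched and can be held to a higher standard of quality.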

Most of the feature development is done by a small team of product specialists, i.e. experts on Loop54 configuration – not because they have any sort of privilege within the organisation, but because they are intimately familiar with the requirements of many customers, as well as what the competition looks like. When they have ideas, they can create prototypes for even relatively large features in the span of hours to a few days.

Of course, such prototypes are not well built, and eventually they will need to be integrated properly.

Once a prototype seems successful, the software engineering team helps build the feature into Loop54 in a way that optimises performance, future maintenance demands, and can be rolled out to every customer that benefits from it. This takes significantly longer – on the order of days to months, which is why it’s important to do only for ideas that appear successful.

I also write a little about keeping software engineers lightly loaded for quick response times, peer review, and maintaining high-quality code so that the code that does exist serves as reliable building blocks for non-programmers.

Then comes an interesting bit on simplicity, which I will just quote in full.

When people are introduced to the technical details of how Loop54 works, it is common for them to ask, “Why do you not do popular thing X?” or “Have you tried research idea Y?” This is a fair question, because at every level, Loop54 can seem technically primitive. There are three reasons for this apparent lack of sophistication:

  • Reliability. It is our experience that sticking with tried-and-true mechanisms where possible means there are fewer surprises when components are integrated with each other, and it is easier to troubleshoot the problems that do occur.
  • Simplicity. The unsophisticated is usually simpler and makes it easier to achieve high quality, with the benefits already discussed.
  • Differentiation. Many of the ideas that are published are those that our competition has a head start on, either because a competitor was the one to publish the result, or because a competitor has more resources to spend on achieving the desired result. A recurring theme when choosing which way to take the product has been to avoid replicating what our competition does, and instead find out how we can complement or improve on what they are doing.

Those were all the things I wanted to highlight. Feel free to read the entire piece if you’re curious about the rest!

Postscript

Note that Loop54 was acquired by FactFinder a couple of years ago. Some of the above qualities in terms of software engineering have changed under the new management – which contributed to my leaving the company. I still believe the product is the best of its kind out there.