Two Wrongs

How Not To Collect Data

I’ve held onto this one for a while to avoid outing anyone in particular, but it’s too funny and sad not to share.

At a previous company I worked for, customer service was under pressure to prevent all customer cancellations. Didn’t matter the circumstances – if a customer decided not to renew their contract when it expired, customer service was bad.1 You can see how they were set up for failure!

Therefore, they created a software system in which the manager could monitor customer relationships in real time. Paraphrasing a manual introducing the system:

The customer contact evaluation by everyone should become an integral part of our daily work with customers. Whether it’s a phone call, a raised ticket, or an email, every contact with a customer must be recorded. Please use this tool with every contact and enter your experience there.

Asking customer service representatives to record every single interaction might be a bit excessive on the bureaucracy.2 If we want to improve customer relationships, wouldn’t it be better if they spent their time talking to customers and solving problems? This is typical of what anthropologist Marilyn Strathern calls audit cultures. We don’t want to micro-manage, but we also don’t trust people to self-monitor, so we impose bureaucracy that ensures mechanised self-monitoring processes are followed – and in the end, this bureaucracy becomes micro-managing. Making matters worse, people start to lean on the bureaucracy instead of self-monitoring, which lowers quality further.3 Audit Cultures; Strathern; Routledge; 2003.

Anyway, so how do we use this software? That’s easy. On every customer contact, we fill in and submit a short form. The first question on the form asks us to rate how the customer feels about the service they have received, along three axes. I don’t remember the names used for these axes, but they essentially boil down to

You’re probably going, “No, that wasn’t it.” But it was. I’m not making anything up beyond replacing words with synonyms.

Not only do we ask the customer service representative about how the customer feels – which they can only guess at – but also they have to rate this feeling along three scales that measure the exact same thing.4 You might say that a good customer service representative would know how the customer feels. Maybe. I’d like to see it tested before I believe it because it seems a lot like one of those things humans think they are good at when they are not.

I imagine when they made this software system, they just had the one scale, and someone went, “We need to collect more nuanced data than that. A single scale isn’t actionable.”

“Sure, what data do you have in mind?”

“I don’t know, just make it more!!”

Okay, so we rate the same thing three times. How do we actually record our rating? We click on cartoon faces. With no other explanation or clarification.

For each of the three satisfaction scales, there’s a row of five cartoon faces, ranging from red and angry to green and happy.

I don’t know if I really need to explain this to my readership, but that is bad. If you and I are on the same customer call, I might judge it as a lime-and-slightly-smiling face, while you judge the same interaction as an orange-with-slight-frown-face.

When using fuzzy symbols and words, humans can say the same thing but mean very different things. They can also mean the same thing but say very different things! Fuzzy things produce noisy data. I need to write more about operational definitions, because they are the antidote to this. One way to operationalise the cartoon faces would be to replace them with buttons corresponding to “probability of terminating the contract within the next 12 months”: labeled 90 %, 50 %, 20 %, 10 %, and 5 %.

That would be a really meaningful and immediately useful scale, even if people aren’t perfect judges of such things.5 We can either train them to improve their calibration or automatically adjust for overconfidence.
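To make the calibration idea concrete, here is a minimal sketch, assuming we log which probability button was clicked and later learn whether the customer actually terminated. The records below are invented for illustration and `calibration_table` is a hypothetical helper, not something the real system had.

```python
from collections import defaultdict

# Hypothetical records: (probability the representative clicked,
# whether the customer actually terminated within 12 months).
records = [
    (0.90, True), (0.90, True), (0.90, False), (0.90, True),
    (0.50, False), (0.50, True), (0.50, False), (0.50, False),
    (0.05, False), (0.05, False), (0.05, False), (0.05, False),
]

def calibration_table(records):
    """Map each stated probability to the observed termination rate."""
    counts = defaultdict(lambda: [0, 0])  # stated -> [terminations, total]
    for stated, terminated in records:
        counts[stated][0] += int(terminated)
        counts[stated][1] += 1
    return {stated: hits / total for stated, (hits, total) in counts.items()}

table = calibration_table(records)
for stated in sorted(table, reverse=True):
    print(f"stated {stated:.0%}  observed {table[stated]:.0%}")
```

If people who click 90 % turn out to be right only 75 % of the time, we can either show them this table as feedback or read every future 90 % as roughly 75 %. With this few records the observed rates would be far too noisy to act on; the point is only the shape of the comparison.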

Never mind, cartoon faces it is. Now that we have clicked on the cartoon faces, we can optionally fill in a free-form text field with more details, and then we click submit. If we have clicked on anything other than green-and-grinning-faces and try to submit, the system gives us a confirmation dialog:

Are you really sure you want to submit this data?

Well, not anymore, am I? If I’m going to be shamed for a less-than-stellar interaction, you can bet I’ll submit mostly stellar interactions from now on.6 Regardless of what actually happened, to be clear.

I don’t know many details about how this data was used, but I do know there was a dashboard somewhere with a matrix: one row for each customer, one column for each of the three scales, and in each cell a cartoon face averaged across all submitted cartoon faces.

I don’t … what is … how do you … an average over cartoon faces? I don’t claim to know much about survey design and mathematics, but surely cartoon faces are the mother of all ordinal scales and you cannot average ordinal scales because the relative distance between symbols is meaningless – ignoring all the noise introduced from the start by asking people to click on cartoon faces.
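A small sketch of why the averaged face misleads, assuming the five faces are coded 0 to 4 from worst to best. The face names and the two customers are made up; the only claim is that a mean hides what a distribution or a median shows.

```python
from collections import Counter
from statistics import mean, median_low

# Hypothetical labels for the five faces, ordered worst to best.
FACES = ["angry", "frowning", "neutral", "smiling", "grinning"]

customer_a = [2, 2, 2, 2]  # every contact rated neutral
customer_b = [0, 0, 4, 4]  # half angry, half grinning

def summarise(ratings):
    """Summaries that only rely on the ordering of the faces."""
    counts = Counter(ratings)
    return {
        "distribution": {FACES[i]: counts.get(i, 0) for i in range(len(FACES))},
        # median_low picks the lower of the two middle ratings on ties,
        # so the summary is always a face someone actually clicked.
        "median": FACES[median_low(ratings)],
    }

print(mean(customer_a), mean(customer_b))  # identical "average faces"
print(summarise(customer_a))
print(summarise(customer_b))
```

Both customers average out to the same neutral face, even though one of them has had two furious interactions. The count per face keeps that visible, and it never pretends that the step from angry to frowning is the same size as the step from smiling to grinning.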

There was actually a good part in that form, and it’s easy to miss: the free-form text field. If I could redesign the system, I would have kept just the text field and a single button to flag that you are concerned about how the customer perceived the interaction. Using that data is easy: statistically analyse the distribution of concern flags to see (a) which customers have it rough, and (b) if a customer’s demeanor has changed over time. Once you have a signal, study the free-form text fields to see if there are beneficial or costly patterns in service received that can be adjusted on a system-wide level.
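As a rough sketch of that analysis, assuming each contact is logged with a customer name, a month number, and the concern flag: the log below is invented, and `flag_rates` with its `split_month` parameter is a hypothetical helper. Comparing an early and a late period is the crudest possible way to look for change over time; a real analysis would want something like a control chart.

```python
from collections import defaultdict

# Hypothetical contact log: (customer, month index, concern flag set?)
contacts = [
    ("acme", 1, False), ("acme", 2, False), ("acme", 3, True), ("acme", 4, True),
    ("globex", 1, False), ("globex", 2, False), ("globex", 3, False), ("globex", 4, False),
    ("initech", 1, True), ("initech", 2, True), ("initech", 3, False), ("initech", 4, False),
]

def flag_rates(contacts, split_month):
    """Per customer: overall flag rate, and the rate before vs. from split_month."""
    by_customer = defaultdict(lambda: {"early": [], "late": []})
    for customer, month, flagged in contacts:
        period = "early" if month < split_month else "late"
        by_customer[customer][period].append(flagged)

    def rate(flags):
        # None when there were no contacts in the period at all.
        return sum(flags) / len(flags) if flags else None

    return {
        customer: {
            "overall": rate(p["early"] + p["late"]),
            "early": rate(p["early"]),
            "late": rate(p["late"]),
        }
        for customer, p in by_customer.items()
    }

rates = flag_rates(contacts, split_month=3)
for customer, r in rates.items():
    print(customer, r)
```

Here acme and initech have the same overall flag rate, but acme is getting worse and initech is getting better, which is exactly the kind of signal that tells you whose free-form notes to read first.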

If I could redesign the organisation, I would also have considered stripping out the middle layer of bureaucracy and just having customer support representatives talk with their manager and other people in the company about their concerns.