JTBD: Earn a new product manager superpower through clustering
Have you ever commissioned a customer needs survey only to discover everyone looks roughly the same, leaving you with nowhere to focus? This is surprisingly common, and I wanted to share an easy solution using a well-known clustering technique that can give you much more confidence you’re building the right thing.
In Competing Against Luck, Clayton Christensen gives two excellent insights:
- Myth of the average customer: The US Air Force in the 1940s blamed pilot error after 17 crashes in a single day. It turned out the flaw was the 1926 cockpit seat, designed around the ‘average pilot’, a pilot they subsequently realised didn’t actually exist, overturning a 100-year-old assumption.
- Measure the job, not the customer: He gives a beautiful analogy of how companies that analyse the customer rather than the job get measurement errors that are narrated away, much as for 18 centuries we narrated away the errors that came from assuming the planets moved around the Earth rather than the sun.
I’ve personally commissioned six large surveys in the last four years embracing a Jobs view of the world, but each time I’ve been disappointed. Common replies are ‘there’s clearly no problem to solve’ or ‘customers tend to answer around the middle when given many Likert scale questions’. I called up a few product friends in different companies and got similar views, with one at a multi-billion dollar US fintech telling me just yesterday:
“Quant research is just a tickbox for me now - we have to do it but it doesn’t add value so I go with my gut or use the customer interviews”
Clustering and the Correlation vs Causation mistake
JTBD pioneer Tony Ulwick has written a blog on this exact problem; he suggests clustering as a way to find hidden segments with common needs, but doesn’t detail how. I remembered learning this a few years back on a data science course but never had the time to practise.
Clustering is a way of [unsupervised] learning more about lots of data points by finding meaningful groupings that are too hidden to specify beforehand [unlabelled]. This is in contrast to supervised learning, where we know the groupings upfront and simply want to train a model to classify future data points, e.g. is this a cat or a dog (see a great explanation of both from those clever fellows at Google).
I’ve asked research teams in banks and tech companies I’ve worked in to try clustering. One team didn’t know how, another didn’t have the right resource free (a data scientist, a precious commodity), and another had segments already defined from a previous exercise so mapped the responses to them (definitely more useful, but still felt imprecise). Instead we tend to get correlation-based pivots like ‘Females in their mid 40s, SEC AB, have these needs in common’. The fact someone is female and 40 usually doesn’t give me much to go on in terms of causal understanding of why they would hire our solution, yet surprisingly that’s how many people build and market products.
Then last week, whilst I was on paternity leave, the founders of gini (a fintech I advise) asked for help applying JTBD to decide what solution to build for a cash flow forecasting product, and guess what…
Finally, some time (between diaper changes) to try this out myself. 48 hours later I was able to present three clusters of potential customers, each with very distinct needs, along with associated correlation-based insights that were fascinating and gave their product design and marketing teams the rich story they were looking for to build and market their new product with high confidence.
4 simple steps to insight
Here’s how to do it in four easy steps using an unsupervised machine learning algorithm called k-means in Python (full code here, and please don’t be put off if you don’t know programming, all instructions are in the readme file):
- Organise raw survey data
- Find the right number of clusters
- Cluster the respondents
- Analyse and present your results
1. Organise raw survey data
After many profile questions, we asked respondents to rank job outcome statements by importance to them. A typical question looks like this; note how the mean ranks are all too close to each other to be informative:
We exported the results into CSV format available from most tools (we used Pollfish). You want each row to be a respondent and each column to be the answer they chose.
Look at the responses and choose the ones that are needs-based. Ignore profile questions like age etc; as mentioned, they are likely correlations, not causal factors.
We decided to cluster around only the most important outcome, ignoring the others, so we gave a 1 for the top-rated outcome and a 0 for the rest (note: k-means needs numeric answers that are comparable, like a Likert scale, not categorical ones like Dog or Cat; there are ways to handle those but they’re beyond the scope of this blog).
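If you’d rather do that re-coding in pandas than in a spreadsheet, here’s a minimal sketch. It assumes the raw export has one column per outcome statement holding each respondent’s rank (rank 1 = most important); the file name and the outcome_ column prefix are made up for illustration:

import pandas as pd

# Load the raw survey export (hypothetical file name)
raw = pd.read_csv('raw_survey_export.csv')

# Hypothetical naming: every outcome statement column starts with 'outcome_'
outcome_cols = [c for c in raw.columns if c.startswith('outcome_')]

# 1 if the respondent ranked this outcome first, 0 otherwise
raw[outcome_cols] = (raw[outcome_cols] == 1).astype(int)

# Save the file we'll cluster on in the next step
raw.to_csv('jtbd_survey.csv', index=False)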
Below is how the data looked right before doing the clustering:
Now read this file into a variable called training, then pull out just the JTBD outcome answers (in my CSV it’s column 49 to 96):
import pandas as pd

training = pd.read_csv("jtbd_survey.csv")
# iloc's end index is exclusive, so this slice pulls out the outcome answer columns
jtbd_needs_only = training.iloc[:, 49:96]
2. Find the right number of clusters
While we want to find clusters, it’s not useful to have lots. So how many? Luckily k-means can tell you. Simply run it several times with different K’s and compare the distortions (the total distance between each respondent and the centre of the cluster they’re assigned to). Once the distortion stops decreasing significantly, you can stop. Let’s try 1 to 5 clusters to see which works best for our dataset:
from sklearn.cluster import KMeans

distortions = []
K_to_try = range(1, 6)

for i in K_to_try:
    model = KMeans(
        n_clusters=i,
        init='k-means++',
        random_state=1).fit(jtbd_needs_only)
    # inertia_ is the total squared distance of respondents to their nearest Centroid
    distortions.append(model.inertia_)
Once done, you can plot the distortions list as a chart and simply look for the ‘elbow’ in the line. Three feels right, as after that there is little further reduction.
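If you’d like to draw that chart yourself, here’s a minimal sketch using matplotlib (it assumes the K_to_try and distortions variables from the loop above):

import matplotlib.pyplot as plt

# Plot distortion against K and look for the 'elbow' where the curve flattens
plt.plot(list(K_to_try), distortions, marker='o')
plt.xlabel('Number of clusters (K)')
plt.ylabel('Distortion (inertia)')
plt.title('Finding the elbow')
plt.show()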
3. Cluster the respondents
The way k-means works is it plots every respondent as a dot in a geometric space. So if we had only two Likert scale questions and 500 people, you would have a two axis chart like below:
As our chosen K = 3, the algorithm picks three starting points to be Centroids (the blue dots above) and assigns every respondent to the Centroid they’re closest to. Each Centroid then recalculates the mean of its members’ positions and moves itself to that central position, and respondents are re-assigned to their nearest Centroid. This assign-and-move loop repeats until the Centroids stop moving. The result is scored by how tightly respondents sit around their Centroids (the same distortion we used above, where low is good). Because the starting points matter, the whole process is run several times from different starting positions and the run with the lowest score wins. That is our chosen clustering, where our respondents’ needs are most neatly related to each other.
Now we had 42 questions, so we actually ran this in not 2 but 42 dimensions (I wanted to try and draw that but worried I may end up in therapy).
model_k = KMeans(
    n_clusters=3,       # we put our chosen K here
    init='k-means++',   # ++ chooses more optimal starting Centroids
    random_state=1)

model_k.fit(jtbd_needs_only)
result = model_k.predict(jtbd_needs_only)
The result variable now contains a cluster label for every respondent, in the same order as the rows in our survey data. Let’s add it as a new ‘Cluster’ column on the original dataset and export to CSV:
clustered = pd.concat([training, pd.DataFrame(result, columns=['Cluster'])], axis=1)
clustered.to_csv('clustered_survey.csv', index=False)
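Before opening the file, a quick sanity check worth doing (a minimal sketch, assuming the clustered DataFrame from above) is to count how many respondents landed in each cluster:

# Very lopsided cluster sizes can be a hint that K or the encoding needs another look
print(clustered['Cluster'].value_counts())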
We can now inspect our new clustered_survey.csv to check the cluster column:
4. Get rich insights from the clusters
Now you can inspect each cluster and pull out what is different about it, i.e. where does it vary significantly from the other two?
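One simple way to start (just a sketch, assuming the clustered DataFrame and the outcome columns from earlier) is to compare, per cluster, the share of respondents who rated each outcome as most important, and then rank the outcomes by how much the clusters disagree:

# Because the answers are 1/0, the mean per cluster is the share of that cluster
# who picked the outcome as their most important
cluster_profiles = clustered.groupby('Cluster')[list(jtbd_needs_only.columns)].mean()

# Rank outcomes by the gap between the most and least interested cluster
spread = (cluster_profiles.max() - cluster_profiles.min()).sort_values(ascending=False)
print(spread.head(10))  # the 10 outcomes that most separate the clusters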
We found some fascinating insights using this method and the team are now busy selecting the top needs to tackle on their immediate roadmap for one cluster that stood out in particular.
Things we focused on for this analysis were:
- What important needs did a cluster have significantly more of vs others?
- What did a cluster pick significantly more as their least important need?
- What demographic and profile questions were significantly different?
- Did they run some jobs more or less frequently than the other two?
Whilst I can’t share the exact insights due to confidentiality, I can share a few high level teasers below to give you a flavour:
- One country showed a HUGE difference in perceived skill level & tooling
- A different cluster were calling out for very specific types of help
- One cluster with certain needs were in a very specific type of company at a very specific stage
Needless to say this survey has given the team so much more insight about what to build now vs the usual averaging / user segmentation approach.
Hope you find it useful, and feel free to ping me as usual on LinkedIn if you have any questions or need help. I’m now going back to all my old surveys and running this to see what we missed :)