How we test accuracy and performance
Our AI predicts customer behavior with greater than 90% accuracy. How do we know? We test and measure the performance of our models regularly, in a variety of ways.
Model training test accuracy - 94%
A statistical measure, performed after any model training update.
In this test, we feed our models an incomplete set of data, withhold the actual outcomes, and then compare the models’ predictions to those withheld outcomes.
Machine learning statisticians consider this test the gold standard. We understand the desire to review how the models perform against your own company’s customer data, and we can accommodate it. However, correlational accuracy comes with important caveats, as you’ll see below.
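To make the idea concrete, here is a minimal sketch of scoring predictions against withheld outcomes. The function name `holdout_accuracy` and the "churn"/"active" labels are illustrative assumptions, not involve.ai's actual pipeline.

```python
def holdout_accuracy(predicted, withheld):
    """Share of predictions that match the outcomes withheld from the model.

    Hypothetical helper for illustration; the labels can be any comparable values.
    """
    if len(predicted) != len(withheld):
        raise ValueError("predicted and withheld must be the same length")
    if not withheld:
        raise ValueError("at least one withheld outcome is required")
    matches = sum(1 for p, actual in zip(predicted, withheld) if p == actual)
    return matches / len(withheld)


# Toy data: four accounts whose real outcomes were hidden from the model.
model_output = ["churn", "active", "active", "churn"]
real_outcome = ["churn", "active", "churn", "churn"]
print(holdout_accuracy(model_output, real_outcome))
```

In this toy run the model matches three of the four withheld outcomes.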
Churn and Active accuracy
A correlational measure, performed weekly for each involve.ai customer.
This is the type of comparative measure you might perform yourself. We compare what the model thought would happen to the actual customer outcome, as logged in your system of record.
In the involve.ai dashboard, a health score of 44% or below indicates churn risk. Each week, we count how many accounts were assigned a score of 44% or below and, of those, how many churned. This is a useful correlational measure, but it does not reflect the model’s accuracy.
Why? Imagine you have twenty accounts, and involve.ai assigns five of them health scores of 44% or below because of low KPIs. If two of those at-risk accounts churn in a week (40% of the flagged accounts) and the other three do not, this doesn’t tell us that the model is broken: those customers may simply not have churned yet.
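The weekly comparison can be sketched as follows, using the twenty-account example. The field names `health_score` and `churned` and the function `churn_hit_rate` are assumptions made for illustration; only the 44 threshold comes from the description above.

```python
CHURN_RISK_THRESHOLD = 44  # a health score of 44 or below indicates churn risk


def churn_hit_rate(accounts):
    """Of the accounts flagged as churn risks, return the share that churned.

    Returns None when no account was flagged, since the rate is undefined.
    """
    flagged = [a for a in accounts if a["health_score"] <= CHURN_RISK_THRESHOLD]
    if not flagged:
        return None
    churned = sum(1 for a in flagged if a["churned"])
    return churned / len(flagged)


# Twenty accounts: five flagged (two of which churned), fifteen healthy.
accounts = (
    [{"health_score": 30, "churned": True}] * 2
    + [{"health_score": 40, "churned": False}] * 3
    + [{"health_score": 80, "churned": False}] * 15
)
print(churn_hit_rate(accounts))
```

A low rate in any single week is expected, because the three flagged accounts that did not churn this week may still churn later.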
ROI and upsell accuracy
A correlational measure, performed monthly or twice-monthly for each involve.ai customer.
In this version, we test the model for the prior fourteen or thirty days (depending on whether it’s a twice-monthly or monthly run) to determine:
Overall accuracy - The number of upsell or downsell instances we predicted correctly, compared to the total number of upsell and downsell instances
Upsell accuracy - Total predicted upsell value, compared to the total actual upsell value
Total predicted upsell value - Total dollar amount that clients upsold where the health score accurately predicted the upsell (that is, the score was higher than the involve.ai health score threshold of 75%)
Examples: If Client A upsold by $10,000 in the measurement window, and our model showed a health score of 88 prior to the upsell, we would count this as a correct prediction, because 75 is the threshold above which a health score indicates potential for upsell.
Similarly, if Client B downsold by $2,000 during the measurement window, and prior to downselling our model predicted a health score of 33, we would count this as a correct prediction, because 44 is the threshold at or below which a health score indicates a likelihood of churn.
On the other hand, if Client C upsold by $1,000 during the measurement window, and prior to upselling our model scored the client at 67, we would count this as an incorrect prediction, because the health score was not above the threshold of 75.
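The three examples can be worked through in a short sketch. The `RevenueEvent` class and the function names are hypothetical, and how a score sitting exactly on a threshold is treated is an assumption here: 75 itself is not counted as predicting upsell, while 44 itself is counted as predicting churn risk.

```python
from dataclasses import dataclass

UPSELL_THRESHOLD = 75      # scores above this indicate potential for upsell
CHURN_RISK_THRESHOLD = 44  # scores at or below this indicate churn risk


@dataclass
class RevenueEvent:
    client: str
    health_score: float  # score recorded before the revenue change
    change: float        # dollars: positive for upsell, negative for downsell


def predicted_correctly(event):
    """True when the prior health score pointed in the direction of the change."""
    if event.change > 0:
        return event.health_score > UPSELL_THRESHOLD  # assumption: 75 excluded
    return event.health_score <= CHURN_RISK_THRESHOLD  # assumption: 44 included


def overall_accuracy(events):
    """Correctly predicted instances divided by all upsell and downsell instances."""
    return sum(1 for e in events if predicted_correctly(e)) / len(events)


def upsell_accuracy(events):
    """Correctly predicted upsell dollars divided by all actual upsell dollars."""
    upsells = [e for e in events if e.change > 0]
    actual = sum(e.change for e in upsells)
    predicted = sum(e.change for e in upsells if predicted_correctly(e))
    return predicted / actual


events = [
    RevenueEvent("Client A", 88, 10_000),  # correct: upsold with a score above 75
    RevenueEvent("Client B", 33, -2_000),  # correct: downsold with a score of 44 or below
    RevenueEvent("Client C", 67, 1_000),   # incorrect: upsold with a score below 75
]
print(round(overall_accuracy(events), 3), round(upsell_accuracy(events), 3))
```

With these three events, two of three instances are predicted correctly, and $10,000 of the $11,000 in actual upsell value is counted as predicted. Both functions assume the window contains at least one event (and at least one upsell for `upsell_accuracy`).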