Prediction Is Hard

After a decade of professional data science work, the only "fairly" certain prediction I'll make is that clients will often be disappointed.

Ben J. Clarke
December 17, 2024

Anyone working in a complex field understands a disheartening truth - their work is most effective when kept simple. An example from my own profession might be predicting how many doctors are needed in a given locality. Ideally, you'd do something "easy", like consider the number of people in the area and their general state of health, the number of births, deaths, and planned housing developments. Put those things together, and you can make a pretty good prediction. But not a long one.

This kind of simple prediction only allows you to look a few years into the future, and even then, your prediction will be hostage to sudden shifts in demographics, like when families fled English cities during the pandemic. But it's about as solid a piece of data work as any, and it will not be profitable.
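For the curious, the "easy" version might look something like the sketch below. It is an illustration only, not a method I'd hand to a client: every name and number in it is hypothetical, including the residents-per-doctor ratio and the assumption that uncertainty compounds at a fixed rate each year.

```python
import math


def project_doctors(population, births, deaths, new_homes, people_per_home,
                    residents_per_doctor, years, annual_error=0.02):
    """Roll a locality's population forward and estimate doctors needed.

    All parameters are illustrative assumptions. `annual_error` is a
    made-up rate at which the uncertainty band widens each year.
    """
    rows = []
    for year in range(1, years + 1):
        # Natural change plus people moving into planned housing.
        population += births - deaths + new_homes * people_per_home
        central = population / residents_per_doctor
        # Assumed: relative uncertainty compounds with the horizon.
        spread = (1 + annual_error) ** year - 1
        rows.append({
            "year": year,
            "population": population,
            "doctors_low": math.floor(central * (1 - spread)),
            "doctors": round(central),
            "doctors_high": math.ceil(central * (1 + spread)),
        })
    return rows


if __name__ == "__main__":
    for row in project_doctors(50_000, 600, 450, 100, 2.5, 2_000, years=5):
        print(row)
```

Even with these tame inputs, the gap between the low and high estimates grows every year, and nothing in the loop knows about a pandemic.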

If you arrive at a client meeting and say, "I can't predict very far into the future, and anything I do predict will be uncertain and, possibly, completely wrong," nobody will throw money at you. I can't blame them. I've spent most of my career working with the public sector, and the expectations placed on officials are way beyond reason. They're asked to make plans on practically (sometimes actually!) generational timescales, and they need data to back them up. This is silly for all but the largest things.

You could, for instance, do a pretty good job of predicting the population and demographics of France for the next fifty years. You could not do the same for Paris. Once you get down to something as detailed as a single city, even a globally important one, there are too many confounding factors. Detroit, for example, saw its decline accelerated not only by complex internal factors but also by Middle Eastern geopolitics. Data can't save you from the real world.

But complex predictions - the kind that earn money (lots of it) - are extremely valuable when they help reveal uncertainties. The best projects don't just present predictions and error margins and reputation-protecting disclaimers; they explain their assumptions and use them as jumping-off points to consider how those assumptions might be lacking and how they might fail. The best clients then start thinking about what they'll do in various failure scenarios, making themselves more likely to succeed when the world (seemingly inevitably) spins against them. So much of this is dependent on the client.

The worst data projects do the opposite - they cover everything with a layer of faux certainty. Your drunk uncle who "knows" which horse will win at the races is the worst kind of predictor. So is the influencer who "knows" that Google stock is going down or that some asset will outperform others. And so is anyone who "knows" (instead of "thinks") that AGI is, or isn't, on the horizon.

And the AI space has been full of people who "know" for the past two years. Nobody knows. Even the godfathers of AI - LeCun, Bengio, Hinton - disagree over where AI is heading and even where it currently is.

Yet frontier AI companies, which are now synonymous with Big Tech - because everyone with spare billions has their own AI operation - are cast by the extremely online media as either masters of the future (AI will lead to AGI soon) or fools chasing a ghost (AI can't even spell in the images it generates). The former camp has been buoyed by the rate at which AI models have scaled, which, to be fair, makes Moore's Law look quaint. The latter camp is emboldened by recent reports that these larger models are not performing as well as expected.

As ever, there is a far less vociferous but much larger group of people in the middle who admit to not "knowing" what our AI future is, but many of us hope it comes with useful things like increases in cancer detection rates. You may have your own opinion on this, but for me, knowing what AI is going to do next (rather than "thinking" - and I do have opinions) would be nice. Knowing if any symptoms I develop are cancerous is much more important.

France will be fine for the next fifty years, by the way. Probably.