Will Covid Break Data Science?
Data Science / Predictive Analytics / Artificial Intelligence Is Largely About Extrapolation
At its heart, predictive analytics is about making educated guesses about the future based on what you know about the past. The field therefore assumes that the future will work the way the past did. But what if the future doesn’t look anything like the past? You might try to adjust your formulae using analogs, but some things are so different today that the results will be imperfect at best. For example, is our current recession analogous to the 2008 recession? Both are recessions, but in 2008 people weren’t largely homebound. So, is it really analogous?
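To make the extrapolation problem concrete, here is a minimal sketch in Python. All of the numbers are made up for illustration: a hypothetical monthly demand series grows steadily for a year, a straight line is fitted to it, and the line is then extended into three months that follow an abrupt shock. The helper name `fit_line` is also just an assumption for this example.

```python
# Hypothetical monthly demand: steady growth for 12 months (the "past").
history = [100 + 2 * t for t in range(12)]
# Made-up actuals for months 12-14, after an abrupt shock.
actual_future = [40, 42, 45]

def fit_line(ys):
    """Ordinary least-squares fit of y = a + b*t for t = 0..len(ys)-1."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    b = sxy / sxx
    a = y_mean - b * t_mean
    return a, b

a, b = fit_line(history)

# Extrapolate the past trend into the next three months.
forecast = [a + b * t for t in range(12, 15)]

# Forecast minus actual: large positive values mean the model badly overshoots.
errors = [f - y for f, y in zip(forecast, actual_future)]
print(forecast, errors)
```

The fit itself is perfect on the historical data, which is exactly the point: a model can describe the past flawlessly and still miss badly once the world stops behaving like the past.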
In a funny way, today’s situation is like that of the Not Hotdog app on the show Silicon Valley. There, the app worked when it saw something it recognized, a hot dog, but it didn’t know what to do with anything else.
Today we’re in a unique situation and predictive analytics isn’t great with unique. For now, smart companies will use their analytics as baselines but, at the same time, will adjust policies and structures to react quickly should the underlying assumptions prove incorrect.
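One way to treat a forecast as a baseline while staying ready to react is to watch its recent errors and raise a flag when they move far outside what was normal historically. The sketch below is only an illustration under stated assumptions: the function name `baseline_has_drifted`, the threshold `k=3.0`, and all of the error values are hypothetical, and a real monitoring rule would need to be tuned to the business.

```python
from statistics import mean, stdev

def baseline_has_drifted(past_errors, recent_errors, k=3.0):
    """Return True if the recent mean forecast error sits more than
    k historical standard deviations away from the historical mean error."""
    center = mean(past_errors)
    spread = stdev(past_errors)
    return abs(mean(recent_errors) - center) > k * spread

# Hypothetical forecast errors (forecast minus actual) from normal times.
past_errors = [1.5, -2.0, 0.5, -1.0, 2.0, -0.5]

# Hypothetical recent errors: one calm period, one after a shock.
calm_errors = [1.0, -0.5, 0.8]
shock_errors = [-60.0, -58.0, -61.0]

print(baseline_has_drifted(past_errors, calm_errors))   # calm period
print(baseline_has_drifted(past_errors, shock_errors))  # shock period
```

A flag like this doesn't fix the model. It only tells you, quickly, that the assumptions behind the baseline may no longer hold and that a human should take a look.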
Not All Data Science Applications Are Affected
It’s important to note that this issue doesn’t affect all applications. The simplest examples are things not built on economic or healthcare data. Image recognition, for example, still works. (Let me know if you’ve had trouble unlocking your iPhone with Face ID and I’ll modify this section accordingly.)
We Need to Expand Our Definition of Dirty Data
One more point is important here. We teach that if your data is bad or dirty, your results will be wrong (it’s still a GIGO world, right?). Given what we’re seeing today, perhaps we need to expand our concept of dirty data. Perhaps dirty should mean not just incorrect but also not representative of the future.
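As a rough illustration of what "not representative" could mean in practice, the Python sketch below measures what share of recent observations falls outside the range the model saw during training. This is a deliberately crude check, not an established standard, and the function name `share_outside_training_range` and the weekly store-visit figures are hypothetical.

```python
def share_outside_training_range(training, recent):
    """Fraction of recent values that fall below the training minimum
    or above the training maximum."""
    lo, hi = min(training), max(training)
    outside = [x for x in recent if x < lo or x > hi]
    return len(outside) / len(recent)

# Hypothetical weekly store visits used to train a model.
training_visits = [980, 1010, 995, 1023, 1002, 990]

# Hypothetical recent weeks, most of them far below anything seen before.
recent_visits = [310, 295, 1001, 280]

share = share_outside_training_range(training_visits, recent_visits)
print(share)
```

Every value in both lists could be perfectly accurate, yet a high share here suggests the training data is "dirty" in the expanded sense: correct, but describing a world the model no longer lives in.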
What do you think? Have your models been thrown off by covid? How did you react? Do you have examples you’d like to share? I’d love to see them! Leave your comments on this post or send me an email at Benjamin.Taub@Dataspace.com.
Thanks for reading!
Leave a Reply
Want to join the discussion? Feel free to contribute!
I said it from the pandemic get-go: Trend-based forecasting models have been instantly broken. There’s an imperative for driver-based forecasting using leading indicators (especially those from external data sources). Cheers, Doug
Great point, Doug! Would you call driver-based forecasting data science, though? Isn’t driver-based forecasting more about building models from the ground up?
It reminds me of the difference between technical and fundamental stock analysis. Technical relies on statistics and trends. Fundamental is more about looking at the basics of the business and building a forecast of how that company will perform.