Thursday, April 9, 2020

Scott Adams and COVID-19 Modeling

Since the crisis started I’ve been watching Scott Adams (Dilbert cartoonist) religiously.  He live streams on Periscope twice a day.  Most of his views are really great.

But today he made a general claim that models are built to be used for persuasion, not to be accurate. I believe he has an MBA and did some modeling for businesses he worked for.  He states that he never claimed his models, and their predictions, were accurate, just useful for his bosses to make decisions.

I will concede that there may be ‘modelers’ who do not expect accuracy.  I will also concede that the output of models in certain domains can be used for persuasion.  In particular, I believe the President’s Coronavirus Task Force used the COVID-19 mortality predictions to persuade US citizens to modify their behavior.

I will also concede that modelers expect early versions of their models to be inaccurate or less than valid in their modeling of reality.

However, I expected every model I built to be accurate, or at least to have a reasonably reliable margin of error.  I also believe every modeler in the physical sciences tries to make their models accurate, and in many cases expects them to be.  I cannot say what modelers expect from their models in the social sciences.  I’m also not sure what doctors and medical researchers expect from their models.

The point is, most modelers create their models (or simulations) either to understand how something works or to predict future behavior of the reality being modeled.  They all want their models to be accurate.

Now the degree of accuracy, or the margin of error, is going to vary and depend on a lot of things.  Let’s take a look at how a model is built.

First you start with some understanding of how your relevant portion of the world (what you are modeling) works.  You might only have a theory.  But you identify all of the variables or factors that you believe have significant effects on the model.  Usually, you are also aware of some variables that you know are relevant but that probably have insignificant effects.

You put together equations that show the relations between the variables, inputs and outputs.  Sometimes you will need to simplify variables or relationships.  Say, you know a variable depends on another variable or factor, but you are not sure how.  So you ‘assume’ for that version of the model that that dependency does not exist.

Maybe you even leave out variables you suspect are significant so you can more easily create early versions of the model.

Next, you code up a simulation that lets a computer run the model over time, or over any other variable.  The scientist, engineer or modeler should have collected some data from the real world.  That is, given a set of inputs, what outputs are observed over time?

The modeler wants to validate that their model is accurate (or determine how far off from reality it is).  So they run their simulation with the real world input.  If the simulation produces the same output, one can claim the model is validated for the range of that tested input.
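Here is a minimal sketch of that validation step in Python.  Everything in it is an assumption made for illustration: the toy growth model, the "observed" numbers (which are made up, not real data), and the 5% tolerance.  It only shows the shape of the check: run the simulation with the real-world input, then measure how far the output lands from what was observed.

```python
def simulate(initial, growth_rate, days):
    """Toy model: each day the count grows by a fixed fraction."""
    values = [initial]
    for _ in range(days):
        values.append(values[-1] * (1 + growth_rate))
    return values


def max_relative_error(predicted, observed):
    """Largest relative gap between model output and observations."""
    return max(abs(p - o) / o for p, o in zip(predicted, observed))


# Made-up observations for illustration only (not real data).
observed = [100, 121, 144, 175, 207]

# Run the simulation with the same starting input as the observations.
predicted = simulate(initial=observed[0], growth_rate=0.20, days=len(observed) - 1)

error = max_relative_error(predicted, observed)
TOLERANCE = 0.05  # hypothetical acceptance threshold
validated = error <= TOLERANCE

print(f"max relative error: {error:.3f}, validated: {validated}")
```

Even when the check passes, the claim only holds for the range of input that was tested.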

More often than not, there is a difference between the expected (real world) output and what the model’s simulation produces.  In most cases, the modeler knows that either some of their assumptions are wrong, their relationships are not correct (bad theory) or they have not included all of the relevant variables.

So they modify their model and simulation and try again.  And again, and again... (:-)
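That try-again loop can be sketched as a simple calibration.  This is my own toy illustration, not how any particular modeling team works: the growth model, the made-up observations and the list of candidate rates are all hypothetical, and here only one parameter gets adjusted.  In practice the modeler may be changing assumptions, equations or variables, not just a number.

```python
def simulate(initial, growth_rate, days):
    """Toy model: each day the count grows by a fixed fraction."""
    values = [initial]
    for _ in range(days):
        values.append(values[-1] * (1 + growth_rate))
    return values


def squared_error(predicted, observed):
    """Sum of squared differences between model output and observations."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def calibrate(observed, candidate_rates):
    """Try each candidate rate and keep the one that best matches the data."""
    best_rate, best_error = None, float("inf")
    for rate in candidate_rates:
        predicted = simulate(observed[0], rate, len(observed) - 1)
        err = squared_error(predicted, observed)
        if err < best_error:
            best_rate, best_error = rate, err
    return best_rate, best_error


# Made-up observations for illustration only (not real data).
observed = [100, 121, 144, 175, 207]

# Hypothetical candidates: daily growth of 10% through 30% in 1% steps.
candidates = [r / 100 for r in range(10, 31)]

best_rate, best_error = calibrate(observed, candidates)
print(f"best rate: {best_rate:.2f}, squared error: {best_error:.2f}")
```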

Now shift to models of a spreading, new virus.  You’ve got billions of people that behave differently, and whose immune systems respond differently to the virus.  You’ve got data taken from countries and regions where you don’t understand those differences.  The data is unreliable due to incomplete testing, different doctors’ guidance to patients, imperfect logging, and even government behaviors that hide real data for political purposes.  And to top it all, you’ve only got an early theory about how the virus works in the body and how it spreads.

No modeler is going to expect a model in that situation to be accurate.  The modelers are going to try to build in error bars to reflect the uncertainties in their model.  We saw big error bars on TV.
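To see why the error bars get so big, here is a rough sketch.  It is a simple sensitivity sweep over one uncertain input, not a statistical confidence interval, and not the method used for the numbers shown on TV.  The growth model and the 10% to 30% range for the daily growth rate are assumptions I picked for illustration.

```python
def simulate(initial, growth_rate, days):
    """Toy model: each day the count grows by a fixed fraction."""
    values = [initial]
    for _ in range(days):
        values.append(values[-1] * (1 + growth_rate))
    return values


def prediction_range(initial, low_rate, high_rate, days):
    """Final-day prediction at the low and high ends of the uncertain rate."""
    low = simulate(initial, low_rate, days)[-1]
    high = simulate(initial, high_rate, days)[-1]
    return low, high


# Hypothetical uncertainty: daily growth somewhere between 10% and 30%.
low, high = prediction_range(initial=100, low_rate=0.10, high_rate=0.30, days=30)
spread = high / low  # how many times larger the high end is than the low end

print(f"30-day prediction: {low:,.0f} to {high:,.0f} ({spread:.0f}x spread)")
```

A modest uncertainty in one input compounds over 30 days into a range more than a hundred times wide, and a real virus model has many uncertain inputs, not one.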

And of course as new data comes in, they are going to update their models in an attempt to increase accuracy.  But which part(s) of the model are most in error?  Is it how the virus spreads, the effectiveness of social distancing, or how the virus attacks the body?  They are going to do iterations of the model until their output is as close to observed reality as possible.  In this case, the mortality rate is probably what they are trying to hit first.  But we see that the bed, ICU and ventilator predictions are still off.

My point is, no real modeler was going to trust the 100,000 to 240,000 mortality prediction.  You might expect the error bars to capture reality, but with such big error margins, I doubt anyone had much confidence in the model.

To sum up, every modeler hopes for, and works toward, accuracy.  They usually know when their models are not accurate.  Did you see anybody interview the modelers and ask them?  Of course not.  None of them were going to go on national TV and say what they know about their models.

One final thought.  Do you think predicting virus propagation and top-level medical results (mortality, ICU, etc.) over 30 days is a tougher job than predicting the earth’s environment over 100 years?  There are a lot of gullible people out there.  Are you one?
