Monday, April 6, 2020

Wuhan Virus - Propagation Equations

Being an engineer and scientist, I wondered if I could sit down and come up with some equations that might get me rough estimates of propagation parameters.  This would allow me to predict what might happen in the future, given what's happened in the past.

[Note:  I'm not doing this like a thesis.  It's more than likely I make all kinds of mistakes.  It's just a quick and dirty, back of the envelope thought experiment.  Please don't laugh!  If you read my conclusions at the bottom, I don't think you will laugh.]

So I got out my pen and paper and made an attempt.  I initially ended up with 4 variables and only 3 equations.  Normally, to solve for every variable in a set of equations, you need as many independent equations as unknowns.  If you want to see my conclusions, skip over most of the calculations...

Here's my simple 'model.'  You start with I infected people on day 1.  Obviously, I am ignoring what happened before my hypothetical day 1.  T is the total number of infected people on day N.  (If this were a thesis, I would be using subscripts and Greek letters.)  R is the number of people one infected person infects in one day.  That is, it's a kind of contagion multiplier, and it could very likely be less than 1.  C is the cumulative number of reported infections (cases) on day N.  D is the number of people who have died by day N.  'a' is the fraction of the total number of infected people who are reported as cases.  And 'b' is the fraction of the total number of infected people who have died by day N.

So, the propagation of the virus gives you:
     Eq. 1:  T = I (1 + R)^ N

The number of reported (testing confirmed) cases is simply
     Eq. 2:  C = a T

And the number of people that have died is
     Eq. 3: D = b T
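To make the model concrete, here's a minimal Python sketch of Eqs. 1 to 3.  The function name `model` and the sample inputs are made up for illustration.  Only the formulas come from the equations above.

```python
def model(I, R, N, a, b):
    """Return (T, C, D) on day N for the simple model above."""
    T = I * (1 + R) ** N   # Eq. 1: total infected on day N
    C = a * T              # Eq. 2: reported (confirmed) cases
    D = b * T              # Eq. 3: deaths by day N
    return T, C, D

# Hypothetical inputs: 100 infected on day 1, R = 0.1, 10 days,
# half of the infected reported, 1% of the infected dead.
T, C, D = model(I=100, R=0.1, N=10, a=0.5, b=0.01)
print(f"T = {T:,.0f}, C = {C:,.0f}, D = {D:,.0f}")
```

Because C and D are fixed fractions of T, the model says reported cases and deaths grow at exactly the same rate as the true number of infected.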

We know, or can pull from reports, I, C, and D for two data points: day 1 and day N.  But our unknowns are R, T, a, and b.

Now suppose we pick a third data point, somewhere in the middle at day N2, where we have a C2 and a D2.  This gives us

     Eq. 4:  T2 = I (1 + R)^N2

     Eq. 5:  C2 = a T2

     Eq. 6:  D2 = b T2

Now we have six equations and five unknowns:  R, T, a, b, and T2.  Having more equations than unknowns usually means either one equation is a duplicate (it adds no new information), or the equations are inconsistent and no exact solution satisfies all of them.  I'll ignore those problems here.

Two of the variables seem to have the most meaning:  R and a.  With a, you can tell how big the infected population is at any moment, given the total number of positive tests (T = C / a).

The real problem is that R, a, and b are not actually constants.  They vary over time and location.  The model just assumes that you can get a rough mean or average value for each of them.

Just think about R.  If people are shaking hands, it would be high.  If they stay 3 feet apart instead of 6, it would be higher.  If people ignore social distancing in a region, it would be higher in that case.  It obviously differs by region and who follows what practice.  In fact, it probably varies for each individual.

Then, too, you have to think about the increase in lockdowns.  In theory, R would go down over time.
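Here's one rough way to express a changing R as a Python sketch.  It assumes a separate, hypothetical R for each day, so Eq. 1's (1 + R)^N becomes a product of daily growth factors.  It also computes the single constant R that gives the same total growth (the geometric mean of the daily factors, minus 1), which is the kind of rough average the model relies on.  The declining lockdown schedule is invented.

```python
import math

def total_infected(I, daily_R):
    """Eq. 1 with a different R each day: T = I * (1 + R_1) * ... * (1 + R_N)."""
    return I * math.prod(1 + r for r in daily_R)

def equivalent_constant_R(daily_R):
    """Constant R giving the same total growth: geometric mean of (1 + R_k), minus 1."""
    return math.prod(1 + r for r in daily_R) ** (1 / len(daily_R)) - 1

# Hypothetical lockdown schedule: R starts at 0.35 and drops 0.01 per day for 30 days.
schedule = [0.35 - 0.01 * day for day in range(30)]

T = total_infected(100, schedule)
R_avg = equivalent_constant_R(schedule)
print(f"T = {T:,.0f}, equivalent constant R = {R_avg:.4f}")
```

With a constant R, `total_infected` reduces to Eq. 1 exactly.  `math.prod` needs Python 3.8 or later.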

Similarly, 'a' will vary depending on what proportion of the truly infected actually get tested.  If you don't do a lot of tests in an area, you will miss a lot of asymptomatic and less severe cases.

Let's take a look at some numbers and see what our equations can provide in the way of understanding. 

Assume day 1 was 30 days ago (N = 30).  The initial count of infected (I) was 100.  I'm assuming that the cases reported on day 1 included everyone infected, which is a really poor assumption.  Today, we see 300,000 cases and 9,000 deaths.  So the first three equations would be:

     T = 100 (1 + R)^30
     300,000 = a T
     9,000 = b T

Since that gives us 4 variables and only 3 equations, let's pick another data point. 

But I'm going to cheat a bit, since I'm not pulling actual data down from the Internet.  Let's say that a is hypothetically a constant of about 0.33 (one third).  That would mean that only a third of the infected are actually identified, tested, and test positive.  From Eq. 2, that would mean T = C / a = 900,000, and from Eq. 1, R = (T / I)^(1/N) - 1 = 0.3546.  Using those hypothetical values (this is NOT something you do in real life)...
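Here's that arithmetic as a short Python sketch.  It takes a = 1/3 exactly, since that is what T = 900,000 implies (a = 0.33 exactly would give about 909,000).  It then rearranges Eq. 2 for T and Eq. 1 for R.

```python
I = 100        # infected on day 1
N = 30         # days from day 1 to today
C = 300_000    # reported cases today
a = 1 / 3      # hypothetical reporting fraction

T = C / a                     # Eq. 2 rearranged: T = C / a
R = (T / I) ** (1 / N) - 1    # Eq. 1 rearranged: R = (T / I)^(1/N) - 1
print(f"T = {T:,.0f}, R = {R:.4f}")
```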

Say N2 is 15 (15 days ago, 15 days after day 1).  That would give a T2 of 9,486.  This lets us create, out of thin air (ha, ha!)...

     T2 = 100 (1 + R)^15
     3,130 = a T2   (0.33 × 9,486)
     100 = b T2   (I just made up the number 100)
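Here's a Python sketch showing how the made-up day-N2 point falls out of the first data point.  It assumes a = 1/3 when backing out R (matching T = 900,000) and uses the text's rounded a = 0.33 when computing C2.  D2 = 100 is the invented number from the text.  Because N2 = N / 2, T2 is just I times the square root of T / I.

```python
import math

I = 100            # infected on day 1
N, N2 = 30, 15     # today, and the made-up middle data point
T = 900_000        # total infected today (C / a with a = 1/3)

R = (T / I) ** (1 / N) - 1    # Eq. 1 solved for R, about 0.3546
T2 = I * (1 + R) ** N2        # Eq. 4: total infected on day N2
C2 = 0.33 * T2                # Eq. 5 with the text's rounded a = 0.33
D2 = 100                      # Eq. 6: the invented death count

# With N2 = N / 2, Eq. 4 reduces to T2 = I * sqrt(T / I).
assert math.isclose(T2, I * math.sqrt(T / I))
print(f"T2 = {T2:,.1f}, C2 = {C2:,.1f}, D2 = {D2}")
```

That gives T2 ≈ 9,487 and C2 ≈ 3,131, which the text rounds down to 9,486 and 3,130.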

Now we have 6 equations and 5 unknowns.  Let's try to solve for a and R.  That is, let's ignore the fact that we created the second data point by assuming the equations were correct with an 'a' of 0.33.  Using the above equations, we can substitute for T and T2 and get

     300,000 / a = 100 (1 + R)^30
     3,130 / a = 100 (1 + R)^15

Solving both for 'a' and setting the two expressions equal gets rid of 'a':

     300,000 / (100 (1 + R)^30) = 3,130 / (100 (1 + R)^15)

Multiplying both sides by 100 (1 + R)^30 and dividing by 3,130 reduces this to

     95.8466 = (1 + R)^15
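Here's the same reduction in Python, as a sketch.  Dividing the two case counts cancels both a and I, leaving C / C2 = (1 + R)^(N - N2), so R falls out directly.  The variable names mirror the symbols in the text.

```python
C, C2 = 300_000, 3_130   # reported cases on day N and on day N2
N, N2 = 30, 15           # day numbers of the two data points

ratio = C / C2                        # a and I cancel: C / C2 = (1 + R)^(N - N2)
R = ratio ** (1 / (N - N2)) - 1       # take the (N - N2)th root
print(f"ratio = {ratio:.4f}, R = {R:.4f}")
```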

Or R ≈ 0.355.  Surprise!  (Not so much, since we created the N2 data point.)  This doesn't seem unreasonable with social distancing.  You wouldn't expect one infected person to infect more than one other person per day.

Plugging R = 0.355 back into the last 'a' equation (3,130 / a = 100 (1 + R)^15) produces a ≈ 0.33.  Surprise again!

Now we can plug R into Eq. 1 (or a into Eq. 2) and get T of roughly 900,000 (ha ha).  This would mean that b ≈ 0.01, or 1% of the actual sick have died.  Remember, this is NOT real data.
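Here's the rest of the back-substitution as a Python sketch.  It uses only the 'observed' numbers: I, C, and D on day N, and C2 on day N2.  Substituting Eq. 4 into Eq. 5 gives a = C2 / (I (1 + R)^N2), which simplifies to a = C2² / (I C).  The sketch checks that the two forms agree.

```python
I = 100                    # infected on day 1
C, D = 300_000, 9_000      # cases and deaths on day N
C2 = 3_130                 # cases on day N2 (made up in the text)
N, N2 = 30, 15

R = (C / C2) ** (1 / (N - N2)) - 1   # from the ratio of the two case counts
a = C2 / (I * (1 + R) ** N2)         # Eqs. 4 and 5 combined
T = I * (1 + R) ** N                 # Eq. 1
b = D / T                            # Eq. 3 rearranged

assert abs(a - C2 ** 2 / (I * C)) < 1e-12   # closed form for a
print(f"R = {R:.4f}, a = {a:.3f}, T = {T:,.0f}, b = {b:.4f}")
```

This gives a ≈ 0.327, T ≈ 919,000, and b ≈ 0.0098, which round to the 0.33, 900,000, and 0.01 used in the text.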

We can see if this is consistent with Eqs. 4 to 6.  T2 = 9,486 sick on day 15.  Using 3,130 = a T2 gives a = 0.33, as we expected.  It also gives b = 100 / 9,486 ≈ 0.01.  So the number I pulled from the sky was pretty close.
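Here's a quick Python version of that consistency check, using the rounded values from the text (T = 900,000 and T2 = 9,486).  If the model and its constant-a, constant-b assumption held exactly, each fraction would come out the same at both data points.  With the invented D2 = 100, they only roughly agree.

```python
# Rounded, hypothetical values from the worked example.
points = {
    "day N":  {"T": 900_000, "C": 300_000, "D": 9_000},
    "day N2": {"T": 9_486,   "C": 3_130,   "D": 100},
}

fractions = {}
for day, p in points.items():
    a = p["C"] / p["T"]   # Eq. 2 / Eq. 5 rearranged
    b = p["D"] / p["T"]   # Eq. 3 / Eq. 6 rearranged
    fractions[day] = (a, b)
    print(f"{day}: a = {a:.3f}, b = {b:.4f}")
```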

AGAIN, these are NOT real numbers.  But think about their implications.  If you think of Eq. 1 as being reasonably correct, T, the total sick (or infected, anyway), is going to keep growing until everyone who is not totally isolated is infected.  Only if you reduce R to 0 does the number stop growing.
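Eq. 1 has no ceiling, so for any R above 0 it eventually passes any population.  Solving Eq. 1 for N gives the number of days until T reaches a population P:  N = ln(P / I) / ln(1 + R).  Here's a Python sketch of that calculation.  The population of 1,000,000 and the R values are hypothetical.  The model has no recovery or immunity, so this is only the 'everyone infected' timeline.

```python
import math

def days_to_reach(P, I, R):
    """Solve Eq. 1 for N: days until T = I * (1 + R)^N reaches P.

    Returns None when R <= 0, because T then never grows past I.
    """
    if R <= 0:
        return None
    return math.log(P / I) / math.log(1 + R)

P = 1_000_000   # hypothetical population of an area
I = 100         # infected on day 1
for R in (0.355, 0.1, 0.01, 0.0):
    days = days_to_reach(P, I, R)
    print(f"R = {R}: " + ("never" if days is None else f"about {days:.0f} days"))
```

Lowering R stretches the timeline out, but only R = 0 stops it.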

And social distancing does reduce R, but in any infected area, it won't reduce R to 0 since people still have to go out for food, medicine and other necessities. 

You can effectively reduce R to 0 for quarantined people or between an infected area and a non-infected area.  The latter requires a total travel ban into and out of an infected area, or a 100% effective quarantine process for people traveling in/out of the infected area.

If China is not lying, they quarantined houses, apartment complexes, cities and regions that were infected until the sub-quarantined areas 'burned out' (achieved 100% infection) and recovered or died.  Obviously, there may have been quarantined areas with no infections, such as an apartment house. With no virus left, they could somewhat safely remove the quarantines and allow people to move between previously infected and non-infected regions and areas.

The US's voluntary social distancing is NOT effective quarantining.  China effectively achieved an R = 0 by mandatory and physically enforced quarantining.  All the US is doing is reducing R to something above 0.

So how do you look at a large 50-state nation like the US?  You have to start with seeds (infected travelers) in varying locations at varying times.  Since we don't have mandatory quarantining or 100% testing, and some infected are asymptomatic, some 'seeds' in each area/region/state are going to start the spread of the virus.  Those areas with better mitigation will have slow rises in cases.  Those with ineffective mitigation will have quick rises.  NY is one of the latter.  And I suspect they started with a lot of seeds (infected travelers).
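One rough way to express the 'seeds' idea with Eq. 1 is to treat each region as its own copy of the model.  Each copy starts on its own seed day with its own R, and the copies are added up.  Here's a Python sketch of that approach.  The regions, seed days, seed counts, and R values are all hypothetical, and days are counted from each region's seeding.

```python
def region_infected(day, seed_day, seeds, R):
    """Eq. 1 for one region, counting days from its own seed day (0 before seeding)."""
    if day < seed_day:
        return 0.0
    return seeds * (1 + R) ** (day - seed_day)

# Hypothetical regions: (seed day, number of seeds, R).
regions = {
    "many seeds, weak mitigation": (0, 50, 0.35),
    "few seeds, good mitigation": (10, 2, 0.05),
    "seeded late, moderate mitigation": (20, 1, 0.15),
}

day = 30
national = 0.0
for name, (seed_day, seeds, R) in regions.items():
    count = region_infected(day, seed_day, seeds, R)
    national += count
    print(f"{name}: {count:,.0f}")
print(f"national total on day {day}: {national:,.0f}")
```

With these made-up numbers, the region with early seeds and weak mitigation accounts for almost all of the national total.  The other regions are still growing, just slowly.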

In the backwaters of the country (no offense intended), those with sparse populations and no interstates, the virus may actually not spread into their areas if they use good mitigation techniques.  But all it takes is one asymptomatic traveler into their area.  Once seeded, the spread will start, even if it's a very low R rate.

I think Dr. Birx finally realized she wasn't getting any R=0 effects from the current social distancing guidelines.  Numbers were still going up everywhere, even in the remote areas with few seeds.  She looked at the reality on the ground and saw that pharmacies and grocery stores were places where R went up significantly.  So she recommended NOT going to the grocery store or pharmacy.  I think she's overlooked other places and the reality of life in the US.

My take-away is that without China-like mandatory quarantines, or 100% testing with effective isolation like in South Korea, the virus is going to continue to spread.  At some point in the future, you will see 'burn out' in your area (effectively 100% infected), where mortality reaches its maximum because no one else is available to get infected and die.

We could do mandatory quarantines like China in some areas, enforced isolation like South Korea, or 100% testing, and hold the numbers down.

If they come up with a vaccine, we could get herd immunity (the point where the virus stops spreading, something I didn't put in my model).  In fact, they say that 30% to 50% of a community being infected is enough for herd immunity.  That would mean the upper limits on mortality from my model would be lower if a region achieved herd immunity just through normal virus spread.
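Under Eq. 3 (D = b T), the mortality ceiling is just b times however many people end up infected.  Here's a Python sketch comparing 'burn-out' (everyone infected) with the 30% and 50% herd-immunity figures quoted above.  The population and b = 0.01 are hypothetical numbers, not real data.

```python
def death_ceiling(population, b, infected_fraction=1.0):
    """Eq. 3 (D = b * T) with T capped at infected_fraction * population."""
    return b * infected_fraction * population

P = 1_000_000   # hypothetical population of a region
b = 0.01        # hypothetical death fraction from the worked example

for label, fraction in [("burn-out (100%)", 1.0),
                        ("herd immunity at 50%", 0.5),
                        ("herd immunity at 30%", 0.3)]:
    print(f"{label}: up to {death_ceiling(P, b, fraction):,.0f} deaths")
```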

Also, we could get effective therapeutic drugs (hydroxychloroquine is one candidate being discussed) that would reduce the mortality rate and maybe the R rate.

But with any opening up of the economy before we reach herd immunity, the R rate is going to start going up again.  I think we are going to have to live with that to prevent Great Depression-like effects.

So my suggestion is start looking for ways to really quarantine your high risk family members, at least until the government allows vaccinations to start.

