99.7 percent confidence

It turns out that 2007 WD5, the asteroid heading toward Mars, will not hit the red planet. (Darn!) Here’s the Space.com story on the matter, but what I particularly liked was this part:

The new odds were released one day after astronomers with NASA’s NEO office at the Jet Propulsion Laboratory (JPL) in Pasadena, Calif., lowered 2007 WD5’s chances of striking Mars from 3.6 percent to 2.5 percent, or about a 1-in-40 chance, on Tuesday. After analyzing results from a new round of observations between Jan. 5 and Jan. 8, scientists now estimate the asteroid will make its closest pass by Mars at a maximum distance of about 16,155 miles (26,000 km).

JPL researchers said that they are 99.7 percent confident that 2007 WD5 will pass no closer than 2,485 miles (4,000 km) from the martian surface.

What you’ve just read was a statement of confidence, which comes from a statistical concept called the confidence interval. The effect of inaccurate measurements on the predicted path of 2007 WD5 has been drastically reduced by taking a few more measurements, so much so that scientists can emphatically declare, based on current evidence, that no impact will occur. Before these measurements were taken, the range of trajectories was so broad that hitting Mars was a real (albeit small) possibility. Now the range has been narrowed so drastically that it would take a statistical ‘miracle’ to make it happen.

A confidence interval is pretty simple. It is a range of values that you calculate after you make a series of measurements (such as heights of pine seedlings) – after you collect your data. Those measurements usually follow a bell curve, with the most values right in the middle and fewer and fewer values the farther you get from the average. But because a proper statistical study is based on randomly selected samples, there’s a possibility that the sample you happened to collect is unrepresentative and leads you to a false conclusion. Maybe the average height of pine seedlings at X days of age is Y centimeters, but you happened to pick 20 really tall ones and came up with a much higher average, Z. The chance of picking so many tall ones rather than a mix of tall, medium, and short is exceedingly small, but there’s still a chance. So you can’t ever, using statistics, say that the average height of pine seedlings is exactly Y. Your estimate could be off by some tiny (or large) amount.

So instead, scientists calculate a confidence interval. This is a range of values on either side of the sample mean, or average, that you are confident the true value lies inside. Your level of confidence is expressed as a percentage. “Confidence,” in the statistical sense, reflects the chance that you are correct – not how gung-ho you may feel when you get up in the morning.

Your typical, run-of-the-mill confidence level is 95%. Strictly speaking, this means that if you repeated the whole study many times, about 95% of the intervals you calculated (say, 10 to 12 centimeters in height) would contain the true mean. It also means that about 5% of the time, your interval would miss it. A 99% confidence interval is wider, which makes sense intuitively: by including more values, you have a greater chance of having the true value in your interval (say, 9 to 13 cm). But you can easily see that having a wider interval isn’t necessarily a good thing – where exactly is the average height of pine seedlings when your interval is 4 cm across?
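
For the curious, here is a minimal sketch of that calculation in Python. The seedling heights are made up, and the function name is my own. It uses the normal (bell curve) approximation to keep things short; with only 20 seedlings, a statistician would normally reach for the slightly wider t-distribution instead.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(sample, level):
    """Return a normal-approximation confidence interval for the mean."""
    center = mean(sample)
    standard_error = stdev(sample) / sqrt(len(sample))
    # Two-sided critical value: about 1.96 for 95%, about 2.58 for 99%.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return center - z * standard_error, center + z * standard_error


# Made-up heights (cm) of 20 pine seedlings, for illustration only.
heights = [10.2, 11.5, 12.1, 9.8, 11.0, 10.7, 12.4, 11.9, 10.1, 11.3,
           9.5, 12.0, 11.6, 10.9, 11.2, 10.4, 12.7, 11.8, 10.6, 11.1]

low95, high95 = confidence_interval(heights, 0.95)
low99, high99 = confidence_interval(heights, 0.99)
print(f"95% interval: {low95:.2f} to {high95:.2f} cm")
print(f"99% interval: {low99:.2f} to {high99:.2f} cm")
```

Whatever numbers you feed it, the 99% interval comes out wider than the 95% one, because the same data has to cover more of the bell curve.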

The goal, then, is to get a confidence level that is as high as you can get (95…99…99.999…%) with an interval that is as narrow as you can get. If you collect more data on pine seedlings, you could maybe get a 99% confidence interval of 10.8-11.2 cm – which would be awesome. Without that new data, you might have been only 25% confident that the true mean was in the 10.8-11.2 cm range. That doesn’t sound good at all, and if you put that in your research paper, you’d get it back from the peer reviewers with a post-it note saying “A little early for April Fools, isn’t it?” But it might be enough to get a grant to collect more data.
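
How much more data would it take to go from 25% to 99% confidence in the same range? Here is a rough sketch, assuming the half-width of the interval is z × (spread) / √n and that the spread of the seedlings stays the same as you measure more of them. The function names are mine, not from any statistics package.

```python
from statistics import NormalDist

std_normal = NormalDist()


def critical_value(level):
    """Two-sided critical value for a given confidence level (0 to 1)."""
    return std_normal.inv_cdf(0.5 + level / 2)


def sample_size_ratio(level_before, level_after):
    """How many times more data is needed to raise the confidence level
    attached to the SAME interval, assuming the interval's half-width is
    z * sigma / sqrt(n) and the spread sigma does not change."""
    return (critical_value(level_after) / critical_value(level_before)) ** 2


ratio = sample_size_ratio(0.25, 0.99)
print(f"About {ratio:.0f} times as many seedlings")
```

Under those assumptions the answer is on the order of 65 times as many seedlings, which is the kind of number that ends up in a grant budget.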

In statistics, you learn how to figure out how many observations you need to be able to detect a difference of X amount between one group and another, or one group and a chosen value. So a little data here and there can help you pin down the required sample size – which translates to how much money you require to do your research.
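
One common textbook version of that sample-size calculation can be sketched like this. It assumes a two-sided z-test of one group against a chosen value, a 5% chance of a false alarm, and an 80% chance of catching a real difference; the 2 cm spread is a hypothetical number standing in for what a little pilot data might tell you.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(difference, sigma, alpha=0.05, power=0.80):
    """Observations needed for a two-sided z-test to detect `difference`
    between a group mean and a chosen value, given spread `sigma`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(((z_alpha + z_power) * sigma / difference) ** 2)


# Hypothetical pilot data: seedling heights vary with sigma of about 2 cm.
n_for_1cm = required_sample_size(difference=1.0, sigma=2.0)
n_for_half_cm = required_sample_size(difference=0.5, sigma=2.0)
print(n_for_1cm, n_for_half_cm)
```

Notice that halving the difference you want to detect roughly quadruples the number of observations you need.
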
In the case of the asteroid, we’re not trying to impress a grant committee, but trying to get enough observations to be able to tell the difference between the position of the planet Mars on January 30 and the position of 2007 WD5. If there’s no difference in their positions, then we’re due for a cool show. But given the small amount of fuzzy data we started with, we couldn’t be sure whether the asteroid would hit Mars, because the range of possible trajectories was wide and encompassed the red planet. As observations accumulated, the range narrowed, which made it possible to rule out an impact. Paradoxically, the estimated chance of impact increased at first, but this was an artifact of the way the statistics worked out.

To illustrate what I’m talking about, I’ve made a rough sketch of the asteroid’s likely orbit.

The horizontal black line represents the orbit of Mars, and the small image of Mars represents where the planet will be on the 30th of January. The blue spot is the asteroid, and the small blue dots trailing behind it are past observations. At first discovery, there was a wide range of possibilities, and the red area under the bell curve represents the cumulative chance of hitting any part of Mars.

As the observations increased, we became more certain of the position and trajectory of the asteroid. This narrowed the bell curve, at first increasing the chance that it could hit Mars (late December). Mars was starting to get a little more off-center, but because the bell curve had narrowed, the estimated chance of an impact increased – the space occupied by Mars was a bigger fraction of the range.
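
You can watch this rise-then-fall happen in a toy model. To be clear, this is not how JPL computes impact odds: it is a one-dimensional bell curve centered on the 26,000 km miss distance from the story, with Mars treated as a target about 3,390 km in radius. The uncertainty values are made up, and the center is held fixed even though the real best guess shifted as data came in.

```python
from statistics import NormalDist

MARS_RADIUS_KM = 3_390      # approximate mean radius of Mars
NOMINAL_MISS_KM = 26_000    # best-guess miss distance quoted in the story


def impact_probability(sigma_km):
    """Chance that the crossing point lands within one Mars radius of the
    planet's center, for a 1-D bell curve centered on the nominal miss."""
    curve = NormalDist(mu=NOMINAL_MISS_KM, sigma=sigma_km)
    return curve.cdf(MARS_RADIUS_KM) - curve.cdf(-MARS_RADIUS_KM)


# Made-up uncertainties (km), shrinking as observations accumulate.
sigmas = [1_000_000, 100_000, 20_000, 7_000]
probabilities = [impact_probability(s) for s in sigmas]
for s, p in zip(sigmas, probabilities):
    print(f"sigma = {s:>9,} km -> impact chance {p:.2%}")
```

In this sketch the impact chance climbs as the curve tightens from a million kilometers down to tens of thousands, then collapses once the curve is narrow enough that Mars sits far out in its tail.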

In early January, however, many more observations narrowed the bell curve so much that Mars was realistically no longer in it. There’s still some uncertainty as to where the asteroid will cross the orbit of Mars; however, we are 99.7% certain that 2007 WD5 will pass more than 4 thousand kilometers from the planet. To put it in perspective, 4,000 km is about a third of the width of the Earth, or a bit more than half the diameter of Mars.
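
The 99.7% figure has a familiar ring to anyone who has spent time with bell curves: it is the share of a normal distribution that lies within three standard deviations of the mean. The Space.com story doesn’t say how JPL arrived at the number, so treat the connection as my guess, but the arithmetic itself is easy to check:

```python
from statistics import NormalDist

std_normal = NormalDist()
# Fraction of a bell curve lying within 3 standard deviations of the mean.
within_three_sigma = std_normal.cdf(3) - std_normal.cdf(-3)
print(f"{within_three_sigma:.2%}")
```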

So there you have it, no impact, but still the startling realization that impacts are possible. And now, when scientists talk about confidence and statistical chance – you should be 99.7% certain that you know what they mean!

Published by

Karl Haro von Mogel

Karl Haro von Mogel serves as BFI’s Director of Science and Media and as Co-Executive Editor of the Biofortified Blog. He has a PhD in Plant Breeding and Plant Genetics from UW-Madison with a minor in Life Sciences Communication. He is currently a Postdoctoral Scholar researching citrus genetics at UC Riverside.