Tuesday, August 12, 2008

Caspar Ammann - How to lie with real data

From Climate Audit about a leading global warming "authority." I give him the floor:

The Texas Sharpshooter fallacy is a logical fallacy in which a man shoots at a barn thirty times, then circles the bullet holes nearest each other after the fact and calls that his target. It's of particular concern in epidemiology.

Folks, you are never going to see a better example of the Texas Sharpshooter fallacy working itself out in real life than Caspar Ammann's handling of Mann's RE benchmark. I introduce you to Caspar Ammann, the Texas Sharpshooter. Go get 'em, cowboy.

In Ammann's replication of MBH, he reports a calibration RE (the Team re-brand of the despised calibration r2 statistic) of 0.39 and a verification RE of 0.48. So that's his bulls' eye.
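
For readers who haven't met the statistic: below is a minimal sketch of the standard reduction-of-error formula in its verification flavor, with the calibration-period mean as the no-skill baseline. The function is my own illustration, not code from MBH, Ammann or Wahl, and the 0.39/0.48 numbers above come from Ammann's emulation, not from this snippet.

```python
import numpy as np

def reduction_of_error(observed, reconstructed, calibration_mean):
    """Verification RE = 1 - SSE(reconstruction) / SSE(calibration-mean baseline).

    RE > 0 means the reconstruction beats a "no-skill" forecast that simply
    predicts the calibration-period mean for every year; RE = 1 is a perfect fit.
    """
    observed = np.asarray(observed, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    sse_model = np.sum((observed - reconstructed) ** 2)
    sse_baseline = np.sum((observed - calibration_mean) ** 2)
    return 1.0 - sse_model / sse_baseline
```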

In our original papers, we observed that combinations of high calibration RE and high verification RE statistics were not necessarily "99.99% significant" (whatever that means), but were thrown up quite frequently even by red noise handled in Mannian ways. So something that might look at first blush like sharpshooting could happen by chance.

In my first post on this the other day, I observed that Ammann's simulations, like ours, threw up a LOT of high RE values - EXACTLY as we had found. There are nuances of difference between our simulations, but he got a 99th-percentile RE of 0.52, while we got 0.54 in MM2005c. Rather than disproving our results, at first blush, Ammann's results confirmed them. Mann didn't appear to be quite the sharpshooter that he proclaimed himself to be or that everyone thought. (This is something that should have been reported in their article, but, needless to say, they aren't going to admit that we know the street that we live on.)
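
To show the general shape of this kind of benchmarking exercise, here is a toy Monte Carlo sketch: persistent AR(1) "red noise" pseudo-proxies are fitted to a target series over a calibration window, scored by verification RE, and the 99th percentile of the simulated scores is taken as the benchmark. The AR coefficients, period lengths and trial count are invented assumptions, and the sketch does not emulate the Mannian reconstruction steps that drive the simulated REs so high, so its output is not comparable to the 0.52/0.54 figures above.

```python
import numpy as np

rng = np.random.default_rng(0)

def ar1(n, rho, rng):
    """Generate an AR(1) 'red noise' series of length n."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = rho * x[t - 1] + rng.normal()
    return x

n_cal, n_ver, n_sims = 79, 48, 1000          # illustrative period lengths
target = ar1(n_cal + n_ver, 0.3, rng)        # stand-in temperature series

re_scores = []
for _ in range(n_sims):
    proxy = ar1(n_cal + n_ver, 0.9, rng)     # highly persistent red noise
    # Calibrate by simple OLS on the calibration segment only.
    slope, intercept = np.polyfit(proxy[:n_cal], target[:n_cal], 1)
    fit = slope * proxy + intercept
    cal_mean = target[:n_cal].mean()
    obs_v, fit_v = target[n_cal:], fit[n_cal:]
    re = 1 - np.sum((obs_v - fit_v) ** 2) / np.sum((obs_v - cal_mean) ** 2)
    re_scores.append(re)

print("99th percentile of simulated verification RE:",
      round(float(np.percentile(re_scores, 99)), 3))
```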

It's not that the MBH RE value for this step isn't in a high percentile - it is, something that we reported in our articles, though in a slightly lower percentile according to our calculations. For us, the problem was the failure of other statistics, which suggested to us that the seemingly high RE statistic (99.999% significant) was an illusion from inappropriate benchmarking - a form of analysis familiar in econometrics (especially the seminal Phillips 1986). The pattern of MBH statistics (high RE, negligible verification r2) was a characteristic pattern of our red noise simulations - something we reported and observed in our 2005 articles.
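
To see how the two statistics can pull apart, here is a small, deliberately artificial sketch (all numbers invented): a "reconstruction" that captures the level shift between calibration and verification but none of the year-to-year variation scores a high verification RE while its r2 is negligible.

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented verification period: observations sit well below the calibration
# mean of 0, so getting the level right is most of the battle for RE.
cal_mean = 0.0
observed = -0.5 + 0.1 * rng.normal(size=48)

# A 'reconstruction' that tracks the level shift but none of the
# year-to-year wiggles: its correlation with the observations is chance.
reconstructed = -0.5 + 0.1 * rng.normal(size=48)

re = 1 - np.sum((observed - reconstructed) ** 2) / np.sum((observed - cal_mean) ** 2)
r2 = np.corrcoef(observed, reconstructed)[0, 1] ** 2

print(f"verification RE = {re:.2f}, verification r2 = {r2:.2f}")
# Typically prints an RE around 0.9 with an r2 near zero -- the
# "high RE, negligible r2" signature described above.
```
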
Obviously, it wasn't enough for Ammann to show that the MBH RE value was in a high percentile - he wanted to show that it was "99% significant" as the maestro had claimed.

So he re-drew the bulls' eye. A couple of days ago, I described the two steps whereby Ammann gets the MBH RE score (0.4817) into the 99% bulls' eye, but that was my first-cut analysis and did not tie it directly to the re-drawing of the bulls' eye.

Ammann's first step was to assign an RE value of -9999 to any result with a calibration RE under 0. That only affected 7 out of 1000 and didn't change the 99th percentile anyway. So this seemingly plausible argument had nothing to do with re-drawing the bulls' eye, as noted previously.
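
As a sketch of that first screening step, with invented stand-ins for the 1000 simulated RE pairs (these are not Ammann's numbers): flag any simulation whose calibration RE is negative and check whether the upper percentile moves.

```python
import numpy as np

rng = np.random.default_rng(1)
# Invented stand-ins for 1000 simulated (calibration RE, verification RE)
# pairs -- placeholders only, not Ammann's output.
cal_re = rng.normal(0.35, 0.15, 1000)
ver_re = 0.6 * cal_re + rng.normal(0.0, 0.12, 1000)

step1 = ver_re.copy()
step1[cal_re < 0] = -9999.0              # flag sims with calibration RE < 0

print("simulations flagged:", int(np.sum(cal_re < 0)))
print("99th percentile before:", round(float(np.percentile(ver_re, 99)), 3))
print("99th percentile after: ", round(float(np.percentile(step1, 99)), 3))
# The flagged simulations sit far below the upper tail, so the 99th
# percentile does not move -- which is the point made above about step one.
```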

The bulls' eye was re-drawn in the next step - where Ammann proposed a "conservative" ratio of 0.75 between the calibration RE and verification RE statistics. Using this "conservative" ratio, he threw out 419 out of 1000 votes. The salient question is whether this "conservative" procedure has any validity or whether it's more like throwing out black votes because they couldn't answer a skill-testing question like naming the capital of a rural county in Tibet or identifying the 11th son of Ramesses II. I'll provide some details below and you decide.
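
Here is a sketch, on hand-made numbers (not Ammann's simulations), of the ratio screen as described above: a vote is disqualified whenever its verification RE exceeds its calibration RE divided by 0.75, no matter how well it scores.

```python
import numpy as np

MBH_VER_RE = 0.4817

def ratio_screen(cal_re, ver_re, ratio=0.75):
    """Disqualify a vote when verification RE exceeds calibration RE / ratio."""
    cal_re, ver_re = np.asarray(cal_re, float), np.asarray(ver_re, float)
    eligible = ver_re <= cal_re / ratio
    return np.where(eligible, ver_re, -9999.0)

# Hand-made illustrative (calibration RE, verification RE) pairs:
cal = [0.40, 0.42, 0.35, 0.30]
ver = [0.50, 0.58, 0.55, 0.20]

print(ratio_screen(cal, ver))
# The second and third entries come back as -9999 (disqualified).  Both
# beat MBH's verification RE of 0.4817 (the second beats its calibration
# RE of 0.39 too), yet both are thrown out because their verification
# scores are too "good" relative to calibration.
```
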
First, no one has ever heard of this "conservative" benchmark - and I mean, no one. You can't look up this "conservative" ratio in Draper and Smith or any other statistical text. The "conservative" benchmark is completely fabricated. So everyone's statistical instincts should be on red alert (as Spence_UK's and Ross's have been, and as mine were).

So I thought - let's look at the votes that didn't count. What did the rejected votes actually look like? First of all, simulations with a negative calibration RE failed the test and were re-assigned to -9999. OK, but those didn't matter, because they were already to the left of the target; the 99% bulls' eye wasn't affected by this.
The only ones that mattered were the votes with RE scores higher than MBH's, which were thrown out on this new technicality. There were 13 votes thrown out on this pretext, which I list in the full post in order of decreasing RE score (note once again how high both the calibration and verification RE scores are in these rejected votes). Most of the rejected votes had calibration RE values above 0.3, slightly lower than the value of the calibration RE in the WA emulation of MBH (0.39), but the third one in the list had both a calibration RE and a verification RE that were higher than MBH's. Nonetheless, the vote still got thrown out. The RE score was too "good".

For a calibration RE of 0.3957, the maximum allowable verification RE to be eligible would be 0.528! (0.3957/0.75). Turn that over in your minds, folks. If the calibration RE was 0.3957, then unless the verification RE fell in the narrow band between 0.4817 (MBH) and 0.528, the score would be placed to the left of MBH and the bulls' eye re-drawn. Redneck scrutineers would be proud.
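
The arithmetic of that ceiling, as a quick check (variable names are mine, purely for illustration):

```python
cal_re = 0.3957
mbh_ver_re = 0.4817

max_eligible_ver_re = cal_re / 0.75
print(round(max_eligible_ver_re, 3))     # 0.528 -- the ceiling quoted above

# Under the ratio screen, a vote with this calibration RE only counts as
# beating MBH if its verification RE lands in the narrow band between
# MBH's 0.4817 and the 0.528 ceiling.
print(mbh_ver_re, "to", round(max_eligible_ver_re, 3))
```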

The data and more are at Climate Audit.
