Dread Tomato Addiction: Deconstructing Dembski (2005)

Introduction

In his 2005 paper Specification: the pattern that signifies intelligence, William Dembski tries to give a rigorous definition of his concept of Complex Specified Information (CSI). The paper has numerous problems, the most painful of which is repeated equivocation of terms, which makes it very difficult to read. Once I got past the equivocation, I discovered basic errors in how probabilities are calculated and interpreted. One error in particular is very hard to swallow, because anyone with a basic understanding of probability should know better. Dembski has a master’s degree in statistics and a PhD in mathematics, so it is reasonable to think he knows better. How could he be so wrong?

Here's the spoiler, in case you don't care to read the whole post:

The concept of CSI in Dembski (2005) is based on a meaningless number, which is interpreted as probability even though it is not. As a consequence, CSI cannot have the meaning and interpretation stated. Dembski's math is wrong.

Taking a closer look

Incredulous that such an error might have escaped the author's notice, I decided to take a closer look, to see if there might be some information or context that I was overlooking. This required some study of Algorithmic Information Theory (AIT). There was some heavy reading there, but fortunately inference by AIT is quite similar to the Maximum Likelihood and Bayesian methods I already know. I also looked around for other sources of criticism that might have already answered this question. Two particularly good sources are Elsberry and Shallit (2011), and Devine (2014). These papers are exhaustive criticisms of Dembski’s work overall, but neither mentions the error I describe here, so it seems I have found something original to criticize.

Now let’s take a closer look at how Complex Specified Information is defined. Here is the relevant portion of the paper; pop it out to enlarge, or open a copy (PDF) to page 20 for easier reading.

From Dembski (2005), p. 20.

Dembski is describing his formula for estimating the chance occurrence of an event E, where T is some pattern or sequence of events given as evidence for E. He then adjusts this probability for complexity and opportunity. The key part of the formula is

`M*N*phi_s(T)*P[T|H]`

where M and N are positive integers representing replicational resources (how many opportunities are there for T to occur), `phi_s(T)` is a positive integer representing the descriptive complexity of a sequence T, and P[T|H] is the probability of the sequence T given hypothesis H.

Dembski refers to this as an upper bound on the probability because of a limitation of AIT; you can never be sure there isn't some shorter coding scheme for a sequence, therefore the probability might be smaller under that other coding scheme. That's OK, because the coding scheme is not an issue here. All I really care about is that Dembski says that `M*N*phi_s(T)*P[T|H]` is a probability.

The paragraph immediately following has Dembski's interpretation of this probability:

From Dembski (2005), pp. 20-21.

Here Dembski's interpretation is clear; smaller probabilities (less than 1/2) are evidence against the event E occurring by chance. Taking `log_2` re-scales the number, but isn't necessary.
See also (2) in the addendum.
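
To make the threshold concrete, here is a minimal Python sketch of the arithmetic. The function names and the input values are mine, chosen only for illustration; none of them come from Dembski's paper. I am also assuming the re-scaling is the negative base-2 logarithm, so that a product below 1/2 corresponds to a re-scaled value above 1.

```python
import math

def dembski_product(m, n, phi, p):
    """Return M*N*phi_s(T)*P[T|H] for the given (hypothetical) inputs."""
    return m * n * phi * p

def rescaled(product):
    """Negative base-2 logarithm of the product (assumed re-scaling)."""
    return -math.log2(product)

# Illustrative values only; none of these come from the paper.
product = dembski_product(m=1000, n=1000, phi=100, p=1e-12)
print(product < 0.5)          # the "less than 1/2" criterion
print(rescaled(product) > 1)  # the same criterion after taking log_2
```

Both checks agree because the logarithm is monotone: it changes the scale of the number, not the comparison being made.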

And now some statistics

That’s what Dembski says about CSI. Now let’s take a moment to review some basic statistics. Consider the following simple binomial probability experiment. I propose to roll a 6-sided die 10 times, and before rolling I will tell you the probability of rolling one or more 6’s. I make the following statement:

The probability of rolling a “6” is 1/6 or about 0.167, so the probability of rolling at least one "6" in 10 rolls is 10 times that, giving a probability of 10/6 or 1.67.

If you know even a little bit about probability, you ought to be suspicious of my statement, because probabilities must range between 0 and 1, and 1.67 is greater than 1. This cannot be a probability so my statement is obviously wrong. A correct statement would be:

The probability of rolling a “6” is 1/6 or about 0.167, so the expected number of 6’s in 10 rolls is 10 times that, giving the expected value of 10/6 or 1.67.

That is, I expect to roll "6" an average of 1.67 times in 10 tries. This simple experiment is a demonstration of the binomial distribution, and my mistake in the first statement was confusing the expected number of 6’s rolled with the probability of rolling at least one "6". Now let’s change the experiment a little: imagine using a 100-sided die instead of a 6-sided die (or use percentile dice!). My first statement now becomes:

The probability of rolling a “100” is 1/100 or 0.01, so the probability of rolling at least one “100” in 10 rolls is 10 times that, giving a probability of 10/100 or 0.10.

This statement is wrong for the same reason as before, but less obviously so, because 0.10 lies between 0 and 1 and so looks like a probability. Someone who doesn’t understand probability, or who does not understand the circumstances, might be fooled into thinking this number (0.10) is a probability. The correct interpretation is again the expected number of times “100” is rolled in 10 trials. The observed count could be 0 (zero) or as high as 10, but on average it will be 0.10, and the count of “100” events will follow a binomial distribution.
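
The two dice examples can be checked with a few lines of Python. This is only a sketch: the function names are my own, and the probability of at least one success is computed with the complement rule, the same formula given in (3) of the addendum.

```python
from fractions import Fraction

def expected_successes(n_rolls, sides):
    """Expected number of times one chosen face appears in n_rolls."""
    return n_rolls * Fraction(1, sides)

def prob_at_least_one(n_rolls, sides):
    """Probability the chosen face appears at least once in n_rolls."""
    return 1 - (1 - Fraction(1, sides)) ** n_rolls

for sides in (6, 100):
    e = expected_successes(10, sides)
    p = prob_at_least_one(10, sides)
    print(f"d{sides}: expected count = {float(e):.4f}, "
          f"P(at least one) = {float(p):.4f}")
```

For the 6-sided die the expected count is about 1.6667 while the probability of at least one "6" is about 0.8385. For the 100-sided die the two numbers are 0.1000 and about 0.0956: close, but not the same quantity.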

It doesn’t matter if the die has 6 sides, 100 sides, or 10^150 sides: the expected number of events X that occur with probability `p` in `N` trials is `N*p`, the expectation of a binomial random variable. This number is NOT a probability.
See also (3) in the addendum.
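
If you would rather see this than take my word for it, a quick simulation does the job. The sketch below is my own illustration (the function name, the seed, and the number of repetitions are arbitrary choices): it repeats the 10-roll experiment with a 100-sided die many times, then reports the average count of “100” per experiment and the fraction of experiments in which “100” turned up at least once.

```python
import random

def simulate(n_rolls, sides, n_experiments, seed=0):
    """Roll a fair die n_rolls times per experiment; count the top face."""
    rng = random.Random(seed)
    total_hits = 0
    experiments_with_hit = 0
    for _ in range(n_experiments):
        hits = sum(1 for _ in range(n_rolls) if rng.randint(1, sides) == sides)
        total_hits += hits
        if hits > 0:
            experiments_with_hit += 1
    mean_hits = total_hits / n_experiments
    frac_with_hit = experiments_with_hit / n_experiments
    return mean_hits, frac_with_hit

mean_hits, frac_with_hit = simulate(n_rolls=10, sides=100, n_experiments=50_000)
print(f"average count of 100s per experiment: {mean_hits:.4f}")
print(f"fraction of experiments with at least one 100: {frac_with_hit:.4f}")
```

The average count should land near `N*p` = 0.10. The fraction of experiments with at least one “100” estimates a different quantity, `1-((1-p)^N)`, and can never exceed 1 no matter how many rolls are made.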

How Dembski is wrong

Dembski starts with the probability P[T|H] and multiplies by `M*N` (ignoring `phi_s(T)` for the moment). Just as in my dice examples above, the result is the expectation of the number of T events that will be observed in `M*N` binomial trials with probability of success P[T|H]. Dembski then multiplies by another positive integer `phi_s(T)`, a measure of descriptive length, resulting in a number which has no apparent interpretation at all.
Additionally, since this is not a probability, there is no justification for a result "less than 1/2" being evidence against the hypothesis H (the chance hypothesis).
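
Here is a rough sketch of that arithmetic in Python. Every input value is made up for illustration and none is taken from the paper; the function names are mine as well. The at-least-one probability is computed with `math.log1p` and `math.expm1` only so that very small values of P[T|H] do not get lost to rounding.

```python
import math

def expected_count(trials, p):
    """Expected number of successes in `trials` independent trials."""
    return trials * p

def prob_at_least_one(trials, p):
    """1 - (1 - p)**trials, computed stably for very small p."""
    return -math.expm1(trials * math.log1p(-p))

# Hypothetical inputs, chosen only to illustrate the arithmetic.
M, N = 10**6, 10**6      # replicational resources
phi = 10**4              # descriptive complexity phi_s(T)
p = 1e-13                # P[T|H]

count = expected_count(M * N, p)       # M*N*P[T|H]
prob = prob_at_least_one(M * N, p)     # a genuine probability
product = phi * count                  # M*N*phi_s(T)*P[T|H]

print(f"M*N*P[T|H]          = {count:.4g}")
print(f"P(at least one T)   = {prob:.4g}")
print(f"M*N*phi_s(T)*P[T|H] = {product:.4g}")
```

With these inputs the full product comes out far above 1, while the at-least-one probability stays between 0 and 1 as any probability must.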

Edit: After re-reading the final example in the paper (page 23), I see that Dembski shows how this number could be greater than 1. This directly contradicts page 21, where he uses it as a probability. Dembski seems to be confused about his own creation. The point remains that he refers to this as an upper bound on the probability, when it is neither a probability nor an upper bound.

Conclusion

The concept of Complex Specified Information put forward in Dembski (2005) is fundamentally flawed. CSI is stated to depend on a probability, but the method used does not result in a proper estimate of probability. As a result CSI has no meaningful interpretation.

CSI was known to be flawed before I found this additional bug, so my criticism here may be moot. I suspect the only reason no other critics (notably Elsberry, Shallit, and Devine) have written about this previously is that it requires accepting more fundamental errors before it can even be considered.

Epilogue (April 20, 2023): A discussion with Joe Felsenstein prompted this post at Panda's Thumb:

Discussion: Is William Dembski's CSI argument mistaken or merely useless?

I think this resolves the question conclusively, and further discussion in the comments changed my own interpretation too. Briefly, `phi_s(T)` is a ranking, and I shouldn't have treated it as a number of additional trials as I describe above. Unfortunately for Dembski, that leaves the definition of CSI completely unsalvageable. It's not even an expected value, but a probability multiplied by a ranking, which doesn't seem to have any meaning at all.

Addendum

1) I almost titled this post "Equivocation: the pattern that typifies nonsense", but I think I'll save that zinger for another day.

2) I should note that I am assuming Dembski is correct in his use and interpretation of `phi_s(T)` and `P[T|H]`. Elsberry and Shallit (2011) and Devine (2014) strongly disagree with Dembski's usage of these quantities, and give correct methods.

3) There is a correct way to do the calculation Dembski needs, which results in a probability:

`P["At least one success" | N,p] = 1-((1-p)^N)`

This is a basic probability calculation which Dembski could hardly have avoided learning while studying for a Master's degree in statistics. The probability Dembski seems to want to calculate (after some additional manipulation) is approximately:
`P["E occurs at least once"] ~~ 1-1/(e^(M*N*phi_s(T)*P[T|H]))`
which should be quite accurate for P[T|H] < 0.1.
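
For anyone who wants to see how close the approximation is, here is a small Python comparison. It is a sketch under my own assumptions: `n_trials` is a hypothetical stand-in for the multiplier in the formula above, and the values of `p` are arbitrary.

```python
import math

def exact(n_trials, p):
    """P(at least one success) = 1 - (1 - p)**n_trials."""
    return 1 - (1 - p) ** n_trials

def approx(n_trials, p):
    """Exponential approximation 1 - 1/e**(n_trials * p)."""
    return 1 - math.exp(-n_trials * p)

n_trials = 10  # hypothetical multiplier, for illustration only
for p in (0.1, 0.01, 0.001):
    gap = exact(n_trials, p) - approx(n_trials, p)
    print(f"p = {p}: exact = {exact(n_trials, p):.6f}, "
          f"approx = {approx(n_trials, p):.6f}, gap = {gap:.6f}")
```

The gap shrinks as `p` gets smaller, and both numbers always stay between 0 and 1.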

[corrected 2/23/2016]

References

Dembski, W. A. (2005). Specification: the pattern that signifies intelligence. Philosophia Christi, 7(2), 299-343. http://www.bilimfelsefedin.org/blog/wp-content/uploads/Specification_-_The_Pattern_That_Signifies_Intelligence_-_William_Dembski.pdf

Devine, S. (2014). An algorithmic information theory challenge to intelligent design. Zygon®, 49(1), 42-65. http://onlinelibrary.wiley.com/doi/10.1111/zygo.12059/abstract

Elsberry, W., & Shallit, J. (2011). Information theory, evolutionary computation, and Dembski’s “complex specified information”. Synthese, 178(2), 237-270. http://link.springer.com/article/10.1007/s11229-009-9542-8

Weisstein, Eric W. "Binomial Distribution." From MathWorld--A Wolfram Web Resource. Retrieved February 12, 2016, from http://mathworld.wolfram.com/BinomialDistribution.html

Weisstein, Eric W. "Probability." From MathWorld--A Wolfram Web Resource. Retrieved February 13, 2016, from http://mathworld.wolfram.com/Probability.html

William A. Dembski. (2016, February 4). In Wikipedia, The Free Encyclopedia. Retrieved February 12, 2016, from https://en.wikipedia.org/w/index.php?title=William_A._Dembski&oldid=703225648


Friday, February 12, 2016
