# Difference between revisions of "Chance News 34"

## Revision as of 12:14, 17 February 2008

## Quotation

One more fagot of these adamantine bandages is the new science of Statistics.

Ralph Waldo Emerson

Fate from The Conduct of Life (1860, rev.1876)

## Forsooth

The following Forsooths are from the February 2008 issue of RSS NEWS.

Twenty-six new cases of the inflammatory lung disease sarcoidosis [were seen amongst rescuers] in the first five years after 9/11. Five or fewer rescuers got sarcoidosis annually before 9/11.

New York Daily News

21 September 2007.

Actually, I like the Poles' second idea even better. Instead of re-enacting a battle, they suggested, the summiteers should re-sit advanced level mathematics. Voting weights should be based on the square roots of the member states' populations. (Pocket calculators allowed.)

The next two forsooths were suggested by Paul Alper

Much of the data on overweight people and obesity are limited, equivocal and compromised.

Patrick Basham and John Luik in BMJ, Volume 336, page 244, 2 February 2008

The adverse effects of obesity on health are well established, serious, and causal.

R.W. Jeffery and N.E. Sherwood in BMJ, Volume 336, page 245, 2 February 2008

"I didn't major in math," Huckabee said to the Conservative Political Action Conference meeting, according to the Associated Press. "I majored in miracles, and I still believe in them."

## Telomeres Tell A Lot

Conventional wisdom, indeed wisdom of any form, indicates that physical activity, a.k.a. regular exercise, is good for you. In particular, intuition would imply that the risk factors for age-related diseases such as diabetes, cancer, hypertension, obesity and osteoporosis would be reduced if people were engaged in physical activity. To make a direct connection between ageing and physical activity, consider a paper in the Archives of Internal Medicine (Vol.168, No. 2, January 28, 2008), “The Association Between Physical Activity in Leisure Time and Leukocyte Telomere Length” by Cherkas, et al.

“Telomeres consist of tandemly repeated DNA sequences that play an important role in the structure and function of chromosomes.” Leukocyte telomere length (LTL) is a proxy variable for one’s biological age as opposed to one’s chronological age. That is, the longer one’s telomeres, the younger one actually is. Conversely, the shorter the telomeres, the more aged.

This study measured the telomeres of 2401 twins who were put into four mutually exclusive categories of physical activity: “Inactive,” “Light,” “Moderate,” and “Heavy,” corresponding to “16 minutes, 36 minutes, 102 minutes and 199 minutes” of physical activity per week, respectively. The result after adjusting for “Age, sex, and extraction year” was that the “LTL of the most active subjects (group 4) was an average 200 (SE, 79) nt [nucleotides] longer than that of the inactive subjects (group 1),” producing a p-value of .006. The biological implication is “that the most active subjects had telomeres the same length as sedentary individuals up to 10 years younger, on average. This difference suggests that inactive subjects may be biologically older by 10 years compared with more active subjects.” Restricting the analysis to subjects with more complete information on BMI (body mass index), smoking and SES (socioeconomic status) reduced the number of subjects from 2401 to 1531; the LTL difference increased to 213 nt and the p-value increased to .02. Below are a summary table and Figure 1.

### Discussion

1. The article states, “The results of this study can be extrapolated to other white individuals (men and women) of North European origin.” Find a biologist or a helpful librarian to determine whether it is suspected that non-whites have different telomere lengths and/or have a different distribution. If so, what does this imply about telomere length and ageing?

2. There were about nine times as many women in the study as men. Why might this be a concern?

3. Something important is missing in Figure 1 and its absence serves to magnify the average difference. What is it?

4. The subjects in the study were twins and therefore attracted extra lay media attention. Six of the ten authors are affiliated with King's College London. From the King's College website, “Comparing the telomere lengths of twins who were raised together but take different amounts of exercise, reduces the effect of genetic and environmental variation and so provides a more powerful test of the hypothesis.” Obtain the article and reference #21 to determine why using twins as subjects, as opposed to non-twins, is sort of beside the point.

5. There was a “discordant twin-pair analysis” performed “as a further confirmation of the larger analysis.” A paired 2-tailed t test for 67 twin pairs, separated by at least a two category difference is displayed in Figure 2. What defect does it share with Figure 1? Why is it even more misleading given that a paired t test is being done?

6. The article states, “A limitation of this type of study is that physical activity level was self-reported.” Why might this be a limitation?

7. Assume there is a positive association between LTL and physical activity. Give an alternative explanation to physical activity causing greater telomere length. Give another alternative explanation.

Submitted by Paul Alper

## Modeling of Diabetes

Intuition can be deceiving. Obvious examples: the earth is flat and at the center of the solar system, Saddam must have had nuclear weapons, bootstrapping can't possibly be valid, earth, air, fire, water and that's it. An intuitive medical model of type 2 diabetes, according to an article by Rob Stein in the Washington Post of February 6, 2008, is "that the lower the blood sugar the better, and that lowering blood-sugar levels to normal saves lives." But, the results of the ACCORD (Action to Control Cardiovascular Risk in Diabetes) trial involving 10,251 randomly assigned patients turned out to "inject an element of uncertainty into what has been dogma." In the stronger words of Dr. Richard Grimm Jr. who helped design the study, "very surprising, shocking."

Surprising and shocking because "257 patients receiving the intensive treatment [lowering the blood sugar level to that of a person who did not have diabetes] had died compared to 203 receiving the standard treatment [lowering the blood sugar level to that of the average person with diabetes]." This result "prompted federal health officials to abruptly stop one part of the trial so thousands of the type 2 diabetes patients in the study could be notified and switched to less risky treatment."

### Discussion

Assume that approximately half of the 10,251 patients were in the intensive treatment group and half were in the standard treatment group.

1. Why would the researchers do a one-tail test rather than a two-tail test?

2. Here is a Minitab run for the data given in the article:

| Sample | x | N | p |
|---|---|---|---|
| 1 | 257 | 5125 | 0.050146 |
| 2 | 203 | 5126 | 0.039602 |

Difference = p (1) - p (2)

Estimate for difference: 0.0105443

95% upper bound for difference: 0.0172689

Test for difference = 0 (vs < 0): Z = 2.58 P-Value = 0.995

Fisher's exact test: P-Value = 0.996

Why is the P-Value so ridiculously high?
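The figures in the run above can be reproduced by hand. The following is a minimal sketch, not Minitab's own code: it assumes the standard error is computed from the two sample proportions separately (unpooled) and that the normal approximation is used. The function names are illustrative only.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_proportion_z(x1, n1, x2, n2):
    """Return (difference, standard error, z) for p1 - p2.

    Assumption: unpooled standard error, normal approximation.
    """
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, se, diff / se


# Deaths and assumed group sizes, as in the run above.
diff, se, z = two_proportion_z(257, 5125, 203, 5126)

# Lower-tail area, matching the alternative "difference < 0" in the run.
p_lower = normal_cdf(z)

# One-sided 95% upper bound: estimate + 1.645 standard errors.
upper_bound = diff + 1.6448536269514722 * se

print(f"difference  = {diff:.7f}")
print(f"z           = {z:.2f}")
print(f"P(Z <= z)   = {p_lower:.3f}")
print(f"upper bound = {upper_bound:.7f}")
```

The reported P-value is the area under the normal curve to the left of the observed Z, which is what the alternative "< 0" asks for.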

Submitted by Paul Alper

## How a statistical formula won the war

[Gavyn Davies does the maths](http://www.guardian.co.uk/world/2006/jul/20/secondworldwar.tvandradio), Gavyn Davies, The Guardian (UK), July 20 2006.

This old article relates how statisticians were called on to estimate the number of enemy tanks prior to the Allied attack on the Western Front in 1944.

The statisticians had one key piece of information: the serial numbers on a few captured tanks. Assuming that the tanks were numbered sequentially in the order in which they were produced, this was enough to enable the statisticians to estimate the total number of tanks that had been produced up to any given moment, based on the highest serial number in the sample and the sample size.

Suppose the tanks were numbered 1 to N, where N was the total number of tanks produced, and that five tanks had been captured with serial numbers 20, 31, 43, 78 and 92, say. From a sample (S) of five and a maximum serial number (M) of 92, it was deduced that a good estimator of the number of tanks would be (M-1)(S+1)/S. In the example given, this translates to (92-1)(5+1)/5, which is equal to 109.2.
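As a quick check of the arithmetic, the sketch below evaluates the article's formula for the example sample. For comparison it also evaluates M + M/S - 1, the form in which this estimator is commonly given in textbook treatments; that second formula is not taken from the article. A small simulation then averages both over random samples. The function names are illustrative, and the true total of 250 used in the simulation is a hypothetical value chosen only for the demonstration.

```python
import random


def article_estimate(max_serial, sample_size):
    """Estimator as quoted in the article: (M-1)(S+1)/S."""
    return (max_serial - 1) * (sample_size + 1) / sample_size


def textbook_estimate(max_serial, sample_size):
    """Common textbook form, M + M/S - 1 (not taken from the article)."""
    return max_serial + max_serial / sample_size - 1


serials = [20, 31, 43, 78, 92]
m, s = max(serials), len(serials)
print(round(article_estimate(m, s), 1))   # 109.2
print(round(textbook_estimate(m, s), 1))  # 109.4


def mean_estimate(estimator, n_tanks, sample_size, trials=20000, seed=0):
    """Average an estimator over random samples drawn without replacement."""
    rng = random.Random(seed)
    serial_range = range(1, n_tanks + 1)
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(serial_range, sample_size)
        total += estimator(max(sample), sample_size)
    return total / trials


true_n = 250  # hypothetical true number of tanks, for illustration only
print(mean_estimate(article_estimate, true_n, 5))
print(mean_estimate(textbook_estimate, true_n, 5))
```

On the same samples the two formulas differ by the constant 1/S (0.2 when S = 5), so for practical purposes they agree.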

It transpires that the estimated number was 245 per month, and after the war it was confirmed that the actual number per month was 246, whereas intelligence estimates were incorrectly far higher.

### Questions

- What assumptions are involved in the formula given in the article?
  - How robust is the estimate?
- Should the serious consequence of the estimation (launching an invasion) have any influence on the way the estimation is performed?
- Can you think of any other information that might have helped to solve the problem?

Submitted by John Gavin.