Do you ever have situations in which you know there must be a hole in your logic somewhere, but you can’t figure out where it is? Here’s one of mine.
Let us suppose that there is a citizen of ancient Rome who had, let us say, six children who survived into adulthood, so that his having at least two generations of descendants is not in doubt. Let’s call this citizen Barry. Then it would seem to me that it must be true, in that case, that every human being on the planet now would be a direct descendant of that person, Barry, including those in Africa and China, although perhaps not someone in South America or Hawaii or Borneo or someplace, but that’s a separate issue and I don’t want to get distracted by that.
I sort of KNOW that there is something wrong with my logic here, but here it is anyway. Let us say that the population of the earth in the year 100 BC was 300 million, which is a high-end estimate; most estimates are a little lower than that. But assuming that there are 300,000,000 people on the earth and that only six of them are descendants of Barry in the first generation, that would mean that in the first generation .999 999 98 of the people on earth are NOT direct descendants of Barry.
But, unless two descendants of Barry marry (or produce children without marrying, gasp). . .unless two descendants of Barry marry, then in the SECOND generation this number would go to .999 999 98 squared, assuming that the descendants of Barry produce an average number of next-generation descendants. .999 999 98 squared is .999 999 96. In the third generation this number would go to .999 999 92.
Note that we are not assuming here that people are banned from mating with their siblings and cousins; we are merely assuming that it is statistically improbable. If we assume that people are actually BANNED from mating with their siblings, then the number goes down slightly, although not by enough that it makes any difference. It would make a difference later down the chain, but then, it’s a false assumption later down the chain, so there doesn’t seem to be any reason to worry about that.
So in the third generation .999 999 92 of the world population is NOT descended from Barry, in the fourth generation .999 999 84, and in the fifth generation .999 999 68. It takes about 20 generations for this number to move appreciably. It actually takes 20 generations, using these assumptions, to reach the point at which 1% of the world’s population is descended from Barry. But after 20 generations, things change very rapidly. The percentage of the world which IS descended from Barry goes from 1% to 2% in the next generation, the 21st generation. It vaults past 10%--actually well past 10%--in the 24th generation. It reaches 50%, basically, in the 26th generation. By the 29th generation only one-half of one percent of the world’s population is NOT descended from Barry. In the 30th generation, a child is NOT descended from Barry only if someone in that one-half of one percent of the world’s population mates with someone ELSE in that same tiny sliver of the population. Basically, by the 30th generation, the entire population of the world is descended from Barry. In the 31st generation, it is statistically improbable that there would be a single person on earth who is NOT descended from Barry.
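For anyone who wants to check that arithmetic, here is a minimal sketch in Python. It uses only the assumptions stated above: a world population of 300,000,000, six first-generation descendants, and random mating, so that the share of non-descendants squares with each generation. The function name and its parameters are made up for illustration.

```python
def non_descendant_share(generation, population=300_000_000, first_gen=6):
    """Share of the population NOT descended from Barry in a given generation.

    Assumes random mating: a child is a non-descendant only if both
    parents are non-descendants, so the share squares every generation.
    """
    share = 1 - first_gen / population  # generation 1: .999 999 98
    for _ in range(generation - 1):
        share = share * share
    return share


for gen in (1, 2, 3, 20, 21, 24, 26, 29, 30, 31):
    descended = 1 - non_descendant_share(gen)
    print(f"generation {gen:2d}: {descended:.6%} descended from Barry")
```

Multiplying the generation-31 share by 300,000,000 gives the expected number of people on earth with no line back to Barry; under these assumptions it comes out to less than one person.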
Of course, due to racial sub-groups and isolation, after a few generations the matches are not random vis a vis Barry’s line yes/no. Let us suppose there are two primogenitors, Barry and Umfumu, and that Umfumu was in Nigeria; my apologies if Umfumu is not a Nigerian name. After the fourth generation those who ARE descended from Barry—let us call them Romans—are much more likely to mate with other descendants of Barry than with descendants of Umfumu.
But this doesn’t seem to make any difference in the key issue of whether, by the current time, every citizen of the world is descended from Barry (and from Umfumu). It only takes 30 generations for Barry’s DNA to be included in the DNA of every person. We have. . .well, probably about 84 generations to work with. So if we assume that after the 30th generation there is ONE descendant of Barry who enters the breeding population of the descendants of Umfumu--just one--then in another 30 generations Barry’s line will have invaded the population of descendants of Umfumu, as well. This doesn’t have to happen immediately; you actually have hundreds of years in there for one descendant of Barry to cross over.
And, in fact, there have always been some small number of people who did move across barriers. The Romans liked to capture lions and tigers and elephants and hippopotamuses and such and display them in Rome, and there were Africans who came to Rome with the animals as caretakers or ringmasters. There were Africans who were raised in Rome and became prominent citizens of Rome. There were Romans who were delegated to relatively remote parts of Africa ("remote" from the Romans’ point of view). It just takes one in every several hundred years.
Also, geographical isolation is more gradual than definitive, although there are some definitive barriers. Town A is 25 miles from Town B, Town B is 25 miles from Town C, Town AY is 25 miles from Town AZ, but Town A is a thousand miles from Town AZ. When you have 84 generations to work with, DNA can GRADUALLY work its way around the globe, moving just a few miles in each generation.
Perhaps, if one area has an unusual degree of geographic separation from the rest of the globe, such as a Samoan island or a tribe in the Amazon jungle, then that tribe might be an exception to the rule. But generally, it seems to me (logically) that it must be true that IF Cicero had living descendants, then you and I both HAVE to be descendants of Cicero.
But intuitively, it seems to me that there must be some flaw in this logic that I just have never been able to spot. So. . .whaddaya think?
OK, here is a vaguely similar problem which has to do with the 1955 Boston Red Sox. The 1955 Red Sox scored 470 runs at home, 285 on the road, while allowing 395 runs at home, 257 on the road. They played 78 home games, 76 on the road, but still, it creates a Park Factor for the season of 156, which is remarkable even for Fenway, where the Park Run Index was 108 in 1954 and 108 in 1956.
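The 156 can be reproduced from the run totals just given. The sketch below assumes the index is simply runs per game by both teams at home, divided by the same rate on the road, times 100. That is my reading of the figures, not necessarily the exact formula behind the published Park Run Index, and the function name is hypothetical.

```python
def park_run_index(home_scored, home_allowed, home_games,
                   road_scored, road_allowed, road_games):
    """Runs per game (both teams) at home relative to the road, times 100."""
    home_rate = (home_scored + home_allowed) / home_games
    road_rate = (road_scored + road_allowed) / road_games
    return 100 * home_rate / road_rate


# 1955 Red Sox: 470 scored / 395 allowed in 78 home games,
#               285 scored / 257 allowed in 76 road games.
red_sox_1955 = park_run_index(470, 395, 78, 285, 257, 76)
print(round(red_sox_1955))  # about 155.5 before rounding, so 156
```

Note that the extra two home games are already accounted for, because the comparison is made per game rather than on season totals.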
Because of this extreme Park Run Index for 1955, if you ask "Who had the greatest pitcher’s season of the 1950s" and you rely on the Park Run Index, you will reach the conclusion that the greatest pitcher’s season of the 1950s was by Frank Sullivan of the Red Sox in 1955. Maybe you won’t reach this conclusion if you use strikeouts and walks rather than runs allowed, but this is a dodge. Jackie Jensen, who drove in 116 runs and had a .369 on-base percentage, will be "normalized" to mediocrity because he is creating runs in an environment where runs are believed to be very abundant, more abundant than they actually were. I have certainly, while using otherwise reasonable methods, concluded that Frank Sullivan in 1955 had the greatest pitcher’s season of the 1950s, and I am not the only person who has done this; other analysts have found themselves stuck with the same conclusion.
Intuitively, we all know that this is not true; Frank Sullivan in 1955 was 18-13 with a 2.91 ERA in 260 innings, which is a very good season, but not better than Robin Roberts or Bobby Shantz in 1952, and probably not better than your average Warren Spahn season. It merely LOOKS like an incredible season if you combine the 2.91 ERA and the 156 Park Run Index.
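To see how much the park number moves Sullivan, here is a rough sketch. It assumes one common simplification, which is NOT necessarily the method used in Win Shares or any other published system: a pitcher works about half his innings at home, so his ERA is divided by the average of the park index and a neutral 100. The function name is made up for illustration.

```python
def park_neutral_era(era, park_run_index):
    """Rough park-neutral ERA.

    Assumption (illustrative only): half the innings are at home, so the
    adjustment is the average of the park index and a neutral 100.
    """
    adjustment = (park_run_index + 100) / 200
    return era / adjustment


# Sullivan's 2.91 ERA under the one-year 1955 index and under Fenway's
# 108 from the seasons on either side.
print(f"with index 156: {park_neutral_era(2.91, 156):.2f}")
print(f"with index 108: {park_neutral_era(2.91, 108):.2f}")
```

Under that assumption the same 2.91 ERA is worth roughly half a run per nine innings more with the 1955 index than with the 108 index, which is the whole difference between a very good season and an apparently historic one.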
And, intuitively, we all know what the problem is. A team CAN play a double-header in which one game is 1-0 and the other one is 16-13, and it isn’t necessarily the pitchers or the sun or anything; it just happens. It CAN happen that all of your slugfests in a season (or 15 out of the 20) happen in your home park, while 15 out of your 20 pitcher’s duels are on the road, and this can happen without regard to the park effect. It doesn’t happen OFTEN; it’s a one-in-a-thousand type fluke—but it happened here.
You can deal with this problem, as a statistical analyst, by using multi-season park effects; you can do that and I have, but there are logical problems (and practical problems) with that approach, as well. What I am really asking is, when there is a fluke of this nature, how do we recognize that it is a fluke, without relying on external data such as data from other seasons?
We are working on Win Shares and Loss Shares now, and we’re back to this problem. Basically, I would rather use one-year Park Factors than three-year Park Factors, because I think generally this causes fewer problems and leads to more accurate measurements. Also, while I will use strikeouts and walks and home runs allowed, I will rely heavily on runs allowed to evaluate pitchers, because it is more accurate to do that than to trust the thrie troo outkummes.
So I’m back to the problem: How do I (1) use single-season Park Factors, and (2) use pitcher’s runs allowed, without (3) getting a stupid number for Frank Sullivan (and Jackie Jensen) in 1955?
In a way, it is parallel to the United Airlines problem. . .I realize that none of you will understand what the hell I am talking about, but I have been thinking about writing this article for several years, and decided to do so because the United Airlines problem reminded me of it. All of the individual policies which led to the United Airlines Public Relations nightmare might be perfectly defensible. You COULD have policies like "take care of any and all overbooking issues BEFORE you put people on the plane, not after" or "pay whatever you have to pay to get customers to accept being taken off a plane"; you COULD have those policies, but probably the policies they did have are individually defensible. The only thing is that if you have that series of policies which CAN result in this kind of an outcome, then you need to have a fail-safe policy which says "No matter what happens, you don’t drag a 70-year-old man down the aisle of the airplane with blood running down his face."
The same here. You COULD have a policy of using three-year or five-year park factors; that would be OK. You COULD have a policy of using strikeouts, walks and homers allowed instead of runs allowed; that would be OK, although I’m not sure it would actually help in this case. Those options would be OK, but the other options would be OK, too.
The only thing is that if we’re going to use single-season Park Factors and runs allowed rates, then we need to have some sort of fail-safe policy which says "No matter what happens, you can’t publish ratings that you know are wrong. Frank Sullivan, 1955, was NOT the best pitcher/season of the 1950s, so you can’t publish ratings that show that he was."
So what I am really asking is, "What is the fail-safe policy that protects us in a case like this?"
The Roomba and the Corner
Our house has a lot of dust. It’s a great old house, but it is twenty years older than Fenway Park, so that’s an issue, but also, neither my wife nor I is inclined to grab a dust rag. I have allergies which the dust dus not help, and also, sometimes I am embarrassed by how much dust we have in the house. A couple of years ago, I told Susie that what I wanted for Christmas was one of them machines that pulls dust out of the air. We are reasonably well off and don’t need more stuff; we need less dust.
Well, you know how that goes; what you get for Christmas is what your wife thinks that you need, so she said "What about a Roomba instead?" A Roomba is a circular vacuum cleaner that moves randomly around the room for a while. It is not cheap, so we decided to get a Roomba for Christmas; this is what we got each other, half a Roomba each.
Well, the Roomba helps SOME with the dust in the house, although actually not very much; we still have dust covering everything in the house that isn’t moved every couple of weeks, but maybe it is 20% better than it used to be. The Roomba doesn’t do a great job, honestly; it will run over a dust bunny in the carpet five times and just push it further down into the carpet on each pass.
We’re not really unhappy with the Roomba. It’s not like we don’t use the Roomba; we use it every week, and sometimes we’ll use it every day for a couple of weeks. It’s not like it doesn’t pick up dirt and dust; it just doesn’t pick up ALL of the dirt and dust. Maybe it gets 60% of what it should get, so you run it a second time, and then you’ve got 84%, so you run it a third time, and then you’ve got 94%, so you run it a fourth time, and then you’ve got 97-98%. Having the Roomba do a room four times is a lot easier than vacuuming it yourself once, so it keeps the house a little bit cleaner, maybe. Also, the Roomba goes automatically under beds, chairs, sofas, cedar chests, desks, chests of drawers, bathroom cabinets, etc. In our house dust would pile up under some of those places for months, others for years, so there would be a lot of dust under there, contributing to the house’s general dust level. Roomba gets in there and cleans that dust out, so that makes a difference.
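Those percentages follow from assuming that each pass picks up the same 60% of whatever dust is still on the floor. A minimal sketch of that arithmetic, with a made-up function name:

```python
def cumulative_pickup(passes, per_pass=0.60):
    """Share of the dust removed after a number of passes, assuming each
    pass picks up the same fraction of whatever dust is still there."""
    return 1 - (1 - per_pass) ** passes


for n in range(1, 5):
    print(f"{n} pass(es): {cumulative_pickup(n):.1%}")
```

The fourth pass adds less than four percentage points, which is why the gains feel smaller every time you run it.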
Anyway, one of the Roomba’s problems is that it loves corners. It doesn’t actually GET to the dust in a corner; it’s round, so it can’t get into the corner of the corner. I mean like a corner of the room. If you have like a 2-foot by 3-foot area in a room, more or less blocked off from the room by furniture, Roomba will find that corner and get stuck there, cleaning that little corner of the room relentlessly until it shuts off. Very often, if you pick it up and put it out in the center of the floor, it will immediately drive back into the corner of the room and get stuck there again.
I don’t quite understand the mathematics of this; I sort of intuitively understand them, but I don’t really understand them. On one level, it seems that if the Roomba can find its way through a 12-inch opening to get INTO a corner, it should be able to find its way through the 12-inch opening to get OUT of the corner. It will, once in a while, but mostly not; mostly it just gets into the corner and stays there. You have to learn to block off the corners before you start the Roomba, which is still easier than vacuuming the room yourself.
But it seems to me that this is a remarkably good symbol of what happens in life. It seems to me that this is one of life’s great lessons: that it is much easier to get INTO a corner than it is to get OUT of a corner. I have known lots of people who got stuck in a corner, and just never got out, or got stuck in a corner and stayed there for decades before they could get out. Drug use is a corner, drug addiction; it is a hell of a lot easier to get into this habit than it is to get out. Alcohol use, tobacco use, sure, but there are lots of corners like that. A 17-year-old boy gets his 15-year-old girlfriend pregnant; he’s in a corner. They’re both in a corner. They don’t really like each other; they don’t belong together, and they’re not financially able to provide for a child. They get married, get divorced in three years. They’re stuck in a corner. It was a hell of a lot easier to get into that corner than it is to get out. You make a couple of bad bets, you owe a bookie $10,000, you borrow $10,000 from a mob guy to pay the bookie, you’ll be paying the mob $100 a week for the rest of your life. It was a lot easier to get into that corner than it is to get out. A young guy takes a dead-end job, just trying to make a living; then he buys a car that is a little more expensive than he can really afford, so he can’t afford to quit the job and look for another one. He’s in a corner. I have known people who got stuck in a corner like that for years. It was a hell of a lot easier to get in there than it is to get out.
Mathematically, there is probably something to do with the relationship between floor space and perimeter space which predicts the difficulty the Roomba has in escaping a corner—but does the same math apply to the human problems, or does some parallel math that we are unable to see apply to those problems? Just wondering. Also, I still want one of them machines that pulls the dust out of the air.