BILL JAMES ONLINE

Estimates

September 6, 2020

How do you estimate things? For a long time, I just took wild guesses and I did okay. But lately I’ve been more methodical. I wonder if anyone else uses the method I’ve developed, or would care to.

I call it the Halves method. Say I’m on my bike, and I want to ride for twenty minutes but I have no timepiece with me. Instead of just riding and guessing "OK, this feels like twenty minutes," what I’ll do is ask myself "What is the least amount of time I could possibly have ridden so far?" and also guess the most. Say those figures are 1 minute and 25 minutes. I’ll then find the midway point, 13 minutes, and ask myself "OK, now if you had to pick a number halfway between that mid-point and either extreme, which way would you go?" Say I decide to pick the higher figure: the new mid-point between 13 minutes and 25 minutes is 19 minutes, which tells me that I’m almost done riding. When I get home to check my clock, I’ll usually find that "19 minutes" was a pretty close estimate. Sometimes I get surprised, but not very often, and not by much.

Doing a further step might yield greater precision, but it requires much more work and moves the 19 by only a few minutes, up or down. (It’s much more work, I find, because at that point I have no clue which direction, high or low, makes more sense, so I’m reduced to wild guessing, which is exactly what this method is meant to avoid.)
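
Here is one way to see why a further step buys so little. Write L and H for the least and most figures (notation introduced here for illustration, not part of the method as described), and assume each further step simply repeats the same halving. Then every step moves the estimate half as far as the step before it:

```latex
\begin{align*}
m_0 &= \frac{L + H}{2} \\
m_1 &= m_0 \pm \frac{H - L}{4} \\
m_2 &= m_1 \pm \frac{H - L}{8}
\end{align*}
```

Under that assumption, a third step can move the estimate by only an eighth of the original range, and it asks for a high-or-low call inside a window half as wide as the one before.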

With baseball stats, this method is often useful. Say I’m wondering how many home runs Dick Allen hit. I have no specific memory of that particular number, but my method tells me that he certainly hit at the very least 250, and I don’t think he played enough to have hit any more than 370. So I’ve got a midpoint of 310, and I think he hit between 310 and 370 rather than between 250 and 310, so my final midpoint is 340.

The number turns out to be 351, so I think I came closer using my method than I would have if I’d just guessed at the number.
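
The arithmetic can be sketched in a few lines of Python. This is only an illustration of the two steps described above; the function name `halves_estimate` and its `go_high` parameter are made up for this sketch.

```python
def halves_estimate(low, high, go_high):
    """Two-step Halves estimate: take the midpoint of the two extremes,
    then the midpoint between that and whichever extreme you lean toward."""
    midpoint = (low + high) / 2
    chosen_extreme = high if go_high else low
    return (midpoint + chosen_extreme) / 2


# Dick Allen's home runs: extremes of 250 and 370, leaning high.
estimate = halves_estimate(250, 370, go_high=True)
print(estimate)             # 340.0
print(abs(351 - estimate))  # 11.0, the miss against the actual 351
```

The only judgment calls are the two extremes and the single high-or-low choice; everything else is averaging.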

Let’s try it again, with a number I’m even less likely to have memorized. Say I’m looking for the lifetime winning percentage for Jerry Koosman. I know Koos had some 20-loss seasons, and he didn’t get great run support with the Mets, so I’d guess there’s no way he wound up with a w/l pct very close to .600. Let’s peg the absolute top figure at .580, and I can safely say he was too good a pitcher to go below .500, so let’s set .510 as the bottom figure, yielding a midpoint of .545. Do we want to go now with the higher figure or the lower one? I’d say the lower one, giving us a final number of .527.

Koosman’s winning percentage turns out to be .515, very close to my absolute bottom starting figure of .510, but not that far off my final figure of .527, either.

Suppose I’m trying to figure out how many times Ron Hunt was hit by pitches in his career. I know the famous top figure is 50 in one painful season, but I also know that was an anomalous figure, not even approached in Hunt’s other years, and also that he rarely if ever played a really full season, and that there were years early in his career when he wasn’t really HBPed that much, so I’ll set the high figure at 270 and the low figure at 140. Leaving out most of the arithmetic (a midpoint of 205, then the higher half), that gives me a final figure of 237.

Hunt was actually HBPed 243 times.

I was a big Mets fan in Koosman’s and Hunt’s heyday, so maybe that gives me a little advantage in choosing them for my examples, but if I chose a non-Met, I think I’d just have to start with wider parameters. Take, say, Tony Oliva’s RBIs. Do you have an idea off the top of your head how many men Oliva drove in during his career? Me, neither.

We all know, I think, that he was a real good hitter with some power, but he wasn’t a gigantic RBI guy, and he was hurt a lot, so: I think 1400 RBIs is the most he could have conceivably driven in, and 800 is my lower figure. Does that seem fair? That gives us a midpoint of 1100 runs, and since 1400 is like Jim Rice/Orlando Cepeda territory, I’d pick the lower end as the way I’d have to go, yielding 950 as my final estimate. And—Holy crap! I’ve hit this one right on the head. Oliva drove in 947 runs in his career.

That was supposed to be my example of what happens when I have no idea at all, so in a way that example didn’t work out too well. But I’m sure there are examples of wrongheaded choices. There are two steps in the process that could easily go wrong: missing one of the extremes, high or low, and missing which direction to go in from the mid-point. Do either of those, and you’re bound to be way off.

If I had chosen the higher figure for Oliva, for example, I’d have been WAY off the mark, or if I’d set the lower figure even lower, say at 600, I’d have been way off in my final estimate. And I’m sure there are instances where I’ll make one of those mistakes, and sometimes both of them, but if I can avoid those Scyllas and Charybdises, I can find my way back home.
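
To put numbers on those two ways of going wrong, here is a small Python sketch that reruns the Oliva estimate under each mistake. The helper `halves_estimate` and the scenario labels are hypothetical names used only for this illustration; the figures come from the paragraphs above.

```python
def halves_estimate(low, high, go_high):
    """Midpoint of the extremes, then midpoint toward the chosen extreme."""
    midpoint = (low + high) / 2
    chosen_extreme = high if go_high else low
    return (midpoint + chosen_extreme) / 2


ACTUAL_RBI = 947  # Oliva's career total, as given in the text

# (low, high, go_high) for each scenario
scenarios = {
    "as chosen (800 to 1400, lean low)": (800, 1400, False),
    "wrong direction (800 to 1400, lean high)": (800, 1400, True),
    "floor too low (600 to 1400, lean low)": (600, 1400, False),
}

misses = {}
for label, (low, high, go_high) in scenarios.items():
    estimate = halves_estimate(low, high, go_high)
    misses[label] = abs(estimate - ACTUAL_RBI)
    print(f"{label}: estimate {estimate:.0f}, off by {misses[label]:.0f}")

# as chosen (800 to 1400, lean low): estimate 950, off by 3
# wrong direction (800 to 1400, lean high): estimate 1250, off by 303
# floor too low (600 to 1400, lean low): estimate 800, off by 147
```

In this one case, picking the wrong direction costs about twice as much as setting the floor 200 too low, though a single example can't show that this holds in general.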

Try this method, and let me know how it works out for you, both the successes and the failures of the method. I find it fun to think about, riding on my bike.

 
 

COMMENTS (2 Comments, most recent shown first)

jfenimore
You can make it home if you avoid those Sirens.
5:13 PM Sep 6th
 
jrickert
This reminds me of a demonstration often done in some statistics classes. The students are asked to make ten 90% confidence intervals for some quantities (length of the Missouri River, weight of the building they are sitting in, the number of mosquitos born in the state of Minnesota in 2019, etc.). If they make good intervals, the true values should fall within 9 out of 10 of their intervals on average. Usually, the average is around 3 or 4 out of 10, an illustration that people often underestimate the uncertainty. That's something to be careful about when making these estimates.
Our general knowledge of baseball does help create tight bounds, making it plausible that this might allow us to get in the neighborhood.
5:00 PM Sep 6th
 
 
© 2011 Be Jolly, Inc. All Rights Reserved.