Reliable and Unreliable Batting Averages
We have this problem, in my Ballpark League, of pitchers being coded with batting averages which are wildly inaccurate representations of their true batting ability. To take a case in point, some of you who are old farts like me will remember that in the early 1960s Hank Aguirre was the most notorious bad-hitting pitcher in baseball. He was supplanted later in the decade by Sandy Koufax and then by Dean Chance, but when I first started reading baseball stuff, jokes and stories about what a terrible hitter Hank Aguirre was were common. Aguirre had 388 at bats in his career and collected 33 hits for a career average of .085. He never hit a home run, and he struck out in more than 60% of his career at bats, 236 out of 388.
In 1967, however, the 36-year-old reliever had two at bats, and hit a triple. It was the only triple of his career. He hit .500 for the season—1 for 2—so Ballpark, showing a rather bizarre lack of insight into the situation, makes him a .500 hitter for the season. Since his only hit was a triple, almost every at bat on his card is a triple. This distorts his role on the team immensely, as Ballpark managers are so tempted to sneak him into the game before the pitcher’s spot is due up. It’s a BIG deal, actually, because if you’re down one run in the top of the ninth inning and you can put him in the game, then you’ve got a .500 hitter with a slugging average higher than Barry Bonds at his best leading off the bottom of the ninth inning. It makes a difference.
I use this extreme example to illustrate the concept of a "true" batting ability. Obviously Hank Aguirre’s true batting average is not .500. That’s an unreliable number. The question I am getting to is, how many at bats does it take before a player’s batting average becomes a reliable statement of his underlying ability? How reliable is a player’s batting average (as an indicator of his true ability) after 100 at bats, or 200 at bats, or 500 at bats?
I should stress here that I am only talking about the true ability to hit for average. Obviously there are some .270 hitters who are tremendously productive, like Mike Schmidt, and there are some .270 hitters who are not at all productive, like Manny Trillo (who was actually a .263 hitter, but you get the point). I am not dealing with that issue at all here; I am simply talking about the reliability of the batting average. I’m not talking about park effects, either.
I don’t know why it took me until I was 72 years old to study this, but this is a pretty direct, simple study that addresses that question to my satisfaction. What I did was:
1) Generated a "known" batting average for each of 1,000 simulated players. The known batting average could in theory be anywhere between .170 and .370, but tended to cluster around .270.
2) Created a randomized process in which a player would exactly match that batting average in an infinite number of at bats, but the sequence of hits and outs was random, and
3) Compared the "generated" or "output" batting average with the "known" or "input" batting average.
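The three steps above can be sketched in a few lines of Python. The article doesn’t say exactly how the spreadsheet generated the "known" averages, so the generating process here—averaging four uniform draws to cluster the true average around .270 within the .170–.370 range—is my assumption, not the author’s method:

```python
import random

random.seed(1)  # reproducible example

def simulate_player(n_at_bats):
    """One simulated player: a 'known' true average between .170 and .370,
    clustered around .270, then a random sequence of hits and outs at that
    rate, producing an 'output' trial average."""
    # Averaging four uniform draws clusters the result around .270
    # (an assumed generating process, not necessarily the article's).
    true_avg = 0.170 + 0.200 * sum(random.random() for _ in range(4)) / 4
    hits = sum(1 for _ in range(n_at_bats) if random.random() < true_avg)
    return true_avg, hits / n_at_bats

true_avg, trial_avg = simulate_player(550)
print(round(true_avg, 3), round(trial_avg, 3))
```

Repeating that for 1,000 players at each number of at bats gives the comparison the study is built on.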
In other words, how long does it take for a .270 hitter to hit .270, and a .318 hitter to hit .318? How many at bats does that take?
There are two concepts here that you could confuse if I don’t do a good job of distinguishing them. Those two concepts are accuracy and reliability. A player’s batting average may be an accurate representation of his true batting skill in a very small number of at bats. A player may have a true batting average of .250, and in four at bats he may hit .250. His batting average is a perfectly accurate representation of his skill level.
It MIGHT be, but we have no way of knowing. It’s not reliable information. Accuracy concerns the batting average of a single player. Reliability is a characteristic of a group of batting averages, which changes with the number of at bats.
Circling back to this:
2) I created a randomized process in which a player would exactly match that batting average in an infinite number of at bats, but the sequence of hits and outs was random.
There will be readers who will say to themselves, "But hitting is not random. It depends on the performance of the pitcher, the performance of the hitter. It’s a skill, not a random occurrence."
Well, yes. A series of at bats by a hitter is not a random sequence, but it has almost all of the identifiable characteristics of randomness. It’s like your dog. Somewhere else in the world there is another dog who has the same number of legs and eyes and ears as your dog, a dog who is the same height and weight as your dog, the same color and color patterns as your dog; in short, another dog which is virtually identical to your dog. He has (nearly) all the characteristics of your dog, but he is not your dog. In the same way, a series of at bats by a player is not random, but, because a long, long series of randomizing factors separates each at bat from the next, the pattern has nearly all the identifiable characteristics of randomness.
So anyway, how long does it take before a .270 hitter can reliably be expected to hit .270?
Well, what do we mean by reliable, within this study?
"Reliability" means the frequency of accuracy. Let us say that after X number of at bats, every player’s batting average was the same as his true ability. If that were true, then we could say that the player’s known batting average is a 100% reliable indicator of his true hitting ability after X at bats.
But that means that we have to define "accuracy". If a player has a true batting level of .280 but hits .282, you wouldn’t say that was "inaccurate", would you? That’s pretty accurate.
But if a player has a true batting level of .280 and hits .325, you wouldn’t say that his batting average was an accurate measure of his batting skill, would you? I am sure you wouldn’t. He’s not really a .325 hitter.
I decided that, for purposes of the study, 30 points was the outside limit of what might be called "accurate." If a player who is a true .260 hitter hits .289, that’s not very accurate, but it is more accurate than if he hits .320 or .195. It’s not wildly inaccurate.
This is the scale I used. If a player’s "trial" batting average is the same as his "true" batting average, then the batting average is 100% accurate.
If a player’s trial batting average is 1 point off from his true batting average, I would score that as 97% accurate—29 out of 30.
If a player’s trial batting average is 2 points off from his true batting average, I would score that as 93% accurate—28 out of 30.
If a player’s batting average in the trial is 25 points off from his true, long-term batting average, I would score that as only 17% accurate—5 out of 30.
And if a player’s batting average was 30 points off from his true batting average, that would not be considered accurate. 29 points off, 3% accurate, but 30 points off, 0% accurate.
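That scale reduces to a one-line formula: count the points of difference between the trial and true averages, subtract from 30, floor at zero, and divide by 30. A minimal sketch:

```python
def accuracy(true_avg, trial_avg):
    """Score a trial batting average against the true one on the article's
    30-point scale: 0 points off = 100% accurate, 30+ points off = 0%."""
    points_off = round(abs(trial_avg - true_avg) * 1000)
    return max(30 - points_off, 0) / 30

# 1 point off scores 29/30, 25 points off scores 5/30, 30+ scores zero.
print(round(accuracy(0.280, 0.281), 2))  # 0.97
print(round(accuracy(0.280, 0.305), 2))  # 0.17
print(round(accuracy(0.280, 0.325), 2))  # 0.0
```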
The reliability is the average level of accuracy, among many players, for a certain number of at bats. At 50 at bats, the average hitter prototype within the study had a batting average which was 17% accurate as a representation of his true underlying batting average. On average, if a player has 50 at bats, his batting average is going to be 25 points off from what it should be.
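Putting the pieces together, reliability at a given number of at bats is just the mean accuracy score over many simulated players. A sketch, again assuming my own generating process for the true averages:

```python
import random

random.seed(7)

def reliability(n_at_bats, n_players=1000):
    """Mean accuracy score across simulated players: the error in points
    is scored on the 30-point scale, with 30+ points off counting as zero."""
    total = 0.0
    for _ in range(n_players):
        # True average clustered around .270 (assumed generating process).
        true_avg = 0.170 + 0.200 * sum(random.random() for _ in range(4)) / 4
        hits = sum(1 for _ in range(n_at_bats) if random.random() < true_avg)
        points_off = abs(hits / n_at_bats - true_avg) * 1000
        total += max(30.0 - points_off, 0.0) / 30.0
    return total / n_players

# At 50 at bats this lands in the neighborhood of the article's 17% figure.
print(round(reliability(50), 2))
```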
This is a chart of the reliability of batting averages up to 100 at bats. Read "more than" into each of the average-error figures; I’ll skip typing it, but you’ll know it is there. I’ll explain later.

 10 At Bats     8%    Average error 28 points of batting average
 20 At Bats    11%    Average error 27 points of batting average
 30 At Bats    14%    Average error 26 points of batting average
 40 At Bats    15%    Average error 25-26 points of batting average
 50 At Bats    17%    Average error 25 points of batting average
 60 At Bats    19%    Average error 24 points of batting average
 70 At Bats    21%    Average error 24 points of batting average
 80 At Bats    23%    Average error 23 points of batting average
 90 At Bats    25%    Average error 22-23 points of batting average
100 At Bats    26%    Average error 22 points of batting average
So after 100 at bats, a player’s batting average is about 26% reliable as an indication of his true ability to hit for average, and the average hitter is more than 22 points off from his true, long-term batting average. There is another useful question we can ask here, which is "At X number of at bats, how many players will be within 30 points of their true batting average?"
At 20 at bats, 21% of players will have a batting average within 30 points of their true batting average. At 30 at bats, that’s 28%. Then, increasing by 10 at bats at a time, the numbers are 31%, 35%, 38%, 42%, 46%, 48% and 51%. After 100 at bats, essentially one-half of players will have a batting average which is within 30 points of what they should be hitting, and about one-half will not.
The "more than" thing. . . .let’s say that at a certain level of at bats, a batting average is 10% reliable. That would mean that it is 90% inaccurate. Assuming that 30 points is zero percent reliable, that would suggest that 10% reliable means 27 points average error.
Yes, but not quite, because errors larger than 30 points are being effectively counted as 30 points. If a player’s batting average is 30 points away from what it should be, that is counted as zero, but if it is 40, 50 or 60 points away from what it should be, that also is counted as zero. So an average error of 27 points actually means more than 27 points. This effect lasts until 1500 to 2000 at bats. After 1500 to 2000 at bats there are no players left who have discrepancies greater than 30 points, so after 2000 at bats it is no longer "greater than 8 points"; it is just 8 points. (If I was doing the study over I would have steered around that problem, but you know. . . .at some point you have to move on to the next study.)
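The capping effect is easy to see with a toy example: errors beyond 30 points contribute nothing extra to the scale, so the average error the scale implies understates the real one. A sketch with five hypothetical players:

```python
# Errors larger than 30 points all score zero on the 30-point accuracy
# scale, so the average error implied by the scale understates the truth.
errors = [5, 15, 25, 35, 60]  # points off, for five hypothetical players
capped = [min(e, 30) for e in errors]

implied = sum(capped) / len(capped)   # what the accuracy scale implies
actual = sum(errors) / len(errors)    # the real average error

print(implied, actual)  # 21.0 28.0
```

Once no player is more than 30 points off, the capped and actual averages coincide, which is why the "more than" qualifier disappears after 1,500 to 2,000 at bats.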
Taking it up from there:
At Bats                                         125    150    175    200    225    250    275    300
Reliability of Batting Average                  28%    31%    34%    35%    37%    39%    40%    41%
Average Error of Batting Average               .022   .021   .020   .020   .019   .018   .018   .018
% of Players within 30 Points of True Average   53%    60%    63%    64%    67%    69%    72%    74%
After 300 at bats, a player’s batting average is 41% reliable as an indication of what he should hit. 74% of players are within 30 points of their true ability at that point, but 30 points isn’t much of a claim; that’s a 60-point window out of a range that runs generally from .200 to .340, or 140 points.
What do you think of as a full season’s worth of at bats? How about 550? 550 at bats is about what a regular gets in a season, isn’t it? How reliable is the batting average at that point?
At Bats                                         350    400    450    500    550    600    650    700
Reliability of Batting Average                  44%    46%    49%    51%    53%    55%    57%    58%
Average Error of Batting Average               .017   .016   .015   .015   .014   .013   .013   .013
% of Players within 30 Points of True Average   77%    81%    85%    89%    90%    91%    93%    95%
At 550 at bats the batting average is 53% reliable as an indicator of what the player should hit, but the average hitter is still 14 points away from where he should be—meaning, of course, that he is exactly on target sometimes and 30 points off sometimes. If a player hits .330 one year and .285 the next, the media will give you explanations for it, but it doesn’t really mean anything; it’s just random fluctuation. It’s just something that happens. The chip shots and line drives caught by the shortstop don’t even out in 550 at bats. It takes more than a season for that to happen.
Let’s go now to the multi-season numbers:
At Bats                                         1000   1500   2000   2500   3000   4000   5000   6000
Reliability of Batting Average                   64%    70%    73%    76%    77%    81%    83%    84%
Average Error of Batting Average                .011   .009   .008   .007   .007   .006   .005   .005
% of Players within 30 Points of True Average    98%    99%   100%   100%   100%   100%   100%   100%
Of note there: after 1500 at bats the average player’s batting average is within 10 points of what he should hit. It takes about three years as a regular for the AVERAGE gap between a player’s batting average and his true skill level to shrink to less than 10 points. By 2000 at bats, 100% of players are hitting within 30 points of what they should hit—not a true 100%, but 100% rounded off. There was one "player" in the study who was still 31 points off target after 4,146 at bats.
6,000 at bats is a pretty long career, ten years as a regular or a little more. After 6,000 at bats you can be really, really certain that a player’s batting average is within 30 points of his true skill level, and you can be 90% sure that it is within 10 points. And finally, this chart:
At Bats                                         7000   8000   9000   10000  11000  12000  13000
Reliability of Batting Average                   86%    87%    87%    88%    89%    89%    90%
Average Error of Batting Average                .004   .004   .004   .004   .003   .003   .003
% of Players within 30 Points of True Average   100%   100%   100%   100%   100%   100%   100%
I stopped at 13,000 because that’s about as long as a career can go, but also because the data was stressing out my poor little computer. 1,000 players and 13,000 events for each player, and it takes 9 cells to generate and record each event, figure each batting average and compare it to the "true batting average"; with some other counting stuff the spreadsheet is over 250 million cells, which a small personal computer has been known to complain about. I don’t actually have to keep all 250 million cells "live" at the same time. I don’t think my computer would do that.
One more little point. . . .a low batting average achieves a higher level of reliability than a high batting average, in the same number of at bats. In 200 at bats, a .300 hitter is more likely to hit .350 than a .200 hitter is to hit .250. Fairly obvious reasons for that, so I’ll skip the explanation.
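For readers who want the skipped explanation anyway: under a simple binomial model (my framing, not the article’s), the standard deviation of a batting average after n at bats is sqrt(p(1-p)/n), which grows as p moves toward .500. So a .300 hitter’s average swings more than a .200 hitter’s in the same number of at bats:

```python
import math

def ba_sd(true_avg, n_at_bats):
    """Standard deviation of an observed batting average after n at bats,
    treating each at bat as an independent hit/out with probability p."""
    return math.sqrt(true_avg * (1 - true_avg) / n_at_bats)

# The .300 hitter's average is the noisier one in 200 at bats:
print(round(ba_sd(0.300, 200), 4))  # 0.0324 -> .350 is about 1.5 SD away
print(round(ba_sd(0.200, 200), 4))  # 0.0283 -> .250 is about 1.8 SD away
```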
Anyway, the conclusion is that for all of the bloops and bleeders to reliably even out, so that the batting average represents skill and conditions but is no longer meaningfully affected by luck. . .that takes much, much longer than any player’s career. In 13,000 at bats, a number that almost nobody reaches, a batting average is 90% reliable, but the average player is still 3 points over what he deserved or 3 points under it. As to how many at bats it would take before the batting average of a player is 100% accurate as to his skill level, I really don’t know, but it is obvious from the data that it would be more than 50,000 at bats.
One other note of explanation. . .. .I am not able to comment on your responses to this or any other article. It’s a software issue. You may have noticed that our software is not always what we would want it to be. The software architecture was created 15 years ago by a company that we have not worked with for quite a few years now, and it’s a struggle for us to keep it working.
I had this problem with the software, which was that I had two log-in IDs with the same name but different passwords, one of which took me to the reader experience and one of which took me to the Admin site. When the Admin software forced me to update my password, which it does pretty often, something weird would happen that we don’t understand, and I would have to re-create my entry path into the reader portion. That was annoying, so I asked the computer guys if they could fix it so that I had only one log-in, and they did, but after they fixed it I was never able to get on to the site as a user at all. I can get on to READ it, as anyone can; I can read your posts, but I can’t respond to them, haven’t been able to for months. So. . .it’s awkward. You probably know that our software is awkward, but you have no idea how awkward it really is. I offer you my apologies, and my regrets for my inability to respond to your comments. There is probably some way to do it, and there is probably some way to train a hummingbird to light on your finger, but I have no idea what either one of those is. Thanks for reading.