I am being peppered in "Hey, Bill" with questions about Batting Averages on Balls in Play. They are good questions. I did a couple of short studies related to the issue, so I thought I would report those here, and then I’ll answer the "Hey, Bill" questions as well, some of them by referencing this study.
One of the core functions of Sabermetrics is to distinguish between what there is in the statistics which is real and can be relied upon, and what there is that is merely a manifestation of luck. The first meaningful illustration of this in our field had to do with pitchers’ won-lost records. Up through the 1970s, it was almost universally accepted by baseball men that a pitcher’s won-lost record was the truest indicator of how well he had pitched, because "luck evens out over the course of a season." If a pitcher went 14-10 but with a 4.38 ERA, while a teammate went 9-15 but with an ERA a run better, people would say that the 14-10 pitcher had pitched well when it mattered, that he was a winner, that he knew how to win, that he knew how to close out a victory, etc. etc. Which, by the way, is the actual data from Joe Niekro and Ken Holtzman with the 1968 Cubs; Niekro was 14-10 with a 4.38 ERA, while Holtzman was 9-15 but with a 3.35 ERA. The next year Holtzman went 17-13, while Niekro went 8-18, so apparently over the winter Niekro forgot how to win, although he later remembered.
In 1978 Jim Slaton went 17-11 with a 4.12 ERA, while a teammate, Dave Rozema, went 9-12 with a 3.14 ERA. The same year Chris Knapp went 14-8 with a 4.21 ERA while a teammate, Paul Hartzell, pitched almost as many innings but went 6-10 with a 3.44 ERA. In 1977 Bob Forsch went 20-7 with a 3.48 ERA, while a teammate, Harry Rasmussen, posted the exact same ERA in more innings, but went 11-17. The explanation that prevailed at the time was that Forsch was a winner. The next year Forsch went 11-17. Harry Rasmussen changed his name to Eric.
What we saw, in our field, was that these things were not actually a result of a pitcher being a "winner" or a "loser", but of luck not evening out. When you actually looked at the number of runs scored for the "winners" and the number of runs scored for the losers, this was pretty obvious. All we were trying to say was that the won-lost record, in some outlying cases, some relatively unusual cases, was misleading because the element of luck reflected in it was imbalanced.
For ten years after we began making this argument, people pitched silly arguments at us in an effort to defend the traditional view of the issue. People took what we had said as if we had said that a pitcher’s won-lost record was meaningless. Well, they would say, but the pitcher himself is a hitter in the National League. Isn’t one of the reasons that one pitcher gets more runs than another just because he is a better hitter? And what about park effects. . .doesn’t the pitcher who gets more runs usually pitch in the high-run parks? And won’t he continue to pitch in the high-run parks the next year? And. .here’s one I heard a thousand times. . ."That’s what I’m saying, Bill. Blyleven pitched just well enough to lose."
Eventually, almost everyone who works in baseball came to understand what we were saying. Nobody who makes decisions for a baseball team in 2016 really believes that the won-lost record is a reliable indicator of how well a pitcher has pitched, although the old-line thinking survives among a few announcers and the oddball scout. This is what we do, in our field. If a team scores 670 runs and allows 700 runs but finishes 90-72, that isn’t a special, hidden ability; it’s luck. If a team goes 30-15 in one-run games, that’s luck. I don’t care how many clever ways you come up with to explain why it might not be luck; it’s luck. If 16% of the fly balls allowed by a pitcher turn into home runs, that’s bad luck. I don’t care how many ways you come up with to explain the subtleties of pitching to me; that’s just bad luck.
Allen Craig hit .400 with runners in scoring position in 2012, and .454 in 2013. That’s luck. Nobody actually has an ability to hit with runners in scoring position. The ability is to hit; there is no special ability which arises only in limited circumstances. When it happens, it’s luck. If it happens twice in a row, it’s just a LARGE pile of luck.
This Ball in Play issue is structurally identical to the run support/won-lost record debate. What we are trying to say is that a batting average in many cases involves an element of luck, and that we can actually see what that element of luck is by looking at the player’s batting average on balls in play. If he has hit .370 on balls in play, he’s exactly like a pitcher who got 6 runs a game to work with last year: he was lucky, and he’ll be less lucky next year, and you can bet on it.
Apparently, based on my mail, this message hasn’t been widely understood yet. Aren’t there some hitters who hit .340 or .345 on balls in play over the course of a career? Well, yes, of course there are a few hitters who have better-than-normal ability to hit the open spots in the defense; no one ever suggested that there were not. That’s like the occasional pitcher who hits .250 and has a little power, so he bumps up his offensive support a little bit. If a player hits a lot of line drives, doesn’t that increase his batting average on balls in play? Well, first of all, if a player hits a lot of line drives, that’s luck. 21% of balls in play are Line Drives. Miguel Cabrera hits 22% (Fangraphs). Derek Jeter in his career was 20%. David Ortiz in his career has been 20%. If you hit 25%, you’ve just been lucky.
Anyway, somebody asked me if anybody had studied the relationship between aging and batting averages on balls in play. Well. . hell, I don’t know. It’s easy to do. . .I’ll go study it.
I took all players in history who had 6000 or more plate appearances, who were not active in 1912 or before (because strikeout data is spotty from before 1912), and who are not active now (meaning that they did not play in 2015, except that I included Torii Hunter and Michael Cuddyer because they have announced their retirements). This came to 499 players. . .
You use long-career data for studies of aging so that you don’t get people popping into and out of the study. If you don’t use long-career data you get a constantly changing pool of players.
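The selection rule described above can be sketched in a few lines of Python. This is only an illustration, not the query actually used for the study: the `Career` record, its field names, and the sample players are all hypothetical, and the cutoffs are the ones stated in the text (6000 plate appearances, no play in 1912 or earlier, no play in 2015 unless the player has announced his retirement).

```python
from dataclasses import dataclass


@dataclass
class Career:
    """Hypothetical career record; the field names are assumptions."""
    name: str
    plate_appearances: int
    first_year: int
    last_year: int
    announced_retirement: bool = False


def qualifies(career: Career, current_year: int = 2015) -> bool:
    """Apply the three cutoffs stated in the text."""
    long_career = career.plate_appearances >= 6000
    # Not active in 1912 or before: the career must start after 1912.
    after_1912 = career.first_year > 1912
    # Not active now: did not play in the current year, unless retired.
    finished = (career.last_year < current_year
                or career.announced_retirement)
    return long_career and after_1912 and finished


# Made-up players, one per case, purely to exercise the rule.
sample = [
    Career("Long Career", 8200, 1955, 1972),
    Career("Short Career", 3100, 1960, 1968),
    Career("Started Too Early", 7000, 1910, 1926),
    Career("Still Active", 6500, 2004, 2015),
    Career("Just Retired", 9000, 1997, 2015, announced_retirement=True),
]

selected = [c.name for c in sample if qualifies(c)]
print(selected)
```

Only the long-career player and the one who has announced his retirement survive the filter, which is the shape of the pool the study describes.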
So, these 499 players. . .these are their Batting Averages on Balls in Play, by age:
AGE | IPAvg | | AGE | IPAvg | | AGE | IPAvg
21 | .301 | | 27 | .305 | | 33 | .297
22 | .301 | | 28 | .304 | | 34 | .297
23 | .305 | | 29 | .304 | | 35 | .295
24 | .305 | | 30 | .303 | | 36 | .293
25 | .306 | | 31 | .302 | | 37 | .293
26 | .308 | | 32 | .300 | | 38 | .288
This is the area in which we have at least 50,000 plate appearances at each age. The batting average on balls in play starts at .301 at age 21, goes up until age 26, and declines after age 26.
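The shape of that curve can be checked directly from the table. The sketch below simply copies the eighteen published figures into a dictionary (stored in thousandths, so 301 means .301) and looks for the peak; the variable names are illustrative and nothing here comes from the underlying player data.

```python
# In-play averages by age, copied from the table above, in thousandths
# (301 means .301) so that every comparison is an exact integer one.
IP_AVG_BY_AGE = {
    21: 301, 22: 301, 23: 305, 24: 305, 25: 306, 26: 308,
    27: 305, 28: 304, 29: 304, 30: 303, 31: 302, 32: 300,
    33: 297, 34: 297, 35: 295, 36: 293, 37: 293, 38: 288,
}

ages = sorted(IP_AVG_BY_AGE)
peak_age = max(ages, key=IP_AVG_BY_AGE.get)
peak = IP_AVG_BY_AGE[peak_age]

# Points gained from the youngest age to the peak, and lost afterward.
rise_to_peak = peak - IP_AVG_BY_AGE[ages[0]]
fall_from_peak = peak - IP_AVG_BY_AGE[ages[-1]]

# Does the curve ever dip on the way up, or bounce on the way down?
before = [IP_AVG_BY_AGE[a] for a in ages if a <= peak_age]
after = [IP_AVG_BY_AGE[a] for a in ages if a >= peak_age]
never_drops_before_peak = all(x <= y for x, y in zip(before, before[1:]))
never_rises_after_peak = all(x >= y for x, y in zip(after, after[1:]))

print(f"peak: .{peak} at age {peak_age}")
print(f"rise to peak: {rise_to_peak} points")
print(f"fall from peak: {fall_from_peak} points")
print(never_drops_before_peak, never_rises_after_peak)
```

On the published figures the peak is at 26, seven points above age 21 and twenty points above age 38, and the curve never reverses direction on either side of the peak.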
Well, that’s interesting. ..there is an age-related pattern. OK, now I’ve got the data together, let’s look at some other things.
In my data, these are the highest single-season batting averages on balls in play (300 or more plate appearances):
First | Last | YEAR | AGE | BIP | Hits | IPAvg
Babe | Ruth | 1923 | 28 | 388 | 164 | .423
Rogers | Hornsby | 1924 | 28 | 479 | 202 | .422
Harry | Heilmann | 1923 | 28 | 466 | 193 | .414
Rod | Carew | 1977 | 31 | 547 | 225 | .411
Rogers | Hornsby | 1921 | 25 | 523 | 214 | .409
From this we learn that the highest batting averages on balls in play were in the early 1920s. These aren’t the highest ever, possibly; these are just the highest in my data. Players who had long careers. In the dead ball era the outfielders played very shallow, much more shallow than they do today. When the lively ball era arrived (1920) you had outfielders playing shallow with batters who were capable of hitting the ball over their heads. These are the lowest in my data:
First | Last | YEAR | AGE | BIP | Hits | IPAvg
Ted | Simmons | 1981 | 31 | 334 | 68 | .204
Dick | McAuliffe | 1971 | 31 | 392 | 81 | .207
Al | Cowens | 1983 | 31 | 311 | 66 | .212
Everett | Scott | 1915 | 22 | 338 | 72 | .213
Willie | Jones | 1953 | 27 | 417 | 89 | .213
Frank | Bolling | 1964 | 32 | 303 | 65 | .215
Ed | Brinkman | 1965 | 23 | 357 | 77 | .216
Ed | Brinkman | 1972 | 30 | 459 | 99 | .216
Carlton | Fisk | 1985 | 37 | 425 | 92 | .216
So you can hit anywhere from .200 to .400 on balls in play, basically.
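The IPAvg column in both tables appears to be nothing more than the Hits column divided by the BIP column. A minimal Python sketch, assuming that reading and taking the two columns as published (the text does not spell out exactly which events count toward each), reproduces the figures at both extremes:

```python
# (player, year, balls in play, hits) copied from the two tables above.
SEASONS = [
    ("Babe Ruth", 1923, 388, 164),
    ("Rogers Hornsby", 1924, 479, 202),
    ("Ted Simmons", 1981, 334, 68),
    ("Dick McAuliffe", 1971, 392, 81),
]


def in_play_average(hits: int, balls_in_play: int) -> float:
    """Assumed definition: hits on balls in play per ball in play."""
    return hits / balls_in_play


def as_baseball_average(value: float) -> str:
    """Format 0.4227 the way the tables do, as '.423'."""
    return f"{value:.3f}".lstrip("0")


formatted = {
    (name, year): as_baseball_average(in_play_average(hits, bip))
    for name, year, bip, hits in SEASONS
}

for (name, year), avg in formatted.items():
    print(f"{name} {year}: {avg}")
```

The spread between the top and bottom of the data is about 220 points, which is the ".200 to .400" range in round numbers.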
Next thing I looked at was batting averages on balls in play of players up to age 28, and after age 28. Up to age 28, the highest (career) batting averages on balls in play were these ten guys:
First | Last | BIP<28
Rogers | Hornsby | .370
Wade | Boggs | .370
Kiki | Cuyler | .370
Rod | Carew | .367
George | Sisler | .361
Paul | Waner | .360
Derek | Jeter | .357
Jeff | Conine | .355
Al | Simmons | .353
Earle | Combs | .353
Conine is kind of a fluke there; he didn’t have many plate appearances up to age 28. Anyway, one of my questions in "Hey, Bill" was to the effect that Wade Boggs was consistent in having very high Batting Averages on Balls in Play. Actually, Boggs’ batting average on balls in play was .370 up to age 28, but .337 after that—the largest slippage on this list, other than Conine.
First | Last | BIP<28 | Past 28
Rogers | Hornsby | .370 | .357
Wade | Boggs | .370 | .337
Kiki | Cuyler | .370 | .338
Rod | Carew | .367 | .358
George | Sisler | .361 | .337
Paul | Waner | .360 | .329
Derek | Jeter | .357 | .344
Jeff | Conine | .355 | .309
Al | Simmons | .353 | .323
Earle | Combs | .353 | .328
But everybody regresses toward the mean, over time. Players who hit .340 up to age 28 will hit less than .340 after 28, although they will stay higher-than-normal. These are the players who have the lowest in-play averages up to age 28:
First | Last | BIP<28 | Past 28
Mark | McGwire | .228 | .272
Graig | Nettles | .232 | .250
Harmon | Killebrew | .237 | .254
Dave | Kingman | .239 | .253
Rocky | Colavito | .239 | .258
Ralph | Kiner | .243 | .264
Tom | Brunansky | .245 | .275
Willie | Jones | .247 | .261
John | Mayberry | .248 | .249
Matt | Williams | .249 | .303
The guys with the lowest in-play averages are the guys with the uppercuts, the home run swings. Speed also has something to do with it. If you have a very low in-play average up to age 28 it will go up after age 28, but it will remain low.
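The pull toward the middle in both lists is easy to verify from the two before-and-after tables. The sketch below copies those twenty pairs of figures (in thousandths, so 370 means .370) and computes each player's change; the names `HIGH`, `LOW`, and `changes` are illustrative only.

```python
# (average up to age 28, average after 28), in thousandths, from the
# two tables above.
HIGH = {
    "Rogers Hornsby": (370, 357), "Wade Boggs": (370, 337),
    "Kiki Cuyler": (370, 338), "Rod Carew": (367, 358),
    "George Sisler": (361, 337), "Paul Waner": (360, 329),
    "Derek Jeter": (357, 344), "Jeff Conine": (355, 309),
    "Al Simmons": (353, 323), "Earle Combs": (353, 328),
}
LOW = {
    "Mark McGwire": (228, 272), "Graig Nettles": (232, 250),
    "Harmon Killebrew": (237, 254), "Dave Kingman": (239, 253),
    "Rocky Colavito": (239, 258), "Ralph Kiner": (243, 264),
    "Tom Brunansky": (245, 275), "Willie Jones": (247, 261),
    "John Mayberry": (248, 249), "Matt Williams": (249, 303),
}


def changes(group):
    """Points gained (+) or lost (-) after age 28, per player."""
    return {name: after - before for name, (before, after) in group.items()}


high_changes = changes(HIGH)
low_changes = changes(LOW)

# Players in the high group ordered from biggest drop to smallest.
biggest_drops = sorted(high_changes, key=high_changes.get)

avg_high_change = sum(high_changes.values()) / len(high_changes)
avg_low_change = sum(low_changes.values()) / len(low_changes)

# Even after regressing, do the two groups still stay apart?
groups_stay_apart = (max(after for _, after in LOW.values())
                     < min(after for _, after in HIGH.values()))

print(biggest_drops[:2], avg_high_change, avg_low_change, groups_stay_apart)
```

All ten high-average players drop and all ten low-average players rise, by roughly 25 points on average in each direction, yet the highest post-28 figure in the low group (.303) is still below the lowest post-28 figure in the high group (.309).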
Anyway, I did one more study, and this is really the critical one; this is the money shot. The essence of the issue is:
a) whether the player has a high in-play average in the base season, and
b) whether his batting average goes up or down in the next season.
If a player hits .385 on balls in play in one season, how likely is it that his batting average will go down the next season? Not his in-play batting average; his OVERALL batting average. Is it 60%? 70%?
It’s over 80%. If we had more data, it might be 90%.
I sorted the players by their in-play batting averages, what people have taken to calling BABIP. Hate that expression; it’s like scraping chalk on the blackboard. Anyway, Group 1 was players who had an in-play average of .380 or above, Group 2 was .360 to .379, Group 3 was .340 to .359, Group 4 was .320 to .339, etc. Group 10 was below .220.
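The grouping is a fixed set of 20-point bands, so it can be written as a short lookup. This is a sketch of one way to do it, not the code used for the study; `in_play_group` is a made-up name, and Group 10 is treated as anything below .220, which is how the last row of the results table labels it.

```python
# Lower bound of each band in thousandths: Group 1 starts at .380,
# Group 2 at .360, and so on down to Group 9 at .220.
LOWER_BOUNDS = [380, 360, 340, 320, 300, 280, 260, 240, 220]


def in_play_group(average: float) -> int:
    """Return the group number (1-10) for an in-play average."""
    # Work in whole points so that .360 lands exactly on a boundary
    # instead of depending on floating-point representation.
    points = round(average * 1000)
    for group, lower in enumerate(LOWER_BOUNDS, start=1):
        if points >= lower:
            return group
    return 10  # anything below .220


for avg in (0.385, 0.370, 0.330, 0.270, 0.220, 0.219):
    print(f"{avg:.3f} -> Group {in_play_group(avg)}")
```

A .330 season falls in Group 4 and a .270 season in Group 7, which are the two "fat part of the chart" cases discussed at the end.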
Of those who had an in-play average of .360 to .379, 85% had a drop in their overall batting average in the following season (if they had 300 or more plate appearances in the following season.) Of those who had an in-play average of .220 to .239, 90% had an INCREASE in their overall batting average in the following season:
Group | In-Play Average | Count | Up | Down | Majority Pct
1 | .380 and up | 68 | 12 | 56 | 82%
2 | .360 to .379 | 175 | 27 | 148 | 85%
3 | .340 to .359 | 402 | 102 | 300 | 75%
4 | .320 to .339 | 844 | 286 | 558 | 66%
5 | .300 to .319 | 1202 | 532 | 670 | 56%
6 | .280 to .299 | 1125 | 611 | 513 | 54%
7 | .260 to .279 | 684 | 475 | 209 | 69%
8 | .240 to .259 | 283 | 220 | 63 | 78%
9 | .220 to .239 | 77 | 69 | 8 | 90%
10 | Below .220 | 13 | 12 | 1 | 92%
At the ends of the chart, we approach predictions with near to 100% reliability, 90% or thereabouts. But what is more surprising, to me, is the strength of the indicator, even if the deviance from normal is relatively modest. If a player hits just .330 on balls in play, a fairly modest average, really still in the fat part of the chart, the odds that his batting average will decline the following season (if he has 300 or more plate appearances) are two to one. If he hits just .270 on balls in play, the odds that his batting average will increase are two to one. So even in the fat part of the chart, the Ball in Play average provides a very strong indicator of whether the player has been living on luck.
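The percentages and the "two to one" odds can be recomputed from the Count, Up, and Down columns of the chart. The sketch below copies those three columns as published and assumes the Majority Pct column is the larger of Up and Down divided by Count; the function and variable names are illustrative.

```python
# (group, count, up, down) copied from the chart above. In Group 6,
# Up + Down is one short of Count; the text does not say why, so the
# published Count is used as the denominator throughout.
ROWS = [
    (1, 68, 12, 56), (2, 175, 27, 148), (3, 402, 102, 300),
    (4, 844, 286, 558), (5, 1202, 532, 670), (6, 1125, 611, 513),
    (7, 684, 475, 209), (8, 283, 220, 63), (9, 77, 69, 8),
    (10, 13, 12, 1),
]


def summarize(count: int, up: int, down: int):
    """Return (majority direction, majority percent, odds in favor)."""
    direction = "down" if down > up else "up"
    majority = max(up, down)
    percent = round(100 * majority / count)
    odds = majority / min(up, down)
    return direction, percent, odds


summary = {group: summarize(count, up, down)
           for group, count, up, down in ROWS}

for group, (direction, percent, odds) in summary.items():
    print(f"Group {group:>2}: {percent}% {direction}, "
          f"odds about {odds:.1f} to 1")
```

Group 4, which contains the .330 hitter, comes out at just under two to one that the batting average falls (558 against 286), and Group 7, which contains the .270 hitter, at a little better than two to one that it rises (475 against 209).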
Thanks for reading.