The Ball in Play

By Bill James

April 28, 2016

I am being peppered in "Hey, Bill" with questions about Batting Averages on Balls in Play, which are good questions, but I did a couple of short studies related to the issue, and I thought I would report those here, and then I’ll answer the "Hey, Bill" questions as well, but some of them I’ll answer by referencing this study.

One of the core functions of Sabermetrics is to distinguish between what there is in the statistics which is real and can be relied upon, and what there is that is merely a manifestation of luck. The first meaningful illustration of this in our field had to do with pitchers’ won-lost records. In the 1970s and before, it was almost universally accepted by baseball men that a pitcher’s won-lost record was the truest indicator of how well he had pitched, because "luck evens out over the course of a season." If a pitcher went 14-10 but with a 4.38 ERA, while a teammate went 9-15 but with an ERA a run better, people would say that the 14-10 pitcher had pitched well when it mattered, that he was a winner, that he knew how to win, that he knew how to close out a victory, etc. etc. Which, by the way, is the actual data from Joe Niekro and Ken Holtzman with the 1968 Cubs; Niekro was 14-10 with a 4.38 ERA, while Holtzman was 9-15 but with a 3.35 ERA. The next year Holtzman went 17-13, while Niekro went 8-18, so apparently over the winter Niekro forgot how to win, although he later remembered.

In 1978 Jim Slaton went 17-11 with a 4.12 ERA, while a teammate, Dave Rozema, went 9-12 with a 3.14 ERA. The same year Chris Knapp went 14-8 with a 4.21 ERA while a teammate, Paul Hartzell, pitched almost as many innings but went 6-10 with a 3.44 ERA. In 1977 Bob Forsch went 20-7 with a 3.48 ERA, while a teammate, Harry Rasmussen, posted the exact same ERA in more innings, but went 11-17. The explanation that prevailed at the time was that Forsch was a winner. The next year Forsch went 11-17. Harry Rasmussen changed his name to Eric.

What we saw, in our field, was that these things were not actually a result of a pitcher being a "winner" or a "loser", but of luck not evening out. When you actually looked at the number of runs scored for the "winners" and the number of runs scored for the losers, this was pretty obvious. All we were trying to say was that the won-lost record, in some outlying cases, some relatively unusual cases, was misleading because the element of luck reflected in it was imbalanced.

For ten years after we began making this argument, people pitched silly arguments at us in an effort to defend the traditional view of the issue. People took what we had said as if we had said that a pitcher’s won-lost record was meaningless. Well, they would say, but the pitcher himself is a hitter in the National League. Isn’t one of the reasons that one pitcher gets more runs than another just because he is a better hitter? And what about park effects. . .doesn’t the pitcher who gets more runs usually pitch in the high-run parks? And won’t he continue to pitch in the high-run parks the next year? And. .here’s one I heard a thousand times. . ."That’s what I’m saying, Bill. Blyleven pitched just well enough to lose."

Eventually, almost everyone who works in baseball came to understand what we were saying. Nobody who makes decisions for a baseball team in 2016 really believes that the won-lost record is a reliable indicator of how well a pitcher has pitched, although the old-line thinking survives among a few announcers and the oddball scout. This is what we do, in our field. If a team scores 670 runs and allows 700 runs but finishes 90-72, that isn’t a special, hidden ability; it’s luck. If a team goes 30-15 in one-run games, that’s luck. I don’t care how many clever ways you come up with to explain why it might not be luck; it’s luck. If 16% of the fly balls allowed by a pitcher turn into home runs, that’s bad luck. I don’t care how many ways you come up with to explain the subtleties of pitching to me; that’s just bad luck.

Allen Craig hit .400 with runners in scoring position in 2012, and .454 in 2013. That’s luck. Nobody actually has an ability to hit with runners in scoring position. The ability is to hit; there is no special ability which arises only in limited circumstances. When it happens, it’s luck. If it happens twice in a row, it’s just a LARGE pile of luck.

This Ball in Play issue is structurally identical to the run support/won-lost record debate. What we are trying to say is that a batting average in many cases involves an element of luck, and that we can actually see what that element of luck is by looking at the player’s batting average on balls in play. If he has hit .370 on balls in play, he’s exactly like a pitcher who got 6 runs a game to work with last year: he was lucky, and he’ll be less lucky next year, and you can bet on it.

Apparently, based on my mail, this message hasn’t been widely understood yet. Aren’t there some hitters who hit .340 or .345 on balls in play over the course of a career? Well, yes, of course there are a few hitters who have better-than-normal ability to hit the open spots in the defense; no one ever suggested that there were not. That’s like the occasional pitcher who hits .250 and has a little power, so he bumps up his offensive support a little bit. If a player hits a lot of line drives, doesn’t that increase his batting average on balls in play? Well, first of all, if a player hits a lot of line drives, that’s luck. 21% of balls in play are Line Drives. Miguel Cabrera hits 22% (Fangraphs). Derek Jeter in his career was 20%. David Ortiz in his career has been 20%. If you hit 25%, you’ve just been lucky.

Anyway, somebody asked me if anybody had studied the relationship between aging and batting averages on balls in play. Well. . hell, I don’t know. It’s easy to do. . .I’ll go study it.

I took all players in history who had 6000 or more plate appearances, and who were not active either in 1912 or before (because strikeout data is spotty from before 1912) and are not active now (meaning that they did not play in 2015, except I included Torii Hunter and Michael Cuddyer because they have announced their retirements). This came to 499 players. . .

You use long-career data for studies of aging so that you don’t get people popping into and out of the study. If you don’t use long-career data you get a constantly changing pool of players.

So, these 499 players. . .these are their Batting Averages on Balls in Play, by age:

AGE	IPAvg	AGE	IPAvg	AGE	IPAvg
21	.301	27	.305	33	.297
22	.301	28	.304	34	.297
23	.305	29	.304	35	.295
24	.305	30	.303	36	.293
25	.306	31	.302	37	.293
26	.308	32	.300	38	.288

This is the area in which we have at least 50,000 plate appearances at each age. The batting average on balls in play starts at .301 at age 21, goes up until age 26, and declines after age 26.

Well, that’s interesting. . .there is an age-related pattern. OK, now I’ve got the data together, let’s look at some other things.

In my data, these are the highest single-season batting averages on balls in play (300 or more plate appearances):

First	Last	YEAR	AGE	BIP	Hits	IPAvg
Babe	Ruth	1923	28	388	164	.423
Rogers	Hornsby	1924	28	479	202	.422
Harry	Heilmann	1923	28	466	193	.414
Rod	Carew	1977	31	547	225	.411
Rogers	Hornsby	1921	25	523	214	.409

From this we learn that the highest batting averages on balls in play were in the early 1920s. These aren’t the highest ever, possibly; these are just the highest in my data. Players who had long careers. In the dead ball era the outfielders played very shallow, much more shallow than they do today. When the lively ball era arrived (1920) you had outfielders playing shallow with batters who were capable of hitting the ball over their heads. These are the lowest in my data:

First	Last	YEAR	AGE	BIP	Hits	IPAvg
Ted	Simmons	1981	31	334	68	.204
Dick	McAuliffe	1971	31	392	81	.207
Al	Cowens	1983	31	311	66	.212
Everett	Scott	1915	22	338	72	.213
Willie	Jones	1953	27	417	89	.213
Frank	Bolling	1964	32	303	65	.215
Ed	Brinkman	1965	23	357	77	.216
Ed	Brinkman	1972	30	459	99	.216
Carlton	Fisk	1985	37	425	92	.216

So you can hit anywhere from .200 to .400 on balls in play, basically.
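
To make the IPAvg column concrete, here is a minimal sketch in Python that recomputes it from the BIP and Hits columns of the two tables above. It assumes, as those columns suggest, that the in-play average is simply hits on balls in play divided by balls in play. The function name `in_play_average` and the choice of four sample rows are illustrative, not part of the study.

```python
def in_play_average(hits_in_play: int, balls_in_play: int) -> float:
    """Hits on balls in play divided by balls in play (assumed definition)."""
    if balls_in_play <= 0:
        raise ValueError("balls_in_play must be positive")
    return hits_in_play / balls_in_play


# (player, year, BIP, Hits) copied from the tables above
seasons = [
    ("Babe Ruth", 1923, 388, 164),
    ("Rod Carew", 1977, 547, 225),
    ("Ted Simmons", 1981, 334, 68),
    ("Carlton Fisk", 1985, 425, 92),
]

for player, year, bip, hits in seasons:
    # Three decimals, the way batting averages are normally quoted
    print(f"{player} {year}: {in_play_average(hits, bip):.3f}")
```

Run against these rows, the results should agree with the IPAvg column to three decimals, which is a quick check that the column really is Hits divided by BIP.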

Next thing I looked at was batting averages on balls in play of players up to age 28, and after age 28. Up to age 28, the highest (career) batting averages on balls in play were these ten guys:

First	Last	BIP<28
Rogers	Hornsby	.370
Wade	Boggs	.370
Kiki	Cuyler	.370
Rod	Carew	.367
George	Sisler	.361
Paul	Waner	.360
Derek	Jeter	.357
Jeff	Conine	.355
Al	Simmons	.353
Earle	Combs	.353

Conine is kind of a fluke there; he didn’t have many plate appearances up to age 28. Anyway, one of my questions in "Hey, Bill" was to the effect that Wade Boggs was consistent in having very high Batting Averages on Balls in Play. Actually, Boggs’ batting average on balls in play was .370 up to age 28, but .337 after that—the largest slippage on this list, other than Conine.

First	Last	BIP<28	Past 28
Rogers	Hornsby	.370	.357
Wade	Boggs	.370	.337
Kiki	Cuyler	.370	.338
Rod	Carew	.367	.358
George	Sisler	.361	.337
Paul	Waner	.360	.329
Derek	Jeter	.357	.344
Jeff	Conine	.355	.309
Al	Simmons	.353	.323
Earle	Combs	.353	.328

But everybody regresses toward the mean, over time. Players who hit .340 up to age 28 will hit less than .340 after 28, although they will stay higher-than-normal. These are the players who have the lowest in-play averages up to age 28:

First	Last	BIP<28	Past 28
Mark	McGwire	.228	.272
Graig	Nettles	.232	.250
Harmon	Killebrew	.237	.254
Dave	Kingman	.239	.253
Rocky	Colavito	.239	.258
Ralph	Kiner	.243	.264
Tom	Brunansky	.245	.275
Willie	Jones	.247	.261
John	Mayberry	.248	.249
Matt	Williams	.249	.303

The guys with the lowest in-play averages are the guys with the uppercuts, the home run swings. Speed also has something to do with it. If you have a very low in-play average up to age 28 it will go up after age 28, but it will remain low.

Anyway, I did one more study, and this is really the critical one; this is the money shot. The essence of the issue is:

a) whether the player has a high in-play average in the base season, and

b) whether his batting average goes up or down in the next season.

If a player hits .385 on balls in play in one season, how likely is it that his batting average will go down the next season? Not his in-play batting average; his OVERALL batting average. Is it 60%? 70%?

It’s over 80%. If we had more data, it might be 90%.

I sorted the players by their in-play batting averages, what people have taken to calling BABIP. Hate that expression; it’s like scraping chalk on the blackboard. Anyway, Group 1 was players who had an in-play average of .380 or above, Group 2 was .360 to .379, Group 3 was .340 to .359, Group 4 was .320 to .339, etc. Group 10 was below .220.

Of those who had an in-play average of .360 to .379, 85% had a drop in their overall batting average in the following season (if they had 300 or more plate appearances in the following season.) Of those who had an in-play average of .220 to .239, 90% had an INCREASE in their overall batting average in the following season:

Group	In-Play Average			Count	Up	Down	Majority Pct
1	.380		and up	68	12	56	82%
2	.360	to	.379	175	27	148	85%
3	.340	to	.359	402	102	300	75%
4	.320	to	.339	844	286	558	66%
5	.300	to	.319	1202	532	670	56%
6	.280	to	.299	1125	611	513	54%
7	.260	to	.279	684	475	209	69%
8	.240	to	.259	283	220	63	78%
9	.220	to	.239	77	69	8	90%
10	Below		.220	13	12	1	92%

At the ends of the chart, we approach predictions with near-100% reliability, 90% or thereabouts. But what is more surprising, to me, is the strength of the indicator, even if the deviance from normal is relatively modest. Take a player who hits just .330 on balls in play, a fairly modest average, really still in the fat part of the chart. Even at .330, the odds that his batting average will decline the following season (if he has 300 or more plate appearances) are two to one. If he hits just .270 on balls in play, the odds that his batting average will increase are two to one. So even in the fat part of the chart, the Ball in Play average provides a very strong indicator of whether the player has been living on luck.
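
The Majority Pct column and the "two to one" odds can be recomputed from the Up and Down counts in the table. The Python sketch below does that, under one assumption worth flagging: it divides by Up plus Down rather than by the Count column (the two differ by one in the .280 to .299 row). The helper name `majority_share` is hypothetical.

```python
# (group label, up, down) copied from the table above
groups = [
    (".380 and up", 12, 56),
    (".360 to .379", 27, 148),
    (".340 to .359", 102, 300),
    (".320 to .339", 286, 558),
    (".300 to .319", 532, 670),
    (".280 to .299", 611, 513),
    (".260 to .279", 475, 209),
    (".240 to .259", 220, 63),
    (".220 to .239", 69, 8),
    ("Below .220", 12, 1),
]


def majority_share(up: int, down: int) -> tuple[str, float]:
    """Return the more common direction and its share of Up + Down."""
    total = up + down
    if up >= down:
        return "up", up / total
    return "down", down / total


for label, up, down in groups:
    direction, share = majority_share(up, down)
    # Odds of the majority outcome against the minority outcome
    odds = max(up, down) / min(up, down)
    print(f"{label:>13}: {share:.0%} {direction}, odds about {odds:.1f} to 1")
```

The .320 to .339 and .260 to .279 rows are the ones behind the two-to-one statements: 558 against 286, and 475 against 209.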

Thanks for reading.

COMMENTS (12 Comments, most recent shown first)

Steven Goldleaf
The part that I still have trouble with, after reading this patient explanation is: aren't hitters TRYING to hit line drives? If that's so, and I think it is, why wouldn't some batters be better at it than others? If you threw a hundred MLB fastballs to me, I'd be lucky to hit more than one line drive. If you throw 100 to a minor-leaguer, he'll hit maybe 10 line drives. If you throw to a AAAA batter, he'll hit fifteen, and to a MLB regular 20. Isn't that because they're skilled at hitting line drives?
1:36 PM May 2nd

shthar
So is BABIP real and can be relied upon, or is it merely a manifestation of luck?
1:09 PM May 2nd

tangotiger
Not to get too much into the weeds: I define the point at which the stat you observe is 50% luck and 50% talent the point at which the observed standard deviation is 1.41 (or square root of 2) times the random variation standard deviation (luck).

For example, that group of pitchers that I mentioned: they averaged 3064 balls in play (BIP). When you allow that many BIP, the amount of random variation you will get is one standard deviation = .0083, or sqrt(.3*.7/3064). That is, had we observed the spread in in-play batting average of these pitchers was .0083, we'd conclude EVERYTHING was luck. But since we deal with humans, nothing can be COMPLETELY luck.

What we actually observed among these pitchers was one SD = .0125, or 1.50 x random variation. So, at 3064 balls in play, the spread we observe is a bit more talent than luck. At about 2400 balls in play, we'd find that what we observe is an equal amount of luck and talent.

For things like strikeouts, the number of PA needed falls DRASTICALLY, somewhere around 150 PA or less.
12:14 AM Apr 29th
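
The arithmetic in the comment above can be reproduced with a short Python sketch. It assumes, consistent with the square-root-of-2 rule the comment states, that observed variance is the sum of talent variance and luck variance, and that luck follows a binomial with a .300 in-play average. The function names `random_sd` and `talent_sd` are illustrative.

```python
import math


def random_sd(p: float, n: float) -> float:
    """Binomial standard deviation of a rate p observed over n trials."""
    return math.sqrt(p * (1 - p) / n)


def talent_sd(observed_sd: float, luck_sd: float) -> float:
    """Talent spread, assuming observed variance = talent variance + luck variance."""
    return math.sqrt(observed_sd**2 - luck_sd**2)


p = 0.300          # in-play average used in the comment's sqrt(.3*.7/3064)
n = 3064           # average balls in play for the pitchers in the comment
observed = 0.0125  # observed one-SD spread quoted in the comment

luck = random_sd(p, n)
talent = talent_sd(observed, luck)
# Luck and talent contribute equally where p * (1 - p) / n equals the talent variance.
break_even_bip = p * (1 - p) / talent**2

print(f"luck SD   = {luck:.4f}")
print(f"ratio     = {observed / luck:.2f}")
print(f"talent SD = {talent:.4f}")
print(f"break-even balls in play = {break_even_bip:.0f}")
```

Under those assumptions the break-even point comes out close to the "about 2400 balls in play" figure in the comment.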

studes
You hit the ball out of the park, that's not luck.

True if you're just defining the fielding part of the equation as luck. But there is some luck in home run rates (admittedly not nearly as much as in BABIP). Pitchers, ballparks, wind blowing, gameday temperature, humidity can vary from game-to-game--even from PA to PA. These all impact BABIP too.

Guess it all depends on what you're trying to measure.
7:37 PM Apr 28th

studes
Pizza Cutter, for one example, found that batter BABIP stabilizes at 820 balls in play, while it takes 2,000 balls in play for pitcher BABIP to stabilize.

He defines stable as the point at which future outcomes can be predicted equally well by either player-specific data or by general randomness. I hope I said that correctly. Tango can do better--I know he has some issues with Pizza's definition.

If I'm thinking of my proportions right, at 200 balls in play, the batter's performance would be 80% "luck" and the pitcher's would be 90% "luck". But the difference would get wider as the data increases, maybe maxing out at around a 20 point difference.

Anyway, here's the link for more info:
www.fangraphs.com/library/principles/sample-size/
7:32 PM Apr 28th

tangotiger
When I look at the data, I break it down into binary components, which is a method that I've adopted from Voros.

For example, Voros would do:
$BB = BB/PA
$SO = SO/(PA-BB)
$HR = HR/(PA-BB-SO)
$nonHRH = (H-HR)/(PA-BB-SO-HR) or inplay BA
And you can continue
$2B3B = (2b+3b)/(H-HR)
$3B = 3b / (2b+3b)

At every step, Voros would remove one component, so that each metric is independent of the others, a very binary tree approach.

Of course, there's no real reason to have to follow that form. For example, in the third line, you can instead say:
$H = H/(PA-BB-SO) or BACON
$HR = HR/H

And so on. The idea is to try to isolate events in a way that makes sense.

There's a dozen other combinations you can try.
6:19 PM Apr 28th
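
The binary-tree breakdown in the comment above can be sketched in Python as follows. Each rate's denominator drops the events already split off by the earlier steps, with home runs removed from the denominator at the in-play step. The function name `voros_rates` and the stat line passed to it are made up for illustration, and the sketch ignores events the comment does not mention (hit by pitch, sacrifices).

```python
def voros_rates(pa, bb, so, hr, h, doubles, triples):
    """Chain of binary rates; each step removes the outcome measured before it."""
    return {
        "BB": bb / pa,
        "SO": so / (pa - bb),
        "HR": hr / (pa - bb - so),
        "nonHRH": (h - hr) / (pa - bb - so - hr),  # in-play average
        "2B3B": (doubles + triples) / (h - hr),
        "3B": triples / (doubles + triples),
    }


# Hypothetical season, not a real player's line
rates = voros_rates(pa=600, bb=60, so=100, hr=20, h=150, doubles=30, triples=4)

for name, value in rates.items():
    print(f"${name} = {value:.3f}")
```

Because every step conditions on the previous outcome not having happened, the rates can be multiplied back down the tree to recover the original counts.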

bjames
I disagree with the point about BACON being more useful than BABIP, although it is certainly a better acronym. ..I mean, who doesn't like Bacon. If you offered me BABIP with my breakfast, I'd say, "No, thanks; I threw up yesterday." But BACON isn't useful because it doesn't measure what is truly relevant, which is the extent to which the batter has just been lucky. You hit the ball out of the park, that's not luck.

Anyway, while it is true that the "skill" element for hitters is larger than it is for pitchers, this is more misleading than instructive. For pitchers, it is 90% luck and 10% skill. For hitters, it is 80% luck and 20% skill. But it's still mostly a transient phenomenon. And thank God it's not a transient epiphenomenon.
1:16 PM Apr 28th

Scott_Ross
I'd like to add that one of the things that gets lost in a lot of BABIP talk is that each player has their own baseline BABIP, though they all hover around .300. And so where I find BABIP particularly useful is when I see a guy whose average is way down or way up, e.g. Daniel Murphy is batting .391 right now, looking at his career BABIP heading into this year, it was .314, and so far this year it's .439, and so we can see that he's been very lucky. Conversely, Kyle Seager is batting .143, but his 2016 BABIP is .121, well down from his career BABIP of .288. It can be an unsatisfying statistic for a lot of people because its real value is confirming or debunking what other stats are telling us.
1:13 PM Apr 28th

schoolshrink
Great article. A cool study might be pre-steroid, steroid, and post-steroid era comparisons. McGwire obviously sticks out, but can BABiP be used to set apart steroid-era hitters from other eras, as if we need more evidence to suggest the impact of steroids. Just a thought.
11:53 AM Apr 28th

tangotiger
I sent a note to Bill the other day. I should have copy/pasted it. Anyway, it was something like I took all players born since 1971, looked at their stats age 23-29, min 3000 PA. I got something like 185 or 188 pitchers and batters each.

The spread in BABIP was one SD = .012 for pitchers, .020 for hitters.

If it was pure random, we'd have expected one SD = .008 or something. So, backing out that number, that leaves us with a true talent level of one SD =
.009 pitchers
.018 hitters

Essentially, the spread in talent is twice as wide for hitters than pitchers.

Then again, if you look at HR and SO and BB rates, you'll get wider for hitters than pitchers.

The point is that for BABIP, the spread in talent and the spread in luck, for pitchers is around the same amount, when PA = 3000 to 6000 or so.

For batters, it's not as bad.

***

And I agree with Studes that, generally speaking for HITTERS, wobaCON (wOBA on Contact) is more useful than removing HR from the numerator and denominator.

Makes much more sense for pitchers.
11:34 AM Apr 28th

studes
For hitters, I think I like BACON (Batting Average on Contact) better than BABIP or IPAvg. BACON includes home runs. I get why we look at BABIP (or IPAvg) for pitchers, but I don't really get why it's useful to do so for batters. I mean, I've done it lots of times but I don't know why.

I also want to mention that every study I've seen shows that there is much more luck involved with pitcher BABIP than hitter BABIP. Hitters have more consistent ability to influence their batted ball outcomes.

Pop-up rate is a better predictor of future hitter BABIP than line drive rate. There's lots of variance in line drive rates, but pop-up rates tend more to be an intrinsic "skill" of a batter.
10:57 AM Apr 28th

mrbryan
Thanks, Bill! I've always found your work to be an antidote to the run of the mill thoughtless comments that pass for discussion in sports. I don't think a day goes by in which one does not hear reference to a player who is "showing signs of emerging from a slump" or "very dangerous right now, because he is hot." One repeatedly hears reference to players gaining or losing skills which cannot be measured - "he came in to the game 0 for 10 against this pitcher, but he must have learned something because he has two hits today," or the never-ending discussions of "clutch hitters" and the importance of batting order. Articles like this one are a glass of cool fresh water set apart from a great salty sea.
10:03 AM Apr 28th
