The High Cost of the Free Pass

By Bill James

March 30, 2020

Before I get to the real work to be done today, I need to spend a couple of paragraphs updating you on the amendments outlined before.

In terms of double plays, some of the numbers changed a little bit when I recalculated Estimated Runners on First Base, but not really; all of the teams which were best at turning the double play or worst at turning the double play are still evaluated essentially the same.

In terms of stolen base defense, more surprisingly, the 1920 Braves hold on to the #1 spot of all time, regardless of how the question is analyzed, despite the addition to the data set of the caught stealing numbers from 1900 to 1919. A few of the 1900-1919 teams are among the Top 10 or Top 20 in terms of turning the opposition’s running game into a weapon for the defense, but the top 6 or 8 are the same on both ends.

I have combined Wild Pitches, Passed Balls and Balks into one category called "One Base Mistakes". Now that I have done it I am not sure what the practical benefit of that is; it just makes the data look stable and useful, rather than flukish. The best team ever at avoiding One Base Mistakes was the Baltimore Orioles in the strike-shortened 1994 season (Mike Mussina, Ben McDonald, Jamie Moyer, catcher Chris Hoiles.) They had 18 Wild Pitches (the fewest in baseball), one balk (tied for the fewest in baseball, the average was 6) and 5 Passed Balls (tied for fewest in the American League, although one National League team had only 3.) They finished 63-49, in second place behind the Yankees at the time that the calendar stopped. The second-best team ever in terms of avoiding one-base mistakes was the 1977 Red Sox team that I wrote about before in this series—Tiant, Fergie Jenkins, Carlton Fisk, etc. The worst team ever was either the 1987-1988 Texas Rangers (in terms of the raw number of mistakes) or the 1936 Philadelphia A’s (if the raw number of mistakes is compared to period norms.) Combining them does normalize the data to an extent; the best teams (1994 Baltimore and 1977 Boston) are 2.6 standard deviations better than the norm, while the worst (1988 Texas and 1936 Philadelphia) are 4.8 standard deviations worse than the norm.

In terms of changing from a decade norm to a rolling-decade norm, I have not done that yet, although it is kind of looking like I might have to.

OK, walks allowed. The 1932 Cincinnati Reds walked only 276 batters; the 1915 Philadelphia A’s walked 827. The theory of this research is that the Cincinnati Reds prevented a lot of runs by not walking people. The 276 walks is not a record low; the 1933 Reds walked only 257, but the issue here is how many batters they DIDN’T walk, that they might conceivably have walked if they had been Ryne Duren or Bobby Witt or somebody. Steve Dalkowski. The 1932 Cincinnati Reds faced 220 more batters than they did in 1933. How many of those batters they MIGHT have walked, had they been Steve Dalkowski, depends on whether we assume that the wildest pitcher is 3 standard deviations below the norm, 4 standard deviations below the norm, or 5 standard deviations below the norm, but in any case, it is enough that we regard the 1932 team as having prevented more walks than the 1933 team. Or any other team, such as the 1918 New York Giants, who walked only 228, but that was a war-shortened season, or the 1904 Red Sox, who walked only 233, but the league norms were lower in 1904 than in 1932, lower in the Cy Young era than in the Carl Hubbell era.

The Reds pitching staff in that era was led by the Nashville Narcissus, Red Lucas. Lucas was actually a pretty decent pitcher; in 1932 he was just 13-17 but had a 2.84 ERA in 269 innings. Lucas, when I first became a baseball fan, held the record for career pinch hits and pinch-hitting at bats. He was a career .281 hitter who was used as a pinch hitter more than 500 times in his career. In early 1933 the Reds traded for Paul Derringer, who went 7-27 that season despite a 3.30 ERA; the league ERA was 3.34. The Reds were terrible in that era; they finished last every season from 1931 to 1934. It was miles to the outfield fences; there were very few home runs in that park, so most of their staff was just guys who weren’t actually major league pitchers; they just went out and threw the ball over the plate and let people hit it. This worked out sort-of-OK at home, but very poorly on the road; in 1933 they won 37 games at home but only 21 on the road. And, as I mentioned, they had no center fielder; in 1933 their center field player was Chick Hafey, who had played left field for the Cardinals for years, but had been traded to Cincinnati when he got to be 29 years old because that’s what Branch Rickey believed in; he always wanted to trade his players away before their value collapsed.

Anyway, how many batters did the Cincinnati Reds NOT walk, that they might reasonably have walked? It depends on whether we set the bar at 3 standard deviations below the norm, 4 standard deviations, 5 standard deviations, or some other deviant yet to be determined. A Deviant to be named later. (I always wanted to do that with the Red Sox. Sometimes when you have a player who is a pain in the ass, you trade him away for whatever you can get, and then later the other team will give you somebody that THEY can’t stand and their manager wants to get rid of. I always wanted us to make an announcement that Joe Schmuck had been traded to San Diego for a trouble-maker to be named later.)

If we use 3 standard deviations below the norm, the 1932 Cincinnati Reds did NOT walk 467 batters that they might reasonably have walked, which would probably save them somewhere around 150 runs. If we use 4 standard deviations, then it would be 552 walks (about 180 runs), and if we use 5 standard deviations, then it would be 637 walks (about 210 runs).
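The arithmetic behind those three figures can be sketched roughly as follows. The norm (488 walks) and the standard deviation (85 walks) are not stated in the article; they are back-solved from the 467 / 552 / 637 figures, and the .32 runs per walk is the provisional value adopted later in the piece. Treat all three as assumptions, and the function names as illustrative.

```python
# Sketch only. NORM_WALKS and SD_WALKS are NOT given in the article; they are
# back-solved from the 467 / 552 / 637 figures. RUNS_PER_WALK is the
# article's provisional value of .32 runs per walk avoided.
NORM_WALKS = 488
SD_WALKS = 85
RUNS_PER_WALK = 0.32


def walks_avoided(actual_walks, k, norm=NORM_WALKS, sd=SD_WALKS):
    """Walks not issued, relative to a zero point k SDs worse than the norm."""
    zero_point = norm + k * sd  # for walks, "worse" means a higher total
    return zero_point - actual_walks


def runs_saved(walks, runs_per_walk=RUNS_PER_WALK):
    """Convert walks avoided into an approximate run value."""
    return walks * runs_per_walk


for k in (3, 4, 5):
    w = walks_avoided(276, k)  # the 1932 Reds walked 276 batters
    print(f"{k} SD: {w} walks avoided, roughly {runs_saved(w):.0f} runs")
```

At .32 runs per walk the run totals come out a little under the rounded 150 / 180 / 210 quoted above, which is consistent with those being loose estimates rather than outputs of a fixed formula.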

On the other end of the Elephant, the 1915 Philadelphia Athletics were basically a team of teenagers and minor leaguers which had been pieced together to replace the stars who had been sold to the other American League teams so that they would not flee to the Federal League where they could make better money. In 1915 they walked 120 batters MORE than the standard of 3 standard deviations below the norm; in other words, they were worse than incompetent in this area. This means that we probably can’t use a standard of 3 standard deviations below the norm; we probably HAVE to go to at least 4. Not necessarily, not absolutely; there are only six teams in history which were worse than the 3-SD cutoff, and none of the others was worse than -60, so the problem is kind of contained, but still, negative numbers play hell with an analysis of this nature, so you have to avoid them. Not that I think we would wind up using 3 standard deviations as the cutoff, anyway.

If we use 4 standard deviations as the misery-mark, then the 1915 A’s were 38 walks worse than terrible; if we use 5 standard deviations, then they were 44 walks better than the misery-mark.

Transition here. Let me mark that appropriately. . .

Important Transition Here. My general thinking here, my first set of working assumptions, is that I might use 3 standard deviations below the norm as the zero point for categories in which the worst number you can post is zero, but 4 standard deviations below the norm as the zero point in categories in which the BEST number you can post is zero. In other words, 3 standard deviations below the norm in a category like strikeouts, where the bad teams have the lowest numbers, but 4 standard deviations in a category like walks, where the bad teams have the highest numbers. My previous research (previous in this series... the stuff I have posted over the past two weeks) has shown that in almost every area, the best/worst teams are not quite 3 standard deviations from the norm in the "zero-limited" direction, but almost 4 standard deviations from the norm in the "sky's the limit" direction, the direction from the norm in which there is no zero. So my first inclination is to use 3 standard deviations below the norm when small numbers indicate bad performance, but 4 standard deviations worse than the norm when large numbers indicate bad performance.
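A minimal sketch of that working rule, under stated assumptions: the function names are invented for illustration, the norm (461) and standard deviation (82) for the 1915 A's are back-solved from the -120 / -38 / +44 figures given earlier rather than quoted from the article, and the strikeout line is entirely made up to show the other direction.

```python
# Sketch of the working rule: 3 SDs when small numbers are bad (a floor of
# zero exists), 4 SDs when large numbers are bad (no ceiling).
# Function names and example inputs are illustrative assumptions.


def zero_point(norm, sd, bad_is_high):
    """Level of performance that earns zero credit in a category."""
    if bad_is_high:
        return norm + 4 * sd  # e.g. walks: bad teams post the highest totals
    return norm - 3 * sd      # e.g. strikeouts: bad teams post the lowest


def credit(actual, norm, sd, bad_is_high):
    """Units of credit above the zero point; negative means below it."""
    zp = zero_point(norm, sd, bad_is_high)
    return zp - actual if bad_is_high else actual - zp


# 1915 A's walks: norm 461 and SD 82 are back-solved, not quoted.
athletics_walk_credit = credit(827, norm=461, sd=82, bad_is_high=True)
print(athletics_walk_credit)

# Made-up strikeout line, purely to show the "small numbers are bad" case.
example_strikeout_credit = credit(1150, norm=1000, sd=100, bad_is_high=False)
print(example_strikeout_credit)
```

Note that the A's still come out negative at 4 standard deviations, which is the problem case described above; the sketch deliberately does not clamp the result at zero, since the article has not yet said how such teams will be handled.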

Let us suppose that there is a "worksheet" for every team in major league history, on which we are tallying up their Runs Prevented. There will be 11 categories of Run Prevention on the sheet: Strikeouts, Control, Home Run Avoidance, Wild Pitch Avoidance, Hit Batsmen Avoidance, Balk Avoidance, Fielding Range (DER), Fielding Consistency (Fielding Percentage), Stolen Base Control, Double Plays, and Passed Ball Avoidance. Ultimately, we have to tag a number of Runs Prevented to each of those things. Like this:

Team: 1600 Merchants of Venice

Runs Prevented By:
Strikeouts
Control
Home Run Avoidance
Hit Batsmen Avoidance
Wild Pitch Avoidance
Balk Avoidance
Fielding Range (DER)
Fielding Consistency (F Pct)
Double Plays
Stolen Base Control
Passed Ball Avoidance

Sum of the Above
Actual Runs Prevented:
Error/Discrepancy:

At this point, we can fill in one element of this worksheet for each team, and we can make preliminary estimates about two others. We know what the actual Runs Prevented for each team are, because I explained that process. Let’s make four initial assumptions about strikeouts and walks:

1) That the lower boundary for strikeouts is 3 standard deviations below the norm,

2) That the lower boundary for walks is 4 standard deviations worse than the norm,

3) That each strikeout is worth .30 runs, and

4) That each walk avoided is worth .32 runs

Understanding that these initial assumptions will not control the future course of the research. If we make those four assumptions, then we can fill in the data for runs prevented by strikeouts and control. That would give us this worksheet, for the defending World Champions:

Team: 2019 Washington

Runs Prevented By:
Strikeouts	206
Control	58
Home Run Avoidance
Hit Batsmen Avoidance
Wild Pitch Avoidance
Balk Avoidance
Fielding Range (DER)
Fielding Consistency (F Pct)
Double Plays
Stolen Base Control
Passed Ball Avoidance

Sum of the Above	264
Actual Runs Prevented:	879
Error/Discrepancy:	615

We would thus be able to "explain" or "attribute" 264 of the 879 runs prevented by the Washington Nationals’ pitching and defense. The other 615 runs would still have to be explained by the other performance areas.
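The worksheet bookkeeping can be sketched as below. The category names come from the article; the dictionary layout and function names are assumptions made for illustration, and unfilled categories are simply left empty rather than guessed at.

```python
# Sketch of the per-team worksheet. Category names follow the article; the
# data structure and function names are illustrative assumptions.
CATEGORIES = [
    "Strikeouts", "Control", "Home Run Avoidance", "Hit Batsmen Avoidance",
    "Wild Pitch Avoidance", "Balk Avoidance", "Fielding Range (DER)",
    "Fielding Consistency (F Pct)", "Double Plays", "Stolen Base Control",
    "Passed Ball Avoidance",
]


def new_worksheet(team, actual_runs_prevented):
    """Blank worksheet: every category starts unfilled (None)."""
    return {
        "team": team,
        "actual": actual_runs_prevented,
        "runs_prevented": {name: None for name in CATEGORIES},
    }


def attributed(worksheet):
    """Sum of the categories filled in so far."""
    values = worksheet["runs_prevented"].values()
    return sum(v for v in values if v is not None)


def discrepancy(worksheet):
    """Runs prevented that no category has yet explained."""
    return worksheet["actual"] - attributed(worksheet)


nationals = new_worksheet("2019 Washington", 879)
nationals["runs_prevented"]["Strikeouts"] = 206
nationals["runs_prevented"]["Control"] = 58
print(attributed(nationals), discrepancy(nationals))
```

As the other nine categories are filled in, the discrepancy should shrink toward zero if the run values are well chosen.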

For all teams in baseball history since 1900, the number of runs prevented that we will have to attribute to somebody is 1,783,676. The number attributed by this process, for all teams, would be 457,184. That would be 26% of the whole.
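As a quick check on that percentage, using only the two totals quoted above (the function name is an assumption):

```python
# Sketch: share of all runs prevented that strikeouts and control explain
# under the four provisional assumptions. Both totals are from the article.
TOTAL_RUNS_PREVENTED = 1_783_676
ATTRIBUTED_RUNS = 457_184


def attributed_share(attributed, total):
    """Fraction of total runs prevented accounted for so far."""
    return attributed / total


share_pct = 100 * attributed_share(ATTRIBUTED_RUNS, TOTAL_RUNS_PREVENTED)
print(f"{share_pct:.1f}% of runs prevented attributed so far")
```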

Intuitively, that percentage feels like it is way too low. I would suspect that, over all of history, strikeouts and control would be maybe 50% of run prevention, wouldn’t you think? I think it is more than 26%.

It’s too early in the process to worry about that. We’ll run the numbers for all 11 categories, and then we’ll see where we are, and then we’ll see what we can do to move the numbers closer to the target. Thanks for reading; I hope this clarifies what I am trying to do at least a little bit at least for some people.

COMMENTS (17 Comments, most recent shown first)

Brock Hanke
Bill - Thanks for taking the time to answer DJ's question in some detail. If I've got this right, what it amounts to is that you're not comparing strikeouts to any other form of out; you are comparing strikeouts to any other kind of Plate Appearance. That certainly will increase the value of the strikeout, because the added Plate Appearances that are not outs are going to be all positive forces in scoring runs that the strikeout took away. So far, this all makes very good sense.
2:40 AM Apr 1st

mpiafsky
The merchant of Venice was, of course, great great grandfather to Koufax and Kenny Holtzman. But most scouts agree that his daughter Jessica was the true pitcher in the family and reminisce about her skill at throwing caskets from windows.
1:00 PM Mar 31st

DJ_Man
The issue here is not the value of a strikeout compared to another out. The issue here is the value of a strikeout. If you are evaluating the pitcher and you start with his innings, then the number of outs is given, fixed, or known, and you must merely distinguish a strikeout from another out. But dealing with the full array of possible outcomes--balls in play, home runs, etc.--the issue is the value of a strikeout as contrasted with any other outcome, such as a double, a home run, or a walk.

Thanks! I think I can see the difference now.
11:34 PM Mar 30th

Guy123
There is a fairly straightforward way to measure the share of runs saved (and thus wins created by the defense) generated by each of Bill's factors. He has calculated the standard deviations (SD) for each factor in each decade. So we know the runs saved by a 1 SD change in each factor. If we sum the runs saved by a 1-SD change in each factor, then for each factor we just divide the runs saved by the total to calculate the proportion contributed. What this exercise does is essentially take all the defensive variation among teams -- runs allowed above or below average -- and allocates them to Bill's factors.

Let's take the 2010s as an example. To keep it simple, I'm going to look only at the 5 factors that really matter: DER, Fielding %, K, BB, and HR. Here is what a 1 SD change meant for a team in each factor (in the 2010s):
DER 45 hits
Fld% 15 errors
Ks 147 Ks
BB 50 BB
HR 32

Here are the approximate run values of a 1-SD improvement in each factor:
DER 34
Fld% 11
Ks 44
BB 15
HR 45
Total 149
So a team that was +1 SD on every factor would surrender 149 fewer runs than average (+15 wins).

Finally, turning this into proportions, here is the proportion of runs saved (defined as the difference in runs allowed between good and bad defensive teams) contributed by each factor:
DER 23%
Fld% 7%
Ks 29%
BB 10%
HR 30%
(All the other factors together probably contribute less than 5%, so you can knock these down a point or two if you're so inclined.)

To return to the original question, it appears that K and BB together account for about 40% of runs saved in this decade. It's possible the proportion was a bit lower in earlier decades (I'm too lazy to check now), but the overall proportion has to be much higher than 25%, as Bill suspected.
9:18 PM Mar 30th

bjames

Third try, responding to DJ

I'm puzzled by one thing here. I thought that his research showed that, in terms of its effect upon scoring runs, a strikeout isn't all that much worse than any other out.

The issue here is not the value of a strikeout compared to another out. The issue here is the value of a strikeout. If you are evaluating the pitcher and you start with his innings, then the number of outs is given, fixed, or known, and you must merely distinguish a strikeout from another out. But dealing with the full array of possible outcomes--balls in play, home runs, etc.--the issue is the value of a strikeout as contrasted with any other outcome, such as a double, a home run, or a walk.
8:22 PM Mar 30th

bjames
I'm puzzled by one thing here. I thought that his research showed that, in terms of its effect upon scoring runs, a strikeout isn't all that much worse than any other out.

Let me try this. The issue HERE is not the value of a strikeout compared to another out. The issue here is the value of a strikeout.
8:09 PM Mar 30th

bjames
I'm puzzled by one thing here. I thought that his research showed that, in terms of its effect upon scoring runs, a strikeout isn't all that much worse than any other out.

You've got a really fundamental misunderstanding there; it would take me a half-hour to explain it to you. Perhaps somebody on the site can help you?
5:51 PM Mar 30th

bjames
I bet this is a more difficult process. Do you feel that way?

Oh yes; it's a much, much more difficult problem.
5:50 PM Mar 30th

bjames
I'm curious why you didn't use a similar scaling for Runs Prevented.

I didn't do that because there is no reason that I would have done that. It's an entirely different problem; using the same approach would make no sense.
5:49 PM Mar 30th

voxpoptart
I really enjoy watching you put this together one step at a time, long before we or you know what will actually work. The Baseball Abstracts were already formative for me: my first sustained experience at watching the scientific process get explained in clear detail after-the-fact. But this is you letting us in on assumptions that might need to be overhauled, procedures that might crash into corners and get stuck, *before* the happy ending has been assured. I think that's valuable, and I appreciate it.
4:51 PM Mar 30th

bearbyz
It does seem low, but would it be best to see how it compares with the other categories before making any changes?
3:38 PM Mar 30th

BobGill
DJ: I think the point with strikeouts in this context is that since a ball in play generally stands about a 3-in-10 chance of becoming a hit, each strikeout equals minus-0.3 of a hit, so it helps the defense that much.
1:11 PM Mar 30th

DJ_Man
Fascinating to see how Bill puts together this model one step at a time.

I'm puzzled by one thing here. I thought that his research showed that, in terms of its effect upon scoring runs, a strikeout isn't all that much worse than any other out. If you re-run a simulation, changing all of a team's outs to strikeouts, they only lose a few runs over the course of a season. I recall a statement that all of Dave Kingman's extra strikeouts only cost his teams six or seven runs over his career. Am I remembering that correctly, or has later research invalidated that?

It may be that 26% is quite generous in giving credit for strikeouts.
11:45 AM Mar 30th

BobGill
I agree on the point about "seeing the research revealed as the system evolves," and I'm enjoying the chance to watch it. I assume you went through something similar years ago while working on runs created. But given the differences in measuring offense and defense, I bet this is a more difficult process. Do you feel that way?
8:23 AM Mar 30th

willibphx
My opinion is that it is low but probably not by that much. While my analysis is not as detailed or rigorous as your work, HRs allowed/avoidance has a very strong correlation with runs allowed, perhaps as much as 25%-30% by itself. Look forward to the next step on the journey.
8:09 AM Mar 30th

evanecurb
This is really interesting. We’re seeing the research revealed as the system evolves. I’m not sure I’d do that if I were Bill, but I’m not Bill.
7:48 AM Mar 30th

jgf704
You seem to have settled on a scale for these run elements as 3 to 4 standard deviations. I'm curious why you didn't use a similar scaling for Runs Prevented. It seems you set that scale at basically the average number of runs (which is 7 to 9 standard deviations).
7:23 AM Mar 30th
