Saving Private Runs
We move on now to the second stage of our project, which is to estimate how many runs have been saved by each player in major league history. The first thing that we need to do here is to estimate the number of runs saved by each team.
In order to do that, we need five pieces of data for each team:
The League number of outs recorded (Innings Pitched X 3)
The League number of runs scored
The team number of outs recorded
The Adjusted Park Factor for the team, and
The number of runs allowed by the team.
Given that, it’s not complicated. In 2019 there were 21,690.2 innings pitched in the American League, and 11,859 runs scored (12,018 runs allowed; there is a small discrepancy one way or the other.) Anyway, 21,690.2 innings is 65,072 outs. The Houston Astros had an outstanding pitching staff led by Justin Verlander and Gerrit Cole. They pitched 1,462.1 innings, or 4,387 outs, so they accounted for 4387/65072 of the league’s outs, or .067418. One fifteenth, basically. They thus could be expected to allow .067418 of the league’s runs allowed; .067418 times 12,018 is 810.2251, so they could have been expected to allow 810 runs.
Except that they pitched in a hitter’s park, so the expected runs allowed is higher than that. Their Adjusted Park Factor is 1.037, so their expected runs allowed go up to 840.32.
The "zero point" for them is twice that number. If they had allowed twice that number of runs, that would be 1680.64 runs allowed. They actually allowed only 640 runs, or 1040.64 runs less than they theoretically might have allowed, had they had zero talent on their pitching staff and in their defensive play.
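The Astros arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are invented here, and innings are passed as strings in the usual ".1" / ".2" notation for thirds of an inning. With the park factor rounded to 1.037, the expected figure comes out near 840.2 rather than the 840.32 quoted above, presumably because the published park factor is itself rounded.

```python
def innings_to_outs(innings: str) -> int:
    """Convert innings in baseball notation ('1462.1' = 1462 and 1/3) to outs."""
    whole, _, thirds = innings.partition(".")
    return int(whole) * 3 + int(thirds or 0)


def expected_runs_allowed(league_outs, league_runs_allowed, team_outs, park_factor):
    """Team's share of league outs, applied to league runs allowed, park-adjusted."""
    share = team_outs / league_outs
    return share * league_runs_allowed * park_factor


def runs_saved_vs_zero(expected_ra, actual_ra):
    """The zero point is twice the expectation; runs saved is the gap below it."""
    zero_point = 2 * expected_ra
    return zero_point - actual_ra


# 2019 American League and Houston Astros figures quoted in the text.
league_outs = innings_to_outs("21690.2")   # 65,072 outs
astros_outs = innings_to_outs("1462.1")    # 4,387 outs

astros_expected = expected_runs_allowed(
    league_outs, 12018, astros_outs, park_factor=1.037  # rounded park factor
)
astros_saved = runs_saved_vs_zero(astros_expected, actual_ra=640)

print(f"Expected runs allowed: {astros_expected:.2f}")
print(f"Runs saved vs. zero:   {astros_saved:.2f}")
```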
This is a high number. It is, in fact, the 19th highest of all time. In this system, you will have a high number if
(a) You have good pitching and defense, and
(b) You play in a hitter’s park.
You will have a low number of runs saved against zero if
(a) You have bad pitching and defense, and
(b) You play in a pitcher’s park.
Later, at the end of the process, we will take the park effects back out of it, and put teams in a high-run environment on an equal footing with teams in a low-run environment. But you have to figure RUNS saved first, before you can park-adjust the runs.
Think of all of the greatest pitching staffs you can remember or you know about—the 1954 Cleveland Indians, the 1965 Dodgers, 1968 Cardinals, 1971 Orioles, the 1986 Mets, 1998 Braves, the 2011 Phillies, the 2017 Dodgers. Basically, very few or none of those teams are going to show up on the list of the teams saving the most runs, because almost all of them pitched in pitcher-friendly environments. The 2011 Phillies, although they are in the top 20% of teams by runs saved, actually saved fewer RUNS than the 1930 Phillies, the famous team that finished 52-102 despite gargantuan hitting achievements. The 1930 Phillies played in a league in which the average team scored 878 runs, and in a park that inflated the expectation by another 12%, so they had an expectation of allowing 1022 runs, and a "zero-talent based" expectation of 2044 runs. They actually allowed 1,199 runs, 177 more than expected, but 845 less than they potentially might have allowed. The 2011 Phillies played in a league in which the average team scored 668 runs, park neutral, so they had an expectation of allowing 668 runs; actually 676 if you watch the decimal points better. Twice that would be 1,352 runs; they actually allowed 529 runs, 147 less than expected, but only 823 less than their zero point. We’ll take the park and league out of it at the end of the road.
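The Phillies comparison is easy to check by hand, but a short sketch makes the two different yardsticks explicit. It takes the expected runs allowed quoted above (1022 and 676) as given rather than deriving them; the function and variable names are illustrative assumptions.

```python
def runs_vs_expectation(expected_ra, actual_ra):
    """Positive means the team allowed more runs than expected."""
    return actual_ra - expected_ra


def runs_saved_vs_zero(expected_ra, actual_ra):
    """Runs saved measured against the zero point (twice the expectation)."""
    return 2 * expected_ra - actual_ra


# (expected runs allowed, actual runs allowed), as quoted in the text.
phillies = {
    "1930 Phillies": (1022, 1199),
    "2011 Phillies": (676, 529),
}

results = {}
for team, (expected_ra, actual_ra) in phillies.items():
    results[team] = (
        runs_vs_expectation(expected_ra, actual_ra),
        runs_saved_vs_zero(expected_ra, actual_ra),
    )
    vs_exp, saved = results[team]
    print(f"{team}: {vs_exp:+d} vs. expectation, {saved} saved vs. zero")
```

The 1930 team is far worse against expectation yet still comes out ahead against the zero point, which is the effect of the high-run environment described above.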
So, the ten teams which saved the most runs, compared to the zero point, are:
YEAR | City | Team | Lg | Exp RA | R | Team Runs Saved | W | L
2000 | Colorado | Rockies | NL | 1062 | 897 | 1227 | 82 | 80
1900 | Boston | Braves | NL | 944 | 739 | 1150 | 66 | 72
1932 | Philadelphia | A's | AL | 944 | 751 | 1137 | 94 | 60
1926 | Philadelphia | A's | AL | 852 | 569 | 1135 | 83 | 67
1936 | Boston | Red Sox | AL | 946 | 764 | 1129 | 74 | 80
1930 | Chicago | Cubs | NL | 994 | 870 | 1117 | 90 | 64
1930 | St. Louis | Cardinals | NL | 948 | 784 | 1112 | 92 | 62
1955 | Boston | Red Sox | AL | 878 | 652 | 1103 | 84 | 70
1936 | Cleveland | Indians | AL | 970 | 862 | 1077 | 80 | 74
1970 | Chicago | Cubs | NL | 872 | 679 | 1065 | 84 | 78
And the 10 teams which saved the FEWEST runs, compared to the zero point, are:
YEAR | City | Team | Lg | Exp RA | R | Team Runs Saved | W | L
1904 | Washington | Senators | AL | 542 | 743 | 341 | 38 | 113
1981 | Cleveland | Indians | AL | 392 | 442 | 342 | 52 | 51
1968 | Washington | Senators | AL | 509 | 665 | 352 | 65 | 96
1915 | Philadelphia | A's | AL | 621 | 888 | 354 | 43 | 109
1981 | San Diego | Padres | NL | 405 | 455 | 355 | 41 | 69
1908 | New York | Yankees | AL | 534 | 710 | 359 | 51 | 103
1981 | Pittsburgh | Pirates | NL | 394 | 425 | 364 | 46 | 56
1907 | Washington | Senators | AL | 527 | 690 | 365 | 49 | 102
1909 | Washington | Senators | AL | 512 | 655 | 368 | 42 | 110
1981 | Texas | Rangers | AL | 381 | 389 | 372 | 57 | 48
Saving only 341 runs in a full season is equivalent to scoring only 341 runs over the course of a season. The 1908 St. Louis Cardinals scored only 371 runs in 154 games, which I think is the lowest ever. Parallel accomplishments.
I’d spend more time on these lists of teams, but it’s not that significant of an accomplishment. On average, the teams that ALLOW more runs, also SAVE more runs, because the variation in runs caused by the combination of the park and era is larger than the variation caused by the performance of the team. The 510 teams saving the most runs had an average expectation of 797 runs allowed, allowed an average of 712, and thus saved an average of 881:
Group | Exp RA | R | Team Runs Saved | W | L
Most Team Runs Saved | 797 | 712 | 881 | 86 | 71
Next Most | 733 | 702 | 764 | 82 | 76
Average | 693 | 690 | 696 | 80 | 78
Not too Many Runs Saved | 659 | 687 | 632 | 76 | 82
Fewest Runs Saved | 599 | 673 | 525 | 67 | 84
The bottom 510 teams had an expectation of allowing 599 runs, actually allowed 673, and thus saved an average of 525.
What we have to do now is to make the categories add up. We have to figure how many runs were saved by strikeouts, how many were saved by control (not walking people), how many were saved by catchers preventing the running game, how many were saved by double plays, etc., and we have to make those numbers add up to the runs saved by the team, with a reasonable margin of error. If we can do that, then we will have succeeded. If we can’t, then the effort is a failure.
Sort of a separate article here. I’ve made some changes/amendments/adjustments to the process here, which I need to explain. I think there are four changes that I need to announce, or admit to, or something.
1) Responding to W. T. Mons’ comment:
Baseball-Reference.com has caught stealing datums for the pre-1920 years. I believe they came from Pete Palmer, although where he got them I couldn't say. Anyway, they show the 1900 Phillies allowing 273 steals and throwing out 185 runners.
I wasn’t aware that that data was there. I have now copied all of that data into my spreadsheets, and I will use that going forward. Not only going forward, but going backward as well; I’ll have to re-calculate the stolen base stuff that I published a few days ago, which unfortunately is not the end of the damage control; having the additional data will also force me to re-calculate expected Double Plays for each team, which means that I will have to recalculate Estimated Runners on First Base for each team. It’s a pain in the ass, but you know; you just have to do those things.
2) Following the query about whether I was using Decade Norms based on each calendar decade or "rolling averages", I re-thought that issue, and I think it might be better to go in the other direction . . . move it over to rolling averages. I haven’t done that yet. I’m not sure which one I am going to do, actually. There are obvious advantages to doing the rolling average, but at some point I am going to have to figure, for example, how many runs were saved by Bill Dickey in 1937. When I do that I will have to copy the "background level" for each player into every column. As it is set up now I have 12 "background levels", one for each decade; actually there are 132 of them, because there is a background level for each of 11 categories. If I change to a rolling average, then there are 120 background levels to work with, which is actually 1,320. It becomes a logistical struggle, to keep track of all of those, and it doesn’t REALLY make that much difference as a practical matter; a player might once every so often gain or lose one run saved because we changed the background level, but that’s all, and it’s just an estimate, anyway; it’s not a hard fact. Now I think I have talked myself out of making the change.
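The bookkeeping difference between the two schemes can be sketched as follows. The season range 1900 through 2019 is an assumption inferred from the counts quoted above (12 decades, 120 seasons), and the key functions are hypothetical names rather than anything from the actual spreadsheets.

```python
CATEGORIES = 11                       # number of categories, per the text
FIRST_YEAR, LAST_YEAR = 1900, 2019    # assumed span: 120 seasons, 12 decades

seasons = range(FIRST_YEAR, LAST_YEAR + 1)


def decade_key(year: int) -> int:
    """Decade norms: every season in a calendar decade shares one level."""
    return year - year % 10


def rolling_key(year: int) -> int:
    """Rolling averages: each season gets its own level (its own window)."""
    return year


decade_levels = len({decade_key(y) for y in seasons})
rolling_levels = len({rolling_key(y) for y in seasons})

decade_total = decade_levels * CATEGORIES
rolling_total = rolling_levels * CATEGORIES

print(f"Decade norms:     {decade_levels} levels, {decade_total} with categories")
print(f"Rolling averages: {rolling_levels} levels, {rolling_total} with categories")

# Bill Dickey's 1937 season would look up the 1930s level under decade norms.
print(f"1937 uses the {decade_key(1937)}s background level")
```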
It’s a case of "the perfect is the enemy of the good." I have limited data-processing skills, always have had. What is important to me is the concept. If the idea works, if it attracts an audience, somebody who has better programming skills can straighten out the details later.
3) I have decided that, for purposes of this stage of the analysis, I am going to combine Wild Pitches, Passed Balls and Balks into one category, which we can call "OBM" or "One Base Mistakes". I thought about doing this earlier, but I was troubled by the fact that I’ll have to un-bundle them later on to attribute them to individual players. But now I’ve decided that attributing them to individual players later on isn’t a big problem.
Individually, one team’s Passed Ball number is 8 standard deviations from the norm, and one team’s Balk number I think is 6 standard deviations from the norm. It’s kind of like a left fielder who sets up in left field 750 feet from the batter. The data is normally 250-280 feet, but there’s this one guy who sets up 750 feet away. Data that is that far out in left field is difficult to deal with, particularly when you have to establish the norms as a first step. I’m hoping (and expecting) that combining the data will normalize it somewhat.
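For readers who want to see what "standard deviations from the norm" means mechanically, here is a minimal sketch of bundling the three categories and scoring each team against the group. The team lines are made-up numbers for illustration only, not real totals, and the use of the population standard deviation is an assumption; the text does not say which form is used.

```python
from statistics import mean, pstdev


def one_base_mistakes(wild_pitches: int, passed_balls: int, balks: int) -> int:
    """Bundle the three categories into a single OBM count."""
    return wild_pitches + passed_balls + balks


def z_scores(values):
    """Distance of each value from the mean, in standard deviations.

    Uses the population standard deviation (an assumption) and expects
    the values not to be all identical.
    """
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical (wild pitches, passed balls, balks) lines, NOT real data.
hypothetical_teams = [
    (45, 10, 4),
    (52, 12, 6),
    (38, 9, 3),
    (60, 41, 5),   # an outlier in passed balls
    (49, 11, 7),
]

passed_ball_z = z_scores([pb for _, pb, _ in hypothetical_teams])
obm_z = z_scores([one_base_mistakes(*team) for team in hypothetical_teams])

for team, pb, obm in zip(hypothetical_teams, passed_ball_z, obm_z):
    print(f"{team}: passed-ball z = {pb:+.2f}, OBM z = {obm:+.2f}")
```

Whether bundling actually pulls the extreme teams closer to the norm depends on the real data; the sketch only shows how the comparison would be set up.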
4) I realized that, in entering the "league" data, I had entered the league HITTER’S data in some areas in which I should have entered the league PITCHER’S data: hits by the hitters, rather than hits allowed by pitchers, for example. It doesn’t make any real difference; it’s the same one way or the other until inter-league play, and not very much different after that, usually much less than a 1% discrepancy, but still . . . I fixed those.
The consequence of these four adjustments is that many of the findings I have previously reported here would now be outdated, or, if you prefer, incorrect. They are not necessarily what the data would show if evaluated today.
It’s a normal part of research. You always find things that you didn’t do right early in the study, or data that you had missed, or data problems that you hadn’t planned for. It always happens; you always have to go back and re-do things.