
Saving Private Runs

March 27, 2020

            We move on now to the second stage of our Project, the project being to estimate how many runs have been saved by each player in major league history.  The first thing that we need to do here is to estimate the number of runs saved by each team. 

            In order to do that, we need five pieces of data for each team:

            The League number of outs recorded (Innings Pitched X 3)

            The League number of runs scored

            The team number of outs recorded

            The Adjusted Park Factor for the team, and

            The number of runs allowed by the team. 

 

            Given that, it’s not complicated.   In 2019 there were 21,690.2 innings pitched in the American League, and 11,859 runs scored (12,018 runs allowed; there is a small discrepancy one way or the other.)  Anyway, 21,690.2 innings is 65,072 outs.  The Houston Astros had an outstanding pitching staff led by Justin Verlander and Gerrit Cole.   They pitched 1,462.1 innings, or 4,387 outs, so they accounted for 4387/65072 of the league’s outs, or .067418.  One fifteenth, basically.   They thus could be expected to allow .067418 of the league’s runs allowed; .067418 times 12,018 is 810.2251, so they could have been expected to allow 810 runs.

            Except that they pitched in a hitter’s park, so the expected runs allowed is higher than that.  With an Adjusted Park Factor of 1.037, their expected runs allowed go up to 840.32. 

            The "zero point" for them is twice that number.  If they had allowed twice that number of runs, that would be 1680.64 runs allowed.   They actually allowed only 640 runs, or 1040.64 runs less than they theoretically might have allowed, had they had zero talent on their pitching staff and in their defensive play. 
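The Astros arithmetic above can be sketched in a few lines of Python.  This is only an illustration of the steps as described, not the spreadsheet actually used, and the function names are invented for the example.  Note that the rounded park factor of 1.037 gives about 840.2 expected runs rather than 840.32; presumably the published figure uses a park factor carried to more decimal places (that is an assumption), so the sketch lands a fraction of a run away from 1040.64.

```python
def innings_to_outs(innings: str) -> int:
    """Convert innings notation to outs: '1,462.1' means 1,462 innings plus 1 out."""
    whole, _, extra_outs = innings.replace(",", "").partition(".")
    return int(whole) * 3 + int(extra_outs or 0)


def team_runs_saved(league_ip, league_runs_allowed, team_ip, park_factor, team_runs_allowed):
    """Return (share, park-neutral expectation, adjusted expectation, zero point, runs saved)."""
    share = innings_to_outs(team_ip) / innings_to_outs(league_ip)
    neutral_expected = share * league_runs_allowed
    expected = neutral_expected * park_factor
    zero_point = 2 * expected  # the zero point is twice the expected runs allowed
    return share, neutral_expected, expected, zero_point, zero_point - team_runs_allowed


# 2019 Astros, using the figures quoted in the text (park factor rounded to 1.037)
share, neutral, expected, zero_point, saved = team_runs_saved(
    "21,690.2", 12018, "1,462.1", 1.037, 640
)
print(f"share of league outs:   {share:.6f}")
print(f"park-neutral expected:  {neutral:.2f}")
print(f"park-adjusted expected: {expected:.2f}")
print(f"zero point:             {zero_point:.2f}")
print(f"runs saved vs. zero:    {saved:.2f}")
```

Innings are passed as strings so that ".1" and ".2" are read as one and two outs rather than as decimal fractions.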

            This is a high number.  It is, in fact, the 19th highest of all time.   In this system, you will have a high number if

(a)  You have good pitching and defense, and

(b)  You play in a hitter’s park. 

You will have a low number of runs saved against zero if

(a)  You have bad pitching and defense, and

(b)  You play in a pitcher’s park. 

 

Later, at the end of the process, we will take the park effects back out of it, and put teams in a high-run environment on an equal footing with teams in a low-run environment.  But you have to figure RUNS saved first, before you can park-adjust the runs. 

Think of all of the greatest pitching staffs you can remember or you know about—the 1954 Cleveland Indians, the 1965 Dodgers, 1968 Cardinals, 1971 Orioles, the 1986 Mets, 1998 Braves, the 2011 Phillies, the 2017 Dodgers.  Basically, very few or none of those teams are going to show up on the list of the teams saving the most runs, because almost all of them pitched in pitcher-friendly environments.    The 2011 Phillies, although they are in the top 20% of teams by runs saved, actually saved fewer RUNS than the 1930 Phillies, the famous team that finished 52-102 despite gargantuan hitting achievements.   The 1930 Phillies played in a league in which the average team scored 878 runs, and in a park that inflated the expectation by another 12%, so they had an expectation of allowing 1022 runs, and a "zero-talent based" expectation of 2044 runs.  They actually allowed 1,199 runs, 177 more than expected, but 845 less than they potentially might have allowed.   The 2011 Phillies played in a league in which the average team scored 668 runs, park neutral, so they had an expectation of allowing 668 runs; actually 676 if you watch the decimal points better.  Twice that would be 1,352 runs; they actually allowed 529 runs, 147 less than expected, but only 823 less than their zero point.   We’ll take the park and league out of it at the end of the road.
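The two Phillies teams make the point that "runs against expectation" and "runs against zero" can rank teams in opposite order.  A small sketch, using only the expected and actual runs allowed quoted above (the function names are made up for the illustration):

```python
def runs_vs_expectation(expected_ra, runs_allowed):
    """Positive means the team allowed fewer runs than expected."""
    return expected_ra - runs_allowed


def runs_saved_vs_zero(expected_ra, runs_allowed):
    """Runs saved against the zero point, which is twice the expectation."""
    return 2 * expected_ra - runs_allowed


# (expected runs allowed, actual runs allowed) as quoted in the text
teams = {"1930 Phillies": (1022, 1199), "2011 Phillies": (676, 529)}

for name, (expected_ra, runs_allowed) in teams.items():
    print(
        name,
        "vs. expectation:", runs_vs_expectation(expected_ra, runs_allowed),
        "| vs. zero:", runs_saved_vs_zero(expected_ra, runs_allowed),
    )
```

The 1930 team is far worse against its expectation but still ahead against zero, because its zero point sits so much higher.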

So, the ten teams which saved the most runs, compared to the zero point, are:

YEAR  City          Team       Lg  Exp RA    RA  Team Runs Saved    W    L
2000  Colorado      Rockies    NL    1062   897             1227   82   80
1900  Boston        Braves     NL     944   739             1150   66   72
1932  Philadelphia  A's        AL     944   751             1137   94   60
1926  Philadelphia  A's        AL     852   569             1135   83   67
1936  Boston        Red Sox    AL     946   764             1129   74   80
1930  Chicago       Cubs       NL     994   870             1117   90   64
1930  St. Louis     Cardinals  NL     948   784             1112   92   62
1955  Boston        Red Sox    AL     878   652             1103   84   70
1936  Cleveland     Indians    AL     970   862             1077   80   74
1970  Chicago       Cubs       NL     872   679             1065   84   78

 

And the 10 teams which saved the FEWEST runs, compared to the zero point, are:

YEAR  City          Team       Lg  Exp RA    RA  Team Runs Saved    W    L
1904  Washington    Senators   AL     542   743              341   38  113
1981  Cleveland     Indians    AL     392   442              342   52   51
1968  Washington    Senators   AL     509   665              352   65   96
1915  Philadelphia  A's        AL     621   888              354   43  109
1981  San Diego     Padres     NL     405   455              355   41   69
1908  New York      Yankees    AL     534   710              359   51  103
1981  Pittsburgh    Pirates    NL     394   425              364   46   56
1907  Washington    Senators   AL     527   690              365   49  102
1909  Washington    Senators   AL     512   655              368   42  110
1981  Texas         Rangers    AL     381   389              372   57   48

 

Saving only 341 runs in a full season is equivalent to scoring only 341 runs over the course of a season.  The 1908 St. Louis Cardinals scored only 371 runs in 154 games, which I think is the lowest ever.  Parallel accomplishments. 

I’d spend more time on these lists of teams, but it’s not that significant of an accomplishment.  On average, the teams that ALLOW more runs, also SAVE more runs, because the variation in runs caused by the combination of the park and era is larger than the variation caused by the performance of the team.  The 510 teams saving the most runs had an average expectation of 797 runs allowed, allowed an average of 712, and thus saved an average of 881:

 

     

Group                    Exp RA    RA  Team Runs Saved    W    L
Most Team Runs Saved        797   712              881   86   71
Next Most                   733   702              764   82   76
Average                     693   690              696   80   78
Not too Many Runs Saved     659   687              632   76   82
Fewest Runs Saved           599   673              525   67   84

 

The bottom 510 teams had an expectation of allowing 599 runs, actually allowed 673, and thus saved an average of 525.  
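A quick check on the five group averages, as a sketch: each group's runs saved should equal twice its expected runs allowed minus its actual runs allowed, give or take a run of rounding, and the spread in expectations across the groups should be larger than the spread in runs actually allowed.  The numbers are the group averages given above; the variable names are just for the example.

```python
# (group, expected runs allowed, actual runs allowed, published runs saved)
groups = [
    ("Most Team Runs Saved", 797, 712, 881),
    ("Next Most", 733, 702, 764),
    ("Average", 693, 690, 696),
    ("Not too Many Runs Saved", 659, 687, 632),
    ("Fewest Runs Saved", 599, 673, 525),
]

# Largest difference between 2 * expected - allowed and the published figure
max_gap = max(abs((2 * exp - ra) - saved) for _, exp, ra, saved in groups)

# Spread between the top and bottom group in context versus in performance
exp_spread = max(exp for _, exp, _, _ in groups) - min(exp for _, exp, _, _ in groups)
ra_spread = max(ra for _, _, ra, _ in groups) - min(ra for _, _, ra, _ in groups)

print("largest rounding gap:", max_gap)
print("spread in expected runs allowed:", exp_spread)
print("spread in actual runs allowed:", ra_spread)
```

The expectation moves much more from the top group to the bottom group than the actual runs allowed do, which is the sense in which park and era outweigh team performance in these raw totals.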

What we have to do now is to make the categories add up.  We have to figure how many runs were saved by strikeouts, how many were saved by control (not walking people), how many were saved by catchers preventing the running game, how many were saved by double plays, etc., and we have to make those numbers add up to the runs saved by the team, with a reasonable margin of error.  If we can do that, then we will have succeeded.  If we can’t, then the effort is a failure. 
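The "add up" test can be pictured as a simple reconciliation.  The sketch below is hypothetical throughout: the category values are placeholders invented for the illustration, not results, and the 5% tolerance is an assumption standing in for "a reasonable margin of error," which the article does not quantify.

```python
def reconcile(team_runs_saved, category_runs_saved, tolerance=0.05):
    """Compare the sum of category estimates with the team total.

    tolerance is a fraction of the team total (an assumed value, not from the article).
    Returns (category total, gap, whether the gap is within tolerance).
    """
    total = sum(category_runs_saved.values())
    gap = total - team_runs_saved
    return total, gap, abs(gap) <= tolerance * team_runs_saved


# Placeholder category estimates, invented purely to exercise the function
hypothetical_categories = {
    "strikeouts": 400.0,
    "control": 250.0,
    "running_game": 60.0,
    "double_plays": 90.0,
    "everything_else": 230.0,
}

total, gap, ok = reconcile(1040.64, hypothetical_categories)
print(f"category total {total:.2f}, gap {gap:+.2f}, within tolerance: {ok}")
```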

 

 

 

Sort of a separate article here.   I’ve made some changes/amendments/adjustments to the process here, which I need to explain.  I think there are four changes that I need to announce, or admit to, or something. 

1)      Responding to W. T. Mons’s comment:

Baseball-Reference.com has caught stealing datums for the pre-1920 years. I believe they came from Pete Palmer, although where he got them I couldn't say. Anyway, they show the 1900 Phillies allowing 273 steals and throwing out 185 runners.

 

I wasn’t aware that that data was there.   I have now copied all of that data into my spreadsheets, and I will use that going forward.   Not only going forward, but going backward as well; I’ll have to re-calculate the stolen base stuff that I published a few days ago, which unfortunately is not the end of the damage control; having the additional data will also force me to re-calculate expected Double Plays for each team, which means that I will have to recalculate Estimated Runners on First Base for each team.  It’s a pain in the ass, but you know; you just have to do those things.

2) Following the query about whether I was using Decade Norms based on each calendar decade or "rolling averages", I re-thought that issue, and I think it might be better to go in the other direction . . . move it over to rolling averages.  I haven’t done that yet.   I’m not sure which one I am going to do, actually.   There are obvious advantages to doing the rolling average, but at some point I am going to have to figure, for example, how many runs were saved by Bill Dickey in 1937.  When I do that I will have to copy the "background level" for each player into every column.   As it is set up now I have 12 "background levels", one for each decade; actually there are 132 of them, because there is a background level for each of 11 categories.   If I change to a rolling average, then there are 120 background levels to work with, which is actually 1,320.   It becomes a logistical struggle, to keep track of all of those—and it doesn’t REALLY make that much difference as a practical matter; a player might once every so often gain or lose one run saved because we changed the background level, but that’s all, and it’s just an estimate, anyway; it’s not a hard fact.   Now I think I have talked myself out of making the change.

It’s a case of "the perfect is the enemy of the good."  I have limited data-processing skills, always have had.  What is important to me is the concept.  If the idea works, if it attracts an audience, somebody who has better programming skills can straighten out the details later.   

3)  I have decided that, for purposes of this stage of the analysis, I am going to combine Wild Pitches, Passed Balls and Balks into one category, which we can call "OBM" or "One Base Mistakes".  I thought about doing this earlier, but I was troubled by the fact that I’ll have to un-bundle them later on to attribute them to individual players.  But now I’ve decided that attributing them to individual players later on isn’t that big a problem.  

Individually, one team’s Passed Ball number is 8 standard deviations from the norm, and one team’s Balk number I think is 6 standard deviations from the norm.   It’s kind of like a left fielder who sets up in left field 750 feet from the batter.  The data is normally 250-280 feet, but there’s this one guy who sets up 750 feet away.   Data that is that far out in left field is difficult to deal with, particularly when you have to establish the norms as a first step.   I’m hoping (and expecting) that combining the data will normalize it somewhat.

4)  I realized that, in entering the "league" data, I had entered the league HITTER’S data in some areas in which I should have entered the league PITCHER’S data—hits by the hitters, rather than hits allowed by pitchers, for example.   It doesn’t make any real difference; it’s the same one way or the other until inter-league play, and not very much different after that, usually much less than a 1% discrepancy, but still . . . I fixed those. 
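On change 2, the difference between a calendar-decade norm and a rolling average can be sketched as follows.  The window used here (five seasons on either side) is an assumption, since the article does not define one, and the league values are synthetic numbers chosen only so the arithmetic is easy to follow.

```python
def decade_norm(league_values, year):
    """Average over the calendar decade containing `year` (e.g. 1930-1939 for 1937)."""
    start = year - year % 10
    in_decade = [v for y, v in league_values.items() if start <= y < start + 10]
    return sum(in_decade) / len(in_decade)


def rolling_norm(league_values, year, half_window=5):
    """Average over the seasons within `half_window` years of `year` (assumed window)."""
    nearby = [v for y, v in league_values.items() if abs(y - year) <= half_window]
    return sum(nearby) / len(nearby)


# Synthetic league values for 1930-1949, not real data
league_values = {year: float(year - 1900) for year in range(1930, 1950)}

print("decade norm for 1937: ", decade_norm(league_values, 1937))
print("rolling norm for 1937:", rolling_norm(league_values, 1937))

# The bookkeeping cost mentioned in the text
categories = 11
print("decade background levels: ", 12 * categories)
print("rolling background levels:", 120 * categories)
```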
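On change 3, here is a rough sketch of why bundling Wild Pitches, Passed Balls and Balks might tame an outlier.  The team totals are invented for the illustration, with one deliberately extreme Passed Ball figure; with only ten made-up teams the standard scores cannot get anywhere near 6 or 8, so this shows the direction of the effect, not its size.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standard scores of each value against the group's mean and population std dev."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# Invented totals for ten hypothetical teams; the last team has an extreme PB count
wild_pitches = [45, 50, 38, 52, 47, 41, 55, 44, 49, 48]
passed_balls = [10, 12, 9, 11, 10, 13, 8, 11, 10, 40]
balks = [3, 5, 4, 2, 6, 4, 3, 5, 4, 4]

# One Base Mistakes: the three categories combined per team
obm = [wp + pb + bk for wp, pb, bk in zip(wild_pitches, passed_balls, balks)]

max_z_pb = max(z_scores(passed_balls))
max_z_obm = max(z_scores(obm))
print(f"largest Passed Ball z-score: {max_z_pb:.2f}")
print(f"largest OBM z-score:         {max_z_obm:.2f}")
```

The outlier team is still the most extreme after combining, but by a smaller margin, because the ordinary variation in the other two categories widens the spread it is measured against.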

The consequence of these four adjustments is that many of the findings I have previously reported here would now be outdated, or, if you prefer, incorrect.  It’s not what the data would show, necessarily, if evaluated today. 

It’s a normal part of research.  You always find things that you didn’t do right early in the study, or data that you had missed, or data problems that you hadn’t planned for.   It always happens; you always have to go back and re-do things.    

 
 

COMMENTS (8 Comments, most recent shown first)

MarisFan61
(Thanks.)
10:39 PM Mar 28th
 
myersb
This is the first time I've logged in in a week or so, and I am SHOCKED to see that apparently Bill, whose life's work has been all about finding the answers to questions, is refusing to tolerate discussion of the runs scored = runs saved conundrum.

I guess that discussion will have to take place elsewhere, as I see it already is.

Very surprised. This is the first time I've ever seen something here or in past writings that I completely disagree with. So either I'm wrong, and am asking to be told why, or I'm not, and being told to shut up about it.

Really?
7:39 PM Mar 28th
 
Fireball Wenz
I like to test these conclusions against common knowledge, and was surprised to see the 68 Senators there, because two of their players, Brinkman and Casanova, had stellar reps. But I see Brinkman only played 76 games and made 11 errors. Frank Howard, playing LF and 1B, actually led the team in errors! Tim Cullen, another good gloveman, had a horrible year, and guys like Frank Coggins and Hank Allen were brutal. Everyone's range factors look OK because the pitchers weren't striking anyone out.
10:27 AM Mar 28th
 
KaiserD2
I share MarisFan61's view that there is a lot more to be said about the issues raised by Bill's runs saved methodology. Since Bill has made it clear that he doesn't want any discussion of them here, I have put my own thoughts down at my own blog site, baseballgreatness.com. I invite anyone who is interested to read it there and to comment.

David Kaiser
8:17 AM Mar 28th
 
Brian
Interestingly, whether the context is 1930 in the Baker Bowl or 1968 in the Astrodome, the offense starting point is constant at zero runs. The defense starting point, on the other hand, fluctuates based on context. I am guessing when you get to the point of taking the park effects out of it that will put you closer to a constant starting point from the defensive side.
11:51 PM Mar 27th
 
MarisFan61
Bill, please allow me this -- it's the only thing I'm going to say about it (evermore).

The crux of the thing I was trying to ask about before is this thing that you're saying as part of the basis of this article:
"The "zero point" for them is twice that number." (i.e. twice the number of expected runs allowed)

I hope you could explain, either here or elsewhere, why this is so. Is it sort of a "definition," i.e. you are somewhat arbitrarily choosing to define that as the zero point, or some factual inference that follows from other things?
That's the essence of what I could not see; and from the discussions that we had, others didn't either.

(Promise -- that's the last I'll say of it. And truly, I'm genuinely interested to know, as are others of us.)
11:49 PM Mar 27th
 
willibphx
Noting the inclusion of four teams from the 1981 short season, will you be adjusting the numbers up to a full season or a per game comparison for worst team allowing or creating runs? The Indians caught my eye as they were over .500 for the season.
10:52 PM Mar 27th
 
CharlesSaeger
I believe Pete is using an estimate of OCS based on the league rate of catcher assists, runners and left on base (basically, using his old Outs On Base formula with DP for OOB (which they mostly are), so H+BB+HB-R-LOB-DP). I'm sure those aren't actual totals—the league CS% is too stable at 44% between 1898 (when they changed how SB were scored) and 1920 (when they started counting CS officially) though they're reasonable estimates.

Frankly, having the OSB total is worth far more than the OCS total, since it varies much, much more. And OCS is going to be mostly proportional to catcher assists.
10:21 PM Mar 27th
 
 
©2024 Be Jolly, Inc. All Rights Reserved.