Re-Starting Runs Saved Against Zero

June 17, 2020

 

            I am back to work on the project of estimating how many runs have been saved (prevented) by each pitcher and each defensive player since 1900.   I started back to work on it over the weekend of June 6-7.   Fortunately, when I was working on it before, I saved all of the articles in one Word file, so that I was able to read through what I had done beginning to end. That is a good research practice which is quite unlike me; ordinarily I would have to chase down all 20-some articles and try to piece together the sequence.

            Anyway, reading through the articles or as much of them as I could stand to read, I immediately spotted several major errors in my approach, as you sometimes will when you walk away from your work and can look at it with fresh eyes.   Making these various mistakes, I had entangled myself in a jungle of bad assumptions and false starts, to such an extent that I was confused, and, for the last 5 or 7 articles, just kind of wandering around aimlessly, hacking through the underbrush to no particular purpose. 

            In the previous set of articles I created two very different ways to estimate a team’s runs saved, both of which were frankly terrible.   In re-reading the material, one of my first thoughts was "There has to be a better way to establish Team Runs Saved, so what is it?"  This led to a new and much better Team Runs Saved method.  I would go so far as to say that the previous two methods were wrong, and this one is right. 

            I suppose what I should emphasize now is that the theoretical zero point is the point at which runs saved are equal to runs scored.   Supposing that an average team in a given set of circumstances would allow 750 runs, then the zero point is 1500 runs allowed. The new figure for Team Runs Saved is calculated thus:

1)     Figure the League Average of Runs Scored Per Inning,

2)     Apply that to Team’s Innings Pitched (League Average Expected Runs) (LAXR)

3)     Park-Adjust that,

4)     To figure the expected Runs Allowed for the team (TXR),

5)     Divide the team’s innings pitched by two (one-half run per inning, which is 4.50 runs per nine innings),

6)     And subtract that number from Team Expected Runs Allowed (TXR). 

7)     The resulting figure is the Park Run Adjustment, or PRA. The Park Run Adjustment can be either positive or negative, and averages zero in theory, although it is not exactly zero in fact.   The PRA is a positive number in a hitter’s park, and a negative number in a pitcher’s park.  It would average zero in fact if the historical run average for all teams was 4.50 runs.  Since the historical run average for all teams is not 4.50 but 4.47, the average PRA is not zero, but -6. 

8)     The new figure for Team Runs Saved is Innings Pitched, plus the PRA, minus the runs allowed by the team.   In other words, we are assuming that, in a neutral park in an average league, a team with zero pitching and zero defense would allow one run per inning, or 9 runs per 9 innings.  We adjust this for the PRA—which is actually a park AND LEAGUE run adjustment, since what we were adjusting is the league norm—and subtract the number of runs that they allowed.  The difference is the number of runs that they saved.

 

For illustration, the 1968 St. Louis Cardinals—the team for which Bob Gibson posted a 1.12 ERA—allowed 472 runs in 4,437 thirds of an inning, which is 1,479 innings.   The league totals were 5,577 runs scored with 44,043 thirds of an inning.  Given their innings pitched, which were almost exactly one-tenth of the league’s total—it was a ten-team league—we would have expected the Cardinals to allow 562 runs scored (561.8407).

However, Busch Stadium in that season functioned as a pitcher’s park, with a park adjustment factor of .925439.  Based on that, we multiply the league average expected runs (LAXR), 562, by the park adjustment (.925439) to get the expected runs allowed by the team (TXR), which, for the 1968 Cardinals, is 520 (519.949). 

Had they been expected to allow 4.50 runs per nine innings, which is assumed to be the average number, that would be 739.5 runs (1,479 innings divided by two).  Their expected runs allowed is only 520, which is a difference of 220 runs.  The team’s Park Run Adjustment is -220 (-219.551).    What we are saying, in English, is that this team is expected to allow fewer than 4.50 runs per game because of their park and league.

 

The Cardinals pitched 1,479 innings.  Had they had zero pitching and zero defense in a historically neutral park, then, we theorize that they would have allowed 1,479 runs.  Because of the park and league, we reduce that number by 220 runs, making 1,259.   We theorize that, in this run condition, had they had zero competent pitching and zero defense, they would have allowed 1,259 runs. 

In fact, they allowed only 472 runs.  The difference is 787 runs—1259, minus 472.   So the Cardinals pitching and defense in 1968 "saved" 787 runs.  This is a good number.   Among all teams in baseball history since 1900, it ranks in the top 20%.   It’s not a GREAT pitching-and-defense combination; it’s a very good pitching-and-defense combination, working in a very low-run environment.
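The eight steps and the Cardinals example can be traced in a few lines of Python. This is only a sketch of the arithmetic as described above; the function name and parameter names (`team_thirds`, `park_factor`, and so on) are invented for illustration and are not part of the article's method.

```python
def team_runs_saved(team_thirds, team_runs_allowed,
                    league_runs, league_thirds, park_factor):
    """Follow the eight steps; return (LAXR, TXR, PRA, runs saved).

    Innings are passed in thirds of an inning, as in the Cardinals example.
    All names here are hypothetical labels, not the author's.
    """
    innings = team_thirds / 3
    league_innings = league_thirds / 3
    runs_per_inning = league_runs / league_innings   # step 1
    laxr = runs_per_inning * innings                 # step 2: league average expected runs
    txr = laxr * park_factor                         # steps 3-4: park-adjusted expectation
    pra = txr - innings / 2                          # steps 5-7: park run adjustment
    saved = innings + pra - team_runs_allowed        # step 8: team runs saved
    return laxr, txr, pra, saved


# 1968 St. Louis Cardinals, using the figures quoted in the text.
laxr, txr, pra, saved = team_runs_saved(
    team_thirds=4437, team_runs_allowed=472,
    league_runs=5577, league_thirds=44043, park_factor=0.925439)
print(round(laxr), round(txr), round(pra), round(saved))
```

Rounded to whole runs, the four values agree with the 562, 520, -220 and 787 worked out above.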

 

This method of establishing Runs Saved by a Team is clearly better than either of the two methods that I tried before in this series of articles.  This method does extremely well on three critical tests of what the Team Runs Saved should be:

 

1)      The average for a team is almost the same as the average runs scored by a team,

2)     The standard deviation, also, is almost the same, thus fulfilling the essential condition that offense and defense are equal components in success, and

3)     Good teams do dramatically better in this measurement than weak teams. 

There are 2,550 teams in my data, or five groups of 510 teams.  The top 510 teams, the 510 teams which we credit with saving the most runs, have an average won-lost record of 91-68.  The bottom 510 teams have an average won-lost record of 62-88.  
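For anyone who wants to run the same three checks on their own table of team seasons, here is a rough sketch. The rows below are PLACEHOLDER numbers invented purely so the code runs; they are not real teams and not the 2,550-team data, and the field names are assumptions.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def group_records(teams, groups=5):
    """Rank teams by runs saved (most first), cut the ranking into equal
    groups, and return each group's average (wins, losses)."""
    ranked = sorted(teams, key=lambda t: t["saved"], reverse=True)
    size = len(ranked) // groups
    records = []
    for g in range(groups):
        chunk = ranked[g * size:(g + 1) * size]
        records.append((mean([t["wins"] for t in chunk]),
                        mean([t["losses"] for t in chunk])))
    return records


# PLACEHOLDER rows (runs scored, runs saved, wins), invented for illustration.
rows = [(700, 850, 100), (760, 800, 95), (690, 760, 90), (720, 740, 86),
        (810, 720, 83), (650, 700, 80), (740, 680, 77), (700, 650, 72),
        (680, 600, 66), (730, 560, 60)]
teams = [{"scored": s, "saved": v, "wins": w, "losses": 162 - w}
         for s, v, w in rows]

scored = [t["scored"] for t in teams]
saved = [t["saved"] for t in teams]
win_pct = [t["wins"] / (t["wins"] + t["losses"]) for t in teams]

print("test 1, means:", mean(scored), mean(saved))
print("test 2, standard deviations:", pstdev(scored), pstdev(saved))
print("test 3, correlation with winning pct:", pearson(saved, win_pct))
print("test 3, records by group:", group_records(teams))
```

With real data, tests 1 and 2 pass when the paired numbers come out nearly equal, and test 3 passes when the first group's record is far better than the last group's.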

 

The essential difference between this method and the method that I started with is that it applies the park/league run adjustment AS A NUMBER, whereas the earlier method applied the adjustment AS A PERCENTAGE.   Applying it as a percentage, the earlier method applied the run adjustment TWICE, essentially applying it to the runs that they would have allowed if they had had an average defense, and then applying it again to the runs they would have allowed above that number, if they had very bad pitching and defense.   This method applies the run adjustment only once.    The earlier method created very high Runs Saved numbers for teams playing in high-run contexts, thus concluding that the team which saved or prevented the most runs, in all of baseball history, was the 2000 Colorado Rockies. 

The 2000 Rockies DID save a lot of runs, and they still do very well in the new method.  They’re just not #1 anymore.  Before, I was crediting them with saving 1,227 runs, with the average figure being 699.   Now, we’re crediting them with saving 880 runs, with the average being 705.  
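The difference between the two approaches can be sketched in code. The article does not spell out the earlier formula, so the "percentage" version below is an ASSUMPTION: it sets the zero point at twice the park-adjusted expected runs (TXR), which is one reading of "applying the adjustment twice." The function names are hypothetical.

```python
def runs_saved_as_number(innings, txr, runs_allowed):
    """New method: the park/league adjustment enters once, as a number (PRA)."""
    pra = txr - innings / 2
    return innings + pra - runs_allowed


def runs_saved_as_percentage(txr, runs_allowed):
    """ASSUMED form of the earlier method: zero point = 2 x TXR, so the
    park factor scales both the average level and the runs above it."""
    return 2 * txr - runs_allowed


# 1968 Cardinals figures from the text: 1,479 innings, TXR 519.949, 472 allowed.
innings, txr, allowed = 1479, 519.949, 472
new = runs_saved_as_number(innings, txr, allowed)
old = runs_saved_as_percentage(txr, allowed)
print(round(new), round(old), round(old - new))
```

Under this assumption the gap between the two methods is exactly the PRA: negative in a pitcher's park, positive in a hitter's park. That matches the direction of the Rockies' drop from 1,227 to 880, which would imply a PRA of about 347 for that team, but that is an inference from the assumed formula and not a figure given in the article.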

There are two facts which convince me that this new method is "right", and that the old method was, frankly, wrong.  One is the standard deviation of runs saved.   The standard deviation of runs scored by a team within the data is 90.  By the new method, the standard deviation of runs SAVED (prevented) by a team is 89.  By the old method, it was 127.    In other words, the old method was saying that, in winning games, preventing runs was much more important than scoring them.   How I failed to pick this up in my original work, I don’t know.  I should have picked it up sooner. 

Also, the new method, despite the lower standard deviation, creates a much higher correlation between Runs Saved and Winning Percentage.  The new method does a much better job of making the good teams look good and the bad teams look bad.   So I’m pretty much convinced that this is the method I should have used before.  Under the new method, these are the 20 teams in the study which prevented the most runs:

 

YEAR   City           Team        Lg   Team Runs Saved
1926   Philadelphia   A's         AL   1135
1905   Chicago        Cubs        NL    917
2017   Cleveland      Indians     AL    991
2019   Houston        Astros      AL   1041
2018   Houston        Astros      AL    932
1955   Boston         Red Sox     AL   1103
2007   Boston         Red Sox     AL   1052
1970   Chicago        Cubs        NL   1065
1973   Baltimore      Orioles     AL    906
2002   Atlanta        Braves      NL    903
1906   Chicago        Cubs        NL    797
1913   New York       Giants      NL    887
1997   Atlanta        Braves      NL    908
1991   Toronto        Blue Jays   AL    952
1929   Philadelphia   A's         AL   1049
2011   Texas          Rangers     AL   1026
1969   Baltimore      Orioles     AL    829
1957   Brooklyn       Dodgers     NL    977
1904   Cincinnati     Reds        NL    938
1993   Atlanta        Braves      NL    879

 

Of those 20 teams, all 20 had winning percentages of at least .545, and all were at least 14 games over .500.   Fifteen of the 20 teams won at least 90 games, and ten of the 20 teams won more than 100 games.  The top 45 teams on the list all had winning records.

 

Among the teams showing as having saved the FEWEST runs, almost all came from the strike-shortened 1981 and 1994 seasons.   Among teams which played a full season, the teams which show as having saved the fewest runs are almost all 100-loss teams. 

 

In the end, if you can be patient enough to let me get there (I know some of you will not be, but I hope most of you will), you will see that this "Team Runs Saved" number acts as a "cage" from which the wild animals cannot escape.  The wild animals are the individual category numbers.   The number for strikeouts is a wild animal; the number for home runs allowed is a wild animal.   The Team Runs Saved number is the cage from which these wild animals cannot escape. 

In the end, the category numbers will act as "claim points" against the fixed boundary of the team runs saved.   What the system is essentially saying is, "We know that the St. Louis Cardinals in 1968 saved 787 runs as opposed to a team which had no ability to prevent runs.   Who will step forward to claim those runs?"   We WANT the individual numbers on the 1968 Cardinals to add up to 787, as much as possible—but we know that they won’t.   We’ll have to adjust them to make them add up.   But that’s way down the road. After we have struggled long and hard to make everything add up right, we will adjust for the fact that it doesn’t.   The Team Runs Saved number is the "control".   It is what we will adjust to, when we get to that point.  
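The article has not yet said how the individual numbers will be adjusted to fit the team total. One simple possibility, offered only as an assumption, is to scale every claim by the same factor so the claims sum to Team Runs Saved. The category names and claim values below are made up for illustration.

```python
def scale_claims(claims, team_runs_saved):
    """Scale every claim by one common factor so the claims sum to the
    team total. Proportional scaling is an ASSUMPTION, not the article's
    stated procedure."""
    total = sum(claims.values())
    if total == 0:
        raise ValueError("no claims to scale")
    factor = team_runs_saved / total
    return {name: value * factor for name, value in claims.items()}


# Made-up claim points that overshoot the 1968 Cardinals' cage of 787 runs.
claims = {"strikeouts": 300.0, "home runs avoided": 250.0, "fielding": 350.0}
adjusted = scale_claims(claims, team_runs_saved=787)
print(adjusted, sum(adjusted.values()))
```

Whatever adjustment is finally used, the point is the same: the team figure is fixed first, and the category numbers are fitted to it, not the other way around.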

 
 

COMMENTS (16 Comments, most recent shown first)

FrankD
Another example (although probably too late for anybody to read) ....

Your team ('system') has a way of detecting what the next pitch will be. They beat on a garbage can to tell the batter. The batter hits it over the fence. Is this just the batter's HR, with no credit going to the 'system'? And, if so, then why are members of the 'system' banned for a while?
6:26 PM Jun 19th
 
FrankD
I'm sure I'm not explaining myself properly about the 'system' and defense. What I'm not concerned about is training. I am concerned about in-game decisions and how players are used. Maybe this can't be separated from individual performance (the data we have). In the BJHBA there is a discussion on intelligence and the decision of where to position before the ball is hit. A smart player or smart 'system' will have the player play in the (hopefully) optimal position. But what if the 'system' orders the player to a non-optimal position: that will show up in the data but it is not the player's fault, nor is it necessary to reward the player for the 'system' putting him in the best slot. It is not an effect if we assume all teams (systems) play each player in the optimal position. This effect, if large enough, may be detectable in defensive performances when players switch teams ....
5:57 PM Jun 19th
 
LesLein
In my previous comment I forgot to mention that the old Baseball Encyclopedias used to adjust the park factor calculation to account for the fact that each team had different road parks. Unfortunately I threw away mine.
9:52 AM Jun 18th
 
LesLein
How is the park factor calculated? Is it home game totals versus road game totals? If so, couldn’t this be misleading? The neutral park for the 1968 Cards doesn’t include Busch Stadium. Every team has a different neutral park. Doesn’t this distort the results a little?
9:46 AM Jun 18th
 
bjames

Responding to Frank D
How do you separate the individual numbers from the system?


This is not a real issue; it's just something that people like to talk about. We evaluate things from the end point, the production, not from the inputs. The batter who hits a home run was taught to hit by his father, when he was 7 years old, and was taught something else by his high school coach, and was taught something additional in the minors. When he hits a home run, HE hits the home run; he is credited with it. If his batting coach instructed him not to swing until he has a strike on him, or if he instructed him to swing at the first pitch. . . .it doesn't matter. It's got nothing to do with anything. The player is evaluated by what he does.
12:26 AM Jun 18th
 
bjames
Maris--

If you have a moment of understanding, hold on to that moment. Don't let anyone confuse you with a different interpretation. If you have a clear understanding, even for a moment, then try to get back to that moment. You can build the understanding from there.
12:22 AM Jun 18th
 
FrankD
Better example, look at NFL and Belichick: is it the player(s) or the system? Ok, they got Brady. But they seem to pick up role players from other teams who will listen to Belichick et al. and do very well. Now, football is less individual than baseball but defense in baseball is more team oriented than offense. I think we really have to assign a team score to defense before we can properly evaluate an individual's contribution to defense. In a way it's a park effect, let's just call it a defense system effect.
10:04 PM Jun 17th
 
FrankD
Interesting .... now that you've bounded defense, as you've said the next step is the allocation of credit to this bound. The infield plays more balls so therefore must get more credit. But, how do you credit the 'institution' that tells where the defenders should play? Clearly teams given exactly the same players but different info on where to set up will perform differently. I don't know how to do this but there must be some fudge factor for the team leadership as a whole. You often brought up Stengel's Yankees and their double play numbers. Now Rizzuto, Martin, McDougald, et al. were good, but how much did Stengel's system/leadership/whatever help? How do you separate the individual numbers from the system? A wild hypothetical: a team gets only sinker ball pitchers, manicures the infield accordingly, and emphasizes defense for infielders. How do you break this down for credit for individuals? Are the '39 '40 Reds the best defense ever or did they set this all up?
9:52 PM Jun 17th
 
GuillermoMountain
Beautiful and logical. It is ironic that we can quantify the value of the one guy at the plate more than we can quantify the value of the nine other guys trying to stop him. Maybe ironic is the wrong word choice; I guess it makes sense that one big contribution is more easily measurable than a bunch of smaller ones. Thankfully, we will likely understand the “difficult half” of the game so much better when this project is completed.
8:52 PM Jun 17th
 
cderosa
Hi Bill,
I'm very interested in this project, thanks for sharing your work on it.

I understand why you have to use a kind of stand-in figure for "zero runs prevented" because literally zero runs prevented is an infinite number of runs.

The figure you've chosen (twice the average runs scored), produces a .200 winning percentage in the Pythagorean formula. That is, given average offense in a neutral park, a team that allows twice as many runs as scored would be expected to win 20% of its games.

That seems like a fine "bottom," a performance that is basically devoid of major league ability.

But if we take a .200 team that is pathetic offensively, and average defensively (preventing an average number of runs), it won't score zero runs. If the league average of runs scored/prevented is 750, a team would have to score about 375 runs to win 20% of its games.

So Runs Prevented is going to be different than Runs Created, which assigns *all* of a team's runs to players. Runs Prevented is going to account for the Runs Prevented beyond what it takes to play .200 ball given average offense.

My question is whether that puts Runs Prevented on a kind of "platform" that is going to cause you problems later, in trying to build ways to weight offensive and defensive performances.

Chris DeRosa

8:32 PM Jun 17th
 
willibphx
First of all congratulations on the breakthrough, a small adjustment in logic but a major step forward. Lots of work to do to cage the wild animals but the foundation looks impressive.
3:57 PM Jun 17th
 
trn6229
Thank you, Bill. As a long time Strat-O-Matic Baseball player, Bob Gibson and his 1.12 ERA is seared in my brain. He started slow, was around 3-5, lost some low scoring games. Then from around June 1st over his next 100 innings pitched, he allowed something like three runs in 100 innings. The 1968 Cardinals could field well. McCarver was a good catcher then, Cepeda, Javier, Maxvill and Shannon could all field well. The outfield of Brock, Flood and Maris was excellent. I know there was a Sports Illustrated cover of that team in the locker room in street clothes and showing how much money they each made. Bob Gibson was the epitome of an angry black man at that time. I like Juan Marichal better. He won more games in the 1960s. It is not Marichal's fault that the Giants made some dumb trades.

Take Care,
your friend Tom Nahigian
2:31 PM Jun 17th
 
MarisFan61
.....One member on Reader Posts says he sees it completely differently than what I said.....
So please consider it a 'work in progress' on our part.
2:25 PM Jun 17th
 
chuck
Bill, would any of the teams from either the strike-shortened seasons or from seasons with shorter schedules (1903, 1918, 1919 for example) show up as runs-saved leaders if you were to show the runs saved on a per game basis? My guess is that perhaps a Braves team from 1994 or 1995 might.
12:51 PM Jun 17th
 
MarisFan61
Bill: I feel less bad. :-)
Adding "theoretical" to runs saved being equal to runs scored seems to make all the difference in the world.
As you may recall, some of us (me most, I think) went through conniptions trying to understand why it would be that runs saved "is" equal to runs scored (and you got furious over our saying it wasn't clear why this would be so).
Being the theoretical thing: I easily see that.

Or am I wrong again about this being a big difference and a key part of what you have tweaked.....
12:09 PM Jun 17th
 
evanecurb
I feel as though I witnessed a Eureka! moment in real time, Bill. Thanks for taking the time to explain this discovery. I'm very interested to see the new results. The top 20 teams certainly pass the smell test, as there are two Tinker-Evers-Chance teams, two Atlanta "Big 3" starting pitcher teams, and two Brooks-Belanger-Blair O's teams.
10:01 AM Jun 17th
 
 