Runs, RBI and Home Runs
As a bird of whackground, word of background, there used to be a stat called Runs Produced, which was Runs + RBI – HR. No one pays any attention to this stat anymore; it was kind of amateur sabermetrics from the paleolithic era, but for some reason the discussion about this essentially obsolete stat has come back to the surface. I always argued, because I always believed, that it made no sense to subtract home runs from the stat, if you wanted to accurately evaluate hitters. Suppose that you have two teams of players. One team has scored 800 runs, driven in 750 runs, but has hit 200 home runs; the other team has scored 800 runs, driven in 750 runs, but has hit only 100 home runs. Is one of them better than the other? The unnatural removal of the home run, I always thought, was an expression of bias against power hitters, derived from the prehistoric preference for batting average.
Apparently not. Tom Tango has taken time to try to tease through the test of whether Runs + RBI – HR or just Runs + RBI, leave the home runs alone, was actually the better formula to evaluate a hitter. He concluded that Runs + RBI – HR was in fact the better formula. I know this because he told me so, private and public exchanges, but I didn’t go to his blog and check out his research because I didn’t want his way of thinking about the issue to bias my own way of thinking through the problem. If I was to try to figure out which was the better formula, how would I go about it?
I decided to approach it in this way. There must be SOME way to translate the "runs produced" by this formula into runs created, right? How do we do this?
I actually tested three formulas:
A. Runs + RBI – Home Runs
B. Runs + RBI
C. Runs + RBI + Home Runs
I took all players who had 500 plate appearances in a season from 1980 to 2018, which is a total of 5,346 players, or about the average number of fans at a Baltimore Orioles game. For each of those players, I estimated his Runs Created by a good and careful method, and also figured his Runs Produced by formulas A, B, and C.
The 5,346 players in the study created an average of 83 runs—83.0919, if you really must know.
By formula A, the players had "produced" an average of 136.0322 runs. The ratio of Runs Produced to Actual Runs Created, then, is 1 to .610825.
Suppose, then, that we estimated each player’s Runs Created by the formula (Runs + RBI – HR) * .610825. By this formula, the #1 offensive season of that era was by Manny Ramirez in 1999, when Manny scored 131 runs and drove in 165 runs with only 44 measly home runs. That’s 252 runs produced (131 + 165 – 44), which, multiplied by .610825, suggests that he created 153.9 runs.
In reality, he created only 139.1 runs, probably, an error of 14.9 runs, the way it rounds off. Suppose that we figure the error for each of the 5,346 players in the study. Then we repeat the process with Runs Produced formula B, and formula C. I’d share the numbers with you, but that doesn’t matter; the only thing that matters is the conclusion.
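For anyone who wants to check the arithmetic, here is a minimal sketch of the conversion step, using only the figures quoted above. The function and variable names are mine, for illustration only. Note that subtracting the rounded 139.1 from the rounded estimate gives about 14.8 rather than 14.9; the 14.9 comes from the unrounded figures.

```python
# Averages for the 5,346 player-seasons, as quoted in the text.
MEAN_RUNS_CREATED = 83.0919
MEAN_RUNS_PRODUCED_A = 136.0322

# Scale factor that turns Formula A "runs produced" into estimated runs created.
scale_a = MEAN_RUNS_CREATED / MEAN_RUNS_PRODUCED_A


def estimate_runs_created(runs, rbi, hr, scale=scale_a):
    """Estimate runs created from Formula A: (Runs + RBI - HR) * scale."""
    return (runs + rbi - hr) * scale


# Manny Ramirez, 1999: 131 runs, 165 RBI, 44 home runs.
manny_estimate = estimate_runs_created(131, 165, 44)

print(f"scale factor: {scale_a:.6f}")
print(f"Manny 1999 estimate: {manny_estimate:.1f}")
```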
When I ran that study with Formula C, Runs plus RBI PLUS home runs, the gross error for the 5,346 players was 50,256 runs, or 9.40 Runs Per Player.
When I ran the study with Formula B, Runs plus RBI, leaving the home runs alone, the gross error was 44,118 runs, or 8.25 Runs Per Player.
When I ran the study with Formula A, Runs plus RBI MINUS home runs, the gross error was 43,884 runs, or 8.21 Runs Per Player.
Conclusion? Runs plus RBI minus Home Runs is in fact the most accurate of the three formulas, or, perhaps better stated, is the least inaccurate.
It should be noted, because it will later become relevant, that the difference between the accuracy of Formula A and the accuracy of Formula B is very, very small—an average error of 8.21 runs as opposed to 8.25.
What is very apparent from this data is that there would be a more accurate version of the formula, if we cared to pursue it, which would be something like (Runs + RBI - .55 * home runs) or maybe (Runs + RBI - .62 * home runs) or something like that. It is intuitively obvious, to me, that the graph charting the error in the formula has reached bottom and headed back up somewhere between (Runs + RBI - .000 * Home Runs) and (Runs + RBI - 1.000 * Home Runs), somewhere probably in the range of .600. But since no one would or should use that formula if I figured out what the optimal point was, I’m going to let that pass.
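If someone did care to pursue it, the search could be sketched roughly as below. This is an illustration under my own assumptions, not the procedure actually used in the study: I am assuming "gross error" means the sum of absolute differences between the scaled estimate and runs created, and that the scale factor is recomputed for each home run weight. The `Season` record and the function names are hypothetical. A weight of 1.0 corresponds to Formula A, 0.0 to Formula B, and -1.0 to Formula C.

```python
from typing import NamedTuple, Sequence


class Season(NamedTuple):
    """One player-season (hypothetical record layout)."""
    runs: float
    rbi: float
    hr: float
    runs_created: float


def runs_produced(s: Season, hr_weight: float) -> float:
    """Runs + RBI - hr_weight * HR."""
    return s.runs + s.rbi - hr_weight * s.hr


def gross_error(seasons: Sequence[Season], hr_weight: float) -> float:
    """Sum of absolute errors after scaling runs produced to runs created.

    Assumption: the scale is total runs created over total runs produced,
    which is the same as the ratio of the two averages.
    """
    produced = [runs_produced(s, hr_weight) for s in seasons]
    scale = sum(s.runs_created for s in seasons) / sum(produced)
    return sum(abs(p * scale - s.runs_created)
               for p, s in zip(produced, seasons))


def best_hr_weight(seasons: Sequence[Season], steps: int = 100) -> float:
    """Try weights 0.00, 0.01, ..., 1.00 and return the one with least error."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: gross_error(seasons, w))
```

Dividing `gross_error` by the number of seasons gives the per-player figure, and `best_hr_weight` only looks between 0 and 1 because that is where the minimum is expected to sit.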
But there is one more question to be answered here. What if the number of outs is different? In other words, we take the players who created 100 runs by this estimate, and the players who created 100 runs by that estimate. It may be that estimate "A" is better than estimate "B"; it not only may be true, it is true, we’ve established that. But what if the "A" players create their runs while making more outs?
I sorted players by runs created, estimated by Formulas A, B and C, and then looked at the number of OUTS used by:
1) The top 500 players,
2) The top 1000 players,
3) The bottom 1000 players, and
4) The bottom 500 players.
And, sure enough, it does turn out that the players who are credited with creating more runs by Formula A are also making more outs. The top 500 players in Runs Produced, by Formula A, made an average of 434.3 outs, while the top 500 players in Runs Produced, by Formula B, made an average of 432.0 outs.
There are two questions here:
(q1) Which formula is most accurate in estimating the number of runs created by the player? And
(q2) Which group of players, identified by which formula, is actually the most effective at creating runs, per out used?
Formula A, while it is the most accurate in estimating runs created, is BY FAR the least accurate formula of the three at identifying the most effective offensive players. It achieves "accuracy" at the expense of ignoring the number of outs made.
The 500 players identified by formula A (as producing the most runs) created an average of 7.31 runs per 27 outs. But the 500 players identified by Formula B as producing the most runs created an average of 7.53 runs per 27 outs (actually 7.53496), while those identified by Formula C as producing the most runs also created an average of 7.53 runs per 27 outs (actually 7.528.) This is a relatively huge difference—much, much larger than the difference noted by the other approach.
The 500 LEAST effective hitters, as identified by Formula A, created an average of 3.83 runs per 27 outs, while the 500 least effective hitters, as identified by Formula B, created an average of only 3.79 runs per 27 outs.
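The per-out comparison can be sketched along these lines. Again, this is a rough illustration rather than the exact method behind the numbers above: I am assuming the group rate is pooled (total runs created over total outs, times 27), which may differ slightly from averaging each player's individual rate. The `Season` record and function names are hypothetical.

```python
from typing import List, NamedTuple, Sequence


class Season(NamedTuple):
    """One player-season (hypothetical record layout)."""
    runs: float
    rbi: float
    hr: float
    runs_created: float
    outs: float


def top_n(seasons: Sequence[Season], hr_weight: float, n: int) -> List[Season]:
    """The n seasons ranked highest by Runs + RBI - hr_weight * HR."""
    ranked = sorted(seasons,
                    key=lambda s: s.runs + s.rbi - hr_weight * s.hr,
                    reverse=True)
    return ranked[:n]


def runs_per_27_outs(group: Sequence[Season]) -> float:
    """Pooled runs created per 27 outs for a group of seasons."""
    total_created = sum(s.runs_created for s in group)
    total_outs = sum(s.outs for s in group)
    return 27 * total_created / total_outs
```

With the full data set loaded, comparing `runs_per_27_outs(top_n(seasons, 1.0, 500))` against `runs_per_27_outs(top_n(seasons, 0.0, 500))` would be the Formula A versus Formula B comparison for the top 500.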
So I have to say, bottom line, that I have reached a different conclusion in regard to this issue than the one Tango reached, repeating that I have not read his study, and don’t know how it was done or why he reached that conclusion. The better formula is NOT Runs + RBI – Home Runs; it is simply Runs + RBI.