The Big Item on Our Agenda
Perhaps now would be a good time to mention that this entire effort has been an extended April Fool’s Joke?
The Big Item on our Agenda is DER, Defensive Efficiency Record. The next three categories to be dealt with are Balk Avoidance, Defensive Efficiency, and Error Rate/Fielding Percentage.
Balks are trivial and annoying and we have already talked about them two or three times, so let’s just skip them in the current discussion. They’re in there, but they don’t really make much impact.
For many years, Defensive Efficiency was the biggest item in Run Prevention. "Defense" was essentially how runs were prevented, and balls in play were essentially how runs were scored. In the 1920s teams struck out 2 to 3 times a game. There were many fewer home runs, somewhat fewer walks, and many more balls in play. Balls in Play resulted in outs; Balls in Play resulted in runs. The essence of the game was Balls in Play.
Our analytical pathway here is parallel to the other categories. I figured the DER for every team, and reported on that earlier; the highest ever was the 1906 Cubs, with Tinker and Evers and Chance. Then I figured the norm for each decade, and the Standard Deviation for each decade. Then I figured what 3 Standard Deviations below the norm would be, and 4 Standard Deviations, and 5 Standard Deviations. I am only using "four" in these test runs. I don’t actually know what the right zero-value point is, but I’m guessing 4 Standard Deviations as a starting point for the analysis.
Then I figured how many Balls in Play would have become hits against a given team at that DER, and subtracted the number of hits they gave up (Excluding Home Runs) from the number that they would have been expected to give up with a zero-value defense. The difference is the number of hits the team prevented, compared to a totally incompetent defense.
But then we have to place a run value on each hit, and what is that value? The problem is that a ball in play that is not caught and becomes a hit might be a single, a double or a triple, and I don’t have the data about how many doubles and triples are allowed by each team in my data set. That data DOES exist now; it did not exist 20 years ago, but it does now. But it would probably take me two or three weeks to type all of that stuff into my data, for very little gain, so I don’t think I’ll do that; we have to leave something for the next generation of researchers. Maybe it wouldn’t take me two weeks; maybe it would only take me three days, I don’t know, but in any case I don’t want to do it.
Let’s assume that 70% of the balls in play which become hits are singles, 24% are doubles, and 6% are triples. Let’s assume that the value of a single is .70 runs, the value of a double is 1.00 runs, and the value of a triple is 1.27 runs. With those assumptions, the run value of a ball in play which becomes a hit would be .8062 runs; I know that I haven’t lost all of my marbles yet because I was still able to calculate that in my head. Very easily; I wonder if, after I lose my other mental faculties, I’ll still be able to do that? I visualize myself on my deathbed; I no longer recognize my children and cannot remember the words for "salt" and "pepper", but somebody asks me what 19 divided by 26 is, and I’m like, "Oh, everybody knows that, it’s .731."
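For anyone who would rather not do it in their head, here is a minimal sketch of that weighted average. The hit mix and run values are the assumptions stated in the paragraph above; the variable names are only illustrative.

```python
# Assumed share of non-home-run hits by type, and assumed run value of each.
# These are the article's working assumptions, not measured data.
hit_mix = {"single": 0.70, "double": 0.24, "triple": 0.06}
run_value = {"single": 0.70, "double": 1.00, "triple": 1.27}

# Weighted average run value of a ball in play that becomes a hit.
value_per_hit = sum(hit_mix[kind] * run_value[kind] for kind in hit_mix)

print(round(value_per_hit, 4))  # 0.8062
```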
Anyway, let’s assume that the value of a ball in play on which no play is made would be .81 runs. The 2019 Oakland A’s saved 199 runs on balls in play, contrasted with a completely incompetent defense, which is figured as follows:
1. The A’s had a team DER of .722 (.722 384 4).
2. The Decade norm is .705 (.705 433).
3. The Standard Deviation is .010 696.
4. Four Standard Deviations below the norm would be .662 648; I can’t do that in my head, or at least didn’t.
5. The A’s had 4,110 balls in play against them (6,153 batters faced, minus strikeouts (1,299), walks (477), Hit Batsmen (66) and Home Runs (201).)
6. If they had had a DER of .663 (.662 648) they would have allowed 1,387 hits on balls in play.
7. They actually allowed only 1,141 hits on balls in play,
8. A difference of 246 hits (245.517).
9. Multiply that by .81, and you have 199 (198.86).
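The nine steps above can be sketched as a small function. This is only an illustration of the arithmetic as described, using the A’s figures from the list; the function and parameter names are hypothetical, and because the published norm and Standard Deviation are rounded, the trailing decimals may differ slightly from the figures in parentheses.

```python
def runs_saved_on_balls_in_play(batters_faced, strikeouts, walks, hit_batsmen,
                                home_runs, hits_on_bip, decade_norm, decade_sd,
                                sd_below=4, run_value_per_hit=0.81):
    """Runs saved on balls in play, compared with a zero-value defense."""
    # Step 5: balls in play are batters faced minus the events with no fielding play.
    balls_in_play = batters_faced - strikeouts - walks - hit_batsmen - home_runs
    # Step 4: the zero-value DER sits sd_below standard deviations under the decade norm.
    zero_der = decade_norm - sd_below * decade_sd
    # Step 6: hits a zero-value defense would have allowed on those balls in play.
    expected_hits = (1 - zero_der) * balls_in_play
    # Steps 7-8: hits prevented relative to that zero-value defense.
    hits_prevented = expected_hits - hits_on_bip
    # Step 9: convert hits prevented to runs.
    return balls_in_play, zero_der, hits_prevented, hits_prevented * run_value_per_hit


# 2019 Oakland A's, using the numbers given in the list.
bip, zero_der, hits_prevented, runs = runs_saved_on_balls_in_play(
    batters_faced=6153, strikeouts=1299, walks=477, hit_batsmen=66,
    home_runs=201, hits_on_bip=1141, decade_norm=0.705433, decade_sd=0.010696)

team_der = 1 - 1141 / bip  # Step 1: share of balls in play turned into outs.

print(bip, round(team_der, 3), round(zero_der, 3))  # 4110 0.722 0.663
print(round(hits_prevented), round(runs))           # 246 199
```

The `sd_below` default of 4 is the same starting guess used in the test runs; changing it to 3 or 5 shows how sensitive the total is to where the zero-value line is drawn.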
The 199 runs is a very good total; out of the 2,550 teams in our study, that is the 785th best. The four BEST totals are actually all from the 1915 Federal League; it’s an atypical league, apparently. I’ll probably have to fix something at some time to get a different reading out of that league, but you know. . . I’m just outlining the system now, not doing finishing touches, and anyway, how much does anybody care about the Federal League? The highest total NOT from the Federal League is the 1908 Pittsburgh Pirates, whose defense saved them 408 runs. That’s a good choice. The DERs were very high in that era. The 1908 Pirates were a good team (98-56) and posted a 2.12 team ERA, second-best in the league, despite a well below average strikeout total and a slightly worse than average number of walks. I’m very comfortable with them as the team that saved the most runs by running down balls in play. They prevented a lot of runs, and they weren’t preventing runs by strikeouts or walks, so DER is a good explanation for their success.
On the other end of the scale is the 1930 Philadelphia Phillies, who are credited with only 4 runs saved by DER. Their DER basically IS the zero competence line. In this there is a story, but let’s move along for now.
Fielding Percentage. The team which saved the most runs by making the plays they are supposed to make, by my calculations, was the 1905 Chicago White Sox. They were managed by, appropriately enough, Fielder Jones. The White Sox were a good team; they finished 92-60 and led the league in ERA, at 1.99. They had a Hall of Fame shortstop, George Davis, and nobody else in the lineup was very notable. They won the World Series the next year, 1906, with a team that is remembered as the Hitless Wonders.
Anyway, they had a team Fielding Percentage of .968, which would be very low by today’s standards, but was very high by the standards of the time. The league average for Fielding Percentage was .957; the period norm was .952, and nobody else in the American League was higher than .962—six points behind the White Sox. Had the White Sox been 4 standard deviations below the norm, their fielding percentage would have been .919, and they would have committed 542 errors, as opposed to the mere 217 that they actually committed. So they get credit for not committing 325 errors.
But how many runs do you save, by not committing 325 errors? Tough one. Not so many people seem to have written about the Run Cost of an error, in part because errors are not parts of either the hitter’s record or the pitcher’s record, so who the hell cares about them, and in part because it is difficult to know what the cost of an error actually is. An error can be any of a number of things. It can be a ball that should have been fielded cleanly, but wasn’t, putting a runner on first base, or second base, or third base. Many errors, however, "merely" advance runners; a batter rolls the ball slowly to shortstop, but the shortstop, who has no real chance to make a play, makes an ill-advised throw to first, and throws the ball away. It is a hit AND an error. That is common; an error on a stolen base attempt, sending the thief to third base, is not uncommon. Some errors don’t really do anything; you pop up the ball in foul territory, the third baseman drops it, but the batter strikes out on the next pitch.
Errors are more diverse than other plays, more unlike one another than are singles, doubles, triples or walks. That makes them harder to evaluate. But if we assume that a single is worth .70 runs, and an error can be either more costly than a single or less costly, but is less costly than a single more often than it is more costly. . .well, does .60 runs for an error seem reasonable to you? It seems reasonable to me. If you know of a better reason to pick a number, let me know.
So we concluded that the 1905 White Sox get credit for not making 325 errors that they could have made with an infield of Dave Kingman, Jerry Browne, Jose Oquendo and Dean Palmer, and that each error not made has a run prevention value of .60 runs. That’s 195 runs. On the other end of the scale is the 1981 New York Mets, who beat the zero-value standard for fielding percentage by less than 1 run.
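The same kind of sketch works for the Fielding Percentage category. It takes the two error totals quoted above as given (542 at the zero-value line, 217 actual) and applies the assumed .60 runs per error; the function name is made up for illustration.

```python
def runs_saved_by_avoiding_errors(errors_at_zero_value, errors_committed,
                                  run_cost_per_error=0.60):
    """Errors avoided versus the zero-value line, and their assumed run value."""
    errors_avoided = errors_at_zero_value - errors_committed
    return errors_avoided, errors_avoided * run_cost_per_error


# 1905 Chicago White Sox, using the totals quoted in the text.
avoided, runs = runs_saved_by_avoiding_errors(542, 217)

print(avoided, round(runs))  # 325 195
```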
Through the study, the sum total of Runs Prevented by DER (Range) is estimated at 458,845 runs. The total of Runs Prevented by Fielding Percentage is estimated at 136,142 runs. Along with a few runs prevented by not committing Balks (5,705), this brings the total of Estimated Runs Prevented to 1,433,040. Despite my earlier misgivings, this is 80% of the expected total of Runs Prevented.
In a certain perhaps twisted sense, I think this validates my concept. My idea was that if I could (1) establish the level of zero competence in each performance area leading to run prevention, and (2) estimate how many runs were prevented by each, then the total of those should more or less balance with the total of runs prevented, assumed equal to runs scored. I established the level of zero competence by looking for the level at which there are no teams. There are basically no teams 4 standard deviations below the average, so that’s where I have put the limit. This all seems to have worked in a certain sense. I expected them to balance, more or less, and they do.
But man, have I got some problems with this thing. At this point, the origin of the disparities that I commented on in the last article is comically obvious. Many of the teams which I have represented as saving very few runs are calculating, category by category, as saving many, many more runs than I have assumed, while many of the teams which I have represented as saving quite a lot of runs are calculating as having saved many, many fewer than I have assumed.
Does that make sense, or do you need specific cases to explain? For example, the 1909 Washington Senators were projected to save only 368 runs. So far, accounting for their runs saved category by category, they have saved 695. They’re already 88% over budget, and the work isn’t finished. The 1908 St. Louis Cardinals are 81% over budget (710/397), the 1908 Yankees 77% (636/359), the 1906 Dodgers 70% (641/378), the 1908 White Sox 66% (882/531), the 1915 Giants 62% (675/416), etc. Altogether, 403 teams—all of them in low-run environments—have already exceeded their allocation of Runs Prevented.
On the other end of the seesaw, the 1929 Philadelphia Phillies, budgeted for Run Prevention at 902, have only identified 286 runs so far (32%). The 1930 Phillies are also at 32%, the 1925 Phillies are at 40%, the 1999 and 2000 Colorado Rockies are at 40%, the 1950 St. Louis Browns are at 42%, etc. 43 teams are under 50%, and 499 teams are under two-thirds. We’re not going to get there. All of these are teams that played in high-run environments, and mostly, they are BAD teams that played in high-run environments.
So I understand what the problem is now, on one level. I have to use the term "environmental effects" here; no, I haven’t gone green on you. Environmental effects means the run environment. If the run environment is very high (a high-scoring park in a high-scoring league), then the value of each item of run prevention is inherently exaggerated. This is not a statistical quirk; each item really is worth more. A strikeout is worth more in a high-run environment than it is in a run-scarce environment, because the hitter is likely to do more damage when you don’t strike him out in a high-run environment. A walk is more damaging in a high-run environment than in a low-run environment, because in the high-run environment it is more likely that the player who draws the walk will come around to score.
It isn’t more damaging in the WIN column, but it is more damaging in the RUN column. We’re dealing now with runs.
My system has no way of adjusting the value of each event for the run environment. So we’re evaluating run-prevention elements as if they existed in a neutral universe, but comparing the results to the results for teams that played in a specific run environment—sometimes high, and sometimes low. So what do I do about that?
Well, I COULD invent a way to adjust the value of each Run-Prevention event for the specific offensive context. I could do that, but I won’t be happy about it.
As I have noted previously, my Runs Prevented system is counter-intuitive, anyway. It proposes answers that seem, on the surface of it, totally wrong. It argues that the 1930 Phillies—the most notoriously inept pitching-and-defense combination of all time—prevented 845 runs, while the 1967 Chicago White Sox, who had a staff ERA of 2.45, prevented only 613 runs. On the surface of it, this seems nuts.
The system works that way, as I explained before, because the differences between run environments are actually much larger than the differences between good and bad teams. There is no doubt that that is true; you can verify that in 100 different ways. The differences between run environments are OBVIOUSLY larger than the differences in runs allowed by good teams vs. bad teams.
The problem is, there is an intuitive logic which says that the 1960s teams were preventing more runs than the big-hitting teams of the steroid era, and there is a mathematical logic, founded on the types of mathematical logic that we use all the time in the rest of Sabermetrics, which says the opposite.
If you ask me, do I really believe that the 1930 Philadelphia Phillies "prevented" more runs than the 1967 White Sox, well, no, I don’t really believe it. It strains credulity. I would be much happier with a system that accommodated our intuitive logic. But the problem is that, to work with the NUMBERS, I absolutely have to have a mathematical logic that holds together. It has to be coherent, it has to be comprehensive, and I have to be able to explain it and defend it. At the moment, I just do not have any such structure. And I don’t know how I would create one.
There is a second problem with my system at this time, which is that, as I have it structured now, pitchers are only going to be accounting for about 50% of Runs Prevented, with Fielders accounting for the other 50%. That doesn’t seem right, either. But one problem at a time.
I’ll keep plugging away. In the next installment I’ll estimate the Runs Prevented by Double Plays, Passed Ball Avoidance and Stolen Base Defense, which will finish this stage of the process. Then I’ll study what I have, and I’ll listen to what you have to say, and I’ll try to see if I can figure out a way forward. Thanks for reading.
April Fools!