Component Theory
Having made a better estimate of the runs saved by each team, we can now move on to the business of attributing those runs saved to "components" of the game, which is a step toward attributing them to individual players. Here again, I have made many changes to the method outlined before, changes that seemed obvious as I read through the work I had done earlier and then put aside for a few weeks.
The BIGGEST mistake that I was making, other than using a bad estimate of Team Runs Saved, was that I was frequently, and in different ways, doing something that I sometimes criticize other researchers for doing. I was confusing measurement with evaluation. This is measurement. I am simply trying to measure the number of runs saved by each pitcher and each fielder. Once I have the measurement, we can use that to make evaluations, but we're not there yet.
Evaluation depends upon context, but measurement is context-neutral, or at least more context-neutral. What I am trying to do here is parallel to what I did in the 1970s, when I developed Runs Created.
Suppose that you have two players with identical batting stats; let’s say that they each hit .270 with 12 homers, 138 hits in 511 at bats, same number of doubles, triples, home runs and walks drawn; the same number of everything—but one player did this in 1968, whereas the other one did it in the year 2000, the heart of the steroid era. Is the number of runs created by the two players the same, or is it different?
The runs created method assumes that the number of runs they have created is the same. Of course there are minor contextual differences in the number of runs that can be expected to result from a single, a double or a triple, minor variations over time, but not REALLY; the run impact of each event does not change significantly between eras. Of course there is a difference in the value of the two players; the player who creates 70 runs in 1968 has more value than the player who creates 70 runs in 2000, because it takes more runs to win a game in 2000. But that's evaluation. This is just measurement.
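The basic form of Runs Created, (H + BB) * TB / (AB + BB), makes the point concrete: nothing in the formula knows what year it is. The sketch below is only an illustration. The 511 at bats, 138 hits and 12 homers come from the example above; the doubles, triples and walks are invented (hypothetical) numbers chosen so the result lands near the 70 runs mentioned in the text.

```python
def total_bases(hits: int, doubles: int, triples: int, homers: int) -> int:
    """Each hit counts once; extra-base hits add their extra bases."""
    return hits + doubles + 2 * triples + 3 * homers


def basic_runs_created(ab: int, hits: int, doubles: int, triples: int,
                       homers: int, walks: int) -> float:
    """Basic Runs Created: (H + BB) * TB / (AB + BB).

    There is no year or run-environment argument: as a measurement,
    the same stat line produces the same number in any era.
    """
    tb = total_bases(hits, doubles, triples, homers)
    return (hits + walks) * tb / (ab + walks)


# Hypothetical stat line: AB, H and HR from the text; 2B, 3B and BB invented.
line = dict(ab=511, hits=138, doubles=25, triples=3, homers=12, walks=55)

rc_1968 = basic_runs_created(**line)  # the player in 1968
rc_2000 = basic_runs_created(**line)  # the identical player in 2000
print(f"1968: {rc_1968:.1f} runs created; 2000: {rc_2000:.1f} runs created")
```

The two numbers are identical by construction; any difference in what those runs are WORTH belongs to the evaluation step, not to this function.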
The same with a pitcher, or with a fielder. A pitcher who pitches 200 innings, strikes out 140 batters, walks 70, gives up 16 home runs... the pitcher's run contribution is the same, whether he does this in 1968 or in 2000, just as the batter's is the same. Of course, his VALUE is different, and we will adjust for that.
Well, we’ve adjusted for it already, because the team’s Runs Saved is different in a different Run Context. We’ll adjust for the context when we get to the evaluation cycle.
But before, I was adjusting RUNS SAVED for context, and also adjusting EACH ELEMENT for context, as we went along—thus, I was, in essence, adjusting for the same thing twice.
Let me try to explain it this way: WE WILL GET TO CONTEXT. I promise you, we will not lose sight of the context in which performances occur here. But what I am going to try to do now is put everybody on the same playing field. I'll explain more as we go along.
In the earlier work, I measured each team’s runs saved in twelve categories, which I had simplified to eleven by combining Stolen Bases Allowed and Caught Stealing into one number. For the moment, I am going to undo that, moving us back to twelve categories, and then reduce those 12 categories down to seven.
Actually, we’re going to have to reduce 13 categories down to seven. In the work I did before, I don’t think I dealt with outfield assists. I don’t know why; maybe I just spaced out on them, or maybe I thought they were too small to worry about and I would work them in somewhere later. Anyway, the 7 categories that we will deal with now are:
1) Strikeouts,
2) Walks and Hit Batsmen,
3) One Base Advancement Events, which are Wild Pitches, Balks, Passed Balls and Stolen Bases,
4) Hits Allowed on Balls in Play,
5) Home Runs Allowed,
6) Negative Baserunner Events, which are Double Plays, Outfield Assists and Runners Caught Stealing, and
7) Errors.
We're working with 13 categories there; we're just counting them as seven by combining events which have the same or very similar effects. We'll take them back apart, of course, later on.
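The grouping is easy to lose track of, so here it is as a simple lookup table: each of the seven components with the raw event types it combines, plus a reverse lookup from each event type to its component. The component names follow the list above; the dictionary layout itself is just an illustrative choice, not part of the method.

```python
# The 13 raw event types, grouped into the seven components listed above.
COMPONENTS: dict[str, list[str]] = {
    "Strikeouts": ["Strikeouts"],
    "Walks and Hit Batsmen": ["Walks", "Hit Batsmen"],
    "One Base Advancement Events": ["Wild Pitches", "Balks",
                                    "Passed Balls", "Stolen Bases"],
    "Hits Allowed on Balls in Play": ["Hits Allowed on Balls in Play"],
    "Home Runs Allowed": ["Home Runs Allowed"],
    "Negative Baserunner Events": ["Double Plays", "Outfield Assists",
                                   "Runners Caught Stealing"],
    "Errors": ["Errors"],
}

# Reverse lookup: which component each raw event type belongs to,
# which is what we need when the components are taken back apart.
EVENT_TO_COMPONENT = {
    event: component
    for component, events in COMPONENTS.items()
    for event in events
}

print(len(COMPONENTS), "components,", len(EVENT_TO_COMPONENT), "raw event types")
```

Running it prints "7 components, 13 raw event types", which is the whole point of the paragraph: thirteen things being counted as seven.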
Two of those categories, Categories 1 and 6, are positives for the defense. In this new effort to evaluate each category, I will consider ALL strikeouts to be run-prevention events, rather than strikeouts above some minimum, and I will consider all runners caught stealing to be run-prevention events. Double plays will essentially all be positive events, as well, except that we have to normalize them for context before we can do anything with them. It's a different problem. Double plays are now the only thing whose context we neutralize before we estimate the runs saved.
The two "positive" categories, the two "High Number is Better" categories, 1 and 6, are counted against zero, whereas the "negative" categories, the Low-Number-is-Better categories, are measured (all except one, Home Runs) against a standard five standard deviations worse than the average, the average being the average across all 120 years of the study. So there is a question there: is the strikeout pitcher being given an advantage, because ALL of his strikeouts count, whereas the negative categories are counted against a limit?
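Here is one way to write those two kinds of standard down, as a sketch under stated assumptions. The standards themselves (zero for the positive categories, 5.00 SD worse than average for the negative ones, 4.00 for Home Runs) come from the text. The idea that rates are per batter faced, the helper `events_credited`, and the league figures in the usage line are all hypothetical; the double-play context normalization is not modeled at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Standard:
    """How a component is measured: against zero, or k SD worse than average."""
    higher_is_better: bool
    sd_worse: float = 0.0  # only meaningful for low-number-is-better components


STANDARDS = {
    "Strikeouts": Standard(higher_is_better=True),
    "Negative Baserunner Events": Standard(higher_is_better=True),
    "Walks and Hit Batsmen": Standard(False, 5.0),
    "One Base Advancement Events": Standard(False, 5.0),
    "Hits Allowed on Balls in Play": Standard(False, 5.0),
    "Home Runs Allowed": Standard(False, 4.0),  # the one exception
    "Errors": Standard(False, 5.0),
}


def baseline_rate(std: Standard, mean: float, sd: float) -> float:
    """The rate a team is measured against (assumed per batter faced)."""
    if std.higher_is_better:
        return 0.0  # every strikeout, every caught stealing counts
    return mean + std.sd_worse * sd  # "worse" here means a higher rate


def events_credited(std: Standard, mean: float, sd: float,
                    actual_rate: float, batters_faced: int) -> float:
    """Hypothetical helper: events credited to the defense versus the baseline."""
    base = baseline_rate(std, mean, sd)
    if std.higher_is_better:
        return (actual_rate - base) * batters_faced
    return (base - actual_rate) * batters_faced


# Made-up league figures, for illustration only.
walks = STANDARDS["Walks and Hit Batsmen"]
print(events_credited(walks, mean=0.09, sd=0.01, actual_rate=0.085,
                      batters_faced=6000))
```

The asymmetry the paragraph asks about is visible in `baseline_rate`: one branch returns a fixed zero, the other a limit set in standard deviations from the historical average.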
If anything, the strikeout pitcher is working at a DIS-advantage in this way. The average pitcher, over time, has struck out .136 of the batters he has faced. The standard deviation on the team level is .043. The average pitcher is only 3.15 standard deviations from zero. The strikeout pitcher, being measured against zero, is being measured against a standard which is only 3.15 standard deviations worse than the norm, whereas most of the other categories are being measured against 5.00 standard deviations worse than the norm. (Home Runs, against 4.00 standard deviations worse.)
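The arithmetic behind that comparison is a single division: the distance from the average to the standard, expressed in standard deviations. This sketch uses the rounded figures quoted above; with those inputs it gives 3.16 rather than 3.15, a difference that likely reflects rounding of the underlying values. The function name is illustrative.

```python
def sds_from_average(mean: float, sd: float, standard: float) -> float:
    """Number of standard deviations between the average and the standard."""
    return abs(mean - standard) / sd


# Rounded figures quoted in the text: strikeouts are .136 of batters faced,
# with a team-level standard deviation of .043, counted against zero.
k_distance = sds_from_average(mean=0.136, sd=0.043, standard=0.0)

print(f"Strikeouts: zero is {k_distance:.2f} SD from the average")
print("Most negative categories: 5.00 SD; Home Runs: 4.00 SD")
```

A standard closer to the average (about 3.2 SD rather than 5.00) means less room for credit, which is why the strikeout pitcher is, if anything, at a disadvantage.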
And the same with the other "positive, high-number is good" category, which is adjusted double plays plus runners caught stealing. Those, again, are measured against zero, but again, zero is only 3.47 standard deviations below the norm. But there isn't really anything I can do about that. We're trying to make a symmetrical measurement of an asymmetrical universe. A pitcher gives up one hit an inning, on average, more or less. You can't give up fewer than zero hits an inning, but you can certainly give up more than two hits in an inning. It is the nature of raw stats that they are zero-bounded on one side, but unlimited on the up side. Zero competence has different consequences in different areas.
With every strikeout counting and strikeout totals going up and up, the pitcher in modern baseball is credited with many more runs prevented by way of the strikeout than was the pitcher in, let's say, 1928. But suppose that you compare a pitcher in 1928 with a pitcher in 2018: not two pitchers with identical stats, but two with the same impact, each representing the fashion of his era. The pitcher in 2018 might have 240 strikeouts in 200 innings, whereas the equally good pitcher in 1928 might have 60 strikeouts in 200 innings. If the pitcher in 2018 has 180 more strikeouts, and if each strikeout is given a run value of .25 runs, then the pitcher in 2018 will be credited with preventing 45 more runs, even though the higher strikeout total is in part a creation of the game that he is playing, in which every hitter is trying to hit a moonshot.
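Because strikeouts are counted against zero, every extra strikeout is credited at its full run value, so the gap between the two pitchers is simply the strikeout difference times the run value. A minimal sketch of that arithmetic, using the .25 runs per strikeout and the 240 and 60 strikeout totals from the example (the function name is illustrative):

```python
RUNS_PER_STRIKEOUT = 0.25  # run value per strikeout, as in the example


def strikeout_credit_gap(k_modern: int, k_old: int,
                         run_value: float = RUNS_PER_STRIKEOUT) -> float:
    """Extra runs credited to the higher-strikeout pitcher.

    With a baseline of zero, each strikeout counts in full, so the
    difference in credit is just the difference in strikeouts times
    the run value.
    """
    return (k_modern - k_old) * run_value


gap = strikeout_credit_gap(240, 60)  # both pitchers at 200 innings
print(gap)  # 45.0
```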
BUT. The 200-inning pitcher in 1928 might have given up 8-10 home runs all year. Bob Smith, 1928, pitched 244 innings and gave up 11 home runs; he was 13-17 with a 3.87 ERA. The comparable pitcher in 2018 might be Chase Anderson of Milwaukee, 3.93 ERA. He struck out 128 batters in 158 innings. In 1928 he would have had a higher strikeout rate than Lefty Grove, who led the American League in strikeouts.
But he also gave up 30 home runs in 158 innings. It’s maybe, what, 22, 23 homers more than Bob Smith, pro-rated per inning? If each home run has a run cost of 1.4 runs and Smith gets credit for NOT allowing home runs, they come out somewhere about even.
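The "22, 23 homers" figure comes from scaling Smith's 11 home runs in 244 innings to Anderson's 158 innings and subtracting from Anderson's 30. This sketch runs that pro-rating and applies the 1.4 runs per home run from the text; the innings are the rounded totals given above, and the names are illustrative.

```python
RUNS_PER_HOME_RUN = 1.4  # run cost per home run, as in the text


def prorated(count: float, innings: float, target_innings: float) -> float:
    """Scale a counting stat to a different number of innings."""
    return count * target_innings / innings


smith_hr = prorated(11, 244, 158)  # Bob Smith 1928, scaled to 158 innings
hr_gap = 30 - smith_hr             # Chase Anderson 2018: 30 HR in 158 innings
hr_runs = hr_gap * RUNS_PER_HOME_RUN

print(f"Smith at 158 IP: {smith_hr:.1f} HR; gap {hr_gap:.1f} HR; "
      f"about {hr_runs:.0f} runs")
```

The gap works out to about 22.9 home runs, or roughly 32 runs charged to Anderson over his 158 innings that Smith's line avoids; that is the credit set against the modern pitcher's strikeout advantage.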
At least, that’s the way I am going to TRY to make this work now. I’m still working on it; we’ll see how it goes.