Pause
OK, we have completed the first two phases of our project here: (1) finding the "floor", the level of zero competence, in each of the 11 specified areas, and (2) estimating how many runs are saved, historically, by each team in each of those areas, compared to the floor. Floor, base, misery level, zero-competence level... those are interchangeable terms here. We have staggered drunkenly to the end of the second phase, more or less.
I am going to make two changes to what I have done so far, or one change twice. I am going to change the zero competence level for strikeouts and double plays from three standard deviations below the norm to four standard deviations below the norm. There are two reasons for doing this, which I suspect are pretty obvious, but I’ll spell them out, anyway:
1) The standard for everything other than strikeouts and double plays is four standard deviations below the norm, so those two categories were out of line with the rest of the system. There was a REASON why they were out of line, there was a theory behind it, but still, it is easier to explain the system and easier to justify the system if you can just say that the zero-competence line is 4 standard deviations below the norm on the team level, rather than saying that it is 4 standard deviations below the norm except when it isn’t, and then getting into a long-winded explanation about zero-bounded and unbounded categories.
2) Our estimate for all 11 categories was 13% below the target. Increasing the percentage of strikeouts and double plays that we give credit for brings us closer to the target of 1.78 million runs saved for all teams.
Changing from -3SD to -4SD for strikeouts increases the ERP (Estimated Runs Prevented) by strikeouts from 248,355 to 331,131, an increase of 82,776. (Occasionally I like to make up acronyms, even though I know I won’t use them.) Changing from -3SD to -4SD for double plays increases the ERP by double plays from 62,176 to 82,877, an increase of 20,701.
Also, while doing this, I realized that I had failed to include the Runs Prevented by Balk Avoidance in my running total; that’s another 5,705 runs. These three adjustments (two changes and one correction) bring the total of estimated Runs Prevented to 1,657,784, or 93% of the target number.
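To make the floor mechanism concrete, here is a minimal sketch of crediting runs prevented against a zero-competence level set k standard deviations below the league norm. The team totals and the runs-per-strikeout value below are invented for illustration; they are not the article's actual data or run values.

```python
# Illustrative sketch: runs prevented by one category, credited relative
# to a zero-competence floor set k standard deviations below the norm.
from statistics import mean, stdev

def runs_prevented(team_values, runs_per_unit, k=4.0):
    """Credit each team for its distance above a floor of mean - k*SD.

    team_values: per-team totals in some category (e.g. strikeouts).
    runs_per_unit: assumed run value of one unit of the category.
    """
    floor = mean(team_values) - k * stdev(team_values)
    return [(v - floor) * runs_per_unit for v in team_values]

# Hypothetical strikeout totals for a five-team league:
so = [1200, 1350, 1100, 1500, 1275]
rp3 = sum(runs_prevented(so, runs_per_unit=0.1, k=3))
rp4 = sum(runs_prevented(so, runs_per_unit=0.1, k=4))
# Moving the floor from -3 SD to -4 SD lowers the floor by one SD,
# so every team gains one SD's worth of credit, and the league total
# grows by (number of teams) * SD * runs_per_unit.
```

This is why the -3SD-to--4SD change raises the category totals across the board: it moves the baseline down, not any team's performance up.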
Of course, we COULD make the calculations match the target perfectly by going to 4.07 standard deviations below the norm or some other number, but that would be silly, because our values at this point aren’t THAT good. I mean, our values must be more or less right, or it’s unlikely we would come as close as we did to hitting our target. But there are techniques that can be and will be applied to fine-tune the category values for this system, and we would have to apply them even if we were exactly on target, which would just throw us off target again; then we’d have to recalculate everything anyway.
The runs saved by each of the 11 elements of run prevention, historically, are as follows:
Category                      Runs Saved (in Thousands)
----------------------------  -------------------------
DER                           459
Strikeouts                    331
HR Avoidance                  325
Control                       209
Fielding Percentage           136
Double Plays                   83
Stolen Base Defense            49
Hit Batsmen Avoidance          33
Wild Pitch Avoidance           17
Passed Ball Avoidance          10
Balk Avoidance                  6
We’re off by 7% on the gross weight of the whole. Figured team by team, we have an average error of 123 runs, or 18%, and a standard error of 155, or 22%. (The standard error is based on the squares of the individual errors; it says, in effect, that being off by 20 runs on a team is four times worse than being off by 10 runs. The standard error cannot be less than the average error, and is usually more.)
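The two error measures described above can be sketched in a few lines. The per-team errors below are made up for illustration; the real 123-run and 155-run figures come from the author's dataset, which is not shown here.

```python
# Average error (mean absolute error) vs. standard error (root mean
# square error), on hypothetical team-level errors.
from math import sqrt

errors = [120, -90, 160, -200, 75, -140]  # hypothetical runs-prevented errors

average_error = sum(abs(e) for e in errors) / len(errors)
standard_error = sqrt(sum(e * e for e in errors) / len(errors))

# The squaring step makes a 20-run miss count four times as heavily as a
# 10-run miss, which is why the standard error can never be smaller than
# the average error.
```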
Anyway, this will be the starting point (tomorrow) for the third phase of the project, which is the reconciliation stage: trying to understand where you’ve gone wrong, and trying to make things add up better. 18 to 22% isn’t bad; I expected to be off by more than that, probably 25 to 40%. Plus I have a pretty good understanding of WHY I am wrong, or where I am wrong, so we should be able to move forward.
For whatever help it may give you in understanding what we are doing, this is the Run Prevention Chart for the 2016 Chicago Cubs:
2016 Chicago Cubs (103-58)

Runs Prevented By:
Strikeouts                    236
Control                        58
Home Run Avoidance            162
Hit Batsmen Avoidance          11
Wild Pitch Avoidance            4
Balk Avoidance                  2
Fielding Range (DER)          251
Fielding Consistency (F Pct)   34
Double Plays                   31
Stolen Base Control             9
Passed Ball Avoidance           3

Sum of the Above              801
Actual Runs Prevented         802
Error/Discrepancy               1
The 2016 Cubs, the team that broke the Billy Goat’s heart, are a team for which the system happens to get about the right answer, which at this point is just coincidence. The Cubs played in a context in which an average team, based on long-established sabermetric methods, could have been expected to score and allow 679 runs. They won big because they scored 808 runs, but also because they prevented 802 runs. They are more or less a 50/50 team; their pitching and defense were about as strong as their offense.
Among the things you can do with this information, once we really get the system working, is compare two teams in ways that we could not otherwise compare them. Take two teams competing in the World Series: you can analyze and compare their defensive performance in ways that were not possible before; well, not NOW now, but when we get this jalopy running. We’ll be able to say that one team’s defense saved 78 more runs than the other’s in these six areas, but the other team’s defense saved 106 more runs in the remaining five areas. We’ll be able to distinguish between pitching and defense in ways that we can’t now. My mind is roiling with like a million things we can do with this data, most of which, of course, won’t turn out to be half as interesting in fact as they seem like they might be in theory.
But the first purpose, of course, is to create a platform of equality between hitters and fielders. Comparing a slugging first baseman to a defender (Boog Powell to Mark Belanger, or Howie Kendrick to Yan Gomes, let’s say), we are equipping some future Yan Gomes to go to his arbitration hearing and say that yes, Howie Kendrick created 58 runs and I created only 39 in similar playing time, but Kendrick prevented only 14 runs, and I prevented 41... or whatever the data shows.
Or this. We are equipping some future GM, in contemplating a trade, to look at the record of a 28-year-old free agent shortstop, and say "OK, he saved 46 runs in 2031, 47 in 2032, 53 in 2033, 45 last year, which was 2034. Our shortstops the last four years have saved 38, 40, 36, and last year just 15, so signing this guy is going to save us 10-15 runs a year."
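The back-of-the-envelope arithmetic in that hypothetical works out as follows; all the runs-saved figures are the imagined ones from the paragraph above, not real data.

```python
# Hypothetical runs saved, 2031-2034, from the GM scenario above.
free_agent = [46, 47, 53, 45]     # the 28-year-old shortstop
our_shortstops = [38, 40, 36, 15]  # the team's own shortstops

gap = (sum(free_agent) / len(free_agent)
       - sum(our_shortstops) / len(our_shortstops))
# The four-year averages differ by about 15.5 runs per year, which is
# roughly where the "10-15 runs a year" estimate comes from; discounting
# for aging or regression would pull it down into that range.
```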
Of course, there are methods now that enable us to evaluate fielders, but here is my real point: it has always bothered me about fielder evaluations that they are so difficult to cross-check with alternative approaches.
Runs created methods are based on objective data, and can be cross-checked in many, many different ways. You can see that teams do in fact score the number of runs that our methods predict they should score. There is great internal consistency in them, and you can predict how many runs a team would lose if this player were injured or traded or out of the lineup. You can judge how well those methods work.
Or let me explain it this way. My methods of approaching Runs are one way; Pete Palmer’s were radically different—and yet they converge on shared conclusions.
Runs Saved +/- estimates are no doubt accurate and reliable to a good extent; I am not suggesting that they are not. What I am saying is that I wish there were better ways to check. SOMETIMES I believe what the Runs Saved systems say about a defender, and sometimes I don’t. When I don’t, I’m limited as to what I can say on the other side.
This, I am hoping, will be a way to construct that second look at the issue—often reaching the same conclusions, no doubt, but if not that, then constructing a way to challenge those conclusions. That is my main purpose here.