Randomness Vs. Reason
Bill said: Have you considered possibly getting rid of user options within the framework, and relying instead on a system that represents your best estimate of the player's value?
I consider it a feature, not a bug, that Reference and Fangraphs each implemented their own version of WAR without my involvement or blessing. And when I create my own implementation of WAR for Baseball Savant, I'll make different choices.
The key, for me, is that I'll be able to trace the value down to the single play level. Someone will be able to ask "How much did Stanton's double on Aug 4, 2023 at Fenway, with a runner on 1B, in the 7th inning, down by 3 affect his value?", and I will be able to give a specific answer. It may not be the answer that Bill James or Joe Posnanski or John Doe would want to give because you might each see that situation in a different way, but I would at least be able to state the plausible reason for it.
Tom Tango
****************************************************
It's not a feature, Tom. It's a bug. It’s a gigantic bug. It is a bug which might ultimately destroy your life’s work.
Or not. Maybe it’s a feature; who knows.
This probably seems like an odd place from which to launch the essay, but the 19th century landscape architect Frederick Law Olmsted took the opposite position from yours. Olmsted, for those of you who recognize the name but can’t quite place it, is the king of landscape architects, the father of the field. If you live in a city, no matter what city it is, it is likely that the largest and most prominent green space in your town was laid out by Olmsted—or if not Olmsted, at least by one of his acolytes.
Olmsted was endlessly frustrated by cities attempting to force things into the spaces he had created. No sooner would Olmsted dedicate the park that he had designed than people would show up wanting to chip off a piece of it as a place for a flower garden, or a band shell, or a gazebo, or a petting zoo, or a playground, or an open-air theater, or God knows what. This drove Olmsted absolutely bonkers. "Suppose that an architect designed a beautiful building," he would say. . . .not these words but something like it. "Suppose that an architect designed a fantastically beautiful building, but as soon as he had cut the ribbon, people would line up wanting to put a candy store in the lobby, or a small lending library, or a local version of the Hyde Park Speaker’s Corner. If this was the way it was, there would be no beautiful buildings."
On the one hand I could see Olmsted’s point; on the other hand, it was a fantastically arrogant position, as if Olmsted by his genius had claimed permanent dominion over a good-sized portion of every city center. In the 120 years since his death, some cities have crowded his "emerald necklaces" with all manner of alternative uses, while others have more or less kept their hands off. Many of the other uses are now among the blessings of the city, and some of them are not among the blessings of the city. You can argue it either way. As I see it, Tom has invited people to use his art museum as a public toilet.
Going back to pre-sabermetrics, going back to the world in which I grew up, it was almost a universal assumption that there was little OR NO connection between statistical accomplishments by individual players and success by teams. The general understanding, really the universal assumption, was that players had good individual statistics or poor ones, and there was some loose connection between playing skills and player stats, evident in the case of a player like Willie Mays or Stan Musial, but not any reliable or definitive connection. Statistics weren’t about winning. Statistics were one thing; winning games was a different thing. Players had to be evaluated by scouting, reputation, and witness, with their statistics standing at the back of the room.
This blind spot was created by about 6 or 8 supporting misunderstandings.
1) People (meaning the public, the media, and the professional community) had little or no understanding of park effects and no generalized, systematic references to park effects. They often confused performances favored by the environment with outstanding performances. Pete Palmer, more than anyone else, helped to improve understanding in this area.
2) People (professionals and media) had limited understanding about which statistics were important and which were not. The importance of batting averages was greatly exaggerated--a point that Tom makes regularly--while on base percentages were a non-entity. RBI and Wins by pitchers were given much more importance than was appropriate, as were stolen bases, while power and broad-based accomplishment were under-valued.
3) The role of random variance in the creation of statistics was--and still is today--vastly underrated. Little progress has been made in this area, and the belief in Baseball Reference WAR and Fangraphs WAR has contributed to that lack of progress.
4) People universally or almost universally assumed that WALKS were a function of (were controlled by) pitchers, and that individual variations in walk rates reflected merely the choices of pitchers. In reality the hitter is a larger element in when a walk occurs than is the pitcher.
5) The public did not understand that differences in team wins, while they were a direct result of individual player performances, were much greater than the differences in individual performances. In other words, they did not realize that in order for Team A to win twice as many games as Team B, Team A did not need to be twice as good as Team B at anything, but merely 40% better, meaning 20% better on offense and 20% better in pitching and defense. Therefore, small differences in offensive elements were of much more significance than they appeared to be.
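This arithmetic can be sketched with the Pythagorean expectation, James’s own rule of thumb that a team’s win-loss ratio is roughly the square of its run ratio. The 700-run baseline below is an assumed round number for illustration, not a figure from the text:

```python
# Pythagorean rule of thumb: wins/losses ~ (runs scored / runs allowed)^2
def wl_ratio(runs_scored, runs_allowed):
    return (runs_scored / runs_allowed) ** 2

avg = 700                                  # assumed league-average run total
team_b = wl_ratio(avg, avg)                # 1.00 -- an average, 81-81 team
team_a = wl_ratio(avg * 1.2, avg / 1.2)    # 20% better both ways: 1.44^2

print(round(team_a / team_b, 2))  # ~2.07: the win-loss ratio roughly doubles
```

So a team only 20% better on each side of the ball ends up with about twice the win-loss ratio of an average team, which is the sense in which small differences in offensive elements mattered more than they appeared to.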
6) Clutch ability was 100% accepted by the entire baseball community until Dick Cramer raised the issue of whether it actually exists, I think in 1975, could have been 1974 or 1976. Clutch ability was not only accepted, but was assumed to be of IMMENSE importance in winning games. I think it is not an overstatement to say that media members in the 1960s believed that an actual ABILITY to perform in the clutch was the largest reason that teams won or lost. Clutch performance explained RBI, it explained wins in close games, and it explained ultimate successes such as winning the World Series.
Voros McCracken's realization that hits on balls in play were essentially a random outcome was the biggest thing that re-ordered the media's thinking on that issue.
Just giving credit to those who helped the public to see the light. . .Dick, Voros, Pete, yourself, and others.
But as a consequence of these "blinders", these misunderstandings and this misplaced emphasis, people did not see that there were reliable connections between individual stats and team wins. They could and did believe, for example, that Ken Boyer in 1964 was more valuable than Willie Mays, that Dick Groat in 1960 was more valuable than Eddie Mathews or Hank Aaron or Willie Mays, that George Bell in 1987 was more valuable than Alan Trammell or Wade Boggs. There was no method to systematically process individual accomplishments into value.
What we essentially did, in the first generation of sabermetrics, was to establish that individual playing statistics DO translate into value, if you adjust for park effects, if you focus on on-base percentage and power as well as batting average, if you understand that timing is more a matter of random variation than of clutch ability, if you understand that the stolen base is essentially a break-even effort. Essentially, what we did was to prove to the satisfaction of almost everyone that statistics CAN be translated into value, if you understand what you are looking at.
A sighted person cannot imagine what it is like to be blind. No one can understand what it is like not to know things that you have always known. I wrote, in about 1978, that "the purpose of batting is not to have a high batting average, not to have good counting statistics in any category, but rather, to create runs. A hitter should be evaluated by how many runs he creates." This is now so obvious that it is impossible to imagine that people in the 1960s lacked this understanding. It’s just obvious to us, therefore it doesn’t seem possible that we lived in a world in which this was not understood.
But the dead horse that I am beating is this: that if you destroy the predictive connection between individual performance and value to the team, you destroy sabermetrics. Blanking out or zero-valuing selected performance parameters blocks the predictive connection of those stats to wins and losses. This draws us back toward the pre-sabermetric world in which people did not see the connection between performance and team wins. It creates a blind spot, essentially like the blind spots that preceded sabermetrics.
And there is every reason in the world to believe that if you facilitate that practice, it will grow over time. In theory, it could grow until it entirely eviscerates our field. Stated another way, if we endorse the work of people who place assumptions antithetical to the fundamental principles of sabermetrics inside of formulas used to calculate value, that negates the work of our field. It states, in essence, that player performance does NOT reliably correlate to success for the team. This throws us back into the assumptions of the 1950s, the 1960s and before.
If I understand correctly the logic of our friends on the other side of this issue, it is that that which cannot be shown to be a result of actual ability cannot be assumed to have value. That which cannot be shown to be a result of actual ability must be assumed to have occurred at random, or by luck. I would agree that that which cannot be shown to be a result of actual ability cannot be assumed to have (very much) value FOR THE FUTURE. But when applied to what has happened in the past, this same assumption becomes a dagger aimed at the heart of sabermetrics.
Why?
Because that same assumption can be applied to almost anything that happened in the past. A player hits .225 with little power with the bases empty, but hits like Babe Ruth when there are runners on base. As a result of this, his team wins an extra five or six games.
Well, they want to say, that was just luck. He can’t replicate that next year, so it was just luck, so we can’t attribute any additional value to that player. We have to let those extra five or six wins go unexplained.
Or this one. A team scores 750 runs and allows 700, but because they do very well in 3-2 games and 4-3 games and 5-4 games, they win 95 games when they could be expected to win only 86. Because they win 95 games, they are in position to compete in the World Series, which they win.
Another team also scores 750 runs and allows 700, but because they do poorly in 1-0 games and 2-1 games and 6-5 games, they win only 78 games, their Manager and General Manager get fired, and the team is overhauled for the next season.
The WAR Baron of whom I speak wants to say that these two teams are really the same. All value resides in runs scored and runs allowed, he wants to say. Departures from runs scored/allowed in the "wins" column, he wants to say, cannot be replicated in other seasons, therefore they are just luck, therefore they should be ignored in assessing contributions to victory.
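The 86-win figure in this example is what the Pythagorean expectation produces for 750 runs scored and 700 allowed. A quick check, using 1.83 as the exponent (a commonly used refinement of the original exponent of 2):

```python
# Pythagorean expectation: expected win pct = RS^x / (RS^x + RA^x)
def expected_wins(runs_scored, runs_allowed, games=162, x=1.83):
    return games * runs_scored**x / (runs_scored**x + runs_allowed**x)

print(round(expected_wins(750, 700)))  # 86 -- both example teams "should" win 86
```

By this model the two teams are identical, which is exactly the position being criticized here: the 95-win team and the 78-win team depart from the same expectation by +9 and -8 games.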
But winning games is what sports are about. My argument is simple: a team which wins 95 games must be evaluated as if they had won 95 games. A team that wins 78 games must be evaluated as if they had won 78 games. Period.
I acknowledge that in designing Win Shares, I was too literal about this, too absolute. There IS such a thing as luck, and teams do win or lose, to an extent, because they are lucky. I should have allowed some room in the system for what must reasonably be represented as luck.
But there are also real things in the universe of sports that are difficult to measure, so far impossible to measure. Sometimes we can’t measure things because they don’t exist, and sometimes we can’t measure things because they are just too difficult to measure. The processes needed to measure them are too delicate for us, too precise. There are real things that really exist, but which we can’t measure by formulas and normal relationships. I wrote about that in "underestimating the fog."
To assert that everything we cannot presently measure must be just luck is arrogant. But it also has two quite terrible consequences. One is that it obstructs the future development of understanding. Once we assert that everything we do not understand happened merely because of luck, then we have an explanation for all of that which we don’t understand, and therefore we are no longer searching for any better explanation. We’ve given up.
I just stumbled across a quote from Albert Einstein that explains perfectly what I have been trying to say for 40 years: It is not that I am so smart, but that I stay with problems longer. If you say that that which we do not understand is random, just luck, then you are saying that we should walk away from this problem. I believe what we should say is not "this is just random." What we should say is that "there is something here which we do not fully understand." Not fully understanding it, not being able to explain it in specific terms, we must allocate it to all of the players on the team proportional to their playing time, or proportional to their documented contributions to victory. What we absolutely must NOT do is just to say that it is all random and therefore meaningless.
The second reason that it is terrible to do this is that, once you begin to do that, it will spread without limit.
Voros demonstrated that a pitcher giving up hits on balls in play is a random outcome, not a predictable outcome. A pitcher who allows 700 balls in play over the course of a season may wind up allowing 200 hits on those balls in play, or he may end up allowing 250. It’s just random.
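The 200-to-250 spread is, in fact, about what pure chance would produce. A binomial sketch, with 0.32 as an assumed "true" hit rate on balls in play (all numbers here are illustrative):

```python
import math

# If the true hit rate on balls in play is fixed, hits allowed are ~binomial
bip, p = 700, 0.32
mean = bip * p                          # 224 expected hits
sd = math.sqrt(bip * p * (1 - p))       # ~12.3 hits

low, high = mean - 2 * sd, mean + 2 * sd
print(round(low), round(high))  # 199 249 -- two sigma spans roughly 200-250
```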
Well, if it is just random, why should we pay any attention to it? The consequence of allowing 200 hits rather than 250 may be 1.50 runs per game on his ERA, but why pay any attention to that? That’s not skill, is it? It’s just luck.
And then the same applies to hitters. Norm Cash had a .372 batting average on balls in play one season, .218 the next year, with his three true outcomes (strikeouts, walks and homers) being almost exactly the same in both seasons. Why? A corked bat would never explain that. Why did it happen? It’s just random. Batting averages on balls in play (BABIP) are just random.
So those are out the window, too. Why are we seeing value in players’ batting averages on balls in play? Why are we attributing value to that? It’s just random.
And home runs? Why, home run frequencies are just as subject to random variation as are batting averages on balls in play. A player hits 45 homers one year, 27 the next. So what? It’s just random. We haven’t documented the randomness in the same ways, but that’s still true. Every year he hits a certain number of long fly balls into the wind, and a certain number of long fly balls WITH the wind. Every year he faces a different, and random, set of pitchers. Every year those pitchers make mistakes at random. It doesn’t even out. Logically, it CAN’T even out. A player hits 45 homers, then 27, then 41, then 40, then 32. It’s just random. Why should we pay any attention to those random variations?
Home Runs are called a "true" outcome, but only because a home run is an immediate outcome, a dispositive outcome. Otherwise, it is no more a true outcome than is a single or a fly ball to left. And strikeouts? Strikeouts are prey to as many randomizing factors as anything on a baseball field. Mickey Mantle struck out 75 times in 623 plate appearances in 1957, 126 times in 639 plate appearances in 1959. You think it’s not random? Of course it’s random.
I can’t prove this at the present time, but if you check, you’ll find that this is true. A player gets 110 hits on 400 balls in play one year, 130 hits the next year; it’s just random. He strikes out 110 times one year; he strikes out 130 the next year. It’s just random. If you study it, you will find that the up-and-down variations in strikeouts are on the same scale as the up-and-down variations in hits—yet we say that one of them is a "true" outcome, and the other is a random perturbation. It makes no sense. There is randomization in all of it.
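The scale claim can be sanity-checked with binomial standard deviations. The rates and totals below are assumed round numbers, and this models chance variation only (real year-to-year swings also include genuine changes in skill):

```python
import math

def binomial_sd(n, p):
    # standard deviation of a count of successes in n independent trials
    return math.sqrt(n * p * (1 - p))

sd_hits = binomial_sd(400, 0.29)  # hits on 400 balls in play, ~.290 BABIP
sd_ks   = binomial_sd(600, 0.18)  # strikeouts in 600 PA, 18% K rate

print(round(sd_hits, 1), round(sd_ks, 1))  # 9.1 9.4 -- the same scale
```

Under pure chance, season totals of hits on balls in play and season totals of strikeouts wobble by roughly the same nine or so either way, which is the point: one is labeled a "true" outcome and the other a random perturbation, yet their random components are comparable.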
So once you say that THIS should be ignored because it is just a random perturbation, you are logically compelled to say that THAT too should be ignored because it is just a random perturbation, and this other thing, and that one as well. Playing time is random. This player steps into an empty job when he is 21 years old, starts out 17-for-50 (.340), holds a job for 15 years and gets 2500 hits and goes into the Hall of Fame. This other player doesn’t get a shot until he is 23 because there happens to be no job open for him, breaks his ankle a week into his career, gets another shot the next year and, at random, starts out 9-for-50, and concludes his career with 14 hits and 1 homer, and it is just random. He is just as good as the other guy; it was just random.
But what are you left with? At the end of this process you will be left with nothing. At the end of the process, by declaring things to be random, you will be left interpreting all successes and all failures as just something that happened. The only evaluation left to you will be 21st century scouting—exit velocities and barrel percentages and running speed and spin rates and some sort of tracking percentage that we haven’t quantified yet. You’ll have no explanation for why anybody won or why they lost, or what the value of any player’s contributions was. I know that you don’t believe it yet, but that is where that road leads.
And then, on the other hand, there is this road:
The key, for me, is that I'll be able to trace the value down to the single play level. Someone will be able to ask "How much did Stanton's double on Aug 4, 2023 at Fenway, with a runner on 1B, in the 7th inning, down by 3 affect his value?", and I will be able to give a specific answer.
Well that’s great, if you can do that. If you can do that, that’s the exact opposite of the road that others have started down, the road of attributing nothing to anything because it was all just luck. The problem I would foresee is the combination of the vastly unequal weights given to essentially similar events (Mookie Betts and Bill Mazeroski, say) and the inherently de-stabilizing effects of the structure of WAR. Between those two you’re going to wind up with data that will look to everyone else as if it is unreasonable. Two players in similar parks and with similar defense, each of them hits .275 with 20 homers, but one of them will be evaluated with 8 WAR and the other only with 1 WAR, I think.
But apart from that, I’m all with you in pursuing that goal. I am with you, because inside this universe of randomness, there is something real that is happening, something that is both predictable and unpredictable, something that is both understood and not understood. What you have to do to gain insight into that world, and to take understanding from it, is to stay with the problem long enough.