
Randomness vs. Reason

January 11, 2023

Bill said: Have you considered possibly getting rid of user options within the framework, and relying instead on a system that represents your best estimate of the player's value?  

 

I consider it a feature, not a bug, that Reference and Fangraphs each implemented their own version of WAR without my involvement or blessing.  And when I create my own implementation of WAR for Baseball Savant, I'll make different choices.  

 

The key, for me, is that I'll be able to trace the value down to the single play level.  Someone will be able to ask "How much did Stanton's double on Aug 4, 2023 at Fenway, with a runner on 1B, in the 7th inning, down by 3 affect his value?", and I will be able to give a specific answer.  It may not be the answer that Bill James or Joe Posnanski or John Doe would want to give because you might each see that situation in a different way, but I would at least be able to state the plausible reason for it.  

 

 

Tom Tango

 

 

****************************************************

 

It's not a feature, Tom.  It's a bug.  It’s a gigantic bug.  It is a bug which might ultimately destroy your life’s work. 

 

Or not.  Maybe it’s a feature; who knows.

 

This probably seems like an odd place from which to launch the essay, but the 19th-century landscape architect Frederick Law Olmsted took the opposite position from yours.  Olmsted, for those of you who recognize the name but can't quite place it, is the king of landscape architects, the father of the field.  If you live in a city, no matter what city it is, it is likely that the largest and most prominent green space in your town was laid out by Olmsted—or if not Olmsted, at least by one of his acolytes.

 

Olmsted was endlessly frustrated by cities attempting to force things into the spaces he had created.  No sooner would Olmsted dedicate the park that he had designed than people would show up wanting to chip off a piece of it as a place for a flower garden, or a band shell, or a gazebo, or a petting zoo, or a playground, or an open-air theater, or God knows what.  This drove Olmsted absolutely bonkers.  "Suppose that an architect designed a beautiful building," he would say (not these words, but something like them).  "Suppose that an architect designed a fantastically beautiful building, but as soon as he had cut the ribbon, people lined up wanting to put a candy store in the lobby, or a small lending library, or a local version of the Hyde Park Speaker's Corner.  If this was the way it was, there would be no beautiful buildings." 

 

On the one hand I could see Olmsted's point; on the other hand, it was a fantastically arrogant position, as if Olmsted by his genius had claimed permanent dominion over a good-sized portion of every city center.  In the 120 years since his death, some cities have crowded his "emerald necklaces" with all manner of alternative uses, while others have more or less kept their hands off.  Many of the other uses are now among the blessings of the city, and some of them are not among the blessings of the city.  You can argue it either way.  As I see it, Tom has invited people to use his art museum as a public toilet. 

 

Going back to pre-sabermetrics, going back to the world in which I grew up, it was almost a universal assumption that there was little OR NO connection between statistical accomplishments by individual players and success by teams.  The general understanding, really the universal assumption, was that players had good individual statistics or poor ones, and that there was some loose connection between playing skills and player stats, evident in the case of a player like Willie Mays or Stan Musial, but not any reliable or definitive connection.  Statistics weren't about winning.  Statistics were one thing; winning games was a different thing.  Players had to be evaluated by scouting, reputation, and witness, with their statistics standing at the back of the room. 

 

This blind spot was created by about 6 or 8 supporting misunderstandings.   

 

1)  People (meaning the public, the media, and the professional community) had little or no understanding of park effects and no generalized, systematic references to park effects.  They often confused performances favored by the environment with outstanding performances.   Pete Palmer, more than anyone else, helped to improve understanding in this area.  

 

2)  People (professionals and media) had limited understanding about which statistics were important and which were not.  The importance of batting averages was greatly exaggerated--a point that Tom makes regularly--while on base percentages were a non-entity.   RBI and Wins by pitchers were given much more importance than was appropriate, as were stolen bases, while power and broad-based accomplishment were under-valued. 

 

3)  The role of random variance in the creation of statistics was--and still is today--vastly underrated.  Little progress has been made in this area, and the belief in Baseball Reference WAR and Fangraphs WAR has contributed to that lack of progress.  

 

4)  People universally or almost universally assumed that WALKS were a function of (were controlled by) pitchers, and that individual variations in walk rates merely reflected the choices of pitchers.  In reality the hitter has more to do with whether a walk occurs than the pitcher does. 

 

5)  The public did not understand that differences in team wins, while they were a direct result of individual player performances, were much greater than the differences in individual performances.  In other words, they did not realize that in order for Team A to win twice as many games as Team B, Team A did not need to be twice as good as Team B at anything, but merely 40% better, meaning 20% better on offense and 20% better in pitching and defense.  Therefore, small differences in offensive elements were of much more significance than they appeared to be.
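
The amplification Bill describes can be sketched with the Pythagorean expectation (my illustration, with an exponent of 2; the essay states only the conclusion).  A team 20% better than average on both offense and run prevention roughly doubles its win-loss odds, even though it is nowhere near "twice as good" at any single thing:

```python
def pyth_winpct(rs, ra, exponent=2.0):
    """Pythagorean expectation: expected winning percentage
    from runs scored (rs) and runs allowed (ra)."""
    return rs**exponent / (rs**exponent + ra**exponent)

average = pyth_winpct(750, 750)              # .500 baseline team
better = pyth_winpct(750 * 1.2, 750 / 1.2)   # 20% better offense AND defense

# Win-loss odds ratio for the improved club: (RS/RA)^2, about 2.07,
# versus 1.0 for the average club -- its odds of winning roughly double.
odds = (750 * 1.2 / (750 / 1.2)) ** 2
```

Under these assumed numbers, the improved club projects to about a .675 winning percentage (roughly 109 wins in 162 games) against the average club's 81, which is the sense in which small per-unit differences compound into large differences in the standings.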

 

6)  Clutch ability was 100% accepted by the entire baseball community until Dick Cramer raised the issue of whether it actually exists, I think in 1975, could have been 1974 or 1976.  Clutch ability was not only accepted, but was assumed to be of IMMENSE importance in winning games.   I think it is not an overstatement to say that media members in the 1960s believed that an actual ABILITY to perform in the clutch was the largest reason that teams won or lost.   Clutch performance explained RBI, it explained wins in close games, and it explained ultimate successes such as winning the World Series. 

 

Voros McCracken's realization that hits on balls in play were essentially a random outcome was the biggest thing that re-ordered the media's thinking on that issue.   

 

Just giving credit to those who helped the public to see the light. . .Dick, Voros, Pete, yourself, and others.   

 

But as a consequence of these "blinders", these misunderstandings and this misplaced emphasis, people did not see that there were reliable connections between individual stats and team wins.  They could and did believe, for example, that Ken Boyer in 1964 was more valuable than Willie Mays, that Dick Groat in 1960 was more valuable than Eddie Mathews or Hank Aaron or Willie Mays, that George Bell in 1987 was more valuable than Alan Trammell or Wade Boggs.   There was no method to systematically process individual accomplishments into value.   

 

 

What we essentially did, in the first generation of sabermetrics, was to establish that individual playing statistics DO translate into value, if you adjust for the park effects, if you focus on on-base percentage and power as well as batting average, if you understand that timing is more determined by random screens than by clutch ability, if you understand that the stolen base is essentially a break-even effort.  Essentially, what we did was to prove to the satisfaction of almost everyone that statistics CAN be translated into value, if you understand what you are looking at.  

 

A sighted person cannot imagine what it is like to be blind.  No one can understand what it is like not to know things that you have always known.  I wrote, in about 1978, that "the purpose of batting is not to have a high batting average, not to have good counting statistics in any category, but rather, to create runs.  A hitter should be evaluated by how many runs he creates."  This is now so obvious that it is impossible to imagine that people in the 1960s lacked this understanding.  It's just obvious to us, therefore it doesn't seem possible that we lived in a world in which this was not understood. 

 

But the dead horse that I am beating is this: that if you destroy the predictive connection between individual performance and value to the team, you destroy sabermetrics.  Blanking out or zero-valuing selected performance parameters blocks the predictive connection of those stats to wins and losses.  This draws us back toward the pre-sabermetric world in which people did not see the connection between performance and team wins.  It creates a blind spot, essentially like the blind spots that preceded sabermetrics.  

 

And there is every reason in the world to believe that if you facilitate that practice, it will grow over time.  In theory, it could grow until it entirely eviscerates our field.   Stated another way, if we endorse the work of people who place assumptions antithetical to the fundamental principles of sabermetrics inside of formulas used to calculate value, that negates the work of our field.  It states, in essence, that player performance does NOT reliably correlate to success for the team. This throws us back into the assumptions of the 1950s, the 1960s and before. 

 

If I understand correctly the logic of our friends on the other side of this issue, it is that that which cannot be shown to be a result of actual ability cannot be assumed to have value.   That which cannot be shown to be a result of actual ability must be assumed to have occurred at random, or by luck.   I would agree that that which cannot be shown to be a result of actual ability cannot be assumed to have (very much) value FOR THE FUTURE.  But when applied to what has happened in the past, this same assumption becomes a dagger aimed at the heart of sabermetrics. 

 

Why?

 

Because that same assumption can be applied to almost anything that happened in the past.  A player hits .225 with little power with the bases empty, but hits like Babe Ruth when there are runners on base.   As a result of this, his team wins an extra five or six games. 

 

Well, they want to say, that was just luck.  He can’t replicate that next year, so it was just luck, so we can’t attribute any additional value to that player.   We have to let those extra five or six wins go unexplained. 

 

Or this one.  A team scores 750 runs and allows 700, but because they do very well in 3-2 games and 4-3 games and 5-4 games, they win 95 games when they could be expected to win only 86.  Because they win 95 games, they are in position to compete in the World Series, which they win.

 

Another team also scores 750 runs and allows 700, but because they do poorly in 1-0 games and 2-1 games and 6-5 games, they win only 78 games, their Manager and General Manager get fired, and the team is overhauled for the next season. 
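
A quick simulation (mine, not from the essay) makes the point concrete: 750 runs scored against 700 allowed does project to about 86.6 Pythagorean wins, and pure game-to-game timing noise is large enough that both a 95-win and a 78-win season sit within about a standard deviation and a half of that expectation:

```python
import random
import statistics

random.seed(20230111)

# Pythagorean win probability for a 750-scored / 700-allowed team.
p = 750**2 / (750**2 + 700**2)   # ~.534, i.e. ~86.6 wins over 162 games

# Treat each game as an independent coin flip at probability p and
# play out many 162-game seasons with identical underlying talent.
seasons = [sum(random.random() < p for _ in range(162)) for _ in range(10000)]

mean_wins = statistics.mean(seasons)   # ~86.6
sd_wins = statistics.pstdev(seasons)   # ~6.3 wins of pure timing noise
```

Under this simple coin-flip model, 95 wins (+8.4) and 78 wins (-8.6) are each about 1.3 standard deviations from expectation, so the two hypothetical teams in the essay are comfortably within ordinary luck of one another.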

 

The WAR Baron of whom I speak wants to say that these two teams are really the same.  All value resides in runs scored and runs allowed, he wants to say.  Departures from runs scored/allowed in the "wins" column, he wants to say, cannot be replicated in other seasons, therefore they are just luck, therefore they should be ignored in assessing contributions to victory. 

 

But winning games is what sports are about.  My argument is simple:  a team which wins 95 games must be evaluated as if they had won 95 games.  A team that wins 78 games must be evaluated as if they had won 78 games.  Period. 

 

I acknowledge that in designing Win Shares, I was too literal about this, too absolute.  There IS such a thing as luck, and teams do win or lose, to an extent, because they are lucky. I should have allowed some room in the system for what must reasonably be represented as luck. 

 

But there are also real things in the universe of sports that are difficult to measure, so far impossible to measure.  Sometimes we can't measure things because they don't exist, and sometimes we can't measure things because they are just too difficult to measure.  The processes needed to measure them are too delicate for us, too precise.  There are real things that really exist, but which we can't measure by formulas and normal relationships.  I wrote about that in "Underestimating the Fog." 

 

To assert that everything we cannot presently measure must be just luck is arrogant.  But it also has two quite terrible consequences.  One is that it obstructs the future development of understanding.  Once we assert that everything we do not understand happened merely because of luck, then we have an explanation for all of that which we don’t understand, and therefore we are no longer searching for any better explanation.  We’ve given up. 

 

I just stumbled across a quote attributed to Albert Einstein that explains perfectly what I have been trying to say for 40 years:  "It is not that I am so smart, but that I stay with problems longer."  If you say that that which we do not understand is random, just luck, then you are saying that we should walk away from the problem.  I believe what we should say is not "this is just random."  What we should say is "there is something here which we do not fully understand."  Not fully understanding it, not being able to explain it in specific terms, we must allocate it to all of the players on the team in proportion to their playing time, or in proportion to their documented contributions to victory.  What we absolutely must NOT do is just say that it is all random and therefore meaningless. 

 

The second reason that this is terrible is that, once you begin to do it, it will spread without limit.  

 

Voros demonstrated that a pitcher giving up hits on balls in play is a random outcome, not a predictable outcome.  A pitcher who allows 700 balls in play over the course of a season may wind up allowing 200 hits on those balls in play, or he may end up allowing 250.  It’s just random.  
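
The scale of that randomness is easy to put a number on if you model each ball in play as an independent coin flip (my assumption for illustration; Voros's own studies were empirical, and the .300 rate here is a hypothetical league-ish figure):

```python
import math

n_bip, true_rate = 700, 0.300   # hypothetical pitcher: 700 BIP, .300 "true" BABIP

expected_hits = n_bip * true_rate                          # 210 hits
sd_hits = math.sqrt(n_bip * true_rate * (1 - true_rate))   # ~12.1 hits

# A +/-2 standard deviation season spans roughly 186 to 234 hits --
# a BABIP anywhere from about .266 to .334 -- on identical "skill".
```

Worth noting: on this model, 250 hits against a .300 true rate would be more than three standard deviations out, so coin-flip chance covers most, though not quite all, of the 200-to-250 spread described above.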

 

Well, if it is just random, why should we pay any attention to it?   The consequence of allowing 200 hits rather than 250 may be 1.50 runs per game on his ERA, but why pay any attention to that?  That’s not skill, is it?  It’s just luck.

 

And then the same applies to hitters.  Norm Cash had a .372 batting average on balls in play one season, .218 the next year, with his three true outcomes (strikeouts, walks and homers) being almost exactly the same in both seasons.   Why?  A corked bat would never explain that.   Why did it happen?  It’s just random.  Batting averages on balls in play are just random.  BABIP.

 

So those are out the window, too.  Why are we seeing value in players’ batting averages on balls in play?  Why are we attributing value to that?  It’s just random. 

 

And home runs?   Why, home run frequencies are just as subject to random variation as are batting averages on balls in play.  A player hits 45 homers one year, 27 the next.  So what?  It's just random.  We haven't documented the randomness in the same ways, but it's still there.   Every year he hits a certain number of long fly balls into the wind, and a certain number of long fly balls WITH the wind.  Every year he faces a different, and random, set of pitchers.   Every year those pitchers make mistakes at random.  It doesn't even out.  Logically, it CAN'T even out.   A player hits 45 homers, then 27, then 41, then 40, then 32.  It's just random.  Why should we pay any attention to those random variations? 

 

Home runs are called a "true" outcome, but only because the outcome is immediate and dispositive.  Otherwise, a home run is no more a true outcome than a single or a fly ball to left.  And strikeouts?  Strikeouts are prey to as many randomizing factors as anything else on a baseball field.   Mickey Mantle struck out 75 times in 623 plate appearances in 1957, 126 times in 639 plate appearances in 1959.  You think it's not random?  Of course it's random. 

 

I can’t prove this at the present time, but if you check, you’ll find that this is true.  A player gets 110 hits on 400 balls in play one year, 130 hits the next year; it’s just random.   He strikes out 110 times one year; he strikes out 130 the next year.   It’s just random.  If you study it, you will find that the up-and-down variations in strikeouts are on the same scale as the up-and-down variations in hits—yet we say that one of them is a "true" outcome, and the other is a random perturbation.  It makes no sense.   There is randomization in all of it. 
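
Bill's "same scale" claim can at least be sanity-checked with the same coin-flip model (the rates and sample sizes below are illustrative numbers of my choosing, not from the essay):

```python
import math

def binomial_sd(n, rate):
    """Std. dev. of successes in n independent trials at a fixed rate."""
    return math.sqrt(n * rate * (1 - rate))

# Hits on balls in play: 400 BIP at a .300 true rate -> expect 120, +/- ~9.2
sd_hits = binomial_sd(400, 0.300)

# Strikeouts: 600 PA at a 20% true strikeout rate -> expect 120, +/- ~9.8
sd_so = binomial_sd(600, 0.200)
```

On the same expected count of 120, the chance-driven spread of the "true" outcome (strikeouts) is of the same order as the chance-driven spread of hits on balls in play, which is the sense in which there is randomization in all of it.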

 

So once you say that THIS should be ignored because it is just a random perturbation, you are logically compelled to say that THAT too should be ignored because it is just a random perturbation, and this other thing, and that one as well.  Playing time is random.   This player steps into an empty job when he is 21 years old, starts out 17-for-50 (.340), holds a job for 15 years, gets 2,500 hits and goes into the Hall of Fame.  This other player doesn't get a shot until he is 23 because there happens to be no job open for him, breaks his ankle a week into his career, gets another shot the next year and, at random, starts out 9-for-50, and concludes his career with 14 hits and 1 homer.  And it is just random.  He is just as good as the other guy; it was just random.     

 

But what are you left with?  At the end of this process you will be left with nothing.  At the end of the process, by declaring things to be random, you will be left interpreting all successes and all failures as just something that happened.  The only evaluation left to you will be 21st century scouting—exit velocities and barrel percentages and running speed and spin rates and some sort of tracking percentage that we haven't quantified yet.   You'll have no explanation for why anybody won or why they lost, or what the value of any player's contributions was.   I know that you don't believe it yet, but that is where that road leads. 

 

And then, on the other hand, there is this road:

 

The key, for me, is that I'll be able to trace the value down to the single play level.  Someone will be able to ask "How much did Stanton's double on Aug 4, 2023 at Fenway, with a runner on 1B, in the 7th inning, down by 3 affect his value?", and I will be able to give a specific answer.

 

Well, that's great, if you can do that.   If you can do that, that's the exact opposite of the road that others have started down, the road of attributing nothing to anything except luck.  The problem I would foresee is the combination of two things: the vastly unequal weights given to essentially similar events (Mookie Betts and Bill Mazeroski, say) and the inherently de-stabilizing effects of the structure of WAR.  Between those two you're going to wind up with data that will look to everyone else as if it is unreasonable.  Two players in similar parks and with similar defense each hit .275 with 20 homers, but one of them will be evaluated at 8 WAR and the other at only 1 WAR, I think.  

 

But apart from that, I'm all with you in pursuing that goal.  I am with you, because inside this universe of randomness, there is something real that is happening, something that is predictable and unpredictable, something that is both understood and not understood.  What you have to do to gain insight into that world, and to take understanding from it, is to stay with the problem long enough.  

 
 

COMMENTS (30 Comments, most recent shown first)

TheRicemanCometh
Just a tidbit to the John Hiller comment. WPA lists him at 8.4 that year, which I believe is the highest ever for a relief pitcher. He led the majors (pitchers and hitters) in WPA that year.
3:03 PM Jan 19th
 
Anyone
I'm with Bill in that in evaluating a player's past/career accomplishments, performance should be the main factor rather than ability, including such aspects of performance as clutch performance that are, at the least, much more noise (which I prefer as another poster does to "random") than ability, if there is any ability involved at all.

I'd say also we need to be careful with that, though, to avoid giving players credit for having good teammates, which is why I really hate any use of pitcher wins. But if an average player hits like Babe Ruth in clutch situations a given year, then yes: That player should get extra credit, not when considering a future contract, but in evaluating his performance that season.

A couple of notes on BABIP, as applied, among other things, to Cash: Cash probably did something a lot better when he had the great BABIP than when he had an awful one. Voros asserted that pitchers have no impact on BABIP (which has evolved among almost everyone to mean pitchers have a lot less impact on BABIP than we thought, but not zero), though even in Voros' view it isn't random, as much of the difference is his defense's fielding.

But a batter's BABIP is largely ability and no one I know of claims that that part is mostly noise.
6:41 PM Jan 16th
 
raincheck
We are measuring two things. They should have two measurements.

1) What happened last year. I am strongly in the Bill camp here. Our knowledge is incredibly imperfect. Our measures should start with what happened and allocate it as intelligently as we can today to the players who made it happen. And we should work hard to get better at it next year.

2) What is likely to happen next year? Here Tom's thinking is helpful. Should I sign Norm Cash to a huge contract based on what he did last year? Should I dump Nolan Ryan because I can replace him with two 8-7 pitchers? This is where we should be trying to separate skill from luck. As best we can. And get better at it every year.

What happened happened. And our measurements of what happened need to reflect what happened. To do otherwise assumes knowledge we don’t have. But we can do our best to understand what is repeatable as well.
12:58 PM Jan 14th
 
mauimike
Here's what I think is interesting. Mr. James is becoming wise. I've always wanted to be wise and I think I'm getting there. Bill has been thinking about, trying to figure out, wondering about baseball for 50 or 60 years and he realizes that he might not know more now than when he started and think of all the work he's done. I don't understand all the numbers you boys use and I don't want to think as deeply as you do, but I like reading your conclusions and following your thought processes, but in the end, it doesn't mean much. Life is what it is and we'll never know enough to understand most of it. That's life. Do any of us understand women? But most of us have spent most of our lives with one and for most of us they are a necessary part of our lives, but do you know them? Maybe some, but not all, and we manage. We figure things out as best we can and call the rest luck, or f**k it I don't know. Wisdom is knowing how little we know. How little we'll ever know and to just be resigned to it. And most important, have a sense of humor about it. Because we ain't going to figure it all out. We don't have enough information and we don't know what we don't know. And that's Ok. It's fun to try, just don't take yourself too seriously. Nobody else does.
6:46 AM Jan 14th
 
FrankD
I like this debate. I would add that the discrepancy or even the 'failures' of any of the analyses described is not a flaw, it's an opportunity for improvement of the analyses. It is by studying error that science/understanding advances. Of course, we fight tooth-and-nail for our position; in academia your position is based upon your past work (as is true for some sabermetricians too), and an attack/correction on your past work is seen as an attack on your own edifice. But that is how knowledge advances ..... this debate here is extremely calm compared to the fights in academia.
11:41 PM Jan 13th
 
tangotiger
Well, that's a data limitation. If you notice, Fangraphs has their win probability go back only to 1974. So, 1973-and-earlier, Fangraphs is not availing themselves of whatever Leverage Index that Reference is aware of.

I'll let David Appelman know that maybe they should make an explicit note about that.
6:11 PM Jan 13th
 
docfordock
My concern is with things like the John Hiller problem. Fwar for 1973 is 2.8, Bref has him at 7.9. I understand the differing assumption involved and the arithmetic that leads to the differing answers. But both are attempting to measure essentially the same thing - Hiller's win contribution to his team based on run prevention, while attempting to adjust for relative contributions for defense. But the answers are radically different.

The different implementations are a positive in that they force assumptions to be stated explicitly and allow them to be contrasted in a clear and transparent way. That's a good thing - a feather in the WAR cap. But in this instance the effect is to highlight analytical difficulties in measuring the value contributions of pitchers and fielders and the magnitude of the potential effects.
5:09 PM Jan 13th
 
tangotiger
Well-said! I do find it interesting how folks might have a different view of something based on the decimals. He hits ONE HUNDRED POINTS higher in comparing .340 to .240 certainly sounds like a lot compared to TEN POINTS in comparing .34 to .24.

Then you get the "he can't hit his weight", etc. It's all so very silly. As Bull Durham reminded us, 50 points is one squibbler a week.
4:47 PM Jan 13th
 
 
hotstatrat
It's not the fault of WAR or even Batting Average that an insignificant extra decimal gives a false assurance of precision. It tells us how close the number is to the next highest decimal of significance. It's the interpreter's fault for putting any unmerited significance on that extra decimal.
4:33 PM Jan 13th
 
tangotiger
doc: I can easily suggest to everyone that they present WAR in steps of 0.5 wins, or 0.25 wins, and make that the uncertainty level.

Win Shares bypasses that by multiplying by 3, which is exactly the same as rounding in steps of 0.333 wins.

But, I'm not going to force or insist on it. One decimal place is fine for single season, and 0 decimals for multi-year. If this is the biggest problem that I have to handle, then that's fine.
1:59 PM Jan 13th
 
docfordock
It seems to me Bill's comment conflates two different criticisms: one about WAR being a framework with multiple possible implementations, and a second about the nature of uncertainty and measurement error impacting WAR regardless of the implementation.

I agree with tangotiger and jwilt about the usefulness of accommodating different implementations for the reasons they've given here and elsewhere.

But the uncertainty critique stands. With WAR calculating results to tenths of a win, there is an inherent false precision (one that is particularly glaring when addressing defense and pitching). And while proponents can fairly acknowledge the point and make the appropriate caveats, the reality of how WAR is actually invoked in context of MVP or HOF discussions, or in evaluating trades or contracts, undermines those caveats.

jwilt's critique that Win Shares does not seem to solve the uncertainty and measurement errors in an entirely satisfactory way is well taken. But it doesn't fix the problem for WAR either.
12:21 PM Jan 13th
 
Guy123
jwilt: I believe Win Shares allocates credit for unexplained wins (or losses) based on players' productivity (marginal runs created/saved), not based on playing time. So it will be assigned disproportionately to the best-performing players.

This approach may have some appeal to people in cases where teams were fortunate, though I don't see any reason to think that productive players are more fortunate than bad players. But it produces perverse results when a team wins fewer games than expected, punishing the most productive players for the team's ill fortune.

IF one wants to allocate these extra wins and losses, I think playing time would be a better approach.
11:57 AM Jan 13th
 
Rallymonkey5
I was curious as to the range of win probability added, so I did this test:

Players post-integration, at least 500 PA, with ops+ between 100 and 105 - average to slightly above average hitters.

Then sorted by WPA.

The best was Claudell Washington, 1982, at +4 wins. He was an average hitter overall but in high leverage hit 324/375/583.

The worst was Howie Kendrick, 2012, at -2.4 wpa. In high leverage the HK47 hit only 226/252/304.

So Bill is right, we could see a 6 win swing between players who on the surface look average.
11:51 AM Jan 13th
 
tangotiger
jwilt: Excellent, and I hope Bill reads your post. I've said that point many times, regarding the reallocation as well as the plug-and-play features of WAR, but your entire post is extremely well said.

Bill thinks it's a necessity that the sum of parts add up to the whole with regards to Wins. I think that ends up being a fool's errand unless you incorporate the timing properly, at which point you will come to the realization that it's going to introduce a lot of wonky results. This is a "be careful what you wish for" scenario.

Bill sidesteps all that by looking at things at the seasonal level, and not worrying about the specific games that lead to those "extra" wins. It LOOKS like all the wins add up, but it's not true. It's being forced in by ignoring individual games and letting everything add up and cancel out as much as possible.

And so, keeping the sum of parts to just RUNS is a lot more palatable. The "worst" case is you will find some 20 or 30 extra runs from Dave Parker in 1986, and that actually is a GOOD thing for the most part.
11:23 AM Jan 13th
 
jwilt
When you allocate credit for the unexplainable, for the luck, by playing time (like Win Shares does) you will probably get a reasonable result most of the time. But you will certainly get it wrong much of the time. You will give credit to the knowingly undeserving and vice versa.

Example, one of my favorite teams, the 2012 Orioles. They won 93 games, but outscored their opponents by just seven runs. Win Shares allocates the credit for the additional 11 or 12 wins proportionally by playing time. Adam Jones and JJ Hardy had around 700 PAs, they get the most credit.

But we know that if you look at a metric like WPA that credits and debits for situational performance, that almost all of the '12 Orioles' additional wins were due to a historically good bullpen (or bullpen management). The bullpen was +13.5 in WPA while the hitters zero, the starting pitchers -1. So we know that proportional allocation of "excess" wins here is incorrect. Almost all of that credit should go to the bullpen and/or Buck Showalter, but instead the bulk of it goes to players like Jones and Hardy.

I think this example illustrates that by allocating wins in this way we've done nothing to clear up who was really responsible. It's a choice, and one no better or more accurate than the one WAR makes.

A framework like WAR allows one to incorporate data like that when you have it. It allows one to easily bring in new information like Statcast OAA when that became available. Win Shares does not, and years after WPA and OAA were introduced there's no straightforward way to bring in those type of things that help clear the fog.

Also, with an open and easily understood framework like WAR there will always be competing flavors. Tom Tango could have a Tom Tango site where his flavor is featured, but bb-ref and Fangraphs will always have their own. In fact, if Win Shares were more easily accessible that would happen, too. Until recently there was a Baseball Gauge site that had its own implementation of WS that was largely from Bill James' book, but didn't match in all cases. Because Bill published how to calculate WS 20+ years ago and told the sabermetric community to have at it, the door was wide open for anyone to make their own flavor. It's just that the process is very labor intensive and time consuming, and almost no one did.
7:15 AM Jan 13th
 
hotstatrat
Well put, Evan. Very good suggestion, Frank.
2:31 PM Jan 12th
 
FrankD
I made a typo: "Wind is blowing in, don't start Kingman"

Also - using 'randomness' to stop analysis is very close to saying "It's God's will," and we know how that thinking stymies all inquiry.
1:44 PM Jan 12th
 
FrankD
Interesting paper. I would rather use the term noise or error, rather than randomness, for what is currently not attributable to some cause based on our present understanding. As Bill has stated, lumping this into the randomness bin (or luck bin) implies that we can never make any progress in understanding what we have declared random. If we think of what we can't currently attribute to some underlying cause as noise or error, then we admit that we can attack this area of ignorance and continually increase our understanding.

An increase of data should allow us to reduce the 'noise'. Using Bill's example of a hitter striking long flies either into the wind or with the wind is not, for the past, random. Although very tedious we could go back and get the wind conditions for games in the past and then better understand what happened. Call this day-to-day park effect. And maybe this information could be used for predictions like "Wind is blowing on, don't start Kingman"....
1:38 PM Jan 12th
 
evanecurb
Stats useful in forecasting performance and stats used to record performance should be two different groups, likely with some overlap.
9:58 AM Jan 12th
 
hotstatrat
Life is random. We are lucky to exist. Sports are a diversion from the frightening randomness of life. We create stories about our sports teams and heroes in order to cope - and statistics help us fans do that. Bill, Pete, Tom, Voros, and others have popped many of our reality bubbles created with our statistics, so we've created new ones - more sophisticated ones. The danger might be that they become, or already are, so sophisticated that only some fans will grasp them. Then, good or bad, we have people living in two different realities.

Do I have a point here? Not so much, but it felt important somehow to share this tangential thought. That's my story.

9:52 AM Jan 12th
 
Mongo1962
AJD600 said:
In Bill's point 5 above, he says that if a team needs to be 40% better, it can be 20% better on offense and 20% better on pitching/defense. I think that would make the team 20% better overall. Yes? No??

No, it adds up to approximately a 40% improvement.

If a team scores and allows 100 runs, a 20% improvement in both metrics would result in 120 runs scored (multiplying 100 runs by 1.2) and about 83 runs allowed (dividing 100 runs by 1.2). That's 120/83 ≈ 44.6% more runs scored than runs allowed (without rounding, exactly 1.2 × 1.2 = 1.44, or 44%).
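(A minimal sketch of that arithmetic in Python, using the 100-run baseline from the comment:)

```python
runs = 100.0
scored = runs * 1.2        # offense 20% better: 120 runs scored
allowed = runs / 1.2       # run prevention 20% better: ~83.3 runs allowed
ratio = scored / allowed   # 1.2 * 1.2 = 1.44

# The team now scores 44% more runs than it allows; the 44.6% figure
# quoted above comes from rounding the runs allowed down to 83.
```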
9:00 PM Jan 11th
 
wovenstrap
Oh yeah, I wanted to say something briefly here about Olmstead. I did a paper on him in college. I appreciate Bill's point but the thing about Olmstead is that he was responding to a European tradition of very very curated and sculpted and pruned and symmetrical practices of park design (think Versailles). Being a rude and rough Mer'kan dude he didn't like that and wanted to find a way to create spaces in urban environments where nature could run wild. He designed Central Park and Central Park was noteworthy for having a lot of just untended crap around the corners and stuff, off to the side.

So I think THIS was his complaint. He wasn't saying "oh no, they are ruining my beautiful precious jewel," he was saying "goddammit, can't they let the nature run wild a little bit more? Just let it run wild and DON'T put a goddamn jungle gym on it right away."
8:21 PM Jan 11th
 
wovenstrap
smithinger. I think Bill's point is that the second question is a good deal more abstract and therefore less meaningful. So at the start of the 1996 season you can come up with a more "objective" integer to represent Maddux's worth as a stand-in for his predicted performance but the 1996 season is going to come along with all of its concrete specificity and randomness and cancel it out anyway because then those 1996 wins and losses will matter more than anyone's guess as to what 1995 "meant."

Bill is here noting the tendency of sabermetricians to grasp after that "objective" value hanging out there in space, but in order to do so you have to adjust everything that actually happened out of existence, and you end up with a rating that doesn't describe anything at all. At least the games that happened are real, and we can say "well, that weak-hitting shortstop did hit those 10 big HRs that year; it doesn't fit our preconceptions of that player, but they led to an actual title, or 2nd-place finish, or hot week as the case may be." You have to be grounded in the actual things that happened until we have a perfect understanding of the laws of physics and sports, which we will never attain.
8:11 PM Jan 11th
 
AJD600
In Bill's point 5 above, he says that if a team needs to be 40% better, it can be 20% better on offense and 20% better on pitching/defense. I think that would make the team 20% better overall. Yes? No?​
7:49 PM Jan 11th
 
smithinger
if you destroy the predictive connection between individual performance and value to the team, you destroy sabermetrics.

I see uses for both. If you want to answer the question, "How much credit does Greg Maddux deserve for his contributions to the 90 wins achieved by the 1995 Atlanta Braves?", this is a problem of attribution. However much credit you apportion to each Atlanta Brave for his contribution in 1995, it must sum to 90 wins.

If you want to answer the question, "Given Greg Maddux's performance in the 1995 season, how much value would he add to a typical team the next season?", tying player performance to the 90-win number would potentially bias your results. If a team won a large percentage of one-run games, that's unlikely to happen again in the future, and you do not want to credit individuals for it.

The former question is helpful for deciding who should be MVP, who should be in HOF, etc. The latter question is helpful for estimating how much it is worth to have Greg Maddux join your team the next season.


6:19 PM Jan 11th
 
Gfletch
I think of the Albert Einstein quote...is that the one where his opinion was summarized by part of it, "I refuse to believe that God plays dice with the universe"?

I think there is a middle ground between the extremes. There is, without doubt, a certain amount of randomness to all things at a certain level. In sabermetrics, this is demonstrable because everything in baseball is different as time goes on. Mookie Betts himself is different from one plate appearance to the very next one. He is at least a few minutes older, the time of day is different, the weather is slightly different, he may be facing a different pitcher, or the same pitcher who is also a little bit different than he was the last time he faced Mookie Betts.

And then there are park effects, and years between games, and changes in those things and in situations and teammates, everything keeps changing.

But there is a certain amount of dependability, of persistence in who the player is from one moment or day or week or year. We can measure a great deal of that.

Of course we cannot stop people from creating their own versions of statistical formulas. I do see the problem as you have stated it, Bill: that a constant proliferation of versions of WAR will destroy the bridge between baseball scientists and baseball fans. I think it would be better if there were only one official version of WAR or any other formula, with the scientists coming to some agreement when there is a need for change.

Babe Ruth's career WAR (Baseball Reference version) is 162.7. I'm much more an average fan than a baseball scientist. I dislike the precision suggested by that number. I'd rather that number be rounded up or down (in deference to randomness) and expressed as WAR per 162 games, say. But if we have two respected sources that differ on that number, that's a problem for the average doofuss such as myself. Well, I don't think I'm a doofuss, but a doofuss wouldn't know if he was, would he? A doofuss, I mean.

Final thought - mercifully - is that we shouldn't want the scientists to huddle together in isolation, elitists disdainful of the public. Popularization of science is extremely important. Else the public will one day burn the sabermetric library of Alexandria to the ground. In that respect, I would prefer that we have one version of a WAR-type formula, and that any changes be the subject of discussion before they are made. How that can happen (and is it happening?) is certainly out of my hands.
5:17 PM Jan 11th
 
 
©2024 Be Jolly, Inc. All Rights Reserved. | Powered by Sports Info Solutions