Judge and Altuve

By Bill James

November 17, 2017

2017-59

The bedrock assumption upon which all sabermetrics is founded is that the importance of each statistical accomplishment depends upon its connection to wins and losses. It was a belief of sportswriters and baseball professionals, in the pre-analytical era, that individual player statistics could be dismissed because they had little to do with wins and losses. The connection between individual player statistics and wins and losses was not well understood, in 1970, by any of us. In 1974 the Oakland A’s hit just .247, second-lowest average in the American League, but the team won the World Series and was third in the league in runs scored. It was easy, at that time, for people to use statistical anomalies like that to dismiss the significance of individual batting statistics. See here; here’s a team that was just about the worst-hitting team in the league, but they won the World Series. Batting stats don’t mean nothin’.

Without valid statistical analysis, they could make any argument that they wanted to make. RBI are the game’s most important stat. The stolen base is the key to the modern offense. Walks are things that the pitcher does, not things that the batter does. The sacrifice bunt is a great play. Pitchers can be evaluated by won-lost records. Johnny Bench is not an all-time great catcher because he never hit .300. One argument was as good as another.

Modern analysis, sabermetrics, whatever you want to call it. . . .we overcame that kind of thinking by making two critical assumptions: that each statistical accomplishment acquires its significance by its connection to wins and losses, and that every statistic must be looked at in the context of its outside influences. The most critical assumption was the first one, that each statistic acquires its importance by its connection to wins and losses. When we were moving out of the primordial soup, that was the first and most critical step.

We come, then, to the present moment, at which some of my friends and colleagues wish to argue that Aaron Judge is basically even with Jose Altuve, and might reasonably have been the Most Valuable Player. It’s nonsense. Aaron Judge was nowhere near as valuable as Jose Altuve. Why? Because he didn’t do nearly as much to win games for his team as Altuve did. It is NOT close. The belief that it is close is fueled by bad statistical analysis—not as bad as the 1974 statistical analysis, I grant, but flawed nonetheless. It is based essentially on a misleading statistic, which is WAR. Baseball-Reference WAR shows the little guy at 8.3, and the big guy at 8.1. But in reality, they are nowhere near that close. I am not saying that WAR is a bad statistic or a useless statistic, but it is not a perfect statistic, and in this particular case it is just dead wrong. It is dead wrong because the creators of that statistic have severed the connection between performance statistics and wins, thus undermining their analysis.

Look, there is a general relationship between runs and wins, a normal relationship, and there is a specific relationship, based on this specific player and this specific team. If you evaluate Altuve and Judge by the general and normal relationship of runs to wins, then it appears that Judge is almost even with Altuve. But if you evaluate them by the specific relationship of Altuve’s runs to the Astros’ wins and Judge’s runs to the Yankees’ wins, then Altuve moves up and Judge moves down, and a significant gap opens up between them, large enough, in fact, that Judge drops out of the #2 spot, dropping behind Eric Hosmer of Kansas City.

The first indication that there is a problem with applying the normal and general relationship is this. The Yankees, by the normal and general relationship, should have won 102 games, when in fact they won only 91. That’s a BIG gap. The Yankees played poorly in one-run games (18-26) and other close games, which is why they fell short of their expected wins. I am getting ahead of my argument in making this statement now, but it is not right to give the Yankee players credit for winning 102 games when in fact they won only 91 games. To give the Yankee players credit for winning 102 games when in fact they won only 91 games is what we would call an "error". It is not a "choice"; it is not an "option". It is an error.
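One common way to express the "normal and general relationship" of runs to wins is the Pythagorean expectation. The sketch below assumes that formula with an exponent of 2, which the article does not specify. The run totals (858 scored and 660 allowed for the Yankees, 896 and 700 for the Astros) are not from the article either; they are supplied here for illustration and should be checked against an official source.

```python
def pythagorean_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins from runs scored and allowed (Pythagorean expectation)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)

# Run totals below are assumptions for illustration, not figures from the article.
yankees_expected = pythagorean_wins(858, 660)
astros_expected = pythagorean_wins(896, 700)

print(round(yankees_expected), "expected Yankee wins vs. 91 actual")
print(round(astros_expected), "expected Astro wins vs. 101 actual")
```

Under those assumptions the Yankees land about 11 wins short of their expectation, while the Astros land right on theirs.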

When you express Judge’s RUNS. . .his run contributions. . . when you express his runs as a number of wins, you have to adjust for the fact that there are only 91 wins there, when there should be 102. (The Astros should have won 101 games and did win 101 games, so that’s not an issue with Altuve.) But back to the Yankees, one way to do that is to say that the Yankee win contributions, rather than being allowed to add up to 102, must add up to 91. That’s a good way to do it, and, of course, if you do that, it reduces Judge’s win contribution by 11%. Using WAR, it reduces his win contribution by MORE THAN 11%, because the replacement level remains the same while his win contribution diminishes, so the wins ABOVE THE REPLACEMENT LEVEL are decreased by more like 16%. Judge drops from 8.1 WAR to 6.8.
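The arithmetic behind that drop can be sketched as follows. The article does not state Judge's replacement-level wins, so the hypothetical helper `implied_replacement_wins` backs that figure out from the 8.1 and 6.8 values given above. Treat the result (roughly 4 wins) as an inference from the article's own numbers, not an official Baseball-Reference figure.

```python
def implied_replacement_wins(war_before, war_after, expected_wins, actual_wins):
    """Back out the replacement-level wins implied by a stated WAR drop."""
    shrink = 1 - actual_wins / expected_wins         # share of win credit removed
    total_wins = (war_before - war_after) / shrink   # player's total win credit
    return total_wins - war_before

def rescale_war(war, replacement_wins, expected_wins, actual_wins):
    """Scale total win credit to actual team wins, then re-subtract replacement."""
    total_wins = war + replacement_wins
    return total_wins * actual_wins / expected_wins - replacement_wins

# 8.1, 6.8, 102 and 91 come from the article; the replacement figure is inferred.
replacement = implied_replacement_wins(8.1, 6.8, 102, 91)
adjusted = rescale_war(8.1, replacement, 102, 91)

print(f"cut in total win credit: {1 - 91 / 102:.1%}")
print(f"implied replacement-level wins: {replacement:.2f}")
print(f"adjusted WAR: {adjusted:.1f}, a drop of {(8.1 - adjusted) / 8.1:.0%}")
```

The point of the sketch is the asymmetry: the 11% cut applies to the whole win contribution, but the replacement level is subtracted afterward at full size, so the wins above replacement shrink by a larger percentage.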

The potential problem with this approach is that it holds each of the Yankee players responsible for the shortage of expected wins in proportion to his runs created. It could be that it wasn’t Judge who was responsible for this shortfall, but Jacoby Ellsbury or Gary Sanchez or Chase Headley.

But when you look at Aaron Judge’s situational data, it quickly becomes apparent that Judge is not only proportionally responsible for the Yankees’ poor performance in close games, but that he is more than proportionally responsible. He is disproportionately responsible for the Yankees’ poor performance in close games. And, while Judge’s situational stats vary from poor to terrible, Altuve’s vary from solid to sensational:

With runners in scoring position, Judge hit .262, 22 points less than his overall average.

Judge hit slightly better with the bases empty than with runners on base, his OPS being 90 points higher with the bases empty. Altuve’s OPS was 1 point higher with men on base than with the bases empty.

In the late innings of close games (100 plate appearances), Judge hit .216 with a .780 OPS. But when the Yankees were 4 or more runs ahead or 4 or more runs behind (112 plate appearances), he hit .382 with an OPS of 1.500.

In the late innings of close games, Jose Altuve hit .441 with a 1.190 OPS. When the Astros were 4 or more runs ahead or 4 or more runs behind, Altuve hit .313 with a .942 OPS.

In what Baseball Reference identifies as "high leverage" situations, Judge hit .219 with an .861 OPS. In medium leverage situations he improved to .297 with a 1.058 OPS, and in low leverage situations he hit .299 with a 1.115 OPS. Altuve hit .337-.377-.329 in those three situations.

So there isn’t any doubt that Judge was in fact more than proportionally responsible for the Yankees’ less-than-stellar performance in close games. In discounting his performance by only 11%, 16% relative to replacement level, we’re actually still overrating him.

We reach, then, the key question in this debate: is it appropriate, in assigning the individual player credit for wins, to do so based on the usual and normal relationship of runs to wins, or based on the actual and specific relationship for this player and this team?

I have been silent on this issue for more than 20 years, and let me explain why. In the 1990s I developed Win Shares, while younger analysts developed WAR. At that time it was my policy not to argue with younger analysts. I was much more well-known, at that time, than they were, and it’s a one-way street. When you are at the top of a profession, you don’t speak ill of those who are coming along behind you. It’s petty, and it’s just not done. Some of those people did take pot shots at me and some didn’t, but. . .well, it’s a one-way street. I’ve got mine; I’m not pulling up the ladder behind me.

But that was a long time ago. We’re not there anymore. WAR is not an upstart statistic; it is the dominant statistic. We can debate its merits on an equal footing.

The logic for applying the normal and usual relationship is that deviations from the normal and usual relationship should be attributed to luck. There is no such thing as an "ability" to hit better when the game is on the line, goes the argument; it is just luck. It’s not a real ability.

But. . . I have held my peace on this for 20-some years. . .that argument is just dead wrong. There are four reasons why it is wrong.

First, we do not, in fact, "know" that there is no such thing as an ability to hit better or worse in a key situation. We do know that MOST deviations from normal performance in clutch situations are the result of luck, rather than ability, and we cannot prove that those deviations are not 100% due to chance—but we can’t prove that they are 100% due to chance, either. The data would look very much the same as it does whether those deviations were 100% due to chance or whether they were 70% due to chance. We do not, in fact, know which one it is.

I acknowledge that, in the 1970s and 1980s, sabermetrics reached a consensus on this issue, and I acknowledge that I was part of that consensus. But we were wrong. We jumped the gun. We should have remained agnostic on the issue until more convincing analysis was done.

Second, it doesn’t matter whether it is luck or skill. Tom Tango offers this analogy: suppose that you buy a $2 lottery ticket for a chance to win $3 million. After the lottery has been conducted, the lottery ticket is no longer worth $2. It is either worth $3 million, or it is worth nothing.

The people who use WAR in this manner are in essence pretending that the $3 million ticket and the $2 ticket are of the same value, even though the lottery has been conducted. There is a "lottery" element in baseball, yes, a luck element, but we can’t ignore that. It’s part of the game.

Third, there are "luck" elements all over the statistics. You can’t adjust them out of existence; it’s impossible. A player hits .270 one year and .330 the next, and he’s the same hitter one year that he was the other; it’s just luck. Are you going to adjust that difference out of his value, because you know it is just luck? A player hits 32 homers one year, 25 the next; it’s just luck. Are you going to adjust that out of existence, because you can’t prove that it isn’t just luck?

A player draws 60 walks one year; the next year, because his team happens to face more pitchers who have poor control and he happens to have different umpires behind home plate, he draws 90 walks. It’s just luck. Are you going to adjust the statistics to remove that luck?

That’s petty luck, but what about the BIG luck? A player is a passenger in a car during spring training, gets into a car wreck and misses the first half of the season—or maybe the rest of his career. It is just luck. Are you going to adjust THAT luck out of existence?

Reality is the baseline for statistical analysis; not what reality should have been, but what it actually was. There is no other way to make statistical analysis work. It makes NO sense to selectively ignore this one element of luck, after you have accepted all of the other elements.

And fourth and finally, the connection between wins and other statistical accomplishments is the basis of statistical analysis. When you sever the connection between wins and statistics, you are no longer doing statistical analysis. What you are doing then is the same thing that Maury Allen did when he said that Johnny Bench was not an all-time great because he never hit .300. You are picking and choosing which stats you will pay attention to and which you will ignore, based not on their connection to wins and losses, but based on your own prejudices. When you do that, it is no longer valid statistical analysis.

The odd thing is that these analysts are faithful to the principle that the value of each statistical accomplishment is based on its relationship to wins all through their process, only to drop it just as they get to the finish line. If you backtrack the logic of their system, you can see that they base the value of a double or a walk or a stolen base on its relationship to wins. They make park adjustments, for example, which are of exactly the same nature; they are variances in the runs-to-wins ratio. If one player creates 110 runs in a hitter’s park and another player creates 105 runs in a pitcher’s park, they acknowledge that the 105-run player has more value. Why? Because 105 runs in a pitcher’s park will win more games than 110 runs in a hitter’s park. There is no other reason for making that adjustment.

Or if a team scores 700 runs in 1965 but another team scores 720 runs in 1975, they will agree that the team which scores 700 runs in 1965 has the better offense. Why? Because 700 runs in 1965 has more win impact than 720 runs in 1975.

But if a team actually wins 80 games when they might have won 90. . .well, we’ll pretend that they won 90. It makes no sense.

There is one more argument that we have to deal with here. Let us assume for the sake of argument that this "run-based deviation" results primarily or entirely from luck, and in particular let’s look at the comparison between Eric Hosmer and Aaron Judge. Judge created more runs than Hosmer, with fewer outs, but Hosmer had more win impact because his team was more win-efficient based on the runs that they scored and allowed. Let’s assume that is just luck. Would you rather have Aaron Judge next year, or Eric Hosmer?

You would rather have Aaron Judge, obviously—and in fact I would; I would rather have Aaron Judge next year than Eric Hosmer. It is perfectly reasonable to create estimates of projected value in future seasons which are based on the usual and normal relationship of runs to wins.

But suppose you had a 21-year-old player who was in the International League this year, Triple-A, and suppose that in the International League he posted this batting record:

G	AB	R	H	2B	3B	HR	RBI	BB	SO	HBP	GDP	AVG	OBP	SLG
142	529	273	340	31	1	182	423	108	50	7	2	.643	.702	1.737

What that is, if you are curious, is Harmon Killebrew’s batting stats in the 142 best games of Killebrew’s career. . .not 142 consecutive games, obviously, but the best games from his long career. But suppose that you had a player in AAA who did that.

That player, looking forward, would be the most valuable property in baseball, would he not? He would outrank even the great Shohei Ohtani, I think you would have to agree.

He would be the most valuable property in baseball in the future, but would that make him the American League MVP this year?

Well, of course it would not. What a player may reasonably be expected to do in the future has nothing to do with his value in a season which is in the past. What creates value for a baseball player is winning games. You cannot discard that principle, and have a valid analysis.