John Dewan introduced six candidates for the MVP race in an article posted a couple of days ago. I just got back from vacation and realized that I have NO idea who should win the NL MVP race, so I thought I would spend a few minutes looking at John’s six candidates.
1) The Carpenter is hitting just .233 with runners in scoring position and has homered almost exactly twice as often with the bases empty as with men on base. I think those are legitimate issues. 27 of his 33 home runs are with the bases empty, a split that is influenced but not created by at bats. On the other hand Carpenter is hitting .250 with 11 homers at home, .288 with 22 homers on the road, OPS .828 and 1.090.
The issue in an MVP race is not how Carpenter as an individual is impacted by the park; it is the value of what the player has done. Suppose that there was a park that was neutral overall for offense, but very favorable to a left-handed hitter and very difficult for a right-handed hitter. Suppose that in that park there were two MVP candidates, a right-handed hitter who was badly hurt by the park, and a left-handed hitter who was helped by the park. Overall their runs created are the same. Is the park relevant to the MVP debate?
It is not, or not really. The issue is the value of what the player did, not what he would have done or might have done in some other context. If two players create 100 runs each in the same offensive context, the same number of wins for their team are likely to result, so the value is the same, since the "value" is the number of wins that result.
One may argue that in another park, the right-handed hitter would have been more valuable than the left-handed hitter, and that may be true. It’s not relevant. What a player might have done, could have done, would have done in some other set of circumstances is not relevant and cannot be relevant, because (a) that kind of analysis leads to endless speculation on many different issues, and (b) we don’t actually know. We don’t KNOW what the player would have done in some other park. We’re trying to stick, as much as possible, to the facts, rather than speculation.
Of course, if you want to say, "these guys are equal in value in these circumstances, but the one guy is really better on an underlying level than the other guy, so I’m going to vote for the guy who was hurt by the park," sure. It’s not wildly irrational to think in those terms as a tie-breaker, small-change type of thing. It’s outside the lines of strictly rational value, but then, many things which are true are outside the lines of strictly rational thought, until rational thought catches up with them.
But as long as we are staying within the lines of those issues which have been thoroughly thought through, the issue is not how Carpenter as an individual is affected by the park, but the relationship between his runs created and the number of runs necessary to win a game in his environment. So what is the Park Factor for St. Louis this year?
The Cardinals this year (through Saturday) have scored and allowed 506 runs in 61 games at home, 570 runs in 63 games on the road. That calculates to a raw park factor of .922, adjusted park effect of .964. Baseball Reference lists the St. Louis Park factor this year at .98, but I don’t know how they’re getting that number. It probably has something to do with their using run elements (singles, doubles, triples, etc.) rather than run totals, I don’t know. If that’s what they are doing, I would argue that it’s not what is most relevant to the MVP race. A park factor based on run elements would be more accurate than a park factor based on runs in projecting what happens in the future, but an MVP contest is not about what happens in the future; it is about the value of what happened in the past. What happened in the past is real runs, so you use real runs to calculate the park factors in an MVP debate, I would argue.
So let’s say the Park Effect is .964. The National League average is 4.41 runs per game, and the Cardinals have played 124 games, so that’s a "context" of 527 runs (124 times 4.41 times .964). Actually, since there are two teams playing in each game, it’s a context of 1054 runs, so Matt Carpenter has created 103 runs in a context of 1,054 runs, or 9.8% of the runs in the context.
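For anyone who wants to check the arithmetic, the context calculation can be sketched in a few lines of Python. This is just an illustration using the figures quoted above; the function names are made up for the example and are not part of any established system.

```python
def context_runs(games, league_runs_per_game, park_effect):
    """Park-adjusted runs expected from BOTH teams over a club's games."""
    # One team's expected runs, doubled because two teams bat in every game.
    return games * league_runs_per_game * park_effect * 2


def share_of_context(runs_created, context):
    """A player's runs created as a percentage of the run context."""
    return 100.0 * runs_created / context


# Figures quoted in the text for Matt Carpenter and the Cardinals.
cardinals_context = context_runs(games=124, league_runs_per_game=4.41, park_effect=0.964)
carpenter_share = share_of_context(103, cardinals_context)

print(f"Context: {cardinals_context:.0f} runs")         # Context: 1054 runs
print(f"Carpenter: {carpenter_share:.1f}% of context")  # Carpenter: 9.8% of context
```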
2) Trevor "What’s The" Story. The Rockies this year have a raw Park Factor of 1.20, adjusted park effect of 1.093, so that’s 4.82 runs a game. They’ve played 123 games, so that’s a context of 1,186 runs (123 * 4.41 * 1.093 * 2). Story has created 90 runs, so that’s 7.6% of runs in context.
You’ve got 18 hitters in a game, so the average hitter has created 5.6% of the runs in context, or 1 out of 18. Carpenter is at 9.8%, Story is at 7.6%, so Story isn’t even remotely comparable to Carpenter on that level.
Story is hitting .277 with runners in scoring position, so that’s not anything notable. He has, however, hit 15 bombs with men on base as opposed to 10 with the bases empty, so you’d have to give him a couple of points for that.
3) Freddie Freeman, leads the National League in hitting at the moment at .320. He has created about 104 runs. The park is almost perfectly neutral, adjusted park effect of 1.002, so in 122 games that’s a context of 1079 runs (122 * 4.41 * 1.002 * 2). Freeman has created 104 runs, so that’s 9.6% of context runs. About the same as Carpenter.
4) Nolan Arenado. Colorado context is 1,186 runs, as we established earlier. His walk rate is better this year than it has been, so his on base percentage is up to .391, and he has 14 homers, 41 RBI on the road, although his average is still almost 100 points higher at home than on the road, as it often has been in the past. His OPS+ this year is 143, which is a career high by about ten points.
Anyway, RA Nado has about 99 Runs Created, so that’s 8.3% of context. It’s a good number but it’s not really an MVP number.
5) Javier Baez, leads the National League in RBI with 89. We have him figured with about 78 Runs Created. The Cubs this year have allowed 282 runs at home, only 215 on the road, so they have a Park Factor of 1.168, adjusted of 1.078. That creates a context of 1,160 runs for the Cubs (122 * 4.41 * 1.078 * 2). 78 Runs Created out of 1,160 is 6.7% which is. . .well, if 8.3% isn’t really an MVP type number, then clearly 6.7% isn’t.
Javier has a strikeout/walk ratio of 116 to 18, not real good. I know he has some special skills in terms of defense, but I think to talk about him as the MVP, you’re kind of falling into old style thinking, ignoring the Park Effects and the on base percentage, which is .319.
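Putting the five hitters side by side, here is a rough sketch that recomputes each share from the games, adjusted park effects and Runs Created quoted above, and compares it to the 1-in-18 share of an average hitter. The "times an average hitter" column is my own convenience for the illustration, not a number from the discussion, and the variable names are hypothetical.

```python
LEAGUE_RUNS_PER_GAME = 4.41
AVERAGE_SHARE = 100.0 / 18  # 18 hitters in a game, so about 5.6% each

# (name, team games, adjusted park effect, runs created), as quoted in the text
hitters = [
    ("Carpenter", 124, 0.964, 103),
    ("Story", 123, 1.093, 90),
    ("Freeman", 122, 1.002, 104),
    ("Arenado", 123, 1.093, 99),
    ("Baez", 122, 1.078, 78),
]


def share(games, park_effect, runs_created):
    """Runs created as a percentage of the two-team, park-adjusted context."""
    context = games * LEAGUE_RUNS_PER_GAME * park_effect * 2
    return 100.0 * runs_created / context


shares = {name: share(g, pe, rc) for name, g, pe, rc in hitters}

# Print from the largest share of context to the smallest.
for name, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<10} {pct:4.1f}%  ({pct / AVERAGE_SHARE:.2f}x an average hitter)")
```

Run as written, the shares round to the same figures used in the text: 9.8% for Carpenter, 9.6% for Freeman, 8.3% for Arenado, 7.6% for Story and 6.7% for Baez.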
6) Jacob de Grom Grom. The Mets this year have a raw Park Factor of .699, adjusted Park Effect of .860. Context runs for a Mets player would be 122 (games) times 4.41 times .860 times 2, or 926 runs.
The hitters, we have been comparing to zero. The equivalent number for a pitcher would be 2 times the norm. . . in other words, suppose that the context for a hitter is 100. If the hitter is at 120 he is +120 from zero. If a pitcher is at 80 he has equal impact, and he is +120 from 200. The zero baseline for runs NOT allowed is twice the norm of runs allowed.
De Grom is 104 runs better than a pitcher allowing twice the league norm in the Mets’ run context. (The norm for the Mets, park-adjusted, is 3.793 runs per 9 innings. Twice that would be 7.586. A pitcher allowing 7.586 runs per nine innings would allow 142 runs in 168 innings. De Grom has allowed 38 in 168 innings, so he is 104 runs better than a zero performance level.)
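The same calculation as a short Python sketch, assuming (as the text does) that the zero-value baseline for a pitcher is twice the park-adjusted norm. The function name is hypothetical; the inputs are the figures quoted above.

```python
def runs_above_zero(runs_allowed, innings, league_runs_per_game, park_effect):
    """Runs a pitcher is better than a zero-value pitcher in his context."""
    # Park-adjusted norm of runs allowed per nine innings.
    norm_per_nine = league_runs_per_game * park_effect
    # Zero-value baseline: a pitcher who allows twice the norm.
    baseline_runs = 2 * norm_per_nine * innings / 9
    return baseline_runs - runs_allowed


degrom_runs = runs_above_zero(
    runs_allowed=38, innings=168, league_runs_per_game=4.41, park_effect=0.860
)
print(f"De Grom: {degrom_runs:.0f} runs better than zero")  # De Grom: 104 runs better than zero

# The toy example from the text: in a context of 100, a pitcher at 80
# is +120 measured from the zero point of 200.
toy_pitcher = 2 * 100 - 80
```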
104 runs in a 926-run context is 11.2%, or a higher number than Carpenter or Freeman. The wrinkle is that of the runs prevented, some are prevented by the fielders. How many?
We don’t know. The Mets, I would guess, are not a good fielding team compared to the average, but the average is not zero. Let’s say that an average team allows 700 runs in a season; then it is axiomatically true that an average team allows 700 runs less than twice the average. Of those 700 runs, not ALL are prevented by the pitchers; some are prevented by the pitchers, some by the fielders. An average team is + or – zero compared to the average, but not + or – zero compared to zero. We have to have some way to remove the fielders from the 104 runs "saved" by de Grom, which we don’t have, since we don’t have a zero-point calculation system for fielders.
In this contest, the Rockies players have an advantage over the other players, which is that the Rockies have won 67 games with individual stats which would ordinarily produce only 60 wins. They are 67-56, should be 60-63 based on their runs scored and allowed. That means that each run they are producing is worth 11% more in terms of wins than it "ought" to be or would be expected to be.
So that boosts Story’s number, 7.6% of context, up to about 8.5%, and Arenado, at 8.3%, up to about 9.2%, putting them in better shape as MVP candidates. The Mets are about 5% short, so that would cut de Grom back by about 5%, but we don’t know 5% of what anyway, so we don’t know what to apply the 5% to.
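Here is a rough sketch of that adjustment, assuming the boost is simply actual wins divided by expected wins. That ratio is my reading of the paragraph above, not a formula the text spells out. Note that 67/60 works out to a little under 12%, where the text rounds to 11%, so the results land near the quoted figures rather than exactly on them.

```python
actual_wins = 67
expected_wins = 60  # from runs scored and allowed, as quoted in the text

# How much more each Rockies run has been worth, in wins, than expected.
win_multiplier = actual_wins / expected_wins


def win_adjusted_share(share_pct, multiplier):
    """Scale a share of context by the team's wins-per-run multiplier."""
    return share_pct * multiplier


story = win_adjusted_share(7.6, win_multiplier)
arenado = win_adjusted_share(8.3, win_multiplier)

print(f"Multiplier: {win_multiplier:.3f}")
print(f"Story:   {story:.2f}% of context")
print(f"Arenado: {arenado:.2f}% of context")
```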
Baseball-Reference WAR lists the NL leaders as Scherzer, Nola and deGrom, all in the range of 7.8 to 8.4, no position players over 5.3, and lists Lorenzo Cain first among position players at 5.3. I don’t know that that’s a credible ranking.
Just as a general comment. . . I don’t think John’s "Total Runs" system is intended to be an ultra-sophisticated method; I think it is intended to be more of a shortcut. The system mixes up zero-based calculations (Runs Created) with average-based calculations (Runs Saved), and also does not "normalize" batting stats by adjusting for the offensive context, which obviously we know needs to be done, and John would do that if he was actually making an argument that Matt Carpenter was the MVP, rather than just using this method to focus on the candidates. But mixing up zero-based values with average-based values (a) is theoretically improper, and (b) causes real, real, real serious problems in fact.
I suspect that the reason that Baseball Reference shows the top pitchers as ridiculously far ahead of the position players is that they ALSO are mixing up zero-based calculations with average-based calculations. I don’t know that; I don’t really understand the system, but I suspect that what they are doing is calculating the runs saved by pitchers against a zero point (twice the league norm), and then adjusting that for the AVERAGE defensive performance, thus implicitly assuming that defense has zero value in the average case. That was the problem with the Pete Palmer Linear Weights system; it implicitly confused performance averages with zero-based numbers. I would suppose that WAR, which is a descendant of Linear Weights, still has that problem. But I don’t actually understand their system well enough to say. I suspect that the people who created the system probably don’t understand it that well, either. It’s an inherently confusing process, combining different measurements into one, and almost all systems that attempt to do so wind up accidentally adding apples to grapefruit.