What Do We Expect a Hitter to Do Next Year?
This is a continuation of the article that I started yesterday, about good years and bad years. The article assumes that we can identify players who have had good years, and players who have had bad years. But how do we do this?
Let’s use Mike Trout, 2016, and Bryce Harper, 2016, as the illustrative cases. Prior to the 2016 season, Mike Trout had 2,877 career plate appearances, with a career OPS of .956. We multiply the .956 by the 2,877, which produces 2750.412. This number doesn’t have any direct meaning; it is merely the WEIGHT of his career playing time, applied to his OPS.
Second, we multiply his LAST SEASON’S OPS by his last season’s plate appearances. In his last season (2015) Trout had a .991 OPS in 682 plate appearances. .991 times 682 is 675.862.
This last figure, we multiply by four. 675.862 times four is 2703.448. We add this to the first figure, which was 2750.412. The total is 5453.860.
Then we take his career plate appearances (2,877), and add FOUR TIMES his plate appearances from 2015, his most recent season. He had 682 plate appearances in that season; four times that is 2,728, so we have 2,877 + 2,728, which is 5,605.
Then we divide the "product" total—5453.860—by the plate appearance total, 5,605. The result is .973. So, in this first round of the calculations, we expect Mike Trout to have an OPS in 2016 of .973.
Bryce Harper through 2015 had 2,143 plate appearances, with a career OPS of .902. 2,143 times .902 is 1932.986, so we save that.
In 2015 Harper had 654 plate appearances, with an OPS of 1.109. 654 times 1.109 is 725.286. Multiply that by four; that’s 2,901.144. We add these two together—1932.986 and 2901.144—and that’s the numerator of our formula, 4834.130.
Harper had 2,143 plate appearances in his career and 654 in 2015. Multiplying the 654 by four and adding the 2143, that makes 4759. That’s the denominator. Dividing 4834.13 by 4759 yields 1.016. Our first estimate of the expected OPS for Bryce Harper in 2016, then, is 1.016.
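The first-round arithmetic for Trout and Harper can be sketched in a few lines of Python. This is only an illustration of the steps described above; the function and parameter names are mine, not part of any existing tool. Note that the career totals already include the most recent season, so adding it four more times means that season effectively counts five times.

```python
def first_round_expected_ops(career_pa, career_ops, last_pa, last_ops, recency_weight=4):
    """PA-weighted OPS, with the most recent season added in `recency_weight` extra times."""
    numerator = career_ops * career_pa + recency_weight * last_ops * last_pa
    denominator = career_pa + recency_weight * last_pa
    return numerator / denominator


# Figures entering 2016, as given in the text.
trout = first_round_expected_ops(2877, 0.956, 682, 0.991)
harper = first_round_expected_ops(2143, 0.902, 654, 1.109)
print(f"Trout {trout:.3f}, Harper {harper:.3f}")
```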
2016, however, was to be a hitter’s year, as I suspect you all know. The major league OPS in 2016 was .739. In 2015 it was .721; in 2014 it was .700. The relevant point is that it moved up. It moved up from .721 to .739, which is 2.5%.
We wouldn’t have known this at the time, but still, we have to adjust for it. If we didn’t adjust, we would show almost all hitters in 1968 as having poor seasons, and almost all hitters from 1930 as having good seasons—not "almost all", really, but maybe 70% or something. Too many. If I had teams and leagues identified with the players in my data I would adjust for the park factors and the leagues, but I don’t, so I just adjust expectations by the major league OPS.
Trout’s expected .973 OPS, adjusted upward by 2.5%, becomes .997. Harper’s expected 1.016 OPS, adjusted upward by 2.5%, becomes 1.041. That’s the second-level estimate. There are only three levels in this process, so we’re not a long way from the end of this series.
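As a rough sketch, the second-level adjustment scales the first estimate by the change in the major league OPS from one season to the next. I am assuming here that "adjusted upward by 2.5%" means multiplying by the ratio of the two league figures; the function name is hypothetical. The new season's league OPS is only known after the fact, as noted above.

```python
def league_adjusted_ops(expected_ops, league_ops_prev, league_ops_now):
    """Scale an expectation by the year-over-year change in major league OPS."""
    return expected_ops * (league_ops_now / league_ops_prev)


# Major league OPS was .721 in 2015 and .739 in 2016, a rise of about 2.5%.
trout_adjusted = league_adjusted_ops(0.973, 0.721, 0.739)
harper_adjusted = league_adjusted_ops(1.016, 0.721, 0.739)
print(f"Trout {trout_adjusted:.3f}, Harper {harper_adjusted:.3f}")
```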
The problem is that players tend to drift toward the center. If you take the universe of players who have an expected OPS somewhere around 1.000, like Trout and Harper, you will find that they do NOT have actual next-year OPS of 1.000. You will find that they are very, very good players with very high OPS, but not 1.000. There are six players in history who have had an expected OPS for a season of exactly 1.000—Ty Cobb in 1920, Rogers Hornsby in 1924, Ken Williams in 1924, Rudy York in 1939, Stan Musial in 1944, and Frank Thomas in 1999. All six of those guys were productive hitters, with OPS of .867 or higher, but only one of them actually had an OPS of 1.000 or better. At this point we would wind up saying that five of the six players had poor seasons.
Everything drifts toward the center—on both ends. If we used these estimates as they are, we would wind up saying that the great majority of guys who had expected OPS of 1.000 were having bad years, because they had OPS of .950, whereas the great majority of the guys who had expected OPS of .550 had GOOD years because they had OPS of .600. There are 11 hitters in history who would have expected OPS of .550 for a season—John Richmond in 1883, Bill McClellan in 1884, Curt Welch in 1885, Jim Keenan in 1888, Doc Lavan in 1916, Moe Berg in 1928, Ed Stroud in 1968, John Bateman in 1968, Tim Cullen in 1970, Darrel Chaney in 1976, and Tim Bogar in 1999. In fact, every one of those players had an actual OPS higher than .550, and 9 of the 11 had actual OPS higher than .600. Over time, everything regresses toward the mean.
We don’t want to wind up saying that all of the players who were expected to have an OPS of 1.000 had bad years, because they had OPS of .950, while all of the players who had expected OPS of .550 had good years, because they had an OPS of .600. The player’s personal history is not the only information relevant to the question of what should be expected from him next season.
We have to normalize expectations, then, by moving players some distance toward the center. We do this by multiplying the player’s expected OPS by three, adding the major league OPS for the season, and dividing by four. When we do this for the six players previously listed at an expected OPS of 1.000, their expectations drop to somewhere between .921 (Musial in ’44) and .945 (Thomas in 1999). That way, three of those players meet their expectations, and three do not, which is what we want: half of the players in any group to meet their expectations. (Nine of the 11 players with an expected OPS of .550 still are shown as having good seasons, but that’s just a fluke. If we expand the test group to all players from .549 to .551, then we have 19 players having good years, and 23 having poor seasons.)
Trout’s expected OPS before this step was .997, and the major league OPS was .739. When we modify the expectations in this way, Trout’s expected OPS drops to .933, while Harper’s drops from 1.041 to .966.
Trout’s actual OPS was .991, 58 points better than expectation, so he had a good year, while Harper’s actual OPS was .814, much less than his expected .966, so he did not have a good year.
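The third step and the resulting good/poor call can be sketched as below. The names are hypothetical, and treating an exact tie as "good" is my assumption; the article only says that an actual OPS below the expected OPS is a poor season.

```python
def regress_to_league(adjusted_ops, league_ops, player_weight=3):
    """Move the expectation one quarter of the way toward the major league OPS."""
    return (player_weight * adjusted_ops + league_ops) / (player_weight + 1)


def season_label(actual_ops, expected_ops):
    """'poor' if the player fell short of expectation, otherwise 'good'."""
    return "poor" if actual_ops < expected_ops else "good"


league_2016 = 0.739
trout_expected = regress_to_league(0.997, league_2016)    # roughly .933
harper_expected = regress_to_league(1.041, league_2016)   # roughly .966

print("Trout:", season_label(0.991, trout_expected))      # good
print("Harper:", season_label(0.814, harper_expected))    # poor
```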
OK, that’s the end of that road, the end of the process for players like Trout and Harper. I should have said this earlier, but we use THAT process for establishing expectations for players who have had at least 300 major league plate appearances entering the season. We’ll have to use a different process for rookies and for players who have limited playing time before the season. You can try to adjust for aging at this level, but it just causes problems. If you adjust expectations upward further for Trout and Harper, because they are young, you wind up with a pretty good number of absurd outcomes, young players who won the MVP Award but are shown as having poor seasons because they did not meet expectations. An age adjustment at this level is not helpful or appropriate, although we’ll do some age stuff later in the process.
Next let’s deal with the special rules. One special rule is that a player who wins the MVP Award is always considered to have had a good season. I don’t believe there is any MVP who would have been considered to have had a bad season anyway, but I’m not sure; I remember that I checked, but I don’t remember whether there was anybody there.
I mentioned that all players with fewer than 200 plate appearances are presumed to be neutral. A player with fewer than 200 plate appearances can be considered to have had a bad season if
a) He has at least 100 plate appearances,
b) He had at least 300 plate appearances the previous season, and
c) His OPS is at least 100 points below his career OPS entering the season.
597 players are designated as having bad seasons by this rule.
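Written as a predicate, the three conditions look something like the sketch below. The function and argument names are hypothetical, and the OPS values are compared in whole "points" (thousandths) only to sidestep floating-point edge cases.

```python
def ops_points(ops):
    """Convert an OPS such as 0.725 to integer points (725)."""
    return round(ops * 1000)


def part_time_bad_season(pa, prev_season_pa, ops, career_ops_entering):
    """Bad-season rule for a player with fewer than 200 plate appearances."""
    return (
        100 <= pa < 200                                           # (a) at least 100 PA
        and prev_season_pa >= 300                                 # (b) 300+ PA the year before
        and ops_points(career_ops_entering) - ops_points(ops) >= 100  # (c) 100+ points down
    )


# Made-up players, purely for illustration.
print(part_time_bad_season(150, 450, 0.620, 0.740))  # True
print(part_time_bad_season(150, 450, 0.700, 0.740))  # False: only 40 points below career
print(part_time_bad_season(80, 450, 0.500, 0.740))   # False: under 100 PA, stays neutral
```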
A player with fewer than 200 plate appearances can be considered as having a GOOD year if
a) He is at least 29 years old (hence, not fighting for regular playing time if he has not previously had it)
b) He has at least 100 plate appearances,
c) He has at least 80% of his previous season’s plate appearances,
d) His OPS is at least 25 points above his career norm coming into the season, and
e) His OPS is at least .650.
338 players are designated as having good seasons by this rule. This rule allows us to designate those veteran pinch hitters like Jerry Lynch, Gates Brown, John Vander Wal, and Dave Hansen as having had a good season, when they have a good season. Generally speaking, guys with 150 at bats, you can ignore them, but if they’re actually playing an important role on the team and they have a good year, then it looks absurd to ignore them.
Also, any player who has 100 to 199 plate appearances and has an OPS of at least .900 is designated as having had a good year; 116 players are affected by this rule. There is also a special rule: Shane Spencer in 1998 had a good year. One player is affected by this rule.
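The two good-season rules for part-time players can be combined in one sketch. All names are hypothetical, I am reading condition (c) as 80% of the previous single season's plate appearances, and the one-player Shane Spencer rule is left out.

```python
def ops_points(ops):
    """Convert an OPS such as 0.725 to integer points (725)."""
    return round(ops * 1000)


def part_time_good_season(age, pa, prev_season_pa, ops, career_ops_entering):
    """Good-season rules for a player with 100 to 199 plate appearances."""
    if not 100 <= pa < 200:
        return False
    if ops_points(ops) >= 900:          # the .900 rule, regardless of age or history
        return True
    return (
        age >= 29                                                 # (a) a veteran
        and pa * 10 >= prev_season_pa * 8                         # (c) 80% of last year's PA
        and ops_points(ops) - ops_points(career_ops_entering) >= 25   # (d) 25+ points up
        and ops_points(ops) >= 650                                # (e) OPS of .650 or better
    )


# Made-up players, purely for illustration.
print(part_time_good_season(34, 150, 160, 0.780, 0.740))  # True: veteran bench bat
print(part_time_good_season(25, 150, 160, 0.780, 0.740))  # False: under 29
print(part_time_good_season(24, 120, 0, 0.905, 0.905))    # True: the .900 rule
```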
I’m just trying to make sure, when we count the number of players who have good years on any team, that we have counted everybody who has a real impact. If a guy hits .180 and loses his job in early June, we count that as a bad year.
OK, we’ve dealt with probably 85% of the players now: all players who had 300 plate appearances entering the season, and all players who had fewer than 200 plate appearances during the season. What we haven’t dealt with yet is the rookies and first-year regulars and sort-of regulars, the players who got 200 or more plate appearances with little history of previous playing time.
For one group of the players who are remaining. . .well, I confess. I cheated. If a player in the remaining group (the remaining 8,000 or so players)
a) had an OPS at least 10% below the league norm, and
b) played fewer than 100 games in the rest of his career, after that season,
then I marked him as having had a poor season. This is cheating, because I am using the player’s FUTURE record to make a determination that is supposed to be made based on his play in that season. But sometimes I would prefer to be sure that I have the answer right, rather than doing everything the way it optimally ought to be done. If a player has an OPS 10% below the league, and he disappears after that season. . .well, that’s pretty good evidence to conclude that he didn’t get the job done. So let’s mark those guys as negatives.
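In code, this look-ahead rule is a two-part test. A minimal sketch, assuming "10% below the league norm" means an OPS at or below 90% of the major league OPS; the names are hypothetical, and the rule is meant only for the remaining group of players with 200 or more plate appearances and little prior history.

```python
def washout_poor_season(ops, league_ops, games_rest_of_career):
    """Poor season if well below league and the player barely played again afterward."""
    return ops <= 0.90 * league_ops and games_rest_of_career < 100


# Made-up players in a season where the league OPS was .720 (cutoff .648).
print(washout_poor_season(0.610, 0.720, 40))    # True
print(washout_poor_season(0.610, 0.720, 600))   # False: he stuck around
print(washout_poor_season(0.700, 0.720, 40))    # False: not 10% below the league
```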
This gets rid of 1,000 or so players; we’ve got about 7,000 left, as I recall; didn’t make a note of what it was. What do we do with the rest of these guys?
This is what I did. I divided each player’s OPS by the major league OPS for the season, so that a .792 OPS in a season in which the major league OPS was .720 is a relative OPS of 1.100. Then I sorted the players by their defensive position. . .catchers, first basemen, etc. Then I figured the normal "relative production" for first-year players at each position. That is, if the rookie shortstops had a normal production rate of 93% of the league OPS and the league OPS was .700, then I marked the "expectation" for that player at 93% of .700, which would be .651. Then, if a player was above expectation, he was marked as having had a good season. If he was near expectations AND KEPT HIS JOB THE NEXT YEAR (again, cheating), I marked that player as having had a good year. But if he was in the bottom 45% of his group, then I marked him as having had a disappointing season as a hitter. Many players had disappointing years with the bat in their first year, but nonetheless went on to long careers, of course, but you have to leave it that way because that’s what really happens.
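The two calculations in this step, relative OPS and the positional expectation for a first-year player, can be sketched as follows. The names are hypothetical, and the positional rate is simply passed in, since how the "normal" rate for each position was derived is not spelled out here. The example reuses the figures from the paragraph above.

```python
def relative_ops(ops, league_ops):
    """A player's OPS as a multiple of the major league OPS for that season."""
    return ops / league_ops


def rookie_expected_ops(position_rate, league_ops):
    """Expectation for a first-year player: the positional rate times the league OPS."""
    return position_rate * league_ops


rel = relative_ops(0.792, 0.720)
shortstop_expectation = rookie_expected_ops(0.93, 0.700)
print(f"relative OPS {rel:.3f}, rookie shortstop expectation {shortstop_expectation:.3f}")
```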
One other thing I should have explained earlier. In general, if a player’s actual OPS was less than his expected OPS, then that’s marked as a poor season. There is, however, an exception. There was a rule that at least 45% of the players at any age (up to 40) had to be considered to have met expectations, unless it was not reasonable to so designate them. At age 38, probably 65, 70% of the players come in below their expected OPS, but you can’t really say that they have had poor seasons, because, after all, they are 38 years old. I made the rule that I had to mark at least 45% of them as having had good seasons, even though some of these may have missed their expectation. They are "ranked" by actual OPS, divided by expected OPS; AT LEAST the top 45% in every age group have to be marked as having had good seasons.
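One way to picture the 45% floor is the sketch below, applied to a single age group. The names and the sample data are invented for illustration, rounding the 45% up to a whole number of players is my assumption, and the "unless it was not reasonable" escape hatch is a judgment call that the code does not attempt.

```python
def classify_age_group(players, floor_pct=45):
    """players: list of (name, actual_ops, expected_ops) for one age group.

    A player is 'good' if he met his expectation, or if he ranks in the
    top `floor_pct` percent of the group by actual OPS / expected OPS.
    """
    ranked = sorted(players, key=lambda p: p[1] / p[2], reverse=True)
    guaranteed = (floor_pct * len(ranked) + 99) // 100   # integer ceiling of floor_pct%
    labels = {}
    for rank, (name, actual, expected) in enumerate(ranked):
        met_expectation = actual >= expected
        labels[name] = "good" if met_expectation or rank < guaranteed else "poor"
    return labels


# Five made-up 38-year-olds; only A actually met his expectation.
age_38 = [
    ("A", 0.800, 0.780),
    ("B", 0.740, 0.760),
    ("C", 0.700, 0.730),
    ("D", 0.650, 0.720),
    ("E", 0.600, 0.700),
]
print(classify_age_group(age_38))
```

With five players, 45% rounds up to three, so B and C are marked good alongside A even though they fell a little short of their expected OPS.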
Finally, since I know that formulas are sometimes stupid, I allowed myself to overrule the formula classification if I was certain that it was wrong. Occasionally, the formulas just don’t work. 80% of the time, when the formulas say that a player had a bad year, you can look at it and agree that that is obviously true. Another 15 to 19% of the time, you might think that a player’s season was borderline; you can call it a good season, you can call it a bad season, can’t really argue it either way.
But once in a while, the system just doesn’t work for some reason, and it will say that a player has had a poor season when there is just no way that that is true. I would rather be right than be faithful to my protocols. I reserve the right to re-classify a season if the mathematical system is just totally wrong.
A few cases where this happened. I think the system wants to say that Rickey Henderson had a poor season in 1982, which was the year that he stole 130 bases. His OPS the previous two seasons was .820 and .845; that year it is .780, so the system thinks it’s a subpar season. Not a reasonable conclusion.
Willie McCovey in 1977, which was his comeback year with the Giants; he hit .280 with 28 homers, 86 RBI after the entire baseball world thought he was washed up. I remember I was out there, saw the Giants play a couple of games that summer; Willie was the big hero of the season, the basis of all of their promotional campaigns. For some reason the system thinks that was a bad year. Can’t live with that.
Ivan Rodriguez in 1991; his relative OPS is low, even compared to the relative OPS of other rookie catchers—but he was such a defensive sensation that it is just not reasonable to say that the Rangers were disappointed he only hit .264. Joe Carter in 1993, which was the year he hit the walk-off home run in the World Series. He hit 33 homers and drove in 121 runs in the regular season. His OPS was .802 whereas it was expected to be .804, but nobody reasonably would say that Carter had an off season; can’t live with that. Maury Wills in 1959; his mid-season callup fixed a gaping wound in the Dodger lineup, and ignited a charge to the pennant; can’t reasonably say that was a disappointing performance. (The Dodgers were in fourth place in early June, when Wills made his major league debut.)
There are 54,000 hitter seasons that we classified as good, bad or neutral, and of those 54,000 I overruled the system and arbitrarily changed the result for fewer than 50, or less than 1/10th of one percent. But I think it is better to do that than to just allow the system to be obviously wrong about something.