
Beyond Monthlies

September 7, 2016

Once I figured out that Baseball-Reference.com’s monthly splits vary almost exclusively because of small sample sizes in accordance with weather-related highs and lows (on the individual batter-level), I wondered what purpose such splits serve. I suppose "idle curiosity" is a sufficient purpose. If you’d like to see how well your favorite batter hit during the month of your fifth birthday, for some reason, it’s right there, and you can satisfy your curiosity easily through the magic of the interwebs. But as far as finding significance in anything such a search would yield, that’s less satisfying.

One of the things I’ve learned recently, after years of perusing Baseball-Reference.com (hereafter "bbref"), is that not all of their splits are useful, or even helpful. Some are even misleading, if you’re the kind of person who is willing to be misled, as I am, by numbers that appear to be meaningful.  I’m sure that the designers of bbref are professionals who know what they’re doing (which is usually a polite way of saying that I think they’re bozos who don’t know their asses from a hole in the ground) but I really can’t understand why on earth they would set up the "first half" and the "second half" of the season to coincide with the All-Star break rather than, you know, the first and second halves of the season. Is it easier to do it that way? That’s the only explanation I can come up with that makes any sense at all:  you just look up the date of the All-Star game, tote up the numbers up to that point and put them under "first half" and then put the remainder under "second half" rather than look up the date of the 81st and 82nd game for each of the 30 teams.  I don’t actually know how much easier that is, but that’s all I can come up with. Anyone? Why break the season up into one half that has maybe 87 games and another half that has 75?

I did waste a little time researching players who had better first halves than second halves, and vice-versa, until I realized that nearly everyone somehow played more games in the first "half" than he did in the second "half," and that was because there are, according to bbref, about twelve more games in the first half of the season than there are in the second half. I was all full of beans, for example, with the news that Mickey Mantle was a far superior hitter in the first half of his seasons, when he hit 308 of his 536 HRs. If you double that figure you get 616, but if you double his second-half figure, it’s "only" 456 HRs, which looked like a pretty significant difference, basically the difference between Jim Thome on the all-time HR list and Carl Yastrzemski. It’s far less significant, of course, once you adjust for the greater number of games Mantle’s teams played in the first vs. the second "halves" of bbref’s seasons.

Now, as it happens, Mantle was still a better hitter in the first half of his seasons, just not as significantly better as bbref’s skewed split would have you think. (Adjusted for a more even split of his seasons into true first and second halves, he probably hit around 280 HRs in the first half and around 255 in the second half, a 25-HR difference as opposed to the 80-HR difference that bbref’s splits imply.) We can see the real difference in his OPS, which dropped from 1.010 to .941, as bbref has it, probably too big a drop to be accounted for by the assumption (a big one) that Mantle’s greatest days took place uniformly just before the All-Star break. (He would need to have had a 2.000 OPS, approximately, on those days to account for that difference.) Maybe the best shorthand way of conveying the true dropoff would be in the number of GIDPs Mantle had in bbref’s first and second halves: he grounded into 21 more DPs in the shorter second "half" than he did in the first half, which would of course be even larger if we adjust the "halves" into true halves.  His strikeouts, which appear to go up in bbref’s first half (881/838), probably go down a tad if we adjust for the difference in calculating the true mid-point of the year.
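For anyone who wants to poke at that adjustment, here is a rough sketch of one way to do it. It assumes the 87/75 game split I guessed at above and a constant HR rate within the first half, and the helper name `rebalance_halves` is made up for the occasion. It is a crude proration, not bbref’s method, and it won’t land exactly on my eyeballed 280/255, but it shows the gap shrinking well below 80.

```python
def rebalance_halves(first_total, second_total, games_first, games_second):
    """Move the surplus first-'half' games into the second half.

    Assumption (mine, for illustration): the stat accrues at a constant
    per-game rate throughout the first half.
    """
    true_half = (games_first + games_second) / 2
    rate_first = first_total / games_first          # per-game rate before the break
    moved = rate_first * (games_first - true_half)  # production in the surplus games
    return first_total - moved, second_total + moved


# Mantle's career HR by bbref "half", from the text: 308 of 536 before the break.
first_hr = 308
second_hr = 536 - first_hr

# 87 and 75 are the rough per-season game counts guessed at earlier, not bbref data.
adj_first, adj_second = rebalance_halves(first_hr, second_hr, 87, 75)

print(f"doubled raw halves: {2 * first_hr} vs {2 * second_hr}")
print(f"raw gap: {first_hr - second_hr} HR")
print(f"rebalanced: {adj_first:.0f} vs {adj_second:.0f}, gap {adj_first - adj_second:.0f} HR")
```

Change the two game counts and the gap moves with them, which is really the point: the size of the "difference" depends heavily on where you cut the season.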

So bbref’s split of the season into "halves" is misleading at best, useless at worst. Are there any other major issues with their other categories?

I’ve already used Sandy Koufax as an example of what’s misleading about bbref’s monthly splits, and shown how pitchers’ monthly stats, and particularly Koufax of all the pitchers in the Hall of Fame, can be particularly misleading because of the smaller sample sizes in starting pitchers’ games as contrasted with those of everyday players. I’ve speculated that a minimal normal range of monthly performances for everyday players is usually within .050 to .100 OPS points. ("Minimal normal" means that that’s the least amount of monthly fluctuation you can expect when nothing else is going on, like injuries, exhaustion, etc. The numbers will just bounce around that much if nothing else is going on, and bounce around more if there are other factors to consider.)  I’ve also given a hypothetical example of a great Hall of Fame everyday player who over a 22-year career was above that normal .100 OPS range in the month of July, which also describes an actual Hall of Famer by the name of Willie Mays.

Now mind you, this is Willie Mays we’re describing here, so his worst months are going to be better than most players’ best months, but we’re comparing Mays’ 22 July performances to his own performance in other months, and it’s a sharp dropoff: Mays had an .870 OPS in his Julys, batting .280. In every other month, he batted at least .300. Compare that overall performance to May, probably Mays’ best month, when he had only five more at-bats than he did in his Julys, making it a very even comparison:

 

               AB     H     HR    RBI   OPS
Mays in July   1951   546   111   312   .870
Mays in May    1956   601   131   371   .982

What happened? Probably nothing. I’m just pointing out what a .100+ OPS dropoff looks like. It ain’t hay. If you were comparing one player who over three seasons put up 37 HR, 104 RBI to another player’s 44 HR and 124 RBIs, with all else equal, you’d doubtless say they were both great but the second player was clearly greater.
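The three-season comparison is just the career totals in the table above divided by three, on the assumption that roughly 1,950 at-bats amount to about three full seasons. A quick sketch of that arithmetic (the `per_season` helper is only illustrative):

```python
# Career totals by month for Willie Mays, as given in the table above.
mays = {
    "July": {"AB": 1951, "H": 546, "HR": 111, "RBI": 312, "OPS": 0.870},
    "May": {"AB": 1956, "H": 601, "HR": 131, "RBI": 371, "OPS": 0.982},
}

SEASONS = 3  # assumption: ~1,950 AB treated as roughly three full seasons


def per_season(line, seasons=SEASONS):
    """Batting average plus HR and RBI scaled down to a single season."""
    return {
        "AVG": round(line["H"] / line["AB"], 3),
        "HR": round(line["HR"] / seasons),
        "RBI": round(line["RBI"] / seasons),
    }


july = per_season(mays["July"])
may = per_season(mays["May"])
ops_gap = round(mays["May"]["OPS"] - mays["July"]["OPS"], 3)

print("July:", july)
print("May: ", may)
print("OPS gap:", ops_gap)
```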

Mays’ July-to-May dropoff is a large one among the great, consistent hitters.  His twin, Hank Aaron, was the model of monthly consistency, having a range of only .040 OPS between his best month and his worst. (And maybe smaller than that, as I will show in just a minute.) I looked at the career monthly performances, and a few other items on bbref’s "splits" pages, of the twelve MLB leaders in career plate appearances, and Aaron was the only one to come in under .045. (The complete chart of all the categories I’ve explored appears at the very end of this article, after I’ve explained what each of the categories means, and what I think I’ve discovered in each of them. My rationale for choosing PAs as my standard of selecting these twelve players is simply that players with a lot of PAs are going to give us the largest sample sizes and also have careers of unusual consistency, since players with uneven performances in the second halves of their careers tend to get benched, as these twelve players never were, other than at the very ends of their careers. You’ve got to be good to get a lot of PAs, but you also have to be consistently good.)  Derek Jeter btw was the only other player besides Aaron to break the .050 OPS-range mark, as measured by months. Now if Jeter had had a few off-years in a row, the Yankees would have benched him—wait, what am I talking about? That’s crazy talk. But if any other player had a few off-years, he would have ridden the pine for sure.

The reason I say that Aaron may have had an even smaller range than .040 OPS points is that not all months are created equal: it used to be that April was not a full month as far as MLB was concerned. The season would start officially in early-to-mid April, giving all of these players fewer games played in April than in other months, often hundreds fewer games over a career, so I figured the OPS ranges two separate ways, both eliminating April from consideration and including it, to see if the smaller sample size made for anomalous conclusions. It didn’t, but I will print the findings for both a 5-month season and a 6-month season. Aaron’s performance was one of three outliers in April, telling me that there wasn’t, at least within this gang of twelve batters, any sign that the lower number of April games created a significant effect. Aaron’s range among May, June, July, August, and September was a mere .030. On that basis, he must be the most consistent player ever.
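The two-ways calculation is simple enough to sketch. The monthly OPS figures below are invented purely for illustration (they are not any real player’s numbers), and `ops_range` is just a name I picked:

```python
def ops_range(monthly_ops, skip=()):
    """Spread between the best and worst monthly OPS, ignoring months in `skip`."""
    values = [ops for month, ops in monthly_ops.items() if month not in skip]
    return round(max(values) - min(values), 3)


# Hypothetical career OPS by month, made up for this example only.
example = {
    "April": 0.890,
    "May": 0.935,
    "June": 0.930,
    "July": 0.920,
    "August": 0.940,
    "September": 0.910,
}

six_month = ops_range(example)                    # April included
five_month = ops_range(example, skip=("April",))  # April left out

# If dropping April changes the range, April held the high or the low.
april_is_extreme = six_month != five_month

print(six_month, five_month, april_is_extreme)
```

When the two ranges match, April was somewhere in the middle of the pack and its smaller sample didn’t drive the result.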

The other categories I looked at included "platooning," where 10 of the twelve batters fell into that .050-.100 range of OPS points. (Actually, .048 to .111, but that’s close enough for jazz, as one of my creative-writing professors, a jazz drummer, liked to say.) I thought that platooning would be an example of a bbref split that was meaningful, since platooning is a known split that has been recognized in baseball strategy for a hundred years or more. Most of these players, I presume, would show a small platoon difference, because otherwise they would have been rested more when their teams went up against certain pitchers. The two outliers were Craig Biggio, who had a negligible .028 platoon difference, and Carl Yastrzemski, who had a whopping .199 platoon difference. I’m not here to make judgments, but doesn’t that .199 kinda stick out? You’ve got a batter whose OPS is 200 points lower against lefties than righties and you play him day-in and day-out for 24 years? Seems to me that someone might have caught on to this somewhere around Yaz’s 18th season, or maybe I’m not getting something. Or maybe someone did twig to it, which is why Yaz batted only 13,994 times and not 13,995.

The next highest, the .111 platoon difference, btw, belonged to Hank Aaron, which surprised me a little ("model of consistency" and all), and the only other platoon difference greater than .100 was Derek Jeter’s, at .102. Make of that what you will. Everyone else besides Biggio (and Ripken, who came in at .048) came in within the range of .050-.100 OPS points, which seems like a normal range of random variation for great and consistent hitters. Most players who have much greater differences are those who can expect to be rested on a regular basis, but in this cohort there is nothing to see here.
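The platoon tally can be checked straight from the chart at the end of this article. This sketch just sorts the twelve platoon differences into the loose .048-.111 band and its outliers; the variable names are mine:

```python
# Platoon OPS differences, copied from the chart at the end of the article.
platoon = {
    "Aaron": 0.111, "Biggio": 0.028, "Bonds": 0.098, "Cobb": 0.060,
    "Henderson": 0.074, "Jeter": 0.102, "Mays": 0.089, "Murray": 0.075,
    "Musial": 0.077, "Ripken": 0.048, "Rose": 0.069, "Yaz": 0.199,
}

LOW, HIGH = 0.048, 0.111  # the "close enough for jazz" version of .050-.100

inside = sorted(name for name, diff in platoon.items() if LOW <= diff <= HIGH)
outliers = sorted(name for name in platoon if name not in inside)
over_100 = sorted(name for name, diff in platoon.items() if diff > 0.100)

print(len(inside), "inside the band")
print("outliers:", outliers)
print("over .100:", over_100)
```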

The next category I looked at was how each of the twelve hitters did facing Power/Finesse pitchers. Bbref thoughtfully gives us three sub-categories here, Power, Finesse, and Average. I toted up each of these batters (besides Ty Cobb, for whom this data was missing), noting the difference between their best and their worst OPS against each of the three types of pitchers. The most interesting thing I found was that about a third of these batters (4/11) had an extreme performance facing an average pitcher on the Power/Finesse spectrum. This was kinda surprising, and bears more study, because you’d think (I’d think) that if there was a meaningful spectrum of Power and Finesse pitchers, it would be anomalous for a batter NOT to do his best and his worst against the two extremes. Of course these batters are far from typical, and they don’t form a large enough sample to support any broad judgments about general tendencies, but I did find the frequency of "Average" power/finesse pitchers at the extreme here a little strange.

On the Power/Finesse spectrum, there were some OPS ranges that were so teeny-tiny as to be virtually meaningless: Musial had a range of .009 between his performance facing Power and Finesse pitchers, Rickey Henderson had only a .012 range, and Rose had only an .024 range. Ten of these eleven batters appear to hit every type of pitcher on the Power/Finesse spectrum equally well, with only one (Biggio, .121) coming in above .075.
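Here is a small sketch of how the Power/Finesse figure and the "Average on the extreme" flag can be worked out for one batter. The three OPS values are hypothetical, invented to show the odd case, and `pfa_summary` is an illustrative name rather than anything bbref provides:

```python
def pfa_summary(ops_by_type):
    """Return (best-minus-worst OPS, whether 'Average' is the best or worst)."""
    best = max(ops_by_type, key=ops_by_type.get)
    worst = min(ops_by_type, key=ops_by_type.get)
    spread = round(ops_by_type[best] - ops_by_type[worst], 3)
    return spread, "Average" in (best, worst)


# Hypothetical batter: does his worst against the in-between pitchers.
hypothetical = {"Power": 0.840, "Average": 0.815, "Finesse": 0.827}

spread, starred = pfa_summary(hypothetical)
print(spread, starred)
```

If the Power/Finesse spectrum meant much, you’d expect the flag to come up false nearly every time, since the middle category should sit between the two extremes.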

The final category I looked at was "innings." I thought this category would yield the most random results of all. Why should a batter, a great and consistent batter, after all, do any better or worse in one inning as compared with any other inning? Rather than look at individual innings, I decided I could more usefully look at three of the largest groupings of innings bbref provides: namely, innings 1-3, 4-6 and 7-9. This strategy would not only wipe out the argument (specious at best) that batters would do well or poorly in one particular inning but would also triple the sample size.

You could make a case for why these batters would have a higher OPS in the first three innings (Inning #1 is a productive inning, and all these batters hit at the top of the lineup, so it would tend to boost offensive performance, and if a starting pitcher is going to get lit up, he’ll often get bombed out in the first inning before he settles down). And you could make the case for high OPSes in the middle three innings (starting pitchers tiring, and/or being replaced by long relievers, who are usually pitchers not good enough to crack the starting rotation or to close out games, i.e., the worst pitchers in MLB). And of course you could make the case for these batters, especially these great batters, excelling in the final 3 innings (when the game is on the line, when the cream rises to the top, or when the game is sometimes long out of reach and the mopup men are throwing meaningless pitches anyway). But I think these reasons are mostly nonsense: they tend to cancel each other out, and I’d be amazed if there was any real skill involved in hitting in one inning that diminished or grew markedly in any other inning. Innings 1-3, 4-6, and 7-9 are a prime example of what I consider to be splits of a random nature, and the numbers seem to bear me out: I’ve added to the OPSes for each batter a notation if that figure is his high or low OPS (abbreviated H or L, with no notation if it’s the median figure), and you can see that these are distributed pretty randomly. (I’ve listed the actual OPSes rather than the ranges in the innings categories on the chart at the end of this article, but the high difference was Jeter’s, only an .089 range difference between his innings 4-6 and 7-9, and the low was Murray’s, a difference of .018 between his innings 4-6 and 7-9.)

Now, maybe there is some pattern going on here that I’m unable to see, but it seems to me that if there is no pattern to anyone hitting in a group of innings, then why the hell are we so carefully breaking this stuff down by inning, other than curiosity?
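If you’d rather count the H’s and L’s than squint at them, this sketch tallies them from the innings OPSes in the chart at the end of this article (Cobb is left out because his innings data is missing). The variable names are mine:

```python
from collections import Counter

# OPS in innings 1-3, 4-6 and 7-9, copied from the chart at the end of the article.
innings = {
    "Aaron": (0.895, 0.924, 0.962), "Biggio": (0.823, 0.803, 0.750),
    "Bonds": (1.081, 1.072, 0.992), "Henderson": (0.817, 0.831, 0.812),
    "Jeter": (0.831, 0.850, 0.761), "Mays": (0.942, 0.915, 0.963),
    "Murray": (0.832, 0.846, 0.828), "Musial": (0.957, 0.969, 0.976),
    "Ripken": (0.760, 0.799, 0.811), "Rose": (0.755, 0.803, 0.805),
    "Yaz": (0.868, 0.835, 0.822),
}
GROUPS = ("1-3", "4-6", "7-9")

highs, lows, ranges = Counter(), Counter(), {}
for name, ops in innings.items():
    highs[GROUPS[ops.index(max(ops))]] += 1   # which grouping holds his H
    lows[GROUPS[ops.index(min(ops))]] += 1    # which grouping holds his L
    ranges[name] = round(max(ops) - min(ops), 3)

print("highs by grouping:", dict(highs))
print("lows by grouping: ", dict(lows))
print("Jeter:", ranges["Jeter"], " Murray:", ranges["Murray"])
```

By this count the chart’s figures also put Bonds’s spread (innings 1-3 against 7-9) level with Jeter’s, so treat "the high difference" as a tie if you trust the rounding.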

This is my main point here, I suppose: all this data exists, and I’m grateful to bbref for providing it, even if we can’t draw any solid conclusions of significance about this data, even in the largest sample sizes available. We can conclude that there is a degree of instability at this sample size level (which is about the largest we’re likely to see at the individual-player level) of up to .100 or so OPS points that is mostly random.  So in smaller samples, like in a given month of a given season for an individual player, you would be misled by any remarks that he has shown a significant fluctuation in his stats, even fluctuations much greater than the minimal normal range for a great player over the course of his career, which is at least .100 OPS points.
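To get a feel for how much pure chance can do, here is a toy simulation, offered as a sketch and not as evidence. It invents a hitter whose true ability never changes (the outcome odds below are hypothetical, and OBP is simplified to ignore hit-by-pitches and sacrifice flies), then compares the spread of his OPS across six 100-PA "months" with the spread across six 2,000-PA "career months."

```python
import random

# Hypothetical per-plate-appearance odds for one hitter who never changes.
OUTCOMES = ["out", "walk", "single", "double", "triple", "homer"]
WEIGHTS = [0.645, 0.100, 0.160, 0.050, 0.005, 0.040]
BASES = {"single": 1, "double": 2, "triple": 3, "homer": 4}


def simulate_ops(pa, rng):
    """Simulate `pa` plate appearances and return a simplified OPS."""
    results = rng.choices(OUTCOMES, weights=WEIGHTS, k=pa)
    walks = results.count("walk")
    hits = sum(r in BASES for r in results)
    total_bases = sum(BASES.get(r, 0) for r in results)
    at_bats = pa - walks
    obp = (hits + walks) / pa                       # ignores HBP and SF
    slg = total_bases / at_bats if at_bats else 0.0
    return obp + slg


def spread(pa, samples, rng):
    """Best-minus-worst OPS across `samples` independent stretches of `pa` PAs."""
    values = [simulate_ops(pa, rng) for _ in range(samples)]
    return max(values) - min(values)


rng = random.Random(2016)  # fixed seed so the run is repeatable
one_month = spread(pa=100, samples=6, rng=rng)      # six months of one season
career_month = spread(pa=2000, samples=6, rng=rng)  # six career-long monthly totals

print(f"single-season months: {one_month:.3f}   career months: {career_month:.3f}")
```

Nothing about this hitter changes from one stretch to the next, so whatever spread the simulation shows is noise by construction, and the smaller the sample, the more of it there is.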

This research all stems from my "Families of Hitters" project, where I’m still struggling to distinguish where random differences between players end and significant distinctions begin. A preliminary conclusion I have reached, which may not surprise those of you more statistically sophisticated than I am, is that a .100 OPS point difference between two players is—well, it may be nothing. If I find two players separated by .100 OPS points, that isn’t even going to tell me for sure which one is the greater player. That point exists—I’m not going to try to tell you that Junior Ortiz is comparable to David Ortiz—but I can only identify a wide range (say, from .100 to .300 OPS points) where two essentially dissimilar players may have some valid points of comparison. Much beyond .300 OPS points and we’re talking about apples and oranges, at least in the sample sizes that crop up at individual pitcher/batter matchups, which max out at around 250 Plate Appearances.

Meantime, here’s that chart of the various splits I’ve researched for the top 12 in plate appearances (listed alphabetically). The first three columns are the innings, the next two measure the largest difference in monthly OPSes (for May-September and for April-September), next is the biggest OPS difference when facing Power and Finesse (and Average) pitchers, and finally each player’s platoon difference in OPS.

 

 

Player      1-3       4-6       7-9       5 months  6 months  p/f/a*    platoon
Aaron       .895 L    .924      .962 H    .030      .040      .040      .111
Biggio      .823 H    .803      .750 L    --        .106      .121      .028
Bonds       1.081 H   1.072     .992 L    --        .085      .071      .098
Cobb        --        --        --        .067      .118      --        .060
Henderson   .817      .831 H    .812 L    --        .079      .012*     .074
Jeter       .831      .850 H    .761 L    --        .048      .073      .102
Mays        .942      .915 L    .963 H    --        .112      .075*     .089
Murray      .832      .846 H    .828 L    .112      .129      .054      .075
Musial      .957 L    .969      .976 H    --        .062      .009      .077
Ripken      .760 L    .799      .811 H    --        .071      .055*     .048
Rose        .755 L    .803      .805 H    --        .051      .024      .069
Yaz         .868 H    .835      .822 L    --        .115      .029*     .199

*Signifies the "average" P/F pitcher was on the extreme

 

 
 

COMMENTS (4 Comments, most recent shown first)

Brock Hanke
For all I know, you've already done this, but Musial presents a rare opportunity. Despite playing his entire career in the same ballpark, which had no dimension changes of note during Musial's career, Musial was a very different type of hitter up through 1947 than he was from 1948 on. In particular, his homer rate after 1947 is much higher than it was before. If you ran the two "Musials" through the p/f/a algorithm, Musial's numbers might change a lot between his two "eras." Or not. In any case, it might be useful to see if there is a TYPE of hitter who has unusual p/f/a's.
7:05 AM Sep 15th
 
MarisFan61
Yes -- "June Swoon," a blast from the past (pardon the cliche) that I haven't thought of in ~50 years, but it rings a definite bell.

I also seem to remember that they finished 2nd every year. :-)
That was the easiest prediction to make -- SF Giants, 2nd place. We just didn't know who'd be 1st.
10:18 PM Sep 8th
 
Steven Goldleaf
Thanks again, George. I certainly didn't mean to argue that all the splits are random, or all equally random. I was hoping to establish a baseline of randomness, by finding stats (like monthly OPSes) that have no basis for fluctuating over the course of long careers of consistent players and seeing how much they jump around. Things that we can find, or at least argue, have an explanation would then jump around MORE, in addition to the random baseline, but there will always be (I argue) that minimal amount of jumping around due to the inherent instability in the data itself. Does that make any sense to you? It makes a little sense to me.
8:06 PM Sep 7th
 
grising
Good article, Steven! There's a wealth of information in the Baseball-Reference splits, so I'm glad you're examining it and publishing on it. Despite your well-argued article, I do think that there's some real information to be gleaned; it's not all randomness (although that does play a factor, of course). For example, I think the home/away split is pretty crucial for some players. I think we've overvalued or undervalued many players because we don't take into precise account how much their home park helped or hurt them.

I agree about the first and second half weirdness on Baseball-Reference: Why not just do the split at 81 games (or 77 for pre-1962), rather than the All-Star break? By the way, one reason that first half stats might be more (i.e., more games and PAs or innings pitched) is injuries. I'm sure that players often play in the first half and then are injured, missing all or most of the second half. Players injured in the second half of the season have time to heal over the off-season; thus, injuries in the second half probably don't affect games in the first of the next year much.
5:28 PM Sep 7th
 
 
©2024 Be Jolly, Inc. All Rights Reserved.