Bill has been rightly pointing out the utter idiocy of relying on, or even mentioning, the results of tiny pitcher-batter matchups during TV broadcasts. Sometimes I wonder, though, if the inverse proposition might have some validity. The only value I can see in mentioning the results of tiny sample-size matchups is that they may rule out the opposite conclusion.
Suppose Abe Atter has gone 3-for-5 with a homer and a double vs. A. P. Itcher. Bill might say "Means absolutely nothing," and he’s right: Atter could then go 0-for-5 against Itcher, which would be easily explained by a mere reversion to the norm, and it shows nothing about how well Atter hits Itcher.
But might we draw from the initial 3-for-5 the implication that Itcher certainly doesn’t dominate Atter?
Unless we’re willing to entertain the possibility that Atter is the single luckiest SOB on Green’s God-Earth, that these are the only three hits he will ever get against Itcher if he bats against him until the Thursday before Doomsday, we must concede that until about that Tuesday, we’re certainly going to let Atter take his licks against Itcher.
Without a willingness to attribute these three hits to blind luck prevailing over all rational thought, we must concede that Atter seems to see Itcher’s pitches well, that he gets the bat around on him ok, and that he must have a fair amount of confidence when he steps in the box facing Itcher. Of course, given the small sample size, we keep an open mind as to the nature of Atter’s three hits, which might be due to luck if they are a swinging bunt, a Texas League blooper, and a grounder to the hole that the shortstop had trouble digging out of his mitt, but in my example the homer and the double denote something of some significance.
Not a lot, but maybe some little thing? Maybe something as small as "Itcher hasn’t figured out yet what location he should try to throw to against Atter" or maybe it’s just that Itcher has figured that out but needs to execute better. Will he figure it out? Will he execute better? Probably, but those first five at-bats indicate, at least, that he hasn’t done so yet.
Five at-bats is just too small a sample size to have meaning, but that will not restrain the Joe Bucks and the Ron Darlings from drawing their conclusions, and then changing those conclusions out loud every few at-bats from that point on. If I understand Bill’s thesis correctly (always a dangerous premise—I’ve gotten about one "That’s right" from him for every five "No, no, no, you imbecile!"s), the prevailing, or even overwhelming, influence on someone’s ability to hit is simply that batter’s inherent ability, and not the batter’s hotness, or the pitcher’s coldness, or really any other factor. Everything else is just the luck of the draw. Someone who hits at, say, an .800 OPS clip, in other words, is going to hit .800 in July, at home, with runners on 2B, in almost any conceivable situation, given enough at-bats.
Of course, that .800 clip is subject to certain predictable, rational adjustments: if his home park is conducive to hitting, it would go up in his home park. If he’s batting against lefties as a lefthanded batter, it goes down by the typical platoon disadvantage. If it’s July, his OPS goes up at the rate that batting universally goes up in the warmer weather. And so on. We err when we start attributing any sort of irrational hocus-pocus to variations in someone’s performance, such as "clutch ability," "big-game performance" (which always makes me think he does well facing lions, water buffaloes, and elephants), or some special talent facing particular pitchers or teams.
Over the course of a career, I’d also suppose that an .800 OPS guy is more like an .850 OPS guy near his peak, and maybe .775 OPS as he gets started and as he winds down his career.
But most of the fluctuations, apart from these predictable influences, are random, simply due to small sample size above all else.
And sample size is always going to remain too small. Put another way, by the time that one player’s sample size of at-bats approaches "large enough to take seriously," that player has almost certainly changed enough to make his previous body of at-bats meaningless. It’s only when we look at many players’ at-bats (a team’s, or a league’s, or all batters in MLB) that we can confidently identify meaningful trends.
Suppose you’re looking at a batter who has a reputation for being a clutch hitter (let’s call him Jerek Deter) and let’s say he hits very well in certain situations. Those situations could easily be explained by dumb luck when the sample sizes are tiny: if Deter hits safely three times in his first five post-season at-bats, we can safely dismiss that performance as due to the small sample size, not any real clutch ability, and even when he’s hit safely twenty times in his first fifty post-season at-bats, we know that a .400 average in only fifty at-bats isn’t really a big enough sample to register a meaningful difference from his non-clutch BA of over .300. By the time Deter has enough at-bats to begin to approach a meaningful figure, however, and we start having to wonder if his post-season hitting is more than a statistical fluke (and post-season at-bats really don’t get near that point for anyone), Deter is in his late thirties, and his game, the game that got him to that point, has changed, often radically. Maybe he has focused on greater plate discipline rather than sheer muscle. Maybe he has become more of a breaking-ball hitter to compensate for his slower reflexes on the fastball. Maybe he is now more reliant on his vast knowledge of pitchers’ strategies than on his diminishing bat-speed. The point is that he’s now a different Jerek Deter than the one who racked up those previous "clutch" at-bats.
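For anyone who wants to put a rough number on "dumb luck," here is a minimal sketch in Python. It assumes, purely for illustration, that every at-bat is an independent trial with a fixed .300 chance of a hit (the text says only "over .300," so the exact figure is my assumption), and `prob_at_least` is a made-up helper name, not something from a stats package.

```python
from math import comb


def prob_at_least(hits: int, at_bats: int, true_avg: float) -> float:
    """Chance of getting `hits` or more in `at_bats` tries, treating each
    at-bat as an independent trial with hit probability `true_avg`."""
    return sum(
        comb(at_bats, k) * true_avg**k * (1 - true_avg) ** (at_bats - k)
        for k in range(hits, at_bats + 1)
    )


# Assumed "true" ability: a flat .300 hitter (illustrative, not a real player).
TRUE_AVG = 0.300

p_3_of_5 = prob_at_least(3, 5, TRUE_AVG)      # the 3-for-5 start
p_20_of_50 = prob_at_least(20, 50, TRUE_AVG)  # the 20-for-50 (.400) stretch

print(f"3 or more hits in 5 AB:   {p_3_of_5:.3f}")
print(f"20 or more hits in 50 AB: {p_20_of_50:.3f}")
```

Under those assumptions a true .300 hitter goes 3-for-5 or better about one time in six, which is why nobody should blink at it. The fifty-at-bat figure comes out smaller, though you can judge for yourself whether it is small enough to rule out luck, and the real world (different pitchers, different parks, an aging Deter) is messier than this model allows.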
Individual pitcher-batter matchups don’t begin to approach even the small sample sizes of one batter’s post-season performances, of course. By the time Abe Atter has faced A. P. Itcher a number of times large enough to qualify as "non-comical," both of them will have aged, made adjustments to their game, suffered game-changing injuries, re-thought their goals, etc. to render the previous body of at-bats less than relevant.
Few players accumulate enough at-bats in the World Series to begin to approach significant sample sizes, though this doesn’t stop yammerers from yammering about itsy-bitsy teeny-weeny sample sizes. There are a few, mostly New York Yankees of the 1950s and 1990s-2000s, who get enough appearances to be acclaimed as World Series performers, but often there’s a funny thing about even such sample sizes: they sometimes break down in surprising ways.
We’ve been talking, over in Reader Posts, about one such World Series star, Whitey Ford, and his spectacular performance in Series play from 1950 through the early 1960s. Through game 4 of the 1962 Series, Ford had pitched 118 innings with a 10-4 W-L record, and a 1.98 ERA. Stellar, right? In the first inning of game four of the 1962 Series, he was his usual stellar, spectacular self, allowing 0 runs against the powerful Giants’ lineup, lowering that 1.98 ERA just a tad—
But from that point on, Whitey Ford turned into Whitey Fraud:
Starting in inning 2 of game four, he stunk out the joint. His World Series record, after coming out to the mound to start the second inning, was 0-4 in his final 27 IP, and his ERA from that point through his last World Series in 1964 was a horrendous 6.00.
Wha’ hoppen?
Maybe his career was headed downhill, and his Series play simply followed suit?
Plainly not: two of his most intimidating seasons came in 1963-64, the very period when his World Series performance was so abysmal. He went 41-13 in those seasons.
It’s obvious (BJOL caveat: obvious to me) that he just experienced a regression to the mean, as the sample size of his innings pitched increased. Whitey Ford, in other words, just played more like Whitey Ford, and since he was playing above Whitey Ford-levels up to that point, he pitched his final 27 World Series innings, quite understandably, below that level.
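To see what the two halves add up to, here is a quick back-of-the-envelope sketch in Python using only the figures quoted above: 118 innings at a 1.98 ERA, then 27 innings at 6.00. It treats both inning totals as whole innings, which is an assumption on my part, and the helper names are invented for the illustration.

```python
def earned_runs(era: float, innings: float) -> float:
    """Back out earned runs from an ERA and an innings total (ERA = 9 * ER / IP)."""
    return era * innings / 9


def era_from(earned: float, innings: float) -> float:
    """Earned-run average: earned runs allowed per nine innings."""
    return 9 * earned / innings


# Figures as quoted in the text; whole innings are assumed.
early_ip, early_era = 118, 1.98
late_ip, late_era = 27, 6.00

early_er = earned_runs(early_era, early_ip)  # roughly 26 earned runs
late_er = earned_runs(late_era, late_ip)     # 18 earned runs

combined_era = era_from(early_er + late_er, early_ip + late_ip)
print(f"Combined World Series ERA: {combined_era:.2f}")
```

The blend comes out at about 2.73 over 145 innings: less dazzling than 1.98, nowhere near as ugly as 6.00.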
It sometimes works the other way, of course. When I was a kid, I was vaguely aware of Hank Bauer being in the pantheon of World Series stars, having hit 7 HRs in Series play, which is a ton, considering that most players never get 7 World Series at-bats. What I didn’t know was that in Bauer’s first five Series, he was a big bum:
In his first five World Series, 1949-1953, Bauer had 80 at-bats but a puny 13 hits, 11 of them singles, none of them HRs. That’s a .163 BA, rounding up. Bauer did all his damage in his final four Series, when he batted 108 more times and got 33 hits, including 7 HRs and 33 RBI. That’s a .306 average, and the power numbers are impressive if we multiply the at-bats by six to yield a full 648-AB season: that’s 42 HRs and a Hack Wilson-breaking 198 RBI. Pretty fair hitting.
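The arithmetic is easy enough to check. A small Python sketch, using only the hit, at-bat, HR and RBI totals given above (the variable names are mine, and the 648-AB "full season" is the same scaling trick used in the text, not a real projection method):

```python
# Bauer's two World Series halves, as given in the text.
early_hits, early_ab = 13, 80    # 1949-1953
late_hits, late_ab = 33, 108     # final four Series
late_hr, late_rbi = 7, 33

early_avg = early_hits / early_ab
late_avg = late_hits / late_ab
combined_avg = (early_hits + late_hits) / (early_ab + late_ab)

# Scale the later half up to a 648-AB season (108 * 6 = 648).
scale = 648 / late_ab
projected_hr = late_hr * scale
projected_rbi = late_rbi * scale

print(f"Early BA:    {early_avg:.4f}")
print(f"Late BA:     {late_avg:.4f}")
print(f"Combined BA: {combined_avg:.4f}")
print(f"Per 648 AB:  {projected_hr:.0f} HR, {projected_rbi:.0f} RBI")
```

Put the two halves together and you get 46 hits in 188 at-bats, about .245, which looks like neither the bum nor the star.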
Did Bauer suddenly learn how to hit in the Series? Did Ford suddenly forget how to pitch? Which one is the real Hank Bauer? Which one is the real Whitey Ford? Neither, or none, of course. Even both halves taken together don’t give a truly representative sample of either man, but the larger samples are more meaningful, and also closer to both men’s lifetime performance numbers. If you judge young Whitey Ford by his first decade of Series play, or old Hank Bauer by his second half, you do him the same injustice as if you judged either man by his less-stellar performance. I've always felt that Ted Williams got a terrible, undeserved rap because of one unrepresentatively poor Series, and that Barry Bonds got a chance to redeem his reputation simply by getting to play in the post-season beyond his initial poor showings.
Occasionally, even in the relatively small sample sizes, we will see remarkable (seeming) consistency. Both halves of the real Derek Jeter’s post-season heroics are equally impressive: in Jeter’s first 79 post-season games, he had a .308 BA, .384 OBP, and .461 SLG, with 10 HRs. In his next (and last) 79 post-season games, the BA stuck at .307, the OBP fell to .364 and the SLG rose to .468. His OPSes were a mere .013 apart. With exactly 10 HRs. Hard to do that so precisely without some serious planning.
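For the record, the OPS comparison is just OBP plus SLG for each half. A tiny Python sketch using the figures quoted above (the dictionary layout is only my way of holding them):

```python
# Jeter's post-season slash lines by half, as quoted in the text.
first_79 = {"ba": 0.308, "obp": 0.384, "slg": 0.461}
last_79 = {"ba": 0.307, "obp": 0.364, "slg": 0.468}


def ops(line: dict) -> float:
    """On-base plus slugging."""
    return line["obp"] + line["slg"]


first_ops = ops(first_79)
last_ops = ops(last_79)
gap = abs(first_ops - last_ops)

print(f"First 79 games OPS: {first_ops:.3f}")
print(f"Last 79 games OPS:  {last_ops:.3f}")
print(f"Difference:         {gap:.3f}")
```

The OBP dropped twenty points and the SLG rose seven, so the two halves end up thirteen points of OPS apart.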
But even within that remarkable-seeming consistency lies an awful lot of practical inconsistency: take one sample of 32 consecutive post-season at-bats from Jeter. In the sixteen at-bats from October 10, 2005 (facing the Angels) through October 6, 2006 (facing the Tigers), Jeter got 10 hits, including 2 HRs and 4 doubles. Presumably hot as hot can be, in his next sixteen post-season at-bats against Detroit and Cleveland (in 2007), he got 2 singles. (And of course you understand that Jeter had been ice-cold just prior to the hot at-bats I cherry-picked for this example, and hot again after the "cold" 16 at-bats.) In this most consistent of post-season performers, sixteen at-bats is just way too insignificant to be of any predictive use, yet you will hear announcers intoning constantly about much smaller samples of at-bats of one batter’s history (often ancient history) against a particular pitcher in the postseason, as if there were any value in that data point.
We all know why announcers prefer repeating this fallacy to simply shutting up about it: as nature abhors a vacuum, TV abhors silence. Better to intone nonsense or gibberish than to deliver more than a few consecutive seconds of dead air. Imposing a narrative, to use Nora Ephron’s concise summary of what artists must by their nature do, is where announcers go wrong. They think they’re artists. But announcing a game is not an art: the "narrative" of a game does not exist, except in retrospect. While fiction-writers must identify meaningful threads running through the fabric of their storylines, and emphasize those threads so that their readers notice them, baseball announcers have close-to-zero meaningful threads to point out: who’s up, who’s pitching, what the count is, and every so often the score and who’s warming up in the bullpen. I almost never need to know the history, or lack thereof, of the batter/pitcher combo, the batter’s history against this team, or in his last four games, or with runners where they are right now, or almost any of the extraneous and useless information I’m bombarded with on a regular basis.
A player who’s doing well can start doing much better at any point in the game, or suddenly do much worse. TV announcers have no more insight into the future than anyone else has—all they do have that the rest of us lack is tons and tons of data, most of it sheer garbage: batter-pitcher matchups, how a team is hitting recently with two men out, or with men on base, or relievers facing their first batter—all this data, and much much more, takes the form of a meaningful narrative because the TV announcers are desperate to seem knowledgeable about areas where no knowledge is available. I sit here, watching the Dodgers play the Red Sox, and all I think, over and over, is "Bullshit, bullshit, utterly without predictive significance, more meaningless bullshit, shut up, yet more bullshit, truism, total coincidence, still more bullshit." Maybe I should watch with the sound off?