Sunday’s Red Sox game was played under a threat of rain which, sometime in the 7th inning, became real rain, with real umbrellas and a three-hour rain delay (May 19, 2013). This was the last game of the series. The Red Sox were trying to get out of town and get on to their next destination, and the Red Sox’ announcers were talking about various ways to move the game along under those conditions. Jerry Remy told about some games when he was playing and batting leadoff, in which the umpires at the start of the game would tell the leadoff hitter for each team something like "We’ve got to catch a plane. You might look to be swinging at anything on the corners."
It struck me that I might be able to document that that’s a real effect. I have a spreadsheet which has some information about every game played in the majors between 1960 and 2011. . .Retrosheet data, obviously. The file doesn’t have complete play by play or complete records, but it has the box score line for each starting pitcher, the score of the game, and some other stuff. I organized the data to identify the last game of every home stand; not the last game of each series, but the last game of each home stand. After the last game of a home stand, we know the umpires are moving on. If there is an effect of the strike zone changing because the umpires are leaving town, that’s where it has to show up.
And it does. You probably know that studies like this are usually blind alleys. You chase down an effect that you think should be there, and. . .no evidence of an effect. You don’t know that the effect doesn’t exist; you just can’t prove that it does. That’s what happens 90% of the time. This is the other 10%. There clearly is such an effect.
In all games in that file, teams scored an average of 4.42 runs per game: 4.49 by the home team, and 4.34 by the road team. In final games of a home stand, the average dropped to 4.37: 4.43 by the home team, and 4.31 by the road team. We’re dealing with 22,000+ final games of home stands there and with data for two teams (and two starting pitchers) per game, so. . .a difference of that size is not likely to be random.
Looking at the performances of starting pitchers, the average number of innings pitched by starting pitchers actually dropped in the final games of home stands, from 18.51 outs recorded on average to 18.40. This occurs, I think, because teams often have an off day after the final game of a home stand. Managers are more aggressive using the bullpen when they have an off day coming up, for obvious reasons.
Starting pitchers in the final games of home stands had a more-than-proportional decrease in hits allowed, runs allowed, earned runs allowed and walks. Whereas innings pitched per start decreased only from 6.17 to 6.13, hits allowed by starting pitchers (in these starts) decreased from 6.19 to 6.12. Runs allowed per nine innings decreased from 4.52 to 4.47, and the ERA of starting pitchers under these conditions improved from 4.09 to 4.05.
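To make "more than proportional" concrete, here is a small sketch that turns the figures quoted above into percentage changes. The numbers are the ones in the text; the function and variable names are just illustrative, and the rounding in the published figures limits how much precision the results deserve.

```python
# Figures quoted in the article: (all starts, final games of home stands).
rates = {
    "innings per start": (6.17, 6.13),
    "hits per start": (6.19, 6.12),
    "runs per nine innings": (4.52, 4.47),
    "ERA": (4.09, 4.05),
}


def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


changes = {name: pct_change(b, a) for name, (b, a) in rates.items()}

for name, change in changes.items():
    print(f"{name:>22}: {change:+.2f}%")
```

Innings per start fall by well under one percent, while hits per start fall by more than one percent. Runs per nine innings and ERA are already per-inning rates, so any decline there is over and above the shorter outings.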
Of most relevance to our enquiry, walks per nine innings decreased under these conditions from 3.11 to 3.08, while strikeouts increased from 5.69 to 5.80. The large increase in strikeouts is the most striking thing about the study, no pun intended. It is much larger, as a percentage, than any other effect measured by the study, in part because there is a technical problem which is causing it to be over-measured.
We know that these effects exist, then, and we also know that these effects in real life are significantly larger than we have measured them as being. We know this, first, because in the data I stated before, I contrasted the overall rates of performance with the rates of performance in the final games of home stands. In other words, the final games of home stands were included in the overall group. About 20% of games were final games of home stands. If I had contrasted the performance in final games of home stands with other games, the measured effect would have been about 25% larger.
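The "about 25% larger" figure follows from simple arithmetic: if final games are 20% of the overall group, the overall average is 0.2 × (final games) + 0.8 × (other games), so the gap against the overall average is only 0.8 of the gap against other games, and 1 / 0.8 = 1.25. A minimal sketch, assuming every game carries equal weight in the average (the real per-nine-innings rates are weighted by innings, so this is an approximation); the function name is hypothetical:

```python
def undiluted_gap(overall, final, share_final):
    """Gap between final games and all OTHER games, recovered from the
    overall average (which includes the final games themselves)."""
    # overall = share_final * final + (1 - share_final) * others
    others = (overall - share_final * final) / (1.0 - share_final)
    return final - others


share = 0.20                    # about 20% of games are final games
overall_k9, final_k9 = 5.69, 5.80   # strikeouts per nine, from the article

measured = final_k9 - overall_k9
corrected = undiluted_gap(overall_k9, final_k9, share)
ratio = corrected / measured

print(f"measured gap vs. overall: {measured:.4f}")
print(f"gap vs. other games:      {corrected:.4f}")
print(f"ratio:                    {ratio:.2f}")
```

The ratio depends only on the share of final games, not on which statistic is plugged in.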
The real underlying effect would be larger than the measured effect, as well, for other reasons that are more difficult to evaluate. The underlying effect would be present in some games in which it was not measured; in other words, there would be some games other than final games of home stands in which the umpires would be looking to catch a plane after the game. Further, the underlying effect would not absolutely be present in all of the final games of home stands, because, in some cases, the umpires would be flying out the next day, rather than on the evening after the game. This causes a "mixing" of the data, and the mixing of the data causes the effects to be under-measured, in the same way that, if you combined at bats of Mickey Mantle with a certain unknown number of at bats by other hitters of unknown quality, that would understate Mickey Mantle’s ability as a hitter. But what we can certainly conclude is that the effect is real.
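The mixing argument can be sketched with a toy model. Suppose the "plane to catch" condition shifts a statistic by some fixed amount, and that it applies in a fraction `p_final` of final games and a fraction `p_other` of all other games. Then the gap we measure between the two groups is only (`p_final` − `p_other`) of the true shift. Both fractions are unknown; the values below are purely hypothetical, chosen only to show the direction of the bias, and the function names are my own.

```python
def surviving_share(p_flagged, p_unflagged):
    """Fraction of the true effect that shows up in the measured gap
    between flagged games (final games) and unflagged games."""
    # flagged mean   = base + p_flagged   * true_effect
    # unflagged mean = base + p_unflagged * true_effect
    return p_flagged - p_unflagged


def implied_true_effect(measured_gap, p_flagged, p_unflagged):
    """Scale a measured gap back up under the toy mixing model."""
    share = surviving_share(p_flagged, p_unflagged)
    if share <= 0:
        raise ValueError("flagged games must contain the effect more often")
    return measured_gap / share


# HYPOTHETICAL fractions, for illustration only (not from the data):
p_final = 0.8   # final games in which the umpires really fly out that night
p_other = 0.1   # other games in which they also happen to be flying out

illustrative_gap = 0.07   # an arbitrary measured gap, not a figure from the study
print(implied_true_effect(illustrative_gap, p_final, p_other))
```

With no mixing at all (`p_final` = 1, `p_other` = 0) the measured gap equals the true effect; any mixing in either direction shrinks it.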
OK, then there is the "surprise" part of the research. Remy joked about playing back in the stone age, before umpires could be disciplined for making outrageously bad calls or for just being lousy umpires, before umpires were subject to the scrutiny that they are today. I thought, then, that the effect might have been larger in the earlier part of the data than in the more recent seasons.
I divided the data, then, into two sections—what we could call the Yastrzemski years (1960 to 1981) and what we could call the computer age (1982 to 2011). My thesis was that the effects might have been larger in the Yastrzemski era than in modern baseball.
Here’s the surprise. The exact opposite was true. In fact, all of the observed effect—all of it—was from the computer age. In the years 1960 to 1981, there is no Final Game effect which is observable in the data.
Be careful to note what I am saying there. I am not saying that the Final Game effect did not exist or did not operate in the earlier era. What I am saying is that we cannot observe it in this data. In fact, in the years 1960 to 1981, while the "bullpen usage effect" did operate in these games (that is, starting pitchers came out of the game sooner), the hits allowed, runs allowed, and ERA of the starting pitchers in final games were actually higher (worse) than in other games. In that era, the ERA of starting pitchers in all games was 3.69; in final games of home stands, it was 3.73. Walks per nine innings did decrease slightly in that era, from 3.04 to 3.01, but so did strikeouts, from 5.24 to 5.23.
What explains this?
What must explain the data is that there is some other effect in final games of home stands in that era which is masking the Final Game Effect in the data. What is the other effect?
I would guess that it is double headers. In that era, double headers were much more common than they are now. Scheduled double headers were more common, and rainout/makeup games were more common as well, because rainouts were more common: the fields weren’t as good, the grounds crews weren’t as good, and they couldn’t handle as much rain. In the final game of a home stand, in that era, there was very often a double header. . .probably 30 to 40% of the time.
The pitchers who started the second games of double headers were, on average and in general, somewhat weaker than other starting pitchers. A lot of times a guy who was normally a long reliever would start the second game of a double header. In the data that I have here I can track the quality of starting pitchers, and the data does show this to be true, although it shows it to be a much smaller effect than I would have guessed. The larger effect appears to be that second games of double headers are just different. Since the time required to play a double header is difficult to predict, you probably don’t schedule an umpire’s flight to leave town an hour after the second game is supposed to be over. I would guess that when a double header was to be played on that day, the normal practice (in that era) was to schedule the umpires to fly out the next morning, rather than on the evening after the double header. That’s my theory; that’s the best I can do to explain why the Final Game effect is not seen in the data from that era.
But those of you who are sophisticated in understanding data will immediately spot the other implications of this. If the effect is not observable in some of the data, that must mean that the effect is larger and stronger where it can be observed.
Bingo. In the computer age (1982-2011), the ERA of starting pitchers in Final Games of home stands improves from 4.32 to 4.18. Strikeouts increase by more; walks decrease by more. That’s not all a "Final Game" effect, actually. In the computer age, the quality of pitchers pitching in the final game of a home stand actually is a little better than the overall quality of pitchers.
I understand why that happens. . .it’s a little difficult to explain, and probably requires more time than it justifies. It has to do with keeping your best pitchers on their slot. In modern baseball there are very few double headers, so that effect essentially disappears. In modern baseball the usage patterns of pitchers are much more regular than they were years ago. In modern baseball everybody uses a five-man rotation—but that doesn’t mean that the pitcher pitches every fifth day. A pitcher starting on a five-day rotation will make about 50% of his starts on four days’ rest, and about 50% on longer rest, because of off days in the schedule.
It will happen sometimes that your best starting pitcher would be on five days’ rest anyway, because of an off day or a rain out, but, with an off day coming up after a home stand, he would be working on six days’ rest. In that situation, a manager will often jump his #1 pitcher up in the rotation so that he works on four days’ rest, rather than six. It’s just a small effect, but it is enough to cause the overall quality of pitchers working in the Final Games of home stands to be slightly better than average, which causes the Final Game effect to be slightly over-measured in the "computer age" part of this study.
Anyway, what we know now is:
1) The "Final Game Effect" postulated by Jerry Remy during that broadcast is certainly real and measurable,
2) The effect is obviously larger than we have measured it because of unavoidable "mixing" of the data, and
3) The effect is also being masked, in part of the data, by other and irrelevant effects.
Our study showed that strikeouts per nine innings by starting pitchers in the Final Game of a home stand increase, walks decrease and runs scored decrease. I think we can safely conclude, because of the "mixing" and "masking" effects that we know are there, that the real underlying effects are probably twice as large as our estimates of them. The umpiring is different when the umpire has to catch a plane after the game, and it does change the game.
Technical Point
Don’t Read This Unless You Like to Be Bored
For technical reasons, the increase in strikeouts in the Final Games of home stands, which shows as an increase from 5.69 to 5.80 per nine innings, is being slightly over-measured. The reason for this is as follows.
Over time, home stands have gotten shorter (and road trips have gotten shorter). This means that the percentage of games which are final games of home stands has increased. Also, over time, strikeouts have steadily increased. That creates a "common bias" which unites strikeouts and final games of home stands, and this causes the increase in strikeouts in final games of home stands to be measured as being slightly larger than it actually is.
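A toy example shows how this "common bias" works. In the sketch below, every number is hypothetical: two eras with equal numbers of games, no Final Game effect at all within either era, but a higher strikeout rate and a higher share of final games in the later era. Pooling the eras still produces an apparent increase in strikeouts in final games. It assumes equal innings per game, which the real data would not satisfy exactly.

```python
# HYPOTHETICAL eras, for illustration only:
# (games, share that are final games, K/9 in final games, K/9 in other games)
eras = {
    "early": (1000, 0.15, 5.0, 5.0),   # no final-game effect within the era
    "late": (1000, 0.25, 6.5, 6.5),    # no final-game effect within the era
}


def pooled_rates(eras):
    """Return (K/9 in final games, K/9 in all games), pooled across eras,
    weighting each game equally."""
    final_games = final_k = all_games = all_k = 0.0
    for games, share, k_final, k_other in eras.values():
        n_final = games * share
        n_other = games - n_final
        final_games += n_final
        final_k += n_final * k_final
        all_games += games
        all_k += n_final * k_final + n_other * k_other
    return final_k / final_games, all_k / all_games


final_rate, overall_rate = pooled_rates(eras)
print(f"pooled K/9, final games: {final_rate:.4f}")
print(f"pooled K/9, all games:   {overall_rate:.4f}")
print(f"apparent 'effect':       {final_rate - overall_rate:+.4f}")
```

The later, higher-strikeout era contributes more than its share of final games, so the pooled final-game rate is pulled upward even though nothing changes within either era. Comparing final games to other games era by era (or season by season) is one way to avoid this.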
Tag Along Study
While I was studying this, I thought I would also study first games of home stands (and second games, and third games, etc.) to see what effects might be observed there.
That part of the study doesn’t show much. The average quality of a starting pitcher in the first game of a home stand is better than in undescribed games—both by the home team and the road team, but particularly by the "home" team. You can probably figure out why that happens; it has to do with off days before the home stand starts. Anyway, for that reason, the winning percentage of the home team in the first game of a home stand is .545, whereas in the second game it is .540, in the third game .536, in the fourth game .534. After that it tends to go back up slightly, in part because after that, the #1 starter is starting again.
Other than that, and the point that home stands have gotten significantly shorter over the years, that part of the study really doesn’t show very much.