2017-23
A Day at the Beach
I had a couple of questions in the "Hey, Bill" file that I didn’t know the answers to:
Do you know if there has been research done on how well teams hit the game following an off day?
Cap0088
Do you know, is there a jet lag effect? Do teams play worse in the first games of away series than they do the rest of the series?
taosjohn
These are kind of basic questions about baseball. . .underlying stuff that we probably should know, in our biz, so I decided to research the general question. I don’t know that I actually reached an answer to either of these questions, but I took a step in that direction.
In order to study these issues, we need first of all a method to determine how often a team should be EXPECTED to win a given game. Unless you know how many games a team should be expected to win (out of a group of games) it is difficult to say whether they have exceeded expectations or have not.
On the simplest level we could just assume that every team has a 50-50 chance of winning each game. That would get the right number of expected wins for each season or on a very large-scale level, but would get less and less accurate as the groups of games become smaller and more specific.
On the second level, we could just assume that a team’s chance of winning each game is their won-lost record for the season. If the team was 90-72 on the season, their chance of winning each game would be 55.6%. That’s a step in the right direction, but it is still inaccurate in smaller groups of games, such as all of the games AGAINST any team, unless it’s a .500 team. What if two teams which both have 90-72 records on the season meet? Do they EACH have a 55.6% chance of winning the game?
On the third level, we can find each team’s chance of winning the game by cross-multiplying their wins and losses. If one team was 90-72 and the other was 84-78, then the first team’s chance of winning the game is (90 * 78) / [(90 * 78) + (72 * 84)], and the second team’s chance of winning the game is (84 * 72) / [(90 * 78) + (84 * 72)]. That’s "known" sabermetrics, something I discovered in the 1970s, and it is an accurate method that solves the last problem and gets us a step closer to the estimates we want, but there are still issues.
One of those issues is home and road. Let’s take a 90-72 team against a 72-90 team. The 90-72 team will win 61% of the games between these two teams (.610), we know, but are they going to win 61% of the games in their home park, and 61% on the road? Obviously not. The next step is to make estimates that work for home games and road games. This is the fourth level.
This is new sabermetrics here, and this is by far the most significant part of this article. In the next few paragraphs I will explain how to establish the expected winning percentage for each team, given their won-lost records for the season and which is the home team. I have developed methods to attempt to do this before, but this is a much better method, backed by better research. This new method will be or could be useful for doing hundreds of different studies, and I will probably do some of those studies over the next few months, and publish them here. I have basically been stuck on this problem since the 1970s, but I think I am past it now.
But first let me say. . .this method does NOT adjust for who the starting pitchers are. That might be the fifth-level estimate, to adjust for who the starting pitchers are. That’s more complicated math, and I’m not ready to do that yet, but we have to take THIS step before we move on to THAT step. One thing at a time. Or maybe the fifth level would be to adjust this method for the team-specific multi-year park effects. It’s probably different if the home team is Colorado than it is if the home team is the Phillies, so you could adjust for that; maybe that’s the fifth-level estimate. I don’t know; I’m not there yet.
So the expected winning percentage of team A in a particular game, if A is the season’s winning percentage of team A and B is the season’s winning percentage of team B, is
(A * (1 - B)) / ((A * (1 - B)) + (B * (1 - A)))
That’s the cross-multiplication formula that I mentioned before, which yields an expected winning percentage of .610 (.609 756) for Team A when a 90-72 team plays a 72-90 team, and an expected winning percentage for Team B of .390.
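As a quick check on the arithmetic, here is a minimal Python sketch of the cross-multiplication formula. The function name `cross_multiply` is mine, chosen for illustration; the formula itself is the one given above.

```python
def cross_multiply(a: float, b: float) -> float:
    """Chance that a team with season winning percentage a beats a team
    with season winning percentage b (home/road not yet considered)."""
    return (a * (1 - b)) / ((a * (1 - b)) + (b * (1 - a)))


# The test case from the text: a 90-72 team against a 72-90 team.
team_a = 90 / 162
team_b = 72 / 162

est1_a = cross_multiply(team_a, team_b)
est1_b = cross_multiply(team_b, team_a)

print(f"Team A: {est1_a:.6f}")  # 0.609756
print(f"Team B: {est1_b:.6f}")  # 0.390244
```

The two estimates always sum to 1, which is what lets the later steps work from either team's side of the matchup.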
Let’s say that that is Estimate 1 (Est1) for this process. Estimate 2 can be any of four things.
If Team A is the home team and Est1 is .500 or over, then you find Estimate 2 by taking the estimate for Team B, multiplying that by .89, and subtracting the result from 1.000. In this case, that would be .653 (.652 683). Team A, the better team AND at home, has an expected winning percentage of .653.
If Team A is the home team but Estimate 1 is .500 or less, then you find Estimate 2 by simply multiplying the expected winning percentage of Team A by 1.11. This does not apply in our test case (90-72 vs. 72-90). If Estimate 1 is exactly .500, then these two methods are exactly the same, and it doesn’t matter which one you use.
If Team A is the ROAD team and Est1 is .500 or over, then you find Estimate 2 by taking the estimate for Team B, multiplying that by 1.11, and subtracting the result from 1.000. In our test case, if Team A is on the road, then their expected winning percentage is .567 (.566 829).
If Team A is the road team and Est1 is .500 or less, then you find Estimate 2 by simply multiplying estimate one by .89.
OK, so there are four possible scenarios or situations, and, by choosing the appropriate method from the four, we find the expected winning percentage (Est2) for the team in any of the four conditions. Basically, we’re just modifying the SMALLER estimate from Step 1 by 11% up or down depending on the home field advantage. But we’re not quite done. There’s an Estimate 3 (Est3), but Estimate 3 is the final step of the process, so that’s our final estimate.
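The four scenarios can be sketched in a few lines of Python. This is an illustration of the rules as stated above, not production code; the function name `estimate2` and its parameters are my own labels.

```python
def estimate2(est1: float, is_home: bool) -> float:
    """Adjust a cross-multiplication estimate (est1) for home or road.

    The SMALLER of the two teams' estimates is scaled by 1.11 if that
    team is at home, or by .89 if that team is on the road.
    """
    if est1 >= 0.5:
        # Our team is the favorite, so the opponent holds the smaller
        # estimate. If we are home, the opponent is on the road (.89).
        opponent_factor = 0.89 if is_home else 1.11
        return 1.0 - opponent_factor * (1.0 - est1)
    # Our team holds the smaller estimate; scale it directly.
    own_factor = 1.11 if is_home else 0.89
    return own_factor * est1


est1 = 25 / 41  # .609756, the 90-72 team against the 72-90 team

print(f"favorite at home:     {estimate2(est1, True):.6f}")   # 0.652683
print(f"favorite on the road: {estimate2(est1, False):.6f}")  # 0.566829
```

Whichever side you calculate from, the home team's Estimate 2 and the road team's Estimate 2 for the same game add up to 1.000.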
For estimate 3, we take estimate 2, multiply it by .94, and add .03. If Estimate 2 is in normal ranges or central ranges (.400 to .600), then this adjustment makes almost no difference, and you’d have to study ten million games to know whether it was helpful or not. If the result is more extreme . . . well, many of you have probably encountered this problem from research of all kinds. When the expected result gets close to 1.000 or to .000, the actual result tends to kick back toward the center a little bit, like a swimmer kicking off the edge of a swimming pool.
If a team has a 50% chance of winning a game, this last adjustment doesn’t change the estimate at all; it stays at .500. If a team has a .450 or .550 chance of winning a game, this adjustment changes that to .453 or .547. If a team has a .400 or .600 chance of winning a game, this adjustment changes it to .406 or .594. .700, it changes to .688; .800, it changes to .782. It does that because real-life data tells us that we have to. It’s the "There’s always a chance" effect. There is always a chance that a greatly outmanned fighter will land a punch, and win a fight he ought to lose. There is always a chance that a rookie called up despite a 5.17 ERA for Sheboygan will pitch a 3-hit shutout against the Cubbies.
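The final adjustment is a one-line calculation. A small sketch (the name `estimate3` is just a label for this illustration) reproduces the figures quoted above:

```python
def estimate3(est2: float) -> float:
    """Pull Estimate 2 slightly toward .500: multiply by .94, add .03."""
    return 0.94 * est2 + 0.03


for est2 in (0.500, 0.450, 0.550, 0.400, 0.600, 0.700, 0.800):
    print(f"{est2:.3f} -> {estimate3(est2):.3f}")
```

One consequence of the formula is that no estimate can fall below .030 or rise above .970, however lopsided the matchup.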
If you studied the batting average of the very worst hitters against the very best pitchers—Charlie O’Brien against Randy Johnson—you might calculate that Charlie O’Brien should hit .160 against Randy Johnson, or something like that. But he wouldn’t actually hit .160; he’d hit .170 or something. I haven’t even studied that, but I know that that’s true, because that’s just the way data works. Extreme combinations of loading factors never produce results AS extreme as the math shows that they should be. The math may work perfectly in the center of the chart, but when you get near the edges of the chart it crinkles toward the center. Random events are gravity; they pull things toward the center.
Anyway. . . I did study THIS (I studied wins and losses based on season won-lost records and home/road for 208,160 games), and I know that that adjustment makes the method work better. We overstate the home-field advantage slightly in forming Est2, and then Est3 pulls it back toward the center. Let’s look at some real-life cases:
September 9, 10, and 11, 2003, Tigers (43-119) against the Yankees (101-61) in New York. Detroit has an expected winning percentage of .180 in each game. In fact, in a three-game series, they did not win a game.
August 7, 8, and 9, 2009, Reds (78-84) against Giants (88-74) in San Francisco. The Reds have a .397 expected winning percentage for each game. In fact, the Reds won two out of three.
Dodgers against the Giants in Dodger Stadium, 1987. The Dodgers were 73-89 that season, the Giants 90-72. However, in Dodger Stadium the Dodgers had an expected winning percentage of .443. In fact, the Dodgers were 3-3 in the six games played between the two at Dodger Stadium.
Cubs against the Mets in Wrigley Field, 1969. Memorable matchup. The Cubs were 92-70 that season, the Mets 100-62. However, in Wrigley Field the games between the two were almost a toss-up, the Cubs having an expected winning percentage of .498 48, almost .499. In fact, the two played nine games in Wrigley that season, and the Cubs were 4-5.
Twins against the Rangers in Texas, 2006. The Twins were 96-66 in 2006, winning their division; the Rangers were 80-82. In Texas, the Twins had an expected winning percentage of .551. In fact, the Twins were 3-3 in the six games they played in Arlington that summer.
Astros against the Expos in Montreal, 1976. The Astros were 80-82 that year, but the Expos were 55-107. An average team would beat the Expos that year almost two times in three. The Astros were almost exactly an average team. Even in Montreal, their expected winning percentage against Montreal was .610. In fact, the Astros were 5-1 in six games in Montreal that summer.
Yankees against the Marlins in Yankee Stadium, June 5, 6, and 7, 1998. The Yankees were 114-48. The Marlins, although the defending World Champions, were 54-108. The Yankees (at home) had an expected winning percentage of .825 against the Marlins. This is the worst mismatch in my study, the highest expected winning percentage for one team (and the lowest for the other team) of any games played between 1965 and 2013. In fact, the Yankees swept the three-game series.
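For anyone who wants to check these cases, the three steps can be chained into one function. The sketch below is my own packaging of the method (the name `expected_win_pct` and the `CASES` list are illustrative); the records, home/road flags and quoted percentages are the ones given in the cases above.

```python
def expected_win_pct(wins, losses, opp_wins, opp_losses, is_home):
    """Final estimate (Est3) from both season records and home/road."""
    a = wins / (wins + losses)
    b = opp_wins / (opp_wins + opp_losses)
    # Est1: cross-multiplication of the two winning percentages.
    est1 = a * (1 - b) / (a * (1 - b) + b * (1 - a))
    # Est2: scale the smaller of the two estimates for home field.
    if est1 >= 0.5:
        est2 = 1 - (0.89 if is_home else 1.11) * (1 - est1)
    else:
        est2 = (1.11 if is_home else 0.89) * est1
    # Est3: pull the result slightly toward the center.
    return 0.94 * est2 + 0.03


# (label, W, L, opponent W, opponent L, is_home, figure quoted in the text)
CASES = [
    ("2003 Tigers at Yankees", 43, 119, 101, 61, False, 0.180),
    ("2009 Reds at Giants", 78, 84, 88, 74, False, 0.397),
    ("1987 Dodgers, home vs. Giants", 73, 89, 90, 72, True, 0.443),
    ("1969 Cubs, home vs. Mets", 92, 70, 100, 62, True, 0.498),
    ("2006 Twins at Rangers", 96, 66, 80, 82, False, 0.551),
    ("1976 Astros at Expos", 80, 82, 55, 107, False, 0.610),
    ("1998 Yankees, home vs. Marlins", 114, 48, 54, 108, True, 0.825),
]

for label, w, l, ow, ol, home, quoted in CASES:
    est = expected_win_pct(w, l, ow, ol, home)
    print(f"{label:32s} computed {est:.3f}   quoted {quoted:.3f}")
```

Each computed figure should agree with the quoted one to three decimal places.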
OK, so that’s how the method works. Now let’s get to the basic question posed by the readers: do teams play better after a day off?
Short answer: teams which have a day off before a game have a very, very small advantage in the first and second games after the day off, which is almost certainly caused by being able to reset their pitching staff, rather than by any other factor.
First, let me acknowledge the limitations of the study. I didn’t factor in who the starting pitchers were, in estimating the win probability for each team. The data I used was Retrosheet data for the years 1965 to 2013, and it was not NEW Retrosheet data; some of it is a few years old. There are a few games from the years 1965 to 2013 which are missing, which could cause us to misidentify a game as following a day off, when in reality it was not. I don’t believe that is a real issue . . . well, if I did believe it was a real issue, as opposed to a theoretical one, I would have done the study some other way. There aren’t very many games missing, and a missing game would not reasonably cause an effect to show up; it just might mask a real effect, to some very tiny extent.
In my data, a rainout is the same as an off day. In reality, of course, a rainout ISN’T the same as an off day; you hang around the park three hours waiting to see if you are going to play, that isn’t a day off. Also, I didn’t distinguish between a day off in the middle of a home stand and a day "off" in which the team flew from Tampa Bay to Seattle, which is no day at the beach. Some things could look the same in the study which are not really the same; this is the universal truth about research. Also (and we will see the relevance of this in a moment) I did not consider whether the OTHER team had ALSO had a day off before the game. Very often, if our team had a day off before the game, the other team did, as well. I am guessing that this happens about 50% of the time. If it happens 50% of the time, it would cause a real effect to be understated by 50%, or perhaps more. So that’s an issue. You can cover all those issues when you redo the study; I’ll leave that up to you.
OK, with those limitations understood, here’s what I did. I counted how many GAMES the team had played without a day off. Suppose that a team is off on Thursday, plays on Friday, Saturday and then a doubleheader on Sunday. The game on Friday is marked "1", meaning that the team had a day off before this game, the game on Saturday is marked "2", and the games on Sunday are marked "3" and "4". So we’re counting the number of games a team has played since they had a day off, rather than the number of consecutive days on which the team has played.
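The counting rule can be sketched as follows. This is only an illustration: the function name and the sample dates are hypothetical, and treating a team's first game in the list as a "1" is my assumption, since the text does not say how season openers were handled.

```python
from datetime import date, timedelta


def games_since_day_off(game_dates):
    """Number each game by how many games the team has played since its
    last day off.

    game_dates must be in chronological order, one entry per game, so a
    doubleheader contributes the same date twice.
    """
    markers = []
    previous = None
    count = 0
    for game_date in game_dates:
        if previous is not None and game_date - previous <= timedelta(days=1):
            count += 1  # same day (doubleheader) or the very next day
        else:
            count = 1   # first game listed (assumed), or first game after a gap
        markers.append(count)
        previous = game_date
    return markers


# Hypothetical schedule: a game Wednesday, off Thursday, then Friday,
# Saturday, and a doubleheader on Sunday.
schedule = [
    date(2013, 6, 5),                    # Wednesday
    date(2013, 6, 7),                    # Friday, after the day off
    date(2013, 6, 8),                    # Saturday
    date(2013, 6, 9), date(2013, 6, 9),  # Sunday doubleheader
]
print(games_since_day_off(schedule))  # [1, 1, 2, 3, 4]
```

Note that a rainout simply leaves no entry in the list, so it looks like a day off, which matches the limitation acknowledged earlier.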
This is a chart of all the data; after the chart we’ll start explaining what it all means:
| G | Count | RPG | Runs | ORPG | Opp Runs | Ex Wins | Act Wins | Ratio |
|---|---|---|---|---|---|---|---|---|
| 1 | 27623 | 4.37 | 120672 | 4.33 | 119667 | 13802.7 | 13890.5 | 1.006 |
| 2 | 25941 | 4.39 | 113905 | 4.34 | 112648 | 12960.0 | 13109.0 | 1.011 |
| 3 | 23222 | 4.39 | 102013 | 4.39 | 101912 | 11601.3 | 11640.5 | 1.003 |
| 4 | 19347 | 4.41 | 85332 | 4.47 | 86569 | 9649.6 | 9554.0 | 0.990 |
| 5 | 17976 | 4.47 | 80420 | 4.56 | 81994 | 8958.5 | 8851.5 | 0.988 |
| 6 | 16911 | 4.40 | 74484 | 4.39 | 74201 | 8429.2 | 8493.0 | 1.008 |
| 7 | 11883 | 4.49 | 53354 | 4.45 | 52829 | 5945.4 | 5997.5 | 1.009 |
| 8 | 10531 | 4.43 | 46688 | 4.42 | 46557 | 5280.4 | 5287.0 | 1.001 |
| 9 | 9829 | 4.38 | 43081 | 4.45 | 43740 | 4931.4 | 4799.0 | 0.973 |
| 10 | 8082 | 4.55 | 36777 | 4.55 | 36772 | 4059.7 | 4046.5 | 0.997 |
| 11 | 6587 | 4.46 | 29395 | 4.45 | 29306 | 3300.2 | 3270.0 | 0.991 |
| 12 | 5945 | 4.45 | 26473 | 4.42 | 26279 | 2978.3 | 2991.5 | 1.004 |
| 13 | 5595 | 4.47 | 25005 | 4.48 | 25077 | 2805.5 | 2808.5 | 1.001 |
| 14 | 3828 | 4.45 | 17032 | 4.52 | 17299 | 1916.5 | 1891.5 | 0.987 |
| 15 | 3210 | 4.48 | 14385 | 4.47 | 14360 | 1607.9 | 1587.5 | 0.987 |
| 16 | 2899 | 4.42 | 12801 | 4.42 | 12816 | 1452.5 | 1460.0 | 1.005 |
| 17 | 2322 | 4.53 | 10517 | 4.43 | 10281 | 1165.2 | 1169.0 | 1.003 |
| 18 | 1621 | 4.37 | 7091 | 4.41 | 7142 | 808.9 | 800.5 | 0.990 |
| >18 | 4808 | 4.33 | 20817 | 4.32 | 20793 | 2426.9 | 2433.0 | 1.003 |
| Totals | 208160 | 4.42 | 920242 | 4.42 | 920242 | 104080.0 | 104080.0 | |

So there are 27,623 teams which had a day off before the game was played, whereas there are 25,941 which were playing their second game since they had had a day off. This is the full data for that:
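| G | Count |
|---|---|
| 1 | 27623 |
| 2 | 25941 |
| 3 | 23222 |
| 4 | 19347 |
| 5 | 17976 |
| 6 | 16911 |
| 7 | 11883 |
| 8 | 10531 |
| 9 | 9829 |
| 10 | 8082 |
| 11 | 6587 |
| 12 | 5945 |
| 13 | 5595 |
| 14 | 3828 |
| 15 | 3210 |
| 16 | 2899 |
| 17 | 2322 |
| 18 | 1621 |
| >18 | 4808 |
| Totals | 208160 |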

You notice the big drop-off after "six". That’s Monday, basically. A lot of teams play Tuesday through Sunday, then they have Monday off again. There is another big drop-off after "13"; you play one Monday, you get the next one off, very often.
The 1968 St. Louis Cardinals played 51 games without a day off, the most of any team in my study, 51 games in 49 days. The stretch ran from July 18 to September 4. I tweeted that out after I found it, but I failed to notice what their won-lost record was during that stretch. It was 30-21, a little bit below their pace for the season as a whole, but not notably.
But this actually understates it a little bit. The "day off" on July 17 was not an actual day off; it was a rainout in the middle of a three-game series. Before that, they had played 7 games in the previous 6 days. So actually, it’s a 58-game, 56-day stretch without a real day off, and the Cardinals were 36-22 during that stretch.
In the study I counted a tie game as half a win; I didn’t know what else to do with them. That explains why the Actual Wins (Act Wins) in the chart often end in .5, rather than being whole numbers.
OK, teams following a day off in the study are +88 wins versus expectation, which is an advantage of a little more than one-half of one percent. Teams in the second game following a day off are +149 games, and in the third game following a day off, +39 games. In games four and five following a day off, teams are -96 and -107. These numbers are small relative to the size of the study, and they could in theory result from random effects.
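To put "small relative to the size of the study" in rough numerical terms, here is a sketch that recomputes the differences and ratios from the chart and attaches a crude yardstick for random variation. The yardstick is my assumption, not part of the study: it treats every game as an independent coin flip, so the standard deviation of wins in n games is about the square root of n times .25. That ignores ties, unequal win probabilities, and any link between the rows.

```python
import math

# (count, expected wins, actual wins) for games 1 to 5 after a day off,
# copied from the chart.
ROWS = {
    1: (27623, 13802.7, 13890.5),
    2: (25941, 12960.0, 13109.0),
    3: (23222, 11601.3, 11640.5),
    4: (19347, 9649.6, 9554.0),
    5: (17976, 8958.5, 8851.5),
}


def summarize(count, expected, actual):
    """Return (wins above expectation, actual/expected ratio, rough z-score)."""
    diff = actual - expected
    ratio = actual / expected
    # ASSUMPTION: independent 50-50 games, so sd of wins ~ sqrt(n * 0.25).
    rough_sd = math.sqrt(count * 0.25)
    return diff, ratio, diff / rough_sd


for game, (count, expected, actual) in ROWS.items():
    diff, ratio, z = summarize(count, expected, actual)
    print(f"game {game}: {diff:+7.1f} wins, ratio {ratio:.3f}, rough z {z:+.2f}")
```

Under that simplified yardstick, none of the five rows lands as far as two standard deviations from expectation, which is consistent with the caution that these could be random effects.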
If you think about what happens with a day off, the largest effect is that you get to reset your starting rotation. Let’s say you get three days off, like the All-Star Game; then you get to almost completely reset your starting rotation. If two days off, this is less true but still true; with one day off, less true but still a LITTLE BIT true.
When you play eight or ten days in a row, it very often happens that you wind up using a sixth starter or an emergency starter. When you have a day off, that almost never happens in the first couple of games back. After a day off, you’ve almost always got one of your top three starters going. I am certain that we could prove that this is true. . .in fact, hell, I’ll go do that. Excuse me a moment.
My data has in it the Season Scores for the starting pitchers for each game. It’s an obsolete version of the Season Score, but that doesn’t seem to matter too much.
Wow. Those effects are a lot bigger than I thought they would be. Well, I’ll get back to that in a moment. Let’s approach it from the other end.
You may notice, in the chart we gave before, that teams scored only 4.37 runs per game in the first game following a day off, whereas overall they scored 4.42 runs per game.
| G | Count | RPG | ORPG |
|---|---|---|---|
| 1 | 27623 | 4.37 | 4.33 |
| 2 | 25941 | 4.39 | 4.34 |
| 3 | 23222 | 4.39 | 4.39 |
| 4 | 19347 | 4.41 | 4.47 |
| 5 | 17976 | 4.47 | 4.56 |

Teams scored FEWER runs following a day off than they did when they had played the four previous days—yet they had a higher winning percentage. Why?
Because they had better pitching. Teams scored fewer runs following a day off because, very often, their opponent had ALSO had a day off the previous day. Both teams have a day off, they both get to reset their starting rotation to at least some small degree, so they both put a good pitcher on the mound. But focusing just on the teams that we KNOW had a day off—the "focus" teams in the study—they allowed 4.33 runs per game in the first game after a day off, 4.34 in the second game, 4.39 in the third game, 4.47 in the fourth game, and 4.56 in the fifth game following their last day off.
This happens because, as the rotations run, you get more fifth and sixth starters making starts. In the first game after a day off, 50.2% of starting pitchers had Season Scores of 100.000 or higher. In the second game after a day off, this number for some reason increases to 52.8%; I don’t know why that happens. In any case, in the third game following the last day off, this number drops to 44.8%, in the fourth game to 41.6%, and in the fifth game, 32.9%. That’s a huge change, given the scale of the study. In the second game following a day off, 52.8% of starting pitchers have Season Scores of 100.000 or higher; in the fifth game following a day off, only 32.9%. It drops from over one-half to less than one-third.
After the fifth game, this number goes back up to 45.5% in the 6th game, 45.0% in the 7th game, 43.4% in the 8th game, 41.3% in the 9th game, and 35.4% in the 10th game. Then it recovers to 44.0% in the 11th game.
You see what’s happening, obviously. It’s the five-man rotation. When a team has a day off, there is a tendency to get back to the start of the five-man rotation. In the 6th game, you’re back to the start of the five-man rotation, sometimes, and in the 11th game, you’re back there again. Of course the regularity of the pattern breaks down over time. And the same thing is happening for the opposing teams, but it is happening for only about half the teams, because only about half of them had a day off that matched up with the focus team’s day off.
The actual "fatigue" effects. . . ehn. There’s not that much evidence for them in the study. Teams playing their 16th game without a day off won 1,460 games, against an expectation of 1,452.5, granting that, by the 16th game, you’re back to the front end of the rotation again.
One of the questioners asked about jet lag. I didn’t get to that issue; that’s a step beyond us. I did establish a methodology that gets us a step closer to that issue, but I’m not there yet. But I did break down the first-day and second-day effects into home games and road games.
Home teams had a .548 winning percentage in their first game following a day off (.547 764) as opposed to an expected winning percentage in those games of .543 (.543 311), so they overachieved by 8/10ths of one percent. Road teams had a .455 winning percentage following a day off (.455 040) as opposed to an expectation of .453 (.453 137), so they overachieved by 4/10ths of one percent.
But in the SECOND game following a day off, teams overachieved by the same 8/10ths of one percent at home, but by 1.6% on the road, posting a .463 winning percentage against an expectation of .455.
Well . . . not everything is meaningful, and I don’t really know what that means. We’re dealing with a large study here (208,000 team games), but when you break those down into games played on the second day following a day off, then you’re down to 25,941 games. Break that down to home and road, you’re down to 13,031 road games. Break that down to wins and losses, you’re down to 6,000 wins and 7,000 losses, more or less. Divide one number by another, and a small percentage difference may no longer be statistically meaningful. Thanks for reading.