A Day at the Beach

By Bill James

May 7, 2017

2017-23

I had a couple of questions in the "Hey, Bill" file that I didn’t know the answers to:

Do you know if there has been research done on how well teams hit the game following an off day?

Cap0088

Do you know, is there a jet lag effect? Do teams play worse in the first games of away series than they do the rest of the series?

taosjohn

These are kind of basic questions about baseball. . .underlying stuff that we probably should know, in our biz, so I decided to research the general question. I don’t know that I actually reached an answer to either of these questions, but I took a step in that direction.

In order to study these issues, we need first of all a method to determine how often a team should be EXPECTED to win a given game. Unless you know how many games a team should be expected to win (out of a group of games) it is difficult to say whether they have exceeded expectations or have not.

On the simplest level we could just assume that every team has a 50-50 chance of winning each game. That would get the right number of expected wins for each season or on a very large-scale level, but would get less and less accurate as the groups of games become smaller and more specific.

On the second level, we could just assume that a team’s chance of winning each game is their won-lost record for the season. If the team was 90-72 on the season, their chance of winning each game would be 55.6%. That’s a step in the right direction, but it is still inaccurate in smaller groups of games, such as all of the games AGAINST any team, unless it’s a .500 team. What if two teams which both have 90-72 records on the season meet? Do they EACH have a 55.6% chance of winning the game?

On the third level, we can find each team’s chance of winning the game by cross-multiplying their wins and losses. If one team was 90-72 and the other was 84-78, then the first team’s chance of winning the game is (90 *78)/ [(90*78) + (72*84)], and the second team’s chance of winning the game is (84*72)/ [(90*78) + (84*72)]. That’s "known" sabermetrics—something I discovered in the 1970s—and it is an accurate method that solves the last problem and gets us a step closer to the estimates we want, but there are still issues.
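As a quick check on the arithmetic, here is a minimal Python sketch of the cross-multiplication described above. The function name `cross_multiply` and its argument names are made up for illustration; only the formula comes from the text.

```python
def cross_multiply(wins_a, losses_a, wins_b, losses_b):
    """Chance that team A beats team B, from season won-lost records."""
    a_term = wins_a * losses_b   # A's wins times B's losses
    b_term = wins_b * losses_a   # B's wins times A's losses
    return a_term / (a_term + b_term)


# The 90-72 team against the 84-78 team, from each side.
p_first = cross_multiply(90, 72, 84, 78)
p_second = cross_multiply(84, 78, 90, 72)
print(round(p_first, 3), round(p_second, 3))

# Two 90-72 teams meeting each other come out even.
p_even = cross_multiply(90, 72, 90, 72)
print(p_even)
```

The two estimates for any matchup always add up to 1, and two teams with identical records come out at exactly .500, which is what the second-level method could not do.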

One of those issues is, home and road. Let’s take a 90-72 team against a 72-90 team. The 90-72 team will win 61% of the games between these two teams (.610), we know, but are they going to win 61% of the games in their home park, and 61% on the road? Obviously not. The next step is to make estimates that work for home games and road games. This is the fourth level.

This is new sabermetrics here, and this is by far the most significant part of this article. In the next few paragraphs I will explain how to establish the expected winning percentage for each team, given their won-lost records for the season and which is the home team. I have developed methods to attempt to do this before, but this is a much better method, backed by better research. This new method will be or could be useful for doing hundreds of different studies, and I will probably do some of those studies over the next few months, and publish them here. I have basically been stuck on this problem since the 1970s, but I think I am past it now.

But first let me say. . .this method does NOT adjust for who the starting pitchers are. That might be the fifth-level estimate, to adjust for who the starting pitchers are. That’s more complicated math, and I’m not ready to do that yet, but we have to take THIS step before we move on to THAT step. One thing at a time. Or maybe the fifth level would be to adjust this method for the team-specific multi-year park effects. It’s probably different if the home team is Colorado than it is if the home team is the Phillies, so you could adjust for that; maybe that’s the fifth-level estimate. I don’t know; I’m not there yet.

So the expected winning percentage of team A in a particular game, if A is the season’s winning percentage of team A and B is the season’s winning percentage of team B, is

(A * (1-B)) / ((A * (1-B)) + (B * (1-A)))

That’s the cross-multiplication formula that I mentioned before, which yields an expected winning percentage of .610 (.609 756) for Team A when a 90-72 team plays a 72-90 team, and an expected winning percentage for Team B of .390.

Let’s say that that is Estimate 1 (Est1) for this process. Estimate 2 can be any of four things.

If Team A is the home team and Est1 is .500 or over, then you find Estimate 2 by taking the estimate for Team B, multiplying that by .89, and subtracting the result from 1.000. In this case, that would be .653 (.652 683). Team A, the better team AND at home, has an expected winning percentage of .653.

If Team A is the home team but Estimate 1 is .500 or less, then you find Estimate 2 by simply multiplying the expected winning percentage of Team A by 1.11. This does not apply in our test case (90-72 vs. 72-90). If Estimate 1 is exactly .500, then these two methods are exactly the same, and it doesn’t matter which one you use.

If Team A is the ROAD team and Est1 is .500 or over, then you find Estimate 2 by taking the estimate for Team B, multiplying that by 1.11, and subtracting the result from 1.000. In our test case, if Team A is on the road, then their expected winning percentage is .567 (.566 829).

If Team A is the road team and Est1 is .500 or less, then you find Estimate 2 by simply multiplying estimate one by .89.

OK, so there are four possible scenarios or situations, and, by choosing the appropriate method from the four, we find the expected winning percentage (Est2) for the team in any of the four conditions. Basically, we’re just modifying the SMALLER estimate from Step 1 by 11% up or down depending on the home field advantage. But we’re not quite done. There’s an Estimate 3 (Est3), but Estimate 3 is the final step of the process, so that’s our final estimate.
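The four cases above collapse into one rule: scale the smaller of the two Est1 values by 1.11 if that team is at home or by .89 if it is on the road, and give the other team the remainder. A minimal Python sketch of that rule follows; the function name `est2` and its parameters are illustrative, not part of the original method's notation.

```python
def est2(est1_a, a_is_home):
    """Home/road-adjusted estimate for team A, given A's neutral estimate est1_a."""
    if est1_a <= 0.5:
        # Team A holds the smaller estimate: scale it directly.
        factor = 1.11 if a_is_home else 0.89
        return est1_a * factor
    # Team B holds the smaller estimate: scale B's share, then subtract from 1.
    # B is on the road when A is at home, and at home when A is on the road.
    factor = 0.89 if a_is_home else 1.11
    return 1.0 - (1.0 - est1_a) * factor


# Test case from the text: a 90-72 team against a 72-90 team.
est1 = (90 * 90) / (90 * 90 + 72 * 72)   # cross-multiplication, about .610
home_value = est2(est1, a_is_home=True)
road_value = est2(est1, a_is_home=False)
print(round(home_value, 6), round(road_value, 6))
```

At an Est1 of exactly .500 both branches give the same answer (.555 at home, .445 on the road), so it does not matter which side of the `<=` the tie falls on.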

For Estimate 3, we take Estimate 2, multiply it by .94, and add .03. If Estimate 2 is in normal ranges or central ranges (.400 to .600), then this adjustment makes almost no difference, and you’d have to study ten million games to know whether it was helpful or not. If the result is more extreme . . . well, many of you have probably encountered this problem from research of all kinds. When the expected result gets close to 1.000 or to .000, the actual result tends to kick back toward the center a little bit, like a swimmer kicking off the edge of a swimming pool.

If a team has a 50% chance of winning a game, this last adjustment doesn’t change the estimate at all; it stays at .500. If a team has a .450 or .550 chance of winning a game, this adjustment changes that to .453 or .547. If a team has a .400 or .600 chance of winning a game, this adjustment changes it to .406 or .594. .700, it changes to .688; .800, it changes to .782. It does that because real-life data tells us that we have to. It’s the "There’s always a chance" effect. There is always a chance that a greatly outmanned fighter will land a punch, and win a fight he ought to lose. There is always a chance that a rookie called up despite a 5.17 ERA for Sheboygan will pitch a 3-hit shutout against the Cubbies.
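The adjustment is simple enough to check by hand, but a short Python sketch makes the pull toward the center easy to see. The function name `est3` is chosen here for illustration.

```python
def est3(est2_value):
    """Final estimate: shrink Est2 toward .500 (multiply by .94, add .03)."""
    return 0.94 * est2_value + 0.03


# The values quoted in the text, before and after the adjustment.
adjusted = {p: round(est3(p), 3) for p in (0.5, 0.45, 0.55, 0.4, 0.6, 0.7, 0.8)}
for before, after in adjusted.items():
    print(before, "->", after)
```

Because .94 times .500 plus .03 is .500 again, the center stays put, and every other estimate moves 6% of its distance back toward it.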

If you studied the batting average of the very worst hitters against the very best pitchers—Charlie O’Brien against Randy Johnson—you might calculate that Charlie O’Brien should hit .160 against Randy Johnson, or something like that. But he wouldn’t actually hit .160; he’d hit .170 or something. I haven’t even studied that, but I know that that’s true, because that’s just the way data works. Extreme combinations of loading factors never produce results AS extreme as the math shows that they should be. The math may work perfectly in the center of the chart, but when you get near the edges of the chart it crinkles toward the center. Random events are gravity; they pull things toward the center.

Anyway. . . I did study THIS (I studied wins and losses based on season won-lost records and home/road for 208,160 games), and I know that that adjustment makes the method work better. We overstate the home-field advantage slightly in forming Est2, and then Est3 pulls it back toward the center. Let’s look at some real-life cases:

September 9, 10, and 11, 2003, Tigers (43-119) against the Yankees (101-61) in New York. Detroit has an expected winning percentage of .180 in each game. In fact, in a three-game series, they did not win a game.

August 7, 8, and 9, 2009, Reds (78-84) against Giants (88-74) in San Francisco. The Reds have a .397 expected winning percentage for each game. In fact, the Reds won two out of three.

Dodgers against the Giants in Dodger Stadium, 1987. The Dodgers were 73-89 that season, the Giants 90-72. However, in Dodger Stadium the Dodgers had an expected winning percentage of .443. In fact, the Dodgers were 3-3 in the six games played between the two at Dodger Stadium.

Cubs against the Mets in Wrigley Field, 1969. Memorable matchup. The Cubs were 92-70 that season, the Mets 100-62. However, in Wrigley Field the games between the two were almost a tossup, the Cubs having an expected winning percentage of .498 48, almost .499. In fact, the two played nine games in Wrigley that season, and the Cubs were 4-5.

Twins against the Rangers in Texas, 2006. The Twins were 96-66 in 2006, winning their division; the Rangers were 80-82. In Texas, the Twins had an expected winning percentage of .551. In fact, the Twins were 3-3 in the six games they played in Arlington that summer.

Astros against the Expos in Montreal, 1976. The Astros were 80-82 that year, but the Expos were 55-107. An average team would beat the Expos that year almost two times in three. The Astros were almost exactly an average team. Even in Montreal, their expected winning percentage against Montreal was .610. In fact, the Astros were 5-1 in six games in Montreal that summer.

Yankees against the Marlins in Yankee Stadium, June 5, 6, and 7, 1998. The Yankees were 114-48. The Marlins, although the defending World Champions, were 54-108. The Yankees (at home) had an expected winning percentage of .825 against the Marlins. This is the worst mismatch in my study, the highest expected winning percentage for one team (and the lowest for the other team) of any games played between 1965 and 2013. In fact, the Yankees swept the three-game series.
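For anyone who wants to reproduce these figures, here is a sketch that chains the three estimates together and runs them on the cases above. The function name `expected_win_pct` and the tuple layout are assumptions made for the example; the records and the home/road flags are taken from the text.

```python
def expected_win_pct(record_a, record_b, a_is_home):
    """Est3 for team A. Each record is a (wins, losses) tuple for the season."""
    wins_a, losses_a = record_a
    wins_b, losses_b = record_b
    # Est1: cross-multiplication of the two season records.
    est1 = (wins_a * losses_b) / (wins_a * losses_b + wins_b * losses_a)
    # Est2: scale the SMALLER estimate by 1.11 (home) or .89 (road).
    a_is_small = est1 <= 0.5
    small = est1 if a_is_small else 1.0 - est1
    small_is_home = a_is_home if a_is_small else not a_is_home
    small *= 1.11 if small_is_home else 0.89
    est2 = small if a_is_small else 1.0 - small
    # Est3: pull the result back toward .500.
    return 0.94 * est2 + 0.03


# (label, team A record, team B record, is team A at home?)
cases = [
    ("2003 Tigers at Yankees", (43, 119), (101, 61), False),
    ("2009 Reds at Giants", (78, 84), (88, 74), False),
    ("1987 Dodgers vs. Giants, Dodger Stadium", (73, 89), (90, 72), True),
    ("1969 Cubs vs. Mets, Wrigley Field", (92, 70), (100, 62), True),
    ("2006 Twins at Rangers", (96, 66), (80, 82), False),
    ("1976 Astros at Expos", (80, 82), (55, 107), False),
    ("1998 Yankees vs. Marlins, Yankee Stadium", (114, 48), (54, 108), True),
]
results = {label: expected_win_pct(a, b, home) for label, a, b, home in cases}
for label, pct in results.items():
    print(f"{label}: {pct:.3f}")
```

Run as written, this should land on the same three-decimal figures quoted in the examples, which is a useful sanity check that the four Est2 cases have been read correctly.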

OK, so that’s how the method works. Now let’s get to the basic question posed by the readers: do teams play better after a day off?

Short answer: teams which have a day off before a game have a very, very small advantage in the first and second games after the day off, which is almost certainly caused by being able to re-set their pitching staff, rather than by any other factor.

First, let me acknowledge the limitations of the study. I didn’t factor in who the starting pitchers were, in estimating the win probability for each team. The data I used was Retrosheet data for the years 1965 to 2013, and it was not NEW Retrosheet data; some of it is a few years old. There are a few games from the years 1965 to 2013 which are missing, which could cause us to misidentify a game as following a day off, when in reality it was not. I don’t believe that is a real issue. . . well, if I did believe it was a real issue, as opposed to a theoretical one, I would have done the study some other way. There aren’t very many games missing, and a missing game would not reasonably cause an effect to show up; it just might mask a real effect, to some very tiny extent.

In my data, a rainout is the same as an off day. In reality, of course, a rainout ISN’T the same as an off day; if you hang around the park three hours waiting to see if you are going to play, that isn’t a day off. Also, I didn’t distinguish between a day off in the middle of a home stand and a day "off" in which the team flew from Tampa Bay to Seattle, which is no day at the beach. Some things could look the same in the study which are not really the same; this is the universal truth about research. Also, and we will see the relevance of this in a moment, I did not consider whether the OTHER team had ALSO had a day off before the game. Very often, if our team had a day off before the game, the other team did, as well. I am guessing that this happens about 50% of the time. If it happens 50% of the time, it would cause a real effect to be under-stated by 50%, or perhaps more. So that’s an issue. You can cover all those issues when you re-do the study; I’ll leave that up to you.

OK, with those limitations understood, here’s what I did. I counted how many GAMES the team had played without a day off. Suppose that a team is off on Thursday, plays on Friday, Saturday and then a double-header on Sunday. The game on Friday is marked "1", meaning that the team had a day off before this game, the game on Saturday is marked "2", and the games on Sunday are marked "3" and "4". So we’re counting the number of games a team has played since they had a day off, rather than the number of consecutive days on which the team has played.
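The counting rule can be sketched in a few lines of Python. This assumes a team's schedule is available as a chronologically sorted list of dates with one entry per game played (so a double-header shows up as the same date twice); the function name `games_since_day_off` and that input layout are assumptions, not the format of the data actually used. The sketch also treats the first game in the list as following a day off.

```python
from datetime import date


def games_since_day_off(game_dates):
    """Number each game by how many games the team has played since its last idle day.

    game_dates: sorted dates, one entry per game played (double-header = date twice).
    A rainout simply leaves a gap in the dates, so it counts as a day off.
    """
    marks = []
    previous = None
    count = 0
    for played_on in game_dates:
        if previous is None or (played_on - previous).days > 1:
            count = 1          # at least one idle day before this game
        else:
            count += 1         # next day, or second game of a double-header
        marks.append(count)
        previous = played_on
    return marks


# Wednesday game, Thursday off, then Friday, Saturday, and a Sunday double-header.
schedule = [
    date(2017, 5, 3),
    date(2017, 5, 5),
    date(2017, 5, 6),
    date(2017, 5, 7),
    date(2017, 5, 7),
]
marks = games_since_day_off(schedule)
print(marks)
```

The Friday game gets a "1", Saturday a "2", and the two Sunday games "3" and "4", matching the example in the paragraph above.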

This is a chart of all the data; after the chart we’ll start explaining what it all means:

G	Count	RPG	Runs	ORPG	Opp Runs	Ex Wins	Act Wins	Ratio
1	27623	4.37	120672	4.33	119667	13802.7	13890.5	1.006
2	25941	4.39	113905	4.34	112648	12960.0	13109.0	1.011
3	23222	4.39	102013	4.39	101912	11601.3	11640.5	1.003
4	19347	4.41	85332	4.47	86569	9649.6	9554.0	0.990
5	17976	4.47	80420	4.56	81994	8958.5	8851.5	0.988
6	16911	4.40	74484	4.39	74201	8429.2	8493.0	1.008
7	11883	4.49	53354	4.45	52829	5945.4	5997.5	1.009
8	10531	4.43	46688	4.42	46557	5280.4	5287.0	1.001
9	9829	4.38	43081	4.45	43740	4931.4	4799.0	0.973
10	8082	4.55	36777	4.55	36772	4059.7	4046.5	0.997
11	6587	4.46	29395	4.45	29306	3300.2	3270.0	0.991
12	5945	4.45	26473	4.42	26279	2978.3	2991.5	1.004
13	5595	4.47	25005	4.48	25077	2805.5	2808.5	1.001
14	3828	4.45	17032	4.52	17299	1916.5	1891.5	0.987
15	3210	4.48	14385	4.47	14360	1607.9	1587.5	0.987
16	2899	4.42	12801	4.42	12816	1452.5	1460.0	1.005
17	2322	4.53	10517	4.43	10281	1165.2	1169.0	1.003
18	1621	4.37	7091	4.41	7142	808.9	800.5	0.990
>18	4808	4.33	20817	4.32	20793	2426.9	2433.0	1.003

Totals	208160	4.42	920242	4.42	920242	104080.0	104080.0
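The last column of the chart is just actual wins divided by expected wins. A small Python sketch, using the first five rows of the chart, shows how the Ratio column and the wins above or below expectation are derived; the helper name `ratio_and_surplus` and the dictionary layout are illustrative only.

```python
# G -> (expected wins, actual wins), copied from the first five rows of the chart.
rows = {
    1: (13802.7, 13890.5),
    2: (12960.0, 13109.0),
    3: (11601.3, 11640.5),
    4: (9649.6, 9554.0),
    5: (8958.5, 8851.5),
}


def ratio_and_surplus(expected_wins, actual_wins):
    """Return (actual / expected, actual - expected) for one row of the chart."""
    return actual_wins / expected_wins, actual_wins - expected_wins


summary = {g: ratio_and_surplus(ex, act) for g, (ex, act) in rows.items()}
for g, (ratio, surplus) in summary.items():
    print(f"G={g}: ratio {ratio:.3f}, wins vs. expectation {surplus:+.1f}")
```

A ratio above 1.000 means teams in that row won more often than their records and home/road status would predict; below 1.000, less often.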

So there are 27,623 teams which had a day off before the game was played, whereas there are 25,941 which were playing their second game since they had had a day off. This is the full data for that:

G	Count
1	27623
2	25941
3	23222
4	19347
5	17976
6	16911
7	11883
8	10531
9	9829
10	8082
11	6587
12	5945
13	5595
14	3828
15	3210
16	2899
17	2322
18	1621
>18	4808

Totals	208160

You notice the big drop-off after "six". That’s Monday, basically. A lot of teams play Tuesday through Sunday, then they have Monday off again. There is another big drop-off after "13"; you play one Monday, you get the next one off, very often.

The 1968 St. Louis Cardinals played 51 games without a day off, the most of any team in my study, 51 games in 49 days. The stretch ran from July 18 to September 4. I tweeted that out after I found it, but I failed to notice what their won-lost record was during that stretch. It was 30-21, a little bit below their pace for the season as a whole, but not notably.

But this actually understates it a little bit. The "day off" on July 17 was not an actual day off; it was a rainout in the middle of a three-game series. Before that, they had played 7 games in the previous 6 days. So actually, it’s a 58-game, 56-day stretch without a real day off, and the Cardinals were 36-22 during that stretch.

In the study I counted a Tie Game as half a win; I didn’t know what else to do with them. That explains why the Actual Wins (Act Wins) in the chart often end in .5, rather than being whole numbers.

OK, teams following a day off in the study are +88 wins versus expectation, which is an advantage of a little more than one-half of one percent. Teams in the second game following a day off are +149 games, and in the third game following a day off, +39 games. In games four and five following a day off teams are -96 and -107. These numbers are small relative to the size of the studies, and they could in theory result from random effects.

If you think about what happens with a day off, the largest effect is that you get to re-set your starting rotation. Let’s say you get three days off, like the All-Star Game; then you get to almost completely re-set your starting rotation. With two days off, this is less true but still true; with one day off, less true but still a LITTLE BIT true.

When you play eight or ten days in a row, it very often happens that you wind up using a sixth starter or an emergency starter. When you have a day off, that almost never happens in the first couple of games back. After a day off, you’ve almost always got one of your top three starters going. I am certain that we could prove that this is true. . .in fact, hell, I’ll go do that. Excuse me a moment.

My data has in it the Season Scores for the starting pitchers for each game. It’s an obsolete version of the Season Score, but that doesn’t seem to matter too much.

Wow. Those effects are a lot bigger than I thought they would be. Well, I’ll get back to that in a moment. Let’s approach it from the other end.

You may notice, in the chart we gave before, that teams scored only 4.37 runs per game in the first game following a day off, whereas overall they scored 4.42 runs per game.

G	Count	RPG	ORPG
1	27623	4.37	4.33
2	25941	4.39	4.34
3	23222	4.39	4.39
4	19347	4.41	4.47
5	17976	4.47	4.56

Teams scored FEWER runs following a day off than they did when they had played the four previous days—yet they had a higher winning percentage. Why?

Because they had better pitching. Teams scored fewer runs following a day off because, very often, their opponent had ALSO had a day off the previous day. Both teams have a day off, they both get to re-set their starting rotation to at least some small degree, so they both put a good pitcher on the mound. But focusing just on the teams that we KNOW had a day off—the "focus" teams in the study—they allowed 4.33 runs per game in the first game after a day off, 4.34 in the second game, 4.39 in the third game, 4.47 in the fourth game, and 4.56 in the fifth game following their last day off.

This happens because, as the rotations run, you get more fifth and sixth starters making starts. In the first game after a day off, 50.2% of starting pitchers had Season Scores of 100.000 or higher. In the second game after a day off, this number for some reason increases to 52.8%; I don’t know why that happens. In any case, in the third game following the last day off, this number drops to 44.8%, in the fourth game to 41.6%, and in the fifth game to 32.9%. That’s a huge change, given the scale of the study. In the second game following a day off, 52.8% of starting pitchers have Season Scores of 100.000 or higher; in the fifth game following a day off, only 32.9%. It drops from over one-half to less than one-third.

After the fifth game, this number goes back up to 45.5% in the 6th game, 45.0% in the 7th game, 43.4% in the 8th game, 41.3% in the 9th game, and 35.4% in the 10th game. Then it recovers to 44.0% in the 11th game.

You see what’s happening, obviously. It’s the five-man rotation. When a team has a day off, there is a tendency to get back to the start of the five-man rotation. In the 6th game, you’re back to the start of the five-man rotation, sometimes, and in the 11th game, you’re back there again. Of course the regularity of the pattern breaks down over time. And the same thing is happening for the opposing teams, but it is happening for only about half the teams, because only about half of them had a day off that matched up with the focus team’s day off.

The actual "fatigue" effects. . . ehn. There’s not that much evidence for them in the study. Teams playing their 16th game without a day off won 1,460 games, against an expectation of 1,452.5, granting that, by the 16th game, you’re back to the front end of the rotation again.

One of the questioners asked about jet lag. I didn’t get to that issue; that’s a step beyond us. I did establish a methodology that gets us a step closer to that issue, but I’m not there yet. But I did break down the first-day and second-day effects into home games and road games.

Home teams had a .548 winning percentage in their first game following a day off (.547 764) as opposed to an expected winning percentage in those games of .543 (.543 311), so they over-achieved by 8/10ths of one percent. Road teams had a .455 winning percentage following a day off (.455 040) as opposed to an expectation of .453 (.453 137), so they overachieved by 4/10ths of one percent.

But in the SECOND game following a day off, teams over-achieved by the same 8/10ths of one percent at home, but by 1.6% on the road, where they posted a .463 winning percentage against an expectation of .455.

Well. . . not everything is meaningful, and I don’t really know what that means. We’re dealing with a large study here (208,000 team games), but when you break those down into games played on the second day following a day off, then you’re down to 25,941 games. Break that down to home and road, you’re down to 13,031 road games. Break that down to wins and losses, you’re down to 6,000 wins and 7,000 losses, more or less. Divide one number by another, and a small percentage difference may no longer be statistically meaningful. Thanks for reading.