
Hot Teams and Cold, Dead Subjects

September 5, 2017
 2017-43

 


              Can you stand one more study of whether teams actually get hot?

              This study is different from my previous studies of this issue, first in that it is larger and better organized, and second in that it does find evidence of some effect—not trace evidence or suggestive evidence, but actual, clear and persuasive evidence of SOME effect.   Whether that is a "hot hand" effect or a slightly different phenomenon is questionable, but there is SOMETHING there, which is a step forward from previous studies of the issue.

              There are two related conditions:  

              1) That a team could be "hot" at one moment and "cold" at another, and

              2)  That a team could have actual changes in how good they were, over the course of a season or suddenly as a consequence of an injury or trade. 

              Whether teams get hot and cold is debatable, but obviously there have to be some changes in the quality of a team over the course of a season.   No team winds up the season in the same condition in which it began.    Players get old, get injured; young players develop and veterans return from injury.   That has to make SOME difference in how teams perform. 

              In this study we reach the point of being able to document that there IS some variation in team performance over a season which is related to how well they have been playing recently.   There IS a predictable and meaningful consequence of the team having played well or played poorly in recent games.   It’s a small effect, but not so small that it is likely to be the result of an accident of measurement. 

              OK, here’s what I did.   First of all, I studied every regular season team/game in my game logs from 1965 to 2013, which is 208,160 team/games (104,080 distinct games; a few games are missing from the logs).   For each game, I "predicted" the likelihood that each team would win, based on the end-of-season won-lost record of each team and which team was at home.   In other words, on July 8, 1969, the Cubs played the Mets in Shea Stadium.    The Cubs in 1969 were 92-70, the Mets were 100-62.   Based on THAT information, the Mets would have a 55.1% chance to win the game, and the Cubs would have a 44.9% chance to win the game.   But the Mets were at home.  When we factor in the home field advantage, the Mets’ expectation for a win goes from .551 to .594, and the Cubs’ expectation for a win goes from .449 to .406.  

              A week later (July 14) the Mets and Cubs played in Wrigley Field.   In Wrigley Field, the Mets’ expectation for a win goes from .551 to .502, and the Cubs’ expectation for a win goes from .449 to .498.   In Wrigley, it’s basically a tossup.  
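              The neutral-field number above is consistent with the log5 formula, which converts two teams' winning percentages into a head-to-head probability.   The sketch below reproduces the .551 figure from the records given.   The home-field step is only an illustration: the article does not spell out the author's actual home/road adjustment, and the .540 home factor and log5-style fold are my assumptions.

```python
def log5(p_a, p_b):
    """Probability that a team with winning pct p_a beats a team with
    winning pct p_b on a neutral field (the log5 formula)."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)

def with_home_advantage(p, home_pct=0.540):
    """Fold a home-field factor into a neutral-site probability.
    NOTE: this log5-style adjustment and the 0.540 home factor are
    illustrative assumptions, not the author's documented method."""
    return (p * home_pct) / (p * home_pct + (1 - p) * (1 - home_pct))

mets, cubs = 100 / 162, 92 / 162                      # 1969 season records
p_mets_neutral = log5(mets, cubs)                     # ~.551, as in the text
p_mets_at_home = with_home_advantage(p_mets_neutral)  # shifted upward at Shea
```

              With the assumed factor the home shift comes out near the article's .594, but not exactly; the direction and rough size are the point.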

              Then I measured how "hot" each team had been in the previous games.   How I measured this is not important.   I will explain how I measured it, because this is a quasi-scientific paper and that is the requirement of science, but it doesn’t really matter; I promise you that if you study the same material with the same general approach but make up a different formula to measure how hot the teams have been in recent games, you are GOING to get the same result.  

              The question is, will teams do better vs. expectations for that game when they are "hot" than when they are "cold"?    The answer is:   Yes, they do.   The next question will be "Why did I find a hot/cold effect in THIS study, when I haven’t found it in previous studies?"   I’ll get to that question in just a moment, but first I need to explain how I measured whether a team was hot or cold.  

              Every team had a "Heat Index" of 50.000 on opening day.   After the opening day, each team’s score was whatever their score had been before the game, times .940, plus 6 points if they had won the game, plus one point for each run they had scored in the game, minus one point for each run they had allowed.  

              The Cubs in 1969 had been hot from June 23 to July 4, winning ten out of thirteen games.   At the end of that time the Cubs had a heat index of 84.7, which is quite high (93rd percentile).   They played the Cardinals before the Mets, however, and the Cardinals beat them the last three games of a four-game series, so then they were no longer such a hot team.   The three straight losses dropped their heat index heading into the Mets’ series to 61.95 (69th percentile). 

              The Mets beat them in Game 1, 4 to 3.   That dropped their heat index to 57.2 (61.95 *.94, minus one).   The Mets beat them again in Game 2, 4 to 0.   That dropped their heat index to 49.8. 
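              The updates just described can be checked directly.   This is a minimal implementation of the Heat Index formula given in the text, and it reproduces the 57.2 and 49.8 figures for the Cubs:

```python
def update_heat(index, won, runs_scored, runs_allowed):
    """One game's update to the Heat Index, per the formula in the text:
    decay the old score by .940, add 6 for a win, add runs scored,
    subtract runs allowed.  Every team opens the season at 50.000."""
    return index * 0.940 + (6 if won else 0) + runs_scored - runs_allowed

# The 1969 Cubs entered the Shea series at 61.95:
h = update_heat(61.95, won=False, runs_scored=3, runs_allowed=4)  # lost 4-3
print(round(h, 1))   # 57.2, matching the article
h = update_heat(h, won=False, runs_scored=0, runs_allowed=4)      # lost 4-0
print(round(h, 1))   # 49.8
```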

              The Cubs, though, rallied to win the last game of the Mets’ series and four of the five after that.   They were hot again, back up to 72.7.   For the Cubs, the hottest moment of their 1969 season was after the game of June 6, when they beat the Reds 14 to 8 for their seventh straight win, ten out of twelve.  Their heat index at that time was 109.1.  The coldest moment of their season was on September 15, when the Expos beat them 8 to 2, their 11th loss in 12 games, dropping their heat index to 14.9. 

              For the Mets, their hottest moment of the 1969 season was on September 13, when they completed a ten-game winning streak, giving them a heat index of 108.3, and their coldest moment was on April 17, when six losses in seven dropped their early-season record to 3-7, giving them a heat index of 27.8. 

              I then sorted the games by the team’s heat index coming into the game.   The hottest that any team was in my study was the Yankees on August 12, 1998, after they swept the Twins in a three-game series, 7-3, 7-0 and 11-2.  That gave the Yankees seven straight wins, 19 out of 23, and a heat index of 148.97.   They were so hot that even though they won the next two games, their heat index went DOWN because they won by the modest scores of 2-0 and 6-4.   That is almost impossible to do, to get SO hot that your heat index goes down with a win.   The coldest that any team was in my study was the 1996 Tigers who, after starting the season 8-7, lost 39 out of 44 games to drop to 13-46.  

              I sorted the teams into 10 groups, according to how hot they were.   (Of course, there were a large number of opening-day teams in the center of the chart with scores of 50.00.   I just arbitrarily split those into groups 5 and 6.)   There were 20,816 team/games in each group.  The teams in Group 1—the hottest teams in the study—had an expectation of 11,998 wins in 20,816 games, but actually won only 11,573 games, 425 fewer than expected.    They under-performed by 3.5%.    This is the summary data for all ten groups:
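              The grouping step might be sketched as follows.   The data layout—a tuple of incoming heat index, game outcome, and expected win for each team/game—is my own invention for illustration; the article does not describe its internal format.

```python
def decile_summary(games):
    """Sort team/games by incoming heat index (hottest first), split into
    ten equal groups, and total actual vs. expected wins in each group.
    `games` is a list of (heat_index, won, expected_win) tuples; `won`
    may be 0, 1, or 0.5 to allow for tie games."""
    games = sorted(games, key=lambda g: g[0], reverse=True)
    n = len(games) // 10
    rows = []
    for i in range(10):
        chunk = games[i * n:(i + 1) * n]
        wins = sum(won for _, won, _ in chunk)
        expected = sum(e for _, _, e in chunk)
        rows.append((i + 1, wins, expected, wins / expected))
    return rows
```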

 

 

 

Group    Wins       Expected Wins    Ratio
1        11573      11998             .965
2        11345      11443             .991
3        10874      11096             .980
4        10736.5    10814             .993
5        10504      10531             .997
6        10260.5    10303             .996
7        10114.5    10071            1.004
8        9966       9762             1.021
9        9648.5     9369             1.030
10       9058       8693             1.042

 

              Well, but if the hot teams won fewer games than expected and the cold teams won more games than expected, doesn’t that prove that there is no hot/cold effect?

              No, it doesn’t, because of what we will call the MASOG effect.   Mathematical Artifact/Segregation of Games.    MASOG.  

              Take a .500 team, 81-81.   If they won their game yesterday, then they are 80-81 in their other 161 games.   If they lost yesterday, they are 81-80 in their other games.  

              In a single game, the effects of this are not large enough to be observable against random variations in won-lost patterns, but when you study teams which are red hot or ice cold, these effects become noticeable in the data.   Teams in the top 10% of the heat index might generally have won 7 of their last 8 games or something, while teams at the bottom of the chart might generally be 1-7 in their last eight.    Assuming they are .500 teams, that would mean that the hot teams had 74 wins left in the pool and 80 losses, while the cold teams have 80 wins and 74 losses.   The lower-than-expected winning percentage for hot teams HAS to be there, because of the mathematical artifact created from segregating out of the data a series of wins or a series of losses.  
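              The arithmetic of the .500-team example is worth making explicit: segregating out a hot stretch necessarily leaves a sub-.500 pool behind.

```python
# The .500-team arithmetic from the text: remove a 7-1 hot stretch
# from an 81-81 season and the remaining pool of games is 74-80.
season_w, season_l = 81, 81
hot_w, hot_l = 7, 1
rest_w, rest_l = season_w - hot_w, season_l - hot_l   # 74 wins, 80 losses
print(rest_w / (rest_w + rest_l))   # ~.481: the "true" .500 team looks
                                    # sub-.500 once its streak is set aside
```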

              So how, then, do I reach the conclusion that the data clearly shows hot/cold effects (or else shows some effect from teams being actually better in one part of the season than in another)?

              Here’s how we know.    I sorted the data for each team at random; that is, the games for the 1969 Cubs were not put in sequence with games from other teams or from other seasons, but the games of the 1969 Cubs were put in a random sequence.    Then I repeated the study, as if the random sequence was the team’s actual sequence of games.  

              If there IS a hot/cold effect (or a change-of-quality-during-the-season effect), then two things should happen.   First, the standard deviation of the Heat Index should be larger in the "real" data than in the random-sequence data.   If teams actually get hot during the season, or actually get better or worse over its course, then real sequences will contain more streaks than chance alone would produce, and the Heat Index will range more widely.  

              Second, if teams actually get hot or cold during the season, or if their performance varies along the time line, then the MASOG effect should be smaller in the real data than in the randomly sorted data, because a genuinely hot team's extra wins would partially offset the artifact.  
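              A sketch of this randomization check, under an assumed data layout (the article does not describe one): each team-season is a list of (won, runs scored, runs allowed) tuples, and we compare the spread of the Heat Index in real order against the spread after shuffling each team-season internally.

```python
import random
import statistics

def heat_series(games):
    """Heat Index entering each game, starting the season at 50.000,
    using the update rule from the text."""
    index, series = 50.0, []
    for won, rs, ra in games:
        series.append(index)
        index = index * 0.940 + (6 if won else 0) + rs - ra
    return series

def heat_sd(seasons, shuffle=False):
    """Standard deviation of the Heat Index across all team/games.
    `seasons` maps a (team, year) key to that team's games in order."""
    values = []
    for games in seasons.values():
        games = list(games)
        if shuffle:
            random.shuffle(games)   # randomize within the team-season only
        values.extend(heat_series(games))
    return statistics.pstdev(values)

# If teams really run hot and cold, heat_sd(seasons) should exceed
# heat_sd(seasons, shuffle=True), as the article finds (23.82 vs 23.68).
```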

              Both of those things were clearly true.   In the real-sequence data, the standard deviation of the Heat Index was 23.82.   In the random-sequence data, it was 23.68.  

              In the real-sequence data, the 20% hottest teams won 523 games fewer than expected, because of the MASOG effect, and the 20% coldest teams won 644 more than expected.    In the random-sequence data, the (illusory) hottest teams won 666 games fewer than expected (cross yourself here to ward off the effects of the devil’s number), while the coldest teams won 725 more than expected.    The data is what we would expect it to be, if there is a "real" hot team/cold team or change-of-level effect.  Here’s a chart:

Real Data                                     Random Data

Group   Wins      Exp Wins   Ratio            Group   Wins      Exp Wins   Ratio
1       11573     11998       .965            1       11515     11952       .963
2       11345     11443       .991            2       11176.5   11405       .980
3       10874     11096       .980            3       10911     11097       .983
4       10736.5   10814       .993            4       10637.5   10784       .986
5       10504     10531       .997            5       10639     10548      1.009
6       10260.5   10303       .996            6       10222.5   10330       .990
7       10114.5   10071      1.004            7       10251.5   10046      1.020
8       9966      9762       1.021            8       9857      9773       1.009
9       9648.5    9369       1.030            9       9754      9424       1.035
10      9058      8693       1.042            10      9116      8721       1.045

 

              There are actually lots of clues in this chart that there is a real effect.   The expected wins of the hottest teams are 11,952 in the random data, 11,998 in the real data; that is because the standard deviation of the heat index is higher in the real data.   There is a reverse effect on the other side.   The effects cited in the paragraph above the charts are largest in groups 1-2 and 9-10, but they are still consistently detectable in groups 3 through 8.   

              I repeated the experiment, creating a second random sequence for each team.   Again, every indication of a real effect was present in comparing the random to the real data.   In the second random sort, the standard deviation of the heat index dropped to 23.39, as opposed to 23.82 in the real data.   All of the effects noted above were larger in the second random sort than in the first:


 

 

 

Real Data                                     Random Data

Group   Wins      Exp Wins   Ratio            Group   Wins      Exp Wins   Ratio
1       11573     11998       .965            1       11407.5   11919       .957
2       11345     11443       .991            2       11074.5   11418       .970
3       10874     11096       .980            3       10940     11067       .989
4       10736.5   10814       .993            4       10869     10829      1.004
5       10504     10531       .997            5       10512     10525       .999
6       10260.5   10303       .996            6       10342     10341      1.000
7       10114.5   10071      1.004            7       10093.5   10084      1.001
8       9966      9762       1.021            8       10024.5   9769       1.026
9       9648.5    9369       1.030            9       9620.5    9398       1.024
10      9058      8693       1.042            10      9196.5    8729       1.054

 

              I repeated the experiment a third time.   Same thing.   Standard deviation of the Heat Index in the randomly sorted data:  23.49.   Bottom 20% of the teams have 844 more wins than expected in the randomly sorted data, 644 more in the actual data.    I repeated the experiment a fourth time.  Same thing.   Every indicator consistent with a real effect was present in the data. 

 

              This leaves us with three questions:

              1)  Why did THIS study find a clear hot/cold effect when my previous studies of this subject have failed to find one?

              2)  How large is this effect in real terms? And

              3)  How can we tell whether this is actually a hot/cold effect or is in fact a change-of-performance level effect? 

 

 

              1)  Why did THIS study find a clear hot/cold effect when my previous studies of this subject have failed to find one?

              Three reasons.   First, it is an enormous study, looking at 208,000 games—whereas in the 1980s I might have studied one team in a 162-game season.   The size of the study shrinks the random noise, which enables us to spot a real effect which would be invisible in a smaller sample. 

              Second, earlier this year I developed a method to adjust a team’s expected winning percentage in a given game for the home and road differential.   I have had a method since the 1970s to estimate a team’s chance of winning based on the two won-lost records.   But until recently I had never had a method to adjust that for which was the home team.   

              In a study like this, it is helpful to calibrate expectations as carefully as you can.   Having that process available is a step forward in that regard.  Fuzzier expectations would lead to smaller measured separations in the data.  

              And third, this time I was able to figure out, in general terms, how to counter-adjust for the MASOG effect.   You can’t "see" the hot team effect until you counter-adjust for the MASOG effect.   In my previous studies, I had not been able to do that. 

              2)  How large is this effect in real terms?

              It’s small.   You have to go to four decimals to measure it; let’s put it that way.   It’s about a 1% effect at the margins.   In other words, the very hottest teams (the top 15%) are likely to win about 1% more games than estimated without awareness of the effect.  A .500 team, when red hot, is about a .505 team.   The very coldest teams (the coldest 15%) are likely to lose about 1% more often than expected otherwise. 

              3)  How can we tell whether this is actually a hot/cold effect or is in fact a change-of-performance level effect?

              At this time I have no understanding of that question, and also, it would be a violation of best practices for me to try to wrap THAT question into this one.   The development of knowledge works best with cautious steps.   When you try to build one study on top of another, what will usually happen is that they will collapse and you will have nothing.

              It could be done.   There’s some way it could be done, but there might not be enough data in the universe to actually do it.   You could compare teams at different parts of the season, for example, reasoning that real changes in performance between May and September would be larger than real changes between July and August.   You could study it by injuries, maybe.   There’s some way it could be done. 

              While I have not done this, this study also shows how you might be able to identify hot/cold effects for hitters or pitchers, at a low level.   Let’s say pitchers:

              1)  Create game logs for starting pitchers which include 200,000 starts or more,

              2)  Develop a method to estimate what pitcher’s ERA for the game SHOULD BE in consideration of the park, the opponent, his season’s ERA and whether the team is at home or on the road,

              3)  Develop a method to distinguish between hot pitchers and cold pitchers,

              4)  Figure the collective ERA for pitchers according to how hot they have been,

              5)  Put the data for each pitcher in random-order sequence, and

              6)  Compare the real-sequence data to the random-sequence data. 
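              The six steps might be wired together roughly like this.   Every function and field name here is hypothetical; steps 2 and 3 (the expected-ERA model and the heat measure) are left as inputs the researcher would have to supply, and the real-vs-random comparison of step 6 is a matter of running the same summary on shuffled logs.

```python
def pitcher_heat_study(starts, expected_era, heat_score, n_groups=10):
    """Skeleton of the proposed study.  `starts` maps each pitcher to his
    game logs in order (dicts with at least "er" and "ip"); `expected_era`
    is a model of what his ERA should be for that game (step 2);
    `heat_score` updates a heat measure after each start (step 3).
    Returns (group, actual ERA, mean expected ERA) per group (step 4)."""
    rows = []
    for pitcher, games in starts.items():
        heat = 50.0                              # arbitrary opening value
        for g in games:
            rows.append((heat, g["er"], g["ip"], expected_era(g)))
            heat = heat_score(heat, g)           # update after the start
    rows.sort(key=lambda r: r[0], reverse=True)  # hottest starts first
    n = len(rows) // n_groups
    out = []
    for i in range(n_groups):
        chunk = rows[i * n:(i + 1) * n]
        er = sum(r[1] for r in chunk)
        ip = sum(r[2] for r in chunk)
        exp = sum(r[3] for r in chunk) / len(chunk)
        out.append((i + 1, 9 * er / ip, exp))
    return out
```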

              My guess is that it could be done.   I’ll open this up to comments by users in a day or two.   Thanks.   

 
 

COMMENTS (14 Comments, most recent shown first)

msandler
how hot is cleveland right now?
2:34 PM Sep 14th
 
3for3
I don't know about the rest of you, but if a top 15% team wins 1% more games than expected, I'd say it was no effect at all, whether it was statistically significant or not.
10:12 PM Sep 6th
 
FrankD
What might be illustrative is to chunk the data set into halves - say by years 1965-1989 and 1990-2013 and see if the same effect is in both the 'old' data set and the 'new'.
6:40 PM Sep 6th
 
Guy123
Bill, what was the win% of the last 10 games for each group? Basically, I'm trying to get a feel of what being "hot" means? Is that a .700 record?
Tango, we can almost reverse-engineer this from Bill's randomized results. A very hot team (top decile) should have a .574 win%, but in fact has .553. If you assume the "heat" accumulates over 15 previous games, then .574 teams would have to play at .780 to lower their expected winning % that much. Or they could play at a .723 clip over 20 games. To lower expected wins that much over just 10 games, the team would have to win at a torrid .893 rate.

In Bill's "real" data, hot teams are slightly hotter, but the difference appears to be quite small.

3:16 PM Sep 6th
 
Guy123
Another way to create your deciles would be to use "Heat+", that is, the ratio of a team's current Heat Index to its mean Heat Index for the season. That way, each decile should consist of teams of roughly equal strength (close to .500), giving you a purer measure of heat. I don't expect it will change your findings at all, but it would better separate a team's strength from its "hotness."

1:06 PM Sep 6th
 
MarisFan61
About there being an argument for ending the study at August 31: Shouldn't it just as much be July 31? At least for recent years, since trade deadline moves became such an extremely prominent thing.

My impression is that's the main watershed moment where better teams start doing 'more better' and worse teams 'more worse.' And, I would think that's a greater effect than how teams do or don't use/look at their kid call-ups in September.

I imagine that a big part of the bent toward considering September the outlier is all the talk about how ridiculous it is to have such an expanded roster and that something ought to be done about it, because (among other possible reasons for the view) it makes teams into something different than they'd been. But isn't the trade-deadline stuff a larger factor?
11:56 AM Sep 6th
 
tangotiger
Bill: great thanks.

One thing about "recency" (persistent) v "hotness" (transient) is that we should see the effect differently for the next game (G+1) and for two games from now (G+2) and so on (G+n). A team that is "hot" should see a better record at G+1 than at G+10, on the idea that their hotness will dissipate, to some degree or other. Some team can stay hot for 3 games and others for 13 games.

However, if we have a "recency" effect, like say trading for Manny Ramirez, then the record you find at G+3 should be very similar to the record at G+13.

(All relative to expectations.)

That said, with .505 being the "hot" game, I'm not sure how much room there is to distinguish between the two phenomena.

***

Joe: I did some research in that, along with others. I think Clay Davenport maybe. And yes, September records are influenced by current standings. You can guess the reason. You can make a case to end the study on Aug 31.
9:20 AM Sep 6th
 
joedimino
If I recall correctly, teams tend to be drawn towards the center, except for in September, correct? In September, the bad teams tends to play a little worse and the good teams a little better.

If I am recalling this correctly (I want to say I read it in something Bill wrote at some point in the last 30 or so years, but maybe it was someone else) could that be influencing this one way or the other? Since the effect is so small, could that be what we are seeing here?
5:18 AM Sep 6th
 
JackKeefe
The subtext of this article is about expectations, about meeting or exceeding them. It's not surprising that the measurable effect of hot streaks is small. We're always talking about tiny decimal place level differences in baseball---how one flare hit a week can turn someone into a .300 hitter---but they are palpable distinctions nevertheless.

Because people are trained to look for patterns, and see them everywhere, even in places we shouldn't, one's expectation of an event colors the experience of the event. When a team is red hot, and everybody is playing well and the breaks keep going their way, they expect to win. Early deficits don't matter because they are confident they will come back. Nobody panics, everybody executes, and they just relentlessly keep coming at you. There is power in belief.

A team on a cold streak is usually trying too hard. Pitchers walk guys they shouldn't, batters try to hit 5-run homers, runners make foolhardy outs on the basepaths, fielders make key boots that lead to runs. If they fall behind, they get frustrated; if they take the lead, they dread the inevitable comeback. All the while they are waiting for the other shoe to drop---the bad bounce, the bad call---expecting to lose because that has been the pattern.


12:53 AM Sep 6th
 
bjames
Tom. . ..I'll see if I can come up with that data. It's not something I would just have. . .I might be able to calculate it.
11:09 PM Sep 5th
 
hotstatrat
Warning: you don't care:
I can't help but point out that my Junior High buddies and I drank Borden milk in 1969 so we could use their coupons to go to as many Mets games as possible that summer. Game 2 of that Cubs series (July 9) was one of them - a Tom Seaver no-hitter spoiled by Jimmy Qualls in the 9th inning. That was the year the Mets went from doormat darlings to World Champions. The next year Borden dropped their promotion.
9:33 PM Sep 5th
 
tangotiger
Bill, what was the win% of the last 10 games for each group? Basically, I'm trying to get a feel of what being "hot" means? Is that a .700 record?

So, basically, a .500 team that runs hot, and plays at .700, will then play their next game at .505?

If that's the case, that means it would play at 97.5% to their talent and 2.5% to their hotness.
6:14 PM Sep 5th
 
OldBackstop
HeyBill, some interesting stuff.

I'm still struggling with applying a season long success rate to the actual teams we are looking at after a pump or dump mid-season in the current fashion style.

The Mets, optimistically, might be 70-92 this year...but in June they went 14-14, ending the month 38-42. They swept the Cubs in mid-June with a lineup that had five guys with an .OPS over .800. Until last week that team still led the NL in home runs.

That team is gone. The best catcher, first baseman, second baseman and four outfielders are playing on other teams or done for the year. The current team is on a 6-18 tear after a win last night. We have one (1) of our top seven starting pitchers on track to get more than 20 starts. Our highest valued fantasy player yesterday in Fanduel was a 4 foot tall Japanese teenager we picked off waivers from Toronto. God made our one remaining power threat foul a ball off his nose yesterday. His own nose.

Is it just me, or is the pump and dump, tear down and rebuild strategy getting to the point where the playoffs are All Star games? Is this measurably different from the past?


4:30 PM Sep 5th
 
MarisFan61
I got real excited about this, on behalf of ignorami worldwide -- "Hey, us morons have been right" -- till I got to the part about how small the effect is.

BTW, I had to work a little hard to get why in this study the records for 'hot' teams are routinely worse than "expected," and for 'cold' teams, better than expected. I think a lot of other people will too. But don't worry too much, everybody -- it's not that hard. :-)

Thanks for this!
3:39 PM Sep 5th
 
 
©2017 Be Jolly, Inc. All Rights Reserved.