
Two Leagues Diverged In a Yellow Wood

August 22, 2007

I have been working on some simulations designed to better understand a couple of features of the 2006 season, and I have three questions for you about that work:

1)  Are my methods reasonable?

2)  Are my conclusions reasonable? and

3)  Do you have any ideas as to how we can get better answers than these?

       But first, the questions I was addressing, which are:

1)  How unusual is it to get a significant imbalance between the leagues?   If we assume that the leagues are made up of teams which move independently up and down, better and worse, and that we have 16 teams in one league and 14 in the other, what would be the expected differences in quality between the two leagues?

2)  Detroit.  What has happened in Detroit--for a team to improve so dramatically in three years--SEEMS, in a sense, almost impossible. . .it would seem that to go from a team that bad to a championship team in three years would require that every decision you make be a good one, that every gamble pay off, which of course won't happen.   But how unusual is it really?

       I built a model to study these two questions, in this way.   In the first year of the model, the quality or "underlying expected winning percentage" of each team was simply the average of ten random numbers.

        In the second year of the model, and in every subsequent year, the underlying expected winning percentage of each team was

       7 times whatever it was the previous season

       + 3 random numbers

        Divided by ten.

I use the term "underlying expected winning percentage" to recognize that a team with a .572 figure in a league in which the average is .519 isn't going to play .572 ball; they're going to play something around .550 ball.
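
For anyone who wants to tinker with this, here is a minimal sketch of that year-to-year rule in Python. It is not the original program, and the text doesn't say what kind of "random numbers" were used; the sketch assumes independent uniform draws between 0 and 1, a choice which, for what it's worth, settles to a long-run standard deviation of team quality of about .070, close to the .0696 reported further down.

```python
import random

def initial_quality():
    # First year: a team's quality is the average of ten random numbers
    return sum(random.random() for _ in range(10)) / 10

def next_quality(prev):
    # Every later year: (7 x last year's quality + 3 new random numbers) / 10
    return (7 * prev + sum(random.random() for _ in range(3))) / 10
```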

        We tend to think of leagues as being always at .500, this being a sort of "necessary fallacy" of statistical analysis. . ..when you are measuring everything against the average, you place the average at .500 and you tend to assume that this is always the same.   But in fact, of course, the quality of leagues ISN'T always the same; it only MEASURES the same because we don't know how to measure it in absolute terms.    That's the issue I was trying to get at. . .how large are the differences in league quality that we are missing by assuming that the leagues are always of the same quality?

        The clear and simple answer to that question is "They're bigger than I would have guessed", or at least they appear to be bigger based on this model.   In my model I had 14 teams in one league, 16 teams in the other, 1003 years in each cycle of the study, and I ran 12 cycles, thus 12036 years with 12 different start points.  For each season 1 through 12036, I figured the average quality of an American League team, and the average quality of a National League team, and subtracted the one from the other.   This, stated in absolute terms, was called the "difference", and the difference was then multiplied by 162 so that it could be stated as a number of games.    In other words, in year 4188 the average quality of an American League team was .50485 and the average quality of a National League team was .51102, a difference of .00617.   Multiplying that by 162, the average National League team was 1.0001 games better than the average American League team in simulated year 4188. 
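
The whole tabulation can be sketched the same way, again assuming uniform draws and again only as an illustration of the procedure described above, not of the original code: 14 teams in one league, 16 in the other, 12 cycles of 1,003 seasons, and the gap between the two league averages converted to games and tallied in one-game buckets.

```python
import random

GAMES = 162
AL_TEAMS, NL_TEAMS = 14, 16
YEARS_PER_CYCLE, CYCLES = 1003, 12

def initial_quality():
    # Year 1: average of ten random numbers (assumed uniform on 0-1)
    return sum(random.random() for _ in range(10)) / 10

def next_quality(prev):
    # Later years: (7 x last year's quality + 3 new random numbers) / 10
    return (7 * prev + sum(random.random() for _ in range(3))) / 10

def league_differences():
    """Absolute AL-vs-NL quality gap, in games, for every simulated season."""
    diffs = []
    for _ in range(CYCLES):
        al = [initial_quality() for _ in range(AL_TEAMS)]
        nl = [initial_quality() for _ in range(NL_TEAMS)]
        for _ in range(YEARS_PER_CYCLE):
            gap = abs(sum(al) / AL_TEAMS - sum(nl) / NL_TEAMS) * GAMES
            diffs.append(gap)
            al = [next_quality(q) for q in al]
            nl = [next_quality(q) for q in nl]
    return diffs

if __name__ == "__main__":
    # Tally the gaps in one-game buckets, as in the counts that follow
    buckets = {}
    for d in league_differences():
        buckets[int(d)] = buckets.get(int(d), 0) + 1
    for games in sorted(buckets):
        print(f"{games} to {games + 1} games: {buckets[games]} seasons")
```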

       How large would you guess were the normal differences between the leagues?   One game?  Two games?   Stop and think about that for a second, because on that question rests the issue of whether my results were reasonable.  

        In my simulation there were

        2231 years (out of 12036) in which the difference between the two leagues was less than one game.

        2090 years in which the difference was one to two games (1.000 to 1.99999999).

        1872 years in which the difference was two to three games.

         In other words, in essentially one-half of the simulated seasons, the difference between the quality of the two leagues was 3 games or more, and in the other half it was less than 3 games.  These are larger differences than I would have anticipated before doing the study.  I would have guessed that the difference between the two leagues would rarely be more than two games.  In fact, it was USUALLY more than two games.    

        Continuing on, there were

        1654 seasons in which the difference was 3 to 4 games.

        1339 seasons in which it was 4 to 5 games.

        1029 seasons in which it was 5 to 6 games.   

        In 15% of the seasons, the average difference in quality between the teams in the two leagues was greater than 6 games—assuming that there was no CAUSE for such a difference; assuming that it was just a random separation.

             There were

             706 seasons in which the difference between the leagues was 6 to 7 games per team.

             473 seasons in which it was 7 to 8.

             264 seasons in which it was 8 to 9.

             161 seasons in which it was 9 to 10. 

             We’re now up to 11,819 total, but that still leaves 217.   In almost 2% of seasons, the difference between the average team in the two leagues was MORE THAN 10 GAMES—a truly huge difference.

             There were

             106 seasons in which the difference was 10 to 11 games.

             53 seasons in which it was 11 to 12.

             29 in which it was 12 to 13.

             13 in which it was 13 to 14.

             9 in which it was 14 to 15.

             4 in which it was 15 to 16.

             2 in which it was 16 to 17.

             And one season—simulated season 7642—in which the average difference between teams in the two leagues was 17.48 games.

             A critical question here, which you are no doubt asking yourself, is “What was your standard deviation of team winning percentage, James?”   Obviously, the higher the standard deviation of team winning percentage in the model, the larger will be the gaps that open up between the leagues.   If this were a more sophisticated study the first thing I would have done would have been to establish that the standard deviation of winning percentage in my model was the same as the standard deviation of winning percentage in real life, whatever that is.   But since this was just an exploratory venture I didn’t do that: I just constructed the model on what seemed like reasonable assumptions and checked to see that the spread of winning percentages looked reasonable.   The standard deviation of team winning percentage in my model was .0696.   I am sure that there is some function somewhere which says that if the standard deviation of team winning percentage is x and there are 14 teams in a league, the standard deviation of league winning percentage will be y—and further, if the standard deviation of league winning percentage is y and there are two leagues, there will be z instances per thousand in which the separation between them exceeds m.   Those functions would be a sort of short cut through this research, but I don’t know what those functions are.
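
For what it's worth, under the simplifying assumptions that team qualities are independent and roughly normal, there is a standard shortcut of the kind described above: a league average of n teams has standard deviation sd_team divided by the square root of n, and the difference between two independent league averages has standard deviation sd_team * sqrt(1/14 + 1/16). A small sketch of that calculation, offered as an approximation rather than a substitute for the simulation:

```python
from math import sqrt, erf

def prob_gap_exceeds(m_games, sd_team=0.0696, n_al=14, n_nl=16, games=162):
    """Normal-approximation estimate of how often the gap between two
    independent league averages exceeds m_games, given the per-team SD."""
    # SD of the difference between the two league averages, in games
    sd_gap_games = sd_team * sqrt(1.0 / n_al + 1.0 / n_nl) * games
    z = m_games / sd_gap_games
    # Two-sided tail of a standard normal: P(|Z| > z)
    return 1.0 - erf(z / sqrt(2.0))

if __name__ == "__main__":
    for m in (3, 6, 10):
        print(f"P(gap > {m} games) is about {prob_gap_exceeds(m):.3f}")
```

Plugging in the .0696 figure, this approximation puts the gap over 3 games in roughly 47% of seasons, over 6 games in about 15%, and over 10 games in about 1.5%, which is in the same neighborhood as the simulated counts above.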

Another flaw in my model is that it assumes that the winning percentage of each team rises and falls independently of all other teams, when in fact it is more likely, since the teams compete with one another, that teams tend to track in similar directions, all going up together and down together to some degree.    But I was interested in the question of to what extent we did NOT need to rely on that expectation in order to explain a performance gap between the leagues.

My answer with respect to the Detroit issue is even more problematic and speculative than my answer to the leagues question.   It seemed to me that there was a sort of general resemblance between my model and real life.   In real life you replace some percentage of your team every winter. . .let's call it 30%.   To rebuild your team from "terrible" to "outstanding" in three years—both in the model and in real life—requires that the "replacement parts" be uniformly outstanding.   What I was essentially asking was "How rare was that?"

             The answer I got here was “not as rare as I would have expected”.   In each of the 12 cycles I had 1003 years, which meant that I had 1000 possible comparisons between a team and the same team 3 years earlier.   Since there were 30 teams being “followed” or “modeled”, I had 30,000 this-year-to-three-years-ago comparisons in each cycle, or 360,000 altogether.  

By the way, somebody told me that in Dombrowski’s entire career as a General Manager he has had only three winning teams.   I don’t know whether this is true.   Does anybody know?

Anyway, I labeled a comparison a “hit” or a “Detroit-type occurrence” if a team’s underlying expected winning percentage in a given year was at least .200 greater than the same team’s underlying expected winning percentage three years earlier.   Out of 360,000 comparisons there were 2011 hits, or one for each 179 team/seasons.   In essence, the model suggests that a dramatic three-year improvement in winning percentage should be expected, somewhere in the major leagues, once every six seasons.  This is much more often than I would have guessed—but then, it’s a very crude model for that question, more troubling for that question than for the separation-of-leagues question, because there are more unmeasured variables.   With all of the problems of the other study, the fact is that if we get the standard deviation of team winning percentage right, we’ll probably get the answer about right.   In this part of the study, the “Detroit” part of the study, the parallel question is “Do teams change from season to season in the same ways in the model that they do in real life?”   That’s a question which has many more dimensions, and thus is much harder to drive to an affirmative answer.
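
Here too, only as a sketch under the same uniform-draw assumption as the earlier snippets, is one way the hit-counting might look; the .200 threshold, the three-year window, the 30 teams, and the 12 cycles of 1,003 seasons come from the text, and everything else is illustrative.

```python
import random

TEAMS = 30                        # 14 AL + 16 NL, all followed independently
YEARS_PER_CYCLE, CYCLES = 1003, 12
THRESHOLD = 0.200                 # "Detroit-type" three-year improvement

def initial_quality():
    return sum(random.random() for _ in range(10)) / 10

def next_quality(prev):
    return (7 * prev + sum(random.random() for _ in range(3))) / 10

def count_detroit_hits():
    hits = comparisons = 0
    for _ in range(CYCLES):
        # Simulate a full quality history for each of the 30 teams
        histories = [[initial_quality()] for _ in range(TEAMS)]
        for _ in range(YEARS_PER_CYCLE - 1):
            for h in histories:
                h.append(next_quality(h[-1]))
        # Compare each season to the same team three seasons earlier
        for h in histories:
            for year in range(YEARS_PER_CYCLE - 3):
                comparisons += 1
                if h[year + 3] - h[year] >= THRESHOLD:
                    hits += 1
    return hits, comparisons

if __name__ == "__main__":
    hits, comparisons = count_detroit_hits()
    print(f"{hits} hits in {comparisons} comparisons "
          f"(about 1 in {comparisons / hits:.0f})")
```

Whether this lands near the 2011 hits reported above depends on the uniform-draw assumption; if the random numbers were generated differently, the rate will differ.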

But I would suggest that they must be generally the same, that the way in which teams change from season to season in the model and the way they change in real life must be essentially similar.   But you can re-create the model and study that if you’re of a mind to. . .I’ll leave that up to you.

            We get back to the questions at the beginning:

1)  Are my methods reasonable?

2)  Are my conclusions reasonable? and

3)  Do you have any ideas as to how we can get better answers than these?

            I will look forward to your reactions.

Bill James

 
 

COMMENTS (3 Comments, most recent shown first)

sljy
Dombrowski has had 4 winning seasons counting 2007. Backing up, there have been 16 seasons in which Dave Dombrowski was the GM for the team prior to the season starting, starting with 1989 Montreal Expos to 2007 Detroit Tigers. His teams have been above .500 in 4 of those seasons, '90 Expos, '97 Marlins, '06 Tigers and '07 Tigers, including 2 pennants and 1 World Series winner.
2:51 PM May 9th
 
bokonin
The reasoning seems nice in theory. What I'd ask is this: the era of liberal free agency has been going on for, I dunno, a shade under 20 years? (It's my strong impression that teams were more conservative in re-shaping themselves until the 1990s.) A .200 winning percent increase is 32 games, basically. Have there been three teams that leapt by 32 games in three years, in that time? There's the Tigers, but also the Diamondbacks 2004-07. Billy Beane only got a 28-game three-year improvement... the Yankees prorate to a 29-game improvement from 1991-94, but that's cheating .. Ooh! The Braves, 1990-93! So yes, I think your model is doing fine, is maybe even somewhat cautious (if I've missed anybody, which is very possible).

Of course, the Tigers jumped 52 games. Now that we've established your model's credibility, how often did THAT happen in your model?
12:40 PM Apr 1st
 
tangotiger
The observed standard deviation (SD) of win% over the last few decades (I think since 1961) was .072, mighty close to what you ended up with. The TRUE standard deviation of win% is .060. That is, if you start with a distribution of teams where their underlying true rates implies a spread of 1 SD = .060, given 162 games, your observed spread will be 1 SD = .072........... In order to determine the reasonableness of your method, which prima facie it is, I'd like to know what the observed SD is, year by year. I'd like to make sure that you aren't seeding your model in such a way that the SD follows a trend. Can you post that?
12:43 PM Mar 6th
 
 