Time on the Cross-Multiplication
(Sabermetrics is essentially derived from Econometrics. One of the foundation texts of Econometrics was "Time on the Cross", a 1974 book by Robert Fogel and Stanley Engerman. Just paying homage there.)
On a list of my favorite things to do, admitting that I was on the wrong side of an issue ranks somewhere between "posting bail for a relative" and "cleaning up after a house fire." Occasionally, however, there is an obligation that one can no longer duck. I am fairly sure now that in the October 28 article, "The Cross-Multiplication Issue", I was batting left-handed against a left-handed pitcher. I was on the wrong side of the plate.
How do we know this? Somebody in the discussion suggested, I think, that if you figured the expected wins for every team in a league game by game, they would not add up to exactly the team’s wins (and losses). So I thought "OK, why don’t we do that, then? It doesn’t sound like it would be that hard."
And it isn’t hard. First, I searched for a "perfect test league", by which I mean a league in which (a) every team plays every other team the same number of times, and (b) the schedule is completed, with every game played. You can perform the test on any team regardless of games missed or the balance of the schedule; it is just easier if you’re not skating around those issues.
For perfect test leagues I chose the National League in the years 1908 and 1963. In 1908 the NL completed their schedule with each team playing 22 games against every other team (granted, this took a little extra effort), and the 1963 NL completed their schedule with each team playing 18 games against every other team. In 1963 LA (99-63) played 18 games against New York (51-111). Los Angeles has an expected .774 winning percentage (.773764) against the Mets, which gives them an expectation of 13.93 wins in 18 games. Do that for LA's eight other opponents, and you have LA's expected wins for the season; do the same for the nine other teams, and you have every team's expected wins (and losses) for the 1963 season. It's not hard.
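For the record, that computation can be sketched in a few lines of Python, assuming the cross-multiplication here is the familiar log5 matchup formula, p = A(1-B) / (A(1-B) + B(1-A)); the function name is mine, for illustration:

```python
# Illustrative sketch, assuming the cross-multiplication is the log5 formula.

def cross_multiply(a: float, b: float) -> float:
    """Expected winning percentage of a team with percentage `a`
    against an opponent with percentage `b`."""
    return a * (1 - b) / (a * (1 - b) + b * (1 - a))

# 1963: LA went 99-63 (.611), New York went 51-111 (.315)
la, ny = 99 / 162, 51 / 162
p = cross_multiply(la, ny)    # about .773764 against the Mets
print(round(18 * p, 2))       # expected LA wins in 18 games: 13.93
```

Run against the 1963 records, it reproduces the .773764 and the 13.93 expected wins quoted above.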
Except that it doesn’t work. Well, it doesn’t work the way that I had hoped it would work, mostly expected that it would work, and planned to come back here and brag about the fact that it worked. I wanted LA to come out with an expectation of 99.0000 wins, and New York to come out with an expectation of 51.00000 wins. In other words, I wanted the "output won-lost records" from the study to be the same as the input records at the beginning of the study. But LA comes out with 100.52 expected wins, and New York with 48.46. Basically, the process performs worse (in this test) than the other side of the argument suggested that it would.
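The full-league version of the test can be sketched the same way, again assuming the log5 form of the cross-multiplication. The records below are the actual 1963 final standings; the second decimal of the output depends a little on how much you round along the way, but the pattern is the same either way:

```python
# Sketch of the full-league test: 1963 NL, 10 teams, 18 games per pair.
# Assumes the cross-multiplication is the log5 formula.

records = {
    "LA": (99, 63), "StL": (93, 69), "SF": (88, 74), "Phi": (87, 75),
    "Cin": (86, 76), "Mil": (84, 78), "Chi": (82, 80), "Pit": (74, 88),
    "Hou": (66, 96), "NY": (51, 111),
}

def log5(a, b):
    return a * (1 - b) / (a * (1 - b) + b * (1 - a))

pct = {t: w / (w + l) for t, (w, l) in records.items()}
expected = {
    t: sum(18 * log5(pct[t], pct[o]) for o in records if o != t)
    for t in records
}

# The expectations add back to the league total of 810 wins, but LA comes
# out above its actual 99 wins and New York below its actual 51.
print(round(expected["LA"], 1), round(expected["NY"], 1))
```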
OK, two questions:
(1) Why doesn’t it work?, and
(2) What adjustments could be made to make it work perfectly?
Working on this, I have come to an understanding about the first question. I understand now why the system doesn’t work, and I’ll explain that later. The second question I am still working on (well, not really working on, because I have other stuff I am more interested in), so I don’t have the answer for you.
A few points to begin with:
1) There doesn’t seem to be any difference, for purposes of this study, between "actual" data such as that from the 1908 and 1963 National Leagues, and made-up data. The data performs exactly the same whether it is (a) actual data, (b) data that could be actual but isn’t, or (c) data that does not conform to the ordinary rules by which leagues are formed.
2) In the studies that I have done, every team that finished over .500 in real life then overperformed in the test sample. That is, the 1963 Dodgers, who won 99 games in real life, show up with 100.51 expected wins in the test. The 1963 Chicago Cubs, who finished just 82-80, still rise to 82.08 expected wins in the test. Every .500+ team in the study showed an increase, and every sub-.500 team showed a decrease. Not saying this is an absolute rule, but it’s a 100% rule in the studies that I have done.
3) This might seem to conflict with the statement made by Tom Tango in a response posted to the October 28 article:
Tangotiger
Well, I have a working model that lets me do all kinds of stuff. And I can tell you that plugging in a .600 team for an NBA-type distribution of opponents will result in an average of .594.
A .750 team will result in .738, which is pretty close to as bad as it gets. How much of a problem this is, I guess it's up to whatever it is that you are trying to do.
1:20 PM Oct 28th
And other similar statements. What I just said might appear to be in conflict with this, but I’m not really sure that it is. In a discussion like this, what we mean by "overperforming" and "underperforming" is very difficult to keep straight, and what is meant by the .600 team and what is meant by the .594 outcome is very easy to confuse. And there is another issue here, which we’ll get to later in the article; I am just telling you not to jump ahead and assume that these two statements are in conflict; they probably aren’t. You’ll understand by the end of the article, if you can somehow stay awake and stay sober.
4) Several people have assumed that increasing the standard deviation of winning percentages would increase the discrepancy or error here (that is, the difference between the expected outcome and the actual outcome). Actually, it appears more likely that increasing the standard deviation of winning percentage would DECREASE the discrepancy, not increase it. I’m not certain of that; that’s just what I suspect, based on the little studies I have done.
I can see why people would ASSUME that the opposite is true. As teams move further away from .500 (in my tests), the discrepancy between their expected and actual winning percentage increases. The reasons for that are fairly obvious, and I can see why people would assume that that means that the problem would become more serious if more teams in the league moved further away from .500.
There are two reasons I question that assumption. One is that. . . well, how do I explain this? In baseball, the 1963 Dodgers’ .611 winning percentage is a very high percentage. In other sports, a .611 winning percentage would not be the best in the league; it might be the 8th-best winning percentage in a 30-team league, or something.
But if you put other teams in the competition ABOVE the Dodgers (that is, above .611), then that pushes the Dodgers’ expected wins down just a little bit, down closer to the reallife winning percentage.
The bigger issue, though, is that the "discrepancy curve" bends at some point. I don’t know exactly what that point is; I suspect it is about .667, but I haven’t studied it enough to be sure where it bends. When you think about it, though, the discrepancy curve HAS to bend back toward zero as the standard deviation of wins increases. If a team has a winning percentage of 1.000, their expected winning percentage against any opponent will be 1.000, so the discrepancy there HAS to be zero, since a 1.000 team can never lose.
Well, if the discrepancy curve REACHES zero, then it has to APPROACH zero, right? And think about it. Suppose that a .750 team is playing a .250 team, a situation that would never arise in baseball, but could arise in football or basketball, where the standard deviation of winning percentage is larger. If a .750 team is playing a .250 team, then the better team is going to win 90% of the time, and the .250 team will win only 10% of the time. There’s just not that much room for the discrepancy curve to operate. If the formula is saying that the expected outcomes are .900 and .100, then a "corrected" formula can’t possibly say that it is going to be .950 and .050. It can’t say .925 and .075. As the numbers approach zero and 1.000, the discrepancy is being squeezed against the sides of the box. So the discrepancies, rather than becoming MORE significant as the differences between teams increase, might actually become LESS significant. I think.
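Those two endpoints are easy to check, again assuming the log5 form of the cross-multiplication:

```python
# Assumes the cross-multiplication is the log5 formula.

def log5(a, b):
    return a * (1 - b) / (a * (1 - b) + b * (1 - a))

print(log5(0.750, 0.250))   # 0.9: the .750 team wins 90% of the time
print(log5(1.000, 0.750))   # 1.0: a 1.000 team never loses, so zero discrepancy
```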
OK, moving on now toward understanding/acceptance of the fact that my previous position was in error. In what we can call the first round of studies, I found that every winning team had "output" expected wins that were greater than their actual wins, and every losing team had expected wins that were LESS than their actual wins.
Well, I thought. . . .trying to defend my previous position. . . .well, I thought, Sure. The discrepancy occurs because the .600 team is always playing in what is not otherwise a .500 league. A .600 team comes out with an expected winning percentage greater than .600 because their opponents—the OTHER teams in the league—are under .500.
If that was true, then I could use that to justify my previous argument, and stick to my guns. But unfortunately, that isn’t true, either. I modified test leagues so that, for example, the 1963 Dodgers are still 99-63, but the rest of the league is at .500. This is what I referred to earlier as "data that does not conform to all of the ordinary rules by which leagues are formed." In real life you can’t have one team at .611 and the rest of the league at .500, but in a theoretical test, you can. (Actually, you could make that work in real life, but let’s not go there; there is no profit in it.)
Anyway, the point is that I thought surely that would show that the .611 team plays .611 ball, if their opposition is .500 teams. But actually it doesn’t. It gets them CLOSER to .611, but it overshoots the mark. The .611 team (99-63), rather than coming in with 100.52 expected wins, now comes in at 98.27 or 98.31 or something. Closer to 99.0000, but not there. So why does that happen?
Go one step further. Suppose that you put the 1963 Dodgers (99-63) up against a league which is .500 even without the Dodgers and in which all of the other teams are the same. The Dodgers are 99-63, and every other team is 81-81. THEN the process works; then the Dodgers come out with 99.00000 expected wins.
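That case is easy to verify, because against a .500 opponent the log5 form of the cross-multiplication simply returns the team's own percentage:

```python
# The synthetic league above: LA at 99-63, nine identical 81-81 opponents,
# 18 games against each. Assumes the log5 form of the cross-multiplication.

def log5(a, b):
    return a * (1 - b) / (a * (1 - b) + b * (1 - a))

la = 99 / 162                      # .611111
exp_wins = sum(18 * log5(la, 81 / 162) for _ in range(9))
print(round(exp_wins, 5))          # 99.0, since log5(a, .500) = a and .611111 x 162 = 99
```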
So what do we learn from that?
We learn that there are actually two reasons why the expected wins do not add up to exactly the team’s actual wins, why the output data is a tiny bit different from the input data. One is that the other teams in the league are not .500 teams overall, and this factor pushes the expected wins for the 1963 Dodgers UP from 99.000. But the other reason is that the expected wins do not stay exactly in place unless the standard deviation of winning percentage for the other teams is zero. This factor pushes the expected wins DOWN from 99.000.
And what that means is, Maccarone was right. Tjmaccarone, early in this discussion, told me that the formula was imperfect because the sum of expectations for a team playing a .400 opponent and a .600 opponent is not the same as the sum of expectations for a team playing all .500 opponents. I now have to concede that not only was he right about that, but that it is a valid and relevant test for the perfection of the method. The two problems stated earlier:
1) Why doesn’t it work?, and
2) What adjustments could be made to make it work perfectly?
We now know the answer to the first of those. It doesn’t work because Maccarone was right.
The other question, what adjustments could be made to make it work perfectly. . . there has to be a theoretical answer to that question, I think, but at this point we don’t know what it is. There has to be some way of tracing the exact curves formed by the data as it moves from .000 to 1.000.
Working on that problem, I tried all kinds of different adjustments before I found what works ALMOST perfectly in most cases. Going back to the National League in 1908. . .these are the actual standings:

         Wins   Losses
Chi       99      55
Pitt      98      56
NY        98      56
Phi       83      71
Cin       73      81
Bos       63      91
Bkn       53     101
St L      49     105
And to that, we can add the expected wins calculated by cross-multiplication, and the discrepancies between the input and output wins:

         Wins   Losses   Output   Discrepancy
Chi       99      55     100.70      1.70
Pitt      98      56      99.61      1.61
NY        98      56      99.61      1.61
Phi       83      71      83.43      0.43
Cin       73      81      72.72      0.28
Bos       63      91      61.98      1.02
Bkn       53     101      51.16      1.84
St L      49     105      46.79      2.21

Total    616     616     616.00     10.72

The sum total of the discrepancies for all eight teams in the league is 10.72 wins, or about 1.34 wins per team.
In order to make the output wins match the input wins, we need to shrink slightly the standard deviation of winning percentages before doing the cross-multiplication. One way to do that would be to use not the actual wins and losses for each team, but the square root of the wins and losses for each team. If you do THAT, however, then the expected wins for Chicago (the league champions at 99-55) drop to 90, actually 89.77. So that’s moving in the right direction, only way too far.
I thought perhaps there was a "perfect exponent" which would move the expected wins for each team exactly as far as they should be moved. There isn’t, but you can come damned close. The most nearly perfect exponent for the National League in 1908 would be .91623. In other words, you can. . .
Well, the inherent error of the process is very small, to begin with. But you can reduce that very small error to a microscopic error if, rather than using each team’s actual wins and losses, you take their wins and losses to the power .91623.
Now, I really have no idea what is meant by "taking a team’s wins and losses to the power .91623". I understand the concept of a square root; I understand the concept of a cube root, or a third root or a fourth root or a fifth root. The fifth root of 99 is that number which, multiplied by itself four more times, becomes 99. The fifth root of 99 is 2.50684; multiply 2.50684 by itself again and again, four times, and you get 99. I understand that.
What is meant by 99 to the power .91623, I don’t really follow. What I know is that if you tell my computer to take 99 to the power .91623, it will give you a number, which is 67.37. If you take 55 to the power .91623, it will give you a number, which is 39.31629. The 1908 Cubs’ record was 99-55. If you PRETEND that their actual record was 67.37-39.32, and you do the same for all of the other teams in the league, and you run these through the cross-multiplication process, you get almost perfect expected wins for every team. If you do that, the sum error for all eight teams, which was 10.72 wins before (1.34 per team), drops to .2930039, which is .0366 per team. Divide that by 154, and it is .000238 per game. It becomes a nearly perfect expectation.
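Here is a sketch of that adjustment on the 1908 league, with the exponent as a parameter: e = 1 gives the raw cross-multiplication, e = .91623 gives the adjustment. The log5 form of the cross-multiplication is assumed, and the helper names are mine:

```python
# Sketch of the exponent adjustment: 1908 NL, 8 teams, 22 games per pair.
# Each W-L record is replaced by (W**e, L**e) before cross-multiplying.

records = {
    "Chi": (99, 55), "Pitt": (98, 56), "NY": (98, 56), "Phi": (83, 71),
    "Cin": (73, 81), "Bos": (63, 91), "Bkn": (53, 101), "StL": (49, 105),
}

def log5(a, b):
    return a * (1 - b) / (a * (1 - b) + b * (1 - a))

def total_error(e: float) -> float:
    """Sum over all teams of |expected wins - actual wins|."""
    pct = {t: w**e / (w**e + l**e) for t, (w, l) in records.items()}
    err = 0.0
    for t, (w, _) in records.items():
        exp = sum(22 * log5(pct[t], pct[o]) for o in records if o != t)
        err += abs(exp - w)
    return err

print(round(total_error(1.0), 2))      # raw method: about 10.72 wins of total error
print(round(total_error(0.91623), 2))  # adjusted: about 0.29
```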
It would be neat if that process worked for every league, but no such luck. If you do the same for the National League in 1963, the .91623 exponent doesn’t work at all; in fact, it makes the process LESS accurate, rather than more.
In the National League in 1963, the "inherent error" of the formula is only 8.59 wins for the entire league, or 0.86 wins per team, whereas in 1908 it was 1.34 per team. But the "perfect exponent" for the National League in 1963 is 1.0011; that is, almost 1.000, but slightly GREATER.
And do you want to know why?
It is because, in the National League in 1963, there are an unequal number of winning and losing teams. There were 7 winning teams, 3 losing teams. That’s why the exponent thing doesn’t work.
I haven’t done A LOT of research here, because I have other things I am trying to work on, but it appears that the exponent .92 (generally) works very well to "correct" or adjust the formula, so long as there are an equal number of winning and losing teams in the league, which there often or usually are. Winning teams always overproject; losing teams always underproject. If there are an equal number of winning and losing teams in the league, then applying the exponent 0.92 (or something very close to it) reduces the standard deviation of expected winning percentage, thereby decreasing the expected wins for the good teams and increasing the expected wins for the bad teams. It balances, so it makes the formula almost perfect. But if there are an unequal number of winning and losing teams in the league, then the "perfect exponent" adjustment doesn’t work, doesn’t help.
OK, that’s about as much as I understand about this process. I’m going back to work on my other stuff now.