The Cross-Multiplication Issue

October 28, 2022

            First of all, I have a couple of mistakes to acknowledge here.  In yesterday’s "Hey, Bill" answers, I referred to the problem I am now trying to discuss as a "picayune, niggling, trifling, petty, irrelevant little objection."  This was grossly inappropriate.  First of all, it was grossly inappropriate because it was disrespectful to the contrary opinion, and thus rude to the man who raised the issue and to Tom Tango, who supported his objection.  Beyond that, it was grossly inappropriate because, if we are discussing the implications of the method—which we are—then the objection that the gentleman raised was a serious issue that should have been taken seriously, rather than dismissed in this fashion. 

            The gentleman who raised the question was tjmaccarone, or at least that is the name that he uses here; my apologies to Mr. Maccarone.  But the next day, Mr. Maccarone sent another post, trying to expand on the point that he had raised.  I read the post, decided not to publish it, and deleted it.  This wasn’t improper; I am the editor of this site, I have to make editorial decisions, I have to decide when a discussion is no longer productive and we’re going to move on.   But it was a BAD editorial decision.  It was a mistake.   I deleted his query before I took the time to understand the implications of what he was saying.  I’d like to recover the query and post it as a headnote to this article, but. . .it’s gone.  Sorry.  My bad.

            What the gentleman was saying, if I can reconstruct it from memory, from my perhaps imperfect understanding of it, was that the method was. . . well, good enough for government work, but that

1)     If you applied it to all of the teams, every game of a schedule, there would be small distortions, and

2)     Under some conditions, such as perhaps in the NBA where the standard deviation of winning percentage is larger, these might become meaningful issues. 

Well, yes and no.  The distortion that Mr. Maccarone describes DOES exist; it is present in the test that he proposes.  I believe it is actually larger than he suggests.  But the problem is not in the cross-multiplication approach.  The problem is in the test that he wants to use, which is completely irrelevant to the method. 
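For readers following along, the cross-multiplication method under discussion is the "odds ratio" or "log5" formula. The following is an editor's illustration, not part of the original text; the function name is mine:

```python
# Odds ratio ("log5") matchup formula, sketched for illustration.
# p_a and p_b are the two teams' winning percentages against average opposition.

def log5(p_a: float, p_b: float) -> float:
    """Expected winning percentage of team A against team B."""
    return (p_a * (1 - p_b)) / (p_a * (1 - p_b) + (1 - p_a) * p_b)

# The property the method is built on: a .600 team plays .600 ball against a .500 team.
print(round(log5(0.600, 0.500), 3))  # 0.6
# Against a .400 team, the same .600 team is expected to play about .692 ball.
print(round(log5(0.600, 0.400), 3))  # 0.692
```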

All "closed" leagues have a .500 cumulative winning percentage.  In the major leagues now there is inter-league play, but the two leagues together form a closed system which must always center at .500.   The "output" strength of every league is a .500 winning percentage. 

But does this represent reality?  Well, of course it does not.   A high school league and a major league both have exactly the same output winning percentage, but is the strength of the leagues the same?   In 1953 (or in any season in that era), the National League and the American League both had output winning percentages of .500.  Does this prove that the American League was just as strong as the National League in that era? 

Well, of course it does not.   If the output winning percentages were true representations of the strength of the teams, that WOULD be the case, but it isn’t. The American League in 1960 had a winning percentage of .500.  In 1961 they added two expansion teams.  In 1961 they still had a winning percentage of .500.  Does this prove that the expansion did NOT water down the quality of the league? 

Of course it does not. 

What I am saying is, the "output winning percentage" of any and every league is a misrepresentation of reality.   There is no rule in nature that the composition of every league must contain a .400 team and a .600 team, a .450 team and a .550 team.  There is no rule in nature that a league must be "balanced."   It is nonsense.  There is no such requirement for the formation of leagues.

The "small inaccuracy" that Mr. Maccarone cites results from imposing on the league a requirement that the league must be balanced.  You’ve got a .400 team; well, you’ve got to have a .600 team.   No.  There is no such rule.  This does not happen in reality. 

This is what happens in reality. . . let us say this represents 16 teams which are divided into two leagues of eight.   The teams have "natural strength levels" which are like this:

 

Atlanta         AL    .500
Boston          AL    .563
Chicago         AL    .549
Dallas          AL    .541
El Paso         AL    .465
Fresno          AL    .345
Greensboro      AL    .726
Houston         AL    .562

Indianapolis    NL    .433
Jacksonville    NL    .663
Kansas City     NL    .559
Los Angeles     NL    .515
Memphis         NL    .284
New York        NL    .485
Oklahoma City   NL    .686
Phoenix         NL    .518

American League Avg   .531
National League Avg   .518

 

Or like this:

Atlanta         AL    .407
Boston          AL    .391
Chicago         AL    .500
Dallas          AL    .695
El Paso         AL    .486
Fresno          AL    .505
Greensboro      AL    .525
Houston         AL    .498

Indianapolis    NL    .496
Jacksonville    NL    .500
Kansas City     NL    .370
Los Angeles     NL    .501
Memphis         NL    .637
New York        NL    .607
Oklahoma City   NL    .709
Phoenix         NL    .448

American League Avg   .501
National League Avg   .534

 

Or like this:

Atlanta         AL    .520
Boston          AL    .452
Chicago         AL    .489
Dallas          AL    .636
El Paso         AL    .337
Fresno          AL    .720
Greensboro      AL    .506
Houston         AL    .484

Indianapolis    NL    .524
Jacksonville    NL    .583
Kansas City     NL    .337
Los Angeles     NL    .498
Memphis         NL    .324
New York        NL    .423
Oklahoma City   NL    .500
Phoenix         NL    .542

American League Avg   .518
National League Avg   .466

 

Or like this:

Atlanta         AL    .437
Boston          AL    .472
Chicago         AL    .271
Dallas          AL    .500
El Paso         AL    .265
Fresno          AL    .494
Greensboro      AL    .464
Houston         AL    .750

Indianapolis    NL    .739
Jacksonville    NL    .361
Kansas City     NL    .502
Los Angeles     NL    .516
Memphis         NL    .738
New York        NL    .500
Oklahoma City   NL    .464
Phoenix         NL    .743

American League Avg   .456
National League Avg   .570

 

           Whatever the actual strength levels of the teams in each league, the experience of playing through a schedule is going to push every league to a .500 level.   But the experience of playing through a league doesn’t have a damned thing to do with this problem.  The method is based on the actual strength of the various teams, whatever that is. 
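The point that a closed schedule forces the output to .500 regardless of the input strengths can be checked with a quick simulation. This is an editor's sketch under stated assumptions (odds-ratio matchups, the first table's AL strengths, 100 meetings per pair); none of it comes from the original text:

```python
import itertools
import random

random.seed(1)

# "Natural strength levels" for one eight-team league, taken from the first
# table above; the matchup model (odds ratio) is an assumption of this sketch.
strengths = {"Atlanta": .500, "Boston": .563, "Chicago": .549, "Dallas": .541,
             "El Paso": .465, "Fresno": .345, "Greensboro": .726, "Houston": .562}

def log5(p_a, p_b):
    return p_a * (1 - p_b) / (p_a * (1 - p_b) + (1 - p_a) * p_b)

wins = {t: 0 for t in strengths}
games = {t: 0 for t in strengths}
for a, b in itertools.combinations(strengths, 2):
    for _ in range(100):  # 100 meetings per pair
        p_a_wins = log5(strengths[a], strengths[b])
        winner = a if random.random() < p_a_wins else b
        wins[winner] += 1
        games[a] += 1
        games[b] += 1

# Every game produces exactly one win, and every team plays the same number
# of games, so the league's average winning percentage is .500 by construction.
league_pct = sum(wins[t] / games[t] for t in strengths) / len(strengths)
print(round(league_pct, 3))  # 0.5
```

However unbalanced the true strengths, the observed league output is pinned at .500, which is exactly why the output winning percentage tells us nothing about the league's absolute strength.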

            Mr. Maccarone has stated directly that the winning percentage of the .600 team against a .400 team plus a .600 team has to be the same as their winning percentage against a .500 team.   But there isn’t ANY reason to believe that that is true.   The belief that it is true is based on the assumption that the league must be balanced.  But the league only APPEARS to be balanced because we force the teams through a schedule which has to result in a .500 record.  The league is NOT balanced; the true strength balances at SOME point, but we have no way of knowing what it is.  The true strength of each league balances at some unknown point, with no legitimate expectation that any two points will add up to anything.  
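The arithmetic behind the disagreement can be made concrete. Under the odds-ratio formula, a true .600 team's average result against a true .400 team and a true .600 team does NOT equal its result against a .500 team; that equality only holds if you assume the opponents' strengths must balance. A quick check (an editor's sketch, restating the log5 formula so the example is self-contained):

```python
# log5 / odds ratio matchup formula.
def log5(p_a, p_b):
    return p_a * (1 - p_b) / (p_a * (1 - p_b) + (1 - p_a) * p_b)

vs_400 = log5(0.600, 0.400)   # about .692
vs_600 = log5(0.600, 0.600)   # exactly .500
avg = (vs_400 + vs_600) / 2   # about .596 -- not .600
vs_500 = log5(0.600, 0.500)   # exactly .600
print(round(avg, 3), round(vs_500, 3))  # 0.596 0.6
```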

           When you think about it, I think you will realize that that is true. Feel free to convince me otherwise.  But we’re not working this out in "Hey, Bill."  I don’t have the patience for that, or the right attitude, or something.   I’m a very flawed human being. 

 

 

 

 
 

COMMENTS (30 Comments, most recent shown first)

tangotiger
True winning percentage is not Actual winning percentage. True is true. Actual is observed. Pythag is also observed.
10:17 AM Nov 2nd
 
tjmaccarone
I'd be surprised if it really mattered if you worked off the Pythagorean method versus true winning percentage.
6:58 PM Nov 1st
 
TonyClifton
Would working off Pyth rather than true W/L improve results?
4:28 PM Nov 1st
 
tangotiger
NHL: Just move Shootout Losses into Actual Losses. Or, use Regulation Points (0.5 wins and 0.5 losses for games that go into OT). The latter is probably the best thing to do sabermetrically.
7:44 AM Oct 31st
 
DefenseHawk
Just don't try this out on the NHL. Because the league composite winning percentage is above .500.

This is because of something called Bettman Points, aka a "Three Point Game."

If you lose a game in overtime or in a shootout, you get a point. Since your opponent gets two points for the win, dividing the points awarded by two gets you 1.5 instead of 1.0.

Sure, you can figure a team's "true winning percentage."

A = Points Awarded / Games Played
B = Opponents Points Awarded / Games Played

Thus, "True Winning Percentage" = A / (A+B)

If a team, let's say the Ottawa Senators, has a record of 10 wins, 9 losses and 1 overtime loss, they'd have 21 points in those 20 games.

But three of their wins came in overtime or a shootout.

Their opponents recorded 23 points (20 in Ottawa's 10 losses, including the overtime loss, and 3 more in Ottawa's 3 OT/SO wins) in those games.

The NHL says their "points percentage" is .525. (they started calling it "points percentage" only after criticism.)

But is Ottawa better than a .500 team in terms of winning percentage? After all, they gave up more points to their opponents than they themselves were awarded.

So what would be the Senators' "True Winning Percentage"?

A = 21 / 20 / 2 = .525
B = 23 / 20 / 2 = .575

True Winning Percentage = .477.
12:24 AM Oct 31st
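[Editor's note: the bottom-line arithmetic in the comment above checks out. A small script reproducing it, using the figures as given in the comment:]

```python
# Hypothetical Ottawa record from the comment: 10 wins, 9 losses, 1 OT loss,
# with 3 of the wins coming in OT/shootout, so opponents banked extra points.
points_for = 21       # 10 wins x 2 points + 1 OT loss x 1 point
points_against = 23   # 2 each for Ottawa's 10 losses + 1 each for its 3 OT/SO wins
games = 20

a = points_for / games / 2        # 0.525, the NHL's "points percentage"
b = points_against / games / 2    # 0.575
true_wpct = a / (a + b)
print(round(true_wpct, 3))  # 0.477
```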
 
tjmaccarone
I did something a bit further with the basketball upset data. I checked whether the upsets were at home or on the road. Of the 14 games where a below .250 team beat an above .750 team, 12 were won by home teams.

The normal home court advantage in basketball is such that the home team wins 60% of the time. The odds ratio approach suggests that this can be achieved by a 10% penalty to the road team and a 10% bonus to the home team. If I do this, I find that the road teams should have won about 6 upsets and the home teams should have won about 11. This difference is both a factor of 3 and statistically significant at the 94% level. Given the small sample, the difference might be less than a factor of 3, but it is almost certainly real, and it could even be greater than a factor of 3.

As another example, in the NCAA tournament, generally at neutral sites, the 16 seeds are 1-143 against the 1 seeds. Here, Bill's point about the schedules the teams play being different is important. The 1 seeds are generally about .900 teams from the toughest conferences and the 16 seeds are generally .550 teams from the worst conferences. It would take some careful work to figure out what those records really mean on the same scale, but if you work it through just by assuming the .900 team plays a "normal" schedule, and figure what you need from the weaker team to get 1 win in 144 games, you'd come to the conclusion that the 16 seeds would go something like 2-28 if they played a Big 12 team's schedule. I don't buy that. I think the odds ratio drastically overpredicts major upsets.

5:47 PM Oct 30th
 
tjmaccarone
Your .750 versus .738 is for the whole league. For some individual matchups, it will be a much bigger effect. Beyond that, it's not straightforward to figure out how to fix it.
1:43 PM Oct 30th
 
tangotiger
"One is whether the formula works well under a set of idealized assumptions, but for teams far from .500. I think the answer to that is clearly "no.""

I don't know why you are making it binary like that. I've already shown that we are talking about the difference between .750 and .738. To me, that's clearly perfectly fine, and I'll continue to use Odds Ratio. To you, that's clearly perfectly NOT fine.

Rather than talk about it in binary terms, just specify the error range, and then we can move forward, or move on, as the case may be.
1:27 PM Oct 30th
 
tjmaccarone
Sure, but if that's the case, then there's almost no point in having a method at all. If you think you can come up with a different "true" winning percentage at different times of year, then the question is how you combine those numbers.

From a mathematical point of view, using the existing version of the formula with real records assumes that the record is compiled in an environment where (1) both teams were playing, on average, .500 teams and (2) the quality of those .500 teams is the same. Otherwise, Bill's constraint that a .600 team should come out to .600 against .500 teams isn't a meaningful one, and his argument that this has to be the right formula fails right away. This also comes out of a more formal mathematical approach to the problem.

So I think there are really two separate issues. One is whether the formula works well under a set of idealized assumptions, but for teams far from .500. I think the answer to that is clearly "no." The other is whether that issues is as important as confounding factors like the ones Bill brought up, or TonyClifton or raincheck brought up. I think the answer to that is that the other factors are almost certainly more important for major league baseball teams. For other situations like e.g. whether it accurately predicts how many non-sluggers could hit homers off Mariano Rivera -- I think the issues I raised may be relevant.
1:03 PM Oct 30th
 
TonyClifton
Oh, and something that I have been noodling on.....in this tear down era, how reflective of a full season is a team anyway? I look at the Nats in 2021....40-38 on July 1, they executed an epic teardown*, and finished the season on a 16-43 tear.

*Traded Kyle Schwarber, Brad Hand, Yan Gomes, Josh Harrison, Max Scherzer, Trea Turner and Jon Lester.

Two utterly different teams, a blended season record yields no insight.

And on the flip side, the Dodgers, while an excellent team all year, played something like .750 ball after adding Scherzer and Turner.
2:48 AM Oct 30th
 
TonyClifton
Back in the 60s I was fooling around on my Commodore 64 and came up with a thing called "Pythagorean Winning Percentage" (copyright TClifton, 1967) which I calculated by......ahh, you guys probably couldn't understand it, but maybe it would help here rather than actual W/L....
2:22 AM Oct 30th
 
tjmaccarone
Yes, of course that's the case, but if there's any meaning to making a formula that predicts how one team will do against another, then it should be as correct as possible for some ideal cases. Then one can make it more complicated for less ideal cases.
6:44 PM Oct 29th
 
raincheck
The definition of a “.600 team” is also slippery. Which of their pitchers were you facing? Was a key player injured or resting? Was it home or road? Sunny or blustery? Was the shortstop hung over?

There are million factors that make any one contest between two teams something other than a pure mathematical product of their records.
5:00 PM Oct 29th
 
tjmaccarone
No, that's not the problem at all. If 0.000 or 1.000 were the real underlying percentages, the odds ratio would be fine. Where I believe it runs into problems is for things a bit less extreme than those.

4:19 PM Oct 29th
 
Mike137
tjmaccarone -- If there are "major issues" then you will need to state them and provide some basis for why you think them important.

In a game, the "true winning percentage" is presumably not an issue since that would be built in.

The odds ratio gives a central value. To predict the probability of upsets requires distributions. In extreme cases, distributions become asymmetric. For instance, an undefeated team playing a winless team gives expected winning percentages of 100% and 0%. But any deviation can be in only one direction. So if you play enough games, you will eventually get less than 100% and greater than 0%. But that is not a problem with the odds ratio; it is a problem with misinterpreting it.
3:26 PM Oct 29th
 
tjmaccarone
Mike137 -- I still think that there are some substantial issues here, if you're talking about probabilities of major upsets. In the 1987 Abstract, Bill raised the issue of getting matchups right for tabletop games. I think the probability of rare events is probably overestimated for things like that, too, and the range of probabilities for some of those rare events is much larger than a factor of a few.



2:31 PM Oct 29th
 
Mike137
The initial discussion was not silly. Tjmaccarone pointed out a seeming paradox; such things are valuable in testing understanding.

But now we have an explanation. We have learned something from it. A team's actual winning percentage is not its true winning percentage against .500 teams, unless it is a .500 team. That introduces errors in using the formula, but the errors are typically only a percent or two. That is useful.
1:56 PM Oct 29th
 
mauimike
I realize that most of the things we discuss are silly BS, but this is sillier than most.
11:52 AM Oct 29th
 
tjmaccarone
I agree, Tango. But if we had the "true" winning percentages for these outlier teams, both the good and bad teams would almost certainly be closer to .500. That would *increase* the number of upsets expected. This provides additional support for the idea that the odds ratio overestimates the number of upsets.
12:55 AM Oct 29th
 
tangotiger
You really have to be careful in treating an OBSERVED .750 win% as a TRUE .750 win%. The way you are doing it, it's not going to tell you what you want it to tell you.
10:25 PM Oct 28th
 
tjmaccarone
And the reason I think this may be a bigger effect than what Tango wrote for his simulation for NBA-like winning percentages is that the average over the whole league includes mostly games against teams with "normal" winning percentages. If there is a major problem, it will be mostly in the mismatches. It seems likely to me that log5 would overpredict the number of major upsets by a much larger factor than any other problem it might have.
10:03 PM Oct 28th
 
tjmaccarone
I did a test with NBA games, going back to 1961. I looked at all games where a >=.750 team played a <=.250 team. There were 245 of these, 14 of which were upsets. The expectation from log5 is that there would be 16.7 upsets. Unfortunately, the numbers aren't big enough to say much from it. On the surface, the number of upsets is overestimated by 20%, but of course with small sample sizes, it's not conclusive. I suppose I could try going down to 2/3, 1/3 and the sample would be much bigger, but I probably won't spend the time.
9:58 PM Oct 28th
 
tangotiger
To show the math: The difference of two distributions of Runners A and B has a mean of -0.05, with a standard deviation of 0.1414. That gives us a win% of .638.

Similarly, Runners B v C would result in a win% of .638.

The Odds Ratio method (or Log5) would say .757 win%.

But, if you use the difference of distributions of Runners A and *C*, that has a mean of -0.10, with a standard deviation of 0.1414. And that is .760 win%.

As you can see: Odds Ratio is close but not perfect.

So I suppose what you need to do is convert each team's true talent "win%" into some value. So, instead of ".600", that team has a value of "3.6", with a standard deviation of 10. A .500 team has a value of 0, and same standard deviation. And the difference of two distributions with those means and standard deviation is .600.

A .400 team has a value of -3.6, with an SD of 10.

And the difference of those two distributions? .695.

Therefore, Odds Ratio is a very very close and simple approximation. But if you want to take the extra step and go the extra mile and go full probabilistic, then this is how you do it.

Hopefully one or two people followed all that.
9:53 PM Oct 28th
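[Editor's note: Tango's runner numbers in the comment above can be reproduced with the normal CDF. A sketch, using only the Python standard library:]

```python
from math import erf, sqrt

def normal_cdf(z: float) -> float:
    return 0.5 * (1 + erf(z / sqrt(2)))

def p_faster_wins(mean_gap: float, sd_each: float = 0.1) -> float:
    """P(the runner with the lower mean time wins); the difference of two
    independent normals has sd sqrt(2) times the individual sd (~0.1414)."""
    return normal_cdf(mean_gap / (sd_each * sqrt(2)))

p_ab = p_faster_wins(0.05)   # A (10.00s) vs B (10.05s): ~.638
p_ac = p_faster_wins(0.10)   # A (10.00s) vs C (10.10s): ~.760

# Chaining A-over-B and B-over-C through the odds ratio instead:
odds = (p_ab / (1 - p_ab)) ** 2
p_chained = odds / (1 + odds)  # ~.757: close to the direct .760, not exact
print(round(p_ab, 3), round(p_ac, 3), round(p_chained, 3))  # 0.638 0.76 0.757
```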
 
tangotiger
The example I use is running in a race. Suppose you have a runner A that runs in 10 seconds, with a normal distribution of 1 standard deviation of 0.1 seconds.

You have another runner B who runs in 10.05 seconds on average, with the same distribution.

And another runner C in 10.1 seconds, with the same distribution and so on up to runner Z.

You can compare the two distributions to see how often runner A beats runner B. And then how often runner B beats runner C and so on. Each of these matchups actually will generate identical winning percentages.

Now, if you tried to use the log5 or Odds Ratio to determine if Runner A beats Runner C (by multiplying AB and AC), you will be close, but not exact if you used the two distributions directly of AC.

And you can keep going and going and the more space between the two runners, the less the Odds Ratio will work. The distribution method will always work, because those are actual distributions of each runner. The Odds Ratio basically tries to capture the effect of those distributions, but it's not perfect.



7:18 PM Oct 28th
 
jgf704
As I see it, odds ratio is just a model. It's got a good probabilistic basis (there is a nice paper on this in the Fall 2014 Baseball Research Journal). But, like all models, it can be off in certain cases.

An alternative model is

Wpct(A vs. B) = 0.5 + Wpct.A - Wpct.B

This one does not "fail" tjmaccarone's example, but it does fail in places the odds ratio method does not (when Wpct.A or Wpct.B equals 1 or 0).

FWIW, this alternative model is what you get if you linearize the non-linear odds ratio model around Wpct = 0.5.
4:23 PM Oct 28th
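[Editor's note: jgf704's linear alternative in the comment above is easy to compare side by side. A sketch:]

```python
# Odds ratio ("log5") model.
def log5(p_a, p_b):
    return p_a * (1 - p_b) / (p_a * (1 - p_b) + (1 - p_a) * p_b)

# The linearized alternative from the comment.
def linear(p_a, p_b):
    return 0.5 + p_a - p_b

# Near .500 the two models agree closely...
print(round(log5(0.55, 0.45), 3), round(linear(0.55, 0.45), 3))  # 0.599 0.6
# ...but the linear model breaks at the extremes, where log5 behaves:
print(round(linear(1.0, 0.6), 3))  # 0.9 -- but a 1.000 team should never lose
print(round(log5(1.0, 0.6), 3))    # 1.0
```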
 
Mike137
I think that part of the confusion might be due to not carefully distinguishing between "true talent" and "actual record".

Imagine a two team "league" in which one team has a true talent level of .600 and the other .400. After playing a large sample of games, we should not expect their records to approach .600 and .400. We should expect the records to approach the odds ratio results of .692 and .308. If the two teams play each other, we obviously should expect the better team to win 69.2% of the time. But if we use the actual winning percentages with the odds ratio method, we get an incorrect expectation of 83.5%.

With a real league, even the best and worst teams play opponents who are on average near .500. So the error is small, as Tango finds.
3:10 PM Oct 28th
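[Editor's note: Mike137's two-team example above is easy to verify. A sketch:]

```python
def log5(p_a, p_b):
    return p_a * (1 - p_b) / (p_a * (1 - p_b) + (1 - p_a) * p_b)

# Two-team league: true talents .600 and .400, playing only each other.
rec_good = log5(0.600, 0.400)   # expected record ~.692
rec_bad = 1 - rec_good          # ~.308

# Feeding the *observed records* back into the formula overstates the gap:
naive = log5(rec_good, rec_bad)
print(round(rec_good, 3), round(naive, 3))  # 0.692 0.835
```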
 
tangotiger
Well, I have a working model that lets me do all kinds of stuff. And I can tell you that plugging in a .600 team for an NBA-type distribution of opponents will result in an average of .594.

A .750 team will result in .738, which is pretty close to as bad as it gets. How much of a problem this is, I guess it's up to whatever it is that you are trying to do.
1:20 PM Oct 28th
 
tjmaccarone
Bill,

First, I appreciate the apology.

Second, I think a lot of what you write in this article is correct, but I still stand by my original objection as well. You originally argued that because the odds ratio clearly gave the correct result for a few special cases, it must be correct. That's clearly not the case, as there are an infinite number of mathematical functions that can run through a set of points.

As for the unbalanced schedule -- for sure, in the real world, that's the case, but a correct formula should be able to do what it is supposed to do in abstract cases. The basis of the odds ratio approach is that the league average is .500, so a .600 team is 1.2 times as good as the league average. If that's not the case, then the winning percentage that the team actually achieved isn't reflective of its real talent level, and a different number should be used in the odds ratio.

I'd tend to agree with Tom Tango's statement that for Major League Baseball, this is a rounding error in all cases, and I made that point when I wrote you in Hey Bill. I think for other sports, like basketball, where the top teams often win 75% or more of their games, and the bottom teams often win 25% or less of their games, this could make a big difference. And in college basketball, where the top teams often win 90% of their games, it could be an even bigger deal (although there, the schedules are extremely unbalanced).

My prediction would be that if one looked at NBA teams' records, for the best teams against the worst teams, the best teams would do a bit better than the odds ratio predicts, and that there are enough NBA games that have been played between .750 teams and .250 teams that this would be meaningful, at least at the level a gambler might care about (a difference of about 1 in every 20 such games, so that probably after a few seasons, this gap starts to mean something). I'm not sure I know or could quickly learn a good programming tool to pull that information off basketball-reference quickly, or the patience to do it by looking up the results one by one, but I may give it a try. Obviously for college basketball, the gap would be bigger, but given how unbalanced the schedules are, much harder to work out properly.
10:51 AM Oct 28th
 
tangotiger
This is really a question of being mathematically perfect, or just being really really close.

But if you work it out, as I have (though my interest is more in players than teams, but the same principle applies), the Odds Ratio is a very close approximation but it is not mathematically perfect.

If you choose ANY distribution of opposing teams, and you start with a presumed win% (say .600), and then you match up that team, one by one, via Odds Ratio, you will get something approaching, but not equal to, .600000. Say you get .599.

Well, then if you now presume your team is .599, and make it go through the cycle, you end up with .598 as the overall average. And so on.

You can also remove the team itself, and so rather than this team facing one of 30 teams, it faces one of 29 teams. Suddenly, this .600 team will be a .602 team.

I suppose you can try to figure it out such that a true .600 ends up being an average .600 against a matchup of teams, based on how much it plays against "itself".

In the grand scheme of things, it amounts to a rounding error. It's really more of a mathematical, probabilistic fun exercise rather than learning anything new analytically.

It's really not worthy of any debate other than a scholarly one.
9:11 AM Oct 28th
 
TonyClifton
Well, Bill, if I see T-Mac around, I'll direct him here for your apology.

But I dunno, he was hitting the Glenmorangie 18 pretty hard down at the BJOL RP Lounge last night.

Kept saying " Bill called me a niggler....."
7:22 AM Oct 28th
 
 
©2022 Be Jolly, Inc. All Rights Reserved.