I have plowed this ground before, but bear with me for a few minutes. Let us suppose that there are six NFL teams, and six games.
In week one, Arizona beats Baltimore 13-7, Chicago beats Detroit 34-21, and El Paso beats Fort Worth 28-21.
In week two, Arizona beats Detroit 23-13, Baltimore beats El Paso 21-17, and Chicago beats Fort Worth 42-24.
Standings: Arizona 2-0, Chicago 2-0, Baltimore 1-1, El Paso 1-1, Detroit 0-2, Fort Worth 0-2.
We are assuming, for the sake of simplicity, that all the games are played on neutral fields. We can take these scores, and try to figure out how the teams rank. We assume initially that every team has a value of 100. When Arizona beats Baltimore by six, then:
a) the total values for the two teams must total up to 200, and
b) Arizona gets six more points than Baltimore.
Arizona gets 103, Baltimore 97.
By this method Arizona has game scores of 103.0 (for the game against Baltimore) and 105.0 (for the ten-point win over Detroit). That’s an average of 104.0.
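The per-game arithmetic above can be sketched in a couple of lines. (Python, and the function name is mine; rules (a) and (b) are from the text.)

```python
def game_scores(v_a, v_b, margin):
    """Split the two teams' combined input value so that (a) the total
    is preserved and (b) the winner gets `margin` more than the loser."""
    total = v_a + v_b
    return (total + margin) / 2, (total - margin) / 2

# Arizona (100) beats Baltimore (100) by six:
print(game_scores(100, 100, 6))   # (103.0, 97.0)
# Arizona (100) beats Detroit (100) by ten:
print(game_scores(100, 100, 10))  # (105.0, 95.0)
```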
The averages for the six teams at this point are:
Chicago     107.75
Arizona     104.00
El Paso     100.75
Baltimore    99.50
Detroit      94.25
Ft. Worth    93.75
We then change these “output values” to the input values, and re-run the same process. The Arizona/Baltimore game now becomes Arizona (104.0) against Baltimore (99.5). That’s a total of 203.5. In the second round, then, the scores must total up to 203.5, and Arizona must have six more than Baltimore. That makes it Arizona, 104.75, vs Baltimore, 98.75.
After the second round, the rankings of the six teams are:
Chicago     108.6
Arizona     104.4
Baltimore   100.4
El Paso      99.4
Detroit      94.3
Ft. Worth    92.8
After the third round, the rankings become:
Chicago     108.8
Arizona     104.9
Baltimore   100.7
El Paso      98.8
Detroit      94.7
Ft. Worth    92.1
After some very large number of cycles of re-calculation, the values become
Chicago     108.8
Arizona     106.5
Baltimore   100.8
El Paso      97.2
Detroit      96.2
Ft. Worth    90.5
After which you can re-calculate until hell freezes over, and the numbers are never going to change. That’s the end point of the system.
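The whole ranking loop can be sketched in a few lines of Python. (The data is the six-team example above; the names and layout are mine, and 200 cycles stands in for "until hell freezes over.")

```python
# Each game: (winner, loser, margin of victory)
games = [
    ("Arizona", "Baltimore", 6), ("Chicago", "Detroit", 13),
    ("El Paso", "Ft. Worth", 7), ("Arizona", "Detroit", 10),
    ("Baltimore", "El Paso", 4), ("Chicago", "Ft. Worth", 18),
]
teams = ["Arizona", "Baltimore", "Chicago", "Detroit", "El Paso", "Ft. Worth"]
values = {t: 100.0 for t in teams}  # everybody starts at 100

for cycle in range(200):  # far more cycles than needed to stabilize
    game_scores = {t: [] for t in teams}
    for winner, loser, margin in games:
        total = values[winner] + values[loser]            # rule (a): total preserved
        game_scores[winner].append((total + margin) / 2)  # rule (b): winner +margin
        game_scores[loser].append((total - margin) / 2)
    values = {t: sum(s) / len(s) for t, s in game_scores.items()}

for t in sorted(teams, key=lambda t: -values[t]):
    print(f"{t:10s} {values[t]:6.1f}")
# Chicago 108.8, Arizona 106.5, Baltimore 100.8,
# El Paso 97.2, Detroit 96.2, Ft. Worth 90.5
```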
OK, but there is a second thing we can figure with the same data: each team’s tendency to score and allow points. Chicago has scored and allowed a combined 121 points in their two games; Arizona has scored and allowed 56. These are the totals for the six teams:
Chicago     121
Ft. Worth   115
Detroit      91
El Paso      87
Baltimore    58
Arizona      56
If we’re going to predict the scores of games, we need to be able to predict the victor and the margin of victory, but we also need to predict how many points will be scored.
I had a notion that we could perhaps do this in this way. An average NFL game this season has featured 44.51 points (44.0 in this data, but 44.51 in real life). This can be seen as the product of the “scoring and allowing” tendencies of the two teams. The square root of 44.51 is 6.67. If each team has a “scoring and allowing” tendency of 6.67, the game figures to produce 44.51 points.
Let us assume initially, then, that each team has an S&A figure of 6.67, in the same way that we initially assumed before that each team had a “quality” of 100.00. When Arizona played Baltimore there were only 20 points scored in the game. Looking at it from Arizona’s standpoint, if Baltimore’s S&A tendency is 6.67, that implies that Arizona’s must be 20 divided by 6.67, which would be three, or some number very close to it with more digits. Same for Baltimore; their S&A derivative from the first game is also 3.00.
Arizona’s S&A derivatives from the first two rounds, then, are 2.999 (for the game against Baltimore, 13-7), and 5.397 (for the game against Detroit, 23-13). The average of these two is 4.198. These are the averages for the six teams:
Chicago     9.070
Ft. Worth   8.621
Detroit     6.822
El Paso     6.522
Baltimore   4.348
Arizona     4.198
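The first round of the S&A arithmetic can be sketched the same way. (Python; the layout is mine. The 6.67 is the square root of the league’s average points per game, as above.)

```python
# Each game: (team_a, team_b, total points scored in the game)
games = [
    ("Arizona", "Baltimore", 20), ("Chicago", "Detroit", 55),
    ("El Paso", "Ft. Worth", 49), ("Arizona", "Detroit", 36),
    ("Baltimore", "El Paso", 38), ("Chicago", "Ft. Worth", 66),
]
sa = {t: 6.67 for t in
      ["Arizona", "Baltimore", "Chicago", "Detroit", "El Paso", "Ft. Worth"]}

derivs = {t: [] for t in sa}
for a, b, points in games:
    derivs[a].append(points / sa[b])  # a's implied S&A, given b's assumed S&A
    derivs[b].append(points / sa[a])
avg = {t: sum(d) / len(d) for t, d in derivs.items()}

for t, v in sorted(avg.items(), key=lambda x: -x[1]):
    print(f"{t:10s} {v:6.3f}")
# Chicago 9.070, Ft. Worth 8.621, Detroit 6.822,
# El Paso 6.522, Baltimore 4.348, Arizona 4.198
```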
In the second round of calculations, then, Arizona is assumed to have an S&A tendency of 4.198, and Baltimore of 4.348. I thought perhaps that, by this method, each team’s tendency to score and allow points could be backed out of their games.
It doesn’t work. Actually, what happens is quite interesting: at the end of the second round, every team goes exactly back to the norm, 6.67. After the third round, they go back to the figures above, and after the fourth round, back to 6.67. They never creep toward their actual individual S&A tendencies, as we intended, because they keep leaping over the target, first in one direction and then in the other.
How do we stop that from happening?
We don’t allow them to leap over the goal. We prevent that from happening by changing the “output score” of the calculation from the average S&A output score (AvgS&AOS) to this:
(AvgS&AOS + 6.67) / 2
In other words, we allow it to move halfway to where it is trying to go. Chicago becomes not 9.070 but 7.87:
(9.070 + 6.67) / 2 = 7.87
That works. That was the second thing I tried, and it worked perfectly. . . unusual to get something like this to work that quickly, but it did. After one round of calculations, the S&A tendencies appear to be:
Chicago     7.87
Ft. Worth   7.64
Detroit     6.75
El Paso     6.60
Baltimore   5.51
Arizona     5.43
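The halfway-damping step changes only one line of the earlier sketch. Here is my reconstruction of the first damped round, starting everyone at 6.67; it reproduces the table above to within about a hundredth.

```python
LEAGUE_SA = 6.67  # square root of the average points per NFL game
games = [
    ("Arizona", "Baltimore", 20), ("Chicago", "Detroit", 55),
    ("El Paso", "Ft. Worth", 49), ("Arizona", "Detroit", 36),
    ("Baltimore", "El Paso", 38), ("Chicago", "Ft. Worth", 66),
]
sa = {t: LEAGUE_SA for t in
      ["Arizona", "Baltimore", "Chicago", "Detroit", "El Paso", "Ft. Worth"]}

derivs = {t: [] for t in sa}
for a, b, points in games:
    derivs[a].append(points / sa[b])
    derivs[b].append(points / sa[a])
# let each team move only halfway from the norm toward its average derivative
sa = {t: (sum(d) / len(d) + LEAGUE_SA) / 2 for t, d in derivs.items()}

for t, v in sorted(sa.items(), key=lambda x: -x[1]):
    print(f"{t:10s} {v:5.2f}")
```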
When Arizona plays Baltimore, then, we assume (from Arizona’s standpoint) that Baltimore has an S&A tendency of 5.51. There are 20 points scored in the game, so Arizona’s S&A tendency must be that number which, when multiplied by 5.51, produces 20. Which is 20 divided by 5.51, which is 3.63.
After two rounds of calculations, we get these S&A values for each team:
Chicago     7.18
Ft. Worth   7.10
Detroit     6.71
El Paso     6.63
Baltimore   5.97
Arizona     5.91
And after a large number of cycles of re-calculation, we get these:
Chicago     7.41
Ft. Worth   7.28
Detroit     6.72
El Paso     6.62
Baltimore   5.82
Arizona     5.76
Not that large a number of cycles of re-calculation, actually; this system zeroes in on its destination targets much more rapidly than the other system. As in the ranking method, the system at some point stops moving, and delivers output values from each cycle which are the same as the input values—but this happens much more quickly in this system. I suspect it happens more rapidly because the technique of allowing the system to move only halfway to its goal stabilizes the data more rapidly, and might actually be adapted into the original system, but that’s a question for another time.
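Run to its fixed point, the damped loop looks like the sketch below. This is my reconstruction, not the author’s spreadsheet: in my run the teams finish in the same order as the table above, though the exact decimals depend on implementation details (for instance, whether values are rounded between cycles), so treat the printed figures as illustrative.

```python
LEAGUE_SA = 6.67
games = [
    ("Arizona", "Baltimore", 20), ("Chicago", "Detroit", 55),
    ("El Paso", "Ft. Worth", 49), ("Arizona", "Detroit", 36),
    ("Baltimore", "El Paso", 38), ("Chicago", "Ft. Worth", 66),
]
teams = ["Arizona", "Baltimore", "Chicago", "Detroit", "El Paso", "Ft. Worth"]
sa = {t: LEAGUE_SA for t in teams}

for cycle in range(300):
    derivs = {t: [] for t in sa}
    for a, b, points in games:
        derivs[a].append(points / sa[b])
        derivs[b].append(points / sa[a])
    new_sa = {t: (sum(d) / len(d) + LEAGUE_SA) / 2 for t, d in derivs.items()}
    if max(abs(new_sa[t] - sa[t]) for t in teams) < 1e-12:
        break  # outputs equal inputs: the system has stopped moving
    sa = new_sa

print(cycle)  # cycles needed to stabilize
for t in sorted(teams, key=lambda t: -sa[t]):
    print(f"{t:10s} {sa[t]:5.2f}")
```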
Anyway, if Arizona has an S&A tendency of 5.76 and Baltimore has an S&A tendency of 5.82, we would expect that, when the two teams meet, the number of points in the game would be 5.76 * 5.82, or 34 (33.52). When Chicago plays Ft. Worth, we would expect that the number of points in the game would be 7.41 * 7.28, or 54.
Let us suppose that Chicago played El Paso, which they didn’t in this phony data sample. But if they did, we would predict:
1) that Chicago would win by 12 points (11.6), and
2) that 49 points would be scored in the game.
If 49 points are scored in the game and Chicago wins by 11.6, we would thus predict that Chicago would win 30-19--actually 30.3 to 18.7, but for obvious reasons we round it off.
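Putting the two systems together, the score prediction is the same split-the-total arithmetic as before. (A sketch; the function name is mine.)

```python
def predict_score(quality_a, quality_b, sa_a, sa_b):
    """Margin comes from the quality ratings; total points come from
    the product of the two S&A tendencies; then split the total."""
    margin = quality_a - quality_b
    total = sa_a * sa_b
    return round((total + margin) / 2), round((total - margin) / 2)

# The hypothetical Chicago vs. El Paso game from the text:
# qualities 108.8 and 97.2, S&A tendencies 7.41 and 6.62
print(predict_score(108.8, 97.2, 7.41, 6.62))  # (30, 19)
```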
There are a couple of technical issues here that I will come back to, but these are the S&A tendencies of the 32 NFL teams, based on this season’s data:
Team            Conf   S&A
Denver          A      7.08
Arizona         N      7.05
New Orleans     N      7.02
Houston         A      7.01
San Diego       A      7.00
San Francisco   N      6.96
NY Jets         A      6.91
Green Bay       N      6.90
Dallas          N      6.86
Chicago         N      6.84
Philadelphia    N      6.84
Detroit         N      6.82
Minnesota       N      6.80
NY Giants       N      6.79
St. Louis       N      6.72
Seattle         N      6.67
Indianapolis    A      6.60
Kansas City     A      6.60
Atlanta         N      6.58
Jacksonville    A      6.58
Buffalo         A      6.56
Miami           A      6.52
Cleveland       A      6.48
Baltimore       A      6.45
Cincinnati      A      6.45
New England     A      6.38
Tampa Bay       N      6.36
Washington      N      6.32
Pittsburgh      A      6.31
Tennessee       A      6.30
Oakland         A      6.27
Carolina        N      6.23
A game between Denver and Arizona would figure to generate 50 points (7.08 * 7.05). A game between Oakland and Carolina would figure to generate 39 points.
A lot of these numbers were surprising to me, by the way. Who knew that Oakland was one of the NFL’s most conservative, close-to-the-vest teams, or that San Diego and San Francisco were so high on the wild and crazy list? Anyway, we are now in position to predict the scores of NFL games, based on the data. For this week:
Oakland        13    Miami            28
Detroit        10    Carolina         32
Chicago        23    Green Bay        24
New Orleans    26    Kansas City      20
Baltimore      20    Giants           24
Philadelphia   28    Cincinnati       16
Minnesota      17    Tampa Bay        26
Houston        18    Indianapolis     28
Denver         17    Atlanta          29
St. Louis      18    San Francisco    29
Arizona        28    Seattle          20
Tennessee      25    Jacksonville     17
San Diego      17    Pittsburgh       27
Dallas         21    Washington       23
Cleveland      21    Buffalo          22
The margins here are a little different than they were in my earlier predictions for the week, because I was still following the earlier announced policy of artificially reducing the margin when the game didn’t figure to be close.
A couple of things I was concerned about with this method. You may remember that, in the “ranking” system, it makes no difference whatsoever what initial values are entered for each team, so long as the numbers entered average out to 100. If you enter initial values of 300.00 for Detroit, zero for Tennessee and zero for the Giants, it makes no difference; you get the same output as if you enter them all initially at 100. That’s because the final output is a product of the scores of the games, rather than being in any way related to the initial assumptions.
So, I wondered, would this be true here, as well?
Actually, it is MORE true; in this case you don’t even have to get the average right. If you enter initial values of 2500, -6329, .000001, and -12.74, within a few cycles you get the exact same values as if you had started everybody out at 6.67. The only thing you can’t do is enter an initial value of zero, since that causes the system to try to divide by zero, which of course doesn’t work. Otherwise, the end-point values are entirely unrelated to the starting point values.
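That claim is easy to check for the S&A system: run the same damped loop from two very different sets of starting values and compare. (A sketch; I use positive starting values here, and per the text even negative ones settle to the same place, but zero breaks the division.)

```python
LEAGUE_SA = 6.67
games = [
    ("Arizona", "Baltimore", 20), ("Chicago", "Detroit", 55),
    ("El Paso", "Ft. Worth", 49), ("Arizona", "Detroit", 36),
    ("Baltimore", "El Paso", 38), ("Chicago", "Ft. Worth", 66),
]
teams = ["Arizona", "Baltimore", "Chicago", "Detroit", "El Paso", "Ft. Worth"]

def run(start, cycles=300):
    """Damped S&A iteration from an arbitrary set of starting values."""
    sa = dict(start)
    for _ in range(cycles):
        derivs = {t: [] for t in sa}
        for a, b, points in games:
            derivs[a].append(points / sa[b])
            derivs[b].append(points / sa[a])
        sa = {t: (sum(d) / len(d) + LEAGUE_SA) / 2 for t, d in derivs.items()}
    return sa

norm = run({t: LEAGUE_SA for t in teams})           # everybody at 6.67
wild = run(dict(zip(teams, [2500.0, 0.4, 12.74, 1.0, 80.0, 3.0])))  # arbitrary
print(max(abs(norm[t] - wild[t]) for t in teams))   # the two runs agree
```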
OK, but. . . .related problem. You remember that, to calculate the output, we stabilized the data by adding 6.67 to each team’s average, and dividing by two. I wondered: does this tend to push the teams back toward 6.67, thus creating an artificially low standard deviation for the S&A tendency scores?
It doesn’t . . . and here I really don’t understand what is happening in the process; maybe you can explain it to me, or maybe I’ll figure it out. I tested the system by adding to each team in each calculation cycle not 6.67, but the arbitrary number of 8.00.
If you had asked me to guess what that would do, my guesses would have been:
1) that it might cause the output figures to cluster around 8.00, rather than 6.67,
2) that it might cause the entire system to go haywire, and never stabilize, and
3) that it might have no effect whatsoever.
All wrong. What happens is that it DOES affect the data, but only to some very small extent. When you add an arbitrary 8—a number picked at random—rather than 6.67—a number that represents the square root of the points scored in an average NFL game—the output numbers DO change, but not meaningfully. They go down by about 1%. The teams wind up in exactly the same order with essentially the same numbers, but the lowest numbers go down by slightly less than 1.01%, and the highest by a little less than 1.4%.
That doesn’t make any sense to me; I don’t understand why that would happen. But in any case, apparently adding some out-of-range, arbitrary number does NOT cause the system to meaningfully malfunction—therefore, I think we can conclude that, in this case as well, the output numbers are the result simply of what is in the data, rather than being influenced by any input assumption. So, until I can figure out why that happens, it seems to me that I don’t need to worry about it.