INTRODUCING NFL POWER RATINGS

By Bill James

September 22, 2008

Almost a year ago I introduced a system of power ratings for College Football teams. Now we turn our attention to the NFL.

Here’s how it works. We start with the assumption that every NFL team has a “power” of 100.000. Actually, it makes no difference whatsoever what initial assumption we make; if we assumed initially that the St. Louis Rams had a power rating of 3200.00 and all other NFL teams were 0.000, we would wind up with precisely the same answers we have now. The “answers” are formed by the scores of the games, not by the initial assumption.

But anyway, we start with the assumption that everybody is even. The first week of the season the Cardinals played the 49ers in San Francisco, and the Cardinals won 23-13. The Cardinals appear to be ten points better than San Francisco, plus San Francisco was at home; we add another two points for that. The Cardinals appear, based on that one game, to be 12 points better than the 49ers.

The “input values” for the game total 200.00—100.00 for Arizona, 100.00 for San Francisco. If Arizona is twelve points better than San Francisco, then, that makes it Arizona 106, San Francisco 94. The “output values” for the game, then, are Arizona 106, San Francisco 94.

In the second week of the season Arizona beat Miami at home (in Arizona), 31-10. Arizona appears to be 21 points better than Miami, minus two for the home field advantage; Arizona is 19 points better than Miami. The input values for the game total 200.00—the starting assumption—and the output values are Arizona, 109.5, Miami, 90.5.

In the third week Arizona lost to the Redskins, in Washington, 24 to 17. Washington appears to be five points better than Arizona, so the output values from that game are Washington 102.5, Arizona 97.5.
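The per-game arithmetic in the three examples above can be sketched in a few lines of Python. This is only an illustration, not the code actually used for the ratings; the names `game_outputs` and `HOME_EDGE` are made up for the sketch, and the two-point home-field value is the one used in the text.

```python
HOME_EDGE = 2.0  # home-field value in points, as used in the article


def game_outputs(winner_in, loser_in, score_margin, winner_at_home):
    """Split a game's combined input value between winner and loser.

    Hypothetical helper: the names are illustrative, the arithmetic is the article's.
    """
    # A home winner gives back the home edge; a road winner is credited with it.
    if winner_at_home:
        adjusted = score_margin - HOME_EDGE
    else:
        adjusted = score_margin + HOME_EDGE
    half_total = (winner_in + loser_in) / 2
    return half_total + adjusted / 2, half_total - adjusted / 2


print(game_outputs(100.0, 100.0, 10, winner_at_home=False))  # Arizona at San Francisco
print(game_outputs(100.0, 100.0, 21, winner_at_home=True))   # Arizona vs. Miami
print(game_outputs(100.0, 100.0, 7, winner_at_home=True))    # Washington vs. Arizona
```

With every input at 100, the three calls give 106/94, 109.5/90.5 and 102.5/97.5, the same output values worked out above.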

We now have three values for Arizona—106.00, from the game with San Francisco, 109.5, from the game with Miami, and 97.5, from the game with Washington. The average of those three is 104.333. This becomes Arizona’s “working number”, so to speak, our new working assumption about the strength of Arizona’s team. Our initial working assumption was that their strength was 100.000; our new assumption is that it is 104.333.
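One full round (compute the output values for every game, then average them team by team) might look like the sketch below. It assumes a made-up game format of (road team, road score, home team, home score) and includes only Arizona's three games, so only Arizona's number will match the article; the other teams' numbers depend on games left out here.

```python
from collections import defaultdict

HOME_EDGE = 2.0  # home-field value in points, as used in the article

# (road team, road score, home team, home score); Arizona's three games only
GAMES = [
    ("Arizona", 23, "San Francisco", 13),
    ("Miami", 10, "Arizona", 31),
    ("Arizona", 17, "Washington", 24),
]


def run_round(working, games):
    """One round: compute each game's output values, then average per team."""
    outputs = defaultdict(list)
    for road, road_pts, home, home_pts in games:
        # Road team's margin, credited with the home edge it had to overcome
        road_edge = (road_pts - home_pts) + HOME_EDGE
        half_total = (working[road] + working[home]) / 2
        outputs[road].append(half_total + road_edge / 2)
        outputs[home].append(half_total - road_edge / 2)
    return {team: sum(vals) / len(vals) for team, vals in outputs.items()}


start = {team: 100.0 for game in GAMES for team in (game[0], game[2])}
first_round = run_round(start, GAMES)
print(round(first_round["Arizona"], 3))  # 104.333
```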

The first-round calculations for the teams I have mentioned here are:

Arizona 104.333

San Francisco 101.500

Washington 100.167

Miami 100.000

San Francisco lost to Arizona but they won their other two games. We thus have a basis to believe that they may be a better-than-average team, despite the loss to Arizona, thus that Arizona’s win is more impressive than we first thought.

Miami has losses by 6 points and 21 points—a total of 27—but they also have a 25-point win over New England. All totaled, that’s -2 points, but two of the three games are on the road, which is +2. Miami winds up exactly back where they started; at this point they appear to be an exactly average NFL team.

We then repeat the process, using these working numbers as the input values for the next round of calculations. Week one:

Arizona 23 San Francisco 13 at San Francisco

Working numbers Arizona 104.333 San Francisco 101.500

Working total 205.8333

Arizona +12

Output values

Arizona 108.92

San Francisco 96.92

Week two—Arizona vs. Miami, Arizona wins by 19—now has a game value of 204.333. Arizona gets one-half of that—102.1667—plus one-half of the 19, 9.5, making a total of 111.67:

Arizona 31 Miami 10 in Arizona

Working numbers Arizona 104.333 Miami 100.000

Working total 204.333

Arizona +19

Output values

Arizona 111.67

Miami 92.67

And for week three:

Arizona 17 Washington 24 in Washington

Working numbers Arizona 104.333 Washington 100.167

Working total 204.50

Arizona -5

Output values

Arizona 99.75

Washington 104.75

In the second round of our calculations, then, Arizona’s “output values” are 108.92, 111.67 and 99.75. The average of those is 106.78. This becomes our new working assumption about the strength of Arizona’s team; it is 6.78 points better than an average NFL team.
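As a check on the second-round arithmetic for Arizona, here is a short Python sketch. The working numbers are copied from the first-round list above; the variable names are illustrative, and the margins are entered with the two-point home-field adjustment already applied.

```python
# First-round working numbers as listed in the article
working = {
    "Arizona": 104.333,
    "San Francisco": 101.500,
    "Washington": 100.167,
    "Miami": 100.000,
}

# Arizona's adjusted margins (score margin with the 2-point home edge applied)
arizona_games = [("San Francisco", 12), ("Miami", 19), ("Washington", -5)]

# Each output value is half the game's working total plus half the adjusted margin
outputs = [
    (working["Arizona"] + working[opponent]) / 2 + adjusted / 2
    for opponent, adjusted in arizona_games
]
second_round_arizona = sum(outputs) / len(outputs)

print([round(value, 2) for value in outputs])
print(round(second_round_arizona, 2))
```

This reproduces the three output values and the 106.78 average given above.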

After two rounds of calculation, the values we have are

Arizona 106.778

Washington 102.000

San Francisco 101.389

Miami 100.292

We then do a third round of calculations, using these as the “start values” or “input values”. After three rounds we get these values:

Arizona 108.336

Washington 103.542

San Francisco 100.926

Miami 100.329

After four rounds, these:

Arizona 109.300

Washington 104.781

San Francisco 100.449

Miami 100.148

And after five rounds, these:

Arizona 109.880

Washington 105.772

San Francisco 100.923

Miami 99.823

Initially, all of the teams that Arizona played appeared to be strong teams, so that Arizona’s output rating is pushed a little bit higher in each round of calculations. Beginning with the third round, however, the working numbers for San Francisco and Miami begin to sink, as the system begins to realize that the schedules that they have played are not that good. The working number for Arizona reaches a peak, after eight rounds, of 110.4274, but then it begins to fall, as the sinking values for San Francisco and Miami begin to pull down the output numbers for Arizona. After ten rounds the output values are these:

Arizona 110.227

Washington 108.556

San Francisco 98.296

Miami 97.050

Washington is gaining ground on Arizona, and after twenty rounds of re-calculating, they have passed them:

Washington 109.205

Arizona 108.711

San Francisco 96.925

Miami 93.247

The system is trying to find the point at which Arizona is 12 points better than San Francisco, 19 points better than Miami and 5 points worse than Washington—but also trying to accommodate the score of every other game played in the league this year—not a huge number of games, by the way (46 games so far). After thirty rounds of calculation we are nearer to this goal:

Washington 108.973

Arizona 107.629

San Francisco 96.326

Miami 91.175

San Francisco is “trying to reach” 12 points below Arizona, and they are basically there, so that game now has minimal impact in pushing the system. Washington is trying to reach 5 points ahead of Arizona, and they are moving slowly in that direction. Miami is trying to reach 19 points below Arizona, and they are getting close; they’re now down by 16 and a half. After forty rounds Washington has edged up to 1.7 points ahead of Arizona, and Miami has dropped off to almost 17 points behind them:

Washington 108.785

Arizona 107.048

San Francisco 96.037

Miami 90.101

The goal of the system is to explain the outcome of every game that has been played. In each round of calculation, however, the system makes less overall progress toward that goal than it did in the previous round. After hundreds of rounds of calculation, the system entirely stops moving. At that point, the working numbers become the rankings:

Washington 108.536

Arizona 106.389

San Francisco 95.718

Miami 88.853

At this point, we’re pretty close to “predicting” or matching the outcomes of all three of these games. San Francisco, which lost to Arizona by ten points at home, is now shown as 10.7 points worse than Arizona. Miami, which lost to Arizona by 21 points in Arizona, is now shown as 17.5 points worse than Arizona. Washington, which beat Arizona by 7 points in Washington, is now shown as 2.1 points better than Arizona. All of the results are “explained” to the maximum extent that they CAN be explained by any set of assumptions about the values of the teams.
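The "repeat until nothing moves" loop can be sketched as follows. This is a toy under stated assumptions, not the actual program: it uses only Arizona's three games, a made-up game format of (team, opponent, adjusted margin), and an arbitrary stopping tolerance; `run_round` and `settle` are illustrative names.

```python
# (team, opponent, team's adjusted margin); home edge already applied as in the article
GAMES = [
    ("Arizona", "San Francisco", 12.0),
    ("Arizona", "Miami", 19.0),
    ("Arizona", "Washington", -5.0),
]


def run_round(working, games):
    """Average each team's per-game output values into a new working number."""
    totals = {team: 0.0 for team in working}
    counts = {team: 0 for team in working}
    for team, opponent, margin in games:
        half_total = (working[team] + working[opponent]) / 2
        totals[team] += half_total + margin / 2
        totals[opponent] += half_total - margin / 2
        counts[team] += 1
        counts[opponent] += 1
    return {team: totals[team] / counts[team] for team in working}


def settle(games, tolerance=1e-9, max_rounds=10_000):
    """Repeat rounds until no working number moves by more than `tolerance`."""
    teams = {name for team, opponent, _ in games for name in (team, opponent)}
    working = {team: 100.0 for team in teams}
    for round_number in range(1, max_rounds + 1):
        updated = run_round(working, games)
        biggest_move = max(abs(updated[t] - working[t]) for t in teams)
        working = updated
        if biggest_move < tolerance:
            break
    return working, round_number


final, rounds_used = settle(GAMES)
for team, rating in sorted(final.items(), key=lambda item: -item[1]):
    print(f"{team:<14}{rating:8.3f}")
```

With only these three games there is nothing to contradict them, so the toy settles with Arizona exactly 12 ahead of San Francisco, 19 ahead of Miami and 5 behind Washington. In the real league the other 43 games pull against those targets, which is why the final numbers above come close to the three results without matching them exactly.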

These, then, are the rankings for the 32 NFL teams, as of September 21, 2008:

AFC		NFC
Team	Rnk	Team	Rnk
Denver	113.8	Dallas	116.2
Tennessee	112.2	Philadelphia	111.5
Baltimore	109.4	Tampa Bay	110.7
San Diego	108.7	New Orleans	110.7
Pittsburgh	105.6	Chicago	109.8
Buffalo	104.9	Washington	108.5
Jacksonville	103.4	NY Giants	107.5
Cincinnati	101.4	Carolina	106.5
Indianapolis	100.1	Arizona	106.4
Cleveland	96.4	Green Bay	105.6
Oakland	94.1	Minnesota	104.1
Houston	90.9	San Francisco	95.7
Miami	88.9	Atlanta	95.7
NY Jets	82.0	Seattle	90.1
New England	78.2	Detroit	81.7
Kansas City	74.7	St. Louis	74.7

We show Denver and Dallas as being the two best teams in football at this time, a conclusion that is consistent with the observations of sportswriters. We show the two teams from the Show-Me state in a virtual tie for the distinction of being the worst team in the NFL, which shouldn’t surprise anybody, either. There are, however, a number of surprises and a couple of real shocks in the rankings:

San Diego—winless at this point, with the help of the worst officiating call on record—is nonetheless shown by our system as the fourth-best team in the AFC. San Diego is 0-2, but look at the two losses. Leaving aside the famous blown call—we can’t correct for that—San Diego lost to Carolina by two points and to Denver by one. They’ve lost two games by three points—1.50 points per game. We conclude, then, that they are on essentially the same level as the two teams they have played. The level of the two teams they have played is very high—Carolina 106.5, Denver 113.8. If they were exactly the same level as those two teams, they’d be at 110.2. They’re not at that level; they’re a point and a half behind that, at 108.7. That still makes them one of the best teams in football.
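The San Diego arithmetic can be checked in a few lines of Python. The opponent ratings come from the table above and the margins from this paragraph; the variable names are illustrative, and home field is left out of the sketch, as it is in the paragraph.

```python
opponents = {"Carolina": 106.5, "Denver": 113.8}  # ratings from the table above
margins = {"Carolina": -2, "Denver": -1}          # San Diego's score margins

average_opponent = sum(opponents.values()) / len(opponents)
average_margin = sum(margins.values()) / len(margins)

# A team that loses by 1.5 a game rates 1.5 below the average of its opponents
san_diego_estimate = average_opponent + average_margin

print(round(average_opponent, 2), average_margin, round(san_diego_estimate, 2))
```

The unrounded figures are 110.15 and 108.65, which the text rounds to 110.2 and 108.7.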

The Jets play tonight at San Diego. The Jets are 1-1, San Diego is 0-2—but our system says that San Diego should win by almost 30 points. We don’t want to take that too seriously; our system explains the past much better than it predicts the future. Whatever the outcome of that game, it will change the rankings not only of those two teams, but of the entire league. If San Diego wins by 30, the change will be minimal; everybody will stay about where they are. But if the Jets win the game, then that will force a re-evaluation not only of the Jets and San Diego, but of Denver, Carolina, and the entire AFC East.
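The "almost 30 points" figure appears to be the gap between the two ratings plus the home-field value. A small Python sketch of that reading follows; `predicted_home_margin` is a made-up name, and treating the ratings as a point spread in this way is an assumption based on how the home-field adjustment is used earlier in the article.

```python
HOME_EDGE = 2.0  # home-field value in points, as used in the article


def predicted_home_margin(home_rating, road_rating):
    """Expected margin for the home team: rating gap plus the home edge."""
    return home_rating - road_rating + HOME_EDGE


# San Diego (108.7) at home against the Jets (82.0), ratings from the table above
print(round(predicted_home_margin(108.7, 82.0), 1))  # 28.7
```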

The AFC East, at this point, looks absolutely terrible—but that is based on a relatively few “pinion points” locking the division in place relative to the rest of football. The real question is: Is New England really that bad?

Well, I have a lot of confidence in the system. I am fairly certain that my method is getting as much information out of these game scores as is there to be had. At the same time, three games is three games. In those three games:

1. New England beat Kansas City, in Foxborough, by only 7 points. Kansas City is terrible. We have Kansas City at 74.7. That places New England at only five points better than Kansas City, which means about 20 points worse than an average NFL team.

2. New England beat the Jets, in East Rutherford. But we have no evidence within the system, yet, that the Jets are any good, either, so that game would still appear to suggest that the Pats are about 11 points worse than an average NFL team.

3. New England lost to Miami, at home, by 25 points. Enough said.

Based on the information that we have within this set of 46 games, it appears that the 2008 New England Patriots are really not a good football team. But...it is three games, and we’ll see where that goes in the future. If the Jets can hang in the game with the Chargers, that will push the Patriots (and Miami) well up in the rankings.

The NFC’s answers to the San Diego Chargers are the New Orleans Saints and the Chicago Bears. Both teams are 1-2, but they rank fourth and fifth in the NFC in our rankings—ahead of the Super Bowl Champion New York Giants, who are 3-0.

Again, it is early, and we don’t want to take the rankings too seriously. But the Saints have played three very good teams—Tampa Bay, Washington and Denver—with two of the three on the road. They have been outscored by a total of three points—one point a game. They rank 1.00 behind their competition, and their competition has been brutal.

The Giants, on the other hand, squeaked by Cincinnati in overtime, 26-23. They pounded lumps on St. Louis, but then everybody has; the Rams lost 41-13 to the Giants, but 38-3 to Philadelphia and 37-13 to Seattle. They did beat Washington, which was their best game of the year, but there is no basis yet to conclude that the 2008 Giants are a powerhouse. They’re 3-0 against soft competition—and they were damned lucky to get by the Bengals. Maybe they’re really good—and maybe they’re not. Our system says they’re pretty good, but New Orleans is better. We’ll see.

The Bears are about the same as New Orleans—a brutal early schedule (Indianapolis, Carolina and Tampa Bay) with two of the three games on the road. They whupped Indianapolis by 16, lost to Carolina by 3 and Tampa by 3. The two losses are serious for them, because even if they are actually a better team than the Giants, they’re still two games behind them, and there are only 13 games left to be played. If they go 9-4 in the remaining 13 games that would be impressive—but they still might not make the playoffs. The twelve best teams in football are not going to make the playoffs. The good teams that win the close games are going to make the playoffs. The Chargers, Bears and Saints, so far, have been good teams that lost close games. They need to start winning those, and, given the brevity of the NFL schedule, they need to start winning those now, rather than later.

The most anomalous game in the NFL this season, easily, was yesterday’s Miami/New England game. Even assuming that the rankings are correct and New England is one of the worst teams in football this year, Miami would still figure to beat them, in Foxborough, by ten points or less. They beat them by 25. There’s a conundrum: New England beats the Jets, the Jets beat Miami, but Miami thrashes New England.

Of course, those things happen in every sport in every league every year. Almost. I remember when I was in Grade School, we played in an eight-team basketball league, and every single game went according to form. After one round of games one team was 7-0, another was 6-1, 5-2, 4-3, 3-4, 2-5, 1-6, 0-7, and after the second round the teams were 14-0, 12-2, 10-4, etc. I remember trying to explain to a friend of mine how remarkable that was, for the games to break that way, but he didn’t get it; he thought that was the way it should be; the better team would always win.

Anyway, the normal thing is that some games go according to form, and some don’t. Those games are the largest “exceptions to form” in the NFL this year, the games in the AFC East, and the rest of the season will tell us what the outlier is.

COMMENTS (6 Comments, most recent shown first)

andyd
When I've messed around with this type of thing in the past, I've found that adding data from the prior season can be interesting as well and helps flatten out early-season variation. You can weight last season 50%, 75%, etc. according to taste.
10:56 PM Oct 13th

JesseSeg
awesome
3:40 PM Sep 24th

elricsi
[sarcasm on]Oh great, another sport that Bill will take over.[sarcasm off]

P.S. I love this power rating business, and for years I have been following the Massey Ratings, Colley Ratings, Sagarin Ratings, etc. This system seems very similar to one (or some) system(s), but I forget which. All the good ones do something like this.
10:20 PM Sep 22nd

bjames
1) It would be interesting to do baseball ratings. I think the assumption is that it is less necessary with the longer season, but...it might be that there is some information there that you would get out earlier with a rating system.

2) SOME strength of schedule adjustment is "fairly standard in sports", of course, but
a) This doesn't relieve me of the responsibility to explain my own system, and
b) I don't see anyone else getting the same kind of results I'm getting.

ESPN Power Ratings shows tonight's Chargers/Jets game as a tossup. Fox Sports Power Ratings have the Giants as the best team in the league. MSNBC has the Jets AHEAD of the Chargers. JFM (a power ratings site) still has the Patriots as the best team in the league.
8:15 PM Sep 22nd

wovenstrap
Is there any reason this would not work if your season were, say, 162 games long? Or are the records for such a league indication enough?
7:36 PM Sep 22nd

tangotiger
Bill, this strength of schedule adjustment is fairly standard in the sports world, and is captured in a model called Bradley-Terry. Ken Butler does these rankings for College Hockey, calling his system KRACH, explained here: http://www.mscs.dal.ca/~butler/krachexp.htm
Whether you use only W/L, or only use Points For/Allowed, or both, they are all variations of the same thing (a logistic regression model).
2:14 PM Sep 22nd
