This started about four years ago, when I was appearing on a panel with the great Peter Gammons. There was a local player who was a Hall of Fame candidate, and the audience asked me whether I thought he should be in the Hall of Fame. I thought, frankly, that he should be allowed to visit the Hall of Fame like anybody else, but as the room was clearly sympathetic to the candidate, this didn’t seem like the right thing to say. I would have been happy to lie about it and say that I thought he should be elected without delay, but unfortunately I was on record as saying that he should not, and as explaining at great length why he should not, so I would have looked stupid trying to weasel out of it. I explained as briefly as I could that I wished him the best, but I didn’t think he was a Hall of Fame player, and talked briefly about secondary averages, home park effects, runs created relative to context, and some other nonsense. The crowd was getting surly.
At this point somebody asked Peter Gammons the same question, and Gammons replied “Less than .300 and less than 400 homers is tough.” It was a perfect answer. Ten words, and he had closed off the debate.
There is a simple idea at the core of this answer, which is that a player’s Hall of Fame position can be triangulated with respect to a very few stats. Since then I have wondered, How accurately can one predict a player’s Hall of Fame status with respect to just three stats? Hits, home runs, batting average. How well can you do?
The first thing we have to describe is what we mean by “accurate”. This is the scoring method that I used:
1) If the system predicts that an eligible player should be in the Hall of Fame and he is in the Hall of Fame, that’s a “+”, or a “Win” for the system.
2) If the system predicts that a player should be in the Hall of Fame but he is not, that’s a type-one loss for the system.
3) If the system fails to predict that a player should be in the Hall of Fame but he is, that’s a type-two loss for the system.
4) If the system predicts that a player should not be in the Hall of Fame and he is not, that’s a non-event. It doesn’t count for or against the system.
We kind of have to do it that way, because if you count it as a “success” when the system says a player should not be in the Hall of Fame and he is not, then your system will be right 99% of the time whether it is worth a crap or not. My system says that Rafael Belliard should not be in the Hall of Fame—and he isn’t! You can rack up a lot of points that way. We don’t count those points.
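The four cases above can be sketched in a few lines of Python. This is only an illustration of the scoring rule, not the tooling actually used for the study; the names `classify` and `accuracy` and the toy pool of 100 players are made up for the example.

```python
from collections import Counter


def classify(predicted_in: bool, actually_in: bool) -> str:
    """Label one eligible player using the four cases of the scoring method."""
    if predicted_in and actually_in:
        return "win"
    if predicted_in:
        return "type_one"   # predicted in, but not in the Hall
    if actually_in:
        return "type_two"   # in the Hall, but not predicted
    return "non_event"      # predicted out and is out: not counted


def accuracy(outcomes) -> float:
    """Wins divided by wins plus both kinds of loss; non-events are ignored."""
    counts = Counter(outcomes)
    scored = counts["win"] + counts["type_one"] + counts["type_two"]
    return counts["win"] / scored if scored else 0.0


# Hypothetical toy pool: 2 wins, 1 type-one loss, 1 type-two loss,
# and 96 players correctly left out.
pool = (
    [(True, True)] * 2
    + [(True, False)]
    + [(False, True)]
    + [(False, False)] * 96
)
outcomes = [classify(p, a) for p, a in pool]

print(accuracy(outcomes))  # 0.5: the 96 non-events do not move the number

# Counting non-events as successes would give 98 "right" answers out of 100.
naive = sum(1 for p, a in pool if p == a) / len(pool)
print(naive)  # 0.98
```

The same toy system is 50% accurate by this method and 98% accurate if you take credit for every Rafael Belliard, which is the reason for leaving the non-events out.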
Let’s start by simply “predicting” that every player with 3000 career hits will be in the Hall of Fame, and every player with less than 3000 will not. How accurate is that?
It’s 16% accurate. It yields 22 right answers, and zero type-one errors. Everyone this system predicts to be in the Hall of Fame is in (not counting Pete Rose, because Rose is not eligible). But it gives us 119 type-two errors, since there are 119 hitters in the Hall of Fame who did not have 3000 career hits.
Definitional quibbles: we are not counting players who also managed and are in the Hall of Fame as managers, not counting Negro League stars, pitchers, etc. You know that stuff.
So we’re 22 for 141, which is 16%.
If we go down to 2900 hits, our system improves to 22% accuracy. At 2800 hits, we’re up to 27%; at 2700 hits, up to 30%, etc.
But by 2700 hits, we have errors of both kinds. Harold Baines, Al Oliver, Rusty Staub, Roberto Alomar, Vada Pinson, Bill Buckner, Dave Parker and Doc Cramer had at least 2700 hits, but are not in the Hall of Fame. If we predict that all players with 2700 hits should be in the Hall of Fame, we’ll be wrong on those 8 players. At 2700 hits, we have 45 correct predictions, 8 type-one errors, and 96 type-two errors, as there are 96 hitters who are in the Hall of Fame with fewer than 2700 hits.
This “system” (this one-element measure) continues to improve in accuracy until we reach the level of 2,210 hits. At 2,210 hits, the system is 48% accurate: 91 correct predictions, 47 type-one errors, 50 type-two errors. If you go either up or down from 2,210, the system becomes less accurate.
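The search for the best cutoff can be sketched as a simple sweep: try each observed value as the cutoff, tally the three counted outcomes, and keep the cutoff with the highest accuracy. The real player pool is not reproduced here, so the `sample` list below is made up purely for illustration, and the helper names (`tally`, `accuracy`, `best_cutoff`) are assumptions of this sketch. It also assumes that "having" a number of hits means meeting or exceeding it.

```python
def tally(cutoff, players):
    """Return (wins, type_one, type_two) for a 'value >= cutoff' rule."""
    wins = sum(1 for value, in_hall in players if value >= cutoff and in_hall)
    type_one = sum(1 for value, in_hall in players if value >= cutoff and not in_hall)
    type_two = sum(1 for value, in_hall in players if value < cutoff and in_hall)
    return wins, type_one, type_two


def accuracy(wins, type_one, type_two):
    """Wins over all counted outcomes; correct 'no' calls never enter."""
    scored = wins + type_one + type_two
    return wins / scored if scored else 0.0


def best_cutoff(players):
    """Try every observed value as the cutoff and keep the most accurate one."""
    candidates = sorted({value for value, _ in players})
    return max(candidates, key=lambda c: accuracy(*tally(c, players)))


# Made-up (career hits, in Hall of Fame) pairs, for illustration only.
sample = [
    (3100, True), (2900, True), (2750, False), (2500, True), (2300, True),
    (2250, False), (1900, False), (1600, True), (1500, False),
]
cutoff = best_cutoff(sample)
print(cutoff, accuracy(*tally(cutoff, sample)))

# The counts quoted in the text, run through the same accuracy formula.
quoted = [("3000", (22, 0, 119)), ("2700", (45, 8, 96)), ("2210", (91, 47, 50))]
for label, counts in quoted:
    print(label, round(100 * accuracy(*counts), 1))
```

Run on the counts quoted in the text, the formula gives 15.6%, 30.2% and 48.4%, which round to the 16%, 30% and 48% reported above.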
Getting ahead of ourselves, we will be able to improve on that level of accuracy—but not by a whole lot. That’s almost as good as we’re going to do, and I’m telling you that now because I don’t want to set up expectations and then fail to deliver. But getting back to the line of march, next let’s check Home Runs, as a one-dimensional measure.
As a single-element predictor of Hall of Fame status, home runs are much less accurate than hits. The accuracy of home runs as a Hall of Fame predictor peaks at 294 home runs, but that accuracy is just 22%. At 294 home runs we have 41 accurate predictions, 46 type-one errors, and 100 type-two errors (meaning that there are 100 Hall of Fame hitters who did not hit 294 homers). That’s the best we can do with home runs.
Marginally better with batting average than with home runs. Batting average, as a predictor, peaks at .311, and the accuracy at that level is 35% (ignoring players with fewer than 500 career hits).
OK; now comes the hard part. The hard part is figuring out how to combine these elements to get maximum accuracy. Since there are billions of possible ways to combine them, we can’t try them all. Let’s start with “Hits + Home Runs”.
Adding together hits and home runs, the accuracy of the prediction system, at its peak, reaches .492. The peak is at 2,386. 91 players have a total of 2,386 Hits and Homers, and are in the Hall of Fame. But 44 players have 2,386 Hits + Homers and are not in the Hall of Fame, and 50 players are in the Hall of Fame but don’t have 2,386 Hits + Homers—a total of 94 errors. 91 and 94; 49%.
I tried Hits plus 2 * Homers, but this makes the optimal prediction less accurate. I tried Hits plus 3 * Homers, but that was even less accurate. Adding additional weight to a home run did not bring us closer to the target. Working with 1.50*homers or 1.25*homers might have given us some small benefits, I don’t know. I didn’t try those things.
OK; let’s work with Hits + Homers, and try to bring in Batting Average. Again, there are a million ways to do this:
(Hits + Homers) * Batting Average
(Hits * Batting Average) + Homers
2 * (Hits * Batting Average) + Homers
(Hits + Homers) * (Batting Average - .100)
(Hits + Homers) * (Batting Average + .100)
(Hits + Homers + 250) * Batting Average
Etc., etc., without limit. For each formula that you try, you have to find the “cutoff level” at which that formula reaches its maximum accuracy.
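To see why each formula needs its own cutoff, it helps to run one player through all six candidates. The sketch below does that with Ryne Sandberg’s line as quoted later in this article (2,386 hits, 282 homers, .285). The dictionary layout and variable names are just one way to set it up, and the third formula is read as 2 * (Hits * Batting Average) + Homers.

```python
# Each candidate takes (hits, homers, batting_average) and returns a score.
candidates = {
    "(H + HR) * AVG": lambda h, hr, avg: (h + hr) * avg,
    "(H * AVG) + HR": lambda h, hr, avg: h * avg + hr,
    "2 * (H * AVG) + HR": lambda h, hr, avg: 2 * (h * avg) + hr,
    "(H + HR) * (AVG - .100)": lambda h, hr, avg: (h + hr) * (avg - 0.100),
    "(H + HR) * (AVG + .100)": lambda h, hr, avg: (h + hr) * (avg + 0.100),
    "(H + HR + 250) * AVG": lambda h, hr, avg: (h + hr + 250) * avg,
}

# Ryne Sandberg's career line as quoted in the article.
hits, homers, average = 2386, 282, 0.285

scores = {name: formula(hits, homers, average) for name, formula in candidates.items()}
for name, score in scores.items():
    print(f"{name:26s} {score:8.1f}")
```

The same player scores anywhere from about 494 to about 1,642 depending on the formula, so a cutoff found for one formula tells you nothing about the next. Each one has to be swept separately.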
I experimented with formulas of this nature for several hours. The best formula that I was able to find. . .well, actually there were two that tied:
A) (Hits + Homers) * (Batting Average - .050)
B) (Hits + Homers + 1500) * (Batting Average + .100)
(A) is simpler than (B) and just as accurate, so we’ll use (A). This formula reaches its peak accuracy at 584. If a player’s Hits plus Homers, times Batting Average minus .050. . .if that number is greater than 584, the system predicts that the player will be a Hall of Famer.
This formula makes 96 correct predictions. It also predicts Hall of Fame status for 35 players who have not made the Hall Of Fame, and fails to predict Hall of Fame status for 45 players who have been selected. That’s 96-80, or 55% accurate.
That is conservatively stated; we could argue that this prediction is 73% accurate. Of 131 players who have a score of 584 or more, 96 are in the Hall of Fame. That’s 73% (96-35). But I think it’s better to say 55%. It’s not a fantastic level of accuracy, but remember, we’re only using a tiny bit of the information available to us. We’ve got reams of stats about every player; we’re only using three of them. We don’t know whether the player is a catcher or a first baseman. We don’t know whether he stole a thousand bases or a dozen. We don’t know whether the player was a World Series hero or Bill Buckner. We’re at 55% without considering any of that. It’s not too bad.
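The 55% and the 73% come from the same three counts, divided differently. A short sketch makes the difference explicit. The variable names are mine, and the third ratio (the share of actual Hall of Famers that the formula finds) is not quoted in the text but follows from the same counts.

```python
correct = 96    # predicted in, and in the Hall
type_one = 35   # predicted in, not in the Hall
type_two = 45   # in the Hall, not predicted

# The scoring method used throughout: both kinds of miss count against the system.
overall = correct / (correct + type_one + type_two)

# The more generous reading: look only at players the formula says "yes" to.
among_predicted = correct / (correct + type_one)

# For completeness: the share of actual Hall of Famers the formula finds.
among_hall = correct / (correct + type_two)

print(f"{overall:.1%} {among_predicted:.1%} {among_hall:.1%}")
```

That works out to 54.5% overall, 73.3% among players the formula picks, and 68.1% of the 141 Hall of Fame hitters found.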
Why are We Doing This?
We are doing it because it is useful, as a part of a package, to know where a player stands with respect to the Hall of Fame. What is a Hall of Fame combination of these three critical stats? Nomar Garciaparra retired this spring with 1,747 hits, 229 homers and a .313 batting average. Is that a Hall of Fame triangle, or is it not?
It is not. It’s 519; the cutoff is 584.
Johnny Damon, on the other hand, does have a Hall of Fame combination. Damon entered the year with 2,425 hits, 207 homers, and a .288 average. That’s comparable to Ryne Sandberg (2,386 hits, 282 homers, .285), Enos Slaughter (2,383 hits, 169 homers, .300) or Jim Bottomley (2,313 hits, 219 homers, .310). It scores at 628. Most players with combinations similar to that have historically made the Hall of Fame.
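Here is formula (A) applied to the five players just mentioned, as a rough check. The function name `hof_score` is made up for this sketch, and it uses the three-digit batting averages as printed, so the results can land a point or two away from the article’s figures (which were presumably computed from unrounded averages). It treats a score at or above the cutoff as a “yes”.

```python
CUTOFF = 584  # peak-accuracy cutoff for formula (A), per the article


def hof_score(hits: int, homers: int, average: float) -> float:
    """Formula (A): (Hits + Homers) * (Batting Average - .050)."""
    return (hits + homers) * (average - 0.050)


# Career lines as quoted in the article (hits, homers, batting average).
players = {
    "Nomar Garciaparra": (1747, 229, 0.313),
    "Johnny Damon": (2425, 207, 0.288),
    "Ryne Sandberg": (2386, 282, 0.285),
    "Enos Slaughter": (2383, 169, 0.300),
    "Jim Bottomley": (2313, 219, 0.310),
}

for name, line in players.items():
    score = hof_score(*line)
    verdict = "at or above" if score >= CUTOFF else "below"
    print(f"{name:18s} {score:6.1f}  {verdict} the cutoff")
```

With rounded averages Garciaparra comes out a little under 520 and Damon at about 626, against the 519 and 628 quoted in the text. Either way they fall on opposite sides of 584.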
Am I saying that Johnny Damon has the Hall of Fame made in the shade? No, of course not. There are always other factors. Sandberg was a Gold Glove second baseman. Slaughter missed three years to World War II. Bottomley had several buddies on the Veterans Committee. There are always other factors. Plus, our system is only 55% accurate. Even among players with Hall of Fame combinations, 35 of 131 (27%) have not made the Hall. It’s not a mandate; it’s just a yardstick. It is useful to have yardsticks.
But Wait a Minute
But there is another problem with suggesting that Damon’s a Hall of Famer, which is that it seems obvious that the standards which have prevailed in the past cannot prevail in the future.
Expansion. There are 30 teams now, not 16, and careers are longer than they used to be. In order for the standards which have applied in the past to apply into the future, there would have to be an increase of 100% or more in the number of players who are selected. That’s not going to happen, because nobody wants it to happen. People complain about the declining standards for the Hall of Fame, but. . .it’s not the real world. In the real world, the standards for Hall of Fame selection started to creep up years ago, and the rate of increase is going to accelerate.
So the standard in the future isn’t likely to be 584, but something more like 700.
A few lists for you. These are the top ten scores of all time:
| First | Last | H | HR | Avg | Score |
| --- | --- | --- | --- | --- | --- |
| Ty | Cobb | 4189 | 117 | .366 | 1362 |
| Hank | Aaron | 3771 | 755 | .305 | 1154 |
| Stan | Musial | 3630 | 475 | .331 | 1153 |
| Pete | Rose | 4256 | 160 | .303 | 1117 |
| Tris | Speaker | 3514 | 117 | .345 | 1070 |
| Babe | Ruth | 2873 | 714 | .342 | 1048 |
| Rogers | Hornsby | 2930 | 301 | .358 | 997 |
| Willie | Mays | 3283 | 660 | .302 | 993 |
| Honus | Wagner | 3415 | 101 | .327 | 975 |
| Nap | Lajoie | 3242 | 82 | .338 | 958 |
All Hall of Famers except the guy who isn’t eligible. These are the top ten scores for players not in the Hall of Fame:
| First | Last | H | HR | Avg | Score |
| --- | --- | --- | --- | --- | --- |
| Pete | Rose | 4256 | 160 | .303 | 1117 |
| Barry | Bonds | 2935 | 762 | .298 | 917 |
| Rafael | Palmeiro | 3020 | 569 | .288 | 856 |
| Manny | Ramirez | 2494 | 546 | .313 | 800 |
| Ken Jr. | Griffey | 2763 | 630 | .285 | 797 |
| Derek | Jeter | 2747 | 224 | .317 | 794 |
| Alex | Rodriguez | 2531 | 583 | .305 | 793 |
| Harold | Baines | 2866 | 384 | .289 | 778 |
| Craig | Biggio | 3060 | 291 | .281 | 775 |
| Gary | Sheffield | 2689 | 509 | .292 | 773 |
And these are the top ten scores for players who are eligible for the Hall of Fame, but not in:
| First | Last | H | HR | Avg | Score |
| --- | --- | --- | --- | --- | --- |
| Harold | Baines | 2866 | 384 | .289 | 778 |
| Al | Oliver | 2743 | 219 | .303 | 750 |
| Roberto | Alomar | 2724 | 210 | .300 | 734 |
| Dave | Parker | 2712 | 339 | .290 | 732 |
| Vada | Pinson | 2757 | 256 | .286 | 711 |
| Steve | Garvey | 2599 | 272 | .294 | 701 |
| Fred | McGriff | 2490 | 493 | .284 | 699 |
| George | Van Haltren | 2532 | 69 | .316 | 691 |
| Bill | Buckner | 2715 | 174 | .289 | 690 |
| Rusty | Staub | 2716 | 292 | .279 | 690 |
I went through a lot of gyrations trying to find some formula which didn’t show Harold Baines to be well above the normal standard of a Hall of Famer. I couldn’t find any such formula. Every combination that I tried, without exception, showed Harold Baines to be the highest-scoring player who is eligible for the Hall of Fame but has not been selected.
These are the lowest-scoring players who are in the Hall of Fame:
| First | Last | H | HR | Avg | Score |
| --- | --- | --- | --- | --- | --- |
| Ray | Schalk | 1345 | 11 | .253 | 276 |
| Roger | Bresnahan | 1252 | 26 | .279 | 293 |
| Roy | Campanella | 1161 | 242 | .276 | 317 |
| Frank | Chance | 1273 | 20 | .296 | 318 |
| Phil | Rizzuto | 1588 | 38 | .273 | 363 |
| Joe | Tinker | 1687 | 31 | .262 | 365 |
| Johnny | Evers | 1659 | 12 | .270 | 368 |
| Tommy | McCarthy | 1496 | 44 | .292 | 372 |
| Rick | Ferrell | 1692 | 28 | .281 | 397 |
| Hughie | Jennings | 1527 | 18 | .311 | 404 |
And these are the 20 highest-scoring active players as of the close of the 2009 season:
| First | Last | Year | H | HR | Avg | Score |
| --- | --- | --- | --- | --- | --- | --- |
| Manny | Ramirez | 2009 | 2494 | 546 | .313 | 800 |
| Ken Jr. | Griffey | 2009 | 2763 | 630 | .285 | 797 |
| Derek | Jeter | 2009 | 2747 | 224 | .317 | 794 |
| Alex | Rodriguez | 2009 | 2531 | 583 | .305 | 793 |
| Gary | Sheffield | 2009 | 2689 | 509 | .292 | 773 |
| Ivan | Rodriguez | 2009 | 2711 | 305 | .299 | 751 |
| Chipper | Jones | 2009 | 2406 | 426 | .307 | 729 |
| Vladimir | Guerrero | 2009 | 2249 | 407 | .321 | 721 |
| Todd | Helton | 2009 | 2134 | 325 | .328 | 684 |
| Garret | Anderson | 2009 | 2501 | 285 | .295 | 682 |
| Johnny | Damon | 2009 | 2425 | 207 | .288 | 628 |
| Omar | Vizquel | 2009 | 2704 | 78 | .273 | 619 |
| Jim | Thome | 2009 | 2138 | 564 | .277 | 615 |
| Ichiro | Suzuki | 2009 | 2030 | 84 | .333 | 598 |
| Albert | Pujols | 2009 | 1717 | 366 | .334 | 591 |
| Bobby | Abreu | 2009 | 2111 | 256 | .299 | 590 |
| Magglio | Ordonez | 2009 | 1974 | 277 | .312 | 590 |
| Carlos | Delgado | 2009 | 2038 | 473 | .280 | 577 |
| Miguel | Tejada | 2009 | 2114 | 285 | .289 | 573 |
| Edgar | Renteria | 2009 | 2185 | 132 | .288 | 550 |
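As a rough illustration of what a higher standard would mean for this group, the sketch below counts how many of the 20 active players listed above clear the historical cutoff of 584 and how many clear 700. The 700 figure is only the article’s guess at a future standard, and the helper name `clearing` is made up for the example. The scores are copied from the table.

```python
# Scores copied from the table of active players above (through 2009).
active_scores = {
    "Manny Ramirez": 800, "Ken Griffey Jr.": 797, "Derek Jeter": 794,
    "Alex Rodriguez": 793, "Gary Sheffield": 773, "Ivan Rodriguez": 751,
    "Chipper Jones": 729, "Vladimir Guerrero": 721, "Todd Helton": 684,
    "Garret Anderson": 682, "Johnny Damon": 628, "Omar Vizquel": 619,
    "Jim Thome": 615, "Ichiro Suzuki": 598, "Albert Pujols": 591,
    "Bobby Abreu": 590, "Magglio Ordonez": 590, "Carlos Delgado": 577,
    "Miguel Tejada": 573, "Edgar Renteria": 550,
}


def clearing(scores, cutoff):
    """Names whose score meets or exceeds the cutoff, highest score first."""
    ranked = sorted(scores.items(), key=lambda item: -item[1])
    return [name for name, score in ranked if score >= cutoff]


historical = clearing(active_scores, 584)
stricter = clearing(active_scores, 700)

print(len(historical), len(stricter))
print(sorted(set(historical) - set(stricter)))
```

Seventeen of the twenty clear 584, but only eight clear 700. Damon is among the nine who have a Hall of Fame combination by the old standard and fall short of the stricter one.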