Introduction
· This is my second attempt at "Bill James Fusion", which, much like fusion cooking, is when I take 2 different Bill James concepts and combine them into something a little different (and hopefully delicious, although your taste experience may vary).
My initial fusion attempt was combining the Bill James creations often referred to as "Hall of Fame metrics" (Hall of Fame Monitor, Black Ink, Gray Ink, and Hall of Fame Standards) with Similarity Scores. This time, I’m combining traditional Similarity Scores with what Bill used to refer to as "Seasonal Notation", which was simply a player’s stats expressed in a per-162 game context. OK, maybe the concept of expressing statistics on a per-162 game basis isn’t originally Bill’s, but I do believe he came up with the term "Seasonal Notation", so that’s good enough for me.
· This is not a new concept. I do remember seeing Similarity Scores per 162 games as a feature on the now-defunct "Baseball Gauge" (or "Seamheads") web site, although I’m using a different set of categories and penalties in coming up with the scores. Also, I believe they only included it as part of their Negro Leagues Database section on the site, although I’m not positive I’m remembering that 100% correctly, and it’s too late for me to verify that.
· This was my initial attempt at this approach, so I think there’s a lot of room for potential improvement. I suspect many of you would have picked different categories or come up with different penalties, and that’s certainly understandable. This is just my attempt at coming up with a scoring approach and seeing what kind of results it generated.
· Although traditional Similarity Scores are often referenced in the context of Hall of Fame discussions and comparisons, I’m not really pushing for the same thing here. In my opinion, Hall of Fame candidates have a lot of potential areas to consider – their total careers, their peaks, the impact of individual seasons, contributions to successful teams, awards and honors, milestones, records, and so on.
Seasonal Notation Similarity Scores, by themselves, give some sense of the level and quality of a player’s performance, but not how prolific they were. And, many of my examples are for players who had short careers, and those short careers often "benefit" from being expressed in a per-162 game context because the player in question did not experience an extended decline phase.
For example, Pete Rose’s Seasonal Notation and rate stats (batting average, OBP, etc.) suffer, in part, because he kept on playing and playing and playing. Had he stopped playing a few years earlier, his rate stats and his Seasonal Notation numbers would have looked a lot better, but then he also wouldn’t have enjoyed the "bulk" totals he currently possesses. It’s a double-edged sword. And, the Hall of Fame, I believe, tends to favor those with longer careers, or at least those with more impressive career totals.
Background
I’ve always loved the concept of Similarity Scores, because the topic is close to my heart. In my everyday job, I am involved in demand planning and forecasting, and the concept of similarity is central when we try to plan and forecast the items and product categories that we sell. Is a new product similar to another one that already exists? How similar? In what way might it be different? Is it in a similar product category but with some key product feature differences? Are the items being compared different brands? Are they priced differently? Are they promoted differently? What kind of "sales curve" do they follow? Do they tend to have stable sales, or do they fluctuate wildly by time of year? What are the implications of the similarities, and what are the implications of the differences?
Now, I will say that I suspect traditional Similarity Scores may not be leaned upon as heavily as they once were. Bill introduced them roughly 40 years ago, and people use them for all kinds of comparisons, including (but not limited to) Hall of Fame discussions. I think they were a big step forward in how we compare players (and, boy, do we like to compare players!). But there are a lot of caveats in using them, as everyone (including Bill) acknowledges. Similarity Scores use basic career stat categories, ones that are not adjusted for time or place, so that a home run is a home run regardless of when or where it was hit, and a .300 average is treated the same regardless of whether it was generated in the 1930’s or the 1960’s. Also, the categories for hitters are strictly offense-based, although there is a positional adjustment.
To level set, below is the explanation of Similarity Scores from baseball-reference.com (there’s one for batters and one for pitchers, this is just the one for hitters):
Similarity scores are not our concept. Bill James introduced them in the mid-1980s, and we lifted his methodology from his book The Politics of Glory (p. 86-106). To compare one player to another, start at 1000 points and then subtract points based on the statistical differences of each player.
Batters
One point for each difference of 20 games played.
One point for each difference of 75 at bats.
One point for each difference of 10 runs scored.
One point for each difference of 15 hits.
One point for each difference of 5 doubles.
One point for each difference of 4 triples.
One point for each difference of 2 home runs.
One point for each difference of 10 RBI.
One point for each difference of 25 walks.
One point for each difference of 150 strikeouts.
One point for each difference of 20 stolen bases.
One point for each difference of .001 in batting average.
One point for each difference of .002 in slugging percentage.
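The batter penalties listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and dictionary layout are mine, fractional points are allowed, and the positional adjustment is left out entirely. The two stat lines are the career totals for Al Rosen and Bob Horner used later in this article.

```python
# Sketch of the batter penalties quoted above.
# Assumptions (not from the quoted text): fractional points are allowed,
# and the positional adjustment is omitted.

# category -> size of the difference that costs one point
TRADITIONAL_DIVISORS = {
    "G": 20, "AB": 75, "R": 10, "H": 15, "2B": 5, "3B": 4, "HR": 2,
    "RBI": 10, "BB": 25, "SO": 150, "SB": 20, "BA": 0.001, "SLG": 0.002,
}

def traditional_similarity(a, b):
    """Start at 1000 and subtract one point per listed unit of difference."""
    penalty = sum(abs(a[k] - b[k]) / d for k, d in TRADITIONAL_DIVISORS.items())
    return 1000 - penalty

KEYS = ["G", "AB", "R", "H", "2B", "3B", "HR", "RBI", "SB", "BB", "SO", "BA", "SLG"]
rosen = dict(zip(KEYS, [1044, 3725, 603, 1063, 165, 20, 192, 717, 39, 587, 385, 0.285, 0.495]))
horner = dict(zip(KEYS, [1020, 3777, 560, 1047, 169, 8, 218, 685, 14, 369, 512, 0.277, 0.499]))

print(round(traditional_similarity(rosen, horner), 2))
```

With these inputs the sketch lands at roughly 952, which is higher than the 934 listed for Horner further down. The gap likely reflects the positional adjustment and rounding choices that this sketch skips, so treat it as a demonstration of the mechanics rather than a reproduction of the published scores.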
The key here is that traditional Similarity Scores use a player’s career statistics. What I want to take a look at is comparing players on a "per opportunity" basis, which in my case is per 162 games. I could have used plate appearances, but I decided to put everything into a context of 162 games. And I think the most interesting results are for players who had abbreviated careers.
A couple of quick examples, with a quick sidebar:
For some reason, Al Rosen is a fascinating player to me, I suppose because he packed a lot into a very short career. A few bullet points:
· Rosen only played 10 seasons, and the first 3 of those were no more than brief cups of coffee as he was stuck behind Cleveland’s All Star third baseman, Ken Keltner, so he really only had 7 seasons, and really only 5 good ones.
· When Rosen finally did get an opportunity, he made the most of it. In 1950, he broke the AL record for home runs by a rookie with 37, a mark that stood until Mark McGwire hit 49 in 1987.
· In 1953, Rosen had what may be the best season any third baseman has ever had when he hit 43 home runs, drove in 145 runs, scored 115 runs, slugged .613, had an OPS+ of 180, and had 367 total bases, all of which were league-leading figures. He also hit .336, just missing the batting title (and a triple crown) by a single point to Mickey Vernon’s .337. In addition, he realized a rWAR of 10.1, which is still the only time a third baseman has achieved a WAR of 10.0 or higher. Rosen was named the unanimous MVP.
· In 1954, Rosen had one of the greatest individual performances in All Star history when he went 3 for 4 with 2 home runs, 5 RBI, and a walk. The 2 home runs and the 5 RBI are tied for the single-game highs in All Star game history.
Rosen also had what I think most people would agree was a generally successful post-playing career as an executive for the Yankees, Astros, and Giants.
Anyway, here’s the stat line for Al Rosen and his top 5 comps by traditional Similarity Scores:
Name | Score | G | AB | R | H | 2B | 3B | HR | RBI | SB | BB | SO | BA | SLG
Al Rosen | 1,000 | 1,044 | 3,725 | 603 | 1,063 | 165 | 20 | 192 | 717 | 39 | 587 | 385 | .285 | .495
Bob Horner | 934 | 1,020 | 3,777 | 560 | 1,047 | 169 | 8 | 218 | 685 | 14 | 369 | 512 | .277 | .499
Anthony Rendon | 912 | 1,026 | 3,830 | 624 | 1,100 | 269 | 16 | 151 | 611 | 45 | 476 | 682 | .287 | .484
Josh Hamilton | 910 | 1,027 | 3,909 | 609 | 1,134 | 234 | 24 | 200 | 701 | 50 | 352 | 938 | .290 | .516
Jim Ray Hart | 906 | 1,125 | 3,783 | 518 | 1,052 | 148 | 29 | 170 | 578 | 17 | 380 | 573 | .278 | .467
Charlie Keller | 904 | 1,170 | 3,790 | 725 | 1,085 | 166 | 72 | 189 | 760 | 45 | 784 | 499 | .286 | .518
Rosen had a short career (he spent a few years at the start of his career behind Ken Keltner, and then he retired early due to back issues), and so naturally the players considered most similar to him were players with similar career lengths who lined up close to his career stats.
But these aren’t the type of players who Rosen reminds me of. Well, Rendon feels like a decent comp, but he’s also an active player whose stats are in flux. I think Rosen was quite a bit better overall than Horner and Hart. Hamilton and Keller were good players (and I think Keller was probably a better overall hitter than Rosen), but they were outfielders. So, this feels a little unsatisfying to me in terms of capturing what kind of player Rosen was.
Another example:
Here’s the stat line for Dodger Hall of Famer Roy Campanella and his top 5 comps by traditional Similarity Scores:
Name | Score | G | PA | AB | R | H | 2B | 3B | HR | RBI | SB | BB | SO | BA | SLG
Roy Campanella | 1,000 | 1,430 | 5,648 | 4,951 | 771 | 1,401 | 226 | 30 | 260 | 1,017 | 34 | 605 | 501 | .283 | .498
Javy Lopez | 913 | 1,503 | 5,793 | 5,319 | 674 | 1,527 | 267 | 19 | 260 | 864 | 8 | 357 | 969 | .287 | .491
Brian McCann | 866 | 1,755 | 6,850 | 6,067 | 742 | 1,590 | 294 | 5 | 282 | 1,018 | 25 | 640 | 1,054 | .262 | .452
Walker Cooper | 862 | 1,473 | 5,082 | 4,702 | 573 | 1,341 | 240 | 40 | 173 | 812 | 18 | 309 | 357 | .285 | .464
Troy Tulowitzki | 857 | 1,291 | 5,415 | 4,804 | 762 | 1,391 | 264 | 24 | 225 | 780 | 57 | 511 | 900 | .290 | .495
Jason Varitek | 829 | 1,546 | 5,839 | 5,099 | 664 | 1,307 | 306 | 14 | 193 | 757 | 25 | 614 | 1,216 | .256 | .435
So, these players do have career hitting stats that bear some similarity to Campanella (although Lopez is the only one with a score over 900), but part of that is that Campanella had 2 major influences on his career stats – his early years and stats are severely understated due to his time spent in the Negro Leagues (8 years but with only 214 games that have been captured), and then his paralyzing injury before the 1958 season that eliminated whatever time he may have had remaining. As a result, Campanella only had the equivalent of about 9 full seasons’ worth of games.
So, while these are fine players, they are not the players that Campanella reminds me of.
Approach
In coming up with the scheme for Seasonal Notation Similarity Scores, I decided to keep some of the categories from traditional Similarity Scores, but to eliminate others.
I did away with total games played since everything is being expressed as a per-162 game context, so it’s totally unnecessary. I also did away with at bats, as I felt like it wasn’t real valuable in the per-162 game context.
I also eliminated hits, doubles, triples, and strikeouts, as I didn’t consider them essential. I could have kept strikeouts, but they’ve been in such flux over time that I felt like I’d have to adjust or index everyone’s figures, and I was trying to keep this version pretty simple, so I just decided to eliminate them at this point.
So, from the original Similarity Score methodology, I’m keeping 7 of the original categories:
· Home Runs
· Runs
· RBI
· Walks
· Stolen Bases
· Batting Average
· Slugging Percentage
The first 5 are then adjusted to "per 162 games", with batting average and slugging percentage staying as is.
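The per-162 conversion is just a scaling of career totals by games played. A minimal sketch, using Al Rosen’s career totals from the earlier table (the function name `per_162` is my own label, and rounding to whole numbers is an assumption made for display):

```python
def per_162(total, games):
    """Scale a career counting stat to a 162-game rate."""
    return total / games * 162

# Al Rosen's career totals, taken from the traditional comp table above
rosen_games = 1044
rosen_totals = {"HR": 192, "R": 603, "RBI": 717, "BB": 587, "SB": 39}

# Round to whole numbers for display (an assumption about presentation)
rosen_sn = {stat: round(per_162(v, rosen_games)) for stat, v in rosen_totals.items()}
print(rosen_sn)  # {'HR': 30, 'R': 94, 'RBI': 111, 'BB': 91, 'SB': 6}
```

Batting average and slugging percentage are already rates, so they pass through unchanged.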
What am I adding?
I thought OBP should be included (I was kind of surprised that it wasn’t already, I had always assumed it was), so I added it. That brings us to 8.
I also thought some more current metrics might be useful, ones that adjust for context, so I added
· WAR (baseball reference version, per 162 games)
· dWAR (per 162 games)
· OPS+
That gives me 11 categories rather than the original 13. 10 would have been a more satisfying number, but I decided not to let that bother me.
Now, I know that dWAR (defensive WAR) and WAR overlap some (WAR essentially is total player value covering hitting, baserunning, and defense, and both dWAR and WAR incorporate a positional adjustment). Also, I’m sure not everyone is sold on dWAR as a measure, but ultimately I decided to keep both of them. WAR is a good approximation of overall value, and dWAR is at least something we can use to try to quantify defensive value, so I felt like they both brought something to the table, but I didn’t go all the way to bring in oWAR (offensive WAR) as a separate metric.
One of the other reasons I’m including dWAR is that I decided to make this Similarity Score totally about comparing players at the same primary position. That is, there’s no position adjustment I’m making in the score calculation as is done in traditional Similarity Scores – you’re either the same position or you’re not. I did come up with a "switch" in my spreadsheet that allows Similarity Scores to be generated ignoring the position mandate, but in that option I get rid of dWAR. Mostly, I’m going to focus on players at the same position, because I think that is a large part of what I think of as "similarity". I’m sure not everyone would agree with this, and I’m aware that, in most cases, a player’s "primary" position is not the only one they played at, so dWAR may not be a perfect metric to leverage, but it’s the approach I took.
The next step was establishing penalties for differences in each category. Without going into too much detail, I played around with the penalties until I got results that I was comfortable with, based on the range of values in each category, the scale that each category uses, and the rollup of penalty points applied.
The table below summarizes where I landed, keeping in mind that most of the penalty point figures are a lot different than traditional Similarity Scores because we’re dealing with per-162 game figures, so the data we’re comparing is on a much smaller scale (with smaller differences) than career totals, and the relative size of the penalties for each unit difference had to reflect a different magnitude than in traditional Similarity Scores:
Category | Penalty for Difference
Home Runs per 162 games | 2 points for each HR per 162 games
Runs per 162 games | 1 point for each run per 162 games
RBI per 162 games | 1 point for each RBI per 162 games
Stolen Bases per 162 games | 3 points for each stolen base per 162 games
Walks per 162 games | 1 point for each walk per 162 games
Batting Average | 1 point for each .001 difference
OBP | 0.75 points for each .001 difference
Slugging Pct. | 0.5 points for each .001 difference
WAR per 162 games | 10 points for each 1.0 WAR per 162 games
dWAR per 162 games | 4 points for each 0.1 dWAR difference per 162 games
OPS+ | 1 point for each point difference
Again, there’s nothing magical about these penalty points – I just played around with them until I got what I thought were reasonable results. I’m sure they could be improved upon.
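For anyone who wants to tinker, the penalty scheme above can be sketched as follows. The names (`SN_PENALTIES`, `sn_similarity`, `use_dwar`) are hypothetical, the flag is my stand-in for the spreadsheet "switch" that drops dWAR, and the inputs are the rounded figures for Campanella and Berra shown in the Examples tables, so the result will only approximate a score built from unrounded data.

```python
# Points charged per one unit of difference in each category.
# Rate stats are converted so that a .001 difference is the unit.
SN_PENALTIES = {
    "HR": 2.0,             # per HR per 162 games
    "R": 1.0,              # per run per 162 games
    "RBI": 1.0,            # per RBI per 162 games
    "SB": 3.0,             # per stolen base per 162 games
    "BB": 1.0,             # per walk per 162 games
    "BA": 1.0 / 0.001,     # 1 point per .001
    "OBP": 0.75 / 0.001,   # 0.75 points per .001
    "SLG": 0.5 / 0.001,    # 0.5 points per .001
    "WAR": 10.0,           # per 1.0 WAR per 162 games
    "dWAR": 4.0 / 0.1,     # 4 points per 0.1 dWAR per 162 games
    "OPS+": 1.0,           # per point of OPS+
}

def sn_similarity(a, b, use_dwar=True):
    """Start at 1000 and subtract the weighted absolute differences."""
    penalty = 0.0
    for cat, points in SN_PENALTIES.items():
        if cat == "dWAR" and not use_dwar:
            continue  # hypothetical flag for the cross-position option
        penalty += abs(a[cat] - b[cat]) * points
    return 1000 - penalty

CATS = ["HR", "R", "RBI", "SB", "BB", "WAR", "dWAR", "OPS+", "BA", "OBP", "SLG"]
campanella = dict(zip(CATS, [29, 87, 115, 4, 69, 4.7, 1.00, 126, 0.283, 0.363, 0.498]))
berra = dict(zip(CATS, [27, 90, 109, 2, 54, 4.6, 0.70, 125, 0.285, 0.348, 0.482]))

print(round(sn_similarity(campanella, berra), 2))
```

From the rounded inputs this comes out near 931, a point or so away from the 932 shown for Berra in the table, which is about what you would expect from rounding alone.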
Examples
OK. Hopefully that’s enough of a setup. Let’s put it through its paces.
I find that most of the "interesting" examples tend to be players who had abbreviated careers of one kind or another, as those are the ones who tend to benefit by looking at stats expressed in Seasonal Notation. Of course, often it’s true that those players get the benefit of not having what I would call an "elongated" decline phase which can affect a player’s rate stats. I fully acknowledge that effect.
A few notes:
· In each table, I’m going to put each category stat included in the calculation.
· The lists will show the top 10 comps in descending order of the score (the player being compared to is listed first, then the #1 comp, then the #2 comp, and so on).
· "SN" is shorthand for "Seasonal Notation".
· Unless otherwise noted, I’m only including comparison players who have at least 1,000 career games played and are classified as playing the same "primary" position.
· I’m also going to include career games as an information column just to put each player’s total career length in perspective, although obviously total career games are not part of the comparison. But, on many of these examples, I’m using players who had relatively short careers, so this is just a reminder of that and to keep the comparisons in perspective.
· Finally, if someone is among the player’s current top 10 traditional Similarity Score comps, I’ll put that rank in parentheses by the player’s name, so we get a sense of which players can be considered as similar regardless of whether we’re looking at their careers or their seasonal notation.
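The filtering and ranking rules in these notes can be sketched as below. Everything here is illustrative: `top_comps`, the record layout, and the `toy_score` stand-in (which only looks at OPS+) are assumptions for the demo, and "Hypothetical Catcher" is a made-up entry included only to show the 1,000-game cutoff at work.

```python
def top_comps(target, pool, score_fn, min_games=1000, n=10):
    """Rank same-position players with enough games, best score first."""
    eligible = [
        p for p in pool
        if p["name"] != target["name"]
        and p["position"] == target["position"]   # same primary position only
        and p["games"] >= min_games               # 1,000-game minimum
    ]
    ranked = sorted(eligible, key=lambda p: score_fn(target, p), reverse=True)
    return [(p["name"], round(score_fn(target, p))) for p in ranked[:n]]

def toy_score(a, b):
    """Stand-in scorer for the demo only: penalizes OPS+ difference."""
    return 1000 - abs(a["ops_plus"] - b["ops_plus"])

campanella = {"name": "Roy Campanella", "position": "C", "games": 1430, "ops_plus": 126}
pool = [
    {"name": "Yogi Berra", "position": "C", "games": 2120, "ops_plus": 125},
    {"name": "Mike Piazza", "position": "C", "games": 1912, "ops_plus": 143},
    {"name": "Al Rosen", "position": "3B", "games": 1044, "ops_plus": 137},
    # Made-up entry, present only to exercise the games filter
    {"name": "Hypothetical Catcher", "position": "C", "games": 500, "ops_plus": 126},
]

print(top_comps(campanella, pool, toy_score))
```

Rosen drops out on position and the made-up catcher drops out on games played, leaving Berra and Piazza in descending score order.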
Let’s start by circling back to Roy Campanella:
Name | HOF? | Score | HR-SN | R-SN | RBI-SN | SB-SN | BB-SN | WAR-SN | dWAR-SN | OPS+ | BA | OBP | Slug | Games
Roy Campanella | | 1,000 | 29 | 87 | 115 | 4 | 69 | 4.7 | 1.00 | 126 | .283 | .363 | .498 | 1,430
Yogi Berra | Y | 932 | 27 | 90 | 109 | 2 | 54 | 4.6 | 0.70 | 125 | .285 | .348 | .482 | 2,120
Johnny Bench | Y | 906 | 29 | 82 | 103 | 5 | 67 | 5.6 | 1.48 | 126 | .267 | .342 | .476 | 2,158
Bill Dickey | Y | 902 | 18 | 84 | 109 | 3 | 61 | 5.1 | 0.92 | 127 | .313 | .382 | .486 | 1,789
Gabby Hartnett (9) | Y | 898 | 19 | 71 | 96 | 2 | 57 | 4.6 | 1.08 | 126 | .297 | .370 | .489 | 1,990
Buster Posey | | 872 | 19 | 78 | 86 | 3 | 64 | 5.3 | 1.16 | 129 | .302 | .372 | .460 | 1,371
Jorge Posada (6) | | 865 | 24 | 80 | 94 | 2 | 83 | 3.8 | 0.23 | 121 | .273 | .374 | .474 | 1,829
Carlton Fisk | Y | 862 | 24 | 83 | 86 | 8 | 55 | 4.4 | 1.10 | 117 | .269 | .341 | .457 | 2,499
Mike Piazza | Y | 858 | 36 | 89 | 113 | 1 | 64 | 5.0 | 0.13 | 143 | .308 | .377 | .545 | 1,912
Javy Lopez (1) | | 850 | 28 | 73 | 93 | 1 | 38 | 3.2 | 0.61 | 112 | .287 | .337 | .491 | 1,503
Brian McCann (2) | | 831 | 26 | 68 | 94 | 2 | 59 | 3.0 | 0.73 | 110 | .262 | .337 | .452 | 1,755
So, we can see that 4 players who were on Campanella’s top 10 traditional comp list also made his seasonal notation top 10. Campanella’s top 2 traditional comps (Lopez and McCann) are still on the list, but they’re much further down, while Hartnett is a little higher up, and Posada’s about the same.
The big difference now is that Campanella’s top 4 comps are all Hall of Famers, and his #1 comp is his contemporary and fellow 3-time 1950’s MVP, Yogi Berra, which I have to say I’m pretty happy with. And, as you can see, they’re pretty comparable among most categories, with Campanella having a little better OBP (and higher walks) and Slugging Percentage, and a little higher dWAR, with Yogi (of course) having the much higher career games figure. But on a per-162 game performance basis, they’re pretty close.
I don’t know about you, but this is a very satisfying list to me. Because of many of the reasons outlined earlier, Campanella often suffers when compared to other great catchers. Campanella is 17th in JAWS, for example. Now, that’s not a complaint. Everyone recognizes why he ranks so low on something like that, and we make proper adjustments. His total of officially captured career games is only about 1,400, and that’s a relatively low total. But, again, we know why that is.
When I think of Campanella, I think of him as a top-10 all-time catcher, possibly even top 5 depending on your perspective of what’s important and how to make the necessary adjustments. But I like the fact that his best comps are players like Berra, Bench, Dickey, and Hartnett rather than Lopez, McCann, Cooper and Tulowitzki.
OK. How about revisiting Al Rosen?
Name | HOF? | Score | HR-SN | R-SN | RBI-SN | SB-SN | BB-SN | WAR-SN | dWAR-SN | OPS+ | BA | OBP | Slug | Games
Al Rosen | | 1,000 | 30 | 94 | 111 | 6 | 91 | 5.0 | 0.06 | 137 | .285 | .384 | .495 | 1,044
Chipper Jones | Y | 902 | 30 | 105 | 105 | 10 | 98 | 5.5 | -0.06 | 141 | .303 | .401 | .529 | 2,499
Eddie Mathews | Y | 897 | 35 | 102 | 98 | 5 | 98 | 6.5 | 0.38 | 143 | .271 | .376 | .509 | 2,391
Josh Donaldson | | 895 | 34 | 100 | 98 | 5 | 88 | 6.0 | 0.70 | 135 | .269 | .367 | .505 | 1,201
David Wright | | 895 | 25 | 97 | 99 | 20 | 78 | 5.0 | 0.03 | 133 | .296 | .376 | .491 | 1,585
Anthony Rendon (2) | | 890 | 24 | 99 | 96 | 7 | 75 | 5.1 | 0.79 | 126 | .287 | .369 | .484 | 1,026
Troy Glaus | | 888 | 34 | 94 | 100 | 6 | 90 | 4.0 | 0.32 | 119 | .254 | .358 | .489 | 1,537
Hank Thompson (9) | | 877 | 22 | 92 | 85 | 8 | 83 | 4.6 | 0.33 | 122 | .274 | .376 | .460 | 1,069
George Brett | Y | 875 | 19 | 95 | 96 | 12 | 66 | 5.3 | 0.13 | 135 | .305 | .369 | .487 | 2,707
Ron Santo | Y | 866 | 25 | 82 | 96 | 3 | 80 | 5.1 | 0.63 | 125 | .277 | .362 | .464 | 2,243
Bob Elliott | | 861 | 14 | 87 | 98 | 5 | 79 | 4.2 | -0.25 | 124 | .289 | .375 | .440 | 1,978
Rosen’s previous #1 comp (Bob Horner) falls out of the top 10 (he’s down at 18 now). 2 others (Rendon and Thompson) are still in the top 10. Rosen’s #1 comp is now Chipper Jones.
Now, I will say this. Chipper is the #1 comp, but Chipper’s still better, and Chipper has edges in nearly every category above. Chipper is better on a seasonal basis, and he’s light years ahead on career, as his career is two-and-a-half times as long. But, Rosen checks in with per-162 game figures of 30 HR, 94 runs, 111 RBI, 91 walks, a 137 OPS+, and a WAR per 162 of 5.0. That’s a pretty darn good ballplayer.
Again, as often happens with Similarity Scores, you can have players who are the most similar to you but who are better players than you. And, of course, the players right after Chipper Jones are players like Donaldson, Wright, Rendon, and Glaus. Donaldson (like Rosen) won an MVP (as did Elliott) and had other high finishes, and both Wright and Rendon have placed high as well. It’s an interesting blend of some of the very elite at the position (Brett, Mathews, Jones, Santo) and others who were pretty good, but for a much shorter time.
But I feel like this list gives a better representation of the quality of player Rosen was when he was actually playing than his traditional comp list. I’m not trying to put him in the Hall of Fame - he had a very short career. But he was a very good player while he was in there.
Who else can we look at? How about Jackie Robinson?
Name | HOF? | Score | HR-SN | R-SN | RBI-SN | SB-SN | BB-SN | WAR-SN | dWAR-SN | OPS+ | BA | OBP | Slug | Games
Jackie Robinson | | 1,000 | 16 | 111 | 87 | 23 | 86 | 7.3 | 1.18 | 133 | .313 | .410 | .477 | 1,416
Charlie Gehringer | Y | 881 | 13 | 124 | 100 | 13 | 83 | 5.9 | 0.75 | 125 | .320 | .404 | .480 | 2,323
George Grantham (1) | | 809 | 12 | 102 | 80 | 15 | 80 | 3.7 | -0.24 | 122 | .302 | .392 | .461 | 1,444
Dustin Pedroia | | 808 | 15 | 99 | 78 | 15 | 67 | 5.6 | 1.66 | 113 | .299 | .365 | .439 | 1,512
Frankie Frisch | Y | 808 | 7 | 107 | 87 | 29 | 51 | 5.0 | 1.51 | 110 | .316 | .369 | .432 | 2,311
Tony Lazzeri | Y | 806 | 17 | 92 | 111 | 14 | 81 | 4.4 | 0.48 | 121 | .292 | .380 | .467 | 1,740
Eddie Collins | Y | 801 | 3 | 104 | 74 | 42 | 86 | 7.1 | 0.46 | 142 | .333 | .424 | .429 | 2,826
Rod Carew | Y | 796 | 6 | 93 | 67 | 23 | 67 | 5.3 | -0.11 | 131 | .328 | .393 | .429 | 2,469
Nap Lajoie | Y | 795 | 5 | 98 | 104 | 25 | 34 | 7.0 | 0.66 | 150 | .338 | .380 | .466 | 2,480
Ryne Sandberg | Y | 794 | 21 | 99 | 79 | 26 | 57 | 5.1 | 1.01 | 114 | .285 | .344 | .452 | 2,164
Roberto Alomar | Y | 792 | 14 | 103 | 77 | 32 | 70 | 4.6 | 0.22 | 116 | .300 | .371 | .443 | 2,379
Robinson’s top comps in traditional Similarity Scores are George Grantham, Daniel Murphy, Freddie Lindstrom, Edgardo Alfonzo, and Denny Lyons. Lindstrom is the only Hall of Famer among Robinson’s top 10 traditional comps (although Jose Altuve is currently sitting at #8). Grantham still makes the list, but it’s now virtually all Hall of Famers.
Now, much like Campanella, we understand the limits of traditional Similarity Scores when it comes to Robinson. Robinson had a very short career in MLB (10 seasons), and he didn’t debut with the Dodgers until age 28, so his career numbers understate his true value. He’s 10th in JAWS, which is impressive enough as is even on face value, but he’s even better than that. Robinson is 16th in career WAR among second basemen, but 6th in WAR7 (top 7 seasons). Robinson was a great player, arguably top 5 at the position. The brevity of his career is a big part of why his top 10 traditional comps aren’t very impressive.
Anyway, Seasonal Notation Similarity Scores illustrate how great Jackie was, and it yields a much more impressive list of comps. His WAR/162 of 7.3 is higher than every 2nd baseman with 1,000 or more career games except for Rogers Hornsby (9.1). And, even though his updated top 10 comp list is 80% Hall of Famers, the only one with a relatively high Similarity Score figure is Charlie Gehringer (881). They’re reasonably similar, except Robinson has more WAR/162, more stolen bases per 162, and more quantitative defensive value. Gehringer’s a great player, one of my all-time favorites, but I think Robinson was the better player, all of which plays into the greatness and uniqueness of Robinson’s career performance.
How about another player who had an abbreviated career? Let’s look at Don Mattingly:
Name | HOF? | Score | HR-SN | R-SN | RBI-SN | SB-SN | BB-SN | WAR-SN | dWAR-SN | OPS+ | BA | OBP | Slug | Games
Don Mattingly | | 1,000 | 20 | 91 | 100 | 1 | 53 | 3.8 | -0.56 | 127 | .307 | .358 | .471 | 1,785
Ripper Collins | | 965 | 20 | 92 | 98 | 3 | 53 | 3.7 | -0.49 | 126 | .296 | .360 | .492 | 1,084
Adrian Gonzalez (6) | | 922 | 27 | 84 | 101 | 1 | 66 | 3.7 | -0.30 | 129 | .287 | .358 | .485 | 1,929
Eddie Murray | Y | 918 | 27 | 87 | 103 | 6 | 71 | 3.7 | -0.62 | 129 | .287 | .359 | .476 | 3,026
Ted Kluszewski | | 911 | 26 | 80 | 97 | 2 | 46 | 3.0 | -0.93 | 123 | .298 | .353 | .498 | 1,718
Mike Sweeney | | 910 | 24 | 85 | 101 | 6 | 58 | 2.8 | -0.86 | 118 | .297 | .366 | .486 | 1,454
Joe Torre | Y | 908 | 18 | 73 | 87 | 2 | 57 | 4.2 | -0.02 | 129 | .297 | .365 | .452 | 2,209
Tony Perez | Y | 906 | 22 | 74 | 96 | 3 | 54 | 3.2 | -0.39 | 122 | .279 | .341 | .463 | 2,777
Justin Morneau | | 905 | 26 | 81 | 103 | 1 | 60 | 2.8 | -0.69 | 120 | .281 | .348 | .481 | 1,545
Cecil Cooper (1) | | 905 | 21 | 86 | 96 | 8 | 38 | 3.1 | -0.84 | 121 | .298 | .337 | .466 | 1,896
Kent Hrbek | | 900 | 27 | 84 | 101 | 3 | 78 | 3.6 | -0.71 | 128 | .282 | .367 | .481 | 1,747
One reason I’m including Mattingly is that his #1 comp has one of the highest Seasonal Notation Similarity Scores that I’ve come across, and that’s Ripper Collins at a whopping 965. Collins was a member of the famous St. Louis Cardinals’ "Gas House Gang" of the 1930’s, but I think he often gets overshadowed by the more memorable characters from that team, such as Dizzy and Daffy Dean, Frankie Frisch, Pepper Martin, Joe Medwick and Leo Durocher. In the Gang’s most famous season (1934), Collins was probably the team’s most valuable position player (tied for league lead with 35 HR’s, led the league in total bases and slugging), and probably the 2nd best overall after Dean (who famously won 30 games). But, Collins ultimately had a pretty short career with only 7 seasons of 100 or more games.
Anyway, Collins is a very strong across-the-board match for Mattingly, with no major differences in the individual categories. All 10 of Mattingly’s top comps have scores of 900 or above.
2 players (Gonzalez and Cooper) carry over from Mattingly’s traditional Similarity Score comp list. 3 of the comps are Hall of Famers, but Torre is in more for his managerial success (although he was a fine player as well), while Murray and Perez both had much longer careers.
Charlie Keller came up early in the article, a great hitter with an abbreviated career. Let’s run him through the tool:
Name | HOF? | Score | HR-SN | R-SN | RBI-SN | SB-SN | BB-SN | WAR-SN | dWAR-SN | OPS+ | BA | OBP | Slug | Games
Charlie Keller | - | 1,000 | 26 | 100 | 105 | 6 | 109 | 6.1 | -0.12 | 152 | .286 | .410 | .518 | 1,170
Lance Berkman | - | 902 | 32 | 99 | 106 | 7 | 104 | 4.5 | -0.95 | 144 | .293 | .406 | .537 | 1,879
Bob Johnson | - | 893 | 25 | 108 | 112 | 8 | 93 | 4.9 | -0.50 | 139 | .296 | .393 | .506 | 1,863
Ralph Kiner | Y | 860 | 41 | 107 | 112 | 2 | 111 | 5.3 | -1.18 | 149 | .279 | .398 | .548 | 1,472
Jeff Heath | - | 851 | 23 | 91 | 104 | 7 | 69 | 4.4 | -0.63 | 139 | .293 | .370 | .509 | 1,383
Carl Yastrzemski | Y | 847 | 22 | 89 | 90 | 8 | 90 | 4.7 | 0.05 | 130 | .285 | .379 | .462 | 3,308
Monte Irvin | Y | 843 | 22 | 89 | 108 | 8 | 73 | 5.0 | 0.24 | 134 | .304 | .388 | .489 | 1,032
Ken Williams | - | 825 | 23 | 100 | 106 | 18 | 66 | 5.0 | -0.42 | 138 | .319 | .393 | .530 | 1,397
Matt Holliday | - | 823 | 27 | 98 | 104 | 9 | 68 | 3.8 | -1.12 | 132 | .299 | .379 | .510 | 1,903
Christian Yelich | - | 808 | 24 | 103 | 85 | 20 | 83 | 5.0 | -0.52 | 132 | .292 | .379 | .477 | 1,095
Kevin Mitchell (2) | - | 808 | 31 | 83 | 101 | 4 | 65 | 3.9 | -1.07 | 142 | .284 | .360 | .520 | 1,223
Keller’s top 3 traditional comps were Josh Hamilton, Kevin Mitchell, and Al Rosen. Mitchell is the only top 10 comp from Keller’s traditional list that survives, and he’s down at #9. I like the top 4 very much here – Keller, Berkman, Johnson, and Kiner all seem like the same mold – good combination of batting average, OBP, and pop, and generating around 100 runs/RBI/walks per 162 games played, really valuable offensive weapons. Kiner separated himself from the others because he gained notoriety from the 7 consecutive seasons he led the league in home runs, but they all seem to be cut from the same cloth.
How about Eric Davis, who for a season or two was about as exciting a player as I can remember watching?
Name | HOF? | Score | HR-SN | R-SN | RBI-SN | SB-SN | BB-SN | WAR-SN | dWAR-SN | OPS+ | BA | OBP | Slug | Games
Eric Davis | | 1,000 | 28 | 93 | 93 | 35 | 74 | 3.6 | -0.91 | 125 | .269 | .359 | .482 | 1,626
Ray Lankford (10) | | 892 | 23 | 92 | 83 | 25 | 79 | 3.6 | 0.05 | 123 | .272 | .364 | .477 | 1,701
Dale Murphy | | 881 | 30 | 89 | 94 | 12 | 73 | 3.5 | -0.51 | 121 | .265 | .346 | .469 | 2,180
Andrew McCutchen | | 876 | 25 | 97 | 86 | 18 | 85 | 4.2 | -0.70 | 131 | .280 | .373 | .476 | 1,761
Carlos Beltran | | 861 | 27 | 99 | 99 | 20 | 68 | 4.4 | 0.13 | 119 | .279 | .350 | .486 | 2,586
Chili Davis | | 859 | 23 | 82 | 91 | 9 | 79 | 2.5 | -0.94 | 121 | .274 | .360 | .451 | 2,435
Ellis Burks | | 858 | 29 | 101 | 98 | 15 | 64 | 4.0 | -0.54 | 126 | .291 | .363 | .510 | 2,000
Grady Sizemore | | 846 | 22 | 97 | 76 | 21 | 71 | 4.1 | 0.06 | 115 | .265 | .349 | .457 | 1,101
Fred Lynn | | 846 | 25 | 87 | 91 | 6 | 71 | 4.1 | -0.26 | 129 | .283 | .360 | .484 | 1,969
Bobby Murcer | | 843 | 21 | 83 | 89 | 11 | 73 | 2.7 | -1.34 | 124 | .277 | .357 | .445 | 1,908
Amos Otis | | 841 | 16 | 89 | 82 | 28 | 61 | 3.5 | -0.31 | 115 | .277 | .343 | .425 | 1,998
Outside of Lankford, it’s a completely different set of players from Davis’ traditional Similarity Score list. Davis’ top comps on the traditional scale are Kirk Gibson, Jeromy Burnitz, Darryl Strawberry, and Raul Mondesi, all of whom are pretty good comps, but none of whom were primarily center fielders.
Interesting to note that none of the comps are currently Hall of Famers. It’s a lot of players who exhibit 20-20 type of potential, but none of them has been able to reach Cooperstown, although Dale Murphy and Carlos Beltran are certainly in the discussion.
Davis is just a little bit shy of being the only player in history who would have a per-162 stat line with both 30 HR and 30 stolen bases (Fernando Tatis Jr. and Ronald Acuna Jr. both currently have that status, but of course they are quite early in their respective careers). Bobby Bonds is the closest at 29 HR and 40 stolen bases per 162.
Speaking of Darryl Strawberry:
Name | HOF? | Score | HR-SN | R-SN | RBI-SN | SB-SN | BB-SN | WAR-SN | dWAR-SN | OPS+ | BA | OBP | Slug | Games
Darryl Strawberry | | 1,000 | 34 | 92 | 102 | 23 | 84 | 4.3 | -0.74 | 138 | .259 | .357 | .505 | 1,583
Reggie Jackson | Y | 934 | 32 | 89 | 98 | 13 | 79 | 4.2 | -0.94 | 139 | .262 | .356 | .490 | 2,820
Jose Canseco | | 893 | 40 | 102 | 121 | 17 | 78 | 3.6 | -1.18 | 132 | .266 | .353 | .515 | 1,887
Rocky Colavito | | 888 | 33 | 85 | 102 | 2 | 84 | 3.9 | -0.40 | 132 | .266 | .359 | .489 | 1,841
Kirk Gibson | | 873 | 25 | 98 | 86 | 28 | 71 | 3.8 | -0.63 | 123 | .268 | .352 | .463 | 1,635
Bob Allison | | 871 | 27 | 85 | 84 | 9 | 84 | 3.6 | -0.56 | 127 | .255 | .358 | .471 | 1,541
Jose Bautista (9) | | 865 | 31 | 92 | 88 | 6 | 93 | 3.3 | -0.71 | 124 | .247 | .361 | .475 | 1,798
David Justice | | 863 | 31 | 93 | 102 | 5 | 91 | 4.1 | -0.26 | 129 | .279 | .378 | .500 | 1,610
Jackie Jensen | | 860 | 22 | 91 | 105 | 16 | 84 | 3.1 | -0.44 | 120 | .279 | .369 | .460 | 1,438
Jack Clark | | 859 | 28 | 91 | 96 | 6 | 103 | 4.3 | -1.04 | 137 | .267 | .379 | .476 | 1,994
Roger Maris | | 853 | 30 | 91 | 94 | 2 | 72 | 4.2 | -0.18 | 127 | .260 | .345 | .476 | 1,463
Reggie Jackson is by far the best comp by this method, with pretty close figures across the board, except that Strawberry had about twice the rate of stolen bases. Jackson through age 30 was stealing at a rate of about 20 per 162 games, but that fell way off as he aged. In any case, Strawberry and Reggie were very comparable through their 20s, and they do show a very strong similarity on a per-162 basis, with Reggie ultimately playing nearly twice as many games.
Colavito’s another interesting comp with tight similarity to Strawberry across the board except for stolen bases, which represents the majority of the penalty points in the calculation of the score.
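To make the idea of "penalty points" concrete, here is a rough Python sketch of a penalty-based score that starts at 1,000 and subtracts points for each category difference. The weights in `WEIGHTS` are hypothetical placeholders chosen for illustration only; they are not the penalties actually used to produce the scores in these tables, so the numbers this sketch generates should not be expected to match them. The stat lines are the Strawberry, Jackson, and Colavito rows from the table above.

```python
# HYPOTHETICAL weights: points deducted per unit of difference in each category.
# These are illustrative assumptions, not the article's actual penalties.
WEIGHTS = {
    "HR": 2, "R": 1, "RBI": 1, "SB": 3, "BB": 1,
    "WAR": 10, "dWAR": 10, "OPS+": 1,
    "BA": 1000, "OBP": 1000, "SLG": 1000,  # 1 point per .001
}

# Per-162 lines taken from the Strawberry table above.
STRAWBERRY = {"HR": 34, "R": 92, "RBI": 102, "SB": 23, "BB": 84, "WAR": 4.3,
              "dWAR": -0.74, "OPS+": 138, "BA": .259, "OBP": .357, "SLG": .505}
JACKSON = {"HR": 32, "R": 89, "RBI": 98, "SB": 13, "BB": 79, "WAR": 4.2,
           "dWAR": -0.94, "OPS+": 139, "BA": .262, "OBP": .356, "SLG": .490}
COLAVITO = {"HR": 33, "R": 85, "RBI": 102, "SB": 2, "BB": 84, "WAR": 3.9,
            "dWAR": -0.40, "OPS+": 132, "BA": .266, "OBP": .359, "SLG": .489}


def category_penalties(a, b, weights=WEIGHTS):
    """Penalty points per category: absolute difference times the weight."""
    return {cat: abs(a[cat] - b[cat]) * w for cat, w in weights.items()}


def similarity(a, b, weights=WEIGHTS):
    """Start at 1,000 and subtract the total penalty."""
    return 1000 - sum(category_penalties(a, b, weights).values())


for name, comp in {"Jackson": JACKSON, "Colavito": COLAVITO}.items():
    penalties = category_penalties(STRAWBERRY, comp)
    biggest = max(penalties, key=penalties.get)
    share = penalties[biggest] / sum(penalties.values())
    print(f"{name}: largest penalty is {biggest} ({share:.0%} of the total)")
```

Even with made-up weights, the shape of the result matches the point above: for Colavito, the 21-steal gap swamps every other category.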
How about Mo Vaughn? Vaughn had a productive but relatively brief career.
Name | HOF? | Score | HR-SN | R-SN | RBI-SN | SB-SN | BB-SN | WAR-SN | dWAR-SN | OPS+ | BA | OBP | Slug | Games
Mo Vaughn | | 1,000 | 35 | 92 | 114 | 3 | 78 | 2.9 | -1.33 | 132 | .293 | .383 | .523 | 1,512
Fred McGriff | | 931 | 32 | 89 | 102 | 5 | 86 | 3.5 | -1.14 | 134 | .284 | .377 | .509 | 2,460
Miguel Cabrera | | 929 | 31 | 94 | 113 | 2 | 75 | 4.3 | -1.22 | 145 | .310 | .387 | .532 | 2,587
David Ortiz | Y | 928 | 36 | 95 | 119 | 1 | 89 | 3.7 | -1.41 | 141 | .286 | .380 | .552 | 2,408
Carlos Delgado | | 927 | 38 | 99 | 120 | 1 | 88 | 3.5 | -1.37 | 138 | .280 | .383 | .546 | 2,035
Rafael Palmeiro | | 917 | 33 | 95 | 105 | 6 | 77 | 4.1 | -0.61 | 132 | .288 | .371 | .515 | 2,831
Hal Trosky (9) | | 914 | 27 | 100 | 122 | 3 | 66 | 3.6 | -0.96 | 130 | .302 | .371 | .522 | 1,347
Travis Hafner | | 912 | 29 | 85 | 100 | 2 | 82 | 3.4 | -1.34 | 134 | .273 | .376 | .498 | 1,183
Prince Fielder (1) | | 910 | 32 | 87 | 103 | 2 | 85 | 2.4 | -2.06 | 134 | .283 | .382 | .506 | 1,611
Jason Giambi | | 903 | 32 | 88 | 103 | 1 | 98 | 3.6 | -1.41 | 139 | .277 | .399 | .516 | 2,260
José Abreu | | 901 | 33 | 89 | 115 | 2 | 47 | 4.0 | -1.09 | 135 | .290 | .350 | .515 | 1,113
Lots of players with Similarity Scores over 900. By traditional Similarity Scores, Vaughn’s top comps are Prince Fielder (who is still on the list above but drops to #8), Paul Goldschmidt (still active), Ted Kluszewski, Freddie Freeman (still active), and David Justice. McGriff is a pretty good fit for Vaughn, but of course McGriff’s career was about 60% longer. Cabrera also generates a pretty high score, but I do think his production is distinctly better than Vaughn’s, and, of course, he had a much longer career as well. Vaughn had a nice stat line, and it was a nice career while it lasted.
Earlier, we looked at Campanella and Robinson, both of whom had their career stats affected by the Color Line and time spent in the Negro Leagues. How about looking at players who were primarily or even exclusively Negro Leaguers?
Now, I’m sure some might argue that we can’t take the stats literally, and there’s certainly discussion to be had there. I know there are a lot of efforts going on in the realm of trying to translate the stats for Negro League players into Major League Equivalencies (MLE), and I was tempted to leverage the work that had been done in that area, but I think I’ll save that exercise for another time after I get more of a chance to digest them. Besides, if we start making adjustments for Negro League stats, don’t we kind of have to consider doing that for everyone? Didn’t players who played in the National League and American League during the existence of the Negro Leagues benefit stat-wise from not facing all of the best available talent? Stats always reflect an intersection of performance, context, talent, circumstances, rules, ballparks, competition, and any number of other factors. They are never pure results. But, if you start adjusting some, I think you have to start adjusting for everyone. I mean, does anyone think that Ty Cobb would produce a .366 career average in another era? Neither do I.
So, for now…..what if we simply take the Negro League stats at face value, but with the understanding that there are likely many things that influence them? What do they imply, and what kind of shape and comparisons do they invite? How about if we do that for now, and I will also take it as a "to-do" to follow up with MLE’s and see what kind of results those yield?
Here's the great Buck Leonard:
Name | HOF? | Score | HR-SN | R-SN | RBI-SN | SB-SN | BB-SN | WAR-SN | dWAR-SN | OPS+ | BA | OBP | Slug | Games
Buck Leonard | | 1,000 | 26 | 147 | 152 | 9 | 109 | 7.9 | -0.39 | 181 | .345 | .450 | .589 | 587
Lou Gehrig | Y | 912 | 37 | 141 | 149 | 8 | 113 | 8.5 | -0.67 | 179 | .340 | .447 | .632 | 2,164
Jimmie Foxx | Y | 839 | 37 | 122 | 134 | 6 | 102 | 6.5 | -0.41 | 163 | .325 | .428 | .609 | 2,317
Hank Greenberg | Y | 819 | 38 | 122 | 148 | 7 | 99 | 6.4 | -0.51 | 159 | .313 | .412 | .605 | 1,394
Dan Brouthers | Y | 790 | 10 | 148 | 126 | 25 | 81 | 7.7 | -0.16 | 171 | .342 | .423 | .520 | 1,676
Jeff Bagwell | Y | 734 | 34 | 114 | 115 | 15 | 106 | 6.0 | -0.54 | 149 | .297 | .408 | .540 | 2,150
Johnny Mize | Y | 712 | 31 | 96 | 115 | 2 | 74 | 6.1 | -0.56 | 158 | .312 | .397 | .562 | 1,884
Joey Votto | | 710 | 28 | 95 | 91 | 7 | 110 | 5.5 | -0.47 | 148 | .302 | .416 | .520 | 1,900
Todd Helton | | 706 | 27 | 101 | 101 | 3 | 96 | 4.5 | -0.36 | 133 | .316 | .414 | .539 | 2,247
Frank Thomas | Y | 691 | 36 | 104 | 119 | 2 | 116 | 5.1 | -1.57 | 156 | .301 | .419 | .555 | 2,322
Paul Goldschmidt | | 668 | 31 | 104 | 102 | 15 | 92 | 5.6 | -0.36 | 142 | .293 | .389 | .521 | 1,469
*This is based on comps with 1,000 or more games. If I lower it to 500 or more, another Negro League great (Mule Suttles) pops in at #3.
Now, obviously Leonard’s traditional Similarity Score comps aren’t great examples because Leonard’s published career stats only capture 587 games, so he tends to be compared on the basis of career totals with players who have that same type of context. As a result, it mostly captures other great Negro League stars like Cristobal Torriente, Bullet Rogan, Ben Taylor, Edgar Wesley, and Heavy Johnson, who were great players with a similar number of career "official" games, but the primary AL/NL players it comes up with are Dale Alexander and Lefty O’Doul, who had short careers.
As you can see, Leonard’s Seasonal Notation Similarity Score comps yield a pretty star-studded list, and the comp with the highest score, by far, is none other than Lou Gehrig. Poetic justice, in my book, and in Leonard’s "book" as well. Leonard’s biography, which he wrote with James A. Riley, was titled "Buck Leonard: The Black Lou Gehrig: The Hall of Famer's Story in His Own Words", and comparisons to Gehrig were common while Leonard was active. The great Monte Irvin once commented that, had Leonard been allowed to play in MLB, then they might have referred to Gehrig as the white Buck Leonard instead of the other way around.
Now, again, I’m sure some might be reluctant to directly compare Negro League stats to NL or AL stats, and I understand that. You can draw your own conclusions. However, by any standard, Leonard was probably the greatest first baseman in Negro League history, and Gehrig was probably the greatest in either the NL or the AL. The bold type (league category leadership) on Leonard’s Baseball Reference page definitely reaches out and grabs you almost as much as Gehrig’s does. I think Leonard is probably more comparable to Gehrig (and vice versa) than anyone else in history, and that includes Gehrig’s traditional #1 comp, Jimmie Foxx.
I think Leonard and Gehrig are both elite, and they deserve to be compared to each other. I suspect the consensus of most experts is that Gehrig had a little more home run power and was maybe a little better overall. Buck O’Neil commented that he thought Gehrig had more home run power but thought that Leonard was better defensively. In his "Baseball 100" list, Joe Posnanski has Gehrig at #14, and Leonard at #53. Isolating it to just first basemen (and if we consider Stan Musial as an outfielder rather than a first baseman, which is what I would normally do), Posnanski has Gehrig #1 (#14 overall), Albert Pujols #2 (#23 overall), Jimmie Foxx #3 (#33 overall), and Leonard #4 (#53 overall). In the New Bill James Historical Abstract (which is now about 20 years old), Bill also had Gehrig at #14 overall with Leonard at #65, but he had more first basemen in between the two. His top first basemen were, in order, Gehrig, Foxx, Mark McGwire, Mule Suttles (who could also be considered an outfielder), Jeff Bagwell, Eddie Murray, Johnny Mize, Harmon Killebrew, and then Leonard. So, he essentially had Leonard around #8 or #9 among first basemen.
By the way, I tried using the Major League Equivalent figures (MLE’s) that I alluded to earlier on Leonard, and if I use those, his #1 Seasonal Notation comp would be Will Clark. Clark’s a great player, but I think Leonard was probably better than Clark.
How about Cool Papa Bell?
Name | HOF? | Score | HR-SN | R-SN | RBI-SN | SB-SN | BB-SN | WAR-SN | dWAR-SN | OPS+ | BA | OBP | Slug | Games
Cool Papa Bell | | 1,000 | 8 | 155 | 80 | 39 | 72 | 4.3 | -0.51 | 126 | .325 | .395 | .446 | 1,199
Jimmy Ryan | | 889 | 9 | 132 | 88 | 34 | 65 | 3.5 | -0.81 | 124 | .308 | .375 | .444 | 2,014
George Van Haltren | | 884 | 6 | 134 | 83 | 47 | 71 | 3.2 | -0.92 | 122 | .316 | .386 | .418 | 1,990
Hugh Duffy | Y | 862 | 10 | 145 | 121 | 54 | 62 | 4.0 | -0.23 | 123 | .326 | .386 | .451 | 1,737
Mike Griffin | | 859 | 4 | 151 | 77 | 51 | 87 | 4.4 | -0.09 | 123 | .296 | .388 | .407 | 1,513
Earle Combs | Y | 853 | 6 | 132 | 70 | 11 | 75 | 5.0 | -0.30 | 125 | .325 | .397 | .462 | 1,455
Pete Browning | | 851 | 6 | 131 | 90 | 35 | 64 | 5.6 | -0.79 | 163 | .341 | .403 | .467 | 1,183
George Gore | | 848 | 6 | 164 | 76 | 21 | 89 | 4.9 | -0.49 | 136 | .301 | .386 | .411 | 1,310
Mike Donlin | | 844 | 8 | 103 | 84 | 33 | 48 | 4.5 | -0.86 | 144 | .333 | .386 | .468 | 1,049
Ben Chapman | | 826 | 8 | 108 | 92 | 27 | 78 | 4.0 | 0.05 | 114 | .302 | .383 | .440 | 1,717
Edd Roush | Y | 822 | 6 | 91 | 81 | 22 | 40 | 3.8 | -0.50 | 126 | .323 | .369 | .446 | 1,967
I’m not real satisfied with that list, because it’s heavily dominated by pre-1900 players like Ryan, Van Haltren, Duffy, Griffin, Browning, and Gore, and a lot of that is driven by those players’ stolen base figures that got a boost from how stolen bases were defined for several years in that era (which included things like taking an extra base on a single, for example). Players from that era also have the advantage of being in a high-scoring environment, which enables them to be better "comps" to Bell’s rather striking per-162 game figure of 155 runs scored.
So, this is a good case where I think some intervention would be a good idea. How about if we limit it to players whose careers were from 1901 or later?
Name | HOF? | Score | HR-SN | R-SN | RBI-SN | SB-SN | BB-SN | WAR-SN | dWAR-SN | OPS+ | BA | OBP | Slug | Games
Cool Papa Bell | | 1,000 | 8 | 155 | 80 | 39 | 72 | 4.3 | -0.51 | 126 | .325 | .395 | .446 | 1,199
Earle Combs | Y | 853 | 6 | 132 | 70 | 11 | 75 | 5.0 | -0.30 | 125 | .325 | .397 | .462 | 1,455
Ben Chapman | | 826 | 8 | 108 | 92 | 27 | 78 | 4.0 | 0.05 | 114 | .302 | .383 | .440 | 1,717
Edd Roush | Y | 822 | 6 | 91 | 81 | 22 | 40 | 3.8 | -0.50 | 126 | .323 | .369 | .446 | 1,967
Cesar Cedeno | | 791 | 16 | 88 | 79 | 44 | 54 | 4.3 | -0.35 | 123 | .285 | .347 | .443 | 2,006
Brett Butler | | 772 | 4 | 99 | 42 | 41 | 83 | 3.6 | -0.45 | 110 | .290 | .377 | .376 | 2,213
Johnny Damon | | 767 | 15 | 109 | 74 | 27 | 65 | 3.7 | -0.13 | 104 | .284 | .352 | .433 | 2,490
Kenny Lofton | | 756 | 10 | 118 | 60 | 48 | 73 | 5.3 | 1.19 | 107 | .299 | .372 | .423 | 2,103
Dom DiMaggio | | 754 | 10 | 121 | 72 | 12 | 87 | 3.9 | 0.35 | 111 | .298 | .383 | .419 | 1,399
Lenny Dykstra | | 753 | 10 | 102 | 51 | 36 | 81 | 5.4 | 0.90 | 120 | .285 | .375 | .419 | 1,278
Amos Otis | | 748 | 16 | 89 | 82 | 28 | 61 | 3.5 | -0.31 | 115 | .277 | .343 | .425 | 1,998
That may feel a little better, but it also results in generally lower scores. Bell’s rate of 155 runs per 162 games is obviously a tough comparison point that results in a pretty hefty penalty for most of these players who can’t approach that level.
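The size of that runs gap is easy to quantify. The short Python sketch below takes the R-SN column from the two Bell tables and computes each comp's distance from Bell's 155. It only measures the raw gap in runs per 162; how many penalty points each run costs is not modeled here, and the names `BELL_RUNS`, `PRE_1900`, `POST_1900`, and `mean_gap` are hypothetical labels for this illustration.

```python
BELL_RUNS = 155  # Cool Papa Bell's runs per 162 games, from the tables above

# R-SN values for the pre-1900 players named in the first Bell list.
PRE_1900 = {
    "Jimmy Ryan": 132, "George Van Haltren": 134, "Hugh Duffy": 145,
    "Mike Griffin": 151, "Pete Browning": 131, "George Gore": 164,
}

# R-SN values for the comps in the 1901-or-later list.
POST_1900 = {
    "Earle Combs": 132, "Ben Chapman": 108, "Edd Roush": 91,
    "Cesar Cedeno": 88, "Brett Butler": 99, "Johnny Damon": 109,
    "Kenny Lofton": 118, "Dom DiMaggio": 121, "Lenny Dykstra": 102,
    "Amos Otis": 89,
}


def mean_gap(target: int, pool: dict) -> float:
    """Average absolute difference in runs per 162 between target and pool."""
    return sum(abs(target - runs) for runs in pool.values()) / len(pool)


print(f"Pre-1900 comps:  average gap {mean_gap(BELL_RUNS, PRE_1900):.1f} runs")
print(f"1901+ comps:     average gap {mean_gap(BELL_RUNS, POST_1900):.1f} runs")

closest = min(POST_1900, key=lambda name: abs(BELL_RUNS - POST_1900[name]))
print(f"Closest 1901+ comp in runs: {closest} ({POST_1900[closest]})")
```

The 1901-or-later group sits several times farther from Bell in runs scored than the pre-1900 group does, which is consistent with the lower scores in the second list.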
Combs is a pretty good match at face value in most categories except for stolen bases, although Combs reportedly had excellent speed himself. Of course, Bell’s speed was, by most accounts, more in the "elite" realm as opposed to merely "excellent". Combs reportedly was a pretty proficient base stealer at AA Louisville, but he didn’t really translate his speed into stolen bases at the MLB level, which, if you’re hitting at the top of the order ahead of players like Ruth and Gehrig, would make sense. Miller Huggins essentially instructed him to get on base and let the big guys hit him in, and I can’t say as I blame him.
By the way, if I don’t control for position (which also means eliminating dWAR and recalculating penalties and scores) and I only take players from 1901 or later, the top 3 comps for Bell are Ross Youngs, Kiki Cuyler, and Paul Molitor.
One more note…..as I did with Leonard, I went ahead and ran Bell using the MLE’s on him, and his #1 comp was Edd Roush (908), who played roughly 10 years prior to Bell and who is also the #3 comp on Bell’s 1901-or-later list displayed above. So….what do you think of Cool Papa Bell and Edd Roush as comparable? Maybe….
Anyway, in Bell’s case, I’m not sure which list yields the better "similarity". Is it the group dominated by the pre-1900 players? Or the more modern one? I’m not sure.
I will say that some of the truly elite Negro League players (thinking in particular of Josh Gibson and Oscar Charleston) present a challenge because of the unique, extremely high level of their stats, but that can also be true of traditional career-based Similarity Scores. For example, the "most similar" traditional Similarity Score comp for Pete Rose is Paul Molitor at a measly 678, and the closest to Cy Young is Walter Johnson at 703. Sometimes, there are no great comps.
Gibson’s top Seasonal Notation comp is Mike Piazza, but the Similarity Score is a ridiculously low 418. Then again, Gibson’s top traditional Similarity Score comp is another Negro League star catcher (Biz Mackey), and that score is under 700. Gibson simply doesn’t have anyone who compares very closely unless his stats undergo a severe adjustment. His stats per 162 are simply off the charts.
Charleston’s top Seasonal Notation comp based on 1,000 or more games is Ty Cobb, but it’s also a pretty low score (732). Charleston’s top 2 comps if we lower the threshold to 500 or more games are 2 great Negro League center fielders, Turkey Stearnes (Score of 850) and Cristóbal Torriente (747).
So, I think there are certainly some things we can glean from putting Negro League stars through this mechanism to see how players compare at face value of the stats on a per 162 basis, but there might have to be some additional work done on applying some adjustments to make the comparisons even more meaningful.
A few other quick ones for a few of the big-name Negro League stars, focusing on the #1 comps only, and to be honest, I’m only going to give their "face value" #1 comps, because I think the #1 "MLE" comps don’t do them justice.
The #1 comp for Biz Mackey is Bill Dickey (890)
The #1 comp for Mule Suttles is Hank Greenberg (864)
The #1 comp for Judy Johnson is Pie Traynor (910)
The #1 comp for Ray Dandridge is also Pie Traynor (960)
The #1 comp for Cristóbal Torriente is Tris Speaker (881)
The #1 comp for John Henry Lloyd is Arky Vaughan (802) (note: I made Lloyd’s primary position shortstop even though his official data classifies him with more games at second base).
The #1 comp for Turkey Stearnes is Joe DiMaggio (795)
The #1 comp for Martín Dihigo is Alex Rodriguez (917)
(Dihigo is listed in the database as a SS. If I ignore position and dWAR, other top comps include Ken Williams, Larry Walker, and Earl Averill).
The #1 comp for Monte Irvin is Carl Yastrzemski (909)
(note that Irvin’s stats combine both his time in the Negro Leagues as well as his years in the National League)
How about an active player? Again, the big caveat here is that an active player’s rate stats and per-162 game figures won’t necessarily hold up once his complete career is in the books, but it’s still fun to compare.
Let’s try Freddie Freeman:
Name | HOF? | Score | HR-SN | R-SN | RBI-SN | SB-SN | BB-SN | WAR-SN | dWAR-SN | OPS+ | BA | OBP | Slug | Games
Freddie Freeman | | 1,000 | 28 | 100 | 97 | 5 | 80 | 4.5 | -0.83 | 138 | .295 | .384 | .509 | 1,565
Will Clark | | 966 | 23 | 97 | 99 | 5 | 77 | 4.6 | -0.83 | 137 | .303 | .384 | .497 | 1,976
Rafael Palmeiro | | 937 | 33 | 95 | 105 | 6 | 77 | 4.1 | -0.61 | 132 | .288 | .371 | .515 | 2,831
Fred McGriff | | 925 | 32 | 89 | 102 | 5 | 86 | 3.5 | -1.14 | 134 | .284 | .377 | .509 | 2,460
Dolph Camilli | | 918 | 26 | 102 | 103 | 7 | 103 | 4.7 | -0.54 | 136 | .277 | .388 | .492 | 1,490
Hal Trosky (10) | | 908 | 27 | 100 | 122 | 3 | 66 | 3.6 | -0.96 | 130 | .302 | .371 | .522 | 1,347
Norm Cash | | 908 | 29 | 81 | 86 | 3 | 81 | 4.0 | -0.71 | 139 | .271 | .374 | .488 | 2,089
Kent Hrbek | | 906 | 27 | 84 | 101 | 3 | 78 | 3.6 | -0.71 | 128 | .282 | .367 | .481 | 1,747
Miguel Cabrera | | 904 | 31 | 94 | 113 | 2 | 75 | 4.3 | -1.22 | 145 | .310 | .387 | .532 | 2,587
Eddie Murray | Y | 901 | 27 | 87 | 103 | 6 | 71 | 3.7 | -0.62 | 129 | .287 | .359 | .476 | 3,026
Mo Vaughn (8) | | 900 | 35 | 92 | 114 | 3 | 78 | 2.9 | -1.33 | 132 | .293 | .383 | .523 | 1,512
Earlier, I mentioned that Don Mattingly/Ripper Collins had an extremely high Similarity Score of 965, but Freeman and Will Clark nose them out by 1 point, although the similarity will probably start to deteriorate some over the balance of Freeman’s career. Freeman’s got a few more home runs, but every other category is really close.
By the way, does Freeman seem on track to the Hall of Fame? I’m not sure. He’s making good progress, picking up markers here and there with an MVP and a World Series ring, but his MVP was in an abbreviated season, and I kind of get the sense that he may not be realizing enough "sizzle", for lack of a better word. He’s tracking pretty well to the stat line of Eddie Murray through age 31, although Murray’s WAR through the same age was about 10 higher than Freeman’s (Murray didn’t have an MVP, but he did finish 2nd twice). Murray was ahead, but not by all that much, and then Murray tacked on about 200 more home runs and about 1,400 more hits from that point forward to finish over 500 homers and 3,000 hits. If Freeman replicates that kind of bulk from age 32 on, he’ll be in good shape, but of course that remains to be seen. I think he’s making good progress, but he’s not at "lock" status yet.
How about one more active player? Here’s Jose Altuve:
Name | HOF? | Score | HR-SN | R-SN | RBI-SN | SB-SN | BB-SN | WAR-SN | dWAR-SN | OPS+ | BA | OBP | Slug | Games
Jose Altuve | | 1,000 | 18 | 100 | 72 | 29 | 50 | 4.7 | 0.06 | 125 | .308 | .360 | .462 | 1,437
Roberto Alomar | Y | 912 | 14 | 103 | 77 | 32 | 70 | 4.6 | 0.22 | 116 | .300 | .371 | .443 | 2,379
Larry Doyle | | 888 | 7 | 88 | 73 | 27 | 57 | 4.1 | -0.20 | 125 | .290 | .357 | .408 | 1,766
Craig Biggio | Y | 877 | 17 | 105 | 67 | 24 | 66 | 3.7 | -0.16 | 112 | .281 | .363 | .433 | 2,850
Ryne Sandberg | Y | 875 | 21 | 99 | 79 | 26 | 57 | 5.1 | 1.01 | 114 | .285 | .344 | .452 | 2,164
Rod Carew | Y | 847 | 6 | 93 | 67 | 23 | 67 | 5.3 | -0.11 | 131 | .328 | .393 | .429 | 2,469
George Grantham | | 847 | 12 | 102 | 80 | 15 | 80 | 3.7 | -0.24 | 122 | .302 | .392 | .461 | 1,444
Frankie Frisch | Y | 847 | 7 | 107 | 87 | 29 | 51 | 5.0 | 1.51 | 110 | .316 | .369 | .432 | 2,311
Ray Durham | | 844 | 16 | 102 | 72 | 22 | 67 | 2.8 | -0.42 | 104 | .277 | .352 | .436 | 1,975
Hardy Richardson | | 843 | 9 | 137 | 101 | 25 | 46 | 5.0 | 0.36 | 131 | .299 | .344 | .437 | 1,334
Julio Franco
|
|
842
|
11
|
|