
Seasonal Notation Similarity Scores

February 26, 2022
Introduction
 
·         This is my second attempt at "Bill James Fusion", which, much like fusion cooking, is when I take 2 different Bill James concepts and combine them into something a little different (and hopefully delicious, although your taste experience may vary). 

My initial fusion attempt was combining the Bill James creations often referred to as "Hall of Fame metrics" (Hall of Fame Monitor, Black Ink, Gray Ink, and Hall of Fame Standards) with Similarity Scores. This time, I’m combining traditional Similarity Scores with what Bill used to refer to as "Seasonal Notation", which was simply a player’s stats expressed in a per-162 game context. OK, maybe the concept of expressing statistics on a per-162 game basis isn’t originally Bill’s, but I do believe he came up with the term "Seasonal Notation", so that’s good enough for me.

·         This is not a new concept. I do remember seeing Similarity Scores per 162 games as a feature on the now-defunct "Baseball Gauge" (or "Seamheads") web site, although I’m using a different set of categories and penalties in coming up with the scores. Also, I believe they only included it as part of their Negro Leagues Database section on the site, although I’m not positive I’m remembering that 100% correctly, and it’s too late for me to verify that.

·         This was my initial attempt at this approach, so I think there’s a lot of room for potential improvement. I suspect many of you would have picked different categories or come up with different penalties, and that’s certainly understandable. This is just my attempt at coming up with a scoring approach and seeing what kind of results it generated.
 
·         Although traditional Similarity Scores are often referenced in the context of Hall of Fame discussions and comparisons, I’m not really pushing for the same thing here. In my opinion, Hall of Fame candidates have a lot of potential areas to consider – their total careers, their peaks, the impact of individual seasons, contributions to successful teams, awards and honors, milestones, records, and so on. 

Seasonal Notation Similarity Scores, by themselves, give some sense of the level and quality of a player’s performance, but not how prolific they were. And, many of my examples are for players who had short careers, and those short careers often "benefit" from being expressed in a per-162 game context because the player in question did not experience an extended decline phase. 

For example, Pete Rose’s Seasonal Notation and rate stats (batting average, OBP, etc.) suffer, in part, because he kept on playing and playing and playing. Had he stopped playing a few years earlier, his rate stats and his Seasonal Notation numbers would have looked a lot better, but then he also wouldn’t have enjoyed the "bulk" totals he currently possesses. It’s a double-edged sword. And, the Hall of Fame, I believe, tends to favor those with longer careers, or at least those with more impressive career totals.
 
Background
 
I’ve always loved the concept of Similarity Scores, because the topic is close to my heart. In my everyday job, I am involved in demand planning and forecasting, and the concept of similarity is central when we try to plan and forecast the items and product categories that we sell. Is a new product similar to another one that already exists? How similar? In what way might it be different? Is it in a similar product category but with some key product feature differences? Are the items being compared different brands? Are they priced differently? Are they promoted differently? What kind of "sales curve" do they follow? Do they tend to have stable sales, or do they fluctuate wildly by time of year? What are the implications of the similarities, and what are the implications of the differences?
 
Now, I will say that I suspect traditional Similarity Scores may not be leaned upon as heavily as they once were. Bill introduced them roughly 40 years ago, and people use them for all kinds of comparisons, including (but not limited to) Hall of Fame discussions. I think they were a big step forward in how we compare players (and, boy, do we like to compare players!). But there are a lot of caveats in using them, as everyone (including Bill) acknowledges. Similarity Scores use basic career stat categories, ones that are not adjusted for time or place, so that a home run is a home run regardless of when or where it was hit, and a .300 average is treated the same regardless of whether it was generated in the 1930’s or the 1960’s. Also, the categories for hitters are strictly offense-based, although there is a positional adjustment.
 
To level set, below is the explanation of Similarity Scores from baseball-reference.com (there’s one for batters and one for pitchers, this is just the one for hitters):
 
Similarity scores are not our concept. Bill James introduced them in the mid-1980s, and we lifted his methodology from his book The Politics of Glory (p. 86-106). To compare one player to another, start at 1000 points and then subtract points based on the statistical differences of each player.
 
Batters
One point for each difference of 20 games played.
One point for each difference of 75 at bats.
One point for each difference of 10 runs scored.
One point for each difference of 15 hits.
One point for each difference of 5 doubles.
One point for each difference of 4 triples.
One point for each difference of 2 home runs.
One point for each difference of 10 RBI.
One point for each difference of 25 walks.
One point for each difference of 150 strikeouts.
One point for each difference of 20 stolen bases.
One point for each difference of .001 in batting average.
One point for each difference of .002 in slugging percentage.
 
The key here is that traditional Similarity Scores use a player’s career statistics. What I want to take a look at is comparing players on a "per opportunity" basis, which in my case is per 162 games. I could have used plate appearances, but I decided to put everything into a context of 162 games. And I think the most interesting results are for players who had abbreviated careers.
 
A couple of quick examples, with a quick sidebar:
 
For some reason, Al Rosen is a fascinating player to me, I suppose because he packed a lot into a very short career. A few bullet points:
 
·         Rosen only played 10 seasons, and the first 3 of those were no more than brief cups of coffee as he was stuck behind Cleveland’s All Star third baseman, Ken Keltner, so he really only had 7 seasons, and really only 5 good ones.

·         When Rosen finally did get an opportunity, he made the most of it. In 1950, he broke the AL record for home runs by a rookie with 37, a mark that stood until Mark McGwire hit 49 in 1987.

·         In 1953, Rosen had what may be the best season any third baseman has ever had when he hit 43 home runs, drove in 145 runs, scored 115 runs, slugged .613, had an OPS+ of 180, and had 367 total bases, all of which were league-leading figures. He also hit .336, just missing the batting title (and a triple crown) by a single point to Mickey Vernon’s .337. In addition, he realized a rWAR of 10.1, which is still the only time a third baseman has achieved a WAR of 10.0 or higher. Rosen was named the unanimous MVP.

·         In 1954, Rosen had one of the greatest individual performances in All Star history when he went 3 for 4 with 2 home runs, 5 RBI, and a walk. The 2 home runs and the 5 RBI are tied for the single-game highs in All Star game history.
 
Rosen also had what I think most people would agree was a generally successful post-playing career as an executive for the Yankees, Astros, and Giants.
 
Anyway, here’s the stat line for Al Rosen and his top 5 comps by traditional Similarity Scores:
 
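| Name | Score | G | AB | R | H | 2B | 3B | HR | RBI | SB | BB | SO | BA | SLG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Al Rosen | 1,000 | 1,044 | 3,725 | 603 | 1,063 | 165 | 20 | 192 | 717 | 39 | 587 | 385 | .285 | .495 |
| Bob Horner | 934 | 1,020 | 3,777 | 560 | 1,047 | 169 | 8 | 218 | 685 | 14 | 369 | 512 | .277 | .499 |
| Anthony Rendon | 912 | 1,026 | 3,830 | 624 | 1,100 | 269 | 16 | 151 | 611 | 45 | 476 | 682 | .287 | .484 |
| Josh Hamilton | 910 | 1,027 | 3,909 | 609 | 1,134 | 234 | 24 | 200 | 701 | 50 | 352 | 938 | .290 | .516 |
| Jim Ray Hart | 906 | 1,125 | 3,783 | 518 | 1,052 | 148 | 29 | 170 | 578 | 17 | 380 | 573 | .278 | .467 |
| Charlie Keller | 904 | 1,170 | 3,790 | 725 | 1,085 | 166 | 72 | 189 | 760 | 45 | 784 | 499 | .286 | .518 |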
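
As a rough illustration of the batter rules quoted earlier, here is a minimal Python sketch that applies them to the Rosen and Horner lines from the table above. The names `DIVISORS` and `traditional_similarity` are made up for this example. Two assumptions are baked in: differences are pro-rated rather than truncated to whole points, and the positional adjustment is left out entirely, so this is not expected to reproduce the published scores.

```python
# Size of the difference that costs one point in each category,
# taken from the batter rules quoted earlier in the article.
DIVISORS = {
    "G": 20, "AB": 75, "R": 10, "H": 15, "2B": 5, "3B": 4, "HR": 2,
    "RBI": 10, "BB": 25, "SO": 150, "SB": 20, "BA": 0.001, "SLG": 0.002,
}


def traditional_similarity(a, b):
    """Start at 1000 and subtract points for each category difference.

    Assumptions (not confirmed by the article): differences are pro-rated
    instead of truncated, and no positional adjustment is applied.
    """
    penalty = sum(abs(a[cat] - b[cat]) / div for cat, div in DIVISORS.items())
    return 1000 - penalty


# Career lines copied from the table above.
rosen = {"G": 1044, "AB": 3725, "R": 603, "H": 1063, "2B": 165, "3B": 20,
         "HR": 192, "RBI": 717, "SB": 39, "BB": 587, "SO": 385,
         "BA": 0.285, "SLG": 0.495}
horner = {"G": 1020, "AB": 3777, "R": 560, "H": 1047, "2B": 169, "3B": 8,
          "HR": 218, "RBI": 685, "SB": 14, "BB": 369, "SO": 512,
          "BA": 0.277, "SLG": 0.499}

score = traditional_similarity(rosen, horner)
print(round(score, 1))
```

Under these assumptions the result lands near 952 rather than the 934 shown in the table, so the published figure presumably reflects something this sketch leaves out, most likely the positional adjustment.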
 
Rosen had a short career (he spent a few years at the start of his career behind Ken Keltner, and then he retired early due to back issues), and so naturally the players considered most similar to him were players with similar career lengths who lined up close to his career stats.  
 
But these aren’t the type of players who Rosen reminds me of. Well, Rendon feels like a decent comp, but he’s also an active player whose stats are in flux. I think Rosen was quite a bit better overall than Horner and Hart. Hamilton and Keller were good players (and I think Keller was probably a better overall hitter than Rosen), but they were outfielders. So, this feels a little unsatisfying to me in terms of capturing what kind of player Rosen was.
 
Another example:
 
Here’s the stat line for Dodger Hall of Famer Roy Campanella and his top 5 comps by traditional Similarity Scores:
 
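| Name | Score | G | PA | AB | R | H | 2B | 3B | HR | RBI | SB | BB | SO | BA | SLG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Roy Campanella | 1,000 | 1,430 | 5,648 | 4,951 | 771 | 1,401 | 226 | 30 | 260 | 1,017 | 34 | 605 | 501 | .283 | .498 |
| Javy Lopez | 913 | 1,503 | 5,793 | 5,319 | 674 | 1,527 | 267 | 19 | 260 | 864 | 8 | 357 | 969 | .287 | .491 |
| Brian McCann | 866 | 1,755 | 6,850 | 6,067 | 742 | 1,590 | 294 | 5 | 282 | 1,018 | 25 | 640 | 1,054 | .262 | .452 |
| Walker Cooper | 862 | 1,473 | 5,082 | 4,702 | 573 | 1,341 | 240 | 40 | 173 | 812 | 18 | 309 | 357 | .285 | .464 |
| Troy Tulowitzki | 857 | 1,291 | 5,415 | 4,804 | 762 | 1,391 | 264 | 24 | 225 | 780 | 57 | 511 | 900 | .290 | .495 |
| Jason Varitek | 829 | 1,546 | 5,839 | 5,099 | 664 | 1,307 | 306 | 14 | 193 | 757 | 25 | 614 | 1,216 | .256 | .435 |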
 
So, these players do have career hitting stats that bear some similarity to Campanella (although Lopez is the only one with a score over 900), but part of that is that Campanella had 2 major influences on his career stats – his early years and stats are severely understated due to his time spent in the Negro Leagues (8 years but with only 214 games that have been captured), and then his paralyzing injury before the 1958 season that eliminated whatever time he may have had remaining. As a result, Campanella only had the equivalent of about 9 full seasons worth of games. 
 
So, while these are fine players, they are not the players that Campanella reminds me of.
 
Approach
 
In coming up with the scheme for Seasonal Notation Similarity Scores, I decided to keep some of the categories from traditional Similarity Scores, but to eliminate others. 
 
I did away with total games played since everything is being expressed as a per-162 game context, so it’s totally unnecessary. I also did away with at bats, as I felt like it wasn’t real valuable in the per-162 game context. 
 
I also eliminated hits, doubles, triples, and strikeouts, as I didn’t consider them essential. I could have kept strikeouts, but it’s been in such flux over time that I felt like I’d have to adjust or index everyone’s figures, and I was trying to keep this version pretty simple, so I just decided to eliminate them at this point. 
 
So, from the original Similarity Score methodology, I’m keeping 7 of the original categories:
 
·         Home Runs
·         Runs
·         RBI
·         Walks
·         Stolen Bases
·         Batting Average
·         Slugging Percentage
 
The first 5 are then adjusted to "per 162 games", with batting average and slugging percentage staying as is.
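
To make the "per 162 games" adjustment concrete, here is a small Python sketch using Al Rosen's career totals from the traditional table earlier in the article. The function name `per_162` is just a label chosen for this example, and rounding to whole numbers is an assumption about how the displayed figures were produced.

```python
def per_162(total, games, season_length=162):
    """Scale a career counting stat to a per-162-game rate."""
    return total / games * season_length


# Al Rosen's career totals, copied from the traditional comp table.
rosen_games = 1044
rosen_totals = {"HR": 192, "R": 603, "RBI": 717, "SB": 39, "BB": 587}

# Rounded to whole numbers for display (an assumption about presentation).
rosen_sn = {cat: round(per_162(total, rosen_games))
            for cat, total in rosen_totals.items()}
print(rosen_sn)  # {'HR': 30, 'R': 94, 'RBI': 111, 'SB': 6, 'BB': 91}
```

Those values line up with the Seasonal Notation figures shown for Rosen in the Examples section.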
 
What am I adding? 
 
I thought OBP should be included (I was kind of surprised that it wasn’t already, I had always assumed it was), so I added it. That brings us to 8.
 
I also thought some more current metrics might be useful, ones that adjust for context, so I added
 
·         WAR (baseball reference version, per 162 games)
·         dWAR (per 162 games)
·         OPS+
 
That gives me 11 categories rather than the original 13. 10 would have been a more satisfying number, but I decided not to let that bother me.
 
Now, I know that dWAR (defensive WAR) and WAR overlap some (WAR essentially is total player value covering hitting, baserunning, and defense, and both dWAR and WAR incorporate a positional adjustment). Also, I’m sure not everyone is sold on dWAR as a measure, but ultimately I decided to keep both of them. WAR is a good approximation of overall value, and dWAR is at least something we can use to try to quantify defensive value, so I felt like they both brought something to the table, but I didn’t go all the way to bring in oWAR (offensive WAR) as a separate metric.
 
One of the other reasons I’m including dWAR is that I decided to make this Similarity Score totally about comparing players at the same primary position. That is, there’s no position adjustment I’m making in the score calculation as is done in traditional Similarity Scores – you’re either the same position or you’re not. I did come up with a "switch" in my spreadsheet that allows Similarity Scores to be generated ignoring the position mandate, but in that option I get rid of dWAR. Mostly, I’m going to focus on players at the same position, because I think that is a large part of what I think of as "similarity". I’m sure not everyone would agree with this, and I’m aware that, in most cases, a player’s "primary" position is not the only one they played at, so dWAR may not be a perfect metric to leverage, but it’s the approach I took.
 
The next step was establishing penalties for differences in each category. Without going into too much detail, I played around with the penalties until I got results that I was comfortable with, based on the range of values in each category, the scale that each category uses, and the rollup of penalty points applied.
 
The table below summarizes where I landed, keeping in mind that most of the penalty point figures are a lot different than traditional Similarity Scores because we’re dealing with per-162 game figures, so the data we’re comparing is on a much smaller scale (with smaller differences) than career totals, and the relative size of the penalties for each unit difference had to reflect a different magnitude than in traditional Similarity Scores:
 
| Category | Penalty for Difference |
|---|---|
| Home Runs per 162 games | 2 points for each HR per 162 games |
| Runs per 162 games | 1 point for each run per 162 games |
| RBI per 162 games | 1 point for each RBI per 162 games |
| Stolen Bases per 162 games | 3 points for each stolen base per 162 games |
| Walks per 162 games | 1 point for each walk per 162 games |
| Batting Average | 1 point for each .001 difference |
| OBP | .75 points for each .001 difference |
| Slugging Pct. | .5 points for each .001 difference |
| WAR per 162 games | 10 points for each 1.0 WAR per 162 games |
| dWAR per 162 games | 4 points for each 0.1 dWAR difference per 162 games |
| OPS+ | 1 point for each point difference |
 
Again, there’s nothing magical about these penalty points – I just played around with them until I got what I thought were reasonable results. I’m sure they could be improved upon.
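
The penalty table can be turned into a scoring function along the following lines. This is a sketch, not the spreadsheet used for the article: the names `PENALTIES` and `sn_similarity` are invented here, the score is assumed to start at 1,000 as in traditional Similarity Scores, and penalties are assumed to be pro-rated for partial units. The two stat lines are the rounded figures for Roy Campanella and Yogi Berra from the first table in the Examples section.

```python
# Penalty points per one full unit of difference in each category.
# Rate stats are stored as decimals, so 1 point per .001 becomes 1000 per 1.000,
# and 4 points per 0.1 dWAR becomes 40 per 1.0 dWAR.
PENALTIES = {
    "HR": 2, "R": 1, "RBI": 1, "SB": 3, "BB": 1,
    "BA": 1000, "OBP": 750, "SLG": 500,
    "WAR": 10, "dWAR": 40, "OPS+": 1,
}


def sn_similarity(a, b):
    """Start at 1000 and subtract the weighted category differences."""
    penalty = sum(weight * abs(a[cat] - b[cat])
                  for cat, weight in PENALTIES.items())
    return 1000 - penalty


# Rounded Seasonal Notation lines as displayed in the Campanella table.
campanella = {"HR": 29, "R": 87, "RBI": 115, "SB": 4, "BB": 69, "WAR": 4.7,
              "dWAR": 1.00, "OPS+": 126, "BA": 0.283, "OBP": 0.363, "SLG": 0.498}
berra = {"HR": 27, "R": 90, "RBI": 109, "SB": 2, "BB": 54, "WAR": 4.6,
         "dWAR": 0.70, "OPS+": 125, "BA": 0.285, "OBP": 0.348, "SLG": 0.482}

score = sn_similarity(campanella, berra)
print(round(score, 2))
```

With the rounded inputs this comes out at about 930.75, a point or so below the 932 shown in the table. A plausible explanation (though only a guess) is that the published score was computed from unrounded per-162 figures.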
 
Examples
 
OK. Hopefully that’s enough of a setup. Let’s put it through its paces. 
 
I find that most of the "interesting" examples tend to be players who had abbreviated careers of one kind or another, as those are the ones who tend to benefit by looking at stats expressed in Seasonal Notation. Of course, often it’s true that those players get the benefit of not having what I would call an "elongated" decline phase which can affect a player’s rate stats. I fully acknowledge that effect.
 
A few notes:
 
·         In each table, I’m going to put each category stat included in the calculation. 

·         The lists will show the top 10 comps in descending order of the score (the player being compared to is listed first, then the #1 comp, then the #2 comp, and so on).

·          "SN" is shorthand for "Seasonal Notation".  

·         Unless otherwise noted, I’m only including comparison players who have at least 1,000 career games played and are classified as playing the same "primary" position.

·         I’m also going to include career games as an information column just to put each player’s total career length in perspective, although obviously total career games are not part of the comparison. But, on many of these examples, I’m using players who had relatively short careers, so this is just a reminder of that and to keep the comparisons in perspective.

·         Finally, if someone is among the player’s current top 10 traditional Similarity Score comps, I’ll put that rank in parentheses by the player’s name, so we get a sense of which players can be considered as similar regardless of whether we’re looking at their careers or their seasonal notation.
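
The selection rules in these notes, together with the position "switch" described in the Approach section, could be sketched roughly as follows. Everything here is illustrative: the function names, the `pos` labels ("C", "3B"), and the tiny four-player pool are assumptions made for the example, with stat lines copied from the tables in this article. When the same-position requirement is turned off, dWAR is dropped from the score, as described earlier.

```python
# Penalty points per one full unit of difference (rate stats as decimals).
PENALTIES = {
    "HR": 2, "R": 1, "RBI": 1, "SB": 3, "BB": 1, "WAR": 10, "dWAR": 40,
    "OPS+": 1, "BA": 1000, "OBP": 750, "SLG": 500,
}


def score(a, b, use_dwar=True):
    """1000 minus weighted differences; dWAR can be left out."""
    cats = [c for c in PENALTIES if use_dwar or c != "dWAR"]
    return 1000 - sum(PENALTIES[c] * abs(a[c] - b[c]) for c in cats)


def top_comps(target, pool, same_position_only=True, min_games=1000, n=10):
    """Rank eligible players in the pool against the target, best first."""
    ranked = []
    for p in pool:
        if p["name"] == target["name"] or p["G"] < min_games:
            continue
        if same_position_only and p["pos"] != target["pos"]:
            continue
        # With the position requirement switched off, dWAR is ignored.
        ranked.append((p["name"], score(target, p, use_dwar=same_position_only)))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


def player(name, pos, g, hr, r, rbi, sb, bb, war, dwar, ops, ba, obp, slg):
    """Bundle one Seasonal Notation stat line into a dict."""
    return {"name": name, "pos": pos, "G": g, "HR": hr, "R": r, "RBI": rbi,
            "SB": sb, "BB": bb, "WAR": war, "dWAR": dwar, "OPS+": ops,
            "BA": ba, "OBP": obp, "SLG": slg}


campanella = player("Roy Campanella", "C", 1430, 29, 87, 115, 4, 69,
                    4.7, 1.00, 126, .283, .363, .498)
pool = [
    player("Yogi Berra", "C", 2120, 27, 90, 109, 2, 54, 4.6, 0.70, 125, .285, .348, .482),
    player("Johnny Bench", "C", 2158, 29, 82, 103, 5, 67, 5.6, 1.48, 126, .267, .342, .476),
    player("Javy Lopez", "C", 1503, 28, 73, 93, 1, 38, 3.2, 0.61, 112, .287, .337, .491),
    player("Al Rosen", "3B", 1044, 30, 94, 111, 6, 91, 5.0, 0.06, 137, .285, .384, .495),
]

for name, s in top_comps(campanella, pool):
    print(f"{name:15s} {s:7.2f}")
```

In the default mode Rosen is filtered out because he is not a catcher; calling `top_comps(campanella, pool, same_position_only=False)` lets him back in and scores everyone without dWAR.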
 
Let’s start by circling back to Roy Campanella:
 
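| Name | HOF? | Score | HR-SN | R-SN | RBI-SN | SB-SN | BB-SN | WAR-SN | dWAR-SN | OPS+ | BA | OBP | Slug | Games |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Roy Campanella |  | 1,000 | 29 | 87 | 115 | 4 | 69 | 4.7 | 1.00 | 126 | .283 | .363 | .498 | 1,430 |
| Yogi Berra | Y | 932 | 27 | 90 | 109 | 2 | 54 | 4.6 | 0.70 | 125 | .285 | .348 | .482 | 2,120 |
| Johnny Bench | Y | 906 | 29 | 82 | 103 | 5 | 67 | 5.6 | 1.48 | 126 | .267 | .342 | .476 | 2,158 |
| Bill Dickey | Y | 902 | 18 | 84 | 109 | 3 | 61 | 5.1 | 0.92 | 127 | .313 | .382 | .486 | 1,789 |
| Gabby Hartnett (9) | Y | 898 | 19 | 71 | 96 | 2 | 57 | 4.6 | 1.08 | 126 | .297 | .370 | .489 | 1,990 |
| Buster Posey |  | 872 | 19 | 78 | 86 | 3 | 64 | 5.3 | 1.16 | 129 | .302 | .372 | .460 | 1,371 |
| Jorge Posada (6) |  | 865 | 24 | 80 | 94 | 2 | 83 | 3.8 | 0.23 | 121 | .273 | .374 | .474 | 1,829 |
| Carlton Fisk | Y | 862 | 24 | 83 | 86 | 8 | 55 | 4.4 | 1.10 | 117 | .269 | .341 | .457 | 2,499 |
| Mike Piazza | Y | 858 | 36 | 89 | 113 | 1 | 64 | 5.0 | 0.13 | 143 | .308 | .377 | .545 | 1,912 |
| Javy Lopez (1) |  | 850 | 28 | 73 | 93 | 1 | 38 | 3.2 | 0.61 | 112 | .287 | .337 | .491 | 1,503 |
| Brian McCann (2) |  | 831 | 26 | 68 | 94 | 2 | 59 | 3.0 | 0.73 | 110 | .262 | .337 | .452 | 1,755 |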
 
So, we can see that 4 players who were on Campanella’s top 10 traditional comp list also made his seasonal notation top 10. Campanella’s top 2 traditional comps (Lopez and McCann) are still on the list, but they’re much further down, while Hartnett is a little higher up, and Posada’s about the same. 
 
The big difference now is that Campanella’s top 4 comps are all Hall of Famers, and his #1 comp is his contemporary and fellow 3-time 1950’s MVP, Yogi Berra, which I have to say I’m pretty happy with. And, as you can see, they’re pretty comparable among most categories, with Campanella having a little better OBP (and higher walks) and Slugging Percentage, and a little higher dWAR, with Yogi (of course) having the much higher career games figure. But on a per-162 game performance basis, they’re pretty close.
 
I don’t know about you, but this is a very satisfying list to me. Because of many of the reasons outlined earlier, Campanella often suffers when compared to other great catchers. Campanella is 17th in JAWS, for example. Now, that’s not a complaint... everyone recognizes why he ranks so low on something like that, and we make proper adjustments. His total of officially captured career games is only about 1,400, and that’s a relatively low total. But, again, we know why that is.
 
When I think of Campanella, I think of him as a top-10 all-time catcher, possibly even top 5 depending on your perspective of what’s important and how to make the necessary adjustments. But I like the fact that his best comps are players like Berra, Bench, Dickey, and Hartnett rather than Lopez, McCann, Cooper and Tulowitzki.
 
OK. How about revisiting Al Rosen?
 
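| Name | HOF? | Score | HR-SN | R-SN | RBI-SN | SB-SN | BB-SN | WAR-SN | dWAR-SN | OPS+ | BA | OBP | Slug | Games |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Al Rosen |  | 1,000 | 30 | 94 | 111 | 6 | 91 | 5.0 | 0.06 | 137 | .285 | .384 | .495 | 1,044 |
| Chipper Jones | Y | 902 | 30 | 105 | 105 | 10 | 98 | 5.5 | -0.06 | 141 | .303 | .401 | .529 | 2,499 |
| Eddie Mathews | Y | 897 | 35 | 102 | 98 | 5 | 98 | 6.5 | 0.38 | 143 | .271 | .376 | .509 | 2,391 |
| Josh Donaldson |  | 895 | 34 | 100 | 98 | 5 | 88 | 6.0 | 0.70 | 135 | .269 | .367 | .505 | 1,201 |
| David Wright |  | 895 | 25 | 97 | 99 | 20 | 78 | 5.0 | 0.03 | 133 | .296 | .376 | .491 | 1,585 |
| Anthony Rendon (2) |  | 890 | 24 | 99 | 96 | 7 | 75 | 5.1 | 0.79 | 126 | .287 | .369 | .484 | 1,026 |
| Troy Glaus |  | 888 | 34 | 94 | 100 | 6 | 90 | 4.0 | 0.32 | 119 | .254 | .358 | .489 | 1,537 |
| Hank Thompson (9) |  | 877 | 22 | 92 | 85 | 8 | 83 | 4.6 | 0.33 | 122 | .274 | .376 | .460 | 1,069 |
| George Brett | Y | 875 | 19 | 95 | 96 | 12 | 66 | 5.3 | 0.13 | 135 | .305 | .369 | .487 | 2,707 |
| Ron Santo | Y | 866 | 25 | 82 | 96 | 3 | 80 | 5.1 | 0.63 | 125 | .277 | .362 | .464 | 2,243 |
| Bob Elliott |  | 861 | 14 | 87 | 98 | 5 | 79 | 4.2 | -0.25 | 124 | .289 | .375 | .440 | 1,978 |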
 
Rosen’s previous #1 comp (Bob Horner) falls out of the top 10 (he’s down at 18 now). 2 others (Rendon and Thompson) are still in the top 10. Rosen’s #1 comp is now Chipper Jones.
 
Now, I will say this. Chipper is the #1 comp, but Chipper’s still better, and Chipper has edges in nearly every category above. Chipper is better on a seasonal basis, and he’s light years ahead on career, as his career is about two-and-a-half times as long. But, Rosen checks in with per-162 game figures of 30 HR, 94 runs, 111 RBI, 91 walks, a 137 OPS+, and a WAR per 162 of 5.0. That’s a pretty darn good ballplayer.
 
Again, as often happens with Similarity Scores, you can have players who are the most similar to you but who are better players than you. And, of course, the players right after Jones and Mathews are players like Donaldson, Wright, Rendon, and Glaus. Donaldson (like Rosen) won an MVP (as did Elliott) and had other high finishes, and both Wright and Rendon have placed high as well. It’s an interesting blend of some of the very elite at the position (Brett, Mathews, Jones, Santo) and others who were pretty good, but for a much shorter time.
 
But I feel like this list gives a better representation of the quality of player Rosen was when he was actually playing than his traditional comp list. I’m not trying to put him in the Hall of Fame - he had a very short career. But he was a very good player while he was in there.
 
Who else can we look at? How about Jackie Robinson?
 
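| Name | HOF? | Score | HR-SN | R-SN | RBI-SN | SB-SN | BB-SN | WAR-SN | dWAR-SN | OPS+ | BA | OBP | Slug | Games |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Jackie Robinson |  | 1,000 | 16 | 111 | 87 | 23 | 86 | 7.3 | 1.18 | 133 | .313 | .410 | .477 | 1,416 |
| Charlie Gehringer | Y | 881 | 13 | 124 | 100 | 13 | 83 | 5.9 | 0.75 | 125 | .320 | .404 | .480 | 2,323 |
| George Grantham (1) |  | 809 | 12 | 102 | 80 | 15 | 80 | 3.7 | -0.24 | 122 | .302 | .392 | .461 | 1,444 |
| Dustin Pedroia |  | 808 | 15 | 99 | 78 | 15 | 67 | 5.6 | 1.66 | 113 | .299 | .365 | .439 | 1,512 |
| Frankie Frisch | Y | 808 | 7 | 107 | 87 | 29 | 51 | 5.0 | 1.51 | 110 | .316 | .369 | .432 | 2,311 |
| Tony Lazzeri | Y | 806 | 17 | 92 | 111 | 14 | 81 | 4.4 | 0.48 | 121 | .292 | .380 | .467 | 1,740 |
| Eddie Collins | Y | 801 | 3 | 104 | 74 | 42 | 86 | 7.1 | 0.46 | 142 | .333 | .424 | .429 | 2,826 |
| Rod Carew | Y | 796 | 6 | 93 | 67 | 23 | 67 | 5.3 | -0.11 | 131 | .328 | .393 | .429 | 2,469 |
| Nap Lajoie | Y | 795 | 5 | 98 | 104 | 25 | 34 | 7.0 | 0.66 | 150 | .338 | .380 | .466 | 2,480 |
| Ryne Sandberg | Y | 794 | 21 | 99 | 79 | 26 | 57 | 5.1 | 1.01 | 114 | .285 | .344 | .452 | 2,164 |
| Roberto Alomar | Y | 792 | 14 | 103 | 77 | 32 | 70 | 4.6 | 0.22 | 116 | .300 | .371 | .443 | 2,379 |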
 
Robinson’s top comps in traditional Similarity Scores are George Grantham, Daniel Murphy, Freddie Lindstrom, Edgardo Alfonzo, and Denny Lyons. Lindstrom is the only Hall of Famer among Robinson’s top 10 traditional comp list (although Jose Altuve is currently sitting at #8). Grantham still makes the list, but it’s now virtually all Hall of Famers.
 
Now, much like Campanella, we understand the limits of traditional Similarity Scores when it comes to Robinson. Robinson had a very short career in MLB (10 seasons), and he didn’t debut with the Dodgers until age 28, so his career numbers understate his true value. He’s 10th in JAWS, which is impressive enough as is even on face value, but he’s even better than that. Robinson is 16th in career WAR among second basemen, but 6th in WAR7 (top 7 seasons). Robinson was a great player, arguably top 5 at the position. The brevity of his career is a big part of why his top 10 traditional comps aren’t very impressive.
 
Anyway, Seasonal Notation Similarity Scores illustrate how great Jackie was, and it yields a much more impressive list of comps. His WAR/162 of 7.3 is higher than every 2nd baseman with 1,000 or more career games except for Rogers Hornsby (9.1). And, even though his updated top 10 comp list is 80% Hall of Famers, the only one with a relatively high Similarity Score figure is Charlie Gehringer (881). They’re reasonably similar, except Robinson has more WAR/162, more stolen bases per 162, and more quantitative defensive value. Gehringer’s a great player, one of my all-time favorites, but I think Robinson was the better player, all of which plays into the greatness and uniqueness of Robinson’s career performance.
 
How about another player who had an abbreviated career? Let’s look at Don Mattingly:
 
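| Name | HOF? | Score | HR-SN | R-SN | RBI-SN | SB-SN | BB-SN | WAR-SN | dWAR-SN | OPS+ | BA | OBP | Slug | Games |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Don Mattingly |  | 1,000 | 20 | 91 | 100 | 1 | 53 | 3.8 | -0.56 | 127 | .307 | .358 | .471 | 1,785 |
| Ripper Collins |  | 965 | 20 | 92 | 98 | 3 | 53 | 3.7 | -0.49 | 126 | .296 | .360 | .492 | 1,084 |
| Adrian Gonzalez (6) |  | 922 | 27 | 84 | 101 | 1 | 66 | 3.7 | -0.30 | 129 | .287 | .358 | .485 | 1,929 |
| Eddie Murray | Y | 918 | 27 | 87 | 103 | 6 | 71 | 3.7 | -0.62 | 129 | .287 | .359 | .476 | 3,026 |
| Ted Kluszewski |  | 911 | 26 | 80 | 97 | 2 | 46 | 3.0 | -0.93 | 123 | .298 | .353 | .498 | 1,718 |
| Mike Sweeney |  | 910 | 24 | 85 | 101 | 6 | 58 | 2.8 | -0.86 | 118 | .297 | .366 | .486 | 1,454 |
| Joe Torre | Y | 908 | 18 | 73 | 87 | 2 | 57 | 4.2 | -0.02 | 129 | .297 | .365 | .452 | 2,209 |
| Tony Perez | Y | 906 | 22 | 74 | 96 | 3 | 54 | 3.2 | -0.39 | 122 | .279 | .341 | .463 | 2,777 |
| Justin Morneau |  | 905 | 26 | 81 | 103 | 1 | 60 | 2.8 | -0.69 | 120 | .281 | .348 | .481 | 1,545 |
| Cecil Cooper (1) |  | 905 | 21 | 86 | 96 | 8 | 38 | 3.1 | -0.84 | 121 | .298 | .337 | .466 | 1,896 |
| Kent Hrbek |  | 900 | 27 | 84 | 101 | 3 | 78 | 3.6 | -0.71 | 128 | .282 | .367 | .481 | 1,747 |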
 
One reason I’m including Mattingly is that his #1 comp has one of the highest Seasonal Notation Similarity Scores that I’ve come across, and that’s Ripper Collins at a whopping 965. Collins was a member of the famous St. Louis Cardinals’ "Gas House Gang" of the 1930’s, but I think he often gets overshadowed by the more memorable characters from that team, such as Dizzy and Daffy Dean, Frankie Frisch, Pepper Martin, Joe Medwick and Leo Durocher.   In the Gang’s most famous season (1934), Collins was probably the team’s most valuable position player (tied for league lead with 35 HR’s, led the league in total bases and slugging), and probably the 2nd best overall after Dean (who famously won 30 games). But, Collins ultimately had a pretty short career with only 7 seasons of 100 or more games.
 
Anyway, Collins is a very strong across-the-board match for Mattingly, with no major differences in the individual categories. All 10 of Mattingly’s top comps have scores of 900 or above.
 
2 players (Gonzalez and Cooper) carry over from Mattingly’s traditional Similarity Score comp list. 3 of the comps are Hall of Famers, but Torre is in more for his managerial success (although he was a fine player as well), while Murray and Perez both had much longer careers.
 
Charlie Keller came up early in the article, a great hitter with an abbreviated career. Let’s run him through the tool:
 
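| Name | HOF? | Score | HR-SN | R-SN | RBI-SN | SB-SN | BB-SN | WAR-SN | dWAR-SN | OPS+ | BA | OBP | Slug | Games |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Charlie Keller | - | 1,000 | 26 | 100 | 105 | 6 | 109 | 6.1 | -0.12 | 152 | .286 | .410 | .518 | 1,170 |
| Lance Berkman | - | 902 | 32 | 99 | 106 | 7 | 104 | 4.5 | -0.95 | 144 | .293 | .406 | .537 | 1,879 |
| Bob Johnson | - | 893 | 25 | 108 | 112 | 8 | 93 | 4.9 | -0.50 | 139 | .296 | .393 | .506 | 1,863 |
| Ralph Kiner | Y | 860 | 41 | 107 | 112 | 2 | 111 | 5.3 | -1.18 | 149 | .279 | .398 | .548 | 1,472 |
| Jeff Heath | - | 851 | 23 | 91 | 104 | 7 | 69 | 4.4 | -0.63 | 139 | .293 | .370 | .509 | 1,383 |
| Carl Yastrzemski | Y | 847 | 22 | 89 | 90 | 8 | 90 | 4.7 | 0.05 | 130 | .285 | .379 | .462 | 3,308 |
| Monte Irvin | Y | 843 | 22 | 89 | 108 | 8 | 73 | 5.0 | 0.24 | 134 | .304 | .388 | .489 | 1,032 |
| Ken Williams | - | 825 | 23 | 100 | 106 | 18 | 66 | 5.0 | -0.42 | 138 | .319 | .393 | .530 | 1,397 |
| Matt Holliday | - | 823 | 27 | 98 | 104 | 9 | 68 | 3.8 | -1.12 | 132 | .299 | .379 | .510 | 1,903 |
| Christian Yelich | - | 808 | 24 | 103 | 85 | 20 | 83 | 5.0 | -0.52 | 132 | .292 | .379 | .477 | 1,095 |
| Kevin Mitchell (2) | - | 808 | 31 | 83 | 101 | 4 | 65 | 3.9 | -1.07 | 142 | .284 | .360 | .520 | 1,223 |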
 
Keller’s top 3 traditional comps were Josh Hamilton, Kevin Mitchell, and Al Rosen. Mitchell is the only top 10 comp from Keller’s traditional list that survives, and he’s down at #9. I like the top 4 very much here – Keller, Berkman, Johnson, and Kiner all seem like the same mold – good combination of batting average, OBP, and pop, and generating around 100 runs/RBI/walks per 162 games played, really valuable offensive weapons. Kiner separated himself from the others because he gained notoriety from the 7 consecutive seasons he led the league in home runs, but they all seem to be cut from the same cloth.
 
How about Eric Davis, who for a season or two was about as exciting a player as I can remember watching?
 
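| Name | HOF? | Score | HR-SN | R-SN | RBI-SN | SB-SN | BB-SN | WAR-SN | dWAR-SN | OPS+ | BA | OBP | Slug | Games |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Eric Davis |  | 1,000 | 28 | 93 | 93 | 35 | 74 | 3.6 | -0.91 | 125 | .269 | .359 | .482 | 1,626 |
| Ray Lankford (10) |  | 892 | 23 | 92 | 83 | 25 | 79 | 3.6 | 0.05 | 123 | .272 | .364 | .477 | 1,701 |
| Dale Murphy |  | 881 | 30 | 89 | 94 | 12 | 73 | 3.5 | -0.51 | 121 | .265 | .346 | .469 | 2,180 |
| Andrew McCutchen |  | 876 | 25 | 97 | 86 | 18 | 85 | 4.2 | -0.70 | 131 | .280 | .373 | .476 | 1,761 |
| Carlos Beltran |  | 861 | 27 | 99 | 99 | 20 | 68 | 4.4 | 0.13 | 119 | .279 | .350 | .486 | 2,586 |
| Chili Davis |  | 859 | 23 | 82 | 91 | 9 | 79 | 2.5 | -0.94 | 121 | .274 | .360 | .451 | 2,435 |
| Ellis Burks |  | 858 | 29 | 101 | 98 | 15 | 64 | 4.0 | -0.54 | 126 | .291 | .363 | .510 | 2,000 |
| Grady Sizemore |  | 846 | 22 | 97 | 76 | 21 | 71 | 4.1 | 0.06 | 115 | .265 | .349 | .457 | 1,101 |
| Fred Lynn |  | 846 | 25 | 87 | 91 | 6 | 71 | 4.1 | -0.26 | 129 | .283 | .360 | .484 | 1,969 |
| Bobby Murcer |  | 843 | 21 | 83 | 89 | 11 | 73 | 2.7 | -1.34 | 124 | .277 | .357 | .445 | 1,908 |
| Amos Otis |  | 841 | 16 | 89 | 82 | 28 | 61 | 3.5 | -0.31 | 115 | .277 | .343 | .425 | 1,998 |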
 
Outside of Lankford, it’s a completely different set of players from Davis’ traditional Similarity Score list. Davis’ top comps on the traditional scale are Kirk Gibson, Jeromy Burnitz, Darryl Strawberry, and Raul Mondesi, all of whom are pretty good comps, but none of whom were primarily center fielders.
 
Interesting to note that none of the comps are currently Hall of Famers. It’s a lot of players who exhibit 20-20 type of potential, but none of them has been able to reach Cooperstown, although Dale Murphy and Carlos Beltran are certainly in the discussion.
 
Davis is just a little bit shy of being the only player in history who would have a per-162 stat line with both 30 HR and 30 stolen bases (Fernando Tatis Jr. and Ronald Acuna Jr. both currently have that status, but of course they are quite early in their respective careers). Bobby Bonds is the closest at 29 HR and 40 stolen bases per 162.
 
Speaking of Darryl Strawberry:
 
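| Name | HOF? | Score | HR-SN | R-SN | RBI-SN | SB-SN | BB-SN | WAR-SN | dWAR-SN | OPS+ | BA | OBP | Slug | Games |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Darryl Strawberry |  | 1,000 | 34 | 92 | 102 | 23 | 84 | 4.3 | -0.74 | 138 | .259 | .357 | .505 | 1,583 |
| Reggie Jackson | Y | 934 | 32 | 89 | 98 | 13 | 79 | 4.2 | -0.94 | 139 | .262 | .356 | .490 | 2,820 |
| Jose Canseco |  | 893 | 40 | 102 | 121 | 17 | 78 | 3.6 | -1.18 | 132 | .266 | .353 | .515 | 1,887 |
| Rocky Colavito |  | 888 | 33 | 85 | 102 | 2 | 84 | 3.9 | -0.40 | 132 | .266 | .359 | .489 | 1,841 |
| Kirk Gibson |  | 873 | 25 | 98 | 86 | 28 | 71 | 3.8 | -0.63 | 123 | .268 | .352 | .463 | 1,635 |
| Bob Allison |  | 871 | 27 | 85 | 84 | 9 | 84 | 3.6 | -0.56 | 127 | .255 | .358 | .471 | 1,541 |
| Jose Bautista (9) |  | 865 | 31 | 92 | 88 | 6 | 93 | 3.3 | -0.71 | 124 | .247 | .361 | .475 | 1,798 |
| David Justice |  | 863 | 31 | 93 | 102 | 5 | 91 | 4.1 | -0.26 | 129 | .279 | .378 | .500 | 1,610 |
| Jackie Jensen |  | 860 | 22 | 91 | 105 | 16 | 84 | 3.1 | -0.44 | 120 | .279 | .369 | .460 | 1,438 |
| Jack Clark |  | 859 | 28 | 91 | 96 | 6 | 103 | 4.3 | -1.04 | 137 | .267 | .379 | .476 | 1,994 |
| Roger Maris |  | 853 | 30 | 91 | 94 | 2 | 72 | 4.2 | -0.18 | 127 | .260 | .345 | .476 | 1,463 |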
 
Reggie Jackson is by far the best comp by this method, with pretty close figures across the board, except that Strawberry stole bases at about twice the rate. Through age 30, though, Jackson was stealing at a rate of about 20 per 162 games, and that fell way off as he aged. In any case, Strawberry and Reggie were very comparable through their 20’s, and they do show a very strong similarity on a per 162 basis, with Reggie ultimately playing nearly twice as many games.
 
Colavito’s another interesting comp with tight similarity to Strawberry across the board except for stolen bases, which represents the majority of the penalty points in the calculation of the score.
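
The stolen-base point can be checked with a per-category breakdown. The sketch below uses the penalty weights from the Approach section and the rounded Strawberry and Colavito lines from the table above; the names `PENALTIES` and `penalty_breakdown` are made up for this example, and because the inputs are rounded, the totals are approximate.

```python
# Penalty points per one full unit of difference (rate stats as decimals).
PENALTIES = {
    "HR": 2, "R": 1, "RBI": 1, "SB": 3, "BB": 1, "WAR": 10, "dWAR": 40,
    "OPS+": 1, "BA": 1000, "OBP": 750, "SLG": 500,
}


def penalty_breakdown(a, b):
    """Return the penalty points contributed by each category."""
    return {cat: weight * abs(a[cat] - b[cat])
            for cat, weight in PENALTIES.items()}


# Rounded Seasonal Notation lines from the Strawberry table.
strawberry = {"HR": 34, "R": 92, "RBI": 102, "SB": 23, "BB": 84, "WAR": 4.3,
              "dWAR": -0.74, "OPS+": 138, "BA": 0.259, "OBP": 0.357, "SLG": 0.505}
colavito = {"HR": 33, "R": 85, "RBI": 102, "SB": 2, "BB": 84, "WAR": 3.9,
            "dWAR": -0.40, "OPS+": 132, "BA": 0.266, "OBP": 0.359, "SLG": 0.489}

breakdown = penalty_breakdown(strawberry, colavito)
total = sum(breakdown.values())

# List categories from largest to smallest penalty, with share of the total.
for cat, pts in sorted(breakdown.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{cat:5s} {pts:6.1f}  {pts / total:5.1%}")
```

On these rounded inputs, stolen bases account for 63 of roughly 112 penalty points, a little over half, and the implied score of about 888 agrees with the table.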
 
How about Mo Vaughn? Vaughn had a productive but relatively brief career.
 
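| Name | HOF? | Score | HR-SN | R-SN | RBI-SN | SB-SN | BB-SN | WAR-SN | dWAR-SN | OPS+ | BA | OBP | Slug | Games |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mo Vaughn |  | 1,000 | 35 | 92 | 114 | 3 | 78 | 2.9 | -1.33 | 132 | .293 | .383 | .523 | 1,512 |
| Fred McGriff |  | 931 | 32 | 89 | 102 | 5 | 86 | 3.5 | -1.14 | 134 | .284 | .377 | .509 | 2,460 |
| Miguel Cabrera |  | 929 | 31 | 94 | 113 | 2 | 75 | 4.3 | -1.22 | 145 | .310 | .387 | .532 | 2,587 |
| David Ortiz | Y | 928 | 36 | 95 | 119 | 1 | 89 | 3.7 | -1.41 | 141 | .286 | .380 | .552 | 2,408 |
| Carlos Delgado |  | 927 | 38 | 99 | 120 | 1 | 88 | 3.5 | -1.37 | 138 | .280 | .383 | .546 | 2,035 |
| Rafael Palmeiro |  | 917 | 33 | 95 | 105 | 6 | 77 | 4.1 | -0.61 | 132 | .288 | .371 | .515 | 2,831 |
| Hal Trosky (9) |  | 914 | 27 | 100 | 122 | 3 | 66 | 3.6 | -0.96 | 130 | .302 | .371 | .522 | 1,347 |
| Travis Hafner |  | 912 | 29 | 85 | 100 | 2 | 82 | 3.4 | -1.34 | 134 | .273 | .376 | .498 | 1,183 |
| Prince Fielder (1) |  | 910 | 32 | 87 | 103 | 2 | 85 | 2.4 | -2.06 | 134 | .283 | .382 | .506 | 1,611 |
| Jason Giambi |  | 903 | 32 | 88 | 103 | 1 | 98 | 3.6 | -1.41 | 139 | .277 | .399 | .516 | 2,260 |
| José Abreu |  | 901 | 33 | 89 | 115 | 2 | 47 | 4.0 | -1.09 | 135 | .290 | .350 | .515 | 1,113 |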
 
Lots of players with Similarity Scores over 900. By traditional Similarity Scores, Vaughn’s top comps are Prince Fielder (who is still on the list above but drops to #8), Paul Goldschmidt (still active), Ted Kluszewski, Freddie Freeman (still active), and David Justice. McGriff is a pretty good fit for Vaughn, but of course McGriff’s career was about 60% longer. Cabrera also generates a pretty high score, but I do think he’s distinctly better than Vaughn, and, of course, he had a much longer career as well. Vaughn had a nice stat line, and it was a nice career while it lasted.
 
Earlier, we looked at Campanella and Robinson, both of whom had their career stats affected by the Color Line and time spent in the Negro Leagues. How about looking at players who were primarily or even exclusively Negro Leaguers?
 
Now, I’m sure some might argue that we can’t take the stats literally, and there’s certainly discussion to be had there. I know there are a lot of efforts going on in the realm of trying to translate the stats for Negro League players into Major League Equivalencies (MLE), and I was tempted to leverage the work that had been done in that area, but I think I’ll save that exercise for another time after I get more of a chance to digest them. Besides, if we start making adjustments for Negro League stats, don’t we kind of have to consider doing that for everyone? Didn’t players who played in the National League and American League during the existence of the Negro Leagues benefit stat-wise from not facing all of the best available talent?  Stats always reflect an intersection of performance, context, talent, circumstances, rules, ballparks, competition, and any number of other factors. They are never pure results. But, if you start adjusting some, I think you have to start adjusting for everyone.   I mean, does anyone think that Ty Cobb would produce a .366 career average in another era? Neither do I.
 
So, for now... what if we simply take the Negro League stats at face value, but with the understanding that there are likely many things that influence them? What do they imply, and what kind of shape and comparisons do they invite? How about if we do that for now, and I will also take it as a "to-do" to follow up with MLEs and see what kind of results those yield?
 
Here's the great Buck Leonard:
 
Name | HOF? | Score | HR-SN | R-SN | RBI-SN | SB-SN | BB-SN | WAR-SN | dWAR-SN | OPS+ | BA | OBP | Slug | Games
Buck Leonard |  | 1,000 | 26 | 147 | 152 | 9 | 109 | 7.9 | -0.39 | 181 | .345 | .450 | .589 | 587
Lou Gehrig | Y | 912 | 37 | 141 | 149 | 8 | 113 | 8.5 | -0.67 | 179 | .340 | .447 | .632 | 2,164
Jimmie Foxx | Y | 839 | 37 | 122 | 134 | 6 | 102 | 6.5 | -0.41 | 163 | .325 | .428 | .609 | 2,317
Hank Greenberg | Y | 819 | 38 | 122 | 148 | 7 | 99 | 6.4 | -0.51 | 159 | .313 | .412 | .605 | 1,394
Dan Brouthers | Y | 790 | 10 | 148 | 126 | 25 | 81 | 7.7 | -0.16 | 171 | .342 | .423 | .520 | 1,676
Jeff Bagwell | Y | 734 | 34 | 114 | 115 | 15 | 106 | 6.0 | -0.54 | 149 | .297 | .408 | .540 | 2,150
Johnny Mize | Y | 712 | 31 | 96 | 115 | 2 | 74 | 6.1 | -0.56 | 158 | .312 | .397 | .562 | 1,884
Joey Votto |  | 710 | 28 | 95 | 91 | 7 | 110 | 5.5 | -0.47 | 148 | .302 | .416 | .520 | 1,900
Todd Helton |  | 706 | 27 | 101 | 101 | 3 | 96 | 4.5 | -0.36 | 133 | .316 | .414 | .539 | 2,247
Frank Thomas | Y | 691 | 36 | 104 | 119 | 2 | 116 | 5.1 | -1.57 | 156 | .301 | .419 | .555 | 2,322
Paul Goldschmidt |  | 668 | 31 | 104 | 102 | 15 | 92 | 5.6 | -0.36 | 142 | .293 | .389 | .521 | 1,469
 
*This is based on comps with 1,000 or more games. If I lower it to 500 or more, another Negro League great (Mule Suttles) pops in at #3.
 
Now, obviously Leonard’s traditional Similarity Score comps aren’t great examples because his published career stats only capture 587 games, so he tends to be matched on career totals with players who have that same type of context. As a result, the list mostly captures other great Negro League stars like Cristobal Torriente, Bullet Rogan, Ben Taylor, Edgar Wesley, and Heavy Johnson, who were great players with a similar number of "official" career games, while the primary AL/NL players it comes up with are Dale Alexander and Lefty O’Doul, who had short careers.
 
As you can see, Leonard’s Seasonal Notation Similarity Score comps yield a pretty star-studded list, and the comp with the highest score, by far, is none other than Lou Gehrig. Poetic justice, in my book, and in Leonard’s "book" as well. Leonard’s biography, which he wrote with James A. Riley, was titled "Buck Leonard: The Black Lou Gehrig: The Hall of Famer's Story in His Own Words", and comparisons to Gehrig were common while Leonard was active. The great Monte Irvin once commented that, had Leonard been allowed to play in MLB, then they might have referred to Gehrig as the white Buck Leonard instead of the other way around.
 
Now, again, I’m sure some might be reluctant to directly compare Negro League stats to NL or AL stats, and I understand that. You can draw your own conclusions. However, by any standard, Leonard was probably the greatest first baseman in Negro League history, and Gehrig was probably the greatest in either the NL or the AL. The bold type (league category leadership) on Leonard’s Baseball Reference page definitely reaches out and grabs you almost as much as Gehrig’s does. I think Leonard is probably more comparable to Gehrig (and vice versa) than anyone else in history, and that includes Gehrig’s traditional #1 comp, Jimmie Foxx.
 
I think Leonard and Gehrig are both elite, and they deserve to be compared to each other. I suspect the consensus of most experts is that Gehrig had a little more home run power and was maybe a little better overall. Buck O’Neil commented that he thought Gehrig had more home run power but that Leonard was better defensively. In his "Baseball 100" list, Joe Posnanski has Gehrig at #14 and Leonard at #53. Isolating it to just first basemen (and if we consider Stan Musial as an outfielder rather than a first baseman, which is what I would normally do), Posnanski has Gehrig #1 (#14 overall), Albert Pujols #2 (#23 overall), Jimmie Foxx #3 (#33 overall), and Leonard #4 (#53 overall). In the New Bill James Historical Abstract (which is now about 20 years old), Bill also had Gehrig at #14 overall with Leonard at #65, but he had more first basemen in between the two. His top first basemen were, in order, Gehrig, Foxx, Mark McGwire, Mule Suttles (who could also be considered an outfielder), Jeff Bagwell, Eddie Murray, Johnny Mize, Harmon Killebrew, and then Leonard. So, he essentially had Leonard around #8 or #9 among first basemen.
 
By the way, I tried using the Major League Equivalent figures (MLEs) that I alluded to earlier on Leonard, and if I use those, his #1 Seasonal Notation comp would be Will Clark. Clark’s a great player, but I think Leonard was probably better than Clark.
 
How about Cool Papa Bell?
 
Name | HOF? | Score | HR-SN | R-SN | RBI-SN | SB-SN | BB-SN | WAR-SN | dWAR-SN | OPS+ | BA | OBP | Slug | Games
Cool Papa Bell |  | 1,000 | 8 | 155 | 80 | 39 | 72 | 4.3 | -0.51 | 126 | .325 | .395 | .446 | 1,199
Jimmy Ryan |  | 889 | 9 | 132 | 88 | 34 | 65 | 3.5 | -0.81 | 124 | .308 | .375 | .444 | 2,014
George Van Haltren |  | 884 | 6 | 134 | 83 | 47 | 71 | 3.2 | -0.92 | 122 | .316 | .386 | .418 | 1,990
Hugh Duffy | Y | 862 | 10 | 145 | 121 | 54 | 62 | 4.0 | -0.23 | 123 | .326 | .386 | .451 | 1,737
Mike Griffin |  | 859 | 4 | 151 | 77 | 51 | 87 | 4.4 | -0.09 | 123 | .296 | .388 | .407 | 1,513
Earle Combs | Y | 853 | 6 | 132 | 70 | 11 | 75 | 5.0 | -0.30 | 125 | .325 | .397 | .462 | 1,455
Pete Browning |  | 851 | 6 | 131 | 90 | 35 | 64 | 5.6 | -0.79 | 163 | .341 | .403 | .467 | 1,183
George Gore |  | 848 | 6 | 164 | 76 | 21 | 89 | 4.9 | -0.49 | 136 | .301 | .386 | .411 | 1,310
Mike Donlin |  | 844 | 8 | 103 | 84 | 33 | 48 | 4.5 | -0.86 | 144 | .333 | .386 | .468 | 1,049
Ben Chapman |  | 826 | 8 | 108 | 92 | 27 | 78 | 4.0 | 0.05 | 114 | .302 | .383 | .440 | 1,717
Edd Roush | Y | 822 | 6 | 91 | 81 | 22 | 40 | 3.8 | -0.50 | 126 | .323 | .369 | .446 | 1,967
 
I’m not real satisfied with that list, because it’s heavily dominated by pre-1900 players like Ryan, Van Haltren, Duffy, Griffin, Browning, and Gore, and a lot of that is driven by those players’ stolen base figures that got a boost from how stolen bases were defined for several years in that era (which included things like taking an extra base on a single, for example). Players from that era also have the advantage of being in a high-scoring environment, which enables them to be better "comps" to Bell’s rather striking per-162 game figure of 155 runs scored. 
 
So, this is a good case where I think some intervention would be a good idea. How about if we limit it to players whose careers were from 1901 or later?
 
Name | HOF? | Score | HR-SN | R-SN | RBI-SN | SB-SN | BB-SN | WAR-SN | dWAR-SN | OPS+ | BA | OBP | Slug | Games
Cool Papa Bell |  | 1,000 | 8 | 155 | 80 | 39 | 72 | 4.3 | -0.51 | 126 | .325 | .395 | .446 | 1,199
Earle Combs | Y | 853 | 6 | 132 | 70 | 11 | 75 | 5.0 | -0.30 | 125 | .325 | .397 | .462 | 1,455
Ben Chapman |  | 826 | 8 | 108 | 92 | 27 | 78 | 4.0 | 0.05 | 114 | .302 | .383 | .440 | 1,717
Edd Roush | Y | 822 | 6 | 91 | 81 | 22 | 40 | 3.8 | -0.50 | 126 | .323 | .369 | .446 | 1,967
Cesar Cedeno |  | 791 | 16 | 88 | 79 | 44 | 54 | 4.3 | -0.35 | 123 | .285 | .347 | .443 | 2,006
Brett Butler |  | 772 | 4 | 99 | 42 | 41 | 83 | 3.6 | -0.45 | 110 | .290 | .377 | .376 | 2,213
Johnny Damon |  | 767 | 15 | 109 | 74 | 27 | 65 | 3.7 | -0.13 | 104 | .284 | .352 | .433 | 2,490
Kenny Lofton |  | 756 | 10 | 118 | 60 | 48 | 73 | 5.3 | 1.19 | 107 | .299 | .372 | .423 | 2,103
Dom DiMaggio |  | 754 | 10 | 121 | 72 | 12 | 87 | 3.9 | 0.35 | 111 | .298 | .383 | .419 | 1,399
Lenny Dykstra |  | 753 | 10 | 102 | 51 | 36 | 81 | 5.4 | 0.90 | 120 | .285 | .375 | .419 | 1,278
Amos Otis |  | 748 | 16 | 89 | 82 | 28 | 61 | 3.5 | -0.31 | 115 | .277 | .343 | .425 | 1,998
 
That may feel a little better, but it also results in generally lower scores. Bell’s rate of 155 runs per 162 games is obviously a tough comparison point that results in a pretty hefty penalty for most of these players who can’t approach that level. 
 
Combs is a pretty good match at face value in most categories except for stolen bases, although Combs reportedly had excellent speed himself. Of course, Bell’s speed was, by most accounts, more in the "elite" realm as opposed to merely "excellent". Combs reportedly was a pretty proficient base stealer at AA Louisville, but he didn’t really translate his speed into stolen bases at the MLB level, which, if you’re hitting at the top of the order ahead of players like Ruth and Gehrig, would make sense. Miller Huggins essentially instructed him to get on base and let the big guys hit him in, and I can’t say as I blame him.
 
By the way, if I don’t control for position (which also means eliminating dWAR and recalculating penalties and scores) and I only take players from 1901 or later, the top 3 comps for Bell are Ross Youngs, Kiki Cuyler, and Paul Molitor.
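The switches described above (games threshold, era cutoff, and whether to control for position) can be pictured as filters on the comparison pool. This is only a sketch of the idea, not the actual model: the `Player` fields, the function names, and the position labels, first-season years, and Molitor's games total are illustrative assumptions rather than values taken from the article's database, and the category list simply mirrors the table columns.

```python
from dataclasses import dataclass

@dataclass
class Player:
    name: str
    position: str    # primary position (simplified assumption)
    first_year: int  # first season (illustrative value)
    games: int

def comp_pool(players, target, min_games=1000, earliest_year=None,
              match_position=True):
    """Return the players eligible to be scored against `target`."""
    pool = []
    for p in players:
        if p.name == target.name:
            continue  # never compare a player to himself
        if p.games < min_games:
            continue  # below the games threshold
        if earliest_year is not None and p.first_year < earliest_year:
            continue  # career started before the era cutoff
        if match_position and p.position != target.position:
            continue  # position control is switched on
        pool.append(p)
    return pool

def scoring_categories(match_position=True):
    """dWAR is only compared when the pool is restricted by position."""
    cats = ["HR", "R", "RBI", "SB", "BB", "WAR", "OPS+", "BA", "OBP", "Slug"]
    return cats + ["dWAR"] if match_position else cats

bell = Player("Cool Papa Bell", "CF", 1922, 1199)
players = [
    bell,
    Player("Hugh Duffy", "CF", 1888, 1737),
    Player("Earle Combs", "CF", 1924, 1455),
    Player("Paul Molitor", "3B", 1978, 2683),
]

all_eras = [p.name for p in comp_pool(players, bell)]
modern = [p.name for p in comp_pool(players, bell, earliest_year=1901)]
any_position = [p.name for p in comp_pool(players, bell, earliest_year=1901,
                                          match_position=False)]
print(all_eras, modern, any_position)
```

In this toy pool, the 1901 cutoff drops Duffy, and turning off the position control lets Molitor in while dWAR drops out of the category list.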
 
One more note... as with Leonard, I went ahead and ran Bell using the MLEs, and his #1 comp was Edd Roush (908), who played roughly 10 years prior to Bell and who is also the #3 comp on Bell’s 1901-or-later list above. So... what do you think of Cool Papa Bell and Edd Roush as comparable? Maybe...
 
Anyway, in Bell’s case, I’m not sure which list yields the better "similarity". Is it the group dominated by the pre-1900 players? Or the more modern one? I’m not sure.
 
I will say that some of the truly elite Negro League players (thinking in particular of Josh Gibson and Oscar Charleston) present a challenge because of the unique, high-level nature of their stats, but that can also be true of traditional career-based Similarity Scores. For example, the "most similar" traditional Similarity Score comp for Pete Rose is Paul Molitor at a measly 678, and the closest to Cy Young is Walter Johnson at 703. Sometimes, there are no great comps.
 
Gibson’s top Seasonal Notation comp is Mike Piazza, but the Similarity Score is a ridiculously low 418. Then again, Gibson’s top traditional Similarity Score comp is another Negro League star catcher (Biz Mackey), and that score is under 700. Gibson simply doesn’t have anyone who compares very closely unless his stats undergo a severe adjustment. His stats per 162 are simply off the charts.
 
Charleston’s top Seasonal Notation comp based on 1,000 or more games is Ty Cobb, but it’s also a pretty low score (732). If we lower the threshold to 500 or more games, Charleston’s top 2 comps are 2 great Negro League center fielders, Turkey Stearnes (850) and Cristóbal Torriente (747).
 
So, I think there are certainly some things we can glean from putting Negro League stars through this mechanism to see how players compare at face value of the stats on a per 162 basis, but there might have to be some additional work done on applying some adjustments to make the comparisons even more meaningful. 
 
A few other quick ones for a few of the big-name Negro League stars, focusing on the #1 comps only, and to be honest, I’m only going to give their "face value" #1 comps, because I think the #1 "MLE" comps don’t do them justice.
 
The #1 comp for Biz Mackey is Bill Dickey (890)
The #1 comp for Mule Suttles is Hank Greenberg (864)
The #1 comp for Judy Johnson is Pie Traynor (910)
The #1 comp for Ray Dandridge is also Pie Traynor (960)
The #1 comp for Cristóbal Torriente is Tris Speaker (881)
The #1 comp for John Henry Lloyd is Arky Vaughan (802) (note: I made Lloyd’s primary position shortstop even though his official data classifies him with more games at second base)
The #1 comp for Turkey Stearnes is Joe DiMaggio (795)
The #1 comp for Martín Dihigo is Alex Rodriguez (917) (note: Dihigo is listed in the database as a SS; if I ignore position and dWAR, other top comps include Ken Williams, Larry Walker, and Earl Averill)
The #1 comp for Monte Irvin is Carl Yastrzemski (909) (note: Irvin’s stats combine his time in the Negro Leagues with his years in the National League)
 
How about an active player? Again, the big caveat here is that an active player’s rate stats and per-162 game figures won’t necessarily hold up once his complete career is in the books, but it’s still fun to compare.
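As a reminder of why those figures can drift, Seasonal Notation is just a career total scaled to 162 games, so every added season changes both the numerator and the denominator. Here is a minimal sketch; the function name is my own, the 271 home runs is an example input chosen to line up with the HR-SN of 28 shown for Freeman in the table below, and the "two more seasons" line is purely hypothetical.

```python
def seasonal_notation(total, games, season_length=162):
    """Scale a career total to a per-162-game rate."""
    if games <= 0:
        raise ValueError("games must be positive")
    return total * season_length / games

# Example input: 271 HR in 1,565 games (illustrative value).
hr_sn_now = seasonal_notation(271, 1565)

# Hypothetical decline phase: 40 more HR over 324 more games.
hr_sn_later = seasonal_notation(271 + 40, 1565 + 324)

print(round(hr_sn_now), round(hr_sn_later))
```

In this made-up scenario, two below-average power seasons are enough to pull the per-162 home run figure down by about a point, which is the kind of movement the caveat above is warning about.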
 
Let’s try Freddie Freeman:
 
Name | HOF? | Score | HR-SN | R-SN | RBI-SN | SB-SN | BB-SN | WAR-SN | dWAR-SN | OPS+ | BA | OBP | Slug | Games
Freddie Freeman |  | 1,000 | 28 | 100 | 97 | 5 | 80 | 4.5 | -0.83 | 138 | .295 | .384 | .509 | 1,565
Will Clark |  | 966 | 23 | 97 | 99 | 5 | 77 | 4.6 | -0.83 | 137 | .303 | .384 | .497 | 1,976
Rafael Palmeiro |  | 937 | 33 | 95 | 105 | 6 | 77 | 4.1 | -0.61 | 132 | .288 | .371 | .515 | 2,831
Fred McGriff |  | 925 | 32 | 89 | 102 | 5 | 86 | 3.5 | -1.14 | 134 | .284 | .377 | .509 | 2,460
Dolph Camilli |  | 918 | 26 | 102 | 103 | 7 | 103 | 4.7 | -0.54 | 136 | .277 | .388 | .492 | 1,490
Hal Trosky (10) |  | 908 | 27 | 100 | 122 | 3 | 66 | 3.6 | -0.96 | 130 | .302 | .371 | .522 | 1,347
Norm Cash |  | 908 | 29 | 81 | 86 | 3 | 81 | 4.0 | -0.71 | 139 | .271 | .374 | .488 | 2,089
Kent Hrbek |  | 906 | 27 | 84 | 101 | 3 | 78 | 3.6 | -0.71 | 128 | .282 | .367 | .481 | 1,747
Miguel Cabrera |  | 904 | 31 | 94 | 113 | 2 | 75 | 4.3 | -1.22 | 145 | .310 | .387 | .532 | 2,587
Eddie Murray | Y | 901 | 27 | 87 | 103 | 6 | 71 | 3.7 | -0.62 | 129 | .287 | .359 | .476 | 3,026
Mo Vaughn (8) |  | 900 | 35 | 92 | 114 | 3 | 78 | 2.9 | -1.33 | 132 | .293 | .383 | .523 | 1,512
 
Earlier, I mentioned that Don Mattingly/Ripper Collins had an extremely high Similarity Score of 965, but Freeman and Will Clark nose them out by 1 point, although the similarity will probably start to deteriorate some over the balance of Freeman’s career. Freeman’s got a few more home runs, but every other category is really close.
 
By the way, does Freeman seem on track to the Hall of Fame? I’m not sure. He’s making good progress, picking up markers here and there with an MVP and a World Series ring, but his MVP was in an abbreviated season, and I kind of get the sense that he may not be realizing enough "sizzle", for lack of a better word. He’s tracking pretty well to the stat line of Eddie Murray through age 31, although Murray’s WAR through the same age was about 10 higher than Freeman’s (Murray didn’t have an MVP, but he did finish 2nd twice). Murray was ahead, but not by all that much, and then Murray tacked on about 200 more home runs and about 1,400 more hits from that point forward to finish over 500 homers and 3,000 hits. If Freeman replicates that kind of bulk from age 32 on, he’ll be in good shape, but of course that remains to be seen. I think he’s making good progress, but he’s not at "lock" status yet.
 
How about one more active player? Here’s Jose Altuve:
 
Name | HOF? | Score | HR-SN | R-SN | RBI-SN | SB-SN | BB-SN | WAR-SN | dWAR-SN | OPS+ | BA | OBP | Slug | Games
Jose Altuve |  | 1,000 | 18 | 100 | 72 | 29 | 50 | 4.7 | 0.06 | 125 | .308 | .360 | .462 | 1,437
Roberto Alomar | Y | 912 | 14 | 103 | 77 | 32 | 70 | 4.6 | 0.22 | 116 | .300 | .371 | .443 | 2,379
Larry Doyle |  | 888 | 7 | 88 | 73 | 27 | 57 | 4.1 | -0.20 | 125 | .290 | .357 | .408 | 1,766
Craig Biggio | Y | 877 | 17 | 105 | 67 | 24 | 66 | 3.7 | -0.16 | 112 | .281 | .363 | .433 | 2,850
Ryne Sandberg | Y | 875 | 21 | 99 | 79 | 26 | 57 | 5.1 | 1.01 | 114 | .285 | .344 | .452 | 2,164
Rod Carew | Y | 847 | 6 | 93 | 67 | 23 | 67 | 5.3 | -0.11 | 131 | .328 | .393 | .429 | 2,469
George Grantham |  | 847 | 12 | 102 | 80 | 15 | 80 | 3.7 | -0.24 | 122 | .302 | .392 | .461 | 1,444
Frankie Frisch | Y | 847 | 7 | 107 | 87 | 29 | 51 | 5.0 | 1.51 | 110 | .316 | .369 | .432 | 2,311
Ray Durham |  | 844 | 16 | 102 | 72 | 22 | 67 | 2.8 | -0.42 | 104 | .277 | .352 | .436 | 1,975
Hardy Richardson |  | 843 | 9 | 137 | 101 | 25 | 46 | 5.0 | 0.36 | 131 | .299 | .344 | .437 | 1,334
Julio Franco |  | 842 | 11 | 82 | 77 | 18 | 59 | 2.8 | -0.18 | 111 | .298 | .365 | .417 | 2,527
 
Altuve is only 31 and had a nice bounce-back season in 2021 after a rough 2020 that was not just pandemic-shortened but also was his first season dealing with the effects of the sign-stealing controversy. But, all things considered, Altuve lines up pretty well with Alomar at this point, and it’s my opinion that he’s still well on a track to the Hall of Fame despite the controversy, especially if he continues to perform well.
 
Wrapping it Up
 
Well, I could go on forever with examples, but I’m sure you get the idea. I haven’t tried coming up with something similar for pitchers yet, but may go that route if it seems fruitful. 
 
If you have any players you’d like to see run through this methodology, please submit them in the comments and I’d be happy to share the results. I should be able to produce one for any position player you suggest.
 
Thank you for reading,
 
Dan
 
 
 
 

COMMENTS (16 Comments, most recent shown first)

OwenH
Fantastic article, Dan. I always enjoy your pieces and posts here, and this is one of your best.
4:05 PM Feb 28th
 
Manushfan
Hrmmm John Stone! I remember him. And Greenwell yeah that actually makes sense too.
11:51 AM Feb 28th
 
mpiafsky
This is pretty great. I'm also a big fan of similarity scores, but they're not quite useful. Your lists are clearly superior-- at least for upper echelon players and presumably journeymen as well. Thank you.
10:14 AM Feb 28th
 
3for3
Ha, mine only double posted!

As far as an 'accelerator', I think anything I'd come up with would make the numbers much different from traditional sim scores. Perhaps something like 1 point for the first difference, then 1.03, 1.06, 1.09 etc. You could then adjust the values so that they start out smaller. Probably something a spreadsheet could handle.
10:09 AM Feb 28th
 
DMBBHF
Sorry for the multiple posts, guys....My system was acting goofy and posted my prior reply 3 times.
9:09 AM Feb 28th
 
malbuff
Excellent study. This one will be a lot of fun to play with.
9:08 AM Feb 28th
 
DMBBHF
Thanks for all the comments, guys...

Terry,

One of the things I built into the model was the option to change the variables easily. When I remove the position requirement (which also removes dWAR) and I remove any penalties for stolen bases (but keep the other category penalties), Jackie's top 10 looks like this:

Arky Vaughan
Mickey Cochrane
Charlie Gehringer
Christian Yelich
Minnie Minoso
Paul Waner
Wade Boggs
Rusty Greer
Bobby Abreu
John Olerud

Kind of weird mix, but that's what it came up with. You're right, a pure statistical comparison is a bit of a challenge for Jackie.

By the way, when I plug in Rickey Henderson, by far the closest comp is Tim Raines with an 808 score (Raines is #9 by traditional Similarity Scores).

Manushfan,

Here are the top 10 for Heinie:

John Stone 915 (Stone and Manush were teammates for a couple of years)
Riggs Stephenson 912
Zack Wheat 903
Chick Hafey 897
Bibb Falk 896
Mike Greenwell 882
Joe Medwick 880
Bobby Veach 875
Bob Fothergill 874
Irish Meusel 872

Wheat and Medwick are also on Manush's traditional top 10 list.

3for3,

Yes, that's a good idea regarding how to penalize successive differences and handling big gaps. I'd be interested to hear if you have some specifics on how to try to capture that, and maybe I can incorporate it?

Thanks all!
Dan

9:07 AM Feb 28th
 
ventboys
That's a lot to unpack, but just looking at one guy ... what would you get if you took Jackie Robinson and removed stolen bases? I suspect his closest historical comps should be players who, in their respective eras, probably stole way more bases.

I think the closest player to Jackie as a player -- skillset-wise -- was probably Rickey Henderson. But I don't know if that's ever going to sift out of a purely statistical comparison.

Good stuff!
12:48 AM Feb 28th
 
tigerlily
Thanks Dan. I think this is another legitimate way to put player's careers into context.
9:51 PM Feb 27th
 
FrankD
Great study. I believe you have a better similarity measurement in that yours passes the 'eye test'. This is important in that although a math/model results in some outputs, one of the first tests of these outputs is 'does this really make sense'.
5:12 PM Feb 27th
 
Manushfan
I like this! Be interesting to see how Mr Manush fares in this.
12:33 PM Feb 27th
 
3for3
Love this. I was really hoping to see Eric Davis here. Thanks for adding him. Some of his comparables are less than satisfying, but he is a fairly unique player, so getting 10 matches means you have to dig down deeper.

One thing I always thought the similarity score missed was when there was a big gap in just one stat, that didn't matter as long as the rest look good.

An example for Eric Davis, is his cousin, Chili. Eric stole 35 bases, while Chili only stole 9. Fred Lynn as well.

If I was ever to try a project like this, I'd have each successive difference count more. I realize that might make the method too complicated, but I think it would get players who are more similar in all aspects of the game on the list.
11:04 AM Feb 27th
 
LoradoTaftWright
Interesting idea, and I think the NS* comps are overall closer than the traditional ones.

I think Frankie Frisch is the most Robinson-like of the new comps, far more so than The Mechanical Man (who was a great player though). Not in a really quantifiable manner though.

* New System
8:35 PM Feb 26th
 
 
©2024 Be Jolly, Inc. All Rights Reserved.